linux-kernel.vger.kernel.org archive mirror
* [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
@ 2013-01-04  0:48 Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking Yinghai Lu
                   ` (33 more replies)
  0 siblings, 34 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

Currently the kdump reservation is limited to below 896M because of a kexec
limitation, and the bzImage also needs to stay below 4G.

To let kexec/kdump use ranges above 4G, we need to make it possible to load
the bzImage and ramdisk above 4G.
During boot, the bzImage will be unpacked at the same position and stay high.

The patches add fields in setup_header and boot_params to
1. get the ramdisk position above 4G from the bootloader/kexec
2. get cmd_line_ptr above 4G from the bootloader/kexec
3. set xloadflags bit 0 in the header for bzImage, so that the
   bootloader/kexec can check it to decide whether it can put the bzImage high.
4. use a sentinel to make sure the ext_* fields in boot_params can be used.
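
As a rough illustration of points 1 and 3, here is a minimal userspace
sketch of how a loader might combine a 32-bit field with its ext_*
counterpart and test the load-high flag. The struct layout, the demo_*
names, and the assumption that the ext_* field carries the upper 32 bits
are all hypothetical and only for illustration; the real definitions are in
the bootparam.h changes of this series.

```c
#include <stdint.h>

/* Hypothetical subset of the boot fields; the layout is illustrative only. */
struct demo_boot_fields {
	uint32_t ramdisk_image;		/* assumed: low 32 bits of the address */
	uint32_t ext_ramdisk_image;	/* assumed: high 32 bits of the address */
	uint16_t xloadflags;
};

/* Bit 0 of xloadflags, as described in point 3 above (name is made up). */
#define DEMO_XLF_LOAD_HIGH	(1u << 0)

/* Combine the low and high halves into one 64-bit physical address. */
static uint64_t demo_ramdisk_addr(const struct demo_boot_fields *f)
{
	return (uint64_t)f->ramdisk_image |
	       ((uint64_t)f->ext_ramdisk_image << 32);
}

/* A loader would only place the image above 4G when this returns nonzero. */
static int demo_can_load_high(const struct demo_boot_fields *f)
{
	return (f->xloadflags & DEMO_XLF_LOAD_HIGH) != 0;
}
```

Keeping the high halves in separate fields means a loader that never fills
them in still describes a valid address below 4G, provided the fields are
known to be zeroed; that is what the sentinel in point 4 is for.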

This patchset was tested with kexec-tools carrying local changes; those
changes will be sent to the kexec list later.

The patches can be found at:

        git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot

They are on top of Linus's tree as of 2013-01-03
plus tip:x86/mm, tip:x86/mm2

-v2: add ext_cmd_line_ptr support, and handle the case where
     boot_params/cmd_line is above 4G.
-v3: according to hpa, use xloadflags instead of code32_start_offset.
     0x200 will not be changed...
-v4: move ext_ramdisk_image/ext_ramdisk_size/ext_cmd_line_ptr to boot_params.
     add handling of the cross-GB-boundary case.
-v5: put spare pages in BRK, so we can avoid wasting about 4 pages.
     add check for bit USE_EXT_BOOT_PARAMS in xloadflags
-v6: use sentinel according to HPA
     add kdump load high support.
-v7: move sentinel from 0x1f0 to 0x1ef... according to HPA.
     Use HPA's #PF handler version instead of ioremap.
-v7u1: update changelog and comments, so it could break KGDB...

H. Peter Anvin (1):
  x86, 64bit: early #PF handler set page table

Yinghai Lu (30):
  x86, mm: Fix page table early allocation offset checking
  x86, 64bit, mm: make pgd next calculation consistent with pud/pmd
  x86, realmode: set real_mode permissions early
  x86, 64bit, mm: add generic kernel/ident mapping helper
  x86, 64bit: copy zero-page early
  x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly
  x86, realmode: Separate real_mode reserve and setup
  x86, 64bit: #PF handler set page to cover 2M only
  x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
  x86: Merge early_reserve_initrd for 32bit and 64bit
  x86: add get_ramdisk_image/size()
  x86, boot: add get_cmd_line_ptr()
  x86, boot: move checking of cmd_line_ptr out of common path
  x86, boot: pass cmd_line_ptr with unsigned long instead
  x86, boot: move verify_cpu.S and no_longmode down
  x86, boot: Move lldt/ltr out of 64bit code section
  x86, kexec: remove 1024G limitation for kexec buffer on 64bit
  x86, kexec: set ident mapping for kernel that is above max_pfn
  x86, kexec: replace ident_mapping_init and init_level4_page
  x86, kexec: only set ident mapping for ram.
  x86, boot: add fields to support load bzImage and ramdisk above 4G
  x86, boot: update comments about entries for 64bit image
  x86, boot: Not need to check setup_header version for setup_data
  memblock: add memblock_mem_size()
  x86: Don't enable swiotlb if there is not enough ram for it
  x86, kdump: remove crashkernel range find limit for 64bit
  x86: add Crash kernel low reservation
  x86: Merge early kernel reserve for 32bit and 64bit
  x86, 64bit, mm: Mark data/bss/brk to nx
  x86, 64bit, mm: hibernate use generic mapping_init

 Documentation/kernel-parameters.txt     |    3 +
 Documentation/x86/boot.txt              |   53 +++++++-
 Documentation/x86/zero-page.txt         |    4 +
 arch/x86/boot/boot.h                    |   18 ++-
 arch/x86/boot/cmdline.c                 |   12 +-
 arch/x86/boot/compressed/cmdline.c      |   12 +-
 arch/x86/boot/compressed/head_64.S      |   48 ++++---
 arch/x86/boot/compressed/misc.c         |   12 ++
 arch/x86/boot/header.S                  |   12 +-
 arch/x86/boot/setup.ld                  |    7 ++
 arch/x86/include/asm/init.h             |   12 ++
 arch/x86/include/asm/kexec.h            |    6 +-
 arch/x86/include/asm/page.h             |    4 +
 arch/x86/include/asm/pgtable_64_types.h |    4 +
 arch/x86/include/asm/processor.h        |    1 +
 arch/x86/include/asm/realmode.h         |    3 +-
 arch/x86/include/uapi/asm/bootparam.h   |   13 +-
 arch/x86/kernel/head32.c                |   20 ---
 arch/x86/kernel/head64.c                |  131 ++++++++++++++-----
 arch/x86/kernel/head_64.S               |  210 +++++++++++++++++++------------
 arch/x86/kernel/machine_kexec_64.c      |  171 ++++++++-----------------
 arch/x86/kernel/pci-swiotlb.c           |   14 ++-
 arch/x86/kernel/setup.c                 |  128 ++++++++++++++-----
 arch/x86/kernel/traps.c                 |    9 ++
 arch/x86/mm/init.c                      |   11 +-
 arch/x86/mm/init_64.c                   |  109 ++++++++++++++--
 arch/x86/power/hibernate_64.c           |   66 ++++------
 arch/x86/realmode/init.c                |   42 ++++---
 include/linux/kexec.h                   |    3 +
 include/linux/memblock.h                |    1 +
 kernel/kexec.c                          |   34 ++++-
 mm/memblock.c                           |   17 +++
 32 files changed, 778 insertions(+), 412 deletions(-)

-- 
1.7.10.4



* [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  7:17   ` Borislav Petkov
  2013-01-15 12:27   ` Stefano Stabellini
  2013-01-04  0:48 ` [PATCH v7u1 02/31] x86, 64bit, mm: make pgd next calculation consistent with pud/pmd Yinghai Lu
                   ` (32 subsequent siblings)
  33 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

While debugging loading the kernel above 4G, found that one page in BRK
is not used by early page allocation.

pgt_buf_top is the first address that can not be used, so we should check
whether the new end is above that top; otherwise the last page will not be
used.

Fix that check and also add a printout for every allocation from BRK.
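
The off-by-one can be seen in a small standalone sketch. It assumes, as
described above, that the top is exclusive, i.e. the free pfns are those
in [end, top). The function names are made up for illustration and are not
part of the patch.

```c
/*
 * "top" is exclusive: pfns in [end, top) are still free.
 * Both helpers return nonzero when the buffer is treated as exhausted.
 */

/* Old check: refuses an allocation that would end exactly at top. */
static int brk_exhausted_old(unsigned long end, unsigned long top,
			     unsigned int num)
{
	return (end + num) >= top;
}

/* New check: only refuses when the allocation would go past top. */
static int brk_exhausted_new(unsigned long end, unsigned long top,
			     unsigned int num)
{
	return (end + num) > top;
}
```

With end = 9, top = 10 and num = 1, the old check reports exhaustion even
though pfn 9 is still free, while the new check lets the last page be used.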

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 6f85de8..c4293cf 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -47,7 +47,7 @@ __ref void *alloc_low_pages(unsigned int num)
 						__GFP_ZERO, order);
 	}
 
-	if ((pgt_buf_end + num) >= pgt_buf_top) {
+	if ((pgt_buf_end + num) > pgt_buf_top) {
 		unsigned long ret;
 		if (min_pfn_mapped >= max_pfn_mapped)
 			panic("alloc_low_page: ran out of memory");
@@ -61,6 +61,8 @@ __ref void *alloc_low_pages(unsigned int num)
 	} else {
 		pfn = pgt_buf_end;
 		pgt_buf_end += num;
+		printk(KERN_DEBUG "BRK [%#010lx, %#010lx] PGTABLE\n",
+			pfn << PAGE_SHIFT, (pgt_buf_end << PAGE_SHIFT) - 1);
 	}
 
 	for (i = 0; i < num; i++) {
-- 
1.7.10.4



* [PATCH v7u1 02/31] x86, 64bit, mm: make pgd next calculation consistent with pud/pmd
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early Yinghai Lu
                   ` (31 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

Calculate 'next' for the pgd the same way we do for pud and pmd, i.e.
round down and add the size.

Also, do not do boundary checking with 'next'; just pass 'end' down
to phys_pud_init() instead, because the loop in phys_pud_init() stops at
PTRS_PER_PUD and thus can handle a possibly bigger 'end' properly.
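
For reference, a small userspace sketch of the two ways of computing
'next'. The DEMO_* constants assume 4-level x86_64 paging (one pgd entry
covering 2^39 bytes) and the function names are made up for illustration.
As far as the arithmetic goes, both forms yield the same value for any
start; the round-down-and-add form simply matches the pud/pmd code.

```c
#include <stdint.h>

/* Assumed for illustration: one pgd entry covers 2^39 bytes. */
#define DEMO_PGDIR_SHIFT	39
#define DEMO_PGDIR_SIZE		(1ULL << DEMO_PGDIR_SHIFT)
#define DEMO_PGDIR_MASK		(~(DEMO_PGDIR_SIZE - 1))

/* Form removed by the patch: add the size, then round down. */
static uint64_t demo_next_old(uint64_t start)
{
	return (start + DEMO_PGDIR_SIZE) & DEMO_PGDIR_MASK;
}

/* Form used for pud/pmd and now for pgd: round down, then add the size. */
static uint64_t demo_next_new(uint64_t start)
{
	return (start & DEMO_PGDIR_MASK) + DEMO_PGDIR_SIZE;
}
```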

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_64.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 167439c..b1178eb 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -530,9 +530,7 @@ kernel_physical_mapping_init(unsigned long start,
 		pgd_t *pgd = pgd_offset_k(start);
 		pud_t *pud;
 
-		next = (start + PGDIR_SIZE) & PGDIR_MASK;
-		if (next > end)
-			next = end;
+		next = (start & PGDIR_MASK) + PGDIR_SIZE;
 
 		if (pgd_val(*pgd)) {
 			pud = (pud_t *)pgd_page_vaddr(*pgd);
@@ -542,7 +540,7 @@ kernel_physical_mapping_init(unsigned long start,
 		}
 
 		pud = alloc_low_page();
-		last_map_addr = phys_pud_init(pud, __pa(start), __pa(next),
+		last_map_addr = phys_pud_init(pud, __pa(start), __pa(end),
 						 page_size_mask);
 
 		spin_lock(&init_mm.page_table_lock);
-- 
1.7.10.4



* [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 02/31] x86, 64bit, mm: make pgd next calculation consistent with pud/pmd Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04 20:15   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 04/31] x86, 64bit, mm: add generic kernel/ident mapping helper Yinghai Lu
                   ` (30 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

Trampoline code is executed by APs with the kernel low mapping.
We need to set the trampoline code to EXEC early, before we boot the
SMP APs.

Found the problem after switching to #PF handler set page table,
as we no longer set the initial kernel low mapping with EXEC in
arch/x86/kernel/head_64.S.

Change to use early_initcall instead; that will make sure the trampoline
has EXEC set.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/realmode/init.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 8045026..b96fe6f 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -111,5 +111,9 @@ static int __init set_real_mode_permissions(void)
 
 	return 0;
 }
-
-arch_initcall(set_real_mode_permissions);
+/*
+ * Trampoline will be executed by APs with SMP.
+ * So we need to set it to EXEC in do_pre_smp_initcalls() at least,
+ * and that needs early_initcall().
+ */
+early_initcall(set_real_mode_permissions);
-- 
1.7.10.4



* [PATCH v7u1 04/31] x86, 64bit, mm: add generic kernel/ident mapping helper
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (2 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04 21:19   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 05/31] x86, 64bit: copy zero-page early Yinghai Lu
                   ` (29 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

This is a simple version of kernel_physical_mapping_init.
It builds a page table that will be used later.

Use x86_mapping_info to control
        1. the alloc_pgt_page method,
        2. whether the PMD is EXEC,
        3. whether the pgd uses the kernel low mapping or the ident mapping.

It will be used to replace some local versions in kexec, hibernation, etc.
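
The callback-plus-context arrangement can be sketched in userspace as
follows. This is only an illustration of the pattern, not the kernel code:
the demo_* names, the fixed-size pool and the page size constant are all
assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE	4096

/* Mirrors the idea of x86_mapping_info: the caller supplies the allocator. */
struct demo_mapping_info {
	void *(*alloc_pgt_page)(void *context);	/* returns one page or NULL */
	void *context;				/* passed back to the allocator */
};

/* Hypothetical allocator state: a simple bump allocator over a buffer. */
struct demo_pool {
	unsigned char *base;
	size_t nr_pages;
	size_t next;
};

/* Hands out zeroed pages from the pool until it runs dry. */
static void *demo_pool_alloc(void *context)
{
	struct demo_pool *pool = context;
	void *page;

	if (pool->next >= pool->nr_pages)
		return NULL;

	page = pool->base + pool->next++ * DEMO_PAGE_SIZE;
	memset(page, 0, DEMO_PAGE_SIZE);
	return page;
}
```

A caller fills in demo_mapping_info with demo_pool_alloc and a pointer to
its pool; the mapping code then only ever calls
info->alloc_pgt_page(info->context) and treats NULL as out of memory, so
each user can plug in its own way of getting page-table pages.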

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/init.h |   12 ++++++
 arch/x86/mm/init_64.c       |   90 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index bac770b..62052c5 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -1,5 +1,17 @@
 #ifndef _ASM_X86_INIT_H
 #define _ASM_X86_INIT_H
 
+struct x86_mapping_info {
+	void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
+	void *context;			 /* context for alloc_pgt_page */
+	unsigned long pmd_flag;		 /* page flag for PMD entry */
+	bool kernel_mapping;		 /* kernel mapping or ident mapping */
+};
+
+int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
+				unsigned long addr, unsigned long end);
+
+int kernel_mapping_init(pgd_t *pgd_page,
+				unsigned long addr, unsigned long end);
 
 #endif /* _ASM_X86_INIT_H */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b1178eb..9c5f2b1 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -56,6 +56,96 @@
 
 #include "mm_internal.h"
 
+static void ident_pmd_init(unsigned long pmd_flag, pmd_t *pmd_page,
+			   unsigned long addr, unsigned long end)
+{
+	addr &= PMD_MASK;
+	for (; addr < end; addr += PMD_SIZE) {
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+
+		if (!pmd_present(*pmd))
+			set_pmd(pmd, __pmd(addr | pmd_flag));
+	}
+}
+static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
+			  unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+
+	for (; addr < end; addr = next) {
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pud_present(*pud)) {
+			pmd = pmd_offset(pud, 0);
+			ident_pmd_init(info->pmd_flag, pmd, addr, next);
+			continue;
+		}
+		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+		if (!pmd)
+			return -ENOMEM;
+		ident_pmd_init(info->pmd_flag, pmd, addr, next);
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
+int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
+			      unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	int result;
+	int off = info->kernel_mapping ? pgd_index(__PAGE_OFFSET) : 0;
+
+	for (; addr < end; addr = next) {
+		pgd_t *pgd = pgd_page + pgd_index(addr) + off;
+		pud_t *pud;
+
+		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, 0);
+			result = ident_pud_init(info, pud, addr, next);
+			if (result)
+				return result;
+			continue;
+		}
+
+		pud = (pud_t *)info->alloc_pgt_page(info->context);
+		if (!pud)
+			return -ENOMEM;
+		result = ident_pud_init(info, pud, addr, next);
+		if (result)
+			return result;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
+static void *alloc_pgt_page(void *context)
+{
+	return alloc_low_page();
+}
+
+int kernel_mapping_init(pgd_t *pgd_page, unsigned long addr, unsigned long end)
+{
+	struct x86_mapping_info info = {
+		.alloc_pgt_page	= alloc_pgt_page,
+		.pmd_flag	= __PAGE_KERNEL_LARGE,
+		.kernel_mapping	= true,
+	};
+
+	return kernel_ident_mapping_init(&info, pgd_page, addr, end);
+}
+
 static int __init parse_direct_gbpages_off(char *arg)
 {
 	direct_gbpages = 0;
-- 
1.7.10.4



* [PATCH v7u1 05/31] x86, 64bit: copy zero-page early
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (3 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 04/31] x86, 64bit, mm: add generic kernel/ident mapping helper Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-07 15:53   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly Yinghai Lu
                   ` (28 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Alexander Duyck,
	Fenghua Yu

real_mode_data, aka the zero-page, could be above 4G.
We will have the #PF handler set up page tables for not-yet-accessible RAM
early, but we want to limit its use to before x86_64_start_reservations,
to limit the change to the native path.

Also, we will need the ramdisk info in the zero-page to access the microcode
blob in the ramdisk in x86_64_start_kernel, so copying the zero-page early
makes accessing the ramdisk info simple.
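
The "copy only if not copied yet" guard can be sketched in userspace like
this. It relies on the assumption stated in the patch that the version
field is never zero once the data has been copied; the demo_* names and the
struct are hypothetical stand-ins for boot_params and copy_bootdata().

```c
#include <string.h>

/* Hypothetical stand-in for boot_params; only the version field matters. */
struct demo_boot_params {
	unsigned short version;	/* nonzero in any valid copy */
	char payload[16];
};

static struct demo_boot_params demo_saved;	/* zero-initialized */
static int demo_copy_count;

/* Unconditional copy, as done on the early path. */
static void demo_copy_bootdata(const struct demo_boot_params *src)
{
	memcpy(&demo_saved, src, sizeof(demo_saved));
	demo_copy_count++;
}

/* Later path: copy only when the early path did not run. */
static void demo_start_reservations(const struct demo_boot_params *src)
{
	if (!demo_saved.version)
		demo_copy_bootdata(src);
}
```

Callers that enter through the early path copy once there and the later
check becomes a no-op; callers that go straight to the later path still get
their copy.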

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/head64.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 7b215a5..c0a25e0 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -87,6 +87,8 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	}
 	load_idt((const struct desc_ptr *)&idt_descr);
 
+	copy_bootdata(__va(real_mode_data));
+
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
@@ -95,7 +97,9 @@ void __init x86_64_start_kernel(char * real_mode_data)
 
 void __init x86_64_start_reservations(char *real_mode_data)
 {
-	copy_bootdata(__va(real_mode_data));
+	/* version is always not zero if it is copied */
+	if (!boot_params.hdr.version)
+		copy_bootdata(__va(real_mode_data));
 
 	memblock_reserve(__pa_symbol(_text),
 			 (unsigned long)__bss_stop - (unsigned long)_text);
-- 
1.7.10.4



* [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (4 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 05/31] x86, 64bit: copy zero-page early Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04 17:18   ` Sakkinen, Jarkko
  2013-01-07 15:54   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 07/31] x86, realmode: Separate real_mode reserve and setup Yinghai Lu
                   ` (27 subsequent siblings)
  33 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Jarkko Sakkinen

With the #PF handler way of setting up the early page table, level3_ident
will go away on the 64bit native path.

So just use the entries in init_level4_pgt to set them in trampoline_pgd.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
---
 arch/x86/realmode/init.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index b96fe6f..384b3f4 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -78,8 +78,8 @@ void __init setup_real_mode(void)
 	*trampoline_cr4_features = read_cr4();
 
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
-	trampoline_pgd[0] = __pa_symbol(level3_ident_pgt) + _KERNPG_TABLE;
-	trampoline_pgd[511] = __pa_symbol(level3_kernel_pgt) + _KERNPG_TABLE;
+	trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
+	trampoline_pgd[511] = init_level4_pgt[511].pgd;
 #endif
 }
 
-- 
1.7.10.4



* [PATCH v7u1 07/31] x86, realmode: Separate real_mode reserve and setup
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (5 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04 17:18   ` Sakkinen, Jarkko
  2013-01-07 15:54   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table Yinghai Lu
                   ` (26 subsequent siblings)
  33 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Jarkko Sakkinen

After we switch to using the #PF handler to help set up the page table,
init_level4_pgt will only have entries set after init_mem_mapping().
We need to move the copying of init_level4_pgt entries to trampoline_pgd
after that.

So split reserve and setup, and move the setup after init_mem_mapping().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
---
 arch/x86/include/asm/realmode.h |    3 ++-
 arch/x86/kernel/setup.c         |    4 +++-
 arch/x86/realmode/init.c        |   30 +++++++++++++++++++-----------
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index fe1ec5b..9c6b890 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -58,6 +58,7 @@ extern unsigned char boot_gdt[];
 extern unsigned char secondary_startup_64[];
 #endif
 
-extern void __init setup_real_mode(void);
+void reserve_real_mode(void);
+void setup_real_mode(void);
 
 #endif /* _ARCH_X86_REALMODE_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 81ea5a5..01b22d0 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -913,10 +913,12 @@ void __init setup_arch(char **cmdline_p)
 	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
 
-	setup_real_mode();
+	reserve_real_mode();
 
 	init_mem_mapping();
 
+	setup_real_mode();
+
 	memblock.current_limit = get_max_mapped();
 	dma_contiguous_reserve(0);
 
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 384b3f4..3baae96 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -8,9 +8,26 @@
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
 
-void __init setup_real_mode(void)
+void __init reserve_real_mode(void)
 {
 	phys_addr_t mem;
+	unsigned char *base;
+	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
+
+	/* Has to be in very low memory so we can execute real-mode AP code. */
+	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
+	if (!mem)
+		panic("Cannot allocate trampoline\n");
+
+	base = __va(mem);
+	memblock_reserve(mem, size);
+	real_mode_header = (struct real_mode_header *) base;
+	printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
+	       base, (unsigned long long)mem, size);
+}
+
+void __init setup_real_mode(void)
+{
 	u16 real_mode_seg;
 	u32 *rel;
 	u32 count;
@@ -25,16 +42,7 @@ void __init setup_real_mode(void)
 	u64 efer;
 #endif
 
-	/* Has to be in very low memory so we can execute real-mode AP code. */
-	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
-	if (!mem)
-		panic("Cannot allocate trampoline\n");
-
-	base = __va(mem);
-	memblock_reserve(mem, size);
-	real_mode_header = (struct real_mode_header *) base;
-	printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
-	       base, (unsigned long long)mem, size);
+	base = (unsigned char *)real_mode_header;
 
 	memcpy(base, real_mode_blob, size);
 
-- 
1.7.10.4



* [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (6 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 07/31] x86, realmode: Separate real_mode reserve and setup Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-07 15:55   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 09/31] x86, 64bit: #PF handler set page to cover 2M only Yinghai Lu
                   ` (25 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

From: "H. Peter Anvin" <hpa@zytor.com>

Two use cases:
1. We will support loading and running the kernel above 4G, and the
   zero_page and ramdisk will be above 4G, too.
2. We need to access the ramdisk early to get the microcode, so that it
   can be updated as early as possible.

We could use early_iomap to access them, but it would make the code too
messy and hard to unify with 32bit.

So here comes the #PF handler to set up the page table.

When a #PF happens, the handler will use pages in __initdata to set up the
page table to cover the accessed page.

That code and those pages are in __INIT sections, so they will not increase
RAM usage.

The good point is: with the help of the #PF handler, we can set up the
kernel mapping from blank, and switch to init_level4_pgt later.

The switchover in head_64.S uses only three pages to handle the kernel
crossing the 1G and 512G boundaries, by sharing pages; that is the most
interesting part.

early_make_pgtable uses the kernel high mapping address to access the pages
used to set up the page table.
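
As a rough illustration of the first step the handler takes, here is a
userspace sketch of how a faulting virtual address is split into pgd and
pud indexes. The DEMO_* constants assume 4-level x86_64 paging with 512
entries per table, and the function names are made up for the example; the
real code is in early_make_pgtable() below.

```c
#include <stdint.h>

/* Assumed for illustration: 4-level paging, 512 entries per table. */
#define DEMO_PGDIR_SHIFT	39
#define DEMO_PUD_SHIFT		30
#define DEMO_PTRS_PER_TABLE	512

/* Index of the top-level (pgd) entry that covers the address. */
static unsigned int demo_pgd_index(uint64_t address)
{
	return (address >> DEMO_PGDIR_SHIFT) & (DEMO_PTRS_PER_TABLE - 1);
}

/* Index of the pud entry, within its table, that covers the address. */
static unsigned int demo_pud_index(uint64_t address)
{
	return (address >> DEMO_PUD_SHIFT) & (DEMO_PTRS_PER_TABLE - 1);
}
```

Under these assumptions an address of 0xffffffff80000000 lands in pgd slot
511 and pud slot 510. Each pud entry covers 1G, which is why one
dynamically allocated PMD page filled with 512 2M entries is enough to
cover the region around a faulting address.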

-v4: Add phys_base offset to make kexec happy, and add
	init_mapping_kernel()   - Yinghai
-v5: fix compiling with xen, and add back ident level3 and level2 for xen
     also move back init_level4_pgt from BSS to DATA again.
     because we have to clear it anyway.  - Yinghai
-v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
-v7: remove not needed clear_page for init_level4_page
     it is with fill 512,8,0 already in head_64.S  - Yinghai
-v8: we need to keep that handler alive until init_mem_mapping and don't
     let early_trap_init to trash that early #PF handler.
     So split early_trap_pf_init out and move it down. - Yinghai
-v9: switchover only cover kernel space instead of 1G so could avoid
     touch possible mem holes. - Yinghai
-v11: change far jmp back to far return to initial_code, that is needed
     to fix failure that is reported by Konrad on AMD system.  - Yinghai

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/pgtable_64_types.h |    4 +
 arch/x86/include/asm/processor.h        |    1 +
 arch/x86/kernel/head64.c                |   81 ++++++++++--
 arch/x86/kernel/head_64.S               |  210 +++++++++++++++++++------------
 arch/x86/kernel/setup.c                 |    2 +
 arch/x86/kernel/traps.c                 |    9 ++
 arch/x86/mm/init.c                      |    3 +-
 7 files changed, 219 insertions(+), 91 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..2d88344 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include <asm/sparsemem.h>
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES	64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 888184b..bdee8bd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -731,6 +731,7 @@ extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
 
 extern void early_trap_init(void);
+void early_trap_pf_init(void);
 
 /* Defined in head.S */
 extern struct desc_ptr		early_gdt_descr;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index c0a25e0..25591f9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,11 +26,73 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-	pgd_t *pgd = pgd_offset_k(0UL);
-	pgd_clear(pgd);
-	__flush_tlb_all();
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PGD-1; i++)
+		early_level4_pgt[i].pgd = 0;
+
+	next_early_pgt = 0;
+
+	write_cr3(__pa(early_level4_pgt));
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	unsigned long i;
+	pgdval_t pgd, *pgd_p;
+	pudval_t *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	/* Invalid address or early pgt is done ?  */
+	if (physaddr >= MAXMEM || read_cr3() != __pa(early_level4_pgt))
+		return -1;
+
+	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+	pgd_p = &early_level4_pgt[i].pgd;
+	pgd = *pgd_p;
+
+	/*
+	 * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+	 * critical -- __PAGE_OFFSET would point us back into the dynamic
+	 * range and we might end up looping forever...
+	 */
+	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
+	} else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+			reset_early_page_tables();
+
+		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			pud_p[i] = 0;
+
+		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+	}
+	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+	pud_p += i;
+
+	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_p[i] = pmd;
+		pmd += PMD_SIZE;
+	}
+
+	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+
+	return 0;
 }
 
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
@@ -70,12 +132,13 @@ void __init x86_64_start_kernel(char * real_mode_data)
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+	/* Kill off the identity-map trampoline */
+	reset_early_page_tables();
+
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* Make NULL pointers segfault */
-	zap_identity_mappings();
-
+	/* XXX - this is wrong... we need to build page tables from scratch */
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
@@ -92,6 +155,10 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
+	clear_page(init_level4_pgt);
+	/* set init_level4_pgt kernel high mapping*/
+	init_level4_pgt[511] = early_level4_pgt[511];
+
 	x86_64_start_reservations(real_mode_data);
 }
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 980053c..d94f6d6 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -47,14 +47,13 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map)
 	.code64
 	.globl startup_64
 startup_64:
-
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded an identity mapped page table
 	 * for us.  These identity mapped page tables map all of the
 	 * kernel pages and possibly all of memory.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either directly from a 64bit bootloader, or from
 	 * arch/x86_64/boot/compressed/head.S.
@@ -66,7 +65,8 @@ startup_64:
 	 * tables and then reload them.
 	 */
 
-	/* Compute the delta between the address I am compiled to run at and the
+	/*
+	 * Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
 	leaq	_text(%rip), %rbp
@@ -78,45 +78,62 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
-	/* Fixup the physical addresses in the page table
+	/*
+	 * Is the address too large?
 	 */
-	addq	%rbp, init_level4_pgt + 0(%rip)
-	addq	%rbp, init_level4_pgt + (L4_PAGE_OFFSET*8)(%rip)
-	addq	%rbp, init_level4_pgt + (L4_START_KERNEL*8)(%rip)
+	leaq	_text(%rip), %rax
+	shrq	$MAX_PHYSMEM_BITS, %rax
+	jnz	bad_address
 
-	addq	%rbp, level3_ident_pgt + 0(%rip)
+	/*
+	 * Fixup the physical addresses in the page table
+	 */
+	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
 
 	addq	%rbp, level3_kernel_pgt + (510*8)(%rip)
 	addq	%rbp, level3_kernel_pgt + (511*8)(%rip)
 
 	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-	/* Add an Identity mapping if I am above 1G */
+	/*
+	 * Set up the identity mapping for the switchover.  These
+	 * entries should *NOT* have the global bit set!  This also
+	 * creates a bunch of nonsense entries but that is fine --
+	 * it avoids problems around wraparound.
+	 */
 	leaq	_text(%rip), %rdi
-	andq	$PMD_PAGE_MASK, %rdi
+	leaq	early_level4_pgt(%rip), %rbx
 
 	movq	%rdi, %rax
-	shrq	$PUD_SHIFT, %rax
-	andq	$(PTRS_PER_PUD - 1), %rax
-	jz	ident_complete
+	shrq	$PGDIR_SHIFT, %rax
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
-	leaq	level3_ident_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	leaq	(4096 + _KERNPG_TABLE)(%rbx), %rdx
+	movq	%rdx, 0(%rbx,%rax,8)
+	movq	%rdx, 8(%rbx,%rax,8)
 
+	addq	$4096, %rdx
 	movq	%rdi, %rax
-	shrq	$PMD_SHIFT, %rax
-	andq	$(PTRS_PER_PMD - 1), %rax
-	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
-	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
-ident_complete:
+	shrq	$PUD_SHIFT, %rax
+	andl	$(PTRS_PER_PUD-1), %eax
+	movq	%rdx, (4096+0)(%rbx,%rax,8)
+	movq	%rdx, (4096+8)(%rbx,%rax,8)
+
+	addq	$8192, %rbx
+	movq	%rdi, %rax
+	shrq	$PMD_SHIFT, %rdi
+	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+	leaq	(_end - 1)(%rip), %rcx
+	shrq	$PMD_SHIFT, %rcx
+	subq	%rdi, %rcx
+	incl	%ecx
+
+1:
+	andq	$(PTRS_PER_PMD - 1), %rdi
+	movq	%rax, (%rbx,%rdi,8)
+	incq	%rdi
+	addq	$PMD_SIZE, %rax
+	decl	%ecx
+	jnz	1b
 
 	/*
 	 * Fixup the kernel text+data virtual addresses. Note that
@@ -124,7 +141,6 @@ ident_complete:
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
 	 */
-
 	leaq	level2_kernel_pgt(%rip), %rdi
 	leaq	4096(%rdi), %r8
 	/* See if it is a valid page table entry */
@@ -139,17 +155,14 @@ ident_complete:
 	/* Fixup phys_base */
 	addq	%rbp, phys_base(%rip)
 
-	/* Due to ENTRY(), sometimes the empty space gets filled with
-	 * zeros. Better take a jmp than relying on empty space being
-	 * filled with 0x90 (nop)
-	 */
-	jmp secondary_startup_64
+	movq	$(early_level4_pgt - __START_KERNEL_map), %rax
+	jmp 1f
 ENTRY(secondary_startup_64)
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded a mapped page table.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either from startup_64 (using physical addresses)
 	 * or from trampoline.S (using virtual addresses).
@@ -159,12 +172,14 @@ ENTRY(secondary_startup_64)
 	 * after the boot processor executes this code.
 	 */
 
+	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+1:
+
 	/* Enable PAE mode and PGE */
-	movl	$(X86_CR4_PAE | X86_CR4_PGE), %eax
-	movq	%rax, %cr4
+	movl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
+	movq	%rcx, %cr4
 
 	/* Setup early boot stage 4 level pagetables. */
-	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	addq	phys_base(%rip), %rax
 	movq	%rax, %cr3
 
@@ -196,7 +211,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip),%rsp
+	movq stack_start(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -236,15 +251,33 @@ ENTRY(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr	
 
-	/* esi is pointer to real mode structure with interesting info.
+	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
-	movl	%esi, %edi
+	movq	%rsi, %rdi
 	
 	/* Finally jump to run C code and to be on real kernel address
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
 	 * a far return.
+	 *
+	 * Note: do not change to far jump indirect with 64bit offset.
+	 *
+	 * AMD does not support far jump indirect with 64bit offset.
+	 * AMD64 Architecture Programmer's Manual, Volume 3: states only
+	 *	JMP FAR mem16:16 FF /5 Far jump indirect,
+	 *		with the target specified by a far pointer in memory.
+	 *	JMP FAR mem16:32 FF /5 Far jump indirect,
+	 *		with the target specified by a far pointer in memory.
+	 *
+	 * Intel64 does support 64bit offset.
+	 * Software Developer Manual Vol 2: states:
+	 *	FF /5 JMP m16:16 Jump far, absolute indirect,
+	 *		address given in m16:16
+	 *	FF /5 JMP m16:32 Jump far, absolute indirect,
+	 *		address given in m16:32.
+	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
+	 *		address given in m16:64.
 	 */
 	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
@@ -270,13 +303,13 @@ ENDPROC(start_cpu0)
 
 	/* SMP bootup changes these two */
 	__REFDATA
-	.align	8
-	ENTRY(initial_code)
+	.balign	8
+	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
-	ENTRY(initial_gs)
+	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 
-	ENTRY(stack_start)
+	GLOBAL(stack_start)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
@@ -284,7 +317,7 @@ ENDPROC(start_cpu0)
 bad_address:
 	jmp bad_address
 
-	.section ".init.text","ax"
+	__INIT
 	.globl early_idt_handlers
 early_idt_handlers:
 	# 104(%rsp) %rflags
@@ -321,14 +354,22 @@ ENTRY(early_idt_handler)
 	pushq %r11		#  0(%rsp)
 
 	cmpl $__KERNEL_CS,96(%rsp)
-	jne 10f
+	jne 11f
+
+	cmpl $14,72(%rsp)	# Page fault?
+	jnz 10f
+	GET_CR2_INTO(%rdi)	# can clobber any volatile register if pv
+	call early_make_pgtable
+	andl %eax,%eax
+	jz 20f			# All good
 
+10:
 	leaq 88(%rsp),%rdi	# Pointer to %rip
 	call early_fixup_exception
 	andl %eax,%eax
 	jnz 20f			# Found an exception entry
 
-10:
+11:
 #ifdef CONFIG_EARLY_PRINTK
 	GET_CR2_INTO(%r9)	# can clobber any volatile register if pv
 	movl 80(%rsp),%r8d	# error code
@@ -350,7 +391,7 @@ ENTRY(early_idt_handler)
 1:	hlt
 	jmp 1b
 
-20:	# Exception table entry found
+20:	# Exception table entry found or page table generated
 	popq %r11
 	popq %r10
 	popq %r9
@@ -364,6 +405,8 @@ ENTRY(early_idt_handler)
 	decl early_recursion_flag(%rip)
 	INTERRUPT_RETURN
 
+	__INITDATA
+
 	.balign 4
 early_recursion_flag:
 	.long 0
@@ -374,11 +417,10 @@ early_idt_msg:
 early_idt_ripmsg:
 	.asciz "RIP %s\n"
 #endif /* CONFIG_EARLY_PRINTK */
-	.previous
 
 #define NEXT_PAGE(name) \
 	.balign	PAGE_SIZE; \
-ENTRY(name)
+GLOBAL(name)
 
 /* Automate the creation of 1 to 1 mapping pmd entries */
 #define PMDS(START, PERM, COUNT)			\
@@ -388,24 +430,37 @@ ENTRY(name)
 	i = i + 1 ;					\
 	.endr
 
+	__INITDATA
+NEXT_PAGE(early_level4_pgt)
+	.fill	511,8,0
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+
+NEXT_PAGE(early_dynamic_pgts)
+	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
+
 	.data
-	/*
-	 * This default setting generates an ident mapping at address 0x100000
-	 * and a mapping for the kernel that precisely maps virtual address
-	 * 0xffffffff80000000 to physical address 0x000000. (always using
-	 * 2Mbyte large pages provided by PAE mode)
-	 */
+
+#ifndef CONFIG_XEN
 NEXT_PAGE(init_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_PAGE_OFFSET*8, 0
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_START_KERNEL*8, 0
+	.fill	512,8,0
+#else
+NEXT_PAGE(init_level4_pgt)
+	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.org    init_level4_pgt + L4_PAGE_OFFSET*8, 0
+	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.org    init_level4_pgt + L4_START_KERNEL*8, 0
 	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.quad   level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
 NEXT_PAGE(level3_ident_pgt)
 	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	511,8,0
+	.fill	511, 8, 0
+NEXT_PAGE(level2_ident_pgt)
+	/* Since I easily can, map the first 1G.
+	 * Don't set NX because code runs from these pages.
+	 */
+	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
+#endif
 
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
@@ -413,21 +468,6 @@ NEXT_PAGE(level3_kernel_pgt)
 	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
-	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
-
-NEXT_PAGE(level1_fixmap_pgt)
-	.fill	512,8,0
-
-NEXT_PAGE(level2_ident_pgt)
-	/* Since I easily can, map the first 1G.
-	 * Don't set NX because code runs from these pages.
-	 */
-	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
-
 NEXT_PAGE(level2_kernel_pgt)
 	/*
 	 * 512 MB kernel mapping. We spend a full page on this pagetable
@@ -442,11 +482,16 @@ NEXT_PAGE(level2_kernel_pgt)
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
-NEXT_PAGE(level2_spare_pgt)
-	.fill   512, 8, 0
+NEXT_PAGE(level2_fixmap_pgt)
+	.fill	506,8,0
+	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
+	.fill	5,8,0
+
+NEXT_PAGE(level1_fixmap_pgt)
+	.fill	512,8,0
 
 #undef PMDS
-#undef NEXT_PAGE
 
 	.data
 	.align 16
@@ -472,6 +517,5 @@ ENTRY(nmi_idt_table)
 	.skip IDT_ENTRIES * 16
 
 	__PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
-ENTRY(empty_zero_page)
+NEXT_PAGE(empty_zero_page)
 	.skip PAGE_SIZE
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 01b22d0..63160c6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -917,6 +917,8 @@ void __init setup_arch(char **cmdline_p)
 
 	init_mem_mapping();
 
+	early_trap_pf_init();
+
 	setup_real_mode();
 
 	memblock.current_limit = get_max_mapped();
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index ecffca1..68bda7a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -688,10 +688,19 @@ void __init early_trap_init(void)
 	set_intr_gate_ist(X86_TRAP_DB, &debug, DEBUG_STACK);
 	/* int3 can be called from all */
 	set_system_intr_gate_ist(X86_TRAP_BP, &int3, DEBUG_STACK);
+#ifdef CONFIG_X86_32
 	set_intr_gate(X86_TRAP_PF, &page_fault);
+#endif
 	load_idt(&idt_descr);
 }
 
+void __init early_trap_pf_init(void)
+{
+#ifdef CONFIG_X86_64
+	set_intr_gate(X86_TRAP_PF, &page_fault);
+#endif
+}
+
 void __init trap_init(void)
 {
 	int i;
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index c4293cf..ab26a15 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -437,9 +437,10 @@ void __init init_mem_mapping(void)
 	}
 #else
 	early_ioremap_page_table_range_init();
+#endif
+
 	load_cr3(swapper_pg_dir);
 	__flush_tlb_all();
-#endif
 
 	early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
 }
-- 
1.7.10.4



* [PATCH v7u1 09/31] x86, 64bit: #PF handler set page to cover 2M only
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (7 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-09 22:57   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path Yinghai Lu
                   ` (24 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Alexander Duyck

The early #PF handler currently maps 1G per #PF. That causes the same
problem that was fixed by
	x86, mm: Only direct map addresses that are marked as E820_RAM

Add only one 2M mapping per #PF, covering the faulting address, instead
of mapping a whole 1G at a time.
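
As a rough illustration of the 2M granularity (a sketch, not the kernel
code): the helpers below assume x86-64 4-level paging with 512 entries per
table and reuse the kernel's shift values. All sketch_* names are made up
for this example.

```c
#include <stdint.h>

/* Assumed x86-64 4-level paging constants; names mirror the kernel's. */
#define SKETCH_PMD_SHIFT	21
#define SKETCH_PUD_SHIFT	30
#define SKETCH_PGDIR_SHIFT	39
#define SKETCH_PTRS_PER_TABLE	512
#define SKETCH_PMD_SIZE		(1ULL << SKETCH_PMD_SHIFT)
#define SKETCH_PMD_MASK		(~(SKETCH_PMD_SIZE - 1))

/* Slot in the top-level table selected by a virtual address. */
static unsigned int sketch_pgd_index(uint64_t addr)
{
	return (unsigned int)((addr >> SKETCH_PGDIR_SHIFT) &
			      (SKETCH_PTRS_PER_TABLE - 1));
}

/* Slot in the PUD table selected by a virtual address. */
static unsigned int sketch_pud_index(uint64_t addr)
{
	return (unsigned int)((addr >> SKETCH_PUD_SHIFT) &
			      (SKETCH_PTRS_PER_TABLE - 1));
}

/* Slot in the PMD table selected by a virtual address. */
static unsigned int sketch_pmd_index(uint64_t addr)
{
	return (unsigned int)((addr >> SKETCH_PMD_SHIFT) &
			      (SKETCH_PTRS_PER_TABLE - 1));
}

/* Physical base of the single 2M page that covers physaddr. */
static uint64_t sketch_pmd_base(uint64_t physaddr)
{
	return physaddr & SKETCH_PMD_MASK;
}
```

With these helpers a fault on physical address 0x40300000 resolves to the
one 2M page starting at 0x40200000. Only that PMD slot is filled; the
other 511 slots of the table stay empty.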

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
---
 arch/x86/kernel/head64.c |   42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 25591f9..a3fc233 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -52,15 +52,15 @@ int __init early_make_pgtable(unsigned long address)
 	unsigned long physaddr = address - __PAGE_OFFSET;
 	unsigned long i;
 	pgdval_t pgd, *pgd_p;
-	pudval_t *pud_p;
+	pudval_t pud, *pud_p;
 	pmdval_t pmd, *pmd_p;
 
 	/* Invalid address or early pgt is done ?  */
 	if (physaddr >= MAXMEM || read_cr3() != __pa(early_level4_pgt))
 		return -1;
 
-	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
-	pgd_p = &early_level4_pgt[i].pgd;
+again:
+	pgd_p = &early_level4_pgt[pgd_index(address)].pgd;
 	pgd = *pgd_p;
 
 	/*
@@ -68,29 +68,37 @@ int __init early_make_pgtable(unsigned long address)
 	 * critical -- __PAGE_OFFSET would point us back into the dynamic
 	 * range and we might end up looping forever...
 	 */
-	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+	if (pgd)
 		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
-	} else {
-		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+	else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
 			reset_early_page_tables();
+			goto again;
+		}
 
 		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
 		for (i = 0; i < PTRS_PER_PUD; i++)
 			pud_p[i] = 0;
-
 		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
 	}
-	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
-	pud_p += i;
-
-	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
-	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
-		pmd_p[i] = pmd;
-		pmd += PMD_SIZE;
-	}
+	pud_p += pud_index(address);
+	pud = *pud_p;
 
-	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+	if (pud)
+		pmd_p = (pmdval_t *)((pud & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
+	else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
+			reset_early_page_tables();
+			goto again;
+		}
+
+		pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PMD; i++)
+			pmd_p[i] = 0;
+		*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+	}
+	pmd = (physaddr & PMD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	pmd_p[pmd_index(address)] = pmd;
 
 	return 0;
 }
-- 
1.7.10.4



* [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (8 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 09/31] x86, 64bit: #PF handler set page to cover 2M only Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-11 12:13   ` Borislav Petkov
  2013-01-15 13:48   ` Stefano Stabellini
  2013-01-04  0:48 ` [PATCH v7u1 11/31] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
                   ` (23 subsequent siblings)
  33 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

max_pfn_mapped is not set correctly until init_memory_mapping() has run,
so don't print its initial value on 64bit.

Also use KERNEL_IMAGE_SIZE directly for the highmap cleanup.
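
The end address used by the highmap cleanup can be sketched as follows.
This is a standalone illustration, not the kernel function: the constants
are assumed values (4K pages, kernel mapping at 0xffffffff80000000, 512 MB
image size) and sketch_highmap_end() is a hypothetical name.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SHIFT		12
#define SKETCH_START_KERNEL_MAP		0xffffffff80000000ULL
#define SKETCH_KERNEL_IMAGE_SIZE	(512ULL << 20)

/*
 * Pick the end of the range to clean up: default to the fixed kernel
 * image size, and only trust max_pfn_mapped when it is already non-zero.
 */
static uint64_t sketch_highmap_end(uint64_t max_pfn_mapped)
{
	uint64_t vaddr_end = SKETCH_START_KERNEL_MAP + SKETCH_KERNEL_IMAGE_SIZE;

	if (max_pfn_mapped)
		vaddr_end = SKETCH_START_KERNEL_MAP +
			    (max_pfn_mapped << SKETCH_PAGE_SHIFT);

	return vaddr_end;
}
```

On the native path max_pfn_mapped is still zero at this point, so the
fixed-size default applies.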

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head64.c |    3 ---
 arch/x86/kernel/setup.c  |    2 ++
 arch/x86/mm/init_64.c    |    6 +++++-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index a3fc233..7061d8b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -146,9 +146,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* XXX - this is wrong... we need to build page tables from scratch */
-	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
-
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
 #ifdef CONFIG_EARLY_PRINTK
 		set_intr_gate(i, &early_idt_handlers[i]);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 63160c6..04797e78 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -910,8 +910,10 @@ void __init setup_arch(char **cmdline_p)
 	setup_bios_corruption_check();
 #endif
 
+#ifdef CONFIG_X86_32
 	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
+#endif
 
 	reserve_real_mode();
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 9c5f2b1..98385a2 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -394,10 +394,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
 void __init cleanup_highmap(void)
 {
 	unsigned long vaddr = __START_KERNEL_map;
-	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+	unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
 	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
 	pmd_t *pmd = level2_kernel_pgt;
 
+	/* Xen has its own end somehow with abused max_pfn_mapped */
+	if (max_pfn_mapped)
+		vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+
 	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
 		if (pmd_none(*pmd))
 			continue;
-- 
1.7.10.4



* [PATCH v7u1 11/31] x86: Merge early_reserve_initrd for 32bit and 64bit
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (9 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 12/31] x86: add get_ramdisk_image/size() Yinghai Lu
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

They are the same, so move them out of head32.c/head64.c into setup.c.

We are using memblock, which handles overlapping reservations properly,
so we don't need to reserve the range early just to hold the location.
We only need to make sure it is reserved before memblock is used to find
free memory.
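
A minimal sketch of the range computation, assuming 4K pages. The
sketch_* helpers are hypothetical stand-ins that return the length that
would be reserved; the real code passes the range to memblock_reserve().

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

/* Round x up to the next page boundary. */
static uint64_t sketch_page_align(uint64_t x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Length of the region to reserve for the initrd, or 0 when the
 * bootloader did not provide one. Only the end is assumed to be
 * unaligned, so the start is used as-is.
 */
static uint64_t sketch_initrd_reserve_len(uint8_t type_of_loader,
					  uint64_t image, uint64_t size)
{
	if (!type_of_loader || !image || !size)
		return 0;

	return sketch_page_align(image + size) - image;
}
```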

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
---
 arch/x86/kernel/head32.c |   11 -----------
 arch/x86/kernel/head64.c |   11 -----------
 arch/x86/kernel/setup.c  |   22 ++++++++++++++++++----
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index e175548..b071d41 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -33,17 +33,6 @@ void __init i386_start_kernel(void)
 	memblock_reserve(__pa_symbol(_text),
 			 (unsigned long)__bss_stop - (unsigned long)_text);
 
-#ifdef CONFIG_BLK_DEV_INITRD
-	/* Reserve INITRD */
-	if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-		/* Assume only end is not page aligned */
-		u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-		u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
-		u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-	}
-#endif
-
 	/* Call the subarch specific early setup function */
 	switch (boot_params.hdr.hardware_subarch) {
 	case X86_SUBARCH_MRST:
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 7061d8b..c463725 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -176,17 +176,6 @@ void __init x86_64_start_reservations(char *real_mode_data)
 	memblock_reserve(__pa_symbol(_text),
 			 (unsigned long)__bss_stop - (unsigned long)_text);
 
-#ifdef CONFIG_BLK_DEV_INITRD
-	/* Reserve INITRD */
-	if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-		/* Assume only end is not page aligned */
-		unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
-		unsigned long ramdisk_size  = boot_params.hdr.ramdisk_size;
-		unsigned long ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-	}
-#endif
-
 	reserve_ebda_region();
 
 	/*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 04797e78..1b8a8cc 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -360,6 +360,19 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 
 	return mapped_pages << PAGE_SHIFT;
 }
+static void __init early_reserve_initrd(void)
+{
+	/* Assume only end is not page aligned */
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
+
+	if (!boot_params.hdr.type_of_loader ||
+	    !ramdisk_image || !ramdisk_size)
+		return;		/* No initrd provided by bootloader */
+
+	memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
+}
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
@@ -386,10 +399,6 @@ static void __init reserve_initrd(void)
 	if (pfn_range_is_mapped(PFN_DOWN(ramdisk_image),
 				PFN_DOWN(ramdisk_end))) {
 		/* All are mapped, easy case */
-		/*
-		 * don't need to reserve again, already reserved early
-		 * in i386_start_kernel
-		 */
 		initrd_start = ramdisk_image + PAGE_OFFSET;
 		initrd_end = initrd_start + ramdisk_size;
 		return;
@@ -400,6 +409,9 @@ static void __init reserve_initrd(void)
 	memblock_free(ramdisk_image, ramdisk_end - ramdisk_image);
 }
 #else
+static void __init early_reserve_initrd(void)
+{
+}
 static void __init reserve_initrd(void)
 {
 }
@@ -661,6 +673,8 @@ early_param("reservelow", parse_reservelow);
 
 void __init setup_arch(char **cmdline_p)
 {
+	early_reserve_initrd();
+
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
-- 
1.7.10.4



* [PATCH v7u1 12/31] x86: add get_ramdisk_image/size()
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (10 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 11/31] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-07 15:56   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 13/31] x86, boot: add get_cmd_line_ptr() Yinghai Lu
                   ` (21 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

Several places look up ramdisk information early, for reserving and
relocating the ramdisk.

Use helper functions to make the code more readable and consistent.

A later patch will add ext_ramdisk_image/size handling to those functions
to support loading the ramdisk above 4G.
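
One way such a helper could later fold in the ext_* field is sketched
below. It assumes that the ext field carries bits 32-63 of the address
and the existing 32-bit header field carries bits 0-31. That layout is an
assumption made for this example, and sketch_combine_addr() is a
hypothetical name.

```c
#include <stdint.h>

/*
 * Build a 64-bit value from a 32-bit header field and a 32-bit
 * extension field. Assumption: the extension holds the upper half.
 */
static uint64_t sketch_combine_addr(uint32_t low, uint32_t ext_high)
{
	uint64_t value = low;

	value |= (uint64_t)ext_high << 32;

	return value;
}
```

Because every caller goes through the helper, only this one place would
have to change once the ext_* fields exist.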

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c |   29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1b8a8cc..644a123 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -294,12 +294,25 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
+static u64 __init get_ramdisk_image(void)
+{
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+	return ramdisk_image;
+}
+static u64 __init get_ramdisk_size(void)
+{
+	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+	return ramdisk_size;
+}
+
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_image = get_ramdisk_image();
+	u64 ramdisk_size  = get_ramdisk_size();
 	u64 area_size     = PAGE_ALIGN(ramdisk_size);
 	u64 ramdisk_here;
 	unsigned long slop, clen, mapaddr;
@@ -338,8 +351,8 @@ static void __init relocate_initrd(void)
 		ramdisk_size  -= clen;
 	}
 
-	ramdisk_image = boot_params.hdr.ramdisk_image;
-	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image();
+	ramdisk_size  = get_ramdisk_size();
 	printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -363,8 +376,8 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 static void __init early_reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_image = get_ramdisk_image();
+	u64 ramdisk_size  = get_ramdisk_size();
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
 	if (!boot_params.hdr.type_of_loader ||
@@ -376,8 +389,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_image = get_ramdisk_image();
+	u64 ramdisk_size  = get_ramdisk_size();
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 	u64 mapped_size;
 
-- 
1.7.10.4



* [PATCH v7u1 13/31] x86, boot: add get_cmd_line_ptr()
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (11 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 12/31] x86: add get_ramdisk_image/size() Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-07 15:56   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 14/31] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
                   ` (20 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Gokul Caushik,
	Josh Triplett, Joe Millenbach, Alexander Duyck

A later patch will make these helpers check ext_cmd_line_ptr as well.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
---
 arch/x86/boot/compressed/cmdline.c |   10 ++++++++--
 arch/x86/kernel/head64.c           |   13 +++++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index 10f6b11..b4c913c 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -13,13 +13,19 @@ static inline char rdfs8(addr_t addr)
 	return *((char *)(fs + addr));
 }
 #include "../cmdline.c"
+static unsigned long get_cmd_line_ptr(void)
+{
+	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
+
+	return cmd_line_ptr;
+}
 int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-	return __cmdline_find_option(real_mode->hdr.cmd_line_ptr, option, buffer, bufsize);
+	return __cmdline_find_option(get_cmd_line_ptr(), option, buffer, bufsize);
 }
 int cmdline_find_option_bool(const char *option)
 {
-	return __cmdline_find_option_bool(real_mode->hdr.cmd_line_ptr, option);
+	return __cmdline_find_option_bool(get_cmd_line_ptr(), option);
 }
 
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index c463725..316e7b2 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -111,13 +111,22 @@ static void __init clear_bss(void)
 	       (unsigned long) __bss_stop - (unsigned long) __bss_start);
 }
 
+static unsigned long get_cmd_line_ptr(void)
+{
+	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+	return cmd_line_ptr;
+}
+
 static void __init copy_bootdata(char *real_mode_data)
 {
 	char * command_line;
+	unsigned long cmd_line_ptr;
 
 	memcpy(&boot_params, real_mode_data, sizeof boot_params);
-	if (boot_params.hdr.cmd_line_ptr) {
-		command_line = __va(boot_params.hdr.cmd_line_ptr);
+	cmd_line_ptr = get_cmd_line_ptr();
+	if (cmd_line_ptr) {
+		command_line = __va(cmd_line_ptr);
 		memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
 	}
 }
-- 
1.7.10.4



* [PATCH v7u1 14/31] x86, boot: move checking of cmd_line_ptr out of common path
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (12 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 13/31] x86, boot: add get_cmd_line_ptr() Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-07 16:00   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 15/31] x86, boot: pass cmd_line_ptr with unsigned long instead Yinghai Lu
                   ` (19 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

The __cmdline_find_option*() helpers in cmdline.c are shared between the
16-bit setup code and the 32/64-bit decompressor code.

For the 32/64-bit-only path via kexec, we should not check whether the
pointer is below 1M, because the command line could be placed above 1M,
or even above 4G.

Move the accessibility check out of __cmdline_find_option() so that the
decompressor in misc.c can parse the command line correctly.
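
For illustration only (not part of this patch): the 1M check exists because
the 16-bit setup code reaches the command line through a real-mode
segment:offset pair, which is what the "cmdline_ptr >> 4" and
"cmdline_ptr & 0xf" split in cmdline.c does. A minimal sketch of why a
pointer at or above 0x100000 cannot be represented that way follows. The
helper names are hypothetical and do not exist in the kernel.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical type: a real-mode segment:offset pair. */
struct seg_off {
	uint16_t seg;
	uint16_t off;
};

/* The check that this patch moves into the 16-bit-only wrappers. */
static bool real_mode_reachable(uint32_t linear)
{
	return linear < 0x100000;
}

/* Same split as cmdline.c: segment = ptr >> 4, offset = ptr & 0xf. */
static struct seg_off to_seg_off(uint32_t linear)
{
	struct seg_off so = {
		.seg = (uint16_t)(linear >> 4),	/* truncates above 1M */
		.off = (uint16_t)(linear & 0xf),
	};
	return so;
}

/* Real-mode address formation: segment * 16 + offset. */
static uint32_t to_linear(struct seg_off so)
{
	return ((uint32_t)so.seg << 4) + so.off;
}
```

A pointer below 1M survives the round trip through to_seg_off() and
to_linear(); a pointer at 0x100000 does not, because the segment no longer
fits in 16 bits. The decompressor runs in 32/64-bit mode and has no such
restriction.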

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/boot.h    |   14 ++++++++++++--
 arch/x86/boot/cmdline.c |    8 ++++----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 18997e5..7fadf80 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -289,12 +289,22 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
 int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
 static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-	return __cmdline_find_option(boot_params.hdr.cmd_line_ptr, option, buffer, bufsize);
+	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+	if (cmd_line_ptr >= 0x100000)
+		return -1;      /* inaccessible */
+
+	return __cmdline_find_option(cmd_line_ptr, option, buffer, bufsize);
 }
 
 static inline int cmdline_find_option_bool(const char *option)
 {
-	return __cmdline_find_option_bool(boot_params.hdr.cmd_line_ptr, option);
+	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+	if (cmd_line_ptr >= 0x100000)
+		return -1;      /* inaccessible */
+
+	return __cmdline_find_option_bool(cmd_line_ptr, option);
 }
 
 
diff --git a/arch/x86/boot/cmdline.c b/arch/x86/boot/cmdline.c
index 6b3b6f7..768f00f 100644
--- a/arch/x86/boot/cmdline.c
+++ b/arch/x86/boot/cmdline.c
@@ -41,8 +41,8 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
 		st_bufcpy	/* Copying this to buffer */
 	} state = st_wordstart;
 
-	if (!cmdline_ptr || cmdline_ptr >= 0x100000)
-		return -1;	/* No command line, or inaccessible */
+	if (!cmdline_ptr)
+		return -1;      /* No command line */
 
 	cptr = cmdline_ptr & 0xf;
 	set_fs(cmdline_ptr >> 4);
@@ -111,8 +111,8 @@ int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option)
 		st_wordskip,	/* Miscompare, skip */
 	} state = st_wordstart;
 
-	if (!cmdline_ptr || cmdline_ptr >= 0x100000)
-		return -1;	/* No command line, or inaccessible */
+	if (!cmdline_ptr)
+		return -1;      /* No command line */
 
 	cptr = cmdline_ptr & 0xf;
 	set_fs(cmdline_ptr >> 4);
-- 
1.7.10.4



* [PATCH v7u1 15/31] x86, boot: pass cmd_line_ptr with unsigned long instead
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (13 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 14/31] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 16/31] x86, boot: move verify_cpu.S and no_longmode down Yinghai Lu
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

boot/compressed/misc.c is used for bzImage in both 64-bit and 32-bit
builds, and cmd_line_ptr could point to a buffer that is above 4G.
cmd_line_ptr should therefore be 64 bits wide, otherwise the high 32 bits
will be truncated.

So change the data type to unsigned long, which is 64 bits wide on 64-bit
builds and can hold the correct address of the command line buffer.

This is still fine for a 32-bit bzImage, because unsigned long is still
32 bits wide there.
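
For illustration only (not part of this patch): a minimal sketch of the
truncation described above. It uses fixed-width types so that it behaves
the same on any host; pass_as_u32() is a hypothetical stand-in for a
function whose parameter is declared u32.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for passing an address through a u32
 * parameter: everything above bit 31 is silently dropped.
 */
static uint32_t pass_as_u32(uint64_t addr)
{
	return (uint32_t)addr;
}

/* With a 64-bit parameter the address is preserved. */
static uint64_t pass_as_u64(uint64_t addr)
{
	return addr;
}
```

A command line at 4G + 0x1000 (0x100001000) would come out of
pass_as_u32() as 0x1000, which is a different and wrong buffer.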

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/boot.h    |    8 ++++----
 arch/x86/boot/cmdline.c |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 7fadf80..5b75319 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -285,11 +285,11 @@ struct biosregs {
 void intcall(u8 int_no, const struct biosregs *ireg, struct biosregs *oreg);
 
 /* cmdline.c */
-int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize);
-int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
+int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize);
+int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option);
 static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
 	if (cmd_line_ptr >= 0x100000)
 		return -1;      /* inaccessible */
@@ -299,7 +299,7 @@ static inline int cmdline_find_option(const char *option, char *buffer, int bufs
 
 static inline int cmdline_find_option_bool(const char *option)
 {
-	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
 	if (cmd_line_ptr >= 0x100000)
 		return -1;      /* inaccessible */
diff --git a/arch/x86/boot/cmdline.c b/arch/x86/boot/cmdline.c
index 768f00f..625d21b 100644
--- a/arch/x86/boot/cmdline.c
+++ b/arch/x86/boot/cmdline.c
@@ -27,7 +27,7 @@ static inline int myisspace(u8 c)
  * Returns the length of the argument (regardless of if it was
  * truncated to fit in the buffer), or -1 on not found.
  */
-int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize)
+int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize)
 {
 	addr_t cptr;
 	char c;
@@ -99,7 +99,7 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
  * Returns the position of that option (starts counting with 1)
  * or 0 on not found
  */
-int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option)
+int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option)
 {
 	addr_t cptr;
 	char c;
-- 
1.7.10.4



* [PATCH v7u1 16/31] x86, boot: move verify_cpu.S and no_longmode down
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (14 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 15/31] x86, boot: pass cmd_line_ptr with unsigned long instead Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 17/31] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Matt Fleming

We need to move some code to the 32-bit section in the following patch:

   x86, boot: Move lldt/ltr out of 64bit code section

but that would push startup_64 down from 0x200.

According to hpa, we cannot change the position of startup_64 because
it is part of the ABI.

We can move verify_cpu and no_longmode down instead: verify_cpu is
reached via a function call and no_longmode never returns, so no extra
code is needed to jump back.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Matt Fleming <matt.fleming@intel.com>
---
 arch/x86/boot/compressed/head_64.S |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2c4b171..fb984c0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -176,14 +176,6 @@ ENTRY(startup_32)
 	lret
 ENDPROC(startup_32)
 
-no_longmode:
-	/* This isn't an x86-64 CPU so hang */
-1:
-	hlt
-	jmp     1b
-
-#include "../../kernel/verify_cpu.S"
-
 	/*
 	 * Be careful here startup_64 needs to be at a predictable
 	 * address so I can export it in an ELF header.  Bootloaders
@@ -349,6 +341,15 @@ relocated:
  */
 	jmp	*%rbp
 
+	.code32
+no_longmode:
+	/* This isn't an x86-64 CPU so hang */
+1:
+	hlt
+	jmp     1b
+
+#include "../../kernel/verify_cpu.S"
+
 	.data
 gdt:
 	.word	gdt_end - gdt
-- 
1.7.10.4



* [PATCH v7u1 17/31] x86, boot: Move lldt/ltr out of 64bit code section
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (15 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 16/31] x86, boot: move verify_cpu.S and no_longmode down Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 18/31] x86, kexec: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Zachary Amsden,
	Matt Fleming

commit 08da5a2ca

    x86_64: Early segment setup for VT

sets up LDT and TR into a valid state in order to speed up boot
decompression under VT.

That code was placed in the 64-bit section, but it uses a GDT that is
only loaded on the 32-bit path.

This breaks booting with a 64-bit bootloader that does not go through
the 32-bit path but jumps to startup_64 directly, with a different GDT
loaded.

Move those lines into the 32-bit section, after the GDT is loaded.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Zachary Amsden <zamsden@gmail.com>
Cc: Matt Fleming <matt.fleming@intel.com>
---
 arch/x86/boot/compressed/head_64.S |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fb984c0..5c80b94 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -154,6 +154,12 @@ ENTRY(startup_32)
 	btsl	$_EFER_LME, %eax
 	wrmsr
 
+	/* After gdt is loaded */
+	xorl	%eax, %eax
+	lldt	%ax
+	movl    $0x20, %eax
+	ltr	%ax
+
 	/*
 	 * Setup for the jump to 64bit mode
 	 *
@@ -239,9 +245,6 @@ preferred_addr:
 	movl	%eax, %ss
 	movl	%eax, %fs
 	movl	%eax, %gs
-	lldt	%ax
-	movl    $0x20, %eax
-	ltr	%ax
 
 	/*
 	 * Compute the decompressed kernel start address.  It is where
-- 
1.7.10.4



* [PATCH v7u1 18/31] x86, kexec: remove 1024G limitation for kexec buffer on 64bit
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (16 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 17/31] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 19/31] x86, kexec: set ident mapping for kernel that is above max_pfn Yinghai Lu
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

The 64-bit kernel now supports more than 1T of RAM, and kexec tools
can find a buffer above 1T, so remove that obsolete limitation and use
MAXMEM instead.

Tested on a system with more than 1024G of RAM.
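
For illustration only (not part of this patch): the old constant
0xFFFFFFFFFF is 2^40 - 1, i.e. one byte below 1024G. The sketch below
works that out. The value used for MAXMEM is an assumption (46 physical
address bits, as I believe x86_64 used at the time); the patch itself
only refers to MAXMEM symbolically.

```c
#include <stdint.h>

/* The limit removed by this patch: 2^40 - 1. */
#define OLD_KEXEC_LIMIT			0xFFFFFFFFFFULL

/* ASSUMPTION: 46 physical address bits; not stated in the patch. */
#define ASSUMED_MAX_PHYSMEM_BITS	46
#define ASSUMED_MAXMEM			(1ULL << ASSUMED_MAX_PHYSMEM_BITS)

/* Convert a byte count to GiB. */
static uint64_t bytes_to_gib(uint64_t bytes)
{
	return bytes >> 30;
}
```

Under that assumption the limit moves from 1024G to 65536G (64T).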

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/x86/include/asm/kexec.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 6080d26..17483a4 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -48,11 +48,11 @@
 # define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
 #else
 /* Maximum physical address we can use pages from */
-# define KEXEC_SOURCE_MEMORY_LIMIT      (0xFFFFFFFFFFUL)
+# define KEXEC_SOURCE_MEMORY_LIMIT      (MAXMEM-1)
 /* Maximum address we can reach in physical address mode */
-# define KEXEC_DESTINATION_MEMORY_LIMIT (0xFFFFFFFFFFUL)
+# define KEXEC_DESTINATION_MEMORY_LIMIT (MAXMEM-1)
 /* Maximum address we can use for the control pages */
-# define KEXEC_CONTROL_MEMORY_LIMIT     (0xFFFFFFFFFFUL)
+# define KEXEC_CONTROL_MEMORY_LIMIT     (MAXMEM-1)
 
 /* Allocate one page for the pdp and the second for the code */
 # define KEXEC_CONTROL_PAGE_SIZE  (4096UL + 4096UL)
-- 
1.7.10.4



* [PATCH v7u1 19/31] x86, kexec: set ident mapping for kernel that is above max_pfn
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (17 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 18/31] x86, kexec: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page Yinghai Lu
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

When the first kernel is booted with memmap= or mem= to limit max_pfn,
kexec can load the second kernel above that max_pfn.

In this case we need to set up the identity mapping for the whole image,
not just for the first 2M.
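
For illustration only (not part of this patch): a minimal sketch of
covering a segment with 2M identity mappings. The names are hypothetical,
and the rounding here (round the start down, round the exclusive end up)
is my own choice for the sketch, not a copy of the code in the diff.

```c
#include <stdint.h>

#define SKETCH_PMD_SIZE	(2ULL << 20)	/* one 2M large page */

static uint64_t sketch_round_down(uint64_t x)
{
	return x & ~(SKETCH_PMD_SIZE - 1);
}

static uint64_t sketch_round_up(uint64_t x)
{
	return (x + SKETCH_PMD_SIZE - 1) & ~(SKETCH_PMD_SIZE - 1);
}

/*
 * Count the 2M pages needed to cover [mstart, mend).  A real
 * implementation would install one PMD entry per iteration.
 */
static unsigned int sketch_count_2m_pages(uint64_t mstart, uint64_t mend)
{
	uint64_t addr = sketch_round_down(mstart);
	uint64_t end  = sketch_round_up(mend);
	unsigned int n = 0;

	for (; addr < end; addr += SKETCH_PMD_SIZE)
		n++;

	return n;
}
```

A segment that straddles a 2M boundary needs two entries even when it is
only a few pages long, which is why mapping only the 2M page containing
image->start is not enough.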

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/machine_kexec_64.c |   43 +++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index b3ea9db..be14ee1 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -56,6 +56,25 @@ out:
 	return result;
 }
 
+static int ident_mapping_init(struct kimage *image, pgd_t *level4p,
+				unsigned long mstart, unsigned long mend)
+{
+	int result;
+
+	mstart = round_down(mstart, PMD_SIZE);
+	mend   = round_up(mend - 1, PMD_SIZE);
+
+	while (mstart < mend) {
+		result = init_one_level2_page(image, level4p, mstart);
+		if (result)
+			return result;
+
+		mstart += PMD_SIZE;
+	}
+
+	return 0;
+}
+
 static void init_level2_page(pmd_t *level2p, unsigned long addr)
 {
 	unsigned long end_addr;
@@ -184,22 +203,34 @@ err:
 	return result;
 }
 
-
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
+	unsigned long mstart, mend;
 	pgd_t *level4p;
 	int result;
+	int i;
+
 	level4p = (pgd_t *)__va(start_pgtable);
 	result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
 	if (result)
 		return result;
+
 	/*
-	 * image->start may be outside 0 ~ max_pfn, for example when
-	 * jump back to original kernel from kexeced kernel
+	 * segments's mem ranges could be outside 0 ~ max_pfn,
+	 * for example when jump back to original kernel from kexeced kernel.
+	 * or first kernel is booted with user mem map, and second kernel
+	 * could be loaded out of that range.
 	 */
-	result = init_one_level2_page(image, level4p, image->start);
-	if (result)
-		return result;
+	for (i = 0; i < image->nr_segments; i++) {
+		mstart = image->segment[i].mem;
+		mend   = mstart + image->segment[i].memsz;
+
+		result = ident_mapping_init(image, level4p, mstart, mend);
+
+		if (result)
+			return result;
+	}
+
 	return init_transition_pgtable(image, level4p);
 }
 
-- 
1.7.10.4



* [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (18 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 19/31] x86, kexec: set ident mapping for kernel that is above max_pfn Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04 21:01   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram Yinghai Lu
                   ` (13 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

Currently ident_mapping_init checks whether the pgd/pud is present for
every 2M page, so when several 2M pages fall under the same PUD it keeps
re-checking that the pud is there.

init_level4_page does not check for an existing pgd/pud at all.

We can use the generic kernel_ident_mapping_init to replace them both.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/machine_kexec_64.c |  161 ++++++------------------------------
 1 file changed, 26 insertions(+), 135 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index be14ee1..d2d7e02 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -16,144 +16,12 @@
 #include <linux/io.h>
 #include <linux/suspend.h>
 
+#include <asm/init.h>
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
 #include <asm/debugreg.h>
 
-static int init_one_level2_page(struct kimage *image, pgd_t *pgd,
-				unsigned long addr)
-{
-	pud_t *pud;
-	pmd_t *pmd;
-	struct page *page;
-	int result = -ENOMEM;
-
-	addr &= PMD_MASK;
-	pgd += pgd_index(addr);
-	if (!pgd_present(*pgd)) {
-		page = kimage_alloc_control_pages(image, 0);
-		if (!page)
-			goto out;
-		pud = (pud_t *)page_address(page);
-		clear_page(pud);
-		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
-	}
-	pud = pud_offset(pgd, addr);
-	if (!pud_present(*pud)) {
-		page = kimage_alloc_control_pages(image, 0);
-		if (!page)
-			goto out;
-		pmd = (pmd_t *)page_address(page);
-		clear_page(pmd);
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-	}
-	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
-		set_pmd(pmd, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
-	result = 0;
-out:
-	return result;
-}
-
-static int ident_mapping_init(struct kimage *image, pgd_t *level4p,
-				unsigned long mstart, unsigned long mend)
-{
-	int result;
-
-	mstart = round_down(mstart, PMD_SIZE);
-	mend   = round_up(mend - 1, PMD_SIZE);
-
-	while (mstart < mend) {
-		result = init_one_level2_page(image, level4p, mstart);
-		if (result)
-			return result;
-
-		mstart += PMD_SIZE;
-	}
-
-	return 0;
-}
-
-static void init_level2_page(pmd_t *level2p, unsigned long addr)
-{
-	unsigned long end_addr;
-
-	addr &= PAGE_MASK;
-	end_addr = addr + PUD_SIZE;
-	while (addr < end_addr) {
-		set_pmd(level2p++, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
-		addr += PMD_SIZE;
-	}
-}
-
-static int init_level3_page(struct kimage *image, pud_t *level3p,
-				unsigned long addr, unsigned long last_addr)
-{
-	unsigned long end_addr;
-	int result;
-
-	result = 0;
-	addr &= PAGE_MASK;
-	end_addr = addr + PGDIR_SIZE;
-	while ((addr < last_addr) && (addr < end_addr)) {
-		struct page *page;
-		pmd_t *level2p;
-
-		page = kimage_alloc_control_pages(image, 0);
-		if (!page) {
-			result = -ENOMEM;
-			goto out;
-		}
-		level2p = (pmd_t *)page_address(page);
-		init_level2_page(level2p, addr);
-		set_pud(level3p++, __pud(__pa(level2p) | _KERNPG_TABLE));
-		addr += PUD_SIZE;
-	}
-	/* clear the unused entries */
-	while (addr < end_addr) {
-		pud_clear(level3p++);
-		addr += PUD_SIZE;
-	}
-out:
-	return result;
-}
-
-
-static int init_level4_page(struct kimage *image, pgd_t *level4p,
-				unsigned long addr, unsigned long last_addr)
-{
-	unsigned long end_addr;
-	int result;
-
-	result = 0;
-	addr &= PAGE_MASK;
-	end_addr = addr + (PTRS_PER_PGD * PGDIR_SIZE);
-	while ((addr < last_addr) && (addr < end_addr)) {
-		struct page *page;
-		pud_t *level3p;
-
-		page = kimage_alloc_control_pages(image, 0);
-		if (!page) {
-			result = -ENOMEM;
-			goto out;
-		}
-		level3p = (pud_t *)page_address(page);
-		result = init_level3_page(image, level3p, addr, last_addr);
-		if (result)
-			goto out;
-		set_pgd(level4p++, __pgd(__pa(level3p) | _KERNPG_TABLE));
-		addr += PGDIR_SIZE;
-	}
-	/* clear the unused entries */
-	while (addr < end_addr) {
-		pgd_clear(level4p++);
-		addr += PGDIR_SIZE;
-	}
-out:
-	return result;
-}
-
 static void free_transition_pgtable(struct kimage *image)
 {
 	free_page((unsigned long)image->arch.pud);
@@ -203,15 +71,37 @@ err:
 	return result;
 }
 
+static void *alloc_pgt_page(void *data)
+{
+	struct kimage *image = (struct kimage *)data;
+	struct page *page;
+	void *p = NULL;
+
+	page = kimage_alloc_control_pages(image, 0);
+	if (page) {
+		p = page_address(page);
+		clear_page(p);
+	}
+
+	return p;
+}
+
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
+	struct x86_mapping_info info = {
+		.alloc_pgt_page	= alloc_pgt_page,
+		.context	= image,
+		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
+	};
 	unsigned long mstart, mend;
 	pgd_t *level4p;
 	int result;
 	int i;
 
 	level4p = (pgd_t *)__va(start_pgtable);
-	result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+	clear_page(level4p);
+	result = kernel_ident_mapping_init(&info, level4p,
+						0, max_pfn << PAGE_SHIFT);
 	if (result)
 		return result;
 
@@ -225,7 +115,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 		mstart = image->segment[i].mem;
 		mend   = mstart + image->segment[i].memsz;
 
-		result = ident_mapping_init(image, level4p, mstart, mend);
+		result = kernel_ident_mapping_init(&info,
+						 level4p, mstart, mend);
 
 		if (result)
 			return result;
-- 
1.7.10.4



* [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram.
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (19 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-13 12:56   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
                   ` (12 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Alexander Duyck

We should not set up a mapping for everything under max_pfn.
That causes the same problem that was fixed by

	x86, mm: Only direct map addresses that are marked as E820_RAM

This patch exposes the pfn_mapped array, and only sets up the identity
mapping for ranges in that array.

This patch relies on the new kernel_ident_mapping_init, which can handle
a pgd/pud that already exists from an earlier call.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
---
 arch/x86/include/asm/page.h        |    4 ++++
 arch/x86/kernel/machine_kexec_64.c |   13 +++++++++----
 arch/x86/mm/init.c                 |    4 ++--
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 3698a6a..c878924 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -17,6 +17,10 @@
 
 struct page;
 
+#include <linux/range.h>
+extern struct range pfn_mapped[];
+extern int nr_pfn_mapped;
+
 static inline void clear_user_page(void *page, unsigned long vaddr,
 				   struct page *pg)
 {
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index d2d7e02..4eabc16 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -100,10 +100,15 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 
 	level4p = (pgd_t *)__va(start_pgtable);
 	clear_page(level4p);
-	result = kernel_ident_mapping_init(&info, level4p,
-						0, max_pfn << PAGE_SHIFT);
-	if (result)
-		return result;
+	for (i = 0; i < nr_pfn_mapped; i++) {
+		mstart = pfn_mapped[i].start << PAGE_SHIFT;
+		mend   = pfn_mapped[i].end << PAGE_SHIFT;
+
+		result = kernel_ident_mapping_init(&info,
+						 level4p, mstart, mend);
+		if (result)
+			return result;
+	}
 
 	/*
 	 * segments's mem ranges could be outside 0 ~ max_pfn,
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index ab26a15..d704b36 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -300,8 +300,8 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
 	return nr_range;
 }
 
-static struct range pfn_mapped[E820_X_MAX];
-static int nr_pfn_mapped;
+struct range pfn_mapped[E820_X_MAX];
+int nr_pfn_mapped;
 
 static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
 {
-- 
1.7.10.4



* [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (20 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-13 21:41   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image Yinghai Lu
                   ` (11 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Rob Landley,
	Matt Fleming, Gokul Caushik, Josh Triplett, Joe Millenbach

ext_ramdisk_image/size record the high 32 bits of the ramdisk address
and size.

xloadflags bit 0 is set if the kernel is relocatable and 64-bit.

Let get_ramdisk_image/size use ext_ramdisk_image/size to get the
right position of the ramdisk.

The bootloader fills in ext_ramdisk_image/size when it loads the
ramdisk above 4G.

The bootloader also checks whether xloadflags bit 0 is set to decide
whether it can load the ramdisk above 4G.

The sentinel is used to make sure the kernel sees valid ext_* values.

Update the header version to 2.12.

-v2: add ext_cmd_line_ptr for above 4G support.
-v3: update to xloadflags from HPA.
-v4: use fields from bootparam instead of setup_header according to HPA.
-v5: add checking for USE_EXT_BOOT_PARAMS
-v6: use sentinel to check if ext_* are valid, as suggested by HPA.
     HPA said:
	1. add a field in the uninitialized portion, call it "sentinel";
	2. make sure the byte position corresponding to the "sentinel" field is
	   nonzero in the bzImage file;
	3. if the kernel boots up and sentinel is nonzero, erase those fields
	   that you identified as uninitialized;
-v7: change to 0x1ef instead of 0x1f0, HPA said:
	it is quite plausible that someone may (fairly sanely) start the
	copy range at 0x1f0 instead of 0x1f1
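
For illustration only (not part of this patch): a minimal sketch of how
the sentinel and the ext_* fields fit together. The struct is a
simplified, hypothetical stand-in and does not reproduce the real
boot_params layout or offsets.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct boot_params. */
struct sketch_boot_params {
	uint8_t  sentinel;		/* nonzero: ext_* not trustworthy */
	uint32_t ramdisk_image;		/* low 32 bits */
	uint32_t ext_ramdisk_image;	/* high 32 bits */
};

/*
 * If the bootloader copied the sentinel byte from the bzImage
 * (where it is nonzero) instead of starting from a zeroed
 * structure, the ext_* fields may hold garbage: clear them.
 */
static void sketch_sanitize(struct sketch_boot_params *bp)
{
	if (bp->sentinel)
		bp->ext_ramdisk_image = 0;
}

/* Combine the low and high halves into one 64-bit address. */
static uint64_t sketch_ramdisk_image(const struct sketch_boot_params *bp)
{
	return bp->ramdisk_image |
	       ((uint64_t)bp->ext_ramdisk_image << 32);
}
```

With sentinel == 0 the high half is honoured and the ramdisk can sit
above 4G; with a nonzero sentinel the high half is cleared and only the
legacy 32-bit address is used.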

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
---
 Documentation/x86/boot.txt            |   15 ++++++++++++++-
 Documentation/x86/zero-page.txt       |    4 ++++
 arch/x86/boot/compressed/cmdline.c    |    2 ++
 arch/x86/boot/compressed/misc.c       |   12 ++++++++++++
 arch/x86/boot/header.S                |   12 ++++++++++--
 arch/x86/boot/setup.ld                |    7 +++++++
 arch/x86/include/uapi/asm/bootparam.h |   13 ++++++++++---
 arch/x86/kernel/head64.c              |    2 ++
 arch/x86/kernel/setup.c               |    4 ++++
 9 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 406d82d..18ca9fb 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -57,6 +57,9 @@ Protocol 2.10:	(Kernel 2.6.31) Added a protocol for relaxed alignment
 Protocol 2.11:	(Kernel 3.6) Added a field for offset of EFI handover
 		protocol entry point.
 
+Protocol 2.12:	(Kernel 3.9) Added three fields for loading bzImage and
+		 ramdisk above 4G with 64bit in bootparam.
+
 **** MEMORY LAYOUT
 
 The traditional memory map for the kernel loader, used for Image or
@@ -182,7 +185,7 @@ Offset	Proto	Name		Meaning
 0230/4	2.05+	kernel_alignment Physical addr alignment required for kernel
 0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
 0235/1	2.10+	min_alignment	Minimum alignment, as a power of two
-0236/2	N/A	pad3		Unused
+0236/2	2.12+	xloadflags	Boot protocol option flags
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
 023C/4	2.07+	hardware_subarch Hardware subarchitecture
 0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
@@ -582,6 +585,16 @@ Protocol:	2.10+
   misaligned kernel.  Therefore, a loader should typically try each
   power-of-two alignment from kernel_alignment down to this alignment.
 
+Field name:     xloadflags
+Type:           modify (obligatory)
+Offset/size:    0x236/2
+Protocol:       2.12+
+
+  This field is a bitmask.
+
+  Bit 0 (read): CAN_BE_LOADED_ABOVE_4G
+        - If 1, kernel/boot_params/cmdline/ramdisk can be above 4g,
+
 Field name:	cmdline_size
 Type:		read
 Offset/size:	0x238/4
diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
index cf5437d..1140e59 100644
--- a/Documentation/x86/zero-page.txt
+++ b/Documentation/x86/zero-page.txt
@@ -19,6 +19,9 @@ Offset	Proto	Name		Meaning
 090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
 0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table)
 0B0/010	ALL	olpc_ofw_header	OLPC's OpenFirmware CIF and friends
+0C0/004	ALL	ext_ramdisk_image ramdisk_image high 32bits
+0C4/004	ALL	ext_ramdisk_size  ramdisk_size high 32bits
+0C8/004	ALL	ext_cmd_line_ptr  cmd_line_ptr high 32bits
 140/080	ALL	edid_info	Video mode setup (struct edid_info)
 1C0/020	ALL	efi_info	EFI 32 information (struct efi_info)
 1E0/004	ALL	alk_mem_k	Alternative mem check, in KB
@@ -27,6 +30,7 @@ Offset	Proto	Name		Meaning
 1E9/001	ALL	eddbuf_entries	Number of entries in eddbuf (below)
 1EA/001	ALL	edd_mbr_sig_buf_entries	Number of entries in edd_mbr_sig_buffer
 				(below)
+1EF/001	ALL	sentinel	0: states _ext_* fields are valid
 290/040	ALL	edd_mbr_sig_buffer EDD MBR signatures
 2D0/A00	ALL	e820_map	E820 memory map table
 				(array of struct e820entry)
diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index b4c913c..bffd73b 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -17,6 +17,8 @@ static unsigned long get_cmd_line_ptr(void)
 {
 	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
 
+	cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
+
 	return cmd_line_ptr;
 }
 int cmdline_find_option(const char *option, char *buffer, int bufsize)
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 88f7ff6..f714576 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -318,6 +318,16 @@ static void parse_elf(void *output)
 	free(phdrs);
 }
 
+static void sanitize_real_mode(struct boot_params *real_mode)
+{
+	if (real_mode->sentinel) {
+		/* ext_* fields in boot_params are not valid, clear them */
+		real_mode->ext_ramdisk_image = 0;
+		real_mode->ext_ramdisk_size  = 0;
+		real_mode->ext_cmd_line_ptr  = 0;
+	}
+}
+
 asmlinkage void decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
@@ -325,6 +335,8 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
 {
 	real_mode = rmode;
 
+	sanitize_real_mode(real_mode);
+
 	if (real_mode->screen_info.orig_video_mode == 7) {
 		vidmem = (char *) 0xb0000;
 		vidport = 0x3b4;
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 8c132a6..0d5790f 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -279,7 +279,7 @@ _start:
 	# Part 2 of the header, from the old setup.S
 
 		.ascii	"HdrS"		# header signature
-		.word	0x020b		# header version number (>= 0x0105)
+		.word	0x020c		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 		.globl realmode_swtch
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
@@ -369,7 +369,15 @@ relocatable_kernel:    .byte 1
 relocatable_kernel:    .byte 0
 #endif
 min_alignment:		.byte MIN_KERNEL_ALIGN_LG2	# minimum alignment
-pad3:			.word 0
+
+xloadflags:
+CAN_BE_LOADED_ABOVE_4G	= 1		# If set, the kernel/boot_param/
+					# ramdisk could be loaded above 4g
+#if defined(CONFIG_X86_64) && defined(CONFIG_RELOCATABLE)
+			.word CAN_BE_LOADED_ABOVE_4G
+#else
+			.word 0
+#endif
 
 cmdline_size:   .long   COMMAND_LINE_SIZE-1     #length of the command line,
                                                 #added with boot protocol
diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 03c0683..9333d37 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -13,6 +13,13 @@ SECTIONS
 	.bstext		: { *(.bstext) }
 	.bsdata		: { *(.bsdata) }
 
+	/* sentinel: make sure if boot_params from bootloader is right */
+	. = 495;
+	.sentinel	: {
+		sentinel = .;
+		BYTE(0xff);
+	}
+
 	. = 497;
 	.header		: { *(.header) }
 	.entrytext	: { *(.entrytext) }
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 92862cd..3d8ed8f 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -58,7 +58,9 @@ struct setup_header {
 	__u32	initrd_addr_max;
 	__u32	kernel_alignment;
 	__u8	relocatable_kernel;
-	__u8	_pad2[3];
+	__u8	min_alignment;
+	__u16	xloadflags;
+#define CAN_BE_LOADED_ABOVE_4G	(1<<0)
 	__u32	cmdline_size;
 	__u32	hardware_subarch;
 	__u64	hardware_subarch_data;
@@ -106,7 +108,10 @@ struct boot_params {
 	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
 	struct sys_desc_table sys_desc_table;		/* 0x0a0 */
 	struct olpc_ofw_header olpc_ofw_header;		/* 0x0b0 */
-	__u8  _pad4[128];				/* 0x0c0 */
+	__u32 ext_ramdisk_image;			/* 0x0c0 */
+	__u32 ext_ramdisk_size;				/* 0x0c4 */
+	__u32 ext_cmd_line_ptr;				/* 0x0c8 */
+	__u8  _pad4[116];				/* 0x0cc */
 	struct edid_info edid_info;			/* 0x140 */
 	struct efi_info efi_info;			/* 0x1c0 */
 	__u32 alt_mem_k;				/* 0x1e0 */
@@ -115,7 +120,9 @@ struct boot_params {
 	__u8  eddbuf_entries;				/* 0x1e9 */
 	__u8  edd_mbr_sig_buf_entries;			/* 0x1ea */
 	__u8  kbd_status;				/* 0x1eb */
-	__u8  _pad6[5];					/* 0x1ec */
+	__u8  _pad5[3];					/* 0x1ec */
+	__u8  sentinel;					/* 0x1ef */
+	__u8  _pad6[1];					/* 0x1f0 */
 	struct setup_header hdr;    /* setup header */	/* 0x1f1 */
 	__u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
 	__u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];	/* 0x290 */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 316e7b2..e63d29a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -115,6 +115,8 @@ static unsigned long get_cmd_line_ptr(void)
 {
 	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
+	cmd_line_ptr |= (u64)boot_params.ext_cmd_line_ptr << 32;
+
 	return cmd_line_ptr;
 }
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 644a123..2509efa 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -298,12 +298,16 @@ static u64 __init get_ramdisk_image(void)
 {
 	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
 
+	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
 	return ramdisk_image;
 }
 static u64 __init get_ramdisk_size(void)
 {
 	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
 
+	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
 	return ramdisk_size;
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (21 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-14 11:20   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data Yinghai Lu
                   ` (10 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Rob Landley,
	Matt Fleming

The 64bit entry is now fixed at 0x200 and cannot be changed anymore.

Update the comments to reflect that.

Also document it in boot.txt.
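
As a rough illustration of the two fixed offsets that the boot.txt text in
this patch relies on, here is a minimal user-space sketch. The helper names
entry64_address() and setup_header_end() are made up for this example and
are not part of the kernel or of any boot loader.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the 64-bit entry point from the start of the loaded kernel. */
#define ENTRY64_OFFSET 0x200u

/* Hypothetical helper: address a 64-bit boot loader would jump to. */
static uint64_t entry64_address(uint64_t load_addr)
{
	return load_addr + ENTRY64_OFFSET;
}

/*
 * Hypothetical helper: end of the setup header inside the kernel image,
 * computed as 0x0202 plus the byte value stored at offset 0x0201.
 */
static size_t setup_header_end(const uint8_t *image)
{
	return 0x0202u + image[0x0201];
}
```

The load address may be above 4G here, which is why the sketch uses a
64-bit type for it.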

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
---
 Documentation/x86/boot.txt         |   38 ++++++++++++++++++++++++++++++++++++
 arch/x86/boot/compressed/head_64.S |   22 ++++++++++++---------
 2 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 18ca9fb..24cc542 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -1042,6 +1042,44 @@ must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
 must be __BOOT_DS; interrupt must be disabled; %esi must hold the base
 address of the struct boot_params; %ebp, %edi and %ebx must be zero.
 
+**** 64-bit BOOT PROTOCOL
+
+For machines with 64bit cpus and a 64bit kernel, a 64bit bootloader
+can be used, and it needs a 64-bit boot protocol.
+
+In 64-bit boot protocol, the first step in loading a Linux kernel
+should be to setup the boot parameters (struct boot_params,
+traditionally known as "zero page"). The memory for struct boot_params
+should be allocated under or above 4G and initialized to all zero.
+Then the setup header from offset 0x01f1 of kernel image on should be
+loaded into struct boot_params and examined. The end of setup header
+can be calculated as follow:
+
+	0x0202 + byte value at offset 0x0201
+
+In addition to read/modify/write the setup header of the struct
+boot_params as that of 16-bit boot protocol, the boot loader should
+also fill the additional fields of the struct boot_params as that
+described in zero-page.txt.
+
+After setting up the struct boot_params, the boot loader can load the
+64-bit kernel in the same way as that of 16-bit boot protocol, but
+kernel could be above 4G.
+
+In 64-bit boot protocol, the kernel is started by jumping to the
+64-bit kernel entry point, which is the start address of loaded
+64-bit kernel plus 0x200.
+
+At entry, the CPU must be in 64-bit mode with paging enabled.
+The range with setup_header.init_size from start address of loaded
+kernel and zero page and command line buffer get ident mapping;
+a GDT must be loaded with the descriptors for selectors
+__BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat
+segment; __BOOT_CS must have execute/read permission, and __BOOT_DS
+must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
+must be __BOOT_DS; interrupt must be disabled; %rsi must hold the base
+address of the struct boot_params.
+
 **** EFI HANDOVER PROTOCOL
 
 This protocol allows boot loaders to defer initialisation to the EFI
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 5c80b94..aaafd4e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -37,6 +37,12 @@
 	__HEAD
 	.code32
 ENTRY(startup_32)
+	/*
+	 * 32bit entry is 0, could not be changed!
+	 * If we come here directly from a bootloader,
+	 * kernel(text+data+bss+brk) ramdisk, zero_page, command line
+	 * all need to be under 4G limit.
+	 */
 	cld
 	/*
 	 * Test KEEP_SEGMENTS flag to see if the bootloader is asking
@@ -182,20 +188,18 @@ ENTRY(startup_32)
 	lret
 ENDPROC(startup_32)
 
-	/*
-	 * Be careful here startup_64 needs to be at a predictable
-	 * address so I can export it in an ELF header.  Bootloaders
-	 * should look at the ELF header to find this address, as
-	 * it may change in the future.
-	 */
 	.code64
 	.org 0x200
 ENTRY(startup_64)
 	/*
+	 * 64bit entry is 0x200, could not be changed!
 	 * We come here either from startup_32 or directly from a
-	 * 64bit bootloader.  If we come here from a bootloader we depend on
-	 * an identity mapped page table being provied that maps our
-	 * entire text+data+bss and hopefully all of memory.
+	 * 64bit bootloader.
+	 * If we come here from a bootloader, kernel(text+data+bss+brk),
+	 * ramdisk, zero_page, command line could be above 4G.
+	 * We depend on an identity mapped page table being provided
+	 * that maps our entire kernel(text+data+bss+brk), zero page
+	 * and command line.
 	 */
 #ifdef CONFIG_EFI_STUB
 	/*
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (22 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-14 11:26   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 25/31] memblock: add memblock_mem_size() Yinghai Lu
                   ` (9 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

The version check is only there for the bootloader's sake, and it is
not needed.

setup_data is in setup_header, and every bootloader copies setup_header
from the bzImage. So an old bootloader should leave setup_data as 0.

For ELF images, kexec has set setup_data to 0 until now.
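
To show why a zero setup_data makes the version check redundant, here is a
minimal user-space sketch of the list walk. The struct only mirrors the
header part of struct setup_data, and the mem/mem_size pair standing in for
physical memory is an assumption of this example, as is the name
count_setup_data().

```c
#include <stdint.h>
#include <string.h>

/* Header part of a setup_data node (payload omitted in this sketch). */
struct setup_data_hdr {
	uint64_t next;	/* "physical address" of the next node, 0 ends the list */
	uint32_t type;
	uint32_t len;
};

/*
 * Hypothetical helper: count the nodes reachable from pa_data.
 * mem/mem_size stand in for physical memory; addresses are offsets into mem.
 */
static unsigned int count_setup_data(const uint8_t *mem, size_t mem_size,
				     uint64_t pa_data)
{
	unsigned int n = 0;

	while (pa_data) {
		struct setup_data_hdr hdr;

		/* stop if the node header would not fit in the buffer */
		if (mem_size < sizeof(hdr) || pa_data > mem_size - sizeof(hdr))
			break;
		memcpy(&hdr, mem + pa_data, sizeof(hdr));
		n++;
		pa_data = hdr.next;
	}

	return n;
}
```

With pa_data == 0 the loop body never runs, so a bootloader that leaves
the field at 0 gets the same behaviour as the old early return.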

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c |    6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2509efa..15ce495 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -439,8 +439,6 @@ static void __init parse_setup_data(void)
 	struct setup_data *data;
 	u64 pa_data;
 
-	if (boot_params.hdr.version < 0x0209)
-		return;
 	pa_data = boot_params.hdr.setup_data;
 	while (pa_data) {
 		u32 data_len, map_len;
@@ -476,8 +474,6 @@ static void __init e820_reserve_setup_data(void)
 	u64 pa_data;
 	int found = 0;
 
-	if (boot_params.hdr.version < 0x0209)
-		return;
 	pa_data = boot_params.hdr.setup_data;
 	while (pa_data) {
 		data = early_memremap(pa_data, sizeof(*data));
@@ -501,8 +497,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 	struct setup_data *data;
 	u64 pa_data;
 
-	if (boot_params.hdr.version < 0x0209)
-		return;
 	pa_data = boot_params.hdr.setup_data;
 	while (pa_data) {
 		data = early_memremap(pa_data, sizeof(*data));
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* [PATCH v7u1 25/31] memblock: add memblock_mem_size()
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (23 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-14 20:42   ` H. Peter Anvin
  2013-01-04  0:48 ` [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it Yinghai Lu
                   ` (8 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

Use it to get the mem size under limit_pfn.
It replaces the local version in x86 reserve_initrd().
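
The clamping that the helper does can be sketched in user space as below.
The pfn_range array stands in for the memblock memory regions, and
SKETCH_PAGE_SHIFT, min_ul() and mem_size_below() are names invented for
this example only.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4K pages */

/* One memory range in page frame numbers; end_pfn is exclusive. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Hypothetical helper: bytes of memory that lie below limit_pfn. */
static uint64_t mem_size_below(const struct pfn_range *r, size_t n,
			       unsigned long limit_pfn)
{
	uint64_t pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* clamp both ends, so ranges above the limit add nothing */
		unsigned long start = min_ul(r[i].start_pfn, limit_pfn);
		unsigned long end = min_ul(r[i].end_pfn, limit_pfn);

		pages += end - start;
	}

	return pages << SKETCH_PAGE_SHIFT;
}
```

A range that straddles the limit only contributes its part below
limit_pfn, and a range entirely above it clamps to zero pages.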

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c  |   16 +---------------
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   17 +++++++++++++++++
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 15ce495..c58497e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -363,20 +363,6 @@ static void __init relocate_initrd(void)
 		ramdisk_here, ramdisk_here + ramdisk_size - 1);
 }
 
-static u64 __init get_mem_size(unsigned long limit_pfn)
-{
-	int i;
-	u64 mapped_pages = 0;
-	unsigned long start_pfn, end_pfn;
-
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
-		start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
-		end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
-		mapped_pages += end_pfn - start_pfn;
-	}
-
-	return mapped_pages << PAGE_SHIFT;
-}
 static void __init early_reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
@@ -404,7 +390,7 @@ static void __init reserve_initrd(void)
 
 	initrd_start = 0;
 
-	mapped_size = get_mem_size(max_pfn_mapped);
+	mapped_size = (u64)memblock_mem_size(max_pfn_mapped);
 	if (ramdisk_size >= (mapped_size>>1))
 		panic("initrd too large to handle, "
 		       "disabling initrd (%lld needed, %lld available)\n",
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index d452ee1..f388203 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -155,6 +155,7 @@ phys_addr_t memblock_alloc_base(phys_addr_t size, phys_addr_t align,
 phys_addr_t __memblock_alloc_base(phys_addr_t size, phys_addr_t align,
 				  phys_addr_t max_addr);
 phys_addr_t memblock_phys_mem_size(void);
+phys_addr_t memblock_mem_size(unsigned long limit_pfn);
 phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
diff --git a/mm/memblock.c b/mm/memblock.c
index 6259055..4b3b8d2 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -827,6 +827,23 @@ phys_addr_t __init memblock_phys_mem_size(void)
 	return memblock.memory.total_size;
 }
 
+phys_addr_t __init memblock_mem_size(unsigned long limit_pfn)
+{
+	unsigned long pages = 0;
+	struct memblock_region *r;
+	unsigned long start_pfn, end_pfn;
+
+	for_each_memblock(memory, r) {
+		start_pfn = memblock_region_memory_base_pfn(r);
+		end_pfn = memblock_region_memory_end_pfn(r);
+		start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
+		end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
+		pages += end_pfn - start_pfn;
+	}
+
+	return (phys_addr_t)pages << PAGE_SHIFT;
+}
+
 /* lowest address */
 phys_addr_t __init_memblock memblock_start_of_DRAM(void)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (24 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 25/31] memblock: add memblock_mem_size() Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04 16:05   ` Konrad Rzeszutek Wilk
  2013-01-04 17:50   ` Shuah Khan
  2013-01-04  0:48 ` [PATCH v7u1 27/31] x86, kdump: remove crashkernel range find limit for 64bit Yinghai Lu
                   ` (7 subsequent siblings)
  33 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Konrad Rzeszutek Wilk,
	Joerg Roedel

Normal boot path on a system with iommu support:
the swiotlb buffer is allocated early at first, and then we try to
initialize the iommu. If the intel or amd iommu can be set up properly,
the swiotlb buffer is freed.

The early allocation is done with bootmem, and it panics when we try to
use kdump with a buffer that is only above 4G while swiotlb is enabled.

The kernel can actually go on without swiotlb and use the intel iommu.

So disable swiotlb if there is not enough ram for it.

This lets kdump use a kernel above 4G.
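
The decision can be sketched as a pure function, roughly as below. The
low_mem_bytes parameter replaces the memblock query and
swiotlb_detect_override() is a stand-in name, so treat this as an
illustration of the logic and not as the kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* More than 1M of RAM under 4G is treated as enough for swiotlb. */
static bool enough_mem_for_swiotlb(uint64_t low_mem_bytes)
{
	return low_mem_bytes > (1ULL << 20);
}

/*
 * Hypothetical stand-in for the override detection:
 * forcing always wins, otherwise swiotlb is dropped when low RAM is short.
 */
static int swiotlb_detect_override(int swiotlb, int swiotlb_force,
				   uint64_t low_mem_bytes)
{
	if (swiotlb_force)
		return 1;
	if (!enough_mem_for_swiotlb(low_mem_bytes))
		return 0;

	return swiotlb;
}
```

Note that swiotlb_force still wins even when there is almost no RAM
under 4G, which matches the order of the checks in the patch.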

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
---
 arch/x86/kernel/pci-swiotlb.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 6c483ba..949ebfe 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -6,6 +6,7 @@
 #include <linux/swiotlb.h>
 #include <linux/bootmem.h>
 #include <linux/dma-mapping.h>
+#include <linux/memblock.h>
 
 #include <asm/iommu.h>
 #include <asm/swiotlb.h>
@@ -50,6 +51,11 @@ static struct dma_map_ops swiotlb_dma_ops = {
 	.dma_supported = NULL,
 };
 
+static bool __init enough_mem_for_swiotlb(void)
+{
+	/* do we have less than 1M RAM under 4G ? */
+	return memblock_mem_size(1ULL<<(32-PAGE_SHIFT)) > (1ULL<<20);
+}
 /*
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
  *
@@ -58,12 +64,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
  */
 int __init pci_swiotlb_detect_override(void)
 {
-	int use_swiotlb = swiotlb | swiotlb_force;
-
 	if (swiotlb_force)
 		swiotlb = 1;
+	else if (!enough_mem_for_swiotlb())
+		swiotlb = 0;
 
-	return use_swiotlb;
+	return swiotlb;
 }
 IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
 		  pci_xen_swiotlb_detect,
@@ -78,7 +84,7 @@ int __init pci_swiotlb_detect_4gb(void)
 {
 	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
 #ifdef CONFIG_X86_64
-	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
+	if (!no_iommu && max_pfn > MAX_DMA32_PFN && enough_mem_for_swiotlb())
 		swiotlb = 1;
 #endif
 	return swiotlb;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* [PATCH v7u1 27/31] x86, kdump: remove crashkernel range find limit for 64bit
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (25 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-14 15:43   ` Borislav Petkov
  2013-01-04  0:48 ` [PATCH v7u1 28/31] x86: add Crash kernel low reservation Yinghai Lu
                   ` (6 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

Now a kexec'ed kernel/ramdisk can be above 4g, so remove the 896M limit
for 64bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c58497e..6adbc45 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -501,13 +501,11 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 /*
  * Keep the crash kernel below this limit.  On 32 bits earlier kernels
  * would limit the kernel to the low 512 MiB due to mapping restrictions.
- * On 64 bits, kexec-tools currently limits us to 896 MiB; increase this
- * limit once kexec-tools are fixed.
  */
 #ifdef CONFIG_X86_32
 # define CRASH_KERNEL_ADDR_MAX	(512 << 20)
 #else
-# define CRASH_KERNEL_ADDR_MAX	(896 << 20)
+# define CRASH_KERNEL_ADDR_MAX	MAXMEM
 #endif
 
 static void __init reserve_crashkernel(void)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* [PATCH v7u1 28/31] x86: add Crash kernel low reservation
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (26 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 27/31] x86, kdump: remove crashkernel range find limit for 64bit Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 29/31] x86: Merge early kernel reserve for 32bit and 64bit Yinghai Lu
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Rob Landley

During the kdump kernel's booting stage, it needs to find low ram for
the swiotlb buffer when the system does not support intel iommu/dmar
remapping.

kexec-tools appends memmap=exactmap and the range from /proc/iomem
marked "Crash kernel", and that range is above 4G for 64bit after boot
protocol 2.12.

We need to add another range in /proc/iomem, "Crash kernel low",
so kexec-tools can find that info and append it to the kdump kernel
command line.

Try to reserve some memory under 4G if the normal "Crash kernel" range
is above 4G.

The user can specify the size with crashkernel_low=XX[KMG].
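
How such an option is picked up from the command line can be sketched in
user space as below. Like the kernel parser, it uses the last occurrence of
the option, but parse_size_option() is a simplified stand-in written for
this example and only handles the plain size[KMG] form.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find the last "name" option in cmdline and parse
 * its size[KMG] value into *size. Returns 0 on success, -1 otherwise.
 */
static int parse_size_option(const char *cmdline, const char *name,
			     unsigned long long *size)
{
	const char *p = strstr(cmdline, name);
	const char *last = NULL;
	unsigned long long val;
	char *end;

	/* use the last occurrence if there is more than one */
	while (p) {
		last = p;
		p = strstr(p + 1, name);
	}
	if (!last)
		return -1;

	last += strlen(name);
	val = strtoull(last, &end, 0);
	if (end == last)
		return -1;

	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		break;
	default:
		break;
	}

	*size = val;
	return 0;
}
```

Because the option names end in "=", searching for "crashkernel=" does not
match "crashkernel_low=", so the two options can share one parser.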

-v2: fix warning that is found by Fengguang's test robot.
-v3: move out get_mem_size change to another patch, to solve compiling
     warning that is found by Borislav Petkov <bp@alien8.de>
-v4: user must specify crashkernel_low if system does not support
     intel or amd iommu.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Rob Landley <rob@landley.net>
---
 Documentation/kernel-parameters.txt |    3 +++
 arch/x86/kernel/setup.c             |   42 +++++++++++++++++++++++++++++++++--
 include/linux/kexec.h               |    3 +++
 kernel/kexec.c                      |   34 +++++++++++++++++++++++-----
 4 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 363e348..da0e077 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -594,6 +594,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			is selected automatically. Check
 			Documentation/kdump/kdump.txt for further details.
 
+	crashkernel_low=size[KMG]
+			[KNL, x86] parts under 4G.
+
 	crashkernel=range1:size1[,range2:size2,...][@offset]
 			[KNL] Same as above, but depends on the memory
 			in the running system. The syntax of range is
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6adbc45..2203dd6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -508,8 +508,44 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_KERNEL_ADDR_MAX	MAXMEM
 #endif
 
+static void __init reserve_crashkernel_low(void)
+{
+#ifdef CONFIG_X86_64
+	const unsigned long long alignment = 16<<20;	/* 16M */
+	unsigned long long low_base = 0, low_size = 0;
+	unsigned long total_low_mem;
+	unsigned long long base;
+	int ret;
+
+	total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
+	ret = parse_crashkernel_low(boot_command_line, total_low_mem,
+						&low_size, &base);
+	if (ret != 0 || low_size <= 0)
+		return;
+
+	low_base = memblock_find_in_range(low_size, (1ULL<<32),
+					low_size, alignment);
+
+	if (!low_base) {
+		pr_info("crashkernel low reservation failed - No suitable area found.\n");
+
+		return;
+	}
+
+	memblock_reserve(low_base, low_size);
+	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
+			(unsigned long)(low_size >> 20),
+			(unsigned long)(low_base >> 20),
+			(unsigned long)(total_low_mem >> 20));
+	crashk_low_res.start = low_base;
+	crashk_low_res.end   = low_base + low_size - 1;
+	insert_resource(&iomem_resource, &crashk_low_res);
+#endif
+}
+
 static void __init reserve_crashkernel(void)
 {
+	const unsigned long long alignment = 16<<20;	/* 16M */
 	unsigned long long total_mem;
 	unsigned long long crash_size, crash_base;
 	int ret;
@@ -523,8 +559,6 @@ static void __init reserve_crashkernel(void)
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
-		const unsigned long long alignment = 16<<20;	/* 16M */
-
 		/*
 		 *  kexec want bzImage is below CRASH_KERNEL_ADDR_MAX
 		 */
@@ -535,6 +569,7 @@ static void __init reserve_crashkernel(void)
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
+
 	} else {
 		unsigned long long start;
 
@@ -556,6 +591,9 @@ static void __init reserve_crashkernel(void)
 	crashk_res.start = crash_base;
 	crashk_res.end   = crash_base + crash_size - 1;
 	insert_resource(&iomem_resource, &crashk_res);
+
+	if (crash_base >= (1ULL<<32))
+		reserve_crashkernel_low();
 }
 #else
 static void __init reserve_crashkernel(void)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d0b8458..d2e6927 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -191,6 +191,7 @@ extern struct kimage *kexec_crash_image;
 /* Location of a reserved region to hold the crash kernel.
  */
 extern struct resource crashk_res;
+extern struct resource crashk_low_res;
 typedef u32 note_buf_t[KEXEC_NOTE_BYTES/4];
 extern note_buf_t __percpu *crash_notes;
 extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
@@ -199,6 +200,8 @@ extern size_t vmcoreinfo_max_size;
 
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
+int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
+		unsigned long long *crash_size, unsigned long long *crash_base);
 int crash_shrink_memory(unsigned long new_size);
 size_t crash_get_memory_size(void);
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..2436ffc 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -54,6 +54,12 @@ struct resource crashk_res = {
 	.end   = 0,
 	.flags = IORESOURCE_BUSY | IORESOURCE_MEM
 };
+struct resource crashk_low_res = {
+	.name  = "Crash kernel low",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_MEM
+};
 
 int kexec_should_crash(struct task_struct *p)
 {
@@ -1369,10 +1375,11 @@ static int __init parse_crashkernel_simple(char 		*cmdline,
  * That function is the entry point for command line parsing and should be
  * called from the arch-specific code.
  */
-int __init parse_crashkernel(char 		 *cmdline,
+static int __init __parse_crashkernel(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
-			     unsigned long long *crash_base)
+			     unsigned long long *crash_base,
+				const char *name)
 {
 	char 	*p = cmdline, *ck_cmdline = NULL;
 	char	*first_colon, *first_space;
@@ -1382,16 +1389,16 @@ int __init parse_crashkernel(char 		 *cmdline,
 	*crash_base = 0;
 
 	/* find crashkernel and use the last one if there are more */
-	p = strstr(p, "crashkernel=");
+	p = strstr(p, name);
 	while (p) {
 		ck_cmdline = p;
-		p = strstr(p+1, "crashkernel=");
+		p = strstr(p+1, name);
 	}
 
 	if (!ck_cmdline)
 		return -EINVAL;
 
-	ck_cmdline += 12; /* strlen("crashkernel=") */
+	ck_cmdline += strlen(name);
 
 	/*
 	 * if the commandline contains a ':', then that's the extended
@@ -1409,6 +1416,23 @@ int __init parse_crashkernel(char 		 *cmdline,
 	return 0;
 }
 
+int __init parse_crashkernel(char *cmdline,
+			     unsigned long long system_ram,
+			     unsigned long long *crash_size,
+			     unsigned long long *crash_base)
+{
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+					"crashkernel=");
+}
+
+int __init parse_crashkernel_low(char *cmdline,
+			     unsigned long long system_ram,
+			     unsigned long long *crash_size,
+			     unsigned long long *crash_base)
+{
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+					"crashkernel_low=");
+}
 
 static void update_vmcoreinfo_note(void)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* [PATCH v7u1 29/31] x86: Merge early kernel reserve for 32bit and 64bit
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (27 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 28/31] x86: add Crash kernel low reservation Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 30/31] x86, 64bit, mm: Mark data/bss/brk to nx Yinghai Lu
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Alexander Duyck

They are the same, so we can move them out of head32/64.c into setup.c.

We are using memblock, and it handles overlapping properly, so we don't
need to reserve the range first to hold the location. We just need to
make sure it is reserved before we use memblock to find free mem to
use.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
---
 arch/x86/kernel/head32.c |    9 ---------
 arch/x86/kernel/head64.c |    9 ---------
 arch/x86/kernel/setup.c  |    9 +++++++++
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index b071d41..17f7792 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -30,9 +30,6 @@ static void __init i386_default_early_setup(void)
 
 void __init i386_start_kernel(void)
 {
-	memblock_reserve(__pa_symbol(_text),
-			 (unsigned long)__bss_stop - (unsigned long)_text);
-
 	/* Call the subarch specific early setup function */
 	switch (boot_params.hdr.hardware_subarch) {
 	case X86_SUBARCH_MRST:
@@ -46,11 +43,5 @@ void __init i386_start_kernel(void)
 		break;
 	}
 
-	/*
-	 * At this point everything still needed from the boot loader
-	 * or BIOS or kernel text should be early reserved or marked not
-	 * RAM in e820. All other memory is free game.
-	 */
-
 	start_kernel();
 }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index e63d29a..d9d7c75 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -184,16 +184,7 @@ void __init x86_64_start_reservations(char *real_mode_data)
 	if (!boot_params.hdr.version)
 		copy_bootdata(__va(real_mode_data));
 
-	memblock_reserve(__pa_symbol(_text),
-			 (unsigned long)__bss_stop - (unsigned long)_text);
-
 	reserve_ebda_region();
 
-	/*
-	 * At this point everything still needed from the boot loader
-	 * or BIOS or kernel text should be early reserved or marked not
-	 * RAM in e820. All other memory is free game.
-	 */
-
 	start_kernel();
 }
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2203dd6..3117515 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -706,8 +706,17 @@ early_param("reservelow", parse_reservelow);
 
 void __init setup_arch(char **cmdline_p)
 {
+	memblock_reserve(__pa_symbol(_text),
+			 (unsigned long)__bss_stop - (unsigned long)_text);
+
 	early_reserve_initrd();
 
+	/*
+	 * At this point everything still needed from the boot loader
+	 * or BIOS or kernel text should be early reserved or marked not
+	 * RAM in e820. All other memory is free game.
+	 */
+
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* [PATCH v7u1 30/31] x86, 64bit, mm: Mark data/bss/brk to nx
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (28 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 29/31] x86: Merge early kernel reserve for 32bit and 64bit Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04  0:48 ` [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init Yinghai Lu
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu

HPA said we should not have RW and +x set at the same time.

for kernel layout:
[    0.000000] Kernel Layout:
[    0.000000]   .text: [0x01000000-0x021434f8]
[    0.000000] .rodata: [0x02200000-0x02a13fff]
[    0.000000]   .data: [0x02c00000-0x02dc763f]
[    0.000000]   .init: [0x02dc9000-0x0312cfff]
[    0.000000]    .bss: [0x0313b000-0x03dd6fff]
[    0.000000]    .brk: [0x03dd7000-0x03dfffff]

before the patch, we have
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82dc9000        1828K     RW             GLB x  pte
0xffffffff82dc9000-0xffffffff82e00000         220K     RW             GLB NX pte
0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
0xffffffff83000000-0xffffffff8313a000        1256K     RW             GLB NX pte
0xffffffff8313a000-0xffffffff83200000         792K     RW             GLB x  pte
0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB x  pmd
0xffffffff83e00000-0xffffffffa0000000         450M                           pmd

after the patch, we get
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82e00000           2M     RW             GLB NX pte
0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
0xffffffff83000000-0xffffffff83200000           2M     RW             GLB NX pte
0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB NX pmd
0xffffffff83e00000-0xffffffffa0000000         450M                           pmd

So data, bss and brk are now marked NX.
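
As a sanity check of the ranges above, here is a minimal user-space sketch
(not kernel code). nx_pages() is a hypothetical helper that mirrors the
length computation passed to set_memory_nx(); the addresses are read off
the two mapping dumps, and 4K pages are assumed.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4K base pages */

/*
 * Hypothetical helper: number of 4K pages in [start, end), i.e. the
 * same arithmetic as the second argument of set_memory_nx().
 */
static uint64_t nx_pages(uint64_t start, uint64_t end)
{
	return (end - start) >> PAGE_SHIFT;
}
```

With the old end (0xffffffff82c00000) this covers 2560 pages (10M, rodata
only); with the new end (0xffffffff83e00000) it covers 7168 pages (28M,
rodata through brk), which matches the sum of the NX rows in the second
dump.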

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_64.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 98385a2..9653411 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -820,6 +820,7 @@ void mark_rodata_ro(void)
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
+	unsigned long all_end = PFN_ALIGN(&_end);
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
@@ -828,10 +829,10 @@ void mark_rodata_ro(void)
 	kernel_set_to_readonly = 1;
 
 	/*
-	 * The rodata section (but not the kernel text!) should also be
-	 * not-executable.
+	 * The rodata/data/bss/brk section (but not the kernel text!)
+	 * should also be not-executable.
 	 */
-	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
 
 	rodata_test();
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (29 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 30/31] x86, 64bit, mm: Mark data/bss/brk to nx Yinghai Lu
@ 2013-01-04  0:48 ` Yinghai Lu
  2013-01-04 11:43   ` Rafael J. Wysocki
  2013-01-04  7:09 ` [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Borislav Petkov
                   ` (2 subsequent siblings)
  33 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04  0:48 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Yinghai Lu, Pavel Machek,
	Rafael J. Wysocki, linux-pm

Make it map only the ranges in the pfn_mapped array.

The temporary mapping is set up as a kernel mapping with EXEC.
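
The shape of the new loop can be modelled in user space as below. This is
only a sketch: struct pfn_range, map_pfn_ranges() and count_bytes() are
hypothetical stand-ins (the callback plays the role of
kernel_ident_mapping_init()), and 4K pages are assumed.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4K base pages */

/* Hypothetical stand-in for one pfn_mapped[] entry; end is exclusive. */
struct pfn_range {
	uint64_t start;
	uint64_t end;
};

/* Callback standing in for the real mapping routine. */
typedef int (*map_fn)(uint64_t mstart, uint64_t mend, void *ctx);

/*
 * Walk the recorded ranges, convert pfns to byte addresses and hand
 * each range to the mapping callback. Stops at the first error.
 */
static int map_pfn_ranges(const struct pfn_range *r, int nr,
			  map_fn map, void *ctx)
{
	for (int i = 0; i < nr; i++) {
		int result = map(r[i].start << PAGE_SHIFT,
				 r[i].end << PAGE_SHIFT, ctx);

		if (result)
			return result;
	}

	return 0;
}

/* Example callback: just adds up the bytes it is asked to map. */
static int count_bytes(uint64_t mstart, uint64_t mend, void *ctx)
{
	*(uint64_t *)ctx += mend - mstart;
	return 0;
}
```

Unlike a single walk from pfn 0 to max_pfn, holes between the recorded
ranges are never handed to the mapping routine.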

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-pm@vger.kernel.org
---
 arch/x86/power/hibernate_64.c |   66 ++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 44 deletions(-)

diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index 460f314..a0fde91 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -11,6 +11,8 @@
 #include <linux/gfp.h>
 #include <linux/smp.h>
 #include <linux/suspend.h>
+
+#include <asm/init.h>
 #include <asm/proto.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -39,41 +41,21 @@ pgd_t *temp_level4_pgt;
 
 void *relocated_restore_code;
 
-static int res_phys_pud_init(pud_t *pud, unsigned long address, unsigned long end)
+static void *alloc_pgt_page(void *context)
 {
-	long i, j;
-
-	i = pud_index(address);
-	pud = pud + i;
-	for (; i < PTRS_PER_PUD; pud++, i++) {
-		unsigned long paddr;
-		pmd_t *pmd;
-
-		paddr = address + i*PUD_SIZE;
-		if (paddr >= end)
-			break;
-
-		pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
-		if (!pmd)
-			return -ENOMEM;
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-		for (j = 0; j < PTRS_PER_PMD; pmd++, j++, paddr += PMD_SIZE) {
-			unsigned long pe;
-
-			if (paddr >= end)
-				break;
-			pe = __PAGE_KERNEL_LARGE_EXEC | paddr;
-			pe &= __supported_pte_mask;
-			set_pmd(pmd, __pmd(pe));
-		}
-	}
-	return 0;
+	return (void *)get_safe_page(GFP_ATOMIC);
 }
 
 static int set_up_temporary_mappings(void)
 {
-	unsigned long start, end, next;
-	int error;
+	struct x86_mapping_info info = {
+		.alloc_pgt_page	= alloc_pgt_page,
+		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
+		.kernel_mapping = true,
+	};
+	unsigned long mstart, mend;
+	int result;
+	int i;
 
 	temp_level4_pgt = (pgd_t *)get_safe_page(GFP_ATOMIC);
 	if (!temp_level4_pgt)
@@ -84,21 +66,17 @@ static int set_up_temporary_mappings(void)
 		init_level4_pgt[pgd_index(__START_KERNEL_map)]);
 
 	/* Set up the direct mapping from scratch */
-	start = (unsigned long)pfn_to_kaddr(0);
-	end = (unsigned long)pfn_to_kaddr(max_pfn);
-
-	for (; start < end; start = next) {
-		pud_t *pud = (pud_t *)get_safe_page(GFP_ATOMIC);
-		if (!pud)
-			return -ENOMEM;
-		next = start + PGDIR_SIZE;
-		if (next > end)
-			next = end;
-		if ((error = res_phys_pud_init(pud, __pa(start), __pa(next))))
-			return error;
-		set_pgd(temp_level4_pgt + pgd_index(start),
-			mk_kernel_pgd(__pa(pud)));
+	for (i = 0; i < nr_pfn_mapped; i++) {
+		mstart = pfn_mapped[i].start << PAGE_SHIFT;
+		mend   = pfn_mapped[i].end << PAGE_SHIFT;
+
+		result = kernel_ident_mapping_init(&info, temp_level4_pgt,
+						   mstart, mend);
+
+		if (result)
+			return result;
 	}
+
 	return 0;
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (30 preceding siblings ...)
  2013-01-04  0:48 ` [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init Yinghai Lu
@ 2013-01-04  7:09 ` Borislav Petkov
  2013-01-04 21:44   ` Yinghai Lu
  2013-01-14 20:45 ` H. Peter Anvin
  2013-01-15 12:19 ` Stefano Stabellini
  33 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-04  7:09 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel

On Thu, Jan 03, 2013 at 04:48:20PM -0800, Yinghai Lu wrote:

>         git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot
> 
> and it is on top of linus's tree 2013-01-03
> plus tip:x86/mm, tip:x86/mm2

This is causing a merge conflict in mm/nobootmem.c when merging
tip:x86/mm2 after having merged tip:x86/mm on top of -rc2+ (today's
Linus' tree). free_all_bootmem_node() has gained a
reset_node_lowmem_managed_pages() call, which was added in
9feedc9d831e18ae6d0d15aa562e5e46ba53647b.

Now, you have a patch in tip:x86/mm2 which kills that
free_all_bootmem_node() function but the commit above adds that
reset_node_lowmem_managed_pages() call to it.

A proper merge conflict resolution would need to be added to the pull
request which sends tip:x86/mm2 upstream and then you'd need to rebase
your stuff on top. Or something better which I'm not thinking of right
now...

Thanks.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking
  2013-01-04  0:48 ` [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking Yinghai Lu
@ 2013-01-04  7:17   ` Borislav Petkov
  2013-01-04 21:50     ` Yinghai Lu
  2013-01-15 12:27   ` Stefano Stabellini
  1 sibling, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-04  7:17 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:21PM -0800, Yinghai Lu wrote:
> During debugging loading kernel above 4G, found one page if is not used
> in BRK with early page allocation.
> 
> pgt_buf_top is address that can not be used, so should check if that new
> end is above that top, otherwise last page will not be used.
> 
> Fix that checking and also add print out for every allocation from BRK.

This commit message still bothers the hell out of me. Please, fix it up
to something more readable like the below, for example:

"pgt_buf_top is an address which cannot be used so we should check
whether the new 'end' is above it. Otherwise, the last BRK page remains
unused.

Fix that check and add a debug printout of every BRK allocation."
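
To make the off-by-one concrete, a minimal user-space sketch follows. It
assumes the old check used ">=" and the fixed one uses ">", which is a
reading of the commit message and not a copy of the patch. struct
brk_model and the two helpers are hypothetical; only the names
pgt_buf_end and pgt_buf_top come from the kernel.

```c
#include <stdbool.h>

/* Hypothetical model: pfns in [pgt_buf_end, pgt_buf_top) are free. */
struct brk_model {
	unsigned long pgt_buf_end;	/* next free pfn */
	unsigned long pgt_buf_top;	/* first pfn that must not be used */
};

/* Assumed old check: rejects a request that ends exactly at the top. */
static bool fits_old(const struct brk_model *b, unsigned long num)
{
	return !(b->pgt_buf_end + num >= b->pgt_buf_top);
}

/* Assumed fixed check: only rejects a request that goes past the top. */
static bool fits_new(const struct brk_model *b, unsigned long num)
{
	return !(b->pgt_buf_end + num > b->pgt_buf_top);
}
```

With one page left (pgt_buf_end == 10, pgt_buf_top == 11), the old form
refuses a one-page request while the new form grants it and still refuses
two pages.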

Thanks.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init
  2013-01-04  0:48 ` [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init Yinghai Lu
@ 2013-01-04 11:43   ` Rafael J. Wysocki
  2013-01-04 21:59     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Rafael J. Wysocki @ 2013-01-04 11:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Pavel Machek, linux-pm

On Thursday, January 03, 2013 04:48:51 PM Yinghai Lu wrote:
> Make it only map range in pfn_mapped array.

Can you please explain why that should be sufficient?

Have you tested it?

> and it has kernel mapping with EXEC.

That's because it needs to execute code from one of those pages and it
doesn't know in advance which one that's going to be.

Thanks,
Rafael


> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: linux-pm@vger.kernel.org
> ---
>  arch/x86/power/hibernate_64.c |   66 ++++++++++++++---------------------------
>  1 file changed, 22 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index 460f314..a0fde91 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -11,6 +11,8 @@
>  #include <linux/gfp.h>
>  #include <linux/smp.h>
>  #include <linux/suspend.h>
> +
> +#include <asm/init.h>
>  #include <asm/proto.h>
>  #include <asm/page.h>
>  #include <asm/pgtable.h>
> @@ -39,41 +41,21 @@ pgd_t *temp_level4_pgt;
>  
>  void *relocated_restore_code;
>  
> -static int res_phys_pud_init(pud_t *pud, unsigned long address, unsigned long end)
> +static void *alloc_pgt_page(void *context)
>  {
> -	long i, j;
> -
> -	i = pud_index(address);
> -	pud = pud + i;
> -	for (; i < PTRS_PER_PUD; pud++, i++) {
> -		unsigned long paddr;
> -		pmd_t *pmd;
> -
> -		paddr = address + i*PUD_SIZE;
> -		if (paddr >= end)
> -			break;
> -
> -		pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
> -		if (!pmd)
> -			return -ENOMEM;
> -		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
> -		for (j = 0; j < PTRS_PER_PMD; pmd++, j++, paddr += PMD_SIZE) {
> -			unsigned long pe;
> -
> -			if (paddr >= end)
> -				break;
> -			pe = __PAGE_KERNEL_LARGE_EXEC | paddr;
> -			pe &= __supported_pte_mask;
> -			set_pmd(pmd, __pmd(pe));
> -		}
> -	}
> -	return 0;
> +	return (void *)get_safe_page(GFP_ATOMIC);
>  }
>
>  static int set_up_temporary_mappings(void)
>  {
> -	unsigned long start, end, next;
> -	int error;
> +	struct x86_mapping_info info = {
> +		.alloc_pgt_page	= alloc_pgt_page,
> +		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
> +		.kernel_mapping = true,
> +	};
> +	unsigned long mstart, mend;
> +	int result;
> +	int i;
>  
>  	temp_level4_pgt = (pgd_t *)get_safe_page(GFP_ATOMIC);
>  	if (!temp_level4_pgt)
> @@ -84,21 +66,17 @@ static int set_up_temporary_mappings(void)
>  		init_level4_pgt[pgd_index(__START_KERNEL_map)]);
>  
>  	/* Set up the direct mapping from scratch */
> -	start = (unsigned long)pfn_to_kaddr(0);
> -	end = (unsigned long)pfn_to_kaddr(max_pfn);
> -
> -	for (; start < end; start = next) {
> -		pud_t *pud = (pud_t *)get_safe_page(GFP_ATOMIC);
> -		if (!pud)
> -			return -ENOMEM;
> -		next = start + PGDIR_SIZE;
> -		if (next > end)
> -			next = end;
> -		if ((error = res_phys_pud_init(pud, __pa(start), __pa(next))))
> -			return error;
> -		set_pgd(temp_level4_pgt + pgd_index(start),
> -			mk_kernel_pgd(__pa(pud)));
> +	for (i = 0; i < nr_pfn_mapped; i++) {
> +		mstart = pfn_mapped[i].start << PAGE_SHIFT;
> +		mend   = pfn_mapped[i].end << PAGE_SHIFT;
> +
> +		result = kernel_ident_mapping_init(&info, temp_level4_pgt,
> +						   mstart, mend);
> +
> +		if (result)
> +			return result;
>  	}
> +
>  	return 0;
>  }
>  
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04  0:48 ` [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it Yinghai Lu
@ 2013-01-04 16:05   ` Konrad Rzeszutek Wilk
  2013-01-04 19:57     ` Yinghai Lu
  2013-01-04 17:50   ` Shuah Khan
  1 sibling, 1 reply; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-04 16:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Joerg Roedel

On Thu, Jan 03, 2013 at 04:48:46PM -0800, Yinghai Lu wrote:
> Normal boot path on system with iommu support:
> swiotlb buffer will be allocated early at first and then try to initialize
> iommu, if iommu for intel or amd could setup properly, swiotlb buffer
> will be freed.
> 
> The early allocating is with bootmem, and get panic when we try to use
> kdump with buffer above 4G only if swiotlb is enabled.
> 
> because actually the kernel can go on without swiotlb, and use intel iommu.
> 
> Try disable swiotlb if there is not enough ram for it.
> 
> That is for kdump to use kernel above 4G.
> 
> Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> ---
>  arch/x86/kernel/pci-swiotlb.c |   14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index 6c483ba..949ebfe 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -6,6 +6,7 @@
>  #include <linux/swiotlb.h>
>  #include <linux/bootmem.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/memblock.h>
>  
>  #include <asm/iommu.h>
>  #include <asm/swiotlb.h>
> @@ -50,6 +51,11 @@ static struct dma_map_ops swiotlb_dma_ops = {
>  	.dma_supported = NULL,
>  };
>  
> +static bool __init enough_mem_for_swiotlb(void)
> +{
> +	/* do we have less than 1M RAM under 4G ? */

And why 1MB? The default size is 64MB.

> +	return memblock_mem_size(1ULL<<(32-PAGE_SHIFT)) > (1ULL<<20);
> +}
>  /*
>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>   *
> @@ -58,12 +64,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
>   */
>  int __init pci_swiotlb_detect_override(void)
>  {
> -	int use_swiotlb = swiotlb | swiotlb_force;
> -
>  	if (swiotlb_force)
>  		swiotlb = 1;
> +	else if (!enough_mem_for_swiotlb())
> +		swiotlb = 0;
>  
> -	return use_swiotlb;
> +	return swiotlb;
>  }
>  IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
>  		  pci_xen_swiotlb_detect,
> @@ -78,7 +84,7 @@ int __init pci_swiotlb_detect_4gb(void)
>  {
>  	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
>  #ifdef CONFIG_X86_64
> -	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
> +	if (!no_iommu && max_pfn > MAX_DMA32_PFN && enough_mem_for_swiotlb())
>  		swiotlb = 1;
>  #endif
>  	return swiotlb;
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly
  2013-01-04  0:48 ` [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly Yinghai Lu
@ 2013-01-04 17:18   ` Sakkinen, Jarkko
  2013-01-04 22:01     ` Yinghai Lu
  2013-01-07 15:54   ` Borislav Petkov
  1 sibling, 1 reply; 199+ messages in thread
From: Sakkinen, Jarkko @ 2013-01-04 17:18 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel

[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1268 bytes --]

On Thu, 2013-01-03 at 16:48 -0800, Yinghai Lu wrote:
> with #PF handler way to set early page table, level3_ident will go away with
> 64bit native path.
> 
> So just use entries in init_level4_pgt to set them in tramopline_pgt
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>

Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>

> ---
>  arch/x86/realmode/init.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index b96fe6f..384b3f4 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -78,8 +78,8 @@ void __init setup_real_mode(void)
>  	*trampoline_cr4_features = read_cr4();
>  
>  	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
> -	trampoline_pgd[0] = __pa_symbol(level3_ident_pgt) + _KERNPG_TABLE;
> -	trampoline_pgd[511] = __pa_symbol(level3_kernel_pgt) + _KERNPG_TABLE;
> +	trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
> +	trampoline_pgd[511] = init_level4_pgt[511].pgd;
>  #endif
>  }
>  
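
For reference, a small sketch of which top-level slots are involved. It
assumes 4-level paging (PGDIR_SHIFT 39, 512 entries per table) and the
values __PAGE_OFFSET = 0xffff880000000000 and __START_KERNEL_map =
0xffffffff80000000; pgd_index_model() is a user-space stand-in for the
kernel's pgd_index(), not the real macro.

```c
#include <stdint.h>

#define PGDIR_SHIFT	39	/* assumption: 4-level paging */
#define PTRS_PER_PGD	512

/* Stand-in for pgd_index(): top-level table slot covering 'address'. */
static unsigned int pgd_index_model(uint64_t address)
{
	return (unsigned int)((address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}
```

Under those assumptions the direct mapping starts in slot 272 and the
kernel text mapping sits in slot 511, so the patch copies the entry from
slot 272 into trampoline_pgd[0] and the entry from slot 511 into
trampoline_pgd[511].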

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 07/31] x86, realmode: Separate real_mode reserve and setup
  2013-01-04  0:48 ` [PATCH v7u1 07/31] x86, realmode: Separate real_mode reserve and setup Yinghai Lu
@ 2013-01-04 17:18   ` Sakkinen, Jarkko
  2013-01-07 15:54   ` Borislav Petkov
  1 sibling, 0 replies; 199+ messages in thread
From: Sakkinen, Jarkko @ 2013-01-04 17:18 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel

[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3482 bytes --]

On Thu, 2013-01-03 at 16:48 -0800, Yinghai Lu wrote:
> After we switch to use #PF handler help to set page table, init_level4_pgt
> will only have entries set after init_mem_mapping.
> We need to move copying init_level4_pgt to trampoline_pgd after that.
> 
> So split reserve and setup, and move the setup after init_mem_mapping()
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>

Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>

> ---
>  arch/x86/include/asm/realmode.h |    3 ++-
>  arch/x86/kernel/setup.c         |    4 +++-
>  arch/x86/realmode/init.c        |   30 +++++++++++++++++++-----------
>  3 files changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
> index fe1ec5b..9c6b890 100644
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -58,6 +58,7 @@ extern unsigned char boot_gdt[];
>  extern unsigned char secondary_startup_64[];
>  #endif
>  
> -extern void __init setup_real_mode(void);
> +void reserve_real_mode(void);
> +void setup_real_mode(void);
>  
>  #endif /* _ARCH_X86_REALMODE_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 81ea5a5..01b22d0 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -913,10 +913,12 @@ void __init setup_arch(char **cmdline_p)
>  	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
>  			(max_pfn_mapped<<PAGE_SHIFT) - 1);
>  
> -	setup_real_mode();
> +	reserve_real_mode();
>  
>  	init_mem_mapping();
>  
> +	setup_real_mode();
> +
>  	memblock.current_limit = get_max_mapped();
>  	dma_contiguous_reserve(0);
>  
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 384b3f4..3baae96 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -8,9 +8,26 @@
>  struct real_mode_header *real_mode_header;
>  u32 *trampoline_cr4_features;
>  
> -void __init setup_real_mode(void)
> +void __init reserve_real_mode(void)
>  {
>  	phys_addr_t mem;
> +	unsigned char *base;
> +	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
> +
> +	/* Has to be in very low memory so we can execute real-mode AP code. */
> +	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
> +	if (!mem)
> +		panic("Cannot allocate trampoline\n");
> +
> +	base = __va(mem);
> +	memblock_reserve(mem, size);
> +	real_mode_header = (struct real_mode_header *) base;
> +	printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
> +	       base, (unsigned long long)mem, size);
> +}
> +
> +void __init setup_real_mode(void)
> +{
>  	u16 real_mode_seg;
>  	u32 *rel;
>  	u32 count;
> @@ -25,16 +42,7 @@ void __init setup_real_mode(void)
>  	u64 efer;
>  #endif
>  
> -	/* Has to be in very low memory so we can execute real-mode AP code. */
> -	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
> -	if (!mem)
> -		panic("Cannot allocate trampoline\n");
> -
> -	base = __va(mem);
> -	memblock_reserve(mem, size);
> -	real_mode_header = (struct real_mode_header *) base;
> -	printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
> -	       base, (unsigned long long)mem, size);
> +	base = (unsigned char *)real_mode_header;
>  
>  	memcpy(base, real_mode_blob, size);
>  

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04  0:48 ` [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it Yinghai Lu
  2013-01-04 16:05   ` Konrad Rzeszutek Wilk
@ 2013-01-04 17:50   ` Shuah Khan
  2013-01-04 20:34     ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-04 17:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Thu, Jan 3, 2013 at 5:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Normal boot path on system with iommu support:
> swiotlb buffer will be allocated early at first and then try to initialize
> iommu, if iommu for intel or amd could setup properly, swiotlb buffer
> will be freed.
>
> The early allocating is with bootmem, and get panic when we try to use
> kdump with buffer above 4G only if swiotlb is enabled.
>
> because actually the kernel can go on without swiotlb, and use intel iommu.
>
> Try disable swiotlb if there is not enough ram for it.
>
> That is for kdump to use kernel above 4G.
>
> Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> ---
>  arch/x86/kernel/pci-swiotlb.c |   14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index 6c483ba..949ebfe 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -6,6 +6,7 @@
>  #include <linux/swiotlb.h>
>  #include <linux/bootmem.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/memblock.h>
>
>  #include <asm/iommu.h>
>  #include <asm/swiotlb.h>
> @@ -50,6 +51,11 @@ static struct dma_map_ops swiotlb_dma_ops = {
>         .dma_supported = NULL,
>  };
>
> +static bool __init enough_mem_for_swiotlb(void)
> +{
> +       /* do we have less than 1M RAM under 4G ? */
> +       return memblock_mem_size(1ULL<<(32-PAGE_SHIFT)) > (1ULL<<20);
> +}
>  /*
>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>   *
> @@ -58,12 +64,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
>   */
>  int __init pci_swiotlb_detect_override(void)
>  {
> -       int use_swiotlb = swiotlb | swiotlb_force;
> -
>         if (swiotlb_force)
>                 swiotlb = 1;
> +       else if (!enough_mem_for_swiotlb())
> +               swiotlb = 0;
>
> -       return use_swiotlb;
> +       return swiotlb;

This change doesn't take into account what swiotlb was when
pci_swiotlb_detect_override() is called. Instead of returning
use_swiotlb like the original code did, it returns swiotlb which could
be zero, if !enough_mem_for_swiotlb().

Might work fine on Intel platforms, but not on systems where the IOMMU
driver wants to enable swiotlb for some devices as in the case of AMD.

AMD IOMMU driver enables swiotlb for devices that are not specified in
IVRs and/or not in the AMD IOMMU scope, after it successfully
initializes IOMMU. It will explicitly set swiotlb=1 to make sure
reserved swiotlb memory is not released. This change will break that
case.

Reference: amd_iommu_init_dma_ops()

        iommu_detected = 1;
        swiotlb = 0;

        /* Make the driver finally visible to the drivers */
        unhandled = device_dma_ops_init();
        if (unhandled && max_pfn > MAX_DMA32_PFN) {
                /* There are unhandled devices - initialize swiotlb for them */
                swiotlb = 1;
        }
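
The difference in the return value can be shown with a small user-space
model. This is only a sketch: the globals below are stand-ins for the
kernel's swiotlb and swiotlb_force, enough_mem is a hypothetical flag
replacing enough_mem_for_swiotlb(), and the two function bodies follow
the removed and added lines of the quoted hunk.

```c
#include <stdbool.h>

/* Stand-ins for the kernel globals; not the real definitions. */
static int swiotlb;
static int swiotlb_force;
static bool enough_mem = true;	/* replaces enough_mem_for_swiotlb() */

/* Behaviour before the patch: report the value seen on entry. */
static int detect_override_old(void)
{
	int use_swiotlb = swiotlb | swiotlb_force;

	if (swiotlb_force)
		swiotlb = 1;

	return use_swiotlb;
}

/* Behaviour after the patch: may clear swiotlb and report 0. */
static int detect_override_new(void)
{
	if (swiotlb_force)
		swiotlb = 1;
	else if (!enough_mem)
		swiotlb = 0;

	return swiotlb;
}
```

With swiotlb already set to 1 on entry, swiotlb_force at 0 and too
little low memory, the old function returns 1 and leaves swiotlb alone,
while the new one returns 0 and clears it.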


Thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 16:05   ` Konrad Rzeszutek Wilk
@ 2013-01-04 19:57     ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 19:57 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Joerg Roedel

On Fri, Jan 4, 2013 at 8:05 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>> +static bool __init enough_mem_for_swiotlb(void)
>> +{
>> +     /* do we have less than 1M RAM under 4G ? */
>
> And why 1MB? The default size is 64MB.

Because kdump scripts will use memmap=exact,.. to add a range below 1M
for the realmode data.
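
A rough user-space model of that check, for illustration only.
mem_size_below() is a hypothetical stand-in for memblock_mem_size() that
sums whole regions in bytes and ignores partial-page rounding; the region
layouts in any test are made up, and 4K pages are assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4K base pages */

/* Hypothetical memory region, in bytes. */
struct mem_region {
	uint64_t base;
	uint64_t size;
};

/* Bytes of RAM that lie below limit_pfn, summed over all regions. */
static uint64_t mem_size_below(const struct mem_region *r, size_t n,
			       uint64_t limit_pfn)
{
	uint64_t limit = limit_pfn << PAGE_SHIFT;
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t start = r[i].base;
		uint64_t end = r[i].base + r[i].size;

		if (start >= limit)
			continue;
		if (end > limit)
			end = limit;
		total += end - start;
	}

	return total;
}

/* Same comparison as the patch: more than 1M of RAM below 4G? */
static bool enough_mem_model(const struct mem_region *r, size_t n)
{
	return mem_size_below(r, n, 1ULL << (32 - PAGE_SHIFT)) > (1ULL << 20);
}
```

A kdump kernel that only sees a few hundred KB below 1M plus its crash
region above 4G fails the test, while any normal layout with RAM between
1M and 4G passes it.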

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early
  2013-01-04  0:48 ` [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early Yinghai Lu
@ 2013-01-04 20:15   ` Borislav Petkov
  2013-01-04 20:58     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-04 20:15 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:23PM -0800, Yinghai Lu wrote:
> Trampoline code is executed by APs with kernel low mapping.
> We need to set trampoline code to EXEC early before we do smp
> AP bootings.

"... before we boot the APs."

> 
> Found the problem after switching to #PF handler set page table,
> and we do not set initial kernel low mapping with EXEC anymore in

"...table, since we do not make initial kernel low mapping executable
anymore, in ..."

> arch/x86/kernel/head_64.S.
> 
> Change to use early_initcall instead that will make sure tramopline

							   trampoline

> will have EXEC set.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/realmode/init.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 8045026..b96fe6f 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -111,5 +111,9 @@ static int __init set_real_mode_permissions(void)
>  
>  	return 0;
>  }
> -
> -arch_initcall(set_real_mode_permissions);
> +/*
> + * Trampoline will be executed by APs with SMP.
> + * So we need to set it to EXEC in do_pre_smp_initcalls() at least,
> + * and that needs early_initcall().
> + */
> +early_initcall(set_real_mode_permissions);

Now you have two conflicting comments: the one over
set_real_mode_permissions() and the one you're adding here. Let's merge
them into one (the diff is on top of your patch).

--
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index b96fe6f54d2f..9eb0fa95881e 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -84,10 +84,11 @@ void __init setup_real_mode(void)
 }
 
 /*
- * set_real_mode_permissions() gets called very early, to guarantee the
- * availability of low memory.  This is before the proper kernel page
- * tables are set up, so we cannot set page permissions in that
- * function.  Thus, we use an arch_initcall instead.
+ * This function gets called very early to guarantee the availability
+ * of low memory. This is even before the proper kernel page tables are
+ * set up, so we cannot set page permissions in that function. However,
+ * trampoline code will be executed by APs so we need it to be marked
+ * executable at pre-SMP time, thus run it as an early_initcall().
  */
 static int __init set_real_mode_permissions(void)
 {
@@ -111,9 +112,4 @@ static int __init set_real_mode_permissions(void)
 
 	return 0;
 }
-/*
- * Trampoline will be executed by APs with SMP.
- * So we need to set it to EXEC in do_pre_smp_initcalls() at least,
- * and that needs early_initcall().
- */
 early_initcall(set_real_mode_permissions);
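
To illustrate why the initcall level matters here, below is a minimal
user-space sketch (not kernel code). It assumes the boot order of
init/main.c, where early initcalls run from do_pre_smp_initcalls() before
smp_init() and arch initcalls only run afterwards, and it assumes the low
mapping is not executable until the permissions call has run. The enum,
struct and function names are made up for the illustration.

```c
#include <assert.h>

/* Hypothetical model: boot steps in the order init/main.c runs them. */
enum boot_step {
	STEP_EARLY_INITCALLS,	/* do_pre_smp_initcalls() */
	STEP_SMP_INIT,		/* APs are booted through the trampoline */
	STEP_ARCH_INITCALLS,	/* part of do_initcalls(), after smp_init() */
	STEP_COUNT
};

struct boot_state {
	int trampoline_exec;	/* set once the permissions call has run */
	int ap_boot_ok;		/* recorded at the moment the APs are booted */
};

/* Stand-in for set_real_mode_permissions(). */
static void set_permissions(struct boot_state *s)
{
	s->trampoline_exec = 1;
}

/* Walk the boot sequence with the permissions call registered at 'level'. */
static struct boot_state run_boot(enum boot_step level)
{
	struct boot_state s = { 0, 0 };
	int step;

	assert(level == STEP_EARLY_INITCALLS || level == STEP_ARCH_INITCALLS);

	for (step = 0; step < STEP_COUNT; step++) {
		if (step == STEP_SMP_INIT)
			s.ap_boot_ok = s.trampoline_exec; /* APs need EXEC here */
		else if (step == (int)level)
			set_permissions(&s);
	}
	return s;
}
```

In this model only run_boot(STEP_EARLY_INITCALLS) ends with ap_boot_ok set;
with STEP_ARCH_INITCALLS the permissions are applied too late for the APs.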

-- 
Regards/Gruss,
Boris.


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 17:50   ` Shuah Khan
@ 2013-01-04 20:34     ` Yinghai Lu
  2013-01-04 21:02       ` Shuah Khan
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 20:34 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 926 bytes --]

On Fri, Jan 4, 2013 at 9:50 AM, Shuah Khan <shuahkhan@gmail.com> wrote:
>
> This change doesn't take into account what swiotlb was when
> pci_swiotlb_detect_override() is called. Instead of returning
> use_swiotlb like the original code did, it returns swiotlb which could
> be zero, if !enough_mem_for_swiotlb().
>
> Might work fine on Intel platforms, but not on systems where the IOMMU
> driver wants to enable swiotlb for some devices as in the case of AMD.
>
> AMD IOMMU driver enables swiotlb for devices that are not specified in
> IVRs and/or not in the AMD IOMMU scope, after it successfully
> initializes IOMMU. It will explicitly set swiotlb=1 to make sure
> reserved swiotlb memory is not released. This change will break that
> case.

In that case, we have to panic.

Please check the attached patch.

Also, for those kinds of systems, users must specify crashkernel_low=72M or
whatever for kdump.

Thanks

Yinghai

[-- Attachment #2: auto_switch_off_swiotlb.patch --]
[-- Type: application/octet-stream, Size: 3429 bytes --]

Subject: [PATCH] x86: Don't enable swiotlb if there is not enough ram for it

Normal boot path on a system with IOMMU support:
the swiotlb buffer is allocated early, and then the kernel tries to
initialize the IOMMU. If the Intel or AMD IOMMU is set up properly, the
swiotlb buffer is freed.

The early allocation uses bootmem, and it panics when we try to use
kdump with a buffer that is only above 4G and swiotlb is enabled.

However, the kernel can actually go on without swiotlb and use the Intel
IOMMU.

So try to disable swiotlb if there is not enough RAM for it.

This allows kdump to use a kernel above 4G.

-v2: Shuah Khan <shuahkhan@gmail.com> pointed out that devices left
     unhandled by the AMD IOMMU that need swiotlb will have a problem.
     In that case, we have to panic, because we did not enable swiotlb
     before.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>

---
 arch/x86/kernel/pci-swiotlb.c |   14 ++++++++++----
 drivers/iommu/amd_iommu.c     |    4 ++++
 2 files changed, 14 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/kernel/pci-swiotlb.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-swiotlb.c
+++ linux-2.6/arch/x86/kernel/pci-swiotlb.c
@@ -6,6 +6,7 @@
 #include <linux/swiotlb.h>
 #include <linux/bootmem.h>
 #include <linux/dma-mapping.h>
+#include <linux/memblock.h>
 
 #include <asm/iommu.h>
 #include <asm/swiotlb.h>
@@ -50,6 +51,11 @@ static struct dma_map_ops swiotlb_dma_op
 	.dma_supported = NULL,
 };
 
+static bool __init enough_mem_for_swiotlb(void)
+{
+	/* true if there is more than 1M of RAM under 4G */
+	return memblock_mem_size(1ULL<<(32-PAGE_SHIFT)) > (1ULL<<20);
+}
 /*
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
  *
@@ -58,12 +64,12 @@ static struct dma_map_ops swiotlb_dma_op
  */
 int __init pci_swiotlb_detect_override(void)
 {
-	int use_swiotlb = swiotlb | swiotlb_force;
-
 	if (swiotlb_force)
 		swiotlb = 1;
+	else if (!enough_mem_for_swiotlb())
+		swiotlb = 0;
 
-	return use_swiotlb;
+	return swiotlb;
 }
 IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
 		  pci_xen_swiotlb_detect,
@@ -78,7 +84,7 @@ int __init pci_swiotlb_detect_4gb(void)
 {
 	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
 #ifdef CONFIG_X86_64
-	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
+	if (!no_iommu && max_pfn > MAX_DMA32_PFN && enough_mem_for_swiotlb())
 		swiotlb = 1;
 #endif
 	return swiotlb;
Index: linux-2.6/drivers/iommu/amd_iommu.c
===================================================================
--- linux-2.6.orig/drivers/iommu/amd_iommu.c
+++ linux-2.6/drivers/iommu/amd_iommu.c
@@ -3144,6 +3144,7 @@ int __init amd_iommu_init_dma_ops(void)
 {
 	struct amd_iommu *iommu;
 	int ret, unhandled;
+	int swiotlb_orig;
 
 	/*
 	 * first allocate a default protection domain for every IOMMU we
@@ -3166,12 +3167,15 @@ int __init amd_iommu_init_dma_ops(void)
 	prealloc_protection_domains();
 
 	iommu_detected = 1;
+	swiotlb_orig = swiotlb;
 	swiotlb = 0;
 
 	/* Make the driver finally visible to the drivers */
 	unhandled = device_dma_ops_init();
 	if (unhandled && max_pfn > MAX_DMA32_PFN) {
 		/* There are unhandled devices - initialize swiotlb for them */
+		if (!swiotlb_orig)
+			panic("can not enable swiotlb for unhandled devices by AMD iommu!\n");
 		swiotlb = 1;
 	}
 


* Re: [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early
  2013-01-04 20:15   ` Borislav Petkov
@ 2013-01-04 20:58     ` Yinghai Lu
  2013-01-04 21:04       ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 20:58 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Fri, Jan 4, 2013 at 12:15 PM, Borislav Petkov <bp@alien8.de> wrote:
>  /*
> - * set_real_mode_permissions() gets called very early, to guarantee the
> - * availability of low memory.  This is before the proper kernel page
> - * tables are set up, so we cannot set page permissions in that
> - * function.  Thus, we use an arch_initcall instead.
> + * This function gets called very early to guarantee the availability
> + * of low memory. This is even before the proper kernel page tables are
> + * set up, so we cannot set page permissions in that function. However,
> + * trampoline code will be executed by APs so we need it to be marked
> + * executable at pre-SMP time, thus run it as an early_initcall().
>   */

more than that, that set_real_mode_permissions reference is wrong,
actually it is set_real_mode.

 /*
- * set_real_mode_permissions() gets called very early, to guarantee the
- * availability of low memory.  This is before the proper kernel page
+ * set_real_mode() gets called very early, to guarantee the
+ * availability of low memory. This is before the proper kernel page
  * tables are set up, so we cannot set page permissions in that
- * function.  Thus, we use an arch_initcall instead.
+ * function. Also trampoline code will be executed by APs so we
+ * need to mark it executable at do_pre_smp_initcalls() at least,
+ * thus run it as an early_initcall().
  */


* Re: [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
  2013-01-04  0:48 ` [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page Yinghai Lu
@ 2013-01-04 21:01   ` Borislav Petkov
  2013-01-04 22:04     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-04 21:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:40PM -0800, Yinghai Lu wrote:
>  static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>  {
> +	struct x86_mapping_info info = {
> +		.alloc_pgt_page	= alloc_pgt_page,
> +		.context	= image,
> +		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
> +	};

This is leaving ->kernel_mapping uninitialized to contain a random,
previous stack value. I don't think we want that.

>  	unsigned long mstart, mend;
>  	pgd_t *level4p;
>  	int result;
>  	int i;
>  
>  	level4p = (pgd_t *)__va(start_pgtable);
> -	result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
> +	clear_page(level4p);
> +	result = kernel_ident_mapping_init(&info, level4p,
> +						0, max_pfn << PAGE_SHIFT);
>  	if (result)
>  		return result;
>  
> @@ -225,7 +115,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>  		mstart = image->segment[i].mem;
>  		mend   = mstart + image->segment[i].memsz;
>  
> -		result = ident_mapping_init(image, level4p, mstart, mend);
> +		result = kernel_ident_mapping_init(&info,
> +						 level4p, mstart, mend);
>  
>  		if (result)
>  			return result;
> -- 
> 1.7.10.4
> 
> 

-- 
Regards/Gruss,
Boris.


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 20:34     ` Yinghai Lu
@ 2013-01-04 21:02       ` Shuah Khan
  2013-01-04 22:10         ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-04 21:02 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 1:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 4, 2013 at 9:50 AM, Shuah Khan <shuahkhan@gmail.com> wrote:
>>
>> This change doesn't take into account what swiotlb was when
>> pci_swiotlb_detect_override() is called. Instead of returning
>> use_swiotlb like the original code did, it returns swiotlb which could
>> be zero, if !enough_mem_for_swiotlb().
>>
>> Might work fine on Intel platforms, but not on systems where the IOMMU
>> driver wants to enable swiotlb for some devices as in the case of AMD.
>>
>> AMD IOMMU driver enables swiotlb for devices that are not specified in
>> IVRs and/or not in the AMD IOMMU scope, after it successfully
>> initializes IOMMU. It will explicitly set swiotlb=1 to make sure
>> reserved swiotlb memory is not released. This change will break that
>> case.
>
> in that case, we have to panic....
>
> please check attached patch.
>
> Also for those kind of systems, users must specify crashkernel_low=72M or
> what ever for kdump.

Panicking the system doesn't sound like a good option to me in this
case. This change to disable swiotlb is made for kdump. However, with
this change several systems fail to boot unless crashkernel_low=72M is
specified.

I would say the right approach to solve this would be to not
change the current pci_swiotlb_detect_override() behavior and to treat
swiotlb=1 upon entry as equivalent to swiotlb_force being set.
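
To make the difference concrete, here is a small user-space sketch of the
three behaviors being discussed. It is only a model: the struct and its
enough_mem field stand in for the kernel globals and for
enough_mem_for_swiotlb(), and the third function is just one possible
reading of the suggestion above, not an actual proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel globals and the memory check. */
struct swiotlb_state {
	int swiotlb;
	int swiotlb_force;
	bool enough_mem;
};

/* Behavior before the patch: the return value is computed on entry. */
static int detect_override_orig(struct swiotlb_state *s)
{
	int use_swiotlb;

	assert(s);
	use_swiotlb = s->swiotlb | s->swiotlb_force;
	if (s->swiotlb_force)
		s->swiotlb = 1;
	return use_swiotlb;
}

/* Behavior with the patch: swiotlb is cleared when low memory is short. */
static int detect_override_patched(struct swiotlb_state *s)
{
	assert(s);
	if (s->swiotlb_force)
		s->swiotlb = 1;
	else if (!s->enough_mem)
		s->swiotlb = 0;
	return s->swiotlb;
}

/* One reading of the suggestion: swiotlb=1 on entry counts as forced. */
static int detect_override_suggested(struct swiotlb_state *s)
{
	assert(s);
	if (s->swiotlb_force || s->swiotlb)
		s->swiotlb = 1;
	else if (!s->enough_mem)
		s->swiotlb = 0;
	return s->swiotlb;
}
```

With swiotlb=1, swiotlb_force=0 and too little low memory, the original and
the suggested variants return 1 while the patched variant returns 0, which
is the case that matters for the AMD IOMMU path.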

Thanks,
-- Shuah


* Re: [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early
  2013-01-04 20:58     ` Yinghai Lu
@ 2013-01-04 21:04       ` Borislav Petkov
  2013-01-04 22:13         ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-04 21:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Fri, Jan 04, 2013 at 12:58:15PM -0800, Yinghai Lu wrote:
> more than that, that set_real_mode_permissions reference is wrong,
> actually it is set_real_mode.

Huh, set_real_mode_permissions is the name of the function above which the
comment is located. There's no set_real_mode. What do you mean?

-- 
Regards/Gruss,
Boris.


* Re: [PATCH v7u1 04/31] x86, 64bit, mm: add generic kernel/ident mapping helper
  2013-01-04  0:48 ` [PATCH v7u1 04/31] x86, 64bit, mm: add generic kernel/ident mapping helper Yinghai Lu
@ 2013-01-04 21:19   ` Borislav Petkov
  2013-01-04 22:19     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-04 21:19 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:24PM -0800, Yinghai Lu wrote:
> +int kernel_mapping_init(pgd_t *pgd_page, unsigned long addr, unsigned long end)
> +{
> +	struct x86_mapping_info info = {
> +		.alloc_pgt_page	= alloc_pgt_page,
> +		.pmd_flag	= __PAGE_KERNEL_LARGE,
> +		.kernel_mapping	= true,
> +	};
> +
> +	return kernel_ident_mapping_init(&info, pgd_page, addr, end);

This patch looks good so far except this:
kernel_ident_mapping_init says it initializes ident mapping but
this is wrong and the type of mapping is actually controlled by
info.kernel_mapping.

So this function which gets &info, etc should be called
kernel_mapping_init, AFAICT. And wrt the one wrapping
kernel_ident_mapping_init, I can't seem to find where it is called.
What's up?

Thanks.

-- 
Regards/Gruss,
Boris.


* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-04  7:09 ` [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Borislav Petkov
@ 2013-01-04 21:44   ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 21:44 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Thu, Jan 3, 2013 at 11:09 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 03, 2013 at 04:48:20PM -0800, Yinghai Lu wrote:
>
>>         git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot
>>
>> and it is on top of linus's tree 2013-01-03
>> plus tip:x86/mm, tip:x86/mm2
>
> This is causing a merge conflict when merging tip:x86/mm2
> after having merged tip:x86/mm ontop of -rc2+ (today's Linus'
> tree) in mm/nobootmem.c. free_all_bootmem_node has gained a
> reset_node_lowmem_managed_pages() call which got added in
> 9feedc9d831e18ae6d0d15aa562e5e46ba53647b.
>
> Now, you have a patch in tip:x86/mm2 which kills that
> free_all_bootmem_node() function but the commit above adds that
> reset_node_lowmem_managed_pages() call to it.
>
> A proper merge conflict resolve would need to be added to the pull
> request which sends tip:x86/mm2 upstream and then you'd need to rebase
> your stuff ontop. Or something better which I'm not thinking of right
> now...

Peter or Linus could figure it out.


* Re: [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking
  2013-01-04  7:17   ` Borislav Petkov
@ 2013-01-04 21:50     ` Yinghai Lu
  2013-01-05 13:05       ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 21:50 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Thu, Jan 3, 2013 at 11:17 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 03, 2013 at 04:48:21PM -0800, Yinghai Lu wrote:
>> During debugging loading kernel above 4G, found one page if is not used
>> in BRK with early page allocation.
>>
>> pgt_buf_top is address that can not be used, so should check if that new
>> end is above that top, otherwise last page will not be used.
>>
>> Fix that checking and also add print out for every allocation from BRK.
>
> This commit message still bothers the hell out of me. Please, fix it up
> to something more readable like the below, for example:
>
> "pgt_buf_top is an address which cannot be used so we should check
> whether the new 'end' is above it. Otherwise, the last BRK page remains
> unused.
>
> Fix that check and add a debug printout of every BRK allocation."

But your changelog is wrong.

It is NOT the last BRK page.

It is NOT every BRK allocation.


* Re: [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init
  2013-01-04 11:43   ` Rafael J. Wysocki
@ 2013-01-04 21:59     ` Yinghai Lu
  2013-01-04 22:07       ` Rafael J. Wysocki
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 21:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Pavel Machek, linux-pm

On Fri, Jan 4, 2013 at 3:43 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, January 03, 2013 04:48:51 PM Yinghai Lu wrote:
>> Make it only map range in pfn_mapped array.
>
> Can you please explain why that should be sufficient?

It is needed.

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf

Subject: [PATCH] x86, mm: Only direct map addresses that are marked as
 E820_RAM

Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
backed by actual DRAM. This is fine for holes under 4GB which are covered
by fixed and variable range MTRRs to be UC. However, we run into trouble
on higher memory addresses which cannot be covered by MTRRs.

Our system with 1TB of RAM has an e820 that looks like this:

 BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
 BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
 BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
 BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
 BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
 BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
 BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
 BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
 BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
 BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
 BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
 BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
 BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable

and so direct mappings are created for huge memory hole between
0x000000e038000000 to 0x0000010000000000. Even though the kernel never
generates memory accesses in that region, since the page tables mark
them incorrectly as being WB, our (AMD) processor ends up causing a MCE
while doing some memory bookkeeping/optimizations around that area.

This patch iterates through e820 and only direct maps ranges that are
marked as E820_RAM, and keeps track of those pfn ranges. Depending on
the alignment of E820 ranges, this may possibly result in using smaller
size (i.e. 4K instead of 2M or 1G) page tables.

>
> Have you tested it?
>

No.

Will update to:

Subject: [PATCH] x86, 64bit, mm: hibernate use generic mapping_init

We should not set up mappings for everything under max_pfn.
That causes the same problem that is fixed by
        x86, mm: Only direct map addresses that are marked as E820_RAM

Make it only map the ranges in the pfn_mapped array.
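
As a rough illustration of "only map what is RAM", the user-space sketch
below walks an e820-style table and records just the usable ranges as pfn
ranges, so that the reserved hole is never covered. The types and function
names are hypothetical and much simpler than the kernel's e820 and
pfn_mapped code; 4K pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

enum region_type { REGION_RAM, REGION_OTHER };	/* simplified types */

struct region {
	uint64_t start;	/* first byte */
	uint64_t end;	/* last byte, inclusive, as printed in the dump */
	enum region_type type;
};

struct pfn_range {
	uint64_t start_pfn;	/* inclusive */
	uint64_t end_pfn;	/* exclusive */
};

/* Record only RAM regions; returns the number of ranges written to out. */
static size_t collect_ram_ranges(const struct region *map, size_t n,
				 struct pfn_range *out, size_t max_out)
{
	size_t i, count = 0;

	assert(map && out);
	for (i = 0; i < n && count < max_out; i++) {
		uint64_t start_pfn, end_pfn;

		if (map[i].type != REGION_RAM)
			continue;
		/* Keep whole pages only: round start up and end down. */
		start_pfn = (map[i].start + PAGE_SIZE - 1) >> PAGE_SHIFT;
		end_pfn = (map[i].end + 1) >> PAGE_SHIFT;
		if (start_pfn >= end_pfn)
			continue;
		out[count].start_pfn = start_pfn;
		out[count].end_pfn = end_pfn;
		count++;
	}
	return count;
}

/* True if pfn falls inside one of the recorded ranges. */
static int pfn_is_mapped(const struct pfn_range *r, size_t n, uint64_t pfn)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (pfn >= r[i].start_pfn && pfn < r[i].end_pfn)
			return 1;
	return 0;
}
```

Fed with the last three entries of the table above, this records two
ranges, and a pfn inside the reserved region starting at 0x000000e038000000
is reported as not mapped.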


* Re: [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly
  2013-01-04 17:18   ` Sakkinen, Jarkko
@ 2013-01-04 22:01     ` Yinghai Lu
  2013-01-05  9:59       ` Sakkinen, Jarkko
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 22:01 UTC (permalink / raw)
  To: Sakkinen, Jarkko
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel

On Fri, Jan 4, 2013 at 9:18 AM, Sakkinen, Jarkko
<jarkko.sakkinen@intel.com> wrote:
> On Thu, 2013-01-03 at 16:48 -0800, Yinghai Lu wrote:
>> with #PF handler way to set early page table, level3_ident will go away with
>> 64bit native path.
>>
>> So just use entries in init_level4_pgt to set them in tramopline_pgt
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
>
> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>

Thanks.

Updated the patch; that should save HPA some time.


* Re: [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
  2013-01-04 21:01   ` Borislav Petkov
@ 2013-01-04 22:04     ` Yinghai Lu
  2013-01-05 13:24       ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 22:04 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Fri, Jan 4, 2013 at 1:01 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 03, 2013 at 04:48:40PM -0800, Yinghai Lu wrote:
>>  static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>  {
>> +     struct x86_mapping_info info = {
>> +             .alloc_pgt_page = alloc_pgt_page,
>> +             .context        = image,
>> +             .pmd_flag       = __PAGE_KERNEL_LARGE_EXEC,
>> +     };
>
> This is leaving ->kernel_mapping uninitialized to contain a random,
> previous stack value. I don't think we want that.

That should be initialized to false by default: members omitted from a
designated initializer are zero-initialized.
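
A small user-space check of that rule, as a sketch only: the struct below is
a hypothetical cut-down stand-in for x86_mapping_info, and the pmd_flag
value is a placeholder. C99 (6.7.8, paragraph 21) says that members not
named in a brace-enclosed initializer are initialized as if they had static
storage duration, i.e. to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down version of the structure under discussion. */
struct mapping_info {
	void *(*alloc_pgt_page)(void *context);
	void *context;
	unsigned long pmd_flag;
	bool kernel_mapping;
};

/* Dummy allocator so the function pointer has something to point at. */
static void *dummy_alloc(void *context)
{
	return context;
}

/* Build the struct on the stack, leaving kernel_mapping out on purpose. */
static struct mapping_info make_info(void *image)
{
	struct mapping_info info = {
		.alloc_pgt_page	= dummy_alloc,
		.context	= image,
		.pmd_flag	= 0x1e3UL,	/* placeholder, not the real flag */
	};

	assert(info.alloc_pgt_page != NULL);
	return info;
}
```

The returned struct has kernel_mapping == false on every call, regardless of
what was previously on the stack.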


* Re: [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init
  2013-01-04 21:59     ` Yinghai Lu
@ 2013-01-04 22:07       ` Rafael J. Wysocki
  0 siblings, 0 replies; 199+ messages in thread
From: Rafael J. Wysocki @ 2013-01-04 22:07 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Pavel Machek, linux-pm

On Friday, January 04, 2013 01:59:33 PM Yinghai Lu wrote:
> On Fri, Jan 4, 2013 at 3:43 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, January 03, 2013 04:48:51 PM Yinghai Lu wrote:
> >> Make it only map range in pfn_mapped array.
> >
> > Can you please explain why that should be sufficient?
> 
> It is needed.
> 
> http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf
> 
> Subject: [PATCH] x86, mm: Only direct map addresses that are marked as
>  E820_RAM
> 
> Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
> and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
> backed by actual DRAM. This is fine for holes under 4GB which are covered
> by fixed and variable range MTRRs to be UC. However, we run into trouble
> on higher memory addresses which cannot be covered by MTRRs.
> 
> Our system with 1TB of RAM has an e820 that looks like this:
> 
>  BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
>  BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
>  BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
>  BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
>  BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
>  BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
>  BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
>  BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
>  BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
>  BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
>  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
>  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
>  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> 
> and so direct mappings are created for huge memory hole between
> 0x000000e038000000 to 0x0000010000000000. Even though the kernel never
> generates memory accesses in that region, since the page tables mark
> them incorrectly as being WB, our (AMD) processor ends up causing a MCE
> while doing some memory bookkeeping/optimizations around that area.
> 
> This patch iterates through e820 and only direct maps ranges that are
> marked as E820_RAM, and keeps track of those pfn ranges. Depending on
> the alignment of E820 ranges, this may possibly result in using smaller
> size (i.e. 4K instead of 2M or 1G) page tables.
> 
> >
> > Have you tested it?
> >
> 
> No
> 
> will update to
> 
> Subject: [PATCH] x86, 64bit, mm: hibernate use generic mapping_init
> 
> We should not set mapping for all under max_pfn.
> That causes same problem that is fixed by
>         x86, mm: Only direct map addresses that are marked as E820_RAM
> 
> Make it only map range in pfn_mapped array.

OK

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 21:02       ` Shuah Khan
@ 2013-01-04 22:10         ` Yinghai Lu
  2013-01-04 22:26           ` Shuah Khan
  2013-01-07 15:26           ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 22:10 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
> Panicking the system doesn't sound like a good option to me in this
> case. This change to disable swiotlb is made for kdump. However, with
> this change several systems fail to boot unless crashkernel_low=72M is
> specified.

This patchset is a new feature to put the second (kdump) kernel above 4G.

>
> I would say the right approach to solve this would be to not
> change the current pci_swiotlb_detect_override() behavior and to treat
> swiotlb=1 upon entry as equivalent to swiotlb_force being set.

That would force Intel systems to specify crashkernel_low=72M too;
otherwise they will panic during swiotlb allocation.

Thanks

Yinghai


* Re: [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early
  2013-01-04 21:04       ` Borislav Petkov
@ 2013-01-04 22:13         ` Yinghai Lu
  2013-01-05 13:25           ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 22:13 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Fri, Jan 4, 2013 at 1:04 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Fri, Jan 04, 2013 at 12:58:15PM -0800, Yinghai Lu wrote:
>> more than that, that set_real_mode_permissions reference is wrong,
>> actually it is set_real_mode.
>
> Huh, set_real_mode_permissions is the name of the function above which the
> comment is located. There's no set_real_mode. What do you mean?

The old comment is wrong.

setup_real_mode() reserves memory from low RAM under 1M, does the copy, etc.

set_real_mode_permissions() will change it to +x, etc.


* Re: [PATCH v7u1 04/31] x86, 64bit, mm: add generic kernel/ident mapping helper
  2013-01-04 21:19   ` Borislav Petkov
@ 2013-01-04 22:19     ` Yinghai Lu
  2013-01-05 13:21       ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 22:19 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Fri, Jan 4, 2013 at 1:19 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 03, 2013 at 04:48:24PM -0800, Yinghai Lu wrote:
>> +int kernel_mapping_init(pgd_t *pgd_page, unsigned long addr, unsigned long end)
>> +{
>> +     struct x86_mapping_info info = {
>> +             .alloc_pgt_page = alloc_pgt_page,
>> +             .pmd_flag       = __PAGE_KERNEL_LARGE,
>> +             .kernel_mapping = true,
>> +     };
>> +
>> +     return kernel_ident_mapping_init(&info, pgd_page, addr, end);
>
> This patch looks good so far except this:
> kernel_ident_mapping_init says it initializes ident mapping but
> this is wrong and the type of mapping is actually controlled by
> info.kernel_mapping.

It is not wrong; it can set up two kinds of mapping:
kernel_mapping
ident_mapping


>
> So this function which gets &info, etc should be called
> kernel_mapping_init, AFAICT. And wrt the one wrapping
> kernel_ident_mapping_init, I can't seem to find where it is called.
> What's up?

This kernel_mapping_init() is for -v8; it should be dropped if -v7 ends up
being used.


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 22:10         ` Yinghai Lu
@ 2013-01-04 22:26           ` Shuah Khan
  2013-01-04 22:34             ` Yinghai Lu
  2013-01-04 22:47             ` Eric W. Biederman
  2013-01-07 15:26           ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 199+ messages in thread
From: Shuah Khan @ 2013-01-04 22:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 3:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>> Panicking the system doesn't sound like a good option to me in this
>> case. This change to disable swiotlb is made for kdump. However, with
>> this change several systems fail to boot unless crashkernel_low=72M is
>> specified.
>
> this patchset is new feature to put second kdump kernel above 4G.
>
I understand this is just one of the patches to implement the new
kdump feature. However, I think a regression in existing behavior with a
panic is a bit of a big hammer. This change causes a panic on systems
even when kdump is not enabled, if I understand it correctly.

Granted, kdump is enabled by several distros, but it is not a
required feature. However, expecting a system to boot with devices that
require swiotlb fully functioning is a basic expectation. So I would argue
that not breaking basic functionality takes priority over
enabling kdump in this case.

-- Shuah


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 22:26           ` Shuah Khan
@ 2013-01-04 22:34             ` Yinghai Lu
  2013-01-04 22:47             ` Eric W. Biederman
  1 sibling, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 22:34 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 2:26 PM, Shuah Khan <shuahkhan@gmail.com> wrote:

> However, I think a regression in existing behavior with a
> panic is a bit of a big hammer. This change causes a panic on systems
> even when kdump is not enabled, if I understand it correctly.

I don't think so.

+static bool __init enough_mem_for_swiotlb(void)
+{
+       /* do we have less than 1M RAM under 4G ? */
+       return memblock_mem_size(1ULL<<(32-PAGE_SHIFT)) > (1ULL<<20);
+}

Could enough_mem_for_swiotlb() return false for them?

and

 int __init pci_swiotlb_detect_override(void)
 {
-       int use_swiotlb = swiotlb | swiotlb_force;
-
        if (swiotlb_force)
                swiotlb = 1;
+       else if (!enough_mem_for_swiotlb())
+               swiotlb = 0;

-       return use_swiotlb;
+       return swiotlb;
 }

It only disables swiotlb when there is less than 1M of memory under 4G.


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 22:26           ` Shuah Khan
  2013-01-04 22:34             ` Yinghai Lu
@ 2013-01-04 22:47             ` Eric W. Biederman
  2013-01-04 22:56               ` Shuah Khan
  2013-01-04 22:58               ` Yinghai Lu
  1 sibling, 2 replies; 199+ messages in thread
From: Eric W. Biederman @ 2013-01-04 22:47 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

Shuah Khan <shuahkhan@gmail.com> writes:

> On Fri, Jan 4, 2013 at 3:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>>> Panicking the system doesn't sound like a good option to me in this
>>> case. This change to disable swiotlb is made for kdump. However, with
>>> this change several systems fail to boot, unless crashkernel_low=72M is
>>> specified.
>>
>> this patchset is new feature to put second kdump kernel above 4G.
>>
> I understand this is just one of the patches to implement the new
> kdump feature. However, I think regression on existing behavior with a
> panic is a bit of a big hammer. This change causes panic on systems
> even when kdump is not enabled, if I understand it correctly.
>
> Granted kdump gets enabled by several distros, but it is not a
> required feature. However, expecting system to boot with devices that
> require swiotlb fully functioning is a basic feature. So I would argue
> that not breaking the basic functionality is a higher priority over
> enabling kdump in this case.

Yinghai Lu, it looks like your autodetection of the problem case in this
patch is problematic and needs a rethink.  My quick skim says you are
trying to detect failure too early in the code.  Furthermore having
kexec on panic sized magic comments without explanation is wrong.

Shuah Khan, this is motivated by kdump.  However a correct implementation
should be about dealing with the case when there is simply not enough
memory available below 4G for bounce buffers.

If a device needs an iommu, and swiotlb is the only iommu option, and
there is not enough memory below 4G, panicking is entirely reasonable.

Do I read this discussion right that we are wasting 64M on systems
that have the swiotlb code but don't use the swiotlb?

Eric



* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 22:47             ` Eric W. Biederman
@ 2013-01-04 22:56               ` Shuah Khan
  2013-01-04 23:00                 ` Yinghai Lu
  2013-01-04 22:58               ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-04 22:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 3:47 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Shuah Khan <shuahkhan@gmail.com> writes:
>
>> On Fri, Jan 4, 2013 at 3:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>>>> Panicking the system doesn't sound like a good option to me in this
>>>> case. This change to disable swiotlb is made for kdump. However, with
>>>> this change several systems fail to boot, unless crashkernel_low=72M is
>>>> specified.
>>>
>>> this patchset is new feature to put second kdump kernel above 4G.
>>>
>> I understand this is just one of the patches to implement the new
>> kdump feature. However, I think regression on existing behavior with a
>> panic is a bit of a big hammer. This change causes panic on systems
>> even when kdump is not enabled, if I understand it correctly.
>>
>> Granted kdump gets enabled by several distros, but it is not a
>> required feature. However, expecting system to boot with devices that
>> require swiotlb fully functioning is a basic feature. So I would argue
>> that not breaking the basic functionality is a higher priority over
>> enabling kdump in this case.
>
> Yinghai Lu it looks like your autodetection of the problem case in this
> patch is problematic and needs a rethink.  My quick skim says you are
> trying to detect failure too early in the code.  Furthermore having
> kexec on panic sized magic comments without explanation is wrong.
>
> Shuah Khan this is motivated by kdump.  However a correct implementation
> should be about dealing with the case when there is simply not enough
> memory available below 4G for bounce buffers.
>
> If a device needs an iommu, and swiotlb is the only iommu option, and
> there is not enough memory below 4G, panicking is entirely reasonable.
>
> Do I read this discussion right that we are wasting 64M on systems
> that have the swiotlb code but don't use the swiotlb?
>

No. pci_swiotlb_late_init() does free the reserved swiotlb buffers on
systems that don't need swiotlb. IOMMU drivers turn off swiotlb after
the iommu is initialized correctly. On some systems with an incorrect
BIOS, iommu initialization could fail, and swiotlb is then left
enabled.

The AMD IOMMU driver uses this lever to leave swiotlb enabled when it
detects devices that can't be supported by the iommu. My concern is that
this change for kdump removes that handshake ability between iommu and
swiotlb.

-- Shuah


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 22:47             ` Eric W. Biederman
  2013-01-04 22:56               ` Shuah Khan
@ 2013-01-04 22:58               ` Yinghai Lu
  1 sibling, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 22:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Shuah Khan, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 2:47 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Yinghai Lu it looks like your autodetection of the problem case in this
> patch is problematic and needs a rethink.  My quick skim says you are
> trying to detect failure too early in the code.  Furthermore having
> kexec on panic sized magic comments without explanation is wrong.

The current amd iommu implementation has this sequence:
1. alloc buffer for swiotlb.
2. detect and initialize intel iommu or amd iommu
3. release swiotlb if swiotlb == 0, set by ops_init.

So we need to detect that before allocating the buffer for swiotlb.

>
> Shuah Khan this is motivated by kdump.  However a correct implementation
> should be about dealing with the case when there is simply not enough
> memory available below 4G for bounce buffers.
>
> If a device needs an iommu, and swiotlb is the only iommu option, and
> there is not enough memory below 4G, panicking is entirely reasonable.
>
> Do I read this discussion right that we are wasting 64M on systems
> that have the swiotlb code but don't use the swiotlb?

No wasting.


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 22:56               ` Shuah Khan
@ 2013-01-04 23:00                 ` Yinghai Lu
  2013-01-04 23:21                   ` Shuah Khan
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 23:00 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 2:56 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>
> AMD IOMMU driver is using this lever to leave swiotlb enabled when it
> detects devices that can't be supported by iommu. My concern is that
> this change for kdump removes that handshake ability between iommu and
> swiotlb.

No, it does not remove that ability.

I'd like to see the boot log on a system that could be affected by this patch.


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 23:00                 ` Yinghai Lu
@ 2013-01-04 23:21                   ` Shuah Khan
  2013-01-04 23:55                     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-04 23:21 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 1659 bytes --]

On Fri, Jan 4, 2013 at 4:00 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 4, 2013 at 2:56 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>>
>> AMD IOMMU driver is using this lever to leave swiotlb enabled when it
>> detects devices that can't be supported by iommu. My concern is that
>> this change for kdump removes that handshake ability between iommu and
>> swiotlb.
>
> No, it does not remove that ability.
>
> I'd like to see the boot log on system that could be affected by this patch.

You are in luck. I am testing iommu on an AMD system as we speak. I
also enabled amd_iommu_dump for AMD-Vi table dump.

[    3.198194] pci 0000:00:00.2: irq 72 for MSI/MSI-X
[    3.198259] AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
[    3.264727] AMD-Vi: Lazy IO/TLB flushing enabled

You can see here that AMD IOMMU is enabled.

[    3.264733] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    3.264738] Placing 64MB software IO TLB between ffff8800b9ddb000 - ffff8800bdddb000
[    3.264743] software IO TLB at phys 0xb9ddb000 - 0xbdddb000

You can see here that the software IO TLB is left enabled. This is
where the AMD IOMMU driver detects devices the iommu can't handle and
leaves swiotlb enabled even after it has initialized the iommu.

[    3.265516] perf: AMD IBS detected (0x000000ff)
[    3.265731] audit: initializing netlink socket (disabled)
[    3.265742] type=2000 audit(1357265243.260:1): initialized
[    3.321516] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    3.399052] VFS: Disk quotas dquot_6.5.2

Please see attached dmesg for full log. I can do some testing on this
system with your patch if you would like.

-- Shuah

[-- Attachment #2: dmesg.log --]
[-- Type: application/octet-stream, Size: 70721 bytes --]

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.2.0-29-generic (buildd@allspice) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 (Ubuntu 3.2.0-29.46-generic 3.2.24)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-29-generic root=UUID=a6a5f660-ddab-4664-9970-0cc54fabea36 ro earlyprink=vga loglvl=info amd_iommu_dump
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[    0.000000]  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bddde000 (usable)
[    0.000000]  BIOS-e820: 00000000bddde000 - 00000000bde0e000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bde0e000 - 00000000d0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fee10000 (reserved)
[    0.000000]  BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 000000043efff000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.7 present.
[    0.000000] DMI: HP ProLiant DL385p Gen8, BIOS A28 08/14/2012
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x43efff max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-back
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000000 mask FFFF80000000 write-back
[    0.000000]   1 base 000080000000 mask FFFFC0000000 write-back
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] TOM2: 000000043f000000 aka 17392M
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820 update range: 00000000c0000000 - 0000000100000000 (usable) ==> (reserved)
[    0.000000] last_pfn = 0xbddde max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [ffff8800000f4f60] f4f60
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 20480
[    0.000000] Using GB pages for direct mapping
[    0.000000] init_memory_mapping: 0000000000000000-00000000bddde000
[    0.000000]  0000000000 - 0080000000 page 1G
[    0.000000]  0080000000 - 00bdc00000 page 2M
[    0.000000]  00bdc00000 - 00bddde000 page 4k
[    0.000000] kernel direct mapping tables up to bddde000 @ 1fffd000-20000000
[    0.000000] init_memory_mapping: 0000000100000000-000000043efff000
[    0.000000]  0100000000 - 0400000000 page 1G
[    0.000000]  0400000000 - 043ee00000 page 2M
[    0.000000]  043ee00000 - 043efff000 page 4k
[    0.000000] kernel direct mapping tables up to 43efff000 @ bdddb000-bddde000
[    0.000000] RAMDISK: 364d4000 - 37262000
[    0.000000] ACPI: RSDP 00000000000f4ee0 00024 (v02 HP    )
[    0.000000] ACPI: XSDT 00000000bdde1880 000B4 (v01 HP     ProLiant 00000002   \xffffffd2? 0000162E)
[    0.000000] ACPI: FACP 00000000bdde1980 000F4 (v03 HP     ProLiant 00000002   \xffffffd2? 0000162E)
[    0.000000] ACPI: DSDT 00000000bdde1a80 0D2F7 (v01 HP         DSDT 00000001 INTL 20061109)
[    0.000000] ACPI: FACS 00000000bddde140 00040
[    0.000000] ACPI: SPCR 00000000bddde180 00050 (v01 HP     SPCRRBSU 00000001   \xffffffd2? 0000162E)
[    0.000000] ACPI: MCFG 00000000bddde200 0003C (v01 HP     ProLiant 00000001      00000000)
[    0.000000] ACPI: HPET 00000000bddde240 00038 (v01 HP     ProLiant 00000002   \xffffffd2? 0000162E)
[    0.000000] ACPI: SPMI 00000000bddde280 00040 (v05 HP     ProLiant 00000001   \xffffffd2? 0000162E)
[    0.000000] ACPI: ERST 00000000bddde2c0 00230 (v01 HP     ProLiant 00000001   \xffffffd2? 0000162E)
[    0.000000] ACPI: APIC 00000000bddde500 0009E (v01 HP     ProLiant 00000002      00000000)
[    0.000000] ACPI: SRAT 00000000bddde800 00150 (v02 AMD    AGESA    00000001 AMD  00000001)
[    0.000000] ACPI: FFFF 00000000bdddf000 00176 (v01 HP     ProLiant 00000001   \xffffffd2? 0000162E)
[    0.000000] ACPI: BERT 00000000bdddf180 00030 (v01 HP     ProLiant 00000001   \xffffffd2? 0000162E)
[    0.000000] ACPI: HEST 00000000bdddf1c0 0018C (v01 HP     ProLiant 00000001   \xffffffd2? 0000162E)
[    0.000000] ACPI: FFFF 00000000bdddf380 00064 (v02 HP     ProLiant 00000002   \xffffffd2? 0000162E)
[    0.000000] ACPI: SLIT 00000000bdddf400 00030 (v01 AMD    AGESA    00000001 AMD  00000001)
[    0.000000] ACPI: FFFF 00000000bdde1800 0006E (v01 HP     Proliant 00000001   PH 0000504D)
[    0.000000] ACPI: IVRS 00000000bdddf800 00180 (v01  AMD     RD890S 00202031 AMD  00000000)
[    0.000000] ACPI: SSDT 00000000bddeed80 00125 (v03     HP  CRSPCI0 00000002   HP 00000001)
[    0.000000] ACPI: SSDT 00000000bddeeec0 00377 (v01     HP     pmab 00000001 INTL 20120503)
[    0.000000] ACPI: SSDT 00000000bddef240 01714 (v02 AMD    POWERNOW 00000001 AMD  00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] SRAT: PXM 0 -> APIC 0x20 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x21 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x22 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x23 -> Node 0
[    0.000000] SRAT: PXM 1 -> APIC 0x24 -> Node 1
[    0.000000] SRAT: PXM 1 -> APIC 0x25 -> Node 1
[    0.000000] SRAT: PXM 1 -> APIC 0x26 -> Node 1
[    0.000000] SRAT: PXM 1 -> APIC 0x27 -> Node 1
[    0.000000] SRAT: Node 0 PXM 0 0-a0000
[    0.000000] SRAT: Node 0 PXM 0 100000-c0000000
[    0.000000] SRAT: Node 0 PXM 0 100000000-240000000
[    0.000000] SRAT: Node 1 PXM 1 240000000-43f000000
[    0.000000] NUMA: Initialized distance table, cnt=2
[    0.000000] NUMA: Node 0 [0,a0000) + [100000,c0000000) -> [0,c0000000)
[    0.000000] NUMA: Node 0 [0,c0000000) + [100000000,240000000) -> [0,240000000)
[    0.000000] Initmem setup node 0 0000000000000000-0000000240000000
[    0.000000]   NODE_DATA [000000023fffb000 - 000000023fffffff]
[    0.000000] Initmem setup node 1 0000000240000000-000000043efff000
[    0.000000]   NODE_DATA [000000043eff9000 - 000000043effdfff]
[    0.000000]  [ffffea0000000000-ffffea0008ffffff] PMD -> [ffff880237e00000-ffff88023fdfffff] on node 0
[    0.000000]  [ffffea0009000000-ffffea0010ffffff] PMD -> [ffff880436600000-ffff88043e5fffff] on node 1
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x0043efff
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[4] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x0000009f
[    0.000000]     0: 0x00000100 -> 0x000bddde
[    0.000000]     0: 0x00100000 -> 0x00240000
[    0.000000]     1: 0x00240000 -> 0x0043efff
[    0.000000] On node 0 totalpages: 2088301
[    0.000000]   DMA zone: 64 pages used for memmap
[    0.000000]   DMA zone: 5 pages reserved
[    0.000000]   DMA zone: 3914 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 16320 pages used for memmap
[    0.000000]   DMA32 zone: 757278 pages, LIFO batch:31
[    0.000000]   Normal zone: 20480 pages used for memmap
[    0.000000]   Normal zone: 1290240 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 2093055
[    0.000000]   Normal zone: 32704 pages used for memmap
[    0.000000]   Normal zone: 2060351 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0x920
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x20] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x21] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x22] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x23] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x24] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x25] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x26] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x27] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 8, version 33, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: IOAPIC (id[0x09] address[0xfaffc000] gsi_base[24])
[    0.000000] IOAPIC[1]: apic_id 9, version 33, address 0xfaffc000, GSI 24-55
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x1166a201 base: 0xfed00000
[    0.000000] SMP: Allowing 8 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 72
[    0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
[    0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
[    0.000000] PM: Registered nosave memory: 00000000bddde000 - 00000000bde0e000
[    0.000000] PM: Registered nosave memory: 00000000bde0e000 - 00000000d0000000
[    0.000000] PM: Registered nosave memory: 00000000d0000000 - 00000000fec00000
[    0.000000] PM: Registered nosave memory: 00000000fec00000 - 00000000fee10000
[    0.000000] PM: Registered nosave memory: 00000000fee10000 - 00000000ff800000
[    0.000000] PM: Registered nosave memory: 00000000ff800000 - 0000000100000000
[    0.000000] Allocating PCI resources starting at d0000000 (gap: d0000000:2ec00000)
[    0.000000] Booting paravirtualized kernel on bare hardware
[    0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:8 nr_node_ids:2
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff880237c00000 s83072 r8192 d23424 u524288
[    0.000000] pcpu-alloc: s83072 r8192 d23424 u524288 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7 
[    0.000000] Built 2 zonelists in Zone order, mobility grouping on.  Total pages: 4111783
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-29-generic root=UUID=a6a5f660-ddab-4664-9970-0cc54fabea36 ro earlyprink=vga loglvl=info amd_iommu_dump
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Node 0: aperture @ efb4000000 size 32 MB
[    0.000000] Aperture beyond 4GB. Ignoring.
[    0.000000] Your BIOS doesn't leave a aperture memory hole
[    0.000000] Please enable the IOMMU option in the BIOS setup
[    0.000000] This costs you 64 MB of RAM
[    0.000000] Mapping aperture over 65536 KB of RAM @ b4000000
[    0.000000] PM: Registered nosave memory: 00000000b4000000 - 00000000b8000000
[    0.000000] Memory: 16300960k/17809404k available (6555k kernel code, 1083980k absent, 424464k reserved, 6645k data, 920k init)
[    0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=2
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000] NR_IRQS:16640 nr_irqs:1288 16
[    0.000000] Extended CMOS year: 2000
[    0.000000] spurious 8259A interrupt: IRQ7.
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] allocated 134217728 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] hpet clockevent registered
[    0.000000] Fast TSC calibration using PIT
[    0.000000] Detected 3192.021 MHz processor.
[    0.008004] Calibrating delay loop (skipped), value calculated using timer frequency.. 6384.04 BogoMIPS (lpj=12768084)
[    0.008009] pid_max: default: 32768 minimum: 301
[    0.008051] Security Framework initialized
[    0.008062] AppArmor: AppArmor initialized
[    0.008064] Yama: becoming mindful.
[    0.009591] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[    0.014979] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.017169] Mount-cache hash table entries: 256
[    0.017305] Initializing cgroup subsys cpuacct
[    0.017312] Initializing cgroup subsys memory
[    0.017331] Initializing cgroup subsys devices
[    0.017334] Initializing cgroup subsys freezer
[    0.017337] Initializing cgroup subsys blkio
[    0.017343] Initializing cgroup subsys perf_event
[    0.017369] tseg: 00be000000
[    0.017371] CPU: Physical Processor ID: 0
[    0.017373] CPU: Processor Core ID: 0
[    0.017376] mce: CPU supports 7 MCE banks
[    0.018800] ACPI: Core revision 20110623
[    0.024713] ftrace: allocating 26998 entries in 106 pages
[    0.028663] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.068921] CPU0: AMD Opteron(tm) Processor 6328                  stepping 00
[    0.072003] Performance Events: Broken BIOS detected, complain to your hardware vendor.
[    0.072003] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010202 is 430076)
[    0.072003] AMD Family 15h PMU driver.
[    0.072003] ... version:                0
[    0.072003] ... bit width:              48
[    0.072003] ... generic registers:      6
[    0.072003] ... value mask:             0000ffffffffffff
[    0.072003] ... max period:             00007fffffffffff
[    0.072003] ... fixed-purpose events:   0
[    0.072003] ... event mask:             000000000000003f
[    0.072003] NMI watchdog enabled, takes one hw-pmu counter.
[    0.072003] Booting Node   0, Processors  #1
[    0.072003] smpboot cpu 1: start_ip = 9a000
[    0.160030] NMI watchdog enabled, takes one hw-pmu counter.
[    0.160129]  #2
[    0.160131] smpboot cpu 2: start_ip = 9a000
[    0.252029] NMI watchdog enabled, takes one hw-pmu counter.
[    0.252098]  #3
[    0.252100] smpboot cpu 3: start_ip = 9a000
[    0.344031] NMI watchdog enabled, takes one hw-pmu counter.
[    0.344103]  Ok.
[    0.344106] Booting Node   1, Processors  #4
[    0.344109] smpboot cpu 4: start_ip = 9a000
[    0.012000] NUMA core number 1 differs from configured core number 0
[    0.436044] NMI watchdog enabled, takes one hw-pmu counter.
[    0.436127]  #5
[    0.436129] smpboot cpu 5: start_ip = 9a000
[    0.012000] NUMA core number 1 differs from configured core number 0
[    0.528046] NMI watchdog enabled, takes one hw-pmu counter.
[    0.528133]  #6
[    0.528135] smpboot cpu 6: start_ip = 9a000
[    0.012000] NUMA core number 1 differs from configured core number 0
[    0.620054] NMI watchdog enabled, takes one hw-pmu counter.
[    0.620125]  #7 Ok.
[    0.620127] smpboot cpu 7: start_ip = 9a000
[    0.012000] NUMA core number 1 differs from configured core number 0
[    0.712062] NMI watchdog enabled, takes one hw-pmu counter.
[    0.712087] Brought up 8 CPUs
[    0.712091] Total of 8 processors activated (51071.14 BogoMIPS).
[    0.716393] devtmpfs: initialized
[    0.721069] EVM: security.selinux
[    0.721072] EVM: security.SMACK64
[    0.721074] EVM: security.capability
[    0.721119] print_constraints: dummy: 
[    0.721119] RTC time:  2:07:21, date: 01/04/13
[    0.721119] NET: Registered protocol family 16
[    0.721119] Extended Config Space enabled on 2 nodes
[    0.721803] Trying to unpack rootfs image as initramfs...
[    0.721119] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[    0.721119] ACPI: bus type pci registered
[    0.721119] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xc0000000-0xcfffffff] (base 0xc0000000)
[    0.721119] PCI: MMCONFIG at [mem 0xc0000000-0xcfffffff] reserved in E820
[    0.746031] PCI: Using configuration type 1 for base access
[    0.747098] bio: create slab <bio-0> at 0
[    0.747098] ACPI: Added _OSI(Module Device)
[    0.747098] ACPI: Added _OSI(Processor Device)
[    0.747098] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.747098] ACPI: Added _OSI(Processor Aggregator Device)
[    0.749357] ACPI: EC: Look up EC in DSDT
[    0.769443] ACPI: Interpreter enabled
[    0.769448] ACPI: (supports S0 S4 S5)
[    0.769465] ACPI: Using IOAPIC for interrupt routing
[    0.781694] ACPI: No dock devices found.
[    0.781700] HEST: Table parsing has been initialized.
[    0.781705] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.781793] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-3f])
[    0.781844] pci_root PNP0A08:00: host bridge window [mem 0xfaf00000-0xfdffffff]
[    0.781850] pci_root PNP0A08:00: host bridge window [io  0x1000-0xffff]
[    0.781854] pci_root PNP0A08:00: host bridge window [io  0x0000-0x03af]
[    0.781859] pci_root PNP0A08:00: host bridge window [io  0x03e0-0x0cf7]
[    0.781863] pci_root PNP0A08:00: host bridge window [io  0x0d00-0x0fff]
[    0.781868] pci_root PNP0A08:00: host bridge window [mem 0xfed00000-0xfed03fff]
[    0.781873] pci_root PNP0A08:00: host bridge window [mem 0xfed40000-0xfed44fff]
[    0.781878] pci_root PNP0A08:00: host bridge window [io  0x03b0-0x03bb]
[    0.781883] pci_root PNP0A08:00: host bridge window [io  0x03c0-0x03df]
[    0.781887] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
[    0.781909] pci 0000:00:00.0: [1002:5a10] type 0 class 0x000600
[    0.781966] pci 0000:00:00.2: [1002:5a23] type 0 class 0x000806
[    0.782023] pci 0000:00:02.0: [1002:5a16] type 1 class 0x000604
[    0.782056] pci 0000:00:02.0: PME# supported from D0 D3hot D3cold
[    0.782059] pci 0000:00:02.0: PME# disabled
[    0.782082] pci 0000:00:0a.0: [1002:5a1d] type 1 class 0x000604
[    0.782116] pci 0000:00:0a.0: PME# supported from D0 D3hot D3cold
[    0.782118] pci 0000:00:0a.0: PME# disabled
[    0.782134] pci 0000:00:0c.0: [1002:5a20] type 1 class 0x000604
[    0.782166] pci 0000:00:0c.0: PME# supported from D0 D3hot D3cold
[    0.782169] pci 0000:00:0c.0: PME# disabled
[    0.782194] pci 0000:00:11.0: [1002:4390] type 0 class 0x000101
[    0.782213] pci 0000:00:11.0: reg 10: [io  0x1000-0x1007]
[    0.782223] pci 0000:00:11.0: reg 14: [io  0x1008-0x100b]
[    0.782232] pci 0000:00:11.0: reg 18: [io  0x1010-0x1017]
[    0.782241] pci 0000:00:11.0: reg 1c: [io  0x1018-0x101b]
[    0.782251] pci 0000:00:11.0: reg 20: [io  0x1020-0x102f]
[    0.782260] pci 0000:00:11.0: reg 24: [mem 0xfccf0000-0xfccf03ff]
[    0.782280] pci 0000:00:11.0: set SATA to AHCI mode
[    0.782329] pci 0000:00:12.0: [1002:4397] type 0 class 0x000c03
[    0.782342] pci 0000:00:12.0: reg 10: [mem 0xfcce0000-0xfcce0fff]
[    0.782406] pci 0000:00:12.1: [1002:4398] type 0 class 0x000c03
[    0.782419] pci 0000:00:12.1: reg 10: [mem 0xfccd0000-0xfccd0fff]
[    0.782488] pci 0000:00:12.2: [1002:4396] type 0 class 0x000c03
[    0.782507] pci 0000:00:12.2: reg 10: [mem 0xfccc0000-0xfccc00ff]
[    0.782615] pci 0000:00:12.2: supports D1 D2
[    0.782616] pci 0000:00:12.2: PME# supported from D0 D1 D2 D3hot
[    0.782620] pci 0000:00:12.2: PME# disabled
[    0.782641] pci 0000:00:13.0: [1002:4397] type 0 class 0x000c03
[    0.782654] pci 0000:00:13.0: reg 10: [mem 0xfccb0000-0xfccb0fff]
[    0.782719] pci 0000:00:13.1: [1002:4398] type 0 class 0x000c03
[    0.782732] pci 0000:00:13.1: reg 10: [mem 0xfcca0000-0xfcca0fff]
[    0.782801] pci 0000:00:13.2: [1002:4396] type 0 class 0x000c03
[    0.782820] pci 0000:00:13.2: reg 10: [mem 0xfcc90000-0xfcc900ff]
[    0.782901] pci 0000:00:13.2: supports D1 D2
[    0.782902] pci 0000:00:13.2: PME# supported from D0 D1 D2 D3hot
[    0.782906] pci 0000:00:13.2: PME# disabled
[    0.782930] pci 0000:00:14.0: [1002:4385] type 0 class 0x000c05
[    0.783028] pci 0000:00:14.1: [1002:439c] type 0 class 0x000101
[    0.783044] pci 0000:00:14.1: reg 10: [io  0x01f0-0x01f7]
[    0.783053] pci 0000:00:14.1: reg 14: [io  0x03f4-0x03f7]
[    0.783063] pci 0000:00:14.1: reg 18: [io  0x0170-0x0177]
[    0.783072] pci 0000:00:14.1: reg 1c: [io  0x0374-0x0377]
[    0.783081] pci 0000:00:14.1: reg 20: [io  0x0500-0x050f]
[    0.783136] pci 0000:00:14.3: [1002:439d] type 0 class 0x000601
[    0.783211] pci 0000:00:14.4: [1002:4384] type 1 class 0x000604
[    0.783259] pci 0000:00:18.0: [1022:1600] type 0 class 0x000600
[    0.783287] pci 0000:00:18.1: [1022:1601] type 0 class 0x000600
[    0.783305] pci 0000:00:18.2: [1022:1602] type 0 class 0x000600
[    0.783320] pci 0000:00:18.3: [1022:1603] type 0 class 0x000600
[    0.783341] pci 0000:00:18.4: [1022:1604] type 0 class 0x000600
[    0.783364] pci 0000:00:18.5: [1022:1605] type 0 class 0x000600
[    0.783382] pci 0000:00:19.0: [1022:1600] type 0 class 0x000600
[    0.783410] pci 0000:00:19.1: [1022:1601] type 0 class 0x000600
[    0.783426] pci 0000:00:19.2: [1022:1602] type 0 class 0x000600
[    0.783444] pci 0000:00:19.3: [1022:1603] type 0 class 0x000600
[    0.783468] pci 0000:00:19.4: [1022:1604] type 0 class 0x000600
[    0.783493] pci 0000:00:19.5: [1022:1605] type 0 class 0x000600
[    0.783548] pci 0000:03:00.0: [103c:323b] type 0 class 0x000104
[    0.783561] pci 0000:03:00.0: reg 10: [mem 0xfdf00000-0xfdffffff 64bit]
[    0.783571] pci 0000:03:00.0: reg 18: [mem 0xfdef0000-0xfdef03ff 64bit]
[    0.783578] pci 0000:03:00.0: reg 20: [io  0x4000-0x40ff]
[    0.783590] pci 0000:03:00.0: reg 30: [mem 0x00000000-0x0007ffff pref]
[    0.783623] pci 0000:03:00.0: PME# supported from D0 D1 D3hot
[    0.783626] pci 0000:03:00.0: PME# disabled
[    0.788063] pci 0000:00:02.0: PCI bridge to [bus 03-03]
[    0.788070] pci 0000:00:02.0:   bridge window [io  0x4000-0x4fff]
[    0.788073] pci 0000:00:02.0:   bridge window [mem 0xfde00000-0xfdffffff]
[    0.788119] pci 0000:02:00.0: [103c:3306] type 0 class 0x000880
[    0.788133] pci 0000:02:00.0: reg 10: [io  0x3000-0x30ff]
[    0.788143] pci 0000:02:00.0: reg 14: [mem 0xfddf0000-0xfddf01ff]
[    0.788153] pci 0000:02:00.0: reg 18: [io  0x3400-0x34ff]
[    0.788259] pci 0000:02:00.1: [102b:0533] type 0 class 0x000300
[    0.788273] pci 0000:02:00.1: reg 10: [mem 0xfb000000-0xfbffffff pref]
[    0.788283] pci 0000:02:00.1: reg 14: [mem 0xfdde0000-0xfdde3fff]
[    0.788293] pci 0000:02:00.1: reg 18: [mem 0xfd000000-0xfd7fffff]
[    0.788397] pci 0000:02:00.2: [103c:3307] type 0 class 0x000880
[    0.788411] pci 0000:02:00.2: reg 10: [io  0x3800-0x38ff]
[    0.788420] pci 0000:02:00.2: reg 14: [mem 0xfcff0000-0xfcff00ff]
[    0.788430] pci 0000:02:00.2: reg 18: [mem 0xfce00000-0xfcefffff]
[    0.788440] pci 0000:02:00.2: reg 1c: [mem 0xfcd80000-0xfcdfffff]
[    0.788450] pci 0000:02:00.2: reg 20: [mem 0xfcd70000-0xfcd77fff]
[    0.788460] pci 0000:02:00.2: reg 24: [mem 0xfcd60000-0xfcd67fff]
[    0.788470] pci 0000:02:00.2: reg 30: [mem 0x00000000-0x0000ffff pref]
[    0.788515] pci 0000:02:00.2: PME# supported from D0 D3hot D3cold
[    0.788519] pci 0000:02:00.2: PME# disabled
[    0.788548] pci 0000:02:00.4: [103c:3300] type 0 class 0x000c03
[    0.788594] pci 0000:02:00.4: reg 20: [io  0x3c00-0x3c1f]
[    0.796067] pci 0000:00:0a.0: PCI bridge to [bus 02-02]
[    0.796074] pci 0000:00:0a.0:   bridge window [io  0x3000-0x3fff]
[    0.796076] pci 0000:00:0a.0:   bridge window [mem 0xfcd00000-0xfddfffff]
[    0.796080] pci 0000:00:0a.0:   bridge window [mem 0xfb000000-0xfbffffff 64bit pref]
[    0.796120] pci 0000:04:00.0: [14e4:1657] type 0 class 0x000200
[    0.796134] pci 0000:04:00.0: reg 10: [mem 0xfcbf0000-0xfcbfffff 64bit pref]
[    0.796146] pci 0000:04:00.0: reg 18: [mem 0xfcbe0000-0xfcbeffff 64bit pref]
[    0.796157] pci 0000:04:00.0: reg 20: [mem 0xfcbd0000-0xfcbdffff 64bit pref]
[    0.796165] pci 0000:04:00.0: reg 30: [mem 0x00000000-0x0001ffff pref]
[    0.796209] pci 0000:04:00.0: PME# supported from D0 D3hot D3cold
[    0.796213] pci 0000:04:00.0: PME# disabled
[    0.796248] pci 0000:04:00.1: [14e4:1657] type 0 class 0x000200
[    0.796262] pci 0000:04:00.1: reg 10: [mem 0xfcbc0000-0xfcbcffff 64bit pref]
[    0.796273] pci 0000:04:00.1: reg 18: [mem 0xfcbb0000-0xfcbbffff 64bit pref]
[    0.796284] pci 0000:04:00.1: reg 20: [mem 0xfcba0000-0xfcbaffff 64bit pref]
[    0.796292] pci 0000:04:00.1: reg 30: [mem 0x00000000-0x0001ffff pref]
[    0.796336] pci 0000:04:00.1: PME# supported from D0 D3hot D3cold
[    0.796339] pci 0000:04:00.1: PME# disabled
[    0.796367] pci 0000:04:00.2: [14e4:1657] type 0 class 0x000200
[    0.796381] pci 0000:04:00.2: reg 10: [mem 0xfcb90000-0xfcb9ffff 64bit pref]
[    0.796392] pci 0000:04:00.2: reg 18: [mem 0xfcb80000-0xfcb8ffff 64bit pref]
[    0.796403] pci 0000:04:00.2: reg 20: [mem 0xfcb70000-0xfcb7ffff 64bit pref]
[    0.796411] pci 0000:04:00.2: reg 30: [mem 0x00000000-0x0001ffff pref]
[    0.796455] pci 0000:04:00.2: PME# supported from D0 D3hot D3cold
[    0.796458] pci 0000:04:00.2: PME# disabled
[    0.796486] pci 0000:04:00.3: [14e4:1657] type 0 class 0x000200
[    0.796500] pci 0000:04:00.3: reg 10: [mem 0xfcb60000-0xfcb6ffff 64bit pref]
[    0.796511] pci 0000:04:00.3: reg 18: [mem 0xfcb50000-0xfcb5ffff 64bit pref]
[    0.796522] pci 0000:04:00.3: reg 20: [mem 0xfcb40000-0xfcb4ffff 64bit pref]
[    0.796531] pci 0000:04:00.3: reg 30: [mem 0x00000000-0x0001ffff pref]
[    0.796575] pci 0000:04:00.3: PME# supported from D0 D3hot D3cold
[    0.796578] pci 0000:04:00.3: PME# disabled
[    0.804069] pci 0000:00:0c.0: PCI bridge to [bus 04-04]
[    0.804078] pci 0000:00:0c.0:   bridge window [mem 0xfcb00000-0xfcbfffff 64bit pref]
[    0.804143] pci 0000:00:14.4: PCI bridge to [bus 01-01] (subtractive decode)
[    0.804155] pci 0000:00:14.4:   bridge window [mem 0xfaf00000-0xfdffffff] (subtractive decode)
[    0.804157] pci 0000:00:14.4:   bridge window [io  0x1000-0xffff] (subtractive decode)
[    0.804159] pci 0000:00:14.4:   bridge window [io  0x0000-0x03af] (subtractive decode)
[    0.804161] pci 0000:00:14.4:   bridge window [io  0x03e0-0x0cf7] (subtractive decode)
[    0.804164] pci 0000:00:14.4:   bridge window [io  0x0d00-0x0fff] (subtractive decode)
[    0.804166] pci 0000:00:14.4:   bridge window [mem 0xfed00000-0xfed03fff] (subtractive decode)
[    0.804168] pci 0000:00:14.4:   bridge window [mem 0xfed40000-0xfed44fff] (subtractive decode)
[    0.804170] pci 0000:00:14.4:   bridge window [io  0x03b0-0x03bb] (subtractive decode)
[    0.804172] pci 0000:00:14.4:   bridge window [io  0x03c0-0x03df] (subtractive decode)
[    0.804174] pci 0000:00:14.4:   bridge window [mem 0x000a0000-0x000bffff] (subtractive decode)
[    0.804197] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.804351] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.G13A._PRT]
[    0.804393] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.G112._PRT]
[    0.804422] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.G12C._PRT]
[    0.804584]  pci0000:00: Requesting ACPI _OSC control (0x1d)
[    0.804647]  pci0000:00: ACPI _OSC request failed (AE_SUPPORT), returned control mask: 0x00
[    0.804653] ACPI _OSC control for PCIe not granted, disabling ASPM
[    0.895819] ACPI: PCI Interrupt Link [I020] (IRQs *24)
[    0.896307] ACPI: PCI Interrupt Link [I021] (IRQs *25)
[    0.896780] ACPI: PCI Interrupt Link [I022] (IRQs *26)
[    0.897256] ACPI: PCI Interrupt Link [I023] (IRQs *27)
[    0.897717] ACPI: PCI Interrupt Link [I030] (IRQs *28)
[    0.898184] ACPI: PCI Interrupt Link [I031] (IRQs *29)
[    0.898658] ACPI: PCI Interrupt Link [I032] (IRQs *30)
[    0.899136] ACPI: PCI Interrupt Link [I033] (IRQs *31)
[    0.899601] ACPI: PCI Interrupt Link [I040] (IRQs *44)
[    0.900093] ACPI: PCI Interrupt Link [I041] (IRQs *45)
[    0.900588] ACPI: PCI Interrupt Link [I042] (IRQs *46)
[    0.901085] ACPI: PCI Interrupt Link [I043] (IRQs *47)
[    0.901561] ACPI: PCI Interrupt Link [I050] (IRQs *48)
[    0.902043] ACPI: PCI Interrupt Link [I051] (IRQs *49)
[    0.902531] ACPI: PCI Interrupt Link [I052] (IRQs *50)
[    0.903033] ACPI: PCI Interrupt Link [I053] (IRQs *51)
[    0.903535] ACPI: PCI Interrupt Link [I060] (IRQs *47)
[    0.904013] ACPI: PCI Interrupt Link [I061] (IRQs *44)
[    0.904520] ACPI: PCI Interrupt Link [I062] (IRQs *45)
[    0.905015] ACPI: PCI Interrupt Link [I063] (IRQs *46)
[    0.905503] ACPI: PCI Interrupt Link [I070] (IRQs *24)
[    0.906001] ACPI: PCI Interrupt Link [I071] (IRQs *25)
[    0.906504] ACPI: PCI Interrupt Link [I072] (IRQs *26)
[    0.907016] ACPI: PCI Interrupt Link [I073] (IRQs *27)
[    0.907126] ACPI: PCI Interrupt Link [I090] (IRQs *24)
[    0.907236] ACPI: PCI Interrupt Link [I091] (IRQs *24)
[    0.907345] ACPI: PCI Interrupt Link [I092] (IRQs *24)
[    0.907454] ACPI: PCI Interrupt Link [I093] (IRQs *24)
[    0.907958] ACPI: PCI Interrupt Link [I0A0] (IRQs *24)
[    0.908488] ACPI: PCI Interrupt Link [I0A1] (IRQs *25)
[    0.909007] ACPI: PCI Interrupt Link [I0A2] (IRQs *26)
[    0.909530] ACPI: PCI Interrupt Link [I0A3] (IRQs *27)
[    0.910038] ACPI: PCI Interrupt Link [I0B0] (IRQs *32)
[    0.910555] ACPI: PCI Interrupt Link [I0B1] (IRQs *33)
[    0.911079] ACPI: PCI Interrupt Link [I0B2] (IRQs *34)
[    0.911608] ACPI: PCI Interrupt Link [I0B3] (IRQs *35)
[    0.912143] ACPI: PCI Interrupt Link [I0C0] (IRQs *36)
[    0.912675] ACPI: PCI Interrupt Link [I0C1] (IRQs *37)
[    0.913212] ACPI: PCI Interrupt Link [I0C2] (IRQs *38)
[    0.913754] ACPI: PCI Interrupt Link [I0C3] (IRQs *39)
[    0.914278] ACPI: PCI Interrupt Link [I0D0] (IRQs *40)
[    0.914810] ACPI: PCI Interrupt Link [I0D1] (IRQs *41)
[    0.915350] ACPI: PCI Interrupt Link [I0D2] (IRQs *42)
[    0.915898] ACPI: PCI Interrupt Link [I0D3] (IRQs *43)
[    0.915948] ACPI: PCI Interrupt Link [BI02] (IRQs *52)
[    0.916003] ACPI: PCI Interrupt Link [BI03] (IRQs *52)
[    0.916082] ACPI: PCI Interrupt Link [BI04] (IRQs *52)
[    0.916156] ACPI: PCI Interrupt Link [BI05] (IRQs *52)
[    0.916233] ACPI: PCI Interrupt Link [BI06] (IRQs *54)
[    0.916318] ACPI: PCI Interrupt Link [BI07] (IRQs *24)
[    0.916408] ACPI: PCI Interrupt Link [BI08] (IRQs *24)
[    0.916505] ACPI: PCI Interrupt Link [BI09] (IRQs *24)
[    0.916609] ACPI: PCI Interrupt Link [BI0A] (IRQs *24)
[    0.916721] ACPI: PCI Interrupt Link [BI0B] (IRQs *54)
[    0.916840] ACPI: PCI Interrupt Link [BI0C] (IRQs *54)
[    0.916965] ACPI: PCI Interrupt Link [BI0D] (IRQs *54)
[    0.917012] ACPI: PCI Interrupt Link [PI20] (IRQs 10 11) *0, disabled.
[    0.917081] ACPI: PCI Interrupt Link [PI21] (IRQs 10 11) *0, disabled.
[    0.917146] ACPI: PCI Interrupt Link [PI22] (IRQs 10 11) *0, disabled.
[    0.917209] ACPI: PCI Interrupt Link [PI23] (IRQs 10 11) *0, disabled.
[    0.917293] ACPI: PCI Interrupt Link [PI30] (IRQs 10 11) *0, disabled.
[    0.917357] ACPI: PCI Interrupt Link [PI31] (IRQs 10 11) *0, disabled.
[    0.917419] ACPI: PCI Interrupt Link [PI32] (IRQs 10 11) *0, disabled.
[    0.917481] ACPI: PCI Interrupt Link [PI33] (IRQs 10 11) *0, disabled.
[    0.917543] ACPI: PCI Interrupt Link [PI40] (IRQs 10 11) *0, disabled.
[    0.917604] ACPI: PCI Interrupt Link [PI41] (IRQs 10 11) *0, disabled.
[    0.917665] ACPI: PCI Interrupt Link [PI42] (IRQs 10 11) *0, disabled.
[    0.917726] ACPI: PCI Interrupt Link [PI43] (IRQs 10 11) *0, disabled.
[    0.917788] ACPI: PCI Interrupt Link [PI50] (IRQs 10 11) *0, disabled.
[    0.917849] ACPI: PCI Interrupt Link [PI51] (IRQs 10 11) *0, disabled.
[    0.917910] ACPI: PCI Interrupt Link [PI52] (IRQs 10 11) *0, disabled.
[    0.917974] ACPI: PCI Interrupt Link [PI53] (IRQs 10 11) *0, disabled.
[    0.918036] ACPI: PCI Interrupt Link [PI60] (IRQs 10 11) *0, disabled.
[    0.918097] ACPI: PCI Interrupt Link [PI61] (IRQs 10 11) *0, disabled.
[    0.918159] ACPI: PCI Interrupt Link [PI62] (IRQs 10 11) *0, disabled.
[    0.918220] ACPI: PCI Interrupt Link [PI63] (IRQs 10 11) *0, disabled.
[    0.918281] ACPI: PCI Interrupt Link [PI70] (IRQs 10 11) *0, disabled.
[    0.918343] ACPI: PCI Interrupt Link [PI71] (IRQs 10 11) *0, disabled.
[    0.918404] ACPI: PCI Interrupt Link [PI72] (IRQs 10 11) *0, disabled.
[    0.918465] ACPI: PCI Interrupt Link [PI73] (IRQs 10 11) *0, disabled.
[    0.918539] ACPI: PCI Interrupt Link [PI90] (IRQs 10 11) *0, disabled.
[    0.918600] ACPI: PCI Interrupt Link [PI91] (IRQs 10 11) *0, disabled.
[    0.918661] ACPI: PCI Interrupt Link [PI92] (IRQs 10 11) *0, disabled.
[    0.918722] ACPI: PCI Interrupt Link [PI93] (IRQs 10 11) *0, disabled.
[    0.918783] ACPI: PCI Interrupt Link [PIA0] (IRQs 10 11) *0, disabled.
[    0.918847] ACPI: PCI Interrupt Link [PIA1] (IRQs 10 11) *0, disabled.
[    0.918908] ACPI: PCI Interrupt Link [PIA2] (IRQs 10 11) *0, disabled.
[    0.918969] ACPI: PCI Interrupt Link [PIA3] (IRQs 10 11) *0, disabled.
[    0.919031] ACPI: PCI Interrupt Link [PIB0] (IRQs 10 11) *0, disabled.
[    0.919096] ACPI: PCI Interrupt Link [PIB1] (IRQs 10 11) *0, disabled.
[    0.919157] ACPI: PCI Interrupt Link [PIB2] (IRQs 10 11) *0, disabled.
[    0.919218] ACPI: PCI Interrupt Link [PIB3] (IRQs 10 11) *0, disabled.
[    0.919280] ACPI: PCI Interrupt Link [PIC0] (IRQs 10 11) *0, disabled.
[    0.919341] ACPI: PCI Interrupt Link [PIC1] (IRQs 10 11) *0, disabled.
[    0.919403] ACPI: PCI Interrupt Link [PIC2] (IRQs 10 11) *0, disabled.
[    0.919464] ACPI: PCI Interrupt Link [PIC3] (IRQs 10 11) *0, disabled.
[    0.919526] ACPI: PCI Interrupt Link [PID0] (IRQs 10 11) *0, disabled.
[    0.919587] ACPI: PCI Interrupt Link [PID1] (IRQs 10 11) *0, disabled.
[    0.919648] ACPI: PCI Interrupt Link [PID2] (IRQs 10 11) *0, disabled.
[    0.919736] ACPI: PCI Interrupt Link [PID3] (IRQs 10 11) *0, disabled.
[    0.919797] ACPI: PCI Interrupt Link [PIR2] (IRQs 10 11) *0, disabled.
[    0.919859] ACPI: PCI Interrupt Link [PIR3] (IRQs 10 11) *0, disabled.
[    0.919920] ACPI: PCI Interrupt Link [PIR4] (IRQs 10 11) *0, disabled.
[    0.919981] ACPI: PCI Interrupt Link [PIR5] (IRQs 10 11) *0, disabled.
[    0.920043] ACPI: PCI Interrupt Link [PIR6] (IRQs 10 11) *0, disabled.
[    0.920120] ACPI: PCI Interrupt Link [PIR7] (IRQs 10 11) *0, disabled.
[    0.920183] ACPI: PCI Interrupt Link [PIR8] (IRQs 10 11) *0, disabled.
[    0.920245] ACPI: PCI Interrupt Link [PIR9] (IRQs 10 11) *0, disabled.
[    0.920306] ACPI: PCI Interrupt Link [PIRA] (IRQs 10 11) *0, disabled.
[    0.920368] ACPI: PCI Interrupt Link [PIRB] (IRQs 10 11) *0, disabled.
[    0.920430] ACPI: PCI Interrupt Link [PIRC] (IRQs 10 11) *0, disabled.
[    0.920491] ACPI: PCI Interrupt Link [PIRD] (IRQs 10 11) *0, disabled.
[    0.920567] ACPI: PCI Interrupt Link [USB1] (IRQs *22)
[    0.920633] ACPI: PCI Interrupt Link [USB2] (IRQs *23)
[    0.920705] ACPI: PCI Interrupt Link [USB3] (IRQs *23)
[    0.920784] ACPI: PCI Interrupt Link [USB4] (IRQs *22)
[    0.920894] ACPI: Invalid _PRS IRQ 0
[    0.921021] ACPI: PCI Interrupt Link [U1PI] (IRQs) *0
[    0.921133] ACPI: Invalid _PRS IRQ 0
[    0.921237] ACPI: PCI Interrupt Link [U2PI] (IRQs) *0
[    0.921351] ACPI: Invalid _PRS IRQ 0
[    0.921456] ACPI: PCI Interrupt Link [U3PI] (IRQs) *0
[    0.921571] ACPI: Invalid _PRS IRQ 0
[    0.921678] ACPI: PCI Interrupt Link [U4PI] (IRQs) *0
[    0.921721] ACPI: PCI Interrupt Link [SATA] (IRQs *16)
[    0.921772] ACPI: Invalid _PRS IRQ 0
[    0.921815] ACPI: PCI Interrupt Link [SATP] (IRQs) *0
[    0.921906] vgaarb: device added: PCI:0000:02:00.1,decodes=io+mem,owns=io+mem,locks=none
[    0.921906] vgaarb: loaded
[    0.921906] vgaarb: bridge control possible 0000:02:00.1
[    0.921906] i2c-core: driver [aat2870] using legacy suspend method
[    0.921906] i2c-core: driver [aat2870] using legacy resume method
[    0.921906] SCSI subsystem initialized
[    0.922013] libata version 3.00 loaded.
[    0.922013] usbcore: registered new interface driver usbfs
[    0.922013] usbcore: registered new interface driver hub
[    0.922013] usbcore: registered new device driver usb
[    0.922013] PCI: Using ACPI for IRQ routing
[    0.925505] Freeing initrd memory: 13880k freed
[    0.930962] PCI: pci_cache_line_size set to 64 bytes
[    0.931065] reserve RAM buffer: 000000000009f400 - 000000000009ffff 
[    0.931068] reserve RAM buffer: 00000000bddde000 - 00000000bfffffff 
[    0.931070] reserve RAM buffer: 000000043efff000 - 000000043fffffff 
[    0.931159] NetLabel: Initializing
[    0.931163] NetLabel:  domain hash size = 128
[    0.931166] NetLabel:  protocols = UNLABELED CIPSOv4
[    0.931178] NetLabel:  unlabeled traffic allowed by default
[    0.931208] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
[    0.931208] hpet0: 4 comparators, 32-bit 14.318180 MHz counter
[    0.933100] Switching to clocksource hpet
[    0.938725] AppArmor: AppArmor Filesystem Enabled
[    0.938748] pnp: PnP ACPI init
[    0.938763] ACPI: bus type pnp registered
[    0.938796] pnp 00:00: [bus 00-3f]
[    0.938798] pnp 00:00: [mem 0xfaf00000-0xfdffffff window]
[    0.938801] pnp 00:00: [io  0x1000-0xffff window]
[    0.938803] pnp 00:00: [io  0x0000-0x03af window]
[    0.938805] pnp 00:00: [io  0x03e0-0x0cf7 window]
[    0.938807] pnp 00:00: [io  0x0d00-0x0fff window]
[    0.938809] pnp 00:00: [mem 0xfed00000-0xfed03fff window]
[    0.938811] pnp 00:00: [mem 0xfed40000-0xfed44fff window]
[    0.938812] pnp 00:00: [io  0x03b0-0x03bb window]
[    0.938814] pnp 00:00: [io  0x03c0-0x03df window]
[    0.938816] pnp 00:00: [mem 0x000a0000-0x000bffff window]
[    0.938900] pnp 00:00: Plug and Play ACPI device, IDs PNP0a08 PNP0a03 (active)
[    0.941255] pnp 00:01: [io  0x0010-0x001f]
[    0.941257] pnp 00:01: [io  0x0020-0x003f]
[    0.941259] pnp 00:01: [io  0x00a0-0x00bf]
[    0.941260] pnp 00:01: [io  0x0050-0x0053]
[    0.941262] pnp 00:01: [io  0x0070-0x0079]
[    0.941263] pnp 00:01: [io  0x0090-0x009f]
[    0.941265] pnp 00:01: [io  0x00f0]
[    0.941266] pnp 00:01: [io  0x0379-0x037a]
[    0.941268] pnp 00:01: [io  0x0400-0x043f]
[    0.941270] pnp 00:01: [io  0x04d0-0x04d1]
[    0.941271] pnp 00:01: [io  0x04d6]
[    0.941272] pnp 00:01: [io  0x0520]
[    0.941276] pnp 00:01: [io  0x0580-0x059f]
[    0.941278] pnp 00:01: [io  0x0600-0x067f]
[    0.941279] pnp 00:01: [io  0x0700-0x0703]
[    0.941281] pnp 00:01: [io  0x0820-0x082f]
[    0.941282] pnp 00:01: [io  0x0900-0x09fe]
[    0.941284] pnp 00:01: [io  0x0c06-0x0c07]
[    0.941285] pnp 00:01: [io  0x0c14]
[    0.941287] pnp 00:01: [io  0x0c4a]
[    0.941288] pnp 00:01: [io  0x0c50-0x0c52]
[    0.941290] pnp 00:01: [io  0x0c6c]
[    0.941291] pnp 00:01: [io  0x0c6f]
[    0.941292] pnp 00:01: [io  0x0c80-0x0c83]
[    0.941294] pnp 00:01: [io  0x0c90-0x0c9f]
[    0.941295] pnp 00:01: [io  0x0ca0-0x0ca5]
[    0.941297] pnp 00:01: [io  0x0cd0-0x0cdf]
[    0.941298] pnp 00:01: [io  0x0f50-0x0f58]
[    0.941300] pnp 00:01: [io  0x0b00-0x0b3f]
[    0.941301] pnp 00:01: [io  0x03f8-0x03ff]
[    0.941303] pnp 00:01: [mem 0xc0000000-0xcfffffff]
[    0.941415] system 00:01: [io  0x0379-0x037a] has been reserved
[    0.941420] system 00:01: [io  0x0400-0x043f] has been reserved
[    0.941425] system 00:01: [io  0x04d0-0x04d1] has been reserved
[    0.941429] system 00:01: [io  0x04d6] has been reserved
[    0.941434] system 00:01: [io  0x0520] has been reserved
[    0.941438] system 00:01: [io  0x0580-0x059f] has been reserved
[    0.941442] system 00:01: [io  0x0600-0x067f] has been reserved
[    0.941447] system 00:01: [io  0x0700-0x0703] has been reserved
[    0.941451] system 00:01: [io  0x0820-0x082f] has been reserved
[    0.941456] system 00:01: [io  0x0900-0x09fe] has been reserved
[    0.941460] system 00:01: [io  0x0c06-0x0c07] has been reserved
[    0.941464] system 00:01: [io  0x0c14] has been reserved
[    0.941468] system 00:01: [io  0x0c4a] has been reserved
[    0.941473] system 00:01: [io  0x0c50-0x0c52] has been reserved
[    0.941477] system 00:01: [io  0x0c6c] has been reserved
[    0.941481] system 00:01: [io  0x0c6f] has been reserved
[    0.941485] system 00:01: [io  0x0c80-0x0c83] has been reserved
[    0.941490] system 00:01: [io  0x0c90-0x0c9f] has been reserved
[    0.941494] system 00:01: [io  0x0ca0-0x0ca5] has been reserved
[    0.941499] system 00:01: [io  0x0cd0-0x0cdf] has been reserved
[    0.941503] system 00:01: [io  0x0f50-0x0f58] has been reserved
[    0.941507] system 00:01: [io  0x0b00-0x0b3f] has been reserved
[    0.941512] system 00:01: [io  0x03f8-0x03ff] has been reserved
[    0.941517] system 00:01: [mem 0xc0000000-0xcfffffff] has been reserved
[    0.941522] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.941530] pnp 00:02: [io  0x0ca2-0x0ca3]
[    0.941586] pnp 00:02: Plug and Play ACPI device, IDs IPI0001 (active)
[    0.941605] pnp 00:03: [mem 0xfed00000-0xfed003ff]
[    0.941662] pnp 00:03: Plug and Play ACPI device, IDs PNP0103 (active)
[    0.941670] pnp 00:04: [dma 7]
[    0.941672] pnp 00:04: [io  0x0000-0x000f]
[    0.941674] pnp 00:04: [io  0x0080-0x008f]
[    0.941676] pnp 00:04: [io  0x00c0-0x00df]
[    0.941733] pnp 00:04: Plug and Play ACPI device, IDs PNP0200 (active)
[    0.941740] pnp 00:05: [io  0x0061]
[    0.941799] pnp 00:05: Plug and Play ACPI device, IDs PNP0800 (active)
[    0.941809] pnp 00:06: [io  0x0060]
[    0.941810] pnp 00:06: [io  0x0064]
[    0.941822] pnp 00:06: [irq 1]
[    0.941878] pnp 00:06: Plug and Play ACPI device, IDs PNP0303 (active)
[    0.941891] pnp 00:07: [irq 12]
[    0.941951] pnp 00:07: Plug and Play ACPI device, IDs PNP0f13 PNP0f0e (active)
[    0.941960] pnp 00:08: [io  0x002e-0x002f]
[    0.941962] pnp 00:08: [io  0x0620-0x065f]
[    0.941964] pnp 00:08: [io  0x0680-0x069f]
[    0.941965] pnp 00:08: [io  0x0600-0x061f]
[    0.941967] pnp 00:08: [io  0x0660-0x067f]
[    0.941968] pnp 00:08: [io  0x0300-0x031f]
[    0.942026] pnp 00:08: Plug and Play ACPI device, IDs PNP0a06 (active)
[    0.942146] pnp 00:09: [irq 3]
[    0.942147] pnp 00:09: [io  0x02f8-0x02ff]
[    0.942284] pnp 00:09: Plug and Play ACPI device, IDs PNP0501 PNP0500 (active)
[    0.942292] pnp 00:0a: [io  0x0070-0x0071]
[    0.942351] pnp 00:0a: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.942455] pnp 00:0b: [mem 0xfaff4000-0xfaff7fff]
[    0.942543] system 00:0b: [mem 0xfaff4000-0xfaff7fff] has been reserved
[    0.942549] system 00:0b: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.942590] pnp: PnP ACPI: found 12 devices
[    0.942594] ACPI: ACPI bus type pnp unregistered
[    0.949841] PCI: max bus depth: 1 pci_try_num: 2
[    0.949861] pci 0000:00:02.0: BAR 15: assigned [mem 0xfc000000-0xfc0fffff pref]
[    0.949868] pci 0000:00:0c.0: BAR 14: assigned [mem 0xfc100000-0xfc1fffff]
[    0.949874] pci 0000:03:00.0: BAR 6: assigned [mem 0xfc000000-0xfc07ffff pref]
[    0.949879] pci 0000:00:02.0: PCI bridge to [bus 03-03]
[    0.949884] pci 0000:00:02.0:   bridge window [io  0x4000-0x4fff]
[    0.949889] pci 0000:00:02.0:   bridge window [mem 0xfde00000-0xfdffffff]
[    0.949895] pci 0000:00:02.0:   bridge window [mem 0xfc000000-0xfc0fffff pref]
[    0.949902] pci 0000:02:00.2: BAR 6: assigned [mem 0xfcd00000-0xfcd0ffff pref]
[    0.949908] pci 0000:00:0a.0: PCI bridge to [bus 02-02]
[    0.949912] pci 0000:00:0a.0:   bridge window [io  0x3000-0x3fff]
[    0.949917] pci 0000:00:0a.0:   bridge window [mem 0xfcd00000-0xfddfffff]
[    0.949922] pci 0000:00:0a.0:   bridge window [mem 0xfb000000-0xfbffffff 64bit pref]
[    0.949930] pci 0000:04:00.0: BAR 6: assigned [mem 0xfcb00000-0xfcb1ffff pref]
[    0.949935] pci 0000:04:00.1: BAR 6: assigned [mem 0xfcb20000-0xfcb3ffff pref]
[    0.949941] pci 0000:04:00.2: BAR 6: assigned [mem 0xfc100000-0xfc11ffff pref]
[    0.949947] pci 0000:04:00.3: BAR 6: assigned [mem 0xfc120000-0xfc13ffff pref]
[    0.949952] pci 0000:00:0c.0: PCI bridge to [bus 04-04]
[    0.949957] pci 0000:00:0c.0:   bridge window [mem 0xfc100000-0xfc1fffff]
[    0.949962] pci 0000:00:0c.0:   bridge window [mem 0xfcb00000-0xfcbfffff 64bit pref]
[    0.949969] pci 0000:00:14.4: PCI bridge to [bus 01-01]
[    0.950022] ACPI: PCI Interrupt Link [BI02] enabled at IRQ 52
[    0.950032] pci 0000:00:02.0: PCI INT A -> Link[BI02] -> GSI 52 (level, high) -> IRQ 52
[    0.950039] pci 0000:00:02.0: setting latency timer to 64
[    0.950104] ACPI: PCI Interrupt Link [BI0A] enabled at IRQ 24
[    0.950112] pci 0000:00:0a.0: PCI INT A -> Link[BI0A] -> GSI 24 (level, high) -> IRQ 24
[    0.950119] pci 0000:00:0a.0: setting latency timer to 64
[    0.950189] ACPI: PCI Interrupt Link [BI0C] enabled at IRQ 54
[    0.950196] pci 0000:00:0c.0: PCI INT A -> Link[BI0C] -> GSI 54 (level, high) -> IRQ 54
[    0.950203] pci 0000:00:0c.0: setting latency timer to 64
[    0.950209] pci_bus 0000:00: resource 4 [mem 0xfaf00000-0xfdffffff]
[    0.950211] pci_bus 0000:00: resource 5 [io  0x1000-0xffff]
[    0.950213] pci_bus 0000:00: resource 6 [io  0x0000-0x03af]
[    0.950215] pci_bus 0000:00: resource 7 [io  0x03e0-0x0cf7]
[    0.950217] pci_bus 0000:00: resource 8 [io  0x0d00-0x0fff]
[    0.950219] pci_bus 0000:00: resource 9 [mem 0xfed00000-0xfed03fff]
[    0.950221] pci_bus 0000:00: resource 10 [mem 0xfed40000-0xfed44fff]
[    0.950223] pci_bus 0000:00: resource 11 [io  0x03b0-0x03bb]
[    0.950225] pci_bus 0000:00: resource 12 [io  0x03c0-0x03df]
[    0.950227] pci_bus 0000:00: resource 13 [mem 0x000a0000-0x000bffff]
[    0.950229] pci_bus 0000:03: resource 0 [io  0x4000-0x4fff]
[    0.950230] pci_bus 0000:03: resource 1 [mem 0xfde00000-0xfdffffff]
[    0.950232] pci_bus 0000:03: resource 2 [mem 0xfc000000-0xfc0fffff pref]
[    0.950235] pci_bus 0000:02: resource 0 [io  0x3000-0x3fff]
[    0.950236] pci_bus 0000:02: resource 1 [mem 0xfcd00000-0xfddfffff]
[    0.950238] pci_bus 0000:02: resource 2 [mem 0xfb000000-0xfbffffff 64bit pref]
[    0.950240] pci_bus 0000:04: resource 1 [mem 0xfc100000-0xfc1fffff]
[    0.950242] pci_bus 0000:04: resource 2 [mem 0xfcb00000-0xfcbfffff 64bit pref]
[    0.950245] pci_bus 0000:01: resource 4 [mem 0xfaf00000-0xfdffffff]
[    0.950246] pci_bus 0000:01: resource 5 [io  0x1000-0xffff]
[    0.950248] pci_bus 0000:01: resource 6 [io  0x0000-0x03af]
[    0.950250] pci_bus 0000:01: resource 7 [io  0x03e0-0x0cf7]
[    0.950252] pci_bus 0000:01: resource 8 [io  0x0d00-0x0fff]
[    0.950254] pci_bus 0000:01: resource 9 [mem 0xfed00000-0xfed03fff]
[    0.950255] pci_bus 0000:01: resource 10 [mem 0xfed40000-0xfed44fff]
[    0.950257] pci_bus 0000:01: resource 11 [io  0x03b0-0x03bb]
[    0.950259] pci_bus 0000:01: resource 12 [io  0x03c0-0x03df]
[    0.950261] pci_bus 0000:01: resource 13 [mem 0x000a0000-0x000bffff]
[    0.950299] NET: Registered protocol family 2
[    0.950766] IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.952642] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
[    0.954673] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.954926] TCP: Hash tables configured (established 524288 bind 65536)
[    0.954930] TCP reno registered
[    0.954960] UDP hash table entries: 8192 (order: 6, 262144 bytes)
[    0.955054] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes)
[    0.955205] NET: Registered protocol family 1
[    0.955295] ACPI: PCI Interrupt Link [USB1] enabled at IRQ 22
[    0.955312] pci 0000:00:12.0: PCI INT A -> Link[USB1] -> GSI 22 (level, low) -> IRQ 22
[    1.447502] pci 0000:00:12.0: PCI INT A disabled
[    1.447514] pci 0000:00:12.1: PCI INT A -> Link[USB1] -> GSI 22 (level, low) -> IRQ 22
[    1.947438] pci 0000:00:12.1: PCI INT A disabled
[    1.947496] ACPI: PCI Interrupt Link [USB2] enabled at IRQ 23
[    1.947509] pci 0000:00:12.2: PCI INT B -> Link[USB2] -> GSI 23 (level, low) -> IRQ 23
[    2.174234] pci 0000:00:12.2: PCI INT B disabled
[    2.174294] ACPI: PCI Interrupt Link [USB3] enabled at IRQ 23
[    2.174299] pci 0000:00:13.0: PCI INT A -> Link[USB3] -> GSI 23 (level, low) -> IRQ 23
[    2.447458] pci 0000:00:13.0: PCI INT A disabled
[    2.447469] pci 0000:00:13.1: PCI INT A -> Link[USB3] -> GSI 23 (level, low) -> IRQ 23
[    2.947402] pci 0000:00:13.1: PCI INT A disabled
[    2.947467] ACPI: PCI Interrupt Link [USB4] enabled at IRQ 22
[    2.947472] pci 0000:00:13.2: PCI INT B -> Link[USB4] -> GSI 22 (level, low) -> IRQ 22
[    3.197395] pci 0000:00:13.2: PCI INT B disabled
[    3.197440] pci 0000:02:00.1: Boot video device
[    3.197715] ACPI: PCI Interrupt Link [I061] enabled at IRQ 44
[    3.197725] pci 0000:02:00.4: PCI INT B -> Link[I061] -> GSI 44 (level, high) -> IRQ 44
[    3.197747] pci 0000:02:00.4: PCI INT B disabled
[    3.197760] PCI: CLS 64 bytes, default 64
[    3.197787] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
[    3.197792] AMD-Vi:        mmio-addr: 00000000faff4000
[    3.198005] AMD-Vi:   DEV_SELECT_RANGE_START	 devid: 00:00.0 flags: 00
[    3.198010] AMD-Vi:   DEV_RANGE_END		 devid: 00:00.2
[    3.198014] AMD-Vi:   DEV_SELECT			 devid: 00:02.0 flags: 00
[    3.198018] AMD-Vi:   DEV_SELECT			 devid: 03:00.0 flags: 00
[    3.198022] AMD-Vi:   DEV_SELECT			 devid: 00:0a.0 flags: 00
[    3.198025] AMD-Vi:   DEV_SELECT_RANGE_START	 devid: 02:00.0 flags: 00
[    3.198030] AMD-Vi:   DEV_RANGE_END		 devid: 02:00.4
[    3.198033] AMD-Vi:   DEV_SELECT			 devid: 00:0c.0 flags: 00
[    3.198037] AMD-Vi:   DEV_SELECT_RANGE_START	 devid: 04:00.0 flags: 00
[    3.198041] AMD-Vi:   DEV_RANGE_END		 devid: 04:00.3
[    3.198044] AMD-Vi:   DEV_SELECT			 devid: 00:11.0 flags: 00
[    3.198048] AMD-Vi:   DEV_SELECT_RANGE_START	 devid: 00:12.0 flags: 00
[    3.198052] AMD-Vi:   DEV_RANGE_END		 devid: 00:12.2
[    3.198056] AMD-Vi:   DEV_SELECT_RANGE_START	 devid: 00:13.0 flags: 00
[    3.198060] AMD-Vi:   DEV_RANGE_END		 devid: 00:13.2
[    3.198063] AMD-Vi:   DEV_SELECT			 devid: 00:14.0 flags: d7
[    3.198067] AMD-Vi:   DEV_SELECT			 devid: 00:14.1 flags: 00
[    3.198071] AMD-Vi:   DEV_SELECT			 devid: 00:14.3 flags: 00
[    3.198074] AMD-Vi:   DEV_SELECT			 devid: 00:14.4 flags: 00
[    3.198078] AMD-Vi:   DEV_ALIAS_RANGE		 devid: 01:00.0 flags: 00 devid_to: 00:14.4
[    3.198083] AMD-Vi:   DEV_RANGE_END		 devid: 01:1f.7
[    3.198097] pci 0000:00:00.2: can't derive routing for PCI INT A
[    3.198102] pci 0000:00:00.2: PCI INT A: no GSI
[    3.198107] AMD-Vi: IVMD_TYPE_ALL		 devid_start: 00:00.0 devid_end: 04:00.3 range_start: 00000000000f0000 range_end: 0000000000100000 flags: 7
[    3.198116] AMD-Vi: IVMD_TYPE_ALL		 devid_start: 00:00.0 devid_end: 04:00.3 range_start: 00000000bff70000 range_end: 00000000bfff0000 flags: 7
[    3.198124] AMD-Vi: IVMD_TYPE_ALL		 devid_start: 00:00.0 devid_end: 04:00.3 range_start: 00000000000e8000 range_end: 00000000000e9000 flags: 7
[    3.198132] AMD-Vi: IVMD_TYPE_ALL		 devid_start: 00:00.0 devid_end: 04:00.3 range_start: 00000000bdffe000 range_end: 00000000be000000 flags: 7
[    3.198139] AMD-Vi: IVMD_TYPE_ALL		 devid_start: 00:00.0 devid_end: 04:00.3 range_start: 00000000bdff9000 range_end: 00000000bdffd000 flags: 7
[    3.198147] AMD-Vi: IVMD_TYPE_ALL		 devid_start: 00:00.0 devid_end: 04:00.3 range_start: 00000000bdfe9000 range_end: 00000000bdff9000 flags: 7
[    3.198194] pci 0000:00:00.2: irq 72 for MSI/MSI-X
[    3.198259] AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
[    3.264727] AMD-Vi: Lazy IO/TLB flushing enabled
[    3.264733] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    3.264738] Placing 64MB software IO TLB between ffff8800b9ddb000 - ffff8800bdddb000
[    3.264743] software IO TLB at phys 0xb9ddb000 - 0xbdddb000
[    3.265516] perf: AMD IBS detected (0x000000ff)
[    3.265731] audit: initializing netlink socket (disabled)
[    3.265742] type=2000 audit(1357265243.260:1): initialized
[    3.321516] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    3.399052] VFS: Disk quotas dquot_6.5.2
[    3.399111] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    3.399628] fuse init (API version 7.17)
[    3.399716] msgmni has been set to 31864
[    3.400204] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    3.400246] io scheduler noop registered
[    3.400249] io scheduler deadline registered
[    3.400280] io scheduler cfq registered (default)
[    3.400480] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    3.400503] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    3.400656] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    3.400664] ACPI: Power Button [PWRF]
[    3.400743] ACPI: acpi_idle registered with cpuidle
[    3.405430] ERST: Error Record Serialization Table (ERST) support is initialized.
[    3.405499] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
[    3.405575] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    3.426003] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    4.447359] Refined TSC clocksource calibration: 3191.999 MHz.
[    4.447367] Switching to clocksource tsc
[    4.467827] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[    7.719126] 00:09: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[    8.447406] Linux agpgart interface v0.103
[    8.448638] brd: module loaded
[    8.449297] loop: module loaded
[    8.449400] ahci 0000:00:11.0: version 3.0
[    8.449444] ACPI: PCI Interrupt Link [SATA] enabled at IRQ 16
[    8.449461] ahci 0000:00:11.0: PCI INT A -> Link[SATA] -> GSI 16 (level, low) -> IRQ 16
[    8.449532] ahci 0000:00:11.0: AHCI 0001.0100 32 slots 1 ports 3 Gbps 0x1 impl SATA mode
[    8.449539] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part ccc 
[    8.449711] scsi0 : ahci
[    8.449757] ata1: SATA max UDMA/133 abar m1024@0xfccf0000 port 0xfccf0100 irq 16
[    8.449836] pata_acpi 0000:00:14.1: can't derive routing for PCI INT A
[    8.449859] pata_acpi 0000:00:14.1: setting latency timer to 64
[    8.450126] Fixed MDIO Bus: probed
[    8.450143] tun: Universal TUN/TAP device driver, 1.6
[    8.450147] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    8.450188] PPP generic driver version 2.4.2
[    8.450269] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    8.450284] ehci_hcd 0000:00:12.2: PCI INT B -> Link[USB2] -> GSI 23 (level, low) -> IRQ 23
[    8.450299] ehci_hcd 0000:00:12.2: EHCI Host Controller
[    8.450343] ehci_hcd 0000:00:12.2: new USB bus registered, assigned bus number 1
[    8.450354] ehci_hcd 0000:00:12.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[    8.450380] ehci_hcd 0000:00:12.2: debug port 1
[    8.450407] ehci_hcd 0000:00:12.2: irq 23, io mem 0xfccc0000
[    8.697160] ehci_hcd 0000:00:12.2: USB 2.0 started, EHCI 1.00
[    8.697274] hub 1-0:1.0: USB hub found
[    8.697280] hub 1-0:1.0: 6 ports detected
[    8.697365] ehci_hcd 0000:00:13.2: PCI INT B -> Link[USB4] -> GSI 22 (level, low) -> IRQ 22
[    8.697382] ehci_hcd 0000:00:13.2: EHCI Host Controller
[    8.697422] ehci_hcd 0000:00:13.2: new USB bus registered, assigned bus number 2
[    8.697432] ehci_hcd 0000:00:13.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
[    8.697455] ehci_hcd 0000:00:13.2: debug port 1
[    8.697481] ehci_hcd 0000:00:13.2: irq 22, io mem 0xfcc90000
[    8.947202] ehci_hcd 0000:00:13.2: USB 2.0 started, EHCI 1.00
[    8.947322] hub 2-0:1.0: USB hub found
[    8.947326] hub 2-0:1.0: 6 ports detected
[    8.947405] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    8.947419] ohci_hcd 0000:00:12.0: PCI INT A -> Link[USB1] -> GSI 22 (level, low) -> IRQ 22
[    8.947439] ohci_hcd 0000:00:12.0: OHCI Host Controller
[    8.947488] ohci_hcd 0000:00:12.0: new USB bus registered, assigned bus number 3
[    8.947507] ohci_hcd 0000:00:12.0: irq 22, io mem 0xfcce0000
[    9.197198] ata1: SATA link down (SStatus 0 SControl 300)
[    9.201262] hub 3-0:1.0: USB hub found
[    9.201273] hub 3-0:1.0: 3 ports detected
[    9.201347] ohci_hcd 0000:00:12.1: PCI INT A -> Link[USB1] -> GSI 22 (level, low) -> IRQ 22
[    9.201365] ohci_hcd 0000:00:12.1: OHCI Host Controller
[    9.201407] ohci_hcd 0000:00:12.1: new USB bus registered, assigned bus number 4
[    9.201426] ohci_hcd 0000:00:12.1: irq 22, io mem 0xfccd0000
[    9.451224] hub 4-0:1.0: USB hub found
[    9.451232] hub 4-0:1.0: 3 ports detected
[    9.451297] ohci_hcd 0000:00:13.0: PCI INT A -> Link[USB3] -> GSI 23 (level, low) -> IRQ 23
[    9.451314] ohci_hcd 0000:00:13.0: OHCI Host Controller
[    9.451354] ohci_hcd 0000:00:13.0: new USB bus registered, assigned bus number 5
[    9.451371] ohci_hcd 0000:00:13.0: irq 23, io mem 0xfccb0000
[    9.697166] usb 1-2: new high-speed USB device number 2 using ehci_hcd
[    9.701249] hub 5-0:1.0: USB hub found
[    9.701259] hub 5-0:1.0: 3 ports detected
[    9.701325] ohci_hcd 0000:00:13.1: PCI INT A -> Link[USB3] -> GSI 23 (level, low) -> IRQ 23
[    9.701342] ohci_hcd 0000:00:13.1: OHCI Host Controller
[    9.701380] ohci_hcd 0000:00:13.1: new USB bus registered, assigned bus number 6
[    9.701398] ohci_hcd 0000:00:13.1: irq 23, io mem 0xfcca0000
[    9.951208] hub 6-0:1.0: USB hub found
[    9.951218] hub 6-0:1.0: 3 ports detected
[    9.951292] uhci_hcd: USB Universal Host Controller Interface driver
[    9.951327] uhci_hcd 0000:02:00.4: PCI INT B -> Link[I061] -> GSI 44 (level, high) -> IRQ 44
[    9.951338] uhci_hcd 0000:02:00.4: setting latency timer to 64
[    9.951341] uhci_hcd 0000:02:00.4: UHCI Host Controller
[    9.951382] uhci_hcd 0000:02:00.4: new USB bus registered, assigned bus number 7
[    9.951396] uhci_hcd 0000:02:00.4: port count misdetected? forcing to 2 ports
[    9.951425] uhci_hcd 0000:02:00.4: irq 44, io base 0x00003c00
[    9.951527] hub 7-0:1.0: USB hub found
[    9.951533] hub 7-0:1.0: 2 ports detected
[    9.951651] usbcore: registered new interface driver libusual
[    9.951683] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f0e:PS2M] at 0x60,0x64 irq 1,12
[    9.953293] serio: i8042 KBD port at 0x60,0x64 irq 1
[    9.953302] serio: i8042 AUX port at 0x60,0x64 irq 12
[    9.953407] mousedev: PS/2 mouse device common for all mice
[    9.953508] rtc_cmos 00:0a: RTC can wake from S4
[    9.953602] rtc_cmos 00:0a: rtc core: registered rtc_cmos as rtc0
[    9.953637] rtc0: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[    9.953711] device-mapper: uevent: version 1.0.3
[    9.953771] device-mapper: ioctl: 4.22.0-ioctl (2011-10-19) initialised: dm-devel@redhat.com
[    9.953908] cpuidle: using governor ladder
[    9.954126] cpuidle: using governor menu
[    9.954129] EFI Variables Facility v0.08 2004-May-17
[    9.954322] TCP cubic registered
[    9.954419] NET: Registered protocol family 10
[    9.954827] NET: Registered protocol family 17
[    9.954833] Registering the dns_resolver key type
[    9.954952] PM: Hibernation image not present or could not be loaded.
[    9.954961] registered taskstats version 1
[    9.999379]   Magic number: 13:540:105
[    9.999395] usbmon usbmon1: hash matches
[    9.999503] rtc_cmos 00:0a: setting system clock to 2013-01-04 02:07:31 UTC (1357265251)
[    9.999542] powernow-k8: Found 2 AMD Opteron(tm) Processor 6328                  (8 cpu cores) (version 2.20.00)
[    9.999565] powernow-k8: Core Performance Boosting: on.
[    9.999610] powernow-k8:    0 : pstate 0 (3200 MHz)
[    9.999614] powernow-k8:    1 : pstate 1 (2800 MHz)
[    9.999617] powernow-k8:    2 : pstate 2 (2300 MHz)
[    9.999620] powernow-k8:    3 : pstate 3 (1900 MHz)
[    9.999624] powernow-k8:    4 : pstate 4 (1400 MHz)
[   10.000419] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[   10.000436] EDD information not available.
[   10.002870] Freeing unused kernel memory: 920k freed
[   10.003107] Write protecting the kernel read-only data: 12288k
[   10.009841] Freeing unused kernel memory: 1616k freed
[   10.015071] Freeing unused kernel memory: 1200k freed
[   10.029353] hub 1-2:1.0: USB hub found
[   10.029456] hub 1-2:1.0: 2 ports detected
[   10.031517] udevd[117]: starting version 175
[   10.079849] HP HPSA Driver (v 2.0.2-1)
[   10.080277] ACPI: PCI Interrupt Link [I020] enabled at IRQ 24
[   10.080287] hpsa 0000:03:00.0: PCI INT A -> Link[I020] -> GSI 24 (level, high) -> IRQ 24
[   10.080307] hpsa 0000:03:00.0: MSIX
[   10.080365] hpsa 0000:03:00.0: irq 73 for MSI/MSI-X
[   10.080373] hpsa 0000:03:00.0: irq 74 for MSI/MSI-X
[   10.080380] hpsa 0000:03:00.0: irq 75 for MSI/MSI-X
[   10.080386] hpsa 0000:03:00.0: irq 76 for MSI/MSI-X
[   10.086640] tg3.c:v3.121 (November 2, 2011)
[   10.087015] ACPI: PCI Interrupt Link [I0C0] enabled at IRQ 36
[   10.087033] tg3 0000:04:00.0: PCI INT A -> Link[I0C0] -> GSI 36 (level, high) -> IRQ 36
[   10.087050] tg3 0000:04:00.0: setting latency timer to 64
[   10.090810] scsi1 : pata_atiixp
[   10.090898] scsi2 : pata_atiixp
[   10.090941] ata2: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x500 irq 14
[   10.090947] ata3: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x508 irq 15
[   10.100175] hpsa 0000:03:00.0: hpsa0: <0x323b> at IRQ 73 using DAC
[   10.197450] scsi3 : hpsa
[   10.199237] hpsa 0000:03:00.0: RAID              device c3b0t0l0 added.
[   10.199244] hpsa 0000:03:00.0: Direct-Access     device c3b0t0l1 added.
[   10.199387] scsi 3:0:0:0: RAID              HP       P420i            3.20 PQ: 0 ANSI: 5
[   10.199509] scsi 3:0:0:1: Direct-Access     HP       LOGICAL VOLUME   3.20 PQ: 0 ANSI: 5
[   10.199757] scsi 3:0:0:0: Attached scsi generic sg0 type 12
[   10.199918] sd 3:0:0:1: Attached scsi generic sg1 type 0
[   10.200117] sd 3:0:0:1: [sda] 286677120 512-byte logical blocks: (146 GB/136 GiB)
[   10.200267] sd 3:0:0:1: [sda] Write Protect is off
[   10.200274] sd 3:0:0:1: [sda] Mode Sense: 73 00 00 08
[   10.200338] sd 3:0:0:1: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   10.200771]  sda: sda1 sda2 < sda5 >
[   10.201189] sd 3:0:0:1: [sda] Attached SCSI disk
[   10.468667] tg3 0000:04:00.0: eth0: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MAC address 2c:76:8a:4f:2c:cc
[   10.468680] tg3 0000:04:00.0: eth0: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[   10.468688] tg3 0000:04:00.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[   10.468693] tg3 0000:04:00.0: eth0: dma_rwctrl[00000001] dma_mask[64-bit]
[   10.469070] ACPI: PCI Interrupt Link [I0C1] enabled at IRQ 37
[   10.469085] tg3 0000:04:00.1: PCI INT B -> Link[I0C1] -> GSI 37 (level, high) -> IRQ 37
[   10.469097] tg3 0000:04:00.1: setting latency timer to 64
[   10.493918] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[   10.500041] usb 7-1: new full-speed USB device number 2 using uhci_hcd
[   10.514519] tg3 0000:04:00.1: eth1: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MAC address 2c:76:8a:4f:2c:cd
[   10.514531] tg3 0000:04:00.1: eth1: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[   10.514538] tg3 0000:04:00.1: eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[   10.514544] tg3 0000:04:00.1: eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[   10.514602] tg3 0000:04:00.2: PCI INT A -> Link[I0C0] -> GSI 36 (level, high) -> IRQ 36
[   10.514614] tg3 0000:04:00.2: setting latency timer to 64
[   11.207800] tg3 0000:04:00.2: eth2: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MAC address 2c:76:8a:4f:2c:ce
[   11.207809] tg3 0000:04:00.2: eth2: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[   11.207816] tg3 0000:04:00.2: eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[   11.207822] tg3 0000:04:00.2: eth2: dma_rwctrl[00000001] dma_mask[64-bit]
[   11.207871] tg3 0000:04:00.3: PCI INT B -> Link[I0C1] -> GSI 37 (level, high) -> IRQ 37
[   11.207880] tg3 0000:04:00.3: setting latency timer to 64
[   11.957826] tg3 0000:04:00.3: eth3: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MAC address 2c:76:8a:4f:2c:cf
[   11.957835] tg3 0000:04:00.3: eth3: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[   11.957841] tg3 0000:04:00.3: eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[   11.957847] tg3 0000:04:00.3: eth3: dma_rwctrl[00000001] dma_mask[64-bit]
[   18.792547] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   18.792554] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   18.792561] ADDRCONF(NETDEV_UP): eth2: link is not ready
[   18.792566] ADDRCONF(NETDEV_UP): eth3: link is not ready
[   18.796258] udevd[448]: starting version 175
[   18.804943] lp: driver loaded but no devices found
[   18.809866] power_meter ACPI000D:00: Found ACPI power meter.
[   18.809906] power_meter ACPI000D:00: Ignoring unsafe software power cap!
[   18.813494] Adding 4027644k swap on /dev/sda5.  Priority:-1 extents:1 across:4027644k 
[   18.836579] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
[   18.850360] hpilo 0000:02:00.2: PCI INT B -> Link[I061] -> GSI 44 (level, high) -> IRQ 44
[   18.850367] hpilo 0000:02:00.2: setting latency timer to 64
[   18.854559] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0
[   18.854872] SP5100 TCO timer: SP5100 TCO WatchDog Timer Driver v0.01
[   18.854946] SP5100 TCO timer: mmio address 0xfec000f0 already in use
[   18.858114] MCE: In-kernel MCE decoding enabled.
[   18.858773] EDAC MC: Ver: 2.1.0
[   18.859326] AMD64 EDAC driver v3.4.0
[   18.865368] type=1400 audit(1357265260.363:2): apparmor="STATUS" operation="profile_load" name="/sbin/dhclient" pid=598 comm="apparmor_parser"
[   18.865655] type=1400 audit(1357265260.363:3): apparmor="STATUS" operation="profile_load" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=598 comm="apparmor_parser"
[   18.865810] type=1400 audit(1357265260.363:4): apparmor="STATUS" operation="profile_load" name="/usr/lib/connman/scripts/dhclient-script" pid=598 comm="apparmor_parser"
[   18.866816] EDAC amd64: DRAM ECC enabled.
[   18.866825] EDAC amd64: F15h detected (node 0).
[   18.866882] EDAC MC: DCT0 chip selects:
[   18.866884] EDAC amd64: MC: 0:     0MB 1:     0MB
[   18.866886] EDAC amd64: MC: 2:     0MB 3:     0MB
[   18.866889] EDAC amd64: MC: 4:  4096MB 5:  4096MB
[   18.866891] EDAC amd64: MC: 6:     0MB 7:     0MB
[   18.866893] EDAC MC: DCT1 chip selects:
[   18.866894] EDAC amd64: MC: 0:     0MB 1:     0MB
[   18.866896] EDAC amd64: MC: 2:     0MB 3:     0MB
[   18.866898] EDAC amd64: MC: 4:     0MB 5:     0MB
[   18.866900] EDAC amd64: MC: 6:     0MB 7:     0MB
[   18.866902] EDAC amd64: using x8 syndromes.
[   18.866904] EDAC amd64: MCT channel count: 1
[   18.866938] EDAC amd64: CS4: Registered DDR3 RAM
[   18.866941] EDAC amd64: CS5: Registered DDR3 RAM
[   18.866966] EDAC MC0: Giving out device to 'amd64_edac' 'F15h': DEV 0000:00:18.2
[   18.867461] EDAC amd64: DRAM ECC enabled.
[   18.867474] EDAC amd64: F15h detected (node 1).
[   18.867565] EDAC MC: DCT0 chip selects:
[   18.867567] EDAC amd64: MC: 0:     0MB 1:     0MB
[   18.867569] EDAC amd64: MC: 2:     0MB 3:     0MB
[   18.867572] EDAC amd64: MC: 4:  4096MB 5:  4096MB
[   18.867574] EDAC amd64: MC: 6:     0MB 7:     0MB
[   18.867576] EDAC MC: DCT1 chip selects:
[   18.867578] EDAC amd64: MC: 0:     0MB 1:     0MB
[   18.867581] EDAC amd64: MC: 2:     0MB 3:     0MB
[   18.867583] EDAC amd64: MC: 4:     0MB 5:     0MB
[   18.867584] EDAC amd64: MC: 6:     0MB 7:     0MB
[   18.867586] EDAC amd64: using x8 syndromes.
[   18.867588] EDAC amd64: MCT channel count: 1
[   18.867618] EDAC amd64: CS4: Registered DDR3 RAM
[   18.867620] EDAC amd64: CS5: Registered DDR3 RAM
[   18.867654] EDAC MC1: Giving out device to 'amd64_edac' 'F15h': DEV 0000:00:19.2
[   18.867861] EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV '0000:00:18.2' (POLLED)
[   18.959493] hpsa 0000:03:00.0: cp ffff880430301180 has check condition: unknown type: Sense: 0x5, ASC: 0x20, ASCQ: 0x0, Returning result: 0x2, cmd=[85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00]
[   18.974183] hpsa 0000:03:00.0: cp ffff880430300000 has check condition: unknown type: Sense: 0x5, ASC: 0x20, ASCQ: 0x0, Returning result: 0x2, cmd=[85 08 0e 00 00 00 01 00 00 00 00 00 00 40 ec 00]
[   19.021052] tg3 0000:04:00.0: irq 77 for MSI/MSI-X
[   19.021059] tg3 0000:04:00.0: irq 78 for MSI/MSI-X
[   19.021063] tg3 0000:04:00.0: irq 79 for MSI/MSI-X
[   19.021067] tg3 0000:04:00.0: irq 80 for MSI/MSI-X
[   19.021071] tg3 0000:04:00.0: irq 81 for MSI/MSI-X
[   19.040357] input: HP  Virtual Keyboard  as /devices/pci0000:00/0000:00:0a.0/0000:02:00.4/usb7/7-1/7-1:1.0/input/input1
[   19.040480] generic-usb 0003:03F0:7029.0001: input,hidraw0: USB HID v1.01 Keyboard [HP  Virtual Keyboard ] on usb-0000:02:00.4-1/input0
[   19.042414] input: HP  Virtual Keyboard  as /devices/pci0000:00/0000:00:0a.0/0000:02:00.4/usb7/7-1/7-1:1.1/input/input2
[   19.042696] generic-usb 0003:03F0:7029.0002: input,hidraw1: USB HID v1.01 Mouse [HP  Virtual Keyboard ] on usb-0000:02:00.4-1/input1
[   19.042712] usbcore: registered new interface driver usbhid
[   19.042714] usbhid: USB HID core driver
[   20.064712] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   20.368660] vesafb: mode is 640x480x32, linelength=2560, pages=0
[   20.368663] vesafb: scrolling: redraw
[   20.368666] vesafb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[   20.380171] vesafb: framebuffer at 0xfb000000, mapped to 0xffffc90012880000, using 1216k, total 1216k
[   20.380335] Console: switching to colour frame buffer device 80x30
[   20.391177] fb0: VESA VGA frame buffer device
[   21.752369] tg3 0000:04:00.0: eth0: Link is up at 100 Mbps, full duplex
[   21.752372] tg3 0000:04:00.0: eth0: Flow control is off for TX and off for RX
[   21.752374] tg3 0000:04:00.0: eth0: EEE is disabled
[   21.753637] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   23.746663] init: failsafe main process (745) killed by TERM signal
[   23.804512] type=1400 audit(1357265265.303:5): apparmor="STATUS" operation="profile_load" name="/usr/sbin/tcpdump" pid=1038 comm="apparmor_parser"
[   23.804932] type=1400 audit(1357265265.303:6): apparmor="STATUS" operation="profile_replace" name="/sbin/dhclient" pid=1036 comm="apparmor_parser"
[   23.805206] type=1400 audit(1357265265.303:7): apparmor="STATUS" operation="profile_replace" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=1036 comm="apparmor_parser"
[   23.805346] type=1400 audit(1357265265.303:8): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=1036 comm="apparmor_parser"
[   32.446459] eth0: no IPv6 routers present


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 23:21                   ` Shuah Khan
@ 2013-01-04 23:55                     ` Yinghai Lu
  2013-01-05  2:02                       ` Shuah Khan
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-04 23:55 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 3:21 PM, Shuah Khan <shuahkhan@gmail.com> wrote:

> Please see attached dmesg for full log. I can do some testing on this
> system with your patch if you would like.

That would be great.

Please try
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-boot
or just this patch.

Unfortunately, I do not have access to any AMD system with IOMMU support.

Thanks a lot.

Yinghai


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 23:55                     ` Yinghai Lu
@ 2013-01-05  2:02                       ` Shuah Khan
  2013-01-05  4:10                         ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-05  2:02 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 4:55 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 4, 2013 at 3:21 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>
>> Please see attached dmesg for full log. I can do some testing on this
>> system with your patch if you would like.
>
> That would be great.
>
> Please try
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-boot
> or just this patch.
>
> Too bad, I can not access AMD systems with IOMMU support.
>
> Thanks a lot.

I tried your patch on my AMD system. I changed the patch to print a
warning instead of calling panic(), and it did trigger the panic
condition:

[    5.376654] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    5.376717]
[    5.376799] pci 0000:00:00.2: irq 72 for MSI/MSI-X

It would have panicked here:

[    5.388858] AMD-Vi: can not enable swiotlb for unhandled devices by AMD iommu!

[    5.388964] AMD-Vi: Lazy IO/TLB flushing enabled
[    5.389324] LVT offset 0 assigned for vector 0x400

I applied your patch to 3.6.11, changed the panic() to pr_info(), and
also changed enough_mem_for_swiotlb() to always return false. That
simulates a not-enough-memory condition, since this system actually
has enough memory.

So at least on this AMD system, your patch will result in a panic.

Thanks,
-- Shuah


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-05  2:02                       ` Shuah Khan
@ 2013-01-05  4:10                         ` Yinghai Lu
  2013-01-05 22:04                           ` Shuah Khan
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-05  4:10 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 6:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
> I applied your patch to 3.6.11 and changed the panic() to pr_info()
> and also changed enough_mem_for_swiotlb() to always return false to
> simulate not enough memory condition as this system does have enough
> memory.
>
> So at least on this AMD system, your patch will result in a panic.

OK, thanks for testing.

If enough_mem_for_swiotlb() really returned false, allocating the
swiotlb buffer from bootmem would already have panicked, right?

So this patch just delays the panic a little on AMD systems that have
devices not handled by the IOMMU.

Thanks

Yinghai


* Re: [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly
  2013-01-04 22:01     ` Yinghai Lu
@ 2013-01-05  9:59       ` Sakkinen, Jarkko
  0 siblings, 0 replies; 199+ messages in thread
From: Sakkinen, Jarkko @ 2013-01-05  9:59 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1076 bytes --]

On Fri, 2013-01-04 at 14:01 -0800, Yinghai Lu wrote:
> On Fri, Jan 4, 2013 at 9:18 AM, Sakkinen, Jarkko
> <jarkko.sakkinen@intel.com> wrote:
> > On Thu, 2013-01-03 at 16:48 -0800, Yinghai Lu wrote:
> >> with #PF handler way to set early page table, level3_ident will go away with
> >> 64bit native path.
> >>
> >> So just use entries in init_level4_pgt to set them in tramopline_pgt
> >>
> >> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> >> Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
> >
> > Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
> 
> Thanks.
> 
> updated the patch, and would save some time for HPA.

Sure. They look fine to me. The larger patch only reorders code from
the realmode code's point of view, and the smaller patch initializes
trampoline_gdt with essentially the same values as before, so I don't
see why these patches would break anything.

But let's wait for hpa's comments.

/Jarkko


* Re: [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking
  2013-01-04 21:50     ` Yinghai Lu
@ 2013-01-05 13:05       ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-05 13:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Fri, Jan 04, 2013 at 01:50:18PM -0800, Yinghai Lu wrote:
> On Thu, Jan 3, 2013 at 11:17 PM, Borislav Petkov <bp@alien8.de> wrote:
> > On Thu, Jan 03, 2013 at 04:48:21PM -0800, Yinghai Lu wrote:
> >> During debugging loading kernel above 4G, found one page if is not used
> >> in BRK with early page allocation.
> >>
> >> pgt_buf_top is address that can not be used, so should check if that new
> >> end is above that top, otherwise last page will not be used.
> >>
> >> Fix that checking and also add print out for every allocation from BRK.
> >
> > This commit message still bothers the hell out of me. Please, fix it up
> > to something more readable like the below, for example:
> >
> > "pgt_buf_top is an address which cannot be used so we should check
> > whether the new 'end' is above it. Otherwise, the last BRK page remains
> > unused.
> >
> > Fix that check and add a debug printout of every BRK allocation."
> 
> but your changelog is wrong.
> 
> it is NOT last BRK page.

"...otherwise last page will not be used." ??? Is it the current last
page? Which is it?

The fact that I need to ask twice and cannot simply read it out from
your commit message should tell you one thing and one thing only: you
need to write out in more detail what you're doing so that people can
understand it. I admit, I'm not the smartest but that's even better -
you need to explain your code even to dumb people :-).

> it is NOT every BRK allocation.

Ok, so it is not every BRK allocation but it is for "every allocation
from BRK." But why do we need to print it out then? Why is it important
to print out every PGTABLE allocation we do from BRK? The answer to
*that* question should definitely go into the commit message.

So I hope you can catch my drift: this code needs very good explanation
because no one can take a look into your brain and read it out from
there. So please, let's give that commit message another try.
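
For readers following the off-by-one under discussion, here is a
minimal userspace sketch. It is not the patch itself: struct pgt_buf,
fits_strict(), fits_exclusive() and alloc_frames() are hypothetical
names, and the model only assumes what the quoted changelog says,
namely that the "top" value is the first address that must not be
used. With an exclusive bound like that, a check using ">=" refuses
the allocation that would consume the last usable page, while ">"
allows it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: page frames in [end, top) are still free.
 * 'top' is exclusive, i.e. the first frame that must NOT be handed out.
 */
struct pgt_buf {
	unsigned long end;	/* next free frame */
	unsigned long top;	/* exclusive upper bound */
};

/* Too strict: rejects a request whose new end lands exactly on 'top'. */
static bool fits_strict(const struct pgt_buf *b, unsigned long num)
{
	return !(b->end + num >= b->top);
}

/* Exclusive-bound check: only rejects a request that goes past 'top'. */
static bool fits_exclusive(const struct pgt_buf *b, unsigned long num)
{
	return !(b->end + num > b->top);
}

/*
 * Hand out 'num' frames. Returns the first frame, or 0 on failure
 * (frame 0 is never part of the buffer in this model).
 */
static unsigned long alloc_frames(struct pgt_buf *b, unsigned long num)
{
	unsigned long first;

	assert(b->end <= b->top);
	if (!fits_exclusive(b, num))
		return 0;

	first = b->end;
	b->end += num;
	return first;
}
```

With end = 10 and top = 12, a two-frame request passes the exclusive
check and leaves end == top, whereas the strict check would turn it
down and frame 11 would stay unused.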

Thanks.

-- 
Regards/Gruss,
Boris.


* Re: [PATCH v7u1 04/31] x86, 64bit, mm: add generic kernel/ident mapping helper
  2013-01-04 22:19     ` Yinghai Lu
@ 2013-01-05 13:21       ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-05 13:21 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Fri, Jan 04, 2013 at 02:19:17PM -0800, Yinghai Lu wrote:
> On Fri, Jan 4, 2013 at 1:19 PM, Borislav Petkov <bp@alien8.de> wrote:
> > On Thu, Jan 03, 2013 at 04:48:24PM -0800, Yinghai Lu wrote:
> >> +int kernel_mapping_init(pgd_t *pgd_page, unsigned long addr, unsigned long end)
> >> +{
> >> +     struct x86_mapping_info info = {
> >> +             .alloc_pgt_page = alloc_pgt_page,
> >> +             .pmd_flag       = __PAGE_KERNEL_LARGE,
> >> +             .kernel_mapping = true,
> >> +     };
> >> +
> >> +     return kernel_ident_mapping_init(&info, pgd_page, addr, end);
> >
> > This patch looks good so far except this:
> > kernel_ident_mapping_init says it initializes ident mapping but
> > this is wrong and the type of mapping is actually controlled by
> > info.kernel_mapping.
> 
> it is not wrong, and it could do two things.
> kernel_mapping
> ident_mapping

I know that. But it has kernel_ident_mapping in the name although it can
do both. IMO, it should be like this:


int kernel_mapping_init_range(pgd_t *pgd_page, unsigned long addr, unsigned long end)

...
	return kernel_mapping_init(&info, pgd_page, addr, end);

so that your workhorse is kernel_mapping_init and it can do both kernel
and ident mapping and the wrapper is kernel_mapping_init_range() which
gets a range of (addr, end), preps the &info descriptor and calls the
workhorse.

See what I mean?
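
Spelled out as a standalone sketch, the proposed split could look
roughly like this. Everything here is a stand-in so that it compiles
outside the kernel: the pgd_t and struct x86_mapping_info definitions
are simplified assumptions, the pmd_flag value is a placeholder, and
the workhorse body only validates its arguments and records the
selected mode instead of building page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types; the real kernel definitions differ. */
typedef struct { unsigned long val; } pgd_t;

struct x86_mapping_info {
	void *(*alloc_pgt_page)(void *context);
	void *context;
	unsigned long pmd_flag;
	bool kernel_mapping;	/* true: kernel mapping, false: ident mapping */
};

/*
 * Workhorse: serves both mapping types, selected by info->kernel_mapping.
 * Placeholder body: checks the range and records the mode for the demo.
 */
static int kernel_mapping_init(const struct x86_mapping_info *info,
			       pgd_t *pgd_page,
			       unsigned long addr, unsigned long end)
{
	assert(info != NULL && pgd_page != NULL);
	if (addr >= end)
		return -1;

	pgd_page->val = info->kernel_mapping ? 1 : 0;
	return 0;
}

/* Wrapper: takes only the range, preps the descriptor, calls the workhorse. */
static int kernel_mapping_init_range(pgd_t *pgd_page,
				     unsigned long addr, unsigned long end)
{
	struct x86_mapping_info info = {
		.pmd_flag	= 0,	/* placeholder for __PAGE_KERNEL_LARGE */
		.kernel_mapping	= true,
	};

	return kernel_mapping_init(&info, pgd_page, addr, end);
}
```

The point of the naming is that a caller of the wrapper never sees
"ident" in a function that is asked for a kernel mapping.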

> > So this function which gets &info, etc should be called
> > kernel_mapping_init, AFAICT. And wrt the one wrapping
> > kernel_ident_mapping_init, I can't seem to find where it is called.
> > What's up?
> 
> this kernel_mapping_init is for -v8 ..., should be dropped if -v7 is
> used at last.

Well, this *DEFINITELY* should be in the commit message so that
reviewers don't go crazy looking for functions used in other branches.
Maan, this was absolutely insane.

-- 
Regards/Gruss,
Boris.


* Re: [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
  2013-01-04 22:04     ` Yinghai Lu
@ 2013-01-05 13:24       ` Borislav Petkov
  2013-01-10  1:26         ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-05 13:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Fri, Jan 04, 2013 at 02:04:05PM -0800, Yinghai Lu wrote:
> On Fri, Jan 4, 2013 at 1:01 PM, Borislav Petkov <bp@alien8.de> wrote:
> > On Thu, Jan 03, 2013 at 04:48:40PM -0800, Yinghai Lu wrote:
> >>  static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
> >>  {
> >> +     struct x86_mapping_info info = {
> >> +             .alloc_pgt_page = alloc_pgt_page,
> >> +             .context        = image,
> >> +             .pmd_flag       = __PAGE_KERNEL_LARGE_EXEC,
> >> +     };
> >
> > This is leaving ->kernel_mapping uninitialized to contain a random,
> > previous stack value. I don't think we want that.
> 
> that should be initialized to false by default.

So make it explicit. You can't possibly rely on what the stack contains
when you allocate that struct there.
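
For reference, a small standalone sketch of the two spellings. The
struct below is a simplified stand-in, and dummy_alloc(),
make_info_implicit(), make_info_explicit() and check_defaults() are
hypothetical names. As far as the C standard goes, members left out
of a brace-enclosed initializer are initialized as if the object had
static storage duration, so an omitted bool member reads as false
rather than as leftover stack contents. Writing the member out
changes nothing at run time; it only documents the intent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the structure discussed in this thread. */
struct x86_mapping_info {
	void *(*alloc_pgt_page)(void *context);
	void *context;
	unsigned long pmd_flag;
	bool kernel_mapping;
};

/* Hypothetical allocator callback; just echoes its context. */
static void *dummy_alloc(void *context)
{
	return context;
}

/* kernel_mapping is omitted: C initializes it to false implicitly. */
static struct x86_mapping_info make_info_implicit(void *image)
{
	struct x86_mapping_info info = {
		.alloc_pgt_page	= dummy_alloc,
		.context	= image,
		.pmd_flag	= 0x1,	/* placeholder flag value */
	};

	return info;
}

/* Same descriptor with the default spelled out for the reader. */
static struct x86_mapping_info make_info_explicit(void *image)
{
	struct x86_mapping_info info = {
		.alloc_pgt_page	= dummy_alloc,
		.context	= image,
		.pmd_flag	= 0x1,	/* placeholder flag value */
		.kernel_mapping	= false,
	};

	return info;
}

/* Self-check: both forms yield kernel_mapping == false. */
static void check_defaults(void)
{
	int image = 0;

	assert(!make_info_implicit(&image).kernel_mapping);
	assert(!make_info_explicit(&image).kernel_mapping);
}
```

Note that this guarantee covers the members only; padding bytes
inside the struct are a separate matter.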

-- 
Regards/Gruss,
Boris.


* Re: [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early
  2013-01-04 22:13         ` Yinghai Lu
@ 2013-01-05 13:25           ` Borislav Petkov
  2013-01-07 12:40             ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-05 13:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Fri, Jan 04, 2013 at 02:13:11PM -0800, Yinghai Lu wrote:
> On Fri, Jan 4, 2013 at 1:04 PM, Borislav Petkov <bp@alien8.de> wrote:
> > On Fri, Jan 04, 2013 at 12:58:15PM -0800, Yinghai Lu wrote:
> >> more than that, that set_real_mode_permissions reference is wrong,
> >> actually it is set_real_mode.
> >
> > Huh, set_real_mode_permissions is the name of the function above which the
> > comment is located. There's no set_real_mode. What do you mean?
> 
> old comments is wrong.
> 
> setup_read_mode reserve from low ram under 1M and copy etc.
> 
> set_real_mode_permissions will change to +x etc....

Ok, let me shout it out to you, hopefully you can understand me now:

THERE ARE NO FUNCTIONS BY THE NAME setup_read_mode OR set_real_mode IN
YOUR BRANCH OR ANYWHERE IN THE KERNEL!!!

$ git log -p yinghai/for-x86-boot-v7 | grep -EriIn '(setup_read_mode|set_real_mode)\W'
$ git log -p yinghai/for-x86-boot-v8 | grep -EriIn '(setup_read_mode|set_real_mode)\W'
$

Or do you mean that the function naming is wrong? WTF?

-- 
Regards/Gruss,
Boris.


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-05  4:10                         ` Yinghai Lu
@ 2013-01-05 22:04                           ` Shuah Khan
  0 siblings, 0 replies; 199+ messages in thread
From: Shuah Khan @ 2013-01-05 22:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk, Joerg Roedel

On Fri, Jan 4, 2013 at 9:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 4, 2013 at 6:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>> I applied your patch to 3.6.11 and changed the panic() to pr_info()
>> and also changed enough_mem_for_swiotlb() to always return false to
>> simulate not enough memory condition as this system does have enough
>> memory.
>>
>> So at least on this AMD system, your patch will result in a panic.
>
> ok, thanks for testing.
>
> if enough_mem_for_swiotlb() return false really,  allocating buffer
> for swiotlb with bootmem would panic already, right?
>
> so this patch just delay the panic a while for AMD system with
> unhandled devices by IOMMU.
>
> Thanks
>
> Yinghai

Right. It will eventually panic. I think this is not a valid test. I
am planning to run more tests without forcing the no-memory condition,
which is what I should have done in the first place. I will let you
know what I find, very likely Monday.

Thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early
  2013-01-05 13:25           ` Borislav Petkov
@ 2013-01-07 12:40             ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-07 12:40 UTC (permalink / raw)
  To: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Eric W. Biederman, Andrew Morton, Jan Kiszka, Jason Wessel,
	linux-kernel

On Sat, Jan 05, 2013 at 02:25:46PM +0100, Borislav Petkov wrote:
> Or do you mean that the function naming is wrong? WTF?

Ok, I think I can guess what you mean: that's setup_real_mode which
allocates the real_mode_blob memory. Right?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-04 22:10         ` Yinghai Lu
  2013-01-04 22:26           ` Shuah Khan
@ 2013-01-07 15:26           ` Konrad Rzeszutek Wilk
  2013-01-07 17:02             ` Shuah Khan
  2013-01-07 20:32             ` Yinghai Lu
  1 sibling, 2 replies; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-07 15:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Shuah Khan, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
> > Panicking the system doesn't sound like a good option to me in this
> > case. This change to disable swiotlb is made for kdump. However, with
> > this change several systems fail to boot, unless crashkernel_low=72M is
> > specified.
> 
> this patchset is a new feature to put the second kdump kernel above 4G.
> 
> >
> > I would say the right approach to solve this would be to not
> > change the current pci_swiotlb_detect_override() behavior and treat
> > swiotlb=1 upon entry as equivalent to swiotlb_force being set.
> 
> that would force Intel systems to take crashkernel_low=72M too;
> otherwise they will panic during swiotlb allocation.

Two things:

 1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
    that the function needs to go in a header file.
 2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
     b/c of crashkernel_low=72M (which I am not seeing in v3.8 kernel-parameters.txt?
     Is that part of your mega-patchset?). Anyhow, there seems to be a disconnect -
     what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
     should also parse the bootparams to double-check that there is enough
     low-mem space? But then if the kernel grows then 72M might not be enough -
     you might need 82M with 3.9.

     Perhaps a better way for this is to do:
	1). Change 'is_enough' to check only for 4MB.
	2). When booting as kexec, the SWIOTLB would only use 4MB instead of 64MB?

     Or, we could also use the post-late SWIOTLB initialization similarly to how it was
     done on ia64. This would mean that the AMD VI code would just call the
     .. something like this - NOT tested or even compile tested:

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c1c74e0..e7fa8f7 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
 	if (unhandled && max_pfn > MAX_DMA32_PFN) {
 		/* There are unhandled devices - initialize swiotlb for them */
 		swiotlb = 1;
+		/* Late (so no bootmem allocator) usage and only if the early SWIOTLB
+ 		 * hadn't been allocated (which can happen on kexec kernels booted
+ 		 * above 4GB). */
+		if (!swiotlb_nr_tbl()) {
+			int retry = 3;
+			int mb_size = 64;
+			int rc = 0;
+retry_me:
+			if (retry < 0)
+				panic("We tried setting %dMB for SWIOTLB but got -ENOMEM", mb_size << 1);
+			rc = swiotlb_late_init_with_default_size(mb_size * (1<<20));
+			if (rc) {
+				retry--;
+				mb_size >>= 1;
+				goto retry_me;
+			}
+ 			dma_ops = &swiotlb_dma_ops;
+		}
 	}
 
 	amd_iommu_stats_init();

And then the early SWIOTLB initialization for 64MB can fail and we are still OK.
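
The retry-and-halve idea in the diff above can be modelled outside the kernel. What follows is a minimal userspace sketch, not kernel code: `init_with_backoff`, `late_init_fn`, `fake_late_init` and `fake_limit_bytes` are hypothetical names made up for illustration, and the allocator is a stand-in that merely pretends to succeed for requests up to a configurable limit. It assumes the intent is to start at 64MB, halve the request after every failure, and give up after a fixed number of retries.

```c
#include <stddef.h>

/* Hypothetical allocator callback: returns 0 on success, non-zero on failure. */
typedef int (*late_init_fn)(size_t bytes);

/*
 * Call late_init with start_mb megabytes, halving the request after each
 * failure. At most (retries + 1) attempts are made. Returns the size in MB
 * that succeeded, or 0 if every attempt failed.
 */
unsigned int init_with_backoff(late_init_fn late_init,
			       unsigned int start_mb, int retries)
{
	unsigned int mb = start_mb;

	for (; retries >= 0 && mb > 0; retries--, mb >>= 1) {
		if (late_init((size_t)mb << 20) == 0)
			return mb;
	}
	return 0;
}

/* Stand-in allocator (illustration only): requests up to the limit succeed. */
size_t fake_limit_bytes = (size_t)16 << 20;

int fake_late_init(size_t bytes)
{
	return bytes <= fake_limit_bytes ? 0 : -1;
}
```

With a 16MB limit, `init_with_backoff(fake_late_init, 64, 3)` tries 64MB and 32MB, fails both, and succeeds at 16MB. A caller that gets 0 back has exhausted its retries, which is the point where the diff above would panic.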
> 
> Thanks
> 
> Yinghai

^ permalink raw reply related	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 05/31] x86, 64bit: copy zero-page early
  2013-01-04  0:48 ` [PATCH v7u1 05/31] x86, 64bit: copy zero-page early Yinghai Lu
@ 2013-01-07 15:53   ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-07 15:53 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Alexander Duyck, Fenghua Yu

On Thu, Jan 03, 2013 at 04:48:25PM -0800, Yinghai Lu wrote:
> real_mode_data aka zero-page could be above 4g.

I think this could be more informative if it said 'struct boot_params'
instead of real_mode_data because real_mode_data is the argument name
passed to the respective function and grepping for struct boot_params
actually gives you the struct definition with the helpful comments and
offsets in arch/x86/include/uapi/asm/bootparam.h.

> We will have #PF handler to set page table for not accessible ram
> early, but could limit it before x86_64_start_reservations to limit
> the change to native path.
> 
> Also we will need to ramdisk info in zero-page to access microcode
		   s/to/the/				  ^
							the

> blob in ramdisk in x86_64_start_kernel, so copy zero-page early make
								  makes
> it accessing ramdisk info simple.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly
  2013-01-04  0:48 ` [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly Yinghai Lu
  2013-01-04 17:18   ` Sakkinen, Jarkko
@ 2013-01-07 15:54   ` Borislav Petkov
  1 sibling, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-07 15:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Jarkko Sakkinen

On Thu, Jan 03, 2013 at 04:48:26PM -0800, Yinghai Lu wrote:
> with #PF handler way to set early page table, level3_ident will go away with
> 64bit native path.
> 
> So just use entries in init_level4_pgt to set them in tramopline_pgt

s/tramopline_pgt/trampoline_pgd/.

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
> ---
>  arch/x86/realmode/init.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index b96fe6f..384b3f4 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -78,8 +78,8 @@ void __init setup_real_mode(void)
>  	*trampoline_cr4_features = read_cr4();
>  
>  	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
> -	trampoline_pgd[0] = __pa_symbol(level3_ident_pgt) + _KERNPG_TABLE;
> -	trampoline_pgd[511] = __pa_symbol(level3_kernel_pgt) + _KERNPG_TABLE;
> +	trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
> +	trampoline_pgd[511] = init_level4_pgt[511].pgd;
>  #endif
>  }
>  
> -- 
> 1.7.10.4
> 
> 
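
To make the index arithmetic in the hunk above concrete, here is a small userspace sketch of how a top-level page-table index is derived from a virtual address with 4-level paging. The `sketch_` names are hypothetical, and the two addresses used in the note below (0xffff880000000000 for the direct mapping and 0xffffffff80000000 for the kernel text mapping) are assumptions about the x86-64 layout of kernels from this period, not values taken from the patch.

```c
#include <stdint.h>

#define SKETCH_PGDIR_SHIFT	39	/* each top-level entry covers 512GB */
#define SKETCH_PTRS_PER_PGD	512	/* 9 index bits per level */

/* Top-level page-table slot that covers the given virtual address. */
unsigned int sketch_pgd_index(uint64_t vaddr)
{
	return (unsigned int)((vaddr >> SKETCH_PGDIR_SHIFT) &
			      (SKETCH_PTRS_PER_PGD - 1));
}
```

Under those assumptions the direct-mapping base lands in slot 272 and the kernel text mapping in slot 511, which is why the patch can copy one entry by `pgd_index(__PAGE_OFFSET)` and the other by the literal index 511.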

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 07/31] x86, realmode: Separate real_mode reserve and setup
  2013-01-04  0:48 ` [PATCH v7u1 07/31] x86, realmode: Separate real_mode reserve and setup Yinghai Lu
  2013-01-04 17:18   ` Sakkinen, Jarkko
@ 2013-01-07 15:54   ` Borislav Petkov
  1 sibling, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-07 15:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Jarkko Sakkinen

On Thu, Jan 03, 2013 at 04:48:27PM -0800, Yinghai Lu wrote:
> After we switch to use #PF handler help to set page table, init_level4_pgt
> will only have entries set after init_mem_mapping.
> We need to move copying init_level4_pgt to trampoline_pgd after that.
> 
> So split reserve and setup, and move the setup after init_mem_mapping()
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
> ---
>  arch/x86/include/asm/realmode.h |    3 ++-
>  arch/x86/kernel/setup.c         |    4 +++-
>  arch/x86/realmode/init.c        |   30 +++++++++++++++++++-----------
>  3 files changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
> index fe1ec5b..9c6b890 100644
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -58,6 +58,7 @@ extern unsigned char boot_gdt[];
>  extern unsigned char secondary_startup_64[];
>  #endif
>  
> -extern void __init setup_real_mode(void);
> +void reserve_real_mode(void);
> +void setup_real_mode(void);
>  
>  #endif /* _ARCH_X86_REALMODE_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 81ea5a5..01b22d0 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -913,10 +913,12 @@ void __init setup_arch(char **cmdline_p)
>  	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
>  			(max_pfn_mapped<<PAGE_SHIFT) - 1);
>  
> -	setup_real_mode();
> +	reserve_real_mode();
>  
>  	init_mem_mapping();
>  
> +	setup_real_mode();
> +
>  	memblock.current_limit = get_max_mapped();
>  	dma_contiguous_reserve(0);
>  
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 384b3f4..3baae96 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -8,9 +8,26 @@
>  struct real_mode_header *real_mode_header;
>  u32 *trampoline_cr4_features;
>  
> -void __init setup_real_mode(void)
> +void __init reserve_real_mode(void)
>  {
>  	phys_addr_t mem;
> +	unsigned char *base;
> +	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
> +
> +	/* Has to be in very low memory so we can execute real-mode AP code. */

While we're at it, can we change the comment to say "has to be in the
first megabyte..." to be more precise? "very low" is not very telling
:-).

Thanks.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table
  2013-01-04  0:48 ` [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table Yinghai Lu
@ 2013-01-07 15:55   ` Borislav Petkov
  2013-01-10  1:56     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-07 15:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:28PM -0800, Yinghai Lu wrote:
> From: "H. Peter Anvin" <hpa@zytor.com>
> 
> two use cases:
> 1. We will support load and run kernel above 4G, and zero_page, ramdisk
>    will be above 4G, too
> 2. need to access ramdisk early to get microcode to update that as
>    early possible.
> 
> We could use early_iomap to access them, but it will make code to
								 too
> messy and hard to unified with 32bit.
		    s/unified/unify/

> 
> So here comes #PF handler to set page page.
> 
> When #PF happen, handler will use pages in __initdata to set page page

"When a page fault happens, the handler will use pages from __initdata
to cover the accessed page."

> to cover accessed page.
> 
> those code and page in __INIT sections, so will not increase ram usages.

Huh, what? Something is in __INIT and will not increase RAM usage?

> The good point is: with help of #PF handler, we can set kernel mapping
> from blank, and switch to init_level4_pgt later.

I think you want to say "we can create temporary, ad-hoc kernel mappings
and forget them later by switching to init_level4_pgt." ?

> switchover in head_64.S is only using three page to handle kernel
> crossing 1G, 512G with shareing page, most insteresting part.

Again, what?

> early_make_pgtable is using kernel high mapping address to access pages
> to set page table.
> 
> -v4: Add phys_base offset to make kexec happy, and add
> 	init_mapping_kernel()   - Yinghai
> -v5: fix compiling with xen, and add back ident level3 and level2 for xen
>      also move back init_level4_pgt from BSS to DATA again.
>      because we have to clear it anyway.  - Yinghai
> -v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
> -v7: remove not needed clear_page for init_level4_page
>      it is with fill 512,8,0 already in head_64.S  - Yinghai
> -v8: we need to keep that handler alive until init_mem_mapping and don't
>      let early_trap_init to trash that early #PF handler.
>      So split early_trap_pf_init out and move it down. - Yinghai
> -v9: switchover only cover kernel space instead of 1G so could avoid
>      touch possible mem holes. - Yinghai
> -v11: change far jmp back to far return to initial_code, that is needed
>      to fix failure that is reported by Konrad on AMD system.  - Yinghai

Those -vXX version lines need to go under the "---" line. Alternatively,
you might want to add some of them to the commit message with a proper
explanation since they are not that trivial at a first glance, for
example the -v5, -v6, -v8, -v9 with a better explanation.

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

This needs hpa's S-O-B.

[ … ]

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 12/31] x86: add get_ramdisk_image/size()
  2013-01-04  0:48 ` [PATCH v7u1 12/31] x86: add get_ramdisk_image/size() Yinghai Lu
@ 2013-01-07 15:56   ` Borislav Petkov
  2013-01-10  1:53     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-07 15:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:32PM -0800, Yinghai Lu wrote:
> There are several places to find ramdisk information early for reserving
> and relocating.
> 
> Use functions to make code more readable and consistent.
> 
> Later will add ext_ramdisk_image/size in those functions to support
> loading ramdisk above 4g.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/kernel/setup.c |   29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 1b8a8cc..644a123 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -294,12 +294,25 @@ static void __init reserve_brk(void)
>  
>  #ifdef CONFIG_BLK_DEV_INITRD
>  
> +static u64 __init get_ramdisk_image(void)
> +{
> +	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
> +
> +	return ramdisk_image;

just do

	return (u64)boot_params.hdr.ramdisk_image;

> +}
> +static u64 __init get_ramdisk_size(void)
> +{
> +	u64 ramdisk_size = boot_params.hdr.ramdisk_size;

ditto.

> +
> +	return ramdisk_size;
> +}
> +
>  #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
>  static void __init relocate_initrd(void)
>  {
>  	/* Assume only end is not page aligned */
> -	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
> -	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
> +	u64 ramdisk_image = get_ramdisk_image();
> +	u64 ramdisk_size  = get_ramdisk_size();
>  	u64 area_size     = PAGE_ALIGN(ramdisk_size);
>  	u64 ramdisk_here;
>  	unsigned long slop, clen, mapaddr;
> @@ -338,8 +351,8 @@ static void __init relocate_initrd(void)
>  		ramdisk_size  -= clen;
>  	}
>  
> -	ramdisk_image = boot_params.hdr.ramdisk_image;
> -	ramdisk_size  = boot_params.hdr.ramdisk_size;
> +	ramdisk_image = get_ramdisk_image();
> +	ramdisk_size  = get_ramdisk_size();
>  	printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
>  		" [mem %#010llx-%#010llx]\n",
>  		ramdisk_image, ramdisk_image + ramdisk_size - 1,
> @@ -363,8 +376,8 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
>  static void __init early_reserve_initrd(void)
>  {
>  	/* Assume only end is not page aligned */
> -	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
> -	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
> +	u64 ramdisk_image = get_ramdisk_image();
> +	u64 ramdisk_size  = get_ramdisk_size();
>  	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
>  
>  	if (!boot_params.hdr.type_of_loader ||
> @@ -376,8 +389,8 @@ static void __init early_reserve_initrd(void)
>  static void __init reserve_initrd(void)
>  {
>  	/* Assume only end is not page aligned */
> -	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
> -	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
> +	u64 ramdisk_image = get_ramdisk_image();
> +	u64 ramdisk_size  = get_ramdisk_size();
>  	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
>  	u64 mapped_size;
>  
> -- 
> 1.7.10.4
> 
> 
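
Since the commit message says ext_ramdisk_image/size will be folded into these helpers later, here is a hedged userspace sketch of what that might look like. It assumes the ext_* fields carry the upper 32 bits of the address and size; `struct sketch_boot_params` is a trimmed-down stand-in for the real boot_params, and the `sketch_` function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for boot_params: only the fields this sketch needs. */
struct sketch_boot_params {
	uint32_t ramdisk_image;		/* low 32 bits of the address */
	uint32_t ramdisk_size;		/* low 32 bits of the size */
	uint32_t ext_ramdisk_image;	/* assumed: high 32 bits of the address */
	uint32_t ext_ramdisk_size;	/* assumed: high 32 bits of the size */
};

/* Combine the two 32-bit halves into one 64-bit physical address. */
uint64_t sketch_get_ramdisk_image(const struct sketch_boot_params *bp)
{
	return (uint64_t)bp->ramdisk_image |
	       ((uint64_t)bp->ext_ramdisk_image << 32);
}

/* Same combination for the size. */
uint64_t sketch_get_ramdisk_size(const struct sketch_boot_params *bp)
{
	return (uint64_t)bp->ramdisk_size |
	       ((uint64_t)bp->ext_ramdisk_size << 32);
}
```

Keeping every reader behind one accessor means callers such as relocate_initrd() and reserve_initrd() would not need to change when the high halves are added, which appears to be the motivation for the patch.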

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 13/31] x86, boot: add get_cmd_line_ptr()
  2013-01-04  0:48 ` [PATCH v7u1 13/31] x86, boot: add get_cmd_line_ptr() Yinghai Lu
@ 2013-01-07 15:56   ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-07 15:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Gokul Caushik, Josh Triplett, Joe Millenbach, Alexander Duyck

On Thu, Jan 03, 2013 at 04:48:33PM -0800, Yinghai Lu wrote:
> later will check ext_cmd_line_ptr at the same time.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Gokul Caushik <caushik1@gmail.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Joe Millenbach <jmillenbach@gmail.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  arch/x86/boot/compressed/cmdline.c |   10 ++++++++--
>  arch/x86/kernel/head64.c           |   13 +++++++++++--
>  2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
> index 10f6b11..b4c913c 100644
> --- a/arch/x86/boot/compressed/cmdline.c
> +++ b/arch/x86/boot/compressed/cmdline.c
> @@ -13,13 +13,19 @@ static inline char rdfs8(addr_t addr)
>  	return *((char *)(fs + addr));
>  }
>  #include "../cmdline.c"
> +static unsigned long get_cmd_line_ptr(void)
> +{
> +	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
> +
> +	return cmd_line_ptr;

	return (unsigned long)real_mode->hdr.cmd_line_ptr;

should suffice.

> +}
>  int cmdline_find_option(const char *option, char *buffer, int bufsize)
>  {
> -	return __cmdline_find_option(real_mode->hdr.cmd_line_ptr, option, buffer, bufsize);
> +	return __cmdline_find_option(get_cmd_line_ptr(), option, buffer, bufsize);
>  }
>  int cmdline_find_option_bool(const char *option)
>  {
> -	return __cmdline_find_option_bool(real_mode->hdr.cmd_line_ptr, option);
> +	return __cmdline_find_option_bool(get_cmd_line_ptr(), option);
>  }
>  
>  #endif
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index c463725..316e7b2 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -111,13 +111,22 @@ static void __init clear_bss(void)
>  	       (unsigned long) __bss_stop - (unsigned long) __bss_start);
>  }
>  
> +static unsigned long get_cmd_line_ptr(void)
> +{
> +	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
> +
> +	return cmd_line_ptr;

ditto.

> +}
> +
>  static void __init copy_bootdata(char *real_mode_data)
>  {
>  	char * command_line;
> +	unsigned long cmd_line_ptr;
>  
>  	memcpy(&boot_params, real_mode_data, sizeof boot_params);
> -	if (boot_params.hdr.cmd_line_ptr) {
> -		command_line = __va(boot_params.hdr.cmd_line_ptr);
> +	cmd_line_ptr = get_cmd_line_ptr();
> +	if (cmd_line_ptr) {
> +		command_line = __va(cmd_line_ptr);
>  		memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
>  	}
>  }
> -- 
> 1.7.10.4
> 
> 

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 14/31] x86, boot: move checking of cmd_line_ptr out of common path
  2013-01-04  0:48 ` [PATCH v7u1 14/31] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
@ 2013-01-07 16:00   ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-07 16:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:34PM -0800, Yinghai Lu wrote:
> cmdline.c::__cmdline_find_option... are shared between 16-bit setup code
> and 32/64 bit decompressor code.
> 
> for 32/64 only path via kexec, we should not check if ptr is less 1M.
> as those cmdline could be put above 1M, or even 4G.
> 
> Move out accessible checking out of __cmdline_find_option()
> So decompressor in misc.c can parse cmdline correctly.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/boot/boot.h    |   14 ++++++++++++--
>  arch/x86/boot/cmdline.c |    8 ++++----
>  2 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
> index 18997e5..7fadf80 100644
> --- a/arch/x86/boot/boot.h
> +++ b/arch/x86/boot/boot.h
> @@ -289,12 +289,22 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
>  int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
>  static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
>  {
> -	return __cmdline_find_option(boot_params.hdr.cmd_line_ptr, option, buffer, bufsize);
> +	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;

This check could very well use a comment explaining why we require the
pointer to be under 1MB, even though the original code didn't have one.

> +	if (cmd_line_ptr >= 0x100000)
> +		return -1;      /* inaccessible */
> +
> +	return __cmdline_find_option(cmd_line_ptr, option, buffer, bufsize);
>  }
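
As an illustration of the kind of explanation such a comment could give, here is a userspace sketch. It assumes the 16-bit setup code reaches the command line through a real-mode segment:offset pair, so the segment value (address shifted right by 4) must fit in 16 bits. All `sketch_` and `SKETCH_` names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_REAL_MODE_LIMIT	0x100000u	/* 1MB */

/*
 * A real-mode segment register is 16 bits wide and is scaled by 16, so
 * only addresses below 1MB have a segment value that fits.
 * Returns 1 if the pointer is reachable that way, 0 otherwise.
 */
int sketch_cmdline_reachable(uint32_t cmd_line_ptr)
{
	return cmd_line_ptr < SKETCH_REAL_MODE_LIMIT;
}

/* Segment part of a reachable pointer. */
uint16_t sketch_cmdline_segment(uint32_t cmd_line_ptr)
{
	return (uint16_t)(cmd_line_ptr >> 4);
}

/* Offset part: the low four bits left over after the segment shift. */
uint16_t sketch_cmdline_offset(uint32_t cmd_line_ptr)
{
	return (uint16_t)(cmd_line_ptr & 0xfu);
}
```

The 32/64-bit decompressor is not bound by this limit, which is why the patch moves the check out of the shared __cmdline_find_option() and into the 16-bit-only wrapper.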

[ … ]

Thanks.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-07 15:26           ` Konrad Rzeszutek Wilk
@ 2013-01-07 17:02             ` Shuah Khan
  2013-01-07 19:29               ` Konrad Rzeszutek Wilk
  2013-01-08  2:22               ` Eric W. Biederman
  2013-01-07 20:32             ` Yinghai Lu
  1 sibling, 2 replies; 199+ messages in thread
From: Shuah Khan @ 2013-01-07 17:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Mon, Jan 7, 2013 at 8:26 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>> > Panicking the system doesn't sound like a good option to me in this
>> > case. This change to disable swiotlb is made for kdump. However, with
>> > this change several systems fail to boot, unless crashkernel_low=72M is
>> > specified.
>>
>> this patchset is a new feature to put the second kdump kernel above 4G.
>>
>> >
>> > I would say the right approach to solve this would be to not
>> > change the current pci_swiotlb_detect_override() behavior and treat
>> > swiotlb=1 upon entry as equivalent to swiotlb_force being set.
>>
>> that would force Intel systems to take crashkernel_low=72M too;
>> otherwise they will panic during swiotlb allocation.
>
> Two things:
>
>  1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
>     that the function needs to go in a header file.
>  2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
>      b/c of crashkernel_low=72M (which I am not seeing in v3.8 kernel-parameters.txt?
>      Is that part of your mega-patchset?). Anyhow, there seems to be a disconnect -
>      what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
>      should also parse the bootparams to double-check that there is enough
>      low-mem space? But then if the kernel grows then 72M might not be enough -
>      you might need 82M with 3.9.
>
>      Perhaps a better way for this is to do:
>         1). Change 'is_enough' to check only for 4MB.
>         2). When booting as kexec, the SWIOTLB would only use 4MB instead of 64MB?
>
>      Or, we could also use the post-late SWIOTLB initialization similarly to how it was
>      done on ia64. This would mean that the AMD VI code would just call the
>      .. something like this - NOT tested or even compile tested:
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index c1c74e0..e7fa8f7 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
>         if (unhandled && max_pfn > MAX_DMA32_PFN) {
>                 /* There are unhandled devices - initialize swiotlb for them */
>                 swiotlb = 1;
> +               /* Late (so no bootmem allocator) usage and only if the early SWIOTLB
> +                * hadn't been allocated (which can happen on kexec kernels booted
> +                * above 4GB). */
> +               if (!swiotlb_nr_tbl()) {
> +                       int retry = 3;
> +                       int mb_size = 64;
> +                       int rc = 0;
> +retry_me:
> +                       if (retry < 0)
> +                               panic("We tried setting %dMB for SWIOTLB but got -ENOMEM", mb_size << 1);
> +                       rc = swiotlb_late_init_with_default_size(mb_size * (1<<20));
> +                       if (rc) {
> +                               retry--;
> +                               mb_size >>= 1;
> +                               goto retry_me;
> +                       }
> +                       dma_ops = &swiotlb_dma_ops;
> +               }
>         }
>
>         amd_iommu_stats_init();
>
> And then the early SWIOTLB initialization for 64MB can fail and we are still OK.
>>

Yinghai/Konrad,

Did more testing. btw this patch depends on your [v7u1,25/31]
memblock: add memblock_mem_size(). Here are the test results:

1. When there is not enough memory: (enough_mem_for_swiotlb() returns false)
system will panic in amd_iommu_init_dma_ops().

2. When there is enough memory: (enough_mem_for_swiotlb() returns true):
swiotlb is reserved
pci_swiotlb_late_init() leaves the buffer allocated since swiotlb=1
with that getting changed in amd_iommu_init_dma_ops().

I agree with Konrad that the logic should be wrapped in CONFIG_KEXEC.

Also, since IOMMU drivers can no longer assume swiotlb is allocated when
the enough_mem_for_swiotlb() check fails, the AMD IOMMU driver or any
other IOMMU driver can't simply set swiotlb=1 and assume the buffer
is there.

As Konrad suggested, a hook is needed; however, I think the logic to
ensure the swiotlb buffer exists belongs in the swiotlb modules. How about
changing the pci_swiotlb_late_init() logic to ensure swiotlb late init is
done, instead of leaving it up to the AMD IOMMU driver or some other driver?

The logic to update dma_ops doesn't really belong in
amd_iommu_init_dma_ops() anyways.

-- Shuah

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-07 17:02             ` Shuah Khan
@ 2013-01-07 19:29               ` Konrad Rzeszutek Wilk
  2013-01-08  2:22               ` Eric W. Biederman
  1 sibling, 0 replies; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-07 19:29 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

> Also, since IOMMU drivers can no longer assume swiotlb is allocated when
> the enough_mem_for_swiotlb() check fails, the AMD IOMMU driver or any
> other IOMMU driver can't simply set swiotlb=1 and assume the buffer
> is there.
> 
> As Konrad suggested, a hook is needed; however, I think the logic to
> ensure the swiotlb buffer exists belongs in the swiotlb modules. How about
> changing the pci_swiotlb_late_init() logic to ensure swiotlb late init is
> done, instead of leaving it up to the AMD IOMMU driver or some other driver?

Perhaps by having the 'swiotlb' be more than just on/off? It could
carry different flags, such as:

	EARLY_BOOTMEM_ON
		(if pci_swiotlb_detect_4gb or pci_swiotlb_detect_override sets it)
	EARLY_BOOTMEM_OFF
		this would replace 'swiotlb=0'
	FORCE
		replaces 'swiotlb_force'
	LATE_INIT_ON
		this new option where pci_swiotlb_late_init()
		would call the late init.

That would require some tweaking also in the IA64 code, but seems
like a step in the right direction. And we can get rid of that
'swiotlb_force' parameter.

Actually, that would also allow us to get rid of the pci_swiotlb_detect_override
and the setup_io_tlb_npages function could now just set swiotlb |= SWIOTLB_FORCE;


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-07 15:26           ` Konrad Rzeszutek Wilk
  2013-01-07 17:02             ` Shuah Khan
@ 2013-01-07 20:32             ` Yinghai Lu
  2013-01-07 21:30               ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-07 20:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Shuah Khan, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Mon, Jan 7, 2013 at 7:26 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>> > Panicking the system doesn't sound like a good option to me in this
>> > case. This change to disable swiotlb is made for kdump. However, with
>> > this change several systems fail to boot, unless crashkernel_low=72M is
>> > specified.
>>
>> this patchset is a new feature to put the second kdump kernel above 4G.
>>
>> >
>> > I would say the right approach to solve this would be to not
>> > change the current pci_swiotlb_detect_override() behavior and treat
>> > swiotlb=1 upon entry as equivalent to swiotlb_force being set.
>>
>> that would force Intel systems to take crashkernel_low=72M too;
>> otherwise they will panic during swiotlb allocation.
>
> Two things:
>
>  1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
>     that the function needs to go in a header file.

kdump is just one case; a user can still boot the system with memmap=...
and hit the panic.

>  2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
>      b/c of crashkernel_low=72M (which I am not seeing in v3.8 kernel-parameters.txt?
>      Is that part of your mega-patchset?). Anyhow, there seems to be a disconnect -
>      what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
>      should also parse the bootparams to double-check that there is enough
>      low-mem space? But then if the kernel grows then 72M might not be enough -
>      you might need 82M with 3.9.

Again, this can also happen when memmap= includes or excludes ranges, so
parsing the boot command line is not going to handle all the cases.

>
>      Perhaps a better way for this is to do:
>         1). Change 'is_enough' to check only for 4MB.
>         2). When booting as kexec, the SWIOTLB would only use 4MB instead of 64MB?

We cannot tell that we came in through the kexec path; kexec just gets a
boot loader type assigned.

>
>      Or, we could also use the post-late SWIOTLB initialization similiary to how it was
>      done on ia64. This would mean that the AMD VI code would just call the
>      .. something like this - NOT tested or even compile tested:
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index c1c74e0..e7fa8f7 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
>         if (unhandled && max_pfn > MAX_DMA32_PFN) {
>                 /* There are unhandled devices - initialize swiotlb for them */
>                 swiotlb = 1;
> +               /* Late (so no bootmem allocator) usage and only if the early SWIOTLB
> +                * hadn't been allocated (which can happen on kexec kernels booted
> +                * above 4GB). */
> +               if (!swiotlb_nr_tbl()) {
> +                       int retry = 3;
> +                       int mb_size = 64;
> +                       int rc = 0;
> +retry_me:
> +                       if (retry < 0)
> +                               panic("We tried setting %dMB for SWIOTLB but got -ENOMEM", mb_size << 1);
> +                       rc = swiotlb_late_init_with_default_size(mb_size * (1<<20));
> +                       if (rc) {
> +                               retry--;
> +                               mb_size >>= 1;
> +                               goto retry_me;
> +                       }
> +                       dma_ops = &swiotlb_dma_ops;
> +               }
>         }
>
>         amd_iommu_stats_init();
>
> And then the early SWIOTLB initialization for 64MB can fail and we are still OK.

no, that is overkill.


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-07 20:32             ` Yinghai Lu
@ 2013-01-07 21:30               ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-07 21:30 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Shuah Khan, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 798 bytes --]

On Mon, Jan 7, 2013 at 12:32 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>  2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
>>      b/c of crashkernel_low=72M (which I am not seeing in v3.8 kernel-parameters.txt?
>>      Is that part of your mega-patchset?). Anyhow, there seems to be a disconnect -
>>      what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
>>      should also parse the bootparams to double-check that there is enough
>>      low-mem space? But then if the kernel grows then 72M might not be enough -
>>      you might need 82M with 3.9.
>
> Again, this can also happen when memmap= includes or excludes ranges, so
> parsing the boot command line is not going to handle all the cases.

The attached -v3 replaces the hardcoded 1M check; please take a look.

Thanks

Yinghai
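
For reference, the size probe in the attached -v3 can be followed as plain
arithmetic. This is a userspace sketch only: IO_TLB_SHIFT is taken from the
patch context, while PAGE_SHIFT = 12, IO_TLB_SEGSIZE = 128 and a 32KB
overflow buffer are assumed values, and low_ram_needed() is a hypothetical
stand-in for low_ram_size_for_swiotlb():

```c
/* Assumed constants; IO_TLB_SHIFT matches include/linux/swiotlb.h in the patch. */
#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define IO_TLB_SHIFT		11
#define IO_TLB_SEGSIZE		128UL
#define SWIOTLB_DEFAULT_SIZE	(64UL << 20)

/* Round x up to a multiple of a. */
#define ALIGN_UP(x, a)		((((x) + (a) - 1) / (a)) * (a))

/*
 * Bytes of low RAM swiotlb would ask for: the slab area plus the
 * overflow buffer, each rounded up to whole pages.  nslabs == 0
 * means "no size given on the command line", so use the default.
 */
static unsigned long low_ram_needed(unsigned long nslabs, unsigned long overflow)
{
	unsigned long bytes;

	if (!nslabs) {
		nslabs = SWIOTLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
		nslabs = ALIGN_UP(nslabs, IO_TLB_SEGSIZE);
	}

	bytes = ALIGN_UP(nslabs << IO_TLB_SHIFT, PAGE_SIZE);
	bytes += ALIGN_UP(overflow, PAGE_SIZE);

	return bytes;
}
```

Under those assumptions the default case works out to 64MB plus 32KB, which
is the amount of memory below 4G the check compares against.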

[-- Attachment #2: auto_switch_off_swiotlb_v3.patch --]
[-- Type: application/octet-stream, Size: 5014 bytes --]

Subject: [PATCH] x86: Don't enable swiotlb if there is not enough ram for it

Normal boot path on a system with iommu support:
the swiotlb buffer is allocated early, then we try to initialize the
iommu; if the intel or amd iommu can be set up properly, the swiotlb buffer
is freed.

The early allocation uses bootmem, and it panics when we try to use
kdump with a buffer only above 4G while swiotlb is enabled.

Actually the kernel can go on without swiotlb and use the intel iommu.

So try to disable swiotlb if there is not enough ram for it.

That is for kdump to use a kernel above 4G.

-v2: Shuah Khan <shuahkhan@gmail.com> pointed out that devices not handled
     by the AMD iommu still need swiotlb and will have a problem.
     In that case we have to panic, because swiotlb was not enabled
     earlier.
-v3: replace the hard-coded 1M low ram check with the probed low ram
     size that swiotlb needs.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>

---
 arch/x86/kernel/pci-swiotlb.c |   15 +++++++++++----
 drivers/iommu/amd_iommu.c     |    4 ++++
 include/linux/swiotlb.h       |    1 +
 lib/swiotlb.c                 |   21 ++++++++++++++++++++-
 4 files changed, 36 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/pci-swiotlb.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-swiotlb.c
+++ linux-2.6/arch/x86/kernel/pci-swiotlb.c
@@ -6,6 +6,7 @@
 #include <linux/swiotlb.h>
 #include <linux/bootmem.h>
 #include <linux/dma-mapping.h>
+#include <linux/memblock.h>
 
 #include <asm/iommu.h>
 #include <asm/swiotlb.h>
@@ -50,6 +51,12 @@ static struct dma_map_ops swiotlb_dma_op
 	.dma_supported = NULL,
 };
 
+static bool __init enough_mem_for_swiotlb(void)
+{
+	/* do we have enough low ram ? */
+	return memblock_mem_size(1ULL<<(32-PAGE_SHIFT)) >
+			 low_ram_size_for_swiotlb();
+}
 /*
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
  *
@@ -58,12 +65,12 @@ static struct dma_map_ops swiotlb_dma_op
  */
 int __init pci_swiotlb_detect_override(void)
 {
-	int use_swiotlb = swiotlb | swiotlb_force;
-
 	if (swiotlb_force)
 		swiotlb = 1;
+	else if (!enough_mem_for_swiotlb())
+		swiotlb = 0;
 
-	return use_swiotlb;
+	return swiotlb;
 }
 IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
 		  pci_xen_swiotlb_detect,
@@ -78,7 +85,7 @@ int __init pci_swiotlb_detect_4gb(void)
 {
 	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
 #ifdef CONFIG_X86_64
-	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
+	if (!no_iommu && max_pfn > MAX_DMA32_PFN && enough_mem_for_swiotlb())
 		swiotlb = 1;
 #endif
 	return swiotlb;
Index: linux-2.6/drivers/iommu/amd_iommu.c
===================================================================
--- linux-2.6.orig/drivers/iommu/amd_iommu.c
+++ linux-2.6/drivers/iommu/amd_iommu.c
@@ -3144,6 +3144,7 @@ int __init amd_iommu_init_dma_ops(void)
 {
 	struct amd_iommu *iommu;
 	int ret, unhandled;
+	int swiotlb_orig;
 
 	/*
 	 * first allocate a default protection domain for every IOMMU we
@@ -3166,12 +3167,15 @@ int __init amd_iommu_init_dma_ops(void)
 	prealloc_protection_domains();
 
 	iommu_detected = 1;
+	swiotlb_orig = swiotlb;
 	swiotlb = 0;
 
 	/* Make the driver finally visible to the drivers */
 	unhandled = device_dma_ops_init();
 	if (unhandled && max_pfn > MAX_DMA32_PFN) {
 		/* There are unhandled devices - initialize swiotlb for them */
+		if (!swiotlb_orig)
+			panic("can not enable swiotlb for unhandled devices by AMD iommu!\n");
 		swiotlb = 1;
 	}
 
Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -22,6 +22,7 @@ extern int swiotlb_force;
  */
 #define IO_TLB_SHIFT 11
 
+unsigned long low_ram_size_for_swiotlb(void);
 extern void swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -198,10 +198,29 @@ swiotlb_init_with_default_size(size_t de
 	swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose);
 }
 
+/* default to 64MB */
+#define SWIOTLB_DEFAULT_SIZE (64UL<<20)
 void __init
 swiotlb_init(int verbose)
 {
-	swiotlb_init_with_default_size(64 * (1<<20), verbose);	/* default to 64MB */
+	swiotlb_init_with_default_size(SWIOTLB_DEFAULT_SIZE, verbose);
+}
+
+unsigned long __init low_ram_size_for_swiotlb(void)
+{
+	unsigned long bytes;
+	unsigned long nslabs = io_tlb_nslabs;
+
+	if (!nslabs) {
+		nslabs = (SWIOTLB_DEFAULT_SIZE >> IO_TLB_SHIFT);
+		nslabs = ALIGN(nslabs, IO_TLB_SEGSIZE);
+	}
+
+	bytes = PAGE_ALIGN(nslabs << IO_TLB_SHIFT);
+
+	bytes += PAGE_ALIGN(io_tlb_overflow);
+
+	return bytes;
 }
 
 /*


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-07 17:02             ` Shuah Khan
  2013-01-07 19:29               ` Konrad Rzeszutek Wilk
@ 2013-01-08  2:22               ` Eric W. Biederman
  2013-01-08  2:48                 ` Konrad Rzeszutek Wilk
  2013-01-08  3:01                 ` Yinghai Lu
  1 sibling, 2 replies; 199+ messages in thread
From: Eric W. Biederman @ 2013-01-08  2:22 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Konrad Rzeszutek Wilk, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

Shuah Khan <shuahkhan@gmail.com> writes:

> On Mon, Jan 7, 2013 at 8:26 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
>>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>>> > Pani'cing the system doesn't sound like a good option to me in this
>>> > case. This change to disable swiotlb is made for kdump. However, with
>>> > this change several system fail to boot, unless crashkernel_low=72M is
>>> > specified.
>>>
>>> this patchset is new feature to put second kdump kernel above 4G.
>>>
>>> >
>>> > I would the say the right approach to solve this would be to not
>>> > change the current pci_swiotlb_detect_override() behavior and treat
>>> > swiotlb =1 upon entry equivalent to swiotlb_force set.
>>>
>>> that will make intel system have to take crashkernel_low=72M too.
>>> otherwise intel system will get panic during swiotlb allocation.
>>
>> Two things:
>>
>>  1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
>>     that the function needs to go in a header file.
>>  2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
>>      b/c of crashkernel_low=72M (which I am not seeing in v3.8 kernel-parameters.txt?
>>      Is that part of your mega-patchset?). Anyhow, there seems to be a disconnect -
>>      what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
>>      should also parse the bootparams to double-check that there is enough
>>      low-mem space? But then if the kernel grows then 72M might not be enough -
>>      you might need 82M with 3.9.
>>
>>      Perhaps a better way for this is to do:
>>         1). Change 'is_enough' to check only for 4MB.
>>         2). When booting as kexec, the SWIOTLB would only use 4MB instead of 64MB?
>>
>>      Or, we could also use the post-late SWIOTLB initialization similiary to how it was
>>      done on ia64. This would mean that the AMD VI code would just call the
>>      .. something like this - NOT tested or even compile tested:
>>
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index c1c74e0..e7fa8f7 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
>>         if (unhandled && max_pfn > MAX_DMA32_PFN) {
>>                 /* There are unhandled devices - initialize swiotlb for them */
>>                 swiotlb = 1;
>> +               /* Late (so no bootmem allocator) usage and only if the early SWIOTLB
>> +                * hadn't been allocated (which can happen on kexec kernels booted
>> +                * above 4GB). */
>> +               if (!swiotlb_nr_tbl()) {
>> +                       int retry = 3;
>> +                       int mb_size = 64;
>> +                       int rc = 0;
>> +retry_me:
>> +                       if (retry < 0)
>> +                               panic("We tried setting %dMB for SWIOTLB but got -ENOMEM", mb_size << 1);
>> +                       rc = swiotlb_late_init_with_default_size(mb_size * (1<<20));
>> +                       if (rc) {
>> +                               retry--;
>> +                               mb_size >>= 1;
>> +                               goto retry_me;
>> +                       }
>> +                       dma_ops = &swiotlb_dma_ops;
>> +               }
>>         }
>>
>>         amd_iommu_stats_init();
>>
>> And then the early SWIOTLB initialization for 64MB can fail and we are still OK.
>>>
>
> Yinghai/Konrad,
>
> Did more testing. btw this patch depends on your [v7u1,25/31]
> memblock: add memblock_mem_size(). Here are the test results:
>
> 1. When there is not enough memory: (enough_mem_for_swiotlb() returns false)
> system will panic in amd_iommu_init_dma_ops().
>
> 2. When there is enough memory: (enough_mem_for_swiotlb() returns true):
> swiotlb is reserved
> pci_swiotlb_late_init() leaves the buffer allocated since swiotlb=1
> with that getting changed in amd_iommu_init_dma_ops().
>
> I agree with Konrad that the logic should be wrapped in CONFIG_KEXEC.

If enough_mem_for_swiotlb needs to be conditional on CONFIG_KEXEC the
code is architected wrong.  None of this logic has anything to do with
kexec except that the kexec path is one way to get this condition to
happen.  Especially since the kexec'd kernel where this condition occurs
does not need kexec support built in.

Yinghai I sat down and read your patch and the approach you are taking
is totally wrong.

The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
fail without panic'ing the system.

Which leaves two valid approaches.
- Create a variant of swiotlb_init that can fail for use on x86 and
  handle the failure.
- Delay initializing the swiotlb until someone actually needs a mapping
  from it.  

Delaying the initialization of the swiotlb is out because the code
needs an early memory allocation to get a large chunk of contiguous
memory for the bounce buffers.

Which means the panics that occur in swiotlb_init() need to be delayed
until something actually needs bounce buffers from the swiotlb.

Although arguably what should actually happen instead of panic() is that
swiotlb_map_single should simply fail early when it was not possible to
preallocate bounce buffers.

Eric


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-08  2:22               ` Eric W. Biederman
@ 2013-01-08  2:48                 ` Konrad Rzeszutek Wilk
  2013-01-08  3:03                   ` Eric W. Biederman
  2013-01-08  3:01                 ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-08  2:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Shuah Khan, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Mon, Jan 07, 2013 at 06:22:51PM -0800, Eric W. Biederman wrote:
> Shuah Khan <shuahkhan@gmail.com> writes:
> 
> > On Mon, Jan 7, 2013 at 8:26 AM, Konrad Rzeszutek Wilk
> > <konrad.wilk@oracle.com> wrote:
> >> On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
> >>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
> >>> > Pani'cing the system doesn't sound like a good option to me in this
> >>> > case. This change to disable swiotlb is made for kdump. However, with
> >>> > this change several system fail to boot, unless crashkernel_low=72M is
> >>> > specified.
> >>>
> >>> this patchset is new feature to put second kdump kernel above 4G.
> >>>
> >>> >
> >>> > I would the say the right approach to solve this would be to not
> >>> > change the current pci_swiotlb_detect_override() behavior and treat
> >>> > swiotlb =1 upon entry equivalent to swiotlb_force set.
> >>>
> >>> that will make intel system have to take crashkernel_low=72M too.
> >>> otherwise intel system will get panic during swiotlb allocation.
> >>
> >> Two things:
> >>
> >>  1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
> >>     that the function needs to go in a header file.
> >>  2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
> >>      b/c of crashkernel_low=72M (which I am not seeing in v3.8 kernel-parameters.txt?
> >>      Is that part of your mega-patchset?). Anyhow, there seems to be a disconnect -
> >>      what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
> >>      should also parse the bootparams to double-check that there is enough
> >>      low-mem space? But then if the kernel grows then 72M might not be enough -
> >>      you might need 82M with 3.9.
> >>
> >>      Perhaps a better way for this is to do:
> >>         1). Change 'is_enough' to check only for 4MB.
> >>         2). When booting as kexec, the SWIOTLB would only use 4MB instead of 64MB?
> >>
> >>      Or, we could also use the post-late SWIOTLB initialization similiary to how it was
> >>      done on ia64. This would mean that the AMD VI code would just call the
> >>      .. something like this - NOT tested or even compile tested:
> >>
> >> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> >> index c1c74e0..e7fa8f7 100644
> >> --- a/drivers/iommu/amd_iommu.c
> >> +++ b/drivers/iommu/amd_iommu.c
> >> @@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
> >>         if (unhandled && max_pfn > MAX_DMA32_PFN) {
> >>                 /* There are unhandled devices - initialize swiotlb for them */
> >>                 swiotlb = 1;
> >> +               /* Late (so no bootmem allocator) usage and only if the early SWIOTLB
> >> +                * hadn't been allocated (which can happen on kexec kernels booted
> >> +                * above 4GB). */
> >> +               if (!swiotlb_nr_tbl()) {
> >> +                       int retry = 3;
> >> +                       int mb_size = 64;
> >> +                       int rc = 0;
> >> +retry_me:
> >> +                       if (retry < 0)
> >> +                               panic("We tried setting %dMB for SWIOTLB but got -ENOMEM", mb_size << 1);
> >> +                       rc = swiotlb_late_init_with_default_size(mb_size * (1<<20));
> >> +                       if (rc) {
> >> +                               retry--;
> >> +                               mb_size >>= 1;
> >> +                               goto retry_me;
> >> +                       }
> >> +                       dma_ops = &swiotlb_dma_ops;
> >> +               }
> >>         }
> >>
> >>         amd_iommu_stats_init();
> >>
> >> And then the early SWIOTLB initialization for 64MB can fail and we are still OK.
> >>>
> >
> > Yinghai/Konrad,
> >
> > Did more testing. btw this patch depends on your [v7u1,25/31]
> > memblock: add memblock_mem_size(). Here are the test results:
> >
> > 1. When there is not enough memory: (enough_mem_for_swiotlb() returns false)
> > system will panic in amd_iommu_init_dma_ops().
> >
> > 2. When there is enough memory: (enough_mem_for_swiotlb() returns true):
> > swiotlb is reserved
> > pci_swiotlb_late_init() leaves the buffer allocated since swiotlb=1
> > with that getting changed in amd_iommu_init_dma_ops().
> >
> > I agree with Konrad that the logic should be wrapped in CONFIG_KEXEC.
> 
> If enough_mem_for_swiotlb needs to be conditional on CONFIG_KEXEC the
> code is architected wrong.  None of this logic has anything to do with
> kexec except that the kexec path is one way to get this condition to
> happen.  Especially since the kexec'd kernel where this condition occurs
> does not need kexec support built in.

Fair enough - with the 'memmap' command line options one can trigger
this.
> 
> Yinghai I sat down and read your patch and the approach you are taking
> is totally wrong.
> 
> The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
> fail without panic'ing the system.
> 
> Which leaves two valid approaches.
> - Create a variant of swiotlb_init that can fail for use on x86 and
>   handle the failure.

As a fail-safe step we could retry with a smaller size until a fit is found.
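
A rough userspace sketch of that retry loop. The allocator hook, the
function names and the fake allocator are all hypothetical; a real version
would call whatever swiotlb init variant is able to return an error:

```c
#include <stddef.h>

/* Hypothetical allocator hook: returns 0 on success, non-zero on failure. */
typedef int (*swiotlb_alloc_fn)(size_t bytes, void *ctx);

/*
 * Try start_mb first and halve the request after every failure,
 * never going below min_mb.  Returns the size in MB that succeeded,
 * or 0 if nothing fit.
 */
static unsigned int swiotlb_try_sizes(unsigned int start_mb, unsigned int min_mb,
				      swiotlb_alloc_fn alloc, void *ctx)
{
	unsigned int mb;

	for (mb = start_mb; mb > 0 && mb >= min_mb; mb >>= 1) {
		if (alloc((size_t)mb << 20, ctx) == 0)
			return mb;
	}

	return 0;
}

/* Stand-in allocator for illustration: succeeds only up to *ctx bytes. */
static int fake_alloc(size_t bytes, void *ctx)
{
	size_t avail = *(size_t *)ctx;

	return bytes <= avail ? 0 : -1;
}
```

The caller decides what to do when 0 comes back, so the panic, if any, moves
out of the allocation path.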

> - Delay initializing the swiotlb until someone actually needs a mapping
>   from it.  

So late init the SWIOTLB and perhaps have multiple "segments" of 4MB
of SWIOTLB that can grow as we exhaust its memory. Could work.
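
A sketch of the growing-segments idea in userspace C. Everything here is
hypothetical: struct seg_pool and seg_pool_alloc() are made-up names, and
malloc() only stands in for an allocation that would really have to come
from contiguous memory below 4G:

```c
#include <stddef.h>
#include <stdlib.h>

#define SEG_BYTES	((size_t)4 << 20)	/* one 4MB segment */
#define MAX_SEGS	16			/* arbitrary cap for the sketch */

struct seg_pool {
	void	*seg[MAX_SEGS];
	size_t	nsegs;		/* segments allocated so far */
	size_t	used_in_last;	/* bytes handed out from the newest segment */
};

/*
 * Hand out len bytes (len <= SEG_BYTES).  A new segment is added only
 * when the newest one cannot satisfy the request.  Returns NULL when
 * the request is invalid, the cap is reached, or the allocation fails.
 */
static void *seg_pool_alloc(struct seg_pool *p, size_t len)
{
	void *ret;

	if (len == 0 || len > SEG_BYTES)
		return NULL;

	if (p->nsegs == 0 || len > SEG_BYTES - p->used_in_last) {
		void *s;

		if (p->nsegs == MAX_SEGS)
			return NULL;
		s = malloc(SEG_BYTES);
		if (!s)
			return NULL;
		p->seg[p->nsegs++] = s;
		p->used_in_last = 0;
	}

	ret = (char *)p->seg[p->nsegs - 1] + p->used_in_last;
	p->used_in_last += len;

	return ret;
}
```

The sketch never frees or reuses space; it only shows that the pool can
start small and grow on demand.
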
> 
> Delaying the initialization of the swiotlb is out because the code
> needs an early memory allocation to get a large chunk of contiguous
> memory for the bounce buffers.

Or it can use the late init, but with a smaller chunk of memory.

> 
> Which means the panics that occur in swiotlb_init() need to be delayed
> until something actually needs bounce buffers from the swiotlb.
> 
> Although arguably what should actually happen instead of panic() is that
> swiotlb_map_single should simply fail early when it was not possible to
> preallocate bounce buffers.

This sounds like a Catch-22. Fail early implies that it would have to do
this when using the bootmem allocator. But the swiotlb_map_single is not
called at that time - it is called _after_ the bootmem allocator has been
de-activated. Actually it is called pretty late - when built-in PCI devices
start off or when 'udev' starts scanning the PCI bus and loading modules.

I think I am misunderstanding you - could you clarify please?



* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-08  2:22               ` Eric W. Biederman
  2013-01-08  2:48                 ` Konrad Rzeszutek Wilk
@ 2013-01-08  3:01                 ` Yinghai Lu
  2013-01-08  3:13                   ` Eric W. Biederman
  1 sibling, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-08  3:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Shuah Khan, Konrad Rzeszutek Wilk, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Mon, Jan 7, 2013 at 6:22 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Yinghai I sat down and read your patch and the approach you are taking
> is totally wrong.

Thanks for checking the patch. Did you check v3?

>
> The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
> fail without panic'ing the system.

I did not put a panic in swiotlb; I now put the panic in the amd_iommu ops
init, for the case where it needs extra swiotlb for devices not handled by
the AMD IOMMU.

>
> Which leaves two valid approaches.
> - Create a variant of swiotlb_init that can fail for use on x86 and
>   handle the failure.
> - Delay initializing the swiotlb until someone actually needs a mapping
>   from it.
>
> Delaying the initialization of the swiotlb is out because the code
> needs an early memory allocation to get a large chunk of contiguous
> memory for the bounce buffers.

ok.

>
> Which means the panics that occur in swiotlb_init() need to be delayed
> until something actually needs bounce buffers from the swiotlb.
>
> Although arguably what should actually happen instead of panic() is that
> swiotlb_map_single should simply fail early when it was not possible to
> preallocate bounce buffers.

Do you mean that the dma buffer actually needed is much smaller than the
swiotlb buffer, i.e. 64M?

Yinghai


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-08  2:48                 ` Konrad Rzeszutek Wilk
@ 2013-01-08  3:03                   ` Eric W. Biederman
  0 siblings, 0 replies; 199+ messages in thread
From: Eric W. Biederman @ 2013-01-08  3:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Shuah Khan, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Mon, Jan 07, 2013 at 06:22:51PM -0800, Eric W. Biederman wrote:
>> Shuah Khan <shuahkhan@gmail.com> writes:
>> 
>> > On Mon, Jan 7, 2013 at 8:26 AM, Konrad Rzeszutek Wilk
>> > <konrad.wilk@oracle.com> wrote:
>> >> On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
>> >>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>> >>> > Pani'cing the system doesn't sound like a good option to me in this
>> >>> > case. This change to disable swiotlb is made for kdump. However, with
>> >>> > this change several system fail to boot, unless crashkernel_low=72M is
>> >>> > specified.
>> >>>
>> >>> this patchset is new feature to put second kdump kernel above 4G.
>> >>>
>> >>> >
>> >>> > I would the say the right approach to solve this would be to not
>> >>> > change the current pci_swiotlb_detect_override() behavior and treat
>> >>> > swiotlb =1 upon entry equivalent to swiotlb_force set.
>> >>>
>> >>> that will make intel system have to take crashkernel_low=72M too.
>> >>> otherwise intel system will get panic during swiotlb allocation.
>> >>
>> >> Two things:
>> >>
>> >>  1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
>> >>     that the function needs to go in a header file.
>> >>  2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
>> >>      b/c of crashkernel_low=72M (which I am not seeing in v3.8 kernel-parameters.txt?
>> >>      Is that part of your mega-patchset?). Anyhow, there seems to be a disconnect -
>> >>      what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
>> >>      should also parse the bootparams to double-check that there is enough
>> >>      low-mem space? But then if the kernel grows then 72M might not be enough -
>> >>      you might need 82M with 3.9.
>> >>
>> >>      Perhaps a better way for this is to do:
>> >>         1). Change 'is_enough' to check only for 4MB.
>> >>         2). When booting as kexec, the SWIOTLB would only use 4MB instead of 64MB?
>> >>
>> >>      Or, we could also use the post-late SWIOTLB initialization similiary to how it was
>> >>      done on ia64. This would mean that the AMD VI code would just call the
>> >>      .. something like this - NOT tested or even compile tested:
>> >>
>> >> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> >> index c1c74e0..e7fa8f7 100644
>> >> --- a/drivers/iommu/amd_iommu.c
>> >> +++ b/drivers/iommu/amd_iommu.c
>> >> @@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
>> >>         if (unhandled && max_pfn > MAX_DMA32_PFN) {
>> >>                 /* There are unhandled devices - initialize swiotlb for them */
>> >>                 swiotlb = 1;
>> >> +               /* Late (so no bootmem allocator) usage and only if the early SWIOTLB
>> >> +                * hadn't been allocated (which can happen on kexec kernels booted
>> >> +                * above 4GB). */
>> >> +               if (!swiotlb_nr_tbl()) {
>> >> +                       int retry = 3;
>> >> +                       int mb_size = 64;
>> >> +                       int rc = 0;
>> >> +retry_me:
>> >> +                       if (retry < 0)
>> >> +                               panic("We tried setting %dMB for SWIOTLB but got -ENOMEM", mb_size << 1);
>> >> +                       rc = swiotlb_late_init_with_default_size(mb_size * (1<<20));
>> >> +                       if (rc) {
>> >> +                               retry--;
>> >> +                               mb_size >>= 1;
>> >> +                               goto retry_me;
>> >> +                       }
>> >> +                       dma_ops = &swiotlb_dma_ops;
>> >> +               }
>> >>         }
>> >>
>> >>         amd_iommu_stats_init();
>> >>
>> >> And then the early SWIOTLB initialization for 64MB can fail and we are still OK.
>> >>>
>> >
>> > Yinghai/Konrad,
>> >
>> > Did more testing. btw this patch depends on your [v7u1,25/31]
>> > memblock: add memblock_mem_size(). Here are the test results:
>> >
>> > 1. When there is not enough memory: (enough_mem_for_swiotlb() returns false)
>> > system will panic in amd_iommu_init_dma_ops().
>> >
>> > 2. When there is enough memory: (enough_mem_for_swiotlb() returns true):
>> > swiotlb is reserved
>> > pci_swiotlb_late_init() leaves the buffer allocated since swiotlb=1
>> > with that getting changed in amd_iommu_init_dma_ops().
>> >
>> > I agree with Konrad that the logic should be wrapped in CONFIG_KEXEC.
>> 
>> If enough_mem_for_swiotlb needs to be conditional on CONFIG_KEXEC the
>> code is architected wrong.  None of this logic has anything to do with
>> kexec except that the kexec path is one way to get this condition to
>> happen.  Especially since the kexec'd kernel where this condition occurs
>> does not need kexec support built in.
>
> Fair enough - with the 'memmap' command line options one can trigger
> this.
>> 
>> Yinghai I sat down and read your patch and the approach you are taking
>> is totally wrong.
>> 
>> The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
>> fail without panic'ing the system.
>> 
>> Which leaves two valid approaches.
>> - Create a variant of swiotlb_init that can fail for use on x86 and
>>   handle the failure.
>
> As a fail-safe step we could retry with a smaller size until a fit is found.
>
>> - Delay initializing the swiotlb until someone actually needs a mapping
>>   from it.  
>
> So late init the SWIOTLB and perhaps have multiple "segments" of 4MB
> of SWIOTLB that can grow as we exhaust its memory. Could work.
>> 
>> Delaying the initialization of the swiotlb is out because the code
>> needs an early memory allocation to get a large chunk of contiguous
>> memory for the bounce buffers.
>
> Or it can use the late init, but with a smaller chunk of memory.

A reasonable point.

>> Which means the panics that occur in swiotlb_init() need to be delayed
>> until something actually needs bounce buffers from the swiotlb.
>> 
>> Although arguably what should actually happen instead of panic() is that
>> swiotlb_map_single should simply fail early when it was not possible to
>> preallocate bounce buffers.
>
> This sounds like a Catch-22. Fail early implies that it would have to do
> this when using the bootmem allocator.

Sorry, what I meant was something like:

... swiotlb_init(...)
{
	...
	if (!alloc_bootmem_low_pages(...))
		noiotlb_memory = true;
	...
}

... swiotlb_map_single(...)
{
	if (noiotlb_memory)
		return SWIOTLB_MAP_ERROR;
	...
}

With the noiotlb_memory check happening early.
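
Spelled out as a compilable userspace sketch: the *_sketch names, the
offset-based return value and the local SWIOTLB_MAP_ERROR definition are
assumptions for illustration, and the caller-supplied pool stands in for
the bootmem allocation:

```c
#include <stdbool.h>
#include <stddef.h>

#define SWIOTLB_MAP_ERROR	((size_t)-1)	/* stand-in error value */

static bool noiotlb_memory;	/* set when the early allocation failed */
static unsigned char *io_tlb_pool;
static size_t io_tlb_size, io_tlb_used;

/*
 * Init records the failure instead of panicking.  'pool' is whatever
 * the early allocator returned; NULL means there was no low memory.
 */
static void swiotlb_init_sketch(unsigned char *pool, size_t bytes)
{
	io_tlb_pool = pool;
	io_tlb_size = pool ? bytes : 0;
	io_tlb_used = 0;
	noiotlb_memory = (pool == NULL);
}

/* Returns an offset into the pool, or SWIOTLB_MAP_ERROR. */
static size_t swiotlb_map_single_sketch(size_t len)
{
	size_t off;

	if (noiotlb_memory)	/* the early check: fail, do not panic */
		return SWIOTLB_MAP_ERROR;
	if (len > io_tlb_size - io_tlb_used)
		return SWIOTLB_MAP_ERROR;

	off = io_tlb_used;
	io_tlb_used += len;

	return off;
}
```

A system that never needs a bounce buffer then boots normally, and only the
caller that does need one sees the error.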

> But the swiotlb_map_single is not
> called at that time - it is called _after_ the bootmem allocator has been
> de-activated. Actually it is called pretty late - when built-in PCI devices
> start off or when 'udev' starts scanning the PCI bus and loading modules.
>
> I think I am misunderstanding you - could you clarify please?

Does the above help?

Eric

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-08  3:01                 ` Yinghai Lu
@ 2013-01-08  3:13                   ` Eric W. Biederman
  2013-01-08  3:50                     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Eric W. Biederman @ 2013-01-08  3:13 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Shuah Khan, Konrad Rzeszutek Wilk, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

Yinghai Lu <yinghai@kernel.org> writes:

> On Mon, Jan 7, 2013 at 6:22 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Yinghai I sat down and read your patch and the approach you are taking
>> is totally wrong.
>
> Thanks for check the patch, did you check v3?

I looked at the version of the patch you had as an attachment.  I don't
know the version number but it was the latest version of the patch I saw
in this thread.

After looking at things, having a function enough_mem_for_swiotlb()
in pci_swiotlb_detect_override() and pci_swiotlb_detect_4gb() is a brittle
hack.

>> The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
>> fail without panic'ing the system.
>
> I did not put panic in swiotlb, now I put panic in amd_iommu ops init
> when it need extra
> swiotlb for unhandled devices by AMD IOMMU.

But the only reason you need to touch this code at all is that
swiotlb_init() calls panic() if you don't have 64MB of memory below 4G.

>> Which leaves two valid approaches.
>> - Create a variant of swiotlb_init that can fail for use on x86 and
>>   handle the failure.
>> - Delay initializing the swiotlb until someone actually needs a mapping
>>   from it.
>>
>> Delaying the initialization of the swiotlb is out because the code
>> needs an early memory allocation to get a large chunk of contiguous
>> memory for the bounce buffers.
>
> ok.
>
>>
>> Which means the panics that occurr in swiotlb_init() need to be delayed
>> until someone something actually needs bounce buffers from the swiotlb.
>>
>> Although arguably what should actually happen instead of panic() is that
>> swiotlb_map_single should simply fail early when it was not possible to
>> preallocate bounce buffers.
>
> do you mean: actually needed dma buffer is much less than swiotlb
> buffer aka 64M?


I meant we should detect failure to allocate bounce buffers in
swiotlb_init() instead of panicking.

I meant swiotlb_map_single() should either panic or simply fail.

If I have read lib/swiotlb.c correctly the only place we allocate a
bounce buffer is in swiotlb_map_single.  If there are more places we can
allocate bounce buffers those need to be handled as well.

Eric

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-08  3:13                   ` Eric W. Biederman
@ 2013-01-08  3:50                     ` Yinghai Lu
  2013-01-08 23:40                       ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-08  3:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Shuah Khan, Konrad Rzeszutek Wilk, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Mon, Jan 7, 2013 at 7:13 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> I meant we should detect failure to allocate bounce buffers in in
> swiotlb_init() instead of panicing.
>
> I meant swiotlb_map_single() should either panic or simply fail.
>
> If I have read lib/swiotlb.c correctly the only place we allocate a
> bounce buffer is in swiotlb_map_single.  If there are more places we can
> allocate bounce buffers those need to be handled as well.

ok, will give it a try.

Thanks

Yinghai

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-08  3:50                     ` Yinghai Lu
@ 2013-01-08 23:40                       ` Yinghai Lu
  2013-01-09  0:04                         ` Eric W. Biederman
  2013-01-09  0:43                         ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-08 23:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Shuah Khan, Konrad Rzeszutek Wilk, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 674 bytes --]

On Mon, Jan 7, 2013 at 7:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Jan 7, 2013 at 7:13 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> I meant we should detect failure to allocate bounce buffers in in
>> swiotlb_init() instead of panicing.
>>
>> I meant swiotlb_map_single() should either panic or simply fail.
>>
>> If I have read lib/swiotlb.c correctly the only place we allocate a
>> bounce buffer is in swiotlb_map_single.  If there are more places we can
>> allocate bounce buffers those need to be handled as well.
>
> ok, will give it a try.

Please check if you are OK with the attached patches.

It looks like it needs more lines changed.

Thanks

Yinghai

[-- Attachment #2: alloc_low_page_nopanic.patch --]
[-- Type: application/octet-stream, Size: 2534 bytes --]

Subject: [PATCH] mm: Add alloc_bootmem_low_pages_nopanic()

We don't need to panic in some cases, such as when preallocating the swiotlb.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 include/linux/bootmem.h |    5 +++++
 mm/bootmem.c            |    8 ++++++++
 mm/nobootmem.c          |    8 ++++++++
 3 files changed, 21 insertions(+)

Index: linux-2.6/include/linux/bootmem.h
===================================================================
--- linux-2.6.orig/include/linux/bootmem.h
+++ linux-2.6/include/linux/bootmem.h
@@ -99,6 +99,9 @@ void *___alloc_bootmem_node_nopanic(pg_d
 extern void *__alloc_bootmem_low(unsigned long size,
 				 unsigned long align,
 				 unsigned long goal);
+void *__alloc_bootmem_low_nopanic(unsigned long size,
+				 unsigned long align,
+				 unsigned long goal);
 extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 				      unsigned long size,
 				      unsigned long align,
@@ -132,6 +135,8 @@ extern void *__alloc_bootmem_low_node(pg
 
 #define alloc_bootmem_low(x) \
 	__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
+#define alloc_bootmem_low_pages_nopanic(x) \
+	__alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0)
 #define alloc_bootmem_low_pages(x) \
 	__alloc_bootmem_low(x, PAGE_SIZE, 0)
 #define alloc_bootmem_low_pages_node(pgdat, x) \
Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -821,6 +821,14 @@ void * __init __alloc_bootmem_low(unsign
 	return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
 }
 
+void * __init __alloc_bootmem_low_nopanic(unsigned long size,
+					  unsigned long align,
+					  unsigned long goal)
+{
+	return ___alloc_bootmem_nopanic(size, align, goal,
+					ARCH_LOW_ADDRESS_LIMIT);
+}
+
 /**
  * __alloc_bootmem_low_node - allocate low boot memory from a specific node
  * @pgdat: node to allocate from
Index: linux-2.6/mm/nobootmem.c
===================================================================
--- linux-2.6.orig/mm/nobootmem.c
+++ linux-2.6/mm/nobootmem.c
@@ -391,6 +391,14 @@ void * __init __alloc_bootmem_low(unsign
 	return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
 }
 
+void * __init __alloc_bootmem_low_nopanic(unsigned long size,
+					  unsigned long align,
+					  unsigned long goal)
+{
+	return ___alloc_bootmem_nopanic(size, align, goal,
+					ARCH_LOW_ADDRESS_LIMIT);
+}
+
 /**
  * __alloc_bootmem_low_node - allocate low boot memory from a specific node
  * @pgdat: node to allocate from

[-- Attachment #3: swiotlb.patch --]
[-- Type: application/octet-stream, Size: 5541 bytes --]

Subject: [PATCH] x86: Don't panic if can not alloc buffer for swiotlb

Normal boot path on a system with IOMMU support:
the swiotlb buffer is allocated early, and then we try to initialize the
IOMMU. If the Intel or AMD IOMMU can be set up properly, the swiotlb buffer
is freed.

The early allocation uses bootmem, and it can panic when we try to use
kdump with a buffer above 4G only, or with memmap limiting memory under 4G.

As suggested by Eric, add a _nopanic version and no_iotlb_memory so that
map single fails later if swiotlb is still needed.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>

---
 arch/x86/kernel/pci-swiotlb.c |    2 -
 include/linux/swiotlb.h       |    1 
 lib/swiotlb.c                 |   59 ++++++++++++++++++++++++++++++++++--------
 3 files changed, 50 insertions(+), 12 deletions(-)

Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -22,6 +22,7 @@ extern int swiotlb_force;
  */
 #define IO_TLB_SHIFT 11
 
+void swiotlb_init_nopanic(int verbose);
 extern void swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -122,11 +122,18 @@ static dma_addr_t swiotlb_virt_to_bus(st
 	return phys_to_dma(hwdev, virt_to_phys(address));
 }
 
+static bool no_iotlb_memory;
+
 void swiotlb_print_info(void)
 {
 	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 	unsigned char *vstart, *vend;
 
+	if (no_iotlb_memory) {
+		printk(KERN_INFO "software IO TLB: No low mem\n");
+		return;
+	}
+
 	vstart = phys_to_virt(io_tlb_start);
 	vend = phys_to_virt(io_tlb_end);
 
@@ -136,7 +143,8 @@ void swiotlb_print_info(void)
 	       bytes >> 20, vstart, vend - 1);
 }
 
-void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void __init __swiotlb_init_with_tbl(char *tlb, unsigned long nslabs,
+				 int verbose, bool nopanic)
 {
 	void *v_overflow_buffer;
 	unsigned long i, bytes;
@@ -150,9 +158,17 @@ void __init swiotlb_init_with_tbl(char *
 	/*
 	 * Get the overflow emergency buffer
 	 */
-	v_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow));
-	if (!v_overflow_buffer)
-		panic("Cannot allocate SWIOTLB overflow buffer!\n");
+	v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
+						PAGE_ALIGN(io_tlb_overflow));
+	if (!v_overflow_buffer) {
+		if (nopanic) {
+			no_iotlb_memory = true;
+			free_bootmem(io_tlb_start,
+				     PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+			return;
+		} else
+			panic("Cannot allocate SWIOTLB overflow buffer!\n");
+	}
 
 	io_tlb_overflow_buffer = __pa(v_overflow_buffer);
 
@@ -171,12 +187,17 @@ void __init swiotlb_init_with_tbl(char *
 		swiotlb_print_info();
 }
 
+void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
+	__swiotlb_init_with_tbl(tlb, nslabs, verbose, false);
+}
+
 /*
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
  */
 static void __init
-swiotlb_init_with_default_size(size_t default_size, int verbose)
+swiotlb_init_with_default_size(size_t default_size, int verbose, bool nopanic)
 {
 	unsigned char *vstart;
 	unsigned long bytes;
@@ -191,17 +212,30 @@ swiotlb_init_with_default_size(size_t de
 	/*
 	 * Get IO TLB memory from the low pages
 	 */
-	vstart = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
-	if (!vstart)
-		panic("Cannot allocate SWIOTLB buffer");
+	vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
+	if (!vstart) {
+		if (nopanic) {
+			no_iotlb_memory = true;
+			return;
+		} else
+			panic("Cannot allocate SWIOTLB buffer");
+	}
 
-	swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose);
+	__swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose, nopanic);
+}
+
+void __init
+swiotlb_init_nopanic(int verbose)
+{
+	/* default to 64MB */
+	swiotlb_init_with_default_size(64 * (1<<20), verbose, true);
 }
 
 void __init
 swiotlb_init(int verbose)
 {
-	swiotlb_init_with_default_size(64 * (1<<20), verbose);	/* default to 64MB */
+	/* default to 64MB */
+	swiotlb_init_with_default_size(64 * (1<<20), verbose, false);
 }
 
 /*
@@ -405,6 +439,9 @@ phys_addr_t swiotlb_tbl_map_single(struc
 	unsigned long offset_slots;
 	unsigned long max_slots;
 
+	if (no_iotlb_memory)
+		return SWIOTLB_MAP_ERROR;
+
 	mask = dma_get_seg_boundary(hwdev);
 
 	tbl_dma_addr &= mask;
@@ -669,7 +706,7 @@ swiotlb_full(struct device *dev, size_t
 	printk(KERN_ERR "DMA: Out of SW-IOMMU space for %zu bytes at "
 	       "device %s\n", size, dev ? dev_name(dev) : "?");
 
-	if (size <= io_tlb_overflow || !do_panic)
+	if (!no_iotlb_memory && (size <= io_tlb_overflow || !do_panic))
 		return;
 
 	if (dir == DMA_BIDIRECTIONAL)
Index: linux-2.6/arch/x86/kernel/pci-swiotlb.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-swiotlb.c
+++ linux-2.6/arch/x86/kernel/pci-swiotlb.c
@@ -91,7 +91,7 @@ IOMMU_INIT(pci_swiotlb_detect_4gb,
 void __init pci_swiotlb_init(void)
 {
 	if (swiotlb) {
-		swiotlb_init(0);
+		swiotlb_init_nopanic(0);
 		dma_ops = &swiotlb_dma_ops;
 	}
 }

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-08 23:40                       ` Yinghai Lu
@ 2013-01-09  0:04                         ` Eric W. Biederman
  2013-01-09  0:43                         ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 199+ messages in thread
From: Eric W. Biederman @ 2013-01-09  0:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Shuah Khan, Konrad Rzeszutek Wilk, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

Yinghai Lu <yinghai@kernel.org> writes:

> On Mon, Jan 7, 2013 at 7:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Mon, Jan 7, 2013 at 7:13 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> I meant we should detect failure to allocate bounce buffers in in
>>> swiotlb_init() instead of panicing.
>>>
>>> I meant swiotlb_map_single() should either panic or simply fail.
>>>
>>> If I have read lib/swiotlb.c correctly the only place we allocate a
>>> bounce buffer is in swiotlb_map_single.  If there are more places we can
>>> allocate bounce buffers those need to be handled as well.
>>
>> ok, will give it a try.
>
> please check if you are ok with attached.
>

It looks like the right direction.   Certainly enough to test and see if
the code will work.

I don't see the point of adding a nopanic case to the swiotlb
initialization.  That just looks like an unnecessary complication.
Certainly a nopanic case implemented by passing a nopanic parameter
looks like the wrong way to go.  At most you want to return an error
code and do:

swiotlb_init()
{
	if (swiotlb_init_with_default_size() == -ENOMEM)
		panic("Cannot allocate SWIOTLB buffer");
}

The page freeing in swiotlb_init_with_tbl appears to be in the wrong
function.  I suggest looking at swiotlb_late_init which apparently is
allowed to fail for some ideas.

> looks like it need more change of lines.

The size of the change matters less than how clean and maintainable the
result is.  If done carefully I expect you can have net fewer lines by
not needing to handle the case when the swiotlb APIs are unavailable.
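
As an illustration of the "return an error code and let the caller decide"
shape, here is a minimal user-space sketch. pool_init_core(),
pool_init_or_die() and pool_init_or_warn() are hypothetical names, malloc()
stands in for bootmem, and abort() stands in for panic(); none of this is the
actual swiotlb code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

static void *bounce_pool;

/* Core init: reports failure to the caller instead of deciding policy. */
static int pool_init_core(size_t bytes, int simulate_failure)
{
	bounce_pool = simulate_failure ? NULL : malloc(bytes);
	return bounce_pool ? 0 : -ENOMEM;
}

/* Strict wrapper: callers that cannot run without the pool give up. */
static void pool_init_or_die(size_t bytes, int simulate_failure)
{
	if (pool_init_core(bytes, simulate_failure) == -ENOMEM) {
		fprintf(stderr, "Cannot allocate bounce pool\n");
		abort();	/* user-space stand-in for panic() */
	}
}

/* Lenient wrapper: warn and carry on; returns 1 if the pool is usable. */
static int pool_init_or_warn(size_t bytes, int simulate_failure)
{
	if (pool_init_core(bytes, simulate_failure) == -ENOMEM) {
		fprintf(stderr, "warning: no bounce pool\n");
		return 0;
	}
	return 1;
}
```

With this layout no flag has to be threaded through the core function, and
each wrapper is only a few lines.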

Eric

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-08 23:40                       ` Yinghai Lu
  2013-01-09  0:04                         ` Eric W. Biederman
@ 2013-01-09  0:43                         ` Konrad Rzeszutek Wilk
  2013-01-09  0:56                           ` Yinghai Lu
  2013-01-09  0:58                           ` Eric W. Biederman
  1 sibling, 2 replies; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-09  0:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Tue, Jan 08, 2013 at 03:40:11PM -0800, Yinghai Lu wrote:
> On Mon, Jan 7, 2013 at 7:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Mon, Jan 7, 2013 at 7:13 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> I meant we should detect failure to allocate bounce buffers in in
> >> swiotlb_init() instead of panicing.
> >>
> >> I meant swiotlb_map_single() should either panic or simply fail.
> >>
> >> If I have read lib/swiotlb.c correctly the only place we allocate a
> >> bounce buffer is in swiotlb_map_single.  If there are more places we can
> >> allocate bounce buffers those need to be handled as well.
> >
> > ok, will give it a try.
> 
> please check if you are ok with attached.
> 
> looks like it need more change of lines.

The swiotlb_full check I don't believe is necessary. You won't ever get
to that unless swiotlb_map_page has at least provided a bounce buffer.
And if the swiotlb_map_page does not have a bounce buffer it will exit
with:

+       if (no_iotlb_memory)
+               return SWIOTLB_MAP_ERROR;

which is dangerous. That is b/c there are drivers that don't use the
dma_mapping_error check (so check the bus address after calling
pci_map_*). This means they would try to do DMA on 0xffffffff (yikes!).

That is the reason the fallback (v_overflow_buffer) is still in
use - b/c we have drivers that might just do this, and this is the last
resort for them. Until those drivers are fixed, we _need_ this
fallback to work.
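
To make the concern concrete, here is a small user-space sketch of a mapping
call that hands out a last-resort overflow buffer instead of an error cookie.
All names (MAP_ERROR, map_strict(), map_with_fallback(), the 64-byte overflow
size) are hypothetical and only model the behaviour described above; they are
not the swiotlb implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_ERROR ((uintptr_t)-1)	/* hypothetical error cookie */
#define OVERFLOW_BYTES 64

static unsigned char overflow_buf[OVERFLOW_BYTES];	/* last-resort target */
static unsigned char pool[128];
static size_t pool_used;

/* Plain variant: a full pool yields the error cookie. */
static uintptr_t map_strict(size_t bytes)
{
	uintptr_t addr;

	if (bytes > sizeof(pool) - pool_used)
		return MAP_ERROR;
	addr = (uintptr_t)(pool + pool_used);
	pool_used += bytes;
	return addr;
}

/* Fallback variant: small requests that do not fit are pointed at the
 * overflow buffer, so a caller that never checks for MAP_ERROR still
 * ends up with an address that is safe to write to. */
static uintptr_t map_with_fallback(size_t bytes)
{
	uintptr_t addr = map_strict(bytes);

	if (addr == MAP_ERROR && bytes <= OVERFLOW_BYTES)
		return (uintptr_t)overflow_buf;
	return addr;
}
```

A caller that ignores the return value of map_strict() would treat the
all-ones cookie as a real address; with map_with_fallback() the data may be
garbage, but the transfer lands in memory that was set aside for it.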

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09  0:43                         ` Konrad Rzeszutek Wilk
@ 2013-01-09  0:56                           ` Yinghai Lu
  2013-01-09  0:58                           ` Eric W. Biederman
  1 sibling, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-09  0:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Eric W. Biederman, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Tue, Jan 8, 2013 at 4:43 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> The swiotlb_full check I don't believe is neccessary. You won't ever get
> to that unless swiotlb_map_page has at least provided a bounce buffer.

Yes, the code does get there when I boot the kernel with
"memmap=4095M$1M intel_iommu=off".

Please check the call trace:

[  209.878091] usb 1-3: new high-speed USB device number 2 using ehci-pci
[  209.878096] DMA: Out of SW-IOMMU space for 8 bytes at device 0000:00:1a.7
[  209.878117] Kernel panic - not syncing: DMA: Random memory could be DMA read
[  209.878117]
[  209.878120] Pid: 2057, comm: khubd Not tainted 3.8.0-rc2-yh-00443-g262039c-dirty #1091
[  209.878121] Call Trace:
[  209.878132]  [<ffffffff8214a607>] panic+0xc0/0x1ce
[  209.878143]  [<ffffffff814dc013>] swiotlb_full+0xa3/0xc0
[  209.878147]  [<ffffffff814dccf3>] swiotlb_map_page+0x93/0xe0
[  209.878157]  [<ffffffff81c6800d>] usb_hcd_map_urb_for_dma+0xcd/0x510
[  209.878166]  [<ffffffff810eb758>] ? lockdep_init_map+0x518/0x570
[  209.878170]  [<ffffffff81c68a0d>] usb_hcd_submit_urb+0x5bd/0x690
[  209.878174]  [<ffffffff81c699b6>] usb_submit_urb+0x306/0x3c0
[  209.878179]  [<ffffffff81c6abb2>] usb_start_wait_urb+0x82/0x120
[  209.878183]  [<ffffffff81c69f7e>] ? usb_alloc_urb+0x1e/0x60
[  209.878187]  [<ffffffff81c6aeae>] usb_control_msg+0xde/0x130
[  209.878194]  [<ffffffff81196f80>] ? kmem_cache_alloc_trace+0x60/0x150
[  209.878198]  [<ffffffff81c62ad7>] hub_port_init+0x2a7/0xa10
[  209.878203]  [<ffffffff81c65542>] hub_port_connect_change+0x492/0x9c0
[  209.878211]  [<ffffffff82163f4e>] ? mutex_unlock+0xe/0x10
[  209.878215]  [<ffffffff81c65f9b>] hub_thread+0x52b/0x830
[  209.878222]  [<ffffffff810c3f94>] ? local_clock+0x34/0x60
[  209.878229]  [<ffffffff810afa00>] ? wake_up_bit+0x40/0x40
[  209.878233]  [<ffffffff81c65a70>] ? hub_port_connect_change+0x9c0/0x9c0
[  209.878237]  [<ffffffff810af088>] kthread+0xe8/0xf0
[  209.878241]  [<ffffffff810aefa0>] ? __init_kthread_worker+0x70/0x70
[  209.878248]  [<ffffffff8216f59c>] ret_from_fork+0x7c/0xb0
[  209.878251]  [<ffffffff810aefa0>] ? __init_kthread_worker+0x70/0x70
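
For reference, the memmap=nn$ss form marks nn bytes starting at ss as
reserved, so "memmap=4095M$1M" reserves everything from 1M up to 4G and leaves
no room below 4G for the 64MB buffer. The arithmetic can be checked with a
small user-space sketch; parse_size() is a simplified imitation of the
kernel's memparse() helper and parse_reserved() is a hypothetical name used
only here:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix; a simplified user-space
 * imitation of what the kernel's memparse() helper does. */
static uint64_t parse_size(const char *s, const char **end)
{
	char *e;
	uint64_t v = strtoull(s, &e, 0);

	switch (*e) {
	case 'G':
	case 'g':
		v <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		v <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		v <<= 10;
		e++;
		break;
	default:
		break;
	}
	*end = e;
	return v;
}

/* Parse "size$start" and return 0 on success, -1 on malformed input.
 * On success [*start, *end) is the reserved range. */
static int parse_reserved(const char *arg, uint64_t *start, uint64_t *end)
{
	const char *p;
	uint64_t size = parse_size(arg, &p);

	if (*p != '$')
		return -1;
	*start = parse_size(p + 1, &p);
	if (*p != '\0')
		return -1;
	*end = *start + size;
	return 0;
}
```

For the option used above this gives start = 1M and end = 4096M, which is
exactly the 4G boundary.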

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09  0:43                         ` Konrad Rzeszutek Wilk
  2013-01-09  0:56                           ` Yinghai Lu
@ 2013-01-09  0:58                           ` Eric W. Biederman
  2013-01-09  1:07                             ` Yinghai Lu
  2013-01-09 13:12                             ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 199+ messages in thread
From: Eric W. Biederman @ 2013-01-09  0:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Yinghai Lu, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jan 08, 2013 at 03:40:11PM -0800, Yinghai Lu wrote:
>> On Mon, Jan 7, 2013 at 7:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > On Mon, Jan 7, 2013 at 7:13 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >> I meant we should detect failure to allocate bounce buffers in in
>> >> swiotlb_init() instead of panicing.
>> >>
>> >> I meant swiotlb_map_single() should either panic or simply fail.
>> >>
>> >> If I have read lib/swiotlb.c correctly the only place we allocate a
>> >> bounce buffer is in swiotlb_map_single.  If there are more places we can
>> >> allocate bounce buffers those need to be handled as well.
>> >
>> > ok, will give it a try.
>> 
>> please check if you are ok with attached.
>> 
>> looks like it need more change of lines.
>
> The swiotlb_full check I don't believe is neccessary. You won't ever get
> to that unless swiotlb_map_page has at least provided a bounce buffer.
> And if the swiotlb_map_page does not have a bounce buffer it will exit
> with:
>
> +       if (no_iotlb_memory)                                                    
> +               return SWIOTLB_MAP_ERROR;                                       
> +                 
>
> which is dangerous. That is b/c there are drivers that don't use the
> dma_mapping_error check (so check the bus address after calling
> pci_map_*). This means they would try to do DMA on 0xffffffff (yikes!).
>
> That is reason the failback (v_overflow_buffer) is still in
> usage - b/c we have drivers that might just do this and this is the last
> resort for them. And until those drivers are fixed - we _need_ this
> fallback to work.

So instead we need to say?

+       if (no_iotlb_memory)                                                    
+               panic("Cannot allocate SWIOTLB buffer");
+                 

Which is just making the panic a little later than it used to be and
seems completely reasonable.

Eric

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09  0:58                           ` Eric W. Biederman
@ 2013-01-09  1:07                             ` Yinghai Lu
  2013-01-09  1:12                               ` Yinghai Lu
  2013-01-09 13:12                             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-09  1:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Konrad Rzeszutek Wilk, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Tue, Jan 8, 2013 at 4:58 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:

>
> So instead we need to say?
>
> +       if (no_iotlb_memory)
> +               panic("Cannot allocate SWIOTLB buffer");
> +
>
> Which is just making the panic a little later than it used to be and
> seems completely reasonable.

Yes, it looks like some drivers just use map_single without checking the result.

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09  1:07                             ` Yinghai Lu
@ 2013-01-09  1:12                               ` Yinghai Lu
  2013-01-09  2:31                                 ` Eric W. Biederman
  2013-01-09 13:24                                 ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-09  1:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Konrad Rzeszutek Wilk, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]

On Tue, Jan 8, 2013 at 5:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, Jan 8, 2013 at 4:58 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>>
>> So instead we need to say?
>>
>> +       if (no_iotlb_memory)
>> +               panic("Cannot allocate SWIOTLB buffer");
>> +
>>
>> Which is just making the panic a little later than it used to be and
>> seems completely reasonable.
>
> yes, looks some driver just use map_single without checking results.

Updated patch attached.

Later we could have another patch to shrink the size...

[-- Attachment #2: swiotlb.patch --]
[-- Type: application/octet-stream, Size: 5346 bytes --]

Subject: [PATCH] x86: Don't panic if can not alloc buffer for swiotlb

Normal boot path on a system with IOMMU support:
the swiotlb buffer is allocated early, and then we try to initialize the
IOMMU. If the Intel or AMD IOMMU can be set up properly, the swiotlb buffer
is freed.

The early allocation uses bootmem, and it can panic when we try to use
kdump with a buffer above 4G only, or with memmap limiting memory under 4G.

As suggested by Eric, add a _nopanic version and no_iotlb_memory so that
map single fails later if swiotlb is still needed.

-v2: don't pass nopanic, and use -ENOMEM return value according to Eric.
     panic early instead of using swiotlb_full to panic...according to Eric/Konrad.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>

---
 arch/x86/kernel/pci-swiotlb.c |    2 -
 include/linux/swiotlb.h       |    1 
 lib/swiotlb.c                 |   54 +++++++++++++++++++++++++++++++++++-------
 3 files changed, 47 insertions(+), 10 deletions(-)

Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -22,6 +22,7 @@ extern int swiotlb_force;
  */
 #define IO_TLB_SHIFT 11
 
+void swiotlb_init_nopanic(int verbose);
 extern void swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -122,11 +122,18 @@ static dma_addr_t swiotlb_virt_to_bus(st
 	return phys_to_dma(hwdev, virt_to_phys(address));
 }
 
+static bool no_iotlb_memory;
+
 void swiotlb_print_info(void)
 {
 	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 	unsigned char *vstart, *vend;
 
+	if (no_iotlb_memory) {
+		printk(KERN_INFO "software IO TLB: No low mem\n");
+		return;
+	}
+
 	vstart = phys_to_virt(io_tlb_start);
 	vend = phys_to_virt(io_tlb_end);
 
@@ -136,7 +143,8 @@ void swiotlb_print_info(void)
 	       bytes >> 20, vstart, vend - 1);
 }
 
-void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static int __init __swiotlb_init_with_tbl(char *tlb, unsigned long nslabs,
+				 int verbose)
 {
 	void *v_overflow_buffer;
 	unsigned long i, bytes;
@@ -150,9 +158,10 @@ void __init swiotlb_init_with_tbl(char *
 	/*
 	 * Get the overflow emergency buffer
 	 */
-	v_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow));
+	v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
+						PAGE_ALIGN(io_tlb_overflow));
 	if (!v_overflow_buffer)
-		panic("Cannot allocate SWIOTLB overflow buffer!\n");
+		return -ENOMEM;
 
 	io_tlb_overflow_buffer = __pa(v_overflow_buffer);
 
@@ -169,14 +178,22 @@ void __init swiotlb_init_with_tbl(char *
 
 	if (verbose)
 		swiotlb_print_info();
+
+	return 0;
+}
+
+void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
+	if (__swiotlb_init_with_tbl(tlb, nslabs, verbose) == -ENOMEM)
+		panic("Cannot allocate SWIOTLB buffer");
 }
 
 /*
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
  */
-static void __init
-swiotlb_init_with_default_size(size_t default_size, int verbose)
+static  __init
+int swiotlb_init_with_default_size(size_t default_size, int verbose)
 {
 	unsigned char *vstart;
 	unsigned long bytes;
@@ -191,17 +208,33 @@ swiotlb_init_with_default_size(size_t de
 	/*
 	 * Get IO TLB memory from the low pages
 	 */
-	vstart = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
+	vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
 	if (!vstart)
-		panic("Cannot allocate SWIOTLB buffer");
+		return -ENOMEM;
 
-	swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose);
+	if (__swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose) == -ENOMEM) {
+		free_bootmem(io_tlb_start,
+				     PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+void __init
+swiotlb_init_nopanic(int verbose)
+{
+	/* default to 64MB */
+	if (swiotlb_init_with_default_size(64 * (1<<20), verbose) == -ENOMEM)
+		pr_warn("Cannot allocate SWIOTLB buffer");
 }
 
 void __init
 swiotlb_init(int verbose)
 {
-	swiotlb_init_with_default_size(64 * (1<<20), verbose);	/* default to 64MB */
+	/* default to 64MB */
+	if (swiotlb_init_with_default_size(64 * (1<<20), verbose) == -ENOMEM)
+		panic("Cannot allocate SWIOTLB buffer");
 }
 
 /*
@@ -405,6 +438,9 @@ phys_addr_t swiotlb_tbl_map_single(struc
 	unsigned long offset_slots;
 	unsigned long max_slots;
 
+	if (no_iotlb_memory)
+		panic("Cannot allocate SWIOTLB buffer");
+
 	mask = dma_get_seg_boundary(hwdev);
 
 	tbl_dma_addr &= mask;
Index: linux-2.6/arch/x86/kernel/pci-swiotlb.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-swiotlb.c
+++ linux-2.6/arch/x86/kernel/pci-swiotlb.c
@@ -91,7 +91,7 @@ IOMMU_INIT(pci_swiotlb_detect_4gb,
 void __init pci_swiotlb_init(void)
 {
 	if (swiotlb) {
-		swiotlb_init(0);
+		swiotlb_init_nopanic(0);
 		dma_ops = &swiotlb_dma_ops;
 	}
 }

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09  1:12                               ` Yinghai Lu
@ 2013-01-09  2:31                                 ` Eric W. Biederman
  2013-01-09 13:24                                 ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 199+ messages in thread
From: Eric W. Biederman @ 2013-01-09  2:31 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Konrad Rzeszutek Wilk, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

Yinghai Lu <yinghai@kernel.org> writes:

> On Tue, Jan 8, 2013 at 5:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Tue, Jan 8, 2013 at 4:58 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>>>
>>> So instead we need to say?
>>>
>>> +       if (no_iotlb_memory)
>>> +               panic("Cannot allocate SWIOTLB buffer");
>>> +
>>>
>>> Which is just making the panic a little later than it used to be and
>>> seems completely reasonable.
>>
>> yes, looks some driver just use map_single without checking results.
>
> update one.
>
> later could have another patch to shrink size...

It does look better.

Reading the code, I am still left with the question: why do the nopanic
handling at all, since the code effectively just moves the panic to later?

Why can't other architectures use the same panic handling as x86?

Eric


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09  0:58                           ` Eric W. Biederman
  2013-01-09  1:07                             ` Yinghai Lu
@ 2013-01-09 13:12                             ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-09 13:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yinghai Lu, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Tue, Jan 08, 2013 at 04:58:14PM -0800, Eric W. Biederman wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jan 08, 2013 at 03:40:11PM -0800, Yinghai Lu wrote:
> >> On Mon, Jan 7, 2013 at 7:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> > On Mon, Jan 7, 2013 at 7:13 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> >> I meant we should detect failure to allocate bounce buffers in in
> >> >> swiotlb_init() instead of panicing.
> >> >>
> >> >> I meant swiotlb_map_single() should either panic or simply fail.
> >> >>
> >> >> If I have read lib/swiotlb.c correctly the only place we allocate a
> >> >> bounce buffer is in swiotlb_map_single.  If there are more places we can
> >> >> allocate bounce buffers those need to be handled as well.
> >> >
> >> > ok, will give it a try.
> >> 
> >> please check if you are ok with attached.
> >> 
> >> looks like it need more change of lines.
> >
> > The swiotlb_full check I don't believe is neccessary. You won't ever get
> > to that unless swiotlb_map_page has at least provided a bounce buffer.
> > And if the swiotlb_map_page does not have a bounce buffer it will exit
> > with:
> >
> > +       if (no_iotlb_memory)                                                    
> > +               return SWIOTLB_MAP_ERROR;                                       
> > +                 
> >
> > which is dangerous. That is b/c there are drivers that don't use the
> > dma_mapping_error check (so check the bus address after calling
> > pci_map_*). This means they would try to do DMA on 0xffffffff (yikes!).
> >
> > That is reason the failback (v_overflow_buffer) is still in
> > usage - b/c we have drivers that might just do this and this is the last
> > resort for them. And until those drivers are fixed - we _need_ this
> > fallback to work.
> 
> So instead we need to say?
> 
> +       if (no_iotlb_memory)                                                    
> +               panic("Cannot allocate SWIOTLB buffer");


"Did not allocate SWIOTLB buffer earlier and can't now provide you with the
DMA bounce buffer."

> +                 
> 
> Which is just making the panic a little later than it used to be and
> seems completely reasonable.

Yes. Once those drivers are all fixed, we can remove that duct tape.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09  1:12                               ` Yinghai Lu
  2013-01-09  2:31                                 ` Eric W. Biederman
@ 2013-01-09 13:24                                 ` Konrad Rzeszutek Wilk
  2013-01-09 17:27                                   ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-09 13:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Tue, Jan 08, 2013 at 05:12:02PM -0800, Yinghai Lu wrote:
> On Tue, Jan 8, 2013 at 5:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Tue, Jan 8, 2013 at 4:58 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> >>
> >> So instead we need to say?
> >>
> >> +       if (no_iotlb_memory)
> >> +               panic("Cannot allocate SWIOTLB buffer");
> >> +
> >>
> >> Which is just making the panic a little later than it used to be and
> >> seems completely reasonable.
> >
> > yes, looks some driver just use map_single without checking results.
> 
> update one.

Please make it inline.

Anyhow, comments:

> Subject: [PATCH] x86: Don't panic if can not alloc buffer for swiotlb
> 
> Normal boot path on system with iommu support:
> swiotlb buffer will be allocated early at first and then try to initialize
> iommu, if iommu for intel or amd could setup properly, swiotlb buffer

Intel or AMD
> will be freed.
> 
> The early allocating is with bootmem, and could panic when we try to use
> kdump with buffer above 4G only, or with memmap to limit mem under 4G.

Can you provide the memmap example in here, please.

> 
> According to Eric, add _nopanic version and no_iotlb_memory to fail
> map single later if swiotlb is still needed.
> 
> -v2: don't pass nopanic, and use -ENOMEM return value according to Eric.
>      panic early instead of using swiotlb_full to panic...according to Eric/Konrad.
> 
> Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> 
> ---
>  arch/x86/kernel/pci-swiotlb.c |    2 -
>  include/linux/swiotlb.h       |    1 
>  lib/swiotlb.c                 |   54 +++++++++++++++++++++++++++++++++++-------
>  3 files changed, 47 insertions(+), 10 deletions(-)
> 
> Index: linux-2.6/include/linux/swiotlb.h
> ===================================================================
> --- linux-2.6.orig/include/linux/swiotlb.h
> +++ linux-2.6/include/linux/swiotlb.h
> @@ -22,6 +22,7 @@ extern int swiotlb_force;
>   */
>  #define IO_TLB_SHIFT 11
>  
> +void swiotlb_init_nopanic(int verbose);

No need for that; let's just use the swiotlb_init version.

>  extern void swiotlb_init(int verbose);
>  extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
>  extern unsigned long swiotlb_nr_tbl(void);
> Index: linux-2.6/lib/swiotlb.c
> ===================================================================
> --- linux-2.6.orig/lib/swiotlb.c
> +++ linux-2.6/lib/swiotlb.c
> @@ -122,11 +122,18 @@ static dma_addr_t swiotlb_virt_to_bus(st
>  	return phys_to_dma(hwdev, virt_to_phys(address));
>  }
>  
> +static bool no_iotlb_memory;
> +
>  void swiotlb_print_info(void)
>  {
>  	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
>  	unsigned char *vstart, *vend;
>  
> +	if (no_iotlb_memory) {
> +		printk(KERN_INFO "software IO TLB: No low mem\n");

KERN_ERR or KERN_WARNING

> +		return;
> +	}
> +
>  	vstart = phys_to_virt(io_tlb_start);
>  	vend = phys_to_virt(io_tlb_end);
>  
> @@ -136,7 +143,8 @@ void swiotlb_print_info(void)
>  	       bytes >> 20, vstart, vend - 1);
>  }
>  
> -void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> +static int __init __swiotlb_init_with_tbl(char *tlb, unsigned long nslabs,
> +				 int verbose)

Collapse it in. Meaning make swiotlb_init_with_tbl return an int. This will
require altering the Xen version of Xen-SWIOTLB but that should be fairly easy.


>  {
>  	void *v_overflow_buffer;
>  	unsigned long i, bytes;
> @@ -150,9 +158,10 @@ void __init swiotlb_init_with_tbl(char *
>  	/*
>  	 * Get the overflow emergency buffer
>  	 */
> -	v_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow));
> +	v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
> +						PAGE_ALIGN(io_tlb_overflow));
>  	if (!v_overflow_buffer)
> -		panic("Cannot allocate SWIOTLB overflow buffer!\n");
> +		return -ENOMEM;
>  
>  	io_tlb_overflow_buffer = __pa(v_overflow_buffer);
>  
> @@ -169,14 +178,22 @@ void __init swiotlb_init_with_tbl(char *
>  
>  	if (verbose)
>  		swiotlb_print_info();
> +
> +	return 0;
> +}
> +
> +void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> +{
> +	if (__swiotlb_init_with_tbl(tlb, nslabs, verbose) == -ENOMEM)
> +		panic("Cannot allocate SWIOTLB buffer");

And that way we can get rid of this.

>  }
>  
>  /*
>   * Statically reserve bounce buffer space and initialize bounce buffer data
>   * structures for the software IO TLB used to implement the DMA API.
>   */
> -static void __init
> -swiotlb_init_with_default_size(size_t default_size, int verbose)
> +static  __init
> +int swiotlb_init_with_default_size(size_t default_size, int verbose)
>  {
>  	unsigned char *vstart;
>  	unsigned long bytes;
> @@ -191,17 +208,33 @@ swiotlb_init_with_default_size(size_t de
>  	/*
>  	 * Get IO TLB memory from the low pages
>  	 */
> -	vstart = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
> +	vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
>  	if (!vstart)
> -		panic("Cannot allocate SWIOTLB buffer");
> +		return -ENOMEM;
>  
> -	swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose);
> +	if (__swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose) == -ENOMEM) {
> +		free_bootmem(io_tlb_start,
> +				     PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +void __init
> +swiotlb_init_nopanic(int verbose)
> +{
> +	/* default to 64MB */
> +	if (swiotlb_init_with_default_size(64 * (1<<20), verbose) == -ENOMEM)
> +		pr_warn("Cannot allocate SWIOTLB buffer");
>  }

And this
>  
>  void __init
>  swiotlb_init(int verbose)
>  {
> -	swiotlb_init_with_default_size(64 * (1<<20), verbose);	/* default to 64MB */
> +	/* default to 64MB */
> +	if (swiotlb_init_with_default_size(64 * (1<<20), verbose) == -ENOMEM)
> +		panic("Cannot allocate SWIOTLB buffer");
>  }

And just make this function return 'int'
>  
>  /*
> @@ -405,6 +438,9 @@ phys_addr_t swiotlb_tbl_map_single(struc
>  	unsigned long offset_slots;
>  	unsigned long max_slots;
>  
> +	if (no_iotlb_memory)
> +		panic("Cannot allocate SWIOTLB buffer");
> +
>  	mask = dma_get_seg_boundary(hwdev);
>  
>  	tbl_dma_addr &= mask;
> Index: linux-2.6/arch/x86/kernel/pci-swiotlb.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/pci-swiotlb.c
> +++ linux-2.6/arch/x86/kernel/pci-swiotlb.c
> @@ -91,7 +91,7 @@ IOMMU_INIT(pci_swiotlb_detect_4gb,
>  void __init pci_swiotlb_init(void)
>  {
>  	if (swiotlb) {
> -		swiotlb_init(0);
> +		swiotlb_init_nopanic(0);
>  		dma_ops = &swiotlb_dma_ops;
>  	}
>  }

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09 13:24                                 ` Konrad Rzeszutek Wilk
@ 2013-01-09 17:27                                   ` Yinghai Lu
  2013-01-09 18:01                                     ` Shuah Khan
  2013-01-09 21:00                                     ` Eric W. Biederman
  0 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-09 17:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Eric W. Biederman, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 791 bytes --]

On Wed, Jan 9, 2013 at 5:24 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Jan 08, 2013 at 05:12:02PM -0800, Yinghai Lu wrote:
>> On Tue, Jan 8, 2013 at 5:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > On Tue, Jan 8, 2013 at 4:58 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >
>> >>
>> >> So instead we need to say?
>> >>
>> >> +       if (no_iotlb_memory)
>> >> +               panic("Cannot allocate SWIOTLB buffer");
>> >> +
>> >>
>> >> Which is just making the panic a little later than it used to be and
>> >> seems completely reasonable.
>> >
>> > yes, looks some driver just use map_single without checking results.
>>
>> update one.
>
> Please make it inline.
>

Please check the updated attachment. It should address all your requests.

Thanks

Yinghai

[-- Attachment #2: swiotlb.patch --]
[-- Type: application/octet-stream, Size: 6123 bytes --]

Subject: [PATCH] x86: Don't panic if can not alloc buffer for swiotlb

Normal boot path on a system with IOMMU support:
the swiotlb buffer is allocated early, and then we try to initialize the
IOMMU. If the Intel or AMD IOMMU can be set up properly, the swiotlb buffer
is freed.

The early allocation uses bootmem and can panic when we try to use
kdump with a buffer above 4G only, or with memmap limiting memory under 4G,
for example: memmap=4095M$1M to remove memory under 4G.

According to Eric, add a _nopanic version and no_iotlb_memory to fail
map single later if swiotlb is still needed.

-v2: don't pass nopanic, and use the -ENOMEM return value, according to Eric.
     panic early instead of using swiotlb_full to panic, according to Eric/Konrad.
-v3: make swiotlb_init not panic; this affects:
     arm64, ia64, powerpc, tile, unicore32, x86.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>

---
 arch/mips/cavium-octeon/dma-octeon.c |    3 +-
 drivers/xen/swiotlb-xen.c            |    4 ++-
 include/linux/swiotlb.h              |    2 -
 lib/swiotlb.c                        |   41 +++++++++++++++++++++++++++--------
 4 files changed, 38 insertions(+), 12 deletions(-)

Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -122,11 +122,18 @@ static dma_addr_t swiotlb_virt_to_bus(st
 	return phys_to_dma(hwdev, virt_to_phys(address));
 }
 
+static bool no_iotlb_memory;
+
 void swiotlb_print_info(void)
 {
 	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 	unsigned char *vstart, *vend;
 
+	if (no_iotlb_memory) {
+		pr_warn("software IO TLB: No low mem\n");
+		return;
+	}
+
 	vstart = phys_to_virt(io_tlb_start);
 	vend = phys_to_virt(io_tlb_end);
 
@@ -136,7 +143,7 @@ void swiotlb_print_info(void)
 	       bytes >> 20, vstart, vend - 1);
 }
 
-void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
 	void *v_overflow_buffer;
 	unsigned long i, bytes;
@@ -150,9 +157,10 @@ void __init swiotlb_init_with_tbl(char *
 	/*
 	 * Get the overflow emergency buffer
 	 */
-	v_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow));
+	v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
+						PAGE_ALIGN(io_tlb_overflow));
 	if (!v_overflow_buffer)
-		panic("Cannot allocate SWIOTLB overflow buffer!\n");
+		return -ENOMEM;
 
 	io_tlb_overflow_buffer = __pa(v_overflow_buffer);
 
@@ -169,14 +177,16 @@ void __init swiotlb_init_with_tbl(char *
 
 	if (verbose)
 		swiotlb_print_info();
+
+	return 0;
 }
 
 /*
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
  */
-static void __init
-swiotlb_init_with_default_size(size_t default_size, int verbose)
+static  __init
+int swiotlb_init_with_default_size(size_t default_size, int verbose)
 {
 	unsigned char *vstart;
 	unsigned long bytes;
@@ -191,17 +201,27 @@ swiotlb_init_with_default_size(size_t de
 	/*
 	 * Get IO TLB memory from the low pages
 	 */
-	vstart = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
+	vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
 	if (!vstart)
-		panic("Cannot allocate SWIOTLB buffer");
+		return -ENOMEM;
+
+	if (swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose) == -ENOMEM) {
+		free_bootmem(io_tlb_start,
+				     PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+		return -ENOMEM;
+	}
 
-	swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose);
+	return 0;
 }
 
 void __init
 swiotlb_init(int verbose)
 {
-	swiotlb_init_with_default_size(64 * (1<<20), verbose);	/* default to 64MB */
+	/* default to 64MB */
+	if (swiotlb_init_with_default_size(64UL<<20, verbose) == -ENOMEM) {
+		pr_warn("Cannot allocate SWIOTLB buffer");
+		no_iotlb_memory = true;
+	}
 }
 
 /*
@@ -405,6 +425,9 @@ phys_addr_t swiotlb_tbl_map_single(struc
 	unsigned long offset_slots;
 	unsigned long max_slots;
 
+	if (no_iotlb_memory)
+		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
+
 	mask = dma_get_seg_boundary(hwdev);
 
 	tbl_dma_addr &= mask;
Index: linux-2.6/arch/mips/cavium-octeon/dma-octeon.c
===================================================================
--- linux-2.6.orig/arch/mips/cavium-octeon/dma-octeon.c
+++ linux-2.6/arch/mips/cavium-octeon/dma-octeon.c
@@ -317,7 +317,8 @@ void __init plat_swiotlb_setup(void)
 
 	octeon_swiotlb = alloc_bootmem_low_pages(swiotlbsize);
 
-	swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1);
+	if (swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1) == -ENOMEM)
+		panic("Cannot allocate SWIOTLB buffer");
 
 	mips_dma_map_ops = &octeon_linear_dma_map_ops.dma_map_ops;
 }
Index: linux-2.6/drivers/xen/swiotlb-xen.c
===================================================================
--- linux-2.6.orig/drivers/xen/swiotlb-xen.c
+++ linux-2.6/drivers/xen/swiotlb-xen.c
@@ -231,7 +231,9 @@ retry:
 	}
 	start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
 	if (early) {
-		swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs, verbose);
+		if (swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs,
+			 verbose))
+			panic("Cannot allocate SWIOTLB buffer");
 		rc = 0;
 	} else
 		rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs);
Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -23,7 +23,7 @@ extern int swiotlb_force;
 #define IO_TLB_SHIFT 11
 
 extern void swiotlb_init(int verbose);
-extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
+int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09 17:27                                   ` Yinghai Lu
@ 2013-01-09 18:01                                     ` Shuah Khan
  2013-01-09 19:13                                       ` Yinghai Lu
  2013-01-09 21:00                                     ` Eric W. Biederman
  1 sibling, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-09 18:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Konrad Rzeszutek Wilk, Eric W. Biederman, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andrew Morton, Borislav Petkov,
	Jan Kiszka, Jason Wessel, linux-kernel, Joerg Roedel

On Wed, Jan 9, 2013 at 10:27 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Jan 9, 2013 at 5:24 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Tue, Jan 08, 2013 at 05:12:02PM -0800, Yinghai Lu wrote:
>>> On Tue, Jan 8, 2013 at 5:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> > On Tue, Jan 8, 2013 at 4:58 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> >
>>> >>
>>> >> So instead we need to say?
>>> >>
>>> >> +       if (no_iotlb_memory)
>>> >> +               panic("Cannot allocate SWIOTLB buffer");
>>> >> +
>>> >>
>>> >> Which is just making the panic a little later than it used to be and
>>> >> seems completely reasonable.
>>> >
>>> > yes, looks some driver just use map_single without checking results.
>>>
>>> update one.
>>
>> Please make it inline.
>>
>
> please check updated attached. It should address all your request.
>
> Thanks
>
> Yinghai

Yinghai,

After several revisions, I am losing track. Could you please write a
change log and explain the change to the existing behavior? If you
could address the following areas, it will be easier to figure out whether
we are missing something:

1. What happens when swiotlb is forced with iommu=soft on a system
with and without enough low mem?
2. What happens when swiotlb is not forced, but the iommu driver sets
swiotlb=1 after it gets done with its iommu initialization on a system
with and without enough low mem?
3. What happens when nopanic is used, and when will the system fail?
Will it fail when a driver runs into errors? I am hoping this won't be a
silent failure. Please see more on this below:

I did a dma mapping error analysis a few months ago and found several
drivers that don't check dma mapping errors, don't unmap dma buffers,
etc. Returning a mapping error from swiotlb could cause problems when we
have drivers that don't check mapping errors. These drivers might not
fail cleanly either and could cause data corruption.

Here is a link to the dma mapping error analysis I did:

http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis

-- Shuah

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09 18:01                                     ` Shuah Khan
@ 2013-01-09 19:13                                       ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-09 19:13 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Konrad Rzeszutek Wilk, Eric W. Biederman, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andrew Morton, Borislav Petkov,
	Jan Kiszka, Jason Wessel, linux-kernel, Joerg Roedel

On Wed, Jan 9, 2013 at 10:01 AM, Shuah Khan <shuahkhan@gmail.com> wrote:
> After several revisions, I am loosing track. Could you please write a
> change log and explain the change to the existing behavior. If you
> could addresses the following areas, it will be easier figure if we
> are missing something (if any):
>
> 1. What happens when switolb is forced with iommu=soft on a system
> with and without not enough low mem?

It panics later, when a device tries to map_single with it,
instead of panicking early during swiotlb allocation.

> 2. What happens when swiotlb is not forced, but iommu driver sets
> swiotlb=1 after it gets done with its iommu initialization on a system
> with and without enough low mem.

It panics later, when a device tries to map_single with it,
instead of panicking early during swiotlb allocation.

> 3. What happens when nopanic is used and when will the system fail?
> Will it fail when driver runs into errors. I am hoping this won't be a
> silent failure. Please see more on this below:

As Eric and Konrad suggested, I made everything use the nopanic path.
Only mips and x86-xen still panic early, because they call
swiotlb_init_with_tbl directly.

I hope the mips and x86-xen folks can convert to calling swiotlb_init instead.

>
> I did dma mapping error analysis a few months ago and found several
> drivers that don't check dma mapping errors, don't unmap dma buffers
> etc. Returning mapping error from switolb could cause problems when we
> have drivers that don't check mapping errors. These drivers might not
> fail cleanly either and could cause data corruption.

For now we panic for them. Later, after those drivers get fixed, we
can change to returning MAP_ERROR instead.

>
> Here is a link to the dma mapping error analysis I did:
>
> http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis

Good. It looks like we need one generic way to fix the problem, like another wrapper?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09 17:27                                   ` Yinghai Lu
  2013-01-09 18:01                                     ` Shuah Khan
@ 2013-01-09 21:00                                     ` Eric W. Biederman
  2013-01-09 21:15                                       ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: Eric W. Biederman @ 2013-01-09 21:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Konrad Rzeszutek Wilk, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

Yinghai Lu <yinghai@kernel.org> writes:

> please check updated attached. It should address all your request.

There is one significant bug that I can see.

swiotlb_print_info tests no_iotlb_memory, but no_iotlb_memory is set
only after swiotlb_init_with_tbl returns.

Eric


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09 21:00                                     ` Eric W. Biederman
@ 2013-01-09 21:15                                       ` Yinghai Lu
  2013-01-10 23:07                                         ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-09 21:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Konrad Rzeszutek Wilk, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Wed, Jan 9, 2013 at 1:00 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Yinghai Lu <yinghai@kernel.org> writes:
>
>> please check updated attached. It should address all your request.
>
> There is one significant bug that I can see.
>
> swiotlb_print_info tests no_iotlb_memory but no_iotlb_memory is set
> after swiotlb_init_with_tlb returns.

There is another call to swiotlb_print_info, from
pci_swiotlb_late_init:

void __init pci_swiotlb_late_init(void)
{
        /* An IOMMU turned us off. */
        if (!swiotlb)
                swiotlb_free();
        else {
                printk(KERN_INFO "PCI-DMA: "
                       "Using software bounce buffering for IO (SWIOTLB)\n");
                swiotlb_print_info();
        }
}

So we need that check for the case where swiotlb == 1 but we could not
actually allocate the buffer earlier.
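
To make the ordering concrete, here is a rough userspace model of the
flow described above. It is only a sketch, not the kernel code from the
patch: every model_* name is made up for illustration, malloc() stands in
for the bootmem allocator, and the "would panic" result stands in for the
panic() in swiotlb_tbl_map_single.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace model only; every model_* name is hypothetical. */
static bool no_iotlb_memory;
static void *io_tlb;

/* Early init: record the allocation failure instead of panicking. */
int model_swiotlb_init(size_t bytes, bool low_mem_available)
{
	free(io_tlb);
	io_tlb = low_mem_available ? malloc(bytes) : NULL;
	no_iotlb_memory = (io_tlb == NULL);
	if (no_iotlb_memory) {
		fprintf(stderr, "Cannot allocate SWIOTLB buffer\n");
		return -ENOMEM;
	}
	return 0;
}

/* Late print: swiotlb is still enabled, so the flag is checked here. */
bool model_late_print_info(void)
{
	if (no_iotlb_memory) {
		fprintf(stderr, "software IO TLB: No low mem\n");
		return false;
	}
	printf("software IO TLB: buffer at %p\n", io_tlb);
	return true;
}

/* First mapping request: this is where the real patch calls panic(). */
bool model_map_single_would_panic(void)
{
	return no_iotlb_memory;
}
```

In this model a failed early init leaves the system running, the late
print reports the missing low memory, and only the first mapping request
hits the fatal path.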

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 09/31] x86, 64bit: #PF handler set page to cover 2M only
  2013-01-04  0:48 ` [PATCH v7u1 09/31] x86, 64bit: #PF handler set page to cover 2M only Yinghai Lu
@ 2013-01-09 22:57   ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-09 22:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Alexander Duyck

On Thu, Jan 03, 2013 at 04:48:29PM -0800, Yinghai Lu wrote:
> Now #PF hanlder could map 1G per #PF, That causes same problem that
> is fixed by
> 	x86, mm: Only direct map addresses that are marked as E820_RAM
> 
> only add one 2M mapping instead of 1G accessing one time for dynamically
> per #PF.

Ok, I can more or less grasp what the code does but this sentence above
is a lot of fun. Do you mean:

"add only one 2M mapping per #PF instead of all 512 pmds comprising 1G"

?
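
For reference, the arithmetic behind that reading can be sketched as
below. This is a standalone userspace illustration, not kernel code; the
names PMD_SIZE_BYTES, PUD_SIZE_BYTES, pmds_per_pud() and fault_2m_base()
are made up here, and it assumes the usual x86-64 layout where one pmd
entry maps 2M and one pud entry spans 1G.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (hypothetical names), assuming x86-64 paging. */
#define PMD_SIZE_BYTES (UINT64_C(1) << 21)	/* one pmd entry maps 2M */
#define PUD_SIZE_BYTES (UINT64_C(1) << 30)	/* one pud entry spans 1G */

/* Number of 2M pmd entries needed to cover the 1G spanned by one pud entry. */
uint64_t pmds_per_pud(void)
{
	return PUD_SIZE_BYTES / PMD_SIZE_BYTES;
}

/* Start of the single 2M region that a fault at 'addr' would need mapped. */
uint64_t fault_2m_base(uint64_t addr)
{
	return addr & ~(PMD_SIZE_BYTES - 1);
}
```

So filling a whole 1G range means 512 pmd entries, while handling one
fault only needs the one 2M-aligned region containing the faulting
address.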

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
  2013-01-05 13:24       ` Borislav Petkov
@ 2013-01-10  1:26         ` Yinghai Lu
  2013-01-10 11:59           ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-10  1:26 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Sat, Jan 5, 2013 at 5:24 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Fri, Jan 04, 2013 at 02:04:05PM -0800, Yinghai Lu wrote:
>> On Fri, Jan 4, 2013 at 1:01 PM, Borislav Petkov <bp@alien8.de> wrote:
>> > On Thu, Jan 03, 2013 at 04:48:40PM -0800, Yinghai Lu wrote:
>> >>  static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>> >>  {
>> >> +     struct x86_mapping_info info = {
>> >> +             .alloc_pgt_page = alloc_pgt_page,
>> >> +             .context        = image,
>> >> +             .pmd_flag       = __PAGE_KERNEL_LARGE_EXEC,
>> >> +     };
>> >
>> > This is leaving ->kernel_mapping uninitialized to contain a random,
>> > previous stack value. I don't think we want that.
>>
>> that should be initialized to false by default.
>
> So make it explicit. You can't possibly rely on what the stack contains
> when you allocate that struct there.

I should say:

that *is* initialized to false by default.

please check

http://stackoverflow.com/questions/10828294/c-and-c-partial-initialization-of-automatic-structure
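
A small standalone check of that rule, as a sketch: the struct below is
only a stand-in that borrows the field names from the hunk above plus
kernel_mapping, not the real struct x86_mapping_info, and demo_alloc()
and make_info() are hypothetical helpers. In C99, members left out of an
initializer list for an automatic object are initialized the same way as
objects with static storage duration, so the omitted bool ends up false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for illustration only; the real struct lives in the kernel tree. */
struct mapping_info_demo {
	void *(*alloc_pgt_page)(void *context);
	void *context;
	unsigned long pmd_flag;
	bool kernel_mapping;
};

/* Dummy allocator callback so the initializer has something to point at. */
void *demo_alloc(void *context)
{
	return context;
}

struct mapping_info_demo make_info(void *image)
{
	struct mapping_info_demo info = {
		.alloc_pgt_page = demo_alloc,
		.context        = image,
		.pmd_flag       = 0x1e3UL,	/* arbitrary placeholder value */
		/* .kernel_mapping omitted on purpose: implicitly false */
	};

	return info;
}
```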

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 12/31] x86: add get_ramdisk_image/size()
  2013-01-07 15:56   ` Borislav Petkov
@ 2013-01-10  1:53     ` Yinghai Lu
  2013-01-10 12:13       ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-10  1:53 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Mon, Jan 7, 2013 at 7:56 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 03, 2013 at 04:48:32PM -0800, Yinghai Lu wrote:
>> There are several places to find ramdisk information early for reserving
>> and relocating.
>>
>> Use functions to make code more readable and consistent.
>>
>> Later will add ext_ramdisk_image/size in those functions to support
>> loading ramdisk above 4g.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  arch/x86/kernel/setup.c |   29 +++++++++++++++++++++--------
>>  1 file changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 1b8a8cc..644a123 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -294,12 +294,25 @@ static void __init reserve_brk(void)
>>
>>  #ifdef CONFIG_BLK_DEV_INITRD
>>
>> +static u64 __init get_ramdisk_image(void)
>> +{
>> +     u64 ramdisk_image = boot_params.hdr.ramdisk_image;
>> +
>> +     return ramdisk_image;
>
> just do

No, I will insert a line between them.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table
  2013-01-07 15:55   ` Borislav Petkov
@ 2013-01-10  1:56     ` Yinghai Lu
  2013-01-10 12:19       ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-10  1:56 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Mon, Jan 7, 2013 at 7:55 AM, Borislav Petkov <bp@alien8.de> wrote:
> Those -vXX version lines need to go under the "---" line. Alternatively,
> you might want to add some of them to the commit message with a proper
> explanation since they are not that trivial at a first glance, for
> example the -v5, -v6, -v8, -v9 with a better explanation.

Mostly they are for tracking versions.

>
>>
>
> This needs hpa's S-O-B.

He will add it later when he puts them in tip.


* Re: [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
  2013-01-10  1:26         ` Yinghai Lu
@ 2013-01-10 11:59           ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-10 11:59 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Wed, Jan 09, 2013 at 05:26:18PM -0800, Yinghai Lu wrote:
> I should say:
> 
> that *is* initialized to false by default.
> 
> please check
> 
> http://stackoverflow.com/questions/10828294/c-and-c-partial-initialization-of-automatic-structure

Ok, I didn't know that, thanks for pointing it out.

And yet, this is not the point - the point is that this code is
complicated enough as it is so why not make the easy things trivial so
that people looking at it months or even years from now can still try to
understand it.

So what if it is defined by the standard?! Just add that line anyway! Then
there's no need to go check what was meant. This way it is *there*,
*explicit* and everyone *knows* what is meant - even people who don't
sleep with C99std under their pillow.

It is not like we're saving code since the mov $0 gets issued by the
compiler anyway when it is on the stack:

	movq	$0, -48(%rbp)	#, info
	movq	$0, -40(%rbp)	#, info
	movq	$0, -32(%rbp)	#, info
	movq	$0, -24(%rbp)	#, info

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v7u1 12/31] x86: add get_ramdisk_image/size()
  2013-01-10  1:53     ` Yinghai Lu
@ 2013-01-10 12:13       ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-10 12:13 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Wed, Jan 09, 2013 at 05:53:59PM -0800, Yinghai Lu wrote:
> >> +static u64 __init get_ramdisk_image(void)
> >> +{
> >> +     u64 ramdisk_image = boot_params.hdr.ramdisk_image;
> >> +
> >> +     return ramdisk_image;
> >
> > just do
> 
> No, I will insert a line between them.

... and you're going to do this because your code is the best and the
rest can suck it and you don't accept other people's suggestions or...
is there an actual technical reason behind it?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table
  2013-01-10  1:56     ` Yinghai Lu
@ 2013-01-10 12:19       ` Borislav Petkov
  2013-01-10 17:05         ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-10 12:19 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Wed, Jan 09, 2013 at 05:56:07PM -0800, Yinghai Lu wrote:
> On Mon, Jan 7, 2013 at 7:55 AM, Borislav Petkov <bp@alien8.de> wrote:
> > Those -vXX version lines need to go under the "---" line. Alternatively,
> > you might want to add some of them to the commit message with a proper
> > explanation since they are not that trivial at a first glance, for
> > example the -v5, -v6, -v8, -v9 with a better explanation.
> 
> Mostly they are for tracking versions.

I know that! Please read my suggestion again.

> > This needs hpa's S-O-B.
> 
> He will add it later when he puts them in tip.

This is not how SOB chaining works:

SOB: Author
SOB: Handler - this is you, who has added it to the patchset
SOB: Committer - maintainer

You need to read Documentation/SubmittingPatches if there's still things
unclear.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table
  2013-01-10 12:19       ` Borislav Petkov
@ 2013-01-10 17:05         ` Yinghai Lu
  2013-01-10 20:27           ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-10 17:05 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Thu, Jan 10, 2013 at 4:19 AM, Borislav Petkov <bp@alien8.de> wrote:
> This is not how SOB chaining works:
>
> SOB: Author
> SOB: Handler - this is you, who has added it to the patchset
> SOB: Committer - maintainer
>
> You need to read Documentation/SubmittingPatches if there's still things
> unclear.

I really don't know what you are getting at here.

We have done it this way for a long time.

While reviewing some patches, Linus, HPA, or Eric had a better idea
and drafted a patch without their Signed-off-by.

Then the submitter of the first version continued the debugging and
testing and made the patch work.

Finally, the submitter sent the patch with authorship from Linus, HPA, or Eric.

So at that point, how could it carry a Signed-off-by from them?

And there are commits upstream that do not have a Signed-off-by from the author.

Thanks

Yinghai


* Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table
  2013-01-10 17:05         ` Yinghai Lu
@ 2013-01-10 20:27           ` Borislav Petkov
  2013-01-12 22:04             ` H. Peter Anvin
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-10 20:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 10, 2013 at 09:05:46AM -0800, Yinghai Lu wrote:
> On Thu, Jan 10, 2013 at 4:19 AM, Borislav Petkov <bp@alien8.de> wrote:
> > This is not how SOB chaining works:
> >
> > SOB: Author
> > SOB: Handler - this is you, who has added it to the patchset
> > SOB: Committer - maintainer
> >
> > You need to read Documentation/SubmittingPatches if there's still things
> > unclear.
> 
> I really don't know what you are getting at here.
> 
> We have done it this way for a long time.
> 
> While reviewing some patches, Linus, HPA, or Eric had a better idea
> and drafted a patch without their Signed-off-by.
> 
> Then the submitter of the first version continued the debugging and
> testing and made the patch work.
> 
> Finally, the submitter sent the patch with authorship from Linus, HPA, or Eric.
> 
> So at that point, how could it carry a Signed-off-by from them?
> 
> And there are commits upstream that do not have a Signed-off-by from the author.

I certainly hope those are a very very small number, if any.

In any case, if you've taken hpa's (or anyone's, for that matter) patch,
it should have SOB from the original author. Then, no matter whether you
do modifications to it or not, if it goes upstream through you, then it
has to have your SOB. And then, the upstream maintainer adds his/hers
because he's/she's the one committing it.

This way, the chain of patch handling is clear when you look at it and
you can trace the path back to this patch's origin and how it came
upstream.

Here's the relevant portion of SubmittingPatches:

"Rule (b) allows you to adjust the code, but then it is very impolite
to change one submitter's code and make him endorse your bugs. To
solve this problem, it is recommended that you add a line between the
last Signed-off-by header and yours, indicating the nature of your
changes. While there is nothing mandatory about this, it seems like
prepending the description with your mail and/or name, all enclosed in
square brackets, is noticeable enough to make it obvious that you are
responsible for last-minute changes. Example :

	Signed-off-by: Random J Developer <random@developer.example.org>
	[lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
	Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>"

In your case, the second SOB should be "Lucky K Developer 2" :-)

This way the SOB chain tells you exactly who did what.

HTH.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-09 21:15                                       ` Yinghai Lu
@ 2013-01-10 23:07                                         ` Yinghai Lu
  2013-01-10 23:15                                           ` Eric W. Biederman
  2013-01-11 16:35                                           ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-10 23:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Konrad Rzeszutek Wilk, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Wed, Jan 9, 2013 at 1:15 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Jan 9, 2013 at 1:00 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Yinghai Lu <yinghai@kernel.org> writes:
>>
>>> please check updated attached. It should address all your request.
>>
>> There is one significant bug that I can see.
>>
>> swiotlb_print_info tests no_iotlb_memory but no_iotlb_memory is set
>> after swiotlb_init_with_tlb returns.
>
> there is another swiotlb_print_info calling from
> pci_swiotlb_late_init
>
> void __init pci_swiotlb_late_init(void)
> {
>         /* An IOMMU turned us off. */
>         if (!swiotlb)
>                 swiotlb_free();
>         else {
>                 printk(KERN_INFO "PCI-DMA: "
>                        "Using software bounce buffering for IO (SWIOTLB)\n");
>                 swiotlb_print_info();
>         }
> }
>
> so we need that checking when swiotlb == 1, but actually we can not
> allocate that before.

Eric, so is it right to put the check in swiotlb_print_info?

I'd like to post the whole patchset again and ask HPA to put it in tip/next
to catch the v3.9 merge window.

Thanks

Yinghai


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-10 23:07                                         ` Yinghai Lu
@ 2013-01-10 23:15                                           ` Eric W. Biederman
  2013-01-10 23:55                                             ` Yinghai Lu
  2013-01-11 16:35                                           ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 199+ messages in thread
From: Eric W. Biederman @ 2013-01-10 23:15 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Konrad Rzeszutek Wilk, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

Yinghai Lu <yinghai@kernel.org> writes:

> On Wed, Jan 9, 2013 at 1:15 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Wed, Jan 9, 2013 at 1:00 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> Yinghai Lu <yinghai@kernel.org> writes:
>>>
>>>> please check updated attached. It should address all your request.
>>>
>>> There is one significant bug that I can see.
>>>
>>> swiotlb_print_info tests no_iotlb_memory but no_iotlb_memory is set
>>> after swiotlb_init_with_tlb returns.
>>
>> there is another swiotlb_print_info calling from
>> pci_swiotlb_late_init
>>
>> void __init pci_swiotlb_late_init(void)
>> {
>>         /* An IOMMU turned us off. */
>>         if (!swiotlb)
>>                 swiotlb_free();
>>         else {
>>                 printk(KERN_INFO "PCI-DMA: "
>>                        "Using software bounce buffering for IO (SWIOTLB)\n");
>>                 swiotlb_print_info();
>>         }
>> }
>>
>> so we need that checking when swiotlb == 1, but actually we can not
>> allocate that before.
>
> Eric, so is it right to put the check in swiotlb_print_info?

My biggest question was really why you didn't set no_iotlb
sooner.  But shrug I didn't see any real issue with the code except
for it being silly.

Certainly since we are calling swiotlb_print_info from outside swiotlb.c
the check is needed.

> I'd like to post the whole patchset again and ask HPA to put it in tip/next
> to catch the v3.9 merge window.

Eric


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-10 23:15                                           ` Eric W. Biederman
@ 2013-01-10 23:55                                             ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-10 23:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Konrad Rzeszutek Wilk, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 356 bytes --]

On Thu, Jan 10, 2013 at 3:15 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> My biggest question was really why you didn't set no_iotlb
> sooner.  But shrug I didn't see any real issue with the code except
> for it being silly.

How about the attached one?

I removed swiotlb_init_with_default_size(), so the logic should be cleaner.

Thanks

Yinghai

[-- Attachment #2: swiotlb.patch --]
[-- Type: application/octet-stream, Size: 6465 bytes --]

Subject: [PATCH] x86: Don't panic if can not alloc buffer for swiotlb

Normal boot path on a system with IOMMU support:
the swiotlb buffer is allocated early, then the IOMMU is initialized;
if the Intel or AMD IOMMU sets up properly, the swiotlb buffer
is freed.

The early allocation uses bootmem and can panic when we try to use
kdump with a buffer above 4G only, or with memmap to limit memory under 4G,
for example: memmap=4095M$1M to remove memory under 4G.

According to Eric, add a _nopanic version and no_iotlb_memory to fail
map single later if swiotlb is still needed.

-v2: don't pass nopanic, and use -ENOMEM return value according to Eric.
     panic early instead of using swiotlb_full to panic...according to Eric/Konrad.
-v3: make swiotlb_init not panic, but this will affect:
     arm64, ia64, powerpc, tile, unicore32, x86.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Cc: linux-mips@linux-mips.org
Cc: xen-devel@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org

---
 arch/mips/cavium-octeon/dma-octeon.c |    3 +-
 drivers/xen/swiotlb-xen.c            |    4 ++
 include/linux/swiotlb.h              |    2 -
 lib/swiotlb.c                        |   47 +++++++++++++++++++++--------------
 4 files changed, 35 insertions(+), 21 deletions(-)

Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -122,11 +122,18 @@ static dma_addr_t swiotlb_virt_to_bus(st
 	return phys_to_dma(hwdev, virt_to_phys(address));
 }
 
+static bool no_iotlb_memory;
+
 void swiotlb_print_info(void)
 {
 	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 	unsigned char *vstart, *vend;
 
+	if (no_iotlb_memory) {
+		pr_warn("software IO TLB: No low mem\n");
+		return;
+	}
+
 	vstart = phys_to_virt(io_tlb_start);
 	vend = phys_to_virt(io_tlb_end);
 
@@ -136,7 +143,7 @@ void swiotlb_print_info(void)
 	       bytes >> 20, vstart, vend - 1);
 }
 
-void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
 	void *v_overflow_buffer;
 	unsigned long i, bytes;
@@ -150,9 +157,10 @@ void __init swiotlb_init_with_tbl(char *
 	/*
 	 * Get the overflow emergency buffer
 	 */
-	v_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow));
+	v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
+						PAGE_ALIGN(io_tlb_overflow));
 	if (!v_overflow_buffer)
-		panic("Cannot allocate SWIOTLB overflow buffer!\n");
+		return -ENOMEM;
 
 	io_tlb_overflow_buffer = __pa(v_overflow_buffer);
 
@@ -169,15 +177,19 @@ void __init swiotlb_init_with_tbl(char *
 
 	if (verbose)
 		swiotlb_print_info();
+
+	return 0;
 }
 
 /*
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
  */
-static void __init
-swiotlb_init_with_default_size(size_t default_size, int verbose)
+void  __init
+swiotlb_init(int verbose)
 {
+	/* default to 64MB */
+	size_t default_size = 64UL<<20;
 	unsigned char *vstart;
 	unsigned long bytes;
 
@@ -188,20 +200,16 @@ swiotlb_init_with_default_size(size_t de
 
 	bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 
-	/*
-	 * Get IO TLB memory from the low pages
-	 */
-	vstart = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
-	if (!vstart)
-		panic("Cannot allocate SWIOTLB buffer");
-
-	swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose);
-}
+	/* Get IO TLB memory from the low pages */
+	vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
+	if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
+		return;
 
-void __init
-swiotlb_init(int verbose)
-{
-	swiotlb_init_with_default_size(64 * (1<<20), verbose);	/* default to 64MB */
+	if (io_tlb_start)
+		free_bootmem(io_tlb_start,
+				 PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+	pr_warn("Cannot allocate SWIOTLB buffer");
+	no_iotlb_memory = true;
 }
 
 /*
@@ -405,6 +413,9 @@ phys_addr_t swiotlb_tbl_map_single(struc
 	unsigned long offset_slots;
 	unsigned long max_slots;
 
+	if (no_iotlb_memory)
+		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
+
 	mask = dma_get_seg_boundary(hwdev);
 
 	tbl_dma_addr &= mask;
Index: linux-2.6/arch/mips/cavium-octeon/dma-octeon.c
===================================================================
--- linux-2.6.orig/arch/mips/cavium-octeon/dma-octeon.c
+++ linux-2.6/arch/mips/cavium-octeon/dma-octeon.c
@@ -317,7 +317,8 @@ void __init plat_swiotlb_setup(void)
 
 	octeon_swiotlb = alloc_bootmem_low_pages(swiotlbsize);
 
-	swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1);
+	if (swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1) == -ENOMEM)
+		panic("Cannot allocate SWIOTLB buffer");
 
 	mips_dma_map_ops = &octeon_linear_dma_map_ops.dma_map_ops;
 }
Index: linux-2.6/drivers/xen/swiotlb-xen.c
===================================================================
--- linux-2.6.orig/drivers/xen/swiotlb-xen.c
+++ linux-2.6/drivers/xen/swiotlb-xen.c
@@ -231,7 +231,9 @@ retry:
 	}
 	start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
 	if (early) {
-		swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs, verbose);
+		if (swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs,
+			 verbose))
+			panic("Cannot allocate SWIOTLB buffer");
 		rc = 0;
 	} else
 		rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs);
Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -23,7 +23,7 @@ extern int swiotlb_force;
 #define IO_TLB_SHIFT 11
 
 extern void swiotlb_init(int verbose);
-extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
+int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 


* Re: [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
  2013-01-04  0:48 ` [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path Yinghai Lu
@ 2013-01-11 12:13   ` Borislav Petkov
  2013-01-11 16:42     ` Yinghai Lu
  2013-01-15 13:48   ` Stefano Stabellini
  1 sibling, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-11 12:13 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:30PM -0800, Yinghai Lu wrote:
> We are not having max_pfn_mapped set correctly until init_memory_mapping.
> 
> so don't print it initial value for 64bit

Just a minor nitpick:

"So don't print its initial value for 64bit."

And you don't need the newlines between each sentence.

> Also need to use KERNEL_IMAGE_SIZE directly for highmap cleanup.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Other than that, this commit message actually reads very well. :-)

Acked-by: Borislav Petkov <bp@suse.de>

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-10 23:07                                         ` Yinghai Lu
  2013-01-10 23:15                                           ` Eric W. Biederman
@ 2013-01-11 16:35                                           ` Konrad Rzeszutek Wilk
  2013-01-11 16:52                                             ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-11 16:35 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Thu, Jan 10, 2013 at 03:07:10PM -0800, Yinghai Lu wrote:
> On Wed, Jan 9, 2013 at 1:15 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Wed, Jan 9, 2013 at 1:00 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Yinghai Lu <yinghai@kernel.org> writes:
> >>
> >>> please check updated attached. It should address all your request.
> >>
> >> There is one significant bug that I can see.
> >>
> >> swiotlb_print_info tests no_iotlb_memory but no_iotlb_memory is set
> >> after swiotlb_init_with_tlb returns.
> >
> > there is another swiotlb_print_info calling from
> > pci_swiotlb_late_init
> >
> > void __init pci_swiotlb_late_init(void)
> > {
> >         /* An IOMMU turned us off. */
> >         if (!swiotlb)
> >                 swiotlb_free();
> >         else {
> >                 printk(KERN_INFO "PCI-DMA: "
> >                        "Using software bounce buffering for IO (SWIOTLB)\n");
> >                 swiotlb_print_info();
> >         }
> > }
> >
> > so we need that checking when swiotlb == 1, but actually we can not
> > allocate that before.
> 
> Eric, so is it right to put the check in swiotlb_print_info?
> 
> I'd like to post the whole patchset again and ask HPA to put it in tip/next
> to catch the v3.9 merge window.

I'm the frontline maintainer of swiotlb and related stuff so if you want to follow
the proper protocol you should wait until I give you my Ack.

I need to check this patch out and then also test-run them on IA64, AMD-VI, Calgary-X
GART and Intel VT-d to make a sanity test.

However, this particular patch can go outside the mega-patchset you have. So you
could post the mega-patchset to hpa without this being in it and just mention
that there is this extra one that Konrad is handling.


* Re: [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
  2013-01-11 12:13   ` Borislav Petkov
@ 2013-01-11 16:42     ` Yinghai Lu
  2013-01-11 16:52       ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Fri, Jan 11, 2013 at 4:13 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 03, 2013 at 04:48:30PM -0800, Yinghai Lu wrote:
>> We are not having max_pfn_mapped set correctly until init_memory_mapping.
>>
>> so don't print it initial value for 64bit
>
> Just a minor nitpick:
>
> "So don't print its initial value for 64bit."

OK.

>
> And you don't need the newlines between each sentence.

Just want to make it more readable.

>
> Acked-by: Borislav Petkov <bp@suse.de>

Thanks, I will add that.

Also, congratulations on the new job.

Yinghai


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-11 16:35                                           ` Konrad Rzeszutek Wilk
@ 2013-01-11 16:52                                             ` Yinghai Lu
  2013-01-11 17:49                                               ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-11 16:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Eric W. Biederman, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

On Fri, Jan 11, 2013 at 8:35 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, Jan 10, 2013 at 03:07:10PM -0800, Yinghai Lu wrote:
>
> I'm the frontline maintainer of swiotlb and related stuff so if you want to follow
> the proper protocol you should wait until I give you my Ack.

Sure.

>
> I need to check this patch out and then also test-run them on IA64, AMD-VI, Calgary-X
> GART and Intel VT-d to make a sanity test.

That would be great. Please check the two attached patches, or do you
want me to update the for-x86-boot branch so you can test that instead?

But if you want to check memmap=4095M$1M, you will need to test on the
newer branch.

>
> However, this particular patch can go outside the mega-patchset you have. So you
> could post the mega-patchset to hpa without this being in it and just mention
> that there is this extra one that Konrad is handling.

I just want to keep them together so that people backporting can find
them more easily.

Thanks

Yinghai

[-- Attachment #2: alloc_low_page_nopanic.patch --]
[-- Type: application/octet-stream, Size: 2534 bytes --]

Subject: [PATCH] mm: Add alloc_bootmem_low_pages_nopanic()

We don't need to panic in some cases, such as swiotlb preallocation.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 include/linux/bootmem.h |    5 +++++
 mm/bootmem.c            |    8 ++++++++
 mm/nobootmem.c          |    8 ++++++++
 3 files changed, 21 insertions(+)

Index: linux-2.6/include/linux/bootmem.h
===================================================================
--- linux-2.6.orig/include/linux/bootmem.h
+++ linux-2.6/include/linux/bootmem.h
@@ -99,6 +99,9 @@ void *___alloc_bootmem_node_nopanic(pg_d
 extern void *__alloc_bootmem_low(unsigned long size,
 				 unsigned long align,
 				 unsigned long goal);
+void *__alloc_bootmem_low_nopanic(unsigned long size,
+				 unsigned long align,
+				 unsigned long goal);
 extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 				      unsigned long size,
 				      unsigned long align,
@@ -132,6 +135,8 @@ extern void *__alloc_bootmem_low_node(pg
 
 #define alloc_bootmem_low(x) \
 	__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
+#define alloc_bootmem_low_pages_nopanic(x) \
+	__alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0)
 #define alloc_bootmem_low_pages(x) \
 	__alloc_bootmem_low(x, PAGE_SIZE, 0)
 #define alloc_bootmem_low_pages_node(pgdat, x) \
Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -821,6 +821,14 @@ void * __init __alloc_bootmem_low(unsign
 	return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
 }
 
+void * __init __alloc_bootmem_low_nopanic(unsigned long size,
+					  unsigned long align,
+					  unsigned long goal)
+{
+	return ___alloc_bootmem_nopanic(size, align, goal,
+					ARCH_LOW_ADDRESS_LIMIT);
+}
+
 /**
  * __alloc_bootmem_low_node - allocate low boot memory from a specific node
  * @pgdat: node to allocate from
Index: linux-2.6/mm/nobootmem.c
===================================================================
--- linux-2.6.orig/mm/nobootmem.c
+++ linux-2.6/mm/nobootmem.c
@@ -391,6 +391,14 @@ void * __init __alloc_bootmem_low(unsign
 	return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
 }
 
+void * __init __alloc_bootmem_low_nopanic(unsigned long size,
+					  unsigned long align,
+					  unsigned long goal)
+{
+	return ___alloc_bootmem_nopanic(size, align, goal,
+					ARCH_LOW_ADDRESS_LIMIT);
+}
+
 /**
  * __alloc_bootmem_low_node - allocate low boot memory from a specific node
  * @pgdat: node to allocate from

[-- Attachment #3: swiotlb.patch --]
[-- Type: application/octet-stream, Size: 6465 bytes --]

Subject: [PATCH] x86: Don't panic if can not alloc buffer for swiotlb

Normal boot path on a system with IOMMU support:
the swiotlb buffer is allocated early, then the IOMMU is initialized;
if the Intel or AMD IOMMU sets up properly, the swiotlb buffer
is freed.

The early allocation uses bootmem and can panic when we try to use
kdump with a buffer above 4G only, or with memmap to limit memory under 4G,
for example: memmap=4095M$1M to remove memory under 4G.

According to Eric, add a _nopanic version and no_iotlb_memory to fail
map single later if swiotlb is still needed.

-v2: don't pass nopanic, and use -ENOMEM return value according to Eric.
     panic early instead of using swiotlb_full to panic...according to Eric/Konrad.
-v3: make swiotlb_init not panic, but this will affect:
     arm64, ia64, powerpc, tile, unicore32, x86.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Cc: linux-mips@linux-mips.org
Cc: xen-devel@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org

---
 arch/mips/cavium-octeon/dma-octeon.c |    3 +-
 drivers/xen/swiotlb-xen.c            |    4 ++
 include/linux/swiotlb.h              |    2 -
 lib/swiotlb.c                        |   47 +++++++++++++++++++++--------------
 4 files changed, 35 insertions(+), 21 deletions(-)

Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -122,11 +122,18 @@ static dma_addr_t swiotlb_virt_to_bus(st
 	return phys_to_dma(hwdev, virt_to_phys(address));
 }
 
+static bool no_iotlb_memory;
+
 void swiotlb_print_info(void)
 {
 	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 	unsigned char *vstart, *vend;
 
+	if (no_iotlb_memory) {
+		pr_warn("software IO TLB: No low mem\n");
+		return;
+	}
+
 	vstart = phys_to_virt(io_tlb_start);
 	vend = phys_to_virt(io_tlb_end);
 
@@ -136,7 +143,7 @@ void swiotlb_print_info(void)
 	       bytes >> 20, vstart, vend - 1);
 }
 
-void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
 	void *v_overflow_buffer;
 	unsigned long i, bytes;
@@ -150,9 +157,10 @@ void __init swiotlb_init_with_tbl(char *
 	/*
 	 * Get the overflow emergency buffer
 	 */
-	v_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow));
+	v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
+						PAGE_ALIGN(io_tlb_overflow));
 	if (!v_overflow_buffer)
-		panic("Cannot allocate SWIOTLB overflow buffer!\n");
+		return -ENOMEM;
 
 	io_tlb_overflow_buffer = __pa(v_overflow_buffer);
 
@@ -169,15 +177,19 @@ void __init swiotlb_init_with_tbl(char *
 
 	if (verbose)
 		swiotlb_print_info();
+
+	return 0;
 }
 
 /*
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
  */
-static void __init
-swiotlb_init_with_default_size(size_t default_size, int verbose)
+void  __init
+swiotlb_init(int verbose)
 {
+	/* default to 64MB */
+	size_t default_size = 64UL<<20;
 	unsigned char *vstart;
 	unsigned long bytes;
 
@@ -188,20 +200,16 @@ swiotlb_init_with_default_size(size_t de
 
 	bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 
-	/*
-	 * Get IO TLB memory from the low pages
-	 */
-	vstart = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
-	if (!vstart)
-		panic("Cannot allocate SWIOTLB buffer");
-
-	swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose);
-}
+	/* Get IO TLB memory from the low pages */
+	vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
+	if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
+		return;
 
-void __init
-swiotlb_init(int verbose)
-{
-	swiotlb_init_with_default_size(64 * (1<<20), verbose);	/* default to 64MB */
+	if (io_tlb_start)
+		free_bootmem(io_tlb_start,
+				 PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+	pr_warn("Cannot allocate SWIOTLB buffer");
+	no_iotlb_memory = true;
 }
 
 /*
@@ -405,6 +413,9 @@ phys_addr_t swiotlb_tbl_map_single(struc
 	unsigned long offset_slots;
 	unsigned long max_slots;
 
+	if (no_iotlb_memory)
+		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
+
 	mask = dma_get_seg_boundary(hwdev);
 
 	tbl_dma_addr &= mask;
Index: linux-2.6/arch/mips/cavium-octeon/dma-octeon.c
===================================================================
--- linux-2.6.orig/arch/mips/cavium-octeon/dma-octeon.c
+++ linux-2.6/arch/mips/cavium-octeon/dma-octeon.c
@@ -317,7 +317,8 @@ void __init plat_swiotlb_setup(void)
 
 	octeon_swiotlb = alloc_bootmem_low_pages(swiotlbsize);
 
-	swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1);
+	if (swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1) == -ENOMEM)
+		panic("Cannot allocate SWIOTLB buffer");
 
 	mips_dma_map_ops = &octeon_linear_dma_map_ops.dma_map_ops;
 }
Index: linux-2.6/drivers/xen/swiotlb-xen.c
===================================================================
--- linux-2.6.orig/drivers/xen/swiotlb-xen.c
+++ linux-2.6/drivers/xen/swiotlb-xen.c
@@ -231,7 +231,9 @@ retry:
 	}
 	start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
 	if (early) {
-		swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs, verbose);
+		if (swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs,
+			 verbose))
+			panic("Cannot allocate SWIOTLB buffer");
 		rc = 0;
 	} else
 		rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs);
Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -23,7 +23,7 @@ extern int swiotlb_force;
 #define IO_TLB_SHIFT 11
 
 extern void swiotlb_init(int verbose);
-extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
+int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 

* Re: [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
  2013-01-11 16:42     ` Yinghai Lu
@ 2013-01-11 16:52       ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-11 16:52 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Fri, Jan 11, 2013 at 08:42:05AM -0800, Yinghai Lu wrote:
> Just want to make it more readable.

Yeah, IMHO, a commit message of three sentences is readable even without
empty lines - it is small enough. :-)

> Also, congratulation for the new job.

Thanks. :-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-11 16:52                                             ` Yinghai Lu
@ 2013-01-11 17:49                                               ` Yinghai Lu
  2013-01-15  6:19                                                 ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-11 17:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Eric W. Biederman, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Fri, Jan 11, 2013 at 8:52 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> I need to check this patch out and then also test-run them on IA64, AMD-VI, Calgary-X
>> GART and Intel VT-d to make a sanity test.
>
> that will be great, and please check attached two patches, or you want
> to me update
> for-x86-boot branch and you test that instead?
>
> but if you want to check memmap=4095M$1M, then will need to test on
> newer branch.


I updated the for-x86-boot branch.

git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-boot

Thanks

Yinghai

* Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table
  2013-01-10 20:27           ` Borislav Petkov
@ 2013-01-12 22:04             ` H. Peter Anvin
  0 siblings, 0 replies; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-12 22:04 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, Andrew Morton, Jan Kiszka, Jason Wessel,
	linux-kernel

On 01/10/2013 12:27 PM, Borislav Petkov wrote:
>>
>> So at that time how can the Signed-off from them?
>>
>> And there are commits in the upstream does not have Signed-off from the Author.
>
> I certainly hope those are a very very small number, if any.
>

There are indeed a handful, in which case the first Signed-off-by:
indicates that the signer, *based on his own first-hand knowledge*, knows
the author intends and is allowed to release this patch under the
appropriate licensing terms (see the Developer's Certificate of Origin
document for the exact details).

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram.
  2013-01-04  0:48 ` [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram Yinghai Lu
@ 2013-01-13 12:56   ` Borislav Petkov
  2013-01-14  5:46     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-13 12:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Alexander Duyck

On Thu, Jan 03, 2013 at 04:48:41PM -0800, Yinghai Lu wrote:
> We should not set mapping for all under max_pfn.

"We should not establish mappings for all memory under max_pfn."

> That causes same problem that is fixed by

"Otherwise, it causes the same ..."

> 
> 	x86, mm: Only direct map addresses that are marked as E820_RAM

You could add this patch's commit id since it is in tip:x86/mm2 and it
shouldn't change: 66520ebc2df3.

Ditto for patch 09/31, "x86, 64bit: #PF handler set page to cover 2M only".

> 
> This patch expose pfn_mapped array, and only set ident mapping for ranges

	     exposes the...		       sets

> in that array.
> 
> This patch rely on new ident_mapping_init that could handle existing

	     relies on the new

> pgd/pud between different calling.

			    calls.

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>

[ … ]

> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index ab26a158b5a8..d704b369fd70 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -300,8 +300,8 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
>         return nr_range;
>  }
>  
> -static struct range pfn_mapped[E820_X_MAX];
> -static int nr_pfn_mapped;

This could use a comment saying that this is an array of all mapped
memory ranges or something like that.

> +struct range pfn_mapped[E820_X_MAX];
> +int nr_pfn_mapped;
>  
>  static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
>  {

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-04  0:48 ` [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
@ 2013-01-13 21:41   ` Borislav Petkov
  2013-01-14  5:37     ` Yinghai Lu
  2013-01-14 23:10     ` H. Peter Anvin
  0 siblings, 2 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-13 21:41 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Thu, Jan 03, 2013 at 04:48:42PM -0800, Yinghai Lu wrote:
> ext_ramdisk_image/size will record high 32bits for ramdisk info.
> 
> xloadflags bit0 will be set if relocatable with 64bit.

Let's describe that in more detail:

"Bit 0 of xloadflags is set if we are both a 64-bit and a relocatable
kernel. In that case, it denotes that ramdisk can be loaded above 4Gb."
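
A loader-side check of that bit might look like the sketch below. It is a
minimal illustration, not code from kexec-tools or any bootloader:
can_load_above_4g() and read_le16() are hypothetical helpers, the 0x236
offset and the 0x020c version come from this patch, and 0x206 is the
protocol version field described in Documentation/x86/boot.txt.

```c
#include <stdbool.h>
#include <stdint.h>

#define HDR_VERSION_OFFSET	0x206	/* boot protocol version, 2 bytes */
#define XLOADFLAGS_OFFSET	0x236	/* offset given in this patch */
#define CAN_BE_LOADED_ABOVE_4G	(1 << 0)

/* Read a little-endian 16-bit value from the setup image. */
static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical loader helper: may kernel and ramdisk go above 4G? */
static bool can_load_above_4g(const uint8_t *setup)
{
	if (read_le16(setup + HDR_VERSION_OFFSET) < 0x020c)
		return false;	/* the field was padding before 2.12 */
	return read_le16(setup + XLOADFLAGS_OFFSET) & CAN_BE_LOADED_ABOVE_4G;
}
```

Checking the version first matters because the same two bytes were unused
padding in older headers.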

> Let get_ramdisk_image/size to use ext_ramdisk_image/size to get
> right positon for ramdisk.
> 
> bootloader will fill value to ext_ramdisk_image/size when it load
> ramdisk above 4G.
> 
> Also bootloader will check if xloadflags bit0 is set to decicde if

							  decide

> it could load ramdisk high above 4G.
> 
> sentinel is used to make sure kernel have ext_* valid values set

The explanation of the sentinel field from "-v6" below should be
actually up here. We absolutely want to have it in the commit message
*and* in the code so that it is well documented why we've added it.

> Update header version to 2.12.
> 
> -v2: add ext_cmd_line_ptr for above 4G support.
> -v3: update to xloadflags from HPA.
> -v4: use fields from bootparam instead setup_header according to HPA.
> -v5: add checking for USE_EXT_BOOT_PARAMS
> -v6: use sentinel to check if ext_* are valid suggested by HPA.
>      HPA said:
> 	1. add a field in the uninitialized portion, call it "sentinel";
> 	2. make sure the byte position corresponding to the "sentinel" field is
> 	   nonzero in the bzImage file;
> 	3. if the kernel boots up and sentinel is nonzero, erase those fields
> 	   that you identified as uninitialized;

Question: if the bootloader sets ext_* properly, is it going to set
sentinel to 0 so that it can signal to the code further on that ext_*
are valid?

This is kinda missing from the mechanism of the sentinel and it should
be documented too.
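
One possible reading of the mechanism, as a user-space sketch: a loader
that zeroes boot_params and copies only the setup header leaves the
sentinel at 0, while a loader that copies the start of the bzImage
wholesale carries the nonzero byte along. load_clean(), load_sloppy() and
sanitize() are hypothetical names, and the 0x290 end of the copy range is
an assumption made for illustration; the offsets 0x0c0, 0x1ef and 0x1f1
are the ones in this patch.

```c
#include <stdint.h>
#include <string.h>

#define BP_SIZE			4096
#define EXT_RAMDISK_IMAGE	0x0c0	/* first of the three ext_* fields */
#define SENTINEL		0x1ef
#define HDR_END			0x290	/* assumed end of the copied range */

/* Well-behaved loader: zero everything, then copy only the header. */
static void load_clean(uint8_t *bp, const uint8_t *image, size_t copy_from)
{
	memset(bp, 0, BP_SIZE);
	memcpy(bp + copy_from, image + copy_from, HDR_END - copy_from);
}

/* Sloppy loader: copies the start of the image, padding included. */
static void load_sloppy(uint8_t *bp, const uint8_t *image)
{
	memcpy(bp, image, BP_SIZE);
}

/* Kernel side: returns 1 if the ext_* fields had to be wiped. */
static int sanitize(uint8_t *bp)
{
	if (!bp[SENTINEL])
		return 0;
	memset(bp + EXT_RAMDISK_IMAGE, 0, 12);	/* three 32-bit fields */
	return 1;
}
```

In this model the sentinel stays 0 whether the clean loader starts its copy
at 0x1f1 or at 0x1f0, which is the case the -v7 note is concerned with.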

> -v7: change to 0x1ef instead of 0x1f0, HPA said:
> 	it is quite plausible that someone may (fairly sanely) start the
> 	copy range at 0x1f0 instead of 0x1f1

Right, all those -vX notes are all important and should *definitely* be
at least in the commit message.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Rob Landley <rob@landley.net>
> Cc: Matt Fleming <matt.fleming@intel.com>
> Cc: Gokul Caushik <caushik1@gmail.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Joe Millenbach <jmillenbach@gmail.com>
> ---
>  Documentation/x86/boot.txt            |   15 ++++++++++++++-
>  Documentation/x86/zero-page.txt       |    4 ++++
>  arch/x86/boot/compressed/cmdline.c    |    2 ++
>  arch/x86/boot/compressed/misc.c       |   12 ++++++++++++
>  arch/x86/boot/header.S                |   12 ++++++++++--
>  arch/x86/boot/setup.ld                |    7 +++++++
>  arch/x86/include/uapi/asm/bootparam.h |   13 ++++++++++---
>  arch/x86/kernel/head64.c              |    2 ++
>  arch/x86/kernel/setup.c               |    4 ++++
>  9 files changed, 65 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
> index 406d82d..18ca9fb 100644
> --- a/Documentation/x86/boot.txt
> +++ b/Documentation/x86/boot.txt
> @@ -57,6 +57,9 @@ Protocol 2.10:	(Kernel 2.6.31) Added a protocol for relaxed alignment
>  Protocol 2.11:	(Kernel 3.6) Added a field for offset of EFI handover
>  		protocol entry point.
>  
> +Protocol 2.12:	(Kernel 3.9) Added three fields for loading bzImage and
> +		 ramdisk above 4G with 64bit in bootparam.

change to:

"Added three additional fields to bootparam used for loading bzImage and
ramdisk above 4Gb in 64-bit."

> +
>  **** MEMORY LAYOUT
>  
>  The traditional memory map for the kernel loader, used for Image or
> @@ -182,7 +185,7 @@ Offset	Proto	Name		Meaning
>  0230/4	2.05+	kernel_alignment Physical addr alignment required for kernel
>  0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
>  0235/1	2.10+	min_alignment	Minimum alignment, as a power of two
> -0236/2	N/A	pad3		Unused
> +0236/2	2.12+	xloadflags	Boot protocol option flags
>  0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
>  023C/4	2.07+	hardware_subarch Hardware subarchitecture
>  0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
> @@ -582,6 +585,16 @@ Protocol:	2.10+
>    misaligned kernel.  Therefore, a loader should typically try each
>    power-of-two alignment from kernel_alignment down to this alignment.
>  
> +Field name:     xloadflags
> +Type:           modify (obligatory)
> +Offset/size:    0x236/2
> +Protocol:       2.12+
> +
> +  This field is a bitmask.
> +
> +  Bit 0 (read): CAN_BE_LOADED_ABOVE_4G
> +        - If 1, kernel/boot_params/cmdline/ramdisk can be above 4g,

								fullstop at the
								end.

> +
>  Field name:	cmdline_size
>  Type:		read
>  Offset/size:	0x238/4
> diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
> index cf5437d..1140e59 100644
> --- a/Documentation/x86/zero-page.txt
> +++ b/Documentation/x86/zero-page.txt
> @@ -19,6 +19,9 @@ Offset	Proto	Name		Meaning
>  090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
>  0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table)
>  0B0/010	ALL	olpc_ofw_header	OLPC's OpenFirmware CIF and friends
> +0C0/004	ALL	ext_ramdisk_image ramdisk_image high 32bits
> +0C4/004	ALL	ext_ramdisk_size  ramdisk_size high 32bits
> +0C8/004	ALL	ext_cmd_line_ptr  cmd_line_ptr high 32bits
>  140/080	ALL	edid_info	Video mode setup (struct edid_info)
>  1C0/020	ALL	efi_info	EFI 32 information (struct efi_info)
>  1E0/004	ALL	alk_mem_k	Alternative mem check, in KB
> @@ -27,6 +30,7 @@ Offset	Proto	Name		Meaning
>  1E9/001	ALL	eddbuf_entries	Number of entries in eddbuf (below)
>  1EA/001	ALL	edd_mbr_sig_buf_entries	Number of entries in edd_mbr_sig_buffer
>  				(below)
> +1EF/001	ALL	sentinel	0: states _ext_* fields are valid
>  290/040	ALL	edd_mbr_sig_buffer EDD MBR signatures
>  2D0/A00	ALL	e820_map	E820 memory map table
>  				(array of struct e820entry)
> diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
> index b4c913c..bffd73b 100644
> --- a/arch/x86/boot/compressed/cmdline.c
> +++ b/arch/x86/boot/compressed/cmdline.c
> @@ -17,6 +17,8 @@ static unsigned long get_cmd_line_ptr(void)
>  {
>  	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
>  
> +	cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
> +
>  	return cmd_line_ptr;
>  }

On 32-bit, this unsigned long cmd_line_ptr is 4 bytes and the OR doesn't
have any effect on the final result. You probably want to do:

#ifdef CONFIG_64BIT
	cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
#endif

right?

Or instead look at ->sentinel to know whether the ext_* fields are valid
or not, and save yourself the OR if not.
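
The truncation can be seen in isolation with fixed-width types. This is a
small sketch, not kernel code: cmd_line_ptr_64() and cmd_line_ptr_32() are
hypothetical helpers in which uint64_t and uint32_t stand in for unsigned
long on 64-bit and 32-bit builds respectively.

```c
#include <stdint.h>

/* Models the 64-bit build: unsigned long is 8 bytes wide. */
static uint64_t cmd_line_ptr_64(uint32_t lo, uint32_t ext)
{
	uint64_t ptr = lo;

	ptr |= (uint64_t)ext << 32;
	return ptr;
}

/* Models the 32-bit build: the 64-bit OR result is truncated on store. */
static uint32_t cmd_line_ptr_32(uint32_t lo, uint32_t ext)
{
	uint32_t ptr = lo;

	/* The cast makes the implicit narrowing of "|=" explicit. */
	ptr = (uint32_t)(ptr | ((uint64_t)ext << 32));
	return ptr;
}
```

With lo = 0x1000 and ext = 1, the first helper yields 0x100001000 and the
second yields 0x1000, so on 32-bit the statement is harmless but has no
effect.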

>  int cmdline_find_option(const char *option, char *buffer, int bufsize)
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 88f7ff6..f714576 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -318,6 +318,16 @@ static void parse_elf(void *output)
>  	free(phdrs);
>  }
>  
> +static void sanitize_real_mode(struct boot_params *real_mode)
> +{
> +	if (real_mode->sentinel) {
> +		/* ext_* fields in boot_params are not valid, clear them */
> +		real_mode->ext_ramdisk_image = 0;
> +		real_mode->ext_ramdisk_size  = 0;
> +		real_mode->ext_cmd_line_ptr  = 0;
> +	}
> +}
> +
>  asmlinkage void decompress_kernel(void *rmode, memptr heap,
>  				  unsigned char *input_data,
>  				  unsigned long input_len,
> @@ -325,6 +335,8 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
>  {
>  	real_mode = rmode;
>  
> +	sanitize_real_mode(real_mode);
> +
>  	if (real_mode->screen_info.orig_video_mode == 7) {
>  		vidmem = (char *) 0xb0000;
>  		vidport = 0x3b4;
> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index 8c132a6..0d5790f 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -279,7 +279,7 @@ _start:
>  	# Part 2 of the header, from the old setup.S
>  
>  		.ascii	"HdrS"		# header signature
> -		.word	0x020b		# header version number (>= 0x0105)
> +		.word	0x020c		# header version number (>= 0x0105)
>  					# or else old loadlin-1.5 will fail)
>  		.globl realmode_swtch
>  realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
> @@ -369,7 +369,15 @@ relocatable_kernel:    .byte 1
>  relocatable_kernel:    .byte 0
>  #endif
>  min_alignment:		.byte MIN_KERNEL_ALIGN_LG2	# minimum alignment
> -pad3:			.word 0
> +
> +xloadflags:
> +CAN_BE_LOADED_ABOVE_4G	= 1		# If set, the kernel/boot_param/
> +					# ramdisk could be loaded above 4g
> +#if defined(CONFIG_X86_64) && defined(CONFIG_RELOCATABLE)
> +			.word CAN_BE_LOADED_ABOVE_4G
> +#else
> +			.word 0
> +#endif
>  
>  cmdline_size:   .long   COMMAND_LINE_SIZE-1     #length of the command line,
>                                                  #added with boot protocol
> diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
> index 03c0683..9333d37 100644
> --- a/arch/x86/boot/setup.ld
> +++ b/arch/x86/boot/setup.ld
> @@ -13,6 +13,13 @@ SECTIONS
>  	.bstext		: { *(.bstext) }
>  	.bsdata		: { *(.bsdata) }
>  
> +	/* sentinel: make sure if boot_params from bootloader is right */

This should say:

	/*
	 * The bootloader signals the validity of the three ext_* boot params with this.
	 */

> +	. = 495;
> +	.sentinel	: {
> +		sentinel = .;
> +		BYTE(0xff);
> +	}
> +
>  	. = 497;
>  	.header		: { *(.header) }
>  	.entrytext	: { *(.entrytext) }
> diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
> index 92862cd..3d8ed8f 100644
> --- a/arch/x86/include/uapi/asm/bootparam.h
> +++ b/arch/x86/include/uapi/asm/bootparam.h
> @@ -58,7 +58,9 @@ struct setup_header {
>  	__u32	initrd_addr_max;
>  	__u32	kernel_alignment;
>  	__u8	relocatable_kernel;
> -	__u8	_pad2[3];
> +	__u8	min_alignment;

Hehe, this is actually a minor bugfix because min_alignment was only
defined in header.S but was _pad2[0] in the struct definition.

> +	__u16	xloadflags;
> +#define CAN_BE_LOADED_ABOVE_4G	(1<<0)
>  	__u32	cmdline_size;
>  	__u32	hardware_subarch;
>  	__u64	hardware_subarch_data;
> @@ -106,7 +108,10 @@ struct boot_params {
>  	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
>  	struct sys_desc_table sys_desc_table;		/* 0x0a0 */
>  	struct olpc_ofw_header olpc_ofw_header;		/* 0x0b0 */
> -	__u8  _pad4[128];				/* 0x0c0 */
> +	__u32 ext_ramdisk_image;			/* 0x0c0 */
> +	__u32 ext_ramdisk_size;				/* 0x0c4 */
> +	__u32 ext_cmd_line_ptr;				/* 0x0c8 */
> +	__u8  _pad4[116];				/* 0x0cc */
>  	struct edid_info edid_info;			/* 0x140 */
>  	struct efi_info efi_info;			/* 0x1c0 */
>  	__u32 alt_mem_k;				/* 0x1e0 */
> @@ -115,7 +120,9 @@ struct boot_params {
>  	__u8  eddbuf_entries;				/* 0x1e9 */
>  	__u8  edd_mbr_sig_buf_entries;			/* 0x1ea */
>  	__u8  kbd_status;				/* 0x1eb */
> -	__u8  _pad6[5];					/* 0x1ec */
> +	__u8  _pad5[3];					/* 0x1ec */
> +	__u8  sentinel;					/* 0x1ef */
> +	__u8  _pad6[1];					/* 0x1f0 */

This needs the -v7 explanation from above as a comment here or somewhere
around here, for why we've chosen 0x1ef offset.

>  	struct setup_header hdr;    /* setup header */	/* 0x1f1 */
>  	__u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
>  	__u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];	/* 0x290 */
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 316e7b2..e63d29a 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -115,6 +115,8 @@ static unsigned long get_cmd_line_ptr(void)
>  {
>  	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
>  
> +	cmd_line_ptr |= (u64)boot_params.ext_cmd_line_ptr << 32;
> +
>  	return cmd_line_ptr;

Ditto as above for get_cmd_line_ptr in compressed/...

>  }
>  
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 644a123..2509efa 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -298,12 +298,16 @@ static u64 __init get_ramdisk_image(void)
>  {
>  	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
>  
> +	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
> +
>  	return ramdisk_image;
>  }
>  static u64 __init get_ramdisk_size(void)
>  {
>  	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
>  
> +	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
> +
>  	return ramdisk_size;

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-13 21:41   ` Borislav Petkov
@ 2013-01-14  5:37     ` Yinghai Lu
  2013-01-14  9:43       ` Borislav Petkov
  2013-01-14 17:49       ` H. Peter Anvin
  2013-01-14 23:10     ` H. Peter Anvin
  1 sibling, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14  5:37 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming,
	Gokul Caushik, Josh Triplett, Joe Millenbach

On Sun, Jan 13, 2013 at 1:41 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 03, 2013 at 04:48:42PM -0800, Yinghai Lu wrote:
>> ext_ramdisk_image/size will record high 32bits for ramdisk info.
>>
>> xloadflags bit0 will be set if relocatable with 64bit.
>
> Let's describe that in more detail:
>
> "Bit 0 of xloadflags is set if we are both a 64-bit and a relocatable
> kernel. In that case, it denotes that ramdisk can be loaded above 4Gb."
>
>> Let get_ramdisk_image/size to use ext_ramdisk_image/size to get
>> right positon for ramdisk.
>>
>> bootloader will fill value to ext_ramdisk_image/size when it load
>> ramdisk above 4G.
>>
>> Also bootloader will check if xloadflags bit0 is set to decicde if
>
>                                                           decide
>
>> it could load ramdisk high above 4G.
>>
>> sentinel is used to make sure kernel have ext_* valid values set
>
> The explanation of the sentinel field from "-v6" below should be
> actually up here. We absolutely want to have it in the commit message
> *and* in the code so that it is well documented why we've added it.
>
>> Update header version to 2.12.
>>
>> -v2: add ext_cmd_line_ptr for above 4G support.
>> -v3: update to xloadflags from HPA.
>> -v4: use fields from bootparam instead setup_header according to HPA.
>> -v5: add checking for USE_EXT_BOOT_PARAMS
>> -v6: use sentinel to check if ext_* are valid suggested by HPA.
>>      HPA said:
>>       1. add a field in the uninitialized portion, call it "sentinel";
>>       2. make sure the byte position corresponding to the "sentinel" field is
>>          nonzero in the bzImage file;
>>       3. if the kernel boots up and sentinel is nonzero, erase those fields
>>          that you identified as uninitialized;
>
> Question: if the bootloader sets ext_* properly, is it going to set
> sentinel to 0 so that it can signal to the code further on that ext_*
> are valid?

Old bootloaders have no idea about the sentinel, but if they initialize
boot_params properly, the new sentinel field will be 0 and the new kernel
will know.

>
> This is kinda missing from the mechanism of the sentinel and it should
> be documented too.

No, we should not have too much duplicated info.

>
>> -v7: change to 0x1ef instead of 0x1f0, HPA said:
>>       it is quite plausible that someone may (fairly sanely) start the
>>       copy range at 0x1f0 instead of 0x1f1
>
> Right, all those -vX notes are all important and should *definitely* be
> at least in the commit message.

No, I want to keep them in order to track the reviewing progress.

>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Rob Landley <rob@landley.net>
>> Cc: Matt Fleming <matt.fleming@intel.com>
>> Cc: Gokul Caushik <caushik1@gmail.com>
>> Cc: Josh Triplett <josh@joshtriplett.org>
>> Cc: Joe Millenbach <jmillenbach@gmail.com>
>> ---
>>  Documentation/x86/boot.txt            |   15 ++++++++++++++-
>>  Documentation/x86/zero-page.txt       |    4 ++++
>>  arch/x86/boot/compressed/cmdline.c    |    2 ++
>>  arch/x86/boot/compressed/misc.c       |   12 ++++++++++++
>>  arch/x86/boot/header.S                |   12 ++++++++++--
>>  arch/x86/boot/setup.ld                |    7 +++++++
>>  arch/x86/include/uapi/asm/bootparam.h |   13 ++++++++++---
>>  arch/x86/kernel/head64.c              |    2 ++
>>  arch/x86/kernel/setup.c               |    4 ++++
>>  9 files changed, 65 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
>> index 406d82d..18ca9fb 100644
>> --- a/Documentation/x86/boot.txt
>> +++ b/Documentation/x86/boot.txt
>> @@ -57,6 +57,9 @@ Protocol 2.10:      (Kernel 2.6.31) Added a protocol for relaxed alignment
>>  Protocol 2.11:       (Kernel 3.6) Added a field for offset of EFI handover
>>               protocol entry point.
>>
>> +Protocol 2.12:       (Kernel 3.9) Added three fields for loading bzImage and
>> +              ramdisk above 4G with 64bit in bootparam.
>
> change to:
>
> "Added three additional fields to bootparam used for loading bzImage and
> ramdisk above 4Gb in 64-bit."

ok

>
>> +
>>  **** MEMORY LAYOUT
>>
>>  The traditional memory map for the kernel loader, used for Image or
>> @@ -182,7 +185,7 @@ Offset    Proto   Name            Meaning
>>  0230/4       2.05+   kernel_alignment Physical addr alignment required for kernel
>>  0234/1       2.05+   relocatable_kernel Whether kernel is relocatable or not
>>  0235/1       2.10+   min_alignment   Minimum alignment, as a power of two
>> -0236/2       N/A     pad3            Unused
>> +0236/2       2.12+   xloadflags      Boot protocol option flags
>>  0238/4       2.06+   cmdline_size    Maximum size of the kernel command line
>>  023C/4       2.07+   hardware_subarch Hardware subarchitecture
>>  0240/8       2.07+   hardware_subarch_data Subarchitecture-specific data
>> @@ -582,6 +585,16 @@ Protocol:        2.10+
>>    misaligned kernel.  Therefore, a loader should typically try each
>>    power-of-two alignment from kernel_alignment down to this alignment.
>>
>> +Field name:     xloadflags
>> +Type:           modify (obligatory)
>> +Offset/size:    0x236/2
>> +Protocol:       2.12+
>> +
>> +  This field is a bitmask.
>> +
>> +  Bit 0 (read): CAN_BE_LOADED_ABOVE_4G
>> +        - If 1, kernel/boot_params/cmdline/ramdisk can be above 4g,
>
>                                                                 fullstop at the
>                                                                 end.

ok.

>
>> +
>>  Field name:  cmdline_size
>>  Type:                read
>>  Offset/size: 0x238/4
>> diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
>> index cf5437d..1140e59 100644
>> --- a/Documentation/x86/zero-page.txt
>> +++ b/Documentation/x86/zero-page.txt
>> @@ -19,6 +19,9 @@ Offset      Proto   Name            Meaning
>>  090/010      ALL     hd1_info        hd1 disk parameter, OBSOLETE!!
>>  0A0/010      ALL     sys_desc_table  System description table (struct sys_desc_table)
>>  0B0/010      ALL     olpc_ofw_header OLPC's OpenFirmware CIF and friends
>> +0C0/004      ALL     ext_ramdisk_image ramdisk_image high 32bits
>> +0C4/004      ALL     ext_ramdisk_size  ramdisk_size high 32bits
>> +0C8/004      ALL     ext_cmd_line_ptr  cmd_line_ptr high 32bits
>>  140/080      ALL     edid_info       Video mode setup (struct edid_info)
>>  1C0/020      ALL     efi_info        EFI 32 information (struct efi_info)
>>  1E0/004      ALL     alk_mem_k       Alternative mem check, in KB
>> @@ -27,6 +30,7 @@ Offset      Proto   Name            Meaning
>>  1E9/001      ALL     eddbuf_entries  Number of entries in eddbuf (below)
>>  1EA/001      ALL     edd_mbr_sig_buf_entries Number of entries in edd_mbr_sig_buffer
>>                               (below)
>> +1EF/001      ALL     sentinel        0: states _ext_* fields are valid
>>  290/040      ALL     edd_mbr_sig_buffer EDD MBR signatures
>>  2D0/A00      ALL     e820_map        E820 memory map table
>>                               (array of struct e820entry)
>> diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
>> index b4c913c..bffd73b 100644
>> --- a/arch/x86/boot/compressed/cmdline.c
>> +++ b/arch/x86/boot/compressed/cmdline.c
>> @@ -17,6 +17,8 @@ static unsigned long get_cmd_line_ptr(void)
>>  {
>>       unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
>>
>> +     cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
>> +
>>       return cmd_line_ptr;
>>  }
>
> On 32-bit, this unsigned long cmd_line_ptr is 4 bytes and the OR doesn't
> have any effect on the final result. You probably want to do:

Yes, that is what we want: to keep 32-bit and 64-bit unified.

>
> #ifdef CONFIG_64BIT
>         cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
> #endif
>
> right?
>
> Or instead look at ->sentinel to know whether the ext_* fields are valid
> or not, and save yourself the OR if not.

no.

That is the whole point of the sentinel: once it has been checked, we don't
need to check it everywhere, because the ext_* fields are valid.

>
>>  int cmdline_find_option(const char *option, char *buffer, int bufsize)
>> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
>> index 88f7ff6..f714576 100644
>> --- a/arch/x86/boot/compressed/misc.c
>> +++ b/arch/x86/boot/compressed/misc.c
>> @@ -318,6 +318,16 @@ static void parse_elf(void *output)
>>       free(phdrs);
>>  }
>>
>> +static void sanitize_real_mode(struct boot_params *real_mode)
>> +{
>> +     if (real_mode->sentinel) {
>> +             /* ext_* fields in boot_params are not valid, clear them */
>> +             real_mode->ext_ramdisk_image = 0;
>> +             real_mode->ext_ramdisk_size  = 0;
>> +             real_mode->ext_cmd_line_ptr  = 0;
>> +     }
>> +}
>> +
>>  asmlinkage void decompress_kernel(void *rmode, memptr heap,
>>                                 unsigned char *input_data,
>>                                 unsigned long input_len,
>> @@ -325,6 +335,8 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
>>  {
>>       real_mode = rmode;
>>
>> +     sanitize_real_mode(real_mode);
>> +
>>       if (real_mode->screen_info.orig_video_mode == 7) {
>>               vidmem = (char *) 0xb0000;
>>               vidport = 0x3b4;
>> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
>> index 8c132a6..0d5790f 100644
>> --- a/arch/x86/boot/header.S
>> +++ b/arch/x86/boot/header.S
>> @@ -279,7 +279,7 @@ _start:
>>       # Part 2 of the header, from the old setup.S
>>
>>               .ascii  "HdrS"          # header signature
>> -             .word   0x020b          # header version number (>= 0x0105)
>> +             .word   0x020c          # header version number (>= 0x0105)
>>                                       # or else old loadlin-1.5 will fail)
>>               .globl realmode_swtch
>>  realmode_swtch:      .word   0, 0            # default_switch, SETUPSEG
>> @@ -369,7 +369,15 @@ relocatable_kernel:    .byte 1
>>  relocatable_kernel:    .byte 0
>>  #endif
>>  min_alignment:               .byte MIN_KERNEL_ALIGN_LG2      # minimum alignment
>> -pad3:                        .word 0
>> +
>> +xloadflags:
>> +CAN_BE_LOADED_ABOVE_4G       = 1             # If set, the kernel/boot_param/
>> +                                     # ramdisk could be loaded above 4g
>> +#if defined(CONFIG_X86_64) && defined(CONFIG_RELOCATABLE)
>> +                     .word CAN_BE_LOADED_ABOVE_4G
>> +#else
>> +                     .word 0
>> +#endif
>>
>>  cmdline_size:   .long   COMMAND_LINE_SIZE-1     #length of the command line,
>>                                                  #added with boot protocol
>> diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
>> index 03c0683..9333d37 100644
>> --- a/arch/x86/boot/setup.ld
>> +++ b/arch/x86/boot/setup.ld
>> @@ -13,6 +13,13 @@ SECTIONS
>>       .bstext         : { *(.bstext) }
>>       .bsdata         : { *(.bsdata) }
>>
>> +     /* sentinel: make sure if boot_params from bootloader is right */
>
> This should say:
>
>         /*
>          * The bootloader signals the validity of the three ext_* boot params with this.
>          */

no, the bootloader does not signal that.
old bootloaders have no idea of the sentinel, but if they initialize boot_params
properly the new sentinel will be 0 and the new kernel will know.
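The mechanism described here can be sketched in user space. This is only an illustration under stated assumptions: the offsets 0x1ef (495) and 0x1f1 come from the patch and boot.txt, but `load_clean`, `load_blind` and `ext_fields_trusted` are made-up names, and the "blind" loader is a hypothetical example of a loader that copies more than the setup header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BOOT_PARAMS_SIZE 4096	/* size of the zero page */
#define SENTINEL_OFF     0x1ef	/* 495, set to 0xff in the image by setup.ld */
#define SETUP_HDR_OFF    0x1f1	/* start of the setup header */

/* Well-behaved loader: zero everything, then copy only the setup header. */
static void load_clean(uint8_t *bp, const uint8_t *image, size_t hdr_len)
{
	assert(SETUP_HDR_OFF + hdr_len <= BOOT_PARAMS_SIZE);
	memset(bp, 0, BOOT_PARAMS_SIZE);
	memcpy(bp + SETUP_HDR_OFF, image + SETUP_HDR_OFF, hdr_len);
}

/* Hypothetical careless loader: copies the image's whole first sector. */
static void load_blind(uint8_t *bp, const uint8_t *image)
{
	memset(bp, 0, BOOT_PARAMS_SIZE);
	memcpy(bp, image, 512);
}

/* Kernel side: the ext_* fields are trusted only if the sentinel is 0. */
static int ext_fields_trusted(const uint8_t *bp)
{
	return bp[SENTINEL_OFF] == 0;
}
```

The clean loader never touches offset 0x1ef, so the byte stays 0 without the loader knowing the sentinel exists; the blind loader drags the 0xff from the image along with it.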

>
>> +     . = 495;
>> +     .sentinel       : {
>> +             sentinel = .;
>> +             BYTE(0xff);
>> +     }
>> +
>>       . = 497;
>>       .header         : { *(.header) }
>>       .entrytext      : { *(.entrytext) }
>> diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
>> index 92862cd..3d8ed8f 100644
>> --- a/arch/x86/include/uapi/asm/bootparam.h
>> +++ b/arch/x86/include/uapi/asm/bootparam.h
>> @@ -58,7 +58,9 @@ struct setup_header {
>>       __u32   initrd_addr_max;
>>       __u32   kernel_alignment;
>>       __u8    relocatable_kernel;
>> -     __u8    _pad2[3];
>> +     __u8    min_alignment;
>
> Hehe, this is actually a minor bugfix because min_alignment was only
> defined in header.S but was _pad2[0] in the struct definition.
>
>> +     __u16   xloadflags;
>> +#define CAN_BE_LOADED_ABOVE_4G       (1<<0)
>>       __u32   cmdline_size;
>>       __u32   hardware_subarch;
>>       __u64   hardware_subarch_data;
>> @@ -106,7 +108,10 @@ struct boot_params {
>>       __u8  hd1_info[16];     /* obsolete! */         /* 0x090 */
>>       struct sys_desc_table sys_desc_table;           /* 0x0a0 */
>>       struct olpc_ofw_header olpc_ofw_header;         /* 0x0b0 */
>> -     __u8  _pad4[128];                               /* 0x0c0 */
>> +     __u32 ext_ramdisk_image;                        /* 0x0c0 */
>> +     __u32 ext_ramdisk_size;                         /* 0x0c4 */
>> +     __u32 ext_cmd_line_ptr;                         /* 0x0c8 */
>> +     __u8  _pad4[116];                               /* 0x0cc */
>>       struct edid_info edid_info;                     /* 0x140 */
>>       struct efi_info efi_info;                       /* 0x1c0 */
>>       __u32 alt_mem_k;                                /* 0x1e0 */
>> @@ -115,7 +120,9 @@ struct boot_params {
>>       __u8  eddbuf_entries;                           /* 0x1e9 */
>>       __u8  edd_mbr_sig_buf_entries;                  /* 0x1ea */
>>       __u8  kbd_status;                               /* 0x1eb */
>> -     __u8  _pad6[5];                                 /* 0x1ec */
>> +     __u8  _pad5[3];                                 /* 0x1ec */
>> +     __u8  sentinel;                                 /* 0x1ef */
>> +     __u8  _pad6[1];                                 /* 0x1f0 */
>
> This needs the -v7 explanation from above as a comment here or somewhere
> around here, for why we've chosen 0x1ef offset.

no, there is no such comment for other fields there.

Thanks

Yinghai

* Re: [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram.
  2013-01-13 12:56   ` Borislav Petkov
@ 2013-01-14  5:46     ` Yinghai Lu
  2013-01-14  9:53       ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14  5:46 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Alexander Duyck

On Sun, Jan 13, 2013 at 4:56 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 03, 2013 at 04:48:41PM -0800, Yinghai Lu wrote:
>> We should not set mapping for all under max_pfn.
>
> "We should not establish mappings for all memory under max_pfn."

that is not accurate.

We should not set mappings for all ranges under max_pfn.

or

We should set mappings only for memory ranges under max_pfn.

>
>> That causes same problem that is fixed by
>
> "Otherwise, it causes the same ..."
>
>>
>>       x86, mm: Only direct map addresses that are marked as E820_RAM
>
> You could add this patch's commit id since it is in tip:x86/mm2 and it
> shouldn't change: 66520ebc2df3.

why? they are not in Linus's tree yet, so the commit id could change if that
tip branch is rebased.

>
> Ditto for patch 09/31, "x86, 64bit: #PF handler set page to cover 2M only".


>
>>
>> This patch expose pfn_mapped array, and only set ident mapping for ranges
>
>              exposes the...                    sets
>
>> in that array.
>>
>> This patch rely on new ident_mapping_init that could handle existing
>
>              relies on the new
>
>> pgd/pud between different calling.
>
>                             calls.
>

ok, will fix those problems.

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14  5:37     ` Yinghai Lu
@ 2013-01-14  9:43       ` Borislav Petkov
  2013-01-14 23:06         ` Yinghai Lu
  2013-01-14 17:49       ` H. Peter Anvin
  1 sibling, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14  9:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Sun, Jan 13, 2013 at 09:37:08PM -0800, Yinghai Lu wrote:
> > Question: if the bootloader sets ext_* properly, is it going to set
> > sentinel to 0 so that it can signal to the code further on that ext_*
> > are valid?
> 
> old bootloaders have no idea of sentinel, but if they initialize boot_param
> properly that new sentinel will be 0 and new kernel will know.
> 
> >
> > This is kinda missing from the mechanism of the sentinel and it should
> > be documented too.
> 
> No, we should have too much duplicated info.

This is not complicated info - it should explain the basic mechanism
of the sentinel.

> >> -v7: change to 0x1ef instead of 0x1f0, HPA said:
> >>       it is quite plausible that someone may (fairly sanely) start the
> >>       copy range at 0x1f0 instead of 0x1f1
> >
> > Right, all those -vX notes are all important and should *definitely* be
> > at least in the commit message.
> 
> No, I want to keep them in order to track the reviewing progress.

Are you saying "no" just for the fun of it? Or do you have a general
aversion to documenting your code?

Give me *one* good reason why having a short, concise and clear
comment which helps people understand the intent of the mechanism
is a bad thing.

[ … ]

> >> diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
> >> index b4c913c..bffd73b 100644
> >> --- a/arch/x86/boot/compressed/cmdline.c
> >> +++ b/arch/x86/boot/compressed/cmdline.c
> >> @@ -17,6 +17,8 @@ static unsigned long get_cmd_line_ptr(void)
> >>  {
> >>       unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
> >>
> >> +     cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
> >> +
> >>       return cmd_line_ptr;
> >>  }
> >
> > On 32-bit, this unsigned long cmd_line_ptr is 4 bytes and the OR doesn't
> > have any effect on the final result. You probably want to do:
> 
> yes, that is what we want to keep 32bit and 64bit unified.
> 
> >
> > #ifdef CONFIG_64BIT
> >         cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
> > #endif
> >
> > right?
> >
> > Or instead look at ->sentinel to know whether the ext_* fields are valid
> > or not, and save yourself the OR if not.
> 
> no.
> 
> that is whole point of sentinel, we don't need to check sentinel everywhere
> because ext_* are valid.

Dude, do you even read my comments? This line:

	cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;

doesn't do a whit on 32-bit. So execute it *only* on 64-bit!
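The truncation being argued about can be reproduced in a small user-space sketch. The fixed-width types below stand in for `unsigned long` on 32-bit and 64-bit builds, and the function names are made up for illustration; this is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Models get_cmd_line_ptr() where unsigned long is 32 bits wide. */
static uint32_t cmd_line_ptr_32(uint32_t lo, uint32_t ext)
{
	uint32_t ptr = lo;

	/* The shifted value lives entirely in bits 32..63, which are
	 * discarded when the result is stored back into 32 bits. */
	ptr |= (uint64_t)ext << 32;
	return ptr;
}

/* Models the same code where unsigned long is 64 bits wide. */
static uint64_t cmd_line_ptr_64(uint32_t lo, uint32_t ext)
{
	uint64_t ptr = lo;

	ptr |= (uint64_t)ext << 32;
	return ptr;
}

static void demo_truncation(void)
{
	/* ext = 1 means "address bit 32 is set" */
	assert(cmd_line_ptr_32(0x1000, 1) == 0x1000);		/* high half lost */
	assert(cmd_line_ptr_64(0x1000, 1) == 0x100001000ULL);	/* high half kept */
}
```

So the unconditional OR is harmless on 32-bit, just useless there; whether to keep the code unified or guard it with an #ifdef is the style question in this subthread.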

[ … ]

> >> diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
> >> index 03c0683..9333d37 100644
> >> --- a/arch/x86/boot/setup.ld
> >> +++ b/arch/x86/boot/setup.ld
> >> @@ -13,6 +13,13 @@ SECTIONS
> >>       .bstext         : { *(.bstext) }
> >>       .bsdata         : { *(.bsdata) }
> >>
> >> +     /* sentinel: make sure if boot_params from bootloader is right */
> >
> > This should say:
> >
> >         /*
> >          * The bootloader signals the validity of the three ext_* boot params with this.
> >          */
> 
> no, bootloader does not signal that.
> old bootloaders have no idea of sentinel, but if they initialize boot_param
> properly that new sentinel will be 0 and new kernel will know.

So say

	 * A RECENT bootloader signals the validity of the three ext_* boot params with this.

but say something.

Understand this (and we've been chewing this same shit for two weeks
now): you need to document your code and you need to document it
properly for other people to understand what you're doing. I'm not
talking about writing an essay or whatever - I'm talking about helpful
comments placed where it makes most sense so that others can understand
the mechanism.

[ … ]

> >> +     __u16   xloadflags;
> >> +#define CAN_BE_LOADED_ABOVE_4G       (1<<0)
> >>       __u32   cmdline_size;
> >>       __u32   hardware_subarch;
> >>       __u64   hardware_subarch_data;
> >> @@ -106,7 +108,10 @@ struct boot_params {
> >>       __u8  hd1_info[16];     /* obsolete! */         /* 0x090 */
> >>       struct sys_desc_table sys_desc_table;           /* 0x0a0 */
> >>       struct olpc_ofw_header olpc_ofw_header;         /* 0x0b0 */
> >> -     __u8  _pad4[128];                               /* 0x0c0 */
> >> +     __u32 ext_ramdisk_image;                        /* 0x0c0 */
> >> +     __u32 ext_ramdisk_size;                         /* 0x0c4 */
> >> +     __u32 ext_cmd_line_ptr;                         /* 0x0c8 */
> >> +     __u8  _pad4[116];                               /* 0x0cc */
> >>       struct edid_info edid_info;                     /* 0x140 */
> >>       struct efi_info efi_info;                       /* 0x1c0 */
> >>       __u32 alt_mem_k;                                /* 0x1e0 */
> >> @@ -115,7 +120,9 @@ struct boot_params {
> >>       __u8  eddbuf_entries;                           /* 0x1e9 */
> >>       __u8  edd_mbr_sig_buf_entries;                  /* 0x1ea */
> >>       __u8  kbd_status;                               /* 0x1eb */
> >> -     __u8  _pad6[5];                                 /* 0x1ec */
> >> +     __u8  _pad5[3];                                 /* 0x1ec */
> >> +     __u8  sentinel;                                 /* 0x1ef */
> >> +     __u8  _pad6[1];                                 /* 0x1f0 */
> >
> > This needs the -v7 explanation from above as a comment here or somewhere
> > around here, for why we've chosen 0x1ef offset.
> 
> no, there is no such comment for other fields there.

That's why I f*cking said "here or somewhere around here"! Or put
it somewhere else altogether, if you don't like it here but PUT IT
SOMEWHERE!

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram.
  2013-01-14  5:46     ` Yinghai Lu
@ 2013-01-14  9:53       ` Borislav Petkov
  2013-01-14 18:17         ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14  9:53 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Alexander Duyck

On Sun, Jan 13, 2013 at 09:46:17PM -0800, Yinghai Lu wrote:
> On Sun, Jan 13, 2013 at 4:56 AM, Borislav Petkov <bp@alien8.de> wrote:
> > On Thu, Jan 03, 2013 at 04:48:41PM -0800, Yinghai Lu wrote:
> >> We should not set mapping for all under max_pfn.
> >
> > "We should not establish mappings for all memory under max_pfn."
> 
> that is not accurate.
> 
> We should not set mapping for all range under max_pfn.
> 
> or
> 
> We should set mappings only for memory ranges under max_pfn.

Ok, that last thing is getting close. So do I understand it correctly
now:

"We should establish mappings only for memory (memory which is not
marked reserved or whatever by E820 or some other mechanism) under
max_pfn."

?

> >> That causes same problem that is fixed by
> >
> > "Otherwise, it causes the same ..."
> >
> >>
> >>       x86, mm: Only direct map addresses that are marked as E820_RAM
> >
> > You could add this patch's commit id since it is in tip:x86/mm2 and it
> > shouldn't change: 66520ebc2df3.
> 
> why ? they are not in linus tree yet, so it could change if that tip
> branch is rebased.

Oh, you didn't know: tip branches don't get rebased. At least almost
never.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image
  2013-01-04  0:48 ` [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image Yinghai Lu
@ 2013-01-14 11:20   ` Borislav Petkov
  2013-01-14 18:35     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 11:20 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming

On Thu, Jan 03, 2013 at 04:48:43PM -0800, Yinghai Lu wrote:
> Now 64bit entry is fixed on 0x200, can not be changed anymore.
> 
> Update the comments to reflect that.
> 
> Also put info about it in boot.txt
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Rob Landley <rob@landley.net>
> Cc: Matt Fleming <matt.fleming@intel.com>
> ---
>  Documentation/x86/boot.txt         |   38 ++++++++++++++++++++++++++++++++++++
>  arch/x86/boot/compressed/head_64.S |   22 ++++++++++++---------
>  2 files changed, 51 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
> index 18ca9fb..24cc542 100644
> --- a/Documentation/x86/boot.txt
> +++ b/Documentation/x86/boot.txt
> @@ -1042,6 +1042,44 @@ must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
>  must be __BOOT_DS; interrupt must be disabled; %esi must hold the base
>  address of the struct boot_params; %ebp, %edi and %ebx must be zero.
>  
> +**** 64-bit BOOT PROTOCOL
> +
> +For machine with 64bit cpus and 64bit kernel, we could use 64bit bootloader
> +We need a 64-bit boot protocol.

Make that:

"64-bit kernels using 64-bit bootloaders use this protocol for booting."

> +
> +In 64-bit boot protocol, the first step in loading a Linux kernel
> +should be to setup the boot parameters (struct boot_params,
> +traditionally known as "zero page"). The memory for struct boot_params
> +should be allocated under or above 4G and initialized to all zero.

"Memory for struct boot_params may be allocated anywhere (even above
4G). This memory must be zeroed out."

Also, add a \n here.

> +Then the setup header from offset 0x01f1 of kernel image on should be

"Then, the setup header at offset 0x01f1 of the kernel image should be..."

> +loaded into struct boot_params and examined. The end of setup header
> +can be calculated as follow:

			"follows:"

> +
> +	0x0202 + byte value at offset 0x0201

What is that value at 0x201? What's its name? Maybe it is called "magic" :-)
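For reference, boot.txt's field list describes offset 0x200 as the two-byte `jump` field: an 0xEB short jump followed by a displacement relative to byte 0x202, which is why that byte gives the header size. A minimal sketch of the calculation, assuming the image's first sectors are already in a buffer (the function and macro names are made up):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JUMP_OFFSET_OFF 0x201	/* 8-bit displacement of the jump at 0x200 */

/* Returns the offset one past the end of the setup header, i.e.
 * 0x0202 + the byte at 0x0201, or 0 if the buffer is too short. */
static size_t setup_header_end(const uint8_t *image, size_t len)
{
	assert(image != NULL);
	if (len <= JUMP_OFFSET_OFF)
		return 0;
	return 0x202 + (size_t)image[JUMP_OFFSET_OFF];
}
```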

> +
> +In addition to read/modify/write the setup header of the struct
> +boot_params as that of 16-bit boot protocol,

Hmm, do you mean:

"In addition to modifying struct setup_header in boot_params as part of
the 16-bit boot protocol, the boot loader..."


> the boot loader should
> +also fill the additional fields of the struct boot_params as that

							remove "that"

> +described in zero-page.txt.

Btw, you could document the sentinel mechanism here or in zero-page.txt,
for example.

> +
> +After setting up the struct boot_params, the boot loader can load the

		s/the//

> +64-bit kernel in the same way as that of 16-bit boot protocol, but
> +kernel could be above 4G.

"... the boot loader can load a 64-bit kernel the same way as with the
16-bit boot protocol with the additional advantage that the kernel can
be placed above the 4G barrier."
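On the loader side, placing something above 4G means checking xloadflags and splitting each 64-bit value across the legacy field and its ext_* companion. A rough sketch for the ramdisk, assuming a cut-down struct (`mini_params`, `set_ramdisk` and `get_ramdisk_image` are hypothetical; only the field names and CAN_BE_LOADED_ABOVE_4G come from the patch):

```c
#include <assert.h>
#include <stdint.h>

#define CAN_BE_LOADED_ABOVE_4G (1 << 0)	/* xloadflags bit 0, from the patch */

/* Hypothetical cut-down view of the fields involved; not the real layout. */
struct mini_params {
	uint16_t xloadflags;		/* from the kernel's setup header */
	uint32_t ramdisk_image;		/* low 32 bits of the address */
	uint32_t ramdisk_size;		/* low 32 bits of the size */
	uint32_t ext_ramdisk_image;	/* high 32 bits of the address */
	uint32_t ext_ramdisk_size;	/* high 32 bits of the size */
};

/* Returns 0 on success, -1 if the range reaches above 4G but the kernel
 * did not advertise CAN_BE_LOADED_ABOVE_4G. */
static int set_ramdisk(struct mini_params *p, uint64_t addr, uint64_t size)
{
	uint64_t end = addr + size;

	assert(p);
	if (end > (1ULL << 32) && !(p->xloadflags & CAN_BE_LOADED_ABOVE_4G))
		return -1;

	p->ramdisk_image     = (uint32_t)addr;
	p->ext_ramdisk_image = (uint32_t)(addr >> 32);
	p->ramdisk_size      = (uint32_t)size;
	p->ext_ramdisk_size  = (uint32_t)(size >> 32);
	return 0;
}

/* Kernel-side reassembly, mirroring the OR-and-shift used in the patch. */
static uint64_t get_ramdisk_image(const struct mini_params *p)
{
	return p->ramdisk_image | ((uint64_t)p->ext_ramdisk_image << 32);
}
```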

> +
> +In 64-bit boot protocol, the kernel is started by jumping to the

"In the 64-bit... "

> +64-bit kernel entry point, which is the start address of loaded

no comma:

"... entry point which is the start address of the loaded..."

> +64-bit kernel plus 0x200.

Again, what does the 0x200 value mean?

> +
> +At entry, the CPU must be in 64-bit mode with paging enabled.
> +The range with setup_header.init_size from start address of loaded
> +kernel and zero page and command line buffer get ident mapping;

Hmm, maybe:

"The ranges from the start address of the loaded kernel and with size
setup_header.init_size, the zero page and the command line buffer are
ident-mapped."

Newline here.

Then enumerate the further steps:

> +a GDT must be loaded with the descriptors for selectors
> +__BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat
> +segment; __BOOT_CS must have execute/read permission, and __BOOT_DS
> +must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
> +must be __BOOT_DS; interrupt must be disabled; %rsi must hold the base
> +address of the struct boot_params.

"Then:

* a GDT must be loaded with the descriptors for selectors
  __BOOT_CS(0x10) and __BOOT_DS(0x18)

* both descriptors must describe a 4G, flat segment

* __BOOT_CS must have execute/read permissions, and __BOOT_DS must have
  read/write permissions

* CS must be __BOOT_CS and DS, ES, SS must be __BOOT_DS

* interrupts must be disabled

* %rsi must hold the base address of the struct boot_params."


> +
>  **** EFI HANDOVER PROTOCOL
>  
>  This protocol allows boot loaders to defer initialisation to the EFI
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 5c80b94..aaafd4e 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -37,6 +37,12 @@
>  	__HEAD
>  	.code32
>  ENTRY(startup_32)
> +	/*
> +	 * 32bit entry is 0, could not be changed!

What does that mean? Did we try to change it or what?

> +	 * If we come here directly from a bootloader,
> +	 * kernel(text+data+bss+brk) ramdisk, zero_page, command line
> +	 * all need to be under 4G limit.

			"under the"

> +	 */
>  	cld
>  	/*
>  	 * Test KEEP_SEGMENTS flag to see if the bootloader is asking
> @@ -182,20 +188,18 @@ ENTRY(startup_32)
>  	lret
>  ENDPROC(startup_32)
>  
> -	/*
> -	 * Be careful here startup_64 needs to be at a predictable
> -	 * address so I can export it in an ELF header.  Bootloaders
> -	 * should look at the ELF header to find this address, as
> -	 * it may change in the future.
> -	 */
>  	.code64
>  	.org 0x200
>  ENTRY(startup_64)
>  	/*
> +	 * 64bit entry is 0x200, could not be changed!

Ah, I see what you mean:

	"64-bit entry point is 0x200 and it is ABI so immutable!"

Ditto for startup_32 above.

>  	 * We come here either from startup_32 or directly from a
> -	 * 64bit bootloader.  If we come here from a bootloader we depend on
> -	 * an identity mapped page table being provied that maps our
> -	 * entire text+data+bss and hopefully all of memory.
> +	 * 64bit bootloader.
> +	 * If we come here from a bootloader, kernel(text+data+bss+brk),
> +	 * ramdisk, zero_page, command line could be above 4G.
> +	 * We depend on an identity mapped page table being provided
> +	 * that maps our entire kernel(text+data+bss+brk), zero page
> +	 * and command line.

Heey, this one is good! :-)

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data
  2013-01-04  0:48 ` [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data Yinghai Lu
@ 2013-01-14 11:26   ` Borislav Petkov
  2013-01-14 17:37     ` H. Peter Anvin
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 11:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:44PM -0800, Yinghai Lu wrote:
> That is for bootloader.
> 
> setup_data is in setup_header, and all bootloader is copying that
> for bzImage. So for old bootloader should keep that as 0.

Are you sure all old bootloaders have kept setup_data as 0 so that you
can drop the check?

And besides, the check doesn't hurt but prevents insane old boot loaders
from handing crap to the kernel so I'd leave it in.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 27/31] x86, kdump: remove crashkernel range find limit for 64bit
  2013-01-04  0:48 ` [PATCH v7u1 27/31] x86, kdump: remove crashkernel range find limit for 64bit Yinghai Lu
@ 2013-01-14 15:43   ` Borislav Petkov
  2013-01-14 18:18     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 15:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Thu, Jan 03, 2013 at 04:48:47PM -0800, Yinghai Lu wrote:
> Now kexeced kernel/ramdisk could be above 4g, so remove 896 limit for
> 64bit.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/kernel/setup.c |    4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index c58497e..6adbc45 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -501,13 +501,11 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>  /*
>   * Keep the crash kernel below this limit.  On 32 bits earlier kernels
>   * would limit the kernel to the low 512 MiB due to mapping restrictions.
> - * On 64 bits, kexec-tools currently limits us to 896 MiB; increase this
> - * limit once kexec-tools are fixed.

Does this mean that kexec-tools has been fixed too?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data
  2013-01-14 11:26   ` Borislav Petkov
@ 2013-01-14 17:37     ` H. Peter Anvin
  2013-01-14 18:04       ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 17:37 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, Andrew Morton, Jan Kiszka, Jason Wessel,
	linux-kernel

On 01/14/2013 03:26 AM, Borislav Petkov wrote:
> On Thu, Jan 03, 2013 at 04:48:44PM -0800, Yinghai Lu wrote:
>> That is for bootloader.
>>
>> setup_data is in setup_header, and all bootloader is copying that
>> for bzImage. So for old bootloader should keep that as 0.
>
> Are you sure all old bootloaders have kept setup_data as 0 so that you
> can drop the check.
>
> And besides, the check doesn't hurt but prevents insane old boot loaders
> from handing in crap into the kernel so I'd leave it in.
>

No, this is a case of cargo-cult programming.  I asked Yinghai to remove it.

It is cargo-cult programming because the value of boot_params.hdr comes
from the kernel itself, so all the check does is tell you the boot
protocol version associated with the kernel itself, which we already know.

If we find a bootloader that does that incorrectly (e.g. if kexec were
to blindly copy struct boot_params from the older kernel... which
ironically would be better than the current situation) then the right
thing to do would be to have a central place which scrubs out the fields
and just forces them to zero rather than putting a bunch of tests all
over the place.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14  5:37     ` Yinghai Lu
  2013-01-14  9:43       ` Borislav Petkov
@ 2013-01-14 17:49       ` H. Peter Anvin
  2013-01-14 18:57         ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 17:49 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On 01/13/2013 09:37 PM, Yinghai Lu wrote:
>>
>> This is kinda missing from the mechanism of the sentinel and it should
>> be documented too.
>
> No, we should have too much duplicated info.
>

That is not duplicating info... that is basic documentation.  As you 
show in the post further on, it took a very simple description, and it 
*is* a very subtle thing that is inherently different from how the other 
fields operate.

It doesn't help that you didn't, despite repeated requests, implement 
what I *asked for*, which is:

If the sentinel is flagged, zero *all fields not explicitly set by the 
broken versions of kexec*, not just your new "ext" fields.
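What is being asked for could look roughly like the sketch below: one central scrub that keeps only an explicit list of byte ranges and zeroes everything else when the sentinel is set. This is an assumption-laden illustration, not the eventual kernel code; `scrub_boot_params` and `struct keep_range` are made-up names, and which ranges the broken kexec versions actually fill in is not stated in this thread, so the list is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOT_PARAMS_SIZE 4096
#define SENTINEL_OFF     0x1ef

struct keep_range {
	size_t start;	/* first byte to preserve */
	size_t len;	/* number of bytes to preserve */
};

/* If the sentinel is set, rebuild bp so that only the listed ranges
 * survive and every other byte (sentinel included) becomes zero. */
static void scrub_boot_params(uint8_t *bp, const struct keep_range *keep,
			      size_t nkeep)
{
	uint8_t clean[BOOT_PARAMS_SIZE] = { 0 };
	size_t i;

	if (bp[SENTINEL_OFF] == 0)
		return;		/* loader zeroed the page: trust it */

	for (i = 0; i < nkeep; i++) {
		assert(keep[i].start + keep[i].len <= BOOT_PARAMS_SIZE);
		memcpy(clean + keep[i].start, bp + keep[i].start, keep[i].len);
	}
	memcpy(bp, clean, BOOT_PARAMS_SIZE);
}
```

Compared with clearing just the three ext_* fields, a whitelist like this also covers any field added later without another round of per-field checks.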

Yinghai, I understand you're frustrated, but please understand that
Borislav is not in any shape, way, or form "some guys that do not know
the code well keep sending comments out to waste others time".  Rather,
he has spent a huge amount of time giving you an awful lot of good
feedback.  A lot of it has centered on documentation and code
maintainability, both of which are vitally important parts of a
long-lived codebase.

Having someone doing line-by-line review of your code is enormously 
time-consuming and not something most people enjoy doing.  Borislav is 
doing you -- and me -- a huge favor here.

	-hpa



-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data
  2013-01-14 17:37     ` H. Peter Anvin
@ 2013-01-14 18:04       ` Borislav Petkov
  2013-01-14 18:42         ` H. Peter Anvin
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 18:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel

On Mon, Jan 14, 2013 at 09:37:36AM -0800, H. Peter Anvin wrote:
> No, this is a case of cargo-cult programming. I asked Yinghai to
> remove it.
>
> It is cargo-cult programming because the value of boot_params.hdr
> comes from the kernel itself, so all you're doing is telling you the
> boot protocol version associated with the kernel itself, which we
> already know.

LOOL. Great.

> If we find a bootloader that does that incorrectly (e.g. if kexec
> were to blindly copy struct boot_params from the older kernel...
> which ironically would be better than the current situation) then
> the right thing to do would be to have a central place which scrub
> out the fields and just force them to zero rather than putting a
> bunch of tests all over the place.

Ok, I didn't realize that.

And besides, nothing stops a silly boot loader from adjusting
boot_params.hdr.version so that the check - or any check for that matter
- passes, AFAICT.

See, this is exactly the reason why I'm trying to explain to Yinghai
why it is a Good Thing to document stuff like that at least in commit
messages.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram.
  2013-01-14  9:53       ` Borislav Petkov
@ 2013-01-14 18:17         ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 18:17 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Alexander Duyck

On Mon, Jan 14, 2013 at 1:53 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Sun, Jan 13, 2013 at 09:46:17PM -0800, Yinghai Lu wrote:
>> On Sun, Jan 13, 2013 at 4:56 AM, Borislav Petkov <bp@alien8.de> wrote:
>> > On Thu, Jan 03, 2013 at 04:48:41PM -0800, Yinghai Lu wrote:
>> >> We should not set mapping for all under max_pfn.
>> >
>> > "We should not establish mappings for all memory under max_pfn."
>>
>> that is not accurate.
>>
>> We should not set mapping for all range under max_pfn.
>>
>> or
>>
>> We should set mappings only for memory ranges under max_pfn.
>
> Ok, that last thing is getting close. So do I understand it correctly
> now:
>
> "We should establish mappings only for memory (memory which is not
> marked reserved or whatever by E820 or some other mechanism) under
> max_pfn."
>
> ?

yes, you got it.

---
We should set mappings only for usable memory ranges under max_pfn.
Otherwise, it causes the same problem that is fixed by

        x86, mm: Only direct map addresses that are marked as E820_RAM

This patch exposes the pfn_mapped array, and only sets ident mapping for
ranges in that array.

This patch relies on the new kernel_ident_mapping_init, which can handle
existing pgd/pud between different calls.

---
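
A minimal user-space sketch of the idea in the commit message above: walk an
array of mapped pfn ranges and handle each range on its own, rather than
covering everything below max_pfn. The names struct pfn_range and
count_ident_mapped_pages() are hypothetical, and "mapping" is modelled by
counting pages only; the real patch works on pfn_mapped and
kernel_ident_mapping_init(), whose signatures are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the pfn_mapped array. */
struct pfn_range {
	uint64_t start_pfn;	/* first page frame in the range */
	uint64_t end_pfn;	/* one past the last page frame */
};

/*
 * Visit every range and "map" it. Mapping is modelled by counting pages;
 * a real implementation would build page tables for
 * [start_pfn, end_pfn) at this point. Holes between ranges are never
 * visited, which is the whole point of the change.
 */
static uint64_t count_ident_mapped_pages(const struct pfn_range *ranges,
					 size_t nr_ranges)
{
	uint64_t pages = 0;
	size_t i;

	for (i = 0; i < nr_ranges; i++) {
		assert(ranges[i].start_pfn <= ranges[i].end_pfn);
		pages += ranges[i].end_pfn - ranges[i].start_pfn;
	}
	return pages;
}
```

With the two ranges [0, 0x100) and [0x200, 0x300), only 0x200 pages are
covered and the hole at [0x100, 0x200) is left alone, whereas mapping
everything under a max_pfn of 0x300 would cover 0x300 pages.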
>
>> >> That causes same problem that is fixed by
>> >
>> > "Otherwise, it causes the same ..."
>> >
>> >>
>> >>       x86, mm: Only direct map addresses that are marked as E820_RAM
>> >
>> > You could add this patch's commit id since it is in tip:x86/mm2 and it
>> > shouldn't change: 66520ebc2df3.
>>
>> why ? they are not in linus tree yet, so it could change if that tip
>> branch is rebased.
>
> Oh, you didn't know: tip branches don't get rebased. At least almost
> never.

hpa rebased x86/mm2 once with my patchset, at Ingo's request.


* Re: [PATCH v7u1 27/31] x86, kdump: remove crashkernel range find limit for 64bit
  2013-01-14 15:43   ` Borislav Petkov
@ 2013-01-14 18:18     ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 18:18 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel

On Mon, Jan 14, 2013 at 7:43 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 03, 2013 at 04:48:47PM -0800, Yinghai Lu wrote:
>> Now kexeced kernel/ramdisk could be above 4g, so remove 896 limit for
>> 64bit.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> ---
>>  arch/x86/kernel/setup.c |    4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index c58497e..6adbc45 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -501,13 +501,11 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>>  /*
>>   * Keep the crash kernel below this limit.  On 32 bits earlier kernels
>>   * would limit the kernel to the low 512 MiB due to mapping restrictions.
>> - * On 64 bits, kexec-tools currently limits us to 896 MiB; increase this
>> - * limit once kexec-tools are fixed.
>
> Does this mean that kexec-tools has been fixed too?

Yes, with the patchset that I sent to the kexec-tools mailing list.


* Re: [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image
  2013-01-14 11:20   ` Borislav Petkov
@ 2013-01-14 18:35     ` Yinghai Lu
  2013-01-14 18:37       ` Yinghai Lu
  2013-01-14 18:43       ` Borislav Petkov
  0 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 18:35 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming

On Mon, Jan 14, 2013 at 3:20 AM, Borislav Petkov <bp@alien8.de> wrote:
>> the boot loader should
>> +also fill the additional fields of the struct boot_params as that
>
>                                                         remove "that"
>
>> +described in zero-page.txt.
>
> Btw, you could document the sentinel mechanism here or in zero-page.txt,
> for example.


that is:
---
The memory for struct boot_params
could be allocated anywhere (even above 4G) and initialized to all zero.
---

bootloader memset zero_page at first means.
and all bootloader did that except kexec. and it is fixed patches for
kexec-tools.


* Re: [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image
  2013-01-14 18:35     ` Yinghai Lu
@ 2013-01-14 18:37       ` Yinghai Lu
  2013-01-14 18:46         ` Borislav Petkov
  2013-01-14 18:43       ` Borislav Petkov
  1 sibling, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 18:37 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming

On Mon, Jan 14, 2013 at 10:35 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Jan 14, 2013 at 3:20 AM, Borislav Petkov <bp@alien8.de> wrote:
>>> the boot loader should
>>> +also fill the additional fields of the struct boot_params as that
>>
>>                                                         remove "that"
>>
>>> +described in zero-page.txt.
>>
>> Btw, you could document the sentinel mechanism here or in zero-page.txt,
>> for example.
>
>
> that is:
> ---
> The memory for struct boot_params
> could be allocated anywhere (even above 4G) and initialized to all zero.
> ---
>
> bootloader memset zero_page at first means.
> and all bootloader did that except kexec. and it is fixed patches for
> kexec-tools.

also I fix some errors that you mentioned, but not all of them
because I copied 32bit portion to 64bit and modified it.

So you  may submit patch to fix 32bit and 64bit at the same time if you like.

Thanks

Yinghai


* Re: [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data
  2013-01-14 18:04       ` Borislav Petkov
@ 2013-01-14 18:42         ` H. Peter Anvin
  0 siblings, 0 replies; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 18:42 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, Andrew Morton, Jan Kiszka, Jason Wessel,
	linux-kernel

On 01/14/2013 10:04 AM, Borislav Petkov wrote:
> 
> See, this is exactly the reason why I'm trying to explain to Yinghai
> why it is a Good Thing to document stuff like that at least in commit
> messages.
> 

It very much is.  It avoids these kinds of long discussions.

	-hpa




* Re: [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image
  2013-01-14 18:35     ` Yinghai Lu
  2013-01-14 18:37       ` Yinghai Lu
@ 2013-01-14 18:43       ` Borislav Petkov
  1 sibling, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 18:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming

On Mon, Jan 14, 2013 at 10:35:28AM -0800, Yinghai Lu wrote:
> that is:
> ---
> The memory for struct boot_params
> could be allocated anywhere (even above 4G) and initialized to all zero.
> ---

This sounds ok.

> bootloader memset zero_page at first means.
> and all bootloader did that except kexec. and it is fixed patches for
> kexec-tools.

This reads very strange. Do you mean:

"All bootloaders clear the zero page as early as possible, except kexec.
This is now fixed in kexec-tools, too."

?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image
  2013-01-14 18:37       ` Yinghai Lu
@ 2013-01-14 18:46         ` Borislav Petkov
  2013-01-14 20:01           ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 18:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming

On Mon, Jan 14, 2013 at 10:37:39AM -0800, Yinghai Lu wrote:
> also I fix some errors that you mentioned, but not all of them because
> I copied 32bit portion to 64bit and modified it.
>
> So you may submit patch to fix 32bit and 64bit at the same time if you
> like.

Sure, if they're old typos/errors. If they're new typos/errors
introduced by you, then you know that we don't commit half-baked
patches, right?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 17:49       ` H. Peter Anvin
@ 2013-01-14 18:57         ` Yinghai Lu
  2013-01-14 18:59           ` H. Peter Anvin
  2013-01-14 20:05           ` Borislav Petkov
  0 siblings, 2 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 18:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 9:49 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/13/2013 09:37 PM, Yinghai Lu wrote:
>>>
>>>
>>> This is kinda missing from the mechanism of the sentinel and it should
>>> be documented too.
>>
>>
>> No, we should have too much duplicated info.
>>
>
> That is not duplicating info... that is basic documentation.  As you show in
> the post further on, it took a very simple description, and it *is* a very
> subtle thing that is inherently different from how the other fields operate.

please check if following is enough?

+       /*
+        * kernel have sentinel to set as 0xff in setup link scripts,
+        * so if bootloader just copy whole page from kernel image to
+        * get setup_header instead of clearing boot_param buffer and
+        * copying setup_header only, will leave sentinel as 0xff.
+        * With that, we can tell some fields in boot_param have
+        * invalid values, and we need to zero them in kernel.
+        */
+       __u8  sentinel;                                 /* 0x1ef */


>
> It doesn't help that you didn't, despite repeated requests, implement what I
> *asked for*, which is:
>
> If the sentinel is flagged, zero *all fields not explicitly set by the
> broken versions of kexec*, not just your new "ext" fields.

other fields are pad* fields, so do we zero out them
with memset with exact address?
so next times, when someone change pad fields to other ext_*,
they don't need to change code again here.


>
> Yinghai, I understand you're frustrated, but please understand that Borislav
> is not in any shape, way, or form "some guys that do not know the code well
> keep sending comments out to waste others time".  Rather, he has spent a
> huge amount of time giving you an awful lot of good feedback  A lot of them
> have centered on documentation and code maintainability, both of which are
> vitally important part of a long-lived codebase.
>
> Having someone doing line-by-line review of your code is enormously
> time-consuming and not something most people enjoy doing.  Borislav is doing
> you -- and me -- a huge favor here.

yes.


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 18:57         ` Yinghai Lu
@ 2013-01-14 18:59           ` H. Peter Anvin
  2013-01-14 19:19             ` Yinghai Lu
  2013-01-14 19:50             ` Yinghai Lu
  2013-01-14 20:05           ` Borislav Petkov
  1 sibling, 2 replies; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 18:59 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On 01/14/2013 10:57 AM, Yinghai Lu wrote:
>>
>> If the sentinel is flagged, zero *all fields not explicitly set by the
>> broken versions of kexec*, not just your new "ext" fields.
> 
> other fields are pad* fields, so do we zero out them
> with memset with exact address?
> so next times, when someone change pad fields to other ext_*,
> they don't need to change code again here.
> 

No, there were other fields that were also left uninitialized, per your
analysis from last year.  I don't remember the details, but I seem to
recall they included the EFI and graphics-related fields.

So yes, just zero them all out.

	-hpa




* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 18:59           ` H. Peter Anvin
@ 2013-01-14 19:19             ` Yinghai Lu
  2013-01-14 19:50             ` Yinghai Lu
  1 sibling, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 19:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 10:59 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/14/2013 10:57 AM, Yinghai Lu wrote:
>>>
>>> If the sentinel is flagged, zero *all fields not explicitly set by the
>>> broken versions of kexec*, not just your new "ext" fields.
>>
>> other fields are pad* fields, so do we zero out them
>> with memset with exact address?
>> so next times, when someone change pad fields to other ext_*,
>> they don't need to change code again here.
>>
>
> No, there were other fields that were also left uninitialized, per your
> analysis from last year.  I don't remember the details, but I seem to
> recall they included the EFI and graphics-related fields.
>
> So yes, just zero them all out.
>

From last year's mail:

---
Fields marked with * are set.
? means set only when lfb_depth > 8.
Also, 0xa2 to 0x1df is not set.

struct x86_linux_param_header {
        uint8_t  orig_x;                        /* 0x00 */  *
        uint8_t  orig_y;                        /* 0x01 */  *
        uint16_t ext_mem_k;                     /* 0x02 -- EXT_MEM_K sits here */   *
        uint16_t orig_video_page;               /* 0x04 */  *
        uint8_t  orig_video_mode;               /* 0x06 */  *
        uint8_t  orig_video_cols;               /* 0x07 */  *
        uint16_t unused2;                       /* 0x08 */
        uint16_t orig_video_ega_bx;             /* 0x0a */  *
        uint16_t unused3;                       /* 0x0c */
        uint8_t  orig_video_lines;              /* 0x0e */  *
        uint8_t  orig_video_isVGA;              /* 0x0f */   *
        uint16_t orig_video_points;             /* 0x10 */   *

        /* VESA graphic mode -- linear frame buffer */
        uint16_t lfb_width;                     /* 0x12 */   *
        uint16_t lfb_height;                    /* 0x14 */   *
        uint16_t lfb_depth;                     /* 0x16 */   *
        uint32_t lfb_base;                      /* 0x18 */   *
        uint32_t lfb_size;                      /* 0x1c */   *
        uint16_t cl_magic;                      /* 0x20 */   *
#define CL_MAGIC_VALUE 0xA33F
        uint16_t cl_offset;                     /* 0x22 */   *
        uint16_t lfb_linelength;                /* 0x24 */   *
        uint8_t  red_size;                      /* 0x26 */   ?
        uint8_t  red_pos;                       /* 0x27 */   ?
        uint8_t  green_size;                    /* 0x28 */   ?
        uint8_t  green_pos;                     /* 0x29 */   ?
        uint8_t  blue_size;                     /* 0x2a */   ?
        uint8_t  blue_pos;                      /* 0x2b */   ?
        uint8_t  rsvd_size;                     /* 0x2c */   ?
        uint8_t  rsvd_pos;                      /* 0x2d */   ?
        uint16_t vesapm_seg;                    /* 0x2e */   *
        uint16_t vesapm_off;                    /* 0x30 */
        uint16_t pages;                         /* 0x32 */   *
        uint8_t  reserved4[12];                 /* 0x34 -- 0x3f reserved for future expansion */

        struct apm_bios_info apm_bios_info;     /* 0x40 */   *
        struct drive_info_struct drive_info;    /* 0x80 */   *
        struct sys_desc_table sys_desc_table;   /* 0xa0 */   * only .length = 0, aka 0xa2 to 0x1df is not set
        uint32_t alt_mem_k;                     /* 0x1e0 */  *
        uint8_t  reserved5[4];                  /* 0x1e4 */
        uint8_t  e820_map_nr;                   /* 0x1e8 */  *
        uint8_t  eddbuf_entries;                /* 0x1e9 */  *
        uint8_t  edd_mbr_sig_buf_entries;       /* 0x1ea */  *
        uint8_t  reserved6[6];                  /* 0x1eb */
        HEADER.....                                         copied, and/or memset to 0 and set.
        uint8_t  reserved16[0x290 - 0x248];     /* 0x248 */
        uint32_t edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];   /* 0x290 */  *
#endif
        struct  e820entry e820_map[E820MAX];    /* 0x2d0 */   *
        uint8_t _pad8[48];                      /* 0xcd0 */
        struct  edd_info eddbuf[EDDMAXNR];      /* 0xd00 */   *
                                                /* 0xeec */
#define COMMAND_LINE_SIZE 2048
};

----

in kernel for 0xa2 to 0x1df:
        struct sys_desc_table sys_desc_table;           /* 0x0a0 */
        struct olpc_ofw_header olpc_ofw_header;         /* 0x0b0 */
        __u32 ext_ramdisk_image;                        /* 0x0c0 */
        __u32 ext_ramdisk_size;                         /* 0x0c4 */
        __u32 ext_cmd_line_ptr;                         /* 0x0c8 */
        __u8  _pad4[116];                               /* 0x0cc */
        struct edid_info edid_info;                     /* 0x140 */
        struct efi_info efi_info;                       /* 0x1c0 */
        __u32 alt_mem_k;                                /* 0x1e0 */

So do you mean we should clear from 0xa2 to 0x1df?
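
The offsets in the listing above can be checked mechanically. The sketch
below is only an illustration: it uses opaque byte arrays in place of the
real structures, with sizes inferred from the offset comments rather than
taken from the kernel headers, and struct mock_boot_params and
bytes_up_to_alt_mem_k() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of boot_params up to alt_mem_k; member sizes inferred from offsets. */
struct mock_boot_params {
	uint8_t  head[0xa0];			/* 0x000: everything earlier */
	uint8_t  sys_desc_table[0x10];		/* 0x0a0 */
	uint8_t  olpc_ofw_header[0x10];		/* 0x0b0 */
	uint32_t ext_ramdisk_image;		/* 0x0c0 */
	uint32_t ext_ramdisk_size;		/* 0x0c4 */
	uint32_t ext_cmd_line_ptr;		/* 0x0c8 */
	uint8_t  _pad4[116];			/* 0x0cc */
	uint8_t  edid_info[0x80];		/* 0x140 */
	uint8_t  efi_info[0x20];		/* 0x1c0 */
	uint32_t alt_mem_k;			/* 0x1e0 */
};

/* Compile-time checks that the mock layout matches the offset comments. */
_Static_assert(offsetof(struct mock_boot_params, olpc_ofw_header) == 0x0b0,
	       "olpc_ofw_header offset");
_Static_assert(offsetof(struct mock_boot_params, ext_ramdisk_image) == 0x0c0,
	       "ext_ramdisk_image offset");
_Static_assert(offsetof(struct mock_boot_params, edid_info) == 0x140,
	       "edid_info offset");
_Static_assert(offsetof(struct mock_boot_params, efi_info) == 0x1c0,
	       "efi_info offset");
_Static_assert(offsetof(struct mock_boot_params, alt_mem_k) == 0x1e0,
	       "alt_mem_k offset");

/* Number of bytes from offset 'from' up to, but not including, alt_mem_k. */
static size_t bytes_up_to_alt_mem_k(size_t from)
{
	assert(from <= offsetof(struct mock_boot_params, alt_mem_k));
	return offsetof(struct mock_boot_params, alt_mem_k) - from;
}
```

For the range asked about here, bytes_up_to_alt_mem_k(0xa2) is 0x13e (318
bytes); starting at olpc_ofw_header instead (0xb0) gives 0x130 (304 bytes).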

Thanks

Yinghai


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 18:59           ` H. Peter Anvin
  2013-01-14 19:19             ` Yinghai Lu
@ 2013-01-14 19:50             ` Yinghai Lu
  2013-01-14 19:56               ` H. Peter Anvin
  1 sibling, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 19:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 10:59 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> No, there were other fields that were also left uninitialized, per your
> analysis from last year.  I don't remember the details, but I seem to
> recall they included the EFI and graphics-related fields.
>
> So yes, just zero them all out.

?

+static void sanitize_real_mode(struct boot_params *real_mode)
+{
+       if (real_mode->sentinel) {
+               /*fields in boot_params are not valid, clear them */
+               memset(&real_mode->olpc_ofw_header, 0,
+                      (char *)&real_mode->alt_mem_k -
+                       (char *)&real_mode->olpc_ofw_header);
+               memset(&real_mode->_pad7[0], 0,
+                      (char *)&real_mode->edd_mbr_sig_buffer[0] -
+                       (char *)&real_mode->_pad7[0]);
+               memset(&real_mode->_pad8[0], 0,
+                      (char *)&real_mode->eddbuf[0] -
+                       (char *)&real_mode->_pad8[0]);
+               memset(&real_mode->_pad9[0], 0, sizeof(real_mode->_pad9));
+       }
+}
+
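
The function above clears a span between two struct members by subtracting
their addresses. A small user-space sketch of that idiom, assuming a made-up
struct mock_bp (none of these member names come from boot_params), could
look like this:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a stale span sandwiched between trusted fields. */
struct mock_bp {
	uint8_t  keep_a[8];	/* fields the loader is known to set */
	uint8_t  stale[16];	/* fields that may hold garbage */
	uint32_t keep_b;	/* first trusted field after the stale span */
	uint8_t  sentinel;	/* non-zero: buffer was not cleared by loader */
};

/*
 * If the sentinel is set, zero everything from 'stale' up to, but not
 * including, 'keep_b'. The length is the distance between the two member
 * addresses, so it follows the layout automatically if members are added
 * inside the span.
 */
static void sanitize_mock(struct mock_bp *p)
{
	if (p->sentinel) {
		memset(p->stale, 0,
		       (size_t)((char *)&p->keep_b - (char *)p->stale));
	}
}
```

Computing the length from the neighbouring member, rather than from a
hard-coded size, is what lets someone turn a pad field into a new ext_*
field later without touching the clearing code.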


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 19:50             ` Yinghai Lu
@ 2013-01-14 19:56               ` H. Peter Anvin
  2013-01-14 20:05                 ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 19:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On 01/14/2013 11:50 AM, Yinghai Lu wrote:
> On Mon, Jan 14, 2013 at 10:59 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> No, there were other fields that were also left uninitialized, per your
>> analysis from last year.  I don't remember the details, but I seem to
>> recall they included the EFI and graphics-related fields.
>>
>> So yes, just zero them all out.
> 
> ?
> 

Yep.

> +static void sanitize_real_mode(struct boot_params *real_mode)
> +{
> +       if (real_mode->sentinel) {
> +               /*fields in boot_params are not valid, clear them */
> +               memset(&real_mode->olpc_ofw_header, 0,
> +                      (char *)&real_mode->alt_mem_k -
> +                       (char *)&real_mode->olpc_ofw_header);
> +               memset(&real_mode->_pad7[0], 0,
> +                      (char *)&real_mode->edd_mbr_sig_buffer[0] -
> +                       (char *)&real_mode->_pad7[0]);
> +               memset(&real_mode->_pad8[0], 0,
> +                      (char *)&real_mode->eddbuf[0] -
> +                       (char *)&real_mode->_pad8[0]);
> +               memset(&real_mode->_pad9[0], 0, sizeof(real_mode->_pad9));
> +       }
> +}
> +
> 



* Re: [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image
  2013-01-14 18:46         ` Borislav Petkov
@ 2013-01-14 20:01           ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 20:01 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming

On Mon, Jan 14, 2013 at 10:46 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jan 14, 2013 at 10:37:39AM -0800, Yinghai Lu wrote:
>> also I fix some errors that you mentioned, but not all of them because
>> I copied 32bit portion to 64bit and modified it.
>>
>> So you may submit patch to fix 32bit and 64bit at the same time if you
>> like.
>
> Sure, if they're old typos/errors. If they're new typos/errors
> introduced by you, then you know that we don't commit half-baked
> patches, right?

No, we should not.

Sometimes I can tell whether your understanding is right or not.

In other cases, I just don't know whether your change is right or the
original 32-bit sentence is right.

So later, you or maybe hpa might want to change them at the same time.

Yinghai


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 18:57         ` Yinghai Lu
  2013-01-14 18:59           ` H. Peter Anvin
@ 2013-01-14 20:05           ` Borislav Petkov
  2013-01-14 20:14             ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 20:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 10:57:08AM -0800, Yinghai Lu wrote:
> please check if following is enough?
> 
> +       /*
> +        * kernel have sentinel to set as 0xff in setup link scripts,
> +        * so if bootloader just copy whole page from kernel image to
> +        * get setup_header instead of clearing boot_param buffer and
> +        * copying setup_header only, will leave sentinel as 0xff.
> +        * With that, we can tell some fields in boot_param have
> +        * invalid values, and we need to zero them in kernel.
> +        */
> +       __u8  sentinel;                                 /* 0x1ef */

"The sentinel variable is set by the linker script to 0xff. If a
bootloader doesn't know about this variable and just copies the
setup_header portion and doesn't clear the boot_params buffer as it is
supposed to, it will leave the sentinel to its initial value of 0xff.

This tells the kernel that some fields in boot_params have invalid
values and we have to zero them out in the kernel."

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 19:56               ` H. Peter Anvin
@ 2013-01-14 20:05                 ` Yinghai Lu
  2013-01-15  6:17                   ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 20:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 11:56 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/14/2013 11:50 AM, Yinghai Lu wrote:
>> On Mon, Jan 14, 2013 at 10:59 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> No, there were other fields that were also left uninitialized, per your
>>> analysis from last year.  I don't remember the details, but I seem to
>>> recall they included the EFI and graphics-related fields.
>>>
>>> So yes, just zero them all out.
>>
>> ?
>>
>
> Yep.

OK, I will rebase the -v7 branch this afternoon and ask you and Borislav to
check whether I missed anything.

I hope to send out v7u2 tomorrow ...


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 20:05           ` Borislav Petkov
@ 2013-01-14 20:14             ` Yinghai Lu
  2013-01-14 20:26               ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 20:14 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming,
	Gokul Caushik, Josh Triplett, Joe Millenbach

On Mon, Jan 14, 2013 at 12:05 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jan 14, 2013 at 10:57:08AM -0800, Yinghai Lu wrote:
>> please check if following is enough?
>>
>> +       /*
>> +        * kernel have sentinel to set as 0xff in setup link scripts,
>> +        * so if bootloader just copy whole page from kernel image to
>> +        * get setup_header instead of clearing boot_param buffer and
>> +        * copying setup_header only, will leave sentinel as 0xff.
>> +        * With that, we can tell some fields in boot_param have
>> +        * invalid values, and we need to zero them in kernel.
>> +        */
>> +       __u8  sentinel;                                 /* 0x1ef */
>
> "The sentinel variable is set by the linker script to 0xff. If a
> bootloader doesn't know about this variable and just copies the
> setup_header portion and doesn't clear the boot_params buffer as it is
> supposed to, it will leave the sentinel to its initial value of 0xff.

no, no, no.

bootloader does not need to know sentinel, and they only need to do:
    clearing boot_param buffer and copying setup_header only

even new bootloader is not supposed to know sentinel ...


>
> This tells the kernel that some fields in boot_params have invalid
> values and we have to zero them out in the kernel."


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 20:14             ` Yinghai Lu
@ 2013-01-14 20:26               ` Borislav Petkov
  2013-01-14 22:38                 ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 20:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 12:14:18PM -0800, Yinghai Lu wrote:
> no, no, no.
> 
> bootloader does not need to know sentinel, and they only need to do:
>     clearing boot_param buffer and copying setup_header only
> 
> even new bootloader is not supposed to know sentinel ...

Ah, ok. I thought something was fishy because if bootloaders would know
about it, they'd copy setup_header and zero out the sentinel only, to
force the kernel to use crappy ext_* etc. values.

How about this:

"The sentinel variable is set by the linker script to 0xff. It is
supposed to be used for catching bootloaders which just copy the
setup_header portion and don't clear the whole boot_params buffer as
they are supposed to. Such bootloaders will leave the sentinel to its
initial value of 0xff and in this case, the kernel will assume that some
fields in boot_params have invalid values and zero them out."

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v7u1 25/31] memblock: add memblock_mem_size()
  2013-01-04  0:48 ` [PATCH v7u1 25/31] memblock: add memblock_mem_size() Yinghai Lu
@ 2013-01-14 20:42   ` H. Peter Anvin
  2013-01-14 22:28     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 20:42 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel

On 01/03/2013 04:48 PM, Yinghai Lu wrote:
>  
> -	mapped_size = get_mem_size(max_pfn_mapped);
> +	mapped_size = (u64)memblock_mem_size(max_pfn_mapped);

This cast is completely unnecessary.

	-hpa




* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (31 preceding siblings ...)
  2013-01-04  7:09 ` [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Borislav Petkov
@ 2013-01-14 20:45 ` H. Peter Anvin
  2013-01-14 22:44   ` Yinghai Lu
  2013-01-15 12:19 ` Stefano Stabellini
  33 siblings, 1 reply; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 20:45 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	David Woodhouse

This is getting extremely unwieldy and we need to get it broken up a bit.

I would really like to get the boot protocol changes earlier in the
series, because it has interactions with work other people are doing and
may need additional surgery.  That is, the additions of flags and fields
to make it possible for the kernel to indicate that > 4 GB booting is
possible, not the actual implementation thereof.  This split between
protocol changes and implementation will also be useful for bisection.
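
As a rough illustration of what the new protocol fields carry, the sketch
below splits a 64-bit physical address into a legacy 32-bit field and an
ext_* companion holding the upper 32 bits, and joins them again. That the
ext_* fields hold the upper half is an assumption based on their names and
the cover letter; struct addr_fields, split_addr() and join_addr() are
hypothetical helpers, not kernel or bootloader API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair: a legacy 32-bit field plus its ext_* companion. */
struct addr_fields {
	uint32_t low;	/* e.g. the existing ramdisk_image field */
	uint32_t ext;	/* e.g. ext_ramdisk_image, assumed upper 32 bits */
};

/* Loader side: store a 64-bit address into the two 32-bit fields. */
static struct addr_fields split_addr(uint64_t addr)
{
	struct addr_fields f = {
		.low = (uint32_t)(addr & 0xffffffffu),
		.ext = (uint32_t)(addr >> 32),
	};
	return f;
}

/* Kernel side: rebuild the full 64-bit address from both fields. */
static uint64_t join_addr(struct addr_fields f)
{
	return (uint64_t)f.low | ((uint64_t)f.ext << 32);
}
```

A loader that never sets the ext_* half but also never clears it would hand
the kernel a bogus high address, which is why stale values in these fields
matter.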

	-hpa



* Re: [PATCH v7u1 25/31] memblock: add memblock_mem_size()
  2013-01-14 20:42   ` H. Peter Anvin
@ 2013-01-14 22:28     ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 22:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel

On Mon, Jan 14, 2013 at 12:42 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/03/2013 04:48 PM, Yinghai Lu wrote:
>>
>> -     mapped_size = get_mem_size(max_pfn_mapped);
>> +     mapped_size = (u64)memblock_mem_size(max_pfn_mapped);
>
> This cast is completely unnecessary.

ok, dropped that cast.


* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 20:26               ` Borislav Petkov
@ 2013-01-14 22:38                 ` Yinghai Lu
  2013-01-14 23:11                   ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 22:38 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming,
	Gokul Caushik, Josh Triplett, Joe Millenbach

On Mon, Jan 14, 2013 at 12:26 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jan 14, 2013 at 12:14:18PM -0800, Yinghai Lu wrote:
>> no, no, no.
>>
>> bootloader does not need to know sentinel, and they only need to do:
>>     clearing boot_param buffer and copying setup_header only
>>
>> even new bootloader is not supposed to know sentinel ...
>
> Ah, ok. I thought something was fishy because if bootloaders would know
> about it, they'd copy setup_header and zero out the sentinel only, to
> force the kernel to use crappy ext_* etc. values.
>
> How about this:
>
> "The sentinel variable is set by the linker script to 0xff. It is
> supposed to be used for catching bootloaders which just copy the
> setup_header portion and don't clear the whole boot_params buffer as
> they are supposed to. Such bootloaders will leave the sentinel to its
> initial value of 0xff and in this case, the kernel will assume that some
> fields in boot_params have invalid values and zero them out."
>
still not clear ...

In the kernel image we only have setup_header, and it is surrounded by
other code.

A bootloader could prepare boot_params in several ways:
1. Allocate boot_params, memset it to 0, and copy setup_header from the
   kernel image into the middle of boot_params.
2. Allocate a buffer as large as the whole setup section, including code and
   setup_header, copy the whole setup section into that buffer, and use the
   buffer as boot_params. Then either
   a. use the setup code to do real-mode booting, or
   b. clear used fields or store values in boot_params to use the 32-bit
      or 64-bit entry.


So I hope you now understand my changes.

>> +       /*
>> +        * kernel have sentinel to set as 0xff in setup link scripts,
>> +        * so if bootloader just copy whole page from kernel image to
>> +        * get setup_header instead of clearing boot_param buffer and
>> +        * copying setup_header only, will leave sentinel as 0xff.
>> +        * With that, we can tell some fields in boot_param have
>> +        * invalid values, and we need to zero them in kernel.
>> +        */

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-14 20:45 ` H. Peter Anvin
@ 2013-01-14 22:44   ` Yinghai Lu
  2013-01-14 23:16     ` H. Peter Anvin
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 22:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	David Woodhouse

On Mon, Jan 14, 2013 at 12:45 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> This is getting extremely unwieldy and we need to get it broken up a bit.
>
> I would really like to get the boot protocol changes earlier in the
> series, because it has interactions with work other people are doing and
> may need additional surgery.  That is, the additions of flags and fields
> to make it possible for the kernel to indicate that > 4 GB booting is
> possible, not the actual implementation thereof.  This split between
> protocol changes and implementation will also be useful for bisection.

When will we actually change the protocol version number in the code?

We need to refer to that number in the protocol doc.

Yinghai

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14  9:43       ` Borislav Petkov
@ 2013-01-14 23:06         ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-14 23:06 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming,
	Gokul Caushik, Josh Triplett, Joe Millenbach

On Mon, Jan 14, 2013 at 1:43 AM, Borislav Petkov <bp@suse.de> wrote:
>> >> --- a/arch/x86/boot/compressed/cmdline.c
>> >> +++ b/arch/x86/boot/compressed/cmdline.c
>> >> @@ -17,6 +17,8 @@ static unsigned long get_cmd_line_ptr(void)
>> >>  {
>> >>       unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
>> >>
>> >> +     cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
>> >> +
>> >>       return cmd_line_ptr;
>> >>  }
>> >
>> > On 32-bit, this unsigned long cmd_line_ptr is 4 bytes and the OR doesn't
>> > have any effect on the final result. You probably want to do:
>>
>> yes, that is what we want to keep 32bit and 64bit unified.
>>
>> >
>> > #ifdef CONFIG_64BIT
>> >         cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
>> > #endif
>> >
>> > right?
>> >
>> > Or instead look at ->sentinel to know whether the ext_* fields are valid
>> > or not, and save yourself the OR if not.
>>
>> no.
>>
>> that is whole point of sentinel, we don't need to check sentinel everywhere
>> because ext_* are valid.
>
> Dude, do you even read my comments? This line:
>
>         cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
>
> doesn't do a whit on 32-bit. So execute it *only* on 64-bit!

The following should work on 32-bit and keeps the code the same on
32-bit and 64-bit,

because ext_cmd_line_ptr is always 0 on 32-bit.

Index: linux-2.6/arch/x86/boot/compressed/cmdline.c
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/cmdline.c
+++ linux-2.6/arch/x86/boot/compressed/cmdline.c
@@ -17,6 +17,9 @@ static unsigned long get_cmd_line_ptr(vo
 {
        unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;

+       if (real_mode->ext_cmd_line_ptr)
+               cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
+
        return cmd_line_ptr;
 }
 int cmdline_find_option(const char *option, char *buffer, int bufsize)

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-13 21:41   ` Borislav Petkov
  2013-01-14  5:37     ` Yinghai Lu
@ 2013-01-14 23:10     ` H. Peter Anvin
  2013-01-14 23:21       ` Borislav Petkov
  1 sibling, 1 reply; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 23:10 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, Andrew Morton, Jan Kiszka, Jason Wessel,
	linux-kernel, Rob Landley, Matt Fleming, Gokul Caushik,
	Josh Triplett, Joe Millenbach

On 01/13/2013 01:41 PM, Borislav Petkov wrote:
>> diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
>> index b4c913c..bffd73b 100644
>> --- a/arch/x86/boot/compressed/cmdline.c
>> +++ b/arch/x86/boot/compressed/cmdline.c
>> @@ -17,6 +17,8 @@ static unsigned long get_cmd_line_ptr(void)
>>  {
>>  	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
>>  
>> +	cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
>> +
>>  	return cmd_line_ptr;
>>  }
> 
> On 32-bit, this unsigned long cmd_line_ptr is 4 bytes and the OR doesn't
> have any effect on the final result. You probably want to do:
> 
> #ifdef CONFIG_64BIT
> 	cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
> #endif
> 
> right?
> 

Actually, on 32 bits the compiler will simply drop the statement on the
floor, no #ifdef required.  If gcc outputs a warning we should do
something about it, otherwise we can just plain ignore it.

	-hpa



^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 22:38                 ` Yinghai Lu
@ 2013-01-14 23:11                   ` Borislav Petkov
  2013-01-15  1:04                     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 23:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 02:38:36PM -0800, Yinghai Lu wrote:
> On Mon, Jan 14, 2013 at 12:26 PM, Borislav Petkov <bp@alien8.de> wrote:
> > On Mon, Jan 14, 2013 at 12:14:18PM -0800, Yinghai Lu wrote:
> >> no, no, no.
> >>
> >> bootloader does not need to know sentinel, and they only need to do:
> >>     clearing boot_param buffer and copying setup_header only
> >>
> >> even new bootloader is not supposed to know sentinel ...
> >
> > Ah, ok. I thought something was fishy because if bootloaders would know
> > about it, they'd copy setup_header and zero out the sentinel only, to
> > force the kernel to use crappy ext_* etc. values.
> >
> > How about this:
> >
> > "The sentinel variable is set by the linker script to 0xff. It is
> > supposed to be used for catching bootloaders which just copy the
> > setup_header portion and don't clear the whole boot_params buffer as
> > they are supposed to. Such bootloaders will leave the sentinel to its
> > initial value of 0xff and in this case, the kernel will assume that some
> > fields in boot_params have invalid values and zero them out."
> >
> still not clear ...
> 
> In the kernel image we only have setup_header, and it is surrounded by
> other code.
> 
> A bootloader could prepare boot_params in several ways:
> 1. Allocate boot_params, memset it to 0, and copy setup_header from the
>    kernel image into the middle of boot_params.
> 2. Allocate a buffer as large as the whole setup section, including code and
>    setup_header, copy the whole setup section into that buffer, and use the
>    buffer as boot_params. Then either
>    a. use the setup code to do real-mode booting, or
>    b. clear used fields or store values in boot_params to use the 32-bit
>       or 64-bit entry.
> 
> 
> So I hope you now understand my changes.
> 
> >> +       /*
> >> +        * kernel have sentinel to set as 0xff in setup link scripts,
> >> +        * so if bootloader just copy whole page from kernel image to
> >> +        * get setup_header instead of clearing boot_param buffer and
> >> +        * copying setup_header only, will leave sentinel as 0xff.
> >> +        * With that, we can tell some fields in boot_param have
> >> +        * invalid values, and we need to zero them in kernel.

Ok, but this needlessly mentions some sort of allocation technique
which the bootloader does and which we don't care about. What we do care
about is the sentinel variable and what it means: if the bootloader
copies it accidentally, we use *that* as a trigger. So let's revise it:

"The sentinel variable is set by the linker script to 0xff. A bootloader
is supposed to only take setup_header and put it into a clean
boot_params buffer. If it turns out that it is clumsy or too generous
with the buffer, it most probably will pick up the sentinel variable
too. The fact that this variable then is still non-zero signals to
us that we should zero out certain portions of boot_params (see
sanitize_real_mode()) because we assume that they contain garbage."

I think this is as clear as it gets.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-14 22:44   ` Yinghai Lu
@ 2013-01-14 23:16     ` H. Peter Anvin
  2013-01-14 23:39       ` David Woodhouse
  0 siblings, 1 reply; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 23:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	David Woodhouse

On 01/14/2013 02:44 PM, Yinghai Lu wrote:
> On Mon, Jan 14, 2013 at 12:45 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> This is getting extremely unwieldy and we need to get it broken up a bit.
>>
>> I would really like to get the boot protocol changes earlier in the
>> series, because it has interactions with work other people are doing and
>> may need additional surgery.  That is, the additions of flags and fields
>> to make it possible for the kernel to indicate that > 4 GB booting is
>> possible, not the actual implementation thereof.  This split between
>> protocol changes and implementation will also be useful for bisection.
> 
> When will we actually change the protocol version number in the code?
> 
> We need to refer to that number in the protocol doc.
> 

The protocol change and the documentation go together (as separate but
adjacent patches).  We bump the number when we change the structure to
accommodate the necessary fields and flags (i.e. a vacuous
implementation), not when we add the full functionality.

The reason I want to do this this way is that I want to also make David
Woodhouse's protocol fixes to make the EFI stub work correctly at the
same time, so we get a single protocol level bump.  I think it is better
I merge both sets, and I was hoping to do that tomorrow if possible.

	-hpa



^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 23:10     ` H. Peter Anvin
@ 2013-01-14 23:21       ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-14 23:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 03:10:25PM -0800, H. Peter Anvin wrote:
> Actually, on 32 bits the compiler will simply drop the statement on
> the floor, no #ifdef required. If gcc outputs a warning we should do
> something about it, otherwise we can just plain ignore it.

Good point, it does that both when I do -O2 and -Os -m32 builds with a
simple test program. Disregard my previous comment about this.
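
A test program of the kind mentioned above might look like the following
sketch. It is not the program that was actually used; the struct and the
function names are hypothetical, and uint32_t stands in for a 32-bit
"unsigned long" so the behaviour can be checked on any host. It only shows
why the OR has no effect when the destination is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two boot_params fields involved. */
struct demo_params {
	uint32_t cmd_line_ptr;      /* low 32 bits of the pointer */
	uint32_t ext_cmd_line_ptr;  /* high 32 bits of the pointer */
};

/* Models a 32-bit build: the local variable is only 32 bits wide. */
static uint32_t cmd_line_ptr_32(const struct demo_params *p)
{
	uint32_t ptr = p->cmd_line_ptr;

	/*
	 * The right-hand side only has bits 32..63 set. Converting the
	 * 64-bit result back to uint32_t discards exactly those bits,
	 * so ptr is unchanged and an optimizer is free to drop the line.
	 */
	ptr |= (uint64_t)p->ext_cmd_line_ptr << 32;

	return ptr;
}

/* Models a 64-bit build: the high half survives. */
static uint64_t cmd_line_ptr_64(const struct demo_params *p)
{
	uint64_t ptr = p->cmd_line_ptr;

	ptr |= (uint64_t)p->ext_cmd_line_ptr << 32;

	return ptr;
}

/* Small self-check; call it from main() to run the demonstration. */
static void check_truncation(void)
{
	struct demo_params p = { .cmd_line_ptr = 0x1000, .ext_cmd_line_ptr = 0x1 };

	assert(cmd_line_ptr_32(&p) == 0x1000u);
	assert(cmd_line_ptr_64(&p) == 0x100001000ull);
}
```

Whether a particular compiler warns about the narrowing depends on the
warning flags in use; the sketch makes no claim about that.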

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-14 23:16     ` H. Peter Anvin
@ 2013-01-14 23:39       ` David Woodhouse
  2013-01-14 23:50         ` H. Peter Anvin
  0 siblings, 1 reply; 199+ messages in thread
From: David Woodhouse @ 2013-01-14 23:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel


On Mon, 2013-01-14 at 15:16 -0800, H. Peter Anvin wrote:
> The reason I want to do this this way is that I want to also make
> David Woodhouse's protocol fixes to make the EFI stub work correctly
> at the same time, so we get a single protocol level bump. 

My changes don't need a protocol level bump. It's just two new bits in
load_flags that old loaders won't care about.

-- 
dwmw2



^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-14 23:39       ` David Woodhouse
@ 2013-01-14 23:50         ` H. Peter Anvin
  2013-01-15  0:12           ` David Woodhouse
  0 siblings, 1 reply; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-14 23:50 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel

On 01/14/2013 03:39 PM, David Woodhouse wrote:
> On Mon, 2013-01-14 at 15:16 -0800, H. Peter Anvin wrote:
>> The reason I want to do this this way is that I want to also make
>> David Woodhouse's protocol fixes to make the EFI stub work correctly
>> at the same time, so we get a single protocol level bump. 
> 
> My changes don't need a protocol level bump. It's just two new bits in
> load_flags that old loaders won't care about.
> 

I'm wondering if we should put your new flags in xloadflags instead just
because some boot loaders have been known to clobber the loadflags when
setting the upper bits.  In theory it shouldn't matter... I'm wondering
if it does in practice.  The other bit is that Yinghai's changes want
even more flags.

Either way might as well see if we can do it as closely adjacently as
possible (but no, I don't want to drag this out.)

	-hpa


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-14 23:50         ` H. Peter Anvin
@ 2013-01-15  0:12           ` David Woodhouse
  0 siblings, 0 replies; 199+ messages in thread
From: David Woodhouse @ 2013-01-15  0:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel


On Mon, 2013-01-14 at 15:50 -0800, H. Peter Anvin wrote:
> I'm wondering if we should put your new flags in xloadflags instead just
> because some boot loaders have been known to clobber the loadflags when
> setting the upper bits. 

It doesn't matter. Old broken bootloaders won't use the EFI stub entry
point if they don't see the flags that indicate it's available. But they
weren't going to anyway.

-- 
dwmw2



^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 23:11                   ` Borislav Petkov
@ 2013-01-15  1:04                     ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-15  1:04 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming,
	Gokul Caushik, Josh Triplett, Joe Millenbach

On Mon, Jan 14, 2013 at 3:11 PM, Borislav Petkov <bp@alien8.de> wrote:
>> >> +       /*
>> >> +        * kernel have sentinel to set as 0xff in setup link scripts,
>> >> +        * so if bootloader just copy whole page from kernel image to
>> >> +        * get setup_header instead of clearing boot_param buffer and
>> >> +        * copying setup_header only, will leave sentinel as 0xff.
>> >> +        * With that, we can tell some fields in boot_param have
>> >> +        * invalid values, and we need to zero them in kernel.
>
> Ok, but this needlessly mentions some sort of allocation technique
> which the bootloader does and which we don't care about. What we do care
> about is the sentinel variable and what it means: if the bootloader
> copies it accidentally, we use *that* as a trigger. So let's revise it:
>
> "The sentinel variable is set by the linker script to 0xff. A bootloader
> is supposed to only take setup_header and put it into a clean
> boot_params buffer. If it turns out that it is clumsy or too generous
> with the buffer, it most probably will pick up the sentinel variable
> too. The fact that this variable then is still non-zero signals to
> us that we should zero out certain portions of boot_params (see
> sanitize_real_mode()) because we assume that they contain garbage."


ok, I changed it to:

        /*
         * The sentinel is set to 0xff via the linker script (setup.ld).
         * A bootloader is supposed to only take setup_header and put
         * it into a clean boot_params buffer. If it turns out that
         * it is clumsy or too generous with the buffer, it most
         * probably will pick up the sentinel variable too. The fact
         * that this variable then is still 0xff will let the kernel
         * know that some variables in boot_params are invalid and
         * that it should zero out certain portions of boot_params
         * (see sanitize_real_mode()).
         */
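
The mechanism described in that comment can be sketched roughly as follows.
This is only an illustration under stated assumptions: the struct layout,
the loader helpers and demo_sanitize() are hypothetical and much smaller
than the real struct boot_params, and the sketch only clears the ext_*
fields, whereas the thread agreed to zero further fields as well.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced layout; not the real struct boot_params. */
struct demo_boot_params {
	uint32_t ext_ramdisk_image;
	uint32_t ext_ramdisk_size;
	uint32_t ext_cmd_line_ptr;
	uint8_t  sentinel;   /* 0xff in the image, 0 after a proper memset */
	uint8_t  hdr[16];    /* stand-in for setup_header */
};

/* The image as seen through this layout: garbage around the header. */
static struct demo_boot_params demo_image(void)
{
	struct demo_boot_params img;

	memset(&img, 0xa5, sizeof(img));        /* unrelated bytes */
	img.sentinel = 0xff;                    /* as set at link time */
	memset(img.hdr, 0x11, sizeof(img.hdr)); /* the only part to copy */
	return img;
}

/* Well-behaved loader: clear everything, then copy only the header. */
static void load_clean(struct demo_boot_params *bp,
		       const struct demo_boot_params *img)
{
	memset(bp, 0, sizeof(*bp));
	memcpy(bp->hdr, img->hdr, sizeof(bp->hdr));
}

/* Sloppy loader: copies the whole region, sentinel and garbage included. */
static void load_sloppy(struct demo_boot_params *bp,
			const struct demo_boot_params *img)
{
	memcpy(bp, img, sizeof(*bp));
}

/* A non-zero sentinel means the ext_* fields cannot be trusted. */
static void demo_sanitize(struct demo_boot_params *bp)
{
	if (bp->sentinel) {
		bp->ext_ramdisk_image = 0;
		bp->ext_ramdisk_size = 0;
		bp->ext_cmd_line_ptr = 0;
		bp->sentinel = 0;
	}
}

/* Small self-check; call it from main() to run the demonstration. */
static void demo_run(void)
{
	struct demo_boot_params img = demo_image(), bp;

	load_sloppy(&bp, &img);
	demo_sanitize(&bp);
	assert(bp.ext_cmd_line_ptr == 0 && bp.hdr[0] == 0x11);

	load_clean(&bp, &img);
	bp.ext_cmd_line_ptr = 1;          /* set on purpose by the loader */
	demo_sanitize(&bp);
	assert(bp.ext_cmd_line_ptr == 1); /* left alone: sentinel was 0 */
}
```

The point of the design is that the loader never has to know about the
sentinel: clearing the buffer, which it is supposed to do anyway, is what
marks the ext_* fields as valid.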

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-14 20:05                 ` Yinghai Lu
@ 2013-01-15  6:17                   ` Yinghai Lu
  2013-01-15 15:50                     ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-15  6:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 12:05 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Jan 14, 2013 at 11:56 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 01/14/2013 11:50 AM, Yinghai Lu wrote:
>>> On Mon, Jan 14, 2013 at 10:59 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> No, there were other fields that were also left uninitialized, per your
>>>> analysis from last year.  I don't remember the details, but I seem to
>>>> recall they included the EFI and graphics-related fields.
>>>>
>>>> So yes, just zero them all out.
>>>
>>> ?
>>>
>>
>> Yep.
>
> ok, will rebase -v7 branch this afternoon, and ask you and Borislav to
> check if i missed anything.
>
> hope can send out v7u2 tomorrow ...

I rebased the -v7 branch; it should address the comments and changelog
requests that I understood and agreed with.

It can be found at:

        git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot

It is on top of Linus's tree as of 2013-01-14,
plus tip:x86/mm, tip:x86/urgent, tip:x86/mm2.

Please check whether I missed anything that must be addressed.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-11 17:49                                               ` Yinghai Lu
@ 2013-01-15  6:19                                                 ` Yinghai Lu
  2013-01-18 15:55                                                   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-15  6:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Eric W. Biederman, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Fri, Jan 11, 2013 at 9:49 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 11, 2013 at 8:52 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>
>>> I need to check this patch out and then also test-run them on IA64, AMD-VI, Calgary-X
>>> GART and Intel VT-d to make a sanity test.
>>
>> That would be great. Please check the two attached patches, or do you want
>> me to update the for-x86-boot branch so you can test that instead?
>>
>> If you want to check memmap=4095M$1M, though, you will need to test on the
>> newer branch.
>
>
> I updated the for-x86-boot branch.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-boot
>

Konrad,

Did you get a chance to test that branch on your setups?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (32 preceding siblings ...)
  2013-01-14 20:45 ` H. Peter Anvin
@ 2013-01-15 12:19 ` Stefano Stabellini
  2013-01-15 16:43   ` Yinghai Lu
  33 siblings, 1 reply; 199+ messages in thread
From: Stefano Stabellini @ 2013-01-15 12:19 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk

On Fri, 4 Jan 2013, Yinghai Lu wrote:
> Currently kdump's reserved region is limited to below 896M, because kexec
> has that limitation, and bzImage also needs to stay under 4G.
> 
> To let kexec/kdump use ranges above 4G, we need to make it possible to load
> bzImage and ramdisk above 4G.
> During boot, bzImage will be unpacked at the same position and stay high.
> 
> The patches add fields in setup_header and boot_params to
> 1. get info about the ramdisk position above 4G from bootloader/kexec
> 2. get info about cmd_line_ptr above 4G from bootloader/kexec
> 3. set xloadflags bit0 in the header for bzImage, so that bootloader/kexec
>    can check it to decide whether it can put bzImage high.
> 4. use sentinel to make sure ext_* fields in boot_params can be used.
> 
> These patches were tested with kexec tools with local changes, which will be
> sent to the kexec list later.
> 
> They can be found at:
> 
>         git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot

I tried to boot this kernel as a PV guest with 2GB of RAM, but
unfortunately it crashes early during boot (earlyprintk=xen log
appended).



mapping kernel into physical memory
about to get started...
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Linux version 3.8.0-rc3+ (sstabellini@st22) (gcc version 4.4.5 (Debian 4.4.5-8) ) #4 SMP Tue Jan 15 12:11:59 UTC 2013
[    0.000000] Command line: root=/dev/xvda1 rw loglevel=9 debug console=hvc0 earlyprintk=xen
[    0.000000] ACPI in unprivileged domain disabled
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x000000007fffffff] usable
[    0.000000] bootconsole [xenboot0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI not present or invalid.
[    0.000000] e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] No AGP bridge found
[    0.000000] e820: last_pfn = 0x80000 max_arch_pfn = 0x400000000
[    0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 24576
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000]  [mem 0x00000000-0x000fffff] page 4k
[    0.000000] init_memory_mapping: [mem 0x7fe00000-0x7fffffff]
[    0.000000]  [mem 0x7fe00000-0x7fffffff] page 4k
[    0.000000] BRK [0x023b8000, 0x023b8fff] PGTABLE
[    0.000000] BRK [0x023b9000, 0x023b9fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x7c000000-0x7fdfffff]
[    0.000000]  [mem 0x7c000000-0x7fdfffff] page 4k
[    0.000000] BRK [0x023ba000, 0x023bafff] PGTABLE
[    0.000000] BRK [0x023bb000, 0x023bbfff] PGTABLE
[    0.000000] BRK [0x023bc000, 0x023bcfff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x00100000-0x7bffffff]
[    0.000000]  [mem 0x00100000-0x7bffffff] page 4k
(XEN) d15:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffea0000080330:
(XEN)  L4[0x1d4] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 15 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e033:[<ffffffff810052e4>]
(XEN) RFLAGS: 0000000000000206   EM: 1   CONTEXT: pv guest
(XEN) rax: ffffea0000000000   rbx: 000000000200c000   rcx: 0000000080000000
(XEN) rdx: 0000000000080300   rsi: 000000000200c000   rdi: 0000000000000000
(XEN) rbp: ffffffff82001dd8   rsp: ffffffff82001d90   r8:  0000000000000000
(XEN) r9:  0000000000000083   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000001000000000   r14: 0000000000000000
(XEN) r15: 0000000000100000   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000184cc8000   cr2: ffffea0000080330
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff82001d90:
(XEN)    0000000080000000 0000000000000000 0000000000000000 ffffffff810052e4
(XEN)    000000010000e030 0000000000010006 ffffffff82001dd8 000000000000e02b
(XEN)    ffffffff81005299 ffffffff82001e08 ffffffff8100768c 0000000080000000
(XEN)    0000000080000000 0000001000000000 000000007ff00000 ffffffff82001e48
(XEN)    ffffffff82161b3a ffffffff82001e48 0000000001000000 00000000017cb000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffffffff82001ed8
(XEN)    ffffffff821529cc ffffffff821d7920 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffff82001f00 ffffffff819a8e6f 0000000000000010
(XEN)    ffffffff82001ee8 ffffffff82001ea8 0000000000000000 ffffffff82001ec8
(XEN)    ffffffffffffffff ffffffff821d7920 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffff82001f28 ffffffff8214bb69 ffffffff82001f28
(XEN)    ffffffff810556e6 ae32416208683d40 ffffffff821e12e0 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffffffff82001f38
(XEN)    ffffffff8214b4fc ffffffff82001ff8 ffffffff8214f3b3 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 809822011f898975 000106a506100800 0000000000000001
(XEN)    0000000000000000 0000000000000000 0f00000060c0c748 ccccccccccccc305

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking
  2013-01-04  0:48 ` [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking Yinghai Lu
  2013-01-04  7:17   ` Borislav Petkov
@ 2013-01-15 12:27   ` Stefano Stabellini
  1 sibling, 0 replies; 199+ messages in thread
From: Stefano Stabellini @ 2013-01-15 12:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel

On Fri, 4 Jan 2013, Yinghai Lu wrote:
> While debugging loading the kernel above 4G, I found that one page in BRK
> is not used by early page allocation.
> 
> pgt_buf_top is the first address that cannot be used, so we should check
> whether the new end is above that top; otherwise the last page will not be used.
> 
> Fix that check and also add a printout for every allocation from BRK.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/x86/mm/init.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 6f85de8..c4293cf 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -47,7 +47,7 @@ __ref void *alloc_low_pages(unsigned int num)
>  						__GFP_ZERO, order);
>  	}
>  
> -	if ((pgt_buf_end + num) >= pgt_buf_top) {
> +	if ((pgt_buf_end + num) > pgt_buf_top) {
>  		unsigned long ret;
>  		if (min_pfn_mapped >= max_pfn_mapped)
>  			panic("alloc_low_page: ran out of memory");
> @@ -61,6 +61,8 @@ __ref void *alloc_low_pages(unsigned int num)
>  	} else {
>  		pfn = pgt_buf_end;
>  		pgt_buf_end += num;
> +		printk(KERN_DEBUG "BRK [%#010lx, %#010lx] PGTABLE\n",
> +			pfn << PAGE_SHIFT, (pgt_buf_end << PAGE_SHIFT) - 1);
>  	}
>  
>  	for (i = 0; i < num; i++) {
> -- 
> 1.7.10.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
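
Note: the boundary condition this patch changes can be looked at in
isolation. The snippet below is only a minimal sketch, not the kernel
code: struct pgt_buf, fits_old() and fits_new() are hypothetical names,
and it assumes, as the commit message says, that pgt_buf_top is the first
pfn that cannot be used (an exclusive upper bound).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the BRK page-table buffer: pages are handed
 * out from [end, top), where top is the first pfn that must NOT be used.
 */
struct pgt_buf {
	unsigned long end;	/* next free pfn */
	unsigned long top;	/* one past the last usable pfn */
};

/* Old check (>=): rejects a request that would end exactly at top. */
static bool fits_old(const struct pgt_buf *b, unsigned long num)
{
	assert(b->end <= b->top);
	return !((b->end + num) >= b->top);
}

/* Fixed check (>): a request may end exactly at top, since top is exclusive. */
static bool fits_new(const struct pgt_buf *b, unsigned long num)
{
	assert(b->end <= b->top);
	return !((b->end + num) > b->top);
}
```

With one page left (end = 5, top = 6), fits_old() refuses a one-page
request while fits_new() accepts it, which matches the "last page will
not be used" symptom described in the commit message.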

* Re: [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
  2013-01-04  0:48 ` [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path Yinghai Lu
  2013-01-11 12:13   ` Borislav Petkov
@ 2013-01-15 13:48   ` Stefano Stabellini
  2013-01-15 15:22     ` Konrad Rzeszutek Wilk
  2013-01-15 16:37     ` Yinghai Lu
  1 sibling, 2 replies; 199+ messages in thread
From: Stefano Stabellini @ 2013-01-15 13:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk

On Fri, 4 Jan 2013, Yinghai Lu wrote:
> max_pfn_mapped is not set correctly until init_memory_mapping,
> so don't print its initial value for 64bit.
> 
> Also need to use KERNEL_IMAGE_SIZE directly for highmap cleanup.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/kernel/head64.c |    3 ---
>  arch/x86/kernel/setup.c  |    2 ++
>  arch/x86/mm/init_64.c    |    6 +++++-
>  3 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index a3fc233..7061d8b 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -146,9 +146,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
>  	/* clear bss before set_intr_gate with early_idt_handler */
>  	clear_bss();
>  
> -	/* XXX - this is wrong... we need to build page tables from scratch */
> -	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
> -
>  	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
>  #ifdef CONFIG_EARLY_PRINTK
>  		set_intr_gate(i, &early_idt_handlers[i]);
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 63160c6..04797e78 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -910,8 +910,10 @@ void __init setup_arch(char **cmdline_p)
>  	setup_bios_corruption_check();
>  #endif
>  
> +#ifdef CONFIG_X86_32
>  	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
>  			(max_pfn_mapped<<PAGE_SHIFT) - 1);
> +#endif
>  
>  	reserve_real_mode();
>  
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 9c5f2b1..98385a2 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -394,10 +394,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
>  void __init cleanup_highmap(void)
>  {
>  	unsigned long vaddr = __START_KERNEL_map;
> -	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
> +	unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
>  	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
>  	pmd_t *pmd = level2_kernel_pgt;
>  
> +	/* Xen has its own end somehow with abused max_pfn_mapped */
> +	if (max_pfn_mapped)
> +		vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);

If you are going to put a comment like that in the code, could you
please at least add some useful details, rather than a generic
"somehow"? It doesn't seem very helpful to me or to any other hackers
looking at the code.

The issue is even described as a comment in the code at the beginning of
arch/x86/xen/mmu.c:xen_setup_kernel_pagetable:

/* max_pfn_mapped is the last pfn mapped in the initial memory
 * mappings. Considering that on Xen after the kernel mappings we
 * have the mappings of some pages that don't exist in pfn space, we
 * set max_pfn_mapped to the last real pfn mapped. */

Now if max_pfn_mapped is supposed to represent the last pfn mapped in
the initial memory mapping, then I think that the way Xen uses
max_pfn_mapped is actually correct.


The question is: has max_pfn_mapped actually changed meaning?
Because if it hasn't I don't see why you need this change.



>  	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
>  		if (pmd_none(*pmd))
>  			continue;
> -- 
> 1.7.10.4
> 

* Re: [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
  2013-01-15 13:48   ` Stefano Stabellini
@ 2013-01-15 15:22     ` Konrad Rzeszutek Wilk
  2013-01-15 15:59       ` Stefano Stabellini
  2013-01-15 16:37     ` Yinghai Lu
  1 sibling, 1 reply; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-15 15:22 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel

On Tue, Jan 15, 2013 at 01:48:45PM +0000, Stefano Stabellini wrote:
> On Fri, 4 Jan 2013, Yinghai Lu wrote:
> > max_pfn_mapped is not set correctly until init_memory_mapping,
> > so don't print its initial value for 64bit.
> > 
> > Also need to use KERNEL_IMAGE_SIZE directly for highmap cleanup.
> > 
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > ---
> >  arch/x86/kernel/head64.c |    3 ---
> >  arch/x86/kernel/setup.c  |    2 ++
> >  arch/x86/mm/init_64.c    |    6 +++++-
> >  3 files changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > index a3fc233..7061d8b 100644
> > --- a/arch/x86/kernel/head64.c
> > +++ b/arch/x86/kernel/head64.c
> > @@ -146,9 +146,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
> >  	/* clear bss before set_intr_gate with early_idt_handler */
> >  	clear_bss();
> >  
> > -	/* XXX - this is wrong... we need to build page tables from scratch */
> > -	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
> > -
> >  	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
> >  #ifdef CONFIG_EARLY_PRINTK
> >  		set_intr_gate(i, &early_idt_handlers[i]);
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 63160c6..04797e78 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -910,8 +910,10 @@ void __init setup_arch(char **cmdline_p)
> >  	setup_bios_corruption_check();
> >  #endif
> >  
> > +#ifdef CONFIG_X86_32
> >  	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
> >  			(max_pfn_mapped<<PAGE_SHIFT) - 1);
> > +#endif
> >  
> >  	reserve_real_mode();
> >  
> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > index 9c5f2b1..98385a2 100644
> > --- a/arch/x86/mm/init_64.c
> > +++ b/arch/x86/mm/init_64.c
> > @@ -394,10 +394,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
> >  void __init cleanup_highmap(void)
> >  {
> >  	unsigned long vaddr = __START_KERNEL_map;
> > -	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
> > +	unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
> >  	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
> >  	pmd_t *pmd = level2_kernel_pgt;
> >  
> > +	/* Xen has its own end somehow with abused max_pfn_mapped */
> > +	if (max_pfn_mapped)
> > +		vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
> 
> If you are going to put a comment like that in the code, could you
> please at least add some useful details, rather than a generic
> "somehow"? It doesn't seem very helpful to me or to any other hackers
> looking at the code.

Hm I think I actually pointed out to him in the previous reviews how
we alter it and that he should ingest some of those comments in this
patch.

> 
> The issue is even described as a comment in the code at the beginning of
> arch/x86/xen/mmu.c:xen_setup_kernel_pagetable:
> 
> /* max_pfn_mapped is the last pfn mapped in the initial memory
>  * mappings. Considering that on Xen after the kernel mappings we
>  * have the mappings of some pages that don't exist in pfn space, we
>  * set max_pfn_mapped to the last real pfn mapped. */
> 
> Now if max_pfn_mapped is supposed to represent the last pfn mapped in
> the initial memory mapping, then I think that the way Xen uses
> max_pfn_mapped is actually correct.
> 
> 
> The question is: has max_pfn_mapped actually changed meaning?
> Because if it hasn't I don't see why you need this change.
> 
> 
> 
> >  	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
> >  		if (pmd_none(*pmd))
> >  			continue;
> > -- 
> > 1.7.10.4
> > 

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-15  6:17                   ` Yinghai Lu
@ 2013-01-15 15:50                     ` Borislav Petkov
  2013-01-15 16:03                       ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-15 15:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Mon, Jan 14, 2013 at 10:17:10PM -0800, Yinghai Lu wrote:
> Please check if I miss anything that must be addressed.

Well, I'm staring at your for-x86-boot branch with top-commit
e6bee79e9f177991a35f1bda9ed704cfbcb8e4a3 and well, you've missed almost
everything.

I've made a gazillion comments to your patches in the last week and
you've taken almost none of it. It's like I've been talking to myself
the whole time, but with audience.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
  2013-01-15 15:22     ` Konrad Rzeszutek Wilk
@ 2013-01-15 15:59       ` Stefano Stabellini
  0 siblings, 0 replies; 199+ messages in thread
From: Stefano Stabellini @ 2013-01-15 15:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel

On Tue, 15 Jan 2013, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 15, 2013 at 01:48:45PM +0000, Stefano Stabellini wrote:
> > On Fri, 4 Jan 2013, Yinghai Lu wrote:
> > > max_pfn_mapped is not set correctly until init_memory_mapping,
> > > so don't print its initial value for 64bit.
> > > 
> > > Also need to use KERNEL_IMAGE_SIZE directly for highmap cleanup.
> > > 
> > > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > > ---
> > >  arch/x86/kernel/head64.c |    3 ---
> > >  arch/x86/kernel/setup.c  |    2 ++
> > >  arch/x86/mm/init_64.c    |    6 +++++-
> > >  3 files changed, 7 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > > index a3fc233..7061d8b 100644
> > > --- a/arch/x86/kernel/head64.c
> > > +++ b/arch/x86/kernel/head64.c
> > > @@ -146,9 +146,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
> > >  	/* clear bss before set_intr_gate with early_idt_handler */
> > >  	clear_bss();
> > >  
> > > -	/* XXX - this is wrong... we need to build page tables from scratch */
> > > -	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
> > > -
> > >  	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
> > >  #ifdef CONFIG_EARLY_PRINTK
> > >  		set_intr_gate(i, &early_idt_handlers[i]);
> > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > index 63160c6..04797e78 100644
> > > --- a/arch/x86/kernel/setup.c
> > > +++ b/arch/x86/kernel/setup.c
> > > @@ -910,8 +910,10 @@ void __init setup_arch(char **cmdline_p)
> > >  	setup_bios_corruption_check();
> > >  #endif
> > >  
> > > +#ifdef CONFIG_X86_32
> > >  	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
> > >  			(max_pfn_mapped<<PAGE_SHIFT) - 1);
> > > +#endif
> > >  
> > >  	reserve_real_mode();
> > >  
> > > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > > index 9c5f2b1..98385a2 100644
> > > --- a/arch/x86/mm/init_64.c
> > > +++ b/arch/x86/mm/init_64.c
> > > @@ -394,10 +394,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
> > >  void __init cleanup_highmap(void)
> > >  {
> > >  	unsigned long vaddr = __START_KERNEL_map;
> > > -	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
> > > +	unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
> > >  	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
> > >  	pmd_t *pmd = level2_kernel_pgt;
> > >  
> > > +	/* Xen has its own end somehow with abused max_pfn_mapped */
> > > +	if (max_pfn_mapped)
> > > +		vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
> > 
> > If you are going to put a comment like that in the code, could you
> > please at least add some useful details, rather than a generic
> > "somehow"? It doesn't seem very helpful to me or to any other hackers
> > looking at the code.
> 
> Hm I think I actually pointed out to him in the previous reviews how
> we alter it and that he should ingest some of those comments in this
> patch.

OK.
Still I think that altering max_pfn_mapped is the right thing to do for
Xen, because after all the last pfn mapped is different.
And "somehow" can't be the best way to explain the reason for a change.



> > The issue is even described as a comment in the code at the beginning of
> > arch/x86/xen/mmu.c:xen_setup_kernel_pagetable:
> > 
> > /* max_pfn_mapped is the last pfn mapped in the initial memory
> >  * mappings. Considering that on Xen after the kernel mappings we
> >  * have the mappings of some pages that don't exist in pfn space, we
> >  * set max_pfn_mapped to the last real pfn mapped. */
> > 
> > Now if max_pfn_mapped is supposed to represent the last pfn mapped in
> > the initial memory mapping, then I think that the way Xen uses
> > max_pfn_mapped is actually correct.
> > 
> > 
> > The question is: has max_pfn_mapped actually changed meaning?
> > Because if it hasn't I don't see why you need this change.
> > 
> > 
> > 
> > >  	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
> > >  		if (pmd_none(*pmd))
> > >  			continue;
> > > -- 
> > > 1.7.10.4
> > > 
> 

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-15 15:50                     ` Borislav Petkov
@ 2013-01-15 16:03                       ` Yinghai Lu
  2013-01-15 16:48                         ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-15 16:03 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming,
	Gokul Caushik, Josh Triplett, Joe Millenbach

On Tue, Jan 15, 2013 at 7:50 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jan 14, 2013 at 10:17:10PM -0800, Yinghai Lu wrote:
>> Please check if I miss anything that must be addressed.
>
> Well, I'm staring at your for-x86-boot branch with top-commit
> e6bee79e9f177991a35f1bda9ed704cfbcb8e4a3 and well, you've missed almost
> everything.
>
> I've made a gazillion comments to your patches in the last week and
> you've taken almost none of it. It's like I've been talking to myself
> the whole time, but with audience.

Come on, are you serious? almost none?

I took the comments about sentinel.
but did not take your comments about change kernel_ident_mapping_init.

* Re: [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
  2013-01-15 13:48   ` Stefano Stabellini
  2013-01-15 15:22     ` Konrad Rzeszutek Wilk
@ 2013-01-15 16:37     ` Yinghai Lu
  1 sibling, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-15 16:37 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk

On Tue, Jan 15, 2013 at 5:48 AM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:

>
> If you are going to put a comment like that in the code, could you
> please at least add some useful details, rather than a generic
> "somehow"? It doesn't seem very helpful to me or to any other hackers
> looking at the code.
>
> The issue is even described as a comment in the code at the beginning of
> arch/x86/xen/mmu.c:xen_setup_kernel_pagetable:
>
> /* max_pfn_mapped is the last pfn mapped in the initial memory
>  * mappings. Considering that on Xen after the kernel mappings we
>  * have the mappings of some pages that don't exist in pfn space, we
>  * set max_pfn_mapped to the last real pfn mapped. */
>
> Now if max_pfn_mapped is supposed to represent the last pfn mapped in
> the initial memory mapping, then I think that the way Xen uses
> max_pfn_mapped is actually correct.

change the comments to:

+       /*
+        * Native path, max_pfn_mapped is not set yet.
+        * Xen has valid max_pfn_mapped set in
+        *      arch/x86/xen/mmu.c:xen_setup_kernel_pagetable().
+        */


Thanks

Yinghai
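
Note: the fallback that the proposed comment describes can be sketched as
a small standalone function. This is an illustration only, not the code
in arch/x86/mm/init_64.c: highmap_vaddr_end() is a hypothetical helper,
and the constants are assumed values chosen to resemble a typical x86_64
configuration of that time.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from kernel headers. */
#define PAGE_SHIFT		12
#define START_KERNEL_MAP	UINT64_C(0xffffffff80000000)
#define KERNEL_IMAGE_SIZE	(UINT64_C(512) * 1024 * 1024)

/*
 * Pick the end of the high kernel mapping to clean up.
 * Native path: max_pfn_mapped is still 0, so fall back to KERNEL_IMAGE_SIZE.
 * Xen path: max_pfn_mapped was already set, so use it.
 */
static uint64_t highmap_vaddr_end(uint64_t max_pfn_mapped)
{
	uint64_t vaddr_end = START_KERNEL_MAP + KERNEL_IMAGE_SIZE;

	/* Guard the shift below against overflow. */
	assert(max_pfn_mapped <= (UINT64_MAX >> PAGE_SHIFT));

	if (max_pfn_mapped)
		vaddr_end = START_KERNEL_MAP + (max_pfn_mapped << PAGE_SHIFT);

	return vaddr_end;
}
```

The design question raised in the review is visible here: a zero
max_pfn_mapped is used as the signal for "native path, not set yet", so
the comment needs to say where the non-zero value comes from.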

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-15 12:19 ` Stefano Stabellini
@ 2013-01-15 16:43   ` Yinghai Lu
  2013-01-15 19:28     ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-15 16:43 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk

On Tue, Jan 15, 2013 at 4:19 AM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
>> could be found at:
>>
>>         git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot
>
> I tried to boot this kernel as PV guest with 2GB of RAM, but
> unfortunately it crashes early on at boot (earlyprintk=xen log
> appended).
>
>
>
> mapping kernel into physical memory
> about to get started...
> [    0.000000] Initializing cgroup subsys cpuset
> [    0.000000] Linux version 3.8.0-rc3+ (sstabellini@st22) (gcc version 4.4.5 (Debian 4.4.5-8) ) #4 SMP Tue Jan 15 12:11:59 UTC 2013
> [    0.000000] Command line: root=/dev/xvda1 rw loglevel=9 debug console=hvc0 earlyprintk=xen
> [    0.000000] ACPI in unprivileged domain disabled
> [    0.000000] e820: BIOS-provided physical RAM map:
> [    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
> [    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
> [    0.000000] Xen: [mem 0x0000000000100000-0x000000007fffffff] usable
> [    0.000000] bootconsole [xenboot0] enabled
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] DMI not present or invalid.
> [    0.000000] e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
> [    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
> [    0.000000] No AGP bridge found
> [    0.000000] e820: last_pfn = 0x80000 max_arch_pfn = 0x400000000
> [    0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 24576
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [    0.000000]  [mem 0x00000000-0x000fffff] page 4k
> [    0.000000] init_memory_mapping: [mem 0x7fe00000-0x7fffffff]
> [    0.000000]  [mem 0x7fe00000-0x7fffffff] page 4k
> [    0.000000] BRK [0x023b8000, 0x023b8fff] PGTABLE
> [    0.000000] BRK [0x023b9000, 0x023b9fff] PGTABLE
> [    0.000000] init_memory_mapping: [mem 0x7c000000-0x7fdfffff]
> [    0.000000]  [mem 0x7c000000-0x7fdfffff] page 4k
> [    0.000000] BRK [0x023ba000, 0x023bafff] PGTABLE
> [    0.000000] BRK [0x023bb000, 0x023bbfff] PGTABLE
> [    0.000000] BRK [0x023bc000, 0x023bcfff] PGTABLE
> [    0.000000] init_memory_mapping: [mem 0x00100000-0x7bffffff]
> [    0.000000]  [mem 0x00100000-0x7bffffff] page 4k
> (XEN) d15:v0: unhandled page fault (ec=0000)
> (XEN) Pagetable walk from ffffea0000080330:
> (XEN)  L4[0x1d4] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 15 (vcpu#0) crashed on cpu#3:
> (XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----

OK, I think I know the cause now; will check if there is a good way
to fix it.

Thanks

Yinghai
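
Note: the "L4[0x1d4]" entry in the Xen pagetable walk follows directly
from the faulting address. The sketch below shows the arithmetic for
x86_64 4-level paging, where bits 47..39 of the virtual address select
the top-level entry. l4_index() and check_logged_address() are
hypothetical helper names. As far as I know, addresses starting at
0xffffea0000000000 were documented as the virtual memory map on x86_64
kernels of that period, but the thread itself does not say so.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT	39	/* each L4 entry covers 2^39 bytes (512 GiB) */
#define PTRS_PER_PGD	512	/* 9 index bits per paging level */

/* Extract the top-level (L4 / PGD) index from a virtual address. */
static unsigned int l4_index(uint64_t vaddr)
{
	return (unsigned int)((vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* Self-check against the address reported in the Xen log above. */
static void check_logged_address(void)
{
	assert(l4_index(UINT64_C(0xffffea0000080330)) == 0x1d4);
}
```

An all-zero L4 entry at that index means nothing at all was mapped in
that 512 GiB slot when the guest touched the address.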

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-15 16:03                       ` Yinghai Lu
@ 2013-01-15 16:48                         ` Borislav Petkov
  2013-01-15 18:43                           ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-15 16:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Tue, Jan 15, 2013 at 08:03:49AM -0800, Yinghai Lu wrote:
> Come on, are you serious? almost none?

Of course I'm serious - the fact that I'm diddling with your patchset
for weeks now should tell you I'm f*cking serious about this.

> I took the comments about sentinel.
> but did not take your comments about change kernel_ident_mapping_init.

Because... ? I saw that you didn't take it but why, you didn't even say
why you didn't take it. And I asked you at the beginning: should we
review this patchset or do you simply ignore comments.

Let's see:

* [PATCH 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly
	- typo still there

* [PATCH 08/31] x86, 64bit: early #PF handler set page table
	- almost no changes, SOB chain still wrong

* [PATCH 12/31] x86: add get_ramdisk_image/size()
	- no change

* [PATCH 13/31] x86, boot: add get_cmd_line_ptr()
	- no change

* [PATCH 14/31] x86, boot: move checking of cmd_line_ptr out of common path
	- no change

* [PATCH 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
	- no change

* [PATCH 21/31] x86, kexec: only set ident mapping for ram.
	- almost

* [PATCH 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
	- except sentinel, almost no change

* [PATCH 23/31] x86, boot: update comments about entries for 64bit image
	- almost no change

How's that for "almost none"?!

Oh, and also, some of the suggestions you've taken but then changed
again making them wrong. Here's an example:

Your initial change had:

> +The memory for struct boot_params should be allocated under or above
> +4G and initialized to all zero.

I suggested:

"Memory for struct boot_params may be allocated anywhere (even above
4G). This memory must be zeroed out."

You changed it to:

"The memory for struct boot_params could be allocated anywhere (even
above 4G) and initialized to all zero."

which still reads funny and has a couple of issues.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-15 16:48                         ` Borislav Petkov
@ 2013-01-15 18:43                           ` Yinghai Lu
  2013-01-15 19:49                             ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-15 18:43 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming,
	Gokul Caushik, Josh Triplett, Joe Millenbach

On Tue, Jan 15, 2013 at 8:48 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Jan 15, 2013 at 08:03:49AM -0800, Yinghai Lu wrote:
>> Come on, are you serious? almost none?
>
> Of course I'm serious - the fact that I'm diddling with your patchset
> for weeks now should tell you I'm f*cking serious about this.
>
>> I took the comments about sentinel.
>> but did not take your comments about change kernel_ident_mapping_init.
>
> Because... ? I saw that you didn't take it but why, you didn't even say
> why you didn't take it. And I asked you at the beginning: should we
> review this patchset or do you simply ignore comments.

No, I didn't.

I only change lines according to the response that i could understand
and i think that is right.

>
> Let's see:
>
> * [PATCH 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly
>         - typo still there

Are you looking at the wrong place?

http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=fd6da054a055aea9cf265a005563073ada6e1af0

 x86, 64bit, realmode: Use init_level4_pgt to set trapmoline_pgd directly
author	Yinghai Lu <yinghai@kernel.org>	
	Tue, 15 Jan 2013 05:11:07 +0000 (21:11 -0800)
committer	Yinghai Lu <yinghai@kernel.org>	
	Tue, 15 Jan 2013 05:11:07 +0000 (21:11 -0800)
with #PF handler way to set early page table, level3_ident will go away with
64bit native path.

So just use entries in init_level4_pgt to set them in tramopline_pgd


>
> * [PATCH 08/31] x86, 64bit: early #PF handler set page table
>         - almost no changes, SOB chain still wrong

HPA and I have explained that to you.

http://lkml.org/lkml/2013/1/12/115

>
> * [PATCH 12/31] x86: add get_ramdisk_image/size()
>        - no change

I responded: will insert other lines between them.

>
> * [PATCH 13/31] x86, boot: add get_cmd_line_ptr()
>         - no change

Same as above.

>
> * [PATCH 14/31] x86, boot: move checking of cmd_line_ptr out of common path
>         - no change

Same as above.

>
> * [PATCH 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
>         - no change

https://patchwork.kernel.org/patch/1930741/

I pointed you about the grammar. ...

>
> * [PATCH 21/31] x86, kexec: only set ident mapping for ram.
>         - almost

almost what?

https://lkml.org/lkml/2013/1/14/325

I said I would  not add commit it for that.

>
> * [PATCH 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
>         - except sentinel, almost no change

?

>
> * [PATCH 23/31] x86, boot: update comments about entries for 64bit image
>         - almost no change

I explained that I copied that from 32bit, and if you want to change it
along with 32bit, that needs to be done later.

>
> How's that for "almost none"?!
>
> Oh, and also, some of the suggestions you've taken but then changed
> again making them wrong. Here's an example:
>
> Your initial change had:
>
>> +The memory for struct boot_params should be allocated under or above
>> +4G and initialized to all zero.
>
> I suggested:
>
> "Memory for struct boot_params may be allocated anywhere (even above
> 4G). This memory must be zeroed out."
>
> You changed it to:
>
> "The memory for struct boot_params could be allocated anywhere (even
> above 4G) and initialized to all zero."
>
> which still reads funny and has a couple of issues.

I did not see anything wrong.

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-15 16:43   ` Yinghai Lu
@ 2013-01-15 19:28     ` Yinghai Lu
  2013-01-16 11:32       ` Stefano Stabellini
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-15 19:28 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Konrad Rzeszutek Wilk

[-- Attachment #1: Type: text/plain, Size: 2839 bytes --]

On Tue, Jan 15, 2013 at 8:43 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, Jan 15, 2013 at 4:19 AM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>>> could be found at:
>>>
>>>         git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot
>>
>> I tried to boot this kernel as PV guest with 2GB of RAM, but
>> unfortunately it crashes early on at boot (earlyprintk=xen log
>> appended).
>>
>>
>>
>> mapping kernel into physical memory
>> about to get started...
>> [    0.000000] Initializing cgroup subsys cpuset
>> [    0.000000] Linux version 3.8.0-rc3+ (sstabellini@st22) (gcc version 4.4.5 (Debian 4.4.5-8) ) #4 SMP Tue Jan 15 12:11:59 UTC 2013
>> [    0.000000] Command line: root=/dev/xvda1 rw loglevel=9 debug console=hvc0 earlyprintk=xen
>> [    0.000000] ACPI in unprivileged domain disabled
>> [    0.000000] e820: BIOS-provided physical RAM map:
>> [    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
>> [    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
>> [    0.000000] Xen: [mem 0x0000000000100000-0x000000007fffffff] usable
>> [    0.000000] bootconsole [xenboot0] enabled
>> [    0.000000] NX (Execute Disable) protection: active
>> [    0.000000] DMI not present or invalid.
>> [    0.000000] e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
>> [    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
>> [    0.000000] No AGP bridge found
>> [    0.000000] e820: last_pfn = 0x80000 max_arch_pfn = 0x400000000
>> [    0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 24576
>> [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
>> [    0.000000]  [mem 0x00000000-0x000fffff] page 4k
>> [    0.000000] init_memory_mapping: [mem 0x7fe00000-0x7fffffff]
>> [    0.000000]  [mem 0x7fe00000-0x7fffffff] page 4k
>> [    0.000000] BRK [0x023b8000, 0x023b8fff] PGTABLE
>> [    0.000000] BRK [0x023b9000, 0x023b9fff] PGTABLE
>> [    0.000000] init_memory_mapping: [mem 0x7c000000-0x7fdfffff]
>> [    0.000000]  [mem 0x7c000000-0x7fdfffff] page 4k
>> [    0.000000] BRK [0x023ba000, 0x023bafff] PGTABLE
>> [    0.000000] BRK [0x023bb000, 0x023bbfff] PGTABLE
>> [    0.000000] BRK [0x023bc000, 0x023bcfff] PGTABLE
>> [    0.000000] init_memory_mapping: [mem 0x00100000-0x7bffffff]
>> [    0.000000]  [mem 0x00100000-0x7bffffff] page 4k
>> (XEN) d15:v0: unhandled page fault (ec=0000)
>> (XEN) Pagetable walk from ffffea0000080330:
>> (XEN)  L4[0x1d4] = 0000000000000000 ffffffffffffffff
>> (XEN) domain_crash_sync called from entry.S
>> (XEN) Domain 15 (vcpu#0) crashed on cpu#3:
>> (XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
>
> OK, I think I know the cause now; will check if there is a good way
> to fix it.

Can you please test attached patch?

Thanks

Yinghai

[-- Attachment #2: fix_xen_2g.patch --]
[-- Type: application/octet-stream, Size: 1235 bytes --]

---
 arch/x86/mm/init.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -25,6 +25,8 @@ static unsigned long __initdata pgt_buf_
 
 static unsigned long min_pfn_mapped;
 
+static bool __initdata can_use_brk_pgt = true;
+
 /*
  * Pages returned are already directly mapped.
  *
@@ -47,7 +49,7 @@ __ref void *alloc_low_pages(unsigned int
 						__GFP_ZERO, order);
 	}
 
-	if ((pgt_buf_end + num) > pgt_buf_top) {
+	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
 		unsigned long ret;
 		if (min_pfn_mapped >= max_pfn_mapped)
 			panic("alloc_low_page: ran out of memory");
@@ -372,8 +374,15 @@ static unsigned long __init init_range_m
 		if (start >= end)
 			continue;
 
+		/*
+		 * if it is overlapping with brk pgt, we need to
+		 * alloc pgt buf from memblock instead.
+		 */
+		can_use_brk_pgt = max(start, (u64)pgt_buf_end<<PAGE_SHIFT) >=
+				    min(end, (u64)pgt_buf_top<<PAGE_SHIFT);
 		init_memory_mapping(start, end);
 		mapped_ram_size += end - start;
+		can_use_brk_pgt = true;
 	}
 
 	return mapped_ram_size;
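
The overlap test in the hunk above can be read as a plain interval check: two
half-open ranges are disjoint exactly when the larger of the two starts is at
or beyond the smaller of the two ends. Below is a minimal user-space sketch of
that idea in C. The helper name ranges_disjoint() and the addresses in the
note that follows are illustrative assumptions, not code or values taken from
the kernel tree.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true when the
 * half-open ranges [a_start, a_end) and [b_start, b_end) share no byte.
 * This mirrors the max(start, ...) >= min(end, ...) expression that the
 * patch assigns to can_use_brk_pgt.
 */
static bool ranges_disjoint(uint64_t a_start, uint64_t a_end,
			    uint64_t b_start, uint64_t b_end)
{
	uint64_t highest_start = a_start > b_start ? a_start : b_start;
	uint64_t lowest_end = a_end < b_end ? a_end : b_end;

	return highest_start >= lowest_end;
}
```

As an example, assume the unused part of the BRK page-table buffer sits at
[0x023bd000, 0x023c0000). Mapping [0x00100000, 0x7c000000) covers that buffer,
so the check returns false and page tables would have to come from memblock
instead. Mapping [0x7c000000, 0x7fe00000) does not touch it, so the check
returns true.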

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-15 18:43                           ` Yinghai Lu
@ 2013-01-15 19:49                             ` Borislav Petkov
  2013-01-15 20:16                               ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Borislav Petkov @ 2013-01-15 19:49 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Tue, Jan 15, 2013 at 10:43:38AM -0800, Yinghai Lu wrote:
> I only change lines according to the response that i could understand
> and i think that is right.

And those you don't understand and/or don't think are right, you simply
ignore? How about asking if you don't understand them? How about saying
that you don't agree with them so that we can talk it out as it is the
case on lkml normally?

> are you looking wrong place?
> 
> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=fd6da054a055aea9cf265a005563073ada6e1af0
> 
>  x86, 64bit, realmode: Use init_level4_pgt to set trapmoline_pgd directly
> author	Yinghai Lu <yinghai@kernel.org>	
> 	Tue, 15 Jan 2013 05:11:07 +0000 (21:11 -0800)
> committer	Yinghai Lu <yinghai@kernel.org>	
> 	Tue, 15 Jan 2013 05:11:07 +0000 (21:11 -0800)
> with #PF handler way to set early page table, level3_ident will go away with
> 64bit native path.
> 
> So just use entries in init_level4_pgt to set them in tramopline_pgd
							^^^^^^^^^^^^^^
No, I'm looking at the right place, you're not seeing it:

Let me show you once again:

tramopline_pgd
trapmoline_pgd

See the letter swap?

> > * [PATCH 08/31] x86, 64bit: early #PF handler set page table
> >         - almost no changes, SOB chain still wrong
> 
> HPA and I have explained that to you.

No, hpa commented only on the handful commits without SOB. But if he's
fine with having only your SOB, then ok.

> http://lkml.org/lkml/2013/1/12/115
> 
> >
> > * [PATCH 12/31] x86: add get_ramdisk_image/size()
> >        - no change
> 
> I respond: will insert other lines between them.

Again: I'm not talking about spacing the functions (but that would be
good too). Here's what I would like to see (btw, I'm explaining this for
the third time):

static u64 __init get_ramdisk_image(void)
{
       return (u64)boot_params.hdr.ramdisk_image;
}

static u64 __init get_ramdisk_size(void)
{
       return (u64)boot_params.hdr.ramdisk_size;
}

No need for the useless variable declaration and improved readability -
a win-win situation.

> > * [PATCH 13/31] x86, boot: add get_cmd_line_ptr()
> >         - no change
> 
> same above

And I say too "same as above".

> > * [PATCH 14/31] x86, boot: move checking of cmd_line_ptr out of common path
> >         - no change
> 
> same above

No, this is not same as above - I'd simply like to have the comment
explaining why we do the >= 0x100000 check in the code.

> > * [PATCH 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
> >         - no change
> 
> https://patchwork.kernel.org/patch/1930741/
> 
> I pointed you about the grammar. ...

And I said:

"And yet, this is not the point - the point is that this code is
complicated enough as it is so why not make the easy things trivial so
that people looking at it months or even years from now can still try to
understand it.

So what it is defined by the standard?! Just add that line anyway! Then
there's no need to go check what was meant. This way it is *there*,
*explicit* and everyone *knows* what is meant - even people who don't
sleep with C99std under their pillow."

IOW, add the initialization *anyways*!

> > * [PATCH 21/31] x86, kexec: only set ident mapping for ram.
> >         - almost
> 
> almost what?

That it is almost fixed:

"This patch exposes THE pfn_mapped array..."

"This patch relies on new THE kernel_ident_mapping_init..."

The "THE" in capital letters are missing.

This is what I mean: you take my comments but not really - you still
change them on the way and make the text funny.

> > * [PATCH 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
> >         - except sentinel, almost no change
> 
> ?

Well, I'm not going to repeat myself ad infinitum and ad absurdum - go
look at the mail thread and read my comments I had and then look at your
commit message again.

> > * [PATCH 23/31] x86, boot: update comments about entries for 64bit image
> >         - almost no change
> 
> I explained that i copied that from 32bit, and if you want to change
> with 32bit need to do that later.

You're adding new text and we want it to be as clean as possible.
According to your logic, if you copy/paste code from the kernel and it
has a bug, the newly pasted portion would have that same bug too and you
won't fix it and let someone else fix it? Even though I told you how to
fix it?

So why don't you simply integrate my suggestions verbatim into the text
instead of opposing so much? I'm not forcing you to do anything bad -
I'm simply commenting on your work so that it can get better. Why the
hell are you still opposing to that?

> >> +The memory for struct boot_params should be allocated under or above
> >> +4G and initialized to all zero.
> >
> > I suggested:
> >
> > "Memory for struct boot_params may be allocated anywhere (even above
> > 4G). This memory must be zeroed out."
> >
> > You changed it to:
> >
> > "The memory for struct boot_params could be allocated anywhere (even
> > above 4G) and initialized to all zero."
> >
> > which still reads funny and has a couple of issues.
> 
> did not see anything wrong.

That's why I'M POINTING IT TO YOU! TO FUCKING SEE IT! So if you still
don't see anything wrong, simply take my suggestions as they are,
without changing them on the fly and stop debating.

Btw, if you'd taken the time to simply add those requested changes
instead of stubbornly and uselessly debating, we would've been done by
now.

Thanks.. but no thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-15 19:49                             ` Borislav Petkov
@ 2013-01-15 20:16                               ` Yinghai Lu
  2013-01-15 20:28                                 ` Borislav Petkov
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-15 20:16 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, Eric W. Biederman, Andrew Morton, Jan Kiszka,
	Jason Wessel, linux-kernel, Rob Landley, Matt Fleming,
	Gokul Caushik, Josh Triplett, Joe Millenbach

On Tue, Jan 15, 2013 at 11:49 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Jan 15, 2013 at 10:43:38AM -0800, Yinghai Lu wrote:
>> I only change lines according to the response that i could understand
>> and i think that is right.
>
> And those you don't understand and/or don't think are right, you simply
> ignore? How about asking if you don't understand them? How about saying
> that you don't agree with them so that we can talk it out as it is the
> case on lkml normally?
>
>> are you looking wrong place?
>>
>> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=fd6da054a055aea9cf265a005563073ada6e1af0
>>
>>  x86, 64bit, realmode: Use init_level4_pgt to set trapmoline_pgd directly
>> author        Yinghai Lu <yinghai@kernel.org>
>>       Tue, 15 Jan 2013 05:11:07 +0000 (21:11 -0800)
>> committer     Yinghai Lu <yinghai@kernel.org>
>>       Tue, 15 Jan 2013 05:11:07 +0000 (21:11 -0800)
>> with #PF handler way to set early page table, level3_ident will go away with
>> 64bit native path.
>>
>> So just use entries in init_level4_pgt to set them in tramopline_pgd
>                                                         ^^^^^^^^^^^^^^
> No, I'm looking at the right place, you're not seeing it:
>
> Let me show you once again:
>
> tramopline_pgd
> trapmoline_pgd
>
> See the letter swap?

oh, I only changed trampoline_pgt to tramoplint_pgd.

>
>> > * [PATCH 08/31] x86, 64bit: early #PF handler set page table
>> >         - almost no changes, SOB chain still wrong
>>
>> HPA and I have explained that to you.
>
> No, hpa commented only on the handful commits without SOB. But if he's
> fine with having only your SOB, then ok.

what is point that he comment that?

>
>> http://lkml.org/lkml/2013/1/12/115
>>
>> >
>> > * [PATCH 12/31] x86: add get_ramdisk_image/size()
>> >        - no change
>>
>> I respond: will insert other lines between them.
>
> Again: I'm not talking about spacing the functions (but that would be
> good too). Here's what I would like to see (btw, I'm explaining this for
> the third time):
>
> static u64 __init get_ramdisk_image(void)
> {
>        return (u64)boot_params.hdr.ramdisk_image;
> }
>
> static u64 __init get_ramdisk_size(void)
> {
>        return (u64)boot_params.hdr.ramdisk_size;
> }
>
> No need for the useless variable declaration and improved readability -
> a win-win situation.

no,

I mean that later, in a following patch, I have:

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -298,12 +298,16 @@ static u64 __init get_ramdisk_image(void
 {
        u64 ramdisk_image = boot_params.hdr.ramdisk_image;

+       ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
        return ramdisk_image;
 }
 static u64 __init get_ramdisk_size(void)
 {
        u64 ramdisk_size = boot_params.hdr.ramdisk_size;

+       ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
        return ramdisk_size;
 }
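
The reason for keeping the local variable is that the value is built from two
32-bit halves. A minimal user-space sketch of that combination follows,
assuming the low half comes from a field like hdr.ramdisk_image and the high
half from a field like ext_ramdisk_image. The function name combine_lo_hi()
is hypothetical and does not appear in the series.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build a 64-bit value from a low 32-bit field and
 * a high ("ext") 32-bit field. The high half is widened to 64 bits
 * before the shift, because shifting a 32-bit value by 32 is undefined
 * behaviour in C.
 */
static uint64_t combine_lo_hi(uint32_t lo, uint32_t hi)
{
	uint64_t value = lo;

	value |= (uint64_t)hi << 32;

	return value;
}
```

With a zero high half the result equals the low half, which is what a
bootloader that knows nothing about the ext_* fields would produce, provided
boot_params was zeroed.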


>
>> > * [PATCH 13/31] x86, boot: add get_cmd_line_ptr()
>> >         - no change
>>
>> same above
>
> And I say too "same as above".
>
>> > * [PATCH 14/31] x86, boot: move checking of cmd_line_ptr out of common path
>> >         - no change
>>
>> same above
>
> No, this is not same as above - I'd simply like to have the comment
> explaining why we do the >= 0x100000 check in the code.

I moved that code, and kept the "/* inaccessible */"

 int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
 static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-       return __cmdline_find_option(boot_params.hdr.cmd_line_ptr, option, buffer, bufsize);
+       u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+       if (cmd_line_ptr >= 0x100000)
+               return -1;      /* inaccessible */
+
+       return __cmdline_find_option(cmd_line_ptr, option, buffer, bufsize);

--- linux-2.6.orig/arch/x86/boot/cmdline.c
+++ linux-2.6/arch/x86/boot/cmdline.c
@@ -41,8 +41,8 @@ int __cmdline_find_option(u32 cmdline_pt
                st_bufcpy       /* Copying this to buffer */
        } state = st_wordstart;

-       if (!cmdline_ptr || cmdline_ptr >= 0x100000)
-               return -1;      /* No command line, or inaccessible */
+       if (!cmdline_ptr)
+               return -1;      /* No command line */
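
For readers wondering about the 0x100000 limit: as far as I understand it,
this code is part of the 16-bit setup stage, which reaches the command line
through real-mode segment addressing and so treats anything at or above 1 MiB
as unreachable. A small sketch of the two checks after the split follows. The
names REAL_MODE_LIMIT and cmdline_reachable() are made up for illustration
and are not in the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit of what the real-mode setup code can address: 1 MiB. */
#define REAL_MODE_LIMIT 0x100000u

/*
 * Hypothetical helper combining both checks from the diffs above:
 * a NULL pointer means "no command line", and a pointer at or above
 * the 1 MiB limit means "inaccessible from this code".
 */
static bool cmdline_reachable(uint32_t cmd_line_ptr)
{
	if (!cmd_line_ptr)
		return false;	/* no command line */

	if (cmd_line_ptr >= REAL_MODE_LIMIT)
		return false;	/* inaccessible */

	return true;
}
```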



>
>> > * [PATCH 20/31] x86, kexec: replace ident_mapping_init and init_level4_page
>> >         - no change
>>
>> https://patchwork.kernel.org/patch/1930741/
>>
>> I pointed you about the grammar. ...
>
> And I said:
>
> "And yet, this is not the point - the point is that this code is
> complicated enough as it is so why not make the easy things trivial so
> that people looking at it months or even years from now can still try to
> understand it.

then stop coding.

>
> So what it is defined by the standard?! Just add that line anyway! Then
> there's no need to go check what was meant. This way it is *there*,
> *explicit* and everyone *knows* what is meant - even people who don't
> sleep with C99std under their pillow."
>
> IOW, add the initialization *anyways*!

No.

>
>> > * [PATCH 21/31] x86, kexec: only set ident mapping for ram.
>> >         - almost
>>
>> almost what?
>
> That it is almost fixed:
>
> "This patch exposes THE pfn_mapped array..."
>
> "This patch relies on new THE kernel_ident_mapping_init..."
>
> The "THE" in capital letters are missing.
>
> This is what I mean: you take my comments but not really - you still
> change them on the way and make the text funny.
>
>> > * [PATCH 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
>> >         - except sentinel, almost no change
>>
>> ?
>
> Well, I'm not going to repeat myself ad infinitum and ad absurdum - go
> look at the mail thread and read my comments I had and then look at your
> commit message again.
>
>> > * [PATCH 23/31] x86, boot: update comments about entries for 64bit image
>> >         - almost no change
>>
>> I explained that i copied that from 32bit, and if you want to change
>> with 32bit need to do that later.
>
> You're adding new text and we want it to be as clean as possible.
> According to your logic, if you copy/paste code from the kernel and it
> has a bug, the newly pasted portion would have that same bug too and you
> won't fix it and let someone else fix it? Even though I told you how to
> fix it?
>
> So why don't you simply integrate my suggestions verbatim into the text
> instead of opposing so much? I'm not forcing you to do anything bad -
> I'm simply commenting on your work so that it can get better. Why the
> hell are you still opposing to that?

why ? your understanding is not right every time.

>
>> >> +The memory for struct boot_params should be allocated under or above
>> >> +4G and initialized to all zero.
>> >
>> > I suggested:
>> >
>> > "Memory for struct boot_params may be allocated anywhere (even above
>> > 4G). This memory must be zeroed out."
>> >
>> > You changed it to:
>> >
>> > "The memory for struct boot_params could be allocated anywhere (even
>> > above 4G) and initialized to all zero."
>> >
>> > which still reads funny and has a couple of issues.
>>
>> did not see anything wrong.
>
> That's why I'M POINTING IT TO YOU! TO FUCKING SEE IT! So if you still

this is second time that you use F that in this.

I'm going to put your email in the spam filter.

Bye.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2013-01-15 20:16                               ` Yinghai Lu
@ 2013-01-15 20:28                                 ` Borislav Petkov
  0 siblings, 0 replies; 199+ messages in thread
From: Borislav Petkov @ 2013-01-15 20:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, Jan Kiszka, Jason Wessel, linux-kernel,
	Rob Landley, Matt Fleming, Gokul Caushik, Josh Triplett,
	Joe Millenbach

On Tue, Jan 15, 2013 at 12:16:11PM -0800, Yinghai Lu wrote:
> then stop coding.

I'll stop coding when you write one correct sentence in English.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-15 19:28     ` Yinghai Lu
@ 2013-01-16 11:32       ` Stefano Stabellini
  2013-01-16 17:31         ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Stefano Stabellini @ 2013-01-16 11:32 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Konrad Rzeszutek Wilk

On Tue, 15 Jan 2013, Yinghai Lu wrote:
> On Tue, Jan 15, 2013 at 8:43 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Tue, Jan 15, 2013 at 4:19 AM, Stefano Stabellini
> > <stefano.stabellini@eu.citrix.com> wrote:
> >>> could be found at:
> >>>
> >>>         git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot
> >>
> >> I tried to boot this kernel as PV guest with 2GB of RAM, but
> >> unfortunately it crashes early on at boot (earlyprintk=xen log
> >> appended).
> >>
> >>
> >>
> >> mapping kernel into physical memory
> >> about to get started...
> >> [    0.000000] Initializing cgroup subsys cpuset
> >> [    0.000000] Linux version 3.8.0-rc3+ (sstabellini@st22) (gcc version 4.4.5 (Debian 4.4.5-8) ) #4 SMP Tue Jan 15 12:11:59 UTC 2013
> >> [    0.000000] Command line: root=/dev/xvda1 rw loglevel=9 debug console=hvc0 earlyprintk=xen
> >> [    0.000000] ACPI in unprivileged domain disabled
> >> [    0.000000] e820: BIOS-provided physical RAM map:
> >> [    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
> >> [    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
> >> [    0.000000] Xen: [mem 0x0000000000100000-0x000000007fffffff] usable
> >> [    0.000000] bootconsole [xenboot0] enabled
> >> [    0.000000] NX (Execute Disable) protection: active
> >> [    0.000000] DMI not present or invalid.
> >> [    0.000000] e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
> >> [    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
> >> [    0.000000] No AGP bridge found
> >> [    0.000000] e820: last_pfn = 0x80000 max_arch_pfn = 0x400000000
> >> [    0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 24576
> >> [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
> >> [    0.000000]  [mem 0x00000000-0x000fffff] page 4k
> >> [    0.000000] init_memory_mapping: [mem 0x7fe00000-0x7fffffff]
> >> [    0.000000]  [mem 0x7fe00000-0x7fffffff] page 4k
> >> [    0.000000] BRK [0x023b8000, 0x023b8fff] PGTABLE
> >> [    0.000000] BRK [0x023b9000, 0x023b9fff] PGTABLE
> >> [    0.000000] init_memory_mapping: [mem 0x7c000000-0x7fdfffff]
> >> [    0.000000]  [mem 0x7c000000-0x7fdfffff] page 4k
> >> [    0.000000] BRK [0x023ba000, 0x023bafff] PGTABLE
> >> [    0.000000] BRK [0x023bb000, 0x023bbfff] PGTABLE
> >> [    0.000000] BRK [0x023bc000, 0x023bcfff] PGTABLE
> >> [    0.000000] init_memory_mapping: [mem 0x00100000-0x7bffffff]
> >> [    0.000000]  [mem 0x00100000-0x7bffffff] page 4k
> >> (XEN) d15:v0: unhandled page fault (ec=0000)
> >> (XEN) Pagetable walk from ffffea0000080330:
> >> (XEN)  L4[0x1d4] = 0000000000000000 ffffffffffffffff
> >> (XEN) domain_crash_sync called from entry.S
> >> (XEN) Domain 15 (vcpu#0) crashed on cpu#3:
> >> (XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
> >
> > ok, i think i know the cause for now,  will check if there is good way
> > to fix it.
> 
> Can you please test attached patch?

Yes, this patch seems to fix the problem.
I have also run the kernel with this patch as dom0 and PV domU with
various memory configurations up to 8GB, and I have no errors to report.

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-16 11:32       ` Stefano Stabellini
@ 2013-01-16 17:31         ` Yinghai Lu
  2013-01-16 17:38           ` H. Peter Anvin
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-16 17:31 UTC (permalink / raw)
  To: Stefano Stabellini, H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	Konrad Rzeszutek Wilk

On Wed, Jan 16, 2013 at 3:32 AM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Tue, 15 Jan 2013, Yinghai Lu wrote:
>> On Tue, Jan 15, 2013 at 8:43 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > On Tue, Jan 15, 2013 at 4:19 AM, Stefano Stabellini
>> > <stefano.stabellini@eu.citrix.com> wrote:
>> >>> could be found at:
>> >>>
>> >>>         git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot
>> >>
>> >> I tried to boot this kernel as PV guest with 2GB of RAM, but
>> >> unfortunately it crashes early on at boot (earlyprintk=xen log
>> >> appended).
>> >>
>> >>
>> >>
>> >> mapping kernel into physical memory
>> >> about to get started...
>> >> [    0.000000] Initializing cgroup subsys cpuset
>> >> [    0.000000] Linux version 3.8.0-rc3+ (sstabellini@st22) (gcc version 4.4.5 (Debian 4.4.5-8) ) #4 SMP Tue Jan 15 12:11:59 UTC 2013
>> >> [    0.000000] Command line: root=/dev/xvda1 rw loglevel=9 debug console=hvc0 earlyprintk=xen
>> >> [    0.000000] ACPI in unprivileged domain disabled
>> >> [    0.000000] e820: BIOS-provided physical RAM map:
>> >> [    0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
>> >> [    0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
>> >> [    0.000000] Xen: [mem 0x0000000000100000-0x000000007fffffff] usable
>> >> [    0.000000] bootconsole [xenboot0] enabled
>> >> [    0.000000] NX (Execute Disable) protection: active
>> >> [    0.000000] DMI not present or invalid.
>> >> [    0.000000] e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
>> >> [    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
>> >> [    0.000000] No AGP bridge found
>> >> [    0.000000] e820: last_pfn = 0x80000 max_arch_pfn = 0x400000000
>> >> [    0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 24576
>> >> [    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
>> >> [    0.000000]  [mem 0x00000000-0x000fffff] page 4k
>> >> [    0.000000] init_memory_mapping: [mem 0x7fe00000-0x7fffffff]
>> >> [    0.000000]  [mem 0x7fe00000-0x7fffffff] page 4k
>> >> [    0.000000] BRK [0x023b8000, 0x023b8fff] PGTABLE
>> >> [    0.000000] BRK [0x023b9000, 0x023b9fff] PGTABLE
>> >> [    0.000000] init_memory_mapping: [mem 0x7c000000-0x7fdfffff]
>> >> [    0.000000]  [mem 0x7c000000-0x7fdfffff] page 4k
>> >> [    0.000000] BRK [0x023ba000, 0x023bafff] PGTABLE
>> >> [    0.000000] BRK [0x023bb000, 0x023bbfff] PGTABLE
>> >> [    0.000000] BRK [0x023bc000, 0x023bcfff] PGTABLE
>> >> [    0.000000] init_memory_mapping: [mem 0x00100000-0x7bffffff]
>> >> [    0.000000]  [mem 0x00100000-0x7bffffff] page 4k
>> >> (XEN) d15:v0: unhandled page fault (ec=0000)
>> >> (XEN) Pagetable walk from ffffea0000080330:
>> >> (XEN)  L4[0x1d4] = 0000000000000000 ffffffffffffffff
>> >> (XEN) domain_crash_sync called from entry.S
>> >> (XEN) Domain 15 (vcpu#0) crashed on cpu#3:
>> >> (XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
>> >
>> > ok, i think i know the cause for now,  will check if there is good way
>> > to fix it.
>>
>> Can you please test attached patch?
>
> Yes, this patch seems to fix the problem.
> I have also run the kernel with this patch as dom0 and PV domU with
> various memory configurations up to 8GB, and I have no errors to report.

Thanks a lot for testing all those configurations.

Hi Peter,
this bug is already in tip:x86/mm2, but it looks like it is only triggered by
one patch in for-x86-boot:
http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=af4f3bc044d1556f89bd488c7ea75e2a162bb273

From af4f3bc044d1556f89bd488c7ea75e2a162bb273 Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yinghai@kernel.org>
Date: Mon, 14 Jan 2013 21:11:05 -0800
Subject: [PATCH] x86, mm: Fix page table early allocation offset checking

During debugging loading kernel above 4G, found that one page is not used
in pre-allocated BRK area for early page allocation.

pgt_buf_top is address that can not be used, so should check if that new
end is above that top, otherwise last page will not be used.

Fix that checking and also add print out for allocation from pre-allocated
BRK area to catch possible bugs later.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 6f85de8..c4293cf 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -47,7 +47,7 @@ __ref void *alloc_low_pages(unsigned int num)
 						__GFP_ZERO, order);
 	}

-	if ((pgt_buf_end + num) >= pgt_buf_top) {
+	if ((pgt_buf_end + num) > pgt_buf_top) {
 		unsigned long ret;
 		if (min_pfn_mapped >= max_pfn_mapped)
 			panic("alloc_low_page: ran out of memory");
@@ -61,6 +61,8 @@ __ref void *alloc_low_pages(unsigned int num)
 	} else {
 		pfn = pgt_buf_end;
 		pgt_buf_end += num;
+		printk(KERN_DEBUG "BRK [%#010lx, %#010lx] PGTABLE\n",
+			pfn << PAGE_SHIFT, (pgt_buf_end << PAGE_SHIFT) - 1);
 	}

 	for (i = 0; i < num; i++) {
-- 
1.7.7.6
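
To make the off-by-one concrete, here is a small user-space sketch. It assumes
pgt_buf_top is an exclusive bound, that is, the free page frames are
[pgt_buf_end, pgt_buf_top). The function names fits_old() and fits_new() are
hypothetical and only model the two versions of the condition.

```c
#include <stdbool.h>

/*
 * Old condition: fall back to the slow path when end + num >= top.
 * With an exclusive top this rejects a request that would end exactly
 * at top, so the last page of the buffer is never handed out.
 */
static bool fits_old(unsigned long buf_end, unsigned long buf_top,
		     unsigned int num)
{
	return !((buf_end + num) >= buf_top);
}

/*
 * New condition: fall back only when end + num > top, so a request
 * that ends exactly at top is still served from the buffer.
 */
static bool fits_new(unsigned long buf_end, unsigned long buf_top,
		     unsigned int num)
{
	return !((buf_end + num) > buf_top);
}
```

With one page left (buf_end = 5, buf_top = 6) and a one-page request, the old
test refuses the page and the new test grants it. Once the buffer is really
full (buf_end = 6, buf_top = 6), both refuse.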


So we could either:
1. put the fix first in for-x86-boot, or
2. fold both the patch that uncovers the bug and the fix into x86/mm2; I would
   then resend the patches and you would rebase the whole tip:x86/mm2.

Please let me know which one you prefer.

Thanks

Yinghai

^ permalink raw reply related	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-16 17:31         ` Yinghai Lu
@ 2013-01-16 17:38           ` H. Peter Anvin
  2013-01-16 18:20             ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-16 17:38 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Konrad Rzeszutek Wilk

On 01/16/2013 09:31 AM, Yinghai Lu wrote:
>
> From af4f3bc044d1556f89bd488c7ea75e2a162bb273 Mon Sep 17 00:00:00 2001
> From: Yinghai Lu <yinghai@kernel.org>
> Date: Mon, 14 Jan 2013 21:11:05 -0800
> Subject: [PATCH] x86, mm: Fix page table early allocation offset checking
>
> During debugging loading kernel above 4G, found that one page is not used
> in pre-allocated BRK area for early page allocation.
>
> pgt_buf_top is address that can not be used, so should check if that new
> end is above that top, otherwise last page will not be used.
>

The first sentence here doesn't parse, and this description doesn't give 
any hint to anyone who is researching this code in say, five years, what 
problems this caused.

I can't really figure it out, either; from looking at the thread and the 
patch I'm assuming the problem is somehow that the code failed to use 
the last page in the brk buffer, which somehow led to a Xen boot 
failure... but the bits from here to there are totally unclear.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-16 17:38           ` H. Peter Anvin
@ 2013-01-16 18:20             ` Yinghai Lu
  2013-01-17  2:35               ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-16 18:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Konrad Rzeszutek Wilk

On Wed, Jan 16, 2013 at 9:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 01/16/2013 09:31 AM, Yinghai Lu wrote:
>>
>>
>
> The first sentence here doesn't parse, and this description doesn't give any
> hint to anyone who is researching this code in say, five years, what
> problems this caused.
>
> I can't really figure it out, either; from looking at the thread and the
> patch I'm assuming the problem is somehow that the code failed to use the
> last page in the brk buffer, which somehow lead to a Xen boot failure... but
> the bits from here to there are totally unclear.

OK, let me try again.
tip:x86/mm2 on its own does not fail in the Xen testing done by Stefano and
Konrad.

Stefano found a boot failure with PV Xen on for-x86-boot.

for-x86-boot has one patch that tries to reclaim a wasted page in the
pre-allocated BRK area, and that causes a Xen PV guest to fail with a 2G setup.
That patch just uncovers a bug that is already in x86/mm2.

So the questions are:
1. Do we need to rebase x86/mm2 to fold the two patches in?
2. Or should we just put the fixes at the beginning of for-x86-boot?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2013-01-16 18:20             ` Yinghai Lu
@ 2013-01-17  2:35               ` Yinghai Lu
  0 siblings, 0 replies; 199+ messages in thread
From: Yinghai Lu @ 2013-01-17  2:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Konrad Rzeszutek Wilk

On Wed, Jan 16, 2013 at 10:20 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Jan 16, 2013 at 9:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 01/16/2013 09:31 AM, Yinghai Lu wrote:
>>>
>>>
>>
>> The first sentence here doesn't parse, and this description doesn't give any
>> hint to anyone who is researching this code in say, five years, what
>> problems this caused.
>>
>> I can't really figure it out, either; from looking at the thread and the
>> patch I'm assuming the problem is somehow that the code failed to use the
>> last page in the brk buffer, which somehow lead to a Xen boot failure... but
>> the bits from here to there are totally unclear.
>
> ok, let me try again.
> tip:x86/mm2 does not fail on xen testing from Stefano and Konrad.
>
> Stefano found booting failure with PV xen with for-x86-boot.
>
> for-x86-boot has one patch that is trying to get back an wasted page
> in pre-allocated BRK
> and that cause xen pv fails with 2G setup.
> that patch just uncover one bug in x86/mm2.
>
> So questions:
> 1. do we need to rebase x86/mm2 to fold the two patches in. ?
> 2. or just put fixes at beginning for-x86-boot ?
>

I updated the for-x86-boot branch using the second option:
    git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot

^ permalink raw reply	[flat|nested] 199+ messages in thread

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-15  6:19                                                 ` Yinghai Lu
@ 2013-01-18 15:55                                                   ` Konrad Rzeszutek Wilk
  2013-01-24 15:39                                                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-18 15:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Shuah Khan, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, Borislav Petkov, Jan Kiszka,
	Jason Wessel, linux-kernel, Joerg Roedel

On Mon, Jan 14, 2013 at 10:19:22PM -0800, Yinghai Lu wrote:
> On Fri, Jan 11, 2013 at 9:49 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Fri, Jan 11, 2013 at 8:52 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> >>>
> >>> I need to check this patch out and then also test-run them on IA64, AMD-VI, Calgary-X
> >>> GART and Intel VT-d to make a sanity test.
> >>
> >> That would be great. Please check the two attached patches. Or do you
> >> want me to update the for-x86-boot branch so you can test that instead?
> >>
> >> But if you want to check memmap=4095M$1M, you will need to test on the
> >> newer branch.
> >
> >
> > I updated the for-x86-boot branch.
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> > for-x86-boot
> >
> 
> Konrad,
> 
> Did you get a chance to test that branch on your setups?

I tested it on the IA64 box - it worked without any hiccups. Going to
try the Calgary-X and the rest of the machines over the weekend.
> 
> Thanks
> 
> Yinghai

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-18 15:55                                                   ` Konrad Rzeszutek Wilk
@ 2013-01-24 15:39                                                     ` Konrad Rzeszutek Wilk
  2013-01-24 16:51                                                       ` Shuah Khan
  0 siblings, 1 reply; 199+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-24 15:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Yinghai Lu, Eric W. Biederman, Shuah Khan, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andrew Morton, Borislav Petkov,
	Jan Kiszka, Jason Wessel, linux-kernel, Joerg Roedel

On Fri, Jan 18, 2013 at 10:55:35AM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 14, 2013 at 10:19:22PM -0800, Yinghai Lu wrote:
> > On Fri, Jan 11, 2013 at 9:49 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > > On Fri, Jan 11, 2013 at 8:52 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > >>>
> > >>> I need to check this patch out and then also test-run them on IA64, AMD-VI, Calgary-X
> > >>> GART and Intel VT-d to make a sanity test.
> > >>
> > >> That would be great. Please check the two attached patches. Or do you
> > >> want me to update the for-x86-boot branch so you can test that instead?
> > >>
> > >> But if you want to check memmap=4095M$1M, you will need to test on the
> > >> newer branch.
> > >
> > >
> > > I updated the for-x86-boot branch.
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> > > for-x86-boot
> > >
> > 
> > Konrad,
> > 
> > Did you get a chance to test that branch on your setups?
> 
> I tested it on the IA64 box - it worked without any hiccups. Going to
> try the Calgary-X and the rest of the machines over the weekend.

Worked without issues on AMD GART, AMD Vi, Intel VT-d, and on boxes
without any IOMMU.

I am having trouble getting my Calgary-X box to power on, so that
testing is taking a bit longer.

> > 
> > Thanks
> > 
> > Yinghai

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-24 15:39                                                     ` Konrad Rzeszutek Wilk
@ 2013-01-24 16:51                                                       ` Shuah Khan
  2013-01-24 19:22                                                         ` Shuah Khan
  0 siblings, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-24 16:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, Yinghai Lu, Eric W. Biederman,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	Joerg Roedel

On Thu, Jan 24, 2013 at 8:39 AM, Konrad Rzeszutek Wilk
<konrad@kernel.org> wrote:
> On Fri, Jan 18, 2013 at 10:55:35AM -0500, Konrad Rzeszutek Wilk wrote:
>> On Mon, Jan 14, 2013 at 10:19:22PM -0800, Yinghai Lu wrote:
>> > On Fri, Jan 11, 2013 at 9:49 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > > On Fri, Jan 11, 2013 at 8:52 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > >>>
>> > >>> I need to check this patch out and then also test-run them on IA64, AMD-VI, Calgary-X
>> > >>> GART and Intel VT-d to make a sanity test.
>> > >>
>> > >> That would be great. Please check the two attached patches. Or do you
>> > >> want me to update the for-x86-boot branch so you can test that instead?
>> > >>
>> > >> But if you want to check memmap=4095M$1M, you will need to test on the
>> > >> newer branch.
>> > >
>> > >
>> > > I updated the for-x86-boot branch.
>> > >
>> > > git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>> > > for-x86-boot
>> > >
>> >
>> > Konrad,
>> >
>> > Did you get a chance to test that branch on your setups?
>>
>> I tested it on the IA64 box - it worked without any hiccups. Going to
>> try the Calgary-X and the rest of the machines over the weekend.
>
> Worked without issues on AMD GART, AMD Vi, Intel VT-d, and on boxes
> without any IOMMU.
>
> I am having trouble getting my Calgary-X box to power on, so that
> testing is taking a bit longer.
>

I still have the AMD system on which I tested earlier versions of this work.
I have started builds with these patches on 3.7 and will let you know the
status.

-- Shuah

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-24 16:51                                                       ` Shuah Khan
@ 2013-01-24 19:22                                                         ` Shuah Khan
  2013-01-24 21:50                                                           ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-24 19:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, Yinghai Lu, Eric W. Biederman,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	Joerg Roedel

On Thu, Jan 24, 2013 at 9:51 AM, Shuah Khan <shuahkhan@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 8:39 AM, Konrad Rzeszutek Wilk
> <konrad@kernel.org> wrote:
>> On Fri, Jan 18, 2013 at 10:55:35AM -0500, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Jan 14, 2013 at 10:19:22PM -0800, Yinghai Lu wrote:
>>> > On Fri, Jan 11, 2013 at 9:49 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> > > On Fri, Jan 11, 2013 at 8:52 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> > >>>
>>> > >>> I need to check this patch out and then also test-run them on IA64, AMD-VI, Calgary-X
>>> > >>> GART and Intel VT-d to make a sanity test.
>>> > >>
>>> > >> That would be great. Please check the two attached patches. Or do you
>>> > >> want me to update the for-x86-boot branch so you can test that instead?
>>> > >>
>>> > >> But if you want to check memmap=4095M$1M, you will need to test on the
>>> > >> newer branch.
>>> > >
>>> > >
>>> > > I updated the for-x86-boot branch.
>>> > >
>>> > > git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>>> > > for-x86-boot
>>> > >
>>> >
>>> > Konrad,
>>> >
>>> > Did you get a chance to test that branch on your setups?
>>>
>>> I tested it on the IA64 box - it worked without any hiccups. Going to
>>> try the Calgary-X and the rest of the machines over the weekend.
>>
>> Worked without issues on AMD GART, AMD Vi, Intel VT-d, and on boxes
>> without any IOMMU.
>>
>> I am having trouble getting my Calgary-X box to power on, so that
>> testing is taking a bit longer.
>>
>
> I still have the AMD system I tested earlier versions of this work. I
> started compiles with these patches on 3.7 and will let you know the
> status.

Tested 3.8-rc4 with the patches on an AMD system with the IOMMU on. Looks
good. I can't easily test the low-memory paths on this system without
modifying the code to force it to be treated as low memory, so I only
tested the normal path. My main goal was to make sure the AMD IOMMU driver
can still enable swiotlb, and it does.

If you would like me to test the error cases, please let me know.

-- Shuah

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-24 19:22                                                         ` Shuah Khan
@ 2013-01-24 21:50                                                           ` Yinghai Lu
  2013-01-29  2:27                                                             ` Shuah Khan
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-24 21:50 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk, Eric W. Biederman,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	Joerg Roedel

On Thu, Jan 24, 2013 at 11:22 AM, Shuah Khan <shuahkhan@gmail.com> wrote:
>>
>> I still have the AMD system I tested earlier versions of this work. I
>> started compiles with these patches on 3.7 and will let you know the
>> status.
>
> Tested 3.8-rc4 with the patches on an AMD system with the IOMMU on. Looks
> good. I can't easily test the low-memory paths on this system without
> modifying the code to force it to be treated as low memory, so I only
> tested the normal path. My main goal was to make sure the AMD IOMMU driver
> can still enable swiotlb, and it does.

good.

>
> If you would like me to test the error cases, please let me know.

Please check
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-boot

and boot your test setups with memmap=4095$1M
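
(For reference, a minimal sketch, not kernel code, of which range such an option reserves. It assumes the documented `memmap=nn[KMG]$ss[KMG]` form, which marks the region from ss to ss+nn as reserved, and it uses the `memmap=4095M$1M` spelling mentioned earlier in this thread. `parse_size` and `reserved_range` are hypothetical helper names used only for illustration.)

```python
import re

# Size suffixes handled in this sketch (the kernel accepts more; K, M, G suffice here).
_SUFFIX = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse a number with an optional K/M/G suffix, e.g. '4095M' or '0x100000'."""
    m = re.fullmatch(r"(0[xX][0-9a-fA-F]+|\d+)([KMGkmg]?)", text)
    if m is None:
        raise ValueError(f"bad size: {text!r}")
    digits = m.group(1)
    base = 16 if digits.lower().startswith("0x") else 10
    return int(digits, base) * _SUFFIX[m.group(2).upper()]


def reserved_range(option):
    """Return (start, end_inclusive) for a 'memmap=SIZE$START' reserve option."""
    key, _, value = option.partition("=")
    if key != "memmap" or "$" not in value:
        raise ValueError(f"not a memmap reserve option: {option!r}")
    size_text, start_text = value.split("$", 1)
    start = parse_size(start_text)
    size = parse_size(size_text)
    return start, start + size - 1


start, end = reserved_range("memmap=4095M$1M")
print(f"reserved: [mem {start:#010x}-{end:#010x}]")
# prints: reserved: [mem 0x00100000-0xffffffff]
```

Under these assumptions the option covers everything from 1M up to just below 4G, so only RAM below 1M and above 4G stays usable.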

Thanks

Yinghai

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-24 21:50                                                           ` Yinghai Lu
@ 2013-01-29  2:27                                                             ` Shuah Khan
  2013-01-29  3:44                                                               ` Yinghai Lu
  0 siblings, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-29  2:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk, Eric W. Biederman,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	Joerg Roedel

On Thu, Jan 24, 2013 at 2:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Jan 24, 2013 at 11:22 AM, Shuah Khan <shuahkhan@gmail.com> wrote:
>>>
>>> I still have the AMD system I tested earlier versions of this work. I
>>> started compiles with these patches on 3.7 and will let you know the
>>> status.
>>
>> Tested 3.8-rc4 with the patches on an AMD system with the IOMMU on. Looks
>> good. I can't easily test the low-memory paths on this system without
>> modifying the code to force it to be treated as low memory, so I only
>> tested the normal path. My main goal was to make sure the AMD IOMMU driver
>> can still enable swiotlb, and it does.
>
> good.
>
>>
>> If you would like me to test the error cases, please let me know.
>
> please check
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-boot
>
> and boot your test setups with memmap=4095$1M
>
> Thanks
>
> Yinghai

Yinghai,

Your for-x86-boot tree boots on the AMD system I have. However, with the
memmap=4095$1M option, it panics very early in boot. I don't have
physical access to the console, so I will try to get you the panic
information tomorrow.

-- Shuah

* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-29  2:27                                                             ` Shuah Khan
@ 2013-01-29  3:44                                                               ` Yinghai Lu
  2013-01-31 19:28                                                                 ` Shuah Khan
  0 siblings, 1 reply; 199+ messages in thread
From: Yinghai Lu @ 2013-01-29  3:44 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk, Eric W. Biederman,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	Joerg Roedel

On Mon, Jan 28, 2013 at 6:27 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 2:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> Your for-x86-boot tree boots on the AMD system I have. However, with the
> memmap=4095$1M option, it panics very early in boot. I don't have
> physical access to the console, so I will try to get you the panic
> information tomorrow.

A panic is expected, but it is supposed to happen fairly late.

I tested on KVM and it panics late, as expected:

early console in setup code
Probing EDD (edd=off to disable)... ok
early console in decompress_kernel
decompress_kernel:
  input: [0x2c042c2-0x3555455], output: 0x1000000, heap: [0x355cc40-0x3564c3f]

Decompressing Linux... xz... Parsing ELF... done.
Booting the kernel.
[    0.000000] bootconsole [uart0] enabled
[    0.000000]    real_mode_data :      phys 0000000000014480
[    0.000000]    real_mode_data :      virt ffff880000014480
[    0.000000]       boot_params : init virt ffffffff831489c0
[    0.000000]       boot_params :      phys 00000000031489c0
[    0.000000]       boot_params :      virt ffff8800031489c0
[    0.000000] boot_command_line : init virt ffffffff83036020
[    0.000000] boot_command_line :      phys 0000000003036020
[    0.000000] boot_command_line :      virt ffff880003036020
[    0.000000] Kernel Layout:
[    0.000000]   .text: [0x01000000-0x021825b9]
[    0.000000] .rodata: [0x02200000-0x02a24fff]
[    0.000000]   .data: [0x02c00000-0x02dcc8bf]
[    0.000000]   .init: [0x02dce000-0x03133fff]
[    0.000000]    .bss: [0x03142000-0x03ddffff]
[    0.000000]    .brk: [0x03de0000-0x03e04fff]
[    0.000000] memblock_reserve: [0x0009fc00-0x000fffff] * BIOS reserved
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.8.0-rc5-yh-01253-gdf8eac2-dirty (yhlu@linux-siqj.site) (gcc version 4.7.1 20120723 [gcc-4_7-branch revision 189773] (SUSE Linux) ) #1194 SMP Mon Jan 28 19:36:41 PST 2013
[    0.000000] memblock_reserve: [0x01000000-0x03ddffff] TEXT DATA BSS
[    0.000000] memblock_reserve: [0x7d9ac000-0x7fffefff] RAMDISK
[    0.000000] Command line: BOOT_IMAGE=linux debug ignore_loglevel memmap=4095M$1M pci=routeirq ramdisk_size=262144 root=/dev/ram0 rw ip=dhcp console=uart8250,io,0x3f8,115200 initrd=initrd.img
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] Physical RAM map:
[    0.000000] raw: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] raw: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] raw: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] raw: [mem 0x0000000000100000-0x00000000dfffdfff] usable
[    0.000000] raw: [mem 0x00000000dfffe000-0x00000000dfffffff] reserved
[    0.000000] raw: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] raw: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] raw: [mem 0x0000000100000000-0x000000019fffffff] usable
[    0.000000] e820: BIOS-provided physical RAM map (sanitized by setup):
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000dfffdfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000dfffe000-0x00000000dfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000019fffffff] usable
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] e820: user-defined physical RAM map:
[    0.000000] user: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] user: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] user: [mem 0x00000000000f0000-0x00000000ffffffff] reserved
[    0.000000] user: [mem 0x0000000100000000-0x000000019fffffff] usable
[    0.000000] SMBIOS 2.4 present.
[    0.000000] DMI: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.000000] .text .data .bss are not marked as E820_RAM!
[    0.000000] e820: remove [mem 0x01000000-0x03e04fff]
[    0.000000] e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] No AGP bridge found
[    0.000000] e820: last_pfn = 0x1a0000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 00E0000000 mask FFE0000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] PAT not supported by CPU.
[    0.000000] e820: last_pfn = 0x3e05 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [mem 0x000fda20-0x000fda2f] mapped at [ffff8800000fda20]
[    0.000000] memblock_reserve: [0x000fda20-0x000fda2f] * MP-table mpf
[    0.000000] memblock_reserve: [0x000fda30-0x000fdb23] * MP-table mpc
[    0.000000] memblock_reserve: [0x03de0000-0x03de5fff] BRK
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0xa2e94c00 reserved size = 0x5499400
[    0.000000]  memory.cnt  = 0x3
[    0.000000]  memory[0x0]	[0x00010000-0x0009efff], 0x8f000 bytes
[    0.000000]  memory[0x1]	[0x01000000-0x03e04fff], 0x2e05000 bytes
[    0.000000]  memory[0x2]	[0x100000000-0x19fffffff], 0xa0000000 bytes
[    0.000000]  reserved.cnt  = 0x3
[    0.000000]  reserved[0x0]	[0x0009fc00-0x000fffff], 0x60400 bytes
[    0.000000]  reserved[0x1]	[0x01000000-0x03de5fff], 0x2de6000 bytes
[    0.000000]  reserved[0x2]	[0x7d9ac000-0x7fffefff], 0x2653000 bytes
[    0.000000] memblock_reserve: [0x00099000-0x0009efff] TRAMPOLINE
[    0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000]  [mem 0x00000000-0x000fffff] page 4k
[    0.000000] BRK [0x03de1000, 0x03de1fff] PGTABLE
[    0.000000] BRK [0x03de2000, 0x03de2fff] PGTABLE
[    0.000000] BRK [0x03de3000, 0x03de3fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x19fe00000-0x19fffffff]
[    0.000000]  [mem 0x19fe00000-0x19fffffff] page 2M
[    0.000000] BRK [0x03de4000, 0x03de4fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x19c000000-0x19fdfffff]
[    0.000000]  [mem 0x19c000000-0x19fdfffff] page 2M
[    0.000000] init_memory_mapping: [mem 0x180000000-0x19bffffff]
[    0.000000]  [mem 0x180000000-0x19bffffff] page 2M
[    0.000000] init_memory_mapping: [mem 0x01000000-0x03e04fff]
[    0.000000]  [mem 0x01000000-0x03dfffff] page 2M
[    0.000000]  [mem 0x03e00000-0x03e04fff] page 4k
[    0.000000] memblock_reserve: [0x19ffff000-0x19fffffff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x100000000-0x17fffffff]
[    0.000000]  [mem 0x100000000-0x17fffffff] page 2M
[    0.000000] BRK [0x03de5000, 0x03de5fff] PGTABLE
[    0.000000] memblock_reserve: [0x19fffe000-0x19fffefff] PGTABLE
[    0.000000] RAMDISK: [mem 0x7d9ac000-0x7fffefff]
[    0.000000] memblock_reserve: [0x19d9ab000-0x19fffdfff] NEW RAMDISK
[    0.000000] Allocated new RAMDISK: [mem 0x19d9ab000-0x19fffd667]
[    0.000000] Move RAMDISK from [mem 0x7d9ac000-0x7fffe667] to [mem 0x19d9ab000-0x19fffd667]
[    0.000000]    memblock_free: [0x7d9ac000-0x7fffefff]ACPI: RSDP 00000000000fd870 00014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 00000000dfffe3d0 00038 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 00000000dfffff80 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 00000000dfffe410 0124A (v01   BXPC   BXDSDT 00000001 INTL 20100528)
[    0.000000] ACPI: FACS 00000000dfffff40 00040
[    0.000000] ACPI: SSDT 00000000dffffe30 00110 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 00000000dffffd10 00080 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 00000000dffffcd0 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] ACPI: SSDT 00000000dffff660 0066E (v01   BXPC BXSSDTPC 00000001 INTL 20100528)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000019fffffff]
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x19fffffff]
[    0.000000] memblock_reserve: [0x19d984000-0x19d9aafff]
[    0.000000]   NODE_DATA [mem 0x19d984000-0x19d9aafff]
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0xa2e94c00 reserved size = 0x54c8400
[    0.000000]  memory.cnt  = 0x3
[    0.000000]  memory[0x0]	[0x00010000-0x0009efff], 0x8f000 bytes on node 0
[    0.000000]  memory[0x1]	[0x01000000-0x03e04fff], 0x2e05000 bytes on node 0
[    0.000000]  memory[0x2]	[0x100000000-0x19fffffff], 0xa0000000 bytes on node 0
[    0.000000]  reserved.cnt  = 0x4
[    0.000000]  reserved[0x0]	[0x00099000-0x0009efff], 0x6000 bytes
[    0.000000]  reserved[0x1]	[0x0009fc00-0x000fffff], 0x60400 bytes
[    0.000000]  reserved[0x2]	[0x01000000-0x03de5fff], 0x2de6000 bytes
[    0.000000]  reserved[0x3]	[0x19d984000-0x19fffffff], 0x267c000 bytes
[    0.000000] memblock_reserve: [0x19d983000-0x19d983fff] sparse section
[    0.000000] memblock_reserve: [0x19d583000-0x19d982fff] usemap_map
[    0.000000] memblock_reserve: [0x19d582d40-0x19d582fdf] usemap section
[    0.000000] memblock_reserve: [0x19d182d40-0x19d582d3f] map_map
[    0.000000] memblock_reserve: [0x19a600000-0x19cffffff] vmemmap buf
[    0.000000] memblock_reserve: [0x19d181000-0x19d181fff] vmemmap block
[    0.000000]  [ffffea0000000000-ffffea7fffffffff] PGD @ ffff88019d181000 on node 0
[    0.000000] memblock_reserve: [0x19d180000-0x19d180fff] vmemmap block
[    0.000000]  [ffffea0000000000-ffffea003fffffff] PUD @ ffff88019d180000 on node 0
[    0.000000]    memblock_free: [0x19d000000-0x19cffffff]
[    0.000000]  [ffffea0000000000-ffffea00067fffff] PMD -> [ffff88019a600000-ffff88019cffffff] on node 0
[    0.000000]    memblock_free: [0x19d182d40-0x19d582d3f]
[    0.000000]    memblock_free: [0x19d583000-0x19d982fff]Zone ranges:
[    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x19fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00010000-0x0009efff]
[    0.000000]   node   0: [mem 0x01000000-0x03e04fff]
[    0.000000]   node   0: [mem 0x100000000-0x19fffffff]
[    0.000000] start - node_states[2]:
[    0.000000] On node 0 totalpages: 667284
[    0.000000]   DMA zone: 3 pages used for memmap
[    0.000000]   DMA zone: 6 pages reserved
[    0.000000]   DMA zone: 134 pages, LIFO batch:0
[    0.000000] memblock_reserve: [0x19d92b000-0x19d982fff] pgdat
[    0.000000]   DMA32 zone: 185 pages used for memmap
[    0.000000]   DMA32 zone: 11596 pages, LIFO batch:1
[    0.000000] memblock_reserve: [0x19d8d3000-0x19d92afff] pgdat
[    0.000000]   Normal zone: 10240 pages used for memmap
[    0.000000]   Normal zone: 645120 pages, LIFO batch:31
[    0.000000] memblock_reserve: [0x19d87b000-0x19d8d2fff] pgdat
[    0.000000] after - node_states[2]: 0
[    0.000000] memblock_reserve: [0x19d87a000-0x19d87afff] pgtable
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] memblock_reserve: [0x19d879f80-0x19d879fc0] hpet res
[    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] init_cpu_to_node:
[    0.000000] cpu 0 -> apicid 0x0 -> node 0
[    0.000000] cpu 1 -> apicid 0x1 -> node 0
[    0.000000] memblock_reserve: [0x19d879f00-0x19d879f42] ioapic res
[    0.000000] nr_irqs_gsi: 40
[    0.000000] memblock_reserve: [0x19d879d40-0x19d879ec7] e820 resources
[    0.000000] memblock_reserve: [0x19d879cc0-0x19d879d27] firmware map
[    0.000000] memblock_reserve: [0x19d879c40-0x19d879ca7] firmware map
[    0.000000] memblock_reserve: [0x19d879bc0-0x19d879c27] firmware map
[    0.000000] memblock_reserve: [0x19d879b40-0x19d879ba7] firmware map
[    0.000000] memblock_reserve: [0x19d879ac0-0x19d879b27] firmware map
[    0.000000] memblock_reserve: [0x19d879a40-0x19d879aa7] firmware map
[    0.000000] memblock_reserve: [0x19d8799c0-0x19d879a27] firmware map
[    0.000000] memblock_reserve: [0x19d879940-0x19d8799a7] firmware map
[    0.000000] memblock_reserve: [0x19d879900-0x19d87991f] nosave region
[    0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
[    0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000001000000
[    0.000000] memblock_reserve: [0x19d8798c0-0x19d8798df] nosave region
[    0.000000] PM: Registered nosave memory: 0000000003e05000 - 0000000100000000
[    0.000000] e820: cannot find a gap in the 32bit address range
[    0.000000] e820: PCI devices with unassigned 32bit BARs may break!
[    0.000000] e820: [mem 0x1a0100000-0x1a04fffff] available for PCI devices
[    0.000000] memblock_reserve: [0x19d879800-0x19d8798a4] saved_command_l
[    0.000000] memblock_reserve: [0x19d879740-0x19d8797e4] static_command_
[    0.000000] setup_percpu: NR_CPUS:4096 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[    0.000000] memblock_reserve: [0x19d878740-0x19d87973f] pcpu_alloc_info
[    0.000000] memblock_reserve: [0x19d877740-0x19d87873f] pcpu area
[    0.000000] memblock_reserve: [0x19a200000-0x19a5fffff] pcpu_alloc
[    0.000000]    memblock_free: [0x19a3dc000-0x19a3fffff]
[    0.000000]    memblock_free: [0x19a5dc000-0x19a5fffff]
[    0.000000] PERCPU: Embedded 476 pages/cpu @ffff88019a200000 s1917968 r8192 d23536 u2097152
[    0.000000] memblock_reserve: [0x19d877700-0x19d877707] pcpu group_offs
[    0.000000] memblock_reserve: [0x19d8776c0-0x19d8776c7] pcpu group_size
[    0.000000] memblock_reserve: [0x19d877680-0x19d877687] pcpu unit_map
[    0.000000] memblock_reserve: [0x19d877640-0x19d87764f] pcpu unit_off
[    0.000000] pcpu-alloc: s1917968 r8192 d23536 u2097152 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 [0] 1
[    0.000000] memblock_reserve: [0x19d8774c0-0x19d87760f] pcpu slot
[    0.000000] memblock_reserve: [0x19d877440-0x19d8774bf] pcpu chunk_stru
[    0.000000] memblock_reserve: [0x19d8773c0-0x19d87743f] pcpu chunk_stru
[    0.000000]    memblock_free: [0x19d878740-0x19d87973f]
[    0.000000]    memblock_free: [0x19d877740-0x19d87873f]
[    0.000000] memblock_reserve: [0x19d879540-0x19d87973f] node_to_cpumask
[    0.000000] memblock_reserve: [0x19d879340-0x19d87953f] cpu_initialized
[    0.000000] memblock_reserve: [0x19d879140-0x19d87933f] cpu_callin_mask
[    0.000000] memblock_reserve: [0x19d878f40-0x19d87913f] cpu_callout_mas
[    0.000000] memblock_reserve: [0x19d878d40-0x19d878f3f] cpu_calibrated_
[    0.000000] memblock_reserve: [0x19d878b40-0x19d878d3f] cpu_sibling_set
[    0.000000] build_zonelists: local_node: 0 next_best_node: 0
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 656850
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=linux debug ignore_loglevel memmap=4095M$1M pci=routeirq ramdisk_size=262144 root=/dev/ram0 rw ip=dhcp console=uart8250,io,0x3f8,115200 initrd=initrd.img
[    0.000000] memblock_reserve: [0x19d86f3c0-0x19d8773bf] large system ha
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] __ex_table already sorted, skipping sort
[    0.000000] Initializing CPU#0
[    0.000000] pci_iommu_alloc: __iommu_entry_pci_swiotlb_detect_4gb+0x0/0x28 ffffffff8312de48
[    0.000000] pci_iommu_alloc: __iommu_entry_pci_swiotlb_detect_override+0x0/0x28 ffffffff8312de70
[    0.000000] Cannot allocate SWIOTLB buffer
[    0.000000] pci_iommu_alloc: __iommu_entry_gart_iommu_hole_init+0x0/0x28 ffffffff8312de98
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] pci_iommu_alloc: __iommu_entry_amd_iommu_detect+0x0/0x28 ffffffff8312dec0
[    0.000000] pci_iommu_alloc: __iommu_entry_detect_intel_iommu+0x0/0x28 ffffffff8312dee8
[    0.000000]        [0x00010000-0x00098fff]
[    0.000000]        [0x03de6000-0x03e04fff]
[    0.000000]        [0x100000000-0x19a1fffff]
[    0.000000]        [0x19a3dc000-0x19a3fffff]
[    0.000000]        [0x19a5dc000-0x19a5fffff]
[    0.000000]        [0x19d000000-0x19d17ffff]
[    0.000000]        [0x19d182000-0x19d581fff]
[    0.000000]        [0x19d583000-0x19d86efff]
[    0.000000]        [0x19d878000-0x19d877fff]
[    0.000000]        [0x19d87a000-0x19d879fff]
[    0.000000] Memory: 2534768k/6815744k available (17929k kernel code, 4146608k absent, 134368k reserved, 12584k data, 3480k init)
[    0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=4096 to nr_cpu_ids=2.
[    0.000000] NR_IRQS:847872 nr_irqs:40 0
[    0.000000] IOAPIC[0]: apic_id 0, GSI 0-23 ==> irq 0-23 reserved
[    0.000000] IOAPIC[extra]: GSI 24-39 ==> irq 24-39 reserved
[    0.000000]   alloc irq_desc for 0 on node 0
[    0.000000]   alloc irq_desc for 1 on node 0
[    0.000000]   alloc irq_desc for 2 on node 0
[    0.000000]   alloc irq_desc for 3 on node 0
[    0.000000]   alloc irq_desc for 4 on node 0
[    0.000000]   alloc irq_desc for 5 on node 0
[    0.000000]   alloc irq_desc for 6 on node 0
[    0.000000]   alloc irq_desc for 7 on node 0
[    0.000000]   alloc irq_desc for 8 on node 0
[    0.000000]   alloc irq_desc for 9 on node 0
[    0.000000]   alloc irq_desc for 10 on node 0
[    0.000000]   alloc irq_desc for 11 on node 0
[    0.000000]   alloc irq_desc for 12 on node 0
[    0.000000]   alloc irq_desc for 13 on node 0
[    0.000000]   alloc irq_desc for 14 on node 0
[    0.000000]   alloc irq_desc for 15 on node 0
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [ttyS0] enabled, bootconsole disabled
[    0.000000] console [ttyS0] enabled, bootconsole disabled
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8191
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
[    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
[    0.000000] ... CHAINHASH_SIZE:          16384
[    0.000000]  memory used by lock dependency info: 6367 kB
[    0.000000]  per task-struct memory footprint: 2688 bytes
[    0.000000] allocated 11010048 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 2492.129 MHz processor
[    0.004005] CPU0: Calibrating delay loop (skipped), value calculated using timer frequency.. 4984.25 BogoMIPS (lpj=9968516)
[    0.008012] pid_max: default: 32768 minimum: 301
[    0.009727] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.012845] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.014741] Mount-cache hash table entries: 256
[    0.016693] Initializing cgroup subsys cpuacct
[    0.017788] Initializing cgroup subsys memory
[    0.018896] Initializing cgroup subsys devices
[    0.020007] Initializing cgroup subsys freezer
[    0.021101] Initializing cgroup subsys blkio
[    0.022218] CPU: L1 I cache: 32K, L1 D cache: 32K
[    0.023387] CPU: L2 cache: 4096K
[    0.024011] CPU 0/0x0 -> Node 0
[    0.024786] CPU 0 microcode level: 0x1
[    0.025708] mce: CPU supports 10 MCE banks
[    0.026729] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.026729] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.026729] tlb_flushall_shift: 6
[    0.028191] Freeing SMP alternatives: 52k freed
[    0.029312] ACPI: Core revision 20130117
[    0.034050] ACPI: All ACPI Tables successfully acquired
[    0.035310] ftrace: allocating 54533 entries in 214 pages
[    0.053148] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.097443] smpboot: CPU0: Intel QEMU Virtual CPU version 1.2.50 (fam: 06, model: 02, stepping: 03)
[    0.100000] Performance Events: unsupported p6 CPU model 2 no PMU driver, software events only.
[    0.100000] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.100292] SMP alternatives: lockdep: fixing up alternatives
[    0.101559] smpboot: Booting processor 1 APIC 0x1 ip 0x9a000
[    0.008000] Initializing CPU#1
[    0.008000] CPU: L1 I cache: 32K, L1 D cache: 32K
[    0.008000] CPU: L2 cache: 4096K
[    0.008000] CPU 1/0x1 -> Node 0
[    0.120006] smpboot: CPU1: Intel QEMU Virtual CPU version 1.2.50 (fam: 06, model: 02, stepping: 03)
[    0.122979] checking TSC synchronization [CPU#0 -> CPU#1]: passed.
[    0.128133] Brought up 2 CPUs
[    0.132000] CPU1: Calibrating delay using timer specific routine.. 4984.64 BogoMIPS (lpj=9969291)
[    0.209504] smpboot: Total of 2 processors activated (9968.90 BogoMIPS)
[    0.210312] setup_ioapic_desc() done
[    0.210868] mtrr_aps_init() done
[    0.211301] build_sched_domain done
[    0.211706] register_sched_domain_sysctl done
[    0.212024] non_isolated_cpus inited
[    0.212490] hotcpu_notifier to cpuset_cpu done
[    0.212950] hotcpu_notifier to update_runtime done
[    0.213433] init_hrtick done
[    0.213767] sched_init_granularity done
[    0.214203] init_sched_rt_class done
[    0.214635] cpuset_init_smp done
[    0.214635] usermodehelper_init done
[    0.214635] shmem_init done
[    0.216004] devtmpfs_init done
[    0.216391] devices_init done
[    0.216721] buses_init done
[    0.217010] classes_init done
[    0.217348] firmware_init done
[    0.217709] hypervisor_init done
[    0.218214] platform_bus_init done
[    0.218703] cpu_dev_init done
[    0.220520] memory_dev_init done
[    0.220872] driver_init done
[    0.221413] init_irq_proc done
[    0.221759] do_ctors done
[    0.222056] usermodehelper_enable done
[    0.222633] xor: automatically using best checksumming function:
[    0.264005]    generic_sse: 12602.000 MB/sec
[    0.264464] atomic64 test passed for x86-64 platform with CX8 and with SSE
[    0.265477] NET: Registered protocol family 16
[    0.268117] i2c-core: driver [dummy] registered
[    0.268655] ACPI: bus type pci registered
[    0.269121] dca service started, version 1.12.1
[    0.269121] PCI: Using configuration type 1 for base access
[    0.380160] bio: create slab <bio-0> at 0
[    0.448005] raid6: sse2x1    8363 MB/s
[    0.516010] raid6: sse2x2   10289 MB/s
[    0.584008] raid6: sse2x4   11245 MB/s
[    0.584435] raid6: using algorithm sse2x4 (11245 MB/s)
[    0.584927] raid6: using intx1 recovery algorithm
[    0.585598] init_acpi_device_notify done
[    0.585598] ACPI: Added _OSI(Module Device)
[    0.585598] ACPI: Added _OSI(Processor Device)
[    0.585598] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.585928] ACPI: Added _OSI(Processor Aggregator Device)
[    0.588016] acpi_os_initialize1 done
[    0.590308] acpi_enable_subsystem done
[    0.590692] ACPI: EC: Look up EC in DSDT
[    0.591317] acpi_ec_ecdt_probe done
[    0.595277] acpi_initialize_objects done
[    0.595815] acpi_bus_osc_support done
[    0.596036] acpi_sysfs_init done
[    0.596851] acpi_early_processor_set_pdc done
[    0.597277] acpi_boot_ec_enable done
[    0.597658] ACPI: Interpreter enabled
[    0.598063] ACPI: (supports S0ACPI Exception: AE_NOT_FOUND, While
evaluating Sleep State [\_S1_] (20130117/hwxface-568)
[    0.599245] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep
State [\_S2_] (20130117/hwxface-568)
[    0.600270]  S3 S4 S5)
[    0.600629] acpi_sleep_init done
[    0.600952] ACPI: Using IOAPIC for interrupt routing
[    0.601491] acpi_bus_init_irq done
[    0.601826] acpi_install_notify_handler done
[    0.602357] acpi_root_dir created
[    0.602691] acpi_bus_init done
[    0.602972] pci_mmcfg_late_init done
[    0.603521] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.604016] acpi_pci_root_init done
[    0.604436] acpi_pci_link_init done
[    0.604784] acpi_platform_init done
[    0.605106] acpi_csrt_init done
[    0.620326] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.620887] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.621931] acpi PNP0A03:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[    0.624111] acpi PNP0A03:00: Unable to request _OSC control (_OSC support mask: 0x08)
[    0.625123] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.626402] PCI host bridge to bus 0000:00
[    0.628007] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.628622] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    0.629310] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
[    0.629947] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.630609] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xfebfffff]
[    0.631274] pci_bus 0000:00: scanning bus
[    0.632108] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[    0.632725] pci 0000:00:00.0: calling quirk_mmio_always_on+0x0/0x10
[    0.633971] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[    0.633971] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
[    0.637841] pci 0000:00:01.1: reg 20: [io  0xc040-0xc04f]
[    0.639480] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[    0.640065] pci 0000:00:01.3: calling acpi_pm_check_blacklist+0x0/0x50
[    0.641029] pci 0000:00:01.3: calling quirk_piix4_acpi+0x0/0x130
[    0.642466] pci 0000:00:01.3: PIO at [io  0xb000-0xb03f]
[    0.642998] pci 0000:00:01.3: addon resource PIIX4 ACPI [io  0xb000-0xb03f] added
[    0.644026] pci 0000:00:01.3: PIO at [io  0xb100-0xb10f]
[    0.644525] pci 0000:00:01.3: addon resource PIIX4 SMB [io  0xb100-0xb10f] added
[    0.645225] pci 0000:00:01.3: calling pci_fixup_piix4_acpi+0x0/0x20
[    0.647143] pci 0000:00:02.0: [1013:00b8] type 00 class 0x030000
[    0.648962] pci 0000:00:02.0: reg 10: [mem 0xfc000000-0xfdffffff pref]
[    0.650404] pci 0000:00:02.0: reg 14: [mem 0xfebf0000-0xfebf0fff]
[    0.655651] pci 0000:00:02.0: reg 30: [mem 0xfebe0000-0xfebeffff pref]
[    0.656246] pci 0000:00:03.0: [8086:100e] type 00 class 0x020000
[    0.658215] pci 0000:00:03.0: reg 10: [mem 0xfeba0000-0xfebbffff]
[    0.660679] pci 0000:00:03.0: reg 14: [io  0xc000-0xc03f]
[    0.666322] pci 0000:00:03.0: reg 30: [mem 0xfebc0000-0xfebdffff pref]
[    0.668599] pci_bus 0000:00: fixups for bus
[    0.669563] pci_bus 0000:00: bus scan returning with max=00
[    0.670847] ACPI _OSC control for PCIe not granted, disabling ASPM
[    0.672084]   acpi_pci_ioapic_add is called for \_SB_.PCI0 ffff8801988c91e0
[    0.674128]   acpi_pci_iommu_add is called for \_SB_.PCI0 ffff8801988c91e0
[    0.677276] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.678311] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.679181] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.680397] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.681223] ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0
[    0.682194] ACPI: Enabled 16 GPEs in block 00 to 0F
[    0.682194] acpi root: \_SB_.PCI0 notify handler is installed
[    0.682194] Found 1 acpi root devices
[    0.684015] acpi_scan_init done
[    0.684399] acpi_ec_init done
[    0.684399] acpi_debugfs_init done
[    0.684755] acpi_sleep_proc_init done
[    0.685139] acpi_wakeup_device_init done
[    0.686506] ACPI: No dock devices found.
[    0.688145] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.688969] vgaarb: loaded
[    0.689237] vgaarb: bridge control possible 0000:00:02.0
[    0.689831] SCSI subsystem initialized
[    0.689831] ACPI: bus type scsi registered
[    0.692101] libata version 3.00 loaded.
[    0.692677] ACPI: bus type usb registered
[    0.692677] usbcore: registered new interface driver usbfs
[    0.692782] usbcore: registered new interface driver hub
[    0.692782] usbcore: registered new device driver usb
[    0.696180] pps_core: LinuxPPS API ver. 1 registered
[    0.696611] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.697563] PTP clock support registered
[    0.700198] wmi: Mapper loaded
[    0.700485] Advanced Linux Sound Architecture Driver Initialized.
[    0.700780] PCI: Using ACPI for IRQ routing
[    0.700780] PCI: Routing PCI interrupts for all devices because "pci=routeirq" specified
[    0.701759] ACPI Exception: AE_NOT_FOUND, Evaluating _SRS (20130117/pci_link-363)
[    0.704420] ACPI: Unable to set IRQ for PCI Interrupt Link [LNKS]. Try pci=noacpi or acpi=off
[    0.706538] pci 0000:00:01.3: PCI INT A: no GSI - using ISA IRQ 9
[    0.707364] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.708127] PCI: pci_cache_line_size set to 64 bytes
[    0.709395] pci 0000:00:01.1: BAR 0: reserving [io  0x01f0-0x01f7 flags 0x110] (d=0, p=0)
[    0.710234] pci 0000:00:01.1: BAR 1: reserving [io  0x03f6 flags 0x110] (d=0, p=0)
[    0.711091] pci 0000:00:01.1: BAR 2: reserving [io  0x0170-0x0177 flags 0x110] (d=0, p=0)
[    0.712005] pci 0000:00:01.1: BAR 3: reserving [io  0x0376 flags 0x110] (d=0, p=0)
[    0.712758] pci 0000:00:01.1: BAR 4: reserving [io  0xc040-0xc04f flags 0x40101] (d=0, p=0)
[    0.713543] pci 0000:00:01.3: BAR 17: reserving [io  0xb000-0xb03f flags 0x100] (d=0, p=0)
[    0.714380] pci 0000:00:01.3: BAR 18: reserving [io  0xb100-0xb10f flags 0x100] (d=0, p=0)
[    0.715234] pci 0000:00:02.0: BAR 0: reserving [mem 0xfc000000-0xfdffffff flags 0x42208] (d=0, p=0)
[    0.716006] pci 0000:00:02.0: BAR 1: reserving [mem 0xfebf0000-0xfebf0fff flags 0x40200] (d=0, p=0)
[    0.716856] pci 0000:00:03.0: BAR 0: reserving [mem 0xfeba0000-0xfebbffff flags 0x40200] (d=0, p=0)
[    0.717699] pci 0000:00:03.0: BAR 1: reserving [io  0xc000-0xc03f flags 0x40101] (d=0, p=0)
[    0.720052] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
[    0.720690] e820: reserve RAM buffer [mem 0x03e05000-0x03ffffff]
[    0.721420] Bluetooth: Core ver 2.16
[    0.721420] NET: Registered protocol family 31
[    0.724017] Bluetooth: HCI device and connection manager initialized
[    0.725596] Bluetooth: HCI socket layer initialized
[    0.726161] Bluetooth: L2CAP socket layer initialized
[    0.726655] Bluetooth: SCO socket layer initialized
[    0.728528] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.729324] cfg80211: Calling CRDA to update world regulatory domain
[    0.730101] Switching to clocksource hpet
[    0.766022] FS-Cache: Loaded
[    0.766821] CacheFiles: Loaded
[    0.767702] pnp: PnP ACPI init
[    0.768472] ACPI: bus type pnp registered
[    0.769582] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.771642] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active)
[    0.773368] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active)
[    0.775044] pnp 00:03: [dma 2]
[    0.775977] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active)
[    0.777890] pnp 00:04: Plug and Play ACPI device, IDs PNP0400 (active)
[    0.779751] pnp 00:05: Plug and Play ACPI device, IDs PNP0501 (active)
[    0.781854] pnp 00:06: Plug and Play ACPI device, IDs PNP0103 (active)
[    0.783655] pnp: PnP ACPI: found 7 devices
[    0.784719] ACPI: ACPI bus type pnp unregistered
[    0.785838] INFO_MDMA: LNW DMA Driver Version 1.1.0
[    0.806903] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7]
[    0.808278] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff]
[    0.809605] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff]
[    0.811076] pci_bus 0000:00: resource 7 [mem 0xe0000000-0xfebfffff]
[    0.812684] NET: Registered protocol family 2
[    0.813963] TCP established hash table entries: 32768 (order: 7, 524288 bytes)
[    0.815898] TCP bind hash table entries: 32768 (order: 9, 2621440 bytes)
[    0.819977] TCP: Hash tables configured (established 32768 bind 32768)
[    0.821600] TCP: reno registered
[    0.822438] UDP hash table entries: 2048 (order: 6, 393216 bytes)
[    0.824848] UDP-Lite hash table entries: 2048 (order: 6, 393216 bytes)
[    0.826639] NET: Registered protocol family 1
[    0.827501] RPC: Registered named UNIX socket transport module.
[    0.828310] RPC: Registered udp transport module.
[    0.828853] RPC: Registered tcp transport module.
[    0.829329] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.830038] pci 0000:00:00.0: calling quirk_natoma+0x0/0x40
[    0.830580] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.831213] pci 0000:00:00.0: calling quirk_passive_release+0x0/0x90
[    0.831853] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.832792] pci 0000:00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
[    0.833410] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.834124] pci 0000:00:02.0: calling pci_fixup_video+0x0/0xe0
[    0.834696] pci 0000:00:02.0: Boot video device
[    0.835139] pci 0000:00:03.0: calling quirk_e100_interrupt+0x0/0x1c0
[    0.835802] PCI: CLS 0 bytes, default 64
[    0.836521] Trying to unpack rootfs image as initramfs...
[    0.841605] rootfs image is not initramfs (no cpio magic); looks like an initrd
[    0.868223] Freeing initrd memory: 39244k freed
[    0.875329] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    0.876117] software IO TLB: No low mem
[    0.876860] kvm: no hardware support
[    0.877278] has_svm: not amd
[    0.877640] kvm: no hardware support
[    0.878195] Machine check injector initialized
[    0.880145] microcode: CPU0 sig=0x623, pf=0x0, revision=0x1
[    0.881139] microcode: CPU1 sig=0x623, pf=0x0, revision=0x1
[    0.881950] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[    0.930662] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.949664] VFS: Disk quotas dquot_6.5.2
[    0.950330] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.952186] DLM installed
[    0.955513] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.957586] FS-Cache: Netfs 'nfs' registered for caching
[    0.959002] NFS: Registering the id_resolver key type
[    0.959609] Key type id_resolver registered
[    0.960185] Key type id_legacy registered
[    0.961346] NTFS driver 2.1.30 [Flags: R/O].
[    0.962202] ROMFS MTD (C) 2007 Red Hat, Inc.
[    0.965459] Btrfs loaded
[    0.967659] GFS2 installed
[    0.967957] msgmni has been set to 5027
[    0.972163] async_tx: api initialized (async)
[    0.973046] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    0.974200] io scheduler noop registered
[    0.974614] io scheduler deadline registered
[    0.976305] io scheduler cfq registered (default)
[    0.979413] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    0.980707] pciehp: pcie_port_service_register = 0
[    0.981188] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    0.981862] cpcihp_zt5550: ZT5550 CompactPCI Hot Plug Driver version: 0.2
[    0.982720] cpcihp_generic: Generic port I/O CompactPCI Hot Plug Driver version: 0.1
[    0.983557] cpcihp_generic: not configured, disabling.
[    0.984399] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    0.985083] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.985716]   add_bridge is called for \_SB_.PCI0 ffff8801988c91e0
[    0.986387] acpiphp_glue: found PCI host-bus bridge with hot-pluggable slots
[    0.987114] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE ffff8801988c91e0 : pci0000:00
[    0.988400] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE ffff8801988c91e0 : pci0000:00
[    0.989300] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE ffff8801988c91e0 : pci0000:00
[    0.990186] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE ffff8801988c91e0 : pci0000:00
[    0.991106] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE ffff8801988c91e0 : pci0000:00
[    0.992044] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE ffff8801988c91e0 : pci0000:00
[    0.992955] acpiphp_glue: found ACPI PCI Hotplug slot 3 at PCI 0000:00:03
[    0.993737] pci_bus 0000:00: dev 03, created physical slot 3
[    0.994696] pci_hotplug: __pci_hp_register: Added slot 3 to the list
[    0.995305] acpiphp: Slot [3] registered
[    0.995727] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    0.996750] acpiphp_glue: found ACPI PCI Hotplug slot 4 at PCI 0000:00:04
[    0.997513] pci_bus 0000:00: dev 04, created physical slot 4
[    0.998383] pci_hotplug: __pci_hp_register: Added slot 4 to the list
[    0.999104] acpiphp: Slot [4] registered
[    0.999581] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.000608] acpiphp_glue: found ACPI PCI Hotplug slot 5 at PCI 0000:00:05
[    1.001355] pci_bus 0000:00: dev 05, created physical slot 5
[    1.002210] pci_hotplug: __pci_hp_register: Added slot 5 to the list
[    1.003061] acpiphp: Slot [5] registered
[    1.003606] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.004869] acpiphp_glue: found ACPI PCI Hotplug slot 6 at PCI 0000:00:06
[    1.006092] pci_bus 0000:00: dev 06, created physical slot 6
[    1.007599] pci_hotplug: __pci_hp_register: Added slot 6 to the list
[    1.008576] acpiphp: Slot [6] registered
[    1.009019] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.009958] acpiphp_glue: found ACPI PCI Hotplug slot 7 at PCI 0000:00:07
[    1.010663] pci_bus 0000:00: dev 07, created physical slot 7
[    1.011507] pci_hotplug: __pci_hp_register: Added slot 7 to the list
[    1.012403] acpiphp: Slot [7] registered
[    1.012870] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.013667] acpiphp_glue: found ACPI PCI Hotplug slot 8 at PCI 0000:00:08
[    1.014364] pci_bus 0000:00: dev 08, created physical slot 8
[    1.015183] pci_hotplug: __pci_hp_register: Added slot 8 to the list
[    1.015883] acpiphp: Slot [8] registered
[    1.016448] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.017347] acpiphp_glue: found ACPI PCI Hotplug slot 9 at PCI 0000:00:09
[    1.018092] pci_bus 0000:00: dev 09, created physical slot 9
[    1.018942] pci_hotplug: __pci_hp_register: Added slot 9 to the list
[    1.019588] acpiphp: Slot [9] registered
[    1.020070] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.021018] acpiphp_glue: found ACPI PCI Hotplug slot 10 at PCI 0000:00:0a
[    1.021797] pci_bus 0000:00: dev 0a, created physical slot 10
[    1.022619] pci_hotplug: __pci_hp_register: Added slot 10 to the list
[    1.023278] acpiphp: Slot [10] registered
[    1.023703] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.024743] acpiphp_glue: found ACPI PCI Hotplug slot 11 at PCI 0000:00:0b
[    1.025522] pci_bus 0000:00: dev 0b, created physical slot 11
[    1.026446] pci_hotplug: __pci_hp_register: Added slot 11 to the list
[    1.027102] acpiphp: Slot [11] registered
[    1.027553] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.028570] acpiphp_glue: found ACPI PCI Hotplug slot 12 at PCI 0000:00:0c
[    1.029291] pci_bus 0000:00: dev 0c, created physical slot 12
[    1.030153] pci_hotplug: __pci_hp_register: Added slot 12 to the list
[    1.030880] acpiphp: Slot [12] registered
[    1.031303] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.032265] acpiphp_glue: found ACPI PCI Hotplug slot 13 at PCI 0000:00:0d
[    1.033189] pci_bus 0000:00: dev 0d, created physical slot 13
[    1.034177] pci_hotplug: __pci_hp_register: Added slot 13 to the list
[    1.035003] acpiphp: Slot [13] registered
[    1.035873] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.037446] acpiphp_glue: found ACPI PCI Hotplug slot 14 at PCI 0000:00:0e
[    1.038243] pci_bus 0000:00: dev 0e, created physical slot 14
[    1.039209] pci_hotplug: __pci_hp_register: Added slot 14 to the list
[    1.039880] acpiphp: Slot [14] registered
[    1.040523] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.041310] acpiphp_glue: found ACPI PCI Hotplug slot 15 at PCI 0000:00:0f
[    1.041997] pci_bus 0000:00: dev 0f, created physical slot 15
[    1.042810] pci_hotplug: __pci_hp_register: Added slot 15 to the list
[    1.043530] acpiphp: Slot [15] registered
[    1.043946] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.044926] acpiphp_glue: found ACPI PCI Hotplug slot 16 at PCI 0000:00:10
[    1.045693] pci_bus 0000:00: dev 10, created physical slot 16
[    1.046565] pci_hotplug: __pci_hp_register: Added slot 16 to the list
[    1.047281] acpiphp: Slot [16] registered
[    1.047697] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.048659] acpiphp_glue: found ACPI PCI Hotplug slot 17 at PCI 0000:00:11
[    1.049342] pci_bus 0000:00: dev 11, created physical slot 17
[    1.050088] pci_hotplug: __pci_hp_register: Added slot 17 to the list
[    1.050849] acpiphp: Slot [17] registered
[    1.051250] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.052215] acpiphp_glue: found ACPI PCI Hotplug slot 18 at PCI 0000:00:12
[    1.052900] pci_bus 0000:00: dev 12, created physical slot 18
[    1.053683] pci_hotplug: __pci_hp_register: Added slot 18 to the list
[    1.054364] acpiphp: Slot [18] registered
[    1.054847] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.055681] acpiphp_glue: found ACPI PCI Hotplug slot 19 at PCI 0000:00:13
[    1.056551] pci_bus 0000:00: dev 13, created physical slot 19
[    1.057330] pci_hotplug: __pci_hp_register: Added slot 19 to the list
[    1.057993] acpiphp: Slot [19] registered
[    1.058432] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.059293] acpiphp_glue: found ACPI PCI Hotplug slot 20 at PCI 0000:00:14
[    1.060134] pci_bus 0000:00: dev 14, created physical slot 20
[    1.060934] pci_hotplug: __pci_hp_register: Added slot 20 to the list
[    1.061642] acpiphp: Slot [20] registered
[    1.062134] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.063206] acpiphp_glue: found ACPI PCI Hotplug slot 21 at PCI 0000:00:15
[    1.064324] pci_bus 0000:00: dev 15, created physical slot 21
[    1.065437] pci_hotplug: __pci_hp_register: Added slot 21 to the list
[    1.066835] acpiphp: Slot [21] registered
[    1.067700] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.068707] acpiphp_glue: found ACPI PCI Hotplug slot 22 at PCI 0000:00:16
[    1.069480] pci_bus 0000:00: dev 16, created physical slot 22
[    1.070259] pci_hotplug: __pci_hp_register: Added slot 22 to the list
[    1.070973] acpiphp: Slot [22] registered
[    1.071659] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.072728] acpiphp_glue: found ACPI PCI Hotplug slot 23 at PCI 0000:00:17
[    1.073443] pci_bus 0000:00: dev 17, created physical slot 23
[    1.074251] pci_hotplug: __pci_hp_register: Added slot 23 to the list
[    1.074892] acpiphp: Slot [23] registered
[    1.075352] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.076324] acpiphp_glue: found ACPI PCI Hotplug slot 24 at PCI 0000:00:18
[    1.077140] pci_bus 0000:00: dev 18, created physical slot 24
[    1.077936] pci_hotplug: __pci_hp_register: Added slot 24 to the list
[    1.078687] acpiphp: Slot [24] registered
[    1.079161] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.079964] acpiphp_glue: found ACPI PCI Hotplug slot 25 at PCI 0000:00:19
[    1.080824] pci_bus 0000:00: dev 19, created physical slot 25
[    1.081636] pci_hotplug: __pci_hp_register: Added slot 25 to the list
[    1.082358] acpiphp: Slot [25] registered
[    1.082803] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.083598] acpiphp_glue: found ACPI PCI Hotplug slot 26 at PCI 0000:00:1a
[    1.084546] pci_bus 0000:00: dev 1a, created physical slot 26
[    1.085363] pci_hotplug: __pci_hp_register: Added slot 26 to the list
[    1.086074] acpiphp: Slot [26] registered
[    1.086478] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.087266] acpiphp_glue: found ACPI PCI Hotplug slot 27 at PCI 0000:00:1b
[    1.088072] pci_bus 0000:00: dev 1b, created physical slot 27
[    1.088939] pci_hotplug: __pci_hp_register: Added slot 27 to the list
[    1.089568] acpiphp: Slot [27] registered
[    1.089998] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.090857] acpiphp_glue: found ACPI PCI Hotplug slot 28 at PCI 0000:00:1c
[    1.091586] pci_bus 0000:00: dev 1c, created physical slot 28
[    1.092884] pci_hotplug: __pci_hp_register: Added slot 28 to the list
[    1.093499] acpiphp: Slot [28] registered
[    1.094494] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.096476] acpiphp_glue: found ACPI PCI Hotplug slot 29 at PCI 0000:00:1d
[    1.097539] pci_bus 0000:00: dev 1d, created physical slot 29
[    1.098408] pci_hotplug: __pci_hp_register: Added slot 29 to the list
[    1.099258] acpiphp: Slot [29] registered
[    1.099685] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.100727] acpiphp_glue: found ACPI PCI Hotplug slot 30 at PCI 0000:00:1e
[    1.101477] pci_bus 0000:00: dev 1e, created physical slot 30
[    1.102336] pci_hotplug: __pci_hp_register: Added slot 30 to the list
[    1.102993] acpiphp: Slot [30] registered
[    1.103477] pci_bus 0000:00:   bridge DEVICE_ACPI_HANDLE
ffff8801988c91e0 : pci0000:00
[    1.104472] acpiphp_glue: found ACPI PCI Hotplug slot 31 at PCI 0000:00:1f
[    1.105438] pci_bus 0000:00: dev 1f, created physical slot 31
[    1.106302] pci_hotplug: __pci_hp_register: Added slot 31 to the list
[    1.106997] acpiphp: Slot [31] registered
[    1.109325] acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed
[    1.111415] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    1.112313] ACPI: Power Button [PWRF]
[    1.120725] ioatdma: Intel(R) QuickData Technology Driver 4.00
[    1.201277] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    1.224167] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    1.228414] Non-volatile memory driver v1.3
[    1.230047] Linux agpgart interface v0.103
[    1.232449] [drm] Initialized drm 1.1.0 20060810
[    1.234289] i2c-core: driver [sil164] registered
[    1.242053] brd: module loaded
[    1.246536] loop: module loaded
[    1.247697] mtip32xx Version 1.2.6os3
[    1.249241] i2c-core: driver [at24] registered
[    1.251815] Loading iSCSI transport class v2.0-870.
[    1.255062] rdac: device handler registered
[    1.257222] iscsi: registered transport (tcp)
[    1.258584] Adaptec aacraid driver 1.2-0[29801]-ms
[    1.259927] aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.3 loaded
[    1.261817] isci: Intel(R) C600 SAS Controller Driver - version 1.1.0
[    1.263790] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 8.04.00.08-k-debug.
[    1.266289] iscsi: registered transport (qla4xxx)
[    1.267542] QLogic iSCSI HBA Driver
[    1.268322] Emulex LightPulse Fibre Channel SCSI driver 8.3.36
[    1.269763] Copyright(c) 2004-2009 Emulex.  All rights reserved.
[    1.271554] Brocade BFA FC/FCOE SCSI driver - version: 3.1.2.1
[    1.273501] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
[    1.275475] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
[    1.277402] megasas: 06.504.01.00-rc1 Mon. Oct. 1 17:00:00 PDT 2012
[    1.279120] mpt2sas version 14.100.00.00 loaded
[    1.280992] st: Version 20101219, fixed bufsize 32768, s/g segs 256
[    1.283443] SCSI Media Changer driver v0.25
[    1.285280] osd: LOADED open-osd 0.2.1
[    1.286542] ata_piix 0000:00:01.1: version 2.13
[    1.287754] ata_piix 0000:00:01.1: enabling bus mastering
[    1.289137] ata_piix 0000:00:01.1: setting latency timer to 64
[    1.293173] scsi0 : ata_piix
[    1.294650] scsi1 : ata_piix
[    1.295733] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc040 irq 14
[    1.297436] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc048 irq 15
[    1.314246] libphy: Fixed MDIO Bus: probed
[    1.315809] tun: Universal TUN/TAP device driver, 1.6
[    1.317044] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    1.318930] CAN device driver interface
[    1.320130] usbcore: registered new interface driver peak_usb
[    1.321798] usbcore: registered new interface driver usb_8dev
[    1.323734] cnic: Broadcom NetXtreme II CNIC Driver cnic v2.5.16 (Dec 05, 2012)
[    1.325753] bnx2x: Broadcom NetXtreme II 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.78.02-0 (2013/01/14)
[    1.328941] Brocade 10G Ethernet driver - version: 3.1.2.1
[    1.332090] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[    1.333805] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    1.335248] e1000 0000:00:03.0: enabling bus mastering
[    1.336565] e1000 0000:00:03.0: setting latency timer to 64
[    1.452633] ata1.01: NODEV after polling detection
[    1.454089] ata1.00: ATA-7: QEMU HARDDISK, 1.2.50, max UDMA/100
[    1.455532] ata1.00: 65536 sectors, multi 16: LBA48
[    1.458521] ata1.00: configured for MWDMA2
[    1.460235] scsi 0:0:0:0: Direct-Access     ATA      QEMU HARDDISK
  1.2. PQ: 0 ANSI: 5
[    1.463117] sd 0:0:0:0: [sda] 65536 512-byte logical blocks: (33.5 MB/32.0 MiB)
[    1.465345] sd 0:0:0:0: [sda] Write Protect is off
[    1.466212] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    1.467542] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.470354] ata2.01: NODEV after polling detection
[    1.471804] ata2.00: ATAPI: QEMU DVD-ROM, 1.2.50, max UDMA/100
[    1.474013] ata2.00: configured for MWDMA2
[    1.476308] Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer
[    1.479216] Pid: 2057, comm: kworker/u:3 Not tainted 3.8.0-rc5-yh-01253-gdf8eac2-dirty #1194
[    1.480035] Call Trace:
[    1.480035]  [<ffffffff82159a78>] panic+0xc0/0x1ce
[    1.480035]  [<ffffffff814e0b54>] swiotlb_tbl_map_single+0x34/0x290
[    1.480035]  [<ffffffff814e1219>] map_single+0x19/0x20
[    1.480035]  [<ffffffff814e12e8>] swiotlb_map_sg_attrs+0xc8/0x170
[    1.480035]  [<ffffffff819b5ef1>] ata_qc_issue+0x2d1/0x3a0
[    1.480035]  [<ffffffff819bcab0>] ? ata_scsi_set_sense.constprop.13+0x30/0x30
[    1.480035]  [<ffffffff819bb508>] ata_scsi_translate+0x128/0x190
[    1.480035]  [<ffffffff819becf4>] ? ata_scsi_queuecmd+0x34/0x2c0
[    1.480035]  [<ffffffff819bef08>] ata_scsi_queuecmd+0x248/0x2c0
[    1.480035]  [<ffffffff817469c1>] scsi_dispatch_cmd+0x1c1/0x2e0
[    1.480035]  [<ffffffff8174e502>] scsi_request_fn+0x572/0x5b0
[    1.480035]  [<ffffffff810eb3aa>] ? __lock_is_held+0x5a/0x80
[    1.480035]  [<ffffffff81499837>] __blk_run_queue+0x37/0x50
[    1.480035]  [<ffffffff81498f69>] __elv_add_request+0x119/0x290
[    1.480035]  [<ffffffff8149d252>] ? drive_stat_acct+0x52/0x1f0
[    1.480035]  [<ffffffff814a0fb7>] blk_queue_bio+0x337/0x3d0
[    1.480035]  [<ffffffff8149fe9e>] generic_make_request+0xbe/0x120
[    1.480035]  [<ffffffff814a0020>] submit_bio+0x120/0x160
[    1.480035]  [<ffffffff811e5aea>] ? bio_alloc_bioset+0x9a/0x130
[    1.480035]  [<ffffffff811e069f>] submit_bh+0x1af/0x1e0
[    1.480035]  [<ffffffff811e3f4a>] block_read_full_page+0x32a/0x350
[    1.480035]  [<ffffffff811e6cd0>] ? I_BDEV+0x10/0x10
[    1.480035]  [<ffffffff8114f3d7>] ? add_to_page_cache_locked+0xc7/0x130
[    1.480035]  [<ffffffff811e7580>] ? blkdev_write_begin+0x30/0x30
[    1.480035]  [<ffffffff811e7598>] blkdev_readpage+0x18/0x20
[    1.480035]  [<ffffffff8114f622>] do_read_cache_page+0x92/0x180
[    1.480035]  [<ffffffff8114f75c>] read_cache_page_async+0x1c/0x20
[    1.480035]  [<ffffffff8114f76e>] read_cache_page+0xe/0x20
[    1.480035]  [<ffffffff814abad0>] read_dev_sector+0x30/0xa0
[    1.480035]  [<ffffffff814b04cf>] read_lba+0x10f/0x190
[    1.480035]  [<ffffffff814b0baf>] efi_partition+0x11f/0x6c0
[    1.480035]  [<ffffffff814cb8e4>] ? snprintf+0x34/0x40
[    1.480035]  [<ffffffff814b0a90>] ? compare_gpts+0x290/0x290
[    1.480035]  [<ffffffff814acc7c>] check_partition+0xfc/0x200
[    1.480035]  [<ffffffff814ac8ca>] rescan_partitions+0x8a/0x2c0
[    1.480035]  [<ffffffff811e8804>] __blkdev_get+0x4b4/0x4e0
[    1.480035]  [<ffffffff811e8b8d>] blkdev_get+0x35d/0x3d0
[    1.480035]  [<ffffffff811c8c67>] ? unlock_new_inode+0x77/0x90
[    1.480035]  [<ffffffff811e72bf>] ? bdget+0x11f/0x140
[    1.480035]  [<ffffffff814a9537>] ? disk_get_part+0x17/0xa0
[    1.480035]  [<ffffffff814a9efd>] add_disk+0x2ed/0x4a0
[    1.480035]  [<ffffffff819a3a79>] sd_probe_async+0x129/0x1f0
[    1.480035]  [<ffffffff810b7b1c>] async_run_entry_fn+0xdc/0x1a0
[    1.480035]  [<ffffffff810a938d>] process_one_work+0x2dd/0x560
[    1.480035]  [<ffffffff810a9320>] ? process_one_work+0x270/0x560
[    1.480035]  [<ffffffff810a9e79>] ? worker_thread+0x59/0x3a0
[    1.480035]  [<ffffffff810b7a40>] ? async_schedule+0x20/0x20
[    1.480035]  [<ffffffff810aa09a>] worker_thread+0x27a/0x3a0
[    1.480035]  [<ffffffff810ea7ad>] ? trace_hardirqs_on+0xd/0x10
[    1.480035]  [<ffffffff810a9e20>] ? manage_workers+0x280/0x280
[    1.480035]  [<ffffffff810af888>] kthread+0xe8/0xf0
[    1.480035]  [<ffffffff810af7a0>] ? __init_kthread_worker+0x70/0x70
[    1.480035]  [<ffffffff8217ef5c>] ret_from_fork+0x7c/0xb0
[    1.480035]  [<ffffffff810af7a0>] ? __init_kthread_worker+0x70/0x70


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-29  3:44                                                               ` Yinghai Lu
@ 2013-01-31 19:28                                                                 ` Shuah Khan
  2013-01-31 19:35                                                                   ` H. Peter Anvin
  0 siblings, 1 reply; 199+ messages in thread
From: Shuah Khan @ 2013-01-31 19:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk, Eric W. Biederman,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	Joerg Roedel

On Mon, Jan 28, 2013 at 8:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Jan 28, 2013 at 6:27 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>> On Thu, Jan 24, 2013 at 2:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> Your for-x86-boot git boots on AMD system I have. However, with
>> memmap=4095$1M option, it panics very early in boot. I don't have
>> physical access to the console and I will try to get you the panic
>> information tomorrow.
>
> panic is good, but it is supposed some kind of late..
>

It is a very early panic; however, it is not specific to your git
tree. It also happens on 3.4 and 3.8-rc4. You can disregard the early
panic for your git tree.

-- Shuah


* Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
  2013-01-31 19:28                                                                 ` Shuah Khan
@ 2013-01-31 19:35                                                                   ` H. Peter Anvin
  0 siblings, 0 replies; 199+ messages in thread
From: H. Peter Anvin @ 2013-01-31 19:35 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Yinghai Lu, Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk,
	Eric W. Biederman, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Jan Kiszka, Jason Wessel, linux-kernel,
	Joerg Roedel

On 01/31/2013 11:28 AM, Shuah Khan wrote:
> On Mon, Jan 28, 2013 at 8:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Mon, Jan 28, 2013 at 6:27 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
>>> On Thu, Jan 24, 2013 at 2:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>
>>> Your for-x86-boot git boots on AMD system I have. However, with
>>> memmap=4095$1M option, it panics very early in boot. I don't have
>>> physical access to the console and I will try to get you the panic
>>> information tomorrow.
>>
>> panic is good, but it is supposed some kind of late..
>>
> 
> It is a very early panic, however it is not specific to your git. It
> happens on 3.4, 3.8-rc4. You can disregard the early panic for your
> git.
> 

Knocking out 4095 bytes is kind of an unusual action; however, assuming
the kernel is running at 16 MiB as normal, it should just work...

memmap=4095M$1M as was shown later is another matter... there the kernel
will be living inside the reserved region, and yes, at that point it is
panic time since there is nothing the kernel can do about it.

	-hpa
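
The difference between the two options is only the "M" suffix on the size,
which can be checked with a little arithmetic. The following is a minimal
sketch, not kernel code: it assumes the memmap=nn[KMG]$ss[KMG] reserve
syntax, takes the 16 MiB load address from the message above, and handles
only decimal and hex numbers. The helper names (parse_size,
parse_memmap_reserve, reserves_address) are hypothetical.

```python
import re

# Binary suffix -> shift, in the style of K/M/G size suffixes (assumption:
# only these suffixes, and only decimal or 0x-prefixed hex numbers).
_SUFFIX_SHIFT = {"": 0, "K": 10, "M": 20, "G": 30, "T": 40}

# Load address taken from the message above ("running at 16 MiB as normal").
KERNEL_LOAD_ADDR = 16 << 20


def parse_size(text):
    """Parse a size such as '4095', '1M' or '0x1000K' into bytes."""
    m = re.fullmatch(r"(0[xX][0-9a-fA-F]+|[0-9]+)([KMGTkmgt]?)", text)
    if m is None:
        raise ValueError("cannot parse size: %r" % text)
    number, suffix = m.group(1), m.group(2).upper()
    base = 16 if number.lower().startswith("0x") else 10
    return int(number, base) << _SUFFIX_SHIFT[suffix]


def parse_memmap_reserve(value):
    """Split 'size$start' into (start, size), both in bytes."""
    size_text, sep, start_text = value.partition("$")
    if not sep:
        raise ValueError("expected size$start, got %r" % value)
    return parse_size(start_text), parse_size(size_text)


def reserves_address(value, addr):
    """True if addr lies in the half-open range [start, start + size)."""
    start, size = parse_memmap_reserve(value)
    return start <= addr < start + size


if __name__ == "__main__":
    for option in ("4095$1M", "4095M$1M"):
        start, size = parse_memmap_reserve(option)
        print("memmap=%s reserves [%#x, %#x), covers 16 MiB: %s"
              % (option, start, start + size,
                 reserves_address(option, KERNEL_LOAD_ADDR)))
```

With memmap=4095$1M the reserved range is 4095 bytes starting at 1 MiB and
ends far below 16 MiB; with memmap=4095M$1M it runs from 1 MiB up to 4 GiB
and contains the load address.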





Thread overview: 199+ messages
2013-01-04  0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking Yinghai Lu
2013-01-04  7:17   ` Borislav Petkov
2013-01-04 21:50     ` Yinghai Lu
2013-01-05 13:05       ` Borislav Petkov
2013-01-15 12:27   ` Stefano Stabellini
2013-01-04  0:48 ` [PATCH v7u1 02/31] x86, 64bit, mm: make pgd next calculation consistent with pud/pmd Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early Yinghai Lu
2013-01-04 20:15   ` Borislav Petkov
2013-01-04 20:58     ` Yinghai Lu
2013-01-04 21:04       ` Borislav Petkov
2013-01-04 22:13         ` Yinghai Lu
2013-01-05 13:25           ` Borislav Petkov
2013-01-07 12:40             ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 04/31] x86, 64bit, mm: add generic kernel/ident mapping helper Yinghai Lu
2013-01-04 21:19   ` Borislav Petkov
2013-01-04 22:19     ` Yinghai Lu
2013-01-05 13:21       ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 05/31] x86, 64bit: copy zero-page early Yinghai Lu
2013-01-07 15:53   ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly Yinghai Lu
2013-01-04 17:18   ` Sakkinen, Jarkko
2013-01-04 22:01     ` Yinghai Lu
2013-01-05  9:59       ` Sakkinen, Jarkko
2013-01-07 15:54   ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 07/31] x86, realmode: Separate real_mode reserve and setup Yinghai Lu
2013-01-04 17:18   ` Sakkinen, Jarkko
2013-01-07 15:54   ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table Yinghai Lu
2013-01-07 15:55   ` Borislav Petkov
2013-01-10  1:56     ` Yinghai Lu
2013-01-10 12:19       ` Borislav Petkov
2013-01-10 17:05         ` Yinghai Lu
2013-01-10 20:27           ` Borislav Petkov
2013-01-12 22:04             ` H. Peter Anvin
2013-01-04  0:48 ` [PATCH v7u1 09/31] x86, 64bit: #PF handler set page to cover 2M only Yinghai Lu
2013-01-09 22:57   ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path Yinghai Lu
2013-01-11 12:13   ` Borislav Petkov
2013-01-11 16:42     ` Yinghai Lu
2013-01-11 16:52       ` Borislav Petkov
2013-01-15 13:48   ` Stefano Stabellini
2013-01-15 15:22     ` Konrad Rzeszutek Wilk
2013-01-15 15:59       ` Stefano Stabellini
2013-01-15 16:37     ` Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 11/31] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 12/31] x86: add get_ramdisk_image/size() Yinghai Lu
2013-01-07 15:56   ` Borislav Petkov
2013-01-10  1:53     ` Yinghai Lu
2013-01-10 12:13       ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 13/31] x86, boot: add get_cmd_line_ptr() Yinghai Lu
2013-01-07 15:56   ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 14/31] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
2013-01-07 16:00   ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 15/31] x86, boot: pass cmd_line_ptr with unsigned long instead Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 16/31] x86, boot: move verify_cpu.S and no_longmode down Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 17/31] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 18/31] x86, kexec: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 19/31] x86, kexec: set ident mapping for kernel that is above max_pfn Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page Yinghai Lu
2013-01-04 21:01   ` Borislav Petkov
2013-01-04 22:04     ` Yinghai Lu
2013-01-05 13:24       ` Borislav Petkov
2013-01-10  1:26         ` Yinghai Lu
2013-01-10 11:59           ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram Yinghai Lu
2013-01-13 12:56   ` Borislav Petkov
2013-01-14  5:46     ` Yinghai Lu
2013-01-14  9:53       ` Borislav Petkov
2013-01-14 18:17         ` Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
2013-01-13 21:41   ` Borislav Petkov
2013-01-14  5:37     ` Yinghai Lu
2013-01-14  9:43       ` Borislav Petkov
2013-01-14 23:06         ` Yinghai Lu
2013-01-14 17:49       ` H. Peter Anvin
2013-01-14 18:57         ` Yinghai Lu
2013-01-14 18:59           ` H. Peter Anvin
2013-01-14 19:19             ` Yinghai Lu
2013-01-14 19:50             ` Yinghai Lu
2013-01-14 19:56               ` H. Peter Anvin
2013-01-14 20:05                 ` Yinghai Lu
2013-01-15  6:17                   ` Yinghai Lu
2013-01-15 15:50                     ` Borislav Petkov
2013-01-15 16:03                       ` Yinghai Lu
2013-01-15 16:48                         ` Borislav Petkov
2013-01-15 18:43                           ` Yinghai Lu
2013-01-15 19:49                             ` Borislav Petkov
2013-01-15 20:16                               ` Yinghai Lu
2013-01-15 20:28                                 ` Borislav Petkov
2013-01-14 20:05           ` Borislav Petkov
2013-01-14 20:14             ` Yinghai Lu
2013-01-14 20:26               ` Borislav Petkov
2013-01-14 22:38                 ` Yinghai Lu
2013-01-14 23:11                   ` Borislav Petkov
2013-01-15  1:04                     ` Yinghai Lu
2013-01-14 23:10     ` H. Peter Anvin
2013-01-14 23:21       ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image Yinghai Lu
2013-01-14 11:20   ` Borislav Petkov
2013-01-14 18:35     ` Yinghai Lu
2013-01-14 18:37       ` Yinghai Lu
2013-01-14 18:46         ` Borislav Petkov
2013-01-14 20:01           ` Yinghai Lu
2013-01-14 18:43       ` Borislav Petkov
2013-01-04  0:48 ` [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data Yinghai Lu
2013-01-14 11:26   ` Borislav Petkov
2013-01-14 17:37     ` H. Peter Anvin
2013-01-14 18:04       ` Borislav Petkov
2013-01-14 18:42         ` H. Peter Anvin
2013-01-04  0:48 ` [PATCH v7u1 25/31] memblock: add memblock_mem_size() Yinghai Lu
2013-01-14 20:42   ` H. Peter Anvin
2013-01-14 22:28     ` Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it Yinghai Lu
2013-01-04 16:05   ` Konrad Rzeszutek Wilk
2013-01-04 19:57     ` Yinghai Lu
2013-01-04 17:50   ` Shuah Khan
2013-01-04 20:34     ` Yinghai Lu
2013-01-04 21:02       ` Shuah Khan
2013-01-04 22:10         ` Yinghai Lu
2013-01-04 22:26           ` Shuah Khan
2013-01-04 22:34             ` Yinghai Lu
2013-01-04 22:47             ` Eric W. Biederman
2013-01-04 22:56               ` Shuah Khan
2013-01-04 23:00                 ` Yinghai Lu
2013-01-04 23:21                   ` Shuah Khan
2013-01-04 23:55                     ` Yinghai Lu
2013-01-05  2:02                       ` Shuah Khan
2013-01-05  4:10                         ` Yinghai Lu
2013-01-05 22:04                           ` Shuah Khan
2013-01-04 22:58               ` Yinghai Lu
2013-01-07 15:26           ` Konrad Rzeszutek Wilk
2013-01-07 17:02             ` Shuah Khan
2013-01-07 19:29               ` Konrad Rzeszutek Wilk
2013-01-08  2:22               ` Eric W. Biederman
2013-01-08  2:48                 ` Konrad Rzeszutek Wilk
2013-01-08  3:03                   ` Eric W. Biederman
2013-01-08  3:01                 ` Yinghai Lu
2013-01-08  3:13                   ` Eric W. Biederman
2013-01-08  3:50                     ` Yinghai Lu
2013-01-08 23:40                       ` Yinghai Lu
2013-01-09  0:04                         ` Eric W. Biederman
2013-01-09  0:43                         ` Konrad Rzeszutek Wilk
2013-01-09  0:56                           ` Yinghai Lu
2013-01-09  0:58                           ` Eric W. Biederman
2013-01-09  1:07                             ` Yinghai Lu
2013-01-09  1:12                               ` Yinghai Lu
2013-01-09  2:31                                 ` Eric W. Biederman
2013-01-09 13:24                                 ` Konrad Rzeszutek Wilk
2013-01-09 17:27                                   ` Yinghai Lu
2013-01-09 18:01                                     ` Shuah Khan
2013-01-09 19:13                                       ` Yinghai Lu
2013-01-09 21:00                                     ` Eric W. Biederman
2013-01-09 21:15                                       ` Yinghai Lu
2013-01-10 23:07                                         ` Yinghai Lu
2013-01-10 23:15                                           ` Eric W. Biederman
2013-01-10 23:55                                             ` Yinghai Lu
2013-01-11 16:35                                           ` Konrad Rzeszutek Wilk
2013-01-11 16:52                                             ` Yinghai Lu
2013-01-11 17:49                                               ` Yinghai Lu
2013-01-15  6:19                                                 ` Yinghai Lu
2013-01-18 15:55                                                   ` Konrad Rzeszutek Wilk
2013-01-24 15:39                                                     ` Konrad Rzeszutek Wilk
2013-01-24 16:51                                                       ` Shuah Khan
2013-01-24 19:22                                                         ` Shuah Khan
2013-01-24 21:50                                                           ` Yinghai Lu
2013-01-29  2:27                                                             ` Shuah Khan
2013-01-29  3:44                                                               ` Yinghai Lu
2013-01-31 19:28                                                                 ` Shuah Khan
2013-01-31 19:35                                                                   ` H. Peter Anvin
2013-01-09 13:12                             ` Konrad Rzeszutek Wilk
2013-01-07 20:32             ` Yinghai Lu
2013-01-07 21:30               ` Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 27/31] x86, kdump: remove crashkernel range find limit for 64bit Yinghai Lu
2013-01-14 15:43   ` Borislav Petkov
2013-01-14 18:18     ` Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 28/31] x86: add Crash kernel low reservation Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 29/31] x86: Merge early kernel reserve for 32bit and 64bit Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 30/31] x86, 64bit, mm: Mark data/bss/brk to nx Yinghai Lu
2013-01-04  0:48 ` [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init Yinghai Lu
2013-01-04 11:43   ` Rafael J. Wysocki
2013-01-04 21:59     ` Yinghai Lu
2013-01-04 22:07       ` Rafael J. Wysocki
2013-01-04  7:09 ` [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Borislav Petkov
2013-01-04 21:44   ` Yinghai Lu
2013-01-14 20:45 ` H. Peter Anvin
2013-01-14 22:44   ` Yinghai Lu
2013-01-14 23:16     ` H. Peter Anvin
2013-01-14 23:39       ` David Woodhouse
2013-01-14 23:50         ` H. Peter Anvin
2013-01-15  0:12           ` David Woodhouse
2013-01-15 12:19 ` Stefano Stabellini
2013-01-15 16:43   ` Yinghai Lu
2013-01-15 19:28     ` Yinghai Lu
2013-01-16 11:32       ` Stefano Stabellini
2013-01-16 17:31         ` Yinghai Lu
2013-01-16 17:38           ` H. Peter Anvin
2013-01-16 18:20             ` Yinghai Lu
2013-01-17  2:35               ` Yinghai Lu
