linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/11] x86/microcode: Early load microcode
@ 2012-11-30  1:47 Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 01/10] Documentation/x86: " Fenghua Yu
                   ` (9 more replies)
  0 siblings, 10 replies; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

The problem with the current microcode loading method is that we load microcode
way, way too late; ideally we should load it before turning paging on.  This may
only be practical on 32 bits, since we can't get to 64-bit mode without paging
on, but we should still do it as early as possible.

Similarly, we should load the microcode update as early as possible during AP
bringup and when processors are brought back online after hotplug or S3/S4.

In order to do that, the microcode patch needs to be permanently present in
kernel memory.  Each individual patch is fairly small, so that is OK, but the
entire blob with support for every CPU is too big. Since only CPUs of the same
model can be in the same platform, we store only the microcode patches that
match the BSP's model. Later on, APs can load microcode from the saved
microcode patches.
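
To illustrate the "same model as BSP" filter, here is a rough userspace sketch.
It is not code from this series, and cpu_family(), cpu_model() and same_model()
are hypothetical names. It assumes the usual CPUID(1).EAX signature layout
(stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16,
extended family in 27:20):

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the display family from a CPUID(1).EAX signature. */
static unsigned int cpu_family(uint32_t sig)
{
	unsigned int family = (sig >> 8) & 0xf;

	if (family == 0xf)
		family += (sig >> 20) & 0xff;	/* add extended family */
	return family;
}

/* Decode the display model; extended model bits apply to families 6 and 15. */
static unsigned int cpu_model(uint32_t sig)
{
	unsigned int model = (sig >> 4) & 0xf;
	unsigned int base_family = (sig >> 8) & 0xf;

	if (base_family == 0x6 || base_family == 0xf)
		model |= ((sig >> 16) & 0xf) << 4;	/* prepend extended model */
	return model;
}

/* Hypothetical filter: keep a patch only if it targets the BSP's model. */
static bool same_model(uint32_t patch_sig, uint32_t bsp_sig)
{
	return cpu_family(patch_sig) == cpu_family(bsp_sig) &&
	       cpu_model(patch_sig) == cpu_model(bsp_sig);
}
```

Two signatures that differ only in stepping compare equal here, which is why
several patches may need to be kept for one platform.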

Note, however, that Linux users have gotten used to being able to install a
microcode patch in the field without a reboot; we support that model too.

v2: Detect vendor before loading microcode. Move some functions from
microcode_intel_early.c to microcode_intel_lib.c. Change some early loading
microcode dependencies in Kconfig. Reword doc.

Fenghua Yu (10):
  Documentation/x86: Early load microcode
  x86/microcode_intel.h: Define functions and macros for early load
    ucode
  x86/microcode_core_early.c: Define interfaces for early load ucode
  x86/microcode_intel_lib.c: Early update ucode on Intel's CPU
  x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  x86/head_32.S: Early update ucode in 32-bit
  x86/head64.c: Early update ucode in 64-bit
  x86/smpboot.c: Early update ucode on AP
  x86/mm/init.c: Copy ucode from initrd image to memory
  x86/Kconfig: Configurations to enable/disable the feature

 Documentation/x86/early-microcode.txt   |   43 +++
 arch/x86/Kconfig                        |   18 ++
 arch/x86/include/asm/microcode.h        |   23 ++
 arch/x86/include/asm/microcode_intel.h  |  106 ++++++++
 arch/x86/kernel/Makefile                |    3 +
 arch/x86/kernel/head64.c                |    6 +
 arch/x86/kernel/head_32.S               |    6 +
 arch/x86/kernel/microcode_core.c        |    7 +-
 arch/x86/kernel/microcode_core_early.c  |   70 +++++
 arch/x86/kernel/microcode_intel.c       |  185 +-------------
 arch/x86/kernel/microcode_intel_early.c |  438 +++++++++++++++++++++++++++++++
 arch/x86/kernel/microcode_intel_lib.c   |  174 ++++++++++++
 arch/x86/kernel/smpboot.c               |    7 +
 arch/x86/mm/init.c                      |   10 +
 14 files changed, 915 insertions(+), 181 deletions(-)
 create mode 100644 Documentation/x86/early-microcode.txt
 create mode 100644 arch/x86/include/asm/microcode_intel.h
 create mode 100644 arch/x86/kernel/microcode_core_early.c
 create mode 100644 arch/x86/kernel/microcode_intel_early.c
 create mode 100644 arch/x86/kernel/microcode_intel_lib.c

-- 
1.7.2



* [PATCH v2 01/10] Documentation/x86: Early load microcode
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-11-30 19:46   ` H. Peter Anvin
  2012-11-30  1:47 ` [PATCH v2 02/10] x86/microcode_intel.h: Define functions and macros for early loading ucode Fenghua Yu
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Documentation for the early microcode loading method.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 Documentation/x86/early-microcode.txt |   43 +++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/x86/early-microcode.txt

diff --git a/Documentation/x86/early-microcode.txt b/Documentation/x86/early-microcode.txt
new file mode 100644
index 0000000..9b3a6ab
--- /dev/null
+++ b/Documentation/x86/early-microcode.txt
@@ -0,0 +1,43 @@
+Early load microcode
+====================
+By Fenghua Yu <fenghua.yu@intel.com>
+
+The kernel can update microcode early in boot. Loading microcode early can fix
+CPU issues before they are observed during kernel boot.
+
+Microcode is stored in an initrd file. The microcode is read from the initrd
+file and loaded to CPUs during boot time.
+
+The format of the combined initrd image is microcode in cpio format followed by
+the initrd image (maybe compressed). Kernel parses the combined initrd image
+during boot time. The microcode file in cpio name space is:
+kernel/x86/microcode/GenuineIntel.bin
+
+During BSP boot (before SMP starts), if the kernel finds the microcode file in
+the initrd file, it parses the microcode and saves matching microcode in memory.
+If matching microcode is found, it will be uploaded in BSP and later on in all
+APs.
+
+The cached microcode patch is applied when CPUs resume from a sleep state.
+
+There are two legacy user space interfaces to load microcode, either through
+/dev/cpu/microcode or through /sys/devices/system/cpu/microcode/reload file
+in sysfs.
+
+In addition to these two legacy methods, the early loading method described
+here is the third method with which microcode can be uploaded to a system's
+CPUs.
+
+The following example script shows how to generate a new combined initrd file in
+/boot/initrd-3.5.0.ucode.img with original microcode microcode.hex and
+original initrd image /boot/initrd-3.5.0.img.
+
+mkdir -p initrd/kernel/x86/microcode/GenuineIntel
+cd initrd
+cp ../microcode.hex kernel/x86/microcode/GenuineIntel/microcode.hex
+find .|cpio -oc >../ucode.cpio
+cd ..
+cat ucode.cpio /boot/initrd-3.5.0.img >/boot/initrd-3.5.0.ucode.img
+
+The generated /boot/initrd-3.5.0.ucode.img can be used as initrd file to load
+microcode during early boot time.
-- 
1.7.2



* [PATCH v2 02/10] x86/microcode_intel.h: Define functions and macros for early loading ucode
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 01/10] Documentation/x86: " Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-12-01  0:21   ` [tip:x86/microcode] " tip-bot for Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 03/10] x86/microcode_core_early.c: Define interfaces " Fenghua Yu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Define some functions and macros that will be used for early ucode loading. Some
of them are moved out of the microcode_intel.c driver so that they can be called
in the early boot phase, before the module can be loaded.
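
For readers unfamiliar with the size fallbacks in get_datasize() and
get_totalsize(), the following userspace sketch mirrors them. The names
ucode_header, ucode_datasize() and ucode_totalsize() are hypothetical and only
stand in for the kernel definitions in the patch below:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct microcode_header_intel. */
struct ucode_header {
	uint32_t hdrver;
	uint32_t rev;
	uint32_t date;
	uint32_t sig;
	uint32_t cksum;
	uint32_t ldrver;
	uint32_t pf;
	uint32_t datasize;
	uint32_t totalsize;
	uint32_t reserved[3];
};

#define UCODE_DEFAULT_DATASIZE	2000u
#define UCODE_HEADER_SIZE	sizeof(struct ucode_header)

/* A zero datasize field means the default 2000-byte payload. */
static size_t ucode_datasize(const struct ucode_header *hdr)
{
	return hdr->datasize ? hdr->datasize : UCODE_DEFAULT_DATASIZE;
}

/* A zero totalsize field means the default payload plus the header. */
static size_t ucode_totalsize(const struct ucode_header *hdr)
{
	return hdr->totalsize ? hdr->totalsize
			      : UCODE_DEFAULT_DATASIZE + UCODE_HEADER_SIZE;
}
```

The header is twelve 32-bit words, i.e. 48 bytes, so an update whose size
fields are both zero is treated as 2048 bytes in total.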

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/microcode_intel.h |  106 ++++++++++++++++++
 arch/x86/kernel/Makefile               |    3 +
 arch/x86/kernel/microcode_core.c       |    7 +-
 arch/x86/kernel/microcode_intel.c      |  185 ++------------------------------
 4 files changed, 120 insertions(+), 181 deletions(-)
 create mode 100644 arch/x86/include/asm/microcode_intel.h

diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
new file mode 100644
index 0000000..0544bf4
--- /dev/null
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -0,0 +1,106 @@
+#ifndef _ASM_X86_MICROCODE_INTEL_H
+#define _ASM_X86_MICROCODE_INTEL_H
+
+#include <asm/microcode.h>
+
+struct microcode_header_intel {
+	unsigned int            hdrver;
+	unsigned int            rev;
+	unsigned int            date;
+	unsigned int            sig;
+	unsigned int            cksum;
+	unsigned int            ldrver;
+	unsigned int            pf;
+	unsigned int            datasize;
+	unsigned int            totalsize;
+	unsigned int            reserved[3];
+};
+
+struct microcode_intel {
+	struct microcode_header_intel hdr;
+	unsigned int            bits[0];
+};
+
+/* microcode format is extended from prescott processors */
+struct extended_signature {
+	unsigned int            sig;
+	unsigned int            pf;
+	unsigned int            cksum;
+};
+
+struct extended_sigtable {
+	unsigned int            count;
+	unsigned int            cksum;
+	unsigned int            reserved[3];
+	struct extended_signature sigs[0];
+};
+
+#define DEFAULT_UCODE_DATASIZE	(2000)
+#define MC_HEADER_SIZE		(sizeof(struct microcode_header_intel))
+#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
+#define EXT_HEADER_SIZE		(sizeof(struct extended_sigtable))
+#define EXT_SIGNATURE_SIZE	(sizeof(struct extended_signature))
+#define DWSIZE			(sizeof(u32))
+
+#define get_totalsize(mc) \
+	(((struct microcode_intel *)mc)->hdr.totalsize ? \
+	 ((struct microcode_intel *)mc)->hdr.totalsize : \
+	 DEFAULT_UCODE_TOTALSIZE)
+
+#define get_datasize(mc) \
+	(((struct microcode_intel *)mc)->hdr.datasize ? \
+	 ((struct microcode_intel *)mc)->hdr.datasize : DEFAULT_UCODE_DATASIZE)
+
+#define sigmatch(s1, s2, p1, p2) \
+	(((s1) == (s2)) && (((p1) & (p2)) || (((p1) == 0) && ((p2) == 0))))
+
+#define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
+
+extern int
+get_matching_microcode(unsigned int csig, int cpf, void *mc, int rev);
+extern int microcode_sanity_check(void *mc, int print_err);
+extern int get_matching_sig(unsigned int csig, int cpf, void *mc, int rev);
+extern int
+update_match_revision(struct microcode_header_intel *mc_header, int rev);
+
+#ifdef CONFIG_MICROCODE_INTEL_EARLY
+extern enum ucode_state
+get_matching_model_microcode(int cpu, void *data, size_t size,
+			     struct mc_saved_data *mc_saved_data,
+			     struct microcode_intel **mc_saved_in_initrd,
+			     enum system_states system_state);
+extern enum ucode_state
+generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
+			     unsigned int mc_saved_count,
+			     struct ucode_cpu_info *uci);
+extern void __init
+load_ucode_intel_bsp(char *real_mode_data);
+extern void __init load_ucode_intel_ap(void);
+#else
+static inline enum ucode_state
+get_matching_model_microcode(int cpu, void *data, size_t size,
+			     struct mc_saved_data *mc_saved_data,
+			     struct microcode_intel **mc_saved_in_initrd,
+			     enum system_states system_state)
+{
+	return UCODE_ERROR;
+}
+static inline enum ucode_state
+generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
+			     unsigned int mc_saved_count,
+			     struct ucode_cpu_info *uci)
+{
+	return UCODE_ERROR;
+}
+static inline __init void
+load_ucode_intel_bsp(char *real_mode_data)
+{
+}
+/* Stub must take no arguments, like the real load_ucode_intel_ap(). */
+static inline __init void
+load_ucode_intel_ap(void)
+{
+}
+#endif
+
+#endif /* _ASM_X86_MICROCODE_INTEL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 9fd5eed..52b992a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -87,6 +87,9 @@ obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
 
+obj-$(CONFIG_MICROCODE_EARLY)		+= microcode_core_early.o
+obj-$(CONFIG_MICROCODE_INTEL_EARLY)	+= microcode_intel_early.o
+obj-$(CONFIG_MICROCODE_INTEL_LIB)	+= microcode_intel_lib.o
 microcode-y				:= microcode_core.o
 microcode-$(CONFIG_MICROCODE_INTEL)	+= microcode_intel.o
 microcode-$(CONFIG_MICROCODE_AMD)	+= microcode_amd.o
diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index 3a04b22..22db92b 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -364,10 +364,7 @@ static struct attribute_group mc_attr_group = {
 
 static void microcode_fini_cpu(int cpu)
 {
-	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
-
 	microcode_ops->microcode_fini_cpu(cpu);
-	uci->valid = 0;
 }
 
 static enum ucode_state microcode_resume_cpu(int cpu)
@@ -383,6 +380,10 @@ static enum ucode_state microcode_resume_cpu(int cpu)
 static enum ucode_state microcode_init_cpu(int cpu, bool refresh_fw)
 {
 	enum ucode_state ustate;
+	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
+
+	if (uci && uci->valid)
+		return UCODE_OK;
 
 	if (collect_cpu_info(cpu))
 		return UCODE_ERROR;
diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c
index 3544aed..f3ddf8e 100644
--- a/arch/x86/kernel/microcode_intel.c
+++ b/arch/x86/kernel/microcode_intel.c
@@ -79,7 +79,7 @@
 #include <linux/module.h>
 #include <linux/vmalloc.h>
 
-#include <asm/microcode.h>
+#include <asm/microcode_intel.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
 
@@ -87,59 +87,6 @@ MODULE_DESCRIPTION("Microcode Update Driver");
 MODULE_AUTHOR("Tigran Aivazian <tigran@aivazian.fsnet.co.uk>");
 MODULE_LICENSE("GPL");
 
-struct microcode_header_intel {
-	unsigned int            hdrver;
-	unsigned int            rev;
-	unsigned int            date;
-	unsigned int            sig;
-	unsigned int            cksum;
-	unsigned int            ldrver;
-	unsigned int            pf;
-	unsigned int            datasize;
-	unsigned int            totalsize;
-	unsigned int            reserved[3];
-};
-
-struct microcode_intel {
-	struct microcode_header_intel hdr;
-	unsigned int            bits[0];
-};
-
-/* microcode format is extended from prescott processors */
-struct extended_signature {
-	unsigned int            sig;
-	unsigned int            pf;
-	unsigned int            cksum;
-};
-
-struct extended_sigtable {
-	unsigned int            count;
-	unsigned int            cksum;
-	unsigned int            reserved[3];
-	struct extended_signature sigs[0];
-};
-
-#define DEFAULT_UCODE_DATASIZE	(2000)
-#define MC_HEADER_SIZE		(sizeof(struct microcode_header_intel))
-#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
-#define EXT_HEADER_SIZE		(sizeof(struct extended_sigtable))
-#define EXT_SIGNATURE_SIZE	(sizeof(struct extended_signature))
-#define DWSIZE			(sizeof(u32))
-
-#define get_totalsize(mc) \
-	(((struct microcode_intel *)mc)->hdr.totalsize ? \
-	 ((struct microcode_intel *)mc)->hdr.totalsize : \
-	 DEFAULT_UCODE_TOTALSIZE)
-
-#define get_datasize(mc) \
-	(((struct microcode_intel *)mc)->hdr.datasize ? \
-	 ((struct microcode_intel *)mc)->hdr.datasize : DEFAULT_UCODE_DATASIZE)
-
-#define sigmatch(s1, s2, p1, p2) \
-	(((s1) == (s2)) && (((p1) & (p2)) || (((p1) == 0) && ((p2) == 0))))
-
-#define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
-
 static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 {
 	struct cpuinfo_x86 *c = &cpu_data(cpu_num);
@@ -162,128 +109,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 	return 0;
 }
 
-static inline int update_match_cpu(struct cpu_signature *csig, int sig, int pf)
-{
-	return (!sigmatch(sig, csig->sig, pf, csig->pf)) ? 0 : 1;
-}
-
-static inline int
-update_match_revision(struct microcode_header_intel *mc_header, int rev)
-{
-	return (mc_header->rev <= rev) ? 0 : 1;
-}
-
-static int microcode_sanity_check(void *mc)
-{
-	unsigned long total_size, data_size, ext_table_size;
-	struct microcode_header_intel *mc_header = mc;
-	struct extended_sigtable *ext_header = NULL;
-	int sum, orig_sum, ext_sigcount = 0, i;
-	struct extended_signature *ext_sig;
-
-	total_size = get_totalsize(mc_header);
-	data_size = get_datasize(mc_header);
-
-	if (data_size + MC_HEADER_SIZE > total_size) {
-		pr_err("error! Bad data size in microcode data file\n");
-		return -EINVAL;
-	}
-
-	if (mc_header->ldrver != 1 || mc_header->hdrver != 1) {
-		pr_err("error! Unknown microcode update format\n");
-		return -EINVAL;
-	}
-	ext_table_size = total_size - (MC_HEADER_SIZE + data_size);
-	if (ext_table_size) {
-		if ((ext_table_size < EXT_HEADER_SIZE)
-		 || ((ext_table_size - EXT_HEADER_SIZE) % EXT_SIGNATURE_SIZE)) {
-			pr_err("error! Small exttable size in microcode data file\n");
-			return -EINVAL;
-		}
-		ext_header = mc + MC_HEADER_SIZE + data_size;
-		if (ext_table_size != exttable_size(ext_header)) {
-			pr_err("error! Bad exttable size in microcode data file\n");
-			return -EFAULT;
-		}
-		ext_sigcount = ext_header->count;
-	}
-
-	/* check extended table checksum */
-	if (ext_table_size) {
-		int ext_table_sum = 0;
-		int *ext_tablep = (int *)ext_header;
-
-		i = ext_table_size / DWSIZE;
-		while (i--)
-			ext_table_sum += ext_tablep[i];
-		if (ext_table_sum) {
-			pr_warning("aborting, bad extended signature table checksum\n");
-			return -EINVAL;
-		}
-	}
-
-	/* calculate the checksum */
-	orig_sum = 0;
-	i = (MC_HEADER_SIZE + data_size) / DWSIZE;
-	while (i--)
-		orig_sum += ((int *)mc)[i];
-	if (orig_sum) {
-		pr_err("aborting, bad checksum\n");
-		return -EINVAL;
-	}
-	if (!ext_table_size)
-		return 0;
-	/* check extended signature checksum */
-	for (i = 0; i < ext_sigcount; i++) {
-		ext_sig = (void *)ext_header + EXT_HEADER_SIZE +
-			  EXT_SIGNATURE_SIZE * i;
-		sum = orig_sum
-			- (mc_header->sig + mc_header->pf + mc_header->cksum)
-			+ (ext_sig->sig + ext_sig->pf + ext_sig->cksum);
-		if (sum) {
-			pr_err("aborting, bad checksum\n");
-			return -EINVAL;
-		}
-	}
-	return 0;
-}
-
-/*
- * return 0 - no update found
- * return 1 - found update
- */
-static int
-get_matching_microcode(struct cpu_signature *cpu_sig, void *mc, int rev)
-{
-	struct microcode_header_intel *mc_header = mc;
-	struct extended_sigtable *ext_header;
-	unsigned long total_size = get_totalsize(mc_header);
-	int ext_sigcount, i;
-	struct extended_signature *ext_sig;
-
-	if (!update_match_revision(mc_header, rev))
-		return 0;
-
-	if (update_match_cpu(cpu_sig, mc_header->sig, mc_header->pf))
-		return 1;
-
-	/* Look for ext. headers: */
-	if (total_size <= get_datasize(mc_header) + MC_HEADER_SIZE)
-		return 0;
-
-	ext_header = mc + get_datasize(mc_header) + MC_HEADER_SIZE;
-	ext_sigcount = ext_header->count;
-	ext_sig = (void *)ext_header + EXT_HEADER_SIZE;
-
-	for (i = 0; i < ext_sigcount; i++) {
-		if (update_match_cpu(cpu_sig, ext_sig->sig, ext_sig->pf))
-			return 1;
-		ext_sig++;
-	}
-	return 0;
-}
-
-static int apply_microcode(int cpu)
+int apply_microcode(int cpu)
 {
 	struct microcode_intel *mc_intel;
 	struct ucode_cpu_info *uci;
@@ -338,6 +164,7 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 	unsigned int leftover = size;
 	enum ucode_state state = UCODE_OK;
 	unsigned int curr_mc_size = 0;
+	unsigned int csig, cpf;
 
 	while (leftover) {
 		struct microcode_header_intel mc_header;
@@ -362,11 +189,13 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 		}
 
 		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
-		    microcode_sanity_check(mc) < 0) {
+		    microcode_sanity_check(mc, 1) < 0) {
 			break;
 		}
 
-		if (get_matching_microcode(&uci->cpu_sig, mc, new_rev)) {
+		csig = uci->cpu_sig.sig;
+		cpf = uci->cpu_sig.pf;
+		if (get_matching_microcode(csig, cpf, mc, new_rev)) {
 			vfree(new_mc);
 			new_rev = mc_header.rev;
 			new_mc  = mc;
-- 
1.7.2



* [PATCH v2 03/10] x86/microcode_core_early.c: Define interfaces for early loading ucode
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 01/10] Documentation/x86: " Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 02/10] x86/microcode_intel.h: Define functions and macros for early loading ucode Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-12-01  0:23   ` [tip:x86/microcode] " tip-bot for Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 04/10] x86/microcode_intel_lib.c: Early update ucode on Intel's CPU Fenghua Yu
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Define interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on the BSP
and the APs early in boot. These are generic interfaces; internally they call
vendor-specific implementations.
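
The vendor check relies on CPUID leaf 0 returning the vendor string in EBX,
EDX, ECX, in that order. A minimal userspace sketch of that assembly step
follows; put_reg(), vendor_from_regs() and is_intel() are hypothetical helpers,
and the register values are passed in rather than read with CPUID so that the
sketch stays portable:

```c
#include <stdint.h>
#include <string.h>

/* Store the four bytes of a register, lowest byte first. */
static void put_reg(char *dst, uint32_t reg)
{
	int i;

	for (i = 0; i < 4; i++)
		dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* Build the 12-character vendor string from CPUID(0) EBX, EDX, ECX. */
static void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx,
			     char out[13])
{
	put_reg(out, ebx);
	put_reg(out + 4, edx);
	put_reg(out + 8, ecx);
	out[12] = '\0';
}

/* Hypothetical helper: nonzero if the registers spell "GenuineIntel". */
static int is_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
	char vendor[13];

	vendor_from_regs(ebx, edx, ecx, vendor);
	return strcmp(vendor, "GenuineIntel") == 0;
}
```

This is the same ordering that the native_cpuid() call in x86_vendor() gets by
writing EBX to offset 0, EDX to offset 4 and ECX to offset 8 of the buffer.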

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/microcode.h       |   23 ++++++++++
 arch/x86/kernel/microcode_core_early.c |   70 ++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/microcode_core_early.c

diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 43d921b..2e2ff3a 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -57,4 +57,27 @@ static inline struct microcode_ops * __init init_amd_microcode(void)
 static inline void __exit exit_amd_microcode(void) {}
 #endif
 
+struct mc_saved_data {
+	unsigned int mc_saved_count;
+	struct microcode_intel **mc_saved;
+	struct ucode_cpu_info *ucode_cpu_info;
+};
+#ifdef CONFIG_MICROCODE_EARLY
+#define MAX_UCODE_COUNT 128
+extern struct ucode_cpu_info ucode_cpu_info_early[NR_CPUS];
+extern struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT];
+extern struct mc_saved_data mc_saved_data;
+extern void __init load_ucode_bsp(char *real_mode_data);
+extern __init void load_ucode_ap(void);
+extern void __init
+save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
+			 struct microcode_intel **mc_saved_in_initrd);
+#else
+static inline void __init load_ucode_bsp(char *real_mode_data) {}
+static inline __init void load_ucode_ap(void) {}
+static inline void __init
+save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
+			 struct microcode_intel **mc_saved_in_initrd) {}
+#endif
+
 #endif /* _ASM_X86_MICROCODE_H */
diff --git a/arch/x86/kernel/microcode_core_early.c b/arch/x86/kernel/microcode_core_early.c
new file mode 100644
index 0000000..1c6cc8f
--- /dev/null
+++ b/arch/x86/kernel/microcode_core_early.c
@@ -0,0 +1,70 @@
+/*
+ *	X86 CPU microcode early update for Linux
+ *
+ *	Copyright (C) 2012 Fenghua Yu <fenghua.yu@intel.com>
+ *			   H Peter Anvin <hpa@zytor.com>
+ *
+ *	This driver allows early microcode updates on Intel processors
+ *	belonging to IA-32 family - PentiumPro, Pentium II,
+ *	Pentium III, Xeon, Pentium 4, etc.
+ *
+ *	Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
+ *	Software Developer's Manual.
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <asm/microcode_intel.h>
+#include <asm/processor.h>
+
+struct ucode_cpu_info	ucode_cpu_info_early[NR_CPUS];
+EXPORT_SYMBOL_GPL(ucode_cpu_info_early);
+
+static inline int __init x86_vendor(void)
+{
+	unsigned int eax = 0x00000000;
+	char x86_vendor_id[16];
+	int i;
+	struct {
+		char x86_vendor_id[16];
+		__u8 x86_vendor;
+	} cpu_vendor_table[] = {
+		{ "GenuineIntel", X86_VENDOR_INTEL },
+		{ "AuthenticAMD", X86_VENDOR_AMD },
+	};
+
+	memset(x86_vendor_id, 0, ARRAY_SIZE(x86_vendor_id));
+	/* Get vendor name */
+	native_cpuid(&eax,
+		(unsigned int *)&x86_vendor_id[0],
+		(unsigned int *)&x86_vendor_id[8],
+		(unsigned int *)&x86_vendor_id[4]);
+
+	for (i = 0; i < ARRAY_SIZE(cpu_vendor_table); i++) {
+		if (!strcmp(x86_vendor_id, cpu_vendor_table[i].x86_vendor_id))
+			return cpu_vendor_table[i].x86_vendor;
+	}
+
+	return X86_VENDOR_UNKNOWN;
+}
+
+
+void __init load_ucode_bsp(char *real_mode_data)
+{
+	/*
+	 * boot_cpu_data is not setup yet in this early phase.
+	 * So we get vendor information directly through cpuid.
+	 */
+	if (x86_vendor() == X86_VENDOR_INTEL)
+		load_ucode_intel_bsp(real_mode_data);
+}
+
+void __cpuinit load_ucode_ap(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		load_ucode_intel_ap();
+}
-- 
1.7.2



* [PATCH v2 04/10] x86/microcode_intel_lib.c: Early update ucode on Intel's CPU
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
                   ` (2 preceding siblings ...)
  2012-11-30  1:47 ` [PATCH v2 03/10] x86/microcode_core_early.c: Define interfaces " Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-12-01  0:24   ` [tip:x86/microcode] " tip-bot for Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 05/10] x86/microcode_intel_early.c: " Fenghua Yu
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Define interfaces microcode_sanity_check() and get_matching_microcode(). They
are called both during early boot and from the Intel microcode driver.
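
The core of the sanity check is that the 32-bit words of header plus data must
sum to zero. The sketch below shows that rule in isolation; dword_sum(),
checksum_ok() and seal_checksum() are hypothetical names, and unsigned
arithmetic is used here (the kernel code sums ints) so that wraparound is well
defined in portable C:

```c
#include <stddef.h>
#include <stdint.h>

/* Sum 32-bit words modulo 2^32. */
static uint32_t dword_sum(const uint32_t *words, size_t count)
{
	uint32_t sum = 0;

	while (count--)
		sum += words[count];
	return sum;
}

/* An image passes the check when all of its words sum to zero. */
static int checksum_ok(const uint32_t *words, size_t count)
{
	return dword_sum(words, count) == 0;
}

/* Hypothetical helper: fill the checksum slot so the image sums to zero. */
static void seal_checksum(uint32_t *words, size_t count, size_t cksum_index)
{
	words[cksum_index] = 0;
	words[cksum_index] = 0u - dword_sum(words, count);
}
```

Any single-bit change to a sealed image makes checksum_ok() fail, which is the
property microcode_sanity_check() depends on.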

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/microcode_intel_lib.c |  174 +++++++++++++++++++++++++++++++++
 1 files changed, 174 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/microcode_intel_lib.c

diff --git a/arch/x86/kernel/microcode_intel_lib.c b/arch/x86/kernel/microcode_intel_lib.c
new file mode 100644
index 0000000..ce69320
--- /dev/null
+++ b/arch/x86/kernel/microcode_intel_lib.c
@@ -0,0 +1,174 @@
+/*
+ *	Intel CPU Microcode Update Driver for Linux
+ *
+ *	Copyright (C) 2012 Fenghua Yu <fenghua.yu@intel.com>
+ *			   H Peter Anvin <hpa@zytor.com>
+ *
+ *	This driver allows microcode updates on Intel processors
+ *	belonging to IA-32 family - PentiumPro, Pentium II,
+ *	Pentium III, Xeon, Pentium 4, etc.
+ *
+ *	Reference: Section 8.11 of Volume 3a, IA-32 Intel Architecture
+ *	Software Developer's Manual
+ *	Order Number 253668 or free download from:
+ *
+ *	http://developer.intel.com/Assets/PDF/manual/253668.pdf
+ *
+ *	For more information, go to http://www.urbanmyth.org/microcode
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ */
+#include <linux/firmware.h>
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include <asm/microcode_intel.h>
+#include <asm/processor.h>
+#include <asm/msr.h>
+
+static inline int
+update_match_cpu(unsigned int csig, unsigned int cpf,
+		 unsigned int sig, unsigned int pf)
+{
+	return (!sigmatch(sig, csig, pf, cpf)) ? 0 : 1;
+}
+
+int
+update_match_revision(struct microcode_header_intel *mc_header, int rev)
+{
+	return (mc_header->rev <= rev) ? 0 : 1;
+}
+
+int microcode_sanity_check(void *mc, int print_err)
+{
+	unsigned long total_size, data_size, ext_table_size;
+	struct microcode_header_intel *mc_header = mc;
+	struct extended_sigtable *ext_header = NULL;
+	int sum, orig_sum, ext_sigcount = 0, i;
+	struct extended_signature *ext_sig;
+
+	total_size = get_totalsize(mc_header);
+	data_size = get_datasize(mc_header);
+
+	if (data_size + MC_HEADER_SIZE > total_size) {
+		if (print_err)
+			pr_err("error! Bad data size in microcode data file\n");
+		return -EINVAL;
+	}
+
+	if (mc_header->ldrver != 1 || mc_header->hdrver != 1) {
+		if (print_err)
+			pr_err("error! Unknown microcode update format\n");
+		return -EINVAL;
+	}
+	ext_table_size = total_size - (MC_HEADER_SIZE + data_size);
+	if (ext_table_size) {
+		if ((ext_table_size < EXT_HEADER_SIZE)
+		 || ((ext_table_size - EXT_HEADER_SIZE) % EXT_SIGNATURE_SIZE)) {
+			if (print_err)
+				pr_err("error! Small exttable size in microcode data file\n");
+			return -EINVAL;
+		}
+		ext_header = mc + MC_HEADER_SIZE + data_size;
+		if (ext_table_size != exttable_size(ext_header)) {
+			if (print_err)
+				pr_err("error! Bad exttable size in microcode data file\n");
+			return -EFAULT;
+		}
+		ext_sigcount = ext_header->count;
+	}
+
+	/* check extended table checksum */
+	if (ext_table_size) {
+		int ext_table_sum = 0;
+		int *ext_tablep = (int *)ext_header;
+
+		i = ext_table_size / DWSIZE;
+		while (i--)
+			ext_table_sum += ext_tablep[i];
+		if (ext_table_sum) {
+			if (print_err)
+				pr_warn("aborting, bad extended signature table checksum\n");
+			return -EINVAL;
+		}
+	}
+
+	/* calculate the checksum */
+	orig_sum = 0;
+	i = (MC_HEADER_SIZE + data_size) / DWSIZE;
+	while (i--)
+		orig_sum += ((int *)mc)[i];
+	if (orig_sum) {
+		if (print_err)
+			pr_err("aborting, bad checksum\n");
+		return -EINVAL;
+	}
+	if (!ext_table_size)
+		return 0;
+	/* check extended signature checksum */
+	for (i = 0; i < ext_sigcount; i++) {
+		ext_sig = (void *)ext_header + EXT_HEADER_SIZE +
+			  EXT_SIGNATURE_SIZE * i;
+		sum = orig_sum
+			- (mc_header->sig + mc_header->pf + mc_header->cksum)
+			+ (ext_sig->sig + ext_sig->pf + ext_sig->cksum);
+		if (sum) {
+			if (print_err)
+				pr_err("aborting, bad checksum\n");
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(microcode_sanity_check);
+
+/*
+ * return 0 - no update found
+ * return 1 - found update
+ */
+int get_matching_sig(unsigned int csig, int cpf, void *mc, int rev)
+{
+	struct microcode_header_intel *mc_header = mc;
+	struct extended_sigtable *ext_header;
+	unsigned long total_size = get_totalsize(mc_header);
+	int ext_sigcount, i;
+	struct extended_signature *ext_sig;
+
+	if (update_match_cpu(csig, cpf, mc_header->sig, mc_header->pf))
+		return 1;
+
+	/* Look for ext. headers: */
+	if (total_size <= get_datasize(mc_header) + MC_HEADER_SIZE)
+		return 0;
+
+	ext_header = mc + get_datasize(mc_header) + MC_HEADER_SIZE;
+	ext_sigcount = ext_header->count;
+	ext_sig = (void *)ext_header + EXT_HEADER_SIZE;
+
+	for (i = 0; i < ext_sigcount; i++) {
+		if (update_match_cpu(csig, cpf, ext_sig->sig, ext_sig->pf))
+			return 1;
+		ext_sig++;
+	}
+	return 0;
+}
+
+/*
+ * return 0 - no update found
+ * return 1 - found update
+ */
+int get_matching_microcode(unsigned int csig, int cpf, void *mc, int rev)
+{
+	struct microcode_header_intel *mc_header = mc;
+
+	if (!update_match_revision(mc_header, rev))
+		return 0;
+
+	return get_matching_sig(csig, cpf, mc, rev);
+}
+EXPORT_SYMBOL_GPL(get_matching_microcode);
-- 
1.7.2



* [PATCH v2 05/10] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
                   ` (3 preceding siblings ...)
  2012-11-30  1:47 ` [PATCH v2 04/10] x86/microcode_intel_lib.c: Early update ucode on Intel's CPU Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-12-01  0:25   ` [tip:x86/microcode] " tip-bot for Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 06/10] x86/head_32.S: Early update ucode in 32-bit Fenghua Yu
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Implementation of early update ucode on Intel's CPU.

load_ucode_intel_bsp() scans the initrd image, which consists of a cpio archive
holding the ucode followed by the ordinary initrd image. The binary ucode file
is stored as kernel/x86/microcode/GenuineIntel.bin in the cpio data. All ucode
patches with the same model as the BSP are saved in memory, and a matching
ucode patch is applied to the BSP.
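
The "same model as BSP" test depends on decoding family and model from the
CPUID(1) signature. A minimal user-space sketch of that decoding follows; it
mirrors get_x86_family() and get_x86_model() from this patch, but the demo_*
names are hypothetical and the signatures used with it are only examples.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base family is bits 11:8; family 0xf adds the extended family, bits 27:20. */
static uint8_t demo_family(uint32_t sig)
{
	uint8_t family = (sig >> 8) & 0xf;

	if (family == 0xf)
		family += (sig >> 20) & 0xff;

	return family;
}

/* Base model is bits 7:4; families 6 and 0xf prepend the extended model. */
static uint8_t demo_model(uint32_t sig)
{
	uint8_t family = demo_family(sig);
	uint8_t model = (sig >> 4) & 0xf;

	if (family == 0x6 || family == 0xf)
		model += ((sig >> 16) & 0xf) << 4;

	return model;
}

/* Two signatures belong together when family and model agree. */
static bool demo_same_model(uint32_t a, uint32_t b)
{
	return demo_family(a) == demo_family(b) &&
	       demo_model(a) == demo_model(b);
}
```

The stepping (bits 3:0) is deliberately ignored, so patches for several
steppings of one model are all kept.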

load_ucode_intel_ap() reads the saved ucode patches and updates ucode on the AP.
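
How an AP might pick a patch from the saved set can be sketched as below.
This is an illustration under stated assumptions rather than the code in this
patch: struct demo_patch is a hypothetical reduced header, the demo_* data is
made up, and extended signature tables are ignored for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced header: only the fields the selection needs. */
struct demo_patch {
	int rev;
	uint32_t sig;
	uint32_t pf;
};

/* Return the newest saved patch that fits this CPU, or NULL if none does. */
static const struct demo_patch *
demo_pick_newest(const struct demo_patch *const *saved, size_t count,
		 uint32_t cpu_sig, uint32_t cpu_pf, int cpu_rev)
{
	const struct demo_patch *best = NULL;
	int best_rev = cpu_rev;
	size_t i;

	for (i = 0; i < count; i++) {
		const struct demo_patch *p = saved[i];

		if (!p || p->sig != cpu_sig)
			continue;
		/* Platform masks must overlap, or both be zero. */
		if (!(p->pf & cpu_pf) && !(p->pf == 0 && cpu_pf == 0))
			continue;
		/* Only a strictly newer revision counts as an update. */
		if (p->rev <= best_rev)
			continue;
		best = p;
		best_rev = p->rev;
	}
	return best;
}

/* Illustrative data only. */
static const struct demo_patch demo_a = { 0x12, 0x000306a9, 0x12 };
static const struct demo_patch demo_b = { 0x19, 0x000306a9, 0x12 };
static const struct demo_patch demo_c = { 0x28, 0x000206a7, 0x12 };
static const struct demo_patch *const demo_saved[] = {
	&demo_a, &demo_b, &demo_c,
};
```

Starting best_rev at the revision the CPU already runs means a NULL result
covers both "nothing matches" and "nothing newer".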

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/microcode_intel_early.c |  438 +++++++++++++++++++++++++++++++
 1 files changed, 438 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/microcode_intel_early.c

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
new file mode 100644
index 0000000..36b1df1
--- /dev/null
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -0,0 +1,438 @@
+/*
+ *	Intel CPU Microcode Update Driver for Linux
+ *
+ *	Copyright (C) 2012 Fenghua Yu <fenghua.yu@intel.com>
+ *			   H Peter Anvin <hpa@zytor.com>
+ *
+ *	This driver allows early microcode updates on Intel processors
+ *	belonging to IA-32 family - PentiumPro, Pentium II,
+ *	Pentium III, Xeon, Pentium 4, etc.
+ *
+ *	Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
+ *	Software Developer's Manual.
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/earlycpio.h>
+#include <asm/msr.h>
+#include <asm/microcode_intel.h>
+#include <asm/processor.h>
+
+struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT];
+struct mc_saved_data mc_saved_data;
+
+enum ucode_state
+generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
+			     unsigned int mc_saved_count,
+			     struct ucode_cpu_info *uci)
+{
+	struct microcode_intel *ucode_ptr, *new_mc = NULL;
+	int new_rev = uci->cpu_sig.rev;
+	enum ucode_state state = UCODE_OK;
+	unsigned int mc_size;
+	struct microcode_header_intel *mc_header;
+	unsigned int csig = uci->cpu_sig.sig;
+	unsigned int cpf = uci->cpu_sig.pf;
+	int i;
+
+	for (i = 0; i < mc_saved_count; i++) {
+		ucode_ptr = mc_saved_p[i];
+		mc_header = (struct microcode_header_intel *)ucode_ptr;
+		mc_size = get_totalsize(mc_header);
+		if (get_matching_microcode(csig, cpf, ucode_ptr, new_rev)) {
+			new_rev = mc_header->rev;
+			new_mc  = ucode_ptr;
+		}
+	}
+
+	if (!new_mc) {
+		state = UCODE_NFOUND;
+		goto out;
+	}
+
+	uci->mc = (struct microcode_intel *)new_mc;
+out:
+	return state;
+}
+EXPORT_SYMBOL_GPL(generic_load_microcode_early);
+
+static enum ucode_state __init
+load_microcode(struct mc_saved_data *mc_saved_data, int cpu)
+{
+	struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
+
+	return generic_load_microcode_early(cpu, mc_saved_data->mc_saved,
+					    mc_saved_data->mc_saved_count, uci);
+}
+
+static u8 get_x86_family(unsigned long sig)
+{
+	u8 x86;
+
+	x86 = (sig >> 8) & 0xf;
+
+	if (x86 == 0xf)
+		x86 += (sig >> 20) & 0xff;
+
+	return x86;
+}
+
+static u8 get_x86_model(unsigned long sig)
+{
+	u8 x86, x86_model;
+
+	x86 = get_x86_family(sig);
+	x86_model = (sig >> 4) & 0xf;
+
+	if (x86 == 0x6 || x86 == 0xf)
+		x86_model += ((sig >> 16) & 0xf) << 4;
+
+	return x86_model;
+}
+
+static enum ucode_state
+matching_model_microcode(struct microcode_header_intel *mc_header,
+			unsigned long sig)
+{
+	u8 x86, x86_model;
+	u8 x86_ucode, x86_model_ucode;
+
+	x86 = get_x86_family(sig);
+	x86_model = get_x86_model(sig);
+
+	x86_ucode = get_x86_family(mc_header->sig);
+	x86_model_ucode = get_x86_model(mc_header->sig);
+
+	if (x86 != x86_ucode || x86_model != x86_model_ucode)
+		return UCODE_ERROR;
+
+	return UCODE_OK;
+}
+
+static void
+save_microcode(struct mc_saved_data *mc_saved_data,
+	       struct microcode_intel **mc_saved_src,
+	       unsigned int mc_saved_count)
+{
+	int i;
+	struct microcode_intel **mc_saved_p;
+
+	if (!mc_saved_count)
+		return;
+
+	mc_saved_p = vmalloc(mc_saved_count*sizeof(struct microcode_intel *));
+	if (!mc_saved_p)
+		return;
+
+	for (i = 0; i < mc_saved_count; i++) {
+		struct microcode_intel *mc = mc_saved_src[i];
+		struct microcode_header_intel *mc_header = &mc->hdr;
+		unsigned long mc_size = get_totalsize(mc_header);
+		mc_saved_p[i] = vmalloc(mc_size);
+		if (mc_saved_src[i])
+			memcpy(mc_saved_p[i], mc, mc_size);
+	}
+
+	mc_saved_data->mc_saved = mc_saved_p;
+}
+
+/*
+ * Get microcode matching the BSP's model. Only CPUs with the same model as
+ * the BSP can be present in the platform.
+ */
+enum ucode_state
+get_matching_model_microcode(int cpu, void *data, size_t size,
+			     struct mc_saved_data *mc_saved_data,
+			     struct microcode_intel **mc_saved_in_initrd,
+			     enum system_states system_state)
+{
+	u8 *ucode_ptr = data;
+	unsigned int leftover = size;
+	enum ucode_state state = UCODE_OK;
+	unsigned int mc_size;
+	struct microcode_header_intel *mc_header;
+	struct microcode_intel *mc_saved_tmp[MAX_UCODE_COUNT];
+	size_t mc_saved_size;
+	size_t mem_size;
+	unsigned int mc_saved_count = mc_saved_data->mc_saved_count;
+	struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
+	int found = 0;
+	int i;
+
+	if (mc_saved_count) {
+		mem_size = mc_saved_count * sizeof(struct microcode_intel *);
+		memcpy(mc_saved_tmp, mc_saved_data->mc_saved, mem_size);
+	}
+
+	while (leftover) {
+		mc_header = (struct microcode_header_intel *)ucode_ptr;
+
+		mc_size = get_totalsize(mc_header);
+		if (!mc_size || mc_size > leftover ||
+			microcode_sanity_check(ucode_ptr, 0) < 0)
+			break;
+
+		leftover -= mc_size;
+		if (matching_model_microcode(mc_header, uci->cpu_sig.sig) !=
+			 UCODE_OK) {
+			ucode_ptr += mc_size;
+			continue;
+		}
+
+		found = 0;
+		for (i = 0; i < mc_saved_count; i++) {
+			unsigned int sig, pf;
+			unsigned int new_rev;
+			struct microcode_header_intel *mc_saved_header =
+			     (struct microcode_header_intel *)mc_saved_tmp[i];
+			sig = mc_saved_header->sig;
+			pf = mc_saved_header->pf;
+			new_rev = mc_saved_header->rev;
+
+			if (get_matching_sig(sig, pf, ucode_ptr, new_rev)) {
+				found = 1;
+				if (update_match_revision(mc_header, new_rev)) {
+					/*
+					 * Found an older ucode saved before.
+					 * Replace the older one with this newer
+					 * one.
+					 */
+					mc_saved_tmp[i] =
+					(struct microcode_intel *)ucode_ptr;
+					break;
+				}
+			}
+		}
+		if (i >= mc_saved_count && !found)
+			/*
+			 * This ucode is seen for the first time in the ucode
+			 * file. Save it to memory.
+			 */
+			mc_saved_tmp[mc_saved_count++] =
+					 (struct microcode_intel *)ucode_ptr;
+
+		ucode_ptr += mc_size;
+	}
+
+	if (leftover) {
+		state = UCODE_ERROR;
+		goto out;
+	}
+
+	if (mc_saved_count == 0) {
+		state = UCODE_NFOUND;
+		goto out;
+	}
+
+	if (system_state == SYSTEM_RUNNING) {
+		vfree(mc_saved_data->mc_saved);
+		save_microcode(mc_saved_data, mc_saved_tmp, mc_saved_count);
+	} else {
+		mc_saved_size = sizeof(struct microcode_intel *) *
+				mc_saved_count;
+		memcpy(mc_saved_in_initrd, mc_saved_tmp, mc_saved_size);
+		mc_saved_data->mc_saved = mc_saved_in_initrd;
+	}
+
+	mc_saved_data->mc_saved_count = mc_saved_count;
+out:
+	return state;
+}
+EXPORT_SYMBOL_GPL(get_matching_model_microcode);
+
+#define native_rdmsr(msr, val1, val2)		\
+do {						\
+	u64 __val = native_read_msr((msr));	\
+	(void)((val1) = (u32)__val);		\
+	(void)((val2) = (u32)(__val >> 32));	\
+} while (0)
+
+#define native_wrmsr(msr, low, high)		\
+	native_write_msr(msr, low, high)
+
+static int __cpuinit collect_cpu_info_early(struct ucode_cpu_info *uci)
+{
+	unsigned int val[2];
+	u8 x86, x86_model;
+	struct cpu_signature csig = {0, 0, 0};
+	unsigned int eax, ebx, ecx, edx;
+
+	memset(uci, 0, sizeof(*uci));
+
+	eax = 0x00000001;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	csig.sig = eax;
+
+	x86 = get_x86_family(csig.sig);
+	x86_model = get_x86_model(csig.sig);
+
+	if ((x86_model >= 5) || (x86 > 6)) {
+		/* get processor flags from MSR 0x17 */
+		native_rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
+		csig.pf = 1 << ((val[1] >> 18) & 7);
+	}
+
+	/* get the current revision from MSR 0x8B */
+	native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+
+	csig.rev = val[1];
+
+	uci->cpu_sig = csig;
+	uci->valid = 1;
+
+	return 0;
+}
+
+static __init enum ucode_state
+scan_microcode(unsigned long start, unsigned long end,
+		struct mc_saved_data *mc_saved_data,
+		struct microcode_intel **mc_saved_in_initrd)
+{
+	unsigned int size = end - start + 1;
+	struct cpio_data cd = { 0, 0 };
+	char ucode_name[] = "kernel/x86/microcode/GenuineIntel.bin";
+	long offset = 0;
+
+	cd = find_cpio_data(ucode_name, (void *)start, size, &offset);
+	if (!cd.data)
+		return UCODE_ERROR;
+
+	return get_matching_model_microcode(0, cd.data, cd.size, mc_saved_data,
+					 mc_saved_in_initrd, SYSTEM_BOOTING);
+}
+
+static int __init
+apply_microcode_early(struct mc_saved_data *mc_saved_data, int cpu)
+{
+	struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
+	struct microcode_intel *mc_intel;
+	unsigned int val[2];
+
+	/* We should bind the task to the CPU */
+	mc_intel = uci->mc;
+	if (mc_intel == NULL)
+		return 0;
+
+	/* write microcode via MSR 0x79 */
+	native_wrmsr(MSR_IA32_UCODE_WRITE,
+	      (unsigned long) mc_intel->bits,
+	      (unsigned long) mc_intel->bits >> 16 >> 16);
+	native_wrmsr(MSR_IA32_UCODE_REV, 0, 0);
+
+	/* As documented in the SDM: Do a CPUID 1 here */
+	sync_core();
+
+	/* get the current revision from MSR 0x8B */
+	native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+	if (val[1] != mc_intel->hdr.rev)
+		return -1;
+
+	uci->cpu_sig.rev = val[1];
+
+	return 0;
+}
+
+#ifdef CONFIG_X86_32
+static void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
+				struct microcode_intel **mc_saved_in_initrd)
+{
+	int i;
+
+	if (mc_saved_data->mc_saved) {
+		for (i = 0; i < mc_saved_data->mc_saved_count; i++)
+			mc_saved_data->mc_saved[i] =
+					 __va(mc_saved_data->mc_saved[i]);
+
+		mc_saved_data->mc_saved = __va(mc_saved_data->mc_saved);
+	}
+
+	if (mc_saved_data->ucode_cpu_info->mc)
+		mc_saved_data->ucode_cpu_info->mc =
+				 __va(mc_saved_data->ucode_cpu_info->mc);
+	mc_saved_data->ucode_cpu_info = __va(mc_saved_data->ucode_cpu_info);
+}
+#else
+static inline void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
+				struct microcode_intel **mc_saved_in_initrd)
+{
+}
+#endif
+
+void __init save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
+		 struct microcode_intel **mc_saved_in_initrd)
+{
+	unsigned int count = mc_saved_data->mc_saved_count;
+
+	save_microcode(mc_saved_data, mc_saved_in_initrd, count);
+}
+
+static void __init
+_load_ucode_intel_bsp(struct mc_saved_data *mc_saved_data,
+		      struct microcode_intel **mc_saved_in_initrd,
+		      unsigned long initrd_start, unsigned long initrd_end)
+{
+	int cpu = 0;
+
+#ifdef CONFIG_X86_64
+	mc_saved_data->ucode_cpu_info = ucode_cpu_info_early;
+#else
+	mc_saved_data->ucode_cpu_info =
+			(struct ucode_cpu_info *)__pa(ucode_cpu_info_early);
+#endif
+	collect_cpu_info_early(mc_saved_data->ucode_cpu_info + cpu);
+	scan_microcode(initrd_start, initrd_end, mc_saved_data,
+		       mc_saved_in_initrd);
+	load_microcode(mc_saved_data, cpu);
+	apply_microcode_early(mc_saved_data, cpu);
+	map_mc_saved(mc_saved_data, mc_saved_in_initrd);
+}
+
+void __init
+load_ucode_intel_bsp(char *real_mode_data)
+{
+	u64 ramdisk_image, ramdisk_size, ramdisk_end;
+	unsigned long initrd_start, initrd_end;
+	struct boot_params *boot_params;
+
+	boot_params = (struct boot_params *)real_mode_data;
+	ramdisk_image = boot_params->hdr.ramdisk_image;
+	ramdisk_size  = boot_params->hdr.ramdisk_size;
+
+#ifdef CONFIG_X86_64
+	ramdisk_end  = PAGE_ALIGN(ramdisk_image + ramdisk_size);
+	initrd_start = ramdisk_image + PAGE_OFFSET;
+	initrd_end = initrd_start + ramdisk_size;
+	_load_ucode_intel_bsp(&mc_saved_data, mc_saved_in_initrd,
+			      initrd_start, initrd_end);
+#else
+	ramdisk_end  = ramdisk_image + ramdisk_size;
+	initrd_start = ramdisk_image;
+	initrd_end = initrd_start + ramdisk_size;
+	_load_ucode_intel_bsp((struct mc_saved_data *)__pa(&mc_saved_data),
+			(struct microcode_intel **)__pa(mc_saved_in_initrd),
+			initrd_start, initrd_end);
+#endif
+}
+
+void __cpuinit load_ucode_intel_ap(void)
+{
+	int cpu = smp_processor_id();
+
+	/*
+	 * If the BSP did not find valid ucode and save it in memory, there is
+	 * no need to update ucode on this AP.
+	 */
+	if (!mc_saved_data.mc_saved)
+		return;
+
+	collect_cpu_info_early(mc_saved_data.ucode_cpu_info + cpu);
+	load_microcode(&mc_saved_data, cpu);
+	apply_microcode_early(&mc_saved_data, cpu);
+}
-- 
1.7.2
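
apply_microcode_early() above hands the address of the patch data to WRMSR as
two 32-bit halves and computes the upper half with ">> 16 >> 16". The patch
does not say why; a plausible reason (an assumption here) is that a single
">> 32" is undefined in C when unsigned long is only 32 bits wide, whereas two
16-bit shifts are well defined on both 32-bit and 64-bit builds. A small
sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Lower 32 bits of an address held in an unsigned long. */
static uint32_t demo_addr_low(unsigned long addr)
{
	return (uint32_t)addr;
}

/*
 * Upper 32 bits. Each shift is smaller than the width of unsigned long,
 * so the expression is valid whether that width is 32 or 64 bits; with a
 * 32-bit unsigned long the result is simply 0.
 */
static uint32_t demo_addr_high(unsigned long addr)
{
	return (uint32_t)(addr >> 16 >> 16);
}
```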



* [PATCH v2 06/10] x86/head_32.S: Early update ucode in 32-bit
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
                   ` (4 preceding siblings ...)
  2012-11-30  1:47 ` [PATCH v2 05/10] x86/microcode_intel_early.c: " Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-12-01  0:26   ` [tip:x86/microcode] " tip-bot for Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 07/10] x86/head64.c: Early update ucode in 64-bit Fenghua Yu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Update ucode early in the 32-bit kernel. At this point paging is not enabled
yet, so only physical addresses are usable.
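
Because this code runs before paging, the 32-bit path in patch 05 passes
__pa() addresses around and converts them back with __va() once paging is up
(see map_mc_saved()). The arithmetic behind that can be sketched as follows.
This is a simplification: DEMO_PAGE_OFFSET is an assumed value (the common
3G/1G split), the helper names are hypothetical, and the real macros only
apply to the kernel's direct mapping.

```c
#include <stdint.h>

/* Assumed value: the usual 3G/1G split on 32-bit x86. */
#define DEMO_PAGE_OFFSET 0xC0000000u

/* Kernel virtual address to physical address (direct mapping only). */
static uint32_t demo_pa(uint32_t vaddr)
{
	return vaddr - DEMO_PAGE_OFFSET;
}

/* Physical address back to kernel virtual address. */
static uint32_t demo_va(uint32_t paddr)
{
	return paddr + DEMO_PAGE_OFFSET;
}
```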

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/head_32.S |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 957a47a..b7b76a4 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -144,6 +144,12 @@ ENTRY(startup_32)
 	movl %eax, pa(olpc_ofw_pgd)
 #endif
 
+#ifdef CONFIG_MICROCODE_EARLY
+	/* Early load ucode on BSP. */
+	movl $pa(boot_params), %eax
+	call load_ucode_bsp
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable
-- 
1.7.2



* [PATCH v2 07/10] x86/head64.c: Early update ucode in 64-bit
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
                   ` (5 preceding siblings ...)
  2012-11-30  1:47 ` [PATCH v2 06/10] x86/head_32.S: Early update ucode in 32-bit Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-12-01  0:27   ` [tip:x86/microcode] " tip-bot for Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 08/10] x86/smpboot.c: Early update ucode on AP Fenghua Yu
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Update ucode early in 64-bit mode. Paging is already enabled at this point, so
virtual addresses are usable.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/head64.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..a512f56 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -25,6 +25,7 @@
 #include <asm/kdebug.h>
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
+#include <asm/microcode.h>
 
 static void __init zap_identity_mappings(void)
 {
@@ -73,6 +74,11 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
+	/*
+	 * Load microcode early on BSP.
+	 */
+	load_ucode_bsp(real_mode_data);
+
 	/* Make NULL pointers segfault */
 	zap_identity_mappings();
 
-- 
1.7.2



* [PATCH v2 08/10] x86/smpboot.c: Early update ucode on AP
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
                   ` (6 preceding siblings ...)
  2012-11-30  1:47 ` [PATCH v2 07/10] x86/head64.c: Early update ucode in 64-bit Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-12-01  0:28   ` [tip:x86/microcode] " tip-bot for Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 09/10] x86/mm/init.c: Copy ucode from initrd image to memory Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 10/10] x86/Kconfig: Configurations to enable/disable the feature Fenghua Yu
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Update ucode on each AP. By this point the BSP has stored valid ucode patches
in memory, provided it found any in the initrd. The AP searches the stored
patches and applies a matching one.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/smpboot.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 732bf5c..025690a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -76,6 +76,7 @@
 #include <asm/i8259.h>
 
 #include <asm/realmode.h>
+#include <asm/microcode.h>
 
 /* State of each CPU */
 DEFINE_PER_CPU(int, cpu_state) = { 0 };
@@ -239,6 +240,12 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * most necessary things.
 	 */
 	cpu_init();
+	/*
+	 * Load microcode on this cpu if a valid microcode is available.
+	 * This is early microcode loading procedure.
+	 */
+	load_ucode_ap();
+
 	x86_cpuinit.early_percpu_clock_init();
 	preempt_disable();
 	smp_callin();
-- 
1.7.2



* [PATCH v2 09/10] x86/mm/init.c: Copy ucode from initrd image to memory
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
                   ` (7 preceding siblings ...)
  2012-11-30  1:47 ` [PATCH v2 08/10] x86/smpboot.c: Early update ucode on AP Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-12-01  0:29   ` [tip:x86/microcode] " tip-bot for Fenghua Yu
  2012-11-30  1:47 ` [PATCH v2 10/10] x86/Kconfig: Configurations to enable/disable the feature Fenghua Yu
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Before the initrd image is freed, copy the valid ucode patches from it to
kernel virtual memory. The saved ucode will be used to update APs on resume.
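
The copy step amounts to duplicating an array of variable-sized blobs before
the memory they live in goes away. A user-space sketch of that pattern is
given below. It is not the kernel's save_microcode(): struct demo_blob and all
demo_* functions are hypothetical, malloc() stands in for vmalloc(), and the
sketch checks every allocation and skips NULL entries.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-sized blob: a size followed by the payload. */
struct demo_blob {
	size_t size;
	unsigned char data[];
};

static void demo_free_blobs(struct demo_blob **blobs, size_t count)
{
	size_t i;

	if (!blobs)
		return;
	for (i = 0; i < count; i++)
		free(blobs[i]);
	free(blobs);
}

/* Deep-copy count blobs; returns NULL on failure or when count is 0. */
static struct demo_blob **demo_copy_blobs(struct demo_blob *const *src,
					  size_t count)
{
	struct demo_blob **dst;
	size_t i;

	if (!count)
		return NULL;
	dst = calloc(count, sizeof(*dst));
	if (!dst)
		return NULL;

	for (i = 0; i < count; i++) {
		size_t total;

		if (!src[i])
			continue;	/* keep the slot empty */
		total = sizeof(struct demo_blob) + src[i]->size;
		dst[i] = malloc(total);
		if (!dst[i]) {
			demo_free_blobs(dst, count);
			return NULL;
		}
		memcpy(dst[i], src[i], total);
	}
	return dst;
}

/* Small self-check: the copy must survive the source being released. */
static int demo_copy_selftest(void)
{
	struct demo_blob *src[2] = { NULL, NULL };
	struct demo_blob **dst;
	int ok;

	src[0] = malloc(sizeof(struct demo_blob) + 4);
	if (!src[0])
		return 0;
	src[0]->size = 4;
	memcpy(src[0]->data, "ucod", 4);

	dst = demo_copy_blobs(src, 2);
	memset(src[0]->data, 0, 4);	/* simulate the initrd being reused */
	free(src[0]);

	ok = dst && dst[0] && dst[0]->size == 4 &&
	     memcmp(dst[0]->data, "ucod", 4) == 0 && dst[1] == NULL;
	demo_free_blobs(dst, 2);
	return ok;
}
```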

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/mm/init.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d7aea41..a294d4b 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -16,6 +16,7 @@
 #include <asm/tlb.h>
 #include <asm/proto.h>
 #include <asm/dma.h>		/* for MAX_DMA_PFN */
+#include <asm/microcode.h>
 
 unsigned long __initdata pgt_buf_start;
 unsigned long __meminitdata pgt_buf_end;
@@ -391,6 +392,15 @@ void free_initmem(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void __init free_initrd_mem(unsigned long start, unsigned long end)
 {
+#ifdef CONFIG_MICROCODE_EARLY
+	/*
+	 * Remember, initrd memory may contain microcode or other useful things.
+	 * Before we lose initrd mem, we need to find a place to hold them
+	 * now that normal virtual memory is enabled.
+	 */
+	save_microcode_in_initrd(&mc_saved_data, mc_saved_in_initrd);
+#endif
+
 	/*
 	 * end could be not aligned, and We can not align that,
 	 * decompresser could be confused by aligned initrd_end
-- 
1.7.2



* [PATCH v2 10/10] x86/Kconfig: Configurations to enable/disable the feature
  2012-11-30  1:47 [PATCH v2 00/11] x86/microcode: Early load microcode Fenghua Yu
                   ` (8 preceding siblings ...)
  2012-11-30  1:47 ` [PATCH v2 09/10] x86/mm/init.c: Copy ucode from initrd image to memory Fenghua Yu
@ 2012-11-30  1:47 ` Fenghua Yu
  2012-12-01  0:30   ` [tip:x86/microcode] x86/Kconfig: Configurations to enable/ disable " tip-bot for Fenghua Yu
  9 siblings, 1 reply; 127+ messages in thread
From: Fenghua Yu @ 2012-11-30  1:47 UTC (permalink / raw)
  To: H Peter Anvin, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Tigran Aivazian, Andreas Herrmann, Borislav Petkov, linux-kernel,
	x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

MICROCODE_INTEL_LIB, MICROCODE_INTEL_EARLY, and MICROCODE_EARLY are three new
configuration options that enable or disable the feature.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/Kconfig |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab62cd5..7c60a0f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1064,6 +1064,24 @@ config MICROCODE_OLD_INTERFACE
 	def_bool y
 	depends on MICROCODE
 
+config MICROCODE_INTEL_LIB
+	def_bool y
+	depends on MICROCODE_INTEL
+
+config MICROCODE_INTEL_EARLY
+	bool "Early load microcode"
+	depends on MICROCODE_INTEL && BLK_DEV_INITRD
+	default y
+	help
+	  This option provides functionality to read additional microcode data
+	  at the beginning of the initrd image. The data tells the kernel to load
+	  microcode to CPUs as early as possible. There is no functional change
+	  if no microcode data is glued to the initrd, so it is safe to say Y.
+
+config MICROCODE_EARLY
+	def_bool y
+	depends on MICROCODE_INTEL_EARLY
+
 config X86_MSR
 	tristate "/dev/cpu/*/msr - Model-specific register support"
 	---help---
-- 
1.7.2



* Re: [PATCH v2 01/10] Documentation/x86: Early load microcode
  2012-11-30  1:47 ` [PATCH v2 01/10] Documentation/x86: " Fenghua Yu
@ 2012-11-30 19:46   ` H. Peter Anvin
  2012-11-30 20:40     ` Yu, Fenghua
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-11-30 19:46 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Thomas Gleixner, Asit K Mallick, Tigran Aivazian,
	Andreas Herrmann, Borislav Petkov, linux-kernel, x86

On 11/29/2012 05:47 PM, Fenghua Yu wrote:
> +
> +mkdir initrd
> +cd initrd
> +cp ../microcode.hex kernel/x86/microcode/GenuineIntel/microcode.hex
> +find .|cpio -oc >../ucode.cpio
> +cd ..
> +cat ucode.cpio /boot/initrd-3.5.0.img >/boot/initrd-3.5.0.ucode.img
> +
> +The generated /boot/initrd-3.5.0.ucode.img can be used as initrd file to load
> +microcode during early boot time.
> 

This is wrong; first of all, this refers to microcode.hex whereas the
patch refers to microcode.bin, and it misses the mandatory -H newc
option to cpio.

Could you update this bit, please?

	-hpa



* RE: [PATCH v2 01/10] Documentation/x86: Early load microcode
  2012-11-30 19:46   ` H. Peter Anvin
@ 2012-11-30 20:40     ` Yu, Fenghua
  0 siblings, 0 replies; 127+ messages in thread
From: Yu, Fenghua @ 2012-11-30 20:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Mallick, Asit K, Tigran Aivazian,
	Andreas Herrmann, Borislav Petkov, linux-kernel, x86

> -----Original Message-----
> From: H. Peter Anvin [mailto:hpa@zytor.com]
> Sent: Friday, November 30, 2012 11:46 AM
> To: Yu, Fenghua
> Cc: Ingo Molnar; Thomas Gleixner; Mallick, Asit K; Tigran Aivazian;
> Andreas Herrmann; Borislav Petkov; linux-kernel; x86
> Subject: Re: [PATCH v2 01/10] Documentation/x86: Early load microcode
> 
> On 11/29/2012 05:47 PM, Fenghua Yu wrote:
> > +
> > +mkdir initrd
> > +cd initrd
> > +cp ../microcode.hex kernel/x86/microcode/GenuineIntel/microcode.hex
> > +find .|cpio -oc >../ucode.cpio
> > +cd ..
> > +cat ucode.cpio /boot/initrd-3.5.0.img >/boot/initrd-3.5.0.ucode.img
> > +
> > +The generated /boot/initrd-3.5.0.ucode.img can be used as initrd
> file to load
> > +microcode during early boot time.
> >
> 
> This is wrong; first of all, this refers to microcode.hex whereas the
> patch refers to microcode.bin, and 

Will fix this sentence. 

> it misses the mandatory -H newc
> option to cpio.

I think there is no problem here. "-c" is "-H newc".

Quote from man cpio:
'-c' Identical to -H newc, use the new (SVR4) portable format.  If you wish the old
       portable (ASCII) archive format, use -H odc instead.

> 
> Could you update this bit, please?
> 
> 	-hpa
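
On the "-c" versus "-H newc" question discussed above: what "-c" selects may
differ between cpio implementations and versions, so it can be worth checking
what a given archive actually contains. The cpio formats are distinguished by
a six-character ASCII magic at the start of each header: "070701" for newc,
"070702" for newc with checksum, and "070707" for the old portable (odc)
format. A minimal sketch of such a check, with hypothetical names:

```c
#include <stddef.h>
#include <string.h>

enum demo_cpio_kind {
	DEMO_CPIO_UNKNOWN,
	DEMO_CPIO_NEWC,		/* "070701": SVR4 portable format */
	DEMO_CPIO_CRC,		/* "070702": SVR4 format with checksum */
	DEMO_CPIO_ODC,		/* "070707": old portable ASCII format */
};

/* Classify a cpio archive by the magic at the start of its first header. */
static enum demo_cpio_kind demo_cpio_kind_of(const void *buf, size_t len)
{
	if (len < 6)
		return DEMO_CPIO_UNKNOWN;
	if (memcmp(buf, "070701", 6) == 0)
		return DEMO_CPIO_NEWC;
	if (memcmp(buf, "070702", 6) == 0)
		return DEMO_CPIO_CRC;
	if (memcmp(buf, "070707", 6) == 0)
		return DEMO_CPIO_ODC;
	return DEMO_CPIO_UNKNOWN;
}
```

Running this over the first bytes of ucode.cpio would show whether the
archive is in the newc layout, independent of which option produced it.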



* [tip:x86/microcode] x86/microcode_intel.h: Define functions and macros for early loading ucode
  2012-11-30  1:47 ` [PATCH v2 02/10] x86/microcode_intel.h: Define functions and macros for early loading ucode Fenghua Yu
@ 2012-12-01  0:21   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2012-12-01  0:21 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  17f1087f1a80d2dfada790c31720eb6a57da2d1f
Gitweb:     http://git.kernel.org/tip/17f1087f1a80d2dfada790c31720eb6a57da2d1f
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 29 Nov 2012 17:47:40 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 30 Nov 2012 15:18:14 -0800

x86/microcode_intel.h: Define functions and macros for early loading ucode

Define some functions and macros that will be used for early ucode loading.
Some of them are moved out of the microcode_intel.c driver so that they can be
called in the early boot phase, before any module can be loaded.
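
Among the macros being moved, get_totalsize() and get_datasize() fall back to
default sizes when the header field is zero. The sketch below restates that
rule in user space; struct demo_hdr copies the field layout of
struct microcode_header_intel, while the demo_* names and sample headers are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same field layout as struct microcode_header_intel (48 bytes). */
struct demo_hdr {
	uint32_t hdrver, rev, date, sig, cksum, ldrver, pf;
	uint32_t datasize, totalsize;
	uint32_t reserved[3];
};

#define DEMO_DEFAULT_DATASIZE	2000u
#define DEMO_HEADER_SIZE	((uint32_t)sizeof(struct demo_hdr))

/* A zero datasize means the default of 2000 bytes. */
static uint32_t demo_datasize(const struct demo_hdr *h)
{
	return h->datasize ? h->datasize : DEMO_DEFAULT_DATASIZE;
}

/* A zero totalsize means default data size plus the header. */
static uint32_t demo_totalsize(const struct demo_hdr *h)
{
	return h->totalsize ? h->totalsize
			    : DEMO_DEFAULT_DATASIZE + DEMO_HEADER_SIZE;
}

/* Anything beyond header plus data is an extended signature table. */
static bool demo_has_ext_table(const struct demo_hdr *h)
{
	return demo_totalsize(h) > demo_datasize(h) + DEMO_HEADER_SIZE;
}

/* Illustrative headers only. */
static const struct demo_hdr demo_legacy = { .hdrver = 1 };
static const struct demo_hdr demo_sized = {
	.hdrver = 1, .datasize = 4096, .totalsize = 4096 + 48,
};
static const struct demo_hdr demo_ext = {
	.hdrver = 1, .datasize = 4096, .totalsize = 4096 + 48 + 20 + 12,
};
```

The last sample adds 20 bytes for an extended table header and 12 bytes for
one extended signature, matching the struct sizes in the header file.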

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1354240068-9821-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/microcode_intel.h | 106 +++++++++++++++++++
 arch/x86/kernel/Makefile               |   3 +
 arch/x86/kernel/microcode_core.c       |   7 +-
 arch/x86/kernel/microcode_intel.c      | 185 ++-------------------------------
 4 files changed, 120 insertions(+), 181 deletions(-)

diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
new file mode 100644
index 0000000..0544bf4
--- /dev/null
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -0,0 +1,106 @@
+#ifndef _ASM_X86_MICROCODE_INTEL_H
+#define _ASM_X86_MICROCODE_INTEL_H
+
+#include <asm/microcode.h>
+
+struct microcode_header_intel {
+	unsigned int            hdrver;
+	unsigned int            rev;
+	unsigned int            date;
+	unsigned int            sig;
+	unsigned int            cksum;
+	unsigned int            ldrver;
+	unsigned int            pf;
+	unsigned int            datasize;
+	unsigned int            totalsize;
+	unsigned int            reserved[3];
+};
+
+struct microcode_intel {
+	struct microcode_header_intel hdr;
+	unsigned int            bits[0];
+};
+
+/* microcode format is extended from prescott processors */
+struct extended_signature {
+	unsigned int            sig;
+	unsigned int            pf;
+	unsigned int            cksum;
+};
+
+struct extended_sigtable {
+	unsigned int            count;
+	unsigned int            cksum;
+	unsigned int            reserved[3];
+	struct extended_signature sigs[0];
+};
+
+#define DEFAULT_UCODE_DATASIZE	(2000)
+#define MC_HEADER_SIZE		(sizeof(struct microcode_header_intel))
+#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
+#define EXT_HEADER_SIZE		(sizeof(struct extended_sigtable))
+#define EXT_SIGNATURE_SIZE	(sizeof(struct extended_signature))
+#define DWSIZE			(sizeof(u32))
+
+#define get_totalsize(mc) \
+	(((struct microcode_intel *)mc)->hdr.totalsize ? \
+	 ((struct microcode_intel *)mc)->hdr.totalsize : \
+	 DEFAULT_UCODE_TOTALSIZE)
+
+#define get_datasize(mc) \
+	(((struct microcode_intel *)mc)->hdr.datasize ? \
+	 ((struct microcode_intel *)mc)->hdr.datasize : DEFAULT_UCODE_DATASIZE)
+
+#define sigmatch(s1, s2, p1, p2) \
+	(((s1) == (s2)) && (((p1) & (p2)) || (((p1) == 0) && ((p2) == 0))))
+
+#define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
+
+extern int
+get_matching_microcode(unsigned int csig, int cpf, void *mc, int rev);
+extern int microcode_sanity_check(void *mc, int print_err);
+extern int get_matching_sig(unsigned int csig, int cpf, void *mc, int rev);
+extern int
+update_match_revision(struct microcode_header_intel *mc_header, int rev);
+
+#ifdef CONFIG_MICROCODE_INTEL_EARLY
+extern enum ucode_state
+get_matching_model_microcode(int cpu, void *data, size_t size,
+			     struct mc_saved_data *mc_saved_data,
+			     struct microcode_intel **mc_saved_in_initrd,
+			     enum system_states system_state);
+extern enum ucode_state
+generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
+			     unsigned int mc_saved_count,
+			     struct ucode_cpu_info *uci);
+extern void __init
+load_ucode_intel_bsp(char *real_mode_data);
+extern void __init load_ucode_intel_ap(void);
+#else
+static inline enum ucode_state
+get_matching_model_microcode(int cpu, void *data, size_t size,
+			     struct mc_saved_data *mc_saved_data,
+			     struct microcode_intel **mc_saved_in_initrd,
+			     enum system_states system_state)
+{
+	return UCODE_ERROR;
+}
+static inline enum ucode_state
+generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
+			     unsigned int mc_saved_count,
+			     struct ucode_cpu_info *uci)
+{
+	return UCODE_ERROR;
+}
+static inline __init void
+load_ucode_intel_bsp(char *real_mode_data)
+{
+}
+static inline __init void
+load_ucode_intel_ap(struct ucode_cpu_info *uci,
+		    struct mc_saved_data *mc_saved_data)
+{
+}
+#endif
+
+#endif /* _ASM_X86_MICROCODE_INTEL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 91ce48f..74fa88b 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -88,6 +88,9 @@ obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
 
+obj-$(CONFIG_MICROCODE_EARLY)		+= microcode_core_early.o
+obj-$(CONFIG_MICROCODE_INTEL_EARLY)	+= microcode_intel_early.o
+obj-$(CONFIG_MICROCODE_INTEL_LIB)	+= microcode_intel_lib.o
 microcode-y				:= microcode_core.o
 microcode-$(CONFIG_MICROCODE_INTEL)	+= microcode_intel.o
 microcode-$(CONFIG_MICROCODE_AMD)	+= microcode_amd.o
diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index 3a04b22..22db92b 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -364,10 +364,7 @@ static struct attribute_group mc_attr_group = {
 
 static void microcode_fini_cpu(int cpu)
 {
-	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
-
 	microcode_ops->microcode_fini_cpu(cpu);
-	uci->valid = 0;
 }
 
 static enum ucode_state microcode_resume_cpu(int cpu)
@@ -383,6 +380,10 @@ static enum ucode_state microcode_resume_cpu(int cpu)
 static enum ucode_state microcode_init_cpu(int cpu, bool refresh_fw)
 {
 	enum ucode_state ustate;
+	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
+
+	if (uci && uci->valid)
+		return UCODE_OK;
 
 	if (collect_cpu_info(cpu))
 		return UCODE_ERROR;
diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c
index 3544aed..f3ddf8e 100644
--- a/arch/x86/kernel/microcode_intel.c
+++ b/arch/x86/kernel/microcode_intel.c
@@ -79,7 +79,7 @@
 #include <linux/module.h>
 #include <linux/vmalloc.h>
 
-#include <asm/microcode.h>
+#include <asm/microcode_intel.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
 
@@ -87,59 +87,6 @@ MODULE_DESCRIPTION("Microcode Update Driver");
 MODULE_AUTHOR("Tigran Aivazian <tigran@aivazian.fsnet.co.uk>");
 MODULE_LICENSE("GPL");
 
-struct microcode_header_intel {
-	unsigned int            hdrver;
-	unsigned int            rev;
-	unsigned int            date;
-	unsigned int            sig;
-	unsigned int            cksum;
-	unsigned int            ldrver;
-	unsigned int            pf;
-	unsigned int            datasize;
-	unsigned int            totalsize;
-	unsigned int            reserved[3];
-};
-
-struct microcode_intel {
-	struct microcode_header_intel hdr;
-	unsigned int            bits[0];
-};
-
-/* microcode format is extended from prescott processors */
-struct extended_signature {
-	unsigned int            sig;
-	unsigned int            pf;
-	unsigned int            cksum;
-};
-
-struct extended_sigtable {
-	unsigned int            count;
-	unsigned int            cksum;
-	unsigned int            reserved[3];
-	struct extended_signature sigs[0];
-};
-
-#define DEFAULT_UCODE_DATASIZE	(2000)
-#define MC_HEADER_SIZE		(sizeof(struct microcode_header_intel))
-#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
-#define EXT_HEADER_SIZE		(sizeof(struct extended_sigtable))
-#define EXT_SIGNATURE_SIZE	(sizeof(struct extended_signature))
-#define DWSIZE			(sizeof(u32))
-
-#define get_totalsize(mc) \
-	(((struct microcode_intel *)mc)->hdr.totalsize ? \
-	 ((struct microcode_intel *)mc)->hdr.totalsize : \
-	 DEFAULT_UCODE_TOTALSIZE)
-
-#define get_datasize(mc) \
-	(((struct microcode_intel *)mc)->hdr.datasize ? \
-	 ((struct microcode_intel *)mc)->hdr.datasize : DEFAULT_UCODE_DATASIZE)
-
-#define sigmatch(s1, s2, p1, p2) \
-	(((s1) == (s2)) && (((p1) & (p2)) || (((p1) == 0) && ((p2) == 0))))
-
-#define exttable_size(et) ((et)->count * EXT_SIGNATURE_SIZE + EXT_HEADER_SIZE)
-
 static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 {
 	struct cpuinfo_x86 *c = &cpu_data(cpu_num);
@@ -162,128 +109,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 	return 0;
 }
 
-static inline int update_match_cpu(struct cpu_signature *csig, int sig, int pf)
-{
-	return (!sigmatch(sig, csig->sig, pf, csig->pf)) ? 0 : 1;
-}
-
-static inline int
-update_match_revision(struct microcode_header_intel *mc_header, int rev)
-{
-	return (mc_header->rev <= rev) ? 0 : 1;
-}
-
-static int microcode_sanity_check(void *mc)
-{
-	unsigned long total_size, data_size, ext_table_size;
-	struct microcode_header_intel *mc_header = mc;
-	struct extended_sigtable *ext_header = NULL;
-	int sum, orig_sum, ext_sigcount = 0, i;
-	struct extended_signature *ext_sig;
-
-	total_size = get_totalsize(mc_header);
-	data_size = get_datasize(mc_header);
-
-	if (data_size + MC_HEADER_SIZE > total_size) {
-		pr_err("error! Bad data size in microcode data file\n");
-		return -EINVAL;
-	}
-
-	if (mc_header->ldrver != 1 || mc_header->hdrver != 1) {
-		pr_err("error! Unknown microcode update format\n");
-		return -EINVAL;
-	}
-	ext_table_size = total_size - (MC_HEADER_SIZE + data_size);
-	if (ext_table_size) {
-		if ((ext_table_size < EXT_HEADER_SIZE)
-		 || ((ext_table_size - EXT_HEADER_SIZE) % EXT_SIGNATURE_SIZE)) {
-			pr_err("error! Small exttable size in microcode data file\n");
-			return -EINVAL;
-		}
-		ext_header = mc + MC_HEADER_SIZE + data_size;
-		if (ext_table_size != exttable_size(ext_header)) {
-			pr_err("error! Bad exttable size in microcode data file\n");
-			return -EFAULT;
-		}
-		ext_sigcount = ext_header->count;
-	}
-
-	/* check extended table checksum */
-	if (ext_table_size) {
-		int ext_table_sum = 0;
-		int *ext_tablep = (int *)ext_header;
-
-		i = ext_table_size / DWSIZE;
-		while (i--)
-			ext_table_sum += ext_tablep[i];
-		if (ext_table_sum) {
-			pr_warning("aborting, bad extended signature table checksum\n");
-			return -EINVAL;
-		}
-	}
-
-	/* calculate the checksum */
-	orig_sum = 0;
-	i = (MC_HEADER_SIZE + data_size) / DWSIZE;
-	while (i--)
-		orig_sum += ((int *)mc)[i];
-	if (orig_sum) {
-		pr_err("aborting, bad checksum\n");
-		return -EINVAL;
-	}
-	if (!ext_table_size)
-		return 0;
-	/* check extended signature checksum */
-	for (i = 0; i < ext_sigcount; i++) {
-		ext_sig = (void *)ext_header + EXT_HEADER_SIZE +
-			  EXT_SIGNATURE_SIZE * i;
-		sum = orig_sum
-			- (mc_header->sig + mc_header->pf + mc_header->cksum)
-			+ (ext_sig->sig + ext_sig->pf + ext_sig->cksum);
-		if (sum) {
-			pr_err("aborting, bad checksum\n");
-			return -EINVAL;
-		}
-	}
-	return 0;
-}
-
-/*
- * return 0 - no update found
- * return 1 - found update
- */
-static int
-get_matching_microcode(struct cpu_signature *cpu_sig, void *mc, int rev)
-{
-	struct microcode_header_intel *mc_header = mc;
-	struct extended_sigtable *ext_header;
-	unsigned long total_size = get_totalsize(mc_header);
-	int ext_sigcount, i;
-	struct extended_signature *ext_sig;
-
-	if (!update_match_revision(mc_header, rev))
-		return 0;
-
-	if (update_match_cpu(cpu_sig, mc_header->sig, mc_header->pf))
-		return 1;
-
-	/* Look for ext. headers: */
-	if (total_size <= get_datasize(mc_header) + MC_HEADER_SIZE)
-		return 0;
-
-	ext_header = mc + get_datasize(mc_header) + MC_HEADER_SIZE;
-	ext_sigcount = ext_header->count;
-	ext_sig = (void *)ext_header + EXT_HEADER_SIZE;
-
-	for (i = 0; i < ext_sigcount; i++) {
-		if (update_match_cpu(cpu_sig, ext_sig->sig, ext_sig->pf))
-			return 1;
-		ext_sig++;
-	}
-	return 0;
-}
-
-static int apply_microcode(int cpu)
+int apply_microcode(int cpu)
 {
 	struct microcode_intel *mc_intel;
 	struct ucode_cpu_info *uci;
@@ -338,6 +164,7 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 	unsigned int leftover = size;
 	enum ucode_state state = UCODE_OK;
 	unsigned int curr_mc_size = 0;
+	unsigned int csig, cpf;
 
 	while (leftover) {
 		struct microcode_header_intel mc_header;
@@ -362,11 +189,13 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 		}
 
 		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
-		    microcode_sanity_check(mc) < 0) {
+		    microcode_sanity_check(mc, 1) < 0) {
 			break;
 		}
 
-		if (get_matching_microcode(&uci->cpu_sig, mc, new_rev)) {
+		csig = uci->cpu_sig.sig;
+		cpf = uci->cpu_sig.pf;
+		if (get_matching_microcode(csig, cpf, mc, new_rev)) {
 			vfree(new_mc);
 			new_rev = mc_header.rev;
 			new_mc  = mc;


* [tip:x86/microcode] x86/microcode_core_early.c: Define interfaces for early loading ucode
  2012-11-30  1:47 ` [PATCH v2 03/10] x86/microcode_core_early.c: Define interfaces " Fenghua Yu
@ 2012-12-01  0:23   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2012-12-01  0:23 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  d42bdf2139115faa4d5bdb0dc591d435a644fde4
Gitweb:     http://git.kernel.org/tip/d42bdf2139115faa4d5bdb0dc591d435a644fde4
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 29 Nov 2012 17:47:41 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 30 Nov 2012 15:18:15 -0800

x86/microcode_core_early.c: Define interfaces for early loading ucode

Define the interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on the
BSP and the APs during early boot. These are generic interfaces; internally
they call vendor-specific implementations.
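
As an illustration of the vendor check that load_ucode_bsp() relies on, the
sketch below packs the three registers returned by CPUID leaf 0 into the
12-character vendor string. It is a user-space approximation, not the kernel
code: vendor_from_regs() is a hypothetical helper, and the register values in
the usage note are the well-known constants for "GenuineIntel". The register
order (EBX, EDX, ECX) is the same one the patch gets by passing
x86_vendor_id[0], [8] and [4] to native_cpuid().

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): build the NUL-terminated
 * vendor string from the CPUID leaf 0 register values.
 */
static void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx,
			     char out[13])
{
	/* The vendor string is stored in EBX, EDX, ECX, in that order. */
	const uint32_t regs[3] = { ebx, edx, ecx };
	int i, j;

	for (i = 0; i < 3; i++)
		for (j = 0; j < 4; j++)	/* lowest byte comes first */
			out[i * 4 + j] = (char)((regs[i] >> (8 * j)) & 0xff);
	out[12] = '\0';
}
```

Calling vendor_from_regs(0x756e6547, 0x49656e69, 0x6c65746e, buf) fills buf
with "GenuineIntel", which can then be compared against a vendor table with
strcmp() as x86_vendor() does.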

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1354240068-9821-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/microcode.h       | 23 +++++++++++
 arch/x86/kernel/microcode_core_early.c | 70 ++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)

diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 43d921b..2e2ff3a 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -57,4 +57,27 @@ static inline struct microcode_ops * __init init_amd_microcode(void)
 static inline void __exit exit_amd_microcode(void) {}
 #endif
 
+struct mc_saved_data {
+	unsigned int mc_saved_count;
+	struct microcode_intel **mc_saved;
+	struct ucode_cpu_info *ucode_cpu_info;
+};
+#ifdef CONFIG_MICROCODE_EARLY
+#define MAX_UCODE_COUNT 128
+extern struct ucode_cpu_info ucode_cpu_info_early[NR_CPUS];
+extern struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT];
+extern struct mc_saved_data mc_saved_data;
+extern void __init load_ucode_bsp(char *real_mode_data);
+extern __init void load_ucode_ap(void);
+extern void __init
+save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
+			 struct microcode_intel **mc_saved_in_initrd);
+#else
+static inline void __init load_ucode_bsp(char *real_mode_data) {}
+static inline __init void load_ucode_ap(void) {}
+static inline void __init
+save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
+			 struct microcode_intel **mc_saved_in_initrd) {}
+#endif
+
 #endif /* _ASM_X86_MICROCODE_H */
diff --git a/arch/x86/kernel/microcode_core_early.c b/arch/x86/kernel/microcode_core_early.c
new file mode 100644
index 0000000..1c6cc8f
--- /dev/null
+++ b/arch/x86/kernel/microcode_core_early.c
@@ -0,0 +1,70 @@
+/*
+ *	X86 CPU microcode early update for Linux
+ *
+ *	Copyright (C) 2012 Fenghua Yu <fenghua.yu@intel.com>
+ *			   H Peter Anvin <hpa@zytor.com>
+ *
+ *	This driver allows early microcode updates on Intel processors
+ *	belonging to the IA-32 family - PentiumPro, Pentium II,
+ *	Pentium III, Xeon, Pentium 4, etc.
+ *
+ *	Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
+ *	Software Developer's Manual.
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <asm/microcode_intel.h>
+#include <asm/processor.h>
+
+struct ucode_cpu_info	ucode_cpu_info_early[NR_CPUS];
+EXPORT_SYMBOL_GPL(ucode_cpu_info_early);
+
+static inline int __init x86_vendor(void)
+{
+	unsigned int eax = 0x00000000;
+	char x86_vendor_id[16];
+	int i;
+	struct {
+		char x86_vendor_id[16];
+		__u8 x86_vendor;
+	} cpu_vendor_table[] = {
+		{ "GenuineIntel", X86_VENDOR_INTEL },
+		{ "AuthenticAMD", X86_VENDOR_AMD },
+	};
+
+	memset(x86_vendor_id, 0, ARRAY_SIZE(x86_vendor_id));
+	/* Get vendor name */
+	native_cpuid(&eax,
+		(unsigned int *)&x86_vendor_id[0],
+		(unsigned int *)&x86_vendor_id[8],
+		(unsigned int *)&x86_vendor_id[4]);
+
+	for (i = 0; i < ARRAY_SIZE(cpu_vendor_table); i++) {
+		if (!strcmp(x86_vendor_id, cpu_vendor_table[i].x86_vendor_id))
+			return cpu_vendor_table[i].x86_vendor;
+	}
+
+	return X86_VENDOR_UNKNOWN;
+}
+
+
+void __init load_ucode_bsp(char *real_mode_data)
+{
+	/*
+	 * boot_cpu_data is not set up yet in this early phase.
+	 * So we get vendor information directly through cpuid.
+	 */
+	if (x86_vendor() == X86_VENDOR_INTEL)
+		load_ucode_intel_bsp(real_mode_data);
+}
+
+void __cpuinit load_ucode_ap(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		load_ucode_intel_ap();
+}


* [tip:x86/microcode] x86/microcode_intel_lib.c: Early update ucode on Intel's CPU
  2012-11-30  1:47 ` [PATCH v2 04/10] x86/microcode_intel_lib.c: Early update ucode on Intel's CPU Fenghua Yu
@ 2012-12-01  0:24   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2012-12-01  0:24 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  da7d824a00ec0f4d19e2b51653410bde0de40226
Gitweb:     http://git.kernel.org/tip/da7d824a00ec0f4d19e2b51653410bde0de40226
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 29 Nov 2012 17:47:42 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 30 Nov 2012 15:18:15 -0800

x86/microcode_intel_lib.c: Early update ucode on Intel's CPU

Define the interfaces microcode_sanity_check() and get_matching_microcode().
They are called both during early boot and by the Intel microcode driver.
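
The two checks these interfaces perform can be sketched in plain C, assuming a
user-space build. dword_sum() and sig_pf_match() are hypothetical names chosen
for this example; the matching rule is copied from the sigmatch() macro, and
the checksum rule (all 32-bit words of header plus data sum to zero) follows
microcode_sanity_check(). Unlike the kernel code, the sketch accumulates in an
unsigned type so that wraparound is well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of all 32-bit words; a well-formed update sums to zero (mod 2^32). */
static uint32_t dword_sum(const uint32_t *words, size_t count)
{
	uint32_t sum = 0;

	while (count--)
		sum += words[count];
	return sum;
}

/*
 * Same rule as sigmatch(): the signatures must be equal, and the platform
 * flag masks must either share a bit or both be zero.
 */
static int sig_pf_match(uint32_t s1, uint32_t s2, uint32_t p1, uint32_t p2)
{
	return s1 == s2 && ((p1 & p2) || (p1 == 0 && p2 == 0));
}
```

A caller would reject an image when dword_sum() over the header and data is
non-zero, and only then compare the CPU's signature and platform flags against
the header (and any extended signature entries) with sig_pf_match().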

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1354240068-9821-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/microcode_intel_lib.c | 174 ++++++++++++++++++++++++++++++++++
 1 file changed, 174 insertions(+)

diff --git a/arch/x86/kernel/microcode_intel_lib.c b/arch/x86/kernel/microcode_intel_lib.c
new file mode 100644
index 0000000..ce69320
--- /dev/null
+++ b/arch/x86/kernel/microcode_intel_lib.c
@@ -0,0 +1,174 @@
+/*
+ *	Intel CPU Microcode Update Driver for Linux
+ *
+ *	Copyright (C) 2012 Fenghua Yu <fenghua.yu@intel.com>
+ *			   H Peter Anvin <hpa@zytor.com>
+ *
+ *	This driver allows microcode updates on Intel processors
+ *	belonging to the IA-32 family - PentiumPro, Pentium II,
+ *	Pentium III, Xeon, Pentium 4, etc.
+ *
+ *	Reference: Section 8.11 of Volume 3a, IA-32 Intel Architecture
+ *	Software Developer's Manual
+ *	Order Number 253668 or free download from:
+ *
+ *	http://developer.intel.com/Assets/PDF/manual/253668.pdf
+ *
+ *	For more information, go to http://www.urbanmyth.org/microcode
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ */
+#include <linux/firmware.h>
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include <asm/microcode_intel.h>
+#include <asm/processor.h>
+#include <asm/msr.h>
+
+static inline int
+update_match_cpu(unsigned int csig, unsigned int cpf,
+		 unsigned int sig, unsigned int pf)
+{
+	return (!sigmatch(sig, csig, pf, cpf)) ? 0 : 1;
+}
+
+int
+update_match_revision(struct microcode_header_intel *mc_header, int rev)
+{
+	return (mc_header->rev <= rev) ? 0 : 1;
+}
+
+int microcode_sanity_check(void *mc, int print_err)
+{
+	unsigned long total_size, data_size, ext_table_size;
+	struct microcode_header_intel *mc_header = mc;
+	struct extended_sigtable *ext_header = NULL;
+	int sum, orig_sum, ext_sigcount = 0, i;
+	struct extended_signature *ext_sig;
+
+	total_size = get_totalsize(mc_header);
+	data_size = get_datasize(mc_header);
+
+	if (data_size + MC_HEADER_SIZE > total_size) {
+		if (print_err)
+			pr_err("error! Bad data size in microcode data file\n");
+		return -EINVAL;
+	}
+
+	if (mc_header->ldrver != 1 || mc_header->hdrver != 1) {
+		if (print_err)
+			pr_err("error! Unknown microcode update format\n");
+		return -EINVAL;
+	}
+	ext_table_size = total_size - (MC_HEADER_SIZE + data_size);
+	if (ext_table_size) {
+		if ((ext_table_size < EXT_HEADER_SIZE)
+		 || ((ext_table_size - EXT_HEADER_SIZE) % EXT_SIGNATURE_SIZE)) {
+			if (print_err)
+				pr_err("error! Small exttable size in microcode data file\n");
+			return -EINVAL;
+		}
+		ext_header = mc + MC_HEADER_SIZE + data_size;
+		if (ext_table_size != exttable_size(ext_header)) {
+			if (print_err)
+				pr_err("error! Bad exttable size in microcode data file\n");
+			return -EFAULT;
+		}
+		ext_sigcount = ext_header->count;
+	}
+
+	/* check extended table checksum */
+	if (ext_table_size) {
+		int ext_table_sum = 0;
+		int *ext_tablep = (int *)ext_header;
+
+		i = ext_table_size / DWSIZE;
+		while (i--)
+			ext_table_sum += ext_tablep[i];
+		if (ext_table_sum) {
+			if (print_err)
+				pr_warn("aborting, bad extended signature table checksum\n");
+			return -EINVAL;
+		}
+	}
+
+	/* calculate the checksum */
+	orig_sum = 0;
+	i = (MC_HEADER_SIZE + data_size) / DWSIZE;
+	while (i--)
+		orig_sum += ((int *)mc)[i];
+	if (orig_sum) {
+		if (print_err)
+			pr_err("aborting, bad checksum\n");
+		return -EINVAL;
+	}
+	if (!ext_table_size)
+		return 0;
+	/* check extended signature checksum */
+	for (i = 0; i < ext_sigcount; i++) {
+		ext_sig = (void *)ext_header + EXT_HEADER_SIZE +
+			  EXT_SIGNATURE_SIZE * i;
+		sum = orig_sum
+			- (mc_header->sig + mc_header->pf + mc_header->cksum)
+			+ (ext_sig->sig + ext_sig->pf + ext_sig->cksum);
+		if (sum) {
+			if (print_err)
+				pr_err("aborting, bad checksum\n");
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(microcode_sanity_check);
+
+/*
+ * return 0 - no update found
+ * return 1 - found update
+ */
+int get_matching_sig(unsigned int csig, int cpf, void *mc, int rev)
+{
+	struct microcode_header_intel *mc_header = mc;
+	struct extended_sigtable *ext_header;
+	unsigned long total_size = get_totalsize(mc_header);
+	int ext_sigcount, i;
+	struct extended_signature *ext_sig;
+
+	if (update_match_cpu(csig, cpf, mc_header->sig, mc_header->pf))
+		return 1;
+
+	/* Look for ext. headers: */
+	if (total_size <= get_datasize(mc_header) + MC_HEADER_SIZE)
+		return 0;
+
+	ext_header = mc + get_datasize(mc_header) + MC_HEADER_SIZE;
+	ext_sigcount = ext_header->count;
+	ext_sig = (void *)ext_header + EXT_HEADER_SIZE;
+
+	for (i = 0; i < ext_sigcount; i++) {
+		if (update_match_cpu(csig, cpf, ext_sig->sig, ext_sig->pf))
+			return 1;
+		ext_sig++;
+	}
+	return 0;
+}
+
+/*
+ * return 0 - no update found
+ * return 1 - found update
+ */
+int get_matching_microcode(unsigned int csig, int cpf, void *mc, int rev)
+{
+	struct microcode_header_intel *mc_header = mc;
+
+	if (!update_match_revision(mc_header, rev))
+		return 0;
+
+	return get_matching_sig(csig, cpf, mc, rev);
+}
+EXPORT_SYMBOL_GPL(get_matching_microcode);


* [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-11-30  1:47 ` [PATCH v2 05/10] x86/microcode_intel_early.c: " Fenghua Yu
@ 2012-12-01  0:25   ` tip-bot for Fenghua Yu
  2012-12-01  0:55     ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2012-12-01  0:25 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  474355fe313391de2429ae225e0fb02f67ec6c31
Gitweb:     http://git.kernel.org/tip/474355fe313391de2429ae225e0fb02f67ec6c31
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 29 Nov 2012 17:47:43 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 30 Nov 2012 15:18:16 -0800

x86/microcode_intel_early.c: Early update ucode on Intel's CPU

Implement early ucode update on Intel CPUs.

load_ucode_intel_bsp() scans the initrd image for ucode. The image consists of
ucode in cpio format followed by the ordinary initrd image. The binary ucode
file is stored as kernel/x86/microcode/GenuineIntel.bin in the cpio data. All
ucode patches with the same model as the BSP are saved in memory, and the
matching ucode patch is applied on the BSP.

load_ucode_intel_ap() reads the saved ucode patches and updates ucode on the AP.
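
To make "the same model as BSP" concrete, here is a stand-alone sketch of the
family/model decoding that get_x86_family() and get_x86_model() in this patch
apply to a CPUID signature. The function names sig_family(), sig_model() and
same_model() are placeholders for this example, and the signature values in
the usage note are only sample inputs.

```c
#include <stdint.h>

/* Family: bits 11:8, plus the extended family (bits 27:20) when it is 0xf. */
static uint8_t sig_family(uint32_t sig)
{
	uint8_t family = (sig >> 8) & 0xf;

	if (family == 0xf)
		family += (sig >> 20) & 0xff;
	return family;
}

/*
 * Model: bits 7:4, with the extended model (bits 19:16) as the high nibble
 * for family 0x6 and 0xf, mirroring the condition used in the patch.
 */
static uint8_t sig_model(uint32_t sig)
{
	uint8_t family = sig_family(sig);
	uint8_t model = (sig >> 4) & 0xf;

	if (family == 0x6 || family == 0xf)
		model += ((sig >> 16) & 0xf) << 4;
	return model;
}

/* Two signatures are "the same model" if family and model agree. */
static int same_model(uint32_t a, uint32_t b)
{
	return sig_family(a) == sig_family(b) && sig_model(a) == sig_model(b);
}
```

For the sample signature 0x000306a9 this yields family 6 and model 0x3a. A
patch built for 0x000306a8 differs only in stepping, so same_model() accepts
it and it would be kept in memory, whereas one for 0x000206a7 (model 0x2a)
would be skipped.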

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1354240068-9821-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/microcode_intel_early.c | 438 ++++++++++++++++++++++++++++++++
 1 file changed, 438 insertions(+)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
new file mode 100644
index 0000000..36b1df1
--- /dev/null
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -0,0 +1,438 @@
+/*
+ *	Intel CPU Microcode Update Driver for Linux
+ *
+ *	Copyright (C) 2012 Fenghua Yu <fenghua.yu@intel.com>
+ *			   H Peter Anvin <hpa@zytor.com>
+ *
+ *	This driver allows early microcode updates on Intel processors
+ *	belonging to the IA-32 family - PentiumPro, Pentium II,
+ *	Pentium III, Xeon, Pentium 4, etc.
+ *
+ *	Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
+ *	Software Developer's Manual.
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/earlycpio.h>
+#include <asm/msr.h>
+#include <asm/microcode_intel.h>
+#include <asm/processor.h>
+
+struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT];
+struct mc_saved_data mc_saved_data;
+
+enum ucode_state
+generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
+			     unsigned int mc_saved_count,
+			     struct ucode_cpu_info *uci)
+{
+	struct microcode_intel *ucode_ptr, *new_mc = NULL;
+	int new_rev = uci->cpu_sig.rev;
+	enum ucode_state state = UCODE_OK;
+	unsigned int mc_size;
+	struct microcode_header_intel *mc_header;
+	unsigned int csig = uci->cpu_sig.sig;
+	unsigned int cpf = uci->cpu_sig.pf;
+	int i;
+
+	for (i = 0; i < mc_saved_count; i++) {
+		ucode_ptr = mc_saved_p[i];
+		mc_header = (struct microcode_header_intel *)ucode_ptr;
+		mc_size = get_totalsize(mc_header);
+		if (get_matching_microcode(csig, cpf, ucode_ptr, new_rev)) {
+			new_rev = mc_header->rev;
+			new_mc  = ucode_ptr;
+		}
+	}
+
+	if (!new_mc) {
+		state = UCODE_NFOUND;
+		goto out;
+	}
+
+	uci->mc = (struct microcode_intel *)new_mc;
+out:
+	return state;
+}
+EXPORT_SYMBOL_GPL(generic_load_microcode_early);
+
+static enum ucode_state __init
+load_microcode(struct mc_saved_data *mc_saved_data, int cpu)
+{
+	struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
+
+	return generic_load_microcode_early(cpu, mc_saved_data->mc_saved,
+					    mc_saved_data->mc_saved_count, uci);
+}
+
+static u8 get_x86_family(unsigned long sig)
+{
+	u8 x86;
+
+	x86 = (sig >> 8) & 0xf;
+
+	if (x86 == 0xf)
+		x86 += (sig >> 20) & 0xff;
+
+	return x86;
+}
+
+static u8 get_x86_model(unsigned long sig)
+{
+	u8 x86, x86_model;
+
+	x86 = get_x86_family(sig);
+	x86_model = (sig >> 4) & 0xf;
+
+	if (x86 == 0x6 || x86 == 0xf)
+		x86_model += ((sig >> 16) & 0xf) << 4;
+
+	return x86_model;
+}
+
+static enum ucode_state
+matching_model_microcode(struct microcode_header_intel *mc_header,
+			unsigned long sig)
+{
+	u8 x86, x86_model;
+	u8 x86_ucode, x86_model_ucode;
+
+	x86 = get_x86_family(sig);
+	x86_model = get_x86_model(sig);
+
+	x86_ucode = get_x86_family(mc_header->sig);
+	x86_model_ucode = get_x86_model(mc_header->sig);
+
+	if (x86 != x86_ucode || x86_model != x86_model_ucode)
+		return UCODE_ERROR;
+
+	return UCODE_OK;
+}
+
+static void
+save_microcode(struct mc_saved_data *mc_saved_data,
+	       struct microcode_intel **mc_saved_src,
+	       unsigned int mc_saved_count)
+{
+	int i;
+	struct microcode_intel **mc_saved_p;
+
+	if (!mc_saved_count)
+		return;
+
+	mc_saved_p = vmalloc(mc_saved_count*sizeof(struct microcode_intel *));
+	if (!mc_saved_p)
+		return;
+
+	for (i = 0; i < mc_saved_count; i++) {
+		struct microcode_intel *mc = mc_saved_src[i];
+		struct microcode_header_intel *mc_header = &mc->hdr;
+		unsigned long mc_size = get_totalsize(mc_header);
+		mc_saved_p[i] = vmalloc(mc_size);
+		if (mc_saved_src[i])
+			memcpy(mc_saved_p[i], mc, mc_size);
+	}
+
+	mc_saved_data->mc_saved = mc_saved_p;
+}
+
+/*
+ * Get microcode matching the BSP's model. Only CPUs with the same model as
+ * the BSP can be present in the platform.
+ */
+enum ucode_state
+get_matching_model_microcode(int cpu, void *data, size_t size,
+			     struct mc_saved_data *mc_saved_data,
+			     struct microcode_intel **mc_saved_in_initrd,
+			     enum system_states system_state)
+{
+	u8 *ucode_ptr = data;
+	unsigned int leftover = size;
+	enum ucode_state state = UCODE_OK;
+	unsigned int mc_size;
+	struct microcode_header_intel *mc_header;
+	struct microcode_intel *mc_saved_tmp[MAX_UCODE_COUNT];
+	size_t mc_saved_size;
+	size_t mem_size;
+	unsigned int mc_saved_count = mc_saved_data->mc_saved_count;
+	struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
+	int found = 0;
+	int i;
+
+	if (mc_saved_count) {
+		mem_size = mc_saved_count * sizeof(struct microcode_intel *);
+		memcpy(mc_saved_tmp, mc_saved_data->mc_saved, mem_size);
+	}
+
+	while (leftover) {
+		mc_header = (struct microcode_header_intel *)ucode_ptr;
+
+		mc_size = get_totalsize(mc_header);
+		if (!mc_size || mc_size > leftover ||
+			microcode_sanity_check(ucode_ptr, 0) < 0)
+			break;
+
+		leftover -= mc_size;
+		if (matching_model_microcode(mc_header, uci->cpu_sig.sig) !=
+			 UCODE_OK) {
+			ucode_ptr += mc_size;
+			continue;
+		}
+
+		found = 0;
+		for (i = 0; i < mc_saved_count; i++) {
+			unsigned int sig, pf;
+			unsigned int new_rev;
+			struct microcode_header_intel *mc_saved_header =
+			     (struct microcode_header_intel *)mc_saved_tmp[i];
+			sig = mc_saved_header->sig;
+			pf = mc_saved_header->pf;
+			new_rev = mc_header->rev;
+
+			if (get_matching_sig(sig, pf, ucode_ptr, new_rev)) {
+				found = 1;
+				if (update_match_revision(mc_header, new_rev)) {
+					/*
+					 * Found an older ucode saved before.
+					 * Replace the older one with this newer
+					 * one.
+					 */
+					mc_saved_tmp[i] =
+					(struct microcode_intel *)ucode_ptr;
+					break;
+				}
+			}
+		}
+		if (i >= mc_saved_count && !found)
+			/*
+			 * First time this ucode is seen in the ucode file.
+			 * Save it to memory.
+			 */
+			mc_saved_tmp[mc_saved_count++] =
+					 (struct microcode_intel *)ucode_ptr;
+
+		ucode_ptr += mc_size;
+	}
+
+	if (leftover) {
+		state = UCODE_ERROR;
+		goto out;
+	}
+
+	if (mc_saved_count == 0) {
+		state = UCODE_NFOUND;
+		goto out;
+	}
+
+	if (system_state == SYSTEM_RUNNING) {
+		vfree(mc_saved_data->mc_saved);
+		save_microcode(mc_saved_data, mc_saved_tmp, mc_saved_count);
+	} else {
+		mc_saved_size = sizeof(struct microcode_intel *) *
+				mc_saved_count;
+		memcpy(mc_saved_in_initrd, mc_saved_tmp, mc_saved_size);
+		mc_saved_data->mc_saved = mc_saved_in_initrd;
+	}
+
+	mc_saved_data->mc_saved_count = mc_saved_count;
+out:
+	return state;
+}
+EXPORT_SYMBOL_GPL(get_matching_model_microcode);
+
+#define native_rdmsr(msr, val1, val2)		\
+do {						\
+	u64 __val = native_read_msr((msr));	\
+	(void)((val1) = (u32)__val);		\
+	(void)((val2) = (u32)(__val >> 32));	\
+} while (0)
+
+#define native_wrmsr(msr, low, high)		\
+	native_write_msr(msr, low, high)
+
+static int __cpuinit collect_cpu_info_early(struct ucode_cpu_info *uci)
+{
+	unsigned int val[2];
+	u8 x86, x86_model;
+	struct cpu_signature csig = {0, 0, 0};
+	unsigned int eax, ebx, ecx, edx;
+
+	memset(uci, 0, sizeof(*uci));
+
+	eax = 0x00000001;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	csig.sig = eax;
+
+	x86 = get_x86_family(csig.sig);
+	x86_model = get_x86_model(csig.sig);
+
+	if ((x86_model >= 5) || (x86 > 6)) {
+		/* get processor flags from MSR 0x17 */
+		native_rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
+		csig.pf = 1 << ((val[1] >> 18) & 7);
+	}
+
+	/* get the current revision from MSR 0x8B */
+	native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+
+	csig.rev = val[1];
+
+	uci->cpu_sig = csig;
+	uci->valid = 1;
+
+	return 0;
+}
+
+static __init enum ucode_state
+scan_microcode(unsigned long start, unsigned long end,
+		struct mc_saved_data *mc_saved_data,
+		struct microcode_intel **mc_saved_in_initrd)
+{
+	unsigned int size = end - start + 1;
+	struct cpio_data cd = { 0, 0 };
+	char ucode_name[] = "kernel/x86/microcode/GenuineIntel.bin";
+	long offset = 0;
+
+	cd = find_cpio_data(ucode_name, (void *)start, size, &offset);
+	if (!cd.data)
+		return UCODE_ERROR;
+
+	return get_matching_model_microcode(0, cd.data, cd.size, mc_saved_data,
+					 mc_saved_in_initrd, SYSTEM_BOOTING);
+}
+
+static int __init
+apply_microcode_early(struct mc_saved_data *mc_saved_data, int cpu)
+{
+	struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
+	struct microcode_intel *mc_intel;
+	unsigned int val[2];
+
+	/* We should bind the task to the CPU */
+	mc_intel = uci->mc;
+	if (mc_intel == NULL)
+		return 0;
+
+	/* write microcode via MSR 0x79 */
+	native_wrmsr(MSR_IA32_UCODE_WRITE,
+	      (unsigned long) mc_intel->bits,
+	      (unsigned long) mc_intel->bits >> 16 >> 16);
+	native_wrmsr(MSR_IA32_UCODE_REV, 0, 0);
+
+	/* As documented in the SDM: Do a CPUID 1 here */
+	sync_core();
+
+	/* get the current revision from MSR 0x8B */
+	native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+	if (val[1] != mc_intel->hdr.rev)
+		return -1;
+
+	uci->cpu_sig.rev = val[1];
+
+	return 0;
+}
+
+#ifdef CONFIG_X86_32
+static void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
+				struct microcode_intel **mc_saved_in_initrd)
+{
+	int i;
+
+	if (mc_saved_data->mc_saved) {
+		for (i = 0; i < mc_saved_data->mc_saved_count; i++)
+			mc_saved_data->mc_saved[i] =
+					 __va(mc_saved_data->mc_saved[i]);
+
+		mc_saved_data->mc_saved = __va(mc_saved_data->mc_saved);
+	}
+
+	if (mc_saved_data->ucode_cpu_info->mc)
+		mc_saved_data->ucode_cpu_info->mc =
+				 __va(mc_saved_data->ucode_cpu_info->mc);
+	mc_saved_data->ucode_cpu_info = __va(mc_saved_data->ucode_cpu_info);
+}
+#else
+static inline void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
+				struct microcode_intel **mc_saved_in_initrd)
+{
+}
+#endif
+
+void __init save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
+		 struct microcode_intel **mc_saved_in_initrd)
+{
+	unsigned int count = mc_saved_data->mc_saved_count;
+
+	save_microcode(mc_saved_data, mc_saved_in_initrd, count);
+}
+
+static void __init
+_load_ucode_intel_bsp(struct mc_saved_data *mc_saved_data,
+		      struct microcode_intel **mc_saved_in_initrd,
+		      unsigned long initrd_start, unsigned long initrd_end)
+{
+	int cpu = 0;
+
+#ifdef CONFIG_X86_64
+	mc_saved_data->ucode_cpu_info = ucode_cpu_info_early;
+#else
+	mc_saved_data->ucode_cpu_info =
+			(struct ucode_cpu_info *)__pa(ucode_cpu_info_early);
+#endif
+	collect_cpu_info_early(mc_saved_data->ucode_cpu_info + cpu);
+	scan_microcode(initrd_start, initrd_end, mc_saved_data,
+		       mc_saved_in_initrd);
+	load_microcode(mc_saved_data, cpu);
+	apply_microcode_early(mc_saved_data, cpu);
+	map_mc_saved(mc_saved_data, mc_saved_in_initrd);
+}
+
+void __init
+load_ucode_intel_bsp(char *real_mode_data)
+{
+	u64 ramdisk_image, ramdisk_size, ramdisk_end;
+	unsigned long initrd_start, initrd_end;
+	struct boot_params *boot_params;
+
+	boot_params = (struct boot_params *)real_mode_data;
+	ramdisk_image = boot_params->hdr.ramdisk_image;
+	ramdisk_size  = boot_params->hdr.ramdisk_size;
+
+#ifdef CONFIG_X86_64
+	ramdisk_end  = PAGE_ALIGN(ramdisk_image + ramdisk_size);
+	initrd_start = ramdisk_image + PAGE_OFFSET;
+	initrd_end = initrd_start + ramdisk_size;
+	_load_ucode_intel_bsp(&mc_saved_data, mc_saved_in_initrd,
+			      initrd_start, initrd_end);
+#else
+	ramdisk_end  = ramdisk_image + ramdisk_size;
+	initrd_start = ramdisk_image;
+	initrd_end = initrd_start + ramdisk_size;
+	_load_ucode_intel_bsp((struct mc_saved_data *)__pa(&mc_saved_data),
+			(struct microcode_intel **)__pa(mc_saved_in_initrd),
+			initrd_start, initrd_end);
+#endif
+}
+
+void __cpuinit load_ucode_intel_ap(void)
+{
+	int cpu = smp_processor_id();
+
+	/*
+	 * If the BSP did not find valid ucode and save it in memory, there is
+	 * no need to update ucode on this AP.
+	 */
+	if (!mc_saved_data.mc_saved)
+		return;
+
+	collect_cpu_info_early(mc_saved_data.ucode_cpu_info + cpu);
+	load_microcode(&mc_saved_data, cpu);
+	apply_microcode_early(&mc_saved_data, cpu);
+}


* [tip:x86/microcode] x86/head_32.S: Early update ucode in 32-bit
  2012-11-30  1:47 ` [PATCH v2 06/10] x86/head_32.S: Early update ucode in 32-bit Fenghua Yu
@ 2012-12-01  0:26   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2012-12-01  0:26 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  eadd948a43145b54b2da8ee4d83774ac2aa6ee02
Gitweb:     http://git.kernel.org/tip/eadd948a43145b54b2da8ee4d83774ac2aa6ee02
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 29 Nov 2012 17:47:44 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 30 Nov 2012 15:18:16 -0800

x86/head_32.S: Early update ucode in 32-bit

This updates ucode in the 32-bit kernel. At this point paging is not enabled
yet, so no virtual addresses are available.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1354240068-9821-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/head_32.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 957a47a..b7b76a4 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -144,6 +144,12 @@ ENTRY(startup_32)
 	movl %eax, pa(olpc_ofw_pgd)
 #endif
 
+#ifdef CONFIG_MICROCODE_EARLY
+	/* Early load ucode on BSP. */
+	movl $pa(boot_params), %eax
+	call load_ucode_bsp
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable

* [tip:x86/microcode] x86/head64.c: Early update ucode in 64-bit
  2012-11-30  1:47 ` [PATCH v2 07/10] x86/head64.c: Early update ucode in 64-bit Fenghua Yu
@ 2012-12-01  0:27   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2012-12-01  0:27 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  b8f3111a12c190ab8228718e766dd2707b6fc8e0
Gitweb:     http://git.kernel.org/tip/b8f3111a12c190ab8228718e766dd2707b6fc8e0
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 29 Nov 2012 17:47:45 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 30 Nov 2012 15:18:16 -0800

x86/head64.c: Early update ucode in 64-bit

This updates ucode in 64-bit mode. Paging and virtual addresses are already
working at this point.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1354240068-9821-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/head64.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..a512f56 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -25,6 +25,7 @@
 #include <asm/kdebug.h>
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
+#include <asm/microcode.h>
 
 static void __init zap_identity_mappings(void)
 {
@@ -73,6 +74,11 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
+	/*
+	 * Load microcode early on BSP.
+	 */
+	load_ucode_bsp(real_mode_data);
+
 	/* Make NULL pointers segfault */
 	zap_identity_mappings();
 

* [tip:x86/microcode] x86/smpboot.c: Early update ucode on AP
  2012-11-30  1:47 ` [PATCH v2 08/10] x86/smpboot.c: Early update ucode on AP Fenghua Yu
@ 2012-12-01  0:28   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2012-12-01  0:28 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  27f7c88afbdff536e65bbf1507ba63ce20eb0ba3
Gitweb:     http://git.kernel.org/tip/27f7c88afbdff536e65bbf1507ba63ce20eb0ba3
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 29 Nov 2012 17:47:46 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 30 Nov 2012 15:18:16 -0800

x86/smpboot.c: Early update ucode on AP

This updates ucode on an AP. At this point, the BSP should already have stored
valid ucode patches in memory, provided it found them in the initrd. The AP
searches the stored ucode and loads the matching patch.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1354240068-9821-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/smpboot.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..f616692 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -76,6 +76,7 @@
 #include <asm/i8259.h>
 
 #include <asm/realmode.h>
+#include <asm/microcode.h>
 
 /* State of each CPU */
 DEFINE_PER_CPU(int, cpu_state) = { 0 };
@@ -239,6 +240,12 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 * most necessary things.
 	 */
 	cpu_init();
+	/*
+	 * Load microcode on this cpu if a valid microcode is available.
+	 * This is early microcode loading procedure.
+	 */
+	load_ucode_ap();
+
 	x86_cpuinit.early_percpu_clock_init();
 	preempt_disable();
 	smp_callin();

* [tip:x86/microcode] x86/mm/init.c: Copy ucode from initrd image to memory
  2012-11-30  1:47 ` [PATCH v2 09/10] x86/mm/init.c: Copy ucode from initrd image to memory Fenghua Yu
@ 2012-12-01  0:29   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2012-12-01  0:29 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  834137534b34b2ef0e1c3c00ad6a31e06b604e00
Gitweb:     http://git.kernel.org/tip/834137534b34b2ef0e1c3c00ad6a31e06b604e00
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 29 Nov 2012 17:47:47 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 30 Nov 2012 15:18:17 -0800

x86/mm/init.c: Copy ucode from initrd image to memory

Before the initrd image is freed, copy the valid ucode patches from the initrd
image to kernel virtual memory. The saved ucode will be used to update APs on
resume.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1354240068-9821-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/init.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d7aea41..a294d4b 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -16,6 +16,7 @@
 #include <asm/tlb.h>
 #include <asm/proto.h>
 #include <asm/dma.h>		/* for MAX_DMA_PFN */
+#include <asm/microcode.h>
 
 unsigned long __initdata pgt_buf_start;
 unsigned long __meminitdata pgt_buf_end;
@@ -391,6 +392,15 @@ void free_initmem(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void __init free_initrd_mem(unsigned long start, unsigned long end)
 {
+#ifdef CONFIG_MICROCODE_EARLY
+	/*
+	 * Remember, initrd memory may contain microcode or other useful things.
+	 * Before we lose initrd mem, we need to find a place to hold them
+	 * now that normal virtual memory is enabled.
+	 */
+	save_microcode_in_initrd(&mc_saved_data, mc_saved_in_initrd);
+#endif
+
 	/*
 	 * end could be not aligned, and We can not align that,
 	 * decompresser could be confused by aligned initrd_end

* [tip:x86/microcode] x86/Kconfig: Configurations to enable/disable the feature
  2012-11-30  1:47 ` [PATCH v2 10/10] x86/Kconfig: Configurations to enable/disable the feature Fenghua Yu
@ 2012-12-01  0:30   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2012-12-01  0:30 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  16544f8d32aca4ec97b22db880f879ee164c3658
Gitweb:     http://git.kernel.org/tip/16544f8d32aca4ec97b22db880f879ee164c3658
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 29 Nov 2012 17:47:48 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 30 Nov 2012 15:18:17 -0800

x86/Kconfig: Configurations to enable/disable the feature

MICROCODE_INTEL_LIB, MICROCODE_INTEL_EARLY, and MICROCODE_EARLY are three new
configurations to enable or disable the feature.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1354240068-9821-11-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/Kconfig | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 46c3bff..abb1153 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1031,6 +1031,24 @@ config MICROCODE_OLD_INTERFACE
 	def_bool y
 	depends on MICROCODE
 
+config MICROCODE_INTEL_LIB
+	def_bool y
+	depends on MICROCODE_INTEL
+
+config MICROCODE_INTEL_EARLY
+	bool "Early load microcode"
+	depends on MICROCODE_INTEL && BLK_DEV_INITRD
+	default y
+	help
+	  This option provides functionality to read additional microcode data
+	  at the beginning of initrd image. The data tells kernel to load
+	  microcode to CPU's as early as possible. No functional change if no
+	  microcode data is glued to the initrd, therefore it's safe to say Y.
+
+config MICROCODE_EARLY
+	def_bool y
+	depends on MICROCODE_INTEL_EARLY
+
 config X86_MSR
 	tristate "/dev/cpu/*/msr - Model-specific register support"
 	---help---

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-01  0:25   ` [tip:x86/microcode] " tip-bot for Fenghua Yu
@ 2012-12-01  0:55     ` Yinghai Lu
  2012-12-04  0:18       ` Yu, Fenghua
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-01  0:55 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, fenghua.yu, tglx, hpa; +Cc: linux-tip-commits

On Fri, Nov 30, 2012 at 4:25 PM, tip-bot for Fenghua Yu
<fenghua.yu@intel.com> wrote:
> Commit-ID:  474355fe313391de2429ae225e0fb02f67ec6c31
> Gitweb:     http://git.kernel.org/tip/474355fe313391de2429ae225e0fb02f67ec6c31
> Author:     Fenghua Yu <fenghua.yu@intel.com>
> AuthorDate: Thu, 29 Nov 2012 17:47:43 -0800
> Committer:  H. Peter Anvin <hpa@linux.intel.com>
> CommitDate: Fri, 30 Nov 2012 15:18:16 -0800
>
> x86/microcode_intel_early.c: Early update ucode on Intel's CPU
>
> Implementation of early update ucode on Intel's CPU.
>
> load_ucode_intel_bsp() scans ucode in initrd image file which is a cpio format
> ucode followed by ordinary initrd image file. The binary ucode file is stored
> in kernel/x86/microcode/GenuineIntel/microcode.bin in the cpio data. All ucode
> patches with the same model as BSP are saved in memory. A matching ucode patch
> is updated on BSP.
>
> load_ucode_intel_ap() reads saved ucoded patches and updates ucode on AP.
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Link: http://lkml.kernel.org/r/1354240068-9821-6-git-send-email-fenghua.yu@intel.com
> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
> ---
>  arch/x86/kernel/microcode_intel_early.c | 438 ++++++++++++++++++++++++++++++++
>  1 file changed, 438 insertions(+)
>
> diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
> new file mode 100644
> index 0000000..36b1df1
> --- /dev/null
> +++ b/arch/x86/kernel/microcode_intel_early.c
> @@ -0,0 +1,438 @@
> +/*
> + *     Intel CPU Microcode Update Driver for Linux
> + *
> + *     Copyright (C) 2012 Fenghua Yu <fenghua.yu@intel.com>
> + *                        H Peter Anvin" <hpa@zytor.com>
> + *
> + *     This driver allows to early upgrade microcode on Intel processors
> + *     belonging to IA-32 family - PentiumPro, Pentium II,
> + *     Pentium III, Xeon, Pentium 4, etc.
> + *
> + *     Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
> + *     Software Developer's Manual.
> + *
> + *     This program is free software; you can redistribute it and/or
> + *     modify it under the terms of the GNU General Public License
> + *     as published by the Free Software Foundation; either version
> + *     2 of the License, or (at your option) any later version.
> + */
> +#include <linux/module.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm.h>
> +#include <linux/earlycpio.h>
> +#include <asm/msr.h>
> +#include <asm/microcode_intel.h>
> +#include <asm/processor.h>
> +
> +struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT];
> +struct mc_saved_data mc_saved_data;
> +
> +enum ucode_state
> +generic_load_microcode_early(int cpu, struct microcode_intel **mc_saved_p,
> +                            unsigned int mc_saved_count,
> +                            struct ucode_cpu_info *uci)
> +{
> +       struct microcode_intel *ucode_ptr, *new_mc = NULL;
> +       int new_rev = uci->cpu_sig.rev;
> +       enum ucode_state state = UCODE_OK;
> +       unsigned int mc_size;
> +       struct microcode_header_intel *mc_header;
> +       unsigned int csig = uci->cpu_sig.sig;
> +       unsigned int cpf = uci->cpu_sig.pf;
> +       int i;
> +
> +       for (i = 0; i < mc_saved_count; i++) {
> +               ucode_ptr = mc_saved_p[i];
> +               mc_header = (struct microcode_header_intel *)ucode_ptr;
> +               mc_size = get_totalsize(mc_header);
> +               if (get_matching_microcode(csig, cpf, ucode_ptr, new_rev)) {
> +                       new_rev = mc_header->rev;
> +                       new_mc  = ucode_ptr;
> +               }
> +       }
> +
> +       if (!new_mc) {
> +               state = UCODE_NFOUND;
> +               goto out;
> +       }
> +
> +       uci->mc = (struct microcode_intel *)new_mc;
> +out:
> +       return state;
> +}
> +EXPORT_SYMBOL_GPL(generic_load_microcode_early);
> +
> +static enum ucode_state __init
> +load_microcode(struct mc_saved_data *mc_saved_data, int cpu)
> +{
> +       struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
> +
> +       return generic_load_microcode_early(cpu, mc_saved_data->mc_saved,
> +                                           mc_saved_data->mc_saved_count, uci);
> +}
> +
> +static u8 get_x86_family(unsigned long sig)
> +{
> +       u8 x86;
> +
> +       x86 = (sig >> 8) & 0xf;
> +
> +       if (x86 == 0xf)
> +               x86 += (sig >> 20) & 0xff;
> +
> +       return x86;
> +}
> +
> +static u8 get_x86_model(unsigned long sig)
> +{
> +       u8 x86, x86_model;
> +
> +       x86 = get_x86_family(sig);
> +       x86_model = (sig >> 4) & 0xf;
> +
> +       if (x86 == 0x6 || x86 == 0xf)
> +               x86_model += ((sig >> 16) & 0xf) << 4;
> +
> +       return x86_model;
> +}
> +
> +static enum ucode_state
> +matching_model_microcode(struct microcode_header_intel *mc_header,
> +                       unsigned long sig)
> +{
> +       u8 x86, x86_model;
> +       u8 x86_ucode, x86_model_ucode;
> +
> +       x86 = get_x86_family(sig);
> +       x86_model = get_x86_model(sig);
> +
> +       x86_ucode = get_x86_family(mc_header->sig);
> +       x86_model_ucode = get_x86_model(mc_header->sig);
> +
> +       if (x86 != x86_ucode || x86_model != x86_model_ucode)
> +               return UCODE_ERROR;
> +
> +       return UCODE_OK;
> +}
> +
> +static void
> +save_microcode(struct mc_saved_data *mc_saved_data,
> +              struct microcode_intel **mc_saved_src,
> +              unsigned int mc_saved_count)
> +{
> +       int i;
> +       struct microcode_intel **mc_saved_p;
> +
> +       if (!mc_saved_count)
> +               return;
> +
> +       mc_saved_p = vmalloc(mc_saved_count*sizeof(struct microcode_intel *));
> +       if (!mc_saved_p)
> +               return;
> +
> +       for (i = 0; i < mc_saved_count; i++) {
> +               struct microcode_intel *mc = mc_saved_src[i];
> +               struct microcode_header_intel *mc_header = &mc->hdr;
> +               unsigned long mc_size = get_totalsize(mc_header);
> +               mc_saved_p[i] = vmalloc(mc_size);
> +               if (mc_saved_src[i])
> +                       memcpy(mc_saved_p[i], mc, mc_size);
> +       }
> +
> +       mc_saved_data->mc_saved = mc_saved_p;
> +}
> +
> +/*
> + * Get microcode matching with BSP's model. Only CPU's with the same model as
> + * BSP can stay in the platform.
> + */
> +enum ucode_state
> +get_matching_model_microcode(int cpu, void *data, size_t size,
> +                            struct mc_saved_data *mc_saved_data,
> +                            struct microcode_intel **mc_saved_in_initrd,
> +                            enum system_states system_state)
> +{
> +       u8 *ucode_ptr = data;
> +       unsigned int leftover = size;
> +       enum ucode_state state = UCODE_OK;
> +       unsigned int mc_size;
> +       struct microcode_header_intel *mc_header;
> +       struct microcode_intel *mc_saved_tmp[MAX_UCODE_COUNT];
> +       size_t mc_saved_size;
> +       size_t mem_size;
> +       unsigned int mc_saved_count = mc_saved_data->mc_saved_count;
> +       struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
> +       int found = 0;
> +       int i;
> +
> +       if (mc_saved_count) {
> +               mem_size = mc_saved_count * sizeof(struct microcode_intel *);
> +               memcpy(mc_saved_tmp, mc_saved_data->mc_saved, mem_size);
> +       }
> +
> +       while (leftover) {
> +               mc_header = (struct microcode_header_intel *)ucode_ptr;
> +
> +               mc_size = get_totalsize(mc_header);
> +               if (!mc_size || mc_size > leftover ||
> +                       microcode_sanity_check(ucode_ptr, 0) < 0)
> +                       break;
> +
> +               leftover -= mc_size;
> +               if (matching_model_microcode(mc_header, uci->cpu_sig.sig) !=
> +                        UCODE_OK) {
> +                       ucode_ptr += mc_size;
> +                       continue;
> +               }
> +
> +               found = 0;
> +               for (i = 0; i < mc_saved_count; i++) {
> +                       unsigned int sig, pf;
> +                       unsigned int new_rev;
> +                       struct microcode_header_intel *mc_saved_header =
> +                            (struct microcode_header_intel *)mc_saved_tmp[i];
> +                       sig = mc_saved_header->sig;
> +                       pf = mc_saved_header->pf;
> +                       new_rev = mc_header->rev;
> +
> +                       if (get_matching_sig(sig, pf, ucode_ptr, new_rev)) {
> +                               found = 1;
> +                               if (update_match_revision(mc_header, new_rev)) {
> +                                       /*
> +                                        * Found an older ucode saved before.
> +                                        * Replace the older one with this newer
> +                                        * one.
> +                                        */
> +                                       mc_saved_tmp[i] =
> +                                       (struct microcode_intel *)ucode_ptr;
> +                                       break;
> +                               }
> +                       }
> +               }
> +               if (i >= mc_saved_count && !found)
> +                       /*
> +                        * This ucode is first time discovered in ucode file.
> +                        * Save it to memory.
> +                        */
> +                       mc_saved_tmp[mc_saved_count++] =
> +                                        (struct microcode_intel *)ucode_ptr;
> +
> +               ucode_ptr += mc_size;
> +       }
> +
> +       if (leftover) {
> +               state = UCODE_ERROR;
> +               goto out;
> +       }
> +
> +       if (mc_saved_count == 0) {
> +               state = UCODE_NFOUND;
> +               goto out;
> +       }
> +
> +       if (system_state == SYSTEM_RUNNING) {
> +               vfree(mc_saved_data->mc_saved);
> +               save_microcode(mc_saved_data, mc_saved_tmp, mc_saved_count);
> +       } else {
> +               mc_saved_size = sizeof(struct microcode_intel *) *
> +                               mc_saved_count;
> +               memcpy(mc_saved_in_initrd, mc_saved_tmp, mc_saved_size);
> +               mc_saved_data->mc_saved = mc_saved_in_initrd;
> +       }
> +
> +       mc_saved_data->mc_saved_count = mc_saved_count;
> +out:
> +       return state;
> +}
> +EXPORT_SYMBOL_GPL(get_matching_model_microcode);
> +
> +#define native_rdmsr(msr, val1, val2)          \
> +do {                                           \
> +       u64 __val = native_read_msr((msr));     \
> +       (void)((val1) = (u32)__val);            \
> +       (void)((val2) = (u32)(__val >> 32));    \
> +} while (0)
> +
> +#define native_wrmsr(msr, low, high)           \
> +       native_write_msr(msr, low, high);
> +
> +static int __cpuinit collect_cpu_info_early(struct ucode_cpu_info *uci)
> +{
> +       unsigned int val[2];
> +       u8 x86, x86_model;
> +       struct cpu_signature csig = {0, 0, 0};
> +       unsigned int eax, ebx, ecx, edx;
> +
> +       memset(uci, 0, sizeof(*uci));
> +
> +       eax = 0x00000001;
> +       ecx = 0;
> +       native_cpuid(&eax, &ebx, &ecx, &edx);
> +       csig.sig = eax;
> +
> +       x86 = get_x86_family(csig.sig);
> +       x86_model = get_x86_model(csig.sig);
> +
> +       if ((x86_model >= 5) || (x86 > 6)) {
> +               /* get processor flags from MSR 0x17 */
> +               native_rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
> +               csig.pf = 1 << ((val[1] >> 18) & 7);
> +       }
> +
> +       /* get the current revision from MSR 0x8B */
> +       native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
> +
> +       csig.rev = val[1];
> +
> +       uci->cpu_sig = csig;
> +       uci->valid = 1;
> +
> +       return 0;
> +}
> +
> +static __init enum ucode_state
> +scan_microcode(unsigned long start, unsigned long end,
> +               struct mc_saved_data *mc_saved_data,
> +               struct microcode_intel **mc_saved_in_initrd)
> +{
> +       unsigned int size = end - start + 1;
> +       struct cpio_data cd = { 0, 0 };
> +       char ucode_name[] = "kernel/x86/microcode/GenuineIntel.bin";
> +       long offset = 0;
> +
> +       cd = find_cpio_data(ucode_name, (void *)start, size, &offset);
> +       if (!cd.data)
> +               return UCODE_ERROR;
> +
> +       return get_matching_model_microcode(0, cd.data, cd.size, mc_saved_data,
> +                                        mc_saved_in_initrd, SYSTEM_BOOTING);
> +}
> +
> +static int __init
> +apply_microcode_early(struct mc_saved_data *mc_saved_data, int cpu)
> +{
> +       struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
> +       struct microcode_intel *mc_intel;
> +       unsigned int val[2];
> +
> +       /* We should bind the task to the CPU */
> +       mc_intel = uci->mc;
> +       if (mc_intel == NULL)
> +               return 0;
> +
> +       /* write microcode via MSR 0x79 */
> +       native_wrmsr(MSR_IA32_UCODE_WRITE,
> +             (unsigned long) mc_intel->bits,
> +             (unsigned long) mc_intel->bits >> 16 >> 16);
> +       native_wrmsr(MSR_IA32_UCODE_REV, 0, 0);
> +
> +       /* As documented in the SDM: Do a CPUID 1 here */
> +       sync_core();
> +
> +       /* get the current revision from MSR 0x8B */
> +       native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
> +       if (val[1] != mc_intel->hdr.rev)
> +               return -1;
> +
> +       uci->cpu_sig.rev = val[1];
> +
> +       return 0;
> +}
> +
> +#ifdef CONFIG_X86_32
> +static void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
> +                               struct microcode_intel **mc_saved_in_initrd)
> +{
> +       int i;
> +
> +       if (mc_saved_data->mc_saved) {
> +               for (i = 0; i < mc_saved_data->mc_saved_count; i++)
> +                       mc_saved_data->mc_saved[i] =
> +                                        __va(mc_saved_data->mc_saved[i]);
> +
> +               mc_saved_data->mc_saved = __va(mc_saved_data->mc_saved);
> +       }
> +
> +       if (mc_saved_data->ucode_cpu_info->mc)
> +               mc_saved_data->ucode_cpu_info->mc =
> +                                __va(mc_saved_data->ucode_cpu_info->mc);
> +       mc_saved_data->ucode_cpu_info = __va(mc_saved_data->ucode_cpu_info);
> +}
> +#else
> +static inline void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
> +                               struct microcode_intel **mc_saved_in_initrd)
> +{
> +}
> +#endif
> +
> +void __init save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
> +                struct microcode_intel **mc_saved_in_initrd)
> +{
> +       unsigned int count = mc_saved_data->mc_saved_count;
> +
> +       save_microcode(mc_saved_data, mc_saved_in_initrd, count);
> +}
> +
> +static void __init
> +_load_ucode_intel_bsp(struct mc_saved_data *mc_saved_data,
> +                     struct microcode_intel **mc_saved_in_initrd,
> +                     unsigned long initrd_start, unsigned long initrd_end)
> +{
> +       int cpu = 0;
> +
> +#ifdef CONFIG_X86_64
> +       mc_saved_data->ucode_cpu_info = ucode_cpu_info_early;
> +#else
> +       mc_saved_data->ucode_cpu_info =
> +                       (struct ucode_cpu_info *)__pa(ucode_cpu_info_early);
> +#endif
> +       collect_cpu_info_early(mc_saved_data->ucode_cpu_info + cpu);
> +       scan_microcode(initrd_start, initrd_end, mc_saved_data,
> +                      mc_saved_in_initrd);
> +       load_microcode(mc_saved_data, cpu);
> +       apply_microcode_early(mc_saved_data, cpu);
> +       map_mc_saved(mc_saved_data, mc_saved_in_initrd);
> +}
> +
> +void __init
> +load_ucode_intel_bsp(char *real_mode_data)
> +{
> +       u64 ramdisk_image, ramdisk_size, ramdisk_end;
> +       unsigned long initrd_start, initrd_end;
> +       struct boot_params *boot_params;
> +
> +       boot_params = (struct boot_params *)real_mode_data;
> +       ramdisk_image = boot_params->hdr.ramdisk_image;
> +       ramdisk_size  = boot_params->hdr.ramdisk_size;
> +
> +#ifdef CONFIG_X86_64
> +       ramdisk_end  = PAGE_ALIGN(ramdisk_image + ramdisk_size);
> +       initrd_start = ramdisk_image + PAGE_OFFSET;
> +       initrd_end = initrd_start + ramdisk_size;
> +       _load_ucode_intel_bsp(&mc_saved_data, mc_saved_in_initrd,
> +                             initrd_start, initrd_end);
> +#else
> +       ramdisk_end  = ramdisk_image + ramdisk_size;
> +       initrd_start = ramdisk_image;
> +       initrd_end = initrd_start + ramdisk_size;
> +       _load_ucode_intel_bsp((struct mc_saved_data *)__pa(&mc_saved_data),
> +                       (struct microcode_intel **)__pa(mc_saved_in_initrd),
> +                       initrd_start, initrd_end);
> +#endif
> +}
> +
> +void __cpuinit load_ucode_intel_ap(void)
> +{
> +       int cpu = smp_processor_id();
> +
> +       /*
> +        * If BSP doesn't find valid ucode and save it in memory, no need to
> +        * update ucode on this AP.
> +        */
> +       if (!mc_saved_data.mc_saved)
> +               return;
> +
> +       collect_cpu_info_early(mc_saved_data.ucode_cpu_info + cpu);
> +       load_microcode(&mc_saved_data, cpu);
> +       apply_microcode_early(&mc_saved_data, cpu);
> +}

It looks like these patches do not consider that the initrd could be
relocated, even on 64-bit.

You may need to copy the ucode.bin to BRK first. That would make the code
much simpler, and later there would be no need to copy it back in
free_bootmem_initrd.

In other words, this patchset is not ready even for 3.8.

Thanks

Yinghai

* RE: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-01  0:55     ` Yinghai Lu
@ 2012-12-04  0:18       ` Yu, Fenghua
  2012-12-11  2:39         ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: Yu, Fenghua @ 2012-12-04  0:18 UTC (permalink / raw)
  To: Yinghai Lu, mingo, hpa, linux-kernel, tglx, hpa; +Cc: linux-tip-commits

> From: yhlu.kernel@gmail.com [mailto:yhlu.kernel@gmail.com] On Behalf Of
> Yinghai Lu
> On Fri, Nov 30, 2012 at 4:25 PM, tip-bot for Fenghua Yu
> <fenghua.yu@intel.com> wrote:
> > Commit-ID:  474355fe313391de2429ae225e0fb02f67ec6c31
> > Gitweb:
> http://git.kernel.org/tip/474355fe313391de2429ae225e0fb02f67ec6c31
> > Author:     Fenghua Yu <fenghua.yu@intel.com>
> > AuthorDate: Thu, 29 Nov 2012 17:47:43 -0800
> > Committer:  H. Peter Anvin <hpa@linux.intel.com>
> > CommitDate: Fri, 30 Nov 2012 15:18:16 -0800
> >
> > x86/microcode_intel_early.c: Early update ucode on Intel's CPU
> >
> > Implementation of early update ucode on Intel's CPU.
> >
> > load_ucode_intel_bsp() scans ucode in initrd image file which is a
> cpio format
> > ucode followed by ordinary initrd image file. The binary ucode file
> is stored
> > in kernel/x86/microcode/GenuineIntel/microcode.bin in the cpio data.
> All ucode
> > patches with the same model as BSP are saved in memory. A matching
> ucode patch
> > is updated on BSP.
> >
> > load_ucode_intel_ap() reads saved ucoded patches and updates ucode on
> AP.
> >
> 
> looks like this patches do not consider that initrd could be relocated
> even 64bit.
> 
> may need to copy the ucode.bin to BRK at first. that will make code
> much simple, and later does not need to
> copy them back in free_bootmem_initrd.
> 
> aka, this patchset is not ready for 3.8 even.
> 

In the updated patches I will relocate the saved microcode blob
(mc_saved_in_initrd) after the initrd is relocated. That way,
mc_saved_in_initrd always points to the right initrd during boot.
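
As a rough illustration of that fix-up (a user-space sketch only, not code
from the patches; rebase_saved() and its parameters are hypothetical names),
each saved pointer can be shifted by the distance the initrd moved:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch set: after the initrd has
 * moved from old_base to new_base, shift every saved pointer by the same
 * distance so that it points into the new copy.
 */
void rebase_saved(void **saved, size_t count,
                  const void *old_base, const void *new_base,
                  size_t initrd_size)
{
    uintptr_t old_start = (uintptr_t)old_base;
    uintptr_t new_start = (uintptr_t)new_base;
    size_t i;

    for (i = 0; i < count; i++) {
        uintptr_t p = (uintptr_t)saved[i];

        /* Every saved pointer must lie inside the old initrd image. */
        assert(p >= old_start && p - old_start < initrd_size);
        saved[i] = (void *)(new_start + (p - old_start));
    }
}
```

This assumes the relocation is a plain byte-for-byte move, so the offset of
each blob within the image does not change.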

Thanks.

-Fenghua

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-04  0:18       ` Yu, Fenghua
@ 2012-12-11  2:39         ` Yinghai Lu
  2012-12-11  3:41           ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-11  2:39 UTC (permalink / raw)
  To: Yu, Fenghua; +Cc: mingo, hpa, linux-kernel, tglx, hpa, linux-tip-commits

On Mon, Dec 3, 2012 at 4:18 PM, Yu, Fenghua <fenghua.yu@intel.com> wrote:
>> From: yhlu.kernel@gmail.com [mailto:yhlu.kernel@gmail.com] On Behalf Of
>> Yinghai Lu
>>
>> may need to copy the ucode.bin to BRK at first. that will make code
>> much simple, and later does not need to
>> copy them back in free_bootmem_initrd.
>>
>> aka, this patchset is not ready for 3.8 even.
>>
>
> I will relocate saved microcode blob (mc_saved_in_initrd) after initrd is
> relocated in updated patches. Thus, mc_saved_in_initrd always point to
> right initrd during boot time.

No, you should not copy it several times.

Just pre-allocate some kbytes in BRK and copy it there once.

Yinghai

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11  2:39         ` Yinghai Lu
@ 2012-12-11  3:41           ` H. Peter Anvin
  2012-12-11  3:55             ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-11  3:41 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Yu, Fenghua, mingo, linux-kernel, tglx, hpa, linux-tip-commits

On 12/10/2012 06:39 PM, Yinghai Lu wrote:
> 
> No, you should not copy that several times.
> 
> just pre-allocate some kbytes in BRK, and copy to there one time.
> 

He doesn't copy it several times.  He just saves an offset into the
initrd blob.

	-hpa



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11  3:41           ` H. Peter Anvin
@ 2012-12-11  3:55             ` Yinghai Lu
  2012-12-11  6:34               ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-11  3:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yu, Fenghua, mingo, linux-kernel, tglx, hpa, linux-tip-commits

On Mon, Dec 10, 2012 at 7:41 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/10/2012 06:39 PM, Yinghai Lu wrote:
>>
>> No, you should not copy that several times.
>>
>> just pre-allocate some kbytes in BRK, and copy to there one time.
>>
>
> He doesn't copy it several times.  He just saves an offset into the
> initrd blob.

The ucode is packed together with the initrd blob. The code scans that
blob and saves a pointer to the ucode; the BSP uses it, and then the
APs use it. After that, when the initrd gets freed, the ucode is copied
to somewhere else...

His patch also missed that the initrd can be relocated on both 64-bit
and 32-bit, so the APs would not find the saved ucode.

After I pointed that out, he said he will update the pointer for the
APs when the initrd is relocated.

My suggestion is: after scanning and finding the ucode, save it to BRK,
so there is no need to adjust the pointer again, and no need to copy
the blob and update the pointer again.
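
A minimal userspace sketch of the "copy to BRK once" idea, under stated
assumptions: the static array below merely stands in for a region
reserved in BRK, and the names ucode_brk, UCODE_BRK_SIZE and
save_ucode_once() are hypothetical, not taken from the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a few KB reserved in BRK at build time. */
#define UCODE_BRK_SIZE 4096
static unsigned char ucode_brk[UCODE_BRK_SIZE];
static size_t ucode_brk_len;	/* 0 until a blob has been saved */

/*
 * Copy the microcode found at initrd + offset into the reserved region.
 * Only the first successful call copies; later calls return the same
 * pointer. Returns NULL if the blob is empty or does not fit.
 */
static const unsigned char *save_ucode_once(const unsigned char *initrd,
					    size_t offset, size_t len)
{
	if (len == 0 || len > UCODE_BRK_SIZE)
		return NULL;
	if (ucode_brk_len == 0) {
		memcpy(ucode_brk, initrd + offset, len);
		ucode_brk_len = len;
	}
	return ucode_brk;
}
```

Because the returned pointer refers to the reserved region rather than
to the initrd, it stays valid when the initrd is later relocated or
freed. As noted elsewhere in the thread, a fixed region like this does
not by itself cover microcode that is replaced at runtime.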

Yinghai


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11  3:55             ` Yinghai Lu
@ 2012-12-11  6:34               ` H. Peter Anvin
  2012-12-11  7:07                 ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-11  6:34 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Yu, Fenghua, mingo, linux-kernel, tglx, hpa, linux-tip-commits

On 12/10/2012 07:55 PM, Yinghai Lu wrote:
>
> My suggestion is: after scanning and finding the ucode, save it to BRK,
> so there is no need to adjust the pointer again, and no need to copy
> the blob and update the pointer again.
>

That doesn't work if the microcode is replaced at runtime.  However, 
vmalloc doesn't work either since 32 bits needs any one blob to be 
physically contiguous.  I have suggested Fenghua replace it with a 
linked list of kmalloc areas, one for each blob.
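
A rough userspace sketch of such a list, assuming malloc() as a
stand-in for kmalloc(); the struct layout, the rev field and the
function names are hypothetical. Whether the update path would search
the list for the newest revision as shown here is not stated in the
thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One node per saved microcode blob; each blob has its own allocation. */
struct ucode_node {
	struct ucode_node *next;
	unsigned int rev;	/* hypothetical revision number */
	size_t len;
	unsigned char *data;	/* one contiguous buffer per blob */
};

/* Prepend a copy of blob to the list. Returns 0 on success, -1 on error. */
static int ucode_list_add(struct ucode_node **head, unsigned int rev,
			  const void *blob, size_t len)
{
	struct ucode_node *n;

	if (len == 0)
		return -1;
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->data = malloc(len);
	if (!n->data) {
		free(n);
		return -1;
	}
	memcpy(n->data, blob, len);
	n->rev = rev;
	n->len = len;
	n->next = *head;
	*head = n;
	return 0;
}

/* Walk the list and return the node with the highest revision, or NULL. */
static const struct ucode_node *ucode_list_latest(const struct ucode_node *head)
{
	const struct ucode_node *best = NULL;

	for (; head; head = head->next)
		if (!best || head->rev > best->rev)
			best = head;
	return best;
}

/* Release every node together with its blob. */
static void ucode_list_free(struct ucode_node *head)
{
	while (head) {
		struct ucode_node *next = head->next;

		free(head->data);
		free(head);
		head = next;
	}
}
```

Each blob lives in its own allocation, so a runtime update only adds a
node and never has to grow or move an existing buffer.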

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11  6:34               ` H. Peter Anvin
@ 2012-12-11  7:07                 ` Yinghai Lu
  2012-12-11 14:57                   ` Borislav Petkov
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-11  7:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yu, Fenghua, mingo, linux-kernel, tglx, hpa, linux-tip-commits

On Mon, Dec 10, 2012 at 10:34 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> That doesn't work if the microcode is replaced at runtime.  However, vmalloc
> doesn't work either since 32 bits needs any one blob to be physically
> contiguous.  I have suggested Fenghua replace it with a linked list of
> kmalloc areas, one for each blob.

You mean: keep all of the versions, and the update code needs to walk
the list to find the latest one before applying the update?

BTW, do we really need to update microcode so early?

Yinghai


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11  7:07                 ` Yinghai Lu
@ 2012-12-11 14:57                   ` Borislav Petkov
  2012-12-11 16:46                     ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-11 14:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits

On Mon, Dec 10, 2012 at 11:07:38PM -0800, Yinghai Lu wrote:
> BTW, do we really need to update microcode so early?

Yes we do. Normally ucode gets applied by the BIOS - this early approach
is for those cases where OEMs don't release new BIOS anymore but we
still need to apply a ucode patch as early as possible.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 14:57                   ` Borislav Petkov
@ 2012-12-11 16:46                     ` Yinghai Lu
  2012-12-11 16:48                       ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-11 16:46 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, hpa, linux-tip-commits

On Tue, Dec 11, 2012 at 6:57 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Dec 10, 2012 at 11:07:38PM -0800, Yinghai Lu wrote:
>> BTW, do we really need to update microcode so early?
>
> Yes we do. Normally ucode gets applied by the BIOS - this early approach
> is for those cases where OEMs don't release new BIOS anymore but we
> still need to apply a ucode patch as early as possible.

But an old BIOS with old ucode still works with previous kernels, right?

Is there a case where the system would not work with the kernel if the
ucode is not updated this early?

Yinghai


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 16:46                     ` Yinghai Lu
@ 2012-12-11 16:48                       ` H. Peter Anvin
  2012-12-11 17:00                         ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-11 16:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits

On 12/11/2012 08:46 AM, Yinghai Lu wrote:
> On Tue, Dec 11, 2012 at 6:57 AM, Borislav Petkov <bp@alien8.de> wrote:
>> On Mon, Dec 10, 2012 at 11:07:38PM -0800, Yinghai Lu wrote:
>>> BTW, do we really need to update microcode so early?
>>
>> Yes we do. Normally ucode gets applied by the BIOS - this early approach
>> is for those cases where OEMs don't release new BIOS anymore but we
>> still need to apply a ucode patch as early as possible.
> 
> But an old BIOS with old ucode still works with previous kernels, right?
> 
> Is there a case where the system would not work with the kernel if the
> ucode is not updated this early?
> 

It's not about "not working"... it is about "if the microcode isn't
loaded early we have to disable features."

	-hpa




* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 16:48                       ` H. Peter Anvin
@ 2012-12-11 17:00                         ` Yinghai Lu
  2012-12-11 17:06                           ` Borislav Petkov
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-11 17:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits

On Tue, Dec 11, 2012 at 8:48 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> It's not about "not working"... it is about "if the microcode isn't
> loaded early we have to disable features."

ok, then next question is how early it should be.

before early_cpu_init/early_identify_cpu

or just before check_bugs/identify_cpu

Yinghai


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 17:00                         ` Yinghai Lu
@ 2012-12-11 17:06                           ` Borislav Petkov
  2012-12-11 17:15                             ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-11 17:06 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits

On Tue, Dec 11, 2012 at 09:00:55AM -0800, Yinghai Lu wrote:
> ok, then next question is how early it should be.
> 
> before early_cpu_init/early_identify_cpu
> 
> or just before check_bugs/identify_cpu

Read the code. It's in x86_64_start_kernel on 64-bit.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 17:06                           ` Borislav Petkov
@ 2012-12-11 17:15                             ` Yinghai Lu
  2012-12-11 17:26                               ` Yu, Fenghua
                                                 ` (2 more replies)
  0 siblings, 3 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-11 17:15 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, hpa, linux-tip-commits

On Tue, Dec 11, 2012 at 9:06 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Dec 11, 2012 at 09:00:55AM -0800, Yinghai Lu wrote:
>> ok, then next question is how early it should be.
>>
>> before early_cpu_init/early_identify_cpu
>>
>> or just before check_bugs/identify_cpu
>
> Read the code. It's in x86_64_start_kernel on 64-bit.
>

No, that is not right place. initrd could be loaded anywhere like way
high by bootloader.

to make code simple, we should have following sequence in setup_arch

early_ioremap_init()
early_update_microcode()...
early_cpu_init()

early_update_microcode could use early_ioremap to access initrd ramdisk area.

Yinghai


* RE: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 17:15                             ` Yinghai Lu
@ 2012-12-11 17:26                               ` Yu, Fenghua
  2012-12-11 17:38                               ` H. Peter Anvin
  2012-12-11 18:02                               ` H. Peter Anvin
  2 siblings, 0 replies; 127+ messages in thread
From: Yu, Fenghua @ 2012-12-11 17:26 UTC (permalink / raw)
  To: Yinghai Lu, Borislav Petkov, H. Peter Anvin, mingo, linux-kernel,
	tglx, hpa, linux-tip-commits

> -----Original Message-----
> From: yhlu.kernel@gmail.com [mailto:yhlu.kernel@gmail.com] On Behalf Of
> Yinghai Lu
> Sent: Tuesday, December 11, 2012 9:16 AM
> To: Borislav Petkov; Yinghai Lu; H. Peter Anvin; Yu, Fenghua;
> mingo@kernel.org; linux-kernel@vger.kernel.org; tglx@linutronix.de;
> hpa@linux.intel.com; linux-tip-commits@vger.kernel.org
> Subject: Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early
> update ucode on Intel's CPU
> 
> On Tue, Dec 11, 2012 at 9:06 AM, Borislav Petkov <bp@alien8.de> wrote:
> > On Tue, Dec 11, 2012 at 09:00:55AM -0800, Yinghai Lu wrote:
> >> ok, then next question is how early it should be.
> >>
> >> before early_cpu_init/early_identify_cpu
> >>
> >> or just before check_bugs/identify_cpu
> >
> > Read the code. It's in x86_64_start_kernel on 64-bit.
> >
> 
> No, that is not right place. initrd could be loaded anywhere like way
> high by bootloader.
> 
> to make code simple, we should have following sequence in setup_arch
> 
> early_ioremap_init()
> early_update_microcode()...
> early_cpu_init()
> 
> early_update_microcode could use early_ioremap to access initrd ramdisk
> area.

The problem comes mainly from 32-bit. We need to load microcode before
paging is enabled (I'm changing load_ucode_intel_ap to be called before
paging as well), which makes the code less simple. It's better to load
ucode as early as possible on 64-bit as well; I don't want to have two
totally different sets of early ucode loading code for 64-bit and 32-bit.

Thanks.

-Fenghua


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 17:15                             ` Yinghai Lu
  2012-12-11 17:26                               ` Yu, Fenghua
@ 2012-12-11 17:38                               ` H. Peter Anvin
  2012-12-11 23:53                                 ` Yinghai Lu
  2012-12-11 18:02                               ` H. Peter Anvin
  2 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-11 17:38 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/11/2012 09:15 AM, Yinghai Lu wrote:
>
> No, that is not right place. initrd could be loaded anywhere like way
> high by bootloader.
>

Only *after* your changes... the current protocol doesn't allow that.

Anyway, we need to deal with it, see below.

> to make code simple, we should have following sequence in setup_arch
>
> early_ioremap_init()
> early_update_microcode()...
> early_cpu_init()
>
> early_update_microcode could use early_ioremap to access initrd ramdisk area.

We need there to be as little machinery as possible, *especially* 
machinery involving paging, before the microcode gets loaded.  I would 
prefer if we could load the microcode before enabling paging at all, but 
that would mean running it in 32-bit mode which really isn't practical, 
so we have to deal with scaffolding here.

We really should cycle the paging off after this happens, but that 
really isn't possible until the trampoline gets set up later, since we 
may not have any other place to stand in low memory.

This complexity is the cost of allowing the kernel to load above the 4G 
mark.

Now, to be solution-focused...

I quite frankly don't see anything in early_ioremap_init() which can't be 
done at compile time.  All it does is set up some data structures which 
could just as well be statically generated, and then it would be 
available from the very beginning and thus usable for this purpose. Xen 
might need extra song & dance, but that can be done in Xen-specific code.

The Xen-induced (and since then expanded by others) 32-bit braindamage 
of having __FIXADDR_TOP be a variable is a huge problem, obviously, but 
it doesn't affect 64 bits which is what we're dealing with here.  There 
*is* a way to fix it on 32 bits without breaking Xen, which is to move 
the fixmap to the beginning of kernel virtual space instead of the end 
-- this boundary is known at compile time.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 17:15                             ` Yinghai Lu
  2012-12-11 17:26                               ` Yu, Fenghua
  2012-12-11 17:38                               ` H. Peter Anvin
@ 2012-12-11 18:02                               ` H. Peter Anvin
  2012-12-11 18:20                                 ` H. Peter Anvin
  2 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-11 18:02 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits

On 12/11/2012 09:15 AM, Yinghai Lu wrote:
> On Tue, Dec 11, 2012 at 9:06 AM, Borislav Petkov <bp@alien8.de> wrote:
>> On Tue, Dec 11, 2012 at 09:00:55AM -0800, Yinghai Lu wrote:
>>> ok, then next question is how early it should be.
>>>
>>> before early_cpu_init/early_identify_cpu
>>>
>>> or just before check_bugs/identify_cpu
>>
>> Read the code. It's in x86_64_start_kernel on 64-bit.
>>
>
> No, that is not right place. initrd could be loaded anywhere like way
> high by bootloader.
>

The more I think about it, the more I think the right answer is the one 
we have pretty much stated all along: if using the 64-bit entry point it is 
the responsibility of the boot loader to make sure the kernel, the setup 
data, and the initramfs are all mapped on entry.

	-hpa
-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 18:02                               ` H. Peter Anvin
@ 2012-12-11 18:20                                 ` H. Peter Anvin
  2012-12-11 18:42                                   ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-11 18:20 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits

On 12/11/2012 10:02 AM, H. Peter Anvin wrote:
>
> The more I think about it, the more I think the right answer is the one
> we have pretty much stated all along: if using the 64-bit entry point it is
> the responsibility of the boot loader to make sure the kernel, the setup
> data, and the initramfs are all mapped on entry.
>

This, in turn, brings up another major problem with the 64-bit entry 
point: right now it assumes page tables set up the way the current 
kernels expect them, but the way the kernel expects pages to be laid out 
in the future is almost guaranteed to change.  We need to formalize the 
expectations of the page table layout at 64-bit entry, and check to see 
what the implications of that are.

In particular, we may need to build a set of scaffolding page tables in 
the brk when entered from the 64-bit entry point and switch to them a 
lot sooner.  We wouldn't have to do that when coming from the 16/32-bit 
entry points since we'd control the layout of those page tables.

Oh how fun...

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 18:20                                 ` H. Peter Anvin
@ 2012-12-11 18:42                                   ` Yinghai Lu
  2012-12-11 18:46                                     ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-11 18:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits

On Tue, Dec 11, 2012 at 10:20 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/11/2012 10:02 AM, H. Peter Anvin wrote:
>>
>>
>> The more I think about it, the more I think the right answer is the one
>> we have pretty much stated all along: if using the 64-bit entry point it is
>> the responsibility of the boot loader to make sure the kernel, the setup
>> data, and the initramfs are all mapped on entry.
>>

arch/x86/kernel/head_64.S::startup_64

is setting up its own page table

>
> This, in turn, brings up another major problem with the 64-bit entry point:
> right now it assumes page tables set up the way the current kernels expect
> them, but the way the kernel expects pages to be laid out in the future is
> almost guaranteed to change.  We need to formalize the expectations of the
> page table layout at 64-bit entry, and check to see what the implications of
> that are.
>
> In particular, we may need to build a set of scaffolding page tables in the
> brk when entered from the 64-bit entry point and switch to them a lot
> sooner.  We wouldn't have to do that when coming from the 16/32-bit entry
> points since we'd control the layout of those page tables.
>
> Oh how fun...

now in for-x86-boot:
http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commit;h=8e4e093e6d140f1316953437fdde4e826f5cfd98

it adds extra mapping for the whole kernel when kernel is loaded above 1G.
from round_down(_text, 2M) to round_up(_end -1, 2M).
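
A small worked sketch of that range arithmetic, assuming 2M pages and
power-of-two rounding; the helper names and the example addresses in
the note below are made up for illustration and are not taken from the
referenced commit.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

struct map_range {
	uint64_t start;	/* first byte covered */
	uint64_t end;	/* one past the last byte covered */
};

/* Range from round_down(text, 2M) to round_up(end - 1, 2M). */
static struct map_range kernel_map_range(uint64_t text, uint64_t end)
{
	struct map_range r;

	assert(end > text);
	r.start = round_down_pow2(text, SZ_2M);
	r.end = round_up_pow2(end - 1, SZ_2M);
	return r;
}
```

For a hypothetical kernel image with _text at 0x41234000 and _end at
0x42567000, this yields 0x41200000 to 0x42600000, that is ten 2M pages.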

Do you mean we need to add extra mapping for realmode_data, cmdline,
ramdisk too? (not include setup_data, and it is accessed via
early_ioremap later).

but if the user memmap to exclude some page, we will still need to
relocate the ramdisk.

Yinghai


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 18:42                                   ` Yinghai Lu
@ 2012-12-11 18:46                                     ` H. Peter Anvin
  2012-12-11 19:18                                       ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-11 18:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits

On 12/11/2012 10:42 AM, Yinghai Lu wrote:
> 
> now in for-x86-boot:
> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commit;h=8e4e093e6d140f1316953437fdde4e826f5cfd98
> 
> it adds extra mapping for the whole kernel when kernel is loaded above 1G.
> from round_down(_text, 2M) to round_up(_end -1, 2M).
> 
> Do you mean we need to add extra mapping for realmode_data, cmdline,
> ramdisk too? (not include setup_data, and it is accessed via
> early_ioremap later).

Yes, but...

> but if the user memmap to exclude some page, we will still need to
> relocate the ramdisk.

-ENOPARSE

I really need to look at this in more detail.  I'm starting to think
this is done completely backwards.

	-hpa




* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 18:46                                     ` H. Peter Anvin
@ 2012-12-11 19:18                                       ` Yinghai Lu
  2012-12-11 19:33                                         ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-11 19:18 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits

On Tue, Dec 11, 2012 at 10:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/11/2012 10:42 AM, Yinghai Lu wrote:
>>
>> now in for-x86-boot:
>> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commit;h=8e4e093e6d140f1316953437fdde4e826f5cfd98
>>
>> it adds extra mapping for the whole kernel when kernel is loaded above 1G.
>> from round_down(_text, 2M) to round_up(_end -1, 2M).
>>
>> Do you mean we need to add extra mapping for realmode_data, cmdline,
>> ramdisk too? (not include setup_data, and it is accessed via
>> early_ioremap later).
>
> Yes, but...

that will be bunch of asm code again, and need to parse the setup_header in that
asm to get range value for those regions...

>
>> but if the user memmap to exclude some page, we will still need to
>> relocate the ramdisk.
>
> -ENOPARSE

.. I mean the pointer for the saved ucode will then have to be updated again,
and relocated_initrd will still need to use ioremap because init_memory_mapping
will clear the mapping for the range that is excluded by memmap=XX$YY

>
> I really need to look at this in more detail.  I'm starting to think
> this is done completely backwards.

We really should not add asm code if it can be done in C without much complication.


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 19:18                                       ` Yinghai Lu
@ 2012-12-11 19:33                                         ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-11 19:33 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits

On 12/11/2012 11:18 AM, Yinghai Lu wrote:
> 
> that will be bunch of asm code again, and need to parse the setup_header in that
> asm to get range value for those regions...
> 

It's an index into an array, it's not "parsing".

>>
>>> but if the user memmap to exclude some page, we will still need to
>>> relocate the ramdisk.
>>
>> -ENOPARSE
> 
> .. I mean the pointer for the saved ucode will then have to be updated again,
> and relocated_initrd will still need to use ioremap because init_memory_mapping
> will clear the mapping for the range that is excluded by memmap=XX$YY
> 
>>
>> I really need to look at this in more detail.  I'm starting to think
>> this is done completely backwards.
> 
> We really should not add asm code if it can be done in C without much complication.
> 

Uhm... that's not what I'm talking about.

	-hpa



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 17:38                               ` H. Peter Anvin
@ 2012-12-11 23:53                                 ` Yinghai Lu
  2012-12-11 23:57                                   ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-11 23:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, Dec 11, 2012 at 9:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/11/2012 09:15 AM, Yinghai Lu wrote:
>>
>>
>> No, that is not right place. initrd could be loaded anywhere like way
>> high by bootloader.
>>
>
> Only *after* your changes... the current protocol doesn't allow that.

before for-x86-boot change,

bootloader put ramdisk just under 2g...

[    0.000000] memblock_reserve: [0x7d9dc000-0x7fffefff] RAMDISK

and arch/x86/kernel/head_64.S only set ident mapping for [0,1g)

so before init_memory_mapping, we need to use early_ioremap to access
ramdisk area.

also even after init_memory_mapping, we still need use early_ioremap to access
it because user could use memmap to skip it.

PS: this problem has nothing to do with the mapping that is set up by the
bootloader. arch/x86/kernel/head_64.S will have its own mapping, which only
covers [0,1g).
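
To make the point concrete, here is a tiny check written as a sketch.
The only inputs taken from this mail are the [0,1g) identity-mapped
window and the RAMDISK range from the memblock_reserve line above; the
function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* End of the early identity mapping described above: [0, 1G). */
#define IDENT_MAP_END (UINT64_C(1) << 30)

/*
 * True if the inclusive physical range [first, last] lies entirely
 * inside the identity-mapped window, so it can be accessed directly.
 */
static bool covered_by_ident_map(uint64_t first, uint64_t last)
{
	assert(first <= last);
	return last < IDENT_MAP_END;
}
```

With the ramdisk at [0x7d9dc000-0x7fffefff] the check fails, since the
whole range sits between 1G and 2G, which is why some other mapping
such as early_ioremap is needed to reach it.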

Yinghai

>
> Anyway, we need to deal with it, see below.


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 23:53                                 ` Yinghai Lu
@ 2012-12-11 23:57                                   ` H. Peter Anvin
  2012-12-12  0:27                                     ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-11 23:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/11/2012 03:53 PM, Yinghai Lu wrote:
> On Tue, Dec 11, 2012 at 9:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 12/11/2012 09:15 AM, Yinghai Lu wrote:
>>>
>>>
>>> No, that is not right place. initrd could be loaded anywhere like way
>>> high by bootloader.
>>>
>>
>> Only *after* your changes... the current protocol doesn't allow that.
> 
> before for-x86-boot change,
> 
> bootloader put ramdisk just under 2g...
> 
> [    0.000000] memblock_reserve: [0x7d9dc000-0x7fffefff] RAMDISK
> 
> and arch/x86/kernel/head_64.S only set ident mapping for [0,1g)
> 
> so before init_memory_mapping, we need to use early_ioremap to access
> ramdisk area.
> 
> also even after init_memory_mapping, we still need use early_ioremap to access
> it because user could use memmap to skip it.
> 
> PS: this problem has nothing to do with the mapping that is set up by the
> bootloader. arch/x86/kernel/head_64.S will have its own mapping, which only
> covers [0,1g).
> 

Well, we could invoke it on the bootloader page tables, but as you say
it may not be a good idea... depending on how much memory we may be
talking about.  One solution -- which I have to admit is starting to
sound really good -- is to set up a #PF handler which cycles through a
set of page tables and creates a "virtual identity map"... it does have
the advantage of making the entire physical address space available
without any additional funnies.
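
A userspace simulation of that idea, offered only as a sketch: a small
fixed pool of slots stands in for the set of page tables, each slot
maps one 2M-aligned region, and a simulated fault recycles the slots
round-robin. The pool size, the struct and all function names are
assumptions; no real page tables or exception handling are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOT_SHIFT 21	/* each slot maps one 2M region */
#define NR_SLOTS   4	/* hypothetical size of the page-table pool */

struct early_map {
	uint64_t base[NR_SLOTS];	/* 2M-aligned base mapped by each slot */
	bool used[NR_SLOTS];
	unsigned int next;		/* next slot to recycle */
	unsigned int faults;		/* number of simulated #PFs */
};

/* True if some slot currently maps the region containing addr. */
static bool early_map_lookup(const struct early_map *m, uint64_t addr)
{
	uint64_t base = addr >> SLOT_SHIFT << SLOT_SHIFT;
	unsigned int i;

	for (i = 0; i < NR_SLOTS; i++)
		if (m->used[i] && m->base[i] == base)
			return true;
	return false;
}

/* Simulated fault handler: map addr's region by recycling the next slot. */
static void early_map_fault(struct early_map *m, uint64_t addr)
{
	unsigned int i = m->next;

	m->base[i] = addr >> SLOT_SHIFT << SLOT_SHIFT;
	m->used[i] = true;
	m->next = (i + 1) % NR_SLOTS;
	m->faults++;
}

/* Simulated access: fault the mapping in if it is missing. */
static void early_map_touch(struct early_map *m, uint64_t addr)
{
	if (!early_map_lookup(m, addr))
		early_map_fault(m, addr);
}
```

Any physical address can be reached this way with a bounded number of
tables; the cost is that a region which has been recycled faults again
on its next access.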

	-hpa



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-11 23:57                                   ` H. Peter Anvin
@ 2012-12-12  0:27                                     ` Yinghai Lu
  2012-12-12  0:37                                       ` H. Peter Anvin
  2012-12-12  6:57                                       ` H. Peter Anvin
  0 siblings, 2 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-12  0:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 814 bytes --]

On Tue, Dec 11, 2012 at 3:57 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> Well, we could invoke it on the bootloader page tables, but as you say
> it may not be a good idea... depending on how much memory we may be
> talking about.  One solution -- which I have to admit is starting to
> sound really good -- is to set up a #PF handler which cycles through a
> set of page tables and creates a "virtual identity map"... it does have
> the advantage of making the entire physical address space available
> without any additional funnies.

So that #PF handler would have to work before
arch/x86/kernel/setup.c::setup_arch/early_trap_init.

early_trap_init will install another handler for #PF there.

For 64-bit, moving early_ioremap_init ahead is very simple; see the
attached patch.

But for 32-bit it looks like it is not that easy.

[-- Attachment #2: early_ioremap_head64.patch --]
[-- Type: application/octet-stream, Size: 2723 bytes --]

Subject: [PATCH] x86: use io_remap to access real_mode_data

When a 64-bit bootloader puts the real mode data above 4G, we cannot
access the real mode data directly yet, because
arch/x86/kernel/head_64.S only sets up the ident mapping for 0-1G and
for kernel code/data/bss.

Move the early_ioremap_init() call as early as possible, into
x86_64_start_kernel.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/head64.c  |   26 +++++++++++++++++++++++---
 arch/x86/kernel/head_64.S |    4 ++--
 arch/x86/kernel/setup.c   |    2 ++
 3 files changed, 27 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head64.c
+++ linux-2.6/arch/x86/kernel/head64.c
@@ -62,12 +62,21 @@ static void __init copy_bootdata(char *r
 {
 	char * command_line;
 	unsigned long cmd_line_ptr;
+	char *p;
 
-	memcpy(&boot_params, real_mode_data, sizeof boot_params);
+	/*
+	 * for 64bit bootloader path, those data could be above 4G,
+	 * and we do not set ident mapping for them in head_64.S.
+	 * So need to use ioremap to access them.
+	 */
+	p = early_memremap((unsigned long)real_mode_data, sizeof(boot_params));
+	memcpy(&boot_params, p, sizeof(boot_params));
+	early_iounmap(p, sizeof(boot_params));
 	cmd_line_ptr = get_cmd_line_ptr();
 	if (cmd_line_ptr) {
-		command_line = __va(cmd_line_ptr);
+		command_line = early_memremap(cmd_line_ptr, COMMAND_LINE_SIZE);
 		memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
+		early_iounmap(command_line, COMMAND_LINE_SIZE);
 	}
 }
 
@@ -92,6 +101,10 @@ void __init x86_64_start_kernel(char * r
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
+	/* boot_params is in bss */
+	early_ioremap_init();
+	copy_bootdata(real_mode_data);
+
 	/* Make NULL pointers segfault */
 	zap_identity_mappings();
 
@@ -114,7 +127,14 @@ void __init x86_64_start_kernel(char * r
 
 void __init x86_64_start_reservations(char *real_mode_data)
 {
-	copy_bootdata(__va(real_mode_data));
+	/*
+	 * hdr.version is always not 0, so check it to see
+	 *  if boot_params is copied or not.
+	 */
+	if (!boot_params.hdr.version) {
+		early_ioremap_init();
+		copy_bootdata(real_mode_data);
+	}
 
 	memblock_reserve(__pa_symbol(_text),
 			 (unsigned long)__bss_stop - (unsigned long)_text);
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -714,7 +714,9 @@ void __init setup_arch(char **cmdline_p)
 
 	early_trap_init();
 	early_cpu_init();
+#ifdef CONFIG_X86_32
 	early_ioremap_init();
+#endif
 
 	setup_olpc_ofw_pgd();
 


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-12  0:27                                     ` Yinghai Lu
@ 2012-12-12  0:37                                       ` H. Peter Anvin
  2012-12-12  7:14                                         ` Yinghai Lu
  2012-12-12  6:57                                       ` H. Peter Anvin
  1 sibling, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-12  0:37 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/11/2012 04:27 PM, Yinghai Lu wrote:
> On Tue, Dec 11, 2012 at 3:57 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> Well, we could invoke it on the bootloader page tables, but as you say
>> it may not be a good idea... depending on how much memory we may be
>> talking about.  One solution -- which I have to admit is starting to
>> sound really good -- is to set up a #PF handler which cycles through a
>> set of page tables and creates a "virtual identity map"... it does have
>> the advantage of making the entire physical address space available
>> without any additional funnies.
> 
> so that #PF handler will work before
> arch/x86/kernel/setup.c::setup_arch/early_trap_init
> 
> early_trap_init will install another handler there for #PF
> 
> for 64bit, moving early_ioremap_init ahead is very simple, like the attached patch
> 
> but for 32 bit looks like it is not that easy.
> 

For 32 bits, we don't need it, because we can just run this part in
linear mode.  It also doesn't help us on 32 bits since we are limited by
virtual address space anyway.

	-hpa



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-12  0:27                                     ` Yinghai Lu
  2012-12-12  0:37                                       ` H. Peter Anvin
@ 2012-12-12  6:57                                       ` H. Peter Anvin
  2012-12-12 13:38                                         ` Borislav Petkov
  1 sibling, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-12  6:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 1970 bytes --]

On 12/11/2012 04:27 PM, Yinghai Lu wrote:
> On Tue, Dec 11, 2012 at 3:57 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> Well, we could invoke it on the bootloader page tables, but as you say
>> it may not be a good idea... depending on how much memory we may be
>> talking about.  One solution -- which I have to admit is starting to
>> sound really good -- is to set up a #PF handler which cycles through a
>> set of page tables and creates a "virtual identity map"... it does have
>> the advantage of making the entire physical address space available
>> without any additional funnies.
>
> so that #PF handler will work before
> arch/x86/kernel/setup.c::setup_arch/early_trap_init
>
> early_trap_init will install another handler there for #PF
>
> for 64bit, moving early_ioremap_init ahead is very simple, like the attached patch
>
> but for 32 bit looks like it is not that easy.
>

Here is an incomplete patch, for illustration purposes only, showing what 
I mean by an early-mapping #PF handler.  This creates a set of transient 
page tables on demand, which allows us to access memory as if it were all 
mapped, but using only O(1) storage.  The replacement policy is trivial: 
if we run out, we start over from scratch.
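
The replacement policy described above can be sketched in user-space C. This is a minimal model, not the kernel code: the names `early_map`, `early_map_fault`, `POOL_TABLES` and `SLOTS` are hypothetical, and the "tables" are just pool indices. It only illustrates the idea of a fixed pool that is handed out on each fault and wiped completely once it is exhausted.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_TABLES 4	/* hypothetical pool size; the patch uses 64 */
#define SLOTS       8	/* hypothetical number of top-level slots */

struct early_map {
	int slot_table[SLOTS];	/* pool index backing each slot, -1 if unmapped */
	unsigned int next_free;	/* next unused table in the pool */
	unsigned int resets;	/* number of times the pool was wiped */
};

static void early_map_init(struct early_map *m)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < SLOTS; i++)
		m->slot_table[i] = -1;
	m->next_free = 0;
	m->resets = 0;
}

/* Trivial replacement policy: drop every mapping and start over. */
static void early_map_reset(struct early_map *m)
{
	size_t i;

	for (i = 0; i < SLOTS; i++)
		m->slot_table[i] = -1;
	m->next_free = 0;
	m->resets++;
}

/*
 * Simulated fault handler: make sure `slot` is backed by a table.
 * Returns the pool index now backing the slot, or -1 for a bad slot.
 */
static int early_map_fault(struct early_map *m, unsigned int slot)
{
	if (slot >= SLOTS)
		return -1;		/* invalid address */
	if (m->slot_table[slot] >= 0)
		return m->slot_table[slot];	/* already mapped */
	if (m->next_free >= POOL_TABLES)
		early_map_reset(m);	/* out of tables: start from scratch */
	m->slot_table[slot] = (int)m->next_free++;
	return m->slot_table[slot];
}
```

Storage stays constant no matter how much address space is touched; the cost is that a reset throws away mappings that may fault again later.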

The "identity page tables" used during the transition to high virtual 
addresses are kind of magic; a bunch of extra aliases are created, 
but the way it is done guarantees that the range we actually care about 
is mapped correctly.  The aliases don't matter and get scrubbed shortly 
thereafter anyway.
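
As background for the aliasing remark, here is a small sketch of how an x86-64 address is split into table indices under 4-level paging with 2 MiB pages. The shift values are the standard ones; `split_address`, `struct pt_index` and `PTRS_PER_TBL` are hypothetical names used only for this illustration. When more than one upper-level entry points at the same lower-level table, several address ranges resolve through the same entries, which is what an alias is.

```c
#include <assert.h>
#include <stdint.h>

/* 4-level paging: every table has 512 entries, selected by 9 address bits. */
#define PGDIR_SHIFT	39
#define PUD_SHIFT	30
#define PMD_SHIFT	21
#define PTRS_PER_TBL	512u	/* hypothetical shorthand for PTRS_PER_PGD/PUD/PMD */

struct pt_index {
	unsigned int pgd;	/* index into the top-level table */
	unsigned int pud;	/* index into the 1 GiB-granular table */
	unsigned int pmd;	/* index into the 2 MiB-granular table */
};

static struct pt_index split_address(uint64_t addr)
{
	struct pt_index ix;

	ix.pgd = (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_TBL - 1));
	ix.pud = (unsigned int)((addr >> PUD_SHIFT) & (PTRS_PER_TBL - 1));
	ix.pmd = (unsigned int)((addr >> PMD_SHIFT) & (PTRS_PER_TBL - 1));
	assert(ix.pgd < PTRS_PER_TBL && ix.pud < PTRS_PER_TBL);
	return ix;
}
```

For the kernel mapping base 0xffffffff80000000 this yields PGD index 511 and PUD index 510, matching the arithmetic in the head_64.S comments of the patch.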

This should, obviously, be used on native only -- in particular Xen 
should instead rely on the initial page tables provided by the domain 
builder, which should map all physical memory.

Once the proper memory-map-aware page tables are built, we should turn 
this off by swapping to the newly built real init_level4_pgt instead.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 12706 bytes --]

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..2d88344 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include <asm/sparsemem.h>
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES	64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..9443c77 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,11 +26,73 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2, early_pgt_resets = 0;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-	pgd_t *pgd = pgd_offset_k(0UL);
-	pgd_clear(pgd);
-	__flush_tlb_all();
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PGD-1; i++)
+		early_level4_pgt[i].pgd = 0;
+
+	next_early_pgt = 0;
+	early_pgt_resets++;
+
+	__native_flush_tlb();
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	unsigned long i;
+	pgdval_t pgd, *pgd_p;
+	pudval_t *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	if (physaddr >= MAXMEM)
+		return -1;	/* Invalid address - puke */
+
+	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+	pgd_p = &early_level4_pgt[i].pgd;
+	pgd = *pgd_p;
+
+	/*
+	 * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+	 * critical -- __PAGE_OFFSET would point us back into the dynamic
+	 * range and we might end up looping forever...
+	 */
+	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map);
+	} else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+			reset_early_page_tables();
+
+		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			pud_p[i] = 0;
+
+		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + _KERNPG_TABLE;
+	}
+	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+	pud_p += i;
+
+	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_p[i] = pmd;
+		pmd += PMD_SIZE;
+	}
+
+	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + _KERNPG_TABLE;
+
+	return 0;
 }
 
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
@@ -70,12 +132,13 @@ void __init x86_64_start_kernel(char * real_mode_data)
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+	/* Kill off the identity-map trampoline */
+	reset_early_page_tables();
+
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* Make NULL pointers segfault */
-	zap_identity_mappings();
-
+	/* XXX - this is wrong... we need to build page tables from scratch */
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..e13ff91 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -47,14 +47,13 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map)
 	.code64
 	.globl startup_64
 startup_64:
-
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded an identity mapped page table
 	 * for us.  These identity mapped page tables map all of the
 	 * kernel pages and possibly all of memory.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either directly from a 64bit bootloader, or from
 	 * arch/x86_64/boot/compressed/head.S.
@@ -66,7 +65,8 @@ startup_64:
 	 * tables and then reload them.
 	 */
 
-	/* Compute the delta between the address I am compiled to run at and the
+	/*
+	 * Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
 	leaq	_text(%rip), %rbp
@@ -78,53 +78,66 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
-	/* Fixup the physical addresses in the page table
+	/*
+	 * Is the address too large?
 	 */
-	addq	%rbp, init_level4_pgt + 0(%rip)
-	addq	%rbp, init_level4_pgt + (L4_PAGE_OFFSET*8)(%rip)
-	addq	%rbp, init_level4_pgt + (L4_START_KERNEL*8)(%rip)
+	leaq	_text(%rip), %rax
+	shrq	$MAX_PHYSMEM_BITS, %rax
+	jnz	bad_address
 
-	addq	%rbp, level3_ident_pgt + 0(%rip)
+	/*
+	 * Fixup the physical addresses in the page table
+	 */
+	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
 
 	addq	%rbp, level3_kernel_pgt + (510*8)(%rip)
 	addq	%rbp, level3_kernel_pgt + (511*8)(%rip)
 
 	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-	/* Add an Identity mapping if I am above 1G */
+	/*
+	 * Set up the identity mapping for the switchover.  These
+	 * entries should *NOT* have the global bit set!  This also
+	 * creates a bunch of nonsense entries but that is fine --
+	 * it avoids problems around wraparound.
+	 */
 	leaq	_text(%rip), %rdi
-	andq	$PMD_PAGE_MASK, %rdi
+	leaq	early_level4_pgt(%rip), %rbx
 
 	movq	%rdi, %rax
-	shrq	$PUD_SHIFT, %rax
-	andq	$(PTRS_PER_PUD - 1), %rax
-	jz	ident_complete
+	shrq	$PGDIR_SHIFT, %rax
+
+	leaq	(4096 + _KERNPG_TABLE)(%rbx), %rdx
+	movq	%rdx, 0(%rbx,%rax,8)
+	movq	%rdx, 8(%rbx,%rax,8)
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
-	leaq	level3_ident_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	addq	$4096, %rdx
+	movq	%rdi, %rax
+	shrq	$PUD_SHIFT, %rax
+	andl	$(PTRS_PER_PUD-1), %eax
+	movq	%rdx, (4096+0)(%rbx,%rax,8)
+	movq	%rdx, (4096+8)(%rbx,%rax,8)
 
+	addq	$8192, %rbx
 	movq	%rdi, %rax
-	shrq	$PMD_SHIFT, %rax
-	andq	$(PTRS_PER_PMD - 1), %rax
-	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
-	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
-ident_complete:
+	shrq	$PMD_SHIFT, %rdi
+	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+	movl	$PTRS_PER_PMD, %ecx
 
+1:
+	andq	$(PTRS_PER_PMD - 1), %rdi
+	movq	%rax, (%rbx,%rdi,8)
+	incq	%rdi
+	addq	$PMD_SIZE, %rax
+	decl	%ecx
+	jnz	1b
+	
 	/*
 	 * Fixup the kernel text+data virtual addresses. Note that
 	 * we might write invalid pmds, when the kernel is relocated
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
 	 */
-
 	leaq	level2_kernel_pgt(%rip), %rdi
 	leaq	4096(%rdi), %r8
 	/* See if it is a valid page table entry */
@@ -149,7 +162,7 @@ ENTRY(secondary_startup_64)
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded a mapped page table.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either from startup_64 (using physical addresses)
 	 * or from trampoline.S (using virtual addresses).
@@ -196,7 +209,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip),%rsp
+	movq stack_start(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -236,31 +249,31 @@ ENTRY(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr	
 
-	/* esi is pointer to real mode structure with interesting info.
+	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
-	movl	%esi, %edi
+	movq	%rsi, %rdi
 	
 	/* Finally jump to run C code and to be on real kernel address
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
-	 * a far return.
+	 * a far jump.
 	 */
-	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
+	/* gas 2.22 is buggy and mis-assembles ljmpq */
+	rex64 ljmp *initial_code(%rip)
 
 	/* SMP bootup changes these two */
 	__REFDATA
-	.align	8
-	ENTRY(initial_code)
+	.balign	8
+	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
-	ENTRY(initial_gs)
+	.word	__KERNEL_CS
+	.balign	8
+	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 
-	ENTRY(stack_start)
+	GLOBAL(stack_start)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
@@ -268,7 +281,7 @@ ENTRY(secondary_startup_64)
 bad_address:
 	jmp bad_address
 
-	.section ".init.text","ax"
+	__INIT
 	.globl early_idt_handlers
 early_idt_handlers:
 	# 104(%rsp) %rflags
@@ -305,14 +318,22 @@ ENTRY(early_idt_handler)
 	pushq %r11		#  0(%rsp)
 
 	cmpl $__KERNEL_CS,96(%rsp)
-	jne 10f
+	jne 11f
 
+	cmpl $14,72(%rsp)	# Page fault?
+	jnz 10f
+	GET_CR2_INTO(%rdi)	# can clobber any volatile register if pv
+	call early_make_pgtable
+	andl %eax,%eax
+	jz 20f			# All good
+
+10:
 	leaq 88(%rsp),%rdi	# Pointer to %rip
 	call early_fixup_exception
 	andl %eax,%eax
 	jnz 20f			# Found an exception entry
 
-10:
+11:
 #ifdef CONFIG_EARLY_PRINTK
 	GET_CR2_INTO(%r9)	# can clobber any volatile register if pv
 	movl 80(%rsp),%r8d	# error code
@@ -334,7 +355,7 @@ ENTRY(early_idt_handler)
 1:	hlt
 	jmp 1b
 
-20:	# Exception table entry found
+20:	# Exception table entry found or page table generated
 	popq %r11
 	popq %r10
 	popq %r9
@@ -348,6 +369,8 @@ ENTRY(early_idt_handler)
 	decl early_recursion_flag(%rip)
 	INTERRUPT_RETURN
 
+	__INITDATA
+	
 	.balign 4
 early_recursion_flag:
 	.long 0
@@ -358,11 +381,10 @@ early_idt_msg:
 early_idt_ripmsg:
 	.asciz "RIP %s\n"
 #endif /* CONFIG_EARLY_PRINTK */
-	.previous
 
 #define NEXT_PAGE(name) \
 	.balign	PAGE_SIZE; \
-ENTRY(name)
+GLOBAL(name)
 
 /* Automate the creation of 1 to 1 mapping pmd entries */
 #define PMDS(START, PERM, COUNT)			\
@@ -372,46 +394,21 @@ ENTRY(name)
 	i = i + 1 ;					\
 	.endr
 
-	.data
-	/*
-	 * This default setting generates an ident mapping at address 0x100000
-	 * and a mapping for the kernel that precisely maps virtual address
-	 * 0xffffffff80000000 to physical address 0x000000. (always using
-	 * 2Mbyte large pages provided by PAE mode)
-	 */
-NEXT_PAGE(init_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_PAGE_OFFSET*8, 0
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_START_KERNEL*8, 0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	__INITDATA
+NEXT_PAGE(early_level4_pgt)
+	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level3_ident_pgt)
-	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	511,8,0
+NEXT_PAGE(early_dynamic_pgts)
+	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
 
+	.data
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
 	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
-	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
-
-NEXT_PAGE(level1_fixmap_pgt)
-	.fill	512,8,0
-
-NEXT_PAGE(level2_ident_pgt)
-	/* Since I easily can, map the first 1G.
-	 * Don't set NX because code runs from these pages.
-	 */
-	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
-
 NEXT_PAGE(level2_kernel_pgt)
 	/*
 	 * 512 MB kernel mapping. We spend a full page on this pagetable
@@ -426,11 +423,13 @@ NEXT_PAGE(level2_kernel_pgt)
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
-NEXT_PAGE(level2_spare_pgt)
-	.fill   512, 8, 0
+NEXT_PAGE(level2_fixmap_pgt)
+	.fill	506,8,0
+	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
+	.fill	5,8,0
 
 #undef PMDS
-#undef NEXT_PAGE
 
 	.data
 	.align 16
@@ -456,6 +455,7 @@ ENTRY(nmi_idt_table)
 	.skip IDT_ENTRIES * 16
 
 	__PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
-ENTRY(empty_zero_page)
+NEXT_PAGE(empty_zero_page)
+	.skip PAGE_SIZE
+NEXT_PAGE(init_level4_pgt)
 	.skip PAGE_SIZE


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-12  0:37                                       ` H. Peter Anvin
@ 2012-12-12  7:14                                         ` Yinghai Lu
  2012-12-12 10:26                                           ` Yinghai Lu
  2012-12-13  1:06                                           ` Yinghai Lu
  0 siblings, 2 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-12  7:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 1588 bytes --]

On Tue, Dec 11, 2012 at 4:37 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/11/2012 04:27 PM, Yinghai Lu wrote:
>> On Tue, Dec 11, 2012 at 3:57 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> Well, we could invoke it on the bootloader page tables, but as you say
>>> it may not be a good idea... depending on how much memory we may be
>>> talking about.  One solution -- which I have to admit is starting to
>>> sound really good -- is to set up a #PF handler which cycles through a
>>> set of page tables and creates a "virtual identity map"... it does have
>>> the advantage of making the entire physical address space available
>>> without any additional funnies.
>>
>> so that #PF handler will work before
>> arch/x86/kernel/setup.c::setup_arch/early_trap_init
>>
>> early_trap_init will install another handler there for #PF
>>
>> for 64bit, moving early_ioremap_init ahead is very simple, like the attached patch
>>
>> but for 32 bit looks like it is not that easy.
>>
>
> For 32 bits, we don't need it, because we can just run this part in
> linear mode.  It also doesn't help us on 32 bits since we are limited by
> virtual address space anyway.

Please check this draft of the early_memremap version for microcode...

1. Make find_cpio_data take map/unmap function pointers, and use them to
implement a sliding window.
2. Convert the end arguments to sizes in some functions to fix the -1 offset.
3. update_mc_saved_data changes the saved pointers back to __va addresses,
for APs etc., including after initrd relocation.
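
The sliding-window idea in item 1 can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the patch itself: `struct window`, `window_open`, `window_read`, `window_close`, and the `fake_map`/`fake_unmap` pair (which stand in for early_memremap/early_iounmap over a small array) are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef const void *(*map_fn)(unsigned long addr, unsigned long size);
typedef void (*unmap_fn)(const void *p, unsigned long size);

struct window {
	map_fn map;
	unmap_fn unmap;
	unsigned long win_size;	/* bytes covered by one mapping */
	unsigned long base;	/* address at the start of the window */
	const unsigned char *p;	/* mapping of [base, base + win_size) */
};

static void window_open(struct window *w, unsigned long addr,
			unsigned long win_size, map_fn map, unmap_fn unmap)
{
	w->map = map;
	w->unmap = unmap;
	w->win_size = win_size;
	w->base = addr;
	w->p = map(addr, win_size);
}

/* Read one byte, sliding the window when addr falls outside it. */
static unsigned char window_read(struct window *w, unsigned long addr)
{
	assert(w->p != NULL);
	if (addr < w->base || addr - w->base >= w->win_size) {
		w->unmap(w->p, w->win_size);
		w->base = addr;
		w->p = w->map(addr, w->win_size);
	}
	return w->p[addr - w->base];
}

static void window_close(struct window *w)
{
	w->unmap(w->p, w->win_size);
	w->p = NULL;
}

/* Stand-ins for the real map/unmap: "physical memory" is a small array. */
static unsigned char fake_mem[64];
static unsigned int live_maps, total_maps;

static const void *fake_map(unsigned long addr, unsigned long size)
{
	(void)size;
	live_maps++;
	total_maps++;
	return &fake_mem[addr];
}

static void fake_unmap(const void *p, unsigned long size)
{
	(void)p;
	(void)size;
	live_maps--;
}
```

Only one mapping is live at any time, so the scan needs a single early-mapping slot regardless of how large the initrd is.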

NOT EVEN COMPILE TESTED.

This one should be applied on top of the early_ioremap_head64.patch that I
sent out this afternoon.

Thanks

Yinghai

[-- Attachment #2: fix_microcode.patch --]
[-- Type: application/octet-stream, Size: 11724 bytes --]

---
 arch/x86/kernel/microcode_intel_early.c |  105 +++++++++++++++++++++-----------
 arch/x86/kernel/setup.c                 |    4 +
 drivers/acpi/osl.c                      |   11 +++
 include/linux/earlycpio.h               |    4 -
 lib/earlycpio.c                         |   88 ++++++++++++++++++++------
 5 files changed, 155 insertions(+), 57 deletions(-)

Index: linux-2.6/arch/x86/kernel/microcode_intel_early.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/microcode_intel_early.c
+++ linux-2.6/arch/x86/kernel/microcode_intel_early.c
@@ -290,21 +290,69 @@ static int __cpuinit collect_cpu_info_ea
 	return 0;
 }
 
+#ifdef CONFIG_X86_64
+static void *early_cd_data_p __initdata;
+static unsigned long early_cd_data_pa __initdata;
+static unsigned long early_cd_size __initdata;
+
+void update_mc_saved_data(unsigned long pa_offset)
+{
+	int i;
+	unsigned long addr;
+
+	if (early_cd_data_p)
+		early_iounmap(early_cd_data_p, early_cd_size);
+
+	if (!mc_saved_data->mc_saved)
+		return;
+
+	for (i = 0; i < mc_saved_data->mc_saved_count; i++) {
+		addr = (void *)mc_saved_data->mc_saved[i] - early_cd_data_p;
+		addr += early_cd_data_pa;
+		addr += pa_offset;
+		mc_saved_data->mc_saved[i] = __va(addr);
+	}
+}
+#else
+static void *map(unsigned long addr, unsigned long size)
+{
+	return (void *)addr;
+}
+static void unmap(void *p, unsigned long size)
+{
+}
+#endif
+
 static __init enum ucode_state
-scan_microcode(unsigned long start, unsigned long end,
+scan_microcode(unsigned long start, unsigned long size,
 		struct mc_saved_data *mc_saved_data,
 		struct microcode_intel **mc_saved_in_initrd)
 {
-	unsigned int size = end - start + 1;
 	struct cpio_data cd = { 0, 0 };
 	char ucode_name[] = "kernel/x86/microcode/GenuineIntel.bin";
 	long offset = 0;
+	void *p;
 
-	cd = find_cpio_data(ucode_name, (void *)start, size, &offset);
+#ifdef CONFIG_X86_64
+	cd = find_cpio_data(ucode_name, (void *)start, size, &offset,
+				 early_memremap, early_iounmap, PAGE_SIZE);
+#else
+	cd = find_cpio_data(ucode_name, (void *)start, size, &offset,
+				map, unmap, map_size);
+#endif
 	if (!cd.data)
 		return UCODE_ERROR;
 
-	return get_matching_model_microcode(0, cd.data, cd.size, mc_saved_data,
+#ifdef CONFIG_X86_64
+	/* will need to free them later */
+	early_cd_data_pa = (unsigned long)cd.data;
+	early_cd_size = cd.size;
+	p = early_cd_data_p = early_memremap(early_cd_data_pa, early_cd_size);
+#else
+	p = cd.data;
+#endif
+
+	return get_matching_model_microcode(0, p, cd.size, mc_saved_data,
 					 mc_saved_in_initrd, SYSTEM_BOOTING);
 }
 
@@ -340,28 +388,22 @@ apply_microcode_early(struct mc_saved_da
 }
 
 #ifdef CONFIG_X86_32
-static void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
-				struct microcode_intel **mc_saved_in_initrd)
+void update_mc_saved_data(unsigned long pa_offset)
 {
 	int i;
 
 	if (mc_saved_data->mc_saved) {
+		mc_saved_data->mc_saved = __va(mc_saved_data->mc_saved);
 		for (i = 0; i < mc_saved_data->mc_saved_count; i++)
 			mc_saved_data->mc_saved[i] =
-					 __va(mc_saved_data->mc_saved[i]);
+					 __va(mc_saved_data->mc_saved[i] + pa_offset);
 
-		mc_saved_data->mc_saved = __va(mc_saved_data->mc_saved);
 	}
 
+	mc_saved_data->ucode_cpu_info = __va(mc_saved_data->ucode_cpu_info);
 	if (mc_saved_data->ucode_cpu_info->mc)
 		mc_saved_data->ucode_cpu_info->mc =
 				 __va(mc_saved_data->ucode_cpu_info->mc);
-	mc_saved_data->ucode_cpu_info = __va(mc_saved_data->ucode_cpu_info);
-}
-#else
-static inline void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
-				struct microcode_intel **mc_saved_in_initrd)
-{
 }
 #endif
 
@@ -376,7 +418,7 @@ void __init save_microcode_in_initrd(str
 static void __init
 _load_ucode_intel_bsp(struct mc_saved_data *mc_saved_data,
 		      struct microcode_intel **mc_saved_in_initrd,
-		      unsigned long initrd_start, unsigned long initrd_end)
+		      unsigned long initrd_start, unsigned long initrd_size)
 {
 	int cpu = 0;
 
@@ -387,37 +429,34 @@ _load_ucode_intel_bsp(struct mc_saved_da
 			(struct ucode_cpu_info *)__pa(ucode_cpu_info_early);
 #endif
 	collect_cpu_info_early(mc_saved_data->ucode_cpu_info + cpu);
-	scan_microcode(initrd_start, initrd_end, mc_saved_data,
+	scan_microcode(initrd_start, initrd_size, mc_saved_data,
 		       mc_saved_in_initrd);
 	load_microcode(mc_saved_data, cpu);
 	apply_microcode_early(mc_saved_data, cpu);
-	map_mc_saved(mc_saved_data, mc_saved_in_initrd);
 }
 
 void __init
 load_ucode_intel_bsp(char *real_mode_data)
 {
-	u64 ramdisk_image, ramdisk_size, ramdisk_end;
-	unsigned long initrd_start, initrd_end;
-	struct boot_params *boot_params;
-
-	boot_params = (struct boot_params *)real_mode_data;
-	ramdisk_image = boot_params->hdr.ramdisk_image;
-	ramdisk_size  = boot_params->hdr.ramdisk_size;
+	u64 ramdisk_image, ramdisk_size;
 
 #ifdef CONFIG_X86_64
-	ramdisk_end  = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-	initrd_start = ramdisk_image + PAGE_OFFSET;
-	initrd_end = initrd_start + ramdisk_size;
+	ramdisk_image = boot_params.hdr.ramdisk_image;
+	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	if (!ramdisk_image || !ramdisk_size)
+		return;
 	_load_ucode_intel_bsp(&mc_saved_data, mc_saved_in_initrd,
-			      initrd_start, initrd_end);
+			      ramdisk_image, ramdisk_size);
 #else
-	ramdisk_end  = ramdisk_image + ramdisk_size;
-	initrd_start = ramdisk_image;
-	initrd_end = initrd_start + ramdisk_size;
+	struct boot_params *boot_params = (struct boot_params *)real_mode_data;
+
+	ramdisk_image = boot_params->hdr.ramdisk_image;
+	ramdisk_size  = boot_params->hdr.ramdisk_size;
+	if (!ramdisk_image || !ramdisk_size)
+		return;
 	_load_ucode_intel_bsp((struct mc_saved_data *)__pa(&mc_saved_data),
-			(struct microcode_intel **)__pa(mc_saved_in_initrd),
-			initrd_start, initrd_end);
+			      (struct microcode_intel **)__pa(mc_saved_in_initrd),
+			      ramdisk_image, ramdisk_size);
 #endif
 }
 
Index: linux-2.6/drivers/acpi/osl.c
===================================================================
--- linux-2.6.orig/drivers/acpi/osl.c
+++ linux-2.6/drivers/acpi/osl.c
@@ -573,6 +573,14 @@ static const char * const table_sigs[] =
 /* Must not increase 10 or needs code modification below */
 #define ACPI_OVERRIDE_TABLES 10
 
+static void *map(unsigned long addr, unsigned long size)
+{
+	return (void *)addr;
+}
+static void unmap(void *p, unsigned long size)
+{
+}
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int sig, no, table_nr = 0, total_offset = 0;
@@ -587,7 +595,8 @@ void __init acpi_initrd_override(void *d
 		return;
 
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
-		file = find_cpio_data(cpio_path, data, size, &offset);
+		file = find_cpio_data(cpio_path, data, size, &offset,
+					map, unmap, size);
 		if (!file.data)
 			break;
 
Index: linux-2.6/include/linux/earlycpio.h
===================================================================
--- linux-2.6.orig/include/linux/earlycpio.h
+++ linux-2.6/include/linux/earlycpio.h
@@ -12,6 +12,8 @@ struct cpio_data {
 };
 
 struct cpio_data find_cpio_data(const char *path, void *data, size_t len,
-				long *offset);
+			long *offset, void *(*map)(unsigned long, unsigned long),
+			void (*unmap)(void *, unsigned long),
+			unsigned long map_size);
 
 #endif /* _LINUX_EARLYCPIO_H */
Index: linux-2.6/lib/earlycpio.c
===================================================================
--- linux-2.6.orig/lib/earlycpio.c
+++ linux-2.6/lib/earlycpio.c
@@ -64,23 +64,39 @@ enum cpio_fields {
  */
 
 struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
-					  size_t len,  long *offset)
+					  size_t len,  long *offset,
+					  void *(*map)(unsigned long, unsigned long),
+					  void (*unmap)(void *, unsigned long),
+					  unsigned long map_size)
 {
 	const size_t cpio_header_len = 8*C_NFIELDS - 2;
 	struct cpio_data cd = { NULL, 0, "" };
-	const char *p, *dptr, *nptr;
+	const char *p;
 	unsigned int ch[C_NFIELDS], *chp, v;
 	unsigned char c, x;
 	size_t mypathsize = strlen(path);
 	int i, j;
-
-	p = data;
+	unsigned long addr, off, limit = map_size;
+	unsigned long dptr, nptr;
+	char *p_start;
+
+	addr = (unsigned long)data;
+	p = p_start = map(addr, map_size);
+	off = 0;
 
 	while (len > cpio_header_len) {
 		if (!*p) {
 			/* All cpio headers need to be 4-byte aligned */
-			p += 4;
+			addr += 4;
+			off += 4;
 			len -= 4;
+			if (off < limit)
+				p += 4;
+			else {
+				unmap(p_start, map_size);
+				p = p_start = map(addr, map_size);
+				off = 0;
+			}
 			continue;
 		}
 
@@ -90,7 +106,16 @@ struct cpio_data __cpuinit find_cpio_dat
 			v = 0;
 			while (j--) {
 				v <<= 4;
-				c = *p++;
+				c = *p;
+				addr++;
+				off++;
+				if (off < limit)
+					p++;
+				else {
+					unmap(p_start, map_size);
+					p = p_start = map(addr, map_size);
+					off = 0;
+				}
 
 				x = c - '0';
 				if (x < 10) {
@@ -115,31 +140,50 @@ struct cpio_data __cpuinit find_cpio_dat
 
 		len -= cpio_header_len;
 
-		dptr = PTR_ALIGN(p + ch[C_NAMESIZE], 4);
-		nptr = PTR_ALIGN(dptr + ch[C_FILESIZE], 4);
+		dptr = ALIGN(addr + ch[C_NAMESIZE], 4);
+		nptr = ALIGN(dptr + ch[C_FILESIZE], 4);
 
-		if (nptr > p + len || dptr < p || nptr < dptr)
+		if (nptr > addr + len || dptr < addr || nptr < dptr)
 			goto quit; /* Buffer overrun */
 
 		if ((ch[C_MODE] & 0170000) == 0100000 &&
-		    ch[C_NAMESIZE] >= mypathsize &&
-		    !memcmp(p, path, mypathsize)) {
-			*offset = (long)nptr - (long)data;
-			if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
-				pr_warn(
-				"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",
-				p, MAX_CPIO_FILE_NAME);
+		    ch[C_NAMESIZE] >= mypathsize) {
+			if (off + mypathsize > limit) {
+				unmap(p_start, map_size);
+				p = p_start = map(addr, map_size);
+				off = 0;
 			}
-			strlcpy(cd.name, p + mypathsize, MAX_CPIO_FILE_NAME);
+			if(!memcmp(p, path, mypathsize)) {
+				*offset = (long)nptr - (long)data;
+				if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
+					pr_warn(
+					"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",
+					p, MAX_CPIO_FILE_NAME);
+				}
+				if (off + mypathsize + MAX_CPIO_FILE_NAME > limit) {
+					unmap(p_start, map_size);
+					p = p_start = map(addr, map_size);
+					off = 0;
+				}
+				strlcpy(cd.name, p + mypathsize, MAX_CPIO_FILE_NAME);
 
-			cd.data = (void *)dptr;
-			cd.size = ch[C_FILESIZE];
-			return cd; /* Found it! */
+				cd.data = (void *)dptr;
+				cd.size = ch[C_FILESIZE];
+				unmap(p_start, map_size);
+				return cd; /* Found it! */
+			}
 		}
-		len -= (nptr - p);
-		p = nptr;
+		len -= (nptr - addr);
+		if (nptr - addr >= limit) {
+			addr = nptr;
+			unmap(p_start, map_size);
+			p = p_start = map(addr, map_size);
+			off = 0;
+		} else
+			addr = nptr;
 	}
 
 quit:
+	unmap(p_start, map_size);
 	return cd;
 }
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -293,6 +293,7 @@ static void __init reserve_brk(void)
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
+void update_mc_saved_data(unsigned long pa_offset);
 
 static u64 __init get_ramdisk_image(void)
 {
@@ -357,6 +358,8 @@ static void __init relocate_initrd(void)
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
 		ramdisk_here, ramdisk_here + ramdisk_size - 1);
+
+	update_mc_saved_data(ramdisk_here - ramdisk_image);
 }
 
 static u64 __init get_mem_size(unsigned long limit_pfn)
@@ -414,6 +417,7 @@ static void __init reserve_initrd(void)
 		/* All are mapped, easy case */
 		initrd_start = ramdisk_image + PAGE_OFFSET;
 		initrd_end = initrd_start + ramdisk_size;
+		update_mc_saved_data(0);
 		return;
 	}
 


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-12  7:14                                         ` Yinghai Lu
@ 2012-12-12 10:26                                           ` Yinghai Lu
  2012-12-13  1:06                                           ` Yinghai Lu
  1 sibling, 0 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-12 10:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 700 bytes --]

On Tue, Dec 11, 2012 at 11:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> please check draft version for early_memremap version for microcode...
>
> 1. make find_cpio take map/unmap function pointer, and use that to set
> sliding window.
> 2. clean the end to size in some function to fix -1 offset
> 3. update_mc_saved to change back to __va for ap etc and after
> initrd_relocation.
>
> should use this one on top of early_ioremap_head64.patch that i sent
> it out this afternoon.
>
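
Regarding item 2: the removed lines in the attached patch show that load_ucode_intel_bsp() computed initrd_end = initrd_start + ramdisk_size (an exclusive end), while scan_microcode() derived size = end - start + 1 (which treats end as inclusive). That mismatch is presumably the "-1 offset" mentioned above. A minimal sketch of the two conventions, with hypothetical helper names that do not appear in the patch:

```c
#include <assert.h>

/* Size of a half-open range [start, end): end is one past the last byte. */
static unsigned long range_size_exclusive(unsigned long start, unsigned long end)
{
	assert(end >= start);
	return end - start;
}

/* Size of a closed range [start, last]: last is the final valid byte. */
static unsigned long range_size_inclusive(unsigned long start, unsigned long last)
{
	assert(last >= start);
	return last - start + 1;
}
```

Feeding an exclusive end into the inclusive formula yields one byte too many, which is why passing the size directly avoids the conversion altogether.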

This one passes the test on 64-bit, without a microcode cpio, with an initrd...

Fenghua, can you check it on a 64-bit config with more than 4G of RAM?

It goes on top of tip/x86/microcode and early_ioremap_head64.patch.

Thanks

Yinghai
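
For illustration, the sliding-window idea from item 1 can be sketched in user space as follows. This is only a sketch under stated assumptions, not the code in the attachment: struct window, window_at(), window_read(), window_close() and the identity helpers are hypothetical names, and the identity mapping stands in for early_memremap()/early_iounmap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types, modeled on the map/unmap pair in the patch. */
typedef const char *(*map_fn)(uintptr_t addr, size_t size);
typedef void (*unmap_fn)(const char *p, size_t size);

struct window {
	map_fn map;
	unmap_fn unmap;
	size_t size;          /* bytes mapped per window */
	uintptr_t base;       /* address at which the current window starts */
	const char *p;        /* mapping of base, NULL before first use */
	unsigned int remaps;  /* how many times the window was (re)mapped */
};

/* Identity helpers: the trivial case where memory is already addressable. */
static const char *identity_map(uintptr_t addr, size_t size)
{
	(void)size;
	return (const char *)addr;
}

static void identity_unmap(const char *p, size_t size)
{
	(void)p;
	(void)size;
}

/* Pointer to the byte at addr; slide the window if addr falls outside it. */
static const char *window_at(struct window *w, uintptr_t addr)
{
	assert(w->size > 0);
	if (!w->p || addr < w->base || addr - w->base >= w->size) {
		if (w->p)
			w->unmap(w->p, w->size);
		w->base = addr;
		w->p = w->map(addr, w->size);
		w->remaps++;
	}
	return w->p + (addr - w->base);
}

/* Copy len bytes starting at addr, crossing window boundaries as needed. */
static void window_read(struct window *w, uintptr_t addr, char *dst, size_t len)
{
	while (len--)
		*dst++ = *window_at(w, addr++);
}

/* Drop the last mapping once the scan is finished. */
static void window_close(struct window *w)
{
	if (w->p)
		w->unmap(w->p, w->size);
	w->p = NULL;
}
```

Keeping the bounds check in one accessor means callers never touch the mapped pointer directly, so every exit path only has to remember a single window_close().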

[-- Attachment #2: fix_microcode_v2.patch --]
[-- Type: application/octet-stream, Size: 12412 bytes --]

---
 arch/x86/kernel/microcode_intel_early.c |  129 ++++++++++++++++++++++----------
 arch/x86/kernel/setup.c                 |    4 
 drivers/acpi/osl.c                      |   11 ++
 include/linux/earlycpio.h               |    4 
 lib/earlycpio.c                         |   88 ++++++++++++++++-----
 5 files changed, 172 insertions(+), 64 deletions(-)

Index: linux-2.6/arch/x86/kernel/microcode_intel_early.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/microcode_intel_early.c
+++ linux-2.6/arch/x86/kernel/microcode_intel_early.c
@@ -20,9 +20,11 @@
 #include <linux/vmalloc.h>
 #include <linux/mm.h>
 #include <linux/earlycpio.h>
+
 #include <asm/msr.h>
 #include <asm/microcode_intel.h>
 #include <asm/processor.h>
+#include <asm/setup.h>
 
 struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT];
 struct mc_saved_data mc_saved_data;
@@ -290,21 +292,77 @@ static int __cpuinit collect_cpu_info_ea
 	return 0;
 }
 
+#ifdef CONFIG_X86_64
+static void *early_cd_data_p __initdata;
+static unsigned long early_cd_data_pa __initdata;
+static unsigned long early_cd_size __initdata;
+
+void update_mc_saved_data(unsigned long pa_offset)
+{
+	int i;
+	unsigned long addr;
+
+	if (early_cd_data_p)
+		early_iounmap(early_cd_data_p, early_cd_size);
+
+	if (!mc_saved_data.mc_saved)
+		return;
+
+	for (i = 0; i < mc_saved_data.mc_saved_count; i++) {
+		addr = (void *)mc_saved_data.mc_saved[i] - early_cd_data_p;
+		addr += early_cd_data_pa;
+		addr += pa_offset;
+		mc_saved_data.mc_saved[i] = __va(addr);
+	}
+}
+static void *map(unsigned long addr, unsigned long size)
+{
+	return early_memremap(addr, size);
+}
+static void unmap(void *p, unsigned long size)
+{
+	early_iounmap(p, size);
+}
+#else
+static void *map(unsigned long addr, unsigned long size)
+{
+	return (void *)addr;
+}
+static void unmap(void *p, unsigned long size)
+{
+}
+#endif
+
 static __init enum ucode_state
-scan_microcode(unsigned long start, unsigned long end,
+scan_microcode(unsigned long start, unsigned long size,
 		struct mc_saved_data *mc_saved_data,
 		struct microcode_intel **mc_saved_in_initrd)
 {
-	unsigned int size = end - start + 1;
 	struct cpio_data cd = { 0, 0 };
 	char ucode_name[] = "kernel/x86/microcode/GenuineIntel.bin";
 	long offset = 0;
+	void *p;
+	unsigned long map_size = size;
 
-	cd = find_cpio_data(ucode_name, (void *)start, size, &offset);
+#ifdef CONFIG_X86_64
+	map_size = PAGE_SIZE;
+#endif
+
+	cd = find_cpio_data(ucode_name, (void *)start, size, &offset,
+				 map, unmap, map_size);
 	if (!cd.data)
 		return UCODE_ERROR;
 
-	return get_matching_model_microcode(0, cd.data, cd.size, mc_saved_data,
+#ifdef CONFIG_X86_64
+	/* will need to free them later */
+	early_cd_data_pa = (unsigned long)cd.data;
+	early_cd_size = cd.size;
+	p = early_cd_data_p = early_memremap(early_cd_data_pa, early_cd_size);
+#else
+	p = cd.data;
+#endif
+
+	return get_matching_model_microcode(0, p, cd.size, mc_saved_data,
 					 mc_saved_in_initrd, SYSTEM_BOOTING);
 }
 
@@ -340,28 +398,22 @@ apply_microcode_early(struct mc_saved_da
 }
 
 #ifdef CONFIG_X86_32
-static void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
-				struct microcode_intel **mc_saved_in_initrd)
+void update_mc_saved_data(unsigned long pa_offset)
 {
 	int i;
 
-	if (mc_saved_data->mc_saved) {
-		for (i = 0; i < mc_saved_data->mc_saved_count; i++)
-			mc_saved_data->mc_saved[i] =
-					 __va(mc_saved_data->mc_saved[i]);
-
-		mc_saved_data->mc_saved = __va(mc_saved_data->mc_saved);
+	if (mc_saved_data.mc_saved) {
+		mc_saved_data.mc_saved = __va(mc_saved_data.mc_saved);
+		for (i = 0; i < mc_saved_data.mc_saved_count; i++)
+			mc_saved_data.mc_saved[i] =
+					 __va(mc_saved_data.mc_saved[i] + pa_offset);
+
 	}
 
-	if (mc_saved_data->ucode_cpu_info->mc)
-		mc_saved_data->ucode_cpu_info->mc =
-				 __va(mc_saved_data->ucode_cpu_info->mc);
-	mc_saved_data->ucode_cpu_info = __va(mc_saved_data->ucode_cpu_info);
-}
-#else
-static inline void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
-				struct microcode_intel **mc_saved_in_initrd)
-{
+	mc_saved_data.ucode_cpu_info = __va(mc_saved_data.ucode_cpu_info);
+	if (mc_saved_data.ucode_cpu_info->mc)
+		mc_saved_data.ucode_cpu_info->mc =
+				 __va(mc_saved_data.ucode_cpu_info->mc);
 }
 #endif
 
@@ -376,7 +428,7 @@ void __init save_microcode_in_initrd(str
 static void __init
 _load_ucode_intel_bsp(struct mc_saved_data *mc_saved_data,
 		      struct microcode_intel **mc_saved_in_initrd,
-		      unsigned long initrd_start, unsigned long initrd_end)
+		      unsigned long initrd_start, unsigned long initrd_size)
 {
 	int cpu = 0;
 
@@ -387,37 +439,34 @@ _load_ucode_intel_bsp(struct mc_saved_da
 			(struct ucode_cpu_info *)__pa(ucode_cpu_info_early);
 #endif
 	collect_cpu_info_early(mc_saved_data->ucode_cpu_info + cpu);
-	scan_microcode(initrd_start, initrd_end, mc_saved_data,
+	scan_microcode(initrd_start, initrd_size, mc_saved_data,
 		       mc_saved_in_initrd);
 	load_microcode(mc_saved_data, cpu);
 	apply_microcode_early(mc_saved_data, cpu);
-	map_mc_saved(mc_saved_data, mc_saved_in_initrd);
 }
 
 void __init
 load_ucode_intel_bsp(char *real_mode_data)
 {
-	u64 ramdisk_image, ramdisk_size, ramdisk_end;
-	unsigned long initrd_start, initrd_end;
-	struct boot_params *boot_params;
-
-	boot_params = (struct boot_params *)real_mode_data;
-	ramdisk_image = boot_params->hdr.ramdisk_image;
-	ramdisk_size  = boot_params->hdr.ramdisk_size;
+	u64 ramdisk_image, ramdisk_size;
 
 #ifdef CONFIG_X86_64
-	ramdisk_end  = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-	initrd_start = ramdisk_image + PAGE_OFFSET;
-	initrd_end = initrd_start + ramdisk_size;
+	ramdisk_image = boot_params.hdr.ramdisk_image;
+	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	if (!ramdisk_image || !ramdisk_size)
+		return;
 	_load_ucode_intel_bsp(&mc_saved_data, mc_saved_in_initrd,
-			      initrd_start, initrd_end);
+			      ramdisk_image, ramdisk_size);
 #else
-	ramdisk_end  = ramdisk_image + ramdisk_size;
-	initrd_start = ramdisk_image;
-	initrd_end = initrd_start + ramdisk_size;
+	struct boot_params *boot_params = (struct boot_params *)real_mode_data;
+
+	ramdisk_image = boot_params->hdr.ramdisk_image;
+	ramdisk_size  = boot_params->hdr.ramdisk_size;
+	if (!ramdisk_image || !ramdisk_size)
+		return;
 	_load_ucode_intel_bsp((struct mc_saved_data *)__pa(&mc_saved_data),
-			(struct microcode_intel **)__pa(mc_saved_in_initrd),
-			initrd_start, initrd_end);
+			      (struct microcode_intel **)__pa(mc_saved_in_initrd),
+			      ramdisk_image, ramdisk_size);
 #endif
 }
 
Index: linux-2.6/drivers/acpi/osl.c
===================================================================
--- linux-2.6.orig/drivers/acpi/osl.c
+++ linux-2.6/drivers/acpi/osl.c
@@ -573,6 +573,14 @@ static const char * const table_sigs[] =
 /* Must not increase 10 or needs code modification below */
 #define ACPI_OVERRIDE_TABLES 10
 
+static void *map(unsigned long addr, unsigned long size)
+{
+	return (void *)addr;
+}
+static void unmap(void *p, unsigned long size)
+{
+}
+
 void __init acpi_initrd_override(void *data, size_t size)
 {
 	int sig, no, table_nr = 0, total_offset = 0;
@@ -587,7 +595,8 @@ void __init acpi_initrd_override(void *d
 		return;
 
 	for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
-		file = find_cpio_data(cpio_path, data, size, &offset);
+		file = find_cpio_data(cpio_path, data, size, &offset,
+					map, unmap, size);
 		if (!file.data)
 			break;
 
Index: linux-2.6/include/linux/earlycpio.h
===================================================================
--- linux-2.6.orig/include/linux/earlycpio.h
+++ linux-2.6/include/linux/earlycpio.h
@@ -12,6 +12,8 @@ struct cpio_data {
 };
 
 struct cpio_data find_cpio_data(const char *path, void *data, size_t len,
-				long *offset);
+			long *offset, void *(*map)(unsigned long, unsigned long),
+			void (*unmap)(void *, unsigned long),
+			unsigned long map_size);
 
 #endif /* _LINUX_EARLYCPIO_H */
Index: linux-2.6/lib/earlycpio.c
===================================================================
--- linux-2.6.orig/lib/earlycpio.c
+++ linux-2.6/lib/earlycpio.c
@@ -64,23 +64,39 @@ enum cpio_fields {
  */
 
 struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
-					  size_t len,  long *offset)
+					  size_t len,  long *offset,
+					  void *(*map)(unsigned long, unsigned long),
+					  void (*unmap)(void *, unsigned long),
+					  unsigned long map_size)
 {
 	const size_t cpio_header_len = 8*C_NFIELDS - 2;
 	struct cpio_data cd = { NULL, 0, "" };
-	const char *p, *dptr, *nptr;
+	const char *p;
 	unsigned int ch[C_NFIELDS], *chp, v;
 	unsigned char c, x;
 	size_t mypathsize = strlen(path);
 	int i, j;
-
-	p = data;
+	unsigned long addr, off, limit = map_size;
+	unsigned long dptr, nptr;
+	char *p_start;
+
+	addr = (unsigned long)data;
+	p = p_start = map(addr, map_size);
+	off = 0;
 
 	while (len > cpio_header_len) {
 		if (!*p) {
 			/* All cpio headers need to be 4-byte aligned */
-			p += 4;
+			addr += 4;
+			off += 4;
 			len -= 4;
+			if (off < limit)
+				p += 4;
+			else {
+				unmap(p_start, map_size);
+				p = p_start = map(addr, map_size);
+				off = 0;
+			}
 			continue;
 		}
 
@@ -90,7 +106,16 @@ struct cpio_data __cpuinit find_cpio_dat
 			v = 0;
 			while (j--) {
 				v <<= 4;
-				c = *p++;
+				c = *p;
+				addr++;
+				off++;
+				if (off < limit)
+					p++;
+				else {
+					unmap(p_start, map_size);
+					p = p_start = map(addr, map_size);
+					off = 0;
+				}
 
 				x = c - '0';
 				if (x < 10) {
@@ -115,31 +140,50 @@ struct cpio_data __cpuinit find_cpio_dat
 
 		len -= cpio_header_len;
 
-		dptr = PTR_ALIGN(p + ch[C_NAMESIZE], 4);
-		nptr = PTR_ALIGN(dptr + ch[C_FILESIZE], 4);
+		dptr = ALIGN(addr + ch[C_NAMESIZE], 4);
+		nptr = ALIGN(dptr + ch[C_FILESIZE], 4);
 
-		if (nptr > p + len || dptr < p || nptr < dptr)
+		if (nptr > addr + len || dptr < addr || nptr < dptr)
 			goto quit; /* Buffer overrun */
 
 		if ((ch[C_MODE] & 0170000) == 0100000 &&
-		    ch[C_NAMESIZE] >= mypathsize &&
-		    !memcmp(p, path, mypathsize)) {
-			*offset = (long)nptr - (long)data;
-			if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
-				pr_warn(
-				"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",
-				p, MAX_CPIO_FILE_NAME);
+		    ch[C_NAMESIZE] >= mypathsize) {
+			if (off + mypathsize > limit) {
+				unmap(p_start, map_size);
+				p = p_start = map(addr, map_size);
+				off = 0;
 			}
-			strlcpy(cd.name, p + mypathsize, MAX_CPIO_FILE_NAME);
+			if (!memcmp(p, path, mypathsize)) {
+				*offset = (long)nptr - (long)data;
+				if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
+					pr_warn(
+					"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",
+					p, MAX_CPIO_FILE_NAME);
+				}
+				if (off + mypathsize + MAX_CPIO_FILE_NAME > limit) {
+					unmap(p_start, map_size);
+					p = p_start = map(addr, map_size);
+					off = 0;
+				}
+				strlcpy(cd.name, p + mypathsize, MAX_CPIO_FILE_NAME);
 
-			cd.data = (void *)dptr;
-			cd.size = ch[C_FILESIZE];
-			return cd; /* Found it! */
+				cd.data = (void *)dptr;
+				cd.size = ch[C_FILESIZE];
+				unmap(p_start, map_size);
+				return cd; /* Found it! */
+			}
 		}
-		len -= (nptr - p);
-		p = nptr;
+		len -= (nptr - addr);
+		if (nptr - addr >= limit) {
+			addr = nptr;
+			unmap(p_start, map_size);
+			p = p_start = map(addr, map_size);
+			off = 0;
+		} else
+			addr = nptr;
 	}
 
 quit:
+	unmap(p_start, map_size);
 	return cd;
 }
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -293,6 +293,7 @@ static void __init reserve_brk(void)
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
+void update_mc_saved_data(unsigned long pa_offset);
 
 static u64 __init get_ramdisk_image(void)
 {
@@ -357,6 +358,8 @@ static void __init relocate_initrd(void)
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
 		ramdisk_here, ramdisk_here + ramdisk_size - 1);
+
+	update_mc_saved_data(ramdisk_here - ramdisk_image);
 }
 
 static u64 __init get_mem_size(unsigned long limit_pfn)
@@ -414,6 +417,7 @@ static void __init reserve_initrd(void)
 		/* All are mapped, easy case */
 		initrd_start = ramdisk_image + PAGE_OFFSET;
 		initrd_end = initrd_start + ramdisk_size;
+		update_mc_saved_data(0);
 		return;
 	}
 

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-12  6:57                                       ` H. Peter Anvin
@ 2012-12-12 13:38                                         ` Borislav Petkov
  2012-12-12 17:43                                           ` H. Peter Anvin
  2012-12-13  5:12                                           ` H. Peter Anvin
  0 siblings, 2 replies; 127+ messages in thread
From: Borislav Petkov @ 2012-12-12 13:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On Tue, Dec 11, 2012 at 10:57:03PM -0800, H. Peter Anvin wrote:
> @@ -372,46 +394,21 @@ ENTRY(name)
>  	i = i + 1 ;					\
>  	.endr
>  
> -	.data
> -	/*
> -	 * This default setting generates an ident mapping at address 0x100000
> -	 * and a mapping for the kernel that precisely maps virtual address
> -	 * 0xffffffff80000000 to physical address 0x000000. (always using
> -	 * 2Mbyte large pages provided by PAE mode)
> -	 */
> -NEXT_PAGE(init_level4_pgt)
> -	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE

We completely lost level3_ident_pgt, causing:

arch/x86/built-in.o: In function `setup_real_mode':
/home/boris/kernel/linux-2.6/arch/x86/realmode/init.c:81: undefined reference to `level3_ident_pgt'
make: *** [vmlinux] Error 1

> -	.org	init_level4_pgt + L4_PAGE_OFFSET*8, 0
> -	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
> -	.org	init_level4_pgt + L4_START_KERNEL*8, 0
> -	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
> +	__INITDATA
> +NEXT_PAGE(early_level4_pgt)
> +	.fill	511,8,0
>  	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
>  
> -NEXT_PAGE(level3_ident_pgt)
> -	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
> -	.fill	511,8,0
> +NEXT_PAGE(early_dynamic_pgts)
> +	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
>  
> +	.data
>  NEXT_PAGE(level3_kernel_pgt)
>  	.fill	L3_START_KERNEL,8,0
>  	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
>  	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
>  	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
>  
> -NEXT_PAGE(level2_fixmap_pgt)
> -	.fill	506,8,0
> -	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
> -	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
> -	.fill	5,8,0
> -
> -NEXT_PAGE(level1_fixmap_pgt)
> -	.fill	512,8,0

You still need that NEXT_PAGE(level1_fixmap_pgt) thing:

arch/x86/kernel/head_64.o: In function `level2_fixmap_pgt':
(.data+0x2fd0): undefined reference to `level1_fixmap_pgt'

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-12 13:38                                         ` Borislav Petkov
@ 2012-12-12 17:43                                           ` H. Peter Anvin
  2012-12-13  5:12                                           ` H. Peter Anvin
  1 sibling, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-12 17:43 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Yu, Fenghua, mingo, linux-kernel,
	tglx, hpa, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/12/2012 05:38 AM, Borislav Petkov wrote:
>
> We completely lost level3_ident_pgt, causing:
>
> arch/x86/built-in.o: In function `setup_real_mode':
> /home/boris/kernel/linux-2.6/arch/x86/realmode/init.c:81: undefined reference to `level3_ident_pgt'
> make: *** [vmlinux] Error 1
>

>
> You still need that NEXT_PAGE(level1_fixmap_pgt) thing:
>
> arch/x86/kernel/head_64.o: In function `level2_fixmap_pgt':
> (.data+0x2fd0): undefined reference to `level1_fixmap_pgt'
>

Yes, I said it wasn't a complete patch.  There are bits missing, and 
some of them need restructuring.  The ident page table in the trampoline 
should be handled by mirroring the kernel ident page tables instead, for 
example -- right now that is completely broken if the kernel lives above 
the 512 GiB mark.
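
To make the 512 GiB figure concrete: with 4-level paging on x86-64, each top-level (PGD) entry covers 2^39 bytes, so an identity map that lives under a single top-level entry only reaches addresses below 512 GiB. A small sketch of the index arithmetic; pgd_index_of() and pgd_entries_needed() are hypothetical helper names, and the two constants mirror the kernel's values for 4-level paging:

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT  39   /* each top-level entry maps 2^39 bytes = 512 GiB */
#define PTRS_PER_PGD 512  /* entries in the top-level table */

/* Index of the top-level entry that translates addr. */
static unsigned int pgd_index_of(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* Number of top-level entries needed to identity-map [0, top). */
static unsigned int pgd_entries_needed(uint64_t top)
{
	assert(top > 0);
	return (unsigned int)(((top - 1) >> PGDIR_SHIFT) + 1);
}
```

Any physical address at or above 1ULL << 39 lands in entry 1 or higher, which is the case a single-entry ident table cannot cover.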

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-12  7:14                                         ` Yinghai Lu
  2012-12-12 10:26                                           ` Yinghai Lu
@ 2012-12-13  1:06                                           ` Yinghai Lu
  1 sibling, 0 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-13  1:06 UTC (permalink / raw)
  To: H. Peter Anvin, Yu, Fenghua
  Cc: Borislav Petkov, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

> please check  early_memremap version for microcode...
>
> 1. make find_cpio take map/unmap function pointer, and use that to set
> sliding window.
> 2. clean the end to size in some function to fix -1 offset
> 3. update_mc_saved to change back to __va for ap etc and after
> initrd_relocation.
>
> should use this one on top of early_ioremap_head64.patch that i sent
> it out this afternoon.

v3...

It should be split into three or four patches.

Thanks

Yinghai
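
For readers following item 3, the address fix-up done by the 64-bit update_mc_saved_data() in the attachment boils down to: take the pointer's offset inside the temporary early mapping, add the blob's physical address and the relocation offset, then convert back to a direct-map virtual address. A user-space sketch of that arithmetic, assuming a hypothetical rebase() helper and an explicit va_base parameter standing in for __va():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Translate a pointer taken inside a temporary mapping into the address
 * it should have once the underlying memory is reachable through a
 * linear mapping. va_base + pa is a stand-in for the kernel's __va(pa).
 */
static uint64_t rebase(uint64_t ptr, uint64_t old_map, uint64_t phys_base,
		       uint64_t pa_offset, uint64_t va_base)
{
	uint64_t off, pa;

	assert(ptr >= old_map);
	off = ptr - old_map;               /* offset inside the blob */
	pa = phys_base + off + pa_offset;  /* physical address after relocation */
	return va_base + pa;
}
```

A pa_offset of 0 corresponds to the "all mapped, easy case" call in reserve_initrd(); a non-zero value corresponds to the call after relocate_initrd().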

[-- Attachment #2: fix_microcode_v3.patch --]
[-- Type: application/octet-stream, Size: 12725 bytes --]

---
 arch/x86/kernel/microcode_intel_early.c |  139 ++++++++++++++++++++------------
 arch/x86/kernel/setup.c                 |   11 ++
 include/linux/earlycpio.h               |   10 +-
 lib/earlycpio.c                         |  104 ++++++++++++++++++-----
 4 files changed, 190 insertions(+), 74 deletions(-)

Index: linux-2.6/arch/x86/kernel/microcode_intel_early.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/microcode_intel_early.c
+++ linux-2.6/arch/x86/kernel/microcode_intel_early.c
@@ -20,9 +20,11 @@
 #include <linux/vmalloc.h>
 #include <linux/mm.h>
 #include <linux/earlycpio.h>
+
 #include <asm/msr.h>
 #include <asm/microcode_intel.h>
 #include <asm/processor.h>
+#include <asm/setup.h>
 
 struct microcode_intel __initdata *mc_saved_in_initrd[MAX_UCODE_COUNT];
 struct mc_saved_data mc_saved_data;
@@ -290,25 +292,93 @@ static int __cpuinit collect_cpu_info_ea
 	return 0;
 }
 
+#ifdef CONFIG_X86_64
+static void *early_cd_data_p __initdata;
+static unsigned long early_cd_data_pa __initdata;
+static unsigned long early_cd_size __initdata;
+
+void __init update_mc_saved_data(unsigned long pa_offset)
+{
+	int i;
+	unsigned long addr;
+
+	if (early_cd_data_p)
+		early_iounmap(early_cd_data_p, early_cd_size);
+
+	if (!mc_saved_data.mc_saved)
+		return;
+
+	for (i = 0; i < mc_saved_data.mc_saved_count; i++) {
+		addr = (void *)mc_saved_data.mc_saved[i] - early_cd_data_p;
+		addr += early_cd_data_pa;
+		addr += pa_offset;
+		mc_saved_data.mc_saved[i] = __va(addr);
+	}
+}
+static __init void *map(unsigned long addr, unsigned long size)
+{
+	return early_memremap(addr, size);
+}
+static __init void unmap(void *p, unsigned long size)
+{
+	early_iounmap(p, size);
+}
+
+#else
+
+void __init update_mc_saved_data(unsigned long pa_offset)
+{
+	int i;
+
+	if (mc_saved_data.mc_saved) {
+		mc_saved_data.mc_saved = __va(mc_saved_data.mc_saved);
+		for (i = 0; i < mc_saved_data.mc_saved_count; i++)
+			mc_saved_data.mc_saved[i] =
+					 __va(mc_saved_data.mc_saved[i] + pa_offset);
+
+	}
+
+	mc_saved_data.ucode_cpu_info = __va(mc_saved_data.ucode_cpu_info);
+	if (mc_saved_data.ucode_cpu_info->mc)
+		mc_saved_data.ucode_cpu_info->mc =
+				 __va(mc_saved_data.ucode_cpu_info->mc);
+}
+
+#endif
+
 static __init enum ucode_state
-scan_microcode(unsigned long start, unsigned long end,
+scan_microcode(unsigned long start, unsigned long size,
 		struct mc_saved_data *mc_saved_data,
 		struct microcode_intel **mc_saved_in_initrd)
 {
-	unsigned int size = end - start + 1;
 	struct cpio_data cd = { 0, 0 };
 	char ucode_name[] = "kernel/x86/microcode/GenuineIntel.bin";
 	long offset = 0;
+	void *p;
 
+#ifdef CONFIG_X86_64
+	cd = find_cpio_data_map(ucode_name, (void *)start, size, &offset,
+				 map, unmap, PAGE_SIZE*16);
+#else
 	cd = find_cpio_data(ucode_name, (void *)start, size, &offset);
+#endif
 	if (!cd.data)
 		return UCODE_ERROR;
 
-	return get_matching_model_microcode(0, cd.data, cd.size, mc_saved_data,
+#ifdef CONFIG_X86_64
+	/* will need to free them later */
+	early_cd_data_pa = (unsigned long)cd.data;
+	early_cd_size = cd.size;
+	p = early_cd_data_p = early_memremap(early_cd_data_pa, cd.size);
+#else
+	p = cd.data;
+#endif
+
+	return get_matching_model_microcode(0, p, cd.size, mc_saved_data,
 					 mc_saved_in_initrd, SYSTEM_BOOTING);
 }
 
-static int __init
+static int __cpuinit
 apply_microcode_early(struct mc_saved_data *mc_saved_data, int cpu)
 {
 	struct ucode_cpu_info *uci = mc_saved_data->ucode_cpu_info + cpu;
@@ -339,32 +409,6 @@ apply_microcode_early(struct mc_saved_da
 	return 0;
 }
 
-#ifdef CONFIG_X86_32
-static void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
-				struct microcode_intel **mc_saved_in_initrd)
-{
-	int i;
-
-	if (mc_saved_data->mc_saved) {
-		for (i = 0; i < mc_saved_data->mc_saved_count; i++)
-			mc_saved_data->mc_saved[i] =
-					 __va(mc_saved_data->mc_saved[i]);
-
-		mc_saved_data->mc_saved = __va(mc_saved_data->mc_saved);
-	}
-
-	if (mc_saved_data->ucode_cpu_info->mc)
-		mc_saved_data->ucode_cpu_info->mc =
-				 __va(mc_saved_data->ucode_cpu_info->mc);
-	mc_saved_data->ucode_cpu_info = __va(mc_saved_data->ucode_cpu_info);
-}
-#else
-static inline void __init map_mc_saved(struct mc_saved_data *mc_saved_data,
-				struct microcode_intel **mc_saved_in_initrd)
-{
-}
-#endif
-
 void __init save_microcode_in_initrd(struct mc_saved_data *mc_saved_data,
 		 struct microcode_intel **mc_saved_in_initrd)
 {
@@ -376,7 +420,7 @@ void __init save_microcode_in_initrd(str
 static void __init
 _load_ucode_intel_bsp(struct mc_saved_data *mc_saved_data,
 		      struct microcode_intel **mc_saved_in_initrd,
-		      unsigned long initrd_start, unsigned long initrd_end)
+		      unsigned long initrd_start, unsigned long initrd_size)
 {
 	int cpu = 0;
 
@@ -387,37 +431,34 @@ _load_ucode_intel_bsp(struct mc_saved_da
 			(struct ucode_cpu_info *)__pa(ucode_cpu_info_early);
 #endif
 	collect_cpu_info_early(mc_saved_data->ucode_cpu_info + cpu);
-	scan_microcode(initrd_start, initrd_end, mc_saved_data,
+	scan_microcode(initrd_start, initrd_size, mc_saved_data,
 		       mc_saved_in_initrd);
 	load_microcode(mc_saved_data, cpu);
 	apply_microcode_early(mc_saved_data, cpu);
-	map_mc_saved(mc_saved_data, mc_saved_in_initrd);
 }
 
 void __init
 load_ucode_intel_bsp(char *real_mode_data)
 {
-	u64 ramdisk_image, ramdisk_size, ramdisk_end;
-	unsigned long initrd_start, initrd_end;
-	struct boot_params *boot_params;
-
-	boot_params = (struct boot_params *)real_mode_data;
-	ramdisk_image = boot_params->hdr.ramdisk_image;
-	ramdisk_size  = boot_params->hdr.ramdisk_size;
+	u64 ramdisk_image, ramdisk_size;
 
 #ifdef CONFIG_X86_64
-	ramdisk_end  = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-	initrd_start = ramdisk_image + PAGE_OFFSET;
-	initrd_end = initrd_start + ramdisk_size;
+	ramdisk_image = boot_params.hdr.ramdisk_image;
+	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	if (!ramdisk_image || !ramdisk_size)
+		return;
 	_load_ucode_intel_bsp(&mc_saved_data, mc_saved_in_initrd,
-			      initrd_start, initrd_end);
+			      ramdisk_image, ramdisk_size);
 #else
-	ramdisk_end  = ramdisk_image + ramdisk_size;
-	initrd_start = ramdisk_image;
-	initrd_end = initrd_start + ramdisk_size;
+	struct boot_params *boot_params = (struct boot_params *)real_mode_data;
+
+	ramdisk_image = boot_params->hdr.ramdisk_image;
+	ramdisk_size  = boot_params->hdr.ramdisk_size;
+	if (!ramdisk_image || !ramdisk_size)
+		return;
 	_load_ucode_intel_bsp((struct mc_saved_data *)__pa(&mc_saved_data),
-			(struct microcode_intel **)__pa(mc_saved_in_initrd),
-			initrd_start, initrd_end);
+			      (struct microcode_intel **)__pa(mc_saved_in_initrd),
+			      ramdisk_image, ramdisk_size);
 #endif
 }
 
Index: linux-2.6/include/linux/earlycpio.h
===================================================================
--- linux-2.6.orig/include/linux/earlycpio.h
+++ linux-2.6/include/linux/earlycpio.h
@@ -11,7 +11,13 @@ struct cpio_data {
 	char name[MAX_CPIO_FILE_NAME];
 };
 
-struct cpio_data find_cpio_data(const char *path, void *data, size_t len,
-				long *offset);
+struct cpio_data find_cpio_data(const char *path, void *data, size_t len,
+			long *offset);
+
+struct cpio_data find_cpio_data_map(const char *path, void *data, size_t len,
+			long *offset,
+			void *(*map)(unsigned long, unsigned long),
+			void (*unmap)(void *, unsigned long),
+			unsigned long map_size);
 
 #endif /* _LINUX_EARLYCPIO_H */
Index: linux-2.6/lib/earlycpio.c
===================================================================
--- linux-2.6.orig/lib/earlycpio.c
+++ linux-2.6/lib/earlycpio.c
@@ -47,6 +47,14 @@ enum cpio_fields {
 	C_NFIELDS
 };
 
+static void *map(unsigned long addr, unsigned long size)
+{
+	return (void *)addr;
+}
+static void unmap(void *p, unsigned long size)
+{
+}
+
 /**
  * cpio_data find_cpio_data - Search for files in an uncompressed cpio
  * @path:   The directory to search for, including a slash at the end
@@ -63,24 +71,40 @@ enum cpio_fields {
  *          the match returned an empty filename string.
  */
 
-struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
-					  size_t len,  long *offset)
+struct cpio_data __cpuinit find_cpio_data_map(const char *path, void *data,
+					  size_t len,  long *offset,
+					  void *(*map)(unsigned long, unsigned long),
+					  void (*unmap)(void *, unsigned long),
+					  unsigned long map_size)
 {
 	const size_t cpio_header_len = 8*C_NFIELDS - 2;
 	struct cpio_data cd = { NULL, 0, "" };
-	const char *p, *dptr, *nptr;
+	const char *p;
 	unsigned int ch[C_NFIELDS], *chp, v;
 	unsigned char c, x;
 	size_t mypathsize = strlen(path);
 	int i, j;
-
-	p = data;
+	unsigned long addr, off, limit = map_size;
+	unsigned long dptr, nptr;
+	char *p_start;
+
+	addr = (unsigned long)data;
+	p = p_start = map(addr, map_size);
+	off = 0;
 
 	while (len > cpio_header_len) {
 		if (!*p) {
 			/* All cpio headers need to be 4-byte aligned */
-			p += 4;
+			addr += 4;
+			off += 4;
 			len -= 4;
+			if (off < limit)
+				p += 4;
+			else {
+				unmap(p_start, map_size);
+				p = p_start = map(addr, map_size);
+				off = 0;
+			}
 			continue;
 		}
 
@@ -90,7 +114,16 @@ struct cpio_data __cpuinit find_cpio_dat
 			v = 0;
 			while (j--) {
 				v <<= 4;
-				c = *p++;
+				c = *p;
+				addr++;
+				off++;
+				if (off < limit)
+					p++;
+				else {
+					unmap(p_start, map_size);
+					p = p_start = map(addr, map_size);
+					off = 0;
+				}
 
 				x = c - '0';
 				if (x < 10) {
@@ -115,31 +148,56 @@ struct cpio_data __cpuinit find_cpio_dat
 
 		len -= cpio_header_len;
 
-		dptr = PTR_ALIGN(p + ch[C_NAMESIZE], 4);
-		nptr = PTR_ALIGN(dptr + ch[C_FILESIZE], 4);
+		dptr = ALIGN(addr + ch[C_NAMESIZE], 4);
+		nptr = ALIGN(dptr + ch[C_FILESIZE], 4);
 
-		if (nptr > p + len || dptr < p || nptr < dptr)
+		if (nptr > addr + len || dptr < addr || nptr < dptr)
 			goto quit; /* Buffer overrun */
 
 		if ((ch[C_MODE] & 0170000) == 0100000 &&
-		    ch[C_NAMESIZE] >= mypathsize &&
-		    !memcmp(p, path, mypathsize)) {
-			*offset = (long)nptr - (long)data;
-			if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
-				pr_warn(
-				"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",
-				p, MAX_CPIO_FILE_NAME);
+		    ch[C_NAMESIZE] >= mypathsize) {
+			if (off + mypathsize > limit) {
+				unmap(p_start, map_size);
+				p = p_start = map(addr, map_size);
+				off = 0;
 			}
-			strlcpy(cd.name, p + mypathsize, MAX_CPIO_FILE_NAME);
+			if (!memcmp(p, path, mypathsize)) {
+				*offset = (long)nptr - (long)data;
+				if (ch[C_NAMESIZE] - mypathsize >= MAX_CPIO_FILE_NAME) {
+					pr_warn(
+					"File %s exceeding MAX_CPIO_FILE_NAME [%d]\n",
+					p, MAX_CPIO_FILE_NAME);
+				}
+				if (off + mypathsize + MAX_CPIO_FILE_NAME > limit) {
+					unmap(p_start, map_size);
+					p = p_start = map(addr, map_size);
+					off = 0;
+				}
+				strlcpy(cd.name, p + mypathsize, MAX_CPIO_FILE_NAME);
 
-			cd.data = (void *)dptr;
-			cd.size = ch[C_FILESIZE];
-			return cd; /* Found it! */
+				cd.data = (void *)dptr;
+				cd.size = ch[C_FILESIZE];
+				unmap(p_start, map_size);
+				return cd; /* Found it! */
+			}
 		}
-		len -= (nptr - p);
-		p = nptr;
+		len -= (nptr - addr);
+		if (nptr - addr >= limit) {
+			addr = nptr;
+			unmap(p_start, map_size);
+			p = p_start = map(addr, map_size);
+			off = 0;
+		} else
+			addr = nptr;
 	}
 
 quit:
+	unmap(p_start, map_size);
 	return cd;
 }
+
+struct cpio_data __cpuinit find_cpio_data(const char *path, void *data,
+					  size_t len,  long *offset)
+{
+	return find_cpio_data_map(path, data, len, offset, map, unmap, len);
+}
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -294,6 +294,14 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
+#ifdef CONFIG_MICROCODE_INTEL_EARLY
+void update_mc_saved_data(unsigned long pa_offset);
+#else
+static inline void update_mc_saved_data(unsigned long pa_offset)
+{
+}
+#endif
+
 static u64 __init get_ramdisk_image(void)
 {
 	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
@@ -357,6 +365,8 @@ static void __init relocate_initrd(void)
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
 		ramdisk_here, ramdisk_here + ramdisk_size - 1);
+
+	update_mc_saved_data(ramdisk_here - ramdisk_image);
 }
 
 static u64 __init get_mem_size(unsigned long limit_pfn)
@@ -414,6 +424,7 @@ static void __init reserve_initrd(void)
 		/* All are mapped, easy case */
 		initrd_start = ramdisk_image + PAGE_OFFSET;
 		initrd_end = initrd_start + ramdisk_size;
+		update_mc_saved_data(0);
 		return;
 	}
 

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-12 13:38                                         ` Borislav Petkov
  2012-12-12 17:43                                           ` H. Peter Anvin
@ 2012-12-13  5:12                                           ` H. Peter Anvin
  2012-12-13  5:26                                             ` H. Peter Anvin
  1 sibling, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-13  5:12 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Yu, Fenghua, mingo, linux-kernel,
	tglx, hpa, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 350 bytes --]

Here is a version that compiles.  It doesn't *boot* yet, because the 
switchover from dynamic mode to the real pagetables doesn't happen right 
and so we end up on an uninitialized set of page tables.

The new page table setup in tip:x86/mm2 should make that easier to 
achieve, however... I won't have time to test this out tonight, though.

	-hpa
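
An illustrative sketch, not part of the attached diff: the "dynamic mode"
mentioned above builds mappings on demand from the page fault handler,
and the first step is to slice the faulting virtual address into table
indices. The shift values and the 512-entry table size below are the
standard x86-64 4-level paging constants and are assumed here rather
than copied from the patch; `struct pt_index` and `split_vaddr()` are
hypothetical names used only for this example.

```c
#include <stdint.h>

/* Standard x86-64 4-level paging constants (assumed, not from the patch). */
#define PMD_SHIFT    21
#define PUD_SHIFT    30
#define PGDIR_SHIFT  39
#define PTRS_PER_TBL 512	/* entries per table at every level */

/* Hypothetical helper type: the three indices a 2 MiB mapping needs. */
struct pt_index {
	unsigned int pgd;	/* slot in the top-level table (512 GiB each) */
	unsigned int pud;	/* slot in the next level (1 GiB each) */
	unsigned int pmd;	/* slot in the 2 MiB-granular table */
};

/* Split a virtual address into its page-table indices. */
static struct pt_index split_vaddr(uint64_t vaddr)
{
	struct pt_index ix;

	ix.pgd = (vaddr >> PGDIR_SHIFT) & (PTRS_PER_TBL - 1);
	ix.pud = (vaddr >> PUD_SHIFT) & (PTRS_PER_TBL - 1);
	ix.pmd = (vaddr >> PMD_SHIFT) & (PTRS_PER_TBL - 1);
	return ix;
}
```

With these constants, 0xffffffff80000000 (the kernel text mapping base)
splits into PGD slot 511 and PUD slot 510, which lines up with the
(510*8) offset applied to level3_kernel_pgt in the diff.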


[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 14241 bytes --]

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..2d88344 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include <asm/sparsemem.h>
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES	64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..9443c77 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,11 +26,73 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2, early_pgt_resets = 0;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-	pgd_t *pgd = pgd_offset_k(0UL);
-	pgd_clear(pgd);
-	__flush_tlb_all();
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PGD-1; i++)
+		early_level4_pgt[i].pgd = 0;
+
+	next_early_pgt = 0;
+	early_pgt_resets++;
+
+	__native_flush_tlb();
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	unsigned long i;
+	pgdval_t pgd, *pgd_p;
+	pudval_t *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	if (physaddr >= MAXMEM)
+		return -1;	/* Invalid address - puke */
+
+	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+	pgd_p = &early_level4_pgt[i].pgd;
+	pgd = *pgd_p;
+
+	/*
+	 * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+	 * critical -- __PAGE_OFFSET would point us back into the dynamic
+	 * range and we might end up looping forever...
+	 */
+	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map);
+	} else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+			reset_early_page_tables();
+
+		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			pud_p[i] = 0;
+
+		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + _KERNPG_TABLE;
+	}
+	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+	pud_p += i;
+
+	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_p[i] = pmd;
+		pmd += PMD_SIZE;
+	}
+
+	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + _KERNPG_TABLE;
+
+	return 0;
 }
 
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
@@ -70,12 +132,13 @@ void __init x86_64_start_kernel(char * real_mode_data)
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+	/* Kill off the identity-map trampoline */
+	reset_early_page_tables();
+
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* Make NULL pointers segfault */
-	zap_identity_mappings();
-
+	/* XXX - this is wrong... we need to build page tables from scratch */
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..0e040b3 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -47,14 +47,13 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map)
 	.code64
 	.globl startup_64
 startup_64:
-
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded an identity mapped page table
 	 * for us.  These identity mapped page tables map all of the
 	 * kernel pages and possibly all of memory.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either directly from a 64bit bootloader, or from
 	 * arch/x86_64/boot/compressed/head.S.
@@ -66,7 +65,8 @@ startup_64:
 	 * tables and then reload them.
 	 */
 
-	/* Compute the delta between the address I am compiled to run at and the
+	/*
+	 * Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
 	leaq	_text(%rip), %rbp
@@ -78,53 +78,66 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
-	/* Fixup the physical addresses in the page table
+	/*
+	 * Is the address too large?
 	 */
-	addq	%rbp, init_level4_pgt + 0(%rip)
-	addq	%rbp, init_level4_pgt + (L4_PAGE_OFFSET*8)(%rip)
-	addq	%rbp, init_level4_pgt + (L4_START_KERNEL*8)(%rip)
+	leaq	_text(%rip), %rax
+	shrq	$MAX_PHYSMEM_BITS, %rax
+	jnz	bad_address
 
-	addq	%rbp, level3_ident_pgt + 0(%rip)
+	/*
+	 * Fixup the physical addresses in the page table
+	 */
+	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
 
 	addq	%rbp, level3_kernel_pgt + (510*8)(%rip)
 	addq	%rbp, level3_kernel_pgt + (511*8)(%rip)
 
 	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-	/* Add an Identity mapping if I am above 1G */
+	/*
+	 * Set up the identity mapping for the switchover.  These
+	 * entries should *NOT* have the global bit set!  This also
+	 * creates a bunch of nonsense entries but that is fine --
+	 * it avoids problems around wraparound.
+	 */
 	leaq	_text(%rip), %rdi
-	andq	$PMD_PAGE_MASK, %rdi
+	leaq	early_level4_pgt(%rip), %rbx
 
 	movq	%rdi, %rax
-	shrq	$PUD_SHIFT, %rax
-	andq	$(PTRS_PER_PUD - 1), %rax
-	jz	ident_complete
+	shrq	$PGDIR_SHIFT, %rax
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
-	leaq	level3_ident_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	leaq	(4096 + _KERNPG_TABLE)(%rbx), %rdx
+	movq	%rdx, 0(%rbx,%rax,8)
+	movq	%rdx, 8(%rbx,%rax,8)
+
+	addq	$4096, %rdx
+	movq	%rdi, %rax
+	shrq	$PUD_SHIFT, %rax
+	andl	$(PTRS_PER_PUD-1), %eax
+	movq	%rdx, (4096+0)(%rbx,%rax,8)
+	movq	%rdx, (4096+8)(%rbx,%rax,8)
 
+	addq	$8192, %rbx
 	movq	%rdi, %rax
-	shrq	$PMD_SHIFT, %rax
-	andq	$(PTRS_PER_PMD - 1), %rax
-	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
-	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
-ident_complete:
+	shrq	$PMD_SHIFT, %rdi
+	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+	movl	$PTRS_PER_PMD, %ecx
 
+1:
+	andq	$(PTRS_PER_PMD - 1), %rdi
+	movq	%rax, (%rbx,%rdi,8)
+	incq	%rdi
+	addq	$PMD_SIZE, %rax
+	decl	%ecx
+	jnz	1b
+	
 	/*
 	 * Fixup the kernel text+data virtual addresses. Note that
 	 * we might write invalid pmds, when the kernel is relocated
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
 	 */
-
 	leaq	level2_kernel_pgt(%rip), %rdi
 	leaq	4096(%rdi), %r8
 	/* See if it is a valid page table entry */
@@ -149,7 +162,7 @@ ENTRY(secondary_startup_64)
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded a mapped page table.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either from startup_64 (using physical addresses)
 	 * or from trampoline.S (using virtual addresses).
@@ -196,7 +209,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip),%rsp
+	movq stack_start(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -236,31 +249,31 @@ ENTRY(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr	
 
-	/* esi is pointer to real mode structure with interesting info.
+	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
-	movl	%esi, %edi
+	movq	%rsi, %rdi
 	
 	/* Finally jump to run C code and to be on real kernel address
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
-	 * a far return.
+	 * a far jump.
 	 */
-	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
+	/* gas 2.22 is buggy and mis-assembles ljmpq */
+	rex64 ljmp *initial_code(%rip)
 
 	/* SMP bootup changes these two */
 	__REFDATA
-	.align	8
-	ENTRY(initial_code)
+	.balign	8
+	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
-	ENTRY(initial_gs)
+	.word	__KERNEL_CS
+	.balign	8
+	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 
-	ENTRY(stack_start)
+	GLOBAL(stack_start)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
@@ -268,7 +281,7 @@ ENTRY(secondary_startup_64)
 bad_address:
 	jmp bad_address
 
-	.section ".init.text","ax"
+	__INIT
 	.globl early_idt_handlers
 early_idt_handlers:
 	# 104(%rsp) %rflags
@@ -305,14 +318,22 @@ ENTRY(early_idt_handler)
 	pushq %r11		#  0(%rsp)
 
 	cmpl $__KERNEL_CS,96(%rsp)
-	jne 10f
+	jne 11f
 
+	cmpl $14,72(%rsp)	# Page fault?
+	jnz 10f
+	GET_CR2_INTO(%rdi)	# can clobber any volatile register if pv
+	call early_make_pgtable
+	andl %eax,%eax
+	jz 20f			# All good
+
+10:
 	leaq 88(%rsp),%rdi	# Pointer to %rip
 	call early_fixup_exception
 	andl %eax,%eax
 	jnz 20f			# Found an exception entry
 
-10:
+11:
 #ifdef CONFIG_EARLY_PRINTK
 	GET_CR2_INTO(%r9)	# can clobber any volatile register if pv
 	movl 80(%rsp),%r8d	# error code
@@ -334,7 +355,7 @@ ENTRY(early_idt_handler)
 1:	hlt
 	jmp 1b
 
-20:	# Exception table entry found
+20:	# Exception table entry found or page table generated
 	popq %r11
 	popq %r10
 	popq %r9
@@ -348,6 +369,8 @@ ENTRY(early_idt_handler)
 	decl early_recursion_flag(%rip)
 	INTERRUPT_RETURN
 
+	__INITDATA
+	
 	.balign 4
 early_recursion_flag:
 	.long 0
@@ -358,11 +381,10 @@ early_idt_msg:
 early_idt_ripmsg:
 	.asciz "RIP %s\n"
 #endif /* CONFIG_EARLY_PRINTK */
-	.previous
 
 #define NEXT_PAGE(name) \
 	.balign	PAGE_SIZE; \
-ENTRY(name)
+GLOBAL(name)
 
 /* Automate the creation of 1 to 1 mapping pmd entries */
 #define PMDS(START, PERM, COUNT)			\
@@ -372,46 +394,21 @@ ENTRY(name)
 	i = i + 1 ;					\
 	.endr
 
-	.data
-	/*
-	 * This default setting generates an ident mapping at address 0x100000
-	 * and a mapping for the kernel that precisely maps virtual address
-	 * 0xffffffff80000000 to physical address 0x000000. (always using
-	 * 2Mbyte large pages provided by PAE mode)
-	 */
-NEXT_PAGE(init_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_PAGE_OFFSET*8, 0
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_START_KERNEL*8, 0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	__INITDATA
+NEXT_PAGE(early_level4_pgt)
+	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level3_ident_pgt)
-	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	511,8,0
+NEXT_PAGE(early_dynamic_pgts)
+	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
 
+	.data
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
 	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
-	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
-
-NEXT_PAGE(level1_fixmap_pgt)
-	.fill	512,8,0
-
-NEXT_PAGE(level2_ident_pgt)
-	/* Since I easily can, map the first 1G.
-	 * Don't set NX because code runs from these pages.
-	 */
-	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
-
 NEXT_PAGE(level2_kernel_pgt)
 	/*
 	 * 512 MB kernel mapping. We spend a full page on this pagetable
@@ -426,11 +423,16 @@ NEXT_PAGE(level2_kernel_pgt)
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
-NEXT_PAGE(level2_spare_pgt)
-	.fill   512, 8, 0
+NEXT_PAGE(level2_fixmap_pgt)
+	.fill	506,8,0
+	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
+	.fill	5,8,0
+
+NEXT_PAGE(level1_fixmap_pgt)
+	.fill	512,8,0
 
 #undef PMDS
-#undef NEXT_PAGE
 
 	.data
 	.align 16
@@ -456,6 +458,7 @@ ENTRY(nmi_idt_table)
 	.skip IDT_ENTRIES * 16
 
 	__PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
-ENTRY(empty_zero_page)
+NEXT_PAGE(empty_zero_page)
+	.skip PAGE_SIZE
+NEXT_PAGE(init_level4_pgt)
 	.skip PAGE_SIZE
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ca45696..e383050 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -911,7 +911,6 @@ void __init setup_arch(char **cmdline_p)
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
 
 	setup_real_mode();
-
 	init_gbpages();
 
 	/* max_pfn_mapped is updated here */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index cbca565..1650bf4 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -21,7 +21,7 @@ void __init setup_real_mode(void)
 	struct trampoline_header *trampoline_header;
 	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
 #ifdef CONFIG_X86_64
-	u64 *trampoline_pgd;
+	pgd_t *trampoline_pgd;
 	u64 efer;
 #endif
 
@@ -77,9 +77,17 @@ void __init setup_real_mode(void)
 	trampoline_cr4_features = &trampoline_header->cr4;
 	*trampoline_cr4_features = read_cr4();
 
-	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
-	trampoline_pgd[0] = __pa(level3_ident_pgt) + _KERNPG_TABLE;
-	trampoline_pgd[511] = __pa(level3_kernel_pgt) + _KERNPG_TABLE;
+	trampoline_pgd = (pgd_t *) __va(real_mode_header->trampoline_pgd);
+
+	/* Set up the identity map */
+	clone_pgd_range(trampoline_pgd,
+			init_level4_pgt + KERNEL_PGD_BOUNDARY,
+			MAXMEM >> PGDIR_SHIFT);
+
+	/* Set up the kernel map */
+	clone_pgd_range(trampoline_pgd  + KERNEL_PGD_BOUNDARY,
+			init_level4_pgt + KERNEL_PGD_BOUNDARY,
+			PTRS_PER_PGD - KERNEL_PGD_BOUNDARY);
 #endif
 }
 

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-13  5:12                                           ` H. Peter Anvin
@ 2012-12-13  5:26                                             ` H. Peter Anvin
  2012-12-13  7:01                                               ` Yinghai Lu
  2012-12-13 19:13                                               ` Borislav Petkov
  0 siblings, 2 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-13  5:26 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Yu, Fenghua, mingo, linux-kernel,
	tglx, hpa, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 607 bytes --]

On 12/12/2012 09:12 PM, H. Peter Anvin wrote:
> Here is a version that compiles.  It doesn't *boot* yet, because the
> switchover from dynamic mode to the real pagetables doesn't happen right
> and so we end up on an uninitialized set of page tables.
>
> The new page table setup in tip:x86/mm2 should make that easier to
> achieve, however... I won't have time to test this out tonight, though.
>
>      -hpa

Well, minus a simple brainfart now it actually gets into the page table 
setup.

	-hpa
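
An illustrative sketch, not part of the attached diff: the early page
tables in the patch come out of a small fixed pool (early_dynamic_pgts)
that is wiped and reused when it runs out. The code below is a
simplified userspace model of that policy, assuming a pool of 64 tables
of 512 entries; `pool_alloc()` and `pool_reset()` are hypothetical names,
and the logic is deliberately simpler than what the diff does. In the
kernel, a reset would also have to clear the top-level entries and flush
the TLB.

```c
#include <string.h>

#define POOL_TABLES   64	/* mirrors EARLY_DYNAMIC_PAGE_TABLES in the diff */
#define TABLE_ENTRIES 512

static unsigned long pool[POOL_TABLES][TABLE_ENTRIES];
static unsigned int next_table;		/* index of the next unused table */
static unsigned int pool_resets;	/* how many times the pool was wiped */

/* Forget every table handed out so far; mappings must be re-faulted in. */
static void pool_reset(void)
{
	next_table = 0;
	pool_resets++;
}

/*
 * Hand out `count` consecutive zeroed tables. If they do not all fit,
 * wipe the pool first so one mapping request never straddles a reset.
 */
static unsigned long *pool_alloc(unsigned int count)
{
	unsigned long *first;

	if (count == 0 || count > POOL_TABLES)
		return NULL;
	if (next_table + count > POOL_TABLES)
		pool_reset();

	first = pool[next_table];
	memset(first, 0, (size_t)count * sizeof(pool[0]));
	next_table += count;
	return first;
}
```

Requesting all the tables a mapping needs in one call keeps the check
and the allocation together, which avoids running off the end of the
pool between two separate allocations.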


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 15046 bytes --]

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..2d88344 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include <asm/sparsemem.h>
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES	64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..9443c77 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,11 +26,73 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2, early_pgt_resets = 0;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-	pgd_t *pgd = pgd_offset_k(0UL);
-	pgd_clear(pgd);
-	__flush_tlb_all();
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PGD-1; i++)
+		early_level4_pgt[i].pgd = 0;
+
+	next_early_pgt = 0;
+	early_pgt_resets++;
+
+	__native_flush_tlb();
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	unsigned long i;
+	pgdval_t pgd, *pgd_p;
+	pudval_t *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	if (physaddr >= MAXMEM)
+		return -1;	/* Invalid address - puke */
+
+	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+	pgd_p = &early_level4_pgt[i].pgd;
+	pgd = *pgd_p;
+
+	/*
+	 * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+	 * critical -- __PAGE_OFFSET would point us back into the dynamic
+	 * range and we might end up looping forever...
+	 */
+	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map);
+	} else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+			reset_early_page_tables();
+
+		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			pud_p[i] = 0;
+
+		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + _KERNPG_TABLE;
+	}
+	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+	pud_p += i;
+
+	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_p[i] = pmd;
+		pmd += PMD_SIZE;
+	}
+
+	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + _KERNPG_TABLE;
+
+	return 0;
 }
 
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
@@ -70,12 +132,13 @@ void __init x86_64_start_kernel(char * real_mode_data)
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+	/* Kill off the identity-map trampoline */
+	reset_early_page_tables();
+
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* Make NULL pointers segfault */
-	zap_identity_mappings();
-
+	/* XXX - this is wrong... we need to build page tables from scratch */
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..d539692 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -47,14 +47,13 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map)
 	.code64
 	.globl startup_64
 startup_64:
-
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded an identity mapped page table
 	 * for us.  These identity mapped page tables map all of the
 	 * kernel pages and possibly all of memory.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either directly from a 64bit bootloader, or from
 	 * arch/x86_64/boot/compressed/head.S.
@@ -66,7 +65,8 @@ startup_64:
 	 * tables and then reload them.
 	 */
 
-	/* Compute the delta between the address I am compiled to run at and the
+	/*
+	 * Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
 	leaq	_text(%rip), %rbp
@@ -78,53 +78,66 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
-	/* Fixup the physical addresses in the page table
+	/*
+	 * Is the address too large?
 	 */
-	addq	%rbp, init_level4_pgt + 0(%rip)
-	addq	%rbp, init_level4_pgt + (L4_PAGE_OFFSET*8)(%rip)
-	addq	%rbp, init_level4_pgt + (L4_START_KERNEL*8)(%rip)
+	leaq	_text(%rip), %rax
+	shrq	$MAX_PHYSMEM_BITS, %rax
+	jnz	bad_address
 
-	addq	%rbp, level3_ident_pgt + 0(%rip)
+	/*
+	 * Fixup the physical addresses in the page table
+	 */
+	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
 
 	addq	%rbp, level3_kernel_pgt + (510*8)(%rip)
 	addq	%rbp, level3_kernel_pgt + (511*8)(%rip)
 
 	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-	/* Add an Identity mapping if I am above 1G */
+	/*
+	 * Set up the identity mapping for the switchover.  These
+	 * entries should *NOT* have the global bit set!  This also
+	 * creates a bunch of nonsense entries but that is fine --
+	 * it avoids problems around wraparound.
+	 */
 	leaq	_text(%rip), %rdi
-	andq	$PMD_PAGE_MASK, %rdi
+	leaq	early_level4_pgt(%rip), %rbx
 
 	movq	%rdi, %rax
-	shrq	$PUD_SHIFT, %rax
-	andq	$(PTRS_PER_PUD - 1), %rax
-	jz	ident_complete
+	shrq	$PGDIR_SHIFT, %rax
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
-	leaq	level3_ident_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	leaq	(4096 + _KERNPG_TABLE)(%rbx), %rdx
+	movq	%rdx, 0(%rbx,%rax,8)
+	movq	%rdx, 8(%rbx,%rax,8)
 
+	addq	$4096, %rdx
 	movq	%rdi, %rax
-	shrq	$PMD_SHIFT, %rax
-	andq	$(PTRS_PER_PMD - 1), %rax
-	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
-	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
-ident_complete:
+	shrq	$PUD_SHIFT, %rax
+	andl	$(PTRS_PER_PUD-1), %eax
+	movq	%rdx, (4096+0)(%rbx,%rax,8)
+	movq	%rdx, (4096+8)(%rbx,%rax,8)
 
+	addq	$8192, %rbx
+	movq	%rdi, %rax
+	shrq	$PMD_SHIFT, %rdi
+	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+	movl	$PTRS_PER_PMD, %ecx
+
+1:
+	andq	$(PTRS_PER_PMD - 1), %rdi
+	movq	%rax, (%rbx,%rdi,8)
+	incq	%rdi
+	addq	$PMD_SIZE, %rax
+	decl	%ecx
+	jnz	1b
+	
 	/*
 	 * Fixup the kernel text+data virtual addresses. Note that
 	 * we might write invalid pmds, when the kernel is relocated
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
 	 */
-
 	leaq	level2_kernel_pgt(%rip), %rdi
 	leaq	4096(%rdi), %r8
 	/* See if it is a valid page table entry */
@@ -139,17 +152,14 @@ ident_complete:
 	/* Fixup phys_base */
 	addq	%rbp, phys_base(%rip)
 
-	/* Due to ENTRY(), sometimes the empty space gets filled with
-	 * zeros. Better take a jmp than relying on empty space being
-	 * filled with 0x90 (nop)
-	 */
-	jmp secondary_startup_64
+	movq	$(early_level4_pgt - __START_KERNEL_map), %rax
+	jmp 1f
 ENTRY(secondary_startup_64)
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded a mapped page table.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either from startup_64 (using physical addresses)
 	 * or from trampoline.S (using virtual addresses).
@@ -159,12 +169,14 @@ ENTRY(secondary_startup_64)
 	 * after the boot processor executes this code.
 	 */
 
+	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+1:
+
 	/* Enable PAE mode and PGE */
-	movl	$(X86_CR4_PAE | X86_CR4_PGE), %eax
-	movq	%rax, %cr4
+	movl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
+	movq	%rcx, %cr4
 
 	/* Setup early boot stage 4 level pagetables. */
-	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	addq	phys_base(%rip), %rax
 	movq	%rax, %cr3
 
@@ -196,7 +208,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip),%rsp
+	movq stack_start(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -236,31 +248,31 @@ ENTRY(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr	
 
-	/* esi is pointer to real mode structure with interesting info.
+	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
-	movl	%esi, %edi
+	movq	%rsi, %rdi
 	
 	/* Finally jump to run C code and to be on real kernel address
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
-	 * a far return.
+	 * a far jump.
 	 */
-	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
+	/* gas 2.22 is buggy and mis-assembles ljmpq */
+	rex64 ljmp *initial_code(%rip)
 
 	/* SMP bootup changes these two */
 	__REFDATA
-	.align	8
-	ENTRY(initial_code)
+	.balign	8
+	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
-	ENTRY(initial_gs)
+	.word	__KERNEL_CS
+	.balign	8
+	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 
-	ENTRY(stack_start)
+	GLOBAL(stack_start)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
@@ -268,7 +280,7 @@ ENTRY(secondary_startup_64)
 bad_address:
 	jmp bad_address
 
-	.section ".init.text","ax"
+	__INIT
 	.globl early_idt_handlers
 early_idt_handlers:
 	# 104(%rsp) %rflags
@@ -305,14 +317,22 @@ ENTRY(early_idt_handler)
 	pushq %r11		#  0(%rsp)
 
 	cmpl $__KERNEL_CS,96(%rsp)
-	jne 10f
+	jne 11f
 
+	cmpl $14,72(%rsp)	# Page fault?
+	jnz 10f
+	GET_CR2_INTO(%rdi)	# can clobber any volatile register if pv
+	call early_make_pgtable
+	andl %eax,%eax
+	jz 20f			# All good
+
+10:
 	leaq 88(%rsp),%rdi	# Pointer to %rip
 	call early_fixup_exception
 	andl %eax,%eax
 	jnz 20f			# Found an exception entry
 
-10:
+11:
 #ifdef CONFIG_EARLY_PRINTK
 	GET_CR2_INTO(%r9)	# can clobber any volatile register if pv
 	movl 80(%rsp),%r8d	# error code
@@ -334,7 +354,7 @@ ENTRY(early_idt_handler)
 1:	hlt
 	jmp 1b
 
-20:	# Exception table entry found
+20:	# Exception table entry found or page table generated
 	popq %r11
 	popq %r10
 	popq %r9
@@ -348,6 +368,8 @@ ENTRY(early_idt_handler)
 	decl early_recursion_flag(%rip)
 	INTERRUPT_RETURN
 
+	__INITDATA
+	
 	.balign 4
 early_recursion_flag:
 	.long 0
@@ -358,11 +380,10 @@ early_idt_msg:
 early_idt_ripmsg:
 	.asciz "RIP %s\n"
 #endif /* CONFIG_EARLY_PRINTK */
-	.previous
 
 #define NEXT_PAGE(name) \
 	.balign	PAGE_SIZE; \
-ENTRY(name)
+GLOBAL(name)
 
 /* Automate the creation of 1 to 1 mapping pmd entries */
 #define PMDS(START, PERM, COUNT)			\
@@ -372,46 +393,21 @@ ENTRY(name)
 	i = i + 1 ;					\
 	.endr
 
-	.data
-	/*
-	 * This default setting generates an ident mapping at address 0x100000
-	 * and a mapping for the kernel that precisely maps virtual address
-	 * 0xffffffff80000000 to physical address 0x000000. (always using
-	 * 2Mbyte large pages provided by PAE mode)
-	 */
-NEXT_PAGE(init_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_PAGE_OFFSET*8, 0
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_START_KERNEL*8, 0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	__INITDATA
+NEXT_PAGE(early_level4_pgt)
+	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level3_ident_pgt)
-	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	511,8,0
+NEXT_PAGE(early_dynamic_pgts)
+	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
 
+	.data
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
 	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
-	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
-
-NEXT_PAGE(level1_fixmap_pgt)
-	.fill	512,8,0
-
-NEXT_PAGE(level2_ident_pgt)
-	/* Since I easily can, map the first 1G.
-	 * Don't set NX because code runs from these pages.
-	 */
-	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
-
 NEXT_PAGE(level2_kernel_pgt)
 	/*
 	 * 512 MB kernel mapping. We spend a full page on this pagetable
@@ -426,11 +422,16 @@ NEXT_PAGE(level2_kernel_pgt)
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
-NEXT_PAGE(level2_spare_pgt)
-	.fill   512, 8, 0
+NEXT_PAGE(level2_fixmap_pgt)
+	.fill	506,8,0
+	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
+	.fill	5,8,0
+
+NEXT_PAGE(level1_fixmap_pgt)
+	.fill	512,8,0
 
 #undef PMDS
-#undef NEXT_PAGE
 
 	.data
 	.align 16
@@ -456,6 +457,7 @@ ENTRY(nmi_idt_table)
 	.skip IDT_ENTRIES * 16
 
 	__PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
-ENTRY(empty_zero_page)
+NEXT_PAGE(empty_zero_page)
+	.skip PAGE_SIZE
+NEXT_PAGE(init_level4_pgt)
 	.skip PAGE_SIZE
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ca45696..e383050 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -911,7 +911,6 @@ void __init setup_arch(char **cmdline_p)
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
 
 	setup_real_mode();
-
 	init_gbpages();
 
 	/* max_pfn_mapped is updated here */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index cbca565..1650bf4 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -21,7 +21,7 @@ void __init setup_real_mode(void)
 	struct trampoline_header *trampoline_header;
 	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
 #ifdef CONFIG_X86_64
-	u64 *trampoline_pgd;
+	pgd_t *trampoline_pgd;
 	u64 efer;
 #endif
 
@@ -77,9 +77,17 @@ void __init setup_real_mode(void)
 	trampoline_cr4_features = &trampoline_header->cr4;
 	*trampoline_cr4_features = read_cr4();
 
-	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
-	trampoline_pgd[0] = __pa(level3_ident_pgt) + _KERNPG_TABLE;
-	trampoline_pgd[511] = __pa(level3_kernel_pgt) + _KERNPG_TABLE;
+	trampoline_pgd = (pgd_t *) __va(real_mode_header->trampoline_pgd);
+
+	/* Set up the identity map */
+	clone_pgd_range(trampoline_pgd,
+			init_level4_pgt + KERNEL_PGD_BOUNDARY,
+			MAXMEM >> PGDIR_SHIFT);
+
+	/* Set up the kernel map */
+	clone_pgd_range(trampoline_pgd  + KERNEL_PGD_BOUNDARY,
+			init_level4_pgt + KERNEL_PGD_BOUNDARY,
+			PTRS_PER_PGD - KERNEL_PGD_BOUNDARY);
 #endif
 }
 

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-13  5:26                                             ` H. Peter Anvin
@ 2012-12-13  7:01                                               ` Yinghai Lu
  2012-12-13 15:01                                                 ` H. Peter Anvin
  2012-12-13 19:13                                               ` Borislav Petkov
  1 sibling, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-13  7:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On Wed, Dec 12, 2012 at 9:26 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> The new page table setup in tip:x86/mm2 should make that easier to
>> achieve, however... I won't have time to test this out tonight, though.
>>
>>      -hpa
>
>
> Well, minus a simple brainfart now it actually gets into the page table
> setup.

of init_mem_mapping in setup_arch?


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-13  7:01                                               ` Yinghai Lu
@ 2012-12-13 15:01                                                 ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-13 15:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/12/2012 11:01 PM, Yinghai Lu wrote:
> On Wed, Dec 12, 2012 at 9:26 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> The new page table setup in tip:x86/mm2 should make that easier to
>>> achieve, however... I won't have time to test this out tonight, though.
>>>
>>>      -hpa
>>
>>
>> Well, minus a simple brainfart now it actually gets into the page table
>> setup.
> 
> of init_mem_mapping in setup_arch?
> 

Probably, yes... it then gets confused, which is not surprising since it
is expecting to do an incremental build instead of building from the
ground up.

	-hpa



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-13  5:26                                             ` H. Peter Anvin
  2012-12-13  7:01                                               ` Yinghai Lu
@ 2012-12-13 19:13                                               ` Borislav Petkov
  2012-12-13 21:36                                                 ` H. Peter Anvin
  1 sibling, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-13 19:13 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On Wed, Dec 12, 2012 at 09:26:47PM -0800, H. Peter Anvin wrote:
> On 12/12/2012 09:12 PM, H. Peter Anvin wrote:
> >Here is a version that compiles.  It doesn't *boot* yet, because the
> >switchover from dynamic mode to the real pagetables doesn't happen right
> >and so we end up on an uninitialized set of page tables.
> >
> >The new page table setup in tip:x86/mm2 should make that easier to
> >achieve, however... I won't have time to test this out tonight, though.
> >
> >     -hpa
> 
> Well, minus a simple brainfart now it actually gets into the page
> table setup.

If by this you mean that you can only see

"Decompressing Linux...
Parsing ELF.."

and then the VM reboots (running it in KVM) then ok, I'm seeing this
here too.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-13 19:13                                               ` Borislav Petkov
@ 2012-12-13 21:36                                                 ` H. Peter Anvin
  2012-12-14  9:11                                                   ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-13 21:36 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Yu, Fenghua, mingo, linux-kernel,
	tglx, hpa, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/13/2012 11:13 AM, Borislav Petkov wrote:
> On Wed, Dec 12, 2012 at 09:26:47PM -0800, H. Peter Anvin wrote:
>>
>> Well, minus a simple brainfart now it actually gets into the page
>> table setup.
>
> If by this you mean that you can only see
>
> "Decompressing Linux...
> Parsing ELF.."
>
> and then the VM reboots (running it in KVM) then ok, I'm seeing this
> here too.
>

: tazenda 111 ; qemu-kvm -smp 2 -m 2048 -hda ~/qemu/fc10/qemu-fc10-64.img -serial stdio -kernel o.x86_64/arch/x86/boot/bzImage -append 'ro root=/dev/sda1 console=ttyS0 earlyprintk=serial,ttyS0 debug'
early console in setup code
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.7.0+ (hpa@tazenda.hos.anvin.org) (gcc version 4.7.2 20120921 (Red Hat 4.7.2-2) (GCC) ) #16 SMP Wed Dec 12 21:24:54 PST 2012
[    0.000000] Command line: ro root=/dev/sda1 console=ttyS0 earlyprintk=serial,ttyS0 debug
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007fffdfff] usable
[    0.000000] BIOS-e820: [mem 0x000000007fffe000-0x000000007fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] bootconsole [earlyser0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.4 present.
[    0.000000] DMI: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.000000] e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] No AGP bridge found
[    0.000000] e820: last_pfn = 0x7fffe max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 00E0000000 mask FFE0000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] PAT not supported by CPU.
[    0.000000] found SMP MP-table at [mem 0x000fdae0-0x000fdaef] mapped at [ffff8800000fdae0]
[    0.000000] initial memory mapped: [mem 0x00000000-0xffffffffffffffff]
[    0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[    0.000000] init_memory_mapping: [mem 0x00000000-0x7fffdfff]
[    0.000000]  [mem 0x00000000-0x7fdfffff] page 2M
[    0.000000]  [mem 0x7fe00000-0x7fffdfff] page 4k
[    0.000000] Kernel panic - not syncing: Cannot find space for the kernel page tables
[    0.000000] Pid: 0, comm: swapper Not tainted 3.7.0+ #16
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff817f0d2e>] panic+0xb6/0x1b5
[    0.000000]  [<ffffffff817e3801>] init_memory_mapping+0x471/0x5a0
[    0.000000]  [<ffffffff81ecd37f>] setup_arch+0x65c/0xb71
[    0.000000]  [<ffffffff81ec998e>] start_kernel+0x8a/0x348
[    0.000000]  [<ffffffff81ec9452>] x86_64_start_reservations+0x132/0x136
[    0.000000]  [<ffffffff81ec94fe>] x86_64_start_kernel+0xa8/0xad



-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-13 21:36                                                 ` H. Peter Anvin
@ 2012-12-14  9:11                                                   ` Yinghai Lu
  2012-12-14 18:16                                                     ` H. Peter Anvin
  2012-12-14 19:46                                                     ` H. Peter Anvin
  0 siblings, 2 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-14  9:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 160547 bytes --]

On Thu, Dec 13, 2012 at 1:36 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> : tazenda 111 ; qemu-kvm -smp 2 -m 2048 -hda ~/qemu/fc10/qemu-fc10-64.img
> -serial stdio -kernel o.x86_64/arch/x86/boot/bzImage -append 'ro
> root=/dev/sda1 console=ttyS0 earlyprintk=serial,ttyS0 debug'
> early console in setup code
> early console in decompress_kernel
>
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x7fffdfff]
> [    0.000000]  [mem 0x00000000-0x7fdfffff] page 2M
> [    0.000000]  [mem 0x7fe00000-0x7fffdfff] page 4k
> [    0.000000] Kernel panic - not syncing: Cannot find space for the kernel
> page tables
> [    0.000000] Pid: 0, comm: swapper Not tainted 3.7.0+ #16
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff817f0d2e>] panic+0xb6/0x1b5
> [    0.000000]  [<ffffffff817e3801>] init_memory_mapping+0x471/0x5a0
> [    0.000000]  [<ffffffff81ecd37f>] setup_arch+0x65c/0xb71
> [    0.000000]  [<ffffffff81ec998e>] start_kernel+0x8a/0x348
> [    0.000000]  [<ffffffff81ec9452>] x86_64_start_reservations+0x132/0x136
> [    0.000000]  [<ffffffff81ec94fe>] x86_64_start_kernel+0xa8/0xad

The attached patch works for me locally on KVM, but SMP does not work yet.

It applies on top of Linus's tree + tip:x86/mm2.

I added a mapping for the kernel to init_level2_mapping using BRK while
the #PF handler still works (that is, before early_trap_init), and I copy
the entries into init_level2_page from early_level4_pgt.
I also split setup_real_mode into a reserve step and a copy step. The copy
needs to happen after init_mem_mapping on systems with more than 512G of
RAM.
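
To illustrate the 512G remark: assuming x86-64 4-level paging, one
top-level (PGD) entry spans 2^39 bytes (512 GiB), so a machine with more
RAM than that needs more than one top-level entry for its direct mapping,
and those additional entries presumably exist only once init_mem_mapping
has run. The sketch below just does that arithmetic; the helper name is
hypothetical and the constant is an assumption, not something taken from
the attached patch.

```c
#include <stdint.h>

/* Assumed x86-64 4-level paging constant; not taken from the patch. */
#define PGDIR_SHIFT 39                           /* 2^39 bytes = 512 GiB */
#define PGDIR_SIZE  (UINT64_C(1) << PGDIR_SHIFT)

/*
 * Hypothetical helper: number of top-level entries needed to cover a
 * direct mapping of physical addresses [0, max_phys).
 */
static unsigned int pgd_entries_needed(uint64_t max_phys)
{
	return (unsigned int)((max_phys + PGDIR_SIZE - 1) >> PGDIR_SHIFT);
}
```

For the guest in the log below (RAM ending at 0x19fffffff) a single entry
is enough, so copying one entry early would happen to work there; anything
past 512 GiB needs at least two.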

My plan is to use this patch to replace

      [PATCH v6 06/27] x86, 64bit: Set extra ident mapping for whole kernel range

early console in setup code
Probing EDD (edd=off to disable)... ok
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[    0.000000] BRK [0x03a9f000, 0x03a9ffff] PGTABLE
[    0.000000] BRK [0x03aa0000, 0x03aa0fff] PGTABLE
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.7.0-yh-07359-ge48ae89-dirty (yhlu@linux-siqj.site) (gcc version 4.7.1 20120723 [gcc-4_7-branch revision 189773] (SUSE Linux) ) #912 SMP Fri Dec 14 00:56:36 PST 2012
[    0.000000] Command line: BOOT_IMAGE=linux debug ignore_loglevel initcall_debug pci=routeirq ramdisk_size=262144 root=/dev/ram0 rw ip=dhcp console=uart8250,io,0x3f8,115200 initrd=initrd.img
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000dfffdfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000dfffe000-0x00000000dfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000019fffffff] usable
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] Early serial console at I/O port 0x3f8 (options '115200')
[    0.000000] bootconsole [uart0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.4 present.
[    0.000000] DMI: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.000000] e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] No AGP bridge found
[    0.000000] e820: last_pfn = 0x1a0000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 00E0000000 mask FFE0000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] PAT not supported by CPU.
[    0.000000] e820: last_pfn = 0xdfffe max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [mem 0x000fda40-0x000fda4f] mapped at [ffff8800000fda40]
[    0.000000] initial memory mapped: [mem 0x00000000-0xffffffffffffffff]
[    0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000]  [mem 0x00000000-0x000fffff] page 4k
[    0.000000] BRK [0x03aa1000, 0x03aa1fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x19fe00000-0x19fffffff]
[    0.000000]  [mem 0x19fe00000-0x19fffffff] page 2M
[    0.000000] BRK [0x03aa2000, 0x03aa2fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x19c000000-0x19fdfffff]
[    0.000000]  [mem 0x19c000000-0x19fdfffff] page 2M
[    0.000000] init_memory_mapping: [mem 0x180000000-0x19bffffff]
[    0.000000]  [mem 0x180000000-0x19bffffff] page 2M
[    0.000000] init_memory_mapping: [mem 0x00100000-0xdfffdfff]
[    0.000000]  [mem 0x00100000-0x001fffff] page 4k
[    0.000000]  [mem 0x00200000-0xdfdfffff] page 2M
[    0.000000]  [mem 0xdfe00000-0xdfffdfff] page 4k
[    0.000000] BRK [0x03aa3000, 0x03aa3fff] PGTABLE
[    0.000000] BRK [0x03aa4000, 0x03aa4fff] PGTABLE
[    0.000000] BRK [0x03aa5000, 0x03aa5fff] PGTABLE
[    0.000000] BRK [0x03aa6000, 0x03aa6fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x100000000-0x17fffffff]
[    0.000000]  [mem 0x100000000-0x17fffffff] page 2M
[    0.000000] BRK [0x03aa7000, 0x03aa7fff] PGTABLE
[    0.000000] RAMDISK: [mem 0x7d9dc000-0x7fffefff]
[    0.000000] ACPI: RSDP 00000000000fd8c0 00014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 00000000dfffe430 00038 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 00000000dfffff80 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 00000000dfffe470 0124A (v01   BXPC   BXDSDT 00000001 INTL 20100528)
[    0.000000] ACPI: FACS 00000000dfffff40 00040
[    0.000000] ACPI: SSDT 00000000dffffe90 000AF (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 00000000dffffd70 00078 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 00000000dffffd30 00038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] ACPI: SSDT 00000000dffff6c0 0066E (v01   BXPC BXSSDTPC 00000001 INTL 20100528)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000019fffffff]
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x19fffffff]
[    0.000000]   NODE_DATA [mem 0x19ffd8000-0x19fffefff]
[    0.000000]  [ffffea0000000000-ffffea00067fffff] PMD -> [ffff880199600000-ffff88019f5fffff] on node 0
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x19fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00010000-0x0009efff]
[    0.000000]   node   0: [mem 0x00100000-0xdfffdfff]
[    0.000000]   node   0: [mem 0x100000000-0x19fffffff]
[    0.000000] On node 0 totalpages: 1572749
[    0.000000]   DMA zone: 64 pages used for memmap
[    0.000000]   DMA zone: 6 pages reserved
[    0.000000]   DMA zone: 3913 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 14272 pages used for memmap
[    0.000000]   DMA32 zone: 899134 pages, LIFO batch:31
[    0.000000]   Normal zone: 10240 pages used for memmap
[    0.000000]   Normal zone: 645120 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 40
[    0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
[    0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
[    0.000000] PM: Registered nosave memory: 00000000dfffe000 - 00000000e0000000
[    0.000000] PM: Registered nosave memory: 00000000e0000000 - 00000000feffc000
[    0.000000] PM: Registered nosave memory: 00000000feffc000 - 00000000ff000000
[    0.000000] PM: Registered nosave memory: 00000000ff000000 - 00000000fffc0000
[    0.000000] PM: Registered nosave memory: 00000000fffc0000 - 0000000100000000
[    0.000000] e820: [mem 0xe0000000-0xfeffbfff] available for PCI devices
[    0.000000] setup_percpu: NR_CPUS:4096 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 476 pages/cpu @ffff88019fc00000 s1918224 r8192 d23280 u2097152
[    0.000000] pcpu-alloc: s1918224 r8192 d23280 u2097152 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1548167
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=linux debug ignore_loglevel initcall_debug pci=routeirq ramdisk_size=262144 root=/dev/ram0 rw ip=dhcp console=uart8250,io,0x3f8,115200 initrd=initrd.img
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] __ex_table already sorted, skipping sort
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 6040796k/6815744k available (17505k kernel code, 524748k absent, 250200k reserved, 10054k data, 3444k init)
[    0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=4096 to nr_cpu_ids=1.
[    0.000000] NR_IRQS:262400 nr_irqs:256 16
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [ttyS0] enabled, bootconsole disabled
[    0.000000] console [ttyS0] enabled, bootconsole disabled
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8191
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
[    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
[    0.000000] ... CHAINHASH_SIZE:          16384
[    0.000000]  memory used by lock dependency info: 6367 kB
[    0.000000]  per task-struct memory footprint: 2688 bytes
[    0.000000] WARNING: lockdep init error! lock-init_mm.page_table_lock was acquiredbefore lockdep_init
[    0.000000] Call stack leading to lockdep invocation was:
[    0.000000]  [<ffffffff81059f8f>] save_stack_trace+0x2f/0x50
[    0.000000]  [<ffffffff810e966d>] __lock_acquire+0xed/0xc40
[    0.000000]  [<ffffffff810ea7ad>] lock_acquire+0x9d/0x120
[    0.000000]  [<ffffffff8210bbc6>] _raw_spin_lock+0x36/0x70
[    0.000000]  [<ffffffff820ee9df>] phys_pmd_init+0x188/0x264
[    0.000000]  [<ffffffff820eeca4>] phys_pud_init+0x1e9/0x298
[    0.000000]  [<ffffffff820eee91>] kernel_physical_mapping_init+0x13e/0x1eb
[    0.000000]  [<ffffffff82cc14d5>] x86_64_start_kernel+0xc4/0x106
[    0.000000]  [<ffffffffffffffff>] 0xffffffffffffffff
[    0.000000] allocated 25165824 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 2491.962 MHz processor
[    0.004002] Calibrating delay loop (skipped), value calculated using timer frequency.. 4983.92 BogoMIPS (lpj=9967848)
[    0.006588] pid_max: default: 32768 minimum: 301
[    0.008909] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.013872] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.016147] Mount-cache hash table entries: 256
[    0.017971] Initializing cgroup subsys cpuacct
[    0.019083] Initializing cgroup subsys memory
[    0.020068] Initializing cgroup subsys devices
[    0.021162] Initializing cgroup subsys freezer
[    0.022259] Initializing cgroup subsys blkio
[    0.023423] mce: CPU supports 10 MCE banks
[    0.024051] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.024051] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.024051] tlb_flushall_shift: 6
[    0.047824] Freeing SMP alternatives: 48k freed
[    0.048018] ACPI: Core revision 20121018
[    0.052355] ftrace: allocating 53391 entries in 209 pages
[    0.072647] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.113775] smpboot: CPU0: Intel QEMU Virtual CPU version 1.2.50 (fam: 06, model: 02, stepping: 03)
[    0.116000] calling  trace_init_flags_sys_exit+0x0/0x12 @ 1
[    0.116000] initcall trace_init_flags_sys_exit+0x0/0x12 returned 0 after 0 usecs
[    0.116000] calling  trace_init_flags_sys_enter+0x0/0x12 @ 1
[    0.116009] initcall trace_init_flags_sys_enter+0x0/0x12 returned 0 after 0 usecs
[    0.117828] calling  init_hw_perf_events+0x0/0x40d @ 1
[    0.119077] Performance Events: unsupported p6 CPU model 2 no PMU driver, software events only.
[    0.121392] initcall init_hw_perf_events+0x0/0x40d returned 0 after 3906 usecs
[    0.123124] calling  register_trigger_all_cpu_backtrace+0x0/0x16 @ 1
[    0.124020] initcall register_trigger_all_cpu_backtrace+0x0/0x16 returned 0 after 0 usecs
[    0.126062] calling  spawn_ksoftirqd+0x0/0x26 @ 1
[    0.128104] initcall spawn_ksoftirqd+0x0/0x26 returned 0 after 0 usecs
[    0.129682] calling  init_workqueues+0x0/0x3c5 @ 1
[    0.131021] initcall init_workqueues+0x0/0x3c5 returned 0 after 0 usecs
[    0.132006] calling  migration_init+0x0/0x6c @ 1
[    0.133061] initcall migration_init+0x0/0x6c returned 0 after 0 usecs
[    0.134661] calling  cpu_stop_init+0x0/0xcd @ 1
[    0.136042] initcall cpu_stop_init+0x0/0xcd returned 0 after 0 usecs
[    0.137560] calling  rcu_scheduler_really_started+0x0/0x12 @ 1
[    0.139006] initcall rcu_scheduler_really_started+0x0/0x12 returned 0 after 0 usecs
[    0.140012] calling  rcu_spawn_gp_kthread+0x0/0x87 @ 1
[    0.141353] initcall rcu_spawn_gp_kthread+0x0/0x87 returned 0 after 0 usecs
[    0.144009] calling  relay_init+0x0/0x14 @ 1
[    0.145068] initcall relay_init+0x0/0x14 returned 0 after 0 usecs
[    0.146537] calling  tracer_alloc_buffers+0x0/0x2b2 @ 1
[    0.148056] initcall tracer_alloc_buffers+0x0/0x2b2 returned 0 after 0 usecs
[    0.149762] calling  init_events+0x0/0x60 @ 1
[    0.150831] initcall init_events+0x0/0x60 returned 0 after 0 usecs
[    0.152005] calling  init_trace_printk+0x0/0x12 @ 1
[    0.153189] initcall init_trace_printk+0x0/0x12 returned 0 after 0 usecs
[    0.154806] calling  init_ftrace_syscalls+0x0/0x6f @ 1
[    0.156494] initcall init_ftrace_syscalls+0x0/0x6f returned 0 after 0 usecs
[    0.158225] Brought up 1 CPUs
[    0.158956] smpboot: Total of 1 processors activated (4983.92 BogoMIPS)
[    0.160258] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.165964] calling  ipc_ns_init+0x0/0x14 @ 1
[    0.166727] initcall ipc_ns_init+0x0/0x14 returned 0 after 0 usecs
[    0.168011] calling  init_mmap_min_addr+0x0/0x16 @ 1
[    0.169191] initcall init_mmap_min_addr+0x0/0x16 returned 0 after 0 usecs
[    0.170895] calling  init_cpufreq_transition_notifier_list+0x0/0x1b @ 1
[    0.172020] initcall init_cpufreq_transition_notifier_list+0x0/0x1b returned 0 after 0 usecs
[    0.174052] calling  net_ns_init+0x0/0xe7 @ 1
[    0.176054] initcall net_ns_init+0x0/0xe7 returned 0 after 3906 usecs
[    0.177732] calling  e820_mark_nvs_memory+0x0/0x3d @ 1
[    0.178981] initcall e820_mark_nvs_memory+0x0/0x3d returned 0 after 0 usecs
[    0.180007] calling  cpufreq_tsc+0x0/0x33 @ 1
[    0.181066] initcall cpufreq_tsc+0x0/0x33 returned 0 after 0 usecs
[    0.182572] calling  reboot_init+0x0/0x20 @ 1
[    0.184029] initcall reboot_init+0x0/0x20 returned 0 after 0 usecs
[    0.185529] calling  init_lapic_sysfs+0x0/0x23 @ 1
[    0.186704] initcall init_lapic_sysfs+0x0/0x23 returned 0 after 0 usecs
[    0.188006] calling  cpu_hotplug_pm_sync_init+0x0/0x14 @ 1
[    0.189318] initcall cpu_hotplug_pm_sync_init+0x0/0x14 returned 0 after 0 usecs
[    0.191077] calling  alloc_frozen_cpus+0x0/0x1e @ 1
[    0.192010] initcall alloc_frozen_cpus+0x0/0x1e returned 0 after 0 usecs
[    0.193633] calling  ksysfs_init+0x0/0x97 @ 1
[    0.194719] initcall ksysfs_init+0x0/0x97 returned 0 after 0 usecs
[    0.196005] calling  pm_init+0x0/0x7f @ 1
[    0.196997] initcall pm_init+0x0/0x7f returned 0 after 0 usecs
[    0.198390] calling  pm_disk_init+0x0/0x19 @ 1
[    0.200024] initcall pm_disk_init+0x0/0x19 returned 0 after 0 usecs
[    0.201603] calling  swsusp_header_init+0x0/0x40 @ 1
[    0.202846] initcall swsusp_header_init+0x0/0x40 returned 0 after 0 usecs
[    0.204009] calling  init_jiffies_clocksource+0x0/0x12 @ 1
[    0.205354] initcall init_jiffies_clocksource+0x0/0x12 returned 0 after 0 usecs
[    0.208007] calling  ftrace_mod_cmd_init+0x0/0x12 @ 1
[    0.209248] initcall ftrace_mod_cmd_init+0x0/0x12 returned 0 after 0 usecs
[    0.210901] calling  init_function_trace+0x0/0x3e @ 1
[    0.212008] initcall init_function_trace+0x0/0x3e returned 0 after 0 usecs
[    0.213663] calling  init_irqsoff_tracer+0x0/0x14 @ 1
[    0.214887] initcall init_irqsoff_tracer+0x0/0x14 returned 0 after 0 usecs
[    0.216005] calling  init_wakeup_tracer+0x0/0x22 @ 1
[    0.217214] initcall init_wakeup_tracer+0x0/0x22 returned 0 after 0 usecs
[    0.218853] calling  init_graph_trace+0x0/0x67 @ 1
[    0.220008] initcall init_graph_trace+0x0/0x67 returned 0 after 0 usecs
[    0.221588] calling  event_trace_enable+0x0/0x9e @ 1
[    0.224930] initcall event_trace_enable+0x0/0x9e returned 0 after 3906 usecs
[    0.226635] calling  init_zero_pfn+0x0/0x39 @ 1
[    0.227730] initcall init_zero_pfn+0x0/0x39 returned 0 after 0 usecs
[    0.228009] calling  memory_failure_init+0x0/0xe3 @ 1
[    0.229242] initcall memory_failure_init+0x0/0xe3 returned 0 after 0 usecs
[    0.232010] calling  fsnotify_init+0x0/0x34 @ 1
[    0.233180] initcall fsnotify_init+0x0/0x34 returned 0 after 0 usecs
[    0.234719] calling  filelock_init+0x0/0x2a @ 1
[    0.236015] initcall filelock_init+0x0/0x2a returned 0 after 0 usecs
[    0.237558] calling  init_script_binfmt+0x0/0x16 @ 1
[    0.238771] initcall init_script_binfmt+0x0/0x16 returned 0 after 0 usecs
[    0.240006] calling  init_elf_binfmt+0x0/0x16 @ 1
[    0.241143] initcall init_elf_binfmt+0x0/0x16 returned 0 after 0 usecs
[    0.242714] calling  debugfs_init+0x0/0x5b @ 1
[    0.244015] initcall debugfs_init+0x0/0x5b returned 0 after 0 usecs
[    0.245391] calling  securityfs_init+0x0/0x52 @ 1
[    0.246061] initcall securityfs_init+0x0/0x52 returned 0 after 0 usecs
[    0.246756] calling  calibrate_xor_blocks+0x0/0x95 @ 1
[    0.247369] xor: automatically using best checksumming function:
[    0.288005]    generic_sse: 13428.000 MB/sec
[    0.288495] initcall calibrate_xor_blocks+0x0/0x95 returned 0 after 42968 usecs
[    0.289482] calling  random32_init+0x0/0xd4 @ 1
[    0.290078] initcall random32_init+0x0/0xd4 returned 0 after 0 usecs
[    0.290839] calling  test_atomic64+0x0/0x43e @ 1
[    0.291433] atomic64 test passed for x86-64 platform with CX8 and with SSE
[    0.292008] initcall test_atomic64+0x0/0x43e returned 0 after 3906 usecs
[    0.292749] calling  virtio_init+0x0/0x40 @ 1
[    0.293646] initcall virtio_init+0x0/0x40 returned 0 after 0 usecs
[    0.294364] calling  cpufreq_core_init+0x0/0xc1 @ 1
[    0.294827] initcall cpufreq_core_init+0x0/0xc1 returned 0 after 0 usecs
[    0.295530] calling  cpuidle_init+0x0/0x3f @ 1
[    0.296017] initcall cpuidle_init+0x0/0x3f returned 0 after 0 usecs
[    0.296668] calling  bsp_pm_check_init+0x0/0x14 @ 1
[    0.297141] initcall bsp_pm_check_init+0x0/0x14 returned 0 after 0 usecs
[    0.297798] calling  sock_init+0x0/0x7f @ 1
[    0.298425] initcall sock_init+0x0/0x7f returned 0 after 0 usecs
[    0.299058] calling  net_inuse_init+0x0/0x26 @ 1
[    0.299584] initcall net_inuse_init+0x0/0x26 returned 0 after 0 usecs
[    0.300005] calling  netlink_proto_init+0x0/0x1dd @ 1
[    0.300603] NET: Registered protocol family 16
[    0.301145] initcall netlink_proto_init+0x0/0x1dd returned 0 after 0 usecs
[    0.302086] calling  bdi_class_init+0x0/0x49 @ 1
[    0.302758] initcall bdi_class_init+0x0/0x49 returned 0 after 0 usecs
[    0.303380] calling  kobject_uevent_init+0x0/0x12 @ 1
[    0.304020] initcall kobject_uevent_init+0x0/0x12 returned 0 after 0 usecs
[    0.304819] calling  gpiolib_sysfs_init+0x0/0x93 @ 1
[    0.305440] initcall gpiolib_sysfs_init+0x0/0x93 returned 0 after 0 usecs
[    0.306225] calling  pcibus_class_init+0x0/0x19 @ 1
[    0.306822] initcall pcibus_class_init+0x0/0x19 returned 0 after 0 usecs
[    0.308019] calling  pci_driver_init+0x0/0x19 @ 1
[    0.308616] initcall pci_driver_init+0x0/0x19 returned 0 after 0 usecs
[    0.309309] calling  rio_bus_init+0x0/0x37 @ 1
[    0.309925] initcall rio_bus_init+0x0/0x37 returned 0 after 0 usecs
[    0.310589] calling  lcd_class_init+0x0/0x4d @ 1
[    0.311188] initcall lcd_class_init+0x0/0x4d returned 0 after 0 usecs
[    0.311810] calling  backlight_class_init+0x0/0x5d @ 1
[    0.312076] initcall backlight_class_init+0x0/0x5d returned 0 after 0 usecs
[    0.312799] calling  video_output_class_init+0x0/0x19 @ 1
[    0.313384] initcall video_output_class_init+0x0/0x19 returned 0 after 0 usecs
[    0.314127] calling  tty_class_init+0x0/0x34 @ 1
[    0.314646] initcall tty_class_init+0x0/0x34 returned 0 after 0 usecs
[    0.315369] calling  vtconsole_class_init+0x0/0xe9 @ 1
[    0.316203] initcall vtconsole_class_init+0x0/0xe9 returned 0 after 0 usecs
[    0.316978] calling  wakeup_sources_debugfs_init+0x0/0x2b @ 1
[    0.317605] initcall wakeup_sources_debugfs_init+0x0/0x2b returned 0 after 0 usecs
[    0.318350] calling  register_node_type+0x0/0x31 @ 1
[    0.319101] initcall register_node_type+0x0/0x31 returned 0 after 0 usecs
[    0.320022] calling  regmap_initcall+0x0/0xd @ 1
[    0.320665] initcall regmap_initcall+0x0/0xd returned 0 after 0 usecs
[    0.321362] calling  i2c_init+0x0/0x76 @ 1
[    0.321975] initcall i2c_init+0x0/0x76 returned 0 after 0 usecs
[    0.322769] calling  amd_postcore_init+0x0/0x14a @ 1
[    0.323412] initcall amd_postcore_init+0x0/0x14a returned 0 after 0 usecs
[    0.324151] calling  set_real_mode_permissions+0x0/0x9b @ 1
[    0.325526] initcall set_real_mode_permissions+0x0/0x9b returned 0 after 0 usecs
[    0.327275] calling  arch_kdebugfs_init+0x0/0x224 @ 1
[    0.328059] initcall arch_kdebugfs_init+0x0/0x224 returned 0 after 0 usecs
[    0.329717] calling  mtrr_if_init+0x0/0x65 @ 1
[    0.330798] initcall mtrr_if_init+0x0/0x65 returned 0 after 0 usecs
[    0.332008] calling  ffh_cstate_init+0x0/0x2d @ 1
[    0.333148] initcall ffh_cstate_init+0x0/0x2d returned 0 after 0 usecs
[    0.336006] calling  acpi_pci_init+0x0/0x5c @ 1
[    0.336812] ACPI: bus type pci registered
[    0.337790] initcall acpi_pci_init+0x0/0x5c returned 0 after 0 usecs
[    0.339321] calling  dma_bus_init+0x0/0x19 @ 1
[    0.340115] initcall dma_bus_init+0x0/0x19 returned 0 after 0 usecs
[    0.341634] calling  dma_channel_table_init+0x0/0x115 @ 1
[    0.342954] initcall dma_channel_table_init+0x0/0x115 returned 0 after 0 usecs
[    0.344023] calling  dmi_id_init+0x0/0x31f @ 1
[    0.345291] initcall dmi_id_init+0x0/0x31f returned 0 after 0 usecs
[    0.348005] calling  dca_init+0x0/0x20 @ 1
[    0.348980] dca service started, version 1.12.1
[    0.350190] initcall dca_init+0x0/0x20 returned 0 after 0 usecs
[    0.351634] calling  pci_arch_init+0x0/0x6a @ 1
[    0.352076] PCI: Using configuration type 1 for base access
[    0.353424] initcall pci_arch_init+0x0/0x6a returned 0 after 0 usecs
[    0.356057] calling  topology_init+0x0/0x97 @ 1
[    0.357610] initcall topology_init+0x0/0x97 returned 0 after 0 usecs
[    0.359150] calling  mtrr_init_finialize+0x0/0x36 @ 1
[    0.360026] initcall mtrr_init_finialize+0x0/0x36 returned 0 after 0 usecs
[    0.361679] calling  init_vdso+0x0/0x158 @ 1
[    0.362713] initcall init_vdso+0x0/0x158 returned 0 after 0 usecs
[    0.364005] calling  param_sysfs_init+0x0/0x1ae @ 1
[    0.426553] initcall param_sysfs_init+0x0/0x1ae returned 0 after 58593 usecs
[    0.428033] calling  pm_sysrq_init+0x0/0x19 @ 1
[    0.429022] initcall pm_sysrq_init+0x0/0x19 returned 0 after 0 usecs
[    0.430554] calling  default_bdi_init+0x0/0x37 @ 1
[    0.432346] initcall default_bdi_init+0x0/0x37 returned 0 after 3906 usecs
[    0.434030] calling  init_bio+0x0/0x10f @ 1
[    0.435131] bio: create slab <bio-0> at 0
[    0.436040] initcall init_bio+0x0/0x10f returned 0 after 3906 usecs
[    0.437529] calling  fsnotify_notification_init+0x0/0x8b @ 1
[    0.438908] initcall fsnotify_notification_init+0x0/0x8b returned 0 after 0 usecs
[    0.440045] calling  cryptomgr_init+0x0/0x12 @ 1
[    0.441160] initcall cryptomgr_init+0x0/0x12 returned 0 after 0 usecs
[    0.442717] calling  blk_settings_init+0x0/0x2a @ 1
[    0.444062] initcall blk_settings_init+0x0/0x2a returned 0 after 0 usecs
[    0.445685] calling  blk_ioc_init+0x0/0x2a @ 1
[    0.446765] initcall blk_ioc_init+0x0/0x2a returned 0 after 0 usecs
[    0.448057] calling  blk_softirq_init+0x0/0x6f @ 1
[    0.449210] initcall blk_softirq_init+0x0/0x6f returned 0 after 0 usecs
[    0.450803] calling  blk_iopoll_setup+0x0/0x6f @ 1
[    0.452027] initcall blk_iopoll_setup+0x0/0x6f returned 0 after 0 usecs
[    0.453652] calling  genhd_device_init+0x0/0x7a @ 1
[    0.456120] initcall genhd_device_init+0x0/0x7a returned 0 after 3906 usecs
[    0.457785] calling  blk_dev_integrity_init+0x0/0x2a @ 1
[    0.459091] initcall blk_dev_integrity_init+0x0/0x2a returned 0 after 0 usecs
[    0.460010] calling  raid6_select_algo+0x0/0x238 @ 1
[    0.528004] raid6: sse2x1    8293 MB/s
[    0.596005] raid6: sse2x2   10229 MB/s
[    0.664007] raid6: sse2x4   11204 MB/s
[    0.664649] raid6: using algorithm sse2x4 (11204 MB/s)
[    0.665919] raid6: using intx1 recovery algorithm
[    0.667044] initcall raid6_select_algo+0x0/0x238 returned 0 after 199218 usecs
[    0.668006] calling  gpiolib_debugfs_init+0x0/0x24 @ 1
[    0.669261] initcall gpiolib_debugfs_init+0x0/0x24 returned 0 after 0 usecs
[    0.670935] calling  pci_slot_init+0x0/0x60 @ 1
[    0.672013] initcall pci_slot_init+0x0/0x60 returned 0 after 0 usecs
[    0.673533] calling  fbmem_init+0x0/0x98 @ 1
[    0.674708] initcall fbmem_init+0x0/0x98 returned 0 after 0 usecs
[    0.676044] calling  acpi_init+0x0/0x28b @ 1
[    0.677098] ACPI: Added _OSI(Module Device)
[    0.678110] ACPI: Added _OSI(Processor Device)
[    0.680028] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.681174] ACPI: Added _OSI(Processor Aggregator Device)
[    0.684554] ACPI: EC: Look up EC in DSDT
[    0.690020] ACPI: Interpreter enabled
[    0.690882] ACPI: (supports S0 S3 S4 S5)
[    0.692007] ACPI: Using IOAPIC for interrupt routing
[    0.703051] initcall acpi_init+0x0/0x28b returned 0 after 23437 usecs
[    0.704061] calling  acpi_pci_root_init+0x0/0x28 @ 1
[    0.705283] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.707669] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.708056] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.710149] pci_root PNP0A03:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[    0.712006] pci_root PNP0A03:00: Unable to request _OSC control (_OSC support mask: 0x08)
[    0.714303] pci_root PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.716242] PCI host bridge to bus 0000:00
[    0.720016] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.721327] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    0.722835] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
[    0.724009] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.725660] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xfebfffff]
[    0.727331] pci_bus 0000:00: scanning bus
[    0.728076] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[    0.729520] pci 0000:00:00.0: calling quirk_mmio_always_on+0x0/0x10
[    0.731032] calling  quirk_mmio_always_on+0x0/0x10 @ 1 for 0000:00:00.0
[    0.732006] pci fixup quirk_mmio_always_on+0x0/0x10 returned after 0 usecs for 0000:00:00.0
[    0.734307] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[    0.736411] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
[    0.740526] pci 0000:00:01.1: reg 20: [io  0xc040-0xc04f]
[    0.742608] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[    0.744017] pci 0000:00:01.3: calling acpi_pm_check_blacklist+0x0/0x50
[    0.745589] calling  acpi_pm_check_blacklist+0x0/0x50 @ 1 for 0000:00:01.3
[    0.747247] pci fixup acpi_pm_check_blacklist+0x0/0x50 returned after 0 usecs for 0000:00:01.3
[    0.748332] pci 0000:00:01.3: calling quirk_piix4_acpi+0x0/0x170
[    0.749756] calling  quirk_piix4_acpi+0x0/0x170 @ 1 for 0000:00:01.3
[    0.752023] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[    0.753753] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
[    0.756049] pci fixup quirk_piix4_acpi+0x0/0x170 returned after 3906 usecs for 0000:00:01.3
[    0.758059] pci 0000:00:01.3: calling pci_fixup_piix4_acpi+0x0/0x20
[    0.759574] calling  pci_fixup_piix4_acpi+0x0/0x20 @ 1 for 0000:00:01.3
[    0.760007] pci fixup pci_fixup_piix4_acpi+0x0/0x20 returned after 0 usecs for 0000:00:01.3
[    0.762140] pci 0000:00:02.0: [1013:00b8] type 00 class 0x030000
[    0.765231] pci 0000:00:02.0: reg 10: [mem 0xfc000000-0xfdffffff pref]
[    0.768837] pci 0000:00:02.0: reg 14: [mem 0xfebf0000-0xfebf0fff]
[    0.777158] pci 0000:00:02.0: reg 30: [mem 0xfebe0000-0xfebeffff pref]
[    0.778808] pci 0000:00:03.0: [8086:100e] type 00 class 0x020000
[    0.780674] pci 0000:00:03.0: reg 10: [mem 0xfeba0000-0xfebbffff]
[    0.782767] pci 0000:00:03.0: reg 14: [io  0xc000-0xc03f]
[    0.787380] pci 0000:00:03.0: reg 30: [mem 0xfebc0000-0xfebdffff pref]
[    0.788274] pci_bus 0000:00: fixups for bus
[    0.788977] pci_bus 0000:00: bus scan returning with max=00
[    0.790384] ACPI: Invalid Power Resource to register!
[    0.792021] ACPI: Invalid Power Resource to register!
[    0.793254] ACPI: Invalid Power Resource to register!
[    0.794486] ACPI: Invalid Power Resource to register!
[    0.796019] ACPI: Invalid Power Resource to register!
[    0.797255] ACPI: Invalid Power Resource to register!
[    0.798688] ACPI _OSC control for PCIe not granted, disabling ASPM
[    0.807000] initcall acpi_pci_root_init+0x0/0x28 returned 0 after 97656 usecs
[    0.808006] calling  acpi_pci_link_init+0x0/0x3e @ 1
[    0.809363] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.810946] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.812528] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.814073] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.816502] ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0
[    0.818040] initcall acpi_pci_link_init+0x0/0x3e returned 0 after 7812 usecs
[    0.819765] calling  pnp_init+0x0/0x19 @ 1
[    0.820121] initcall pnp_init+0x0/0x19 returned 0 after 0 usecs
[    0.821557] calling  misc_init+0x0/0xb6 @ 1
[    0.822650] initcall misc_init+0x0/0xb6 returned 0 after 0 usecs
[    0.824005] calling  vga_arb_device_init+0x0/0xd4 @ 1
[    0.825463] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.828005] vgaarb: loaded
[    0.828667] vgaarb: bridge control possible 0000:00:02.0
[    0.829944] initcall vga_arb_device_init+0x0/0xd4 returned 0 after 3906 usecs
[    0.832006] calling  cn_init+0x0/0xd0 @ 1
[    0.832995] initcall cn_init+0x0/0xd0 returned 0 after 0 usecs
[    0.834412] calling  tifm_init+0x0/0x91 @ 1
[    0.835608] initcall tifm_init+0x0/0x91 returned 0 after 0 usecs
[    0.836006] calling  init_scsi+0x0/0x8e @ 1
[    0.837369] SCSI subsystem initialized
[    0.838289] initcall init_scsi+0x0/0x8e returned 0 after 0 usecs
[    0.840006] calling  ata_init+0x0/0x5f @ 1
[    0.840980] ACPI: bus type scsi registered
[    0.842259] libata version 3.00 loaded.
[    0.844012] initcall ata_init+0x0/0x5f returned 0 after 3906 usecs
[    0.845561] calling  phy_init+0x0/0x2e @ 1
[    0.846820] initcall phy_init+0x0/0x2e returned 0 after 0 usecs
[    0.848010] calling  init_pcmcia_cs+0x0/0x3d @ 1
[    0.849234] initcall init_pcmcia_cs+0x0/0x3d returned 0 after 0 usecs
[    0.850752] calling  usb_init+0x0/0x171 @ 1
[    0.852037] ACPI: bus type usb registered
[    0.853175] usbcore: registered new interface driver usbfs
[    0.854591] usbcore: registered new interface driver hub
[    0.856150] usbcore: registered new device driver usb
[    0.857377] initcall usb_init+0x0/0x171 returned 0 after 3906 usecs
[    0.858881] calling  serio_init+0x0/0x35 @ 1
[    0.860097] initcall serio_init+0x0/0x35 returned 0 after 0 usecs
[    0.861578] calling  input_init+0x0/0x108 @ 1
[    0.862707] initcall input_init+0x0/0x108 returned 0 after 0 usecs
[    0.864044] calling  rtc_init+0x0/0x6a @ 1
[    0.865184] initcall rtc_init+0x0/0x6a returned 0 after 0 usecs
[    0.866552] calling  pps_init+0x0/0xb4 @ 1
[    0.867615] pps_core: LinuxPPS API ver. 1 registered
[    0.868015] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.870211] initcall pps_init+0x0/0xb4 returned 0 after 3906 usecs
[    0.872011] calling  ptp_init+0x0/0xa1 @ 1
[    0.873148] PTP clock support registered
[    0.874138] initcall ptp_init+0x0/0xa1 returned 0 after 0 usecs
[    0.876029] calling  power_supply_class_init+0x0/0x40 @ 1
[    0.877466] initcall power_supply_class_init+0x0/0x40 returned 0 after 0 usecs
[    0.879226] calling  hwmon_init+0x0/0xee @ 1
[    0.880126] initcall hwmon_init+0x0/0xee returned 0 after 0 usecs
[    0.881600] calling  md_init+0x0/0x150 @ 1
[    0.882648] initcall md_init+0x0/0x150 returned 0 after 0 usecs
[    0.884017] calling  mmc_init+0x0/0x78 @ 1
[    0.885228] initcall mmc_init+0x0/0x78 returned 0 after 0 usecs
[    0.886667] calling  leds_init+0x0/0x44 @ 1
[    0.888086] initcall leds_init+0x0/0x44 returned 0 after 0 usecs
[    0.889536] calling  acpi_wmi_init+0x0/0x71 @ 1
[    0.890772] wmi: Mapper loaded
[    0.892016] initcall acpi_wmi_init+0x0/0x71 returned 0 after 3906 usecs
[    0.893607] calling  iommu_init+0x0/0x56 @ 1
[    0.894643] initcall iommu_init+0x0/0x56 returned 0 after 0 usecs
[    0.896005] calling  init_soundcore+0x0/0x34 @ 1
[    0.897203] initcall init_soundcore+0x0/0x34 returned 0 after 0 usecs
[    0.898749] calling  alsa_sound_init+0x0/0x90 @ 1
[    0.900060] Advanced Linux Sound Architecture Driver Initialized.
[    0.901522] initcall alsa_sound_init+0x0/0x90 returned 0 after 0 usecs
[    0.903068] calling  ac97_bus_init+0x0/0x19 @ 1
[    0.904154] initcall ac97_bus_init+0x0/0x19 returned 0 after 0 usecs
[    0.905733] calling  pci_subsys_init+0x0/0x4a @ 1
[    0.906862] PCI: Using ACPI for IRQ routing
[    0.908031] PCI: Routing PCI interrupts for all devices because "pci=routeirq" specified
[    0.910008] ACPI Exception: AE_NOT_FOUND, Evaluating _SRS (20121018/pci_link-367)
[    0.911831] ACPI: Unable to set IRQ for PCI Interrupt Link [LNKS]. Try pci=noacpi or acpi=off
[    0.912007] pci 0000:00:01.3: PCI INT A: no GSI - using ISA IRQ 9
[    0.913633] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.916035] PCI: pci_cache_line_size set to 64 bytes
[    0.916729] pci 0000:00:01.1: BAR 0: reserving [io  0x01f0-0x01f7 flags 0x110] (d=0, p=0)
[    0.917875] pci 0000:00:01.1: BAR 1: reserving [io  0x03f6 flags 0x110] (d=0, p=0)
[    0.918723] pci 0000:00:01.1: BAR 2: reserving [io  0x0170-0x0177 flags 0x110] (d=0, p=0)
[    0.920005] pci 0000:00:01.1: BAR 3: reserving [io  0x0376 flags 0x110] (d=0, p=0)
[    0.920876] pci 0000:00:01.1: BAR 4: reserving [io  0xc040-0xc04f flags 0x40101] (d=0, p=0)
[    0.921851] pci 0000:00:02.0: BAR 0: reserving [mem 0xfc000000-0xfdffffff flags 0x42208] (d=0, p=0)
[    0.922708] pci 0000:00:02.0: BAR 1: reserving [mem 0xfebf0000-0xfebf0fff flags 0x40200] (d=0, p=0)
[    0.923628] pci 0000:00:03.0: BAR 0: reserving [mem 0xfeba0000-0xfebbffff flags 0x40200] (d=0, p=0)
[    0.924005] pci 0000:00:03.0: BAR 1: reserving [io  0xc000-0xc03f flags 0x40101] (d=0, p=0)
[    0.924916] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
[    0.925488] e820: reserve RAM buffer [mem 0xdfffe000-0xdfffffff]
[    0.926215] initcall pci_subsys_init+0x0/0x4a returned 0 after 19531 usecs
[    0.927012] calling  proto_init+0x0/0x12 @ 1
[    0.928014] initcall proto_init+0x0/0x12 returned 0 after 0 usecs
[    0.928669] calling  net_dev_init+0x0/0x265 @ 1
[    0.929677] initcall net_dev_init+0x0/0x265 returned 0 after 0 usecs
[    0.930310] calling  neigh_init+0x0/0x80 @ 1
[    0.930836] initcall neigh_init+0x0/0x80 returned 0 after 0 usecs
[    0.931647] calling  genl_init+0x0/0x80 @ 1
[    0.932059] initcall genl_init+0x0/0x80 returned 0 after 0 usecs
[    0.932979] calling  bt_init+0x0/0x9d @ 1
[    0.933472] Bluetooth: Core ver 2.16
[    0.933958] NET: Registered protocol family 31
[    0.934642] Bluetooth: HCI device and connection manager initialized
[    0.936041] Bluetooth: HCI socket layer initialized
[    0.937103] Bluetooth: L2CAP socket layer initialized
[    0.937993] Bluetooth: SCO socket layer initialized
[    0.938506] initcall bt_init+0x0/0x9d returned 0 after 3906 usecs
[    0.939238] calling  cfg80211_init+0x0/0xca @ 1
[    0.940447] cfg80211: Calling CRDA to update world regulatory domain
[    0.941172] initcall cfg80211_init+0x0/0xca returned 0 after 0 usecs
[    0.941838] calling  wireless_nlevent_init+0x0/0x12 @ 1
[    0.942461] initcall wireless_nlevent_init+0x0/0x12 returned 0 after 0 usecs
[    0.943145] calling  ieee80211_init+0x0/0x3b @ 1
[    0.943845] initcall ieee80211_init+0x0/0x3b returned 0 after 0 usecs
[    0.944134] calling  hpet_late_init+0x0/0xec @ 1
[    0.944722] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.945599] initcall hpet_late_init+0x0/0xec returned 0 after 0 usecs
[    0.946347] calling  init_amd_nbs+0x0/0xaf @ 1
[    0.946793] initcall init_amd_nbs+0x0/0xaf returned 0 after 0 usecs
[    0.948025] calling  clocksource_done_booting+0x0/0x5e @ 1
[    0.948623] Switching to clocksource hpet
[    0.949085] initcall clocksource_done_booting+0x0/0x5e returned 0 after 37 usecs
[    0.949856] calling  ftrace_init_debugfs+0x0/0x20c @ 1
[    0.950451] initcall ftrace_init_debugfs+0x0/0x20c returned 0 after 75 usecs
[    0.951462] calling  tracer_init_debugfs+0x0/0x414 @ 1
[    0.951988] initcall tracer_init_debugfs+0x0/0x414 returned 0 after 264 usecs
[    0.951997] calling  init_trace_printk_function_export+0x0/0x2f @ 1
[    0.952642] initcall init_trace_printk_function_export+0x0/0x2f returned 0 after 10 usecs
[    0.953521] calling  event_trace_init+0x0/0x273 @ 1
[    0.990576] initcall event_trace_init+0x0/0x273 returned 0 after 35572 usecs
[    0.991403] calling  init_kprobe_trace+0x0/0x8e @ 1
[    0.992163] initcall init_kprobe_trace+0x0/0x8e returned 0 after 27 usecs
[    0.993028] calling  init_pipe_fs+0x0/0x4a @ 1
[    0.993615] initcall init_pipe_fs+0x0/0x4a returned 0 after 92 usecs
[    0.994383] calling  eventpoll_init+0x0/0x10a @ 1
[    0.994899] initcall eventpoll_init+0x0/0x10a returned 0 after 22 usecs
[    0.995542] calling  anon_inode_init+0x0/0x59 @ 1
[    0.996169] initcall anon_inode_init+0x0/0x59 returned 0 after 72 usecs
[    0.996965] calling  fscache_init+0x0/0x22c @ 1
[    0.997558] FS-Cache: Loaded
[    0.997863] initcall fscache_init+0x0/0x22c returned 0 after 328 usecs
[    0.998616] calling  cachefiles_init+0x0/0xa1 @ 1
[    0.999305] CacheFiles: Loaded
[    0.999656] initcall cachefiles_init+0x0/0xa1 returned 0 after 508 usecs
[    1.000493] calling  blk_scsi_ioctl_init+0x0/0x289 @ 1
[    1.001096] initcall blk_scsi_ioctl_init+0x0/0x289 returned 0 after 4 usecs
[    1.001829] calling  acpi_event_init+0x0/0x55 @ 1
[    1.002337] initcall acpi_event_init+0x0/0x55 returned 0 after 17 usecs
[    1.002979] calling  pnp_system_init+0x0/0x12 @ 1
[    1.003568] initcall pnp_system_init+0x0/0x12 returned 0 after 86 usecs
[    1.004366] calling  pnpacpi_init+0x0/0x8c @ 1
[    1.004836] pnp: PnP ACPI init
[    1.005180] ACPI: bus type pnp registered
[    1.005830] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[    1.006652] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active)
[    1.007442] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active)
[    1.008241] pnp 00:03: [dma 2]
[    1.008723] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active)
[    1.009685] pnp 00:04: Plug and Play ACPI device, IDs PNP0400 (active)
[    1.010595] pnp 00:05: Plug and Play ACPI device, IDs PNP0501 (active)
[    1.011730] pnp 00:06: Plug and Play ACPI device, IDs PNP0103 (active)
[    1.012789] pnp: PnP ACPI: found 7 devices
[    1.013236] ACPI: ACPI bus type pnp unregistered
[    1.013700] initcall pnpacpi_init+0x0/0x8c returned 0 after 8655 usecs
[    1.014404] calling  intel_mid_dma_init+0x0/0x30 @ 1
[    1.014992] INFO_MDMA: LNW DMA Driver Version 1.1.0
[    1.015672] initcall intel_mid_dma_init+0x0/0x30 returned 0 after 664 usecs
[    1.016449] calling  chr_dev_init+0x0/0xc1 @ 1
[    1.024170] initcall chr_dev_init+0x0/0xc1 returned 0 after 7072 usecs
[    1.025023] calling  firmware_class_init+0x0/0x120 @ 1
[    1.025594] initcall firmware_class_init+0x0/0x120 returned 0 after 113 usecs
[    1.026218] calling  thermal_init+0x0/0x8a @ 1
[    1.027527] initcall thermal_init+0x0/0x8a returned 0 after 146 usecs
[    1.029161] calling  thermal_gov_step_wise_init+0x0/0x12 @ 1
[    1.030587] initcall thermal_gov_step_wise_init+0x0/0x12 returned 0 after 35 usecs
[    1.032496] calling  cpufreq_gov_performance_init+0x0/0x12 @ 1
[    1.033925] initcall cpufreq_gov_performance_init+0x0/0x12 returned 0 after 17 usecs
[    1.035792] calling  init_acpi_pm_clocksource+0x0/0xdf @ 1
[    1.041756] initcall init_acpi_pm_clocksource+0x0/0xdf returned 0 after 4435 usecs
[    1.043572] calling  ssb_modinit+0x0/0x50 @ 1
[    1.044824] initcall ssb_modinit+0x0/0x50 returned 0 after 135 usecs
[    1.046359] calling  bcma_modinit+0x0/0x3c @ 1
[    1.047616] initcall bcma_modinit+0x0/0x3c returned 0 after 174 usecs
[    1.049281] calling  pcibios_assign_resources+0x0/0xd7 @ 1
[    1.050628] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7]
[    1.051945] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff]
[    1.053413] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff]
[    1.054921] pci_bus 0000:00: resource 7 [mem 0xe0000000-0xfebfffff]
[    1.056558] initcall pcibios_assign_resources+0x0/0xd7 returned 0 after 5800 usecs
[    1.058391] calling  sysctl_core_init+0x0/0x2c @ 1
[    1.059580] initcall sysctl_core_init+0x0/0x2c returned 0 after 26 usecs
[    1.061242] calling  inet_init+0x0/0x292 @ 1
[    1.062313] NET: Registered protocol family 2
[    1.063646] TCP established hash table entries: 65536 (order: 8, 1048576 bytes)
[    1.066128] TCP bind hash table entries: 65536 (order: 10, 5242880 bytes)
[    1.071186] TCP: Hash tables configured (established 65536 bind 65536)
[    1.072562] TCP: reno registered
[    1.073407] UDP hash table entries: 4096 (order: 7, 786432 bytes)
[    1.075389] UDP-Lite hash table entries: 4096 (order: 7, 786432 bytes)
[    1.077861] initcall inet_init+0x0/0x292 returned 0 after 15211 usecs
[    1.079189] calling  ipv4_offload_init+0x0/0x68 @ 1
[    1.080445] initcall ipv4_offload_init+0x0/0x68 returned 0 after 14 usecs
[    1.082051] calling  af_unix_init+0x0/0x52 @ 1
[    1.083188] NET: Registered protocol family 1
[    1.084351] initcall af_unix_init+0x0/0x52 returned 0 after 1139 usecs
[    1.085912] calling  init_sunrpc+0x0/0x6b @ 1
[    1.087204] RPC: Registered named UNIX socket transport module.
[    1.088723] RPC: Registered udp transport module.
[    1.089862] RPC: Registered tcp transport module.
[    1.090997] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    1.092587] initcall init_sunrpc+0x0/0x6b returned 0 after 5468 usecs
[    1.094138] calling  pci_apply_final_quirks+0x0/0x108 @ 1
[    1.095459] pci 0000:00:00.0: calling quirk_natoma+0x0/0x40
[    1.096849] calling  quirk_natoma+0x0/0x40 @ 1 for 0000:00:00.0
[    1.098279] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    1.099701] pci fixup quirk_natoma+0x0/0x40 returned after 1389 usecs for 0000:00:00.0
[    1.101647] pci 0000:00:00.0: calling quirk_passive_release+0x0/0x90
[    1.103178] calling  quirk_passive_release+0x0/0x90 @ 1 for 0000:00:00.0
[    1.104856] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    1.106259] pci fixup quirk_passive_release+0x0/0x90 returned after 1384 usecs for 0000:00:00.0
[    1.108396] pci 0000:00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
[    1.109877] calling  quirk_isa_dma_hangs+0x0/0x40 @ 1 for 0000:00:01.0
[    1.111430] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    1.112934] pci fixup quirk_isa_dma_hangs+0x0/0x40 returned after 1458 usecs for 0000:00:01.0
[    1.115018] pci 0000:00:02.0: calling pci_fixup_video+0x0/0xe0
[    1.116494] calling  pci_fixup_video+0x0/0xe0 @ 1 for 0000:00:02.0
[    1.117997] pci 0000:00:02.0: Boot video device
[    1.119059] pci fixup pci_fixup_video+0x0/0xe0 returned after 1048 usecs for 0000:00:02.0
[    1.121035] pci 0000:00:03.0: calling quirk_e100_interrupt+0x0/0x1c0
[    1.122514] calling  quirk_e100_interrupt+0x0/0x1c0 @ 1 for 0000:00:03.0
[    1.124168] pci fixup quirk_e100_interrupt+0x0/0x1c0 returned after 4 usecs for 0000:00:03.0
[    1.126182] PCI: CLS 0 bytes, default 64
[    1.127137] initcall pci_apply_final_quirks+0x0/0x108 returned 0 after 30941 usecs
[    1.129003] calling  populate_rootfs+0x0/0xd0 @ 1
[    1.130300] Trying to unpack rootfs image as initramfs...
[    1.135789] rootfs image is not initramfs (no cpio magic); looks like an initrd
[    1.161303] Freeing initrd memory: 39052k freed
[    1.168423] initcall populate_rootfs+0x0/0xd0 returned 0 after 37378 usecs
[    1.169929] calling  pci_iommu_init+0x0/0x3e @ 1
[    1.171049] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    1.172645] software IO TLB [mem 0xdbffe000-0xdfffdfff] (64MB) mapped at [ffff8800dbffe000-ffff8800dfffdfff]
[    1.175021] initcall pci_iommu_init+0x0/0x3e returned 0 after 3874 usecs
[    1.176686] calling  ir_dev_scope_init+0x0/0x16 @ 1
[    1.177873] initcall ir_dev_scope_init+0x0/0x16 returned 0 after 4 usecs
[    1.179585] calling  irqfd_module_init+0x0/0x38 @ 1
[    1.180910] initcall irqfd_module_init+0x0/0x38 returned 0 after 87 usecs
[    1.182543] calling  vmx_init+0x0/0x2ce @ 1
[    1.183559] kvm: no hardware support
[    1.184495] initcall vmx_init+0x0/0x2ce returned -95 after 913 usecs
[    1.186016] initcall vmx_init+0x0/0x2ce returned with error code -95
[    1.187551] calling  svm_init+0x0/0x1e @ 1
[    1.188591] has_svm: not amd
[    1.189287] kvm: no hardware support
[    1.190154] initcall svm_init+0x0/0x1e returned -95 after 1528 usecs
[    1.191676] initcall svm_init+0x0/0x1e returned with error code -95
[    1.193248] calling  i8259A_init_ops+0x0/0x24 @ 1
[    1.194405] initcall i8259A_init_ops+0x0/0x24 returned 0 after 6 usecs
[    1.195965] calling  vsyscall_init+0x0/0x27 @ 1
[    1.197118] initcall vsyscall_init+0x0/0x27 returned 0 after 7 usecs
[    1.198643] calling  sbf_init+0x0/0xf2 @ 1
[    1.199641] initcall sbf_init+0x0/0xf2 returned 0 after 4 usecs
[    1.201105] calling  init_tsc_clocksource+0x0/0x85 @ 1
[    1.202335] initcall init_tsc_clocksource+0x0/0x85 returned 0 after 11 usecs
[    1.203994] calling  add_rtc_cmos+0x0/0x96 @ 1
[    1.205113] initcall add_rtc_cmos+0x0/0x96 returned 0 after 4 usecs
[    1.206390] calling  i8237A_init_ops+0x0/0x14 @ 1
[    1.207538] initcall i8237A_init_ops+0x0/0x14 returned 0 after 8 usecs
[    1.209163] calling  cache_sysfs_init+0x0/0x5f @ 1
[    1.210383] initcall cache_sysfs_init+0x0/0x5f returned 0 after 57 usecs
[    1.211988] calling  intel_uncore_init+0x0/0x3c8 @ 1
[    1.213245] initcall intel_uncore_init+0x0/0x3c8 returned -19 after 4 usecs
[    1.214922] calling  inject_init+0x0/0x60 @ 1
[    1.215976] Machine check injector initialized
[    1.217102] initcall inject_init+0x0/0x60 returned 0 after 1092 usecs
[    1.218649] calling  thermal_throttle_init_device+0x0/0x97 @ 1
[    1.220096] initcall thermal_throttle_init_device+0x0/0x97 returned 0 after 3 usecs
[    1.221927] calling  amd_ibs_init+0x0/0x37f @ 1
[    1.223029] initcall amd_ibs_init+0x0/0x37f returned -19 after 4 usecs
[    1.224327] calling  msr_init+0x0/0x12f @ 1
[    1.225619] initcall msr_init+0x0/0x12f returned 0 after 265 usecs
[    1.227110] calling  cpuid_init+0x0/0x12f @ 1
[    1.228415] initcall cpuid_init+0x0/0x12f returned 0 after 190 usecs
[    1.229948] calling  ioapic_init_ops+0x0/0x14 @ 1
[    1.231091] initcall ioapic_init_ops+0x0/0x14 returned 0 after 6 usecs
[    1.232698] calling  add_pcspkr+0x0/0x40 @ 1
[    1.233918] initcall add_pcspkr+0x0/0x40 returned 0 after 165 usecs
[    1.235444] calling  microcode_init+0x0/0x1b4 @ 1
[    1.236838] microcode: CPU0 sig=0x623, pf=0x0, revision=0x1
[    1.238387] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[    1.240567] initcall microcode_init+0x0/0x1b4 returned 0 after 3824 usecs
[    1.242193] calling  pt_dump_init+0x0/0x30 @ 1
[    1.243293] initcall pt_dump_init+0x0/0x30 returned 0 after 17 usecs
[    1.244876] calling  proc_execdomains_init+0x0/0x22 @ 1
[    1.246156] initcall proc_execdomains_init+0x0/0x22 returned 0 after 12 usecs
[    1.247861] calling  ioresources_init+0x0/0x3c @ 1
[    1.249080] initcall ioresources_init+0x0/0x3c returned 0 after 10 usecs
[    1.250691] calling  uid_cache_init+0x0/0x8c @ 1
[    1.251836] initcall uid_cache_init+0x0/0x8c returned 0 after 21 usecs
[    1.253459] calling  init_posix_timers+0x0/0x203 @ 1
[    1.254686] initcall init_posix_timers+0x0/0x203 returned 0 after 17 usecs
[    1.256385] calling  init_posix_cpu_timers+0x0/0xc5 @ 1
[    1.257653] initcall init_posix_cpu_timers+0x0/0xc5 returned 0 after 4 usecs
[    1.259340] calling  proc_schedstat_init+0x0/0x22 @ 1
[    1.260616] initcall proc_schedstat_init+0x0/0x22 returned 0 after 8 usecs
[    1.262229] calling  init_sched_debug_procfs+0x0/0x2c @ 1
[    1.263560] initcall init_sched_debug_procfs+0x0/0x2c returned 0 after 14 usecs
[    1.265385] calling  snapshot_device_init+0x0/0x12 @ 1
[    1.266824] initcall snapshot_device_init+0x0/0x12 returned 0 after 193 usecs
[    1.268623] calling  timekeeping_init_ops+0x0/0x14 @ 1
[    1.269884] initcall timekeeping_init_ops+0x0/0x14 returned 0 after 7 usecs
[    1.271552] calling  init_clocksource_sysfs+0x0/0x52 @ 1
[    1.273133] initcall init_clocksource_sysfs+0x0/0x52 returned 0 after 237 usecs
[    1.274904] calling  init_timer_list_procfs+0x0/0x2c @ 1
[    1.276270] initcall init_timer_list_procfs+0x0/0x2c returned 0 after 12 usecs
[    1.277991] calling  alarmtimer_init+0x0/0x1ad @ 1
[    1.279393] initcall alarmtimer_init+0x0/0x1ad returned 0 after 236 usecs
[    1.281096] calling  init_tstats_procfs+0x0/0x2c @ 1
[    1.282303] initcall init_tstats_procfs+0x0/0x2c returned 0 after 9 usecs
[    1.283929] calling  lockdep_proc_init+0x0/0x7c @ 1
[    1.285179] initcall lockdep_proc_init+0x0/0x7c returned 0 after 14 usecs
[    1.286807] calling  futex_init+0x0/0x6f @ 1
[    1.287863] initcall futex_init+0x0/0x6f returned 0 after 21 usecs
[    1.289402] calling  proc_dma_init+0x0/0x22 @ 1
[    1.290506] initcall proc_dma_init+0x0/0x22 returned 0 after 9 usecs
[    1.292051] calling  proc_modules_init+0x0/0x22 @ 1
[    1.293247] initcall proc_modules_init+0x0/0x22 returned 0 after 13 usecs
[    1.294800] calling  kallsyms_init+0x0/0x25 @ 1
[    1.295904] initcall kallsyms_init+0x0/0x25 returned 0 after 11 usecs
[    1.297511] calling  crash_save_vmcoreinfo_init+0x0/0x472 @ 1
[    1.298929] initcall crash_save_vmcoreinfo_init+0x0/0x472 returned 0 after 22 usecs
[    1.300819] calling  crash_notes_memory_init+0x0/0x37 @ 1
[    1.302129] initcall crash_notes_memory_init+0x0/0x37 returned 0 after 10 usecs
[    1.303884] calling  pid_namespaces_init+0x0/0x2d @ 1
[    1.305192] initcall pid_namespaces_init+0x0/0x2d returned 0 after 17 usecs
[    1.306866] calling  ikconfig_init+0x0/0x39 @ 1
[    1.307977] initcall ikconfig_init+0x0/0x39 returned 0 after 11 usecs
[    1.309577] calling  init_kprobes+0x0/0x198 @ 1
[    1.352853] initcall init_kprobes+0x0/0x198 returned 0 after 41187 usecs
[    1.354500] calling  hung_task_init+0x0/0x53 @ 1
[    1.355692] initcall hung_task_init+0x0/0x53 returned 0 after 68 usecs
[    1.357337] calling  irq_debugfs_init+0x0/0x2b @ 1
[    1.358523] initcall irq_debugfs_init+0x0/0x2b returned 0 after 19 usecs
[    1.360170] calling  irq_pm_init_ops+0x0/0x14 @ 1
[    1.361322] initcall irq_pm_init_ops+0x0/0x14 returned 0 after 6 usecs
[    1.362723] calling  utsname_sysctl_init+0x0/0x14 @ 1
[    1.363944] initcall utsname_sysctl_init+0x0/0x14 returned 0 after 14 usecs
[    1.365698] calling  init_tracepoints+0x0/0x20 @ 1
[    1.366874] initcall init_tracepoints+0x0/0x20 returned 0 after 4 usecs
[    1.368505] calling  init_lstats_procfs+0x0/0x25 @ 1
[    1.369723] initcall init_lstats_procfs+0x0/0x25 returned 0 after 13 usecs
[    1.371371] calling  stack_trace_init+0x0/0xa6 @ 1
[    1.372608] initcall stack_trace_init+0x0/0xa6 returned 0 after 27 usecs
[    1.374206] calling  init_mmio_trace+0x0/0x12 @ 1
[    1.375363] initcall init_mmio_trace+0x0/0x12 returned 0 after 7 usecs
[    1.376976] calling  init_blk_tracer+0x0/0x5c @ 1
[    1.378120] initcall init_blk_tracer+0x0/0x5c returned 0 after 8 usecs
[    1.379666] calling  perf_event_sysfs_init+0x0/0x9c @ 1
[    1.381582] initcall perf_event_sysfs_init+0x0/0x9c returned 0 after 540 usecs
[    1.383319] calling  init_per_zone_wmark_min+0x0/0x88 @ 1
[    1.384973] initcall init_per_zone_wmark_min+0x0/0x88 returned 0 after 264 usecs
[    1.386756] calling  kswapd_init+0x0/0x77 @ 1
[    1.387882] initcall kswapd_init+0x0/0x77 returned 0 after 70 usecs
[    1.389455] calling  extfrag_debug_init+0x0/0x7c @ 1
[    1.390697] initcall extfrag_debug_init+0x0/0x7c returned 0 after 36 usecs
[    1.392390] calling  setup_vmstat+0x0/0xbd @ 1
[    1.393493] initcall setup_vmstat+0x0/0xbd returned 0 after 22 usecs
[    1.395025] calling  mm_sysfs_init+0x0/0x29 @ 1
[    1.396182] initcall mm_sysfs_init+0x0/0x29 returned 0 after 10 usecs
[    1.397735] calling  proc_vmalloc_init+0x0/0x25 @ 1
[    1.398922] initcall proc_vmalloc_init+0x0/0x25 returned 0 after 9 usecs
[    1.400587] calling  procswaps_init+0x0/0x22 @ 1
[    1.401695] initcall procswaps_init+0x0/0x22 returned 0 after 9 usecs
[    1.403235] calling  hugetlb_init+0x0/0x4e7 @ 1
[    1.404386] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    1.405940] initcall hugetlb_init+0x0/0x4e7 returned 0 after 1521 usecs
[    1.407437] calling  mmu_notifier_init+0x0/0x20 @ 1
[    1.408677] initcall mmu_notifier_init+0x0/0x20 returned 0 after 8 usecs
[    1.410260] calling  slab_proc_init+0x0/0x25 @ 1
[    1.411402] initcall slab_proc_init+0x0/0x25 returned 0 after 15 usecs
[    1.413012] calling  slab_sysfs_init+0x0/0xfc @ 1
[    1.422815] initcall slab_sysfs_init+0x0/0xfc returned 0 after 8468 usecs
[    1.424494] calling  hugepage_init+0x0/0x177 @ 1
[    1.425996] initcall hugepage_init+0x0/0x177 returned 0 after 366 usecs
[    1.427599] calling  pfn_inject_init+0x0/0x160 @ 1
[    1.428916] initcall pfn_inject_init+0x0/0x160 returned 0 after 54 usecs
[    1.430533] calling  fcntl_init+0x0/0x2a @ 1
[    1.431583] initcall fcntl_init+0x0/0x2a returned 0 after 11 usecs
[    1.433138] calling  proc_filesystems_init+0x0/0x22 @ 1
[    1.434420] initcall proc_filesystems_init+0x0/0x22 returned 0 after 12 usecs
[    1.436170] calling  dio_init+0x0/0x2d @ 1
[    1.437360] initcall dio_init+0x0/0x2d returned 0 after 187 usecs
[    1.438834] calling  fsnotify_mark_init+0x0/0x40 @ 1
[    1.440120] initcall fsnotify_mark_init+0x0/0x40 returned 0 after 40 usecs
[    1.441812] calling  dnotify_init+0x0/0x7b @ 1
[    1.442844] initcall dnotify_init+0x0/0x7b returned 0 after 229 usecs
[    1.444509] calling  inotify_user_setup+0x0/0x70 @ 1
[    1.445746] initcall inotify_user_setup+0x0/0x70 returned 0 after 24 usecs
[    1.447405] calling  aio_setup+0x0/0x7f @ 1
[    1.448508] initcall aio_setup+0x0/0x7f returned 0 after 24 usecs
[    1.449970] calling  proc_locks_init+0x0/0x22 @ 1
[    1.451120] initcall proc_locks_init+0x0/0x22 returned 0 after 13 usecs
[    1.452763] calling  init_mbcache+0x0/0x14 @ 1
[    1.453851] initcall init_mbcache+0x0/0x14 returned 0 after 6 usecs
[    1.455348] calling  dquot_init+0x0/0x11d @ 1
[    1.456446] VFS: Disk quotas dquot_6.5.2
[    1.457606] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    1.459171] initcall dquot_init+0x0/0x11d returned 0 after 2663 usecs
[    1.460798] calling  proc_cmdline_init+0x0/0x22 @ 1
[    1.461990] initcall proc_cmdline_init+0x0/0x22 returned 0 after 9 usecs
[    1.463599] calling  proc_consoles_init+0x0/0x22 @ 1
[    1.464865] initcall proc_consoles_init+0x0/0x22 returned 0 after 8 usecs
[    1.466498] calling  proc_cpuinfo_init+0x0/0x22 @ 1
[    1.467684] initcall proc_cpuinfo_init+0x0/0x22 returned 0 after 9 usecs
[    1.469341] calling  proc_devices_init+0x0/0x22 @ 1
[    1.470522] initcall proc_devices_init+0x0/0x22 returned 0 after 12 usecs
[    1.472221] calling  proc_interrupts_init+0x0/0x22 @ 1
[    1.473461] initcall proc_interrupts_init+0x0/0x22 returned 0 after 12 usecs
[    1.475151] calling  proc_loadavg_init+0x0/0x22 @ 1
[    1.476415] initcall proc_loadavg_init+0x0/0x22 returned 0 after 11 usecs
[    1.478045] calling  proc_meminfo_init+0x0/0x22 @ 1
[    1.479250] initcall proc_meminfo_init+0x0/0x22 returned 0 after 15 usecs
[    1.480939] calling  proc_stat_init+0x0/0x22 @ 1
[    1.482068] initcall proc_stat_init+0x0/0x22 returned 0 after 9 usecs
[    1.483620] calling  proc_uptime_init+0x0/0x22 @ 1
[    1.484841] initcall proc_uptime_init+0x0/0x22 returned 0 after 8 usecs
[    1.486430] calling  proc_version_init+0x0/0x22 @ 1
[    1.487612] initcall proc_version_init+0x0/0x22 returned 0 after 9 usecs
[    1.489274] calling  proc_softirqs_init+0x0/0x22 @ 1
[    1.490484] initcall proc_softirqs_init+0x0/0x22 returned 0 after 9 usecs
[    1.492161] calling  proc_kcore_init+0x0/0xb5 @ 1
[    1.493330] initcall proc_kcore_init+0x0/0xb5 returned 0 after 22 usecs
[    1.494923] calling  vmcore_init+0x0/0x192 @ 1
[    1.496051] initcall vmcore_init+0x0/0x192 returned 0 after 3 usecs
[    1.497570] calling  proc_kmsg_init+0x0/0x25 @ 1
[    1.498698] initcall proc_kmsg_init+0x0/0x25 returned 0 after 9 usecs
[    1.500277] calling  proc_page_init+0x0/0x42 @ 1
[    1.501429] initcall proc_page_init+0x0/0x42 returned 0 after 17 usecs
[    1.502982] calling  configfs_init+0x0/0xa7 @ 1
[    1.504173] initcall configfs_init+0x0/0xa7 returned 0 after 44 usecs
[    1.505725] calling  init_devpts_fs+0x0/0x62 @ 1
[    1.506959] initcall init_devpts_fs+0x0/0x62 returned 0 after 108 usecs
[    1.508611] calling  init_dlm+0x0/0x77 @ 1
[    1.510215] DLM installed
[    1.510873] initcall init_dlm+0x0/0x77 returned 0 after 1238 usecs
[    1.512432] calling  init_reiserfs_fs+0x0/0x6f @ 1
[    1.513733] initcall init_reiserfs_fs+0x0/0x6f returned 0 after 138 usecs
[    1.515385] calling  init_ext3_fs+0x0/0x7b @ 1
[    1.516760] initcall init_ext3_fs+0x0/0x7b returned 0 after 238 usecs
[    1.518322] calling  init_ext2_fs+0x0/0x7b @ 1
[    1.519537] initcall init_ext2_fs+0x0/0x7b returned 0 after 128 usecs
[    1.521140] calling  ext4_init_fs+0x0/0x214 @ 1
[    1.522930] initcall ext4_init_fs+0x0/0x214 returned 0 after 674 usecs
[    1.524564] calling  journal_init+0x0/0xb4 @ 1
[    1.525990] initcall journal_init+0x0/0xb4 returned 0 after 336 usecs
[    1.527546] calling  journal_init+0x0/0x11e @ 1
[    1.528839] initcall journal_init+0x0/0x11e returned 0 after 143 usecs
[    1.530381] calling  init_cramfs_fs+0x0/0x2e @ 1
[    1.531533] initcall init_cramfs_fs+0x0/0x2e returned 0 after 22 usecs
[    1.533150] calling  init_squashfs_fs+0x0/0x75 @ 1
[    1.534521] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    1.535953] initcall init_squashfs_fs+0x0/0x75 returned 0 after 1593 usecs
[    1.537700] calling  init_ramfs_fs+0x0/0x12 @ 1
[    1.538810] initcall init_ramfs_fs+0x0/0x12 returned 0 after 8 usecs
[    1.540372] calling  init_hugetlbfs_fs+0x0/0x17b @ 1
[    1.541866] initcall init_hugetlbfs_fs+0x0/0x17b returned 0 after 283 usecs
[    1.543551] calling  init_minix_fs+0x0/0x67 @ 1
[    1.544869] initcall init_minix_fs+0x0/0x67 returned 0 after 125 usecs
[    1.546449] calling  init_fat_fs+0x0/0x4c @ 1
[    1.547633] initcall init_fat_fs+0x0/0x4c returned 0 after 220 usecs
[    1.549274] calling  init_vfat_fs+0x0/0x12 @ 1
[    1.550369] initcall init_vfat_fs+0x0/0x12 returned 0 after 6 usecs
[    1.551873] calling  init_msdos_fs+0x0/0x12 @ 1
[    1.553021] initcall init_msdos_fs+0x0/0x12 returned 0 after 5 usecs
[    1.554563] calling  init_iso9660_fs+0x0/0x7d @ 1
[    1.555843] initcall init_iso9660_fs+0x0/0x7d returned 0 after 141 usecs
[    1.557523] calling  vxfs_init+0x0/0x5a @ 1
[    1.558681] initcall vxfs_init+0x0/0x5a returned 0 after 143 usecs
[    1.560214] calling  init_nfs_fs+0x0/0x195 @ 1
[    1.561343] FS-Cache: Netfs 'nfs' registered for caching
[    1.563134] initcall init_nfs_fs+0x0/0x195 returned 0 after 1790 usecs
[    1.564816] calling  init_nfs_v2+0x0/0x14 @ 1
[    1.565913] initcall init_nfs_v2+0x0/0x14 returned 0 after 28 usecs
[    1.567427] calling  init_nfs_v3+0x0/0x14 @ 1
[    1.568542] initcall init_nfs_v3+0x0/0x14 returned 0 after 4 usecs
[    1.570007] calling  init_nfs_v4+0x0/0x3b @ 1
[    1.571062] NFS: Registering the id_resolver key type
[    1.572385] Key type id_resolver registered
[    1.573394] Key type id_legacy registered
[    1.574390] initcall init_nfs_v4+0x0/0x3b returned 0 after 3248 usecs
[    1.575930] calling  init_nlm+0x0/0x47 @ 1
[    1.576994] initcall init_nlm+0x0/0x47 returned 0 after 17 usecs
[    1.578446] calling  init_nls_cp437+0x0/0x12 @ 1
[    1.579581] initcall init_nls_cp437+0x0/0x12 returned 0 after 16 usecs
[    1.581191] calling  init_nls_ascii+0x0/0x12 @ 1
[    1.582319] initcall init_nls_ascii+0x0/0x12 returned 0 after 5 usecs
[    1.583860] calling  init_nls_iso8859_1+0x0/0x12 @ 1
[    1.585116] initcall init_nls_iso8859_1+0x0/0x12 returned 0 after 4 usecs
[    1.586749] calling  init_nls_utf8+0x0/0x25 @ 1
[    1.587846] initcall init_nls_utf8+0x0/0x25 returned 0 after 5 usecs
[    1.589413] calling  init_cifs+0x0/0x3d0 @ 1
[    1.590847] initcall init_cifs+0x0/0x3d0 returned 0 after 407 usecs
[    1.592441] calling  init_ntfs_fs+0x0/0x1ce @ 1
[    1.593479] NTFS driver 2.1.30 [Flags: R/O].
[    1.594836] initcall init_ntfs_fs+0x0/0x1ce returned 0 after 1324 usecs
[    1.596495] calling  init_romfs_fs+0x0/0x84 @ 1
[    1.597601] ROMFS MTD (C) 2007 Red Hat, Inc.
[    1.598775] initcall init_romfs_fs+0x0/0x84 returned 0 after 1146 usecs
[    1.600435] calling  init_autofs4_fs+0x0/0x2a @ 1
[    1.601700] initcall init_autofs4_fs+0x0/0x2a returned 0 after 122 usecs
[    1.603317] calling  init_udf_fs+0x0/0x67 @ 1
[    1.604560] initcall init_udf_fs+0x0/0x67 returned 0 after 120 usecs
[    1.606087] calling  init_omfs_fs+0x0/0x12 @ 1
[    1.607174] initcall init_omfs_fs+0x0/0x12 returned 0 after 6 usecs
[    1.608729] calling  init_btrfs_fs+0x0/0xc3 @ 1
[    1.610835] Btrfs loaded
[    1.611467] initcall init_btrfs_fs+0x0/0xc3 returned 0 after 1600 usecs
[    1.613126] calling  init_gfs2_fs+0x0/0x343 @ 1
[    1.615057] GFS2 installed
[    1.615731] initcall init_gfs2_fs+0x0/0x343 returned 0 after 1479 usecs
[    1.617389] calling  init_pstore_fs+0x0/0x12 @ 1
[    1.618522] initcall init_pstore_fs+0x0/0x12 returned 0 after 7 usecs
[    1.620089] calling  ipc_init+0x0/0x2f @ 1
[    1.621102] msgmni has been set to 11874
[    1.622068] initcall ipc_init+0x0/0x2f returned 0 after 958 usecs
[    1.623523] calling  ipc_sysctl_init+0x0/0x14 @ 1
[    1.624743] initcall ipc_sysctl_init+0x0/0x14 returned 0 after 14 usecs
[    1.626342] calling  init_mqueue_fs+0x0/0xb3 @ 1
[    1.627785] initcall init_mqueue_fs+0x0/0xb3 returned 0 after 312 usecs
[    1.629412] calling  key_proc_init+0x0/0x33 @ 1
[    1.630526] initcall key_proc_init+0x0/0x33 returned 0 after 14 usecs
[    1.632111] calling  crypto_wq_init+0x0/0x38 @ 1
[    1.633283] initcall crypto_wq_init+0x0/0x38 returned 0 after 53 usecs
[    1.634856] calling  crypto_algapi_init+0x0/0xd @ 1
[    1.636091] initcall crypto_algapi_init+0x0/0xd returned 0 after 8 usecs
[    1.637699] calling  skcipher_module_init+0x0/0x36 @ 1
[    1.638944] initcall skcipher_module_init+0x0/0x36 returned 0 after 4 usecs
[    1.640663] calling  chainiv_module_init+0x0/0x12 @ 1
[    1.641917] initcall chainiv_module_init+0x0/0x12 returned 0 after 30 usecs
[    1.643583] calling  eseqiv_module_init+0x0/0x12 @ 1
[    1.644842] initcall eseqiv_module_init+0x0/0x12 returned 0 after 5 usecs
[    1.646479] calling  hmac_module_init+0x0/0x12 @ 1
[    1.647644] initcall hmac_module_init+0x0/0x12 returned 0 after 6 usecs
[    1.649277] calling  md4_mod_init+0x0/0x12 @ 1
[    1.650486] initcall md4_mod_init+0x0/0x12 returned 0 after 152 usecs
[    1.651935] calling  md5_mod_init+0x0/0x12 @ 1
[    1.653191] initcall md5_mod_init+0x0/0x12 returned 0 after 84 usecs
[    1.654727] calling  sha1_generic_mod_init+0x0/0x12 @ 1
[    1.656164] initcall sha1_generic_mod_init+0x0/0x12 returned 0 after 86 usecs
[    1.657887] calling  sha256_generic_mod_init+0x0/0x17 @ 1
[    1.659008] initcall sha256_generic_mod_init+0x0/0x17 returned 0 after 121 usecs
[    1.660858] calling  crypto_ecb_module_init+0x0/0x12 @ 1
[    1.662151] initcall crypto_ecb_module_init+0x0/0x12 returned 0 after 6 usecs
[    1.663859] calling  crypto_cbc_module_init+0x0/0x12 @ 1
[    1.665205] initcall crypto_cbc_module_init+0x0/0x12 returned 0 after 5 usecs
[    1.666914] calling  des_generic_mod_init+0x0/0x17 @ 1
[    1.668328] initcall des_generic_mod_init+0x0/0x17 returned 0 after 122 usecs
[    1.670041] calling  aes_init+0x0/0x12 @ 1
[    1.671097] initcall aes_init+0x0/0x12 returned 0 after 58 usecs
[    1.672604] calling  arc4_init+0x0/0x17 @ 1
[    1.673732] initcall arc4_init+0x0/0x17 returned 0 after 111 usecs
[    1.675234] calling  michael_mic_init+0x0/0x12 @ 1
[    1.676521] initcall michael_mic_init+0x0/0x12 returned 0 after 62 usecs
[    1.678129] calling  crc32c_mod_init+0x0/0x12 @ 1
[    1.679329] initcall crc32c_mod_init+0x0/0x12 returned 0 after 58 usecs
[    1.680974] calling  krng_mod_init+0x0/0x12 @ 1
[    1.682180] initcall krng_mod_init+0x0/0x12 returned 0 after 94 usecs
[    1.683711] calling  async_tx_init+0x0/0x1b @ 1
[    1.684913] async_tx: api initialized (async)
[    1.685980] initcall async_tx_init+0x0/0x1b returned 0 after 1064 usecs
[    1.687649] calling  async_pq_init+0x0/0x3b @ 1
[    1.688796] initcall async_pq_init+0x0/0x3b returned 0 after 6 usecs
[    1.690328] calling  proc_genhd_init+0x0/0x3c @ 1
[    1.691486] initcall proc_genhd_init+0x0/0x3c returned 0 after 17 usecs
[    1.693134] calling  bsg_init+0x0/0x16f @ 1
[    1.694411] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    1.696265] initcall bsg_init+0x0/0x16f returned 0 after 2063 usecs
[    1.697765] calling  throtl_init+0x0/0x49 @ 1
[    1.698909] initcall throtl_init+0x0/0x49 returned 0 after 88 usecs
[    1.700483] calling  noop_init+0x0/0x12 @ 1
[    1.701516] io scheduler noop registered
[    1.702477] initcall noop_init+0x0/0x12 returned 0 after 950 usecs
[    1.703955] calling  deadline_init+0x0/0x12 @ 1
[    1.705111] io scheduler deadline registered
[    1.706149] initcall deadline_init+0x0/0x12 returned 0 after 1014 usecs
[    1.707736] calling  cfq_init+0x0/0xb5 @ 1
[    1.709045] io scheduler cfq registered (default)
[    1.710168] initcall cfq_init+0x0/0xb5 returned 0 after 1360 usecs
[    1.711640] calling  plist_test+0x0/0x145 @ 1
[    1.713840] initcall plist_test+0x0/0x145 returned 0 after 1044 usecs
[    1.715299] calling  libcrc32c_mod_init+0x0/0x2a @ 1
[    1.716549] initcall libcrc32c_mod_init+0x0/0x2a returned 0 after 15 usecs
[    1.718200] calling  percpu_counter_startup+0x0/0x19 @ 1
[    1.719496] initcall percpu_counter_startup+0x0/0x19 returned 0 after 9 usecs
[    1.721467] calling  bgpio_driver_init+0x0/0x12 @ 1
[    1.722810] initcall bgpio_driver_init+0x0/0x12 returned 0 after 147 usecs
[    1.724534] calling  amd_gpio_init+0x0/0x170 @ 1
[    1.725664] initcall amd_gpio_init+0x0/0x170 returned -19 after 11 usecs
[    1.727274] calling  ichx_gpio_driver_init+0x0/0x12 @ 1
[    1.728682] initcall ichx_gpio_driver_init+0x0/0x12 returned 0 after 93 usecs
[    1.730416] calling  it8761e_gpio_init+0x0/0x19e @ 1
[    1.731695] initcall it8761e_gpio_init+0x0/0x19e returned -19 after 72 usecs
[    1.733087] calling  pci_proc_init+0x0/0x69 @ 1
[    1.734227] initcall pci_proc_init+0x0/0x69 returned 0 after 35 usecs
[    1.735784] calling  pcie_portdrv_init+0x0/0x77 @ 1
[    1.737203] initcall pcie_portdrv_init+0x0/0x77 returned 0 after 184 usecs
[    1.738873] calling  aer_service_init+0x0/0x22 @ 1
[    1.740182] initcall aer_service_init+0x0/0x22 returned 0 after 101 usecs
[    1.741821] calling  aer_inject_init+0x0/0x12 @ 1
[    1.743110] initcall aer_inject_init+0x0/0x12 returned 0 after 151 usecs
[    1.744820] calling  pcie_pme_service_init+0x0/0x12 @ 1
[    1.746231] initcall pcie_pme_service_init+0x0/0x12 returned 0 after 136 usecs
[    1.747979] calling  ioapic_init+0x0/0x1b @ 1
[    1.748987] initcall ioapic_init+0x0/0x1b returned 0 after 119 usecs
[    1.750518] calling  pci_hotplug_init+0x0/0x4c @ 1
[    1.751681] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    1.753099] initcall pci_hotplug_init+0x0/0x4c returned 0 after 1377 usecs
[    1.754759] calling  pcied_init+0x0/0xaf @ 1
[    1.756170] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    1.757775] initcall pcied_init+0x0/0xaf returned 0 after 1926 usecs
[    1.759292] calling  zt5550_init+0x0/0x80 @ 1
[    1.760403] cpcihp_zt5550: ZT5550 CompactPCI Hot Plug Driver version: 0.2
[    1.762141] initcall zt5550_init+0x0/0x80 returned 0 after 1699 usecs
[    1.763695] calling  cpcihp_generic_init+0x0/0x449 @ 1
[    1.765006] cpcihp_generic: Generic port I/O CompactPCI Hot Plug Driver version: 0.1
[    1.766866] cpcihp_generic: not configured, disabling.
[    1.767943] initcall cpcihp_generic_init+0x0/0x449 returned -22 after 2871 usecs
[    1.769765] initcall cpcihp_generic_init+0x0/0x449 returned with error code -22
[    1.771529] calling  shpcd_init+0x0/0xf5 @ 1
[    1.772772] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    1.774394] initcall shpcd_init+0x0/0xf5 returned 0 after 1732 usecs
[    1.775927] calling  acpiphp_init+0x0/0x50 @ 1
[    1.777097] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    1.778939] pci_bus 0000:00: dev 03, created physical slot 3
[    1.780548] acpiphp: Slot [3] registered
[    1.781548] pci_bus 0000:00: dev 04, created physical slot 4
[    1.782994] acpiphp: Slot [4] registered
[    1.783987] pci_bus 0000:00: dev 05, created physical slot 5
[    1.785518] acpiphp: Slot [5] registered
[    1.786504] pci_bus 0000:00: dev 06, created physical slot 6
[    1.787950] acpiphp: Slot [6] registered
[    1.788997] pci_bus 0000:00: dev 07, created physical slot 7
[    1.790456] acpiphp: Slot [7] registered
[    1.791444] pci_bus 0000:00: dev 08, created physical slot 8
[    1.792958] acpiphp: Slot [8] registered
[    1.793946] pci_bus 0000:00: dev 09, created physical slot 9
[    1.795411] acpiphp: Slot [9] registered
[    1.796460] pci_bus 0000:00: dev 0a, created physical slot 10
[    1.797932] acpiphp: Slot [10] registered
[    1.798946] pci_bus 0000:00: dev 0b, created physical slot 11
[    1.800490] acpiphp: Slot [11] registered
[    1.801509] pci_bus 0000:00: dev 0c, created physical slot 12
[    1.803003] acpiphp: Slot [12] registered
[    1.804097] pci_bus 0000:00: dev 0d, created physical slot 13
[    1.805638] acpiphp: Slot [13] registered
[    1.806669] pci_bus 0000:00: dev 0e, created physical slot 14
[    1.808234] acpiphp: Slot [14] registered
[    1.809261] pci_bus 0000:00: dev 0f, created physical slot 15
[    1.810729] acpiphp: Slot [15] registered
[    1.811740] pci_bus 0000:00: dev 10, created physical slot 16
[    1.813290] acpiphp: Slot [16] registered
[    1.814305] pci_bus 0000:00: dev 11, created physical slot 17
[    1.815762] acpiphp: Slot [17] registered
[    1.816833] pci_bus 0000:00: dev 12, created physical slot 18
[    1.818313] acpiphp: Slot [18] registered
[    1.819313] pci_bus 0000:00: dev 13, created physical slot 19
[    1.820852] acpiphp: Slot [19] registered
[    1.821866] pci_bus 0000:00: dev 14, created physical slot 20
[    1.823328] acpiphp: Slot [20] registered
[    1.824404] pci_bus 0000:00: dev 15, created physical slot 21
[    1.825879] acpiphp: Slot [21] registered
[    1.826889] pci_bus 0000:00: dev 16, created physical slot 22
[    1.828416] acpiphp: Slot [22] registered
[    1.829432] pci_bus 0000:00: dev 17, created physical slot 23
[    1.830938] acpiphp: Slot [23] registered
[    1.831977] pci_bus 0000:00: dev 18, created physical slot 24
[    1.833512] acpiphp: Slot [24] registered
[    1.834543] pci_bus 0000:00: dev 19, created physical slot 25
[    1.836131] acpiphp: Slot [25] registered
[    1.837169] pci_bus 0000:00: dev 1a, created physical slot 26
[    1.838666] acpiphp: Slot [26] registered
[    1.839685] pci_bus 0000:00: dev 1b, created physical slot 27
[    1.840974] acpiphp: Slot [27] registered
[    1.841989] pci_bus 0000:00: dev 1c, created physical slot 28
[    1.843467] acpiphp: Slot [28] registered
[    1.844557] pci_bus 0000:00: dev 1d, created physical slot 29
[    1.846026] acpiphp: Slot [29] registered
[    1.847058] pci_bus 0000:00: dev 1e, created physical slot 30
[    1.848571] acpiphp: Slot [30] registered
[    1.849545] pci_bus 0000:00: dev 1f, created physical slot 31
[    1.850982] acpiphp: Slot [31] registered
[    1.852506] initcall acpiphp_init+0x0/0x50 returned 0 after 73636 usecs
[    1.854024] calling  ibm_acpiphp_init+0x0/0x188 @ 1
[    1.856983] acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed
[    1.858491] initcall ibm_acpiphp_init+0x0/0x188 returned -19 after 3250 usecs
[    1.860128] calling  pci_stub_init+0x0/0x140 @ 1
[    1.861238] initcall pci_stub_init+0x0/0x140 returned 0 after 152 usecs
[    1.862752] calling  tsi721_init+0x0/0x1b @ 1
[    1.863993] initcall tsi721_init+0x0/0x1b returned 0 after 138 usecs
[    1.865652] calling  fb_console_init+0x0/0x112 @ 1
[    1.867103] initcall fb_console_init+0x0/0x112 returned 0 after 268 usecs
[    1.868839] calling  genericbl_driver_init+0x0/0x12 @ 1
[    1.870293] initcall genericbl_driver_init+0x0/0x12 returned 0 after 127 usecs
[    1.872087] calling  acpi_reserve_resources+0x0/0xeb @ 1
[    1.873388] initcall acpi_reserve_resources+0x0/0xeb returned 0 after 8 usecs
[    1.875113] calling  irqrouter_init_ops+0x0/0x29 @ 1
[    1.876373] initcall irqrouter_init_ops+0x0/0x29 returned 0 after 5 usecs
[    1.878013] calling  acpi_ac_init+0x0/0x30 @ 1
[    1.879227] initcall acpi_ac_init+0x0/0x30 returned 0 after 120 usecs
[    1.880840] calling  acpi_button_driver_init+0x0/0x12 @ 1
[    1.882360] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    1.884230] ACPI: Power Button [PWRF]
[    1.885277] initcall acpi_button_driver_init+0x0/0x12 returned 0 after 3053 usecs
[    1.887079] calling  acpi_fan_driver_init+0x0/0x12 @ 1
[    1.888492] initcall acpi_fan_driver_init+0x0/0x12 returned 0 after 102 usecs
[    1.890182] calling  acpi_video_init+0x0/0x91 @ 1
[    1.891499] initcall acpi_video_init+0x0/0x91 returned 0 after 163 usecs
[    1.893222] calling  acpi_processor_init+0x0/0x6d @ 1
[    1.897447] initcall acpi_processor_init+0x0/0x6d returned 0 after 2906 usecs
[    1.898199] calling  acpi_container_init+0x0/0x4a @ 1
[    1.900662] initcall acpi_container_init+0x0/0x4a returned 0 after 1891 usecs
[    1.901380] calling  acpi_thermal_init+0x0/0x42 @ 1
[    1.901987] initcall acpi_thermal_init+0x0/0x42 returned 0 after 111 usecs
[    1.902925] calling  acpi_memory_device_init+0x0/0x71 @ 1
[    1.906021] initcall acpi_memory_device_init+0x0/0x71 returned 0 after 1994 usecs
[    1.907733] calling  acpi_battery_init+0x0/0x16 @ 1
[    1.908957] initcall acpi_battery_init+0x0/0x16 returned 0 after 24 usecs
[    1.910509] calling  acpi_hed_driver_init+0x0/0x12 @ 1
[    1.911770] calling  1_acpi_battery_init_async+0x0/0x1b @ 6
[    1.913364] initcall acpi_hed_driver_init+0x0/0x12 returned 0 after 1626 usecs
[    1.915011] calling  bgrt_init+0x0/0xbb @ 1
[    1.915991] initcall bgrt_init+0x0/0xbb returned -19 after 6 usecs
[    1.917462] calling  acpi_pad_init+0x0/0xca @ 1
[    1.918570] initcall acpi_pad_init+0x0/0xca returned -22 after 6 usecs
[    1.920141] initcall acpi_pad_init+0x0/0xca returned with error code -22
[    1.921628] calling  ioat_init_module+0x0/0x7f @ 1
[    1.922759] ioatdma: Intel(R) QuickData Technology Driver 4.00
[    1.924456] initcall 1_acpi_battery_init_async+0x0/0x1b returned 0 after 11067 usecs
[    1.926494] initcall ioat_init_module+0x0/0x7f returned 0 after 3642 usecs
[    1.928198] calling  td_driver_init+0x0/0x12 @ 1
[    1.929470] initcall td_driver_init+0x0/0x12 returned 0 after 124 usecs
[    1.931057] calling  pch_dma_init+0x0/0x1b @ 1
[    1.932060] initcall pch_dma_init+0x0/0x1b returned 0 after 159 usecs
[    1.933606] calling  virtio_pci_init+0x0/0x1b @ 1
[    1.934858] initcall virtio_pci_init+0x0/0x1b returned 0 after 103 usecs
[    1.936541] calling  init+0x0/0x12 @ 1
[    1.937484] initcall init+0x0/0x12 returned 0 after 97 usecs
[    1.938844] calling  pty_init+0x0/0x3a3 @ 1
[    1.985004] initcall pty_init+0x0/0x3a3 returned 0 after 44090 usecs
[    1.986514] calling  sysrq_init+0x0/0x78 @ 1
[    1.987570] initcall sysrq_init+0x0/0x78 returned 0 after 16 usecs
[    1.989139] calling  serial8250_init+0x0/0x178 @ 1
[    1.990306] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    2.013382] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    2.016119] initcall serial8250_init+0x0/0x178 returned 0 after 25201 usecs
[    2.017644] calling  serial_pci_driver_init+0x0/0x1b @ 1
[    2.019093] initcall serial_pci_driver_init+0x0/0x1b returned 0 after 136 usecs
[    2.020921] calling  jsm_init_module+0x0/0x45 @ 1
[    2.022208] initcall jsm_init_module+0x0/0x45 returned 0 after 135 usecs
[    2.023825] calling  rand_initialize+0x0/0x40 @ 1
[    2.025073] initcall rand_initialize+0x0/0x40 returned 0 after 43 usecs
[    2.026671] calling  raw_init+0x0/0x141 @ 1
[    2.027915] initcall raw_init+0x0/0x141 returned 0 after 214 usecs
[    2.029503] calling  nvram_init+0x0/0x7f @ 1
[    2.030731] Non-volatile memory driver v1.3
[    2.031762] initcall nvram_init+0x0/0x7f returned 0 after 1193 usecs
[    2.033389] calling  mod_init+0x0/0x218 @ 1
[    2.034488] initcall mod_init+0x0/0x218 returned -19 after 66 usecs
[    2.036091] calling  mod_init+0x0/0x12f @ 1
[    2.037129] initcall mod_init+0x0/0x12f returned -19 after 10 usecs
[    2.038640] calling  mod_init+0x0/0x51 @ 1
[    2.039643] initcall mod_init+0x0/0x51 returned -19 after 4 usecs
[    2.041161] calling  init+0x0/0x12 @ 1
[    2.042182] initcall init+0x0/0x12 returned 0 after 100 usecs
[    2.043547] calling  rng_init+0x0/0x12 @ 1
[    2.044903] initcall rng_init+0x0/0x12 returned 0 after 251 usecs
[    2.046449] calling  agp_init+0x0/0x29 @ 1
[    2.047488] Linux agpgart interface v0.103
[    2.048559] initcall agp_init+0x0/0x29 returned 0 after 1038 usecs
[    2.049979] calling  agp_amd64_mod_init+0x0/0x21 @ 1
[    2.051481] initcall agp_amd64_mod_init+0x0/0x21 returned -19 after 285 usecs
[    2.053283] calling  agp_intel_init+0x0/0x2c @ 1
[    2.054758] initcall agp_intel_init+0x0/0x2c returned 0 after 334 usecs
[    2.056436] calling  init_tis+0x0/0x9c @ 1
[    2.057540] initcall init_tis+0x0/0x9c returned 0 after 96 usecs
[    2.058985] calling  init_nsc+0x0/0x1cd @ 1
[    2.059787] initcall init_nsc+0x0/0x1cd returned -19 after 30 usecs
[    2.061365] calling  init_atmel+0x0/0x187 @ 1
[    2.062606] initcall init_atmel+0x0/0x187 returned -19 after 178 usecs
[    2.064238] calling  drm_core_init+0x0/0x136 @ 1
[    2.065481] [drm] Initialized drm 1.1.0 20060810
[    2.066679] initcall drm_core_init+0x0/0x136 returned 0 after 1272 usecs
[    2.068354] calling  ttm_init+0x0/0x62 @ 1
[    2.069490] initcall ttm_init+0x0/0x62 returned 0 after 130 usecs
[    2.070961] calling  mga_init+0x0/0x25 @ 1
[    2.071969] initcall mga_init+0x0/0x25 returned 0 after 14 usecs
[    2.073466] calling  i915_init+0x0/0x68 @ 1
[    2.074674] initcall i915_init+0x0/0x68 returned 0 after 167 usecs
[    2.076299] calling  mgag200_init+0x0/0x3b @ 1
[    2.077565] initcall mgag200_init+0x0/0x3b returned 0 after 151 usecs
[    2.079122] calling  nouveau_drm_init+0x0/0x4f @ 1
[    2.080509] initcall nouveau_drm_init+0x0/0x4f returned 0 after 144 usecs
[    2.082157] calling  udl_init+0x0/0x19 @ 1
[    2.083255] usbcore: registered new interface driver udl
[    2.084624] initcall udl_init+0x0/0x19 returned 0 after 1427 usecs
[    2.086113] calling  sil164_init+0x0/0x14 @ 1
[    2.087281] initcall sil164_init+0x0/0x14 returned 0 after 106 usecs
[    2.088878] calling  cn_proc_init+0x0/0x3a @ 1
[    2.089989] initcall cn_proc_init+0x0/0x3a returned 0 after 26 usecs
[    2.091547] calling  topology_sysfs_init+0x0/0x5b @ 1
[    2.092833] initcall topology_sysfs_init+0x0/0x5b returned 0 after 17 usecs
[    2.094564] calling  brd_init+0x0/0x1a6 @ 1
[    2.099878] brd: module loaded
[    2.100502] initcall brd_init+0x0/0x1a6 returned 0 after 4795 usecs
[    2.102022] calling  loop_init+0x0/0x135 @ 1
[    2.105920] loop: module loaded
[    2.106733] initcall loop_init+0x0/0x135 returned 0 after 3597 usecs
[    2.108418] calling  nvme_init+0x0/0xa8 @ 1
[    2.109748] initcall nvme_init+0x0/0xa8 returned 0 after 291 usecs
[    2.111212] calling  init+0x0/0x8b @ 1
[    2.112362] initcall init+0x0/0x8b returned 0 after 130 usecs
[    2.113750] calling  mtip_init+0x0/0xd0 @ 1
[    2.114780] mtip32xx Version 1.2.6os3
[    2.115782] initcall mtip_init+0x0/0xd0 returned 0 after 978 usecs
[    2.117349] calling  tifm_7xx1_driver_init+0x0/0x1b @ 1
[    2.118733] initcall tifm_7xx1_driver_init+0x0/0x1b returned 0 after 113 usecs
[    2.120548] calling  enclosure_init+0x0/0x19 @ 1
[    2.121763] initcall enclosure_init+0x0/0x19 returned 0 after 91 usecs
[    2.123332] calling  at24_init+0x0/0x46 @ 1
[    2.124514] initcall at24_init+0x0/0x46 returned 0 after 91 usecs
[    2.125982] calling  cb710_init_module+0x0/0x1b @ 1
[    2.127275] initcall cb710_init_module+0x0/0x1b returned 0 after 105 usecs
[    2.129028] calling  lpc_sch_driver_init+0x0/0x1b @ 1
[    2.130397] initcall lpc_sch_driver_init+0x0/0x1b returned 0 after 96 usecs
[    2.132127] calling  lpc_ich_init+0x0/0x1b @ 1
[    2.133322] initcall lpc_ich_init+0x0/0x1b returned 0 after 135 usecs
[    2.134900] calling  scsi_tgt_init+0x0/0x8f @ 1
[    2.136345] initcall scsi_tgt_init+0x0/0x8f returned 0 after 223 usecs
[    2.137968] calling  raid_init+0x0/0x12 @ 1
[    2.139144] initcall raid_init+0x0/0x12 returned 0 after 120 usecs
[    2.140723] calling  spi_transport_init+0x0/0x7a @ 1
[    2.142118] initcall spi_transport_init+0x0/0x7a returned 0 after 173 usecs
[    2.143618] calling  fc_transport_init+0x0/0x8a @ 1
[    2.145174] initcall fc_transport_init+0x0/0x8a returned 0 after 278 usecs
[    2.146839] calling  iscsi_transport_init+0x0/0x18c @ 1
[    2.148162] Loading iSCSI transport class v2.0-870.
[    2.149802] initcall iscsi_transport_init+0x0/0x18c returned 0 after 1602 usecs
[    2.151569] calling  sas_transport_init+0x0/0xcc @ 1
[    2.153283] initcall sas_transport_init+0x0/0xcc returned 0 after 423 usecs
[    2.154964] calling  sas_class_init+0x0/0x34 @ 1
[    2.156165] initcall sas_class_init+0x0/0x34 returned 0 after 16 usecs
[    2.157735] calling  srp_transport_init+0x0/0x3c @ 1
[    2.159091] initcall srp_transport_init+0x0/0x3c returned 0 after 153 usecs
[    2.160844] calling  scsi_dh_init+0x0/0x3c @ 1
[    2.161929] initcall scsi_dh_init+0x0/0x3c returned 0 after 6 usecs
[    2.163409] calling  rdac_init+0x0/0x7b @ 1
[    2.164531] rdac: device handler registered
[    2.165662] initcall rdac_init+0x0/0x7b returned 0 after 1133 usecs
[    2.167228] calling  libfc_init+0x0/0x3b @ 1
[    2.168359] initcall libfc_init+0x0/0x3b returned 0 after 126 usecs
[    2.169881] calling  fcoe_init+0x0/0x1d5 @ 1
[    2.171070] initcall fcoe_init+0x0/0x1d5 returned 0 after 158 usecs
[    2.172657] calling  libfcoe_init+0x0/0x2a @ 1
[    2.173873] initcall libfcoe_init+0x0/0x2a returned 0 after 127 usecs
[    2.175438] calling  iscsi_sw_tcp_init+0x0/0x42 @ 1
[    2.176855] iscsi: registered transport (tcp)
[    2.177928] initcall iscsi_sw_tcp_init+0x0/0x42 returned 0 after 1188 usecs
[    2.179602] calling  ahc_linux_init+0x0/0x63 @ 1
[    2.180900] initcall ahc_linux_init+0x0/0x63 returned 0 after 113 usecs
[    2.182500] calling  ahd_linux_init+0x0/0x7e @ 1
[    2.183719] initcall ahd_linux_init+0x0/0x7e returned 0 after 97 usecs
[    2.185362] calling  aac_init+0x0/0x79 @ 1
[    2.186389] Adaptec aacraid driver 1.2-0[29800]-ms
[    2.187661] initcall aac_init+0x0/0x79 returned 0 after 1243 usecs
[    2.189211] calling  aic94xx_init+0x0/0x150 @ 1
[    2.190312] aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.3 loaded
[    2.192127] initcall aic94xx_init+0x0/0x150 returned 0 after 1767 usecs
[    2.193700] calling  isci_init+0x0/0x70 @ 1
[    2.194741] isci: Intel(R) C600 SAS Controller Driver - version 1.1.0
[    2.196584] initcall isci_init+0x0/0x70 returned 0 after 1793 usecs
[    2.198112] calling  qla1280_init+0x0/0x1b @ 1
[    2.199358] initcall qla1280_init+0x0/0x1b returned 0 after 156 usecs
[    2.200997] calling  qla2x00_module_init+0x0/0x215 @ 1
[    2.202275] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 8.04.00.07-k.
[    2.204409] initcall qla2x00_module_init+0x0/0x215 returned 0 after 2112 usecs
[    2.206148] calling  qla4xxx_module_init+0x0/0xe8 @ 1
[    2.207497] iscsi: registered transport (qla4xxx)
[    2.208794] QLogic iSCSI HBA Driver
[    2.209661] initcall qla4xxx_module_init+0x0/0xe8 returned 0 after 2221 usecs
[    2.211381] calling  lpfc_init+0x0/0xfb @ 1
[    2.212460] Emulex LightPulse Fibre Channel SCSI driver 8.3.35
[    2.213865] Copyright(c) 2004-2009 Emulex.  All rights reserved.
[    2.215534] initcall lpfc_init+0x0/0xfb returned 0 after 3006 usecs
[    2.217116] calling  bfad_init+0x0/0xa7 @ 1
[    2.218135] Brocade BFA FC/FCOE SCSI driver - version: 3.1.2.1
[    2.219554] initcall bfad_init+0x0/0xa7 returned 0 after 1386 usecs
[    2.221125] calling  sym2_init+0x0/0xff @ 1
[    2.222256] initcall sym2_init+0x0/0xff returned 0 after 103 usecs
[    2.223721] calling  megaraid_init+0x0/0xb9 @ 1
[    2.225109] initcall megaraid_init+0x0/0xb9 returned 0 after 170 usecs
[    2.226716] calling  mraid_mm_init+0x0/0x8e @ 1
[    2.227829] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
[    2.229832] initcall mraid_mm_init+0x0/0x8e returned 0 after 1947 usecs
[    2.231438] calling  megaraid_init+0x0/0x95 @ 1
[    2.232606] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
[    2.234418] initcall megaraid_init+0x0/0x95 returned 0 after 1771 usecs
[    2.236126] calling  megasas_init+0x0/0x1c8 @ 1
[    2.237226] megasas: 06.504.01.00-rc1 Mon. Oct. 1 17:00:00 PDT 2012
[    2.238785] tsc: Refined TSC clocksource calibration: 2491.905 MHz
[    2.240336] Switching to clocksource tsc
[    2.241398] initcall megasas_init+0x0/0x1c8 returned 0 after 4072 usecs
[    2.242993] calling  _scsih_init+0x0/0x188 @ 1
[    2.244100] mpt2sas version 14.100.00.00 loaded
[    2.245402] initcall _scsih_init+0x0/0x188 returned 0 after 1272 usecs
[    2.246980] calling  ufshcd_pci_driver_init+0x0/0x1b @ 1
[    2.248388] initcall ufshcd_pci_driver_init+0x0/0x1b returned 0 after 83 usecs
[    2.250132] calling  mvs_init+0x0/0xa1 @ 1
[    2.251215] initcall mvs_init+0x0/0xa1 returned 0 after 96 usecs
[    2.252708] calling  init_st+0x0/0x18f @ 1
[    2.253684] st: Version 20101219, fixed bufsize 32768, s/g segs 256
[    2.255461] initcall init_st+0x0/0x18f returned 0 after 1737 usecs
[    2.257052] calling  init_sd+0x0/0x133 @ 1
[    2.258238] initcall init_sd+0x0/0x133 returned 0 after 193 usecs
[    2.259729] calling  init_sr+0x0/0x46 @ 1
[    2.260859] initcall init_sr+0x0/0x46 returned 0 after 98 usecs
[    2.262295] calling  init_sg+0x0/0x12c @ 1
[    2.263407] initcall init_sg+0x0/0x12c returned 0 after 118 usecs
[    2.264931] calling  init_ch_module+0x0/0xb5 @ 1
[    2.266049] SCSI Media Changer driver v0.25
[    2.267216] initcall init_ch_module+0x0/0xb5 returned 0 after 1140 usecs
[    2.268877] calling  ses_init+0x0/0x3c @ 1
[    2.269957] initcall ses_init+0x0/0x3c returned 0 after 77 usecs
[    2.271416] calling  osd_uld_init+0x0/0xc3 @ 1
[    2.272653] osd: LOADED open-osd 0.2.1
[    2.273565] initcall osd_uld_init+0x0/0xc3 returned 0 after 1017 usecs
[    2.275220] calling  ahci_pci_driver_init+0x0/0x1b @ 1
[    2.276597] initcall ahci_pci_driver_init+0x0/0x1b returned 0 after 95 usecs
[    2.278296] calling  arasan_cf_driver_init+0x0/0x12 @ 1
[    2.279629] initcall arasan_cf_driver_init+0x0/0x12 returned 0 after 72 usecs
[    2.281390] calling  piix_init+0x0/0x29 @ 1
[    2.282418] ata_piix 0000:00:01.1: version 2.13
[    2.283596] ata_piix 0000:00:01.1: enabling bus mastering
[    2.285019] ata_piix 0000:00:01.1: setting latency timer to 64
[    2.287908] scsi0 : ata_piix
[    2.288884] scsi1 : ata_piix
[    2.289821] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc040 irq 14
[    2.291484] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc048 irq 15
[    2.293229] calling  2_async_port_probe+0x0/0x60 @ 15
[    2.294775] calling  3_async_port_probe+0x0/0x60 @ 6
[    2.296338] initcall piix_init+0x0/0x29 returned 0 after 13608 usecs
[    2.297885] calling  mv_init+0x0/0x45 @ 1
[    2.299009] initcall mv_init+0x0/0x45 returned 0 after 149 usecs
[    2.300512] calling  nv_pci_driver_init+0x0/0x1b @ 1
[    2.301797] initcall nv_pci_driver_init+0x0/0x1b returned 0 after 86 usecs
[    2.303457] calling  sil_pci_driver_init+0x0/0x1b @ 1
[    2.304807] initcall sil_pci_driver_init+0x0/0x1b returned 0 after 91 usecs
[    2.306491] calling  k2_sata_pci_driver_init+0x0/0x1b @ 1
[    2.307876] initcall k2_sata_pci_driver_init+0x0/0x1b returned 0 after 82 usecs
[    2.309688] calling  svia_pci_driver_init+0x0/0x1b @ 1
[    2.311018] initcall svia_pci_driver_init+0x0/0x1b returned 0 after 87 usecs
[    2.312747] calling  amd_pci_driver_init+0x0/0x1b @ 1
[    2.314098] initcall amd_pci_driver_init+0x0/0x1b returned 0 after 139 usecs
[    2.315856] calling  atiixp_pci_driver_init+0x0/0x1b @ 1
[    2.317357] initcall atiixp_pci_driver_init+0x0/0x1b returned 0 after 120 usecs
[    2.319125] calling  oldpiix_pci_driver_init+0x0/0x1b @ 1
[    2.320624] initcall oldpiix_pci_driver_init+0x0/0x1b returned 0 after 121 usecs
[    2.322421] calling  serverworks_pci_driver_init+0x0/0x1b @ 1
[    2.323891] initcall serverworks_pci_driver_init+0x0/0x1b returned 0 after 89 usecs
[    2.325795] calling  via_pci_driver_init+0x0/0x1b @ 1
[    2.327109] initcall via_pci_driver_init+0x0/0x1b returned 0 after 88 usecs
[    2.328831] calling  mpiix_pci_driver_init+0x0/0x1b @ 1
[    2.330185] initcall mpiix_pci_driver_init+0x0/0x1b returned 0 after 87 usecs
[    2.331903] calling  legacy_init+0x0/0xa0e @ 1
[    2.333025] initcall legacy_init+0x0/0xa0e returned -19 after 6 usecs
[    2.334583] calling  macvlan_init_module+0x0/0x3a @ 1
[    2.335807] initcall macvlan_init_module+0x0/0x3a returned 0 after 4 usecs
[    2.337499] calling  macvtap_init+0x0/0x100 @ 1
[    2.338683] initcall macvtap_init+0x0/0x100 returned 0 after 83 usecs
[    2.340271] calling  net_olddevs_init+0x0/0x85 @ 1
[    2.341429] initcall net_olddevs_init+0x0/0x85 returned 0 after 2 usecs
[    2.342996] calling  marvell_init+0x0/0x17 @ 1
[    2.344787] initcall marvell_init+0x0/0x17 returned 0 after 673 usecs
[    2.346416] calling  davicom_init+0x0/0x17 @ 1
[    2.347761] initcall davicom_init+0x0/0x17 returned 0 after 235 usecs
[    2.349402] calling  cicada_init+0x0/0x17 @ 1
[    2.350635] initcall cicada_init+0x0/0x17 returned 0 after 171 usecs
[    2.352205] calling  lxt_init+0x0/0x17 @ 1
[    2.353471] initcall lxt_init+0x0/0x17 returned 0 after 255 usecs
[    2.354953] calling  qs6612_init+0x0/0x12 @ 1
[    2.356146] initcall qs6612_init+0x0/0x12 returned 0 after 73 usecs
[    2.357672] calling  smsc_init+0x0/0x17 @ 1
[    2.358985] initcall smsc_init+0x0/0x17 returned 0 after 295 usecs
[    2.360522] calling  vsc82xx_init+0x0/0x17 @ 1
[    2.361682] initcall vsc82xx_init+0x0/0x17 returned 0 after 133 usecs
[    2.363236] calling  broadcom_init+0x0/0x17 @ 1
[    2.365023] initcall broadcom_init+0x0/0x17 returned 0 after 632 usecs
[    2.366604] calling  icplus_init+0x0/0x17 @ 1
[    2.367846] initcall icplus_init+0x0/0x17 returned 0 after 185 usecs
[    2.369438] calling  realtek_init+0x0/0x12 @ 1
[    2.370600] initcall realtek_init+0x0/0x12 returned 0 after 76 usecs
[    2.372170] calling  et1011c_init+0x0/0x12 @ 1
[    2.373388] initcall et1011c_init+0x0/0x12 returned 0 after 101 usecs
[    2.374968] calling  fixed_mdio_bus_init+0x0/0xf3 @ 1
[    2.376653] libphy: Fixed MDIO Bus: probed
[    2.377683] initcall fixed_mdio_bus_init+0x0/0xf3 returned 0 after 1316 usecs
[    2.379399] calling  ns_init+0x0/0x12 @ 1
[    2.380543] initcall ns_init+0x0/0x12 returned 0 after 111 usecs
[    2.382016] calling  ste10Xp_init+0x0/0x17 @ 1
[    2.383239] initcall ste10Xp_init+0x0/0x17 returned 0 after 148 usecs
[    2.384852] calling  team_module_init+0x0/0x8b @ 1
[    2.386025] initcall team_module_init+0x0/0x8b returned 0 after 20 usecs
[    2.387640] calling  rr_init_module+0x0/0x12 @ 1
[    2.388804] initcall rr_init_module+0x0/0x12 returned 0 after 21 usecs
[    2.390378] calling  ab_init_module+0x0/0x12 @ 1
[    2.391495] initcall ab_init_module+0x0/0x12 returned 0 after 1 usecs
[    2.393076] calling  tun_init+0x0/0x90 @ 1
[    2.394071] tun: Universal TUN/TAP device driver, 1.6
[    2.395296] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    2.396924] initcall tun_init+0x0/0x90 returned 0 after 2783 usecs
[    2.398415] calling  veth_init+0x0/0x12 @ 1
[    2.399431] initcall veth_init+0x0/0x12 returned 0 after 2 usecs
[    2.400920] calling  init+0x0/0x12 @ 1
[    2.401898] initcall init+0x0/0x12 returned 0 after 72 usecs
[    2.403251] calling  can_dev_init+0x0/0x2c @ 1
[    2.404367] CAN device driver interface
[    2.405325] initcall can_dev_init+0x0/0x2c returned 0 after 938 usecs
[    2.406921] calling  peak_usb_init+0x0/0x3e @ 1
[    2.408196] usbcore: registered new interface driver peak_usb
[    2.409601] initcall peak_usb_init+0x0/0x3e returned 0 after 1515 usecs
[    2.411202] calling  bnx2_init+0x0/0x1b @ 1
[    2.412399] initcall bnx2_init+0x0/0x1b returned 0 after 123 usecs
[    2.413897] calling  cnic_init+0x0/0x7e @ 1
[    2.414913] cnic: Broadcom NetXtreme II CNIC Driver cnic v2.5.16 (Dec 05, 2012)
[    2.416773] initcall cnic_init+0x0/0x7e returned 0 after 1809 usecs
[    2.418284] calling  bnx2x_init+0x0/0x97 @ 1
[    2.419313] bnx2x: Broadcom NetXtreme II 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.78.00-0 (2012/09/27)
[    2.421894] initcall bnx2x_init+0x0/0x97 returned 0 after 2513 usecs
[    2.423440] calling  tg3_init+0x0/0x1b @ 1
[    2.424556] initcall tg3_init+0x0/0x1b returned 0 after 87 usecs
[    2.426002] calling  bnad_module_init+0x0/0x62 @ 1
[    2.427153] Brocade 10G Ethernet driver - version: 3.1.2.1
[    2.428597] initcall bnad_module_init+0x0/0x62 returned 0 after 1403 usecs
[    2.430254] calling  xgmac_driver_init+0x0/0x12 @ 1
[    2.431501] initcall xgmac_driver_init+0x0/0x12 returned 0 after 73 usecs
[    2.433162] calling  cxgb3_init_module+0x0/0x20 @ 1
[    2.434500] initcall cxgb3_init_module+0x0/0x20 returned 0 after 142 usecs
[    2.436241] calling  cxgb4_init_module+0x0/0x90 @ 1
[    2.437643] initcall cxgb4_init_module+0x0/0x90 returned 0 after 187 usecs
[    2.439322] calling  cxgb4vf_module_init+0x0/0xa4 @ 1
[    2.440721] initcall cxgb4vf_module_init+0x0/0xa4 returned 0 after 128 usecs
[    2.442453] calling  be_init_module+0x0/0x4b @ 1
[    2.443651] initcall be_init_module+0x0/0x4b returned 0 after 100 usecs
[    2.445239] calling  e1000_init_module+0x0/0x84 @ 1
[    2.446381] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[    2.448035] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    2.449417] e1000 0000:00:03.0: enabling bus mastering
[    2.450622] e1000 0000:00:03.0: setting latency timer to 64
[    2.452908] ata2.01: NODEV after polling detection
[    2.454280] ata1.01: NODEV after polling detection
[    2.455622] ata2.00: ATAPI: QEMU DVD-ROM, 1.2.50, max UDMA/100
[    2.457191] ata1.00: ATA-7: QEMU HARDDISK, 1.2.50, max UDMA/100
[    2.458545] ata1.00: 65536 sectors, multi 16: LBA48
[    2.460109] ata2.00: configured for MWDMA2
[    2.461281] async_waiting @ 6
[    2.462188] ata1.00: configured for MWDMA2
[    2.463100] async_waiting @ 15
[    2.463765] async_continuing @ 15 after 1 usec
[    2.465410] scsi 0:0:0:0: Direct-Access     ATA      QEMU HARDDISK   1.2. PQ: 0 ANSI: 5
[    2.468123] calling  4_sd_probe_async+0x0/0x1f0 @ 1350
[    2.469154] sd 0:0:0:0: [sda] 65536 512-byte logical blocks: (33.5 MB/32.0 MiB)
[    2.470966] sd 0:0:0:0: [sda] Write Protect is off
[    2.472127] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    2.473350] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.476794]  sda: unknown partition table
[    2.477838] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    2.479720] sd 0:0:0:0: [sda] Attached SCSI disk
[    2.480847] initcall 4_sd_probe_async+0x0/0x1f0 returned 0 after 11562 usecs
[    2.739313] initcall 2_async_port_probe+0x0/0x60 returned 0 after 434424 usecs
[    2.741171] async_continuing @ 6 after 272656 usec
[    2.742610] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM   1.2. PQ: 0 ANSI: 5
[    2.745217] sr0: scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[    2.746496] cdrom: Uniform CD-ROM driver Revision: 3.20
[    2.748150] sr 1:0:0:0: Attached scsi CD-ROM sr0
[    2.749860] sr 1:0:0:0: Attached scsi generic sg1 type 5
[    2.751344] initcall 3_async_port_probe+0x0/0x60 returned 0 after 444694 usecs
[    2.776475] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 00:1c:25:1c:13:e9
[    2.777667] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
[    2.779424] initcall e1000_init_module+0x0/0x84 returned 0 after 325239 usecs
[    2.781196] calling  e1000_init_module+0x0/0x3e @ 1
[    2.782379] e1000e: Intel(R) PRO/1000 Network Driver - 2.1.4-k
[    2.783784] e1000e: Copyright(c) 1999 - 2012 Intel Corporation.
[    2.785352] initcall e1000_init_module+0x0/0x3e returned 0 after 2898 usecs
[    2.787034] calling  igb_init_module+0x0/0x58 @ 1
[    2.788198] igb: Intel(R) Gigabit Ethernet Network Driver - version 4.1.2-k
[    2.789863] igb: Copyright (c) 2007-2012 Intel Corporation.
[    2.791378] initcall igb_init_module+0x0/0x58 returned 0 after 3111 usecs
[    2.793062] calling  igbvf_init_module+0x0/0x4c @ 1
[    2.794236] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.0.2-k
[    2.796150] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[    2.797726] initcall igbvf_init_module+0x0/0x4c returned 0 after 3405 usecs
[    2.799386] calling  ixgbe_init_module+0x0/0x5d @ 1
[    2.800710] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 3.11.33-k
[    2.802612] ixgbe: Copyright (c) 1999-2012 Intel Corporation.
[    2.804132] initcall ixgbe_init_module+0x0/0x5d returned 0 after 3339 usecs
[    2.805839] calling  ixgbevf_init_module+0x0/0x4c @ 1
[    2.807045] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver - version 2.7.12-k
[    2.809337] ixgbevf: Copyright (c) 2009 - 2012 Intel Corporation.
[    2.810899] initcall ixgbevf_init_module+0x0/0x4c returned 0 after 3759 usecs
[    2.812646] calling  ixgb_init_module+0x0/0x4c @ 1
[    2.813800] ixgb: Intel(R) PRO/10GbE Network Driver - version 1.0.135-k2-NAPI
[    2.815525] ixgb: Copyright (c) 1999-2008 Intel Corporation.
[    2.817091] initcall ixgb_init_module+0x0/0x4c returned 0 after 3209 usecs
[    2.818756] calling  mlx4_init+0x0/0xed @ 1
[    2.819892] initcall mlx4_init+0x0/0xed returned 0 after 120 usecs
[    2.821440] calling  mlx4_en_init+0x0/0x12 @ 1
[    2.822530] initcall mlx4_en_init+0x0/0x12 returned 0 after 15 usecs
[    2.824078] calling  myri10ge_init_module+0x0/0x72 @ 1
[    2.825336] myri10ge: Version 1.5.3-1.534
[    2.826409] initcall myri10ge_init_module+0x0/0x72 returned 0 after 1049 usecs
[    2.828195] calling  init_nic+0x0/0x1b @ 1
[    2.829377] initcall init_nic+0x0/0x1b returned 0 after 151 usecs
[    2.830882] calling  ql3xxx_driver_init+0x0/0x1b @ 1
[    2.832289] initcall ql3xxx_driver_init+0x0/0x1b returned 0 after 146 usecs
[    2.833969] calling  qlcnic_init_module+0x0/0xb9 @ 1
[    2.835179] QLogic 1/10 GbE Converged/Intelligent Ethernet Driver v5.0.29
[    2.837007] initcall qlcnic_init_module+0x0/0xb9 returned 0 after 1777 usecs
[    2.838688] calling  qlge_init_module+0x0/0x1b @ 1
[    2.839935] initcall qlge_init_module+0x0/0x1b returned 0 after 88 usecs
[    2.841596] calling  niu_init+0x0/0x3d @ 1
[    2.842683] initcall niu_init+0x0/0x3d returned 0 after 90 usecs
[    2.844166] calling  skfd_init+0x0/0x1b @ 1
[    2.845283] initcall skfd_init+0x0/0x1b returned 0 after 95 usecs
[    2.846755] calling  cdc_driver_init+0x0/0x1b @ 1
[    2.847967] usbcore: registered new interface driver cdc_ether
[    2.849429] initcall cdc_driver_init+0x0/0x1b returned 0 after 1493 usecs
[    2.851059] calling  eem_driver_init+0x0/0x1b @ 1
[    2.852306] usbcore: registered new interface driver cdc_eem
[    2.853675] initcall eem_driver_init+0x0/0x1b returned 0 after 1427 usecs
[    2.855322] calling  rndis_driver_init+0x0/0x1b @ 1
[    2.856614] usbcore: registered new interface driver rndis_host
[    2.858035] initcall rndis_driver_init+0x0/0x1b returned 0 after 1474 usecs
[    2.859735] calling  usbnet_init+0x0/0x2b @ 1
[    2.860884] initcall usbnet_init+0x0/0x2b returned 0 after 5 usecs
[    2.862385] calling  cdc_ncm_driver_init+0x0/0x1b @ 1
[    2.863749] usbcore: registered new interface driver cdc_ncm
[    2.865191] initcall cdc_ncm_driver_init+0x0/0x1b returned 0 after 1537 usecs
[    2.866894] calling  fusion_init+0x0/0xf8 @ 1
[    2.867944] Fusion MPT base driver 3.04.20
[    2.868958] Copyright (c) 1999-2008 LSI Corporation
[    2.870149] initcall fusion_init+0x0/0xf8 returned 0 after 2149 usecs
[    2.871709] calling  mptspi_init+0x0/0xf4 @ 1
[    2.872788] Fusion MPT SPI Host driver 3.04.20
[    2.873989] initcall mptspi_init+0x0/0xf4 returned 0 after 1174 usecs
[    2.875560] calling  mptfc_init+0x0/0x107 @ 1
[    2.876648] Fusion MPT FC Host driver 3.04.20
[    2.877801] initcall mptfc_init+0x0/0x107 returned 0 after 1127 usecs
[    2.879362] calling  mptsas_init+0x0/0x13b @ 1
[    2.880469] Fusion MPT SAS Host driver 3.04.20
[    2.881646] initcall mptsas_init+0x0/0x13b returned 0 after 1150 usecs
[    2.883211] calling  mptctl_init+0x0/0x15c @ 1
[    2.884338] Fusion MPT misc device (ioctl) driver 3.04.20
[    2.885741] mptctl: Registered with Fusion MPT base driver
[    2.887044] mptctl: /dev/mptctl @ (major,minor=10,220)
[    2.888347] initcall mptctl_init+0x0/0x15c returned 0 after 3911 usecs
[    2.889988] calling  mpt_lan_init+0x0/0x92 @ 1
[    2.891108] Fusion MPT LAN driver 3.04.20
[    2.892125] initcall mpt_lan_init+0x0/0x92 returned 0 after 985 usecs
[    2.893678] calling  uio_init+0x0/0xe8 @ 1
[    2.894828] initcall uio_init+0x0/0xe8 returned 0 after 131 usecs
[    2.896348] calling  uio_pdrv_init+0x0/0x12 @ 1
[    2.897546] initcall uio_pdrv_init+0x0/0x12 returned 0 after 102 usecs
[    2.899151] calling  uio_pdrv_genirq_init+0x0/0x12 @ 1
[    2.900408] initcall uio_pdrv_genirq_init+0x0/0x12 returned 0 after 75 usecs
[    2.902115] calling  vfio_init+0x0/0x1ea @ 1
[    2.903356] VFIO - User Level meta-driver version: 0.3
[    2.904717] initcall vfio_init+0x0/0x1ea returned 0 after 1523 usecs
[    2.906262] calling  vfio_iommu_type1_init+0x0/0x29 @ 1
[    2.907524] initcall vfio_iommu_type1_init+0x0/0x29 returned -19 after 0 usecs
[    2.909294] calling  vfio_pci_init+0x0/0x4d @ 1
[    2.910526] initcall vfio_pci_init+0x0/0x4d returned 0 after 129 usecs
[    2.912137] calling  cdrom_init+0x0/0xd @ 1
[    2.913152] initcall cdrom_init+0x0/0xd returned 0 after 0 usecs
[    2.914617] calling  yenta_socket_init+0x0/0x1b @ 1
[    2.915886] initcall yenta_socket_init+0x0/0x1b returned 0 after 91 usecs
[    2.917548] calling  aoe_init+0x0/0xb9 @ 1
[    2.919350] aoe: AoE v50 initialised.
[    2.920381] initcall aoe_init+0x0/0xb9 returned 0 after 1788 usecs
[    2.921900] calling  ehci_hcd_init+0x0/0x59 @ 1
[    2.922995] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    2.924635] initcall ehci_hcd_init+0x0/0x59 returned 0 after 1593 usecs
[    2.926223] calling  ehci_pci_init+0x0/0x69 @ 1
[    2.927322] ehci-pci: EHCI PCI platform driver
[    2.928599] initcall ehci_pci_init+0x0/0x69 returned 0 after 1240 usecs
[    2.930197] calling  oxu_driver_init+0x0/0x12 @ 1
[    2.931415] initcall oxu_driver_init+0x0/0x12 returned 0 after 76 usecs
[    2.933058] calling  isp116x_driver_init+0x0/0x12 @ 1
[    2.934368] initcall isp116x_driver_init+0x0/0x12 returned 0 after 78 usecs
[    2.936091] calling  isp1362_driver_init+0x0/0x12 @ 1
[    2.937399] initcall isp1362_driver_init+0x0/0x12 returned 0 after 78 usecs
[    2.939089] calling  ohci_hcd_mod_init+0x0/0x54 @ 1
[    2.940303] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    2.941876] initcall ohci_hcd_mod_init+0x0/0x54 returned 0 after 1538 usecs
[    2.943559] calling  uhci_hcd_init+0x0/0xc3 @ 1
[    2.944697] uhci_hcd: USB Universal Host Controller Interface driver
[    2.946334] initcall uhci_hcd_init+0x0/0xc3 returned 0 after 1600 usecs
[    2.947913] calling  xhci_hcd_init+0x0/0x29 @ 1
[    2.949265] initcall xhci_hcd_init+0x0/0x29 returned 0 after 158 usecs
[    2.950894] calling  sl811h_driver_init+0x0/0x12 @ 1
[    2.952278] initcall sl811h_driver_init+0x0/0x12 returned 0 after 139 usecs
[    2.953969] calling  r8a66597_driver_init+0x0/0x12 @ 1
[    2.955329] initcall r8a66597_driver_init+0x0/0x12 returned 0 after 103 usecs
[    2.957099] calling  isp1760_init+0x0/0x4d @ 1
[    2.959035] initcall isp1760_init+0x0/0x4d returned 0 after 836 usecs
[    2.960742] calling  c67x00_driver_init+0x0/0x12 @ 1
[    2.962041] initcall c67x00_driver_init+0x0/0x12 returned 0 after 100 usecs
[    2.963727] calling  usblp_driver_init+0x0/0x1b @ 1
[    2.965039] usbcore: registered new interface driver usblp
[    2.966378] initcall usblp_driver_init+0x0/0x1b returned 0 after 1396 usecs
[    2.968094] calling  wdm_driver_init+0x0/0x1b @ 1
[    2.969324] usbcore: registered new interface driver cdc_wdm
[    2.970694] initcall wdm_driver_init+0x0/0x1b returned 0 after 1428 usecs
[    2.972377] calling  usbtmc_driver_init+0x0/0x1b @ 1
[    2.973676] usbcore: registered new interface driver usbtmc
[    2.975036] initcall usbtmc_driver_init+0x0/0x1b returned 0 after 1432 usecs
[    2.976753] calling  usb_stor_init+0x0/0x43 @ 1
[    2.977854] Initializing USB Mass Storage driver...
[    2.979174] usbcore: registered new interface driver usb-storage
[    2.980885] USB Mass Storage support registered.
[    2.982021] initcall usb_stor_init+0x0/0x43 returned 0 after 4064 usecs
[    2.983621] calling  alauda_driver_init+0x0/0x1b @ 1
[    2.985032] usbcore: registered new interface driver ums-alauda
[    2.986465] initcall alauda_driver_init+0x0/0x1b returned 0 after 1540 usecs
[    2.988291] calling  datafab_driver_init+0x0/0x1b @ 1
[    2.989602] usbcore: registered new interface driver ums-datafab
[    2.991049] initcall datafab_driver_init+0x0/0x1b returned 0 after 1506 usecs
[    2.992802] calling  ene_ub6250_driver_init+0x0/0x1b @ 1
[    2.994169] usbcore: registered new interface driver ums_eneub6250
[    2.995668] initcall ene_ub6250_driver_init+0x0/0x1b returned 0 after 1548 usecs
[    2.997514] calling  freecom_driver_init+0x0/0x1b @ 1
[    2.998818] usbcore: registered new interface driver ums-freecom
[    2.999819] initcall freecom_driver_init+0x0/0x1b returned 0 after 1056 usecs
[    3.001299] calling  isd200_driver_init+0x0/0x1b @ 1
[    3.002600] usbcore: registered new interface driver ums-isd200
[    3.003948] initcall isd200_driver_init+0x0/0x1b returned 0 after 1437 usecs
[    3.005704] calling  jumpshot_driver_init+0x0/0x1b @ 1
[    3.007116] usbcore: registered new interface driver ums-jumpshot
[    3.008512] initcall jumpshot_driver_init+0x0/0x1b returned 0 after 1476 usecs
[    3.010315] calling  karma_driver_init+0x0/0x1b @ 1
[    3.011673] usbcore: registered new interface driver ums-karma
[    3.013168] initcall karma_driver_init+0x0/0x1b returned 0 after 1601 usecs
[    3.014859] calling  onetouch_driver_init+0x0/0x1b @ 1
[    3.016302] usbcore: registered new interface driver ums-onetouch
[    3.017783] initcall onetouch_driver_init+0x0/0x1b returned 0 after 1583 usecs
[    3.019529] calling  sddr09_driver_init+0x0/0x1b @ 1
[    3.020849] usbcore: registered new interface driver ums-sddr09
[    3.022298] initcall sddr09_driver_init+0x0/0x1b returned 0 after 1492 usecs
[    3.023999] calling  sddr55_driver_init+0x0/0x1b @ 1
[    3.025375] usbcore: registered new interface driver ums-sddr55
[    3.026684] initcall sddr55_driver_init+0x0/0x1b returned 0 after 1407 usecs
[    3.028210] calling  usbat_driver_init+0x0/0x1b @ 1
[    3.029377] usbcore: registered new interface driver ums-usbat
[    3.030773] initcall usbat_driver_init+0x0/0x1b returned 0 after 1524 usecs
[    3.032499] calling  usb_mdc800_init+0x0/0x2e2 @ 1
[    3.033773] usbcore: registered new interface driver mdc800
[    3.035126] mdc800: v0.7.5 (30/10/2000):USB Driver for Mustek MDC800 Digital Camera
[    3.036975] initcall usb_mdc800_init+0x0/0x2e2 returned 0 after 3240 usecs
[    3.038655] calling  mts_usb_driver_init+0x0/0x1b @ 1
[    3.040116] usbcore: registered new interface driver microtekX6
[    3.041589] initcall mts_usb_driver_init+0x0/0x1b returned 0 after 1641 usecs
[    3.043294] calling  usb_serial_init+0x0/0x1bd @ 1
[    3.044820] usbcore: registered new interface driver usbserial
[    3.046370] usbcore: registered new interface driver usbserial_generic
[    3.048235] usbserial: USB Serial support registered for generic
[    3.049699] initcall usb_serial_init+0x0/0x1bd returned 0 after 5021 usecs
[    3.051331] calling  usb_serial_module_init+0x0/0x20 @ 1
[    3.052752] usbcore: registered new interface driver usb_debug
[    3.054237] usbserial: USB Serial support registered for debug
[    3.055667] initcall usb_serial_module_init+0x0/0x20 returned 0 after 2959 usecs
[    3.057492] calling  ftdi_init+0x0/0x78 @ 1
[    3.058613] usbcore: registered new interface driver ftdi_sio
[    3.060123] usbserial: USB Serial support registered for FTDI USB Serial Device
[    3.061881] initcall ftdi_init+0x0/0x78 returned 0 after 3306 usecs
[    3.063370] calling  usb_serial_module_init+0x0/0x20 @ 1
[    3.064800] usbcore: registered new interface driver pl2303
[    3.066214] usbserial: USB Serial support registered for pl2303
[    3.067629] initcall usb_serial_module_init+0x0/0x20 returned 0 after 2872 usecs
[    3.069516] calling  adu_driver_init+0x0/0x1b @ 1
[    3.070835] usbcore: registered new interface driver adutux
[    3.072246] initcall adu_driver_init+0x0/0x1b returned 0 after 1533 usecs
[    3.073879] calling  appledisplay_init+0x0/0x5c @ 1
[    3.075216] usbcore: registered new interface driver appledisplay
[    3.076541] initcall appledisplay_init+0x0/0x5c returned 0 after 1493 usecs
[    3.078158] calling  cypress_driver_init+0x0/0x1b @ 1
[    3.079457] usbcore: registered new interface driver cypress_cy7c63
[    3.080968] initcall cypress_driver_init+0x0/0x1b returned 0 after 1597 usecs
[    3.082646] calling  cytherm_driver_init+0x0/0x1b @ 1
[    3.083949] usbcore: registered new interface driver cytherm
[    3.085371] initcall cytherm_driver_init+0x0/0x1b returned 0 after 1464 usecs
[    3.087083] calling  emi26_driver_init+0x0/0x1b @ 1
[    3.088384] usbcore: registered new interface driver emi26 - firmware loader
[    3.090084] initcall emi26_driver_init+0x0/0x1b returned 0 after 1747 usecs
[    3.091746] calling  emi62_driver_init+0x0/0x1b @ 1
[    3.093042] usbcore: registered new interface driver emi62 - firmware loader
[    3.094748] initcall emi62_driver_init+0x0/0x1b returned 0 after 1749 usecs
[    3.096449] calling  ftdi_elan_init+0x0/0x166 @ 1
[    3.097551] driver ftdi-elan
[    3.098586] usbcore: registered new interface driver ftdi-elan
[    3.100091] initcall ftdi_elan_init+0x0/0x166 returned 0 after 2473 usecs
[    3.101783] calling  idmouse_driver_init+0x0/0x1b @ 1
[    3.103130] usbcore: registered new interface driver idmouse
[    3.104589] initcall idmouse_driver_init+0x0/0x1b returned 0 after 1545 usecs
[    3.106349] calling  iowarrior_driver_init+0x0/0x1b @ 1
[    3.107753] usbcore: registered new interface driver iowarrior
[    3.109203] initcall iowarrior_driver_init+0x0/0x1b returned 0 after 1525 usecs
[    3.110960] calling  isight_firmware_driver_init+0x0/0x1b @ 1
[    3.112454] usbcore: registered new interface driver isight_firmware
[    3.113988] initcall isight_firmware_driver_init+0x0/0x1b returned 0 after 1584 usecs
[    3.115864] calling  lcd_driver_init+0x0/0x1b @ 1
[    3.117120] usbcore: registered new interface driver usblcd
[    3.118472] initcall lcd_driver_init+0x0/0x1b returned 0 after 1400 usecs
[    3.120131] calling  ld_usb_driver_init+0x0/0x1b @ 1
[    3.121424] usbcore: registered new interface driver ldusb
[    3.122748] initcall ld_usb_driver_init+0x0/0x1b returned 0 after 1390 usecs
[    3.124487] calling  led_driver_init+0x0/0x1b @ 1
[    3.125707] usbcore: registered new interface driver usbled
[    3.127038] initcall led_driver_init+0x0/0x1b returned 0 after 1385 usecs
[    3.128723] calling  tower_driver_init+0x0/0x1b @ 1
[    3.130082] usbcore: registered new interface driver legousbtower
[    3.131602] initcall tower_driver_init+0x0/0x1b returned 0 after 1629 usecs
[    3.133339] calling  rio_driver_init+0x0/0x1b @ 1
[    3.134619] usbcore: registered new interface driver rio500
[    3.135974] initcall rio_driver_init+0x0/0x1b returned 0 after 1452 usecs
[    3.137655] calling  sevseg_driver_init+0x0/0x1b @ 1
[    3.138963] usbcore: registered new interface driver usbsevseg
[    3.140406] initcall sevseg_driver_init+0x0/0x1b returned 0 after 1502 usecs
[    3.142087] calling  usb_sisusb_init+0x0/0x1b @ 1
[    3.143311] usbcore: registered new interface driver sisusb
[    3.144707] initcall usb_sisusb_init+0x0/0x1b returned 0 after 1441 usecs
[    3.146339] calling  i8042_init+0x0/0x3e9 @ 1
[    3.147564] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    3.150345] serio: i8042 KBD port at 0x60,0x64 irq 1
[    3.151602] serio: i8042 AUX port at 0x60,0x64 irq 12
[    3.153091] initcall i8042_init+0x0/0x3e9 returned 0 after 5561 usecs
[    3.154665] calling  serport_init+0x0/0x31 @ 1
[    3.155750] initcall serport_init+0x0/0x31 returned 0 after 1 usecs
[    3.157285] calling  pcips2_driver_init+0x0/0x1b @ 1
[    3.158595] initcall pcips2_driver_init+0x0/0x1b returned 0 after 341 usecs
[    3.160389] calling  serio_raw_drv_init+0x0/0x1b @ 1
[    3.161809] initcall serio_raw_drv_init+0x0/0x1b returned 0 after 149 usecs
[    3.163502] calling  mousedev_init+0x0/0x89 @ 1
[    3.164949] mousedev: PS/2 mouse device common for all mice
[    3.166317] initcall mousedev_init+0x0/0x89 returned 0 after 1608 usecs
[    3.167912] calling  atkbd_init+0x0/0x27 @ 1
[    3.169134] initcall atkbd_init+0x0/0x27 returned 0 after 129 usecs
[    3.170658] calling  psmouse_init+0x0/0x80 @ 1
[    3.172339] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    3.174701] initcall psmouse_init+0x0/0x80 returned 0 after 2986 usecs
[    3.176331] calling  sermouse_drv_init+0x0/0x1b @ 1
[    3.179686] initcall sermouse_drv_init+0x0/0x1b returned 0 after 2122 usecs
[    3.181422] calling  pcspkr_platform_driver_init+0x0/0x12 @ 1
[    3.182948] input: PC Speaker as /devices/platform/pcspkr/input/input2
[    3.184657] initcall pcspkr_platform_driver_init+0x0/0x12 returned 0 after 1801 usecs
[    3.186547] calling  cmos_init+0x0/0x6a @ 1
[    3.187599] rtc_cmos 00:00: RTC can wake from S4
[    3.189168] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[    3.190853] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[    3.192592] initcall cmos_init+0x0/0x6a returned 0 after 4926 usecs
[    3.194121] calling  i2c_dev_init+0x0/0xc7 @ 1
[    3.195213] i2c /dev entries driver
[    3.196285] initcall i2c_dev_init+0x0/0xc7 returned 0 after 1039 usecs
[    3.197871] calling  amd756_driver_init+0x0/0x1b @ 1
[    3.199162] initcall amd756_driver_init+0x0/0x1b returned 0 after 90 usecs
[    3.200863] calling  amd8111_driver_init+0x0/0x1b @ 1
[    3.202177] initcall amd8111_driver_init+0x0/0x1b returned 0 after 85 usecs
[    3.203857] calling  i2c_i801_init+0x0/0xc9 @ 1
[    3.204847] initcall i2c_i801_init+0x0/0xc9 returned 0 after 89 usecs
[    3.206435] calling  smbus_sch_driver_init+0x0/0x12 @ 1
[    3.207851] initcall smbus_sch_driver_init+0x0/0x12 returned 0 after 83 usecs
[    3.209633] calling  nforce2_driver_init+0x0/0x1b @ 1
[    3.210949] initcall nforce2_driver_init+0x0/0x1b returned 0 after 86 usecs
[    3.212666] calling  piix4_driver_init+0x0/0x1b @ 1
[    3.213897] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
[    3.216270] initcall piix4_driver_init+0x0/0x1b returned 0 after 2367 usecs
[    3.217920] calling  w1_init+0x0/0xa8 @ 1
[    3.218912] Driver for 1-wire Dallas network protocol.
[    3.220544] initcall w1_init+0x0/0xa8 returned 0 after 1587 usecs
[    3.222039] calling  matrox_w1_init+0x0/0x1b @ 1
[    3.223349] initcall matrox_w1_init+0x0/0x1b returned 0 after 143 usecs
[    3.225011] calling  ds_driver_init+0x0/0x1b @ 1
[    3.226241] usbcore: registered new interface driver DS9490R
[    3.227631] initcall ds_driver_init+0x0/0x1b returned 0 after 1466 usecs
[    3.229301] calling  ds2482_driver_init+0x0/0x14 @ 1
[    3.230589] initcall ds2482_driver_init+0x0/0x14 returned 0 after 83 usecs
[    3.232295] calling  w1_therm_init+0x0/0x2f @ 1
[    3.233431] initcall w1_therm_init+0x0/0x2f returned 0 after 37 usecs
[    3.234997] calling  w1_smem_init+0x0/0x3c @ 1
[    3.236097] initcall w1_smem_init+0x0/0x3c returned 0 after 2 usecs
[    3.237612] calling  w1_f23_init+0x0/0x12 @ 1
[    3.238677] initcall w1_f23_init+0x0/0x12 returned 0 after 2 usecs
[    3.240207] calling  w1_ds2760_init+0x0/0x2c @ 1
[    3.241335] 1-Wire driver for the DS2760 battery monitor  chip  - (c) 2004-2005, Szabolcs Gyurko
[    3.243447] initcall w1_ds2760_init+0x0/0x2c returned 0 after 2066 usecs
[    3.245097] calling  sensors_w83627hf_init+0x0/0x170 @ 1
[    3.246412] initcall sensors_w83627hf_init+0x0/0x170 returned -19 after 25 usecs
[    3.248193] calling  adm1026_driver_init+0x0/0x14 @ 1
[    3.320333] initcall adm1026_driver_init+0x0/0x14 returned 0 after 69213 usecs
[    3.324832] calling  coretemp_init+0x0/0x81 @ 1
[    3.325936] initcall coretemp_init+0x0/0x81 returned -19 after 0 usecs
[    3.327511] calling  dme1737_init+0x0/0x19e @ 1
[    3.377359] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[    3.400426] initcall dme1737_init+0x0/0x19e returned 0 after 70097 usecs
[    3.403840] calling  fam15h_power_driver_init+0x0/0x1b @ 1
[    3.405938] initcall fam15h_power_driver_init+0x0/0x1b returned 0 after 143 usecs
[    3.407743] calling  sm_it87_init+0x0/0x584 @ 1
[    3.408900] initcall sm_it87_init+0x0/0x584 returned -19 after 49 usecs
[    3.410495] calling  k8temp_driver_init+0x0/0x1b @ 1
[    3.411790] initcall k8temp_driver_init+0x0/0x1b returned 0 after 91 usecs
[    3.413490] calling  k10temp_driver_init+0x0/0x1b @ 1
[    3.414916] initcall k10temp_driver_init+0x0/0x1b returned 0 after 164 usecs
[    3.416671] calling  lm83_driver_init+0x0/0x14 @ 1
[    3.560354] initcall lm83_driver_init+0x0/0x14 returned 0 after 139040 usecs
[    3.565139] calling  pc87427_init+0x0/0x68 @ 1
[    3.566477] initcall pc87427_init+0x0/0x68 returned -19 after 58 usecs
[    3.568316] calling  smsc47b397_init+0x0/0x19d @ 1
[    3.569680] initcall smsc47b397_init+0x0/0x19d returned -19 after 11 usecs
[    3.571349] calling  sp5100_tco_init_module+0x0/0x79 @ 1
[    3.572635] sp5100_tco: SP5100 TCO WatchDog Timer Driver v0.01
[    3.574588] initcall sp5100_tco_init_module+0x0/0x79 returned 0 after 1912 usecs
[    3.576321] calling  linear_init+0x0/0x12 @ 1
[    3.577371] md: linear personality registered for level -1
[    3.578647] initcall linear_init+0x0/0x12 returned 0 after 1270 usecs
[    3.580249] calling  raid0_init+0x0/0x12 @ 1
[    3.581245] md: raid0 personality registered for level 0
[    3.582542] initcall raid0_init+0x0/0x12 returned 0 after 1269 usecs
[    3.584106] calling  raid_init+0x0/0x12 @ 1
[    3.585103] md: raid1 personality registered for level 1
[    3.586380] initcall raid_init+0x0/0x12 returned 0 after 1248 usecs
[    3.587901] calling  raid_init+0x0/0x12 @ 1
[    3.588899] md: raid10 personality registered for level 10
[    3.589921] initcall raid_init+0x0/0x12 returned 0 after 998 usecs
[    3.591126] calling  raid5_init+0x0/0x2c @ 1
[    3.592214] md: raid6 personality registered for level 6
[    3.593471] md: raid5 personality registered for level 5
[    3.594760] md: raid4 personality registered for level 4
[    3.596064] initcall raid5_init+0x0/0x2c returned 0 after 3758 usecs
[    3.597578] calling  multipath_init+0x0/0x12 @ 1
[    3.598684] md: multipath personality registered for level -4
[    3.600097] initcall multipath_init+0x0/0x12 returned 0 after 1373 usecs
[    3.601721] calling  raid_init+0x0/0x12 @ 1
[    3.602634] md: faulty personality registered for level -5
[    3.603959] initcall raid_init+0x0/0x12 returned 0 after 1296 usecs
[    3.605502] calling  dm_init+0x0/0x46 @ 1
[    3.606963] device-mapper: uevent: version 1.0.3
[    3.608368] device-mapper: ioctl: 4.23.0-ioctl (2012-07-25) initialised: dm-devel@redhat.com
[    3.610409] initcall dm_init+0x0/0x46 returned 0 after 3846 usecs
[    3.611895] calling  dm_crypt_init+0x0/0x68 @ 1
[    3.613032] initcall dm_crypt_init+0x0/0x68 returned 0 after 8 usecs
[    3.614598] calling  dm_delay_init+0x0/0xc2 @ 1
[    3.615743] initcall dm_delay_init+0x0/0xc2 returned 0 after 52 usecs
[    3.617269] calling  dm_multipath_init+0x0/0x134 @ 1
[    3.618442] device-mapper: multipath: version 1.5.0 loaded
[    3.619684] initcall dm_multipath_init+0x0/0x134 returned 0 after 1318 usecs
[    3.621403] calling  dm_rr_init+0x0/0x3c @ 1
[    3.622458] device-mapper: multipath round-robin: version 1.0.0 loaded
[    3.623029] initcall dm_rr_init+0x0/0x3c returned 0 after 590 usecs
[    3.623997] calling  dm_snapshot_init+0x0/0x21b @ 1
[    3.625233] initcall dm_snapshot_init+0x0/0x21b returned 0 after 53 usecs
[    3.626833] calling  dm_mirror_init+0x0/0x76 @ 1
[    3.628106] initcall dm_mirror_init+0x0/0x76 returned 0 after 201 usecs
[    3.629629] calling  dm_dirty_log_init+0x0/0x56 @ 1
[    3.630766] initcall dm_dirty_log_init+0x0/0x56 returned 0 after 21 usecs
[    3.632358] calling  dm_zero_init+0x0/0x2e @ 1
[    3.633383] initcall dm_zero_init+0x0/0x2e returned 0 after 1 usecs
[    3.634829] calling  hci_uart_init+0x0/0xd7 @ 1
[    3.635871] Bluetooth: HCI UART driver ver 2.2
[    3.636924] Bluetooth: HCI H4 protocol initialized
[    3.638022] Bluetooth: HCI BCSP protocol initialized
[    3.639167] Bluetooth: HCILL protocol initialized
[    3.640375] Bluetooth: HCIATH3K protocol initialized
[    3.641562] initcall hci_uart_init+0x0/0xd7 returned 0 after 5550 usecs
[    3.643134] calling  bcm203x_driver_init+0x0/0x1b @ 1
[    3.644494] usbcore: registered new interface driver bcm203x
[    3.645867] initcall bcm203x_driver_init+0x0/0x1b returned 0 after 1443 usecs
[    3.647572] calling  btusb_driver_init+0x0/0x1b @ 1
[    3.648948] usbcore: registered new interface driver btusb
[    3.650223] initcall btusb_driver_init+0x0/0x1b returned 0 after 1381 usecs
[    3.651894] calling  btsdio_init+0x0/0x27 @ 1
[    3.652996] Bluetooth: Generic Bluetooth SDIO driver ver 0.1
[    3.654516] initcall btsdio_init+0x0/0x27 returned 0 after 1486 usecs
[    3.656123] calling  ath3k_driver_init+0x0/0x1b @ 1
[    3.657429] usbcore: registered new interface driver ath3k
[    3.658768] initcall ath3k_driver_init+0x0/0x1b returned 0 after 1422 usecs
[    3.660484] calling  btmrvl_sdio_init_module+0x0/0x34 @ 1
[    3.661872] initcall btmrvl_sdio_init_module+0x0/0x34 returned 0 after 78 usecs
[    3.663635] calling  cpufreq_stats_init+0x0/0xac @ 1
[    3.664904] initcall cpufreq_stats_init+0x0/0xac returned 0 after 23 usecs
[    3.666559] calling  init_ladder+0x0/0x12 @ 1
[    3.667621] cpuidle: using governor ladder
[    3.668638] initcall init_ladder+0x0/0x12 returned 0 after 999 usecs
[    3.670164] calling  init_menu+0x0/0x12 @ 1
[    3.671172] cpuidle: using governor menu
[    3.672149] initcall init_menu+0x0/0x12 returned 0 after 948 usecs
[    3.673638] calling  mmc_blk_init+0x0/0x71 @ 1
[    3.674808] initcall mmc_blk_init+0x0/0x71 returned 0 after 81 usecs
[    3.676386] calling  sdio_uart_init+0x0/0xe1 @ 1
[    3.677589] initcall sdio_uart_init+0x0/0xe1 returned 0 after 88 usecs
[    3.679161] calling  sdhci_drv_init+0x0/0x24 @ 1
[    3.680319] sdhci: Secure Digital Host Controller Interface driver
[    3.681796] sdhci: Copyright(c) Pierre Ossman
[    3.682854] initcall sdhci_drv_init+0x0/0x24 returned 0 after 2476 usecs
[    3.684518] calling  sdhci_driver_init+0x0/0x1b @ 1
[    3.685714] initcall sdhci_driver_init+0x0/0x1b returned 0 after 152 usecs
[    3.687377] calling  wbsd_drv_init+0x0/0xb8 @ 1
[    3.688510] wbsd: Winbond W83L51xD SD/MMC card interface driver
[    3.689935] wbsd: Copyright(c) Pierre Ossman
[    3.691057] initcall wbsd_drv_init+0x0/0xb8 returned 0 after 2490 usecs
[    3.692690] calling  tifm_sd_init+0x0/0x12 @ 1
[    3.693845] initcall tifm_sd_init+0x0/0x12 returned 0 after 76 usecs
[    3.695391] calling  cb710_mmc_driver_init+0x0/0x12 @ 1
[    3.696766] initcall cb710_mmc_driver_init+0x0/0x12 returned 0 after 79 usecs
[    3.698485] calling  via_sd_driver_init+0x0/0x1b @ 1
[    3.699762] initcall via_sd_driver_init+0x0/0x1b returned 0 after 82 usecs
[    3.701457] calling  vub300_init+0x0/0x13f @ 1
[    3.702529] VUB300 Driver rom wait states = 1C irqpoll timeout = 0400
[    3.704229] usbcore: registered new interface driver vub300
[    3.705655] initcall vub300_init+0x0/0x13f returned 0 after 3052 usecs
[    3.707214] calling  ushc_driver_init+0x0/0x1b @ 1
[    3.708524] usbcore: registered new interface driver ushc
[    3.709846] initcall ushc_driver_init+0x0/0x1b returned 0 after 1416 usecs
[    3.711497] calling  sdhci_pltfm_drv_init+0x0/0x16 @ 1
[    3.712793] sdhci-pltfm: SDHCI platform and OF driver helper
[    3.714169] initcall sdhci_pltfm_drv_init+0x0/0x16 returned 0 after 1342 usecs
[    3.715925] calling  memstick_init+0x0/0x8b @ 1
[    3.717282] initcall memstick_init+0x0/0x8b returned 0 after 214 usecs
[    3.718861] calling  mspro_block_init+0x0/0x6f @ 1
[    3.720129] initcall mspro_block_init+0x0/0x6f returned 0 after 75 usecs
[    3.721756] calling  tifm_ms_init+0x0/0x12 @ 1
[    3.722909] initcall tifm_ms_init+0x0/0x12 returned 0 after 72 usecs
[    3.723946] calling  jmb38x_ms_init+0x0/0x1b @ 1
[    3.725227] initcall jmb38x_ms_init+0x0/0x1b returned 0 after 102 usecs
[    3.726825] calling  timer_trig_init+0x0/0x12 @ 1
[    3.727994] initcall timer_trig_init+0x0/0x12 returned 0 after 34 usecs
[    3.729632] calling  heartbeat_trig_init+0x0/0x3d @ 1
[    3.730857] initcall heartbeat_trig_init+0x0/0x3d returned 0 after 3 usecs
[    3.732528] calling  ib_core_init+0x0/0xa0 @ 1
[    3.733708] initcall ib_core_init+0x0/0xa0 returned 0 after 102 usecs
[    3.735284] calling  ib_mad_init_module+0x0/0xcf @ 1
[    3.736536] initcall ib_mad_init_module+0x0/0xcf returned 0 after 11 usecs
[    3.738152] calling  ib_sa_init+0x0/0x64 @ 1
[    3.739269] initcall ib_sa_init+0x0/0x64 returned 0 after 77 usecs
[    3.740790] calling  ib_cm_init+0x0/0x16d @ 1
[    3.742024] initcall ib_cm_init+0x0/0x16d returned 0 after 178 usecs
[    3.743576] calling  iw_cm_init+0x0/0x38 @ 1
[    3.744733] initcall iw_cm_init+0x0/0x38 returned 0 after 52 usecs
[    3.746229] calling  addr_init+0x0/0x49 @ 1
[    3.747306] initcall addr_init+0x0/0x49 returned 0 after 62 usecs
[    3.748815] calling  cma_init+0x0/0xd5 @ 1
[    3.749867] initcall cma_init+0x0/0xd5 returned 0 after 55 usecs
[    3.751322] calling  ib_umad_init+0x0/0xd9 @ 1
[    3.752535] initcall ib_umad_init+0x0/0xd9 returned 0 after 102 usecs
[    3.754089] calling  ib_uverbs_init+0x0/0xd9 @ 1
[    3.755296] initcall ib_uverbs_init+0x0/0xd9 returned 0 after 78 usecs
[    3.756892] calling  ib_ucm_init+0x0/0xa3 @ 1
[    3.757952] initcall ib_ucm_init+0x0/0xa3 returned 0 after 4 usecs
[    3.759437] calling  ucma_init+0x0/0x9d @ 1
[    3.760597] initcall ucma_init+0x0/0x9d returned 0 after 121 usecs
[    3.762091] calling  mthca_init+0x0/0x182 @ 1
[    3.763281] initcall mthca_init+0x0/0x182 returned 0 after 131 usecs
[    3.764858] calling  infinipath_init+0x0/0xc2 @ 1
[    3.766102] initcall infinipath_init+0x0/0xc2 returned 0 after 105 usecs
[    3.767705] calling  qlogic_ib_init+0x0/0xed @ 1
[    3.769138] ib_qib: Unable to register ipathfs
[    3.770226] initcall qlogic_ib_init+0x0/0xed returned 0 after 1323 usecs
[    3.771849] calling  mlx4_ib_init+0x0/0x75 @ 1
[    3.773052] initcall mlx4_ib_init+0x0/0x75 returned 0 after 71 usecs
[    3.774623] calling  ipoib_init_module+0x0/0x141 @ 1
[    3.775884] initcall ipoib_init_module+0x0/0x141 returned 0 after 59 usecs
[    3.777594] calling  srp_init_module+0x0/0x14d @ 1
[    3.778869] initcall srp_init_module+0x0/0x14d returned 0 after 103 usecs
[    3.780544] calling  iser_init+0x0/0x12c @ 1
[    3.781695] iscsi: registered transport (iser)
[    3.782775] initcall iser_init+0x0/0x12c returned 0 after 1171 usecs
[    3.784343] calling  dmi_sysfs_init+0x0/0x96 @ 1
[    3.785593] initcall dmi_sysfs_init+0x0/0x96 returned 0 after 135 usecs
[    3.787183] calling  efivars_init+0x0/0x106 @ 1
[    3.788302] EFI Variables Facility v0.08 2004-May-17
[    3.789499] initcall efivars_init+0x0/0x106 returned 0 after 1170 usecs
[    3.791081] calling  ibft_init+0x0/0x3bd @ 1
[    3.792134] No iBFT detected.
[    3.792857] initcall ibft_init+0x0/0x3bd returned 0 after 706 usecs
[    3.794371] calling  hid_init+0x0/0x53 @ 1
[    3.795456] initcall hid_init+0x0/0x53 returned 0 after 96 usecs
[    3.796944] calling  hid_init+0x0/0x1b @ 1
[    3.798011] initcall hid_init+0x0/0x1b returned 0 after 95 usecs
[    3.799125] calling  a4_init+0x0/0x1b @ 1
[    3.800244] initcall a4_init+0x0/0x1b returned 0 after 105 usecs
[    3.801694] calling  apple_init+0x0/0x35 @ 1
[    3.802868] initcall apple_init+0x0/0x35 returned 0 after 133 usecs
[    3.804472] calling  belkin_init+0x0/0x1b @ 1
[    3.805634] initcall belkin_init+0x0/0x1b returned 0 after 103 usecs
[    3.807164] calling  ch_init+0x0/0x1b @ 1
[    3.808263] initcall ch_init+0x0/0x1b returned 0 after 86 usecs
[    3.809687] calling  ch_init+0x0/0x1b @ 1
[    3.810735] initcall ch_init+0x0/0x1b returned 0 after 75 usecs
[    3.812206] calling  cp_init+0x0/0x1b @ 1
[    3.813261] initcall cp_init+0x0/0x1b returned 0 after 84 usecs
[    3.814691] calling  dr_init+0x0/0x1b @ 1
[    3.815739] initcall dr_init+0x0/0x1b returned 0 after 74 usecs
[    3.817216] calling  ez_init+0x0/0x1b @ 1
[    3.818285] initcall ez_init+0x0/0x1b returned 0 after 91 usecs
[    3.819707] calling  gyration_init+0x0/0x1b @ 1
[    3.820916] initcall gyration_init+0x0/0x1b returned 0 after 75 usecs
[    3.822481] calling  ks_init+0x0/0x1b @ 1
[    3.823530] initcall ks_init+0x0/0x1b returned 0 after 72 usecs
[    3.825004] calling  kye_init+0x0/0x1b @ 1
[    3.826077] initcall kye_init+0x0/0x1b returned 0 after 77 usecs
[    3.827510] calling  lg_init+0x0/0x1b @ 1
[    3.828637] initcall lg_init+0x0/0x1b returned 0 after 121 usecs
[    3.830090] calling  logi_dj_init+0x0/0x73 @ 1
[    3.831281] initcall logi_dj_init+0x0/0x73 returned 0 after 173 usecs
[    3.832889] calling  ms_init+0x0/0x1b @ 1
[    3.833990] initcall ms_init+0x0/0x1b returned 0 after 118 usecs
[    3.835459] calling  mr_init+0x0/0x1b @ 1
[    3.836571] initcall mr_init+0x0/0x1b returned 0 after 95 usecs
[    3.838005] calling  ntrig_init+0x0/0x1b @ 1
[    3.839128] initcall ntrig_init+0x0/0x1b returned 0 after 84 usecs
[    3.840668] calling  ortek_init+0x0/0x1b @ 1
[    3.841782] initcall ortek_init+0x0/0x1b returned 0 after 81 usecs
[    3.843274] calling  pl_init+0x0/0x1b @ 1
[    3.844359] initcall pl_init+0x0/0x1b returned 0 after 75 usecs
[    3.845797] calling  pl_init+0x0/0x1b @ 1
[    3.846845] initcall pl_init+0x0/0x1b returned 0 after 73 usecs
[    3.848309] calling  samsung_init+0x0/0x1b @ 1
[    3.849461] initcall samsung_init+0x0/0x1b returned 0 after 79 usecs
[    3.850988] calling  sjoy_init+0x0/0x1b @ 1
[    3.852111] initcall sjoy_init+0x0/0x1b returned 0 after 74 usecs
[    3.853581] calling  sony_init+0x0/0x1b @ 1
[    3.854673] initcall sony_init+0x0/0x1b returned 0 after 73 usecs
[    3.856173] calling  sp_init+0x0/0x1b @ 1
[    3.857216] initcall sp_init+0x0/0x1b returned 0 after 78 usecs
[    3.858625] calling  ga_init+0x0/0x1b @ 1
[    3.859707] initcall ga_init+0x0/0x1b returned 0 after 107 usecs
[    3.861209] calling  tm_init+0x0/0x1b @ 1
[    3.862310] initcall tm_init+0x0/0x1b returned 0 after 127 usecs
[    3.863765] calling  ts_init+0x0/0x1b @ 1
[    3.864910] initcall ts_init+0x0/0x1b returned 0 after 105 usecs
[    3.866352] calling  twinhan_init+0x0/0x1b @ 1
[    3.867516] initcall twinhan_init+0x0/0x1b returned 0 after 89 usecs
[    3.869087] calling  zp_init+0x0/0x1b @ 1
[    3.870148] initcall zp_init+0x0/0x1b returned 0 after 80 usecs
[    3.871583] calling  hid_init+0x0/0x4e @ 1
[    3.872699] usbcore: registered new interface driver usbhid
[    3.874041] usbhid: USB HID core driver
[    3.874972] initcall hid_init+0x0/0x4e returned 0 after 2316 usecs
[    3.876496] calling  mxm_wmi_init+0x0/0x8 @ 1
[    3.877547] initcall mxm_wmi_init+0x0/0x8 returned 0 after 0 usecs
[    3.879028] calling  alsa_hwdep_init+0x0/0x55 @ 1
[    3.880220] initcall alsa_hwdep_init+0x0/0x55 returned 0 after 33 usecs
[    3.881810] calling  alsa_timer_init+0x0/0x15e @ 1
[    3.883232] initcall alsa_timer_init+0x0/0x15e returned 0 after 269 usecs
[    3.884920] calling  alsa_pcm_init+0x0/0x5d @ 1
[    3.886020] initcall alsa_pcm_init+0x0/0x5d returned 0 after 6 usecs
[    3.887528] calling  snd_mem_init+0x0/0x2c @ 1
[    3.888634] initcall snd_mem_init+0x0/0x2c returned 0 after 10 usecs
[    3.890170] calling  alsa_rawmidi_init+0x0/0x14 @ 1
[    3.891345] initcall alsa_rawmidi_init+0x0/0x14 returned 0 after 2 usecs
[    3.892979] calling  snd_compress_init+0x0/0x8 @ 1
[    3.894142] initcall snd_compress_init+0x0/0x8 returned 0 after 0 usecs
[    3.895737] calling  alsa_card_serial_init+0x0/0xa5 @ 1
[    3.897329] no UART detected at 0x1
[    3.898513] initcall alsa_card_serial_init+0x0/0xa5 returned -19 after 1460 usecs
[    3.900357] calling  alsa_mpu401_uart_init+0x0/0x8 @ 1
[    3.901597] initcall alsa_mpu401_uart_init+0x0/0x8 returned 0 after 0 usecs
[    3.903267] calling  alsa_card_mpu401_init+0x0/0xcc @ 1
[    3.904825] initcall alsa_card_mpu401_init+0x0/0xcc returned -19 after 264 usecs
[    3.906615] calling  intel8x0_driver_init+0x0/0x1b @ 1
[    3.907938] initcall intel8x0_driver_init+0x0/0x1b returned 0 after 97 usecs
[    3.909684] calling  intel8x0m_driver_init+0x0/0x1b @ 1
[    3.911036] initcall intel8x0m_driver_init+0x0/0x1b returned 0 after 91 usecs
[    3.912794] calling  alsa_ac97_init+0x0/0x8 @ 1
[    3.913887] initcall alsa_ac97_init+0x0/0x8 returned 0 after 0 usecs
[    3.915421] calling  patch_realtek_init+0x0/0x12 @ 1
[    3.916655] initcall patch_realtek_init+0x0/0x12 returned 0 after 18 usecs
[    3.918276] calling  patch_cmedia_init+0x0/0x12 @ 1
[    3.919461] initcall patch_cmedia_init+0x0/0x12 returned 0 after 3 usecs
[    3.920974] calling  patch_analog_init+0x0/0x12 @ 1
[    3.922148] initcall patch_analog_init+0x0/0x12 returned 0 after 2 usecs
[    3.923763] calling  patch_sigmatel_init+0x0/0x12 @ 1
[    3.925029] initcall patch_sigmatel_init+0x0/0x12 returned 0 after 1 usecs
[    3.926688] calling  patch_si3054_init+0x0/0x12 @ 1
[    3.927863] initcall patch_si3054_init+0x0/0x12 returned 0 after 2 usecs
[    3.929505] calling  patch_cirrus_init+0x0/0x12 @ 1
[    3.930684] initcall patch_cirrus_init+0x0/0x12 returned 0 after 1 usecs
[    3.932312] calling  patch_ca0110_init+0x0/0x12 @ 1
[    3.933493] initcall patch_ca0110_init+0x0/0x12 returned 0 after 2 usecs
[    3.934616] calling  patch_ca0132_init+0x0/0x12 @ 1
[    3.935034] initcall patch_ca0132_init+0x0/0x12 returned 0 after 1 usecs
[    3.936224] calling  patch_conexant_init+0x0/0x12 @ 1
[    3.937454] initcall patch_conexant_init+0x0/0x12 returned 0 after 1 usecs
[    3.939108] calling  patch_via_init+0x0/0x12 @ 1
[    3.940270] initcall patch_via_init+0x0/0x12 returned 0 after 1 usecs
[    3.941823] calling  patch_hdmi_init+0x0/0x12 @ 1
[    3.942964] initcall patch_hdmi_init+0x0/0x12 returned 0 after 1 usecs
[    3.944589] calling  azx_driver_init+0x0/0x1b @ 1
[    3.945843] initcall azx_driver_init+0x0/0x1b returned 0 after 159 usecs
[    3.947361] calling  snd_usb_audio_init+0x0/0x3d @ 1
[    3.948631] usbcore: registered new interface driver snd-usb-audio
[    3.950169] initcall snd_usb_audio_init+0x0/0x3d returned 0 after 1638 usecs
[    3.951986] calling  snd_usb_driver_init+0x0/0x1b @ 1
[    3.953393] usbcore: registered new interface driver snd-usb-caiaq
[    3.954912] initcall snd_usb_driver_init+0x0/0x1b returned 0 after 1622 usecs
[    3.956658] calling  snd_soc_init+0x0/0xe8 @ 1
[    3.957878] initcall snd_soc_init+0x0/0xe8 returned 0 after 364 usecs
[    3.959430] calling  sock_diag_init+0x0/0x12 @ 1
[    3.960577] initcall sock_diag_init+0x0/0x12 returned 0 after 10 usecs
[    3.962133] calling  sysctl_ipv4_init+0x0/0x8d @ 1
[    3.963327] initcall sysctl_ipv4_init+0x0/0x8d returned 0 after 35 usecs
[    3.964971] calling  inet_diag_init+0x0/0x97 @ 1
[    3.966106] initcall inet_diag_init+0x0/0x97 returned 0 after 21 usecs
[    3.967679] calling  tcp_diag_init+0x0/0x12 @ 1
[    3.968812] initcall tcp_diag_init+0x0/0x12 returned 0 after 12 usecs
[    3.970358] calling  cubictcp_register+0x0/0x6a @ 1
[    3.971531] TCP: cubic registered
[    3.972353] initcall cubictcp_register+0x0/0x6a returned 0 after 797 usecs
[    3.974007] calling  hstcp_register+0x0/0x12 @ 1
[    3.975118] TCP: highspeed registered
[    3.976023] initcall hstcp_register+0x0/0x12 returned 0 after 877 usecs
[    3.977600] calling  tcp_scalable_register+0x0/0x12 @ 1
[    3.978851] TCP: scalable registered
[    3.979658] initcall tcp_scalable_register+0x0/0x12 returned 0 after 790 usecs
[    3.981419] calling  unix_diag_init+0x0/0x12 @ 1
[    3.982554] initcall unix_diag_init+0x0/0x12 returned 0 after 3 usecs
[    3.984145] calling  packet_init+0x0/0x44 @ 1
[    3.985154] NET: Registered protocol family 17
[    3.986231] initcall packet_init+0x0/0x44 returned 0 after 1057 usecs
[    3.987779] calling  can_init+0x0/0xf5 @ 1
[    3.988803] can: controller area network core (rev 20120528 abi 9)
[    3.990338] NET: Registered protocol family 29
[    3.991418] initcall can_init+0x0/0xf5 returned 0 after 2557 usecs
[    3.992931] calling  rfcomm_init+0x0/0xe4 @ 1
[    3.994080] Bluetooth: RFCOMM TTY layer initialized
[    3.995276] Bluetooth: RFCOMM socket layer initialized
[    3.996555] Bluetooth: RFCOMM ver 1.11
[    3.997473] initcall rfcomm_init+0x0/0xe4 returned 0 after 3407 usecs
[    3.999019] calling  bnep_init+0x0/0x72 @ 1
[    4.000055] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[    4.001340] Bluetooth: BNEP socket layer initialized
[    4.002536] initcall bnep_init+0x0/0x72 returned 0 after 2427 usecs
[    4.004065] calling  hidp_init+0x0/0x20 @ 1
[    4.005090] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
[    4.006504] Bluetooth: HIDP socket layer initialized
[    4.007716] initcall hidp_init+0x0/0x20 returned 0 after 2568 usecs
[    4.009026] calling  init_rpcsec_gss+0x0/0x61 @ 1
[    4.010097] initcall init_rpcsec_gss+0x0/0x61 returned 0 after 257 usecs
[    4.012060] calling  xprt_rdma_init+0x0/0xb9 @ 1
[    4.013184] RPC: Registered rdma transport module.
[    4.014372] initcall xprt_rdma_init+0x0/0xb9 returned 0 after 1161 usecs
[    4.015991] calling  svc_rdma_init+0x0/0x1b0 @ 1
[    4.017537] initcall svc_rdma_init+0x0/0x1b0 returned 0 after 402 usecs
[    4.019140] calling  dccp_init+0x0/0x39b @ 1
[    4.025107] DCCP: Activated CCID 2 (TCP-like)
[    4.025876] DCCP: Activated CCID 3 (TCP-Friendly Rate Control)
[    4.027301] initcall dccp_init+0x0/0x39b returned 0 after 6848 usecs
[    4.028914] calling  dccp_v4_init+0x0/0x81 @ 1
[    4.030227] initcall dccp_v4_init+0x0/0x81 returned 0 after 237 usecs
[    4.031781] calling  dccp_diag_init+0x0/0x12 @ 1
[    4.032938] initcall dccp_diag_init+0x0/0x12 returned 0 after 1 usecs
[    4.034495] calling  sctp_init+0x0/0x4d9 @ 1
[    4.043062] sctp: Hash tables configured (established 52428 bind 52428)
[    4.045024] initcall sctp_init+0x0/0x4d9 returned 0 after 9268 usecs
[    4.046538] calling  rds_init+0x0/0xf0 @ 1
[    4.047626] NET: Registered protocol family 21
[    4.048756] initcall rds_init+0x0/0xf0 returned 0 after 1189 usecs
[    4.050238] calling  rds_tcp_init+0x0/0xb0 @ 1
[    4.051332] Registered RDS/tcp transport
[    4.052428] initcall rds_tcp_init+0x0/0xb0 returned 0 after 1087 usecs
[    4.053997] calling  tipc_init+0x0/0xcf @ 1
[    4.055012] tipc: Activated (version 2.0.0)
[    4.056473] NET: Registered protocol family 30
[    4.057403] tipc: Started in single node mode
[    4.058454] initcall tipc_init+0x0/0xcf returned 0 after 3357 usecs
[    4.059722] calling  wimax_subsys_init+0x0/0x3a8 @ 1
[    4.060976] initcall wimax_subsys_init+0x0/0x3a8 returned 0 after 11 usecs
[    4.062633] calling  init_dns_resolver+0x0/0x105 @ 1
[    4.063840] Key type dns_resolver registered
[    4.064908] initcall init_dns_resolver+0x0/0x105 returned 0 after 1048 usecs
[    4.066598] calling  mcheck_init_device+0x0/0x12b @ 1
[    4.068213] initcall mcheck_init_device+0x0/0x12b returned 0 after 388 usecs
[    4.069961] calling  rio_init_mports+0x0/0x210 @ 1
[    4.071163] initcall rio_init_mports+0x0/0x210 returned -19 after 0 usecs
[    4.072998] calling  mcheck_debugfs_init+0x0/0x3b @ 1
[    4.074292] initcall mcheck_debugfs_init+0x0/0x3b returned 0 after 57 usecs
[    4.075973] calling  severities_debugfs_init+0x0/0x3d @ 1
[    4.077332] initcall severities_debugfs_init+0x0/0x3d returned 0 after 19 usecs
[    4.079069] calling  threshold_init_device+0x0/0x4b @ 1
[    4.080355] initcall threshold_init_device+0x0/0x4b returned 0 after 0 usecs
[    4.082041] calling  hpet_insert_resource+0x0/0x26 @ 1
[    4.083281] initcall hpet_insert_resource+0x0/0x26 returned 0 after 3 usecs
[    4.084987] calling  update_mp_table+0x0/0x471 @ 1
[    4.086144] initcall update_mp_table+0x0/0x471 returned 0 after 0 usecs
[    4.087729] calling  lapic_insert_resource+0x0/0x41 @ 1
[    4.089013] initcall lapic_insert_resource+0x0/0x41 returned 0 after 0 usecs
[    4.090713] calling  io_apic_bug_finalize+0x0/0x1b @ 1
[    4.091945] initcall io_apic_bug_finalize+0x0/0x1b returned 0 after 0 usecs
[    4.093640] calling  print_ICs+0x0/0x5ab @ 1
[    4.094675] initcall print_ICs+0x0/0x5ab returned 0 after 0 usecs
[    4.096153] calling  check_early_ioremap_leak+0x0/0x50 @ 1
[    4.097475] initcall check_early_ioremap_leak+0x0/0x50 returned 0 after 0 usecs
[    4.099190] calling  pat_memtype_list_init+0x0/0x35 @ 1
[    4.100519] initcall pat_memtype_list_init+0x0/0x35 returned 0 after 0 usecs
[    4.102257] calling  init_oops_id+0x0/0x40 @ 1
[    4.103344] initcall init_oops_id+0x0/0x40 returned 0 after 9 usecs
[    4.104897] calling  printk_late_init+0x0/0x58 @ 1
[    4.106069] initcall printk_late_init+0x0/0x58 returned 0 after 4 usecs
[    4.107662] calling  sched_init_debug+0x0/0x24 @ 1
[    4.108857] initcall sched_init_debug+0x0/0x24 returned 0 after 14 usecs
[    4.110461] calling  pm_qos_power_init+0x0/0x65 @ 1
[    4.111959] initcall pm_qos_power_init+0x0/0x65 returned 0 after 321 usecs
[    4.113669] calling  pm_debugfs_init+0x0/0x24 @ 1
[    4.114819] initcall pm_debugfs_init+0x0/0x24 returned 0 after 8 usecs
[    4.116418] calling  software_resume+0x0/0x2b0 @ 1
[    4.117587] PM: Hibernation image not present or could not be loaded.
[    4.119187] initcall software_resume+0x0/0x2b0 returned -2 after 1580 usecs
[    4.120889] initcall software_resume+0x0/0x2b0 returned with error code -2
[    4.122568] calling  debugfs_kprobe_init+0x0/0xa0 @ 1
[    4.123805] initcall debugfs_kprobe_init+0x0/0xa0 returned 0 after 20 usecs
[    4.125514] calling  taskstats_init+0x0/0x95 @ 1
[    4.126641] registered taskstats version 1
[    4.127652] initcall taskstats_init+0x0/0x95 returned 0 after 999 usecs
[    4.129238] calling  clear_boot_tracer+0x0/0x30 @ 1
[    4.130439] initcall clear_boot_tracer+0x0/0x30 returned 0 after 0 usecs
[    4.132102] calling  fail_page_alloc_debugfs+0x0/0x9c @ 1
[    4.133490] initcall fail_page_alloc_debugfs+0x0/0x9c returned 0 after 71 usecs
[    4.135279] calling  max_swapfiles_check+0x0/0x8 @ 1
[    4.136513] initcall max_swapfiles_check+0x0/0x8 returned 0 after 0 usecs
[    4.138138] calling  failslab_debugfs_init+0x0/0x7c @ 1
[    4.139449] initcall failslab_debugfs_init+0x0/0x7c returned 0 after 60 usecs
[    4.141200] calling  set_recommended_min_free_kbytes+0x0/0xb0 @ 1
[    4.142835] initcall set_recommended_min_free_kbytes+0x0/0xb0 returned 0 after 166 usecs
[    4.144801] calling  fail_make_request_debugfs+0x0/0x28 @ 1
[    4.146175] initcall fail_make_request_debugfs+0x0/0x28 returned 0 after 37 usecs
[    4.147965] calling  random32_reseed+0x0/0xa1 @ 1
[    4.149132] initcall random32_reseed+0x0/0xa1 returned 0 after 7 usecs
[    4.150701] calling  pci_resource_alignment_sysfs_init+0x0/0x22 @ 1
[    4.152232] initcall pci_resource_alignment_sysfs_init+0x0/0x22 returned 0 after 5 usecs
[    4.154167] calling  pci_sysfs_init+0x0/0x51 @ 1
[    4.155406] initcall pci_sysfs_init+0x0/0x51 returned 0 after 121 usecs
[    4.157018] calling  random_int_secret_init+0x0/0x19 @ 1
[    4.158301] initcall random_int_secret_init+0x0/0x19 returned 0 after 11 usecs
[    4.160068] calling  deferred_probe_initcall+0x0/0x60 @ 1
[    4.161462] initcall deferred_probe_initcall+0x0/0x60 returned 0 after 66 usecs
[    4.163237] calling  rionet_init+0x0/0x12 @ 1
[    4.164526] initcall rionet_init+0x0/0x12 returned 0 after 176 usecs
[    4.166076] calling  rtc_hctosys+0x0/0x111 @ 1
[    4.167214] rtc_cmos 00:00: setting system clock to 2012-12-14 09:00:17 UTC (1355475617)
[    4.169177] initcall rtc_hctosys+0x0/0x111 returned 0 after 1980 usecs
[    4.170746] calling  powernowk8_init+0x0/0xfc @ 1
[    4.171881] initcall powernowk8_init+0x0/0xfc returned -19 after 0 usecs
[    4.173521] calling  acpi_cpufreq_init+0x0/0x223 @ 1
[    4.174830] initcall acpi_cpufreq_init+0x0/0x223 returned -19 after 101 usecs
[    4.176569] calling  pcc_cpufreq_init+0x0/0x2d7 @ 1
[    4.177757] initcall pcc_cpufreq_init+0x0/0x2d7 returned -19 after 6 usecs
[    4.179406] calling  edd_init+0x0/0x2da @ 1
[    4.180439] BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
[    4.182003] initcall edd_init+0x0/0x2da returned 0 after 1530 usecs
[    4.183517] calling  firmware_memmap_init+0x0/0x35 @ 1
[    4.184824] initcall firmware_memmap_init+0x0/0x35 returned 0 after 40 usecs
[    4.186522] calling  pci_mmcfg_late_insert_resources+0x0/0x4d @ 1
[    4.187969] initcall pci_mmcfg_late_insert_resources+0x0/0x4d returned 0 after 0 usecs
[    4.189889] calling  net_secret_init+0x0/0x19 @ 1
[    4.191071] initcall net_secret_init+0x0/0x19 returned 0 after 13 usecs
[    4.192705] calling  tcp_congestion_default+0x0/0x12 @ 1
[    4.193993] initcall tcp_congestion_default+0x0/0x12 returned 0 after 2 usecs
[    4.195720] calling  tcp_fastopen_init+0x0/0x4b @ 1
[    4.196980] initcall tcp_fastopen_init+0x0/0x4b returned 0 after 43 usecs
[    4.198615] calling  ip_auto_config+0x0/0xea1 @ 1
[    6.209094] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[    6.228172] Sending DHCP requests ., OK
[    6.230578] IP-Config: Got DHCP answer from 10.0.2.2, my address is 10.0.2.15
[    6.234558] IP-Config: Complete:
[    6.235279]      device=eth0, hwaddr=00:1c:25:1c:13:e9, ipaddr=10.0.2.15, mask=255.255.255.0, gw=10.0.2.2
[    6.237476]      host=10.0.2.15, domain=, nis-domain=(none)
[    6.238804]      bootserver=10.0.2.2, rootserver=10.0.2.2, rootpath=
[    6.240234]      nameserver0=10.0.2.3[    6.240912] initcall ip_auto_config+0x0/0xea1 returned 0 after 1993324 usecs
[    6.242646] calling  alsa_sound_last_init+0x0/0x61 @ 1
[    6.243133] ALSA device list:
[    6.243439]   No soundcards found.
[    6.243802] initcall alsa_sound_last_init+0x0/0x61 returned 0 after 651 usecs
[    6.244601] calling  initialize_hashrnd+0x0/0x19 @ 1
[    6.245143] initcall initialize_hashrnd+0x0/0x19 returned 0 after 7 usecs
[    6.247764] md: Waiting for all devices to be available before autodetect
[    6.248492] md: If you don't use raid, use raid=noautodetect
[    6.249835] md: Autodetecting RAID arrays.
[    6.250285] md: Scanned 0 and added 0 devices.
[    6.250755] md: autorun ...
[    6.251087] md: ... autorun DONE.
[    6.251470] RAMDISK: xz image found at block 0
[   10.684824] kjournald starting.  Commit interval 5 seconds
[   10.685573] EXT3-fs (ram0): warning: maximal mount count reached, running e2fsck is recommended
[   10.686413] EXT3-fs (ram0): using internal journal
[   10.686874] EXT3-fs (ram0): mounted filesystem with writeback data mode
[   10.687697] VFS: Mounted root (ext3 filesystem) on device 1:0.
[   10.689034] Freeing unused kernel memory: 3444k freed
INIT: version 2.86 booting
System Boot Control: Running /etc/init.d/boot
Mounting procfs at /proc                                             done
Mounting sysfs at /sys                                               done
Mounting debugfs at /sys/kernel/debug                                done
Mounting tmpfs at /dev                                               done
Initializing /dev                                                    done
Mounting devpts at /dev/pts                                          done
Boot logging started on /dev/ttyS0(/dev/console) at Fri Dec 14 09:00:24 2012
FATAL: Could not load /lib/modules/3.7.0-yh-07359-ge48ae89-dirty/modules.dep: No such file or directory
Setting up the hardware clock
modprobe: FATAL: Could not load /lib/modules/3.7.0-yh-07359-ge48ae89-dirty/modules.dep: No such file or directory

hwclock: With --noadjfile, you must specify either --utc or --localtime
                                                                     failed
Disabling IP forwarding                                              done
                                                                     done
Starting udevd:                                                      done
Loading drivers, configuring devices:
[   10.812280] udevd (2457): /proc/2457/oom_adj is deprecated, please use /proc/2457/oom_score_adj instead.
[   10.814451] udevd version 128 started
                                                                     done
Loading required kernel modules                                      done
Activating device mapper...
FATAL: Could not load /lib/modules/3.7.0-yh-07359-ge48ae89-dirty/modules.dep: No such file or directory
                                                                     failed
Starting MD Raid                                                     unused
Waiting for udev to settle...
Scanning for LVM volume groups...
File descriptor 3 left open
  Reading all physical volumes.  This may take a while...
Activating LVM volume groups...
File descriptor 3 left open
                                                                     done
Waiting for /firmware
microcode . no more events
Checking file systems...
fsck 1.41.1 (01-Sep-2008)
Checking all file systems.                                           done
                                                                     done
Mounting local file systems...
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/firmware on /lib/firmware type tmpfs (rw)
microcode on /usr/lib/microcode type tmpfs (rw)                      done
Creating /var/log/boot.msg                                           done
Activating remaining swap-devices in /etc/fstab...                   done
Setting up linker cache (/etc/ld.so.cache) using ldconfig            done
Using boot-specified hostname '10.0.2.15'
Setting up hostname '10'                                             done
Setting up loopback interface     lo
    lo        IP address: 127.0.0.1/8
              IP address: 127.0.0.2/8
                                                                     done
System Boot Control: The system has been                             set up
Skipped features:                                                boot.md
System Boot Control: Running /etc/init.d/boot.local                  done
INIT: Entering runlevel: 3
Boot logging started on /dev/ttyS0(/dev/console) at Fri Dec 14 09:00:28 2012
Master Resource Control: previous runlevel: N, switching to runlevel:3
Initializing random number generator                                 done
Starting D-Bus daemon                                                done
Starting syslog services                                             done
Loading CPUFreq modules (CPUFreq not supported)
Starting HAL daemon                                                  done
Setting up (localfs) network interfaces:
    lo
    lo        IP address: 127.0.0.1/8
              IP address: 127.0.0.2/8                                done
    eth0      device: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
              No configuration found for eth0                        unused
Setting up service (localfs) network  .  .  .  .  .  .  .  .  .  .   done
Starting RPC portmap daemon                                          done
Setting up (remotefs) network interfaces:
Setting up service (remotefs) network  .  .  .  .  .  .  .  .  .  .  done
Master Resource Control: runlevel 3 has been                         reached
Shutting down SSH daemon                                             done
Generating /etc/ssh/ssh_host_key.
Generating public/private rsa1 key pair.
Your identification has been saved in /etc/ssh/ssh_host_key.
Your public key has been saved in /etc/ssh/ssh_host_key.pub.
The key fingerprint is:
20:80:f6:2f:53:5c:3d:91:91:2a:80:dd:da:fb:67:57 root@10
The key's randomart image is:
+--[RSA1 1024]----+
| .+ .   .o=      |
|.o + . . =       |
|. . * o . .      |
|   o * o         |
|    o o S        |
|   o o       E   |
|    o .     .    |
|       . o .     |
|        o .      |
+-----------------+
Generating /etc/ssh/ssh_host_dsa_key.
Generating public/private dsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_dsa_key.
Your public key has been saved in /etc/ssh/ssh_host_dsa_key.pub.
The key fingerprint is:
8a:49:46:4b:de:6d:45:22:b3:23:f8:9b:33:e2:39:0d root@10
The key's randomart image is:
+--[ DSA 1024]----+
|      o . .      |
|   .   + o       |
|  . + o   .      |
|   = + o .       |
|    * . S        |
|  Eo = o         |
|  .oB .          |
| ..o.o           |
|  o.             |
+-----------------+
Generating /etc/ssh/ssh_host_rsa_key.
Generating public/private rsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_rsa_key.
Your public key has been saved in /etc/ssh/ssh_host_rsa_key.pub.
The key fingerprint is:
0f:d1:a8:4c:88:75:1d:a5:26:27:2b:40:4c:61:bf:68 root@10
The key's randomart image is:
+--[ RSA 1024]----+
| o=.. ...o.      |
| o.+ o  .+       |
|  o o + * .      |
|   o + B .       |
|  E o + S        |
| .   .   o       |
|          .      |
|                 |
|                 |
+-----------------+
Starting SSH daemon                                                  done

10 login: Connection closed by foreign host.

[-- Attachment #2: hpa_pf_set_page_table_2.patch --]
[-- Type: application/octet-stream, Size: 18720 bytes --]

---
 arch/x86/include/asm/pgtable_64_types.h |    4 
 arch/x86/kernel/head64.c                |   94 ++++++++++++++--
 arch/x86/kernel/head_64.S               |  188 ++++++++++++++++----------------
 arch/x86/kernel/setup.c                 |   10 +
 arch/x86/mm/init.c                      |    4 
 arch/x86/realmode/init.c                |   43 ++++---
 6 files changed, 225 insertions(+), 118 deletions(-)

Index: linux-2.6/arch/x86/include/asm/pgtable_64_types.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/pgtable_64_types.h
+++ linux-2.6/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include <asm/sparsemem.h>
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES	64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
Index: linux-2.6/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head64.c
+++ linux-2.6/arch/x86/kernel/head64.c
@@ -26,11 +26,73 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2, early_pgt_resets = 0;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-	pgd_t *pgd = pgd_offset_k(0UL);
-	pgd_clear(pgd);
-	__flush_tlb_all();
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PGD-1; i++)
+		early_level4_pgt[i].pgd = 0;
+
+	next_early_pgt = 0;
+	early_pgt_resets++;
+
+	__native_flush_tlb();
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	unsigned long i;
+	pgdval_t pgd, *pgd_p;
+	pudval_t *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	if (physaddr >= MAXMEM)
+		return -1;	/* Invalid address - puke */
+
+	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+	pgd_p = &early_level4_pgt[i].pgd;
+	pgd = *pgd_p;
+
+	/*
+	 * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+	 * critical -- __PAGE_OFFSET would point us back into the dynamic
+	 * range and we might end up looping forever...
+	 */
+	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map);
+	} else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+			reset_early_page_tables();
+
+		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			pud_p[i] = 0;
+
+		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + _KERNPG_TABLE;
+	}
+	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+	pud_p += i;
+
+	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_p[i] = pmd;
+		pmd += PMD_SIZE;
+	}
+
+	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + _KERNPG_TABLE;
+
+	return 0;
 }
 
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
@@ -61,6 +123,21 @@ static void __init copy_bootdata(char *r
 	}
 }
 
+unsigned long __meminit
+kernel_physical_mapping_init(unsigned long start,
+			     unsigned long end,
+			     unsigned long page_size_mask);
+
+static void init_mapping_kernel(void)
+{
+	init_level4_pgt[511] = early_level4_pgt[511];
+	early_alloc_pgt_buf();
+	kernel_physical_mapping_init(0, ISA_END_ADDRESS, 1<<PG_LEVEL_2M);
+	kernel_physical_mapping_init(round_down(__pa_symbol(_text), PMD_SIZE),
+				     round_up(__pa_symbol(_end) - 1, PMD_SIZE),
+					 1<<PG_LEVEL_2M);
+}
+
 void __init x86_64_start_kernel(char * real_mode_data)
 {
 	int i;
@@ -79,12 +156,13 @@ void __init x86_64_start_kernel(char * r
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+	/* Kill off the identity-map trampoline */
+	reset_early_page_tables();
+
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* Make NULL pointers segfault */
-	zap_identity_mappings();
-
+	/* XXX - this is wrong... we need to build page tables from scratch */
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
@@ -99,6 +177,8 @@ void __init x86_64_start_kernel(char * r
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
+	init_mapping_kernel();
+
 	x86_64_start_reservations(real_mode_data);
 }
 
Index: linux-2.6/arch/x86/kernel/head_64.S
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head_64.S
+++ linux-2.6/arch/x86/kernel/head_64.S
@@ -47,14 +47,13 @@ L3_START_KERNEL = pud_index(__START_KERN
 	.code64
 	.globl startup_64
 startup_64:
-
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded an identity mapped page table
 	 * for us.  These identity mapped page tables map all of the
 	 * kernel pages and possibly all of memory.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either directly from a 64bit bootloader, or from
 	 * arch/x86_64/boot/compressed/head.S.
@@ -66,7 +65,8 @@ startup_64:
 	 * tables and then reload them.
 	 */
 
-	/* Compute the delta between the address I am compiled to run at and the
+	/*
+	 * Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
 	leaq	_text(%rip), %rbp
@@ -78,53 +78,66 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
-	/* Fixup the physical addresses in the page table
-	 */
-	addq	%rbp, init_level4_pgt + 0(%rip)
-	addq	%rbp, init_level4_pgt + (L4_PAGE_OFFSET*8)(%rip)
-	addq	%rbp, init_level4_pgt + (L4_START_KERNEL*8)(%rip)
+	/*
+	 * Is the address too large?
+	 */
+	leaq	_text(%rip), %rax
+	shrq	$MAX_PHYSMEM_BITS, %rax
+	jnz	bad_address
 
-	addq	%rbp, level3_ident_pgt + 0(%rip)
+	/*
+	 * Fixup the physical addresses in the page table
+	 */
+	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
 
 	addq	%rbp, level3_kernel_pgt + (510*8)(%rip)
 	addq	%rbp, level3_kernel_pgt + (511*8)(%rip)
 
 	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-	/* Add an Identity mapping if I am above 1G */
+	/*
+	 * Set up the identity mapping for the switchover.  These
+	 * entries should *NOT* have the global bit set!  This also
+	 * creates a bunch of nonsense entries but that is fine --
+	 * it avoids problems around wraparound.
+	 */
 	leaq	_text(%rip), %rdi
-	andq	$PMD_PAGE_MASK, %rdi
+	leaq	early_level4_pgt(%rip), %rbx
 
 	movq	%rdi, %rax
-	shrq	$PUD_SHIFT, %rax
-	andq	$(PTRS_PER_PUD - 1), %rax
-	jz	ident_complete
+	shrq	$PGDIR_SHIFT, %rax
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
-	leaq	level3_ident_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	leaq	(4096 + _KERNPG_TABLE)(%rbx), %rdx
+	movq	%rdx, 0(%rbx,%rax,8)
+	movq	%rdx, 8(%rbx,%rax,8)
 
+	addq	$4096, %rdx
 	movq	%rdi, %rax
-	shrq	$PMD_SHIFT, %rax
-	andq	$(PTRS_PER_PMD - 1), %rax
-	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
-	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
-ident_complete:
+	shrq	$PUD_SHIFT, %rax
+	andl	$(PTRS_PER_PUD-1), %eax
+	movq	%rdx, (4096+0)(%rbx,%rax,8)
+	movq	%rdx, (4096+8)(%rbx,%rax,8)
 
+	addq	$8192, %rbx
+	movq	%rdi, %rax
+	shrq	$PMD_SHIFT, %rdi
+	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+	movl	$PTRS_PER_PMD, %ecx
+
+1:
+	andq	$(PTRS_PER_PMD - 1), %rdi
+	movq	%rax, (%rbx,%rdi,8)
+	incq	%rdi
+	addq	$PMD_SIZE, %rax
+	decl	%ecx
+	jnz	1b
+	
 	/*
 	 * Fixup the kernel text+data virtual addresses. Note that
 	 * we might write invalid pmds, when the kernel is relocated
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
 	 */
-
 	leaq	level2_kernel_pgt(%rip), %rdi
 	leaq	4096(%rdi), %r8
 	/* See if it is a valid page table entry */
@@ -139,17 +152,14 @@ ident_complete:
 	/* Fixup phys_base */
 	addq	%rbp, phys_base(%rip)
 
-	/* Due to ENTRY(), sometimes the empty space gets filled with
-	 * zeros. Better take a jmp than relying on empty space being
-	 * filled with 0x90 (nop)
-	 */
-	jmp secondary_startup_64
+	movq	$(early_level4_pgt - __START_KERNEL_map), %rax
+	jmp 1f
 ENTRY(secondary_startup_64)
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded a mapped page table.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either from startup_64 (using physical addresses)
 	 * or from trampoline.S (using virtual addresses).
@@ -159,12 +169,14 @@ ENTRY(secondary_startup_64)
 	 * after the boot processor executes this code.
 	 */
 
+	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+1:
+
 	/* Enable PAE mode and PGE */
-	movl	$(X86_CR4_PAE | X86_CR4_PGE), %eax
-	movq	%rax, %cr4
+	movl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
+	movq	%rcx, %cr4
 
 	/* Setup early boot stage 4 level pagetables. */
-	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	addq	phys_base(%rip), %rax
 	movq	%rax, %cr3
 
@@ -196,7 +208,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip),%rsp
+	movq stack_start(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -236,21 +248,19 @@ ENTRY(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr	
 
-	/* esi is pointer to real mode structure with interesting info.
+	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
-	movl	%esi, %edi
+	movq	%rsi, %rdi
 	
 	/* Finally jump to run C code and to be on real kernel address
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
-	 * a far return.
+	 * a far jump.
 	 */
-	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
+	/* gas 2.22 is buggy and mis-assembles ljmpq */
+	rex64 ljmp *initial_code(%rip)
 
 #ifdef CONFIG_HOTPLUG_CPU
 /*
@@ -270,13 +280,15 @@ ENDPROC(start_cpu0)
 
 	/* SMP bootup changes these two */
 	__REFDATA
-	.align	8
-	ENTRY(initial_code)
+	.balign	8
+	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
-	ENTRY(initial_gs)
+	.word	__KERNEL_CS
+	.balign	8
+	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 
-	ENTRY(stack_start)
+	GLOBAL(stack_start)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
@@ -284,7 +296,7 @@ ENDPROC(start_cpu0)
 bad_address:
 	jmp bad_address
 
-	.section ".init.text","ax"
+	__INIT
 	.globl early_idt_handlers
 early_idt_handlers:
 	# 104(%rsp) %rflags
@@ -321,14 +333,22 @@ ENTRY(early_idt_handler)
 	pushq %r11		#  0(%rsp)
 
 	cmpl $__KERNEL_CS,96(%rsp)
-	jne 10f
+	jne 11f
 
+	cmpl $14,72(%rsp)	# Page fault?
+	jnz 10f
+	GET_CR2_INTO(%rdi)	# can clobber any volatile register if pv
+	call early_make_pgtable
+	andl %eax,%eax
+	jz 20f			# All good
+
+10:
 	leaq 88(%rsp),%rdi	# Pointer to %rip
 	call early_fixup_exception
 	andl %eax,%eax
 	jnz 20f			# Found an exception entry
 
-10:
+11:
 #ifdef CONFIG_EARLY_PRINTK
 	GET_CR2_INTO(%r9)	# can clobber any volatile register if pv
 	movl 80(%rsp),%r8d	# error code
@@ -350,7 +370,7 @@ ENTRY(early_idt_handler)
 1:	hlt
 	jmp 1b
 
-20:	# Exception table entry found
+20:	# Exception table entry found or page table generated
 	popq %r11
 	popq %r10
 	popq %r9
@@ -364,6 +384,8 @@ ENTRY(early_idt_handler)
 	decl early_recursion_flag(%rip)
 	INTERRUPT_RETURN
 
+	__INITDATA
+	
 	.balign 4
 early_recursion_flag:
 	.long 0
@@ -374,11 +396,10 @@ early_idt_msg:
 early_idt_ripmsg:
 	.asciz "RIP %s\n"
 #endif /* CONFIG_EARLY_PRINTK */
-	.previous
 
 #define NEXT_PAGE(name) \
 	.balign	PAGE_SIZE; \
-ENTRY(name)
+GLOBAL(name)
 
 /* Automate the creation of 1 to 1 mapping pmd entries */
 #define PMDS(START, PERM, COUNT)			\
@@ -388,46 +409,21 @@ ENTRY(name)
 	i = i + 1 ;					\
 	.endr
 
-	.data
-	/*
-	 * This default setting generates an ident mapping at address 0x100000
-	 * and a mapping for the kernel that precisely maps virtual address
-	 * 0xffffffff80000000 to physical address 0x000000. (always using
-	 * 2Mbyte large pages provided by PAE mode)
-	 */
-NEXT_PAGE(init_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_PAGE_OFFSET*8, 0
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_START_KERNEL*8, 0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	__INITDATA
+NEXT_PAGE(early_level4_pgt)
+	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level3_ident_pgt)
-	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	511,8,0
+NEXT_PAGE(early_dynamic_pgts)
+	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
 
+	.data
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
 	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
-	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
-
-NEXT_PAGE(level1_fixmap_pgt)
-	.fill	512,8,0
-
-NEXT_PAGE(level2_ident_pgt)
-	/* Since I easily can, map the first 1G.
-	 * Don't set NX because code runs from these pages.
-	 */
-	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
-
 NEXT_PAGE(level2_kernel_pgt)
 	/*
 	 * 512 MB kernel mapping. We spend a full page on this pagetable
@@ -442,11 +438,16 @@ NEXT_PAGE(level2_kernel_pgt)
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
-NEXT_PAGE(level2_spare_pgt)
-	.fill   512, 8, 0
+NEXT_PAGE(level2_fixmap_pgt)
+	.fill	506,8,0
+	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
+	.fill	5,8,0
+
+NEXT_PAGE(level1_fixmap_pgt)
+	.fill	512,8,0
 
 #undef PMDS
-#undef NEXT_PAGE
 
 	.data
 	.align 16
@@ -472,6 +473,7 @@ ENTRY(nmi_idt_table)
 	.skip IDT_ENTRIES * 16
 
 	__PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
-ENTRY(empty_zero_page)
+NEXT_PAGE(empty_zero_page)
+	.skip PAGE_SIZE
+NEXT_PAGE(init_level4_pgt)
 	.skip PAGE_SIZE
Index: linux-2.6/arch/x86/realmode/init.c
===================================================================
--- linux-2.6.orig/arch/x86/realmode/init.c
+++ linux-2.6/arch/x86/realmode/init.c
@@ -8,22 +8,11 @@
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
 
-void __init setup_real_mode(void)
+void __init setup_real_mode_reserve(void)
 {
 	phys_addr_t mem;
-	u16 real_mode_seg;
-	u32 *rel;
-	u32 count;
-	u32 *ptr;
-	u16 *seg;
-	int i;
 	unsigned char *base;
-	struct trampoline_header *trampoline_header;
 	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
-#ifdef CONFIG_X86_64
-	u64 *trampoline_pgd;
-	u64 efer;
-#endif
 
 	/* Has to be in very low memory so we can execute real-mode AP code. */
 	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
@@ -35,6 +24,25 @@ void __init setup_real_mode(void)
 	real_mode_header = (struct real_mode_header *) base;
 	printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
 	       base, (unsigned long long)mem, size);
+}
+
+void __init setup_real_mode(void)
+{
+	u16 real_mode_seg;
+	u32 *rel;
+	u32 count;
+	u32 *ptr;
+	u16 *seg;
+	int i;
+	unsigned char *base;
+	struct trampoline_header *trampoline_header;
+	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
+#ifdef CONFIG_X86_64
+	pgd_t *trampoline_pgd;
+	u64 efer;
+#endif
+
+	base = (unsigned char *)real_mode_header;
 
 	memcpy(base, real_mode_blob, size);
 
@@ -77,9 +85,14 @@ void __init setup_real_mode(void)
 	trampoline_cr4_features = &trampoline_header->cr4;
 	*trampoline_cr4_features = read_cr4();
 
-	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
-	trampoline_pgd[0] = __pa_symbol(level3_ident_pgt) + _KERNPG_TABLE;
-	trampoline_pgd[511] = __pa_symbol(level3_kernel_pgt) + _KERNPG_TABLE;
+	trampoline_pgd = (pgd_t *) __va(real_mode_header->trampoline_pgd);
+
+	/* Set up the identity map */
+	for (i = 0; i < pgd_index(max_pfn<<PAGE_SHIFT); i++)
+		trampoline_pgd[i] = init_level4_pgt[i + pgd_index(__PAGE_OFFSET)];
+
+	/* Set up the kernel map */
+	trampoline_pgd[511] = init_level4_pgt[511];
 #endif
 }
 
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -684,6 +684,8 @@ early_param("reservelow", parse_reservel
  * Note: On x86_64, fixmaps are ready for use even before this is called.
  */
 
+void __init setup_real_mode_reserve(void);
+
 void __init setup_arch(char **cmdline_p)
 {
 	early_reserve_initrd();
@@ -909,7 +911,9 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_ibft_region();
 
+#ifndef CONFIG_X86_64
 	early_alloc_pgt_buf();
+#endif
 
 	/*
 	 * Need to conclude brk, before memblock_x86_fill()
@@ -940,9 +944,13 @@ void __init setup_arch(char **cmdline_p)
 	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
 
-	setup_real_mode();
+	setup_real_mode_reserve();
 
+#ifdef CONFIG_X86_64
+	write_cr3(__pa(init_level4_pgt));
+#endif
 	init_mem_mapping();
+	setup_real_mode();
 
 	memblock.current_limit = get_max_mapped();
 	dma_contiguous_reserve(0);
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -75,8 +75,8 @@ __ref void *alloc_low_pages(unsigned int
 	return __va(pfn << PAGE_SHIFT);
 }
 
-/* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
-#define INIT_PGT_BUF_SIZE	(5 * PAGE_SIZE)
+/* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS, 4 4k for kernel */
+#define INIT_PGT_BUF_SIZE	(9 * PAGE_SIZE)
 RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE);
 void  __init early_alloc_pgt_buf(void)
 {
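
To make the address arithmetic in early_make_pgtable() in the patch above
easier to follow, here is a small user-space sketch of how a faulting
direct-map address breaks down into the PGD and PUD slots that the handler
fills, and which 1 GiB physical range the freshly built PMD page covers.
The shift values are the usual x86-64 4-level paging ones.  The names
ASSUMED_PAGE_OFFSET, struct early_walk and early_walk_for() are
illustrative only and do not appear in the patch; the direct-map base is
an assumption about kernels of this era.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 4-level paging: 512 entries per table, 2 MiB large pages. */
#define PGDIR_SHIFT	39
#define PUD_SHIFT	30
#define PTRS_PER_TBL	512ULL
#define PUD_MASK	(~((1ULL << PUD_SHIFT) - 1))

/* ASSUMPTION: base of the kernel direct mapping (not taken from the patch). */
#define ASSUMED_PAGE_OFFSET	0xffff880000000000ULL

/* Hypothetical result type, for illustration only. */
struct early_walk {
	unsigned int pgd_idx;	/* slot in the top-level table */
	unsigned int pud_idx;	/* slot in the PUD page */
	uint64_t first_phys;	/* physical base of the 1 GiB the PMD page maps */
};

/* Mirror the index and physical-address math of the early #PF handler. */
static struct early_walk early_walk_for(uint64_t address)
{
	struct early_walk w;
	uint64_t physaddr;

	/* Only direct-map addresses are handled in this sketch. */
	assert(address >= ASSUMED_PAGE_OFFSET);
	physaddr = address - ASSUMED_PAGE_OFFSET;

	w.pgd_idx = (unsigned int)((address >> PGDIR_SHIFT) & (PTRS_PER_TBL - 1));
	w.pud_idx = (unsigned int)((address >> PUD_SHIFT) & (PTRS_PER_TBL - 1));
	/* The PMD page is filled with 512 large pages starting here. */
	w.first_phys = physaddr & PUD_MASK;
	return w;
}
```

Under these assumptions, a fault at ASSUMED_PAGE_OFFSET + 0x7fe00000 lands
in PGD slot 272 and PUD slot 1, and the new PMD page maps physical memory
from 0x40000000 upward in 2 MiB steps.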


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14  9:11                                                   ` Yinghai Lu
@ 2012-12-14 18:16                                                     ` H. Peter Anvin
  2012-12-14 19:46                                                     ` H. Peter Anvin
  1 sibling, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-14 18:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/14/2012 01:11 AM, Yinghai Lu wrote:
> 
> attached works on kvm local, but SMP does not work yet.
> 

SMP should be easy... it is probably just a matter of getting through
the trampoline sequence properly.  I will look at this shortly.

	-hpa




* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14  9:11                                                   ` Yinghai Lu
  2012-12-14 18:16                                                     ` H. Peter Anvin
@ 2012-12-14 19:46                                                     ` H. Peter Anvin
  2012-12-14 20:04                                                       ` Yinghai Lu
  1 sibling, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-14 19:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/14/2012 01:11 AM, Yinghai Lu wrote:
> On Thu, Dec 13, 2012 at 1:36 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> : tazenda 111 ; qemu-kvm -smp 2 -m 2048 -hda ~/qemu/fc10/qemu-fc10-64.img -serial stdio -kernel o.x86_64/arch/x86/boot/bzImage -append 'ro root=/dev/sda1 console=ttyS0 earlyprintk=serial,ttyS0 debug'
>> early console in setup code
>> early console in decompress_kernel
>>
>> [    0.000000] init_memory_mapping: [mem 0x00000000-0x7fffdfff]
>> [    0.000000]  [mem 0x00000000-0x7fdfffff] page 2M
>> [    0.000000]  [mem 0x7fe00000-0x7fffdfff] page 4k
>> [    0.000000] Kernel panic - not syncing: Cannot find space for the kernel page tables
>> [    0.000000] Pid: 0, comm: swapper Not tainted 3.7.0+ #16
>> [    0.000000] Call Trace:
>> [    0.000000]  [<ffffffff817f0d2e>] panic+0xb6/0x1b5
>> [    0.000000]  [<ffffffff817e3801>] init_memory_mapping+0x471/0x5a0
>> [    0.000000]  [<ffffffff81ecd37f>] setup_arch+0x65c/0xb71
>> [    0.000000]  [<ffffffff81ec998e>] start_kernel+0x8a/0x348
>> [    0.000000]  [<ffffffff81ec9452>] x86_64_start_reservations+0x132/0x136
>> [    0.000000]  [<ffffffff81ec94fe>] x86_64_start_kernel+0xa8/0xad
> 
> attached works on kvm local, but SMP does not work yet.
> 
> ON TOP of linus tree + tip:x86/mm2
> 
> I added mapping to kernel to init_level2_mapping with BRK before #PF
> handler still works. (before early_trap_init)
> copy entries into init_level2_page from early_level4_pgt....
> split setup_real_mode to reserve and copy... copy need to after
> init_mem_mapping for system with more than 512G ram.
> 
> my plan is using this one replace
> 
>       [PATCH v6 06/27] x86, 64bit: Set extra ident mapping for whole kernel range
> 

That patch doesn't apply on top of the merge of x86/mm2 and
linus/master.  A trivial fixup is totally nonfunctional -- no
earlyprintk, just a null pointer death in setup_real_mode().

I suspect we don't need init_level4_pgt at all and should just plan to
get rid of it.  Is there any reason we can't just build the proper
kernel page table set in pagetable_init() and switch to it there?

	-hpa
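
On the quoted remark about systems with more than 512G of RAM: each
top-level (PGD) entry covers 2^39 bytes, i.e. 512 GiB, so the number of
PGD slots an identity map needs depends on where physical memory ends.
The sketch below works that out in user space.  As far as the arithmetic
goes, pgd_index() of the end address is 0 for anything below 512 GiB, so a
loop bounded by "i < pgd_index(end)" copies no entries on such a machine.
pgd_index_of() and pgd_slots_for() are hypothetical helpers written for
this illustration; they are not functions from the patch or the kernel.

```c
#include <stdint.h>

#define PAGE_SHIFT	12
#define PGDIR_SHIFT	39
#define PTRS_PER_PGD	512ULL

/* Index of the top-level (PGD) slot that covers a given address. */
static unsigned int pgd_index_of(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Hypothetical helper: number of PGD slots needed to cover physical
 * memory from 0 up to (but not including) page frame max_pfn.
 */
static unsigned int pgd_slots_for(uint64_t max_pfn)
{
	uint64_t end = max_pfn << PAGE_SHIFT;	/* first byte past RAM */

	if (end == 0)
		return 0;
	/* Count from the last mapped byte so partial slots are included. */
	return pgd_index_of(end - 1) + 1;
}
```

With 2 GiB of RAM (max_pfn = 0x80000) the end address has PGD index 0 and
one slot is needed; only once memory extends past 512 GiB does a second
slot come into play.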



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14 19:46                                                     ` H. Peter Anvin
@ 2012-12-14 20:04                                                       ` Yinghai Lu
  2012-12-14 20:08                                                         ` Yinghai Lu
                                                                           ` (2 more replies)
  0 siblings, 3 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-14 20:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On Fri, Dec 14, 2012 at 11:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> That patch doesn't apply on top of the merge of x86/mm2 and
> linus/master.  A trivial fixup is totally nonfunctional -- no
> earlyprintk, just a null pointer death in setup_real_mode().

just saw linus pulled efi fix that is touching real_mode/init.c.
and there is something wrong with it.

will check that...

>
> I suspect we don't need init_level4_pgt at all and should just plan to
> get rid of it.  Is there any reason we can't just build the proper
> kernel page table set in pagetable_init() and switch to it there?

then how to pass the info to AP?

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14 20:04                                                       ` Yinghai Lu
@ 2012-12-14 20:08                                                         ` Yinghai Lu
  2012-12-14 20:14                                                           ` Yinghai Lu
  2012-12-15  7:57                                                           ` Yinghai Lu
  2012-12-14 20:10                                                         ` H. Peter Anvin
  2012-12-14 21:07                                                         ` Yinghai Lu
  2 siblings, 2 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-14 20:08 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On Fri, Dec 14, 2012 at 12:04 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Dec 14, 2012 at 11:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> I suspect we don't need init_level4_pgt at all and should just plan to
>> get rid of it.  Is there any reason we can't just build the proper
>> kernel page table set in pagetable_init() and switch to it there?
>
> then how to pass the info to AP?

also we should merge early_level4_pgt with init_level4_pgt.

and the #PF handler could just be extended to use BRK ...

but we need to make sure BRK gets mapped first, and BRK could cross the
1G or 512G boundary ...

that would limit the impact on everything else.
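
To make the boundary concern concrete, here is a minimal userspace sketch.
It assumes the standard x86-64 4-level paging shifts (21, 30 and 39 bits);
the helper names range_crosses() and tables_needed() are hypothetical and
do not appear in any patch in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86-64 4-level paging shifts (assumed, not taken from the thread). */
#define PMD_SHIFT	21	/* one PMD entry maps 2 MiB   */
#define PUD_SHIFT	30	/* one PUD entry maps 1 GiB   */
#define PGDIR_SHIFT	39	/* one PGD entry maps 512 GiB */

/*
 * Hypothetical helper: nonzero if [start, end) touches more than one
 * naturally aligned region of size 1 << shift.
 */
static int range_crosses(uint64_t start, uint64_t end, unsigned int shift)
{
	assert(end > start);
	return (start >> shift) != ((end - 1) >> shift);
}

/*
 * Hypothetical helper: how many aligned regions of size 1 << shift the
 * range [start, end) touches.  With shift == PUD_SHIFT this is the number
 * of PUD entries (and so PMD pages) needed to cover the range.
 */
static uint64_t tables_needed(uint64_t start, uint64_t end, unsigned int shift)
{
	assert(end > start);
	return ((end - 1) >> shift) - (start >> shift) + 1;
}
```

A 2 MiB range that starts 1 MiB below the 1 GiB mark needs two PMD pages
even though it is tiny, which is why a buffer such as BRK cannot be assumed
to fit under a single PUD or PGD entry.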

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14 20:04                                                       ` Yinghai Lu
  2012-12-14 20:08                                                         ` Yinghai Lu
@ 2012-12-14 20:10                                                         ` H. Peter Anvin
  2012-12-14 20:17                                                           ` Yinghai Lu
  2012-12-14 21:07                                                         ` Yinghai Lu
  2 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-14 20:10 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

What info are you referring to?

Yinghai Lu <yinghai@kernel.org> wrote:

>On Fri, Dec 14, 2012 at 11:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> That patch doesn't apply on top of the merge of x86/mm2 and
>> linus/master.  A trivial fixup is totally nonfunctional -- no
>> earlyprintk, just a null pointer death in setup_real_mode().
>
>just saw linus pulled efi fix that is touching real_mode/init.c.
>and there is something wrong with it.
>
>will check that...
>
>>
>> I suspect we don't need init_level4_pgt at all and should just plan
>to
>> get rid of it.  Is there any reason we can't just build the proper
>> kernel page table set in pagetable_init() and switch to it there?
>
>then how to pass the info to AP?

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14 20:08                                                         ` Yinghai Lu
@ 2012-12-14 20:14                                                           ` Yinghai Lu
  2012-12-14 20:44                                                             ` H. Peter Anvin
  2012-12-15  7:57                                                           ` Yinghai Lu
  1 sibling, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-14 20:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On Fri, Dec 14, 2012 at 12:08 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Dec 14, 2012 at 12:04 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, Dec 14, 2012 at 11:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> I suspect we don't need init_level4_pgt at all and should just plan to
>>> get rid of it.  Is there any reason we can't just build the proper
>>> kernel page table set in pagetable_init() and switch to it there?
>>
>> then how to pass the info to AP?
>
> also we should merge early_level4_pgt with init_level4_pgt.
>
> and #PF handler could just extend to use BRK ...
>
> but need to make sure BRK get mapped at first, and BRK could cross the
> 1G, 512G boundary ...
>
> that could make things less impact to all.

can your current switchover setup in arch/x86/kernel/head_64.S handle
addresses above 512G and ranges that cross the boundary?

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14 20:10                                                         ` H. Peter Anvin
@ 2012-12-14 20:17                                                           ` Yinghai Lu
  2012-12-14 20:52                                                             ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-14 20:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On Fri, Dec 14, 2012 at 12:10 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> What info are you referring to?

pointer to init_level4_pgt page.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14 20:14                                                           ` Yinghai Lu
@ 2012-12-14 20:44                                                             ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-14 20:44 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/14/2012 12:14 PM, Yinghai Lu wrote:
> On Fri, Dec 14, 2012 at 12:08 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, Dec 14, 2012 at 12:04 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Fri, Dec 14, 2012 at 11:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>>
>>>> I suspect we don't need init_level4_pgt at all and should just plan to
>>>> get rid of it.  Is there any reason we can't just build the proper
>>>> kernel page table set in pagetable_init() and switch to it there?
>>>
>>> then how to pass the info to AP?
>>
>> also we should merge early_level4_pgt with init_level4_pgt.
>>
>> and #PF handler could just extend to use BRK ...
>>
>> but need to make sure BRK get mapped at first, and BRK could cross the
>> 1G, 512G boundary ...
>>
>> that could make things less impact to all.
> 
> your current switchover setup in arch/x86/kernel/head_64.S could
> handle above 512g and cross boundary?
> 

Yes, that's the purpose of the funny aliasing games I play there.

	-hpa
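
The aliasing idea can be modelled in userspace. The sketch below is a
simplified model, not a transcription of the head_64.S assembly: it assumes
one PUD page and one PMD page are shared by two adjacent PGD slots and two
adjacent PUD slots, and it wraps every index with a mask (the wrap on the
second PGD/PUD slot is an assumption of this sketch). The names
struct ident_map, build_ident_map() and walk() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PMD_SHIFT	21
#define PUD_SHIFT	30
#define PGDIR_SHIFT	39
#define PTRS		512			/* entries per table page */
#define PMD_SIZE	(1ULL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))

/* Hypothetical model: one PGD page, one shared PUD page, one shared PMD page. */
struct ident_map {
	int      pgd_present[PTRS];	/* slot points at the shared PUD page */
	int      pud_present[PTRS];	/* slot points at the shared PMD page */
	uint64_t pmd_phys[PTRS];	/* 2 MiB frame mapped by each PMD slot */
};

static void build_ident_map(struct ident_map *m, uint64_t text_phys)
{
	uint64_t phys = text_phys & PMD_MASK;
	unsigned int i, idx;

	assert(m != NULL);
	memset(m, 0, sizeof(*m));

	/* Alias two adjacent PGD slots to the same PUD page. */
	idx = (text_phys >> PGDIR_SHIFT) & (PTRS - 1);
	m->pgd_present[idx] = 1;
	m->pgd_present[(idx + 1) & (PTRS - 1)] = 1;

	/* Alias two adjacent PUD slots to the same PMD page. */
	idx = (text_phys >> PUD_SHIFT) & (PTRS - 1);
	m->pud_present[idx] = 1;
	m->pud_present[(idx + 1) & (PTRS - 1)] = 1;

	/* Fill all 512 PMD slots as a circular 1 GiB window starting at text. */
	idx = (text_phys >> PMD_SHIFT) & (PTRS - 1);
	for (i = 0; i < PTRS; i++) {
		m->pmd_phys[idx] = phys;
		idx = (idx + 1) & (PTRS - 1);
		phys += PMD_SIZE;
	}
}

/* Walk the model; returns the physical address, or UINT64_MAX if unmapped. */
static uint64_t walk(const struct ident_map *m, uint64_t vaddr)
{
	if (!m->pgd_present[(vaddr >> PGDIR_SHIFT) & (PTRS - 1)])
		return UINT64_MAX;
	if (!m->pud_present[(vaddr >> PUD_SHIFT) & (PTRS - 1)])
		return UINT64_MAX;
	return m->pmd_phys[(vaddr >> PMD_SHIFT) & (PTRS - 1)] |
	       (vaddr & (PMD_SIZE - 1));
}
```

In this model, a kernel placed 4 MiB below the 512G mark is still identity
mapped on both sides of the boundary. The price is that some other virtual
addresses alias into the same window, so they translate to unrelated
physical frames; those extra entries are harmless only as long as nothing
relies on them during the switchover.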


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14 20:17                                                           ` Yinghai Lu
@ 2012-12-14 20:52                                                             ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-14 20:52 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/14/2012 12:17 PM, Yinghai Lu wrote:
> On Fri, Dec 14, 2012 at 12:10 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> What info are you referring to?
> 
> pointer to init_level4_pgt page.
> 

We point the trampoline at the proper page tables in swapper_pg_dir;
this means pushing realmode initialization (but not reservation!) after
pagetable_init(), but that should be fine I *believe*... the big
question is if we have something that wants to invoke EFI before then,
if so it could get tricky.

	-hpa


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14 20:04                                                       ` Yinghai Lu
  2012-12-14 20:08                                                         ` Yinghai Lu
  2012-12-14 20:10                                                         ` H. Peter Anvin
@ 2012-12-14 21:07                                                         ` Yinghai Lu
  2 siblings, 0 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-14 21:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]

On Fri, Dec 14, 2012 at 12:04 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Dec 14, 2012 at 11:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> That patch doesn't apply on top of the merge of x86/mm2 and
>> linus/master.  A trivial fixup is totally nonfunctional -- no
>> earlyprintk, just a null pointer death in setup_real_mode().
>
> just saw linus pulled efi fix that is touching real_mode/init.c.
> and there is something wrong with it.

one of the changes in the EFI pull means we cannot call
kernel_physical_mapping_init early...

please check the attached patches...

also moved init_mapping_kernel to setup.c instead, to make the lock init
warning go away.

again, it still has a problem with SMP.

[-- Attachment #2: fix_trampoline_pgd.patch --]
[-- Type: application/octet-stream, Size: 3876 bytes --]

Subject: [PATCH] x86, realmode: Separate real_mode reserve and setup

Setup should come after init_mem_mapping(); otherwise the ident mapping
will not be copied correctly from init_level4_pgt when the system has
more than 512G of RAM.

After that, we only need to back up pgd changes to trampoline_pgd for
hot-added memory.

So we can call kernel_physical_mapping_init() early again.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/realmode.h |    3 ++-
 arch/x86/kernel/setup.c         |    4 +++-
 arch/x86/mm/init_64.c           |   11 +++++++----
 arch/x86/realmode/init.c        |   30 +++++++++++++++++++-----------
 4 files changed, 31 insertions(+), 17 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -913,10 +913,12 @@ void __init setup_arch(char **cmdline_p)
 	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
 
-	setup_real_mode();
+	reserve_real_mode();
 
 	init_mem_mapping();
 
+	setup_real_mode();
+
 	memblock.current_limit = get_max_mapped();
 	dma_contiguous_reserve(0);
 
Index: linux-2.6/arch/x86/realmode/init.c
===================================================================
--- linux-2.6.orig/arch/x86/realmode/init.c
+++ linux-2.6/arch/x86/realmode/init.c
@@ -8,9 +8,26 @@
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
 
-void __init setup_real_mode(void)
+void __init reserve_real_mode(void)
 {
 	phys_addr_t mem;
+	unsigned char *base;
+	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);
+
+	/* Has to be in very low memory so we can execute real-mode AP code. */
+	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
+	if (!mem)
+		panic("Cannot allocate trampoline\n");
+
+	base = __va(mem);
+	memblock_reserve(mem, size);
+	real_mode_header = (struct real_mode_header *) base;
+	printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
+	       base, (unsigned long long)mem, size);
+}
+
+void __init setup_real_mode(void)
+{
 	u16 real_mode_seg;
 	u32 *rel;
 	u32 count;
@@ -25,16 +42,7 @@ void __init setup_real_mode(void)
 	u64 efer;
 #endif
 
-	/* Has to be in very low memory so we can execute real-mode AP code. */
-	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
-	if (!mem)
-		panic("Cannot allocate trampoline\n");
-
-	base = __va(mem);
-	memblock_reserve(mem, size);
-	real_mode_header = (struct real_mode_header *) base;
-	printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
-	       base, (unsigned long long)mem, size);
+	base = (unsigned char *)real_mode_header;
 
 	memcpy(base, real_mode_blob, size);
 
Index: linux-2.6/arch/x86/include/asm/realmode.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/realmode.h
+++ linux-2.6/arch/x86/include/asm/realmode.h
@@ -58,6 +58,7 @@ extern unsigned char boot_gdt[];
 extern unsigned char secondary_startup_64[];
 #endif
 
-extern void __init setup_real_mode(void);
+void reserve_real_mode(void);
+void setup_real_mode(void);
 
 #endif /* _ARCH_X86_REALMODE_H */
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -133,11 +133,14 @@ void sync_global_pgds(unsigned long star
 			spin_unlock(pgt_lock);
 		}
 
-		pgd = __va(real_mode_header->trampoline_pgd);
-		pgd += pgd_index(address);
+		/* for hot add memory only */
+		if (after_bootmem) {
+			pgd = __va(real_mode_header->trampoline_pgd);
+			pgd += pgd_index(address);
 
-		if (pgd_none(*pgd))
-			set_pgd(pgd, *pgd_ref);
+			if (pgd_none(*pgd))
+				set_pgd(pgd, *pgd_ref);
+		}
 
 		spin_unlock(&pgd_lock);
 	}

[-- Attachment #3: hpa_pf_set_page_table_3.patch --]
[-- Type: application/octet-stream, Size: 16249 bytes --]

---
 arch/x86/include/asm/pgtable_64_types.h |    4 
 arch/x86/kernel/head64.c                |   77 +++++++++++--
 arch/x86/kernel/head_64.S               |  188 ++++++++++++++++----------------
 arch/x86/kernel/setup.c                 |   27 ++++
 arch/x86/mm/init.c                      |    4 
 5 files changed, 198 insertions(+), 102 deletions(-)

Index: linux-2.6/arch/x86/include/asm/pgtable_64_types.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/pgtable_64_types.h
+++ linux-2.6/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include <asm/sparsemem.h>
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES	64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
Index: linux-2.6/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head64.c
+++ linux-2.6/arch/x86/kernel/head64.c
@@ -26,11 +26,73 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2, early_pgt_resets = 0;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-	pgd_t *pgd = pgd_offset_k(0UL);
-	pgd_clear(pgd);
-	__flush_tlb_all();
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PGD-1; i++)
+		early_level4_pgt[i].pgd = 0;
+
+	next_early_pgt = 0;
+	early_pgt_resets++;
+
+	__native_flush_tlb();
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	unsigned long i;
+	pgdval_t pgd, *pgd_p;
+	pudval_t *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	if (physaddr >= MAXMEM)
+		return -1;	/* Invalid address - puke */
+
+	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+	pgd_p = &early_level4_pgt[i].pgd;
+	pgd = *pgd_p;
+
+	/*
+	 * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+	 * critical -- __PAGE_OFFSET would point us back into the dynamic
+	 * range and we might end up looping forever...
+	 */
+	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map);
+	} else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+			reset_early_page_tables();
+
+		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			pud_p[i] = 0;
+
+		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + _KERNPG_TABLE;
+	}
+	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+	pud_p += i;
+
+	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_p[i] = pmd;
+		pmd += PMD_SIZE;
+	}
+
+	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + _KERNPG_TABLE;
+
+	return 0;
 }
 
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
@@ -70,12 +132,13 @@ void __init x86_64_start_kernel(char * r
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+	/* Kill off the identity-map trampoline */
+	reset_early_page_tables();
+
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* Make NULL pointers segfault */
-	zap_identity_mappings();
-
+	/* XXX - this is wrong... we need to build page tables from scratch */
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
Index: linux-2.6/arch/x86/kernel/head_64.S
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head_64.S
+++ linux-2.6/arch/x86/kernel/head_64.S
@@ -47,14 +47,13 @@ L3_START_KERNEL = pud_index(__START_KERN
 	.code64
 	.globl startup_64
 startup_64:
-
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded an identity mapped page table
 	 * for us.  These identity mapped page tables map all of the
 	 * kernel pages and possibly all of memory.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either directly from a 64bit bootloader, or from
 	 * arch/x86_64/boot/compressed/head.S.
@@ -66,7 +65,8 @@ startup_64:
 	 * tables and then reload them.
 	 */
 
-	/* Compute the delta between the address I am compiled to run at and the
+	/*
+	 * Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
 	leaq	_text(%rip), %rbp
@@ -78,53 +78,66 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
-	/* Fixup the physical addresses in the page table
-	 */
-	addq	%rbp, init_level4_pgt + 0(%rip)
-	addq	%rbp, init_level4_pgt + (L4_PAGE_OFFSET*8)(%rip)
-	addq	%rbp, init_level4_pgt + (L4_START_KERNEL*8)(%rip)
+	/*
+	 * Is the address too large?
+	 */
+	leaq	_text(%rip), %rax
+	shrq	$MAX_PHYSMEM_BITS, %rax
+	jnz	bad_address
 
-	addq	%rbp, level3_ident_pgt + 0(%rip)
+	/*
+	 * Fixup the physical addresses in the page table
+	 */
+	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
 
 	addq	%rbp, level3_kernel_pgt + (510*8)(%rip)
 	addq	%rbp, level3_kernel_pgt + (511*8)(%rip)
 
 	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-	/* Add an Identity mapping if I am above 1G */
+	/*
+	 * Set up the identity mapping for the switchover.  These
+	 * entries should *NOT* have the global bit set!  This also
+	 * creates a bunch of nonsense entries but that is fine --
+	 * it avoids problems around wraparound.
+	 */
 	leaq	_text(%rip), %rdi
-	andq	$PMD_PAGE_MASK, %rdi
+	leaq	early_level4_pgt(%rip), %rbx
 
 	movq	%rdi, %rax
-	shrq	$PUD_SHIFT, %rax
-	andq	$(PTRS_PER_PUD - 1), %rax
-	jz	ident_complete
+	shrq	$PGDIR_SHIFT, %rax
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
-	leaq	level3_ident_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	leaq	(4096 + _KERNPG_TABLE)(%rbx), %rdx
+	movq	%rdx, 0(%rbx,%rax,8)
+	movq	%rdx, 8(%rbx,%rax,8)
 
+	addq	$4096, %rdx
 	movq	%rdi, %rax
-	shrq	$PMD_SHIFT, %rax
-	andq	$(PTRS_PER_PMD - 1), %rax
-	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
-	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
-ident_complete:
+	shrq	$PUD_SHIFT, %rax
+	andl	$(PTRS_PER_PUD-1), %eax
+	movq	%rdx, (4096+0)(%rbx,%rax,8)
+	movq	%rdx, (4096+8)(%rbx,%rax,8)
 
+	addq	$8192, %rbx
+	movq	%rdi, %rax
+	shrq	$PMD_SHIFT, %rdi
+	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+	movl	$PTRS_PER_PMD, %ecx
+
+1:
+	andq	$(PTRS_PER_PMD - 1), %rdi
+	movq	%rax, (%rbx,%rdi,8)
+	incq	%rdi
+	addq	$PMD_SIZE, %rax
+	decl	%ecx
+	jnz	1b
+	
 	/*
 	 * Fixup the kernel text+data virtual addresses. Note that
 	 * we might write invalid pmds, when the kernel is relocated
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
 	 */
-
 	leaq	level2_kernel_pgt(%rip), %rdi
 	leaq	4096(%rdi), %r8
 	/* See if it is a valid page table entry */
@@ -139,17 +152,14 @@ ident_complete:
 	/* Fixup phys_base */
 	addq	%rbp, phys_base(%rip)
 
-	/* Due to ENTRY(), sometimes the empty space gets filled with
-	 * zeros. Better take a jmp than relying on empty space being
-	 * filled with 0x90 (nop)
-	 */
-	jmp secondary_startup_64
+	movq	$(early_level4_pgt - __START_KERNEL_map), %rax
+	jmp 1f
 ENTRY(secondary_startup_64)
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded a mapped page table.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either from startup_64 (using physical addresses)
 	 * or from trampoline.S (using virtual addresses).
@@ -159,12 +169,14 @@ ENTRY(secondary_startup_64)
 	 * after the boot processor executes this code.
 	 */
 
+	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+1:
+
 	/* Enable PAE mode and PGE */
-	movl	$(X86_CR4_PAE | X86_CR4_PGE), %eax
-	movq	%rax, %cr4
+	movl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
+	movq	%rcx, %cr4
 
 	/* Setup early boot stage 4 level pagetables. */
-	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	addq	phys_base(%rip), %rax
 	movq	%rax, %cr3
 
@@ -196,7 +208,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip),%rsp
+	movq stack_start(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -236,21 +248,19 @@ ENTRY(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr	
 
-	/* esi is pointer to real mode structure with interesting info.
+	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
-	movl	%esi, %edi
+	movq	%rsi, %rdi
 	
 	/* Finally jump to run C code and to be on real kernel address
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
-	 * a far return.
+	 * a far jump.
 	 */
-	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
+	/* gas 2.22 is buggy and mis-assembles ljmpq */
+	rex64 ljmp *initial_code(%rip)
 
 #ifdef CONFIG_HOTPLUG_CPU
 /*
@@ -270,13 +280,15 @@ ENDPROC(start_cpu0)
 
 	/* SMP bootup changes these two */
 	__REFDATA
-	.align	8
-	ENTRY(initial_code)
+	.balign	8
+	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
-	ENTRY(initial_gs)
+	.word	__KERNEL_CS
+	.balign	8
+	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 
-	ENTRY(stack_start)
+	GLOBAL(stack_start)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
@@ -284,7 +296,7 @@ ENDPROC(start_cpu0)
 bad_address:
 	jmp bad_address
 
-	.section ".init.text","ax"
+	__INIT
 	.globl early_idt_handlers
 early_idt_handlers:
 	# 104(%rsp) %rflags
@@ -321,14 +333,22 @@ ENTRY(early_idt_handler)
 	pushq %r11		#  0(%rsp)
 
 	cmpl $__KERNEL_CS,96(%rsp)
-	jne 10f
+	jne 11f
 
+	cmpl $14,72(%rsp)	# Page fault?
+	jnz 10f
+	GET_CR2_INTO(%rdi)	# can clobber any volatile register if pv
+	call early_make_pgtable
+	andl %eax,%eax
+	jz 20f			# All good
+
+10:
 	leaq 88(%rsp),%rdi	# Pointer to %rip
 	call early_fixup_exception
 	andl %eax,%eax
 	jnz 20f			# Found an exception entry
 
-10:
+11:
 #ifdef CONFIG_EARLY_PRINTK
 	GET_CR2_INTO(%r9)	# can clobber any volatile register if pv
 	movl 80(%rsp),%r8d	# error code
@@ -350,7 +370,7 @@ ENTRY(early_idt_handler)
 1:	hlt
 	jmp 1b
 
-20:	# Exception table entry found
+20:	# Exception table entry found or page table generated
 	popq %r11
 	popq %r10
 	popq %r9
@@ -364,6 +384,8 @@ ENTRY(early_idt_handler)
 	decl early_recursion_flag(%rip)
 	INTERRUPT_RETURN
 
+	__INITDATA
+	
 	.balign 4
 early_recursion_flag:
 	.long 0
@@ -374,11 +396,10 @@ early_idt_msg:
 early_idt_ripmsg:
 	.asciz "RIP %s\n"
 #endif /* CONFIG_EARLY_PRINTK */
-	.previous
 
 #define NEXT_PAGE(name) \
 	.balign	PAGE_SIZE; \
-ENTRY(name)
+GLOBAL(name)
 
 /* Automate the creation of 1 to 1 mapping pmd entries */
 #define PMDS(START, PERM, COUNT)			\
@@ -388,46 +409,21 @@ ENTRY(name)
 	i = i + 1 ;					\
 	.endr
 
-	.data
-	/*
-	 * This default setting generates an ident mapping at address 0x100000
-	 * and a mapping for the kernel that precisely maps virtual address
-	 * 0xffffffff80000000 to physical address 0x000000. (always using
-	 * 2Mbyte large pages provided by PAE mode)
-	 */
-NEXT_PAGE(init_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_PAGE_OFFSET*8, 0
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_START_KERNEL*8, 0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	__INITDATA
+NEXT_PAGE(early_level4_pgt)
+	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level3_ident_pgt)
-	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	511,8,0
+NEXT_PAGE(early_dynamic_pgts)
+	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
 
+	.data
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
 	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
-	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
-
-NEXT_PAGE(level1_fixmap_pgt)
-	.fill	512,8,0
-
-NEXT_PAGE(level2_ident_pgt)
-	/* Since I easily can, map the first 1G.
-	 * Don't set NX because code runs from these pages.
-	 */
-	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
-
 NEXT_PAGE(level2_kernel_pgt)
 	/*
 	 * 512 MB kernel mapping. We spend a full page on this pagetable
@@ -442,11 +438,16 @@ NEXT_PAGE(level2_kernel_pgt)
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
-NEXT_PAGE(level2_spare_pgt)
-	.fill   512, 8, 0
+NEXT_PAGE(level2_fixmap_pgt)
+	.fill	506,8,0
+	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
+	.fill	5,8,0
+
+NEXT_PAGE(level1_fixmap_pgt)
+	.fill	512,8,0
 
 #undef PMDS
-#undef NEXT_PAGE
 
 	.data
 	.align 16
@@ -472,6 +473,7 @@ ENTRY(nmi_idt_table)
 	.skip IDT_ENTRIES * 16
 
 	__PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
-ENTRY(empty_zero_page)
+NEXT_PAGE(empty_zero_page)
+	.skip PAGE_SIZE
+NEXT_PAGE(init_level4_pgt)
 	.skip PAGE_SIZE
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -659,6 +659,24 @@ early_param("reservelow", parse_reservel
  * Note: On x86_64, fixmaps are ready for use even before this is called.
  */
 
+void __init setup_real_mode_reserve(void);
+
+unsigned long __meminit
+kernel_physical_mapping_init(unsigned long start,
+			     unsigned long end,
+			     unsigned long page_size_mask);
+
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+static void init_mapping_kernel(void)
+{
+	init_level4_pgt[511] = early_level4_pgt[511];
+	early_alloc_pgt_buf();
+	kernel_physical_mapping_init(0, ISA_END_ADDRESS, 1<<PG_LEVEL_2M);
+	kernel_physical_mapping_init(round_down(__pa_symbol(_text), PMD_SIZE),
+				     round_up(__pa_symbol(_end) - 1, PMD_SIZE),
+					 1<<PG_LEVEL_2M);
+}
+
 void __init setup_arch(char **cmdline_p)
 {
 #ifdef CONFIG_X86_32
@@ -685,6 +703,10 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	olpc_ofw_detect();
 
+#ifdef CONFIG_X86_64
+	init_mapping_kernel();
+#endif
+
 	early_trap_init();
 	early_cpu_init();
 	early_ioremap_init();
@@ -882,7 +904,9 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_ibft_region();
 
+#ifndef CONFIG_X86_64
 	early_alloc_pgt_buf();
+#endif
 
 	/*
 	 * Need to conclude brk, before memblock_x86_fill()
@@ -915,6 +939,9 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_real_mode();
 
+#ifdef CONFIG_X86_64
+	write_cr3(__pa(init_level4_pgt));
+#endif
 	init_mem_mapping();
 
 	setup_real_mode();
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -75,8 +75,8 @@ __ref void *alloc_low_pages(unsigned int
 	return __va(pfn << PAGE_SHIFT);
 }
 
-/* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
-#define INIT_PGT_BUF_SIZE	(5 * PAGE_SIZE)
+/* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS, 4 4k for kernel */
+#define INIT_PGT_BUF_SIZE	(9 * PAGE_SIZE)
 RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE);
 void  __init early_alloc_pgt_buf(void)
 {

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-14 20:08                                                         ` Yinghai Lu
  2012-12-14 20:14                                                           ` Yinghai Lu
@ 2012-12-15  7:57                                                           ` Yinghai Lu
  2012-12-15 19:30                                                             ` H. Peter Anvin
  1 sibling, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-15  7:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx, hpa,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 1045 bytes --]

On Fri, Dec 14, 2012 at 12:08 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Dec 14, 2012 at 12:04 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, Dec 14, 2012 at 11:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> I suspect we don't need init_level4_pgt at all and should just plan to
>>> get rid of it.  Is there any reason we can't just build the proper
>>> kernel page table set in pagetable_init() and switch to it there?
>>
>> then how to pass the info to AP?
>
> also we should merge early_level4_pgt with init_level4_pgt.
>
> and #PE handler could just extend to use BRK ...
>
> but need to make sure BRK get mapped at first, and BRK could cross the
> 1G, 512G boundary ...
>
> that could make things less impact to all.

I tailored your patch to map in 2M increments and used it to replace the
ioremap patch:

   [PATCH v6 12/27] x86: use io_remap to access real_mode_data

It extends init_level4_pgt to map the extra range, which limits the
impact on other code.

Please check whether that is OK with you.

Thanks

Yinghai

[-- Attachment #2: limit_pf_handler.patch --]
[-- Type: application/octet-stream, Size: 3785 bytes --]

Subject: [PATCH] x86, 64bit: use #PF handler to setup page table for data

We need to access data areas that are not mapped in arch/x86/kernel/head_64.S
in two cases:
a. loading microcode from the ramdisk
b. when zero_page and command_line are loaded high, above 1G.

With this patch, we do not need to call ioremap_init ahead of time.

The pgt buffer comes from BRK, and we have enough space there.

Also, init_mem_mapping will later reuse those page tables.

This patch is mostly from HPA.
Changes from Yinghai:
1. use it with BRK
2. only map 2M at a time, because zero_page and the command line are very small;
   the microcode should be small too (128k?),
   and this avoids mapping a possible memory hole.
3. make it work with kexec when phys_base is not zero.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/head64.c  |   54 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/head_64.S |   13 ++++++++---
 2 files changed, 64 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head64.c
+++ linux-2.6/arch/x86/kernel/head64.c
@@ -26,6 +26,60 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	unsigned long i;
+	pgdval_t pgd, *pgd_p;
+	pudval_t pud, *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	if (address < __PAGE_OFFSET || physaddr >= MAXMEM)
+		return -1;	/* Invalid address - puke */
+
+	pgd_p = &init_level4_pgt[pgd_index(address)].pgd;
+	pgd = *pgd_p;
+
+	/*
+	 * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+	 * critical -- __PAGE_OFFSET would point us back into the dynamic
+	 * range and we might end up looping forever...
+	 */
+	if (pgd)
+		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
+	else {
+		if ((char *)(_brk_end + PAGE_SIZE) > __brk_limit)
+			return -1;
+		pud_p = (pudval_t *)_brk_end;
+		_brk_end += PAGE_SIZE;
+
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			pud_p[i] = 0;
+		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+	}
+	pud_p += pud_index(address);
+	pud = *pud_p;
+
+	if (pud)
+		pmd_p = (pmdval_t *)((pud & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
+	else {
+		if ((char *)(_brk_end + PAGE_SIZE) > __brk_limit)
+			return -1;
+		pmd_p = (pmdval_t *)_brk_end;
+		_brk_end += PAGE_SIZE;
+
+		for (i = 0; i < PTRS_PER_PMD; i++)
+			pmd_p[i] = 0;
+		*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+	}
+	pmd = (physaddr & PMD_MASK) + __PAGE_KERNEL_LARGE;
+	pmd_p[pmd_index(address)] = pmd;
+
+	return 0;
+}
+
+
 static void __init zap_identity_mappings(void)
 {
 	pgd_t *pgd = pgd_offset_k(0UL);
Index: linux-2.6/arch/x86/kernel/head_64.S
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head_64.S
+++ linux-2.6/arch/x86/kernel/head_64.S
@@ -494,14 +494,21 @@ ENTRY(early_idt_handler)
 	pushq %r11		#  0(%rsp)
 
 	cmpl $__KERNEL_CS,96(%rsp)
-	jne 10f
+	jne 11f
 
+	cmpl $14,72(%rsp)       # Page fault?
+	jnz 10f
+	GET_CR2_INTO(%rdi)      # can clobber any volatile register if pv
+	call early_make_pgtable
+	andl %eax,%eax
+	jz 20f                  # All good
+10:
 	leaq 88(%rsp),%rdi	# Pointer to %rip
 	call early_fixup_exception
 	andl %eax,%eax
 	jnz 20f			# Found an exception entry
 
-10:
+11:
 #ifdef CONFIG_EARLY_PRINTK
 	GET_CR2_INTO(%r9)	# can clobber any volatile register if pv
 	movl 80(%rsp),%r8d	# error code
@@ -523,7 +530,7 @@ ENTRY(early_idt_handler)
 1:	hlt
 	jmp 1b
 
-20:	# Exception table entry found
+20:	# Exception table entry found or page table generated.
 	popq %r11
 	popq %r10
 	popq %r9

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15  7:57                                                           ` Yinghai Lu
@ 2012-12-15 19:30                                                             ` H. Peter Anvin
  2012-12-15 20:55                                                               ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-15 19:30 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/14/2012 11:57 PM, Yinghai Lu wrote:
> 
> I tailored your patch and made use 2M page increase to replace patch
> ioremap function.
> 
>    [PATCH v6 12/27] x86: use io_remap to access real_mode_data
> 
> and it will extend init_level4_pgt to map extra range. that will limit
> affect to even others.
> 
> please check if that is ok to you.
> 

What is the point of only managing 2M at a time?  Now you have to have
more conditionals and you don't get any more memory efficiency.

Filling arbitrarily into the brk is not acceptable... the brk is an O(1)
area and all brk allocations need to be reserved at compile time, so the
overflow handling is still necessary.

So no, this patch is not acceptable.

	-hpa
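
The constraint described above can be illustrated with a minimal userspace
sketch. It is not kernel code: brk_area, BRK_PAGES and brk_alloc_page() are
hypothetical names standing in for a region whose size is fixed at build
time, so an allocator that draws from it on demand has to handle exhaustion
explicitly.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u
#define BRK_PAGES        4u	/* hypothetical compile-time reservation */

/* Fixed-size area: it cannot grow at run time. */
static unsigned char brk_area[BRK_PAGES * SKETCH_PAGE_SIZE];
static size_t brk_used;

/*
 * Hand out one zeroed page from the fixed reservation.
 * Returns NULL once the reservation is exhausted; the caller must
 * have a fallback, because no more space can be obtained.
 */
void *brk_alloc_page(void)
{
	unsigned char *p;
	size_t i;

	if (brk_used + SKETCH_PAGE_SIZE > sizeof(brk_area))
		return NULL;

	p = brk_area + brk_used;
	brk_used += SKETCH_PAGE_SIZE;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		p[i] = 0;

	return p;
}

/* Number of pages still available in the reservation. */
size_t brk_pages_left(void)
{
	return (sizeof(brk_area) - brk_used) / SKETCH_PAGE_SIZE;
}
```

With BRK_PAGES set to 4, the fifth call to brk_alloc_page() returns NULL.
A demand-driven user such as a fault handler cannot bound its consumption
at build time, which is why the overflow path matters.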



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15 19:30                                                             ` H. Peter Anvin
@ 2012-12-15 20:55                                                               ` Yinghai Lu
  2012-12-15 21:31                                                                 ` H. Peter Anvin
  2012-12-15 21:40                                                                 ` H. Peter Anvin
  0 siblings, 2 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-15 20:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Sat, Dec 15, 2012 at 11:30 AM, H. Peter Anvin <hpa@linux.intel.com> wrote:
> What is the point of only managing 2M at a time?  Now you have to have
> more conditionals and you don't get any more memory efficiency.

We don't need to, because real_data is less than 2M, and ramdisk is about 16M.

Also, if the mapping is too large, it could cover the memory hole near
1T on AMD HT systems.

>
> Filling arbitrarily into the brk is not acceptable... the brk is an O(1)
> area and all brk allocations need to be reserved at compile time, so the
> overflow handling is still necessary.

If we run out of BRK, we will panic, because early_make_pgtable will return -1.

And the current BRK already has 64 slop space.

BTW, did you look at the SMP boot problem with the early_level4_pgt version?

Yinghai

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15 20:55                                                               ` Yinghai Lu
@ 2012-12-15 21:31                                                                 ` H. Peter Anvin
  2012-12-15 21:40                                                                 ` H. Peter Anvin
  1 sibling, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-15 21:31 UTC (permalink / raw)
  To: Yinghai Lu, H. Peter Anvin
  Cc: Borislav Petkov, Yu, Fenghua, mingo, linux-kernel, tglx,
	linux-tip-commits, Konrad Rzeszutek Wilk, Stefano Stabellini

The mem hole at 1T should not be marked cachable in the MTRRs.

Yinghai Lu <yinghai@kernel.org> wrote:

>On Sat, Dec 15, 2012 at 11:30 AM, H. Peter Anvin <hpa@linux.intel.com>
>wrote:
>> What is the point of only managing 2M at a time?  Now you have to
>have
>> more conditionals and you don't get any more memory efficiency.
>
>We don't need to, because real_data is less than 2M, and ramdisk is
>about 16M.
>
>Also if we set map too large, could have chance to cover mem hole near
>1T for AMD HT system.
>
>>
>> Filling arbitrarily into the brk is not acceptable... the brk is an
>O(1)
>> area and all brk allocations need to be reserved at compile time, so
>the
>> overflow handling is still necessary.
>
>if run out of BRK, we will get panic, because early_make_pgtable will
>return -1.
>
>and current BRK already have 64 slop space.
>
>BTW, did you look at smp boot problem with early_level4_pgt version?
>
>Yinghai

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15 20:55                                                               ` Yinghai Lu
  2012-12-15 21:31                                                                 ` H. Peter Anvin
@ 2012-12-15 21:40                                                                 ` H. Peter Anvin
  2012-12-15 22:13                                                                   ` Yinghai Lu
  2012-12-16  2:09                                                                   ` Yinghai Lu
  1 sibling, 2 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-15 21:40 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/15/2012 12:55 PM, Yinghai Lu wrote:
> On Sat, Dec 15, 2012 at 11:30 AM, H. Peter Anvin <hpa@linux.intel.com> wrote:
>> What is the point of only managing 2M at a time?  Now you have to have
>> more conditionals and you don't get any more memory efficiency.
>
> We don't need to, because real_data is less than 2M, and ramdisk is about 16M.
>

In other words, you make magic assumptions (some of which are very wrong 
in many real-life scenarios -- people can and do use gigabyte-plus 
initramfs).  That is exactly the wrong thing to do.  Furthermore it 
doesn't buy you anything, because you still have to allocate the PMDs.

> Also if we set map too large, could have chance to cover mem hole near
> 1T for AMD HT system.

Again, should not be cachable in the MTRRs, and even so, is 1G aligned 
already.

>> Filling arbitrarily into the brk is not acceptable... the brk is an O(1)
>> area and all brk allocations need to be reserved at compile time, so the
>> overflow handling is still necessary.
>
> if run out of BRK, we will get panic, because early_make_pgtable will return -1.

And you consider that panic an acceptable failure mode????

> and current BRK already have 64 slop space.
>
> BTW, did you look at smp boot problem with early_level4_pgt version?

No, I have been busy with non-Linux stuff today.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
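
The point about PMD allocation can be checked with a little arithmetic. The
following is a sketch under stated assumptions, not code from either patch:
one PMD page (4k) holds 512 entries of 2M each and so spans 1G, and the
helper names are hypothetical. It counts how many faults and how many PMD
pages are needed to cover a region when each fault maps either 2M or 1G.

```c
#include <stdint.h>

#define SKETCH_SZ_2M (1ULL << 21)
#define SKETCH_SZ_1G (1ULL << 30)

/* Number of gran-aligned chunks touched by [start, start + len). */
uint64_t chunks_touched(uint64_t start, uint64_t len, uint64_t gran)
{
	if (len == 0)
		return 0;
	return (start + len - 1) / gran - start / gran + 1;
}

/* Each fault maps one aligned chunk of map_gran bytes. */
uint64_t faults_needed(uint64_t start, uint64_t len, uint64_t map_gran)
{
	return chunks_touched(start, len, map_gran);
}

/*
 * A PMD page spans 1G no matter how many of its 512 entries are
 * filled, so the page count depends only on the 1G chunks touched.
 */
uint64_t pmd_pages_needed(uint64_t start, uint64_t len)
{
	return chunks_touched(start, len, SKETCH_SZ_1G);
}
```

For a 1G region starting at 4G, mapping 2M per fault takes 512 faults and
mapping 1G per fault takes one, yet both need exactly one PMD page. The
smaller granularity saves no page-table memory.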


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15 21:40                                                                 ` H. Peter Anvin
@ 2012-12-15 22:13                                                                   ` Yinghai Lu
  2012-12-15 22:17                                                                     ` H. Peter Anvin
  2012-12-16  2:09                                                                   ` Yinghai Lu
  1 sibling, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-15 22:13 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Sat, Dec 15, 2012 at 1:40 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/15/2012 12:55 PM, Yinghai Lu wrote:
>> Also if we set map too large, could have chance to cover mem hole near
>> 1T for AMD HT system.
>
>
> Again, should not be cachable in the MTRRs, and even so, is 1G aligned
> already.

AMD systems can have all memory between TOLM and TOHM as WB, without
needing MTRR entries for it.

Also, your switchover change handles crossing the 1G and 512G boundaries,
so it is not 1G aligned.
For example, if the kernel is at 4095G+512M, it will map from 4095G+512M to
4096G + 512M.
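
The example above can be worked through with the standard x86-64 4-level
paging shifts (21 for PMD, 30 for PUD, 39 for PGD, 512 entries per table).
This is an illustrative sketch only; the sk_ helpers are hypothetical and
operate on the physical addresses as if they were identity-mapped.

```c
#include <stdint.h>

#define SK_PMD_SHIFT	21	/* one PMD entry maps 2M   */
#define SK_PUD_SHIFT	30	/* one PUD entry maps 1G   */
#define SK_PGDIR_SHIFT	39	/* one PGD entry maps 512G */

/* Index into a 512-entry table at the given level. */
static unsigned int sk_index(uint64_t addr, unsigned int shift)
{
	return (unsigned int)((addr >> shift) & 511);
}

unsigned int sk_pgd_index(uint64_t addr)
{
	return sk_index(addr, SK_PGDIR_SHIFT);
}

unsigned int sk_pud_index(uint64_t addr)
{
	return sk_index(addr, SK_PUD_SHIFT);
}

unsigned int sk_pmd_index(uint64_t addr)
{
	return sk_index(addr, SK_PMD_SHIFT);
}

/* Does [start, end) span more than one unit of size 1 << shift? */
int sk_crosses(uint64_t start, uint64_t end, unsigned int shift)
{
	return (start >> shift) != ((end - 1) >> shift);
}
```

For start = 4095G+512M and end = 4096G+512M, the first byte has PGD index 7
and PUD index 511, while the last byte has PGD index 8 and PUD index 0. The
range therefore crosses both a 1G and a 512G boundary and needs entries in
two PUD pages and two PMD pages.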

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15 22:13                                                                   ` Yinghai Lu
@ 2012-12-15 22:17                                                                     ` H. Peter Anvin
  2012-12-15 23:15                                                                       ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-15 22:17 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/15/2012 02:13 PM, Yinghai Lu wrote:
>
> AMD system could have all mem between TOLM and TOHM all WB, and don
> need to set them in MTRRs entries.
>

I include the TOM2 mechanism in the overall umbrella of MTRRs for this 
purpose.

> and also your switchover change that handle cross 1G, and 512g, and it
> is not 1G aligned.
> for example, if kernel at 4095G+512M, it will map from 4095G+512M to
> 4096G + 512M.

That is for the kernel region itself (that code is actually unchanged 
from the current code), and yes, we could cap that one to _end if there 
are systems which have bugs in that area.  The dynamic page tables map 
1G aligned at a time.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15 22:17                                                                     ` H. Peter Anvin
@ 2012-12-15 23:15                                                                       ` Yinghai Lu
  2012-12-15 23:17                                                                         ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-15 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Sat, Dec 15, 2012 at 2:17 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/15/2012 02:13 PM, Yinghai Lu wrote:
>>
>>
>> AMD system could have all mem between TOLM and TOHM all WB, and don
>> need to set them in MTRRs entries.
>>
>
> I include the TOM2 mechanism in the overall umbrella of MTRRs for this
> purpose.
>
>
>> and also your switchover change that handle cross 1G, and 512g, and it
>> is not 1G aligned.
>> for example, if kernel at 4095G+512M, it will map from 4095G+512M to
>> 4096G + 512M.
>
>
> That is for the kernel region itself (that code is actually unchanged from
> the current code), and yes, we could cap that one to _end if there are
> systems which have bugs in that area.  The dynamic page tables map 1G
> aligned at a time.

The dynamic mapping should be 2M too.

AMD system:

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf

 BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
 BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
 BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable

The hole is not 1G aligned.

Or does the HT region start at 0xe040000000?
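
The alignment claim can be checked against the e820 lines quoted above. This
is a sketch with hypothetical helper names: it computes the aligned window
that a fault on the last usable byte (0xe037ffffff) would map at 1G and at
2M granularity, and tests whether that window overlaps the reserved range.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZ_2M (1ULL << 21)
#define SKETCH_SZ_1G (1ULL << 30)

struct sk_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Aligned window of the given power-of-two size containing addr. */
struct sk_range sk_window_around(uint64_t addr, uint64_t size)
{
	struct sk_range r;

	r.start = addr & ~(size - 1);
	r.end = r.start + size;
	return r;
}

/* True if the two half-open ranges share at least one byte. */
bool sk_ranges_overlap(struct sk_range a, struct sk_range b)
{
	return a.start < b.end && b.start < a.end;
}
```

With the reserved range taken as [0xe038000000, 0xfd00000000), the 1G window
around 0xe037ffffff is [0xe000000000, 0xe040000000) and overlaps it, while
the 2M window is [0xe037e00000, 0xe038000000) and does not.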

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15 23:15                                                                       ` Yinghai Lu
@ 2012-12-15 23:17                                                                         ` H. Peter Anvin
  2012-12-19 20:37                                                                           ` Borislav Petkov
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-15 23:17 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/15/2012 03:15 PM, Yinghai Lu wrote:
>>
>> That is for the kernel region itself (that code is actually unchanged from
>> the current code), and yes, we could cap that one to _end if there are
>> systems which have bugs in that area.  The dynamic page tables map 1G
>> aligned at a time.
>
> dynamic should be 2M too.
>
> AMD system:
>
> http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf
>
>   BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
>   BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
>   BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
>
> the hole is not 1G aligned.
>
> or HT region is from e040000000 ?
>

The HT region starts at 0xfd00000000 -- after that reserved region, so I 
have no idea what that particular system is trying to do or what its 
requirements are (nor what its MTRR setup is, since you didn't post it.)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15 21:40                                                                 ` H. Peter Anvin
  2012-12-15 22:13                                                                   ` Yinghai Lu
@ 2012-12-16  2:09                                                                   ` Yinghai Lu
  2012-12-16  5:17                                                                     ` Yinghai Lu
  1 sibling, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-16  2:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Sat, Dec 15, 2012 at 1:40 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/15/2012 12:55 PM, Yinghai Lu wrote:
>>
>> BTW, did you look at smp boot problem with early_level4_pgt version?
>
>
> No, I have been busy with non-Linux stuff today.
>

OK, I sorted it out. I will split it into small pieces and post them.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-16  2:09                                                                   ` Yinghai Lu
@ 2012-12-16  5:17                                                                     ` Yinghai Lu
  2012-12-16  8:50                                                                       ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-16  5:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 583 bytes --]

On Sat, Dec 15, 2012 at 6:09 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sat, Dec 15, 2012 at 1:40 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 12/15/2012 12:55 PM, Yinghai Lu wrote:
>>>
>>> BTW, did you look at smp boot problem with early_level4_pgt version?
>>
>>
>> No, I have been busy with non-Linux stuff today.
>>
>
> ok, i sorted it out. I will split it to small pieces and post them.

I updated the for-x86-boot branch with it; it is based on:
linus:master
tip:x86/mm
tip:x86/urgent
tip:x86/mm2.

Also attached are the 7 new patches that were just added to that branch.

Thanks

Yinghai

[-- Attachment #2: 0003-x86-call-copy_bootdata-early.patch --]
[-- Type: application/octet-stream, Size: 1423 bytes --]

From bc84a72637cfe9c0f5e3d7f8f4aaad6083d6eb43 Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yinghai@kernel.org>
Date: Sat, 15 Dec 2012 20:59:07 -0800
Subject: [PATCH v7 03/29] x86: call copy_bootdata early

real_mode_data, aka zero_page, could be above 4G.
We will have a #PF handler to set up page tables for RAM that is not yet
accessible early on, but it will only be active before
x86_64_start_reservations.

We will also need the ramdisk info to access the microcode blob in the
ramdisk in x86_64_start_kernel.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head64.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 7b215a5..c0a25e0 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -87,6 +87,8 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	}
 	load_idt((const struct desc_ptr *)&idt_descr);
 
+	copy_bootdata(__va(real_mode_data));
+
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
@@ -95,7 +97,9 @@ void __init x86_64_start_kernel(char * real_mode_data)
 
 void __init x86_64_start_reservations(char *real_mode_data)
 {
-	copy_bootdata(__va(real_mode_data));
+	/* version is always non-zero once boot_params has been copied */
+	if (!boot_params.hdr.version)
+		copy_bootdata(__va(real_mode_data));
 
 	memblock_reserve(__pa_symbol(_text),
 			 (unsigned long)__bss_stop - (unsigned long)_text);
-- 
1.7.10.4


[-- Attachment #3: 0004-x86-mm-add-early-kernel-mapping-in-c.patch --]
[-- Type: application/octet-stream, Size: 3015 bytes --]

From 3c2d2c6fc8157bbe54b1273eb8c29f26d6e2afb6 Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yinghai@kernel.org>
Date: Sat, 15 Dec 2012 20:59:07 -0800
Subject: [PATCH v7 04/29] x86, mm: add early kernel mapping in c

This is usually done in arch/x86/kernel/head_64.S. With the #PF handler
approach, we can, and have to, move the kernel mapping init later.

That lets us make a smooth transition to init_mem_mapping from the
BRK/top-down approach.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/init.h |    2 ++
 arch/x86/mm/init_64.c       |   77 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index bac770b..8d0687a 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -1,5 +1,7 @@
 #ifndef _ASM_X86_INIT_H
 #define _ASM_X86_INIT_H
 
+int kernel_ident_mapping_init(pgd_t *pgd_page,
+				unsigned long addr, unsigned long end);
 
 #endif /* _ASM_X86_INIT_H */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b1178eb..4f5f9f7 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -56,6 +56,83 @@
 
 #include "mm_internal.h"
 
+static void __init ident_pmd_init(pmd_t *pmd_page,
+			  unsigned long addr, unsigned long end)
+{
+	addr &= PMD_MASK;
+	for (; addr < end; addr += PMD_SIZE) {
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+
+		if (!pmd_present(*pmd))
+			set_pmd(pmd, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
+	}
+}
+static int __init ident_pud_init(pud_t *pud_page,
+			  unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+
+	for (; addr < end; addr = next) {
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pud_present(*pud)) {
+			pmd = pmd_offset(pud, 0);
+			ident_pmd_init(pmd, addr, next);
+			continue;
+		}
+		pmd = (pmd_t *)alloc_low_page();
+		if (!pmd)
+			return -ENOMEM;
+		clear_page(pmd);
+		ident_pmd_init(pmd, addr, next);
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+int __init kernel_ident_mapping_init(pgd_t *pgd_page,
+				unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	int result;
+
+	addr = (unsigned long)__va(addr);
+	end = (unsigned long)__va(end);
+
+	for (; addr < end; addr = next) {
+		pgd_t *pgd = pgd_page + pgd_index(addr);
+		pud_t *pud;
+
+		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, 0);
+			result = ident_pud_init(pud, __pa(addr), __pa(next));
+			if (result)
+				return result;
+			continue;
+		}
+
+		pud = (pud_t *)alloc_low_page();
+		if (!pud)
+			return -ENOMEM;
+		clear_page(pud);
+		result = ident_pud_init(pud, __pa(addr), __pa(next));
+		if (result)
+			return result;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
 static int __init parse_direct_gbpages_off(char *arg)
 {
 	direct_gbpages = 0;
-- 
1.7.10.4


[-- Attachment #4: 0005-x86-realmode-use-init_level4_pgt-to-set-trapmoline_p.patch --]
[-- Type: application/octet-stream, Size: 1113 bytes --]

From b80f97f3bdec38a9bd21167374a405cae52c79fc Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yinghai@kernel.org>
Date: Sat, 15 Dec 2012 20:59:07 -0800
Subject: [PATCH v7 05/29] x86, realmode: use init_level4_pgt to set
 trampoline_pgd directly

With the #PF handler approach to setting up early page tables,
level3_ident_pgt will go away.

So just use the entries in init_level4_pgt to set them.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/realmode/init.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 8045026..9547652 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -78,8 +78,8 @@ void __init setup_real_mode(void)
 	*trampoline_cr4_features = read_cr4();
 
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
-	trampoline_pgd[0] = __pa_symbol(level3_ident_pgt) + _KERNPG_TABLE;
-	trampoline_pgd[511] = __pa_symbol(level3_kernel_pgt) + _KERNPG_TABLE;
+	trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
+	trampoline_pgd[511] = init_level4_pgt[511].pgd;
 #endif
 }
 
-- 
1.7.10.4


[-- Attachment #5: 0006-x86-mm-increase-BRK-area-for-early-page-table.patch --]
[-- Type: application/octet-stream, Size: 1346 bytes --]

From b09e392aa24581550d943770a4efb003e9b223f5 Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yinghai@kernel.org>
Date: Sat, 15 Dec 2012 20:59:07 -0800
Subject: [PATCH v7 06/29] x86, mm: increase BRK area for early page table

The kernel ident page tables will come from BRK too, so we need to increase
the reservation.

Also add a check for whether the early_alloc buffer is already allocated, so
we can avoid #ifdefs in different paths such as Xen or non-Xen.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index c4293cf..3dd77fd 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -75,14 +75,18 @@ __ref void *alloc_low_pages(unsigned int num)
 	return __va(pfn << PAGE_SHIFT);
 }
 
-/* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
-#define INIT_PGT_BUF_SIZE	(5 * PAGE_SIZE)
+/* need 4 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS, 4 4k for kernel */
+#define INIT_PGT_BUF_SIZE	(9 * PAGE_SIZE)
 RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE);
 void  __init early_alloc_pgt_buf(void)
 {
 	unsigned long tables = INIT_PGT_BUF_SIZE;
 	phys_addr_t base;
 
+	/* already set ? */
+	if (pgt_buf_end)
+		return;
+
 	base = __pa(extend_brk(tables, PAGE_SIZE));
 
 	pgt_buf_start = base >> PAGE_SHIFT;
-- 
1.7.10.4


[-- Attachment #6: 0007-x86-64bit-early-PF-handler-set-page-table.patch --]
[-- Type: application/octet-stream, Size: 16079 bytes --]

From 3ade1023149699942578b7d190c5f661fa3bdd14 Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Sat, 15 Dec 2012 20:59:07 -0800
Subject: [PATCH v7 07/29] x86, 64bit: early #PF handler set page table

Two use cases:
1. We will support loading and running the kernel above 4G, and zero_page
   and the ramdisk will be above 4G, too.
2. We need to access the ramdisk early to get the microcode and apply the
   update as early as possible.

We could use early_iomap to access them, but that would make the code too
messy and hard to unify with 32-bit.

So use a #PF handler to set up the page tables.

When a #PF happens, the handler uses temporary pages to build page tables
that cover the accessed page.

The code and pages are in __INIT sections, so they will not increase memory
usage.

Also, with the help of the #PF handler, we can use BRK to set up the kernel
mapping first in C, before switching to init_level4_pgt; that could avoid
100 lines of code in arch/x86/kernel/head_64.S.

The switchover uses only three pages to handle the kernel crossing the 1G
and 512G boundaries, with page sharing; this is the most interesting part.

early_make_pgtable uses the kernel high mapping address to access the pages
used to set up the page table.

-v4: Add phys_base offset to make kexec happy, and add
	init_mapping_kernel()   - Yinghai

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/pgtable_64_types.h |    4 +
 arch/x86/kernel/head64.c                |   91 +++++++++++++--
 arch/x86/kernel/head_64.S               |  186 ++++++++++++++++---------------
 3 files changed, 182 insertions(+), 99 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..2d88344 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include <asm/sparsemem.h>
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END      _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES	64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index c0a25e0..4fabebc 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -14,6 +14,7 @@
 #include <linux/io.h>
 #include <linux/memblock.h>
 
+#include <asm/init.h>
 #include <asm/processor.h>
 #include <asm/proto.h>
 #include <asm/smp.h>
@@ -26,11 +27,73 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2, early_pgt_resets = 0;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-	pgd_t *pgd = pgd_offset_k(0UL);
-	pgd_clear(pgd);
-	__flush_tlb_all();
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PGD-1; i++)
+		early_level4_pgt[i].pgd = 0;
+
+	next_early_pgt = 0;
+	early_pgt_resets++;
+
+	__native_flush_tlb();
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	unsigned long i;
+	pgdval_t pgd, *pgd_p;
+	pudval_t *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	if (physaddr >= MAXMEM)
+		return -1;	/* Invalid address - puke */
+
+	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+	pgd_p = &early_level4_pgt[i].pgd;
+	pgd = *pgd_p;
+
+	/*
+	 * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+	 * critical -- __PAGE_OFFSET would point us back into the dynamic
+	 * range and we might end up looping forever...
+	 */
+	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
+	} else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+			reset_early_page_tables();
+
+		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			pud_p[i] = 0;
+
+		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+	}
+	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+	pud_p += i;
+
+	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_p[i] = pmd;
+		pmd += PMD_SIZE;
+	}
+
+	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+
+	return 0;
 }
 
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
@@ -52,6 +115,17 @@ static void __init copy_bootdata(char *real_mode_data)
 	}
 }
 
+static void __init init_mapping_kernel(void)
+{
+	init_level4_pgt[511] = early_level4_pgt[511];
+	early_alloc_pgt_buf();
+	kernel_ident_mapping_init(init_level4_pgt, 0, PMD_SIZE);
+	kernel_ident_mapping_init(init_level4_pgt,
+				round_down(__pa_symbol(_text), PMD_SIZE),
+				round_up(__pa_symbol(_end) - 1, PMD_SIZE));
+	write_cr3(__pa(init_level4_pgt));
+}
+
 void __init x86_64_start_kernel(char * real_mode_data)
 {
 	int i;
@@ -70,12 +144,13 @@ void __init x86_64_start_kernel(char * real_mode_data)
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+	/* Kill off the identity-map trampoline */
+	reset_early_page_tables();
+
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* Make NULL pointers segfault */
-	zap_identity_mappings();
-
+	/* XXX - this is wrong... we need to build page tables from scratch */
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
@@ -92,6 +167,8 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
+	init_mapping_kernel();
+
 	x86_64_start_reservations(real_mode_data);
 }
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 980053c..5261f37 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -47,14 +47,13 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map)
 	.code64
 	.globl startup_64
 startup_64:
-
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded an identity mapped page table
 	 * for us.  These identity mapped page tables map all of the
 	 * kernel pages and possibly all of memory.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either directly from a 64bit bootloader, or from
 	 * arch/x86_64/boot/compressed/head.S.
@@ -66,7 +65,8 @@ startup_64:
 	 * tables and then reload them.
 	 */
 
-	/* Compute the delta between the address I am compiled to run at and the
+	/*
+	 * Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
 	leaq	_text(%rip), %rbp
@@ -78,53 +78,66 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
-	/* Fixup the physical addresses in the page table
+	/*
+	 * Is the address too large?
 	 */
-	addq	%rbp, init_level4_pgt + 0(%rip)
-	addq	%rbp, init_level4_pgt + (L4_PAGE_OFFSET*8)(%rip)
-	addq	%rbp, init_level4_pgt + (L4_START_KERNEL*8)(%rip)
+	leaq	_text(%rip), %rax
+	shrq	$MAX_PHYSMEM_BITS, %rax
+	jnz	bad_address
 
-	addq	%rbp, level3_ident_pgt + 0(%rip)
+	/*
+	 * Fixup the physical addresses in the page table
+	 */
+	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
 
 	addq	%rbp, level3_kernel_pgt + (510*8)(%rip)
 	addq	%rbp, level3_kernel_pgt + (511*8)(%rip)
 
 	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-	/* Add an Identity mapping if I am above 1G */
+	/*
+	 * Set up the identity mapping for the switchover.  These
+	 * entries should *NOT* have the global bit set!  This also
+	 * creates a bunch of nonsense entries but that is fine --
+	 * it avoids problems around wraparound.
+	 */
 	leaq	_text(%rip), %rdi
-	andq	$PMD_PAGE_MASK, %rdi
+	leaq	early_level4_pgt(%rip), %rbx
 
 	movq	%rdi, %rax
-	shrq	$PUD_SHIFT, %rax
-	andq	$(PTRS_PER_PUD - 1), %rax
-	jz	ident_complete
+	shrq	$PGDIR_SHIFT, %rax
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
-	leaq	level3_ident_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	leaq	(4096 + _KERNPG_TABLE)(%rbx), %rdx
+	movq	%rdx, 0(%rbx,%rax,8)
+	movq	%rdx, 8(%rbx,%rax,8)
 
+	addq	$4096, %rdx
 	movq	%rdi, %rax
-	shrq	$PMD_SHIFT, %rax
-	andq	$(PTRS_PER_PMD - 1), %rax
-	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
-	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
-ident_complete:
+	shrq	$PUD_SHIFT, %rax
+	andl	$(PTRS_PER_PUD-1), %eax
+	movq	%rdx, (4096+0)(%rbx,%rax,8)
+	movq	%rdx, (4096+8)(%rbx,%rax,8)
 
+	addq	$8192, %rbx
+	movq	%rdi, %rax
+	shrq	$PMD_SHIFT, %rdi
+	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+	movl	$PTRS_PER_PMD, %ecx
+
+1:
+	andq	$(PTRS_PER_PMD - 1), %rdi
+	movq	%rax, (%rbx,%rdi,8)
+	incq	%rdi
+	addq	$PMD_SIZE, %rax
+	decl	%ecx
+	jnz	1b
+	
 	/*
 	 * Fixup the kernel text+data virtual addresses. Note that
 	 * we might write invalid pmds, when the kernel is relocated
 	 * cleanup_highmap() fixes this up along with the mappings
 	 * beyond _end.
 	 */
-
 	leaq	level2_kernel_pgt(%rip), %rdi
 	leaq	4096(%rdi), %r8
 	/* See if it is a valid page table entry */
@@ -139,17 +152,14 @@ ident_complete:
 	/* Fixup phys_base */
 	addq	%rbp, phys_base(%rip)
 
-	/* Due to ENTRY(), sometimes the empty space gets filled with
-	 * zeros. Better take a jmp than relying on empty space being
-	 * filled with 0x90 (nop)
-	 */
-	jmp secondary_startup_64
+	movq	$(early_level4_pgt - __START_KERNEL_map), %rax
+	jmp 1f
 ENTRY(secondary_startup_64)
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded a mapped page table.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.
 	 *
 	 * We come here either from startup_64 (using physical addresses)
 	 * or from trampoline.S (using virtual addresses).
@@ -159,12 +169,14 @@ ENTRY(secondary_startup_64)
 	 * after the boot processor executes this code.
 	 */
 
+	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+1:
+
 	/* Enable PAE mode and PGE */
-	movl	$(X86_CR4_PAE | X86_CR4_PGE), %eax
-	movq	%rax, %cr4
+	movl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
+	movq	%rcx, %cr4
 
 	/* Setup early boot stage 4 level pagetables. */
-	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	addq	phys_base(%rip), %rax
 	movq	%rax, %cr3
 
@@ -196,7 +208,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr0
 
 	/* Setup a boot time stack */
-	movq stack_start(%rip),%rsp
+	movq stack_start(%rip), %rsp
 
 	/* zero EFLAGS after setting rsp */
 	pushq $0
@@ -236,21 +248,19 @@ ENTRY(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr	
 
-	/* esi is pointer to real mode structure with interesting info.
+	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
-	movl	%esi, %edi
+	movq	%rsi, %rdi
 	
 	/* Finally jump to run C code and to be on real kernel address
 	 * Since we are running on identity-mapped space we have to jump
 	 * to the full 64bit address, this is only possible as indirect
 	 * jump.  In addition we need to ensure %cs is set so we make this
-	 * a far return.
+	 * a far jump.
 	 */
-	movq	initial_code(%rip),%rax
 	pushq	$0		# fake return address to stop unwinder
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
+	/* gas 2.22 is buggy and mis-assembles ljmpq */
+	rex64 ljmp *initial_code(%rip)
 
 #ifdef CONFIG_HOTPLUG_CPU
 /*
@@ -270,13 +280,15 @@ ENDPROC(start_cpu0)
 
 	/* SMP bootup changes these two */
 	__REFDATA
-	.align	8
-	ENTRY(initial_code)
+	.balign	8
+	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
-	ENTRY(initial_gs)
+	.word	__KERNEL_CS
+	.balign	8
+	GLOBAL(initial_gs)
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 
-	ENTRY(stack_start)
+	GLOBAL(stack_start)
 	.quad  init_thread_union+THREAD_SIZE-8
 	.word  0
 	__FINITDATA
@@ -284,7 +296,7 @@ ENDPROC(start_cpu0)
 bad_address:
 	jmp bad_address
 
-	.section ".init.text","ax"
+	__INIT
 	.globl early_idt_handlers
 early_idt_handlers:
 	# 104(%rsp) %rflags
@@ -321,14 +333,22 @@ ENTRY(early_idt_handler)
 	pushq %r11		#  0(%rsp)
 
 	cmpl $__KERNEL_CS,96(%rsp)
-	jne 10f
+	jne 11f
 
+	cmpl $14,72(%rsp)	# Page fault?
+	jnz 10f
+	GET_CR2_INTO(%rdi)	# can clobber any volatile register if pv
+	call early_make_pgtable
+	andl %eax,%eax
+	jz 20f			# All good
+
+10:
 	leaq 88(%rsp),%rdi	# Pointer to %rip
 	call early_fixup_exception
 	andl %eax,%eax
 	jnz 20f			# Found an exception entry
 
-10:
+11:
 #ifdef CONFIG_EARLY_PRINTK
 	GET_CR2_INTO(%r9)	# can clobber any volatile register if pv
 	movl 80(%rsp),%r8d	# error code
@@ -350,7 +370,7 @@ ENTRY(early_idt_handler)
 1:	hlt
 	jmp 1b
 
-20:	# Exception table entry found
+20:	# Exception table entry found or page table generated
 	popq %r11
 	popq %r10
 	popq %r9
@@ -364,6 +384,8 @@ ENTRY(early_idt_handler)
 	decl early_recursion_flag(%rip)
 	INTERRUPT_RETURN
 
+	__INITDATA
+	
 	.balign 4
 early_recursion_flag:
 	.long 0
@@ -374,11 +396,10 @@ early_idt_msg:
 early_idt_ripmsg:
 	.asciz "RIP %s\n"
 #endif /* CONFIG_EARLY_PRINTK */
-	.previous
 
 #define NEXT_PAGE(name) \
 	.balign	PAGE_SIZE; \
-ENTRY(name)
+GLOBAL(name)
 
 /* Automate the creation of 1 to 1 mapping pmd entries */
 #define PMDS(START, PERM, COUNT)			\
@@ -388,46 +409,21 @@ ENTRY(name)
 	i = i + 1 ;					\
 	.endr
 
-	.data
-	/*
-	 * This default setting generates an ident mapping at address 0x100000
-	 * and a mapping for the kernel that precisely maps virtual address
-	 * 0xffffffff80000000 to physical address 0x000000. (always using
-	 * 2Mbyte large pages provided by PAE mode)
-	 */
-NEXT_PAGE(init_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_PAGE_OFFSET*8, 0
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.org	init_level4_pgt + L4_START_KERNEL*8, 0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	__INITDATA
+NEXT_PAGE(early_level4_pgt)
+	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level3_ident_pgt)
-	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	511,8,0
+NEXT_PAGE(early_dynamic_pgts)
+	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
 
+	.data
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
 	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
 
-NEXT_PAGE(level2_fixmap_pgt)
-	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
-	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
-	.fill	5,8,0
-
-NEXT_PAGE(level1_fixmap_pgt)
-	.fill	512,8,0
-
-NEXT_PAGE(level2_ident_pgt)
-	/* Since I easily can, map the first 1G.
-	 * Don't set NX because code runs from these pages.
-	 */
-	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
-
 NEXT_PAGE(level2_kernel_pgt)
 	/*
 	 * 512 MB kernel mapping. We spend a full page on this pagetable
@@ -442,11 +438,16 @@ NEXT_PAGE(level2_kernel_pgt)
 	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
 		KERNEL_IMAGE_SIZE/PMD_SIZE)
 
-NEXT_PAGE(level2_spare_pgt)
-	.fill   512, 8, 0
+NEXT_PAGE(level2_fixmap_pgt)
+	.fill	506,8,0
+	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
+	.fill	5,8,0
+
+NEXT_PAGE(level1_fixmap_pgt)
+	.fill	512,8,0
 
 #undef PMDS
-#undef NEXT_PAGE
 
 	.data
 	.align 16
@@ -472,6 +473,7 @@ ENTRY(nmi_idt_table)
 	.skip IDT_ENTRIES * 16
 
 	__PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
-ENTRY(empty_zero_page)
+NEXT_PAGE(empty_zero_page)
+	.skip PAGE_SIZE
+NEXT_PAGE(init_level4_pgt)
 	.skip PAGE_SIZE
-- 
1.7.10.4


[-- Attachment #7: 0008-x86-64bit-PF-handler-set-page-to-cover-2M-only.patch --]
[-- Type: application/octet-stream, Size: 3565 bytes --]

From f9be0ff96262be81a29e3fc382a3b29ad1a676b4 Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yinghai@kernel.org>
Date: Sat, 15 Dec 2012 20:59:08 -0800
Subject: [PATCH v7 08/29] x86, 64bit: #PF handler set page to cover 2M only

1. Check that the faulting address is in the kernel low mapping (at or above
   __PAGE_OFFSET); that is the only range we handle.
2. Map only one 2M page per fault instead of a whole 1G range.
3. Make the switchover mapping cover only the kernel image instead of 1G.
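
As a rough illustration of point 2 (not part of the patch), the userspace
sketch below shows how a faulting direct-map address selects a single 2M PMD
slot. The sketch_ names and SKETCH_PAGE_OFFSET are made up for the example,
the direct-map base is assumed to be 0xffff880000000000, and the flag set only
approximates __PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL:

```c
#include <stdint.h>

/* x86-64 4-level paging geometry (architectural values). */
#define PMD_SHIFT	21
#define PUD_SHIFT	30
#define PGDIR_SHIFT	39
#define PTRS_PER_TABLE	512ULL
#define PMD_SIZE	(1ULL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))

/* Assumption: direct-map base, standing in for __PAGE_OFFSET. */
#define SKETCH_PAGE_OFFSET	0xffff880000000000ULL

/*
 * Assumption: approximates __PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL, i.e.
 * present, writable, accessed, dirty, large page, no-execute.
 */
#define SKETCH_LARGE_FLAGS \
	((1ULL << 0) | (1ULL << 1) | (1ULL << 5) | (1ULL << 6) | \
	 (1ULL << 7) | (1ULL << 63))

/* Slot in the top-level table selected by a virtual address. */
static inline unsigned int sketch_pgd_index(uint64_t va)
{
	return (unsigned int)((va >> PGDIR_SHIFT) & (PTRS_PER_TABLE - 1));
}

/* Slot in the PUD table selected by a virtual address. */
static inline unsigned int sketch_pud_index(uint64_t va)
{
	return (unsigned int)((va >> PUD_SHIFT) & (PTRS_PER_TABLE - 1));
}

/* Slot in the PMD table selected by a virtual address. */
static inline unsigned int sketch_pmd_index(uint64_t va)
{
	return (unsigned int)((va >> PMD_SHIFT) & (PTRS_PER_TABLE - 1));
}

/*
 * Compute the single 2M PMD entry for a faulting address.
 * Returns 0 on success, -1 if the address is below the direct map.
 */
static inline int sketch_pmd_entry(uint64_t va, uint64_t *pmd)
{
	if (va < SKETCH_PAGE_OFFSET)
		return -1;

	*pmd = ((va - SKETCH_PAGE_OFFSET) & PMD_MASK) | SKETCH_LARGE_FLAGS;
	return 0;
}
```

For 0xffff880012345678 this selects PGD slot 272, PUD slot 0 and PMD slot 145,
and the entry maps the 2M page at physical 0x12200000. Only that one slot is
written, where the previous version filled all 512 slots of the PMD page.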

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head64.c  |   44 ++++++++++++++++++++++++++------------------
 arch/x86/kernel/head_64.S |    7 +++++--
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 4fabebc..014b48d 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -54,14 +54,14 @@ int __init early_make_pgtable(unsigned long address)
 	unsigned long physaddr = address - __PAGE_OFFSET;
 	unsigned long i;
 	pgdval_t pgd, *pgd_p;
-	pudval_t *pud_p;
+	pudval_t pud, *pud_p;
 	pmdval_t pmd, *pmd_p;
 
-	if (physaddr >= MAXMEM)
+	if (address < __PAGE_OFFSET || physaddr >= MAXMEM)
 		return -1;	/* Invalid address - puke */
 
-	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
-	pgd_p = &early_level4_pgt[i].pgd;
+again:
+	pgd_p = &early_level4_pgt[pgd_index(address)].pgd;
 	pgd = *pgd_p;
 
 	/*
@@ -69,29 +69,37 @@ int __init early_make_pgtable(unsigned long address)
 	 * critical -- __PAGE_OFFSET would point us back into the dynamic
 	 * range and we might end up looping forever...
 	 */
-	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+	if (pgd)
 		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
-	} else {
-		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+	else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
 			reset_early_page_tables();
+			goto again;
+		}
 
 		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
 		for (i = 0; i < PTRS_PER_PUD; i++)
 			pud_p[i] = 0;
-
 		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
 	}
-	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
-	pud_p += i;
-
-	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
-	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
-		pmd_p[i] = pmd;
-		pmd += PMD_SIZE;
-	}
+	pud_p += pud_index(address);
+	pud = *pud_p;
 
-	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+	if (pud)
+		pmd_p = (pmdval_t *)((pud & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
+	else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
+			reset_early_page_tables();
+			goto again;
+		}
+
+		pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PMD; i++)
+			pmd_p[i] = 0;
+		*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+	}
+	pmd = (physaddr & PMD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	pmd_p[pmd_index(address)] = pmd;
 
 	return 0;
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 5261f37..0ee661c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -122,14 +122,17 @@ startup_64:
 	movq	%rdi, %rax
 	shrq	$PMD_SHIFT, %rdi
 	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
-	movl	$PTRS_PER_PMD, %ecx
+	leaq	(_end - 1)(%rip), %rcx
+	shrq	$PMD_SHIFT, %rcx
+	subq	%rdi, %rcx
+	incq	%rcx
 
 1:
 	andq	$(PTRS_PER_PMD - 1), %rdi
 	movq	%rax, (%rbx,%rdi,8)
 	incq	%rdi
 	addq	$PMD_SIZE, %rax
-	decl	%ecx
+	decq	%rcx
 	jnz	1b
 	
 	/*
-- 
1.7.10.4


[-- Attachment #8: 0009-x86-64bit-Print-init-kernel-lowmap-correctly.patch --]
[-- Type: application/octet-stream, Size: 3711 bytes --]

From 5a8765bcd2ceef21b677a7d60f9071a040e84e18 Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yinghai@kernel.org>
Date: Sat, 15 Dec 2012 20:59:08 -0800
Subject: [PATCH v7 09/29] x86, 64bit: Print init kernel lowmap correctly

When we reach the end of x86_64_start_kernel, we have:
1. kernel highmap: 512M (KERNEL_IMAGE_SIZE) from the kernel load address.
2. kernel lowmap: [0, 2M), plus (_end - _text) bytes from the kernel
   load address.

For example, if the kernel bzImage is loaded high at 8G, we get:
1. kernel highmap:  [8G, 8G+512M)
2. kernel lowmap: [0, 2M), and  [8G, 8G +_end - _text)

So max_pfn_mapped, which records the low-map pfn, can no longer simply be
set to 512M on 64 bit.

Print both ranges when the kernel is loaded high.

Also use KERNEL_IMAGE_SIZE directly for highmap cleanup.
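
The two lowmap ranges described above can be worked through in a small
userspace sketch. This is an illustration only: struct mem_range and the
sketch_ helpers are hypothetical names, and the rounding follows what
print_init_mem_mapped() does in this patch:

```c
#include <stdint.h>

#define PMD_SIZE	(1ULL << 21)	/* 2M large page */

/* Inclusive physical address range. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Round x up to a power-of-two alignment. */
static inline uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Fill out[] with the initial low-map ranges for a kernel whose image
 * occupies physical [text_pa, end_pa).  Returns the number of ranges.
 */
static inline int sketch_lowmap_ranges(uint64_t text_pa, uint64_t end_pa,
				       struct mem_range out[2])
{
	uint64_t end = sketch_round_up(end_pa - 1, PMD_SIZE);

	if (text_pa <= PMD_SIZE) {
		/* Kernel sits right after the first 2M: one merged range. */
		out[0] = (struct mem_range){ 0, end - 1 };
		return 1;
	}

	/* Kernel loaded high: [0, 2M) plus the kernel image itself. */
	out[0] = (struct mem_range){ 0, PMD_SIZE - 1 };
	out[1] = (struct mem_range){ text_pa, end - 1 };
	return 2;
}
```

With text_pa = 8G and an image of a little over 20M, this yields
[0x0, 0x1fffff] and [0x200000000, 0x2015fffff], matching the 8G example in
the description.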

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head64.c |    3 ---
 arch/x86/kernel/setup.c  |   20 ++++++++++++++++++--
 arch/x86/mm/init_64.c    |    6 +++++-
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 014b48d..2775666 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -158,9 +158,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* XXX - this is wrong... we need to build page tables from scratch */
-	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
-
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
 #ifdef CONFIG_EARLY_PRINTK
 		set_intr_gate(i, &early_idt_handlers[i]);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 81ea5a5..d321b9b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -646,6 +646,23 @@ static int __init parse_reservelow(char *p)
 
 early_param("reservelow", parse_reservelow);
 
+static __init void print_init_mem_mapped(void)
+{
+#ifdef CONFIG_X86_32
+	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
+			(max_pfn_mapped<<PAGE_SHIFT) - 1);
+#else
+	unsigned long text = __pa_symbol(&_text);
+	unsigned long end = round_up(__pa_symbol(_end) - 1, PMD_SIZE);
+
+	if (text <= PMD_SIZE)
+		printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
+			end - 1);
+	else
+		printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx] [mem %#010lx-%#010lx]\n",
+			PMD_SIZE - 1, text, end - 1);
+#endif
+}
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -910,8 +927,7 @@ void __init setup_arch(char **cmdline_p)
 	setup_bios_corruption_check();
 #endif
 
-	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
-			(max_pfn_mapped<<PAGE_SHIFT) - 1);
+	print_init_mem_mapped();
 
 	setup_real_mode();
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4f5f9f7..1a88012 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -381,10 +381,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
 void __init cleanup_highmap(void)
 {
 	unsigned long vaddr = __START_KERNEL_map;
-	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+	unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
 	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
 	pmd_t *pmd = level2_kernel_pgt;
 
+	/* Xen has its own end somehow with abused max_pfn_mapped */
+	if (max_pfn_mapped)
+		vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+
 	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
 		if (pmd_none(*pmd))
 			continue;
-- 
1.7.10.4



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-16  5:17                                                                     ` Yinghai Lu
@ 2012-12-16  8:50                                                                       ` Yinghai Lu
  2012-12-17 22:47                                                                         ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-16  8:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Sat, Dec 15, 2012 at 9:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sat, Dec 15, 2012 at 6:09 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Sat, Dec 15, 2012 at 1:40 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 12/15/2012 12:55 PM, Yinghai Lu wrote:
>>>>
>>>> BTW, did you look at smp boot problem with early_level4_pgt version?
>>>
>>>
>>> No, I have been busy with non-Linux stuff today.
>>>
>>
>> ok, i sorted it out. I will split it to small pieces and post them.
>
> I updated for-x86-boot branch with it, and it is based on
> linus:master
> tip:x86/mm
> tip:x86/urgent
> tip:x86/mm2.
>
> also attach 7 new ones are just added to that branch.
>
I just updated the branch to fix a compile problem found by
Fengguang's kbuild test robot.

git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-boot


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-16  8:50                                                                       ` Yinghai Lu
@ 2012-12-17 22:47                                                                         ` Yinghai Lu
  2012-12-17 23:11                                                                           ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-17 22:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Sun, Dec 16, 2012 at 12:50 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sat, Dec 15, 2012 at 9:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Sat, Dec 15, 2012 at 6:09 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Sat, Dec 15, 2012 at 1:40 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> On 12/15/2012 12:55 PM, Yinghai Lu wrote:
>>>>>
>>>>> BTW, did you look at smp boot problem with early_level4_pgt version?
>>>>
>>>>
>>>> No, I have been busy with non-Linux stuff today.
>>>>
>>>
>>> ok, i sorted it out. I will split it to small pieces and post them.
>>
>> I updated for-x86-boot branch with it, and it is based on
>> linus:master
>> tip:x86/mm
>> tip:x86/urgent
>> tip:x86/mm2.
>>
>> also attach 7 new ones are just added to that branch.
>>
> just updated the branch to fix compiling problem that was found by
> Fengguang's kbuild test robot.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-boot

Peter, can you check that branch again?

I moved early_trap_init after init_mem_mapping, so for 64-bit native,
init_mem_mapping will set up the page tables for RAM from scratch.


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-17 22:47                                                                         ` Yinghai Lu
@ 2012-12-17 23:11                                                                           ` H. Peter Anvin
  2012-12-17 23:26                                                                             ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-17 23:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/17/2012 02:47 PM, Yinghai Lu wrote:
>
> Peter, can you check that branch again?
>
> I moved the early_trap_init after init_mem_mapping.
> so for 64bit native, init_mem_mapping will setup page table for ram from blank.
>

Looks better, at first glance at least.  There are a couple of 
unnecessary changes (the counter in head_64.S cannot exceed 32 bits once 
computed, so the change from %ecx to %rcx is pointless.)
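
To see why 32 bits are enough, here is a small userspace sketch of the same
count (an illustration only; the sketch_ names are hypothetical, and
MAX_PHYSMEM_BITS is assumed to be 46 with both addresses below that limit):

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SHIFT			21
#define SKETCH_MAX_PHYSMEM_BITS		46	/* assumed x86-64 value */

/*
 * Number of 2M PMD entries needed to cover physical [text_pa, end_pa),
 * computed the same way as the loop counter in startup_64:
 * ((_end - 1) >> PMD_SHIFT) - (_text >> PMD_SHIFT) + 1.
 */
static inline uint64_t sketch_pmd_count(uint64_t text_pa, uint64_t end_pa)
{
	assert(end_pa > text_pa);
	assert((end_pa >> SKETCH_MAX_PHYSMEM_BITS) == 0);

	return ((end_pa - 1) >> PMD_SHIFT) - (text_pa >> PMD_SHIFT) + 1;
}
```

A kernel at 16M whose image ends a little past 36M needs 11 entries. Even the
degenerate case of an image spanning everything below 2^46 needs only 2^25
entries, so the count always fits in a 32-bit register.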

There is another bug in my patch: it either needs to mask off the NX bit 
if we are running on non-NX-enabled hardware, or it needs to not set the 
NX bit (which is mostly okay that early on, I suspect.)
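
A minimal sketch of the first option, assuming the caller already knows
whether NX is enabled (the function and macro names are hypothetical, not
existing kernel helpers). With EFER.NXE clear, bit 63 of a paging entry is
reserved, so leaving it set would cause a reserved-bit page fault:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_NX	(1ULL << 63)	/* no-execute bit of a paging entry */

/*
 * Return the flags to use for an early PMD entry: drop NX when the CPU
 * does not have it enabled, otherwise pass the flags through unchanged.
 */
static inline uint64_t sketch_early_pmd_flags(uint64_t flags, bool nx_enabled)
{
	return nx_enabled ? flags : (flags & ~SKETCH_PAGE_NX);
}
```

The second option amounts to always calling this with nx_enabled set to false
for the early mappings.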

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-17 23:11                                                                           ` H. Peter Anvin
@ 2012-12-17 23:26                                                                             ` Yinghai Lu
  2012-12-18  1:11                                                                               ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-17 23:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Mon, Dec 17, 2012 at 3:11 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/17/2012 02:47 PM, Yinghai Lu wrote:
>>
>>
>> Peter, can you check that branch again?
>>
>> I moved the early_trap_init after init_mem_mapping.
>> so for 64bit native, init_mem_mapping will setup page table for ram from
>> blank.
>>
>
> Looks better, at first glance at least.  There are a couple of unnecessary
> changes (the counter in head_64.S cannot exceed 32 bits once computed, so
> the change from %ecx to %rcx is pointless.)

ok,  return to use %ecx

>
> There is another bug in my patch: it either needs to mask off the NX bit if
> we are running on non-NX-enabled hardware, or it needs to not set the NX bit
> (which is mostly okay that early on, I suspect.)

I tested that in a KVM guest and on Westmere; the current version seems OK.

I will repost the patchset to the list to get more review.



* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-17 23:26                                                                             ` Yinghai Lu
@ 2012-12-18  1:11                                                                               ` Yinghai Lu
  2012-12-18  1:51                                                                                 ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-18  1:11 UTC (permalink / raw)
  To: H. Peter Anvin, Jason Wessel
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Mon, Dec 17, 2012 at 3:26 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Dec 17, 2012 at 3:11 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 12/17/2012 02:47 PM, Yinghai Lu wrote:
>>>
>>>
>>> Peter, can you check that branch again?
>>>
>>> I moved the early_trap_init after init_mem_mapping.
>>> so for 64bit native, init_mem_mapping will setup page table for ram from
>>> blank.
>>>
>>
>> Looks better, at first glance at least.  There are a couple of unnecessary
>> changes (the counter in head_64.S cannot exceed 32 bits once computed, so
>> the change from %ecx to %rcx is pointless.)
>
> ok,  return to use %ecx
>
>>
>> There is another bug in my patch: it either needs to mask off the NX bit if
>> we are running on non-NX-enabled hardware, or it needs to not set the NX bit
>> (which is mostly okay that early on, I suspect.)
>
> i test that in kvm guest, and westmere, current version seem ok.
>
> will repost the patchset to list to get more review.
>

I am not sure whether I can move early_trap_init down.

Jason,

We need to move early_trap_init down, after init_memory_mapping, so that
the early #PF handler can be used to set up the page tables.

Can we do that? Is it OK for kgdb if it moves down?

Or can we just move
set_intr_gate(X86_TRAP_PF, &page_fault)
back to trap_init?

Thanks

Yinghai


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-18  1:11                                                                               ` Yinghai Lu
@ 2012-12-18  1:51                                                                                 ` Yinghai Lu
  2012-12-18  2:42                                                                                   ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-18  1:51 UTC (permalink / raw)
  To: H. Peter Anvin, Jason Wessel, Jan Kiszka
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Mon, Dec 17, 2012 at 5:11 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Dec 17, 2012 at 3:26 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Mon, Dec 17, 2012 at 3:11 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 12/17/2012 02:47 PM, Yinghai Lu wrote:
>>>>
>>>>
>>>> Peter, can you check that branch again?
>>>>
>>>> I moved the early_trap_init after init_mem_mapping.
>>>> so for 64bit native, init_mem_mapping will setup page table for ram from
>>>> blank.
>>>>
>>>
>>> Looks better, at first glance at least.  There are a couple of unnecessary
>>> changes (the counter in head_64.S cannot exceed 32 bits once computed, so
>>> the change from %ecx to %rcx is pointless.)
>>
>> ok,  return to use %ecx
>>
>>>
>>> There is another bug in my patch: it either needs to mask off the NX bit if
>>> we are running on non-NX-enabled hardware, or it needs to not set the NX bit
>>> (which is mostly okay that early on, I suspect.)
>>
>> i test that in kvm guest, and westmere, current version seem ok.
>>
>> will repost the patchset to list to get more review.
>>
>
> not sure if i could move that early_trap_init down.
>
> jason,
>
> We need to move down early_trap_init after init_memory_mapping to use
> early #PF handler to set page table.
>
> So can we do that? for kgdb it is that ok to move it down?

Adding Jan.


* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-18  1:51                                                                                 ` Yinghai Lu
@ 2012-12-18  2:42                                                                                   ` Yinghai Lu
  0 siblings, 0 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-18  2:42 UTC (permalink / raw)
  To: H. Peter Anvin, Jason Wessel, Jan Kiszka
  Cc: H. Peter Anvin, Borislav Petkov, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 79 bytes --]

Jan,

Can you check if attached patch is going to break KGDB?

Thanks

Yinghai

[-- Attachment #2: move_down_early_trap_init.patch --]
[-- Type: application/octet-stream, Size: 2408 bytes --]

Subject: [PATCH] x86: Move early_trap_pf_init down after init_mem_mapping

With the early #PF handler's help, we can set up the memory mapping from a
blank init_level4_pgt, but we need to keep that handler alive until
init_mem_mapping() and must not let early_trap_init() trash it.

So split early_trap_pf_init() out and move it down.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/processor.h |    3 ++-
 arch/x86/kernel/setup.c          |    4 +++-
 arch/x86/kernel/traps.c          |    8 ++++++--
 3 files changed, 11 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -685,7 +685,7 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	olpc_ofw_detect();
 
-	early_trap_init();
+	early_trap_db_bp_init();
 	early_cpu_init();
 	early_ioremap_init();
 
@@ -917,6 +917,8 @@ void __init setup_arch(char **cmdline_p)
 
 	init_mem_mapping();
 
+	early_trap_pf_init();
+
 	setup_real_mode();
 
 	memblock.current_limit = get_max_mapped();
Index: linux-2.6/arch/x86/include/asm/processor.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/processor.h
+++ linux-2.6/arch/x86/include/asm/processor.h
@@ -730,7 +730,8 @@ enum idle_boot_override {IDLE_NO_OVERRID
 extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
 
-extern void early_trap_init(void);
+extern void early_trap_db_bp_init(void);
+extern void early_trap_pf_init(void);
 
 /* Defined in head.S */
 extern struct desc_ptr		early_gdt_descr;
Index: linux-2.6/arch/x86/kernel/traps.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/traps.c
+++ linux-2.6/arch/x86/kernel/traps.c
@@ -689,15 +689,19 @@ dotraplinkage void do_iret_error(struct
 #endif
 
 /* Set of traps needed for early debugging. */
-void __init early_trap_init(void)
+void __init early_trap_db_bp_init(void)
 {
 	set_intr_gate_ist(X86_TRAP_DB, &debug, DEBUG_STACK);
 	/* int3 can be called from all */
 	set_system_intr_gate_ist(X86_TRAP_BP, &int3, DEBUG_STACK);
-	set_intr_gate(X86_TRAP_PF, &page_fault);
 	load_idt(&idt_descr);
 }
 
+void __init early_trap_pf_init(void)
+{
+	set_intr_gate(X86_TRAP_PF, &page_fault);
+}
+
 void __init trap_init(void)
 {
 	int i;

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-15 23:17                                                                         ` H. Peter Anvin
@ 2012-12-19 20:37                                                                           ` Borislav Petkov
  2012-12-19 21:07                                                                             ` Jacob Shin
  0 siblings, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-19 20:37 UTC (permalink / raw)
  To: H. Peter Anvin, Jacob Shin
  Cc: Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo, linux-kernel,
	tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Sat, Dec 15, 2012 at 03:17:05PM -0800, H. Peter Anvin wrote:
> On 12/15/2012 03:15 PM, Yinghai Lu wrote:
> >>
> >>That is for the kernel region itself (that code is actually unchanged from
> >>the current code), and yes, we could cap that one to _end if there are
> >>systems which have bugs in that area.  The dynamic page tables map 1G
> >>aligned at a time.
> >
> >dynamic should be 2M too.
> >
> >AMD system:
> >
> >http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf
> >
> >  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
> >  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
> >  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> >
> >the hole is not 1G aligned.
> >
> >or HT region is from e040000000 ?
> >
> 
> The HT region starts at 0xfd00000000 -- after that reserved region,
> so I have no idea what that particular system is trying to do or
> what its requirements are (nor what its MTRR setup is, since you
> didn't post it.)

This is something that Jacob should be able to answer since he's been
dealing with the 1T support.

Jacob, how is the HT hole marked on AMD? I know hazily that we do say
"all memory regions cacheable by default if not explicitly marked" but
we need to exclude the HT hole from that, right?

So how are we doing that, MTRRs?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 20:37                                                                           ` Borislav Petkov
@ 2012-12-19 21:07                                                                             ` Jacob Shin
  2012-12-19 21:48                                                                               ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Jacob Shin @ 2012-12-19 21:07 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Wed, Dec 19, 2012 at 09:37:51PM +0100, Borislav Petkov wrote:
> On Sat, Dec 15, 2012 at 03:17:05PM -0800, H. Peter Anvin wrote:
> > On 12/15/2012 03:15 PM, Yinghai Lu wrote:
> > >>
> > >>That is for the kernel region itself (that code is actually unchanged from
> > >>the current code), and yes, we could cap that one to _end if there are
> > >>systems which have bugs in that area.  The dynamic page tables map 1G
> > >>aligned at a time.
> > >
> > >dynamic should be 2M too.
> > >
> > >AMD system:
> > >
> > >http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf
> > >
> > >  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
> > >  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
> > >  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> > >
> > >the hole is not 1G aligned.
> > >
> > >or HT region is from e040000000 ?
> > >
> > 
> > The HT region starts at 0xfd00000000 -- after that reserved region,
> > so I have no idea what that particular system is trying to do or
> > what its requirements are (nor what its MTRR setup is, since you
> > didn't post it.)
> 
> This is something that Jacob should be able to answer since he's been
> dealing with the 1T support.
> 
> Jacob, how is the HT hole marked on AMD? I know hazily that we do say
> "all memory regions cacheable by default if not explicitly marked" but
> we need to exclude the HT hole from that, right?
> 
> So how are we doing that, MTRRs?

The HT hole is architectural (it should be in the manuals somewhere) and
spans 0xfd00000000 ~ 0x10000000000. The CPU cannot generate memory
reads/writes in that region.

On the particular system above, there is 1TB of total RAM, and since
we do not want to lose memory around the HT hole, the BIOS has
programmed the DRAM controller to move the last 128 GB of memory
above the HT region. There are 8 memory nodes; the last DRAM
address of the 7th node is 0xe038000000. Then there is a hole, and the
last memory node starts at 1TB.

MTRRs only cover memory under 4GB, and do not cover the HT hole.

Yinghai's mm patchset, which direct maps only regions backed by RAM,
solves our memory hole problem around the HT area.

I've tested Yinghai's patchset (several of the early versions)
successfully on our above 1TB system. I'll try the latest tip/mm2
again sometime later today, but I'm pretty sure it should be fine.
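
To make the alignment point from the quoted discussion concrete, here is a
minimal user-space sketch (not kernel code). It checks whether mapping a
usable e820 range with a given page size would also map bytes outside that
range. The struct and helper names are hypothetical and exist only for this
illustration; the addresses are the ones from the e820 dump quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(1) << 21)
#define SZ_1G (UINT64_C(1) << 30)

/* Half-open physical range [start, end). Hypothetical helper type. */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/* Round addr down to a power-of-two page size. */
static uint64_t round_down_to(uint64_t addr, uint64_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/* Round addr up to a power-of-two page size. */
static uint64_t round_up_to(uint64_t addr, uint64_t page_size)
{
	return round_down_to(addr + page_size - 1, page_size);
}

/*
 * True if mapping `ram` with pages of page_size would also map bytes
 * outside `ram`, i.e. the mapping spills over one of its boundaries.
 */
static bool mapping_spills(struct phys_range ram, uint64_t page_size)
{
	return round_down_to(ram.start, page_size) != ram.start ||
	       round_up_to(ram.end, page_size) != ram.end;
}
```

For the first usable range above 4GB, [0x100000000, 0xe038000000), a 1G
mapping spills into the reserved area (the end rounds up to 0xe040000000),
while a 2M mapping does not. That only says something about this one memory
map; other systems can have boundaries that are not 2M aligned either.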

Thanks,

-Jacob


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 21:07                                                                             ` Jacob Shin
@ 2012-12-19 21:48                                                                               ` H. Peter Anvin
  2012-12-19 22:05                                                                                 ` Jacob Shin
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 21:48 UTC (permalink / raw)
  To: Jacob Shin, Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

There are a few very serious problems we need to figure out related to generalizing very early boot.  If this range gets mapped, will the CPU treat it as WB?  If so, with what consequences for either the HT region or the hole below it?

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 21:48                                                                               ` H. Peter Anvin
@ 2012-12-19 22:05                                                                                 ` Jacob Shin
  2012-12-19 22:25                                                                                   ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Jacob Shin @ 2012-12-19 22:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 01:48:33PM -0800, H. Peter Anvin wrote:
> There are a few very serious problems we need to figure out related to generalizing very early boot.  If this range gets mapped, will the CPU treat it as WB?  If so, with what consequences for either the HT region or the hole below it?

Hm .. I guess I need to read the whole email thread .. but if you can
explain it in short, what are the problems?

Yes, the CPU treats it as WB because the region is under TOM2, so by
default it is WB, and also when you create direct mapping page tables,
the PATs mark them as WB.

What we have seen is that even though the kernel never generates memory
accesses in the hole (since E820 says that it is not RAM), when the kernel
reads/writes memory near the hole, the CPU was prefetching into the
hole because the PATs say that it is WB. This resulted in an MCE because
there is no physical RAM there.

-Jacob


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 22:05                                                                                 ` Jacob Shin
@ 2012-12-19 22:25                                                                                   ` H. Peter Anvin
  2012-12-19 22:47                                                                                     ` Yinghai Lu
                                                                                                       ` (2 more replies)
  0 siblings, 3 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 22:25 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 02:05 PM, Jacob Shin wrote:
> On Wed, Dec 19, 2012 at 01:48:33PM -0800, H. Peter Anvin wrote:
>> There are a few very serious problems we need to figure out related to generalizing very early boot.  If this range gets mapped, will the CPU treat it as WB?  If so, with what consequences for either the HT region or the hole below it?
>
> Hm .. I guess I need to read the whole email thread .. but if you can
> explain it in short, what are the problems?
>
> Yes the CPU treats it as WB because the region is under TOM2, so by
> default it is WB, and also when you create direct mapping page tables,
> the PATs mark them as WB.
>
> What we have seen is that even though the kernel never generate memory
> accesses in the hole (since E820 says that it is not RAM) when kernel
> read/writes memory near the hole, the CPU was prefetching into the
> hole because PATs say that it is WB. This resulted in MCE because
> there is no physical RAM there.
>

IOW, epic f*ckup.

The problem is that before we have awareness of the memory map, we need 
to map things in order to access them.  This is a big problem and right 
now there are ridiculous heuristics.  I have been working on mapping on 
demand, but there are concerns about the boundaries (i.e. what happens 
if the mapping spills over into a pit like this.)

This kind of stuff is really not acceptable.  A region which will cause 
malfunction if prefetched should not be WB in the MTRR system (I include 
TOM* in that.)  The real question is what we can do to mitigate the damage.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 22:25                                                                                   ` H. Peter Anvin
@ 2012-12-19 22:47                                                                                     ` Yinghai Lu
  2012-12-19 22:59                                                                                       ` H. Peter Anvin
  2012-12-19 22:51                                                                                     ` Borislav Petkov
  2012-12-19 22:55                                                                                     ` Jacob Shin
  2 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-19 22:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jacob Shin, Borislav Petkov, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 2:25 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> The problem is that before we have awareness of the memory map, we need to
> map things in order to access them.  This is a big problem and right now
> there are ridiculous heuristics.  I have been working on mapping on demand,
> but there are concerns about the boundaries (i.e. what happens if the
> mapping spill over into a pit like this.)
>
> This kind of stuff is really not acceptable.  A region which will cause
> malfunction if prefetched should not be WB in the MTRR system (I include
> TOM* in that.)  The real question is what we can do to mitigate the damage.

Will mapping only 2M on demand help?
Or do we have to return to the v6 version of for-x86-boot?

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 22:25                                                                                   ` H. Peter Anvin
  2012-12-19 22:47                                                                                     ` Yinghai Lu
@ 2012-12-19 22:51                                                                                     ` Borislav Petkov
  2012-12-19 22:59                                                                                       ` Jacob Shin
  2012-12-19 22:55                                                                                     ` Jacob Shin
  2 siblings, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-19 22:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 02:25:44PM -0800, H. Peter Anvin wrote:
> The real question is what we can do to mitigate the damage.

Let's try the first thing that comes to mind: waste a variable MTRR on
it:

[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000000 mask FFFF80000000 write-back
[    0.000000]   1 base 000080000000 mask FFFFC0000000 write-back
[    0.000000]   2 base 0000C0000000 mask FFFFF0000000 write-back
[    0.000000]   3 base 000100000000 mask FFFF00000000 write-back
[    0.000000]   4 base 000200000000 mask FFFFE0000000 write-back
[    0.000000]   5 base 000220000000 mask FFFFF0000000 write-back
[    0.000000]   6 disabled
[    0.000000]   7 disabled

one of those last two. This is a small box though so I'm guessing on 1T
boxes those last two won't be disabled. Jacob?
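
For reading the base/mask pairs in a dump like the one above, a minimal
user-space sketch (not kernel code) follows. It assumes 48 physical address
bits, which matches the 12 hex digits printed for each mask but is an
assumption about this box, and it assumes contiguous masks. The function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48 physical address bits, matching the 12 hex digits above. */
#define PHYS_ADDR_BITS 48
#define PHYS_ADDR_MASK ((UINT64_C(1) << PHYS_ADDR_BITS) - 1)

/* Size in bytes of the range described by a contiguous variable MTRR mask. */
static uint64_t mtrr_range_size(uint64_t mask)
{
	uint64_t size = (~mask & PHYS_ADDR_MASK) + 1;

	/* A contiguous mask always yields a power-of-two size. */
	assert((size & (size - 1)) == 0);
	return size;
}

/* An address falls in the range when (addr & mask) == (base & mask). */
static int mtrr_matches(uint64_t base, uint64_t mask, uint64_t addr)
{
	return (addr & mask) == (base & mask);
}
```

With these helpers, range 0 (mask FFFF80000000) decodes to 2GB and range 3
(base 000100000000, mask FFFF00000000) to the 4GB block starting at 4GB.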

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 22:25                                                                                   ` H. Peter Anvin
  2012-12-19 22:47                                                                                     ` Yinghai Lu
  2012-12-19 22:51                                                                                     ` Borislav Petkov
@ 2012-12-19 22:55                                                                                     ` Jacob Shin
  2012-12-19 23:00                                                                                       ` Borislav Petkov
  2012-12-19 23:23                                                                                       ` H. Peter Anvin
  2 siblings, 2 replies; 127+ messages in thread
From: Jacob Shin @ 2012-12-19 22:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 02:25:44PM -0800, H. Peter Anvin wrote:
> On 12/19/2012 02:05 PM, Jacob Shin wrote:
> >On Wed, Dec 19, 2012 at 01:48:33PM -0800, H. Peter Anvin wrote:
> >>There are a few very serious problems we need to figure out related to generalizing very early boot.  If this range gets mapped, will the CPU treat it as WB?  If so, with what consequences for either the HT region or the hole below it?
> >
> >Hm .. I guess I need to read the whole email thread .. but if you can
> >explain it in short, what are the problems?
> >
> >Yes the CPU treats it as WB because the region is under TOM2, so by
> >default it is WB, and also when you create direct mapping page tables,
> >the PATs mark them as WB.
> >
> >What we have seen is that even though the kernel never generate memory
> >accesses in the hole (since E820 says that it is not RAM) when kernel
> >read/writes memory near the hole, the CPU was prefetching into the
> >hole because PATs say that it is WB. This resulted in MCE because
> >there is no physical RAM there.
> >
> 
> IOW, epic f*ckup.
> 
> The problem is that before we have awareness of the memory map, we
> need to map things in order to access them.  This is a big problem
> and right now there are ridiculous heuristics.  I have been working
> on mapping on demand, but there are concerns about the boundaries
> (i.e. what happens if the mapping spill over into a pit like this.)
> 
> This kind of stuff is really not acceptable.  A region which will
> cause malfunction if prefetched should not be WB in the MTRR system
> (I include TOM* in that.)  The real question is what we can do to
> mitigate the damage.

Well, really the problem is with any memory hole above 4GB that is too
big to be covered by variable range MTRRs as UC. Because the kernel
used to simply do init_memory_mapping for 4GB ~ top of memory,
any memory holes above 4GB are marked as WB in PATs.

How is this handled on Intel architecture? If there are memory holes
that are too big to be covered by variable range MTRRs as UC, are
there other MTRR-like CPU registers that the BIOS programs?


Thanks,

-Jacob


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 22:47                                                                                     ` Yinghai Lu
@ 2012-12-19 22:59                                                                                       ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 22:59 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jacob Shin, Borislav Petkov, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 02:47 PM, Yinghai Lu wrote:
> 
> on demand to only map 2M will help ?
> or have to return to v6 version for-x86-boot ?
> 

Why would 2M be inherently better than 1G?  I realize it works for the
*one particular system* that you have a specimen for, but that is not a
sensible approach for architecture.

The problem remains no matter how you slice it; we need a general
solution.  The fact that this system was ever built reflects a number of
critical failures that should be surprising but sadly are not.

	-hpa


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 22:51                                                                                     ` Borislav Petkov
@ 2012-12-19 22:59                                                                                       ` Jacob Shin
  2012-12-19 23:03                                                                                         ` Borislav Petkov
  0 siblings, 1 reply; 127+ messages in thread
From: Jacob Shin @ 2012-12-19 22:59 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Wed, Dec 19, 2012 at 11:51:55PM +0100, Borislav Petkov wrote:
> On Wed, Dec 19, 2012 at 02:25:44PM -0800, H. Peter Anvin wrote:
> > The real question is what we can do to mitigate the damage.
> 
> Let's try the first thing that comes to mind: waste a variable MTRR on
> it:
> 
> [    0.000000] MTRR variable ranges enabled:
> [    0.000000]   0 base 000000000000 mask FFFF80000000 write-back
> [    0.000000]   1 base 000080000000 mask FFFFC0000000 write-back
> [    0.000000]   2 base 0000C0000000 mask FFFFF0000000 write-back
> [    0.000000]   3 base 000100000000 mask FFFF00000000 write-back
> [    0.000000]   4 base 000200000000 mask FFFFE0000000 write-back
> [    0.000000]   5 base 000220000000 mask FFFFF0000000 write-back
> [    0.000000]   6 disabled
> [    0.000000]   7 disabled
> 
> one of those last two. This is a small box though so I'm guessing on 1T
> boxes those last two won't be disabled. Jacob?

I can check but right, they might be used up. But even if we had slots
available, the memory range that needs to be covered is at a large
enough address and aligned in such a way that you cannot cover it with
variable range MTRRs.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 22:55                                                                                     ` Jacob Shin
@ 2012-12-19 23:00                                                                                       ` Borislav Petkov
  2012-12-19 23:17                                                                                         ` H. Peter Anvin
  2012-12-19 23:23                                                                                       ` H. Peter Anvin
  1 sibling, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-19 23:00 UTC (permalink / raw)
  To: Jacob Shin
  Cc: H. Peter Anvin, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 04:55:06PM -0600, Jacob Shin wrote:
> Well, really the problem is with any memory hole above 4GB that is too
> big to be covered by variable range MTRRs as UC.

Why, their PhysBase field is the 40 MSB bits of the physical address.
That should be more than a TB.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 22:59                                                                                       ` Jacob Shin
@ 2012-12-19 23:03                                                                                         ` Borislav Petkov
  2012-12-19 23:21                                                                                           ` Jacob Shin
  2012-12-19 23:22                                                                                           ` H. Peter Anvin
  0 siblings, 2 replies; 127+ messages in thread
From: Borislav Petkov @ 2012-12-19 23:03 UTC (permalink / raw)
  To: Jacob Shin
  Cc: H. Peter Anvin, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 04:59:41PM -0600, Jacob Shin wrote:
> I can check but right, they might be used up. But even if we had slots
> available, the memory range that needs to be covered is in large
> enough address and aligned in such a way that you cannot cover it with
> variable range MTRRs.

Actually, if I'm not mistaken, you only need to cover the HT hole with
one MTRR - the rest remains WB. And in order for the mask bits to work, we
could make it a little bigger - we waste some memory but that's nothing
in comparison to the MCE.

You might need to talk to hw guys about the feasibility of this deal
though.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:00                                                                                       ` Borislav Petkov
@ 2012-12-19 23:17                                                                                         ` H. Peter Anvin
  2012-12-19 23:30                                                                                           ` Borislav Petkov
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 23:17 UTC (permalink / raw)
  To: Borislav Petkov, Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/19/2012 03:00 PM, Borislav Petkov wrote:
> On Wed, Dec 19, 2012 at 04:55:06PM -0600, Jacob Shin wrote:
>> Well, really the problem is with any memory hole above 4GB that is too
>> big to be covered by variable range MTRRs as UC.
> 
> Why, their PhysBase field is the 40 MSB bits of the physical address.
> That should be more than TB.
> 

I presume with "too big" he really means "oddly shaped".

	-hpa


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:03                                                                                         ` Borislav Petkov
@ 2012-12-19 23:21                                                                                           ` Jacob Shin
  2012-12-19 23:56                                                                                             ` H. Peter Anvin
  2012-12-19 23:22                                                                                           ` H. Peter Anvin
  1 sibling, 1 reply; 127+ messages in thread
From: Jacob Shin @ 2012-12-19 23:21 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Thu, Dec 20, 2012 at 12:03:29AM +0100, Borislav Petkov wrote:
> On Wed, Dec 19, 2012 at 04:59:41PM -0600, Jacob Shin wrote:
> > I can check but right, they might be used up. But even if we had slots
> > available, the memory range that needs to be covered is in large
> > enough address and aligned in such a way that you cannot cover it with
> > variable range MTRRs.
> 
> Actually, if I'm not mistaken, you only need to cover the HT hole with
> one MTRR - the rest remains WB. And in order the mask bits to work, we
> could make it a little bigger - we waste some memory but that's nothing
> in comparison to the MCE.

Actually, every memory hole above 4GB and under TOM2 needs to be marked
as UC if the kernel just blanket calls init_memory_mapping from 4GB
to top of memory.

Right, we would be losing memory, and I think depending on the
alignment of the boundary and how many MTRRs you have available to use,
significant chunks of memory could be lost. I need to go refresh on
how variable range MTRRs are programmed; it has been a while.

> 
> You might need to talk to hw guys about the feasibility of this deal
> though.
> 
> Thanks.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --
> 


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:03                                                                                         ` Borislav Petkov
  2012-12-19 23:21                                                                                           ` Jacob Shin
@ 2012-12-19 23:22                                                                                           ` H. Peter Anvin
  2012-12-19 23:40                                                                                             ` Borislav Petkov
                                                                                                               ` (2 more replies)
  1 sibling, 3 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 23:22 UTC (permalink / raw)
  To: Borislav Petkov, Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/19/2012 03:03 PM, Borislav Petkov wrote:
> On Wed, Dec 19, 2012 at 04:59:41PM -0600, Jacob Shin wrote:
>> I can check but right, they might be used up. But even if we had slots
>> available, the memory range that needs to be covered is in large
>> enough address and aligned in such a way that you cannot cover it with
>> variable range MTRRs.
> 
> Actually, if I'm not mistaken, you only need to cover the HT hole with
> one MTRR - the rest remains WB. And in order the mask bits to work, we
> could make it a little bigger - we waste some memory but that's nothing
> in comparison to the MCE.
> 
> You might need to talk to hw guys about the feasibility of this deal
> though.
> 

Just make the hole a bit bigger, so it starts at 0xfc00000000, then you
only need one MTRR.  This is the correct BIOS-level fix, and it really
needs to happen.
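
To make the "one MTRR" point concrete, here is a rough sketch that
counts how many naturally aligned power-of-two ranges a greedy split
needs to cover a region. The function names are made up for this
example, and the hole bounds used in the note below (0xfd00000000 up to
the 1T mark) are an assumption for illustration; the thread does not
give the exact bounds. Real firmware has other tricks (overlapping
ranges, type precedence), so treat the count as a rough guide only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest power-of-two block that is naturally aligned at 'start'
 * and still fits inside [start, end). Hypothetical helper.
 */
static uint64_t largest_block(uint64_t start, uint64_t end)
{
	/* Lowest set bit of 'start' is the largest alignment it satisfies. */
	uint64_t size = start ? (start & (~start + 1)) : (1ULL << 63);

	while (size > end - start)
		size >>= 1;
	return size;
}

/*
 * Number of naturally aligned power-of-two ranges a greedy split
 * uses to cover [start, end). Each range would cost one variable MTRR.
 */
static unsigned int mtrr_ranges_needed(uint64_t start, uint64_t end)
{
	unsigned int n = 0;

	assert(start < end);
	while (start < end) {
		start += largest_block(start, end);
		n++;
	}
	return n;
}
```

Under the assumed bounds, the greedy split needs two ranges for
[0xfd00000000, 0x10000000000) (4G + 8G) and a single 16G range once the
region starts at 0xfc00000000.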

Do these systems actually exist in the field or are they engineering
prototypes?  In the latter case, we might be done at that point.

Really, though, AMD should have added a TOM3 for memory above the 1T
mark since they should have been able to see a 1T hole coming from the
design of HyperTransport.  This would be the correct hardware-level fix,
but I don't expect that to happen.

Now, calming down a little bit, we are definitely dealing with BIOS
engineers and so f*ckups are going to happen, again and again.  The
question is what to do about it.

The only truly "safe" option is to limit early mappings to 4K pages.
This is highly undesirable for a bunch of reasons.  Reducing mapping
granularity to 2M rather than 1G (what Yinghai is proposing) does reduce
the exposure somewhat; it would be interesting to gather trap statistics
and try to get a feel for if this actually changes the boot time
measurably or not.
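
As a rough illustration of why finer granularity reduces the exposure:
a demand mapping covers the whole aligned block around the touched
address, so a 1G block can reach into a hole that a 2M block stays
clear of. The helper name and the addresses in the note below are
hypothetical, not taken from any real memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Does mapping the aligned block of size 'gran' around 'addr'
 * overlap the hole [hole_start, hole_end)? Hypothetical helper.
 */
static bool mapping_hits_hole(uint64_t addr, uint64_t gran,
			      uint64_t hole_start, uint64_t hole_end)
{
	uint64_t map_start, map_end;

	/* Granularity must be a power of two (e.g. 2M or 1G). */
	assert(gran && (gran & (gran - 1)) == 0);

	map_start = addr & ~(gran - 1);
	map_end = map_start + gran;

	/* Standard half-open interval overlap test. */
	return map_start < hole_end && hole_start < map_end;
}
```

For a made-up hole at [0x120000000, 0x130000000) and an access at
0x13ff00000, the 1G block [0x100000000, 0x140000000) overlaps the hole,
while the 2M block [0x13fe00000, 0x140000000) does not.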

The other bit is that building the real kernel page tables iteratively
(ignoring the early page tables here) is safer, since the real page
table builder is fully aware of the memory map.  This means any
"spillover" from the early page tables gets minimized to regions where
there are data objects that have to be accessed early.  Since Yinghai
already had iterative page table building working, I don't see any
reason to not use that capability.

Thoughts?

	-hpa


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 22:55                                                                                     ` Jacob Shin
  2012-12-19 23:00                                                                                       ` Borislav Petkov
@ 2012-12-19 23:23                                                                                       ` H. Peter Anvin
  1 sibling, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 23:23 UTC (permalink / raw)
  To: Jacob Shin
  Cc: H. Peter Anvin, Borislav Petkov, Yinghai Lu, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 02:55 PM, Jacob Shin wrote:
> 
> Well, really the problem is with any memory hole above 4GB that is too
> big to be covered by variable range MTRRs as UC. Because the kernel
> used to simply do init_memory_mapping for 4GB ~ top of memory,
> any memory holes above 4GB are marked as WB in PATs.
> 
> How is this handled in Intel architecture? If there are memory holes
> that are too big to be covered by variable range MTRRs as UC, are
> there other MTRR like CPU registers that the BIOS programs?
> 

Intel CPUs don't have the TOM augmentation to the MTRR mechanism, and so
MTRRs need to explicitly enable caching of memory rather than the other
way around.

	-hpa


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:17                                                                                         ` H. Peter Anvin
@ 2012-12-19 23:30                                                                                           ` Borislav Petkov
  2012-12-19 23:37                                                                                             ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-19 23:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 03:17:59PM -0800, H. Peter Anvin wrote:
> I presume with "too big" he really means "oddly shaped".

Yeah, that's why it could be enlarged a little in order to adjust it to
the MTRR scheme. This is what the BKDG says about it:

PhysMask and PhysBase are used together to determine whether a target
physical-address falls within the specified address range. PhysMask
is logically ANDed with PhysBase and separately ANDed with the upper
40 bits of the target physical-address. If the results of the two
operations are identical, the target physical-address falls within the
specified memory range. The pseudo-code for the operation is:

MaskBase = PhysMask AND PhysBase
MaskTarget = PhysMask AND Target_Address[51:12]
IF MaskBase == MaskTarget
    target address is in range
ELSE
    target address is not in range

And then there are the alignment requirements:

* The boundary on which a variable range is aligned must be equal to the
range size. For example, a memory range of 16 Mbytes must be aligned on
a 16-Mbyte boundary.

* The range size must be a power of 2 (2^n, 52 > n > 11), with a minimum
allowable size of 4 Kbytes. For example, 4 Mbytes and 8 Mbytes are
allowable memory range sizes, but 6 Mbytes is not allowable.

and then some examples about how to calculate those values.
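
For reference, the quoted pseudo-code and the two alignment rules can
be sketched in plain C roughly like this. This is only an illustration:
the helper names are made up, and addresses are kept as full byte
addresses (bits [51:12] selected by a mask) rather than in the shifted
register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits [51:12] of a physical address, as in the quoted pseudo-code. */
#define MTRR_ADDR_BITS 0x000ffffffffff000ULL

/* Mask for a range of 'size' bytes; size must be a power of two >= 4K. */
static uint64_t mtrr_mask_for_size(uint64_t size)
{
	assert(size >= 4096 && (size & (size - 1)) == 0);
	return ~(size - 1) & MTRR_ADDR_BITS;
}

/* Alignment rule: the base must sit on a boundary equal to the size. */
static bool mtrr_base_ok(uint64_t base, uint64_t size)
{
	return (base & (size - 1)) == 0;
}

/* MaskBase == MaskTarget test: is 'target' inside the range? */
static bool mtrr_covers(uint64_t base, uint64_t mask, uint64_t target)
{
	return (mask & base) == (mask & target);
}
```

With a 16G range based at 0xfc00000000, mtrr_covers() accepts
0xfd00000000 and rejects 0xfbffffffff, while mtrr_base_ok() rejects a
16G range based at 0xfd00000000. That is the "oddly shaped" problem in
a nutshell.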

Jacob, if you still have the system, you might try to experiment with
that, provided there are some variable MTRRs free, of course. And also
provided, there's nothing else in the hw stopping us from doing that.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:30                                                                                           ` Borislav Petkov
@ 2012-12-19 23:37                                                                                             ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 23:37 UTC (permalink / raw)
  To: Borislav Petkov, Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/19/2012 03:30 PM, Borislav Petkov wrote:
> On Wed, Dec 19, 2012 at 03:17:59PM -0800, H. Peter Anvin wrote:
>> I presume with "too big" he really means "oddly shaped".
>
> Yeah, that's why it could be enlarged a little in order to adjust it to
> the MTRR scheme. This is what the BKDG says about it:
>

Yes, they should just cap the hole a few megabytes short and put a UC 
MTRR at 0xfc00000000.  That should happen regardless... this system is 
dangerous without it.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:22                                                                                           ` H. Peter Anvin
@ 2012-12-19 23:40                                                                                             ` Borislav Petkov
  2012-12-20  0:02                                                                                               ` H. Peter Anvin
  2012-12-19 23:40                                                                                             ` Yinghai Lu
  2012-12-19 23:40                                                                                             ` Jacob Shin
  2 siblings, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-19 23:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 03:22:13PM -0800, H. Peter Anvin wrote:

[ … ]

> Now, calming down a little bit, we are definitely dealing with BIOS
> engineers and so f*ckups are going to happen, again and again.

Yeppers.

> The only truly "safe" option is to limit early mappings to 4K pages.
> This is highly undesirable for a bunch of reasons.  Reducing mapping
> granularity to 2M rather than 1G (what Yinghai is proposing) does reduce
> the exposure somewhat; it would be interesting to gather trap statistics
> and try to get a feel for if this actually changes the boot time
> measurably or not.

This is done on the BSP, right? So we can measure how long it takes
by taking TSC values at start and end.

> The other bit is that building the real kernel page tables iteratively
> (ignoring the early page tables here) is safer, since the real page
> table builder is fully aware of the memory map.  This means any
> "spillover" from the early page tables gets minimized to regions where
> there are data objects that have to be accessed early.

That shouldn't be a "lot", relatively speaking.

> Since Yinghai already had iterative page table building working, I
> don't see any reason to not use that capability.
> 
> Thoughts?

Sounds doable but we should take a hard look at the patches so that we
don't miss anything.

Also, I don't know how stuff like that would be approached for a wider
testing - I mean, it is a serious change in x86 boot code and there will
be issues.

Hmm.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:22                                                                                           ` H. Peter Anvin
  2012-12-19 23:40                                                                                             ` Borislav Petkov
@ 2012-12-19 23:40                                                                                             ` Yinghai Lu
  2012-12-19 23:43                                                                                               ` H. Peter Anvin
  2012-12-19 23:40                                                                                             ` Jacob Shin
  2 siblings, 1 reply; 127+ messages in thread
From: Yinghai Lu @ 2012-12-19 23:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Jacob Shin, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 3:22 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> The other bit is that building the real kernel page tables iteratively
> (ignoring the early page tables here) is safer, since the real page
> table builder is fully aware of the memory map.  This means any
> "spillover" from the early page tables gets minimized to regions where
> there are data objects that have to be accessed early.  Since Yinghai
> already had iterative page table building working, I don't see any
> reason to not use that capability.

that is v6, right?

including that patch

---

Subject: [PATCH] x86, 64bit: Set extra ident mapping for whole kernel range

Currently, when the kernel is loaded above 1G, only [_text, _text+2M] is
set up with an extra ident page table.
That is not enough: some variables that could be used early are outside
that range, like BRK for the early page table.
We need to map [_text, _end], including text/data/bss/brk...

Also, the current kernel is not allowed to be loaded above 512G; it thinks
that address is too big.
We need to add one extra spare page for level3 to point to that 512G range.
We need to check the _text range, set the level4 pg with that spare level3
page, and set level3 with a level2 page to cover [_text, _end] with the
extra mapping.

Finally, to handle crossing a GB boundary, we need to add another
level2 spare page. To handle crossing a 512GB boundary, we need to
add another level3 spare page for the next 512G range.

Tested with kexec-tools with local test code to force loading the kernel
across 1G, 5G, 512G, 513G.

We need this to put a relocatable 64bit bzImage high above 1G.

-v4: add crossing GB boundary handling.
-v5: use spare pages from BRK, so could save pages when kernel is not
        loaded above 1GB.
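
A small sketch of the counting behind the spare pages, assuming the
usual x86-64 4-level layout (one level2 page per 1G, one level3 page
per 512G). The macro and function names are invented for this example
and are not from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SPAN_1G_SHIFT	30	/* one level2 page maps 1G */
#define SPAN_512G_SHIFT	39	/* one level3 page maps 512G */

/* Number of aligned (1 << shift) blocks touched by [text, end). */
static unsigned int blocks_spanned(uint64_t text, uint64_t end,
				   unsigned int shift)
{
	assert(text < end);
	return (unsigned int)(((end - 1) >> shift) - (text >> shift) + 1);
}

/* level2 pages needed to ident-map [text, end) */
static unsigned int level2_pages_needed(uint64_t text, uint64_t end)
{
	return blocks_spanned(text, end, SPAN_1G_SHIFT);
}

/* level3 pages needed to ident-map [text, end) */
static unsigned int level3_pages_needed(uint64_t text, uint64_t end)
{
	return blocks_spanned(text, end, SPAN_512G_SHIFT);
}
```

A 32M image that straddles the 1G boundary needs two level2 pages but
one level3 page; the same image straddling 512G needs two of each,
which matches the extra spare pages described above.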

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:22                                                                                           ` H. Peter Anvin
  2012-12-19 23:40                                                                                             ` Borislav Petkov
  2012-12-19 23:40                                                                                             ` Yinghai Lu
@ 2012-12-19 23:40                                                                                             ` Jacob Shin
  2012-12-19 23:45                                                                                               ` Yinghai Lu
  2012-12-19 23:50                                                                                               ` H. Peter Anvin
  2 siblings, 2 replies; 127+ messages in thread
From: Jacob Shin @ 2012-12-19 23:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 03:22:13PM -0800, H. Peter Anvin wrote:
> On 12/19/2012 03:03 PM, Borislav Petkov wrote:
> > On Wed, Dec 19, 2012 at 04:59:41PM -0600, Jacob Shin wrote:
> >> I can check but right, they might be used up. But even if we had slots
> >> available, the memory range that needs to be covered is in large
> >> enough address and aligned in such a way that you cannot cover it with
> >> variable range MTRRs.
> > 
> > Actually, if I'm not mistaken, you only need to cover the HT hole with
> > one MTRR - the rest remains WB. And in order the mask bits to work, we
> > could make it a little bigger - we waste some memory but that's nothing
> > in comparison to the MCE.
> > 
> > You might need to talk to hw guys about the feasibility of this deal
> > though.
> > 
> 
> Just make the hole a bit bigger, so it starts at 0xfc00000000, then you
> only need one MTRR.  This is the correct BIOS-level fix, and it really
> needs to happen.
> 
> Do these systems actually exist in the field or are they engineering
> prototypes?  In the latter case, we might be done at that point.

Yes, HP is shipping (or will ship soon) such systems.

> 
> Really, though, AMD should have added a TOM3 for memory above the 1T
> mark since they should have been able to see a 1T hole coming from the
> design of HyperTransport.  This would be the correct hardware-level fix,
> but I don't expect that to happen.
> 

I'll feed this conversation back to our hardware folks, but yes we
still need to handle today's systems.

> Now, calming down a little bit, we are definitely dealing with BIOS
> engineers and so f*ckups are going to happen, again and again.  The
> question is what to do about it.
> 
> The only truly "safe" option is to limit early mappings to 4K pages.
> This is highly undesirable for a bunch of reasons.  Reducing mapping
> granularity to 2M rather than 1G (what Yinghai is proposing) does reduce
> the exposure somewhat; it would be interesting to gather trap statistics
> and try to get a feel for if this actually changes the boot time
> measurably or not.
> 
> The other bit is that building the real kernel page tables iteratively
> (ignoring the early page tables here) is safer, since the real page
> table builder is fully aware of the memory map.  This means any
> "spillover" from the early page tables gets minimized to regions where
> there are data objects that have to be accessed early.  Since Yinghai
> already had iterative page table building working, I don't see any
> reason to not use that capability.

Yes, I'll test again with latest, but Yinghai's patchset mapping only
RAM from top down solved our problem.

Thanks,

> 
> Thoughts?
> 
> 	-hpa
> 
> 


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:40                                                                                             ` Yinghai Lu
@ 2012-12-19 23:43                                                                                               ` H. Peter Anvin
  2012-12-19 23:48                                                                                                 ` Yinghai Lu
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 23:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Jacob Shin, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 03:40 PM, Yinghai Lu wrote:
> On Wed, Dec 19, 2012 at 3:22 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> The other bit is that building the real kernel page tables iteratively
>> (ignoring the early page tables here) is safer, since the real page
>> table builder is fully aware of the memory map.  This means any
>> "spillover" from the early page tables gets minimized to regions where
>> there are data objects that have to be accessed early.  Since Yinghai
>> already had iterative page table building working, I don't see any
>> reason to not use that capability.
>
> that is v6, right?
>
> including that patch
>

No, that's just a different way to create the early page tables (and it 
doesn't solve anything, quite on the contrary.)  I'm talking about the 
strategy for creating the *permanent* page tables

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:40                                                                                             ` Jacob Shin
@ 2012-12-19 23:45                                                                                               ` Yinghai Lu
  2012-12-19 23:50                                                                                               ` H. Peter Anvin
  1 sibling, 0 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-19 23:45 UTC (permalink / raw)
  To: Jacob Shin
  Cc: H. Peter Anvin, Borislav Petkov, H. Peter Anvin, Yu, Fenghua,
	mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On Wed, Dec 19, 2012 at 3:40 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> On Wed, Dec 19, 2012 at 03:22:13PM -0800, H. Peter Anvin wrote:
>> The other bit is that building the real kernel page tables iteratively
>> (ignoring the early page tables here) is safer, since the real page
>> table builder is fully aware of the memory map.  This means any
>> "spillover" from the early page tables gets minimized to regions where
>> there are data objects that have to be accessed early.  Since Yinghai
>> already had iterative page table building working, I don't see any
>> reason to not use that capability.
>
> Yes, I'll test again with latest, but Yinghai's patchset mapping only
> RAM from top down solved our problem.

that is for-x86-mm or tip:x86/mm2

we are talking about for-x86-boot, and it will allow the kernel to be loaded
above 4G to solve the kdump problem.

so the early map will have two ways
1. extend head_64.S to cover the kernel instead of just [0, 1G)
2. or peter's #PF handler version patch to set the pg table dynamically.
    it could cover 1G when a PF happens.

Yinghai

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:43                                                                                               ` H. Peter Anvin
@ 2012-12-19 23:48                                                                                                 ` Yinghai Lu
  0 siblings, 0 replies; 127+ messages in thread
From: Yinghai Lu @ 2012-12-19 23:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Jacob Shin, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

On Wed, Dec 19, 2012 at 3:43 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/19/2012 03:40 PM, Yinghai Lu wrote:
>>
>> On Wed, Dec 19, 2012 at 3:22 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> The other bit is that building the real kernel page tables iteratively
>>> (ignoring the early page tables here) is safer, since the real page
>>> table builder is fully aware of the memory map.  This means any
>>> "spillover" from the early page tables gets minimized to regions where
>>> there are data objects that have to be accessed early.  Since Yinghai
>>> already had iterative page table building working, I don't see any
>>> reason to not use that capability.
>>
>>
>> that is v6, right?
>>
>> including that patch
>>
>
> No, that's just a different way to create the early page tables (and it
> doesn't solve anything, quite on the contrary.)  I'm talking about the
> strategy for creating the *permanent* page tables
>

I'm confused. The permanent one is in tip/x86/mm2, right?

For for-x86-boot:
so you want v7 plus the attached patch? That changes it to 2M per PF.

Yinghai

[-- Attachment #2: fix_hpa_pe_pgt.patch --]
[-- Type: application/octet-stream, Size: 2697 bytes --]

Subject: [PATCH] x86, 64bit: #PF handler set page to cover 2M only

For the dynamic mapping, only add one 2M page per access instead of 1G.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/head64.c |   40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

Index: linux-2.6/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head64.c
+++ linux-2.6/arch/x86/kernel/head64.c
@@ -52,7 +52,7 @@ int __init early_make_pgtable(unsigned l
 	unsigned long physaddr = address - __PAGE_OFFSET;
 	unsigned long i;
 	pgdval_t pgd, *pgd_p;
-	pudval_t *pud_p;
+	pudval_t pud, *pud_p;
 	pmdval_t pmd, *pmd_p;
 
 
@@ -60,8 +60,8 @@ int __init early_make_pgtable(unsigned l
 	if (physaddr >= MAXMEM || read_cr3() != __pa(early_level4_pgt))
 		return -1;
 
-	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
-	pgd_p = &early_level4_pgt[i].pgd;
+again:
+	pgd_p = &early_level4_pgt[pgd_index(address)].pgd;
 	pgd = *pgd_p;
 
 	/*
@@ -69,29 +69,37 @@ int __init early_make_pgtable(unsigned l
 	 * critical -- __PAGE_OFFSET would point us back into the dynamic
 	 * range and we might end up looping forever...
 	 */
-	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+	if (pgd)
 		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
-	} else {
-		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+	else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
 			reset_early_page_tables();
+			goto again;
+		}
 
 		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
 		for (i = 0; i < PTRS_PER_PUD; i++)
 			pud_p[i] = 0;
-
 		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
 	}
-	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
-	pud_p += i;
+	pud_p += pud_index(address);
+	pud = *pud_p;
 
-	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
-	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
-		pmd_p[i] = pmd;
-		pmd += PMD_SIZE;
-	}
+	if (pud)
+		pmd_p = (pmdval_t *)((pud & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
+	else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
+			reset_early_page_tables();
+			goto again;
+		}
 
-	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+		pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PMD; i++)
+			pmd_p[i] = 0;
+		*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
+	}
+	pmd = (physaddr & PMD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	pmd_p[pmd_index(address)] = pmd;
 
 	return 0;
 }
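
For readers following the index arithmetic in the patch, here is a
user-space sketch of how a virtual address splits into table indices on
x86-64 with 4-level paging (9 bits per level, 2M large pages at the PMD
level). The EX_* names and helpers are illustrative stand-ins and are
not the kernel's own pgd_index()/pud_index()/pmd_index() definitions.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PMD_SHIFT	21	/* 2M per PMD entry */
#define EX_PUD_SHIFT	30	/* 1G per PUD entry */
#define EX_PGDIR_SHIFT	39	/* 512G per PGD entry */
#define EX_PTRS_PER_TBL	512	/* entries per table at every level */

/* Index into the table whose entries each cover (1 << shift) bytes. */
static unsigned int ex_table_index(uint64_t address, unsigned int shift)
{
	assert(shift < 64);
	return (unsigned int)((address >> shift) & (EX_PTRS_PER_TBL - 1));
}

/* Physical base of the single 2M page containing 'physaddr'. */
static uint64_t ex_pmd_base(uint64_t physaddr)
{
	return physaddr & ~((1ULL << EX_PMD_SHIFT) - 1);
}
```

The patch's behavioural change corresponds to filling only the one PMD
slot at ex_table_index(address, EX_PMD_SHIFT) with ex_pmd_base(physaddr),
instead of filling all 512 slots of the PMD page in one go.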

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:40                                                                                             ` Jacob Shin
  2012-12-19 23:45                                                                                               ` Yinghai Lu
@ 2012-12-19 23:50                                                                                               ` H. Peter Anvin
  2012-12-19 23:55                                                                                                 ` Borislav Petkov
  2012-12-20  0:07                                                                                                 ` Jacob Shin
  1 sibling, 2 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 23:50 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 03:40 PM, Jacob Shin wrote:
>>
>> Just make the hole a bit bigger, so it starts at 0xfc00000000, then you
>> only need one MTRR.  This is the correct BIOS-level fix, and it really
>> needs to happen.
>>
>> Do these systems actually exist in the field or are they engineering
>> prototypes?  In the latter case, we might be done at that point.
>
> Yes, HP is shipping (or will ship soon) such systems.
>

Can you get them to fix the BIOS first, or at least ship a BIOS update? 
  Otherwise there will be a probabilistic failure, and it sounds like it 
is your (AMD's) fault.

>> The other bit is that building the real kernel page tables iteratively
>> (ignoring the early page tables here) is safer, since the real page
>> table builder is fully aware of the memory map.  This means any
>> "spillover" from the early page tables gets minimized to regions where
>> there are data objects that have to be accessed early.  Since Yinghai
>> already had iterative page table building working, I don't see any
>> reason to not use that capability.
>
> Yes, I'll test again with latest, but Yinghai's patchset mapping only
> RAM from top down solved our problem.

Please don't make me go Steve Ballmer on you.

We're talking about two different things... the early page tables versus 
the permanent page tables.  The permanent page tables we can handle 
because the page table creation at that point is aware of the memory map.

The early page tables are what is used before we get to that point.
Creating them on demand means that if there are no early-needed data
structures near the hole, there will be no access and everything will be
okay, but the early page table creation *is not and cannot be* aware
of the memory map.  Right now such an access simply cannot happen,
because all such data structures are confined to 32-bit addresses,
however *THAT WILL CHANGE AND WILL CHANGE SOON*, exactly because these
kinds of large-memory systems need that to happen.  You may start seeing
failures at that time, and there isn't a huge lot we can do about it.

We are trying to discuss mitigation strategies with you, but you haven't 
really given us any useful information, e.g. what happens near the 
various boundaries of the hole, what could trigger prefetching into the
range, and what it would take to fix the BIOSes.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:50                                                                                               ` H. Peter Anvin
@ 2012-12-19 23:55                                                                                                 ` Borislav Petkov
  2012-12-19 23:57                                                                                                   ` H. Peter Anvin
  2012-12-20  0:07                                                                                                 ` Jacob Shin
  1 sibling, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-19 23:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 03:50:14PM -0800, H. Peter Anvin wrote:
> We are trying to discuss mitigation strategies with you, but you
> haven't really given us any useful information, e.g. what happens near
> the various boundaries of the hole, what could trigger prefetching into
> the range, and what it would take to fix the BIOSes.

Another thing we could do (I admit it is ugly) is to add a quirk to the
#MC handler and detect that specific condition by looking at the address
reported in MCi_ADDR and exit early by not panicking the system.

Again, this is ugly but a possibility, still.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 127+ messages in thread
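
The quirk suggested above boils down to one predicate: is the address
reported with the machine check valid, and does it fall inside the known
hole?  A minimal userspace sketch in C follows.  It is not kernel code and
not a proposed patch.  The function name and the HOLE_* bounds are
hypothetical; the bounds are simply the e038000000 ~ 10000000000 range
that Jacob Shin reports for his system elsewhere in this thread.  The only
architectural fact used is that bit 58 (ADDRV) of MCi_STATUS says whether
MCi_ADDR holds a valid address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 58 of MCi_STATUS: MCi_ADDR contains a valid address. */
#define MCI_STATUS_ADDRV	(1ULL << 58)

/*
 * Hypothetical bounds for illustration only: the memory hole reported
 * for the system discussed in this thread, up to the 1 TiB boundary.
 */
#define HOLE_START	0xe038000000ULL
#define HOLE_END	0x10000000000ULL	/* exclusive */

/*
 * Return true if the machine check carries a valid address that lies
 * inside the known hole.  A real quirk would then decide whether to
 * skip the panic; that policy is deliberately left out here.
 */
static bool mce_hits_known_hole(uint64_t status, uint64_t addr)
{
	if (!(status & MCI_STATUS_ADDRV))
		return false;	/* no usable address, cannot classify */

	return addr >= HOLE_START && addr < HOLE_END;
}
```

Hard-coding the bounds like this is what makes the approach ugly, as noted
above: the range is specific to one platform and would have to come from
the firmware memory map in any serious version.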

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:21                                                                                           ` Jacob Shin
@ 2012-12-19 23:56                                                                                             ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 23:56 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 03:21 PM, Jacob Shin wrote:
> On Thu, Dec 20, 2012 at 12:03:29AM +0100, Borislav Petkov wrote:
>> On Wed, Dec 19, 2012 at 04:59:41PM -0600, Jacob Shin wrote:
>>> I can check but right, they might be used up. But even if we had slots
>>> available, the memory range that needs to be covered is in large
>>> enough address and aligned in such a way that you cannot cover it with
>>> variable range MTRRs.
>>
>> Actually, if I'm not mistaken, you only need to cover the HT hole with
>> one MTRR - the rest remains WB. And in order for the mask bits to work, we
>> could make it a little bigger - we waste some memory but that's nothing
>> in comparison to the MCE.
>
> Actually all memory hole above 4GB and under TOM2 needs to be marked
> as UC, if the kernel just blanket calls init_memory_mapping from 4GB
> to top of memory.
>
> Right we would be losing memory, and I think depending on the
> alignment of the boundary and how many MTRRs you have available to use,
> significant chunks of memory could be lost. I need to go refresh on
> how variable range MTRRs are programmed, it has been a while.
>

In this particular case an MTRR at 0xe000000000 would lose 896 MB of 
RAM, or just under 0.1% of the total.

If it is only the HT region that causes trouble and not the rest of the 
hole you could just plant an MTRR at 0xfc00000000 and not lose any 
memory at all.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread
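
The numbers in the message above can be checked mechanically.  A
variable-range MTRR is normally programmed for a block whose size is a
power of two and whose base is a multiple of that size, which is why
0xe000000000 and 0xfc00000000 work as starting points while the actual
hole start does not.  A minimal sketch in C, assuming the hole ends at
1 TiB (0x10000000000) and usable RAM below it ends at 0xe038000000 as
reported for this system; the function names are made up for
illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Can [start, end) be described by a single variable-range MTRR?
 * Requires a power-of-two size and a base aligned to that size.
 */
static bool single_mtrr_covers(uint64_t start, uint64_t end)
{
	uint64_t size;

	if (end <= start)
		return false;

	size = end - start;
	if (size & (size - 1))		/* size is not a power of two */
		return false;

	return (start & (size - 1)) == 0;	/* base aligned to size */
}

/*
 * MiB of RAM given up when an uncacheable MTRR starts at mtrr_base,
 * below the real end of RAM at ram_end.
 */
static uint64_t ram_lost_mib(uint64_t ram_end, uint64_t mtrr_base)
{
	return ram_end > mtrr_base ? (ram_end - mtrr_base) >> 20 : 0;
}
```

With these, single_mtrr_covers(0xe000000000, 0x10000000000) and
single_mtrr_covers(0xfc00000000, 0x10000000000) both hold, while the
unpadded range starting at 0xe038000000 fails the power-of-two test, and
ram_lost_mib(0xe038000000, 0xe000000000) gives 896.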

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:55                                                                                                 ` Borislav Petkov
@ 2012-12-19 23:57                                                                                                   ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-19 23:57 UTC (permalink / raw)
  To: Borislav Petkov, Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/19/2012 03:55 PM, Borislav Petkov wrote:
> On Wed, Dec 19, 2012 at 03:50:14PM -0800, H. Peter Anvin wrote:
>> We are trying to discuss mitigation strategies with you, but you
>> haven't really given us any useful information, e.g. what happens near
>> the various boundaries of the hole, what could trigger prefetching into
>> the range, and what it would take to fix the BIOSes.
>
> Another thing we could do (I admit it is ugly) is to add a quirk to the
> #MC handler and detect that specific condition by looking at the address
> reported in MCi_ADDR and exit early by not panicking the system.
>
> Again, this is ugly but a possibility, still.
>

I would really, really hate to have to deal with an early MCE handler, too.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:40                                                                                             ` Borislav Petkov
@ 2012-12-20  0:02                                                                                               ` H. Peter Anvin
  2012-12-20  0:10                                                                                                 ` Borislav Petkov
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-20  0:02 UTC (permalink / raw)
  To: Borislav Petkov, Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/19/2012 03:40 PM, Borislav Petkov wrote:
>
> This is done on the BSP, right? So we can measure it how long it takes
> by taking TSC values of start and end.
>

Yes, and we can count the number of #PF traps cheaply enough.  It would 
be interesting to put a counter on the number of #PFs and the number of 
resets and read them out on a large-system boot.

>
> Sounds doable but we should take a hard look at the patches so that we
> don't miss anything.
>
> Also, I don't know how stuff like that would be approached for a wider
> testing - I mean, it is a serious change in x86 boot code and there will
> be issues.
>

The goal should be to have this into -tip and -next by the middle of 
January in order to make the 3.9 merge window, I think.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-19 23:50                                                                                               ` H. Peter Anvin
  2012-12-19 23:55                                                                                                 ` Borislav Petkov
@ 2012-12-20  0:07                                                                                                 ` Jacob Shin
  2012-12-20  0:24                                                                                                   ` H. Peter Anvin
  1 sibling, 1 reply; 127+ messages in thread
From: Jacob Shin @ 2012-12-20  0:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 03:50:14PM -0800, H. Peter Anvin wrote:
> On 12/19/2012 03:40 PM, Jacob Shin wrote:
> >>
> >>Just make the hole a bit bigger, so it starts at 0xfc00000000, then you
> >>only need one MTRR.  This is the correct BIOS-level fix, and it really
> >>needs to happen.
> >>
> >>Do these systems actually exist in the field or are they engineering
> >>prototypes?  In the latter case, we might be done at that point.
> >
> >Yes, HP is shipping (or will ship soon) such systems.
> >
> 
> Can you get them to fix the BIOS first, or at least ship a BIOS
> update?  Otherwise there will be a probabilistic failure, and it
> sounds like it is your (AMD's) fault.
> 
> >>The other bit is that building the real kernel page tables iteratively
> >>(ignoring the early page tables here) is safer, since the real page
> >>table builder is fully aware of the memory map.  This means any
> >>"spillover" from the early page tables gets minimized to regions where
> >>there are data objects that have to be accessed early.  Since Yinghai
> >>already had iterative page table building working, I don't see any
> >>reason to not use that capability.
> >
> >Yes, I'll test again with latest, but Yinghai's patchset mapping only
> >RAM from top down solved our problem.
> 
> Please don't make me go Steve Ballmer on you.
> 
> We're talking about two different things... the early page tables
> versus the permanent page tables.  The permanent page tables we can
> handle because the page table creation at that point is aware of the
> memory map.

Ah okay,

> 
> The early page tables are what is used before we get to that point.
> Creating them on demand means that if there are no early-needed data
> structures near the hole, there will be no access and everything
> will be okay, but the early page table creation *is not and
> cannot be* aware of the memory map.  Right now such an access simply
> cannot happen, because all such data structures are confined to 32-bit
> addresses, however *THAT WILL CHANGE AND WILL CHANGE SOON*, exactly
> because these kinds of large-memory systems need that to happen.
> You may start seeing failures at that time, and there isn't a huge
> lot we can do about it.
> 
> We are trying to discuss mitigation strategies with you, but you
> haven't really given us any useful information, e.g. what happens
> near the various boundaries of the hole, what could trigger
> prefetching into the range, and what it would take to fix the BIOSes.

From what I remember, accessing memory around the memory hole (not
just the HT hole, but e038000000 ~ 10000000000 on our mentioned
system) generated prefetches because the memory hole was marked as WB
in PAT.

I'll take a look at the system again, try the blanket MTRR covering
0xe000000000 ~ 1TB, and talk to our BIOS guys.

> 
> 	-hpa
> 
> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
> 
> 


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-20  0:02                                                                                               ` H. Peter Anvin
@ 2012-12-20  0:10                                                                                                 ` Borislav Petkov
  2012-12-20  0:15                                                                                                   ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Borislav Petkov @ 2012-12-20  0:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 04:02:25PM -0800, H. Peter Anvin wrote:
> The goal should be to have this into -tip and -next by the middle of
> January in order to make the 3.9 merge window, I think.

...and an easy back-out strategy in case there are too many issues while
testing. Maybe don't merge it into tip/master so that it can be removed
easily, or something to that effect.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-20  0:10                                                                                                 ` Borislav Petkov
@ 2012-12-20  0:15                                                                                                   ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-20  0:15 UTC (permalink / raw)
  To: Borislav Petkov, Jacob Shin, Yinghai Lu, H. Peter Anvin, Yu,
	Fenghua, mingo, linux-kernel, tglx, linux-tip-commits,
	Konrad Rzeszutek Wilk, Stefano Stabellini

On 12/19/2012 04:10 PM, Borislav Petkov wrote:
> On Wed, Dec 19, 2012 at 04:02:25PM -0800, H. Peter Anvin wrote:
>> The goal should be to have this into -tip and -next by the middle of
>> January in order to make the 3.9 merge window, I think.
>
> ...and an easy back-out strategy in case there are too many issues while
> testing. Maybe don't merge it into tip/master so that it can be removed
> easily, or something to that effect.
>

We keep everything in topic branches; tip:master is a synthetic branch 
which can be regenerated as needed.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-20  0:07                                                                                                 ` Jacob Shin
@ 2012-12-20  0:24                                                                                                   ` H. Peter Anvin
  2012-12-20  0:29                                                                                                     ` Jacob Shin
  0 siblings, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-20  0:24 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 04:07 PM, Jacob Shin wrote:
> 
> From what I remember, accessing memory around the memory hole (not
> just the HT hole, but e038000000 ~ 10000000000 on our mentioned system
> ) generated prefetches because the memory hole was marked as WB in PAT.
> 
> I'll take a look at the system again, try the blanket MTRR covering
> 0xe000000000 ~ 1TB, and talk to our BIOS guys.
> 

Yes, but do they all #MC (as opposed to, say, fetching all FFs)?

	-hpa



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-20  0:24                                                                                                   ` H. Peter Anvin
@ 2012-12-20  0:29                                                                                                     ` Jacob Shin
  2012-12-20  0:41                                                                                                       ` H. Peter Anvin
  2012-12-20  2:37                                                                                                       ` H. Peter Anvin
  0 siblings, 2 replies; 127+ messages in thread
From: Jacob Shin @ 2012-12-20  0:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 04:24:09PM -0800, H. Peter Anvin wrote:
> On 12/19/2012 04:07 PM, Jacob Shin wrote:
> > 
> > From what I remember, accessing memory around the memory hole (not
> > just the HT hole, but e038000000 ~ 10000000000 on our mentioned system
> > ) generated prefetches because the memory hole was marked as WB in PAT.
> > 
> > I'll take a look at the system again, try the blanket MTRR covering
> > 0xe000000000 ~ 1TB, and talk to our BIOS guys.
> > 
> 
> Yes, but do they all #MC (as opposed to, say, fetching all FFs)?

Yes, MCE every time and it was fatal.

> 
> 	-hpa
> 
> 
> 


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-20  0:29                                                                                                     ` Jacob Shin
@ 2012-12-20  0:41                                                                                                       ` H. Peter Anvin
  2012-12-20  2:37                                                                                                       ` H. Peter Anvin
  1 sibling, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-20  0:41 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 04:29 PM, Jacob Shin wrote:
> On Wed, Dec 19, 2012 at 04:24:09PM -0800, H. Peter Anvin wrote:
>> On 12/19/2012 04:07 PM, Jacob Shin wrote:
>>>
>>> From what I remember, accessing memory around the memory hole (not
>>> just the HT hole, but e038000000 ~ 10000000000 on our mentioned system
>>> ) generated prefetches because the memory hole was marked as WB in PAT.
>>>
>>> I'll take a look at the system again, try the blanket MTRR covering
>>> 0xe000000000 ~ 1TB, and talk to our BIOS guys.
>>>
>>
>> Yes, but do they all #MC (as opposed to, say, fetching all FFs)?
> 
> Yes, MCE every time and it was fatal.
> 

So regardless of address.  Bother.

	-hpa



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-20  0:29                                                                                                     ` Jacob Shin
  2012-12-20  0:41                                                                                                       ` H. Peter Anvin
@ 2012-12-20  2:37                                                                                                       ` H. Peter Anvin
  2012-12-20  4:16                                                                                                         ` Jacob Shin
  1 sibling, 1 reply; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-20  2:37 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 04:29 PM, Jacob Shin wrote:
> On Wed, Dec 19, 2012 at 04:24:09PM -0800, H. Peter Anvin wrote:
>> On 12/19/2012 04:07 PM, Jacob Shin wrote:
>>>
>>> From what I remember, accessing memory around the memory hole (not
>>> just the HT hole, but e038000000 ~ 10000000000 on our mentioned system
>>> ) generated prefetches because the memory hole was marked as WB in PAT.
>>>
>>> I'll take a look at the system again, try the blanket MTRR covering
>>> 0xe000000000 ~ 1TB, and talk to our BIOS guys.
>>>
>>
>> Yes, but do they all #MC (as opposed to, say, fetching all FFs)?
> 
> Yes, MCE every time and it was fatal.
> 

OK, one more question... there is something odd with the memory ranges here:

 BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
 BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
 BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable

The first usable range here is 4G to 896G + 896M which is an awfully
strange number.  Similarly, the second range is 1T to 1T + 128G - 16M.
The little fiddly bits imply that there is either overshoot of some sort
going on -- possibly reserved memory -- or these are fairly arbitrary
sizes that don't match any physical bank sizes in which case it should
be possible to shuffle it differently...

	-hpa


^ permalink raw reply	[flat|nested] 127+ messages in thread
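
The "896G + 896M" and "1T + 128G - 16M" readings of the e820 boundaries
above can be reproduced by splitting each address into whole GiB plus
leftover MiB.  A small sketch in C; the struct and function names are
invented for this example, and the inputs are the exclusive end addresses
of the two usable ranges (the last byte listed by e820, plus one):

```c
#include <stdint.h>

/* An address expressed as whole GiB plus the remaining whole MiB. */
struct gib_mib {
	uint64_t gib;
	uint64_t mib;
};

static struct gib_mib split_addr(uint64_t addr)
{
	struct gib_mib r;

	r.gib = addr >> 30;		/* whole GiB */
	r.mib = (addr >> 20) & 0x3ff;	/* MiB within the last GiB */
	return r;
}
```

split_addr(0xe038000000) yields 896 GiB + 896 MiB, and
split_addr(0x11fff000000) yields 1151 GiB + 1008 MiB, which is 16 MiB
short of 1152 GiB (1 TiB + 128 GiB).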

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-20  2:37                                                                                                       ` H. Peter Anvin
@ 2012-12-20  4:16                                                                                                         ` Jacob Shin
  2012-12-20  4:21                                                                                                           ` H. Peter Anvin
  0 siblings, 1 reply; 127+ messages in thread
From: Jacob Shin @ 2012-12-20  4:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, Yinghai Lu, H. Peter Anvin, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, Dec 19, 2012 at 06:37:45PM -0800, H. Peter Anvin wrote:
> On 12/19/2012 04:29 PM, Jacob Shin wrote:
> > On Wed, Dec 19, 2012 at 04:24:09PM -0800, H. Peter Anvin wrote:
> >> On 12/19/2012 04:07 PM, Jacob Shin wrote:
> >>>
> >>> From what I remember, accessing memory around the memory hole (not
> >>> just the HT hole, but e038000000 ~ 10000000000 on our mentioned system
> >>> ) generated prefetches because the memory hole was marked as WB in PAT.
> >>>
> >>> I'll take a look at the system again, try the blanket MTRR covering
> >>> 0xe000000000 ~ 1TB, and talk to our BIOS guys.
> >>>
> >>
> >> Yes, but do they all #MC (as opposed to, say, fetching all FFs)?
> > 
> > Yes, MCE every time and it was fatal.
> > 
> 
> OK, one more question... there is something odd with the memory ranges here:
> 
>  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
>  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
>  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> 
> The first usable range here is 4G to 896G + 896M which is an awfully
> strange number.  Similarly, the second range is 1T to 1T + 128G - 16M.
> The little fiddly bits imply that there is either overshoot of some sort
> going on -- possibly reserved memory -- or these are fairly arbitrary
> sizes that don't match any physical bank sizes in which case it should
> be possible to shuffle it differently...

Not exactly sure why the weird boundaries, I'll have to ask the BIOS
side folks to be sure. But if I were to guess ..

Here is the NUMA spew out, physically there is 128 GB connected to
each memory controller node. The PCI MMIO region starts at 0xc8000000.
4 GB - 0xc8000000 = 0x38000000 (896 MB). So we lose 896 MB due to PCI
MMIO hole, so the first node ends at 128 GB + 896 MB to talk to all of
128 GB off of the first memory controller, and hence the weird 896 MB
offset.

[    0.000000] SRAT: Node 0 PXM 0 0-a0000
[    0.000000] SRAT: Node 0 PXM 0 100000-c8000000
[    0.000000] SRAT: Node 0 PXM 0 100000000-2038000000
[    0.000000] SRAT: Node 1 PXM 1 2038000000-4038000000
[    0.000000] SRAT: Node 2 PXM 2 4038000000-6038000000
[    0.000000] SRAT: Node 3 PXM 3 6038000000-8038000000
[    0.000000] SRAT: Node 4 PXM 4 8038000000-a038000000
[    0.000000] SRAT: Node 5 PXM 5 a038000000-c038000000
[    0.000000] SRAT: Node 6 PXM 6 c038000000-e038000000
[    0.000000] SRAT: Node 7 PXM 7 10000000000-11fff000000
[    0.000000] NUMA: Initialized distance table, cnt=8
[    0.000000] NUMA: Node 0 [0,a0000) + [100000,c8000000) -> [0,c8000000)
[    0.000000] NUMA: Node 0 [0,c8000000) + [100000000,2038000000) -> [0,2038000000)
[    0.000000] Initmem setup node 0 0000000000000000-0000002038000000
[    0.000000]   NODE_DATA [0000002037ff5000 - 0000002037ffffff]
[    0.000000] Initmem setup node 1 0000002038000000-0000004038000000
[    0.000000]   NODE_DATA [0000004037ff5000 - 0000004037ffffff]
[    0.000000] Initmem setup node 2 0000004038000000-0000006038000000
[    0.000000]   NODE_DATA [0000006037ff5000 - 0000006037ffffff]
[    0.000000] Initmem setup node 3 0000006038000000-0000008038000000
[    0.000000]   NODE_DATA [0000008037ff5000 - 0000008037ffffff]
[    0.000000] Initmem setup node 4 0000008038000000-000000a038000000
[    0.000000]   NODE_DATA [000000a037ff5000 - 000000a037ffffff]
[    0.000000] Initmem setup node 5 000000a038000000-000000c038000000
[    0.000000]   NODE_DATA [000000c037ff5000 - 000000c037ffffff]
[    0.000000] Initmem setup node 6 000000c038000000-000000e038000000
[    0.000000]   NODE_DATA [000000e037ff2000 - 000000e037ffcfff]
[    0.000000] Initmem setup node 7 0000010000000000-0000011fff000000
[    0.000000]   NODE_DATA [0000011ffeff1000 - 0000011ffeffbfff]
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x11fff000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[10] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x00000099
[    0.000000]     0: 0x00000100 -> 0x000c7ec0
[    0.000000]     0: 0x00100000 -> 0x02038000
[    0.000000]     1: 0x02038000 -> 0x04038000
[    0.000000]     2: 0x04038000 -> 0x06038000
[    0.000000]     3: 0x06038000 -> 0x08038000
[    0.000000]     4: 0x08038000 -> 0x0a038000
[    0.000000]     5: 0x0a038000 -> 0x0c038000
[    0.000000]     6: 0x0c038000 -> 0x0e038000
[    0.000000]     7: 0x10000000 -> 0x11fff000
[    0.000000] On node 0 totalpages: 33553993
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 5 pages reserved
[    0.000000]   DMA zone: 3916 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 14280 pages used for memmap
[    0.000000]   DMA32 zone: 800504 pages, LIFO batch:31
[    0.000000]   Normal zone: 447552 pages used for memmap
[    0.000000]   Normal zone: 32287680 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 33554432
[    0.000000]   Normal zone: 458752 pages used for memmap
[    0.000000]   Normal zone: 33095680 pages, LIFO batch:31
[    0.000000] On node 2 totalpages: 33554432
[    0.000000]   Normal zone: 458752 pages used for memmap
[    0.000000]   Normal zone: 33095680 pages, LIFO batch:31
[    0.000000] On node 3 totalpages: 33554432
[    0.000000]   Normal zone: 458752 pages used for memmap
[    0.000000]   Normal zone: 33095680 pages, LIFO batch:31
[    0.000000] On node 4 totalpages: 33554432
[    0.000000]   Normal zone: 458752 pages used for memmap
[    0.000000]   Normal zone: 33095680 pages, LIFO batch:31
[    0.000000] On node 5 totalpages: 33554432
[    0.000000]   Normal zone: 458752 pages used for memmap
[    0.000000]   Normal zone: 33095680 pages, LIFO batch:31
[    0.000000] On node 6 totalpages: 33554432
[    0.000000]   Normal zone: 458752 pages used for memmap
[    0.000000]   Normal zone: 33095680 pages, LIFO batch:31
[    0.000000] On node 7 totalpages: 33550336
[    0.000000]   Normal zone: 458696 pages used for memmap
[    0.000000]   Normal zone: 33091640 pages, LIFO batch:31

> 
> 	-hpa
> 
> 


^ permalink raw reply	[flat|nested] 127+ messages in thread
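
The node boundaries in the SRAT output above follow directly from the
guess in the message: each node has 128 GB, and the RAM displaced by the
PCI MMIO hole below 4 GB is remapped above it, shifting every node end by
the size of that hole.  A minimal sketch in C under that assumption; the
helper names are hypothetical and it only models nodes 0 through 6, since
node 7 is placed separately at 1 TB:

```c
#include <stdint.h>

#define GIB	(1ULL << 30)

/* RAM displaced by a PCI MMIO hole running from mmio_start up to 4 GiB. */
static uint64_t pci_hole_size(uint64_t mmio_start)
{
	return 4 * GIB - mmio_start;
}

/*
 * Exclusive end address of node n when every node holds node_size bytes
 * and the displaced RAM pushes all boundaries up by the hole size.
 */
static uint64_t node_end(unsigned int n, uint64_t node_size,
			 uint64_t mmio_start)
{
	return (uint64_t)(n + 1) * node_size + pci_hole_size(mmio_start);
}
```

For mmio_start = 0xc8000000 the hole is 0x38000000 (896 MB), node 0 ends
at 0x2038000000 and node 6 ends at 0xe038000000, matching the log.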

* Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-20  4:16                                                                                                         ` Jacob Shin
@ 2012-12-20  4:21                                                                                                           ` H. Peter Anvin
  0 siblings, 0 replies; 127+ messages in thread
From: H. Peter Anvin @ 2012-12-20  4:21 UTC (permalink / raw)
  To: Jacob Shin
  Cc: H. Peter Anvin, Borislav Petkov, Yinghai Lu, Yu, Fenghua, mingo,
	linux-kernel, tglx, linux-tip-commits, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 12/19/2012 08:16 PM, Jacob Shin wrote:
> 
> Not exactly sure why the weird boundaries, I'll have to ask the BIOS
> side folks to be sure. But if I were to guess ..
> 
> Here is the NUMA spew out, physically there is 128 GB connected to
> each memory controller node. The PCI MMIO region starts at 0xc8000000.
> 4 GB - 0xc8000000 = 0x38000000 (896 MB). So we lose 896 MB due to PCI
> MMIO hole, so the first node ends at 128 GB + 896 MB to talk to all of
> 128 GB off of the first memory controller, and hence the weird 896 MB
> offset.
> 

It would obviously be better if the slack were at the end of the total
memory, instead of end of the < 1T range.  If the PCI MMIO hole were a
power of 2 (e.g. 1G) that would also reduce the likelihood of problems
and reduce MTRR pressure.

	-hpa


^ permalink raw reply	[flat|nested] 127+ messages in thread

* [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  2012-12-21  7:44 [PATCH v5 08/12] x86/microcode_intel_early.c: Early update ucode on Intel's CPU Fenghua Yu
@ 2013-01-31 22:33 ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 127+ messages in thread
From: tip-bot for Fenghua Yu @ 2013-01-31 22:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  ec400ddeff200b068ddc6c70f7321f49ecf32ed5
Gitweb:     http://git.kernel.org/tip/ec400ddeff200b068ddc6c70f7321f49ecf32ed5
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Thu, 20 Dec 2012 23:44:28 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Thu, 31 Jan 2013 13:19:18 -0800

x86/microcode_intel_early.c: Early update ucode on Intel's CPU

Implementation of early update ucode on Intel's CPU.

load_ucode_intel_bsp() scans for ucode in the initrd image file, which is
cpio-format ucode followed by the ordinary initrd image. The binary ucode
file is stored in kernel/x86/microcode/GenuineIntel.bin in the cpio data.
All ucode patches with the same model as the BSP are saved in memory. A
matching ucode patch is applied on the BSP.

load_ucode_intel_ap() reads the saved ucode patches and updates ucode on
the AP.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/microcode_intel_early.c | 796 ++++++++++++++++++++++++++++++++
 1 file changed, 796 insertions(+)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
new file mode 100644
index 0000000..7890bc8
--- /dev/null
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -0,0 +1,796 @@
+/*
+ *	Intel CPU microcode early update for Linux
+ *
+ *	Copyright (C) 2012 Fenghua Yu <fenghua.yu@intel.com>
+ *			   H Peter Anvin <hpa@zytor.com>
+ *
+ *	This allows to early upgrade microcode on Intel processors
+ *	belonging to IA-32 family - PentiumPro, Pentium II,
+ *	Pentium III, Xeon, Pentium 4, etc.
+ *
+ *	Reference: Section 9.11 of Volume 3, IA-32 Intel Architecture
+ *	Software Developer's Manual.
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/earlycpio.h>
+#include <linux/initrd.h>
+#include <linux/cpu.h>
+#include <asm/msr.h>
+#include <asm/microcode_intel.h>
+#include <asm/processor.h>
+#include <asm/tlbflush.h>
+#include <asm/setup.h>
+
+unsigned long mc_saved_in_initrd[MAX_UCODE_COUNT];
+struct mc_saved_data {
+	unsigned int mc_saved_count;
+	struct microcode_intel **mc_saved;
+} mc_saved_data;
+
+static enum ucode_state __cpuinit
+generic_load_microcode_early(struct microcode_intel **mc_saved_p,
+			     unsigned int mc_saved_count,
+			     struct ucode_cpu_info *uci)
+{
+	struct microcode_intel *ucode_ptr, *new_mc = NULL;
+	int new_rev = uci->cpu_sig.rev;
+	enum ucode_state state = UCODE_OK;
+	unsigned int mc_size;
+	struct microcode_header_intel *mc_header;
+	unsigned int csig = uci->cpu_sig.sig;
+	unsigned int cpf = uci->cpu_sig.pf;
+	int i;
+
+	for (i = 0; i < mc_saved_count; i++) {
+		ucode_ptr = mc_saved_p[i];
+
+		mc_header = (struct microcode_header_intel *)ucode_ptr;
+		mc_size = get_totalsize(mc_header);
+		if (get_matching_microcode(csig, cpf, ucode_ptr, new_rev)) {
+			new_rev = mc_header->rev;
+			new_mc  = ucode_ptr;
+		}
+	}
+
+	if (!new_mc) {
+		state = UCODE_NFOUND;
+		goto out;
+	}
+
+	uci->mc = (struct microcode_intel *)new_mc;
+out:
+	return state;
+}
+
+static void __cpuinit
+microcode_pointer(struct microcode_intel **mc_saved,
+		  unsigned long *mc_saved_in_initrd,
+		  unsigned long initrd_start, int mc_saved_count)
+{
+	int i;
+
+	for (i = 0; i < mc_saved_count; i++)
+		mc_saved[i] = (struct microcode_intel *)
+			      (mc_saved_in_initrd[i] + initrd_start);
+}
+
+#ifdef CONFIG_X86_32
+static void __cpuinit
+microcode_phys(struct microcode_intel **mc_saved_tmp,
+	       struct mc_saved_data *mc_saved_data)
+{
+	int i;
+	struct microcode_intel ***mc_saved;
+
+	mc_saved = (struct microcode_intel ***)
+		   __pa_symbol(&mc_saved_data->mc_saved);
+	for (i = 0; i < mc_saved_data->mc_saved_count; i++) {
+		struct microcode_intel *p;
+
+		p = *(struct microcode_intel **)
+			__pa(mc_saved_data->mc_saved + i);
+		mc_saved_tmp[i] = (struct microcode_intel *)__pa(p);
+	}
+}
+#endif
+
+static enum ucode_state __cpuinit
+load_microcode(struct mc_saved_data *mc_saved_data,
+	       unsigned long *mc_saved_in_initrd,
+	       unsigned long initrd_start,
+	       struct ucode_cpu_info *uci)
+{
+	struct microcode_intel *mc_saved_tmp[MAX_UCODE_COUNT];
+	unsigned int count = mc_saved_data->mc_saved_count;
+
+	if (!mc_saved_data->mc_saved) {
+		microcode_pointer(mc_saved_tmp, mc_saved_in_initrd,
+				  initrd_start, count);
+
+		return generic_load_microcode_early(mc_saved_tmp, count, uci);
+	} else {
+#ifdef CONFIG_X86_32
+		microcode_phys(mc_saved_tmp, mc_saved_data);
+		return generic_load_microcode_early(mc_saved_tmp, count, uci);
+#else
+		return generic_load_microcode_early(mc_saved_data->mc_saved,
+						    count, uci);
+#endif
+	}
+}
+
+static u8 get_x86_family(unsigned long sig)
+{
+	u8 x86;
+
+	x86 = (sig >> 8) & 0xf;
+
+	if (x86 == 0xf)
+		x86 += (sig >> 20) & 0xff;
+
+	return x86;
+}
+
+static u8 get_x86_model(unsigned long sig)
+{
+	u8 x86, x86_model;
+
+	x86 = get_x86_family(sig);
+	x86_model = (sig >> 4) & 0xf;
+
+	if (x86 == 0x6 || x86 == 0xf)
+		x86_model += ((sig >> 16) & 0xf) << 4;
+
+	return x86_model;
+}
+
+/*
+ * Given CPU signature and a microcode patch, this function finds if the
+ * microcode patch has matching family and model with the CPU.
+ */
+static enum ucode_state
+matching_model_microcode(struct microcode_header_intel *mc_header,
+			unsigned long sig)
+{
+	u8 x86, x86_model;
+	u8 x86_ucode, x86_model_ucode;
+	struct extended_sigtable *ext_header;
+	unsigned long total_size = get_totalsize(mc_header);
+	unsigned long data_size = get_datasize(mc_header);
+	int ext_sigcount, i;
+	struct extended_signature *ext_sig;
+
+	x86 = get_x86_family(sig);
+	x86_model = get_x86_model(sig);
+
+	x86_ucode = get_x86_family(mc_header->sig);
+	x86_model_ucode = get_x86_model(mc_header->sig);
+
+	if (x86 == x86_ucode && x86_model == x86_model_ucode)
+		return UCODE_OK;
+
+	/* Look for ext. headers: */
+	if (total_size <= data_size + MC_HEADER_SIZE)
+		return UCODE_NFOUND;
+
+	ext_header = (void *)mc_header + data_size +
+		     MC_HEADER_SIZE;
+	ext_sigcount = ext_header->count;
+	ext_sig = (void *)ext_header + EXT_HEADER_SIZE;
+
+	for (i = 0; i < ext_sigcount; i++) {
+		x86_ucode = get_x86_family(ext_sig->sig);
+		x86_model_ucode = get_x86_model(ext_sig->sig);
+
+		if (x86 == x86_ucode && x86_model == x86_model_ucode)
+			return UCODE_OK;
+
+		ext_sig++;
+	}
+
+	return UCODE_NFOUND;
+}
+
+static int
+save_microcode(struct mc_saved_data *mc_saved_data,
+	       struct microcode_intel **mc_saved_src,
+	       unsigned int mc_saved_count)
+{
+	int i, j;
+	struct microcode_intel **mc_saved_p;
+	int ret;
+
+	if (!mc_saved_count)
+		return -EINVAL;
+
+	/*
+	 * Copy new microcode data.
+	 */
+	mc_saved_p = kmalloc(mc_saved_count*sizeof(struct microcode_intel *),
+			     GFP_KERNEL);
+	if (!mc_saved_p)
+		return -ENOMEM;
+
+	for (i = 0; i < mc_saved_count; i++) {
+		struct microcode_intel *mc = mc_saved_src[i];
+		struct microcode_header_intel *mc_header = &mc->hdr;
+		unsigned long mc_size = get_totalsize(mc_header);
+		mc_saved_p[i] = kmalloc(mc_size, GFP_KERNEL);
+		if (!mc_saved_p[i]) {
+			ret = -ENOMEM;
+			goto err;
+		}
+		if (!mc_saved_src[i]) {
+			ret = -EINVAL;
+			goto err;
+		}
+		memcpy(mc_saved_p[i], mc, mc_size);
+	}
+
+	/*
+	 * Point to newly saved microcode.
+	 */
+	mc_saved_data->mc_saved = mc_saved_p;
+	mc_saved_data->mc_saved_count = mc_saved_count;
+
+	return 0;
+
+err:
+	for (j = 0; j <= i; j++)
+		kfree(mc_saved_p[j]);
+	kfree(mc_saved_p);
+
+	return ret;
+}
+
+/*
+ * A microcode patch in ucode_ptr is saved into mc_saved
+ * - if it has matching signature and newer revision compared to an existing
+ *   patch mc_saved.
+ * - or if it is a newly discovered microcode patch.
+ *
+ * The microcode patch should have matching model with CPU.
+ */
+static void _save_mc(struct microcode_intel **mc_saved, u8 *ucode_ptr,
+		     unsigned int *mc_saved_count_p)
+{
+	int i;
+	int found = 0;
+	unsigned int mc_saved_count = *mc_saved_count_p;
+	struct microcode_header_intel *mc_header;
+
+	mc_header = (struct microcode_header_intel *)ucode_ptr;
+	for (i = 0; i < mc_saved_count; i++) {
+		unsigned int sig, pf;
+		unsigned int new_rev;
+		struct microcode_header_intel *mc_saved_header =
+			     (struct microcode_header_intel *)mc_saved[i];
+		sig = mc_saved_header->sig;
+		pf = mc_saved_header->pf;
+		new_rev = mc_header->rev;
+
+		if (get_matching_sig(sig, pf, ucode_ptr, new_rev)) {
+			found = 1;
+			if (update_match_revision(mc_header, new_rev)) {
+				/*
+				 * Found an older ucode saved before.
+				 * Replace the older one with this newer
+				 * one.
+				 */
+				mc_saved[i] =
+					(struct microcode_intel *)ucode_ptr;
+				break;
+			}
+		}
+	}
+	if (i >= mc_saved_count && !found)
+		/*
+		 * This ucode is seen for the first time in the ucode file.
+		 * Save it to memory.
+		 */
+		mc_saved[mc_saved_count++] =
+				 (struct microcode_intel *)ucode_ptr;
+
+	*mc_saved_count_p = mc_saved_count;
+}
+
+/*
+ * Get microcode matching with BSP's model. Only CPUs with the same model as
+ * BSP can stay in the platform.
+ */
+static enum ucode_state __init
+get_matching_model_microcode(int cpu, unsigned long start,
+			     void *data, size_t size,
+			     struct mc_saved_data *mc_saved_data,
+			     unsigned long *mc_saved_in_initrd,
+			     struct ucode_cpu_info *uci)
+{
+	u8 *ucode_ptr = data;
+	unsigned int leftover = size;
+	enum ucode_state state = UCODE_OK;
+	unsigned int mc_size;
+	struct microcode_header_intel *mc_header;
+	struct microcode_intel *mc_saved_tmp[MAX_UCODE_COUNT];
+	unsigned int mc_saved_count = mc_saved_data->mc_saved_count;
+	int i;
+
+	while (leftover) {
+		mc_header = (struct microcode_header_intel *)ucode_ptr;
+
+		mc_size = get_totalsize(mc_header);
+		if (!mc_size || mc_size > leftover ||
+			microcode_sanity_check(ucode_ptr, 0) < 0)
+			break;
+
+		leftover -= mc_size;
+
+		/*
+		 * Since APs with same family and model as the BSP may boot in
+		 * the platform, we need to find and save microcode patches
+		 * with the same family and model as the BSP.
+		 */
+		if (matching_model_microcode(mc_header, uci->cpu_sig.sig) !=
+			 UCODE_OK) {
+			ucode_ptr += mc_size;
+			continue;
+		}
+
+		_save_mc(mc_saved_tmp, ucode_ptr, &mc_saved_count);
+
+		ucode_ptr += mc_size;
+	}
+
+	if (leftover) {
+		state = UCODE_ERROR;
+		goto out;
+	}
+
+	if (mc_saved_count == 0) {
+		state = UCODE_NFOUND;
+		goto out;
+	}
+
+	for (i = 0; i < mc_saved_count; i++)
+		mc_saved_in_initrd[i] = (unsigned long)mc_saved_tmp[i] - start;
+
+	mc_saved_data->mc_saved_count = mc_saved_count;
+out:
+	return state;
+}
+
+#define native_rdmsr(msr, val1, val2)		\
+do {						\
+	u64 __val = native_read_msr((msr));	\
+	(void)((val1) = (u32)__val);		\
+	(void)((val2) = (u32)(__val >> 32));	\
+} while (0)
+
+#define native_wrmsr(msr, low, high)		\
+	native_write_msr(msr, low, high)
+
+static int __cpuinit collect_cpu_info_early(struct ucode_cpu_info *uci)
+{
+	unsigned int val[2];
+	u8 x86, x86_model;
+	struct cpu_signature csig;
+	unsigned int eax, ebx, ecx, edx;
+
+	csig.sig = 0;
+	csig.pf = 0;
+	csig.rev = 0;
+
+	memset(uci, 0, sizeof(*uci));
+
+	eax = 0x00000001;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	csig.sig = eax;
+
+	x86 = get_x86_family(csig.sig);
+	x86_model = get_x86_model(csig.sig);
+
+	if ((x86_model >= 5) || (x86 > 6)) {
+		/* get processor flags from MSR 0x17 */
+		native_rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
+		csig.pf = 1 << ((val[1] >> 18) & 7);
+	}
+	native_wrmsr(MSR_IA32_UCODE_REV, 0, 0);
+
+	/* As documented in the SDM: Do a CPUID 1 here */
+	sync_core();
+
+	/* get the current revision from MSR 0x8B */
+	native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+
+	csig.rev = val[1];
+
+	uci->cpu_sig = csig;
+	uci->valid = 1;
+
+	return 0;
+}
+
+#ifdef DEBUG
+static void __ref show_saved_mc(void)
+{
+	int i, j;
+	unsigned int sig, pf, rev, total_size, data_size, date;
+	struct ucode_cpu_info uci;
+
+	if (mc_saved_data.mc_saved_count == 0) {
+		pr_debug("no microcode data saved.\n");
+		return;
+	}
+	pr_debug("Total microcode saved: %d\n", mc_saved_data.mc_saved_count);
+
+	collect_cpu_info_early(&uci);
+
+	sig = uci.cpu_sig.sig;
+	pf = uci.cpu_sig.pf;
+	rev = uci.cpu_sig.rev;
+	pr_debug("CPU%d: sig=0x%x, pf=0x%x, rev=0x%x\n",
+		 smp_processor_id(), sig, pf, rev);
+
+	for (i = 0; i < mc_saved_data.mc_saved_count; i++) {
+		struct microcode_header_intel *mc_saved_header;
+		struct extended_sigtable *ext_header;
+		int ext_sigcount;
+		struct extended_signature *ext_sig;
+
+		mc_saved_header = (struct microcode_header_intel *)
+				  mc_saved_data.mc_saved[i];
+		sig = mc_saved_header->sig;
+		pf = mc_saved_header->pf;
+		rev = mc_saved_header->rev;
+		total_size = get_totalsize(mc_saved_header);
+		data_size = get_datasize(mc_saved_header);
+		date = mc_saved_header->date;
+
+		pr_debug("mc_saved[%d]: sig=0x%x, pf=0x%x, rev=0x%x, total size=0x%x, date = %04x-%02x-%02x\n",
+			 i, sig, pf, rev, total_size,
+			 date & 0xffff,
+			 date >> 24,
+			 (date >> 16) & 0xff);
+
+		/* Look for ext. headers: */
+		if (total_size <= data_size + MC_HEADER_SIZE)
+			continue;
+
+		ext_header = (void *)mc_saved_header + data_size +
+			     MC_HEADER_SIZE;
+		ext_sigcount = ext_header->count;
+		ext_sig = (void *)ext_header + EXT_HEADER_SIZE;
+
+		for (j = 0; j < ext_sigcount; j++) {
+			sig = ext_sig->sig;
+			pf = ext_sig->pf;
+
+			pr_debug("\tExtended[%d]: sig=0x%x, pf=0x%x\n",
+				 j, sig, pf);
+
+			ext_sig++;
+		}
+
+	}
+}
+#else
+static inline void show_saved_mc(void)
+{
+}
+#endif
+
+#if defined(CONFIG_MICROCODE_INTEL_EARLY) && defined(CONFIG_HOTPLUG_CPU)
+/*
+ * Save this mc into mc_saved_data so that it is loaded early when a CPU is
+ * hot-added or resumes.
+ *
+ * The caller must make sure that mc is a valid microcode patch before calling
+ * this function.
+ */
+int save_mc_for_early(u8 *mc)
+{
+	struct microcode_intel *mc_saved_tmp[MAX_UCODE_COUNT];
+	unsigned int mc_saved_count_init;
+	unsigned int mc_saved_count;
+	struct microcode_intel **mc_saved;
+	int ret = 0;
+	int i;
+
+	/*
+	 * Hold hotplug lock so mc_saved_data is not accessed by a CPU in
+	 * hotplug.
+	 */
+	cpu_hotplug_driver_lock();
+
+	mc_saved_count_init = mc_saved_data.mc_saved_count;
+	mc_saved_count = mc_saved_data.mc_saved_count;
+	mc_saved = mc_saved_data.mc_saved;
+
+	if (mc_saved && mc_saved_count)
+		memcpy(mc_saved_tmp, mc_saved,
+		       mc_saved_count * sizeof(struct microcode_intel *));
+	/*
+	 * Save the microcode patch mc in mc_saved_tmp if it is a newer
+	 * version.
+	 */
+
+	_save_mc(mc_saved_tmp, mc, &mc_saved_count);
+
+	/*
+	 * Save mc_saved_tmp in the global mc_saved_data.
+	 */
+	ret = save_microcode(&mc_saved_data, mc_saved_tmp, mc_saved_count);
+	if (ret) {
+		pr_err("Cannot save microcode patch.\n");
+		goto out;
+	}
+
+	show_saved_mc();
+
+	/*
+	 * Free old saved microcode data.
+	 */
+	if (mc_saved) {
+		for (i = 0; i < mc_saved_count_init; i++)
+			kfree(mc_saved[i]);
+		kfree(mc_saved);
+	}
+
+out:
+	cpu_hotplug_driver_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(save_mc_for_early);
+#endif
+
+static __initdata char ucode_name[] = "kernel/x86/microcode/GenuineIntel.bin";
+static __init enum ucode_state
+scan_microcode(unsigned long start, unsigned long end,
+		struct mc_saved_data *mc_saved_data,
+		unsigned long *mc_saved_in_initrd,
+		struct ucode_cpu_info *uci)
+{
+	unsigned int size = end - start + 1;
+	struct cpio_data cd;
+	long offset = 0;
+#ifdef CONFIG_X86_32
+	char *p = (char *)__pa_symbol(ucode_name);
+#else
+	char *p = ucode_name;
+#endif
+
+	cd.data = NULL;
+	cd.size = 0;
+
+	cd = find_cpio_data(p, (void *)start, size, &offset);
+	if (!cd.data)
+		return UCODE_ERROR;
+
+
+	return get_matching_model_microcode(0, start, cd.data, cd.size,
+					    mc_saved_data, mc_saved_in_initrd,
+					    uci);
+}
+
+/*
+ * Print ucode update info.
+ */
+static void __cpuinit
+print_ucode_info(struct ucode_cpu_info *uci, unsigned int date)
+{
+	int cpu = smp_processor_id();
+
+	pr_info("CPU%d microcode updated early to revision 0x%x, date = %04x-%02x-%02x\n",
+		cpu,
+		uci->cpu_sig.rev,
+		date & 0xffff,
+		date >> 24,
+		(date >> 16) & 0xff);
+}
+
+#ifdef CONFIG_X86_32
+
+static int delay_ucode_info;
+static int current_mc_date;
+
+/*
+ * Print early-updated ucode info once printk works. This is a delayed dump.
+ */
+void __cpuinit show_ucode_info_early(void)
+{
+	struct ucode_cpu_info uci;
+
+	if (delay_ucode_info) {
+		collect_cpu_info_early(&uci);
+		print_ucode_info(&uci, current_mc_date);
+		delay_ucode_info = 0;
+	}
+}
+
+/*
+ * At this point, we cannot call printk() yet. Record the microcode date in
+ * current_mc_date and delay printing microcode info in
+ * show_ucode_info_early() until printk() works.
+ */
+static void __cpuinit print_ucode(struct ucode_cpu_info *uci)
+{
+	struct microcode_intel *mc_intel;
+	int *delay_ucode_info_p;
+	int *current_mc_date_p;
+
+	mc_intel = uci->mc;
+	if (mc_intel == NULL)
+		return;
+
+	delay_ucode_info_p = (int *)__pa_symbol(&delay_ucode_info);
+	current_mc_date_p = (int *)__pa_symbol(&current_mc_date);
+
+	*delay_ucode_info_p = 1;
+	*current_mc_date_p = mc_intel->hdr.date;
+}
+#else
+
+/*
+ * Flush global tlb. We only do this in x86_64 where paging has been enabled
+ * already and PGE should be enabled as well.
+ */
+static inline void __cpuinit flush_tlb_early(void)
+{
+	__native_flush_tlb_global_irq_disabled();
+}
+
+static inline void __cpuinit print_ucode(struct ucode_cpu_info *uci)
+{
+	struct microcode_intel *mc_intel;
+
+	mc_intel = uci->mc;
+	if (mc_intel == NULL)
+		return;
+
+	print_ucode_info(uci, mc_intel->hdr.date);
+}
+#endif
+
+static int apply_microcode_early(struct mc_saved_data *mc_saved_data,
+				 struct ucode_cpu_info *uci)
+{
+	struct microcode_intel *mc_intel;
+	unsigned int val[2];
+
+	mc_intel = uci->mc;
+	if (mc_intel == NULL)
+		return 0;
+
+	/* write microcode via MSR 0x79 */
+	native_wrmsr(MSR_IA32_UCODE_WRITE,
+	      (unsigned long) mc_intel->bits,
+	      (unsigned long) mc_intel->bits >> 16 >> 16);
+	native_wrmsr(MSR_IA32_UCODE_REV, 0, 0);
+
+	/* As documented in the SDM: Do a CPUID 1 here */
+	sync_core();
+
+	/* get the current revision from MSR 0x8B */
+	native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+	if (val[1] != mc_intel->hdr.rev)
+		return -1;
+
+#ifdef CONFIG_X86_64
+	/* Flush global tlb. This is a precaution. */
+	flush_tlb_early();
+#endif
+	uci->cpu_sig.rev = val[1];
+
+	print_ucode(uci);
+
+	return 0;
+}
+
+/*
+ * This function converts microcode patch offsets previously stored in
+ * mc_saved_in_initrd to pointers and stores the pointers in mc_saved_data.
+ */
+int __init save_microcode_in_initrd(void)
+{
+	unsigned int count = mc_saved_data.mc_saved_count;
+	struct microcode_intel *mc_saved[MAX_UCODE_COUNT];
+	int ret = 0;
+
+	if (count == 0)
+		return ret;
+
+	microcode_pointer(mc_saved, mc_saved_in_initrd, initrd_start, count);
+	ret = save_microcode(&mc_saved_data, mc_saved, count);
+	if (ret)
+		pr_err("Cannot save microcode patches from initrd.\n");
+
+	show_saved_mc();
+
+	return ret;
+}
+
+static void __init
+_load_ucode_intel_bsp(struct mc_saved_data *mc_saved_data,
+		      unsigned long *mc_saved_in_initrd,
+		      unsigned long initrd_start_early,
+		      unsigned long initrd_end_early,
+		      struct ucode_cpu_info *uci)
+{
+	collect_cpu_info_early(uci);
+	scan_microcode(initrd_start_early, initrd_end_early, mc_saved_data,
+		       mc_saved_in_initrd, uci);
+	load_microcode(mc_saved_data, mc_saved_in_initrd,
+		       initrd_start_early, uci);
+	apply_microcode_early(mc_saved_data, uci);
+}
+
+void __init
+load_ucode_intel_bsp(void)
+{
+	u64 ramdisk_image, ramdisk_size;
+	unsigned long initrd_start_early, initrd_end_early;
+	struct ucode_cpu_info uci;
+#ifdef CONFIG_X86_32
+	struct boot_params *boot_params_p;
+
+	boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
+	ramdisk_image = boot_params_p->hdr.ramdisk_image;
+	ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+	initrd_start_early = ramdisk_image;
+	initrd_end_early = initrd_start_early + ramdisk_size;
+
+	_load_ucode_intel_bsp(
+		(struct mc_saved_data *)__pa_symbol(&mc_saved_data),
+		(unsigned long *)__pa_symbol(&mc_saved_in_initrd),
+		initrd_start_early, initrd_end_early, &uci);
+#else
+	ramdisk_image = boot_params.hdr.ramdisk_image;
+	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	initrd_start_early = ramdisk_image + PAGE_OFFSET;
+	initrd_end_early = initrd_start_early + ramdisk_size;
+
+	_load_ucode_intel_bsp(&mc_saved_data, mc_saved_in_initrd,
+			      initrd_start_early, initrd_end_early, &uci);
+#endif
+}
+
+void __cpuinit load_ucode_intel_ap(void)
+{
+	struct mc_saved_data *mc_saved_data_p;
+	struct ucode_cpu_info uci;
+	unsigned long *mc_saved_in_initrd_p;
+	unsigned long initrd_start_addr;
+#ifdef CONFIG_X86_32
+	unsigned long *initrd_start_p;
+
+	mc_saved_in_initrd_p =
+		(unsigned long *)__pa_symbol(mc_saved_in_initrd);
+	mc_saved_data_p = (struct mc_saved_data *)__pa_symbol(&mc_saved_data);
+	initrd_start_p = (unsigned long *)__pa_symbol(&initrd_start);
+	initrd_start_addr = (unsigned long)__pa_symbol(*initrd_start_p);
+#else
+	mc_saved_data_p = &mc_saved_data;
+	mc_saved_in_initrd_p = mc_saved_in_initrd;
+	initrd_start_addr = initrd_start;
+#endif
+
+	/*
+	 * If there is no valid ucode previously saved in memory, no need to
+	 * update ucode on this AP.
+	 */
+	if (mc_saved_data_p->mc_saved_count == 0)
+		return;
+
+	collect_cpu_info_early(&uci);
+	load_microcode(mc_saved_data_p, mc_saved_in_initrd_p,
+		       initrd_start_addr, &uci);
+	apply_microcode_early(mc_saved_data_p, &uci);
+}
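
Note (illustrative, not part of the patch): print_ucode_info() and
show_saved_mc() print the header's date field with %x conversions because
the field is BCD-encoded as mmddyyyy, with the month in the top byte, the
day in the next byte and the year in the low 16 bits, as described in the
microcode update format of the Intel SDM. A small user-space sketch using
the hypothetical helper format_ucode_date():

```c
#include <stdio.h>

/*
 * Format a BCD mmddyyyy microcode date as "yyyy-mm-dd".
 * Returns the number of characters snprintf() would have written.
 */
static int format_ucode_date(unsigned int date, char *buf, size_t len)
{
	return snprintf(buf, len, "%04x-%02x-%02x",
			date & 0xffff,		/* year,  BCD */
			date >> 24,		/* month, BCD */
			(date >> 16) & 0xff);	/* day,   BCD */
}
```

With this layout a date field of 0x07181998 formats as 1998-07-18.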
