linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/11] EFI runtime services virtual mapping
@ 2013-09-19 14:54 Borislav Petkov
  2013-09-19 14:54 ` [PATCH 01/11] efi: Simplify EFI_DEBUG Borislav Petkov
                   ` (12 more replies)
  0 siblings, 13 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

Hi all,

here's finally a new version of the runtime services VA mapping patchset,
which hopefully implements hpa's idea of statically mapping EFI runtime
regions top-down, starting at -4G virtual.
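
To illustrate the top-down placement, here is a minimal userspace sketch,
assuming 4K pages and that each region is simply placed directly below
the previous one. The names (sketch_assign_va() and friends) are
hypothetical and not taken from patch 11, which may additionally align
regions; it only shows how addresses are handed out downwards from -4G.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4K pages, as on x86-64. */
#define SKETCH_PAGE_SHIFT	12

/* -4G interpreted as a 64-bit unsigned virtual address: 0xffffffff00000000. */
#define SKETCH_EFI_VA_START	((uint64_t)-(4ULL << 30))

/* Next free virtual address; grows downwards from -4G. */
static uint64_t sketch_efi_va = SKETCH_EFI_VA_START;

/*
 * Hypothetical helper: hand out a virtual address for a region of
 * @num_pages pages, placing it directly below the previous region.
 */
static uint64_t sketch_assign_va(uint64_t num_pages)
{
	assert(num_pages > 0);
	sketch_efi_va -= num_pages << SKETCH_PAGE_SHIFT;
	return sketch_efi_va;
}
```

Since the addresses depend only on the order and sizes of the regions,
feeding in the same memory map always reproduces the same layout.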

We're also using a separate pagetable so as not to pollute the kernel
address space. We switch to it before each EFI call and switch back to
the previous one afterwards.
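
The switch-call-restore ordering can be sketched in userspace with a
global pointer standing in for CR3. This is a simulation under stated
assumptions: all names are hypothetical, and the real switch involves
loading CR3, not assigning a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated page table root; in the kernel this would be a PGD. */
struct sketch_pgd { const char *name; };

static struct sketch_pgd sketch_kernel_pgd = { "kernel" };
static struct sketch_pgd sketch_efi_pgd    = { "efi" };

/* Simulated "currently loaded" table, standing in for CR3. */
static struct sketch_pgd *sketch_cur_pgd = &sketch_kernel_pgd;

/* Records which table was active while the "firmware" ran. */
static struct sketch_pgd *sketch_seen_during_call;

/* Hypothetical firmware service. */
static long sketch_firmware_call(long arg)
{
	sketch_seen_during_call = sketch_cur_pgd;
	return arg * 2;
}

/*
 * Wrap an EFI call: load the EFI table, make the call, then restore
 * whatever table was active before, regardless of which one it was.
 */
static long sketch_efi_call(long (*fn)(long), long arg)
{
	struct sketch_pgd *saved = sketch_cur_pgd;
	long ret;

	assert(fn != NULL);
	sketch_cur_pgd = &sketch_efi_pgd;
	ret = fn(arg);
	sketch_cur_pgd = saved;
	return ret;
}
```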

To the patches:

1-2 are simple cleanups which Matt can probably take now.

3-10 add the machinery to map regions into an arbitrary PGD. I've
deliberately split them into very small bites so that they can be
reviewed more thoroughly and easily, because my pagetable skills are
pretty basic.

11 is the actual patch which implements that mapping so that we can use
runtime services in kexec (which is the whole reason for this fuss :))

So please take a long hard look at those, hammer on them on your
boxes and let me know. They boot fine on my Dell UEFI box and in OVMF
(obviously :)).

Thanks.

Borislav Petkov (11):
  efi: Simplify EFI_DEBUG
  efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
  x86, pageattr: Lookup address in an arbitrary PGD
  x86, pageattr: Add a PGD pagetable populating function
  x86, pageattr: Add a PUD pagetable populating function
  x86, pageattr: Add a PMD pagetable populating function
  x86, pageattr: Add a PTE pagetable populating function
  x86, pageattr: Add a PUD error unwinding path
  x86, pageattr: Add last levels of error path
  x86, cpa: Map in an arbitrary pgd
  EFI: Runtime services virtual mapping

 arch/x86/boot/compressed/eboot.c     |  12 +-
 arch/x86/boot/compressed/eboot.h     |   1 -
 arch/x86/include/asm/efi.h           |  58 +++--
 arch/x86/include/asm/pgtable_types.h |   3 +-
 arch/x86/mm/pageattr.c               | 461 +++++++++++++++++++++++++++++++++--
 arch/x86/platform/efi/efi.c          | 126 +++++-----
 arch/x86/platform/efi/efi_64.c       |  56 +----
 arch/x86/platform/efi/efi_stub_64.S  |  47 ++++
 include/linux/efi.h                  |   6 +-
 9 files changed, 615 insertions(+), 155 deletions(-)

-- 
1.8.4


* [PATCH 01/11] efi: Simplify EFI_DEBUG
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-19 14:54 ` [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE Borislav Petkov
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

... and lose one #ifdef .. #endif sandwich.
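
A small sketch of the pattern, with a hypothetical entry count standing
in for the memory map. #ifdef only checks whether the macro is defined,
which is why the value can be dropped from #define EFI_DEBUG; with an
empty definition the old #if EFI_DEBUG would not even compile.

```c
#include <assert.h>

/* Defined without a value: #ifdef only asks whether the macro exists. */
#define EFI_DEBUG

/* Hypothetical number of entries in the EFI memory map. */
static int sketch_nr_entries = 3;

/*
 * The #ifdef lives inside the function, so callers may invoke it
 * unconditionally; without EFI_DEBUG the body compiles away and the
 * function returns 0. Returns the number of entries "printed".
 */
static int sketch_print_memmap(void)
{
	int printed = 0;
#ifdef EFI_DEBUG
	int i;

	assert(sketch_nr_entries >= 0);
	for (i = 0; i < sketch_nr_entries; i++)
		printed++;	/* the real code calls pr_info() for entry i */
#endif
	return printed;
}

/* The caller needs no #if .. #endif sandwich of its own. */
static int sketch_efi_init(void)
{
	return sketch_print_memmap();
}
```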

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/platform/efi/efi.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 90f6ed127096..7cec1e9e5494 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -51,7 +51,7 @@
 #include <asm/x86_init.h>
 #include <asm/rtc.h>
 
-#define EFI_DEBUG	1
+#define EFI_DEBUG
 
 #define EFI_MIN_RESERVE 5120
 
@@ -402,9 +402,9 @@ int __init efi_memblock_x86_reserve_range(void)
 	return 0;
 }
 
-#if EFI_DEBUG
 static void __init print_efi_memmap(void)
 {
+#ifdef EFI_DEBUG
 	efi_memory_desc_t *md;
 	void *p;
 	int i;
@@ -419,8 +419,8 @@ static void __init print_efi_memmap(void)
 			md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT),
 			(md->num_pages >> (20 - EFI_PAGE_SHIFT)));
 	}
-}
 #endif  /*  EFI_DEBUG  */
+}
 
 void __init efi_reserve_boot_services(void)
 {
@@ -774,10 +774,7 @@ void __init efi_init(void)
 		x86_platform.set_wallclock = efi_set_rtc_mmss;
 	}
 #endif
-
-#if EFI_DEBUG
 	print_efi_memmap();
-#endif
 }
 
 void __init efi_late_init(void)
-- 
1.8.4


* [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
  2013-09-19 14:54 ` [PATCH 01/11] efi: Simplify EFI_DEBUG Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-20 10:42   ` Matt Fleming
  2013-09-19 14:54 ` [PATCH 03/11] x86, pageattr: Lookup address in an arbitrary PGD Borislav Petkov
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

... and use the good old standard defines which we all know. Also,
simplify math to shift by PAGE_SHIFT instead of multiplying by
PAGE_SIZE.
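
The rewrites are arithmetic identities as long as PAGE_SHIFT is 12,
which holds on x86 with 4K pages and matches the fixed 4K UEFI page size
the old EFI_PAGE_SHIFT encoded; where PAGE_SHIFT differs from 12 the
substitution would change the results. A quick userspace check, with
hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4K pages, so PAGE_SHIFT == 12 as on x86. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1ULL << SKETCH_PAGE_SHIFT)

/* Old style: multiply by the page size. */
static uint64_t bytes_mul(uint64_t num_pages)
{
	return num_pages * (1ULL << SKETCH_PAGE_SHIFT);
}

/* New style: shift by the page shift. */
static uint64_t bytes_shift(uint64_t num_pages)
{
	return num_pages << SKETCH_PAGE_SHIFT;
}

/* Pages needed for @size bytes: round_up(size, PAGE_SIZE) / PAGE_SIZE. */
static uint64_t pages_for(uint64_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Size in MB, as print_efi_memmap() computes it. */
static uint64_t pages_to_mb(uint64_t num_pages)
{
	return num_pages >> (20 - SKETCH_PAGE_SHIFT);
}
```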

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/boot/compressed/eboot.c | 12 ++++++------
 arch/x86/boot/compressed/eboot.h |  1 -
 arch/x86/platform/efi/efi.c      | 22 +++++++++++-----------
 include/linux/efi.h              |  6 ++----
 4 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index b7388a425f09..5c440bf769a8 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -96,7 +96,7 @@ static efi_status_t high_alloc(unsigned long size, unsigned long align,
 	if (status != EFI_SUCCESS)
 		goto fail;
 
-	nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
+	nr_pages = round_up(size, PAGE_SIZE) / PAGE_SIZE;
 again:
 	for (i = 0; i < map_size / desc_size; i++) {
 		efi_memory_desc_t *desc;
@@ -111,7 +111,7 @@ again:
 			continue;
 
 		start = desc->phys_addr;
-		end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
+		end = start + (desc->num_pages << PAGE_SHIFT);
 
 		if ((start + size) > end || (start + size) > max)
 			continue;
@@ -173,7 +173,7 @@ static efi_status_t low_alloc(unsigned long size, unsigned long align,
 	if (status != EFI_SUCCESS)
 		goto fail;
 
-	nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
+	nr_pages = round_up(size, PAGE_SIZE) / PAGE_SIZE;
 	for (i = 0; i < map_size / desc_size; i++) {
 		efi_memory_desc_t *desc;
 		unsigned long m = (unsigned long)map;
@@ -188,7 +188,7 @@ static efi_status_t low_alloc(unsigned long size, unsigned long align,
 			continue;
 
 		start = desc->phys_addr;
-		end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
+		end = start + (desc->num_pages << PAGE_SHIFT);
 
 		/*
 		 * Don't allocate at 0x0. It will confuse code that
@@ -224,7 +224,7 @@ static void low_free(unsigned long size, unsigned long addr)
 {
 	unsigned long nr_pages;
 
-	nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
+	nr_pages = round_up(size, PAGE_SIZE) / PAGE_SIZE;
 	efi_call_phys2(sys_table->boottime->free_pages, addr, nr_pages);
 }
 
@@ -1128,7 +1128,7 @@ static efi_status_t relocate_kernel(struct setup_header *hdr)
 	 * possible.
 	 */
 	start = hdr->pref_address;
-	nr_pages = round_up(hdr->init_size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
+	nr_pages = round_up(hdr->init_size, PAGE_SIZE) / PAGE_SIZE;
 
 	status = efi_call_phys4(sys_table->boottime->allocate_pages,
 				EFI_ALLOCATE_ADDRESS, EFI_LOADER_DATA,
diff --git a/arch/x86/boot/compressed/eboot.h b/arch/x86/boot/compressed/eboot.h
index e5b0a8f91c5f..786398c1bb9a 100644
--- a/arch/x86/boot/compressed/eboot.h
+++ b/arch/x86/boot/compressed/eboot.h
@@ -11,7 +11,6 @@
 
 #define DESC_TYPE_CODE_DATA	(1 << 0)
 
-#define EFI_PAGE_SIZE		(1UL << EFI_PAGE_SHIFT)
 #define EFI_READ_CHUNK_SIZE	(1024 * 1024)
 
 #define EFI_CONSOLE_OUT_DEVICE_GUID    \
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 7cec1e9e5494..538c1e6b7b2c 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -339,7 +339,7 @@ static void __init do_add_efi_memmap(void)
 	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
 		efi_memory_desc_t *md = p;
 		unsigned long long start = md->phys_addr;
-		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
+		unsigned long long size = md->num_pages << PAGE_SHIFT;
 		int e820_type;
 
 		switch (md->type) {
@@ -416,8 +416,8 @@ static void __init print_efi_memmap(void)
 		pr_info("mem%02u: type=%u, attr=0x%llx, "
 			"range=[0x%016llx-0x%016llx) (%lluMB)\n",
 			i, md->type, md->attribute, md->phys_addr,
-			md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT),
-			(md->num_pages >> (20 - EFI_PAGE_SHIFT)));
+			md->phys_addr + (md->num_pages << PAGE_SHIFT),
+			(md->num_pages >> (20 - PAGE_SHIFT)));
 	}
 #endif  /*  EFI_DEBUG  */
 }
@@ -429,7 +429,7 @@ void __init efi_reserve_boot_services(void)
 	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
 		efi_memory_desc_t *md = p;
 		u64 start = md->phys_addr;
-		u64 size = md->num_pages << EFI_PAGE_SHIFT;
+		u64 size = md->num_pages << PAGE_SHIFT;
 
 		if (md->type != EFI_BOOT_SERVICES_CODE &&
 		    md->type != EFI_BOOT_SERVICES_DATA)
@@ -473,7 +473,7 @@ void __init efi_free_boot_services(void)
 	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
 		efi_memory_desc_t *md = p;
 		unsigned long long start = md->phys_addr;
-		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
+		unsigned long long size = md->num_pages << PAGE_SHIFT;
 
 		if (md->type != EFI_BOOT_SERVICES_CODE &&
 		    md->type != EFI_BOOT_SERVICES_DATA)
@@ -825,7 +825,7 @@ void __iomem *efi_lookup_mapped_addr(u64 phys_addr)
 		return NULL;
 	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
 		efi_memory_desc_t *md = p;
-		u64 size = md->num_pages << EFI_PAGE_SHIFT;
+		u64 size = md->num_pages << PAGE_SHIFT;
 		u64 end = md->phys_addr + size;
 		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
 		    md->type != EFI_BOOT_SERVICES_CODE &&
@@ -843,7 +843,7 @@ void __iomem *efi_lookup_mapped_addr(u64 phys_addr)
 
 void efi_memory_uc(u64 addr, unsigned long size)
 {
-	unsigned long page_shift = 1UL << EFI_PAGE_SHIFT;
+	unsigned long page_shift = 1UL << PAGE_SHIFT;
 	u64 npages;
 
 	npages = round_up(size, page_shift) / page_shift;
@@ -896,7 +896,7 @@ void __init efi_enter_virtual_mode(void)
 			continue;
 		}
 
-		prev_size = prev_md->num_pages << EFI_PAGE_SHIFT;
+		prev_size = prev_md->num_pages << PAGE_SHIFT;
 
 		if (md->phys_addr == (prev_md->phys_addr + prev_size)) {
 			prev_md->num_pages += md->num_pages;
@@ -914,7 +914,7 @@ void __init efi_enter_virtual_mode(void)
 		    md->type != EFI_BOOT_SERVICES_DATA)
 			continue;
 
-		size = md->num_pages << EFI_PAGE_SHIFT;
+		size = md->num_pages << PAGE_SHIFT;
 		end = md->phys_addr + size;
 
 		start_pfn = PFN_DOWN(md->phys_addr);
@@ -1011,7 +1011,7 @@ u32 efi_mem_type(unsigned long phys_addr)
 		md = p;
 		if ((md->phys_addr <= phys_addr) &&
 		    (phys_addr < (md->phys_addr +
-				  (md->num_pages << EFI_PAGE_SHIFT))))
+				  (md->num_pages << PAGE_SHIFT))))
 			return md->type;
 	}
 	return 0;
@@ -1026,7 +1026,7 @@ u64 efi_mem_attributes(unsigned long phys_addr)
 		md = p;
 		if ((md->phys_addr <= phys_addr) &&
 		    (phys_addr < (md->phys_addr +
-				  (md->num_pages << EFI_PAGE_SHIFT))))
+				  (md->num_pages << PAGE_SHIFT))))
 			return md->attribute;
 	}
 	return 0;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 5f8f176154f7..fa47d80ab4b5 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -95,8 +95,6 @@ typedef	struct {
 #define EFI_MEMORY_RUNTIME	((u64)0x8000000000000000ULL)	/* range requires runtime mapping */
 #define EFI_MEMORY_DESCRIPTOR_VERSION	1
 
-#define EFI_PAGE_SHIFT		12
-
 typedef struct {
 	u32 type;
 	u32 pad;
@@ -611,7 +609,7 @@ static inline int efi_range_is_wc(unsigned long start, unsigned long len)
 {
 	unsigned long i;
 
-	for (i = 0; i < len; i += (1UL << EFI_PAGE_SHIFT)) {
+	for (i = 0; i < len; i += PAGE_SIZE) {
 		unsigned long paddr = __pa(start + i);
 		if (!(efi_mem_attributes(paddr) & EFI_MEMORY_WC))
 			return 0;
@@ -728,7 +726,7 @@ struct efi_generic_dev_path {
 
 static inline void memrange_efi_to_native(u64 *addr, u64 *npages)
 {
-	*npages = PFN_UP(*addr + (*npages<<EFI_PAGE_SHIFT)) - PFN_DOWN(*addr);
+	*npages = PFN_UP(*addr + (*npages << PAGE_SHIFT)) - PFN_DOWN(*addr);
 	*addr &= PAGE_MASK;
 }
 
-- 
1.8.4


* [PATCH 03/11] x86, pageattr: Lookup address in an arbitrary PGD
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
  2013-09-19 14:54 ` [PATCH 01/11] efi: Simplify EFI_DEBUG Borislav Petkov
  2013-09-19 14:54 ` [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-19 14:54 ` [PATCH 04/11] x86, pageattr: Add a PGD pagetable populating function Borislav Petkov
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

This is preparatory work for being able to map pages into a specified
PGD rather than implicitly and exclusively into init_mm's.
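
The refactoring boils down to passing the root of the walk in
explicitly. A userspace sketch with a simulated 4-level table (9 index
bits per level, 4K pages as on x86-64); all sk_* names are hypothetical,
and unlike __lookup_address_in_pgd() the sketch ignores large pages and
does not report the level.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_PTRS		512
#define SK_PAGE_SHIFT	12

/* Index shifts for the PGD, PUD and PMD levels on x86-64. */
static const unsigned sk_shifts[3] = { 39, 30, 21 };

/* A simulated table page: each slot points to the next level or a leaf. */
struct sk_table { void *slot[SK_PTRS]; };

static struct sk_table sk_kernel_pgd;	/* stands in for init_mm's PGD */
static struct sk_table sk_efi_pgd;	/* a second, independent root */

static unsigned sk_index(uint64_t addr, unsigned shift)
{
	return (addr >> shift) & (SK_PTRS - 1);
}

/* Walk from an explicitly supplied root; returns the leaf slot or NULL. */
static void **sk_lookup_in_pgd(struct sk_table *pgd, uint64_t addr)
{
	struct sk_table *t = pgd;
	unsigned i;

	for (i = 0; i < 3; i++) {
		t = t->slot[sk_index(addr, sk_shifts[i])];
		if (!t)
			return NULL;
	}
	return &t->slot[sk_index(addr, SK_PAGE_SHIFT)];
}

/* The old entry point keeps its behaviour by passing the kernel root. */
static void **sk_lookup(uint64_t addr)
{
	return sk_lookup_in_pgd(&sk_kernel_pgd, addr);
}

/* Example helper: create the intermediate levels for @addr in @pgd. */
static void **sk_map(struct sk_table *pgd, uint64_t addr)
{
	struct sk_table *t = pgd;
	unsigned i;

	for (i = 0; i < 3; i++) {
		void **s = &t->slot[sk_index(addr, sk_shifts[i])];

		if (!*s)
			*s = calloc(1, sizeof(struct sk_table));
		assert(*s != NULL);
		t = *s;
	}
	return &t->slot[sk_index(addr, SK_PAGE_SHIFT)];
}
```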

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pageattr.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index bb32480c2d71..c53de62a1170 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -30,6 +30,7 @@
  */
 struct cpa_data {
 	unsigned long	*vaddr;
+	pgd_t		*pgd;
 	pgprot_t	mask_set;
 	pgprot_t	mask_clr;
 	int		numpages;
@@ -322,17 +323,9 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
 	return prot;
 }
 
-/*
- * Lookup the page table entry for a virtual address. Return a pointer
- * to the entry and the level of the mapping.
- *
- * Note: We return pud and pmd either when the entry is marked large
- * or when the present bit is not set. Otherwise we would return a
- * pointer to a nonexisting mapping.
- */
-pte_t *lookup_address(unsigned long address, unsigned int *level)
+static pte_t *__lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
+				      unsigned int *level)
 {
-	pgd_t *pgd = pgd_offset_k(address);
 	pud_t *pud;
 	pmd_t *pmd;
 
@@ -361,8 +354,31 @@ pte_t *lookup_address(unsigned long address, unsigned int *level)
 
 	return pte_offset_kernel(pmd, address);
 }
+
+/*
+ * Lookup the page table entry for a virtual address. Return a pointer
+ * to the entry and the level of the mapping.
+ *
+ * Note: We return pud and pmd either when the entry is marked large
+ * or when the present bit is not set. Otherwise we would return a
+ * pointer to a nonexisting mapping.
+ */
+pte_t *lookup_address(unsigned long address, unsigned int *level)
+{
+	return __lookup_address_in_pgd(pgd_offset_k(address), address, level);
+}
 EXPORT_SYMBOL_GPL(lookup_address);
 
+static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address,
+				  unsigned int *level)
+{
+	if (cpa->pgd)
+		return __lookup_address_in_pgd(cpa->pgd + pgd_index(address),
+					       address, level);
+
+	return lookup_address(address, level);
+}
+
 /*
  * This is necessary because __pa() does not work on some
  * kinds of memory, like vmalloc() or the alloc_remap()
-- 
1.8.4


* [PATCH 04/11] x86, pageattr: Add a PGD pagetable populating function
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (2 preceding siblings ...)
  2013-09-19 14:54 ` [PATCH 03/11] x86, pageattr: Lookup address in an arbitrary PGD Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-19 14:54 ` [PATCH 05/11] x86, pageattr: Add a PUD " Borislav Petkov
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

This allocates, if necessary, and populates the corresponding PGD entry
with a PUD page. The next population level is a dummy macro which the
next patch replaces; it is added here to keep this patch small and
easily reviewable without breaking bisection.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pageattr.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index c53de62a1170..21a31e85283c 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -666,6 +666,45 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	return 0;
 }
 
+#define populate_pud(cpa, addr, pgd, pgprot)	(-1)
+
+/*
+ * Restrictions for kernel page table do not necessarily apply when mapping in
+ * an alternate PGD.
+ */
+static int populate_pgd(struct cpa_data *cpa, unsigned long addr)
+{
+	pgprot_t pgprot = __pgprot(_KERNPG_TABLE);
+	bool allocd_pgd = false;
+	pgd_t *pgd_entry;
+	pud_t *pud;
+	int ret;
+
+	pgd_entry = cpa->pgd + pgd_index(addr);
+
+	/*
+	 * Allocate a PUD page and hand it down for mapping.
+	 */
+	if (pgd_none(*pgd_entry)) {
+		pud = (pud_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK);
+		if (!pud)
+			return -1;
+
+		set_pgd(pgd_entry, __pgd(__pa(pud) | _KERNPG_TABLE));
+		allocd_pgd = true;
+	}
+
+	pgprot_val(pgprot) &= ~pgprot_val(cpa->mask_clr);
+	pgprot_val(pgprot) |=  pgprot_val(cpa->mask_set);
+
+	ret = populate_pud(cpa, addr, pgd_entry, pgprot);
+	if (ret < 0)
+		return ret;
+
+	cpa->numpages = ret;
+	return 0;
+}
+
 static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
 			       int primary)
 {
-- 
1.8.4


* [PATCH 05/11] x86, pageattr: Add a PUD pagetable populating function
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (3 preceding siblings ...)
  2013-09-19 14:54 ` [PATCH 04/11] x86, pageattr: Add a PGD pagetable populating function Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-19 14:54 ` [PATCH 06/11] x86, pageattr: Add a PMD " Borislav Petkov
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

Add the next level of the pagetable populating function. The unaligned
head and tail of the range, i.e. the parts before the first and after
the last 1G boundary, are mapped with the lower-level functions; the
1G-aligned part in between is mapped with 1G pages, thus using as few
pagetable pages as possible.
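
The three phases of populate_pud() can be checked in isolation by
counting instead of mapping. A minimal sketch, assuming 4K base pages
and 1G PUD pages as on x86-64; sk_split_pud() is hypothetical and leaves
out the PMD allocation and the populate_pmd() calls for head and tail.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT	12
#define SK_PUD_SIZE	(1ULL << 30)		/* 1G */
#define SK_PUD_MASK	(~(SK_PUD_SIZE - 1))

struct sk_split {
	uint64_t head_pages;	/* 4K pages before the first 1G boundary */
	uint64_t gb_pages;	/* whole 1G pages */
	uint64_t tail_pages;	/* 4K pages after the last 1G boundary */
};

/* Mirror populate_pud()'s phases for [start, start + numpages * 4K). */
static struct sk_split sk_split_pud(uint64_t start, uint64_t numpages)
{
	uint64_t end = start + (numpages << SK_PAGE_SHIFT);
	struct sk_split s = { 0, 0, 0 };

	assert(numpages > 0);

	if (start & (SK_PUD_SIZE - 1)) {	/* head up to the boundary */
		uint64_t next = (start + SK_PUD_SIZE) & SK_PUD_MASK;
		uint64_t pre_end = end < next ? end : next;

		s.head_pages = (pre_end - start) >> SK_PAGE_SHIFT;
		start = pre_end;
	}

	while (end - start >= SK_PUD_SIZE) {	/* aligned 1G body */
		s.gb_pages++;
		start += SK_PUD_SIZE;
	}

	if (start < end)			/* trailing leftover */
		s.tail_pages = (end - start) >> SK_PAGE_SHIFT;

	return s;
}
```

For example, a range starting 1M below a 1G boundary and spanning
1M + 2G + 8K (524546 pages) splits into 256 head pages, two 1G pages and
2 tail pages.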

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pageattr.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 86 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 21a31e85283c..41c6fdbbfab0 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -666,7 +666,92 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	return 0;
 }
 
-#define populate_pud(cpa, addr, pgd, pgprot)	(-1)
+static int alloc_pmd_page(pud_t *pud)
+{
+	pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK);
+	if (!pmd)
+		return -1;
+
+	set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	return 0;
+}
+
+#define populate_pmd(cpa, start, end, pages, pud, pgprot)	(-1)
+
+static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
+			pgprot_t pgprot)
+{
+	pud_t *pud;
+	unsigned long end;
+	int cur_pages = 0;
+
+	end = start + (cpa->numpages << PAGE_SHIFT);
+
+	/*
+	 * Not on a 1G page boundary? => map everything up to it with
+	 * smaller pages.
+	 */
+	if (start & (PUD_SIZE - 1)) {
+		unsigned long pre_end;
+		unsigned long next_page = (start + PUD_SIZE) & PUD_MASK;
+
+		pre_end   = min_t(unsigned long, end, next_page);
+		cur_pages = (pre_end - start) >> PAGE_SHIFT;
+		cur_pages = min_t(int, (int)cpa->numpages, cur_pages);
+
+		pud = pud_offset(pgd, start);
+
+		/*
+		 * Need a PMD page?
+		 */
+		if (pud_none(*pud))
+			if (alloc_pmd_page(pud))
+				return -1;
+
+		cur_pages = populate_pmd(cpa, start, pre_end, cur_pages,
+					 pud, pgprot);
+		if (cur_pages < 0)
+			return cur_pages;
+
+		start = pre_end;
+	}
+
+	/* We mapped them all? */
+	if (cpa->numpages == cur_pages)
+		return cur_pages;
+
+	pud = pud_offset(pgd, start);
+
+	/*
+	 * Map everything starting from the 1G boundary, possibly with 1G pages
+	 */
+	while (end - start >= PUD_SIZE) {
+		set_pud(pud, __pud(cpa->pfn | _PAGE_PSE | massage_pgprot(pgprot)));
+
+		start	  += PUD_SIZE;
+		cpa->pfn  += PUD_SIZE;
+		cur_pages += PUD_SIZE >> PAGE_SHIFT;
+		pud++;
+	}
+
+	/* Map trailing leftover */
+	if (start < end) {
+		int tmp;
+
+		pud = pud_offset(pgd, start);
+		if (pud_none(*pud))
+			if (alloc_pmd_page(pud))
+				return -1;
+
+		tmp = populate_pmd(cpa, start, end, cpa->numpages - cur_pages,
+				   pud, pgprot);
+		if (tmp < 0)
+			return cur_pages;
+
+		cur_pages += tmp;
+	}
+	return cur_pages;
+}
 
 /*
  * Restrictions for kernel page table do not necessarily apply when mapping in
-- 
1.8.4


* [PATCH 06/11] x86, pageattr: Add a PMD pagetable populating function
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (4 preceding siblings ...)
  2013-09-19 14:54 ` [PATCH 05/11] x86, pageattr: Add a PUD " Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-19 14:54 ` [PATCH 07/11] x86, pageattr: Add a PTE " Borislav Petkov
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

Handle PMD-level mappings the same as PUD ones.
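
Since the PMD level repeats the PUD logic with 2M pages, the whole
cascade can be expressed as one recursive function that counts how many
mappings end up at each level. A hypothetical userspace sketch, assuming
x86-64 sizes (1G, 2M, 4K) and that the unaligned head and tail at each
level are simply handed down one level.

```c
#include <assert.h>
#include <stdint.h>

/* Level sizes: 0 = 1G (PUD), 1 = 2M (PMD), 2 = 4K (PTE). */
static const uint64_t sk_level_size[3] = { 1ULL << 30, 1ULL << 21, 1ULL << 12 };

/*
 * Map [start, end) at @level: whole, aligned chunks of that level's size
 * become one large mapping each; the unaligned head and tail are handed
 * down to the next smaller level. @count[l] accumulates mappings at l.
 */
static void sk_map_range(uint64_t start, uint64_t end, int level,
			 uint64_t count[3])
{
	uint64_t size = sk_level_size[level];

	assert(start <= end);
	assert(start % sk_level_size[2] == 0 && end % sk_level_size[2] == 0);

	if (level == 2) {			/* last level: 4K pages only */
		count[2] += (end - start) / size;
		return;
	}

	if (start & (size - 1)) {		/* head, up to the boundary */
		uint64_t next = (start + size) & ~(size - 1);
		uint64_t pre_end = end < next ? end : next;

		sk_map_range(start, pre_end, level + 1, count);
		start = pre_end;
	}

	while (end - start >= size) {		/* aligned body */
		count[level]++;
		start += size;
	}

	if (start < end)			/* tail */
		sk_map_range(start, end, level + 1, count);
}
```

For [0x3ff00000, 0xc0002000) this yields two 1G mappings, no 2M mappings
and 258 4K pages.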

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pageattr.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 41c6fdbbfab0..c56d71591617 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -666,6 +666,16 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	return 0;
 }
 
+static int alloc_pte_page(pmd_t *pmd)
+{
+	pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK);
+	if (!pte)
+		return -1;
+
+	set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
+	return 0;
+}
+
 static int alloc_pmd_page(pud_t *pud)
 {
 	pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK);
@@ -676,7 +686,77 @@ static int alloc_pmd_page(pud_t *pud)
 	return 0;
 }
 
-#define populate_pmd(cpa, start, end, pages, pud, pgprot)	(-1)
+#define populate_pte(cpa, start, end, pages, pmd, pgprot)	do {} while (0)
+
+static int populate_pmd(struct cpa_data *cpa,
+			unsigned long start, unsigned long end,
+			unsigned num_pages, pud_t *pud, pgprot_t pgprot)
+{
+	unsigned int cur_pages = 0;
+	pmd_t *pmd;
+
+	/*
+	 * Not on a 2M boundary?
+	 */
+	if (start & (PMD_SIZE - 1)) {
+		unsigned long pre_end = start + (num_pages << PAGE_SHIFT);
+		unsigned long next_page = (start + PMD_SIZE) & PMD_MASK;
+
+		pre_end   = min_t(unsigned long, pre_end, next_page);
+		cur_pages = (pre_end - start) >> PAGE_SHIFT;
+		cur_pages = min_t(unsigned int, num_pages, cur_pages);
+
+		/*
+		 * Need a PTE page?
+		 */
+		pmd = pmd_offset(pud, start);
+		if (pmd_none(*pmd))
+			if (alloc_pte_page(pmd))
+				return -1;
+
+		populate_pte(cpa, start, pre_end, cur_pages, pmd, pgprot);
+
+		start = pre_end;
+	}
+
+	/*
+	 * We mapped them all?
+	 */
+	if (num_pages == cur_pages)
+		return cur_pages;
+
+	while (end - start >= PMD_SIZE) {
+
+		/*
+		 * We cannot use a 1G page so allocate a PMD page if needed.
+		 */
+		if (pud_none(*pud))
+			if (alloc_pmd_page(pud))
+				return -1;
+
+		pmd = pmd_offset(pud, start);
+
+		set_pmd(pmd, __pmd(cpa->pfn | _PAGE_PSE | massage_pgprot(pgprot)));
+
+		start	  += PMD_SIZE;
+		cpa->pfn  += PMD_SIZE;
+		cur_pages += PMD_SIZE >> PAGE_SHIFT;
+	}
+
+	/*
+	 * Map trailing 4K pages.
+	 */
+	if (start < end) {
+		pmd = pmd_offset(pud, start);
+		if (pmd_none(*pmd))
+			if (alloc_pte_page(pmd))
+				return -1;
+
+		populate_pte(cpa, start, end, num_pages - cur_pages,
+			     pmd, pgprot);
+	}
+	return num_pages;
+}
 
 static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
 			pgprot_t pgprot)
-- 
1.8.4


* [PATCH 07/11] x86, pageattr: Add a PTE pagetable populating function
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (5 preceding siblings ...)
  2013-09-19 14:54 ` [PATCH 06/11] x86, pageattr: Add a PMD " Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-19 14:54 ` [PATCH 08/11] x86, pageattr: Add a PUD error unwinding path Borislav Petkov
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

Handle the last level by unconditionally writing the PTEs into the PTE
page while paying attention to the NX bit.
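
The NX handling amounts to the masking below. A minimal sketch,
assuming, as populate_pte() does, that the value walked through
cpa->pfn is a physical address which may carry bit 63 (_PAGE_NX on
x86-64); the sk_* names and the simplified PTE layout (address OR
protection bits) are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 PTE bits used here: present, writable and no-execute. */
#define SK_PAGE_PRESENT	(1ULL << 0)
#define SK_PAGE_RW	(1ULL << 1)
#define SK_PAGE_NX	(1ULL << 63)
#define SK_PAGE_SHIFT	12

/*
 * Build a 4K PTE for @phys with protection bits @prot. If @prot does
 * not ask for NX, bit 63 is stripped from @phys so that a stray NX bit
 * carried in the address cannot leak into the entry.
 */
static uint64_t sk_make_pte(uint64_t phys, uint64_t prot)
{
	if (!(prot & SK_PAGE_NX))
		phys &= ~SK_PAGE_NX;

	/* The remaining address bits must be page aligned. */
	assert(((phys & ~SK_PAGE_NX) & ((1ULL << SK_PAGE_SHIFT) - 1)) == 0);
	return phys | prot;
}
```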

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pageattr.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index c56d71591617..02cf97b3bb7c 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -686,7 +686,27 @@ static int alloc_pmd_page(pud_t *pud)
 	return 0;
 }
 
-#define populate_pte(cpa, start, end, pages, pmd, pgprot)	do {} while (0)
+static void populate_pte(struct cpa_data *cpa,
+			 unsigned long start, unsigned long end,
+			 unsigned num_pages, pmd_t *pmd, pgprot_t pgprot)
+{
+	pte_t *pte;
+
+	pte = pte_offset_kernel(pmd, start);
+
+	while (num_pages-- && start < end) {
+
+		/* deal with the NX bit */
+		if (!(pgprot_val(pgprot) & _PAGE_NX))
+			cpa->pfn &= ~_PAGE_NX;
+
+		set_pte(pte, pfn_pte(cpa->pfn >> PAGE_SHIFT, pgprot));
+
+		start	 += PAGE_SIZE;
+		cpa->pfn += PAGE_SIZE;
+		pte++;
+	}
+}
 
 static int populate_pmd(struct cpa_data *cpa,
 			unsigned long start, unsigned long end,
-- 
1.8.4


* [PATCH 08/11] x86, pageattr: Add a PUD error unwinding path
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (6 preceding siblings ...)
  2013-09-19 14:54 ` [PATCH 07/11] x86, pageattr: Add a PTE " Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-19 14:54 ` [PATCH 09/11] x86, pageattr: Add last levels of error path Borislav Petkov
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

In case we encounter an error while mapping a region, we want to unwind
what we've established so far, exactly the way we did the mapping.
This is the PUD part, kept deliberately small for easier review.
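
Unwinding in the same shape as the mapping means walking the same
head/body/tail phases and, at each step, undoing what that phase could
have created. A userspace sketch over a tiny simulated PUD table; the
names are hypothetical, and the lower level is a stub, much as
unmap_pmd_range() is still a dummy macro in this patch; here it merely
counts calls.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PUD_SIZE	(1ULL << 30)
#define SK_PUD_MASK	(~(SK_PUD_SIZE - 1))
#define SK_NR		8	/* tiny simulated PUD table covering 8G */

enum sk_entry { SK_NONE, SK_LARGE, SK_TABLE };

static enum sk_entry sk_pud[SK_NR];
static int sk_lower_calls;	/* times the next level was asked to unmap */

/* Stub for the PMD-level unmap: just records the call. */
static void sk_unmap_lower(unsigned idx, uint64_t start, uint64_t end)
{
	assert(idx < SK_NR && start < end);
	sk_lower_calls++;
}

/* Undo a mapping of [start, end) in the same order it was built. */
static void sk_unmap_pud_range(uint64_t start, uint64_t end)
{
	unsigned idx = start / SK_PUD_SIZE;

	if (start & (SK_PUD_SIZE - 1)) {	/* unaligned head */
		uint64_t next = (start + SK_PUD_SIZE) & SK_PUD_MASK;
		uint64_t pre_end = end < next ? end : next;

		sk_unmap_lower(idx, start, pre_end);
		start = pre_end;
		idx++;
	}

	while (end - start >= SK_PUD_SIZE) {	/* 1G body */
		if (sk_pud[idx] == SK_LARGE)
			sk_pud[idx] = SK_NONE;	/* like pud_clear() */
		else
			sk_unmap_lower(idx, start, start + SK_PUD_SIZE);
		start += SK_PUD_SIZE;
		idx++;
	}

	if (start < end)			/* leftovers */
		sk_unmap_lower(idx, start, end);
}
```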

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pageattr.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 02cf97b3bb7c..a0d2e90ad62b 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -666,6 +666,51 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	return 0;
 }
 
+#define unmap_pmd_range(pud, start, pre_end)		do {} while (0)
+
+static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+{
+	pud_t *pud = pud_offset(pgd, start);
+
+	/*
+	 * Not on a GB page boundary?
+	 */
+	if (start & (PUD_SIZE - 1)) {
+		unsigned long next_page = (start + PUD_SIZE) & PUD_MASK;
+		unsigned long pre_end	= min_t(unsigned long, end, next_page);
+
+		unmap_pmd_range(pud, start, pre_end);
+
+		start = pre_end;
+		pud++;
+	}
+
+	/*
+	 * Try to unmap in 1G chunks?
+	 */
+	while (end - start >= PUD_SIZE) {
+
+		if (pud_large(*pud))
+			pud_clear(pud);
+		else
+			unmap_pmd_range(pud, start, start + PUD_SIZE);
+
+		start += PUD_SIZE;
+		pud++;
+	}
+
+	/*
+	 * 2M leftovers?
+	 */
+	if (start < end)
+		unmap_pmd_range(pud, start, end);
+
+	/*
+	 * No need to try to free the PUD page because we'll free it in
+	 * populate_pgd's error path
+	 */
+}
+
 static int alloc_pte_page(pmd_t *pmd)
 {
 	pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK);
@@ -883,9 +928,20 @@ static int populate_pgd(struct cpa_data *cpa, unsigned long addr)
 	pgprot_val(pgprot) |=  pgprot_val(cpa->mask_set);
 
 	ret = populate_pud(cpa, addr, pgd_entry, pgprot);
-	if (ret < 0)
-		return ret;
+	if (ret < 0) {
+		unmap_pud_range(pgd_entry, addr,
+				addr + (cpa->numpages << PAGE_SHIFT));
 
+		if (allocd_pgd) {
+			/*
+			 * If I allocated this PUD page, I can just as well
+			 * free it in this error path.
+			 */
+			pgd_clear(pgd_entry);
+			free_page((unsigned long)pud);
+		}
+		return ret;
+	}
 	cpa->numpages = ret;
 	return 0;
 }
-- 
1.8.4


* [PATCH 09/11] x86, pageattr: Add last levels of error path
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (7 preceding siblings ...)
  2013-09-19 14:54 ` [PATCH 08/11] x86, pageattr: Add a PUD error unwinding path Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-19 14:54 ` [PATCH 10/11] x86, cpa: Map in an arbitrary pgd Borislav Petkov
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

We try to free the pagetable pages once we've unmapped our portion.
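
The rule the try_to_free_*() helpers follow can be sketched as: a
page-table page may be freed only once every one of its entries is
empty, since other mappings could still live in it. A minimal userspace
sketch of that check, with a hypothetical name and a 512-entry table of
64-bit entries as on x86-64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_PTRS 512

/*
 * Free @table only if all its entries are empty; otherwise some other
 * mapping still uses it and it must stay. Returns true if freed, in
 * which case the caller should clear the entry pointing to it.
 */
static bool sk_try_to_free_table(uint64_t *table)
{
	int i;

	assert(table != NULL);
	for (i = 0; i < SK_PTRS; i++)
		if (table[i] != 0)
			return false;

	free(table);
	return true;
}
```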

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pageattr.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a0d2e90ad62b..ca76481c09e8 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -666,7 +666,99 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	return 0;
 }
 
-#define unmap_pmd_range(pud, start, pre_end)		do {} while (0)
+static bool try_to_free_pte_page(pte_t *pte)
+{
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		if (!pte_none(pte[i]))
+			return false;
+
+	free_page((unsigned long)pte);
+	return true;
+}
+
+static bool try_to_free_pmd_page(pmd_t *pmd)
+{
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++)
+		if (!pmd_none(pmd[i]))
+			return false;
+
+	free_page((unsigned long)pmd);
+	return true;
+}
+
+static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
+{
+	pte_t *pte = pte_offset_kernel(pmd, start);
+
+	while (start < end) {
+		set_pte(pte, __pte(0));
+
+		start += PAGE_SIZE;
+		pte++;
+	}
+
+	if (try_to_free_pte_page((pte_t *)pmd_page_vaddr(*pmd))) {
+		pmd_clear(pmd);
+		return true;
+	}
+	return false;
+}
+
+static void __unmap_pmd_range(pud_t *pud, pmd_t *pmd,
+			      unsigned long start, unsigned long end)
+{
+	if (unmap_pte_range(pmd, start, end))
+		if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+			pud_clear(pud);
+}
+
+static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
+{
+	pmd_t *pmd = pmd_offset(pud, start);
+
+	/*
+	 * Not on a 2MB page boundary?
+	 */
+	if (start & (PMD_SIZE - 1)) {
+		unsigned long next_page = (start + PMD_SIZE) & PMD_MASK;
+		unsigned long pre_end = min_t(unsigned long, end, next_page);
+
+		__unmap_pmd_range(pud, pmd, start, pre_end);
+
+		start = pre_end;
+		pmd++;
+	}
+
+	/*
+	 * Try to unmap in 2M chunks.
+	 */
+	while (end - start >= PMD_SIZE) {
+		if (pmd_large(*pmd))
+			pmd_clear(pmd);
+		else
+			__unmap_pmd_range(pud, pmd, start, start + PMD_SIZE);
+
+		start += PMD_SIZE;
+		pmd++;
+	}
+
+	/*
+	 * 4K leftovers?
+	 */
+	if (start < end)
+		return __unmap_pmd_range(pud, pmd, start, end);
+
+	/*
+	 * Try again to free the PMD page if we haven't succeeded above.
+	 */
+	if (!pud_none(*pud))
+		if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+			pud_clear(pud);
+}
 
 static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
 {
-- 
1.8.4



* [PATCH 10/11] x86, cpa: Map in an arbitrary pgd
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (8 preceding siblings ...)
  2013-09-19 14:54 ` [PATCH 09/11] x86, pageattr: Add last levels of error path Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-19 14:54 ` [PATCH 11/11] EFI: Runtime services virtual mapping Borislav Petkov
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

Add the ability to map pages into an arbitrary PGD: this wires in the
remaining pieces and adds a new interface, kernel_map_pages_in_pgd(),
with which a physical region can be mapped at a given virtual address
in that PGD.
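A rough model of how kernel_map_pages_in_pgd() derives the cpa masks from page_flags, and of how such masks are applied to a protection value (clear mask_clr, then set mask_set). The bit positions are the architectural x86-64 ones; compute_masks(), apply_masks() and struct masks are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86-64 PTE bits. */
#define PTE_PRESENT (1ULL << 0)
#define PTE_RW      (1ULL << 1)
#define PTE_PCD     (1ULL << 4)
#define PTE_NX      (1ULL << 63)

struct masks {
	uint64_t set;
	uint64_t clr;
	bool mapped;  /* false: bailed out before mapping anything */
};

/* Hypothetical model of the mask setup in kernel_map_pages_in_pgd(). */
static struct masks compute_masks(uint64_t supported_pte_mask,
				  uint64_t page_flags)
{
	struct masks m = { 0, 0, false };

	/* Mirrors the early "goto out": without NX support nothing is mapped. */
	if (!(supported_pte_mask & PTE_NX))
		return m;

	/* NX not requested: make sure it ends up cleared (executable mapping). */
	if (!(page_flags & PTE_NX))
		m.clr = PTE_NX;

	m.set = PTE_PRESENT | page_flags;
	m.mapped = true;
	return m;
}

/* Clear first, then set, as done with cpa->mask_clr and cpa->mask_set. */
static uint64_t apply_masks(uint64_t prot, struct masks m)
{
	return (prot & ~m.clr) | m.set;
}
```

With the flags __map_region() passes in the last patch (_PAGE_PCD or nothing, never _PAGE_NX), an NX-capable CPU ends up with present mappings that have NX cleared.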

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pageattr.c | 53 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index ca76481c09e8..991386bf3aad 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -453,7 +453,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 	 * Check for races, another CPU might have split this page
 	 * up already:
 	 */
-	tmp = lookup_address(address, &level);
+	tmp = _lookup_address_cpa(cpa, address, &level);
 	if (tmp != kpte)
 		goto out_unlock;
 
@@ -559,7 +559,8 @@ out_unlock:
 }
 
 static int
-__split_large_page(pte_t *kpte, unsigned long address, struct page *base)
+__split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
+		   struct page *base)
 {
 	pte_t *pbase = (pte_t *)page_address(base);
 	unsigned long pfn, pfninc = 1;
@@ -572,7 +573,7 @@ __split_large_page(pte_t *kpte, unsigned long address, struct page *base)
 	 * Check for races, another CPU might have split this page
 	 * up for us already:
 	 */
-	tmp = lookup_address(address, &level);
+	tmp = _lookup_address_cpa(cpa, address, &level);
 	if (tmp != kpte) {
 		spin_unlock(&pgd_lock);
 		return 1;
@@ -648,7 +649,8 @@ __split_large_page(pte_t *kpte, unsigned long address, struct page *base)
 	return 0;
 }
 
-static int split_large_page(pte_t *kpte, unsigned long address)
+static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
+			    unsigned long address)
 {
 	struct page *base;
 
@@ -660,7 +662,7 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	if (!base)
 		return -ENOMEM;
 
-	if (__split_large_page(kpte, address, base))
+	if (__split_large_page(cpa, kpte, address, base))
 		__free_page(base);
 
 	return 0;
@@ -1041,6 +1043,9 @@ static int populate_pgd(struct cpa_data *cpa, unsigned long addr)
 static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
 			       int primary)
 {
+	if (cpa->pgd)
+		return populate_pgd(cpa, vaddr);
+
 	/*
 	 * Ignore all non primary paths.
 	 */
@@ -1085,7 +1090,7 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
 	else
 		address = *cpa->vaddr;
 repeat:
-	kpte = lookup_address(address, &level);
+	kpte = _lookup_address_cpa(cpa, address, &level);
 	if (!kpte)
 		return __cpa_process_fault(cpa, address, primary);
 
@@ -1149,7 +1154,7 @@ repeat:
 	/*
 	 * We have to split the large page:
 	 */
-	err = split_large_page(kpte, address);
+	err = split_large_page(cpa, kpte, address);
 	if (!err) {
 		/*
 	 	 * Do a global flush tlb after splitting the large page
@@ -1298,6 +1303,8 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 	int ret, cache, checkalias;
 	unsigned long baddr = 0;
 
+	memset(&cpa, 0, sizeof(cpa));
+
 	/*
 	 * Check, if we are requested to change a not supported
 	 * feature:
@@ -1744,6 +1751,7 @@ static int __set_pages_p(struct page *page, int numpages)
 {
 	unsigned long tempaddr = (unsigned long) page_address(page);
 	struct cpa_data cpa = { .vaddr = &tempaddr,
+				.pgd = 0,
 				.numpages = numpages,
 				.mask_set = __pgprot(_PAGE_PRESENT | _PAGE_RW),
 				.mask_clr = __pgprot(0),
@@ -1762,6 +1770,7 @@ static int __set_pages_np(struct page *page, int numpages)
 {
 	unsigned long tempaddr = (unsigned long) page_address(page);
 	struct cpa_data cpa = { .vaddr = &tempaddr,
+				.pgd = 0,
 				.numpages = numpages,
 				.mask_set = __pgprot(0),
 				.mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
@@ -1822,6 +1831,36 @@ bool kernel_page_present(struct page *page)
 
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
+int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+			    unsigned numpages, unsigned long page_flags)
+{
+	int retval = 0;
+
+	struct cpa_data cpa = {
+		.vaddr = &address,
+		.pfn = pfn,
+		.pgd = pgd,
+		.numpages = numpages,
+		.mask_set = __pgprot(0),
+		.mask_clr = __pgprot(0),
+		.flags = 0,
+	};
+
+	if (!(__supported_pte_mask & _PAGE_NX))
+		goto out;
+
+	if (!(page_flags & _PAGE_NX))
+		cpa.mask_clr = __pgprot(_PAGE_NX);
+
+	cpa.mask_set = __pgprot(_PAGE_PRESENT | page_flags);
+
+	retval = __change_page_attr_set_clr(&cpa, 0);
+	__flush_tlb_all();
+
+out:
+	return retval;
+}
+
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
-- 
1.8.4



* [PATCH 11/11] EFI: Runtime services virtual mapping
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (9 preceding siblings ...)
  2013-09-19 14:54 ` [PATCH 10/11] x86, cpa: Map in an arbitrary pgd Borislav Petkov
@ 2013-09-19 14:54 ` Borislav Petkov
  2013-09-21 11:39   ` [PATCH -v2] " Borislav Petkov
  2013-09-20  7:29 ` [PATCH 00/11] EFI runtime " Dave Young
  2013-10-08 16:45 ` Borislav Petkov
  12 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-09-19 14:54 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

From: Borislav Petkov <bp@suse.de>

We map the EFI regions needed for runtime services contiguously at
virtual addresses going down from -4G, with a maximum total space of
64G. This provides stable runtime services addresses across kernels so
that a kexec'd kernel can still use them.
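The top-down allocation can be modelled in a few lines. This sketch assumes 4K pages and reuses the patch's constants (-4G start, -68G floor, i.e. a 64G window); struct efi_va_window and efi_va_take() are hypothetical names. The bounds check compares against the remaining space so it cannot wrap; for realistic region sizes it accepts exactly the same requests as the patch's new_va < EFI_VA_END test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same expressions as efi_va and EFI_VA_END, in 64-bit unsigned math. */
#define EFI_VA_START (-4 * (UINT64_C(1) << 30))   /* 0xffffffff00000000 */
#define EFI_VA_END   (-68 * (UINT64_C(1) << 30))  /* 0xffffffef00000000 */

struct efi_va_window {
	uint64_t next;  /* lowest VA handed out so far; starts at -4G */
};

/* Hypothetical helper: take num_pages 4K pages off the top of the window. */
static bool efi_va_take(struct efi_va_window *w, uint64_t num_pages,
			uint64_t *va)
{
	uint64_t size = num_pages << 12;  /* 4K pages */

	/* Same as "w->next - size < EFI_VA_END", but cannot wrap around. */
	if (size > w->next - EFI_VA_END)
		return false;  /* the patch warns and skips the region */

	w->next -= size;
	*va = w->next;
	return true;
}
```

Because the start address and the region order are fixed, two kernels walking the same memory map get the same addresses.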

Also, they're mapped in a separate pagetable so that we don't pollute
the kernel address space (you can see how the whole ioremapping and
saving and restoring of PGDs is gone now).
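Because EFI calls now run on the trampoline PGD, kernel pointers passed as arguments must be reachable there, which efi_sync_low_kernel_mappings() ensures by copying top-level entries from init_mm.pgd. The sketch below reproduces its num_pgds arithmetic under an assumed pre-KASLR 4-level layout (PAGE_OFFSET 0xffff880000000000, VMALLOC_START 0xffffc90000000000). With those values it copies 129 entries, indices 272 to 400. Arrays of uint64_t stand in for pgd_t tables, and efi_num_low_pgds()/sync_low_pgds() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 4-level x86-64 paging: each top-level (PGD) entry covers 512G. */
#define PGDIR_SHIFT  39
#define PTRS_PER_PGD 512
#define pgd_index(a) (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))

/* Assumed 2013-era layout (Documentation/x86/x86_64/mm.txt, no KASLR). */
#define PAGE_OFFSET   0xffff880000000000ULL
#define VMALLOC_START 0xffffc90000000000ULL

/* Same formula as num_pgds in efi_sync_low_kernel_mappings(). */
static unsigned int efi_num_low_pgds(uint64_t page_offset, uint64_t vmalloc_start)
{
	return (unsigned int)(pgd_index(vmalloc_start - 1) - pgd_index(page_offset));
}

/* Copy those entries from the kernel's top-level table into the EFI one. */
static void sync_low_pgds(uint64_t *efi_pgd, const uint64_t *kernel_pgd)
{
	unsigned int first = (unsigned int)pgd_index(PAGE_OFFSET);

	memcpy(efi_pgd + first, kernel_pgd + first,
	       sizeof(uint64_t) * efi_num_low_pgds(PAGE_OFFSET, VMALLOC_START));
}
```

Only top-level entries are copied, so the lower-level tables stay shared with init_mm.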

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/efi.h           | 58 +++++++++++++++-------
 arch/x86/include/asm/pgtable_types.h |  3 +-
 arch/x86/platform/efi/efi.c          | 95 +++++++++++++++++++++---------------
 arch/x86/platform/efi/efi_64.c       | 56 +--------------------
 arch/x86/platform/efi/efi_stub_64.S  | 47 ++++++++++++++++++
 5 files changed, 149 insertions(+), 110 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 0062a0125041..745c8d27265b 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -39,6 +39,8 @@ extern unsigned long asmlinkage efi_call_phys(void *, ...);
 
 #else /* !CONFIG_X86_32 */
 
+#include <linux/sched.h>
+
 #define EFI_LOADER_SIGNATURE	"EL64"
 
 extern u64 efi_call0(void *fp);
@@ -51,6 +53,21 @@ extern u64 efi_call5(void *fp, u64 arg1, u64 arg2, u64 arg3,
 extern u64 efi_call6(void *fp, u64 arg1, u64 arg2, u64 arg3,
 		     u64 arg4, u64 arg5, u64 arg6);
 
+/*
+ * Add low kernel mappings for passing arguments to EFI functions.
+ */
+static inline void efi_sync_low_kernel_mappings(void)
+{
+	unsigned num_pgds;
+	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+
+	num_pgds = pgd_index(VMALLOC_START - 1) - pgd_index(PAGE_OFFSET);
+
+	memcpy(pgd + pgd_index(PAGE_OFFSET),
+		init_mm.pgd + pgd_index(PAGE_OFFSET),
+		sizeof(pgd_t) * num_pgds);
+}
+
 #define efi_call_phys0(f)			\
 	efi_call0((f))
 #define efi_call_phys1(f, a1)			\
@@ -69,24 +86,31 @@ extern u64 efi_call6(void *fp, u64 arg1, u64 arg2, u64 arg3,
 	efi_call6((f), (u64)(a1), (u64)(a2), (u64)(a3),		\
 		  (u64)(a4), (u64)(a5), (u64)(a6))
 
+#define _efi_call_virtX(x, f, ...)					\
+({									\
+	efi_status_t __s;						\
+									\
+	efi_sync_low_kernel_mappings();					\
+	preempt_disable();						\
+	__s = efi_call##x((void *)efi.systab->runtime->f, __VA_ARGS__);	\
+	preempt_enable();						\
+	__s;								\
+})
+
 #define efi_call_virt0(f)				\
-	efi_call0((efi.systab->runtime->f))
-#define efi_call_virt1(f, a1)					\
-	efi_call1((efi.systab->runtime->f), (u64)(a1))
-#define efi_call_virt2(f, a1, a2)					\
-	efi_call2((efi.systab->runtime->f), (u64)(a1), (u64)(a2))
-#define efi_call_virt3(f, a1, a2, a3)					\
-	efi_call3((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3))
-#define efi_call_virt4(f, a1, a2, a3, a4)				\
-	efi_call4((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3), (u64)(a4))
-#define efi_call_virt5(f, a1, a2, a3, a4, a5)				\
-	efi_call5((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3), (u64)(a4), (u64)(a5))
-#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)			\
-	efi_call6((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
+	_efi_call_virtX(0, f)
+#define efi_call_virt1(f, a1)				\
+	_efi_call_virtX(1, f, (u64)(a1))
+#define efi_call_virt2(f, a1, a2)			\
+	_efi_call_virtX(2, f, (u64)(a1), (u64)(a2))
+#define efi_call_virt3(f, a1, a2, a3)			\
+	_efi_call_virtX(3, f, (u64)(a1), (u64)(a2), (u64)(a3))
+#define efi_call_virt4(f, a1, a2, a3, a4)		\
+	_efi_call_virtX(4, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4))
+#define efi_call_virt5(f, a1, a2, a3, a4, a5)		\
+	_efi_call_virtX(5, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5))
+#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)	\
+	_efi_call_virtX(6, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
 
 extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
 				 u32 type, u64 attribute);
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 0ecac257fb26..a83aa44bb1fb 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -382,7 +382,8 @@ static inline void update_page_count(int level, unsigned long pages) { }
  */
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
 extern phys_addr_t slow_virt_to_phys(void *__address);
-
+extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+				   unsigned numpages, unsigned long page_flags);
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 538c1e6b7b2c..9c54ce5b9975 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -12,6 +12,8 @@
  *	Bibo Mao <bibo.mao@intel.com>
  *	Chandramouli Narayanan <mouli@linux.intel.com>
  *	Huang Ying <ying.huang@intel.com>
+ * Copyright (C) 2013 SuSE Labs
+ * 	Borislav Petkov <bp@suse.de> - runtime services VA mapping
  *
  * Copied from efi_32.c to eliminate the duplicated code between EFI
  * 32/64 support code. --ying 2007-10-26
@@ -60,6 +62,13 @@
 
 static efi_char16_t efi_dummy_name[6] = { 'D', 'U', 'M', 'M', 'Y', 0 };
 
+/*
+ * We allocate runtime services regions top-down, starting from -4G, i.e.
+ * 0xffff_ffff_0000_0000 and limit EFI VA mapping space to 64G.
+ */
+static unsigned long efi_va = -4 * (1UL << 30);
+#define EFI_VA_END	    (-68 * (1UL << 30))
+
 struct efi __read_mostly efi = {
 	.mps        = EFI_INVALID_TABLE_ADDR,
 	.acpi       = EFI_INVALID_TABLE_ADDR,
@@ -81,6 +90,16 @@ static efi_system_table_t efi_systab __initdata;
 unsigned long x86_efi_facility;
 
 /*
+ * Scratch space used for switching the pagetable in the EFI stub
+ */
+struct efi_scratch {
+	u64 r15;
+	u64 prev_cr3;
+	pgd_t *efi_pgt;
+};
+extern struct efi_scratch efi_scratch;
+
+/*
  * Returns 1 if 'facility' is enabled, 0 otherwise.
  */
 int efi_enabled(int facility)
@@ -797,22 +816,6 @@ void __init efi_set_executable(efi_memory_desc_t *md, bool executable)
 		set_memory_nx(addr, npages);
 }
 
-static void __init runtime_code_page_mkexec(void)
-{
-	efi_memory_desc_t *md;
-	void *p;
-
-	/* Make EFI runtime service code area executable */
-	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
-		md = p;
-
-		if (md->type != EFI_RUNTIME_SERVICES_CODE)
-			continue;
-
-		efi_set_executable(md, true);
-	}
-}
-
 /*
  * We can't ioremap data in EFI boot services RAM, because we've already mapped
  * it as RAM.  So, look it up in the existing EFI memory map instead.  Only
@@ -851,6 +854,23 @@ void efi_memory_uc(u64 addr, unsigned long size)
 	set_memory_uc(addr, npages);
 }
 
+static void __init __map_region(efi_memory_desc_t *md, u64 va)
+{
+	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+	unsigned long pf = 0, size;
+	u64 end;
+
+	if (!(md->attribute & EFI_MEMORY_WB))
+		pf |= _PAGE_PCD;
+
+	size = md->num_pages << PAGE_SHIFT;
+	end  = va + size;
+
+	if (kernel_map_pages_in_pgd(pgd, md->phys_addr, va, md->num_pages, pf))
+		pr_warning("Error mapping PA 0x%llx -> VA 0x%llx!\n",
+			   md->phys_addr, va);
+}
+
 /*
  * This function will switch the EFI runtime services to virtual mode.
  * Essentially, look through the EFI memmap and map every region that
@@ -862,10 +882,10 @@ void efi_memory_uc(u64 addr, unsigned long size)
 void __init efi_enter_virtual_mode(void)
 {
 	efi_memory_desc_t *md, *prev_md = NULL;
-	efi_status_t status;
+	void *p, *new_memmap = NULL;
 	unsigned long size;
-	u64 end, systab, start_pfn, end_pfn;
-	void *p, *va, *new_memmap = NULL;
+	efi_status_t status;
+	u64 end, systab, new_va;
 	int count = 0;
 
 	efi.systab = NULL;
@@ -874,7 +894,6 @@ void __init efi_enter_virtual_mode(void)
 	 * We don't do virtual mode, since we don't do runtime services, on
 	 * non-native EFI
 	 */
-
 	if (!efi_is_native()) {
 		efi_unmap_memmap();
 		return;
@@ -914,33 +933,31 @@ void __init efi_enter_virtual_mode(void)
 		    md->type != EFI_BOOT_SERVICES_DATA)
 			continue;
 
+		/* Do the 1:1 map */
+		__map_region(md, md->phys_addr);
+
 		size = md->num_pages << PAGE_SHIFT;
 		end = md->phys_addr + size;
 
-		start_pfn = PFN_DOWN(md->phys_addr);
-		end_pfn = PFN_UP(end);
-		if (pfn_range_is_mapped(start_pfn, end_pfn)) {
-			va = __va(md->phys_addr);
-
-			if (!(md->attribute & EFI_MEMORY_WB))
-				efi_memory_uc((u64)(unsigned long)va, size);
-		} else
-			va = efi_ioremap(md->phys_addr, size,
-					 md->type, md->attribute);
-
-		md->virt_addr = (u64) (unsigned long) va;
-
-		if (!va) {
-			pr_err("ioremap of 0x%llX failed!\n",
-			       (unsigned long long)md->phys_addr);
+		new_va = efi_va - size;
+		if (new_va < EFI_VA_END) {
+			pr_warning(FW_WARN "VA address range overflow!\n");
 			continue;
 		}
 
+		efi_va -= size;
+
+		/* Do the VA map */
+		__map_region(md, new_va);
+		md->virt_addr = new_va;
+
 		systab = (u64) (unsigned long) efi_phys.systab;
 		if (md->phys_addr <= systab && systab < end) {
 			systab += md->virt_addr - md->phys_addr;
+
 			efi.systab = (efi_system_table_t *) (unsigned long) systab;
 		}
+
 		new_memmap = krealloc(new_memmap,
 				      (count + 1) * memmap.desc_size,
 				      GFP_KERNEL);
@@ -949,8 +966,12 @@ void __init efi_enter_virtual_mode(void)
 		count++;
 	}
 
+	efi_scratch.efi_pgt = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
+
 	BUG_ON(!efi.systab);
 
+	efi_sync_low_kernel_mappings();
+
 	status = phys_efi_set_virtual_address_map(
 		memmap.desc_size * count,
 		memmap.desc_size,
@@ -983,8 +1004,6 @@ void __init efi_enter_virtual_mode(void)
 	efi.query_variable_info = virt_efi_query_variable_info;
 	efi.update_capsule = virt_efi_update_capsule;
 	efi.query_capsule_caps = virt_efi_query_capsule_caps;
-	if (__supported_pte_mask & _PAGE_NX)
-		runtime_code_page_mkexec();
 
 	kfree(new_memmap);
 
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 39a0e7f1f0a3..a16fa9a6cf3e 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -39,60 +39,8 @@
 #include <asm/cacheflush.h>
 #include <asm/fixmap.h>
 
-static pgd_t *save_pgd __initdata;
-static unsigned long efi_flags __initdata;
-
-static void __init early_code_mapping_set_exec(int executable)
-{
-	efi_memory_desc_t *md;
-	void *p;
-
-	if (!(__supported_pte_mask & _PAGE_NX))
-		return;
-
-	/* Make EFI service code area executable */
-	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
-		md = p;
-		if (md->type == EFI_RUNTIME_SERVICES_CODE ||
-		    md->type == EFI_BOOT_SERVICES_CODE)
-			efi_set_executable(md, executable);
-	}
-}
-
-void __init efi_call_phys_prelog(void)
-{
-	unsigned long vaddress;
-	int pgd;
-	int n_pgds;
-
-	early_code_mapping_set_exec(1);
-	local_irq_save(efi_flags);
-
-	n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
-	save_pgd = kmalloc(n_pgds * sizeof(pgd_t), GFP_KERNEL);
-
-	for (pgd = 0; pgd < n_pgds; pgd++) {
-		save_pgd[pgd] = *pgd_offset_k(pgd * PGDIR_SIZE);
-		vaddress = (unsigned long)__va(pgd * PGDIR_SIZE);
-		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), *pgd_offset_k(vaddress));
-	}
-	__flush_tlb_all();
-}
-
-void __init efi_call_phys_epilog(void)
-{
-	/*
-	 * After the lock is released, the original page table is restored.
-	 */
-	int pgd;
-	int n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT) , PGDIR_SIZE);
-	for (pgd = 0; pgd < n_pgds; pgd++)
-		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), save_pgd[pgd]);
-	kfree(save_pgd);
-	__flush_tlb_all();
-	local_irq_restore(efi_flags);
-	early_code_mapping_set_exec(0);
-}
+void __init efi_call_phys_prelog(void) {}
+void __init efi_call_phys_epilog(void) {}
 
 void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
 				 u32 type, u64 attribute)
diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
index 4c07ccab8146..f3bc4127f9e8 100644
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -34,10 +34,41 @@
 	mov %rsi, %cr0;			\
 	mov (%rsp), %rsp
 
+	/* stolen from gcc */
+	.macro FLUSH_TLB_ALL
+	movq %r15, efi_scratch(%rip)
+	movq %r14, efi_scratch+8(%rip)
+	movq %cr4, %r15
+	movq %r15, %r14
+	andb $0x7f, %r14b
+	movq %r14, %cr4
+	movq %r15, %cr4
+	movq efi_scratch+8(%rip), %r14
+	movq efi_scratch(%rip), %r15
+	.endm
+
+	.macro SWITCH_PGT
+	movq %r15, efi_scratch(%rip)		# r15
+	# save previous CR3
+	movq %cr3, %r15
+	movq %r15, efi_scratch+8(%rip)		# prev_cr3
+	movq efi_scratch+16(%rip), %r15		# EFI pgt
+	movq %r15, %cr3
+	.endm
+
+	.macro RESTORE_PGT
+	movq efi_scratch+8(%rip), %r15
+	movq %r15, %cr3
+	movq efi_scratch(%rip), %r15
+	FLUSH_TLB_ALL
+	.endm
+
 ENTRY(efi_call0)
 	SAVE_XMM
 	subq $32, %rsp
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -47,7 +78,9 @@ ENTRY(efi_call1)
 	SAVE_XMM
 	subq $32, %rsp
 	mov  %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -57,7 +90,9 @@ ENTRY(efi_call2)
 	SAVE_XMM
 	subq $32, %rsp
 	mov  %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -68,7 +103,9 @@ ENTRY(efi_call3)
 	subq $32, %rsp
 	mov  %rcx, %r8
 	mov  %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -80,7 +117,9 @@ ENTRY(efi_call4)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -93,7 +132,9 @@ ENTRY(efi_call5)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $48, %rsp
 	RESTORE_XMM
 	ret
@@ -109,8 +150,14 @@ ENTRY(efi_call6)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $48, %rsp
 	RESTORE_XMM
 	ret
 ENDPROC(efi_call6)
+
+	.data
+ENTRY(efi_scratch)
+	.fill 3,8,0
-- 
1.8.4

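The stub macros address efi_scratch by hard-coded byte offsets (+0 for r15, +8 for prev_cr3, +16 for the EFI pgt), so the C declaration in efi.c and the `.fill 3,8,0` reservation must agree. Below is a sketch of compile-time checks for that, assuming an LP64 host such as x86-64; pgd_t and u64 are stand-in typedefs. It also spells out the CR4 trick in FLUSH_TLB_ALL: `andb $0x7f` clears bit 7 (CR4.PGE), and writing CR4 with PGE toggled flushes the TLB including global entries. cr4_without_pge() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;
typedef struct { uint64_t pgd; } pgd_t;  /* stand-in for the kernel type */

/* Same field order as struct efi_scratch in efi.c. */
struct efi_scratch {
	u64 r15;         /* SWITCH_PGT: movq %r15, efi_scratch(%rip)    */
	u64 prev_cr3;    /* SWITCH_PGT: movq %r15, efi_scratch+8(%rip)  */
	pgd_t *efi_pgt;  /* SWITCH_PGT: movq efi_scratch+16(%rip), %r15 */
};

_Static_assert(offsetof(struct efi_scratch, r15) == 0, "r15 must be at +0");
_Static_assert(offsetof(struct efi_scratch, prev_cr3) == 8, "prev_cr3 must be at +8");
_Static_assert(offsetof(struct efi_scratch, efi_pgt) == 16, "efi_pgt must be at +16");
_Static_assert(sizeof(struct efi_scratch) == 3 * 8, "must match .fill 3,8,0");

#define X86_CR4_PGE (1ULL << 7)  /* page global enable */

/* What "andb $0x7f, %r14b" does to the CR4 copy: bits 0-6 of the low byte
 * are kept, bit 7 (PGE) is cleared, higher bytes are untouched. */
static uint64_t cr4_without_pge(uint64_t cr4)
{
	return (cr4 & ~0xffULL) | (cr4 & 0x7fULL);
}
```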


* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (10 preceding siblings ...)
  2013-09-19 14:54 ` [PATCH 11/11] EFI: Runtime services virtual mapping Borislav Petkov
@ 2013-09-20  7:29 ` Dave Young
  2013-09-20  8:19   ` Dave Young
  2013-09-20  9:05   ` Borislav Petkov
  2013-10-08 16:45 ` Borislav Petkov
  12 siblings, 2 replies; 102+ messages in thread
From: Dave Young @ 2013-09-20  7:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/19/13 at 04:54pm, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> Hi all,
> 
> here's finally a new version of the runtime services VA mapping patchset
> which hopefully implements hpa's idea of statically mapping EFI runtime
> regions in a top-down manner starting at -4Gb virtual.
> 
> We're also using a different pagetable so as not to pollute kernel
> address space. For that, we switch to that table before doing an EFI
> call, and afterwards we switch back to the previous one.
> 
> To the patches:
> 
> 1-2 are simple cleanups which Matt probably can take now
> 
> 3-10 add the machinery to map regions into an arbitrary PGD. Those I've
> split deliberately into very small bites so that they can be reviewed
> more thoroughly and easily for my pagetable skills are pretty basic.
> 
> 11 is the actual patch which implements that mapping so that we can use
> runtime services in kexec (which is the whole reason for this fuss :))
> 
> So please take a long hard look at those, hammer on them on your
> boxes and let me know. They boot fine on my Dell UEFI box and in OVMF
> (obviously :)).

Thanks for your update!

Just tested this series: the 1st kernel boots OK in qemu+OVMF, but it
immediately reboots on my ThinkPad T420. Unfortunately, there's no way
to debug this very early problem because there's no serial port, and
earlyprintk does not work for EFI boot. There's no USB debug on this
machine either. I will test it when I go back to work after the China
holiday.

OTOH, for 2nd kernel testing: kexec-tools does not fill efi_info[] in
bootparams, so the kernel disables EFI; kexec-tools also passes the
acpi_rsdp pointer automatically so that the 2nd kernel boots OK.

I tested with a userspace patch which copies efi_info from the 1st
kernel to bootparams. As I said previously, this is not enough because
several fields in systab (fw_vendor, runtime and tables) are converted
to virtual addresses, but the kernel's EFI init code assumes they are
physical addresses. Thus we need to save these physical addresses. I
have a patch to save them and pass them to the 2nd kernel in bootparams.
Since the mappings are the same, I wonder if we can calculate the
physical address from the virtual address. Ideas?

Another concern: is it safe for i386 EFI boot?

> 
> Thanks.
> 
> Borislav Petkov (11):
>   efi: Simplify EFI_DEBUG
>   efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
>   x86, pageattr: Lookup address in an arbitrary PGD
>   x86, pageattr: Add a PGD pagetable populating function
>   x86, pageattr: Add a PUD pagetable populating function
>   x86, pageattr: Add a PMD pagetable populating function
>   x86, pageattr: Add a PTE pagetable populating function
>   x86, pageattr: Add a PUD error unwinding path
>   x86, pageattr: Add last levels of error path
>   x86, cpa: Map in an arbitrary pgd
>   EFI: Runtime services virtual mapping
> 
>  arch/x86/boot/compressed/eboot.c     |  12 +-
>  arch/x86/boot/compressed/eboot.h     |   1 -
>  arch/x86/include/asm/efi.h           |  58 +++--
>  arch/x86/include/asm/pgtable_types.h |   3 +-
>  arch/x86/mm/pageattr.c               | 461 +++++++++++++++++++++++++++++++++--
>  arch/x86/platform/efi/efi.c          | 126 +++++-----
>  arch/x86/platform/efi/efi_64.c       |  56 +----
>  arch/x86/platform/efi/efi_stub_64.S  |  47 ++++
>  include/linux/efi.h                  |   6 +-
>  9 files changed, 615 insertions(+), 155 deletions(-)
> 
> -- 
> 1.8.4
> 


* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20  7:29 ` [PATCH 00/11] EFI runtime " Dave Young
@ 2013-09-20  8:19   ` Dave Young
  2013-09-20  9:33     ` Borislav Petkov
  2013-09-20  9:05   ` Borislav Petkov
  1 sibling, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-09-20  8:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/20/13 at 03:29pm, Dave Young wrote:
> Just tested this series: the 1st kernel boots OK in qemu+OVMF, but it
> immediately reboots on my ThinkPad T420. Unfortunately, there's no way
> to debug this very early problem because there's no serial port, and
> earlyprintk does not work for EFI boot. There's no USB debug on this
> machine either. I will test it when I go back to work after the China
> holiday.

Actually, the OVMF testing is with "qemu-system-x86_64 -kernel"; booting
from GRUB fails as well, with nothing printed on serial. I guess
'-kernel' is using the EFI stub to boot?



* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20  7:29 ` [PATCH 00/11] EFI runtime " Dave Young
  2013-09-20  8:19   ` Dave Young
@ 2013-09-20  9:05   ` Borislav Petkov
  2013-09-20  9:44     ` Matt Fleming
                       ` (3 more replies)
  1 sibling, 4 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-20  9:05 UTC (permalink / raw)
  To: Dave Young
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On Fri, Sep 20, 2013 at 03:29:04PM +0800, Dave Young wrote:
> Just tested this series: the 1st kernel boots OK in qemu+OVMF, but
> it immediately reboots on my ThinkPad T420. Unfortunately, there's no
> way to debug this very early problem because there's no serial port,
> and earlyprintk does not work for EFI boot. There's no USB debug on
> this machine either. I will test it when I go back to work after the
> China holiday.

Hmm, I'm booting with the EFI boot stub; how do you boot it?

> OTOH, for 2nd kernel testing: kexec-tools does not fill efi_info[] in
> bootparams, so the kernel disables EFI; kexec-tools also passes the
> acpi_rsdp pointer automatically so that the 2nd kernel boots OK.

Right, the way this could be done is to pass in efi_info.efi_memmap,
i.e. the physical map, then iterate over it and compute the virtual
addresses *without* calling phys_efi_set_virtual_address_map(): they
are stable now.

> I tested with a userspace patch which copies efi_info from the 1st
> kernel to bootparams. As I said previously, this is not enough because
> several fields in systab (fw_vendor, runtime and tables) are converted
> to virtual addresses, but the kernel's EFI init code assumes they are
> physical addresses. Thus we need to save these physical addresses. I
> have a patch to save them and pass them to the 2nd kernel in bootparams.

Yep.

> Since the mappings are the same, I wonder if we can calculate the
> physical address from the virtual address. Ideas?

Just look at the loop in efi_enter_virtual_mode() where we iterate over
the regions: we can basically do the same __map_region() calls without
calling phys_efi_set_virtual_address_map().
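A sketch of that approach for the kexec'd kernel, assuming it receives the same memory map and walks the same regions in the same order as the first kernel. Replaying the top-down assignment reproduces every virt_addr, after which a virtual address such as systab->runtime can be translated back to a physical one. struct md, efi_replay_va() and efi_va_to_pa() are hypothetical, and the 64G floor check is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for efi_memory_desc_t with only the fields used. */
struct md {
	uint64_t phys_addr;
	uint64_t num_pages;  /* 4K pages */
	uint64_t virt_addr;
};

#define EFI_VA_START (-4 * (UINT64_C(1) << 30))  /* efi_va's initial value */

/* Re-run the first kernel's top-down assignment: same regions, same order,
 * same VAs. (The EFI_VA_END check is omitted.) */
static void efi_replay_va(struct md *map, size_t n)
{
	uint64_t va = EFI_VA_START;

	for (size_t i = 0; i < n; i++) {
		va -= map[i].num_pages << 12;
		map[i].virt_addr = va;
	}
}

/* Translate a firmware VA (e.g. systab->runtime) back to a physical address. */
static int efi_va_to_pa(const struct md *map, size_t n, uint64_t va,
			uint64_t *pa)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t size = map[i].num_pages << 12;

		if (va >= map[i].virt_addr && va - map[i].virt_addr < size) {
			*pa = map[i].phys_addr + (va - map[i].virt_addr);
			return 0;
		}
	}
	return -1;  /* not inside any mapped region */
}
```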

> Another concern: is it safe for i386 EFI boot?

That's why I didn't put a git tree on k.org - I wanted to run tests
myself before Fengguang's robot :)

But no, 32-bit is not addressed here. Which just dawned on me: Matt, I
probably should keep the ioremapping code for 32-bit, doh. I completely
went 64-bit only here :-)

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20  8:19   ` Dave Young
@ 2013-09-20  9:33     ` Borislav Petkov
  2013-09-20 10:07       ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-09-20  9:33 UTC (permalink / raw)
  To: Dave Young
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On Fri, Sep 20, 2013 at 04:19:40PM +0800, Dave Young wrote:
> Actually the ovmf testing is "qemu-system-x86_64 -kernel ", boot from grub
> fails as well. Nothing printed on serial. I guess '-kernel' is using efi stub
> to boot?

Yes.

Which OVMF are you using? Mine is pretty recent: svn revision 14530 from August.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20  9:05   ` Borislav Petkov
@ 2013-09-20  9:44     ` Matt Fleming
  2013-09-20  9:49     ` Matt Fleming
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Matt Fleming @ 2013-09-20  9:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On Fri, 20 Sep, at 11:05:44AM, Borislav Petkov wrote:
> But no, 32-bit is not addressed here. Which just dawned on me: Matt, I
> probably should keep the ioremapping code for 32-bit, doh. I completely
> went 64-bit only here :-)

Yes, please keep the ioremap code. At least for now.

-- 
Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20  9:05   ` Borislav Petkov
  2013-09-20  9:44     ` Matt Fleming
@ 2013-09-20  9:49     ` Matt Fleming
  2013-09-20 10:02       ` Borislav Petkov
  2013-09-20 11:51     ` Dave Young
  2013-09-20 12:29     ` Matt Fleming
  3 siblings, 1 reply; 102+ messages in thread
From: Matt Fleming @ 2013-09-20  9:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	linux-efi

On Fri, 20 Sep, at 11:05:44AM, Borislav Petkov wrote:
> > Another concern: is it safe for i386 EFI boot?
> 
> That's why I didn't put a git tree on k.org - I wanted to run tests
> myself before Fengguang's robot :)
> 
> But no, 32-bit is not addressed here. Which just dawned on me: Matt, I
> probably should keep the ioremapping code for 32-bit, doh. I completely
> went 64-bit only here :-)

/home/build/git/efi/arch/x86/platform/efi/efi.c: In function ‘__map_region’:
/home/build/git/efi/arch/x86/platform/efi/efi.c:753:24: error: ‘struct real_mode_header’ has no member named ‘trampoline_pgd’
/home/build/git/efi/arch/x86/platform/efi/efi.c: In function ‘efi_enter_virtual_mode’:
/home/build/git/efi/arch/x86/platform/efi/efi.c:863:64: error: ‘struct real_mode_header’ has no member named ‘trampoline_pgd’
/home/build/git/efi/arch/x86/platform/efi/efi.c:867:2: error: implicit declaration of function ‘efi_sync_low_kernel_mappings’
[-Werror=implicit-function-declaration]

-- 
Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20  9:49     ` Matt Fleming
@ 2013-09-20 10:02       ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-20 10:02 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On Fri, Sep 20, 2013 at 10:49:13AM +0100, Matt Fleming wrote:
> /home/build/git/efi/arch/x86/platform/efi/efi.c: In function ‘__map_region’:
> /home/build/git/efi/arch/x86/platform/efi/efi.c:753:24: error: ‘struct real_mode_header’ has no member named ‘trampoline_pgd’
> /home/build/git/efi/arch/x86/platform/efi/efi.c: In function ‘efi_enter_virtual_mode’:
> /home/build/git/efi/arch/x86/platform/efi/efi.c:863:64: error: ‘struct real_mode_header’ has no member named ‘trampoline_pgd’
> /home/build/git/efi/arch/x86/platform/efi/efi.c:867:2: error: implicit declaration of function ‘efi_sync_low_kernel_mappings’
> [-Werror=implicit-function-declaration]

Yep, I know - saw them last night and fixed them. But this place will
need some reorg anyway in the next version - just don't do 32-bit builds
with this one :)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20  9:33     ` Borislav Petkov
@ 2013-09-20 10:07       ` Dave Young
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-09-20 10:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/20/13 at 11:33am, Borislav Petkov wrote:
> On Fri, Sep 20, 2013 at 04:19:40PM +0800, Dave Young wrote:
> > Actually the ovmf testing is "qemu-system-x86_64 -kernel ", boot from grub
> > fails as well. Nothing printed on serial. I guess '-kernel' is using efi stub
> > to boot?
> 
> Yes.
> 
> Which OVMF are you using? Mine is pretty recent: svn revision 14530 from August.

It's a fresh clone at 2013-09-12.

* Re: [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
  2013-09-19 14:54 ` [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE Borislav Petkov
@ 2013-09-20 10:42   ` Matt Fleming
  2013-09-21 15:21     ` Leif Lindholm
  0 siblings, 1 reply; 102+ messages in thread
From: Matt Fleming @ 2013-09-20 10:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi, Leif Lindholm, Roy Franz

On Thu, 19 Sep, at 04:54:45PM, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> ... and use the good old standard defines which we all know. Also,
> simplify math to shift by PAGE_SHIFT instead of multiplying by
> PAGE_SIZE.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/boot/compressed/eboot.c | 12 ++++++------
>  arch/x86/boot/compressed/eboot.h |  1 -
>  arch/x86/platform/efi/efi.c      | 22 +++++++++++-----------
>  include/linux/efi.h              |  6 ++----
>  4 files changed, 19 insertions(+), 22 deletions(-)

I'm pulling in Leif and Roy just so they're aware of this change,
because while PAGE_SHIFT is always 12 on x86, that's not true for arm64.

However, I imagine that a lot of other work would be needed to support
page sizes other than 4K anyway, so I am definitely going to take this
patch.
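
Some context on the arm64 concern: the UEFI specification fixes the EFI page at 4 KiB. A memory descriptor's num_pages is therefore always a count of 4 KiB pages, whatever the kernel's PAGE_SHIFT is. On a kernel built with 64 KiB pages (PAGE_SHIFT of 16, one of the arm64 options), `num_pages << PAGE_SHIFT` would overstate every region sixteen times. Below is a minimal sketch of the EFI-to-native conversion. It mirrors the arithmetic of the pre-patch memrange_efi_to_native(): PFN_UP of the end minus PFN_DOWN of the start, then masking the start address. The native page shift is passed as a parameter here instead of the compile-time PAGE_SHIFT. efi_range_to_native() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12   /* fixed by the UEFI spec: 4 KiB on every architecture */

/*
 * Convert a range given as (start address, count of 4 KiB EFI pages) into the
 * smallest run of native pages of size 1 << page_shift that covers it.
 */
static void efi_range_to_native(uint64_t *addr, uint64_t *npages,
				unsigned int page_shift)
{
	uint64_t page_size = 1ULL << page_shift;
	uint64_t end = *addr + (*npages << EFI_PAGE_SHIFT);      /* EFI units, not native */
	uint64_t first_pfn = *addr >> page_shift;                 /* PFN_DOWN(start) */
	uint64_t last_pfn = (end + page_size - 1) >> page_shift;  /* PFN_UP(end) */

	*npages = last_pfn - first_pfn;
	*addr &= ~(page_size - 1);                                /* PAGE_MASK */
}
```

With page_shift = 16, the 12 KiB range (three EFI pages) at 0x1000 collapses to a single 64 KiB native page at 0. Shifting the page count by the native shift instead, `3 << 16`, would have claimed 192 KiB.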

> diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
> index b7388a425f09..5c440bf769a8 100644
> --- a/arch/x86/boot/compressed/eboot.c
> +++ b/arch/x86/boot/compressed/eboot.c
> @@ -96,7 +96,7 @@ static efi_status_t high_alloc(unsigned long size, unsigned long align,
>  	if (status != EFI_SUCCESS)
>  		goto fail;
>  
> -	nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
> +	nr_pages = round_up(size, PAGE_SIZE) / PAGE_SIZE;
>  again:
>  	for (i = 0; i < map_size / desc_size; i++) {
>  		efi_memory_desc_t *desc;
> @@ -111,7 +111,7 @@ again:
>  			continue;
>  
>  		start = desc->phys_addr;
> -		end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
> +		end = start + (desc->num_pages << PAGE_SHIFT);
>  
>  		if ((start + size) > end || (start + size) > max)
>  			continue;
> @@ -173,7 +173,7 @@ static efi_status_t low_alloc(unsigned long size, unsigned long align,
>  	if (status != EFI_SUCCESS)
>  		goto fail;
>  
> -	nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
> +	nr_pages = round_up(size, PAGE_SIZE) / PAGE_SIZE;
>  	for (i = 0; i < map_size / desc_size; i++) {
>  		efi_memory_desc_t *desc;
>  		unsigned long m = (unsigned long)map;
> @@ -188,7 +188,7 @@ static efi_status_t low_alloc(unsigned long size, unsigned long align,
>  			continue;
>  
>  		start = desc->phys_addr;
> -		end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
> +		end = start + (desc->num_pages << PAGE_SHIFT);
>  
>  		/*
>  		 * Don't allocate at 0x0. It will confuse code that
> @@ -224,7 +224,7 @@ static void low_free(unsigned long size, unsigned long addr)
>  {
>  	unsigned long nr_pages;
>  
> -	nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
> +	nr_pages = round_up(size, PAGE_SIZE) / PAGE_SIZE;
>  	efi_call_phys2(sys_table->boottime->free_pages, addr, nr_pages);
>  }
>  
> @@ -1128,7 +1128,7 @@ static efi_status_t relocate_kernel(struct setup_header *hdr)
>  	 * possible.
>  	 */
>  	start = hdr->pref_address;
> -	nr_pages = round_up(hdr->init_size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
> +	nr_pages = round_up(hdr->init_size, PAGE_SIZE) / PAGE_SIZE;
>  
>  	status = efi_call_phys4(sys_table->boottime->allocate_pages,
>  				EFI_ALLOCATE_ADDRESS, EFI_LOADER_DATA,
> diff --git a/arch/x86/boot/compressed/eboot.h b/arch/x86/boot/compressed/eboot.h
> index e5b0a8f91c5f..786398c1bb9a 100644
> --- a/arch/x86/boot/compressed/eboot.h
> +++ b/arch/x86/boot/compressed/eboot.h
> @@ -11,7 +11,6 @@
>  
>  #define DESC_TYPE_CODE_DATA	(1 << 0)
>  
> -#define EFI_PAGE_SIZE		(1UL << EFI_PAGE_SHIFT)
>  #define EFI_READ_CHUNK_SIZE	(1024 * 1024)
>  
>  #define EFI_CONSOLE_OUT_DEVICE_GUID    \
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 7cec1e9e5494..538c1e6b7b2c 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -339,7 +339,7 @@ static void __init do_add_efi_memmap(void)
>  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
>  		efi_memory_desc_t *md = p;
>  		unsigned long long start = md->phys_addr;
> -		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> +		unsigned long long size = md->num_pages << PAGE_SHIFT;
>  		int e820_type;
>  
>  		switch (md->type) {
> @@ -416,8 +416,8 @@ static void __init print_efi_memmap(void)
>  		pr_info("mem%02u: type=%u, attr=0x%llx, "
>  			"range=[0x%016llx-0x%016llx) (%lluMB)\n",
>  			i, md->type, md->attribute, md->phys_addr,
> -			md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT),
> -			(md->num_pages >> (20 - EFI_PAGE_SHIFT)));
> +			md->phys_addr + (md->num_pages << PAGE_SHIFT),
> +			(md->num_pages >> (20 - PAGE_SHIFT)));
>  	}
>  #endif  /*  EFI_DEBUG  */
>  }
> @@ -429,7 +429,7 @@ void __init efi_reserve_boot_services(void)
>  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
>  		efi_memory_desc_t *md = p;
>  		u64 start = md->phys_addr;
> -		u64 size = md->num_pages << EFI_PAGE_SHIFT;
> +		u64 size = md->num_pages << PAGE_SHIFT;
>  
>  		if (md->type != EFI_BOOT_SERVICES_CODE &&
>  		    md->type != EFI_BOOT_SERVICES_DATA)
> @@ -473,7 +473,7 @@ void __init efi_free_boot_services(void)
>  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
>  		efi_memory_desc_t *md = p;
>  		unsigned long long start = md->phys_addr;
> -		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> +		unsigned long long size = md->num_pages << PAGE_SHIFT;
>  
>  		if (md->type != EFI_BOOT_SERVICES_CODE &&
>  		    md->type != EFI_BOOT_SERVICES_DATA)
> @@ -825,7 +825,7 @@ void __iomem *efi_lookup_mapped_addr(u64 phys_addr)
>  		return NULL;
>  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
>  		efi_memory_desc_t *md = p;
> -		u64 size = md->num_pages << EFI_PAGE_SHIFT;
> +		u64 size = md->num_pages << PAGE_SHIFT;
>  		u64 end = md->phys_addr + size;
>  		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
>  		    md->type != EFI_BOOT_SERVICES_CODE &&
> @@ -843,7 +843,7 @@ void __iomem *efi_lookup_mapped_addr(u64 phys_addr)
>  
>  void efi_memory_uc(u64 addr, unsigned long size)
>  {
> -	unsigned long page_shift = 1UL << EFI_PAGE_SHIFT;
> +	unsigned long page_shift = 1UL << PAGE_SHIFT;
>  	u64 npages;
>  
>  	npages = round_up(size, page_shift) / page_shift;
> @@ -896,7 +896,7 @@ void __init efi_enter_virtual_mode(void)
>  			continue;
>  		}
>  
> -		prev_size = prev_md->num_pages << EFI_PAGE_SHIFT;
> +		prev_size = prev_md->num_pages << PAGE_SHIFT;
>  
>  		if (md->phys_addr == (prev_md->phys_addr + prev_size)) {
>  			prev_md->num_pages += md->num_pages;
> @@ -914,7 +914,7 @@ void __init efi_enter_virtual_mode(void)
>  		    md->type != EFI_BOOT_SERVICES_DATA)
>  			continue;
>  
> -		size = md->num_pages << EFI_PAGE_SHIFT;
> +		size = md->num_pages << PAGE_SHIFT;
>  		end = md->phys_addr + size;
>  
>  		start_pfn = PFN_DOWN(md->phys_addr);
> @@ -1011,7 +1011,7 @@ u32 efi_mem_type(unsigned long phys_addr)
>  		md = p;
>  		if ((md->phys_addr <= phys_addr) &&
>  		    (phys_addr < (md->phys_addr +
> -				  (md->num_pages << EFI_PAGE_SHIFT))))
> +				  (md->num_pages << PAGE_SHIFT))))
>  			return md->type;
>  	}
>  	return 0;
> @@ -1026,7 +1026,7 @@ u64 efi_mem_attributes(unsigned long phys_addr)
>  		md = p;
>  		if ((md->phys_addr <= phys_addr) &&
>  		    (phys_addr < (md->phys_addr +
> -				  (md->num_pages << EFI_PAGE_SHIFT))))
> +				  (md->num_pages << PAGE_SHIFT))))
>  			return md->attribute;
>  	}
>  	return 0;
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 5f8f176154f7..fa47d80ab4b5 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -95,8 +95,6 @@ typedef	struct {
>  #define EFI_MEMORY_RUNTIME	((u64)0x8000000000000000ULL)	/* range requires runtime mapping */
>  #define EFI_MEMORY_DESCRIPTOR_VERSION	1
>  
> -#define EFI_PAGE_SHIFT		12
> -
>  typedef struct {
>  	u32 type;
>  	u32 pad;
> @@ -611,7 +609,7 @@ static inline int efi_range_is_wc(unsigned long start, unsigned long len)
>  {
>  	unsigned long i;
>  
> -	for (i = 0; i < len; i += (1UL << EFI_PAGE_SHIFT)) {
> +	for (i = 0; i < len; i += PAGE_SIZE) {
>  		unsigned long paddr = __pa(start + i);
>  		if (!(efi_mem_attributes(paddr) & EFI_MEMORY_WC))
>  			return 0;
> @@ -728,7 +726,7 @@ struct efi_generic_dev_path {
>  
>  static inline void memrange_efi_to_native(u64 *addr, u64 *npages)
>  {
> -	*npages = PFN_UP(*addr + (*npages<<EFI_PAGE_SHIFT)) - PFN_DOWN(*addr);
> +	*npages = PFN_UP(*addr + (*npages << PAGE_SHIFT)) - PFN_DOWN(*addr);
>  	*addr &= PAGE_MASK;
>  }
>  
> -- 
> 1.8.4
> 
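
The patch description says the size computations now shift by PAGE_SHIFT instead of multiplying by the page size. num_pages is an unsigned 64-bit field, so for a shift of 12 the two forms give identical results, including identical wraparound. round_up(size, PAGE_SIZE) / PAGE_SIZE is simply a round-up division by a power of two. The small standalone check below uses hypothetical SKETCH_* constants in place of the kernel's macros. The equivalence with the old EFI_PAGE_SHIFT code only holds while PAGE_SHIFT equals the 4 KiB EFI page shift, which is the case on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for x86's PAGE_SHIFT / PAGE_SIZE. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Old style: page count times page size. */
static uint64_t bytes_mul(uint64_t num_pages)
{
	return num_pages * (1ULL << SKETCH_PAGE_SHIFT);
}

/* New style: page count shifted left by the page shift. */
static uint64_t bytes_shift(uint64_t num_pages)
{
	return num_pages << SKETCH_PAGE_SHIFT;
}

/* round_up(size, PAGE_SIZE) / PAGE_SIZE: round-up division by a power of two. */
static uint64_t pages_for(uint64_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```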

-- 
Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20  9:05   ` Borislav Petkov
  2013-09-20  9:44     ` Matt Fleming
  2013-09-20  9:49     ` Matt Fleming
@ 2013-09-20 11:51     ` Dave Young
  2013-09-20 12:29     ` Matt Fleming
  3 siblings, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-09-20 11:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/20/13 at 11:05am, Borislav Petkov wrote:
> On Fri, Sep 20, 2013 at 03:29:04PM +0800, Dave Young wrote:
> > Just tested this series. For the 1st kernel, it boots ok in qemu+ovmf,
> > but it immediately reboots on my ThinkPad T420. Unfortunately there's
> > no way to debug this very early problem: there's no serial port, and
> > earlyprintk does not work for EFI boot. No USB debug on this machine
> > either. I will test it when I go back to work after the China
> > holiday.
> 
> Hmm, I'm booting with the efi boot stub, how do you do it?

Just a Fedora 19 grub boot.

> 
> > OTOH, for 2nd kernel testing: kexec-tools does not fill
> > efi_info[] in bootparam, so the kernel will disable EFI; it also passes
> > the acpi_rsdp pointer automatically so that the 2nd kernel boots ok.
> 
> Right, the way this could be done is to pass in efi_info.efi_memmap,
> i.e. the physical map and then iterate over it and compute the virtual
> addresses *without* calling phys_efi_set_virtual_address_map() - they
> are stable now.
> 
> > I tested with a user space patch which copies efi_info from the 1st kernel
> > to bootparams. As I said previously, this is not enough because several
> > systab fields (fw_vendor, runtime and tables) are converted to
> > virtual addresses, but the kernel's EFI init function assumes they are
> > physical addresses. Thus we need to save these physical addresses. I have a
> > patch to save them and pass them to the 2nd kernel in bootparams.
> 
> Yep.
> 
> > Since the mappings are the same, I wonder if we can calculate the physical
> > address from the virtual address. Any ideas?
> 
> Just look at the loop where we're iterating over regions in
> efi_enter_virtual_mode(): we basically can do the same __map_region
> calls without calling phys_efi_set_virtual_address_map.

Sorry, I don't understand what you mean by "do the same __map_region calls".

See the code below:
        /*
         * Show what we know for posterity
         */
        c16 = tmp = early_ioremap(efi.systab->fw_vendor, 2);

efi.systab->fw_vendor is a virtual address after entering virtual mode,
so the 2nd kernel cannot ioremap it.

efi_init() runs before efi_enter_virtual_mode(). Do you mean we should move
the memory mapping code earlier so that we can use fw_vendor directly as a
virtual address?
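
On recovering a physical address from a virtual one: the runtime regions are mapped linearly. Once every descriptor in the memory map carries both phys_addr and virt_addr, a virtual pointer such as fw_vendor can be translated back by finding the descriptor whose virtual range contains it. Below is a minimal sketch. It assumes the second kernel can see the first kernel's memory map with virt_addr filled in, for example passed in the boot parameters as discussed above. struct md and efi_va_to_pa() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12   /* descriptor page counts are in 4 KiB units */

/* Hypothetical, trimmed-down efi_memory_desc_t with only the fields used here. */
struct md {
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
};

/*
 * Translate a firmware virtual address (e.g. systab->fw_vendor after
 * SetVirtualAddressMap()) back to a physical address. Each region is mapped
 * linearly, so the offset inside the containing descriptor is all we need.
 * Returns 0 and fills *pa on success, -1 if no descriptor covers va.
 */
static int efi_va_to_pa(const struct md *map, size_t n, uint64_t va, uint64_t *pa)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t size = map[i].num_pages << EFI_PAGE_SHIFT;

		/* half-open [virt_addr, virt_addr + size), checked without
		 * computing virt_addr + size, which could overflow */
		if (va >= map[i].virt_addr && va - map[i].virt_addr < size) {
			*pa = map[i].phys_addr + (va - map[i].virt_addr);
			return 0;
		}
	}
	return -1;
}
```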

* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20  9:05   ` Borislav Petkov
                       ` (2 preceding siblings ...)
  2013-09-20 11:51     ` Dave Young
@ 2013-09-20 12:29     ` Matt Fleming
  2013-09-20 14:04       ` Dave Young
  3 siblings, 1 reply; 102+ messages in thread
From: Matt Fleming @ 2013-09-20 12:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	linux-efi

On Fri, 20 Sep, at 11:05:44AM, Borislav Petkov wrote:
> On Fri, Sep 20, 2013 at 03:29:04PM +0800, Dave Young wrote:
> > Just tested this series. For the 1st kernel, it boots ok in qemu+ovmf,
> > but it immediately reboots on my ThinkPad T420. Unfortunately there's
> > no way to debug this very early problem: there's no serial port, and
> > earlyprintk does not work for EFI boot. No USB debug on this machine
> > either. I will test it when I go back to work after the China
> > holiday.
> 
> Hmm, I'm booting with the efi boot stub, how do you do it?

Dave, could you try this patch?

---

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 06e71c2..9bcc15c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -207,6 +207,8 @@ ENTRY(startup_64)
 	jmp	preferred_addr
 
 ENTRY(efi_pe_entry)
+	movq	%cr3, %r15
+	movq	%r15, efi_scratch+16(%rip)
 	mov	%rcx, %rdi
 	mov	%rdx, %rsi
 	pushq	%rdi
@@ -219,6 +221,8 @@ ENTRY(efi_pe_entry)
 	popq	%rdi
 
 ENTRY(efi_stub_entry)
+	movq	%cr3, %r15
+	movq	%r15, efi_scratch+16(%rip)
 	call	efi_main
 	movq	%rax,%rsi
 	cmpq	$0,%rax
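
The two added instructions store the CR3 value that is live at stub entry into efi_scratch at byte offset 16. Assembly addresses the struct by raw offset, so that offset silently changes meaning if the C definition is ever reordered. The sketch below writes the same store in C and adds a compile-time check that pins the offset. It uses the struct efi_scratch layout from the -v2 patch later in this thread, where byte 16 is efi_pgt, plus a stand-in pgd_t. Whether the -v1 struct had the same layout is not shown here, and store_at_offset16() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's pgd_t; only a pointer to it is stored. */
typedef struct { uint64_t pgd; } pgd_t;

/* Layout copied from the efi_scratch definition in the -v2 patch. */
struct efi_scratch {
	uint64_t r15;
	uint64_t prev_cr3;
	pgd_t *efi_pgt;
	bool use_pgd;
};

/* Fail the build, not the boot, if the field behind "efi_scratch+16" moves. */
_Static_assert(offsetof(struct efi_scratch, efi_pgt) == 16,
	       "head_64.S stores to efi_scratch+16");

/* C equivalent of "movq %r15, efi_scratch+16(%rip)": write a raw
 * pointer-sized value at byte offset 16 of the struct. */
static void store_at_offset16(struct efi_scratch *s, uintptr_t val)
{
	memcpy((char *)s + 16, &val, sizeof(val));
}
```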

-- 
Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-20 12:29     ` Matt Fleming
@ 2013-09-20 14:04       ` Dave Young
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-09-20 14:04 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/20/13 at 01:29pm, Matt Fleming wrote:
> On Fri, 20 Sep, at 11:05:44AM, Borislav Petkov wrote:
> > On Fri, Sep 20, 2013 at 03:29:04PM +0800, Dave Young wrote:
> > > Just tested this series. For the 1st kernel, it boots ok in qemu+ovmf,
> > > but it immediately reboots on my ThinkPad T420. Unfortunately there's
> > > no way to debug this very early problem: there's no serial port, and
> > > earlyprintk does not work for EFI boot. No USB debug on this machine
> > > either. I will test it when I go back to work after the China
> > > holiday.
> > 
> > Hmm, I'm booting with the efi boot stub, how do you do it?
> 
> Dave, could you try this patch?

Matt,

It works for me, thanks for the quick fix.

* [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-19 14:54 ` [PATCH 11/11] EFI: Runtime services virtual mapping Borislav Petkov
@ 2013-09-21 11:39   ` Borislav Petkov
  2013-09-22 12:35     ` Dave Young
                       ` (3 more replies)
  0 siblings, 4 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-21 11:39 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

On Thu, Sep 19, 2013 at 04:54:54PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> We map the EFI regions needed for runtime services contiguously on
> virtual addresses starting from -4G down for a total max space of 64G.
> This way, we provide for stable runtime services addresses across
> kernels so that a kexec'd kernel can still use them.
> 
> This way, they're mapped in a separate pagetable so that we don't
> pollute the kernel namespace (you can see how the whole ioremapping and
> saving and restoring of PGDs is gone now).

Ok, this one was not so good, let's try again:

This time I kept the 32-bit code path and switch the pagetable only
after having built it properly. This boots fine again on bare metal and
on OVMF with Matt's handover flags fix from yesterday.

Also, I've uploaded the whole series to
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git, branch
efi-experimental

("-experimental" doesn't trigger Fengguang's robot :-))

Good luck! :-)
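
The new _efi_call_virtX() wrapper in the patch below is a GNU statement expression. It runs efi_sync_low_kernel_mappings(), disables preemption around the firmware call, and yields the call's status as the value of the whole expression. `efi_call##x` pastes the arity into the helper name. The userspace sketch below follows the same pattern. It uses hypothetical stand-ins for the kernel hooks (a counter in place of preempt_disable()/preempt_enable()) and typed function pointers instead of the patch's `void *` cast. One detail: with a plain `__VA_ARGS__`, the zero-argument form `_efi_call_virtX(0, f)` expands to `efi_call0(..., )`, which would not compile if efi_call_virt0() were ever used. The sketch therefore uses GCC's `, ##__VA_ARGS__`, which drops the comma when the variadic arguments are omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t efi_status_t;

/* Hypothetical stand-ins for the kernel hooks used by the patch's wrapper. */
static int sync_calls;          /* counts sync_low_kernel_mappings() calls */
static int preempt_depth;       /* models preempt_disable()/preempt_enable() */
static int depth_seen;          /* preempt_depth observed inside a service */

static void sync_low_kernel_mappings(void) { sync_calls++; }

/* Trimmed-down runtime services table with typed function pointers. */
struct runtime_services {
	efi_status_t (*get_next_count)(void);
	efi_status_t (*get_time)(uint64_t tm, uint64_t tc);
};
static struct runtime_services rt;

/* Per-arity call helpers, playing the role of efi_call0()/efi_call2(). */
static efi_status_t call0(efi_status_t (*fp)(void)) { return fp(); }
static efi_status_t call2(efi_status_t (*fp)(uint64_t, uint64_t),
			  uint64_t a1, uint64_t a2) { return fp(a1, a2); }

/* Statement expression: hooks around the call, status as its value. */
#define call_virtX(x, f, ...)				\
({							\
	efi_status_t __s;				\
							\
	sync_low_kernel_mappings();			\
	preempt_depth++;				\
	__s = call##x(rt.f, ##__VA_ARGS__);		\
	preempt_depth--;				\
	__s;						\
})

#define call_virt0(f)		call_virtX(0, f)
#define call_virt2(f, a1, a2)	call_virtX(2, f, (uint64_t)(a1), (uint64_t)(a2))

/* Fake "firmware" services that record the preemption depth they ran at. */
static efi_status_t fake_count(void) { depth_seen = preempt_depth; return 7; }
static efi_status_t fake_time(uint64_t tm, uint64_t tc)
{
	depth_seen = preempt_depth;
	return tm + tc;
}
```

Because the whole wrapper is an expression, callers can write `status = efi_call_virt2(...)` exactly as before, and the pagetable sync and preemption handling cannot be forgotten at individual call sites.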

---
>From 880fcee20209a122eda846e7f109776ed1c56de5 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@suse.de>
Date: Wed, 18 Sep 2013 17:35:42 +0200
Subject: [PATCH] EFI: Runtime services virtual mapping

We map the EFI regions needed for runtime services contiguously on
virtual addresses starting from -4G down for a total max space of 64G.
This way, we provide for stable runtime services addresses across
kernels so that a kexec'd kernel can still use them.

Also, they're mapped in a separate pagetable so that we don't
pollute the kernel address space (you can see how the whole ioremapping
and saving and restoring of PGDs is gone now).

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/efi.h           | 43 ++++++++++--------
 arch/x86/include/asm/pgtable_types.h |  3 +-
 arch/x86/platform/efi/efi.c          | 68 ++++++++++++-----------------
 arch/x86/platform/efi/efi_32.c       | 29 +++++++++++-
 arch/x86/platform/efi/efi_64.c       | 85 +++++++++++++++++++-----------------
 arch/x86/platform/efi/efi_stub_64.S  | 53 ++++++++++++++++++++++
 6 files changed, 181 insertions(+), 100 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 0062a0125041..9a99e0499e4b 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -69,24 +69,31 @@ extern u64 efi_call6(void *fp, u64 arg1, u64 arg2, u64 arg3,
 	efi_call6((f), (u64)(a1), (u64)(a2), (u64)(a3),		\
 		  (u64)(a4), (u64)(a5), (u64)(a6))
 
+#define _efi_call_virtX(x, f, ...)					\
+({									\
+	efi_status_t __s;						\
+									\
+	efi_sync_low_kernel_mappings();					\
+	preempt_disable();						\
+	__s = efi_call##x((void *)efi.systab->runtime->f, __VA_ARGS__);	\
+	preempt_enable();						\
+	__s;								\
+})
+
 #define efi_call_virt0(f)				\
-	efi_call0((efi.systab->runtime->f))
-#define efi_call_virt1(f, a1)					\
-	efi_call1((efi.systab->runtime->f), (u64)(a1))
-#define efi_call_virt2(f, a1, a2)					\
-	efi_call2((efi.systab->runtime->f), (u64)(a1), (u64)(a2))
-#define efi_call_virt3(f, a1, a2, a3)					\
-	efi_call3((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3))
-#define efi_call_virt4(f, a1, a2, a3, a4)				\
-	efi_call4((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3), (u64)(a4))
-#define efi_call_virt5(f, a1, a2, a3, a4, a5)				\
-	efi_call5((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3), (u64)(a4), (u64)(a5))
-#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)			\
-	efi_call6((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
+	_efi_call_virtX(0, f)
+#define efi_call_virt1(f, a1)				\
+	_efi_call_virtX(1, f, (u64)(a1))
+#define efi_call_virt2(f, a1, a2)			\
+	_efi_call_virtX(2, f, (u64)(a1), (u64)(a2))
+#define efi_call_virt3(f, a1, a2, a3)			\
+	_efi_call_virtX(3, f, (u64)(a1), (u64)(a2), (u64)(a3))
+#define efi_call_virt4(f, a1, a2, a3, a4)		\
+	_efi_call_virtX(4, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4))
+#define efi_call_virt5(f, a1, a2, a3, a4, a5)		\
+	_efi_call_virtX(5, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5))
+#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)	\
+	_efi_call_virtX(6, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
 
 extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
 				 u32 type, u64 attribute);
@@ -101,6 +108,8 @@ extern void efi_call_phys_prelog(void);
 extern void efi_call_phys_epilog(void);
 extern void efi_unmap_memmap(void);
 extern void efi_memory_uc(u64 addr, unsigned long size);
+extern void __init efi_map_region(efi_memory_desc_t *md);
+extern void efi_sync_low_kernel_mappings(void);
 
 #ifdef CONFIG_EFI
 
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 0ecac257fb26..a83aa44bb1fb 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -382,7 +382,8 @@ static inline void update_page_count(int level, unsigned long pages) { }
  */
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
 extern phys_addr_t slow_virt_to_phys(void *__address);
-
+extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+				   unsigned numpages, unsigned long page_flags);
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 538c1e6b7b2c..90459f5f587c 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -12,6 +12,8 @@
  *	Bibo Mao <bibo.mao@intel.com>
  *	Chandramouli Narayanan <mouli@linux.intel.com>
  *	Huang Ying <ying.huang@intel.com>
+ * Copyright (C) 2013 SuSE Labs
+ * 	Borislav Petkov <bp@suse.de> - runtime services VA mapping
  *
  * Copied from efi_32.c to eliminate the duplicated code between EFI
  * 32/64 support code. --ying 2007-10-26
@@ -81,6 +83,17 @@ static efi_system_table_t efi_systab __initdata;
 unsigned long x86_efi_facility;
 
 /*
+ * Scratch space used for switching the pagetable in the EFI stub
+ */
+struct efi_scratch {
+	u64 r15;
+	u64 prev_cr3;
+	pgd_t *efi_pgt;
+	bool use_pgd;
+};
+extern struct efi_scratch efi_scratch;
+
+/*
  * Returns 1 if 'facility' is enabled, 0 otherwise.
  */
 int efi_enabled(int facility)
@@ -797,22 +810,6 @@ void __init efi_set_executable(efi_memory_desc_t *md, bool executable)
 		set_memory_nx(addr, npages);
 }
 
-static void __init runtime_code_page_mkexec(void)
-{
-	efi_memory_desc_t *md;
-	void *p;
-
-	/* Make EFI runtime service code area executable */
-	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
-		md = p;
-
-		if (md->type != EFI_RUNTIME_SERVICES_CODE)
-			continue;
-
-		efi_set_executable(md, true);
-	}
-}
-
 /*
  * We can't ioremap data in EFI boot services RAM, because we've already mapped
  * it as RAM.  So, look it up in the existing EFI memory map instead.  Only
@@ -862,10 +859,10 @@ void efi_memory_uc(u64 addr, unsigned long size)
 void __init efi_enter_virtual_mode(void)
 {
 	efi_memory_desc_t *md, *prev_md = NULL;
-	efi_status_t status;
+	void *p, *new_memmap = NULL;
 	unsigned long size;
-	u64 end, systab, start_pfn, end_pfn;
-	void *p, *va, *new_memmap = NULL;
+	efi_status_t status;
+	u64 end, systab;
 	int count = 0;
 
 	efi.systab = NULL;
@@ -874,7 +871,6 @@ void __init efi_enter_virtual_mode(void)
 	 * We don't do virtual mode, since we don't do runtime services, on
 	 * non-native EFI
 	 */
-
 	if (!efi_is_native()) {
 		efi_unmap_memmap();
 		return;
@@ -914,33 +910,18 @@ void __init efi_enter_virtual_mode(void)
 		    md->type != EFI_BOOT_SERVICES_DATA)
 			continue;
 
+		efi_map_region(md);
+
 		size = md->num_pages << PAGE_SHIFT;
 		end = md->phys_addr + size;
 
-		start_pfn = PFN_DOWN(md->phys_addr);
-		end_pfn = PFN_UP(end);
-		if (pfn_range_is_mapped(start_pfn, end_pfn)) {
-			va = __va(md->phys_addr);
-
-			if (!(md->attribute & EFI_MEMORY_WB))
-				efi_memory_uc((u64)(unsigned long)va, size);
-		} else
-			va = efi_ioremap(md->phys_addr, size,
-					 md->type, md->attribute);
-
-		md->virt_addr = (u64) (unsigned long) va;
-
-		if (!va) {
-			pr_err("ioremap of 0x%llX failed!\n",
-			       (unsigned long long)md->phys_addr);
-			continue;
-		}
-
 		systab = (u64) (unsigned long) efi_phys.systab;
 		if (md->phys_addr <= systab && systab < end) {
 			systab += md->virt_addr - md->phys_addr;
+
 			efi.systab = (efi_system_table_t *) (unsigned long) systab;
 		}
+
 		new_memmap = krealloc(new_memmap,
 				      (count + 1) * memmap.desc_size,
 				      GFP_KERNEL);
@@ -949,8 +930,15 @@ void __init efi_enter_virtual_mode(void)
 		count++;
 	}
 
+#ifdef CONFIG_X86_64
+	efi_scratch.efi_pgt = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
+	efi_scratch.use_pgd = true;
+#endif
+
 	BUG_ON(!efi.systab);
 
+	efi_sync_low_kernel_mappings();
+
 	status = phys_efi_set_virtual_address_map(
 		memmap.desc_size * count,
 		memmap.desc_size,
@@ -983,8 +971,6 @@ void __init efi_enter_virtual_mode(void)
 	efi.query_variable_info = virt_efi_query_variable_info;
 	efi.update_capsule = virt_efi_update_capsule;
 	efi.query_capsule_caps = virt_efi_query_capsule_caps;
-	if (__supported_pte_mask & _PAGE_NX)
-		runtime_code_page_mkexec();
 
 	kfree(new_memmap);
 
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 40e446941dd7..661663b08eaf 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -37,9 +37,36 @@
  * claim EFI runtime service handler exclusively and to duplicate a memory in
  * low memory space say 0 - 3G.
  */
-
 static unsigned long efi_rt_eflags;
 
+void efi_sync_low_kernel_mappings(void) {}
+
+void __init efi_map_region(efi_memory_desc_t *md)
+{
+	u64 start_pfn, end_pfn, end;
+	unsigned long size;
+	void *va;
+
+	start_pfn = PFN_DOWN(md->phys_addr);
+	size	  = md->num_pages << PAGE_SHIFT;
+	end	  = md->phys_addr + size;
+	end_pfn   = PFN_UP(end);
+
+	if (pfn_range_is_mapped(start_pfn, end_pfn)) {
+		va = __va(md->phys_addr);
+
+		if (!(md->attribute & EFI_MEMORY_WB))
+			efi_memory_uc((u64)(unsigned long)va, size);
+	} else
+		va = efi_ioremap(md->phys_addr, size,
+				 md->type, md->attribute);
+
+	md->virt_addr = (u64) (unsigned long) va;
+	if (!va)
+		pr_err("ioremap of 0x%llX failed!\n",
+		       (unsigned long long)md->phys_addr);
+}
+
 void efi_call_phys_prelog(void)
 {
 	struct desc_ptr gdt_descr;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 39a0e7f1f0a3..db5230dd350e 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -39,59 +39,64 @@
 #include <asm/cacheflush.h>
 #include <asm/fixmap.h>
 
-static pgd_t *save_pgd __initdata;
-static unsigned long efi_flags __initdata;
+void __init efi_call_phys_prelog(void) {}
+void __init efi_call_phys_epilog(void) {}
 
-static void __init early_code_mapping_set_exec(int executable)
+/*
+ * Add low kernel mappings for passing arguments to EFI functions.
+ */
+void efi_sync_low_kernel_mappings(void)
 {
-	efi_memory_desc_t *md;
-	void *p;
+	unsigned num_pgds;
+	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
 
-	if (!(__supported_pte_mask & _PAGE_NX))
-		return;
+	num_pgds = pgd_index(VMALLOC_START - 1) - pgd_index(PAGE_OFFSET);
 
-	/* Make EFI service code area executable */
-	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
-		md = p;
-		if (md->type == EFI_RUNTIME_SERVICES_CODE ||
-		    md->type == EFI_BOOT_SERVICES_CODE)
-			efi_set_executable(md, executable);
-	}
+	memcpy(pgd + pgd_index(PAGE_OFFSET),
+		init_mm.pgd + pgd_index(PAGE_OFFSET),
+		sizeof(pgd_t) * num_pgds);
 }
 
-void __init efi_call_phys_prelog(void)
+/*
+ * We allocate runtime services regions top-down, starting from -4G, i.e.
+ * 0xffff_ffff_0000_0000 and limit EFI VA mapping space to 64G.
+ */
+static u64 efi_va	 = -4 * (1UL << 30);
+#define EFI_VA_END	 (-68 * (1UL << 30))
+
+static void __init __map_region(efi_memory_desc_t *md, u64 va)
 {
-	unsigned long vaddress;
-	int pgd;
-	int n_pgds;
+	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+	unsigned long pf = 0, size;
+	u64 end;
 
-	early_code_mapping_set_exec(1);
-	local_irq_save(efi_flags);
+	if (!(md->attribute & EFI_MEMORY_WB))
+		pf |= _PAGE_PCD;
 
-	n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
-	save_pgd = kmalloc(n_pgds * sizeof(pgd_t), GFP_KERNEL);
+	size = md->num_pages << PAGE_SHIFT;
+	end  = va + size;
 
-	for (pgd = 0; pgd < n_pgds; pgd++) {
-		save_pgd[pgd] = *pgd_offset_k(pgd * PGDIR_SIZE);
-		vaddress = (unsigned long)__va(pgd * PGDIR_SIZE);
-		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), *pgd_offset_k(vaddress));
-	}
-	__flush_tlb_all();
+	if(kernel_map_pages_in_pgd(pgd, md->phys_addr, va, md->num_pages, pf))
+		pr_warning("Error mapping PA 0x%llx -> VA 0x%llx!\n",
+			   md->phys_addr, va);
 }
 
-void __init efi_call_phys_epilog(void)
+void __init efi_map_region(efi_memory_desc_t *md)
 {
-	/*
-	 * After the lock is released, the original page table is restored.
-	 */
-	int pgd;
-	int n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT) , PGDIR_SIZE);
-	for (pgd = 0; pgd < n_pgds; pgd++)
-		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), save_pgd[pgd]);
-	kfree(save_pgd);
-	__flush_tlb_all();
-	local_irq_restore(efi_flags);
-	early_code_mapping_set_exec(0);
+	unsigned long size = md->num_pages << PAGE_SHIFT;
+
+	efi_va -= size;
+	if (efi_va < EFI_VA_END) {
+		pr_warning(FW_WARN "VA address range overflow!\n");
+		return;
+	}
+
+	/* Do the 1:1 map */
+	__map_region(md, md->phys_addr);
+
+	/* Do the VA map */
+	__map_region(md, efi_va);
+	md->virt_addr = efi_va;
 }
 
 void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
index 4c07ccab8146..2bb9714bf713 100644
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -34,10 +34,47 @@
 	mov %rsi, %cr0;			\
 	mov (%rsp), %rsp
 
+	/* stolen from gcc */
+	.macro FLUSH_TLB_ALL
+	movq %r15, efi_scratch(%rip)
+	movq %r14, efi_scratch+8(%rip)
+	movq %cr4, %r15
+	movq %r15, %r14
+	andb $0x7f, %r14b
+	movq %r14, %cr4
+	movq %r15, %cr4
+	movq efi_scratch+8(%rip), %r14
+	movq efi_scratch(%rip), %r15
+	.endm
+
+	.macro SWITCH_PGT
+	cmpb $0, efi_scratch+24(%rip)
+	je 1f
+	movq %r15, efi_scratch(%rip)		# r15
+	# save previous CR3
+	movq %cr3, %r15
+	movq %r15, efi_scratch+8(%rip)		# prev_cr3
+	movq efi_scratch+16(%rip), %r15		# EFI pgt
+	movq %r15, %cr3
+	1:
+	.endm
+
+	.macro RESTORE_PGT
+	cmpb $0, efi_scratch+24(%rip)
+	je 2f
+	movq efi_scratch+8(%rip), %r15
+	movq %r15, %cr3
+	movq efi_scratch(%rip), %r15
+	FLUSH_TLB_ALL
+	2:
+	.endm
+
 ENTRY(efi_call0)
 	SAVE_XMM
 	subq $32, %rsp
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -47,7 +84,9 @@ ENTRY(efi_call1)
 	SAVE_XMM
 	subq $32, %rsp
 	mov  %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -57,7 +96,9 @@ ENTRY(efi_call2)
 	SAVE_XMM
 	subq $32, %rsp
 	mov  %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -68,7 +109,9 @@ ENTRY(efi_call3)
 	subq $32, %rsp
 	mov  %rcx, %r8
 	mov  %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -80,7 +123,9 @@ ENTRY(efi_call4)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -93,7 +138,9 @@ ENTRY(efi_call5)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $48, %rsp
 	RESTORE_XMM
 	ret
@@ -109,8 +156,14 @@ ENTRY(efi_call6)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $48, %rsp
 	RESTORE_XMM
 	ret
 ENDPROC(efi_call6)
+
+	.data
+ENTRY(efi_scratch)
+	.fill 3,8,0
-- 
1.8.4
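The top-down placement in efi_map_region() above reduces to a cursor that starts at -4G (0xffffffff00000000) and may not drop below -68G. Below is a minimal user-space sketch of only that bookkeeping. It assumes 4K pages and touches no page tables. `efi_va_alloc()`, `EFI_VA_START` and `SKETCH_PAGE_SHIFT` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                      /* assumption: 4K pages */
#define EFI_VA_START      (0ULL - (4ULL << 30))   /* -4G:  0xffffffff00000000 */
#define EFI_VA_END        (0ULL - (68ULL << 30))  /* -68G: 0xffffffef00000000 */

static_assert(EFI_VA_START - EFI_VA_END == (64ULL << 30),
	      "EFI VA window spans 64G");

/* Next free EFI virtual address; regions are handed out top-down. */
static uint64_t efi_va = EFI_VA_START;

/*
 * Hypothetical helper: carve num_pages pages out of the window just below
 * the cursor. On success, store the region's start VA in *va and return
 * true; if the region would cross EFI_VA_END, return false and leave both
 * the cursor and *va untouched.
 */
static bool efi_va_alloc(uint64_t num_pages, uint64_t *va)
{
	uint64_t size = num_pages << SKETCH_PAGE_SHIFT;

	if (size > efi_va - EFI_VA_END)   /* remaining space in the window */
		return false;

	efi_va -= size;
	*va = efi_va;
	return true;
}
```

efi_map_region() as posted decrements efi_va before comparing it against EFI_VA_END. The sketch differs: it leaves the cursor untouched when a region does not fit, so a later, smaller region can still be placed.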

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

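SWITCH_PGT and RESTORE_PGT reach into efi_scratch by raw byte offsets (+0, +8, +16, +24). The assembly and the C-side struct efi_scratch (quoted in the -v2 posting later in this thread) therefore have to agree on the layout. Below is a sketch that pins those offsets down with offsetof(). It assumes an LP64 target such as x86-64, and `void *` stands in for `pgd_t *` so the code builds outside the kernel. `efi_scratch_min_bytes` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct efi_scratch; void * replaces pgd_t * (assumption: LP64). */
struct efi_scratch {
	uint64_t r15;       /* efi_scratch+0:  saved %r15                */
	uint64_t prev_cr3;  /* efi_scratch+8:  CR3 to restore afterwards */
	void *efi_pgt;      /* efi_scratch+16: CR3 value for EFI calls   */
	bool use_pgd;       /* efi_scratch+24: tested with cmpb $0       */
};

/* Offsets hard-coded in SWITCH_PGT / RESTORE_PGT. */
static_assert(offsetof(struct efi_scratch, r15) == 0, "r15 at +0");
static_assert(offsetof(struct efi_scratch, prev_cr3) == 8, "prev_cr3 at +8");
static_assert(offsetof(struct efi_scratch, efi_pgt) == 16, "efi_pgt at +16");
static_assert(offsetof(struct efi_scratch, use_pgd) == 24, "use_pgd at +24");

/* Minimum size the assembly-side object must reserve: up to and including use_pgd. */
static const size_t efi_scratch_min_bytes =
	offsetof(struct efi_scratch, use_pgd) + sizeof(bool);
```

With this layout, use_pgd sits at offset 24, so the object defined in the stub needs at least 25 bytes. `.fill 3,8,0` reserves exactly 24 (three 8-byte zeros).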

* Re: [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
  2013-09-20 10:42   ` Matt Fleming
@ 2013-09-21 15:21     ` Leif Lindholm
  2013-09-21 15:41       ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Leif Lindholm @ 2013-09-21 15:21 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Mark Salter, Borislav Petkov, X86 ML, LKML, Borislav Petkov,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	Dave Young, linux-efi, Roy Franz

On Fri, Sep 20, 2013 at 11:42:49AM +0100, Matt Fleming wrote:
> On Thu, 19 Sep, at 04:54:45PM, Borislav Petkov wrote:
> > From: Borislav Petkov <bp@suse.de>
> > 
> > ... and use the good old standard defines which we all know. Also,
> > simplify math to shift by PAGE_SHIFT instead of multiplying by
> > PAGE_SIZE.
> > 
> > Signed-off-by: Borislav Petkov <bp@suse.de>
> > ---
> >  arch/x86/boot/compressed/eboot.c | 12 ++++++------
> >  arch/x86/boot/compressed/eboot.h |  1 -
> >  arch/x86/platform/efi/efi.c      | 22 +++++++++++-----------
> >  include/linux/efi.h              |  6 ++----
> >  4 files changed, 19 insertions(+), 22 deletions(-)
> 
> I'm pulling in Leif and Roy just so they're aware of this change,
> because while PAGE_SHIFT is always 12 on x86, that's not true for arm64.

Thank you.
Also adding Mark.
 
> However, I imagine that much work would be needed to allow for page
> sizes other than 4K, so I am definitely going to take this patch.
 
This could actually be a problem for us pretty much from day 1.
UEFI mandates the use of 4K pages on arm64, but Fedora will be using
64K pages, so this aspect of the patch would break the default
usage model of Linux on arm64.

It will probably not be a problem on the stub side, and it's not used
in many places, but it would break efi_lookup_mapped_addr(),
efi_range_is_wc() and memrange_efi_to_native() for use by arm64.
At least the first of these would be a problem.
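Here is the size mismatch in concrete terms. EFI memory descriptors always count 4K pages, so shifting num_pages by the kernel's PAGE_SHIFT overstates a region 16-fold on a 64K-page kernel. The minimal sketch below assumes a PAGE_SHIFT of 16 for such a kernel; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12   /* UEFI descriptors always use 4K pages */

static_assert((1u << EFI_PAGE_SHIFT) == 4096, "EFI pages are 4K");

/* Correct size in bytes of a region described by an EFI memory descriptor. */
static uint64_t efi_region_bytes(uint64_t num_pages)
{
	return num_pages << EFI_PAGE_SHIFT;
}

/* What "num_pages << PAGE_SHIFT" yields for a given kernel PAGE_SHIFT. */
static uint64_t native_shift_bytes(uint64_t num_pages, unsigned int page_shift)
{
	return num_pages << page_shift;
}
```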

/
    Leif

> > diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
> > index b7388a425f09..5c440bf769a8 100644
> > --- a/arch/x86/boot/compressed/eboot.c
> > +++ b/arch/x86/boot/compressed/eboot.c
> > @@ -96,7 +96,7 @@ static efi_status_t high_alloc(unsigned long size, unsigned long align,
> >  	if (status != EFI_SUCCESS)
> >  		goto fail;
> >  
> > -	nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
> > +	nr_pages = round_up(size, PAGE_SIZE) / PAGE_SIZE;
> >  again:
> >  	for (i = 0; i < map_size / desc_size; i++) {
> >  		efi_memory_desc_t *desc;
> > @@ -111,7 +111,7 @@ again:
> >  			continue;
> >  
> >  		start = desc->phys_addr;
> > -		end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
> > +		end = start + (desc->num_pages << PAGE_SHIFT);
> >  
> >  		if ((start + size) > end || (start + size) > max)
> >  			continue;
> > @@ -173,7 +173,7 @@ static efi_status_t low_alloc(unsigned long size, unsigned long align,
> >  	if (status != EFI_SUCCESS)
> >  		goto fail;
> >  
> > -	nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
> > +	nr_pages = round_up(size, PAGE_SIZE) / PAGE_SIZE;
> >  	for (i = 0; i < map_size / desc_size; i++) {
> >  		efi_memory_desc_t *desc;
> >  		unsigned long m = (unsigned long)map;
> > @@ -188,7 +188,7 @@ static efi_status_t low_alloc(unsigned long size, unsigned long align,
> >  			continue;
> >  
> >  		start = desc->phys_addr;
> > -		end = start + desc->num_pages * (1UL << EFI_PAGE_SHIFT);
> > +		end = start + (desc->num_pages << PAGE_SHIFT);
> >  
> >  		/*
> >  		 * Don't allocate at 0x0. It will confuse code that
> > @@ -224,7 +224,7 @@ static void low_free(unsigned long size, unsigned long addr)
> >  {
> >  	unsigned long nr_pages;
> >  
> > -	nr_pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
> > +	nr_pages = round_up(size, PAGE_SIZE) / PAGE_SIZE;
> >  	efi_call_phys2(sys_table->boottime->free_pages, addr, nr_pages);
> >  }
> >  
> > @@ -1128,7 +1128,7 @@ static efi_status_t relocate_kernel(struct setup_header *hdr)
> >  	 * possible.
> >  	 */
> >  	start = hdr->pref_address;
> > -	nr_pages = round_up(hdr->init_size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
> > +	nr_pages = round_up(hdr->init_size, PAGE_SIZE) / PAGE_SIZE;
> >  
> >  	status = efi_call_phys4(sys_table->boottime->allocate_pages,
> >  				EFI_ALLOCATE_ADDRESS, EFI_LOADER_DATA,
> > diff --git a/arch/x86/boot/compressed/eboot.h b/arch/x86/boot/compressed/eboot.h
> > index e5b0a8f91c5f..786398c1bb9a 100644
> > --- a/arch/x86/boot/compressed/eboot.h
> > +++ b/arch/x86/boot/compressed/eboot.h
> > @@ -11,7 +11,6 @@
> >  
> >  #define DESC_TYPE_CODE_DATA	(1 << 0)
> >  
> > -#define EFI_PAGE_SIZE		(1UL << EFI_PAGE_SHIFT)
> >  #define EFI_READ_CHUNK_SIZE	(1024 * 1024)
> >  
> >  #define EFI_CONSOLE_OUT_DEVICE_GUID    \
> > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > index 7cec1e9e5494..538c1e6b7b2c 100644
> > --- a/arch/x86/platform/efi/efi.c
> > +++ b/arch/x86/platform/efi/efi.c
> > @@ -339,7 +339,7 @@ static void __init do_add_efi_memmap(void)
> >  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> >  		efi_memory_desc_t *md = p;
> >  		unsigned long long start = md->phys_addr;
> > -		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> > +		unsigned long long size = md->num_pages << PAGE_SHIFT;
> >  		int e820_type;
> >  
> >  		switch (md->type) {
> > @@ -416,8 +416,8 @@ static void __init print_efi_memmap(void)
> >  		pr_info("mem%02u: type=%u, attr=0x%llx, "
> >  			"range=[0x%016llx-0x%016llx) (%lluMB)\n",
> >  			i, md->type, md->attribute, md->phys_addr,
> > -			md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT),
> > -			(md->num_pages >> (20 - EFI_PAGE_SHIFT)));
> > +			md->phys_addr + (md->num_pages << PAGE_SHIFT),
> > +			(md->num_pages >> (20 - PAGE_SHIFT)));
> >  	}
> >  #endif  /*  EFI_DEBUG  */
> >  }
> > @@ -429,7 +429,7 @@ void __init efi_reserve_boot_services(void)
> >  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> >  		efi_memory_desc_t *md = p;
> >  		u64 start = md->phys_addr;
> > -		u64 size = md->num_pages << EFI_PAGE_SHIFT;
> > +		u64 size = md->num_pages << PAGE_SHIFT;
> >  
> >  		if (md->type != EFI_BOOT_SERVICES_CODE &&
> >  		    md->type != EFI_BOOT_SERVICES_DATA)
> > @@ -473,7 +473,7 @@ void __init efi_free_boot_services(void)
> >  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> >  		efi_memory_desc_t *md = p;
> >  		unsigned long long start = md->phys_addr;
> > -		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> > +		unsigned long long size = md->num_pages << PAGE_SHIFT;
> >  
> >  		if (md->type != EFI_BOOT_SERVICES_CODE &&
> >  		    md->type != EFI_BOOT_SERVICES_DATA)
> > @@ -825,7 +825,7 @@ void __iomem *efi_lookup_mapped_addr(u64 phys_addr)
> >  		return NULL;
> >  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> >  		efi_memory_desc_t *md = p;
> > -		u64 size = md->num_pages << EFI_PAGE_SHIFT;
> > +		u64 size = md->num_pages << PAGE_SHIFT;
> >  		u64 end = md->phys_addr + size;
> >  		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> >  		    md->type != EFI_BOOT_SERVICES_CODE &&
> > @@ -843,7 +843,7 @@ void __iomem *efi_lookup_mapped_addr(u64 phys_addr)
> >  
> >  void efi_memory_uc(u64 addr, unsigned long size)
> >  {
> > -	unsigned long page_shift = 1UL << EFI_PAGE_SHIFT;
> > +	unsigned long page_shift = 1UL << PAGE_SHIFT;
> >  	u64 npages;
> >  
> >  	npages = round_up(size, page_shift) / page_shift;
> > @@ -896,7 +896,7 @@ void __init efi_enter_virtual_mode(void)
> >  			continue;
> >  		}
> >  
> > -		prev_size = prev_md->num_pages << EFI_PAGE_SHIFT;
> > +		prev_size = prev_md->num_pages << PAGE_SHIFT;
> >  
> >  		if (md->phys_addr == (prev_md->phys_addr + prev_size)) {
> >  			prev_md->num_pages += md->num_pages;
> > @@ -914,7 +914,7 @@ void __init efi_enter_virtual_mode(void)
> >  		    md->type != EFI_BOOT_SERVICES_DATA)
> >  			continue;
> >  
> > -		size = md->num_pages << EFI_PAGE_SHIFT;
> > +		size = md->num_pages << PAGE_SHIFT;
> >  		end = md->phys_addr + size;
> >  
> >  		start_pfn = PFN_DOWN(md->phys_addr);
> > @@ -1011,7 +1011,7 @@ u32 efi_mem_type(unsigned long phys_addr)
> >  		md = p;
> >  		if ((md->phys_addr <= phys_addr) &&
> >  		    (phys_addr < (md->phys_addr +
> > -				  (md->num_pages << EFI_PAGE_SHIFT))))
> > +				  (md->num_pages << PAGE_SHIFT))))
> >  			return md->type;
> >  	}
> >  	return 0;
> > @@ -1026,7 +1026,7 @@ u64 efi_mem_attributes(unsigned long phys_addr)
> >  		md = p;
> >  		if ((md->phys_addr <= phys_addr) &&
> >  		    (phys_addr < (md->phys_addr +
> > -				  (md->num_pages << EFI_PAGE_SHIFT))))
> > +				  (md->num_pages << PAGE_SHIFT))))
> >  			return md->attribute;
> >  	}
> >  	return 0;
> > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > index 5f8f176154f7..fa47d80ab4b5 100644
> > --- a/include/linux/efi.h
> > +++ b/include/linux/efi.h
> > @@ -95,8 +95,6 @@ typedef	struct {
> >  #define EFI_MEMORY_RUNTIME	((u64)0x8000000000000000ULL)	/* range requires runtime mapping */
> >  #define EFI_MEMORY_DESCRIPTOR_VERSION	1
> >  
> > -#define EFI_PAGE_SHIFT		12
> > -
> >  typedef struct {
> >  	u32 type;
> >  	u32 pad;
> > @@ -611,7 +609,7 @@ static inline int efi_range_is_wc(unsigned long start, unsigned long len)
> >  {
> >  	unsigned long i;
> >  
> > -	for (i = 0; i < len; i += (1UL << EFI_PAGE_SHIFT)) {
> > +	for (i = 0; i < len; i += PAGE_SIZE) {
> >  		unsigned long paddr = __pa(start + i);
> >  		if (!(efi_mem_attributes(paddr) & EFI_MEMORY_WC))
> >  			return 0;
> > @@ -728,7 +726,7 @@ struct efi_generic_dev_path {
> >  
> >  static inline void memrange_efi_to_native(u64 *addr, u64 *npages)
> >  {
> > -	*npages = PFN_UP(*addr + (*npages<<EFI_PAGE_SHIFT)) - PFN_DOWN(*addr);
> > +	*npages = PFN_UP(*addr + (*npages << PAGE_SHIFT)) - PFN_DOWN(*addr);
> >  	*addr &= PAGE_MASK;
> >  }
> >  
> > -- 
> > 1.8.4
> > 
> 
> -- 
> Matt Fleming, Intel Open Source Technology Center


* Re: [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
  2013-09-21 15:21     ` Leif Lindholm
@ 2013-09-21 15:41       ` Borislav Petkov
  2013-09-21 15:50         ` Borislav Petkov
  2013-09-21 15:59         ` Leif Lindholm
  0 siblings, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-21 15:41 UTC (permalink / raw)
  To: Leif Lindholm
  Cc: Matt Fleming, Mark Salter, X86 ML, LKML, Borislav Petkov,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	Dave Young, linux-efi, Roy Franz

On Sat, Sep 21, 2013 at 05:21:39PM +0200, Leif Lindholm wrote:

> It will probably not be a problem on the stub side, and it's not used
> in many places, but it would break efi_lookup_mapped_addr(),
> efi_range_is_wc() and memrange_efi_to_native() for use by arm64.
> At least the first of these would be a problem.

Ok, maybe the generic header include/linux/efi.h might be a problem but
the rest are changes to arch/x86/ which should have no effect whatsoever
on any other arch.

Or are you planning to move some of it into generic code?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
  2013-09-21 15:41       ` Borislav Petkov
@ 2013-09-21 15:50         ` Borislav Petkov
  2013-09-21 16:01           ` Leif Lindholm
  2013-09-21 15:59         ` Leif Lindholm
  1 sibling, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-09-21 15:50 UTC (permalink / raw)
  To: Leif Lindholm
  Cc: Matt Fleming, Mark Salter, X86 ML, LKML, Borislav Petkov,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	Dave Young, linux-efi, Roy Franz

On Sat, Sep 21, 2013 at 05:41:43PM +0200, Borislav Petkov wrote:
> On Sat, Sep 21, 2013 at 05:21:39PM +0200, Leif Lindholm wrote:
> 
> > It will probably not be a problem on the stub side, and it's not used
> > in many places, but it would break efi_lookup_mapped_addr(),
> > efi_range_is_wc() and memrange_efi_to_native() for use by arm64.
> > At least the first of these would be a problem.
> 
> Ok, maybe the generic header include/linux/efi.h might be a problem but
> the rest are changes to arch/x86/ which should have no effect whatsoever
> on any other arch.
> 
> Or are you planning to move some of it into generic code?

Oh, and arm64 defines its own PAGE_SIZE too, so what's the problem?
Or is EFI_PAGE_SIZE possibly != PAGE_SIZE on arm64?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
  2013-09-21 15:41       ` Borislav Petkov
  2013-09-21 15:50         ` Borislav Petkov
@ 2013-09-21 15:59         ` Leif Lindholm
  1 sibling, 0 replies; 102+ messages in thread
From: Leif Lindholm @ 2013-09-21 15:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, Mark Salter, X86 ML, LKML, Borislav Petkov,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	Dave Young, linux-efi, Roy Franz

On Sat, Sep 21, 2013 at 05:41:43PM +0200, Borislav Petkov wrote:
> On Sat, Sep 21, 2013 at 05:21:39PM +0200, Leif Lindholm wrote:
> 
> > It will probably not be a problem on the stub side, and it's not used
> > in many places, but it would break efi_lookup_mapped_addr(),
> > efi_range_is_wc() and memrange_efi_to_native() for use by arm64.
> > At least the first of these would be a problem.
> 
> Ok, maybe the generic header include/linux/efi.h might be a problem but
> the rest are changes to arch/x86/ which should have no effect whatsoever
> on any other arch.

Indeed - my concerns are restricted to include/linux/efi.h and
drivers/firmware/efi.c.

> Or are you planning to move some of it into generic code?

I think some of the stub code is moved to generic code in Roy's/Mark's
patches, but the stub shouldn't be an issue anyway, since UEFI always
uses 4K pages.
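On the stub side, the allocation unit is always the 4K UEFI page, independent of the kernel's PAGE_SIZE. Byte counts handed to the boot services page allocator can therefore be rounded against EFI_PAGE_SIZE alone. The sketch below is equivalent to the `round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE` lines in eboot.c. `bytes_to_efi_pages()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12
#define EFI_PAGE_SIZE  (1ULL << EFI_PAGE_SHIFT)

static_assert(EFI_PAGE_SIZE == 4096, "UEFI pages are 4K");

/*
 * Hypothetical helper: number of 4K EFI pages needed to hold size bytes,
 * rounded up. Sizes within 4K of UINT64_MAX would overflow.
 */
static uint64_t bytes_to_efi_pages(uint64_t size)
{
	return (size + EFI_PAGE_SIZE - 1) >> EFI_PAGE_SHIFT;
}
```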

/
    Leif


* Re: [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
  2013-09-21 15:50         ` Borislav Petkov
@ 2013-09-21 16:01           ` Leif Lindholm
  2013-09-21 16:03             ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Leif Lindholm @ 2013-09-21 16:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, Mark Salter, X86 ML, LKML, Borislav Petkov,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	Dave Young, linux-efi, Roy Franz

On Sat, Sep 21, 2013 at 05:50:39PM +0200, Borislav Petkov wrote:
> > Ok, maybe the generic header include/linux/efi.h might be a problem but
> > the rest are changes to arch/x86/ which should have no effect whatsoever
> > on any other arch.
> > 
> > Or are you planning to move some of it into generic code?
> 
> Oh, and arm64 defines its own PAGE_SIZE too, so what's the problem?
> Or is EFI_PAGE_SIZE possibly != PAGE_SIZE on arm64?
 
Correct. On arm64, EFI_PAGE_SIZE will be 4K, and PAGE_SIZE can be 4K
or 64K, with at least Fedora opting for 64K.

/
    Leif


* Re: [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE
  2013-09-21 16:01           ` Leif Lindholm
@ 2013-09-21 16:03             ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-21 16:03 UTC (permalink / raw)
  To: Leif Lindholm
  Cc: Matt Fleming, Mark Salter, X86 ML, LKML, Borislav Petkov,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	Dave Young, linux-efi, Roy Franz

On Sat, Sep 21, 2013 at 06:01:21PM +0200, Leif Lindholm wrote:
> Correct. On arm64, EFI_PAGE_SIZE will be 4K, and PAGE_SIZE can be 4K
> or 64K, with at least Fedora opting for 64K.

Hm, ok, it looks like we want to keep EFI_PAGE_SIZE.

Oh well.
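Keeping EFI_PAGE_SHIFT means helpers such as memrange_efi_to_native() can convert explicitly between EFI pages and native pages. Below is a user-space sketch of that conversion. It mirrors the pre-patch include/linux/efi.h version (the `-` line in the hunk quoted earlier) but assumes a 64K-page kernel. The SKETCH_* macros and the local PFN_UP/PFN_DOWN stand in for the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT     12   /* EFI descriptor page size: 4K */
#define SKETCH_PAGE_SHIFT  16   /* assumption: 64K-page kernel */
#define SKETCH_PAGE_SIZE   (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK   (~(SKETCH_PAGE_SIZE - 1))
#define PFN_DOWN(x)        ((x) >> SKETCH_PAGE_SHIFT)
#define PFN_UP(x)          (((x) + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT)

static_assert(SKETCH_PAGE_SHIFT >= EFI_PAGE_SHIFT, "native pages are at least 4K");

/* Turn an EFI range (start, count of 4K pages) into the native pages covering it. */
static void memrange_efi_to_native(uint64_t *addr, uint64_t *npages)
{
	uint64_t end = *addr + (*npages << EFI_PAGE_SHIFT);

	*npages = PFN_UP(end) - PFN_DOWN(*addr);
	*addr &= SKETCH_PAGE_MASK;
}
```

For example, an 8K EFI range starting at 0x1f000 straddles the 64K boundary at 0x20000. It therefore maps to two native pages starting at 0x10000.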

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-21 11:39   ` [PATCH -v2] " Borislav Petkov
@ 2013-09-22 12:35     ` Dave Young
  2013-09-22 13:37       ` Borislav Petkov
  2013-09-23  5:47     ` Dave Young
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-09-22 12:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/21/13 at 01:39pm, Borislav Petkov wrote:
> On Thu, Sep 19, 2013 at 04:54:54PM +0200, Borislav Petkov wrote:
> > From: Borislav Petkov <bp@suse.de>
> > 
> > We map the EFI regions needed for runtime services contiguously on
> > virtual addresses starting from -4G down for a total max space of 64G.
> > This way, we provide for stable runtime services addresses across
> > kernels so that a kexec'd kernel can still use them.
> > 
> > Also, they're mapped in a separate pagetable so that we don't
> > pollute the kernel address space (you can see how the whole ioremapping and
> > saving and restoring of PGDs is gone now).
> 
> Ok, this one was not so good, let's try again:
> 
> This time I saved 32-bit and am switching the pagetable only after
> having built it properly. This boots fine again on baremetal and on OVMF
> with Matt's handover flags fix from yesterday.
> 
> Also, I've uploaded the whole series to
> git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git, branch
> efi-experimental

I tested your new patch; it works with both the EFI stub and GRUB boot in the first kernel.

But it panicked during kexec boot with my kexec-related patchset, which
contains 3 patches:
1. introduce cmdline kexecboot=<0|1|2>; 1 == kexec, 2 == kdump
2. export the physical addresses of fw_vendor, runtime and tables to /sys/firmware/efi/systab
3. if kexecboot != 0, use fw_vendor, runtime and tables from bootparams; also, do not
   call SetVirtualAddressMap in the kexecboot case.

The panic happens at the last line of efi_enter_virtual_mode():
        /* clean DUMMY object */
        efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
                         EFI_VARIABLE_NON_VOLATILE |
                         EFI_VARIABLE_BOOTSERVICE_ACCESS |
                         EFI_VARIABLE_RUNTIME_ACCESS,
                         0, NULL);

Below is the dmesg:
[    0.003359] pid_max: default: 32768 minimum: 301
[    0.004792] BUG: unable to handle kernel paging request at fffffffefde97e70
[    0.006666] IP: [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
[    0.006666] PGD 36981067 PUD 35828063 PMD 0 
[    0.006666] Oops: 0000 [#1] SMP 
[    0.006666] Modules linked in:
[    0.006666] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0-rc1+ #10
[    0.006666] task: ffffffff81985490 ti: ffffffff81964000 task.ti: ffffffff81964000
[    0.006666] RIP: 0010:[<ffffffff8103a1db>]  [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
[    0.006666] RSP: 0000:ffffffff81965ee0  EFLAGS: 00010246
[    0.006666] RAX: fffffffefde97e18 RBX: ffffffff81999300 RCX: 0000000000000007
[    0.006666] RDX: ffffffff81965f20 RSI: ffffffff81999300 RDI: ffff88000009cc88
[    0.006666] RBP: ffffffff81965f08 R08: 0000000000000000 R09: 0000000000000000
[    0.006666] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81965f20
[    0.006666] R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
[    0.006666] FS:  0000000000000000(0000) GS:ffff880035c00000(0000) knlGS:0000000000000000
[    0.006666] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.006666] CR2: fffffffefde97e70 CR3: 0000000036980000 CR4: 00000000000006b0
[    0.006666] Stack:
[    0.006666]  0000000000000000 ffffffffff475630 ffff88003582c000 000000000000000f
[    0.006666]  000000000000000f ffffffff81965f60 ffffffff81a98a7c ffffffff81b1a900
[    0.006666]  47ddbe4b4424ac57 a9929ff050ed979e 0000000000000000 a02eda84fa6789b9
[    0.006666] Call Trace:
[    0.006666]  [<ffffffff81a98a7c>] efi_enter_virtual_mode+0x30a/0x32c
[    0.006666]  [<ffffffff81a83ccf>] start_kernel+0x349/0x3da
[    0.006666]  [<ffffffff81a83794>] ? repair_env_string+0x58/0x58
[    0.006666]  [<ffffffff81a83120>] ? early_idt_handlers+0x120/0x120
[    0.006666]  [<ffffffff81a83498>] x86_64_start_reservations+0x2a/0x2c
[    0.006666]  [<ffffffff81a83590>] x86_64_start_kernel+0xf6/0xff
[    0.006666] Code: 55 49 89 cd 4c 89 45 d8 e8 db 05 00 00 48 8b 05 2c 6a a2 00 4c 8b 45 d8 44 89 f1 4c 89 e2 48 89 de 48 8b 40 58 4d 89 c1 4d 89 e8 <48> 8b 78 58 e8 7c 0a 00 00 41 5e 5b 41 5c 41 5d 41 5e 5d c3 55 
[    0.006666] RIP  [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
[    0.006666]  RSP <ffffffff81965ee0>
[    0.006666] CR2: fffffffefde97e70
[    0.006666] ---[ end trace e9fbc5020b26135e ]---
[    0.006666] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.006666] Rebooting in 10 seconds..
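This oops can be cross-checked against the layout from patch 11. CR2 equals RAX + 0x58, the operand of the faulting `mov 0x58(%rax),%rdi` encoded in the Code: bytes. That address lies inside the [-68G, -4G) EFI VA window, while the active page table reports PMD 0 for it. This is consistent with the runtime services table being read through a page table that lacks the EFI mappings. The oops alone does not prove where the switch went wrong. The window bounds come from efi_64.c; `in_efi_va_window()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFI_VA_START (0ULL - (4ULL << 30))   /* -4G:  0xffffffff00000000 */
#define EFI_VA_END   (0ULL - (68ULL << 30))  /* -68G: 0xffffffef00000000 */

static_assert(EFI_VA_START - EFI_VA_END == (64ULL << 30), "64G window");

/* True if va falls inside the [-68G, -4G) window used for EFI runtime regions. */
static bool in_efi_va_window(uint64_t va)
{
	return va >= EFI_VA_END && va < EFI_VA_START;
}
```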


> 
> ("-experimental" doesn't trigger Fengguang's robot :-))
> 
> Good luck! :-)
> 
> ---
> From 880fcee20209a122eda846e7f109776ed1c56de5 Mon Sep 17 00:00:00 2001
> From: Borislav Petkov <bp@suse.de>
> Date: Wed, 18 Sep 2013 17:35:42 +0200
> Subject: [PATCH] EFI: Runtime services virtual mapping
> 
> We map the EFI regions needed for runtime services contiguously on
> virtual addresses starting from -4G down for a total max space of 64G.
> This way, we provide for stable runtime services addresses across
> kernels so that a kexec'd kernel can still use them.
> 
> Also, they're mapped in a separate pagetable so that we don't
> pollute the kernel address space (you can see how the whole ioremapping and
> saving and restoring of PGDs is gone now).
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/include/asm/efi.h           | 43 ++++++++++--------
>  arch/x86/include/asm/pgtable_types.h |  3 +-
>  arch/x86/platform/efi/efi.c          | 68 ++++++++++++-----------------
>  arch/x86/platform/efi/efi_32.c       | 29 +++++++++++-
>  arch/x86/platform/efi/efi_64.c       | 85 +++++++++++++++++++-----------------
>  arch/x86/platform/efi/efi_stub_64.S  | 53 ++++++++++++++++++++++
>  6 files changed, 181 insertions(+), 100 deletions(-)
> 
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 0062a0125041..9a99e0499e4b 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -69,24 +69,31 @@ extern u64 efi_call6(void *fp, u64 arg1, u64 arg2, u64 arg3,
>  	efi_call6((f), (u64)(a1), (u64)(a2), (u64)(a3),		\
>  		  (u64)(a4), (u64)(a5), (u64)(a6))
>  
> +#define _efi_call_virtX(x, f, ...)					\
> +({									\
> +	efi_status_t __s;						\
> +									\
> +	efi_sync_low_kernel_mappings();					\
> +	preempt_disable();						\
> +	__s = efi_call##x((void *)efi.systab->runtime->f, __VA_ARGS__);	\
> +	preempt_enable();						\
> +	__s;								\
> +})
> +
>  #define efi_call_virt0(f)				\
> -	efi_call0((efi.systab->runtime->f))
> -#define efi_call_virt1(f, a1)					\
> -	efi_call1((efi.systab->runtime->f), (u64)(a1))
> -#define efi_call_virt2(f, a1, a2)					\
> -	efi_call2((efi.systab->runtime->f), (u64)(a1), (u64)(a2))
> -#define efi_call_virt3(f, a1, a2, a3)					\
> -	efi_call3((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
> -		  (u64)(a3))
> -#define efi_call_virt4(f, a1, a2, a3, a4)				\
> -	efi_call4((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
> -		  (u64)(a3), (u64)(a4))
> -#define efi_call_virt5(f, a1, a2, a3, a4, a5)				\
> -	efi_call5((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
> -		  (u64)(a3), (u64)(a4), (u64)(a5))
> -#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)			\
> -	efi_call6((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
> -		  (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
> +	_efi_call_virtX(0, f)
> +#define efi_call_virt1(f, a1)				\
> +	_efi_call_virtX(1, f, (u64)(a1))
> +#define efi_call_virt2(f, a1, a2)			\
> +	_efi_call_virtX(2, f, (u64)(a1), (u64)(a2))
> +#define efi_call_virt3(f, a1, a2, a3)			\
> +	_efi_call_virtX(3, f, (u64)(a1), (u64)(a2), (u64)(a3))
> +#define efi_call_virt4(f, a1, a2, a3, a4)		\
> +	_efi_call_virtX(4, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4))
> +#define efi_call_virt5(f, a1, a2, a3, a4, a5)		\
> +	_efi_call_virtX(5, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5))
> +#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)	\
> +	_efi_call_virtX(6, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
>  
>  extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
>  				 u32 type, u64 attribute);
> @@ -101,6 +108,8 @@ extern void efi_call_phys_prelog(void);
>  extern void efi_call_phys_epilog(void);
>  extern void efi_unmap_memmap(void);
>  extern void efi_memory_uc(u64 addr, unsigned long size);
> +extern void __init efi_map_region(efi_memory_desc_t *md);
> +extern void efi_sync_low_kernel_mappings(void);
>  
>  #ifdef CONFIG_EFI
>  
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 0ecac257fb26..a83aa44bb1fb 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -382,7 +382,8 @@ static inline void update_page_count(int level, unsigned long pages) { }
>   */
>  extern pte_t *lookup_address(unsigned long address, unsigned int *level);
>  extern phys_addr_t slow_virt_to_phys(void *__address);
> -
> +extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> +				   unsigned numpages, unsigned long page_flags);
>  #endif	/* !__ASSEMBLY__ */
>  
>  #endif /* _ASM_X86_PGTABLE_DEFS_H */
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 538c1e6b7b2c..90459f5f587c 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -12,6 +12,8 @@
>   *	Bibo Mao <bibo.mao@intel.com>
>   *	Chandramouli Narayanan <mouli@linux.intel.com>
>   *	Huang Ying <ying.huang@intel.com>
> + * Copyright (C) 2013 SuSE Labs
> + * 	Borislav Petkov <bp@suse.de> - runtime services VA mapping
>   *
>   * Copied from efi_32.c to eliminate the duplicated code between EFI
>   * 32/64 support code. --ying 2007-10-26
> @@ -81,6 +83,17 @@ static efi_system_table_t efi_systab __initdata;
>  unsigned long x86_efi_facility;
>  
>  /*
> + * Scratch space used for switching the pagetable in the EFI stub
> + */
> +struct efi_scratch {
> +	u64 r15;
> +	u64 prev_cr3;
> +	pgd_t *efi_pgt;
> +	bool use_pgd;
> +};
> +extern struct efi_scratch efi_scratch;
> +
> +/*
>   * Returns 1 if 'facility' is enabled, 0 otherwise.
>   */
>  int efi_enabled(int facility)
> @@ -797,22 +810,6 @@ void __init efi_set_executable(efi_memory_desc_t *md, bool executable)
>  		set_memory_nx(addr, npages);
>  }
>  
> -static void __init runtime_code_page_mkexec(void)
> -{
> -	efi_memory_desc_t *md;
> -	void *p;
> -
> -	/* Make EFI runtime service code area executable */
> -	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> -		md = p;
> -
> -		if (md->type != EFI_RUNTIME_SERVICES_CODE)
> -			continue;
> -
> -		efi_set_executable(md, true);
> -	}
> -}
> -
>  /*
>   * We can't ioremap data in EFI boot services RAM, because we've already mapped
>   * it as RAM.  So, look it up in the existing EFI memory map instead.  Only
> @@ -862,10 +859,10 @@ void efi_memory_uc(u64 addr, unsigned long size)
>  void __init efi_enter_virtual_mode(void)
>  {
>  	efi_memory_desc_t *md, *prev_md = NULL;
> -	efi_status_t status;
> +	void *p, *new_memmap = NULL;
>  	unsigned long size;
> -	u64 end, systab, start_pfn, end_pfn;
> -	void *p, *va, *new_memmap = NULL;
> +	efi_status_t status;
> +	u64 end, systab;
>  	int count = 0;
>  
>  	efi.systab = NULL;
> @@ -874,7 +871,6 @@ void __init efi_enter_virtual_mode(void)
>  	 * We don't do virtual mode, since we don't do runtime services, on
>  	 * non-native EFI
>  	 */
> -
>  	if (!efi_is_native()) {
>  		efi_unmap_memmap();
>  		return;
> @@ -914,33 +910,18 @@ void __init efi_enter_virtual_mode(void)
>  		    md->type != EFI_BOOT_SERVICES_DATA)
>  			continue;
>  
> +		efi_map_region(md);
> +
>  		size = md->num_pages << PAGE_SHIFT;
>  		end = md->phys_addr + size;
>  
> -		start_pfn = PFN_DOWN(md->phys_addr);
> -		end_pfn = PFN_UP(end);
> -		if (pfn_range_is_mapped(start_pfn, end_pfn)) {
> -			va = __va(md->phys_addr);
> -
> -			if (!(md->attribute & EFI_MEMORY_WB))
> -				efi_memory_uc((u64)(unsigned long)va, size);
> -		} else
> -			va = efi_ioremap(md->phys_addr, size,
> -					 md->type, md->attribute);
> -
> -		md->virt_addr = (u64) (unsigned long) va;
> -
> -		if (!va) {
> -			pr_err("ioremap of 0x%llX failed!\n",
> -			       (unsigned long long)md->phys_addr);
> -			continue;
> -		}
> -
>  		systab = (u64) (unsigned long) efi_phys.systab;
>  		if (md->phys_addr <= systab && systab < end) {
>  			systab += md->virt_addr - md->phys_addr;
> +
>  			efi.systab = (efi_system_table_t *) (unsigned long) systab;
>  		}
> +
>  		new_memmap = krealloc(new_memmap,
>  				      (count + 1) * memmap.desc_size,
>  				      GFP_KERNEL);
> @@ -949,8 +930,15 @@ void __init efi_enter_virtual_mode(void)
>  		count++;
>  	}
>  
> +#ifdef CONFIG_X86_64
> +	efi_scratch.efi_pgt = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
> +	efi_scratch.use_pgd = true;
> +#endif
> +
>  	BUG_ON(!efi.systab);
>  
> +	efi_sync_low_kernel_mappings();
> +
>  	status = phys_efi_set_virtual_address_map(
>  		memmap.desc_size * count,
>  		memmap.desc_size,
> @@ -983,8 +971,6 @@ void __init efi_enter_virtual_mode(void)
>  	efi.query_variable_info = virt_efi_query_variable_info;
>  	efi.update_capsule = virt_efi_update_capsule;
>  	efi.query_capsule_caps = virt_efi_query_capsule_caps;
> -	if (__supported_pte_mask & _PAGE_NX)
> -		runtime_code_page_mkexec();
>  
>  	kfree(new_memmap);
>  
> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
> index 40e446941dd7..661663b08eaf 100644
> --- a/arch/x86/platform/efi/efi_32.c
> +++ b/arch/x86/platform/efi/efi_32.c
> @@ -37,9 +37,36 @@
>   * claim EFI runtime service handler exclusively and to duplicate a memory in
>   * low memory space say 0 - 3G.
>   */
> -
>  static unsigned long efi_rt_eflags;
>  
> +void efi_sync_low_kernel_mappings(void) {}
> +
> +void __init efi_map_region(efi_memory_desc_t *md)
> +{
> +	u64 start_pfn, end_pfn, end;
> +	unsigned long size;
> +	void *va;
> +
> +	start_pfn = PFN_DOWN(md->phys_addr);
> +	size	  = md->num_pages << PAGE_SHIFT;
> +	end	  = md->phys_addr + size;
> +	end_pfn   = PFN_UP(end);
> +
> +	if (pfn_range_is_mapped(start_pfn, end_pfn)) {
> +		va = __va(md->phys_addr);
> +
> +		if (!(md->attribute & EFI_MEMORY_WB))
> +			efi_memory_uc((u64)(unsigned long)va, size);
> +	} else
> +		va = efi_ioremap(md->phys_addr, size,
> +				 md->type, md->attribute);
> +
> +	md->virt_addr = (u64) (unsigned long) va;
> +	if (!va)
> +		pr_err("ioremap of 0x%llX failed!\n",
> +		       (unsigned long long)md->phys_addr);
> +}
> +
>  void efi_call_phys_prelog(void)
>  {
>  	struct desc_ptr gdt_descr;
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 39a0e7f1f0a3..db5230dd350e 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -39,59 +39,64 @@
>  #include <asm/cacheflush.h>
>  #include <asm/fixmap.h>
>  
> -static pgd_t *save_pgd __initdata;
> -static unsigned long efi_flags __initdata;
> +void __init efi_call_phys_prelog(void) {}
> +void __init efi_call_phys_epilog(void) {}
>  
> -static void __init early_code_mapping_set_exec(int executable)
> +/*
> + * Add low kernel mappings for passing arguments to EFI functions.
> + */
> +void efi_sync_low_kernel_mappings(void)
>  {
> -	efi_memory_desc_t *md;
> -	void *p;
> +	unsigned num_pgds;
> +	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
>  
> -	if (!(__supported_pte_mask & _PAGE_NX))
> -		return;
> +	num_pgds = pgd_index(VMALLOC_START - 1) - pgd_index(PAGE_OFFSET);
>  
> -	/* Make EFI service code area executable */
> -	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> -		md = p;
> -		if (md->type == EFI_RUNTIME_SERVICES_CODE ||
> -		    md->type == EFI_BOOT_SERVICES_CODE)
> -			efi_set_executable(md, executable);
> -	}
> +	memcpy(pgd + pgd_index(PAGE_OFFSET),
> +		init_mm.pgd + pgd_index(PAGE_OFFSET),
> +		sizeof(pgd_t) * num_pgds);
>  }
>  
> -void __init efi_call_phys_prelog(void)
> +/*
> + * We allocate runtime services regions top-down, starting from -4G, i.e.
> + * 0xffff_ffff_0000_0000 and limit EFI VA mapping space to 64G.
> + */
> +static u64 efi_va	 = -4 * (1UL << 30);
> +#define EFI_VA_END	 (-68 * (1UL << 30))
> +
> +static void __init __map_region(efi_memory_desc_t *md, u64 va)
>  {
> -	unsigned long vaddress;
> -	int pgd;
> -	int n_pgds;
> +	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
> +	unsigned long pf = 0, size;
> +	u64 end;
>  
> -	early_code_mapping_set_exec(1);
> -	local_irq_save(efi_flags);
> +	if (!(md->attribute & EFI_MEMORY_WB))
> +		pf |= _PAGE_PCD;
>  
> -	n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
> -	save_pgd = kmalloc(n_pgds * sizeof(pgd_t), GFP_KERNEL);
> +	size = md->num_pages << PAGE_SHIFT;
> +	end  = va + size;
>  
> -	for (pgd = 0; pgd < n_pgds; pgd++) {
> -		save_pgd[pgd] = *pgd_offset_k(pgd * PGDIR_SIZE);
> -		vaddress = (unsigned long)__va(pgd * PGDIR_SIZE);
> -		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), *pgd_offset_k(vaddress));
> -	}
> -	__flush_tlb_all();
> +	if(kernel_map_pages_in_pgd(pgd, md->phys_addr, va, md->num_pages, pf))
> +		pr_warning("Error mapping PA 0x%llx -> VA 0x%llx!\n",
> +			   md->phys_addr, va);
>  }
>  
> -void __init efi_call_phys_epilog(void)
> +void __init efi_map_region(efi_memory_desc_t *md)
>  {
> -	/*
> -	 * After the lock is released, the original page table is restored.
> -	 */
> -	int pgd;
> -	int n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT) , PGDIR_SIZE);
> -	for (pgd = 0; pgd < n_pgds; pgd++)
> -		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), save_pgd[pgd]);
> -	kfree(save_pgd);
> -	__flush_tlb_all();
> -	local_irq_restore(efi_flags);
> -	early_code_mapping_set_exec(0);
> +	unsigned long size = md->num_pages << PAGE_SHIFT;
> +
> +	efi_va -= size;
> +	if (efi_va < EFI_VA_END) {
> +		pr_warning(FW_WARN "VA address range overflow!\n");
> +		return;
> +	}
> +
> +	/* Do the 1:1 map */
> +	__map_region(md, md->phys_addr);
> +
> +	/* Do the VA map */
> +	__map_region(md, efi_va);
> +	md->virt_addr = efi_va;
>  }
>  
>  void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
> diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
> index 4c07ccab8146..2bb9714bf713 100644
> --- a/arch/x86/platform/efi/efi_stub_64.S
> +++ b/arch/x86/platform/efi/efi_stub_64.S
> @@ -34,10 +34,47 @@
>  	mov %rsi, %cr0;			\
>  	mov (%rsp), %rsp
>  
> +	/* stolen from gcc */
> +	.macro FLUSH_TLB_ALL
> +	movq %r15, efi_scratch(%rip)
> +	movq %r14, efi_scratch+8(%rip)
> +	movq %cr4, %r15
> +	movq %r15, %r14
> +	andb $0x7f, %r14b
> +	movq %r14, %cr4
> +	movq %r15, %cr4
> +	movq efi_scratch+8(%rip), %r14
> +	movq efi_scratch(%rip), %r15
> +	.endm
> +
> +	.macro SWITCH_PGT
> +	cmpb $0, efi_scratch+24(%rip)
> +	je 1f
> +	movq %r15, efi_scratch(%rip)		# r15
> +	# save previous CR3
> +	movq %cr3, %r15
> +	movq %r15, efi_scratch+8(%rip)		# prev_cr3
> +	movq efi_scratch+16(%rip), %r15		# EFI pgt
> +	movq %r15, %cr3
> +	1:
> +	.endm
> +
> +	.macro RESTORE_PGT
> +	cmpb $0, efi_scratch+24(%rip)
> +	je 2f
> +	movq efi_scratch+8(%rip), %r15
> +	movq %r15, %cr3
> +	movq efi_scratch(%rip), %r15
> +	FLUSH_TLB_ALL
> +	2:
> +	.endm
> +
>  ENTRY(efi_call0)
>  	SAVE_XMM
>  	subq $32, %rsp
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -47,7 +84,9 @@ ENTRY(efi_call1)
>  	SAVE_XMM
>  	subq $32, %rsp
>  	mov  %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -57,7 +96,9 @@ ENTRY(efi_call2)
>  	SAVE_XMM
>  	subq $32, %rsp
>  	mov  %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -68,7 +109,9 @@ ENTRY(efi_call3)
>  	subq $32, %rsp
>  	mov  %rcx, %r8
>  	mov  %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -80,7 +123,9 @@ ENTRY(efi_call4)
>  	mov %r8, %r9
>  	mov %rcx, %r8
>  	mov %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -93,7 +138,9 @@ ENTRY(efi_call5)
>  	mov %r8, %r9
>  	mov %rcx, %r8
>  	mov %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $48, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -109,8 +156,14 @@ ENTRY(efi_call6)
>  	mov %r8, %r9
>  	mov %rcx, %r8
>  	mov %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $48, %rsp
>  	RESTORE_XMM
>  	ret
>  ENDPROC(efi_call6)
> +
> +	.data
> +ENTRY(efi_scratch)
> +	.fill 3,8,0
> -- 
> 1.8.4
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --

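A user-space C sketch (not kernel code) that mirrors struct efi_scratch from the efi.c hunk and checks, at compile time, the byte offsets that SWITCH_PGT and RESTORE_PGT hardcode (+0, +8, +16, +24). It assumes an LP64 target such as x86-64. pgd_t is a stand-in typedef, and fill_bytes()/scratch_fits() are hypothetical helpers for illustration. On such a target the structure is 32 bytes, while `.fill 3,8,0` reserves only 24, so the use_pgd byte at +24 would appear to sit just past the reserved area unless more space is added after it.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>	/* offsetof, size_t */
#include <stdint.h>

/* Stand-in for the kernel's pgd_t; only the pointer width matters here. */
typedef struct { uint64_t pgd; } pgd_t;

/* Same member order as struct efi_scratch in the efi.c hunk. */
struct efi_scratch {
	uint64_t r15;		/* efi_scratch+0:  saved %r15 */
	uint64_t prev_cr3;	/* efi_scratch+8:  CR3 before the switch */
	pgd_t *efi_pgt;		/* efi_scratch+16: pagetable loaded for EFI */
	bool use_pgd;		/* efi_scratch+24: tested by "cmpb $0, ..." */
};

/* Offsets hardcoded in SWITCH_PGT/RESTORE_PGT (LP64 target assumed). */
static_assert(offsetof(struct efi_scratch, r15) == 0, "r15 at +0");
static_assert(offsetof(struct efi_scratch, prev_cr3) == 8, "prev_cr3 at +8");
static_assert(offsetof(struct efi_scratch, efi_pgt) == 16, "efi_pgt at +16");
static_assert(offsetof(struct efi_scratch, use_pgd) == 24, "use_pgd at +24");

/* Hypothetical helper: bytes reserved by ".fill count,size,0". */
static size_t fill_bytes(size_t count, size_t size)
{
	return count * size;
}

/* Hypothetical helper: does the C object fit in the asm reservation? */
static bool scratch_fits(size_t reserved)
{
	return sizeof(struct efi_scratch) <= reserved;
}
```

On an LP64 target, scratch_fits(fill_bytes(3, 8)) is false and scratch_fits(fill_bytes(4, 8)) is true.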
^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-22 12:35     ` Dave Young
@ 2013-09-22 13:37       ` Borislav Petkov
  2013-09-22 14:00         ` Dave Young
  2013-09-22 15:27         ` H. Peter Anvin
  0 siblings, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-22 13:37 UTC (permalink / raw)
  To: Dave Young
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On Sun, Sep 22, 2013 at 08:35:15PM +0800, Dave Young wrote:
> I tested your new patch, it works both with efi stub and grub boot in
> 1st kernel.

Good, thanks!

> But it panicked in kexec boot with my kexec related patchset, the patchset

That's the second kernel, right?

> contains 3 patches:
> 1. introduce cmdline kexecboot=<0|1|2>; 1 == kexec, 2 == kdump
> 2. export physical addr fw_vendor, runtime, tables to /sys/firmware/efi/systab
> 3. if kexecboot != 0, use fw_vendor, runtime, tables from bootparams; Also do not
>    call SetVirtualAddressMap in case kexecboot.
> 
> The panic happens at the last line of efi_init:
>         /* clean DUMMY object */
>         efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
>                          EFI_VARIABLE_NON_VOLATILE |
>                          EFI_VARIABLE_BOOTSERVICE_ACCESS |
>                          EFI_VARIABLE_RUNTIME_ACCESS,
>                          0, NULL);
> 
> Below is the dmesg:
> [    0.003359] pid_max: default: 32768 minimum: 301
> [    0.004792] BUG: unable to handle kernel paging request at fffffffefde97e70
> [    0.006666] IP: [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
> [    0.006666] PGD 36981067 PUD 35828063 PMD 0

Here it is - fffffffefde97e70 is not mapped in the pagetable, PMD is 0.
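To see which table slot is missing, the faulting address can be split into its per-level indices. Below is a user-space C sketch, assuming x86-64 4-level paging with 4 KiB pages (9 index bits per level); the shift constants are defined locally and only mirror the kernel's names. For fffffffefde97e70 it gives PGD index 511, PUD index 507, PMD index 495, PTE index 151 and page offset 0xe70, whereas -4G (0xffffffff00000000) sits at PUD index 508. The empty PMD entry is therefore in the PUD slot covering the first GiB below -4G, which is where the top-down EFI mappings begin.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 4-level paging, 4 KiB pages: 9 index bits per level (assumed). */
#define PTRS_PER_TABLE	512u
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PUD_SHIFT	30
#define PGDIR_SHIFT	39

struct va_indices {
	unsigned pgd, pud, pmd, pte;	/* slot used at each table level */
	unsigned offset;		/* byte offset within the 4 KiB page */
};

static unsigned table_index(uint64_t va, unsigned shift)
{
	return (unsigned)((va >> shift) & (PTRS_PER_TABLE - 1));
}

static struct va_indices decode_va(uint64_t va)
{
	/* 48-bit canonical address: bits 63..47 are all equal. */
	assert((va >> 47) == 0 || (va >> 47) == 0x1ffff);

	struct va_indices idx = {
		.pgd = table_index(va, PGDIR_SHIFT),
		.pud = table_index(va, PUD_SHIFT),
		.pmd = table_index(va, PMD_SHIFT),
		.pte = table_index(va, PAGE_SHIFT),
		.offset = (unsigned)(va & ((1ULL << PAGE_SHIFT) - 1)),
	};
	return idx;
}
```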

Ok, can you upload your patches somewhere and tell me exactly how to
reproduce this so that I can take a look too?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-22 13:37       ` Borislav Petkov
@ 2013-09-22 14:00         ` Dave Young
  2013-09-22 14:31           ` Dave Young
  2013-09-22 15:27         ` H. Peter Anvin
  1 sibling, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-09-22 14:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/22/13 at 03:37pm, Borislav Petkov wrote:
> On Sun, Sep 22, 2013 at 08:35:15PM +0800, Dave Young wrote:
> > I tested your new patch, it works both with efi stub and grub boot in
> > 1st kernel.
> 
> Good, thanks!
> 
> > But it panicked in kexec boot with my kexec related patchset, the patchset
> 
> That's the second kernel, right?

Yes, it's the 2nd kernel.

> 
> > contains 3 patches:
> > 1. introduce cmdline kexecboot=<0|1|2>; 1 == kexec, 2 == kdump
> > 2. export physical addr fw_vendor, runtime, tables to /sys/firmware/efi/systab
> > 3. if kexecboot != 0, use fw_vendor, runtime, tables from bootparams; Also do not
> >    call SetVirtualAddressMap in case kexecboot.
> > 
> > The panic happens at the last line of efi_init:
> >         /* clean DUMMY object */
> >         efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
> >                          EFI_VARIABLE_NON_VOLATILE |
> >                          EFI_VARIABLE_BOOTSERVICE_ACCESS |
> >                          EFI_VARIABLE_RUNTIME_ACCESS,
> >                          0, NULL);
> > 
> > Below is the dmesg:
> > [    0.003359] pid_max: default: 32768 minimum: 301
> > [    0.004792] BUG: unable to handle kernel paging request at fffffffefde97e70
> > [    0.006666] IP: [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
> > [    0.006666] PGD 36981067 PUD 35828063 PMD 0
> 
> Here it is - fffffffefde97e70 is not mapped in the pagetable, PMD is 0.
> 
> Ok, can you upload your patches somewhere and tell me exactly how to
> reproduce this so that I can take a look too?

Ok, I will put them somewhere after a little cleanup today.

> 
> Thanks.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --

* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-22 14:00         ` Dave Young
@ 2013-09-22 14:31           ` Dave Young
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-09-22 14:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/22/13 at 10:00pm, Dave Young wrote:
> On 09/22/13 at 03:37pm, Borislav Petkov wrote:
> > On Sun, Sep 22, 2013 at 08:35:15PM +0800, Dave Young wrote:
> > > I tested your new patch, it works both with efi stub and grub boot in
> > > 1st kernel.
> > 
> > Good, thanks!
> > 
> > > But it panicked in kexec boot with my kexec related patchset, the patchset
> > 
> > That's the second kernel, right?
> 
> Yes, it's the 2nd kernel.
> 
> > 
> > > contains 3 patches:
> > > 1. introduce cmdline kexecboot=<0|1|2>; 1 == kexec, 2 == kdump
> > > 2. export physical addr fw_vendor, runtime, tables to /sys/firmware/efi/systab
> > > 3. if kexecboot != 0, use fw_vendor, runtime, tables from bootparams; Also do not
> > >    call SetVirtualAddressMap in case kexecboot.
> > > 
> > > The panic happens at the last line of efi_init:
> > >         /* clean DUMMY object */
> > >         efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
> > >                          EFI_VARIABLE_NON_VOLATILE |
> > >                          EFI_VARIABLE_BOOTSERVICE_ACCESS |
> > >                          EFI_VARIABLE_RUNTIME_ACCESS,
> > >                          0, NULL);
> > > 
> > > Below is the dmesg:
> > > [    0.003359] pid_max: default: 32768 minimum: 301
> > > [    0.004792] BUG: unable to handle kernel paging request at fffffffefde97e70
> > > [    0.006666] IP: [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
> > > [    0.006666] PGD 36981067 PUD 35828063 PMD 0
> > 
> > Here it is - fffffffefde97e70 is not mapped in the pagetable, PMD is 0.
> > 
> > Ok, can you upload your patches somewhere and tell me exactly how to
> > reproduce this so that I can take a look too?
> 
> Ok, I will put them somewhere after a little cleanup today.

Here it is:
https://people.redhat.com/ruyang/kexec-efi/for-bp/

Userspace patches are also necessary; they are under kexec-tools-patches/

Just test with the steps below:
kexec -l /boot/vmlinuz-3.12.0-rc1+ --reuse-cmdline --append "kexecboot=1"
kexec -e

> 
> > 
> > Thanks.
> > 
> > -- 
> > Regards/Gruss,
> >     Boris.
> > 
> > Sent from a fat crate under my desk. Formatting is fine.
> > --

* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-22 13:37       ` Borislav Petkov
  2013-09-22 14:00         ` Dave Young
@ 2013-09-22 15:27         ` H. Peter Anvin
  2013-09-22 16:38           ` Borislav Petkov
                             ` (2 more replies)
  1 sibling, 3 replies; 102+ messages in thread
From: H. Peter Anvin @ 2013-09-22 15:27 UTC (permalink / raw)
  To: Borislav Petkov, Dave Young
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	James Bottomley, Vivek Goyal, linux-efi

The address that faults is interesting in that it is indeed just below -4G.  The question at hand is probably what information you are using to build the EFI mappings in the secondary kernel and what could make it not match the primary.

Assuming it isn't as simple as the mappings never get built at all.
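The dependence on "what information you are using" can be made concrete with a user-space C sketch of the top-down scheme from the efi_64.c hunk: start at -4G, move down by each region's size, and refuse anything below -68G. It assumes 4 KiB pages, and assign_efi_va() is a hypothetical model, not the patch's function: unlike efi_map_region(), which warns and moves on to the next region, it simply stops at the first overflow. It shows that each region's virtual address depends on the sizes and order of every region mapped before it, so a second kernel only reproduces the first kernel's addresses if it walks an identical list of regions.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

#define GB		(1ULL << 30)
#define EFI_VA_START	(-4 * GB)	/* 0xffffffff00000000, i.e. -4G */
#define EFI_VA_END	(-68 * GB)	/* lowest address allowed, -68G */
#define EFI_PAGE_SHIFT	12		/* 4 KiB pages assumed */

static_assert(EFI_VA_START - EFI_VA_END == 64 * GB, "64 GiB window");

/*
 * Hypothetical model of the VA choice: walk the regions in order, move
 * efi_va down by each region's size and record the result. Returns how
 * many regions fit before the window would be crossed.
 */
static size_t assign_efi_va(const uint64_t *num_pages, size_t n,
			    uint64_t *va_out)
{
	uint64_t efi_va = EFI_VA_START;
	size_t i;

	for (i = 0; i < n; i++) {
		efi_va -= num_pages[i] << EFI_PAGE_SHIFT;
		if (efi_va < EFI_VA_END)
			break;
		va_out[i] = efi_va;
	}
	return i;
}
```

With regions of 16, 256 and 4 pages, the 16-page region lands at -4G - 64 KiB when it comes first, but at -4G - 1 MiB - 64 KiB when it is listed second.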


Borislav Petkov <bp@alien8.de> wrote:
>On Sun, Sep 22, 2013 at 08:35:15PM +0800, Dave Young wrote:
>> I tested your new patch, it works both with efi stub and grub boot in
>> 1st kernel.
>
>Good, thanks!
>
>> But it panicked in kexec boot with my kexec related patchset, the patchset
>
>That's the second kernel, right?
>
>> contains 3 patches:
>> 1. introduce cmdline kexecboot=<0|1|2>; 1 == kexec, 2 == kdump
>> 2. export physical addr fw_vendor, runtime, tables to /sys/firmware/efi/systab
>> 3. if kexecboot != 0, use fw_vendor, runtime, tables from bootparams; Also do not
>>    call SetVirtualAddressMap in case kexecboot.
>> 
>> The panic happens at the last line of efi_init:
>>         /* clean DUMMY object */
>>         efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
>>                          EFI_VARIABLE_NON_VOLATILE |
>>                          EFI_VARIABLE_BOOTSERVICE_ACCESS |
>>                          EFI_VARIABLE_RUNTIME_ACCESS,
>>                          0, NULL);
>> 
>> Below is the dmesg:
>> [    0.003359] pid_max: default: 32768 minimum: 301
>> [    0.004792] BUG: unable to handle kernel paging request at fffffffefde97e70
>> [    0.006666] IP: [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
>> [    0.006666] PGD 36981067 PUD 35828063 PMD 0
>
>Here it is - fffffffefde97e70 is not mapped in the pagetable, PMD is 0.
>
>Ok, can you upload your patches somewhere and tell me exactly how to
>reproduce this so that I can take a look too?
>
>Thanks.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-22 15:27         ` H. Peter Anvin
@ 2013-09-22 16:38           ` Borislav Petkov
  2013-09-23  5:45           ` Dave Young
  2013-09-24  2:52           ` Dave Young
  2 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-22 16:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On Sun, Sep 22, 2013 at 08:27:34AM -0700, H. Peter Anvin wrote:
> The address that faults is interesting in that it is indeed just below
> -4G. The question at hand is probably what information you are using
> to build the EFI mappings in the secondary kernel and what could make
> it not match the primary.

Yep, so obviously we're not building the pagetable in the second kernel
the same way as the first or we're missing some pieces.

Btw, for debugging situations like this one, one could use
arch/x86/mm/dump_pagetables.c by sticking the right CR3 value
into *start.
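Along the same lines, the raw entries in the oops line ("PGD 36981067 PUD 35828063 PMD 0") can be decoded by hand: bit 0 is the present bit and bits 51:12 hold the physical address of the next-level table. Below is a user-space C sketch under those assumptions (x86-64, 4 KiB base pages, large-page leaves only at PUD/PMD level); the ENTRY_* names are local stand-ins loosely modelled on the kernel's _PAGE_* flags, not the kernel's definitions. For this oops it reports two present levels: the PGD entry (flags 0x067: present, RW, user, accessed, dirty) points at a table at 0x36981000, the PUD entry (flags 0x063) at 0x35828000, and the PMD entry is empty, so the walk stops there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural low bits of an x86-64 paging-structure entry. */
#define ENTRY_PRESENT	0x001ULL
#define ENTRY_RW	0x002ULL
#define ENTRY_USER	0x004ULL
#define ENTRY_ACCESSED	0x020ULL
#define ENTRY_DIRTY	0x040ULL
#define ENTRY_PSE	0x080ULL	/* 1G/2M leaf at PUD/PMD level */
#define ENTRY_ADDR_MASK	0x000ffffffffff000ULL	/* bits 51:12 */

static bool entry_present(uint64_t e)
{
	return (e & ENTRY_PRESENT) != 0;
}

/* Physical address of the next-level table (or of the large page). */
static uint64_t entry_addr(uint64_t e)
{
	assert(entry_present(e));	/* not meaningful for a non-present entry */
	return e & ENTRY_ADDR_MASK;
}

/*
 * entries[] holds the values from an oops line in PGD, PUD, PMD, PTE
 * order. Returns how many levels were present before the walk stopped.
 */
static int levels_present(const uint64_t *entries, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!entry_present(entries[i]))
			return i;		/* walk stops: entry not present */
		if ((i == 1 || i == 2) && (entries[i] & ENTRY_PSE))
			return i + 1;		/* large-page leaf ends the walk */
	}
	return n;
}
```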

:-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-22 15:27         ` H. Peter Anvin
  2013-09-22 16:38           ` Borislav Petkov
@ 2013-09-23  5:45           ` Dave Young
  2013-09-24  2:52           ` Dave Young
  2 siblings, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-09-23  5:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 09/22/13 at 08:27am, H. Peter Anvin wrote:
> The address that faults is interesting in that it is indeed just below -4G.  The question at hand is probably what information you are using to build the EFI mappings in the secondary kernel and what could make it not match the primary.
> 
> Assuming it isn't as simple as the mappings never get built at all.

At least the efi_info is the same in both kernels.
I will print some debug info to see if I can find something.

> 
> 
> Borislav Petkov <bp@alien8.de> wrote:
> >On Sun, Sep 22, 2013 at 08:35:15PM +0800, Dave Young wrote:
> >> I tested your new patch, it works both with efi stub and grub boot in
> >> 1st kernel.
> >
> >Good, thanks!
> >
> >> But it panicked in kexec boot with my kexec related patchset, the patchset
> >
> >That's the second kernel, right?
> >
> >> contains 3 patches:
> >> 1. introduce cmdline kexecboot=<0|1|2>; 1 == kexec, 2 == kdump
> >> 2. export physical addr fw_vendor, runtime, tables to /sys/firmware/efi/systab
> >> 3. if kexecboot != 0, use fw_vendor, runtime, tables from bootparams; Also do not
> >>    call SetVirtualAddressMap in case kexecboot.
> >> 
> >> The panic happens at the last line of efi_init:
> >>         /* clean DUMMY object */
> >>         efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
> >>                          EFI_VARIABLE_NON_VOLATILE |
> >>                          EFI_VARIABLE_BOOTSERVICE_ACCESS |
> >>                          EFI_VARIABLE_RUNTIME_ACCESS,
> >>                          0, NULL);
> >> 
> >> Below is the dmesg:
> >> [    0.003359] pid_max: default: 32768 minimum: 301
> >> [    0.004792] BUG: unable to handle kernel paging request at fffffffefde97e70
> >> [    0.006666] IP: [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
> >> [    0.006666] PGD 36981067 PUD 35828063 PMD 0
> >
> >Here it is - fffffffefde97e70 is not mapped in the pagetable, PMD is 0.
> >
> >Ok, can you upload your patches somewhere and tell me exactly how to
> >reproduce this so that I can take a look too?
> >
> >Thanks.
> 
> -- 
> Sent from my mobile phone.  Please pardon brevity and lack of formatting.

* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-21 11:39   ` [PATCH -v2] " Borislav Petkov
  2013-09-22 12:35     ` Dave Young
@ 2013-09-23  5:47     ` Dave Young
  2013-09-23  6:29       ` Borislav Petkov
  2013-09-23  8:45     ` Borislav Petkov
  2013-09-25  9:24     ` Borislav Petkov
  3 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-09-23  5:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/21/13 at 01:39pm, Borislav Petkov wrote:
> On Thu, Sep 19, 2013 at 04:54:54PM +0200, Borislav Petkov wrote:
> > From: Borislav Petkov <bp@suse.de>
> > 
> > We map the EFI regions needed for runtime services contiguously on
> > virtual addresses starting from -4G down for a total max space of 64G.
> > This way, we provide for stable runtime services addresses across
> > kernels so that a kexec'd kernel can still use them.
> > 
> > This way, they're mapped in a separate pagetable so that we don't
> > pollute the kernel namespace (you can see how the whole ioremapping and
> > saving and restoring of PGDs is gone now).
> 
> Ok, this one was not so good, let's try again:
> 
> This time I saved 32-bit and am switching the pagetable only after
> having built it properly. This boots fine again on baremetal and on OVMF
> with Matt's handover flags fix from yesterday.
> 
> Also, I've uploaded the whole series to
> git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git, branch
> efi-experimental
> 
> ("-experimental" doesn't trigger Fengguang's robot :-))
> 
> Good luck! :-)
> 
> ---
> From 880fcee20209a122eda846e7f109776ed1c56de5 Mon Sep 17 00:00:00 2001
> From: Borislav Petkov <bp@suse.de>
> Date: Wed, 18 Sep 2013 17:35:42 +0200
> Subject: [PATCH] EFI: Runtime services virtual mapping
> 
> We map the EFI regions needed for runtime services contiguously on
> virtual addresses starting from -4G down for a total max space of 64G.
> This way, we provide for stable runtime services addresses across
> kernels so that a kexec'd kernel can still use them.
> 
> This way, they're mapped in a separate pagetable so that we don't
> pollute the kernel namespace (you can see how the whole ioremapping and
> saving and restoring of PGDs is gone now).
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/include/asm/efi.h           | 43 ++++++++++--------
>  arch/x86/include/asm/pgtable_types.h |  3 +-
>  arch/x86/platform/efi/efi.c          | 68 ++++++++++++-----------------
>  arch/x86/platform/efi/efi_32.c       | 29 +++++++++++-
>  arch/x86/platform/efi/efi_64.c       | 85 +++++++++++++++++++-----------------
>  arch/x86/platform/efi/efi_stub_64.S  | 53 ++++++++++++++++++++++
>  6 files changed, 181 insertions(+), 100 deletions(-)
> 
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 0062a0125041..9a99e0499e4b 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -69,24 +69,31 @@ extern u64 efi_call6(void *fp, u64 arg1, u64 arg2, u64 arg3,
>  	efi_call6((f), (u64)(a1), (u64)(a2), (u64)(a3),		\
>  		  (u64)(a4), (u64)(a5), (u64)(a6))
>  
> +#define _efi_call_virtX(x, f, ...)					\
> +({									\
> +	efi_status_t __s;						\
> +									\
> +	efi_sync_low_kernel_mappings();					\
> +	preempt_disable();						\
> +	__s = efi_call##x((void *)efi.systab->runtime->f, __VA_ARGS__);	\
> +	preempt_enable();						\
> +	__s;								\
> +})
> +
>  #define efi_call_virt0(f)				\
> -	efi_call0((efi.systab->runtime->f))
> -#define efi_call_virt1(f, a1)					\
> -	efi_call1((efi.systab->runtime->f), (u64)(a1))
> -#define efi_call_virt2(f, a1, a2)					\
> -	efi_call2((efi.systab->runtime->f), (u64)(a1), (u64)(a2))
> -#define efi_call_virt3(f, a1, a2, a3)					\
> -	efi_call3((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
> -		  (u64)(a3))
> -#define efi_call_virt4(f, a1, a2, a3, a4)				\
> -	efi_call4((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
> -		  (u64)(a3), (u64)(a4))
> -#define efi_call_virt5(f, a1, a2, a3, a4, a5)				\
> -	efi_call5((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
> -		  (u64)(a3), (u64)(a4), (u64)(a5))
> -#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)			\
> -	efi_call6((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
> -		  (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
> +	_efi_call_virtX(0, f)
> +#define efi_call_virt1(f, a1)				\
> +	_efi_call_virtX(1, f, (u64)(a1))
> +#define efi_call_virt2(f, a1, a2)			\
> +	_efi_call_virtX(2, f, (u64)(a1), (u64)(a2))
> +#define efi_call_virt3(f, a1, a2, a3)			\
> +	_efi_call_virtX(3, f, (u64)(a1), (u64)(a2), (u64)(a3))
> +#define efi_call_virt4(f, a1, a2, a3, a4)		\
> +	_efi_call_virtX(4, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4))
> +#define efi_call_virt5(f, a1, a2, a3, a4, a5)		\
> +	_efi_call_virtX(5, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5))
> +#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)	\
> +	_efi_call_virtX(6, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
>  
>  extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
>  				 u32 type, u64 attribute);
> @@ -101,6 +108,8 @@ extern void efi_call_phys_prelog(void);
>  extern void efi_call_phys_epilog(void);
>  extern void efi_unmap_memmap(void);
>  extern void efi_memory_uc(u64 addr, unsigned long size);
> +extern void __init efi_map_region(efi_memory_desc_t *md);
> +extern void efi_sync_low_kernel_mappings(void);
>  
>  #ifdef CONFIG_EFI
>  
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 0ecac257fb26..a83aa44bb1fb 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -382,7 +382,8 @@ static inline void update_page_count(int level, unsigned long pages) { }
>   */
>  extern pte_t *lookup_address(unsigned long address, unsigned int *level);
>  extern phys_addr_t slow_virt_to_phys(void *__address);
> -
> +extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> +				   unsigned numpages, unsigned long page_flags);
>  #endif	/* !__ASSEMBLY__ */
>  
>  #endif /* _ASM_X86_PGTABLE_DEFS_H */
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 538c1e6b7b2c..90459f5f587c 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -12,6 +12,8 @@
>   *	Bibo Mao <bibo.mao@intel.com>
>   *	Chandramouli Narayanan <mouli@linux.intel.com>
>   *	Huang Ying <ying.huang@intel.com>
> + * Copyright (C) 2013 SuSE Labs
> + * 	Borislav Petkov <bp@suse.de> - runtime services VA mapping
>   *
>   * Copied from efi_32.c to eliminate the duplicated code between EFI
>   * 32/64 support code. --ying 2007-10-26
> @@ -81,6 +83,17 @@ static efi_system_table_t efi_systab __initdata;
>  unsigned long x86_efi_facility;
>  
>  /*
> + * Scratch space used for switching the pagetable in the EFI stub
> + */
> +struct efi_scratch {
> +	u64 r15;
> +	u64 prev_cr3;
> +	pgd_t *efi_pgt;
> +	bool use_pgd;
> +};
> +extern struct efi_scratch efi_scratch;
> +
> +/*
>   * Returns 1 if 'facility' is enabled, 0 otherwise.
>   */
>  int efi_enabled(int facility)
> @@ -797,22 +810,6 @@ void __init efi_set_executable(efi_memory_desc_t *md, bool executable)
>  		set_memory_nx(addr, npages);
>  }
>  
> -static void __init runtime_code_page_mkexec(void)
> -{
> -	efi_memory_desc_t *md;
> -	void *p;
> -
> -	/* Make EFI runtime service code area executable */
> -	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> -		md = p;
> -
> -		if (md->type != EFI_RUNTIME_SERVICES_CODE)
> -			continue;
> -
> -		efi_set_executable(md, true);
> -	}
> -}
> -
>  /*
>   * We can't ioremap data in EFI boot services RAM, because we've already mapped
>   * it as RAM.  So, look it up in the existing EFI memory map instead.  Only
> @@ -862,10 +859,10 @@ void efi_memory_uc(u64 addr, unsigned long size)
>  void __init efi_enter_virtual_mode(void)
>  {
>  	efi_memory_desc_t *md, *prev_md = NULL;
> -	efi_status_t status;
> +	void *p, *new_memmap = NULL;
>  	unsigned long size;
> -	u64 end, systab, start_pfn, end_pfn;
> -	void *p, *va, *new_memmap = NULL;
> +	efi_status_t status;
> +	u64 end, systab;
>  	int count = 0;
>  
>  	efi.systab = NULL;
> @@ -874,7 +871,6 @@ void __init efi_enter_virtual_mode(void)
>  	 * We don't do virtual mode, since we don't do runtime services, on
>  	 * non-native EFI
>  	 */
> -
>  	if (!efi_is_native()) {
>  		efi_unmap_memmap();
>  		return;
> @@ -914,33 +910,18 @@ void __init efi_enter_virtual_mode(void)
>  		    md->type != EFI_BOOT_SERVICES_DATA)
>  			continue;
>  
> +		efi_map_region(md);
> +
>  		size = md->num_pages << PAGE_SHIFT;
>  		end = md->phys_addr + size;
>  
> -		start_pfn = PFN_DOWN(md->phys_addr);
> -		end_pfn = PFN_UP(end);
> -		if (pfn_range_is_mapped(start_pfn, end_pfn)) {
> -			va = __va(md->phys_addr);
> -
> -			if (!(md->attribute & EFI_MEMORY_WB))
> -				efi_memory_uc((u64)(unsigned long)va, size);
> -		} else
> -			va = efi_ioremap(md->phys_addr, size,
> -					 md->type, md->attribute);
> -
> -		md->virt_addr = (u64) (unsigned long) va;
> -
> -		if (!va) {
> -			pr_err("ioremap of 0x%llX failed!\n",
> -			       (unsigned long long)md->phys_addr);
> -			continue;
> -		}
> -
>  		systab = (u64) (unsigned long) efi_phys.systab;
>  		if (md->phys_addr <= systab && systab < end) {
>  			systab += md->virt_addr - md->phys_addr;
> +
>  			efi.systab = (efi_system_table_t *) (unsigned long) systab;
>  		}
> +
>  		new_memmap = krealloc(new_memmap,
>  				      (count + 1) * memmap.desc_size,
>  				      GFP_KERNEL);
> @@ -949,8 +930,15 @@ void __init efi_enter_virtual_mode(void)
>  		count++;
>  	}
>  
> +#ifdef CONFIG_X86_64
> +	efi_scratch.efi_pgt = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
> +	efi_scratch.use_pgd = true;
> +#endif
> +
>  	BUG_ON(!efi.systab);
>  
> +	efi_sync_low_kernel_mappings();
> +
>  	status = phys_efi_set_virtual_address_map(
>  		memmap.desc_size * count,
>  		memmap.desc_size,
> @@ -983,8 +971,6 @@ void __init efi_enter_virtual_mode(void)
>  	efi.query_variable_info = virt_efi_query_variable_info;
>  	efi.update_capsule = virt_efi_update_capsule;
>  	efi.query_capsule_caps = virt_efi_query_capsule_caps;
> -	if (__supported_pte_mask & _PAGE_NX)
> -		runtime_code_page_mkexec();
>  
>  	kfree(new_memmap);
>  
> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
> index 40e446941dd7..661663b08eaf 100644
> --- a/arch/x86/platform/efi/efi_32.c
> +++ b/arch/x86/platform/efi/efi_32.c
> @@ -37,9 +37,36 @@
>   * claim EFI runtime service handler exclusively and to duplicate a memory in
>   * low memory space say 0 - 3G.
>   */
> -
>  static unsigned long efi_rt_eflags;
>  
> +void efi_sync_low_kernel_mappings(void) {}
> +
> +void __init efi_map_region(efi_memory_desc_t *md)
> +{
> +	u64 start_pfn, end_pfn, end;
> +	unsigned long size;
> +	void *va;
> +
> +	start_pfn = PFN_DOWN(md->phys_addr);
> +	size	  = md->num_pages << PAGE_SHIFT;
> +	end	  = md->phys_addr + size;
> +	end_pfn   = PFN_UP(end);
> +
> +	if (pfn_range_is_mapped(start_pfn, end_pfn)) {
> +		va = __va(md->phys_addr);
> +
> +		if (!(md->attribute & EFI_MEMORY_WB))
> +			efi_memory_uc((u64)(unsigned long)va, size);
> +	} else
> +		va = efi_ioremap(md->phys_addr, size,
> +				 md->type, md->attribute);
> +
> +	md->virt_addr = (u64) (unsigned long) va;
> +	if (!va)
> +		pr_err("ioremap of 0x%llX failed!\n",
> +		       (unsigned long long)md->phys_addr);
> +}
> +
>  void efi_call_phys_prelog(void)
>  {
>  	struct desc_ptr gdt_descr;
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 39a0e7f1f0a3..db5230dd350e 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -39,59 +39,64 @@
>  #include <asm/cacheflush.h>
>  #include <asm/fixmap.h>
>  
> -static pgd_t *save_pgd __initdata;
> -static unsigned long efi_flags __initdata;
> +void __init efi_call_phys_prelog(void) {}
> +void __init efi_call_phys_epilog(void) {}
>  
> -static void __init early_code_mapping_set_exec(int executable)
> +/*
> + * Add low kernel mappings for passing arguments to EFI functions.
> + */
> +void efi_sync_low_kernel_mappings(void)
>  {
> -	efi_memory_desc_t *md;
> -	void *p;
> +	unsigned num_pgds;
> +	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
>  
> -	if (!(__supported_pte_mask & _PAGE_NX))
> -		return;
> +	num_pgds = pgd_index(VMALLOC_START - 1) - pgd_index(PAGE_OFFSET);
>  
> -	/* Make EFI service code area executable */
> -	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> -		md = p;
> -		if (md->type == EFI_RUNTIME_SERVICES_CODE ||
> -		    md->type == EFI_BOOT_SERVICES_CODE)
> -			efi_set_executable(md, executable);
> -	}
> +	memcpy(pgd + pgd_index(PAGE_OFFSET),
> +		init_mm.pgd + pgd_index(PAGE_OFFSET),
> +		sizeof(pgd_t) * num_pgds);
>  }
>  
> -void __init efi_call_phys_prelog(void)
> +/*
> + * We allocate runtime services regions top-down, starting from -4G, i.e.
> + * 0xffff_ffff_0000_0000 and limit EFI VA mapping space to 64G.
> + */
> +static u64 efi_va	 = -4 * (1UL << 30);
> +#define EFI_VA_END	 (-68 * (1UL << 30))
> +
> +static void __init __map_region(efi_memory_desc_t *md, u64 va)
>  {
> -	unsigned long vaddress;
> -	int pgd;
> -	int n_pgds;
> +	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
> +	unsigned long pf = 0, size;
> +	u64 end;
>  
> -	early_code_mapping_set_exec(1);
> -	local_irq_save(efi_flags);
> +	if (!(md->attribute & EFI_MEMORY_WB))
> +		pf |= _PAGE_PCD;
>  
> -	n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
> -	save_pgd = kmalloc(n_pgds * sizeof(pgd_t), GFP_KERNEL);
> +	size = md->num_pages << PAGE_SHIFT;
> +	end  = va + size;
>  
> -	for (pgd = 0; pgd < n_pgds; pgd++) {
> -		save_pgd[pgd] = *pgd_offset_k(pgd * PGDIR_SIZE);
> -		vaddress = (unsigned long)__va(pgd * PGDIR_SIZE);
> -		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), *pgd_offset_k(vaddress));
> -	}
> -	__flush_tlb_all();
> +	if(kernel_map_pages_in_pgd(pgd, md->phys_addr, va, md->num_pages, pf))
> +		pr_warning("Error mapping PA 0x%llx -> VA 0x%llx!\n",
> +			   md->phys_addr, va);
>  }
>  
> -void __init efi_call_phys_epilog(void)
> +void __init efi_map_region(efi_memory_desc_t *md)
>  {
> -	/*
> -	 * After the lock is released, the original page table is restored.
> -	 */
> -	int pgd;
> -	int n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT) , PGDIR_SIZE);
> -	for (pgd = 0; pgd < n_pgds; pgd++)
> -		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), save_pgd[pgd]);
> -	kfree(save_pgd);
> -	__flush_tlb_all();
> -	local_irq_restore(efi_flags);
> -	early_code_mapping_set_exec(0);
> +	unsigned long size = md->num_pages << PAGE_SHIFT;
> +
> +	efi_va -= size;
> +	if (efi_va < EFI_VA_END) {
> +		pr_warning(FW_WARN "VA address range overflow!\n");
> +		return;
> +	}
> +
> +	/* Do the 1:1 map */
> +	__map_region(md, md->phys_addr);
> +
> +	/* Do the VA map */
> +	__map_region(md, efi_va);


Could you add comment for above code? It's hard to understand the
twice mapping if one did not follow the old thread.


> +	md->virt_addr = efi_va;
>  }
>  
>  void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
> diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
> index 4c07ccab8146..2bb9714bf713 100644
> --- a/arch/x86/platform/efi/efi_stub_64.S
> +++ b/arch/x86/platform/efi/efi_stub_64.S
> @@ -34,10 +34,47 @@
>  	mov %rsi, %cr0;			\
>  	mov (%rsp), %rsp
>  
> +	/* stolen from gcc */
> +	.macro FLUSH_TLB_ALL
> +	movq %r15, efi_scratch(%rip)
> +	movq %r14, efi_scratch+8(%rip)
> +	movq %cr4, %r15
> +	movq %r15, %r14
> +	andb $0x7f, %r14b
> +	movq %r14, %cr4
> +	movq %r15, %cr4
> +	movq efi_scratch+8(%rip), %r14
> +	movq efi_scratch(%rip), %r15
> +	.endm
> +
> +	.macro SWITCH_PGT
> +	cmpb $0, efi_scratch+24(%rip)
> +	je 1f
> +	movq %r15, efi_scratch(%rip)		# r15
> +	# save previous CR3
> +	movq %cr3, %r15
> +	movq %r15, efi_scratch+8(%rip)		# prev_cr3
> +	movq efi_scratch+16(%rip), %r15		# EFI pgt
> +	movq %r15, %cr3
> +	1:
> +	.endm
> +
> +	.macro RESTORE_PGT
> +	cmpb $0, efi_scratch+24(%rip)
> +	je 2f
> +	movq efi_scratch+8(%rip), %r15
> +	movq %r15, %cr3
> +	movq efi_scratch(%rip), %r15
> +	FLUSH_TLB_ALL
> +	2:
> +	.endm
> +
>  ENTRY(efi_call0)
>  	SAVE_XMM
>  	subq $32, %rsp
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -47,7 +84,9 @@ ENTRY(efi_call1)
>  	SAVE_XMM
>  	subq $32, %rsp
>  	mov  %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -57,7 +96,9 @@ ENTRY(efi_call2)
>  	SAVE_XMM
>  	subq $32, %rsp
>  	mov  %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -68,7 +109,9 @@ ENTRY(efi_call3)
>  	subq $32, %rsp
>  	mov  %rcx, %r8
>  	mov  %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -80,7 +123,9 @@ ENTRY(efi_call4)
>  	mov %r8, %r9
>  	mov %rcx, %r8
>  	mov %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $32, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -93,7 +138,9 @@ ENTRY(efi_call5)
>  	mov %r8, %r9
>  	mov %rcx, %r8
>  	mov %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $48, %rsp
>  	RESTORE_XMM
>  	ret
> @@ -109,8 +156,14 @@ ENTRY(efi_call6)
>  	mov %r8, %r9
>  	mov %rcx, %r8
>  	mov %rsi, %rcx
> +	SWITCH_PGT
>  	call *%rdi
> +	RESTORE_PGT
>  	addq $48, %rsp
>  	RESTORE_XMM
>  	ret
>  ENDPROC(efi_call6)
> +
> +	.data
> +ENTRY(efi_scratch)
> +	.fill 3,8,0
> -- 
> 1.8.4
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --

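The stub macros above address `efi_scratch` through hard-coded byte offsets (+0, +8, +16, +24), so they only work if those offsets match the C layout of `struct efi_scratch`. The sketch below checks this in user space. It assumes an LP64 x86-64 ABI and substitutes `uint64_t`/`void *` for the kernel's `u64`/`pgd_t *`; the mirror struct and helper names are hypothetical. If this reading is right, `use_pgd` sits at offset 24, just past the 24 bytes that `.fill 3,8,0` reserves. In that case `cmpb $0, efi_scratch+24(%rip)` would read whatever follows in `.data`, and the definition would need at least one more byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the patch's struct efi_scratch (hypothetical name).
 * u64 -> uint64_t, pgd_t * -> void *; assumes an LP64 x86-64 ABI. */
struct efi_scratch_mirror {
	uint64_t r15;      /* efi_scratch+0  : saved %r15                */
	uint64_t prev_cr3; /* efi_scratch+8  : CR3 before the switch     */
	void *efi_pgt;     /* efi_scratch+16 : EFI page table to load    */
	bool use_pgd;      /* efi_scratch+24 : tested with cmpb $0       */
};

/* The offsets hard-coded in SWITCH_PGT/RESTORE_PGT must match these. */
_Static_assert(offsetof(struct efi_scratch_mirror, r15) == 0, "r15");
_Static_assert(offsetof(struct efi_scratch_mirror, prev_cr3) == 8, "prev_cr3");
_Static_assert(offsetof(struct efi_scratch_mirror, efi_pgt) == 16, "efi_pgt");
_Static_assert(offsetof(struct efi_scratch_mirror, use_pgd) == 24, "use_pgd");

/* Bytes reserved by ".fill 3,8,0": three repeats of an 8-byte zero. */
#define EFI_SCRATCH_FILL_BYTES (3 * 8)

/* True if 'reserved' bytes cover every field the stub reads or writes. */
static bool scratch_area_is_large_enough(size_t reserved)
{
	size_t needed = offsetof(struct efi_scratch_mirror, use_pgd) +
			sizeof(((struct efi_scratch_mirror *)0)->use_pgd);

	return reserved >= needed;
}
```

For example, `scratch_area_is_large_enough(EFI_SCRATCH_FILL_BYTES)` returns false, while 25 bytes or more would suffice.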

* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-23  5:47     ` Dave Young
@ 2013-09-23  6:29       ` Borislav Petkov
  2013-09-23  7:08         ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-09-23  6:29 UTC (permalink / raw)
  To: Dave Young
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On Mon, Sep 23, 2013 at 01:47:41PM +0800, Dave Young wrote:
> > +	unsigned long size = md->num_pages << PAGE_SHIFT;
> > +
> > +	efi_va -= size;
> > +	if (efi_va < EFI_VA_END) {
> > +		pr_warning(FW_WARN "VA address range overflow!\n");
> > +		return;
> > +	}
> > +
> > +	/* Do the 1:1 map */
> > +	__map_region(md, md->phys_addr);
> > +
> > +	/* Do the VA map */
> > +	__map_region(md, efi_va);
> 
> 
> Could you add comment for above code? It's hard to understand the
> twice mapping if one did not follow the old thread.

Does that suffice:

/*
 * Make sure the 1:1 mappings are present as a catch-all for b0rked firmware
 * which doesn't update all internal pointers after switching to virtual mode
 * and would otherwise crap on us.
 */

?
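To see why mapping each region twice helps, here is a toy model. It uses a flat list of translations instead of real x86 page tables, and all names are hypothetical. Each runtime region gets a 1:1 entry (VA == PA) and an entry in the high EFI window. A pointer that the firmware failed to convert after SetVirtualAddressMap() is still a physical address, and it resolves only because the 1:1 entry exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One virtual->physical translation covering [va, va + size). */
struct toy_mapping {
	uint64_t va;
	uint64_t pa;
	uint64_t size;
};

/* Translate va through the list; false models a page fault. */
static bool toy_translate(const struct toy_mapping *maps, size_t n,
			  uint64_t va, uint64_t *pa)
{
	for (size_t i = 0; i < n; i++) {
		if (va >= maps[i].va && va - maps[i].va < maps[i].size) {
			*pa = maps[i].pa + (va - maps[i].va);
			return true;
		}
	}
	return false;
}

/* Add both mappings for one region; returns the new entry count. */
static size_t toy_map_region(struct toy_mapping *maps, size_t n,
			     uint64_t pa, uint64_t size, uint64_t efi_va)
{
	maps[n++] = (struct toy_mapping){ .va = pa, .pa = pa, .size = size };     /* 1:1 */
	maps[n++] = (struct toy_mapping){ .va = efi_va, .pa = pa, .size = size }; /* high VA */
	return n;
}
```

With only the high mapping, a stale physical pointer would miss every entry and fault. The 1:1 entry is what lets it still reach the right physical address.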

Btw, when you reply to a mail, please remove that quoted portion of it
which you're not replying to - I had to scroll a bunch of screens down
and I almost missed your reply. :)

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-23  6:29       ` Borislav Petkov
@ 2013-09-23  7:08         ` Dave Young
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-09-23  7:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi

On 09/23/13 at 08:29am, Borislav Petkov wrote:
> On Mon, Sep 23, 2013 at 01:47:41PM +0800, Dave Young wrote:
> > > +	unsigned long size = md->num_pages << PAGE_SHIFT;
> > > +
> > > +	efi_va -= size;
> > > +	if (efi_va < EFI_VA_END) {
> > > +		pr_warning(FW_WARN "VA address range overflow!\n");
> > > +		return;
> > > +	}
> > > +
> > > +	/* Do the 1:1 map */
> > > +	__map_region(md, md->phys_addr);
> > > +
> > > +	/* Do the VA map */
> > > +	__map_region(md, efi_va);
> > 
> > 
> > Could you add comment for above code? It's hard to understand the
> > twice mapping if one did not follow the old thread.
> 
> Does that suffice:
> 
> /*
>  * Make sure the 1:1 mappings are present as a catch-all for b0rked firmware
>  * which doesn't update all internal pointers after switching to virtual mode
>  * and would otherwise crap on us.
>  */
> 
> ?

Yes, looks good. Thanks

> 
> Btw, when you reply to a mail, please remove that quoted portion of it
> which you're not replying to - I had to scroll a bunch of screens down
> and I almost missed your reply. :)

Will do. I only noticed the problem after I hit 'y' in mutt; sorry about that.

> 
> Thanks.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-21 11:39   ` [PATCH -v2] " Borislav Petkov
  2013-09-22 12:35     ` Dave Young
  2013-09-23  5:47     ` Dave Young
@ 2013-09-23  8:45     ` Borislav Petkov
  2013-09-25  9:24     ` Borislav Petkov
  3 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-23  8:45 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

On Sat, Sep 21, 2013 at 01:39:29PM +0200, Borislav Petkov wrote:
> -void __init efi_call_phys_prelog(void)
> +/*
> + * We allocate runtime services regions top-down, starting from -4G, i.e.
> + * 0xffff_ffff_0000_0000 and limit EFI VA mapping space to 64G.
> + */
> +static u64 efi_va	 = -4 * (1UL << 30);
> +#define EFI_VA_END	 (-68 * (1UL << 30))

Note to self: add this range to Documentation/x86/x86_64/mm.txt
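For reference, the two constants work out as shown below. This is a user-space check that assumes a 64-bit `unsigned long`, as on x86-64; the macro and helper names are hypothetical. `efi_va` starts at 0xffffffff00000000 and `EFI_VA_END` is 0xffffffef00000000, so the window is exactly 64 GB. In mm.txt notation it would read ffffffef00000000 - fffffffeffffffff (=64 GB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The patch's expressions, evaluated with 64-bit unsigned long (LP64). */
#define SKETCH_EFI_VA_START ((uint64_t)(-4 * (1UL << 30)))  /* initial efi_va */
#define SKETCH_EFI_VA_END   ((uint64_t)(-68 * (1UL << 30))) /* EFI_VA_END */

_Static_assert(SKETCH_EFI_VA_START == 0xffffffff00000000ULL, "-4G");
_Static_assert(SKETCH_EFI_VA_END == 0xffffffef00000000ULL, "-68G");
_Static_assert(SKETCH_EFI_VA_START - SKETCH_EFI_VA_END == (64ULL << 30),
	       "the window is 64G");

/* True if [va, va + size) lies inside [EFI_VA_END, -4G); overflow-safe. */
static bool efi_window_contains(uint64_t va, uint64_t size)
{
	return va >= SKETCH_EFI_VA_END && va <= SKETCH_EFI_VA_START &&
	       size <= SKETCH_EFI_VA_START - va;
}
```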

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-22 15:27         ` H. Peter Anvin
  2013-09-22 16:38           ` Borislav Petkov
  2013-09-23  5:45           ` Dave Young
@ 2013-09-24  2:52           ` Dave Young
  2013-09-24  3:06             ` H. Peter Anvin
  2 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-09-24  2:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 09/22/13 at 08:27am, H. Peter Anvin wrote:
> The address that faults is interesting in that it is indeed just below -4G.  The question at hand is probably what information you are using to build the EFI mappings in the secondary kernel and what could make it not match the primary.
> 
> Assuming it isn't as simple as the mappings never get built at all.

Here is my debug output, from 'diff efi-mapping-1st-kernel efi-mapping-2nd-kernel'.
Obviously, the high-address mappings are not the same:

--- efi-mapping-1.txt	2013-09-24 10:46:09.977746047 +0800
+++ efi-mapping-2.txt	2013-09-24 10:46:33.871421806 +0800
@@ -1,30 +1,30 @@
 efi mapping PA 0x800000 -> VA 0x800000
 efi mapping PA 0x800000 -> VA 0xffffffff00000000
 efi mapping PA 0x7c000000 -> VA 0x7c000000
-efi mapping PA 0x7c000000 -> VA 0xfffffffefffe0000
+efi mapping PA 0x7c000000 -> VA 0xffffffff00000000
 efi mapping PA 0x7d5e2000 -> VA 0x7d5e2000
-efi mapping PA 0x7d5e2000 -> VA 0xfffffffefffdf000
+efi mapping PA 0x7d5e2000 -> VA 0xfffffffefffff000
 efi mapping PA 0x7d77d000 -> VA 0x7d77d000
-efi mapping PA 0x7d77d000 -> VA 0xfffffffefffde000
+efi mapping PA 0x7d77d000 -> VA 0xfffffffeffffe000
 efi mapping PA 0x7d864000 -> VA 0x7d864000
-efi mapping PA 0x7d864000 -> VA 0xfffffffeff8d4000
+efi mapping PA 0x7d864000 -> VA 0xfffffffeff8f4000
 efi mapping PA 0x7df6e000 -> VA 0x7df6e000
-efi mapping PA 0x7df6e000 -> VA 0xfffffffeff6ae000
+efi mapping PA 0x7df6e000 -> VA 0xfffffffeff6ce000
 efi mapping PA 0x7e194000 -> VA 0x7e194000
-efi mapping PA 0x7e194000 -> VA 0xfffffffeff6ac000
+efi mapping PA 0x7e194000 -> VA 0xfffffffeff6cc000
 efi mapping PA 0x7e196000 -> VA 0x7e196000
-efi mapping PA 0x7e196000 -> VA 0xfffffffeff696000
+efi mapping PA 0x7e196000 -> VA 0xfffffffeff6b6000
 efi mapping PA 0x7e1ac000 -> VA 0x7e1ac000
-efi mapping PA 0x7e1ac000 -> VA 0xfffffffeff681000
+efi mapping PA 0x7e1ac000 -> VA 0xfffffffeff6a1000
 efi mapping PA 0x7e1c1000 -> VA 0x7e1c1000
-efi mapping PA 0x7e1c1000 -> VA 0xfffffffefe041000
+efi mapping PA 0x7e1c1000 -> VA 0xfffffffefe061000
 efi mapping PA 0x7f802000 -> VA 0x7f802000
-efi mapping PA 0x7f802000 -> VA 0xfffffffefdec2000
+efi mapping PA 0x7f802000 -> VA 0xfffffffefdee2000
 efi mapping PA 0x7f981000 -> VA 0x7f981000
-efi mapping PA 0x7f981000 -> VA 0xfffffffefde92000
+efi mapping PA 0x7f981000 -> VA 0xfffffffefdeb2000
 efi mapping PA 0x7f9b1000 -> VA 0x7f9b1000
-efi mapping PA 0x7f9b1000 -> VA 0xfffffffefde6e000
+efi mapping PA 0x7f9b1000 -> VA 0xfffffffefde8e000
 efi mapping PA 0x7f9e5000 -> VA 0x7f9e5000
-efi mapping PA 0x7f9e5000 -> VA 0xfffffffefd873000
+efi mapping PA 0x7f9e5000 -> VA 0xfffffffefd893000
 efi mapping PA 0x7ffe0000 -> VA 0x7ffe0000
-efi mapping PA 0x7ffe0000 -> VA 0xfffffffefd853000
+efi mapping PA 0x7ffe0000 -> VA 0xfffffffefd873000

> 
> 
> Borislav Petkov <bp@alien8.de> wrote:
> >On Sun, Sep 22, 2013 at 08:35:15PM +0800, Dave Young wrote:
> >> I tested your new patch, it works both with efi stub and grub boot in
> >> 1st kernel.
> >
> >Good, thanks!
> >
> >> But it panicked in kexec boot with my kexec-related patchset; the patchset
> >
> >That's the second kernel, right?
> >
> >> contains 3 patches:
> >> 1. introduce cmdline kexecboot=<0|1|2>; 1 == kexec, 2 == kdump
> >> 2. export physical addr fw_vendor, runtime, tables to /sys/firmware/efi/systab
> >> 3. if kexecboot != 0, use fw_vendor, runtime, tables from bootparams; also do not
> >>    call SetVirtualAddressMap in the kexecboot case.
> >> 
> >> The panic happens at the last line of efi_init:
> >>         /* clean DUMMY object */
> >>         efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
> >>                          EFI_VARIABLE_NON_VOLATILE |
> >>                          EFI_VARIABLE_BOOTSERVICE_ACCESS |
> >>                          EFI_VARIABLE_RUNTIME_ACCESS,
> >>                          0, NULL);
> >> 
> >> Below is the dmesg:
> >> [    0.003359] pid_max: default: 32768 minimum: 301
> >> [    0.004792] BUG: unable to handle kernel paging request at fffffffefde97e70
> >> [    0.006666] IP: [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
> >> [    0.006666] PGD 36981067 PUD 35828063 PMD 0
> >
> >Here it is - fffffffefde97e70 is not mapped in the pagetable, PMD is 0.
> >
> >Ok, can you upload your patches somewhere and tell me exactly how to
> >reproduce this so that I can take a look too?
> >
> >Thanks.
> 
> -- 
> Sent from my mobile phone.  Please pardon brevity and lack of formatting.


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-24  2:52           ` Dave Young
@ 2013-09-24  3:06             ` H. Peter Anvin
  2013-09-24  4:57               ` Dave Young
  2013-10-02 10:04               ` Borislav Petkov
  0 siblings, 2 replies; 102+ messages in thread
From: H. Peter Anvin @ 2013-09-24  3:06 UTC (permalink / raw)
  To: Dave Young
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

Okay... I see two problems.

1. It looks like we subtract the region size after, rather than before, assigning an address.

2. The second region is assigned the same address in the secondary kernel as in the first, implying the size of the first region was somehow set to zero.
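The first point concerns ordering in a top-down bump allocator. The sketch below contrasts the two orders; the function names are hypothetical, and efi_map_region() in the patch is meant to behave like the first. Decrementing the cursor before assigning places a region of size S at [top - S, top), entirely below -4G. Assigning first and decrementing afterwards hands out [top, top + S), so the first region would start exactly at -4G and extend above the window. That would fit the first region being reported at 0xffffffff00000000 in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Top-down bump, reserve first: the region is [*cursor, *cursor + size). */
static uint64_t alloc_subtract_first(uint64_t *cursor, uint64_t size)
{
	*cursor -= size;
	return *cursor;
}

/* Assign first, reserve afterwards: the region [va, va + size) lies
 * ABOVE the new cursor, i.e. outside the space that was reserved. */
static uint64_t alloc_subtract_after(uint64_t *cursor, uint64_t size)
{
	uint64_t va = *cursor;

	*cursor -= size;
	return va;
}
```

Both variants leave the cursor at the same place. Only the address handed out differs, by exactly the region size.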

Dave Young <dyoung@redhat.com> wrote:
>On 09/22/13 at 08:27am, H. Peter Anvin wrote:
>> The address that faults is interesting in that it is indeed just below -4G.  The question at hand is probably what information you are using to build the EFI mappings in the secondary kernel and what could make it not match the primary.
>> 
>> Assuming it isn't as simple as the mappings never get built at all.
>
>Here is my debug output, from 'diff efi-mapping-1st-kernel efi-mapping-2nd-kernel'.
>Obviously, the high-address mappings are not the same:
>
>--- efi-mapping-1.txt	2013-09-24 10:46:09.977746047 +0800
>+++ efi-mapping-2.txt	2013-09-24 10:46:33.871421806 +0800
>@@ -1,30 +1,30 @@
> efi mapping PA 0x800000 -> VA 0x800000
> efi mapping PA 0x800000 -> VA 0xffffffff00000000
> efi mapping PA 0x7c000000 -> VA 0x7c000000
>-efi mapping PA 0x7c000000 -> VA 0xfffffffefffe0000
>+efi mapping PA 0x7c000000 -> VA 0xffffffff00000000
> efi mapping PA 0x7d5e2000 -> VA 0x7d5e2000
>-efi mapping PA 0x7d5e2000 -> VA 0xfffffffefffdf000
>+efi mapping PA 0x7d5e2000 -> VA 0xfffffffefffff000
> efi mapping PA 0x7d77d000 -> VA 0x7d77d000
>-efi mapping PA 0x7d77d000 -> VA 0xfffffffefffde000
>+efi mapping PA 0x7d77d000 -> VA 0xfffffffeffffe000
> efi mapping PA 0x7d864000 -> VA 0x7d864000
>-efi mapping PA 0x7d864000 -> VA 0xfffffffeff8d4000
>+efi mapping PA 0x7d864000 -> VA 0xfffffffeff8f4000
> efi mapping PA 0x7df6e000 -> VA 0x7df6e000
>-efi mapping PA 0x7df6e000 -> VA 0xfffffffeff6ae000
>+efi mapping PA 0x7df6e000 -> VA 0xfffffffeff6ce000
> efi mapping PA 0x7e194000 -> VA 0x7e194000
>-efi mapping PA 0x7e194000 -> VA 0xfffffffeff6ac000
>+efi mapping PA 0x7e194000 -> VA 0xfffffffeff6cc000
> efi mapping PA 0x7e196000 -> VA 0x7e196000
>-efi mapping PA 0x7e196000 -> VA 0xfffffffeff696000
>+efi mapping PA 0x7e196000 -> VA 0xfffffffeff6b6000
> efi mapping PA 0x7e1ac000 -> VA 0x7e1ac000
>-efi mapping PA 0x7e1ac000 -> VA 0xfffffffeff681000
>+efi mapping PA 0x7e1ac000 -> VA 0xfffffffeff6a1000
> efi mapping PA 0x7e1c1000 -> VA 0x7e1c1000
>-efi mapping PA 0x7e1c1000 -> VA 0xfffffffefe041000
>+efi mapping PA 0x7e1c1000 -> VA 0xfffffffefe061000
> efi mapping PA 0x7f802000 -> VA 0x7f802000
>-efi mapping PA 0x7f802000 -> VA 0xfffffffefdec2000
>+efi mapping PA 0x7f802000 -> VA 0xfffffffefdee2000
> efi mapping PA 0x7f981000 -> VA 0x7f981000
>-efi mapping PA 0x7f981000 -> VA 0xfffffffefde92000
>+efi mapping PA 0x7f981000 -> VA 0xfffffffefdeb2000
> efi mapping PA 0x7f9b1000 -> VA 0x7f9b1000
>-efi mapping PA 0x7f9b1000 -> VA 0xfffffffefde6e000
>+efi mapping PA 0x7f9b1000 -> VA 0xfffffffefde8e000
> efi mapping PA 0x7f9e5000 -> VA 0x7f9e5000
>-efi mapping PA 0x7f9e5000 -> VA 0xfffffffefd873000
>+efi mapping PA 0x7f9e5000 -> VA 0xfffffffefd893000
> efi mapping PA 0x7ffe0000 -> VA 0x7ffe0000
>-efi mapping PA 0x7ffe0000 -> VA 0xfffffffefd853000
>+efi mapping PA 0x7ffe0000 -> VA 0xfffffffefd873000
>
>> 
>> 
>> Borislav Petkov <bp@alien8.de> wrote:
>> >On Sun, Sep 22, 2013 at 08:35:15PM +0800, Dave Young wrote:
>> >> I tested your new patch, it works both with efi stub and grub boot in
>> >> 1st kernel.
>> >
>> >Good, thanks!
>> >
>> >> But it panicked in kexec boot with my kexec-related patchset; the patchset
>> >
>> >That's the second kernel, right?
>> >
>> >> contains 3 patches:
>> >> 1. introduce cmdline kexecboot=<0|1|2>; 1 == kexec, 2 == kdump
>> >> 2. export physical addr fw_vendor, runtime, tables to /sys/firmware/efi/systab
>> >> 3. if kexecboot != 0, use fw_vendor, runtime, tables from bootparams; also do not
>> >>    call SetVirtualAddressMap in the kexecboot case.
>> >> 
>> >> The panic happens at the last line of efi_init:
>> >>         /* clean DUMMY object */
>> >>         efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
>> >>                          EFI_VARIABLE_NON_VOLATILE |
>> >>                          EFI_VARIABLE_BOOTSERVICE_ACCESS |
>> >>                          EFI_VARIABLE_RUNTIME_ACCESS,
>> >>                          0, NULL);
>> >> 
>> >> Below is the dmesg:
>> >> [    0.003359] pid_max: default: 32768 minimum: 301
>> >> [    0.004792] BUG: unable to handle kernel paging request at fffffffefde97e70
>> >> [    0.006666] IP: [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
>> >> [    0.006666] PGD 36981067 PUD 35828063 PMD 0
>> >
>> >Here it is - fffffffefde97e70 is not mapped in the pagetable, PMD is 0.
>> >
>> >Ok, can you upload your patches somewhere and tell me exactly how to
>> >reproduce this so that I can take a look too?
>> >
>> >Thanks.
>> 
>> -- 
>> Sent from my mobile phone.  Please pardon brevity and lack of formatting.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-24  3:06             ` H. Peter Anvin
@ 2013-09-24  4:57               ` Dave Young
  2013-09-24  4:58                 ` Dave Young
  2013-09-24  9:43                 ` Borislav Petkov
  2013-10-02 10:04               ` Borislav Petkov
  1 sibling, 2 replies; 102+ messages in thread
From: Dave Young @ 2013-09-24  4:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 09/23/13 at 08:06pm, H. Peter Anvin wrote:
> Okay... I see two problems.
> 
> 1. It looks like we subtract the region size after, rather than before, assigning an address.
> 
> 2. The second region is assigned the same address in the secondary kernel as in the first, implying the size of the first region was somehow set to zero.

I found the reason: efi_reserve_boot_services() reserves the BOOT_SERVICES_DATA region,
setting that descriptor's size in the memmap to 0, so in the 2nd kernel the virtual
mapping addresses after that md are not the same as in the 1st kernel. See the code below:
 
void __init efi_map_region(efi_memory_desc_t *md)
{
        unsigned long size = md->num_pages << PAGE_SHIFT;

        efi_va -= size;
        ^^^^^^^^^^^^^^^
	[snip]
}
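The effect can be reproduced in a few lines. This sketch assumes VAs are handed out in memmap order by the subtract-first allocator above; the sizes are illustrative, not read from the affected machine. If one descriptor's page count is 0 in the second kernel, every region after it lands that many bytes higher than in the first kernel. The debug diff earlier in the thread shows the same pattern, with a constant 0x20000 shift on every line after the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                   /* 4 KiB pages */
#define SKETCH_VA_TOP     0xffffffff00000000ULL /* -4G */

/* Assign top-down VAs in descriptor order, like efi_map_region(). */
static void assign_vas(const uint64_t *num_pages, size_t n, uint64_t *va_out)
{
	uint64_t efi_va = SKETCH_VA_TOP;

	for (size_t i = 0; i < n; i++) {
		efi_va -= num_pages[i] << SKETCH_PAGE_SHIFT;
		va_out[i] = efi_va;
	}
}
```

For example, page counts {0x20, 1, 1} in the first kernel versus {0, 1, 1} in the second put regions 1 and 2 exactly 0x20000 bytes higher in the second kernel.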




* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-24  4:57               ` Dave Young
@ 2013-09-24  4:58                 ` Dave Young
  2013-09-24  5:23                   ` Dave Young
  2013-09-24  9:43                 ` Borislav Petkov
  1 sibling, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-09-24  4:58 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 09/24/13 at 12:57pm, Dave Young wrote:
> On 09/23/13 at 08:06pm, H. Peter Anvin wrote:
> > Okay... I see two problems.
> > 
> > 1. It looks like we subtract the region size after, rather than before, assigning an address.
> > 
> > 2. The second region is assigned the same address in the secondary kernel as in the first, implying the size of the first region was somehow set to zero.
> 
> I found the reason: efi_reserve_boot_services reserves the BOOT_SERVICE_DATA region
> and thus the memmap size is changed to 0, so in the 2nd kernel the virtual mapping
> addresses after that md are not the same as in the 1st kernel; see the code below:
>  
> void __init efi_map_region(efi_memory_desc_t *md)
> {
>         unsigned long size = md->num_pages << PAGE_SHIFT;
> 
>         efi_va -= size;
>         ^^^^^^^^^^^^^^^
> 	[snip]
> }

So how about just reserving the BOOT_SERVICE_DATA region but keeping md->num_pages as is?

> 
> 
> > 
> > Dave Young <dyoung@redhat.com> wrote:
> > >On 09/22/13 at 08:27am, H. Peter Anvin wrote:
> > >> The address that faults is interesting in that it is indeed just
> > >> below -4G.  The question at hand is probably what information you are
> > >> using to build the EFI mappings in the secondary kernel and what could
> > >> make it not match the primary.
> > >> 
> > >> Assuming it isn't as simple as the mappings never get built at all.
> > >
> > >Here is my debug output, a diff of efi-mapping-1st-kernel against
> > >efi-mapping-2nd-kernel. Obviously, the high address mappings are not the same:
> > >
> > >--- efi-mapping-1.txt	2013-09-24 10:46:09.977746047 +0800
> > >+++ efi-mapping-2.txt	2013-09-24 10:46:33.871421806 +0800
> > >@@ -1,30 +1,30 @@
> > > efi mapping PA 0x800000 -> VA 0x800000
> > > efi mapping PA 0x800000 -> VA 0xffffffff00000000
> > > efi mapping PA 0x7c000000 -> VA 0x7c000000
> > >-efi mapping PA 0x7c000000 -> VA 0xfffffffefffe0000
> > >+efi mapping PA 0x7c000000 -> VA 0xffffffff00000000
> > > efi mapping PA 0x7d5e2000 -> VA 0x7d5e2000
> > >-efi mapping PA 0x7d5e2000 -> VA 0xfffffffefffdf000
> > >+efi mapping PA 0x7d5e2000 -> VA 0xfffffffefffff000
> > > efi mapping PA 0x7d77d000 -> VA 0x7d77d000
> > >-efi mapping PA 0x7d77d000 -> VA 0xfffffffefffde000
> > >+efi mapping PA 0x7d77d000 -> VA 0xfffffffeffffe000
> > > efi mapping PA 0x7d864000 -> VA 0x7d864000
> > >-efi mapping PA 0x7d864000 -> VA 0xfffffffeff8d4000
> > >+efi mapping PA 0x7d864000 -> VA 0xfffffffeff8f4000
> > > efi mapping PA 0x7df6e000 -> VA 0x7df6e000
> > >-efi mapping PA 0x7df6e000 -> VA 0xfffffffeff6ae000
> > >+efi mapping PA 0x7df6e000 -> VA 0xfffffffeff6ce000
> > > efi mapping PA 0x7e194000 -> VA 0x7e194000
> > >-efi mapping PA 0x7e194000 -> VA 0xfffffffeff6ac000
> > >+efi mapping PA 0x7e194000 -> VA 0xfffffffeff6cc000
> > > efi mapping PA 0x7e196000 -> VA 0x7e196000
> > >-efi mapping PA 0x7e196000 -> VA 0xfffffffeff696000
> > >+efi mapping PA 0x7e196000 -> VA 0xfffffffeff6b6000
> > > efi mapping PA 0x7e1ac000 -> VA 0x7e1ac000
> > >-efi mapping PA 0x7e1ac000 -> VA 0xfffffffeff681000
> > >+efi mapping PA 0x7e1ac000 -> VA 0xfffffffeff6a1000
> > > efi mapping PA 0x7e1c1000 -> VA 0x7e1c1000
> > >-efi mapping PA 0x7e1c1000 -> VA 0xfffffffefe041000
> > >+efi mapping PA 0x7e1c1000 -> VA 0xfffffffefe061000
> > > efi mapping PA 0x7f802000 -> VA 0x7f802000
> > >-efi mapping PA 0x7f802000 -> VA 0xfffffffefdec2000
> > >+efi mapping PA 0x7f802000 -> VA 0xfffffffefdee2000
> > > efi mapping PA 0x7f981000 -> VA 0x7f981000
> > >-efi mapping PA 0x7f981000 -> VA 0xfffffffefde92000
> > >+efi mapping PA 0x7f981000 -> VA 0xfffffffefdeb2000
> > > efi mapping PA 0x7f9b1000 -> VA 0x7f9b1000
> > >-efi mapping PA 0x7f9b1000 -> VA 0xfffffffefde6e000
> > >+efi mapping PA 0x7f9b1000 -> VA 0xfffffffefde8e000
> > > efi mapping PA 0x7f9e5000 -> VA 0x7f9e5000
> > >-efi mapping PA 0x7f9e5000 -> VA 0xfffffffefd873000
> > >+efi mapping PA 0x7f9e5000 -> VA 0xfffffffefd893000
> > > efi mapping PA 0x7ffe0000 -> VA 0x7ffe0000
> > >-efi mapping PA 0x7ffe0000 -> VA 0xfffffffefd853000
> > >+efi mapping PA 0x7ffe0000 -> VA 0xfffffffefd873000
> > >
> > >> 
> > >> 
> > >> Borislav Petkov <bp@alien8.de> wrote:
> > >> >On Sun, Sep 22, 2013 at 08:35:15PM +0800, Dave Young wrote:
> > >> >> I tested your new patch, it works both with efi stub and grub boot in
> > >> >> 1st kernel.
> > >> >
> > >> >Good, thanks!
> > >> >
> > >> >> But it panicked in kexec boot with my kexec-related patchset, the
> > >> >> patchset
> > >> >
> > >> >That's the second kernel, right?
> > >> >
> > >> >> contains 3 patches:
> > >> >> 1. introduce cmdline kexecboot=<0|1|2>; 1 == kexec, 2 == kdump
> > >> >> 2. export physical addr fw_vendor, runtime, tables to /sys/firmware/efi/systab
> > >> >> 3. if kexecboot != 0, use fw_vendor, runtime, tables from bootparams;
> > >> >>    also do not call SetVirtualAddressMap in the kexecboot case.
> > >> >> 
> > >> >> The panic happens at the last line of efi_init:
> > >> >>         /* clean DUMMY object */
> > >> >>         efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
> > >> >>                          EFI_VARIABLE_NON_VOLATILE |
> > >> >>                          EFI_VARIABLE_BOOTSERVICE_ACCESS |
> > >> >>                          EFI_VARIABLE_RUNTIME_ACCESS,
> > >> >>                          0, NULL);
> > >> >> 
> > >> >> Below is the dmesg:
> > >> >> [    0.003359] pid_max: default: 32768 minimum: 301
> > >> >> [    0.004792] BUG: unable to handle kernel paging request at fffffffefde97e70
> > >> >> [    0.006666] IP: [<ffffffff8103a1db>] virt_efi_set_variable+0x40/0x54
> > >> >> [    0.006666] PGD 36981067 PUD 35828063 PMD 0
> > >> >
> > >> >Here it is - fffffffefde97e70 is not mapped in the pagetable, PMD is 0.
> > >> >
> > >> >Ok, can you upload your patches somewhere and tell me exactly how to
> > >> >reproduce this so that I can take a look too?
> > >> >
> > >> >Thanks.
> > >> 
> > >> -- 
> > >> Sent from my mobile phone.  Please pardon brevity and lack of formatting.
> > 
> > -- 
> > Sent from my mobile phone.  Please pardon brevity and lack of formatting.


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-24  4:58                 ` Dave Young
@ 2013-09-24  5:23                   ` Dave Young
  2013-09-24  8:57                     ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-09-24  5:23 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 09/24/13 at 12:58pm, Dave Young wrote:
> On 09/24/13 at 12:57pm, Dave Young wrote:
> > On 09/23/13 at 08:06pm, H. Peter Anvin wrote:
> > > Okay... I see two problems.
> > > 
> > > 1. It looks like we subtract the region size after, rather than before, assigning an address.

Could you explain more about this problem? Where is the code?

> > > 
> > > 2. The second region is assigned the same address in the secondary kernel as in the first, implying the size of the first region was somehow set to zero.
> > 
> > I found the reason: efi_reserve_boot_services reserves the BOOT_SERVICE_DATA region
> > and thus the memmap size is changed to 0, so in the 2nd kernel the virtual mapping
> > addresses after that md are not the same as in the 1st kernel; see the code below:
> >  
> > void __init efi_map_region(efi_memory_desc_t *md)
> > {
> >         unsigned long size = md->num_pages << PAGE_SHIFT;
> > 
> >         efi_va -= size;
> >         ^^^^^^^^^^^^^^^
> > 	[snip]
> > }
> 
> So how about just reserving the BOOT_SERVICE_DATA region but keeping md->num_pages as is?

Hmm, num_pages = 0 is only set when reserving a boot services region is impossible, so I'm
lost... But something must be setting the size to 0 somewhere.


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-24  5:23                   ` Dave Young
@ 2013-09-24  8:57                     ` Dave Young
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-09-24  8:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 09/24/13 at 01:23pm, Dave Young wrote:
> On 09/24/13 at 12:58pm, Dave Young wrote:
> > On 09/24/13 at 12:57pm, Dave Young wrote:
> > > On 09/23/13 at 08:06pm, H. Peter Anvin wrote:
> > > > Okay... I see two problems.
> > > > 
> > > > 1. It looks like we subtract the region size after, rather than before, assigning an address.
> 
> Could you explain more about this problem? Where is the code?
> 
> > > > 
> > > > 2. The second region is assigned the same address in the secondary kernel as in the first, implying the size of the first region was somehow set to zero.
> > > 
> > > I found the reason: efi_reserve_boot_services reserves the BOOT_SERVICE_DATA region
> > > and thus the memmap size is changed to 0, so in the 2nd kernel the virtual mapping
> > > addresses after that md are not the same as in the 1st kernel; see the code below:
> > >  
> > > void __init efi_map_region(efi_memory_desc_t *md)
> > > {
> > >         unsigned long size = md->num_pages << PAGE_SHIFT;
> > > 
> > >         efi_va -= size;
> > >         ^^^^^^^^^^^^^^^
> > > 	[snip]
> > > }
> > 
> > So how about just reserving the BOOT_SERVICE_DATA region but keeping md->num_pages as is?
> 
> Hmm, num_pages = 0 is only set when reserving a boot services region is impossible, so I'm
> lost... But something must be setting the size to 0 somewhere.
> 

Digging further, it is indeed the code below that sets num_pages to 0:
void __init efi_reserve_boot_services(void)
{
 [snip]
                if ((start+size >= __pa_symbol(_text)
                                && start <= __pa_symbol(_end)) ||
                        !e820_all_mapped(start, start+size, E820_RAM) ||
                        memblock_is_region_reserved(start, size)) {
                        /* Could not reserve, skip it */
                        md->num_pages = 0;
 [snip]
}

In my test, the first region overlaps the kernel's _text <-> _end range, which causes this issue.

I wonder whether md->num_pages really must be set to 0 here. If so, I think we have to save
the original memmap for kexec use. Any better ideas?



* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-24  4:57               ` Dave Young
  2013-09-24  4:58                 ` Dave Young
@ 2013-09-24  9:43                 ` Borislav Petkov
  2013-09-24 10:01                   ` Dave Young
  2013-09-24 12:45                   ` Dave Young
  1 sibling, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-24  9:43 UTC (permalink / raw)
  To: Dave Young
  Cc: H. Peter Anvin, Borislav Petkov, X86 ML, LKML, Borislav Petkov,
	Matt Fleming, Matthew Garrett, James Bottomley, Vivek Goyal,
	linux-efi

Crap,

I need to send from the web interface since the network here somehow
doesn't let port 587 through.

On Tue, September 24, 2013 6:57 am, Dave Young wrote:
> On 09/23/13 at 08:06pm, H. Peter Anvin wrote:
>> Okay... I see two problems.
>>
>> 1. It looks like we subtract the region size after, rather than before,
>> assigning an address.
>>
>> 2. The second region is assigned the same address in the secondary
>> kernel as in the first, implying the size of the first region was
>> somehow set to zero.
>
> I found the reason: efi_reserve_boot_services reserves the
> BOOT_SERVICE_DATA region and thus the memmap size is changed to 0, so
> in the 2nd kernel the virtual mapping addresses after that md are not
> the same as in the 1st kernel; see the code below:
>
> void __init efi_map_region(efi_memory_desc_t *md)
> {
>         unsigned long size = md->num_pages << PAGE_SHIFT;
>
>         efi_va -= size;
>         ^^^^^^^^^^^^^^^

Anyway, yes, this is wrong. We probably want something like the
following instead (patch might be whitespace-damaged):

--
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index a235dc95d629..ea0ea4fd3dab 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -85,8 +85,7 @@ void __init efi_map_region(efi_memory_desc_t *md)
 {
 	unsigned long size = md->num_pages << PAGE_SHIFT;

-	efi_va -= size;
-	if (efi_va < EFI_VA_END) {
+	if (efi_va - size < EFI_VA_END) {
 		pr_warning(FW_WARN "VA address range overflow!\n");
 		return;
 	}
@@ -101,6 +100,8 @@ void __init efi_map_region(efi_memory_desc_t *md)
 	/* Do the VA map */
 	__map_region(md, efi_va);
 	md->virt_addr = efi_va;
+
+	efi_va -= size;
 }

 void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,

-- 
Regards/Gruss,
Boris.



* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-24  9:43                 ` Borislav Petkov
@ 2013-09-24 10:01                   ` Dave Young
  2013-09-24 12:45                   ` Dave Young
  1 sibling, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-09-24 10:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 09/24/13 at 11:43am, Borislav Petkov wrote:
> Crap,
> 
> I need to send from the web interface since the network here somehow
> doesn't let port 587 through.
> 
> On Tue, September 24, 2013 6:57 am, Dave Young wrote:
> > On 09/23/13 at 08:06pm, H. Peter Anvin wrote:
> >> Okay... I see two problems.
> >>
> >> 1. It looks like we subtract the region size after, rather than before,
> >> assigning an address.
> >>
> >> 2. The second region is assigned the same address in the secondary
> >> kernel as in the first, implying the size of the first region was
> >> somehow set to zero.
> >
> > I found the reason: efi_reserve_boot_services reserves the
> > BOOT_SERVICE_DATA region and thus the memmap size is changed to 0, so
> > in the 2nd kernel the virtual mapping addresses after that md are not
> > the same as in the 1st kernel; see the code below:
> >
> > void __init efi_map_region(efi_memory_desc_t *md)
> > {
> >         unsigned long size = md->num_pages << PAGE_SHIFT;
> >
> >         efi_va -= size;
> >         ^^^^^^^^^^^^^^^
> 
> Anyway, yes, this is wrong. We probably want something like the
> following instead (patch might be whitespace-damaged):
> 
> --
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index a235dc95d629..ea0ea4fd3dab 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -85,8 +85,7 @@ void __init efi_map_region(efi_memory_desc_t *md)
>  {
>  	unsigned long size = md->num_pages << PAGE_SHIFT;
> 
> -	efi_va -= size;
> -	if (efi_va < EFI_VA_END) {
> +	if (efi_va - size < EFI_VA_END) {
>  		pr_warning(FW_WARN "VA address range overflow!\n");
>  		return;
>  	}
> @@ -101,6 +100,8 @@ void __init efi_map_region(efi_memory_desc_t *md)
>  	/* Do the VA map */
>  	__map_region(md, efi_va);
>  	md->virt_addr = efi_va;
> +
> +	efi_va -= size;
>  }
> 
>  void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
> 

Ok, I got it, it is what Peter described as problem 1.

--
Thanks
Dave


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-24  9:43                 ` Borislav Petkov
  2013-09-24 10:01                   ` Dave Young
@ 2013-09-24 12:45                   ` Dave Young
  1 sibling, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-09-24 12:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 09/24/13 at 11:43am, Borislav Petkov wrote:
> Crap,
> 
> I need to send from the web interface since the network here somehow
> doesn't let port 587 through.
> 
> On Tue, September 24, 2013 6:57 am, Dave Young wrote:
> > On 09/23/13 at 08:06pm, H. Peter Anvin wrote:
> >> Okay... I see two problems.
> >>
> >> 1. It looks like we subtract the region size after, rather than before,
> >> assigning an address.
> >>
> >> 2. The second region is assigned the same address in the secondary
> >> kernel as in the first, implying the size of the first region was
> >> somehow set to zero.
> >
> > I found the reason: efi_reserve_boot_services reserves the
> > BOOT_SERVICE_DATA region and thus the memmap size is changed to 0, so
> > in the 2nd kernel the virtual mapping addresses after that md are not
> > the same as in the 1st kernel; see the code below:
> >
> > void __init efi_map_region(efi_memory_desc_t *md)
> > {
> >         unsigned long size = md->num_pages << PAGE_SHIFT;
> >
> >         efi_va -= size;
> >         ^^^^^^^^^^^^^^^
> 
> Anyway, yes, this is wrong. We probably want something like the
> following instead (patch might be whitespace-damaged):
> 
> --
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index a235dc95d629..ea0ea4fd3dab 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -85,8 +85,7 @@ void __init efi_map_region(efi_memory_desc_t *md)
>  {
>  	unsigned long size = md->num_pages << PAGE_SHIFT;
> 
> -	efi_va -= size;
> -	if (efi_va < EFI_VA_END) {
> +	if (efi_va - size < EFI_VA_END) {
>  		pr_warning(FW_WARN "VA address range overflow!\n");
>  		return;
>  	}
> @@ -101,6 +100,8 @@ void __init efi_map_region(efi_memory_desc_t *md)
>  	/* Do the VA map */
>  	__map_region(md, efi_va);
>  	md->virt_addr = efi_va;
> +
> +	efi_va -= size;
>  }
> 
>  void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
> 

Thinking about this again, how about mapping them 1:1 from a base address like -64G,
i.e. phys_addr -> (-64G + phys_addr)? That way we avoid depending on the
previous region's size.

As for the zero-size region problem, we can solve it separately.

--
Thanks
Dave


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-21 11:39   ` [PATCH -v2] " Borislav Petkov
                       ` (2 preceding siblings ...)
  2013-09-23  8:45     ` Borislav Petkov
@ 2013-09-25  9:24     ` Borislav Petkov
  3 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-09-25  9:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi

On Sat, September 21, 2013 1:39 pm, Borislav Petkov wrote:
> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
> index 40e446941dd7..661663b08eaf 100644
> --- a/arch/x86/platform/efi/efi_32.c
> +++ b/arch/x86/platform/efi/efi_32.c
> @@ -37,9 +37,36 @@
>   * claim EFI runtime service handler exclusively and to duplicate a memory in
>   * low memory space say 0 - 3G.
>   */
> -
>  static unsigned long efi_rt_eflags;
>
> +void efi_sync_low_kernel_mappings(void) {}
> +
> +void __init efi_map_region(efi_memory_desc_t *md)
> +{
> +	u64 start_pfn, end_pfn, end;
> +	unsigned long size;
> +	void *va;
> +
> +	start_pfn = PFN_DOWN(md->phys_addr);
> +	size	  = md->num_pages << PAGE_SHIFT;
> +	end	  = md->phys_addr + size;
> +	end_pfn   = PFN_UP(end);
> +
> +	if (pfn_range_is_mapped(start_pfn, end_pfn)) {
> +		va = __va(md->phys_addr);
> +
> +		if (!(md->attribute & EFI_MEMORY_WB))
> +			efi_memory_uc((u64)(unsigned long)va, size);
> +	} else
> +		va = efi_ioremap(md->phys_addr, size,
> +				 md->type, md->attribute);
> +
> +	md->virt_addr = (u64) (unsigned long) va;
> +	if (!va)
> +		pr_err("ioremap of 0x%llX failed!\n",
> +		       (unsigned long long)md->phys_addr);
> +}
> +

Another note-to-self, while I'm here: it is probably prudent to
be conservative here and keep the old runtime mapping method
in generic EFI code and behind a chicken bit, something like
"efi.use_old_runtime_mapping" or shorter so that people whose systems
break from the new mapping can fall back to the old, well-tested method.

Thanks.

-- 
Regards/Gruss,
Boris.



* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-09-24  3:06             ` H. Peter Anvin
  2013-09-24  4:57               ` Dave Young
@ 2013-10-02 10:04               ` Borislav Petkov
  2013-10-02 15:43                 ` H. Peter Anvin
  1 sibling, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-02 10:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On Mon, Sep 23, 2013 at 08:06:38PM -0700, H. Peter Anvin wrote:
> Okay... I see two problems.
> 
> 1. It looks like we subtract the region size after, rather than before, assigning an address.
> 

Ok, so I'm looking at this again and, actually, we really do subtract the
region size *before* we assign the address:

---
       efi_va -= size;
       if (efi_va < EFI_VA_END) {
               pr_warning(FW_WARN "VA address range overflow!\n");
               return;
       }

       /*
        * Make sure the 1:1 mappings are present as a catch-all for b0rked
        * firmware which doesn't update all internal pointers after switching
        * to virtual mode and would otherwise crap on us.
        */
       __map_region(md, md->phys_addr);

       /* Do the VA map */
       __map_region(md, efi_va);
       md->virt_addr = efi_va;
--

So let me give an example why I think it is correct to subtract *before*
assigning and so that we can talk about it and we completely agree on
the details. :-)

When we start allocating from -4G, i.e. 0xffffffff00000000, I think we
want to do it bottom-up so that 0xffffffff00000000 is the *last*, i.e.
lowest address. Because we link the kernel text at 0xffffffff81000000 by
default, which would mean, if -4G was the first address, we'll have only
2G:

0xffffffff81000000 - 0xffffffff00000000 = 0x0000000081000000 = 2,164,260,864 bytes

of space for UEFI mappings.

That's why I need to *first* subtract and *then* use the resulting
address to map the region. Like so (4 hypothetical regions):

1st region: 0xfffffffeffffe000 - 0xffffffff00000000

2nd region: 0xfffffffeffff8000 - 0xfffffffeffffe000

3rd region: 0xfffffffefffec000 - 0xfffffffeffff8000

4th region: 0xfffffffefffd8000 - 0xfffffffefffec000

and so on...

IOW, the VA layout looks like this:

0xfffffffefffd8000
...
region 4
...
0xfffffffefffec000 (not inclusive)
...
region 3
...
0xfffffffeffff8000 (ditto)
...
region 2
...
0xfffffffeffffe000 (ditto)
...
region 1
...
0xffffffff00000000 (ditto)

Am I even making sense here?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-10-02 10:04               ` Borislav Petkov
@ 2013-10-02 15:43                 ` H. Peter Anvin
  2013-10-02 17:05                   ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: H. Peter Anvin @ 2013-10-02 15:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 10/02/2013 03:04 AM, Borislav Petkov wrote:
> When we start allocating from -4G, i.e. 0xffffffff00000000, I think we
> want to do it bottom-up so that 0xffffffff00000000 is the *last*, i.e.
> lowest address. Because we link the kernel text at 0xffffffff81000000 by
> default, which would mean, if -4G was the first address, we'll have only
> 2G:

Right.

	-hpa




* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-10-02 15:43                 ` H. Peter Anvin
@ 2013-10-02 17:05                   ` Borislav Petkov
  2013-10-02 17:32                     ` H. Peter Anvin
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-02 17:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On Wed, Oct 02, 2013 at 08:43:52AM -0700, H. Peter Anvin wrote:
> On 10/02/2013 03:04 AM, Borislav Petkov wrote:
> > When we start allocating from -4G, i.e. 0xffffffff00000000, I think we
> > want to do it bottom-up so that 0xffffffff00000000 is the *last*, i.e.
> > lowest address. Because we link the kernel text at 0xffffffff81000000 by
> > default, which would mean, if -4G was the first address, we'll have only
> > 2G:
> 
> Right.

Btw, Matt just found another issue with the bottom-up approach: due to
the different alignment of VA and PA addresses, this messes up the pagetable
in terms of the order in which we're using 4K, 2M, etc pages.

What can happen is that you get a non-2M-aligned PA mapped with a
2M-aligned VA, which results in a #PF with PF_RSVD set. This most likely
happens because one or more of the bits in the [12:20] slice of the PMD
are reserved, but they get set because the PA has address bits set in
that slice, and thus a #PF is raised.

So we changed the mapping method to a more straight-forward one: we map
all EFI regions in the following range:

[ efi_va - -4G ]

and we compute efi_va by subtracting the highest EFI region address from
-4G, i.e. 0xffff_ffff_0000_0000.

Then, each VA is computed by doing efi_va + PA.

Basically, we have a non-contiguous window in the virtual address space
with its highest address being -4G. In OVMF, for example, we get the
following mappings:

VA: 0xfffffffe80800000..0xfffffffe81000000 -> PA: 0x800000..0x1000000
VA: 0xfffffffefc000000..0xfffffffefc020000 -> PA: 0x7c000000..0x7c020000
VA: 0xfffffffefdc5b000..0xfffffffefe146000 -> PA: 0x7dc5b000..0x7e146000

...

VA: 0xfffffffeffa65000..0xfffffffefffe0000 -> PA: 0x7fa65000..0x7ffe0000
VA: 0xfffffffefffe0000..0xffffffff00000000 -> PA: 0x7ffe0000..0x80000000

So, basically, the EFI regions occupy a 2Gish window with holes in the
range:

[ 0xfffffffe80800000 - 0xffffffff00000000 )

and since we said we want to give the whole EFI memmap at most 64G, that
should be OK.

Oh, and the alignment remains compatible this way.

So this mapping scheme - courtesy of Matt - is very straight-forward
and simple and I like simple. This way we won't need the setup_data
games with kexec tools as we'll be simply doing the same mappings in the
kexec'ed kernel.

Anyway, I'll clean up the patch and send it out later.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-10-02 17:05                   ` Borislav Petkov
@ 2013-10-02 17:32                     ` H. Peter Anvin
  2013-10-02 18:42                       ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: H. Peter Anvin @ 2013-10-02 17:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 10/02/2013 10:05 AM, Borislav Petkov wrote:
> 
> Btw, Matt just found another issue with the bottom-up approach: due to
> different alignment of VA and PA addresses, this messes up the pagetable
> in terms of the order in which we're using 4K, 2M, etc. pages.
> 
> What can happen is that you can get a non-2M-aligned PA mapped with a
> 2M-aligned VA, which results in a #PF with PF_RSVD set. This most likely
> happens because one or more of the bits in the [12:20] slice of the PMD
> are reserved, but they get set because the PA has address bits set in
> that slice, and thus a #PF is raised.
> 

So this is a bug in the sense that 2M pages were used when they were not
safe to use (matching alignment is part of the requirement for 2M pages
being allowable.)  However, we of course want to use 2M pages, so see below.

> So we changed the mapping method to a more straightforward one: we map
> all EFI regions in the following range:
> 
> [ efi_va - -4G ]
> 
> and we compute efi_va by subtracting the highest EFI region address from
> -4G, i.e. 0xffff_ffff_0000_0000.
> 
> Then, each VA is computed by doing efi_va + PA.
> 
> Oh, and the alignment remains compatible this way.
> 
> So this mapping scheme - courtesy of Matt - is very straightforward
> and simple and I like simple. This way we won't need the setup_data
> games with kexec tools as we'll be simply doing the same mappings in the
> kexec'ed kernel.
> 
> Anyway, I'll clean up the patch and send it out later.
> 

We could achieve the same thing by doing alignment after subtracting the
pointer.  HOWEVER, it also goes to show that any mapping scheme is
inherently fragile (consider if the mapping scheme above ends up
consuming too much virtual space in the future), and as a result I
really think that explicitly passing the map to the kexec kernel really
is the only sane thing to do, as otherwise we have to maintain the same
algorithm forever.

	-hpa



* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-10-02 17:32                     ` H. Peter Anvin
@ 2013-10-02 18:42                       ` Borislav Petkov
  2013-10-02 18:46                         ` H. Peter Anvin
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-02 18:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On Wed, Oct 02, 2013 at 10:32:19AM -0700, H. Peter Anvin wrote:
> So this is a bug in the sense that 2M pages were used when they were
> not safe to use (matching alignment is part of the requirement for 2M
> pages being allowable.) However, we of course want to use 2M pages, so
> see below.

Yes, so the alignment has to be such that both PA and VA are the same
number of 4K pages away from the next 2M boundary, to put it bluntly.
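For 4K-aligned EFI regions, "the same number of 4K pages away from the next 2M boundary" is equivalent to VA and PA agreeing in their low 21 bits. Here is a minimal check, assuming 2M is the PMD (large page) size on x86-64. `efi_2m_compatible()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (1ULL << 21)	/* PMD (large page) size on x86-64 */

/*
 * Hypothetical helper: a VA/PA pair may share a 2M page mapping only if
 * both sit at the same offset within their 2M pages, i.e. their low 21
 * bits agree.
 */
static bool efi_2m_compatible(uint64_t va, uint64_t pa)
{
	return ((va ^ pa) & (SZ_2M - 1)) == 0;
}
```

This condition is necessary but not sufficient for a 2M mapping. The 2M chunk in question also has to be fully covered by the region being mapped.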

I have a couple of ideas on how to do that.

> We could achieve the same thing by doing alignment after subtracting the
> pointer.  HOWEVER, it also goes to show that any mapping scheme is
> inherently fragile (consider if the mapping scheme above ends up
> consuming too much virtual space in the future), and as a result I

Yes, I understand your sentiment - we want to be as conservative as
possible with the approach before it is cast in stone, for we don't know
what firmware turds are to be expected in the future.

> really think that explicitly passing the map to the kexec kernel
> really is the only sane thing to do, as otherwise we have to maintain
> the same algorithm forever.

Yes, we'll have to announce the mapping over sysfs or proc for
kexec-tools to parse it, as I'm sure you've already heard. But this can
and will be done in the next step, right after we have a stable region
mapping algorithm.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-10-02 18:42                       ` Borislav Petkov
@ 2013-10-02 18:46                         ` H. Peter Anvin
  2013-10-04  9:42                           ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: H. Peter Anvin @ 2013-10-02 18:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On 10/02/2013 11:42 AM, Borislav Petkov wrote:
> 
> Yes, so the alignment has to be such that both PA and VA are the same
> number of 4K pages away from the next 2M boundary, to put it bluntly.
> 
> I have a couple of ideas on how to do that.
> 

It's pretty straightforward - just drop the starting address to proper
alignment after you subtract the size.
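One reading of this suggestion is a top-down allocator. It subtracts the region size from a cursor, then drops the start to the nearest address at or below it that has the same 2M offset as the PA. The sketch below is hypothetical: it is not the code from the patches, `efi_alloc_va()` is an illustrative name, and 2M is assumed to be the PMD size.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M   (1ULL << 21)
#define MASK_2M (~(SZ_2M - 1))

/*
 * Hypothetical top-down VA allocator. *cursor starts at -4G and moves
 * down. After subtracting the region size, drop the start address to
 * the nearest address at or below it that has the same offset within a
 * 2M page as the PA, so 2M pages remain usable for the region.
 */
static uint64_t efi_alloc_va(uint64_t *cursor, uint64_t pa, uint64_t size)
{
	uint64_t va = *cursor - size;
	uint64_t aligned = (va & MASK_2M) + (pa & (SZ_2M - 1));

	/* Adding the PA offset may push us above va; go one 2M page lower. */
	if (aligned > va)
		aligned -= SZ_2M;

	assert(((aligned ^ pa) & (SZ_2M - 1)) == 0);
	*cursor = aligned;
	return aligned;
}
```

Consider the two OVMF regions from the `__map_region()` output in the next message: PA 0x800000 with size 0x800000, then PA 0x7c000000 with size 0x20000. Starting at -4G, the sketch returns 0xfffffffeff800000 and then 0xfffffffeff600000, the same VAs as in that log.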

	-hpa




* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-10-02 18:46                         ` H. Peter Anvin
@ 2013-10-04  9:42                           ` Borislav Petkov
  2013-10-04 14:43                             ` H. Peter Anvin
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-04  9:42 UTC (permalink / raw)
  To: H. Peter Anvin, Matt Fleming
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On Wed, Oct 02, 2013 at 11:46:44AM -0700, H. Peter Anvin wrote:
> It's pretty straightforward - just drop the starting address to proper
> alignment after you subtract the size.

Ok, just an observation - it is not necessarily a bad thing but I
thought we should talk about it:

So, when we do the VA-space-saving mapping, we're basically mapping huge
physical ranges onto a much smaller VA range, and adding other mappings
in there post factum could turn out to be tricky and problematic.

To illustrate what I'm trying to say, here's an example from two regions
in OVMF:

[    0.011005] __map_region: VA: 0xfffffffeff800000..0xffffffff00000000 -> PA: 0x800000.. 0x1000000
[    0.017005] __map_region: VA: 0xfffffffeff600000..0xfffffffeff620000 -> PA: 0x7c000000.. 0x7c020000

Now, the physical address range spanned by those regions is:

0x7c020000 - 0x800000 = 0x7b820000 =~ 2G

while the virtual is

0xffffffff00000000 - 0xfffffffeff600000 = 0xa00000 =~ 10M

Now, we obviously cannot map the whole PA space in there, the question
is: do we care?

I mean, we can map it to other VA range but this will totally destroy
the simple math of computing EFI VA addresses with an offset, similar to
PAGE_OFFSET.
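The sketch below checks the span arithmetic above and shows why the packed layout loses the single PAGE_OFFSET-style offset: VA - PA differs between the two `__map_region` lines. `struct efi_map_sample` and `va_minus_pa()` are hypothetical names, and the values are taken from that log.

```c
#include <stdint.h>

/* Hypothetical record of one mapping from the __map_region() output. */
struct efi_map_sample {
	uint64_t va_start;
	uint64_t pa_start;
};

/*
 * VA - PA for one region. A single PAGE_OFFSET-like constant exists only
 * if this value is identical for every region.
 */
static uint64_t va_minus_pa(const struct efi_map_sample *m)
{
	return m->va_start - m->pa_start;
}
```

With Matt's scheme, by contrast, `va_minus_pa()` returns the same efi_va for every region.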

OTOH, if we keep Matt's suggestion of mapping the whole EFI address
space window, we don't have that issue. And we've reserved 64G for
EFI and if it needs more, we probably can give it since we're using a
different pagetable anyway.

Opinions?

Thanks.


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-10-04  9:42                           ` Borislav Petkov
@ 2013-10-04 14:43                             ` H. Peter Anvin
  2013-10-04 14:50                               ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: H. Peter Anvin @ 2013-10-04 14:43 UTC (permalink / raw)
  To: Borislav Petkov, Matt Fleming
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matt Fleming,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

We can do that... but it is different from what Windows does to my understanding and it also has the potential of severe pathologies... e.g. a window at the top of the address space being mapped.

Borislav Petkov <bp@alien8.de> wrote:
>On Wed, Oct 02, 2013 at 11:46:44AM -0700, H. Peter Anvin wrote:
>> It's pretty straightforward - just drop the starting address to proper
>> alignment after you subtract the size.
>
>Ok, just an observation - it is not necessarily a bad thing but I
>thought we should talk about it:
>
>So, when we do the VA-space-saving mapping, we're basically mapping huge
>physical ranges onto a much smaller VA range, and adding other mappings
>in there post factum could turn out to be tricky and problematic.
>
>To illustrate what I'm trying to say, here's an example from two regions
>in OVMF:
>
>[    0.011005] __map_region: VA: 0xfffffffeff800000..0xffffffff00000000 -> PA: 0x800000.. 0x1000000
>[    0.017005] __map_region: VA: 0xfffffffeff600000..0xfffffffeff620000 -> PA: 0x7c000000.. 0x7c020000
>
>Now, the physical address range spanned by those regions is:
>
>0x7c020000 - 0x800000 = 0x7b820000 =~ 2G
>
>while the virtual is
>
>0xffffffff00000000 - 0xfffffffeff600000 = 0xa00000 =~ 10M
>
>Now, we obviously cannot map the whole PA space in there, the question
>is: do we care?
>
>I mean, we can map it to other VA range but this will totally destroy
>the simple math of computing EFI VA addresses with an offset, similar to
>PAGE_OFFSET.
>
>OTOH, if we keep Matt's suggestion of mapping the whole EFI address
>space window, we don't have that issue. And we've reserved 64G for
>EFI and if it needs more, we probably can give it since we're using a
>different pagetable anyway.
>
>Opinions?
>
>Thanks.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


* Re: [PATCH -v2] EFI: Runtime services virtual mapping
  2013-10-04 14:43                             ` H. Peter Anvin
@ 2013-10-04 14:50                               ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-10-04 14:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Matt Fleming, Dave Young, X86 ML, LKML, Borislav Petkov,
	Matthew Garrett, James Bottomley, Vivek Goyal, linux-efi

On Fri, Oct 04, 2013 at 07:43:37AM -0700, H. Peter Anvin wrote:
> We can do that... but it is different from what Windows does to my
> understanding and it also has the potential of severe pathologies...
> e.g. a window at the top of the address space being mapped.

Right, so after Matt and I talked about it a bit on IRC, we actually
don't really care how we do the mappings if we spell them out later to
kexec over proc or somewhere else, as you wanted.

So we can do the VA address space saving scheme first and change it
later, if there are issues. We'll see.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
                   ` (11 preceding siblings ...)
  2013-09-20  7:29 ` [PATCH 00/11] EFI runtime " Dave Young
@ 2013-10-08 16:45 ` Borislav Petkov
  2013-10-08 16:47   ` [PATCH 11/12] efi: Add an efi= kernel command line parameter Borislav Petkov
                     ` (2 more replies)
  12 siblings, 3 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-10-08 16:45 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi, fwts-devel

Ok,

here are some more changes to the UEFI RT odyssey: we agreed upon
having a chicken bit to fall back to the old runtime services ioremapping
dance in case there's b0rked firmware (hahahah).

Then, I did some dirty fixing of fwts' efi_runtime.ko kernel module
because the new RT mapping method breaks their implicit assumption that
arguments passed to EFI RT functions are mapped in the same address
space as the kernel. This makes accessing userspace pointers from the
EFI RT functions impossible, so we need to copy stuff around.

I'm attaching a dirty patch which doesn't necessarily always do the
correct thing, but it doesn't freeze the guest, which is a good first
step. More correctness to it when there's time.

Btw, Matt, in order to make calling of EFI RT functions possible with
parameters in module space, we need to sync PGDs from PAGE_OFFSET all
the way to MODULES_END, see efi_sync_low_kernel_mappings().

Right, so the chicken bit is called "efi=old_map" and it should return
EFI code to the old functionality.
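The chicken bit only needs a tiny option parser. The following is a minimal user-space sketch of how an `efi=old_map` handler could set a flag. `parse_efi_option()` and the `efi_old_memmap` boolean are hypothetical stand-ins for the kernel's `early_param()` handler and the `EFI_OLD_MEMMAP` facility bit. This is not the parsing code from the patches.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the EFI_OLD_MEMMAP bit in x86_efi_facility. */
static bool efi_old_memmap;

/*
 * User-space sketch of an "efi=" option handler: skip an optional
 * leading '=' (as the parse_efi_cmdline() stub does) and set the
 * chicken bit when the value is exactly "old_map".
 */
static int parse_efi_option(const char *str)
{
	if (*str == '=')
		str++;

	if (strcmp(str, "old_map") == 0)
		efi_old_memmap = true;

	return 0;
}
```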

Anyway, the first 10 patches are the same so I'm sending only the last
two as a reply to this message.

Thanks.

--
diff --git a/efi_runtime/Makefile b/efi_runtime/Makefile
index a9c0ea7f9df6..9f197c08774d 100644
--- a/efi_runtime/Makefile
+++ b/efi_runtime/Makefile
@@ -1,9 +1,14 @@
+#ifneq ($(KERNELRELEASE),)
 obj-m += efi_runtime.o
+#else
+KERNELDIR ?= /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+
 all:
-	make -C /lib/modules/$(KVER)/build M=`pwd` modules
+	make -C $(KERNELDIR) M=$(PWD) modules
 
 install:
-	make -C /lib/modules/$(KVER)/build M=`pwd` modules_install
+	make -C $(KERNELDIR) M=$(PWD) modules_install
 
 clean:
-	make -C /lib/modules/$(KVER)/build M=`pwd` clean
+	make -C $(KERNELDIR) M=$(PWD) clean
diff --git a/efi_runtime/efi_runtime.c b/efi_runtime/efi_runtime.c
index 7e3e9494ddce..8d3cf4f7f4ac 100644
--- a/efi_runtime/efi_runtime.c
+++ b/efi_runtime/efi_runtime.c
@@ -24,8 +24,9 @@
 #include <linux/init.h>
 #include <linux/proc_fs.h>
 #include <linux/efi.h>
-
+#include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <linux/ucs2_string.h>
 
 #include "efi_runtime.h"
 
@@ -106,11 +107,14 @@ static long efi_runtime_ioctl(struct file *file, unsigned int cmd,
 	efi_status_t status;
 	struct efi_getvariable __user *pgetvariable;
 	struct efi_setvariable __user *psetvariable;
+	void *vardata;
+	uint16_t *varname;
+	unsigned namelen;
 
 	efi_guid_t vendor;
 	EFI_GUID vendor_guid;
 	unsigned long datasize;
-	uint32_t attr;
+	uint32_t attr, hc;
 
 	efi_time_t eft;
 	efi_time_cap_t cap;
@@ -127,8 +131,14 @@ static long efi_runtime_ioctl(struct file *file, unsigned int cmd,
 
 	struct efi_getnexthighmonotoniccount __user *pgetnexthighmonotoniccount;
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,1,0)
-	struct efi_queryvariableinfo __user *pqueryvariableinfo;
-	struct efi_querycapsulecapabilities __user *pquerycapsulecapabilities;
+	struct efi_queryvariableinfo __user *pqvar;
+	uint64_t MaximumVariableStorageSize, RemainingVariableStorageSize, MaximumVariableSize;
+	struct efi_querycapsulecapabilities __user *u_ccaps;
+	struct efi_querycapsulecapabilities ccaps;
+	uint64_t MaximumCapsuleSize;
+	EFI_RESET_TYPE ResetType;
+	EFI_CAPSULE_HEADER *capsules;
+	int i;
 #endif
 
 	switch (cmd) {
@@ -141,34 +151,75 @@ static long efi_runtime_ioctl(struct file *file, unsigned int cmd,
 			return -EFAULT;
 
 		convert_from_guid(&vendor, &vendor_guid);
-		status = efi.get_variable(pgetvariable->VariableName, &vendor,
-					&attr, &datasize, pgetvariable->Data);
+
+		vardata = kmalloc(datasize, GFP_KERNEL);
+		if (!vardata)
+			return -ENOMEM;
+
+		namelen = ucs2_strsize(pgetvariable->VariableName, 1024);
+
+		varname = kmalloc(namelen, GFP_KERNEL);
+		if (!varname)
+			return -ENOMEM;
+
+		if (copy_from_user(varname, pgetvariable->VariableName, namelen))
+			return -EFAULT;
+
+		if (copy_from_user(vardata, pgetvariable->Data, datasize))
+			return -EFAULT;
+
+		status = efi.get_variable(varname, &vendor, &attr, &datasize, vardata);
 		if (put_user(status, pgetvariable->status))
 			return -EFAULT;
+
+		kfree(varname);
+		kfree(vardata);
+
 		if (status == EFI_SUCCESS) {
 			if (put_user(attr, pgetvariable->Attributes) ||
 				put_user(datasize, pgetvariable->DataSize))
 				return -EFAULT;
 			return 0;
 		} else {
-			printk(KERN_ERR "efi_runtime: can't get variable\n");
+			printk(KERN_ERR "efi_runtime: can't get variable, stat: 0x%lx\n",
+				status);
 			return -EINVAL;
 		}
 
 	case EFI_RUNTIME_SET_VARIABLE:
 		psetvariable = (struct efi_setvariable __user *)arg;
+
 		if (get_user(datasize, &psetvariable->DataSize) ||
 			get_user(attr, &psetvariable->Attributes) ||
 			copy_from_user(&vendor_guid, psetvariable->VendorGuid,
 							sizeof(EFI_GUID)))
 			return -EFAULT;
 
+		vardata = kmalloc(datasize, GFP_KERNEL);
+		if (!vardata)
+			return -ENOMEM;
+
+		namelen = ucs2_strsize(psetvariable->VariableName, 1024);
+
+		varname = kmalloc(namelen, GFP_KERNEL);
+		if (!varname)
+			return -ENOMEM;
+
+		if (copy_from_user(varname, psetvariable->VariableName, namelen))
+			return -EFAULT;
+
+		if (copy_from_user(vardata, psetvariable->Data, datasize))
+			return -EFAULT;
+
 		convert_from_guid(&vendor, &vendor_guid);
-		status = efi.set_variable(psetvariable->VariableName, &vendor,
-					attr, datasize, psetvariable->Data);
+		status = efi.set_variable(varname, &vendor, attr, datasize, vardata);
 
 		if (put_user(status, psetvariable->status))
 			return -EFAULT;
+
+		kfree(vardata);
+		kfree(varname);
+
 		return status == EFI_SUCCESS ? 0 : -EINVAL;
 
 	case EFI_RUNTIME_GET_TIME:
@@ -257,11 +308,19 @@ static long efi_runtime_ioctl(struct file *file, unsigned int cmd,
 		if (name_size > 1024)
 			return -EFAULT;
 
+		namelen = ucs2_strsize(pgetnextvariablename->VariableName, 1024);
+
+		varname = kmalloc(namelen, GFP_KERNEL);
+		if (!varname)
+			return -ENOMEM;
+
+		if (copy_from_user(varname, pgetnextvariablename->VariableName, namelen))
+			return -EFAULT;
+
 		convert_from_guid(&vendor, &vendor_guid);
 
-		status = efi.get_next_variable(&name_size,
-					pgetnextvariablename->VariableName,
-								&vendor);
+		status = efi.get_next_variable(&name_size, varname, &vendor);
+
 		if (put_user(status, pgetnextvariablename->status))
 			return -EFAULT;
 		convert_to_guid(&vendor, &vendor_guid);
@@ -272,6 +331,9 @@ static long efi_runtime_ioctl(struct file *file, unsigned int cmd,
 		if (copy_to_user(pgetnextvariablename->VendorGuid,
 						&vendor_guid, sizeof(EFI_GUID)))
 			return -EFAULT;
+
+		kfree(varname);
+
 		if (status != EFI_SUCCESS)
 			return -EINVAL;
 		return 0;
@@ -279,17 +341,26 @@ static long efi_runtime_ioctl(struct file *file, unsigned int cmd,
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,1,0)
 	case EFI_RUNTIME_QUERY_VARIABLEINFO:
 
-		pqueryvariableinfo = (struct efi_queryvariableinfo __user *)arg;
+		pqvar = (struct efi_queryvariableinfo __user *)arg;
 
-		if (get_user(attr, &pqueryvariableinfo->Attributes))
+		if (get_user(attr, &pqvar->Attributes))
 			return -EFAULT;
 
 		status = efi.query_variable_info(attr,
-				pqueryvariableinfo->MaximumVariableStorageSize,
-				pqueryvariableinfo->RemainingVariableStorageSize
-				, pqueryvariableinfo->MaximumVariableSize);
-		if (put_user(status, pqueryvariableinfo->status))
+				&MaximumVariableStorageSize,
+				&RemainingVariableStorageSize,
+				&MaximumVariableSize);
+
+		if (put_user(MaximumVariableStorageSize,
+			     pqvar->MaximumVariableStorageSize) ||
+		    put_user(RemainingVariableStorageSize,
+			    pqvar->RemainingVariableStorageSize) ||
+		    put_user(MaximumVariableSize,
+			    pqvar->MaximumVariableSize) ||
+		    put_user(status, pqvar->status))
 			return -EFAULT;
+
+
 		if (status != EFI_SUCCESS)
 			return -EINVAL;
 
@@ -301,8 +372,10 @@ static long efi_runtime_ioctl(struct file *file, unsigned int cmd,
 		pgetnexthighmonotoniccount = (struct
 				efi_getnexthighmonotoniccount __user *)arg;
 
-		status = efi.get_next_high_mono_count(pgetnexthighmonotoniccount
-								->HighCount);
+		status = efi.get_next_high_mono_count(&hc);
+
+		if (put_user(hc, pgetnexthighmonotoniccount->HighCount))
+			return -EFAULT;
 		if (put_user(status, pgetnexthighmonotoniccount->status))
 			return -EFAULT;
 		if (status != EFI_SUCCESS)
@@ -313,21 +386,46 @@ static long efi_runtime_ioctl(struct file *file, unsigned int cmd,
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,1,0)
 	case EFI_RUNTIME_QUERY_CAPSULECAPABILITIES:
 
-		pquerycapsulecapabilities = (struct
-				efi_querycapsulecapabilities __user *)arg;
+		u_ccaps = (struct efi_querycapsulecapabilities __user *)arg;
+
+		if (copy_from_user(&ccaps, u_ccaps, sizeof(ccaps)))
+			return -EFAULT;
+
+		capsules = kcalloc(ccaps.CapsuleCount + 1,
+				   sizeof(EFI_CAPSULE_HEADER),
+				   GFP_KERNEL);
+		if (!capsules)
+			return -ENOMEM;
+
+		for (i = 0; i < ccaps.CapsuleCount; i++)
+			if (copy_from_user(&capsules[i],
+					   (EFI_CAPSULE_HEADER *)u_ccaps->CapsuleHeaderArray[i],
+					   sizeof(EFI_CAPSULE_HEADER)))
+				return -EFAULT;
+
+		ccaps.CapsuleHeaderArray = &capsules;
 
 		status = efi.query_capsule_caps(
-				(efi_capsule_header_t **)
-				pquerycapsulecapabilities->CapsuleHeaderArray,
-				pquerycapsulecapabilities->CapsuleCount,
-				pquerycapsulecapabilities->MaximumCapsuleSize,
-				(int *)pquerycapsulecapabilities->ResetType);
+				(efi_capsule_header_t **) ccaps.CapsuleHeaderArray,
+				ccaps.CapsuleCount,
+				&MaximumCapsuleSize,
+				(int *)&ResetType);
 
-		if (put_user(status, pquerycapsulecapabilities->status))
+		if (put_user(status, u_ccaps->status))
 			return -EFAULT;
+
+		if (put_user(MaximumCapsuleSize,
+			     u_ccaps->MaximumCapsuleSize))
+			return -EFAULT;
+
+		if (put_user(ResetType, u_ccaps->ResetType))
+			return -EFAULT;
+
 		if (status != EFI_SUCCESS)
 			return -EINVAL;
 
+		kfree(capsules);
+
 		return 0;
 #endif
 	}

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* [PATCH 11/12] efi: Add an efi= kernel command line parameter
  2013-10-08 16:45 ` Borislav Petkov
@ 2013-10-08 16:47   ` Borislav Petkov
  2013-10-28 11:02     ` Matt Fleming
  2013-10-08 16:48   ` [PATCH 12/12] EFI: Runtime services virtual mapping Borislav Petkov
  2013-10-14 13:04   ` [PATCH 00/11] EFI runtime " Matt Fleming
  2 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-08 16:47 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi, fwts-devel

From: Borislav Petkov <bp@suse.de>

... for passing miscellaneous options and chicken bits from the command
line.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/platform/efi/efi.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 538c1e6b7b2c..16996aba5012 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -1113,3 +1113,12 @@ efi_status_t efi_query_variable_store(u32 attributes, unsigned long size)
 	return EFI_SUCCESS;
 }
 EXPORT_SYMBOL_GPL(efi_query_variable_store);
+
+static int __init parse_efi_cmdline(char *str)
+{
+	if (*str == '=')
+		str++;
+
+	return 0;
+}
+early_param("efi", parse_efi_cmdline);
-- 
1.8.4

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-08 16:45 ` Borislav Petkov
  2013-10-08 16:47   ` [PATCH 11/12] efi: Add an efi= kernel command line parameter Borislav Petkov
@ 2013-10-08 16:48   ` Borislav Petkov
  2013-10-10  8:06     ` Dave Young
                       ` (2 more replies)
  2013-10-14 13:04   ` [PATCH 00/11] EFI runtime " Matt Fleming
  2 siblings, 3 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-10-08 16:48 UTC (permalink / raw)
  To: X86 ML
  Cc: LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, Dave Young,
	linux-efi, fwts-devel

From: Borislav Petkov <bp@suse.de>

We map the EFI regions needed for runtime services contiguously on
virtual addresses starting from -4G down for a total max space of 64G.
This way, we provide for stable runtime services addresses across
kernels so that a kexec'd kernel can still use them.

Furthermore, they're mapped in a separate pagetable so that we don't
pollute the kernel address space (you can see how the whole ioremapping and
saving and restoring of PGDs is gone now).

Also, add a chicken bit called "efi=old_map" which can be used as a
fallback to the old runtime services mapping method in case there's some
b0rkage with a particular EFI implementation (haha, it is hard to hold
back the sarcasm here...).

Add UEFI RT VA space to Documentation/x86/x86_64/mm.txt, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 Documentation/x86/x86_64/mm.txt      |  7 +++
 arch/x86/include/asm/efi.h           | 47 ++++++++++++-------
 arch/x86/include/asm/pgtable_types.h |  3 +-
 arch/x86/platform/efi/efi.c          | 91 ++++++++++++++++++++++++++----------
 arch/x86/platform/efi/efi_32.c       |  8 +++-
 arch/x86/platform/efi/efi_64.c       | 83 ++++++++++++++++++++++++++++++++
 arch/x86/platform/efi/efi_stub_64.S  | 54 +++++++++++++++++++++
 include/linux/efi.h                  |  1 +
 8 files changed, 251 insertions(+), 43 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 881582f75c9c..c584a51add15 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -28,4 +28,11 @@ reference.
 Current X86-64 implementations only support 40 bits of address space,
 but we support up to 46 bits. This expands into MBZ space in the page tables.
 
+->trampoline_pgd:
+
+We map EFI runtime services in the aforementioned PGD in the virtual
+range of 64G (arbitrarily set, can be raised if needed)
+
+0xffffffef00000000 - 0xffffffff00000000
+
 -Andi Kleen, Jul 2004
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 0062a0125041..c70714447a8f 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -39,6 +39,9 @@ extern unsigned long asmlinkage efi_call_phys(void *, ...);
 
 #else /* !CONFIG_X86_32 */
 
+extern u64 efi_va;
+
+#define EFI_VA_END		(-68 * (1UL << 30))
 #define EFI_LOADER_SIGNATURE	"EL64"
 
 extern u64 efi_call0(void *fp);
@@ -69,24 +72,31 @@ extern u64 efi_call6(void *fp, u64 arg1, u64 arg2, u64 arg3,
 	efi_call6((f), (u64)(a1), (u64)(a2), (u64)(a3),		\
 		  (u64)(a4), (u64)(a5), (u64)(a6))
 
+#define _efi_call_virtX(x, f, ...)					\
+({									\
+	efi_status_t __s;						\
+									\
+	efi_sync_low_kernel_mappings();					\
+	preempt_disable();						\
+	__s = efi_call##x((void *)efi.systab->runtime->f, __VA_ARGS__);	\
+	preempt_enable();						\
+	__s;								\
+})
+
 #define efi_call_virt0(f)				\
-	efi_call0((efi.systab->runtime->f))
-#define efi_call_virt1(f, a1)					\
-	efi_call1((efi.systab->runtime->f), (u64)(a1))
-#define efi_call_virt2(f, a1, a2)					\
-	efi_call2((efi.systab->runtime->f), (u64)(a1), (u64)(a2))
-#define efi_call_virt3(f, a1, a2, a3)					\
-	efi_call3((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3))
-#define efi_call_virt4(f, a1, a2, a3, a4)				\
-	efi_call4((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3), (u64)(a4))
-#define efi_call_virt5(f, a1, a2, a3, a4, a5)				\
-	efi_call5((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3), (u64)(a4), (u64)(a5))
-#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)			\
-	efi_call6((efi.systab->runtime->f), (u64)(a1), (u64)(a2), \
-		  (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
+	_efi_call_virtX(0, f)
+#define efi_call_virt1(f, a1)				\
+	_efi_call_virtX(1, f, (u64)(a1))
+#define efi_call_virt2(f, a1, a2)			\
+	_efi_call_virtX(2, f, (u64)(a1), (u64)(a2))
+#define efi_call_virt3(f, a1, a2, a3)			\
+	_efi_call_virtX(3, f, (u64)(a1), (u64)(a2), (u64)(a3))
+#define efi_call_virt4(f, a1, a2, a3, a4)		\
+	_efi_call_virtX(4, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4))
+#define efi_call_virt5(f, a1, a2, a3, a4, a5)		\
+	_efi_call_virtX(5, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5))
+#define efi_call_virt6(f, a1, a2, a3, a4, a5, a6)	\
+	_efi_call_virtX(6, f, (u64)(a1), (u64)(a2), (u64)(a3), (u64)(a4), (u64)(a5), (u64)(a6))
 
 extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
 				 u32 type, u64 attribute);
@@ -101,6 +111,9 @@ extern void efi_call_phys_prelog(void);
 extern void efi_call_phys_epilog(void);
 extern void efi_unmap_memmap(void);
 extern void efi_memory_uc(u64 addr, unsigned long size);
+extern void __init efi_map_region(efi_memory_desc_t *md);
+extern void efi_sync_low_kernel_mappings(void);
+extern void __init old_map_region(efi_memory_desc_t *md);
 
 #ifdef CONFIG_EFI
 
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 0ecac257fb26..a83aa44bb1fb 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -382,7 +382,8 @@ static inline void update_page_count(int level, unsigned long pages) { }
  */
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
 extern phys_addr_t slow_virt_to_phys(void *__address);
-
+extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+				   unsigned numpages, unsigned long page_flags);
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 16996aba5012..91d4fac94e67 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -12,6 +12,8 @@
  *	Bibo Mao <bibo.mao@intel.com>
  *	Chandramouli Narayanan <mouli@linux.intel.com>
  *	Huang Ying <ying.huang@intel.com>
+ * Copyright (C) 2013 SuSE Labs
+ * 	Borislav Petkov <bp@suse.de> - runtime services VA mapping
  *
  * Copied from efi_32.c to eliminate the duplicated code between EFI
  * 32/64 support code. --ying 2007-10-26
@@ -55,6 +57,12 @@
 
 #define EFI_MIN_RESERVE 5120
 
+/*
+ * We allocate runtime services regions top-down, starting from -4G, i.e.
+ * 0xffff_ffff_0000_0000 and limit EFI VA mapping space to 64G.
+ */
+u64 efi_va		= -4 * (1UL << 30);
+
 #define EFI_DUMMY_GUID \
 	EFI_GUID(0x4424ac57, 0xbe4b, 0x47dd, 0x9e, 0x97, 0xed, 0x50, 0xf0, 0x9f, 0x92, 0xa9)
 
@@ -81,6 +89,17 @@ static efi_system_table_t efi_systab __initdata;
 unsigned long x86_efi_facility;
 
 /*
+ * Scratch space used for switching the pagetable in the EFI stub
+ */
+struct efi_scratch {
+	u64 r15;
+	u64 prev_cr3;
+	pgd_t *efi_pgt;
+	bool use_pgd;
+};
+extern struct efi_scratch efi_scratch;
+
+/*
  * Returns 1 if 'facility' is enabled, 0 otherwise.
  */
 int efi_enabled(int facility)
@@ -851,6 +870,31 @@ void efi_memory_uc(u64 addr, unsigned long size)
 	set_memory_uc(addr, npages);
 }
 
+void __init old_map_region(efi_memory_desc_t *md)
+{
+	u64 start_pfn, end_pfn, end;
+	unsigned long size;
+	void *va;
+
+	start_pfn = PFN_DOWN(md->phys_addr);
+	size	  = md->num_pages << PAGE_SHIFT;
+	end	  = md->phys_addr + size;
+	end_pfn   = PFN_UP(end);
+
+	if (pfn_range_is_mapped(start_pfn, end_pfn)) {
+		va = __va(md->phys_addr);
+
+		if (!(md->attribute & EFI_MEMORY_WB))
+			efi_memory_uc((u64)(unsigned long)va, size);
+	} else
+		va = efi_ioremap(md->phys_addr, size,
+				 md->type, md->attribute);
+
+	md->virt_addr = (u64) (unsigned long) va;
+	if (!va)
+		pr_err("ioremap of 0x%llX failed!\n",
+		       (unsigned long long)md->phys_addr);
+}
 /*
  * This function will switch the EFI runtime services to virtual mode.
  * Essentially, look through the EFI memmap and map every region that
@@ -862,10 +906,10 @@ void efi_memory_uc(u64 addr, unsigned long size)
 void __init efi_enter_virtual_mode(void)
 {
 	efi_memory_desc_t *md, *prev_md = NULL;
-	efi_status_t status;
+	void *p, *new_memmap = NULL;
 	unsigned long size;
-	u64 end, systab, start_pfn, end_pfn;
-	void *p, *va, *new_memmap = NULL;
+	efi_status_t status;
+	u64 end, systab;
 	int count = 0;
 
 	efi.systab = NULL;
@@ -874,7 +918,6 @@ void __init efi_enter_virtual_mode(void)
 	 * We don't do virtual mode, since we don't do runtime services, on
 	 * non-native EFI
 	 */
-
 	if (!efi_is_native()) {
 		efi_unmap_memmap();
 		return;
@@ -905,6 +948,7 @@ void __init efi_enter_virtual_mode(void)
 			continue;
 		}
 		prev_md = md;
+
 	}
 
 	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
@@ -914,33 +958,18 @@ void __init efi_enter_virtual_mode(void)
 		    md->type != EFI_BOOT_SERVICES_DATA)
 			continue;
 
+		efi_map_region(md);
+
 		size = md->num_pages << PAGE_SHIFT;
 		end = md->phys_addr + size;
 
-		start_pfn = PFN_DOWN(md->phys_addr);
-		end_pfn = PFN_UP(end);
-		if (pfn_range_is_mapped(start_pfn, end_pfn)) {
-			va = __va(md->phys_addr);
-
-			if (!(md->attribute & EFI_MEMORY_WB))
-				efi_memory_uc((u64)(unsigned long)va, size);
-		} else
-			va = efi_ioremap(md->phys_addr, size,
-					 md->type, md->attribute);
-
-		md->virt_addr = (u64) (unsigned long) va;
-
-		if (!va) {
-			pr_err("ioremap of 0x%llX failed!\n",
-			       (unsigned long long)md->phys_addr);
-			continue;
-		}
-
 		systab = (u64) (unsigned long) efi_phys.systab;
 		if (md->phys_addr <= systab && systab < end) {
 			systab += md->virt_addr - md->phys_addr;
+
 			efi.systab = (efi_system_table_t *) (unsigned long) systab;
 		}
+
 		new_memmap = krealloc(new_memmap,
 				      (count + 1) * memmap.desc_size,
 				      GFP_KERNEL);
@@ -949,8 +978,17 @@ void __init efi_enter_virtual_mode(void)
 		count++;
 	}
 
+#ifdef CONFIG_X86_64
+	efi_scratch.efi_pgt = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
+
+	if (!test_bit(EFI_OLD_MEMMAP, &x86_efi_facility))
+		efi_scratch.use_pgd = true;
+#endif
+
 	BUG_ON(!efi.systab);
 
+	efi_sync_low_kernel_mappings();
+
 	status = phys_efi_set_virtual_address_map(
 		memmap.desc_size * count,
 		memmap.desc_size,
@@ -983,7 +1021,9 @@ void __init efi_enter_virtual_mode(void)
 	efi.query_variable_info = virt_efi_query_variable_info;
 	efi.update_capsule = virt_efi_update_capsule;
 	efi.query_capsule_caps = virt_efi_query_capsule_caps;
-	if (__supported_pte_mask & _PAGE_NX)
+
+	if (test_bit(EFI_OLD_MEMMAP, &x86_efi_facility) &&
+	    (__supported_pte_mask & _PAGE_NX))
 		runtime_code_page_mkexec();
 
 	kfree(new_memmap);
@@ -1119,6 +1159,9 @@ static int __init parse_efi_cmdline(char *str)
 	if (*str == '=')
 		str++;
 
+	if (!strncmp(str, "old_map", 7))
+		set_bit(EFI_OLD_MEMMAP, &x86_efi_facility);
+
 	return 0;
 }
 early_param("efi", parse_efi_cmdline);
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 40e446941dd7..6c697a9633f2 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -37,9 +37,15 @@
  * claim EFI runtime service handler exclusively and to duplicate a memory in
  * low memory space say 0 - 3G.
  */
-
 static unsigned long efi_rt_eflags;
 
+void efi_sync_low_kernel_mappings(void) {}
+
+void __init efi_map_region(efi_memory_desc_t *md)
+{
+	old_map_region(md);
+}
+
 void efi_call_phys_prelog(void)
 {
 	struct desc_ptr gdt_descr;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 39a0e7f1f0a3..2ee51db337da 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -65,6 +65,9 @@ void __init efi_call_phys_prelog(void)
 	int pgd;
 	int n_pgds;
 
+	if (!test_bit(EFI_OLD_MEMMAP, &x86_efi_facility))
+		return;
+
 	early_code_mapping_set_exec(1);
 	local_irq_save(efi_flags);
 
@@ -86,6 +89,10 @@ void __init efi_call_phys_epilog(void)
 	 */
 	int pgd;
 	int n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT) , PGDIR_SIZE);
+
+	if (!test_bit(EFI_OLD_MEMMAP, &x86_efi_facility))
+		return;
+
 	for (pgd = 0; pgd < n_pgds; pgd++)
 		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), save_pgd[pgd]);
 	kfree(save_pgd);
@@ -94,6 +101,82 @@ void __init efi_call_phys_epilog(void)
 	early_code_mapping_set_exec(0);
 }
 
+/*
+ * Add low kernel mappings for passing arguments to EFI functions.
+ */
+void efi_sync_low_kernel_mappings(void)
+{
+	unsigned num_pgds;
+	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+
+	if (test_bit(EFI_OLD_MEMMAP, &x86_efi_facility))
+		return;
+
+	num_pgds = pgd_index(MODULES_END - 1) - pgd_index(PAGE_OFFSET);
+
+	memcpy(pgd + pgd_index(PAGE_OFFSET),
+		init_mm.pgd + pgd_index(PAGE_OFFSET),
+		sizeof(pgd_t) * num_pgds);
+}
+
+static void __init __map_region(efi_memory_desc_t *md, u64 va)
+{
+	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+	unsigned long pf = 0, size;
+	u64 end;
+
+	if (!(md->attribute & EFI_MEMORY_WB))
+		pf |= _PAGE_PCD;
+
+	size = md->num_pages << PAGE_SHIFT;
+	end  = va + size;
+
+	if (kernel_map_pages_in_pgd(pgd, md->phys_addr, va, md->num_pages, pf))
+		pr_warning("Error mapping PA 0x%llx -> VA 0x%llx!\n",
+			   md->phys_addr, va);
+}
+
+void __init efi_map_region(efi_memory_desc_t *md)
+{
+	unsigned long size = md->num_pages << PAGE_SHIFT;
+	u64 pa = md->phys_addr;
+
+	if (test_bit(EFI_OLD_MEMMAP, &x86_efi_facility))
+		return old_map_region(md);
+
+	/*
+	 * Make sure the 1:1 mappings are present as a catch-all for b0rked
+	 * firmware which doesn't update all internal pointers after switching
+	 * to virtual mode and would otherwise crap on us.
+	 */
+	__map_region(md, md->phys_addr);
+
+	efi_va -= size;
+
+	/* Is PA 2M-aligned? */
+	if (!(pa & (PMD_SIZE - 1)))
+		efi_va &= PMD_MASK;
+	else {
+		u64 pa_offset = pa & (PMD_SIZE - 1);
+		u64 prev_va = efi_va;
+
+		/* get us the same offset within this 2M page */
+		efi_va = (efi_va & PMD_MASK) + pa_offset;
+
+		if (efi_va > prev_va)
+			efi_va -= PMD_SIZE;
+	}
+
+	if (efi_va < EFI_VA_END) {
+		pr_warning(FW_WARN "VA address range overflow!\n");
+		return;
+	}
+
+	/* Do the VA map */
+	__map_region(md, efi_va);
+	md->virt_addr = efi_va;
+}
+
 void __iomem *__init efi_ioremap(unsigned long phys_addr, unsigned long size,
 				 u32 type, u64 attribute)
 {
diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
index 4c07ccab8146..88073b140298 100644
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -34,10 +34,47 @@
 	mov %rsi, %cr0;			\
 	mov (%rsp), %rsp
 
+	/* stolen from gcc */
+	.macro FLUSH_TLB_ALL
+	movq %r15, efi_scratch(%rip)
+	movq %r14, efi_scratch+8(%rip)
+	movq %cr4, %r15
+	movq %r15, %r14
+	andb $0x7f, %r14b
+	movq %r14, %cr4
+	movq %r15, %cr4
+	movq efi_scratch+8(%rip), %r14
+	movq efi_scratch(%rip), %r15
+	.endm
+
+	.macro SWITCH_PGT
+	cmpb $0, efi_scratch+24(%rip)
+	je 1f
+	movq %r15, efi_scratch(%rip)		# r15
+	# save previous CR3
+	movq %cr3, %r15
+	movq %r15, efi_scratch+8(%rip)		# prev_cr3
+	movq efi_scratch+16(%rip), %r15		# EFI pgt
+	movq %r15, %cr3
+	1:
+	.endm
+
+	.macro RESTORE_PGT
+	cmpb $0, efi_scratch+24(%rip)
+	je 2f
+	movq efi_scratch+8(%rip), %r15
+	movq %r15, %cr3
+	movq efi_scratch(%rip), %r15
+	FLUSH_TLB_ALL
+	2:
+	.endm
+
 ENTRY(efi_call0)
 	SAVE_XMM
 	subq $32, %rsp
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -47,7 +84,9 @@ ENTRY(efi_call1)
 	SAVE_XMM
 	subq $32, %rsp
 	mov  %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -57,7 +96,9 @@ ENTRY(efi_call2)
 	SAVE_XMM
 	subq $32, %rsp
 	mov  %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -68,7 +109,9 @@ ENTRY(efi_call3)
 	subq $32, %rsp
 	mov  %rcx, %r8
 	mov  %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -80,7 +123,9 @@ ENTRY(efi_call4)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
@@ -93,7 +138,9 @@ ENTRY(efi_call5)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $48, %rsp
 	RESTORE_XMM
 	ret
@@ -109,8 +156,15 @@ ENTRY(efi_call6)
 	mov %r8, %r9
 	mov %rcx, %r8
 	mov %rsi, %rcx
+	SWITCH_PGT
 	call *%rdi
+	RESTORE_PGT
 	addq $48, %rsp
 	RESTORE_XMM
 	ret
 ENDPROC(efi_call6)
+
+	.data
+ENTRY(efi_scratch)
+	.fill 3,8,0
+	.byte 0
diff --git a/include/linux/efi.h b/include/linux/efi.h
index fa47d80ab4b5..beff433aa8c0 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -632,6 +632,7 @@ extern int __init efi_setup_pcdp_console(char *);
 #define EFI_RUNTIME_SERVICES	3	/* Can we use runtime services? */
 #define EFI_MEMMAP		4	/* Can we use EFI memory map? */
 #define EFI_64BIT		5	/* Is the firmware 64-bit? */
+#define EFI_OLD_MEMMAP		6	/* Use old mapping method */
 
 #ifdef CONFIG_EFI
 # ifdef CONFIG_X86
-- 
1.8.4
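
The VA placement done by efi_map_region() above can be traced with a small userspace sketch. It mirrors the patch's arithmetic: allocate downwards from -4G, align the VA to 2M when the PA is 2M-aligned, and otherwise keep the PA's offset within its 2M page. It is not kernel code. The SKETCH_* constants stand in for PMD_SIZE/PMD_MASK, and the 64G lower bound is taken from the "limit EFI VA mapping space to 64G" comment, since EFI_VA_END itself is defined outside this hunk.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PMD_SIZE	(1ULL << 21)			/* 2M */
#define SKETCH_PMD_MASK	(~(SKETCH_PMD_SIZE - 1))
#define SKETCH_VA_START	(0ULL - (4ULL << 30))		/* -4G */
#define SKETCH_VA_END	(SKETCH_VA_START - (64ULL << 30))	/* assumed 64G window */

/*
 * Pick a VA for a region of 'size' bytes at physical address 'pa', moving
 * *cursor (the sketch's efi_va) downwards. Returns 0 if the window is exceeded.
 */
static uint64_t sketch_alloc_va(uint64_t *cursor, uint64_t pa, uint64_t size)
{
	uint64_t va;

	assert(size != 0 && (size & 0xfff) == 0);	/* whole 4K pages */
	va = *cursor - size;

	if (!(pa & (SKETCH_PMD_SIZE - 1))) {
		/* PA is 2M-aligned: align the VA down to 2M as well */
		va &= SKETCH_PMD_MASK;
	} else {
		uint64_t pa_offset = pa & (SKETCH_PMD_SIZE - 1);
		uint64_t prev_va = va;

		/* same offset within a 2M page as the PA ... */
		va = (va & SKETCH_PMD_MASK) + pa_offset;
		/* ... but never above the space we just carved out */
		if (va > prev_va)
			va -= SKETCH_PMD_SIZE;
	}

	/* like the patch, the cursor is updated before the range check */
	*cursor = va;
	if (va < SKETCH_VA_END)
		return 0;
	return va;
}
```

Starting from -4G, a 3-page region at PA 0x200000 lands at 0xfffffffeffe00000, and a following 1-page region at PA 0x7f6b5000 lands at 0xfffffffeffcb5000, which keeps the PA's 0xb5000 offset within its 2M page.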

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
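
The stub macros above address struct efi_scratch through fixed byte offsets (+0 r15, +8 prev_cr3, +16 efi_pgt, +24 use_pgd), so the C declaration in efi.c and the .fill/.byte block in efi_stub_64.S have to agree. The following is a userspace sketch of a compile-time check for that, assuming an LP64 target such as x86-64 (void * stands in for pgd_t *). It also shows what FLUSH_TLB_ALL's "andb $0x7f" does to a CR4 value: it clears only bit 7 (PGE). On x86, clearing and then restoring CR4.PGE flushes global TLB entries as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct efi_scratch; void * stands in for pgd_t *. */
struct efi_scratch_sketch {
	uint64_t r15;		/* efi_scratch+0  */
	uint64_t prev_cr3;	/* efi_scratch+8  */
	void *efi_pgt;		/* efi_scratch+16 */
	bool use_pgd;		/* efi_scratch+24 */
};

/* The stub hard-codes these offsets, so fail the build if they drift. */
_Static_assert(offsetof(struct efi_scratch_sketch, r15) == 0, "r15");
_Static_assert(offsetof(struct efi_scratch_sketch, prev_cr3) == 8, "prev_cr3");
_Static_assert(offsetof(struct efi_scratch_sketch, efi_pgt) == 16, "efi_pgt");
_Static_assert(offsetof(struct efi_scratch_sketch, use_pgd) == 24, "use_pgd");

#define SKETCH_CR4_PGE	(1ULL << 7)	/* CR4 bit 7: page global enable */

/* Effect of "andb $0x7f, %r14b" on a CR4 value: only bit 7 is cleared. */
static uint64_t sketch_cr4_clear_pge(uint64_t cr4)
{
	uint8_t low = (uint8_t)(cr4 & 0xff);

	low &= 0x7f;
	return (cr4 & ~0xffULL) | low;
}
```

The ".fill 3,8,0" plus ".byte 0" in the stub reserves 25 bytes, which covers all four fields. The C struct itself is padded to 32 bytes on x86-64.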

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-08 16:48   ` [PATCH 12/12] EFI: Runtime services virtual mapping Borislav Petkov
@ 2013-10-10  8:06     ` Dave Young
  2013-10-10  8:14       ` Dave Young
  2013-10-28 11:22     ` Matt Fleming
  2013-10-29  6:47     ` Dave Young
  2 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-10  8:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/08/13 at 06:48pm, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> We map the EFI regions needed for runtime services contiguously on
> virtual addresses starting from -4G down for a total max space of 64G.
> This way, we provide for stable runtime services addresses across
> kernels so that a kexec'd kernel can still use them.
> 
> Also, they're mapped in a separate pagetable so that we don't
> pollute the kernel namespace (you can see how the whole ioremapping and
> saving and restoring of PGDs is gone now).
> 
> Also, add a chicken bit called "efi=old_map" which can be used as a
> fallback to the old runtime services mapping method in case there's some
> b0rkage with a particular EFI implementation (haha, it is hard to hold
> up the sarcasm here...).
> 
> Add UEFI RT VA space to Documentation/x86/x86_64/mm.txt, while at it.
> 

I tested this new patch; the kexec kernel still gets different mappings,
for the same reason as before: in the first kernel,
efi_reserve_boot_services() sets the region size (md->num_pages) to 0.

With the small hack patch below (on top of my previous kexec test
patches), kexec and kdump work OK in qemu/ovmf; I haven't tried them on
real hardware yet.

--- bp.orig/arch/x86/platform/efi/efi.c
+++ bp/arch/x86/platform/efi/efi.c
@@ -445,10 +445,18 @@ static void __init print_efi_memmap(void
 #endif  /*  EFI_DEBUG  */
 }
 
+static inline bool overlap_with_ktext(u64 start, u64 size)
+{
+	return (start + size >= __pa_symbol(_text)
+				&& start <= __pa_symbol(_end));
+}
+
 void __init efi_reserve_boot_services(void)
 {
 	void *p;
 
+	if (kexecboot)
+		return;
 	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
 		efi_memory_desc_t *md = p;
 		u64 start = md->phys_addr;
@@ -463,13 +471,16 @@ void __init efi_reserve_boot_services(vo
 		 * - Not within any part of the kernel
 		 * - Not the bios reserved area
 		*/
-		if ((start+size >= __pa_symbol(_text)
-				&& start <= __pa_symbol(_end)) ||
+		if (overlap_with_ktext(start, size) ||
 			!e820_all_mapped(start, start+size, E820_RAM) ||
 			memblock_is_region_reserved(start, size)) {
 			/* Could not reserve, skip it */
-			md->num_pages = 0;
-			memblock_dbg("Could not reserve boot range "
+			if (overlap_with_ktext(start, size)) {
+				u64 s = __pa_symbol(_text) - start;
+				memblock_reserve(start, s);
+			} else
+				md->num_pages = 0;
+			memblock_dbg("Could not reserve whole boot range "
 					"[0x%010llx-0x%010llx]\n",
 						start, start+size-1);
 		} else
@@ -490,6 +501,8 @@ void __init efi_free_boot_services(void)
 {
 	void *p;
 
+	if (kexecboot)
+		return;
 	if (!efi_is_native())
 		return;
 

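The overlap test in the hack above uses inclusive comparisons, and "u64 s = __pa_symbol(_text) - start" assumes the region starts below _text. If the region starts inside the kernel image, that subtraction wraps around. The following is a sketch of the same idea using half-open ranges and a clamped prefix. The sketch_* helpers are hypothetical names, not kernel API, and plain integers stand in for __pa_symbol(_text) and __pa_symbol(_end).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open ranges: region [start, start + size), kernel image [ktext, kend). */
static bool sketch_overlaps_kernel(uint64_t start, uint64_t size,
				   uint64_t ktext, uint64_t kend)
{
	return start < kend && ktext < start + size;
}

/*
 * Bytes of the region lying below the kernel image, i.e. the part that can
 * still be reserved. Returns 0 when the region starts at or above ktext
 * instead of wrapping around like an unchecked "ktext - start" would.
 */
static uint64_t sketch_prefix_below_kernel(uint64_t start, uint64_t size,
					   uint64_t ktext)
{
	uint64_t end = start + size;

	if (start >= ktext)
		return 0;
	return (end < ktext ? end : ktext) - start;
}
```

With half-open ranges, a region that ends exactly at _text is not treated as overlapping. The inclusive ">=" in the hack treats it as overlapping, which is the more conservative choice.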

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-10  8:06     ` Dave Young
@ 2013-10-10  8:14       ` Dave Young
  2013-10-10  8:58         ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-10  8:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/10/13 at 04:06pm, Dave Young wrote:
> On 10/08/13 at 06:48pm, Borislav Petkov wrote:
> > From: Borislav Petkov <bp@suse.de>
> > 
> > We map the EFI regions needed for runtime services contiguously on
> > virtual addresses starting from -4G down for a total max space of 64G.
> > This way, we provide for stable runtime services addresses across
> > kernels so that a kexec'd kernel can still use them.
> > 
> > Also, they're mapped in a separate pagetable so that we don't
> > pollute the kernel namespace (you can see how the whole ioremapping and
> > saving and restoring of PGDs is gone now).
> > 
> > Also, add a chicken bit called "efi=old_map" which can be used as a
> > fallback to the old runtime services mapping method in case there's some
> > b0rkage with a particular EFI implementation (haha, it is hard to hold
> > up the sarcasm here...).
> > 
> > Add UEFI RT VA space to Documentation/x86/x86_64/mm.txt, while at it.
> > 
> 
> I tested this new patch; the kexec kernel still gets different mappings,
> for the same reason as before: in the first kernel,
> efi_reserve_boot_services() sets the region size (md->num_pages) to 0.
> 
> With the small hack patch below (on top of my previous kexec test
> patches), kexec and kdump work OK in qemu/ovmf; I haven't tried them on
> real hardware yet.

I still have no idea why the kernel text overlaps with the EFI boot
region, but mapping the non-overlapping part is necessary anyway.

I can post the kexec-related patches after your mapping patches settle
down.

> 
> --- bp.orig/arch/x86/platform/efi/efi.c
> +++ bp/arch/x86/platform/efi/efi.c
> @@ -445,10 +445,18 @@ static void __init print_efi_memmap(void
>  #endif  /*  EFI_DEBUG  */
>  }
>  
> +static inline bool overlap_with_ktext(u64 start, u64 size)
> +{
> +	return (start + size >= __pa_symbol(_text)
> +				&& start <= __pa_symbol(_end));
> +}
> +
>  void __init efi_reserve_boot_services(void)
>  {
>  	void *p;
>  
> +	if (kexecboot)
> +		return;
>  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
>  		efi_memory_desc_t *md = p;
>  		u64 start = md->phys_addr;
> @@ -463,13 +471,16 @@ void __init efi_reserve_boot_services(vo
>  		 * - Not within any part of the kernel
>  		 * - Not the bios reserved area
>  		*/
> -		if ((start+size >= __pa_symbol(_text)
> -				&& start <= __pa_symbol(_end)) ||
> +		if (overlap_with_ktext(start, size) ||
>  			!e820_all_mapped(start, start+size, E820_RAM) ||
>  			memblock_is_region_reserved(start, size)) {
>  			/* Could not reserve, skip it */
> -			md->num_pages = 0;
> -			memblock_dbg("Could not reserve boot range "
> +			if (overlap_with_ktext(start, size)) {
> +				u64 s = __pa_symbol(_text) - start;
> +				memblock_reserve(start, s);
> +			} else
> +				md->num_pages = 0;
> +			memblock_dbg("Could not reserve whole boot range "
>  					"[0x%010llx-0x%010llx]\n",
>  						start, start+size-1);
>  		} else
> @@ -490,6 +501,8 @@ void __init efi_free_boot_services(void)
>  {
>  	void *p;
>  
> +	if (kexecboot)
> +		return;
>  	if (!efi_is_native())
>  		return;
>  
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-10  8:14       ` Dave Young
@ 2013-10-10  8:58         ` Borislav Petkov
  2013-10-10 12:34           ` Matt Fleming
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-10  8:58 UTC (permalink / raw)
  To: Dave Young
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Thu, Oct 10, 2013 at 04:14:34PM +0800, Dave Young wrote:
> I still have no idea why the kernel text overlaps with the EFI boot
> region, but mapping the non-overlapping part is necessary anyway.
>
> I can post the kexec-related patches after your mapping patches settle
> down.

Right, "settle down" being the key here.

Matt just mentioned on IRC that we might not need boot services mappings
by the time we have to start the kexec kernel, which would mean you
don't have to do anything in efi_reserve_boot_services().

The question which needs answering first though is, how the whole efi
thing is going to handle any functionality like calling into efi boot
regions from runtime functions and such. Which hasn't really been tested
and fw vendors don't really want to support that. But this is all bits
and pieces I heard yesterday so it is all pretty wet and I'll let efi
guys, i.e. the Matts and a couple of others :-), figure out this whole
issue.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-10  8:58         ` Borislav Petkov
@ 2013-10-10 12:34           ` Matt Fleming
  2013-10-11  6:24             ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Matt Fleming @ 2013-10-10 12:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Thu, 10 Oct, at 10:58:28AM, Borislav Petkov wrote:
> On Thu, Oct 10, 2013 at 04:14:34PM +0800, Dave Young wrote:
> > I still have no idea why the kernel text overlaps with the EFI boot
> > region, but mapping the non-overlapping part is necessary anyway.
> >
> > I can post the kexec-related patches after your mapping patches settle
> > down.
> 
> Right, "settle down" being the key here.
> 
> Matt just mentioned on IRC that we might not need boot services mappings
> by the time we have to start the kexec kernel, which would mean you
> don't have to do anything in efi_reserve_boot_services().

Dave, apologies for not discussing the whole Boot Services thing sooner.
I missed your questions.

We really should not be passing the EFI Boot Service regions via the
memmap to kexec at all, because by the time the kexec'd kernel is
running those pages that previously contained Boot Service code/data
will have likely been reused for something else.

Which, to answer your question, is why the Boot Service regions overlap
the kernel text in the kexec'd kernel - those regions have been
reallocated by the first kernel and now happen to contain the kernel
text of the kexec kernel.

The reason that we don't keep the Boot Service regions around forever is
because they can take up a considerable amount of memory, so the current
situation of freeing them after we're sure the firmware isn't going to
reference them is still the right way to go, and simply not including
any Boot Service entries in the memory map passed to kexec should make
everything work OK.
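
In practice, Matt's suggestion means filtering the memory map before it is handed to the kexec'd kernel. The following is a minimal sketch that assumes a trimmed-down descriptor struct (the real efi_memory_desc_t has more fields) and the UEFI memory type values EfiBootServicesCode = 3 and EfiBootServicesData = 4. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Memory type values from the UEFI specification (EFI_MEMORY_TYPE). */
enum {
	SKETCH_EFI_BOOT_SERVICES_CODE    = 3,
	SKETCH_EFI_BOOT_SERVICES_DATA    = 4,
	SKETCH_EFI_RUNTIME_SERVICES_CODE = 5,
	SKETCH_EFI_RUNTIME_SERVICES_DATA = 6,
};

/* Hypothetical, trimmed-down stand-in for efi_memory_desc_t. */
struct sketch_md {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
};

/*
 * Copy every descriptor except Boot Services code/data from 'in' to 'out'
 * ('out' must have room for 'n' entries). Returns the number of entries kept.
 */
static size_t sketch_drop_boot_services(const struct sketch_md *in, size_t n,
					struct sketch_md *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (in[i].type == SKETCH_EFI_BOOT_SERVICES_CODE ||
		    in[i].type == SKETCH_EFI_BOOT_SERVICES_DATA)
			continue;
		out[count++] = in[i];
	}
	return count;
}
```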

> The question which needs answering first though is, how the whole efi
> thing is going to handle any functionality like calling into efi boot
> regions from runtime functions and such. Which hasn't really been tested
> and fw vendors don't really want to support that. But this is all bits
> and pieces I heard yesterday so it is all pretty wet and I'll let efi
> guys, i.e. the Matts and a couple of others :-), figure out this whole
> issue.

We currently treat the scenario where Runtime Services reference Boot
Service regions as a bug and either work around it (where we do
efi_reserve_boot_services() and efi_free_boot_services() around
SetVirtualAddressMap()) or we avoid calling those services altogether.

The spec is pretty clear that runtime drivers shouldn't be doing this,
and so far this approach has served us well.

There are only two reasons why we keep the Boot Services regions around
(for a short period) at all,

  1) To work around the aforementioned runtime firmware bugs
  2) To copy an ACPI BGRT image into kernel memory

I'm not sure whether the kexec kernel would care about the BGRT?

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-10 12:34           ` Matt Fleming
@ 2013-10-11  6:24             ` Dave Young
  2013-10-11  7:41               ` Borislav Petkov
  2013-10-11 10:27               ` Matt Fleming
  0 siblings, 2 replies; 102+ messages in thread
From: Dave Young @ 2013-10-11  6:24 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/10/13 at 01:34pm, Matt Fleming wrote:
> On Thu, 10 Oct, at 10:58:28AM, Borislav Petkov wrote:
> > On Thu, Oct 10, 2013 at 04:14:34PM +0800, Dave Young wrote:
> > > I still have no idea why the kernel text overlaps with the EFI boot
> > > region, but mapping the non-overlapping part is necessary anyway.
> > >
> > > I can post the kexec-related patches after your mapping patches settle
> > > down.
> > 
> > Right, "settle down" being the key here.
> > 
> > Matt just mentioned on IRC that we might not need boot services mappings
> > by the time we have to start the kexec kernel, which would mean you
> > don't have to do anything in efi_reserve_boot_services().
> 
> Dave, apologies for not discussing the whole Boot Services thing sooner.
> I missed your questions.

No problem, thanks for clarifying the boot service issue.

> 
> We really should not be passing the EFI Boot Service regions via the
> memmap to kexec at all, because by the time the kexec'd kernel is
> running those pages that previously contained Boot Service code/data
> will have likely been reused for something else.
> 
> Which, to answer your question, is why the Boot Service regions overlap
> the kernel text in the kexec'd kernel - those regions have been
> reallocated by the first kernel and now happen to contain the kernel
> text of the kexec kernel.

OK, then I understand that passing boot service regions to the 2nd kernel
makes no sense.

But with Boris's current implementation, getting the same mappings in
different kernels depends on the memory descriptors (md) being in the same
order, with the same start and size for each one. How about using this
mapping solution but, for the kexec kernel, also passing the virtual
mappings via setup_data? The only difference is that we would map only the
non-boot regions and use the boot regions' sizes just to make sure the
other regions are mapped at the same virtual addresses.
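
Dave's point about descriptor order can be illustrated with a simplified top-down assignment. To keep the numbers readable, this sketch drops the 2M alignment step. Zeroing one region's size, as efi_reserve_boot_services() does with md->num_pages, shifts the VA of every region that comes after it. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VA_START		(0ULL - (4ULL << 30))	/* -4G */
#define SKETCH_PAGE_SHIFT	12

/*
 * Simplified top-down assignment (no 2M alignment): in table order, each
 * region gets the next num_pages[i] worth of VA space below the previous one.
 */
static void sketch_assign_vas(const uint64_t *num_pages, size_t n, uint64_t *va)
{
	uint64_t cursor = SKETCH_VA_START;

	for (size_t i = 0; i < n; i++) {
		cursor -= num_pages[i] << SKETCH_PAGE_SHIFT;
		va[i] = cursor;
	}
}
```

With page counts {16, 256, 8}, the third region lands at 0xfffffffeffee8000. With {16, 0, 8}, it lands at 0xfffffffefffe8000 instead, while the first region is unchanged. Keeping the boot region's size in the calculation, even without mapping that region, is what keeps the later VAs stable in the proposal above.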

OTOH, if we only pass the ioremapped data without Boris's current patch,
the problem I worry about is how we can ensure the addresses are not used
by other code before we map them in the 2nd kernel's efi_init().

As for the efi_reserve_boot_services() code, it's mainly there for the
SetVirtualAddressMap() call, so boot regions should not be reused before
SetVirtualAddressMap(). But the overlap happens before
efi_reserve_boot_services() runs; isn't that a problem?

> 
> The reason that we don't keep the Boot Service regions around forever is
> because they can take up a considerable amount of memory, so the current
> situation of freeing them after we're sure the firmware isn't going to
> reference them is still the right way to go, and simply not including
> any Boot Service entries in the memory map passed to kexec should make
> everything work OK.
> 
> > The question which needs answering first though is, how the whole efi
> > thing is going to handle any functionality like calling into efi boot
> > regions from runtime functions and such. Which hasn't really been tested
> > and fw vendors don't really want to support that. But this is all bits
> > and pieces I heard yesterday so it is all pretty wet and I'll let efi
> > guys, i.e. the Matts and a couple of others :-), figure out this whole
> > issue.
> 
> We currently treat the scenario where Runtime Services reference Boot
> Service regions as a bug and either work around it (where we do
> efi_reserve_boot_services() and efi_free_boot_services() around
> SetVirtualAddressMap()) or we avoid calling those services altogether.
> 
> The spec is pretty clear that runtime drivers shouldn't be doing this,
> and so far this approach has served us well.
> 
> There are only two reasons why we keep the Boot Services regions around
> (for a short period) at all,
> 
>   1) To work around the aforementioned runtime firmware bugs
>   2) To copy an ACPI BGRT image into kernel memory
> 
> I'm not sure whether the kexec kernel would care about the BGRT?

I didn't know about BGRT before. It's the Boot Graphics Resource Table,
so it's only for boot-time use; I guess kexec can safely ignore it.

Thanks
Dave

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-11  6:24             ` Dave Young
@ 2013-10-11  7:41               ` Borislav Petkov
  2013-10-12  7:54                 ` Dave Young
  2013-10-11 10:27               ` Matt Fleming
  1 sibling, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-11  7:41 UTC (permalink / raw)
  To: Dave Young
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Fri, Oct 11, 2013 at 02:24:37PM +0800, Dave Young wrote:
> But with Boris's current implementation, getting the same mappings in
> different kernels depends on the memory descriptors (md) being in the
> same order, with the same start and size for each one. How about using
> this mapping solution but, for the kexec kernel, also passing the
> virtual mappings via setup_data? The only difference is that we would
> map only the non-boot regions and use the boot regions' sizes just to
> make sure the other regions are mapped at the same virtual addresses.

Actually, as hpa suggested, we will need to be passing the explicit
virtual addresses to the kexec kernel in case we change the mapping
algorithm in the future. So all should go through setup_data.
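
If explicit addresses are passed, the kexec'd kernel looks them up instead of recomputing them. setup_data is the existing x86 boot protocol mechanism for handing such blobs over. The record layout and helper below are hypothetical and only sketch what the receiving side could do with a table of (phys_addr, virt_addr, num_pages) entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12

/* Hypothetical record the first kernel could pass for each mapped region. */
struct sketch_efi_va_entry {
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
};

/*
 * Translate 'pa' using the table handed over by the previous kernel.
 * Returns 0 if 'pa' is not inside any recorded region.
 */
static uint64_t sketch_pa_to_va(const struct sketch_efi_va_entry *tbl, size_t n,
				uint64_t pa)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t size = tbl[i].num_pages << SKETCH_PAGE_SHIFT;

		if (pa >= tbl[i].phys_addr && pa - tbl[i].phys_addr < size)
			return tbl[i].virt_addr + (pa - tbl[i].phys_addr);
	}
	return 0;
}
```

Because the second kernel never recomputes the placement, a later change to the allocation algorithm would not break it. This is the motivation given above.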

> OTOH, if we only pass the ioremapped data without Boris's current
> patch, the problem I worry about is how we can ensure the addresses are
> not used by other code before we map them in the 2nd kernel's efi_init().

Right, the old method of mapping EFI runtime regions used ioremap and
mapped the regions in the same address space as other kernel mappings.
Now we have reserved a 64G range in the VA space, ending at -4G
(i.e. 0xffff_ffff_0000_0000), which is used only for EFI RT.
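
The window boundaries are plain 64-bit arithmetic. -4G is 2^64 - 2^32 = 0xffff_ffff_0000_0000, and 64G (2^36) below that is 0xffff_ffef_0000_0000. The following small sketch computes both bounds and checks whether an address falls in the window. The names are illustrative. The sketch treats -4G itself as outside the window, on the assumption that regions are placed strictly below it because efi_va is decremented before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GB	(1ULL << 30)

/* -4G, and 64G below it, as 64-bit unsigned virtual addresses. */
static const uint64_t sketch_efi_va_start = 0ULL - 4 * SKETCH_GB;
static const uint64_t sketch_efi_va_end   = 0ULL - 4 * SKETCH_GB - 64 * SKETCH_GB;

/* True if 'va' lies in [end, start), the range reserved for EFI RT. */
static bool sketch_in_efi_window(uint64_t va)
{
	return va >= sketch_efi_va_end && va < sketch_efi_va_start;
}
```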

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-11  6:24             ` Dave Young
  2013-10-11  7:41               ` Borislav Petkov
@ 2013-10-11 10:27               ` Matt Fleming
  2013-10-11 13:42                 ` Dave Young
  1 sibling, 1 reply; 102+ messages in thread
From: Matt Fleming @ 2013-10-11 10:27 UTC (permalink / raw)
  To: Dave Young
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Fri, 11 Oct, at 02:24:37PM, Dave Young wrote:
> As for the efi_reserve_boot_services() code, it's mainly there for the
> SetVirtualAddressMap() call, so boot regions should not be reused before
> SetVirtualAddressMap(). But the overlap happens before
> efi_reserve_boot_services() runs; isn't that a problem?

Hang on, which kernel are you referring to here? The boot kernel or the
kexec'd kernel? I thought you were saying you noticed the overlap when
running in the second (kexec'd) kernel?

The only reason that you would see this overlap in the first (boot)
kernel is if the bootloader messed up and allocated the kernel text as
EfiBootServicesCode/Data. I'd like to believe no bootloaders are still
doing that.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-11 10:27               ` Matt Fleming
@ 2013-10-11 13:42                 ` Dave Young
  2013-10-12  2:14                   ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-11 13:42 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

Matt,

The kernel I'm referring to is the boot kernel, aka the 1st kernel; the
bootloader is grub2 from Fedora 19.

[Sorry for top-posting; I'm using webmail.]


----- Original Message -----
From: "Matt Fleming" <matt@console-pimps.org>
To: "Dave Young" <dyoung@redhat.com>
Cc: "Borislav Petkov" <bp@alien8.de>, "X86 ML" <x86@kernel.org>, "LKML" <linux-kernel@vger.kernel.org>, "Borislav Petkov" <bp@suse.de>, "Matthew Garrett" <mjg59@srcf.ucam.org>, "H. Peter Anvin" <hpa@zytor.com>, "James Bottomley" <James.Bottomley@HansenPartnership.com>, "Vivek Goyal" <vgoyal@redhat.com>, linux-efi@vger.kernel.org, fwts-devel@lists.ubuntu.com
Sent: Friday, October 11, 2013 6:27:06 PM
Subject: Re: [PATCH 12/12] EFI: Runtime services virtual mapping

On Fri, 11 Oct, at 02:24:37PM, Dave Young wrote:
> As for the efi_reserve_boot_services() code, it's mainly there for the
> SetVirtualAddressMap() call, so boot regions should not be reused before
> SetVirtualAddressMap(). But the overlap happens before
> efi_reserve_boot_services() runs; isn't that a problem?

Hang on, which kernel are you referring to here? The boot kernel or the
kexec'd kernel? I thought you were saying you noticed the overlap when
running in the second (kexec'd) kernel?

The only reason that you would see this overlap in the first (boot)
kernel is if the bootloader messed up and allocated the kernel text as
EfiBootServicesCode/Data. I'd like to believe no bootloaders are still
doing that.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-11 13:42                 ` Dave Young
@ 2013-10-12  2:14                   ` Dave Young
  2013-10-14 15:57                     ` Peter Jones
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-12  2:14 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel, pjones

CCing Peter Jones. Peter, any idea about the grub-related problem?

On 10/11/13 at 09:42am, Dave Young wrote:
> Matt,
> 
> The kernel I'm referring to is the boot kernel, aka the 1st kernel; the
> bootloader is grub2 from Fedora 19.
> 
> [Sorry for top-posting; I'm using webmail.]
> 
> 
> ----- Original Message -----
> From: "Matt Fleming" <matt@console-pimps.org>
> To: "Dave Young" <dyoung@redhat.com>
> Cc: "Borislav Petkov" <bp@alien8.de>, "X86 ML" <x86@kernel.org>, "LKML" <linux-kernel@vger.kernel.org>, "Borislav Petkov" <bp@suse.de>, "Matthew Garrett" <mjg59@srcf.ucam.org>, "H. Peter Anvin" <hpa@zytor.com>, "James Bottomley" <James.Bottomley@HansenPartnership.com>, "Vivek Goyal" <vgoyal@redhat.com>, linux-efi@vger.kernel.org, fwts-devel@lists.ubuntu.com
> Sent: Friday, October 11, 2013 6:27:06 PM
> Subject: Re: [PATCH 12/12] EFI: Runtime services virtual mapping
> 
> On Fri, 11 Oct, at 02:24:37PM, Dave Young wrote:
> > As for the efi_reserve_boot_services() code, it's mainly there for the
> > SetVirtualAddressMap() call, so boot regions should not be reused before
> > SetVirtualAddressMap(). But the overlap happens before
> > efi_reserve_boot_services() runs; isn't that a problem?
> 
> Hang on, which kernel are you referring to here? The boot kernel or the
> kexec'd kernel? I thought you were saying you noticed the overlap when
> running in the second (kexec'd) kernel?
> 
> The only reason that you would see this overlap in the first (boot)
> kernel is if the bootloader messed up and allocated the kernel text as
> EfiBootServicesCode/Data. I'd like to believe no bootloaders are still
> doing that.
> 
> -- 
> Matt Fleming, Intel Open Source Technology Center
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-11  7:41               ` Borislav Petkov
@ 2013-10-12  7:54                 ` Dave Young
  2013-10-12 10:13                   ` Matt Fleming
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-12  7:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/11/13 at 09:41am, Borislav Petkov wrote:
> On Fri, Oct 11, 2013 at 02:24:37PM +0800, Dave Young wrote:
> > But for current implementation from Boris, getting same mapping
> > between different kernel depends on same md order (same start and
> > size for each one) How about using this mapping solution but at the
> > same time for kexec kernel we also pass the virtual mappings via
> > setup_data, only thing different is we only need map the non boot
> > region and just use the boot region size to ensure the other regions
> > are mapped with same virtual address.
> 
> Actually, as hpa suggested, we will need to be passing the explicit
> virtual addresses to the kexec kernel in case we change the mapping
> algorithm in the future. So all should go through setup_data.
> 
> > OTOH, if we only passing ioremapped data without Boris's current patch
> > the problem I worry about is how can we ensure the addresses are not
> > used by other code before we mapping the in 2nd kernel efi_init.
> 
> Right, the old method of mapping EFI runtime regions used ioremap and
> was mapping the regions in the same address space. Now we have reserved
> a 64G in the VA space ending at -4G (i.e. 0xffff_ffff_0000_0000) which
> is reserved only for EFI RT usage.

Boris:

For the boot service region overlapping problem I have another idea,
how about modify your mapping code to always mapping the RUNTIME region
(non boot service region) firstly from the efi_va, then mapping other
regions in order, in this way kexec 2nd kernel will be happy because
it does not call SetVirtualAddressMap and it does not need the boot
service area at all.

Thanks
Dave

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-12  7:54                 ` Dave Young
@ 2013-10-12 10:13                   ` Matt Fleming
  2013-10-12 10:30                     ` Borislav Petkov
  2013-10-13  3:06                     ` Dave Young
  0 siblings, 2 replies; 102+ messages in thread
From: Matt Fleming @ 2013-10-12 10:13 UTC (permalink / raw)
  To: Dave Young
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Sat, 12 Oct, at 03:54:44PM, Dave Young wrote:
> Boris:
> 
> For the boot service region overlapping problem I have another idea,
> how about modify your mapping code to always mapping the RUNTIME region
> (non boot service region) firstly from the efi_va, then mapping other
> regions in order, in this way kexec 2nd kernel will be happy because
> it does not call SetVirtualAddressMap and it does not need the boot
> service area at all.

Coalescing the runtime regions together implies that the second kernel
would care about the fragmentation caused by unmapping the boot service
regions - it shouldn't. We've sliced up a considerable chunk of kernel
virtual address space (64G) and fragmentation shouldn't be an issue
right now.

Even if we run out of address space in the future due to fragmentation,
and end up needing to coalesce runtime regions, this would be
transparent to the kexec kernel because it's passed the memmap entries
through setup_data.

Though we are defining an ABI around the EFI address range
(0xffffffef00000000 - 0xffffffff00000000), such that it needs to be the
same between kernels, we must not make the layout of regions within that
range part of the ABI. We need the freedom to change the layout in the
future.
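For concreteness, the window described above can be checked with a little arithmetic: -4G as a 64-bit address is 0xffffffff00000000, and 64G (0x1000000000) below that is 0xffffffef00000000. Below is a userspace C sketch of that arithmetic plus a bounds check for a region inside the window. The macro and function names are hypothetical and are not the ones used in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1G_SK        (1ULL << 30)
/* -4G wraps to 0xffffffff00000000 in unsigned 64-bit arithmetic. */
#define EFI_VA_END_SK   (0ULL - 4 * SZ_1G_SK)
#define EFI_VA_SIZE_SK  (64 * SZ_1G_SK)
#define EFI_VA_START_SK (EFI_VA_END_SK - EFI_VA_SIZE_SK)

static_assert(EFI_VA_END_SK == 0xffffffff00000000ULL, "window ends at -4G");
static_assert(EFI_VA_START_SK == 0xffffffef00000000ULL, "window is 64G");

/* True if [va, va + len) fits entirely inside the EFI window. Written to
 * avoid overflowing va + len near the top of the address space. */
static bool efi_va_range_ok(uint64_t va, uint64_t len)
{
	return va >= EFI_VA_START_SK && va <= EFI_VA_END_SK &&
	       len <= EFI_VA_END_SK - va;
}
```

Only the window bounds would be fixed between kernels; where each region sits inside it stays an implementation detail.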

-- 
Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-12 10:13                   ` Matt Fleming
@ 2013-10-12 10:30                     ` Borislav Petkov
  2013-10-13  3:11                       ` Dave Young
  2013-10-13  3:06                     ` Dave Young
  1 sibling, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-12 10:30 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Sat, Oct 12, 2013 at 11:13:08AM +0100, Matt Fleming wrote:
> On Sat, 12 Oct, at 03:54:44PM, Dave Young wrote:
> > Boris:
> > 
> > For the boot service region overlapping problem I have another idea,
> > how about modify your mapping code to always mapping the RUNTIME region
> > (non boot service region) firstly from the efi_va, then mapping other
> > regions in order, in this way kexec 2nd kernel will be happy because
> > it does not call SetVirtualAddressMap and it does not need the boot
> > service area at all.
> 
> Coalescing the runtime regions together implies that the second kernel
> would care about the fragmentation caused by unmapping the boot service
> regions - it shouldn't. We've sliced up a considerable chunk of kernel
> virtual address space (64G) and fragmentation shouldn't be an issue
> right now.
> 
> Even if we run out of address space in the future due to fragmentation,
> and end up needing to coalesce runtime regions, this would be
> transparent to the kexec kernel because it's passed the memmap entries
> through setup_data.
> 
> Though we are defining an ABI around the EFI address range
> (0xffffffef00000000 - 0xffffffff00000000), such that it needs to be the
> same between kernels, we must not make the layout of regions within that
> range part of the ABI. We need the freedom to change the layout in the
> future.

Basically, to sum up what Matt so eloquently explained, we will be
passing all the runtime regions *but* *not* the boot regions (because
the kexec kernel doesn't need them anyway) through setup_data to the
kexec kernel.

I.e., boot services regions is a dont-care for kexec.

And it is very important to restate that we want to reserve ourselves
the most flexible way of passing regions to the kexec kernel in case we
want to change the mapping algorithm in the future. Therefore, kexec
should simply not know anything about the VA layout of the EFI regions
but will get them spelled out through the boot header's setup_data.
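Here is a minimal userspace sketch of the filtering step described above. It assumes the payload is simply the runtime descriptors with the first kernel's VAs already filled in. struct md_sk is a simplified stand-in for efi_memory_desc_t, and collect_runtime_regions() is a hypothetical helper. The format of the setup_data entry is not settled in this thread. Runtime regions are identified by the EFI_MEMORY_RUNTIME attribute bit (bit 63 in the UEFI specification).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* EFI_MEMORY_RUNTIME: bit 63 of the descriptor attribute field. */
#define EFI_MEMORY_RUNTIME_SK (1ULL << 63)

/* Simplified, hypothetical stand-in for efi_memory_desc_t. */
struct md_sk {
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

/* Copy only runtime regions, with the VAs the first kernel actually used,
 * into out[]. The kexec kernel would then map each entry at exactly
 * virt_addr and never re-derive the layout itself. Returns the number of
 * entries written. */
static size_t collect_runtime_regions(const struct md_sk *map, size_t n,
				      struct md_sk *out, size_t out_cap)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(map[i].attribute & EFI_MEMORY_RUNTIME_SK))
			continue;	/* boot services regions: don't care */
		assert(count < out_cap);
		out[count++] = map[i];
	}
	return count;
}
```

Because the kexec kernel only consumes explicit (phys, virt) pairs, the first kernel stays free to change how it picks those VAs.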

This is the picture so far, AFAICT. Matt, please make a lot of noise if
I've misrepresented anything.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-12 10:13                   ` Matt Fleming
  2013-10-12 10:30                     ` Borislav Petkov
@ 2013-10-13  3:06                     ` Dave Young
  1 sibling, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-10-13  3:06 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Borislav Petkov, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Sat, Oct 12, 2013 at 11:13:08AM +0100, Matt Fleming wrote:
> On Sat, 12 Oct, at 03:54:44PM, Dave Young wrote:
> > Boris:
> > 
> > For the boot service region overlapping problem I have another idea,
> > how about modify your mapping code to always mapping the RUNTIME region
> > (non boot service region) firstly from the efi_va, then mapping other
> > regions in order, in this way kexec 2nd kernel will be happy because
> > it does not call SetVirtualAddressMap and it does not need the boot
> > service area at all.
> 
> Coalescing the runtime regions together implies that the second kernel
> would care about the fragmentation caused by unmapping the boot service
> regions - it shouldn't. We've sliced up a considerable chunk of kernel
> virtual address space (64G) and fragmentation shouldn't be an issue
> right now.
> 
> Even if we run out of address space in the future due to fragmentation,
> and end up needing to coalesce runtime regions, this would be
> transparent to the kexec kernel because it's passed the memmap entries
> through setup_data.

Ok, so passing setup_data looks better like hpa said previously.

> 
> Though we are defining an ABI around the EFI address range
> (0xffffffef00000000 - 0xffffffff00000000), such that it needs to be the
> same between kernels, we must not make the layout of regions within that
> range part of the ABI. We need the freedom to change the layout in the
> future.

Agree.

> 
> -- 
> Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-12 10:30                     ` Borislav Petkov
@ 2013-10-13  3:11                       ` Dave Young
  2013-10-13  9:25                         ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-13  3:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Sat, Oct 12, 2013 at 12:30:55PM +0200, Borislav Petkov wrote:
> On Sat, Oct 12, 2013 at 11:13:08AM +0100, Matt Fleming wrote:
> > On Sat, 12 Oct, at 03:54:44PM, Dave Young wrote:
> > > Boris:
> > > 
> > > For the boot service region overlapping problem I have another idea,
> > > how about modify your mapping code to always mapping the RUNTIME region
> > > (non boot service region) firstly from the efi_va, then mapping other
> > > regions in order, in this way kexec 2nd kernel will be happy because
> > > it does not call SetVirtualAddressMap and it does not need the boot
> > > service area at all.
> > 
> > Coalescing the runtime regions together implies that the second kernel
> > would care about the fragmentation caused by unmapping the boot service
> > regions - it shouldn't. We've sliced up a considerable chunk of kernel
> > virtual address space (64G) and fragmentation shouldn't be an issue
> > right now.
> > 
> > Even if we run out of address space in the future due to fragmentation,
> > and end up needing to coalesce runtime regions, this would be
> > transparent to the kexec kernel because it's passed the memmap entries
> > through setup_data.
> > 
> > Though we are defining an ABI around the EFI address range
> > (0xffffffef00000000 - 0xffffffff00000000), such that it needs to be the
> > same between kernels, we must not make the layout of regions within that
> > range part of the ABI. We need the freedom to change the layout in the
> > future.
> 
> Basically, to sum up what Matt so eloquently explained, we will be
> passing all the runtime regions *but* *not* the boot regions (because
> the kexec kernel doesn't need them anyway) through setup_data to the
> kexec kernel.
> 
> I.e., boot services regions is a dont-care for kexec.
> 
> And it is very important to restate that we want to reserve ourselves
> the most flexible way of passing regions to the kexec kernel in case we
> want to change the mapping algorithm in the future. Therefore, kexec
> should simply not know anything about the VA layout of the EFI regions
> but will get them spelled out through the boot header's setup_data.

Boris, I think we have got the agreement about passing setup_data?
I think it should be on top of your patch series, I can work on that along
with other kexec related patches. Or if you would like to do it please let
me know.

> 
> This is the picture so far, AFAICT. Matt, please make a lot of noise if
> I've misrepresented anything.
> 
> Thanks.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-13  3:11                       ` Dave Young
@ 2013-10-13  9:25                         ` Borislav Petkov
  2013-10-14 15:58                           ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-13  9:25 UTC (permalink / raw)
  To: Dave Young
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Sun, Oct 13, 2013 at 11:11:27AM +0800, Dave Young wrote:
> Boris, I think we have got the agreement about passing setup_data?

Yes.

Basically, we want to start with what hpa suggested and see where it
gets us:

http://marc.info/?l=linux-kernel&m=138006799131051

> I think it should be on top of your patch series,

Yep.

> I can work on that along with other kexec related patches. Or if you
> would like to do it please let me know.

Absolutely, please feel free to do so - it's not like I don't have
anything else to do :-)

In the meantime, I'll finish randconfigs testing of the patches and
upload the latest version to k-org, I'll let you know.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 00/11] EFI runtime services virtual mapping
  2013-10-08 16:45 ` Borislav Petkov
  2013-10-08 16:47   ` [PATCH 11/12] efi: Add an efi= kernel command line parameter Borislav Petkov
  2013-10-08 16:48   ` [PATCH 12/12] EFI: Runtime services virtual mapping Borislav Petkov
@ 2013-10-14 13:04   ` Matt Fleming
  2 siblings, 0 replies; 102+ messages in thread
From: Matt Fleming @ 2013-10-14 13:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matthew Garrett, H. Peter Anvin,
	James Bottomley, Vivek Goyal, Dave Young, linux-efi, fwts-devel

On Tue, 08 Oct, at 06:45:51PM, Borislav Petkov wrote:
> @@ -141,34 +151,75 @@ static long efi_runtime_ioctl(struct file *file, unsigned int cmd,
>  			return -EFAULT;
>  
>  		convert_from_guid(&vendor, &vendor_guid);
> -		status = efi.get_variable(pgetvariable->VariableName, &vendor,
> -					&attr, &datasize, pgetvariable->Data);
> +
> +		vardata = kmalloc(datasize, GFP_KERNEL);
> +		if (!vardata)
> +			return -ENOMEM;
> +
> +		namelen = ucs2_strsize(pgetvariable->VariableName, 1024);
> +
> +		varname = kmalloc(namelen, GFP_KERNEL);
> +		if (!varname)
> +			return -ENOMEM;
> +
> +		if (copy_from_user(varname, pgetvariable->VariableName, namelen))
> +			return -EFAULT;
> +

		varname = kmalloc(namelen + 1, GFP_KERNEL);
		varname[namelen] = 0;

Note that ucs2_strsize() doesn't count the terminating NUL.
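ucs2_strsize() returns a byte count, and a UCS-2 terminator is a full 2-byte character. So if varname is meant to be an efi_char16_t buffer, the terminated copy needs room for a whole character, and the terminator index is the byte count divided by the character size. The following userspace sketch works under that assumption. ucs2_char_sk stands in for efi_char16_t. ucs2_strsize_sk() is modelled on the kernel's ucs2_strsize(), which is assumed here to take its limit in bytes. ucs2_dup_sk() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t ucs2_char_sk;	/* stand-in for efi_char16_t */

/* Byte size of a UCS-2 string, excluding the terminating NUL, scanning at
 * most maxbytes bytes. */
static size_t ucs2_strsize_sk(const ucs2_char_sk *s, size_t maxbytes)
{
	size_t len = 0, maxlen = maxbytes / sizeof(*s);

	while (len < maxlen && s[len])
		len++;
	return len * sizeof(*s);
}

/* Duplicate a UCS-2 string into a NUL-terminated buffer. The allocation
 * grows by one whole character, and the terminator goes at the character
 * index namelen / sizeof(*buf), since namelen is in bytes. */
static ucs2_char_sk *ucs2_dup_sk(const ucs2_char_sk *src, size_t maxbytes)
{
	size_t namelen;
	ucs2_char_sk *buf;

	assert(src != NULL);
	namelen = ucs2_strsize_sk(src, maxbytes);
	buf = malloc(namelen + sizeof(*buf));
	if (!buf)
		return NULL;
	memcpy(buf, src, namelen);
	buf[namelen / sizeof(*buf)] = 0;
	return buf;
}
```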

-- 
Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-12  2:14                   ` Dave Young
@ 2013-10-14 15:57                     ` Peter Jones
  2013-10-16  6:27                       ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Peter Jones @ 2013-10-14 15:57 UTC (permalink / raw)
  To: Dave Young
  Cc: Matt Fleming, Borislav Petkov, X86 ML, LKML, Borislav Petkov,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	linux-efi, fwts-devel

On Sat, Oct 12, 2013 at 10:14:39AM +0800, Dave Young wrote:
> CCing Peter Jones .., Peter, any idea about the grub related problem?

What grub problem?  As Matt was saying, grub2 isn't loading it as
EfiBootServicesCode/Data.  grub2 is loading it as EfiLoaderData .

> 
> On 10/11/13 at 09:42am, Dave Young wrote:
> > Matt,
> > 
> > The kernel I referring is the boot kernel aka the 1st kernel,
> > the boot loader is grub2 from Fedora 19.
> > 
> > [sorry for top reply because of using webmail]
> > 
> > 
> > ----- Original Message -----
> > From: "Matt Fleming" <matt@console-pimps.org>
> > To: "Dave Young" <dyoung@redhat.com>
> > Cc: "Borislav Petkov" <bp@alien8.de>, "X86 ML" <x86@kernel.org>, "LKML" <linux-kernel@vger.kernel.org>, "Borislav Petkov" <bp@suse.de>, "Matthew Garrett" <mjg59@srcf.ucam.org>, "H. Peter Anvin" <hpa@zytor.com>, "James Bottomley" <James.Bottomley@HansenPartnership.com>, "Vivek Goyal" <vgoyal@redhat.com>, linux-efi@vger.kernel.org, fwts-devel@lists.ubuntu.com
> > Sent: Friday, October 11, 2013 6:27:06 PM
> > Subject: Re: [PATCH 12/12] EFI: Runtime services virtual mapping
> > 
> > On Fri, 11 Oct, at 02:24:37PM, Dave Young wrote:
> > > For the boot efi_reserve_boot_services code, it's mainly for the
> > > SetVirtualAddressMap callback use, so boot regions should not be reused
> > > before SetVirtualAddressMap, but the overlapping happens before the
> > > efi_reserve_boot_services, isn't it a problem?
> > 
> > Hang on, which kernel are you referring to here? The boot kernel or the
> > kexec'd kernel? I thought you were saying you noticed the overlap when
> > running in the second (kexec'd) kernel?
> > 
> > The only reason that you would see this overlap in the first (boot)
> > kernel is if the bootloader messed up and allocated the kernel text as
> > EfiBootServicesCode/Data. I'd like to believe no bootloaders are still
> > doing that.
> > 
> > -- 
> > Matt Fleming, Intel Open Source Technology Center
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

-- 
        Peter

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-13  9:25                         ` Borislav Petkov
@ 2013-10-14 15:58                           ` Borislav Petkov
  2013-10-21 12:47                             ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-14 15:58 UTC (permalink / raw)
  To: Dave Young
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Sun, Oct 13, 2013 at 11:25:21AM +0200, Borislav Petkov wrote:
> In the meantime, I'll finish randconfigs testing of the patches and
> upload the latest version to k-org, I'll let you know.

Ok, here it is:

git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git#efi

This version seems to work on most boxes except Matt's Asus half-life
zombie.

Let me know if there are issues,

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-14 15:57                     ` Peter Jones
@ 2013-10-16  6:27                       ` Dave Young
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-10-16  6:27 UTC (permalink / raw)
  To: Peter Jones
  Cc: Matt Fleming, Borislav Petkov, X86 ML, LKML, Borislav Petkov,
	Matthew Garrett, H. Peter Anvin, James Bottomley, Vivek Goyal,
	linux-efi, fwts-devel

On 10/14/13 at 11:57am, Peter Jones wrote:
> On Sat, Oct 12, 2013 at 10:14:39AM +0800, Dave Young wrote:
> > CCing Peter Jones .., Peter, any idea about the grub related problem?
> 
> What grub problem?  As Matt was saying, grub2 isn't loading it as
> EfiBootServicesCode/Data.  grub2 is loading it as EfiLoaderData .

Today I did printk debug, it is in fact an off by one bug:
text start: 1000000 md start: 800000 md size: 800000

Below is the code:
                if ((start+size >= __pa_symbol(_text)
                                && start <= __pa_symbol(_end)) ||
                        !e820_all_mapped(start, start+size, E820_RAM) ||
                        memblock_is_region_reserved(start, size)) {
                        /* Could not reserve, skip it */

Will post a patch to fix it.
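The debug output shows why the check misfires. start + size = 0x800000 + 0x800000 = 0x1000000, which is exactly _text, so "start+size >= __pa_symbol(_text)" reports an overlap for a region that only touches the kernel image. Treating both ranges as half-open intervals avoids the problem. The sketch below assumes _end marks one past the last byte of the image. ranges_overlap() is a hypothetical helper, and the actual fix is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open intervals [a_start, a_end) and [b_start, b_end) overlap iff
 * each one starts before the other ends. A region ending exactly where
 * the kernel text starts therefore does not overlap it. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
			   uint64_t b_start, uint64_t b_end)
{
	assert(a_start <= a_end && b_start <= b_end);
	return a_start < b_end && b_start < a_end;
}
```

Applied to the quoted condition, this would amount to start + size > __pa_symbol(_text) && start < __pa_symbol(_end).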

> 
> > 
> > On 10/11/13 at 09:42am, Dave Young wrote:
> > > Matt,
> > > 
> > > The kernel I referring is the boot kernel aka the 1st kernel,
> > > the boot loader is grub2 from Fedora 19.
> > > 
> > > [sorry for top reply because of using webmail]
> > > 
> > > 
> > > ----- Original Message -----
> > > From: "Matt Fleming" <matt@console-pimps.org>
> > > To: "Dave Young" <dyoung@redhat.com>
> > > Cc: "Borislav Petkov" <bp@alien8.de>, "X86 ML" <x86@kernel.org>, "LKML" <linux-kernel@vger.kernel.org>, "Borislav Petkov" <bp@suse.de>, "Matthew Garrett" <mjg59@srcf.ucam.org>, "H. Peter Anvin" <hpa@zytor.com>, "James Bottomley" <James.Bottomley@HansenPartnership.com>, "Vivek Goyal" <vgoyal@redhat.com>, linux-efi@vger.kernel.org, fwts-devel@lists.ubuntu.com
> > > Sent: Friday, October 11, 2013 6:27:06 PM
> > > Subject: Re: [PATCH 12/12] EFI: Runtime services virtual mapping
> > > 
> > > On Fri, 11 Oct, at 02:24:37PM, Dave Young wrote:
> > > > For the boot efi_reserve_boot_services code, it's mainly for the
> > > > SetVirtualAddressMap callback use, so boot regions should not be reused
> > > > before SetVirtualAddressMap, but the overlapping happens before the
> > > > efi_reserve_boot_services, isn't it a problem?
> > > 
> > > Hang on, which kernel are you referring to here? The boot kernel or the
> > > kexec'd kernel? I thought you were saying you noticed the overlap when
> > > running in the second (kexec'd) kernel?
> > > 
> > > The only reason that you would see this overlap in the first (boot)
> > > kernel is if the bootloader messed up and allocated the kernel text as
> > > EfiBootServicesCode/Data. I'd like to believe no bootloaders are still
> > > doing that.
> > > 
> > > -- 
> > > Matt Fleming, Intel Open Source Technology Center
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
>         Peter

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-14 15:58                           ` Borislav Petkov
@ 2013-10-21 12:47                             ` Dave Young
  2013-10-21 13:37                               ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-21 12:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/14/13 at 05:58pm, Borislav Petkov wrote:
> On Sun, Oct 13, 2013 at 11:25:21AM +0200, Borislav Petkov wrote:
> > In the meantime, I'll finish randconfigs testing of the patches and
> > upload the latest version to k-org, I'll let you know.
> 
> Ok, here it is:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git#efi
> 
> This version seems to work on most boxes except Matt's Asus half-life
> zombie.

What's the status of this series?

> 
> Let me know if there are issues,

I need below patch for mapping to fixed virt addr passed
from 1st kernel. Would you like to add it to your series
or I send out it later?

BTW, what tree should my patches based on? Matt's next tree?
Looks like your tree is not consistent with Matt's tree.

--

Add function efi_map_region_fixed for mapping to a specific
virt address. 

Signed-off-by: Dave Young <dyoung@redhat.com>
---
 arch/x86/platform/efi/efi.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- bp.orig/arch/x86/platform/efi/efi.c
+++ bp/arch/x86/platform/efi/efi.c
@@ -1086,6 +1086,19 @@ static void efi_merge_regions(void)
 	}
 }
 
+void __init efi_map_region_fixed(efi_memory_desc_t *md, u64 virt_addr)
+{
+	pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+	unsigned long pf = 0;
+
+	if (!(md->attribute & EFI_MEMORY_WB))
+		pf |= _PAGE_PCD;
+
	if (kernel_map_pages_in_pgd(pgd, md->phys_addr, virt_addr, md->num_pages, pf))
+		pr_warning("Error mapping PA 0x%llx -> VA 0x%llx!\n",
+			   md->phys_addr, virt_addr);
+}
+
 /*
  * Map efi memory ranges for runtime serivce
  * Return the new memmap with updated virtual addrresses.

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-21 12:47                             ` Dave Young
@ 2013-10-21 13:37                               ` Borislav Petkov
  2013-10-21 15:04                                 ` Dave Young
  2013-10-26 15:50                                 ` Matt Fleming
  0 siblings, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-10-21 13:37 UTC (permalink / raw)
  To: Dave Young
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Mon, Oct 21, 2013 at 08:47:39PM +0800, Dave Young wrote:
> What's the status of this series?

They should appear at some point in Matt's efi-next branch, I think.

> I need below patch for mapping to fixed virt addr passed
> from 1st kernel.

You need this to map the runtime regions in the kexec kernel, right?
Please write that in the commit message.

> Would you like to add it to your series or I send out it later?

Yeah, just add it to your patchset.

> BTW, what tree should my patches based on? Matt's next tree?

Yeah, I think efi-next. Matt?

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-21 13:37                               ` Borislav Petkov
@ 2013-10-21 15:04                                 ` Dave Young
  2013-10-22 11:18                                   ` Borislav Petkov
  2013-10-26 15:50                                 ` Matt Fleming
  1 sibling, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-21 15:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/21/13 at 03:37pm, Borislav Petkov wrote:
> On Mon, Oct 21, 2013 at 08:47:39PM +0800, Dave Young wrote:
> > What's the status of this series?
> 
> They should appear at some point in Matt's efi-next branch, I think.
> 
> > I need below patch for mapping to fixed virt addr passed
> > from 1st kernel.
> 
> You need this to map the runtime regions in the kexec kernel, right?
> Please write that in the commit message.

Yes, will do
> 
> > Would you like to add it to your series or I send out it later?
> 
> Yeah, just add it to your patchset.

Ok.

> 
> > BTW, what tree should my patches based on? Matt's next tree?
> 
> Yeah, I think efi-next. Matt?

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-21 15:04                                 ` Dave Young
@ 2013-10-22 11:18                                   ` Borislav Petkov
  2013-10-23  2:17                                     ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-22 11:18 UTC (permalink / raw)
  To: Dave Young
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Mon, Oct 21, 2013 at 11:04:26PM +0800, Dave Young wrote:
> > You need this to map the runtime regions in the kexec kernel, right?
> > Please write that in the commit message.
> 
> Yes, will do

Ok, but but, why doesn't the normal code path in efi_enter_virtual_mode
work anymore? I mean, why do you need another function instead of doing
what you did previously:

	if (!kexec)
		phys_efi_set_virtual_address_map(...)

The path up to here does the mapping already anyway so you only need to
do the mapping in the kexec kernel and skip the set_virtual_map thing.

Thanks.

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-22 11:18                                   ` Borislav Petkov
@ 2013-10-23  2:17                                     ` Dave Young
  2013-10-23 12:25                                       ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-23  2:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/22/13 at 01:18pm, Borislav Petkov wrote:
> On Mon, Oct 21, 2013 at 11:04:26PM +0800, Dave Young wrote:
> > > You need this to map the runtime regions in the kexec kernel, right?
> > > Please write that in the commit message.
> > 
> > Yes, will do
> 
> Ok, but but, why doesn't the normal code path in efi_enter_virtual_mode
> work anymore? I mean, why do you need another function instead of doing
> what you did previously:
> 
> 	if (!kexec)
> 		phys_efi_set_virtual_address_map(...)
> 
> The path up to here does the mapping already anyway so you only need to
> do the mapping in the kexec kernel and skip the set_virtual_map thing.

Hi,

The reason is that I only pass runtime regions from 1st kernel to kexec
kernel, your efi mapping function uses the region size to determine the
virtual address from top to down. Because the passed-in md ranges in kexec
kernel are different from ranges booting from firmware so the virtual address
will be different.

Even if I pass the whole untouched ranges including BOOT_SERVICE there's still
chance the function for reserving boot regions overwrite the boot region
size to 0, and 1st kernel will leave it to be used as normal memory after efi init.
I think we have talked about this issue previously.
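A simplified model of top-down assignment reproduces the effect described above. Each region takes its size worth of VA immediately below the previous one, so a region's VA depends on the sizes of every region mapped before it. Dropping the boot services regions from the map therefore shifts every later runtime region. This is a hypothetical sketch: topdown_va() is not the patch's function, and it ignores any alignment the real code may apply. It only illustrates why the VAs need to be passed explicitly rather than recomputed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_SK 12
/* Top of the EFI window (-4G), as quoted earlier in the thread. */
#define EFI_VA_END_SK 0xffffffff00000000ULL

/* Hypothetical top-down allocator: walking the map in order, each region
 * takes num_pages worth of VA directly below the previous one. Returns
 * the VA assigned to region idx. */
static uint64_t topdown_va(const uint64_t *num_pages, size_t n, size_t idx)
{
	uint64_t va = EFI_VA_END_SK;

	assert(idx < n);
	for (size_t i = 0; i <= idx; i++)
		va -= num_pages[i] << PAGE_SHIFT_SK;
	return va;
}
```

Consider a firmware map of runtime (4 pages), boot services (256 pages), runtime (8 pages). In the first kernel, the last runtime region lands at 0xffffffff00000000 - 0x10c000. If the kexec kernel recomputes from the two runtime regions alone, the same region lands at 0xffffffff00000000 - 0xc000.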

Thanks
Dave


> 
> Thanks.

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-23  2:17                                     ` Dave Young
@ 2013-10-23 12:25                                       ` Borislav Petkov
  2013-10-23 12:37                                         ` Matthew Garrett
  2013-10-23 12:51                                         ` Dave Young
  0 siblings, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-10-23 12:25 UTC (permalink / raw)
  To: Dave Young
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Wed, Oct 23, 2013 at 10:17:31AM +0800, Dave Young wrote:
> The reason is that I only pass runtime regions from 1st kernel to
> kexec kernel, your efi mapping function uses the region size to
> determin the virtual address from top to down. Because the passed-in
> md ranges in kexec kernel are different from ranges booting from
> firmware so the virtual address will be different.

Well, this shouldn't be because SetVirtualAddressMap has already fixed
the virtual addresses for us. And if they're different, then runtime
services won't work anyway. Or am I missing something...?

> Even if I pass the whole untouched ranges including BOOT_SERVICE there's
> still chance the function for reserving boot regions overwrite the
> boot region size to 0, and 1st kernel will leave it to be used as
> normal memory after efi init. I think we have talked about this issue
> previously.

Matt, didn't you question the need to keep boot services regions
mapped indefinitely? What was the story there?

Thanks.

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-23 12:25                                       ` Borislav Petkov
@ 2013-10-23 12:37                                         ` Matthew Garrett
  2013-10-23 12:51                                         ` Dave Young
  1 sibling, 0 replies; 102+ messages in thread
From: Matthew Garrett @ 2013-10-23 12:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, Matt Fleming, X86 ML, LKML, Borislav Petkov,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Wed, Oct 23, 2013 at 02:25:31PM +0200, Borislav Petkov wrote:

> Matt, didn't you question the need to keep boot services regions
> mapped indefinitely? What was the story there?

We shouldn't need boot services regions to be mapped after 
SetVirtualAddressMap is called.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-23 12:25                                       ` Borislav Petkov
  2013-10-23 12:37                                         ` Matthew Garrett
@ 2013-10-23 12:51                                         ` Dave Young
  2013-10-23 13:11                                           ` Borislav Petkov
  1 sibling, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-23 12:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/23/13 at 02:25pm, Borislav Petkov wrote:
> On Wed, Oct 23, 2013 at 10:17:31AM +0800, Dave Young wrote:
> > The reason is that I only pass runtime regions from the 1st kernel to
> > the kexec kernel, and your EFI mapping function uses the region sizes
> > to determine the virtual addresses top-down. Because the md ranges
> > passed in to the kexec kernel differ from the ranges seen when booting
> > from firmware, the virtual addresses will be different.
> 
> Well, this shouldn't happen, because SetVirtualAddressMap has already
> fixed the virtual addresses for us. And if they're different, then
> runtime services won't work anyway. Or am I missing something...?

Maybe I did not explain it clearly enough.
Say the first kernel maps the regions below:
Region A (boot service): phys_start_a size_a -> virt_start_a size_a
Region B (runtime):      phys_start_b size_b -> virt_start_b size_b

I will pass region B to the 2nd kernel as
(phys_start_b, size_b, virt_start_b)

In the kexec'd 2nd kernel, phys_start_b needs to be mapped to virt_start_b.
Simply using efi_map_region from your patch does not work, because it
will map phys_start_b to a different virtual address, won't it?

So I simply need to map according to the mapping address passed in by kexec.
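The fixed mapping described above can be sketched as follows, assuming the 1st kernel hands over (phys_start, size, virt_start) triples as in the example. struct passed_region, struct mapping, map_fixed() and phys_to_virt_fixed() are hypothetical names; a real kernel would populate page tables rather than fill an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record handed from the 1st kernel to the kexec'd one:
 * a runtime region plus the VA it already received in the 1st kernel
 * via SetVirtualAddressMap(). */
struct passed_region {
	uint64_t phys_start;
	uint64_t size;
	uint64_t virt_start;
};

/* Hypothetical result of mapping one region. */
struct mapping {
	uint64_t phys;
	uint64_t virt;
	uint64_t size;
};

/* Map every passed-in region at the VA it was given before, instead of
 * allocating a fresh top-down VA. Returns the number of mappings. */
static size_t map_fixed(const struct passed_region *in, size_t n,
			struct mapping *out)
{
	for (size_t i = 0; i < n; i++) {
		out[i].phys = in[i].phys_start;
		out[i].virt = in[i].virt_start;	/* reuse, don't reallocate */
		out[i].size = in[i].size;
	}
	return n;
}

/* Translate a physical address covered by one of the mappings to its
 * VA, or return 0 if no mapping covers it. */
static uint64_t phys_to_virt_fixed(const struct mapping *m, size_t n,
				   uint64_t pa)
{
	for (size_t i = 0; i < n; i++)
		if (pa >= m[i].phys && pa - m[i].phys < m[i].size)
			return m[i].virt + (pa - m[i].phys);
	return 0;
}
```

Because the VA comes from the 1st kernel rather than from an allocator, the firmware's already fixed-up pointers stay valid in the 2nd kernel.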

Thanks
Dave

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-23 12:51                                         ` Dave Young
@ 2013-10-23 13:11                                           ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-10-23 13:11 UTC (permalink / raw)
  To: Dave Young
  Cc: Matt Fleming, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Wed, Oct 23, 2013 at 08:51:31PM +0800, Dave Young wrote:
> In the kexec'd 2nd kernel, phys_start_b needs to be mapped to virt_start_b.
> Simply using efi_map_region from your patch does not work, because it
> will map phys_start_b to a different virtual address, won't it?

Oh ok, in the second kernel we're not mapping *all* regions we do map in
the first kernel, right.

> So I simply need to map according to the mapping address passed in by kexec.

Yes, thanks for elaborating.

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-21 13:37                               ` Borislav Petkov
  2013-10-21 15:04                                 ` Dave Young
@ 2013-10-26 15:50                                 ` Matt Fleming
  1 sibling, 0 replies; 102+ messages in thread
From: Matt Fleming @ 2013-10-26 15:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, X86 ML, LKML, Borislav Petkov, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Mon, 21 Oct, at 03:37:41PM, Borislav Petkov wrote:
> On Mon, Oct 21, 2013 at 08:47:39PM +0800, Dave Young wrote:
> > What's the status of this series?
> 
> They should appear at some point in Matt's efi-next branch, I think.
> 
> > I need the patch below for mapping to the fixed virtual addresses
> > passed from the 1st kernel.
> 
> You need this to map the runtime regions in the kexec kernel, right?
> Please write that in the commit message.
> 
> > Would you like to add it to your series, or shall I send it out later?
> 
> Yeah, just add it to your patchset.
> 
> > BTW, what tree should my patches be based on? Matt's next tree?
> 
> Yeah, I think efi-next. Matt?

Yes please.

-- 
Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 11/12] efi: Add an efi= kernel command line parameter
  2013-10-08 16:47   ` [PATCH 11/12] efi: Add an efi= kernel command line parameter Borislav Petkov
@ 2013-10-28 11:02     ` Matt Fleming
  2013-10-28 11:10       ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Matt Fleming @ 2013-10-28 11:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matthew Garrett, H. Peter Anvin,
	James Bottomley, Vivek Goyal, Dave Young, linux-efi, fwts-devel

On Tue, 08 Oct, at 06:47:02PM, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> ... for passing miscellaneous options and chicken bits from the command
> line.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/platform/efi/efi.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
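A minimal userspace sketch of parsing such an option. It assumes, which the patch description does not state, that the value of efi= is a comma-separated list; efi_facility and parse_efi_opts() are hypothetical names, and bit 6 is the EFI_OLD_MEMMAP value from PATCH 12. An in-kernel version would instead be registered as a boot parameter handler.

```c
#include <stddef.h>
#include <string.h>

#define EFI_OLD_MEMMAP 6	/* bit number used in PATCH 12 */

/* Hypothetical stand-in for the x86 EFI facility bitmap. */
static unsigned long efi_facility;

/* Parse the value of "efi=", assumed here to be a comma-separated
 * list such as "old_map" or "foo,old_map". Unknown options are ignored. */
static void parse_efi_opts(const char *str)
{
	while (str && *str) {
		const char *end = strchr(str, ',');
		size_t len = end ? (size_t)(end - str) : strlen(str);

		/* Require an exact match so "old_mapx" is not accepted. */
		if (len == strlen("old_map") && strncmp(str, "old_map", len) == 0)
			efi_facility |= 1UL << EFI_OLD_MEMMAP;

		str = end ? end + 1 : NULL;
	}
}
```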

This patch should be part of PATCH 12.

-- 
Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 11/12] efi: Add an efi= kernel command line parameter
  2013-10-28 11:02     ` Matt Fleming
@ 2013-10-28 11:10       ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-10-28 11:10 UTC (permalink / raw)
  To: Matt Fleming
  Cc: X86 ML, LKML, Borislav Petkov, Matthew Garrett, H. Peter Anvin,
	James Bottomley, Vivek Goyal, Dave Young, linux-efi, fwts-devel

On Mon, Oct 28, 2013 at 11:02:13AM +0000, Matt Fleming wrote:
> This patch should be part of PATCH 12.

I wanted it to be separate as it adds an unrelated functionality but I
don't really care all that much - I'll merge it.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-08 16:48   ` [PATCH 12/12] EFI: Runtime services virtual mapping Borislav Petkov
  2013-10-10  8:06     ` Dave Young
@ 2013-10-28 11:22     ` Matt Fleming
  2013-10-28 16:00       ` Borislav Petkov
  2013-10-29  6:47     ` Dave Young
  2 siblings, 1 reply; 102+ messages in thread
From: Matt Fleming @ 2013-10-28 11:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matthew Garrett, H. Peter Anvin,
	James Bottomley, Vivek Goyal, Dave Young, linux-efi, fwts-devel

On Tue, 08 Oct, at 06:48:31PM, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> 
> We map the EFI regions needed for runtime services contiguously on
> virtual addresses starting from -4G down for a total max space of 64G.
> This way, we provide for stable runtime services addresses across
> kernels so that a kexec'd kernel can still use them.
> 
> Also, they're mapped in a separate pagetable so that we don't
> pollute the kernel address space (you can see how the whole ioremapping
> and saving and restoring of PGDs is gone now).
> 
> Also, add a chicken bit called "efi=old_map" which can be used as a
> fallback to the old runtime services mapping method in case there's some
> b0rkage with a particular EFI implementation (haha, it is hard to hold
> back the sarcasm here...).
> 
> Add UEFI RT VA space to Documentation/x86/x86_64/mm.txt, while at it.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  Documentation/x86/x86_64/mm.txt      |  7 +++
>  arch/x86/include/asm/efi.h           | 47 ++++++++++++-------
>  arch/x86/include/asm/pgtable_types.h |  3 +-
>  arch/x86/platform/efi/efi.c          | 91 ++++++++++++++++++++++++++----------
>  arch/x86/platform/efi/efi_32.c       |  8 +++-
>  arch/x86/platform/efi/efi_64.c       | 83 ++++++++++++++++++++++++++++++++
>  arch/x86/platform/efi/efi_stub_64.S  | 54 +++++++++++++++++++++
>  include/linux/efi.h                  |  1 +
>  8 files changed, 251 insertions(+), 43 deletions(-)
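The numbers in the description work out as follows: -4G as a 64-bit address is 2^64 - 2^32 = 0xffffffff00000000, and 64G (2^36 bytes) below that is 0xffffffef00000000. The sketch below encodes that window with a bounds check; the macro names and efi_va_in_window() are hypothetical and may not match what the patch itself uses.

```c
#include <stdint.h>

/* Hypothetical names for the window described in the patch. */
#define EFI_VA_START	(0ULL - (1ULL << 32))		/* -4G = 0xffffffff00000000 */
#define EFI_VA_SIZE	(1ULL << 36)			/* 64G */
#define EFI_VA_END	(EFI_VA_START - EFI_VA_SIZE)	/* 0xffffffef00000000 */

/* Nonzero if [va, va + size) lies entirely inside the EFI window.
 * Written without computing va + size, so it cannot overflow. */
static int efi_va_in_window(uint64_t va, uint64_t size)
{
	return va >= EFI_VA_END && va <= EFI_VA_START &&
	       size <= EFI_VA_START - va;
}
```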

[...]

> @@ -949,8 +978,17 @@ void __init efi_enter_virtual_mode(void)
>  		count++;
>  	}
>  
> +#ifdef CONFIG_X86_64
> +	efi_scratch.efi_pgt = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
> +
> +	if (!test_bit(EFI_OLD_MEMMAP, &x86_efi_facility))
> +		efi_scratch.use_pgd = true;
> +#endif
> +
>  	BUG_ON(!efi.systab);

Could you use the efi_enabled() function to test for EFI_OLD_MEMMAP
instead of test_bit()?
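For reference, a userspace model of the wrapper suggested above. test_bit_ul() is a hypothetical stand-in for the kernel's test_bit(), and the facility word is named after x86_efi_facility from the diff. The point is that callers only ask about a named facility and never reach into the bitmap themselves.

```c
#include <stdbool.h>

#define EFI_OLD_MEMMAP 6	/* bit number from the patch under review */

/* Modelled on the x86_efi_facility word used in the diff above. */
static unsigned long x86_efi_facility;

/* Hypothetical userspace stand-in for the kernel's test_bit(). */
static bool test_bit_ul(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Callers query a facility by name and never touch the bitmap
 * directly, so its storage can change without editing call sites. */
static int efi_enabled(int facility)
{
	return test_bit_ul(facility, &x86_efi_facility) != 0;
}
```

With such a helper, the hunk above would read: if (!efi_enabled(EFI_OLD_MEMMAP)) efi_scratch.use_pgd = true;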

[...]

> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index fa47d80ab4b5..beff433aa8c0 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -632,6 +632,7 @@ extern int __init efi_setup_pcdp_console(char *);
>  #define EFI_RUNTIME_SERVICES	3	/* Can we use runtime services? */
>  #define EFI_MEMMAP		4	/* Can we use EFI memory map? */
>  #define EFI_64BIT		5	/* Is the firmware 64-bit? */
> +#define EFI_OLD_MEMMAP		6	/* Use old mapping method */

Hmm... I'm wondering whether this should actually be,

#define EFI_ARCH_1		6	/* Architecture-specific option */

and in arch/x86/include/ we could then do,

/*
 * Lots of info about why we need to switch to a new mapping scheme, but
 * also why the old scheme might be desirable....
 */
#define EFI_OLD_MEMMAP		EFI_ARCH_1

This way we won't exhaust the bitspace quite so soon (since ARM/ARM64
can reuse EFI_ARCH_1 if they need it), plus this memory mapping method
is a very architecture-specific thing and so makes sense to hide it in
the bowels of arch/x86. If it turns out that ARM/ARM64 need the exact
same config option we can delete EFI_ARCH_1 and move EFI_OLD_MEMMAP to
include/linux/efi.h just like in your original patch. 

What do you think?
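The proposal amounts to the layout below. The generic bit values are copied from the diff above; the _Static_assert checks (C11) are an illustrative addition, not something proposed in the thread.

```c
/* include/linux/efi.h (generic): facility bits from the diff above,
 * plus one slot reserved for architecture-specific use. */
#define EFI_RUNTIME_SERVICES	3	/* Can we use runtime services? */
#define EFI_MEMMAP		4	/* Can we use EFI memory map? */
#define EFI_64BIT		5	/* Is the firmware 64-bit? */
#define EFI_ARCH_1		6	/* Architecture-specific option */

/* arch/x86/include/asm/efi.h: x86 gives the shared slot its own name.
 * Another architecture could alias EFI_ARCH_1 to a different option. */
#define EFI_OLD_MEMMAP		EFI_ARCH_1

/* Illustrative compile-time checks: the alias must not collide with a
 * generic bit and must fit in an unsigned long facility word. */
_Static_assert(EFI_OLD_MEMMAP > EFI_64BIT,
	       "arch bit overlaps a generic facility bit");
_Static_assert(EFI_OLD_MEMMAP < 8 * sizeof(unsigned long),
	       "arch bit does not fit in the facility word");
```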

-- 
Matt Fleming, Intel Open Source Technology Center

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-28 11:22     ` Matt Fleming
@ 2013-10-28 16:00       ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2013-10-28 16:00 UTC (permalink / raw)
  To: Matt Fleming
  Cc: X86 ML, LKML, Borislav Petkov, Matthew Garrett, H. Peter Anvin,
	James Bottomley, Vivek Goyal, Dave Young, linux-efi, fwts-devel

On Mon, Oct 28, 2013 at 11:22:46AM +0000, Matt Fleming wrote:
> Could you use the efi_enabled() function to test for EFI_OLD_MEMMAP
> instead of test_bit()?

Sure.

> This way we won't exhaust the bitspace quite so soon (since ARM/ARM64

Yeah, very foresightful.

> can reuse EFI_ARCH_1 if they need it), plus this memory mapping method
> is a very architecture-specific thing and so makes sense to hide it in
> the bowels of arch/x86. If it turns out that ARM/ARM64 need the exact
> same config option we can delete EFI_ARCH_1 and move EFI_OLD_MEMMAP to
> include/linux/efi.h just like in your original patch. 
> 
> What do you think?

Yep, done and pushed out.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-08 16:48   ` [PATCH 12/12] EFI: Runtime services virtual mapping Borislav Petkov
  2013-10-10  8:06     ` Dave Young
  2013-10-28 11:22     ` Matt Fleming
@ 2013-10-29  6:47     ` Dave Young
  2013-10-29  9:40       ` Borislav Petkov
  2 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-29  6:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

>  /*
>   * This function will switch the EFI runtime services to virtual mode.
>   * Essentially, look through the EFI memmap and map every region that
> @@ -862,10 +906,10 @@ void efi_memory_uc(u64 addr, unsigned long size)
>  void __init efi_enter_virtual_mode(void)

Boris, could you update the comment? It currently says:
 update that memory descriptor with the virtual address obtained
 from ioremap().

Logically your patch should update it, and then my patch will update it
again for the case of mapping to fixed addresses for kexec.

Thanks
Dave

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-29  6:47     ` Dave Young
@ 2013-10-29  9:40       ` Borislav Petkov
  2013-10-30  9:32         ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-29  9:40 UTC (permalink / raw)
  To: Dave Young
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Tue, Oct 29, 2013 at 02:47:20PM +0800, Dave Young wrote:
> Boris, could you update the comment? It currently says: update that
> memory descriptor with the virtual address obtained from ioremap().
>
> Logically your patch should update it, and then my patch will update
> it again for the case of mapping to fixed addresses for kexec.

Thanks for catching this, I ended up doing the following:

/*
 * This function will switch the EFI runtime services to virtual mode.
 * Essentially, we look through the EFI memmap and map every region that
 * has the runtime attribute bit set in its memory descriptor into the
 * ->trampoline_pgd page table using a top-down VA allocation scheme.
 *
 * The old method which used to update that memory descriptor with the
 * virtual address obtained from ioremap() is still supported when the
 * kernel is booted with efi=old_map on its command line. That old
 * method also enabled the runtime services to be called without
 * having to thunk back into physical mode for every invocation.
 *
 * The new method does a pagetable switch in a preemption-safe manner
 * so that we're in a different address space when calling a runtime
 * function. For passing function arguments, we copy the PGDs of the
 * kernel page table into ->trampoline_pgd prior to each call.
 */
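The loop described in this comment can be sketched in userspace as follows. This is a simplification under stated assumptions, not efi_enter_virtual_mode(): the descriptor struct mirrors the layout of the kernel's efi_memory_desc_t, EFI_MEMORY_RUNTIME is bit 63 as defined by the UEFI specification, UEFI pages are 4 KiB, alignment is ignored, and instead of populating ->trampoline_pgd it only records the VA in each descriptor. map_runtime_regions(), UEFI_PAGE_SHIFT and EFI_VA_START are hypothetical names. Descriptors without the runtime bit, such as boot services regions, are skipped.

```c
#include <stddef.h>
#include <stdint.h>

#define EFI_MEMORY_RUNTIME	((uint64_t)1 << 63)	/* UEFI attribute bit */
#define UEFI_PAGE_SHIFT		12			/* UEFI pages are 4 KiB */
#define EFI_VA_START		0xffffffff00000000ULL	/* -4G */

/* Same field layout as the kernel's efi_memory_desc_t. */
typedef struct {
	uint32_t type;
	uint32_t pad;
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
	uint64_t attribute;
} efi_memory_desc_t;

/* Assign top-down VAs to runtime regions only and return how many were
 * assigned. A kernel would also create the page table entries here. */
static size_t map_runtime_regions(efi_memory_desc_t *map, size_t n)
{
	uint64_t va = EFI_VA_START;
	size_t mapped = 0;

	for (size_t i = 0; i < n; i++) {
		efi_memory_desc_t *md = &map[i];

		if (!(md->attribute & EFI_MEMORY_RUNTIME))
			continue;	/* e.g. boot services code/data */

		va -= md->num_pages << UEFI_PAGE_SHIFT;
		md->virt_addr = va;
		mapped++;
	}
	return mapped;
}
```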

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-29  9:40       ` Borislav Petkov
@ 2013-10-30  9:32         ` Dave Young
  2013-10-30 10:45           ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Young @ 2013-10-30  9:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/29/13 at 10:40am, Borislav Petkov wrote:
> On Tue, Oct 29, 2013 at 02:47:20PM +0800, Dave Young wrote:
> > Boris, could you update the comment? it says below: update that memory
> > descriptor with the virtual address obtained from ioremap().
> >
> > Logiclly your patch should update it, then my patch update it again
> > with the case of mapping to fixed address for kexec.
> 
> Thanks for catching this, I ended up doing the following:
> 
> /*
>  * This function will switch the EFI runtime services to virtual mode.
>  * Essentially, we look through the EFI memmap and map every region that
>  * has the runtime attribute bit set in its memory descriptor into the
>  * ->trampoline_pgd page table using a top-down VA allocation scheme.
>  *
>  * The old method which used to update that memory descriptor with the
>  * virtual address obtained from ioremap() is still supported when the
>  * kernel is booted with efi=old_map on its command line. That old
>  * method also enabled the runtime services to be called without
>  * having to thunk back into physical mode for every invocation.
>  *
>  * The new method does a pagetable switch in a preemption-safe manner
>  * so that we're in a different address space when calling a runtime
>  * function. For passing function arguments, we copy the PGDs of the
>  * kernel page table into ->trampoline_pgd prior to each call.
>  */

Boris, thanks for the update, it's very elaborate. I still wonder whether
the 32-bit case should be mentioned as well.

I'm waiting for your next version of the patch series and will redo my
patches based on that.

Thanks
Dave

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-30  9:32         ` Dave Young
@ 2013-10-30 10:45           ` Borislav Petkov
  2013-10-31  7:07             ` Dave Young
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2013-10-30 10:45 UTC (permalink / raw)
  To: Dave Young
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On Wed, Oct 30, 2013 at 05:32:27PM +0800, Dave Young wrote:
> Boris, thanks for the update, it's very elaborate. I still wonder
> whether the 32-bit case should be mentioned as well.

Ah, so that's why mfleming is bugging me about it on IRC :)

Well, I left out the 32-bit case simply because I don't think anyone
cares about it.

> I'm waiting for your next version of the patch series and will redo
> my patches based on that.

Since I'm doing only minor fixups, I didn't want to spam
the lists again.

The latest version is my 'efi' branch at
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git

and you can pull it from there.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

* Re: [PATCH 12/12] EFI: Runtime services virtual mapping
  2013-10-30 10:45           ` Borislav Petkov
@ 2013-10-31  7:07             ` Dave Young
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Young @ 2013-10-31  7:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86 ML, LKML, Borislav Petkov, Matt Fleming, Matthew Garrett,
	H. Peter Anvin, James Bottomley, Vivek Goyal, linux-efi,
	fwts-devel

On 10/30/13 at 11:45am, Borislav Petkov wrote:
> On Wed, Oct 30, 2013 at 05:32:27PM +0800, Dave Young wrote:
> > Boris, thanks for the update, it's very elaborate. I still wonder
> > whether the 32-bit case should be mentioned as well.
> 
> Ah, so that's why mfleming is bugging me about it on IRC :)
> 
> Well, I left out the 32-bit case simply because I don't think anyone
> cares about it.

Ok, that's fine, thanks for telling me.

> 
> > I'm waiting for your next version of the patch series and will redo
> > my patches based on that.
> 
> Since I'm doing only minor fixups, I didn't want to spam
> the lists again.
> 
> The latest version is my 'efi' branch at
> git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git
> 
> and you can pull it from there.

I just pulled your tree; the function comment has not been updated yet.
If you won't be posting the new patches to the list, could you send them
to me privately?

--
Thanks a lot!
Dave

end of thread

Thread overview: 102+ messages
2013-09-19 14:54 [PATCH 00/11] EFI runtime services virtual mapping Borislav Petkov
2013-09-19 14:54 ` [PATCH 01/11] efi: Simplify EFI_DEBUG Borislav Petkov
2013-09-19 14:54 ` [PATCH 02/11] efi: Remove EFI_PAGE_SHIFT and EFI_PAGE_SIZE Borislav Petkov
2013-09-20 10:42   ` Matt Fleming
2013-09-21 15:21     ` Leif Lindholm
2013-09-21 15:41       ` Borislav Petkov
2013-09-21 15:50         ` Borislav Petkov
2013-09-21 16:01           ` Leif Lindholm
2013-09-21 16:03             ` Borislav Petkov
2013-09-21 15:59         ` Leif Lindholm
2013-09-19 14:54 ` [PATCH 03/11] x86, pageattr: Lookup address in an arbitrary PGD Borislav Petkov
2013-09-19 14:54 ` [PATCH 04/11] x86, pageattr: Add a PGD pagetable populating function Borislav Petkov
2013-09-19 14:54 ` [PATCH 05/11] x86, pageattr: Add a PUD " Borislav Petkov
2013-09-19 14:54 ` [PATCH 06/11] x86, pageattr: Add a PMD " Borislav Petkov
2013-09-19 14:54 ` [PATCH 07/11] x86, pageattr: Add a PTE " Borislav Petkov
2013-09-19 14:54 ` [PATCH 08/11] x86, pageattr: Add a PUD error unwinding path Borislav Petkov
2013-09-19 14:54 ` [PATCH 09/11] x86, pageattr: Add last levels of error path Borislav Petkov
2013-09-19 14:54 ` [PATCH 10/11] x86, cpa: Map in an arbitrary pgd Borislav Petkov
2013-09-19 14:54 ` [PATCH 11/11] EFI: Runtime services virtual mapping Borislav Petkov
2013-09-21 11:39   ` [PATCH -v2] " Borislav Petkov
2013-09-22 12:35     ` Dave Young
2013-09-22 13:37       ` Borislav Petkov
2013-09-22 14:00         ` Dave Young
2013-09-22 14:31           ` Dave Young
2013-09-22 15:27         ` H. Peter Anvin
2013-09-22 16:38           ` Borislav Petkov
2013-09-23  5:45           ` Dave Young
2013-09-24  2:52           ` Dave Young
2013-09-24  3:06             ` H. Peter Anvin
2013-09-24  4:57               ` Dave Young
2013-09-24  4:58                 ` Dave Young
2013-09-24  5:23                   ` Dave Young
2013-09-24  8:57                     ` Dave Young
2013-09-24  9:43                 ` Borislav Petkov
2013-09-24 10:01                   ` Dave Young
2013-09-24 12:45                   ` Dave Young
2013-10-02 10:04               ` Borislav Petkov
2013-10-02 15:43                 ` H. Peter Anvin
2013-10-02 17:05                   ` Borislav Petkov
2013-10-02 17:32                     ` H. Peter Anvin
2013-10-02 18:42                       ` Borislav Petkov
2013-10-02 18:46                         ` H. Peter Anvin
2013-10-04  9:42                           ` Borislav Petkov
2013-10-04 14:43                             ` H. Peter Anvin
2013-10-04 14:50                               ` Borislav Petkov
2013-09-23  5:47     ` Dave Young
2013-09-23  6:29       ` Borislav Petkov
2013-09-23  7:08         ` Dave Young
2013-09-23  8:45     ` Borislav Petkov
2013-09-25  9:24     ` Borislav Petkov
2013-09-20  7:29 ` [PATCH 00/11] EFI runtime " Dave Young
2013-09-20  8:19   ` Dave Young
2013-09-20  9:33     ` Borislav Petkov
2013-09-20 10:07       ` Dave Young
2013-09-20  9:05   ` Borislav Petkov
2013-09-20  9:44     ` Matt Fleming
2013-09-20  9:49     ` Matt Fleming
2013-09-20 10:02       ` Borislav Petkov
2013-09-20 11:51     ` Dave Young
2013-09-20 12:29     ` Matt Fleming
2013-09-20 14:04       ` Dave Young
2013-10-08 16:45 ` Borislav Petkov
2013-10-08 16:47   ` [PATCH 11/12] efi: Add an efi= kernel command line parameter Borislav Petkov
2013-10-28 11:02     ` Matt Fleming
2013-10-28 11:10       ` Borislav Petkov
2013-10-08 16:48   ` [PATCH 12/12] EFI: Runtime services virtual mapping Borislav Petkov
2013-10-10  8:06     ` Dave Young
2013-10-10  8:14       ` Dave Young
2013-10-10  8:58         ` Borislav Petkov
2013-10-10 12:34           ` Matt Fleming
2013-10-11  6:24             ` Dave Young
2013-10-11  7:41               ` Borislav Petkov
2013-10-12  7:54                 ` Dave Young
2013-10-12 10:13                   ` Matt Fleming
2013-10-12 10:30                     ` Borislav Petkov
2013-10-13  3:11                       ` Dave Young
2013-10-13  9:25                         ` Borislav Petkov
2013-10-14 15:58                           ` Borislav Petkov
2013-10-21 12:47                             ` Dave Young
2013-10-21 13:37                               ` Borislav Petkov
2013-10-21 15:04                                 ` Dave Young
2013-10-22 11:18                                   ` Borislav Petkov
2013-10-23  2:17                                     ` Dave Young
2013-10-23 12:25                                       ` Borislav Petkov
2013-10-23 12:37                                         ` Matthew Garrett
2013-10-23 12:51                                         ` Dave Young
2013-10-23 13:11                                           ` Borislav Petkov
2013-10-26 15:50                                 ` Matt Fleming
2013-10-13  3:06                     ` Dave Young
2013-10-11 10:27               ` Matt Fleming
2013-10-11 13:42                 ` Dave Young
2013-10-12  2:14                   ` Dave Young
2013-10-14 15:57                     ` Peter Jones
2013-10-16  6:27                       ` Dave Young
2013-10-28 11:22     ` Matt Fleming
2013-10-28 16:00       ` Borislav Petkov
2013-10-29  6:47     ` Dave Young
2013-10-29  9:40       ` Borislav Petkov
2013-10-30  9:32         ` Dave Young
2013-10-30 10:45           ` Borislav Petkov
2013-10-31  7:07             ` Dave Young
2013-10-14 13:04   ` [PATCH 00/11] EFI runtime " Matt Fleming
