x86/kexec: always ensure EFI systab region is mapped
diff mbox series

Message ID 20190422092804.15534-1-kasong@redhat.com
State New, archived
Headers show
Series
  • x86/kexec: always ensure EFI systab region is mapped
Related show

Commit Message

Kairui Song April 22, 2019, 9:28 a.m. UTC
This is a fix needed for: "x86/boot: Use efi_setup_data for searching
RSDP on kexec-ed kernels", that patch cause kexec to reset the system
on some machines.

The reason is the systab region is not mapped by the identity mapping
provided by kexec. Currently kexec only create identity mapping for
mem regions, wihch won't cover the systab. So second kernel will be
accessing a not mapped memory region and cause fault.
But as kexec tend to pad the map region up to PUD size, the
systab could be included in the map by accident, so it worked on
some machines, but that will be broken easily and unstable.

To fix it just treat systab specially, always map the systab region
unconditionally on EFI systems as long as there is a valid systab
address.

Signed-off-by: Kairui Song <kasong@redhat.com>
---
 arch/x86/kernel/machine_kexec_64.c | 40 ++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

Comments

Junichi Nomura April 23, 2019, 6:20 a.m. UTC | #1
On 4/22/19 6:28 PM, Kairui Song wrote:
> The reason is the systab region is not mapped by the identity mapping
> provided by kexec. Currently kexec only create identity mapping for
> mem regions, wihch won't cover the systab. So second kernel will be
> accessing a not mapped memory region and cause fault.
> But as kexec tend to pad the map region up to PUD size, the
> systab could be included in the map by accident, so it worked on
> some machines, but that will be broken easily and unstable.

Is the mapping of ACPI tables just by luck, too?
Dave Young April 23, 2019, 10:49 a.m. UTC | #2
On 04/23/19 at 06:20am, Junichi Nomura wrote:
> On 4/22/19 6:28 PM, Kairui Song wrote:
> > The reason is the systab region is not mapped by the identity mapping
> > provided by kexec. Currently kexec only create identity mapping for
> > mem regions, wihch won't cover the systab. So second kernel will be
> > accessing a not mapped memory region and cause fault.
> > But as kexec tend to pad the map region up to PUD size, the
> > systab could be included in the map by accident, so it worked on
> > some machines, but that will be broken easily and unstable.
> 
> Is the mapping of ACPI tables just by luck, too?

Hmm, guess it should be mapped by luck, here is the range on the T420: 
da99f000 - dae9efff Reserved (efi systab fall in this region)
daf9f000 - daffefff ACPI tables

Thanks
Dave
Kairui Song April 23, 2019, 5:15 p.m. UTC | #3
On Mon, Apr 22, 2019 at 11:21 PM Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:
>
> On 4/22/19 6:28 PM, Kairui Song wrote:
> > The reason is the systab region is not mapped by the identity mapping
> > provided by kexec. Currently kexec only create identity mapping for
> > mem regions, wihch won't cover the systab. So second kernel will be
> > accessing a not mapped memory region and cause fault.
> > But as kexec tend to pad the map region up to PUD size, the
> > systab could be included in the map by accident, so it worked on
> > some machines, but that will be broken easily and unstable.
>
> Is the mapping of ACPI tables just by luck, too?
>

Good question, they should have same issue with systab, I ignored this one.
Then in first kernel when doing kexec it should ensure both ACPI
tables and the EFI systab are mapped, that should cover everything and
make it work.
Is there anything else missing?
Junichi Nomura April 24, 2019, 2:47 a.m. UTC | #4
On 4/24/19 2:15 AM, Kairui Song wrote:
> On Mon, Apr 22, 2019 at 11:21 PM Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:
>> Is the mapping of ACPI tables just by luck, too?
> 
> Good question, they should have same issue with systab, I ignored this one.
> Then in first kernel when doing kexec it should ensure both ACPI
> tables and the EFI systab are mapped, that should cover everything and
> make it work.

Right.

> Is there anything else missing?
No, as far as I looked around get_rsdp_addr().
Baoquan He April 24, 2019, 5:41 a.m. UTC | #5
On 04/24/19 at 02:47am, Junichi Nomura wrote:
> On 4/24/19 2:15 AM, Kairui Song wrote:
> > On Mon, Apr 22, 2019 at 11:21 PM Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:
> >> Is the mapping of ACPI tables just by luck, too?
> > 
> > Good question, they should have same issue with systab, I ignored this one.
> > Then in first kernel when doing kexec it should ensure both ACPI
> > tables and the EFI systab are mapped, that should cover everything and
> > make it work.
> 
> Right.
> 
> > Is there anything else missing?
> No, as far as I looked around get_rsdp_addr().

Have made a draft patch to build ident mapping for ACPI tables too, it's
based on Kairui's patch. Dave has tested on his t400s laptop, and
passed. Please check if this adding is OK.

Kairui, you can add this into your patch to make a new one and resend.
Or I can combine them and send for you today.

From 7f17fcb12a6fddbb0f5e56e5137cc81f25c04af4 Mon Sep 17 00:00:00 2001
From: Baoquan He <bhe@redhat.com>
Date: Wed, 24 Apr 2019 09:57:01 +0800
Subject: [PATCH] x86/kexec: Prepare ACPI table mapping for kexec kernel

If the kernel decompressing code accesses ACPI tabels in kexec-ed kernel,
they also need be 1:1 mapped.

---
 arch/x86/kernel/machine_kexec_64.c | 54 ++++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index d5da54893f97..e2948aea27d4 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -30,6 +30,48 @@
 #include <asm/setup.h>
 #include <asm/set_memory.h>
 
+#ifdef CONFIG_ACPI
+/**
+ * Used while adding mapping for ACPI tables.
+ * Can be reused when other iomem regions need be mapped
+ */
+struct init_pgtable_data {
+	struct x86_mapping_info *info;
+	pgd_t *level4p;
+};
+
+static int mem_region_callback(struct resource *res, void *arg)
+{
+	struct init_pgtable_data *data= arg;
+	unsigned long mstart, mend;
+
+	mstart = res->start;
+	mend = mstart + resource_size(res) - 1;
+
+	return kernel_ident_mapping_init(data->info,
+			data->level4p, mstart, mend);
+}
+
+static int init_acpi_pgtable(struct x86_mapping_info *info,
+				   pgd_t *level4p)
+{
+	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	struct init_pgtable_data data;
+
+	data.info = info;
+	data.level4p = level4p;
+	flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	return walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
+				   &data, mem_region_callback);
+}
+#else
+static int init_acpi_pgtable(struct x86_mapping_info *info,
+				   pgd_t *level4p)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_KEXEC_FILE
 const struct kexec_file_ops * const kexec_file_loaders[] = {
 		&kexec_bzImage64_ops,
@@ -114,6 +156,8 @@ static void *alloc_pgt_page(void *data)
 	return p;
 }
 
+
+
 #ifdef CONFIG_EFI
 static int init_efi_systab_pgtable(struct x86_mapping_info *info,
 				   pgd_t *level4p)
@@ -191,14 +235,18 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 			return result;
 	}
 
-	/*
-	 * Prepare EFI systab mapping for kexec kernel, systab is not
-	 * covered by pfn_mapped.
+	/**
+	 * Prepare EFI systab and ACPI table mapping for kexec kernel,
+	 * since they are not covered by pfn_mapped.
 	 */
 	result = init_efi_systab_pgtable(&info, level4p);
 	if (result)
 		return result;
 
+	result = init_acpi_pgtable(&info, level4p);
+	if (result)
+		return result;
+
 	return init_transition_pgtable(image, level4p);
 }
Dave Young April 24, 2019, 6:18 a.m. UTC | #6
On 04/24/19 at 01:41pm, Baoquan He wrote:
> On 04/24/19 at 02:47am, Junichi Nomura wrote:
> > On 4/24/19 2:15 AM, Kairui Song wrote:
> > > On Mon, Apr 22, 2019 at 11:21 PM Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:
> > >> Is the mapping of ACPI tables just by luck, too?
> > > 
> > > Good question, they should have same issue with systab, I ignored this one.
> > > Then in first kernel when doing kexec it should ensure both ACPI
> > > tables and the EFI systab are mapped, that should cover everything and
> > > make it work.
> > 
> > Right.
> > 
> > > Is there anything else missing?
> > No, as far as I looked around get_rsdp_addr().
> 
> Have made a draft patch to build ident mapping for ACPI tables too, it's
> based on Kairui's patch. Dave has tested on his t400s laptop, and
> passed. Please check if this adding is OK.
> 
> Kairui, you can add this into your patch to make a new one and resend.
> Or I can combine them and send for you today.
> 
> From 7f17fcb12a6fddbb0f5e56e5137cc81f25c04af4 Mon Sep 17 00:00:00 2001
> From: Baoquan He <bhe@redhat.com>
> Date: Wed, 24 Apr 2019 09:57:01 +0800
> Subject: [PATCH] x86/kexec: Prepare ACPI table mapping for kexec kernel
> 
> If the kernel decompressing code accesses ACPI tabels in kexec-ed kernel,
> they also need be 1:1 mapped.
> 
> ---
>  arch/x86/kernel/machine_kexec_64.c | 54 ++++++++++++++++++++++++++++--
>  1 file changed, 51 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index d5da54893f97..e2948aea27d4 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -30,6 +30,48 @@
>  #include <asm/setup.h>
>  #include <asm/set_memory.h>
>  
> +#ifdef CONFIG_ACPI
> +/**
> + * Used while adding mapping for ACPI tables.
> + * Can be reused when other iomem regions need be mapped
> + */
> +struct init_pgtable_data {
> +	struct x86_mapping_info *info;
> +	pgd_t *level4p;
> +};
> +
> +static int mem_region_callback(struct resource *res, void *arg)
> +{
> +	struct init_pgtable_data *data= arg;
> +	unsigned long mstart, mend;
> +
> +	mstart = res->start;
> +	mend = mstart + resource_size(res) - 1;
> +
> +	return kernel_ident_mapping_init(data->info,
> +			data->level4p, mstart, mend);
> +}
> +
> +static int init_acpi_pgtable(struct x86_mapping_info *info,
> +				   pgd_t *level4p)
> +{
> +	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> +	struct init_pgtable_data data;
> +
> +	data.info = info;
> +	data.level4p = level4p;
> +	flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> +	return walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
> +				   &data, mem_region_callback);
> +}
> +#else
> +static int init_acpi_pgtable(struct x86_mapping_info *info,
> +				   pgd_t *level4p)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #ifdef CONFIG_KEXEC_FILE
>  const struct kexec_file_ops * const kexec_file_loaders[] = {
>  		&kexec_bzImage64_ops,
> @@ -114,6 +156,8 @@ static void *alloc_pgt_page(void *data)
>  	return p;
>  }
>  
> +
> +
>  #ifdef CONFIG_EFI
>  static int init_efi_systab_pgtable(struct x86_mapping_info *info,
>  				   pgd_t *level4p)
> @@ -191,14 +235,18 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>  			return result;
>  	}
>  
> -	/*
> -	 * Prepare EFI systab mapping for kexec kernel, systab is not
> -	 * covered by pfn_mapped.
> +	/**
> +	 * Prepare EFI systab and ACPI table mapping for kexec kernel,
> +	 * since they are not covered by pfn_mapped.
>  	 */
>  	result = init_efi_systab_pgtable(&info, level4p);
>  	if (result)
>  		return result;
>  
> +	result = init_acpi_pgtable(&info, level4p);
> +	if (result)
> +		return result;
> +
>  	return init_transition_pgtable(image, level4p);
>  }
>  
> -- 
> 2.17.2
> 

Since I can not reproduce the acpi table accessing fault with Kairui's patch,
the test is just sanity testing on same hardware. But the patch looks
good.

With Kairui's fix+ this acpi fix and Junichi's patch everything works.
Can anyone send them for example patch 1/2: kexec early mapping for
efi/acpi,  patch 2/2: Junichi's previous patch.

With these fixes I think kexec will just work.  Maybe we can go with
these fixes and leave other issues like the loader type flag etc as 
future issue.

Thanks
Dave
Baoquan He April 24, 2019, 7:45 a.m. UTC | #7
On 04/24/19 at 02:18pm, Dave Young wrote:
> On 04/24/19 at 01:41pm, Baoquan He wrote:
> > On 04/24/19 at 02:47am, Junichi Nomura wrote:
> > > On 4/24/19 2:15 AM, Kairui Song wrote:
> > > > On Mon, Apr 22, 2019 at 11:21 PM Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:
> > > >> Is the mapping of ACPI tables just by luck, too?
> > > > 
> > > > Good question, they should have same issue with systab, I ignored this one.
> > > > Then in first kernel when doing kexec it should ensure both ACPI
> > > > tables and the EFI systab are mapped, that should cover everything and
> > > > make it work.
> > > 
> > > Right.
> > > 
> > > > Is there anything else missing?
> > > No, as far as I looked around get_rsdp_addr().
> > 
> > Have made a draft patch to build ident mapping for ACPI tables too, it's
> > based on Kairui's patch. Dave has tested on his t400s laptop, and
> > passed. Please check if this adding is OK.
> > 
> > Kairui, you can add this into your patch to make a new one and resend.
> > Or I can combine them and send for you today.
 
> 
> Since I can not reproduce the acpi table accessing fault with Kairui's patch,
> the test is just sanity testing on same hardware. But the patch looks
> good.

Yes, usually vendor will put these efi systab, ACPI tables together. See
the regions you listed on your t420 laptop in another mail:
da99f000 - dae9efff Reserved (efi systab fall in this region)
daf9f000 - daffefff ACPI tables

We build 1:1 mapping for kexec kernel down to PMD level. Means for a
region, it will align starting address down to PMD size, and align end
address up to PMD size. So the end of efi systab, 0xdae9efff, will cause
mapping built for the 2MB area, 0xdae00000-0xdaf00000. Clearly ACPI
tables are covered by that PMD entry. That's why only efi systab
mapping is built, accessing ACPI tables doesn't cause error.

But we can't assume they will be put together always, so need map ACPI
tables too.

> 
> With Kairui's fix+ this acpi fix and Junichi's patch everything works.
> Can anyone send them for example patch 1/2: kexec early mapping for
> efi/acpi,  patch 2/2: Junichi's previous patch.

Kairui is having a workshop in the US, I can make a patchset to
include these two patches.

For patch 1/2, I will combine the patch Kairui posted and my draft patch,
Kairui is the author certainly, since he debugged and found out the root
cause, and posted v1 when I was on vacation last week.

For patch 2/2, I think the version Boris organized is good. 
http://lkml.kernel.org/r/20190416095209.GG27892@zn.tnic

> 
> With these fixes I think kexec will just work.  Maybe we can go with
> these fixes and leave other issues like the loader type flag etc as 
> future issue.

Yes, agree.

Thanks
Baoquan
Kairui Song April 24, 2019, 11:36 a.m. UTC | #8
On Wed, Apr 24, 2019, 03:46 Baoquan He <bhe@redhat.com> wrote:
>
> On 04/24/19 at 02:18pm, Dave Young wrote:
> > On 04/24/19 at 01:41pm, Baoquan He wrote:
> > > On 04/24/19 at 02:47am, Junichi Nomura wrote:
> > > > On 4/24/19 2:15 AM, Kairui Song wrote:
> > > > > On Mon, Apr 22, 2019 at 11:21 PM Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:
> > > > >> Is the mapping of ACPI tables just by luck, too?
> > > > >
> > > > > Good question, they should have same issue with systab, I ignored this one.
> > > > > Then in first kernel when doing kexec it should ensure both ACPI
> > > > > tables and the EFI systab are mapped, that should cover everything and
> > > > > make it work.
> > > >
> > > > Right.
> > > >
> > > > > Is there anything else missing?
> > > > No, as far as I looked around get_rsdp_addr().
> > >
> > > Have made a draft patch to build ident mapping for ACPI tables too, it's
> > > based on Kairui's patch. Dave has tested on his t400s laptop, and
> > > passed. Please check if this adding is OK.
> > >
> > > Kairui, you can add this into your patch to make a new one and resend.
> > > Or I can combine them and send for you today.
>
> >
> > Since I can not reproduce the acpi table accessing fault with Kairui's patch,
> > the test is just sanity testing on same hardware. But the patch looks
> > good.
>
> Yes, usually vendor will put these efi systab, ACPI tables together. See
> the regions you listed on your t420 laptop in another mail:
> da99f000 - dae9efff Reserved (efi systab fall in this region)
> daf9f000 - daffefff ACPI tables
>
> We build 1:1 mapping for kexec kernel down to PMD level. Means for a
> region, it will align starting address down to PMD size, and align end
> address up to PMD size. So the end of efi systab, 0xdae9efff, will cause
> mapping built for the 2MB area, 0xdae00000-0xdaf00000. Clearly ACPI
> tables are covered by that PMD entry. That's why only efi systab
> mapping is built, accessing ACPI tables doesn't cause error.
>
> But we can't assume they will be put together always, so need map ACPI
> tables too.
>
> >
> > With Kairui's fix+ this acpi fix and Junichi's patch everything works.
> > Can anyone send them for example patch 1/2: kexec early mapping for
> > efi/acpi,  patch 2/2: Junichi's previous patch.
>
> Kairui is having a workshop in the US, I can make a patchset to
> include these two patches.
>
> For patch 1/2, I will combine the patch Kairui posted and my draft patch,
> Kairui is the author certainly, since he debugged and found out the root
> cause, and posted v1 when I was on vacation last week.
>
> For patch 2/2, I think the version Boris organized is good.
> http://lkml.kernel.org/r/20190416095209.GG27892@zn.tnic
>

Thanks a lot Bao! I was offline for about 1 day due to timezone and
flight, I have no problem with this and the ACPI mapping part looks
good to me.

Patch
diff mbox series

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index ceba408ea982..d5da54893f97 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -18,6 +18,7 @@ 
 #include <linux/io.h>
 #include <linux/suspend.h>
 #include <linux/vmalloc.h>
+#include <linux/efi.h>
 
 #include <asm/init.h>
 #include <asm/pgtable.h>
@@ -113,6 +114,37 @@  static void *alloc_pgt_page(void *data)
 	return p;
 }
 
+#ifdef CONFIG_EFI
+static int init_efi_systab_pgtable(struct x86_mapping_info *info,
+				   pgd_t *level4p)
+{
+	unsigned long mstart, mend;
+
+	if (!efi_enabled(EFI_BOOT))
+		return 0;
+
+	mstart = (boot_params.efi_info.efi_systab |
+			((u64)boot_params.efi_info.efi_systab_hi<<32));
+
+	if (efi_enabled(EFI_64BIT))
+		mend = mstart + sizeof(efi_system_table_64_t);
+	else
+		mend = mstart + sizeof(efi_system_table_32_t);
+
+	if (mstart)
+		return kernel_ident_mapping_init(info,
+				level4p, mstart, mend);
+
+	return 0;
+}
+#else
+static inline int init_efi_systab_pgtable(struct x86_mapping_info *info,
+					  pgd_t *level4p)
+{
+	return 0;
+}
+#endif
+
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
 	struct x86_mapping_info info = {
@@ -159,6 +191,14 @@  static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 			return result;
 	}
 
+	/*
+	 * Prepare EFI systab mapping for kexec kernel, systab is not
+	 * covered by pfn_mapped.
+	 */
+	result = init_efi_systab_pgtable(&info, level4p);
+	if (result)
+		return result;
+
 	return init_transition_pgtable(image, level4p);
 }