[v6,1/2] x86/kexec: Build identity mapping for EFI systab and ACPI tables

Message ID 20190429002318.GA25400@MiWiFi-R3L-srv
State New, archived

Commit Message

Baoquan He April 29, 2019, 12:23 a.m. UTC
From: Kairui Song <kasong@redhat.com>

The current code only builds an identity mapping for physical memory during
kexec-type loading. The regions reserved by firmware are not covered.
In a later patch, the boot decompression code of the kexec-ed kernel tries
to access the EFI systab and ACPI tables; lacking an identity mapping for
them causes an error and resets the system to firmware.

This error doesn't happen on all systems. Because kexec enables gbpages
to build the identity mapping, the EFI systab and ACPI tables may already
be covered if they share the same 1 GB area with physical memory. To be
sure, we should always map them.

So add a mapping for them here.

Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
Changelog:
v5->v6:
  Tune code, comments and patch log per Boris's comments.
v5:
  This patch was newly added into v5.

 arch/x86/kernel/machine_kexec_64.c | 79 ++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
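
The "same 1 GB area" condition from the patch log can be illustrated with a
small userspace sketch. It is not kernel code: struct ram_range and
covered_by_gbpage() are hypothetical names, and the sketch assumes that a
gigabyte-page identity mapping of a RAM range covers every 1 GiB-aligned
window the range touches.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIB_SIZE (UINT64_C(1) << 30)
#define GIB_MASK (~(GIB_SIZE - 1))

/* Hypothetical description of one RAM range, [start, end). */
struct ram_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true if addr lies in a 1 GiB-aligned window that the RAM range
 * also touches. Under the stated assumption, a gigabyte-page mapping of
 * the RAM range then covers addr as a side effect.
 */
static bool covered_by_gbpage(const struct ram_range *ram, uint64_t addr)
{
	uint64_t first_window, last_window, addr_window;

	if (ram->end <= ram->start)
		return false;	/* empty range maps nothing */

	first_window = ram->start & GIB_MASK;
	last_window = (ram->end - 1) & GIB_MASK;
	addr_window = addr & GIB_MASK;

	return addr_window >= first_window && addr_window <= last_window;
}
```

With RAM ending just below 2 GiB, a firmware table slightly above the end of
RAM but still below 2 GiB is covered by accident, while a table above 2 GiB
is not. That would explain why the failure shows up only on some systems.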

Comments

Borislav Petkov April 29, 2019, 1:55 p.m. UTC | #1
On Mon, Apr 29, 2019 at 08:23:18AM +0800, Baoquan He wrote:
> +static int
> +map_acpi_tables(struct x86_mapping_info *info, pgd_t *level4p)
> +{
> +	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> +	struct init_pgtable_data data;
> +
> +	data.info = info;
> +	data.level4p = level4p;
> +	flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> +	return walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
> +				   &data, mem_region_callback);
> +}
> +#else
> +static int init_acpi_pgtable(struct x86_mapping_info *info,

Did you at least build-test the !CONFIG_ACPI case?

arch/x86/kernel/machine_kexec_64.c: In function ‘init_pgtable’:
arch/x86/kernel/machine_kexec_64.c:237:11: error: implicit declaration of function ‘map_acpi_tables’; did you mean ‘init_acpi_pgtable’? [-Werror=implicit-function-declaration]
  result = map_acpi_tables(&info, level4p);
           ^~~~~~~~~~~~~~~
           init_acpi_pgtable


I don't think so. ;-(

Sigh, next time at least build-test your patch before hurrying it out. I
fixed it up along with deciphering the commit message:

---
From: Kairui Song <kasong@redhat.com>
Date: Mon, 29 Apr 2019 08:23:18 +0800
Subject: [PATCH] x86/kexec: Add the EFI system tables and ACPI tables to the ident map

Currently, only the whole physical memory is identity-mapped for the
kexec kernel and the regions reserved by firmware are ignored.

However, the recent addition of RSDP parsing in the decompression stage
and especially:

  33f0df8d843d ("x86/boot: Search for RSDP in the EFI tables")

which tries to access EFI system tables and to dig out the RSDP address
from there, becomes a problem because in certain configurations, they
might not be mapped in the kexec'ed kernel's address space.

What is more, this problem doesn't appear on all systems because the
kexec kernel uses gigabyte pages to build the identity mapping. And
the EFI system tables and ACPI tables can, depending on the system
configuration, end up being mapped as part of all physical memory, if
they share the same 1 GB area with the physical memory.

Therefore, make sure they're always mapped.

 [ bp: productize half-baked patch:
   - rewrite commit message.
   - s/init_acpi_pgtable/map_acpi_tables/ in the !ACPI case. ]

Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dyoung@redhat.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: j-nomura@ce.jp.nec.com
Cc: kexec@lists.infradead.org
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190429002318.GA25400@MiWiFi-R3L-srv
---
 arch/x86/kernel/machine_kexec_64.c | 75 ++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index ceba408ea982..3c77bdf7b32a 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -18,6 +18,7 @@
 #include <linux/io.h>
 #include <linux/suspend.h>
 #include <linux/vmalloc.h>
+#include <linux/efi.h>
 
 #include <asm/init.h>
 #include <asm/pgtable.h>
@@ -29,6 +30,43 @@
 #include <asm/setup.h>
 #include <asm/set_memory.h>
 
+#ifdef CONFIG_ACPI
+/*
+ * Used while adding mapping for ACPI tables.
+ * Can be reused when other iomem regions need be mapped
+ */
+struct init_pgtable_data {
+	struct x86_mapping_info *info;
+	pgd_t *level4p;
+};
+
+static int mem_region_callback(struct resource *res, void *arg)
+{
+	struct init_pgtable_data *data = arg;
+	unsigned long mstart, mend;
+
+	mstart = res->start;
+	mend = mstart + resource_size(res) - 1;
+
+	return kernel_ident_mapping_init(data->info, data->level4p, mstart, mend);
+}
+
+static int
+map_acpi_tables(struct x86_mapping_info *info, pgd_t *level4p)
+{
+	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	struct init_pgtable_data data;
+
+	data.info = info;
+	data.level4p = level4p;
+	flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	return walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
+				   &data, mem_region_callback);
+}
+#else
+static int map_acpi_tables(struct x86_mapping_info *info, pgd_t *level4p) { return 0; }
+#endif
+
 #ifdef CONFIG_KEXEC_FILE
 const struct kexec_file_ops * const kexec_file_loaders[] = {
 		&kexec_bzImage64_ops,
@@ -36,6 +74,31 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
 };
 #endif
 
+static int
+map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p)
+{
+#ifdef CONFIG_EFI
+	unsigned long mstart, mend;
+
+	if (!efi_enabled(EFI_BOOT))
+		return 0;
+
+	mstart = (boot_params.efi_info.efi_systab |
+			((u64)boot_params.efi_info.efi_systab_hi<<32));
+
+	if (efi_enabled(EFI_64BIT))
+		mend = mstart + sizeof(efi_system_table_64_t);
+	else
+		mend = mstart + sizeof(efi_system_table_32_t);
+
+	if (!mstart)
+		return 0;
+
+	return kernel_ident_mapping_init(info, level4p, mstart, mend);
+#endif
+	return 0;
+}
+
 static void free_transition_pgtable(struct kimage *image)
 {
 	free_page((unsigned long)image->arch.p4d);
@@ -159,6 +222,18 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 			return result;
 	}
 
+	/*
+	 * Prepare EFI systab and ACPI tables for kexec kernel since they are
+	 * not covered by pfn_mapped.
+	 */
+	result = map_efi_systab(&info, level4p);
+	if (result)
+		return result;
+
+	result = map_acpi_tables(&info, level4p);
+	if (result)
+		return result;
+
 	return init_transition_pgtable(image, level4p);
 }
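
The address computation in map_efi_systab() above can be tried out in
userspace with the following sketch. The struct and function names are
hypothetical stand-ins (the real fields live in boot_params.efi_info), and
the table size is passed in by the caller instead of being taken from
efi_system_table_64_t or efi_system_table_32_t. Unlike the patch, the sketch
checks for a zero address before computing the end of the range, which does
not change the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit systab fields in efi_info. */
struct efi_info_sketch {
	uint32_t efi_systab;	/* low 32 bits of the physical address */
	uint32_t efi_systab_hi;	/* high 32 bits of the physical address */
};

/* Physical range [start, end) that would need an identity mapping. */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Combine the two halves into one 64-bit physical address and derive the
 * range occupied by the table. Returns 0 on success and -1 if no system
 * table address was recorded.
 */
static int systab_range(const struct efi_info_sketch *ei, size_t table_size,
			struct phys_range *out)
{
	uint64_t start = (uint64_t)ei->efi_systab |
			 ((uint64_t)ei->efi_systab_hi << 32);

	if (!start)
		return -1;

	out->start = start;
	out->end = start + table_size;
	return 0;
}
```

The cast before the shift matters: shifting a 32-bit value left by 32 bits
without widening it first is undefined behavior in C.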
Baoquan He April 29, 2019, 2:16 p.m. UTC | #2
On 04/29/19 at 03:55pm, Borislav Petkov wrote:
> On Mon, Apr 29, 2019 at 08:23:18AM +0800, Baoquan He wrote:
> > +static int
> > +map_acpi_tables(struct x86_mapping_info *info, pgd_t *level4p)
> > +{
> > +	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> > +	struct init_pgtable_data data;
> > +
> > +	data.info = info;
> > +	data.level4p = level4p;
> > +	flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> > +	return walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
> > +				   &data, mem_region_callback);
> > +}
> > +#else
> > +static int init_acpi_pgtable(struct x86_mapping_info *info,
> 
> Did you at least build-test the !CONFIG_ACPI case?
> 
> arch/x86/kernel/machine_kexec_64.c: In function ‘init_pgtable’:
> arch/x86/kernel/machine_kexec_64.c:237:11: error: implicit declaration of function ‘map_acpi_tables’; did you mean ‘init_acpi_pgtable’? [-Werror=implicit-function-declaration]
>   result = map_acpi_tables(&info, level4p);
>            ^~~~~~~~~~~~~~~
>            init_acpi_pgtable
> 
> 
> I don't think so. ;-(
> 
> Sigh, next time at least build-test your patch before hurrying it out. I
> fixed it up along with deciphering the commit message:

Sorry, I thought the changes were simple and didn't build the !CONFIG_ACPI
case. I should be more careful.

Thanks for fixing it and rewriting the log.
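
For reference, the build failure discussed above comes from a config-dependent
stub whose name did not match the caller. A minimal sketch of the pattern,
with HAVE_TABLES as a hypothetical stand-in for CONFIG_ACPI and made-up
function names, looks like this:

```c
/* HAVE_TABLES is a hypothetical stand-in for a Kconfig option. */
#ifdef HAVE_TABLES
/* Real implementation: count one mapped region for the sketch. */
static int map_tables_sketch(int *mapped)
{
	*mapped += 1;
	return 0;
}
#else
/* Stub with the same name and signature, so callers need no #ifdef. */
static inline int map_tables_sketch(int *mapped)
{
	(void)mapped;
	return 0;
}
#endif

/* The caller is identical in both configurations. */
static int init_all_sketch(int *mapped)
{
	int result = map_tables_sketch(mapped);

	if (result)
		return result;

	return 0;
}
```

Because only one branch is compiled for a given configuration, a mismatched
stub name goes unnoticed until the other configuration is built, so both
need a build test.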

Baoquan He May 13, 2019, 1:43 a.m. UTC | #3
Hi Boris,

On 04/29/19 at 03:55pm, Borislav Petkov wrote:
> From: Kairui Song <kasong@redhat.com>
> Date: Mon, 29 Apr 2019 08:23:18 +0800
> Subject: [PATCH] x86/kexec: Add the EFI system tables and ACPI tables to the ident map

> 
> Currently, only the whole physical memory is identity-mapped for the
> kexec kernel and the regions reserved by firmware are ignored.
> 
> However, the recent addition of RSDP parsing in the decompression stage
> and especially:
> 
>   33f0df8d843d ("x86/boot: Search for RSDP in the EFI tables")
> 
> which tries to access EFI system tables and to dig out the RSDP address
> from there, becomes a problem because in certain configurations, they
> might not be mapped in the kexec'ed kernel's address space.
> 
> What is more, this problem doesn't appear on all systems because the
> kexec kernel uses gigabyte pages to build the identity mapping. And
> the EFI system tables and ACPI tables can, depending on the system
> configuration, end up being mapped as part of all physical memory, if
> they share the same 1 GB area with the physical memory.
> 
> Therefore, make sure they're always mapped.
> 
>  [ bp: productize half-baked patch:
>    - rewrite commit message.
>    - s/init_acpi_pgtable/map_acpi_tables/ in the !ACPI case. ]

Can this patchset be merged, or picked into tip?

Thanks
Baoquan

Borislav Petkov May 13, 2019, 7:07 a.m. UTC | #4
Baoquan,

On Mon, May 13, 2019 at 09:43:05AM +0800, Baoquan He wrote:
> Can this patchset be merged, or picked into tip?

what is this thing that happens every time after a kernel is released and
lasts for approximately 2 weeks?
Baoquan He May 13, 2019, 7:32 a.m. UTC | #5
On 05/13/19 at 09:07am, Borislav Petkov wrote:
> Baoquan,
> 
> On Mon, May 13, 2019 at 09:43:05AM +0800, Baoquan He wrote:
> > Can this patchset be merged, or picked into tip?
> 
> what is this thing that happens every time after a kernel is released and
> lasts for approximately 2 weeks?

This is a critical bug which breaks memory hotplug, since KASLR is
enabled by default in upstream, and in our distros too. You can see that
Junichi posted the patch after NEC must have tested the code and found
the new issue. And Chao from FJ also worked out the patches to fix the bug.
And I have tracking bugs at hand from other important customers, related
to this fix too. The backporting of Chao's patches into our distros is
blocked by these two. We are going to miss another due date we promised to
customers.

Thanks
Baoquan
Borislav Petkov May 13, 2019, 7:50 a.m. UTC | #6
On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
> This is a critical bug which breaks memory hotplug,

Please concentrate and stop the blabla:

36f0c423552d ("x86/boot: Disable RSDP parsing temporarily")

already explains what the deal is. This code was *purposefully* disabled
because we ran out of time and it broke a couple of machines. Don't make
me repeat all that - you were on CC on *all* threads and messages!

So we're going to try it again this cycle and if there's no fallout, it
will go upstream. If not, it will have to be fixed. The usual thing.

And I don't care if Kairui's patch fixes this one problem - judging by
the fragility of this whole thing, it should be hammered on one more
cycle on as many boxes as possible to make sure there's no other SNAFUs.

So go test it on more machines instead. I've pushed it here:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=next-merge-window
Baoquan He May 13, 2019, 8:02 a.m. UTC | #7
On 05/13/19 at 09:50am, Borislav Petkov wrote:
> On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
> > This is a critical bug which breaks memory hotplug,
> 
> Please concentrate and stop the blabla:
> 
> 36f0c423552d ("x86/boot: Disable RSDP parsing temporarily")
> 
> already explains what the deal is. This code was *purposefully* disabled
> because we ran out of time and it broke a couple of machines. Don't make
> me repeat all that - you were on CC on *all* threads and messages!
> 
> So we're going to try it again this cycle and if there's no fallout, it
> will go upstream. If not, it will have to be fixed. The usual thing.
> 
> And I don't care if Kairui's patch fixes this one problem - judging by
> the fragility of this whole thing, it should be hammered on one more
> cycle on as many boxes as possible to make sure there's no other SNAFUs.
> 
> So go test it on more machines instead. I've pushed it here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=next-merge-window

Pingfan has got a machine to reproduce the kexec breakage issue, and
applying these two patches fixes it. He planned to paste the test result.
I will ask him to try this branch if he has time, or I can get his
machine to test.

Junichi, could you also try Boris's branch in NEC's test environment?

Thanks
Baoquan
Baoquan He May 13, 2019, 8:06 a.m. UTC | #8
Hi Dave,

On 05/13/19 at 09:50am, Borislav Petkov wrote:
> On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
> > This is a critical bug which breaks memory hotplug,
> 
> Please concentrate and stop the blabla:
> 
> 36f0c423552d ("x86/boot: Disable RSDP parsing temporarily")
> 
> already explains what the deal is. This code was *purposefully* disabled
> because we ran out of time and it broke a couple of machines. Don't make

I remember your machine is the one on which the issue was reported. Could
you also test it and confirm whether all the things found earlier are
cleared out?

Thanks
Baoquan

> me repeat all that - you were on CC on *all* threads and messages!
> 
> So we're going to try it again this cycle and if there's no fallout, it
> will go upstream. If not, it will have to be fixed. The usual thing.
> 
> And I don't care if Kairui's patch fixes this one problem - judging by
> the fragility of this whole thing, it should be hammered on one more
> cycle on as many boxes as possible to make sure there's no other SNAFUs.
> 
> So go test it on more machines instead. I've pushed it here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=next-merge-window
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.
Dave Young May 14, 2019, 3:22 a.m. UTC | #9
On 05/13/19 at 04:06pm, Baoquan He wrote:
> Hi Dave,
> 
> On 05/13/19 at 09:50am, Borislav Petkov wrote:
> > On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
> > > This is a critical bug which breaks memory hotplug,
> > 
> > Please concentrate and stop the blabla:
> > 
> > 36f0c423552d ("x86/boot: Disable RSDP parsing temporarily")
> > 
> > already explains what the deal is. This code was *purposefully* disabled
> > because we ran out of time and it broke a couple of machines. Don't make
> 
> I remember your machine is the one on which the issue was reported. Could
> you also test it and confirm whether all the things found earlier are
> cleared out?
> 

I did some tests on the laptop,  thing is:
1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
   on latest Linus master branch, everything works fine.

2. build and test the tip/next-merge-window branch, kernel hangs early
without output, (both 1st boot and kexec boot)

So I think these 3 patches are good, but there could be other issues
which are not related to the problem we saw.

Another thing is we can move the get rsdp after console_init, but that
can be done later as separate patch.

Thanks
Dave
Baoquan He May 14, 2019, 3:33 a.m. UTC | #10
On 05/14/19 at 11:22am, Dave Young wrote:
> On 05/13/19 at 04:06pm, Baoquan He wrote:
> > Hi Dave,
> > 
> > On 05/13/19 at 09:50am, Borislav Petkov wrote:
> > > On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
> > > > This is a critical bug which breaks memory hotplug,
> > > 
> > > Please concentrate and stop the blabla:
> > > 
> > > 36f0c423552d ("x86/boot: Disable RSDP parsing temporarily")
> > > 
> > > already explains what the deal is. This code was *purposefully* disabled
> > > because we ran out of time and it broke a couple of machines. Don't make
> > 
> > I remember your machine is the one on which the issue was reported. Could
> > you also test it and confirm whether all the things found earlier are
> > cleared out?
> > 
> 
> I did some tests on the laptop,  thing is:
> 1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
>    on latest Linus master branch, everything works fine.
> 
> 2. build and test the tip/next-merge-window branch, kernel hangs early
> without output, (both 1st boot and kexec boot)

Thanks, Dave.

Yeah, I also tested on an HP machine; the problem reproduced on the current
master branch when reverting commit 52b922c3d49c.

Then, with these two patches applied, the problem is solved.

I tried Boris's next-merge-window branch too; kexec works very well.

Dirk, Junichi, feel free to add your test result if you have time.

> 
> Another thing is we can move the get rsdp after console_init, but that
> can be done later as separate patch.

Yes, agree.
Dave Young May 14, 2019, 8:48 a.m. UTC | #11
On 05/14/19 at 11:22am, Dave Young wrote:
> On 05/13/19 at 04:06pm, Baoquan He wrote:
> > Hi Dave,
> > 
> > On 05/13/19 at 09:50am, Borislav Petkov wrote:
> > > On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
> > > > This is a critical bug which breaks memory hotplug,
> > > 
> > > Please concentrate and stop the blabla:
> > > 
> > > 36f0c423552d ("x86/boot: Disable RSDP parsing temporarily")
> > > 
> > > already explains what the deal is. This code was *purposefully* disabled
> > > because we ran out of time and it broke a couple of machines. Don't make
> > 
> > I remember your machine is the one on which the issue was reported. Could
> > you also test it and confirm whether all the things found earlier are
> > cleared out?
> > 
> 
> I did some tests on the laptop,  thing is:
> 1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
>    on latest Linus master branch, everything works fine.
> 
> 2. build and test the tip/next-merge-window branch, kernel hangs early
> without output, (both 1st boot and kexec boot)

Update about 2: it should not be related to early RSDP parsing. I got the
boot log below. Since I cannot reproduce it with Linus's master branch, it
may have been fixed already.

[    0.685374][    T1] rcu: Hierarchical SRCU implementation.
[    0.686414][    T1] general protection fault: 0000 [#1] SMP PTI
[    0.687328][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc6+ #877
[    0.687328][    T1] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET82WW (1.52 ) 06/04/2018
[    0.687328][    T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450
[    0.687328][    T1] Code: e8 77 49 1a 00 4c 8b 54 24 18 48 85 c0 4c 8b 4c 24 10 48 89 44 24 20 0f 84 68 fe ff ff 4a 8b 0c cd 20 88 07 a7 48 8d 54 24 20 <48> 89 04 11 e9 e5 fd ff ff 85 db 0f 85 67 fe ff ff 45 85 ed 75 24
[    0.687328][    T1] RSP: 0000:ffffa52100c8bd90 EFLAGS: 00010286
[    0.687328][    T1] RAX: ffff8c6bd5d03000 RBX: 0000000000000000 RCX: ffff8c6bd6000000
[    0.687328][    T1] RDX: ffffa52100c8bdb0 RSI: ffffffffa620e209 RDI: ffff8c6bd5d04000
[    0.687328][    T1] RBP: ffff8c6bd60103a0 R08: ffff8c6bd4da0000 R09: 0000000000000000
[    0.687328][    T1] R10: ffff8c6bd4e20000 R11: ffffc8e248538800 R12: 00000000000103a0
[    0.687328][    T1] R13: 0000000000010000 R14: fffffe0000013000 R15: 0000000000000000
[    0.687328][    T1] FS:  0000000000000000(0000) GS:ffff8c6bd6000000(0000) knlGS:0000000000000000
[    0.687328][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.687328][    T1] CR2: ffff8c6bde5ff000 CR3: 000000015700e001 CR4: 00000000000606f0
[    0.687328][    T1] Call Trace:
[    0.687328][    T1]  ? hardlockup_detector_event_create+0x50/0x50
[    0.687328][    T1]  x86_reserve_hardware+0x173/0x180
[    0.687328][    T1]  x86_pmu_event_init+0x39/0x220
[    0.687328][    T1]  ? hardlockup_detector_event_create+0x50/0x50
[    0.687328][    T1]  perf_try_init_event+0x42/0xd0
[    0.687328][    T1]  perf_event_alloc+0x46a/0x8b0
[    0.687328][    T1]  perf_event_create_kernel_counter+0x21/0x130
[    0.687328][    T1]  hardlockup_detector_event_create+0x39/0x50
[    0.687328][    T1]  hardlockup_detector_perf_init+0xc/0x40
[    0.687328][    T1]  lockup_detector_init+0x3a/0x71
[    0.687328][    T1]  kernel_init_freeable+0xbc/0x231
[    0.687328][    T1]  ? rest_init+0x9f/0x9f
[    0.687328][    T1]  kernel_init+0xa/0x101
[    0.687328][    T1]  ret_from_fork+0x35/0x40
[    0.687328][    T1] Modules linked in:
[    0.687331][    T1] ---[ end trace 71ee47f6125e74a4 ]---
[    0.688331][    T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450
[    0.689330][    T1] Code: e8 77 49 1a 00 4c 8b 54 24 18 48 85 c0 4c 8b 4c 24 10 48 89 44 24 20 0f 84 68 fe ff ff 4a 8b 0c cd 20 88 07 a7 48 8d 54 24 20 <48> 89 04 11 e9 e5 fd ff ff 85 db 0f 85 67 fe ff ff 45 85 ed 75 24
[    0.690330][    T1] RSP: 0000:ffffa52100c8bd90 EFLAGS: 00010286
[    0.691330][    T1] RAX: ffff8c6bd5d03000 RBX: 0000000000000000 RCX: ffff8c6bd6000000
[    0.692330][    T1] RDX: ffffa52100c8bdb0 RSI: ffffffffa620e209 RDI: ffff8c6bd5d04000
[    0.693330][    T1] RBP: ffff8c6bd60103a0 R08: ffff8c6bd4da0000 R09: 0000000000000000
[    0.694330][    T1] R10: ffff8c6bd4e20000 R11: ffffc8e248538800 R12: 00000000000103a0
[    0.695330][    T1] R13: 0000000000010000 R14: fffffe0000013000 R15: 0000000000000000
[    0.696330][    T1] FS:  0000000000000000(0000) GS:ffff8c6bd6000000(0000) knlGS:0000000000000000
[    0.697330][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.698330][    T1] CR2: ffff8c6bde5ff000 CR3: 000000015700e001 CR4: 00000000000606f0
[    0.699334][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.700328][    T1] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Thanks
Dave
Kairui Song May 14, 2019, 11:18 a.m. UTC | #12
On Tue, May 14, 2019 at 4:48 PM Dave Young <dyoung@redhat.com> wrote:
>
> On 05/14/19 at 11:22am, Dave Young wrote:
> > On 05/13/19 at 04:06pm, Baoquan He wrote:
> > > Hi Dave,
> > >
> > > On 05/13/19 at 09:50am, Borislav Petkov wrote:
> > > > On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
> > > > > This is a critical bug which breaks memory hotplug,
> > > >
> > > > Please concentrate and stop the blabla:
> > > >
> > > > 36f0c423552d ("x86/boot: Disable RSDP parsing temporarily")
> > > >
> > > > already explains what the deal is. This code was *purposefully* disabled
> > > > because we ran out of time and it broke a couple of machines. Don't make
> > >
> > > I remember your machine is the one on which the issue was reported. Could
> > > you also test it and confirm whether all the things found earlier are
> > > cleared out?
> > >
> >
> > I did some tests on the laptop,  thing is:
> > 1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
> >    on latest Linus master branch, everything works fine.
> >
> > 2. build and test the tip/next-merge-window branch, kernel hangs early
> > without output, (both 1st boot and kexec boot)
>
> Update about 2.  It should be not early rsdp related, I got the boot log
> Since can not reproduce with Linus master branch it may have been fixed.
>
> [    0.685374][    T1] rcu: Hierarchical SRCU implementation.
> [    0.686414][    T1] general protection fault: 0000 [#1] SMP PTI
> [    0.687328][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc6+ #877
> [    0.687328][    T1] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET82WW (1.52 ) 06/04/2018
> [    0.687328][    T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450
> [    0.687328][    T1] Code: e8 77 49 1a 00 4c 8b 54 24 18 48 85 c0 4c 8b 4c 24 10 48 89 44 24 20 0f 84 68 fe ff ff 4a 8b 0c cd 20 88 07 a7 48 8d 54 24 20 <48> 89 04 11 e9 e5 fd ff ff 85 db 0f 85 67 fe ff ff 45 85 ed 75 24
> [    0.687328][    T1] RSP: 0000:ffffa52100c8bd90 EFLAGS: 00010286
> [    0.687328][    T1] RAX: ffff8c6bd5d03000 RBX: 0000000000000000 RCX: ffff8c6bd6000000
> [    0.687328][    T1] RDX: ffffa52100c8bdb0 RSI: ffffffffa620e209 RDI: ffff8c6bd5d04000
> [    0.687328][    T1] RBP: ffff8c6bd60103a0 R08: ffff8c6bd4da0000 R09: 0000000000000000
> [    0.687328][    T1] R10: ffff8c6bd4e20000 R11: ffffc8e248538800 R12: 00000000000103a0
> [    0.687328][    T1] R13: 0000000000010000 R14: fffffe0000013000 R15: 0000000000000000
> [    0.687328][    T1] FS:  0000000000000000(0000) GS:ffff8c6bd6000000(0000) knlGS:0000000000000000
> [    0.687328][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.687328][    T1] CR2: ffff8c6bde5ff000 CR3: 000000015700e001 CR4: 00000000000606f0
> [    0.687328][    T1] Call Trace:
> [    0.687328][    T1]  ? hardlockup_detector_event_create+0x50/0x50
> [    0.687328][    T1]  x86_reserve_hardware+0x173/0x180
> [    0.687328][    T1]  x86_pmu_event_init+0x39/0x220
> [    0.687328][    T1]  ? hardlockup_detector_event_create+0x50/0x50
> [    0.687328][    T1]  perf_try_init_event+0x42/0xd0
> [    0.687328][    T1]  perf_event_alloc+0x46a/0x8b0
> [    0.687328][    T1]  perf_event_create_kernel_counter+0x21/0x130
> [    0.687328][    T1]  hardlockup_detector_event_create+0x39/0x50
> [    0.687328][    T1]  hardlockup_detector_perf_init+0xc/0x40
> [    0.687328][    T1]  lockup_detector_init+0x3a/0x71
> [    0.687328][    T1]  kernel_init_freeable+0xbc/0x231
> [    0.687328][    T1]  ? rest_init+0x9f/0x9f
> [    0.687328][    T1]  kernel_init+0xa/0x101
> [    0.687328][    T1]  ret_from_fork+0x35/0x40
> [    0.687328][    T1] Modules linked in:
> [    0.687331][    T1] ---[ end trace 71ee47f6125e74a4 ]---
> [    0.688331][    T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450
> [    0.689330][    T1] Code: e8 77 49 1a 00 4c 8b 54 24 18 48 85 c0 4c 8b 4c 24 10 48 89 44 24 20 0f 84 68 fe ff ff 4a 8b 0c cd 20 88 07 a7 48 8d 54 24 20 <48> 89 04 11 e9 e5 fd ff ff 85 db 0f 85 67 fe ff ff 45 85 ed 75 24
> [    0.690330][    T1] RSP: 0000:ffffa52100c8bd90 EFLAGS: 00010286
> [    0.691330][    T1] RAX: ffff8c6bd5d03000 RBX: 0000000000000000 RCX: ffff8c6bd6000000
> [    0.692330][    T1] RDX: ffffa52100c8bdb0 RSI: ffffffffa620e209 RDI: ffff8c6bd5d04000
> [    0.693330][    T1] RBP: ffff8c6bd60103a0 R08: ffff8c6bd4da0000 R09: 0000000000000000
> [    0.694330][    T1] R10: ffff8c6bd4e20000 R11: ffffc8e248538800 R12: 00000000000103a0
> [    0.695330][    T1] R13: 0000000000010000 R14: fffffe0000013000 R15: 0000000000000000
> [    0.696330][    T1] FS:  0000000000000000(0000) GS:ffff8c6bd6000000(0000) knlGS:0000000000000000
> [    0.697330][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.698330][    T1] CR2: ffff8c6bde5ff000 CR3: 000000015700e001 CR4: 00000000000606f0
> [    0.699334][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    0.700328][    T1] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
>
> Thanks
> Dave

I can confirm, as I got the same result on my T420: the next-merge-window
branch fails on both normal boot and kexec.
I didn't manage to get a working serial console, but the behavior is
the same, so it should be the same issue.

Also after "git cherry-pick de01951c8d40^..next-merge-window" on
master branch, it worked well, so the patch should be good.

--
Best Regards,
Kairui Song
Peter Zijlstra May 14, 2019, 11:38 a.m. UTC | #13
On Tue, May 14, 2019 at 04:48:41PM +0800, Dave Young wrote:

> > I did some tests on the laptop,  thing is:
> > 1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
> >    on latest Linus master branch, everything works fine.
> > 
> > 2. build and test the tip/next-merge-window branch, kernel hangs early
> > without output, (both 1st boot and kexec boot)
> 
> Update about 2.  It should be not early rsdp related, I got the boot log
> Since can not reproduce with Linus master branch it may have been fixed.

Nothing was changed here since PTI.

> [    0.685374][    T1] rcu: Hierarchical SRCU implementation.
> [    0.686414][    T1] general protection fault: 0000 [#1] SMP PTI
> [    0.687328][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc6+ #877
> [    0.687328][    T1] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET82WW (1.52 ) 06/04/2018
> [    0.687328][    T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450

> [    0.687328][    T1] Call Trace:
> [    0.687328][    T1]  ? hardlockup_detector_event_create+0x50/0x50
> [    0.687328][    T1]  x86_reserve_hardware+0x173/0x180
> [    0.687328][    T1]  x86_pmu_event_init+0x39/0x220

The DS buffers are special in that they're part of cpu_entry_area. If
this comes apart it might mean your pagetables are dodgy.
Dave Young May 14, 2019, 12:58 p.m. UTC | #14
On 05/14/19 at 01:38pm, Peter Zijlstra wrote:
> On Tue, May 14, 2019 at 04:48:41PM +0800, Dave Young wrote:
> 
> > > I did some tests on the laptop,  thing is:
> > > 1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
> > >    on latest Linus master branch, everything works fine.
> > > 
> > > 2. build and test the tip/next-merge-window branch, kernel hangs early
> > > without output, (both 1st boot and kexec boot)
> > 
> > Update about 2.  It should be not early rsdp related, I got the boot log
> > Since can not reproduce with Linus master branch it may have been fixed.
> 
> Nothing was changed here since PTI.
> 
> > [    0.685374][    T1] rcu: Hierarchical SRCU implementation.
> > [    0.686414][    T1] general protection fault: 0000 [#1] SMP PTI
> > [    0.687328][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc6+ #877
> > [    0.687328][    T1] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET82WW (1.52 ) 06/04/2018
> > [    0.687328][    T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450
> 
> > [    0.687328][    T1] Call Trace:
> > [    0.687328][    T1]  ? hardlockup_detector_event_create+0x50/0x50
> > [    0.687328][    T1]  x86_reserve_hardware+0x173/0x180
> > [    0.687328][    T1]  x86_pmu_event_init+0x39/0x220
> 
> The DS buffers are special in that they're part of cpu_entry_area. If
> this comes apart it might mean your pagetables are dodgy.

Hmm, it seems caused by some WIP branch patches, I suspect below:
commit 124d6af5a5f559e516ed2c6ea857e889ed293b43
x86/paravirt: Standardize 'insn_buff' variable names

The suspicious line is "per_cpu(insn_buff, cpu) = insn_buff;"

I can help to test if need to try anything, eg. debug patch.

I do not know anything about the PTI and DS buffer logic, but the chunk
below makes the next-merge-window branch boot fine on the laptop.
---
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index ad47f6415b17..fa254c576032 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -337,7 +337,7 @@ static int alloc_pebs_buffer(int cpu)
 	struct debug_store *ds = hwev->ds;
 	size_t bsiz = x86_pmu.pebs_buffer_size;
 	int max, node = cpu_to_node(cpu);
-	void *buffer, *insn_buff, *cea;
+	void *buffer, *ibuff, *cea;
 
 	if (!x86_pmu.pebs)
 		return 0;
@@ -351,12 +351,12 @@ static int alloc_pebs_buffer(int cpu)
 	 * buffer then.
 	 */
 	if (x86_pmu.intel_cap.pebs_format < 2) {
-		insn_buff = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node);
-		if (!insn_buff) {
+		ibuff = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node);
+		if (!ibuff) {
 			dsfree_pages(buffer, bsiz);
 			return -ENOMEM;
 		}
-		per_cpu(insn_buff, cpu) = insn_buff;
+		per_cpu(insn_buff, cpu) = ibuff;
 	}
 	hwev->ds_pebs_vaddr = buffer;
 	/* Update the cpu entry area mapping */
Peter Zijlstra May 14, 2019, 1:54 p.m. UTC | #15
On Tue, May 14, 2019 at 08:58:35PM +0800, Dave Young wrote:

> Hmm, it seems caused by some WIP branch patches, I suspect below:

Grmbl.. Ingo, can you zap all those WIP branches, please? They mostly
just get in the way of things. If you want to run them, merge them in a
private branch or something.

> commit 124d6af5a5f559e516ed2c6ea857e889ed293b43
> x86/paravirt: Standardize 'insn_buff' variable names
> 
> The suspicious line is "per_cpu(insn_buff, cpu) = insn_buff;"

Yah, unfortunately per-cpu variables live in the same namespace as normal
variables and so the above is incorrect, because the local @insn_buff
variable shadows the global per-cpu symbol and very weird things will
happen.

This is of course consistent with C rules, where everything lives in the
same namespace...
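
The shadowing rule described above can be reproduced in plain userspace C.
This is only a minimal sketch, not the kernel code: the file-scope
'insn_buff' merely stands in for the per-CPU symbol, and global_slot(),
shadowed_hits_global() and renamed_hits_global() are hypothetical helpers
made up for illustration. It assumes nothing about how per_cpu() is
implemented beyond the fact that it operates on the variable that the given
name resolves to at the point of use.

```c
#include <stddef.h>

/* File-scope variable, standing in for the per-CPU symbol 'insn_buff'. */
static void *insn_buff;

/* Address of the file-scope object, taken where nothing shadows it. */
static void **global_slot(void)
{
	return &insn_buff;
}

/* Buggy shape: the local has the same name, so 'insn_buff' is the local. */
static int shadowed_hits_global(void)
{
	void *insn_buff = NULL;

	/* Compares the stack variable's address with the global's. */
	return &insn_buff == global_slot();
}

/* Fixed shape: a differently named local leaves the global name visible. */
static int renamed_hits_global(void)
{
	void *ibuff = NULL;

	(void)ibuff;
	return &insn_buff == global_slot();
}
```

In the first function the name resolves to the stack variable, so the
comparison is false; in the second it resolves to the file-scope variable,
so it is true. Renaming the local, as the test chunk earlier in the thread
does with 'ibuff', is what makes the global name reachable again. Building
with -Wshadow would likely flag the first function.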
Ingo Molnar May 14, 2019, 2:09 p.m. UTC | #16
* Dave Young <dyoung@redhat.com> wrote:

> On 05/14/19 at 01:38pm, Peter Zijlstra wrote:
> > On Tue, May 14, 2019 at 04:48:41PM +0800, Dave Young wrote:
> > 
> > > > I did some tests on the laptop,  thing is:
> > > > 1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
> > > >    on latest Linus master branch, everything works fine.
> > > > 
> > > > 2. build and test the tip/next-merge-window branch, kernel hangs early
> > > > without output, (both 1st boot and kexec boot)
> > > 
> > > Update about 2.  It should be not early rsdp related, I got the boot log
> > > Since can not reproduce with Linus master branch it may have been fixed.
> > 
> > Nothing was changed here since PTI.
> > 
> > > [    0.685374][    T1] rcu: Hierarchical SRCU implementation.
> > > [    0.686414][    T1] general protection fault: 0000 [#1] SMP PTI
> > > [    0.687328][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc6+ #877
> > > [    0.687328][    T1] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET82WW (1.52 ) 06/04/2018
> > > [    0.687328][    T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450
> > 
> > > [    0.687328][    T1] Call Trace:
> > > [    0.687328][    T1]  ? hardlockup_detector_event_create+0x50/0x50
> > > [    0.687328][    T1]  x86_reserve_hardware+0x173/0x180
> > > [    0.687328][    T1]  x86_pmu_event_init+0x39/0x220
> > 
> > The DS buffers are special in that they're part of cpu_entry_area. If
> > this comes apart it might mean your pagetables are dodgy.
> 
> Hmm, it seems caused by some WIP branch patches, I suspect below:
> commit 124d6af5a5f559e516ed2c6ea857e889ed293b43
> x86/paravirt: Standardize 'insn_buff' variable names

This commit had a bug which I fixed - could you try the latest -tip?

Thanks,

	Ingo
Dave Young May 15, 2019, 1:08 a.m. UTC | #17
On 05/14/19 at 04:09pm, Ingo Molnar wrote:
> 
> * Dave Young <dyoung@redhat.com> wrote:
> 
> > On 05/14/19 at 01:38pm, Peter Zijlstra wrote:
> > > On Tue, May 14, 2019 at 04:48:41PM +0800, Dave Young wrote:
> > > 
> > > > > I did some tests on the laptop,  thing is:
> > > > > 1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
> > > > >    on latest Linus master branch, everything works fine.
> > > > > 
> > > > > 2. build and test the tip/next-merge-window branch, kernel hangs early
> > > > > without output, (both 1st boot and kexec boot)
> > > > 
> > > > Update about 2.  It should be not early rsdp related, I got the boot log
> > > > Since can not reproduce with Linus master branch it may have been fixed.
> > > 
> > > Nothing was changed here since PTI.
> > > 
> > > > [    0.685374][    T1] rcu: Hierarchical SRCU implementation.
> > > > [    0.686414][    T1] general protection fault: 0000 [#1] SMP PTI
> > > > [    0.687328][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc6+ #877
> > > > [    0.687328][    T1] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET82WW (1.52 ) 06/04/2018
> > > > [    0.687328][    T1] RIP: 0010:reserve_ds_buffers+0x34e/0x450
> > > 
> > > > [    0.687328][    T1] Call Trace:
> > > > [    0.687328][    T1]  ? hardlockup_detector_event_create+0x50/0x50
> > > > [    0.687328][    T1]  x86_reserve_hardware+0x173/0x180
> > > > [    0.687328][    T1]  x86_pmu_event_init+0x39/0x220
> > > 
> > > The DS buffers are special in that they're part of cpu_entry_area. If
> > > this comes apart it might mean your pagetables are dodgy.
> > 
> > Hmm, it seems caused by some WIP branch patches, I suspect below:
> > commit 124d6af5a5f559e516ed2c6ea857e889ed293b43
> > x86/paravirt: Standardize 'insn_buff' variable names
> 
> This commit had a bug which I fixed - could you try the latest -tip?

Will do, but I do not use tip tree often, not sure which branch includes
the fix.

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/
Is it tip/master or tip/tip?

Thanks
Dave
Junichi Nomura May 15, 2019, 5:17 a.m. UTC | #18
Hi Kairui,

On 5/13/19 5:02 PM, Baoquan He wrote:
> On 05/13/19 at 09:50am, Borislav Petkov wrote:
>> On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
>> So we're going to try it again this cycle and if there's no fallout, it
>> will go upstream. If not, it will have to be fixed. The usual thing.
>>
>> And I don't care if Kairui's patch fixes this one problem - judging by
>> the fragility of this whole thing, it should be hammered on one more
>> cycle on as many boxes as possible to make sure there's no other SNAFUs.
>>
>> So go test it on more machines instead. I've pushed it here:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=next-merge-window
> 
> Pingfan has got a machine to reproduce the kexec breakage issue, and
> applying these two patches fix it. He planned to paste the test result.
> I will ask him to try this branch if he has time, or I can get his
> machine to test.
> 
> Junichi, also have a try on Boris's branch in NEC's test environment?

while the patch set works on most of the machines I'm testing around,
I found kexec(1) fails to load kernel on a few machines if this patch
is applied.  Those machines don't have IORES_DESC_ACPI_TABLES region
and have ACPI tables in IORES_DESC_ACPI_NV_STORAGE region instead.

So I think map_acpi_tables() should try to map both regions.  I tried
following change in addition and it worked.
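
The idea of trying both region types can be sketched in userspace C. This is
an illustration under stated assumptions, not the kernel change itself:
'struct machine', 'enum region_desc', walk_regions(), map_region(),
map_acpi_regions() and mapped_after() are all hypothetical stand-ins, and the
walker is assumed to return -EINVAL when no region of the requested type
exists, which is what the 'ret != -EINVAL' checks in the diff quoted in the
replies imply for walk_iomem_res_desc().

```c
#include <errno.h>

/* Hypothetical region types, standing in for the IORES_DESC_* values. */
enum region_desc { DESC_ACPI_TABLES, DESC_ACPI_NVS, DESC_COUNT };

/* Hypothetical machine model: region count per type and a running total. */
struct machine {
	int nregions[DESC_COUNT];
	int mapped;
};

/* Callback: "map" one region by counting it. */
static int map_region(struct machine *m)
{
	m->mapped++;
	return 0;
}

/*
 * Stand-in for the resource walker: calls fn once per matching region,
 * stops at the first error, and returns -EINVAL if nothing matched.
 */
static int walk_regions(struct machine *m, enum region_desc desc,
			int (*fn)(struct machine *))
{
	int i, ret = -EINVAL;

	for (i = 0; i < m->nregions[desc]; i++) {
		ret = fn(m);
		if (ret)
			break;
	}
	return ret;
}

/* Walk both types; a type with no regions (-EINVAL) is not an error. */
static int map_acpi_regions(struct machine *m)
{
	int ret;

	ret = walk_regions(m, DESC_ACPI_TABLES, map_region);
	if (ret && ret != -EINVAL)
		return ret;

	ret = walk_regions(m, DESC_ACPI_NVS, map_region);
	if (ret && ret != -EINVAL)
		return ret;

	return 0;
}

/* Helper for experiments: regions mapped, or a negative errno. */
static int mapped_after(int ntables, int nnvs)
{
	struct machine m = { { ntables, nnvs }, 0 };
	int ret = map_acpi_regions(&m);

	return ret ? ret : m.mapped;
}
```

With this shape, a machine that has only NVS regions (mapped_after(0, 2))
still succeeds and maps both of them, while a real mapping failure from the
callback would be passed up unchanged.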
Junichi Nomura May 15, 2019, 6:43 a.m. UTC | #19
On 5/15/19 10:08 AM, Dave Young wrote:
> On 05/14/19 at 04:09pm, Ingo Molnar wrote:
>>> Hmm, it seems caused by some WIP branch patches, I suspect below:
>>> commit 124d6af5a5f559e516ed2c6ea857e889ed293b43
>>> x86/paravirt: Standardize 'insn_buff' variable names
>>
>> This commit had a bug which I fixed - could you try the latest -tip?
> 
> Will do, but I do not use tip tree often, not sure which branch includes
> the fix.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/
> Is it tip/master or tip/tip?

Just in case, when I tried tip/master, one of test machines crashed
in the same way as:
  https://lkml.org/lkml/2019/5/9/182

and I found this patch was needed:
  [PATCH] x86: intel_epb: Take CONFIG_PM into account
  https://lore.kernel.org/lkml/3431308.1mSSVdqTRr@kreacher/
Borislav Petkov May 15, 2019, 6:58 a.m. UTC | #20
On Wed, May 15, 2019 at 05:17:19AM +0000, Junichi Nomura wrote:
> Hi Kairui,
> 
> On 5/13/19 5:02 PM, Baoquan He wrote:
> > On 05/13/19 at 09:50am, Borislav Petkov wrote:
> >> On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
> >> So we're going to try it again this cycle and if there's no fallout, it
> >> will go upstream. If not, it will have to be fixed. The usual thing.
> >>
> >> And I don't care if Kairui's patch fixes this one problem - judging by
> >> the fragility of this whole thing, it should be hammered on one more
> >> cycle on as many boxes as possible to make sure there's no other SNAFUs.
> >>
> >> So go test it on more machines instead. I've pushed it here:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=next-merge-window
> > 
> > Pingfan has got a machine to reproduce the kexec breakage issue, and
> > applying these two patches fix it. He planned to paste the test result.
> > I will ask him to try this branch if he has time, or I can get his
> > machine to test.
> > 
> > Junichi, also have a try on Boris's branch in NEC's test environment?
> 
> while the patch set works on most of the machines I'm testing around,
> I found kexec(1) fails to load kernel on a few machines if this patch
> is applied.  Those machines don't have IORES_DESC_ACPI_TABLES region
> and have ACPI tables in IORES_DESC_ACPI_NV_STORAGE region instead.

Why? What kind of machines are those?

Why are the ACPI tables in NV storage?

Looking at crash_setup_memmap_entries(), it already maps that type so I
guess this is needed.

+ Rafael and leaving in the rest for reference.

 
> So I think map_acpi_tables() should try to map both regions.  I tried
> following change in addition and it worked.
> 
> -- 
> Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
> 
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 3c77bdf..3837c4a 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -56,12 +56,22 @@ static int mem_region_callback(struct resource *res, void *arg)
>  {
>  	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>  	struct init_pgtable_data data;
> +	int ret;
>  
>  	data.info = info;
>  	data.level4p = level4p;
>  	flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> -	return walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
> -				   &data, mem_region_callback);
> +	ret = walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
> +				  &data, mem_region_callback);
> +	if (ret && ret != -EINVAL)
> +		return ret;
> +
> +	ret = walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1,
> +				  &data, mem_region_callback);
> +	if (ret && ret != -EINVAL)
> +		return ret;
> +
> +	return 0;
>  }
>  #else
>  static int map_acpi_tables(struct x86_mapping_info *info, pgd_t *level4p) { return 0; }
Junichi Nomura May 15, 2019, 7:09 a.m. UTC | #21
On 5/15/19 3:58 PM, Borislav Petkov wrote:
> On Wed, May 15, 2019 at 05:17:19AM +0000, Junichi Nomura wrote:
>> Hi Kairui,
>>
>> On 5/13/19 5:02 PM, Baoquan He wrote:
>>> On 05/13/19 at 09:50am, Borislav Petkov wrote:
>>>> On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
>>>> So we're going to try it again this cycle and if there's no fallout, it
>>>> will go upstream. If not, it will have to be fixed. The usual thing.
>>>>
>>>> And I don't care if Kairui's patch fixes this one problem - judging by
>>>> the fragility of this whole thing, it should be hammered on one more
>>>> cycle on as many boxes as possible to make sure there's no other SNAFUs.
>>>>
>>>> So go test it on more machines instead. I've pushed it here:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=next-merge-window
>>>
>>> Pingfan has got a machine to reproduce the kexec breakage issue, and
>>> applying these two patches fix it. He planned to paste the test result.
>>> I will ask him to try this branch if he has time, or I can get his
>>> machine to test.
>>>
>>> Junichi, also have a try on Boris's branch in NEC's test environment?
>>
>> while the patch set works on most of the machines I'm testing around,
>> I found kexec(1) fails to load kernel on a few machines if this patch
>> is applied.  Those machines don't have IORES_DESC_ACPI_TABLES region
>> and have ACPI tables in IORES_DESC_ACPI_NV_STORAGE region instead.
> 
> Why? What kind of machines are those?

I don't know.  They are just general purpose Xeon-based servers
and not some special purpose machines.  So I guess there are other
such machines in the wild.

> Why are the ACPI tables in NV storage?
> 
> Looking at crash_setup_memmap_entries(), it already maps that type so I
> guess this is needed.
> 
> + Rafael and leaving in the rest for reference.
> 
>  
>> So I think map_acpi_tables() should try to map both regions.  I tried
>> following change in addition and it worked.
>>
>> -- 
>> Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
>>
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 3c77bdf..3837c4a 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -56,12 +56,22 @@ static int mem_region_callback(struct resource *res, void *arg)
>>  {
>>  	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>  	struct init_pgtable_data data;
>> +	int ret;
>>  
>>  	data.info = info;
>>  	data.level4p = level4p;
>>  	flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>> -	return walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
>> -				   &data, mem_region_callback);
>> +	ret = walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
>> +				  &data, mem_region_callback);
>> +	if (ret && ret != -EINVAL)
>> +		return ret;
>> +
>> +	ret = walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1,
>> +				  &data, mem_region_callback);
>> +	if (ret && ret != -EINVAL)
>> +		return ret;
>> +
>> +	return 0;
>>  }
>>  #else
>>  static int map_acpi_tables(struct x86_mapping_info *info, pgd_t *level4p) { return 0; }
Borislav Petkov May 17, 2019, 1:41 p.m. UTC | #22
On Tue, May 14, 2019 at 11:22:08AM +0800, Dave Young wrote:
> Another thing is we can move the get rsdp after console_init, but that
> can be done later as separate patch.

https://lkml.kernel.org/r/20190417090247.GD20492@zn.tnic
Kairui Song May 21, 2019, 9:02 a.m. UTC | #23
On Wed, May 15, 2019 at 3:10 PM Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:
>
> On 5/15/19 3:58 PM, Borislav Petkov wrote:
> > On Wed, May 15, 2019 at 05:17:19AM +0000, Junichi Nomura wrote:
> >> Hi Kairui,
> >>
> >> On 5/13/19 5:02 PM, Baoquan He wrote:
> >>> On 05/13/19 at 09:50am, Borislav Petkov wrote:
> >>>> On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
> >>>> So we're going to try it again this cycle and if there's no fallout, it
> >>>> will go upstream. If not, it will have to be fixed. The usual thing.
> >>>>
> >>>> And I don't care if Kairui's patch fixes this one problem - judging by
> >>>> the fragility of this whole thing, it should be hammered on one more
> >>>> cycle on as many boxes as possible to make sure there's no other SNAFUs.
> >>>>
> >>>> So go test it on more machines instead. I've pushed it here:
> >>>>
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=next-merge-window
> >>>
> >>> Pingfan has got a machine to reproduce the kexec breakage issue, and
> >>> applying these two patches fix it. He planned to paste the test result.
> >>> I will ask him to try this branch if he has time, or I can get his
> >>> machine to test.
> >>>
> >>> Junichi, also have a try on Boris's branch in NEC's test environment?
> >>
> >> while the patch set works on most of the machines I'm testing around,
> >> I found kexec(1) fails to load kernel on a few machines if this patch
> >> is applied.  Those machines don't have IORES_DESC_ACPI_TABLES region
> >> and have ACPI tables in IORES_DESC_ACPI_NV_STORAGE region instead.
> >
> > Why? What kind of machines are those?
>
> I don't know.  They are just general purpose Xeon-based servers
> and not some special purpose machines.  So I guess there are other
> such machines in the wild.
>

Hi, I think it's reasonable to update the patch to include the
NV_STORAGE regions as well; most likely the firmware only provided an
NV_STORAGE region? Can you help confirm that the e820 map contained no
ACPI data, only ACPI NVS?

I had a try with this update patch, it worked and didn't break anything.

Hi Boris, would you prefer to just fold Junichi's update patch into the
previous one, or should I send an updated patch?


--
Best Regards,
Kairui Song
Junichi Nomura May 21, 2019, 10:43 a.m. UTC | #24
On 2019/05/21 18:02, Kairui Song wrote:
> On Wed, May 15, 2019 at 3:10 PM Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:
>> On 5/15/19 3:58 PM, Borislav Petkov wrote:
>>> On Wed, May 15, 2019 at 05:17:19AM +0000, Junichi Nomura wrote:
>>>> I found kexec(1) fails to load kernel on a few machines if this patch
>>>> is applied.  Those machines don't have IORES_DESC_ACPI_TABLES region
>>>> and have ACPI tables in IORES_DESC_ACPI_NV_STORAGE region instead.
>>>
>>> Why? What kind of machines are those?
>>
>> I don't know.  They are just general purpose Xeon-based servers
>> and not some special purpose machines.  So I guess there are other
>> such machines in the wild.
> 
> Hi, I think it's reasonable to update the patch to include the
> NV_STORAGE regions as well, most likely the firmware only provided
> NV_STORAGE region? Can you help confirm if the e820 didn't contain
> ACPI data, and only ACPI NVS?

Yes, those machines only have ACPI NVS region as far as I see from kernel log.

> I had a try with this update patch, it worked and didn't break anything.
> 
> Hi Boris, would you prefer to just fold Junichi update patch into the
> previous one or I should send an updated patch?
Borislav Petkov May 21, 2019, 6:09 p.m. UTC | #25
On Tue, May 21, 2019 at 05:02:59PM +0800, Kairui Song wrote:
> Hi Boris, would you prefer to just fold Junichi update patch into the
> previous one or I should send an updated patch?

Please send a patch ontop after Ingo queues your old one, which should
happen soon. This way it would also document the fact that there are
machines with NVS regions only.

Thx.
Dirk van der Merwe May 21, 2019, 9:53 p.m. UTC | #26
On 5/13/19 8:33 PM, Baoquan He wrote:
> On 05/14/19 at 11:22am, Dave Young wrote:
>> On 05/13/19 at 04:06pm, Baoquan He wrote:
>>> Hi Dave,
>>>
>>> On 05/13/19 at 09:50am, Borislav Petkov wrote:
>>>> On Mon, May 13, 2019 at 03:32:54PM +0800, Baoquan He wrote:
>>>>> This is a critical bug which breaks memory hotplug,
>>>> Please concentrate and stop the blabla:
>>>>
>>>> 36f0c423552d ("x86/boot: Disable RSDP parsing temporarily")
>>>>
>>>> already explains what the deal is. This code was *purposefully* disabled
>>>> because we ran out of time and it broke a couple of machines. Don't make
>>> I remember your machine is the one on which the issue is reported. Could
>>> you also test it and confirm if all these things found earlier are
>>> cleared out?
>>>
>> I did some tests on the laptop,  thing is:
>> 1. apply the 3 patches (two you posted + Boris's revert commit 52b922c3d49c)
>>     on latest Linus master branch, everything works fine.
>>
>> 2. build and test the tip/next-merge-window branch, kernel hangs early
>> without output, (both 1st boot and kexec boot)
> Thanks, Dave.
>
> Yeah, I also tested on an HP machine; the problem reproduced on the current
> master branch when reverting commit 52b922c3d49c.
>
> Then apply these two patches, problem solved.
>
> Tried boris's next-merge-window branch too, kexec works very well.
>
> Dirk, Junichi, feel free to add your test result if you have time.


I tested this with the patches (plus revert of the workaround) applied 
against stable 5.1 and it works fine for me there. Thanks!

Where can I find the next-merge-window tree?

I can test against that too.


Best regards

Dirk
Borislav Petkov May 21, 2019, 11:04 p.m. UTC | #27
On Tue, May 21, 2019 at 02:53:52PM -0700, Dirk van der Merwe wrote:
> Where can I find the next-merge-window tree?
> 
> I can test against that too.

It'll appear soon in a tip branch. I'd appreciate if you tested that
instead - stay tuned...

Thx.
Kairui Song May 28, 2019, 2:49 a.m. UTC | #28
On Wed, May 22, 2019 at 2:09 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, May 21, 2019 at 05:02:59PM +0800, Kairui Song wrote:
> > Hi Boris, would you prefer to just fold Junichi update patch into the
> > previous one or I should send an updated patch?
>
> Please send a patch ontop after Ingo queues your old one, which should
> happen soon. This way it would also document the fact that there are
> machines with NVS regions only.
>
> Thx.
>

Hi, so far I haven't seen any tip branch pick up this patch.
Any update?

--
Best Regards,
Kairui Song
Borislav Petkov June 6, 2019, 7:20 p.m. UTC | #29
On Tue, May 28, 2019 at 10:49:54AM +0800, Kairui Song wrote:
> Hi, by now, I still didn't see any tip branch pick up this patch yet,
> any update?

Ok, stuff is queued in tip:x86/boot now. Please test it as much as you
can and send all fixes ontop.

Thx.

Patch

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index ceba408ea982..0af01490ee2d 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -18,6 +18,7 @@ 
 #include <linux/io.h>
 #include <linux/suspend.h>
 #include <linux/vmalloc.h>
+#include <linux/efi.h>
 
 #include <asm/init.h>
 #include <asm/pgtable.h>
@@ -29,6 +30,47 @@ 
 #include <asm/setup.h>
 #include <asm/set_memory.h>
 
+#ifdef CONFIG_ACPI
+/*
+ * Used while adding mapping for ACPI tables.
+ * Can be reused when other iomem regions need be mapped
+ */
+struct init_pgtable_data {
+	struct x86_mapping_info *info;
+	pgd_t *level4p;
+};
+
+static int mem_region_callback(struct resource *res, void *arg)
+{
+	struct init_pgtable_data *data = arg;
+	unsigned long mstart, mend;
+
+	mstart = res->start;
+	mend = mstart + resource_size(res) - 1;
+
+	return kernel_ident_mapping_init(data->info, data->level4p, mstart, mend);
+}
+
+static int
+map_acpi_tables(struct x86_mapping_info *info, pgd_t *level4p)
+{
+	unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	struct init_pgtable_data data;
+
+	data.info = info;
+	data.level4p = level4p;
+	flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	return walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
+				   &data, mem_region_callback);
+}
+#else
+static int map_acpi_tables(struct x86_mapping_info *info,
+				   pgd_t *level4p)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_KEXEC_FILE
 const struct kexec_file_ops * const kexec_file_loaders[] = {
 		&kexec_bzImage64_ops,
@@ -36,6 +78,31 @@  const struct kexec_file_ops * const kexec_file_loaders[] = {
 };
 #endif
 
+static int
+map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p)
+{
+#ifdef CONFIG_EFI
+	unsigned long mstart, mend;
+
+	if (!efi_enabled(EFI_BOOT))
+		return 0;
+
+	mstart = (boot_params.efi_info.efi_systab |
+			((u64)boot_params.efi_info.efi_systab_hi<<32));
+
+	if (efi_enabled(EFI_64BIT))
+		mend = mstart + sizeof(efi_system_table_64_t);
+	else
+		mend = mstart + sizeof(efi_system_table_32_t);
+
+	if (!mstart)
+		return 0;
+
+	return kernel_ident_mapping_init(info, level4p, mstart, mend);
+#endif
+	return 0;
+}
+
 static void free_transition_pgtable(struct kimage *image)
 {
 	free_page((unsigned long)image->arch.p4d);
@@ -159,6 +226,18 @@  static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 			return result;
 	}
 
+	/*
+	 * Prepare EFI systab and ACPI table mapping for kexec kernel,
+	 * since they are not covered by pfn_mapped.
+	 */
+	result = map_efi_systab(&info, level4p);
+	if (result)
+		return result;
+
+	result = map_acpi_tables(&info, level4p);
+	if (result)
+		return result;
+
 	return init_transition_pgtable(image, level4p);
 }
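
As a side note on map_efi_systab() above: the system table address is
assembled from two 32-bit halves. The following is a minimal userspace
sketch of that arithmetic, assuming only that the low and high words are
each 32 bits wide; systab_phys() is a hypothetical helper name, not a
kernel function.

```c
#include <stdint.h>

/*
 * Combine the low and high 32-bit words into one 64-bit physical address.
 * The high word is widened before the shift; shifting a 32-bit value
 * left by 32 would not be well defined.
 */
static uint64_t systab_phys(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}
```

For example, systab_phys(0x12345678, 0x1) gives 0x112345678, and a zero
result corresponds to the "no systab" case that the patch skips.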