All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2017-12-16  0:15 ` Jiri Bohac
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Bohac @ 2017-12-16  0:15 UTC (permalink / raw)
  To: Baoquan He
  Cc: Toshi Kani, David Airlie, Dave Young, joro, kexec, linux-kernel,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On machines where the GART aperture is mapped over physical RAM
/proc/vmcore contains the remapped range and reading it may
cause hangs or reboots. 

In the past, the GART region was added into the resource map, implemented by
commit 56dd669a138c ("[PATCH] Insert GART region into resource map")

However, inserting the iomem_resource from the early GART code caused
resource conflicts with some AGP drivers (bko#72201), which got avoided by
reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
region into resource map"). This revert introduced the /proc/vmcore bug.

The vmcore ELF header is either prepared by the kernel (when using the
kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
syscall). Since we no longer have the GART iomem resource, the userspace
kexec has no way of knowing which region to exclude from the ELF header.

Changes from v1 of this patch:
Instead of excluding the aperture from the ELF header, this patch
makes /proc/vmcore return zeroes in the second kernel when attempting to
read the aperture region. This is done by reusing the
gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
balooned memory. This works for both, the kexec_file_load and kexec_load
syscalls.

[Note that the GART region is the same in the first and second kernels:
regardless whether the first kernel fixed up the northbridge/bios setting
and mapped the aperture over physical memory, the second kernel finds the
northbridge properly configured by the first kernel and the aperture
never overlaps with e820 memory because the second kernel has a fake e820
map created from the crashkernel memory regions. Thus, the second kernel
keeps the aperture address/size as configured by the first kernel.]

register_oldmem_pfn_is_ram can only register one callback and returns an error
if the callback has been registered already. Since XEN used to be the only user
of this function, it never checks the return value. Now that we have more than
one user, I added a WARN_ON just in case agp, XEN, or any other future user of
register_oldmem_pfn_is_ram were to step on each other's toes.


Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index f5d92bc3b884..837efa32110c 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -30,6 +30,7 @@
 #include <asm/dma.h>
 #include <asm/amd_nb.h>
 #include <asm/x86_init.h>
+#include <linux/crash_dump.h>
 
 /*
  * Using 512M as goal, in case kexec will load kernel_big
@@ -56,6 +57,27 @@ int fallback_aper_force __initdata;
 
 int fix_aperture __initdata = 1;
 
+#ifdef CONFIG_PROC_VMCORE
+static unsigned long aperture_pfn_start, aperture_page_count;
+
+static int gart_oldmem_pfn_is_ram(unsigned long pfn)
+{
+	return likely((pfn < aperture_pfn_start) ||
+		      (pfn >= aperture_pfn_start + aperture_page_count));
+}
+
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+	aperture_pfn_start = aper_base >> PAGE_SHIFT;
+	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
+	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
+}
+#else
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+}
+#endif
+
 /* This code runs before the PCI subsystem is initialized, so just
    access the northbridge directly. */
 
@@ -435,8 +457,10 @@ int __init gart_iommu_hole_init(void)
 
 out:
 	if (!fix && !fallback_aper_force) {
-		if (last_aper_base)
+		if (last_aper_base) {
+			exclude_from_vmcore(last_aper_base, last_aper_order);
 			return 1;
+		}
 		return 0;
 	}
 
@@ -473,6 +497,8 @@ int __init gart_iommu_hole_init(void)
 		return 0;
 	}
 
+	exclude_from_vmcore(aper_alloc, aper_order);
+
 	/* Fix up the north bridges */
 	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
 		int bus, dev_base, dev_limit;
diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index 2cfcfe4f6b2a..dd2ad82eee80 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
 	if (is_pagetable_dying_supported())
 		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
 #ifdef CONFIG_PROC_VMCORE
-	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
+	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
 #endif
 }
-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2017-12-16  0:15 ` Jiri Bohac
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Bohac @ 2017-12-16  0:15 UTC (permalink / raw)
  To: Baoquan He
  Cc: Toshi Kani, David Airlie, yinghai, joro, kexec, linux-kernel,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal

On machines where the GART aperture is mapped over physical RAM
/proc/vmcore contains the remapped range and reading it may
cause hangs or reboots. 

In the past, the GART region was added into the resource map, implemented by
commit 56dd669a138c ("[PATCH] Insert GART region into resource map")

However, inserting the iomem_resource from the early GART code caused
resource conflicts with some AGP drivers (bko#72201), which got avoided by
reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
region into resource map"). This revert introduced the /proc/vmcore bug.

The vmcore ELF header is either prepared by the kernel (when using the
kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
syscall). Since we no longer have the GART iomem resource, the userspace
kexec has no way of knowing which region to exclude from the ELF header.

Changes from v1 of this patch:
Instead of excluding the aperture from the ELF header, this patch
makes /proc/vmcore return zeroes in the second kernel when attempting to
read the aperture region. This is done by reusing the
gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
balooned memory. This works for both, the kexec_file_load and kexec_load
syscalls.

[Note that the GART region is the same in the first and second kernels:
regardless whether the first kernel fixed up the northbridge/bios setting
and mapped the aperture over physical memory, the second kernel finds the
northbridge properly configured by the first kernel and the aperture
never overlaps with e820 memory because the second kernel has a fake e820
map created from the crashkernel memory regions. Thus, the second kernel
keeps the aperture address/size as configured by the first kernel.]

register_oldmem_pfn_is_ram can only register one callback and returns an error
if the callback has been registered already. Since XEN used to be the only user
of this function, it never checks the return value. Now that we have more than
one user, I added a WARN_ON just in case agp, XEN, or any other future user of
register_oldmem_pfn_is_ram were to step on each other's toes.


Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index f5d92bc3b884..837efa32110c 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -30,6 +30,7 @@
 #include <asm/dma.h>
 #include <asm/amd_nb.h>
 #include <asm/x86_init.h>
+#include <linux/crash_dump.h>
 
 /*
  * Using 512M as goal, in case kexec will load kernel_big
@@ -56,6 +57,27 @@ int fallback_aper_force __initdata;
 
 int fix_aperture __initdata = 1;
 
+#ifdef CONFIG_PROC_VMCORE
+static unsigned long aperture_pfn_start, aperture_page_count;
+
+static int gart_oldmem_pfn_is_ram(unsigned long pfn)
+{
+	return likely((pfn < aperture_pfn_start) ||
+		      (pfn >= aperture_pfn_start + aperture_page_count));
+}
+
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+	aperture_pfn_start = aper_base >> PAGE_SHIFT;
+	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
+	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
+}
+#else
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+}
+#endif
+
 /* This code runs before the PCI subsystem is initialized, so just
    access the northbridge directly. */
 
@@ -435,8 +457,10 @@ int __init gart_iommu_hole_init(void)
 
 out:
 	if (!fix && !fallback_aper_force) {
-		if (last_aper_base)
+		if (last_aper_base) {
+			exclude_from_vmcore(last_aper_base, last_aper_order);
 			return 1;
+		}
 		return 0;
 	}
 
@@ -473,6 +497,8 @@ int __init gart_iommu_hole_init(void)
 		return 0;
 	}
 
+	exclude_from_vmcore(aper_alloc, aper_order);
+
 	/* Fix up the north bridges */
 	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
 		int bus, dev_base, dev_limit;
diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index 2cfcfe4f6b2a..dd2ad82eee80 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
 	if (is_pagetable_dying_supported())
 		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
 #ifdef CONFIG_PROC_VMCORE
-	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
+	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
 #endif
 }
-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-16  0:15 ` Jiri Bohac
@ 2017-12-16  1:01   ` Baoquan He
  -1 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-16  1:01 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Toshi Kani, David Airlie, Dave Young, joro, kexec, linux-kernel,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On 12/16/17 at 01:15am, Jiri Bohac wrote:
> On machines where the GART aperture is mapped over physical RAM
> /proc/vmcore contains the remapped range and reading it may
> cause hangs or reboots. 
> 
> In the past, the GART region was added into the resource map, implemented by
> commit 56dd669a138c ("[PATCH] Insert GART region into resource map")
> 
> However, inserting the iomem_resource from the early GART code caused
> resource conflicts with some AGP drivers (bko#72201), which got avoided by
> reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
> region into resource map"). This revert introduced the /proc/vmcore bug.
> 
> The vmcore ELF header is either prepared by the kernel (when using the
> kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
> syscall). Since we no longer have the GART iomem resource, the userspace
> kexec has no way of knowing which region to exclude from the ELF header.
> 
> Changes from v1 of this patch:
> Instead of excluding the aperture from the ELF header, this patch
> makes /proc/vmcore return zeroes in the second kernel when attempting to
> read the aperture region. This is done by reusing the
> gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
> balooned memory. This works for both, the kexec_file_load and kexec_load
> syscalls.

This is funny and smart way. While it may not be good, E.g what if
people add iommu=off in kdump kernel? You made an assumption on this,
which people must use iommu in kdump kernel and aperture_pfn_start can
be got naturally.

As I said in v1, I listed 3 options. In fact 1st option is doable. Just
need to figure out why peopld did the quirk to enable gart anyway when
not enable gart in bios. You replied that you guess there's firmware bug
so that people have to do this. I have to say guess is not convincing.

Well, in fact, I have several questions:

1) In your case, the HP machine with gart, when you enable gart support
in bios, is it OK to you?

2) If firmware is broken, you can't enable gart in firmware, will
firmware engineer fix this since it's a firmware bug?

If above two have answer 'yes', then it's not a blocker to you. And we
should remove the quirk code which enable gart in kernel when people
forget enableing gart support in firmware.

OR

Take the option 3 as I said. Take the gart region off from iomem. But
you said there will be conflict with pci issue. If I did, I would make
clear what's the pci issue and why they conflict. Can we do in another
way to dig it out from iomem, e.g in different stage, after pci bus
iteration. 

I didn't reply to you later since I think my comments have been clear,
and I am very busy on rhel regression bug fix. Kernel bug fix sometime
is not easy. I don't find a gart system in redhat's lab, otherwise I
will make a post to have a try.

Thanks
Baoquan

> 
> [Note that the GART region is the same in the first and second kernels:
> regardless whether the first kernel fixed up the northbridge/bios setting
> and mapped the aperture over physical memory, the second kernel finds the
> northbridge properly configured by the first kernel and the aperture
> never overlaps with e820 memory because the second kernel has a fake e820
> map created from the crashkernel memory regions. Thus, the second kernel
> keeps the aperture address/size as configured by the first kernel.]
> 
> register_oldmem_pfn_is_ram can only register one callback and returns an error
> if the callback has been registered already. Since XEN used to be the only user
> of this function, it never checks the return value. Now that we have more than
> one user, I added a WARN_ON just in case agp, XEN, or any other future user of
> register_oldmem_pfn_is_ram were to step on each other's toes.
> 
> 
> Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")
> 
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index f5d92bc3b884..837efa32110c 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -30,6 +30,7 @@
>  #include <asm/dma.h>
>  #include <asm/amd_nb.h>
>  #include <asm/x86_init.h>
> +#include <linux/crash_dump.h>
>  
>  /*
>   * Using 512M as goal, in case kexec will load kernel_big
> @@ -56,6 +57,27 @@ int fallback_aper_force __initdata;
>  
>  int fix_aperture __initdata = 1;
>  
> +#ifdef CONFIG_PROC_VMCORE
> +static unsigned long aperture_pfn_start, aperture_page_count;
> +
> +static int gart_oldmem_pfn_is_ram(unsigned long pfn)
> +{
> +	return likely((pfn < aperture_pfn_start) ||
> +		      (pfn >= aperture_pfn_start + aperture_page_count));
> +}
> +
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +	aperture_pfn_start = aper_base >> PAGE_SHIFT;
> +	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
> +	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
> +}
> +#else
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +}
> +#endif
> +
>  /* This code runs before the PCI subsystem is initialized, so just
>     access the northbridge directly. */
>  
> @@ -435,8 +457,10 @@ int __init gart_iommu_hole_init(void)
>  
>  out:
>  	if (!fix && !fallback_aper_force) {
> -		if (last_aper_base)
> +		if (last_aper_base) {
> +			exclude_from_vmcore(last_aper_base, last_aper_order);
>  			return 1;
> +		}
>  		return 0;
>  	}
>  
> @@ -473,6 +497,8 @@ int __init gart_iommu_hole_init(void)
>  		return 0;
>  	}
>  
> +	exclude_from_vmcore(aper_alloc, aper_order);
> +
>  	/* Fix up the north bridges */
>  	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
>  		int bus, dev_base, dev_limit;
> diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
> index 2cfcfe4f6b2a..dd2ad82eee80 100644
> --- a/arch/x86/xen/mmu_hvm.c
> +++ b/arch/x86/xen/mmu_hvm.c
> @@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
>  	if (is_pagetable_dying_supported())
>  		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
>  #ifdef CONFIG_PROC_VMCORE
> -	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
> +	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
>  #endif
>  }
> -- 
> Jiri Bohac <jbohac@suse.cz>
> SUSE Labs, Prague, Czechia
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2017-12-16  1:01   ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-16  1:01 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Toshi Kani, David Airlie, yinghai, joro, kexec, linux-kernel,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal

On 12/16/17 at 01:15am, Jiri Bohac wrote:
> On machines where the GART aperture is mapped over physical RAM
> /proc/vmcore contains the remapped range and reading it may
> cause hangs or reboots. 
> 
> In the past, the GART region was added into the resource map, implemented by
> commit 56dd669a138c ("[PATCH] Insert GART region into resource map")
> 
> However, inserting the iomem_resource from the early GART code caused
> resource conflicts with some AGP drivers (bko#72201), which got avoided by
> reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
> region into resource map"). This revert introduced the /proc/vmcore bug.
> 
> The vmcore ELF header is either prepared by the kernel (when using the
> kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
> syscall). Since we no longer have the GART iomem resource, the userspace
> kexec has no way of knowing which region to exclude from the ELF header.
> 
> Changes from v1 of this patch:
> Instead of excluding the aperture from the ELF header, this patch
> makes /proc/vmcore return zeroes in the second kernel when attempting to
> read the aperture region. This is done by reusing the
> gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
> balooned memory. This works for both, the kexec_file_load and kexec_load
> syscalls.

This is funny and smart way. While it may not be good, E.g what if
people add iommu=off in kdump kernel? You made an assumption on this,
which people must use iommu in kdump kernel and aperture_pfn_start can
be got naturally.

As I said in v1, I listed 3 options. In fact 1st option is doable. Just
need to figure out why peopld did the quirk to enable gart anyway when
not enable gart in bios. You replied that you guess there's firmware bug
so that people have to do this. I have to say guess is not convincing.

Well, in fact, I have several questions:

1) In your case, the HP machine with gart, when you enable gart support
in bios, is it OK to you?

2) If firmware is broken, you can't enable gart in firmware, will
firmware engineer fix this since it's a firmware bug?

If above two have answer 'yes', then it's not a blocker to you. And we
should remove the quirk code which enable gart in kernel when people
forget enableing gart support in firmware.

OR

Take the option 3 as I said. Take the gart region off from iomem. But
you said there will be conflict with pci issue. If I did, I would make
clear what's the pci issue and why they conflict. Can we do in another
way to dig it out from iomem, e.g in different stage, after pci bus
iteration. 

I didn't reply to you later since I think my comments have been clear,
and I am very busy on rhel regression bug fix. Kernel bug fix sometime
is not easy. I don't find a gart system in redhat's lab, otherwise I
will make a post to have a try.

Thanks
Baoquan

> 
> [Note that the GART region is the same in the first and second kernels:
> regardless whether the first kernel fixed up the northbridge/bios setting
> and mapped the aperture over physical memory, the second kernel finds the
> northbridge properly configured by the first kernel and the aperture
> never overlaps with e820 memory because the second kernel has a fake e820
> map created from the crashkernel memory regions. Thus, the second kernel
> keeps the aperture address/size as configured by the first kernel.]
> 
> register_oldmem_pfn_is_ram can only register one callback and returns an error
> if the callback has been registered already. Since XEN used to be the only user
> of this function, it never checks the return value. Now that we have more than
> one user, I added a WARN_ON just in case agp, XEN, or any other future user of
> register_oldmem_pfn_is_ram were to step on each other's toes.
> 
> 
> Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")
> 
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index f5d92bc3b884..837efa32110c 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -30,6 +30,7 @@
>  #include <asm/dma.h>
>  #include <asm/amd_nb.h>
>  #include <asm/x86_init.h>
> +#include <linux/crash_dump.h>
>  
>  /*
>   * Using 512M as goal, in case kexec will load kernel_big
> @@ -56,6 +57,27 @@ int fallback_aper_force __initdata;
>  
>  int fix_aperture __initdata = 1;
>  
> +#ifdef CONFIG_PROC_VMCORE
> +static unsigned long aperture_pfn_start, aperture_page_count;
> +
> +static int gart_oldmem_pfn_is_ram(unsigned long pfn)
> +{
> +	return likely((pfn < aperture_pfn_start) ||
> +		      (pfn >= aperture_pfn_start + aperture_page_count));
> +}
> +
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +	aperture_pfn_start = aper_base >> PAGE_SHIFT;
> +	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
> +	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
> +}
> +#else
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +}
> +#endif
> +
>  /* This code runs before the PCI subsystem is initialized, so just
>     access the northbridge directly. */
>  
> @@ -435,8 +457,10 @@ int __init gart_iommu_hole_init(void)
>  
>  out:
>  	if (!fix && !fallback_aper_force) {
> -		if (last_aper_base)
> +		if (last_aper_base) {
> +			exclude_from_vmcore(last_aper_base, last_aper_order);
>  			return 1;
> +		}
>  		return 0;
>  	}
>  
> @@ -473,6 +497,8 @@ int __init gart_iommu_hole_init(void)
>  		return 0;
>  	}
>  
> +	exclude_from_vmcore(aper_alloc, aper_order);
> +
>  	/* Fix up the north bridges */
>  	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
>  		int bus, dev_base, dev_limit;
> diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
> index 2cfcfe4f6b2a..dd2ad82eee80 100644
> --- a/arch/x86/xen/mmu_hvm.c
> +++ b/arch/x86/xen/mmu_hvm.c
> @@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
>  	if (is_pagetable_dying_supported())
>  		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
>  #ifdef CONFIG_PROC_VMCORE
> -	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
> +	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
>  #endif
>  }
> -- 
> Jiri Bohac <jbohac@suse.cz>
> SUSE Labs, Prague, Czechia
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-16  1:01   ` Baoquan He
@ 2017-12-17 21:47     ` Borislav Petkov
  -1 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2017-12-17 21:47 UTC (permalink / raw)
  To: Baoquan He
  Cc: Jiri Bohac, Toshi Kani, David Airlie, Dave Young, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On Sat, Dec 16, 2017 at 09:01:42AM +0800, Baoquan He wrote:
> 2) If firmware is broken, you can't enable gart in firmware, will
> firmware engineer fix this since it's a firmware bug?

Slow down and get a reality check first please!

A firmware engineer will fix a 10yr old BIOS?!? Yeah right. And I'll get
a pink pony for Christmas. Geez.

We need a reliable way to tell the second kernel not to access the gart
range. And frankly, the best thing to do would be to teach the *second*
kernel to simply avoid the gart range. Regardless of what it gets told
by the ELF header. Because there are some ranges which it shouldn't
touch. Maybe we can reuse the gart detection code to do that in the
second kernel too.

But I haven't looked at it, might be hairy. Need to deal with this PTI
madness first.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2017-12-17 21:47     ` Borislav Petkov
  0 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2017-12-17 21:47 UTC (permalink / raw)
  To: Baoquan He
  Cc: Jiri Bohac, Toshi Kani, David Airlie, yinghai, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal

On Sat, Dec 16, 2017 at 09:01:42AM +0800, Baoquan He wrote:
> 2) If firmware is broken, you can't enable gart in firmware, will
> firmware engineer fix this since it's a firmware bug?

Slow down and get a reality check first please!

A firmware engineer will fix a 10yr old BIOS?!? Yeah right. And I'll get
a pink pony for Christmas. Geez.

We need a reliable way to tell the second kernel not to access the gart
range. And frankly, the best thing to do would be to teach the *second*
kernel to simply avoid the gart range. Regardless of what it gets told
by the ELF header. Because there are some ranges which it shouldn't
touch. Maybe we can reuse the gart detection code to do that in the
second kernel too.

But I haven't looked at it, might be hairy. Need to deal with this PTI
madness first.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-17 21:47     ` Borislav Petkov
@ 2017-12-18 13:47       ` Baoquan He
  -1 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-18 13:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Bohac, Toshi Kani, David Airlie, Dave Young, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On 12/17/17 at 10:47pm, Borislav Petkov wrote:
> On Sat, Dec 16, 2017 at 09:01:42AM +0800, Baoquan He wrote:
> > 2) If firmware is broken, you can't enable gart in firmware, will
> > firmware engineer fix this since it's a firmware bug?
> 
> Slow down and get a reality check first please!
> 
> A firmware engineer will fix a 10yr old BIOS?!? Yeah right. And I'll get
> a pink pony for Christmas. Geez.

The code is too old. I tried to find out the original commit, many files
moving commits make it not easy to track. In the current code, can only
see the pr_info telling people to "enable the IOMMU option in the BIOS
setup". No even one word to mention that it's for borken firmware.

>From Jiri's replying, he used 'guess', means the bug he is trying to fix
is not broken firmware case, but not enabling gart iommu support in bios.


gart_iommu_hole_init() {

	...

        } else if ((!no_iommu && max_pfn > MAX_DMA32_PFN) ||                                                                                     
                   force_iommu ||
                   valid_agp ||
                   fallback_aper_force) {
                pr_info("Your BIOS doesn't leave an aperture memory hole\n");
                pr_info("Please enable the IOMMU option in the BIOS setup\n");
                pr_info("This costs you %dMB of RAM\n",
                        32 << fallback_aper_order);
	...
}

> 
> We need a reliable way to tell the second kernel not to access the gart
> range. And frankly, the best thing to do would be to teach the *second*
> kernel to simply avoid the gart range. Regardless of what it gets told
> by the ELF header. Because there are some ranges which it shouldn't
> touch. Maybe we can reuse the gart detection code to do that in the
> second kernel too.

Previously people added gart region to iomem to notice that even though
there's ram mapped, while it's occupied by gart, please don't dump it.
Later it's reverted commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
region into resource map"). 

In fact, there are two ways to fix this bug. One is to tell kdump kernel
not to dump the region of gart even though there are ram mapped to that
region and added to vmemmap and direct mapping. This was done before and
reverted later.

The other is not to tell kdump kernel that there's ram mapped into the
region. In the mail I replied to Jiri's v1 post, I meant the 2nd way.
Remove the ram region occupied by gart from iomem, then kdump kernel
won't see it and won't dump it.

And note that when we talk about this gart issue, we only mean the case
that gart support is not enabled in bios. In this case, gart will find a
region of ram and occupy it as gart aperture. And this is done during
gart iommu init, and after that ram region has been added to memory
subsystem.

> 
> But I haven't looked at it, might be hairy. Need to deal with this PTI
> madness first.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2017-12-18 13:47       ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-18 13:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Bohac, Toshi Kani, David Airlie, yinghai, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal

On 12/17/17 at 10:47pm, Borislav Petkov wrote:
> On Sat, Dec 16, 2017 at 09:01:42AM +0800, Baoquan He wrote:
> > 2) If firmware is broken, you can't enable gart in firmware, will
> > firmware engineer fix this since it's a firmware bug?
> 
> Slow down and get a reality check first please!
> 
> A firmware engineer will fix a 10yr old BIOS?!? Yeah right. And I'll get
> a pink pony for Christmas. Geez.

The code is too old. I tried to find out the original commit, many files
moving commits make it not easy to track. In the current code, can only
see the pr_info telling people to "enable the IOMMU option in the BIOS
setup". No even one word to mention that it's for borken firmware.

From Jiri's replying, he used 'guess', means the bug he is trying to fix
is not broken firmware case, but not enabling gart iommu support in bios.


gart_iommu_hole_init() {

	...

        } else if ((!no_iommu && max_pfn > MAX_DMA32_PFN) ||                                                                                     
                   force_iommu ||
                   valid_agp ||
                   fallback_aper_force) {
                pr_info("Your BIOS doesn't leave an aperture memory hole\n");
                pr_info("Please enable the IOMMU option in the BIOS setup\n");
                pr_info("This costs you %dMB of RAM\n",
                        32 << fallback_aper_order);
	...
}

> 
> We need a reliable way to tell the second kernel not to access the gart
> range. And frankly, the best thing to do would be to teach the *second*
> kernel to simply avoid the gart range. Regardless of what it gets told
> by the ELF header. Because there are some ranges which it shouldn't
> touch. Maybe we can reuse the gart detection code to do that in the
> second kernel too.

Previously people added gart region to iomem to notice that even though
there's ram mapped, while it's occupied by gart, please don't dump it.
Later it's reverted commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
region into resource map"). 

In fact, there are two ways to fix this bug. One is to tell kdump kernel
not to dump the region of gart even though there are ram mapped to that
region and added to vmemmap and direct mapping. This was done before and
reverted later.

The other is not to tell kdump kernel that there's ram mapped into the
region. In the mail I replied to Jiri's v1 post, I meant the 2nd way.
Remove the ram region occupied by gart from iomem, then kdump kernel
won't see it and won't dump it.

And note that when we talk about this gart issue, we only mean the case
that gart support is not enabled in bios. In this case, gart will find a
region of ram and occupy it as gart aperture. And this is done during
gart iommu init, and after that ram region has been added to memory
subsystem.

> 
> But I haven't looked at it, might be hairy. Need to deal with this PTI
> madness first.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-18 13:47       ` Baoquan He
  (?)
@ 2017-12-18 14:37       ` Borislav Petkov
  2017-12-19  1:58           ` Baoquan He
  -1 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2017-12-18 14:37 UTC (permalink / raw)
  To: Baoquan He
  Cc: Jiri Bohac, Toshi Kani, David Airlie, Dave Young, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On Mon, Dec 18, 2017 at 09:47:36PM +0800, Baoquan He wrote:
>                 pr_info("Your BIOS doesn't leave an aperture memory hole\n");
>                 pr_info("Please enable the IOMMU option in the BIOS setup\n");
>                 pr_info("This costs you %dMB of RAM\n",
>                         32 << fallback_aper_order);
> 	...
> }

There are BIOSen where there's not even an IOMMU option to enable in the
first place. So forget fixing the firmware.

> Previously people added gart region to iomem to notice that even though
> there's ram mapped, while it's occupied by gart, please don't dump it.
> Later it's reverted commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
> region into resource map").

Yes, I read Jiri's commit message, TYVM.

> The other is not to tell kdump kernel that there's ram mapped into the
> region. In the mail I replied to Jiri's v1 post, I meant the 2nd way.
> Remove the ram region occupied by gart from iomem, then kdump kernel
> won't see it and won't dump it.

That's the wrong approach. Because this way you're lying in iomem about
the layout by hiding the gart range.

What needs to happen is to *exclude* the region from the dumping side
only, so that it doesn't touch it. Because the second kernel still needs
to show a *correct* iomem ranges list. Imagine someone looks at it
during debugging...

So I think Jiri's approach is the right thing to do.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-18 14:37       ` Borislav Petkov
@ 2017-12-19  1:58           ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-19  1:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Bohac, Toshi Kani, David Airlie, Dave Young, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On 12/18/17 at 03:37pm, Borislav Petkov wrote:
> On Mon, Dec 18, 2017 at 09:47:36PM +0800, Baoquan He wrote:
> >                 pr_info("Your BIOS doesn't leave an aperture memory hole\n");
> >                 pr_info("Please enable the IOMMU option in the BIOS setup\n");
> >                 pr_info("This costs you %dMB of RAM\n",
> >                         32 << fallback_aper_order);
> > 	...
> > }
> 
> There are BIOSen where there's not even an IOMMU option to enable in the
> first place. So forget fixing the firmware.

Yes, while GART might not be this case. Because in code there isn't any
information telling that ram region stealing is for borken firmware. And
even the pr_info is telling people to enable GART in bios.

In fact we at least should change the pr_info to pr_warn since this kind
of operation is not encouraged if firmware is not broken.


diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index f5d92bc3b884..281ba2b595aa 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -451,9 +451,9 @@ int __init gart_iommu_hole_init(void)
 		   force_iommu ||
 		   valid_agp ||
 		   fallback_aper_force) {
-		pr_info("Your BIOS doesn't leave an aperture memory hole\n");
-		pr_info("Please enable the IOMMU option in the BIOS setup\n");
-		pr_info("This costs you %dMB of RAM\n",
+		pr_warn("Your BIOS doesn't leave an aperture memory hole\n");
+		pr_warn("Please enable the IOMMU option in the BIOS setup\n");
+		pr_warn("This costs you %dMB of RAM\n",
 			32 << fallback_aper_order);
 
 		aper_order = fallback_aper_order;

> 
> > Previously people added gart region to iomem to notice that even though
> > there's ram mapped, while it's occupied by gart, please don't dump it.
> > Later it's reverted commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
> > region into resource map").
> 
> Yes, I read Jiri's commit message, TYVM.
> 
> > The other is not to tell kdump kernel that there's ram mapped into the
> > region. In the mail I replied to Jiri's v1 post, I meant the 2nd way.
> > Remove the ram region occupied by gart from iomem, then kdump kernel
> > won't see it and won't dump it.
> 
> That's the wrong approach. Because this way you're lying in iomem about
> the layout by hiding the gart range.
> 
> What needs to happen is to *exclude* the region from the dumping side
> only, so that it doesn't touch it. Because the second kernel still needs
> to show a *correct* iomem ranges list. Imagine someone looks at it
> during debugging...

With dmesg of 1st kernel, people should know the situation. In 1st
kernel, the region of ram memory is reserved in memblock, then no anyone
will touch it. It may not make sense to debug this region.

As the pr_info is saying, that region of ram is stolen by GART, just
GART won't use it. not sure if anyone will try to debug it on purpose.

> 
> So I think Jiri's approach is the right thing to do.

Hmm, as I have said in the first replying mail, the v2 will introduce
issues:

1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
will ignore the ram we need dump.

2) If 'iommu=off' is specified in kdump kernel, but not in 1st kernel,
it won't get the GART region, this patch does't work.

3) If people enable GART in bios, there's a ram memory hole for GART.
Nothing need to do while kdump kernel doesn't know GART is enabled or
not in bios, will try to avoid it anyway. It won't hurt anythig though,
in logic it's not suggested since confusion will be brought in.

Thanks
Baoquan

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2017-12-19  1:58           ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-19  1:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Bohac, Toshi Kani, David Airlie, yinghai, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal

On 12/18/17 at 03:37pm, Borislav Petkov wrote:
> On Mon, Dec 18, 2017 at 09:47:36PM +0800, Baoquan He wrote:
> >                 pr_info("Your BIOS doesn't leave an aperture memory hole\n");
> >                 pr_info("Please enable the IOMMU option in the BIOS setup\n");
> >                 pr_info("This costs you %dMB of RAM\n",
> >                         32 << fallback_aper_order);
> > 	...
> > }
> 
> There are BIOSen where there's not even an IOMMU option to enable in the
> first place. So forget fixing the firmware.

Yes, while GART might not be this case. Because in code there isn't any
information telling that ram region stealing is for borken firmware. And
even the pr_info is telling people to enable GART in bios.

In fact we at least should change the pr_info to pr_warn since this kind
of operation is not encouraged if firmware is not broken.


diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index f5d92bc3b884..281ba2b595aa 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -451,9 +451,9 @@ int __init gart_iommu_hole_init(void)
 		   force_iommu ||
 		   valid_agp ||
 		   fallback_aper_force) {
-		pr_info("Your BIOS doesn't leave an aperture memory hole\n");
-		pr_info("Please enable the IOMMU option in the BIOS setup\n");
-		pr_info("This costs you %dMB of RAM\n",
+		pr_warn("Your BIOS doesn't leave an aperture memory hole\n");
+		pr_warn("Please enable the IOMMU option in the BIOS setup\n");
+		pr_warn("This costs you %dMB of RAM\n",
 			32 << fallback_aper_order);
 
 		aper_order = fallback_aper_order;

> 
> > Previously people added gart region to iomem to notice that even though
> > there's ram mapped, while it's occupied by gart, please don't dump it.
> > Later it's reverted commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
> > region into resource map").
> 
> Yes, I read Jiri's commit message, TYVM.
> 
> > The other is not to tell kdump kernel that there's ram mapped into the
> > region. In the mail I replied to Jiri's v1 post, I meant the 2nd way.
> > Remove the ram region occupied by gart from iomem, then kdump kernel
> > won't see it and won't dump it.
> 
> That's the wrong approach. Because this way you're lying in iomem about
> the layout by hiding the gart range.
> 
> What needs to happen is to *exclude* the region from the dumping side
> only, so that it doesn't touch it. Because the second kernel still needs
> to show a *correct* iomem ranges list. Imagine someone looks at it
> during debugging...

With dmesg of 1st kernel, people should know the situation. In 1st
kernel, the region of ram memory is reserved in memblock, then no anyone
will touch it. It may not make sense to debug this region.

As the pr_info is saying, that region of ram is stolen by GART, just
GART won't use it. not sure if anyone will try to debug it on purpose.

> 
> So I think Jiri's approach is the right thing to do.

Hmm, as I have said in the first replying mail, the v2 will introduce
issues:

1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
will ignore the ram we need dump.

2) If 'iommu=off' is specified in kdump kernel, but not in 1st kernel,
it won't get the GART region, this patch does't work.

3) If people enable GART in bios, there's a ram memory hole for GART.
Nothing need to do while kdump kernel doesn't know GART is enabled or
not in bios, will try to avoid it anyway. It won't hurt anythig though,
in logic it's not suggested since confusion will be brought in.

Thanks
Baoquan

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-19  1:58           ` Baoquan He
@ 2017-12-19 17:58             ` Jiri Bohac
  -1 siblings, 0 replies; 23+ messages in thread
From: Jiri Bohac @ 2017-12-19 17:58 UTC (permalink / raw)
  To: Baoquan He
  Cc: Borislav Petkov, Toshi Kani, David Airlie, Dave Young, joro,
	kexec, linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal


On Tue, Dec 19, 2017 at 09:58:04AM +0800, Baoquan He wrote:
> Hmm, as I have said in the first replying mail, the v2 will introduce
> issues:
> 
> 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> will ignore the ram we need dump.

yes, instead of crashing the machine (because GART may be initialized in the
2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
fake e820 map sees as unused).

I'd say this is an improvement.

> 2) If 'iommu=off' is specified in kdump kernel, but not in 1st kernel,
> it won't get the GART region, this patch does't work.

No. It will work:

First kernel initializes the GART (either in a hole properly provided by the
BIOS or overlapping e820 RAM).

Second kernel will start with the GART initialized.  In gart_iommu_hole_init()
the setting is read from the northbridge registers and verified as valid. It
does not overlap e820 memory, because the second kernel has a fake e820 map
only spanning the crashkernel= reserved range. "fix" is never set to 1, so it
will exclude GART from vmcore in this path:

out:
        if (!fix && !fallback_aper_force) {
                if (last_aper_base) {
                        exclude_from_vmcore(last_aper_base, last_aper_order);
                        return 1;

(fix is never set to 1)
no_iommu is only checked after that.


> 3) If people enable GART in bios, there's a ram memory hole for GART.
> Nothing need to do while kdump kernel doesn't know GART is enabled or
> not in bios, will try to avoid it anyway. It won't hurt anythig though,
> in logic it's not suggested since confusion will be brought in.

I don't have easy access to the HP machines. I have a machine right here in our
lab that has this issue. It has no "enable GART" setting in BIOS.  It has a
"enable IOMMU" setting. The bug stays there regardless of the setting.
It's old. Noone will fix the firmware. The patch fixes it.

Thanks,

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2017-12-19 17:58             ` Jiri Bohac
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Bohac @ 2017-12-19 17:58 UTC (permalink / raw)
  To: Baoquan He
  Cc: Toshi Kani, David Airlie, yinghai, joro, kexec, linux-kernel,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal


On Tue, Dec 19, 2017 at 09:58:04AM +0800, Baoquan He wrote:
> Hmm, as I have said in the first replying mail, the v2 will introduce
> issues:
> 
> 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> will ignore the ram we need dump.

yes, instead of crashing the machine (because GART may be initialized in the
2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
fake e820 map sees as unused).

I'd say this is an improvement.

> 2) If 'iommu=off' is specified in kdump kernel, but not in 1st kernel,
> it won't get the GART region, this patch does't work.

No. It will work:

First kernel initializes the GART (either in a hole properly provided by the
BIOS or overlapping e820 RAM).

Second kernel will start with the GART initialized.  In gart_iommu_hole_init()
the setting is read from the northbridge registers and verified as valid. It
does not overlap e820 memory, because the second kernel has a fake e820 map
only spanning the crashkernel= reserved range. "fix" is never set to 1, so it
will exclude GART from vmcore in this path:

out:
        if (!fix && !fallback_aper_force) {
                if (last_aper_base) {
                        exclude_from_vmcore(last_aper_base, last_aper_order);
                        return 1;

(fix is never set to 1)
no_iommu is only checked after that.


> 3) If people enable GART in bios, there's a ram memory hole for GART.
> Nothing need to do while kdump kernel doesn't know GART is enabled or
> not in bios, will try to avoid it anyway. It won't hurt anythig though,
> in logic it's not suggested since confusion will be brought in.

I don't have easy access to the HP machines. I have a machine right here in our
lab that has this issue. It has no "enable GART" setting in BIOS.  It has a
"enable IOMMU" setting. The bug stays there regardless of the setting.
It's old. Noone will fix the firmware. The patch fixes it.

Thanks,

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-19 17:58             ` Jiri Bohac
@ 2017-12-27  7:44               ` Baoquan He
  -1 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-27  7:44 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Borislav Petkov, Toshi Kani, David Airlie, Dave Young, joro,
	kexec, linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On 12/19/17 at 06:58pm, Jiri Bohac wrote:

Sorry for late response. Please see the inline comments.

> 
> On Tue, Dec 19, 2017 at 09:58:04AM +0800, Baoquan He wrote:
> > Hmm, as I have said in the first replying mail, the v2 will introduce
> > issues:
> > 
> > 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> > will ignore the ram we need dump.
> 
> yes, instead of crashing the machine (because GART may be initialized in the
> 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> fake e820 map sees as unused).
> 
> I'd say this is an improvement.

I don't get what you said. If 'iommu=off' only specified in 1st kernel,
kdump kernel will think the memory which GART bar pointed as a hole.
This is incorrect. I don't see the improvement.

> 
> > 2) If 'iommu=off' is specified in kdump kernel, but not in 1st kernel,
> > it won't get the GART region, this patch does't work.
> 
> No. It will work:
> 
> First kernel initializes the GART (either in a hole properly provided by the
> BIOS or overlapping e820 RAM).
> 
> Second kernel will start with the GART initialized.  In gart_iommu_hole_init()
> the setting is read from the northbridge registers and verified as valid. It
> does not overlap e820 memory, because the second kernel has a fake e820 map
> only spanning the crashkernel= reserved range. "fix" is never set to 1, so it
> will exclude GART from vmcore in this path:
> 
> out:
>         if (!fix && !fallback_aper_force) {
>                 if (last_aper_base) {
>                         exclude_from_vmcore(last_aper_base, last_aper_order);
>                         return 1;
> 
> (fix is never set to 1)
> no_iommu is only checked after that.

Seems yes. Well, the interesting thing is 'iommu=off' doesn't even work,
right? Well, I don't know why the GART hardware/firmware/implementation
is so ..., well, freaky. Even though 'iommu=off' is specified
explicitly, it will initialize anyway.
> 
> 
> > 3) If people enable GART in bios, there's a ram memory hole for GART.
> > Nothing need to do while kdump kernel doesn't know GART is enabled or
> > not in bios, will try to avoid it anyway. It won't hurt anythig though,
> > in logic it's not suggested since confusion will be brought in.
> 
> I don't have easy access to the HP machines. I have a machine right here in our
> lab that has this issue. It has no "enable GART" setting in BIOS.  It has a
> "enable IOMMU" setting. The bug stays there regardless of the setting.
> It's old. Noone will fix the firmware. The patch fixes it.

OK, then we need fix it. In fact, in my personal opinion, if there's
a chance, we should avoid to fix it, because
 ..GART is too old, and systems with GART rarely are seen currently;
 ..The code is too freaky, no clear code comment. As you can see, we
usually clean up codes around too when we fix a found issue. While
there's no way to begin to do clean up for GART, and it's not worth
doing that.

I understand you could get a bug report from other people, and have to
fix it as an assignee. And this fix is located in aperture_64.c only,
I am fine it's done like this. Maybe you can try the way I suggested
that only removing the region from io resource, but not touching anything
else, if you have interest.

So if have to, could you add some code comments around your fix to notice
people why these code are introduced? Commit log can help to understand
added code, while sometime file moving may make this checking very hard.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2017-12-27  7:44               ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-27  7:44 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Toshi Kani, David Airlie, yinghai, joro, kexec, linux-kernel,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal

On 12/19/17 at 06:58pm, Jiri Bohac wrote:

Sorry for late response. Please see the inline comments.

> 
> On Tue, Dec 19, 2017 at 09:58:04AM +0800, Baoquan He wrote:
> > Hmm, as I have said in the first replying mail, the v2 will introduce
> > issues:
> > 
> > 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> > will ignore the ram we need dump.
> 
> yes, instead of crashing the machine (because GART may be initialized in the
> 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> fake e820 map sees as unused).
> 
> I'd say this is an improvement.

I don't get what you said. If 'iommu=off' only specified in 1st kernel,
kdump kernel will think the memory which GART bar pointed as a hole.
This is incorrect. I don't see the improvement.

> 
> > 2) If 'iommu=off' is specified in kdump kernel, but not in 1st kernel,
> > it won't get the GART region, this patch does't work.
> 
> No. It will work:
> 
> First kernel initializes the GART (either in a hole properly provided by the
> BIOS or overlapping e820 RAM).
> 
> Second kernel will start with the GART initialized.  In gart_iommu_hole_init()
> the setting is read from the northbridge registers and verified as valid. It
> does not overlap e820 memory, because the second kernel has a fake e820 map
> only spanning the crashkernel= reserved range. "fix" is never set to 1, so it
> will exclude GART from vmcore in this path:
> 
> out:
>         if (!fix && !fallback_aper_force) {
>                 if (last_aper_base) {
>                         exclude_from_vmcore(last_aper_base, last_aper_order);
>                         return 1;
> 
> (fix is never set to 1)
> no_iommu is only checked after that.

Seems yes. Well, the interesting thing is 'iommu=off' doesn't even work,
right? Well, I don't know why the GART hardware/firmware/implementation
is so ..., well, freaky. Even though 'iommu=off' is specified
explicitly, it will initialize anyway.
> 
> 
> > 3) If people enable GART in bios, there's a ram memory hole for GART.
> > Nothing need to do while kdump kernel doesn't know GART is enabled or
> > not in bios, will try to avoid it anyway. It won't hurt anythig though,
> > in logic it's not suggested since confusion will be brought in.
> 
> I don't have easy access to the HP machines. I have a machine right here in our
> lab that has this issue. It has no "enable GART" setting in BIOS.  It has a
> "enable IOMMU" setting. The bug stays there regardless of the setting.
> It's old. Noone will fix the firmware. The patch fixes it.

OK, then we need fix it. In fact, in my personal opinion, if there's
a chance, we should avoid to fix it, because
 ..GART is too old, and systems with GART rarely are seen currently;
 ..The code is too freaky, no clear code comment. As you can see, we
usually clean up codes around too when we fix a found issue. While
there's no way to begin to do clean up for GART, and it's not worth
doing that.

I understand you could get a bug report from other people, and have to
fix it as an assignee. And this fix is located in aperture_64.c only,
I am fine it's done like this. Maybe you can try the way I suggested
that only removing the region from io resource, but not touching anything
else, if you have interest.

So if have to, could you add some code comments around your fix to notice
people why these code are introduced? Commit log can help to understand
added code, while sometime file moving may make this checking very hard.

Thanks
Baoquan


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-27  7:44               ` Baoquan He
  (?)
@ 2017-12-27 12:25               ` Borislav Petkov
  2017-12-27 12:44                   ` Baoquan He
  -1 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2017-12-27 12:25 UTC (permalink / raw)
  To: Baoquan He
  Cc: Jiri Bohac, Toshi Kani, David Airlie, Dave Young, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > yes, instead of crashing the machine (because GART may be initialized in the
> > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > fake e820 map sees as unused).
> > 
> > I'd say this is an improvement.
> 
> I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> kdump kernel will think the memory which GART bar pointed as a hole.
> This is incorrect. I don't see the improvement.

So he says, this memory is unused. Why is that incorrect?!?

Wh do I care about dumping unused memory?!?!

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-27 12:25               ` Borislav Petkov
@ 2017-12-27 12:44                   ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-27 12:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Bohac, Toshi Kani, David Airlie, Dave Young, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On 12/27/17 at 01:25pm, Borislav Petkov wrote:
> On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > > yes, instead of crashing the machine (because GART may be initialized in the
> > > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > > fake e820 map sees as unused).
> > > 
> > > I'd say this is an improvement.
> > 
> > I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> > kdump kernel will think the memory which GART bar pointed as a hole.
> > This is incorrect. I don't see the improvement.
> 
> So he says, this memory is unused. Why is that incorrect?!?

'iommu=off' specified in 1st kernel, that region will be normal memory,
there could be important kernel data written into the place. While kdump
kernel will take that region as a gart aperture, trying to read data
from that region will cause error which Jiri originally tried to fix.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2017-12-27 12:44                   ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2017-12-27 12:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Bohac, Toshi Kani, David Airlie, yinghai, joro, kexec,
	linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal

On 12/27/17 at 01:25pm, Borislav Petkov wrote:
> On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > > yes, instead of crashing the machine (because GART may be initialized in the
> > > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > > fake e820 map sees as unused).
> > > 
> > > I'd say this is an improvement.
> > 
> > I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> > kdump kernel will think the memory which GART bar pointed as a hole.
> > This is incorrect. I don't see the improvement.
> 
> So he says, this memory is unused. Why is that incorrect?!?

'iommu=off' specified in 1st kernel, that region will be normal memory,
there could be important kernel data written into the place. While kdump
kernel will take that region as a gart aperture, trying to read data
from that region will cause error which Jiri originally tried to fix.


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2017-12-27  7:44               ` Baoquan He
@ 2018-01-06  1:00                 ` Jiri Bohac
  -1 siblings, 0 replies; 23+ messages in thread
From: Jiri Bohac @ 2018-01-06  1:00 UTC (permalink / raw)
  To: Baoquan He
  Cc: Borislav Petkov, Toshi Kani, David Airlie, Dave Young, joro,
	kexec, linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

Hi Baoquan,

thank you very much for your review!

On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > > 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> > > will ignore the ram we need dump.
> > 
> > yes, instead of crashing the machine (because GART may be initialized in the
> > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > fake e820 map sees as unused).
> > 
> > I'd say this is an improvement.
> 
> I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> kdump kernel will think the memory which GART bar pointed as a hole.
> This is incorrect. I don't see the improvement.

Without the patch, the kernel will crash/hang the machine, as it
will (correctly) try to dump the first kernel's memory that is
now mapped by the GART.

With the patch it will produce a dump with the data missing.
But this is just a corner case, I don't see why someone would
specify iommu=off in the first kernel and not the second.

So instead of crashing and producing no dump at all, we get a
dump with some data possibly missing.

> I understand you could get a bug report from other people, and have to
> fix it as an assignee. And this fix is located in aperture_64.c only,
> I am fine it's done like this. Maybe you can try the way I suggested
> that only removing the region from io resource, but not touching anything
> else, if you have interest.
> 
> So if have to, could you add some code comments around your fix to notice
> people why these code are introduced? Commit log can help to understand
> added code, while sometime file moving may make this checking very hard.

Sure, the full patch with comments added is below. Do the
comments make sense? Do you want me to re-post it as v3?:

---
On machines where the GART aperture is mapped over physical RAM
/proc/vmcore contains the remapped range and reading it may
cause hangs or reboots. 

In the past, the GART region was added into the resource map, implemented by
commit 56dd669a138c ("[PATCH] Insert GART region into resource map")

However, inserting the iomem_resource from the early GART code caused
resource conflicts with some AGP drivers (bko#72201), which got avoided by
reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
region into resource map"). This revert introduced the /proc/vmcore bug.

The vmcore ELF header is either prepared by the kernel (when using the
kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
syscall). Since we no longer have the GART iomem resource, the userspace
kexec has no way of knowing which region to exclude from the ELF header.

Changes from v1 of this patch:
Instead of excluding the aperture from the ELF header, this patch
makes /proc/vmcore return zeroes in the second kernel when attempting to
read the aperture region. This is done by reusing the
gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
balooned memory. This works for both, the kexec_file_load and kexec_load
syscalls.

[Note that the GART region is the same in the first and second kernels:
regardless whether the first kernel fixed up the northbridge/bios setting
and mapped the aperture over physical memory, the second kernel finds the
northbridge properly configured by the first kernel and the aperture
never overlaps with e820 memory because the second kernel has a fake e820
map created from the crashkernel memory regions. Thus, the second kernel
keeps the aperture address/size as configured by the first kernel.]

register_oldmem_pfn_is_ram can only register one callback and returns an error
if the callback has been registered already. Since XEN used to be the only user
of this function, it never checks the return value. Now that we have more than
one user, I added a WARN_ON just in case agp, XEN, or any other future user of
register_oldmem_pfn_is_ram were to step on each other's toes.


Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index f5d92bc3b884..2c4d5ece7456 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -30,6 +30,7 @@
 #include <asm/dma.h>
 #include <asm/amd_nb.h>
 #include <asm/x86_init.h>
+#include <linux/crash_dump.h>
 
 /*
  * Using 512M as goal, in case kexec will load kernel_big
@@ -56,6 +57,33 @@ int fallback_aper_force __initdata;
 
 int fix_aperture __initdata = 1;
 
+#ifdef CONFIG_PROC_VMCORE
+/*
+ * If the first kernel maps the aperture over e820 RAM, the kdump kernel will
+ * use the same range because it will remain configured in the northbridge.
+ * Trying to dump this area via /proc/vmcore may crash the machine, so exclude
+ * it from vmcore.
+ */
+static unsigned long aperture_pfn_start, aperture_page_count;
+
+static int gart_oldmem_pfn_is_ram(unsigned long pfn)
+{
+	return likely((pfn < aperture_pfn_start) ||
+		      (pfn >= aperture_pfn_start + aperture_page_count));
+}
+
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+	aperture_pfn_start = aper_base >> PAGE_SHIFT;
+	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
+	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
+}
+#else
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+}
+#endif
+
 /* This code runs before the PCI subsystem is initialized, so just
    access the northbridge directly. */
 
@@ -435,8 +463,16 @@ int __init gart_iommu_hole_init(void)
 
 out:
 	if (!fix && !fallback_aper_force) {
-		if (last_aper_base)
+		if (last_aper_base) {
+			/*
+			 * If this is the kdump kernel, the first kernel
+			 * may have allocated the range over its e820 RAM
+			 * and fixed up the northbridge
+			 */
+			exclude_from_vmcore(last_aper_base, last_aper_order);
+
 			return 1;
+		}
 		return 0;
 	}
 
@@ -473,6 +509,14 @@ int __init gart_iommu_hole_init(void)
 		return 0;
 	}
 
+	/*
+	 * If this is the kdump kernel _and_ the first kernel did not
+	 * configure the aperture in the northbridge, this range may
+	 * overlap with the first kernel's memory. We can't access the
+	 * range through vmcore even though it should be part of the dump.
+	 */
+	exclude_from_vmcore(aper_alloc, aper_order);
+
 	/* Fix up the north bridges */
 	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
 		int bus, dev_base, dev_limit;
diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index 2cfcfe4f6b2a..dd2ad82eee80 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
 	if (is_pagetable_dying_supported())
 		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
 #ifdef CONFIG_PROC_VMCORE
-	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
+	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
 #endif
 }

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2018-01-06  1:00                 ` Jiri Bohac
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Bohac @ 2018-01-06  1:00 UTC (permalink / raw)
  To: Baoquan He
  Cc: Toshi Kani, David Airlie, yinghai, joro, kexec, linux-kernel,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal

Hi Baoquan,

thank you very much for your review!

On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > > 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> > > will ignore the ram we need dump.
> > 
> > yes, instead of crashing the machine (because GART may be initialized in the
> > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > fake e820 map sees as unused).
> > 
> > I'd say this is an improvement.
> 
> I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> kdump kernel will think the memory which GART bar pointed as a hole.
> This is incorrect. I don't see the improvement.

Without the patch, the kernel will crash/hang the machine, as it
will (correctly) try to dump the first kernel's memory that is
now mapped by the GART.

With the patch it will produce a dump with the data missing.
But this is just a corner case, I don't see why someone would
specify iommu=off in the first kernel and not the second.

So instead of crashing and producing no dump at all, we get a
dump with some data possibly missing.

> I understand you could get a bug report from other people, and have to
> fix it as an assignee. And this fix is located in aperture_64.c only,
> I am fine it's done like this. Maybe you can try the way I suggested
> that only removing the region from io resource, but not touching anything
> else, if you have interest.
> 
> So if have to, could you add some code comments around your fix to notice
> people why these code are introduced? Commit log can help to understand
> added code, while sometime file moving may make this checking very hard.

Sure, the full patch with comments added is below. Do the
comments make sense? Do you want me to re-post it as v3?:

---
On machines where the GART aperture is mapped over physical RAM
/proc/vmcore contains the remapped range and reading it may
cause hangs or reboots. 

In the past, the GART region was added into the resource map, implemented by
commit 56dd669a138c ("[PATCH] Insert GART region into resource map")

However, inserting the iomem_resource from the early GART code caused
resource conflicts with some AGP drivers (bko#72201), which got avoided by
reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
region into resource map"). This revert introduced the /proc/vmcore bug.

The vmcore ELF header is either prepared by the kernel (when using the
kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
syscall). Since we no longer have the GART iomem resource, the userspace
kexec has no way of knowing which region to exclude from the ELF header.

Changes from v1 of this patch:
Instead of excluding the aperture from the ELF header, this patch
makes /proc/vmcore return zeroes in the second kernel when attempting to
read the aperture region. This is done by reusing the
gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
balooned memory. This works for both, the kexec_file_load and kexec_load
syscalls.

[Note that the GART region is the same in the first and second kernels:
regardless whether the first kernel fixed up the northbridge/bios setting
and mapped the aperture over physical memory, the second kernel finds the
northbridge properly configured by the first kernel and the aperture
never overlaps with e820 memory because the second kernel has a fake e820
map created from the crashkernel memory regions. Thus, the second kernel
keeps the aperture address/size as configured by the first kernel.]

register_oldmem_pfn_is_ram can only register one callback and returns an error
if the callback has been registered already. Since XEN used to be the only user
of this function, it never checks the return value. Now that we have more than
one user, I added a WARN_ON just in case agp, XEN, or any other future user of
register_oldmem_pfn_is_ram were to step on each other's toes.


Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index f5d92bc3b884..2c4d5ece7456 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -30,6 +30,7 @@
 #include <asm/dma.h>
 #include <asm/amd_nb.h>
 #include <asm/x86_init.h>
+#include <linux/crash_dump.h>
 
 /*
  * Using 512M as goal, in case kexec will load kernel_big
@@ -56,6 +57,33 @@ int fallback_aper_force __initdata;
 
 int fix_aperture __initdata = 1;
 
+#ifdef CONFIG_PROC_VMCORE
+/*
+ * If the first kernel maps the aperture over e820 RAM, the kdump kernel will
+ * use the same range because it will remain configured in the northbridge.
+ * Trying to dump this area via /proc/vmcore may crash the machine, so exclude
+ * it from vmcore.
+ */
+static unsigned long aperture_pfn_start, aperture_page_count;
+
+static int gart_oldmem_pfn_is_ram(unsigned long pfn)
+{
+	return likely((pfn < aperture_pfn_start) ||
+		      (pfn >= aperture_pfn_start + aperture_page_count));
+}
+
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+	aperture_pfn_start = aper_base >> PAGE_SHIFT;
+	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
+	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
+}
+#else
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+}
+#endif
+
 /* This code runs before the PCI subsystem is initialized, so just
    access the northbridge directly. */
 
@@ -435,8 +463,16 @@ int __init gart_iommu_hole_init(void)
 
 out:
 	if (!fix && !fallback_aper_force) {
-		if (last_aper_base)
+		if (last_aper_base) {
+			/*
+			 * If this is the kdump kernel, the first kernel
+			 * may have allocated the range over its e820 RAM
+			 * and fixed up the northbridge
+			 */
+			exclude_from_vmcore(last_aper_base, last_aper_order);
+
 			return 1;
+		}
 		return 0;
 	}
 
@@ -473,6 +509,14 @@ int __init gart_iommu_hole_init(void)
 		return 0;
 	}
 
+	/*
+	 * If this is the kdump kernel _and_ the first kernel did not
+	 * configure the aperture in the northbridge, this range may
+	 * overlap with the first kernel's memory. We can't access the
+	 * range through vmcore even though it should be part of the dump.
+	 */
+	exclude_from_vmcore(aper_alloc, aper_order);
+
 	/* Fix up the north bridges */
 	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
 		int bus, dev_base, dev_limit;
diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index 2cfcfe4f6b2a..dd2ad82eee80 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
 	if (is_pagetable_dying_supported())
 		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
 #ifdef CONFIG_PROC_VMCORE
-	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
+	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
 #endif
 }

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
  2018-01-06  1:00                 ` Jiri Bohac
@ 2018-01-09  6:19                   ` Baoquan He
  -1 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2018-01-09  6:19 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Borislav Petkov, Toshi Kani, David Airlie, Dave Young, joro,
	kexec, linux-kernel, Ingo Molnar, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, yinghai, Vivek Goyal

On 01/06/18 at 02:00am, Jiri Bohac wrote:
> Hi Baoquan,
> 
> thank you very much for your review!
> 
> On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > > > 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> > > > will ignore the ram we need dump.
> > > 
> > > yes, instead of crashing the machine (because GART may be initialized in the
> > > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > > fake e820 map sees as unused).
> > > 
> > > I'd say this is an improvement.
> > 
> > I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> > kdump kernel will think the memory which GART bar pointed as a hole.
> > This is incorrect. I don't see the improvement.
> 
> Without the patch, the kernel will crash/hang the machine, as it
> will (correctly) try to dump the first kernel's memory that is
> now mapped by the GART.
> 
> With the patch it will produce a dump with the data missing.
> But this is just a corner case, I don't see why someone would
> specify iommu=off in the first kernel and not the second.
> 
> So instead of crashing and producing no dump at all, we get a
> dump with some data possibly missing.
> 
> > I understand you could get a bug report from other people, and have to
> > fix it as an assignee. And this fix is located in aperture_64.c only,
> > I am fine it's done like this. Maybe you can try the way I suggested
> > that only removing the region from io resource, but not touching anything
> > else, if you have interest.
> > 
> > So if have to, could you add some code comments around your fix to notice
> > people why these code are introduced? Commit log can help to understand
> > added code, while sometime file moving may make this checking very hard.
> 
> Sure, the full patch with comments added is below. Do the
> comments make sense? Do you want me to re-post it as v3?:
> 
> ---
> On machines where the GART aperture is mapped over physical RAM
> /proc/vmcore contains the remapped range and reading it may
> cause hangs or reboots. 
> 
> In the past, the GART region was added into the resource map, implemented by
> commit 56dd669a138c ("[PATCH] Insert GART region into resource map")
> 
> However, inserting the iomem_resource from the early GART code caused
> resource conflicts with some AGP drivers (bko#72201), which got avoided by
> reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
> region into resource map"). This revert introduced the /proc/vmcore bug.
> 
> The vmcore ELF header is either prepared by the kernel (when using the
> kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
> syscall). Since we no longer have the GART iomem resource, the userspace
> kexec has no way of knowing which region to exclude from the ELF header.
> 
> Changes from v1 of this patch:
> Instead of excluding the aperture from the ELF header, this patch
> makes /proc/vmcore return zeroes in the second kernel when attempting to
> read the aperture region. This is done by reusing the
> gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
> balooned memory. This works for both, the kexec_file_load and kexec_load
> syscalls.
> 
> [Note that the GART region is the same in the first and second kernels:
> regardless whether the first kernel fixed up the northbridge/bios setting
> and mapped the aperture over physical memory, the second kernel finds the
> northbridge properly configured by the first kernel and the aperture
> never overlaps with e820 memory because the second kernel has a fake e820
> map created from the crashkernel memory regions. Thus, the second kernel
> keeps the aperture address/size as configured by the first kernel.]
> 
> register_oldmem_pfn_is_ram can only register one callback and returns an error
> if the callback has been registered already. Since XEN used to be the only user
> of this function, it never checks the return value. Now that we have more than
> one user, I added a WARN_ON just in case agp, XEN, or any other future user of
> register_oldmem_pfn_is_ram were to step on each other's toes.
> 
> 
> Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")

I am fine. There's no git branch for kdump since changes related
usually scatter in all components. For this one, you can post a new one
and send to Joerg since he maintains iommu code. And Andrew, he usually
helps to pick kdump kernel changes to his akpm tree.

Thanks
Baoquan

> 
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index f5d92bc3b884..2c4d5ece7456 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -30,6 +30,7 @@
>  #include <asm/dma.h>
>  #include <asm/amd_nb.h>
>  #include <asm/x86_init.h>
> +#include <linux/crash_dump.h>
>  
>  /*
>   * Using 512M as goal, in case kexec will load kernel_big
> @@ -56,6 +57,33 @@ int fallback_aper_force __initdata;
>  
>  int fix_aperture __initdata = 1;
>  
> +#ifdef CONFIG_PROC_VMCORE
> +/*
> + * If the first kernel maps the aperture over e820 RAM, the kdump kernel will
> + * use the same range because it will remain configured in the northbridge.
> + * Trying to dump this area via /proc/vmcore may crash the machine, so exclude
> + * it from vmcore.
> + */
> +static unsigned long aperture_pfn_start, aperture_page_count;
> +
> +static int gart_oldmem_pfn_is_ram(unsigned long pfn)
> +{
> +	return likely((pfn < aperture_pfn_start) ||
> +		      (pfn >= aperture_pfn_start + aperture_page_count));
> +}
> +
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +	aperture_pfn_start = aper_base >> PAGE_SHIFT;
> +	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
> +	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
> +}
> +#else
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +}
> +#endif
> +
>  /* This code runs before the PCI subsystem is initialized, so just
>     access the northbridge directly. */
>  
> @@ -435,8 +463,16 @@ int __init gart_iommu_hole_init(void)
>  
>  out:
>  	if (!fix && !fallback_aper_force) {
> -		if (last_aper_base)
> +		if (last_aper_base) {
> +			/*
> +			 * If this is the kdump kernel, the first kernel
> +			 * may have allocated the range over its e820 RAM
> +			 * and fixed up the northbridge
> +			 */
> +			exclude_from_vmcore(last_aper_base, last_aper_order);
> +
>  			return 1;
> +		}
>  		return 0;
>  	}
>  
> @@ -473,6 +509,14 @@ int __init gart_iommu_hole_init(void)
>  		return 0;
>  	}
>  
> +	/*
> +	 * If this is the kdump kernel _and_ the first kernel did not
> +	 * configure the aperture in the northbridge, this range may
> +	 * overlap with the first kernel's memory. We can't access the
> +	 * range through vmcore even though it should be part of the dump.
> +	 */
> +	exclude_from_vmcore(aper_alloc, aper_order);
> +
>  	/* Fix up the north bridges */
>  	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
>  		int bus, dev_base, dev_limit;
> diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
> index 2cfcfe4f6b2a..dd2ad82eee80 100644
> --- a/arch/x86/xen/mmu_hvm.c
> +++ b/arch/x86/xen/mmu_hvm.c
> @@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
>  	if (is_pagetable_dying_supported())
>  		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
>  #ifdef CONFIG_PROC_VMCORE
> -	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
> +	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
>  #endif
>  }
> 
> -- 
> Jiri Bohac <jbohac@suse.cz>
> SUSE Labs, Prague, Czechia
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore
@ 2018-01-09  6:19                   ` Baoquan He
  0 siblings, 0 replies; 23+ messages in thread
From: Baoquan He @ 2018-01-09  6:19 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Toshi Kani, David Airlie, yinghai, joro, kexec, linux-kernel,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Bjorn Helgaas,
	Thomas Gleixner, Dave Young, Vivek Goyal

On 01/06/18 at 02:00am, Jiri Bohac wrote:
> Hi Baoquan,
> 
> thank you very much for your review!
> 
> On Wed, Dec 27, 2017 at 03:44:49PM +0800, Baoquan He wrote:
> > > > 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> > > > will ignore the ram we need dump.
> > > 
> > > yes, instead of crashing the machine (because GART may be initialized in the
> > > 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> > > fake e820 map sees as unused).
> > > 
> > > I'd say this is an improvement.
> > 
> > I don't get what you said. If 'iommu=off' only specified in 1st kernel,
> > kdump kernel will think the memory which GART bar pointed as a hole.
> > This is incorrect. I don't see the improvement.
> 
> Without the patch, the kernel will crash/hang the machine, as it
> will (correctly) try to dump the first kernel's memory that is
> now mapped by the GART.
> 
> With the patch it will produce a dump with the data missing.
> But this is just a corner case, I don't see why someone would
> specify iommu=off in the first kernel and not the second.
> 
> So instead of crashing and producing no dump at all, we get a
> dump with some data possibly missing.
> 
> > I understand you could get a bug report from other people, and have to
> > fix it as an assignee. And this fix is located in aperture_64.c only,
> > I am fine it's done like this. Maybe you can try the way I suggested
> > that only removing the region from io resource, but not touching anything
> > else, if you have interest.
> > 
> > So if have to, could you add some code comments around your fix to notice
> > people why these code are introduced? Commit log can help to understand
> > added code, while sometime file moving may make this checking very hard.
> 
> Sure, the full patch with comments added is below. Do the
> comments make sense? Do you want me to re-post it as v3?:
> 
> ---
> On machines where the GART aperture is mapped over physical RAM
> /proc/vmcore contains the remapped range and reading it may
> cause hangs or reboots. 
> 
> In the past, the GART region was added into the resource map, implemented by
> commit 56dd669a138c ("[PATCH] Insert GART region into resource map")
> 
> However, inserting the iomem_resource from the early GART code caused
> resource conflicts with some AGP drivers (bko#72201), which got avoided by
> reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
> region into resource map"). This revert introduced the /proc/vmcore bug.
> 
> The vmcore ELF header is either prepared by the kernel (when using the
> kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
> syscall). Since we no longer have the GART iomem resource, the userspace
> kexec has no way of knowing which region to exclude from the ELF header.
> 
> Changes from v1 of this patch:
> Instead of excluding the aperture from the ELF header, this patch
> makes /proc/vmcore return zeroes in the second kernel when attempting to
> read the aperture region. This is done by reusing the
> gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
> balooned memory. This works for both, the kexec_file_load and kexec_load
> syscalls.
> 
> [Note that the GART region is the same in the first and second kernels:
> regardless whether the first kernel fixed up the northbridge/bios setting
> and mapped the aperture over physical memory, the second kernel finds the
> northbridge properly configured by the first kernel and the aperture
> never overlaps with e820 memory because the second kernel has a fake e820
> map created from the crashkernel memory regions. Thus, the second kernel
> keeps the aperture address/size as configured by the first kernel.]
> 
> register_oldmem_pfn_is_ram can only register one callback and returns an error
> if the callback has been registered already. Since XEN used to be the only user
> of this function, it never checks the return value. Now that we have more than
> one user, I added a WARN_ON just in case agp, XEN, or any other future user of
> register_oldmem_pfn_is_ram were to step on each other's toes.
> 
> 
> Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")

I am fine. There's no git branch for kdump since changes related
usually scatter in all components. For this one, you can post a new one
and send to Joerg since he maintains iommu code. And Andrew, he usually
helps to pick kdump kernel changes to his akpm tree.

Thanks
Baoquan

> 
> diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> index f5d92bc3b884..2c4d5ece7456 100644
> --- a/arch/x86/kernel/aperture_64.c
> +++ b/arch/x86/kernel/aperture_64.c
> @@ -30,6 +30,7 @@
>  #include <asm/dma.h>
>  #include <asm/amd_nb.h>
>  #include <asm/x86_init.h>
> +#include <linux/crash_dump.h>
>  
>  /*
>   * Using 512M as goal, in case kexec will load kernel_big
> @@ -56,6 +57,33 @@ int fallback_aper_force __initdata;
>  
>  int fix_aperture __initdata = 1;
>  
> +#ifdef CONFIG_PROC_VMCORE
> +/*
> + * If the first kernel maps the aperture over e820 RAM, the kdump kernel will
> + * use the same range because it will remain configured in the northbridge.
> + * Trying to dump this area via /proc/vmcore may crash the machine, so exclude
> + * it from vmcore.
> + */
> +static unsigned long aperture_pfn_start, aperture_page_count;
> +
> +static int gart_oldmem_pfn_is_ram(unsigned long pfn)
> +{
> +	return likely((pfn < aperture_pfn_start) ||
> +		      (pfn >= aperture_pfn_start + aperture_page_count));
> +}
> +
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +	aperture_pfn_start = aper_base >> PAGE_SHIFT;
> +	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
> +	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
> +}
> +#else
> +static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
> +{
> +}
> +#endif
> +
>  /* This code runs before the PCI subsystem is initialized, so just
>     access the northbridge directly. */
>  
> @@ -435,8 +463,16 @@ int __init gart_iommu_hole_init(void)
>  
>  out:
>  	if (!fix && !fallback_aper_force) {
> -		if (last_aper_base)
> +		if (last_aper_base) {
> +			/*
> +			 * If this is the kdump kernel, the first kernel
> +			 * may have allocated the range over its e820 RAM
> +			 * and fixed up the northbridge
> +			 */
> +			exclude_from_vmcore(last_aper_base, last_aper_order);
> +
>  			return 1;
> +		}
>  		return 0;
>  	}
>  
> @@ -473,6 +509,14 @@ int __init gart_iommu_hole_init(void)
>  		return 0;
>  	}
>  
> +	/*
> +	 * If this is the kdump kernel _and_ the first kernel did not
> +	 * configure the aperture in the northbridge, this range may
> +	 * overlap with the first kernel's memory. We can't access the
> +	 * range through vmcore even though it should be part of the dump.
> +	 */
> +	exclude_from_vmcore(aper_alloc, aper_order);
> +
>  	/* Fix up the north bridges */
>  	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
>  		int bus, dev_base, dev_limit;
> diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
> index 2cfcfe4f6b2a..dd2ad82eee80 100644
> --- a/arch/x86/xen/mmu_hvm.c
> +++ b/arch/x86/xen/mmu_hvm.c
> @@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
>  	if (is_pagetable_dying_supported())
>  		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
>  #ifdef CONFIG_PROC_VMCORE
> -	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
> +	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
>  #endif
>  }
> 
> -- 
> Jiri Bohac <jbohac@suse.cz>
> SUSE Labs, Prague, Czechia
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [tip:x86/mm] x86/gart: Exclude GART aperture from vmcore
  2018-01-06  1:00                 ` Jiri Bohac
  (?)
  (?)
@ 2018-01-11 14:13                 ` tip-bot for Jiri Bohac
  -1 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Jiri Bohac @ 2018-01-11 14:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mingo, hpa, bp, jbohac, linux-kernel, dyoung, bhelgaas,
	airlied, toshi.kani, bhe, vgoyal

Commit-ID:  2a3e83c6f96c513f43ce5a8c9034608ea584a255
Gitweb:     https://git.kernel.org/tip/2a3e83c6f96c513f43ce5a8c9034608ea584a255
Author:     Jiri Bohac <jbohac@suse.cz>
AuthorDate: Sat, 6 Jan 2018 02:00:13 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 11 Jan 2018 15:09:24 +0100

x86/gart: Exclude GART aperture from vmcore

On machines where the GART aperture is mapped over physical RAM
/proc/vmcore contains the remapped range and reading it may cause hangs or
reboots.

In the past, the GART region was added into the resource map, implemented
by commit 56dd669a138c ("[PATCH] Insert GART region into resource map")

However, inserting the iomem_resource from the early GART code caused
resource conflicts with some AGP drivers (bko#72201), which got avoided by
reverting the patch in commit 707d4eefbdb3 ("Revert [PATCH] Insert GART
region into resource map"). This revert introduced the /proc/vmcore bug.

The vmcore ELF header is either prepared by the kernel (when using the
kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
syscall). Since we no longer have the GART iomem resource, the userspace
kexec has no way of knowing which region to exclude from the ELF header.

Changes from v1 of this patch:
Instead of excluding the aperture from the ELF header, this patch
makes /proc/vmcore return zeroes in the second kernel when attempting to
read the aperture region. This is done by reusing the
gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
balooned memory. This works for both, the kexec_file_load and kexec_load
syscalls.

[Note that the GART region is the same in the first and second kernels:
regardless whether the first kernel fixed up the northbridge/bios setting
and mapped the aperture over physical memory, the second kernel finds the
northbridge properly configured by the first kernel and the aperture
never overlaps with e820 memory because the second kernel has a fake e820
map created from the crashkernel memory regions. Thus, the second kernel
keeps the aperture address/size as configured by the first kernel.]

register_oldmem_pfn_is_ram can only register one callback and returns an error
if the callback has been registered already. Since XEN used to be the only user
of this function, it never checks the return value. Now that we have more than
one user, I added a WARN_ON just in case agp, XEN, or any other future user of
register_oldmem_pfn_is_ram were to step on each other's toes.

Fixes: 707d4eefbdb3 ("Revert [PATCH] Insert GART region into resource map")
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: David Airlie <airlied@linux.ie>
Cc: yinghai@kernel.org
Cc: joro@8bytes.org
Cc: kexec@lists.infradead.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: https://lkml.kernel.org/r/20180106010013.73suskgxm7lox7g6@dwarf.suse.cz

---
 arch/x86/kernel/aperture_64.c | 46 ++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/xen/mmu_hvm.c        |  2 +-
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index f5d92bc..2c4d5ec 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -30,6 +30,7 @@
 #include <asm/dma.h>
 #include <asm/amd_nb.h>
 #include <asm/x86_init.h>
+#include <linux/crash_dump.h>
 
 /*
  * Using 512M as goal, in case kexec will load kernel_big
@@ -56,6 +57,33 @@ int fallback_aper_force __initdata;
 
 int fix_aperture __initdata = 1;
 
+#ifdef CONFIG_PROC_VMCORE
+/*
+ * If the first kernel maps the aperture over e820 RAM, the kdump kernel will
+ * use the same range because it will remain configured in the northbridge.
+ * Trying to dump this area via /proc/vmcore may crash the machine, so exclude
+ * it from vmcore.
+ */
+static unsigned long aperture_pfn_start, aperture_page_count;
+
+static int gart_oldmem_pfn_is_ram(unsigned long pfn)
+{
+	return likely((pfn < aperture_pfn_start) ||
+		      (pfn >= aperture_pfn_start + aperture_page_count));
+}
+
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+	aperture_pfn_start = aper_base >> PAGE_SHIFT;
+	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
+	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
+}
+#else
+static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+{
+}
+#endif
+
 /* This code runs before the PCI subsystem is initialized, so just
    access the northbridge directly. */
 
@@ -435,8 +463,16 @@ int __init gart_iommu_hole_init(void)
 
 out:
 	if (!fix && !fallback_aper_force) {
-		if (last_aper_base)
+		if (last_aper_base) {
+			/*
+			 * If this is the kdump kernel, the first kernel
+			 * may have allocated the range over its e820 RAM
+			 * and fixed up the northbridge
+			 */
+			exclude_from_vmcore(last_aper_base, last_aper_order);
+
 			return 1;
+		}
 		return 0;
 	}
 
@@ -473,6 +509,14 @@ out:
 		return 0;
 	}
 
+	/*
+	 * If this is the kdump kernel _and_ the first kernel did not
+	 * configure the aperture in the northbridge, this range may
+	 * overlap with the first kernel's memory. We can't access the
+	 * range through vmcore even though it should be part of the dump.
+	 */
+	exclude_from_vmcore(aper_alloc, aper_order);
+
 	/* Fix up the north bridges */
 	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
 		int bus, dev_base, dev_limit;
diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index 2cfcfe4..dd2ad82 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -75,6 +75,6 @@ void __init xen_hvm_init_mmu_ops(void)
 	if (is_pagetable_dying_supported())
 		pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
 #ifdef CONFIG_PROC_VMCORE
-	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
+	WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
 #endif
 }

^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-01-11 14:18 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-16  0:15 [PATCH v2] x86/kexec: Exclude GART aperture from vmcore Jiri Bohac
2017-12-16  0:15 ` Jiri Bohac
2017-12-16  1:01 ` Baoquan He
2017-12-16  1:01   ` Baoquan He
2017-12-17 21:47   ` Borislav Petkov
2017-12-17 21:47     ` Borislav Petkov
2017-12-18 13:47     ` Baoquan He
2017-12-18 13:47       ` Baoquan He
2017-12-18 14:37       ` Borislav Petkov
2017-12-19  1:58         ` Baoquan He
2017-12-19  1:58           ` Baoquan He
2017-12-19 17:58           ` Jiri Bohac
2017-12-19 17:58             ` Jiri Bohac
2017-12-27  7:44             ` Baoquan He
2017-12-27  7:44               ` Baoquan He
2017-12-27 12:25               ` Borislav Petkov
2017-12-27 12:44                 ` Baoquan He
2017-12-27 12:44                   ` Baoquan He
2018-01-06  1:00               ` Jiri Bohac
2018-01-06  1:00                 ` Jiri Bohac
2018-01-09  6:19                 ` Baoquan He
2018-01-09  6:19                   ` Baoquan He
2018-01-11 14:13                 ` [tip:x86/mm] x86/gart: " tip-bot for Jiri Bohac

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.