linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table
@ 2018-11-24  5:12 Lianbo Jiang
  2018-11-24  5:12 ` [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Lianbo Jiang @ 2018-11-24  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

These patches add the new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces, and in order to make it still
work after the new descriptor is added, these codes originally related
to 'IORES_DESC_NONE' have been updated.

In addition, for the MMCONFIG issue and the SME kdump issue, it is
necessary to pass the e820 reserved ranges to kdump kernel.

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite these patches log.

Changes since v6:
1. Modify the [PATCH 1/2], and add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also updates these codes relates to 'IORES_DESC_NONE'.
2. Modify the [PATCH 2/2], and walk through io resource based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Lianbo Jiang (2):
  resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/ia64/kernel/efi.c  | 4 ++++
 arch/x86/kernel/crash.c | 6 ++++++
 arch/x86/kernel/e820.c  | 2 +-
 arch/x86/mm/ioremap.c   | 9 ++++++++-
 include/linux/ioport.h  | 1 +
 kernel/resource.c       | 6 +++---
 6 files changed, 23 insertions(+), 5 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  2018-11-24  5:12 [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
@ 2018-11-24  5:12 ` Lianbo Jiang
  2018-11-26 20:52   ` Dave Hansen
  2018-11-24  5:12 ` [PATCH 2/2 RESEND v7] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Lianbo Jiang @ 2018-11-24  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

The upstream kernel can not accurately add the e820 reserved type to
kdump krenel e820 table.

Kdump uses walk_iomem_res_desc() to iterate io resources, then adds
the matched resource ranges to the e820 table for kdump kernel. But,
when convert the e820 type to the iores descriptor, several e820
types are converted to 'IORES_DESC_NONE' in this function e820_type
_to_iores_desc(). So the walk_iomem_res_desc() will get the redundant
types(such as E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_KERN) when
walk through io resources with the descriptor 'IORES_DESC_NONE'.

This patch adds the new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces. It is helpful to exactly
match the reserved resource ranges when walking through iomem resources.

Furthermore, in order to make it still work after the new descriptor
is added, these codes originally related to 'IORES_DESC_NONE' have
been updated.

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/ia64/kernel/efi.c | 4 ++++
 arch/x86/kernel/e820.c | 2 +-
 arch/x86/mm/ioremap.c  | 9 ++++++++-
 include/linux/ioport.h | 1 +
 kernel/resource.c      | 6 +++---
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 8f106638913c..1841e9b4db30 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource *code_resource,
 				break;
 
 			case EFI_RESERVED_TYPE:
+				name = "reserved";
+				desc = IORES_DESC_RESERVED;
+				break;
+
 			case EFI_RUNTIME_SERVICES_CODE:
 			case EFI_RUNTIME_SERVICES_DATA:
 			case EFI_ACPI_RECLAIM_MEMORY:
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 50895c2f937d..57fafdafb860 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1048,10 +1048,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
 	case E820_TYPE_NVS:		return IORES_DESC_ACPI_NV_STORAGE;
 	case E820_TYPE_PMEM:		return IORES_DESC_PERSISTENT_MEMORY;
 	case E820_TYPE_PRAM:		return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+	case E820_TYPE_RESERVED:	return IORES_DESC_RESERVED;
 	case E820_TYPE_RESERVED_KERN:	/* Fall-through: */
 	case E820_TYPE_RAM:		/* Fall-through: */
 	case E820_TYPE_UNUSABLE:	/* Fall-through: */
-	case E820_TYPE_RESERVED:	/* Fall-through: */
 	default:			return IORES_DESC_NONE;
 	}
 }
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 5378d10f1d31..91b6112e7489 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -83,7 +83,14 @@ static bool __ioremap_check_ram(struct resource *res)
 
 static int __ioremap_check_desc_other(struct resource *res)
 {
-	return (res->desc != IORES_DESC_NONE);
+	/*
+	 * The E820_TYPE_RESERVED was converted to the IORES_DESC_NONE
+	 * before the new IORES_DESC_RESERVED is added, so it contained
+	 * the e820 reserved type. In order to make it still work for
+	 * SEV, here keep it the same as before.
+	 */
+	return ((res->desc != IORES_DESC_NONE) ||
+		(res->desc != IORES_DESC_RESERVED));
 }
 
 static int __ioremap_res_check(struct resource *res, void *arg)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
 	IORES_DESC_PERSISTENT_MEMORY_LEGACY	= 5,
 	IORES_DESC_DEVICE_PRIVATE_MEMORY	= 6,
 	IORES_DESC_DEVICE_PUBLIC_MEMORY		= 7,
+	IORES_DESC_RESERVED			= 8,
 };
 
 /* helpers to define resources */
diff --git a/kernel/resource.c b/kernel/resource.c
index b0fbf685c77a..f34a632c4169 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -994,7 +994,7 @@ __reserve_region_with_split(struct resource *root, resource_size_t start,
 	res->start = start;
 	res->end = end;
 	res->flags = type | IORESOURCE_BUSY;
-	res->desc = IORES_DESC_NONE;
+	res->desc = IORES_DESC_RESERVED;
 
 	while (1) {
 
@@ -1029,7 +1029,7 @@ __reserve_region_with_split(struct resource *root, resource_size_t start,
 				next_res->start = conflict->end + 1;
 				next_res->end = end;
 				next_res->flags = type | IORESOURCE_BUSY;
-				next_res->desc = IORES_DESC_NONE;
+				next_res->desc = IORES_DESC_RESERVED;
 			}
 		} else {
 			res->start = conflict->end + 1;
@@ -1477,7 +1477,7 @@ static int __init reserve_setup(char *str)
 			res->start = io_start;
 			res->end = io_start + io_num - 1;
 			res->flags |= IORESOURCE_BUSY;
-			res->desc = IORES_DESC_NONE;
+			res->desc = IORES_DESC_RESERVED;
 			res->child = NULL;
 			if (request_resource(parent, res) == 0)
 				reserved = x+1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2 RESEND v7] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table
  2018-11-24  5:12 [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
  2018-11-24  5:12 ` [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
@ 2018-11-24  5:12 ` Lianbo Jiang
  2018-11-26 17:44 ` [PATCH 0/2 RESEND v7] add reserved e820 ranges to the " Dave Hansen
  2018-11-26 18:54 ` Dave Hansen
  3 siblings, 0 replies; 13+ messages in thread
From: Lianbo Jiang @ 2018-11-24  5:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

At present, when use the kexec_file_load syscall to load the kernel image
and initramfs(for example: kexec -s -p xxx), the upstream kernel does not
pass the e820 reserved ranges to the second kernel, which might cause two
problems:

The first one is the MMCONFIG issue. The basic problem is that this device
is in PCI segment 1 and the kernel PCI probing can not find it without all
the e820 I/O reservations being present in the e820 table. And the kdump
kernel does not have those reservations because the kexec command does not
pass the I/O reservation via the "memmap=xxx" command line option. (This
problem does not show up for other vendors, as SGI is apparently the
actually fails for everyone, but devices in segment 0 are then found by
some legacy lookup method.) The workaround for this is to pass the I/O
reserved regions to the kdump kernel.

MMCONFIG(aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be albe to access devices on
non-0 segments, (c) you won't be able to access extended config space(
address 0x100-0xffff), which means none of the Extended Capabilities will
be available(AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in kdump kernel, actually, those
reserved regions are still decrypted, but because those reserved ranges are
not present at all in kdump kernel e820 table, those reserved regions are
considered as encrypted, it goes wrong.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to kdump kernel.

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/x86/kernel/crash.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index f631a3f15587..5354a84f1684 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -380,6 +380,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 	walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
 			memmap_entry_callback);
 
+	/* Add e820 reserved ranges */
+	cmd.type = E820_TYPE_RESERVED;
+	flags = IORESOURCE_MEM;
+	walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+			   memmap_entry_callback);
+
 	/* Add crashk_low_res region */
 	if (crashk_low_res.end) {
 		ei.addr = crashk_low_res.start;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table
  2018-11-24  5:12 [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
  2018-11-24  5:12 ` [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
  2018-11-24  5:12 ` [PATCH 2/2 RESEND v7] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
@ 2018-11-26 17:44 ` Dave Hansen
  2018-11-26 18:04   ` Borislav Petkov
  2018-11-26 18:54 ` Dave Hansen
  3 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2018-11-26 17:44 UTC (permalink / raw)
  To: Lianbo Jiang, linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

On 11/23/18 9:12 PM, Lianbo Jiang wrote:
> These patches add the new I/O resource descriptor 'IORES_DESC_RESERVED'
> for the iomem resources search interfaces, and in order to make it still
> work after the new descriptor is added, these codes originally related
> to 'IORES_DESC_NONE' have been updated.

I'm having a really hard time figuring out what problem these patches solve.

What end-user visible behavior does this set change?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table
  2018-11-26 17:44 ` [PATCH 0/2 RESEND v7] add reserved e820 ranges to the " Dave Hansen
@ 2018-11-26 18:04   ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2018-11-26 18:04 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Lianbo Jiang, linux-kernel, kexec, x86, linux-ia64, linux-efi,
	tglx, mingo, akpm, dave.hansen, luto, peterz, ard.biesheuvel,
	tony.luck, fenghua.yu, dyoung, bhe

On Mon, Nov 26, 2018 at 09:44:46AM -0800, Dave Hansen wrote:
> On 11/23/18 9:12 PM, Lianbo Jiang wrote:
> > These patches add the new I/O resource descriptor 'IORES_DESC_RESERVED'
> > for the iomem resources search interfaces, and in order to make it still
> > work after the new descriptor is added, these codes originally related
> > to 'IORES_DESC_NONE' have been updated.
> 
> I'm having a really hard time figuring out what problem these patches solve.
> 
> What end-user visible behavior does this set change?

Yeah, that was a long process of digging out the "why".

I'd recommend reading the previous threads and, more specifically:

https://lkml.kernel.org/r/CABhMZUUscS3jUZUSM5Y6EYJK6weo7Mjj5-EAKGvbw0qEe%2B38zw@mail.gmail.com

in short: we need to export e820 reserved ranges to the second kernel so
that it can talk to PCI devices properly. (And I'm wondering how it did
work until now...)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table
  2018-11-24  5:12 [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
                   ` (2 preceding siblings ...)
  2018-11-26 17:44 ` [PATCH 0/2 RESEND v7] add reserved e820 ranges to the " Dave Hansen
@ 2018-11-26 18:54 ` Dave Hansen
  2018-11-27  2:58   ` lijiang
  3 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2018-11-26 18:54 UTC (permalink / raw)
  To: Lianbo Jiang, linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

On 11/23/18 9:12 PM, Lianbo Jiang wrote:
> These patches add the new I/O resource descriptor 'IORES_DESC_RESERVED'
> for the iomem resources search interfaces, and in order to make it still
> work after the new descriptor is added, these codes originally related
> to 'IORES_DESC_NONE' have been updated.

This is rather anemic "0/" text.  Could you please include some more
background in here?  The 2/2 patch is pretty good in this regard, but it
needs to be here, too.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  2018-11-24  5:12 ` [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
@ 2018-11-26 20:52   ` Dave Hansen
  2018-11-27 10:04     ` lijiang
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2018-11-26 20:52 UTC (permalink / raw)
  To: Lianbo Jiang, linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

On 11/23/18 9:12 PM, Lianbo Jiang wrote:
> The upstream kernel can not accurately add the e820 reserved type to> kdump krenel e820 table.

	^ kernel

> Kdump uses walk_iomem_res_desc() to iterate io resources, then adds
> the matched resource ranges to the e820 table for kdump kernel. But,
> when convert the e820 type to the iores descriptor, several e820
> types are converted to 'IORES_DESC_NONE' in this function e820_type
> _to_iores_desc(). So the walk_iomem_res_desc() will get the redundant
> types(such as E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_KERN) when
> walk through io resources with the descriptor 'IORES_DESC_NONE'.

Let me see if I understand.

When doing kexec, the old kernel makes a new e820 table for the new
kernel.  The old kernel constructs the new e820 table from
'iomem_resource'.  But, when creating the 'iomem_resource' tree,
reserved areas like E820_TYPE_RESERVED are not properly passed through.
 This causes problems like described in the next patch.

> This patch adds the new I/O resource descriptor 'IORES_DESC_RESERVED'
> for the iomem resources search interfaces. It is helpful to exactly
> match the reserved resource ranges when walking through iomem resources.

It's more than that, though.  You're specifically storing the reserved
area(s) when we see them come through the firmware.

> Furthermore, in order to make it still work after the new descriptor
> is added, these codes originally related to 'IORES_DESC_NONE' have
> been updated.

It would be nice to explain why they needed to be updated and what
breaks if they are not.

> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 5378d10f1d31..91b6112e7489 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -83,7 +83,14 @@ static bool __ioremap_check_ram(struct resource *res)
>  
>  static int __ioremap_check_desc_other(struct resource *res)
>  {
> -	return (res->desc != IORES_DESC_NONE);
> +	/*
> +	 * The E820_TYPE_RESERVED was converted to the IORES_DESC_NONE
> +	 * before the new IORES_DESC_RESERVED is added, so it contained
> +	 * the e820 reserved type. In order to make it still work for
> +	 * SEV, here keep it the same as before.
> +	 */
> +	return ((res->desc != IORES_DESC_NONE) ||
> +		(res->desc != IORES_DESC_RESERVED));
>  }

After reading the changelog and the comment, I still have no idea why
this hunk is here.  Could you try to explain a bit more?



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table
  2018-11-26 18:54 ` Dave Hansen
@ 2018-11-27  2:58   ` lijiang
  0 siblings, 0 replies; 13+ messages in thread
From: lijiang @ 2018-11-27  2:58 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

在 2018年11月27日 02:54, Dave Hansen 写道:
> On 11/23/18 9:12 PM, Lianbo Jiang wrote:
>> These patches add the new I/O resource descriptor 'IORES_DESC_RESERVED'
>> for the iomem resources search interfaces, and in order to make it still
>> work after the new descriptor is added, these codes originally related
>> to 'IORES_DESC_NONE' have been updated.
> 
> This is rather anemic "0/" text.  Could you please include some more
> background in here?  The 2/2 patch is pretty good in this regard, but it
> needs to be here, too.
> 

Thanks for your comment. 

Originally, this patch added the e820 reserved ranges by accurately comparing
a string. It only modifies fewer code paths. Please refer to patch v6.
https://lore.kernel.org/lkml/20181114072926.13312-2-lijiang@redhat.com/

Because the string comparison is fragile and error-prone, this patch used the
solution that adds a new descriptor 'IORES_DESC_RESERVED'. Please refer to this
link: https://lore.kernel.org/lkml/20181120192935.GK2527@zn.tnic/

When passing the e820 reserved ranges to the second kernel, why does it need to
compare strings accurately or add a new descriptor 'IORES_DESC_RESERVED'?

-The reason is that it can not exactly match the e820 reserved resource ranges
when walking through iomem resources with the descriptor 'IORES_DESC_NONE'.

Thanks.
Lianbo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  2018-11-26 20:52   ` Dave Hansen
@ 2018-11-27 10:04     ` lijiang
  2018-11-27 15:34       ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: lijiang @ 2018-11-27 10:04 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

在 2018年11月27日 04:52, Dave Hansen 写道:
> On 11/23/18 9:12 PM, Lianbo Jiang wrote:
>> The upstream kernel can not accurately add the e820 reserved type to> kdump krenel e820 table.
> 
> 	^ kernel
> 
>> Kdump uses walk_iomem_res_desc() to iterate io resources, then adds
>> the matched resource ranges to the e820 table for kdump kernel. But,
>> when convert the e820 type to the iores descriptor, several e820
>> types are converted to 'IORES_DESC_NONE' in this function e820_type
>> _to_iores_desc(). So the walk_iomem_res_desc() will get the redundant
>> types(such as E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_KERN) when
>> walk through io resources with the descriptor 'IORES_DESC_NONE'.
> 
> Let me see if I understand.
> 
> When doing kexec, the old kernel makes a new e820 table for the new
> kernel.  The old kernel constructs the new e820 table from
> 'iomem_resource'.  But, when creating the 'iomem_resource' tree,
> reserved areas like E820_TYPE_RESERVED are not properly passed through.
>  This causes problems like described in the next patch.
> 

When doing kexec_file_load, the old kernel need to pass the e820 reserved ranges
to the kdump kernel. But kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources with the descriptor 'IORES_DESC_NONE', because
the descriptor 'IORES_DESC_NONE' contains four e820 types, such as 'E820_TYPE_RAM',
'E820_TYPE_UNUSABLE', 'E820_TYPE_RESERVED_KERN', 'E820_TYPE_RESERVED'. It may pass
these four types to the kdump kernel, this is not desired result.

Please refer to this function e820_type_to_iores_desc(). (arch/x86/kernel/e820.c)

>> This patch adds the new I/O resource descriptor 'IORES_DESC_RESERVED'
>> for the iomem resources search interfaces. It is helpful to exactly
>> match the reserved resource ranges when walking through iomem resources.
> 
> It's more than that, though.  You're specifically storing the reserved
> area(s) when we see them come through the firmware.
> 
>> Furthermore, in order to make it still work after the new descriptor
>> is added, these codes originally related to 'IORES_DESC_NONE' have
>> been updated.
> 
> It would be nice to explain why they needed to be updated and what
> breaks if they are not.
> 

If these code paths have not been updated, the reserved regions descriptor is still marked
as the 'IORES_DESC_NONE' instead of 'IORES_DESC_RESERVED', this is not correct and it is 
also easily confused.

And also cause some errors, such as the following function __ioremap_check_desc_other().

>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 5378d10f1d31..91b6112e7489 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -83,7 +83,14 @@ static bool __ioremap_check_ram(struct resource *res)
>>  
>>  static int __ioremap_check_desc_other(struct resource *res)
>>  {
>> -	return (res->desc != IORES_DESC_NONE);
>> +	/*
>> +	 * The E820_TYPE_RESERVED was converted to the IORES_DESC_NONE
>> +	 * before the new IORES_DESC_RESERVED is added, so it contained
>> +	 * the e820 reserved type. In order to make it still work for
>> +	 * SEV, here keep it the same as before.
>> +	 */
>> +	return ((res->desc != IORES_DESC_NONE) ||
>> +		(res->desc != IORES_DESC_RESERVED));
>>  }
> 
> After reading the changelog and the comment, I still have no idea why
> this hunk is here.  Could you try to explain a bit more?
> 
> 

Thanks for your comment.

What happens if we don't modify this functions __ioremap_check_desc_other()?
-When SEV is active, it might map these reserved regions with the encryption mask. That is incorrect.
Therefore, keep it the same as before in order to make it still work for the SEV case.

**BTW**:
In this function __ioremap_check_desc_other(), it should use the '&&' operator instead of
the '||' operator. I made a mistake in this function(sorry for this), i will improve this
function in patch v8 and post again like this:

static int __ioremap_check_desc_other(struct resource *res)
{
    return ((res->desc != IORES_DESC_NONE) &&
            (res->desc != IORES_DESC_RESERVED));
}


Regards,
Lianbo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  2018-11-27 10:04     ` lijiang
@ 2018-11-27 15:34       ` Dave Hansen
  2018-11-28  3:51         ` lijiang
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2018-11-27 15:34 UTC (permalink / raw)
  To: lijiang, linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

On 11/27/18 2:04 AM, lijiang wrote:
> What happens if we don't modify this functions
> __ioremap_check_desc_other()? -When SEV is active, it might map these
> reserved regions with the encryption mask. That is incorrect. 

This is missing another sentence or two to "connect the dots".

SEV uses data that comes from the e820 table to tell whether or not the
memory should be encrypted?  If we don't reflect these reserved areas in
the e820 table, the SEV code will set up encrypted mappings for device
memory, for instance?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  2018-11-27 15:34       ` Dave Hansen
@ 2018-11-28  3:51         ` lijiang
  2018-11-28 16:02           ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: lijiang @ 2018-11-28  3:51 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

在 2018年11月27日 23:34, Dave Hansen 写道:
> On 11/27/18 2:04 AM, lijiang wrote:
>> What happens if we don't modify this functions
>> __ioremap_check_desc_other()? -When SEV is active, it might map these
>> reserved regions with the encryption mask. That is incorrect. 
> 
> This is missing another sentence or two to "connect the dots".
> 
> SEV uses data that comes from the e820 table to tell whether or not the
> memory should be encrypted?  If we don't reflect these reserved areas in
> the e820 table, the SEV code will set up encrypted mappings for device
> memory, for instance?
> 

For the convenience of description, here copy some code fragments and put them at the end.
Please refer to these codes.

When the SEV is active, the page being mapped will determine whether to use the memory
encryption attribute for mapping based on the 'mem_flags.desc_other'.

But the value of 'mem_flags.desc_other' is set according to the returned result of this
function __ioremap_check_desc_other().

Originally, the 'E820_TYPE_RESERVED' type is converted to the descriptor 'IORES_DESC_NONE',
therefore, for the e820 reserved type, the value of 'mem_flags.desc_other' is equal to
'false'.

But now, the 'E820_TYPE_RESERVED' type is converted to the descriptor 'IORES_DESC_RESERVED'
instead of 'IORES_DESC_NONE', it has been changed and the value of 'mem_flags.desc_other'
is equal to 'true'. This is wrong.

So that would be nice to keep it the same as before, that is to say, this function has to
be improved like this.
static int __ioremap_check_desc_other(struct resource *res)
{
    return ((res->desc != IORES_DESC_NONE) &&
            (res->desc != IORES_DESC_RESERVED));
}


Please look at the following function __ioremap_caller(), and also list the call trace path.

Call trace path:
   __ioremap_caller()->
__ioremap_check_mem()->
       walk_mem_res()->
__ioremap_res_check()-> 
__ioremap_check_desc_other()

static void __iomem *__ioremap_caller(resource_size_t phys_addr,
                unsigned long size, enum page_cache_mode pcm,
                void *caller, bool encrypted)
{
        unsigned long offset, vaddr;
        resource_size_t last_addr;
        const resource_size_t unaligned_phys_addr = phys_addr;
        const unsigned long unaligned_size = size;
        struct ioremap_mem_flags mem_flags;

	/* skip some codes...*/

        __ioremap_check_mem(phys_addr, size, &mem_flags);

	/* skip some codes...*/

	/*
         * If the page being mapped is in memory and SEV is active then
         * make sure the memory encryption attribute is enabled in the
         * resulting mapping.
         */
        prot = PAGE_KERNEL_IO;
        if ((sev_active() && mem_flags.desc_other) || encrypted)
                prot = pgprot_encrypted(prot);

	/* skip some codes...*/
}

Thanks.
Lianbo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  2018-11-28  3:51         ` lijiang
@ 2018-11-28 16:02           ` Dave Hansen
  2018-11-29  2:14             ` lijiang
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2018-11-28 16:02 UTC (permalink / raw)
  To: lijiang, linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

On 11/27/18 7:51 PM, lijiang wrote:
> But now, the 'E820_TYPE_RESERVED' type is converted to the descriptor 'IORES_DESC_RESERVED'
> instead of 'IORES_DESC_NONE', it has been changed and the value of 'mem_flags.desc_other'
> is equal to 'true'. This is wrong.

Thanks for the improved description of the problem.  Seems to all make
sense now.

Could you include this in the changelog for the next version?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  2018-11-28 16:02           ` Dave Hansen
@ 2018-11-29  2:14             ` lijiang
  0 siblings, 0 replies; 13+ messages in thread
From: lijiang @ 2018-11-29  2:14 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: kexec, x86, linux-ia64, linux-efi, tglx, mingo, bp, akpm,
	dave.hansen, luto, peterz, ard.biesheuvel, tony.luck, fenghua.yu,
	dyoung, bhe

在 2018年11月29日 00:02, Dave Hansen 写道:
> On 11/27/18 7:51 PM, lijiang wrote:
>> But now, the 'E820_TYPE_RESERVED' type is converted to the descriptor 'IORES_DESC_RESERVED'
>> instead of 'IORES_DESC_NONE', it has been changed and the value of 'mem_flags.desc_other'
>> is equal to 'true'. This is wrong.
> 
> Thanks for the improved description of the problem.  Seems to all make
> sense now.
> 
> Could you include this in the changelog for the next version?
> 

Ok, i will include these changes in patch v8. Also thanks for your advice.

Regards,
Lianbo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-11-29  2:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-24  5:12 [PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
2018-11-24  5:12 ` [PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED' Lianbo Jiang
2018-11-26 20:52   ` Dave Hansen
2018-11-27 10:04     ` lijiang
2018-11-27 15:34       ` Dave Hansen
2018-11-28  3:51         ` lijiang
2018-11-28 16:02           ` Dave Hansen
2018-11-29  2:14             ` lijiang
2018-11-24  5:12 ` [PATCH 2/2 RESEND v7] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
2018-11-26 17:44 ` [PATCH 0/2 RESEND v7] add reserved e820 ranges to the " Dave Hansen
2018-11-26 18:04   ` Borislav Petkov
2018-11-26 18:54 ` Dave Hansen
2018-11-27  2:58   ` lijiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).