linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2 v5] add reserved e820 ranges to the kdump kernel e820 table
@ 2018-11-07  5:00 Lianbo Jiang
  2018-11-07  5:00 ` [PATCH 1/2 v5] x86/kexec_file: add e820 entry in case e820 type string matches to io resource name Lianbo Jiang
  2018-11-07  5:00 ` [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
  0 siblings, 2 replies; 9+ messages in thread
From: Lianbo Jiang @ 2018-11-07  5:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: kexec, x86, tglx, mingo, bp, akpm, dyoung, bhe

E820 reserved ranges is useful in kdump kernel, it has been added in
kexec-tools code.

One reason is PCI mmconf (extended mode) requires reserved region otherwise
it falls back to legacy mode, and also outputs the following kernel log.

Example:
......
[   19.798354] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[   19.800653] [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
[   19.800995] PCI: not using MMCONFIG
......

The correct kernel log is like this:
......
[    0.082649] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[    0.083610] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
......

Furthermore, when AMD SME kdump support, it needs to map dmi table area
as decrypted. For normal boot, these ranges sit in e820 reserved ranges,
thus the early ioremap code naturally map them as decrypted. If it also
has same e820 reserve setup in kdump kernel then it will just work like
normal kernel.

Kdump uses walk_iomem_res_desc to iterate resources, then adds matched
desc to e820 table for the kdump kernel.

But IORES_DESC_NONE resource type includes several different e820 types,
we need add exact e820 type to the kdump kernel e820 table, thus it also
needs an extra checking in memmap_entry_callback() to match the e820 type
and resource name.

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Lianbo Jiang (2):
  x86/kexec_file: add e820 entry in case e820 type string matches to io
    resource name
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/x86/include/asm/e820/api.h |  2 ++
 arch/x86/kernel/crash.c         | 10 +++++++++-
 arch/x86/kernel/e820.c          |  2 +-
 kernel/resource.c               |  1 +
 4 files changed, 13 insertions(+), 2 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2 v5] x86/kexec_file: add e820 entry in case e820 type string matches to io resource name
  2018-11-07  5:00 [PATCH 0/2 v5] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
@ 2018-11-07  5:00 ` Lianbo Jiang
  2018-11-07  5:17   ` Baoquan He
  2018-11-07  5:00 ` [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
  1 sibling, 1 reply; 9+ messages in thread
From: Lianbo Jiang @ 2018-11-07  5:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: kexec, x86, tglx, mingo, bp, akpm, dyoung, bhe

kdump uses walk_iomem_res_desc() to iterate io resources, then adds matched
desc to e820 table for kdump kernel.

But IORES_DESC_NONE resource type includes several different e820 types,
we need add exact e820 type to kdump kernel e820 table, thus it also needs
an extra checking in memmap_entry_callback() to match the e820 type and
resource name.

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/x86/include/asm/e820/api.h | 2 ++
 arch/x86/kernel/crash.c         | 6 +++++-
 arch/x86/kernel/e820.c          | 2 +-
 kernel/resource.c               | 1 +
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 62be73b23d5c..6d5451b36e80 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -42,6 +42,8 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn);
 
 extern int  e820__get_entry_type(u64 start, u64 end);
 
+extern const char *e820_type_to_string(struct e820_entry *entry);
+
 /*
  * Returns true iff the specified range [start,end) is completely contained inside
  * the ISA region.
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index f631a3f15587..ae724a6e0a5f 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -37,6 +37,7 @@
 #include <asm/reboot.h>
 #include <asm/virtext.h>
 #include <asm/intel_pt.h>
+#include <asm/e820/api.h>
 
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
@@ -314,11 +315,14 @@ static int memmap_entry_callback(struct resource *res, void *arg)
 	struct crash_memmap_data *cmd = arg;
 	struct boot_params *params = cmd->params;
 	struct e820_entry ei;
+	const char *name;
 
 	ei.addr = res->start;
 	ei.size = resource_size(res);
 	ei.type = cmd->type;
-	add_e820_entry(params, &ei);
+	name = e820_type_to_string(&ei);
+	if (res->name && !strcmp(name, res->name))
+		add_e820_entry(params, &ei);
 
 	return 0;
 }
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 50895c2f937d..4c1fe4f8db1e 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1011,7 +1011,7 @@ void __init e820__finish_early_params(void)
 	}
 }
 
-static const char *__init e820_type_to_string(struct e820_entry *entry)
+const char *e820_type_to_string(struct e820_entry *entry)
 {
 	switch (entry->type) {
 	case E820_TYPE_RESERVED_KERN:	/* Fall-through: */
diff --git a/kernel/resource.c b/kernel/resource.c
index b3a3a1fc499e..6285a6b4de6c 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -366,6 +366,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
 	res->end = min(end, p->end);
 	res->flags = p->flags;
 	res->desc = p->desc;
+	res->name = p->name;
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table
  2018-11-07  5:00 [PATCH 0/2 v5] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
  2018-11-07  5:00 ` [PATCH 1/2 v5] x86/kexec_file: add e820 entry in case e820 type string matches to io resource name Lianbo Jiang
@ 2018-11-07  5:00 ` Lianbo Jiang
  2018-11-07  5:23   ` Baoquan He
  1 sibling, 1 reply; 9+ messages in thread
From: Lianbo Jiang @ 2018-11-07  5:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: kexec, x86, tglx, mingo, bp, akpm, dyoung, bhe

E820 reserved ranges is useful in kdump kernel, it has been added in
kexec-tools code.

One reason is PCI mmconf (extended mode) requires reserved region otherwise
it falls back to legacy mode, and also outputs the following kernel log.

Example:
......
[   19.798354] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[   19.800653] [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
[   19.800995] PCI: not using MMCONFIG
......

The correct kernel log is like this:
......
[    0.082649] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[    0.083610] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
......

Furthermore, when AMD SME kdump support, it needs to map dmi table area
as decrypted. For normal boot, these ranges sit in e820 reserved ranges,
thus the early ioremap code naturally map them as decrypted. If it also
has same e820 reserve setup in kdump kernel then it will just work like
normal kernel.

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/x86/kernel/crash.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index ae724a6e0a5f..d3167125800e 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -384,6 +384,10 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 	walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
 			memmap_entry_callback);
 
+	cmd.type = E820_TYPE_RESERVED;
+	walk_iomem_res_desc(IORES_DESC_NONE, 0, 0, -1, &cmd,
+			   memmap_entry_callback);
+
 	/* Add crashk_low_res region */
 	if (crashk_low_res.end) {
 		ei.addr = crashk_low_res.start;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2 v5] x86/kexec_file: add e820 entry in case e820 type string matches to io resource name
  2018-11-07  5:00 ` [PATCH 1/2 v5] x86/kexec_file: add e820 entry in case e820 type string matches to io resource name Lianbo Jiang
@ 2018-11-07  5:17   ` Baoquan He
  0 siblings, 0 replies; 9+ messages in thread
From: Baoquan He @ 2018-11-07  5:17 UTC (permalink / raw)
  To: Lianbo Jiang; +Cc: linux-kernel, kexec, x86, tglx, mingo, bp, akpm, dyoung

Hi Lianbo,

On 11/07/18 at 01:00pm, Lianbo Jiang wrote:
> kdump uses walk_iomem_res_desc() to iterate io resources, then adds matched
> desc to e820 table for kdump kernel.
> 
> But IORES_DESC_NONE resource type includes several different e820 types,

As we discussed privately, better say more clearly what is
"IORES_DESC_NONE resource type includes several different e820 types".

You can tell:

When convert e820 type into iores descriptors, several e820 types are
all converted to IORES_DESC_NONE in function e820_type_to_iores_desc().

Here may no need to list below code if you have told clearly what
happened. Otherwise could you tell what it mean about the sentence:
"IORES_DESC_NONE resource type includes several different e820 types".

They are diffeent types, why one type includes another type, how does it
include? Why it matters when one type includes another type? Why do you
care?

static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)                                                                     
{
        switch (entry->type) {
        case E820_TYPE_ACPI:            return IORES_DESC_ACPI_TABLES;
        case E820_TYPE_NVS:             return IORES_DESC_ACPI_NV_STORAGE;
        case E820_TYPE_PMEM:            return IORES_DESC_PERSISTENT_MEMORY;
        case E820_TYPE_PRAM:            return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
        case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
        case E820_TYPE_RAM:             /* Fall-through: */
        case E820_TYPE_UNUSABLE:        /* Fall-through: */
        case E820_TYPE_RESERVED:        /* Fall-through: */
        default:                        return IORES_DESC_NONE;
        }
}

> we need add exact e820 type to kdump kernel e820 table, thus it also needs

Why we need add exact e820 type to kdump kernel e820 table? just because 
"IORES_DESC_NONE resource type includes several different e820 types"?
Why we didn't need it before, why need it now?

Add it for what? Fix a bug/add a feature?

> an extra checking in memmap_entry_callback() to match the e820 type and
> resource name.

This paragraph only tells how you fix it. "IORES_DESC_NONE resource type
includes several different e820 types" only tells a fact. So what's
impacted by the including? Why i'ts impacted?

> 
> Suggested-by: Dave Young <dyoung@redhat.com>
> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
> ---
>  arch/x86/include/asm/e820/api.h | 2 ++
>  arch/x86/kernel/crash.c         | 6 +++++-
>  arch/x86/kernel/e820.c          | 2 +-
>  kernel/resource.c               | 1 +
>  4 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
> index 62be73b23d5c..6d5451b36e80 100644
> --- a/arch/x86/include/asm/e820/api.h
> +++ b/arch/x86/include/asm/e820/api.h
> @@ -42,6 +42,8 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn);
>  
>  extern int  e820__get_entry_type(u64 start, u64 end);
>  
> +extern const char *e820_type_to_string(struct e820_entry *entry);
> +
>  /*
>   * Returns true iff the specified range [start,end) is completely contained inside
>   * the ISA region.
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index f631a3f15587..ae724a6e0a5f 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -37,6 +37,7 @@
>  #include <asm/reboot.h>
>  #include <asm/virtext.h>
>  #include <asm/intel_pt.h>
> +#include <asm/e820/api.h>
>  
>  /* Used while preparing memory map entries for second kernel */
>  struct crash_memmap_data {
> @@ -314,11 +315,14 @@ static int memmap_entry_callback(struct resource *res, void *arg)
>  	struct crash_memmap_data *cmd = arg;
>  	struct boot_params *params = cmd->params;
>  	struct e820_entry ei;
> +	const char *name;
>  
>  	ei.addr = res->start;
>  	ei.size = resource_size(res);
>  	ei.type = cmd->type;
> -	add_e820_entry(params, &ei);
> +	name = e820_type_to_string(&ei);
> +	if (res->name && !strcmp(name, res->name))
> +		add_e820_entry(params, &ei);
>  
>  	return 0;
>  }
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 50895c2f937d..4c1fe4f8db1e 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1011,7 +1011,7 @@ void __init e820__finish_early_params(void)
>  	}
>  }
>  
> -static const char *__init e820_type_to_string(struct e820_entry *entry)
> +const char *e820_type_to_string(struct e820_entry *entry)
>  {
>  	switch (entry->type) {
>  	case E820_TYPE_RESERVED_KERN:	/* Fall-through: */
> diff --git a/kernel/resource.c b/kernel/resource.c
> index b3a3a1fc499e..6285a6b4de6c 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -366,6 +366,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>  	res->end = min(end, p->end);
>  	res->flags = p->flags;
>  	res->desc = p->desc;
> +	res->name = p->name;
>  	return 0;
>  }
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table
  2018-11-07  5:00 ` [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
@ 2018-11-07  5:23   ` Baoquan He
  2018-11-07  9:10     ` lijiang
  0 siblings, 1 reply; 9+ messages in thread
From: Baoquan He @ 2018-11-07  5:23 UTC (permalink / raw)
  To: Lianbo Jiang; +Cc: linux-kernel, kexec, x86, tglx, mingo, bp, akpm, dyoung

On 11/07/18 at 01:00pm, Lianbo Jiang wrote:
> E820 reserved ranges is useful in kdump kernel, it has been added in
> kexec-tools code.
> 
> One reason is PCI mmconf (extended mode) requires reserved region otherwise
> it falls back to legacy mode, and also outputs the following kernel log.

OK, it falls back to legacy mode, and also output kernel log, except of
these, does it crash kernel? kdump kernel hang? Can we leave it if it
only ouptut kernel log?

> 
> Example:
> ......
> [   19.798354] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> [   19.800653] [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> [   19.800995] PCI: not using MMCONFIG
> ......
> 
> The correct kernel log is like this:
> ......
> [    0.082649] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> [    0.083610] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> ......
> 
> Furthermore, when AMD SME kdump support, it needs to map dmi table area
> as decrypted. For normal boot, these ranges sit in e820 reserved ranges,
> thus the early ioremap code naturally map them as decrypted. If it also
> has same e820 reserve setup in kdump kernel then it will just work like
> normal kernel.

Why do we care? If don't fix, what's happening?

Lianbo, for a bug fix, please describe the problems. Then give out the
analysis about root cause. 


> 
> Suggested-by: Dave Young <dyoung@redhat.com>
> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
> ---
>  arch/x86/kernel/crash.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index ae724a6e0a5f..d3167125800e 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -384,6 +384,10 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
>  	walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
>  			memmap_entry_callback);
>  
> +	cmd.type = E820_TYPE_RESERVED;
> +	walk_iomem_res_desc(IORES_DESC_NONE, 0, 0, -1, &cmd,
> +			   memmap_entry_callback);
> +
>  	/* Add crashk_low_res region */
>  	if (crashk_low_res.end) {
>  		ei.addr = crashk_low_res.start;
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table
  2018-11-07  5:23   ` Baoquan He
@ 2018-11-07  9:10     ` lijiang
  2018-11-07  9:35       ` Dave Young
  2018-11-09  6:51       ` Baoquan He
  0 siblings, 2 replies; 9+ messages in thread
From: lijiang @ 2018-11-07  9:10 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, kexec, x86, tglx, mingo, bp, akpm, dyoung

在 2018年11月07日 13:23, Baoquan He 写道:
> On 11/07/18 at 01:00pm, Lianbo Jiang wrote:
>> E820 reserved ranges is useful in kdump kernel, it has been added in
>> kexec-tools code.
>>
>> One reason is PCI mmconf (extended mode) requires reserved region otherwise
>> it falls back to legacy mode, and also outputs the following kernel log.
> 
> OK, it falls back to legacy mode, and also output kernel log, except of
> these, does it crash kernel? kdump kernel hang? Can we leave it if it
> only ouptut kernel log?
> 
>>
>> Example:
>> ......
>> [   19.798354] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>> [   19.800653] [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
>> [   19.800995] PCI: not using MMCONFIG
>> ......
>>
>> The correct kernel log is like this:
>> ......
>> [    0.082649] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>> [    0.083610] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
>> ......
>>
>> Furthermore, when AMD SME kdump support, it needs to map dmi table area
>> as decrypted. For normal boot, these ranges sit in e820 reserved ranges,
>> thus the early ioremap code naturally map them as decrypted. If it also
>> has same e820 reserve setup in kdump kernel then it will just work like
>> normal kernel.
> 
> Why do we care? If don't fix, what's happening?
> 
> Lianbo, for a bug fix, please describe the problems. Then give out the
> analysis about root cause. 
> 

Thanks for your comment in detail.

In fact, these patches are really simple. As the subject mentioned, this patch
[PATCH 2/2] adds the reserved e820 ranges to kdump kernel e820 table, and the
first patch [PATCH 1/2] helps to exactly add the e820(E820_TYPE_RESERVED) type
to kdump kernel e820 table, that is to say, it will filter out some unnecessary
type(E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_RESERVED_KERN).

At present, when we use the kexec to load the kernel image and initramfs(for
example: kexec -s -p xxxx), the latest kernel does not pass the e820 reserved
ranges to the second kernel, which might produce two problems:

The first one is the MMCONFIG issue, although which does not make the system
crash or hang, this issue is still a potential risk, because my test can't
cover all cases due to resource constraints(Machine), and i'm not sure what
it will happen on other machine.

The second issue is that the e820 reserved ranges do not setup in kdump kernel,
which will cause some functions which are related to the e820 reserved ranges
to become invalid. For example:

early_memremap()->
early_memremap_pgprot_adjust()->
memremap_should_map_decrypted()->
e820__get_entry_type()

Please focus on these functions, early_memremap_pgprot_adjust() and memremap_should_map_decrypted().

In the first kernel, these ranges sit in e820 reserved ranges, so the memremap_should_map_decrypted()
will return true, that is to say, the reserved memory is decrypted, then the early_memremap_pgprot_adjust()
will call the pgprot_decrypted() to clear the memory encryption mask.

In the second kernel, because the e820 reserved ranges are not passed to the second kernel, these ranges
don't sit in the e820 reserved ranges, so the the memremap_should_map_decrypted() will return false, that
is to say, the reserved memory is encrypted, and then the early_memremap_pgprot_adjust() will also call the
pgprot_encrypted() to set the memory encryption mask.

Obviously, in the second kernel, the e820 reserved memory is still decrypted, it has gone wrong. So, if
don't fix, kdump won't work when we use the command(kexec -s -p xxx) to load the kernel image and initramfs.

Hope this helps.

Thanks,
Lianbo

pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
                                             unsigned long size,
                                             pgprot_t prot)
{
        bool encrypted_prot;

        if (!mem_encrypt_active())
                return prot;

        encrypted_prot = true;

	//......

        if (encrypted_prot && memremap_should_map_decrypted(phys_addr, size))
                encrypted_prot = false;

        return encrypted_prot ? pgprot_encrypted(prot)
                              : pgprot_decrypted(prot);
}

static bool memremap_should_map_decrypted(resource_size_t phys_addr,
                                          unsigned long size)
{
        int is_pmem;

	//......

        /* Check if the address is outside kernel usable area */
        switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
        case E820_TYPE_RESERVED:
        case E820_TYPE_ACPI:
        case E820_TYPE_NVS:
        case E820_TYPE_UNUSABLE:
                /* For SEV, these areas are encrypted */
                if (sev_active())
                        break;
                /* Fallthrough */

        case E820_TYPE_PRAM:
                return true;
        default:
                break;
        }

        return false;
}


> 
>>
>> Suggested-by: Dave Young <dyoung@redhat.com>
>> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
>> ---
>>  arch/x86/kernel/crash.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index ae724a6e0a5f..d3167125800e 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -384,6 +384,10 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
>>  	walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
>>  			memmap_entry_callback);
>>  
>> +	cmd.type = E820_TYPE_RESERVED;
>> +	walk_iomem_res_desc(IORES_DESC_NONE, 0, 0, -1, &cmd,
>> +			   memmap_entry_callback);
>> +
>>  	/* Add crashk_low_res region */
>>  	if (crashk_low_res.end) {
>>  		ei.addr = crashk_low_res.start;
>> -- 
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table
  2018-11-07  9:10     ` lijiang
@ 2018-11-07  9:35       ` Dave Young
  2018-11-09  6:51       ` Baoquan He
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Young @ 2018-11-07  9:35 UTC (permalink / raw)
  To: lijiang; +Cc: Baoquan He, linux-kernel, kexec, x86, tglx, mingo, bp, akpm

On 11/07/18 at 05:10pm, lijiang wrote:
> 在 2018年11月07日 13:23, Baoquan He 写道:
> > On 11/07/18 at 01:00pm, Lianbo Jiang wrote:
> >> E820 reserved ranges is useful in kdump kernel, it has been added in
> >> kexec-tools code.
> >>
> >> One reason is PCI mmconf (extended mode) requires reserved region otherwise
> >> it falls back to legacy mode, and also outputs the following kernel log.
> > 
> > OK, it falls back to legacy mode, and also output kernel log, except of
> > these, does it crash kernel? kdump kernel hang? Can we leave it if it
> > only ouptut kernel log?
> > 
> >>
> >> Example:
> >> ......
> >> [   19.798354] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >> [   19.800653] [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
> >> [   19.800995] PCI: not using MMCONFIG
> >> ......
> >>
> >> The correct kernel log is like this:
> >> ......
> >> [    0.082649] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >> [    0.083610] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> >> ......
> >>
> >> Furthermore, when AMD SME kdump support, it needs to map dmi table area
> >> as decrypted. For normal boot, these ranges sit in e820 reserved ranges,
> >> thus the early ioremap code naturally map them as decrypted. If it also
> >> has same e820 reserve setup in kdump kernel then it will just work like
> >> normal kernel.
> > 
> > Why do we care? If don't fix, what's happening?
> > 
> > Lianbo, for a bug fix, please describe the problems. Then give out the
> > analysis about root cause. 
> > 
> 
> Thanks for your comment in detail.
> 
> In fact, these patches are really simple. As the subject mentioned, this patch
> [PATCH 2/2] adds the reserved e820 ranges to kdump kernel e820 table, and the
> first patch [PATCH 1/2] helps to exactly add the e820(E820_TYPE_RESERVED) type
> to kdump kernel e820 table, that is to say, it will filter out some unnecessary
> type(E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_RESERVED_KERN).
> 
> At present, when we use the kexec to load the kernel image and initramfs(for
> example: kexec -s -p xxxx), the latest kernel does not pass the e820 reserved
> ranges to the second kernel, which might produce two problems:
> 
> The first one is the MMCONFIG issue, although which does not make the system
> crash or hang, this issue is still a potential risk, because my test can't
> cover all cases due to resource constraints(Machine), and i'm not sure what
> it will happen on other machine.

If a device is plugged in pci domain 1 then this device can not be
recognized in kdump kernel in your case.  For example if one want to
dump to a network target, but the net card can not be recognized then
vmcore can not be saved successfully

Thanks
Dave

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table
  2018-11-07  9:10     ` lijiang
  2018-11-07  9:35       ` Dave Young
@ 2018-11-09  6:51       ` Baoquan He
  2018-11-09  7:13         ` lijiang
  1 sibling, 1 reply; 9+ messages in thread
From: Baoquan He @ 2018-11-09  6:51 UTC (permalink / raw)
  To: lijiang; +Cc: linux-kernel, kexec, x86, tglx, mingo, bp, akpm, dyoung

On 11/07/18 at 05:10pm, lijiang wrote:
> In fact, these patches are really simple. As the subject mentioned, this patch
> [PATCH 2/2] adds the reserved e820 ranges to kdump kernel e820 table, and the
> first patch [PATCH 1/2] helps to exactly add the e820(E820_TYPE_RESERVED) type
> to kdump kernel e820 table, that is to say, it will filter out some unnecessary
> type(E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_RESERVED_KERN).
> 
> At present, when we use the kexec to load the kernel image and initramfs(for
> example: kexec -s -p xxxx), the latest kernel does not pass the e820 reserved
> ranges to the second kernel, which might produce two problems:
> 
> The first one is the MMCONFIG issue, although which does not make the system
> crash or hang, this issue is still a potential risk, because my test can't
> cover all cases due to resource constraints(Machine), and i'm not sure what
> it will happen on other machine.
> 
> The second issue is that the e820 reserved ranges do not setup in kdump kernel,
> which will cause some functions which are related to the e820 reserved ranges
> to become invalid. For example:
> 
> early_memremap()->
> early_memremap_pgprot_adjust()->
> memremap_should_map_decrypted()->
> e820__get_entry_type()
> 
> Please focus on these functions, early_memremap_pgprot_adjust() and memremap_should_map_decrypted().
> 
> In the first kernel, these ranges sit in e820 reserved ranges, so the memremap_should_map_decrypted()
> will return true, that is to say, the reserved memory is decrypted, then the early_memremap_pgprot_adjust()
> will call the pgprot_decrypted() to clear the memory encryption mask.
> 
> In the second kernel, because the e820 reserved ranges are not passed to the second kernel, these ranges
> don't sit in the e820 reserved ranges, so the the memremap_should_map_decrypted() will return false, that
> is to say, the reserved memory is encrypted, and then the early_memremap_pgprot_adjust() will also call the
> pgprot_encrypted() to set the memory encryption mask.
> 
> Obviously, in the second kernel, the e820 reserved memory is still decrypted, it has gone wrong. So, if
> don't fix, kdump won't work when we use the command(kexec -s -p xxx) to load the kernel image and initramfs.

Yes, this looks better. In fact, I searched cover-letter and both patch
lg, didn't find anything to tell what will happen without these
patchset. We should at least tell the patch is for cleanup/improvement/fix/feature.
This "if don't fix, kdump won't work ..." is the sentence I first saw
telling the sequence or the problem.

Please do not misunderstand those questions from me with unfriendly tone,
words are cold and can't express emotions exactly, and appearance. Those
questions are just meaning that people may get confusion from those
places.

What: describe the problem you met.
Why:  tell the root cause
How:  how you fix it

This is what I usually use to write patch log. Just for your reference.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table
  2018-11-09  6:51       ` Baoquan He
@ 2018-11-09  7:13         ` lijiang
  0 siblings, 0 replies; 9+ messages in thread
From: lijiang @ 2018-11-09  7:13 UTC (permalink / raw)
  To: Baoquan He; +Cc: linux-kernel, kexec, x86, tglx, mingo, bp, akpm, dyoung

在 2018年11月09日 14:51, Baoquan He 写道:
> On 11/07/18 at 05:10pm, lijiang wrote:
>> In fact, these patches are really simple. As the subject mentioned, this patch
>> [PATCH 2/2] adds the reserved e820 ranges to kdump kernel e820 table, and the
>> first patch [PATCH 1/2] helps to exactly add the e820(E820_TYPE_RESERVED) type
>> to kdump kernel e820 table, that is to say, it will filter out some unnecessary
>> type(E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_RESERVED_KERN).
>>
>> At present, when we use the kexec to load the kernel image and initramfs(for
>> example: kexec -s -p xxxx), the latest kernel does not pass the e820 reserved
>> ranges to the second kernel, which might produce two problems:
>>
>> The first one is the MMCONFIG issue, although which does not make the system
>> crash or hang, this issue is still a potential risk, because my test can't
>> cover all cases due to resource constraints(Machine), and i'm not sure what
>> it will happen on other machine.
>>
>> The second issue is that the e820 reserved ranges do not setup in kdump kernel,
>> which will cause some functions which are related to the e820 reserved ranges
>> to become invalid. For example:
>>
>> early_memremap()->
>> early_memremap_pgprot_adjust()->
>> memremap_should_map_decrypted()->
>> e820__get_entry_type()
>>
>> Please focus on these functions, early_memremap_pgprot_adjust() and memremap_should_map_decrypted().
>>
>> In the first kernel, these ranges sit in e820 reserved ranges, so the memremap_should_map_decrypted()
>> will return true, that is to say, the reserved memory is decrypted, then the early_memremap_pgprot_adjust()
>> will call the pgprot_decrypted() to clear the memory encryption mask.
>>
>> In the second kernel, because the e820 reserved ranges are not passed to the second kernel, these ranges
>> don't sit in the e820 reserved ranges, so the the memremap_should_map_decrypted() will return false, that
>> is to say, the reserved memory is encrypted, and then the early_memremap_pgprot_adjust() will also call the
>> pgprot_encrypted() to set the memory encryption mask.
>>
>> Obviously, in the second kernel, the e820 reserved memory is still decrypted, it has gone wrong. So, if
>> don't fix, kdump won't work when we use the command(kexec -s -p xxx) to load the kernel image and initramfs.
> 
> Yes, this looks better. In fact, I searched cover-letter and both patch
> lg, didn't find anything to tell what will happen without these
> patchset. We should at least tell the patch is for cleanup/improvement/fix/feature.
> This "if don't fix, kdump won't work ..." is the sentence I first saw
> telling the sequence or the problem.
> 
> Please do not misunderstand those questions from me with unfriendly tone,
> words are cold and can't express emotions exactly, and appearance. Those
> questions are just meaning that people may get confusion from those
> places.
> 

Don't worry.
 
I would like to improve patch log based on your suggestion.

Regards,
Lianbo
> What: describe the problem you met.
> Why:  tell the root cause
> How:  how you fix it
> 
> This is what I usually use to write patch log. Just for your reference.
> 

Ok, thanks.

> Thanks
> Baoquan
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-11-09  7:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-07  5:00 [PATCH 0/2 v5] add reserved e820 ranges to the kdump kernel e820 table Lianbo Jiang
2018-11-07  5:00 ` [PATCH 1/2 v5] x86/kexec_file: add e820 entry in case e820 type string matches to io resource name Lianbo Jiang
2018-11-07  5:17   ` Baoquan He
2018-11-07  5:00 ` [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table Lianbo Jiang
2018-11-07  5:23   ` Baoquan He
2018-11-07  9:10     ` lijiang
2018-11-07  9:35       ` Dave Young
2018-11-09  6:51       ` Baoquan He
2018-11-09  7:13         ` lijiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).