[PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
@ 2019-09-20  3:53 Lianbo Jiang
  2019-09-27  5:15 ` Dave Young
  0 siblings, 1 reply; 10+ messages in thread
From: Lianbo Jiang @ 2019-09-20  3:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, x86, bhe, dyoung, jgross, dhowells,
	Thomas.Lendacky

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Kdump kernel will reuse the first 640k region because of some reasons,
for example: the trampline and conventional PC system BIOS region may
require to allocate memory in this area. Obviously, kdump kernel will
also overwrite the first 640k region, therefore, kernel has to copy
the contents of the first 640k area to a backup area, which is done in
purgatory(), because vmcore may need the old memory. When vmcore is
dumped, kdump kernel will read the old memory from the backup area of
the first 640k area.

Basically, the main reason should be clear, kernel does not correctly
handle the first 640k region when SME is active, which causes that
kernel does not properly copy these old memory to the backup area in
purgatory(). Therefore, kdump kernel reads out the incorrect contents
from the backup area when dumping vmcore. Finally, the phenomenon is
as follow:

[root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values

      KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore  [PARTIAL DUMP]
        CPUS: 128
        DATE: Thu Sep 19 08:31:18 2019
      UPTIME: 00:01:21
LOAD AVERAGE: 0.16, 0.07, 0.02
       TASKS: 1343
    NODENAME: amd-ethanol
     RELEASE: 5.3.0-rc7+
     VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
     MACHINE: x86_64  (2195 Mhz)
      MEMORY: 127.9 GB
       PANIC: "Kernel panic - not syncing: sysrq triggered crash"
         PID: 9789
     COMMAND: "bash"
        TASK: "ffff89711894ae80  [THREAD_INFO: ffff89711894ae80]"
         CPU: 83
       STATE: TASK_RUNNING (PANIC)

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
crash>

In order to avoid such problem, lets occupy the first 640k region when
SME is active, which will ensure that the allocated memory does not fall
into the first 640k area. So, no need to worry about whether kernel can
correctly copy the contents of the first 640K area to a backup region in
purgatory().

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/x86/kernel/setup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 77ea96b794bd..5bfb2c83bb6c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)

 	reserve_real_mode();

+	if (sme_active())
+		memblock_reserve(0, 640*1024);
+
 	trim_platform_memory_ranges();
 	trim_low_memory_range();

-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
  2019-09-20  3:53 [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active Lianbo Jiang
@ 2019-09-27  5:15 ` Dave Young
  2019-09-27 20:49   ` Eric W. Biederman
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Young @ 2019-09-27  5:15 UTC (permalink / raw)
  To: Lianbo Jiang
  Cc: linux-kernel, tglx, mingo, bp, hpa, x86, bhe, jgross, dhowells,
	Thomas.Lendacky, kexec, Vivek Goyal, Eric Biederman

Hi Lianbo,

For kexec/kdump patches, please remember to cc kexec list next time.
Also it is definitely kdump specific issue, I added Vivek and Eric in
cc. 

On 09/20/19 at 11:53am, Lianbo Jiang wrote:
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793
> 
> Kdump kernel will reuse the first 640k region because of some reasons,
> for example: the trampline and conventional PC system BIOS region may
> require to allocate memory in this area. Obviously, kdump kernel will
> also overwrite the first 640k region, therefore, kernel has to copy
> the contents of the first 640k area to a backup area, which is done in
> purgatory(), because vmcore may need the old memory. When vmcore is
> dumped, kdump kernel will read the old memory from the backup area of
> the first 640k area.
> 
> Basically, the main reason should be clear, kernel does not correctly
> handle the first 640k region when SME is active, which causes that
> kernel does not properly copy these old memory to the backup area in
> purgatory(). Therefore, kdump kernel reads out the incorrect contents
> from the backup area when dumping vmcore. Finally, the phenomenon is
> as follow:
> 
> [root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
> WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values
> 
>       KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
>     DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore  [PARTIAL DUMP]
>         CPUS: 128
>         DATE: Thu Sep 19 08:31:18 2019
>       UPTIME: 00:01:21
> LOAD AVERAGE: 0.16, 0.07, 0.02
>        TASKS: 1343
>     NODENAME: amd-ethanol
>      RELEASE: 5.3.0-rc7+
>      VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
>      MACHINE: x86_64  (2195 Mhz)
>       MEMORY: 127.9 GB
>        PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>          PID: 9789
>      COMMAND: "bash"
>         TASK: "ffff89711894ae80  [THREAD_INFO: ffff89711894ae80]"
>          CPU: 83
>        STATE: TASK_RUNNING (PANIC)
> 
> crash> kmem -s|grep -i invalid
> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> crash>
> 
> In order to avoid such problem, lets occupy the first 640k region when
> SME is active, which will ensure that the allocated memory does not fall
> into the first 640k area. So, no need to worry about whether kernel can
> correctly copy the contents of the first 640K area to a backup region in
> purgatory().

The log is too simple,  I know you did some other tries to fix this, but
the patch log does not show why you can not correctly copy the 640k in
current kdump code, in purgatory here.

Also this patch seems works in your test, but still to see if other
people can comment and see if it is safe or not, if any other risks
other than waste the small chunk of memory.  If it is safe then kdump
can just drop the backup logic and use this in common code instead of
only do it for SME.

> 
> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
> ---
>  arch/x86/kernel/setup.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 77ea96b794bd..5bfb2c83bb6c 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)
>  
>  	reserve_real_mode();
>  
> +	if (sme_active())
> +		memblock_reserve(0, 640*1024);
> +
>  	trim_platform_memory_ranges();
>  	trim_low_memory_range();
>  
> -- 
> 2.17.1
> 

Thanks
Dave

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
  2019-09-27  5:15 ` Dave Young
@ 2019-09-27 20:49   ` Eric W. Biederman
  2019-09-27 23:51     ` Baoquan He
  2019-09-28  0:05     ` Baoquan He
  0 siblings, 2 replies; 10+ messages in thread
From: Eric W. Biederman @ 2019-09-27 20:49 UTC (permalink / raw)
  To: Dave Young
  Cc: Lianbo Jiang, linux-kernel, tglx, mingo, bp, hpa, x86, bhe,
	jgross, dhowells, Thomas.Lendacky, kexec, Vivek Goyal

Dave Young <dyoung@redhat.com> writes:

> Hi Lianbo,
>
> For kexec/kdump patches, please remember to cc kexec list next time.
> Also it is definitely kdump specific issue, I added Vivek and Eric in
> cc. 
>
> On 09/20/19 at 11:53am, Lianbo Jiang wrote:
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793
>> 
>> Kdump kernel will reuse the first 640k region because of some reasons,
>> for example: the trampline and conventional PC system BIOS region may
>> require to allocate memory in this area. Obviously, kdump kernel will
>> also overwrite the first 640k region, therefore, kernel has to copy
>> the contents of the first 640k area to a backup area, which is done in
>> purgatory(), because vmcore may need the old memory. When vmcore is
>> dumped, kdump kernel will read the old memory from the backup area of
>> the first 640k area.
>>
>> Basically, the main reason should be clear, kernel does not correctly
>> handle the first 640k region when SME is active, which causes that
>> kernel does not properly copy these old memory to the backup area in
>> purgatory(). Therefore, kdump kernel reads out the incorrect contents
>> from the backup area when dumping vmcore. Finally, the phenomenon is
>> as follow:
>> 
>> [root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
>> WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values
>> 
>>       KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
>>     DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore  [PARTIAL DUMP]
>>         CPUS: 128
>>         DATE: Thu Sep 19 08:31:18 2019
>>       UPTIME: 00:01:21
>> LOAD AVERAGE: 0.16, 0.07, 0.02
>>        TASKS: 1343
>>     NODENAME: amd-ethanol
>>      RELEASE: 5.3.0-rc7+
>>      VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
>>      MACHINE: x86_64  (2195 Mhz)
>>       MEMORY: 127.9 GB
>>        PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>>          PID: 9789
>>      COMMAND: "bash"
>>         TASK: "ffff89711894ae80  [THREAD_INFO: ffff89711894ae80]"
>>          CPU: 83
>>        STATE: TASK_RUNNING (PANIC)
>> 
>> crash> kmem -s|grep -i invalid
>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>> crash>
>> 
>> In order to avoid such problem, lets occupy the first 640k region when
>> SME is active, which will ensure that the allocated memory does not fall
>> into the first 640k area. So, no need to worry about whether kernel can
>> correctly copy the contents of the first 640K area to a backup region in
>> purgatory().

We must occupy part of the first 640k so that we can start up secondary
cpus unless someone has added another way to do that in recent years on
SME capable cpus.

Further there is Fimware/BIOS interaction that happens within those
first 640K.

Furthermore the kdump kernel needs to be able to read all of the memory
that the previous kernel could read.  Otherwise we can't get a crash
dump.

So I do not think ignoring the first 640K is the correct resolution
here.

> The log is too simple,  I know you did some other tries to fix this, but
> the patch log does not show why you can not correctly copy the 640k in
> current kdump code, in purgatory here.
>
> Also this patch seems works in your test, but still to see if other
> people can comment and see if it is safe or not, if any other risks
> other than waste the small chunk of memory.  If it is safe then kdump
> can just drop the backup logic and use this in common code instead of
> only do it for SME.

Exactly.

I think at best this avoids the symptoms, but does not give a reliable
crash dump.

Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
  2019-09-27 20:49   ` Eric W. Biederman
@ 2019-09-27 23:51     ` Baoquan He
  2019-09-28  0:05     ` Baoquan He
  1 sibling, 0 replies; 10+ messages in thread
From: Baoquan He @ 2019-09-27 23:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dave Young, Lianbo Jiang, linux-kernel, tglx, mingo, bp, hpa,
	x86, jgross, dhowells, Thomas.Lendacky, kexec, Vivek Goyal

On 09/27/19 at 03:49pm, Eric W. Biederman wrote:
> >> In order to avoid such problem, lets occupy the first 640k region when
> >> SME is active, which will ensure that the allocated memory does not fall
> >> into the first 640k area. So, no need to worry about whether kernel can
> >> correctly copy the contents of the first 640K area to a backup region in
> >> purgatory().
> 
> We must occupy part of the first 640k so that we can start up secondary
> cpus unless someone has added another way to do that in recent years on
> SME capable cpus.
> 
> Further there is Fimware/BIOS interaction that happens within those
> first 640K.
> 
> Furthermore the kdump kernel needs to be able to read all of the memory
> that the previous kernel could read.  Otherwise we can't get a crash
> dump.
> 
> So I do not think ignoring the first 640K is the correct resolution
> here.

We discussed and tried many ways to copy the first 640K of 1st kernel
out since kernel data may be allocated there, then crash need it to
parse. But SME makes the copy very difficult to do, because the first
640K is encrypted in 1st kernel, but the copy is done in purgatory with
1:1 ident-mapping and unencrypted.

Finally we decided this way as patch does. Reserving it in memblock is
not ignoring the first 640K, but lock it down to avoid any later kernel
data allocated in this area. The first 640K will be taken as system RAM
of kdump kernel as is. Like this, no available kernel information
could be located in this area, then we don't care if the copy is correct
or not.

One word, what the patch is doing is locking down the first 640K after 
reserve_real_mode() invocation. Putting it after reserve_real_mode() is
because reserve_real_mode() may put real mode trampoline inside first
640K. Surely the real mode trampoline will be discarded too in kdump
kernel, since it's not important and unnecessary for crash parsing.

I think Lianbo need rewrite this patch log to make it clearer.

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 77ea96b794bd..5bfb2c83bb6c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)

        reserve_real_mode();

+       if (sme_active())
+               memblock_reserve(0, 640*1024);
+
        trim_platform_memory_ranges();
        trim_low_memory_range();

> 
> > The log is too simple,  I know you did some other tries to fix this, but
> > the patch log does not show why you can not correctly copy the 640k in
> > current kdump code, in purgatory here.
> >
> > Also this patch seems works in your test, but still to see if other
> > people can comment and see if it is safe or not, if any other risks
> > other than waste the small chunk of memory.  If it is safe then kdump
> > can just drop the backup logic and use this in common code instead of
> > only do it for SME.
> 
> Exactly.
> 
> I think at best this avoids the symptoms, but does not give a reliable
> crash dump.
> 
> Eric

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
  2019-09-27 20:49   ` Eric W. Biederman
  2019-09-27 23:51     ` Baoquan He
@ 2019-09-28  0:05     ` Baoquan He
  2019-09-28  2:32       ` Eric W. Biederman
  1 sibling, 1 reply; 10+ messages in thread
From: Baoquan He @ 2019-09-28  0:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dave Young, Lianbo Jiang, linux-kernel, tglx, mingo, bp, hpa,
	x86, jgross, dhowells, Thomas.Lendacky, kexec, Vivek Goyal

On 09/27/19 at 03:49pm, Eric W. Biederman wrote:
> Dave Young <dyoung@redhat.com> writes:
> >> In order to avoid such problem, lets occupy the first 640k region when
> >> SME is active, which will ensure that the allocated memory does not fall
> >> into the first 640k area. So, no need to worry about whether kernel can
> >> correctly copy the contents of the first 640K area to a backup region in
> >> purgatory().
> 
> We must occupy part of the first 640k so that we can start up secondary
> cpus unless someone has added another way to do that in recent years on
> SME capable cpus.
> 
> Further there is Fimware/BIOS interaction that happens within those
> first 640K.
> 
> Furthermore the kdump kernel needs to be able to read all of the memory
> that the previous kernel could read.  Otherwise we can't get a crash
> dump.
> 
> So I do not think ignoring the first 640K is the correct resolution
> here.
> 
> > The log is too simple,  I know you did some other tries to fix this, but
> > the patch log does not show why you can not correctly copy the 640k in
> > current kdump code, in purgatory here.
> >
> > Also this patch seems works in your test, but still to see if other
> > people can comment and see if it is safe or not, if any other risks
> > other than waste the small chunk of memory.  If it is safe then kdump
> > can just drop the backup logic and use this in common code instead of
> > only do it for SME.
> 
> Exactly.
> 
> I think at best this avoids the symptoms, but does not give a reliable
> crash dump.

Sorry, didn't notice this comment at bottom.

From code, currently the first 640K area is needed in two places.
One is for 5-level trampoline during boot compressing stage, in
find_trampoline_placement(). 

The other is in reserve_real_mode(), as you mentioned, for application
CPU booting.

Only allow these two put data inside first 640K, then lock it done. It
should not impact crash dump and parsing. And these two's content
doesn't matter.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
  2019-09-28  0:05     ` Baoquan He
@ 2019-09-28  2:32       ` Eric W. Biederman
  2019-09-28  3:09         ` Baoquan He
  0 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2019-09-28  2:32 UTC (permalink / raw)
  To: Baoquan He
  Cc: Dave Young, Lianbo Jiang, linux-kernel, tglx, mingo, bp, hpa,
	x86, jgross, dhowells, Thomas.Lendacky, kexec, Vivek Goyal

Baoquan He <bhe@redhat.com> writes:

> On 09/27/19 at 03:49pm, Eric W. Biederman wrote:
>> Dave Young <dyoung@redhat.com> writes:
>> >> In order to avoid such problem, lets occupy the first 640k region when
>> >> SME is active, which will ensure that the allocated memory does not fall
>> >> into the first 640k area. So, no need to worry about whether kernel can
>> >> correctly copy the contents of the first 640K area to a backup region in
>> >> purgatory().
>> 
>> We must occupy part of the first 640k so that we can start up secondary
>> cpus unless someone has added another way to do that in recent years on
>> SME capable cpus.
>> 
>> Further there is Fimware/BIOS interaction that happens within those
>> first 640K.
>> 
>> Furthermore the kdump kernel needs to be able to read all of the memory
>> that the previous kernel could read.  Otherwise we can't get a crash
>> dump.
>> 
>> So I do not think ignoring the first 640K is the correct resolution
>> here.
>> 
>> > The log is too simple,  I know you did some other tries to fix this, but
>> > the patch log does not show why you can not correctly copy the 640k in
>> > current kdump code, in purgatory here.
>> >
>> > Also this patch seems works in your test, but still to see if other
>> > people can comment and see if it is safe or not, if any other risks
>> > other than waste the small chunk of memory.  If it is safe then kdump
>> > can just drop the backup logic and use this in common code instead of
>> > only do it for SME.
>> 
>> Exactly.
>> 
>> I think at best this avoids the symptoms, but does not give a reliable
>> crash dump.
>
> Sorry, didn't notice this comment at bottom.
>
> From code, currently the first 640K area is needed in two places.
> One is for 5-level trampoline during boot compressing stage, in
> find_trampoline_placement(). 
>
> The other is in reserve_real_mode(), as you mentioned, for application
> CPU booting.
>
> Only allow these two put data inside first 640K, then lock it done. It
> should not impact crash dump and parsing. And these two's content
> doesn't matter.

Apologies.  Do I understand correctly that the idea is that the kernel
that may crash will never touch these pages?  And that the reservation
is not in the kernel that recovers from the crash?  That definitely
needs a little better description.  I know it is not a lot on modern
systems but reserving an extra 1M of memory to avoid having to special
case it later seems in need of calling out.

I have an old system around that I think that 640K is about 25% of
memory.

How we interact with BIOS tables in the first 640k needs some
explanation.  Both in the first kernel and in the crash kernel.

Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
  2019-09-28  2:32       ` Eric W. Biederman
@ 2019-09-28  3:09         ` Baoquan He
  2019-09-30 10:14           ` Eric W. Biederman
  0 siblings, 1 reply; 10+ messages in thread
From: Baoquan He @ 2019-09-28  3:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dave Young, Lianbo Jiang, linux-kernel, tglx, mingo, bp, hpa,
	x86, jgross, dhowells, Thomas.Lendacky, kexec, Vivek Goyal

On 09/27/19 at 09:32pm, Eric W. Biederman wrote:
> Baoquan He <bhe@redhat.com> writes:
> 
> > On 09/27/19 at 03:49pm, Eric W. Biederman wrote:
> >> Dave Young <dyoung@redhat.com> writes:
> >> >> In order to avoid such problem, lets occupy the first 640k region when
> >> >> SME is active, which will ensure that the allocated memory does not fall
> >> >> into the first 640k area. So, no need to worry about whether kernel can
> >> >> correctly copy the contents of the first 640K area to a backup region in
> >> >> purgatory().
> >> 
> >> We must occupy part of the first 640k so that we can start up secondary
> >> cpus unless someone has added another way to do that in recent years on
> >> SME capable cpus.
> >> 
> >> Further there is Fimware/BIOS interaction that happens within those
> >> first 640K.
> >> 
> >> Furthermore the kdump kernel needs to be able to read all of the memory
> >> that the previous kernel could read.  Otherwise we can't get a crash
> >> dump.
> >> 
> >> So I do not think ignoring the first 640K is the correct resolution
> >> here.
> >> 
> >> > The log is too simple,  I know you did some other tries to fix this, but
> >> > the patch log does not show why you can not correctly copy the 640k in
> >> > current kdump code, in purgatory here.
> >> >
> >> > Also this patch seems works in your test, but still to see if other
> >> > people can comment and see if it is safe or not, if any other risks
> >> > other than waste the small chunk of memory.  If it is safe then kdump
> >> > can just drop the backup logic and use this in common code instead of
> >> > only do it for SME.
> >> 
> >> Exactly.
> >> 
> >> I think at best this avoids the symptoms, but does not give a reliable
> >> crash dump.
> >
> > Sorry, didn't notice this comment at bottom.
> >
> > From code, currently the first 640K area is needed in two places.
> > One is for 5-level trampoline during boot compressing stage, in
> > find_trampoline_placement(). 
> >
> > The other is in reserve_real_mode(), as you mentioned, for application
> > CPU booting.
> >
> > Only allow these two put data inside first 640K, then lock it done. It
> > should not impact crash dump and parsing. And these two's content
> > doesn't matter.
> 
> Do I understand correctly that the idea is that the kernel
> that may crash will never touch these pages?  And that the reservation

Thanks a lot for your comments, Eric.

And yes. Except of the reserved real mode trampoline for secondary cpu if
the trampoline is allocated in this area, it will be reused. We search area
for real mode under 1M. Otherwise, the kernel that may crash will never
touch these pages.

> is not in the kernel that recovers from the crash?  That definitely

You mean kdump kernel? Only the e820 reserved regions which are inserted
into io_resource will be passed to kdump kernel, memblock reserved
region is only seen by the present kernel.

But for the content in first 640K, it's not erased, kdump kernel will
ignore it and take it as a available memory resource into memory
allocator.


> needs a little better description.  I know it is not a lot on modern
> systems but reserving an extra 1M of memory to avoid having to special
> case it later seems in need of calling out.
> 
> I have an old system around that I think that 640K is about 25% of
> memory.

Understood. Basically 640K is wasted in this case. But we only do like
this in SME case, a condition checking is added. And system with SME is
pretty new model, it may not impact the old system.

> 
> How we interact with BIOS tables in the first 640k needs some
> explanation.  Both in the first kernel and in the crash kernel.

Yes, totally agree.

Those BIOS tables have been reserved as e820 reserved regions and will
be passed to kdump kernel for reusing. Memblock reserved 640K doesn't
mean it will cover the whole [0, 640K) region, it only searches for
available system RAM from memblock allocator.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
  2019-09-28  3:09         ` Baoquan He
@ 2019-09-30 10:14           ` Eric W. Biederman
  2019-10-01  7:40             ` Baoquan He
  0 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2019-09-30 10:14 UTC (permalink / raw)
  To: Baoquan He
  Cc: Dave Young, Lianbo Jiang, linux-kernel, tglx, mingo, bp, hpa,
	x86, jgross, dhowells, Thomas.Lendacky, kexec, Vivek Goyal

Baoquan He <bhe@redhat.com> writes:

> On 09/27/19 at 09:32pm, Eric W. Biederman wrote:
>> Baoquan He <bhe@redhat.com> writes:
>> 
>> > On 09/27/19 at 03:49pm, Eric W. Biederman wrote:
>> >> Dave Young <dyoung@redhat.com> writes:
>> >> >> In order to avoid such problem, lets occupy the first 640k region when
>> >> >> SME is active, which will ensure that the allocated memory does not fall
>> >> >> into the first 640k area. So, no need to worry about whether kernel can
>> >> >> correctly copy the contents of the first 640K area to a backup region in
>> >> >> purgatory().
>> >> 
>> >> We must occupy part of the first 640k so that we can start up secondary
>> >> cpus unless someone has added another way to do that in recent years on
>> >> SME capable cpus.
>> >> 
>> >> Further there is Fimware/BIOS interaction that happens within those
>> >> first 640K.
>> >> 
>> >> Furthermore the kdump kernel needs to be able to read all of the memory
>> >> that the previous kernel could read.  Otherwise we can't get a crash
>> >> dump.
>> >> 
>> >> So I do not think ignoring the first 640K is the correct resolution
>> >> here.
>> >> 
>> >> > The log is too simple,  I know you did some other tries to fix this, but
>> >> > the patch log does not show why you can not correctly copy the 640k in
>> >> > current kdump code, in purgatory here.
>> >> >
>> >> > Also this patch seems works in your test, but still to see if other
>> >> > people can comment and see if it is safe or not, if any other risks
>> >> > other than waste the small chunk of memory.  If it is safe then kdump
>> >> > can just drop the backup logic and use this in common code instead of
>> >> > only do it for SME.
>> >> 
>> >> Exactly.
>> >> 
>> >> I think at best this avoids the symptoms, but does not give a reliable
>> >> crash dump.
>> >
>> > Sorry, didn't notice this comment at bottom.
>> >
>> > From code, currently the first 640K area is needed in two places.
>> > One is for 5-level trampoline during boot compressing stage, in
>> > find_trampoline_placement(). 
>> >
>> > The other is in reserve_real_mode(), as you mentioned, for application
>> > CPU booting.
>> >
>> > Only allow these two put data inside first 640K, then lock it done. It
>> > should not impact crash dump and parsing. And these two's content
>> > doesn't matter.
>> 
>> Do I understand correctly that the idea is that the kernel
>> that may crash will never touch these pages?  And that the reservation
>
> Thanks a lot for your comments, Eric.
>
> And yes. Except of the reserved real mode trampoline for secondary cpu if
> the trampoline is allocated in this area, it will be reused. We search area
> for real mode under 1M. Otherwise, the kernel that may crash will never
> touch these pages.
>
>> is not in the kernel that recovers from the crash?  That definitely
>
> You mean kdump kernel? Only the e820 reserved regions which are inserted
> into io_resource will be passed to kdump kernel, memblock reserved
> region is only seen by the present kernel.
>
> But for the content in first 640K, it's not erased, kdump kernel will
> ignore it and take it as a available memory resource into memory
> allocator.
>
>
>> needs a little better description.  I know it is not a lot on modern
>> systems but reserving an extra 1M of memory to avoid having to special
>> case it later seems in need of calling out.
>> 
>> I have an old system around that I think that 640K is about 25% of
>> memory.
>
> Understood. Basically 640K is wasted in this case. But we only do like
> this in SME case, a condition checking is added. And system with SME is
> pretty new model, it may not impact the old system.

The conditional really should be based on if we are reserving memory
for a kdump kernel.  AKA if crash_kernel=XXX is specified on the kernel
command line.

At which point I think it would be very reasonable to unconditionally
reserve the low 640k, and make the whole thing a non-issue.  This would
allow the kdump code to just not do anything special for any of the
weird special case.

It isn't perfect because we need a page or so used in the first kernel
for bootstrapping the secondary cpus, but that seems like the least of
evils.  Especially as no one will DMA to that memory.

So please let's just change what memory we reserve when crash_kernel is
specified.

>> How we interact with BIOS tables in the first 640k needs some
>> explanation.  Both in the first kernel and in the crash kernel.
>
> Yes, totally agree.
>
> Those BIOS tables have been reserved as e820 reserved regions and will
> be passed to kdump kernel for reusing. Memblock reserved 640K doesn't
> mean it will cover the whole [0, 640K) region, it only searches for
> available system RAM from memblock allocator.

Careful with that assumption.  My memory is that the e820 memory map
frequently fails to cover areas like the real mode interrupt descriptor
table at address 0.

Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
  2019-09-30 10:14           ` Eric W. Biederman
@ 2019-10-01  7:40             ` Baoquan He
  2019-10-05  7:35               ` lijiang
  0 siblings, 1 reply; 10+ messages in thread
From: Baoquan He @ 2019-10-01  7:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dave Young, Lianbo Jiang, linux-kernel, tglx, mingo, bp, hpa,
	x86, jgross, dhowells, Thomas.Lendacky, kexec, Vivek Goyal

On 09/30/19 at 05:14am, Eric W. Biederman wrote:
> Baoquan He <bhe@redhat.com> writes:
> >> needs a little better description.  I know it is not a lot on modern
> >> systems but reserving an extra 1M of memory to avoid having to special
> >> case it later seems in need of calling out.
> >> 
> >> I have an old system around that I think that 640K is about 25% of
> >> memory.
> >
> > Understood. Basically 640K is wasted in this case. But we only do like
> > this in SME case, a condition checking is added. And system with SME is
> > pretty new model, it may not impact the old system.
> 
> The conditional really should be based on if we are reserving memory
> for a kdump kernel.  AKA if crash_kernel=XXX is specified on the kernel
> command line.
> 
> At which point I think it would be very reasonable to unconditionally
> reserve the low 640k, and make the whole thing a non-issue.  This would
> allow the kdump code to just not do anything special for any of the
> weird special case.
> 
> It isn't perfect because we need a page or so used in the first kernel
> for bootstrapping the secondary cpus, but that seems like the least of
> evils.  Especially as no one will DMA to that memory.
> 
> So please let's just change what memory we reserve when crash_kernel is
> specified.

Yes, makes sense, thanks for pointing it out.

> 
> >> How we interact with BIOS tables in the first 640k needs some
> >> explanation.  Both in the first kernel and in the crash kernel.
> >
> > Yes, totally agree.
> >
> > Those BIOS tables have been reserved as e820 reserved regions and will
> > be passed to kdump kernel for reusing. Memblock reserved 640K doesn't
> > mean it will cover the whole [0, 640K) region, it only searches for
> > available system RAM from memblock allocator.
> 
> Careful with that assumption.  My memory is that the e820 memory map
> frequently fails to cover areas like the real mode interrupt descriptor
> table at address 0.

OK, will think more about this. Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
  2019-10-01  7:40             ` Baoquan He
@ 2019-10-05  7:35               ` lijiang
  0 siblings, 0 replies; 10+ messages in thread
From: lijiang @ 2019-10-05  7:35 UTC (permalink / raw)
  To: Baoquan He, Eric W. Biederman
  Cc: Dave Young, linux-kernel, tglx, mingo, bp, hpa, x86, jgross,
	dhowells, Thomas.Lendacky, kexec, Vivek Goyal

在 2019年10月01日 15:40, Baoquan He 写道:
> On 09/30/19 at 05:14am, Eric W. Biederman wrote:
>> Baoquan He <bhe@redhat.com> writes:
>>>> needs a little better description.  I know it is not a lot on modern
>>>> systems but reserving an extra 1M of memory to avoid having to special
>>>> case it later seems in need of calling out.
>>>>
>>>> I have an old system around that I think that 640K is about 25% of
>>>> memory.
>>>
>>> Understood. Basically 640K is wasted in this case. But we only do like
>>> this in SME case, a condition checking is added. And system with SME is
>>> pretty new model, it may not impact the old system.
>>
>> The conditional really should be based on if we are reserving memory
>> for a kdump kernel.  AKA if crash_kernel=XXX is specified on the kernel
>> command line.
>>
>> At which point I think it would be very reasonable to unconditionally
>> reserve the low 640k, and make the whole thing a non-issue.  This would
>> allow the kdump code to just not do anything special for any of the
>> weird special case.
>>
>> It isn't perfect because we need a page or so used in the first kernel
>> for bootstrapping the secondary cpus, but that seems like the least of
>> evils.  Especially as no one will DMA to that memory.
>>
>> So please let's just change what memory we reserve when crash_kernel is
>> specified.
> 
> Yes, makes sense, thanks for pointing it out.
> 

Sorry for the delay and thanks for your comment, Eric, Baoquan and Dave Young.

I will improve patch log and add the extra condition crash_kernel. I will post
v2 later.

Thanks.
Lianbo

>>
>>>> How we interact with BIOS tables in the first 640k needs some
>>>> explanation.  Both in the first kernel and in the crash kernel.
>>>
>>> Yes, totally agree.
>>>
>>> Those BIOS tables have been reserved as e820 reserved regions and will
>>> be passed to kdump kernel for reusing. Memblock reserved 640K doesn't
>>> mean it will cover the whole [0, 640K) region, it only searches for
>>> available system RAM from memblock allocator.
>>
>> Careful with that assumption.  My memory is that the e820 memory map
>> frequently fails to cover areas like the real mode interrupt descriptor
>> table at address 0.
> 
> OK, will think more about this. Thanks.
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-10-05  7:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-20  3:53 [PATCH] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active Lianbo Jiang
2019-09-27  5:15 ` Dave Young
2019-09-27 20:49   ` Eric W. Biederman
2019-09-27 23:51     ` Baoquan He
2019-09-28  0:05     ` Baoquan He
2019-09-28  2:32       ` Eric W. Biederman
2019-09-28  3:09         ` Baoquan He
2019-09-30 10:14           ` Eric W. Biederman
2019-10-01  7:40             ` Baoquan He
2019-10-05  7:35               ` lijiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).