linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
@ 2019-01-21  5:16 Pingfan Liu
  2019-01-21  6:24 ` Baoquan He
  2019-01-25 10:39 ` Borislav Petkov
  0 siblings, 2 replies; 45+ messages in thread
From: Pingfan Liu @ 2019-01-21  5:16 UTC (permalink / raw)
  To: kexec
  Cc: Pingfan Liu, Dave Young, Baoquan He, Andrew Morton,
	Mike Rapoport, yinghai, vgoyal, Randy Dunlap, Borislav Petkov,
	x86, linux-kernel

People reported crashkernel=384M reservation failed on a high end server
with KASLR enabled.  In that case there is enough free memory under 896M
but crashkernel reservation still fails intermittently.

The situation is crashkernel reservation code only finds free region under
896 MB with 128M aligned in case no ',high' being used.  And KASLR could
break the first 896M into several parts randomly thus the failure happens.
User has no way to predict and make sure crashkernel=xM working unless
he/she use 'crashkernel=xM,high'.  Since 'crashkernel=xM' is the most
common use case this issue is a serious bug.

And we can't answer questions raised from customer:
1) why it doesn't succeed to reserve 896 MB;
2) what's wrong with memory region under 4G;
3) why I have to add ',high', I only require 384 MB, not 3840 MB.

This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
finally above 4G.

Dave Young sent the original post, and I just re-post it with commit log
improvement as his requirement.
http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
There was an old discussion below (previously posted by Chao Wang):
https://lkml.org/lkml/2013/10/15/601

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: yinghai@kernel.org,
Cc: vgoyal@redhat.com
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
v6 -> v7: commit log improvement
 arch/x86/kernel/setup.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..fa62c81 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
 						    high ? CRASH_ADDR_HIGH_MAX
 							 : CRASH_ADDR_LOW_MAX,
 						    crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+		/*
+		 * crashkernel=X reserve below 896M fails? Try below 4G
+		 */
+		if (!high && !crash_base)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						(1ULL << 32),
+						crash_size, CRASH_ALIGN);
+		/*
+		 * crashkernel=X reserve below 4G fails? Try MAXMEM
+		 */
+		if (!high && !crash_base)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						CRASH_ADDR_HIGH_MAX,
+						crash_size, CRASH_ALIGN);
+#endif
 		if (!crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-21  5:16 [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr Pingfan Liu
@ 2019-01-21  6:24 ` Baoquan He
  2019-01-25 10:39 ` Borislav Petkov
  1 sibling, 0 replies; 45+ messages in thread
From: Baoquan He @ 2019-01-21  6:24 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: kexec, Dave Young, Andrew Morton, Mike Rapoport, yinghai, vgoyal,
	Randy Dunlap, Borislav Petkov, x86, linux-kernel

On 01/21/19 at 01:16pm, Pingfan Liu wrote:
> People reported crashkernel=384M reservation failed on a high end server
> with KASLR enabled.  In that case there is enough free memory under 896M
> but crashkernel reservation still fails intermittently.
> 
> The situation is crashkernel reservation code only finds free region under
> 896 MB with 128M aligned in case no ',high' being used.  And KASLR could
> break the first 896M into several parts randomly thus the failure happens.
> User has no way to predict and make sure crashkernel=xM working unless
> he/she use 'crashkernel=xM,high'.  Since 'crashkernel=xM' is the most
> common use case this issue is a serious bug.
> 
> And we can't answer questions raised from customer:
> 1) why it doesn't succeed to reserve 896 MB;
> 2) what's wrong with memory region under 4G;
> 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
> 
> This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
> finally above 4G.
> 
> Dave Young sent the original post, and I just re-post it with commit log
> improvement as his requirement.
> http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> There was an old discussion below (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: yinghai@kernel.org,
> Cc: vgoyal@redhat.com
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org

Looks good, ack.

Acked-by: Baoquan He <bhe@redhat.com>

Thanks
Baoquan

> ---
> v6 -> v7: commit log improvement
>  arch/x86/kernel/setup.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..fa62c81 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
>  						    high ? CRASH_ADDR_HIGH_MAX
>  							 : CRASH_ADDR_LOW_MAX,
>  						    crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> +		/*
> +		 * crashkernel=X reserve below 896M fails? Try below 4G
> +		 */
> +		if (!high && !crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						(1ULL << 32),
> +						crash_size, CRASH_ALIGN);
> +		/*
> +		 * crashkernel=X reserve below 4G fails? Try MAXMEM
> +		 */
> +		if (!high && !crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						CRASH_ADDR_HIGH_MAX,
> +						crash_size, CRASH_ALIGN);
> +#endif
>  		if (!crash_base) {
>  			pr_info("crashkernel reservation failed - No suitable area found.\n");
>  			return;
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-21  5:16 [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr Pingfan Liu
  2019-01-21  6:24 ` Baoquan He
@ 2019-01-25 10:39 ` Borislav Petkov
  2019-01-25 13:45   ` Dave Young
  2019-01-29  5:51   ` Pingfan Liu
  1 sibling, 2 replies; 45+ messages in thread
From: Borislav Petkov @ 2019-01-25 10:39 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: kexec, Dave Young, Baoquan He, Andrew Morton, Mike Rapoport,
	yinghai, vgoyal, Randy Dunlap, x86, linux-kernel


>  Subject: Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X

s/bugfix, //

On Mon, Jan 21, 2019 at 01:16:08PM +0800, Pingfan Liu wrote:
> People reported crashkernel=384M reservation failed on a high end server
> with KASLR enabled.  In that case there is enough free memory under 896M
> but crashkernel reservation still fails intermittently.
> 
> The situation is crashkernel reservation code only finds free region under
> 896 MB with 128M aligned in case no ',high' being used.  And KASLR could
> break the first 896M into several parts randomly thus the failure happens.

This reads very strange.

> User has no way to predict and make sure crashkernel=xM working unless
> he/she use 'crashkernel=xM,high'.  Since 'crashkernel=xM' is the most
> common use case this issue is a serious bug.
> 
> And we can't answer questions raised from customer:
> 1) why it doesn't succeed to reserve 896 MB;
> 2) what's wrong with memory region under 4G;
> 3) why I have to add ',high', I only require 384 MB, not 3840 MB.

Errr, this looks like communication issue. Sounds to me like the text
around crashkernel= in

Documentation/admin-guide/kernel-parameters.txt

needs improving?

> This patch tries to get memory region from 896 MB firstly, then [896MB,4G],

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> finally above 4G.
> 
> Dave Young sent the original post, and I just re-post it with commit log

If he sent it, he should be the author I guess.

> improvement as his requirement.
> http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> There was an old discussion below (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601

All that changelog info doesn't belong in the commit message ...

> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: yinghai@kernel.org,
> Cc: vgoyal@redhat.com
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---

.... but here.

> v6 -> v7: commit log improvement
>  arch/x86/kernel/setup.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..fa62c81 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
>  						    high ? CRASH_ADDR_HIGH_MAX
>  							 : CRASH_ADDR_LOW_MAX,
>  						    crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> +		/*
> +		 * crashkernel=X reserve below 896M fails? Try below 4G
> +		 */
> +		if (!high && !crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						(1ULL << 32),
> +						crash_size, CRASH_ALIGN);
> +		/*
> +		 * crashkernel=X reserve below 4G fails? Try MAXMEM
> +		 */
> +		if (!high && !crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						CRASH_ADDR_HIGH_MAX,
> +						crash_size, CRASH_ALIGN);
> +#endif

Ok, so this is silly: we know at which physical address KASLR allocated
the kernel so why aren't we querying that and seeing if there's enough
room before it or after it to call memblock_find_in_range() on the
bigger range?

Also, why is "high" dealt with separately and why isn't the code
enforcing "high" if the normal reservation fails?

The presence of high is requiring from our users to pay attention what
to use when the kernel can do all that automatically. Looks like a UI
fail to me.

And look at all the flavors of crashkernel= :

        crashkernel=size[KMG][@offset[KMG]]
        crashkernel=range1:size1[,range2:size2,...][@offset]
        crashkernel=size[KMG],high
        crashkernel=size[KMG],low

We couldn't do one so we made 4?!?!

What for?

Nowhere in that help text does it explain why a user would care about
high or low or range or offset or the planets alignment...

So what's up?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-25 10:39 ` Borislav Petkov
@ 2019-01-25 13:45   ` Dave Young
  2019-01-25 14:08     ` Borislav Petkov
  2019-01-29  5:51   ` Pingfan Liu
  1 sibling, 1 reply; 45+ messages in thread
From: Dave Young @ 2019-01-25 13:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pingfan Liu, kexec, Baoquan He, Andrew Morton, Mike Rapoport,
	yinghai, vgoyal, Randy Dunlap, x86, linux-kernel

> > 
> > Dave Young sent the original post, and I just re-post it with commit log
> 
> If he sent it, he should be the author I guess.

Just drop the line, but can change the credit to Chao Wang since he did
it initially.

But Chao does not work on kexec/kdump any more, and the email address is
outdated as well.

> 
> > improvement as his requirement.
> > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> > There was an old discussion below (previously posted by Chao Wang):
> > https://lkml.org/lkml/2013/10/15/601
> 
> All that changelog info doesn't belong in the commit message ...
> 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Dave Young <dyoung@redhat.com>
> > Cc: Baoquan He <bhe@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> > Cc: yinghai@kernel.org,
> > Cc: vgoyal@redhat.com
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> 
> .... but here.
> 
> > v6 -> v7: commit log improvement
> >  arch/x86/kernel/setup.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 3d872a5..fa62c81 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> >  						    high ? CRASH_ADDR_HIGH_MAX
> >  							 : CRASH_ADDR_LOW_MAX,
> >  						    crash_size, CRASH_ALIGN);
> > +#ifdef CONFIG_X86_64
> > +		/*
> > +		 * crashkernel=X reserve below 896M fails? Try below 4G
> > +		 */
> > +		if (!high && !crash_base)
> > +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +						(1ULL << 32),
> > +						crash_size, CRASH_ALIGN);
> > +		/*
> > +		 * crashkernel=X reserve below 4G fails? Try MAXMEM
> > +		 */
> > +		if (!high && !crash_base)
> > +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +						CRASH_ADDR_HIGH_MAX,
> > +						crash_size, CRASH_ALIGN);
> > +#endif
> 
> Ok, so this is silly: we know at which physical address KASLR allocated
> the kernel so why aren't we querying that and seeing if there's enough
> room before it or after it to call memblock_find_in_range() on the
> bigger range?

Baoquan may have comments?

> 
> Also, why is "high" dealt with separately and why isn't the code
> enforcing "high" if the normal reservation fails?
> 

AFAIK, some people prefer to explictly reserve crash memory at high
region even if it is possible to reserve at low area.  May because
<4G memory is limited on large server, they want to leave this for other
use. 

Yinghai or Vivek should know more about the history, probably they can
recall some initial reason.

> The presence of high is requiring from our users to pay attention what
> to use when the kernel can do all that automatically. Looks like a UI
> fail to me.
> 
> And look at all the flavors of crashkernel= :
> 
>         crashkernel=size[KMG][@offset[KMG]]
>         crashkernel=range1:size1[,range2:size2,...][@offset]
>         crashkernel=size[KMG],high
>         crashkernel=size[KMG],low
> 
> We couldn't do one so we made 4?!?!
> 
> What for?
> 
> Nowhere in that help text does it explain why a user would care about
> high or low or range or offset or the planets alignment...
> 
> So what's up?

Good question, still it may be some historical reason, but it is good to
make them clear and rethink about it after long time.

I also want to understand, need dig the log more.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-25 13:45   ` Dave Young
@ 2019-01-25 14:08     ` Borislav Petkov
  2019-01-28  9:58       ` Dave Young
                         ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Borislav Petkov @ 2019-01-25 14:08 UTC (permalink / raw)
  To: Dave Young
  Cc: Pingfan Liu, kexec, Baoquan He, Andrew Morton, Mike Rapoport,
	yinghai, vgoyal, Randy Dunlap, x86, linux-kernel

On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> AFAIK, some people prefer to explictly reserve crash memory at high
> region even if it is possible to reserve at low area.  May because
> <4G memory is limited on large server, they want to leave this for other
> use. 
> 
> Yinghai or Vivek should know more about the history, probably they can
> recall some initial reason.

Yes, just "prefer" is not good enough. There should be a technical
reason why that's there.

Also, if the user doesn't care, then the code should be free to force
"high" and thus probe a different range for allocation.

> Good question, still it may be some historical reason, but it is good to
> make them clear and rethink about it after long time.
> 
> I also want to understand, need dig the log more.

Good idea. That would be a very nice cleanup. :-)

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-25 14:08     ` Borislav Petkov
@ 2019-01-28  9:58       ` Dave Young
  2019-01-28 10:18         ` Borislav Petkov
  2019-01-29  5:25       ` Pingfan Liu
  2019-01-31  7:59       ` Dave Young
  2 siblings, 1 reply; 45+ messages in thread
From: Dave Young @ 2019-01-28  9:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pingfan Liu, kexec, Baoquan He, Andrew Morton, Mike Rapoport,
	yinghai, vgoyal, Randy Dunlap, x86, linux-kernel

On 01/25/19 at 03:08pm, Borislav Petkov wrote:
> On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > AFAIK, some people prefer to explictly reserve crash memory at high
> > region even if it is possible to reserve at low area.  May because
> > <4G memory is limited on large server, they want to leave this for other
> > use. 
> > 
> > Yinghai or Vivek should know more about the history, probably they can
> > recall some initial reason.
> 
> Yes, just "prefer" is not good enough. There should be a technical
> reason why that's there.
> 
> Also, if the user doesn't care, then the code should be free to force
> "high" and thus probe a different range for allocation.

Another reason is in case ,high we will need automatically reserve a
region in low area for swiotlb.  So for example one use
crashkernel=256M,high,  actual reserved memory is 256M above 4G and
another 256M under 4G for swiotlb.  Normally it is not necessary for
most people.  Thus we can not make ,high as default.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-28  9:58       ` Dave Young
@ 2019-01-28 10:18         ` Borislav Petkov
  2019-06-07 17:30           ` Borislav Petkov
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2019-01-28 10:18 UTC (permalink / raw)
  To: Dave Young
  Cc: Pingfan Liu, kexec, Baoquan He, Andrew Morton, Mike Rapoport,
	yinghai, vgoyal, Randy Dunlap, x86, linux-kernel

On Mon, Jan 28, 2019 at 05:58:09PM +0800, Dave Young wrote:
> Another reason is in case ,high we will need automatically reserve a
> region in low area for swiotlb.  So for example one use
> crashkernel=256M,high,  actual reserved memory is 256M above 4G and
> another 256M under 4G for swiotlb.  Normally it is not necessary for
> most people.  Thus we can not make ,high as default.

And how is the poor user to figure out that we decided for her/him that
swiotlb reservation is something not necessary for most people and thus
we fail the crashkernel= reservation?

IOW, that "logic" above doesn't make a whole lot of sense to me from
user friendliness perspective.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-25 14:08     ` Borislav Petkov
  2019-01-28  9:58       ` Dave Young
@ 2019-01-29  5:25       ` Pingfan Liu
  2019-01-31  7:42         ` Dave Young
  2019-01-31  7:59       ` Dave Young
  2 siblings, 1 reply; 45+ messages in thread
From: Pingfan Liu @ 2019-01-29  5:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, kexec, Baoquan He, Andrew Morton, Mike Rapoport,
	Yinghai Lu, vgoyal, Randy Dunlap, x86, LKML

On Fri, Jan 25, 2019 at 10:08 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > AFAIK, some people prefer to explictly reserve crash memory at high
> > region even if it is possible to reserve at low area.  May because
> > <4G memory is limited on large server, they want to leave this for other
> > use.
> >
> > Yinghai or Vivek should know more about the history, probably they can
> > recall some initial reason.
>
Go through the git log, and I found the initial introduction of
crashkernel_high option. Refer to
commit 55a20ee7804ab64ac90bcdd4e2868a42829e2784
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Mon Apr 15 22:23:47 2013 -0700

    x86, kdump: Retore crashkernel= to allocate under 896M

    Vivek found old kexec-tools does not work new kernel anymore.

    So change back crashkernel= back to old behavoir, and add crashkernel_high=
    to let user decide if buffer could be above 4G, and also new
kexec-tools will
    be needed.

But kexec-tools-2.0.3, released at 2012, can run 4.20 kernel with
crashkernel=256M@5G, so I think only very old kexec-tools requires
memory under 896M. Due to -1.few people running latest kernel with
very old kexec-tools to date, -2. crashkernel=X is more popular than
crashkernel=X.high, it should be time to eliminate this limit of
crashkernel=X parameter, otherwise we will run into this bug.
As for crashkernel=,high, I think it is a more professional option for
who cares about the DMA32. On high-end machine, big reserved region is
used for crashkernel(e.g. in this case 384M), which make the crowed
situation under under 4GB memory worse.

> Yes, just "prefer" is not good enough. There should be a technical
> reason why that's there.
>
> Also, if the user doesn't care, then the code should be free to force
> "high" and thus probe a different range for allocation.
>
Do you suggest to remove crashkernel=X,high parameter?

Thanks,
Pingfan
> > Good question, still it may be some historical reason, but it is good to
> > make them clear and rethink about it after long time.
> >
> > I also want to understand, need dig the log more.
>
> Good idea. That would be a very nice cleanup. :-)
>
> Thx.
>
> --
> Regards/Gruss,
>     Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-25 10:39 ` Borislav Petkov
  2019-01-25 13:45   ` Dave Young
@ 2019-01-29  5:51   ` Pingfan Liu
  2019-01-31 10:50     ` Borislav Petkov
  1 sibling, 1 reply; 45+ messages in thread
From: Pingfan Liu @ 2019-01-29  5:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kexec, Dave Young, Baoquan He, Andrew Morton, Mike Rapoport,
	Yinghai Lu, vgoyal, Randy Dunlap, x86, LKML

On Fri, Jan 25, 2019 at 6:39 PM Borislav Petkov <bp@alien8.de> wrote:
>
>
> >  Subject: Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X
>
> s/bugfix, //
>
OK.

> On Mon, Jan 21, 2019 at 01:16:08PM +0800, Pingfan Liu wrote:
> > People reported crashkernel=384M reservation failed on a high end server
> > with KASLR enabled.  In that case there is enough free memory under 896M
> > but crashkernel reservation still fails intermittently.
> >
> > The situation is crashkernel reservation code only finds free region under
> > 896 MB with 128M aligned in case no ',high' being used.  And KASLR could
> > break the first 896M into several parts randomly thus the failure happens.
>
> This reads very strange.
>
What about   "  It turns out that crashkernel reservation code only
tries to find a region under 896 MB, aligned on 128M. But KASLR
randomly breaks big region inside [0,896M] into smaller pieces, not
big enough as demanded in the "crashkernel=X" parameter."

> > User has no way to predict and make sure crashkernel=xM working unless
> > he/she use 'crashkernel=xM,high'.  Since 'crashkernel=xM' is the most
> > common use case this issue is a serious bug.
> >
> > And we can't answer questions raised from customer:
> > 1) why it doesn't succeed to reserve 896 MB;
> > 2) what's wrong with memory region under 4G;
> > 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
>
> Errr, this looks like communication issue. Sounds to me like the text
> around crashkernel= in
>
What about dropping this section in commit log and another patch to
fix the document?

> Documentation/admin-guide/kernel-parameters.txt
>
> needs improving?
>
> > This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
>
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.
>
OK

> Also, do
>
> $ git grep 'This patch' Documentation/process
>
> for more details.
>
> > finally above 4G.
> >
> > Dave Young sent the original post, and I just re-post it with commit log
>
> If he sent it, he should be the author I guess.
>
> > improvement as his requirement.
> > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> > There was an old discussion below (previously posted by Chao Wang):
> > https://lkml.org/lkml/2013/10/15/601
>
> All that changelog info doesn't belong in the commit message ...
>
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Dave Young <dyoung@redhat.com>
> > Cc: Baoquan He <bhe@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> > Cc: yinghai@kernel.org,
> > Cc: vgoyal@redhat.com
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
>
> .... but here.
>
> > v6 -> v7: commit log improvement
> >  arch/x86/kernel/setup.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 3d872a5..fa62c81 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> >                                                   high ? CRASH_ADDR_HIGH_MAX
> >                                                        : CRASH_ADDR_LOW_MAX,
> >                                                   crash_size, CRASH_ALIGN);
> > +#ifdef CONFIG_X86_64
> > +             /*
> > +              * crashkernel=X reserve below 896M fails? Try below 4G
> > +              */
> > +             if (!high && !crash_base)
> > +                     crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +                                             (1ULL << 32),
> > +                                             crash_size, CRASH_ALIGN);
> > +             /*
> > +              * crashkernel=X reserve below 4G fails? Try MAXMEM
> > +              */
> > +             if (!high && !crash_base)
> > +                     crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +                                             CRASH_ADDR_HIGH_MAX,
> > +                                             crash_size, CRASH_ALIGN);
> > +#endif
>
> Ok, so this is silly: we know at which physical address KASLR allocated
> the kernel so why aren't we querying that and seeing if there's enough
> room before it or after it to call memblock_find_in_range() on the
> bigger range?
>
Sorry, can not catch up with you. Do you suggestion
memblock_find_in_range(0, kernel_start) and
memblock_find_in_range(kernel_end, mem_end)? But the memory is
truncated into fraction by many component which call
memblock_reserve(), besides kernel.

For the left question, Dave has follow the discussion in another
email, will follow there.

Thanks and regards,
Pingfan

> Also, why is "high" dealt with separately and why isn't the code
> enforcing "high" if the normal reservation fails?
>
> The presence of high is requiring from our users to pay attention what
> to use when the kernel can do all that automatically. Looks like a UI
> fail to me.
>
> And look at all the flavors of crashkernel= :
>
>         crashkernel=size[KMG][@offset[KMG]]
>         crashkernel=range1:size1[,range2:size2,...][@offset]
>         crashkernel=size[KMG],high
>         crashkernel=size[KMG],low
>
> We couldn't do one so we made 4?!?!
>
> What for?
>
> Nowhere in that help text does it explain why a user would care about
> high or low or range or offset or the planets alignment...
>
> So what's up?
>
> --
> Regards/Gruss,
>     Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-29  5:25       ` Pingfan Liu
@ 2019-01-31  7:42         ` Dave Young
  0 siblings, 0 replies; 45+ messages in thread
From: Dave Young @ 2019-01-31  7:42 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Borislav Petkov, kexec, Baoquan He, Andrew Morton, Mike Rapoport,
	Yinghai Lu, vgoyal, Randy Dunlap, x86, LKML

On 01/29/19 at 01:25pm, Pingfan Liu wrote:
> On Fri, Jan 25, 2019 at 10:08 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > > AFAIK, some people prefer to explictly reserve crash memory at high
> > > region even if it is possible to reserve at low area.  May because
> > > <4G memory is limited on large server, they want to leave this for other
> > > use.
> > >
> > > Yinghai or Vivek should know more about the history, probably they can
> > > recall some initial reason.
> >
> Go through the git log, and I found the initial introduction of
> crashkernel_high option. Refer to
> commit 55a20ee7804ab64ac90bcdd4e2868a42829e2784
> Author: Yinghai Lu <yinghai@kernel.org>
> Date:   Mon Apr 15 22:23:47 2013 -0700
> 
>     x86, kdump: Retore crashkernel= to allocate under 896M
> 
>     Vivek found old kexec-tools does not work new kernel anymore.
> 
>     So change back crashkernel= back to old behavoir, and add crashkernel_high=
>     to let user decide if buffer could be above 4G, and also new
> kexec-tools will
>     be needed.
> 
> But kexec-tools-2.0.3, released at 2012, can run 4.20 kernel with
> crashkernel=256M@5G, so I think only very old kexec-tools requires
> memory under 896M. Due to -1.few people running latest kernel with
> very old kexec-tools to date, -2. crashkernel=X is more popular than
> crashkernel=X.high, it should be time to eliminate this limit of
> crashkernel=X parameter, otherwise we will run into this bug.
> As for crashkernel=,high, I think it is a more professional option for
> who cares about the DMA32. On high-end machine, big reserved region is
> used for crashkernel(e.g. in this case 384M), which make the crowed
> situation under under 4GB memory worse.

This seems answers the question why ,high is needed and why
crashkernel=X can not allocate from high by default.

Another reason for this question is for crashkernel=,high, a separate
region under 4G is still needed.  I searched some history bug/emails,
previously the initial commit does not reserve low memory by default if
,high is used, but later Yinghai saw a regression bug in case iommu
disabled. see below commit:
commit c729de8fcea37a1c444e81857eace12494c804a9
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Mon Apr 15 22:23:45 2013 -0700

    x86, kdump: Set crashkernel_low automatically

    Chao said that kdump does does work well on his system on 3.8
    without extra parameter, even iommu does not work with kdump.
    And now have to append crashkernel_low=Y in first kernel to make
    kdump work.
    
    We have now modified crashkernel=X to allocate memory beyong 4G (if
    available) and do not allocate low range for crashkernel if the user
    does not specify that with crashkernel_low=Y.  This causes regression
    if iommu is not enabled.  Without iommu, swiotlb needs to be setup in
    first 4G and there is no low memory available to second kernel.
[snip]



> 
> > Yes, just "prefer" is not good enough. There should be a technical
> > reason why that's there.
> >
> > Also, if the user doesn't care, then the code should be free to force
> > "high" and thus probe a different range for allocation.
> >
> Do you suggest to remove crashkernel=X,high parameter?

I do not think it can be removed,  because for ,high we also need extra
low memory, it will cause confusion, most people are just fine without
this.  People can choose to use ,high if they really need it.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-25 14:08     ` Borislav Petkov
  2019-01-28  9:58       ` Dave Young
  2019-01-29  5:25       ` Pingfan Liu
@ 2019-01-31  7:59       ` Dave Young
  2019-01-31 10:57         ` Borislav Petkov
  2 siblings, 1 reply; 45+ messages in thread
From: Dave Young @ 2019-01-31  7:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pingfan Liu, kexec, Baoquan He, Andrew Morton, Mike Rapoport,
	yinghai, vgoyal, Randy Dunlap, x86, linux-kernel

On 01/25/19 at 03:08pm, Borislav Petkov wrote:
> On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > AFAIK, some people prefer to explictly reserve crash memory at high
> > region even if it is possible to reserve at low area.  May because
> > <4G memory is limited on large server, they want to leave this for other
> > use. 
> > 
> > Yinghai or Vivek should know more about the history, probably they can
> > recall some initial reason.
> 
> Yes, just "prefer" is not good enough. There should be a technical
> reason why that's there.

Got more about this, actually the thing is for large server with very
huge memory and also possible a lot of io devices.  The UEFI IO drivers
could allocate a lot memory under 896M,  so it will leave small room for
crashkernel=X

Also if the memory is huge, then copying the vmcore will be very slow,
people want to use big makedumpfile bitmap buffer, and multithread, then
kdump kernel needs more memory, they can choose ,high to get more
memory.

But for common use cases, if one does not need so much kdump memory
reserved he/she can just use crashkernel=X.


> 
> Also, if the user doesn't care, then the code should be free to force
> "high" and thus probe a different range for allocation.

As Pingfan/me mentioned in another reply, there are two reasons:
1. old kexec-tools can not load kernel to high memory
2. ,high will not work on some systems without some amounts of low
memory so it nees reserve extra low memory, it is bad for people who do
not want it. 

> 
> > Good question, still it may be some historical reason, but it is good to
> > make them clear and rethink about it after long time.
> > 
> > I also want to understand, need dig the log more.
> 
> Good idea. That would be a very nice cleanup. :-)

Let's review these different parameters:
>         crashkernel=size[KMG][@offset[KMG]]

Did not get the initial commit for this, it should be the very old
format, and kernel did not find the base address automatically for the
size

Other arches still use this, for example mips code seems needs explict
@offset, see the function mips_parse_crashkernel in
arch/mips/kernel/setup.c

Probably there are other arches as well which only support this format.

>         crashkernel=range1:size1[,range2:size2,...][@offset]

This is nice param which can help to dynamically reserve different size
depends on system memory.

>         crashkernel=size[KMG],high
>         crashkernel=size[KMG],low

We have talked about these two.  It is not graceful, but seems no way to
improve it due to the swiotlb issue.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-29  5:51   ` Pingfan Liu
@ 2019-01-31 10:50     ` Borislav Petkov
  0 siblings, 0 replies; 45+ messages in thread
From: Borislav Petkov @ 2019-01-31 10:50 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: kexec, Dave Young, Baoquan He, Andrew Morton, Mike Rapoport,
	Yinghai Lu, vgoyal, Randy Dunlap, x86, LKML

On Tue, Jan 29, 2019 at 01:51:45PM +0800, Pingfan Liu wrote:
> Sorry, can not catch up with you. Do you suggestion
> memblock_find_in_range(0, kernel_start) and
> memblock_find_in_range(kernel_end, mem_end)? But the memory is
> truncated into fraction by many component which call
> memblock_reserve(), besides kernel.

extract_kernel() selects the physical address where the place the
kernel. You can use that physical address to figure out where to
allocate the crash kernel range.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-31  7:59       ` Dave Young
@ 2019-01-31 10:57         ` Borislav Petkov
  2019-01-31 22:27           ` Jerry Hoemann
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2019-01-31 10:57 UTC (permalink / raw)
  To: Dave Young
  Cc: Pingfan Liu, kexec, Baoquan He, Andrew Morton, Mike Rapoport,
	yinghai, vgoyal, Randy Dunlap, x86, linux-kernel

On Thu, Jan 31, 2019 at 03:59:07PM +0800, Dave Young wrote:
> As Pingfan/me mentioned in another reply, there are two reasons:
> 1. old kexec-tools can not load kernel to high memory
> 2. ,high will not work on some systems without some amounts of low
> memory so it nees reserve extra low memory, it is bad for people who do
> not want it.

Let's see: we don't enable high by default because of old kexec-tools
and some systems which do some funky reservations.

But this patch kinda enables high by trying a couple more regions.

So we don't really need this - we simply need to tell people to use high
if it fails with KASLR, AFAICT.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-31 10:57         ` Borislav Petkov
@ 2019-01-31 22:27           ` Jerry Hoemann
  2019-01-31 23:47             ` Borislav Petkov
  0 siblings, 1 reply; 45+ messages in thread
From: Jerry Hoemann @ 2019-01-31 22:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, x86, Baoquan He, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal

On Thu, Jan 31, 2019 at 11:57:32AM +0100, Borislav Petkov wrote:
> On Thu, Jan 31, 2019 at 03:59:07PM +0800, Dave Young wrote:
> > As Pingfan/me mentioned in another reply, there are two reasons:
> > 1. old kexec-tools can not load kernel to high memory
> > 2. ,high will not work on some systems without some amounts of low
> > memory so it nees reserve extra low memory, it is bad for people who do
> > not want it.
> 
> Let's see: we don't enable high by default because of old kexec-tools
> and some systems which do some funky reservations.
> 
> But this patch kinda enables high by trying a couple more regions.
> 
> So we don't really need this - we simply need to tell people to use high
> if it fails with KASLR, AFAICT.

Borris,

The testing I've done shows that the allocation failure caused by KASLR 
is intermittent.  On one SUT that I've seen this issue on, the crash
kernel allocation fails less than 10% of the time.

So even if a system administrator is diligent and tests
that a chosen kdump configuration works, that configuration
might not work on some random reboot 7 months in the future.
Unfortunately, that will be the time that the system crashes
and we won't be able to collect the crash dump due to the
crashkernel allocation failure.

Jerry

-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-31 22:27           ` Jerry Hoemann
@ 2019-01-31 23:47             ` Borislav Petkov
  2019-02-04 22:30               ` Jerry Hoemann
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2019-01-31 23:47 UTC (permalink / raw)
  To: Jerry Hoemann
  Cc: Dave Young, x86, Baoquan He, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal

On Thu, Jan 31, 2019 at 03:27:32PM -0700, Jerry Hoemann wrote:
> So even if a system administrator is diligent and tests
> that a chosen kdump configuration works, that configuration
> might not work on some random reboot 7 months in the future.

Jerry, did you read the rest of the thread where I'm *actually*
suggesting to make the allocation code more robust against such
failures?

Doesn't look like it...

Now let's look at the code:

The "high" allocation does:

                crash_base = memblock_find_in_range(CRASH_ALIGN,
                                                    high ? CRASH_ADDR_HIGH_MAX
                                                         : CRASH_ADDR_LOW_MAX,
                                                    crash_size, CRASH_ALIGN);

where high=true and CRASH_ADDR_HIGH_MAX on 64-bit is MAXMEM:

# define CRASH_ADDR_HIGH_MAX    MAXMEM

The second fallback in the suggested patch does the same:

+               /*
+                * crashkernel=X reserve below 4G fails? Try MAXMEM
+                */
+               if (!high && !crash_base)
+                       crash_base = memblock_find_in_range(CRASH_ALIGN,
+                                               CRASH_ADDR_HIGH_MAX,
+                                               crash_size, CRASH_ALIGN);

and yet I get back that falling back to "high" if the first allocation
doesn't succeed is not something we should do by default because of
reasons. But this patch *practically* *does* it.

So no, until this hasn't been done cleanly and properly explained, we'll
be in a holding pattern.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-31 23:47             ` Borislav Petkov
@ 2019-02-04 22:30               ` Jerry Hoemann
  2019-02-05  8:15                 ` Borislav Petkov
  0 siblings, 1 reply; 45+ messages in thread
From: Jerry Hoemann @ 2019-02-04 22:30 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, x86, Baoquan He, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal

On Fri, Feb 01, 2019 at 12:47:40AM +0100, Borislav Petkov wrote:
> On Thu, Jan 31, 2019 at 03:27:32PM -0700, Jerry Hoemann wrote:
> > So even if a system administrator is diligent and tests
> > that a chosen kdump configuration works, that configuration
> > might not work on some random reboot 7 months in the future.
> 
> Jerry, did you read the rest of the thread where I'm *actually*
> suggesting to make the allocation code more robust against such
> failures?


Boris,

I may have misunderstood your earlier comment:

  So we don't really need this - we simply need to tell people to use high
  if it fails with KASLR, AFAICT

To imply an iterative approach to crashkernel size discovery.  Whereas you
may simply have ment:  Always use ,high as the old way is broken.


> Now let's look at the code:
> 
> The "high" allocation does:
> 
>                 crash_base = memblock_find_in_range(CRASH_ALIGN,
>                                                     high ? CRASH_ADDR_HIGH_MAX
>                                                          : CRASH_ADDR_LOW_MAX,
>                                                     crash_size, CRASH_ALIGN);
> 
> where high=true and CRASH_ADDR_HIGH_MAX on 64-bit is MAXMEM:
> 
> # define CRASH_ADDR_HIGH_MAX    MAXMEM
> 
> The second fallback in the suggested patch does the same:
> 
> +               /*
> +                * crashkernel=X reserve below 4G fails? Try MAXMEM
> +                */
> +               if (!high && !crash_base)
> +                       crash_base = memblock_find_in_range(CRASH_ALIGN,
> +                                               CRASH_ADDR_HIGH_MAX,
> +                                               crash_size, CRASH_ALIGN);
> 
> and yet I get back that falling back to "high" if the first allocation
> doesn't succeed is not something we should do by default because of
> reasons. But this patch *practically* *does* it.


Is your objection only to the second fallback of allocating
memory above >= 4GB?   Or are you objecting to allocating from
(896 .. 4GB) as well?

Falling back to allocating < 4GB probably satisfes most of the cases
where the original allocation fails.

thanks

-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-04 22:30               ` Jerry Hoemann
@ 2019-02-05  8:15                 ` Borislav Petkov
  2019-02-06 12:08                   ` Dave Young
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2019-02-05  8:15 UTC (permalink / raw)
  To: Jerry Hoemann
  Cc: Dave Young, x86, Baoquan He, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal

On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> Is your objection only to the second fallback of allocating
> memory above >= 4GB?   Or are you objecting to allocating from
> (896 .. 4GB) as well?

My problem is why should the user need to specify high or low allocation
explicitly when we can handle all that in the kernel automatically.

The presence of crashkernel= on the cmdline sure means that the user
wants to allocate memory for a second kernel.

Now, if the requested allocation fails, we say:

  Error reserving crashkernel

So, instead of saying that, we can *try* *again* and say

  Error reserving requested crashkernel at @..., attempting a high range.

and run memblock_find_in_range() on the other regions which we deemed
are ok to allocate from.

Why aren't we doing that by default instead of placing all those
different options in front of the user and expecting her/him to know
something about all those magic ranges?

I don't think most of the users care about where the kernel gets
allocated - all they want is a working kdump setup.

> Falling back to allocating < 4GB probably satisfes most of the cases
> where the original allocation fails.

Yes. Now make that automatic.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-05  8:15                 ` Borislav Petkov
@ 2019-02-06 12:08                   ` Dave Young
  2019-02-11 20:48                     ` Dave Young
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Young @ 2019-02-06 12:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jerry Hoemann, x86, Baoquan He, Randy Dunlap, kexec,
	linux-kernel, Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai,
	vgoyal

On 02/05/19 at 09:15am, Borislav Petkov wrote:
> On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> > Is your objection only to the second fallback of allocating
> > memory above >= 4GB?   Or are you objecting to allocating from
> > (896 .. 4GB) as well?
> 
> My problem is why should the user need to specify high or low allocation
> explicitly when we can handle all that in the kernel automatically.
> 
> The presence of crashkernel= on the cmdline sure means that the user
> wants to allocate memory for a second kernel.
> 
> Now, if the requested allocation fails, we say:
> 
>   Error reserving crashkernel
> 
> So, instead of saying that, we can *try* *again* and say
> 
>   Error reserving requested crashkernel at @..., attempting a high range.
> 
> and run memblock_find_in_range() on the other regions which we deemed
> are ok to allocate from.
> 
> Why aren't we doing that by default instead of placing all those
> different options in front of the user and expecting her/him to know
> something about all those magic ranges?

As we talked in another reply, for the >4G allocation we can not avoid
the swiotlb issue,  but if one request for 256M in high region and we
allocate the low part automatically, it will eat more memory eg. 512M.

But probably in case allacation failed in low region ,high is a must for kdump
reservation, since no other choices perhaps we can make that as you said

> 
> I don't think most of the users care about where the kernel gets
> allocated - all they want is a working kdump setup.
> 
> > Falling back to allocating < 4GB probably satisfes most of the cases
> > where the original allocation fails.
> 
> Yes. Now make that automatic.

For the time being, this should be good enough.

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-06 12:08                   ` Dave Young
@ 2019-02-11 20:48                     ` Dave Young
  2019-02-12  5:35                       ` Pingfan Liu
  2019-02-15 10:24                       ` Borislav Petkov
  0 siblings, 2 replies; 45+ messages in thread
From: Dave Young @ 2019-02-11 20:48 UTC (permalink / raw)
  To: Borislav Petkov, bhe
  Cc: Jerry Hoemann, x86, Baoquan He, Randy Dunlap, kexec,
	linux-kernel, Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai,
	vgoyal

On 02/06/19 at 08:08pm, Dave Young wrote:
> On 02/05/19 at 09:15am, Borislav Petkov wrote:
> > On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> > > Is your objection only to the second fallback of allocating
> > > memory above >= 4GB?   Or are you objecting to allocating from
> > > (896 .. 4GB) as well?
> > 
> > My problem is why should the user need to specify high or low allocation
> > explicitly when we can handle all that in the kernel automatically.
> > 
> > The presence of crashkernel= on the cmdline sure means that the user
> > wants to allocate memory for a second kernel.
> > 
> > Now, if the requested allocation fails, we say:
> > 
> >   Error reserving crashkernel
> > 
> > So, instead of saying that, we can *try* *again* and say
> > 
> >   Error reserving requested crashkernel at @..., attempting a high range.
> > 
> > and run memblock_find_in_range() on the other regions which we deemed
> > are ok to allocate from.
> > 
> > Why aren't we doing that by default instead of placing all those
> > different options in front of the user and expecting her/him to know
> > something about all those magic ranges?
> 
> As we talked in another reply, for the >4G allocation we can not avoid
> the swiotlb issue,  but if one request for 256M in high region and we
> allocate the low part automatically, it will eat more memory eg. 512M.
> 
> But probably in case allacation failed in low region ,high is a must for kdump
> reservation, since no other choices perhaps we can make that as you said

That is exactly what Pingfan is doing in this patch.

Even we make it automatic in kernel, but we have to have some default
value for swiotlb in case crashkernel can not find a free region under 4G.
So this default value can not work for every use cases, people need 
manually use crashkernel=,low and crashkernel=,high in case
crashkernel=X does not work.  One can tune it for their use:

1) crashkernel=X reservation fails, likely the ,low default value is
still too big, one can shrink the value and manually try other value
2) crashkernel=X reserve successfully on high memory and along with some
default low memory region. But the low region is not enough.  In this
case one can increase the 

This should answer the question why ,high and ,low is still needed.

But for above consumption 1),  KASLR can still cause default ,low memory
failed to reserve.  So I wonder if KASLR can skip the 0-896M if the
system memory is big enough.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-11 20:48                     ` Dave Young
@ 2019-02-12  5:35                       ` Pingfan Liu
  2019-02-15 10:24                       ` Borislav Petkov
  1 sibling, 0 replies; 45+ messages in thread
From: Pingfan Liu @ 2019-02-12  5:35 UTC (permalink / raw)
  To: Dave Young
  Cc: Borislav Petkov, Baoquan He, Jerry Hoemann, x86, Randy Dunlap,
	kexec, LKML, Mike Rapoport, Andrew Morton, Yinghai Lu, vgoyal

On Tue, Feb 12, 2019 at 4:48 AM Dave Young <dyoung@redhat.com> wrote:
>
> On 02/06/19 at 08:08pm, Dave Young wrote:
> > On 02/05/19 at 09:15am, Borislav Petkov wrote:
> > > On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> > > > Is your objection only to the second fallback of allocating
> > > > memory above >= 4GB?   Or are you objecting to allocating from
> > > > (896 .. 4GB) as well?
> > >
> > > My problem is why should the user need to specify high or low allocation
> > > explicitly when we can handle all that in the kernel automatically.
> > >
> > > The presence of crashkernel= on the cmdline sure means that the user
> > > wants to allocate memory for a second kernel.
> > >
> > > Now, if the requested allocation fails, we say:
> > >
> > >   Error reserving crashkernel
> > >
> > > So, instead of saying that, we can *try* *again* and say
> > >
> > >   Error reserving requested crashkernel at @..., attempting a high range.
> > >
> > > and run memblock_find_in_range() on the other regions which we deemed
> > > are ok to allocate from.
> > >
> > > Why aren't we doing that by default instead of placing all those
> > > different options in front of the user and expecting her/him to know
> > > something about all those magic ranges?
> >
> > As we talked in another reply, for the >4G allocation we can not avoid
> > the swiotlb issue,  but if one request for 256M in high region and we
> > allocate the low part automatically, it will eat more memory eg. 512M.
> >
> > But probably in case allacation failed in low region ,high is a must for kdump
> > reservation, since no other choices perhaps we can make that as you said
>
> That is exactly what Pingfan is doing in this patch.
>
> Even we make it automatic in kernel, but we have to have some default
> value for swiotlb in case crashkernel can not find a free region under 4G.
> So this default value can not work for every use cases, people need
> manually use crashkernel=,low and crashkernel=,high in case
> crashkernel=X does not work.  One can tune it for their use:
>
> 1) crashkernel=X reservation fails, likely the ,low default value is
> still too big, one can shrink the value and manually try other value
> 2) crashkernel=X reserve successfully on high memory and along with some
> default low memory region. But the low region is not enough.  In this
> case one can increase the
>
> This should answer the question why ,high and ,low is still needed.
>
> But for above consumption 1),  KASLR can still cause default ,low memory
> failed to reserve.  So I wonder if KASLR can skip the 0-896M if the
> system memory is big enough.
>
A little fix about the comment. Refer to reserve_crashkernel_low(),
low_base = memblock_find_in_range(0, 1ULL << 32, low_size,
CRASH_ALIGN); So it should try 0~4G for ",low". And the default size
for low is 256M. Given the limited memory region reserved by other
component before crashkernel, we always can find a continuous chunk of
256M inside the fragmented [0,4G], which is split by initrd, KASLR.

Thanks,
Pingfan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-11 20:48                     ` Dave Young
  2019-02-12  5:35                       ` Pingfan Liu
@ 2019-02-15 10:24                       ` Borislav Petkov
  2019-02-18  1:48                         ` Dave Young
  1 sibling, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2019-02-15 10:24 UTC (permalink / raw)
  To: Dave Young
  Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal

On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> Even we make it automatic in kernel, but we have to have some default
> value for swiotlb in case crashkernel can not find a free region under 4G.
> So this default value can not work for every use cases, people need 
> manually use crashkernel=,low and crashkernel=,high in case
> crashkernel=X does not work.

Why would the user need to find swiotlb range? The kernel has all the
information it requires at its finger tips in order to decide properly.

The user wants a crashkernel range, the kernel tries the low range =>
no workie, then it tries the next range => workie but needs to allocate
swiotlb range so that DMA can happen too. Doh, then the kernel does
allocate that too.

Why would the user need to do anything here?!

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-15 10:24                       ` Borislav Petkov
@ 2019-02-18  1:48                         ` Dave Young
  2019-02-20  7:38                           ` Pingfan Liu
  2019-02-20  8:32                           ` Borislav Petkov
  0 siblings, 2 replies; 45+ messages in thread
From: Dave Young @ 2019-02-18  1:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal,
	iommu, konrad.wilk

On 02/15/19 at 11:24am, Borislav Petkov wrote:
> On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> > Even we make it automatic in kernel, but we have to have some default
> > value for swiotlb in case crashkernel can not find a free region under 4G.
> > So this default value can not work for every use cases, people need 
> > manually use crashkernel=,low and crashkernel=,high in case
> > crashkernel=X does not work.
> 
> Why would the user need to find swiotlb range? The kernel has all the
> information it requires at its finger tips in order to decide properly.
> 
> The user wants a crashkernel range, the kernel tries the low range =>
> no workie, then it tries the next range => workie but needs to allocate
> swiotlb range so that DMA can happen too. Doh, then the kernel does
> allocate that too.

It is ideal if kernel can do it automatically, but I'm not sure if
kernel can predict the swiotlb reserved size automatically.

Let's add more people to seek for comments. 

> 
> Why would the user need to do anything here?!
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-18  1:48                         ` Dave Young
@ 2019-02-20  7:38                           ` Pingfan Liu
  2019-02-20  8:32                           ` Borislav Petkov
  1 sibling, 0 replies; 45+ messages in thread
From: Pingfan Liu @ 2019-02-20  7:38 UTC (permalink / raw)
  To: Dave Young
  Cc: Borislav Petkov, Baoquan He, Jerry Hoemann, x86, Randy Dunlap,
	kexec, LKML, Mike Rapoport, Andrew Morton, Yinghai Lu, vgoyal,
	iommu, konrad.wilk

On Mon, Feb 18, 2019 at 9:48 AM Dave Young <dyoung@redhat.com> wrote:
>
> On 02/15/19 at 11:24am, Borislav Petkov wrote:
> > On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> > > Even we make it automatic in kernel, but we have to have some default
> > > value for swiotlb in case crashkernel can not find a free region under 4G.
> > > So this default value can not work for every use cases, people need
> > > manually use crashkernel=,low and crashkernel=,high in case
> > > crashkernel=X does not work.
> >
> > Why would the user need to find swiotlb range? The kernel has all the
> > information it requires at its finger tips in order to decide properly.
> >
> > The user wants a crashkernel range, the kernel tries the low range =>
> > no workie, then it tries the next range => workie but needs to allocate
> > swiotlb range so that DMA can happen too. Doh, then the kernel does
> > allocate that too.
>
> It is ideal if kernel can do it automatically, but I'm not sure if
> kernel can predict the swiotlb reserved size automatically.
>
Agreed, I think it is hard to decide the reserved size automatically.
We do not know the requirement for memory of ZONE_DMA32 at boot time.
The requirement depends on how many DMA32 devices, and the dynamic
payload of them.

> Let's add more people to seek for comments.
>
> >
> > Why would the user need to do anything here?!
> >
> > --
> > Regards/Gruss,
> >     Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-18  1:48                         ` Dave Young
  2019-02-20  7:38                           ` Pingfan Liu
@ 2019-02-20  8:32                           ` Borislav Petkov
  2019-02-20  9:41                             ` Dave Young
  1 sibling, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2019-02-20  8:32 UTC (permalink / raw)
  To: Dave Young
  Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal,
	iommu, konrad.wilk

On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> It is ideal if kernel can do it automatically, but I'm not sure if
> kernel can predict the swiotlb reserved size automatically.

Do you see how even more absurd this gets?

If the kernel cannot know the swiotlb reserved size automatically, how
is the normal user even supposed to know?!

I see swiotlb_size_or_default() so we have a sane default which we fall
back to. Now where's the problem with that?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-20  8:32                           ` Borislav Petkov
@ 2019-02-20  9:41                             ` Dave Young
  2019-02-20 12:51                               ` Pingfan Liu
  2019-02-21 17:13                               ` Borislav Petkov
  0 siblings, 2 replies; 45+ messages in thread
From: Dave Young @ 2019-02-20  9:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal,
	iommu, konrad.wilk, Joerg Roedel

On 02/20/19 at 09:32am, Borislav Petkov wrote:
> On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> > It is ideal if kernel can do it automatically, but I'm not sure if
> > kernel can predict the swiotlb reserved size automatically.
> 
> Do you see how even more absurd this gets?
> 
> If the kernel cannot know the swiotlb reserved size automatically, how
> is the normal user even supposed to know?!
> 
> I see swiotlb_size_or_default() so we have a sane default which we fall
> back to. Now where's the problem with that?

Good question, I expect some answer from people who know more about the
background.  It would be good to have some actual test results, Pingfan
is trying to do some tests.

Previously Joerg posted below patch, maybe he has some idea. Joerg?

commit 94fb9334182284e8e7e4bcb9125c25dc33af19d4
Author: Joerg Roedel <jroedel@suse.de>
Date:   Wed Jun 10 17:49:42 2015 +0200

    x86/crash: Allocate enough low memory when crashkernel=high

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-20  9:41                             ` Dave Young
@ 2019-02-20 12:51                               ` Pingfan Liu
  2019-02-21 17:13                               ` Borislav Petkov
  1 sibling, 0 replies; 45+ messages in thread
From: Pingfan Liu @ 2019-02-20 12:51 UTC (permalink / raw)
  To: Dave Young
  Cc: Borislav Petkov, Baoquan He, Jerry Hoemann, x86, Randy Dunlap,
	kexec, LKML, Mike Rapoport, Andrew Morton, Yinghai Lu, vgoyal,
	iommu, konrad.wilk, Joerg Roedel

On Wed, Feb 20, 2019 at 5:41 PM Dave Young <dyoung@redhat.com> wrote:
>
> On 02/20/19 at 09:32am, Borislav Petkov wrote:
> > On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> > > It is ideal if kernel can do it automatically, but I'm not sure if
> > > kernel can predict the swiotlb reserved size automatically.
> >
> > Do you see how even more absurd this gets?
> >
> > If the kernel cannot know the swiotlb reserved size automatically, how
> > is the normal user even supposed to know?!
> >
I think swiotlb is bounce-buffer, if we enlarge it, we can get better
performance. Default size should be enough for platform to work. But
in case of reserving low memory for crashkernel, things are different.
The reserve low memory = swiotlb_size_or_default() + DMA32 memory for
devices. And the 2nd item in the right of the equation varies, based
on machine type and dynamic payload

> > I see swiotlb_size_or_default() so we have a sane default which we fall
> > back to. Now where's the problem with that?
>
> Good question, I expect some answer from people who know more about the
> background.  It would be good to have some actual test results, Pingfan
> is trying to do some tests.
>
Not following the idea, I do not think the following test result can
tell much. (We need various type of machine to get a final result.)
I do a quick test on "HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10",
command line "crashkernel=180M,high crashkernel=64M,low" can work for
the 2nd kernel. Although it complained some memory shortage issue:
[    7.655591] fbcon: mgadrmfb (fb0) is primary device
[    7.655639] Console: switching to colour frame buffer device 128x48
[    7.660609] systemd-udevd: page allocation failure: order:0, mode:0x280d4
[    7.660611] CPU: 0 PID: 180 Comm: systemd-udevd Not tainted
3.10.0-957.el7.x86_64 #1
[    7.660612] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
Gen10, BIOS U30 06/20/2018
[    7.660612] Call Trace:
[    7.660621]  [<ffffffff81761dc1>] dump_stack+0x19/0x1b
[    7.660625]  [<ffffffff811bc830>] warn_alloc_failed+0x110/0x180
[    7.660628]  [<ffffffff8175d3ce>] __alloc_pages_slowpath+0x6b6/0x724
[    7.660631]  [<ffffffff811c0e95>] __alloc_pages_nodemask+0x405/0x420
[    7.660633]  [<ffffffff8120dcf8>] alloc_pages_current+0x98/0x110
[    7.660638]  [<ffffffffc00c8622>] ttm_pool_populate+0x3d2/0x4b0 [ttm]
[    7.660641]  [<ffffffffc00bf1cd>] ttm_tt_populate+0x7d/0x90 [ttm]
[    7.660644]  [<ffffffffc00c3c74>] ttm_bo_kmap+0x124/0x240 [ttm]
[    7.660648]  [<ffffffff810cecbf>] ? __wake_up_sync_key+0x4f/0x60
[    7.660650]  [<ffffffffc012677e>] mga_dirty_update+0x25e/0x310 [mgag200]
[    7.660653]  [<ffffffffc012685f>] mga_imageblit+0x2f/0x40 [mgag200]
[    7.660657]  [<ffffffff813f97ca>] soft_cursor+0x1ba/0x260
[    7.660659]  [<ffffffff813f8f53>] bit_cursor+0x663/0x6a0
[    7.660662]  [<ffffffff81098739>] ? console_trylock+0x19/0x70
[    7.660664]  [<ffffffff813f514d>] fbcon_cursor+0x13d/0x1c0
[    7.660665]  [<ffffffff813f88f0>] ? bit_clear+0x120/0x120
[    7.660668]  [<ffffffff8146af2e>] hide_cursor+0x2e/0xa0
[    7.660669]  [<ffffffff8146d4e8>] redraw_screen+0x188/0x270
[    7.660671]  [<ffffffff8146e086>] do_bind_con_driver+0x316/0x340
[    7.660672]  [<ffffffff8146e5e9>] do_take_over_console+0x49/0x60
[    7.660674]  [<ffffffff813f24c3>] do_fbcon_takeover+0x63/0xd0
[    7.660675]  [<ffffffff813f808d>] fbcon_event_notify+0x61d/0x730
[    7.660678]  [<ffffffff8176fb0f>] notifier_call_chain+0x4f/0x70
[    7.660681]  [<ffffffff810c7f6d>] __blocking_notifier_call_chain+0x4d/0x70
[    7.660683]  [<ffffffff810c7fa6>] blocking_notifier_call_chain+0x16/0x20
[    7.660684]  [<ffffffff813e8b9b>] fb_notifier_call_chain+0x1b/0x20
[    7.660686]  [<ffffffff813e9e46>] register_framebuffer+0x1f6/0x340
[    7.660690]  [<ffffffffc01027e2>]
__drm_fb_helper_initial_config_and_unlock+0x252/0x3e0 [drm_kms_helper]
[    7.660694]  [<ffffffffc01029ae>]
drm_fb_helper_initial_config+0x3e/0x50 [drm_kms_helper]
[    7.660697]  [<ffffffffc01269d3>] mgag200_fbdev_init+0xe3/0x100 [mgag200]
[    7.660699]  [<ffffffffc01254f4>] mgag200_modeset_init+0x154/0x1d0 [mgag200]
[    7.660701]  [<ffffffffc012157d>] mgag200_driver_load+0x41d/0x5b0 [mgag200]
[    7.660708]  [<ffffffffc005ba4f>] drm_dev_register+0x15f/0x1f0 [drm]
[    7.660711]  [<ffffffff813c3518>] ? pci_enable_device_flags+0xe8/0x140
[    7.660718]  [<ffffffffc005d0da>] drm_get_pci_dev+0x8a/0x1a0 [drm]
[    7.660720]  [<ffffffffc012626b>] mga_pci_probe+0x9b/0xc0 [mgag200]
[    7.660722]  [<ffffffff813c4aca>] local_pci_probe+0x4a/0xb0
[    7.660723]  [<ffffffff813c6209>] pci_device_probe+0x109/0x160
[    7.660726]  [<ffffffff814a8285>] driver_probe_device+0xc5/0x3e0
[    7.660727]  [<ffffffff814a8683>] __driver_attach+0x93/0xa0
[    7.660728]  [<ffffffff814a85f0>] ? __device_attach+0x50/0x50
[    7.660730]  [<ffffffff814a5e25>] bus_for_each_dev+0x75/0xc0
[    7.660731]  [<ffffffff814a7bfe>] driver_attach+0x1e/0x20
[    7.660733]  [<ffffffff814a76a0>] bus_add_driver+0x200/0x2d0
[    7.660734]  [<ffffffff814a8d14>] driver_register+0x64/0xf0
[    7.660735]  [<ffffffff813c5a45>] __pci_register_driver+0xa5/0xc0
[    7.660737]  [<ffffffffc012d000>] ? 0xffffffffc012cfff
[    7.660739]  [<ffffffffc012d039>] mgag200_init+0x39/0x1000 [mgag200]
[    7.660742]  [<ffffffff8100210a>] do_one_initcall+0xba/0x240
[    7.660745]  [<ffffffff81118f8c>] load_module+0x272c/0x2bc0
[    7.660748]  [<ffffffff813a3030>] ? ddebug_proc_write+0x100/0x100
[    7.660750]  [<ffffffff8111950f>] SyS_init_module+0xef/0x140
[    7.660752]  [<ffffffff81774ddb>] system_call_fastpath+0x22/0x27
[    7.660753] Mem-Info:
[    7.660756] active_anon:3364 inactive_anon:6661 isolated_anon:0
[    7.660756]  active_file:0 inactive_file:0 isolated_file:0
[    7.660756]  unevictable:0 dirty:0 writeback:0 unstable:0
[    7.660756]  slab_reclaimable:1492 slab_unreclaimable:3116
[    7.660756]  mapped:1223 shmem:8449 pagetables:179 bounce:0
[    7.660756]  free:20626 free_pcp:0 free_cma:0
[    7.660761] Node 0 DMA free:0kB min:4kB low:4kB high:4kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:564kB
managed:448kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[    7.660762] lowmem_reserve[]: 0 0 152 152
[    7.660766] Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65536kB
managed:0kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[    7.660767] lowmem_reserve[]: 0 0 152 152
[    7.660771] Node 0 Normal free:82504kB min:1572kB low:1964kB
high:2356kB active_anon:13456kB inactive_anon:26644kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:183740kB managed:158716kB mlocked:0kB
dirty:0kB writeback:0kB mapped:4892kB shmem:33796kB
slab_reclaimable:5968kB slab_unreclaimable:12464kB kernel_stack:784kB
pagetables:716kB unstable:0kB bounce:0kB free_pcp:[    8.722693]
Microsemi PQI Driver (v1.1.4-115)


> Previously Joerg posted below patch, maybe he has some idea. Joerg?
>
> commit 94fb9334182284e8e7e4bcb9125c25dc33af19d4
> Author: Joerg Roedel <jroedel@suse.de>
> Date:   Wed Jun 10 17:49:42 2015 +0200
>
>     x86/crash: Allocate enough low memory when crashkernel=high
>
> Thanks
> Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-20  9:41                             ` Dave Young
  2019-02-20 12:51                               ` Pingfan Liu
@ 2019-02-21 17:13                               ` Borislav Petkov
  2019-02-22  2:11                                 ` Dave Young
  1 sibling, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2019-02-21 17:13 UTC (permalink / raw)
  To: Dave Young
  Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal,
	iommu, konrad.wilk, Joerg Roedel

On Wed, Feb 20, 2019 at 05:41:46PM +0800, Dave Young wrote:
> Previously Joerg posted below patch, maybe he has some idea. Joerg?

Isn't it clear from the commit message?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-21 17:13                               ` Borislav Petkov
@ 2019-02-22  2:11                                 ` Dave Young
  2019-02-22  8:42                                   ` Joerg Roedel
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Young @ 2019-02-22  2:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal,
	iommu, konrad.wilk, Joerg Roedel

On 02/21/19 at 06:13pm, Borislav Petkov wrote:
> On Wed, Feb 20, 2019 at 05:41:46PM +0800, Dave Young wrote:
> > Previously Joerg posted below patch, maybe he has some idea. Joerg?
> 
> Isn't it clear from the commit message?

Then, does it answered your question?

256M is set as a default value in the patch, but it is not a predict to
satisfy all use cases, from the description it is also possible that some
people run out of the 256M and the ,low and ,high format is still
necessary to exist even if we make crashkernel=X do the allocation
automatically in high in case failed in low area.

crashkernel=X:  allocate in low first, if not possible, then allocate in
high

In case people have a lot of devices need more swiotlb, then he manually
set the ,high with ,low together.

What's your suggestion then?  remove ,low and ,high and increase default
256M in case we get failure bug report?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-22  2:11                                 ` Dave Young
@ 2019-02-22  8:42                                   ` Joerg Roedel
  2019-02-22 13:00                                     ` Borislav Petkov
  0 siblings, 1 reply; 45+ messages in thread
From: Joerg Roedel @ 2019-02-22  8:42 UTC (permalink / raw)
  To: Dave Young
  Cc: Borislav Petkov, bhe, Jerry Hoemann, x86, Randy Dunlap, kexec,
	linux-kernel, Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai,
	vgoyal, iommu, konrad.wilk

On Fri, Feb 22, 2019 at 10:11:01AM +0800, Dave Young wrote:
> In case people have a lot of devices need more swiotlb, then he manually
> set the ,high with ,low together.

The option to specify the high and low values for the crashkernel are
important for certain machines. The point is that swiotlb already
allocates 64MB of low memory by default. But that memory is only used
for 32bit DMA-mask devices that want to DMA into high memory. There are
drivers just allocating GFP_DMA32 memory, which also ends up in the low
region (but not swiotlb), that is why the previous default of 72MB low
memory was not enough, it only left 8MB of GFP_DMA32 memory. The current
default of 256MB was found by experiments on a bigger number of
machines, to create a reasonable default that is at least likely to be
sufficient of an average machine.

There is no way today for the kernel to find an optimum value for the
amount of low memory required to successfully create a crash dump. It
depends on the amount of devices in the system and how the drivers
for them are written. The drivers have no way to report back their
requirements, and even if they had, at the time the allocation happens
no driver is loaded yet.

So it is up to the system administrator to find workable values for the
high and low memory requirements, even using experiments as a last
resort.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-22  8:42                                   ` Joerg Roedel
@ 2019-02-22 13:00                                     ` Borislav Petkov
  2019-02-24 13:25                                       ` Pingfan Liu
  2019-02-25 11:00                                       ` Joerg Roedel
  0 siblings, 2 replies; 45+ messages in thread
From: Borislav Petkov @ 2019-02-22 13:00 UTC (permalink / raw)
  To: Joerg Roedel, Dave Young
  Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
	Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal,
	iommu, konrad.wilk

On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> The current default of 256MB was found by experiments on a bigger
> number of machines, to create a reasonable default that is at least
> likely to be sufficient of an average machine.

Exactly, and this is what makes sense.

The code should try the requested reservation and if it fails, it should
try high allocation with default swiotlb size because we need to reserve
*some* range.

If that reservation succeeds, we should say something along the lines of

"... requested range failed, reserved <X> range instead."

And then in Documentation/admin-guide/kernel-parameters.txt above the
crashkernel= explanations, the allocation strategy of best effort should
be explained in short. That the kernel will try to allocate high if the
requested allocation didn't succeed and that the user can tweak the
allocation with the below options.

Bottom line is: the kernel should assist the user and try harder to
allocate *some* range for a crash kernel when there's no detailed
specification what that range should be.

*If* the user adds ,low, high, then the kernel should try only that
specified range because the assumption is that the user knows what she's
doing.

But if the user simply wants a range for a crash kernel without stating
where that range should be in particular and it's placement is a don't
care - as long as there is a range - then the kernel should simply try
high, etc.

Makes sense?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-22 13:00                                     ` Borislav Petkov
@ 2019-02-24 13:25                                       ` Pingfan Liu
  2019-02-25  1:53                                         ` Dave Young
  2019-02-25  9:39                                         ` Borislav Petkov
  2019-02-25 11:00                                       ` Joerg Roedel
  1 sibling, 2 replies; 45+ messages in thread
From: Pingfan Liu @ 2019-02-24 13:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, Dave Young, Baoquan He, Jerry Hoemann, x86,
	Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
	Yinghai Lu, vgoyal, iommu, konrad.wilk

On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > The current default of 256MB was found by experiments on a bigger
> > number of machines, to create a reasonable default that is at least
> > likely to be sufficient of an average machine.
>
> Exactly, and this is what makes sense.
>
> The code should try the requested reservation and if it fails, it should
> try high allocation with default swiotlb size because we need to reserve
> *some* range.
>
> If that reservation succeeds, we should say something along the lines of
>
> "... requested range failed, reserved <X> range instead."
>
Maybe I misunderstood you, but does "requested range failed" mean that
user specify the range? If yes, then it should be the duty of user as
you said later, not the duty of kernel"

> And then in Documentation/admin-guide/kernel-parameters.txt above the
> crashkernel= explanations, the allocation strategy of best effort should
> be explained in short. That the kernel will try to allocate high if the
> requested allocation didn't succeed and that the user can tweak the
> allocation with the below options.
>
Yes, it should be improved.

> Bottom line is: the kernel should assist the user and try harder to
> allocate *some* range for a crash kernel when there's no detailed
> specification what that range should be.
>
> *If* the user adds ,low, high, then the kernel should try only that
> specified range because the assumption is that the user knows what she's
> doing.
>
> But if the user simply wants a range for a crash kernel without stating
> where that range should be in particular and it's placement is a don't
> care - as long as there is a range - then the kernel should simply try
> high, etc.
>
We do not know the memory layout of a system, maybe a system with
memory less than 4GB. So it is better to try all the range of system
memory

Thanks,
Pingfan

> Makes sense?
>
> --
> Regards/Gruss,
>     Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-24 13:25                                       ` Pingfan Liu
@ 2019-02-25  1:53                                         ` Dave Young
  2019-02-25  9:39                                         ` Borislav Petkov
  1 sibling, 0 replies; 45+ messages in thread
From: Dave Young @ 2019-02-25  1:53 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Borislav Petkov, Joerg Roedel, Baoquan He, Jerry Hoemann, x86,
	Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
	Yinghai Lu, vgoyal, iommu, konrad.wilk

On 02/24/19 at 09:25pm, Pingfan Liu wrote:
> On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > > The current default of 256MB was found by experiments on a bigger
> > > number of machines, to create a reasonable default that is at least
> > > likely to be sufficient of an average machine.
> >
> > Exactly, and this is what makes sense.
> >
> > The code should try the requested reservation and if it fails, it should
> > try high allocation with default swiotlb size because we need to reserve
> > *some* range.
> >
> > If that reservation succeeds, we should say something along the lines of
> >
> > "... requested range failed, reserved <X> range instead."
> >
> Maybe I misunderstood you, but does "requested range failed" mean that
> user specify the range? If yes, then it should be the duty of user as
> you said later, not the duty of kernel"

If you go with the changes in your current patch it is needed to say
something like:
"crashkernel: can not find free memory under 4G, reserve XM@.. instead" 

Also need to print the reserved low memory area in case ,high being used.

But for 896M -> 4G, the 896M faulure is not necessary to show in dmesg,
it is some in kernel logic.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-24 13:25                                       ` Pingfan Liu
  2019-02-25  1:53                                         ` Dave Young
@ 2019-02-25  9:39                                         ` Borislav Petkov
  1 sibling, 0 replies; 45+ messages in thread
From: Borislav Petkov @ 2019-02-25  9:39 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Joerg Roedel, Dave Young, Baoquan He, Jerry Hoemann, x86,
	Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
	Yinghai Lu, vgoyal, iommu, konrad.wilk

On Sun, Feb 24, 2019 at 09:25:18PM +0800, Pingfan Liu wrote:
> Maybe I misunderstood you, but does "requested range failed" mean that
> user specify the range? If yes, then it should be the duty of user as
> you said later, not the duty of kernel"

No, it should say that it selected a different range only when the user
didn't specify it. Which would mean that the user didn't care about the
range - she/he only wanted to have *any* crashkernel range reserved.
I.e., crashkernel=X invocation.

> We do not know the memory layout of a system, maybe a system with
> memory less than 4GB. So it is better to try all the range of system
> memory.

Ok. If 4G fails, you set high and then try again.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-22 13:00                                     ` Borislav Petkov
  2019-02-24 13:25                                       ` Pingfan Liu
@ 2019-02-25 11:00                                       ` Joerg Roedel
  2019-02-25 11:12                                         ` Dave Young
  1 sibling, 1 reply; 45+ messages in thread
From: Joerg Roedel @ 2019-02-25 11:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, bhe, Jerry Hoemann, x86, Randy Dunlap, kexec,
	linux-kernel, Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai,
	vgoyal, iommu, konrad.wilk

On Fri, Feb 22, 2019 at 02:00:26PM +0100, Borislav Petkov wrote:
> On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > The current default of 256MB was found by experiments on a bigger
> > number of machines, to create a reasonable default that is at least
> > likely to be sufficient of an average machine.
> 
> Exactly, and this is what makes sense.
> 
> The code should try the requested reservation and if it fails, it should
> try high allocation with default swiotlb size because we need to reserve
> *some* range.

Right, makes sense. While at it, maybe it is time to move the default
allocation policy to 'high' again. The change was reverted six years ago
because it broke old kexec tools, but those are probably out-of-service
now. I think this change would make the whole crashdump allocation
process less fragile.

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-25 11:00                                       ` Joerg Roedel
@ 2019-02-25 11:12                                         ` Dave Young
  2019-02-25 11:30                                           ` Borislav Petkov
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Young @ 2019-02-25 11:12 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Borislav Petkov, bhe, Jerry Hoemann, x86, Randy Dunlap, kexec,
	linux-kernel, Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai,
	vgoyal, iommu, konrad.wilk

On 02/25/19 at 12:00pm, Joerg Roedel wrote:
> On Fri, Feb 22, 2019 at 02:00:26PM +0100, Borislav Petkov wrote:
> > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > > The current default of 256MB was found by experiments on a bigger
> > > number of machines, to create a reasonable default that is at least
> > > likely to be sufficient of an average machine.
> > 
> > Exactly, and this is what makes sense.
> > 
> > The code should try the requested reservation and if it fails, it should
> > try high allocation with default swiotlb size because we need to reserve
> > *some* range.
> 
> Right, makes sense. While at it, maybe it is time to move the default
> allocation policy to 'high' again. The change was reverted six years ago
> because it broke old kexec tools, but those are probably out-of-service
> now. I think this change would make the whole crashdump allocation
> process less fragile.

One concern about this is for average cases, one do not need so much
memory for kdump.  For example in RHEL we use crashkernel=auto to
automatically reserve kdump kernel memory, and for x86 the reserved size
is like below now:
1G-64G:160M,64G-1T:256M,1T-:512M

That means for a machine with less than 64G memory we only allocate
160M, it works for most machines in our lab.

If we move to high as default, it will allocate 160M high + 256M low. It
is too much for people who is good with the default 160M.  Especially
for virtual machine with less memory (but > 4G)

To make the process less fragile maybe we can remove the 896M limitation
and only try <4G then go to high.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-25 11:12                                         ` Dave Young
@ 2019-02-25 11:30                                           ` Borislav Petkov
  2019-03-01  3:04                                             ` Pingfan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2019-02-25 11:30 UTC (permalink / raw)
  To: Dave Young
  Cc: Joerg Roedel, bhe, Jerry Hoemann, x86, Randy Dunlap, kexec,
	linux-kernel, Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai,
	vgoyal, iommu, konrad.wilk

On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> If we move to high as default, it will allocate 160M high + 256M low. It

We won't move to high by default - we will *fall* back to high if the
default allocation fails.

> To make the process less fragile maybe we can remove the 896M limitation
> and only try <4G then go to high.

Sure, the more robust for the user, the better.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-02-25 11:30                                           ` Borislav Petkov
@ 2019-03-01  3:04                                             ` Pingfan Liu
  2019-03-01  3:19                                               ` Pingfan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Pingfan Liu @ 2019-03-01  3:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, Joerg Roedel, Baoquan He, Jerry Hoemann, x86,
	Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
	Yinghai Lu, vgoyal, iommu, konrad.wilk

Hi Borislav,

Do you think the following patch is good at present?
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 81f9d23..9213073 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -460,7 +460,7 @@ static void __init
memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_LOW_MAX    (512 << 20)
 # define CRASH_ADDR_HIGH_MAX   (512 << 20)
 #else
-# define CRASH_ADDR_LOW_MAX    (896UL << 20)
+# define CRASH_ADDR_LOW_MAX    (1 << 32)
 # define CRASH_ADDR_HIGH_MAX   MAXMEM
 #endif

For documentation, I will send another patch to improve the description.

Thanks,
Pingfan

On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > If we move to high as default, it will allocate 160M high + 256M low. It
>
> We won't move to high by default - we will *fall* back to high if the
> default allocation fails.
>
> > To make the process less fragile maybe we can remove the 896M limitation
> > and only try <4G then go to high.
>
> Sure, the more robust for the user, the better.
>
> --
> Regards/Gruss,
>     Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-03-01  3:04                                             ` Pingfan Liu
@ 2019-03-01  3:19                                               ` Pingfan Liu
  2019-03-22  8:22                                                 ` Dave Young
  0 siblings, 1 reply; 45+ messages in thread
From: Pingfan Liu @ 2019-03-01  3:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Young, Joerg Roedel, Baoquan He, Jerry Hoemann, x86,
	Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
	Yinghai Lu, vgoyal, iommu, konrad.wilk

On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> Hi Borislav,
>
> Do you think the following patch is good at present?
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 81f9d23..9213073 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -460,7 +460,7 @@ static void __init
> memblock_x86_reserve_range_setup_data(void)
>  # define CRASH_ADDR_LOW_MAX    (512 << 20)
>  # define CRASH_ADDR_HIGH_MAX   (512 << 20)
>  #else
> -# define CRASH_ADDR_LOW_MAX    (896UL << 20)
> +# define CRASH_ADDR_LOW_MAX    (1 << 32)
>  # define CRASH_ADDR_HIGH_MAX   MAXMEM
>  #endif
>
Or patch lools like:
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..ed0def5 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -459,7 +459,7 @@ static void __init
memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_LOW_MAX    (512 << 20)
 # define CRASH_ADDR_HIGH_MAX   (512 << 20)
 #else
-# define CRASH_ADDR_LOW_MAX    (896UL << 20)
+# define CRASH_ADDR_LOW_MAX    (1 << 32)
 # define CRASH_ADDR_HIGH_MAX   MAXMEM
 #endif

@@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
                                                    high ? CRASH_ADDR_HIGH_MAX
                                                         : CRASH_ADDR_LOW_MAX,
                                                    crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+               /*
+                * crashkernel=X reserve below 4G fails? Try MAXMEM
+                */
+               if (!high && !crash_base)
+                       crash_base = memblock_find_in_range(CRASH_ALIGN,
+                                               CRASH_ADDR_HIGH_MAX,
+                                               crash_size, CRASH_ALIGN);
+#endif

which tries 0-4G, the fall back to 4G above

> For documentation, I will send another patch to improve the description.
>
> Thanks,
> Pingfan
>
> On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > > If we move to high as default, it will allocate 160M high + 256M low. It
> >
> > We won't move to high by default - we will *fall* back to high if the
> > default allocation fails.
> >
> > > To make the process less fragile maybe we can remove the 896M limitation
> > > and only try <4G then go to high.
> >
> > Sure, the more robust for the user, the better.
> >
> > --
> > Regards/Gruss,
> >     Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-03-01  3:19                                               ` Pingfan Liu
@ 2019-03-22  8:22                                                 ` Dave Young
  0 siblings, 0 replies; 45+ messages in thread
From: Dave Young @ 2019-03-22  8:22 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Borislav Petkov, Joerg Roedel, Baoquan He, Jerry Hoemann, x86,
	Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
	Yinghai Lu, vgoyal, iommu, konrad.wilk

Hi Pingfan, 

Thanks for the effort,
On 03/01/19 at 11:19am, Pingfan Liu wrote:
> On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > Hi Borislav,
> >
> > Do you think the following patch is good at present?
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 81f9d23..9213073 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -460,7 +460,7 @@ static void __init
> > memblock_x86_reserve_range_setup_data(void)
> >  # define CRASH_ADDR_LOW_MAX    (512 << 20)
> >  # define CRASH_ADDR_HIGH_MAX   (512 << 20)
> >  #else
> > -# define CRASH_ADDR_LOW_MAX    (896UL << 20)
> > +# define CRASH_ADDR_LOW_MAX    (1 << 32)
> >  # define CRASH_ADDR_HIGH_MAX   MAXMEM
> >  #endif
> >
> Or patch lools like:
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..ed0def5 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -459,7 +459,7 @@ static void __init
> memblock_x86_reserve_range_setup_data(void)
>  # define CRASH_ADDR_LOW_MAX    (512 << 20)
>  # define CRASH_ADDR_HIGH_MAX   (512 << 20)
>  #else
> -# define CRASH_ADDR_LOW_MAX    (896UL << 20)
> +# define CRASH_ADDR_LOW_MAX    (1 << 32)
>  # define CRASH_ADDR_HIGH_MAX   MAXMEM
>  #endif
> 
> @@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
>                                                     high ? CRASH_ADDR_HIGH_MAX
>                                                          : CRASH_ADDR_LOW_MAX,
>                                                     crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> +               /*
> +                * crashkernel=X reserve below 4G fails? Try MAXMEM
> +                */
> +               if (!high && !crash_base)
> +                       crash_base = memblock_find_in_range(CRASH_ALIGN,
> +                                               CRASH_ADDR_HIGH_MAX,
> +                                               crash_size, CRASH_ALIGN);
> +#endif
> 
> which tries 0-4G, the fall back to 4G above

This way looks good to me, I will do some testing with old kexec-tools,
Once testing done I can take up this again and repost later with some documentation
update.  Also will split to 2 patches  one to drop the old limitation,
another for the fallback.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-28 10:18         ` Borislav Petkov
@ 2019-06-07 17:30           ` Borislav Petkov
  2019-06-10  6:51             ` Dave Young
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2019-06-07 17:30 UTC (permalink / raw)
  To: Dave Young, Pingfan Liu
  Cc: kexec, Baoquan He, Andrew Morton, Mike Rapoport, yinghai, vgoyal,
	Randy Dunlap, x86, linux-kernel

On Mon, Jan 28, 2019 at 11:18:31AM +0100, Borislav Petkov wrote:
> On Mon, Jan 28, 2019 at 05:58:09PM +0800, Dave Young wrote:
> > Another reason is in case ,high we will need automatically reserve a
> > region in low area for swiotlb.  So for example one use
> > crashkernel=256M,high,  actual reserved memory is 256M above 4G and
> > another 256M under 4G for swiotlb.  Normally it is not necessary for
> > most people.  Thus we can not make ,high as default.
> 
> And how is the poor user to figure out that we decided for her/him that
> swiotlb reservation is something not necessary for most people and thus
> we fail the crashkernel= reservation?
> 
> IOW, that "logic" above doesn't make a whole lot of sense to me from
> user friendliness perspective.

So to show what I mean: I'm trying to reserve a crash kernel region on a
box here. I tried:

crashkernel=64M@16M

as it is stated in Documentation/kdump/kdump.txt.

Box said:

[    0.000000] crashkernel reservation failed - memory is in use.

Oh great.

Then I tried:

crashkernel=64M@64M

Box said:

[    0.000000] crashkernel reservation failed - memory is in use.

So I simply did:

crashkernel=64M

and the box said:

[    0.000000] Reserving 64MB of memory at 3392MB for crashkernel (System RAM: 16271MB)

So I could've gone a long time poking at the memory to find a suitable
address.

So do you see what I mean with making this as user-friendly and as
robust as possible?

In this case I don't care about *where* my crash kernel is - I only want
to have one loaded *somewhere*.

And the same strategy should be applied to other reservation attempts
- we should try hard to reserve and if we cannot reserve, then try an
alternating range.

I even think that

crashkernel=X@Y

should not simply fail if Y is occupied but keep trying and say

[    0.000000] Reserving 64MB of memory at alternative address 3392MB for crashkernel (System RAM: 16271MB)

and only fail when the user doesn't really want the kernel to try hard
by booting with

crashkernel=X@Y,strict

But that's for another day.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-06-07 17:30           ` Borislav Petkov
@ 2019-06-10  6:51             ` Dave Young
  0 siblings, 0 replies; 45+ messages in thread
From: Dave Young @ 2019-06-10  6:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pingfan Liu, kexec, Baoquan He, Andrew Morton, Mike Rapoport,
	yinghai, vgoyal, Randy Dunlap, x86, linux-kernel

On 06/07/19 at 07:30pm, Borislav Petkov wrote:
> On Mon, Jan 28, 2019 at 11:18:31AM +0100, Borislav Petkov wrote:
> > On Mon, Jan 28, 2019 at 05:58:09PM +0800, Dave Young wrote:
> > > Another reason is in case ,high we will need automatically reserve a
> > > region in low area for swiotlb.  So for example one use
> > > crashkernel=256M,high,  actual reserved memory is 256M above 4G and
> > > another 256M under 4G for swiotlb.  Normally it is not necessary for
> > > most people.  Thus we can not make ,high as default.
> > 
> > And how is the poor user to figure out that we decided for her/him that
> > swiotlb reservation is something not necessary for most people and thus
> > we fail the crashkernel= reservation?
> > 
> > IOW, that "logic" above doesn't make a whole lot of sense to me from
> > user friendliness perspective.
> 
> So to show what I mean: I'm trying to reserve a crash kernel region on a
> box here. I tried:
> 
> crashkernel=64M@16M
> 
> as it is stated in Documentation/kdump/kdump.txt.
> 
> Box said:
> 
> [    0.000000] crashkernel reservation failed - memory is in use.
> 
> Oh great.
> 
> Then I tried:
> 
> crashkernel=64M@64M
> 
> Box said:
> 
> [    0.000000] crashkernel reservation failed - memory is in use.
> 
> So I simply did:
> 
> crashkernel=64M
> 
> and the box said:
> 
> [    0.000000] Reserving 64MB of memory at 3392MB for crashkernel (System RAM: 16271MB)
> 
> So I could've gone a long time poking at the memory to find a suitable
> address.
> 
> So do you see what I mean with making this as user-friendly and as
> robust as possible?

Yes, it is clear to me, I absolutely agree that is not friendly :)

Previously without KASLR, one can check /proc/iomem to find a possible
free area and use it for next and future boot.  But in case KASLR
enabled nowadays it become harder to predict the persistent free areas.

> 
> In this case I don't care about *where* my crash kernel is - I only want
> to have one loaded *somewhere*.

We would suggest people to use crashkernel=X instead.  for the X@Y
I believe it is some historic thing, it *should* be able to be obsolete
at least on X86, (not sure other arches).
I expect people can comment if they have some use cases requiring this
X@Y way. 

We have modified the crashkernel=X to search 0 - 4G memory instead
of old 0 - 896M for low memory areas, so a possible case is people who
uses very old kexec-tools which can only load kernel to memory under
896M.

Another way is we just obsolete X@Y, but introduce another interface
like crahskernel=X,max=  (max will be used like the CRASH_ADDR_HIGH_MAX
in arch/x86/kernel/setup.c)

> 
> And the same strategy should be applied to other reservation attempts
> - we should try hard to reserve and if we cannot reserve, then try an
> alternating range.
> 
> I even think that
> 
> crashkernel=X@Y
> 
> should not simply fail if Y is occupied but keep trying and say
> 
> [    0.000000] Reserving 64MB of memory at alternative address 3392MB for crashkernel (System RAM: 16271MB)
> 
> and only fail when the user doesn't really want the kernel to try hard
> by booting with
> 
> crashkernel=X@Y,strict
> 
> But that's for another day.

Maybe X@Y,max=..  Then kernel will search begin with Y, and stop until
max - 1;

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-19  1:25 ` Jerry Hoemann
@ 2019-01-21  5:11   ` Pingfan Liu
  0 siblings, 0 replies; 45+ messages in thread
From: Pingfan Liu @ 2019-01-21  5:11 UTC (permalink / raw)
  To: Jerry.Hoemann
  Cc: kexec, Baoquan He, Yinghai Lu, Randy Dunlap, LKML, Mike Rapoport,
	Andrew Morton, Dave Young, vgoyal

On Sat, Jan 19, 2019 at 9:25 AM Jerry Hoemann <jerry.hoemann@hpe.com> wrote:
>
> On Tue, Jan 15, 2019 at 04:07:03PM +0800, Pingfan Liu wrote:
> > People reported a bug on a high end server with many pcie devices, where
> > kernel bootup with crashkernel=384M, and kaslr is enabled. Even
> > though we still see much memory under 896 MB, the finding still failed
> > intermittently. Because currently we can only find region under 896 MB,
> > if without ',high' specified. Then KASLR breaks 896 MB into several parts
> > randomly, and crashkernel reservation need be aligned to 128 MB, that's
> > why failure is found. It raises confusion to the end user that sometimes
> > crashkernel=X works while sometimes fails.
> > If want to make it succeed, customer can change kernel option to
> > "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very
> > limited space to behave even though its grammar looks more generic.
> > And we can't answer questions raised from customer that confidently:
> > 1) why it doesn't succeed to reserve 896 MB;
> > 2) what's wrong with memory region under 4G;
> > 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
> > This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
> > finally above 4G.
>
> While allocating crashkernel from below 4G seems fine, won't we have
> problems if the crash kernel gets allocated above 4G because of the SWIOTLB?
>
It will reserve extra memory below 4G for the swiotlb purpose. You can
find the logic in reserve_crashkernel_low()
And testing with crashkernel=512M@4G, we will get:
cat /proc/iomem  | grep Crash
  aa000000-b9ffffff : Crash kernel
  100000000-11fffffff : Crash kernel

Thanks,
Pingfan

> thanks
>
>
> > Dave Young sent the original post, and I just re-post it with commit log
> > improvement as his requirement.
> > http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> > There was an old discussion below (previously posted by Chao Wang):
> > https://lkml.org/lkml/2013/10/15/601
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Dave Young <dyoung@redhat.com>
> > Cc: Baoquan He <bhe@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> > Cc: yinghai@kernel.org,
> > Cc: vgoyal@redhat.com
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > ---
> > v6 -> v7: fix spelling mistake pointed out by Randy
> >  arch/x86/kernel/setup.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 3d872a5..fa62c81 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
> >                                                   high ? CRASH_ADDR_HIGH_MAX
> >                                                        : CRASH_ADDR_LOW_MAX,
> >                                                   crash_size, CRASH_ALIGN);
> > +#ifdef CONFIG_X86_64
> > +             /*
> > +              * crashkernel=X reserve below 896M fails? Try below 4G
> > +              */
> > +             if (!high && !crash_base)
> > +                     crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +                                             (1ULL << 32),
> > +                                             crash_size, CRASH_ALIGN);
> > +             /*
> > +              * crashkernel=X reserve below 4G fails? Try MAXMEM
> > +              */
> > +             if (!high && !crash_base)
> > +                     crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +                                             CRASH_ADDR_HIGH_MAX,
> > +                                             crash_size, CRASH_ALIGN);
> > +#endif
> >               if (!crash_base) {
> >                       pr_info("crashkernel reservation failed - No suitable area found.\n");
> >                       return;
> > --
> > 2.7.4
> >
> >
> > _______________________________________________
> > kexec mailing list
> > kexec@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec
>
> --
>
> -----------------------------------------------------------------------------
> Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
> -----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-15  8:07 Pingfan Liu
  2019-01-18  3:43 ` Dave Young
@ 2019-01-19  1:25 ` Jerry Hoemann
  2019-01-21  5:11   ` Pingfan Liu
  1 sibling, 1 reply; 45+ messages in thread
From: Jerry Hoemann @ 2019-01-19  1:25 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: kexec, Baoquan He, yinghai, Randy Dunlap, linux-kernel,
	Mike Rapoport, Andrew Morton, Dave Young, vgoyal

On Tue, Jan 15, 2019 at 04:07:03PM +0800, Pingfan Liu wrote:
> People reported a bug on a high end server with many pcie devices, where
> kernel bootup with crashkernel=384M, and kaslr is enabled. Even
> though we still see much memory under 896 MB, the finding still failed
> intermittently. Because currently we can only find region under 896 MB,
> if without ',high' specified. Then KASLR breaks 896 MB into several parts
> randomly, and crashkernel reservation need be aligned to 128 MB, that's
> why failure is found. It raises confusion to the end user that sometimes
> crashkernel=X works while sometimes fails.
> If want to make it succeed, customer can change kernel option to
> "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very
> limited space to behave even though its grammar looks more generic.
> And we can't answer questions raised from customer that confidently:
> 1) why it doesn't succeed to reserve 896 MB;
> 2) what's wrong with memory region under 4G;
> 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
> This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
> finally above 4G.

While allocating crashkernel from below 4G seems fine, won't we have
problems if the crash kernel gets allocated above 4G because of the SWIOTLB?

thanks


> Dave Young sent the original post, and I just re-post it with commit log
> improvement as his requirement.
> http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> There was an old discussion below (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: yinghai@kernel.org,
> Cc: vgoyal@redhat.com
> Cc: Randy Dunlap <rdunlap@infradead.org>
> ---
> v6 -> v7: fix spelling mistake pointed out by Randy
>  arch/x86/kernel/setup.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..fa62c81 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
>  						    high ? CRASH_ADDR_HIGH_MAX
>  							 : CRASH_ADDR_LOW_MAX,
>  						    crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> +		/*
> +		 * crashkernel=X reserve below 896M fails? Try below 4G
> +		 */
> +		if (!high && !crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						(1ULL << 32),
> +						crash_size, CRASH_ALIGN);
> +		/*
> +		 * crashkernel=X reserve below 4G fails? Try MAXMEM
> +		 */
> +		if (!high && !crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						CRASH_ADDR_HIGH_MAX,
> +						crash_size, CRASH_ALIGN);
> +#endif
>  		if (!crash_base) {
>  			pr_info("crashkernel reservation failed - No suitable area found.\n");
>  			return;
> -- 
> 2.7.4
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
  2019-01-15  8:07 Pingfan Liu
@ 2019-01-18  3:43 ` Dave Young
  2019-01-19  1:25 ` Jerry Hoemann
  1 sibling, 0 replies; 45+ messages in thread
From: Dave Young @ 2019-01-18  3:43 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: kexec, linux-kernel, Baoquan He, Andrew Morton, Mike Rapoport,
	yinghai, vgoyal, Randy Dunlap, Borislav Petkov, x86

Pingfan, thanks for the post.

On 01/15/19 at 04:07pm, Pingfan Liu wrote:
> People reported a bug on a high end server with many pcie devices, where
> kernel bootup with crashkernel=384M, and kaslr is enabled. Even
> though we still see much memory under 896 MB, the finding still failed
> intermittently. Because currently we can only find region under 896 MB,
> if without ',high' specified. Then KASLR breaks 896 MB into several parts
> randomly, and crashkernel reservation need be aligned to 128 MB, that's
> why failure is found. It raises confusion to the end user that sometimes
> crashkernel=X works while sometimes fails.
> If want to make it succeed, customer can change kernel option to
> "crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very
> limited space to behave even though its grammar looks more generic.
> And we can't answer questions raised from customer that confidently:
> 1) why it doesn't succeed to reserve 896 MB;
> 2) what's wrong with memory region under 4G;
> 3) why I have to add ',high', I only require 384 MB, not 3840 MB.
> This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
> finally above 4G.

The patch log still looks not very good.  It needs some cleanup like
paragraph line breaks to make it more readable.

For example you can take like below:
--
People reported crashkernel=384M reservation failed on a high end server
with KASLR enabled.  In that case there is enough free memory under 896M
but crashkernel reservation still fails intermittently.

The situation is crashkernel reservation code only finds free region under
896 MB with 128M aligned in case no ',high' being used.  And KASLR could
break the first 896M into several parts randomly thus the failure happens.
User has no way to predict and make sure crashkernel=xM working unless
he/she use 'crashkernel=xM,high'.  Since 'crashkernel=xM' is the most
common use case this issue is a serious bug.

And we can't answer questions raised from customer:
1) why it doesn't succeed to reserve 896 MB;
2) what's wrong with memory region under 4G;
3) why I have to add ',high', I only require 384 MB, not 3840 MB.

This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
finally above 4G.

> Dave Young sent the original post, and I just re-post it with commit log
> improvement as his requirement.
> http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> There was an old discussion below (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601

I hope someone else can provide review because I posted it previously.

But I think previously when I posted it is a good to have improvement,
but now it is a real serious bug which need to be fixed.  I can review
and ack if you can repost with a better log.

> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: yinghai@kernel.org,
> Cc: vgoyal@redhat.com
> Cc: Randy Dunlap <rdunlap@infradead.org>
> ---
> v6 -> v7: fix spelling mistake pointed out by Randy
>  arch/x86/kernel/setup.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..fa62c81 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
>  						    high ? CRASH_ADDR_HIGH_MAX
>  							 : CRASH_ADDR_LOW_MAX,
>  						    crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> +		/*
> +		 * crashkernel=X reserve below 896M fails? Try below 4G
> +		 */
> +		if (!high && !crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						(1ULL << 32),
> +						crash_size, CRASH_ALIGN);
> +		/*
> +		 * crashkernel=X reserve below 4G fails? Try MAXMEM
> +		 */
> +		if (!high && !crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						CRASH_ADDR_HIGH_MAX,
> +						crash_size, CRASH_ALIGN);
> +#endif
>  		if (!crash_base) {
>  			pr_info("crashkernel reservation failed - No suitable area found.\n");
>  			return;
> -- 
> 2.7.4
> 

Thanks
Dave

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
@ 2019-01-15  8:07 Pingfan Liu
  2019-01-18  3:43 ` Dave Young
  2019-01-19  1:25 ` Jerry Hoemann
  0 siblings, 2 replies; 45+ messages in thread
From: Pingfan Liu @ 2019-01-15  8:07 UTC (permalink / raw)
  To: kexec
  Cc: linux-kernel, Pingfan Liu, Dave Young, Baoquan He, Andrew Morton,
	Mike Rapoport, yinghai, vgoyal, Randy Dunlap

People reported a bug on a high end server with many pcie devices, where
kernel bootup with crashkernel=384M, and kaslr is enabled. Even
though we still see much memory under 896 MB, the finding still failed
intermittently. Because currently we can only find region under 896 MB,
if without ',high' specified. Then KASLR breaks 896 MB into several parts
randomly, and crashkernel reservation need be aligned to 128 MB, that's
why failure is found. It raises confusion to the end user that sometimes
crashkernel=X works while sometimes fails.
If want to make it succeed, customer can change kernel option to
"crashkernel=384M,high". Just this give "crashkernel=xx@yy" a very
limited space to behave even though its grammar looks more generic.
And we can't answer questions raised from customer that confidently:
1) why it doesn't succeed to reserve 896 MB;
2) what's wrong with memory region under 4G;
3) why I have to add ',high', I only require 384 MB, not 3840 MB.
This patch tries to get memory region from 896 MB firstly, then [896MB,4G],
finally above 4G.
Dave Young sent the original post, and I just re-post it with commit log
improvement as his requirement.
http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
There was an old discussion below (previously posted by Chao Wang):
https://lkml.org/lkml/2013/10/15/601

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: yinghai@kernel.org,
Cc: vgoyal@redhat.com
Cc: Randy Dunlap <rdunlap@infradead.org>
---
v6 -> v7: fix spelling mistake pointed out by Randy
 arch/x86/kernel/setup.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..fa62c81 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -551,6 +551,22 @@ static void __init reserve_crashkernel(void)
 						    high ? CRASH_ADDR_HIGH_MAX
 							 : CRASH_ADDR_LOW_MAX,
 						    crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+		/*
+		 * crashkernel=X reserve below 896M fails? Try below 4G
+		 */
+		if (!high && !crash_base)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						(1ULL << 32),
+						crash_size, CRASH_ALIGN);
+		/*
+		 * crashkernel=X reserve below 4G fails? Try MAXMEM
+		 */
+		if (!high && !crash_base)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						CRASH_ADDR_HIGH_MAX,
+						crash_size, CRASH_ALIGN);
+#endif
 		if (!crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2019-06-10  6:51 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-21  5:16 [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr Pingfan Liu
2019-01-21  6:24 ` Baoquan He
2019-01-25 10:39 ` Borislav Petkov
2019-01-25 13:45   ` Dave Young
2019-01-25 14:08     ` Borislav Petkov
2019-01-28  9:58       ` Dave Young
2019-01-28 10:18         ` Borislav Petkov
2019-06-07 17:30           ` Borislav Petkov
2019-06-10  6:51             ` Dave Young
2019-01-29  5:25       ` Pingfan Liu
2019-01-31  7:42         ` Dave Young
2019-01-31  7:59       ` Dave Young
2019-01-31 10:57         ` Borislav Petkov
2019-01-31 22:27           ` Jerry Hoemann
2019-01-31 23:47             ` Borislav Petkov
2019-02-04 22:30               ` Jerry Hoemann
2019-02-05  8:15                 ` Borislav Petkov
2019-02-06 12:08                   ` Dave Young
2019-02-11 20:48                     ` Dave Young
2019-02-12  5:35                       ` Pingfan Liu
2019-02-15 10:24                       ` Borislav Petkov
2019-02-18  1:48                         ` Dave Young
2019-02-20  7:38                           ` Pingfan Liu
2019-02-20  8:32                           ` Borislav Petkov
2019-02-20  9:41                             ` Dave Young
2019-02-20 12:51                               ` Pingfan Liu
2019-02-21 17:13                               ` Borislav Petkov
2019-02-22  2:11                                 ` Dave Young
2019-02-22  8:42                                   ` Joerg Roedel
2019-02-22 13:00                                     ` Borislav Petkov
2019-02-24 13:25                                       ` Pingfan Liu
2019-02-25  1:53                                         ` Dave Young
2019-02-25  9:39                                         ` Borislav Petkov
2019-02-25 11:00                                       ` Joerg Roedel
2019-02-25 11:12                                         ` Dave Young
2019-02-25 11:30                                           ` Borislav Petkov
2019-03-01  3:04                                             ` Pingfan Liu
2019-03-01  3:19                                               ` Pingfan Liu
2019-03-22  8:22                                                 ` Dave Young
2019-01-29  5:51   ` Pingfan Liu
2019-01-31 10:50     ` Borislav Petkov
  -- strict thread matches above, loose matches on Subject: below --
2019-01-15  8:07 Pingfan Liu
2019-01-18  3:43 ` Dave Young
2019-01-19  1:25 ` Jerry Hoemann
2019-01-21  5:11   ` Pingfan Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).