[v3,2/2] x86/kdump: Reserve extra memory when SME or SEV is active
diff mbox series

Message ID 20190910151341.14986-3-kasong@redhat.com
State New, archived
Headers show
Series
  • x86/kdump: Reserve extra memory when SME or SEV is active
Related show

Commit Message

Kairui Song Sept. 10, 2019, 3:13 p.m. UTC
Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
SWIOTLB will be enabled even if there is less than 4G of memory when SME
is active, to support DMA of devices that not support address with the
encrypt bit.

And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.

Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
encryption") will always force SWIOTLB to be enabled when SEV is active
in all cases.

Now, when either SME or SEV is active, SWIOTLB will be force enabled,
and this is also true for kdump kernel. As a result kdump kernel will
run out of already scarce pre-reserved memory easily.

So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
is specified or any offset is used. As for the high reservation case, an
extra low memory region will always be reserved and that is enough for
SWIOTLB. Else if the offset format is used, user should be fully aware
of any possible kdump kernel memory requirement and have to organize the
memory usage carefully.

Signed-off-by: Kairui Song <kasong@redhat.com>
---
 arch/x86/kernel/setup.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

Comments

Ingo Molnar Sept. 11, 2019, 5:56 a.m. UTC | #1
* Kairui Song <kasong@redhat.com> wrote:

> Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> SWIOTLB will be enabled even if there is less than 4G of memory when SME
> is active, to support DMA of devices that not support address with the
> encrypt bit.
> 
> And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> 
> Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> encryption") will always force SWIOTLB to be enabled when SEV is active
> in all cases.
> 
> Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> and this is also true for kdump kernel. As a result kdump kernel will
> run out of already scarce pre-reserved memory easily.
> 
> So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> is specified or any offset is used. As for the high reservation case, an
> extra low memory region will always be reserved and that is enough for
> SWIOTLB. Else if the offset format is used, user should be fully aware
> of any possible kdump kernel memory requirement and have to organize the
> memory usage carefully.
> 
> Signed-off-by: Kairui Song <kasong@redhat.com>
> ---
>  arch/x86/kernel/setup.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 71f20bb18cb0..ee6a2f1e2226 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
>  					  unsigned long long *crash_size,
>  					  bool high)
>  {
> -	unsigned long long base, size;
> +	unsigned long long base, size, mem_enc_req = 0;
>  
>  	base = *crash_base;
>  	size = *crash_size;
> @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
>  	if (high)
>  		goto high_reserve;
>  
> +	/*
> +	 * When SME/SEV is active and not using high reserve,
> +	 * it will always required an extra SWIOTLB region.
> +	 */
> +	if (mem_encrypt_active())
> +		mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> +
>  	base = memblock_find_in_range(CRASH_ALIGN,
> -				      CRASH_ADDR_LOW_MAX, size,
> +				      CRASH_ADDR_LOW_MAX,
> +				      size + mem_enc_req,
>  				      CRASH_ALIGN);

What sizes are we talking about here?

- What is the possible size range of swiotlb_size_or_default()

- What is the size of CRASH_ADDR_LOW_MAX (the old limit)?

- Why do we replace one fixed limit with another fixed limit instead of 
  accurately sizing the area, with each required feature adding its own 
  requirement to the reservation size?

I.e. please engineer this into a proper solution instead of just 
modifying it around the edges.

For example have you considered adding some sort of 
kdump_memory_reserve(size) facility, which increases the reservation size 
as something like SWIOTLB gets activated? That would avoid the ugly 
mem_encrypt_active() flag, it would just automagically work.

Thanks,

	Ingo
Dave Young Sept. 18, 2019, 7:55 a.m. UTC | #2
On 09/12/19 at 12:23am, Kairui Song wrote:
> On Wednesday, September 11, 2019, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Kairui Song <kasong@redhat.com> wrote:
> >
> >> Since commit c7753208a94c ("x86, swiotlb: Add memory encryption
> support"),
> >> SWIOTLB will be enabled even if there is less than 4G of memory when SME
> >> is active, to support DMA of devices that not support address with the
> >> encrypt bit.
> >>
> >> And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> >> active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> >>
> >> Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> >> encryption") will always force SWIOTLB to be enabled when SEV is active
> >> in all cases.
> >>
> >> Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> >> and this is also true for kdump kernel. As a result kdump kernel will
> >> run out of already scarce pre-reserved memory easily.
> >>
> >> So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> >> kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> >> is specified or any offset is used. As for the high reservation case, an
> >> extra low memory region will always be reserved and that is enough for
> >> SWIOTLB. Else if the offset format is used, user should be fully aware
> >> of any possible kdump kernel memory requirement and have to organize the
> >> memory usage carefully.
> >>
> >> Signed-off-by: Kairui Song <kasong@redhat.com>
> >> ---
> >>  arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> >>  1 file changed, 17 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >> index 71f20bb18cb0..ee6a2f1e2226 100644
> >> --- a/arch/x86/kernel/setup.c
> >> +++ b/arch/x86/kernel/setup.c
> >> @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned
> long long *crash_base,
> >>                                         unsigned long long *crash_size,
> >>                                         bool high)
> >>  {
> >> -     unsigned long long base, size;
> >> +     unsigned long long base, size, mem_enc_req = 0;
> >>
> >>       base = *crash_base;
> >>       size = *crash_size;
> >> @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned
> long long *crash_base,
> >>       if (high)
> >>               goto high_reserve;
> >>
> >> +     /*
> >> +      * When SME/SEV is active and not using high reserve,
> >> +      * it will always required an extra SWIOTLB region.
> >> +      */
> >> +     if (mem_encrypt_active())
> >> +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> >> +
> >>       base = memblock_find_in_range(CRASH_ALIGN,
> >> -                                   CRASH_ADDR_LOW_MAX, size,
> >> +                                   CRASH_ADDR_LOW_MAX,
> >> +                                   size + mem_enc_req,
> >>                                     CRASH_ALIGN);
> >
> > What sizes are we talking about here?
> >
> > - What is the possible size range of swiotlb_size_or_default()
> >
> > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> >
> > - Why do we replace one fixed limit with another fixed limit instead of
> >   accurately sizing the area, with each required feature adding its own
> >   requirement to the reservation size?
> >
> > I.e. please engineer this into a proper solution instead of just
> > modifying it around the edges.
> >
> > For example have you considered adding some sort of
> > kdump_memory_reserve(size) facility, which increases the reservation size
> > as something like SWIOTLB gets activated? That would avoid the ugly
> > mem_encrypt_active() flag, it would just automagically work.
> 
> Hi, thanks for the suggestions, actually I did try to workout a better
> resolution, at least for SWIOTLB and crashkernel memory, like make
> crashkernel reserve more memory as SWIOTLB get activated to be more
> flexible and generic.
> 
> There are some problems:
> 
> - Usually, for SWIOTLB, even if the first booting kernel have SWIOTLB
> activated, second kernel will most likely not need it. Currently, only the
> high reservation case will still need SWIOTLB in second kernel, and it's
> already reserving extra low memory automatically. SME/SEV systems is the
> only other special case that will force both kernel to use it.
> 
> - There is a little complex procedure to judge whether SWIOTLB is required
> (Depends on whether the system have >4G, if there is an iommu want to shut
> down SWIOTLB, and some times iommu still expect SWIOTLB to exist, and
> SWIOTLB could be activated first, then got closed later etc...). The crash
> kernel reserve should happen very early to ensure the region is usable, but
> kernel is not aware of if SWIOTLB is needed. Have to either move the
> reservation to some later stage or always reserve extra memories early,
> then release if not needed later. Neither sound a good solution, and after
> all, as mentioned above, currently kernel need it doesn't mean kdump kernel
> needs it.
> 
> - Also tried to reuse and improve the currently crashk_low_res facility to
> reserve the memory in a different block at first, make the code simpler.
> Didn't work well due to some other issue with all current version of user
> space kexec-tools, which never expect there will be an extra memory region
> for non-high reservation case, and failed to load the kernel for kdump.
> Have to always make the another block in a lower position or rename the
> memory resource. I think this will either break user space tools heavily,
> or make the implementation even more complex and confusing.

Kairui, I remember you tried to reuse the swiotlb regions across kexec
reboot, are you saying this as the 3rd problem above?

It is not clear how you use crashk_low_res,  can you elaborate it I
remember you mentioned some problem with this approach, maybe we can
re-explore it.

Thanks
Dave
Kairui Song Sept. 18, 2019, 10:17 a.m. UTC | #3
On Wed, Sep 18, 2019 at 3:55 PM Dave Young <dyoung@redhat.com> wrote:
>
> On 09/12/19 at 12:23am, Kairui Song wrote:
> > On Wednesday, September 11, 2019, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > * Kairui Song <kasong@redhat.com> wrote:
> > >
> > >> Since commit c7753208a94c ("x86, swiotlb: Add memory encryption
> > support"),
> > >> SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > >> is active, to support DMA of devices that not support address with the
> > >> encrypt bit.
> > >>
> > >> And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > >> active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > >>
> > >> Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > >> encryption") will always force SWIOTLB to be enabled when SEV is active
> > >> in all cases.
> > >>
> > >> Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > >> and this is also true for kdump kernel. As a result kdump kernel will
> > >> run out of already scarce pre-reserved memory easily.
> > >>
> > >> So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > >> kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > >> is specified or any offset is used. As for the high reservation case, an
> > >> extra low memory region will always be reserved and that is enough for
> > >> SWIOTLB. Else if the offset format is used, user should be fully aware
> > >> of any possible kdump kernel memory requirement and have to organize the
> > >> memory usage carefully.
> > >>
> > >> Signed-off-by: Kairui Song <kasong@redhat.com>
> > >> ---
> > >>  arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > >>  1 file changed, 17 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > >> index 71f20bb18cb0..ee6a2f1e2226 100644
> > >> --- a/arch/x86/kernel/setup.c
> > >> +++ b/arch/x86/kernel/setup.c
> > >> @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned
> > long long *crash_base,
> > >>                                         unsigned long long *crash_size,
> > >>                                         bool high)
> > >>  {
> > >> -     unsigned long long base, size;
> > >> +     unsigned long long base, size, mem_enc_req = 0;
> > >>
> > >>       base = *crash_base;
> > >>       size = *crash_size;
> > >> @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned
> > long long *crash_base,
> > >>       if (high)
> > >>               goto high_reserve;
> > >>
> > >> +     /*
> > >> +      * When SME/SEV is active and not using high reserve,
> > >> +      * it will always required an extra SWIOTLB region.
> > >> +      */
> > >> +     if (mem_encrypt_active())
> > >> +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > >> +
> > >>       base = memblock_find_in_range(CRASH_ALIGN,
> > >> -                                   CRASH_ADDR_LOW_MAX, size,
> > >> +                                   CRASH_ADDR_LOW_MAX,
> > >> +                                   size + mem_enc_req,
> > >>                                     CRASH_ALIGN);
> > >
> > > What sizes are we talking about here?
> > >
> > > - What is the possible size range of swiotlb_size_or_default()
> > >
> > > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> > >
> > > - Why do we replace one fixed limit with another fixed limit instead of
> > >   accurately sizing the area, with each required feature adding its own
> > >   requirement to the reservation size?
> > >
> > > I.e. please engineer this into a proper solution instead of just
> > > modifying it around the edges.
> > >
> > > For example have you considered adding some sort of
> > > kdump_memory_reserve(size) facility, which increases the reservation size
> > > as something like SWIOTLB gets activated? That would avoid the ugly
> > > mem_encrypt_active() flag, it would just automagically work.
> >
> > Hi, thanks for the suggestions, actually I did try to workout a better
> > resolution, at least for SWIOTLB and crashkernel memory, like make
> > crashkernel reserve more memory as SWIOTLB get activated to be more
> > flexible and generic.
> >
> > There are some problems:
> >
> > - Usually, for SWIOTLB, even if the first booting kernel have SWIOTLB
> > activated, second kernel will most likely not need it. Currently, only the
> > high reservation case will still need SWIOTLB in second kernel, and it's
> > already reserving extra low memory automatically. SME/SEV systems is the
> > only other special case that will force both kernel to use it.
> >
> > - There is a little complex procedure to judge whether SWIOTLB is required
> > (Depends on whether the system have >4G, if there is an iommu want to shut
> > down SWIOTLB, and some times iommu still expect SWIOTLB to exist, and
> > SWIOTLB could be activated first, then got closed later etc...). The crash
> > kernel reserve should happen very early to ensure the region is usable, but
> > kernel is not aware of if SWIOTLB is needed. Have to either move the
> > reservation to some later stage or always reserve extra memories early,
> > then release if not needed later. Neither sound a good solution, and after
> > all, as mentioned above, currently kernel need it doesn't mean kdump kernel
> > needs it.
> >
> > - Also tried to reuse and improve the currently crashk_low_res facility to
> > reserve the memory in a different block at first, make the code simpler.
> > Didn't work well due to some other issue with all current version of user
> > space kexec-tools, which never expect there will be an extra memory region
> > for non-high reservation case, and failed to load the kernel for kdump.
> > Have to always make the another block in a lower position or rename the
> > memory resource. I think this will either break user space tools heavily,
> > or make the implementation even more complex and confusing.
>
> Kairui, I remember you tried to reuse the swiotlb regions across kexec
> reboot, are you saying this as the 3rd problem above?
>
> It is not clear how you use crashk_low_res,  can you elaborate it I
> remember you mentioned some problem with this approach, maybe we can
> re-explore it.
>
> Thanks
> Dave

Yes, previously I tried to pass the swiotlb region in 1st kernel to
kdump kernel as usable memory. What I worried is that if there are
still on going DMA, and kdump kernel just used the region as normal
memory, it may cause memory corruption.

There are other ways to make it work properly, maybe pass it as
reserved and build the map on need when doing swiotlb init, or
introduce an earlier swiotlb reserving logic so nothing else will
occupy the region. But still have to find a way to let kdump kernel
know about the reusable swiotlb region, like extend the
swiotlb=<nslab>@<addr> syntax so both userspace kernel kexec can
specify where the kdump kernel should reuse as swiotlb, and have to
expose swiotlb in iomem so userspace kexec-tools can actually find the
range...
But this seems getting a bit trouble some just for solving this
particular problem, not sure if it's acceptable. SME/SEV is currently
the only case that swiotlb will be required in kdump kernel, and reuse
it will force drop the swiotlb content when doing kdump.


About the "reuse crashk_low_res" part, I was trying to simply change
the "if (condition)" of previous low memory reserving logic, so an
extra low memory region for swiotlb will be reserve when SME/SEV is
used, just like high reserve case, and it will be only a few lines of
code change.

But that will make it always reserve at lease 256M more memory when
SME/SEV is active, it doesn't require that much. The previous reason
for always reserving at lease 256M of low memory could be found in
commit 94fb933418228. Then I tried to change the logic a bit to
reserve fewer low memory for SME/SEV case, and encountered the issue I
mentioned, that all current version of user space kexec-tools, never
expect there will be a smaller extra memory region for non-high
reservation case. It always try to load kernel image into the highest
region, it's totally possible that the extra smaller swiotlb region
reserved for SME/SEV case is actually higher. So have to always make
the another block in a lower position, or rename the low memory
resource. This also seems more confusing and complex. So instead I
just posted this patch, at lease it's a straight and clean approach.

--
Best Regards,
Kairui Song
Kairui Song Sept. 25, 2019, 10:36 a.m. UTC | #4
On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@kernel.org> wrote:
> * Kairui Song <kasong@redhat.com> wrote:
>
> > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > is active, to support DMA of devices that not support address with the
> > encrypt bit.
> >
> > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> >
> > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > encryption") will always force SWIOTLB to be enabled when SEV is active
> > in all cases.
> >
> > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > and this is also true for kdump kernel. As a result kdump kernel will
> > run out of already scarce pre-reserved memory easily.
> >
> > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > is specified or any offset is used. As for the high reservation case, an
> > extra low memory region will always be reserved and that is enough for
> > SWIOTLB. Else if the offset format is used, user should be fully aware
> > of any possible kdump kernel memory requirement and have to organize the
> > memory usage carefully.
> >
> > Signed-off-by: Kairui Song <kasong@redhat.com>
> > ---
> >  arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 71f20bb18cb0..ee6a2f1e2226 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> >                                         unsigned long long *crash_size,
> >                                         bool high)
> >  {
> > -     unsigned long long base, size;
> > +     unsigned long long base, size, mem_enc_req = 0;
> >
> >       base = *crash_base;
> >       size = *crash_size;
> > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> >       if (high)
> >               goto high_reserve;
> >
> > +     /*
> > +      * When SME/SEV is active and not using high reserve,
> > +      * it will always required an extra SWIOTLB region.
> > +      */
> > +     if (mem_encrypt_active())
> > +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > +
> >       base = memblock_find_in_range(CRASH_ALIGN,
> > -                                   CRASH_ADDR_LOW_MAX, size,
> > +                                   CRASH_ADDR_LOW_MAX,
> > +                                   size + mem_enc_req,
> >                                     CRASH_ALIGN);
>

Hi Ingo,

I re-read my previous reply, it's long and tedious, let me try to make
a more effective reply:

> What sizes are we talking about here?

The size here is how much memory will be reserved for kdump kernel, to
ensure kdump kernel and userspace can run without OOM.

>
> - What is the possible size range of swiotlb_size_or_default()

swiotlb_size_or_default() returns the swiotlb size, it's specified by
user using swiotlb=<size>, or default size (64MB)

>
> - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?

It's 4G.

>
> - Why do we replace one fixed limit with another fixed limit instead of
>   accurately sizing the area, with each required feature adding its own
>   requirement to the reservation size?

It's quite hard to "accurately sizing the area".

No way to tell the exact amount of memory kdump needs, we can only estimate.
Kdump kernel use different cmdline, drivers and components will have
special handling for kdump, and userspace is totally different.

>
> I.e. please engineer this into a proper solution instead of just
> modifying it around the edges.
>
> For example have you considered adding some sort of
> kdump_memory_reserve(size) facility, which increases the reservation size
> as something like SWIOTLB gets activated? That would avoid the ugly
> mem_encrypt_active() flag, it would just automagically work.

My first attempt is increase crashkernel memory as swiotlb is
activated. There are problems.

First, SME/SEV is currently the only case that both kernel require
SWIOTLB, for most other case, it's wasting memory.

If we don't care about the memory waste, it has to check/reserve/free
crashkernel memory at three different points:
1. Early boot:
    - crash kernel reserved a region as usual.
2. Right before memblock freeing memoy:
    - If SWIOTLB is activated, crash kernel should reserve another region.
3. After Initcalls:
    - SWIOTLB may get deactivated by initcalls, so need to do a later
check for if we need to release the later reserved region.
It's more complex.

And about a "kdump_memory_reserve(size)" facility, as talked above,
it's hard to know how much kdump needs for now, also hard to find any
user of this.

Please let me know if I failed to make something clear or have any
misunderstanding.

>
> Thanks,
>
>         Ingo



--
Best Regards,
Kairui Song
Dave Young Sept. 27, 2019, 5:42 a.m. UTC | #5
On 09/25/19 at 06:36pm, Kairui Song wrote:
> On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@kernel.org> wrote:
> > * Kairui Song <kasong@redhat.com> wrote:
> >
> > > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > > is active, to support DMA of devices that not support address with the
> > > encrypt bit.
> > >
> > > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > > active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > >
> > > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > > encryption") will always force SWIOTLB to be enabled when SEV is active
> > > in all cases.
> > >
> > > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > > and this is also true for kdump kernel. As a result kdump kernel will
> > > run out of already scarce pre-reserved memory easily.
> > >
> > > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > > kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > > is specified or any offset is used. As for the high reservation case, an
> > > extra low memory region will always be reserved and that is enough for
> > > SWIOTLB. Else if the offset format is used, user should be fully aware
> > > of any possible kdump kernel memory requirement and have to organize the
> > > memory usage carefully.
> > >
> > > Signed-off-by: Kairui Song <kasong@redhat.com>
> > > ---
> > >  arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > >  1 file changed, 17 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > index 71f20bb18cb0..ee6a2f1e2226 100644
> > > --- a/arch/x86/kernel/setup.c
> > > +++ b/arch/x86/kernel/setup.c
> > > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > >                                         unsigned long long *crash_size,
> > >                                         bool high)
> > >  {
> > > -     unsigned long long base, size;
> > > +     unsigned long long base, size, mem_enc_req = 0;
> > >
> > >       base = *crash_base;
> > >       size = *crash_size;
> > > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > >       if (high)
> > >               goto high_reserve;
> > >
> > > +     /*
> > > +      * When SME/SEV is active and not using high reserve,
> > > +      * it will always required an extra SWIOTLB region.
> > > +      */
> > > +     if (mem_encrypt_active())
> > > +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > > +
> > >       base = memblock_find_in_range(CRASH_ALIGN,
> > > -                                   CRASH_ADDR_LOW_MAX, size,
> > > +                                   CRASH_ADDR_LOW_MAX,
> > > +                                   size + mem_enc_req,
> > >                                     CRASH_ALIGN);
> >
> 
> Hi Ingo,
> 
> I re-read my previous reply, it's long and tedious, let me try to make
> a more effective reply:
> 
> > What sizes are we talking about here?
> 
> The size here is how much memory will be reserved for kdump kernel, to
> ensure kdump kernel and userspace can run without OOM.
> 
> >
> > - What is the possible size range of swiotlb_size_or_default()
> 
> swiotlb_size_or_default() returns the swiotlb size, it's specified by
> user using swiotlb=<size>, or default size (64MB)
> 
> >
> > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> 
> It's 4G.
> 
> >
> > - Why do we replace one fixed limit with another fixed limit instead of
> >   accurately sizing the area, with each required feature adding its own
> >   requirement to the reservation size?
> 
> It's quite hard to "accurately sizing the area".
> 
> No way to tell the exact amount of memory kdump needs, we can only estimate.
> Kdump kernel use different cmdline, drivers and components will have
> special handling for kdump, and userspace is totally different.

Agreed about your above, but specific this the problem in this patch
There should be other ways.

First thought about doing generic handling in swiotlb part, and do
something like kdump_memory_reserve(size) Ingo suggested,  but according
to you swiotlb init is late, so it can not increase the size, OTOH if
reserve another region for kdump in swiotlb will cause other issues. 

So let's think about other improvement, for example to see if you can
call kdump_memory_reserve(size) in AMD SME init path, for example in
mem_encrypt_init(), is it before crashkernel reservation? 

If doable it will be at least cleaner than the code in this patch.

Thanks
Dave
Kairui Song Sept. 27, 2019, 7:52 a.m. UTC | #6
On Fri, Sep 27, 2019 at 1:42 PM Dave Young <dyoung@redhat.com> wrote:
>
> On 09/25/19 at 06:36pm, Kairui Song wrote:
> > On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@kernel.org> wrote:
> > > * Kairui Song <kasong@redhat.com> wrote:
> > >
> > > > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > > > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > > > is active, to support DMA of devices that not support address with the
> > > > encrypt bit.
> > > >
> > > > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > > > active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > > >
> > > > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > > > encryption") will always force SWIOTLB to be enabled when SEV is active
> > > > in all cases.
> > > >
> > > > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > > > and this is also true for kdump kernel. As a result kdump kernel will
> > > > run out of already scarce pre-reserved memory easily.
> > > >
> > > > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > > > kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > > > is specified or any offset is used. As for the high reservation case, an
> > > > extra low memory region will always be reserved and that is enough for
> > > > SWIOTLB. Else if the offset format is used, user should be fully aware
> > > > of any possible kdump kernel memory requirement and have to organize the
> > > > memory usage carefully.
> > > >
> > > > Signed-off-by: Kairui Song <kasong@redhat.com>
> > > > ---
> > > >  arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > > >  1 file changed, 17 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > index 71f20bb18cb0..ee6a2f1e2226 100644
> > > > --- a/arch/x86/kernel/setup.c
> > > > +++ b/arch/x86/kernel/setup.c
> > > > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > >                                         unsigned long long *crash_size,
> > > >                                         bool high)
> > > >  {
> > > > -     unsigned long long base, size;
> > > > +     unsigned long long base, size, mem_enc_req = 0;
> > > >
> > > >       base = *crash_base;
> > > >       size = *crash_size;
> > > > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > >       if (high)
> > > >               goto high_reserve;
> > > >
> > > > +     /*
> > > > +      * When SME/SEV is active and not using high reserve,
> > > > +      * it will always required an extra SWIOTLB region.
> > > > +      */
> > > > +     if (mem_encrypt_active())
> > > > +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > > > +
> > > >       base = memblock_find_in_range(CRASH_ALIGN,
> > > > -                                   CRASH_ADDR_LOW_MAX, size,
> > > > +                                   CRASH_ADDR_LOW_MAX,
> > > > +                                   size + mem_enc_req,
> > > >                                     CRASH_ALIGN);
> > >
> >
> > Hi Ingo,
> >
> > I re-read my previous reply, it's long and tedious, let me try to make
> > a more effective reply:
> >
> > > What sizes are we talking about here?
> >
> > The size here is how much memory will be reserved for kdump kernel, to
> > ensure kdump kernel and userspace can run without OOM.
> >
> > >
> > > - What is the possible size range of swiotlb_size_or_default()
> >
> > swiotlb_size_or_default() returns the swiotlb size, it's specified by
> > user using swiotlb=<size>, or default size (64MB)
> >
> > >
> > > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> >
> > It's 4G.
> >
> > >
> > > - Why do we replace one fixed limit with another fixed limit instead of
> > >   accurately sizing the area, with each required feature adding its own
> > >   requirement to the reservation size?
> >
> > It's quite hard to "accurately sizing the area".
> >
> > No way to tell the exact amount of memory kdump needs, we can only estimate.
> > Kdump kernel use different cmdline, drivers and components will have
> > special handling for kdump, and userspace is totally different.
>
> Agreed about your above, but specific this the problem in this patch
> There should be other ways.
>
> First thought about doing generic handling in swiotlb part, and do
> something like kdump_memory_reserve(size) Ingo suggested,  but according
> to you swiotlb init is late, so it can not increase the size, OTOH if
> reserve another region for kdump in swiotlb will cause other issues.
>
> So let's think about other improvement, for example to see if you can
> call kdump_memory_reserve(size) in AMD SME init path, for example in
> mem_encrypt_init(), is it before crashkernel reservation?

Yes, mem_encrypt_init is before the crashkernel reservation.
I think this is a good idea. Will make a V4 patch based on this.

>
> If doable it will be at least cleaner than the code in this patch.
>
> Thanks
> Dave
Kairui Song Oct. 12, 2019, 9:24 a.m. UTC | #7
On 9/27/19 1:42 PM, Dave Young wrote:
> On 09/25/19 at 06:36pm, Kairui Song wrote:
>> On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@kernel.org> wrote:
>>> * Kairui Song <kasong@redhat.com> wrote:
>>>
>>>> Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
>>>> SWIOTLB will be enabled even if there is less than 4G of memory when SME
>>>> is active, to support DMA of devices that not support address with the
>>>> encrypt bit.
>>>>
>>>> And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
>>>> active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
>>>>
>>>> Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
>>>> encryption") will always force SWIOTLB to be enabled when SEV is active
>>>> in all cases.
>>>>
>>>> Now, when either SME or SEV is active, SWIOTLB will be force enabled,
>>>> and this is also true for kdump kernel. As a result kdump kernel will
>>>> run out of already scarce pre-reserved memory easily.
>>>>
>>>> So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
>>>> kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
>>>> is specified or any offset is used. As for the high reservation case, an
>>>> extra low memory region will always be reserved and that is enough for
>>>> SWIOTLB. Else if the offset format is used, user should be fully aware
>>>> of any possible kdump kernel memory requirement and have to organize the
>>>> memory usage carefully.
>>>>
>>>> Signed-off-by: Kairui Song <kasong@redhat.com>
>>>> ---
>>>>   arch/x86/kernel/setup.c | 20 +++++++++++++++++---
>>>>   1 file changed, 17 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>>> index 71f20bb18cb0..ee6a2f1e2226 100644
>>>> --- a/arch/x86/kernel/setup.c
>>>> +++ b/arch/x86/kernel/setup.c
>>>> @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
>>>>                                          unsigned long long *crash_size,
>>>>                                          bool high)
>>>>   {
>>>> -     unsigned long long base, size;
>>>> +     unsigned long long base, size, mem_enc_req = 0;
>>>>
>>>>        base = *crash_base;
>>>>        size = *crash_size;
>>>> @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
>>>>        if (high)
>>>>                goto high_reserve;
>>>>
>>>> +     /*
>>>> +      * When SME/SEV is active and not using high reserve,
>>>> +      * it will always required an extra SWIOTLB region.
>>>> +      */
>>>> +     if (mem_encrypt_active())
>>>> +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
>>>> +
>>>>        base = memblock_find_in_range(CRASH_ALIGN,
>>>> -                                   CRASH_ADDR_LOW_MAX, size,
>>>> +                                   CRASH_ADDR_LOW_MAX,
>>>> +                                   size + mem_enc_req,
>>>>                                      CRASH_ALIGN);
>>>
>>
>> Hi Ingo,
>>
>> I re-read my previous reply, it's long and tedious, let me try to make
>> a more effective reply:
>>
>>> What sizes are we talking about here?
>>
>> The size here is how much memory will be reserved for kdump kernel, to
>> ensure kdump kernel and userspace can run without OOM.
>>
>>>
>>> - What is the possible size range of swiotlb_size_or_default()
>>
>> swiotlb_size_or_default() returns the swiotlb size, it's specified by
>> user using swiotlb=<size>, or default size (64MB)
>>
>>>
>>> - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
>>
>> It's 4G.
>>
>>>
>>> - Why do we replace one fixed limit with another fixed limit instead of
>>>    accurately sizing the area, with each required feature adding its own
>>>    requirement to the reservation size?
>>
>> It's quite hard to "accurately sizing the area".
>>
>> No way to tell the exact amount of memory kdump needs, we can only estimate.
>> Kdump kernel use different cmdline, drivers and components will have
>> special handling for kdump, and userspace is totally different.
> 
> Agreed about your above, but specific this the problem in this patch
> There should be other ways.
> 
> First thought about doing generic handling in swiotlb part, and do
> something like kdump_memory_reserve(size) Ingo suggested,  but according
> to you swiotlb init is late, so it can not increase the size, OTOH if
> reserve another region for kdump in swiotlb will cause other issues.
> 
> So let's think about other improvement, for example to see if you can
> call kdump_memory_reserve(size) in AMD SME init path, for example in
> mem_encrypt_init(), is it before crashkernel reservation?
> 
> If doable it will be at least cleaner than the code in this patch.
> 
> Thanks
> Dave
> 

How about something simple as following code? The logic and new function is as simple as
possible, just always reserve extra low memory when SME/SEV is active, ignore the high/low
reservation case. It will waste some memory with SME and high reservation though.

Was hesitating a lot about this series, one thing I'm thinking is that what is the point
of "crashkernel=" argument, if the crashkernel value could be adjusted according, the value
specified will seems more meanless or confusing...

And currently there isn't anything like crashkernel=auto or anything similiar to let kernel
calculate the value automatically, maybe the admin should be aware of the value or be informed
about the suitable crashkernel value after all?

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index ed8ec011a9fd..7263a237f689 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -44,6 +44,7 @@ void early_platform_quirks(void);
  
  extern unsigned long saved_video_mode;
  
+extern void kdump_need_extra_low_memory(unsigned long size);
  extern void reserve_standard_io_resources(void);
  extern void i386_reserve_resources(void);
  extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 77ea96b794bd..e5888fb8e4bc 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -473,6 +473,13 @@ static void __init memblock_x86_reserve_range_setup_data(void)
  # define CRASH_ADDR_HIGH_MAX	SZ_64T
  #endif
  
+static __initdata unsigned long long crash_low_extra;
+
+void __init kdump_need_extra_low_memory(unsigned long size)
+{
+	crash_low_extra += size;
+}
+
  static int __init reserve_crashkernel_low(void)
  {
  #ifdef CONFIG_X86_64
@@ -501,6 +508,7 @@ static int __init reserve_crashkernel_low(void)
  			return 0;
  	}
  
+	low_size += crash_low_extra;
  	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
  	if (!low_base) {
  		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
@@ -563,8 +571,17 @@ static void __init reserve_crashkernel(void)
  		if (!high)
  			crash_base = memblock_find_in_range(CRASH_ALIGN,
  						CRASH_ADDR_LOW_MAX,
-						crash_size, CRASH_ALIGN);
-		if (!crash_base)
+						crash_size + crash_low_extra,
+						CRASH_ALIGN);
+		/*
+		 * If reserving the crashkernel memory in low region, then also
+		 * include the extra low memory requirement declared by other
+		 * components. If falled back to high reservation the dedicated
+		 * low crash memory will take care of that.
+		 */
+		if (crash_base)
+			crash_size += crash_low_extra;
+		else
  			crash_base = memblock_find_in_range(CRASH_ALIGN,
  						CRASH_ADDR_HIGH_MAX,
  						crash_size, CRASH_ALIGN);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 9268c12458c8..b4556d2dcb8e 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -415,6 +415,8 @@ void __init mem_encrypt_init(void)
  	if (sev_active())
  		static_branch_enable(&sev_enable_key);
  
+	kdump_need_extra_low_memory(swiotlb_size_or_default());
+
  	pr_info("AMD %s active\n",
  		sev_active() ? "Secure Encrypted Virtualization (SEV)"
  			     : "Secure Memory Encryption (SME)");
Dave Young Oct. 14, 2019, 11:05 a.m. UTC | #8
On 10/12/19 at 05:24pm, Kairui Song wrote:
> On 9/27/19 1:42 PM, Dave Young wrote:
> > On 09/25/19 at 06:36pm, Kairui Song wrote:
> > > On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@kernel.org> wrote:
> > > > * Kairui Song <kasong@redhat.com> wrote:
> > > > 
> > > > > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > > > > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > > > > is active, to support DMA of devices that not support address with the
> > > > > encrypt bit.
> > > > > 
> > > > > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > > > > active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > > > > 
> > > > > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > > > > encryption") will always force SWIOTLB to be enabled when SEV is active
> > > > > in all cases.
> > > > > 
> > > > > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > > > > and this is also true for kdump kernel. As a result kdump kernel will
> > > > > run out of already scarce pre-reserved memory easily.
> > > > > 
> > > > > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > > > > kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > > > > is specified or any offset is used. As for the high reservation case, an
> > > > > extra low memory region will always be reserved and that is enough for
> > > > > SWIOTLB. Else if the offset format is used, user should be fully aware
> > > > > of any possible kdump kernel memory requirement and have to organize the
> > > > > memory usage carefully.
> > > > > 
> > > > > Signed-off-by: Kairui Song <kasong@redhat.com>
> > > > > ---
> > > > >   arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > > > >   1 file changed, 17 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > index 71f20bb18cb0..ee6a2f1e2226 100644
> > > > > --- a/arch/x86/kernel/setup.c
> > > > > +++ b/arch/x86/kernel/setup.c
> > > > > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > >                                          unsigned long long *crash_size,
> > > > >                                          bool high)
> > > > >   {
> > > > > -     unsigned long long base, size;
> > > > > +     unsigned long long base, size, mem_enc_req = 0;
> > > > > 
> > > > >        base = *crash_base;
> > > > >        size = *crash_size;
> > > > > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > >        if (high)
> > > > >                goto high_reserve;
> > > > > 
> > > > > +     /*
> > > > > +      * When SME/SEV is active and not using high reserve,
> > > > > +      * it will always required an extra SWIOTLB region.
> > > > > +      */
> > > > > +     if (mem_encrypt_active())
> > > > > +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > > > > +
> > > > >        base = memblock_find_in_range(CRASH_ALIGN,
> > > > > -                                   CRASH_ADDR_LOW_MAX, size,
> > > > > +                                   CRASH_ADDR_LOW_MAX,
> > > > > +                                   size + mem_enc_req,
> > > > >                                      CRASH_ALIGN);
> > > > 
> > > 
> > > Hi Ingo,
> > > 
> > > I re-read my previous reply, it's long and tedious, let me try to make
> > > a more effective reply:
> > > 
> > > > What sizes are we talking about here?
> > > 
> > > The size here is how much memory will be reserved for kdump kernel, to
> > > ensure kdump kernel and userspace can run without OOM.
> > > 
> > > > 
> > > > - What is the possible size range of swiotlb_size_or_default()
> > > 
> > > swiotlb_size_or_default() returns the swiotlb size, it's specified by
> > > user using swiotlb=<size>, or default size (64MB)
> > > 
> > > > 
> > > > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> > > 
> > > It's 4G.
> > > 
> > > > 
> > > > - Why do we replace one fixed limit with another fixed limit instead of
> > > >    accurately sizing the area, with each required feature adding its own
> > > >    requirement to the reservation size?
> > > 
> > > It's quite hard to "accurately sizing the area".
> > > 
> > > No way to tell the exact amount of memory kdump needs, we can only estimate.
> > > Kdump kernel use different cmdline, drivers and components will have
> > > special handling for kdump, and userspace is totally different.
> > 
> > Agreed about your above, but specific this the problem in this patch
> > There should be other ways.
> > 
> > First thought about doing generic handling in swiotlb part, and do
> > something like kdump_memory_reserve(size) Ingo suggested,  but according
> > to you swiotlb init is late, so it can not increase the size, OTOH if
> > reserve another region for kdump in swiotlb will cause other issues.
> > 
> > So let's think about other improvement, for example to see if you can
> > call kdump_memory_reserve(size) in AMD SME init path, for example in
> > mem_encrypt_init(), is it before crashkernel reservation?
> > 
> > If doable it will be at least cleaner than the code in this patch.
> > 
> > Thanks
> > Dave
> > 
> 
> How about something simple as following code? The logic and new function is as simple as
> possible, just always reserve extra low memory when SME/SEV is active, ignore the high/low
> reservation case. It will waste some memory with SME and high reservation though.
> 
> Was hesitating a lot about this series, one thing I'm thinking is that what is the point
> of "crashkernel=" argument, if the crashkernel value could be adjusted according, the value
> specified will seems more meanless or confusing...
> 
> And currently there isn't anything like crashkernel=auto or anything similiar to let kernel
> calculate the value automatically, maybe the admin should be aware of the value or be informed
> about the suitable crashkernel value after all?

Hmm, it is reasonable that a user defined value should be just as is
without any change by kernel.  So it is a good reason to introduce
a crashkernel=auto so that kernel can tune the crashkernel size
accordingly on top of some base value which can be configurable by
kernel configs (arch dependent).

> 
> diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
> index ed8ec011a9fd..7263a237f689 100644
> --- a/arch/x86/include/asm/setup.h
> +++ b/arch/x86/include/asm/setup.h
> @@ -44,6 +44,7 @@ void early_platform_quirks(void);
>  extern unsigned long saved_video_mode;
> +extern void kdump_need_extra_low_memory(unsigned long size);
>  extern void reserve_standard_io_resources(void);
>  extern void i386_reserve_resources(void);
>  extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 77ea96b794bd..e5888fb8e4bc 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -473,6 +473,13 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>  # define CRASH_ADDR_HIGH_MAX	SZ_64T
>  #endif
> +static __initdata unsigned long long crash_low_extra;
> +
> +void __init kdump_need_extra_low_memory(unsigned long size)
> +{
> +	crash_low_extra += size;
> +}
> +
>  static int __init reserve_crashkernel_low(void)
>  {
>  #ifdef CONFIG_X86_64
> @@ -501,6 +508,7 @@ static int __init reserve_crashkernel_low(void)
>  			return 0;
>  	}
> +	low_size += crash_low_extra;
>  	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>  	if (!low_base) {
>  		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> @@ -563,8 +571,17 @@ static void __init reserve_crashkernel(void)
>  		if (!high)
>  			crash_base = memblock_find_in_range(CRASH_ALIGN,
>  						CRASH_ADDR_LOW_MAX,
> -						crash_size, CRASH_ALIGN);
> -		if (!crash_base)
> +						crash_size + crash_low_extra,
> +						CRASH_ALIGN);
> +		/*
> +		 * If reserving the crashkernel memory in low region, then also
> +		 * include the extra low memory requirement declared by other
> +		 * components. If falled back to high reservation the dedicated
> +		 * low crash memory will take care of that.
> +		 */
> +		if (crash_base)
> +			crash_size += crash_low_extra;
> +		else
>  			crash_base = memblock_find_in_range(CRASH_ALIGN,
>  						CRASH_ADDR_HIGH_MAX,
>  						crash_size, CRASH_ALIGN);
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 9268c12458c8..b4556d2dcb8e 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -415,6 +415,8 @@ void __init mem_encrypt_init(void)
>  	if (sev_active())
>  		static_branch_enable(&sev_enable_key);
> +	kdump_need_extra_low_memory(swiotlb_size_or_default());
> +
>  	pr_info("AMD %s active\n",
>  		sev_active() ? "Secure Encrypted Virtualization (SEV)"
>  			     : "Secure Memory Encryption (SME)");

BTW, your above patch chunks are not malformed, can not be applied cleanly.

Thanks
Dave
Dave Young Oct. 14, 2019, 11:11 a.m. UTC | #9
On 10/14/19 at 07:05pm, Dave Young wrote:
> On 10/12/19 at 05:24pm, Kairui Song wrote:
> > On 9/27/19 1:42 PM, Dave Young wrote:
> > > On 09/25/19 at 06:36pm, Kairui Song wrote:
> > > > On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@kernel.org> wrote:
> > > > > * Kairui Song <kasong@redhat.com> wrote:
> > > > > 
> > > > > > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > > > > > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > > > > > is active, to support DMA of devices that not support address with the
> > > > > > encrypt bit.
> > > > > > 
> > > > > > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > > > > > active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > > > > > 
> > > > > > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > > > > > encryption") will always force SWIOTLB to be enabled when SEV is active
> > > > > > in all cases.
> > > > > > 
> > > > > > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > > > > > and this is also true for kdump kernel. As a result kdump kernel will
> > > > > > run out of already scarce pre-reserved memory easily.
> > > > > > 
> > > > > > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > > > > > kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > > > > > is specified or any offset is used. As for the high reservation case, an
> > > > > > extra low memory region will always be reserved and that is enough for
> > > > > > SWIOTLB. Else if the offset format is used, user should be fully aware
> > > > > > of any possible kdump kernel memory requirement and have to organize the
> > > > > > memory usage carefully.
> > > > > > 
> > > > > > Signed-off-by: Kairui Song <kasong@redhat.com>
> > > > > > ---
> > > > > >   arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > > > > >   1 file changed, 17 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > > index 71f20bb18cb0..ee6a2f1e2226 100644
> > > > > > --- a/arch/x86/kernel/setup.c
> > > > > > +++ b/arch/x86/kernel/setup.c
> > > > > > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > >                                          unsigned long long *crash_size,
> > > > > >                                          bool high)
> > > > > >   {
> > > > > > -     unsigned long long base, size;
> > > > > > +     unsigned long long base, size, mem_enc_req = 0;
> > > > > > 
> > > > > >        base = *crash_base;
> > > > > >        size = *crash_size;
> > > > > > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > >        if (high)
> > > > > >                goto high_reserve;
> > > > > > 
> > > > > > +     /*
> > > > > > +      * When SME/SEV is active and not using high reserve,
> > > > > > +      * it will always required an extra SWIOTLB region.
> > > > > > +      */
> > > > > > +     if (mem_encrypt_active())
> > > > > > +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > > > > > +
> > > > > >        base = memblock_find_in_range(CRASH_ALIGN,
> > > > > > -                                   CRASH_ADDR_LOW_MAX, size,
> > > > > > +                                   CRASH_ADDR_LOW_MAX,
> > > > > > +                                   size + mem_enc_req,
> > > > > >                                      CRASH_ALIGN);
> > > > > 
> > > > 
> > > > Hi Ingo,
> > > > 
> > > > I re-read my previous reply, it's long and tedious, let me try to make
> > > > a more effective reply:
> > > > 
> > > > > What sizes are we talking about here?
> > > > 
> > > > The size here is how much memory will be reserved for kdump kernel, to
> > > > ensure kdump kernel and userspace can run without OOM.
> > > > 
> > > > > 
> > > > > - What is the possible size range of swiotlb_size_or_default()
> > > > 
> > > > swiotlb_size_or_default() returns the swiotlb size, it's specified by
> > > > user using swiotlb=<size>, or default size (64MB)
> > > > 
> > > > > 
> > > > > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> > > > 
> > > > It's 4G.
> > > > 
> > > > > 
> > > > > - Why do we replace one fixed limit with another fixed limit instead of
> > > > >    accurately sizing the area, with each required feature adding its own
> > > > >    requirement to the reservation size?
> > > > 
> > > > It's quite hard to "accurately sizing the area".
> > > > 
> > > > No way to tell the exact amount of memory kdump needs, we can only estimate.
> > > > Kdump kernel use different cmdline, drivers and components will have
> > > > special handling for kdump, and userspace is totally different.
> > > 
> > > Agreed about your above, but specific this the problem in this patch
> > > There should be other ways.
> > > 
> > > First thought about doing generic handling in swiotlb part, and do
> > > something like kdump_memory_reserve(size) Ingo suggested,  but according
> > > to you swiotlb init is late, so it can not increase the size, OTOH if
> > > reserve another region for kdump in swiotlb will cause other issues.
> > > 
> > > So let's think about other improvement, for example to see if you can
> > > call kdump_memory_reserve(size) in AMD SME init path, for example in
> > > mem_encrypt_init(), is it before crashkernel reservation?
> > > 
> > > If doable it will be at least cleaner than the code in this patch.
> > > 
> > > Thanks
> > > Dave
> > > 
> > 
> > How about something simple as following code? The logic and new function is as simple as
> > possible, just always reserve extra low memory when SME/SEV is active, ignore the high/low
> > reservation case. It will waste some memory with SME and high reservation though.
> > 
> > Was hesitating a lot about this series, one thing I'm thinking is that what is the point
> > of "crashkernel=" argument, if the crashkernel value could be adjusted according, the value
> > specified will seems more meanless or confusing...
> > 
> > And currently there isn't anything like crashkernel=auto or anything similiar to let kernel
> > calculate the value automatically, maybe the admin should be aware of the value or be informed
> > about the suitable crashkernel value after all?
> 
> Hmm, it is reasonable that a user defined value should be just as is
> without any change by kernel.  So it is a good reason to introduce
> a crashkernel=auto so that kernel can tune the crashkernel size
> accordingly on top of some base value which can be configurable by
> kernel configs (arch dependent).
> 

Here is some old patches I posted for some default crashkernel values,
maybe you can try to do something like that with a crashkernel=auto
https://lkml.org/lkml/2018/5/20/262

Thanks
Dave
Dave Young Oct. 15, 2019, 2:18 a.m. UTC | #10
On 10/14/19 at 07:05pm, Dave Young wrote:
> On 10/12/19 at 05:24pm, Kairui Song wrote:
> > On 9/27/19 1:42 PM, Dave Young wrote:
> > > On 09/25/19 at 06:36pm, Kairui Song wrote:
> > > > On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@kernel.org> wrote:
> > > > > * Kairui Song <kasong@redhat.com> wrote:
> > > > > 
> > > > > > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > > > > > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > > > > > is active, to support DMA of devices that not support address with the
> > > > > > encrypt bit.
> > > > > > 
> > > > > > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > > > > > active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > > > > > 
> > > > > > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > > > > > encryption") will always force SWIOTLB to be enabled when SEV is active
> > > > > > in all cases.
> > > > > > 
> > > > > > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > > > > > and this is also true for kdump kernel. As a result kdump kernel will
> > > > > > run out of already scarce pre-reserved memory easily.
> > > > > > 
> > > > > > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > > > > > kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > > > > > is specified or any offset is used. As for the high reservation case, an
> > > > > > extra low memory region will always be reserved and that is enough for
> > > > > > SWIOTLB. Else if the offset format is used, user should be fully aware
> > > > > > of any possible kdump kernel memory requirement and have to organize the
> > > > > > memory usage carefully.
> > > > > > 
> > > > > > Signed-off-by: Kairui Song <kasong@redhat.com>
> > > > > > ---
> > > > > >   arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > > > > >   1 file changed, 17 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > > index 71f20bb18cb0..ee6a2f1e2226 100644
> > > > > > --- a/arch/x86/kernel/setup.c
> > > > > > +++ b/arch/x86/kernel/setup.c
> > > > > > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > >                                          unsigned long long *crash_size,
> > > > > >                                          bool high)
> > > > > >   {
> > > > > > -     unsigned long long base, size;
> > > > > > +     unsigned long long base, size, mem_enc_req = 0;
> > > > > > 
> > > > > >        base = *crash_base;
> > > > > >        size = *crash_size;
> > > > > > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > >        if (high)
> > > > > >                goto high_reserve;
> > > > > > 
> > > > > > +     /*
> > > > > > +      * When SME/SEV is active and not using high reserve,
> > > > > > +      * it will always required an extra SWIOTLB region.
> > > > > > +      */
> > > > > > +     if (mem_encrypt_active())
> > > > > > +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > > > > > +
> > > > > >        base = memblock_find_in_range(CRASH_ALIGN,
> > > > > > -                                   CRASH_ADDR_LOW_MAX, size,
> > > > > > +                                   CRASH_ADDR_LOW_MAX,
> > > > > > +                                   size + mem_enc_req,
> > > > > >                                      CRASH_ALIGN);
> > > > > 
> > > > 
> > > > Hi Ingo,
> > > > 
> > > > I re-read my previous reply, it's long and tedious, let me try to make
> > > > a more effective reply:
> > > > 
> > > > > What sizes are we talking about here?
> > > > 
> > > > The size here is how much memory will be reserved for kdump kernel, to
> > > > ensure kdump kernel and userspace can run without OOM.
> > > > 
> > > > > 
> > > > > - What is the possible size range of swiotlb_size_or_default()
> > > > 
> > > > swiotlb_size_or_default() returns the swiotlb size, it's specified by
> > > > user using swiotlb=<size>, or default size (64MB)
> > > > 
> > > > > 
> > > > > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> > > > 
> > > > It's 4G.
> > > > 
> > > > > 
> > > > > - Why do we replace one fixed limit with another fixed limit instead of
> > > > >    accurately sizing the area, with each required feature adding its own
> > > > >    requirement to the reservation size?
> > > > 
> > > > It's quite hard to "accurately sizing the area".
> > > > 
> > > > No way to tell the exact amount of memory kdump needs, we can only estimate.
> > > > Kdump kernel use different cmdline, drivers and components will have
> > > > special handling for kdump, and userspace is totally different.
> > > 
> > > Agreed about your above, but specific this the problem in this patch
> > > There should be other ways.
> > > 
> > > First thought about doing generic handling in swiotlb part, and do
> > > something like kdump_memory_reserve(size) Ingo suggested,  but according
> > > to you swiotlb init is late, so it can not increase the size, OTOH if
> > > reserve another region for kdump in swiotlb will cause other issues.
> > > 
> > > So let's think about other improvement, for example to see if you can
> > > call kdump_memory_reserve(size) in AMD SME init path, for example in
> > > mem_encrypt_init(), is it before crashkernel reservation?
> > > 
> > > If doable it will be at least cleaner than the code in this patch.
> > > 
> > > Thanks
> > > Dave
> > > 
> > 
> > How about something simple as following code? The logic and new function is as simple as
> > possible, just always reserve extra low memory when SME/SEV is active, ignore the high/low
> > reservation case. It will waste some memory with SME and high reservation though.
> > 
> > Was hesitating a lot about this series, one thing I'm thinking is that what is the point
> > of "crashkernel=" argument, if the crashkernel value could be adjusted according, the value
> > specified will seems more meanless or confusing...
> > 
> > And currently there isn't anything like crashkernel=auto or anything similiar to let kernel
> > calculate the value automatically, maybe the admin should be aware of the value or be informed
> > about the suitable crashkernel value after all?
> 
> Hmm, it is reasonable that a user defined value should be just as is
> without any change by kernel.  So it is a good reason to introduce
> a crashkernel=auto so that kernel can tune the crashkernel size
> accordingly on top of some base value which can be configurable by
> kernel configs (arch dependent).
> 

And for the time being, can just print a warning when crashkernel= param
is used, in mem_encrypt_init() code. alert people to increase the memory
size swiotlb_size_or_default().

In the future, if the crashkernel=auto is doable then kernel can adapt
to that in code.  Even if it is reasonable to let admin to provide a
exact value but sometimes it is hard to know these kernel requirement
details..

Thanks
Dave
Kairui Song Oct. 15, 2019, 5:18 p.m. UTC | #11
thiOn Tue, Oct 15, 2019 at 10:18 AM Dave Young <dyoung@redhat.com> wrote:
>
> On 10/14/19 at 07:05pm, Dave Young wrote:
> > On 10/12/19 at 05:24pm, Kairui Song wrote:
> > > On 9/27/19 1:42 PM, Dave Young wrote:
> > > > On 09/25/19 at 06:36pm, Kairui Song wrote:
> > > > > On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar <mingo@kernel.org> wrote:
> > > > > > * Kairui Song <kasong@redhat.com> wrote:
> > > > > >
> > > > > > > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > > > > > > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > > > > > > is active, to support DMA of devices that not support address with the
> > > > > > > encrypt bit.
> > > > > > >
> > > > > > > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > > > > > > active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > > > > > >
> > > > > > > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > > > > > > encryption") will always force SWIOTLB to be enabled when SEV is active
> > > > > > > in all cases.
> > > > > > >
> > > > > > > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > > > > > > and this is also true for kdump kernel. As a result kdump kernel will
> > > > > > > run out of already scarce pre-reserved memory easily.
> > > > > > >
> > > > > > > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > > > > > > kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > > > > > > is specified or any offset is used. As for the high reservation case, an
> > > > > > > extra low memory region will always be reserved and that is enough for
> > > > > > > SWIOTLB. Else if the offset format is used, user should be fully aware
> > > > > > > of any possible kdump kernel memory requirement and have to organize the
> > > > > > > memory usage carefully.
> > > > > > >
> > > > > > > Signed-off-by: Kairui Song <kasong@redhat.com>
> > > > > > > ---
> > > > > > >   arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > > > > > >   1 file changed, 17 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > > > index 71f20bb18cb0..ee6a2f1e2226 100644
> > > > > > > --- a/arch/x86/kernel/setup.c
> > > > > > > +++ b/arch/x86/kernel/setup.c
> > > > > > > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > > >                                          unsigned long long *crash_size,
> > > > > > >                                          bool high)
> > > > > > >   {
> > > > > > > -     unsigned long long base, size;
> > > > > > > +     unsigned long long base, size, mem_enc_req = 0;
> > > > > > >
> > > > > > >        base = *crash_base;
> > > > > > >        size = *crash_size;
> > > > > > > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > > > > > >        if (high)
> > > > > > >                goto high_reserve;
> > > > > > >
> > > > > > > +     /*
> > > > > > > +      * When SME/SEV is active and not using high reserve,
> > > > > > > +      * it will always required an extra SWIOTLB region.
> > > > > > > +      */
> > > > > > > +     if (mem_encrypt_active())
> > > > > > > +             mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > > > > > > +
> > > > > > >        base = memblock_find_in_range(CRASH_ALIGN,
> > > > > > > -                                   CRASH_ADDR_LOW_MAX, size,
> > > > > > > +                                   CRASH_ADDR_LOW_MAX,
> > > > > > > +                                   size + mem_enc_req,
> > > > > > >                                      CRASH_ALIGN);
> > > > > >
> > > > >
> > > > > Hi Ingo,
> > > > >
> > > > > I re-read my previous reply, it's long and tedious, let me try to make
> > > > > a more effective reply:
> > > > >
> > > > > > What sizes are we talking about here?
> > > > >
> > > > > The size here is how much memory will be reserved for kdump kernel, to
> > > > > ensure kdump kernel and userspace can run without OOM.
> > > > >
> > > > > >
> > > > > > - What is the possible size range of swiotlb_size_or_default()
> > > > >
> > > > > swiotlb_size_or_default() returns the swiotlb size, it's specified by
> > > > > user using swiotlb=<size>, or default size (64MB)
> > > > >
> > > > > >
> > > > > > - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
> > > > >
> > > > > It's 4G.
> > > > >
> > > > > >
> > > > > > - Why do we replace one fixed limit with another fixed limit instead of
> > > > > >    accurately sizing the area, with each required feature adding its own
> > > > > >    requirement to the reservation size?
> > > > >
> > > > > It's quite hard to "accurately sizing the area".
> > > > >
> > > > > No way to tell the exact amount of memory kdump needs, we can only estimate.
> > > > > Kdump kernel use different cmdline, drivers and components will have
> > > > > special handling for kdump, and userspace is totally different.
> > > >
> > > > Agreed about your above, but specific this the problem in this patch
> > > > There should be other ways.
> > > >
> > > > First thought about doing generic handling in swiotlb part, and do
> > > > something like kdump_memory_reserve(size) Ingo suggested,  but according
> > > > to you swiotlb init is late, so it can not increase the size, OTOH if
> > > > reserve another region for kdump in swiotlb will cause other issues.
> > > >
> > > > So let's think about other improvement, for example to see if you can
> > > > call kdump_memory_reserve(size) in AMD SME init path, for example in
> > > > mem_encrypt_init(), is it before crashkernel reservation?
> > > >
> > > > If doable it will be at least cleaner than the code in this patch.
> > > >
> > > > Thanks
> > > > Dave
> > > >
> > >
> > > How about something simple as following code? The logic and new function is as simple as
> > > possible, just always reserve extra low memory when SME/SEV is active, ignore the high/low
> > > reservation case. It will waste some memory with SME and high reservation though.
> > >
> > > Was hesitating a lot about this series, one thing I'm thinking is that what is the point
> > > of "crashkernel=" argument, if the crashkernel value could be adjusted according, the value
> > > specified will seems more meanless or confusing...
> > >
> > > And currently there isn't anything like crashkernel=auto or anything similiar to let kernel
> > > calculate the value automatically, maybe the admin should be aware of the value or be informed
> > > about the suitable crashkernel value after all?
> >
> > Hmm, it is reasonable that a user defined value should be just as is
> > without any change by kernel.  So it is a good reason to introduce
> > a crashkernel=auto so that kernel can tune the crashkernel size
> > accordingly on top of some base value which can be configurable by
> > kernel configs (arch dependent).
> >
>
> And for the time being, can just print a warning when crashkernel= param
> is used, in mem_encrypt_init() code. alert people to increase the memory
> size swiotlb_size_or_default().

Good suggestion,  it will be much more reasonable if kernel only adjust the
crashekernel value when crashkernel=auto is used. For now, I think giving a
warning could be a better solution.

> In the future, if the crashkernel=auto is doable then kernel can adapt
> to that in code.  Even if it is reasonable to let admin to provide a
> exact value but sometimes it is hard to know these kernel requirement
> details..
>

Yes, and the crashkernel=auto could be more helpful with some infrastructure
for add extra kdump memory.

Let crashkernel=auto have a configurable basic size, and each component could
call kdump_memory_reserve to add to the basic size. (Like SME/SEV case here)

And currently I'm doing some experiment to reserve some pages from buddy
as the crash memory. So kernel can try reserve extra memory for kdump kernel
anytime, userspace can also tell kernel to reserve more crash memory.

If everything goes well, this may make the auto reservation even better.

crashkernel=auto just need to provide a more generic value that's enough to
contain the kernel image, initramfs, swiotlb, early boot memory, etc..
(all the continual things), plus some basic value to fits most default
kdump setups.

Extra usage could be estimated by userspace and added later.
Userspace is more aware of what service/module might be used for kdump
and can be more accurate.

This may provide a better 'auto' crashkernel solution.

--
Best Regards,
Kairui Song

Patch
diff mbox series

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 71f20bb18cb0..ee6a2f1e2226 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -530,7 +530,7 @@  static int __init crashkernel_find_region(unsigned long long *crash_base,
 					  unsigned long long *crash_size,
 					  bool high)
 {
-	unsigned long long base, size;
+	unsigned long long base, size, mem_enc_req = 0;
 
 	base = *crash_base;
 	size = *crash_size;
@@ -561,11 +561,25 @@  static int __init crashkernel_find_region(unsigned long long *crash_base,
 	if (high)
 		goto high_reserve;
 
+	/*
+	 * When SME/SEV is active and not using high reserve,
+	 * it will always required an extra SWIOTLB region.
+	 */
+	if (mem_encrypt_active())
+		mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
+
 	base = memblock_find_in_range(CRASH_ALIGN,
-				      CRASH_ADDR_LOW_MAX, size,
+				      CRASH_ADDR_LOW_MAX,
+				      size + mem_enc_req,
 				      CRASH_ALIGN);
-	if (base)
+	if (base) {
+		if (mem_enc_req) {
+			pr_info("Memory encryption is active, crashkernel needs %ldMB extra memory\n",
+				(unsigned long)(mem_enc_req >> 20));
+			size += mem_enc_req;
+		}
 		goto found;
+	}
 
 high_reserve:
 	/* Try high reserve */