linux-kernel.vger.kernel.org archive mirror
* [RFC FIX PATCH] x86/e820: Stop kernel boot when RAM resource reservation fails
@ 2022-07-18  8:58 Bharata B Rao
  2022-07-18 10:42 ` Boris Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Bharata B Rao @ 2022-07-18  8:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, nikunj, Abraham.Shaju

Currently it is possible to start a guest with memory that
is beyond the addressable range of the CPU. This can typically
be done by using QEMU without explicitly specifying the max
physical addressable bits (via the phys-bits or host-phys-bits
options). In such cases QEMU will start the guest with more
than 1TB of memory but will implicitly limit phys-bits to 40.

In this scenario, iomem_resource.end gets set to 1TB and
hence subsequent resource reservations of RAM regions beyond
1TB fail. Since this failure is ignored, there can be
a situation where the kernel is using the entire RAM (beyond 1TB),
but the RAM range is not part of the iomem resource tree.
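
The arithmetic behind that limit can be sketched in userspace C. This is only an illustration, not kernel code: the helper names max_phys_addr() and range_fits() are hypothetical, and it assumes the root resource ends at the last byte addressable with phys-bits address bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest physical address reachable with the given number of address bits. */
uint64_t max_phys_addr(unsigned int phys_bits)
{
	/* Guard the shift: shifting a 64-bit value by 64 or more is undefined in C. */
	if (phys_bits >= 64)
		return UINT64_MAX;
	return ((uint64_t)1 << phys_bits) - 1;
}

/* True if the inclusive range [start, end] lies inside the addressable space. */
bool range_fits(uint64_t start, uint64_t end, unsigned int phys_bits)
{
	return start <= end && end <= max_phys_addr(phys_bits);
}
```

With phys_bits = 40 the limit is 0xffffffffff, so a range ending at f9ffffffff (the last System RAM entry of a 999G guest) fits, while any range ending at or above 0x10000000000 does not.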

This can lead to both performance and correctness
issues. For example, gettimeofday() calls take more
time because the vvar_page gets mapped with an uncacheable memory
type (_PAGE_CACHE_MODE_UC_MINUS). The vvar fault handler
defaults to the uncacheable type when it fails to find the
vvar_page pfn as part of any RAM range in iomem_resource.
Here is a comparison of the time taken (in us) by an
application making 10240 gettimeofday() calls
with 999G and 1T of guest RAM:

Iteration	999G	1T
----------------------------
1		291	1178
2		316	3286
3		582	2982
4		284	1808
5		252	4503

This is what /proc/iomem looks like for the above two cases:

999G guest RAM
---------------
00001000-0009fbff : System RAM
00100000-bffdbfff : System RAM
100000000-f9ffffffff : System RAM
  1549c00000-154fe09107 : Kernel code
  1550000000-1552f3cfff : Kernel rodata
  1553000000-15544aea3f : Kernel data
  1554d67000-15553fffff : Kernel bss

1T guest RAM
------------
00001000-0009fbff : System RAM
00100000-bffdbfff : System RAM
6752200000-6758409107 : Kernel code
6758600000-675b53cfff : Kernel rodata
675b600000-675caaea3f : Kernel data
675d367000-675d9fffff : Kernel bss
(Last System RAM entry is missing)

It is also seen that any memory region reservation requests
(say by using request_free_mem_region()), whose sizes fall
below 1TB, will be satisfied, leading to ranges overlapping
with actual RAM range (though the RAM range is missing in the
resource tree).

Fix this problem by stopping the kernel boot when resource
reservation fails for system RAM.

Reported-by: Shaju Abraham <Abraham.Shaju@amd.com>
Signed-off-by: Bharata B Rao <bharata@amd.com>
---
1. It appears that we should fail for other types of
resources too and not just for RAM, but wasn't sure
and hence checking for RAM explicitly in this version.
2. There is an attempt to fix this on the QEMU side too
https://lore.kernel.org/qemu-devel/20220718081734.135598-1-nikunj@amd.com/

 arch/x86/kernel/e820.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index f267205f2d5a..1cfe640afe71 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1185,7 +1185,10 @@ void __init e820__reserve_resources(void)
 		 */
 		if (do_mark_busy(entry->type, res)) {
 			res->flags |= IORESOURCE_BUSY;
-			insert_resource(&iomem_resource, res);
+			if (insert_resource(&iomem_resource, res) &&
+			    entry->type == E820_TYPE_RAM)
+				panic("%s: Failed to reserve resource %s with range (%llx-%llx)\n",
+				      __func__, res->name, res->start, res->end);
 		}
 		res++;
 	}
-- 
2.25.1



* Re: [RFC FIX PATCH] x86/e820: Stop kernel boot when RAM resource reservation fails
  2022-07-18  8:58 [RFC FIX PATCH] x86/e820: Stop kernel boot when RAM resource reservation fails Bharata B Rao
@ 2022-07-18 10:42 ` Boris Petkov
  2022-07-18 14:54   ` Bharata B Rao
  0 siblings, 1 reply; 7+ messages in thread
From: Boris Petkov @ 2022-07-18 10:42 UTC (permalink / raw)
  To: Bharata B Rao, linux-kernel
  Cc: tglx, mingo, x86, dave.hansen, nikunj, hpa, Abraham.Shaju

On July 18, 2022 8:58:15 AM UTC, Bharata B Rao <bharata@amd.com> wrote:
>Currently it is possible to start a guest with memory that
>is beyond the addressable range of CPU. This can typically
>be done by using QEMU without explicitly specifying the max
>physical addressable bits (via phys-bits or host-phys-bits
>options). In such cases QEMU will start the guest with more
>than 1TB memory but would implicitly limit the phys-bits to 40.

Why does the upstream kernel care about some weird qemu guest configurations? 

-- 
Sent from a small device: formatting sux and brevity is inevitable.


* Re: [RFC FIX PATCH] x86/e820: Stop kernel boot when RAM resource reservation fails
  2022-07-18 10:42 ` Boris Petkov
@ 2022-07-18 14:54   ` Bharata B Rao
  2022-07-18 15:07     ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Bharata B Rao @ 2022-07-18 14:54 UTC (permalink / raw)
  To: Boris Petkov, linux-kernel
  Cc: tglx, mingo, x86, dave.hansen, nikunj, hpa, Abraham.Shaju



On 7/18/2022 4:12 PM, Boris Petkov wrote:
> On July 18, 2022 8:58:15 AM UTC, Bharata B Rao <bharata@amd.com> wrote:
>> Currently it is possible to start a guest with memory that
>> is beyond the addressable range of CPU. This can typically
>> be done by using QEMU without explicitly specifying the max
>> physical addressable bits (via phys-bits or host-phys-bits
>> options). In such cases QEMU will start the guest with more
>> than 1TB memory but would implicitly limit the phys-bits to 40.
> 
> Why does the upstream kernel care about some weird qemu guest configurations? 

It may be a weird guest configuration, but it looks like
a kernel bug exposed by QEMU.

Regards,
Bharata


* Re: [RFC FIX PATCH] x86/e820: Stop kernel boot when RAM resource reservation fails
  2022-07-18 14:54   ` Bharata B Rao
@ 2022-07-18 15:07     ` Borislav Petkov
  2022-07-19  4:15       ` Bharata B Rao
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2022-07-18 15:07 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linux-kernel, tglx, mingo, x86, dave.hansen, nikunj, hpa, Abraham.Shaju

On Mon, Jul 18, 2022 at 08:24:08PM +0530, Bharata B Rao wrote:
> It may be a weird guest configuration, but it looks like
> a kernel bug exposed by QEMU.

I betcha you can generate a lot of "kernel bugs" with weird qemu
options. If it is not a real use case, nobody cares.

And even if it were a real use case, panicking the machine is not the
right fix.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC FIX PATCH] x86/e820: Stop kernel boot when RAM resource reservation fails
  2022-07-18 15:07     ` Borislav Petkov
@ 2022-07-19  4:15       ` Bharata B Rao
  2022-07-19 17:12         ` Sean Christopherson
  2022-08-04  9:46         ` Ingo Molnar
  0 siblings, 2 replies; 7+ messages in thread
From: Bharata B Rao @ 2022-07-19  4:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, tglx, mingo, x86, dave.hansen, nikunj, hpa, Abraham.Shaju

On 7/18/2022 8:37 PM, Borislav Petkov wrote:
> 
> I betcha you can generate a lot of "kernel bugs" with weird qemu
> options. If it is not a real use case, nobody cares.

I see that we will hit this problem by default when starting
a guest with 1T or more memory using QEMU.

> 
> And even if it were a real use case, panicking the machine is not the
> right fix.

I couldn't see a clean exit/recovery option in setup_arch()->e820__reserve_resources()
where this happens. Any suggestions?

Regards,
Bharata.


* Re: [RFC FIX PATCH] x86/e820: Stop kernel boot when RAM resource reservation fails
  2022-07-19  4:15       ` Bharata B Rao
@ 2022-07-19 17:12         ` Sean Christopherson
  2022-08-04  9:46         ` Ingo Molnar
  1 sibling, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2022-07-19 17:12 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Borislav Petkov, linux-kernel, tglx, mingo, x86, dave.hansen,
	nikunj, hpa, Abraham.Shaju

On Tue, Jul 19, 2022, Bharata B Rao wrote:
> On 7/18/2022 8:37 PM, Borislav Petkov wrote:
> > 
> > I betcha you can generate a lot of "kernel bugs" with weird qemu
> > options. If it is not a real use case, nobody cares.
> 
> I see that we will hit this problem by default when starting
> a guest with 1T or more memory using QEMU.

That a user can create a bad configuration using QEMU's default MAXPHYADDR doesn't
change the fact that adding memory beyond MAXPHYADDR is firmly a configuration bug.

> > And even if it were a real use case, panicking the machine is not the
> > right fix.
> 
> I couldn't see a clean exit/recovery option in setup_arch()->e820__reserve_resources()
> where this happens. Any suggestions?

WARN or pr_err/warn() and move on, or just do nothing.  Adding code to try and
gracefully handle an architecturally impossible configuration is a waste of time
and effort.  Like Boris said, there's practically a limitless number of bad setups
QEMU can create, this one just happens to be easier to create than others due to
shortcomings in QEMU.
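
For illustration, the warn-and-move-on behaviour can be modelled in userspace C as below. This is a sketch only and not a proposed kernel change: struct ram_range and reserve_ranges() are hypothetical stand-ins, and the root limit is passed in as a plain number rather than taken from iomem_resource.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a RAM entry; not the kernel's e820 structure. */
struct ram_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Walk the ranges and "reserve" each one under a root ending at root_end.
 * A range that does not fit is reported on stderr and skipped; the loop
 * continues instead of aborting. Returns the number of failed ranges.
 */
int reserve_ranges(const struct ram_range *r, int n, uint64_t root_end)
{
	int failed = 0;

	for (int i = 0; i < n; i++) {
		if (r[i].start > r[i].end || r[i].end > root_end) {
			fprintf(stderr,
				"warning: cannot reserve RAM range %" PRIx64 "-%" PRIx64 "\n",
				r[i].start, r[i].end);
			failed++;
			continue;
		}
		/* A real implementation would insert the range into a tree here. */
	}
	return failed;
}
```

With a root limit of 0xffffffffff, a made-up range ending above 1TB is reported once and the remaining ranges are still processed.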


* Re: [RFC FIX PATCH] x86/e820: Stop kernel boot when RAM resource reservation fails
  2022-07-19  4:15       ` Bharata B Rao
  2022-07-19 17:12         ` Sean Christopherson
@ 2022-08-04  9:46         ` Ingo Molnar
  1 sibling, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2022-08-04  9:46 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Borislav Petkov, linux-kernel, tglx, mingo, x86, dave.hansen,
	nikunj, hpa, Abraham.Shaju


* Bharata B Rao <bharata@amd.com> wrote:

> On 7/18/2022 8:37 PM, Borislav Petkov wrote:
> > 
> > I betcha you can generate a lot of "kernel bugs" with weird qemu
> > options. If it is not a real use case, nobody cares.
> 
> I see that we will hit this problem by default when starting
> a guest with 1T or more memory using QEMU.
> 
> > 
> > And even if it were a real use case, panicking the machine is not the
> > right fix.
> 
> I couldn't see a clean exit/recovery option in 
> setup_arch()->e820__reserve_resources() where this happens. Any 
> suggestions?

I'd emit a low impact, non-fatal WARN()ing to make sure users aren't silent 
victims of an easily detectable firmware (Qemu) misconfiguration.

Thanks,

	Ingo

