Re: FYI: Possible HPFAR_EL2 corruption (LPAE guests on AArch64 hosts)

From: Jon Masters <jcm@jonmasters.org>
To: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: "kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: FYI: Possible HPFAR_EL2 corruption (LPAE guests on AArch64 hosts)
Date: Mon, 8 Jul 2019 08:40:50 -0400	[thread overview]
Message-ID: <21412e56-a8cc-0e75-06e4-3dd1684839e2@jonmasters.org> (raw)
In-Reply-To: <20190708093714.57t55inainky2zcq@shell.armlinux.org.uk>

On 7/8/19 5:37 AM, Russell King - ARM Linux admin wrote:
> On Sun, Jul 07, 2019 at 11:39:46PM -0400, Jon Masters wrote:
>> Hi all,
>>
>> TLDR: We think $subject may be a hardware errata and we are
>> investigating. I was asked to drop a note to share my initial analysis
>> in case others have been experiencing similar problems with 32-bit VMs.
>>
>> The Fedora Arm 32-bit builders run as "armv7hl+lpae" (aarch32) LPAE
>> (VMSAv8-32 Long-descriptor table format in aarch32 execution state) VMs
>> on AArch64 hosts. Under certain conditions, those builders will "pause"
>> with the following obscure looking error message:
>>
>> kvm [10652]: load/store instruction decoding not implemented
> 
> Out of interest, because I'm running a number of 32-bit VMs on the
> Macchiatobin board, using a different 64-bit distro...
> 
> How often do these errors occur?  Have you been able to pinpoint any
> particular CPU core?  Does the workload in the VMs have any effect?
> What about the workload in the host?

It's a specific CPU core (not a Cortex design), running a 32-bit LPAE
kernel (needs to be LPAE to have an IPA >32-bit). In the course of a
weekend running stress tests, my test kernel fixed up hundreds of faults
that would otherwise have taken the guest system down.

Specifically, PGDs are allocated from a cache located in low memory (so
we never hit this condition for those), but PTEs are allocated using:

	alloc_pages(PGALLOC_GFP | __GFP_HIGHMEM, 0);

So at some point, we'll allocate a PTE from above 32-bit. When we later
take a fault on those during a stage 1 walk, we hit a problem.

My guess is we do the clock algorithm on the host looking to see for
recent accesses by unsetting access bits on the host (stage2) and since
on Armv8.0 we do a software trap for access bit updates we'll trap to
stage 2 during the stage 1 guest walk the next time around. So simply
pinning the guest memory isn't going to be sufficient to prevent it if
that memory is allocated normally with the host doing software LRU.

But the above is just what I consider the likely cause in my head.

Jon.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm