kvmarm.lists.cs.columbia.edu archive mirror
From: Jon Masters <jcm@jonmasters.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: kvmarm@lists.cs.columbia.edu,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: FYI: Possible HPFAR_EL2 corruption (LPAE guests on AArch64 hosts)
Date: Mon, 8 Jul 2019 08:16:25 -0400
Message-ID: <de6f5ca5-9485-620f-b748-9a38e9a4a0ba@jonmasters.org>
In-Reply-To: <20190708114716.GA33099@lakrids.cambridge.arm.com>

Hi Mark,

Thanks for adding the CCs. See below for more.

On 7/8/19 7:47 AM, Mark Rutland wrote:
> On Sun, Jul 07, 2019 at 11:39:46PM -0400, Jon Masters wrote:
>> Hi all,
> 
> Hi Jon,
> 
> [adding Marc and the kvm-arm list]
> 
>> TLDR: We think $subject may be a hardware erratum and we are
>> investigating. I was asked to drop a note to share my initial analysis
>> in case others have been experiencing similar problems with 32-bit VMs.
>>
>> The Fedora Arm 32-bit builders run as "armv7hl+lpae" (aarch32) LPAE
>> (VMSAv8-32 Long-descriptor table format in aarch32 execution state) VMs
>> on AArch64 hosts. Under certain conditions, those builders will "pause"
>> with the following obscure looking error message:
>>
>> kvm [10652]: load/store instruction decoding not implemented
>>
>> (which is caused by a fall-through in io_mem_abort, the code assumes
>> that if we couldn't find the guest memslot we're taking an IO abort)
>>
>> This has been happening on and off for more than a year, tickled further
>> by various 32-bit Fedora guest updates, leading to some speculation that
>> there was actually a problem with guest toolchains generating
>> hard-to-emulate complex load/store instruction sequences not handled in KVM.
>>
>> After extensive analysis, I believe instead that it appears on the
>> platform we are using in Fedora that a stage 2 fault (e.g. v8.0 software
>> access bit update in the host) taken during stage 1 guest page table
>> walk will result in an HPFAR_EL2 truncation to a 32-bit address instead
>> of the full 48-bit IPA in use due to aarch32 LPAE. I believe that this
>> is a hardware erratum and have requested that the vendor investigate.
>>
>> Meanwhile, I have a /very/ nasty patch that checks the fault conditions
>> in kvm_handle_guest_abort and if they match (S1 PTW, etc.), does a
>> software walk through the guest page tables looking for a PTE that
>> matches with the lower part of the faulting address bits we did get
>> reported to the host, then re-injects the correct fault. With this
>> patch, the test builder stays up, albeit correcting various faults:
>>
>> [  143.670063] JCM: WARNING: Mismatched FIPA and PA translation detected!
>> [  143.748447] JCM: Hyper faulting far: 0x3deb0000
>> [  143.802808] JCM: Guest faulting far: 0xb6dce3c4 (gfn: 0x3deb)
>> [  143.871776] JCM: Guest TTBCR: 0xb5023500, TTBR0: 0x5b06cc40
>> [  143.938649] JCM: Guest PGD address: 0x5b06cc50
>> [  143.991962] JCM: Guest PGD: 0x5b150003
>> [  144.036925] JCM: Guest PMD address: 0x5b150db0
>> [  144.090238] JCM: Guest PMD: 0x43deb0003
>> [  144.136241] JCM: Guest PTE address: 0x43deb0e70
>> [  144.190604] JCM: Guest PTE: 0x42000043bb72fdf
>> [  144.242884] JCM: Manually translated as: 0xb6dce3c4->0x43bb72000
>> [  144.314972] JCM: Faulting IPA page: 0x3deb0000
>> [  144.368286] JCM: Faulting PTE page: 0x43deb0000
>> [  144.422641] JCM: Fault occurred while performing S1 PTW -fixing
>> [  144.493684] JCM: corrected fault_ipa: 0x43deb0000
>> [  144.550133] JCM: Corrected gfn: 0x43deb
>> [  144.596145] JCM: handle user_mem_abort
>> [  144.641155] JCM: ret: 0x1
> 
> When the conditions are met, does the issue continue to trigger
> reliably?

Yeah. But only for certain faults - seems to be specifically for stage 1
page table walks that cause a trap to stage 2.
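
For illustration only, the condition described here (a stage 2 abort taken
while the hardware was walking the guest's stage 1 tables) can be sketched
as a check on the syndrome register. This is a minimal sketch, not the
patch discussed in this thread: it assumes the architectural ESR_EL2
layout (EC in bits [31:26], S1PTW in ISS bit 7 for aborts from a lower
EL), and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local names; values follow the architectural ESR_EL2 layout. */
#define ESR_EC_SHIFT     26
#define ESR_EC_MASK      0x3fU
#define ESR_EC_IABT_LOW  0x20U      /* instruction abort from a lower EL */
#define ESR_EC_DABT_LOW  0x24U      /* data abort from a lower EL */
#define ESR_S1PTW        (1U << 7)  /* stage 2 fault on a stage 1 table walk */

/*
 * Return true if the syndrome describes a guest abort that was taken
 * while the walker was accessing a stage 1 translation table.
 */
static bool fault_on_s1_walk(uint32_t esr)
{
	uint32_t ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;

	if (ec != ESR_EC_IABT_LOW && ec != ESR_EC_DABT_LOW)
		return false;

	return (esr & ESR_S1PTW) != 0;
}
```

Only faults for which such a check is true would be candidates for the
truncation described in this thread; ordinary stage 2 faults on guest data
accesses would not be.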

> e.g. if you return to the guest without fixing the fault, do you always
> see the truncation when taking the fault again?

I believe so, but I need to specifically check that.

> If you try the translation with an AT, does that work as expected? We've
> had to use that elsewhere; see __populate_fault_info() in
> arch/arm64/kvm/hyp/switch.c.

Yeah, I've seen that code for the other erratum :) The problem is the
virtual address in the FAR is different from the one we ultimately have
a PA translation for. We take a fault when the hardware walker tries to
perform a load to (e.g.) the PTE leaf during the translation of the VA.
So the PTE itself is what we are trying to load, not the PA of the VA
that the guest userspace/kernel tried to load. Hence an AT won't work,
unless I'm missing something. My first thought had been to do that.

Many thanks,

Jon.

