From: Jan Beulich <jbeulich@suse.com>
To: Julien Grall <julien@xen.org>,
Bertrand Marquis <bertrand.marquis@arm.com>
Cc: "Stefano Stabellini" <sstabellini@kernel.org>,
"Wei Liu" <wl@xen.org>,
"Andrew Cooper" <andrew.cooper3@citrix.com>,
"Ian Jackson" <ian.jackson@eu.citrix.com>,
"George Dunlap" <george.dunlap@citrix.com>,
xen-devel@lists.xenproject.org, nd@arm.com,
"Volodymyr Babchuk" <Volodymyr_Babchuk@epam.com>,
"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v3] xen/arm: Convert runstate address during hypcall
Date: Fri, 31 Jul 2020 14:19:47 +0200 [thread overview]
Message-ID: <d8eb8052-6370-7484-1c9a-f90d83396fa1@suse.com> (raw)
In-Reply-To: <f48f81d5-589e-3f75-1044-583114bf497e@xen.org>
On 30.07.2020 22:50, Julien Grall wrote:
> On 30/07/2020 11:24, Bertrand Marquis wrote:
>> At the moment on Arm, a Linux guest running with KTPI enabled will
>> cause the following error when a context switch happens in user mode:
>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>>
>> The error is caused by the virtual address for the runstate area
>> registered by the guest only being accessible when the guest is running
>> in kernel space when KPTI is enabled.
>>
>> To solve this issue, this patch is doing the translation from virtual
>> address to physical address during the hypercall and mapping the
>> required pages using vmap. This is removing the conversion from virtual
>> to physical address during the context switch which is solving the
>> problem with KPTI.
>
> To echo what Jan said on the previous version, this is a change in a
> stable ABI and therefore may break existing guest. FAOD, I agree in
> principle with the idea. However, we want to explain why breaking the
> ABI is the *only* viable solution.
>
> From my understanding, it is not possible to fix without an ABI
> breakage because the hypervisor doesn't know when the guest will switch
> back from userspace to kernel space.
And there's also no way to know on Arm, by e.g. enabling a suitable
intercept?
> The risk is the information
> provided by the runstate wouldn't contain accurate information and could
> affect how the guest handle stolen time.
>
> Additionally there are a few issues with the current interface:
> 1) It is assuming the virtual address cannot be re-used by the
> userspace. Thanksfully Linux have a split address space. But this may
> change with KPTI in place.
> 2) When update the page-tables, the guest has to go through an
> invalid mapping. So the translation may fail at any point.
>
> IOW, the existing interface can lead to random memory corruption and
> inacurracy of the stolen time.
>
>>
>> This is done only on arm architecture, the behaviour on x86 is not
>> modified by this patch and the address conversion is done as before
>> during each context switch.
>>
>> This is introducing several limitations in comparison to the previous
>> behaviour (on arm only):
>> - if the guest is remapping the area at a different physical address Xen
>> will continue to update the area at the previous physical address. As
>> the area is in kernel space and usually defined as a global variable this
>> is something which is believed not to happen. If this is required by a
>> guest, it will have to call the hypercall with the new area (even if it
>> is at the same virtual address).
>> - the area needs to be mapped during the hypercall. For the same reasons
>> as for the previous case, even if the area is registered for a different
>> vcpu. It is believed that registering an area using a virtual address
>> unmapped is not something done.
>
> This is not clear whether the virtual address refer to the current vCPU
> or the vCPU you register the runstate for. From the past discussion, I
> think you refer to the former. It would be good to clarify.
>
> Additionally, all the new restrictions should be documented in the
> public interface. So an OS developper can find the differences between
> the architectures.
>
> To answer Jan's concern, we certainly don't know all the guest OSes
> existing, however we also need to balance the benefit for a large
> majority of the users.
>
> From previous discussion, the current approach was deemed to be
> acceptable on Arm and, AFAICT, also x86 (see [1]).
>
> TBH, I would rather see the approach to be common. For that, we would an
> agreement from Andrew and Jan in the approach here. Meanwhile, I think
> this is the best approach to address the concern from Arm users.
Just FTR: If x86 was to also change, VCPUOP_register_vcpu_time_memory_area
would need taking care of as well, as it's using the same underlying model
(including recovery logic when, while the guest is in user mode, the
update has failed).
>> @@ -275,36 +276,156 @@ static void ctxt_switch_to(struct vcpu *n)
>> virt_timer_restore(n);
>> }
>>
>> -/* Update per-VCPU guest runstate shared memory area (if registered). */
>> -static void update_runstate_area(struct vcpu *v)
>> +static void cleanup_runstate_vcpu_locked(struct vcpu *v)
>> {
>> - void __user *guest_handle = NULL;
>> + if ( v->arch.runstate_guest )
>> + {
>> + vunmap((void *)((unsigned long)v->arch.runstate_guest & PAGE_MASK));
>> +
>> + put_page(v->arch.runstate_guest_page[0]);
>> +
>> + if ( v->arch.runstate_guest_page[1] )
>> + put_page(v->arch.runstate_guest_page[1]);
>> +
>> + v->arch.runstate_guest = NULL;
>> + }
>> +}
>> +
>> +void arch_vcpu_cleanup_runstate(struct vcpu *v)
>> +{
>> + spin_lock(&v->arch.runstate_guest_lock);
>> +
>> + cleanup_runstate_vcpu_locked(v);
>> +
>> + spin_unlock(&v->arch.runstate_guest_lock);
>> +}
>> +
>> +static int setup_runstate_vcpu_locked(struct vcpu *v, vaddr_t vaddr)
>> +{
>> + unsigned int offset;
>> + mfn_t mfn[2];
>> + struct page_info *page;
>> + unsigned int numpages;
>> struct vcpu_runstate_info runstate;
>> + void *p;
>>
>> - if ( guest_handle_is_null(runstate_guest(v)) )
>> - return;
>> + /* user can pass a NULL address to unregister a previous area */
>> + if ( vaddr == 0 )
>> + return 0;
>> +
>> + offset = vaddr & ~PAGE_MASK;
>> +
>> + /* provided address must be aligned to a 64bit */
>> + if ( offset % alignof(struct vcpu_runstate_info) )
>
> This new restriction wants to be explained in the commit message and
> public header.
And the expression would imo also better use alignof(runstate).
Jan
next prev parent reply other threads:[~2020-07-31 12:20 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-30 10:24 [PATCH v3] xen/arm: Convert runstate address during hypcall Bertrand Marquis
2020-07-30 20:50 ` Julien Grall
2020-07-31 1:18 ` Stefano Stabellini
2020-07-31 13:17 ` Bertrand Marquis
2020-07-31 12:19 ` Jan Beulich [this message]
2020-07-31 13:09 ` Bertrand Marquis
2020-07-31 15:06 ` Julien Grall
2020-07-31 13:16 ` Bertrand Marquis
2020-08-13 17:28 ` Julien Grall
2020-08-14 9:11 ` Bertrand Marquis
2020-07-31 13:26 ` Bertrand Marquis
2020-07-31 23:03 ` Stefano Stabellini
2020-08-14 9:12 ` Bertrand Marquis
2020-08-13 17:35 ` Julien Grall
2020-08-14 9:11 ` Bertrand Marquis
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d8eb8052-6370-7484-1c9a-f90d83396fa1@suse.com \
--to=jbeulich@suse.com \
--cc=Volodymyr_Babchuk@epam.com \
--cc=andrew.cooper3@citrix.com \
--cc=bertrand.marquis@arm.com \
--cc=george.dunlap@citrix.com \
--cc=ian.jackson@eu.citrix.com \
--cc=julien@xen.org \
--cc=nd@arm.com \
--cc=roger.pau@citrix.com \
--cc=sstabellini@kernel.org \
--cc=wl@xen.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).