From: Bertrand Marquis <Bertrand.Marquis@arm.com>
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Julien Grall" <julien@xen.org>, "Wei Liu" <wl@xen.org>,
"Andrew Cooper" <andrew.cooper3@citrix.com>,
"Ian Jackson" <ian.jackson@eu.citrix.com>,
"George Dunlap" <george.dunlap@citrix.com>,
"Jan Beulich" <jbeulich@suse.com>,
Xen-devel <xen-devel@lists.xenproject.org>, nd <nd@arm.com>,
"Volodymyr Babchuk" <Volodymyr_Babchuk@epam.com>,
"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v3] xen/arm: Convert runstate address during hypcall
Date: Fri, 31 Jul 2020 13:17:09 +0000
Message-ID: <91E1094A-C03D-4DD7-AC4B-0A01330A043F@arm.com>
In-Reply-To: <alpine.DEB.2.21.2007301422030.1767@sstabellini-ThinkPad-T480s>
> On 31 Jul 2020, at 03:18, Stefano Stabellini <sstabellini@kernel.org> wrote:
>
> On Thu, 30 Jul 2020, Julien Grall wrote:
>> Hi Bertrand,
>>
>> To avoid extra work on your side, I would recommend to wait a bit before
>> sending a new version. It would be good to at least settle the conversation in
>> v2 regarding the approach taken.
>>
>> On 30/07/2020 11:24, Bertrand Marquis wrote:
>>> At the moment on Arm, a Linux guest running with KPTI enabled will
>>> cause the following error when a context switch happens in user mode:
>>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>>>
>>> The error is caused by the virtual address for the runstate area
>>> registered by the guest only being accessible when the guest is running
>>> in kernel space when KPTI is enabled.
>>>
>>> To solve this issue, this patch is doing the translation from virtual
>>> address to physical address during the hypercall and mapping the
>>> required pages using vmap. This is removing the conversion from virtual
>>> to physical address during the context switch which is solving the
>>> problem with KPTI.
>>
>> To echo what Jan said on the previous version, this is a change in a stable
>> ABI and therefore may break existing guests. FAOD, I agree in principle with
>> the idea. However, we want to explain why breaking the ABI is the *only*
>> viable solution.
>>
>> From my understanding, it is not possible to fix without an ABI breakage
>> because the hypervisor doesn't know when the guest will switch back from
>> userspace to kernel space. The risk is the information provided by the
>> runstate wouldn't contain accurate information and could affect how the guest
>> handles stolen time.
>>
>> Additionally there are a few issues with the current interface:
>> 1) It is assuming the virtual address cannot be re-used by the userspace.
>> Thankfully Linux has a split address space. But this may change with KPTI in
>> place.
>> 2) When updating the page-tables, the guest has to go through an invalid
>> mapping. So the translation may fail at any point.
>>
>> IOW, the existing interface can lead to random memory corruption and
>> inaccuracy of the stolen time.
>>
>>>
>>> This is done only on arm architecture, the behaviour on x86 is not
>>> modified by this patch and the address conversion is done as before
>>> during each context switch.
>>>
>>> This is introducing several limitations in comparison to the previous
>>> behaviour (on arm only):
>>> - if the guest is remapping the area at a different physical address Xen
>>> will continue to update the area at the previous physical address. As
>>> the area is in kernel space and usually defined as a global variable this
>>> is something which is believed not to happen. If this is required by a
>>> guest, it will have to call the hypercall with the new area (even if it
>>> is at the same virtual address).
>>> - the area needs to be mapped during the hypercall. For the same reasons
>>> as for the previous case, even if the area is registered for a different
>>> vcpu. It is believed that registering an area using a virtual address
>>> unmapped is not something done.
>>
>> It is not clear whether the virtual address refers to the current vCPU or the
>> vCPU you register the runstate for. From the past discussion, I think you
>> refer to the former. It would be good to clarify.
>>
>> Additionally, all the new restrictions should be documented in the public
>> interface. So an OS developer can find the differences between the
>> architectures.
>
> Just to paraphrase what Julien wrote, it would be good to improve the
> commit message with the points suggested and also write a note in the
> header file about the changes to the interface.
OK, I will do that.
>
>
>> To answer Jan's concern, we certainly don't know all the guest OSes existing,
>> however we also need to balance the benefit for a large majority of the users.
>>
>> From previous discussion, the current approach was deemed to be acceptable on
>> Arm and, AFAICT, also x86 (see [1]).
>>
>> TBH, I would rather see the approach to be common. For that, we would need an
>> agreement from Andrew and Jan in the approach here. Meanwhile, I think this is
>> the best approach to address the concern from Arm users.
>
> +1
>
>
>>> inline functions in headers could not be used as the architecture
>>> domain.h is included before the global domain.h making it impossible
>>> to use the struct vcpu inside the architecture header.
>>> This should not have any performance impact as the hypercall is only
>>> called once per vcpu usually.
>>>
>>> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
>>>
>>> ---
>>> Changes in v2
>>> - use vmap to map the pages during the hypercall.
>>> - reintroduce initial copy during hypercall.
>>>
>>> Changes in v3
>>> - Fix Coding style
>>> - Fix vaddr printing on arm32
>>> - use write_atomic to modify state_entry_time update bit (only
>>> in guest structure as the bit is not used inside Xen copy)
>>>
>>> ---
>>> xen/arch/arm/domain.c | 161 ++++++++++++++++++++++++++++++-----
>>> xen/arch/x86/domain.c | 29 ++++++-
>>> xen/arch/x86/x86_64/domain.c | 4 +-
>>> xen/common/domain.c | 19 ++---
>>> xen/include/asm-arm/domain.h | 9 ++
>>> xen/include/asm-x86/domain.h | 16 ++++
>>> xen/include/xen/domain.h | 5 ++
>>> xen/include/xen/sched.h | 16 +---
>>> 8 files changed, 206 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>>> index 31169326b2..8b36946017 100644
>>> --- a/xen/arch/arm/domain.c
>>> +++ b/xen/arch/arm/domain.c
>>> @@ -19,6 +19,7 @@
>>> #include <xen/sched.h>
>>> #include <xen/softirq.h>
>>> #include <xen/wait.h>
>>> +#include <xen/vmap.h>
>>> #include <asm/alternative.h>
>>> #include <asm/cpuerrata.h>
>>> @@ -275,36 +276,156 @@ static void ctxt_switch_to(struct vcpu *n)
>>> virt_timer_restore(n);
>>> }
>>> -/* Update per-VCPU guest runstate shared memory area (if registered). */
>>> -static void update_runstate_area(struct vcpu *v)
>>> +static void cleanup_runstate_vcpu_locked(struct vcpu *v)
>>> {
>>> - void __user *guest_handle = NULL;
>>> + if ( v->arch.runstate_guest )
>>> + {
>>> + vunmap((void *)((unsigned long)v->arch.runstate_guest & PAGE_MASK));
>>> +
>>> + put_page(v->arch.runstate_guest_page[0]);
>>> +
>>> + if ( v->arch.runstate_guest_page[1] )
>>> + put_page(v->arch.runstate_guest_page[1]);
>>> +
>>> + v->arch.runstate_guest = NULL;
>>> + }
>>> +}
>>> +
>>> +void arch_vcpu_cleanup_runstate(struct vcpu *v)
>>> +{
>>> + spin_lock(&v->arch.runstate_guest_lock);
>>> +
>>> + cleanup_runstate_vcpu_locked(v);
>>> +
>>> + spin_unlock(&v->arch.runstate_guest_lock);
>>> +}
>>> +
>>> +static int setup_runstate_vcpu_locked(struct vcpu *v, vaddr_t vaddr)
>>> +{
>>> + unsigned int offset;
>>> + mfn_t mfn[2];
>>> + struct page_info *page;
>>> + unsigned int numpages;
>>> struct vcpu_runstate_info runstate;
>>> + void *p;
>>> - if ( guest_handle_is_null(runstate_guest(v)) )
>>> - return;
>>> + /* user can pass a NULL address to unregister a previous area */
>>> + if ( vaddr == 0 )
>>> + return 0;
>>> +
>>> + offset = vaddr & ~PAGE_MASK;
>>> +
>>> + /* provided address must be aligned to a 64bit */
>>> + if ( offset % alignof(struct vcpu_runstate_info) )
>>
>> This new restriction wants to be explained in the commit message and public
>> header.
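
To make the new restriction concrete, here is a minimal standalone sketch of
the two checks this hunk performs on the offset within the page: the alignment
check and the test for an area straddling a page boundary. It is not Xen code:
the 4 KiB page size, the sketch_ names and the stand-in structure layout are
assumptions for illustration only.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration only: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct vcpu_runstate_info. */
struct sketch_runstate_info {
    int32_t  state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/*
 * Returns -1 if the address is not suitably aligned, otherwise the
 * number of pages (1 or 2) that would have to be mapped for the area.
 */
static int sketch_runstate_pages(uint64_t vaddr)
{
    unsigned long offset = vaddr & (SKETCH_PAGE_SIZE - 1);

    /* The area must respect the alignment of the structure. */
    if ( offset % alignof(struct sketch_runstate_info) )
        return -1;

    /* The area crosses into the next page if it does not fit in this one. */
    if ( offset > SKETCH_PAGE_SIZE - sizeof(struct sketch_runstate_info) )
        return 2;

    return 1;
}
```

An aligned area can still straddle two pages, which is why the patch keeps two
page references and maps up to two pages with vmap.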
>>
>>> + {
>>> + gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%"PRIvaddr
>>> + ": Invalid offset\n", vaddr);
>>
>> We usually enforce 80 character per lines except for format string. So it is
>> easier to grep them in the code.
>>
>>> + return -EINVAL;
>>> + }
>>> +
>>> + page = get_page_from_gva(v, vaddr, GV2M_WRITE);
>>> + if ( !page )
>>> + {
>>> + gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%"PRIvaddr
>>> + ": Page is not mapped\n", vaddr);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + mfn[0] = page_to_mfn(page);
>>> + v->arch.runstate_guest_page[0] = page;
>>> +
>>> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) )
>>> + {
>>> + /* guest area is crossing pages */
>>> + page = get_page_from_gva(v, vaddr + PAGE_SIZE, GV2M_WRITE);
>>> + if ( !page )
>>> + {
>>> + put_page(v->arch.runstate_guest_page[0]);
>>> + gprintk(XENLOG_WARNING,
>>> + "Cannot map runstate pointer at 0x%"PRIvaddr
>>> + ": 2nd Page is not mapped\n", vaddr);
>>> + return -EINVAL;
>>> + }
>>> + mfn[1] = page_to_mfn(page);
>>> + v->arch.runstate_guest_page[1] = page;
>>> + numpages = 2;
>>> + }
>>> + else
>>> + {
>>> + v->arch.runstate_guest_page[1] = NULL;
>>> + numpages = 1;
>>> + }
>>> - memcpy(&runstate, &v->runstate, sizeof(runstate));
>>> + p = vmap(mfn, numpages);
>>> + if ( !p )
>>> + {
>>> + put_page(v->arch.runstate_guest_page[0]);
>>> + if ( numpages == 2 )
>>> + put_page(v->arch.runstate_guest_page[1]);
>>> - if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>> + gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%"PRIvaddr
>>> + ": vmap error\n", vaddr);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + v->arch.runstate_guest = p + offset;
>>> +
>>> + if (v == current)
>>> + memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate));
>>> + else
>>> {
>>> - guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>>> - guest_handle--;
>>> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>>> - __raw_copy_to_guest(guest_handle,
>>> - (void *)(&runstate.state_entry_time + 1) - 1, 1);
>>> - smp_wmb();
>>> + vcpu_runstate_get(v, &runstate);
>>> + memcpy(v->arch.runstate_guest, &runstate, sizeof(v->runstate));
>>> }
>>> - __copy_to_guest(runstate_guest(v), &runstate, 1);
>>> + return 0;
>>> +}
>>> +
>>> +int arch_vcpu_setup_runstate(struct vcpu *v,
>>> + struct vcpu_register_runstate_memory_area area)
>>> +{
>>> + int rc;
>>> +
>>> + spin_lock(&v->arch.runstate_guest_lock);
>>> +
>>> + /* cleanup if we are recalled */
>>> + cleanup_runstate_vcpu_locked(v);
>>> +
>>> + rc = setup_runstate_vcpu_locked(v, (vaddr_t)area.addr.v);
>>> +
>>> + spin_unlock(&v->arch.runstate_guest_lock);
>>> - if ( guest_handle )
>>> + return rc;
>>> +}
>>> +
>>> +
>>> +/* Update per-VCPU guest runstate shared memory area (if registered). */
>>> +static void update_runstate_area(struct vcpu *v)
>>> +{
>>> + spin_lock(&v->arch.runstate_guest_lock);
>>> +
>>> + if ( v->arch.runstate_guest )
>>> {
>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>>> - smp_wmb();
>>> - __raw_copy_to_guest(guest_handle,
>>> - (void *)(&runstate.state_entry_time + 1) - 1, 1);
>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>> + {
>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>>> + write_atomic(&(v->arch.runstate_guest->state_entry_time),
>>> + v->runstate.state_entry_time);
>>
>> NIT: You want to indent v-> at the same level as the argument from the first
>> line.
>>
>> Also, I think you are missing a smp_wmb() here.
>
> I just wanted to add that I reviewed the patch and aside from the
> smp_wmb (and the couple of code style NITs), there is no other issue in
> the patch that I could find. No further comments from my side.
>
>
>>> + }
>>> +
>>> + memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate));
>>> +
>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>> + {
>>> + /* copy must be done before switching the bit */
>>> + smp_wmb();
>>> + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>>> + write_atomic(&(v->arch.runstate_guest->state_entry_time),
>>> + v->runstate.state_entry_time);
>>
>> Same remark for the indentation.
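
For reference, the ordering the update flag relies on can be sketched in
standalone C11. This is only an approximation under stated assumptions: C11
atomics and fences stand in for write_atomic() and smp_wmb(), the sketch_
names and the reduced structure are hypothetical, and the flag is assumed to
be bit 63 of state_entry_time. The reader side shows what a guest would be
expected to do: reject the copy if the flag is set or if state_entry_time
changed while copying.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the "update in progress" flag is bit 63 of state_entry_time. */
#define SKETCH_RUNSTATE_UPDATE (UINT64_C(1) << 63)

/* Hypothetical, reduced stand-in for the shared runstate area. */
struct sketch_area {
    _Atomic uint64_t state_entry_time;
    _Atomic uint64_t time[4];
};

/* Writer side: set the flag, publish the payload, then clear the flag. */
static void sketch_publish(struct sketch_area *guest, uint64_t entry_time,
                           const uint64_t time[4])
{
    unsigned int i;

    atomic_store_explicit(&guest->state_entry_time,
                          entry_time | SKETCH_RUNSTATE_UPDATE,
                          memory_order_relaxed);
    /* The flag must be visible before the payload starts to change. */
    atomic_thread_fence(memory_order_release);

    for ( i = 0; i < 4; i++ )
        atomic_store_explicit(&guest->time[i], time[i], memory_order_relaxed);

    /* The payload must be visible before the flag is cleared. */
    atomic_store_explicit(&guest->state_entry_time,
                          entry_time & ~SKETCH_RUNSTATE_UPDATE,
                          memory_order_release);
}

/* Reader side: one attempt; the caller retries when this returns false. */
static bool sketch_try_read(struct sketch_area *guest, uint64_t *entry_time,
                            uint64_t time[4])
{
    uint64_t before, after;
    unsigned int i;

    before = atomic_load_explicit(&guest->state_entry_time,
                                  memory_order_acquire);

    for ( i = 0; i < 4; i++ )
        time[i] = atomic_load_explicit(&guest->time[i], memory_order_relaxed);

    /* The payload loads must complete before the entry time is re-read. */
    atomic_thread_fence(memory_order_acquire);
    after = atomic_load_explicit(&guest->state_entry_time,
                                 memory_order_relaxed);

    if ( (before & SKETCH_RUNSTATE_UPDATE) || before != after )
        return false;

    *entry_time = before;
    return true;
}
```

Without the first barrier a reader could see a partially updated payload while
the flag still looks clear, which is the case the missing smp_wmb() covers.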