All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	xen-devel@lists.xen.org
Cc: boris.ostrovsky@oracle.com, sherry.hurwitz@amd.com, jbeulich@suse.com
Subject: Re: [PATCH v2 05/10] x86/HVM/SVM: Add AVIC initialization code
Date: Mon, 2 Jan 2017 16:37:35 +0000	[thread overview]
Message-ID: <d38854e9-0e24-a613-2354-913219a9829c@citrix.com> (raw)
In-Reply-To: <1483163161-2402-6-git-send-email-suravee.suthikulpanit@amd.com>

On 31/12/2016 05:45, Suravee Suthikulpanit wrote:
> diff --git a/xen/arch/x86/hvm/svm/avic.c b/xen/arch/x86/hvm/svm/avic.c
> new file mode 100644
> index 0000000..b62f982
> --- /dev/null
> +++ b/xen/arch/x86/hvm/svm/avic.c
> @@ -0,0 +1,232 @@
> +/*
> + * avic.c: implements AMD Advance Virtual Interrupt Controller (AVIC) support
> + * Copyright (c) 2016, Advanced Micro Devices, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/domain_page.h>
> +#include <xen/sched.h>
> +#include <xen/stdbool.h>
> +#include <asm/acpi.h>
> +#include <asm/apicdef.h>
> +#include <asm/event.h>
> +#include <asm/hvm/emulate.h>
> +#include <asm/hvm/nestedhvm.h>
> +#include <asm/hvm/support.h>
> +#include <asm/hvm/svm/avic.h>
> +#include <asm/hvm/vlapic.h>
> +#include <asm/p2m.h>
> +#include <asm/page.h>
> +
> +/*
> + * Note: Current max index allowed for physical APIC ID table is 255.
> + */
> +#define AVIC_PHY_APIC_ID_MAX    0xFF
> +
> +#define AVIC_DOORBELL           0xc001011b
> +
> +#define AVIC_HPA_SHIFT  12
> +#define AVIC_HPA_MASK           (((1ULL << 40) - 1) << AVIC_HPA_SHIFT)

What does this mask represent?  I have spotted a truncation bug below,
but how to fix the issue depends on the meaning of the mask.

The only limits I can see in the spec is that the host physical
addresses must be present in system RAM, which presumably means they
should follow maxphysaddr like everything else?

> +#define AVIC_VAPIC_BAR_MASK     AVIC_HPA_MASK
> +
> +/*
> + * Note:
> + * Currently, svm-avic mode is not supported with nested virtualization.
> + * Therefore, it is not yet currently enabled by default. Once the support
> + * is in-place, this should be enabled by default.
> + */
> +bool svm_avic = 0;
> +boolean_param("svm-avic", svm_avic);

We are trying to avoid the proliferation of top-level options.  Please
could you introduce a "svm=" option which takes avic as a sub-boolean,
similar to how iommu= currently works?

> +
> +static struct avic_phy_apic_id_ent *
> +avic_get_phy_apic_id_ent(const struct vcpu *v, unsigned int index)

The Physical APIC table is per-domain, not per-cpu according to the
spec.  This function should take a struct domain, although I don't think
you need it at all (see below).

> +{
> +    struct avic_phy_apic_id_ent *avic_phy_apic_id_table;
> +    struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
> +
> +    if ( !d->avic_phy_apic_id_table_mfn )
> +        return NULL;
> +
> +    /*
> +    * Note: APIC ID = 0xff is used for broadcast.
> +    *       APIC ID > 0xff is reserved.
> +    */
> +    if ( index >= 0xff )
> +        return NULL;
> +
> +    avic_phy_apic_id_table = mfn_to_virt(d->avic_phy_apic_id_table_mfn);

This is unsafe and will break on hosts with more than 5TB of RAM.

As you allocate avic_phy_apic_id_table_mfn with alloc_domheap_page(),
you must use {map,unmap}_domain_page() to get temporary mappings (which
will use mfn_to_virt() in the common case), or use
{map,unmap}_domain_page_global() to get permanent mappings.

As the values in here need modifying across a schedule, I would suggest
making a global mapping at create time, and storing the result in a
properly-typed pointer.

> +
> +    return &avic_phy_apic_id_table[index];
> +}
> +
> +int svm_avic_dom_init(struct domain *d)
> +{
> +    int ret = 0;
> +    struct page_info *pg;
> +    unsigned long mfn;

mfn_t please, which would also highlight the type error in svm_domain.

> +
> +    if ( !svm_avic )
> +        return 0;

|| !has_vlapic(d)

HVMLite domains may legitimately be configured without an LAPIC at all.

> +
> +    /*
> +     * Note:
> +     * AVIC hardware walks the nested page table to check permissions,
> +     * but does not use the SPA address specified in the leaf page
> +     * table entry since it uses  address in the AVIC_BACKING_PAGE pointer
> +     * field of the VMCB. Therefore, we set up a dummy page for APIC _mfn(0).
> +     */
> +    if ( !d->arch.hvm_domain.svm.avic_access_page_done )
> +    {
> +        set_mmio_p2m_entry(d, paddr_to_pfn(APIC_DEFAULT_PHYS_BASE),
> +                           _mfn(0), PAGE_ORDER_4K,
> +                           p2m_get_hostp2m(d)->default_access);
> +        d->arch.hvm_domain.svm.avic_access_page_done = true;

I don't see the purpose of this avic_access_page_done boolean, even in
later patches.

> +    }
> +
> +    /* Init AVIC logical APIC ID table */
> +    pg = alloc_domheap_page(d, MEMF_no_owner);
> +    if ( !pg )
> +    {
> +        gdprintk(XENLOG_ERR,
> +                "%d: AVIC logical APIC ID table could not be allocated.\n",
> +                d->domain_id);

Do we really need to have a printk() in the -ENOMEM case?  I'd
personally not bother.

> +        ret = -ENOMEM;
> +        goto err_out;
> +    }
> +    mfn = page_to_mfn(pg);
> +    clear_domain_page(_mfn(mfn));
> +    d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn = mfn;
> +
> +    /* Init AVIC physical APIC ID table */
> +    pg = alloc_domheap_page(d, MEMF_no_owner);
> +    if ( !pg )
> +    {
> +        gdprintk(XENLOG_ERR,
> +                "%d: AVIC physical APIC ID table could not be allocated.\n",
> +                d->domain_id);
> +        ret = -ENOMEM;
> +        goto err_out;
> +    }
> +    mfn = page_to_mfn(pg);
> +    clear_domain_page(_mfn(mfn));
> +    d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn = mfn;
> +
> +    return ret;
> + err_out:
> +    svm_avic_dom_destroy(d);
> +    return ret;
> +}
> +
> +void svm_avic_dom_destroy(struct domain *d)
> +{
> +    if ( !svm_avic )
> +        return;
> +
> +    if ( d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn )
> +    {
> +        free_domheap_page(mfn_to_page(d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn));
> +        d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn = 0;
> +    }
> +
> +    if ( d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn )
> +    {
> +        free_domheap_page(mfn_to_page(d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn));
> +        d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn = 0;
> +    }
> +}
> +
> +bool svm_avic_vcpu_enabled(const struct vcpu *v)
> +{
> +    const struct arch_svm_struct *s = &v->arch.hvm_svm;
> +    const struct vmcb_struct *vmcb = s->vmcb;
> +
> +    return svm_avic && vmcb->_vintr.fields.avic_enable;

This predicate should just be on avic_enable.  The svm_avic case should
prevent the avic_enable from being set in the first place.

> +}
> +
> +static inline u32 *
> +avic_get_bk_page_entry(const struct vcpu *v, u32 offset)
> +{
> +    const struct vlapic *vlapic = vcpu_vlapic(v);
> +    char *tmp;
> +
> +    if ( !vlapic || !vlapic->regs_page )
> +        return NULL;
> +
> +    tmp = (char *)page_to_virt(vlapic->regs_page);
> +    return (u32 *)(tmp + offset);

This is also unsafe over 5TB.  vlapic->regs is already a global mapping
which you should use.

Also, you should leave a comment by the allocation of regs_page that
AVIC depends on it being a full page allocation.

> +}
> +
> +void svm_avic_update_vapic_bar(const struct vcpu *v, uint64_t data)
> +{
> +    const struct arch_svm_struct *s = &v->arch.hvm_svm;
> +
> +    s->vmcb->avic_vapic_bar = data & AVIC_VAPIC_BAR_MASK;
> +    s->vmcb->cleanbits.fields.avic = 0;

While I can appreciate what you are trying to do here, altering the
physical address in IA32_APICBASE has never functioned properly under
Xen, as nothing updates the P2M.  Worse, with multi-vcpu guests, the use
of a single shared p2m makes this impossible to do properly.

I know it is non-architectural behaviour, but IMO vlapic_msr_set()
should be updated to raise #GP[0] when attempting to change the frame,
to make it fail obviously rather than having interrupts go missing.

Also I see that the Intel side (via vmx_vlapic_msr_changed()) makes the
same mistake.

> +}
> +
> +int svm_avic_init_vmcb(struct vcpu *v)
> +{
> +    paddr_t ma;
> +    u32 *apic_id_reg;
> +    struct arch_svm_struct *s = &v->arch.hvm_svm;
> +    struct vmcb_struct *vmcb = s->vmcb;
> +    struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
> +    const struct vlapic *vlapic = vcpu_vlapic(v);
> +    struct avic_phy_apic_id_ent entry;
> +
> +    if ( !svm_avic )
> +        return 0;
> +
> +    if ( !vlapic || !vlapic->regs_page )
> +        return -EINVAL;

Somewhere you need a bounds check against the APIC ID being within AVIC
limits.

> +
> +    apic_id_reg = avic_get_bk_page_entry(v, APIC_ID);
> +    if ( !apic_id_reg )
> +        return -EINVAL;

This should be vlapic_read()...

> +
> +    s->avic_last_phy_id = avic_get_phy_apic_id_ent(v, *apic_id_reg >> 24);

And this, GET_xAPIC_ID(),

> +    if ( !s->avic_last_phy_id )
> +        return -EINVAL;

Why does a pointer into the physical ID page need storing in
arch_svm_struct?

> +
> +    vmcb->avic_bk_pg_pa = page_to_maddr(vlapic->regs_page) & AVIC_HPA_MASK;

This use of AVIC_HPA_MASK may truncate the the address, which is
definitely a problem.

If AVIC_HPA_MASK isn't just related to maxphysaddr, then you need to
pass appropriate memflags into the alloc_domheap_page() calls to get a
suitable page.

> +    ma = d->avic_log_apic_id_table_mfn;
> +    vmcb->avic_log_apic_id = (ma << PAGE_SHIFT) & AVIC_HPA_MASK;

This ma << PAGE_SHIFT also highlights the type error.  ma != mfn.

> +    ma = d->avic_phy_apic_id_table_mfn;
> +    vmcb->avic_phy_apic_id = (ma << PAGE_SHIFT) & AVIC_HPA_MASK;
> +    vmcb->avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX;

Surely this should be some calculation derived from d->max_vcpus?

> +
> +    entry = *(s->avic_last_phy_id);
> +    smp_rmb();
> +    entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT;
> +    entry.is_running = 0;
> +    entry.valid = 1;
> +    *(s->avic_last_phy_id) = entry;
> +    smp_wmb();

During domain creation, no guests are running, so you can edit this cleanly.

What values are actually being excluded here?  This, and other patches,
look like you are actually just trying to do a read_atomic(), modify,
write_atomic() update, rather than actually requiring ordering.

> +
> +    svm_avic_update_vapic_bar(v, APIC_DEFAULT_PHYS_BASE);
> +
> +    vmcb->_vintr.fields.avic_enable = 1;
> +
> +    return 0;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> @@ -1799,6 +1802,10 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>  
>      switch ( msr )
>      {
> +    case MSR_IA32_APICBASE:
> +        if ( svm_avic_vcpu_enabled(v) )
> +            svm_avic_update_vapic_bar(v, msr_content);
> +        break;

I think this is dead code.  MSR_IA32_APICBASE is handled in
hvm_msr_write_intercept(), and won't fall into the vendor hook.  If you
were to continue down this route, I would add another pi_ops hook, and
replace the VMX layering violation in vlapic_msr_set().

>      case MSR_IA32_SYSENTER_CS:
>      case MSR_IA32_SYSENTER_ESP:
>      case MSR_IA32_SYSENTER_EIP:
> diff --git a/xen/include/asm-x86/hvm/svm/avic.h b/xen/include/asm-x86/hvm/svm/avic.h
> new file mode 100644
> index 0000000..16b433c
> --- /dev/null
> +++ b/xen/include/asm-x86/hvm/svm/avic.h
> @@ -0,0 +1,40 @@
> +#ifndef _SVM_AVIC_H_
> +#define _SVM_AVIC_H_
> +
> +enum avic_incmp_ipi_err_code {
> +    AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE,
> +    AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN,
> +    AVIC_INCMP_IPI_ERR_INV_TARGET,
> +    AVIC_INCMP_IPI_ERR_INV_BK_PAGE,
> +};
> +
> +struct __attribute__ ((__packed__))

Please include xen/compiler.h and use __packed

> +avic_log_apic_id_ent {
> +    u32 guest_phy_apic_id : 8;
> +    u32 res               : 23;
> +    u32 valid             : 1;
> +};
> +
> +struct __attribute__ ((__packed__))
> +avic_phy_apic_id_ent {
> +    u64 host_phy_apic_id  : 8;
> +    u64 res1              : 4;
> +    u64 bk_pg_ptr         : 40;

This field should have mfn in its name, as it has an implicit shift in
its definition.

> +    u64 res2              : 10;
> +    u64 is_running        : 1;
> +    u64 valid             : 1;
> +};
> <snip>
> diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
> index 43cb98e..d3d045f 100644
> --- a/xen/include/asm-x86/hvm/svm/vmcb.h
> +++ b/xen/include/asm-x86/hvm/svm/vmcb.h
> @@ -496,6 +496,24 @@ struct __packed vmcb_struct {
>  };
>  
>  struct svm_domain {
> +    /*
> +     * This per-domain table is used by the hardware to locate
> +     * the vAPIC backing page to be used to deliver interrupts
> +     * based on the guest physical APIC ID.
> +     */
> +    paddr_t avic_phy_apic_id_table_mfn;

paddrs and mfns differ by PAGE_SHIFT.  Please use mfn_t here.

> +
> +    /*
> +     * This per-domain table is used by the hardware to map
> +     * logically addressed interrupt requests (w/ guest logical APIC id)
> +     * to the guest physical APIC ID.
> +     */
> +    paddr_t avic_log_apic_id_table_mfn;
> +
> +    u32 avic_max_vcpu_id;
> +    bool avic_access_page_done;
> +    spinlock_t avic_ldr_mode_lock;

You introduce the spinlock here, but initialise it in the subsequent
patch.  I suspect that hunk wants moving into this patch?

~Andrew

> +    u32 avic_ldr_mode;
>  };
>  
>  struct arch_svm_struct {


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  reply	other threads:[~2017-01-02 16:37 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-31  5:45 [PATCH v2 00/10] Introduce AMD SVM AVIC Suravee Suthikulpanit
2016-12-31  5:45 ` [PATCH v2 01/10] x86/HVM: Introduce struct hvm_pi_ops Suravee Suthikulpanit
2017-01-05  2:54   ` Tian, Kevin
2017-01-05  7:57     ` Jan Beulich
2017-01-05 15:51   ` Jan Beulich
2017-01-10  6:51     ` Suravee Suthikulpanit
2017-01-10  8:24       ` Jan Beulich
2017-01-10  9:45         ` Suravee Suthikulpanit
2016-12-31  5:45 ` [PATCH v2 02/10] x86/vLAPIC: Declare vlapic_read_aligned() and vlapic_reg_write() as non-static Suravee Suthikulpanit
2017-01-05 15:53   ` Jan Beulich
2017-01-10  6:57     ` Suravee Suthikulpanit
2017-01-10  8:25       ` Jan Beulich
2016-12-31  5:45 ` [PATCH v2 03/10] x86/HVM: Call vlapic_destroy after vcpu_destroy Suravee Suthikulpanit
2017-01-05  2:56   ` Tian, Kevin
2017-01-05 15:56   ` Jan Beulich
2017-01-10  8:18     ` Suravee Suthikulpanit
2016-12-31  5:45 ` [PATCH v2 04/10] x86/SVM: Modify VMCB fields to add AVIC support Suravee Suthikulpanit
2016-12-31  5:45 ` [PATCH v2 05/10] x86/HVM/SVM: Add AVIC initialization code Suravee Suthikulpanit
2017-01-02 16:37   ` Andrew Cooper [this message]
2017-01-04 17:24     ` Suravee Suthikulpanit
2017-01-04 17:59       ` Andrew Cooper
2017-01-10  3:06     ` Suravee Suthikulpanit
2017-01-03 14:54   ` Boris Ostrovsky
2016-12-31  5:45 ` [PATCH v2 06/10] x86/SVM: Add AVIC vmexit handlers Suravee Suthikulpanit
2017-01-02 17:28   ` Andrew Cooper
2017-01-05  4:07     ` Suravee Suthikulpanit
2017-01-03 15:34   ` Boris Ostrovsky
2017-01-05  6:41     ` Suravee Suthikulpanit
2016-12-31  5:45 ` [PATCH v2 07/10] x86/SVM: Add vcpu scheduling support for AVIC Suravee Suthikulpanit
2017-01-02 17:35   ` Andrew Cooper
2017-01-03 15:43   ` Boris Ostrovsky
2016-12-31  5:45 ` [PATCH v2 08/10] x86/SVM: Add interrupt management code via AVIC Suravee Suthikulpanit
2017-01-02 17:45   ` Andrew Cooper
2017-02-28 12:01     ` George Dunlap
2017-01-05 16:01   ` Jan Beulich
2016-12-31  5:46 ` [PATCH v2 09/10] x86/SVM: Hook up miscellaneous AVIC functions Suravee Suthikulpanit
2017-01-02 17:49   ` Andrew Cooper
2017-01-05 16:05   ` Jan Beulich
2017-01-10  8:35     ` Suravee Suthikulpanit
2017-01-10  9:00       ` Jan Beulich
2017-01-10 10:28         ` Suravee Suthikulpanit
2016-12-31  5:46 ` [PATCH v2 10/10] x86/SVM: Add AMD AVIC key handler Suravee Suthikulpanit
2017-01-03 16:01   ` Boris Ostrovsky
2017-01-03 16:04     ` Andrew Cooper
2017-01-05  8:00       ` Suravee Suthikulpanit
2017-01-05 16:07   ` Jan Beulich
2017-01-10 11:14     ` Suravee Suthikulpanit
2017-01-10 12:55       ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d38854e9-0e24-a613-2354-913219a9829c@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=jbeulich@suse.com \
    --cc=sherry.hurwitz@amd.com \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.