QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Liu Yi L <yi.l.liu@intel.com>
Cc: jean-philippe@linaro.org, kevin.tian@intel.com,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	Yi Sun <yi.y.sun@linux.intel.com>,
	kvm@vger.kernel.org, mst@redhat.com, jun.j.tian@intel.com,
	qemu-devel@nongnu.org, eric.auger@redhat.com,
	alex.williamson@redhat.com, pbonzini@redhat.com,
	hao.wu@intel.com, yi.y.sun@intel.com,
	Richard Henderson <rth@twiddle.net>,
	david@gibson.dropbear.id.au
Subject: Re: [PATCH v1 15/22] intel_iommu: replay guest pasid bindings to host
Date: Tue, 24 Mar 2020 14:00:13 -0400
Message-ID: <20200324180013.GZ127076@xz-x1> (raw)
In-Reply-To: <1584880579-12178-16-git-send-email-yi.l.liu@intel.com>

On Sun, Mar 22, 2020 at 05:36:12AM -0700, Liu Yi L wrote:
> This patch adds guest pasid bindings replay for domain
> selective pasid cache invalidation(dsi) and global pasid
> cache invalidation by walking guest pasid table.
> 
> Reason:
> Guest OS may flush the pasid cache with a larger granularity.
> e.g. guest does a svm_bind() but flush the pasid cache with
> global or domain selective pasid cache invalidation instead
> of pasid selective(psi) pasid cache invalidation. Regards to
> such case, it works in host. Per spec, a global or domain
> selective pasid cache invalidation should be able to cover
> what a pasid selective invalidation does. The only concern
> is performance deduction since dsi and global cache invalidation
> will flush more than psi. To align with native, vIOMMU needs
> emulator needs to do replay for the two invalidation granularity
> to reflect the latest pasid bindings in guest pasid table.

This is actually related to my question in the other patch on whether
the replay can and should also work for the PSI case too.  I'm still
confused on why the guest cannot use a PSI for a newly created PASID
entry for one device?

> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  hw/i386/intel_iommu.c          | 128 ++++++++++++++++++++++++++++++++++++++++-
>  hw/i386/intel_iommu_internal.h |   1 +
>  2 files changed, 127 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 0423c83..8ec638f 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2717,6 +2717,130 @@ static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s,
>      return vtd_pasid_as;
>  }
>  
> +/**
> + * Constant information used during pasid table walk
> +   @vtd_bus, @devfn: device info
> + * @flags: indicates if it is domain selective walk
> + * @did: domain ID of the pasid table walk
> + */
> +typedef struct {
> +    VTDBus *vtd_bus;
> +    uint16_t devfn;
> +#define VTD_PASID_TABLE_DID_SEL_WALK   (1ULL << 0);
> +    uint32_t flags;
> +    uint16_t did;
> +} vtd_pasid_table_walk_info;

So this is going to be similar to VTDPASIDCacheInfo as I mentioned.
Maybe you can use a shared object for both?

> +
> +/**
> + * Caller of this function should hold iommu_lock.
> + */
> +static bool vtd_sm_pasid_table_walk_one(IntelIOMMUState *s,
> +                                        dma_addr_t pt_base,
> +                                        int start,
> +                                        int end,
> +                                        vtd_pasid_table_walk_info *info)
> +{
> +    VTDPASIDEntry pe;
> +    int pasid = start;
> +    int pasid_next;
> +    VTDPASIDAddressSpace *vtd_pasid_as;
> +    VTDPASIDCacheEntry *pc_entry;
> +
> +    while (pasid < end) {
> +        pasid_next = pasid + 1;
> +
> +        if (!vtd_get_pe_in_pasid_leaf_table(s, pasid, pt_base, &pe)
> +            && vtd_pe_present(&pe)) {
> +            vtd_pasid_as = vtd_add_find_pasid_as(s,
> +                                       info->vtd_bus, info->devfn, pasid);

For this chunk:

> +            pc_entry = &vtd_pasid_as->pasid_cache_entry;
> +            if (s->pasid_cache_gen == pc_entry->pasid_cache_gen) {
> +                vtd_update_pe_in_cache(s, vtd_pasid_as, &pe);
> +            } else {
> +                vtd_fill_in_pe_in_cache(s, vtd_pasid_as, &pe);
> +            }

We already got &pe, then would it be easier to simply call:

               vtd_update_pe_in_cache(s, vtd_pasid_as, &pe);

?

Since IIUC the cache_gen is only helpful to avoid looking up the &pe.
And the vtd_pasid_entry_compare() check should be even more solid than
the cache_gen.

> +        }
> +        pasid = pasid_next;
> +    }
> +    return true;
> +}
> +
> +/*
> + * Currently, VT-d scalable mode pasid table is a two level table,
> + * this function aims to loop a range of PASIDs in a given pasid
> + * table to identify the pasid config in guest.
> + * Caller of this function should hold iommu_lock.
> + */
> +static void vtd_sm_pasid_table_walk(IntelIOMMUState *s,
> +                                    dma_addr_t pdt_base,
> +                                    int start,
> +                                    int end,
> +                                    vtd_pasid_table_walk_info *info)
> +{
> +    VTDPASIDDirEntry pdire;
> +    int pasid = start;
> +    int pasid_next;
> +    dma_addr_t pt_base;
> +
> +    while (pasid < end) {
> +        pasid_next = pasid + VTD_PASID_TBL_ENTRY_NUM;
> +        if (!vtd_get_pdire_from_pdir_table(pdt_base, pasid, &pdire)
> +            && vtd_pdire_present(&pdire)) {
> +            pt_base = pdire.val & VTD_PASID_TABLE_BASE_ADDR_MASK;
> +            if (!vtd_sm_pasid_table_walk_one(s,
> +                              pt_base, pasid, pasid_next, info)) {

vtd_sm_pasid_table_walk_one() never returns false.  Remove this check?
Maybe also let vtd_sm_pasid_table_walk_one() to return nothing.

> +                break;
> +            }
> +        }
> +        pasid = pasid_next;
> +    }
> +}
> +
> +/**
> + * This function replay the guest pasid bindings to hots by
> + * walking the guest PASID table. This ensures host will have
> + * latest guest pasid bindings.
> + */
> +static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s,
> +                                            uint16_t *did,
> +                                            bool is_dsi)
> +{
> +    VTDContextEntry ce;
> +    VTDHostIOMMUContext *vtd_dev_icx;
> +    int bus_n, devfn;
> +    vtd_pasid_table_walk_info info;
> +
> +    if (is_dsi) {
> +        info.flags = VTD_PASID_TABLE_DID_SEL_WALK;
> +        info.did = *did;
> +    }
> +
> +    /*
> +     * In this replay, only needs to care about the devices which
> +     * are backed by host IOMMU. For such devices, their vtd_dev_icx
> +     * instances are in the s->vtd_dev_icx_list. For devices which
> +     * are not backed byhost IOMMU, it is not necessary to replay
> +     * the bindings since their cache could be re-created in the future
> +     * DMA address transaltion.
> +     */
> +    vtd_iommu_lock(s);
> +    QLIST_FOREACH(vtd_dev_icx, &s->vtd_dev_icx_list, next) {
> +        bus_n = pci_bus_num(vtd_dev_icx->vtd_bus->bus);
> +        devfn = vtd_dev_icx->devfn;
> +
> +        if (!vtd_dev_to_context_entry(s, bus_n, devfn, &ce)) {
> +            info.vtd_bus = vtd_dev_icx->vtd_bus;
> +            info.devfn = devfn;
> +            vtd_sm_pasid_table_walk(s,
> +                                    VTD_CE_GET_PASID_DIR_TABLE(&ce),
> +                                    0,
> +                                    VTD_MAX_HPASID,
> +                                    &info);
> +        }
> +    }
> +    vtd_iommu_unlock(s);
> +}
> +
>  static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
>  {
>      VTDPASIDCacheInfo pc_info;
> @@ -2735,7 +2859,6 @@ static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
>      vtd_iommu_unlock(s);
>  
>      /*
> -     * TODO:
>       * Domain selective PASID cache invalidation flushes
>       * all the pasid caches within a domain. To be safe,
>       * after invalidating the pasid caches, emulator needs
> @@ -2743,6 +2866,7 @@ static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id)
>       * dir and pasid table. e.g. When the guest setup a new
>       * PASID entry then send a PASID DSI.
>       */
> +    vtd_replay_guest_pasid_bindings(s, &domain_id, true);
>      return 0;
>  }
>  
> @@ -2881,13 +3005,13 @@ static int vtd_pasid_cache_gsi(IntelIOMMUState *s)
>      vtd_iommu_unlock(s);
>  
>      /*
> -     * TODO:
>       * Global PASID cache invalidation flushes all
>       * the pasid caches. To be safe, after invalidating
>       * the pasid caches, emulator needs to replay the
>       * pasid bindings by walking guest pasid dir and
>       * pasid table.
>       */
> +    vtd_replay_guest_pasid_bindings(s, NULL, false);
>      return 0;
>  }
>  
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 4451acf..b0a324c 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -554,6 +554,7 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo;
>  #define VTD_PASID_TABLE_BITS_MASK     (0x3fULL)
>  #define VTD_PASID_TABLE_INDEX(pasid)  ((pasid) & VTD_PASID_TABLE_BITS_MASK)
>  #define VTD_PASID_ENTRY_FPD           (1ULL << 1) /* Fault Processing Disable */
> +#define VTD_PASID_TBL_ENTRY_NUM       (1ULL << 6)
>  
>  /* PASID Granular Translation Type Mask */
>  #define VTD_PASID_ENTRY_P              1ULL
> -- 
> 2.7.4
> 

-- 
Peter Xu



  reply index

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-22 12:35 [PATCH v1 00/22] intel_iommu: expose Shared Virtual Addressing to VMs Liu Yi L
2020-03-22 12:35 ` [PATCH v1 01/22] scripts/update-linux-headers: Import iommu.h Liu Yi L
2020-03-22 12:35 ` [PATCH v1 02/22] header file update VFIO/IOMMU vSVA APIs Liu Yi L
2020-03-29 16:32   ` Auger Eric
2020-03-30  7:06     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 03/22] vfio: check VFIO_TYPE1_NESTING_IOMMU support Liu Yi L
2020-03-22 12:36 ` [PATCH v1 04/22] hw/iommu: introduce HostIOMMUContext Liu Yi L
2020-03-23 20:58   ` Peter Xu
2020-03-24 10:00     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 05/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps Liu Yi L
2020-03-22 12:36 ` [PATCH v1 06/22] hw/pci: introduce pci_device_set/unset_iommu_context() Liu Yi L
2020-03-23 21:15   ` Peter Xu
2020-03-24 10:02     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 07/22] intel_iommu: add set/unset_iommu_context callback Liu Yi L
2020-03-23 21:29   ` Peter Xu
2020-03-24 11:15     ` Liu, Yi L
2020-03-24 15:24       ` Peter Xu
2020-03-25  9:37         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 08/22] vfio: init HostIOMMUContext per-container Liu Yi L
     [not found]   ` <20200323213943.GR127076@xz-x1>
2020-03-24 13:03     ` Liu, Yi L
2020-03-24 14:45       ` Peter Xu
2020-03-25  9:30         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 09/22] vfio/common: check PASID alloc/free availability Liu Yi L
2020-03-23 22:06   ` Peter Xu
2020-03-24 11:18     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 10/22] intel_iommu: add virtual command capability support Liu Yi L
2020-03-22 12:36 ` [PATCH v1 11/22] intel_iommu: process PASID cache invalidation Liu Yi L
2020-03-22 12:36 ` [PATCH v1 12/22] intel_iommu: add PASID cache management infrastructure Liu Yi L
2020-03-24 17:32   ` Peter Xu
2020-03-25 12:20     ` Liu, Yi L
2020-03-25 14:52       ` Peter Xu
2020-03-26  6:15         ` Liu, Yi L
2020-03-26 13:57           ` Liu, Yi L
2020-03-26 15:53             ` Peter Xu
2020-03-27  1:33               ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 13/22] vfio: add bind stage-1 page table support Liu Yi L
2020-03-24 17:41   ` Peter Xu
2020-03-25  9:49     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 14/22] intel_iommu: bind/unbind guest page table to host Liu Yi L
2020-03-24 17:46   ` Peter Xu
2020-03-25 12:42     ` Liu, Yi L
2020-03-25 14:56       ` Peter Xu
2020-03-26  3:04         ` Liu, Yi L
2020-03-25 12:47     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 15/22] intel_iommu: replay guest pasid bindings " Liu Yi L
2020-03-24 18:00   ` Peter Xu [this message]
2020-03-25 13:14     ` Liu, Yi L
2020-03-25 15:06       ` Peter Xu
2020-03-26  3:17         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 16/22] intel_iommu: replay pasid binds after context cache invalidation Liu Yi L
2020-03-24 18:07   ` Peter Xu
2020-03-25 13:18     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 17/22] intel_iommu: do not pass down pasid bind for PASID #0 Liu Yi L
2020-03-24 18:13   ` Peter Xu
2020-03-25 10:42     ` Liu, Yi L
2020-03-25 15:12       ` Peter Xu
2020-03-26  2:42         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 18/22] vfio: add support for flush iommu stage-1 cache Liu Yi L
2020-03-24 18:19   ` Peter Xu
2020-03-25 10:40     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 19/22] intel_iommu: process PASID-based iotlb invalidation Liu Yi L
2020-03-24 18:26   ` Peter Xu
2020-03-25 13:36     ` Liu, Yi L
2020-03-25 15:15       ` Peter Xu
2020-03-29 11:17         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 20/22] intel_iommu: propagate PASID-based iotlb invalidation to host Liu Yi L
2020-03-24 18:34   ` Peter Xu
2020-03-25 13:21     ` Liu, Yi L
2020-03-26  5:41       ` Liu, Yi L
2020-03-26 13:02         ` Peter Xu
2020-03-26 13:22           ` Peter Xu
2020-03-26 13:33             ` Liu, Yi L
2020-03-26 13:23           ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 21/22] intel_iommu: process PASID-based Device-TLB invalidation Liu Yi L
2020-03-24 18:36   ` Peter Xu
2020-03-25  9:19     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 22/22] intel_iommu: modify x-scalable-mode to be string option Liu Yi L
2020-03-24 18:39   ` Peter Xu
2020-03-25 13:22     ` Liu, Yi L
2020-03-22 13:25 ` [PATCH v1 00/22] intel_iommu: expose Shared Virtual Addressing to VMs no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200324180013.GZ127076@xz-x1 \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=eric.auger@redhat.com \
    --cc=hao.wu@intel.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=jean-philippe@linaro.org \
    --cc=jun.j.tian@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=yi.l.liu@intel.com \
    --cc=yi.y.sun@intel.com \
    --cc=yi.y.sun@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git