QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: "Liu, Yi L" <yi.l.liu@intel.com>
To: Peter Xu <peterx@redhat.com>
Cc: "jean-philippe@linaro.org" <jean-philippe@linaro.org>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	Yi Sun <yi.y.sun@linux.intel.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"mst@redhat.com" <mst@redhat.com>,
	"Tian, Jun J" <jun.j.tian@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Wu, Hao" <hao.wu@intel.com>, "Sun, Yi Y" <yi.y.sun@intel.com>,
	Richard Henderson <rth@twiddle.net>,
	"david@gibson.dropbear.id.au" <david@gibson.dropbear.id.au>
Subject: RE: [PATCH v1 12/22] intel_iommu: add PASID cache management infrastructure
Date: Thu, 26 Mar 2020 13:57:10 +0000
Message-ID: <A2975661238FB949B60364EF0F2C25743A204614@SHSMSX104.ccr.corp.intel.com> (raw)
In-Reply-To: <A2975661238FB949B60364EF0F2C25743A203F2C@SHSMSX104.ccr.corp.intel.com>

> From: Liu, Yi L
> Sent: Thursday, March 26, 2020 2:15 PM
> To: 'Peter Xu' <peterx@redhat.com>
> Subject: RE: [PATCH v1 12/22] intel_iommu: add PASID cache management
> infrastructure
> 
> > From: Peter Xu <peterx@redhat.com>
> > Sent: Wednesday, March 25, 2020 10:52 PM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [PATCH v1 12/22] intel_iommu: add PASID cache management
> > infrastructure
> >
> > On Wed, Mar 25, 2020 at 12:20:21PM +0000, Liu, Yi L wrote:
> > > > From: Peter Xu <peterx@redhat.com>
> > > > Sent: Wednesday, March 25, 2020 1:32 AM
> > > > To: Liu, Yi L <yi.l.liu@intel.com>
> > > > Subject: Re: [PATCH v1 12/22] intel_iommu: add PASID cache
> > > > management infrastructure
> > > >
> > > > On Sun, Mar 22, 2020 at 05:36:09AM -0700, Liu Yi L wrote:
> > > > > This patch adds a PASID cache management infrastructure based on
> > > > > new added structure VTDPASIDAddressSpace, which is used to track
> > > > > the PASID usage and future PASID tagged DMA address translation
> > > > > support in vIOMMU.
[...]
> > > >
> > > > <START>
> > > >
> > > > > +    /*
> > > > > +     * Step 2: loop all the exisitng vtd_dev_icx instances.
> > > > > +     * Ideally, needs to loop all devices to find if there is any new
> > > > > +     * PASID binding regards to the PASID cache invalidation request.
> > > > > +     * But it is enough to loop the devices which are backed by host
> > > > > +     * IOMMU. For devices backed by vIOMMU (a.k.a emulated devices),
> > > > > +     * if new PASID happened on them, their vtd_pasid_as instance could
> > > > > +     * be created during future vIOMMU DMA translation.
> > > > > +     */
> > > > > +    QLIST_FOREACH(vtd_dev_icx, &s->vtd_dev_icx_list, next) {
> > > > > +        VTDPASIDAddressSpace *vtd_pasid_as;
> > > > > +        VTDPASIDCacheEntry *pc_entry;
> > > > > +        VTDPASIDEntry pe;
> > > > > +        VTDBus *vtd_bus = vtd_dev_icx->vtd_bus;
> > > > > +        uint16_t devfn = vtd_dev_icx->devfn;
> > > > > +        int bus_n = pci_bus_num(vtd_bus->bus);
> > > > > +
> > > > > +        /* i) fetch vtd_pasid_as and check if it is valid */
> > > > > +        vtd_pasid_as = vtd_add_find_pasid_as(s, vtd_bus,
> > > > > +                                             devfn, pasid);
> > > >
> > > > I don't feel like it's correct here...
> > > >
> > > > Assuming we have two devices assigned D1, D2.  D1 uses PASID=1, D2
> > > > uses
> > PASID=2.
> > > > When invalidating against PASID=1, are you also going to create a
> > > > VTDPASIDAddressSpace also for D2 with PASID=1?
> > >
> > > Answer is no. Before going forward, let's see if the below flow looks good to you.
> > >
> > > Let me add one more device besides D1 and D2. Say device D3 which
> > > also uses PASID=1. And assume it begins with no PASID usage in guest.
> > >
> > > Then the flow from scratch is:
> > >
> > > a) guest IOMMU driver setup PASID entry for D1 with PASID=1,
> > >    then invalidates against PASID #1
> > > b) trap to QEMU, and comes to this function. Since there is
> > >    no previous pasid cache invalidation, so the Step 1 of this
> > >    function has nothing to do, then goes to Step 2 which is to
> > >    loop all assigned devices and check if the guest pasid entry
> > >    is present. In this loop, should find D1's pasid entry for
> > >    PASID#1 is present. So create the VTDPASIDAddressSpace and
> > >    initialize its pasid_cache_entry and pasid_cache_gen fields.
> > > c) guest IOMMU driver setup PASID entry for D2 with PASID=2,
> > >    then invalidates against PASID #2
> > > d) same with b), only difference is the Step 1 of this function
> > >    will loop the VTDPASIDAddressSpace created in b), but its
> > >    pasid is 1 which is not the target of current pasid cache
> > >    invalidation. Same with b), in Step 2, it will create a
> > >    VTDPASIDAddressSpace for D2+PASID#2
> > > e) guest IOMMU driver setup PASID entry for D3 with PASID=1,
> > >    then invalidates against PASID #1
> > > f) trap to QEMU and comes to this function. Step 1 loops two
> > >    VTDPASIDAddressSpace created in b) and d), and it finds there
> > >    is a VTDPASIDAddressSpace whose pasid is 1. vtd_flush_pasid()
> > >    will check if the cached pasid entry is the same with the one
> > >    in guest memory. In this flow, it should be the same, so
> > >    vtd_flush_pasid() will do nothing for it. Then comes to Step 2,
> > >    it loops D1, D2, D3.
> > >    - For D1, no need to do more since there is already a
> > >      VTDPASIDAddressSpace created for D1+PASID#1.
> > >    - For D2, its guest pasid entry for PASID#1 is not present, so
> > >      no need to do anything for it.
> > >    - For D3, its guest pasid entry for PASID#1 is present and it
> > >      is exactly the reason for this invalidation. So create a
> > >      VTDPASIDAddressSpace for and init the pasid_cache_entry and
> > >      pasid_cache_gen fields.
> > >
> > > > I feel like we shouldn't create VTDPASIDAddressSpace only if it
> > > > existed, say, until when we reach vtd_dev_get_pe_from_pasid() below with
> retcode==0.
> > >
> > > You are right. I think I failed to destroy the VTDAddressSpace when
> > > vtd_dev_get_pe_from_pasid() returns error. Thus the code won't
> > > create a VTDPASIDAddressSpace for D2+PASID#1.
> >
> > OK, but that free() is really not necessary...  Please see below.
> >
> > >
> > > > Besides this...
> > > >
> > > > > +        pc_entry = &vtd_pasid_as->pasid_cache_entry;
> > > > > +        if (s->pasid_cache_gen == pc_entry->pasid_cache_gen) {
> > > > > +            /*
> > > > > +             * pasid_cache_gen equals to s->pasid_cache_gen means
> > > > > +             * vtd_pasid_as is valid after the above s->vtd_pasid_as
> > > > > +             * updates in Step 1. Thus no need for the below steps.
> > > > > +             */
> > > > > +            continue;
> > > > > +        }
> > > > > +
> > > > > +        /*
> > > > > +         * ii) vtd_pasid_as is not valid, it's potentailly a new
> > > > > +         *    pasid bind. Fetch guest pasid entry.
> > > > > +         */
> > > > > +        if (vtd_dev_get_pe_from_pasid(s, bus_n, devfn, pasid,
> > > > > + &pe)) {
> > >
> > > Yi: should destroy pasid_as as there is no valid pasid entry. Thus
> > > to ensure all the pasid_as in hash table are valid.
> > >
> > > > > +            continue;
> > > > > +        }
> > > > > +
> > > > > +        /*
> > > > > +         * iii) pasid entry exists, update pasid cache
> > > > > +         *
> > > > > +         * Here need to check domain ID since guest pasid entry
> > > > > +         * exists. What needs to do are:
> > > > > +         *   - update the pc_entry in the vtd_pasid_as
> > > > > +         *   - set proper pc_entry.pasid_cache_gen
> > > > > +         *   - pass down the latest guest pasid entry config to host
> > > > > +         *     (will be added in later patch)
> > > > > +         */
> > > > > +        if (domain_id == vtd_pe_get_domain_id(&pe)) {
> > > > > +            vtd_fill_in_pe_in_cache(s, vtd_pasid_as, &pe);
> > > > > +        }
> > > > > +    }
> > > >
> > > > <END>
> > > >
> > > > ... I'm a bit confused on the whole range between <START> and
> > > > <END> on how it differs from the vtd_replay_guest_pasid_bindings() you're
> going to introduce.
> > > > Shouldn't the replay code do similar thing?  Can we merge them?
> > >
> > > Yes, there is similar thing which is to create VTDPASIDAddressSpace
> > > per the guest pasid entry presence.
> > >
> > > But there are differences. For one, the code here is to loop all
> > > assigned devices for a specific PASID. While the logic in
> > > vtd_replay_guest_pasid_bindings() is to loop all assigned devices
> > > and the PASID tables behind them. For two, the code here only cares
> > > about the case which guest pasid entry from INVALID->VALID.
> > > The reason is in Step 1 of this function, VALID->INVALID and
> > > VALID->VALID cases are already covered. While the logic in
> > > vtd_replay_guest_pasid_bindings() needs to cover all the three cases.
> > > The last reason I didn't merge them is in
> > > vtd_replay_guest_pasid_bindings() it needs to divide the pasid entry
> > > fetch into two steps and check the result (if fetch pasid directory
> > > entry failed, it could skip a range of PASIDs). While the code in
> > > this function, it doesn't care about it, it only cares if there is a
> > > valid pasid entry for this specific pasid.
> > >
> > > >
> > > > My understanding is that we can just make sure to do it right once
> > > > in the replay code (the three cases: INVALID->VALID,
> > > > VALID->INVALID,
> > > > VALID->VALID), then no matter whether it's DSI/PSI/GSI, we call
> > > > VALID->the
> > > > replay code probably with VTDPASIDCacheInfo* passed in, then the
> > > > replay code
> > will
> > > > know what to look after.
> > >
> > > Hmmm, let me think more to abstract the code between the <START> and
> > > <END> to be a helper function to be called by
> > > vtd_replay_guest_pasid_bindings(). Also, in that case, I need to
> > > apply the two step concept in the replay function.
> >
> > ... I think your vtd_sm_pasid_table_walk() is already suitable for
> > this because it allows to specify "start" and "end" pasid.  Now you're
> > always passing in the (0, VTD_MAX_HPASID) tuple, here you can simply
> > pass in (pasid, pasid+1).  But I think you need to touch up
> > vtd_sm_pasid_table_walk() a bit to make sure it allows the pasid to be
> > not aliged to VTD_PASID_TBL_ENTRY_NUM.
> >
> > Another thing is I think vtd_sm_pasid_table_walk_one() didn't really
> > check vtd_pasid_table_walk_info.did domain information...  When that's
> > fixed, this case is the same as the DSI walk with a specific pasid
> > range.
> 
> got it, let me refactor them (PSI and replay).
> 
> > And again, please also consider to use VTDPASIDCacheInfo to be used
> > directly during the page walk, so vtd_pasid_table_walk_info can be
> > replaced I think, because IIUC VTDPASIDCacheInfo contains all
> > information the table walk will need.
> 
> yes, no need to have the walk_info structure.
I'm not quite get here. Why cache_gen increase only happen in PSI
hook? I think cache_gen used to avoid drop all pasid_as when a pasid
cache reset happened.


Today, I'm trying to replace vtd_pasid_table_walk_info with
VTDPASIDCacheInfo. But I found it may be a little bit strange.
The vtd_pasid_table_walk_info include vtd_bus/devfn/did and a
flag to indicate if did is useful. The final user of the walk
info is vtd_sm_pasid_table_walk_one() which only cares about
the the vtd_bus/devfn/did. But VTDPASIDCacheInfo has an extra
pasid field and also has multiple flag definitions, which are
not necessary for the table work. So it appears to me use
separate structure would be better. Maybe I can show you when
sending out the code.

> > >
> > > > > +
> > > > > +    vtd_iommu_unlock(s);
> > > > >      return 0;
> > > > >  }
> > > > >
> > > > > +/**
> > > > > + * Caller of this function should hold iommu_lock  */ static
> > > > > +void vtd_pasid_cache_reset(IntelIOMMUState *s) {
> > > > > +    VTDPASIDCacheInfo pc_info;
> > > > > +
> > > > > +    trace_vtd_pasid_cache_reset();
> > > > > +
> > > > > +    pc_info.flags = VTD_PASID_CACHE_GLOBAL;
> > > > > +
> > > > > +    /*
> > > > > +     * Reset pasid cache is a big hammer, so use
> > > > > +     * g_hash_table_foreach_remove which will free
> > > > > +     * the vtd_pasid_as instances, indicates the
> > > > > +     * cached pasid_cache_gen would be set to 0.
> > > > > +     */
> > > > > +    g_hash_table_foreach_remove(s->vtd_pasid_as,
> > > > > +                           vtd_flush_pasid, &pc_info);
> > > >
> > > > Would this make sure the per pasid_as pasid_cache_gen will be reset to zero?
> > I'm
> > > > not very sure, say, what if the memory is stall during a reset and
> > > > still have the
> > old
> > > > data?
> > > >
> > > > I'm not sure, but I feel like we should simply drop all pasid_as
> > > > here, rather than using the same code for a global pasid invalidation.
> > >
> > > I see. Maybe I can get another helper function which always returns
> > > true, and replace vtd_flush_pasid with the new function. This should
> > > ensure all pasid_as are dropped. How do you think?
> >
> > g_hash_table_remove_all() might be easier. :)
> 
> right. I'll make it.

Sorry to reclaim my reply here. I think here still needs a function (say
vtd_flush_pasid) to check if needs to notify host do unbind. e.g. If guest
unbind a pasid in guest, and issues a GSI (pasid cache), remove_all()
will drop all pasid_as, this would be a problem. The guest unbind will
not be propagated to host. And even we add a replay after it, it can
only shadow the bindings in guest to host, but cannot figure out an
unbind. But I agree with you that vtd_pasid_cache_reset() should drop
all pasid_as but also needs to notify host properly.

Thanks,
Yi Liu

  reply index

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-22 12:35 [PATCH v1 00/22] intel_iommu: expose Shared Virtual Addressing to VMs Liu Yi L
2020-03-22 12:35 ` [PATCH v1 01/22] scripts/update-linux-headers: Import iommu.h Liu Yi L
2020-03-22 12:35 ` [PATCH v1 02/22] header file update VFIO/IOMMU vSVA APIs Liu Yi L
2020-03-29 16:32   ` Auger Eric
2020-03-30  7:06     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 03/22] vfio: check VFIO_TYPE1_NESTING_IOMMU support Liu Yi L
2020-03-22 12:36 ` [PATCH v1 04/22] hw/iommu: introduce HostIOMMUContext Liu Yi L
2020-03-23 20:58   ` Peter Xu
2020-03-24 10:00     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 05/22] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps Liu Yi L
2020-03-22 12:36 ` [PATCH v1 06/22] hw/pci: introduce pci_device_set/unset_iommu_context() Liu Yi L
2020-03-23 21:15   ` Peter Xu
2020-03-24 10:02     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 07/22] intel_iommu: add set/unset_iommu_context callback Liu Yi L
2020-03-23 21:29   ` Peter Xu
2020-03-24 11:15     ` Liu, Yi L
2020-03-24 15:24       ` Peter Xu
2020-03-25  9:37         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 08/22] vfio: init HostIOMMUContext per-container Liu Yi L
     [not found]   ` <20200323213943.GR127076@xz-x1>
2020-03-24 13:03     ` Liu, Yi L
2020-03-24 14:45       ` Peter Xu
2020-03-25  9:30         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 09/22] vfio/common: check PASID alloc/free availability Liu Yi L
2020-03-23 22:06   ` Peter Xu
2020-03-24 11:18     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 10/22] intel_iommu: add virtual command capability support Liu Yi L
2020-03-22 12:36 ` [PATCH v1 11/22] intel_iommu: process PASID cache invalidation Liu Yi L
2020-03-22 12:36 ` [PATCH v1 12/22] intel_iommu: add PASID cache management infrastructure Liu Yi L
2020-03-24 17:32   ` Peter Xu
2020-03-25 12:20     ` Liu, Yi L
2020-03-25 14:52       ` Peter Xu
2020-03-26  6:15         ` Liu, Yi L
2020-03-26 13:57           ` Liu, Yi L [this message]
2020-03-26 15:53             ` Peter Xu
2020-03-27  1:33               ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 13/22] vfio: add bind stage-1 page table support Liu Yi L
2020-03-24 17:41   ` Peter Xu
2020-03-25  9:49     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 14/22] intel_iommu: bind/unbind guest page table to host Liu Yi L
2020-03-24 17:46   ` Peter Xu
2020-03-25 12:42     ` Liu, Yi L
2020-03-25 14:56       ` Peter Xu
2020-03-26  3:04         ` Liu, Yi L
2020-03-25 12:47     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 15/22] intel_iommu: replay guest pasid bindings " Liu Yi L
2020-03-24 18:00   ` Peter Xu
2020-03-25 13:14     ` Liu, Yi L
2020-03-25 15:06       ` Peter Xu
2020-03-26  3:17         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 16/22] intel_iommu: replay pasid binds after context cache invalidation Liu Yi L
2020-03-24 18:07   ` Peter Xu
2020-03-25 13:18     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 17/22] intel_iommu: do not pass down pasid bind for PASID #0 Liu Yi L
2020-03-24 18:13   ` Peter Xu
2020-03-25 10:42     ` Liu, Yi L
2020-03-25 15:12       ` Peter Xu
2020-03-26  2:42         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 18/22] vfio: add support for flush iommu stage-1 cache Liu Yi L
2020-03-24 18:19   ` Peter Xu
2020-03-25 10:40     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 19/22] intel_iommu: process PASID-based iotlb invalidation Liu Yi L
2020-03-24 18:26   ` Peter Xu
2020-03-25 13:36     ` Liu, Yi L
2020-03-25 15:15       ` Peter Xu
2020-03-29 11:17         ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 20/22] intel_iommu: propagate PASID-based iotlb invalidation to host Liu Yi L
2020-03-24 18:34   ` Peter Xu
2020-03-25 13:21     ` Liu, Yi L
2020-03-26  5:41       ` Liu, Yi L
2020-03-26 13:02         ` Peter Xu
2020-03-26 13:22           ` Peter Xu
2020-03-26 13:33             ` Liu, Yi L
2020-03-26 13:23           ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 21/22] intel_iommu: process PASID-based Device-TLB invalidation Liu Yi L
2020-03-24 18:36   ` Peter Xu
2020-03-25  9:19     ` Liu, Yi L
2020-03-22 12:36 ` [PATCH v1 22/22] intel_iommu: modify x-scalable-mode to be string option Liu Yi L
2020-03-24 18:39   ` Peter Xu
2020-03-25 13:22     ` Liu, Yi L
2020-03-22 13:25 ` [PATCH v1 00/22] intel_iommu: expose Shared Virtual Addressing to VMs no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=A2975661238FB949B60364EF0F2C25743A204614@SHSMSX104.ccr.corp.intel.com \
    --to=yi.l.liu@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=ehabkost@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=hao.wu@intel.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=jean-philippe@linaro.org \
    --cc=jun.j.tian@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=yi.y.sun@intel.com \
    --cc=yi.y.sun@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git