From: "Yu, Zhang" <yu.c.zhang@linux.intel.com>
To: Malcolm Crossley <malcolm.crossley@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Jan Beulich <JBeulich@suse.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Paul Durrant <Paul.Durrant@citrix.com>,
	Kevin Tian <kevin.tian@intel.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	David Vrabel <david.vrabel@citrix.com>
Subject: Re: [RFC] Xen PV IOMMU interface draft B
Date: Wed, 17 Jun 2015 20:48:44 +0800	[thread overview]
Message-ID: <55816CAC.7090104@linux.intel.com> (raw)
In-Reply-To: <557B0C35.4080907@citrix.com>

Hi Malcolm,

   Thank you very much for accommodating our XenGT requirements in your
design. Following are some XenGT-related questions. :)

On 6/13/2015 12:43 AM, Malcolm Crossley wrote:
> Hi All,
>
> Here is a design for allowing guests to control the IOMMU. This
> allows for the guest GFN mapping to be programmed into the IOMMU and
> avoid using the SWIOTLB bounce buffer technique in the Linux kernel
> (except for legacy 32 bit DMA IO devices).
>
> Draft B has been expanded to include Bus Address mapping/lookup for Mediated
> pass-through emulators.
>
> The pandoc markdown format of the document is provided below to allow
> for easier inline comments:
>
> % Xen PV IOMMU interface
> % Malcolm Crossley <<malcolm.crossley@citrix.com>>
>    Paul Durrant <<paul.durrant@citrix.com>>
> % Draft B
>
> Introduction
> ============
>
> Revision History
> ----------------
>
> --------------------------------------------------------------------
> Version  Date         Changes
> -------  -----------  ----------------------------------------------
> Draft A  10 Apr 2014  Initial draft.
>
> Draft B  12 Jun 2015  Second draft.
> --------------------------------------------------------------------
>
> Background
> ==========
>
> Linux kernel SWIOTLB
> --------------------
>
> Xen PV guests use a Pseudophysical Frame Number(PFN) address space which is
> decoupled from the host Machine Frame Number(MFN) address space.
>
> PV guest hardware drivers are only aware of the PFN address space and
> assume that if PFN addresses are contiguous then the hardware addresses will
> be contiguous as well. The decoupling between the PFN and MFN address spaces
> means PFN and MFN addresses may not be contiguous across page boundaries,
> and thus a buffer allocated in the PFN address space which spans a page
> boundary may not be contiguous in the MFN address space.
>
> PV hardware drivers cannot tolerate this behaviour and so a special
> "bounce buffer" region is used to hide this issue from the drivers.
>
> A bounce buffer region is a special part of the PFN address space which has
> been made contiguous in both the PFN and MFN address spaces. When a driver
> requests that a buffer spanning a page boundary be made available for
> hardware to read, the core operating system code copies the buffer into a
> temporarily reserved part of the bounce buffer region and then returns the
> MFN address of that reserved part back to the driver. The driver then
> instructs the hardware to read the copy of the buffer in the bounce buffer.
> Similarly, if the driver requests that a buffer be made available for
> hardware to write to, a region of the bounce buffer is first reserved and,
> after the hardware completes writing, the reserved region of the bounce
> buffer is copied to the originally allocated buffer.
>
> The overhead of memory copies to/from the bounce buffer region is high
> and damages performance. Furthermore, there is a risk the fixed-size
> bounce buffer region will become exhausted and it will not be possible to
> return a hardware address back to the driver. The Linux kernel drivers do
> not tolerate this failure and so the kernel is forced to crash, as an
> uncorrectable error has occurred.
>
> Input/Output Memory Management Units (IOMMU) allow for an inbound address
> mapping to be created from the I/O Bus address space (typically PCI) to
> the machine frame number address space. IOMMUs typically use a page table
> mechanism to manage the mappings and therefore can create mappings of page size
> granularity or larger.
>
> The I/O Bus address space will be referred to as the Bus Frame Number (BFN)
> address space for the rest of this document.
>
>
> Mediated Pass-through Emulators
> -------------------------------
>
> Mediated Pass-through emulators allow guest domains to interact with
> hardware devices via emulator mediation. The emulator runs in a domain
> separate from the guest domain and is used to enforce the security of guest
> access to the hardware devices and the isolation of different guests
> accessing the same hardware device.
>
> The emulator requires a mechanism to map guest addresses to a bus address
> that the hardware devices can access.
>
>
> Clarification of GFN and BFN fields for different guest types
> -------------------------------------------------------------
> Guest Frame Numbers (GFN) definition varies depending on the guest type.
>
> The diagram below details the memory accesses originating from the CPU, per guest type:
>
>        HVM guest                              PV guest
>
>           (VA)                                   (VA)
>            |                                      |
>           MMU                                    MMU
>            |                                      |
>           (GFN)                                   |
>            |                                      | (GFN)
>       HAP a.k.a EPT/NPT                           |
>            |                                      |
>           (MFN)                                  (MFN)
>            |                                      |
>           RAM                                    RAM
>
> For PV guests GFN is equal to MFN for a single page but not for a contiguous
> range of pages.
>
> Bus Frame Numbers (BFN) refer to the address presented on the physical bus
> before being translated by the IOMMU.
>
> The diagram below details memory accesses originating from a physical device.
>
>      Physical Device
>            |
>          (BFN)
>            |
> 	   IOMMU-PT
>            |
>          (MFN)
>            |
>           RAM
>
>
>
> Purpose
> =======
>
> 1. Allow Xen guests to create/modify/destroy IOMMU mappings for
> hardware devices that the PV guest has access to. This enables the PV guest
> to program a bus address space mapping which matches its GFN mapping. Once a
> 1-1 mapping of PFN to bus address space is created, a bounce buffer
> region is not required for the IO devices connected to the IOMMU.
>
> 2. Allow for Xen guests to lookup/create/modify/destroy IOMMU mappings for
> guest memory of domains the calling Xen guest has sufficient privilege over.
> This enables domains to provide mediated hardware acceleration to other
> guest domains.
>
>
> Xen Architecture
> ================
>
> The Xen architecture consists of a new hypercall interface and changes to the
> grant map interface.
>
> The existing IOMMU mappings setup at domain creation time will be preserved so
> that PV domains unaware of this feature will continue to function with no
> changes required.
>
> Memory ballooning will be supported by taking an additional reference on the
> MFN backing the GFN for each successful IOMMU mapping created.
>
> An M2B tracking structure will be used to ensure all references to an MFN
> can be located easily.
>
> Xen PV IOMMU hypercall interface
> --------------------------------
> A two argument hypercall interface (do_iommu_op).
>
> ret_t do_iommu_op(XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int count)
>
> The first argument is a guest handle to an array of `struct pv_iommu_op`.
> The second argument is the unsigned integer count of `struct pv_iommu_op`
> elements in the array.
>
> Definition of struct pv_iommu_op:
>
>      struct pv_iommu_op {
>
>          uint16_t subop_id;
>          uint16_t flags;
>          int32_t status;
>
>          union {
>              struct {
>                  uint64_t bfn;
>                  uint64_t gfn;
>              } map_page;
>
>              struct {
>                  uint64_t bfn;
>              } unmap_page;
>
>              struct {
>                  uint64_t bfn;
>                  uint64_t gfn;
>                  uint16_t domid;
>                  ioservid_t ioserver;
>              } map_foreign_page;
>
>              struct {
>                  uint64_t bfn;
>                  uint64_t gfn;
>                  uint16_t domid;
>                  ioservid_t ioserver;
>              } lookup_foreign_page;
>
>              struct {
>                  uint64_t bfn;
>                  ioservid_t ioserver;
>              } unmap_foreign_page;
>          } u;
>      };
>
> Definition of PV IOMMU subops:
>
>      #define IOMMUOP_query_caps            1
>      #define IOMMUOP_map_page              2
>      #define IOMMUOP_unmap_page            3
>      #define IOMMUOP_map_foreign_page      4
>      #define IOMMUOP_lookup_foreign_page   5
>      #define IOMMUOP_unmap_foreign_page    6
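As an illustration of how a caller might populate such a batch for one
`do_iommu_op` call (the struct layout is copied from the draft above; the
helper name and the unmap-then-map pattern are hypothetical, not part of the
interface):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the interface definitions from the draft above.
 * ioservid_t is assumed to be 16 bits wide, as in the Xen ioreq headers. */
typedef uint16_t ioservid_t;

#define IOMMUOP_map_page   2
#define IOMMUOP_unmap_page 3

struct pv_iommu_op {
    uint16_t subop_id;
    uint16_t flags;
    int32_t status;
    union {
        struct { uint64_t bfn; uint64_t gfn; } map_page;
        struct { uint64_t bfn; } unmap_page;
    } u;
};

/* Fill two elements of a batch: remap 'bfn' to point at 'gfn'.
 * Because the subop_id is per element, the unmap and the map can be
 * submitted in a single hypercall and flushed together. */
static void batch_remap(struct pv_iommu_op *ops, uint64_t bfn, uint64_t gfn)
{
    memset(ops, 0, 2 * sizeof(*ops));
    ops[0].subop_id = IOMMUOP_unmap_page;
    ops[0].u.unmap_page.bfn = bfn;
    ops[1].subop_id = IOMMUOP_map_page;
    ops[1].u.map_page.bfn = bfn;
    ops[1].u.map_page.gfn = gfn;
}
```

The array would then be handed to `do_iommu_op(ops, 2)`, letting Xen batch
the IOMMU TLB flush as described in the design considerations below.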
>
>
> Design considerations for hypercall op
> -------------------------------------------
> IOMMU map/unmap operations can be slow and can involve flushing the IOMMU TLB
> to ensure the IO device uses the updated mappings.
>
> The op has been designed to take an array of operations and a count as
> parameters. This allows for easily implemented hypercall continuations to be
> used and allows for batches of IOMMU operations to be submitted before flushing
> the IOMMU TLB.
>
> The subop_id to be used for a particular element is encoded into the element
> itself. This allows map and unmap operations to be performed in one
> hypercall and the IOMMU TLB flushing optimisations still to be applied.
>
> The hypercall will ensure that the required IOMMU TLB flushes are applied before
> returning to guest via either hypercall completion or a hypercall continuation.
>
> IOMMUOP_query_caps
> ------------------
>
> This subop queries the runtime capabilities of the PV-IOMMU interface for
> the calling domain. This subop uses `struct pv_iommu_op` directly.
>
> ------------------------------------------------------------------------------
> Field          Purpose
> -----          ---------------------------------------------------------------
> `flags`        [out] This field details the IOMMUOP capabilities.
>
> `status`       [out] Status of this op, op specific values listed below
> ------------------------------------------------------------------------------
>
> Defined bits for flags field:
>
> ------------------------------------------------------------------------------
> Name                        Bit                Definition
> ----                       ------     ----------------------------------
> IOMMU_QUERY_map_cap          0        IOMMUOP_map_page or IOMMUOP_map_foreign_page
>                                        can be used for this domain
>
> IOMMU_QUERY_map_all_gfns     1        IOMMUOP_map_page subop can map any MFN
>                                        not used by Xen
>
> Reserved for future use     2-9                   n/a
>
> IOMMU_page_order           10-15      Returns maximum possible page order for
>                                        all other IOMMUOP subops
> ------------------------------------------------------------------------------
>
> Defined values for query_caps subop status field:
>
> Value   Reason
> ------  ----------------------------------------------------------
> 0       subop successfully returned
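Since the capability bits and the page order share the 16-bit flags field, a
caller has to mask them apart. A minimal decoding sketch (macro and helper
names are mine, chosen to match the table above):

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout from the query_caps flags table above. */
#define IOMMU_QUERY_map_cap      (1u << 0)
#define IOMMU_QUERY_map_all_gfns (1u << 1)

/* IOMMU_page_order occupies bits 10-15 of the flags field. */
static unsigned int query_page_order(uint16_t flags)
{
    return (flags >> 10) & 0x3f;
}
```

For example, a flags value of `(9 << 10) | 1` would report map capability
with a maximum page order of 9.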
>
> IOMMUOP_map_page
> ----------------------
> This subop uses `struct map_page` part of the `struct pv_iommu_op`.
>
> If IOMMU dom0-strict mode is NOT enabled then the hardware domain will be
> allowed to map all GFNs except for Xen-owned MFNs; otherwise the hardware
> domain will only be allowed to map GFNs which it owns.
>
> If IOMMU dom0-strict mode is NOT enabled then the hardware domain will also
> be allowed to map all GFNs without taking a reference to the MFN backing the
> GFN, by setting the IOMMU_MAP_OP_no_ref_cnt flag.
>
> Every successful pv_iommu_op will result in an additional page reference being
> taken on the MFN backing the GFN except for the condition detailed above.
>
> If the map_op flags indicate a writeable mapping is required then a writeable
> page type reference will be taken otherwise a standard page reference will be
> taken.
>
> All the following conditions are required to be true for PV IOMMU map
> subop to succeed:
>
> 1. IOMMU detected and supported by Xen
> 2. The domain has IOMMU controlled hardware allocated to it
> 3. If the domain is the hardware domain, the following Xen IOMMU options
>     are NOT enabled: dom0-passthrough
>
> This subop's usage of the `struct pv_iommu_op` and `struct map_page` fields
> is detailed below:
>
> ------------------------------------------------------------------------------
> Field          Purpose
> -----          ---------------------------------------------------------------
> `bfn`          [in]  Bus address frame number(BFN) to be mapped to specified gfn
>                       below
>
> `gfn`          [in]  Guest address frame number for DOMID_SELF
>
> `flags`        [in]  Flags for signalling the type of IOMMU mapping to be
>                       created. Flags can be combined.
>
> `status`       [out] Mapping status of this op, op specific values listed below
> ------------------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                        Bit                Definition
> ----                       -----      ----------------------------------
> IOMMU_OP_readable            0        Create readable IOMMU mapping
> IOMMU_OP_writeable           1        Create writeable IOMMU mapping
> IOMMU_MAP_OP_no_ref_cnt      2        IOMMU mapping does not take a reference to
>                                        MFN backing BFN mapping
> Reserved for future use     3-9                   n/a
> IOMMU_page_order            10-15     Page order to be used for both gfn and bfn
>
> Defined values for map_page subop status field:
>
> Value   Reason
> ------  ----------------------------------------------------------------------
> 0       subop successfully returned
> -EIO    IOMMU unit returned error when attempting to map BFN to GFN.
> -EPERM  GFN could not be mapped because the GFN belongs to Xen.
> -EPERM  Domain is not a hardware domain and GFN does not belong to domain
> -EPERM  Domain is a hardware domain, IOMMU dom0-strict mode is enabled and
>          GFN does not belong to domain
> -EACCES BFN address conflicts with RMRR regions for devices attached to
>          DOMID_SELF
> -ENOSPC Page order is too large for either BFN, GFN or IOMMU unit
>
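Because each element of the batch carries its own status, a caller should scan
the array after the hypercall returns rather than rely on a single return
value. A sketch of that scan (payload union omitted for brevity; the helper is
hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Header fields of struct pv_iommu_op from the draft; the payload union
 * is omitted since only per-element status is inspected here. */
struct pv_iommu_op {
    uint16_t subop_id;
    uint16_t flags;
    int32_t status;
};

/* Return the index of the first element whose subop failed, or 'count'
 * if every element reported status 0 (success). */
static unsigned int first_failed_op(const struct pv_iommu_op *ops,
                                    unsigned int count)
{
    for (unsigned int i = 0; i < count; i++)
        if (ops[i].status != 0)
            return i;
    return count;
}
```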
> IOMMUOP_unmap_page
> ------------------
> This subop uses `struct unmap_page` part of the `struct pv_iommu_op`.
>
> This subop's usage of the `struct pv_iommu_op` and `struct unmap_page` fields
> is detailed below:
>
> --------------------------------------------------------------------
> Field          Purpose
> -----          -----------------------------------------------------
> `bfn`          [in] Bus address frame number to be unmapped in DOMID_SELF
>
> `flags`        [in] Flags for signalling page order of unmap operation
>
> `status`       [out] Mapping status of this unmap operation, 0 indicates success
> --------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                        Bit                Definition
> ----                       -----      ----------------------------------
> Reserved for future use     0-9                   n/a
> IOMMU_page_order            10-15     Page order to be used for bfn
>
>
> Defined values for unmap_page subop status field:
>
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -EIO         IOMMU unit returned error when attempting to unmap BFN.
> -ENOSPC      Page order is too large for either BFN address or IOMMU unit
> ------------------------------------------------------------------------
>
>
> IOMMUOP_map_foreign_page
> ------------------------
> This subop uses `struct map_foreign_page` part of the `struct pv_iommu_op`.
>
> It is not valid to use a domid representing the calling domain.
>
> The hypercall will only succeed if the calling domain has sufficient
> privilege over the specified domid.
>
> If there is no IOMMU support then the MFN is returned in the BFN field (that is
> the only valid bus address for the GFN + domid combination).
>
> If there is IOMMU support then the specified BFN is returned for the GFN +
> domid combination.
>
> The M2B mechanism maps an MFN to a (BFN, domid, ioserver) tuple.
>
> Each successful subop will add to the M2B if there was not an existing identical
> M2B entry.
>
> Every new M2B entry will take a reference to the MFN backing the GFN.
>
> All the following conditions are required to be true for PV IOMMU map_foreign
> subop to succeed:
>
> 1. IOMMU detected and supported by Xen
> 2. The domain has IOMMU controlled hardware allocated to it
> 3. The domain is a hardware_domain and the following Xen IOMMU options are
>     NOT enabled: dom0-passthrough
What if the IOMMU is enabled and runs in the default mode, which 1:1
maps all memory except that owned by Xen?
>
>
> This subop's usage of the `struct pv_iommu_op` and `struct map_foreign_page`
> fields is detailed below:
>
> --------------------------------------------------------------------
> Field          Purpose
> -----          -----------------------------------------------------
> `domid`        [in] The domain ID for which the gfn field applies
>
> `ioserver`     [in] IOREQ server id associated with mapping
>
> `bfn`          [in] Bus address frame number for gfn address
>
> `gfn`          [in] Guest address frame number
>
> `flags`        [in] Details the status of the BFN mapping
>
> `status`       [out] status of this subop, 0 indicates success
> --------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                         Bit                Definition
> ----                        -----      ----------------------------------
> IOMMUOP_readable              0        BFN IOMMU mapping is readable
> IOMMUOP_writeable             1        BFN IOMMU mapping is writeable
> IOMMUOP_swap_mfn              2        BFN IOMMU mapping can be safely
>                                         swapped to scratch page
> Reserved for future use      3-9       Reserved flag bits should be 0
> IOMMU_page_order            10-15      Page order to be used for both gfn
>                                         and bfn
>
> Defined values for map_foreign_page subop status field:
>
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -EIO         IOMMU unit returned error when attempting to map BFN to GFN.
> -EPERM       Calling domain does not have sufficient privilege over domid
> -EPERM       GFN could not be mapped because the GFN belongs to Xen.
> -EPERM       domid maps to DOMID_SELF
> -EACCES      BFN address conflicts with RMRR regions for devices attached to
>               DOMID_SELF
> -ENODEV      Provided ioserver id is not valid
> -ENXIO       Provided domid id is not valid
> -ENXIO       Provided GFN address is not valid
> -ENOSPC      Page order is too large for either BFN, GFN or IOMMU unit
>
> IOMMUOP_lookup_foreign_page
> ---------------------------
> This subop uses `struct lookup_foreign_page` part of the `struct pv_iommu_op`.
>
> If the BFN is specified as an input parameter and there is no IOMMU support
> for the calling domain then an error will be returned.
>
> It is the calling domain's responsibility to ensure there are no conflicts.
>
> The hypercall will only succeed if the calling domain has sufficient
> privilege over the specified domid.
>
> If there is no IOMMU support then the MFN is returned in the BFN field (that is
> the only valid bus address for the GFN + domid combination).
Similarly, what if the IOMMU is enabled and runs in the default mode,
which 1:1 maps all memory except that owned by Xen? Will an MFN be
returned? Or should we use the query/map ops instead of the lookup op in
this situation?
>
> Each successful subop will add to the M2B if there was not an existing identical
> M2B entry.
>
> Every new M2B entry will take a reference to the MFN backing the GFN.
>
> This subop's usage of the `struct pv_iommu_op` and `struct lookup_foreign_page`
> fields is detailed below:
>
> --------------------------------------------------------------------
> Field          Purpose
> -----          -----------------------------------------------------
> `domid`        [in] The domain ID for which the gfn field applies
>
> `ioserver`     [in] IOREQ server id associated with mapping
>
> `bfn`          [out] Bus address frame number for gfn address
>
> `gfn`          [in] Guest address frame number
>
> `flags`        [out] Details the status of the BFN mapping
>
> `status`       [out] status of this subop, 0 indicates success
> --------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                         Bit                Definition
> ----                        -----      ----------------------------------
> IOMMUOP_readable              0        Returned BFN IOMMU mapping is readable
> IOMMUOP_writeable             1        Returned BFN IOMMU mapping is writeable
> Reserved for future use      2-9       Reserved flag bits should be 0
> IOMMU_page_order            10-15      Returns the page order of the
>                                         returned BFN mapping
>
> Defined values for lookup_foreign_page subop status field:
>
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -EPERM       Calling domain does not have sufficient privilege over domid
> -ENOENT      There is no available BFN for provided GFN + domid combination
> -ENODEV      Provided ioserver id is not valid
> -ENXIO       Provided domid id is not valid
> -ENXIO       Provided GFN address is not valid
>
>
> IOMMUOP_unmap_foreign_page
> --------------------------
> This subop uses `struct unmap_foreign_page` part of the `struct pv_iommu_op`.
>
> Each successful subop will remove the matching M2B entry and drop the
> reference which that entry held on the MFN backing the GFN.
>
> This subop's usage of the `struct pv_iommu_op` and `struct unmap_foreign_page`
> fields is detailed below:
>
> -----------------------------------------------------------------------
> Field          Purpose
> -----          --------------------------------------------------------
> `ioserver`     [in] IOREQ server id associated with mapping
>
> `bfn`          [in] Bus address frame number for gfn address
>
> `flags`        [out] Flags for signalling page order of unmap operation
>
> `status`       [out] status of this subop, 0 indicates success
> -----------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                        Bit                Definition
> ----                        -----      ----------------------------------
> Reserved for future use     0-9                   n/a
> IOMMU_page_order            10-15     Page order to be used for bfn
>
> Defined values for unmap_foreign_page subop status field:
>
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -ENOENT      There is no mapped BFN + ioserver id combination to unmap
>
>
> IOMMUOP_*_foreign_page interactions with guest domain ballooning
> ================================================================
>
> Guest domains can balloon out a set of GFN mappings at any time and render the
> BFN to GFN mapping invalid.
>
> When a BFN to GFN mapping becomes invalid, Xen will issue a buffered IO request
> of type IOREQ_TYPE_INVALIDATE to the affected IOREQ servers with the now invalid
> BFN address in the data field. If the buffered IO request ring is full then a
> standard (synchronous) IO request of type IOREQ_TYPE_INVALIDATE will be
> issued to the affected IOREQ server with the just-invalidated BFN address in
> the data field.
>
> The BFN mappings cannot simply be unmapped at the point of the balloon
> hypercall, otherwise a malicious guest could specifically balloon out a GFN
> address in use by an emulator and trigger IOMMU faults for the domains with
> BFN mappings.
>
> For hosts with no IOMMU support: The affected emulator(s) must specifically
> issue an IOMMUOP_unmap_foreign_page subop for the now invalid BFN address so
> that the references to the underlying MFN are removed and the MFN can be
> freed back to the Xen memory allocator.
I do not quite understand this. With no IOMMU support, these BFNs are
supplied by the hypervisor. So why not let the hypervisor do this unmap
and notify the calling domain?
>
> For hosts with IOMMU support:
> If the BFN was mapped without the IOMMUOP_swap_mfn flag set in the
> IOMMUOP_map_foreign_page subop then the affected emulator(s) must
> specifically issue an IOMMUOP_unmap_foreign_page subop for the now invalid
> BFN address so that the references to the underlying MFN are removed.
>
> If the BFN was mapped with the IOMMUOP_swap_mfn flag set in the
> IOMMUOP_map_foreign_page subop for all emulators with mappings of that GFN then
> the BFN mapping will be swapped to point at a scratch MFN page and all BFN
> references to the invalid MFN will be removed by Xen after the BFN mapping has
> been updated to point at the scratch MFN page.
>
> The rationale for swapping the BFN mapping to point at scratch pages is to
> enable guest domains to balloon quickly without requiring hypercall(s) from
> emulators.
>
> Not all BFN mappings can be swapped without potentially causing problems for the
> hardware itself (command rings etc.) so the IOMMUOP_swap_mfn flag is used to
> allow per BFN control of Xen ballooning behaviour.
>
>
> PV IOMMU interactions with self ballooning
> ==========================================
>
> The guest should clear any IOMMU mappings it has of its own pages before
> releasing a page back to Xen. It will need to re-add IOMMU mappings after
> repopulating a page with the populate_physmap hypercall.
>
> This requires that IOMMU mappings get a writeable page type reference count and
> that guests clear any IOMMU mappings before pinning page table pages.
>
>
> Security Implications of allowing domain IOMMU control
> ===============================================================
>
> Xen currently allows IO devices attached to the hardware domain to have
> direct access to all of the MFN address space (except Xen hypervisor memory
> regions), provided the Xen IOMMU option dom0-strict is not enabled.
>
> The PV IOMMU feature provides the same level of access to MFN address space
> and the feature is not enabled when the Xen IOMMU option dom0-strict is
> enabled. Therefore security is not degraded by the PV IOMMU feature.
>
> Domains with physical device(s) assigned which are not hardware domains are only
> allowed to map their own GFNs or GFNs for domain(s) they have privilege over.
>
>
> PV IOMMU interactions with grant map/unmap operations
> =====================================================
>
> Grant map operations return a Physical device accessible address (BFN) if the
> GNTMAP_device_map flag is set.  This operation currently returns the MFN for PV
> guests which may conflict with the BFN address space the guest uses if PV IOMMU
> map support is available to the guest.
>
> This design proposes to allow the calling domain to control the BFN address that
> a grant map operation uses.
>
> This can be achieved by specifying that the dev_bus_addr field in the
> gnttab_map_grant_ref structure is used as an input parameter instead of the
> output parameter it currently is.
>
> Only PAGE_SIZE-aligned addresses are allowed for the dev_bus_addr input parameter.
>
> The revised structure is shown below for convenience.
>
>      struct gnttab_map_grant_ref {
>          /* IN parameters. */
>          uint64_t host_addr;
>          uint32_t flags;               /* GNTMAP_* */
>          grant_ref_t ref;
>          domid_t  dom;
>          /* OUT parameters. */
>          int16_t  status;              /* => enum grant_status */
>          grant_handle_t handle;
>          /* IN/OUT parameters */
>          uint64_t dev_bus_addr;
>      };
>
>
> The grant map operation would then behave similarly to the IOMMUOP_map_page
> subop for the creation of the IOMMU mapping.
>
> The grant unmap operation would then behave similarly to the IOMMUOP_unmap_page
> subop for the removal of the IOMMU mapping.
>
> A new grant map flag would be used to indicate the domain is requesting
> that the dev_bus_addr field be used as an input parameter.
>
>
>      #define _GNTMAP_request_bfn_map      (6)
>      #define GNTMAP_request_bfn_map   (1<<_GNTMAP_request_bfn_map)
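A sketch of how a caller would prepare such a grant map request under this
proposal (the structure and GNTMAP_request_bfn_map are from the draft; the
helper name, the GNTMAP_device_map value, and the 4K alignment check are my
assumptions based on the existing grant-table ABI):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t grant_ref_t;
typedef uint16_t domid_t;
typedef uint32_t grant_handle_t;

#define _GNTMAP_device_map      0   /* assumed, as in the existing grant ABI */
#define GNTMAP_device_map       (1 << _GNTMAP_device_map)
#define _GNTMAP_request_bfn_map 6
#define GNTMAP_request_bfn_map  (1 << _GNTMAP_request_bfn_map)

struct gnttab_map_grant_ref {
    uint64_t host_addr;
    uint32_t flags;               /* GNTMAP_* */
    grant_ref_t ref;
    domid_t  dom;
    int16_t  status;
    grant_handle_t handle;
    uint64_t dev_bus_addr;        /* IN/OUT under this proposal */
};

/* Request a device mapping at a caller-chosen, page-aligned bus address.
 * Returns -1 if the address is not PAGE_SIZE (4K assumed) aligned. */
static int prep_bfn_grant_map(struct gnttab_map_grant_ref *op,
                              grant_ref_t ref, domid_t dom, uint64_t bus_addr)
{
    if (bus_addr & 0xfff)
        return -1;
    op->ref = ref;
    op->dom = dom;
    op->flags = GNTMAP_device_map | GNTMAP_request_bfn_map;
    op->dev_bus_addr = bus_addr;  /* input, not output, with the new flag */
    return 0;
}
```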
>
>
>
> Linux kernel architecture
> =========================
>
> The Linux kernel will use the PV-IOMMU hypercalls to map its PFN address
> space into the IOMMU. It will map the PFNs to the IOMMU address space using
> a 1:1 mapping; it does this by programming a BFN to GFN mapping which
> matches the PFN to GFN mapping.
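The 1:1 policy described above can be sketched as a batch builder that sets
BFN equal to PFN for each page (struct and flag names from the draft; the
helper itself is illustrative, not proposed kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Definitions copied from the draft interface above. */
#define IOMMUOP_map_page   2
#define IOMMU_OP_readable  (1 << 0)
#define IOMMU_OP_writeable (1 << 1)

struct pv_iommu_op {
    uint16_t subop_id;
    uint16_t flags;
    int32_t status;
    union {
        struct { uint64_t bfn; uint64_t gfn; } map_page;
    } u;
};

/* Build a batch mapping each PFN in [start_pfn, start_pfn + count) to an
 * identical BFN, so bus addresses match the kernel's PFN view. For PV
 * guests PFN and GFN coincide per page, hence gfn = pfn here. */
static void build_identity_batch(struct pv_iommu_op *ops,
                                 uint64_t start_pfn, unsigned int count)
{
    for (unsigned int i = 0; i < count; i++) {
        ops[i].subop_id = IOMMUOP_map_page;
        ops[i].flags = IOMMU_OP_readable | IOMMU_OP_writeable;
        ops[i].status = 0;
        ops[i].u.map_page.bfn = start_pfn + i;  /* BFN chosen equal to PFN */
        ops[i].u.map_page.gfn = start_pfn + i;
    }
}
```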
>
> The native SWIOTLB will be used to handle devices which cannot DMA to all
> of the kernel's PFN address space.
>
> An interface shall be provided for emulator usage of the
> IOMMUOP_*_foreign_page subops which will allow the Linux kernel to centrally
> manage that domain's BFN resource and ensure there are no unexpected
> conflicts.
>
>
> Emulator usage of PV IOMMU interface
> ====================================
>
> Emulators which require bus address mappings of guest RAM must first
> determine whether it is possible for the domain to control the bus
> addresses itself.
>
> An IOMMUOP_query_caps subop will return the IOMMU_QUERY_map_cap flag. If
> this flag is set then the emulator may specify the BFN address it wishes
> guest RAM to be mapped to via the IOMMUOP_map_foreign_page subop. If the
> flag is not set then the emulator must use BFN addresses supplied by Xen
> via the IOMMUOP_lookup_foreign_page subop.
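The decision described in that paragraph reduces to a single flag test after
query_caps; a sketch (the enum and helper are mine, purely to make the flow
explicit):

```c
#include <assert.h>
#include <stdint.h>

/* Capability bit from the query_caps flags table earlier in the draft. */
#define IOMMU_QUERY_map_cap (1u << 0)

/* Which subop family an emulator should use for guest RAM bus addresses. */
enum bfn_strategy {
    BFN_CHOOSE_OURSELVES,   /* map_cap set: pick BFNs, use map_foreign_page */
    BFN_USE_XEN_SUPPLIED    /* map_cap clear: use lookup_foreign_page BFNs */
};

static enum bfn_strategy pick_strategy(uint16_t query_flags)
{
    return (query_flags & IOMMU_QUERY_map_cap) ? BFN_CHOOSE_OURSELVES
                                               : BFN_USE_XEN_SUPPLIED;
}
```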
>
> Operating systems which use the IOMMUOP_map_page subop are expected to
> provide a common interface for emulators.

According to our previous internal discussions, my understanding of the
usage is this:
1> PV IOMMU has an interface in dom0's kernel to do the query/map/lookup
all at once, which also includes the BFN allocation algorithm.
2> When the XenGT emulator tries to construct a shadow PTE, we can just
call your interface, which returns a BFN.

However, the above description seems to say the XenGT device model needs
to do the query/lookup/map by itself?
Besides, could you please give more detailed information about this
'common interface'? :)

Thanks
Yu
>
> Emulators should unmap unused GFN mappings as often as possible using
> IOMMUOP_unmap_foreign_page subops so that guest domains can balloon pages
> quickly and efficiently.
>
> Emulators should conform to the ballooning behaviour described section
> "IOMMUOP_*_foreign_page interactions with guest domain ballooning" so that guest
> domains are able to effectively balloon out and in memory.
>
> Emulators must unmap any active BFN mappings when they shutdown.
>
>
