From: "Yu, Zhang" <yu.c.zhang@linux.intel.com>
To: Malcolm Crossley <malcolm.crossley@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Jan Beulich <JBeulich@suse.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Paul Durrant <Paul.Durrant@citrix.com>,
	Kevin Tian <kevin.tian@intel.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	David Vrabel <david.vrabel@citrix.com>
Subject: Re: [RFC] Xen PV IOMMU interface draft B
Date: Wed, 17 Jun 2015 20:48:44 +0800	[thread overview]
Message-ID: <55816CAC.7090104@linux.intel.com> (raw)
In-Reply-To: <557B0C35.4080907@citrix.com>

Hi Malcolm,

   Thank you very much for accommodating our XenGT requirements in your
design. Following are some XenGT-related questions. :)

On 6/13/2015 12:43 AM, Malcolm Crossley wrote:
> Hi All,
>
> Here is a design for allowing guests to control the IOMMU. This
> allows for the guest GFN mapping to be programmed into the IOMMU and
> avoid using the SWIOTLB bounce buffer technique in the Linux kernel
> (except for legacy 32 bit DMA IO devices).
>
> Draft B has been expanded to include Bus Address mapping/lookup for Mediated
> pass-through emulators.
>
> The pandoc markdown format of the document is provided below to allow
> for easier inline comments:
>
> % Xen PV IOMMU interface
> % Malcolm Crossley <<malcolm.crossley@citrix.com>>
>    Paul Durrant <<paul.durrant@citrix.com>>
> % Draft B
>
> Introduction
> ============
>
> Revision History
> ----------------
>
> --------------------------------------------------------------------
> Version  Date         Changes
> -------  -----------  ----------------------------------------------
> Draft A  10 Apr 2014  Initial draft.
>
> Draft B  12 Jun 2015  Second draft.
> --------------------------------------------------------------------
>
> Background
> ==========
>
> Linux kernel SWIOTLB
> --------------------
>
> Xen PV guests use a Pseudophysical Frame Number(PFN) address space which is
> decoupled from the host Machine Frame Number(MFN) address space.
>
> PV guest hardware drivers are only aware of the PFN address space and
> assume that if PFN addresses are contiguous then the hardware addresses will
> be contiguous as well. The decoupling between the PFN and MFN address spaces
> means PFN and MFN addresses may not be contiguous across page boundaries,
> and thus a buffer allocated in the PFN address space which spans a page
> boundary may not be contiguous in the MFN address space.
>
> PV hardware drivers cannot tolerate this behaviour and so a special
> "bounce buffer" region is used to hide this issue from the drivers.
>
> A bounce buffer region is a special part of the PFN address space which has
> been made contiguous in both the PFN and MFN address spaces. When a driver
> requests that a buffer spanning a page boundary be made available for
> hardware to read, the core operating system code copies the buffer into a
> temporarily reserved part of the bounce buffer region and then returns the
> MFN address of that reserved part back to the driver. The driver then
> instructs the hardware to read the copy of the buffer in the bounce buffer.
> Similarly, if the driver requests that a buffer be made available for
> hardware to write to, a region of the bounce buffer is first reserved and,
> after the hardware completes writing, the reserved region of the bounce
> buffer is copied to the originally allocated buffer.
>
> The overhead of memory copies to/from the bounce buffer region is high
> and damages performance. Furthermore, there is a risk the fixed-size
> bounce buffer region will become exhausted and it will not be possible to
> return a hardware address back to the driver. The Linux kernel drivers do
> not tolerate this failure and so the kernel is forced to crash, as an
> uncorrectable error has occurred.
>
> Input/Output Memory Management Units (IOMMU) allow for an inbound address
> mapping to be created from the I/O Bus address space (typically PCI) to
> the machine frame number address space. IOMMUs typically use a page table
> mechanism to manage the mappings and therefore can create mappings of page size
> granularity or larger.
>
> The I/O Bus address space will be referred to as the Bus Frame Number (BFN)
> address space for the rest of this document.
>
>
> Mediated Pass-through Emulators
> -------------------------------
>
> Mediated Pass-through emulators allow guest domains to interact with
> hardware devices via emulator mediation. The emulator runs in a domain
> separate from the guest domain and is used to enforce the security of guest
> access to the hardware devices and the isolation of different guests
> accessing the same hardware device.
>
> The emulator requires a mechanism to map guest addresses to a bus address
> that the hardware devices can access.
>
>
> Clarification of GFN and BFN fields for different guest types
> -------------------------------------------------------------
> Guest Frame Numbers (GFN) definition varies depending on the guest type.
>
> The diagram below details the memory accesses originating from the CPU, per guest type:
>
>        HVM guest                              PV guest
>
>           (VA)                                   (VA)
>            |                                      |
>           MMU                                    MMU
>            |                                      |
>           (GFN)                                   |
>            |                                      | (GFN)
>       HAP a.k.a EPT/NPT                           |
>            |                                      |
>           (MFN)                                  (MFN)
>            |                                      |
>           RAM                                    RAM
>
> For PV guests GFN is equal to MFN for a single page but not for a contiguous
> range of pages.
>
> Bus Frame Numbers (BFN) refer to the address presented on the physical bus
> before being translated by the IOMMU.
>
> The diagram below details memory accesses originating from a physical device.
>
>      Physical Device
>            |
>          (BFN)
>            |
> 	   IOMMU-PT
>            |
>          (MFN)
>            |
>           RAM
>
>
>
> Purpose
> =======
>
> 1. Allow Xen guests to create/modify/destroy IOMMU mappings for
> hardware devices that the PV guest has access to. This enables the PV guest
> to program a bus address space mapping which matches its GFN mapping. Once a
> 1-1 mapping of PFN to bus address space is created, a bounce buffer
> region is not required for the IO devices connected to the IOMMU.
>
> 2. Allow for Xen guests to lookup/create/modify/destroy IOMMU mappings for
> guest memory of domains the calling Xen guest has sufficient privilege over.
> This enables domains to provide mediated hardware acceleration to other
> guest domains.
>
>
> Xen Architecture
> ================
>
> The Xen architecture consists of a new hypercall interface and changes to the
> grant map interface.
>
> The existing IOMMU mappings setup at domain creation time will be preserved so
> that PV domains unaware of this feature will continue to function with no
> changes required.
>
> Memory ballooning will be supported by taking an additional reference on the
> MFN backing the GFN for each successful IOMMU mapping created.
>
> An M2B tracking structure will be used to ensure all references to an MFN
> can be located easily.
>
> Xen PV IOMMU hypercall interface
> --------------------------------
> A two argument hypercall interface (do_iommu_op).
>
> ret_t do_iommu_op(XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int count)
>
> The first argument is a guest handle to an array of `struct pv_iommu_op`.
> The second argument is the unsigned integer count of `struct pv_iommu_op`
> elements in the array.
>
> Definition of struct pv_iommu_op:
>
>      struct pv_iommu_op {
>
>          uint16_t subop_id;
>          uint16_t flags;
>          int32_t status;
>
>          union {
>              struct {
>                  uint64_t bfn;
>                  uint64_t gfn;
>              } map_page;
>
>              struct {
>                  uint64_t bfn;
>              } unmap_page;
>
>              struct {
>                  uint64_t bfn;
>                  uint64_t gfn;
>                  uint16_t domid;
>                  ioservid_t ioserver;
>              } map_foreign_page;
>
>              struct {
>                  uint64_t bfn;
>                  uint64_t gfn;
>                  uint16_t domid;
>                  ioservid_t ioserver;
>              } lookup_foreign_page;
>
>              struct {
>                  uint64_t bfn;
>                  ioservid_t ioserver;
>              } unmap_foreign_page;
>          } u;
>      };
>
> Definition of PV IOMMU subops:
>
>      #define IOMMUOP_query_caps            1
>      #define IOMMUOP_map_page              2
>      #define IOMMUOP_unmap_page            3
>      #define IOMMUOP_map_foreign_page      4
>      #define IOMMUOP_lookup_foreign_page   5
>      #define IOMMUOP_unmap_foreign_page    6
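As an illustration of how a caller might populate such a batch for one
`do_iommu_op` call (the struct layout is copied from the draft above; the
helper name and the unmap-then-map pattern are hypothetical, not part of the
interface):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the interface definitions from the draft above.
 * ioservid_t is assumed to be 16 bits wide, as in the Xen ioreq headers. */
typedef uint16_t ioservid_t;

#define IOMMUOP_map_page   2
#define IOMMUOP_unmap_page 3

struct pv_iommu_op {
    uint16_t subop_id;
    uint16_t flags;
    int32_t status;
    union {
        struct { uint64_t bfn; uint64_t gfn; } map_page;
        struct { uint64_t bfn; } unmap_page;
    } u;
};

/* Fill two elements of a batch: remap 'bfn' to point at 'gfn'.
 * Because the subop_id is per element, the unmap and the map can be
 * submitted in a single hypercall and flushed together. */
static void batch_remap(struct pv_iommu_op *ops, uint64_t bfn, uint64_t gfn)
{
    memset(ops, 0, 2 * sizeof(*ops));
    ops[0].subop_id = IOMMUOP_unmap_page;
    ops[0].u.unmap_page.bfn = bfn;
    ops[1].subop_id = IOMMUOP_map_page;
    ops[1].u.map_page.bfn = bfn;
    ops[1].u.map_page.gfn = gfn;
}
```

The array would then be handed to `do_iommu_op(ops, 2)`, letting Xen batch
the IOMMU TLB flush as described in the design considerations below.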
>
>
> Design considerations for hypercall op
> -------------------------------------------
> IOMMU map/unmap operations can be slow and can involve flushing the IOMMU TLB
> to ensure the IO device uses the updated mappings.
>
> The op has been designed to take an array of operations and a count as
> parameters. This allows for easily implemented hypercall continuations to be
> used and allows for batches of IOMMU operations to be submitted before flushing
> the IOMMU TLB.
>
> The subop_id to be used for a particular element is encoded into the element
> itself. This allows map and unmap operations to be performed in one
> hypercall and the IOMMU TLB flushing optimisations still to be applied.
>
> The hypercall will ensure that the required IOMMU TLB flushes are applied before
> returning to guest via either hypercall completion or a hypercall continuation.
>
> IOMMUOP_query_caps
> ------------------
>
> This subop queries the runtime capabilities of the PV-IOMMU interface for
> the calling domain. This subop uses `struct pv_iommu_op` directly.
>
> ------------------------------------------------------------------------------
> Field          Purpose
> -----          ---------------------------------------------------------------
> `flags`        [out] This field details the IOMMUOP capabilities.
>
> `status`       [out] Status of this op, op specific values listed below
> ------------------------------------------------------------------------------
>
> Defined bits for flags field:
>
> ------------------------------------------------------------------------------
> Name                        Bit                Definition
> ----                       ------     ----------------------------------
> IOMMU_QUERY_map_cap          0        IOMMUOP_map_page or IOMMUOP_map_foreign_page
>                                        can be used for this domain
>
> IOMMU_QUERY_map_all_gfns     1        IOMMUOP_map_page subop can map any MFN
>                                        not used by Xen
>
> Reserved for future use     2-9                   n/a
>
> IOMMU_page_order           10-15      Returns maximum possible page order for
>                                        all other IOMMUOP subops
> ------------------------------------------------------------------------------
>
> Defined values for query_caps subop status field:
>
> Value   Reason
> ------  ----------------------------------------------------------
> 0       subop successfully returned
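Since the capability bits and the page order share the 16-bit flags field, a
caller has to mask them apart. A minimal decoding sketch (macro and helper
names are mine, chosen to match the table above):

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout from the query_caps flags table above. */
#define IOMMU_QUERY_map_cap      (1u << 0)
#define IOMMU_QUERY_map_all_gfns (1u << 1)

/* IOMMU_page_order occupies bits 10-15 of the flags field. */
static unsigned int query_page_order(uint16_t flags)
{
    return (flags >> 10) & 0x3f;
}
```

For example, a flags value of `(9 << 10) | 1` would report map capability
with a maximum page order of 9.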
>
> IOMMUOP_map_page
> ----------------------
> This subop uses `struct map_page` part of the `struct pv_iommu_op`.
>
> If IOMMU dom0-strict mode is NOT enabled then the hardware domain will be
> allowed to map all GFNs except for Xen-owned MFNs; otherwise the hardware
> domain will only be allowed to map GFNs which it owns.
>
> If IOMMU dom0-strict mode is NOT enabled then the hardware domain will also
> be allowed to map all GFNs without taking a reference to the MFN backing the
> GFN, by setting the IOMMU_MAP_OP_no_ref_cnt flag.
>
> Every successful pv_iommu_op will result in an additional page reference being
> taken on the MFN backing the GFN except for the condition detailed above.
>
> If the map_op flags indicate a writeable mapping is required then a writeable
> page type reference will be taken otherwise a standard page reference will be
> taken.
>
> All the following conditions are required to be true for PV IOMMU map
> subop to succeed:
>
> 1. IOMMU detected and supported by Xen
> 2. The domain has IOMMU controlled hardware allocated to it
> 3. If the domain is the hardware domain, the following Xen IOMMU options
>     are NOT enabled: dom0-passthrough
>
> This subop's usage of the `struct pv_iommu_op` and `struct map_page` fields
> is detailed below:
>
> ------------------------------------------------------------------------------
> Field          Purpose
> -----          ---------------------------------------------------------------
> `bfn`          [in]  Bus address frame number(BFN) to be mapped to specified gfn
>                       below
>
> `gfn`          [in]  Guest address frame number for DOMID_SELF
>
> `flags`        [in]  Flags for signalling the type of IOMMU mapping to be
>                       created. Flags can be combined.
>
> `status`       [out] Mapping status of this op, op specific values listed below
> ------------------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                        Bit                Definition
> ----                       -----      ----------------------------------
> IOMMU_OP_readable            0        Create readable IOMMU mapping
> IOMMU_OP_writeable           1        Create writeable IOMMU mapping
> IOMMU_MAP_OP_no_ref_cnt      2        IOMMU mapping does not take a reference to
>                                        MFN backing BFN mapping
> Reserved for future use     3-9                   n/a
> IOMMU_page_order            10-15     Page order to be used for both gfn and bfn
>
> Defined values for map_page subop status field:
>
> Value   Reason
> ------  ----------------------------------------------------------------------
> 0       subop successfully returned
> -EIO    IOMMU unit returned error when attempting to map BFN to GFN.
> -EPERM  GFN could not be mapped because the GFN belongs to Xen.
> -EPERM  Domain is not a hardware domain and GFN does not belong to domain
> -EPERM  Domain is a hardware domain, IOMMU dom0-strict mode is enabled and
>          GFN does not belong to domain
> -EACCES BFN address conflicts with RMRR regions for devices attached to
>          DOMID_SELF
> -ENOSPC Page order is too large for either BFN, GFN or IOMMU unit
>
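Because each element of the batch carries its own status, a caller should scan
the array after the hypercall returns rather than rely on a single return
value. A sketch of that scan (payload union omitted for brevity; the helper is
hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Header fields of struct pv_iommu_op from the draft; the payload union
 * is omitted since only per-element status is inspected here. */
struct pv_iommu_op {
    uint16_t subop_id;
    uint16_t flags;
    int32_t status;
};

/* Return the index of the first element whose subop failed, or 'count'
 * if every element reported status 0 (success). */
static unsigned int first_failed_op(const struct pv_iommu_op *ops,
                                    unsigned int count)
{
    for (unsigned int i = 0; i < count; i++)
        if (ops[i].status != 0)
            return i;
    return count;
}
```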
> IOMMUOP_unmap_page
> ------------------
> This subop uses `struct unmap_page` part of the `struct pv_iommu_op`.
>
> This subop's usage of the `struct pv_iommu_op` and `struct unmap_page` fields
> is detailed below:
>
> --------------------------------------------------------------------
> Field          Purpose
> -----          -----------------------------------------------------
> `bfn`          [in] Bus address frame number to be unmapped in DOMID_SELF
>
> `flags`        [in] Flags for signalling page order of unmap operation
>
> `status`       [out] Mapping status of this unmap operation, 0 indicates success
> --------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                        Bit                Definition
> ----                       -----      ----------------------------------
> Reserved for future use     0-9                   n/a
> IOMMU_page_order            10-15     Page order to be used for bfn
>
>
> Defined values for unmap_page subop status field:
>
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -EIO         IOMMU unit returned error when attempting to unmap BFN.
> -ENOSPC      Page order is too large for either BFN address or IOMMU unit
> ------------------------------------------------------------------------
>
>
> IOMMUOP_map_foreign_page
> ------------------------
> This subop uses `struct map_foreign_page` part of the `struct pv_iommu_op`.
>
> It is not valid to use a domid representing the calling domain.
>
> The hypercall will only succeed if the calling domain has sufficient
> privilege over the specified domid.
>
> If there is no IOMMU support then the MFN is returned in the BFN field (that is
> the only valid bus address for the GFN + domid combination).
>
> If there is IOMMU support then the specified BFN is returned for the GFN +
> domid combination.
>
> The M2B mechanism maps an MFN to a (BFN, domid, ioserver) tuple.
>
> Each successful subop will add to the M2B if there was not an existing identical
> M2B entry.
>
> Every new M2B entry will take a reference to the MFN backing the GFN.
>
> All the following conditions are required to be true for PV IOMMU map_foreign
> subop to succeed:
>
> 1. IOMMU detected and supported by Xen
> 2. The domain has IOMMU controlled hardware allocated to it
> 3. The domain is a hardware_domain and the following Xen IOMMU options are
>     NOT enabled: dom0-passthrough
What if the IOMMU is enabled and runs in the default mode, which 1:1
maps all memory except that owned by Xen?
>
>
> This subop's usage of the `struct pv_iommu_op` and `struct map_foreign_page`
> fields is detailed below:
>
> --------------------------------------------------------------------
> Field          Purpose
> -----          -----------------------------------------------------
> `domid`        [in] The domain ID for which the gfn field applies
>
> `ioserver`     [in] IOREQ server id associated with mapping
>
> `bfn`          [in] Bus address frame number for gfn address
>
> `gfn`          [in] Guest address frame number
>
> `flags`        [in] Details the status of the BFN mapping
>
> `status`       [out] status of this subop, 0 indicates success
> --------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                         Bit                Definition
> ----                        -----      ----------------------------------
> IOMMUOP_readable              0        BFN IOMMU mapping is readable
> IOMMUOP_writeable             1        BFN IOMMU mapping is writeable
> IOMMUOP_swap_mfn              2        BFN IOMMU mapping can be safely
>                                         swapped to scratch page
> Reserved for future use      3-9       Reserved flag bits should be 0
> IOMMU_page_order            10-15      Page order to be used for both gfn
>                                         and bfn
>
> Defined values for map_foreign_page subop status field:
>
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -EIO         IOMMU unit returned error when attempting to map BFN to GFN.
> -EPERM       Calling domain does not have sufficient privilege over domid
> -EPERM       GFN could not be mapped because the GFN belongs to Xen.
> -EPERM       domid maps to DOMID_SELF
> -EACCES      BFN address conflicts with RMRR regions for devices attached to
>               DOMID_SELF
> -ENODEV      Provided ioserver id is not valid
> -ENXIO       Provided domid id is not valid
> -ENXIO       Provided GFN address is not valid
> -ENOSPC      Page order is too large for either BFN, GFN or IOMMU unit
>
> IOMMUOP_lookup_foreign_page
> ---------------------------
> This subop uses `struct lookup_foreign_page` part of the `struct pv_iommu_op`.
>
> If the BFN is specified as an input parameter and there is no IOMMU support
> for the calling domain then an error will be returned.
>
> It is the calling domain's responsibility to ensure there are no conflicts.
>
> The hypercall will only succeed if the calling domain has sufficient
> privilege over the specified domid.
>
> If there is no IOMMU support then the MFN is returned in the BFN field (that is
> the only valid bus address for the GFN + domid combination).
Similarly, what if the IOMMU is enabled and runs in the default mode,
which 1:1 maps all memory except that owned by Xen? Will an MFN be
returned? Or should we use the query/map ops instead of the lookup op in
this situation?
>
> Each successful subop will add to the M2B if there was not an existing identical
> M2B entry.
>
> Every new M2B entry will take a reference to the MFN backing the GFN.
>
> This subop's usage of the `struct pv_iommu_op` and `struct lookup_foreign_page`
> fields is detailed below:
>
> --------------------------------------------------------------------
> Field          Purpose
> -----          -----------------------------------------------------
> `domid`        [in] The domain ID for which the gfn field applies
>
> `ioserver`     [in] IOREQ server id associated with mapping
>
> `bfn`          [out] Bus address frame number for gfn address
>
> `gfn`          [in] Guest address frame number
>
> `flags`        [out] Details the status of the BFN mapping
>
> `status`       [out] status of this subop, 0 indicates success
> --------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                         Bit                Definition
> ----                        -----      ----------------------------------
> IOMMUOP_readable              0        Returned BFN IOMMU mapping is readable
> IOMMUOP_writeable             1        Returned BFN IOMMU mapping is writeable
> Reserved for future use      2-9       Reserved flag bits should be 0
> IOMMU_page_order            10-15      Returns the page order of the
>                                         returned BFN mapping
>
> Defined values for lookup_foreign_page subop status field:
>
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -EPERM       Calling domain does not have sufficient privilege over domid
> -ENOENT      There is no available BFN for provided GFN + domid combination
> -ENODEV      Provided ioserver id is not valid
> -ENXIO       Provided domid id is not valid
> -ENXIO       Provided GFN address is not valid
>
>
> IOMMUOP_unmap_foreign_page
> --------------------------
> This subop uses `struct unmap_foreign_page` part of the `struct pv_iommu_op`.
>
> Each successful subop will remove the matching M2B entry and drop the
> reference which that entry held on the MFN backing the GFN.
>
> This subop's usage of the `struct pv_iommu_op` and `struct unmap_foreign_page`
> fields is detailed below:
>
> -----------------------------------------------------------------------
> Field          Purpose
> -----          --------------------------------------------------------
> `ioserver`     [in] IOREQ server id associated with mapping
>
> `bfn`          [in] Bus address frame number for gfn address
>
> `flags`        [out] Flags for signalling page order of unmap operation
>
> `status`       [out] status of this subop, 0 indicates success
> -----------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name                        Bit                Definition
> ----                        -----      ----------------------------------
> Reserved for future use     0-9                   n/a
> IOMMU_page_order            10-15     Page order to be used for bfn
>
> Defined values for unmap_foreign_page subop status field:
>
> Error code  Reason
> ----------  ------------------------------------------------------------
> 0            subop successfully returned
> -ENOENT      There is no mapped BFN + ioserver id combination to unmap
>
>
> IOMMUOP_*_foreign_page interactions with guest domain ballooning
> ================================================================
>
> Guest domains can balloon out a set of GFN mappings at any time and render the
> BFN to GFN mapping invalid.
>
> When a BFN to GFN mapping becomes invalid, Xen will issue a buffered IO request
> of type IOREQ_TYPE_INVALIDATE to the affected IOREQ servers with the now invalid
> BFN address in the data field. If the buffered IO request ring is full then a
> standard (synchronous) IO request of type IOREQ_TYPE_INVALIDATE will be
> issued to the affected IOREQ server with the just-invalidated BFN address in
> the data field.
>
> The BFN mappings cannot simply be unmapped at the point of the balloon
> hypercall, otherwise a malicious guest could specifically balloon out a GFN
> address in use by an emulator and trigger IOMMU faults for the domains with
> BFN mappings.
>
> For hosts with no IOMMU support: The affected emulator(s) must specifically
> issue an IOMMUOP_unmap_foreign_page subop for the now invalid BFN address so
> that the references to the underlying MFN are removed and the MFN can be
> freed back to the Xen memory allocator.
I do not quite understand this. With no IOMMU support, these BFNs are
supplied by the hypervisor. So why not let the hypervisor do this unmap
and notify the calling domain?
>
> For hosts with IOMMU support:
> If the BFN was mapped without the IOMMUOP_swap_mfn flag set in the
> IOMMUOP_map_foreign_page subop then the affected emulator(s) must
> specifically issue an IOMMUOP_unmap_foreign_page subop for the now invalid
> BFN address so that the references to the underlying MFN are removed.
>
> If the BFN was mapped with the IOMMUOP_swap_mfn flag set in the
> IOMMUOP_map_foreign_page subop for all emulators with mappings of that GFN then
> the BFN mapping will be swapped to point at a scratch MFN page and all BFN
> references to the invalid MFN will be removed by Xen after the BFN mapping has
> been updated to point at the scratch MFN page.
>
> The rationale for swapping the BFN mapping to point at scratch pages is to
> enable guest domains to balloon quickly without requiring hypercall(s) from
> emulators.
>
> Not all BFN mappings can be swapped without potentially causing problems for the
> hardware itself (command rings etc.) so the IOMMUOP_swap_mfn flag is used to
> allow per BFN control of Xen ballooning behaviour.
>
>
> PV IOMMU interactions with self ballooning
> ==========================================
>
> The guest should clear any IOMMU mappings it has of its own pages before
> releasing a page back to Xen. It will need to re-add IOMMU mappings after
> repopulating a page with the populate_physmap hypercall.
>
> This requires that IOMMU mappings get a writeable page type reference count and
> that guests clear any IOMMU mappings before pinning page table pages.
>
>
> Security Implications of allowing domain IOMMU control
> ===============================================================
>
> Xen currently allows IO devices attached to the hardware domain to have
> direct access to all of the MFN address space (except Xen hypervisor memory
> regions), provided the Xen IOMMU option dom0-strict is not enabled.
>
> The PV IOMMU feature provides the same level of access to MFN address space
> and the feature is not enabled when the Xen IOMMU option dom0-strict is
> enabled. Therefore security is not degraded by the PV IOMMU feature.
>
> Domains with physical device(s) assigned which are not hardware domains are only
> allowed to map their own GFNs or GFNs for domain(s) they have privilege over.
>
>
> PV IOMMU interactions with grant map/unmap operations
> =====================================================
>
> Grant map operations return a Physical device accessible address (BFN) if the
> GNTMAP_device_map flag is set.  This operation currently returns the MFN for PV
> guests which may conflict with the BFN address space the guest uses if PV IOMMU
> map support is available to the guest.
>
> This design proposes to allow the calling domain to control the BFN address that
> a grant map operation uses.
>
> This can be achieved by specifying that the dev_bus_addr field in the
> gnttab_map_grant_ref structure is used as an input parameter instead of the
> output parameter it currently is.
>
> Only PAGE_SIZE-aligned addresses are allowed for the dev_bus_addr input parameter.
>
> The revised structure is shown below for convenience.
>
>      struct gnttab_map_grant_ref {
>          /* IN parameters. */
>          uint64_t host_addr;
>          uint32_t flags;               /* GNTMAP_* */
>          grant_ref_t ref;
>          domid_t  dom;
>          /* OUT parameters. */
>          int16_t  status;              /* => enum grant_status */
>          grant_handle_t handle;
>          /* IN/OUT parameters */
>          uint64_t dev_bus_addr;
>      };
>
>
> The grant map operation would then behave similarly to the IOMMUOP_map_page
> subop for the creation of the IOMMU mapping.
>
> The grant unmap operation would then behave similarly to the IOMMUOP_unmap_page
> subop for the removal of the IOMMU mapping.
>
> A new grant map flag would be used to indicate the domain is requesting
> that the dev_bus_addr field be used as an input parameter.
>
>
>      #define _GNTMAP_request_bfn_map      (6)
>      #define GNTMAP_request_bfn_map   (1<<_GNTMAP_request_bfn_map)
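A sketch of how a caller would prepare such a grant map request under this
proposal (the structure and GNTMAP_request_bfn_map are from the draft; the
helper name, the GNTMAP_device_map value, and the 4K alignment check are my
assumptions based on the existing grant-table ABI):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t grant_ref_t;
typedef uint16_t domid_t;
typedef uint32_t grant_handle_t;

#define _GNTMAP_device_map      0   /* assumed, as in the existing grant ABI */
#define GNTMAP_device_map       (1 << _GNTMAP_device_map)
#define _GNTMAP_request_bfn_map 6
#define GNTMAP_request_bfn_map  (1 << _GNTMAP_request_bfn_map)

struct gnttab_map_grant_ref {
    uint64_t host_addr;
    uint32_t flags;               /* GNTMAP_* */
    grant_ref_t ref;
    domid_t  dom;
    int16_t  status;
    grant_handle_t handle;
    uint64_t dev_bus_addr;        /* IN/OUT under this proposal */
};

/* Request a device mapping at a caller-chosen, page-aligned bus address.
 * Returns -1 if the address is not PAGE_SIZE (4K assumed) aligned. */
static int prep_bfn_grant_map(struct gnttab_map_grant_ref *op,
                              grant_ref_t ref, domid_t dom, uint64_t bus_addr)
{
    if (bus_addr & 0xfff)
        return -1;
    op->ref = ref;
    op->dom = dom;
    op->flags = GNTMAP_device_map | GNTMAP_request_bfn_map;
    op->dev_bus_addr = bus_addr;  /* input, not output, with the new flag */
    return 0;
}
```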
>
>
>
> Linux kernel architecture
> =========================
>
> The Linux kernel will use the PV-IOMMU hypercalls to map its PFN address
> space into the IOMMU. It will map the PFNs to the IOMMU address space using
> a 1:1 mapping; it does this by programming a BFN to GFN mapping which
> matches the PFN to GFN mapping.
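The 1:1 policy described above can be sketched as a batch builder that sets
BFN equal to PFN for each page (struct and flag names from the draft; the
helper itself is illustrative, not proposed kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Definitions copied from the draft interface above. */
#define IOMMUOP_map_page   2
#define IOMMU_OP_readable  (1 << 0)
#define IOMMU_OP_writeable (1 << 1)

struct pv_iommu_op {
    uint16_t subop_id;
    uint16_t flags;
    int32_t status;
    union {
        struct { uint64_t bfn; uint64_t gfn; } map_page;
    } u;
};

/* Build a batch mapping each PFN in [start_pfn, start_pfn + count) to an
 * identical BFN, so bus addresses match the kernel's PFN view. For PV
 * guests PFN and GFN coincide per page, hence gfn = pfn here. */
static void build_identity_batch(struct pv_iommu_op *ops,
                                 uint64_t start_pfn, unsigned int count)
{
    for (unsigned int i = 0; i < count; i++) {
        ops[i].subop_id = IOMMUOP_map_page;
        ops[i].flags = IOMMU_OP_readable | IOMMU_OP_writeable;
        ops[i].status = 0;
        ops[i].u.map_page.bfn = start_pfn + i;  /* BFN chosen equal to PFN */
        ops[i].u.map_page.gfn = start_pfn + i;
    }
}
```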
>
> The native SWIOTLB will be used to handle devices which cannot DMA to all
> of the kernel's PFN address space.
>
> An interface shall be provided for emulator usage of the
> IOMMUOP_*_foreign_page subops which will allow the Linux kernel to centrally
> manage that domain's BFN resource and ensure there are no unexpected
> conflicts.
>
>
> Emulator usage of PV IOMMU interface
> ====================================
>
> Emulators which require bus address mappings of guest RAM must first
> determine whether it is possible for the domain to control the bus
> addresses itself.
>
> An IOMMUOP_query_caps subop will return the IOMMU_QUERY_map_cap flag. If
> this flag is set then the emulator may specify the BFN address it wishes
> guest RAM to be mapped to via the IOMMUOP_map_foreign_page subop. If the
> flag is not set then the emulator must use BFN addresses supplied by Xen
> via the IOMMUOP_lookup_foreign_page subop.
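The decision described in that paragraph reduces to a single flag test after
query_caps; a sketch (the enum and helper are mine, purely to make the flow
explicit):

```c
#include <assert.h>
#include <stdint.h>

/* Capability bit from the query_caps flags table earlier in the draft. */
#define IOMMU_QUERY_map_cap (1u << 0)

/* Which subop family an emulator should use for guest RAM bus addresses. */
enum bfn_strategy {
    BFN_CHOOSE_OURSELVES,   /* map_cap set: pick BFNs, use map_foreign_page */
    BFN_USE_XEN_SUPPLIED    /* map_cap clear: use lookup_foreign_page BFNs */
};

static enum bfn_strategy pick_strategy(uint16_t query_flags)
{
    return (query_flags & IOMMU_QUERY_map_cap) ? BFN_CHOOSE_OURSELVES
                                               : BFN_USE_XEN_SUPPLIED;
}
```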
>
> Operating systems which use the IOMMUOP_map_page subop are expected to
> provide a common interface for emulators.

According to our previous internal discussions, my understanding of the
usage is this:
1> PV IOMMU has an interface in dom0's kernel to do the query/map/lookup
all at once, which also includes the BFN allocation algorithm.
2> When the XenGT emulator tries to construct a shadow PTE, we can just
call your interface, which returns a BFN.

However, the above description seems to say the XenGT device model needs
to do the query/lookup/map by itself?
Besides, could you please give more detailed information about this
'common interface'? :)

Thanks
Yu
>
> Emulators should unmap unused GFN mappings as often as possible using
> IOMMUOP_unmap_foreign_page subops so that guest domains can balloon pages
> quickly and efficiently.
>
> Emulators should conform to the ballooning behaviour described section
> "IOMMUOP_*_foreign_page interactions with guest domain ballooning" so that guest
> domains are able to effectively balloon out and in memory.
>
> Emulators must unmap any active BFN mappings when they shutdown.
>
>
