* [RFC PATCH 00/20] Qemu: Extend intel_iommu emulator to support Shared Virtual Memory
@ 2017-04-26 10:06 ` Liu, Yi L
  0 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

Hi,

This patchset proposes extending the current Intel IOMMU emulator
in QEMU to support Shared Virtual Memory usage in the guest. The
whole SVM virtualization work for intel_iommu consists of two
series, one changing QEMU and one changing VFIO/IOMMU. This
patchset covers the QEMU changes; the VFIO/IOMMU changes are in a
separate series:

"[RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d"

[Terms]:
SVM: Shared Virtual Memory
vSVM: virtual SVM, i.e. using SVM in a guest
IOVA: I/O Virtual Address
gIOVA: I/O Virtual Address in guest
GVA: virtual memory address in guest
GPA: physical address in guest
HPA: physical address in host
PRQ: Page Request
vIOMMU: Virtual IOMMU emulated by QEMU
pIOMMU: physical IOMMU on HW
QI: Queued Invalidation, a mechanism used to invalidate caches in VT-d
PASID: Process Address Space ID
IGD: Intel Graphics Device
PT: Passthru Mode
ECS: Extended Context Support
Ex-Root Table: root table used in ECS mode
Ex-Context Table: context table used in ECS mode

[About Shared Virtual Memory]
Shared Virtual Memory (SVM) is a VT-d feature that allows an
application's address space to be shared with an I/O device. The
feature works with the PCI-SIG Process Address Space ID (PASID).
SVM has the following benefits:

* Programmer gets a consistent view of memory across host application
  and device.
* Efficient access to data, avoiding pinning or copying overheads.
* Memory over-commit via demand paging for both CPU and device access
  to memory.

IGD is an SVM-capable device, and applications such as OpenCL want SVM
support to achieve the benefits above. This patchset was tested with IGD
and the SVM tools provided by the IGD driver developers.
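
For illustration, the application-side pattern this enables looks roughly
like the following OpenCL 2.0 fragment. This is a sketch only, assuming a
device with fine-grained system SVM; 'queue' and 'kernel' are assumed to be
created elsewhere, and error handling is omitted.

#include <stdlib.h>
#include <CL/cl.h>

void run_on_app_memory(cl_command_queue queue, cl_kernel kernel, size_t n)
{
    /* Plain application memory -- no special allocation is needed when the
     * device supports fine-grained system SVM. */
    float *buf = malloc(n * sizeof(float));

    for (size_t i = 0; i < n; i++) {
        buf[i] = (float)i;                    /* CPU writes ...               */
    }

    /* ... and the device dereferences the very same virtual addresses. */
    clSetKernelArgSVMPointer(kernel, 0, buf);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(queue);

    free(buf);
}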


[vSVM]
SVM usage in the guest is referred to as vSVM in this patch set. vSVM
enables sharing a guest application's address space with assigned devices.

The following diagram illustrates the relationship between the Ex-Root
Table, Ex-Context Table, PASID Table, First-Level Page Table and
Second-Level Page Table on VT-d.

                                              ------+
                                            ------+ |
                                         +------+ | |
                              PASID      |      | | |
                              Table      +------+ | |
                             +------+    |      | | |
                Ex-Context   |      |    +------+ | |
                   Table     +------+    |      | |
                 +------+    | pasid| -->+------+
      Ex-Root    |      |    +------+    First-Level
      Table      +------+    |      |    Page Table
     +------+    |devfn | -->+------+
     |      |    +------+ \
     +------+    |      |  \                ------+
     | bus  | -->+------+   \             ------+ |
     +------+                \         +------+ | |
     |      |                 \        |      | | |
     +------+                  \       +------+ | |
    /                           \      |      | | |
RTA                              \     +------+ | |
                                  \    |      | |
                                   --> +------+
                                       Second-Level
                                       Page Table

To achieve virtual SVM usage, a GVA->HPA mapping in the physical VT-d
is needed. VT-d provides a nested mode that achieves exactly this. With
nested mode enabled for a device, any request-with-PASID from that
device is translated through both the first-level and the second-level
page tables: the first-level page table provides the GVA->GPA
translation, and the second-level page table then provides the GPA->HPA
translation.
                                       
The translation above can be achieved by linking the whole guest PASID
table to the host. With the guest PASID table linked, the VT-d Remapping
Hardware uses the guest first-level page table for the GVA->GPA translation
and then the host second-level page table for the GPA->HPA translation.
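
Conceptually, the nested walk looks like the sketch below. This is
illustration only: walk_first_level() and walk_second_level() are
hypothetical helpers standing in for the hardware page-table walks, not
QEMU or kernel functions.

#include <stdint.h>

typedef uint64_t gva_t;
typedef uint64_t gpa_t;
typedef uint64_t hpa_t;

/* Hypothetical helpers standing in for the hardware table walks. */
gpa_t walk_first_level(const void *flpt_root, uint32_t pasid, gva_t gva);
hpa_t walk_second_level(const void *slpt_root, gpa_t gpa);

static hpa_t nested_translate(gva_t gva, uint32_t pasid,
                              const void *guest_flpt,  /* from guest PASID table     */
                              const void *host_slpt)   /* from host ex-context entry */
{
    /* Stage 1: the guest-owned first-level page table yields GVA -> GPA.
     * (Each paging-structure pointer fetched during this walk is itself a
     * GPA and also goes through the second level.) */
    gpa_t gpa = walk_first_level(guest_flpt, pasid, gva);

    /* Stage 2: the host-owned second-level page table yields GPA -> HPA. */
    return walk_second_level(host_slpt, gpa);
}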

Besides nested mode and linking the guest PASID table to the host,
Caching Mode is another key capability. Reporting Caching Mode as Set for
the virtual hardware requires the guest software to explicitly issue
invalidation operations on the virtual hardware for any/all updates to the
guest remapping structures. The virtualizing software may trap these guest
invalidation operations to keep the shadow translation structures consistent
with the guest's modifications to its translation structures. With Caching
Mode reported to the guest, the intel_iommu emulator can trap the guest's
programming of a context entry and thereby link the guest PASID table to
the host and set nested mode.
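
As a rough illustration, the emulator-side handling could look like the
sketch below. This is a hand-written sketch only; every identifier in it is
a hypothetical placeholder and not a symbol introduced by this series.

#include <stdint.h>
#include <stdbool.h>

typedef struct ViommuEmu ViommuEmu;   /* hypothetical vIOMMU emulator state */
typedef struct {
    uint64_t lo, hi, lo1, hi1;        /* 256-bit extended context entry     */
} GuestExtContextEntry;

/* Hypothetical helpers. */
void read_guest_ext_context_entry(ViommuEmu *emu, uint16_t sid,
                                  GuestExtContextEntry *ece);
uint64_t guest_pasid_table_ptr(const GuestExtContextEntry *ece);
void host_bind_guest_pasid_table(ViommuEmu *emu, uint16_t sid, uint64_t gpa);
void host_set_nested_mode(ViommuEmu *emu, uint16_t sid, bool on);

static void handle_guest_cc_invalidate(ViommuEmu *emu, uint16_t sid)
{
    GuestExtContextEntry ece;

    /* Caching Mode forces the guest to invalidate after programming the
     * (extended) context entry, so the entry can be read back now. */
    read_guest_ext_context_entry(emu, sid, &ece);

    /* Hand the guest PASID table pointer (a GPA) to the host IOMMU driver:
     * the host links it into its ex-context entry for this device and
     * enables nested translation there. */
    host_bind_guest_pasid_table(emu, sid, guest_pasid_table_ptr(&ece));
    host_set_nested_mode(emu, sid, true);
}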

[vSVM Implementation]
To enable SVM usage in the guest, the work includes the following items.

Initialization Phase:
(1) Report the SVM required capabilities in the intel_iommu emulator
(2) Trap the guest context cache invalidation and link the whole guest
    PASID table to the host ex-context entry
(3) Set nested mode in the host extended-context entry

Run-time:
(4) Forward guest cache invalidation requests for first-level translation
    to the pIOMMU
(5) Fault reporting: report faults that happen on the host to the
    intel_iommu emulator, and then to the guest
(6) Page Request and response

Since the fault reporting framework is still under discussion in another
thread, driven by Lan Tianyu, the vSVM enabling plan is to divide the work
into two phases. This patchset is for Phase 1.

Phase 1: items (1), (2) and (3).
Phase 2: items (4), (5) and (6).


[Overview of patch]
This patchset requires Passthru-Mode support in intel_iommu. Peter Xu
has sent a patch for it:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg443627.html

* 1 ~ 2 enable Extended-Context Support in the intel_iommu emulator.
* 3 exposes SVM related capabilities to the guest via an option.
* 4 changes a VFIO notifier parameter for the newly added notifier.
* 5 ~ 6 add a new VFIO notifier for the PASID table bind request (see the
  sketch after this list).
* 7 ~ 8 add notifier flag checks in memory_replay and region_del.
* 9 ~ 11 introduce a mechanism between VFIO and the intel_iommu emulator
  to record assigned device info, e.g. the host SID of the assigned
  device.
* 12 adds a fire function for the PASID table bind notifier.
* 13 adds a generic definition for PASID table info in iommu.h.
* 14 ~ 15 link the guest PASID table to the host for intel_iommu.
* 16 adds a VFIO notifier for propagating guest IOMMU TLB invalidations
  to the host.
* 17 adds a fire function for the IOMMU TLB invalidate notifier.
* 18 ~ 20 propagate first-level page table related cache invalidations
  to the host.
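
The bind path added by patches 4 ~ 12 roughly follows the flow sketched
below. The flag, field and fire-function names are placeholders chosen for
illustration only; the real names are defined in the individual patches,
and QEMU's memory/VFIO headers are assumed to be included.

/* VFIO side (placeholder names): register a notifier on the vIOMMU memory
 * region of an assigned device so that a PASID table bind request can be
 * forwarded to the host via the new VFIO ioctl. */
static void vfio_listen_for_pasidt_bind(VFIODevice *vdev, MemoryRegion *iommu_mr)
{
    iommu_notifier_init(&vdev->pasidt_notifier,         /* placeholder field    */
                        vfio_pasidt_bind_cb,            /* placeholder callback */
                        IOMMU_NOTIFIER_SVM_PASIDT_BIND, /* placeholder flag     */
                        0, ~0ULL);
    memory_region_register_iommu_notifier(iommu_mr, &vdev->pasidt_notifier);
}

/* intel_iommu side (placeholder name): fire the notifier after trapping
 * the guest context-cache invalidation for that device, passing the guest
 * PASID table pointer along. */
static void vtd_bind_guest_pasid_table(VTDAddressSpace *vtd_as, hwaddr pasidt_gpa)
{
    memory_region_notify_iommu_svm_bind(&vtd_as->iommu, pasidt_gpa); /* placeholder */
}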

[Test Done]
The patchset was tested with IGD. With IGD assigned to the guest, the IGD
could write data into the guest application's address space.

An SVM-capable i915 driver can be found at:
https://cgit.freedesktop.org/~miku/drm-intel/?h=svm

The i915 SVM test tool:
https://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=svm


[Co-work with gIOVA enablement]
Peter Xu is currently working on enabling gIOVA usage for the Intel
IOMMU emulator; this patchset is based on Peter's work (v7).
https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement-v7

[Limitation]
* Due to a VT-d hardware limitation, an assigned device cannot use gIOVA
and vSVM at the same time. As a short-term solution, the Intel VT-d spec
will introduce a new capability bit indicating this limitation, which the
guest IOMMU driver can check to prevent enabling both IOVA and SVM. In
the long term this will be fixed in hardware.

[Open]
* This patchset proposes passing raw data from the guest to the host when
propagating the guest IOMMU TLB invalidation.

In fact, there are two choices here.

a) As proposed in this patchset, pass raw data to the host. The host pIOMMU
   driver submits the invalidation request after replacing certain fields.
   The request is rejected if the IOMMU model does not match.
   * Pros: no parsing and re-assembling needed, so better performance.
   * Cons: unable to support scenarios such as emulating an Intel IOMMU
           on an ARM platform.
b) Parse the invalidation info into specific fields, e.g. granularity,
   address, size, invalidation type, etc., and fill them into a generic
   structure. On the host, the pIOMMU driver re-assembles the invalidation
   request and submits it to the pIOMMU.
   * Pros: may be able to support the scenario above. But this is still in
           question, since different vendors may have vendor-specific
           invalidation info, which would make it difficult to define a
           vendor-agnostic invalidation propagation API.

   * Cons: needs additional complexity to parse and re-assemble. The
           generic structure would be a superset of all possible
           invalidation info, which may be hard to maintain in the future.

Given the pros/cons above, I proposed a) as the initial version, but it
remains an open question. I would be glad to hear your opinions.

FYI, the following definition is a draft from a previous discussion with
Jean. It has both a generic part and a vendor-specific part.

struct tlb_invalidate_info
{
        __u32   model;  /* Vendor number */
        __u8    granularity;
#define DEVICE_SELECTIVE_INV    (1 << 0)
#define PAGE_SELECTIVE_INV      (1 << 1)
#define PASID_SELECTIVE_INV     (1 << 2)
        __u32   pasid;
        __u64   addr;
        __u64   size;

        /* Since the IOMMU format has already been validated for this table,
         * the IOMMU driver knows that the following structure is in a
         * format it knows. */
        __u8    opaque[];
};

struct tlb_invalidate_info_intel
{
        __u32 inv_type;
        ...
        __u64 flags;
        ...
        __u8 mip;
        __u16 pfsid;
};
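
As an illustration of option a), the opaque[] part could simply carry the
guest's raw 128-bit QI descriptor, with the host driver rewriting only the
fields it must own before submission. The helper below and the
INTEL_IOMMU_MODEL constant are hypothetical and shown only to make the idea
concrete; error handling is omitted.

#include <stdlib.h>
#include <string.h>
#include <linux/types.h>

struct qi_desc {                /* raw VT-d invalidation descriptor */
        __u64 lo;
        __u64 hi;
};

static struct tlb_invalidate_info *
fill_invalidate_info(const struct qi_desc *guest_desc)
{
        struct tlb_invalidate_info *info;

        info = calloc(1, sizeof(*info) + sizeof(*guest_desc));
        info->model = INTEL_IOMMU_MODEL;   /* host rejects a model mismatch */
        /* For option a) the generic fields can stay mostly unused; the host
         * pIOMMU driver parses the raw descriptor itself and only rewrites
         * host-owned fields (e.g. the domain id) before queueing it. */
        memcpy(info->opaque, guest_desc, sizeof(*guest_desc));
        return info;
}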

Additionally, Jean is proposing a para-virtualized vIOMMU solution whose
proposed invalidate request, VIRTIO_IOMMU_T_INVALIDATE, also carries opaque
data. So it may be preferable to keep an opaque part when propagating the
IOMMU TLB invalidation in SVM virtualization.

http://www.spinics.net/lists/kvm/msg147993.html

Best Wishes,
Yi L


Liu, Yi L (20):
  intel_iommu: add "ecs" option
  intel_iommu: exposed extended-context mode to guest
  intel_iommu: add "svm" option
  Memory: modify parameter in IOMMUNotifier func
  VFIO: add new IOCTL for svm bind tasks
  VFIO: add new notifier for binding PASID table
  VFIO: check notifier flag in region_del()
  Memory: add notifier flag check in memory_replay()
  Memory: introduce iommu_ops->record_device
  VFIO: notify vIOMMU emulator when device is assigned
  intel_iommu: provide iommu_ops->record_device
  Memory: Add func to fire pasidt_bind notifier
  IOMMU: add pasid_table_info for guest pasid table
  intel_iommu: add FOR_EACH_ASSIGN_DEVICE macro
  intel_iommu: link whole guest pasid table to host
  VFIO: Add notifier for propagating IOMMU TLB invalidate
  Memory: Add func to fire TLB invalidate notifier
  intel_iommu: propagate Extended-IOTLB invalidate to host
  intel_iommu: propagate PASID-Cache invalidate to host
  intel_iommu: propagate Ext-Device-TLB invalidate to host

 hw/i386/intel_iommu.c          | 543 +++++++++++++++++++++++++++++++++++++----
 hw/i386/intel_iommu_internal.h |  87 +++++++
 hw/vfio/common.c               |  45 +++-
 hw/vfio/pci.c                  |  94 ++++++-
 hw/virtio/vhost.c              |   3 +-
 include/exec/memory.h          |  45 +++-
 include/hw/i386/intel_iommu.h  |   5 +-
 include/hw/vfio/vfio-common.h  |   5 +
 linux-headers/linux/iommu.h    |  35 +++
 linux-headers/linux/vfio.h     |  26 ++
 memory.c                       |  59 +++++
 11 files changed, 882 insertions(+), 65 deletions(-)
 create mode 100644 linux-headers/linux/iommu.h

-- 
1.9.1

* [RFC PATCH 01/20] intel_iommu: add "ecs" option
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

Report ecap.ECS=1 to the guest with "-device intel-iommu,ecs=on" on the
QEMU command line.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 5 +++++
 hw/i386/intel_iommu_internal.h | 1 +
 include/hw/i386/intel_iommu.h  | 1 +
 3 files changed, 7 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4b7d90d..400d0d1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2409,6 +2409,7 @@ static Property vtd_properties[] = {
                             ON_OFF_AUTO_AUTO),
     DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
     DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
+    DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2925,6 +2926,10 @@ static void vtd_init(IntelIOMMUState *s)
         s->ecap |= VTD_ECAP_PT;
     }
 
+    if (s->ecs) {
+        s->ecap |= VTD_ECAP_ECS;
+    }
+
     if (s->caching_mode) {
         s->cap |= VTD_CAP_CM;
     }
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b96884e..ec1bd17 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -190,6 +190,7 @@
 #define VTD_ECAP_EIM                (1ULL << 4)
 #define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_MHMV               (15ULL << 20)
+#define VTD_ECAP_ECS                (1ULL << 24)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 3e51876..fa5963e 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -266,6 +266,7 @@ struct IntelIOMMUState {
     uint32_t version;
 
     bool caching_mode;          /* RO - is cap CM enabled? */
+    bool ecs;                       /* Extended Context Support */
 
     dma_addr_t root;                /* Current root table pointer */
     bool root_extended;             /* Type of root table (extended or not) */
-- 
1.9.1

* [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

VT-d implementations reporting the PASID or PRS fields as "Set" must also
report ecap.ECS as "Set". Extended-Context is required for SVM.

When ECS is reported, the guest intel-iommu driver initializes extended
root entries and extended context entries, and also a PASID table if there
is any SVM-capable device.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 131 +++++++++++++++++++++++++++--------------
 hw/i386/intel_iommu_internal.h |   9 +++
 include/hw/i386/intel_iommu.h  |   2 +-
 3 files changed, 97 insertions(+), 45 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 400d0d1..bf98fa5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry *root)
     return root->val & VTD_ROOT_ENTRY_P;
 }
 
+static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
+{
+    return root->rsvd & VTD_ROOT_ENTRY_P;
+}
+
 static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
                               VTDRootEntry *re)
 {
@@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
         return -VTD_FR_ROOT_TABLE_INV;
     }
     re->val = le64_to_cpu(re->val);
+    if (s->ecs) {
+        re->rsvd = le64_to_cpu(re->rsvd);
+    }
     return 0;
 }
 
@@ -517,19 +525,30 @@ static inline bool vtd_context_entry_present(VTDContextEntry *context)
     return context->lo & VTD_CONTEXT_ENTRY_P;
 }
 
-static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
-                                           VTDContextEntry *ce)
+static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
+                 VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
 {
-    dma_addr_t addr;
+    dma_addr_t addr, ce_size;
 
     /* we have checked that root entry is present */
-    addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
-    if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
+    ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
+    addr = (s->ecs && (index > 0x7f)) ?
+           ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) :
+           ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
+
+    if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) {
         trace_vtd_re_invalid(root->rsvd, root->val);
         return -VTD_FR_CONTEXT_TABLE_INV;
     }
-    ce->lo = le64_to_cpu(ce->lo);
-    ce->hi = le64_to_cpu(ce->hi);
+
+    ce[0].lo = le64_to_cpu(ce[0].lo);
+    ce[0].hi = le64_to_cpu(ce[0].hi);
+
+    if (s->ecs) {
+        ce[1].lo = le64_to_cpu(ce[1].lo);
+        ce[1].hi = le64_to_cpu(ce[1].hi);
+    }
+
     return 0;
 }
 
@@ -595,9 +614,11 @@ static inline uint32_t vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
     return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
 }
 
-static inline uint32_t vtd_ce_get_type(VTDContextEntry *ce)
+static inline uint32_t vtd_ce_get_type(IntelIOMMUState *s,
+                                       VTDContextEntry *ce)
 {
-    return ce->lo & VTD_CONTEXT_ENTRY_TT;
+    return s->ecs ? (ce->lo & VTD_EXT_CONTEXT_ENTRY_TT) :
+                    (ce->lo & VTD_CONTEXT_ENTRY_TT);
 }
 
 static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
@@ -842,16 +863,20 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
         return ret_fr;
     }
 
-    if (!vtd_root_entry_present(&re)) {
+    if (!vtd_root_entry_present(&re) ||
+        (s->ecs && (devfn > 0x7f) && (!vtd_root_entry_upper_present(&re)))) {
         /* Not error - it's okay we don't have root entry. */
         trace_vtd_re_not_present(bus_num);
         return -VTD_FR_ROOT_ENTRY_P;
-    } else if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) {
-        trace_vtd_re_invalid(re.rsvd, re.val);
-        return -VTD_FR_ROOT_ENTRY_RSVD;
+    }
+    if ((s->ecs && (devfn > 0x7f) && (re.rsvd & VTD_ROOT_ENTRY_RSVD)) ||
+        (s->ecs && (devfn < 0x80) && (re.val & VTD_ROOT_ENTRY_RSVD)) ||
+        ((!s->ecs) && (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)))) {
+            trace_vtd_re_invalid(re.rsvd, re.val);
+            return -VTD_FR_ROOT_ENTRY_RSVD;
     }
 
-    ret_fr = vtd_get_context_entry_from_root(&re, devfn, ce);
+    ret_fr = vtd_get_context_entry_from_root(s, &re, devfn, ce);
     if (ret_fr) {
         return ret_fr;
     }
@@ -860,21 +885,36 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
         /* Not error - it's okay we don't have context entry. */
         trace_vtd_ce_not_present(bus_num, devfn);
         return -VTD_FR_CONTEXT_ENTRY_P;
-    } else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
-               (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
+    }
+
+    /* Check Reserved bits in context-entry */
+    if ((!s->ecs && (ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI)) ||
+        (!s->ecs && (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) ||
+        (s->ecs && (ce[0].lo & VTD_EXT_CONTEXT_ENTRY_RSVD_LOW0)) ||
+        (s->ecs && (ce[0].hi & VTD_EXT_CONTEXT_ENTRY_RSVD_HIGH0)) ||
+        (s->ecs && (ce[1].lo & VTD_EXT_CONTEXT_ENTRY_RSVD_LOW1))) {
         trace_vtd_ce_invalid(ce->hi, ce->lo);
         return -VTD_FR_CONTEXT_ENTRY_RSVD;
     }
+
     /* Check if the programming of context-entry is valid */
     if (!vtd_is_level_supported(s, vtd_get_level_from_context_entry(ce))) {
         trace_vtd_ce_invalid(ce->hi, ce->lo);
         return -VTD_FR_CONTEXT_ENTRY_INV;
     } else {
-        switch (vtd_ce_get_type(ce)) {
+        switch (vtd_ce_get_type(s, ce)) {
         case VTD_CONTEXT_TT_MULTI_LEVEL:
             /* fall through */
         case VTD_CONTEXT_TT_DEV_IOTLB:
             break;
+        case VTD_EXT_CONTEXT_TT_NO_DEV_IOTLB:
+        case VTD_EXT_CONTEXT_TT_DEV_IOTLB:
+            if (s->ecs) {
+                break;
+            } else {
+                trace_vtd_ce_invalid(ce->hi, ce->lo);
+                return -VTD_FR_CONTEXT_ENTRY_INV;
+            }
         case VTD_CONTEXT_TT_PASS_THROUGH:
             if (s->ecap & VTD_ECAP_PT) {
                 break;
@@ -894,18 +934,18 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
 static int vtd_dev_get_trans_type(VTDAddressSpace *as)
 {
     IntelIOMMUState *s;
-    VTDContextEntry ce;
+    VTDContextEntry ce[2];
     int ret;
 
     s = as->iommu_state;
 
     ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus),
-                                   as->devfn, &ce);
+                                   as->devfn, &ce[0]);
     if (ret) {
         return ret;
     }
 
-    return vtd_ce_get_type(&ce);
+    return vtd_ce_get_type(s, &ce[0]);
 }
 
 static bool vtd_dev_pt_enabled(VTDAddressSpace *as)
@@ -1008,7 +1048,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
                                    IOMMUTLBEntry *entry)
 {
     IntelIOMMUState *s = vtd_as->iommu_state;
-    VTDContextEntry ce;
+    VTDContextEntry ce[2];
     uint8_t bus_num = pci_bus_num(bus);
     VTDContextCacheEntry *cc_entry = &vtd_as->context_cache_entry;
     uint64_t slpte, page_mask;
@@ -1039,14 +1079,16 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     }
     /* Try to fetch context-entry from cache first */
     if (cc_entry->context_cache_gen == s->context_cache_gen) {
-        trace_vtd_iotlb_cc_hit(bus_num, devfn, cc_entry->context_entry.hi,
-                               cc_entry->context_entry.lo,
+        trace_vtd_iotlb_cc_hit(bus_num, devfn,
+                               cc_entry->context_entry[0].hi,
+                               cc_entry->context_entry[0].lo,
                                cc_entry->context_cache_gen);
-        ce = cc_entry->context_entry;
-        is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
+        ce[0] = cc_entry->context_entry[0];
+        ce[1] = cc_entry->context_entry[1];
+        is_fpd_set = ce[0].lo & VTD_CONTEXT_ENTRY_FPD;
     } else {
-        ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
-        is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
+        ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce[0]);
+        is_fpd_set = ce[0].lo & VTD_CONTEXT_ENTRY_FPD;
         if (ret_fr) {
             ret_fr = -ret_fr;
             if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
@@ -1057,10 +1099,11 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
             return;
         }
         /* Update context-cache */
-        trace_vtd_iotlb_cc_update(bus_num, devfn, ce.hi, ce.lo,
+        trace_vtd_iotlb_cc_update(bus_num, devfn, ce[0].hi, ce[0].lo,
                                   cc_entry->context_cache_gen,
                                   s->context_cache_gen);
-        cc_entry->context_entry = ce;
+        cc_entry->context_entry[0] = ce[0];
+        cc_entry->context_entry[1] = ce[1];
         cc_entry->context_cache_gen = s->context_cache_gen;
     }
 
@@ -1068,7 +1111,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
      * We don't need to translate for pass-through context entries.
      * Also, let's ignore IOTLB caching as well for PT devices.
      */
-    if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
+    if (vtd_ce_get_type(s, &ce[0]) == VTD_CONTEXT_TT_PASS_THROUGH) {
         entry->translated_addr = entry->iova;
         entry->addr_mask = VTD_PAGE_SIZE - 1;
         entry->perm = IOMMU_RW;
@@ -1076,7 +1119,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         return;
     }
 
-    ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level,
+    ret_fr = vtd_iova_to_slpte(&ce[0], addr, is_write, &slpte, &level,
                                &reads, &writes);
     if (ret_fr) {
         ret_fr = -ret_fr;
@@ -1089,7 +1132,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     }
 
     page_mask = vtd_slpt_level_page_mask(level);
-    vtd_update_iotlb(s, source_id, VTD_CONTEXT_ENTRY_DID(ce.hi), addr, slpte,
+    vtd_update_iotlb(s, source_id, VTD_CONTEXT_ENTRY_DID(ce[0].hi), addr, slpte,
                      reads, writes, level);
 out:
     entry->iova = addr & page_mask;
@@ -1283,7 +1326,7 @@ static void vtd_iotlb_global_invalidate(IntelIOMMUState *s)
 static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
 {
     IntelIOMMUNotifierNode *node;
-    VTDContextEntry ce;
+    VTDContextEntry ce[2];
     VTDAddressSpace *vtd_as;
 
     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_domain,
@@ -1292,8 +1335,8 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
     QLIST_FOREACH(node, &s->notifiers_list, next) {
         vtd_as = node->vtd_as;
         if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
-                                      vtd_as->devfn, &ce) &&
-            domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
+                                      vtd_as->devfn, &ce[0]) &&
+            domain_id == VTD_CONTEXT_ENTRY_DID(ce[0].hi)) {
             memory_region_iommu_replay_all(&vtd_as->iommu);
         }
     }
@@ -1311,15 +1354,15 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                                            uint8_t am)
 {
     IntelIOMMUNotifierNode *node;
-    VTDContextEntry ce;
+    VTDContextEntry ce[2];
     int ret;
 
     QLIST_FOREACH(node, &(s->notifiers_list), next) {
         VTDAddressSpace *vtd_as = node->vtd_as;
         ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
-                                       vtd_as->devfn, &ce);
-        if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
-            vtd_page_walk(&ce, addr, addr + (1 << am) * VTD_PAGE_SIZE,
+                                       vtd_as->devfn, &ce[0]);
+        if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce[0].hi)) {
+            vtd_page_walk(&ce[0], addr, addr + (1 << am) * VTD_PAGE_SIZE,
                           vtd_page_invalidate_notify_hook,
                           (void *)&vtd_as->iommu, true);
         }
@@ -2858,7 +2901,7 @@ static void vtd_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
     VTDAddressSpace *vtd_as = container_of(mr, VTDAddressSpace, iommu);
     IntelIOMMUState *s = vtd_as->iommu_state;
     uint8_t bus_n = pci_bus_num(vtd_as->bus);
-    VTDContextEntry ce;
+    VTDContextEntry ce[2];
 
     /*
      * The replay can be triggered by either a invalidation or a newly
@@ -2867,12 +2910,12 @@ static void vtd_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
      */
     vtd_address_space_unmap(vtd_as, n);
 
-    if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
+    if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce[0]) == 0) {
         trace_vtd_replay_ce_valid(bus_n, PCI_SLOT(vtd_as->devfn),
                                   PCI_FUNC(vtd_as->devfn),
-                                  VTD_CONTEXT_ENTRY_DID(ce.hi),
-                                  ce.hi, ce.lo);
-        vtd_page_walk(&ce, 0, ~0ULL, vtd_replay_hook, (void *)n, false);
+                                  VTD_CONTEXT_ENTRY_DID(ce[0].hi),
+                                  ce[0].hi, ce[0].lo);
+        vtd_page_walk(&ce[0], 0, ~0ULL, vtd_replay_hook, (void *)n, false);
     } else {
         trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
                                     PCI_FUNC(vtd_as->devfn));
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index ec1bd17..71a1c1e 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -425,6 +425,15 @@ typedef struct VTDRootEntry VTDRootEntry;
 
 #define VTD_CONTEXT_ENTRY_NR        (VTD_PAGE_SIZE / sizeof(VTDContextEntry))
 
+/* Definition for Extended Context */
+#define VTD_EXT_CONTEXT_ENTRY_RSVD_LOW0   (~(VTD_HAW_MASK))
+#define VTD_EXT_CONTEXT_ENTRY_RSVD_HIGH0  0xF0000000ULL
+#define VTD_EXT_CONTEXT_ENTRY_RSVD_LOW1   ((~(VTD_HAW_MASK)) | 0xFF0ULL)
+#define VTD_EXT_CONTEXT_ENTRY_RSVD_HIGH1  ((~(VTD_HAW_MASK)) | 0xFFFULL)
+#define VTD_EXT_CONTEXT_ENTRY_TT          (7ULL << 2)
+#define VTD_EXT_CONTEXT_TT_NO_DEV_IOTLB   (4ULL << 2)
+#define VTD_EXT_CONTEXT_TT_DEV_IOTLB      (5ULL << 2)
+
 /* Paging Structure common */
 #define VTD_SL_PT_PAGE_SIZE_MASK    (1ULL << 7)
 /* Bits to decide the offset for each level */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index fa5963e..ae21fe5 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -76,7 +76,7 @@ struct VTDContextCacheEntry {
      * context_cache_gen!=IntelIOMMUState.context_cache_gen
      */
     uint32_t context_cache_gen;
-    struct VTDContextEntry context_entry;
+    struct VTDContextEntry context_entry[2];
 };
 
 struct VTDAddressSpace {
-- 
1.9.1

* [RFC PATCH 03/20] intel_iommu: add "svm" option
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

Expose "Shared Virtual Memory" to guest by using "svm" option.
Also use "svm" to expose SVM related capabilities to guest.
e.g. "-device intel-iommu, svm=on"

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 10 ++++++++++
 hw/i386/intel_iommu_internal.h |  5 +++++
 include/hw/i386/intel_iommu.h  |  1 +
 3 files changed, 16 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index bf98fa5..ba1e7eb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
     DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
     DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
     DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
+    DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
         s->ecap |= VTD_ECAP_ECS;
     }
 
+    if (s->svm) {
+        if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
+            error_report("Need to set ecs, pt, caching-mode for svm");
+            exit(1);
+        }
+        s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
+        s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
+    }
+
     if (s->caching_mode) {
         s->cap |= VTD_CAP_CM;
     }
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 71a1c1e..f2a7d12 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -191,6 +191,9 @@
 #define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_MHMV               (15ULL << 20)
 #define VTD_ECAP_ECS                (1ULL << 24)
+#define VTD_ECAP_PASID28            (1ULL << 28)
+#define VTD_ECAP_PRS                (1ULL << 29)
+#define VTD_ECAP_PTS                (0xeULL << 35)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
@@ -207,6 +210,8 @@
 #define VTD_CAP_PSI                 (1ULL << 39)
 #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
 #define VTD_CAP_CM                  (1ULL << 7)
+#define VTD_CAP_DWD                 (1ULL << 54)
+#define VTD_CAP_DRD                 (1ULL << 55)
 
 /* Supported Adjusted Guest Address Widths */
 #define VTD_CAP_SAGAW_SHIFT         8
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index ae21fe5..8981615 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -267,6 +267,7 @@ struct IntelIOMMUState {
 
     bool caching_mode;          /* RO - is cap CM enabled? */
     bool ecs;                       /* Extended Context Support */
+    bool svm;                       /* Shared Virtual Memory */
 
     dma_addr_t root;                /* Current root table pointer */
     bool root_extended;             /* Type of root table (extended or not) */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 04/20] Memory: modify parameter in IOMMUNotifier func
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

This patch changes the parameter of the IOMMUNotifier callback to use
"void *data" instead of "IOMMUTLBEntry *". This makes it possible to
extend the framework with notifiers other than MAP/UNMAP.
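
As a hedged sketch (not part of this patch), a handler written against
the new signature casts the opaque pointer according to the notifier
flags it registered with; existing MAP/UNMAP handlers keep casting to
IOMMUTLBEntry, exactly as the vfio and vhost hunks below do:

    /* Hypothetical MAP/UNMAP handler under the new void * contract. */
    static void example_iommu_map_notify(IOMMUNotifier *n, void *data)
    {
        IOMMUTLBEntry *iotlb = (IOMMUTLBEntry *)data;

        if ((iotlb->perm & IOMMU_RW) == IOMMU_NONE) {
            /* unmap: tear down shadow mappings covering iotlb->iova */
        } else {
            /* map: build a mapping for iotlb->iova .. iova + addr_mask */
        }
    }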

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/vfio/common.c      | 3 ++-
 hw/virtio/vhost.c     | 3 ++-
 include/exec/memory.h | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6b33b9f..14473f1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -332,10 +332,11 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
     return true;
 }
 
-static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
 {
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
     VFIOContainer *container = giommu->container;
+    IOMMUTLBEntry *iotlb = (IOMMUTLBEntry *)data;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     bool read_only;
     void *vaddr;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ccf8b2e..fd20fd0 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1161,9 +1161,10 @@ static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
     event_notifier_cleanup(&vq->masked_notifier);
 }
 
-static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vhost_iommu_unmap_notify(IOMMUNotifier *n, void *data)
 {
     struct vhost_dev *hdev = container_of(n, struct vhost_dev, n);
+    IOMMUTLBEntry *iotlb = (IOMMUTLBEntry *)data;
 
     if (hdev->vhost_ops->vhost_invalidate_device_iotlb(hdev,
                                                        iotlb->iova,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 267f399..1faca3b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -81,7 +81,7 @@ typedef enum {
 
 struct IOMMUNotifier;
 typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
-                            IOMMUTLBEntry *data);
+                            void *data);
 
 struct IOMMUNotifier {
     IOMMUNotify notify;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 05/20] VFIO: add new IOCTL for svm bind tasks
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

Add a new IOCTL cmd VFIO_IOMMU_SVM_BIND_TASK attached to container->fd.

On VT-d, this IOCTL cmd is used to link the guest PASID page table to
the host. For other vendors, it may also be used to support other kinds
of SVM bind requests. There was a previous discussion about this with
an ARM engineer; it can be found via the link below. This IOCTL cmd may
support SVM PASID bind requests from a userspace driver, or page table
(cr3) bind requests from a guest. These SVM bind requests are
distinguished by different flags, e.g. VFIO_SVM_BIND_PASID is added to
support PASID bind from a userspace driver, and VFIO_SVM_BIND_PGTABLE
is added to support page table bind from a guest.

https://patchwork.kernel.org/patch/9594231/
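
For illustration, a user of this IOCTL (e.g. the QEMU side added later
in this series) fills the variable-length structure and issues it on the
container fd roughly as follows. This is only a sketch: the payload
behind data[] is vendor-specific and defined by the kernel-side series,
and pasidt_blob/blob_size are placeholders:

    static int example_bind_pasid_table(VFIOContainer *container,
                                        void *pasidt_blob,
                                        uint32_t blob_size)
    {
        struct vfio_device_svm *svm;
        int argsz = sizeof(*svm) + blob_size;
        int ret;

        svm = g_malloc0(argsz);
        svm->argsz  = argsz;
        svm->flags  = VFIO_SVM_BIND_PASIDTBL;
        svm->length = blob_size;
        memcpy(svm->data, pasidt_blob, blob_size);

        ret = ioctl(container->fd, VFIO_IOMMU_SVM_BIND_TASK, svm);
        g_free(svm);
        return ret;
    }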

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 linux-headers/linux/vfio.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 759b850..9848d63 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -537,6 +537,24 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/* IOCTL for Shared Virtual Memory Bind */
+struct vfio_device_svm {
+	__u32	argsz;
+#define VFIO_SVM_BIND_PASIDTBL	(1 << 0) /* Bind PASID Table */
+#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
+#define VFIO_SVM_BIND_PGTABLE	(1 << 2) /* Bind guest mmu page table */
+	__u32	flags;
+	__u32	length;
+	__u8	data[];
+};
+
+#define VFIO_SVM_TYPE_MASK	(VFIO_SVM_BIND_PASIDTBL | \
+				VFIO_SVM_BIND_PASID | \
+				VFIO_SVM_BIND_PGTABLE )
+
+#define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 06/20] VFIO: add new notifier for binding PASID table
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

This patch includes the following items:

* add vfio_register_notifier() for vfio notifier initialization
* add new notifier flag IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4
* add vfio_iommu_bind_pasid_tbl_notify() to link guest pasid table
  to host

This patch doesn't register the new notifier in the vfio memory region
listener's region_add callback. The reason is as follows:

On VT-d, when the virtual intel_iommu is exposed to the guest, the vfio
memory listener listens to address_space_memory. When the guest Intel
IOMMU driver enables address translation, the vfio memory listener may
switch to listen to vtd_address_space. But there is a special case: if
the virtual intel_iommu reports ecap.PT=1 to the guest and the guest
Intel IOMMU driver sets "pt" mode for the assigned device, the vfio
memory listener keeps listening to address_space_memory so that the
pIOMMU still has the GPA->HPA mapping, and region_add is not triggered.
The newly added notifier, however, needs to be registered as soon as
the virtual intel_iommu is exposed to the guest.
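
For reference, a minimal sketch (not part of this patch) of how a vIOMMU
emulator could fire this notifier; the intel_iommu side of this series
adds the real trigger, and the list/field names below assume the
IOMMUNotifier/MemoryRegion layout of the QEMU base this RFC builds on:

    /* Hypothetical: hand a vendor-specific PASID table blob to every
     * notifier that registered for PASID-table binding on this region. */
    static void example_fire_pasidt_bind(MemoryRegion *mr,
                                         uint8_t *pasidt_blob,
                                         uint64_t blob_size)
    {
        IOMMUNotifier *notifier;
        IOMMUNotifierData data = {
            .payload      = pasidt_blob,
            .payload_size = blob_size,
        };

        QLIST_FOREACH(notifier, &mr->iommu_notify, node) {
            if (notifier->notifier_flags & IOMMU_NOTIFIER_SVM_PASIDT_BIND) {
                notifier->notify(notifier, &data);
            }
        }
    }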

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/vfio/common.c              | 37 +++++++++++++++++++++++-------
 hw/vfio/pci.c                 | 53 ++++++++++++++++++++++++++++++++++++++++++-
 include/exec/memory.h         |  8 +++++++
 include/hw/vfio/vfio-common.h |  5 ++++
 4 files changed, 94 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 14473f1..e270255 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -294,6 +294,25 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
            section->offset_within_address_space & (1ULL << 63);
 }
 
+VFIOGuestIOMMU *vfio_register_notifier(VFIOContainer *container,
+                                       MemoryRegion *mr,
+                                       hwaddr offset,
+                                       IOMMUNotifier *n)
+{
+    VFIOGuestIOMMU *giommu;
+
+    giommu = g_malloc0(sizeof(*giommu));
+    giommu->iommu = mr;
+    giommu->iommu_offset = offset;
+    giommu->container = container;
+    giommu->n = *n;
+
+    QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+    memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+
+    return giommu;
+}
+
 /* Called with rcu_read_lock held.  */
 static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
                            bool *read_only)
@@ -466,6 +485,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
 
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
+        IOMMUNotifier n;
+        hwaddr iommu_offset;
 
         trace_vfio_listener_region_add_iommu(iova, end);
         /*
@@ -474,21 +495,21 @@ static void vfio_listener_region_add(MemoryListener *listener,
          * would be the right place to wire that up (tell the KVM
          * device emulation the VFIO iommu handles to use).
          */
-        giommu = g_malloc0(sizeof(*giommu));
-        giommu->iommu = section->mr;
-        giommu->iommu_offset = section->offset_within_address_space -
-                               section->offset_within_region;
-        giommu->container = container;
+        iommu_offset = section->offset_within_address_space -
+                       section->offset_within_region;
         llend = int128_add(int128_make64(section->offset_within_region),
                            section->size);
         llend = int128_sub(llend, int128_one());
-        iommu_notifier_init(&giommu->n, vfio_iommu_map_notify,
+        iommu_notifier_init(&n, vfio_iommu_map_notify,
                             IOMMU_NOTIFIER_ALL,
                             section->offset_within_region,
                             int128_get64(llend));
-        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
-        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+        giommu = vfio_register_notifier(container,
+                                        section->mr,
+                                        iommu_offset,
+                                        &n);
+
         memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
 
         return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d..9e13472 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2594,11 +2594,38 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static void vfio_iommu_bind_pasid_tbl_notify(IOMMUNotifier *n, void *data)
+{
+    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+    VFIOContainer *container = giommu->container;
+    IOMMUNotifierData *iommu_data = (IOMMUNotifierData *) data;
+    struct vfio_device_svm *vfio_svm;
+    int argsz;
+
+    argsz = sizeof(*vfio_svm) + iommu_data->payload_size;
+    vfio_svm = g_malloc0(argsz);
+    vfio_svm->argsz = argsz;
+    vfio_svm->flags = VFIO_SVM_BIND_PASIDTBL;
+    vfio_svm->length = iommu_data->payload_size;
+    memcpy(&vfio_svm->data, iommu_data->payload,
+                         iommu_data->payload_size);
+
+    rcu_read_lock();
+    if (ioctl(container->fd, VFIO_IOMMU_SVM_BIND_TASK, vfio_svm) != 0) {
+        error_report("vfio_iommu_bind_pasid_tbl_notify:"
+                     " bind failed, container: %p", container);
+    }
+    rcu_read_unlock();
+    g_free(vfio_svm);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIODevice *vbasedev_iter;
     VFIOGroup *group;
+    AddressSpace *as;
+    MemoryRegion *subregion;
     char *tmp, group_path[PATH_MAX], *group_name;
     Error *err = NULL;
     ssize_t len;
@@ -2650,7 +2677,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
     trace_vfio_realize(vdev->vbasedev.name, groupid);
 
-    group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);
+    as = pci_device_iommu_address_space(pdev);
+    group = vfio_get_group(groupid, as, errp);
     if (!group) {
         goto error;
     }
@@ -2833,6 +2861,29 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
 
+    /* Check if vIOMMU exists */
+    QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
+        if (memory_region_is_iommu(subregion)) {
+            IOMMUNotifier n1;
+
+            /*
+             FIXME: the current iommu notifier framework is designed
+             for IOMMUTLB MAP/UNMAP. However, the vIOMMU emulator may
+             need notifiers other than MAP/UNMAP, so it would be
+             better to split the non-IOMMUTLB notifiers from the
+             current IOMMUTLB notifier framework.
+             */
+            iommu_notifier_init(&n1, vfio_iommu_bind_pasid_tbl_notify,
+                                IOMMU_NOTIFIER_SVM_PASIDT_BIND,
+                                0,
+                                0);
+            vfio_register_notifier(group->container,
+                                   subregion,
+                                   0,
+                                   &n1);
+        }
+    }
+
     return;
 
 out_teardown:
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1faca3b..d2f24cc 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -65,6 +65,12 @@ struct IOMMUTLBEntry {
     IOMMUAccessFlags perm;
 };
 
+struct IOMMUNotifierData {
+    uint64_t payload_size;
+    uint8_t *payload;
+};
+typedef struct IOMMUNotifierData IOMMUNotifierData;
+
 /*
  * Bitmap for different IOMMUNotifier capabilities. Each notifier can
  * register with one or multiple IOMMU Notifier capability bit(s).
@@ -75,6 +81,8 @@ typedef enum {
     IOMMU_NOTIFIER_UNMAP = 0x1,
     /* Notify entry changes (newly created entries) */
     IOMMU_NOTIFIER_MAP = 0x2,
+    /* Notify PASID Table Binding */
+    IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4,
 } IOMMUNotifierFlag;
 
 #define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c582de1..195795c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -160,6 +160,11 @@ void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp);
 
+VFIOGuestIOMMU *vfio_register_notifier(VFIOContainer *container,
+                                       MemoryRegion *mr,
+                                       hwaddr offset,
+                                       IOMMUNotifier *n);
+
 extern const MemoryRegionOps vfio_region_ops;
 extern QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list;
 extern QLIST_HEAD(vfio_as_head, VFIOAddressSpace) vfio_address_spaces;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 07/20] VFIO: check notifier flag in region_del()
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

This patch adds a flag check when unregistering the MAP/UNMAP notifier
in region_del(). The MAP/UNMAP notifier is unregistered when the iommu
memory region is deleted; the check avoids unregistering other
notifiers by mistake.

Peter Xu's intel_iommu enhancement series introduced dynamic switching
of the IOMMU region. If an assigned device switches to "pt", the IOMMU
region is deleted and the MAP/UNMAP notifier is unregistered. In some
cases, however, the other notifiers are still wanted, e.g. if a user
decides to use vSVM for the assigned device after the switch, the pasid
table bind notifier is still needed. The newly added pasid table bind
notifier is instead unregistered in vfio_disconnect_container(). The
link below points to Peter's dynamic switch patch.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg444462.html

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/vfio/common.c      | 5 +++--
 include/exec/memory.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e270255..719de61 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -501,7 +501,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
                            section->size);
         llend = int128_sub(llend, int128_one());
         iommu_notifier_init(&n, vfio_iommu_map_notify,
-                            IOMMU_NOTIFIER_ALL,
+                            IOMMU_NOTIFIER_MAP_UNMAP,
                             section->offset_within_region,
                             int128_get64(llend));
 
@@ -578,7 +578,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
 
         QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
             if (giommu->iommu == section->mr &&
-                giommu->n.start == section->offset_within_region) {
+                giommu->n.start == section->offset_within_region &&
+                giommu->n.notifier_flags & IOMMU_NOTIFIER_MAP_UNMAP) {
                 memory_region_unregister_iommu_notifier(giommu->iommu,
                                                         &giommu->n);
                 QLIST_REMOVE(giommu, giommu_next);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index d2f24cc..7bd13ab 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -85,7 +85,7 @@ typedef enum {
     IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4,
 } IOMMUNotifierFlag;
 
-#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
+#define IOMMU_NOTIFIER_MAP_UNMAP (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
 
 struct IOMMUNotifier;
 typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 08/20] Memory: add notifier flag check in memory_replay()
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

memory_region_iommu_replay() is used to replay mappings with a MAP/UNMAP
notifier. However, other notifier types may be passed in, so add a check
against the notifier flags to avoid potential errors. For example,
memory_region_iommu_replay_all() loops over all registered notifiers and
may pass in a notifier of the wrong type.
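
The calling pattern that motivates the check, modelled on the existing
memory_region_iommu_replay_all() (list/field names assume the QEMU base
this series builds on):

    /* Every registered notifier gets replayed, including ones that
     * registered only for IOMMU_NOTIFIER_SVM_PASIDT_BIND, so replay
     * itself must bail out for non-MAP/UNMAP notifiers. */
    IOMMUNotifier *notifier;

    QLIST_FOREACH(notifier, &mr->iommu_notify, node) {
        memory_region_iommu_replay(mr, notifier, false);
    }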

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/memory.c b/memory.c
index 9c253cc..0728e62 100644
--- a/memory.c
+++ b/memory.c
@@ -1630,6 +1630,14 @@ void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
     hwaddr addr, granularity;
     IOMMUTLBEntry iotlb;
 
+    if (!(n->notifier_flags & IOMMU_NOTIFIER_MAP_UNMAP)) {
+        /* If notifier flag is not IOMMU_NOTIFIER_UNMAP or
+         * IOMMU_NOTIFIER_MAP, return. This check is necessary
+         * as there are notifiers other than MAP/UNMAP
+         */
+        return;
+    }
+
     /* If the IOMMU has its own replay callback, override */
     if (mr->iommu_ops->replay) {
         mr->iommu_ops->replay(mr, n);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

With a vIOMMU exposed to the guest, the vIOMMU emulator needs to
translate between host and guest. e.g. for a device-selective TLB
flush, the vIOMMU emulator needs to replace the guest SID with the host
SID in order to limit the scope of the invalidation. This patch
introduces a new callback, iommu_ops->record_device(), to let the
vIOMMU emulator record the necessary information about the assigned
device.
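
As a usage sketch, the owner of the assigned device (the VFIO code, in a
later patch of this series) is expected to call the hook through the
wrapper added below; 'mr' and 'vdev' here are hypothetical:

    /* 'mr' is the vIOMMU memory region translating for the assigned
     * device, 'vdev' a VFIOPCIDevice whose host address we record. */
    if (memory_region_is_iommu(mr)) {
        memory_region_notify_device_record(mr, &vdev->host);
    }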

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 include/exec/memory.h | 11 +++++++++++
 memory.c              | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7bd13ab..49087ef 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
                                 IOMMUNotifierFlag new_flags);
     /* Set this up to provide customized IOMMU replay function */
     void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
+    void (*record_device)(MemoryRegion *iommu,
+                          void *device_info);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -708,6 +710,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 void memory_region_notify_one(IOMMUNotifier *notifier,
                               IOMMUTLBEntry *entry);
 
+/*
+ * memory_region_notify_device_record: notify the IOMMU to record an
+ * assigned device.
+ * @mr: the memory region to notify
+ * @info: device information
+ */
+void memory_region_notify_device_record(MemoryRegion *mr,
+                                        void *info);
+
 /**
  * memory_region_register_iommu_notifier: register a notifier for changes to
  * IOMMU translation entries.
diff --git a/memory.c b/memory.c
index 0728e62..45ef069 100644
--- a/memory.c
+++ b/memory.c
@@ -1600,6 +1600,18 @@ static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
     mr->iommu_notify_flags = flags;
 }
 
+void memory_region_notify_device_record(MemoryRegion *mr,
+                                        void *info)
+{
+    assert(memory_region_is_iommu(mr));
+
+    if (mr->iommu_ops->record_device) {
+        mr->iommu_ops->record_device(mr, info);
+    }
+
+    return;
+}
+
 void memory_region_register_iommu_notifier(MemoryRegion *mr,
                                            IOMMUNotifier *n)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 10/20] VFIO: notify vIOMMU emulator when device is assigned
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

With a vIOMMU exposed to the guest, notify the vIOMMU emulator to record
information about the assigned device. This patch uses the
iommu_ops->record_device hook to record the host bus/slot/function of
the device. In the future, it can be extended to pass other needed
information.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/vfio/pci.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9e13472..a1e6942 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2881,6 +2881,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                                    subregion,
                                    0,
                                    &n1);
+
+            memory_region_notify_device_record(subregion,
+                                               &vdev->host);
+
         }
     }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 11/20] intel_iommu: provide iommu_ops->record_device
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

This patch provides the iommu_ops->record_device implementation for
intel_iommu. It records the host SID in the IntelIOMMUNotifierNode for
later virtualization usage, e.g. guest SID -> host SID translation when
propagating first-level cache invalidations from guest to host.
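
For clarity, the host_sid packing below follows the usual PCI
requester-id layout; a small worked example (host address made up):

    host device 02:05.1 -> host_sid = (0x02 << 8) | (0x05 << 3) | 0x1
                                    = 0x0229
    (sid[15:8] = bus, sid[7:3] = device/slot, sid[2:0] = function)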

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c         | 19 +++++++++++++++++++
 include/hw/i386/intel_iommu.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ba1e7eb..0c412d2 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2407,6 +2407,24 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
     }
 }
 
+static void vtd_iommu_record_device(MemoryRegion *iommu,
+                                    void *device_info)
+{
+    VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    IntelIOMMUNotifierNode *node = NULL;
+    IntelIOMMUNotifierNode *next_node = NULL;
+    PCIHostDeviceAddress *host = (PCIHostDeviceAddress *) device_info;
+
+    QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
+        if (node->vtd_as == vtd_as) {
+            node->host_sid = ((host->bus & 0xffUL) << 8)
+                           | ((host->slot & 0x1f) << 3)
+                           | (host->function & 0x7);
+        }
+    }
+}
+
 static const VMStateDescription vtd_vmstate = {
     .name = "iommu-intel",
     .version_id = 1,
@@ -2940,6 +2958,7 @@ static void vtd_init(IntelIOMMUState *s)
     s->iommu_ops.translate = vtd_iommu_translate;
     s->iommu_ops.notify_flag_changed = vtd_iommu_notify_flag_changed;
     s->iommu_ops.replay = vtd_iommu_replay;
+    s->iommu_ops.record_device = vtd_iommu_record_device;
     s->root = 0;
     s->root_extended = false;
     s->dmar_enabled = false;
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 8981615..a4ce5c3 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -252,6 +252,7 @@ struct VTD_MSIMessage {
 
 struct IntelIOMMUNotifierNode {
     VTDAddressSpace *vtd_as;
+    uint16_t host_sid;
     QLIST_ENTRY(IntelIOMMUNotifierNode) next;
 };
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [Qemu-devel] [RFC PATCH 11/20] intel_iommu: provide iommu_ops->record_device
@ 2017-04-26 10:06   ` Liu, Yi L
  0 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

This patch provides iommu_ops->record_device implementation for
intel_iommu. It records the host sid in the IntelIOMMUNotifierNode for
further virtualization usage. e.g. guest sid -> host sid translation
during propagating 1st level cache invalidation from guest to host.

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c         | 19 +++++++++++++++++++
 include/hw/i386/intel_iommu.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ba1e7eb..0c412d2 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2407,6 +2407,24 @@ static void vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
     }
 }
 
+static void vtd_iommu_record_device(MemoryRegion *iommu,
+                                    void *device_info)
+{
+    VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    IntelIOMMUNotifierNode *node = NULL;
+    IntelIOMMUNotifierNode *next_node = NULL;
+    PCIHostDeviceAddress *host = (PCIHostDeviceAddress *) device_info;
+
+    QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
+        if (node->vtd_as == vtd_as) {
+            node->host_sid = ((host->bus & 0xffUL) << 8)
+                           | ((host->slot & 0x1f) << 3)
+                           | (host->function & 0x7);
+        }
+    }
+}
+
 static const VMStateDescription vtd_vmstate = {
     .name = "iommu-intel",
     .version_id = 1,
@@ -2940,6 +2958,7 @@ static void vtd_init(IntelIOMMUState *s)
     s->iommu_ops.translate = vtd_iommu_translate;
     s->iommu_ops.notify_flag_changed = vtd_iommu_notify_flag_changed;
     s->iommu_ops.replay = vtd_iommu_replay;
+    s->iommu_ops.record_device = vtd_iommu_record_device;
     s->root = 0;
     s->root_extended = false;
     s->dmar_enabled = false;
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 8981615..a4ce5c3 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -252,6 +252,7 @@ struct VTD_MSIMessage {
 
 struct IntelIOMMUNotifierNode {
     VTDAddressSpace *vtd_as;
+    uint16_t host_sid;
     QLIST_ENTRY(IntelIOMMUNotifierNode) next;
 };
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

Add a separate function to fire the PASID table bind notifier. In the
future there may be more PASID bind types with different granularity,
e.g. binding a single PASID entry instead of the whole PASID table. That
can be supported by adding a bind_type, checking it in the fire function,
and triggering the corresponding notifier.
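
A minimal usage sketch (illustrative only; the real call site is added by a
later patch in this series, and pasidt_info/opaque_len are placeholders):

    IOMMUNotifierData iommu_data = {
        .payload      = (uint8_t *)pasidt_info,
        .payload_size = sizeof(*pasidt_info) + opaque_len,
    };

    /* wakes any notifier registered with IOMMU_NOTIFIER_SVM_PASIDT_BIND */
    memory_region_notify_iommu_svm_bind(&vtd_as->iommu, &iommu_data);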

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 include/exec/memory.h | 11 +++++++++++
 memory.c              | 21 +++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 49087ef..3b8f487 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -695,6 +695,17 @@ uint64_t memory_region_iommu_get_min_page_size(MemoryRegion *mr);
 void memory_region_notify_iommu(MemoryRegion *mr,
                                 IOMMUTLBEntry entry);
 
+/*
+ * memory_region_notify_iommu_svm_bind notify SVM bind
+ * request from vIOMMU emulator.
+ *
+ * @mr: the memory region of IOMMU
+ * @data: IOMMU SVM data
+ */
+void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
+                                         void *data);
+
+
 /**
  * memory_region_notify_one: notify a change in an IOMMU translation
  *                           entry to a single notifier
diff --git a/memory.c b/memory.c
index 45ef069..ce0b0ff 100644
--- a/memory.c
+++ b/memory.c
@@ -1729,6 +1729,27 @@ void memory_region_notify_iommu(MemoryRegion *mr,
     }
 }
 
+void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
+                                         void *data)
+{
+    IOMMUNotifier *iommu_notifier;
+    IOMMUNotifierFlag request_flags;
+
+    assert(memory_region_is_iommu(mr));
+
+    /* TODO: support other bind requests with smaller granularity,
+     * e.g. bind a single pasid entry
+     */
+    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
+
+    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
+        if (iommu_notifier->notifier_flags & request_flags) {
+            iommu_notifier->notify(iommu_notifier, data);
+            break;
+        }
+    }
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
     uint8_t mask = 1 << client;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 13/20] IOMMU: add pasid_table_info for guest pasid table
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

This patch adds iommu.h to hold some generic definitions for the IOMMU.

It defines "struct pasid_table_info" for guest PASID table binding.
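
A sketch of how a vendor vIOMMU might fill the structure (mirroring what the
intel_iommu patch later in this series does; function and variable names are
made up):

    static struct pasid_table_info *
    build_pasidt_info(uint64_t pasidt_gpa, uint64_t pasidt_size,
                      uint16_t host_sid)
    {
        int argsz = sizeof(struct pasid_table_info) + sizeof(host_sid);
        struct pasid_table_info *info = g_malloc0(argsz);

        info->model = INTEL_IOMMU;
        info->ptr   = pasidt_gpa;     /* guest PASID table pointer */
        info->size  = pasidt_size;
        /* vendor-specific detail carried in opaque[] */
        memcpy(info->opaque, &host_sid, sizeof(host_sid));
        return info;
    }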

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 linux-headers/linux/iommu.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 linux-headers/linux/iommu.h

diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h
new file mode 100644
index 0000000..4519dcf
--- /dev/null
+++ b/linux-headers/linux/iommu.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2017 Intel Corporation.
+ * Author: Yi Liu <yi.l.liu@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef __LINUX_IOMMU_H
+#define __LINUX_IOMMU_H
+
+#include <linux/errno.h>
+
+struct pasid_table_info {
+	__u64  ptr;	/* PASID table ptr */
+	__u64  size;	/* PASID table size*/
+	__u32  model;	/* magic number */
+#define	INTEL_IOMMU	(1 << 0)
+#define	ARM_SMMU	(1 << 1)
+	__u8   opaque[];/* IOMMU-specific details */
+};
+
+#endif /* __LINUX_IOMMU_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 14/20] intel_iommu: add FOR_EACH_ASSIGN_DEVICE macro
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

Add FOR_EACH_ASSIGN_DEVICE. It is used to loop over all assigned devices
when processing guest PASID table linking and IOMMU cache invalidation
propagation.
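
A usage sketch (VTDContextHookInfo and vtd_context_inv_notify_hook come from
the next patch of this series; the macro expects an IntelIOMMUState *s in
scope, as in the callers there):

    uint16_t did = 1, sid = 0x0300;           /* example values */
    VTDContextHookInfo hook_info = {
        .did  = &did,
        .sid  = &sid,
        .gran = VTD_INV_DESC_CC_DEVICE,
    };

    FOR_EACH_ASSIGN_DEVICE(struct pasid_table_info,  /* notify_info type    */
                           uint16_t,                 /* opaque payload type */
                           &hook_info,
                           vtd_context_inv_notify_hook);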

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 32 ++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h | 11 +++++++++++
 2 files changed, 43 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 0c412d2..f291995 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -55,6 +55,38 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
 #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
 #endif
 
+#define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \
+                               __opaque_type, \
+                               __hook_info, \
+                               __hook_fn) \
+do { \
+    IntelIOMMUNotifierNode *node; \
+    VTDNotifierIterator iterator; \
+    int ret = 0; \
+    __notify_info_type *notify_info; \
+    __opaque_type *opaq; \
+    int argsz; \
+    argsz = sizeof(*notify_info) + sizeof(*opaq); \
+    notify_info = g_malloc0(argsz); \
+    QLIST_FOREACH(node, &(s->notifiers_list), next) { \
+        VTDAddressSpace *vtd_as = node->vtd_as; \
+        VTDContextEntry ce[2]; \
+        iterator.bus = pci_bus_num(vtd_as->bus); \
+        ret = vtd_dev_to_context_entry(s, iterator.bus, \
+                               vtd_as->devfn, &ce[0]); \
+        if (ret != 0) { \
+            continue; \
+        } \
+        iterator.sid = vtd_make_source_id(iterator.bus, vtd_as->devfn); \
+        iterator.did =  VTD_CONTEXT_ENTRY_DID(ce[0].hi); \
+        iterator.host_sid = node->host_sid; \
+        iterator.vtd_as = vtd_as; \
+        iterator.ce = &ce[0]; \
+        __hook_fn(&iterator, __hook_info, notify_info); \
+    } \
+    g_free(notify_info); \
+} while (0)
+
 static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
                             uint64_t wmask, uint64_t w1cmask)
 {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f2a7d12..5178398 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -439,6 +439,17 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_EXT_CONTEXT_TT_NO_DEV_IOTLB   (4ULL << 2)
 #define VTD_EXT_CONTEXT_TT_DEV_IOTLB      (5ULL << 2)
 
+struct VTDNotifierIterator {
+    VTDAddressSpace *vtd_as;
+    VTDContextEntry *ce;
+    uint16_t host_sid;
+    uint16_t sid;
+    uint16_t did;
+    uint8_t  bus;
+};
+
+typedef struct VTDNotifierIterator VTDNotifierIterator;
+
 /* Paging Structure common */
 #define VTD_SL_PT_PAGE_SIZE_MASK    (1ULL << 7)
 /* Bits to decide the offset for each level */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 15/20] intel_iommu: link whole guest pasid table to host
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

VT-d has a nested mode which enables SVM virtualization. By linking the
whole guest PASID table to the host context entry and enabling nested
mode, the pIOMMU performs nested translation for DMA requests, thus
achieving GVA->HPA translation.

When an extended context entry is modified in the guest, the intel_iommu
emulator should capture it, link the whole guest PASID table to the host,
and enable nested mode for the assigned device.
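
The resulting call flow, roughly (an illustrative summary of the code below,
not additional functionality):

    /*
     * guest updates an extended context entry, then queues a context
     * cache invalidation descriptor
     *   -> vtd_process_context_cache_desc()
     *      -> vtd_context_global_invalidate() /
     *         vtd_context_domain_selective_invalidate() /
     *         vtd_context_device_invalidate()
     *         -> vtd_context_cache_invalidate_notify()
     *            -> FOR_EACH_ASSIGN_DEVICE(..., vtd_context_inv_notify_hook)
     *               -> memory_region_notify_iommu_svm_bind()
     *                  -> VFIO bind notifier -> VFIO_IOMMU_SVM_BIND_TASK ioctl
     */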

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 121 +++++++++++++++++++++++++++++++++++++++--
 hw/i386/intel_iommu_internal.h |  11 ++++
 2 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f291995..cd6db65 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -36,6 +36,7 @@
 #include "hw/i386/apic_internal.h"
 #include "kvm_i386.h"
 #include "trace.h"
+#include <linux/iommu.h>
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -55,6 +56,14 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
 #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
 #endif
 
+typedef void (*vtd_device_hook)(VTDNotifierIterator *iter,
+                                void *hook_info,
+                                void *notify_info);
+
+static void vtd_context_inv_notify_hook(VTDNotifierIterator *iter,
+                                        void *hook_info,
+                                        void *notify_info);
+
 #define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \
                                __opaque_type, \
                                __hook_info, \
@@ -1213,6 +1222,66 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s)
     }
 }
 
+void vtd_context_inv_notify_hook(VTDNotifierIterator *iter,
+                                 void *hook_info,
+                                 void *notify_info)
+{
+    struct pasid_table_info *pasidt_info;
+    IOMMUNotifierData iommu_data;
+    VTDContextHookInfo *context_hook_info;
+    uint16_t *host_sid;
+    pasidt_info = (struct pasid_table_info *) notify_info;
+    context_hook_info = (VTDContextHookInfo *) hook_info;
+    switch (context_hook_info->gran) {
+    case VTD_INV_DESC_CC_GLOBAL:
+        /* Fall through */
+    case VTD_INV_DESC_CC_DOMAIN:
+        if (iter->did == *context_hook_info->did) {
+            break;
+        }
+        /* Fall through */
+    case VTD_INV_DESC_CC_DEVICE:
+        if ((iter->did == *context_hook_info->did) &&
+            (iter->sid == *context_hook_info->sid)) {
+            break;
+        }
+        /* Fall through */
+    default:
+        return;
+    }
+
+    pasidt_info->model = INTEL_IOMMU;
+    host_sid = (uint16_t *)&pasidt_info->opaque;
+
+    pasidt_info->ptr = iter->ce[1].lo;
+    pasidt_info->size = iter->ce[1].lo & VTD_PASID_TABLE_SIZE_MASK;
+    *host_sid = iter->host_sid;
+    iommu_data.payload = (uint8_t *) pasidt_info;
+    iommu_data.payload_size = sizeof(*pasidt_info) + sizeof(*host_sid);
+    memory_region_notify_iommu_svm_bind(&iter->vtd_as->iommu,
+                                        &iommu_data);
+    return;
+}
+
+static void vtd_context_cache_invalidate_notify(IntelIOMMUState *s,
+                                                uint16_t *did,
+                                                uint16_t *sid,
+                                                uint8_t gran,
+                                                vtd_device_hook hook_fn)
+{
+    VTDContextHookInfo context_hook_info = {
+        .did = did,
+        .sid = sid,
+        .gran = gran,
+    };
+
+    FOR_EACH_ASSIGN_DEVICE(struct pasid_table_info,
+                           uint16_t,
+                           &context_hook_info,
+                           hook_fn);
+    return;
+}
+
 static void vtd_context_global_invalidate(IntelIOMMUState *s)
 {
     trace_vtd_inv_desc_cc_global();
@@ -1228,8 +1297,35 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s)
      * VT-d emulation codes.
      */
     vtd_iommu_replay_all(s);
+
+    if (s->svm) {
+        vtd_context_cache_invalidate_notify(s, NULL, NULL,
+                VTD_INV_DESC_CC_GLOBAL, vtd_context_inv_notify_hook);
+    }
 }
 
+static void vtd_context_domain_selective_invalidate(IntelIOMMUState *s,
+                                                    uint16_t did)
+{
+    trace_vtd_inv_desc_cc_global();
+    s->context_cache_gen++;
+    if (s->context_cache_gen == VTD_CONTEXT_CACHE_GEN_MAX) {
+        vtd_reset_context_cache(s);
+    }
+    /*
+     * From VT-d spec 6.5.2.1, a global context entry invalidation
+     * should be followed by an IOTLB global invalidation, so we should
+     * be safe even without this. However, let's replay the region as
+     * well to be safer, and go back here when we need finer tunes for
+     * VT-d emulation codes.
+     */
+    vtd_iommu_replay_all(s);
+
+    if (s->svm) {
+        vtd_context_cache_invalidate_notify(s, &did, NULL,
+                 VTD_INV_DESC_CC_DOMAIN, vtd_context_inv_notify_hook);
+    }
+}
 
 /* Find the VTD address space currently associated with a given bus number,
  */
@@ -1258,13 +1354,14 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
  */
 static void vtd_context_device_invalidate(IntelIOMMUState *s,
                                           uint16_t source_id,
+                                          uint16_t did,
                                           uint16_t func_mask)
 {
     uint16_t mask;
     VTDBus *vtd_bus;
     VTDAddressSpace *vtd_as;
     uint8_t bus_n, devfn;
-    uint16_t devfn_it;
+    uint16_t devfn_it, sid_it;
 
     trace_vtd_inv_desc_cc_devices(source_id, func_mask);
 
@@ -1311,6 +1408,12 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
                  * happened.
                  */
                 memory_region_iommu_replay_all(&vtd_as->iommu);
+                if (s->svm) {
+                    sid_it = vtd_make_source_id(pci_bus_num(vtd_bus->bus),
+                                                devfn_it);
+                    vtd_context_cache_invalidate_notify(s, &did, &sid_it,
+                        VTD_INV_DESC_CC_DEVICE, vtd_context_inv_notify_hook);
+                }
             }
         }
     }
@@ -1324,6 +1427,7 @@ static uint64_t vtd_context_cache_invalidate(IntelIOMMUState *s, uint64_t val)
 {
     uint64_t caig;
     uint64_t type = val & VTD_CCMD_CIRG_MASK;
+    uint16_t did;
 
     switch (type) {
     case VTD_CCMD_DOMAIN_INVL:
@@ -1338,7 +1442,9 @@ static uint64_t vtd_context_cache_invalidate(IntelIOMMUState *s, uint64_t val)
 
     case VTD_CCMD_DEVICE_INVL:
         caig = VTD_CCMD_DEVICE_INVL_A;
-        vtd_context_device_invalidate(s, VTD_CCMD_SID(val), VTD_CCMD_FM(val));
+        did = VTD_CCMD_DID(val);
+        vtd_context_device_invalidate(s, VTD_CCMD_SID(val),
+                                      did, VTD_CCMD_FM(val));
         break;
 
     default:
@@ -1720,7 +1826,7 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
 static bool vtd_process_context_cache_desc(IntelIOMMUState *s,
                                            VTDInvDesc *inv_desc)
 {
-    uint16_t sid, fmask;
+    uint16_t sid, fmask, did;
 
     if ((inv_desc->lo & VTD_INV_DESC_CC_RSVD) || inv_desc->hi) {
         trace_vtd_inv_desc_cc_invalid(inv_desc->hi, inv_desc->lo);
@@ -1728,17 +1834,22 @@ static bool vtd_process_context_cache_desc(IntelIOMMUState *s,
     }
     switch (inv_desc->lo & VTD_INV_DESC_CC_G) {
     case VTD_INV_DESC_CC_DOMAIN:
+        did = VTD_INV_DESC_CC_DID(inv_desc->lo);
+        vtd_context_domain_selective_invalidate(s, did);
         trace_vtd_inv_desc_cc_domain(
             (uint16_t)VTD_INV_DESC_CC_DID(inv_desc->lo));
-        /* Fall through */
+        break;
     case VTD_INV_DESC_CC_GLOBAL:
+        trace_vtd_inv_desc_cc_domain(
+            (uint16_t)VTD_INV_DESC_CC_DID(inv_desc->lo));
         vtd_context_global_invalidate(s);
         break;
 
     case VTD_INV_DESC_CC_DEVICE:
         sid = VTD_INV_DESC_CC_SID(inv_desc->lo);
         fmask = VTD_INV_DESC_CC_FM(inv_desc->lo);
-        vtd_context_device_invalidate(s, sid, fmask);
+        did = VTD_INV_DESC_CC_DID(inv_desc->lo);
+        vtd_context_device_invalidate(s, sid, did, fmask);
         break;
 
     default:
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 5178398..5ab7d77 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -439,6 +439,14 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_EXT_CONTEXT_TT_NO_DEV_IOTLB   (4ULL << 2)
 #define VTD_EXT_CONTEXT_TT_DEV_IOTLB      (5ULL << 2)
 
+struct VTDContextHookInfo {
+    uint16_t *did;
+    uint16_t *sid;
+    uint8_t  gran;
+};
+
+typedef struct VTDContextHookInfo VTDContextHookInfo;
+
 struct VTDNotifierIterator {
     VTDAddressSpace *vtd_as;
     VTDContextEntry *ce;
@@ -450,6 +458,9 @@ struct VTDNotifierIterator {
 
 typedef struct VTDNotifierIterator VTDNotifierIterator;
 
+/* Masks for struct VTDContextEntry - Extended Context */
+#define VTD_PASID_TABLE_SIZE_MASK 0xf
+
 /* Paging Structure common */
 #define VTD_SL_PT_PAGE_SIZE_MASK    (1ULL << 7)
 /* Bits to decide the offset for each level */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 16/20] VFIO: Add notifier for propagating IOMMU TLB invalidate
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

This patch adds the following items:
* add new notifier flag IOMMU_NOTIFIER_IOMMU_TLB_INV = 0x8
* add new IOCTL cmd VFIO_IOMMU_TLB_INVALIDATE attached on container->fd
* add vfio_iommu_tlb_invalidate_notify() to propagate IOMMU TLB invalidate
  to host

This new notifier originates from the requirements of SVM virtualization
on VT-d. It covers invalidation of first-level and nested mappings from the
IOTLB and the paging-structure caches. Since the existing MAP/UNMAP notifier
is designed for second-level mappings, it is not suitable for this purpose,
so a new notifier is introduced. Further detail is included in the patch
below:

"intel_iommu: propagate Extended-IOTLB invalidate to host"

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/vfio/pci.c               | 37 +++++++++++++++++++++++++++++++++++++
 include/exec/memory.h       |  2 ++
 linux-headers/linux/iommu.h |  5 +++++
 linux-headers/linux/vfio.h  |  8 ++++++++
 4 files changed, 52 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a1e6942..afcefd6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2619,6 +2619,33 @@ static void vfio_iommu_bind_pasid_tbl_notify(IOMMUNotifier *n, void *data)
     g_free(vfio_svm);
 }
 
+static void vfio_iommu_tlb_invalidate_notify(IOMMUNotifier *n,
+                                             void *data)
+{
+    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+    VFIOContainer *container = giommu->container;
+    IOMMUNotifierData *iommu_data = (IOMMUNotifierData *) data;
+    struct vfio_iommu_tlb_invalidate *vfio_tlb_inv;
+    int argsz;
+
+    argsz = sizeof(*vfio_tlb_inv) + iommu_data->payload_size;
+    vfio_tlb_inv = g_malloc0(argsz);
+    vfio_tlb_inv->argsz = argsz;
+    vfio_tlb_inv->length = iommu_data->payload_size;
+
+    memcpy(&vfio_tlb_inv->data, iommu_data->payload,
+              iommu_data->payload_size);
+
+    rcu_read_lock();
+    if (ioctl(container->fd, VFIO_IOMMU_TLB_INVALIDATE,
+              vfio_tlb_inv) != 0) {
+        error_report("vfio_iommu_tlb_invalidate_notify:"
+                     " failed, contanier: %p", container);
+    }
+    rcu_read_unlock();
+    g_free(vfio_tlb_inv);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
@@ -2865,6 +2892,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     QTAILQ_FOREACH(subregion, &as->root->subregions, subregions_link) {
         if (memory_region_is_iommu(subregion)) {
             IOMMUNotifier n1;
+            IOMMUNotifier n2;
 
             /*
              FIXME: current iommu notifier is actually designed for
@@ -2882,6 +2910,15 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                                    0,
                                    &n1);
 
+            iommu_notifier_init(&n2, vfio_iommu_tlb_invalidate_notify,
+                                IOMMU_NOTIFIER_IOMMU_TLB_INV,
+                                0,
+                                0);
+            vfio_register_notifier(group->container,
+                                   subregion,
+                                   0,
+                                   &n2);
+
             memory_region_notify_device_record(subregion,
                                                &vdev->host);
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3b8f487..af15351 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -83,6 +83,8 @@ typedef enum {
     IOMMU_NOTIFIER_MAP = 0x2,
     /* Notify PASID Table Binding */
     IOMMU_NOTIFIER_SVM_PASIDT_BIND = 0x4,
+    /* Notify IOMMU TLB Invalidation */
+    IOMMU_NOTIFIER_IOMMU_TLB_INV = 0x8,
 } IOMMUNotifierFlag;
 
 #define IOMMU_NOTIFIER_MAP_UNMAP (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h
index 4519dcf..c2742ba 100644
--- a/linux-headers/linux/iommu.h
+++ b/linux-headers/linux/iommu.h
@@ -27,4 +27,9 @@ struct pasid_table_info {
 	__u8   opaque[];/* IOMMU-specific details */
 };
 
+struct tlb_invalidate_info {
+	__u32	model;
+	__u8	opaque[];
+};
+
 #endif /* __LINUX_IOMMU_H */
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 9848d63..6c71c4a 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -554,6 +554,14 @@ struct vfio_device_svm {
 
 #define VFIO_IOMMU_SVM_BIND_TASK	_IO(VFIO_TYPE, VFIO_BASE + 22)
 
+/* For IOMMU Invalidation Passdown */
+struct vfio_iommu_tlb_invalidate {
+	__u32	argsz;
+	__u32	length;
+	__u8	data[];
+};
+
+#define VFIO_IOMMU_TLB_INVALIDATE	_IO(VFIO_TYPE, VFIO_BASE + 23)
 
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 17/20] Memory: Add func to fire TLB invalidate notifier
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

This patch adds a separate function to fire the IOMMU TLB invalidate
notifier.
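
A minimal usage sketch from the vIOMMU side (the actual caller is added in
the next patch; tlb_inv_info/opaque_len are placeholders):

    IOMMUNotifierData iommu_data = {
        .payload      = (uint8_t *)tlb_inv_info,
        .payload_size = sizeof(*tlb_inv_info) + opaque_len,
    };

    /* wakes notifiers registered with IOMMU_NOTIFIER_IOMMU_TLB_INV */
    memory_region_notify_iommu_invalidate(&vtd_as->iommu, &iommu_data);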

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 include/exec/memory.h |  9 +++++++++
 memory.c              | 18 ++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index af15351..0155bad 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -707,6 +707,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
                                          void *data);
 
+/*
+ * memory_region_notify_iommu_invalidate: notify IOMMU
+ * TLB invalidation passdown.
+ *
+ * @mr: the memory region of IOMMU
+ * @data: IOMMU SVM data
+ */
+void memory_region_notify_iommu_invalidate(MemoryRegion *mr,
+                                           void *data);
 
 /**
  * memory_region_notify_one: notify a change in an IOMMU translation
diff --git a/memory.c b/memory.c
index ce0b0ff..8c572d5 100644
--- a/memory.c
+++ b/memory.c
@@ -1750,6 +1750,24 @@ void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
     }
 }
 
+void memory_region_notify_iommu_invalidate(MemoryRegion *mr,
+                                           void *data)
+{
+    IOMMUNotifier *iommu_notifier;
+    IOMMUNotifierFlag request_flags;
+
+    assert(memory_region_is_iommu(mr));
+
+    request_flags = IOMMU_NOTIFIER_IOMMU_TLB_INV;
+
+    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
+        if (iommu_notifier->notifier_flags & request_flags) {
+            iommu_notifier->notify(iommu_notifier, data);
+            break;
+        }
+    }
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
     uint8_t mask = 1 << client;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 18/20] intel_iommu: propagate Extended-IOTLB invalidate to host
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

An Extended-IOTLB invalidation invalidates first-level and nested
mappings from the IOTLB and the paging-structure caches.

For SVM virtualization, an IOMMU TLB invalidate notifier is added, for the
reasons below:

* On VT-d, the MAP/UNMAP notifier is used to shadow changes to the guest
  second-level page table. The first-level page table, however, is not
  shadowed the way the second-level page table is. The guest first-level
  page table is linked to the host once the whole guest PASID table has
  been linked to the host, and it is owned by the guest in this SVM
  virtualization solution for VT-d. The guest has already modified the
  first-level page table in memory by the time it issues the invalidate
  request for first-level mappings, so the MAP/UNMAP notifier is not
  suitable for the invalidation of guest first-level mappings.

* Since the guest owns the first-level page table, the host has no
  knowledge of invalidations of first-level mappings. The intel_iommu
  emulator therefore needs to propagate the invalidate request to the
  host, which then invalidates the first-level and nested mappings in the
  host IOTLB and paging-structure caches. A new notifier is added to meet
  this requirement.

Before passing the invalidate request to the host, the intel_iommu emulator
needs to translate it, e.g. translate the granularity to limit the scope of
the invalidation.

This patchset proposes passing raw data from guest to host when propagating
guest IOMMU TLB invalidations. As the cover letter mentions, there are both
pros and cons to passing raw data. Comments on how to pass the invalidate
request to the host are welcome.

For Extended-IOTLB invalidation, the intel_iommu emulator checks all
assigned devices to see whether they are affected by the invalidate
request, sanity-checks the request, and then passes it to the host.

The host replaces some fields in the raw data before submitting it to the
pIOMMU, e.g. the guest domain ID must be replaced with the real host domain
ID. In the future the PASID may also need to be replaced.
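
The payload assembled by the notify hook below looks roughly like this
(illustrative layout, matching the structures used in the code):

    /*
     * struct tlb_invalidate_info {
     *     __u32 model;     // INTEL_IOMMU
     *     __u8  opaque[];  // VTDInvalidateData:
     * };                   //   .pasid    - guest PASID
     *                      //   .sid      - host source id of the device
     *                      //   .inv_desc - raw Extended-IOTLB inv descriptor
     *
     * The host reuses the raw descriptor except for fields it owns,
     * e.g. the guest domain id is replaced with the host domain id.
     */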

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 126 +++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  33 +++++++++++
 2 files changed, 159 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index cd6db65..5fbb7f1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -64,6 +64,10 @@ static void vtd_context_inv_notify_hook(VTDNotifierIterator *iter,
                                         void *hook_info,
                                         void *notify_info);
 
+static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter,
+                                    void *hook_info,
+                                    void *notify_info);
+
 #define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \
                                __opaque_type, \
                                __hook_info, \
@@ -1979,6 +1983,121 @@ done:
     return true;
 }
 
+static void vtd_tlb_inv_passdown_notify(IntelIOMMUState *s,
+                                        VTDIOTLBInvHookInfo *hook_info,
+                                        vtd_device_hook hook_fn)
+{
+    FOR_EACH_ASSIGN_DEVICE(struct tlb_invalidate_info,
+                           VTDInvalidateData,
+                           hook_info,
+                           hook_fn);
+    return;
+}
+
+static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter,
+                             void *hook_info,
+                             void *notify_info)
+{
+    struct tlb_invalidate_info *tlb_inv_info;
+    IOMMUNotifierData iommu_data;
+    VTDIOTLBInvHookInfo *tlb_hook_info;
+    VTDInvalidateData *inv_data;
+    tlb_inv_info = (struct tlb_invalidate_info *) notify_info;
+    tlb_hook_info = (VTDIOTLBInvHookInfo *) hook_info;
+    switch (tlb_hook_info->inv_desc->lo & VTD_INV_DESC_TYPE) {
+    case VTD_INV_DESC_EXT_IOTLB:
+        if (iter->did == *tlb_hook_info->did) {
+            break;
+        } else {
+            return;
+        }
+    default:
+        return;
+    }
+
+    tlb_inv_info->model = INTEL_IOMMU;
+
+    inv_data = (VTDInvalidateData *)&tlb_inv_info->opaque;
+    inv_data->pasid = *tlb_hook_info->pasid;
+    inv_data->sid = iter->host_sid;
+    inv_data->inv_desc = *tlb_hook_info->inv_desc;
+
+    iommu_data.payload = (uint8_t *) tlb_inv_info;
+    iommu_data.payload_size = sizeof(*tlb_inv_info) + sizeof(*inv_data);
+
+    memory_region_notify_iommu_invalidate(&iter->vtd_as->iommu,
+                                          &iommu_data);
+}
+
+static bool vtd_process_exiotlb_desc(IntelIOMMUState *s,
+                                     VTDInvDesc *inv_desc)
+{
+    uint16_t domain_id;
+    uint32_t pasid;
+    uint8_t am;
+    VTDIOTLBInvHookInfo tlb_hook_info;
+
+    if ((inv_desc->lo & VTD_INV_DESC_EXIOTLB_RSVD_LO) ||
+        (inv_desc->hi & VTD_INV_DESC_EXIOTLB_RSVD_HI)) {
+        VTD_DPRINTF(GENERAL, "error: non-zero reserved field in"
+                    " EXIOTLB Invalidate Descriptor hi 0x%"PRIx64
+                    " lo 0x%"PRIx64, inv_desc->hi, inv_desc->lo);
+        return false;
+    }
+
+    domain_id = VTD_INV_DESC_EXIOTLB_DID(inv_desc->lo);
+    switch (inv_desc->lo & VTD_INV_DESC_IOTLB_G) {
+    case VTD_INV_DESC_EXIOTLB_ALL_ALL:
+        VTD_DPRINTF(INV, "Invalidate all within ALL PASID");
+        inv_desc->lo &= ~VTD_INV_DESC_IOTLB_G;
+        inv_desc->lo |= VTD_INV_DESC_EXIOTLB_NONG_PASID;
+        break;
+
+    case VTD_INV_DESC_EXIOTLB_NONG_ALL:
+        VTD_DPRINTF(INV, "Invalidate non-global within ALL PASID");
+        break;
+
+    case VTD_INV_DESC_EXIOTLB_NONG_PASID:
+        VTD_DPRINTF(INV, "Invalidate non-global within selective-PASID,"
+                    " domain 0x%"PRIx16, domain_id);
+
+        break;
+
+    case VTD_INV_DESC_EXIOTLB_PSI_PASID:
+        am = VTD_INV_DESC_EXIOTLB_AM(inv_desc->hi);
+        VTD_DPRINTF(INV, "Invalidate selective-page within selective-"
+                         "PASID, domain 0x%"PRIx16 " addr 0x%"PRIx64
+                         " mask %"PRIu8, domain_id,
+                         (hwaddr) VTD_INV_DESC_EXIOTLB_ADDR(inv_desc->hi),
+                          am);
+        if (am > VTD_MAMV) {
+            VTD_DPRINTF(GENERAL, "error: supported max address mask value"
+                        " is %"PRIu8, (uint8_t)VTD_MAMV);
+            return false;
+        }
+
+        break;
+
+    default:
+        VTD_DPRINTF(GENERAL, "error: invalid granularity in Ex-IOTLB"
+                    " Invalidate Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
+                    inv_desc->hi, inv_desc->lo);
+        return false;
+    }
+
+    pasid = VTD_INV_DESC_EXIOTLB_PASID(inv_desc->lo);
+
+    tlb_hook_info.did = &domain_id;
+    tlb_hook_info.sid = NULL;
+    tlb_hook_info.pasid = &pasid;
+    tlb_hook_info.inv_desc = inv_desc;
+    vtd_tlb_inv_passdown_notify(s,
+                                &tlb_hook_info,
+                                vtd_tlb_inv_notify_hook);
+
+    return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
     VTDInvDesc inv_desc;
@@ -2008,6 +2127,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
+    case VTD_INV_DESC_EXT_IOTLB:
+        trace_vtd_inv_desc("extended-iotlb", inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_exiotlb_desc(s, &inv_desc)) {
+            return false;
+        }
+        break;
+
     case VTD_INV_DESC_WAIT:
         trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_wait_desc(s, &inv_desc)) {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 5ab7d77..9f89751 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -341,6 +341,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_IEC                0x4 /* Interrupt Entry Cache
                                                Invalidate Descriptor */
 #define VTD_INV_DESC_WAIT               0x5 /* Invalidation Wait Descriptor */
+#define VTD_INV_DESC_EXT_IOTLB          0x6 /* Ext-IOTLB Invalidate Desc */
 #define VTD_INV_DESC_NONE               0   /* Not an Invalidate Descriptor */
 
 /* Masks for Invalidation Wait Descriptor*/
@@ -380,6 +381,22 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_DEVICE_IOTLB_RSVD_HI 0xffeULL
 #define VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO 0xffff0000ffe0fff8
 
+#define VTD_INV_DESC_EXIOTLB_ALL_ALL       (0ULL << 4)
+#define VTD_INV_DESC_EXIOTLB_NONG_ALL      (1ULL << 4)
+#define VTD_INV_DESC_EXIOTLB_NONG_PASID    (2ULL << 4)
+#define VTD_INV_DESC_EXIOTLB_PSI_PASID     (3ULL << 4)
+
+#define VTD_INV_DESC_EXIOTLB_RSVD_LO       0xfff000000000ffc0ULL
+#define VTD_INV_DESC_EXIOTLB_RSVD_HI       0xf00ULL
+
+#define VTD_INV_DESC_EXIOTLB_PASID(val)    (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_EXIOTLB_DID(val)      (((val) >> 16) & \
+                                             VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_EXIOTLB_ADDR(val)     ((val) & ~0xfffULL)
+#define VTD_INV_DESC_EXIOTLB_AM(val)       ((val) & 0x3fULL)
+#define VTD_INV_DESC_EXIOTLB_IH(val)       (((val) >> 6) & 0x1)
+#define VTD_INV_DESC_EXIOTLB_GL(val)       (((val) >> 7) & 0x1)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
@@ -388,6 +405,13 @@ struct VTDIOTLBPageInvInfo {
 };
 typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
 
+struct VTDInvalidateData {
+    uint16_t sid; /* it is a physical SID instead of guest SID */
+    uint32_t pasid;
+    VTDInvDesc inv_desc;
+};
+typedef struct VTDInvalidateData VTDInvalidateData;
+
 /* Pagesize of VTD paging structures, including root and context tables */
 #define VTD_PAGE_SHIFT              12
 #define VTD_PAGE_SIZE               (1ULL << VTD_PAGE_SHIFT)
@@ -447,6 +471,15 @@ struct VTDContextHookInfo {
 
 typedef struct VTDContextHookInfo VTDContextHookInfo;
 
+struct VTDIOTLBInvHookInfo {
+    uint16_t *did;
+    uint32_t *pasid;
+    uint16_t *sid;
+    VTDInvDesc *inv_desc;
+};
+
+typedef struct VTDIOTLBInvHookInfo VTDIOTLBInvHookInfo;
+
 struct VTDNotifierIterator {
     VTDAddressSpace *vtd_as;
     VTDContextEntry *ce;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 19/20] intel_iommu: propagate PASID-Cache invalidate to host
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

This patch adds support for propagating PASID-cache invalidation to the
host. As with Extended-IOTLB invalidation, the intel_iommu emulator
checks all the assigned devices, performs sanity checks on the invalidate
request, and then passes it to the host.

The host pIOMMU driver replaces some fields in the raw data before
submitting it to the pIOMMU, e.g. the guest domain ID must be replaced
with the real domain ID on the host. In the future the PASID may also
need to be replaced.
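
For reference, the fields the emulator extracts from the PASID-cache
invalidate descriptor are laid out as in the sketch below. The helper
names here are made up and simply mirror the VTD_INV_DESC_PASIDC_* macros
added in this patch (the real code masks the DID with VTD_DOMAIN_ID_MASK
rather than a fixed 0xffff).

#include <stdint.h>

static inline uint32_t pc_desc_pasid(uint64_t lo)
{
    return (lo >> 32) & 0xfffff;   /* PASID: bits 51:32 of the low qword */
}

static inline uint16_t pc_desc_did(uint64_t lo)
{
    return (lo >> 16) & 0xffff;    /* DID: bits 31:16 of the low qword */
}

static inline uint64_t pc_desc_granularity(uint64_t lo)
{
    return lo & (3ULL << 4);       /* granularity: bits 5:4 of the low qword */
}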

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 56 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h | 10 ++++++++
 2 files changed, 66 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5fbb7f1..c5e9170 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2006,6 +2006,7 @@ static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter,
     tlb_hook_info = (VTDIOTLBInvHookInfo *) hook_info;
     switch (tlb_hook_info->inv_desc->lo & VTD_INV_DESC_TYPE) {
     case VTD_INV_DESC_EXT_IOTLB:
+    case VTD_INV_DESC_PC:
         if (iter->did == *tlb_hook_info->did) {
             break;
         } else {
@@ -2098,6 +2099,54 @@ static bool vtd_process_exiotlb_desc(IntelIOMMUState *s,
     return true;
 }
 
+static bool vtd_process_pasid_desc(IntelIOMMUState *s,
+                                   VTDInvDesc *inv_desc)
+{
+    uint16_t domain_id;
+    uint32_t pasid;
+    VTDIOTLBInvHookInfo tlb_hook_info;
+
+    if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) ||
+        (inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) {
+        VTD_DPRINTF(GENERAL, "error: non-zero reserved field"
+                    " in PASID desc, hi 0x%"PRIx64 " lo 0x%"PRIx64,
+                    inv_desc->hi, inv_desc->lo);
+        return false;
+    }
+
+    domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo);
+
+    switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) {
+    case VTD_INV_DESC_PASIDC_ALL_ALL:
+        VTD_DPRINTF(INV, "Invalidate all PASID");
+        break;
+
+    case VTD_INV_DESC_PASIDC_PASID_SI:
+        VTD_DPRINTF(INV, "pasid-selective invalidation"
+                    " domain 0x%"PRIx16, domain_id);
+        break;
+
+    default:
+        VTD_DPRINTF(GENERAL, "error: invalid granularity"
+                    " in PASID-Cache Invalidate Descriptor"
+                    " hi 0x%"PRIx64 " lo 0x%"PRIx64,
+                    inv_desc->hi, inv_desc->lo);
+        return false;
+    }
+
+    pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->lo);
+
+    tlb_hook_info.did = &domain_id;
+    tlb_hook_info.sid = NULL;
+    tlb_hook_info.pasid = &pasid;
+    tlb_hook_info.inv_desc = inv_desc;
+    vtd_tlb_inv_passdown_notify(s,
+                                &tlb_hook_info,
+                                vtd_tlb_inv_notify_hook);
+
+    return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
     VTDInvDesc inv_desc;
@@ -2134,6 +2183,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
+    case VTD_INV_DESC_PC:
+        trace_vtd_inv_desc("pasid-cache", inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_pasid_desc(s, &inv_desc)) {
+            return false;
+        }
+        break;
+
     case VTD_INV_DESC_WAIT:
         trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_wait_desc(s, &inv_desc)) {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 9f89751..a6b9350 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -342,6 +342,7 @@ typedef union VTDInvDesc VTDInvDesc;
                                                Invalidate Descriptor */
 #define VTD_INV_DESC_WAIT               0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_EXT_IOTLB          0x6 /* Ext-IOTLB Invalidate Desc */
+#define VTD_INV_DESC_PC                 0x7 /* PASID-cache Invalidate Desc */
 #define VTD_INV_DESC_NONE               0   /* Not an Invalidate Descriptor */
 
 /* Masks for Invalidation Wait Descriptor*/
@@ -397,6 +398,15 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_EXIOTLB_IH(val)       (((val) >> 6) & 0x1)
 #define VTD_INV_DESC_EXIOTLB_GL(val)       (((val) >> 7) & 0x1)
 
+#define VTD_INV_DESC_PASIDC_G          (3ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PASIDC_RSVD_LO    0xfff000000000ffc0ULL
+#define VTD_INV_DESC_PASIDC_RSVD_HI    0xffffffffffffffffULL
+
+#define VTD_INV_DESC_PASIDC_ALL_ALL    (0ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [RFC PATCH 20/20] intel_iommu: propagate Ext-Device-TLB invalidate to host
  2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 10:06   ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-26 10:06 UTC (permalink / raw)
  To: qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	tianyu.lan, yi.l.liu, jean-philippe.brucker, Liu, Yi L

For Extended-Device-TLB invalidation, the intel_iommu emulator needs to
check all the assigned devices to find the affected one, replace the
guest SID with the host SID in the invalidate descriptor, and pass the
request to the host.

The host may then simply submit the request to the corresponding
invalidation queue in the pIOMMU. In the future the PASID may also need
to be replaced.
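
As a sketch (the helper name is made up, this is not the actual emulator
code), the SID swap done in vtd_tlb_inv_notify_hook() below boils down
to rewriting bits 31:16 of the low qword of the descriptor:

#include <stdint.h>

#define EXT_DIOTLB_SID_SHIFT 16
#define EXT_DIOTLB_SID_MASK  (0xffffULL << EXT_DIOTLB_SID_SHIFT)

/* Swap the guest SID for the host SID in the low qword of the descriptor. */
static inline void swap_guest_sid(uint64_t *lo, uint16_t host_sid)
{
    *lo &= ~EXT_DIOTLB_SID_MASK;
    *lo |= (uint64_t)host_sid << EXT_DIOTLB_SID_SHIFT;
}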

Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 hw/i386/intel_iommu.c          | 43 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu_internal.h |  7 +++++++
 2 files changed, 50 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c5e9170..4370790 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2012,6 +2012,13 @@ static void vtd_tlb_inv_notify_hook(VTDNotifierIterator *iter,
         } else {
             return;
         }
+    case VTD_INV_DESC_EXT_DIOTLB:
+        if (iter->sid != *tlb_hook_info->sid) {
+            return;
+        }
+        tlb_hook_info->inv_desc->lo &= ~VTD_INV_DESC_EXT_DIOTLB_SID_MASK;
+        tlb_hook_info->inv_desc->lo |= (iter->host_sid << 16);
+        break;
     default:
         return;
     }
@@ -2147,6 +2154,34 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
     return true;
 }
 
+static bool vtd_process_ext_device_iotlb(IntelIOMMUState *s,
+                                         VTDInvDesc *inv_desc)
+{
+    uint32_t pasid;
+    uint16_t sid;
+    VTDIOTLBInvHookInfo tlb_hook_info;
+
+    if ((inv_desc->lo & VTD_INV_DESC_EXT_DIOTLB_RSVD_LO) ||
+        (inv_desc->hi & VTD_INV_DESC_EXT_DIOTLB_RSVD_HI)) {
+        VTD_DPRINTF(GENERAL, "error: non-zero reserved field in"
+                    " Device ExIOTLB desc, hi 0x%"PRIx64 " lo 0x%"PRIx64,
+                    inv_desc->hi, inv_desc->lo);
+        return false;
+    }
+
+    pasid = VTD_INV_DESC_EXT_DIOTLB_PASID(inv_desc->lo);
+    sid = VTD_INV_DESC_EXT_DIOTLB_SID(inv_desc->lo);
+
+    tlb_hook_info.did = NULL;
+    tlb_hook_info.sid = &sid;
+    tlb_hook_info.pasid = &pasid;
+    tlb_hook_info.inv_desc = inv_desc;
+    vtd_tlb_inv_passdown_notify(s,
+                                &tlb_hook_info,
+                                vtd_tlb_inv_notify_hook);
+    return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
     VTDInvDesc inv_desc;
@@ -2190,6 +2225,14 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
+    case VTD_INV_DESC_EXT_DIOTLB:
+        trace_vtd_inv_desc("device-extended-iotlb",
+                           inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_ext_device_iotlb(s, &inv_desc)) {
+            return false;
+        }
+        break;
+
     case VTD_INV_DESC_WAIT:
         trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_wait_desc(s, &inv_desc)) {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index a6b9350..3cb2361 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -343,6 +343,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_WAIT               0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_EXT_IOTLB          0x6 /* Ext-IOTLB Invalidate Desc */
 #define VTD_INV_DESC_PC                 0x7 /* PASID-cache Invalidate Desc */
+#define VTD_INV_DESC_EXT_DIOTLB         0x8 /* Ext-DIOTLB Invalidate Desc */
 #define VTD_INV_DESC_NONE               0   /* Not an Invalidate Descriptor */
 
 /* Masks for Invalidation Wait Descriptor*/
@@ -407,6 +408,12 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_PASIDC_ALL_ALL    (0ULL << 4)
 #define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
 
+#define VTD_INV_DESC_EXT_DIOTLB_PASID(val) (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_EXT_DIOTLB_SID(val)   (((val) >> 16) & 0xffff)
+#define VTD_INV_DESC_EXT_DIOTLB_RSVD_LO    0xe00ULL
+#define VTD_INV_DESC_EXT_DIOTLB_RSVD_HI    0x7feULL
+#define VTD_INV_DESC_EXT_DIOTLB_SID_MASK   0xFFFF0000ULL
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
  2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
@ 2017-04-26 13:50       ` Paolo Bonzini
  -1 siblings, 0 replies; 81+ messages in thread
From: Paolo Bonzini @ 2017-04-26 13:50 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	peterx-H+wXaHxf7aLQT0dZR+AlfA
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w



On 26/04/2017 12:06, Liu, Yi L wrote:
> +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> +                                         void *data)
> +{
> +    IOMMUNotifier *iommu_notifier;
> +    IOMMUNotifierFlag request_flags;
> +
> +    assert(memory_region_is_iommu(mr));
> +
> +    /*TODO: support other bind requests with smaller gran,
> +     * e.g. bind signle pasid entry
> +     */
> +    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> +
> +    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> +        if (iommu_notifier->notifier_flags & request_flags) {
> +            iommu_notifier->notify(iommu_notifier, data);
> +            break;
> +        }
> +    }

Peter,

should this reuse ->notify, or should it be different function pointer
in IOMMUNotifier?

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
  2017-04-26 13:50       ` [Qemu-devel] " Paolo Bonzini
  (?)
@ 2017-04-27  2:37       ` Liu, Yi L
  2017-04-27  6:14           ` Peter Xu
  -1 siblings, 1 reply; 81+ messages in thread
From: Liu, Yi L @ 2017-04-27  2:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, alex.williamson, peterx, tianyu.lan, kevin.tian,
	yi.l.liu, ashok.raj, kvm, jean-philippe.brucker, jasowang, iommu,
	jacob.jun.pan

On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> 
> 
> On 26/04/2017 12:06, Liu, Yi L wrote:
> > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > +                                         void *data)
> > +{
> > +    IOMMUNotifier *iommu_notifier;
> > +    IOMMUNotifierFlag request_flags;
> > +
> > +    assert(memory_region_is_iommu(mr));
> > +
> > +    /*TODO: support other bind requests with smaller gran,
> > +     * e.g. bind signle pasid entry
> > +     */
> > +    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > +
> > +    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> > +        if (iommu_notifier->notifier_flags & request_flags) {
> > +            iommu_notifier->notify(iommu_notifier, data);
> > +            break;
> > +        }
> > +    }
> 
> Peter,
> 
> should this reuse ->notify, or should it be different function pointer
> in IOMMUNotifier?

Hi Paolo,

Thx for your review.

I think it should be “->notify” here. In this patchset, the new notifier
is registered with the existing notifier registration API, so all the
notifiers are in the mr->iommu_notify list. Notifiers are labeled by a
notify flag, which makes it possible to differentiate the IOMMUNotifier
nodes. When the flag matches, the notifier is triggered via “->notify”.
The diagram below shows my understanding; hopefully it makes the point
clear.

VFIOContainer
       |
       giommu_list(VFIOGuestIOMMU)
                \
                 VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
                    |                     |                 |
mr->iommu_notify: IOMMUNotifier   ->    IOMMUNotifier  ->  IOMMUNotifier
                  (Flag:MAP/UNMAP)     (Flag:SVM bind)  (Flag:tlb invalidate)


Actually, compared with the MAP/UNMAP notifier, the newly added notifier
has no start/end check, and there may be other types of bind notifier
flags in the future, so I added a separate fire function for the SVM bind
notifier.
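
A minimal, self-contained sketch of the flag-matched dispatch I mean (the
type and field names here are made up for illustration, they are not the
QEMU ones):

#include <stddef.h>

typedef enum {
    NOTIFY_MAP_UNMAP = 1 << 0,
    NOTIFY_SVM_BIND  = 1 << 1,
    NOTIFY_TLB_INVAL = 1 << 2,
} notify_flags;

typedef struct notifier {
    notify_flags flags;
    void (*notify)(struct notifier *n, void *data);
} notifier;

/* Fire the first registered notifier whose flags match the request,
 * like the QLIST_FOREACH loop quoted above does. */
static void fire_notifier(notifier **list, size_t n,
                          notify_flags req, void *data)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i]->flags & req) {
            list[i]->notify(list[i], data);
            break;
        }
    }
}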

Thanks,
Yi L

> Paolo
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
  2017-04-27  2:37       ` Liu, Yi L
@ 2017-04-27  6:14           ` Peter Xu
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Xu @ 2017-04-27  6:14 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan-ral2JQCrhuEAvxtiuMwx3w,
	kevin.tian-ral2JQCrhuEAvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w, Paolo Bonzini

On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > +                                         void *data)
> > > +{
> > > +    IOMMUNotifier *iommu_notifier;
> > > +    IOMMUNotifierFlag request_flags;
> > > +
> > > +    assert(memory_region_is_iommu(mr));
> > > +
> > > +    /*TODO: support other bind requests with smaller gran,
> > > +     * e.g. bind signle pasid entry
> > > +     */
> > > +    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > +
> > > +    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> > > +        if (iommu_notifier->notifier_flags & request_flags) {
> > > +            iommu_notifier->notify(iommu_notifier, data);
> > > +            break;
> > > +        }
> > > +    }
> > 
> > Peter,
> > 
> > should this reuse ->notify, or should it be different function pointer
> > in IOMMUNotifier?
> 
> Hi Paolo,
> 
> Thx for your review.
> 
> I think it should be “->notify” here. In this patchset, the new notifier
> is registered with the existing notifier registration API. So the all the
> notifiers are in the mr->iommu_notify list. And notifiers are labeled
> by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> When the flag meets, trigger it by “->notify”. The diagram below shows
> my understanding , wish it helps to make me understood.
> 
> VFIOContainer
>        |
>        giommu_list(VFIOGuestIOMMU)
>                 \
>                  VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
>                     |                     |                 |
> mr->iommu_notify: IOMMUNotifier   ->    IOMMUNotifier  ->  IOMMUNotifier
>                   (Flag:MAP/UNMAP)     (Flag:SVM bind)  (Flag:tlb invalidate)
> 
> 
> Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
> no start/end check, and there may be other types of bind notfier flag in
> future, so I added a separate fire func for SVM bind notifier.

I agree with Paolo that this interface might not be a suitable place
for the SVM notifiers (just as I worried about in previous
discussions).

The biggest problem is that the current notifier mechanism is
per-memory-region, while IIUC your messages should be per-IOMMU, or say,
per translation unit. For each IOMMU there can be more than one memory
region (ppc can be an example). When more than one MR is bound to the
same IOMMU unit, which memory region should you register to? Any one of
them, or all?

So my conclusion is, it just has nothing to do with memory regions...

Instead of a different function pointer in IOMMUNotifier, IMHO we can
even move a step further, to isolate IOTLB notifications (targeted at
memory regions and with start/end ranges) out of SVM/other
notifications, since they are different in general. So we basically
need two notification mechanisms:

- one for memory regions, currently what I can see is IOTLB
  notifications

- one for translation units, currently I see all the rest of
  notifications needed in virt-svm in this category

Maybe some RFC patches would be good to show what I mean... I'll see
whether I can prepare some.
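
Roughly, I mean something like the below (the names are made up just to
show the split, this is not an actual API proposal):

#include <stdint.h>

/* Per-memory-region notifier: IOTLB MAP/UNMAP, scoped by an IOVA range. */
typedef struct MRIOTLBNotifier {
    uint64_t start;
    uint64_t end;
    void (*map_unmap)(struct MRIOTLBNotifier *n, const void *iotlb_entry);
} MRIOTLBNotifier;

/* Per-translation-unit notifier: SVM bind, pasid-cache/tlb invalidation
 * and the other virt-svm events that are not tied to one memory region. */
typedef struct IOMMUUnitNotifier {
    void (*svm_bind)(struct IOMMUUnitNotifier *n, const void *bind_data);
    void (*invalidate)(struct IOMMUUnitNotifier *n, const void *inv_data);
} IOMMUUnitNotifier;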

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
  2017-04-27  6:14           ` Peter Xu
  (?)
@ 2017-04-27 10:09           ` Peter Xu
  -1 siblings, 0 replies; 81+ messages in thread
From: Peter Xu @ 2017-04-27 10:09 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Paolo Bonzini, qemu-devel, alex.williamson, tianyu.lan,
	kevin.tian, yi.l.liu, ashok.raj, kvm, jean-philippe.brucker,
	jasowang, iommu, jacob.jun.pan

On Thu, Apr 27, 2017 at 02:14:27PM +0800, Peter Xu wrote:
> On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> > On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > > 
> > > 
> > > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > > +                                         void *data)
> > > > +{
> > > > +    IOMMUNotifier *iommu_notifier;
> > > > +    IOMMUNotifierFlag request_flags;
> > > > +
> > > > +    assert(memory_region_is_iommu(mr));
> > > > +
> > > > +    /*TODO: support other bind requests with smaller gran,
> > > > +     * e.g. bind signle pasid entry
> > > > +     */
> > > > +    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > > +
> > > > +    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> > > > +        if (iommu_notifier->notifier_flags & request_flags) {
> > > > +            iommu_notifier->notify(iommu_notifier, data);
> > > > +            break;
> > > > +        }
> > > > +    }
> > > 
> > > Peter,
> > > 
> > > should this reuse ->notify, or should it be different function pointer
> > > in IOMMUNotifier?
> > 
> > Hi Paolo,
> > 
> > Thx for your review.
> > 
> > I think it should be “->notify” here. In this patchset, the new notifier
> > is registered with the existing notifier registration API. So the all the
> > notifiers are in the mr->iommu_notify list. And notifiers are labeled
> > by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> > When the flag meets, trigger it by “->notify”. The diagram below shows
> > my understanding , wish it helps to make me understood.
> > 
> > VFIOContainer
> >        |
> >        giommu_list(VFIOGuestIOMMU)
> >                 \
> >                  VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
> >                     |                     |                 |
> > mr->iommu_notify: IOMMUNotifier   ->    IOMMUNotifier  ->  IOMMUNotifier
> >                   (Flag:MAP/UNMAP)     (Flag:SVM bind)  (Flag:tlb invalidate)
> > 
> > 
> > Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
> > no start/end check, and there may be other types of bind notfier flag in
> > future, so I added a separate fire func for SVM bind notifier.
> 
> I agree with Paolo that this interface might not be the suitable place
> for the SVM notifiers (just like what I worried about in previous
> discussions).
> 
> The biggest problem is that, if you see current notifier mechanism,
> it's per-memory-region. However iiuc your messages should be
> per-iommu, or say, per translation unit. While, for each iommu, there
> can be more than one memory regions (ppc can be an example). When
> there are more than one MRs binded to the same iommu unit, which
> memory region should you register to? Any one of them, or all?
> 
> So my conclusion is, it just has nothing to do with memory regions...
> 
> Instead of a different function pointer in IOMMUNotifer, IMHO we can
> even move a step further, to isolate IOTLB notifications (targeted at
> memory regions and with start/end ranges) out of SVM/other
> notifications, since they are different in general. So we basically
> need two notification mechanism:
> 
> - one for memory regions, currently what I can see is IOTLB
>   notifications
> 
> - one for translation units, currently I see all the rest of
>   notifications needed in virt-svm in this category
> 
> Maybe some RFC patches would be good to show what I mean... I'll see
> whether I can prepare some.

Here it is (on qemu-devel):

[RFC PATCH 0/8] IOMMU: introduce common IOMMUObject

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
@ 2017-04-27 10:25               ` Liu, Yi L
  0 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-27 10:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: tianyu.lan, kevin.tian, yi.l.liu, ashok.raj, kvm,
	jean-philippe.brucker, jasowang, qemu-devel, iommu,
	alex.williamson, jacob.jun.pan, Paolo Bonzini

On Thu, Apr 27, 2017 at 02:14:27PM +0800, Peter Xu wrote:
> On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> > On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > > 
> > > 
> > > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > > +                                         void *data)
> > > > +{
> > > > +    IOMMUNotifier *iommu_notifier;
> > > > +    IOMMUNotifierFlag request_flags;
> > > > +
> > > > +    assert(memory_region_is_iommu(mr));
> > > > +
> > > > +    /*TODO: support other bind requests with smaller gran,
> > > > +     * e.g. bind signle pasid entry
> > > > +     */
> > > > +    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > > +
> > > > +    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> > > > +        if (iommu_notifier->notifier_flags & request_flags) {
> > > > +            iommu_notifier->notify(iommu_notifier, data);
> > > > +            break;
> > > > +        }
> > > > +    }
> > > 
> > > Peter,
> > > 
> > > should this reuse ->notify, or should it be different function pointer
> > > in IOMMUNotifier?
> > 
> > Hi Paolo,
> > 
> > Thx for your review.
> > 
> > I think it should be “->notify” here. In this patchset, the new notifier
> > is registered with the existing notifier registration API. So the all the
> > notifiers are in the mr->iommu_notify list. And notifiers are labeled
> > by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> > When the flag meets, trigger it by “->notify”. The diagram below shows
> > my understanding , wish it helps to make me understood.
> > 
> > VFIOContainer
> >        |
> >        giommu_list(VFIOGuestIOMMU)
> >                 \
> >                  VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
> >                     |                     |                 |
> > mr->iommu_notify: IOMMUNotifier   ->    IOMMUNotifier  ->  IOMMUNotifier
> >                   (Flag:MAP/UNMAP)     (Flag:SVM bind)  (Flag:tlb invalidate)
> > 
> > 
> > Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
> > no start/end check, and there may be other types of bind notfier flag in
> > future, so I added a separate fire func for SVM bind notifier.
> 
> I agree with Paolo that this interface might not be the suitable place
> for the SVM notifiers (just like what I worried about in previous
> discussions).
> 
> The biggest problem is that, if you see current notifier mechanism,
> it's per-memory-region. However iiuc your messages should be
> per-iommu, or say, per translation unit.

Hi Peter,

Yes, you're right. The newly added notifier is per-iommu.

> While, for each iommu, there
> can be more than one memory regions (ppc can be an example). When
> there are more than one MRs binded to the same iommu unit, which
> memory region should you register to? Any one of them, or all?

Honestly, I'm not an expert on ppc. According to the current code,
I can only find one MR initialized with memory_region_init_iommu()
in spapr_tce_table_realize(). So to make sure I get your point, let me
check: do you mean there may be multiple iommu MRs behind one iommu?

I admit it must be considered if there are multiple iommu MRs. I may
choose to register with just one of them, since the notifier is per-iommu
as you've pointed out. Then the vIOMMU emulator needs to trigger the
notifier with the correct MR. Not sure if the ppc vIOMMU is fine with that.
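
Roughly what I have in mind, with the names from this series (the VFIO
handler and helper names below are hypothetical, just for illustration):

/* VFIO side: register one notifier on the iommu MR it is given */
static void vfio_svm_pasidt_bind_notify(IOMMUNotifier *n,
                                        IOMMUTLBEntry *data)
{
    /* in this series "data" actually carries the pasid table bind info;
     * forward it to the host IOMMU through VFIO here
     */
}

static void vfio_register_svm_notifier(VFIOGuestIOMMU *giommu)
{
    iommu_notifier_init(&giommu->n, vfio_svm_pasidt_bind_notify,
                        IOMMU_NOTIFIER_SVM_PASIDT_BIND, 0, 0);
    memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
}

/* vIOMMU side: fire the bind notification against that same MR, e.g.
 *     memory_region_notify_iommu_svm_bind(&vtd_as->iommu, bind_data);
 */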

> So my conclusion is, it just has nothing to do with memory regions...
>
> Instead of a different function pointer in IOMMUNotifer, IMHO we can
> even move a step further, to isolate IOTLB notifications (targeted at
> memory regions and with start/end ranges) out of SVM/other
> notifications, since they are different in general. So we basically
> need two notification mechanism:
> 
> - one for memory regions, currently what I can see is IOTLB
>   notifications
> 
> - one for translation units, currently I see all the rest of
>   notifications needed in virt-svm in this category
> 
> Maybe some RFC patches would be good to show what I mean... I'll see
> whether I can prepare some.

I agree that it would be helpful to split the two kinds of notifiers. I
marked it as a FIXME in patch 0006 of this series. Just saw your RFC patch
for the common IOMMUObject. Thanks for your work, I will try to review it.

Besides the notifier registration, please also help to review the SVM
virtualization itself. I would be glad to hear your comments.

Thanks,
Yi L

> Thanks,
> 
> -- 
> Peter Xu
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest
@ 2017-04-27 10:32       ` Peter Xu
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Xu @ 2017-04-27 10:32 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, alex.williamson, kvm, jasowang, iommu, kevin.tian,
	ashok.raj, jacob.jun.pan, tianyu.lan, yi.l.liu,
	jean-philippe.brucker

On Wed, Apr 26, 2017 at 06:06:32PM +0800, Liu, Yi L wrote:
> VT-d implementations reporting PASID or PRS fields as "Set", must also
> report ecap.ECS as "Set". Extended-Context is required for SVM.
> 
> When ECS is reported, intel iommu driver would initiate extended root entry
> and extended context entry, and also PASID table if there is any SVM capable
> device.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  hw/i386/intel_iommu.c          | 131 +++++++++++++++++++++++++++--------------
>  hw/i386/intel_iommu_internal.h |   9 +++
>  include/hw/i386/intel_iommu.h  |   2 +-
>  3 files changed, 97 insertions(+), 45 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 400d0d1..bf98fa5 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry *root)
>      return root->val & VTD_ROOT_ENTRY_P;
>  }
>  
> +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
> +{
> +    return root->rsvd & VTD_ROOT_ENTRY_P;
> +}
> +
>  static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
>                                VTDRootEntry *re)
>  {
> @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
>          return -VTD_FR_ROOT_TABLE_INV;
>      }
>      re->val = le64_to_cpu(re->val);
> +    if (s->ecs) {
> +        re->rsvd = le64_to_cpu(re->rsvd);
> +    }

I feel it's slightly hacky to play with re->rsvd. How about:

union VTDRootEntry {
    struct {
        uint64_t val;
        uint64_t rsvd;
    } base;
    struct {
        uint64_t ext_lo;
        uint64_t ext_hi;
    } extended;
};

(Or any better way that can get rid of rsvd...)

Even:

struct VTDRootEntry {
    union {
        struct {
                uint64_t val;
                uint64_t rsvd;
        } base;
        struct {
                uint64_t ext_lo;
                uint64_t ext_hi;
        } extended;
    } data;
    bool extended;
};

Then we read the entry into data and set up the extended bit. A benefit of
this is that we may avoid passing IntelIOMMUState around everywhere just to
know whether we are using extended context entries.
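
For illustration, vtd_get_root_entry() could then look roughly like the
below (untested sketch, error tracing omitted). Since both arms of the
union are two little-endian 64-bit words, one byteswap path covers both
layouts; note the table stride becomes sizeof(re->data), not sizeof(*re),
once the struct carries the extra flag:

static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
                              VTDRootEntry *re)
{
    dma_addr_t addr = s->root + index * sizeof(re->data);

    if (dma_memory_read(&address_space_memory, addr,
                        &re->data, sizeof(re->data))) {
        return -VTD_FR_ROOT_TABLE_INV;
    }
    re->data.base.val = le64_to_cpu(re->data.base.val);
    re->data.base.rsvd = le64_to_cpu(re->data.base.rsvd);
    re->extended = s->ecs;
    return 0;
}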

>      return 0;
>  }
>  
> @@ -517,19 +525,30 @@ static inline bool vtd_context_entry_present(VTDContextEntry *context)
>      return context->lo & VTD_CONTEXT_ENTRY_P;
>  }
>  
> -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
> -                                           VTDContextEntry *ce)
> +static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
> +                 VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
>  {
> -    dma_addr_t addr;
> +    dma_addr_t addr, ce_size;
>  
>      /* we have checked that root entry is present */
> -    addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
> -    if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
> +    ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
> +    addr = (s->ecs && (index > 0x7f)) ?
> +           ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) :
> +           ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
> +
> +    if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) {
>          trace_vtd_re_invalid(root->rsvd, root->val);
>          return -VTD_FR_CONTEXT_TABLE_INV;
>      }
> -    ce->lo = le64_to_cpu(ce->lo);
> -    ce->hi = le64_to_cpu(ce->hi);
> +
> +    ce[0].lo = le64_to_cpu(ce[0].lo);
> +    ce[0].hi = le64_to_cpu(ce[0].hi);

Again, I feel this is even hackier. :)

I would slightly prefer to play the same union trick with context
entries, just like what I proposed for the root entries above...
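
Something along these lines (untested sketch; since the extended context
entry is 256 bits wide, table indexing would have to use the size of the
arm in use rather than sizeof() of the whole union):

typedef union VTDContextEntry {
    struct {
        uint64_t lo;
        uint64_t hi;
    } base;                     /* legacy 128-bit context entry */
    struct {
        uint64_t lo;
        uint64_t hi;
        uint64_t lo1;
        uint64_t hi1;
    } extended;                 /* 256-bit extended context entry */
} VTDContextEntry;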

> +
> +    if (s->ecs) {
> +        ce[1].lo = le64_to_cpu(ce[1].lo);
> +        ce[1].hi = le64_to_cpu(ce[1].hi);
> +    }
> +
>      return 0;
>  }
>  
> @@ -595,9 +614,11 @@ static inline uint32_t vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
>      return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
>  }
>  
> -static inline uint32_t vtd_ce_get_type(VTDContextEntry *ce)
> +static inline uint32_t vtd_ce_get_type(IntelIOMMUState *s,
> +                                       VTDContextEntry *ce)
>  {
> -    return ce->lo & VTD_CONTEXT_ENTRY_TT;
> +    return s->ecs ? (ce->lo & VTD_CONTEXT_ENTRY_TT) :
> +                    (ce->lo & VTD_EXT_CONTEXT_ENTRY_TT);
>  }
>  
>  static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
> @@ -842,16 +863,20 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
>          return ret_fr;
>      }
>  
> -    if (!vtd_root_entry_present(&re)) {
> +    if (!vtd_root_entry_present(&re) ||
> +        (s->ecs && (devfn > 0x7f) && (!vtd_root_entry_upper_present(&re)))) {
>          /* Not error - it's okay we don't have root entry. */
>          trace_vtd_re_not_present(bus_num);
>          return -VTD_FR_ROOT_ENTRY_P;
> -    } else if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) {
> -        trace_vtd_re_invalid(re.rsvd, re.val);
> -        return -VTD_FR_ROOT_ENTRY_RSVD;
> +    }
> +    if ((s->ecs && (devfn > 0x7f) && (re.rsvd & VTD_ROOT_ENTRY_RSVD)) ||
> +        (s->ecs && (devfn < 0x80) && (re.val & VTD_ROOT_ENTRY_RSVD)) ||
> +        ((!s->ecs) && (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)))) {
> +            trace_vtd_re_invalid(re.rsvd, re.val);
> +            return -VTD_FR_ROOT_ENTRY_RSVD;

Nit: I feel like we could better wrap these 0x7f and 0x80 checks into
helper functions, especially with the above structure change...
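
For example (hypothetical helper names):

static inline bool vtd_devfn_in_upper_table(uint8_t devfn)
{
    /* in ECS mode the upper half of the root entry covers devfn 128..255 */
    return devfn > 0x7f;
}

static inline uint8_t vtd_devfn_table_index(uint8_t devfn)
{
    /* index within the selected (lower or upper) extended context table */
    return devfn & 0x7f;
}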

(will hold here...)

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier
  2017-04-27 10:25               ` Liu, Yi L
  (?)
@ 2017-04-27 10:51               ` Peter Xu
  -1 siblings, 0 replies; 81+ messages in thread
From: Peter Xu @ 2017-04-27 10:51 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: tianyu.lan, kevin.tian, yi.l.liu, ashok.raj, kvm,
	jean-philippe.brucker, jasowang, qemu-devel, iommu,
	alex.williamson, jacob.jun.pan, Paolo Bonzini

On Thu, Apr 27, 2017 at 06:25:37PM +0800, Liu, Yi L wrote:
> On Thu, Apr 27, 2017 at 02:14:27PM +0800, Peter Xu wrote:
> > On Thu, Apr 27, 2017 at 10:37:19AM +0800, Liu, Yi L wrote:
> > > On Wed, Apr 26, 2017 at 03:50:16PM +0200, Paolo Bonzini wrote:
> > > > 
> > > > 
> > > > On 26/04/2017 12:06, Liu, Yi L wrote:
> > > > > +void memory_region_notify_iommu_svm_bind(MemoryRegion *mr,
> > > > > +                                         void *data)
> > > > > +{
> > > > > +    IOMMUNotifier *iommu_notifier;
> > > > > +    IOMMUNotifierFlag request_flags;
> > > > > +
> > > > > +    assert(memory_region_is_iommu(mr));
> > > > > +
> > > > > +    /*TODO: support other bind requests with smaller gran,
> > > > > +     * e.g. bind signle pasid entry
> > > > > +     */
> > > > > +    request_flags = IOMMU_NOTIFIER_SVM_PASIDT_BIND;
> > > > > +
> > > > > +    QLIST_FOREACH(iommu_notifier, &mr->iommu_notify, node) {
> > > > > +        if (iommu_notifier->notifier_flags & request_flags) {
> > > > > +            iommu_notifier->notify(iommu_notifier, data);
> > > > > +            break;
> > > > > +        }
> > > > > +    }
> > > > 
> > > > Peter,
> > > > 
> > > > should this reuse ->notify, or should it be different function pointer
> > > > in IOMMUNotifier?
> > > 
> > > Hi Paolo,
> > > 
> > > Thx for your review.
> > > 
> > > I think it should be “->notify” here. In this patchset, the new notifier
> > > is registered with the existing notifier registration API. So the all the
> > > notifiers are in the mr->iommu_notify list. And notifiers are labeled
> > > by notify flag, so it is able to differentiate the IOMMUNotifier nodes.
> > > When the flag meets, trigger it by “->notify”. The diagram below shows
> > > my understanding , wish it helps to make me understood.
> > > 
> > > VFIOContainer
> > >        |
> > >        giommu_list(VFIOGuestIOMMU)
> > >                 \
> > >                  VFIOGuestIOMMU1 ->   VFIOGuestIOMMU2 -> VFIOGuestIOMMU3 ...
> > >                     |                     |                 |
> > > mr->iommu_notify: IOMMUNotifier   ->    IOMMUNotifier  ->  IOMMUNotifier
> > >                   (Flag:MAP/UNMAP)     (Flag:SVM bind)  (Flag:tlb invalidate)
> > > 
> > > 
> > > Actually, compared with the MAP/UNMAP notifier, the newly added notifier has
> > > no start/end check, and there may be other types of bind notfier flag in
> > > future, so I added a separate fire func for SVM bind notifier.
> > 
> > I agree with Paolo that this interface might not be the suitable place
> > for the SVM notifiers (just like what I worried about in previous
> > discussions).
> > 
> > The biggest problem is that, if you see current notifier mechanism,
> > it's per-memory-region. However iiuc your messages should be
> > per-iommu, or say, per translation unit.
> 
> Hi Peter,
> 
> yes, you're right. the newly added notifier is per-iommu.
> 
> > While, for each iommu, there
> > can be more than one memory regions (ppc can be an example). When
> > there are more than one MRs binded to the same iommu unit, which
> > memory region should you register to? Any one of them, or all?
> 
> Honestly, I'm not expert on ppc. According to the current code,
> I can only find one MR initialized with memory_region_init_iommu()
> in spapr_tce_table_realize(). So to better get your point, let me
> check. Do you mean there may be multiple of iommu MRs behind a iommu?

I am not either. :)

But yes, that's what I mean. At least that's how I understand it.

> 
> I admit it must be considered if there are multiple iommu MRs. I may
> choose to register for one of them since the notifier is per-iommu as
> you've pointed. Then vIOMMU emulator need to trigger the notifier with
> the correct MR. Not sure if ppc vIOMMU is fine with it.
> 
> > So my conclusion is, it just has nothing to do with memory regions...
> >
> > Instead of a different function pointer in IOMMUNotifer, IMHO we can
> > even move a step further, to isolate IOTLB notifications (targeted at
> > memory regions and with start/end ranges) out of SVM/other
> > notifications, since they are different in general. So we basically
> > need two notification mechanism:
> > 
> > - one for memory regions, currently what I can see is IOTLB
> >   notifications
> > 
> > - one for translation units, currently I see all the rest of
> >   notifications needed in virt-svm in this category
> > 
> > Maybe some RFC patches would be good to show what I mean... I'll see
> > whether I can prepare some.
> 
> I agree that it would be helpful to split the two kinds of notifiers. I
> marked it as a FIXME in patch 0006 of this series. Just saw your RFC patch
> for common IOMMUObject. Thx for your work, would try to review it.

Thanks, looking forward to your review comments.

> 
> Besides the notifier registration, pls also help to review the SVM
> virtualization itself. Would be glad to know your comments.

Yes. It's on my list. Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/20] intel_iommu: add "svm" option
@ 2017-04-27 10:53       ` Peter Xu
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Xu @ 2017-04-27 10:53 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, alex.williamson, kvm, jasowang, iommu, kevin.tian,
	ashok.raj, jacob.jun.pan, tianyu.lan, yi.l.liu,
	jean-philippe.brucker

On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote:
> Expose "Shared Virtual Memory" to guest by using "svm" option.
> Also use "svm" to expose SVM related capabilities to guest.
> e.g. "-device intel-iommu, svm=on"
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  hw/i386/intel_iommu.c          | 10 ++++++++++
>  hw/i386/intel_iommu_internal.h |  5 +++++
>  include/hw/i386/intel_iommu.h  |  1 +
>  3 files changed, 16 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index bf98fa5..ba1e7eb 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
>      DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
>      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
>      DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
> +    DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
>          s->ecap |= VTD_ECAP_ECS;
>      }
>  
> +    if (s->svm) {
> +        if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
> +            error_report("Need to set ecs, pt, caching-mode for svm");
> +            exit(1);
> +        }
> +        s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
> +        s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
> +    }
> +
>      if (s->caching_mode) {
>          s->cap |= VTD_CAP_CM;
>      }
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 71a1c1e..f2a7d12 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -191,6 +191,9 @@
>  #define VTD_ECAP_PT                 (1ULL << 6)
>  #define VTD_ECAP_MHMV               (15ULL << 20)
>  #define VTD_ECAP_ECS                (1ULL << 24)
> +#define VTD_ECAP_PASID28            (1ULL << 28)

Could I ask what this bit is? In my copy of the spec, it says this bit is
reserved and defunct (spec version: June 2016).

> +#define VTD_ECAP_PRS                (1ULL << 29)
> +#define VTD_ECAP_PTS                (0xeULL << 35)

Would it be better to avoid using 0xe here, or at least add a comment?
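
For example, even something minimal like the below would help (the macro
name is only a suggestion; the point is to name the field and document
what the 0xe value encodes):

#define VTD_ECAP_PTS_SHIFT          35
#define VTD_ECAP_PTS(x)             ((uint64_t)(x) << VTD_ECAP_PTS_SHIFT)

    /* in vtd_init(), with a comment on the meaning of 0xe: */
    s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS(0xe) | VTD_ECAP_PASID28;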

>  
>  /* CAP_REG */
>  /* (offset >> 4) << 24 */
> @@ -207,6 +210,8 @@
>  #define VTD_CAP_PSI                 (1ULL << 39)
>  #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
>  #define VTD_CAP_CM                  (1ULL << 7)
> +#define VTD_CAP_DWD                 (1ULL << 54)
> +#define VTD_CAP_DRD                 (1ULL << 55)

Just to confirm: after this series, we should support drain read/write
then, right?

Thanks,

>  
>  /* Supported Adjusted Guest Address Widths */
>  #define VTD_CAP_SAGAW_SHIFT         8
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index ae21fe5..8981615 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -267,6 +267,7 @@ struct IntelIOMMUState {
>  
>      bool caching_mode;          /* RO - is cap CM enabled? */
>      bool ecs;                       /* Extended Context Support */
> +    bool svm;                       /* Shared Virtual Memory */
>  
>      dma_addr_t root;                /* Current root table pointer */
>      bool root_extended;             /* Type of root table (extended or not) */
> -- 
> 1.9.1
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest
  2017-04-27 10:32       ` [Qemu-devel] " Peter Xu
@ 2017-04-28  6:00         ` Lan Tianyu
  -1 siblings, 0 replies; 81+ messages in thread
From: Lan Tianyu @ 2017-04-28  6:00 UTC (permalink / raw)
  To: Peter Xu, Liu, Yi L
  Cc: qemu-devel, alex.williamson, kvm, jasowang, iommu, kevin.tian,
	ashok.raj, jacob.jun.pan, yi.l.liu, jean-philippe.brucker

On 2017年04月27日 18:32, Peter Xu wrote:
> On Wed, Apr 26, 2017 at 06:06:32PM +0800, Liu, Yi L wrote:
>> VT-d implementations reporting PASID or PRS fields as "Set", must also
>> report ecap.ECS as "Set". Extended-Context is required for SVM.
>>
>> When ECS is reported, intel iommu driver would initiate extended root entry
>> and extended context entry, and also PASID table if there is any SVM capable
>> device.
>>
>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>> ---
>>  hw/i386/intel_iommu.c          | 131 +++++++++++++++++++++++++++--------------
>>  hw/i386/intel_iommu_internal.h |   9 +++
>>  include/hw/i386/intel_iommu.h  |   2 +-
>>  3 files changed, 97 insertions(+), 45 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 400d0d1..bf98fa5 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry *root)
>>      return root->val & VTD_ROOT_ENTRY_P;
>>  }
>>  
>> +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
>> +{
>> +    return root->rsvd & VTD_ROOT_ENTRY_P;
>> +}
>> +
>>  static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
>>                                VTDRootEntry *re)
>>  {
>> @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
>>          return -VTD_FR_ROOT_TABLE_INV;
>>      }
>>      re->val = le64_to_cpu(re->val);
>> +    if (s->ecs) {
>> +        re->rsvd = le64_to_cpu(re->rsvd);
>> +    }
> 
> I feel it slightly hacky to play with re->rsvd. How about:
> 
> union VTDRootEntry {
>     struct {
>         uint64_t val;
>         uint64_t rsvd;
>     } base;
>     struct {
>         uint64_t ext_lo;
>         uint64_t ext_hi;
>     } extended;
> };
> 
> (Or any better way that can get rid of rsvd...)
> 
> Even:
> 
> struct VTDRootEntry {
>     union {
>         struct {
>                 uint64_t val;
>                 uint64_t rsvd;
>         } base;
>         struct {
>                 uint64_t ext_lo;
>                 uint64_t ext_hi;
>         } extended;
>     } data;
>     bool extended;
> };
> 
> Then we read the entry into data, and setup extended bit. A benefit of
> it is that we may avoid passing around IntelIOMMUState everywhere to
> know whether we are using extended context entries.
> 
>>      return 0;
>>  }
>>  
>> @@ -517,19 +525,30 @@ static inline bool vtd_context_entry_present(VTDContextEntry *context)
>>      return context->lo & VTD_CONTEXT_ENTRY_P;
>>  }
>>  
>> -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
>> -                                           VTDContextEntry *ce)
>> +static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
>> +                 VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
>>  {
>> -    dma_addr_t addr;
>> +    dma_addr_t addr, ce_size;
>>  
>>      /* we have checked that root entry is present */
>> -    addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
>> -    if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
>> +    ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
>> +    addr = (s->ecs && (index > 0x7f)) ?
>> +           ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) :
>> +           ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
>> +
>> +    if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) {
>>          trace_vtd_re_invalid(root->rsvd, root->val);
>>          return -VTD_FR_CONTEXT_TABLE_INV;
>>      }
>> -    ce->lo = le64_to_cpu(ce->lo);
>> -    ce->hi = le64_to_cpu(ce->hi);
>> +
>> +    ce[0].lo = le64_to_cpu(ce[0].lo);
>> +    ce[0].hi = le64_to_cpu(ce[0].hi);
> 
> Again, I feel this even hackier. :)
> 
> I would slightly prefer to play the same union trick to context
> entries, just like what I proposed to the root entries above...
> 
>> +
>> +    if (s->ecs) {
>> +        ce[1].lo = le64_to_cpu(ce[1].lo);
>> +        ce[1].hi = le64_to_cpu(ce[1].hi);
>> +    }
>> +
>>      return 0;
>>  }
>>  
>> @@ -595,9 +614,11 @@ static inline uint32_t vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
>>      return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
>>  }
>>  
>> -static inline uint32_t vtd_ce_get_type(VTDContextEntry *ce)
>> +static inline uint32_t vtd_ce_get_type(IntelIOMMUState *s,
>> +                                       VTDContextEntry *ce)
>>  {
>> -    return ce->lo & VTD_CONTEXT_ENTRY_TT;
>> +    return s->ecs ? (ce->lo & VTD_CONTEXT_ENTRY_TT) :
>> +                    (ce->lo & VTD_EXT_CONTEXT_ENTRY_TT);
>>  }
>>  
>>  static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
>> @@ -842,16 +863,20 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
>>          return ret_fr;
>>      }
>>  
>> -    if (!vtd_root_entry_present(&re)) {
>> +    if (!vtd_root_entry_present(&re) ||
>> +        (s->ecs && (devfn > 0x7f) && (!vtd_root_entry_upper_present(&re)))) {
>>          /* Not error - it's okay we don't have root entry. */
>>          trace_vtd_re_not_present(bus_num);
>>          return -VTD_FR_ROOT_ENTRY_P;
>> -    } else if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) {
>> -        trace_vtd_re_invalid(re.rsvd, re.val);
>> -        return -VTD_FR_ROOT_ENTRY_RSVD;
>> +    }
>> +    if ((s->ecs && (devfn > 0x7f) && (re.rsvd & VTD_ROOT_ENTRY_RSVD)) ||
>> +        (s->ecs && (devfn < 0x80) && (re.val & VTD_ROOT_ENTRY_RSVD)) ||
>> +        ((!s->ecs) && (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)))) {
>> +            trace_vtd_re_invalid(re.rsvd, re.val);
>> +            return -VTD_FR_ROOT_ENTRY_RSVD;
> 
> Nit: I feel like we can better wrap these 0x7f and 0x80 into helper
> functions, especially if with above structure change...
> 
> (will hold here...)
> 
> Thanks,
> 


Would it be possible to add helper macros that check bits in the context
entry and the extended context entry, and move the ECS-mode check into
those helpers?
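
For instance, something like this (hypothetical macro, shown only to
illustrate the idea, using vtd_ce_get_type() above as the example):

#define VTD_CE_FIELD(s, ce, base_mask, ext_mask) \
    ((ce)->lo & ((s)->ecs ? (ext_mask) : (base_mask)))

/* vtd_ce_get_type() could then be written as
 *     return VTD_CE_FIELD(s, ce, VTD_CONTEXT_ENTRY_TT,
 *                         VTD_EXT_CONTEXT_ENTRY_TT);
 * so the ECS/non-ECS distinction lives in one place.
 */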

-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device
@ 2017-04-28  6:46       ` Lan Tianyu
  0 siblings, 0 replies; 81+ messages in thread
From: Lan Tianyu @ 2017-04-28  6:46 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	yi.l.liu, jean-philippe.brucker

On 2017年04月26日 18:06, Liu, Yi L wrote:
> With vIOMMU exposed to guest, vIOMMU emulator needs to do translation
> between host and guest. e.g. a device-selective TLB flush, vIOMMU
> emulator needs to replace guest SID with host SID so that to limit
> the invalidation. This patch introduces a new callback
> iommu_ops->record_device() to notify vIOMMU emulator to record necessary
> information about the assigned device.

This patch prepares for translating a guest sbdf to a host sbdf.

Alex:
	Could we add a new vfio API to do such translation? This would be more
straightforward than storing the host sbdf in the vIOMMU device model.
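
Just to illustrate the idea (this is a made-up interface, not an existing
VFIO ioctl; the names and the request number are placeholders only):

struct vfio_device_sid_translate {
	__u32 argsz;
	__u32 flags;
	__u16 guest_sid;	/* in:  guest bus/devfn */
	__u16 host_sid;		/* out: host bus/devfn  */
};
/* placeholder request number */
#define VFIO_DEVICE_TRANSLATE_SID	_IO(VFIO_TYPE, VFIO_BASE + 20)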

> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  include/exec/memory.h | 11 +++++++++++
>  memory.c              | 12 ++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 7bd13ab..49087ef 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
>                                  IOMMUNotifierFlag new_flags);
>      /* Set this up to provide customized IOMMU replay function */
>      void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> +    void (*record_device)(MemoryRegion *iommu,
> +                          void *device_info);
>  };
>  
>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> @@ -708,6 +710,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>  void memory_region_notify_one(IOMMUNotifier *notifier,
>                                IOMMUTLBEntry *entry);
>  
> +/*
> + * memory_region_notify_device_record: notify IOMMU to record assign
> + * device.
> + * @mr: the memory region to notify
> + * @ device_info: device information
> + */
> +void memory_region_notify_device_record(MemoryRegion *mr,
> +                                        void *info);
> +
>  /**
>   * memory_region_register_iommu_notifier: register a notifier for changes to
>   * IOMMU translation entries.
> diff --git a/memory.c b/memory.c
> index 0728e62..45ef069 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1600,6 +1600,18 @@ static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
>      mr->iommu_notify_flags = flags;
>  }
>  
> +void memory_region_notify_device_record(MemoryRegion *mr,
> +                                        void *info)
> +{
> +    assert(memory_region_is_iommu(mr));
> +
> +    if (mr->iommu_ops->record_device) {
> +        mr->iommu_ops->record_device(mr, info);
> +    }
> +
> +    return;
> +}
> +
>  void memory_region_register_iommu_notifier(MemoryRegion *mr,
>                                             IOMMUNotifier *n)
>  {
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 14/20] intel_iommu: add FOR_EACH_ASSIGN_DEVICE macro
@ 2017-04-28  7:33       ` Lan Tianyu
  0 siblings, 0 replies; 81+ messages in thread
From: Lan Tianyu @ 2017-04-28  7:33 UTC (permalink / raw)
  To: Liu, Yi L, qemu-devel, alex.williamson, peterx
  Cc: kvm, jasowang, iommu, kevin.tian, ashok.raj, jacob.jun.pan,
	yi.l.liu, jean-philippe.brucker

On 2017-04-26 18:06, Liu, Yi L wrote:
> Add FOR_EACH_ASSIGN_DEVICE. It would be used to loop all assigned
> devices when processing guest pasid table linking and iommu cache
> invalidate propagation.
> 
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  hw/i386/intel_iommu.c          | 32 ++++++++++++++++++++++++++++++++
>  hw/i386/intel_iommu_internal.h | 11 +++++++++++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 0c412d2..f291995 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -55,6 +55,38 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
>  #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
>  #endif
>  
> +#define FOR_EACH_ASSIGN_DEVICE(__notify_info_type, \
> +                               __opaque_type, \
> +                               __hook_info, \
> +                               __hook_fn) \
> +do { \
> +    IntelIOMMUNotifierNode *node; \
> +    VTDNotifierIterator iterator; \
> +    int ret = 0; \
> +    __notify_info_type *notify_info; \
> +    __opaque_type *opaq; \
> +    int argsz; \
> +    argsz = sizeof(*notify_info) + sizeof(*opaq); \
> +    notify_info = g_malloc0(argsz); \
> +    QLIST_FOREACH(node, &(s->notifiers_list), next) { \
> +        VTDAddressSpace *vtd_as = node->vtd_as; \
> +        VTDContextEntry ce[2]; \
> +        iterator.bus = pci_bus_num(vtd_as->bus); \
> +        ret = vtd_dev_to_context_entry(s, iterator.bus, \
> +                               vtd_as->devfn, &ce[0]); \
> +        if (ret != 0) { \
> +            continue; \
> +        } \
> +        iterator.sid = vtd_make_source_id(iterator.bus, vtd_as->devfn); \
> +        iterator.did =  VTD_CONTEXT_ENTRY_DID(ce[0].hi); \
> +        iterator.host_sid = node->host_sid; \
> +        iterator.vtd_as = vtd_as; \
> +        iterator.ce = &ce[0]; \
> +        __hook_fn(&iterator, __hook_info, notify_info); \
> +    } \
> +    g_free(notify_info); \
> +} while (0)
> +
>  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>                              uint64_t wmask, uint64_t w1cmask)
>  {
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index f2a7d12..5178398 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -439,6 +439,17 @@ typedef struct VTDRootEntry VTDRootEntry;
>  #define VTD_EXT_CONTEXT_TT_NO_DEV_IOTLB   (4ULL << 2)
>  #define VTD_EXT_CONTEXT_TT_DEV_IOTLB      (5ULL << 2)
>  
> +struct VTDNotifierIterator {
> +    VTDAddressSpace *vtd_as;
> +    VTDContextEntry *ce;
> +    uint16_t host_sid;
> +    uint16_t sid;
> +    uint16_t did;
> +    uint8_t  bus;

The "bus" seems to be redundant.
It is already contained in the "sid", right?
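
Just for reference, a minimal sketch of how bus/devfn could be recovered
from the source id instead, assuming the usual VT-d encoding of
(bus << 8) | devfn that vtd_make_source_id() above appears to build
(the helper names below are invented, not part of the patch):

    /* Hypothetical helpers: recover bus/devfn from a 16-bit source id. */
    static inline uint8_t vtd_sid_to_bus(uint16_t sid)
    {
        return (sid >> 8) & 0xff;
    }

    static inline uint8_t vtd_sid_to_devfn(uint16_t sid)
    {
        return sid & 0xff;
    }

If that encoding holds, the iterator could drop the bus field and derive
it on demand.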

> +};
> +
> +typedef struct VTDNotifierIterator VTDNotifierIterator;
> +
>  /* Paging Structure common */
>  #define VTD_SL_PT_PAGE_SIZE_MASK    (1ULL << 7)
>  /* Bits to decide the offset for each level */
> 


-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest
@ 2017-04-28  9:55           ` Liu, Yi L
  0 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-28  9:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: tianyu.lan, kevin.tian, yi.l.liu, ashok.raj, kvm,
	jean-philippe.brucker, jasowang, iommu, qemu-devel,
	alex.williamson, jacob.jun.pan

On Thu, Apr 27, 2017 at 06:32:21PM +0800, Peter Xu wrote:
> On Wed, Apr 26, 2017 at 06:06:32PM +0800, Liu, Yi L wrote:
> > VT-d implementations reporting PASID or PRS fields as "Set", must also
> > report ecap.ECS as "Set". Extended-Context is required for SVM.
> > 
> > When ECS is reported, intel iommu driver would initiate extended root entry
> > and extended context entry, and also PASID table if there is any SVM capable
> > device.
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 131 +++++++++++++++++++++++++++--------------
> >  hw/i386/intel_iommu_internal.h |   9 +++
> >  include/hw/i386/intel_iommu.h  |   2 +-
> >  3 files changed, 97 insertions(+), 45 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 400d0d1..bf98fa5 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry *root)
> >      return root->val & VTD_ROOT_ENTRY_P;
> >  }
> >  
> > +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
> > +{
> > +    return root->rsvd & VTD_ROOT_ENTRY_P;
> > +}
> > +
> >  static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
> >                                VTDRootEntry *re)
> >  {
> > @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
> >          return -VTD_FR_ROOT_TABLE_INV;
> >      }
> >      re->val = le64_to_cpu(re->val);
> > +    if (s->ecs) {
> > +        re->rsvd = le64_to_cpu(re->rsvd);
> > +    }
> 
> I feel it slightly hacky to play with re->rsvd. How about:
> 
> union VTDRootEntry {
>     struct {
>         uint64_t val;
>         uint64_t rsvd;
>     } base;
>     struct {
>         uint64_t ext_lo;
>         uint64_t ext_hi;
>     } extended;
> };

Agree.
 
> (Or any better way that can get rid of rsvd...)
> 
> Even:
> 
> struct VTDRootEntry {
>     union {
>         struct {
>                 uint64_t val;
>                 uint64_t rsvd;
>         } base;
>         struct {
>                 uint64_t ext_lo;
>                 uint64_t ext_hi;
>         } extended;
>     } data;
>     bool extended;
> };
> 
> Then we read the entry into data, and setup extended bit. A benefit of
> it is that we may avoid passing around IntelIOMMUState everywhere to
> know whether we are using extended context entries.

For this proposal, it combines the s->ecs bit and the root entry in one
struct. But it may mislead future maintainers, since it still uses the
VTDRootEntry name. Maybe name it differently.
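
One possible shape for that, purely as an illustration (the type name here
is invented, not from the series):

    /* Hypothetical rename: keep the raw layout in a separately named type
     * so it is not confused with the existing VTDRootEntry. */
    typedef struct VTDRootEntryView {
        union {
            struct {
                uint64_t val;
                uint64_t rsvd;
            } base;
            struct {
                uint64_t ext_lo;
                uint64_t ext_hi;
            } extended;
        } data;
        bool extended;   /* mirrors s->ecs at the time the entry was read */
    } VTDRootEntryView;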

> >      return 0;
> >  }
> >  
> > @@ -517,19 +525,30 @@ static inline bool vtd_context_entry_present(VTDContextEntry *context)
> >      return context->lo & VTD_CONTEXT_ENTRY_P;
> >  }
> >  
> > -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
> > -                                           VTDContextEntry *ce)
> > +static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
> > +                 VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
> >  {
> > -    dma_addr_t addr;
> > +    dma_addr_t addr, ce_size;
> >  
> >      /* we have checked that root entry is present */
> > -    addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
> > -    if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
> > +    ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
> > +    addr = (s->ecs && (index > 0x7f)) ?
> > +           ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) :
> > +           ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
> > +
> > +    if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) {
> >          trace_vtd_re_invalid(root->rsvd, root->val);
> >          return -VTD_FR_CONTEXT_TABLE_INV;
> >      }
> > -    ce->lo = le64_to_cpu(ce->lo);
> > -    ce->hi = le64_to_cpu(ce->hi);
> > +
> > +    ce[0].lo = le64_to_cpu(ce[0].lo);
> > +    ce[0].hi = le64_to_cpu(ce[0].hi);
> 
> Again, I feel this even hackier. :)
> 
> I would slightly prefer to play the same union trick to context
> entries, just like what I proposed to the root entries above...

Would think about it.
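
A rough sketch of what the same trick might look like for context entries,
assuming the 128-bit legacy entry vs. 256-bit extended entry split of ECS
(the type name is invented):

    /* Hypothetical: one type that can hold either context entry format. */
    typedef union VTDContextEntryRaw {
        struct {
            uint64_t lo;
            uint64_t hi;
        } base;              /* legacy 128-bit context entry */
        struct {
            uint64_t lo;
            uint64_t hi;
            uint64_t lo2;
            uint64_t hi2;    /* upper 128 bits only used in ECS format */
        } extended;          /* 256-bit extended context entry */
    } VTDContextEntryRaw;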

> > +
> > +    if (s->ecs) {
> > +        ce[1].lo = le64_to_cpu(ce[1].lo);
> > +        ce[1].hi = le64_to_cpu(ce[1].hi);
> > +    }
> > +
> >      return 0;
> >  }
> >  
> > @@ -595,9 +614,11 @@ static inline uint32_t vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
> >      return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
> >  }
> >  
> > -static inline uint32_t vtd_ce_get_type(VTDContextEntry *ce)
> > +static inline uint32_t vtd_ce_get_type(IntelIOMMUState *s,
> > +                                       VTDContextEntry *ce)
> >  {
> > -    return ce->lo & VTD_CONTEXT_ENTRY_TT;
> > +    return s->ecs ? (ce->lo & VTD_CONTEXT_ENTRY_TT) :
> > +                    (ce->lo & VTD_EXT_CONTEXT_ENTRY_TT);
> >  }
> >  
> >  static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
> > @@ -842,16 +863,20 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> >          return ret_fr;
> >      }
> >  
> > -    if (!vtd_root_entry_present(&re)) {
> > +    if (!vtd_root_entry_present(&re) ||
> > +        (s->ecs && (devfn > 0x7f) && (!vtd_root_entry_upper_present(&re)))) {
> >          /* Not error - it's okay we don't have root entry. */
> >          trace_vtd_re_not_present(bus_num);
> >          return -VTD_FR_ROOT_ENTRY_P;
> > -    } else if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) {
> > -        trace_vtd_re_invalid(re.rsvd, re.val);
> > -        return -VTD_FR_ROOT_ENTRY_RSVD;
> > +    }
> > +    if ((s->ecs && (devfn > 0x7f) && (re.rsvd & VTD_ROOT_ENTRY_RSVD)) ||
> > +        (s->ecs && (devfn < 0x80) && (re.val & VTD_ROOT_ENTRY_RSVD)) ||
> > +        ((!s->ecs) && (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)))) {
> > +            trace_vtd_re_invalid(re.rsvd, re.val);
> > +            return -VTD_FR_ROOT_ENTRY_RSVD;
> 
> Nit: I feel like we can better wrap these 0x7f and 0x80 into helper
> functions, especially if with above structure change...

Yep, would add a helper function.
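
Something along these lines, perhaps (a sketch only; the helper names are
invented):

    /* Hypothetical helpers: in ECS mode one bus is split across the two
     * halves of the root entry, 128 devfns each. */
    static inline bool vtd_devfn_in_upper_half(uint8_t devfn)
    {
        return devfn >= 0x80;
    }

    static inline uint8_t vtd_devfn_ecs_index(uint8_t devfn)
    {
        return devfn & 0x7f;   /* index within the selected half */
    }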

> (will hold here...)
> 
> Thanks,
> 
> -- 
> Peter Xu
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest
@ 2017-04-28  9:56             ` Liu, Yi L
  0 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-04-28  9:56 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: Peter Xu, qemu-devel, alex.williamson, kvm, jasowang, iommu,
	kevin.tian, ashok.raj, jacob.jun.pan, yi.l.liu,
	jean-philippe.brucker

On Fri, Apr 28, 2017 at 02:00:15PM +0800, Lan Tianyu wrote:
> On 2017-04-27 18:32, Peter Xu wrote:
> > On Wed, Apr 26, 2017 at 06:06:32PM +0800, Liu, Yi L wrote:
> >> VT-d implementations reporting PASID or PRS fields as "Set", must also
> >> report ecap.ECS as "Set". Extended-Context is required for SVM.
> >>
> >> When ECS is reported, intel iommu driver would initiate extended root entry
> >> and extended context entry, and also PASID table if there is any SVM capable
> >> device.
> >>
> >> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> >> ---
> >>  hw/i386/intel_iommu.c          | 131 +++++++++++++++++++++++++++--------------
> >>  hw/i386/intel_iommu_internal.h |   9 +++
> >>  include/hw/i386/intel_iommu.h  |   2 +-
> >>  3 files changed, 97 insertions(+), 45 deletions(-)
> >>
> >> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >> index 400d0d1..bf98fa5 100644
> >> --- a/hw/i386/intel_iommu.c
> >> +++ b/hw/i386/intel_iommu.c
> >> @@ -497,6 +497,11 @@ static inline bool vtd_root_entry_present(VTDRootEntry *root)
> >>      return root->val & VTD_ROOT_ENTRY_P;
> >>  }
> >>  
> >> +static inline bool vtd_root_entry_upper_present(VTDRootEntry *root)
> >> +{
> >> +    return root->rsvd & VTD_ROOT_ENTRY_P;
> >> +}
> >> +
> >>  static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
> >>                                VTDRootEntry *re)
> >>  {
> >> @@ -509,6 +514,9 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
> >>          return -VTD_FR_ROOT_TABLE_INV;
> >>      }
> >>      re->val = le64_to_cpu(re->val);
> >> +    if (s->ecs) {
> >> +        re->rsvd = le64_to_cpu(re->rsvd);
> >> +    }
> > 
> > I feel it slightly hacky to play with re->rsvd. How about:
> > 
> > union VTDRootEntry {
> >     struct {
> >         uint64_t val;
> >         uint64_t rsvd;
> >     } base;
> >     struct {
> >         uint64_t ext_lo;
> >         uint64_t ext_hi;
> >     } extended;
> > };
> > 
> > (Or any better way that can get rid of rsvd...)
> > 
> > Even:
> > 
> > struct VTDRootEntry {
> >     union {
> >         struct {
> >                 uint64_t val;
> >                 uint64_t rsvd;
> >         } base;
> >         struct {
> >                 uint64_t ext_lo;
> >                 uint64_t ext_hi;
> >         } extended;
> >     } data;
> >     bool extended;
> > };
> > 
> > Then we read the entry into data, and setup extended bit. A benefit of
> > it is that we may avoid passing around IntelIOMMUState everywhere to
> > know whether we are using extended context entries.
> > 
> >>      return 0;
> >>  }
> >>  
> >> @@ -517,19 +525,30 @@ static inline bool vtd_context_entry_present(VTDContextEntry *context)
> >>      return context->lo & VTD_CONTEXT_ENTRY_P;
> >>  }
> >>  
> >> -static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
> >> -                                           VTDContextEntry *ce)
> >> +static int vtd_get_context_entry_from_root(IntelIOMMUState *s,
> >> +                 VTDRootEntry *root, uint8_t index, VTDContextEntry *ce)
> >>  {
> >> -    dma_addr_t addr;
> >> +    dma_addr_t addr, ce_size;
> >>  
> >>      /* we have checked that root entry is present */
> >> -    addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
> >> -    if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
> >> +    ce_size = (s->ecs) ? (2 * sizeof(*ce)) : (sizeof(*ce));
> >> +    addr = (s->ecs && (index > 0x7f)) ?
> >> +           ((root->rsvd & VTD_ROOT_ENTRY_CTP) + (index - 0x80) * ce_size) :
> >> +           ((root->val & VTD_ROOT_ENTRY_CTP) + index * ce_size);
> >> +
> >> +    if (dma_memory_read(&address_space_memory, addr, ce, ce_size)) {
> >>          trace_vtd_re_invalid(root->rsvd, root->val);
> >>          return -VTD_FR_CONTEXT_TABLE_INV;
> >>      }
> >> -    ce->lo = le64_to_cpu(ce->lo);
> >> -    ce->hi = le64_to_cpu(ce->hi);
> >> +
> >> +    ce[0].lo = le64_to_cpu(ce[0].lo);
> >> +    ce[0].hi = le64_to_cpu(ce[0].hi);
> > 
> > Again, I feel this even hackier. :)
> > 
> > I would slightly prefer to play the same union trick to context
> > entries, just like what I proposed to the root entries above...
> > 
> >> +
> >> +    if (s->ecs) {
> >> +        ce[1].lo = le64_to_cpu(ce[1].lo);
> >> +        ce[1].hi = le64_to_cpu(ce[1].hi);
> >> +    }
> >> +
> >>      return 0;
> >>  }
> >>  
> >> @@ -595,9 +614,11 @@ static inline uint32_t vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
> >>      return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
> >>  }
> >>  
> >> -static inline uint32_t vtd_ce_get_type(VTDContextEntry *ce)
> >> +static inline uint32_t vtd_ce_get_type(IntelIOMMUState *s,
> >> +                                       VTDContextEntry *ce)
> >>  {
> >> -    return ce->lo & VTD_CONTEXT_ENTRY_TT;
> >> +    return s->ecs ? (ce->lo & VTD_CONTEXT_ENTRY_TT) :
> >> +                    (ce->lo & VTD_EXT_CONTEXT_ENTRY_TT);
> >>  }
> >>  
> >>  static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
> >> @@ -842,16 +863,20 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> >>          return ret_fr;
> >>      }
> >>  
> >> -    if (!vtd_root_entry_present(&re)) {
> >> +    if (!vtd_root_entry_present(&re) ||
> >> +        (s->ecs && (devfn > 0x7f) && (!vtd_root_entry_upper_present(&re)))) {
> >>          /* Not error - it's okay we don't have root entry. */
> >>          trace_vtd_re_not_present(bus_num);
> >>          return -VTD_FR_ROOT_ENTRY_P;
> >> -    } else if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) {
> >> -        trace_vtd_re_invalid(re.rsvd, re.val);
> >> -        return -VTD_FR_ROOT_ENTRY_RSVD;
> >> +    }
> >> +    if ((s->ecs && (devfn > 0x7f) && (re.rsvd & VTD_ROOT_ENTRY_RSVD)) ||
> >> +        (s->ecs && (devfn < 0x80) && (re.val & VTD_ROOT_ENTRY_RSVD)) ||
> >> +        ((!s->ecs) && (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)))) {
> >> +            trace_vtd_re_invalid(re.rsvd, re.val);
> >> +            return -VTD_FR_ROOT_ENTRY_RSVD;
> > 
> > Nit: I feel like we can better wrap these 0x7f and 0x80 into helper
> > functions, especially if with above structure change...
> > 
> > (will hold here...)
> > 
> > Thanks,
> > 
> 
> 
> Is it possible to add helper macros to check bits in the context entry and
> extended context entry, and to put the ecs mode check into those macros?

Yes, would add them accordingly in the next version.
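
Something like the following could work (a rough sketch; the macro name is
invented and the exact masks would come from the series):

    /* Hypothetical: pick the legacy or extended mask depending on whether
     * the IOMMU is running in ECS mode, so callers need not care. */
    #define VTD_CE_CHECK_BITS(s, ce, legacy_mask, ext_mask) \
        ((s)->ecs ? ((ce)->lo & (ext_mask)) : ((ce)->lo & (legacy_mask)))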

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/20] intel_iommu: add "svm" option
@ 2017-05-04 20:28           ` Alex Williamson
  0 siblings, 0 replies; 81+ messages in thread
From: Alex Williamson @ 2017-05-04 20:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Liu, Yi L, qemu-devel, kvm, jasowang, iommu, kevin.tian,
	ashok.raj, jacob.jun.pan, tianyu.lan, yi.l.liu,
	jean-philippe.brucker

On Thu, 27 Apr 2017 18:53:17 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote:
> > Expose "Shared Virtual Memory" to guest by using "svm" option.
> > Also use "svm" to expose SVM related capabilities to guest.
> > e.g. "-device intel-iommu, svm=on"
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 10 ++++++++++
> >  hw/i386/intel_iommu_internal.h |  5 +++++
> >  include/hw/i386/intel_iommu.h  |  1 +
> >  3 files changed, 16 insertions(+)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index bf98fa5..ba1e7eb 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
> >      DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
> >      DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
> > +    DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
> >          s->ecap |= VTD_ECAP_ECS;
> >      }
> >  
> > +    if (s->svm) {
> > +        if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
> > +            error_report("Need to set ecs, pt, caching-mode for svm");
> > +            exit(1);
> > +        }
> > +        s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
> > +        s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
> > +    }
> > +
> >      if (s->caching_mode) {
> >          s->cap |= VTD_CAP_CM;
> >      }
> > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > index 71a1c1e..f2a7d12 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -191,6 +191,9 @@
> >  #define VTD_ECAP_PT                 (1ULL << 6)
> >  #define VTD_ECAP_MHMV               (15ULL << 20)
> >  #define VTD_ECAP_ECS                (1ULL << 24)
> > +#define VTD_ECAP_PASID28            (1ULL << 28)  
> 
> Could I ask what's this bit? On my spec, it says this bit is reserved
> and defunct (spec version: June 2016).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d42fde70849c5ba2f00c37a0666305eb507a47b8

Do we really need to emulate the buggy implementation?  Seems like we
could just pretend bit28 never happened here and use bit40 instead.
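
For reference, that would just mean defining the capability at its final
spec location, roughly (a sketch; the macro name is up to the author):

    /* PASID support moved to ECAP bit 40 in the released VT-d spec,
     * replacing the defunct pre-release bit 28. */
    #define VTD_ECAP_PASID              (1ULL << 40)

and advertising that instead of VTD_ECAP_PASID28.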

> 
> > +#define VTD_ECAP_PRS                (1ULL << 29)
> > +#define VTD_ECAP_PTS                (0xeULL << 35)  
> 
> Would it better we avoid using 0xe here, or at least add some comment?
> 
> >  
> >  /* CAP_REG */
> >  /* (offset >> 4) << 24 */
> > @@ -207,6 +210,8 @@
> >  #define VTD_CAP_PSI                 (1ULL << 39)
> >  #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
> >  #define VTD_CAP_CM                  (1ULL << 7)
> > +#define VTD_CAP_DWD                 (1ULL << 54)
> > +#define VTD_CAP_DRD                 (1ULL << 55)  
> 
> Just to confirm: after this series, we should support drain read/write
> then, right?
> 
> Thanks,
> 
> >  
> >  /* Supported Adjusted Guest Address Widths */
> >  #define VTD_CAP_SAGAW_SHIFT         8
> > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> > index ae21fe5..8981615 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -267,6 +267,7 @@ struct IntelIOMMUState {
> >  
> >      bool caching_mode;          /* RO - is cap CM enabled? */
> >      bool ecs;                       /* Extended Context Support */
> > +    bool svm;                       /* Shared Virtual Memory */
> >  
> >      dma_addr_t root;                /* Current root table pointer */
> >      bool root_extended;             /* Type of root table (extended or not) */
> > -- 
> > 1.9.1
> >   
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/20] intel_iommu: add "svm" option
@ 2017-05-04 20:37               ` Raj, Ashok
  0 siblings, 0 replies; 81+ messages in thread
From: Raj, Ashok @ 2017-05-04 20:37 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Peter Xu, Liu, Yi L, qemu-devel, kvm, jasowang, iommu,
	kevin.tian, jacob.jun.pan, tianyu.lan, yi.l.liu,
	jean-philippe.brucker

On Thu, May 04, 2017 at 02:28:53PM -0600, Alex Williamson wrote:
> On Thu, 27 Apr 2017 18:53:17 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote:
> > > Expose "Shared Virtual Memory" to guest by using "svm" option.
> > > Also use "svm" to expose SVM related capabilities to guest.
> > > e.g. "-device intel-iommu, svm=on"
> > > 
> > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > ---
> > >  hw/i386/intel_iommu.c          | 10 ++++++++++
> > >  hw/i386/intel_iommu_internal.h |  5 +++++
> > >  include/hw/i386/intel_iommu.h  |  1 +
> > >  3 files changed, 16 insertions(+)
> > > 
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index bf98fa5..ba1e7eb 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
> > >      DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> > >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
> > >      DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
> > > +    DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
> > >      DEFINE_PROP_END_OF_LIST(),
> > >  };
> > >  
> > > @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
> > >          s->ecap |= VTD_ECAP_ECS;
> > >      }
> > >  
> > > +    if (s->svm) {
> > > +        if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
> > > +            error_report("Need to set ecs, pt, caching-mode for svm");
> > > +            exit(1);
> > > +        }
> > > +        s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
> > > +        s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
> > > +    }
> > > +
> > >      if (s->caching_mode) {
> > >          s->cap |= VTD_CAP_CM;
> > >      }
> > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > > index 71a1c1e..f2a7d12 100644
> > > --- a/hw/i386/intel_iommu_internal.h
> > > +++ b/hw/i386/intel_iommu_internal.h
> > > @@ -191,6 +191,9 @@
> > >  #define VTD_ECAP_PT                 (1ULL << 6)
> > >  #define VTD_ECAP_MHMV               (15ULL << 20)
> > >  #define VTD_ECAP_ECS                (1ULL << 24)
> > > +#define VTD_ECAP_PASID28            (1ULL << 28)  
> > 
> > Could I ask what's this bit? On my spec, it says this bit is reserved
> > and defunct (spec version: June 2016).
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d42fde70849c5ba2f00c37a0666305eb507a47b8
> 
> Do we really need to emulate the buggy implementation?  Seems like we
> could just pretend bit28 never happened here and use bit40 instead.
> 

Agree, bit28 can be gone.  

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/20] intel_iommu: add "svm" option
@ 2017-05-08  8:15                   ` Liu, Yi L
  0 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-05-08  8:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: Liu, Yi L, Lan, Tianyu, Tian, Kevin, Raj, Ashok, kvm,
	jean-philippe.brucker, jasowang, iommu, qemu-devel,
	alex.williamson, Pan, Jacob jun

On Mon, May 08, 2017 at 07:20:34PM +0800, Peter Xu wrote:
> On Mon, May 08, 2017 at 10:38:09AM +0000, Liu, Yi L wrote:
> > On Thu, 27 Apr 2017 18:53:17 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> > 
> > > On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote:
> > > > Expose "Shared Virtual Memory" to guest by using "svm" option.
> > > > Also use "svm" to expose SVM related capabilities to guest.
> > > > e.g. "-device intel-iommu, svm=on"
> > > >
> > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > > ---
> > > >  hw/i386/intel_iommu.c          | 10 ++++++++++
> > > >  hw/i386/intel_iommu_internal.h |  5 +++++
> > > > include/hw/i386/intel_iommu.h  |  1 +
> > > >  3 files changed, 16 insertions(+)
> > > >
> > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > > > bf98fa5..ba1e7eb 100644
> > > > --- a/hw/i386/intel_iommu.c
> > > > +++ b/hw/i386/intel_iommu.c
> > > > @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
> > > >      DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> > > >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode,
> > > FALSE),
> > > >      DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
> > > > +    DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
> > > >      DEFINE_PROP_END_OF_LIST(),
> > > >  };
> > > >
> > > > @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
> > > >          s->ecap |= VTD_ECAP_ECS;
> > > >      }
> > > >
> > > > +    if (s->svm) {
> > > > +        if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
> > > > +            error_report("Need to set ecs, pt, caching-mode for svm");
> > > > +            exit(1);
> > > > +        }
> > > > +        s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
> > > > +        s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
> > > > +    }
> > > > +
> > > >      if (s->caching_mode) {
> > > >          s->cap |= VTD_CAP_CM;
> > > >      }
> > > > diff --git a/hw/i386/intel_iommu_internal.h
> > > > b/hw/i386/intel_iommu_internal.h index 71a1c1e..f2a7d12 100644
> > > > --- a/hw/i386/intel_iommu_internal.h
> > > > +++ b/hw/i386/intel_iommu_internal.h
> > > > @@ -191,6 +191,9 @@
> > > >  #define VTD_ECAP_PT                 (1ULL << 6)
> > > >  #define VTD_ECAP_MHMV               (15ULL << 20)
> > > >  #define VTD_ECAP_ECS                (1ULL << 24)
> > > > +#define VTD_ECAP_PASID28            (1ULL << 28)
> > > 
> > > Could I ask what's this bit? On my spec, it says this bit is reserved and defunct (spec
> > > version: June 2016).
> > 
> > As Ashok confirmed, yes it should be bit 40. would update it.
> 
> Ok.
> 
> > 
> > > > +#define VTD_ECAP_PRS                (1ULL << 29)
> > > > +#define VTD_ECAP_PTS                (0xeULL << 35)
> > > 
> > > Would it better we avoid using 0xe here, or at least add some comment?
> > 
> > For this value, it must be no more than the number of bits the host supports.
> > So it may be better to have a default value and meanwhile expose an option to
> > let the user set it. What is your opinion?
> 
> I think a more important point is that we need to make sure this value
> is no larger than hardware support? 

Agree. If it is larger, the sanity check would fail.

> Since you are also working on the
> vfio interface for virt-svm... would it be possible that we can talk
> to kernel in some way so that we can know the supported pasid size in
> host IOMMU? So that when guest specifies something bigger, we can stop
> the user.

If it is just about stopping when the size is not valid, I think we already
have such a sanity check in the host when trying to bind the guest PASID
table. Not sure whether it is practical to ask the kernel about the supported
PASID size, but I may think about it. It is very likely that we would need to
do it through VFIO.
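
If such a query existed, the idea would roughly be the following (pure
sketch; vfio_get_host_pasid_bits() is an invented placeholder, not an
existing interface):

    /* Hypothetical sanity check at vtd_init() time. */
    int host_bits = vfio_get_host_pasid_bits();  /* invented placeholder */
    int guest_bits = 15;                         /* what PTS = 0xe advertises */

    if (host_bits > 0 && guest_bits > host_bits) {
        error_report("guest PASID width %d exceeds host support %d",
                     guest_bits, host_bits);
        exit(1);
    }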

> 
> I don't know the practical value for this field, if it's static
> enough, I think it's also okay we make it static here as well. But
> again, I would prefer at least some comment, like:
> 
>   /* Value N indicates PASID field of N+1 bits, here 0xe stands for.. */

Yes, at least we need to add such a comment. Will add it.
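
For illustration, the commented definition could read like the snippet
below, assuming the ECS-era PSS encoding where value N advertises a PASID
field of N+1 bits (so 0xe means 15-bit PASIDs):

    /* PASID Size Supported (PSS): value N advertises a PASID field of
     * N+1 bits, so 0xe (14) stands for 15-bit PASIDs. This must not
     * exceed what the host IOMMU supports. */
    #define VTD_ECAP_PTS                (0xeULL << 35)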

> > 
> > > 
> > > >
> > > >  /* CAP_REG */
> > > >  /* (offset >> 4) << 24 */
> > > > @@ -207,6 +210,8 @@
> > > >  #define VTD_CAP_PSI                 (1ULL << 39)
> > > >  #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
> > > >  #define VTD_CAP_CM                  (1ULL << 7)
> > > > +#define VTD_CAP_DWD                 (1ULL << 54)
> > > > +#define VTD_CAP_DRD                 (1ULL << 55)
> > > 
> > > Just to confirm: after this series, we should support drain read/write then, right?
> > 
> > I haven't done any special processing for it in the IOMMU emulator. It's set to keep
> > consistency with the VT-d spec, since DWD and DRD are required capabilities when
> > PASID is reported as Set. However, I think it should be fine if the guest issues QI
> > with drain read/write set in the descriptor. The host should be able to process it.
> 
> I see. IIUC the point here is we need to deliver these requests to
> host IOMMU, and I guess we need to be able to do this in a synchronous
> way as well.

Yes, deliver the request to the host. For assigned devices, it is OK. BTW,
do you think we need to consider it for emulated devices?
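
As a note on what processing it on the QEMU side would mean: when walking
the guest invalidation queue, the emulator could simply carry the guest's
drain hints over into whatever request gets propagated to the host. A
sketch, assuming the usual QI IOTLB invalidation descriptor layout (DW at
bit 6, DR at bit 7 of the low 64 bits) and QEMU's existing VTDInvDesc
lo/hi fields:

    #define VTD_INV_DESC_IOTLB_DW       (1ULL << 6)   /* drain writes */
    #define VTD_INV_DESC_IOTLB_DR       (1ULL << 7)   /* drain reads  */

    /* Inside the IOTLB invalidation handling: preserve the drain bits
     * from the guest descriptor when forwarding the request. */
    bool drain_write = !!(inv_desc->lo & VTD_INV_DESC_IOTLB_DW);
    bool drain_read  = !!(inv_desc->lo & VTD_INV_DESC_IOTLB_DR);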

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/20] intel_iommu: add "svm" option
@ 2017-05-08 10:38           ` Liu, Yi L
  0 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-05-08 10:38 UTC (permalink / raw)
  To: Peter Xu, Liu, Yi L
  Cc: qemu-devel, alex.williamson, kvm, jasowang, iommu, Tian, Kevin,
	Raj, Ashok, Pan, Jacob jun, Lan, Tianyu, jean-philippe.brucker

On Thu, 27 Apr 2017 18:53:17 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote:
> > Expose "Shared Virtual Memory" to guest by using "svm" option.
> > Also use "svm" to expose SVM related capabilities to guest.
> > e.g. "-device intel-iommu, svm=on"
> >
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  hw/i386/intel_iommu.c          | 10 ++++++++++
> >  hw/i386/intel_iommu_internal.h |  5 +++++
> > include/hw/i386/intel_iommu.h  |  1 +
> >  3 files changed, 16 insertions(+)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > bf98fa5..ba1e7eb 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
> >      DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode,
> FALSE),
> >      DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
> > +    DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
> >          s->ecap |= VTD_ECAP_ECS;
> >      }
> >
> > +    if (s->svm) {
> > +        if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
> > +            error_report("Need to set ecs, pt, caching-mode for svm");
> > +            exit(1);
> > +        }
> > +        s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
> > +        s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
> > +    }
> > +
> >      if (s->caching_mode) {
> >          s->cap |= VTD_CAP_CM;
> >      }
> > diff --git a/hw/i386/intel_iommu_internal.h
> > b/hw/i386/intel_iommu_internal.h index 71a1c1e..f2a7d12 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -191,6 +191,9 @@
> >  #define VTD_ECAP_PT                 (1ULL << 6)
> >  #define VTD_ECAP_MHMV               (15ULL << 20)
> >  #define VTD_ECAP_ECS                (1ULL << 24)
> > +#define VTD_ECAP_PASID28            (1ULL << 28)
> 
> Could I ask what's this bit? On my spec, it says this bit is reserved and defunct (spec
> version: June 2016).

As Ashok confirmed, yes, it should be bit 40. Will update it.
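
(For reference, the corrected definition would then look something like
the line below; renaming the macro is just an adjustment for illustration,
since "PASID28" would no longer match the bit position:)

    #define VTD_ECAP_PASID              (1ULL << 40)   /* PASID support */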

> > +#define VTD_ECAP_PRS                (1ULL << 29)
> > +#define VTD_ECAP_PTS                (0xeULL << 35)
> 
> Would it be better to avoid using 0xe here, or at least add some comment?

For this value, it must be no more than the number of bits the host supports. So it
may be better to have a default value and meanwhile expose an option to let the
user set it. What is your opinion?
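
A minimal sketch of such an option, with the property name "pasid-bits"
and the default made up here purely for illustration:

    /* Hypothetical property: let the user choose the advertised PASID
     * width, defaulting to what the hard-coded 0xe encodes today. */
    DEFINE_PROP_UINT8("pasid-bits", IntelIOMMUState, pasid_bits, 15),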

> 
> >
> >  /* CAP_REG */
> >  /* (offset >> 4) << 24 */
> > @@ -207,6 +210,8 @@
> >  #define VTD_CAP_PSI                 (1ULL << 39)
> >  #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
> >  #define VTD_CAP_CM                  (1ULL << 7)
> > +#define VTD_CAP_DWD                 (1ULL << 54)
> > +#define VTD_CAP_DRD                 (1ULL << 55)
> 
> Just to confirm: after this series, we should support drain read/write then, right?

I haven't done any special processing for it in the IOMMU emulator. It's set to keep
consistency with the VT-d spec, since DWD and DRD are required capabilities when
PASID is reported as Set. However, I think it should be fine if the guest issues QI
with drain read/write set in the descriptor. The host should be able to process it.

Thanks,
Yi L
> >
> >  /* Supported Adjusted Guest Address Widths */
> >  #define VTD_CAP_SAGAW_SHIFT         8
> > diff --git a/include/hw/i386/intel_iommu.h
> > b/include/hw/i386/intel_iommu.h index ae21fe5..8981615 100644
> > --- a/include/hw/i386/intel_iommu.h
> > +++ b/include/hw/i386/intel_iommu.h
> > @@ -267,6 +267,7 @@ struct IntelIOMMUState {
> >
> >      bool caching_mode;          /* RO - is cap CM enabled? */
> >      bool ecs;                       /* Extended Context Support */
> > +    bool svm;                       /* Shared Virtual Memory */
> >
> >      dma_addr_t root;                /* Current root table pointer */
> >      bool root_extended;             /* Type of root table (extended or not) */
> > --
> > 1.9.1
> >
> 
> --
> Peter Xu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 03/20] intel_iommu: add "svm" option
@ 2017-05-08 11:20               ` Peter Xu
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Xu @ 2017-05-08 11:20 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Liu, Yi L, qemu-devel, alex.williamson, kvm, jasowang, iommu,
	Tian, Kevin, Raj, Ashok, Pan, Jacob jun, Lan, Tianyu,
	jean-philippe.brucker

On Mon, May 08, 2017 at 10:38:09AM +0000, Liu, Yi L wrote:
> On Thu, 27 Apr 2017 18:53:17 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Wed, Apr 26, 2017 at 06:06:33PM +0800, Liu, Yi L wrote:
> > > Expose "Shared Virtual Memory" to guest by using "svm" option.
> > > Also use "svm" to expose SVM related capabilities to guest.
> > > e.g. "-device intel-iommu, svm=on"
> > >
> > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > ---
> > >  hw/i386/intel_iommu.c          | 10 ++++++++++
> > >  hw/i386/intel_iommu_internal.h |  5 +++++
> > > include/hw/i386/intel_iommu.h  |  1 +
> > >  3 files changed, 16 insertions(+)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > > bf98fa5..ba1e7eb 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -2453,6 +2453,7 @@ static Property vtd_properties[] = {
> > >      DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> > >      DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode,
> > FALSE),
> > >      DEFINE_PROP_BOOL("ecs", IntelIOMMUState, ecs, FALSE),
> > > +    DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, FALSE),
> > >      DEFINE_PROP_END_OF_LIST(),
> > >  };
> > >
> > > @@ -2973,6 +2974,15 @@ static void vtd_init(IntelIOMMUState *s)
> > >          s->ecap |= VTD_ECAP_ECS;
> > >      }
> > >
> > > +    if (s->svm) {
> > > +        if (!s->ecs || !x86_iommu->pt_supported || !s->caching_mode) {
> > > +            error_report("Need to set ecs, pt, caching-mode for svm");
> > > +            exit(1);
> > > +        }
> > > +        s->cap |= VTD_CAP_DWD | VTD_CAP_DRD;
> > > +        s->ecap |= VTD_ECAP_PRS | VTD_ECAP_PTS | VTD_ECAP_PASID28;
> > > +    }
> > > +
> > >      if (s->caching_mode) {
> > >          s->cap |= VTD_CAP_CM;
> > >      }
> > > diff --git a/hw/i386/intel_iommu_internal.h
> > > b/hw/i386/intel_iommu_internal.h index 71a1c1e..f2a7d12 100644
> > > --- a/hw/i386/intel_iommu_internal.h
> > > +++ b/hw/i386/intel_iommu_internal.h
> > > @@ -191,6 +191,9 @@
> > >  #define VTD_ECAP_PT                 (1ULL << 6)
> > >  #define VTD_ECAP_MHMV               (15ULL << 20)
> > >  #define VTD_ECAP_ECS                (1ULL << 24)
> > > +#define VTD_ECAP_PASID28            (1ULL << 28)
> > 
> > Could I ask what's this bit? On my spec, it says this bit is reserved and defunct (spec
> > version: June 2016).
> 
> As Ashok confirmed, yes, it should be bit 40. Will update it.

Ok.

> 
> > > +#define VTD_ECAP_PRS                (1ULL << 29)
> > > +#define VTD_ECAP_PTS                (0xeULL << 35)
> > 
> > Would it be better to avoid using 0xe here, or at least add some comment?
> 
> For this value, it must be no more than the number of bits the host supports. So it
> may be better to have a default value and meanwhile expose an option to let the
> user set it. What is your opinion?

I think a more important point is that we need to make sure this value
is no larger than what the hardware supports. Since you are also working
on the vfio interface for virt-svm... would it be possible to talk to the
kernel in some way so that we can know the supported PASID size in the
host IOMMU? Then, when the guest specifies something bigger, we can stop
the user.

I don't know the practical value for this field; if it's static enough, I
think it's also okay to make it static here as well. But again, I would
prefer at least some comment, like:

  /* Value N indicates PASID field of N+1 bits, here 0xe stands for.. */

> 
> > 
> > >
> > >  /* CAP_REG */
> > >  /* (offset >> 4) << 24 */
> > > @@ -207,6 +210,8 @@
> > >  #define VTD_CAP_PSI                 (1ULL << 39)
> > >  #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
> > >  #define VTD_CAP_CM                  (1ULL << 7)
> > > +#define VTD_CAP_DWD                 (1ULL << 54)
> > > +#define VTD_CAP_DRD                 (1ULL << 55)
> > 
> > Just to confirm: after this series, we should support drain read/write then, right?
> 
> I haven't done any special processing for it in the IOMMU emulator. It's set to keep
> consistency with the VT-d spec, since DWD and DRD are required capabilities when
> PASID is reported as Set. However, I think it should be fine if the guest issues QI
> with drain read/write set in the descriptor. The host should be able to process it.

I see. IIUC the point here is we need to deliver these requests to
host IOMMU, and I guess we need to be able to do this in a synchronous
way as well.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device
  2017-04-28  6:46       ` [Qemu-devel] " Lan Tianyu
  (?)
@ 2017-05-19  5:23         ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-05-19  5:23 UTC (permalink / raw)
  To: alex.williamson
  Cc: qemu-devel, alex.williamson, peterx, kevin.tian, yi.l.liu,
	ashok.raj, kvm, jean-philippe.brucker, jasowang, iommu,
	jacob.jun.pan, tianyu.lan

Hi Alex,

What's your opinion on Tianyu's question? Is it acceptable
to use the VFIO API in the intel_iommu emulator?

Thanks,
Yi L
On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> On 2017-04-26 18:06, Liu, Yi L wrote:
> > With the vIOMMU exposed to the guest, the vIOMMU emulator needs to do translation
> > between host and guest. e.g. for a device-selective TLB flush, the vIOMMU
> > emulator needs to replace the guest SID with the host SID so as to limit
> > the invalidation. This patch introduces a new callback
> > iommu_ops->record_device() to notify the vIOMMU emulator to record the necessary
> > information about the assigned device.
> 
> This patch prepares for translating the guest SBDF to the host SBDF.
> 
> Alex:
> 	Could we add a new VFIO API to do such a translation? This would be more
> straightforward than storing the host SBDF in the vIOMMU device model.
> 
> > 
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  include/exec/memory.h | 11 +++++++++++
> >  memory.c              | 12 ++++++++++++
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 7bd13ab..49087ef 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> >                                  IOMMUNotifierFlag new_flags);
> >      /* Set this up to provide customized IOMMU replay function */
> >      void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > +    void (*record_device)(MemoryRegion *iommu,
> > +                          void *device_info);
> >  };
> >  
> >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > @@ -708,6 +710,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> >  void memory_region_notify_one(IOMMUNotifier *notifier,
> >                                IOMMUTLBEntry *entry);
> >  
> > +/*
> > + * memory_region_notify_device_record: notify IOMMU to record assign
> > + * device.
> > + * @mr: the memory region to notify
> > + * @ device_info: device information
> > + */
> > +void memory_region_notify_device_record(MemoryRegion *mr,
> > +                                        void *info);
> > +
> >  /**
> >   * memory_region_register_iommu_notifier: register a notifier for changes to
> >   * IOMMU translation entries.
> > diff --git a/memory.c b/memory.c
> > index 0728e62..45ef069 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1600,6 +1600,18 @@ static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> >      mr->iommu_notify_flags = flags;
> >  }
> >  
> > +void memory_region_notify_device_record(MemoryRegion *mr,
> > +                                        void *info)
> > +{
> > +    assert(memory_region_is_iommu(mr));
> > +
> > +    if (mr->iommu_ops->record_device) {
> > +        mr->iommu_ops->record_device(mr, info);
> > +    }
> > +
> > +    return;
> > +}
> > +
> >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> >                                             IOMMUNotifier *n)
> >  {
> > 
> 
> 
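
For context on how this hook is meant to be driven (patch 10/20 is where
VFIO notifies the vIOMMU emulator when a device is assigned), the caller
side could look roughly like the sketch below. The IOMMUDeviceInfo layout
and the helper name are assumptions for illustration, not taken from the
series:

    /* Hypothetical device_info handed from VFIO to the vIOMMU emulator;
     * the real layout is defined by later patches in the series. */
    typedef struct IOMMUDeviceInfo {
        uint16_t guest_sid;    /* BDF as seen by the guest */
        uint16_t host_sid;     /* BDF of the physical device */
    } IOMMUDeviceInfo;

    /* Sketch of the VFIO-side caller, run when an assigned device sits
     * behind an IOMMU memory region. */
    static void vfio_record_device(MemoryRegion *iommu_mr,
                                   uint16_t guest_sid, uint16_t host_sid)
    {
        IOMMUDeviceInfo info = {
            .guest_sid = guest_sid,
            .host_sid  = host_sid,
        };

        memory_region_notify_device_record(iommu_mr, &info);
    }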

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device
@ 2017-05-19  9:07           ` Tian, Kevin
  0 siblings, 0 replies; 81+ messages in thread
From: Tian, Kevin @ 2017-05-19  9:07 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson
  Cc: qemu-devel, peterx, Liu, Yi L, Raj, Ashok, kvm,
	jean-philippe.brucker, jasowang, iommu, Pan, Jacob jun, Lan,
	Tianyu

> From: Liu, Yi L [mailto:yi.l.liu@linux.intel.com]
> Sent: Friday, May 19, 2017 1:24 PM
> 
> Hi Alex,
> 
> What's your opinion on Tianyu's question? Is it acceptable
> to use the VFIO API in the intel_iommu emulator?

Did you actually need such a translation at all? The SID should be
filled in by the kernel IOMMU driver based on which device the
invalidation request is issued for, regardless of which guest SID is
used in user space. QEMU only needs to know which fd corresponds to the
guest SID, and then initiate an invalidation request on that fd?
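
To make that alternative concrete: the vIOMMU would only keep a guest
SID -> fd mapping and issue the invalidation on that fd, leaving the SID
to the kernel. A rough sketch, where the hash table and
vfio_invalidate_on_fd() are placeholders for whatever the VFIO series
ends up defining:

    /* Hypothetical: map guest SID to the VFIO fd of the assigned device. */
    static GHashTable *vtd_sid_to_fd;   /* guest SID -> int fd */

    extern int vfio_invalidate_on_fd(int fd, const void *desc, size_t len);

    static void vtd_propagate_invalidate(uint16_t guest_sid,
                                         const void *inv_desc, size_t len)
    {
        gpointer val = g_hash_table_lookup(vtd_sid_to_fd,
                                           GUINT_TO_POINTER(guest_sid));
        if (val) {
            /* The host kernel fills in the host SID itself, so QEMU never
             * has to translate the guest SID to a host SID here. */
            vfio_invalidate_on_fd(GPOINTER_TO_INT(val), inv_desc, len);
        }
    }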

> 
> Thanks,
> Yi L
> On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> > On 2017-04-26 18:06, Liu, Yi L wrote:
> > > With vIOMMU exposed to guest, vIOMMU emulator needs to do
> translation
> > > between host and guest. e.g. a device-selective TLB flush, vIOMMU
> > > emulator needs to replace guest SID with host SID so that to limit
> > > the invalidation. This patch introduces a new callback
> > > iommu_ops->record_device() to notify vIOMMU emulator to record
> necessary
> > > information about the assigned device.
> >
> > This patch is to prepare to translate guest sbdf to host sbdf.
> >
> > Alex:
> > 	Could we add a new vfio API to do such translation? This will be more
> > straight forward than storing host sbdf in the vIOMMU device model.
> >
> > >
> > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > ---
> > >  include/exec/memory.h | 11 +++++++++++
> > >  memory.c              | 12 ++++++++++++
> > >  2 files changed, 23 insertions(+)
> > >
> > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > index 7bd13ab..49087ef 100644
> > > --- a/include/exec/memory.h
> > > +++ b/include/exec/memory.h
> > > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> > >                                  IOMMUNotifierFlag new_flags);
> > >      /* Set this up to provide customized IOMMU replay function */
> > >      void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > > +    void (*record_device)(MemoryRegion *iommu,
> > > +                          void *device_info);
> > >  };
> > >
> > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > @@ -708,6 +710,15 @@ void
> memory_region_notify_iommu(MemoryRegion *mr,
> > >  void memory_region_notify_one(IOMMUNotifier *notifier,
> > >                                IOMMUTLBEntry *entry);
> > >
> > > +/*
> > > + * memory_region_notify_device_record: notify IOMMU to record
> assign
> > > + * device.
> > > + * @mr: the memory region to notify
> > > + * @ device_info: device information
> > > + */
> > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > +                                        void *info);
> > > +
> > >  /**
> > >   * memory_region_register_iommu_notifier: register a notifier for
> changes to
> > >   * IOMMU translation entries.
> > > diff --git a/memory.c b/memory.c
> > > index 0728e62..45ef069 100644
> > > --- a/memory.c
> > > +++ b/memory.c
> > > @@ -1600,6 +1600,18 @@ static void
> memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> > >      mr->iommu_notify_flags = flags;
> > >  }
> > >
> > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > +                                        void *info)
> > > +{
> > > +    assert(memory_region_is_iommu(mr));
> > > +
> > > +    if (mr->iommu_ops->record_device) {
> > > +        mr->iommu_ops->record_device(mr, info);
> > > +    }
> > > +
> > > +    return;
> > > +}
> > > +
> > >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > >                                             IOMMUNotifier *n)
> > >  {
> > >
> >
> >

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device
  2017-05-19  9:07           ` Tian, Kevin
  (?)
@ 2017-05-19  9:35           ` Liu, Yi L
  -1 siblings, 0 replies; 81+ messages in thread
From: Liu, Yi L @ 2017-05-19  9:35 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: alex.williamson, qemu-devel, peterx, Liu, Yi L, Raj, Ashok, kvm,
	jean-philippe.brucker, jasowang, iommu, Pan, Jacob jun, Lan,
	Tianyu

On Fri, May 19, 2017 at 09:07:49AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L [mailto:yi.l.liu@linux.intel.com]
> > Sent: Friday, May 19, 2017 1:24 PM
> > 
> > Hi Alex,
> > 
> > What's your opinion on Tianyu's question? Is it acceptable
> > to use the VFIO API in the intel_iommu emulator?
> 
> Did you actually need such a translation at all? The SID should be
> filled in by the kernel IOMMU driver based on which device the
> invalidation request is issued for, regardless of which guest SID is
> used in user space. QEMU only needs to know which fd corresponds to the
> guest SID, and then initiate an invalidation request on that fd?

Kevin,

It actually depends on the SVM binding behavior we expect on the host
IOMMU driver side. If we want the binding to be per-device, this
translation is needed in QEMU, either in VFIO or in the intel_iommu
emulator, so that the host SID can be used as a device selector when
looping over the devices in a group.

If we can use the VFIO API directly, we may also trigger the SVM bind/QI
propagation straightforwardly instead of going through a notifier.

Thanks,
Yi L
 
> > 
> > Thanks,
> > Yi L
> > On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> > > On 2017-04-26 18:06, Liu, Yi L wrote:
> > > > With vIOMMU exposed to guest, vIOMMU emulator needs to do
> > translation
> > > > between host and guest. e.g. a device-selective TLB flush, vIOMMU
> > > > emulator needs to replace guest SID with host SID so that to limit
> > > > the invalidation. This patch introduces a new callback
> > > > iommu_ops->record_device() to notify vIOMMU emulator to record
> > necessary
> > > > information about the assigned device.
> > >
> > > This patch is to prepare to translate guest sbdf to host sbdf.
> > >
> > > Alex:
> > > 	Could we add a new vfio API to do such translation? This will be more
> > > straight forward than storing host sbdf in the vIOMMU device model.
> > >
> > > >
> > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > > > ---
> > > >  include/exec/memory.h | 11 +++++++++++
> > > >  memory.c              | 12 ++++++++++++
> > > >  2 files changed, 23 insertions(+)
> > > >
> > > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > > index 7bd13ab..49087ef 100644
> > > > --- a/include/exec/memory.h
> > > > +++ b/include/exec/memory.h
> > > > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> > > >                                  IOMMUNotifierFlag new_flags);
> > > >      /* Set this up to provide customized IOMMU replay function */
> > > >      void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > > > +    void (*record_device)(MemoryRegion *iommu,
> > > > +                          void *device_info);
> > > >  };
> > > >
> > > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > > @@ -708,6 +710,15 @@ void
> > memory_region_notify_iommu(MemoryRegion *mr,
> > > >  void memory_region_notify_one(IOMMUNotifier *notifier,
> > > >                                IOMMUTLBEntry *entry);
> > > >
> > > > +/*
> > > > + * memory_region_notify_device_record: notify IOMMU to record
> > assign
> > > > + * device.
> > > > + * @mr: the memory region to notify
> > > > + * @ device_info: device information
> > > > + */
> > > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > > +                                        void *info);
> > > > +
> > > >  /**
> > > >   * memory_region_register_iommu_notifier: register a notifier for
> > changes to
> > > >   * IOMMU translation entries.
> > > > diff --git a/memory.c b/memory.c
> > > > index 0728e62..45ef069 100644
> > > > --- a/memory.c
> > > > +++ b/memory.c
> > > > @@ -1600,6 +1600,18 @@ static void
> > memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> > > >      mr->iommu_notify_flags = flags;
> > > >  }
> > > >
> > > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > > +                                        void *info)
> > > > +{
> > > > +    assert(memory_region_is_iommu(mr));
> > > > +
> > > > +    if (mr->iommu_ops->record_device) {
> > > > +        mr->iommu_ops->record_device(mr, info);
> > > > +    }
> > > > +
> > > > +    return;
> > > > +}
> > > > +
> > > >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > >                                             IOMMUNotifier *n)
> > > >  {
> > > >
> > >
> > >

^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, other threads:[~2017-05-19  9:52 UTC | newest]

Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-26 10:06 [RFC PATCH 00/20] Qemu: Extend intel_iommu emulator to support Shared Virtual Memory Liu, Yi L
2017-04-26 10:06 ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 01/20] intel_iommu: add "ecs" option Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 02/20] intel_iommu: exposed extended-context mode to guest Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
     [not found]   ` <1493201210-14357-3-git-send-email-yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-04-27 10:32     ` Peter Xu
2017-04-27 10:32       ` [Qemu-devel] " Peter Xu
2017-04-28  6:00       ` Lan Tianyu
2017-04-28  6:00         ` [Qemu-devel] " Lan Tianyu
     [not found]         ` <a7cd779f-2cd6-3a3f-7e73-e79a49c48961-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-04-28  9:56           ` Liu, Yi L
2017-04-28  9:56             ` [Qemu-devel] " Liu, Yi L
     [not found]       ` <20170427103221.GD1542-QJIicYCqamqhazCxEpVPD9i2O/JbrIOy@public.gmane.org>
2017-04-28  9:55         ` Liu, Yi L
2017-04-28  9:55           ` Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 03/20] intel_iommu: add "svm" option Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
     [not found]   ` <1493201210-14357-4-git-send-email-yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-04-27 10:53     ` Peter Xu
2017-04-27 10:53       ` [Qemu-devel] " Peter Xu
     [not found]       ` <20170427105317.GE1542-QJIicYCqamqhazCxEpVPD9i2O/JbrIOy@public.gmane.org>
2017-05-04 20:28         ` Alex Williamson
2017-05-04 20:28           ` [Qemu-devel] " Alex Williamson
     [not found]           ` <20170504142853.1537028c-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>
2017-05-04 20:37             ` Raj, Ashok
2017-05-04 20:37               ` [Qemu-devel] " Raj, Ashok
2017-05-08 10:38         ` Liu, Yi L
2017-05-08 10:38           ` [Qemu-devel] " Liu, Yi L
     [not found]           ` <A2975661238FB949B60364EF0F2C25743906890D-E2R4CRU6q/6iAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-05-08 11:20             ` Peter Xu
2017-05-08 11:20               ` [Qemu-devel] " Peter Xu
     [not found]               ` <20170508112034.GE2820-QJIicYCqamqhazCxEpVPD9i2O/JbrIOy@public.gmane.org>
2017-05-08  8:15                 ` Liu, Yi L
2017-05-08  8:15                   ` Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 04/20] Memory: modify parameter in IOMMUNotifier func Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 05/20] VFIO: add new IOCTL for svm bind tasks Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 06/20] VFIO: add new notifier for binding PASID table Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 07/20] VFIO: check notifier flag in region_del() Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 08/20] Memory: add notifier flag check in memory_replay() Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
     [not found]   ` <1493201210-14357-10-git-send-email-yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-04-28  6:46     ` Lan Tianyu
2017-04-28  6:46       ` [Qemu-devel] " Lan Tianyu
2017-05-19  5:23       ` Liu, Yi L
2017-05-19  5:23         ` Liu, Yi L
2017-05-19  5:23         ` Liu, Yi L
2017-05-19  9:07         ` Tian, Kevin
2017-05-19  9:07           ` Tian, Kevin
2017-05-19  9:35           ` Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 10/20] VFIO: notify vIOMMU emulator when device is assigned Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 11/20] intel_iommu: provide iommu_ops->record_device Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 12/20] Memory: Add func to fire pasidt_bind notifier Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
     [not found]   ` <1493201210-14357-13-git-send-email-yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-04-26 13:50     ` Paolo Bonzini
2017-04-26 13:50       ` [Qemu-devel] " Paolo Bonzini
2017-04-27  2:37       ` Liu, Yi L
2017-04-27  6:14         ` Peter Xu
2017-04-27  6:14           ` Peter Xu
2017-04-27 10:09           ` Peter Xu
     [not found]           ` <20170427061427.GA1542-QJIicYCqamqhazCxEpVPD9i2O/JbrIOy@public.gmane.org>
2017-04-27 10:25             ` Liu, Yi L
2017-04-27 10:25               ` Liu, Yi L
2017-04-27 10:51               ` Peter Xu
2017-04-26 10:06 ` [RFC PATCH 13/20] IOMMU: add pasid_table_info for guest pasid table Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 14/20] intel_iommu: add FOR_EACH_ASSIGN_DEVICE macro Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
     [not found]   ` <1493201210-14357-15-git-send-email-yi.l.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-04-28  7:33     ` Lan Tianyu
2017-04-28  7:33       ` [Qemu-devel] " Lan Tianyu
2017-04-26 10:06 ` [RFC PATCH 15/20] intel_iommu: link whole guest pasid table to host Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 16/20] VFIO: Add notifier for propagating IOMMU TLB invalidate Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 17/20] Memory: Add func to fire TLB invalidate notifier Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 18/20] intel_iommu: propagate Extended-IOTLB invalidate to host Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 19/20] intel_iommu: propagate PASID-Cache " Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
2017-04-26 10:06 ` [RFC PATCH 20/20] intel_iommu: propagate Ext-Device-TLB " Liu, Yi L
2017-04-26 10:06   ` [Qemu-devel] " Liu, Yi L
