* [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU.
@ 2018-11-09 11:49 Yu Zhang
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 1/3] intel-iommu: differentiate host address width from IOVA address width Yu Zhang
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Yu Zhang @ 2018-11-09 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, Peter Xu

Intel's upcoming processors will extend the maximum linear address width to
57 bits and introduce 5-level paging for the CPU. Meanwhile, the platform
will also extend the maximum guest address width for the IOMMU to 57 bits,
thus introducing 5-level paging for the 2nd-level translation (see chapter 3
of the Intel Virtualization Technology for Directed I/O specification).

This patch set extends the current logic to support a wider address width.
A 5-level paging capable IOMMU (for 2nd-level translation) can be rendered
with the configuration "-device intel-iommu,x-aw-bits=57".
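
For reference, the number of 2nd-level paging structure levels follows from
the configured address width with a 9-bit stride per level. A minimal
illustration (not part of the patches; the helper name is made up):

    /* 4K page offset is 12 bits, each paging level translates 9 bits */
    static inline int example_levels_for_aw(int aw_bits)
    {
        return (aw_bits - 12) / 9;    /* 39 -> 3, 48 -> 4, 57 -> 5 levels */
    }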


Yu Zhang (3):
  intel-iommu: differentiate host address width from IOVA address width.
  intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  intel-iommu: search iotlb for levels supported by the address width.
---
Cc: "Michael S. Tsirkin" <mst@redhat.com> 
Cc: Igor Mammedov <imammedo@redhat.com> 
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com> 
Cc: Richard Henderson <rth@twiddle.net> 
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>


 hw/i386/acpi-build.c           |   2 +-
 hw/i386/intel_iommu.c          | 101 +++++++++++++++++++++++++++--------------
 hw/i386/intel_iommu_internal.h |  13 ++++--
 include/hw/i386/intel_iommu.h  |  10 ++--
 4 files changed, 83 insertions(+), 43 deletions(-)

-- 
1.9.1


* [Qemu-devel] [PATCH v1 1/3] intel-iommu: differentiate host address width from IOVA address width.
  2018-11-09 11:49 [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang
@ 2018-11-09 11:49 ` Yu Zhang
  2018-11-12  8:15   ` Peter Xu
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit " Yu Zhang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2018-11-09 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, Peter Xu

Currently, the vIOMMU is using the value of the IOVA address width, instead
of the host address width (HAW), to calculate the number of reserved bits in
data structures such as root entries, context entries, and entries of the
DMA paging structures.

However, the IOVA address width and the HAW may not be equal. For example,
a 48-bit IOVA can only be mapped to host addresses no wider than 46 bits.
Using 48, instead of 46, to calculate the reserved bits may result in an
invalid IOVA being accepted.

To fix this, a new field, haw_bits, is introduced in struct IntelIOMMUState,
whose value is initialized based on the maximum physical address set for the
guest CPU. Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
for clarity.
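
To illustrate the difference, the reserved-bit checks are derived from an
address-width mask, so every bit above the hardware address width must be
treated as reserved. A rough sketch of the idea (hypothetical helper, not
the exact macros touched by this patch):

    /* mirrors VTD_HAW_MASK(aw) == ((1ULL << (aw)) - 1) from intel_iommu.h */
    #define EXAMPLE_HAW_MASK(aw)    ((1ULL << (aw)) - 1)

    /*
     * With haw = 46, an entry carrying a 47-bit address hits a reserved bit;
     * if aw = 48 were used by mistake, the same entry would pass the check.
     */
    static inline bool example_addr_has_rsvd_bits(uint64_t addr, uint8_t haw)
    {
        return addr & ~EXAMPLE_HAW_MASK(haw);
    }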

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com> 
Cc: Igor Mammedov <imammedo@redhat.com> 
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com> 
Cc: Richard Henderson <rth@twiddle.net> 
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
---
 hw/i386/acpi-build.c          |  2 +-
 hw/i386/intel_iommu.c         | 54 +++++++++++++++++++++++--------------------
 include/hw/i386/intel_iommu.h |  9 ++++----
 3 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 236a20e..b989523 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2431,7 +2431,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
     }
 
     dmar = acpi_data_push(table_data, sizeof(*dmar));
-    dmar->host_address_width = intel_iommu->aw_bits - 1;
+    dmar->host_address_width = intel_iommu->haw_bits - 1;
     dmar->flags = dmar_flags;
 
     /* DMAR Remapping Hardware Unit Definition structure */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d97bcbc..e772fca 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -706,8 +706,8 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
  * of the translation, can be used for deciding the size of large page.
  */
 static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
-                             uint64_t *slptep, uint32_t *slpte_level,
-                             bool *reads, bool *writes, uint8_t aw_bits)
+                             uint64_t *slptep, uint32_t *slpte_level, bool *reads,
+                             bool *writes, uint8_t aw_bits, uint8_t haw_bits)
 {
     dma_addr_t addr = vtd_ce_get_slpt_base(ce);
     uint32_t level = vtd_ce_get_level(ce);
@@ -760,7 +760,7 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
             *slpte_level = level;
             return 0;
         }
-        addr = vtd_get_slpte_addr(slpte, aw_bits);
+        addr = vtd_get_slpte_addr(slpte, haw_bits);
         level--;
     }
 }
@@ -887,6 +887,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
     uint64_t iova = start;
     uint64_t iova_next;
     int ret = 0;
+    uint8_t haw = info->as->iommu_state->haw_bits;
 
     trace_vtd_page_walk_level(addr, level, start, end);
 
@@ -925,7 +926,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
              * This is a valid PDE (or even bigger than PDE).  We need
              * to walk one further level.
              */
-            ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
+            ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, haw),
                                       iova, MIN(iova_next, end), level - 1,
                                       read_cur, write_cur, info);
         } else {
@@ -942,7 +943,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
             entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
             entry.addr_mask = ~subpage_mask;
             /* NOTE: this is only meaningful if entry_valid == true */
-            entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+            entry.translated_addr = vtd_get_slpte_addr(slpte, haw);
             ret = vtd_page_walk_one(&entry, info);
         }
 
@@ -1002,7 +1003,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
         return -VTD_FR_ROOT_ENTRY_P;
     }
 
-    if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD(s->aw_bits))) {
+    if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD(s->haw_bits))) {
         trace_vtd_re_invalid(re.rsvd, re.val);
         return -VTD_FR_ROOT_ENTRY_RSVD;
     }
@@ -1019,7 +1020,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
     }
 
     if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
-               (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(s->aw_bits))) {
+               (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(s->haw_bits))) {
         trace_vtd_ce_invalid(ce->hi, ce->lo);
         return -VTD_FR_CONTEXT_ENTRY_RSVD;
     }
@@ -1360,7 +1361,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     }
 
     ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level,
-                               &reads, &writes, s->aw_bits);
+                               &reads, &writes, s->aw_bits, s->haw_bits);
     if (ret_fr) {
         ret_fr = -ret_fr;
         if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
@@ -1378,7 +1379,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
 out:
     vtd_iommu_unlock(s);
     entry->iova = addr & page_mask;
-    entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
+    entry->translated_addr = vtd_get_slpte_addr(slpte, s->haw_bits) & page_mask;
     entry->addr_mask = ~page_mask;
     entry->perm = access_flags;
     return true;
@@ -1396,7 +1397,7 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
 {
     s->root = vtd_get_quad_raw(s, DMAR_RTADDR_REG);
     s->root_extended = s->root & VTD_RTADDR_RTT;
-    s->root &= VTD_RTADDR_ADDR_MASK(s->aw_bits);
+    s->root &= VTD_RTADDR_ADDR_MASK(s->haw_bits);
 
     trace_vtd_reg_dmar_root(s->root, s->root_extended);
 }
@@ -1412,7 +1413,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
     uint64_t value = 0;
     value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
     s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
-    s->intr_root = value & VTD_IRTA_ADDR_MASK(s->aw_bits);
+    s->intr_root = value & VTD_IRTA_ADDR_MASK(s->haw_bits);
     s->intr_eime = value & VTD_IRTA_EIME;
 
     /* Notify global invalidation */
@@ -1689,7 +1690,7 @@ static void vtd_handle_gcmd_qie(IntelIOMMUState *s, bool en)
     trace_vtd_inv_qi_enable(en);
 
     if (en) {
-        s->iq = iqa_val & VTD_IQA_IQA_MASK(s->aw_bits);
+        s->iq = iqa_val & VTD_IQA_IQA_MASK(s->haw_bits);
         /* 2^(x+8) entries */
         s->iq_size = 1UL << ((iqa_val & VTD_IQA_QS) + 8);
         s->qi_enabled = true;
@@ -2629,7 +2630,7 @@ static Property vtd_properties[] = {
                             ON_OFF_AUTO_AUTO),
     DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
     DEFINE_PROP_UINT8("x-aw-bits", IntelIOMMUState, aw_bits,
-                      VTD_HOST_ADDRESS_WIDTH),
+                      VTD_ADDRESS_WIDTH),
     DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -3100,6 +3101,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
 static void vtd_init(IntelIOMMUState *s)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
+    CPUState *cs = first_cpu;
+    X86CPU *cpu = X86_CPU(cs);
 
     memset(s->csr, 0, DMAR_REG_SIZE);
     memset(s->wmask, 0, DMAR_REG_SIZE);
@@ -3119,23 +3122,24 @@ static void vtd_init(IntelIOMMUState *s)
     s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
              VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
              VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
-    if (s->aw_bits == VTD_HOST_AW_48BIT) {
+    if (s->aw_bits == VTD_AW_48BIT) {
         s->cap |= VTD_CAP_SAGAW_48bit;
     }
     s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
+    s->haw_bits = cpu->phys_bits;
 
     /*
      * Rsvd field masks for spte
      */
     vtd_paging_entry_rsvd_field[0] = ~0ULL;
-    vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
-    vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
-    vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
-    vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
-    vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits);
-    vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits);
-    vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
-    vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
+    vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits);
 
     if (x86_iommu->intr_supported) {
         s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
@@ -3261,10 +3265,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
     }
 
     /* Currently only address widths supported are 39 and 48 bits */
-    if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
-        (s->aw_bits != VTD_HOST_AW_48BIT)) {
+    if ((s->aw_bits != VTD_AW_39BIT) &&
+        (s->aw_bits != VTD_AW_48BIT)) {
         error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
-                   VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT);
+                   VTD_AW_39BIT, VTD_AW_48BIT);
         return false;
     }
 
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index ed4e758..820451c 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -47,9 +47,9 @@
 #define VTD_SID_TO_DEVFN(sid)       ((sid) & 0xff)
 
 #define DMAR_REG_SIZE               0x230
-#define VTD_HOST_AW_39BIT           39
-#define VTD_HOST_AW_48BIT           48
-#define VTD_HOST_ADDRESS_WIDTH      VTD_HOST_AW_39BIT
+#define VTD_AW_39BIT                39
+#define VTD_AW_48BIT                48
+#define VTD_ADDRESS_WIDTH           VTD_AW_39BIT
 #define VTD_HAW_MASK(aw)            ((1ULL << (aw)) - 1)
 
 #define DMAR_REPORT_F_INTR          (1)
@@ -244,7 +244,8 @@ struct IntelIOMMUState {
     bool intr_eime;                 /* Extended interrupt mode enabled */
     OnOffAuto intr_eim;             /* Toggle for EIM cabability */
     bool buggy_eim;                 /* Force buggy EIM unless eim=off */
-    uint8_t aw_bits;                /* Host/IOVA address width (in bits) */
+    uint8_t aw_bits;                /* IOVA address width (in bits) */
+    uint8_t haw_bits;               /* Hardware address width (in bits) */
 
     /*
      * Protects IOMMU states in general.  Currently it protects the
-- 
1.9.1


* [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  2018-11-09 11:49 [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 1/3] intel-iommu: differentiate host address width from IOVA address width Yu Zhang
@ 2018-11-09 11:49 ` Yu Zhang
  2018-11-12  8:36   ` Peter Xu
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the " Yu Zhang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2018-11-09 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Peter Xu

A 5-level paging capable VM may choose to use a 57-bit IOVA address width.
E.g. guest applications like DPDK prefer to use their VA as IOVA when
performing VFIO map/unmap operations, to avoid the burden of managing the
IOVA space.

This patch extends the current vIOMMU logic to cover the extended address
width. When creating a VM with the 5-level paging feature, one can choose to
create a virtual VTD with 5-level paging capability, with a configuration
like "-device intel-iommu,x-aw-bits=57".

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c          | 54 ++++++++++++++++++++++++++++++++----------
 hw/i386/intel_iommu_internal.h |  6 +++++
 include/hw/i386/intel_iommu.h  |  1 +
 3 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e772fca..9cdf755 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce,
 
 /*
  * Rsvd field masks for spte:
- *     Index [1] to [4] 4k pages
- *     Index [5] to [8] large pages
+ *     Index [1] to [5] 4k pages
+ *     Index [6] to [10] large pages
  */
-static uint64_t vtd_paging_entry_rsvd_field[9];
+static uint64_t vtd_paging_entry_rsvd_field[11];
 
 static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
 {
     if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) {
         /* Maybe large page */
-        return slpte & vtd_paging_entry_rsvd_field[level + 4];
+        return slpte & vtd_paging_entry_rsvd_field[level + 5];
     } else {
         return slpte & vtd_paging_entry_rsvd_field[level];
     }
@@ -3125,6 +3125,9 @@ static void vtd_init(IntelIOMMUState *s)
     if (s->aw_bits == VTD_AW_48BIT) {
         s->cap |= VTD_CAP_SAGAW_48bit;
     }
+    else if (s->aw_bits == VTD_AW_57BIT) {
+        s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit;
+    }
     s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
     s->haw_bits = cpu->phys_bits;
 
@@ -3136,10 +3139,12 @@ static void vtd_init(IntelIOMMUState *s)
     vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits);
     vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits);
     vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits);
-    vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits);
-    vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits);
-    vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits);
-    vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits);
+    vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits);
 
     if (x86_iommu->intr_supported) {
         s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
@@ -3238,6 +3243,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &vtd_as->as;
 }
 
+static bool host_has_la57(void)
+{
+    uint32_t ecx, unused;
+
+    host_cpuid(7, 0, &unused, &unused, &ecx, &unused);
+    return ecx & CPUID_7_0_ECX_LA57;
+}
+
+static bool guest_has_la57(void)
+{
+    CPUState *cs = first_cpu;
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = &cpu->env;
+
+    return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57;
+}
+
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
@@ -3264,11 +3286,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
         }
     }
 
-    /* Currently only address widths supported are 39 and 48 bits */
+    /* Currently address widths supported are 39, 48, and 57 bits */
     if ((s->aw_bits != VTD_AW_39BIT) &&
-        (s->aw_bits != VTD_AW_48BIT)) {
-        error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
-                   VTD_AW_39BIT, VTD_AW_48BIT);
+        (s->aw_bits != VTD_AW_48BIT) &&
+        (s->aw_bits != VTD_AW_57BIT)) {
+        error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d",
+                   VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT);
+        return false;
+    }
+
+    if ((s->aw_bits == VTD_AW_57BIT) &&
+        !(host_has_la57() && guest_has_la57())) {
+        error_setg(errp, "Do not support 57-bit DMA address, unless both "
+                         "host and guest are capable of 5-level paging.\n");
         return false;
     }
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index d084099..a7ef24b 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -212,6 +212,8 @@
 #define VTD_CAP_SAGAW_39bit         (0x2ULL << VTD_CAP_SAGAW_SHIFT)
  /* 48-bit AGAW, 4-level page-table */
 #define VTD_CAP_SAGAW_48bit         (0x4ULL << VTD_CAP_SAGAW_SHIFT)
+ /* 57-bit AGAW, 5-level page-table */
+#define VTD_CAP_SAGAW_57bit         (0x8ULL << VTD_CAP_SAGAW_SHIFT)
 
 /* IQT_REG */
 #define VTD_IQT_QT(val)             (((val) >> 4) & 0x7fffULL)
@@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc;
         (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \
         (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
+#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \
+        (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \
         (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \
@@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc;
         (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \
         (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
+#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \
+        (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 820451c..7474c4f 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -49,6 +49,7 @@
 #define DMAR_REG_SIZE               0x230
 #define VTD_AW_39BIT                39
 #define VTD_AW_48BIT                48
+#define VTD_AW_57BIT                57
 #define VTD_ADDRESS_WIDTH           VTD_AW_39BIT
 #define VTD_HAW_MASK(aw)            ((1ULL << (aw)) - 1)
 
-- 
1.9.1


* [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the address width.
  2018-11-09 11:49 [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 1/3] intel-iommu: differentiate host address width from IOVA address width Yu Zhang
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit " Yu Zhang
@ 2018-11-09 11:49 ` Yu Zhang
  2018-11-12  8:51   ` Peter Xu
  2018-11-09 22:32 ` [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU no-reply
  2018-11-12  8:53 ` Peter Xu
  4 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2018-11-09 11:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Peter Xu

This patch updates vtd_lookup_iotlb() to search cached mappings only for the
page levels supported by the address width of the current vIOMMU. Also, to
cover the 57-bit width, the shifts of the source id (VTD_IOTLB_SID_SHIFT) and
of the page level (VTD_IOTLB_LVL_SHIFT) are enlarged by 9, the stride of one
paging-structure level.
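
With a 57-bit IOVA and 4K pages, the gfn part of the IOTLB hash key can be up
to 57 - 12 = 45 bits wide, so the fields packed above it have to move up by
one level's worth of bits. A sketch of the assumed key layout (the actual
packing lives in vtd_get_iotlb_key(); this is only an illustration):

    /* after this patch: bits [44:0] gfn, [60:45] source id, [63:61] level */
    static uint64_t example_iotlb_key(uint64_t gfn, uint16_t sid, uint32_t level)
    {
        return gfn | ((uint64_t)sid << VTD_IOTLB_SID_SHIFT) |
                     ((uint64_t)level << VTD_IOTLB_LVL_SHIFT);
    }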

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com> 
Cc: Richard Henderson <rth@twiddle.net> 
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c          | 5 +++--
 hw/i386/intel_iommu_internal.h | 7 ++-----
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 9cdf755..ce7e17e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -254,11 +254,12 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
 static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
                                        hwaddr addr)
 {
-    VTDIOTLBEntry *entry;
+    VTDIOTLBEntry *entry = NULL;
     uint64_t key;
     int level;
+    int max_level = (s->aw_bits - VTD_PAGE_SHIFT_4K) / VTD_SL_LEVEL_BITS;
 
-    for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
+    for (level = VTD_SL_PT_LEVEL; level < max_level; level++) {
         key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
                                 source_id, level);
         entry = g_hash_table_lookup(s->iotlb, &key);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index a7ef24b..bdf2b7c 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -114,8 +114,8 @@
                                      VTD_INTERRUPT_ADDR_FIRST + 1)
 
 /* The shift of source_id in the key of IOTLB hash table */
-#define VTD_IOTLB_SID_SHIFT         36
-#define VTD_IOTLB_LVL_SHIFT         52
+#define VTD_IOTLB_SID_SHIFT         45
+#define VTD_IOTLB_LVL_SHIFT         61
 #define VTD_IOTLB_MAX_SIZE          1024    /* Max size of the hash table */
 
 /* IOTLB_REG */
@@ -450,9 +450,6 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_SL_LEVEL_BITS           9
 
 /* Second Level Paging Structure */
-#define VTD_SL_PML4_LEVEL           4
-#define VTD_SL_PDP_LEVEL            3
-#define VTD_SL_PD_LEVEL             2
 #define VTD_SL_PT_LEVEL             1
 #define VTD_SL_PT_ENTRY_NR          512
 
-- 
1.9.1


* Re: [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU.
  2018-11-09 11:49 [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang
                   ` (2 preceding siblings ...)
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the " Yu Zhang
@ 2018-11-09 22:32 ` no-reply
  2018-11-12  8:53 ` Peter Xu
  4 siblings, 0 replies; 22+ messages in thread
From: no-reply @ 2018-11-09 22:32 UTC (permalink / raw)
  To: yu.c.zhang
  Cc: famz, qemu-devel, ehabkost, mst, peterx, pbonzini, imammedo, rth

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1541764187-10732-1-git-send-email-yu.c.zhang@linux.intel.com
Subject: [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU.

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20181109221213.7310-1-crosa@redhat.com -> patchew/20181109221213.7310-1-crosa@redhat.com
Switched to a new branch 'test'
bcc712a4f8 intel-iommu: search iotlb for levels supported by the address width.
b21a8d281a intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
5234b6784d intel-iommu: differentiate host address width from IOVA address width.

=== OUTPUT BEGIN ===
Checking PATCH 1/3: intel-iommu: differentiate host address width from IOVA address width....
WARNING: line over 80 characters
#48: FILE: hw/i386/intel_iommu.c:709:
+                             uint64_t *slptep, uint32_t *slpte_level, bool *reads,

total: 0 errors, 1 warnings, 188 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 2/3: intel-iommu: extend VTD emulation to allow 57-bit IOVA address width....
ERROR: else should follow close brace '}'
#49: FILE: hw/i386/intel_iommu.c:3128:
     }
+    else if (s->aw_bits == VTD_AW_57BIT) {

ERROR: Error messages should not contain newlines
#116: FILE: hw/i386/intel_iommu.c:3301:
+                         "host and guest are capable of 5-level paging.\n");

total: 2 errors, 0 warnings, 122 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/3: intel-iommu: search iotlb for levels supported by the address width....
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH v1 1/3] intel-iommu: differentiate host address width from IOVA address width.
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 1/3] intel-iommu: differentiate host address width from IOVA address width Yu Zhang
@ 2018-11-12  8:15   ` Peter Xu
  2018-11-12  9:28     ` Yu Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2018-11-12  8:15 UTC (permalink / raw)
  To: Yu Zhang
  Cc: qemu-devel, Michael S. Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

On Fri, Nov 09, 2018 at 07:49:45PM +0800, Yu Zhang wrote:
> Currently, vIOMMU is using the value of IOVA address width, instead of
> the host address width(HAW) to calculate the number of reserved bits in
> data structures such as root entries, context entries, and entries of
> DMA paging structures etc.
> 
> However values of IOVA address width and of the HAW may not equal. For
> example, a 48-bit IOVA can only be mapped to host addresses no wider than
> 46 bits. Using 48, instead of 46 to calculate the reserved bit may result
> in an invalid IOVA being accepted.
> 
> To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState,
> whose value is initialized based on the maximum physical address set to
> guest CPU. Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> to clarify.

IIRC I raised this question some time ago somewhere but no one
remembered to follow that up.  Thanks for fixing it.

It looks mostly good to me, only one tiny comment below...

[...]

> @@ -887,6 +887,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
>      uint64_t iova = start;
>      uint64_t iova_next;
>      int ret = 0;
> +    uint8_t haw = info->as->iommu_state->haw_bits;

For now vtd_page_walk_info->aw_bits caches the GAW information, and we use a
single vtd_page_walk_info during one page walk. Maybe we can also do the same
for the HAW instead of fetching it every time here from
info->as->iommu_state->haw_bits?
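
Something like the following is what I have in mind (illustration only, not
the real vtd_page_walk_info layout; the point is just to cache both widths in
the per-walk info instead of chasing info->as->iommu_state on every level):

    typedef struct {
        uint8_t aw;     /* guest address width, already cached today */
        uint8_t haw;    /* host address width, filled in once per walk */
    } example_walk_info;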

Regards,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit " Yu Zhang
@ 2018-11-12  8:36   ` Peter Xu
  2018-11-12  9:42     ` Yu Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2018-11-12  8:36 UTC (permalink / raw)
  To: Yu Zhang
  Cc: qemu-devel, Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

On Fri, Nov 09, 2018 at 07:49:46PM +0800, Yu Zhang wrote:
> A 5-level paging capable VM may choose to use 57-bit IOVA address width.
> E.g. guest applications like DPDK prefer to use its VA as IOVA when
> performing VFIO map/unmap operations, to avoid the burden of managing the
> IOVA space.

Since you mentioned DPDK... I'm just curious whether you have tested the
patchset on 57-bit-enabled machines with DPDK VA mode running in the guest?
That would be something nice to mention in the cover letter if you have.

[...]

> @@ -3264,11 +3286,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>          }
>      }
>  
> -    /* Currently only address widths supported are 39 and 48 bits */
> +    /* Currently address widths supported are 39, 48, and 57 bits */
>      if ((s->aw_bits != VTD_AW_39BIT) &&
> -        (s->aw_bits != VTD_AW_48BIT)) {
> -        error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
> -                   VTD_AW_39BIT, VTD_AW_48BIT);
> +        (s->aw_bits != VTD_AW_48BIT) &&
> +        (s->aw_bits != VTD_AW_57BIT)) {
> +        error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d",
> +                   VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT);
> +        return false;
> +    }
> +
> +    if ((s->aw_bits == VTD_AW_57BIT) &&
> +        !(host_has_la57() && guest_has_la57())) {
> +        error_setg(errp, "Do not support 57-bit DMA address, unless both "
> +                         "host and guest are capable of 5-level paging.\n");

Is there any context (a pointer to previous discussions would work too)
explaining why we don't support some scenarios like
host_paw=48,guest_paw=48,guest_gaw=57?

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the address width.
  2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the " Yu Zhang
@ 2018-11-12  8:51   ` Peter Xu
  2018-11-12  9:25     ` Yu Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2018-11-12  8:51 UTC (permalink / raw)
  To: Yu Zhang
  Cc: qemu-devel, Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

On Fri, Nov 09, 2018 at 07:49:47PM +0800, Yu Zhang wrote:
> This patch updates vtd_lookup_iotlb() to search cached mappings only
> for all page levels supported by address width of current vIOMMU. Also,
> to cover 57-bit width, the shift of source id(VTD_IOTLB_SID_SHIFT) and
> of page level(VTD_IOTLB_LVL_SHIFT) are enlarged by 9 - the stride of
> one paging structure level.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com> 
> Cc: Richard Henderson <rth@twiddle.net> 
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c          | 5 +++--
>  hw/i386/intel_iommu_internal.h | 7 ++-----
>  2 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 9cdf755..ce7e17e 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -254,11 +254,12 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
>  static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
>                                         hwaddr addr)
>  {
> -    VTDIOTLBEntry *entry;
> +    VTDIOTLBEntry *entry = NULL;
>      uint64_t key;
>      int level;
> +    int max_level = (s->aw_bits - VTD_PAGE_SHIFT_4K) / VTD_SL_LEVEL_BITS;
>  
> -    for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
> +    for (level = VTD_SL_PT_LEVEL; level < max_level; level++) {

My understanding of current IOTLB is that it only caches the last
level of mapping, say:

  - level 1: 4K page
  - level 2: 2M page
  - level 3: 1G page

So we don't check against level=4 even if x-aw-bits=48 is specified.

Here does it mean that we're going to have... 512G iommu huge pages?

>          key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
>                                  source_id, level);
>          entry = g_hash_table_lookup(s->iotlb, &key);
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index a7ef24b..bdf2b7c 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -114,8 +114,8 @@
>                                       VTD_INTERRUPT_ADDR_FIRST + 1)
>  
>  /* The shift of source_id in the key of IOTLB hash table */
> -#define VTD_IOTLB_SID_SHIFT         36
> -#define VTD_IOTLB_LVL_SHIFT         52
> +#define VTD_IOTLB_SID_SHIFT         45
> +#define VTD_IOTLB_LVL_SHIFT         61
>  #define VTD_IOTLB_MAX_SIZE          1024    /* Max size of the hash table */
>  
>  /* IOTLB_REG */
> @@ -450,9 +450,6 @@ typedef struct VTDRootEntry VTDRootEntry;
>  #define VTD_SL_LEVEL_BITS           9
>  
>  /* Second Level Paging Structure */
> -#define VTD_SL_PML4_LEVEL           4
> -#define VTD_SL_PDP_LEVEL            3
> -#define VTD_SL_PD_LEVEL             2
>  #define VTD_SL_PT_LEVEL             1
>  #define VTD_SL_PT_ENTRY_NR          512
>  
> -- 
> 1.9.1
> 

Regards,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU.
  2018-11-09 11:49 [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang
                   ` (3 preceding siblings ...)
  2018-11-09 22:32 ` [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU no-reply
@ 2018-11-12  8:53 ` Peter Xu
  4 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2018-11-12  8:53 UTC (permalink / raw)
  To: Yu Zhang
  Cc: qemu-devel, Michael S. Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

On Fri, Nov 09, 2018 at 07:49:44PM +0800, Yu Zhang wrote:
> Intel's upcoming processors will extend maximum linear address width to
> 57 bits, and introduce 5-level paging for CPU. Meanwhile, the platform
> will also extend the maximum guest address width for IOMMU to 57 bits,
> thus introducing the 5-level paging for 2nd level translation(See chapter 3
> in Intel Virtualization Technology for Directed I/O). 
> 
> This patch set extends the current logic to support a wider address width.
> A 5-level paging capable IOMMU(for 2nd level translation) can be rendered
> with configuration "device intel-iommu,x-aw-bits=57".

Along with this series, I'm not sure whether it would be good to start
considering removing the "x-" prefix from "x-aw-bits".

Michael?

Regards,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the address width.
  2018-11-12  8:51   ` Peter Xu
@ 2018-11-12  9:25     ` Yu Zhang
  2018-11-12  9:36       ` Peter Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2018-11-12  9:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Richard Henderson

On Mon, Nov 12, 2018 at 04:51:22PM +0800, Peter Xu wrote:
> On Fri, Nov 09, 2018 at 07:49:47PM +0800, Yu Zhang wrote:
> > This patch updates vtd_lookup_iotlb() to search cached mappings only
> > for all page levels supported by address width of current vIOMMU. Also,
> > to cover 57-bit width, the shift of source id(VTD_IOTLB_SID_SHIFT) and
> > of page level(VTD_IOTLB_LVL_SHIFT) are enlarged by 9 - the stride of
> > one paging structure level.
> > 
> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > ---
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com> 
> > Cc: Richard Henderson <rth@twiddle.net> 
> > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/i386/intel_iommu.c          | 5 +++--
> >  hw/i386/intel_iommu_internal.h | 7 ++-----
> >  2 files changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 9cdf755..ce7e17e 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -254,11 +254,12 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
> >  static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
> >                                         hwaddr addr)
> >  {
> > -    VTDIOTLBEntry *entry;
> > +    VTDIOTLBEntry *entry = NULL;
> >      uint64_t key;
> >      int level;
> > +    int max_level = (s->aw_bits - VTD_PAGE_SHIFT_4K) / VTD_SL_LEVEL_BITS;
> >  
> > -    for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
> > +    for (level = VTD_SL_PT_LEVEL; level < max_level; level++) {
> 
> My understanding of current IOTLB is that it only caches the last
> level of mapping, say:
> 
>   - level 1: 4K page
>   - level 2: 2M page
>   - level 3: 1G page
> 
> So we don't check against level=4 even if x-aw-bits=48 is specified.
> 
> Here does it mean that we're going to have... 512G iommu huge pages?
> 

No. My bad, I misunderstood this routine. And now I believe we do not
need this patch. :-)

Thanks, Peter.

> >          key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
> >                                  source_id, level);
> >          entry = g_hash_table_lookup(s->iotlb, &key);
> > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > index a7ef24b..bdf2b7c 100644
> > --- a/hw/i386/intel_iommu_internal.h
> > +++ b/hw/i386/intel_iommu_internal.h
> > @@ -114,8 +114,8 @@
> >                                       VTD_INTERRUPT_ADDR_FIRST + 1)
> >  
> >  /* The shift of source_id in the key of IOTLB hash table */
> > -#define VTD_IOTLB_SID_SHIFT         36
> > -#define VTD_IOTLB_LVL_SHIFT         52
> > +#define VTD_IOTLB_SID_SHIFT         45
> > +#define VTD_IOTLB_LVL_SHIFT         61
> >  #define VTD_IOTLB_MAX_SIZE          1024    /* Max size of the hash table */
> >  
> >  /* IOTLB_REG */
> > @@ -450,9 +450,6 @@ typedef struct VTDRootEntry VTDRootEntry;
> >  #define VTD_SL_LEVEL_BITS           9
> >  
> >  /* Second Level Paging Structure */
> > -#define VTD_SL_PML4_LEVEL           4
> > -#define VTD_SL_PDP_LEVEL            3
> > -#define VTD_SL_PD_LEVEL             2
> >  #define VTD_SL_PT_LEVEL             1
> >  #define VTD_SL_PT_ENTRY_NR          512
> >  
> > -- 
> > 1.9.1
> > 
> 
> Regards,
> 
> -- 
> Peter Xu
> 

B.R.
Yu


* Re: [Qemu-devel] [PATCH v1 1/3] intel-iommu: differentiate host address width from IOVA address width.
  2018-11-12  8:15   ` Peter Xu
@ 2018-11-12  9:28     ` Yu Zhang
  0 siblings, 0 replies; 22+ messages in thread
From: Yu Zhang @ 2018-11-12  9:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On Mon, Nov 12, 2018 at 04:15:46PM +0800, Peter Xu wrote:
> On Fri, Nov 09, 2018 at 07:49:45PM +0800, Yu Zhang wrote:
> > Currently, vIOMMU is using the value of IOVA address width, instead of
> > the host address width(HAW) to calculate the number of reserved bits in
> > data structures such as root entries, context entries, and entries of
> > DMA paging structures etc.
> > 
> > However values of IOVA address width and of the HAW may not equal. For
> > example, a 48-bit IOVA can only be mapped to host addresses no wider than
> > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result
> > in an invalid IOVA being accepted.
> > 
> > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState,
> > whose value is initialized based on the maximum physical address set to
> > guest CPU. Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed
> > to clarify.
> 
> IIRC I raised this question some time ago somewhere but no one
> remembered to follow that up.  Thanks for fixing it.
> 
> It looks mostly good to me, only one tiny comment below...
> 
> [...]
> 
> > @@ -887,6 +887,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
> >      uint64_t iova = start;
> >      uint64_t iova_next;
> >      int ret = 0;
> > +    uint8_t haw = info->as->iommu_state->haw_bits;
> 
> For now vtd_page_walk_info->aw_bits caches the GAW information and we
> use a single vtd_page_walk_info during one page walk, maybe we can
> also do the same for HAW instead of fetching it every time here from
> info->as->iommu_state->haw_bits?

Thank you, Peter. And yes, using haw_bits in vtd_page_walk_info is better. :)

> 
> Regards,
> 
> -- 
> Peter Xu
> 

B.R.
Yu


* Re: [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the address width.
  2018-11-12  9:25     ` Yu Zhang
@ 2018-11-12  9:36       ` Peter Xu
  2018-11-12 12:38         ` Yu Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2018-11-12  9:36 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Richard Henderson

On Mon, Nov 12, 2018 at 05:25:48PM +0800, Yu Zhang wrote:
> On Mon, Nov 12, 2018 at 04:51:22PM +0800, Peter Xu wrote:
> > On Fri, Nov 09, 2018 at 07:49:47PM +0800, Yu Zhang wrote:
> > > This patch updates vtd_lookup_iotlb() to search cached mappings only
> > > for all page levels supported by address width of current vIOMMU. Also,
> > > to cover 57-bit width, the shift of source id(VTD_IOTLB_SID_SHIFT) and
> > > of page level(VTD_IOTLB_LVL_SHIFT) are enlarged by 9 - the stride of
> > > one paging structure level.
> > > 
> > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > > ---
> > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > Cc: Paolo Bonzini <pbonzini@redhat.com> 
> > > Cc: Richard Henderson <rth@twiddle.net> 
> > > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > > Cc: Peter Xu <peterx@redhat.com>
> > > ---
> > >  hw/i386/intel_iommu.c          | 5 +++--
> > >  hw/i386/intel_iommu_internal.h | 7 ++-----
> > >  2 files changed, 5 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index 9cdf755..ce7e17e 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -254,11 +254,12 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
> > >  static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
> > >                                         hwaddr addr)
> > >  {
> > > -    VTDIOTLBEntry *entry;
> > > +    VTDIOTLBEntry *entry = NULL;
> > >      uint64_t key;
> > >      int level;
> > > +    int max_level = (s->aw_bits - VTD_PAGE_SHIFT_4K) / VTD_SL_LEVEL_BITS;
> > >  
> > > -    for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
> > > +    for (level = VTD_SL_PT_LEVEL; level < max_level; level++) {
> > 
> > My understanding of current IOTLB is that it only caches the last
> > level of mapping, say:
> > 
> >   - level 1: 4K page
> >   - level 2: 2M page
> >   - level 3: 1G page
> > 
> > So we don't check against level=4 even if x-aw-bits=48 is specified.
> > 
> > Here does it mean that we're going to have... 512G iommu huge pages?
> > 
> 
> No. My bad, I misunderstood this routine. And now I believe we do not
> need this patch. :-)

Yeah good to confirm that :-)

Regards,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  2018-11-12  8:36   ` Peter Xu
@ 2018-11-12  9:42     ` Yu Zhang
  2018-11-13  3:37       ` Peter Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2018-11-12  9:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Richard Henderson

On Mon, Nov 12, 2018 at 04:36:34PM +0800, Peter Xu wrote:
> On Fri, Nov 09, 2018 at 07:49:46PM +0800, Yu Zhang wrote:
> > A 5-level paging capable VM may choose to use 57-bit IOVA address width.
> > E.g. guest applications like DPDK prefer to use its VA as IOVA when
> > performing VFIO map/unmap operations, to avoid the burden of managing the
> > IOVA space.
> 
> Since you mentioned about DPDK... I'm just curious that whether have
> you tested the patchset with the 57bit-enabled machines with DPDK VA
> mode running in the guest? That would be something nice to mention in
> the cover letter if you have.
> 

Hah. Maybe I shall not mention DPDK here. 

The story is that we heard the requirement that applications like DPDK would
need 5-level paging on the IOMMU side. And I was convinced after checking the
DPDK code, seeing that it may use VA as IOVA directly. But I did not test this
patch with DPDK.

Instead, I used kvm-unit-test to verify this patch series. And of course, I
also made some modifications to the test case. The patch for the test was also
sent out at https://www.spinics.net/lists/kvm/msg177425.html.

> [...]
> 
> > @@ -3264,11 +3286,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> >          }
> >      }
> >  
> > -    /* Currently only address widths supported are 39 and 48 bits */
> > +    /* Currently address widths supported are 39, 48, and 57 bits */
> >      if ((s->aw_bits != VTD_AW_39BIT) &&
> > -        (s->aw_bits != VTD_AW_48BIT)) {
> > -        error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
> > -                   VTD_AW_39BIT, VTD_AW_48BIT);
> > +        (s->aw_bits != VTD_AW_48BIT) &&
> > +        (s->aw_bits != VTD_AW_57BIT)) {
> > +        error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d",
> > +                   VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT);
> > +        return false;
> > +    }
> > +
> > +    if ((s->aw_bits == VTD_AW_57BIT) &&
> > +        !(host_has_la57() && guest_has_la57())) {
> > +        error_setg(errp, "Do not support 57-bit DMA address, unless both "
> > +                         "host and guest are capable of 5-level paging.\n");
> 
> Is there any context (or pointer to previous discussions would work
> too) on explaining why we don't support some scenarios like
> host_paw=48,guest_paw=48,guest_gaw=57?
> 

Well, the above check is only to make sure both the host and the guest can
use 57-bit linear addresses, which requires 5-level paging. So I believe
we do support scenarios like host_paw=48,guest_paw=48,guest_gaw=57.
The guest_has_la57() check means the guest can use 57-bit linear addresses,
regardless of its physical address width.

> Thanks,
> 
> -- 
> Peter Xu
> 

B.R.
Yu


* Re: [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the address width.
  2018-11-12  9:36       ` Peter Xu
@ 2018-11-12 12:38         ` Yu Zhang
  2018-11-13  5:18           ` Peter Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2018-11-12 12:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin

On Mon, Nov 12, 2018 at 05:36:38PM +0800, Peter Xu wrote:
> On Mon, Nov 12, 2018 at 05:25:48PM +0800, Yu Zhang wrote:
> > On Mon, Nov 12, 2018 at 04:51:22PM +0800, Peter Xu wrote:
> > > On Fri, Nov 09, 2018 at 07:49:47PM +0800, Yu Zhang wrote:
> > > > This patch updates vtd_lookup_iotlb() to search cached mappings only
> > > > for all page levels supported by address width of current vIOMMU. Also,
> > > > to cover 57-bit width, the shift of source id(VTD_IOTLB_SID_SHIFT) and
> > > > of page level(VTD_IOTLB_LVL_SHIFT) are enlarged by 9 - the stride of
> > > > one paging structure level.
> > > > 
> > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > > > ---
> > > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > > Cc: Paolo Bonzini <pbonzini@redhat.com> 
> > > > Cc: Richard Henderson <rth@twiddle.net> 
> > > > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > > > Cc: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  hw/i386/intel_iommu.c          | 5 +++--
> > > >  hw/i386/intel_iommu_internal.h | 7 ++-----
> > > >  2 files changed, 5 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > index 9cdf755..ce7e17e 100644
> > > > --- a/hw/i386/intel_iommu.c
> > > > +++ b/hw/i386/intel_iommu.c
> > > > @@ -254,11 +254,12 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
> > > >  static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
> > > >                                         hwaddr addr)
> > > >  {
> > > > -    VTDIOTLBEntry *entry;
> > > > +    VTDIOTLBEntry *entry = NULL;
> > > >      uint64_t key;
> > > >      int level;
> > > > +    int max_level = (s->aw_bits - VTD_PAGE_SHIFT_4K) / VTD_SL_LEVEL_BITS;
> > > >  
> > > > -    for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
> > > > +    for (level = VTD_SL_PT_LEVEL; level < max_level; level++) {
> > > 
> > > My understanding of current IOTLB is that it only caches the last
> > > level of mapping, say:
> > > 
> > >   - level 1: 4K page
> > >   - level 2: 2M page
> > >   - level 3: 1G page
> > > 
> > > So we don't check against level=4 even if x-aw-bits=48 is specified.
> > > 
> > > Here does it mean that we're going to have... 512G iommu huge pages?
> > > 
> > 
> > No. My bad, I misunderstood this routine. And now I believe we do not
> > need this patch. :-)
> 
> Yeah good to confirm that :-)

Sorry, Peter. I still have a question about this part. I agree we do not need
to do the extra loop - therefore there is no need for the max_level part
introduced in this patch.

But as to the modification of VTD_IOTLB_SID_SHIFT/VTD_IOTLB_LVL_SHIFT, we may
still need it due to the enlarged gfn: to search an IOTLB entry for a 4K
mapping, the gfn itself could be as large as 45 bits.

Besides, vtd_get_iotlb_gfn() currently just shifts by 12 bits for all levels;
is this necessary? I mean, how about we do the shift based on the current
level?

 static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
 {
-    return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
+    uint32_t shift = vtd_slpt_level_shift(level);
+    return (addr & vtd_slpt_level_page_mask(level)) >> shift;
 }
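
For reference, the level-dependent shift would then be (sketch, assuming
vtd_slpt_level_shift() follows the usual 12 + 9 * (level - 1) pattern):

    /* level 1 -> 12 (4K), level 2 -> 21 (2M), level 3 -> 30 (1G) */
    static inline uint32_t example_slpt_level_shift(uint32_t level)
    {
        return VTD_PAGE_SHIFT_4K + (level - 1) * VTD_SL_LEVEL_BITS;
    }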

> 
> Regards,
> 
> -- 
> Peter Xu
> 

B.R.
Yu


* Re: [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  2018-11-12  9:42     ` Yu Zhang
@ 2018-11-13  3:37       ` Peter Xu
  2018-11-13  5:04         ` Peter Xu
  2018-11-13  5:41         ` Yu Zhang
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Xu @ 2018-11-13  3:37 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Richard Henderson

On Mon, Nov 12, 2018 at 05:42:01PM +0800, Yu Zhang wrote:
> On Mon, Nov 12, 2018 at 04:36:34PM +0800, Peter Xu wrote:
> > On Fri, Nov 09, 2018 at 07:49:46PM +0800, Yu Zhang wrote:
> > > A 5-level paging capable VM may choose to use 57-bit IOVA address width.
> > > E.g. guest applications like DPDK prefer to use its VA as IOVA when
> > > performing VFIO map/unmap operations, to avoid the burden of managing the
> > > IOVA space.
> > 
> > Since you mentioned about DPDK... I'm just curious that whether have
> > you tested the patchset with the 57bit-enabled machines with DPDK VA
> > mode running in the guest? That would be something nice to mention in
> > the cover letter if you have.
> > 
> 
> Hah. Maybe I shall not mention DPDK here. 
> 
> The story is that we heard the requirement, saying applications like DPDK
> would need 5-level paging in IOMMU side. And I was convinced after checked
> DPDK code, seeing it may use VA as IOVA directly. But I did not test this
> patch with DPDK.
> 
> Instead, I used kvm-unit-test to verify this patch series. And of course, I
> also did some modification to the test case. Patch for the test also sent out
> at https://www.spinics.net/lists/kvm/msg177425.html.

Yeah that's perfectly fine for me.  So instead maybe you can also
mention the kvm-unit-test in the cover letter if you're going to repost.

> 
> > [...]
> > 
> > > @@ -3264,11 +3286,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> > >          }
> > >      }
> > >  
> > > -    /* Currently only address widths supported are 39 and 48 bits */
> > > +    /* Currently address widths supported are 39, 48, and 57 bits */
> > >      if ((s->aw_bits != VTD_AW_39BIT) &&
> > > -        (s->aw_bits != VTD_AW_48BIT)) {
> > > -        error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
> > > -                   VTD_AW_39BIT, VTD_AW_48BIT);
> > > +        (s->aw_bits != VTD_AW_48BIT) &&
> > > +        (s->aw_bits != VTD_AW_57BIT)) {
> > > +        error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d",
> > > +                   VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT);
> > > +        return false;
> > > +    }
> > > +
> > > +    if ((s->aw_bits == VTD_AW_57BIT) &&
> > > +        !(host_has_la57() && guest_has_la57())) {
> > > +        error_setg(errp, "Do not support 57-bit DMA address, unless both "
> > > +                         "host and guest are capable of 5-level paging.\n");
> > 
> > Is there any context (or pointer to previous discussions would work
> > too) on explaining why we don't support some scenarios like
> > host_paw=48,guest_paw=48,guest_gaw=57?
> > 
> 
> Well, above check is only to make sure both the host and the guest can
> use 57bit linear address, which requires 5-level paging. So I believe
> we do support scenarios like host_paw=48,guest_paw=48,guest_gaw=57.
> The guest_has_la57() means the guest can use 57-bit linear address,
> regardless of its physical address width.

Sorry for my incorrect wording.  I mean, when the host/guest CPU only
supports 4-level LA, would/should we allow the guest IOMMU to support
5-level IOVA?  Asking since I'm wondering whether I can run the series
a bit on my laptop/servers.

While at it, another thing I thought about is making sure the IOMMU
capabilities match between the host and guest IOMMU, which I think
this series has ignored so far.  E.g., when we have assigned devices
in the guest together with 5-level IOVA, we should make sure the host
IOMMU supports 5-level as well before the guest starts, since
otherwise the shadow page synchronization could potentially fail when
the requested IOVA address goes beyond 4-level.  One simple solution
is just to disable device assignment for now when we're with a 57-bit
vIOMMU, but I'm not sure whether that's what you want, especially
since you mentioned the DPDK case (which may use assigned devices).

(sorry to have mentioned the dpdk case again :)

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  2018-11-13  3:37       ` Peter Xu
@ 2018-11-13  5:04         ` Peter Xu
  2018-11-13  5:45           ` Yu Zhang
  2018-11-13  5:41         ` Yu Zhang
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Xu @ 2018-11-13  5:04 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini,
	Richard Henderson

On Tue, Nov 13, 2018 at 11:37:07AM +0800, Peter Xu wrote:
> On Mon, Nov 12, 2018 at 05:42:01PM +0800, Yu Zhang wrote:
> > On Mon, Nov 12, 2018 at 04:36:34PM +0800, Peter Xu wrote:
> > > On Fri, Nov 09, 2018 at 07:49:46PM +0800, Yu Zhang wrote:
> > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width.
> > > > E.g. guest applications like DPDK prefer to use its VA as IOVA when
> > > > performing VFIO map/unmap operations, to avoid the burden of managing the
> > > > IOVA space.
> > > 
> > > Since you mentioned about DPDK... I'm just curious that whether have
> > > you tested the patchset with the 57bit-enabled machines with DPDK VA
> > > mode running in the guest? That would be something nice to mention in
> > > the cover letter if you have.
> > > 
> > 
> > Hah. Maybe I shall not mention DPDK here. 
> > 
> > The story is that we heard the requirement, saying applications like DPDK
> > would need 5-level paging in IOMMU side. And I was convinced after checked
> > DPDK code, seeing it may use VA as IOVA directly. But I did not test this
> > patch with DPDK.
> > 
> > Instead, I used kvm-unit-test to verify this patch series. And of course, I
> > also did some modification to the test case. Patch for the test also sent out
> > at https://www.spinics.net/lists/kvm/msg177425.html.
> 
> Yeah that's perfectly fine for me.  So instead maybe you can also
> mention the kvm-unit-test in the cover letter if you gonna repost.
> 
> > 
> > > [...]
> > > 
> > > > @@ -3264,11 +3286,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> > > >          }
> > > >      }
> > > >  
> > > > -    /* Currently only address widths supported are 39 and 48 bits */
> > > > +    /* Currently address widths supported are 39, 48, and 57 bits */
> > > >      if ((s->aw_bits != VTD_AW_39BIT) &&
> > > > -        (s->aw_bits != VTD_AW_48BIT)) {
> > > > -        error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
> > > > -                   VTD_AW_39BIT, VTD_AW_48BIT);
> > > > +        (s->aw_bits != VTD_AW_48BIT) &&
> > > > +        (s->aw_bits != VTD_AW_57BIT)) {
> > > > +        error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d",
> > > > +                   VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT);
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    if ((s->aw_bits == VTD_AW_57BIT) &&
> > > > +        !(host_has_la57() && guest_has_la57())) {
> > > > +        error_setg(errp, "Do not support 57-bit DMA address, unless both "
> > > > +                         "host and guest are capable of 5-level paging.\n");
> > > 
> > > Is there any context (or pointer to previous discussions would work
> > > too) on explaining why we don't support some scenarios like
> > > host_paw=48,guest_paw=48,guest_gaw=57?
> > > 
> > 
> > Well, above check is only to make sure both the host and the guest can
> > use 57bit linear address, which requires 5-level paging. So I believe
> > we do support scenarios like host_paw=48,guest_paw=48,guest_gaw=57.
> > The guest_has_la57() means the guest can use 57-bit linear address,
> > regardless of its physical address width.
> 
> Sorry for my incorrect wording.  I mean when host/guest CPU only
> support 4-level LA then would/should we allow the guest IOMMU to
> support 5-level IOVA?  Asked since I'm thinking whether I can run the
> series a bit with my laptop/servers.

[...]

> 
> Since at it, another thing I thought about is making sure the IOMMU
> capabilities will match between host and guest IOMMU, which I think
> this series has ignorred so far.  E.g., when we're having assigned
> devices in the guest and with 5-level IOVA, we should make sure the
> host IOMMU supports 5-level as well before the guest starts since
> otherwise the shadow page synchronization could potentially fail when
> the requested IOVA address goes beyond 4-level.  One simple solution
> is just to disable device assignment for now when we're with 57bits
> vIOMMU but I'm not sure whether that's what you want, especially you
> mentioned the DPDK case (who may use assigned devices).

Ok, I totally forgot that we didn't have any kind of check like this
before... So feel free to skip this comment if you want, or it would
be even nicer if you want to fix it as a whole. :)

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the address width.
  2018-11-12 12:38         ` Yu Zhang
@ 2018-11-13  5:18           ` Peter Xu
  2018-11-13  5:53             ` Yu Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2018-11-13  5:18 UTC (permalink / raw)
  To: Yu Zhang
  Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin

On Mon, Nov 12, 2018 at 08:38:30PM +0800, Yu Zhang wrote:
> On Mon, Nov 12, 2018 at 05:36:38PM +0800, Peter Xu wrote:
> > On Mon, Nov 12, 2018 at 05:25:48PM +0800, Yu Zhang wrote:
> > > On Mon, Nov 12, 2018 at 04:51:22PM +0800, Peter Xu wrote:
> > > > On Fri, Nov 09, 2018 at 07:49:47PM +0800, Yu Zhang wrote:
> > > > > This patch updates vtd_lookup_iotlb() to search cached mappings only
> > > > > for all page levels supported by address width of current vIOMMU. Also,
> > > > > to cover 57-bit width, the shift of source id(VTD_IOTLB_SID_SHIFT) and
> > > > > of page level(VTD_IOTLB_LVL_SHIFT) are enlarged by 9 - the stride of
> > > > > one paging structure level.
> > > > > 
> > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > > > > ---
> > > > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> 
> > > > > Cc: Richard Henderson <rth@twiddle.net> 
> > > > > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > > > > Cc: Peter Xu <peterx@redhat.com>
> > > > > ---
> > > > >  hw/i386/intel_iommu.c          | 5 +++--
> > > > >  hw/i386/intel_iommu_internal.h | 7 ++-----
> > > > >  2 files changed, 5 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > > index 9cdf755..ce7e17e 100644
> > > > > --- a/hw/i386/intel_iommu.c
> > > > > +++ b/hw/i386/intel_iommu.c
> > > > > @@ -254,11 +254,12 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
> > > > >  static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
> > > > >                                         hwaddr addr)
> > > > >  {
> > > > > -    VTDIOTLBEntry *entry;
> > > > > +    VTDIOTLBEntry *entry = NULL;
> > > > >      uint64_t key;
> > > > >      int level;
> > > > > +    int max_level = (s->aw_bits - VTD_PAGE_SHIFT_4K) / VTD_SL_LEVEL_BITS;
> > > > >  
> > > > > -    for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
> > > > > +    for (level = VTD_SL_PT_LEVEL; level < max_level; level++) {
> > > > 
> > > > My understanding of current IOTLB is that it only caches the last
> > > > level of mapping, say:
> > > > 
> > > >   - level 1: 4K page
> > > >   - level 2: 2M page
> > > >   - level 3: 1G page
> > > > 
> > > > So we don't check against level=4 even if x-aw-bits=48 is specified.
> > > > 
> > > > Here does it mean that we're going to have... 512G iommu huge pages?
> > > > 
> > > 
> > > No. My bad, I misunderstood this routine. And now I believe we do not
> > > need this patch. :-)
> > 
> > Yeah good to confirm that :-)
> 
> Sorry, Peter. I still have question about this part. I agree we do not need
> to do the extra loop - therefore no need for the max_level part introduced
> in this patch.
> 
> But as to modification of VTD_IOTLB_SID_SHIFT/VTD_IOTLB_LVL_SHIFT, we may
> still need to do it due to the enlarged gfn, to search an IOTLB entry for
> a 4K mapping, the pfn itself could be as large as 45-bit.

Agreed.

> 
> Besides, currently vtd_get_iotlb_gfn() is just shifting 12 bits for all
> different levels, is this necessary? I mean, how about we do the shift
> based on current level?
> 
>  static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
>  {
> -    return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
> +    uint32_t shift = vtd_slpt_level_shift(level);
> +    return (addr & vtd_slpt_level_page_mask(level)) >> shift;
>  }

IMHO we can, but I don't see much gain from it.

If we shift, we still need to reserve the maximum number of bits that
a PFN can hold, which is 45 bits (with 4K pages), so we can't gain
anything from it (no bits saved in the iotlb key).  Instead, we'd need
an extra vtd_slpt_level_shift() call in each vtd_get_iotlb_gfn(),
which even seems a bit slower.
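
To show the bit budget more concretely, below is a rough standalone
sketch of the key packing after the +9 enlargement.  The shift values
are only my assumption of what the patch ends up with (the real
VTD_IOTLB_*_SHIFT definitions live in hw/i386/intel_iommu_internal.h),
but the arithmetic illustrates why shifting the gfn by level cannot
shrink the key:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define VTD_PAGE_SHIFT_4K   12
/* Assumed post-patch values: 36 + 9 and 52 + 9. */
#define VTD_IOTLB_SID_SHIFT 45
#define VTD_IOTLB_LVL_SHIFT 61

static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
                                  uint32_t level)
{
    return gfn | ((uint64_t)source_id << VTD_IOTLB_SID_SHIFT) |
           ((uint64_t)level << VTD_IOTLB_LVL_SHIFT);
}

int main(void)
{
    /* A 4K mapping near the top of a 57-bit IOVA space: the gfn needs
     * 45 bits, the source id 16 bits and the level field takes the
     * remaining 3 bits, i.e. the full 64-bit key with no slack left. */
    uint64_t iova = (1ULL << 57) - 0x1000;
    uint64_t gfn = iova >> VTD_PAGE_SHIFT_4K;

    printf("key = 0x%016" PRIx64 "\n", vtd_get_iotlb_key(gfn, 0xffff, 1));
    return 0;
}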

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  2018-11-13  3:37       ` Peter Xu
  2018-11-13  5:04         ` Peter Xu
@ 2018-11-13  5:41         ` Yu Zhang
  1 sibling, 0 replies; 22+ messages in thread
From: Yu Zhang @ 2018-11-13  5:41 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin

On Tue, Nov 13, 2018 at 11:37:07AM +0800, Peter Xu wrote:
> On Mon, Nov 12, 2018 at 05:42:01PM +0800, Yu Zhang wrote:
> > On Mon, Nov 12, 2018 at 04:36:34PM +0800, Peter Xu wrote:
> > > On Fri, Nov 09, 2018 at 07:49:46PM +0800, Yu Zhang wrote:
> > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width.
> > > > E.g. guest applications like DPDK prefer to use its VA as IOVA when
> > > > performing VFIO map/unmap operations, to avoid the burden of managing the
> > > > IOVA space.
> > > 
> > > Since you mentioned about DPDK... I'm just curious that whether have
> > > you tested the patchset with the 57bit-enabled machines with DPDK VA
> > > mode running in the guest? That would be something nice to mention in
> > > the cover letter if you have.
> > > 
> > 
> > Hah. Maybe I shall not mention DPDK here. 
> > 
> > The story is that we heard the requirement, saying applications like DPDK
> > would need 5-level paging in IOMMU side. And I was convinced after checked
> > DPDK code, seeing it may use VA as IOVA directly. But I did not test this
> > patch with DPDK.
> > 
> > Instead, I used kvm-unit-test to verify this patch series. And of course, I
> > also did some modification to the test case. Patch for the test also sent out
> > at https://www.spinics.net/lists/kvm/msg177425.html.
> 
> Yeah that's perfectly fine for me.  So instead maybe you can also
> mention the kvm-unit-test in the cover letter if you gonna repost.

Got it. Thanks!

> 
> > 
> > > [...]
> > > 
> > > > @@ -3264,11 +3286,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> > > >          }
> > > >      }
> > > >  
> > > > -    /* Currently only address widths supported are 39 and 48 bits */
> > > > +    /* Currently address widths supported are 39, 48, and 57 bits */
> > > >      if ((s->aw_bits != VTD_AW_39BIT) &&
> > > > -        (s->aw_bits != VTD_AW_48BIT)) {
> > > > -        error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
> > > > -                   VTD_AW_39BIT, VTD_AW_48BIT);
> > > > +        (s->aw_bits != VTD_AW_48BIT) &&
> > > > +        (s->aw_bits != VTD_AW_57BIT)) {
> > > > +        error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d",
> > > > +                   VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT);
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    if ((s->aw_bits == VTD_AW_57BIT) &&
> > > > +        !(host_has_la57() && guest_has_la57())) {
> > > > +        error_setg(errp, "Do not support 57-bit DMA address, unless both "
> > > > +                         "host and guest are capable of 5-level paging.\n");
> > > 
> > > Is there any context (or pointer to previous discussions would work
> > > too) on explaining why we don't support some scenarios like
> > > host_paw=48,guest_paw=48,guest_gaw=57?
> > > 
> > 
> > Well, above check is only to make sure both the host and the guest can
> > use 57bit linear address, which requires 5-level paging. So I believe
> > we do support scenarios like host_paw=48,guest_paw=48,guest_gaw=57.
> > The guest_has_la57() means the guest can use 57-bit linear address,
> > regardless of its physical address width.
> 
> Sorry for my incorrect wording.  I mean when host/guest CPU only
> support 4-level LA then would/should we allow the guest IOMMU to
> support 5-level IOVA?  Asked since I'm thinking whether I can run the
> series a bit with my laptop/servers.

Well, by "only support", I guess you mean the hardware capability, instead
of its paging mode. So I do not think hardware will support 5-level IOVA for
platforms without 5-level VA. Therefore a 5-level vIOMMU is disallowed here. :)
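
To make that check concrete: on the host side, LA57 is advertised in
CPUID leaf 7 (sub-leaf 0), ECX bit 16.  A standalone sketch of such a
probe is below; the actual host_has_la57()/guest_has_la57() helpers in
the patch may well be implemented differently (e.g. the guest side
would look at the vCPU's CPUID features inside QEMU), so take this
purely as an illustration:

#include <cpuid.h>
#include <stdbool.h>
#include <stdio.h>

/* LA57 is reported in CPUID.(EAX=07H, ECX=0):ECX[bit 16]. */
#define CPUID_7_0_ECX_LA57 (1U << 16)

static bool host_has_la57(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return !!(ecx & CPUID_7_0_ECX_LA57);
}

int main(void)
{
    printf("host supports LA57 (5-level paging): %s\n",
           host_has_la57() ? "yes" : "no");
    return 0;
}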

> 
> Since at it, another thing I thought about is making sure the IOMMU
> capabilities will match between host and guest IOMMU, which I think
> this series has ignorred so far.  E.g., when we're having assigned
> devices in the guest and with 5-level IOVA, we should make sure the
> host IOMMU supports 5-level as well before the guest starts since
> otherwise the shadow page synchronization could potentially fail when
> the requested IOVA address goes beyond 4-level.  One simple solution
> is just to disable device assignment for now when we're with 57bits
> vIOMMU but I'm not sure whether that's what you want, especially you
> mentioned the DPDK case (who may use assigned devices).
> 

Thanks, Peter. Replied in the follow-up mail. :)

> (sorry to have mentioned the dpdk case again :)
> 
> Regards,
> 
> -- 
> Peter Xu
> 

B.R.
Yu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  2018-11-13  5:04         ` Peter Xu
@ 2018-11-13  5:45           ` Yu Zhang
  2018-11-13  6:12             ` Peter Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Yu Zhang @ 2018-11-13  5:45 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin

On Tue, Nov 13, 2018 at 01:04:51PM +0800, Peter Xu wrote:
> On Tue, Nov 13, 2018 at 11:37:07AM +0800, Peter Xu wrote:
> > On Mon, Nov 12, 2018 at 05:42:01PM +0800, Yu Zhang wrote:
> > > On Mon, Nov 12, 2018 at 04:36:34PM +0800, Peter Xu wrote:
> > > > On Fri, Nov 09, 2018 at 07:49:46PM +0800, Yu Zhang wrote:
> > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width.
> > > > > E.g. guest applications like DPDK prefer to use its VA as IOVA when
> > > > > performing VFIO map/unmap operations, to avoid the burden of managing the
> > > > > IOVA space.
> > > > 
> > > > Since you mentioned about DPDK... I'm just curious that whether have
> > > > you tested the patchset with the 57bit-enabled machines with DPDK VA
> > > > mode running in the guest? That would be something nice to mention in
> > > > the cover letter if you have.
> > > > 
> > > 
> > > Hah. Maybe I shall not mention DPDK here. 
> > > 
> > > The story is that we heard the requirement, saying applications like DPDK
> > > would need 5-level paging in IOMMU side. And I was convinced after checked
> > > DPDK code, seeing it may use VA as IOVA directly. But I did not test this
> > > patch with DPDK.
> > > 
> > > Instead, I used kvm-unit-test to verify this patch series. And of course, I
> > > also did some modification to the test case. Patch for the test also sent out
> > > at https://www.spinics.net/lists/kvm/msg177425.html.
> > 
> > Yeah that's perfectly fine for me.  So instead maybe you can also
> > mention the kvm-unit-test in the cover letter if you gonna repost.
> > 
> > > 
> > > > [...]
> > > > 
> > > > > @@ -3264,11 +3286,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> > > > >          }
> > > > >      }
> > > > >  
> > > > > -    /* Currently only address widths supported are 39 and 48 bits */
> > > > > +    /* Currently address widths supported are 39, 48, and 57 bits */
> > > > >      if ((s->aw_bits != VTD_AW_39BIT) &&
> > > > > -        (s->aw_bits != VTD_AW_48BIT)) {
> > > > > -        error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
> > > > > -                   VTD_AW_39BIT, VTD_AW_48BIT);
> > > > > +        (s->aw_bits != VTD_AW_48BIT) &&
> > > > > +        (s->aw_bits != VTD_AW_57BIT)) {
> > > > > +        error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d",
> > > > > +                   VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT);
> > > > > +        return false;
> > > > > +    }
> > > > > +
> > > > > +    if ((s->aw_bits == VTD_AW_57BIT) &&
> > > > > +        !(host_has_la57() && guest_has_la57())) {
> > > > > +        error_setg(errp, "Do not support 57-bit DMA address, unless both "
> > > > > +                         "host and guest are capable of 5-level paging.\n");
> > > > 
> > > > Is there any context (or pointer to previous discussions would work
> > > > too) on explaining why we don't support some scenarios like
> > > > host_paw=48,guest_paw=48,guest_gaw=57?
> > > > 
> > > 
> > > Well, above check is only to make sure both the host and the guest can
> > > use 57bit linear address, which requires 5-level paging. So I believe
> > > we do support scenarios like host_paw=48,guest_paw=48,guest_gaw=57.
> > > The guest_has_la57() means the guest can use 57-bit linear address,
> > > regardless of its physical address width.
> > 
> > Sorry for my incorrect wording.  I mean when host/guest CPU only
> > support 4-level LA then would/should we allow the guest IOMMU to
> > support 5-level IOVA?  Asked since I'm thinking whether I can run the
> > series a bit with my laptop/servers.
> 
> [...]
> 
> > 
> > Since at it, another thing I thought about is making sure the IOMMU
> > capabilities will match between host and guest IOMMU, which I think
> > this series has ignorred so far.  E.g., when we're having assigned
> > devices in the guest and with 5-level IOVA, we should make sure the
> > host IOMMU supports 5-level as well before the guest starts since
> > otherwise the shadow page synchronization could potentially fail when
> > the requested IOVA address goes beyond 4-level.  One simple solution
> > is just to disable device assignment for now when we're with 57bits
> > vIOMMU but I'm not sure whether that's what you want, especially you
> > mentioned the DPDK case (who may use assigned devices).
> 
> Ok I totally forgot that we don't even support any kind of check like
> this before... So feel free to skip this comment if you want, or it
> would be even nicer if you want to fix it as a whole. :)
> 

Indeed. We have talked about this before. How about we focus on the 5-level
extension for now, and solve the check issue in the future? I still do not
have any clean solutions in mind. BTW, any suggestions on this issue? :)

> Regards,
> 
> -- 
> Peter Xu
> 

B.R.
Yu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the address width.
  2018-11-13  5:18           ` Peter Xu
@ 2018-11-13  5:53             ` Yu Zhang
  0 siblings, 0 replies; 22+ messages in thread
From: Yu Zhang @ 2018-11-13  5:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, Eduardo Habkost,
	Richard Henderson

On Tue, Nov 13, 2018 at 01:18:54PM +0800, Peter Xu wrote:
> On Mon, Nov 12, 2018 at 08:38:30PM +0800, Yu Zhang wrote:
> > On Mon, Nov 12, 2018 at 05:36:38PM +0800, Peter Xu wrote:
> > > On Mon, Nov 12, 2018 at 05:25:48PM +0800, Yu Zhang wrote:
> > > > On Mon, Nov 12, 2018 at 04:51:22PM +0800, Peter Xu wrote:
> > > > > On Fri, Nov 09, 2018 at 07:49:47PM +0800, Yu Zhang wrote:
> > > > > > This patch updates vtd_lookup_iotlb() to search cached mappings only
> > > > > > for all page levels supported by address width of current vIOMMU. Also,
> > > > > > to cover 57-bit width, the shift of source id(VTD_IOTLB_SID_SHIFT) and
> > > > > > of page level(VTD_IOTLB_LVL_SHIFT) are enlarged by 9 - the stride of
> > > > > > one paging structure level.
> > > > > > 
> > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > > > > > ---
> > > > > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> 
> > > > > > Cc: Richard Henderson <rth@twiddle.net> 
> > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > > > > > Cc: Peter Xu <peterx@redhat.com>
> > > > > > ---
> > > > > >  hw/i386/intel_iommu.c          | 5 +++--
> > > > > >  hw/i386/intel_iommu_internal.h | 7 ++-----
> > > > > >  2 files changed, 5 insertions(+), 7 deletions(-)
> > > > > > 
> > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > > > > index 9cdf755..ce7e17e 100644
> > > > > > --- a/hw/i386/intel_iommu.c
> > > > > > +++ b/hw/i386/intel_iommu.c
> > > > > > @@ -254,11 +254,12 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
> > > > > >  static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
> > > > > >                                         hwaddr addr)
> > > > > >  {
> > > > > > -    VTDIOTLBEntry *entry;
> > > > > > +    VTDIOTLBEntry *entry = NULL;
> > > > > >      uint64_t key;
> > > > > >      int level;
> > > > > > +    int max_level = (s->aw_bits - VTD_PAGE_SHIFT_4K) / VTD_SL_LEVEL_BITS;
> > > > > >  
> > > > > > -    for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
> > > > > > +    for (level = VTD_SL_PT_LEVEL; level < max_level; level++) {
> > > > > 
> > > > > My understanding of current IOTLB is that it only caches the last
> > > > > level of mapping, say:
> > > > > 
> > > > >   - level 1: 4K page
> > > > >   - level 2: 2M page
> > > > >   - level 3: 1G page
> > > > > 
> > > > > So we don't check against level=4 even if x-aw-bits=48 is specified.
> > > > > 
> > > > > Here does it mean that we're going to have... 512G iommu huge pages?
> > > > > 
> > > > 
> > > > No. My bad, I misunderstood this routine. And now I believe we do not
> > > > need this patch. :-)
> > > 
> > > Yeah good to confirm that :-)
> > 
> > Sorry, Peter. I still have question about this part. I agree we do not need
> > to do the extra loop - therefore no need for the max_level part introduced
> > in this patch.
> > 
> > But as to modification of VTD_IOTLB_SID_SHIFT/VTD_IOTLB_LVL_SHIFT, we may
> > still need to do it due to the enlarged gfn, to search an IOTLB entry for
> > a 4K mapping, the pfn itself could be as large as 45-bit.
> 
> Agreed.

Thanks~

> 
> > 
> > Besides, currently vtd_get_iotlb_gfn() is just shifting 12 bits for all
> > different levels, is this necessary? I mean, how about we do the shift
> > based on current level?
> > 
> >  static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
> >  {
> > -    return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
> > +    uint32_t shift = vtd_slpt_level_shift(level);
> > +    return (addr & vtd_slpt_level_page_mask(level)) >> shift;
> >  }
> 
> IMHO we can, but I don't see much gain from it.
> 
> If we shift, we still need to use the maximum possible bits that a PFN
> can hold, which is 45bits (when with 4k pages), so we can't gain
> anything out if it (no saved bits on iotlb key).  Instead, we'll need
> to call more vtd_slpt_level_shift() for each vtd_get_iotlb_gfn() which
> even seems a bit slower.

Yep, we still need to use 45 bits for 4K mappings. The only benefit I can
think of is that it's more intuitive, i.e. more aligned with the VT-d spec's
IOTLB tags. But just like you said, I do not see any runtime gain in it.
So I'm fine with dropping this. :)

> 
> Regards,
> 
> -- 
> Peter Xu
> 

B.R.
Yu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  2018-11-13  5:45           ` Yu Zhang
@ 2018-11-13  6:12             ` Peter Xu
  2018-11-13  6:59               ` Yu Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2018-11-13  6:12 UTC (permalink / raw)
  To: Yu Zhang
  Cc: qemu-devel, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S. Tsirkin

On Tue, Nov 13, 2018 at 01:45:44PM +0800, Yu Zhang wrote:

[...]

> > > Since at it, another thing I thought about is making sure the IOMMU
> > > capabilities will match between host and guest IOMMU, which I think
> > > this series has ignorred so far.  E.g., when we're having assigned
> > > devices in the guest and with 5-level IOVA, we should make sure the
> > > host IOMMU supports 5-level as well before the guest starts since
> > > otherwise the shadow page synchronization could potentially fail when
> > > the requested IOVA address goes beyond 4-level.  One simple solution
> > > is just to disable device assignment for now when we're with 57bits
> > > vIOMMU but I'm not sure whether that's what you want, especially you
> > > mentioned the DPDK case (who may use assigned devices).
> > 
> > Ok I totally forgot that we don't even support any kind of check like
> > this before... So feel free to skip this comment if you want, or it
> > would be even nicer if you want to fix it as a whole. :)
> > 
> 
> Indeed. We have talked about this before. How about we focus on the 5-level
> extension for now, and solve the check issue in the future? I still do not
> have any clean solutions in mind. BTW, any suggestions on this issue? :)

I started to remember our discussions; sorry, I should have remembered
them earlier... :)

The only thing in my mind (I think I also suggested the same thing
during that discussion, but I don't trust my memory any more...) is to
use sysfs.  Say:

  1. Scan /sys/class/iommu/dmarN for all the host IOMMUs, read cap of
     each IOMMU from /sys/class/iommu/dmar0/intel-iommu/cap,

  2. For each host iommu, scan /sys/class/iommu/dmarN/devices for all
     the devices under each host IOMMU, then we can know which IOMMU
     owns which device,

  3. For each device assigned to the guest, we look up the previous
     information to know the mgaw for each host device, and raise an
     error and stop QEMU from booting if any of the host devices
     supports fewer levels than the guest vIOMMU (possibly some more
     checks in vtd_iommu_notify_flag_changed)

(we still have an issue with vtd_iommu_notify_flag_changed since it
 only runs at the first enablement of the vIOMMU, so we'll only raise
 the error once guest Linux boots with the vIOMMU on. But that's
 another issue)
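
To make step 1 concrete, a rough user-space sketch of the host-side
scan could look like the one below.  The field layout is my reading of
the VT-d spec (SAGAW in CAP_REG[12:8], with bit 3 meaning 5-level
second-level tables, and MGAW in CAP_REG[21:16]), and the real check
would of course live inside QEMU rather than a standalone tool, so
take it only as an illustration:

#include <dirent.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

#define CAP_SAGAW(cap) (((cap) >> 8) & 0x1f)
#define CAP_MGAW(cap)  ((((cap) >> 16) & 0x3f) + 1)
#define SAGAW_57_BIT   (1 << 3)   /* 5-level second-level page tables */

int main(void)
{
    DIR *dir = opendir("/sys/class/iommu");
    struct dirent *de;

    if (!dir) {
        perror("opendir");
        return 1;
    }
    while ((de = readdir(dir)) != NULL) {
        char path[512];
        FILE *fp;
        uint64_t cap;

        if (strncmp(de->d_name, "dmar", 4) != 0) {
            continue;
        }
        snprintf(path, sizeof(path),
                 "/sys/class/iommu/%s/intel-iommu/cap", de->d_name);
        fp = fopen(path, "r");
        if (!fp || fscanf(fp, "%" SCNx64, &cap) != 1) {
            if (fp) {
                fclose(fp);
            }
            continue;
        }
        fclose(fp);
        printf("%s: mgaw=%u, 5-level slpt %ssupported\n", de->d_name,
               (unsigned)CAP_MGAW(cap),
               (CAP_SAGAW(cap) & SAGAW_57_BIT) ? "" : "not ");
        /* Step 2 would then walk /sys/class/iommu/dmarN/devices/ to map
         * each assigned device back to the IOMMU that owns it. */
    }
    closedir(dir);
    return 0;
}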

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
  2018-11-13  6:12             ` Peter Xu
@ 2018-11-13  6:59               ` Yu Zhang
  0 siblings, 0 replies; 22+ messages in thread
From: Yu Zhang @ 2018-11-13  6:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, Eduardo Habkost,
	Richard Henderson

On Tue, Nov 13, 2018 at 02:12:17PM +0800, Peter Xu wrote:
> On Tue, Nov 13, 2018 at 01:45:44PM +0800, Yu Zhang wrote:
> 
> [...]
> 
> > > > Since at it, another thing I thought about is making sure the IOMMU
> > > > capabilities will match between host and guest IOMMU, which I think
> > > > this series has ignorred so far.  E.g., when we're having assigned
> > > > devices in the guest and with 5-level IOVA, we should make sure the
> > > > host IOMMU supports 5-level as well before the guest starts since
> > > > otherwise the shadow page synchronization could potentially fail when
> > > > the requested IOVA address goes beyond 4-level.  One simple solution
> > > > is just to disable device assignment for now when we're with 57bits
> > > > vIOMMU but I'm not sure whether that's what you want, especially you
> > > > mentioned the DPDK case (who may use assigned devices).
> > > 
> > > Ok I totally forgot that we don't even support any kind of check like
> > > this before... So feel free to skip this comment if you want, or it
> > > would be even nicer if you want to fix it as a whole. :)
> > > 
> > 
> > Indeed. We have talked about this before. How about we focus on the 5-level
> > extension for now, and solve the check issue in the future? I still do not
> > have any clean solutions in mind. BTW, any suggestions on this issue? :)
> 
> I started to remember our discussions, sorry I should remember them
> earlier... :)
> 
> The only thing in my mind (I think I also suggested the same thing
> during that discussion, but I don't trust my memory any more...) is to
> use sysfs.  Say:
> 
>   1. Scan /sys/class/iommu/dmarN for all the host IOMMUs, read cap of
>      each IOMMU from /sys/class/iommu/dmar0/intel-iommu/cap,
> 
>   2. For each host iommu, scan /sys/class/iommu/dmarN/devices for all
>      the devices under each host IOMMU, then we can know which IOMMU
>      owns which device,
> 
>   3. For each assigned device to the guest, we lookup the previous
>      information to know the mgaw for each host device, raise error
>      and stop QEMU from booting if any of the host device has less
>      level supported than the guest vIOMMU (possibly some more checks
>      in vtd_iommu_notify_flag_changed)
> 
> (we still have some issue on vtd_iommu_notify_flag_changed since it's
>  only run until the first enablement of vIOMMU, so we'll only raise
>  the error during guest Linux boots with vIOMMU on. But that's another
>  issue)

Thanks for the explanation, Peter. You do have a better memory than I do. :)


> 
> Regards,
> 
> -- 
> Peter Xu
> 

B.R.
Yu

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-11-13  7:01 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-09 11:49 [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang
2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 1/3] intel-iommu: differentiate host address width from IOVA address width Yu Zhang
2018-11-12  8:15   ` Peter Xu
2018-11-12  9:28     ` Yu Zhang
2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 2/3] intel-iommu: extend VTD emulation to allow 57-bit " Yu Zhang
2018-11-12  8:36   ` Peter Xu
2018-11-12  9:42     ` Yu Zhang
2018-11-13  3:37       ` Peter Xu
2018-11-13  5:04         ` Peter Xu
2018-11-13  5:45           ` Yu Zhang
2018-11-13  6:12             ` Peter Xu
2018-11-13  6:59               ` Yu Zhang
2018-11-13  5:41         ` Yu Zhang
2018-11-09 11:49 ` [Qemu-devel] [PATCH v1 3/3] intel-iommu: search iotlb for levels supported by the " Yu Zhang
2018-11-12  8:51   ` Peter Xu
2018-11-12  9:25     ` Yu Zhang
2018-11-12  9:36       ` Peter Xu
2018-11-12 12:38         ` Yu Zhang
2018-11-13  5:18           ` Peter Xu
2018-11-13  5:53             ` Yu Zhang
2018-11-09 22:32 ` [Qemu-devel] [PATCH v1 0/3] intel-iommu: add support for 5-level virtual IOMMU no-reply
2018-11-12  8:53 ` Peter Xu
