Xen-Devel Archive on lore.kernel.org
* [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support
@ 2019-06-13 13:14 Jan Beulich
  2019-06-13 13:22 ` [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register Jan Beulich
                   ` (9 more replies)
  0 siblings, 10 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:14 UTC
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

Despite the title, this is all AMD IOMMU side work; all x86 side
adjustments have already been carried out.

If in doubt, the series is assumed to go on top of

AMD/IOMMU: initialize IRQ tasklet only once [1]
AMD/IOMMU: revert "amd/iommu: assign iommu devices to Xen" [2]
AMD/IOMMU: don't "add" IOMMUs [3]

1: AMD/IOMMU: use bit field for extended feature register
2: AMD/IOMMU: use bit field for control register
3: AMD/IOMMU: use bit field for IRTE
4: AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
5: AMD/IOMMU: split amd_iommu_init_one()
6: AMD/IOMMU: allow enabling with IRQ not yet set up
7: AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
8: AMD/IOMMU: enable x2APIC mode when available
9: AMD/IOMMU: correct IRTE updating

Jan

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg02441.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2019-06/msg00095.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2019-06/msg00200.html



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
@ 2019-06-13 13:22 ` Jan Beulich
  2019-06-17 19:07   ` Woods, Brian
  2019-06-17 20:23   ` Andrew Cooper
  2019-06-13 13:22 ` [Xen-devel] [PATCH 2/9] AMD/IOMMU: use bit field for control register Jan Beulich
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:22 UTC
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

This also fixes several shift values which were wrongly specified in hex
rather than decimal (e.g. IOMMU_EXT_FEATURE_HATS_SHIFT was 0x10, i.e. 16,
while its mask 0x00000C00 covers bits 11:10).
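
For illustration only, here is a minimal sketch of the hex vs dec
mismatch. The two defines are copied from the lines this patch removes;
the demo_* helpers are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from the defines removed by this patch. */
#define IOMMU_EXT_FEATURE_HATS_SHIFT 0x10        /* hex, i.e. 16 */
#define IOMMU_EXT_FEATURE_HATS_MASK  0x00000C00  /* bits 11:10 */

/* Hypothetical helper: position of the lowest set bit of a non-zero mask. */
static unsigned int demo_lowest_set_bit(uint32_t mask)
{
    unsigned int pos = 0;

    assert(mask != 0);

    while ( !(mask & 1) )
    {
        mask >>= 1;
        ++pos;
    }

    return pos;
}

/* Hypothetical helper: a shift is consistent if it points at the mask's LSB. */
static bool demo_shift_matches_mask(unsigned int shift, uint32_t mask)
{
    return shift == demo_lowest_set_bit(mask);
}
```

demo_shift_matches_mask(IOMMU_EXT_FEATURE_HATS_SHIFT,
IOMMU_EXT_FEATURE_HATS_MASK) is false, while passing decimal 10 as the
shift makes it true.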

Take the opportunity to add further fields.
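
As a rough sketch of the raw/bit-field union idea, and of how a
per-field mask can tell single-bit from multi-bit fields, consider the
cut-down example below. The demo_* names and the four-field layout are
made up for illustration and are not the real register layout. It
assumes the x86 GCC/Clang convention of allocating bit fields from the
least significant bit upwards.

```c
#include <stdint.h>

/* Hypothetical, cut-down layout; every one of the 64 bits is named. */
union demo_features {
    uint64_t raw;
    struct {
        unsigned int pref_sup:1;  /* bit 0 */
        unsigned int ppr_sup:1;   /* bit 1 */
        unsigned int hats:2;      /* bits 3:2 */
        unsigned int rsvd0:28;
        unsigned int rsvd1:32;
    } flds;
};

/* All-ones in one field, zero in all others, viewed through .raw. */
#define DEMO_MASK(fld) (((union demo_features){ .flds.fld = ~0 }).raw)

/* A mask with more than one bit set denotes a multi-bit field. */
#define DEMO_IS_MULTI_BIT(fld) ((DEMO_MASK(fld) & (DEMO_MASK(fld) - 1)) != 0)
```

With this, DEMO_IS_MULTI_BIT(hats) is non-zero while
DEMO_IS_MULTI_BIT(ppr_sup) is zero, so one macro can choose between
printing a field's value and printing just its name.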

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -60,43 +60,72 @@ static int __init get_iommu_capabilities
 
 void __init get_iommu_features(struct amd_iommu *iommu)
 {
-    u32 low, high;
-    int i = 0 ;
-    static const char *__initdata feature_str[] = {
-        "- Prefetch Pages Command", 
-        "- Peripheral Page Service Request", 
-        "- X2APIC Supported", 
-        "- NX bit Supported", 
-        "- Guest Translation", 
-        "- Reserved bit [5]",
-        "- Invalidate All Command", 
-        "- Guest APIC supported", 
-        "- Hardware Error Registers", 
-        "- Performance Counters", 
-        NULL
-    };
-
     ASSERT( iommu->mmio_base );
 
     if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
     {
-        iommu->features = 0;
+        iommu->features.raw = 0;
         return;
     }
 
-    low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
-    high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
-
-    iommu->features = ((u64)high << 32) | low;
+    iommu->features.raw =
+        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
 
     printk("AMD-Vi: IOMMU Extended Features:\n");
 
-    while ( feature_str[i] )
+#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
+#define FEAT(fld, str) do { \
+    if ( MASK(fld) & (MASK(fld) - 1) ) \
+        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
+    else if ( iommu->features.raw & MASK(fld) ) \
+        printk( "- " str "\n"); \
+} while ( false )
+
+    FEAT(pref_sup,           "Prefetch Pages Command");
+    FEAT(ppr_sup,            "Peripheral Page Service Request");
+    FEAT(xt_sup,             "x2APIC");
+    FEAT(nx_sup,             "NX bit");
+    FEAT(gappi_sup,          "Guest APIC Physical Processor Interrupt");
+    FEAT(ia_sup,             "Invalidate All Command");
+    FEAT(ga_sup,             "Guest APIC");
+    FEAT(he_sup,             "Hardware Error Registers");
+    FEAT(pc_sup,             "Performance Counters");
+    FEAT(hats,               "Host Address Translation Size");
+
+    if ( iommu->features.flds.gt_sup )
     {
-        if ( amd_iommu_has_feature(iommu, i) )
-            printk( " %s\n", feature_str[i]);
-        i++;
+        FEAT(gats,           "Guest Address Translation Size");
+        FEAT(glx_sup,        "Guest CR3 Root Table Level");
+        FEAT(pas_max,        "Maximum PASID");
     }
+
+    FEAT(smif_sup,           "SMI Filter Register");
+    FEAT(smif_rc,            "SMI Filter Register Count");
+    FEAT(gam_sup,            "Guest Virtual APIC Modes");
+    FEAT(dual_ppr_log_sup,   "Dual PPR Log");
+    FEAT(dual_event_log_sup, "Dual Event Log");
+    FEAT(sat_sup,            "Secure ATS");
+    FEAT(us_sup,             "User / Supervisor Page Protection");
+    FEAT(dev_tbl_seg_sup,    "Device Table Segmentation");
+    FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
+    FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
+    FEAT(marc_sup,           "Memory Access Routing and Control");
+    FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
+    FEAT(perf_opt_sup ,      "Performance Optimization");
+    FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
+    FEAT(gio_sup,            "Guest I/O Protection");
+    FEAT(ha_sup,             "Host Access");
+    FEAT(eph_sup,            "Enhanced PPR Handling");
+    FEAT(attr_fw_sup,        "Attribute Forward");
+    FEAT(hd_sup,             "Host Dirty");
+    FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
+    FEAT(viommu_sup,         "Virtualized IOMMU");
+    FEAT(vm_guard_io_sup,    "VMGuard I/O Support");
+    FEAT(vm_table_size,      "VM Table Size");
+    FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
+
+#undef FEAT
+#undef MASK
 }
 
 int __init amd_iommu_detect_one_acpi(
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -638,7 +638,7 @@ static uint64_t iommu_mmio_read64(struct
         val = reg_to_u64(iommu->reg_status);
         break;
     case IOMMU_EXT_FEATURE_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ext_feature);
+        val = iommu->reg_ext_feature.raw;
         break;
 
     default:
@@ -802,39 +802,26 @@ int guest_iommu_set_base(struct domain *
 /* Initialize mmio read only bits */
 static void guest_iommu_reg_init(struct guest_iommu *iommu)
 {
-    uint32_t lower, upper;
+    union amd_iommu_ext_features ef = {
+        /* Support prefetch */
+        .flds.pref_sup = 1,
+        /* Support PPR log */
+        .flds.ppr_sup = 1,
+        /* Support guest translation */
+        .flds.gt_sup = 1,
+        /* Support invalidate all command */
+        .flds.ia_sup = 1,
+        /* Host translation size has 6 levels */
+        .flds.hats = HOST_ADDRESS_SIZE_6_LEVEL,
+        /* Guest translation size has 6 levels */
+        .flds.gats = GUEST_ADDRESS_SIZE_6_LEVEL,
+        /* Single level gCR3 */
+        .flds.glx_sup = GUEST_CR3_1_LEVEL,
+        /* 9 bit PASID */
+        .flds.pas_max = PASMAX_9_bit,
+    };
 
-    lower = upper = 0;
-    /* Support prefetch */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PREFSUP_SHIFT);
-    /* Support PPR log */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PPRSUP_SHIFT);
-    /* Support guest translation */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_GTSUP_SHIFT);
-    /* Support invalidate all command */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_IASUP_SHIFT);
-
-    /* Host translation size has 6 levels */
-    set_field_in_reg_u32(HOST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_HATS_MASK,
-                         IOMMU_EXT_FEATURE_HATS_SHIFT,
-                         &lower);
-    /* Guest translation size has 6 levels */
-    set_field_in_reg_u32(GUEST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GATS_MASK,
-                         IOMMU_EXT_FEATURE_GATS_SHIFT,
-                         &lower);
-    /* Single level gCR3 */
-    set_field_in_reg_u32(GUEST_CR3_1_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GLXSUP_MASK,
-                         IOMMU_EXT_FEATURE_GLXSUP_SHIFT, &lower);
-    /* 9 bit PASID */
-    set_field_in_reg_u32(PASMAX_9_bit, upper,
-                         IOMMU_EXT_FEATURE_PASMAX_MASK,
-                         IOMMU_EXT_FEATURE_PASMAX_SHIFT, &upper);
-
-    iommu->reg_ext_feature.lo = lower;
-    iommu->reg_ext_feature.hi = upper;
+    iommu->reg_ext_feature = ef;
 }
 
 static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -883,7 +883,7 @@ static void enable_iommu(struct amd_iomm
     register_iommu_event_log_in_mmio_space(iommu);
     register_iommu_exclusion_range(iommu);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         register_iommu_ppr_log_in_mmio_space(iommu);
 
     desc = irq_to_desc(iommu->msi.irq);
@@ -897,15 +897,15 @@ static void enable_iommu(struct amd_iomm
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);
 
     set_iommu_translation_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( iommu->features.flds.ia_sup )
         amd_iommu_flush_all_caches(iommu);
 
     iommu->enabled = 1;
@@ -928,10 +928,10 @@ static void disable_iommu(struct amd_iom
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_DISABLED);
 
     set_iommu_translation_control(iommu, IOMMU_CONTROL_DISABLED);
@@ -1027,7 +1027,7 @@ static int __init amd_iommu_init_one(str
 
     get_iommu_features(iommu);
 
-    if ( iommu->features )
+    if ( iommu->features.raw )
         iommuv2_enabled = 1;
 
     if ( allocate_cmd_buffer(iommu) == NULL )
@@ -1036,9 +1036,8 @@ static int __init amd_iommu_init_one(str
     if ( allocate_event_log(iommu) == NULL )
         goto error_out;
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
-        if ( allocate_ppr_log(iommu) == NULL )
-            goto error_out;
+    if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
+        goto error_out;
 
     if ( !set_iommu_interrupt_handler(iommu) )
         goto error_out;
@@ -1389,7 +1388,7 @@ void amd_iommu_resume(void)
     }
 
     /* flush all cache entries after iommu re-enabled */
-    if ( !amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( !iommu->features.flds.ia_sup )
     {
         invalidate_all_devices();
         invalidate_all_domain_pages();
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -83,7 +83,7 @@ struct amd_iommu {
     iommu_cap_t cap;
 
     u8 ht_flags;
-    u64 features;
+    union amd_iommu_ext_features features;
 
     void *mmio_base;
     unsigned long mmio_base_phys;
@@ -174,7 +174,7 @@ struct guest_iommu {
     /* MMIO regs */
     struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
     struct mmio_reg         reg_status;            /* MMIO offset 2020h */
-    struct mmio_reg         reg_ext_feature;       /* MMIO offset 0030h */
+    union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
 
     /* guest interrupt settings */
     struct guest_iommu_msi  msi;
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -346,26 +346,57 @@ struct amd_iommu_dte {
 #define IOMMU_EXCLUSION_LIMIT_HIGH_MASK		0xFFFFFFFF
 #define IOMMU_EXCLUSION_LIMIT_HIGH_SHIFT	0
 
-/* Extended Feature Register*/
+/* Extended Feature Register */
 #define IOMMU_EXT_FEATURE_MMIO_OFFSET                   0x30
-#define IOMMU_EXT_FEATURE_PREFSUP_SHIFT                 0x0
-#define IOMMU_EXT_FEATURE_PPRSUP_SHIFT                  0x1
-#define IOMMU_EXT_FEATURE_XTSUP_SHIFT                   0x2
-#define IOMMU_EXT_FEATURE_NXSUP_SHIFT                   0x3
-#define IOMMU_EXT_FEATURE_GTSUP_SHIFT                   0x4
-#define IOMMU_EXT_FEATURE_IASUP_SHIFT                   0x6
-#define IOMMU_EXT_FEATURE_GASUP_SHIFT                   0x7
-#define IOMMU_EXT_FEATURE_HESUP_SHIFT                   0x8
-#define IOMMU_EXT_FEATURE_PCSUP_SHIFT                   0x9
-#define IOMMU_EXT_FEATURE_HATS_SHIFT                    0x10
-#define IOMMU_EXT_FEATURE_HATS_MASK                     0x00000C00
-#define IOMMU_EXT_FEATURE_GATS_SHIFT                    0x12
-#define IOMMU_EXT_FEATURE_GATS_MASK                     0x00003000
-#define IOMMU_EXT_FEATURE_GLXSUP_SHIFT                  0x14
-#define IOMMU_EXT_FEATURE_GLXSUP_MASK                   0x0000C000
 
-#define IOMMU_EXT_FEATURE_PASMAX_SHIFT                  0x0
-#define IOMMU_EXT_FEATURE_PASMAX_MASK                   0x0000001F
+union amd_iommu_ext_features {
+    uint64_t raw;
+    struct {
+        unsigned int pref_sup:1;
+        unsigned int ppr_sup:1;
+        unsigned int xt_sup:1;
+        unsigned int nx_sup:1;
+        unsigned int gt_sup:1;
+        unsigned int gappi_sup:1;
+        unsigned int ia_sup:1;
+        unsigned int ga_sup:1;
+        unsigned int he_sup:1;
+        unsigned int pc_sup:1;
+        unsigned int hats:2;
+        unsigned int gats:2;
+        unsigned int glx_sup:2;
+        unsigned int smif_sup:2;
+        unsigned int smif_rc:3;
+        unsigned int gam_sup:3;
+        unsigned int dual_ppr_log_sup:2;
+        unsigned int :2;
+        unsigned int dual_event_log_sup:2;
+        unsigned int sat_sup:1;
+        unsigned int :1;
+        unsigned int pas_max:5;
+        unsigned int us_sup:1;
+        unsigned int dev_tbl_seg_sup:2;
+        unsigned int ppr_early_of_sup:1;
+        unsigned int ppr_auto_rsp_sup:1;
+        unsigned int marc_sup:2;
+        unsigned int blk_stop_mrk_sup:1;
+        unsigned int perf_opt_sup:1;
+        unsigned int msi_cap_mmio_sup:1;
+        unsigned int :1;
+        unsigned int gio_sup:1;
+        unsigned int ha_sup:1;
+        unsigned int eph_sup:1;
+        unsigned int attr_fw_sup:1;
+        unsigned int hd_sup:1;
+        unsigned int :1;
+        unsigned int inv_iotlb_type_sup:1;
+        unsigned int viommu_sup:1;
+        unsigned int vm_guard_io_sup:1;
+        unsigned int vm_table_size:4;
+        unsigned int ga_update_dis_sup:1;
+        unsigned int :2;
+    } flds;
+};
 
 /* Status Register*/
 #define IOMMU_STATUS_MMIO_OFFSET		0x2020
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -219,13 +219,6 @@ static inline int iommu_has_cap(struct a
     return !!(iommu->cap.header & (1u << bit));
 }
 
-static inline int amd_iommu_has_feature(struct amd_iommu *iommu, uint32_t bit)
-{
-    if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
-        return 0;
-    return !!(iommu->features & (1U << bit));
-}
-
 /* access tail or head pointer of ring buffer */
 static inline uint32_t iommu_get_rb_pointer(uint32_t reg)
 {





* [Xen-devel] [PATCH 2/9] AMD/IOMMU: use bit field for control register
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
  2019-06-13 13:22 ` [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register Jan Beulich
@ 2019-06-13 13:22 ` Jan Beulich
  2019-06-18  9:54   ` Andrew Cooper
  2019-06-13 13:23 ` [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE Jan Beulich
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:22 UTC
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

Also introduce a field in struct amd_iommu caching the most recently
written control register value. All writes should now happen exclusively
from that cached value, such that it is guaranteed to be up to date.
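
The caching scheme can be sketched as follows. This is an illustration
only: the demo_* types are hypothetical, the register has just two
fields here, and a plain pointer stands in for the MMIO mapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-field control register; every bit is named. */
union demo_control {
    uint64_t raw;
    struct {
        unsigned int iommu_en:1;
        unsigned int event_log_en:1;
        unsigned int rsvd0:30;
        unsigned int rsvd1:32;
    };
};

struct demo_iommu {
    volatile uint64_t *ctrl_reg;  /* stands in for the MMIO register */
    union demo_control ctrl;      /* cache of the last value written */
};

/* Update one field in the cache, then write the whole cached value out. */
static void demo_set_translation(struct demo_iommu *iommu, bool enable)
{
    iommu->ctrl.iommu_en = enable;
    *iommu->ctrl_reg = iommu->ctrl.raw;
}

static void demo_set_event_log(struct demo_iommu *iommu, bool enable)
{
    iommu->ctrl.event_log_en = enable;
    *iommu->ctrl_reg = iommu->ctrl.raw;
}
```

Since no path reads the register back before modifying it, the cache and
the register can only stay in sync if every write goes through helpers
of this kind.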

Take the opportunity to add further fields. Also convert a few boolean
function parameters to bool, such that use of !! can be avoided.
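
To illustrate why bool helps when storing into single-bit fields, here
is a small sketch with hypothetical demo_* names (not code from the
patch).

```c
#include <stdbool.h>

struct demo_flags {
    unsigned int en:1;
};

/* With an int parameter, only bit 0 of the argument reaches the field. */
static unsigned int demo_store_via_int(int enable)
{
    struct demo_flags f = { 0 };

    f.en = enable;  /* e.g. 0x100 is reduced modulo 2, giving 0 */

    return f.en;
}

/* With a bool parameter, any non-zero argument becomes 1 at the call. */
static unsigned int demo_store_via_bool(bool enable)
{
    struct demo_flags f = { 0 };

    f.en = enable;

    return f.en;
}
```

demo_store_via_int(0x100) yields 0 unless the caller writes !!0x100,
whereas demo_store_via_bool(0x100) yields 1 without any !!.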

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -317,7 +317,7 @@ static int do_invalidate_iotlb_pages(str
 
 static int do_completion_wait(struct domain *d, cmd_entry_t *cmd)
 {
-    bool_t com_wait_int_en, com_wait_int, i, s;
+    bool com_wait_int, i, s;
     struct guest_iommu *iommu;
     unsigned long gfn;
     p2m_type_t p2mt;
@@ -354,12 +354,10 @@ static int do_completion_wait(struct dom
         unmap_domain_page(vaddr);
     }
 
-    com_wait_int_en = iommu_get_bit(iommu->reg_ctrl.lo,
-                                    IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
     com_wait_int = iommu_get_bit(iommu->reg_status.lo,
                                  IOMMU_STATUS_COMP_WAIT_INT_SHIFT);
 
-    if ( com_wait_int_en && com_wait_int )
+    if ( iommu->reg_ctrl.com_wait_int_en && com_wait_int )
         guest_iommu_deliver_msi(d);
 
     return 0;
@@ -521,40 +519,17 @@ static void guest_iommu_process_command(
     return;
 }
 
-static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t newctrl)
+static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t val)
 {
-    bool_t cmd_en, event_en, iommu_en, ppr_en, ppr_log_en;
-    bool_t cmd_en_old, event_en_old, iommu_en_old;
-    bool_t cmd_run;
-
-    iommu_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-    iommu_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    cmd_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                               IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_run = iommu_get_bit(iommu->reg_status.lo,
-                            IOMMU_STATUS_CMD_BUFFER_RUN_SHIFT);
-    event_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    event_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-
-    ppr_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-    ppr_log_en = iommu_get_bit(newctrl,
-                               IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
+    union amd_iommu_control newctrl = { .raw = val };
 
-    if ( iommu_en )
+    if ( newctrl.iommu_en )
     {
         guest_iommu_enable(iommu);
         guest_iommu_enable_dev_table(iommu);
     }
 
-    if ( iommu_en && cmd_en )
+    if ( newctrl.iommu_en && newctrl.cmd_buf_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->cmd_buffer,
                                        sizeof(cmd_entry_t));
@@ -562,7 +537,7 @@ static int guest_iommu_write_ctrl(struct
         tasklet_schedule(&iommu->cmd_buffer_tasklet);
     }
 
-    if ( iommu_en && event_en )
+    if ( newctrl.iommu_en && newctrl.event_log_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->event_log,
                                        sizeof(event_entry_t));
@@ -570,7 +545,7 @@ static int guest_iommu_write_ctrl(struct
         guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_OVERFLOW_SHIFT);
     }
 
-    if ( iommu_en && ppr_en && ppr_log_en )
+    if ( newctrl.iommu_en && newctrl.ppr_en && newctrl.ppr_log_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->ppr_log,
                                        sizeof(ppr_entry_t));
@@ -578,19 +553,21 @@ static int guest_iommu_write_ctrl(struct
         guest_iommu_clear_status(iommu, IOMMU_STATUS_PPR_LOG_OVERFLOW_SHIFT);
     }
 
-    if ( iommu_en && cmd_en_old && !cmd_en )
+    if ( newctrl.iommu_en && iommu->reg_ctrl.cmd_buf_en &&
+         !newctrl.cmd_buf_en )
     {
         /* Disable iommu command processing */
         tasklet_kill(&iommu->cmd_buffer_tasklet);
     }
 
-    if ( event_en_old && !event_en )
+    if ( iommu->reg_ctrl.event_log_en && !newctrl.event_log_en )
         guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_LOG_RUN_SHIFT);
 
-    if ( iommu_en_old && !iommu_en )
+    if ( iommu->reg_ctrl.iommu_en && !newctrl.iommu_en )
         guest_iommu_disable(iommu);
 
-    u64_to_reg(&iommu->reg_ctrl, newctrl);
+    iommu->reg_ctrl = newctrl;
+
     return 0;
 }
 
@@ -632,7 +609,7 @@ static uint64_t iommu_mmio_read64(struct
         val = reg_to_u64(iommu->ppr_log.reg_tail);
         break;
     case IOMMU_CONTROL_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ctrl);
+        val = iommu->reg_ctrl.raw;
         break;
     case IOMMU_STATUS_MMIO_OFFSET:
         val = reg_to_u64(iommu->reg_status);
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -41,7 +41,7 @@ LIST_HEAD_READ_MOSTLY(amd_iommu_head);
 struct table_struct device_table;
 bool_t iommuv2_enabled;
 
-static int iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
+static bool iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
 {
     return iommu->ht_flags & mask;
 }
@@ -69,31 +69,18 @@ static void __init unmap_iommu_mmio_regi
 
 static void set_iommu_ht_flags(struct amd_iommu *iommu)
 {
-    u32 entry;
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
     /* Setup HT flags */
     if ( iommu_has_cap(iommu, PCI_CAP_HT_TUNNEL_SHIFT) )
-        iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE) ?
-            iommu_set_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT) :
-            iommu_clear_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT);
+        iommu->ctrl.ht_tun_en = iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE);
+
+    iommu->ctrl.pass_pw     = iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW);
+    iommu->ctrl.res_pass_pw = iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW);
+    iommu->ctrl.isoc        = iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC);
 
     /* Force coherent */
-    iommu_set_bit(&entry, IOMMU_CONTROL_COHERENT_SHIFT);
+    iommu->ctrl.coherent = 1;
 
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void register_iommu_dev_table_in_mmio_space(struct amd_iommu *iommu)
@@ -205,55 +192,37 @@ static void register_iommu_ppr_log_in_mm
 
 
 static void set_iommu_translation_control(struct amd_iommu *iommu,
-                                                 int enable)
+                                          bool enable)
 {
-    u32 entry;
+    iommu->ctrl.iommu_en = enable;
 
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void set_iommu_guest_translation_control(struct amd_iommu *iommu,
-                                                int enable)
+                                                bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.gt_en = enable;
 
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 
     if ( enable )
         AMD_IOMMU_DEBUG("Guest Translation Enabled.\n");
 }
 
 static void set_iommu_command_buffer_control(struct amd_iommu *iommu,
-                                                    int enable)
+                                             bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
     }
-    else
-        iommu_clear_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
 
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.cmd_buf_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void register_iommu_exclusion_range(struct amd_iommu *iommu)
@@ -295,57 +264,38 @@ static void register_iommu_exclusion_ran
 }
 
 static void set_iommu_event_log_control(struct amd_iommu *iommu,
-            int enable)
+                                        bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
     }
 
-    iommu_clear_bit(&entry, IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
+    iommu->ctrl.event_int_en = enable;
+    iommu->ctrl.event_log_en = enable;
+    iommu->ctrl.com_wait_int_en = 0;
 
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void set_iommu_ppr_log_control(struct amd_iommu *iommu,
-                                      int enable)
+                                      bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
     }
 
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.ppr_en = enable;
+    iommu->ctrl.ppr_int_en = enable;
+    iommu->ctrl.ppr_log_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+
     if ( enable )
         AMD_IOMMU_DEBUG("PPR Log Enabled.\n");
 }
@@ -398,7 +348,7 @@ static int iommu_read_log(struct amd_iom
 /* reset event log or ppr log when overflow */
 static void iommu_reset_log(struct amd_iommu *iommu,
                             struct ring_buffer *log,
-                            void (*ctrl_func)(struct amd_iommu *iommu, int))
+                            void (*ctrl_func)(struct amd_iommu *iommu, bool))
 {
     u32 entry;
     int log_run, run_bit;
@@ -615,11 +565,11 @@ static void iommu_check_event_log(struct
         iommu_reset_log(iommu, &iommu->event_log, set_iommu_event_log_control);
     else
     {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_EVENT_LOG_INT_MASK) )
+        if ( !iommu->ctrl.event_int_en )
         {
-            entry |= IOMMU_CONTROL_EVENT_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.event_int_en = 1;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
             /*
              * Re-schedule the tasklet to handle eventual log entries added
              * between reading the log above and re-enabling the interrupt.
@@ -704,11 +654,11 @@ static void iommu_check_ppr_log(struct a
         iommu_reset_log(iommu, &iommu->ppr_log, set_iommu_ppr_log_control);
     else
     {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_PPR_LOG_INT_MASK) )
+        if ( !iommu->ctrl.ppr_int_en )
         {
-            entry |= IOMMU_CONTROL_PPR_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.ppr_int_en = 1;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
             /*
              * Re-schedule the tasklet to handle eventual log entries added
              * between reading the log above and re-enabling the interrupt.
@@ -754,7 +704,6 @@ static void do_amd_iommu_irq(unsigned lo
 static void iommu_interrupt_handler(int irq, void *dev_id,
                                     struct cpu_user_regs *regs)
 {
-    u32 entry;
     unsigned long flags;
     struct amd_iommu *iommu = dev_id;
 
@@ -764,10 +713,9 @@ static void iommu_interrupt_handler(int
      * Silence interrupts from both event and PPR by clearing the
      * enable logging bits in the control register
      */
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.event_int_en = 0;
+    iommu->ctrl.ppr_int_en = 0;
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 
     spin_unlock_irqrestore(&iommu->lock, flags);
 
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -88,6 +88,8 @@ struct amd_iommu {
     void *mmio_base;
     unsigned long mmio_base_phys;
 
+    union amd_iommu_control ctrl;
+
     struct table_struct dev_table;
     struct ring_buffer cmd_buffer;
     struct ring_buffer event_log;
@@ -172,7 +174,7 @@ struct guest_iommu {
     uint64_t                mmio_base;             /* MMIO base address */
 
     /* MMIO regs */
-    struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
+    union amd_iommu_control reg_ctrl;              /* MMIO offset 0018h */
     struct mmio_reg         reg_status;            /* MMIO offset 2020h */
     union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -295,38 +295,55 @@ struct amd_iommu_dte {
 
 /* Control Register */
 #define IOMMU_CONTROL_MMIO_OFFSET			0x18
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_MASK		0x00000001
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT		0
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_MASK	0x00000002
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT	1
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_MASK		0x00000004
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT		2
-#define IOMMU_CONTROL_EVENT_LOG_INT_MASK		0x00000008
-#define IOMMU_CONTROL_EVENT_LOG_INT_SHIFT		3
-#define IOMMU_CONTROL_COMP_WAIT_INT_MASK		0x00000010
-#define IOMMU_CONTROL_COMP_WAIT_INT_SHIFT		4
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_MASK		0x000000E0
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_SHIFT	5
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_MASK		0x00000100
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT		8
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_MASK	0x00000200
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT	9
-#define IOMMU_CONTROL_COHERENT_MASK			0x00000400
-#define IOMMU_CONTROL_COHERENT_SHIFT			10
-#define IOMMU_CONTROL_ISOCHRONOUS_MASK			0x00000800
-#define IOMMU_CONTROL_ISOCHRONOUS_SHIFT			11
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_MASK	0x00001000
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT	12
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_MASK		0x00002000
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT		13
-#define IOMMU_CONTROL_PPR_LOG_INT_MASK			0x00004000
-#define IOMMU_CONTROL_PPR_LOG_INT_SHIFT			14
-#define IOMMU_CONTROL_PPR_ENABLE_MASK			0x00008000
-#define IOMMU_CONTROL_PPR_ENABLE_SHIFT			15
-#define IOMMU_CONTROL_GT_ENABLE_MASK			0x00010000
-#define IOMMU_CONTROL_GT_ENABLE_SHIFT			16
-#define IOMMU_CONTROL_RESTART_MASK			0x80000000
-#define IOMMU_CONTROL_RESTART_SHIFT			31
+
+union amd_iommu_control {
+    uint64_t raw;
+    struct {
+        unsigned int iommu_en:1;
+        unsigned int ht_tun_en:1;
+        unsigned int event_log_en:1;
+        unsigned int event_int_en:1;
+        unsigned int com_wait_int_en:1;
+        unsigned int inv_timeout:3;
+        unsigned int pass_pw:1;
+        unsigned int res_pass_pw:1;
+        unsigned int coherent:1;
+        unsigned int isoc:1;
+        unsigned int cmd_buf_en:1;
+        unsigned int ppr_log_en:1;
+        unsigned int ppr_int_en:1;
+        unsigned int ppr_en:1;
+        unsigned int gt_en:1;
+        unsigned int ga_en:1;
+        unsigned int crw:4;
+        unsigned int smif_en:1;
+        unsigned int slf_wb_dis:1;
+        unsigned int smif_log_en:1;
+        unsigned int gam_en:3;
+        unsigned int ga_log_en:1;
+        unsigned int ga_int_en:1;
+        unsigned int dual_ppr_log_en:2;
+        unsigned int dual_event_log_en:2;
+        unsigned int dev_tbl_seg_en:3;
+        unsigned int priv_abrt_en:2;
+        unsigned int ppr_auto_rsp_en:1;
+        unsigned int marc_en:1;
+        unsigned int blk_stop_mrk_en:1;
+        unsigned int ppr_auto_rsp_aon:1;
+        unsigned int :2;
+        unsigned int eph_en:1;
+        unsigned int had_update:2;
+        unsigned int gd_update_dis:1;
+        unsigned int :1;
+        unsigned int xt_en:1;
+        unsigned int int_cap_xt_en:1;
+        unsigned int vcmd_en:1;
+        unsigned int viommu_en:1;
+        unsigned int ga_update_dis:1;
+        unsigned int gappi_en:1;
+        unsigned int :8;
+    };
+};
 
 /* Exclusion Register */
 #define IOMMU_EXCLUSION_BASE_LOW_OFFSET		0x20





* [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
  2019-06-13 13:22 ` [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register Jan Beulich
  2019-06-13 13:22 ` [Xen-devel] [PATCH 2/9] AMD/IOMMU: use bit field for control register Jan Beulich
@ 2019-06-13 13:23 ` Jan Beulich
  2019-06-18 10:37   ` Andrew Cooper
  2019-06-18 11:31   ` Andrew Cooper
  2019-06-13 13:23 ` [Xen-devel] [PATCH 4/9] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format Jan Beulich
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

At the same time restrict its scope to just the single source file
actually using it, and abstract accesses by introducing a union of
pointers. (A union of the actual table entries is deliberately not used:
once the 128-bit form gets added, it would allow pointer arithmetic /
array accesses to be wrongly performed on derived types.)
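
To illustrate the point about pointer arithmetic, here is a minimal
standalone sketch, not code from this patch. The entry32 / entry128
types and the entry_at() / entry_offset() helpers are hypothetical
names; the sketch also assumes all object pointers share one
representation, as they do on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a 32-bit and a 128-bit table entry. */
struct entry32  { uint32_t bits; };
struct entry128 { uint64_t lo, hi; };

/* Union of pointers: every access has to name the format it means. */
union entry_ptr {
    void *raw;
    struct entry32 *e32;
    struct entry128 *e128;
};

/* Locate entry 'idx' in 'table'; 'wide' selects the 128-bit format. */
static union entry_ptr entry_at(void *table, unsigned int idx, int wide)
{
    union entry_ptr p = { .raw = table };

    assert(table != NULL);

    if ( wide )
        p.e128 += idx;  /* stride: sizeof(struct entry128) == 16 */
    else
        p.e32 += idx;   /* stride: sizeof(struct entry32) == 4 */

    return p;
}

/* Byte distance of entry 'idx' from the start of the table. */
static ptrdiff_t entry_offset(void *table, unsigned int idx, int wide)
{
    return (char *)entry_at(table, idx, wide).raw - (char *)table;
}
```

With a union of the entries themselves, "table + idx" would always
advance by the size of the largest member, which is wrong for a table
of the narrower format.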

Also move away from updating the entries piecemeal: Construct a full new
entry, and write it out.
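
A minimal standalone sketch of the "construct, then write" pattern,
assuming GCC/Clang bit-field layout on a little-endian target. The
field order is copied from struct irte_basic below; write_entry() and
entry_bits() are hypothetical helper names. Note that C itself does not
promise the final assignment is a single store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field order copied from struct irte_basic in this patch. */
struct irte_basic {
    unsigned int remap_en:1;
    unsigned int sup_io_pf:1;
    unsigned int int_type:3;
    unsigned int rq_eoi:1;
    unsigned int dm:1;
    unsigned int guest_mode:1; /* MBZ */
    unsigned int dest:8;
    unsigned int vector:8;
    unsigned int :8;
};

/* Build the complete new entry locally, then store it with one assignment. */
static void write_entry(struct irte_basic *slot, unsigned int vector,
                        unsigned int int_type, unsigned int dest_mode,
                        unsigned int dest)
{
    struct irte_basic e = {
        .remap_en = 1,
        .int_type = int_type,
        .dm = dest_mode,
        .dest = dest,
        .vector = vector,
    };

    assert(slot != NULL);
    *slot = e;
}

/* Return the entry's 32 bits as an integer, for inspection only. */
static uint32_t entry_bits(const struct irte_basic *slot)
{
    uint32_t v;

    memcpy(&v, slot, sizeof(v));
    return v;
}
```

Fields not named in the initializer (sup_io_pf, rq_eoi, guest_mode)
come out as zero, so stale bits of the previous entry cannot survive.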

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
It would have been nice to use write_atomic() or ACCESS_ONCE() for the
actual writes, but both cast the value to a scalar one, which doesn't
suit us here (and I also didn't want to make the compound type a union
with a raw member just for this).

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -23,6 +23,23 @@
 #include <asm/io_apic.h>
 #include <xen/keyhandler.h>
 
+struct irte_basic {
+    unsigned int remap_en:1;
+    unsigned int sup_io_pf:1;
+    unsigned int int_type:3;
+    unsigned int rq_eoi:1;
+    unsigned int dm:1;
+    unsigned int guest_mode:1; /* MBZ */
+    unsigned int dest:8;
+    unsigned int vector:8;
+    unsigned int :8;
+};
+
+union irte_ptr {
+    void *raw;
+    struct irte_basic *basic;
+};
+
 #define INTREMAP_TABLE_ORDER    1
 #define INTREMAP_LENGTH 0xB
 #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
@@ -101,47 +118,44 @@ static unsigned int alloc_intremap_entry
     return slot;
 }
 
-static u32 *get_intremap_entry(int seg, int bdf, int offset)
+static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
+                                         unsigned int offset)
 {
-    u32 *table = get_ivrs_mappings(seg)[bdf].intremap_table;
+    union irte_ptr table = {
+        .raw = get_ivrs_mappings(seg)[bdf].intremap_table
+    };
+
+    ASSERT(table.raw && (offset < INTREMAP_ENTRIES));
 
-    ASSERT( (table != NULL) && (offset < INTREMAP_ENTRIES) );
+    table.basic += offset;
 
-    return table + offset;
+    return table;
 }
 
-static void free_intremap_entry(int seg, int bdf, int offset)
+static void free_intremap_entry(unsigned int seg, unsigned int bdf, unsigned int offset)
 {
-    u32 *entry = get_intremap_entry(seg, bdf, offset);
+    union irte_ptr entry = get_intremap_entry(seg, bdf, offset);
+
+    *entry.basic = (struct irte_basic){};
 
-    memset(entry, 0, sizeof(u32));
     __clear_bit(offset, get_ivrs_mappings(seg)[bdf].intremap_inuse);
 }
 
-static void update_intremap_entry(u32* entry, u8 vector, u8 int_type,
-    u8 dest_mode, u8 dest)
-{
-    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
-                            INT_REMAP_ENTRY_REMAPEN_MASK,
-                            INT_REMAP_ENTRY_REMAPEN_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_SUPIOPF_MASK,
-                            INT_REMAP_ENTRY_SUPIOPF_SHIFT, entry);
-    set_field_in_reg_u32(int_type, *entry,
-                            INT_REMAP_ENTRY_INTTYPE_MASK,
-                            INT_REMAP_ENTRY_INTTYPE_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_REQEOI_MASK,
-                            INT_REMAP_ENTRY_REQEOI_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest_mode, *entry,
-                            INT_REMAP_ENTRY_DM_MASK,
-                            INT_REMAP_ENTRY_DM_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest, *entry,
-                            INT_REMAP_ENTRY_DEST_MAST,
-                            INT_REMAP_ENTRY_DEST_SHIFT, entry);
-    set_field_in_reg_u32((u32)vector, *entry,
-                            INT_REMAP_ENTRY_VECTOR_MASK,
-                            INT_REMAP_ENTRY_VECTOR_SHIFT, entry);
+static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
+                                  unsigned int int_type,
+                                  unsigned int dest_mode, unsigned int dest)
+{
+    struct irte_basic basic = {
+        .remap_en = 1,
+        .sup_io_pf = 0,
+        .int_type = int_type,
+        .rq_eoi = 0,
+        .dm = dest_mode,
+        .dest = dest,
+        .vector = vector,
+    };
+
+    *entry.basic = basic;
 }
 
 static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -163,7 +177,7 @@ static int update_intremap_entry_from_io
     u16 *index)
 {
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     u8 delivery_mode, dest, vector, dest_mode;
     int req_id;
     spinlock_t *lock;
@@ -201,12 +215,8 @@ static int update_intremap_entry_from_io
          * so need to recover vector and delivery mode from IRTE.
          */
         ASSERT(get_rte_index(rte) == offset);
-        vector = get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
-        delivery_mode = get_field_from_reg_u32(*entry,
-                                               INT_REMAP_ENTRY_INTTYPE_MASK,
-                                               INT_REMAP_ENTRY_INTTYPE_SHIFT);
+        vector = entry.basic->vector;
+        delivery_mode = entry.basic->int_type;
     }
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
 
@@ -228,7 +238,7 @@ int __init amd_iommu_setup_ioapic_remapp
 {
     struct IO_APIC_route_entry rte;
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     int apic, pin;
     u8 delivery_mode, dest, vector, dest_mode;
     u16 seg, bdf, req_id;
@@ -407,16 +417,12 @@ unsigned int amd_iommu_read_ioapic_from_
         u16 bdf = ioapic_sbdf[idx].bdf;
         u16 seg = ioapic_sbdf[idx].seg;
         u16 req_id = get_intremap_requestor_id(seg, bdf);
-        const u32 *entry = get_intremap_entry(seg, req_id, offset);
+        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
 
         ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
         val &= ~(INTREMAP_ENTRIES - 1);
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_INTTYPE_MASK,
-                                      INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_VECTOR_MASK,
-                                      INT_REMAP_ENTRY_VECTOR_SHIFT);
+        val |= MASK_INSR(entry.basic->int_type, IO_APIC_REDIR_DELIV_MODE_MASK);
+        val |= MASK_INSR(entry.basic->vector, IO_APIC_REDIR_VECTOR_MASK);
     }
 
     return val;
@@ -427,7 +433,7 @@ static int update_intremap_entry_from_ms
     int *remap_index, const struct msi_msg *msg, u32 *data)
 {
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     u16 req_id, alias_id;
     u8 delivery_mode, dest, vector, dest_mode;
     spinlock_t *lock;
@@ -581,7 +587,7 @@ void amd_iommu_read_msi_from_ire(
     const struct pci_dev *pdev = msi_desc->dev;
     u16 bdf = pdev ? PCI_BDF2(pdev->bus, pdev->devfn) : hpet_sbdf.bdf;
     u16 seg = pdev ? pdev->seg : hpet_sbdf.seg;
-    const u32 *entry;
+    union irte_ptr entry;
 
     if ( IS_ERR_OR_NULL(_find_iommu_for_device(seg, bdf)) )
         return;
@@ -597,12 +603,8 @@ void amd_iommu_read_msi_from_ire(
     }
 
     msg->data &= ~(INTREMAP_ENTRIES - 1);
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_INTTYPE_MASK,
-                                        INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
+    msg->data |= MASK_INSR(entry.basic->int_type, MSI_DATA_DELIVERY_MODE_MASK);
+    msg->data |= MASK_INSR(entry.basic->vector, MSI_DATA_VECTOR_MASK);
 }
 
 int __init amd_iommu_free_intremap_table(
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -468,22 +468,6 @@ struct amd_iommu_pte {
 #define IOMMU_CONTROL_DISABLED	0
 #define IOMMU_CONTROL_ENABLED	1
 
-/* interrupt remapping table */
-#define INT_REMAP_ENTRY_REMAPEN_MASK    0x00000001
-#define INT_REMAP_ENTRY_REMAPEN_SHIFT   0
-#define INT_REMAP_ENTRY_SUPIOPF_MASK    0x00000002
-#define INT_REMAP_ENTRY_SUPIOPF_SHIFT   1
-#define INT_REMAP_ENTRY_INTTYPE_MASK    0x0000001C
-#define INT_REMAP_ENTRY_INTTYPE_SHIFT   2
-#define INT_REMAP_ENTRY_REQEOI_MASK     0x00000020
-#define INT_REMAP_ENTRY_REQEOI_SHIFT    5
-#define INT_REMAP_ENTRY_DM_MASK         0x00000040
-#define INT_REMAP_ENTRY_DM_SHIFT        6
-#define INT_REMAP_ENTRY_DEST_MAST       0x0000FF00
-#define INT_REMAP_ENTRY_DEST_SHIFT      8
-#define INT_REMAP_ENTRY_VECTOR_MASK     0x00FF0000
-#define INT_REMAP_ENTRY_VECTOR_SHIFT    16
-
 #define INV_IOMMU_ALL_PAGES_ADDRESS      ((1ULL << 63) - 1)
 
 #define IOMMU_RING_BUFFER_PTR_MASK                  0x0007FFF0





* [Xen-devel] [PATCH 4/9] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
                   ` (2 preceding siblings ...)
  2019-06-13 13:23 ` [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE Jan Beulich
@ 2019-06-13 13:23 ` Jan Beulich
  2019-06-18 11:57   ` Andrew Cooper
  2019-06-13 13:24 ` [Xen-devel] [PATCH 5/9] AMD/IOMMU: split amd_iommu_init_one() Jan Beulich
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

This is in preparation for actually enabling x2APIC mode, which requires
this wider IRTE format to be used.
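
x2APIC destination IDs are 32 bits wide, which the 8-bit destination
field of the basic format cannot hold. The wide format below stores the
ID in two pieces (dest_lo:24 and dest_hi:8). A standalone sketch of the
split and the recombination follows; struct dest_parts, split_dest()
and join_dest() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two destination pieces of a 128-bit IRTE. */
struct dest_parts {
    uint32_t lo;  /* low 24 bits, as in dest_lo:24 */
    uint32_t hi;  /* high 8 bits, as in dest_hi:8 */
};

/* Split a 32-bit destination ID into its 24-bit and 8-bit pieces. */
static struct dest_parts split_dest(uint32_t dest)
{
    struct dest_parts p = { .lo = dest & 0xffffff, .hi = dest >> 24 };

    return p;
}

/* Recombine the pieces, mirroring what get_full_dest() does on an IRTE. */
static uint32_t join_dest(struct dest_parts p)
{
    assert(p.lo <= 0xffffff && p.hi <= 0xff);
    return p.lo | (p.hi << 24);
}
```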

A specific remark regarding the first hunk changing
amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
tables when creating new one"). Other code introduced by that change has
meanwhile disappeared or further changed, and I wonder if - rather than
adding an x2apic_enabled check to the conditional - the bypass couldn't
be deleted altogether. For now the goal is to affect the non-x2APIC
paths as little as possible.

Take the liberty of using the new "fresh" flag to suppress an unneeded
flush in update_intremap_entry_from_ioapic().
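
The shape of that logic, as a standalone sketch rather than the actual
code: struct table, NR_SLOTS and update_slot() are hypothetical, and
the sketch assumes that a slot which was unused until now has nothing
stale to invalidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 8u  /* hypothetical table size */

struct table {
    bool inuse[NR_SLOTS];
    unsigned int value[NR_SLOTS];
    unsigned int flushes;  /* counts how often a flush was issued */
};

/* Update slot *index, allocating one first if *index is out of range. */
static int update_slot(struct table *t, unsigned int *index,
                       unsigned int value)
{
    bool fresh = false;

    assert(t != NULL && index != NULL);

    if ( *index >= NR_SLOTS )
    {
        unsigned int i;

        for ( i = 0; i < NR_SLOTS && t->inuse[i]; i++ )
            ;
        if ( i == NR_SLOTS )
            return -1;  /* table full */

        t->inuse[i] = true;
        *index = i;
        fresh = true;
    }

    t->value[*index] = value;

    /* Only an entry that existed before needs invalidating. */
    if ( !fresh )
        t->flushes++;

    return 0;
}
```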

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Note that AMD's doc says Lowest Priority ("Arbitrated" by their naming)
mode is unavailable in x2APIC mode, but they've confirmed this to be a
mistake on their part.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -35,12 +35,34 @@ struct irte_basic {
     unsigned int :8;
 };
 
+struct irte_full {
+    unsigned int remap_en:1;
+    unsigned int sup_io_pf:1;
+    unsigned int int_type:3;
+    unsigned int rq_eoi:1;
+    unsigned int dm:1;
+    unsigned int guest_mode:1; /* MBZ */
+    unsigned int dest_lo:24;
+    unsigned int :32;
+    unsigned int vector:8;
+    unsigned int :24;
+    unsigned int :24;
+    unsigned int dest_hi:8;
+};
+
+static enum {
+    irte_basic,
+    irte_full,
+    irte_unset,
+} irte_mode __read_mostly = irte_unset;
+
 union irte_ptr {
     void *raw;
     struct irte_basic *basic;
+    struct irte_full *full;
 };
 
-#define INTREMAP_TABLE_ORDER    1
+#define INTREMAP_TABLE_ORDER (irte_mode == irte_basic ? 1 : 3)
 #define INTREMAP_LENGTH 0xB
 #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
 
@@ -127,7 +149,19 @@ static union irte_ptr get_intremap_entry
 
     ASSERT(table.raw && (offset < INTREMAP_ENTRIES));
 
-    table.basic += offset;
+    switch ( irte_mode )
+    {
+    case irte_basic:
+        table.basic += offset;
+        break;
+
+    case irte_full:
+        table.full += offset;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
 
     return table;
 }
@@ -136,7 +170,21 @@ static void free_intremap_entry(unsigned
 {
     union irte_ptr entry = get_intremap_entry(seg, bdf, offset);
 
-    *entry.basic = (struct irte_basic){};
+    switch ( irte_mode )
+    {
+    case irte_basic:
+        *entry.basic = (struct irte_basic){};
+        break;
+
+    case irte_full:
+        entry.full->remap_en = 0;
+        wmb();
+        *entry.full = (struct irte_full){};
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
 
     __clear_bit(offset, get_ivrs_mappings(seg)[bdf].intremap_inuse);
 }
@@ -154,8 +202,38 @@ static void update_intremap_entry(union
         .dest = dest,
         .vector = vector,
     };
+    struct irte_full full = {
+        .remap_en = 1,
+        .sup_io_pf = 0,
+        .int_type = int_type,
+        .rq_eoi = 0,
+        .dm = dest_mode,
+        .dest_lo = dest,
+        .dest_hi = dest >> 24,
+        .vector = vector,
+    };
+
+    switch ( irte_mode )
+    {
+        __uint128_t ret;
+        union {
+            __uint128_t raw;
+            struct irte_full full;
+        } old;
+
+    case irte_basic:
+        *entry.basic = basic;
+        break;
+
+    case irte_full:
+        old.full = *entry.full;
+        ret = cmpxchg16b(entry.full, &old, &full);
+        ASSERT(ret == old.raw);
+        break;
 
-    *entry.basic = basic;
+    default:
+        ASSERT_UNREACHABLE();
+    }
 }
 
 static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -169,6 +247,11 @@ static inline void set_rte_index(struct
     rte->delivery_mode = offset >> 8;
 }
 
+static inline unsigned int get_full_dest(const struct irte_full *entry)
+{
+    return entry->dest_lo | (entry->dest_hi << 24);
+}
+
 static int update_intremap_entry_from_ioapic(
     int bdf,
     struct amd_iommu *iommu,
@@ -178,10 +261,11 @@ static int update_intremap_entry_from_io
 {
     unsigned long flags;
     union irte_ptr entry;
-    u8 delivery_mode, dest, vector, dest_mode;
+    unsigned int delivery_mode, dest, vector, dest_mode;
     int req_id;
     spinlock_t *lock;
     unsigned int offset;
+    bool fresh = false;
 
     req_id = get_intremap_requestor_id(iommu->seg, bdf);
     lock = get_intremap_lock(iommu->seg, req_id);
@@ -189,7 +273,7 @@ static int update_intremap_entry_from_io
     delivery_mode = rte->delivery_mode;
     vector = rte->vector;
     dest_mode = rte->dest_mode;
-    dest = rte->dest.logical.logical_dest;
+    dest = x2apic_enabled ? rte->dest.dest32 : rte->dest.logical.logical_dest;
 
     spin_lock_irqsave(lock, flags);
 
@@ -204,25 +288,40 @@ static int update_intremap_entry_from_io
             return -ENOSPC;
         }
         *index = offset;
-        lo_update = 1;
+        fresh = true;
     }
 
     entry = get_intremap_entry(iommu->seg, req_id, offset);
-    if ( !lo_update )
+    if ( fresh )
+        /* nothing */;
+    else if ( !lo_update )
     {
         /*
          * Low half of incoming RTE is already in remapped format,
          * so need to recover vector and delivery mode from IRTE.
          */
         ASSERT(get_rte_index(rte) == offset);
-        vector = entry.basic->vector;
+        if ( irte_mode == irte_basic )
+            vector = entry.basic->vector;
+        else
+            vector = entry.full->vector;
+        /* The IntType fields match for both formats. */
         delivery_mode = entry.basic->int_type;
     }
+    else if ( x2apic_enabled )
+    {
+        /*
+         * High half of incoming RTE was read from the I/O APIC and hence may
+         * not hold the full destination, so need to recover full destination
+         * from IRTE.
+         */
+        dest = get_full_dest(entry.full);
+    }
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
 
     spin_unlock_irqrestore(lock, flags);
 
-    if ( iommu->enabled )
+    if ( iommu->enabled && !fresh )
     {
         spin_lock_irqsave(&iommu->lock, flags);
         amd_iommu_flush_intremap(iommu, req_id);
@@ -246,6 +345,19 @@ int __init amd_iommu_setup_ioapic_remapp
     spinlock_t *lock;
     unsigned int offset;
 
+    for_each_amd_iommu ( iommu )
+    {
+        if ( irte_mode != irte_unset )
+        {
+            if ( iommu->ctrl.ga_en == (irte_mode == irte_basic) )
+                return -ENXIO;
+        }
+        else if ( iommu->ctrl.ga_en )
+            irte_mode = irte_full;
+        else
+            irte_mode = irte_basic;
+    }
+
     /* Read ioapic entries and update interrupt remapping table accordingly */
     for ( apic = 0; apic < nr_ioapics; apic++ )
     {
@@ -280,6 +392,18 @@ int __init amd_iommu_setup_ioapic_remapp
             dest_mode = rte.dest_mode;
             dest = rte.dest.logical.logical_dest;
 
+            if ( iommu->ctrl.xt_en )
+            {
+                /*
+                 * In x2APIC mode we have no way of discovering the high 24
+                 * bits of the destination of an already enabled interrupt.
+                 * We come here earlier than for xAPIC mode, so no interrupts
+                 * should have been set up before.
+                 */
+                AMD_IOMMU_DEBUG("Unmasked IO-APIC#%u entry %u in x2APIC mode\n",
+                                IO_APIC_ID(apic), pin);
+            }
+
             spin_lock_irqsave(lock, flags);
             offset = alloc_intremap_entry(seg, req_id, 1);
             BUG_ON(offset >= INTREMAP_ENTRIES);
@@ -314,7 +438,8 @@ void amd_iommu_ioapic_update_ire(
     struct IO_APIC_route_entry new_rte = { 0 };
     unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
     unsigned int pin = (reg - 0x10) / 2;
-    int saved_mask, seg, bdf, rc;
+    int seg, bdf, rc;
+    bool saved_mask, fresh = false;
     struct amd_iommu *iommu;
     unsigned int idx;
 
@@ -356,12 +481,22 @@ void amd_iommu_ioapic_update_ire(
         *(((u32 *)&new_rte) + 1) = value;
     }
 
-    if ( new_rte.mask &&
-         ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
+    if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
     {
         ASSERT(saved_mask);
-        __io_apic_write(apic, reg, value);
-        return;
+
+        /*
+         * There's nowhere except the IRTE to store a full 32-bit destination,
+         * so we may not bypass entry allocation and updating of the low RTE
+         * half in the (usual) case of the high RTE half getting written first.
+         */
+        if ( new_rte.mask && !x2apic_enabled )
+        {
+            __io_apic_write(apic, reg, value);
+            return;
+        }
+
+        fresh = true;
     }
 
     /* mask the interrupt while we change the intremap table */
@@ -390,8 +525,12 @@ void amd_iommu_ioapic_update_ire(
     if ( reg == rte_lo )
         return;
 
-    /* unmask the interrupt after we have updated the intremap table */
-    if ( !saved_mask )
+    /*
+     * Unmask the interrupt after we have updated the intremap table. Also
+     * write the low half if a fresh entry was allocated for a high half
+     * update in x2APIC mode.
+     */
+    if ( !saved_mask || (x2apic_enabled && fresh) )
     {
         old_rte.mask = saved_mask;
         __io_apic_write(apic, rte_lo, *((u32 *)&old_rte));
@@ -405,25 +544,35 @@ unsigned int amd_iommu_read_ioapic_from_
     unsigned int offset;
     unsigned int val = __io_apic_read(apic, reg);
     unsigned int pin = (reg - 0x10) / 2;
+    uint16_t seg, req_id;
+    union irte_ptr entry;
 
     idx = ioapic_id_to_index(IO_APIC_ID(apic));
     if ( idx == MAX_IO_APICS )
         return -EINVAL;
 
     offset = ioapic_sbdf[idx].pin_2_idx[pin];
+    if ( offset >= INTREMAP_ENTRIES )
+        return val;
 
-    if ( !(reg & 1) && offset < INTREMAP_ENTRIES )
+    seg = ioapic_sbdf[idx].seg;
+    req_id = get_intremap_requestor_id(seg, ioapic_sbdf[idx].bdf);
+    entry = get_intremap_entry(seg, req_id, offset);
+
+    if ( !(reg & 1) )
     {
-        u16 bdf = ioapic_sbdf[idx].bdf;
-        u16 seg = ioapic_sbdf[idx].seg;
-        u16 req_id = get_intremap_requestor_id(seg, bdf);
-        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
 
         ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
         val &= ~(INTREMAP_ENTRIES - 1);
+        /* The IntType fields match for both formats. */
         val |= MASK_INSR(entry.basic->int_type, IO_APIC_REDIR_DELIV_MODE_MASK);
-        val |= MASK_INSR(entry.basic->vector, IO_APIC_REDIR_VECTOR_MASK);
+        if ( irte_mode == irte_basic )
+            val |= MASK_INSR(entry.basic->vector, IO_APIC_REDIR_VECTOR_MASK);
+        else
+            val |= MASK_INSR(entry.full->vector, IO_APIC_REDIR_VECTOR_MASK);
     }
+    else if ( x2apic_enabled )
+        val = get_full_dest(entry.full);
 
     return val;
 }
@@ -435,9 +584,9 @@ static int update_intremap_entry_from_ms
     unsigned long flags;
     union irte_ptr entry;
     u16 req_id, alias_id;
-    u8 delivery_mode, dest, vector, dest_mode;
+    uint8_t delivery_mode, vector, dest_mode;
     spinlock_t *lock;
-    unsigned int offset, i;
+    unsigned int dest, offset, i;
 
     req_id = get_dma_requestor_id(iommu->seg, bdf);
     alias_id = get_intremap_requestor_id(iommu->seg, bdf);
@@ -458,7 +607,12 @@ static int update_intremap_entry_from_ms
     dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
     delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
-    dest = (msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff;
+
+    if ( x2apic_enabled )
+        dest = msg->dest32;
+    else
+        dest = MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK);
+
     offset = *remap_index;
     if ( offset >= INTREMAP_ENTRIES )
     {
@@ -603,8 +757,18 @@ void amd_iommu_read_msi_from_ire(
     }
 
     msg->data &= ~(INTREMAP_ENTRIES - 1);
+    /* The IntType fields match for both formats. */
     msg->data |= MASK_INSR(entry.basic->int_type, MSI_DATA_DELIVERY_MODE_MASK);
-    msg->data |= MASK_INSR(entry.basic->vector, MSI_DATA_VECTOR_MASK);
+    if ( irte_mode == irte_basic )
+    {
+        msg->data |= MASK_INSR(entry.basic->vector, MSI_DATA_VECTOR_MASK);
+        msg->dest32 = entry.basic->dest;
+    }
+    else
+    {
+        msg->data |= MASK_INSR(entry.full->vector, MSI_DATA_VECTOR_MASK);
+        msg->dest32 = get_full_dest(entry.full);
+    }
 }
 
 int __init amd_iommu_free_intremap_table(
@@ -667,18 +831,33 @@ int __init amd_setup_hpet_msi(struct msi
     return rc;
 }
 
-static void dump_intremap_table(const u32 *table)
+static void dump_intremap_table(const void *table)
 {
-    u32 count;
+    unsigned int count;
+    union {
+        const void *raw;
+        const uint32_t *basic;
+        const uint64_t (*full)[2];
+    } tbl = { .raw = table };
 
-    if ( !table )
+    if ( !table || irte_mode == irte_unset )
         return;
 
     for ( count = 0; count < INTREMAP_ENTRIES; count++ )
     {
-        if ( !table[count] )
-            continue;
-        printk("    IRTE[%03x] %08x\n", count, table[count]);
+        if ( irte_mode == irte_basic )
+        {
+            if ( !tbl.basic[count] )
+                continue;
+            printk("    IRTE[%03x] %08x\n", count, tbl.basic[count]);
+        }
+        else
+        {
+            if ( !tbl.full[count][0] && !tbl.full[count][1] )
+                continue;
+            printk("    IRTE[%03x] %016lx_%016lx\n",
+                   count, tbl.full[count][1], tbl.full[count][0]);
+        }
     }
 }
 





* [Xen-devel] [PATCH 5/9] AMD/IOMMU: split amd_iommu_init_one()
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
                   ` (3 preceding siblings ...)
  2019-06-13 13:23 ` [Xen-devel] [PATCH 4/9] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format Jan Beulich
@ 2019-06-13 13:24 ` Jan Beulich
  2019-06-18 12:17   ` Andrew Cooper
  2019-06-13 13:25 ` [Xen-devel] [PATCH 6/9] AMD/IOMMU: allow enabling with IRQ not yet set up Jan Beulich
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

Mapping the MMIO space and obtaining feature information need to happen
slightly earlier, such that for x2APIC support we can set XTEn prior to
calling amd_iommu_update_ivrs_mapping_acpi() and
amd_iommu_setup_ioapic_remapping().
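
The resulting two-phase structure, as a standalone sketch with
hypothetical names (struct unit, prepare_one(), init_one() and
init_all() are not Xen functions): everything that later mode
decisions depend on moves into the first phase, which runs for all
units before the second phase starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-unit state; none of these names come from Xen. */
struct unit {
    bool mapped;         /* register space mapped */
    bool features_read;  /* feature information obtained */
    bool mode_set;       /* early mode decision applied */
    bool ready;          /* buffers allocated */
};

/* Phase 1: only what later mode decisions depend on. */
static int prepare_one(struct unit *u)
{
    u->mapped = true;
    u->features_read = true;
    return 0;
}

/* Phase 2: the rest; relies on phase 1 having run. */
static int init_one(struct unit *u)
{
    assert(u->mapped && u->features_read);
    u->ready = true;
    return 0;
}

static int init_all(struct unit *units, size_t n)
{
    size_t i;
    int rc;

    for ( i = 0; i < n; i++ )
        if ( (rc = prepare_one(&units[i])) != 0 )
            return rc;

    /* Decisions that need the feature information go between the phases. */
    for ( i = 0; i < n; i++ )
        units[i].mode_set = units[i].features_read;

    for ( i = 0; i < n; i++ )
        if ( (rc = init_one(&units[i])) != 0 )
            return rc;

    return 0;
}
```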

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -970,14 +970,6 @@ static void * __init allocate_ppr_log(st
 
 static int __init amd_iommu_init_one(struct amd_iommu *iommu)
 {
-    if ( map_iommu_mmio_region(iommu) != 0 )
-        goto error_out;
-
-    get_iommu_features(iommu);
-
-    if ( iommu->features.raw )
-        iommuv2_enabled = 1;
-
     if ( allocate_cmd_buffer(iommu) == NULL )
         goto error_out;
 
@@ -1197,6 +1189,23 @@ static bool_t __init amd_sp5100_erratum2
     return 0;
 }
 
+static int __init amd_iommu_prepare_one(struct amd_iommu *iommu)
+{
+    int rc = alloc_ivrs_mappings(iommu->seg);
+
+    if ( !rc )
+        rc = map_iommu_mmio_region(iommu);
+    if ( rc )
+        return rc;
+
+    get_iommu_features(iommu);
+
+    if ( iommu->features.raw )
+        iommuv2_enabled = true;
+
+    return 0;
+}
+
 int __init amd_iommu_init(void)
 {
     struct amd_iommu *iommu;
@@ -1227,7 +1236,7 @@ int __init amd_iommu_init(void)
     radix_tree_init(&ivrs_maps);
     for_each_amd_iommu ( iommu )
     {
-        rc = alloc_ivrs_mappings(iommu->seg);
+        rc = amd_iommu_prepare_one(iommu);
         if ( rc )
             goto error_out;
     }






* [Xen-devel] [PATCH 6/9] AMD/IOMMU: allow enabling with IRQ not yet set up
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
                   ` (4 preceding siblings ...)
  2019-06-13 13:24 ` [Xen-devel] [PATCH 5/9] AMD/IOMMU: split amd_iommu_init_one() Jan Beulich
@ 2019-06-13 13:25 ` Jan Beulich
  2019-06-18 12:22   ` Andrew Cooper
  2019-06-13 13:26 ` [Xen-devel] [PATCH 7/9] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode Jan Beulich
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

Early enabling (to enter x2APIC mode) requires deferring the IRQ setup.
Code to actually do that setup in the x2APIC case will get added
subsequently.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -814,7 +814,6 @@ static void amd_iommu_erratum_746_workar
 static void enable_iommu(struct amd_iommu *iommu)
 {
     unsigned long flags;
-    struct irq_desc *desc;
 
     spin_lock_irqsave(&iommu->lock, flags);
 
@@ -834,19 +833,27 @@ static void enable_iommu(struct amd_iomm
     if ( iommu->features.flds.ppr_sup )
         register_iommu_ppr_log_in_mmio_space(iommu);
 
-    desc = irq_to_desc(iommu->msi.irq);
-    spin_lock(&desc->lock);
-    set_msi_affinity(desc, &cpu_online_map);
-    spin_unlock(&desc->lock);
+    if ( iommu->msi.irq > 0 )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        set_msi_affinity(desc, &cpu_online_map);
+        spin_unlock(&desc->lock);
+    }
 
     amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
 
     set_iommu_ht_flags(iommu);
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
-    set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( iommu->features.flds.ppr_sup )
-        set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    if ( iommu->msi.irq > 0 )
+    {
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
 
     if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);






* [Xen-devel] [PATCH 7/9] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
                   ` (5 preceding siblings ...)
  2019-06-13 13:25 ` [Xen-devel] [PATCH 6/9] AMD/IOMMU: allow enabling with IRQ not yet set up Jan Beulich
@ 2019-06-13 13:26 ` Jan Beulich
  2019-06-18 12:35   ` Andrew Cooper
  2019-06-13 13:27 ` [Xen-devel] [PATCH 8/9] AMD/IOMMU: enable x2APIC mode when available Jan Beulich
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

In order to be able to express all possible destinations we need to make
use of this non-MSI-capability based mechanism. The new IRQ controller
structure can re-use certain MSI functions, though.

For now general and PPR interrupts still share a single vector, IRQ, and
hence handler.
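
As an illustration only (not part of the patch), the value written to the
general and PPR interrupt control registers could be assembled with
explicit shifts as sketched below. The helper name xt_int_ctrl_encode() is
hypothetical; the bit positions are assumed to mirror
union amd_iommu_x2apic_control from this patch, with bit fields allocated
from the least significant bit upwards, as is the case on x86.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only. Assumed layout, following
 * union amd_iommu_x2apic_control:
 *   bit  2      dest_mode
 *   bits 8-31   dest_lo (low 24 bits of the 32-bit destination)
 *   bits 32-39  vector
 *   bit  40     int_type
 *   bits 56-63  dest_hi (high 8 bits of the 32-bit destination)
 */
static uint64_t xt_int_ctrl_encode(uint32_t dest, uint8_t vector,
                                   unsigned int dest_mode,
                                   unsigned int int_type)
{
    uint64_t raw = 0;

    raw |= (uint64_t)(dest_mode & 1) << 2;
    raw |= (uint64_t)(dest & 0xffffff) << 8;
    raw |= (uint64_t)vector << 32;
    raw |= (uint64_t)(int_type & 1) << 40;
    raw |= (uint64_t)(dest >> 24) << 56;

    return raw;
}
```

The point of the split destination is that a full 32-bit x2APIC ID fits,
which the 8-bit destination of a conventional MSI address cannot hold.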

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -472,6 +472,44 @@ static hw_irq_controller iommu_maskable_
     .set_affinity = set_msi_affinity,
 };
 
+static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
+{
+    struct amd_iommu *iommu = desc->action->dev_id;
+    unsigned int dest = set_desc_affinity(desc, mask);
+    union amd_iommu_x2apic_control ctrl = {};
+    unsigned long flags;
+
+    if ( dest == BAD_APICID )
+        return;
+
+    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
+    iommu->msi.msg.dest32 = dest;
+
+    ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
+                               MSI_ADDR_DESTMODE_MASK);
+    ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
+                              MSI_DATA_DELIVERY_MODE_MASK);
+    ctrl.vector = desc->arch.vector;
+    ctrl.dest_lo = dest;
+    ctrl.dest_hi = dest >> 24;
+
+    spin_lock_irqsave(&iommu->lock, flags);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_INT_CTRL_MMIO_OFFSET);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET);
+    spin_unlock_irqrestore(&iommu->lock, flags);
+}
+
+static hw_irq_controller iommu_x2apic_type = {
+    .typename     = "IOMMU-x2APIC",
+    .startup      = irq_startup_none,
+    .shutdown     = irq_shutdown_none,
+    .enable       = irq_enable_none,
+    .disable      = irq_disable_none,
+    .ack          = ack_nonmaskable_msi_irq,
+    .end          = end_nonmaskable_msi_irq,
+    .set_affinity = set_x2apic_affinity,
+};
+
 static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
 {
     u16 domain_id, device_id, flags;
@@ -726,8 +764,6 @@ static void iommu_interrupt_handler(int
 static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
-    hw_irq_controller *handler;
-    u16 control;
 
     irq = create_irq(NUMA_NO_NODE);
     if ( irq <= 0 )
@@ -747,20 +783,43 @@ static bool_t __init set_iommu_interrupt
                         PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf));
         return 0;
     }
-    control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
-                              PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
-                              iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
-    iommu->msi.msi.nvec = 1;
-    if ( is_mask_bit_support(control) )
-    {
-        iommu->msi.msi_attrib.maskbit = 1;
-        iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
-                                                is_64bit_address(control));
-        handler = &iommu_maskable_msi_type;
+
+    if ( iommu->ctrl.int_cap_xt_en )
+    {
+        struct irq_desc *desc = irq_to_desc(irq);
+
+        iommu->msi.msi_attrib.pos = MSI_TYPE_IOMMU;
+        iommu->msi.msi_attrib.maskbit = 0;
+        iommu->msi.msi_attrib.is_64 = 1;
+
+        desc->msi_desc = &iommu->msi;
+        desc->handler = &iommu_x2apic_type;
+
+        ret = 0;
     }
     else
-        handler = &iommu_msi_type;
-    ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    {
+        hw_irq_controller *handler;
+        u16 control;
+
+        control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
+                                  PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
+                                  iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
+
+        iommu->msi.msi.nvec = 1;
+        if ( is_mask_bit_support(control) )
+        {
+            iommu->msi.msi_attrib.maskbit = 1;
+            iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
+                                                    is_64bit_address(control));
+            handler = &iommu_maskable_msi_type;
+        }
+        else
+            handler = &iommu_msi_type;
+
+        ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    }
+
     if ( !ret )
         ret = request_irq(irq, 0, iommu_interrupt_handler, "amd_iommu", iommu);
     if ( ret )
@@ -838,8 +897,19 @@ static void enable_iommu(struct amd_iomm
         struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
 
         spin_lock(&desc->lock);
-        set_msi_affinity(desc, &cpu_online_map);
-        spin_unlock(&desc->lock);
+
+        if ( iommu->ctrl.int_cap_xt_en )
+        {
+            set_x2apic_affinity(desc, &cpu_online_map);
+            spin_unlock(&desc->lock);
+        }
+        else
+        {
+            set_msi_affinity(desc, &cpu_online_map);
+            spin_unlock(&desc->lock);
+
+            amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
+        }
     }
 
     amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
@@ -879,7 +949,9 @@ static void disable_iommu(struct amd_iom
         return;
     }
 
-    amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+    if ( !iommu->ctrl.int_cap_xt_en )
+        amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -415,6 +415,25 @@ union amd_iommu_ext_features {
     } flds;
 };
 
+/* x2APIC Control Registers */
+#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
+#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
+#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
+
+union amd_iommu_x2apic_control {
+    uint64_t raw;
+    struct {
+        unsigned int :2;
+        unsigned int dest_mode:1;
+        unsigned int :5;
+        unsigned int dest_lo:24;
+        unsigned int vector:8;
+        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
+        unsigned int :15;
+        unsigned int dest_hi:8;
+    };
+};
+
 /* Status Register*/
 #define IOMMU_STATUS_MMIO_OFFSET		0x2020
 #define IOMMU_STATUS_EVENT_OVERFLOW_MASK	0x00000001





* [Xen-devel] [PATCH 8/9] AMD/IOMMU: enable x2APIC mode when available
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
                   ` (6 preceding siblings ...)
  2019-06-13 13:26 ` [Xen-devel] [PATCH 7/9] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode Jan Beulich
@ 2019-06-13 13:27 ` Jan Beulich
  2019-06-18 13:40   ` Andrew Cooper
  2019-06-13 13:28 ` [Xen-devel] [PATCH RFC 9/9] AMD/IOMMU: correct IRTE updating Jan Beulich
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:27 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
switched into a suitable state.

The post-AP-bringup IRQ affinity adjustment is also done for the
non-x2APIC case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: Instead of the system_state check in iov_enable_xt() the function
     could also zap its own hook pointer, at which point it could also
     become __init. This would, however, require that either
     resume_x2apic() be bound to ignore iommu_enable_x2apic() errors
     forever, or that iommu_enable_x2apic() be slightly re-arranged to
     not return -EOPNOTSUPP when finding a NULL hook during resume.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -834,6 +834,30 @@ static bool_t __init set_iommu_interrupt
     return 1;
 }
 
+int iov_adjust_irq_affinities(void)
+{
+    const struct amd_iommu *iommu;
+
+    if ( !iommu_enabled )
+        return 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+        unsigned long flags;
+
+        spin_lock_irqsave(&desc->lock, flags);
+        if ( iommu->ctrl.int_cap_xt_en )
+            set_x2apic_affinity(desc, &cpu_online_map);
+        else
+            set_msi_affinity(desc, &cpu_online_map);
+        spin_unlock_irqrestore(&desc->lock, flags);
+    }
+
+    return 0;
+}
+__initcall(iov_adjust_irq_affinities);
+
 /*
  * Family15h Model 10h-1fh erratum 746 (IOMMU Logging May Stall Translations)
  * Workaround:
@@ -1047,7 +1071,7 @@ static void * __init allocate_ppr_log(st
                                 IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
 }
 
-static int __init amd_iommu_init_one(struct amd_iommu *iommu)
+static int __init amd_iommu_init_one(struct amd_iommu *iommu, bool intr)
 {
     if ( allocate_cmd_buffer(iommu) == NULL )
         goto error_out;
@@ -1058,7 +1082,7 @@ static int __init amd_iommu_init_one(str
     if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
         goto error_out;
 
-    if ( !set_iommu_interrupt_handler(iommu) )
+    if ( intr && !set_iommu_interrupt_handler(iommu) )
         goto error_out;
 
     /* To make sure that device_table.buffer has been successfully allocated */
@@ -1285,7 +1309,7 @@ static int __init amd_iommu_prepare_one(
     return 0;
 }
 
-int __init amd_iommu_init(void)
+int __init amd_iommu_prepare(void)
 {
     struct amd_iommu *iommu;
     int rc = -ENODEV;
@@ -1300,9 +1324,14 @@ int __init amd_iommu_init(void)
     if ( unlikely(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_MSI) )
         goto error_out;
 
+    /* Have we been here before? */
+    if ( ivhd_type )
+        return 0;
+
     rc = amd_iommu_get_supported_ivhd_type();
     if ( rc < 0 )
         goto error_out;
+    BUG_ON(!rc);
     ivhd_type = rc;
 
     rc = amd_iommu_get_ivrs_dev_entries();
@@ -1321,9 +1350,33 @@ int __init amd_iommu_init(void)
     }
 
     rc = amd_iommu_update_ivrs_mapping_acpi();
+
+ error_out:
+    if ( rc )
+    {
+        amd_iommu_init_cleanup();
+        ivhd_type = 0;
+    }
+
+    return rc;
+}
+
+int __init amd_iommu_init(bool xt)
+{
+    struct amd_iommu *iommu;
+    int rc = amd_iommu_prepare();
+
     if ( rc )
         goto error_out;
 
+    for_each_amd_iommu ( iommu )
+    {
+        /* NB: There's no need to actually write these out right here. */
+        iommu->ctrl.ga_en |= xt;
+        iommu->ctrl.xt_en = xt;
+        iommu->ctrl.int_cap_xt_en = xt;
+    }
+
     /* initialize io-apic interrupt remapping entries */
     if ( iommu_intremap )
         rc = amd_iommu_setup_ioapic_remapping();
@@ -1346,7 +1399,7 @@ int __init amd_iommu_init(void)
     /* per iommu initialization  */
     for_each_amd_iommu ( iommu )
     {
-        rc = amd_iommu_init_one(iommu);
+        rc = amd_iommu_init_one(iommu, !xt);
         if ( rc )
             goto error_out;
     }
@@ -1358,6 +1411,40 @@ error_out:
     return rc;
 }
 
+int __init amd_iommu_init_interrupt(void)
+{
+    struct amd_iommu *iommu;
+    int rc = 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc;
+
+        if ( !set_iommu_interrupt_handler(iommu) )
+        {
+            rc = -EIO;
+            break;
+        }
+
+        desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        ASSERT(iommu->ctrl.int_cap_xt_en);
+        set_x2apic_affinity(desc, &cpu_online_map);
+        spin_unlock(&desc->lock);
+
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
+
+    if ( rc )
+        amd_iommu_init_cleanup();
+
+    return rc;
+}
+
 static void invalidate_all_domain_pages(void)
 {
     struct domain *d;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -796,6 +796,40 @@ void* __init amd_iommu_alloc_intremap_ta
     return tb;
 }
 
+bool __init iov_supports_xt(void)
+{
+    unsigned int apic;
+    struct amd_iommu *iommu;
+
+    if ( !iommu_enable || !iommu_intremap || !cpu_has_cx16 )
+        return false;
+
+    if ( amd_iommu_prepare() )
+        return false;
+
+    for_each_amd_iommu ( iommu )
+        if ( !iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup )
+            return false;
+
+    for ( apic = 0; apic < nr_ioapics; apic++ )
+    {
+        unsigned int idx = ioapic_id_to_index(IO_APIC_ID(apic));
+
+        if ( idx == MAX_IO_APICS )
+            return false;
+
+        if ( !find_iommu_for_device(ioapic_sbdf[idx].seg,
+                                    ioapic_sbdf[idx].bdf) )
+        {
+            AMD_IOMMU_DEBUG("No IOMMU for IO-APIC %#x (ID %x)\n",
+                            apic, IO_APIC_ID(apic));
+            return false;
+        }
+    }
+
+    return true;
+}
+
 int __init amd_setup_hpet_msi(struct msi_desc *msi_desc)
 {
     spinlock_t *lock;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -170,7 +170,8 @@ static int __init iov_detect(void)
     if ( !iommu_enable && !iommu_intremap )
         return 0;
 
-    if ( amd_iommu_init() != 0 )
+    else if ( (init_done ? amd_iommu_init_interrupt()
+                         : amd_iommu_init(false)) != 0 )
     {
         printk("AMD-Vi: Error initialization\n");
         return -ENODEV;
@@ -183,6 +184,25 @@ static int __init iov_detect(void)
     return scan_pci_devices();
 }
 
+static int iov_enable_xt(void)
+{
+    int rc;
+
+    if ( system_state >= SYS_STATE_active )
+        return 0;
+
+    if ( (rc = amd_iommu_init(true)) != 0 )
+    {
+        printk("AMD-Vi: Error %d initializing for x2APIC mode\n", rc);
+        /* -ENXIO has special meaning to the caller - convert it. */
+        return rc != -ENXIO ? rc : -ENODATA;
+    }
+
+    init_done = true;
+
+    return 0;
+}
+
 int amd_iommu_alloc_root(struct domain_iommu *hd)
 {
     if ( unlikely(!hd->arch.root_table) )
@@ -559,11 +579,13 @@ static const struct iommu_ops __initcons
     .free_page_table = deallocate_page_table,
     .reassign_device = reassign_device,
     .get_device_group_id = amd_iommu_group_id,
+    .enable_x2apic = iov_enable_xt,
     .update_ire_from_apic = amd_iommu_ioapic_update_ire,
     .update_ire_from_msi = amd_iommu_msi_msg_update_ire,
     .read_apic_from_ire = amd_iommu_read_ioapic_from_ire,
     .read_msi_from_ire = amd_iommu_read_msi_from_ire,
     .setup_hpet_msi = amd_setup_hpet_msi,
+    .adjust_irq_affinities = iov_adjust_irq_affinities,
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .share_p2m = amd_iommu_share_p2m,
@@ -574,4 +596,5 @@ static const struct iommu_ops __initcons
 static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
     .ops = &_iommu_ops,
     .setup = iov_detect,
+    .supports_x2apic = iov_supports_xt,
 };
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -48,8 +48,11 @@ int amd_iommu_detect_acpi(void);
 void get_iommu_features(struct amd_iommu *iommu);
 
 /* amd-iommu-init functions */
-int amd_iommu_init(void);
+int amd_iommu_prepare(void);
+int amd_iommu_init(bool xt);
+int amd_iommu_init_interrupt(void);
 int amd_iommu_update_ivrs_mapping_acpi(void);
+int iov_adjust_irq_affinities(void);
 
 /* mapping functions */
 int __must_check amd_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -96,6 +99,7 @@ void amd_iommu_flush_all_caches(struct a
 struct amd_iommu *find_iommu_for_device(int seg, int bdf);
 
 /* interrupt remapping */
+bool iov_supports_xt(void);
 int amd_iommu_setup_ioapic_remapping(void);
 void *amd_iommu_alloc_intremap_table(unsigned long **);
 int amd_iommu_free_intremap_table(u16 seg, struct ivrs_mappings *);





* [Xen-devel] [PATCH RFC 9/9] AMD/IOMMU: correct IRTE updating
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
                   ` (7 preceding siblings ...)
  2019-06-13 13:27 ` [Xen-devel] [PATCH 8/9] AMD/IOMMU: enable x2APIC mode when available Jan Beulich
@ 2019-06-13 13:28 ` Jan Beulich
  2019-06-18 13:28   ` Andrew Cooper
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-13 13:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

While for 32-bit IRTEs I think we can safely continue to assume that the
writes will translate to a single MOV, the use of CMPXCHG16B is more
heavy-handed than necessary for the 128-bit form, and the flushing
didn't get done along the lines of what the specification says. Mark
entries to be updated as not remapped (which will result in interrupt
requests getting target aborted, but the interrupts should be masked
anyway at that point in time), issue the flush, and only then write the
new entry. In the 128-bit IRTE case set RemapEn separately last, so that
the ordering of the writes of the two 64-bit halves won't matter.
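
The intended sequence can be sketched in isolation roughly as follows.
This is a simplified, single-threaded model and not the code of the patch:
struct irte128, IRTE_REMAP_EN, and irte128_update() are hypothetical names,
RemapEn is assumed to be bit 0 of the low half, and the flush is
represented by an optional callback. The patch itself uses a loop instead
of a single check, because the lock is dropped around the flush.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified model of a 128-bit IRTE; bit 0 of 'lo' stands for RemapEn. */
struct irte128 {
    volatile uint64_t lo;
    volatile uint64_t hi;
};

#define IRTE_REMAP_EN 1ULL

/*
 * Hypothetical helper: returns 1 if a live entry had to be disabled and
 * flushed first, 0 otherwise.
 */
static int irte128_update(struct irte128 *e, uint64_t lo, uint64_t hi,
                          void (*flush)(void *), void *ctx)
{
    int flushed = 0;

    if ( e->lo & IRTE_REMAP_EN )
    {
        /* Step 1: mark the entry as not remapped. */
        e->lo &= ~IRTE_REMAP_EN;
        /* Step 2: have the IOMMU drop any cached copy. */
        if ( flush )
            flush(ctx);
        flushed = 1;
    }

    /* Step 3: write both halves, in either order, with RemapEn clear. */
    e->hi = hi;
    e->lo = lo & ~IRTE_REMAP_EN;

    /* Step 4: order the field writes ahead of the enabling write. */
    atomic_thread_fence(memory_order_release);
    e->lo |= IRTE_REMAP_EN;

    return flushed;
}
```

With RemapEn set only at the very end, the hardware can never observe an
enabled entry whose two halves belong to different updates.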

In update_intremap_entry_from_msi_msg() also fold the duplicate initial
lock determination and acquire into just a single instance.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: Putting the flush invocations in loops isn't overly nice, but I
     don't think this can really be abused, since callers up the stack
     hold further locks. Nevertheless I'd like to ask for better
     suggestions.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -203,7 +203,7 @@ static void update_intremap_entry(union
         .vector = vector,
     };
     struct irte_full full = {
-        .remap_en = 1,
+        .remap_en = 0, /* Will be set explicitly below. */
         .sup_io_pf = 0,
         .int_type = int_type,
         .rq_eoi = 0,
@@ -215,20 +215,15 @@ static void update_intremap_entry(union
 
     switch ( irte_mode )
     {
-        __uint128_t ret;
-        union {
-            __uint128_t raw;
-            struct irte_full full;
-        } old;
-
     case irte_basic:
         *entry.basic = basic;
         break;
 
     case irte_full:
-        old.full = *entry.full;
-        ret = cmpxchg16b(entry.full, &old, &full);
-        ASSERT(ret == old.raw);
+        *entry.full = full;
+        wmb();
+        /* Enable the entry /after/ having written all other fields. */
+        entry.full->remap_en = 1;
         break;
 
     default:
@@ -292,6 +287,20 @@ static int update_intremap_entry_from_io
     }
 
     entry = get_intremap_entry(iommu->seg, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.basic->remap_en )
+    {
+        entry.basic->remap_en = 0;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
     if ( fresh )
         /* nothing */;
     else if ( !lo_update )
@@ -321,13 +330,6 @@ static int update_intremap_entry_from_io
 
     spin_unlock_irqrestore(lock, flags);
 
-    if ( iommu->enabled && !fresh )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
     set_rte_index(rte, offset);
 
     return 0;
@@ -591,19 +593,27 @@ static int update_intremap_entry_from_ms
     req_id = get_dma_requestor_id(iommu->seg, bdf);
     alias_id = get_intremap_requestor_id(iommu->seg, bdf);
 
+    lock = get_intremap_lock(iommu->seg, req_id);
+    spin_lock_irqsave(lock, flags);
+
     if ( msg == NULL )
     {
-        lock = get_intremap_lock(iommu->seg, req_id);
-        spin_lock_irqsave(lock, flags);
         for ( i = 0; i < nr; ++i )
             free_intremap_entry(iommu->seg, req_id, *remap_index + i);
         spin_unlock_irqrestore(lock, flags);
-        goto done;
-    }
 
-    lock = get_intremap_lock(iommu->seg, req_id);
+        if ( iommu->enabled )
+        {
+            spin_lock_irqsave(&iommu->lock, flags);
+            amd_iommu_flush_intremap(iommu, req_id);
+            if ( alias_id != req_id )
+                amd_iommu_flush_intremap(iommu, alias_id);
+            spin_unlock_irqrestore(&iommu->lock, flags);
+        }
+
+        return 0;
+    }
 
-    spin_lock_irqsave(lock, flags);
     dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
     delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
@@ -627,6 +637,22 @@ static int update_intremap_entry_from_ms
     }
 
     entry = get_intremap_entry(iommu->seg, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.basic->remap_en )
+    {
+        entry.basic->remap_en = 0;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        if ( alias_id != req_id )
+            amd_iommu_flush_intremap(iommu, alias_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
     spin_unlock_irqrestore(lock, flags);
 
@@ -646,16 +672,6 @@ static int update_intremap_entry_from_ms
                get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
     }
 
-done:
-    if ( iommu->enabled )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        if ( alias_id != req_id )
-            amd_iommu_flush_intremap(iommu, alias_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
     return 0;
 }
 
@@ -801,7 +817,7 @@ bool __init iov_supports_xt(void)
     unsigned int apic;
     struct amd_iommu *iommu;
 
-    if ( !iommu_enable || !iommu_intremap || !cpu_has_cx16 )
+    if ( !iommu_enable || !iommu_intremap )
         return false;
 
     if ( amd_iommu_prepare() )





* Re: [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register
  2019-06-13 13:22 ` [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register Jan Beulich
@ 2019-06-17 19:07   ` Woods, Brian
  2019-06-18  9:37     ` Jan Beulich
  2019-06-17 20:23   ` Andrew Cooper
  1 sibling, 1 reply; 54+ messages in thread
From: Woods, Brian @ 2019-06-17 19:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Woods, Brian, Suthikulpanit, Suravee, Andrew Cooper

On Thu, Jun 13, 2019 at 07:22:31AM -0600, Jan Beulich wrote:
> This also takes care of several of the shift values wrongly having been
> specified as hex rather than dec.
> 
> Take the opportunity and add further fields.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/drivers/passthrough/amd/iommu_detect.c
> +++ b/xen/drivers/passthrough/amd/iommu_detect.c
> @@ -60,43 +60,72 @@ static int __init get_iommu_capabilities
>  
>  void __init get_iommu_features(struct amd_iommu *iommu)
>  {
> -    u32 low, high;
> -    int i = 0 ;
> -    static const char *__initdata feature_str[] = {
> -        "- Prefetch Pages Command", 
> -        "- Peripheral Page Service Request", 
> -        "- X2APIC Supported", 
> -        "- NX bit Supported", 
> -        "- Guest Translation", 
> -        "- Reserved bit [5]",
> -        "- Invalidate All Command", 
> -        "- Guest APIC supported", 
> -        "- Hardware Error Registers", 
> -        "- Performance Counters", 
> -        NULL
> -    };
> -
>      ASSERT( iommu->mmio_base );
>  
>      if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
>      {
> -        iommu->features = 0;
> +        iommu->features.raw = 0;
>          return;
>      }
>  
> -    low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
> -    high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
> -
> -    iommu->features = ((u64)high << 32) | low;
> +    iommu->features.raw =
> +        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
>  
>      printk("AMD-Vi: IOMMU Extended Features:\n");
>  
> -    while ( feature_str[i] )
> +#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
> +#define FEAT(fld, str) do { \
> +    if ( MASK(fld) & (MASK(fld) - 1) ) \
> +        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
> +    else if ( iommu->features.raw & MASK(fld) ) \
> +        printk( "- " str "\n"); \
> +} while ( false )
> +
> +    FEAT(pref_sup,           "Prefetch Pages Command");
> +    FEAT(ppr_sup,            "Peripheral Page Service Request");
> +    FEAT(xt_sup,             "x2APIC");
> +    FEAT(nx_sup,             "NX bit");
> +    FEAT(gappi_sup,          "Guest APIC Physical Processor Interrupt");
> +    FEAT(ia_sup,             "Invalidate All Command");
> +    FEAT(ga_sup,             "Guest APIC");
> +    FEAT(he_sup,             "Hardware Error Registers");
> +    FEAT(pc_sup,             "Performance Counters");
> +    FEAT(hats,               "Host Address Translation Size");
> +
> +    if ( iommu->features.flds.gt_sup )
>      {
> -        if ( amd_iommu_has_feature(iommu, i) )
> -            printk( " %s\n", feature_str[i]);
> -        i++;
> +        FEAT(gats,           "Guest Address Translation Size");
> +        FEAT(glx_sup,        "Guest CR3 Root Table Level");
> +        FEAT(pas_max,        "Maximum PASID");
>      }
> +
> +    FEAT(smif_sup,           "SMI Filter Register");
> +    FEAT(smif_rc,            "SMI Filter Register Count");
> +    FEAT(gam_sup,            "Guest Virtual APIC Modes");
> +    FEAT(dual_ppr_log_sup,   "Dual PPR Log");
> +    FEAT(dual_event_log_sup, "Dual Event Log");
> +    FEAT(sat_sup,            "Secure ATS");
> +    FEAT(us_sup,             "User / Supervisor Page Protection");
> +    FEAT(dev_tbl_seg_sup,    "Device Table Segmentation");
> +    FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
> +    FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
> +    FEAT(marc_sup,           "Memory Access Routing and Control");
> +    FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
> +    FEAT(perf_opt_sup ,      "Performance Optimization");
> +    FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
> +    FEAT(gio_sup,            "Guest I/O Protection");
> +    FEAT(ha_sup,             "Host Access");
> +    FEAT(eph_sup,            "Enhanced PPR Handling");
> +    FEAT(attr_fw_sup,        "Attribute Forward");
> +    FEAT(hd_sup,             "Host Dirty");
> +    FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
> +    FEAT(viommu_sup,         "Virtualized IOMMU");
> +    FEAT(vm_guard_io_sup,    "VMGuard I/O Support");
> +    FEAT(vm_table_size,      "VM Table Size");
> +    FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
> +
> +#undef FEAT
> +#undef MASK
>  }
>  
>  int __init amd_iommu_detect_one_acpi(
> --- a/xen/drivers/passthrough/amd/iommu_guest.c
> +++ b/xen/drivers/passthrough/amd/iommu_guest.c
> @@ -638,7 +638,7 @@ static uint64_t iommu_mmio_read64(struct
>          val = reg_to_u64(iommu->reg_status);
>          break;
>      case IOMMU_EXT_FEATURE_MMIO_OFFSET:
> -        val = reg_to_u64(iommu->reg_ext_feature);
> +        val = iommu->reg_ext_feature.raw;
>          break;
>  
>      default:
> @@ -802,39 +802,26 @@ int guest_iommu_set_base(struct domain *
>  /* Initialize mmio read only bits */
>  static void guest_iommu_reg_init(struct guest_iommu *iommu)
>  {
> -    uint32_t lower, upper;
> +    union amd_iommu_ext_features ef = {
> +        /* Support prefetch */
> +        .flds.pref_sup = 1,
> +        /* Support PPR log */
> +        .flds.ppr_sup = 1,
> +        /* Support guest translation */
> +        .flds.gt_sup = 1,
> +        /* Support invalidate all command */
> +        .flds.ia_sup = 1,
> +        /* Host translation size has 6 levels */
> +        .flds.hats = HOST_ADDRESS_SIZE_6_LEVEL,
> +        /* Guest translation size has 6 levels */
> +        .flds.gats = GUEST_ADDRESS_SIZE_6_LEVEL,
> +        /* Single level gCR3 */
> +        .flds.glx_sup = GUEST_CR3_1_LEVEL,
> +        /* 9 bit PASID */
> +        .flds.pas_max = PASMAX_9_bit,
> +    };
>  
> -    lower = upper = 0;
> -    /* Support prefetch */
> -    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PREFSUP_SHIFT);
> -    /* Support PPR log */
> -    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PPRSUP_SHIFT);
> -    /* Support guest translation */
> -    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_GTSUP_SHIFT);
> -    /* Support invalidate all command */
> -    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_IASUP_SHIFT);
> -
> -    /* Host translation size has 6 levels */
> -    set_field_in_reg_u32(HOST_ADDRESS_SIZE_6_LEVEL, lower,
> -                         IOMMU_EXT_FEATURE_HATS_MASK,
> -                         IOMMU_EXT_FEATURE_HATS_SHIFT,
> -                         &lower);
> -    /* Guest translation size has 6 levels */
> -    set_field_in_reg_u32(GUEST_ADDRESS_SIZE_6_LEVEL, lower,
> -                         IOMMU_EXT_FEATURE_GATS_MASK,
> -                         IOMMU_EXT_FEATURE_GATS_SHIFT,
> -                         &lower);
> -    /* Single level gCR3 */
> -    set_field_in_reg_u32(GUEST_CR3_1_LEVEL, lower,
> -                         IOMMU_EXT_FEATURE_GLXSUP_MASK,
> -                         IOMMU_EXT_FEATURE_GLXSUP_SHIFT, &lower);
> -    /* 9 bit PASID */
> -    set_field_in_reg_u32(PASMAX_9_bit, upper,
> -                         IOMMU_EXT_FEATURE_PASMAX_MASK,
> -                         IOMMU_EXT_FEATURE_PASMAX_SHIFT, &upper);
> -
> -    iommu->reg_ext_feature.lo = lower;
> -    iommu->reg_ext_feature.hi = upper;
> +    iommu->reg_ext_feature = ef;
>  }
>  
>  static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -883,7 +883,7 @@ static void enable_iommu(struct amd_iomm
>      register_iommu_event_log_in_mmio_space(iommu);
>      register_iommu_exclusion_range(iommu);
>  
> -    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
> +    if ( iommu->features.flds.ppr_sup )
>          register_iommu_ppr_log_in_mmio_space(iommu);
>  
>      desc = irq_to_desc(iommu->msi.irq);
> @@ -897,15 +897,15 @@ static void enable_iommu(struct amd_iomm
>      set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
>      set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
>  
> -    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
> +    if ( iommu->features.flds.ppr_sup )
>          set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
>  
> -    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
> +    if ( iommu->features.flds.gt_sup )
>          set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);
>  
>      set_iommu_translation_control(iommu, IOMMU_CONTROL_ENABLED);
>  
> -    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
> +    if ( iommu->features.flds.ia_sup )
>          amd_iommu_flush_all_caches(iommu);
>  
>      iommu->enabled = 1;
> @@ -928,10 +928,10 @@ static void disable_iommu(struct amd_iom
>      set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
>      set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
>  
> -    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
> +    if ( iommu->features.flds.ppr_sup )
>          set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_DISABLED);
>  
> -    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
> +    if ( iommu->features.flds.gt_sup )
>          set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_DISABLED);
>  
>      set_iommu_translation_control(iommu, IOMMU_CONTROL_DISABLED);
> @@ -1027,7 +1027,7 @@ static int __init amd_iommu_init_one(str
>  
>      get_iommu_features(iommu);
>  
> -    if ( iommu->features )
> +    if ( iommu->features.raw )
>          iommuv2_enabled = 1;
>  
>      if ( allocate_cmd_buffer(iommu) == NULL )
> @@ -1036,9 +1036,8 @@ static int __init amd_iommu_init_one(str
>      if ( allocate_event_log(iommu) == NULL )
>          goto error_out;
>  
> -    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
> -        if ( allocate_ppr_log(iommu) == NULL )
> -            goto error_out;
> +    if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
> +        goto error_out;
>  
>      if ( !set_iommu_interrupt_handler(iommu) )
>          goto error_out;
> @@ -1389,7 +1388,7 @@ void amd_iommu_resume(void)
>      }
>  
>      /* flush all cache entries after iommu re-enabled */
> -    if ( !amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
> +    if ( !iommu->features.flds.ia_sup )
>      {
>          invalidate_all_devices();
>          invalidate_all_domain_pages();
> --- a/xen/include/asm-x86/amd-iommu.h
> +++ b/xen/include/asm-x86/amd-iommu.h
> @@ -83,7 +83,7 @@ struct amd_iommu {
>      iommu_cap_t cap;
>  
>      u8 ht_flags;
> -    u64 features;
> +    union amd_iommu_ext_features features;
>  
>      void *mmio_base;
>      unsigned long mmio_base_phys;
> @@ -174,7 +174,7 @@ struct guest_iommu {
>      /* MMIO regs */
>      struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
>      struct mmio_reg         reg_status;            /* MMIO offset 2020h */
> -    struct mmio_reg         reg_ext_feature;       /* MMIO offset 0030h */
> +    union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
>  
>      /* guest interrupt settings */
>      struct guest_iommu_msi  msi;
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> @@ -346,26 +346,57 @@ struct amd_iommu_dte {
>  #define IOMMU_EXCLUSION_LIMIT_HIGH_MASK		0xFFFFFFFF
>  #define IOMMU_EXCLUSION_LIMIT_HIGH_SHIFT	0
>  
> -/* Extended Feature Register*/
> +/* Extended Feature Register */
>  #define IOMMU_EXT_FEATURE_MMIO_OFFSET                   0x30
> -#define IOMMU_EXT_FEATURE_PREFSUP_SHIFT                 0x0
> -#define IOMMU_EXT_FEATURE_PPRSUP_SHIFT                  0x1
> -#define IOMMU_EXT_FEATURE_XTSUP_SHIFT                   0x2
> -#define IOMMU_EXT_FEATURE_NXSUP_SHIFT                   0x3
> -#define IOMMU_EXT_FEATURE_GTSUP_SHIFT                   0x4
> -#define IOMMU_EXT_FEATURE_IASUP_SHIFT                   0x6
> -#define IOMMU_EXT_FEATURE_GASUP_SHIFT                   0x7
> -#define IOMMU_EXT_FEATURE_HESUP_SHIFT                   0x8
> -#define IOMMU_EXT_FEATURE_PCSUP_SHIFT                   0x9
> -#define IOMMU_EXT_FEATURE_HATS_SHIFT                    0x10
> -#define IOMMU_EXT_FEATURE_HATS_MASK                     0x00000C00
> -#define IOMMU_EXT_FEATURE_GATS_SHIFT                    0x12
> -#define IOMMU_EXT_FEATURE_GATS_MASK                     0x00003000
> -#define IOMMU_EXT_FEATURE_GLXSUP_SHIFT                  0x14
> -#define IOMMU_EXT_FEATURE_GLXSUP_MASK                   0x0000C000
>  
> -#define IOMMU_EXT_FEATURE_PASMAX_SHIFT                  0x0
> -#define IOMMU_EXT_FEATURE_PASMAX_MASK                   0x0000001F
> +union amd_iommu_ext_features {
> +    uint64_t raw;
> +    struct {
> +        unsigned int pref_sup:1;
> +        unsigned int ppr_sup:1;
> +        unsigned int xt_sup:1;
> +        unsigned int nx_sup:1;
> +        unsigned int gt_sup:1;
> +        unsigned int gappi_sup:1;
> +        unsigned int ia_sup:1;
> +        unsigned int ga_sup:1;
> +        unsigned int he_sup:1;
> +        unsigned int pc_sup:1;
> +        unsigned int hats:2;
> +        unsigned int gats:2;
> +        unsigned int glx_sup:2;
> +        unsigned int smif_sup:2;
> +        unsigned int smif_rc:3;
> +        unsigned int gam_sup:3;
> +        unsigned int dual_ppr_log_sup:2;
> +        unsigned int :2;
> +        unsigned int dual_event_log_sup:2;

> +        unsigned int sat_sup:1;
> +        unsigned int :1;
I think these might be flipped.

> +        unsigned int pas_max:5;
> +        unsigned int us_sup:1;
> +        unsigned int dev_tbl_seg_sup:2;
> +        unsigned int ppr_early_of_sup:1;
> +        unsigned int ppr_auto_rsp_sup:1;
> +        unsigned int marc_sup:2;
> +        unsigned int blk_stop_mrk_sup:1;
> +        unsigned int perf_opt_sup:1;
> +        unsigned int msi_cap_mmio_sup:1;
> +        unsigned int :1;
> +        unsigned int gio_sup:1;
> +        unsigned int ha_sup:1;
> +        unsigned int eph_sup:1;
> +        unsigned int attr_fw_sup:1;
> +        unsigned int hd_sup:1;
> +        unsigned int :1;
> +        unsigned int inv_iotlb_type_sup:1;
> +        unsigned int viommu_sup:1;
> +        unsigned int vm_guard_io_sup:1;
> +        unsigned int vm_table_size:4;
> +        unsigned int ga_update_dis_sup:1;
> +        unsigned int :2;
> +    } flds;
> +};
>  
>  /* Status Register*/
>  #define IOMMU_STATUS_MMIO_OFFSET		0x2020
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -219,13 +219,6 @@ static inline int iommu_has_cap(struct a
>      return !!(iommu->cap.header & (1u << bit));
>  }
>  
> -static inline int amd_iommu_has_feature(struct amd_iommu *iommu, uint32_t bit)
> -{
> -    if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
> -        return 0;
> -    return !!(iommu->features & (1U << bit));
> -}
> -
>  /* access tail or head pointer of ring buffer */
>  static inline uint32_t iommu_get_rb_pointer(uint32_t reg)
>  {
> 
> 
> 

-- 
Brian Woods


* Re: [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register
  2019-06-13 13:22 ` [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register Jan Beulich
  2019-06-17 19:07   ` Woods, Brian
@ 2019-06-17 20:23   ` Andrew Cooper
  2019-06-18  9:33     ` Jan Beulich
  1 sibling, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-06-17 20:23 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:22, Jan Beulich wrote:
> This also takes care of several of the shift values wrongly having been
> specified as hex rather than dec.
>
> Take the opportunity and add further fields.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/drivers/passthrough/amd/iommu_detect.c
> +++ b/xen/drivers/passthrough/amd/iommu_detect.c
> @@ -60,43 +60,72 @@ static int __init get_iommu_capabilities
>  
>  void __init get_iommu_features(struct amd_iommu *iommu)
>  {
> -    u32 low, high;
> -    int i = 0 ;
> -    static const char *__initdata feature_str[] = {
> -        "- Prefetch Pages Command", 
> -        "- Peripheral Page Service Request", 
> -        "- X2APIC Supported", 
> -        "- NX bit Supported", 
> -        "- Guest Translation", 
> -        "- Reserved bit [5]",
> -        "- Invalidate All Command", 
> -        "- Guest APIC supported", 
> -        "- Hardware Error Registers", 
> -        "- Performance Counters", 
> -        NULL
> -    };
> -
>      ASSERT( iommu->mmio_base );
>  
>      if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
>      {
> -        iommu->features = 0;
> +        iommu->features.raw = 0;
>          return;
>      }
>  
> -    low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
> -    high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
> -
> -    iommu->features = ((u64)high << 32) | low;
> +    iommu->features.raw =
> +        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
>  
>      printk("AMD-Vi: IOMMU Extended Features:\n");
>  
> -    while ( feature_str[i] )
> +#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
> +#define FEAT(fld, str) do { \
> +    if ( MASK(fld) & (MASK(fld) - 1) ) \
> +        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
> +    else if ( iommu->features.raw & MASK(fld) ) \
> +        printk( "- " str "\n"); \
> +} while ( false )
> +
> +    FEAT(pref_sup,           "Prefetch Pages Command");
> +    FEAT(ppr_sup,            "Peripheral Page Service Request");
> +    FEAT(xt_sup,             "x2APIC");
> +    FEAT(nx_sup,             "NX bit");
> +    FEAT(gappi_sup,          "Guest APIC Physical Processor Interrupt");
> +    FEAT(ia_sup,             "Invalidate All Command");
> +    FEAT(ga_sup,             "Guest APIC");
> +    FEAT(he_sup,             "Hardware Error Registers");
> +    FEAT(pc_sup,             "Performance Counters");
> +    FEAT(hats,               "Host Address Translation Size");
> +
> +    if ( iommu->features.flds.gt_sup )
>      {
> -        if ( amd_iommu_has_feature(iommu, i) )
> -            printk( " %s\n", feature_str[i]);
> -        i++;
> +        FEAT(gats,           "Guest Address Translation Size");
> +        FEAT(glx_sup,        "Guest CR3 Root Table Level");
> +        FEAT(pas_max,        "Maximum PASID");
>      }
> +
> +    FEAT(smif_sup,           "SMI Filter Register");
> +    FEAT(smif_rc,            "SMI Filter Register Count");
> +    FEAT(gam_sup,            "Guest Virtual APIC Modes");
> +    FEAT(dual_ppr_log_sup,   "Dual PPR Log");
> +    FEAT(dual_event_log_sup, "Dual Event Log");
> +    FEAT(sat_sup,            "Secure ATS");
> +    FEAT(us_sup,             "User / Supervisor Page Protection");
> +    FEAT(dev_tbl_seg_sup,    "Device Table Segmentation");
> +    FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
> +    FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
> +    FEAT(marc_sup,           "Memory Access Routing and Control");
> +    FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
> +    FEAT(perf_opt_sup ,      "Performance Optimization");
> +    FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
> +    FEAT(gio_sup,            "Guest I/O Protection");
> +    FEAT(ha_sup,             "Host Access");
> +    FEAT(eph_sup,            "Enhanced PPR Handling");
> +    FEAT(attr_fw_sup,        "Attribute Forward");
> +    FEAT(hd_sup,             "Host Dirty");
> +    FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
> +    FEAT(viommu_sup,         "Virtualized IOMMU");
> +    FEAT(vm_guard_io_sup,    "VMGuard I/O Support");
> +    FEAT(vm_table_size,      "VM Table Size");
> +    FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
> +
> +#undef FEAT
> +#undef MASK
>  }

So this is fine, but does come with a downside.  This is the log from a
Rome system:

(XEN) [   17.928225] AMD-Vi: IOMMU Extended Features:
(XEN) [   17.988264] - Peripheral Page Service Request
(XEN) [   18.048082] - x2APIC
(XEN) [   18.104735] - NX bit
(XEN) [   18.160751] - Invalidate All Command
(XEN) [   18.218116] - Guest APIC
(XEN) [   18.273660] - Performance Counters
(XEN) [   18.329868] - Host Address Translation Size: 0x2
(XEN) [   18.387363] - Guest Address Translation Size: 0
(XEN) [   18.444446] - Guest CR3 Root Table Level: 0x1
(XEN) [   18.501006] - Maximum PASID: 0xf
(XEN) [   18.555753] - SMI Filter Register: 0x1
(XEN) [   18.610773] - SMI Filter Register Count: 0x2
(XEN) [   18.666116] - Guest Virtual APIC Modes: 0x1
(XEN) [   18.721036] - Dual PPR Log: 0x2
(XEN) [   18.774237] - Dual Event Log: 0x2
(XEN) [   18.827164] - User / Supervisor Page Protection
(XEN) [   18.881374] - Device Table Segmentation: 0x3
(XEN) [   18.934949] - PPR Log Overflow Early Warning
(XEN) [   18.988186] - PPR Automatic Response
(XEN) [   19.040193] - Memory Access Routing and Control: 0x1
(XEN) [   19.093770] - Block StopMark Message
(XEN) [   19.145300] - Performance Optimization
(XEN) [   19.196603] - MSI Capability MMIO Access
(XEN) [   19.247754] - Guest I/O Protection
(XEN) [   19.297811] - Host Access
(XEN) [   19.346396] - Enhanced PPR Handling
(XEN) [   19.395647] - Attribute Forward
(XEN) [   19.443986] - Virtualized IOMMU
(XEN) [   19.491828] - VMGuard I/O Support
(XEN) [   19.539404] - VM Table Size: 0x2
(XEN) [   19.589837] AMD-Vi: IOMMU Extended Features:
(XEN) [   19.637829] - Peripheral Page Service Request
(XEN) [   19.685607] - x2APIC
(XEN) [   19.730231] - NX bit
(XEN) [   19.774195] - Invalidate All Command
(XEN) [   19.819528] - Guest APIC
(XEN) [   19.863027] - Performance Counters
(XEN) [   19.907165] - Host Address Translation Size: 0x2
(XEN) [   19.952578] - Guest Address Translation Size: 0
(XEN) [   19.997588] - Guest CR3 Root Table Level: 0x1
(XEN) [   20.042057] - Maximum PASID: 0xf
(XEN) [   20.084732] - SMI Filter Register: 0x1
(XEN) [   20.127648] - SMI Filter Register Count: 0x2
(XEN) [   20.170910] - Guest Virtual APIC Modes: 0x1
(XEN) [   20.213702] - Dual PPR Log: 0x2
(XEN) [   20.254765] - Dual Event Log: 0x2
(XEN) [   20.295567] - User / Supervisor Page Protection
(XEN) [   20.337626] - Device Table Segmentation: 0x3
(XEN) [   20.379042] - PPR Log Overflow Early Warning
(XEN) [   20.420094] - PPR Automatic Response
(XEN) [   20.459890] - Memory Access Routing and Control: 0x1
(XEN) [   20.501248] - Block StopMark Message
(XEN) [   20.540577] - Performance Optimization
(XEN) [   20.579876] - MSI Capability MMIO Access
(XEN) [   20.619309] - Guest I/O Protection
(XEN) [   20.658041] - Host Access
(XEN) [   20.695937] - Enhanced PPR Handling
(XEN) [   20.735276] - Attribute Forward
(XEN) [   20.773830] - Virtualized IOMMU
(XEN) [   20.812335] - VMGuard I/O Support
(XEN) [   20.850868] - VM Table Size: 0x2
(XEN) [   20.892543] AMD-Vi: IOMMU Extended Features:
(XEN) [   20.932087] - Peripheral Page Service Request
(XEN) [   20.971756] - x2APIC
(XEN) [   21.008681] - NX bit
(XEN) [   21.045279] - Invalidate All Command
(XEN) [   21.083567] - Guest APIC
(XEN) [   21.120410] - Performance Counters
(XEN) [   21.158386] - Host Address Translation Size: 0x2
(XEN) [   21.197956] - Guest Address Translation Size: 0
(XEN) [   21.237433] - Guest CR3 Root Table Level: 0x1
(XEN) [   21.276709] - Maximum PASID: 0xf
(XEN) [   21.314538] - SMI Filter Register: 0x1
(XEN) [   21.352844] - SMI Filter Register Count: 0x2
(XEN) [   21.391728] - Guest Virtual APIC Modes: 0x1
(XEN) [   21.430614] - Dual PPR Log: 0x2
(XEN) [   21.468150] - Dual Event Log: 0x2
(XEN) [   21.505833] - User / Supervisor Page Protection
(XEN) [   21.545270] - Device Table Segmentation: 0x3
(XEN) [   21.584538] - PPR Log Overflow Early Warning
(XEN) [   21.623928] - PPR Automatic Response
(XEN) [   21.662564] - Memory Access Routing and Control: 0x1
(XEN) [   21.703268] - Block StopMark Message
(XEN) [   21.742415] - Performance Optimization
(XEN) [   21.781692] - MSI Capability MMIO Access
(XEN) [   21.821159] - Guest I/O Protection
(XEN) [   21.859916] - Host Access
(XEN) [   21.897811] - Enhanced PPR Handling
(XEN) [   21.936818] - Attribute Forward
(XEN) [   21.975387] - Virtualized IOMMU
(XEN) [   22.013898] - VMGuard I/O Support
(XEN) [   22.052448] - VM Table Size: 0x2
(XEN) [   22.094140] AMD-Vi: IOMMU Extended Features:
(XEN) [   22.133685] - Peripheral Page Service Request
(XEN) [   22.173363] - x2APIC
(XEN) [   22.210287] - NX bit
(XEN) [   22.246888] - Invalidate All Command
(XEN) [   22.285176] - Guest APIC
(XEN) [   22.322008] - Performance Counters
(XEN) [   22.359994] - Host Address Translation Size: 0x2
(XEN) [   22.399552] - Guest Address Translation Size: 0
(XEN) [   22.439028] - Guest CR3 Root Table Level: 0x1
(XEN) [   22.478307] - Maximum PASID: 0xf
(XEN) [   22.516133] - SMI Filter Register: 0x1
(XEN) [   22.554441] - SMI Filter Register Count: 0x2
(XEN) [   22.593345] - Guest Virtual APIC Modes: 0x1
(XEN) [   22.632221] - Dual PPR Log: 0x2
(XEN) [   22.669766] - Dual Event Log: 0x2
(XEN) [   22.707455] - User / Supervisor Page Protection
(XEN) [   22.746896] - Device Table Segmentation: 0x3
(XEN) [   22.786161] - PPR Log Overflow Early Warning
(XEN) [   22.825552] - PPR Automatic Response
(XEN) [   22.864211] - Memory Access Routing and Control: 0x1
(XEN) [   22.904917] - Block StopMark Message
(XEN) [   22.944075] - Performance Optimization
(XEN) [   22.983360] - MSI Capability MMIO Access
(XEN) [   23.022815] - Guest I/O Protection
(XEN) [   23.061548] - Host Access
(XEN) [   23.099437] - Enhanced PPR Handling
(XEN) [   23.138461] - Attribute Forward
(XEN) [   23.177008] - Virtualized IOMMU
(XEN) [   23.215523] - VMGuard I/O Support
(XEN) [   23.254043] - VM Table Size: 0x2
(XEN) [   23.295705] AMD-Vi: IOMMU Extended Features:
(XEN) [   23.335250] - Peripheral Page Service Request
(XEN) [   23.374941] - x2APIC
(XEN) [   23.411860] - NX bit
(XEN) [   23.448460] - Invalidate All Command
(XEN) [   23.486748] - Guest APIC
(XEN) [   23.523569] - Performance Counters
(XEN) [   23.561564] - Host Address Translation Size: 0x2
(XEN) [   23.601127] - Guest Address Translation Size: 0
(XEN) [   23.640584] - Guest CR3 Root Table Level: 0x1
(XEN) [   23.679885] - Maximum PASID: 0xf
(XEN) [   23.717689] - SMI Filter Register: 0x1
(XEN) [   23.755994] - SMI Filter Register Count: 0x2
(XEN) [   23.794897] - Guest Virtual APIC Modes: 0x1
(XEN) [   23.833766] - Dual PPR Log: 0x2
(XEN) [   23.871319] - Dual Event Log: 0x2
(XEN) [   23.909008] - User / Supervisor Page Protection
(XEN) [   23.948434] - Device Table Segmentation: 0x3
(XEN) [   23.987698] - PPR Log Overflow Early Warning
(XEN) [   24.027099] - PPR Automatic Response
(XEN) [   24.065748] - Memory Access Routing and Control: 0x1
(XEN) [   24.106438] - Block StopMark Message
(XEN) [   24.145591] - Performance Optimization
(XEN) [   24.184893] - MSI Capability MMIO Access
(XEN) [   24.224345] - Guest I/O Protection
(XEN) [   24.263076] - Host Access
(XEN) [   24.300974] - Enhanced PPR Handling
(XEN) [   24.339982] - Attribute Forward
(XEN) [   24.378528] - Virtualized IOMMU
(XEN) [   24.417041] - VMGuard I/O Support
(XEN) [   24.455584] - VM Table Size: 0x2
(XEN) [   24.497260] AMD-Vi: IOMMU Extended Features:
(XEN) [   24.536794] - Peripheral Page Service Request
(XEN) [   24.576469] - x2APIC
(XEN) [   24.613395] - NX bit
(XEN) [   24.649996] - Invalidate All Command
(XEN) [   24.688294] - Guest APIC
(XEN) [   24.725142] - Performance Counters
(XEN) [   24.763127] - Host Address Translation Size: 0x2
(XEN) [   24.802707] - Guest Address Translation Size: 0
(XEN) [   24.842172] - Guest CR3 Root Table Level: 0x1
(XEN) [   24.881459] - Maximum PASID: 0xf
(XEN) [   24.919288] - SMI Filter Register: 0x1
(XEN) [   24.957583] - SMI Filter Register Count: 0x2
(XEN) [   24.996496] - Guest Virtual APIC Modes: 0x1
(XEN) [   25.035364] - Dual PPR Log: 0x2
(XEN) [   25.072908] - Dual Event Log: 0x2
(XEN) [   25.110588] - User / Supervisor Page Protection
(XEN) [   25.150037] - Device Table Segmentation: 0x3
(XEN) [   25.189290] - PPR Log Overflow Early Warning
(XEN) [   25.228696] - PPR Automatic Response
(XEN) [   25.267348] - Memory Access Routing and Control: 0x1
(XEN) [   25.308045] - Block StopMark Message
(XEN) [   25.347199] - Performance Optimization
(XEN) [   25.386484] - MSI Capability MMIO Access
(XEN) [   25.425950] - Guest I/O Protection
(XEN) [   25.464680] - Host Access
(XEN) [   25.502580] - Enhanced PPR Handling
(XEN) [   25.541588] - Attribute Forward
(XEN) [   25.580145] - Virtualized IOMMU
(XEN) [   25.618647] - VMGuard I/O Support
(XEN) [   25.657196] - VM Table Size: 0x2
(XEN) [   25.698883] AMD-Vi: IOMMU Extended Features:
(XEN) [   25.738420] - Peripheral Page Service Request
(XEN) [   25.778084] - x2APIC
(XEN) [   25.815010] - NX bit
(XEN) [   25.851603] - Invalidate All Command
(XEN) [   25.889907] - Guest APIC
(XEN) [   25.926756] - Performance Counters
(XEN) [   25.964742] - Host Address Translation Size: 0x2
(XEN) [   26.004312] - Guest Address Translation Size: 0
(XEN) [   26.043787] - Guest CR3 Root Table Level: 0x1
(XEN) [   26.083064] - Maximum PASID: 0xf
(XEN) [   26.120903] - SMI Filter Register: 0x1
(XEN) [   26.159215] - SMI Filter Register Count: 0x2
(XEN) [   26.198129] - Guest Virtual APIC Modes: 0x1
(XEN) [   26.237015] - Dual PPR Log: 0x2
(XEN) [   26.274565] - Dual Event Log: 0x2
(XEN) [   26.312231] - User / Supervisor Page Protection
(XEN) [   26.351672] - Device Table Segmentation: 0x3
(XEN) [   26.390949] - PPR Log Overflow Early Warning
(XEN) [   26.430343] - PPR Automatic Response
(XEN) [   26.469006] - Memory Access Routing and Control: 0x1
(XEN) [   26.509711] - Block StopMark Message
(XEN) [   26.548867] - Performance Optimization
(XEN) [   26.588159] - MSI Capability MMIO Access
(XEN) [   26.627627] - Guest I/O Protection
(XEN) [   26.666355] - Host Access
(XEN) [   26.704253] - Enhanced PPR Handling
(XEN) [   26.743271] - Attribute Forward
(XEN) [   26.781828] - Virtualized IOMMU
(XEN) [   26.820341] - VMGuard I/O Support
(XEN) [   26.858874] - VM Table Size: 0x2
(XEN) [   26.900549] AMD-Vi: IOMMU Extended Features:
(XEN) [   26.940095] - Peripheral Page Service Request
(XEN) [   26.979769] - x2APIC
(XEN) [   27.016686] - NX bit
(XEN) [   27.053287] - Invalidate All Command
(XEN) [   27.091582] - Guest APIC
(XEN) [   27.128432] - Performance Counters
(XEN) [   27.166418] - Host Address Translation Size: 0x2
(XEN) [   27.205971] - Guest Address Translation Size: 0
(XEN) [   27.245440] - Guest CR3 Root Table Level: 0x1
(XEN) [   27.284714] - Maximum PASID: 0xf
(XEN) [   27.322543] - SMI Filter Register: 0x1
(XEN) [   27.360847] - SMI Filter Register Count: 0x2
(XEN) [   27.399768] - Guest Virtual APIC Modes: 0x1
(XEN) [   27.438656] - Dual PPR Log: 0x2
(XEN) [   27.476209] - Dual Event Log: 0x2
(XEN) [   27.513889] - User / Supervisor Page Protection
(XEN) [   27.553331] - Device Table Segmentation: 0x3
(XEN) [   27.592595] - PPR Log Overflow Early Warning
(XEN) [   27.632003] - PPR Automatic Response
(XEN) [   27.670655] - Memory Access Routing and Control: 0x1
(XEN) [   27.711351] - Block StopMark Message
(XEN) [   27.750525] - Performance Optimization
(XEN) [   27.789811] - MSI Capability MMIO Access
(XEN) [   27.829267] - Guest I/O Protection
(XEN) [   27.868005] - Host Access
(XEN) [   27.905889] - Enhanced PPR Handling
(XEN) [   27.944896] - Attribute Forward
(XEN) [   27.983445] - Virtualized IOMMU
(XEN) [   28.021956] - VMGuard I/O Support
(XEN) [   28.060494] - VM Table Size: 0x2
(XEN) [   28.284879] AMD-Vi: Disabled HAP memory map sharing with IOMMU
(XEN) [   28.326856] AMD-Vi: IOMMU 0 Enabled.
(XEN) [   28.366011] AMD-Vi: IOMMU 1 Enabled.
(XEN) [   28.405142] AMD-Vi: IOMMU 2 Enabled.
(XEN) [   28.444150] AMD-Vi: IOMMU 3 Enabled.
(XEN) [   28.483062] AMD-Vi: IOMMU 4 Enabled.
(XEN) [   28.521923] AMD-Vi: IOMMU 5 Enabled.
(XEN) [   28.560798] AMD-Vi: IOMMU 6 Enabled.
(XEN) [   28.599528] AMD-Vi: IOMMU 7 Enabled.
(XEN) [   28.645382] I/O virtualisation enabled
(XEN) [   28.684034]  - Dom0 mode: Relaxed
(XEN) [   28.722063] Interrupt remapping enabled

Given that the expected case is that all IOMMUs are identical, how about
only printing the details for IOMMU0, and eliding printing for further
IOMMUs which have an identical featureset?
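
A minimal sketch of that idea, not taken from the patch: compare each
unit's raw extended feature value against the first unit's and skip the
detailed listing when they match. struct iommu_sketch and
report_features() are hypothetical names used only for illustration; in
Xen the comparison would presumably be made on iommu->features.raw, and
printk() would be used in place of printf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct amd_iommu: only the raw EFR value. */
struct iommu_sketch {
    uint64_t features_raw;
};

/*
 * Print the feature details unless this unit's raw value matches the
 * first unit's.  Returns true when details were printed, false when the
 * output was elided.
 */
static bool report_features(const struct iommu_sketch *iommu,
                            const struct iommu_sketch *first)
{
    assert(iommu && first);

    /* The first unit is always reported; later identical ones are not. */
    if ( iommu != first && iommu->features_raw == first->features_raw )
        return false;

    printf("AMD-Vi: IOMMU Extended Features: %#llx\n",
           (unsigned long long)iommu->features_raw);
    /* The per-field FEAT() style output would follow here. */
    return true;
}
```

With eight identical units, as in the log above, only the first call
would produce output; a unit with a differing value would still be
reported in full.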

> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> @@ -346,26 +346,57 @@ struct amd_iommu_dte {
> -#define IOMMU_EXT_FEATURE_PASMAX_SHIFT                  0x0
> -#define IOMMU_EXT_FEATURE_PASMAX_MASK                   0x0000001F
> +union amd_iommu_ext_features {
> +    uint64_t raw;
> +    struct {
> +        unsigned int pref_sup:1;
> +        unsigned int ppr_sup:1;
> +        unsigned int xt_sup:1;
> +        unsigned int nx_sup:1;
> +        unsigned int gt_sup:1;
> +        unsigned int gappi_sup:1;
> +        unsigned int ia_sup:1;
> +        unsigned int ga_sup:1;
> +        unsigned int he_sup:1;
> +        unsigned int pc_sup:1;
> +        unsigned int hats:2;
> +        unsigned int gats:2;
> +        unsigned int glx_sup:2;
> +        unsigned int smif_sup:2;
> +        unsigned int smif_rc:3;
> +        unsigned int gam_sup:3;
> +        unsigned int dual_ppr_log_sup:2;
> +        unsigned int :2;
> +        unsigned int dual_event_log_sup:2;
> +        unsigned int sat_sup:1;
> +        unsigned int :1;
> +        unsigned int pas_max:5;
> +        unsigned int us_sup:1;
> +        unsigned int dev_tbl_seg_sup:2;
> +        unsigned int ppr_early_of_sup:1;
> +        unsigned int ppr_auto_rsp_sup:1;
> +        unsigned int marc_sup:2;
> +        unsigned int blk_stop_mrk_sup:1;
> +        unsigned int perf_opt_sup:1;
> +        unsigned int msi_cap_mmio_sup:1;
> +        unsigned int :1;
> +        unsigned int gio_sup:1;
> +        unsigned int ha_sup:1;
> +        unsigned int eph_sup:1;
> +        unsigned int attr_fw_sup:1;
> +        unsigned int hd_sup:1;
> +        unsigned int :1;
> +        unsigned int inv_iotlb_type_sup:1;
> +        unsigned int viommu_sup:1;
> +        unsigned int vm_guard_io_sup:1;
> +        unsigned int vm_table_size:4;
> +        unsigned int ga_update_dis_sup:1;
> +        unsigned int :2;
> +    } flds;
> +};

I'd suggest bool for single bitfields.  We've been bitten multiple times
by "x = (y & 0x80)" type bugs, which truncate to 0 using unsigned int
bitfields, but correctly become 1 given bool bitfields.
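
A small standalone illustration of that difference, assuming a C99 (or
later) compiler that accepts bool bit-fields. The struct and function
names are made up for the example and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

struct uint_flag { unsigned int f:1; };
struct bool_flag { bool f:1; };

/* Store (y & 0x80) into a 1-bit unsigned int field and read it back. */
static unsigned int store_uint(unsigned int y)
{
    struct uint_flag s = { 0 };

    s.f = y & 0x80;   /* reduced modulo 2: only bit 0 of the value survives */
    return s.f;
}

/* Store (y & 0x80) into a 1-bit bool field and read it back. */
static unsigned int store_bool(unsigned int y)
{
    struct bool_flag s = { 0 };

    s.f = y & 0x80;   /* conversion to bool: any nonzero value becomes 1 */
    return s.f;
}

static void demo(void)
{
    assert(store_uint(0x80) == 0);   /* the set bit is silently lost */
    assert(store_bool(0x80) == 1);   /* the set bit is preserved as 1 */
}
```

The unsigned int variant needs an explicit !!(y & 0x80) at every
assignment to behave the same way, which is easy to forget.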

~Andrew


* Re: [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register
  2019-06-17 20:23   ` Andrew Cooper
@ 2019-06-18  9:33     ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-18  9:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

>>> On 17.06.19 at 22:23, <andrew.cooper3@citrix.com> wrote:
> On 13/06/2019 14:22, Jan Beulich wrote:
>> This also takes care of several of the shift values wrongly having been
>> specified as hex rather than dec.
>>
>> Take the opportunity and add further fields.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/drivers/passthrough/amd/iommu_detect.c
>> +++ b/xen/drivers/passthrough/amd/iommu_detect.c
>> @@ -60,43 +60,72 @@ static int __init get_iommu_capabilities
>>  
>>  void __init get_iommu_features(struct amd_iommu *iommu)
>>  {
>> -    u32 low, high;
>> -    int i = 0 ;
>> -    static const char *__initdata feature_str[] = {
>> -        "- Prefetch Pages Command", 
>> -        "- Peripheral Page Service Request", 
>> -        "- X2APIC Supported", 
>> -        "- NX bit Supported", 
>> -        "- Guest Translation", 
>> -        "- Reserved bit [5]",
>> -        "- Invalidate All Command", 
>> -        "- Guest APIC supported", 
>> -        "- Hardware Error Registers", 
>> -        "- Performance Counters", 
>> -        NULL
>> -    };
>> -
>>      ASSERT( iommu->mmio_base );
>>  
>>      if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
>>      {
>> -        iommu->features = 0;
>> +        iommu->features.raw = 0;
>>          return;
>>      }
>>  
>> -    low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
>> -    high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
>> -
>> -    iommu->features = ((u64)high << 32) | low;
>> +    iommu->features.raw =
>> +        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
>>  
>>      printk("AMD-Vi: IOMMU Extended Features:\n");
>>  
>> -    while ( feature_str[i] )
>> +#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
>> +#define FEAT(fld, str) do { \
>> +    if ( MASK(fld) & (MASK(fld) - 1) ) \
>> +        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
>> +    else if ( iommu->features.raw & MASK(fld) ) \
>> +        printk( "- " str "\n"); \
>> +} while ( false )
>> +
>> +    FEAT(pref_sup,           "Prefetch Pages Command");
>> +    FEAT(ppr_sup,            "Peripheral Page Service Request");
>> +    FEAT(xt_sup,             "x2APIC");
>> +    FEAT(nx_sup,             "NX bit");
>> +    FEAT(gappi_sup,          "Guest APIC Physical Processor Interrupt");
>> +    FEAT(ia_sup,             "Invalidate All Command");
>> +    FEAT(ga_sup,             "Guest APIC");
>> +    FEAT(he_sup,             "Hardware Error Registers");
>> +    FEAT(pc_sup,             "Performance Counters");
>> +    FEAT(hats,               "Host Address Translation Size");
>> +
>> +    if ( iommu->features.flds.gt_sup )
>>      {
>> -        if ( amd_iommu_has_feature(iommu, i) )
>> -            printk( " %s\n", feature_str[i]);
>> -        i++;
>> +        FEAT(gats,           "Guest Address Translation Size");
>> +        FEAT(glx_sup,        "Guest CR3 Root Table Level");
>> +        FEAT(pas_max,        "Maximum PASID");
>>      }
>> +
>> +    FEAT(smif_sup,           "SMI Filter Register");
>> +    FEAT(smif_rc,            "SMI Filter Register Count");
>> +    FEAT(gam_sup,            "Guest Virtual APIC Modes");
>> +    FEAT(dual_ppr_log_sup,   "Dual PPR Log");
>> +    FEAT(dual_event_log_sup, "Dual Event Log");
>> +    FEAT(sat_sup,            "Secure ATS");
>> +    FEAT(us_sup,             "User / Supervisor Page Protection");
>> +    FEAT(dev_tbl_seg_sup,    "Device Table Segmentation");
>> +    FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
>> +    FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
>> +    FEAT(marc_sup,           "Memory Access Routing and Control");
>> +    FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
>> +    FEAT(perf_opt_sup ,      "Performance Optimization");
>> +    FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
>> +    FEAT(gio_sup,            "Guest I/O Protection");
>> +    FEAT(ha_sup,             "Host Access");
>> +    FEAT(eph_sup,            "Enhanced PPR Handling");
>> +    FEAT(attr_fw_sup,        "Attribute Forward");
>> +    FEAT(hd_sup,             "Host Dirty");
>> +    FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
>> +    FEAT(viommu_sup,         "Virtualized IOMMU");
>> +    FEAT(vm_guard_io_sup,    "VMGuard I/O Support");
>> +    FEAT(vm_table_size,      "VM Table Size");
>> +    FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
>> +
>> +#undef FEAT
>> +#undef MASK
>>  }
> 
> So this is fine, but does come with a downside.  This is the log from a
> Rome system:
> 
> (XEN) [   17.928225] AMD-Vi: IOMMU Extended Features:
> (XEN) [   17.988264] - Peripheral Page Service Request
> (XEN) [   18.048082] - x2APIC
> (XEN) [   18.104735] - NX bit
> (XEN) [   18.160751] - Invalidate All Command
> (XEN) [   18.218116] - Guest APIC
> (XEN) [   18.273660] - Performance Counters
> (XEN) [   18.329868] - Host Address Translation Size: 0x2
> (XEN) [   18.387363] - Guest Address Translation Size: 0
> (XEN) [   18.444446] - Guest CR3 Root Table Level: 0x1
> (XEN) [   18.501006] - Maximum PASID: 0xf
> (XEN) [   18.555753] - SMI Filter Register: 0x1
> (XEN) [   18.610773] - SMI Filter Register Count: 0x2
> (XEN) [   18.666116] - Guest Virtual APIC Modes: 0x1
> (XEN) [   18.721036] - Dual PPR Log: 0x2
> (XEN) [   18.774237] - Dual Event Log: 0x2
> (XEN) [   18.827164] - User / Supervisor Page Protection
> (XEN) [   18.881374] - Device Table Segmentation: 0x3
> (XEN) [   18.934949] - PPR Log Overflow Early Warning
> (XEN) [   18.988186] - PPR Automatic Response
> (XEN) [   19.040193] - Memory Access Routing and Control: 0x1
> (XEN) [   19.093770] - Block StopMark Message
> (XEN) [   19.145300] - Performance Optimization
> (XEN) [   19.196603] - MSI Capability MMIO Access
> (XEN) [   19.247754] - Guest I/O Protection
> (XEN) [   19.297811] - Host Access
> (XEN) [   19.346396] - Enhanced PPR Handling
> (XEN) [   19.395647] - Attribute Forward
> (XEN) [   19.443986] - Virtualized IOMMU
> (XEN) [   19.491828] - VMGuard I/O Support
> (XEN) [   19.539404] - VM Table Size: 0x2
> (XEN) [   19.589837] AMD-Vi: IOMMU Extended Features:
> (XEN) [   19.637829] - Peripheral Page Service Request
> (XEN) [   19.685607] - x2APIC
> (XEN) [   19.730231] - NX bit
> (XEN) [   19.774195] - Invalidate All Command
> (XEN) [   19.819528] - Guest APIC
> (XEN) [   19.863027] - Performance Counters
> (XEN) [   19.907165] - Host Address Translation Size: 0x2
> (XEN) [   19.952578] - Guest Address Translation Size: 0
> (XEN) [   19.997588] - Guest CR3 Root Table Level: 0x1
> (XEN) [   20.042057] - Maximum PASID: 0xf
> (XEN) [   20.084732] - SMI Filter Register: 0x1
> (XEN) [   20.127648] - SMI Filter Register Count: 0x2
> (XEN) [   20.170910] - Guest Virtual APIC Modes: 0x1
> (XEN) [   20.213702] - Dual PPR Log: 0x2
> (XEN) [   20.254765] - Dual Event Log: 0x2
> (XEN) [   20.295567] - User / Supervisor Page Protection
> (XEN) [   20.337626] - Device Table Segmentation: 0x3
> (XEN) [   20.379042] - PPR Log Overflow Early Warning
> (XEN) [   20.420094] - PPR Automatic Response
> (XEN) [   20.459890] - Memory Access Routing and Control: 0x1
> (XEN) [   20.501248] - Block StopMark Message
> (XEN) [   20.540577] - Performance Optimization
> (XEN) [   20.579876] - MSI Capability MMIO Access
> (XEN) [   20.619309] - Guest I/O Protection
> (XEN) [   20.658041] - Host Access
> (XEN) [   20.695937] - Enhanced PPR Handling
> (XEN) [   20.735276] - Attribute Forward
> (XEN) [   20.773830] - Virtualized IOMMU
> (XEN) [   20.812335] - VMGuard I/O Support
> (XEN) [   20.850868] - VM Table Size: 0x2
> (XEN) [   20.892543] AMD-Vi: IOMMU Extended Features:
> (XEN) [   20.932087] - Peripheral Page Service Request
> (XEN) [   20.971756] - x2APIC
> (XEN) [   21.008681] - NX bit
> (XEN) [   21.045279] - Invalidate All Command
> (XEN) [   21.083567] - Guest APIC
> (XEN) [   21.120410] - Performance Counters
> (XEN) [   21.158386] - Host Address Translation Size: 0x2
> (XEN) [   21.197956] - Guest Address Translation Size: 0
> (XEN) [   21.237433] - Guest CR3 Root Table Level: 0x1
> (XEN) [   21.276709] - Maximum PASID: 0xf
> (XEN) [   21.314538] - SMI Filter Register: 0x1
> (XEN) [   21.352844] - SMI Filter Register Count: 0x2
> (XEN) [   21.391728] - Guest Virtual APIC Modes: 0x1
> (XEN) [   21.430614] - Dual PPR Log: 0x2
> (XEN) [   21.468150] - Dual Event Log: 0x2
> (XEN) [   21.505833] - User / Supervisor Page Protection
> (XEN) [   21.545270] - Device Table Segmentation: 0x3
> (XEN) [   21.584538] - PPR Log Overflow Early Warning
> (XEN) [   21.623928] - PPR Automatic Response
> (XEN) [   21.662564] - Memory Access Routing and Control: 0x1
> (XEN) [   21.703268] - Block StopMark Message
> (XEN) [   21.742415] - Performance Optimization
> (XEN) [   21.781692] - MSI Capability MMIO Access
> (XEN) [   21.821159] - Guest I/O Protection
> (XEN) [   21.859916] - Host Access
> (XEN) [   21.897811] - Enhanced PPR Handling
> (XEN) [   21.936818] - Attribute Forward
> (XEN) [   21.975387] - Virtualized IOMMU
> (XEN) [   22.013898] - VMGuard I/O Support
> (XEN) [   22.052448] - VM Table Size: 0x2
> (XEN) [   22.094140] AMD-Vi: IOMMU Extended Features:
> (XEN) [   22.133685] - Peripheral Page Service Request
> (XEN) [   22.173363] - x2APIC
> (XEN) [   22.210287] - NX bit
> (XEN) [   22.246888] - Invalidate All Command
> (XEN) [   22.285176] - Guest APIC
> (XEN) [   22.322008] - Performance Counters
> (XEN) [   22.359994] - Host Address Translation Size: 0x2
> (XEN) [   22.399552] - Guest Address Translation Size: 0
> (XEN) [   22.439028] - Guest CR3 Root Table Level: 0x1
> (XEN) [   22.478307] - Maximum PASID: 0xf
> (XEN) [   22.516133] - SMI Filter Register: 0x1
> (XEN) [   22.554441] - SMI Filter Register Count: 0x2
> (XEN) [   22.593345] - Guest Virtual APIC Modes: 0x1
> (XEN) [   22.632221] - Dual PPR Log: 0x2
> (XEN) [   22.669766] - Dual Event Log: 0x2
> (XEN) [   22.707455] - User / Supervisor Page Protection
> (XEN) [   22.746896] - Device Table Segmentation: 0x3
> (XEN) [   22.786161] - PPR Log Overflow Early Warning
> (XEN) [   22.825552] - PPR Automatic Response
> (XEN) [   22.864211] - Memory Access Routing and Control: 0x1
> (XEN) [   22.904917] - Block StopMark Message
> (XEN) [   22.944075] - Performance Optimization
> (XEN) [   22.983360] - MSI Capability MMIO Access
> (XEN) [   23.022815] - Guest I/O Protection
> (XEN) [   23.061548] - Host Access
> (XEN) [   23.099437] - Enhanced PPR Handling
> (XEN) [   23.138461] - Attribute Forward
> (XEN) [   23.177008] - Virtualized IOMMU
> (XEN) [   23.215523] - VMGuard I/O Support
> (XEN) [   23.254043] - VM Table Size: 0x2
> (XEN) [   23.295705] AMD-Vi: IOMMU Extended Features:
> (XEN) [   23.335250] - Peripheral Page Service Request
> (XEN) [   23.374941] - x2APIC
> (XEN) [   23.411860] - NX bit
> (XEN) [   23.448460] - Invalidate All Command
> (XEN) [   23.486748] - Guest APIC
> (XEN) [   23.523569] - Performance Counters
> (XEN) [   23.561564] - Host Address Translation Size: 0x2
> (XEN) [   23.601127] - Guest Address Translation Size: 0
> (XEN) [   23.640584] - Guest CR3 Root Table Level: 0x1
> (XEN) [   23.679885] - Maximum PASID: 0xf
> (XEN) [   23.717689] - SMI Filter Register: 0x1
> (XEN) [   23.755994] - SMI Filter Register Count: 0x2
> (XEN) [   23.794897] - Guest Virtual APIC Modes: 0x1
> (XEN) [   23.833766] - Dual PPR Log: 0x2
> (XEN) [   23.871319] - Dual Event Log: 0x2
> (XEN) [   23.909008] - User / Supervisor Page Protection
> (XEN) [   23.948434] - Device Table Segmentation: 0x3
> (XEN) [   23.987698] - PPR Log Overflow Early Warning
> (XEN) [   24.027099] - PPR Automatic Response
> (XEN) [   24.065748] - Memory Access Routing and Control: 0x1
> (XEN) [   24.106438] - Block StopMark Message
> (XEN) [   24.145591] - Performance Optimization
> (XEN) [   24.184893] - MSI Capability MMIO Access
> (XEN) [   24.224345] - Guest I/O Protection
> (XEN) [   24.263076] - Host Access
> (XEN) [   24.300974] - Enhanced PPR Handling
> (XEN) [   24.339982] - Attribute Forward
> (XEN) [   24.378528] - Virtualized IOMMU
> (XEN) [   24.417041] - VMGuard I/O Support
> (XEN) [   24.455584] - VM Table Size: 0x2
> (XEN) [   24.497260] AMD-Vi: IOMMU Extended Features:
> (XEN) [   24.536794] - Peripheral Page Service Request
> (XEN) [   24.576469] - x2APIC
> (XEN) [   24.613395] - NX bit
> (XEN) [   24.649996] - Invalidate All Command
> (XEN) [   24.688294] - Guest APIC
> (XEN) [   24.725142] - Performance Counters
> (XEN) [   24.763127] - Host Address Translation Size: 0x2
> (XEN) [   24.802707] - Guest Address Translation Size: 0
> (XEN) [   24.842172] - Guest CR3 Root Table Level: 0x1
> (XEN) [   24.881459] - Maximum PASID: 0xf
> (XEN) [   24.919288] - SMI Filter Register: 0x1
> (XEN) [   24.957583] - SMI Filter Register Count: 0x2
> (XEN) [   24.996496] - Guest Virtual APIC Modes: 0x1
> (XEN) [   25.035364] - Dual PPR Log: 0x2
> (XEN) [   25.072908] - Dual Event Log: 0x2
> (XEN) [   25.110588] - User / Supervisor Page Protection
> (XEN) [   25.150037] - Device Table Segmentation: 0x3
> (XEN) [   25.189290] - PPR Log Overflow Early Warning
> (XEN) [   25.228696] - PPR Automatic Response
> (XEN) [   25.267348] - Memory Access Routing and Control: 0x1
> (XEN) [   25.308045] - Block StopMark Message
> (XEN) [   25.347199] - Performance Optimization
> (XEN) [   25.386484] - MSI Capability MMIO Access
> (XEN) [   25.425950] - Guest I/O Protection
> (XEN) [   25.464680] - Host Access
> (XEN) [   25.502580] - Enhanced PPR Handling
> (XEN) [   25.541588] - Attribute Forward
> (XEN) [   25.580145] - Virtualized IOMMU
> (XEN) [   25.618647] - VMGuard I/O Support
> (XEN) [   25.657196] - VM Table Size: 0x2
> (XEN) [   25.698883] AMD-Vi: IOMMU Extended Features:
> (XEN) [   25.738420] - Peripheral Page Service Request
> (XEN) [   25.778084] - x2APIC
> (XEN) [   25.815010] - NX bit
> (XEN) [   25.851603] - Invalidate All Command
> (XEN) [   25.889907] - Guest APIC
> (XEN) [   25.926756] - Performance Counters
> (XEN) [   25.964742] - Host Address Translation Size: 0x2
> (XEN) [   26.004312] - Guest Address Translation Size: 0
> (XEN) [   26.043787] - Guest CR3 Root Table Level: 0x1
> (XEN) [   26.083064] - Maximum PASID: 0xf
> (XEN) [   26.120903] - SMI Filter Register: 0x1
> (XEN) [   26.159215] - SMI Filter Register Count: 0x2
> (XEN) [   26.198129] - Guest Virtual APIC Modes: 0x1
> (XEN) [   26.237015] - Dual PPR Log: 0x2
> (XEN) [   26.274565] - Dual Event Log: 0x2
> (XEN) [   26.312231] - User / Supervisor Page Protection
> (XEN) [   26.351672] - Device Table Segmentation: 0x3
> (XEN) [   26.390949] - PPR Log Overflow Early Warning
> (XEN) [   26.430343] - PPR Automatic Response
> (XEN) [   26.469006] - Memory Access Routing and Control: 0x1
> (XEN) [   26.509711] - Block StopMark Message
> (XEN) [   26.548867] - Performance Optimization
> (XEN) [   26.588159] - MSI Capability MMIO Access
> (XEN) [   26.627627] - Guest I/O Protection
> (XEN) [   26.666355] - Host Access
> (XEN) [   26.704253] - Enhanced PPR Handling
> (XEN) [   26.743271] - Attribute Forward
> (XEN) [   26.781828] - Virtualized IOMMU
> (XEN) [   26.820341] - VMGuard I/O Support
> (XEN) [   26.858874] - VM Table Size: 0x2
> (XEN) [   26.900549] AMD-Vi: IOMMU Extended Features:
> (XEN) [   26.940095] - Peripheral Page Service Request
> (XEN) [   26.979769] - x2APIC
> (XEN) [   27.016686] - NX bit
> (XEN) [   27.053287] - Invalidate All Command
> (XEN) [   27.091582] - Guest APIC
> (XEN) [   27.128432] - Performance Counters
> (XEN) [   27.166418] - Host Address Translation Size: 0x2
> (XEN) [   27.205971] - Guest Address Translation Size: 0
> (XEN) [   27.245440] - Guest CR3 Root Table Level: 0x1
> (XEN) [   27.284714] - Maximum PASID: 0xf
> (XEN) [   27.322543] - SMI Filter Register: 0x1
> (XEN) [   27.360847] - SMI Filter Register Count: 0x2
> (XEN) [   27.399768] - Guest Virtual APIC Modes: 0x1
> (XEN) [   27.438656] - Dual PPR Log: 0x2
> (XEN) [   27.476209] - Dual Event Log: 0x2
> (XEN) [   27.513889] - User / Supervisor Page Protection
> (XEN) [   27.553331] - Device Table Segmentation: 0x3
> (XEN) [   27.592595] - PPR Log Overflow Early Warning
> (XEN) [   27.632003] - PPR Automatic Response
> (XEN) [   27.670655] - Memory Access Routing and Control: 0x1
> (XEN) [   27.711351] - Block StopMark Message
> (XEN) [   27.750525] - Performance Optimization
> (XEN) [   27.789811] - MSI Capability MMIO Access
> (XEN) [   27.829267] - Guest I/O Protection
> (XEN) [   27.868005] - Host Access
> (XEN) [   27.905889] - Enhanced PPR Handling
> (XEN) [   27.944896] - Attribute Forward
> (XEN) [   27.983445] - Virtualized IOMMU
> (XEN) [   28.021956] - VMGuard I/O Support
> (XEN) [   28.060494] - VM Table Size: 0x2
> (XEN) [   28.284879] AMD-Vi: Disabled HAP memory map sharing with IOMMU
> (XEN) [   28.326856] AMD-Vi: IOMMU 0 Enabled.
> (XEN) [   28.366011] AMD-Vi: IOMMU 1 Enabled.
> (XEN) [   28.405142] AMD-Vi: IOMMU 2 Enabled.
> (XEN) [   28.444150] AMD-Vi: IOMMU 3 Enabled.
> (XEN) [   28.483062] AMD-Vi: IOMMU 4 Enabled.
> (XEN) [   28.521923] AMD-Vi: IOMMU 5 Enabled.
> (XEN) [   28.560798] AMD-Vi: IOMMU 6 Enabled.
> (XEN) [   28.599528] AMD-Vi: IOMMU 7 Enabled.
> (XEN) [   28.645382] I/O virtualisation enabled
> (XEN) [   28.684034]  - Dom0 mode: Relaxed
> (XEN) [   28.722063] Interrupt remapping enabled
> 
> Given that the expected case is that all IOMMUs are identical, how about
> only printing the details for IOMMU0, and eliding printing for further
> IOMMUs which have an identical featureset?

Obviously I did notice this too. I can add a patch to the series to
improve the situation (perhaps even ahead of this one), but I don't
think I want to do this right here.
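
For reference, the eliding rule suggested above could be sketched roughly
as below. This is only an illustration, not code from the series: struct
iommu_sketch, features_raw and should_print_features() are made-up names
standing in for struct amd_iommu and its features.raw field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct amd_iommu; only the raw feature word. */
struct iommu_sketch {
    uint64_t features_raw;
};

/*
 * Print the full feature list for the first IOMMU, and for any later one
 * whose raw feature word differs from the first one's.
 */
static bool should_print_features(const struct iommu_sketch *iommus,
                                  size_t idx)
{
    assert(iommus != NULL);

    return idx == 0 || iommus[idx].features_raw != iommus[0].features_raw;
}
```

With eight identical units, as in the Rome log above, only the first one
would get the full listing under this rule.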

>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> @@ -346,26 +346,57 @@ struct amd_iommu_dte {
>> -#define IOMMU_EXT_FEATURE_PASMAX_SHIFT                  0x0
>> -#define IOMMU_EXT_FEATURE_PASMAX_MASK                   0x0000001F
>> +union amd_iommu_ext_features {
>> +    uint64_t raw;
>> +    struct {
>> +        unsigned int pref_sup:1;
>> +        unsigned int ppr_sup:1;
>> +        unsigned int xt_sup:1;
>> +        unsigned int nx_sup:1;
>> +        unsigned int gt_sup:1;
>> +        unsigned int gappi_sup:1;
>> +        unsigned int ia_sup:1;
>> +        unsigned int ga_sup:1;
>> +        unsigned int he_sup:1;
>> +        unsigned int pc_sup:1;
>> +        unsigned int hats:2;
>> +        unsigned int gats:2;
>> +        unsigned int glx_sup:2;
>> +        unsigned int smif_sup:2;
>> +        unsigned int smif_rc:3;
>> +        unsigned int gam_sup:3;
>> +        unsigned int dual_ppr_log_sup:2;
>> +        unsigned int :2;
>> +        unsigned int dual_event_log_sup:2;
>> +        unsigned int sat_sup:1;
>> +        unsigned int :1;
>> +        unsigned int pas_max:5;
>> +        unsigned int us_sup:1;
>> +        unsigned int dev_tbl_seg_sup:2;
>> +        unsigned int ppr_early_of_sup:1;
>> +        unsigned int ppr_auto_rsp_sup:1;
>> +        unsigned int marc_sup:2;
>> +        unsigned int blk_stop_mrk_sup:1;
>> +        unsigned int perf_opt_sup:1;
>> +        unsigned int msi_cap_mmio_sup:1;
>> +        unsigned int :1;
>> +        unsigned int gio_sup:1;
>> +        unsigned int ha_sup:1;
>> +        unsigned int eph_sup:1;
>> +        unsigned int attr_fw_sup:1;
>> +        unsigned int hd_sup:1;
>> +        unsigned int :1;
>> +        unsigned int inv_iotlb_type_sup:1;
>> +        unsigned int viommu_sup:1;
>> +        unsigned int vm_guard_io_sup:1;
>> +        unsigned int vm_table_size:4;
>> +        unsigned int ga_update_dis_sup:1;
>> +        unsigned int :2;
>> +    } flds;
>> +};
> 
> I'd suggest bool for single bitfields.  We've been bitten multiple times
> by "x = (y & 0x80)" type bugs, which truncate to 0 using unsigned int
> bitfields, but correctly become 1 given bool bitfields.

Oh, it took me a while to figure out what you mean - you're after the case
of x (in your example), not y, being a bitfield reference. I don't think
we're at risk of introducing such constructs here, so personally I'd
prefer it to stay as it is, but I'd listen to Brian and/or Suravee leaning
more towards your suggestion.
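
To spell out the case under discussion, here is a small standalone sketch.
The struct and function names are invented for illustration and do not
appear in the patch; it only shows how a masked value behaves when stored
into a 1-bit field of either type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag holders, for illustration only (not Xen types). */
struct flags_uint { unsigned int x:1; };
struct flags_bool { bool x:1; };

/* A 1-bit unsigned int field keeps only bit 0 of the assigned value. */
static unsigned int store_uint(unsigned int y)
{
    struct flags_uint f = { 0 };

    f.x = y & 0x80;   /* 0x80 has bit 0 clear, so the field ends up 0 */
    return f.x;
}

/* A bool field first converts any non-zero value to 1. */
static unsigned int store_bool(unsigned int y)
{
    struct flags_bool f = { 0 };

    f.x = y & 0x80;   /* non-zero becomes true, so the field ends up 1 */
    return f.x;
}

/* Self-check of the two behaviours described above. */
static void demo_truncation(void)
{
    assert(store_uint(0x80) == 0);
    assert(store_bool(0x80) == 1);
}
```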

Jan



* Re: [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register
  2019-06-17 19:07   ` Woods, Brian
@ 2019-06-18  9:37     ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-18  9:37 UTC (permalink / raw)
  To: Brian Woods; +Cc: Andrew Cooper, Suravee Suthikulpanit, xen-devel

>>> On 17.06.19 at 21:07, <Brian.Woods@amd.com> wrote:
> On Thu, Jun 13, 2019 at 07:22:31AM -0600, Jan Beulich wrote:
>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> @@ -346,26 +346,57 @@ struct amd_iommu_dte {
>>  #define IOMMU_EXCLUSION_LIMIT_HIGH_MASK		0xFFFFFFFF
>>  #define IOMMU_EXCLUSION_LIMIT_HIGH_SHIFT	0
>>  
>> -/* Extended Feature Register*/
>> +/* Extended Feature Register */
>>  #define IOMMU_EXT_FEATURE_MMIO_OFFSET                   0x30
>> -#define IOMMU_EXT_FEATURE_PREFSUP_SHIFT                 0x0
>> -#define IOMMU_EXT_FEATURE_PPRSUP_SHIFT                  0x1
>> -#define IOMMU_EXT_FEATURE_XTSUP_SHIFT                   0x2
>> -#define IOMMU_EXT_FEATURE_NXSUP_SHIFT                   0x3
>> -#define IOMMU_EXT_FEATURE_GTSUP_SHIFT                   0x4
>> -#define IOMMU_EXT_FEATURE_IASUP_SHIFT                   0x6
>> -#define IOMMU_EXT_FEATURE_GASUP_SHIFT                   0x7
>> -#define IOMMU_EXT_FEATURE_HESUP_SHIFT                   0x8
>> -#define IOMMU_EXT_FEATURE_PCSUP_SHIFT                   0x9
>> -#define IOMMU_EXT_FEATURE_HATS_SHIFT                    0x10
>> -#define IOMMU_EXT_FEATURE_HATS_MASK                     0x00000C00
>> -#define IOMMU_EXT_FEATURE_GATS_SHIFT                    0x12
>> -#define IOMMU_EXT_FEATURE_GATS_MASK                     0x00003000
>> -#define IOMMU_EXT_FEATURE_GLXSUP_SHIFT                  0x14
>> -#define IOMMU_EXT_FEATURE_GLXSUP_MASK                   0x0000C000
>>  
>> -#define IOMMU_EXT_FEATURE_PASMAX_SHIFT                  0x0
>> -#define IOMMU_EXT_FEATURE_PASMAX_MASK                   0x0000001F
>> +union amd_iommu_ext_features {
>> +    uint64_t raw;
>> +    struct {
>> +        unsigned int pref_sup:1;
>> +        unsigned int ppr_sup:1;
>> +        unsigned int xt_sup:1;
>> +        unsigned int nx_sup:1;
>> +        unsigned int gt_sup:1;
>> +        unsigned int gappi_sup:1;
>> +        unsigned int ia_sup:1;
>> +        unsigned int ga_sup:1;
>> +        unsigned int he_sup:1;
>> +        unsigned int pc_sup:1;
>> +        unsigned int hats:2;
>> +        unsigned int gats:2;
>> +        unsigned int glx_sup:2;
>> +        unsigned int smif_sup:2;
>> +        unsigned int smif_rc:3;
>> +        unsigned int gam_sup:3;
>> +        unsigned int dual_ppr_log_sup:2;
>> +        unsigned int :2;
>> +        unsigned int dual_event_log_sup:2;
> 
>> +        unsigned int sat_sup:1;
>> +        unsigned int :1;
> I think these might be flipped.

Oh, indeed. And I've also omitted an 's' from the name. Thanks for
noticing.
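
A layout slip of this kind can be caught mechanically by looking at which
raw bits a field covers, using the same compound-literal trick as the
MASK() macro in the patch. The sketch below uses an invented miniature
union (feat_sketch and its field names are not from the patch) and assumes
the usual little-endian GCC/Clang bit-field allocation, lowest bit first.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up miniature register: bit 0, a reserved bit, then a 2-bit field. */
union feat_sketch {
    uint64_t raw;
    struct {
        unsigned int one_bit:1;
        unsigned int rsvd0:1;
        unsigned int two_bits:2;
        unsigned int rsvd1:28;
        unsigned int rsvd2:32;
    } flds;
};

/* Raw-register mask covered by a field (same idea as MASK() in the patch). */
#define MASK(fld) ((union feat_sketch){ .flds.fld = ~0 }).raw

static uint64_t mask_one_bit(void)  { return MASK(one_bit); }
static uint64_t mask_two_bits(void) { return MASK(two_bits); }

/* A mask with more than one bit set denotes a multi-bit field. */
static int is_multi_bit(uint64_t m)
{
    return (m & (m - 1)) != 0;
}

/* Layout check: fails if a field and its neighbouring padding get swapped. */
static void check_layout(void)
{
    assert(mask_one_bit() == 0x1);
    assert(mask_two_bits() == 0xc);
}
```

The is_multi_bit() test is the one FEAT() uses to decide between printing
a value and printing just the feature name.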

Jan




* Re: [Xen-devel] [PATCH 2/9] AMD/IOMMU: use bit field for control register
  2019-06-13 13:22 ` [Xen-devel] [PATCH 2/9] AMD/IOMMU: use bit field for control register Jan Beulich
@ 2019-06-18  9:54   ` Andrew Cooper
  2019-06-18 10:45     ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18  9:54 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:22, Jan Beulich wrote:
> Also introduce a field in struct amd_iommu caching the most recently
> written control register. All writes should now happen exclusively from
> that cached value, such that it is guaranteed to be up to date.
>
> Take the opportunity and add further fields. Also convert a few boolean
> function parameters to bool, such that use of !! can be avoided.

Critically also, some previous writel()'s have turned into writeq(),
which needs calling out.

> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -69,31 +69,18 @@ static void __init unmap_iommu_mmio_regi
>  
>  static void set_iommu_ht_flags(struct amd_iommu *iommu)
>  {
> -    u32 entry;
> -    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
> -
>      /* Setup HT flags */
>      if ( iommu_has_cap(iommu, PCI_CAP_HT_TUNNEL_SHIFT) )
> -        iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE) ?
> -            iommu_set_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT) :
> -            iommu_clear_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT);
> -
> -    iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW) ?
> -        iommu_set_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT):
> -        iommu_clear_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT);
> -
> -    iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC) ?
> -        iommu_set_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT):
> -        iommu_clear_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT);
> -
> -    iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW) ?
> -        iommu_set_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT):
> -        iommu_clear_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT);
> +        iommu->ctrl.ht_tun_en = iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE);
> +
> +    iommu->ctrl.pass_pw     = iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW);
> +    iommu->ctrl.res_pass_pw = iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW);
> +    iommu->ctrl.isoc        = iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC);
>  
>      /* Force coherent */
> -    iommu_set_bit(&entry, IOMMU_CONTROL_COHERENT_SHIFT);
> +    iommu->ctrl.coherent = 1;

Ah - so this is the AMD version of Intel's iommu=snoop

> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> @@ -295,38 +295,55 @@ struct amd_iommu_dte {
>
> +union amd_iommu_control {
> +    uint64_t raw;
> +    struct {
> +        unsigned int iommu_en:1;
> +        unsigned int ht_tun_en:1;
> +        unsigned int event_log_en:1;
> +        unsigned int event_int_en:1;
> +        unsigned int com_wait_int_en:1;
> +        unsigned int inv_timeout:3;
> +        unsigned int pass_pw:1;
> +        unsigned int res_pass_pw:1;
> +        unsigned int coherent:1;
> +        unsigned int isoc:1;
> +        unsigned int cmd_buf_en:1;
> +        unsigned int ppr_log_en:1;
> +        unsigned int ppr_int_en:1;
> +        unsigned int ppr_en:1;
> +        unsigned int gt_en:1;
> +        unsigned int ga_en:1;
> +        unsigned int crw:4;

This field does have an assigned name, but is also documented as Res0
for forwards compatibility.  I think this field wants handling
consistently with...

> +        unsigned int smif_en:1;
> +        unsigned int slf_wb_dis:1;
> +        unsigned int smif_log_en:1;
> +        unsigned int gam_en:3;
> +        unsigned int ga_log_en:1;
> +        unsigned int ga_int_en:1;
> +        unsigned int dual_ppr_log_en:2;
> +        unsigned int dual_event_log_en:2;
> +        unsigned int dev_tbl_seg_en:3;
> +        unsigned int priv_abrt_en:2;
> +        unsigned int ppr_auto_rsp_en:1;
> +        unsigned int marc_en:1;
> +        unsigned int blk_stop_mrk_en:1;
> +        unsigned int ppr_auto_rsp_aon:1;
> +        unsigned int :2;

... this, where you have dropped the DomainIDPNE bit (whatever the PN
stands for).

~Andrew

> +        unsigned int eph_en:1;
> +        unsigned int had_update:2;
> +        unsigned int gd_update_dis:1;
> +        unsigned int :1;
> +        unsigned int xt_en:1;
> +        unsigned int int_cap_xt_en:1;
> +        unsigned int vcmd_en:1;
> +        unsigned int viommu_en:1;
> +        unsigned int ga_update_dis:1;
> +        unsigned int gappi_en:1;
> +        unsigned int :8;
> +    };
> +};
>  
>  /* Exclusion Register */
>  #define IOMMU_EXCLUSION_BASE_LOW_OFFSET		0x20
>
>
>



* Re: [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE
  2019-06-13 13:23 ` [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE Jan Beulich
@ 2019-06-18 10:37   ` Andrew Cooper
  2019-06-18 11:53     ` Jan Beulich
  2019-06-18 11:31   ` Andrew Cooper
  1 sibling, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18 10:37 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:23, Jan Beulich wrote:
> At the same time restrict its scope to just the single source file
> actually using it, and abstract accesses by introducing a union of
> pointers. (A union of the actual table entries is not used to make it
> impossible to [wrongly, once the 128-bit form gets added] perform
> pointer arithmetic / array accesses on derived types.)
>
> Also move away from updating the entries piecemeal: Construct a full new
> entry, and write it out.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> It would have been nice to use write_atomic() or ACCESS_ONCE() for the
> actual writes, but both cast the value to a scalar one, which doesn't
> suit us here (and I also didn't want to make the compound type a union
> with a raw member just for this).
>
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -23,6 +23,23 @@
>  #include <asm/io_apic.h>
>  #include <xen/keyhandler.h>
>  
> +struct irte_basic {

I'd suggest irte_32, to go with irte_128 in the following patch. 

The 128bit format is also used for posted interrupts, and isn't specific
to x2apic support.

Furthermore, irte_full isn't a term I can see in the manual, and it
falls into the naming trap that USB currently lives in.

> +    unsigned int remap_en:1;
> +    unsigned int sup_io_pf:1;
> +    unsigned int int_type:3;
> +    unsigned int rq_eoi:1;
> +    unsigned int dm:1;
> +    unsigned int guest_mode:1; /* MBZ */
> +    unsigned int dest:8;
> +    unsigned int vector:8;
> +    unsigned int :8;
> +};
> +
> +union irte_ptr {
> +    void *raw;
> +    struct irte_basic *basic;
> +};
> +
>  #define INTREMAP_TABLE_ORDER    1
>  #define INTREMAP_LENGTH 0xB
>  #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
> @@ -101,47 +118,44 @@ static unsigned int alloc_intremap_entry
>      return slot;
>  }
>  
> -static u32 *get_intremap_entry(int seg, int bdf, int offset)
> +static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
> +                                         unsigned int offset)

As this is changing, s/offset/entry/ to avoid any confusion where offset
might be in units of bytes.
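
To illustrate the units point, a rough sketch follows. The types and
helpers are invented for this example (they are not the patch's irte_basic
or irte_ptr); the 128-bit member merely anticipates the format mentioned
in the description. The same entry index lands at different byte offsets
depending on which member of the pointer union is advanced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry types: one 32-bit and one 128-bit format. */
struct irte32_sketch  { uint32_t raw; };
struct irte128_sketch { uint64_t raw[2]; };

/* Union of pointers, so arithmetic always names the intended format. */
union irte_ptr_sketch {
    void *raw;
    struct irte32_sketch *e32;
    struct irte128_sketch *e128;
};

/* Step to entry number 'entry' (an index, not a byte offset). */
static union irte_ptr_sketch entry_at(void *table, unsigned int entry,
                                      bool use_128)
{
    union irte_ptr_sketch p = { .raw = table };

    if ( use_128 )
        p.e128 += entry;
    else
        p.e32 += entry;

    return p;
}

/* Distance in bytes from the start of the table to the entry. */
static size_t byte_distance(void *table, union irte_ptr_sketch p)
{
    return (size_t)((char *)p.raw - (char *)table);
}

/* Entry 3 sits at byte 12 in the 32-bit format, at byte 48 in the other. */
static void demo_units(void)
{
    static uint64_t table[8];

    assert(byte_distance(table, entry_at(table, 3, false)) == 12);
    assert(byte_distance(table, entry_at(table, 3, true)) == 48);
}
```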

~Andrew


* Re: [Xen-devel] [PATCH 2/9] AMD/IOMMU: use bit field for control register
  2019-06-18  9:54   ` Andrew Cooper
@ 2019-06-18 10:45     ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-18 10:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

>>> On 18.06.19 at 11:54, <andrew.cooper3@citrix.com> wrote:
> On 13/06/2019 14:22, Jan Beulich wrote:
>> Also introduce a field in struct amd_iommu caching the most recently
>> written control register. All writes should now happen exclusively from
>> that cached value, such that it is guaranteed to be up to date.
>>
>> Take the opportunity and add further fields. Also convert a few boolean
>> function parameters to bool, such that use of !! can be avoided.
> 
> Critically also, some previous writel()'s have turned into writeq(),
> which needs calling out.

Sure, done.

>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> @@ -295,38 +295,55 @@ struct amd_iommu_dte {
>>
>> +union amd_iommu_control {
>> +    uint64_t raw;
>> +    struct {
>> +        unsigned int iommu_en:1;
>> +        unsigned int ht_tun_en:1;
>> +        unsigned int event_log_en:1;
>> +        unsigned int event_int_en:1;
>> +        unsigned int com_wait_int_en:1;
>> +        unsigned int inv_timeout:3;
>> +        unsigned int pass_pw:1;
>> +        unsigned int res_pass_pw:1;
>> +        unsigned int coherent:1;
>> +        unsigned int isoc:1;
>> +        unsigned int cmd_buf_en:1;
>> +        unsigned int ppr_log_en:1;
>> +        unsigned int ppr_int_en:1;
>> +        unsigned int ppr_en:1;
>> +        unsigned int gt_en:1;
>> +        unsigned int ga_en:1;
>> +        unsigned int crw:4;
> 
> This field does have an assigned name, but is also documented as Res0
> for forwards compatibility.  I think this field wants handling
> consistently with...
> 
>> +        unsigned int smif_en:1;
>> +        unsigned int slf_wb_dis:1;
>> +        unsigned int smif_log_en:1;
>> +        unsigned int gam_en:3;
>> +        unsigned int ga_log_en:1;
>> +        unsigned int ga_int_en:1;
>> +        unsigned int dual_ppr_log_en:2;
>> +        unsigned int dual_event_log_en:2;
>> +        unsigned int dev_tbl_seg_en:3;
>> +        unsigned int priv_abrt_en:2;
>> +        unsigned int ppr_auto_rsp_en:1;
>> +        unsigned int marc_en:1;
>> +        unsigned int blk_stop_mrk_en:1;
>> +        unsigned int ppr_auto_rsp_aon:1;
>> +        unsigned int :2;
> 
> ... this, where you have dropped the DomainIDPNE bit (whatever the PN
> stands for).

I guess the work was done with an earlier version of the doc. I've
now added the missing name of the field.

Jan




* Re: [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE
  2019-06-13 13:23 ` [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE Jan Beulich
  2019-06-18 10:37   ` Andrew Cooper
@ 2019-06-18 11:31   ` Andrew Cooper
  2019-06-18 11:47     ` Jan Beulich
  1 sibling, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18 11:31 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:23, Jan Beulich wrote:
> At the same time restrict its scope to just the single source file
> actually using it, and abstract accesses by introducing a union of
> pointers. (A union of the actual table entries is not used to make it
> impossible to [wrongly, once the 128-bit form gets added] perform
> pointer arithmetic / array accesses on derived types.)
>
> Also move away from updating the entries piecemeal: Construct a full new
> entry, and write it out.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> It would have been nice to use write_atomic() or ACCESS_ONCE() for the
> actual writes, but both cast the value to a scalar one, which doesn't
> suit us here (and I also didn't want to make the compound type a union
> with a raw member just for this).

Actually, having looked at the following patch, I think it would be
better to union with a uint32_t raw, so that we can use...

> @@ -101,47 +118,44 @@ static unsigned int alloc_intremap_entry
>      return slot;
>  }
>  
> -static u32 *get_intremap_entry(int seg, int bdf, int offset)
> +static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
> +                                         unsigned int offset)
>  {
> -    u32 *table = get_ivrs_mappings(seg)[bdf].intremap_table;
> +    union irte_ptr table = {
> +        .raw = get_ivrs_mappings(seg)[bdf].intremap_table
> +    };
> +
> +    ASSERT(table.raw && (offset < INTREMAP_ENTRIES));
>  
> -    ASSERT( (table != NULL) && (offset < INTREMAP_ENTRIES) );
> +    table.basic += offset;
>  
> -    return table + offset;
> +    return table;
>  }
>  
> -static void free_intremap_entry(int seg, int bdf, int offset)
> +static void free_intremap_entry(unsigned int seg, unsigned int bdf, unsigned int offset)
>  {
> -    u32 *entry = get_intremap_entry(seg, bdf, offset);
> +    union irte_ptr entry = get_intremap_entry(seg, bdf, offset);
> +
> +    *entry.basic = (struct irte_basic){};

ACCESS_ONCE(*entry.basic.raw) = 0;

and...

>  
> -    memset(entry, 0, sizeof(u32));
>      __clear_bit(offset, get_ivrs_mappings(seg)[bdf].intremap_inuse);
>  }
>  
> -static void update_intremap_entry(u32* entry, u8 vector, u8 int_type,
> -    u8 dest_mode, u8 dest)
> -{
> -    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
> -                            INT_REMAP_ENTRY_REMAPEN_MASK,
> -                            INT_REMAP_ENTRY_REMAPEN_SHIFT, entry);
> -    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
> -                            INT_REMAP_ENTRY_SUPIOPF_MASK,
> -                            INT_REMAP_ENTRY_SUPIOPF_SHIFT, entry);
> -    set_field_in_reg_u32(int_type, *entry,
> -                            INT_REMAP_ENTRY_INTTYPE_MASK,
> -                            INT_REMAP_ENTRY_INTTYPE_SHIFT, entry);
> -    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
> -                            INT_REMAP_ENTRY_REQEOI_MASK,
> -                            INT_REMAP_ENTRY_REQEOI_SHIFT, entry);
> -    set_field_in_reg_u32((u32)dest_mode, *entry,
> -                            INT_REMAP_ENTRY_DM_MASK,
> -                            INT_REMAP_ENTRY_DM_SHIFT, entry);
> -    set_field_in_reg_u32((u32)dest, *entry,
> -                            INT_REMAP_ENTRY_DEST_MAST,
> -                            INT_REMAP_ENTRY_DEST_SHIFT, entry);
> -    set_field_in_reg_u32((u32)vector, *entry,
> -                            INT_REMAP_ENTRY_VECTOR_MASK,
> -                            INT_REMAP_ENTRY_VECTOR_SHIFT, entry);
> +static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
> +                                  unsigned int int_type,
> +                                  unsigned int dest_mode, unsigned int dest)
> +{
> +    struct irte_basic basic = {
> +        .remap_en = 1,
> +        .sup_io_pf = 0,
> +        .int_type = int_type,
> +        .rq_eoi = 0,
> +        .dm = dest_mode,
> +        .dest = dest,
> +        .vector = vector,
> +    };
> +
> +    *entry.basic = basic;

ACCESS_ONCE(*entry.basic.raw) = basic.raw.

The problem is that in an unoptimised case, structure assignment is
implemented with memcpy(), which may be implemented as `rep movsb`, which
may result in a spliced write with the enable bit set first.

Using a union with raw allows for the use of ACCESS_ONCE(), which forces
the compiler to implement them as 32bit single mov's.
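
As a rough sketch of the suggestion (not the actual Xen code): the union
below gives the 32-bit entry a raw scalar view next to the bit-fields, so
the whole entry can be published with one volatile store. The names
irte32, flds and WRITE_ONCE_U32() are made up for illustration, the last
one standing in for ACCESS_ONCE(); the field layout is assumed from the
hunks quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 32-bit IRTE: bit-field view plus a raw scalar view. */
union irte32 {
    uint32_t raw;
    struct {
        unsigned int remap_en:1;
        unsigned int sup_io_pf:1;
        unsigned int int_type:3;
        unsigned int rq_eoi:1;
        unsigned int dm:1;
        unsigned int guest_mode:1; /* MBZ */
        unsigned int dest:8;
        unsigned int vector:8;
        unsigned int :8;
    } flds;
};

/* Stand-in for ACCESS_ONCE(): one volatile store of the whole entry. */
#define WRITE_ONCE_U32(ptr, val) (*(volatile uint32_t *)(ptr) = (val))

static inline void irte32_update(union irte32 *slot, unsigned int vector,
                                 unsigned int int_type,
                                 unsigned int dest_mode, unsigned int dest)
{
    union irte32 irte = { .raw = 0 };   /* start from all-zero bits */

    assert(slot);

    irte.flds.remap_en = 1;
    irte.flds.int_type = int_type;
    irte.flds.dm = dest_mode;
    irte.flds.dest = dest;
    irte.flds.vector = vector;

    /* Build the entry privately, then publish it with a single store. */
    WRITE_ONCE_U32(&slot->raw, irte.raw);
}

static inline void irte32_free(union irte32 *slot)
{
    assert(slot);
    WRITE_ONCE_U32(&slot->raw, 0);
}
```

With this shape both the update and the free should end up as one aligned
32-bit store, so a reader of the table would not see a half-written entry
with the enable bit already set.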

~Andrew


* Re: [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE
  2019-06-18 11:31   ` Andrew Cooper
@ 2019-06-18 11:47     ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-18 11:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

>>> On 18.06.19 at 13:31, <andrew.cooper3@citrix.com> wrote:
> On 13/06/2019 14:23, Jan Beulich wrote:
>> At the same time restrict its scope to just the single source file
>> actually using it, and abstract accesses by introducing a union of
>> pointers. (A union of the actual table entries is not used to make it
>> impossible to [wrongly, once the 128-bit form gets added] perform
>> pointer arithmetic / array accesses on derived types.)
>>
>> Also move away from updating the entries piecemeal: Construct a full new
>> entry, and write it out.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> It would have been nice to use write_atomic() or ACCESS_ONCE() for the
>> actual writes, but both cast the value to a scalar one, which doesn't
>> suit us here (and I also didn't want to make the compound type a union
>> with a raw member just for this).
> 
> Actually, having looked at the following patch, I think it would be
> better to union with a uint32_t raw, so that we can use...

Well, I did in fact have it that way first, until ...

>> +static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
>> +                                  unsigned int int_type,
>> +                                  unsigned int dest_mode, unsigned int dest)
>> +{
>> +    struct irte_basic basic = {
>> +        .remap_en = 1,
>> +        .sup_io_pf = 0,
>> +        .int_type = int_type,
>> +        .rq_eoi = 0,
>> +        .dm = dest_mode,
>> +        .dest = dest,
>> +        .vector = vector,
>> +    };

... I figured that this initializer then will require the bitfields part of
the union to also get named, like for union amd_iommu_ext_features
in patch 1.

>> +    *entry.basic = basic;
> 
> ACCESS_ONCE(*entry.basic.raw) = basic.raw.
> 
> The problem is that in an unoptimised case, structure assignment is
> implemented with memcpy(), which may be implemented as `rep movsb`, which
> may result in a spliced write with the enable bit set first.
> 
> Using a union with raw allows for the use of ACCESS_ONCE(), which forces
> the compiler to implement them as 32bit single mov's.

If we are worried about this, writing of 32-bit entries could be done
(as an alternative to what you suggest) just like that of 128-bit
ones by the last patch in the series.

Jan




* Re: [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE
  2019-06-18 10:37   ` Andrew Cooper
@ 2019-06-18 11:53     ` Jan Beulich
  2019-06-18 12:16       ` Andrew Cooper
  0 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-18 11:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

>>> On 18.06.19 at 12:37, <andrew.cooper3@citrix.com> wrote:
> On 13/06/2019 14:23, Jan Beulich wrote:
>> At the same time restrict its scope to just the single source file
>> actually using it, and abstract accesses by introducing a union of
>> pointers. (A union of the actual table entries is not used to make it
>> impossible to [wrongly, once the 128-bit form gets added] perform
>> pointer arithmetic / array accesses on derived types.)
>>
>> Also move away from updating the entries piecemeal: Construct a full new
>> entry, and write it out.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> It would have been nice to use write_atomic() or ACCESS_ONCE() for the
>> actual writes, but both cast the value to a scalar one, which doesn't
>> suit us here (and I also didn't want to make the compound type a union
>> with a raw member just for this).
>>
>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>> @@ -23,6 +23,23 @@
>>  #include <asm/io_apic.h>
>>  #include <xen/keyhandler.h>
>>  
>> +struct irte_basic {
> 
> I'd suggest irte_32, to go with irte_128 in the following patch. 
> 
> The 128bit format is also used for posted interrupts, and isn't specific
> to x2apic support.

There are still two forms of 128-bit entries, and the intention with
the chosen names was for the other one to become irte_guest.

> Furthermore, calling it irte_full isn't a term I can see in the manual,
> and is falling into the naming trap that USB currently lives in.

Except that other than for USB's transfer speeds I can't really see
this getting wider and wider.

>> @@ -101,47 +118,44 @@ static unsigned int alloc_intremap_entry
>>      return slot;
>>  }
>>  
>> -static u32 *get_intremap_entry(int seg, int bdf, int offset)
>> +static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
>> +                                         unsigned int offset)
> 
> As this is changing, s/offset/entry/ to avoid any confusion where offset
> might be in units of bytes.

I don't really mind - I think both names are sufficiently clear, but
I'll switch since you think the other name is better.

Jan




* Re: [Xen-devel] [PATCH 4/9] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
  2019-06-13 13:23 ` [Xen-devel] [PATCH 4/9] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format Jan Beulich
@ 2019-06-18 11:57   ` Andrew Cooper
  2019-06-18 15:31     ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18 11:57 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:23, Jan Beulich wrote:
> This is in preparation of actually enabling x2APIC mode, which requires
> this wider IRTE format to be used.
>
> A specific remark regarding the first hunk changing
> amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
> i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
> tables when creating new one"). Other code introduced by that change has
> meanwhile disappeared or further changed, and I wonder if - rather than
> adding an x2apic_enabled check to the conditional - the bypass couldn't
> be deleted altogether. For now the goal is to affect the non-x2APIC
> paths as little as possible.
>
> Take the liberty and use the new "fresh" flag to suppress an unneeded
> flush in update_intremap_entry_from_ioapic().

What is the meaning of fresh?  Wouldn't "needs_update" be a more
descriptive name?

>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Note that AMD's doc says Lowest Priority ("Arbitrated" by their naming)
> mode is unavailable in x2APIC mode, but they've confirmed this to be a
> mistake on their part.
>
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -35,12 +35,34 @@ struct irte_basic {
>      unsigned int :8;
>  };
>  
> +struct irte_full {
> +    unsigned int remap_en:1;
> +    unsigned int sup_io_pf:1;
> +    unsigned int int_type:3;
> +    unsigned int rq_eoi:1;
> +    unsigned int dm:1;
> +    unsigned int guest_mode:1; /* MBZ */

/* MBZ - not implemented yet. */

Seeing as interrupt posting will be a minor tweak to this data
structure, rather than implementing a new one.

> +    unsigned int dest_lo:24;
> +    unsigned int :32;
> +    unsigned int vector:8;
> +    unsigned int :24;
> +    unsigned int :24;
> +    unsigned int dest_hi:8;

The manual says that we should prefer aligned 64bit access, so some
raw_{lo,hi} fields here will allow...

> @@ -136,7 +170,21 @@ static void free_intremap_entry(unsigned
>  {
>      union irte_ptr entry = get_intremap_entry(seg, bdf, offset);
>  
> -    *entry.basic = (struct irte_basic){};
> +    switch ( irte_mode )
> +    {
> +    case irte_basic:
> +        *entry.basic = (struct irte_basic){};
> +        break;
> +
> +    case irte_full:
> +        entry.full->remap_en = 0;
> +        wmb();

... this to become

entry._128->raw_lo = 0;
smp_wmb();
entry._128->raw_hi = 0;

The interrupt mapping table is allocated in WB memory and accessed
coherently, so an sfence instruction isn't necessary.  All that matters
is that remap_en gets cleared first.
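
A minimal sketch of that ordering, assuming a raw two-half view of the
entry with RemapEn in bit 0 of the low half (as in the struct quoted
above). struct irte128_raw and write_barrier() are hypothetical names;
the C11 fence merely approximates smp_wmb() in portable code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical raw view of a 128-bit IRTE; RemapEn is bit 0 of raw_lo. */
struct irte128_raw {
    uint64_t raw_lo;
    uint64_t raw_hi;
};

#define IRTE_REMAP_EN (UINT64_C(1) << 0)

/* Approximation of smp_wmb(): keep the earlier store ahead of the later. */
static inline void write_barrier(void)
{
    atomic_thread_fence(memory_order_release);
}

static inline void irte128_free(volatile struct irte128_raw *entry)
{
    assert(entry);

    entry->raw_lo = 0;   /* clears RemapEn along with the rest of the half */
    write_barrier();
    entry->raw_hi = 0;   /* the entry is already disabled at this point */
}
```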

> +        *entry.full = (struct irte_full){};
> +        break;
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +    }
>  
>      __clear_bit(offset, get_ivrs_mappings(seg)[bdf].intremap_inuse);
>  }
> @@ -154,8 +202,38 @@ static void update_intremap_entry(union
>          .dest = dest,
>          .vector = vector,
>      };
> +    struct irte_full full = {
> +        .remap_en = 1,
> +        .sup_io_pf = 0,
> +        .int_type = int_type,
> +        .rq_eoi = 0,
> +        .dm = dest_mode,
> +        .dest_lo = dest,
> +        .dest_hi = dest >> 24,
> +        .vector = vector,
> +    };

Looking at the resulting code after this patch, I think these structures
should move into their respective case blocks, to help the compiler to
avoid initialising both.

> +
> +    switch ( irte_mode )
> +    {
> +        __uint128_t ret;
> +        union {
> +            __uint128_t raw;
> +            struct irte_full full;
> +        } old;
> +
> +    case irte_basic:
> +        *entry.basic = basic;
> +        break;
> +
> +    case irte_full:
> +        old.full = *entry.full;
> +        ret = cmpxchg16b(entry.full, &old, &full);
> +        ASSERT(ret == old.raw);

Similarly, this can be implemented with

entry.full->remap_en = 0;
smp_wmb();
entry._128->raw_hi = full.raw_hi;
smp_wmb();
entry._128->raw_lo = full.raw_lo;

which avoids using a locked operation.
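
Spelled out as a self-contained sketch: struct irte128_raw with
raw_lo/raw_hi is a hypothetical raw view, write_barrier() is a C11 fence
approximating smp_wmb(), and the bit positions are taken from the
struct irte_full hunk above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical raw view of a 128-bit IRTE. */
struct irte128_raw {
    uint64_t raw_lo;   /* bit 0 RemapEn, 2-4 IntType, 6 DM, 8-31 dest[23:0] */
    uint64_t raw_hi;   /* bits 0-7 vector, bits 56-63 dest[31:24] */
};

#define IRTE_REMAP_EN (UINT64_C(1) << 0)

/* Approximation of smp_wmb(): keep the earlier store ahead of the later. */
static inline void write_barrier(void)
{
    atomic_thread_fence(memory_order_release);
}

static inline void irte128_update(volatile struct irte128_raw *entry,
                                  unsigned int vector, unsigned int int_type,
                                  unsigned int dest_mode, uint32_t dest)
{
    uint64_t lo = IRTE_REMAP_EN |
                  ((uint64_t)(int_type & 7) << 2) |
                  ((uint64_t)(dest_mode & 1) << 6) |
                  ((uint64_t)(dest & 0xffffff) << 8);
    uint64_t hi = (vector & 0xff) | ((uint64_t)(dest >> 24) << 56);

    assert(entry);

    entry->raw_lo &= ~IRTE_REMAP_EN;   /* 1. disable the old entry */
    write_barrier();
    entry->raw_hi = hi;                /* 2. high half, still disabled */
    write_barrier();
    entry->raw_lo = lo;                /* 3. low half last, sets RemapEn */
}
```

The trade-off against cmpxchg16b is that the entry is briefly disabled
instead of being swapped atomically.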

> +        break;
>  
> -    *entry.basic = basic;
> +    default:
> +        ASSERT_UNREACHABLE();
> +    }
>  }
>  
>  static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
> @@ -169,6 +247,11 @@ static inline void set_rte_index(struct
>      rte->delivery_mode = offset >> 8;
>  }
>  
> +static inline unsigned int get_full_dest(const struct irte_full *entry)
> +{
> +    return entry->dest_lo | (entry->dest_hi << 24);

Given your observation on my still-not-upstream-yet patch about GCC not
honouring the type of bitfields, doesn't dest_hi need explicitly casting
to unsigned int before the shift, to avoid it being performed as int?
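
To illustrate the concern: by the integer promotions an 8-bit unsigned
bit-field is converted to int, so shifting it left by 24 can reach the
sign bit. A cut-down sketch with the cast applied (struct irte_dest is
hypothetical and models only the two destination fields):

```c
#include <assert.h>
#include <stdint.h>

/* Only the two destination fields of the 128-bit IRTE are modelled. */
struct irte_dest {
    unsigned int dest_lo:24;
    unsigned int dest_hi:8;
};

static inline uint32_t get_full_dest(const struct irte_dest *entry)
{
    assert(entry);

    /*
     * dest_hi promotes to (signed) int; convert it to an unsigned 32-bit
     * type before shifting so that bit 31 is not treated as a sign bit.
     */
    return entry->dest_lo | ((uint32_t)entry->dest_hi << 24);
}
```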

> @@ -280,6 +392,18 @@ int __init amd_iommu_setup_ioapic_remapp
>              dest_mode = rte.dest_mode;
>              dest = rte.dest.logical.logical_dest;
>  
> +            if ( iommu->ctrl.xt_en )
> +            {
> +                /*
> +                 * In x2APIC mode we have no way of discovering the high 24
> +                 * bits of the destination of an already enabled interrupt.

Yes we do.  We read the interrupt remapping table, which is where that
information lives.

Any firmware driver which is following the spec won't have let an IRTE
get cached, then played with the table without appropriate flushing.

~Andrew


* Re: [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE
  2019-06-18 11:53     ` Jan Beulich
@ 2019-06-18 12:16       ` Andrew Cooper
  2019-06-18 12:55         ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18 12:16 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

On 18/06/2019 12:53, Jan Beulich wrote:
>>>> On 18.06.19 at 12:37, <andrew.cooper3@citrix.com> wrote:
>> On 13/06/2019 14:23, Jan Beulich wrote:
>>> At the same time restrict its scope to just the single source file
>>> actually using it, and abstract accesses by introducing a union of
>>> pointers. (A union of the actual table entries is not used to make it
>>> impossible to [wrongly, once the 128-bit form gets added] perform
>>> pointer arithmetic / array accesses on derived types.)
>>>
>>> Also move away from updating the entries piecemeal: Construct a full new
>>> entry, and write it out.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> It would have been nice to use write_atomic() or ACCESS_ONCE() for the
>>> actual writes, but both cast the value to a scalar one, which doesn't
>>> suit us here (and I also didn't want to make the compound type a union
>>> with a raw member just for this).
>>>
>>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>>> @@ -23,6 +23,23 @@
>>>  #include <asm/io_apic.h>
>>>  #include <xen/keyhandler.h>
>>>  
>>> +struct irte_basic {
>> I'd suggest irte_32, to go with irte_128 in the following patch. 
>>
>> The 128bit format is also used for posted interrupts, and isn't specific
>> to x2apic support.
> There are still two forms of 128-bit entries, and the intention with
> the chosen names was for the other one to become irte_guest.

They are not forms which can be delineated by irte_mode, because the
guest_mode setting is (/will be) per-domain, not global (which is
necessary for sane testability, and for nested-virt support where the
guest VMCB controls aren't set up by Xen).

>
>> Furthermore, calling it irte_full isn't a term I can see in the manual,
>> and is falling into the naming trap that USB currently lives in.
> Except that other than for USB's transfer speeds I can't really see
> this getting wider and wider.

It doesn't make the names "basic" and "full" any more descriptive.

>
>>> @@ -101,47 +118,44 @@ static unsigned int alloc_intremap_entry
>>>      return slot;
>>>  }
>>>  
>>> -static u32 *get_intremap_entry(int seg, int bdf, int offset)
>>> +static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
>>> +                                         unsigned int offset)
>> As this is changing, s/offset/entry/ to avoid any confusion where offset
>> might be in units of bytes.
> I don't really mind - I think both names are sufficiently clear, but
> I'll switch since you think the other name is better.

Looking through the other code, idx or index would also do fine, but I
think all of these are clearer than using offset.

~Andrew


* Re: [Xen-devel] [PATCH 5/9] AMD/IOMMU: split amd_iommu_init_one()
  2019-06-13 13:24 ` [Xen-devel] [PATCH 5/9] AMD/IOMMU: split amd_iommu_init_one() Jan Beulich
@ 2019-06-18 12:17   ` Andrew Cooper
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18 12:17 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:24, Jan Beulich wrote:
> Mapping the MMIO space and obtaining feature information needs to happen
> slightly earlier, such that for x2APIC support we can set XTEn prior to
> calling amd_iommu_update_ivrs_mapping_acpi() and
> amd_iommu_setup_ioapic_remapping().
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [Xen-devel] [PATCH 6/9] AMD/IOMMU: allow enabling with IRQ not yet set up
  2019-06-13 13:25 ` [Xen-devel] [PATCH 6/9] AMD/IOMMU: allow enabling with IRQ not yet set up Jan Beulich
@ 2019-06-18 12:22   ` Andrew Cooper
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18 12:22 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:25, Jan Beulich wrote:
> Early enabling (to enter x2APIC mode) requires deferring of the IRQ
> setup.

I can accept that this might be how the IOMMU infrastructure currently
functions (and therefore won't block this series), but this behaviour
isn't correct.

We should not have bifurcated setup depending on xAPIC vs x2APIC mode.

>  Code to actually do that setup in the x2APIC case will get added
> subsequently.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [Xen-devel] [PATCH 7/9] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
  2019-06-13 13:26 ` [Xen-devel] [PATCH 7/9] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode Jan Beulich
@ 2019-06-18 12:35   ` Andrew Cooper
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18 12:35 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:26, Jan Beulich wrote:
> In order to be able to express all possible destinations we need to make
> use of this non-MSI-capability based mechanism. The new IRQ controller
> structure can re-use certain MSI functions, though.
>
> For now general and PPR interrupts still share a single vector, IRQ, and
> hence handler.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE
  2019-06-18 12:16       ` Andrew Cooper
@ 2019-06-18 12:55         ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-18 12:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

>>> On 18.06.19 at 14:16, <andrew.cooper3@citrix.com> wrote:
> On 18/06/2019 12:53, Jan Beulich wrote:
>>>>> On 18.06.19 at 12:37, <andrew.cooper3@citrix.com> wrote:
>>> On 13/06/2019 14:23, Jan Beulich wrote:
>>>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>>>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>>>> @@ -23,6 +23,23 @@
>>>>  #include <asm/io_apic.h>
>>>>  #include <xen/keyhandler.h>
>>>>  
>>>> +struct irte_basic {
>>> I'd suggest irte_32, to go with irte_128 in the following patch. 
>>>
>>> The 128bit format is also used for posted interrupts, and isn't specific
>>> to x2apic support.
>> There are still two forms of 128-bit entries, and the intention with
>> the chosen names was for the other one to become irte_guest.
> 
> They are not forms which can be delineated by irte_mode, because the
> guest_mode setting is (/will be) per-domain, not global (which is
> necessary for sane testability, and for nested-virt support where the
> guest VMCB controls aren't set up by Xen).

True and ...

>>> Furthermore, calling it irte_full isn't a term I can see in the manual,
>>> and is falling into the naming trap that USB currently lives in.
>> Except that other than for USB's transfer speeds I can't really see
>> this getting wider and wider.
> 
> It doesn't make the names "basic" and "full" any more descriptive.

... also true, but irte_128 still won't fly with the other (guest) layout.

>>>> @@ -101,47 +118,44 @@ static unsigned int alloc_intremap_entry
>>>>      return slot;
>>>>  }
>>>>  
>>>> -static u32 *get_intremap_entry(int seg, int bdf, int offset)
>>>> +static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
>>>> +                                         unsigned int offset)
>>> As this is changing, s/offset/entry/ to avoid any confusion where offset
>>> might be in units of bytes.
>> I don't really mind - I think both names are sufficiently clear, but
>> I'll switch since you think the other name is better.
> 
> Looking through the other code, idx or index would also do fine, but I
> think all of these are clearer than using offset.

"index" it is then.

Jan




* Re: [Xen-devel] [PATCH RFC 9/9] AMD/IOMMU: correct IRTE updating
  2019-06-13 13:28 ` [Xen-devel] [PATCH RFC 9/9] AMD/IOMMU: correct IRTE updating Jan Beulich
@ 2019-06-18 13:28   ` Andrew Cooper
  2019-06-18 14:58     ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18 13:28 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:28, Jan Beulich wrote:
> While for 32-bit IRTEs I think we can safely continue to assume that the
> writes will translate to a single MOV, the use of CMPXCHG16B is more
> heavy handed than necessary for the 128-bit form, and the flushing
> didn't get done along the lines of what the specification says. Mark
> entries to be updated as not remapped (which will result in interrupt
> requests to get target aborted, but the interrupts should be masked
> anyway at that point in time), issue the flush, and only then write the
> new entry. In the 128-bit IRTE case set RemapEn separately last, so that
> the ordering of the writes of the two 64-bit halves won't matter.
>
> In update_intremap_entry_from_msi_msg() also fold the duplicate initial
> lock determination and acquire into just a single instance.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Looking at this patch, I think quite a bit of it should be folded into
patch 4.  However, my review suggestions on that patch take precedence
over the net result here.

> ---
> RFC: Putting the flush invocations in loops isn't overly nice, but I
>      don't think this can really be abused, since callers up the stack
>      hold further locks. Nevertheless I'd like to ask for better
>      suggestions.

Let's focus on getting it functioning first, and fast second.  However, I
think we can do better than the loop.  Let me dig some notes out.

~Andrew


* Re: [Xen-devel] [PATCH 8/9] AMD/IOMMU: enable x2APIC mode when available
  2019-06-13 13:27 ` [Xen-devel] [PATCH 8/9] AMD/IOMMU: enable x2APIC mode when available Jan Beulich
@ 2019-06-18 13:40   ` Andrew Cooper
  2019-06-18 14:02     ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-06-18 13:40 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 13/06/2019 14:27, Jan Beulich wrote:
> @@ -1346,7 +1399,7 @@ int __init amd_iommu_init(void)
>      /* per iommu initialization  */
>      for_each_amd_iommu ( iommu )
>      {
> -        rc = amd_iommu_init_one(iommu);
> +        rc = amd_iommu_init_one(iommu, !xt);

This logic is very subtle, and is a consequence of the bifurcated setup
AFAICT.  I think it deserves a comment.

> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -796,6 +796,40 @@ void* __init amd_iommu_alloc_intremap_ta
>      return tb;
>  }
>  
> +bool __init iov_supports_xt(void)
> +{
> +    unsigned int apic;
> +    struct amd_iommu *iommu;
> +
> +    if ( !iommu_enable || !iommu_intremap || !cpu_has_cx16 )
> +        return false;
> +
> +    if ( amd_iommu_prepare() )
> +        return false;
> +
> +    for_each_amd_iommu ( iommu )
> +        if ( !iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup )

Why ga_sup?  I don't see anything in the manual which links xt_sup with
ga_sup, other than the chronology of spec updates.

In particular, it is explicitly stated to be ok to use xt without ga,
and the format of interrupts generated by the IOMMU is controlled by the
XTEn bit.

~Andrew


* Re: [Xen-devel] [PATCH 8/9] AMD/IOMMU: enable x2APIC mode when available
  2019-06-18 13:40   ` Andrew Cooper
@ 2019-06-18 14:02     ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-18 14:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

>>> On 18.06.19 at 15:40, <andrew.cooper3@citrix.com> wrote:
> On 13/06/2019 14:27, Jan Beulich wrote:
>> @@ -1346,7 +1399,7 @@ int __init amd_iommu_init(void)
>>      /* per iommu initialization  */
>>      for_each_amd_iommu ( iommu )
>>      {
>> -        rc = amd_iommu_init_one(iommu);
>> +        rc = amd_iommu_init_one(iommu, !xt);
> 
> This logic is very subtle, and is a consequence of the bifurcated setup
> AFAICT.  I think it deserves a comment.

Will do.

>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>> @@ -796,6 +796,40 @@ void* __init amd_iommu_alloc_intremap_ta
>>      return tb;
>>  }
>>  
>> +bool __init iov_supports_xt(void)
>> +{
>> +    unsigned int apic;
>> +    struct amd_iommu *iommu;
>> +
>> +    if ( !iommu_enable || !iommu_intremap || !cpu_has_cx16 )
>> +        return false;
>> +
>> +    if ( amd_iommu_prepare() )
>> +        return false;
>> +
>> +    for_each_amd_iommu ( iommu )
>> +        if ( !iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup )
> 
> Why ga_sup?  I don't see anything in the manual which links xt_sup with
> ga_sup, other than the chronology of spec updates.

There is an (indirect) connection, and I learned this the hard way -
I too was assuming that XTEn alone ought to be sufficient for the
IOMMU to use 128-bit IRTEs. But no, table 22 in the 3.04 doc
makes quite clear that it is GAEn alone which controls entry size.
GAMEn would also need to be set to actually do what I would have
thought should be controlled by GAEn alone.
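
In other words, the capability check has to require both feature bits on
every IOMMU. A simplified sketch of that check, with a hypothetical
struct iommu_features standing in for the real feature register union:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-IOMMU feature summary; only the two bits of interest. */
struct iommu_features {
    bool ga_sup;   /* needed because GAEn selects the 128-bit IRTE format */
    bool xt_sup;   /* x2APIC (XTEn) support proper */
};

/* x2APIC mode is only usable if every IOMMU offers both features. */
static inline bool all_iommus_support_xt(const struct iommu_features *f,
                                         size_t n)
{
    assert(f || !n);

    for ( size_t i = 0; i < n; i++ )
        if ( !f[i].ga_sup || !f[i].xt_sup )
            return false;

    return true;
}
```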

> In particular, it is explicitly stated to be ok to use xt without ga,
> and the format of interrupts generated by the IOMMU is controlled by the
> XTEn bit.

Where did you find this? Depending how it's worded this may
deserve clarifying.

Jan




* Re: [Xen-devel] [PATCH RFC 9/9] AMD/IOMMU: correct IRTE updating
  2019-06-18 13:28   ` Andrew Cooper
@ 2019-06-18 14:58     ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-18 14:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

>>> On 18.06.19 at 15:28, <andrew.cooper3@citrix.com> wrote:
> On 13/06/2019 14:28, Jan Beulich wrote:
>> While for 32-bit IRTEs I think we can safely continue to assume that the
>> writes will translate to a single MOV, the use of CMPXCHG16B is more
>> heavy handed than necessary for the 128-bit form, and the flushing
>> didn't get done along the lines of what the specification says. Mark
>> entries to be updated as not remapped (which will result in interrupt
>> requests to get target aborted, but the interrupts should be masked
>> anyway at that point in time), issue the flush, and only then write the
>> new entry. In the 128-bit IRTE case set RemapEn separately last, so that
>> the ordering of the writes of the two 64-bit halves won't matter.
>>
>> In update_intremap_entry_from_msi_msg() also fold the duplicate initial
>> lock determination and acquire into just a single instance.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Looking at this patch, I think quite a bit of it should be folded into
> patch 4.

Not that much - the changes to update_intremap_entry()
could go there, and then of course the one change to
iov_supports_xt(). But that's about it - the rest isn't specific to
handling of 128-bit IRTEs, and hence wouldn't belong there.

What could be discussed is moving the change here towards the
start of the series, ahead of the ones playing with how IRTEs
get updated. I've put it last for now because (a) I've added it
last, after the CMPXCHG16B approach was already tested and
(b) because of its RFC status (I don't want it to block the rest of
the series).
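
For reference, the ordering described in the quoted commit message (clear
RemapEn, flush, write the entry, set RemapEn last) can be sketched in
standalone C as follows. This is only an illustration under stated
assumptions, not the code from the patch: the union layout, the
sketch_* names, the flush stub, and the barrier macro (a plain compiler
barrier standing in for wmb()) are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical 128-bit IRTE view; bit 0 of the low half is RemapEn. */
union irte128_sketch {
    struct {
        uint64_t lo, hi;
    } raw;
};

#define IRTE_REMAP_EN ((uint64_t)1)

/* Assumption: compiler barrier only, standing in for Xen's wmb(). */
#define sketch_wmb() __asm__ __volatile__ ( "" ::: "memory" )

/* Stub: a real flush would issue an IOMMU command and wait for it. */
static unsigned int sketch_flush_count;
static void sketch_flush_intremap(void) { ++sketch_flush_count; }

static void sketch_update_irte(volatile union irte128_sketch *entry,
                               uint64_t new_lo, uint64_t new_hi)
{
    /* 1. Mark the entry as not remapped. */
    entry->raw.lo &= ~IRTE_REMAP_EN;
    sketch_wmb();

    /* 2. Flush, so the old entry is no longer cached. */
    sketch_flush_intremap();

    /* 3. Write both halves with RemapEn still clear; order is irrelevant. */
    entry->raw.hi = new_hi;
    entry->raw.lo = new_lo & ~IRTE_REMAP_EN;
    sketch_wmb();

    /* 4. Set RemapEn separately, last. */
    entry->raw.lo |= IRTE_REMAP_EN;
}
```

No locked operation is needed in this form, since the entry is never
observable as enabled while only half-written.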

>  However, my review suggestions on that patch take precedence
> over the net result here.

Sure - let's settle on the least bad variant of that one first. I did
already reply there.

>> ---
>> RFC: Putting the flush invocations in loops isn't overly nice, but I
>>      don't think this can really be abused, since callers up the stack
>>      hold further locks. Nevertheless I'd like to ask for better
>>      suggestions.
> 
> Let's focus on getting it functioning first, and fast second.

My remark wasn't about this being slow, but the theoretical risk of
this allowing for a DoS attack.

>  However, I
> think we can do better than the loop.  Let me dig some notes out.

Thanks much,
Jan




* Re: [Xen-devel] [PATCH 4/9] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
  2019-06-18 11:57   ` Andrew Cooper
@ 2019-06-18 15:31     ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-18 15:31 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

>>> On 18.06.19 at 13:57, <andrew.cooper3@citrix.com> wrote:
> On 13/06/2019 14:23, Jan Beulich wrote:
>> This is in preparation of actually enabling x2APIC mode, which requires
>> this wider IRTE format to be used.
>>
>> A specific remark regarding the first hunk changing
>> amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
>> i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
>> tables when creating new one"). Other code introduced by that change has
>> meanwhile disappeared or further changed, and I wonder if - rather than
>> adding an x2apic_enabled check to the conditional - the bypass couldn't
>> be deleted altogether. For now the goal is to affect the non-x2APIC
>> paths as little as possible.
>>
>> Take the liberty and use the new "fresh" flag to suppress an unneeded
>> flush in update_intremap_entry_from_ioapic().
> 
> What is the meaning of fresh?  Wouldn't "needs_update" be a more
> descriptive name?

I don't think so, no. "Fresh" means "freshly allocated" and hence not
holding any meaningful data yet.

>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>> @@ -35,12 +35,34 @@ struct irte_basic {
>>      unsigned int :8;
>>  };
>>  
>> +struct irte_full {
>> +    unsigned int remap_en:1;
>> +    unsigned int sup_io_pf:1;
>> +    unsigned int int_type:3;
>> +    unsigned int rq_eoi:1;
>> +    unsigned int dm:1;
>> +    unsigned int guest_mode:1; /* MBZ */
> 
> /* MBZ - not implemented yet. */
> 
> Seeing as interrupt posting will be a minor tweak to this data
> structure, rather than implementing a new one.

Again I don't think so: Bits 2...6 have entirely different meaning,
so I think we'd better not try to fold both into one structure.
Separate structures will also better enforce thinking about using
the correct one in any given context, I think (hope).

>> +    unsigned int dest_lo:24;
>> +    unsigned int :32;
>> +    unsigned int vector:8;
>> +    unsigned int :24;
>> +    unsigned int :24;
>> +    unsigned int dest_hi:8;
> 
> The manual says that we should prefer aligned 64bit access, so some
> raw_{lo,hi} fields here will allow...
> 
>> @@ -136,7 +170,21 @@ static void free_intremap_entry(unsigned
>>  {
>>      union irte_ptr entry = get_intremap_entry(seg, bdf, offset);
>>  
>> -    *entry.basic = (struct irte_basic){};
>> +    switch ( irte_mode )
>> +    {
>> +    case irte_basic:
>> +        *entry.basic = (struct irte_basic){};
>> +        break;
>> +
>> +    case irte_full:
>> +        entry.full->remap_en = 0;
>> +        wmb();
> 
> ... this to become
> 
> entry._128->raw_lo = 0;
> smp_wmb();
> entry._128->raw_hi = 0;
> 
> The interrupt mapping table is allocated in WB memory and accessed
> coherently, so an sfence instruction isn't necessary.  All that matters
> is that remap_en gets cleared first.

I've been trying to spot such a coherency statement - where did I
overlook this being said?

As to introducing "raw" fields - see my reply on the earlier patch. It's
possible, but has other downsides.

And finally on smp_wmb() - there's nothing SMP-ish here. The barrier
is specifically not to vanish (in the theoretical case of us patching in
SMP alternatives) in the UP case. Either it's wmb(), or it's just barrier().

>> @@ -154,8 +202,38 @@ static void update_intremap_entry(union
>>          .dest = dest,
>>          .vector = vector,
>>      };
>> +    struct irte_full full = {
>> +        .remap_en = 1,
>> +        .sup_io_pf = 0,
>> +        .int_type = int_type,
>> +        .rq_eoi = 0,
>> +        .dm = dest_mode,
>> +        .dest_lo = dest,
>> +        .dest_hi = dest >> 24,
>> +        .vector = vector,
>> +    };
> 
> Looking at the resulting code after this patch, I think these structures
> should move into their respective case blocks, to help the compiler to
> avoid initialising both.

I admit I didn't check the code, but I'm pretty surprised you say they
can't track this. I pretty much dislike the improperly indented braces
we use to frame case blocks with their own local variables, which is
why I've decided to put the variables where they are now (assuming
that it ought to be rather easy for the compiler to move the actual
initialization into the switch()).

>> +
>> +    switch ( irte_mode )
>> +    {
>> +        __uint128_t ret;
>> +        union {
>> +            __uint128_t raw;
>> +            struct irte_full full;
>> +        } old;
>> +
>> +    case irte_basic:
>> +        *entry.basic = basic;
>> +        break;
>> +
>> +    case irte_full:
>> +        old.full = *entry.full;
>> +        ret = cmpxchg16b(entry.full, &old, &full);
>> +        ASSERT(ret == old.raw);
> 
> Similarly, this can be implemented with
> 
> entry.full->remap_en = 0;
> smp_wmb();
> entry._128->raw_hi = full.raw_hi;
> smp_wmb();
> entry._128->raw_lo = full.raw_lo;
> 
> which avoids using a locked operation.

Right, and the locked operation goes away again in the last patch.
As said there, that part of that patch can probably be moved here.

But first of all we need to settle on whether we want "raw" union
members.

>> @@ -169,6 +247,11 @@ static inline void set_rte_index(struct
>>      rte->delivery_mode = offset >> 8;
>>  }
>>  
>> +static inline unsigned int get_full_dest(const struct irte_full *entry)
>> +{
>> +    return entry->dest_lo | (entry->dest_hi << 24);
> 
> Given your observation on my still-not-upstream-yet patch about GCC not
> honouring the type of bitfields, doesn't dest_hi need explicitly casting
> to unsigned int before the shift, to avoid it being performed as int?

Hmm, interesting question. I think you're right, and a little bit of
experimenting supports your position.
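
A minimal standalone sketch of the adjusted helper, assuming a cut-down
structure holding only the two destination fields (the real struct
irte_full has more members; the _sketch suffix is hypothetical). An 8-bit
unsigned int bit-field is narrower than int and so undergoes integer
promotion to int; the cast makes the shift happen in unsigned arithmetic.

```c
/* Hypothetical cut-down layout: only the two destination fields. */
struct irte_full_sketch {
    unsigned int dest_lo:24;
    unsigned int dest_hi:8;
};

static inline unsigned int get_full_dest(const struct irte_full_sketch *entry)
{
    /*
     * Without the cast, dest_hi would be promoted to (signed) int and
     * values >= 0x80 would be shifted into the sign bit.
     */
    return entry->dest_lo | ((unsigned int)entry->dest_hi << 24);
}
```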

>> @@ -280,6 +392,18 @@ int __init amd_iommu_setup_ioapic_remapp
>>              dest_mode = rte.dest_mode;
>>              dest = rte.dest.logical.logical_dest;
>>  
>> +            if ( iommu->ctrl.xt_en )
>> +            {
>> +                /*
>> +                 * In x2APIC mode we have no way of discovering the high 24
>> +                 * bits of the destination of an already enabled interrupt.
> 
> Yes we do.  We read the interrupt remapping table, which is where that
> information lives.
> 
> Any firmware driver which is following the spec won't have let an IRTE
> get cached, then played with the table without appropriate flushing.

Which interrupt remapping table? This is code which runs when we've
just freshly allocated one, with all zeros in it. I'm unaware of a protocol
to communicate whatever interrupt remapping table the firmware may
have used.

But - how would there legitimately be an enabled interrupt here in the
first place?

Jan



* [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support
  2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
                   ` (8 preceding siblings ...)
  2019-06-13 13:28 ` [Xen-devel] [PATCH RFC 9/9] AMD/IOMMU: correct IRTE updating Jan Beulich
@ 2019-06-27 15:15 ` Jan Beulich
  2019-06-27 15:19   ` [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging Jan Beulich
                     ` (9 more replies)
  9 siblings, 10 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:15 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

Despite the title this is actually all AMD IOMMU side work; all x86
side adjustments have already been carried out.

1: AMD/IOMMU: restrict feature logging
2: AMD/IOMMU: use bit field for extended feature register
3: AMD/IOMMU: use bit field for control register
4: AMD/IOMMU: use bit field for IRTE
5: AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
6: AMD/IOMMU: split amd_iommu_init_one()
7: AMD/IOMMU: allow enabling with IRQ not yet set up
8: AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
9: AMD/IOMMU: enable x2APIC mode when available
10: AMD/IOMMU: correct IRTE updating

Jan




* [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
@ 2019-06-27 15:19   ` Jan Beulich
  2019-07-01 15:37     ` Andrew Cooper
  2019-07-01 15:59     ` Woods, Brian
  2019-06-27 15:19   ` [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register Jan Beulich
                     ` (8 subsequent siblings)
  9 siblings, 2 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

The common case is all IOMMUs having the same features. Log them only
for the first IOMMU, or for any that have a differing feature set.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -62,6 +62,7 @@ void __init get_iommu_features(struct am
 {
     u32 low, high;
     int i = 0 ;
+    const struct amd_iommu *first;
     static const char *__initdata feature_str[] = {
         "- Prefetch Pages Command", 
         "- Peripheral Page Service Request", 
@@ -89,6 +90,11 @@ void __init get_iommu_features(struct am
 
     iommu->features = ((u64)high << 32) | low;
 
+    /* Don't log the same set of features over and over. */
+    first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
+    if ( iommu != first && iommu->features == first->features )
+        return;
+
     printk("AMD-Vi: IOMMU Extended Features:\n");
 
     while ( feature_str[i] )





* [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
  2019-06-27 15:19   ` [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging Jan Beulich
@ 2019-06-27 15:19   ` Jan Beulich
  2019-07-02 12:09     ` Andrew Cooper
  2019-06-27 15:20   ` [Xen-devel] [PATCH v2 03/10] AMD/IOMMU: use bit field for control register Jan Beulich
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

This also takes care of several of the shift values wrongly having been
specified as hex rather than dec.

Take the opportunity and add further fields.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Correct sats_sup position and name. Re-base over new earlier patch.

--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -60,49 +60,78 @@ static int __init get_iommu_capabilities
 
 void __init get_iommu_features(struct amd_iommu *iommu)
 {
-    u32 low, high;
-    int i = 0 ;
     const struct amd_iommu *first;
-    static const char *__initdata feature_str[] = {
-        "- Prefetch Pages Command", 
-        "- Peripheral Page Service Request", 
-        "- X2APIC Supported", 
-        "- NX bit Supported", 
-        "- Guest Translation", 
-        "- Reserved bit [5]",
-        "- Invalidate All Command", 
-        "- Guest APIC supported", 
-        "- Hardware Error Registers", 
-        "- Performance Counters", 
-        NULL
-    };
-
     ASSERT( iommu->mmio_base );
 
     if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
     {
-        iommu->features = 0;
+        iommu->features.raw = 0;
         return;
     }
 
-    low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
-    high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
-
-    iommu->features = ((u64)high << 32) | low;
+    iommu->features.raw =
+        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
 
     /* Don't log the same set of features over and over. */
     first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
-    if ( iommu != first && iommu->features == first->features )
+    if ( iommu != first && iommu->features.raw == first->features.raw )
         return;
 
     printk("AMD-Vi: IOMMU Extended Features:\n");
 
-    while ( feature_str[i] )
+#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
+#define FEAT(fld, str) do { \
+    if ( MASK(fld) & (MASK(fld) - 1) ) \
+        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
+    else if ( iommu->features.raw & MASK(fld) ) \
+        printk( "- " str "\n"); \
+} while ( false )
+
+    FEAT(pref_sup,           "Prefetch Pages Command");
+    FEAT(ppr_sup,            "Peripheral Page Service Request");
+    FEAT(xt_sup,             "x2APIC");
+    FEAT(nx_sup,             "NX bit");
+    FEAT(gappi_sup,          "Guest APIC Physical Processor Interrupt");
+    FEAT(ia_sup,             "Invalidate All Command");
+    FEAT(ga_sup,             "Guest APIC");
+    FEAT(he_sup,             "Hardware Error Registers");
+    FEAT(pc_sup,             "Performance Counters");
+    FEAT(hats,               "Host Address Translation Size");
+
+    if ( iommu->features.flds.gt_sup )
     {
-        if ( amd_iommu_has_feature(iommu, i) )
-            printk( " %s\n", feature_str[i]);
-        i++;
+        FEAT(gats,           "Guest Address Translation Size");
+        FEAT(glx_sup,        "Guest CR3 Root Table Level");
+        FEAT(pas_max,        "Maximum PASID");
     }
+
+    FEAT(smif_sup,           "SMI Filter Register");
+    FEAT(smif_rc,            "SMI Filter Register Count");
+    FEAT(gam_sup,            "Guest Virtual APIC Modes");
+    FEAT(dual_ppr_log_sup,   "Dual PPR Log");
+    FEAT(dual_event_log_sup, "Dual Event Log");
+    FEAT(sats_sup,           "Secure ATS");
+    FEAT(us_sup,             "User / Supervisor Page Protection");
+    FEAT(dev_tbl_seg_sup,    "Device Table Segmentation");
+    FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
+    FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
+    FEAT(marc_sup,           "Memory Access Routing and Control");
+    FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
+    FEAT(perf_opt_sup ,      "Performance Optimization");
+    FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
+    FEAT(gio_sup,            "Guest I/O Protection");
+    FEAT(ha_sup,             "Host Access");
+    FEAT(eph_sup,            "Enhanced PPR Handling");
+    FEAT(attr_fw_sup,        "Attribute Forward");
+    FEAT(hd_sup,             "Host Dirty");
+    FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
+    FEAT(viommu_sup,         "Virtualized IOMMU");
+    FEAT(vm_guard_io_sup,    "VMGuard I/O Support");
+    FEAT(vm_table_size,      "VM Table Size");
+    FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
+
+#undef FEAT
+#undef MASK
 }
 
 int __init amd_iommu_detect_one_acpi(
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -638,7 +638,7 @@ static uint64_t iommu_mmio_read64(struct
         val = reg_to_u64(iommu->reg_status);
         break;
     case IOMMU_EXT_FEATURE_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ext_feature);
+        val = iommu->reg_ext_feature.raw;
         break;
 
     default:
@@ -802,39 +802,26 @@ int guest_iommu_set_base(struct domain *
 /* Initialize mmio read only bits */
 static void guest_iommu_reg_init(struct guest_iommu *iommu)
 {
-    uint32_t lower, upper;
+    union amd_iommu_ext_features ef = {
+        /* Support prefetch */
+        .flds.pref_sup = 1,
+        /* Support PPR log */
+        .flds.ppr_sup = 1,
+        /* Support guest translation */
+        .flds.gt_sup = 1,
+        /* Support invalidate all command */
+        .flds.ia_sup = 1,
+        /* Host translation size has 6 levels */
+        .flds.hats = HOST_ADDRESS_SIZE_6_LEVEL,
+        /* Guest translation size has 6 levels */
+        .flds.gats = GUEST_ADDRESS_SIZE_6_LEVEL,
+        /* Single level gCR3 */
+        .flds.glx_sup = GUEST_CR3_1_LEVEL,
+        /* 9 bit PASID */
+        .flds.pas_max = PASMAX_9_bit,
+    };
 
-    lower = upper = 0;
-    /* Support prefetch */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PREFSUP_SHIFT);
-    /* Support PPR log */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PPRSUP_SHIFT);
-    /* Support guest translation */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_GTSUP_SHIFT);
-    /* Support invalidate all command */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_IASUP_SHIFT);
-
-    /* Host translation size has 6 levels */
-    set_field_in_reg_u32(HOST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_HATS_MASK,
-                         IOMMU_EXT_FEATURE_HATS_SHIFT,
-                         &lower);
-    /* Guest translation size has 6 levels */
-    set_field_in_reg_u32(GUEST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GATS_MASK,
-                         IOMMU_EXT_FEATURE_GATS_SHIFT,
-                         &lower);
-    /* Single level gCR3 */
-    set_field_in_reg_u32(GUEST_CR3_1_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GLXSUP_MASK,
-                         IOMMU_EXT_FEATURE_GLXSUP_SHIFT, &lower);
-    /* 9 bit PASID */
-    set_field_in_reg_u32(PASMAX_9_bit, upper,
-                         IOMMU_EXT_FEATURE_PASMAX_MASK,
-                         IOMMU_EXT_FEATURE_PASMAX_SHIFT, &upper);
-
-    iommu->reg_ext_feature.lo = lower;
-    iommu->reg_ext_feature.hi = upper;
+    iommu->reg_ext_feature = ef;
 }
 
 static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -883,7 +883,7 @@ static void enable_iommu(struct amd_iomm
     register_iommu_event_log_in_mmio_space(iommu);
     register_iommu_exclusion_range(iommu);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         register_iommu_ppr_log_in_mmio_space(iommu);
 
     desc = irq_to_desc(iommu->msi.irq);
@@ -897,15 +897,15 @@ static void enable_iommu(struct amd_iomm
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);
 
     set_iommu_translation_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( iommu->features.flds.ia_sup )
         amd_iommu_flush_all_caches(iommu);
 
     iommu->enabled = 1;
@@ -928,10 +928,10 @@ static void disable_iommu(struct amd_iom
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_DISABLED);
 
     set_iommu_translation_control(iommu, IOMMU_CONTROL_DISABLED);
@@ -1027,7 +1027,7 @@ static int __init amd_iommu_init_one(str
 
     get_iommu_features(iommu);
 
-    if ( iommu->features )
+    if ( iommu->features.raw )
         iommuv2_enabled = 1;
 
     if ( allocate_cmd_buffer(iommu) == NULL )
@@ -1036,9 +1036,8 @@ static int __init amd_iommu_init_one(str
     if ( allocate_event_log(iommu) == NULL )
         goto error_out;
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
-        if ( allocate_ppr_log(iommu) == NULL )
-            goto error_out;
+    if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
+        goto error_out;
 
     if ( !set_iommu_interrupt_handler(iommu) )
         goto error_out;
@@ -1389,7 +1388,7 @@ void amd_iommu_resume(void)
     }
 
     /* flush all cache entries after iommu re-enabled */
-    if ( !amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( !iommu->features.flds.ia_sup )
     {
         invalidate_all_devices();
         invalidate_all_domain_pages();
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -83,7 +83,7 @@ struct amd_iommu {
     iommu_cap_t cap;
 
     u8 ht_flags;
-    u64 features;
+    union amd_iommu_ext_features features;
 
     void *mmio_base;
     unsigned long mmio_base_phys;
@@ -174,7 +174,7 @@ struct guest_iommu {
     /* MMIO regs */
     struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
     struct mmio_reg         reg_status;            /* MMIO offset 2020h */
-    struct mmio_reg         reg_ext_feature;       /* MMIO offset 0030h */
+    union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
 
     /* guest interrupt settings */
     struct guest_iommu_msi  msi;
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -346,26 +346,57 @@ struct amd_iommu_dte {
 #define IOMMU_EXCLUSION_LIMIT_HIGH_MASK		0xFFFFFFFF
 #define IOMMU_EXCLUSION_LIMIT_HIGH_SHIFT	0
 
-/* Extended Feature Register*/
+/* Extended Feature Register */
 #define IOMMU_EXT_FEATURE_MMIO_OFFSET                   0x30
-#define IOMMU_EXT_FEATURE_PREFSUP_SHIFT                 0x0
-#define IOMMU_EXT_FEATURE_PPRSUP_SHIFT                  0x1
-#define IOMMU_EXT_FEATURE_XTSUP_SHIFT                   0x2
-#define IOMMU_EXT_FEATURE_NXSUP_SHIFT                   0x3
-#define IOMMU_EXT_FEATURE_GTSUP_SHIFT                   0x4
-#define IOMMU_EXT_FEATURE_IASUP_SHIFT                   0x6
-#define IOMMU_EXT_FEATURE_GASUP_SHIFT                   0x7
-#define IOMMU_EXT_FEATURE_HESUP_SHIFT                   0x8
-#define IOMMU_EXT_FEATURE_PCSUP_SHIFT                   0x9
-#define IOMMU_EXT_FEATURE_HATS_SHIFT                    0x10
-#define IOMMU_EXT_FEATURE_HATS_MASK                     0x00000C00
-#define IOMMU_EXT_FEATURE_GATS_SHIFT                    0x12
-#define IOMMU_EXT_FEATURE_GATS_MASK                     0x00003000
-#define IOMMU_EXT_FEATURE_GLXSUP_SHIFT                  0x14
-#define IOMMU_EXT_FEATURE_GLXSUP_MASK                   0x0000C000
 
-#define IOMMU_EXT_FEATURE_PASMAX_SHIFT                  0x0
-#define IOMMU_EXT_FEATURE_PASMAX_MASK                   0x0000001F
+union amd_iommu_ext_features {
+    uint64_t raw;
+    struct {
+        unsigned int pref_sup:1;
+        unsigned int ppr_sup:1;
+        unsigned int xt_sup:1;
+        unsigned int nx_sup:1;
+        unsigned int gt_sup:1;
+        unsigned int gappi_sup:1;
+        unsigned int ia_sup:1;
+        unsigned int ga_sup:1;
+        unsigned int he_sup:1;
+        unsigned int pc_sup:1;
+        unsigned int hats:2;
+        unsigned int gats:2;
+        unsigned int glx_sup:2;
+        unsigned int smif_sup:2;
+        unsigned int smif_rc:3;
+        unsigned int gam_sup:3;
+        unsigned int dual_ppr_log_sup:2;
+        unsigned int :2;
+        unsigned int dual_event_log_sup:2;
+        unsigned int :1;
+        unsigned int sats_sup:1;
+        unsigned int pas_max:5;
+        unsigned int us_sup:1;
+        unsigned int dev_tbl_seg_sup:2;
+        unsigned int ppr_early_of_sup:1;
+        unsigned int ppr_auto_rsp_sup:1;
+        unsigned int marc_sup:2;
+        unsigned int blk_stop_mrk_sup:1;
+        unsigned int perf_opt_sup:1;
+        unsigned int msi_cap_mmio_sup:1;
+        unsigned int :1;
+        unsigned int gio_sup:1;
+        unsigned int ha_sup:1;
+        unsigned int eph_sup:1;
+        unsigned int attr_fw_sup:1;
+        unsigned int hd_sup:1;
+        unsigned int :1;
+        unsigned int inv_iotlb_type_sup:1;
+        unsigned int viommu_sup:1;
+        unsigned int vm_guard_io_sup:1;
+        unsigned int vm_table_size:4;
+        unsigned int ga_update_dis_sup:1;
+        unsigned int :2;
+    } flds;
+};
 
 /* Status Register*/
 #define IOMMU_STATUS_MMIO_OFFSET		0x2020
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -219,13 +219,6 @@ static inline int iommu_has_cap(struct a
     return !!(iommu->cap.header & (1u << bit));
 }
 
-static inline int amd_iommu_has_feature(struct amd_iommu *iommu, uint32_t bit)
-{
-    if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
-        return 0;
-    return !!(iommu->features & (1U << bit));
-}
-
 /* access tail or head pointer of ring buffer */
 static inline uint32_t iommu_get_rb_pointer(uint32_t reg)
 {





* [Xen-devel] [PATCH v2 03/10] AMD/IOMMU: use bit field for control register
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
  2019-06-27 15:19   ` [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging Jan Beulich
  2019-06-27 15:19   ` [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register Jan Beulich
@ 2019-06-27 15:20   ` Jan Beulich
  2019-07-02 12:20     ` Andrew Cooper
  2019-06-27 15:20   ` [Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE Jan Beulich
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

Also introduce a field in struct amd_iommu caching the most recently
written control register. All writes should now happen exclusively from
that cached value, such that it is guaranteed to be up to date.

Take the opportunity and add further fields. Also convert a few boolean
function parameters to bool, such that use of !! can be avoided.

Because of there now being definitions beyond bit 31, writel() also gets
replaced by writeq() when updating hardware.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Add domain_id_pne field. Mention writel() -> writeq() change.

--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -317,7 +317,7 @@ static int do_invalidate_iotlb_pages(str
 
 static int do_completion_wait(struct domain *d, cmd_entry_t *cmd)
 {
-    bool_t com_wait_int_en, com_wait_int, i, s;
+    bool com_wait_int, i, s;
     struct guest_iommu *iommu;
     unsigned long gfn;
     p2m_type_t p2mt;
@@ -354,12 +354,10 @@ static int do_completion_wait(struct dom
         unmap_domain_page(vaddr);
     }
 
-    com_wait_int_en = iommu_get_bit(iommu->reg_ctrl.lo,
-                                    IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
     com_wait_int = iommu_get_bit(iommu->reg_status.lo,
                                  IOMMU_STATUS_COMP_WAIT_INT_SHIFT);
 
-    if ( com_wait_int_en && com_wait_int )
+    if ( iommu->reg_ctrl.com_wait_int_en && com_wait_int )
         guest_iommu_deliver_msi(d);
 
     return 0;
@@ -521,40 +519,17 @@ static void guest_iommu_process_command(
     return;
 }
 
-static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t newctrl)
+static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t val)
 {
-    bool_t cmd_en, event_en, iommu_en, ppr_en, ppr_log_en;
-    bool_t cmd_en_old, event_en_old, iommu_en_old;
-    bool_t cmd_run;
-
-    iommu_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-    iommu_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    cmd_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                               IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_run = iommu_get_bit(iommu->reg_status.lo,
-                            IOMMU_STATUS_CMD_BUFFER_RUN_SHIFT);
-    event_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    event_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-
-    ppr_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-    ppr_log_en = iommu_get_bit(newctrl,
-                               IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
+    union amd_iommu_control newctrl = { .raw = val };
 
-    if ( iommu_en )
+    if ( newctrl.iommu_en )
     {
         guest_iommu_enable(iommu);
         guest_iommu_enable_dev_table(iommu);
     }
 
-    if ( iommu_en && cmd_en )
+    if ( newctrl.iommu_en && newctrl.cmd_buf_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->cmd_buffer,
                                        sizeof(cmd_entry_t));
@@ -562,7 +537,7 @@ static int guest_iommu_write_ctrl(struct
         tasklet_schedule(&iommu->cmd_buffer_tasklet);
     }
 
-    if ( iommu_en && event_en )
+    if ( newctrl.iommu_en && newctrl.event_log_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->event_log,
                                        sizeof(event_entry_t));
@@ -570,7 +545,7 @@ static int guest_iommu_write_ctrl(struct
         guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_OVERFLOW_SHIFT);
     }
 
-    if ( iommu_en && ppr_en && ppr_log_en )
+    if ( newctrl.iommu_en && newctrl.ppr_en && newctrl.ppr_log_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->ppr_log,
                                        sizeof(ppr_entry_t));
@@ -578,19 +553,21 @@ static int guest_iommu_write_ctrl(struct
         guest_iommu_clear_status(iommu, IOMMU_STATUS_PPR_LOG_OVERFLOW_SHIFT);
     }
 
-    if ( iommu_en && cmd_en_old && !cmd_en )
+    if ( newctrl.iommu_en && iommu->reg_ctrl.cmd_buf_en &&
+         !newctrl.cmd_buf_en )
     {
         /* Disable iommu command processing */
         tasklet_kill(&iommu->cmd_buffer_tasklet);
     }
 
-    if ( event_en_old && !event_en )
+    if ( iommu->reg_ctrl.event_log_en && !newctrl.event_log_en )
         guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_LOG_RUN_SHIFT);
 
-    if ( iommu_en_old && !iommu_en )
+    if ( iommu->reg_ctrl.iommu_en && !newctrl.iommu_en )
         guest_iommu_disable(iommu);
 
-    u64_to_reg(&iommu->reg_ctrl, newctrl);
+    iommu->reg_ctrl = newctrl;
+
     return 0;
 }
 
@@ -632,7 +609,7 @@ static uint64_t iommu_mmio_read64(struct
         val = reg_to_u64(iommu->ppr_log.reg_tail);
         break;
     case IOMMU_CONTROL_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ctrl);
+        val = iommu->reg_ctrl.raw;
         break;
     case IOMMU_STATUS_MMIO_OFFSET:
         val = reg_to_u64(iommu->reg_status);
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -41,7 +41,7 @@ LIST_HEAD_READ_MOSTLY(amd_iommu_head);
 struct table_struct device_table;
 bool_t iommuv2_enabled;
 
-static int iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
+static bool iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
 {
     return iommu->ht_flags & mask;
 }
@@ -69,31 +69,18 @@ static void __init unmap_iommu_mmio_regi
 
 static void set_iommu_ht_flags(struct amd_iommu *iommu)
 {
-    u32 entry;
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
     /* Setup HT flags */
     if ( iommu_has_cap(iommu, PCI_CAP_HT_TUNNEL_SHIFT) )
-        iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE) ?
-            iommu_set_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT) :
-            iommu_clear_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT);
+        iommu->ctrl.ht_tun_en = iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE);
+
+    iommu->ctrl.pass_pw     = iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW);
+    iommu->ctrl.res_pass_pw = iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW);
+    iommu->ctrl.isoc        = iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC);
 
     /* Force coherent */
-    iommu_set_bit(&entry, IOMMU_CONTROL_COHERENT_SHIFT);
+    iommu->ctrl.coherent = 1;
 
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void register_iommu_dev_table_in_mmio_space(struct amd_iommu *iommu)
@@ -205,55 +192,37 @@ static void register_iommu_ppr_log_in_mm
 
 
 static void set_iommu_translation_control(struct amd_iommu *iommu,
-                                                 int enable)
+                                          bool enable)
 {
-    u32 entry;
+    iommu->ctrl.iommu_en = enable;
 
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void set_iommu_guest_translation_control(struct amd_iommu *iommu,
-                                                int enable)
+                                                bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.gt_en = enable;
 
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 
     if ( enable )
         AMD_IOMMU_DEBUG("Guest Translation Enabled.\n");
 }
 
 static void set_iommu_command_buffer_control(struct amd_iommu *iommu,
-                                                    int enable)
+                                             bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
     }
-    else
-        iommu_clear_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
 
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.cmd_buf_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void register_iommu_exclusion_range(struct amd_iommu *iommu)
@@ -295,57 +264,38 @@ static void register_iommu_exclusion_ran
 }
 
 static void set_iommu_event_log_control(struct amd_iommu *iommu,
-            int enable)
+                                        bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
     }
 
-    iommu_clear_bit(&entry, IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
+    iommu->ctrl.event_int_en = enable;
+    iommu->ctrl.event_log_en = enable;
+    iommu->ctrl.com_wait_int_en = 0;
 
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void set_iommu_ppr_log_control(struct amd_iommu *iommu,
-                                      int enable)
+                                      bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
     }
 
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.ppr_en = enable;
+    iommu->ctrl.ppr_int_en = enable;
+    iommu->ctrl.ppr_log_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+
     if ( enable )
         AMD_IOMMU_DEBUG("PPR Log Enabled.\n");
 }
@@ -398,7 +348,7 @@ static int iommu_read_log(struct amd_iom
 /* reset event log or ppr log when overflow */
 static void iommu_reset_log(struct amd_iommu *iommu,
                             struct ring_buffer *log,
-                            void (*ctrl_func)(struct amd_iommu *iommu, int))
+                            void (*ctrl_func)(struct amd_iommu *iommu, bool))
 {
     u32 entry;
     int log_run, run_bit;
@@ -615,11 +565,11 @@ static void iommu_check_event_log(struct
         iommu_reset_log(iommu, &iommu->event_log, set_iommu_event_log_control);
     else
     {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_EVENT_LOG_INT_MASK) )
+        if ( !iommu->ctrl.event_int_en )
         {
-            entry |= IOMMU_CONTROL_EVENT_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.event_int_en = 1;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
             /*
              * Re-schedule the tasklet to handle eventual log entries added
              * between reading the log above and re-enabling the interrupt.
@@ -704,11 +654,11 @@ static void iommu_check_ppr_log(struct a
         iommu_reset_log(iommu, &iommu->ppr_log, set_iommu_ppr_log_control);
     else
     {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_PPR_LOG_INT_MASK) )
+        if ( !iommu->ctrl.ppr_int_en )
         {
-            entry |= IOMMU_CONTROL_PPR_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.ppr_int_en = 1;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
             /*
              * Re-schedule the tasklet to handle eventual log entries added
              * between reading the log above and re-enabling the interrupt.
@@ -754,7 +704,6 @@ static void do_amd_iommu_irq(unsigned lo
 static void iommu_interrupt_handler(int irq, void *dev_id,
                                     struct cpu_user_regs *regs)
 {
-    u32 entry;
     unsigned long flags;
     struct amd_iommu *iommu = dev_id;
 
@@ -764,10 +713,9 @@ static void iommu_interrupt_handler(int
      * Silence interrupts from both event and PPR by clearing the
      * enable logging bits in the control register
      */
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.event_int_en = 0;
+    iommu->ctrl.ppr_int_en = 0;
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 
     spin_unlock_irqrestore(&iommu->lock, flags);
 
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -88,6 +88,8 @@ struct amd_iommu {
     void *mmio_base;
     unsigned long mmio_base_phys;
 
+    union amd_iommu_control ctrl;
+
     struct table_struct dev_table;
     struct ring_buffer cmd_buffer;
     struct ring_buffer event_log;
@@ -172,7 +174,7 @@ struct guest_iommu {
     uint64_t                mmio_base;             /* MMIO base address */
 
     /* MMIO regs */
-    struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
+    union amd_iommu_control reg_ctrl;              /* MMIO offset 0018h */
     struct mmio_reg         reg_status;            /* MMIO offset 2020h */
     union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -295,38 +295,56 @@ struct amd_iommu_dte {
 
 /* Control Register */
 #define IOMMU_CONTROL_MMIO_OFFSET			0x18
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_MASK		0x00000001
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT		0
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_MASK	0x00000002
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT	1
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_MASK		0x00000004
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT		2
-#define IOMMU_CONTROL_EVENT_LOG_INT_MASK		0x00000008
-#define IOMMU_CONTROL_EVENT_LOG_INT_SHIFT		3
-#define IOMMU_CONTROL_COMP_WAIT_INT_MASK		0x00000010
-#define IOMMU_CONTROL_COMP_WAIT_INT_SHIFT		4
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_MASK		0x000000E0
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_SHIFT	5
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_MASK		0x00000100
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT		8
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_MASK	0x00000200
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT	9
-#define IOMMU_CONTROL_COHERENT_MASK			0x00000400
-#define IOMMU_CONTROL_COHERENT_SHIFT			10
-#define IOMMU_CONTROL_ISOCHRONOUS_MASK			0x00000800
-#define IOMMU_CONTROL_ISOCHRONOUS_SHIFT			11
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_MASK	0x00001000
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT	12
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_MASK		0x00002000
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT		13
-#define IOMMU_CONTROL_PPR_LOG_INT_MASK			0x00004000
-#define IOMMU_CONTROL_PPR_LOG_INT_SHIFT			14
-#define IOMMU_CONTROL_PPR_ENABLE_MASK			0x00008000
-#define IOMMU_CONTROL_PPR_ENABLE_SHIFT			15
-#define IOMMU_CONTROL_GT_ENABLE_MASK			0x00010000
-#define IOMMU_CONTROL_GT_ENABLE_SHIFT			16
-#define IOMMU_CONTROL_RESTART_MASK			0x80000000
-#define IOMMU_CONTROL_RESTART_SHIFT			31
+
+union amd_iommu_control {
+    uint64_t raw;
+    struct {
+        unsigned int iommu_en:1;
+        unsigned int ht_tun_en:1;
+        unsigned int event_log_en:1;
+        unsigned int event_int_en:1;
+        unsigned int com_wait_int_en:1;
+        unsigned int inv_timeout:3;
+        unsigned int pass_pw:1;
+        unsigned int res_pass_pw:1;
+        unsigned int coherent:1;
+        unsigned int isoc:1;
+        unsigned int cmd_buf_en:1;
+        unsigned int ppr_log_en:1;
+        unsigned int ppr_int_en:1;
+        unsigned int ppr_en:1;
+        unsigned int gt_en:1;
+        unsigned int ga_en:1;
+        unsigned int crw:4;
+        unsigned int smif_en:1;
+        unsigned int slf_wb_dis:1;
+        unsigned int smif_log_en:1;
+        unsigned int gam_en:3;
+        unsigned int ga_log_en:1;
+        unsigned int ga_int_en:1;
+        unsigned int dual_ppr_log_en:2;
+        unsigned int dual_event_log_en:2;
+        unsigned int dev_tbl_seg_en:3;
+        unsigned int priv_abrt_en:2;
+        unsigned int ppr_auto_rsp_en:1;
+        unsigned int marc_en:1;
+        unsigned int blk_stop_mrk_en:1;
+        unsigned int ppr_auto_rsp_aon:1;
+        unsigned int domain_id_pne:1;
+        unsigned int :1;
+        unsigned int eph_en:1;
+        unsigned int had_update:2;
+        unsigned int gd_update_dis:1;
+        unsigned int :1;
+        unsigned int xt_en:1;
+        unsigned int int_cap_xt_en:1;
+        unsigned int vcmd_en:1;
+        unsigned int viommu_en:1;
+        unsigned int ga_update_dis:1;
+        unsigned int gappi_en:1;
+        unsigned int :8;
+    };
+};
 
 /* Exclusion Register */
 #define IOMMU_EXCLUSION_BASE_LOW_OFFSET		0x20





* [Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
                     ` (2 preceding siblings ...)
  2019-06-27 15:20   ` [Xen-devel] [PATCH v2 03/10] AMD/IOMMU: use bit field for control register Jan Beulich
@ 2019-06-27 15:20   ` Jan Beulich
  2019-07-02 12:33     ` Andrew Cooper
  2019-06-27 15:21   ` [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format Jan Beulich
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

At the same time restrict its scope to just the single source file
actually using it, and abstract accesses by introducing a union of
pointers. (A union of the actual table entries is deliberately not used,
so that it is impossible to [wrongly, once the 128-bit form gets added]
perform pointer arithmetic / array accesses on derived types.)

Also move away from updating the entries piecemeal: Construct a full new
entry, and write it out.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: name {get,free}_intremap_entry()'s last parameter "index" instead of
    "offset". Introduce union irte32.
---
It would have been nice to use write_atomic() or ACCESS_ONCE() for the
actual writes, but both cast the value to a scalar one, which doesn't
suit us here (and I also didn't want to make the compound type a union
with a raw member just for this).
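
For illustration only, a minimal standalone sketch of the union-of-pointers
idea and of building a full new entry before a single store. Assumptions:
the demo_* names and DEMO_ENTRIES are hypothetical, a plain volatile store
stands in for Xen's ACCESS_ONCE(), a local union replaces the patch's
container_of() expression, and bit-field layout is taken to be the usual
GCC/Clang x86 one.

```c
#include <assert.h>
#include <stdint.h>

/* Same field layout as struct irte_basic in the patch. */
struct irte_basic {
    unsigned int remap_en:1;
    unsigned int sup_io_pf:1;
    unsigned int int_type:3;
    unsigned int rq_eoi:1;
    unsigned int dm:1;
    unsigned int guest_mode:1; /* MBZ */
    unsigned int dest:8;
    unsigned int vector:8;
    unsigned int :8;
};

union irte32 {
    uint32_t raw[1];
    struct irte_basic basic;
};

/* Union of pointers: arithmetic is only possible on a typed member. */
union irte_ptr {
    void *ptr;
    union irte32 *ptr32;
};

static_assert(sizeof(union irte32) == 4, "basic IRTE must be 32 bits");

#define DEMO_ENTRIES 16 /* hypothetical table size, not Xen's */

/* Locate entry 'index' in a table of 32-bit entries. */
static union irte_ptr demo_get_entry(void *table, unsigned int index)
{
    union irte_ptr p = { .ptr = table };

    assert(table && index < DEMO_ENTRIES);
    p.ptr32 += index;

    return p;
}

/* Build the complete new entry first, then publish it with one store. */
static void demo_update_entry(union irte_ptr entry, unsigned int vector,
                              unsigned int int_type, unsigned int dest_mode,
                              unsigned int dest)
{
    union irte32 new = {
        .basic = {
            .remap_en = 1,
            .int_type = int_type,
            .dm = dest_mode,
            .dest = dest,
            .vector = vector,
        },
    };

    *(volatile uint32_t *)&entry.ptr32->raw[0] = new.raw[0];
}
```

Since the entry is written as one 32-bit value, a reader should never
observe a half-updated entry, which is the point of moving away from
field-by-field updates.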

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -23,6 +23,28 @@
 #include <asm/io_apic.h>
 #include <xen/keyhandler.h>
 
+struct irte_basic {
+    unsigned int remap_en:1;
+    unsigned int sup_io_pf:1;
+    unsigned int int_type:3;
+    unsigned int rq_eoi:1;
+    unsigned int dm:1;
+    unsigned int guest_mode:1; /* MBZ */
+    unsigned int dest:8;
+    unsigned int vector:8;
+    unsigned int :8;
+};
+
+union irte32 {
+    uint32_t raw[1];
+    struct irte_basic basic;
+};
+
+union irte_ptr {
+    void *ptr;
+    union irte32 *ptr32;
+};
+
 #define INTREMAP_TABLE_ORDER    1
 #define INTREMAP_LENGTH 0xB
 #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
@@ -101,47 +123,46 @@ static unsigned int alloc_intremap_entry
     return slot;
 }
 
-static u32 *get_intremap_entry(int seg, int bdf, int offset)
+static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
+                                         unsigned int index)
 {
-    u32 *table = get_ivrs_mappings(seg)[bdf].intremap_table;
+    union irte_ptr table = {
+        .ptr = get_ivrs_mappings(seg)[bdf].intremap_table
+    };
+
+    ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
 
-    ASSERT( (table != NULL) && (offset < INTREMAP_ENTRIES) );
+    table.ptr32 += index;
 
-    return table + offset;
+    return table;
 }
 
-static void free_intremap_entry(int seg, int bdf, int offset)
-{
-    u32 *entry = get_intremap_entry(seg, bdf, offset);
-
-    memset(entry, 0, sizeof(u32));
-    __clear_bit(offset, get_ivrs_mappings(seg)[bdf].intremap_inuse);
-}
-
-static void update_intremap_entry(u32* entry, u8 vector, u8 int_type,
-    u8 dest_mode, u8 dest)
-{
-    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
-                            INT_REMAP_ENTRY_REMAPEN_MASK,
-                            INT_REMAP_ENTRY_REMAPEN_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_SUPIOPF_MASK,
-                            INT_REMAP_ENTRY_SUPIOPF_SHIFT, entry);
-    set_field_in_reg_u32(int_type, *entry,
-                            INT_REMAP_ENTRY_INTTYPE_MASK,
-                            INT_REMAP_ENTRY_INTTYPE_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_REQEOI_MASK,
-                            INT_REMAP_ENTRY_REQEOI_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest_mode, *entry,
-                            INT_REMAP_ENTRY_DM_MASK,
-                            INT_REMAP_ENTRY_DM_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest, *entry,
-                            INT_REMAP_ENTRY_DEST_MAST,
-                            INT_REMAP_ENTRY_DEST_SHIFT, entry);
-    set_field_in_reg_u32((u32)vector, *entry,
-                            INT_REMAP_ENTRY_VECTOR_MASK,
-                            INT_REMAP_ENTRY_VECTOR_SHIFT, entry);
+static void free_intremap_entry(unsigned int seg, unsigned int bdf,
+                                unsigned int index)
+{
+    union irte_ptr entry = get_intremap_entry(seg, bdf, index);
+
+    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+
+    __clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
+}
+
+static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
+                                  unsigned int int_type,
+                                  unsigned int dest_mode, unsigned int dest)
+{
+    struct irte_basic basic = {
+        .remap_en = 1,
+        .sup_io_pf = 0,
+        .int_type = int_type,
+        .rq_eoi = 0,
+        .dm = dest_mode,
+        .dest = dest,
+        .vector = vector,
+    };
+
+    ACCESS_ONCE(entry.ptr32->raw[0]) =
+        container_of(&basic, union irte32, basic)->raw[0];
 }
 
 static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -163,7 +184,7 @@ static int update_intremap_entry_from_io
     u16 *index)
 {
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     u8 delivery_mode, dest, vector, dest_mode;
     int req_id;
     spinlock_t *lock;
@@ -201,12 +222,8 @@ static int update_intremap_entry_from_io
          * so need to recover vector and delivery mode from IRTE.
          */
         ASSERT(get_rte_index(rte) == offset);
-        vector = get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
-        delivery_mode = get_field_from_reg_u32(*entry,
-                                               INT_REMAP_ENTRY_INTTYPE_MASK,
-                                               INT_REMAP_ENTRY_INTTYPE_SHIFT);
+        vector = entry.ptr32->basic.vector;
+        delivery_mode = entry.ptr32->basic.int_type;
     }
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
 
@@ -228,7 +245,7 @@ int __init amd_iommu_setup_ioapic_remapp
 {
     struct IO_APIC_route_entry rte;
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     int apic, pin;
     u8 delivery_mode, dest, vector, dest_mode;
     u16 seg, bdf, req_id;
@@ -407,16 +424,14 @@ unsigned int amd_iommu_read_ioapic_from_
         u16 bdf = ioapic_sbdf[idx].bdf;
         u16 seg = ioapic_sbdf[idx].seg;
         u16 req_id = get_intremap_requestor_id(seg, bdf);
-        const u32 *entry = get_intremap_entry(seg, req_id, offset);
+        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
 
         ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
         val &= ~(INTREMAP_ENTRIES - 1);
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_INTTYPE_MASK,
-                                      INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_VECTOR_MASK,
-                                      INT_REMAP_ENTRY_VECTOR_SHIFT);
+        val |= MASK_INSR(entry.ptr32->basic.int_type,
+                         IO_APIC_REDIR_DELIV_MODE_MASK);
+        val |= MASK_INSR(entry.ptr32->basic.vector,
+                         IO_APIC_REDIR_VECTOR_MASK);
     }
 
     return val;
@@ -427,7 +442,7 @@ static int update_intremap_entry_from_ms
     int *remap_index, const struct msi_msg *msg, u32 *data)
 {
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     u16 req_id, alias_id;
     u8 delivery_mode, dest, vector, dest_mode;
     spinlock_t *lock;
@@ -581,7 +596,7 @@ void amd_iommu_read_msi_from_ire(
     const struct pci_dev *pdev = msi_desc->dev;
     u16 bdf = pdev ? PCI_BDF2(pdev->bus, pdev->devfn) : hpet_sbdf.bdf;
     u16 seg = pdev ? pdev->seg : hpet_sbdf.seg;
-    const u32 *entry;
+    union irte_ptr entry;
 
     if ( IS_ERR_OR_NULL(_find_iommu_for_device(seg, bdf)) )
         return;
@@ -597,12 +612,10 @@ void amd_iommu_read_msi_from_ire(
     }
 
     msg->data &= ~(INTREMAP_ENTRIES - 1);
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_INTTYPE_MASK,
-                                        INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
+    msg->data |= MASK_INSR(entry.ptr32->basic.int_type,
+                           MSI_DATA_DELIVERY_MODE_MASK);
+    msg->data |= MASK_INSR(entry.ptr32->basic.vector,
+                           MSI_DATA_VECTOR_MASK);
 }
 
 int __init amd_iommu_free_intremap_table(
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -469,22 +469,6 @@ struct amd_iommu_pte {
 #define IOMMU_CONTROL_DISABLED	0
 #define IOMMU_CONTROL_ENABLED	1
 
-/* interrupt remapping table */
-#define INT_REMAP_ENTRY_REMAPEN_MASK    0x00000001
-#define INT_REMAP_ENTRY_REMAPEN_SHIFT   0
-#define INT_REMAP_ENTRY_SUPIOPF_MASK    0x00000002
-#define INT_REMAP_ENTRY_SUPIOPF_SHIFT   1
-#define INT_REMAP_ENTRY_INTTYPE_MASK    0x0000001C
-#define INT_REMAP_ENTRY_INTTYPE_SHIFT   2
-#define INT_REMAP_ENTRY_REQEOI_MASK     0x00000020
-#define INT_REMAP_ENTRY_REQEOI_SHIFT    5
-#define INT_REMAP_ENTRY_DM_MASK         0x00000040
-#define INT_REMAP_ENTRY_DM_SHIFT        6
-#define INT_REMAP_ENTRY_DEST_MAST       0x0000FF00
-#define INT_REMAP_ENTRY_DEST_SHIFT      8
-#define INT_REMAP_ENTRY_VECTOR_MASK     0x00FF0000
-#define INT_REMAP_ENTRY_VECTOR_SHIFT    16
-
 #define INV_IOMMU_ALL_PAGES_ADDRESS      ((1ULL << 63) - 1)
 
 #define IOMMU_RING_BUFFER_PTR_MASK                  0x0007FFF0





* [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
                     ` (3 preceding siblings ...)
  2019-06-27 15:20   ` [Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE Jan Beulich
@ 2019-06-27 15:21   ` Jan Beulich
  2019-07-02 14:41     ` Andrew Cooper
  2019-06-27 15:21   ` [Xen-devel] [PATCH v2 06/10] AMD/IOMMU: split amd_iommu_init_one() Jan Beulich
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

This is in preparation for actually enabling x2APIC mode, which requires
this wider IRTE format to be used.

A specific remark regarding the first hunk changing
amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
tables when creating new one"). Other code introduced by that change has
meanwhile disappeared or further changed, and I wonder if - rather than
adding an x2apic_enabled check to the conditional - the bypass couldn't
be deleted altogether. For now the goal is to affect the non-x2APIC
paths as little as possible.

Take the liberty of using the new "fresh" flag to suppress an unneeded
flush in update_intremap_entry_from_ioapic().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Add cast in get_full_dest(). Re-base over changes earlier in the
    series. Don't use cmpxchg16b. Use barrier() instead of wmb().
---
Note that AMD's doc says Lowest Priority ("Arbitrated" by their naming)
mode is unavailable in x2APIC mode, but they've confirmed this to be a
mistake on their part.
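
For illustration only, a standalone sketch of how a 32-bit x2APIC
destination is split across the wide entry and of the update order used
in place of cmpxchg16b: clear the low half (which holds remap_en), write
the high half, then write the new low half. Assumptions: the demo_* names
are hypothetical, the compiler barrier is spelled as GCC/Clang inline asm
rather than Xen's barrier(), a volatile store stands in for ACCESS_ONCE(),
and bit-field layout is the usual GCC/Clang x86 one.

```c
#include <assert.h>
#include <stdint.h>

/* Same field layout as struct irte_full in the patch. */
struct irte_full {
    unsigned int remap_en:1;
    unsigned int sup_io_pf:1;
    unsigned int int_type:3;
    unsigned int rq_eoi:1;
    unsigned int dm:1;
    unsigned int guest_mode:1; /* MBZ */
    unsigned int dest_lo:24;
    unsigned int :32;
    unsigned int vector:8;
    unsigned int :24;
    unsigned int :24;
    unsigned int dest_hi:8;
};

union irte128 {
    uint64_t raw[2];
    struct irte_full full;
};

static_assert(sizeof(union irte128) == 16, "full IRTE must be 128 bits");

/* Compiler-only barrier (GCC/Clang syntax); no CPU fence is emitted. */
#define demo_barrier() __asm__ __volatile__ ( "" ::: "memory" )

/* Reassemble the 32-bit destination from its two fields. */
static unsigned int demo_full_dest(const union irte128 *entry)
{
    return entry->full.dest_lo | ((unsigned int)entry->full.dest_hi << 24);
}

static void demo_update_entry128(union irte128 *entry, unsigned int vector,
                                 unsigned int int_type,
                                 unsigned int dest_mode, unsigned int dest)
{
    union irte128 new = {
        .full = {
            .remap_en = 1,
            .int_type = int_type,
            .dm = dest_mode,
            .dest_lo = dest & 0xffffff, /* low 24 bits */
            .dest_hi = dest >> 24,      /* high 8 bits */
            .vector = vector,
        },
    };

    /* Step 1: disable the entry by clearing the half holding remap_en. */
    *(volatile uint64_t *)&entry->raw[0] = 0;
    demo_barrier();
    /* Step 2: the high half can change while the entry is disabled. */
    entry->raw[1] = new.raw[1];
    demo_barrier();
    /* Step 3: publish the low half, re-enabling the entry. */
    *(volatile uint64_t *)&entry->raw[0] = new.raw[0];
}
```

The intermediate all-zero low half means the entry is briefly not
remapping rather than briefly inconsistent; whether that window is
acceptable for a given caller is outside what this sketch shows.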

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -40,12 +40,45 @@ union irte32 {
     struct irte_basic basic;
 };
 
+struct irte_full {
+    unsigned int remap_en:1;
+    unsigned int sup_io_pf:1;
+    unsigned int int_type:3;
+    unsigned int rq_eoi:1;
+    unsigned int dm:1;
+    unsigned int guest_mode:1; /* MBZ */
+    unsigned int dest_lo:24;
+    unsigned int :32;
+    unsigned int vector:8;
+    unsigned int :24;
+    unsigned int :24;
+    unsigned int dest_hi:8;
+};
+
+union irte128 {
+    uint64_t raw[2];
+    struct irte_full full;
+};
+
+static enum {
+    irte32,
+    irte128,
+    irteUNK,
+} irte_mode __read_mostly = irteUNK;
+
 union irte_ptr {
     void *ptr;
     union irte32 *ptr32;
+    union irte128 *ptr128;
 };
 
-#define INTREMAP_TABLE_ORDER    1
+union irte_cptr {
+    const void *ptr;
+    const union irte32 *ptr32;
+    const union irte128 *ptr128;
+} __transparent__;
+
+#define INTREMAP_TABLE_ORDER (irte_mode == irte32 ? 1 : 3)
 #define INTREMAP_LENGTH 0xB
 #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
 
@@ -132,7 +165,19 @@ static union irte_ptr get_intremap_entry
 
     ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
 
-    table.ptr32 += index;
+    switch ( irte_mode )
+    {
+    case irte32:
+        table.ptr32 += index;
+        break;
+
+    case irte128:
+        table.ptr128 += index;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
 
     return table;
 }
@@ -142,7 +187,21 @@ static void free_intremap_entry(unsigned
 {
     union irte_ptr entry = get_intremap_entry(seg, bdf, index);
 
-    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+    switch ( irte_mode )
+    {
+    case irte32:
+        ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+        break;
+
+    case irte128:
+        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
+        barrier();
+        entry.ptr128->raw[1] = 0;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
 
     __clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
 }
@@ -160,9 +219,37 @@ static void update_intremap_entry(union
         .dest = dest,
         .vector = vector,
     };
+    struct irte_full full = {
+        .remap_en = 1,
+        .sup_io_pf = 0,
+        .int_type = int_type,
+        .rq_eoi = 0,
+        .dm = dest_mode,
+        .dest_lo = dest,
+        .dest_hi = dest >> 24,
+        .vector = vector,
+    };
+
+    switch ( irte_mode )
+    {
+    case irte32:
+        ACCESS_ONCE(entry.ptr32->raw[0]) =
+            container_of(&basic, union irte32, basic)->raw[0];
+        break;
+
+    case irte128:
+        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
+        barrier();
+        entry.ptr128->raw[1] =
+            container_of(&full, union irte128, full)->raw[1];
+        barrier();
+        ACCESS_ONCE(entry.ptr128->raw[0]) =
+            container_of(&full, union irte128, full)->raw[0];
+        break;
 
-    ACCESS_ONCE(entry.ptr32->raw[0]) =
-        container_of(&basic, union irte32, basic)->raw[0];
+    default:
+        ASSERT_UNREACHABLE();
+    }
 }
 
 static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -176,6 +263,11 @@ static inline void set_rte_index(struct
     rte->delivery_mode = offset >> 8;
 }
 
+static inline unsigned int get_full_dest(const union irte128 *entry)
+{
+    return entry->full.dest_lo | ((unsigned int)entry->full.dest_hi << 24);
+}
+
 static int update_intremap_entry_from_ioapic(
     int bdf,
     struct amd_iommu *iommu,
@@ -185,10 +277,11 @@ static int update_intremap_entry_from_io
 {
     unsigned long flags;
     union irte_ptr entry;
-    u8 delivery_mode, dest, vector, dest_mode;
+    unsigned int delivery_mode, dest, vector, dest_mode;
     int req_id;
     spinlock_t *lock;
     unsigned int offset;
+    bool fresh = false;
 
     req_id = get_intremap_requestor_id(iommu->seg, bdf);
     lock = get_intremap_lock(iommu->seg, req_id);
@@ -196,7 +289,7 @@ static int update_intremap_entry_from_io
     delivery_mode = rte->delivery_mode;
     vector = rte->vector;
     dest_mode = rte->dest_mode;
-    dest = rte->dest.logical.logical_dest;
+    dest = x2apic_enabled ? rte->dest.dest32 : rte->dest.logical.logical_dest;
 
     spin_lock_irqsave(lock, flags);
 
@@ -211,25 +304,40 @@ static int update_intremap_entry_from_io
             return -ENOSPC;
         }
         *index = offset;
-        lo_update = 1;
+        fresh = true;
     }
 
     entry = get_intremap_entry(iommu->seg, req_id, offset);
-    if ( !lo_update )
+    if ( fresh )
+        /* nothing */;
+    else if ( !lo_update )
     {
         /*
          * Low half of incoming RTE is already in remapped format,
          * so need to recover vector and delivery mode from IRTE.
          */
         ASSERT(get_rte_index(rte) == offset);
-        vector = entry.ptr32->basic.vector;
+        if ( irte_mode == irte32 )
+            vector = entry.ptr32->basic.vector;
+        else
+            vector = entry.ptr128->full.vector;
+        /* The IntType fields match for both formats. */
         delivery_mode = entry.ptr32->basic.int_type;
     }
+    else if ( x2apic_enabled )
+    {
+        /*
+         * High half of incoming RTE was read from the I/O APIC and hence may
+         * not hold the full destination, so need to recover full destination
+         * from IRTE.
+         */
+        dest = get_full_dest(entry.ptr128);
+    }
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
 
     spin_unlock_irqrestore(lock, flags);
 
-    if ( iommu->enabled )
+    if ( iommu->enabled && !fresh )
     {
         spin_lock_irqsave(&iommu->lock, flags);
         amd_iommu_flush_intremap(iommu, req_id);
@@ -253,6 +361,19 @@ int __init amd_iommu_setup_ioapic_remapp
     spinlock_t *lock;
     unsigned int offset;
 
+    for_each_amd_iommu ( iommu )
+    {
+        if ( irte_mode != irteUNK )
+        {
+            if ( iommu->ctrl.ga_en == (irte_mode == irte32) )
+                return -ENXIO;
+        }
+        else if ( iommu->ctrl.ga_en )
+            irte_mode = irte128;
+        else
+            irte_mode = irte32;
+    }
+
     /* Read ioapic entries and update interrupt remapping table accordingly */
     for ( apic = 0; apic < nr_ioapics; apic++ )
     {
@@ -287,6 +408,18 @@ int __init amd_iommu_setup_ioapic_remapp
             dest_mode = rte.dest_mode;
             dest = rte.dest.logical.logical_dest;
 
+            if ( iommu->ctrl.xt_en )
+            {
+                /*
+                 * In x2APIC mode we have no way of discovering the high 24
+                 * bits of the destination of an already enabled interrupt.
+                 * We come here earlier than for xAPIC mode, so no interrupts
+                 * should have been set up before.
+                 */
+                AMD_IOMMU_DEBUG("Unmasked IO-APIC#%u entry %u in x2APIC mode\n",
+                                IO_APIC_ID(apic), pin);
+            }
+
             spin_lock_irqsave(lock, flags);
             offset = alloc_intremap_entry(seg, req_id, 1);
             BUG_ON(offset >= INTREMAP_ENTRIES);
@@ -321,7 +454,8 @@ void amd_iommu_ioapic_update_ire(
     struct IO_APIC_route_entry new_rte = { 0 };
     unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
     unsigned int pin = (reg - 0x10) / 2;
-    int saved_mask, seg, bdf, rc;
+    int seg, bdf, rc;
+    bool saved_mask, fresh = false;
     struct amd_iommu *iommu;
     unsigned int idx;
 
@@ -363,12 +497,22 @@ void amd_iommu_ioapic_update_ire(
         *(((u32 *)&new_rte) + 1) = value;
     }
 
-    if ( new_rte.mask &&
-         ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
+    if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
     {
         ASSERT(saved_mask);
-        __io_apic_write(apic, reg, value);
-        return;
+
+        /*
+         * There's nowhere except the IRTE to store a full 32-bit destination,
+         * so we may not bypass entry allocation and updating of the low RTE
+         * half in the (usual) case of the high RTE half getting written first.
+         */
+        if ( new_rte.mask && !x2apic_enabled )
+        {
+            __io_apic_write(apic, reg, value);
+            return;
+        }
+
+        fresh = true;
     }
 
     /* mask the interrupt while we change the intremap table */
@@ -397,8 +541,12 @@ void amd_iommu_ioapic_update_ire(
     if ( reg == rte_lo )
         return;
 
-    /* unmask the interrupt after we have updated the intremap table */
-    if ( !saved_mask )
+    /*
+     * Unmask the interrupt after we have updated the intremap table. Also
+     * write the low half if a fresh entry was allocated for a high half
+     * update in x2APIC mode.
+     */
+    if ( !saved_mask || (x2apic_enabled && fresh) )
     {
         old_rte.mask = saved_mask;
         __io_apic_write(apic, rte_lo, *((u32 *)&old_rte));
@@ -412,27 +560,36 @@ unsigned int amd_iommu_read_ioapic_from_
     unsigned int offset;
     unsigned int val = __io_apic_read(apic, reg);
     unsigned int pin = (reg - 0x10) / 2;
+    uint16_t seg, req_id;
+    union irte_ptr entry;
 
     idx = ioapic_id_to_index(IO_APIC_ID(apic));
     if ( idx == MAX_IO_APICS )
         return -EINVAL;
 
     offset = ioapic_sbdf[idx].pin_2_idx[pin];
+    if ( offset >= INTREMAP_ENTRIES )
+        return val;
 
-    if ( !(reg & 1) && offset < INTREMAP_ENTRIES )
+    seg = ioapic_sbdf[idx].seg;
+    req_id = get_intremap_requestor_id(seg, ioapic_sbdf[idx].bdf);
+    entry = get_intremap_entry(seg, req_id, offset);
+
+    if ( !(reg & 1) )
     {
-        u16 bdf = ioapic_sbdf[idx].bdf;
-        u16 seg = ioapic_sbdf[idx].seg;
-        u16 req_id = get_intremap_requestor_id(seg, bdf);
-        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
 
         ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
         val &= ~(INTREMAP_ENTRIES - 1);
+        /* The IntType fields match for both formats. */
         val |= MASK_INSR(entry.ptr32->basic.int_type,
                          IO_APIC_REDIR_DELIV_MODE_MASK);
-        val |= MASK_INSR(entry.ptr32->basic.vector,
+        val |= MASK_INSR(irte_mode == irte32
+                         ? entry.ptr32->basic.vector
+                         : entry.ptr128->full.vector,
                          IO_APIC_REDIR_VECTOR_MASK);
     }
+    else if ( x2apic_enabled )
+        val = get_full_dest(entry.ptr128);
 
     return val;
 }
@@ -444,9 +601,9 @@ static int update_intremap_entry_from_ms
     unsigned long flags;
     union irte_ptr entry;
     u16 req_id, alias_id;
-    u8 delivery_mode, dest, vector, dest_mode;
+    uint8_t delivery_mode, vector, dest_mode;
     spinlock_t *lock;
-    unsigned int offset, i;
+    unsigned int dest, offset, i;
 
     req_id = get_dma_requestor_id(iommu->seg, bdf);
     alias_id = get_intremap_requestor_id(iommu->seg, bdf);
@@ -467,7 +624,12 @@ static int update_intremap_entry_from_ms
     dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
     delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
-    dest = (msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff;
+
+    if ( x2apic_enabled )
+        dest = msg->dest32;
+    else
+        dest = MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK);
+
     offset = *remap_index;
     if ( offset >= INTREMAP_ENTRIES )
     {
@@ -612,10 +774,21 @@ void amd_iommu_read_msi_from_ire(
     }
 
     msg->data &= ~(INTREMAP_ENTRIES - 1);
+    /* The IntType fields match for both formats. */
     msg->data |= MASK_INSR(entry.ptr32->basic.int_type,
                            MSI_DATA_DELIVERY_MODE_MASK);
-    msg->data |= MASK_INSR(entry.ptr32->basic.vector,
-                           MSI_DATA_VECTOR_MASK);
+    if ( irte_mode == irte32 )
+    {
+        msg->data |= MASK_INSR(entry.ptr32->basic.vector,
+                               MSI_DATA_VECTOR_MASK);
+        msg->dest32 = entry.ptr32->basic.dest;
+    }
+    else
+    {
+        msg->data |= MASK_INSR(entry.ptr128->full.vector,
+                               MSI_DATA_VECTOR_MASK);
+        msg->dest32 = get_full_dest(entry.ptr128);
+    }
 }
 
 int __init amd_iommu_free_intremap_table(
@@ -678,18 +851,28 @@ int __init amd_setup_hpet_msi(struct msi
     return rc;
 }
 
-static void dump_intremap_table(const u32 *table)
+static void dump_intremap_table(union irte_cptr tbl)
 {
-    u32 count;
+    unsigned int count;
 
-    if ( !table )
+    if ( !tbl.ptr || irte_mode == irteUNK )
         return;
 
     for ( count = 0; count < INTREMAP_ENTRIES; count++ )
     {
-        if ( !table[count] )
-            continue;
-        printk("    IRTE[%03x] %08x\n", count, table[count]);
+        if ( irte_mode == irte32 )
+        {
+            if ( !tbl.ptr32[count].raw[0] )
+                continue;
+            printk("    IRTE[%03x] %08x\n", count, tbl.ptr32[count].raw[0]);
+        }
+        else
+        {
+            if ( !tbl.ptr128[count].raw[0] && !tbl.ptr128[count].raw[1] )
+                continue;
+            printk("    IRTE[%03x] %016lx_%016lx\n",
+                   count, tbl.ptr128[count].raw[1], tbl.ptr128[count].raw[0]);
+        }
     }
 }
 





* [Xen-devel] [PATCH v2 06/10] AMD/IOMMU: split amd_iommu_init_one()
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
                     ` (4 preceding siblings ...)
  2019-06-27 15:21   ` [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format Jan Beulich
@ 2019-06-27 15:21   ` Jan Beulich
  2019-06-27 15:22   ` [Xen-devel] [PATCH v2 07/10] AMD/IOMMU: allow enabling with IRQ not yet set up Jan Beulich
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

Mapping the MMIO space and obtaining feature information need to happen
slightly earlier, such that for x2APIC support we can set XTEn prior to
calling amd_iommu_update_ivrs_mapping_acpi() and
amd_iommu_setup_ioapic_remapping().
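
As an illustration only (not part of the patch), the effect of the split
can be modelled with a small stand-alone sketch. All names here
(mock_iommu, mock_prepare_one(), mock_init_one(), MOCK_FEAT_XT) are
hypothetical stand-ins rather than Xen definitions, and the feature bit
position is made up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct amd_iommu; not the Xen definition. */
struct mock_iommu {
    bool mmio_mapped;
    uint64_t features;   /* unknown (0) until read in the prepare phase */
    bool xt_en;
    bool initialized;
};

#define MOCK_FEAT_XT (1ULL << 2)   /* made-up bit position */

/* Phase 1: map the registers and read the feature bits. */
static int mock_prepare_one(struct mock_iommu *iommu, uint64_t hw_features)
{
    assert(iommu != NULL);

    iommu->mmio_mapped = true;
    iommu->features = hw_features;

    return 0;
}

/* Between the phases a caller can already act on the feature bits. */
static void mock_choose_mode(struct mock_iommu *iommu)
{
    iommu->xt_en = (iommu->features & MOCK_FEAT_XT) != 0;
}

/* Phase 2: the remaining setup, which relies on phase 1 having run. */
static int mock_init_one(struct mock_iommu *iommu)
{
    if ( !iommu->mmio_mapped )
        return -1;

    iommu->initialized = true;

    return 0;
}
```

Calling mock_prepare_one(), then mock_choose_mode(), then
mock_init_one() mirrors the intended order: feature data is available
before the mode decision, and the bulk of the setup runs afterwards.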

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -970,14 +970,6 @@ static void * __init allocate_ppr_log(st
 
 static int __init amd_iommu_init_one(struct amd_iommu *iommu)
 {
-    if ( map_iommu_mmio_region(iommu) != 0 )
-        goto error_out;
-
-    get_iommu_features(iommu);
-
-    if ( iommu->features.raw )
-        iommuv2_enabled = 1;
-
     if ( allocate_cmd_buffer(iommu) == NULL )
         goto error_out;
 
@@ -1197,6 +1189,23 @@ static bool_t __init amd_sp5100_erratum2
     return 0;
 }
 
+static int __init amd_iommu_prepare_one(struct amd_iommu *iommu)
+{
+    int rc = alloc_ivrs_mappings(iommu->seg);
+
+    if ( !rc )
+        rc = map_iommu_mmio_region(iommu);
+    if ( rc )
+        return rc;
+
+    get_iommu_features(iommu);
+
+    if ( iommu->features.raw )
+        iommuv2_enabled = true;
+
+    return 0;
+}
+
 int __init amd_iommu_init(void)
 {
     struct amd_iommu *iommu;
@@ -1227,7 +1236,7 @@ int __init amd_iommu_init(void)
     radix_tree_init(&ivrs_maps);
     for_each_amd_iommu ( iommu )
     {
-        rc = alloc_ivrs_mappings(iommu->seg);
+        rc = amd_iommu_prepare_one(iommu);
         if ( rc )
             goto error_out;
     }






* [Xen-devel] [PATCH v2 07/10] AMD/IOMMU: allow enabling with IRQ not yet set up
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
                     ` (5 preceding siblings ...)
  2019-06-27 15:21   ` [Xen-devel] [PATCH v2 06/10] AMD/IOMMU: split amd_iommu_init_one() Jan Beulich
@ 2019-06-27 15:22   ` Jan Beulich
  2019-06-27 15:22   ` [Xen-devel] [PATCH v2 08/10] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode Jan Beulich
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

Early enabling (to enter x2APIC mode) requires deferring of the IRQ
setup. Code to actually do that setup in the x2APIC case will get added
subsequently.
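
A minimal sketch of the idea, for illustration only: steps that depend
on the interrupt are skipped while no IRQ has been assigned, and are
caught up on later. The type and function names below are hypothetical
and only model the control flow, not the real register programming:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by enabling; not Xen code. */
struct mock_iommu {
    int irq;              /* <= 0 means the interrupt is not set up yet */
    bool affinity_set;
    bool cmd_buffer_on;
    bool event_log_on;
};

/* Early enabling: anything needing the IRQ is guarded by irq > 0. */
static void mock_enable(struct mock_iommu *iommu)
{
    if ( iommu->irq > 0 )
        iommu->affinity_set = true;

    /* Independent of the interrupt, so always done. */
    iommu->cmd_buffer_on = true;

    if ( iommu->irq > 0 )
        iommu->event_log_on = true;
}

/* Deferred part, run once an IRQ has been allocated. */
static void mock_late_irq_setup(struct mock_iommu *iommu, int irq)
{
    assert(irq > 0);

    iommu->irq = irq;
    iommu->affinity_set = true;
    iommu->event_log_on = true;
}
```

With irq left at 0, mock_enable() only switches on the command buffer;
mock_late_irq_setup() then completes the interrupt related steps.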

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -814,7 +814,6 @@ static void amd_iommu_erratum_746_workar
 static void enable_iommu(struct amd_iommu *iommu)
 {
     unsigned long flags;
-    struct irq_desc *desc;
 
     spin_lock_irqsave(&iommu->lock, flags);
 
@@ -834,19 +833,27 @@ static void enable_iommu(struct amd_iomm
     if ( iommu->features.flds.ppr_sup )
         register_iommu_ppr_log_in_mmio_space(iommu);
 
-    desc = irq_to_desc(iommu->msi.irq);
-    spin_lock(&desc->lock);
-    set_msi_affinity(desc, &cpu_online_map);
-    spin_unlock(&desc->lock);
+    if ( iommu->msi.irq > 0 )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        set_msi_affinity(desc, &cpu_online_map);
+        spin_unlock(&desc->lock);
+    }
 
     amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
 
     set_iommu_ht_flags(iommu);
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
-    set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( iommu->features.flds.ppr_sup )
-        set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    if ( iommu->msi.irq > 0 )
+    {
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
 
     if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);






* [Xen-devel] [PATCH v2 08/10] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
                     ` (6 preceding siblings ...)
  2019-06-27 15:22   ` [Xen-devel] [PATCH v2 07/10] AMD/IOMMU: allow enabling with IRQ not yet set up Jan Beulich
@ 2019-06-27 15:22   ` Jan Beulich
  2019-06-27 15:23   ` [Xen-devel] [PATCH v2 09/10] AMD/IOMMU: enable x2APIC mode when available Jan Beulich
  2019-06-27 15:23   ` [Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating Jan Beulich
  9 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

In order to be able to express all possible destinations we need to make
use of the non-MSI-capability based mechanism, i.e. the x2APIC control
registers in MMIO space. The new IRQ controller structure can re-use
certain MSI functions, though.

For now general and PPR interrupts still share a single vector, IRQ, and
hence handler.
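
For illustration only, the way a 32-bit destination gets split across
the control register can be sketched with explicit shifts. The bit
positions are derived from the bit-field added in the diff below,
assuming the usual x86-64 GCC allocation (least significant bit first);
the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout (from the bit-field in the patch, LSB first):
 *   bit  2      dest_mode
 *   bits 8-31   dest_lo  (destination bits 0-23)
 *   bits 32-39  vector
 *   bit  40     int_type
 *   bits 56-63  dest_hi  (destination bits 24-31)
 */
static uint64_t mock_xt_ctrl_pack(uint32_t dest, uint8_t vector,
                                  bool dest_mode, bool int_type)
{
    uint64_t raw = 0;

    raw |= (uint64_t)dest_mode << 2;
    raw |= (uint64_t)(dest & 0xffffff) << 8;
    raw |= (uint64_t)vector << 32;
    raw |= (uint64_t)int_type << 40;
    raw |= (uint64_t)(dest >> 24) << 56;

    return raw;
}

/* Reassemble the full destination from its two halves. */
static uint32_t mock_xt_ctrl_dest(uint64_t raw)
{
    uint32_t lo = (uint32_t)(raw >> 8) & 0xffffff;
    uint32_t hi = (uint32_t)(raw >> 56);

    return lo | (hi << 24);
}
```

Packing and then extracting returns the original 32-bit destination,
which an 8-bit MSI destination field could not hold.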

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -472,6 +472,44 @@ static hw_irq_controller iommu_maskable_
     .set_affinity = set_msi_affinity,
 };
 
+static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
+{
+    struct amd_iommu *iommu = desc->action->dev_id;
+    unsigned int dest = set_desc_affinity(desc, mask);
+    union amd_iommu_x2apic_control ctrl = {};
+    unsigned long flags;
+
+    if ( dest == BAD_APICID )
+        return;
+
+    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
+    iommu->msi.msg.dest32 = dest;
+
+    ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
+                               MSI_ADDR_DESTMODE_MASK);
+    ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
+                              MSI_DATA_DELIVERY_MODE_MASK);
+    ctrl.vector = desc->arch.vector;
+    ctrl.dest_lo = dest;
+    ctrl.dest_hi = dest >> 24;
+
+    spin_lock_irqsave(&iommu->lock, flags);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_INT_CTRL_MMIO_OFFSET);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET);
+    spin_unlock_irqrestore(&iommu->lock, flags);
+}
+
+static hw_irq_controller iommu_x2apic_type = {
+    .typename     = "IOMMU-x2APIC",
+    .startup      = irq_startup_none,
+    .shutdown     = irq_shutdown_none,
+    .enable       = irq_enable_none,
+    .disable      = irq_disable_none,
+    .ack          = ack_nonmaskable_msi_irq,
+    .end          = end_nonmaskable_msi_irq,
+    .set_affinity = set_x2apic_affinity,
+};
+
 static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
 {
     u16 domain_id, device_id, flags;
@@ -726,8 +764,6 @@ static void iommu_interrupt_handler(int
 static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
-    hw_irq_controller *handler;
-    u16 control;
 
     irq = create_irq(NUMA_NO_NODE);
     if ( irq <= 0 )
@@ -747,20 +783,43 @@ static bool_t __init set_iommu_interrupt
                         PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf));
         return 0;
     }
-    control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
-                              PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
-                              iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
-    iommu->msi.msi.nvec = 1;
-    if ( is_mask_bit_support(control) )
-    {
-        iommu->msi.msi_attrib.maskbit = 1;
-        iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
-                                                is_64bit_address(control));
-        handler = &iommu_maskable_msi_type;
+
+    if ( iommu->ctrl.int_cap_xt_en )
+    {
+        struct irq_desc *desc = irq_to_desc(irq);
+
+        iommu->msi.msi_attrib.pos = MSI_TYPE_IOMMU;
+        iommu->msi.msi_attrib.maskbit = 0;
+        iommu->msi.msi_attrib.is_64 = 1;
+
+        desc->msi_desc = &iommu->msi;
+        desc->handler = &iommu_x2apic_type;
+
+        ret = 0;
     }
     else
-        handler = &iommu_msi_type;
-    ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    {
+        hw_irq_controller *handler;
+        u16 control;
+
+        control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
+                                  PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
+                                  iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
+
+        iommu->msi.msi.nvec = 1;
+        if ( is_mask_bit_support(control) )
+        {
+            iommu->msi.msi_attrib.maskbit = 1;
+            iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
+                                                    is_64bit_address(control));
+            handler = &iommu_maskable_msi_type;
+        }
+        else
+            handler = &iommu_msi_type;
+
+        ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    }
+
     if ( !ret )
         ret = request_irq(irq, 0, iommu_interrupt_handler, "amd_iommu", iommu);
     if ( ret )
@@ -838,8 +897,19 @@ static void enable_iommu(struct amd_iomm
         struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
 
         spin_lock(&desc->lock);
-        set_msi_affinity(desc, &cpu_online_map);
-        spin_unlock(&desc->lock);
+
+        if ( iommu->ctrl.int_cap_xt_en )
+        {
+            set_x2apic_affinity(desc, &cpu_online_map);
+            spin_unlock(&desc->lock);
+        }
+        else
+        {
+            set_msi_affinity(desc, &cpu_online_map);
+            spin_unlock(&desc->lock);
+
+            amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
+        }
     }
 
     amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
@@ -879,7 +949,9 @@ static void disable_iommu(struct amd_iom
         return;
     }
 
-    amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+    if ( !iommu->ctrl.int_cap_xt_en )
+        amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -416,6 +416,25 @@ union amd_iommu_ext_features {
     } flds;
 };
 
+/* x2APIC Control Registers */
+#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
+#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
+#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
+
+union amd_iommu_x2apic_control {
+    uint64_t raw;
+    struct {
+        unsigned int :2;
+        unsigned int dest_mode:1;
+        unsigned int :5;
+        unsigned int dest_lo:24;
+        unsigned int vector:8;
+        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
+        unsigned int :15;
+        unsigned int dest_hi:8;
+    };
+};
+
 /* Status Register*/
 #define IOMMU_STATUS_MMIO_OFFSET		0x2020
 #define IOMMU_STATUS_EVENT_OVERFLOW_MASK	0x00000001





* [Xen-devel] [PATCH v2 09/10] AMD/IOMMU: enable x2APIC mode when available
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
                     ` (7 preceding siblings ...)
  2019-06-27 15:22   ` [Xen-devel] [PATCH v2 08/10] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode Jan Beulich
@ 2019-06-27 15:23   ` Jan Beulich
  2019-07-02 14:50     ` Andrew Cooper
  2019-06-27 15:23   ` [Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating Jan Beulich
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
switched into a suitable state.

The post-AP-bringup IRQ affinity adjustment is also done for the
non-x2APIC case.
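
As a rough sketch (illustration only, with hypothetical names), the
capability check in the diff below amounts to requiring that every IOMMU
advertises both the GA and the XT feature. Treating an empty set as
unsupported is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the per-IOMMU feature flags; not Xen code. */
struct mock_iommu_features {
    bool ga_sup;
    bool xt_sup;
};

/* x2APIC mode is only usable if all IOMMUs support GA and XT. */
static bool mock_supports_xt(const struct mock_iommu_features *feat,
                             size_t nr)
{
    size_t i;

    assert(feat != NULL || nr == 0);

    /* Assumption: with no IOMMU at all there is nothing to enable. */
    if ( nr == 0 )
        return false;

    for ( i = 0; i < nr; i++ )
        if ( !feat[i].ga_sup || !feat[i].xt_sup )
            return false;

    return true;
}
```

A single IOMMU lacking either flag makes the whole check fail, since
interrupts routed through it could not carry a 32-bit destination.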

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Drop cpu_has_cx16 check. Add comment.
---
TBD: Instead of the system_state check in iov_enable_xt() the function
     could also zap its own hook pointer, at which point it could also
     become __init. This would, however, require that either
     resume_x2apic() be bound to ignore iommu_enable_x2apic() errors
     forever, or that iommu_enable_x2apic() be slightly re-arranged to
     not return -EOPNOTSUPP when finding a NULL hook during resume.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -834,6 +834,30 @@ static bool_t __init set_iommu_interrupt
     return 1;
 }
 
+int iov_adjust_irq_affinities(void)
+{
+    const struct amd_iommu *iommu;
+
+    if ( !iommu_enabled )
+        return 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+        unsigned long flags;
+
+        spin_lock_irqsave(&desc->lock, flags);
+        if ( iommu->ctrl.int_cap_xt_en )
+            set_x2apic_affinity(desc, &cpu_online_map);
+        else
+            set_msi_affinity(desc, &cpu_online_map);
+        spin_unlock_irqrestore(&desc->lock, flags);
+    }
+
+    return 0;
+}
+__initcall(iov_adjust_irq_affinities);
+
 /*
  * Family15h Model 10h-1fh erratum 746 (IOMMU Logging May Stall Translations)
  * Workaround:
@@ -1047,7 +1071,7 @@ static void * __init allocate_ppr_log(st
                                 IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
 }
 
-static int __init amd_iommu_init_one(struct amd_iommu *iommu)
+static int __init amd_iommu_init_one(struct amd_iommu *iommu, bool intr)
 {
     if ( allocate_cmd_buffer(iommu) == NULL )
         goto error_out;
@@ -1058,7 +1082,7 @@ static int __init amd_iommu_init_one(str
     if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
         goto error_out;
 
-    if ( !set_iommu_interrupt_handler(iommu) )
+    if ( intr && !set_iommu_interrupt_handler(iommu) )
         goto error_out;
 
     /* To make sure that device_table.buffer has been successfully allocated */
@@ -1285,7 +1309,7 @@ static int __init amd_iommu_prepare_one(
     return 0;
 }
 
-int __init amd_iommu_init(void)
+int __init amd_iommu_prepare(void)
 {
     struct amd_iommu *iommu;
     int rc = -ENODEV;
@@ -1300,9 +1324,14 @@ int __init amd_iommu_init(void)
     if ( unlikely(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_MSI) )
         goto error_out;
 
+    /* Have we been here before? */
+    if ( ivhd_type )
+        return 0;
+
     rc = amd_iommu_get_supported_ivhd_type();
     if ( rc < 0 )
         goto error_out;
+    BUG_ON(!rc);
     ivhd_type = rc;
 
     rc = amd_iommu_get_ivrs_dev_entries();
@@ -1321,9 +1350,33 @@ int __init amd_iommu_init(void)
     }
 
     rc = amd_iommu_update_ivrs_mapping_acpi();
+
+ error_out:
+    if ( rc )
+    {
+        amd_iommu_init_cleanup();
+        ivhd_type = 0;
+    }
+
+    return rc;
+}
+
+int __init amd_iommu_init(bool xt)
+{
+    struct amd_iommu *iommu;
+    int rc = amd_iommu_prepare();
+
     if ( rc )
         goto error_out;
 
+    for_each_amd_iommu ( iommu )
+    {
+        /* NB: There's no need to actually write these out right here. */
+        iommu->ctrl.ga_en |= xt;
+        iommu->ctrl.xt_en = xt;
+        iommu->ctrl.int_cap_xt_en = xt;
+    }
+
     /* initialize io-apic interrupt remapping entries */
     if ( iommu_intremap )
         rc = amd_iommu_setup_ioapic_remapping();
@@ -1346,7 +1399,12 @@ int __init amd_iommu_init(void)
     /* per iommu initialization  */
     for_each_amd_iommu ( iommu )
     {
-        rc = amd_iommu_init_one(iommu);
+        /*
+         * Setting up of the IOMMU interrupts cannot occur yet at the (very
+         * early) time we get here when enabling x2APIC mode. Suppress it
+         * here, and do it explicitly in amd_iommu_init_interrupt().
+         */
+        rc = amd_iommu_init_one(iommu, !xt);
         if ( rc )
             goto error_out;
     }
@@ -1358,6 +1416,40 @@ error_out:
     return rc;
 }
 
+int __init amd_iommu_init_interrupt(void)
+{
+    struct amd_iommu *iommu;
+    int rc = 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc;
+
+        if ( !set_iommu_interrupt_handler(iommu) )
+        {
+            rc = -EIO;
+            break;
+        }
+
+        desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        ASSERT(iommu->ctrl.int_cap_xt_en);
+        set_x2apic_affinity(desc, &cpu_online_map);
+        spin_unlock(&desc->lock);
+
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
+
+    if ( rc )
+        amd_iommu_init_cleanup();
+
+    return rc;
+}
+
 static void invalidate_all_domain_pages(void)
 {
     struct domain *d;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -816,6 +816,40 @@ void* __init amd_iommu_alloc_intremap_ta
     return tb;
 }
 
+bool __init iov_supports_xt(void)
+{
+    unsigned int apic;
+    struct amd_iommu *iommu;
+
+    if ( !iommu_enable || !iommu_intremap )
+        return false;
+
+    if ( amd_iommu_prepare() )
+        return false;
+
+    for_each_amd_iommu ( iommu )
+        if ( !iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup )
+            return false;
+
+    for ( apic = 0; apic < nr_ioapics; apic++ )
+    {
+        unsigned int idx = ioapic_id_to_index(IO_APIC_ID(apic));
+
+        if ( idx == MAX_IO_APICS )
+            return false;
+
+        if ( !find_iommu_for_device(ioapic_sbdf[idx].seg,
+                                    ioapic_sbdf[idx].bdf) )
+        {
+            AMD_IOMMU_DEBUG("No IOMMU for IO-APIC %#x (ID %x)\n",
+                            apic, IO_APIC_ID(apic));
+            return false;
+        }
+    }
+
+    return true;
+}
+
 int __init amd_setup_hpet_msi(struct msi_desc *msi_desc)
 {
     spinlock_t *lock;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -170,7 +170,8 @@ static int __init iov_detect(void)
     if ( !iommu_enable && !iommu_intremap )
         return 0;
 
-    if ( amd_iommu_init() != 0 )
+    else if ( (init_done ? amd_iommu_init_interrupt()
+                         : amd_iommu_init(false)) != 0 )
     {
         printk("AMD-Vi: Error initialization\n");
         return -ENODEV;
@@ -183,6 +184,25 @@ static int __init iov_detect(void)
     return scan_pci_devices();
 }
 
+static int iov_enable_xt(void)
+{
+    int rc;
+
+    if ( system_state >= SYS_STATE_active )
+        return 0;
+
+    if ( (rc = amd_iommu_init(true)) != 0 )
+    {
+        printk("AMD-Vi: Error %d initializing for x2APIC mode\n", rc);
+        /* -ENXIO has special meaning to the caller - convert it. */
+        return rc != -ENXIO ? rc : -ENODATA;
+    }
+
+    init_done = true;
+
+    return 0;
+}
+
 int amd_iommu_alloc_root(struct domain_iommu *hd)
 {
     if ( unlikely(!hd->arch.root_table) )
@@ -559,11 +579,13 @@ static const struct iommu_ops __initcons
     .free_page_table = deallocate_page_table,
     .reassign_device = reassign_device,
     .get_device_group_id = amd_iommu_group_id,
+    .enable_x2apic = iov_enable_xt,
     .update_ire_from_apic = amd_iommu_ioapic_update_ire,
     .update_ire_from_msi = amd_iommu_msi_msg_update_ire,
     .read_apic_from_ire = amd_iommu_read_ioapic_from_ire,
     .read_msi_from_ire = amd_iommu_read_msi_from_ire,
     .setup_hpet_msi = amd_setup_hpet_msi,
+    .adjust_irq_affinities = iov_adjust_irq_affinities,
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .share_p2m = amd_iommu_share_p2m,
@@ -574,4 +596,5 @@ static const struct iommu_ops __initcons
 static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
     .ops = &_iommu_ops,
     .setup = iov_detect,
+    .supports_x2apic = iov_supports_xt,
 };
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -48,8 +48,11 @@ int amd_iommu_detect_acpi(void);
 void get_iommu_features(struct amd_iommu *iommu);
 
 /* amd-iommu-init functions */
-int amd_iommu_init(void);
+int amd_iommu_prepare(void);
+int amd_iommu_init(bool xt);
+int amd_iommu_init_interrupt(void);
 int amd_iommu_update_ivrs_mapping_acpi(void);
+int iov_adjust_irq_affinities(void);
 
 /* mapping functions */
 int __must_check amd_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -96,6 +99,7 @@ void amd_iommu_flush_all_caches(struct a
 struct amd_iommu *find_iommu_for_device(int seg, int bdf);
 
 /* interrupt remapping */
+bool iov_supports_xt(void);
 int amd_iommu_setup_ioapic_remapping(void);
 void *amd_iommu_alloc_intremap_table(unsigned long **);
 int amd_iommu_free_intremap_table(u16 seg, struct ivrs_mappings *);





^ permalink raw reply	[flat|nested] 54+ messages in thread

* [Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating
  2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
                     ` (8 preceding siblings ...)
  2019-06-27 15:23   ` [Xen-devel] [PATCH v2 09/10] AMD/IOMMU: enable x2APIC mode when available Jan Beulich
@ 2019-06-27 15:23   ` Jan Beulich
  2019-07-02 15:08     ` Andrew Cooper
  9 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2019-06-27 15:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

While for 32-bit IRTEs I think we can safely continue to assume that the
writes will translate to a single MOV, the use of CMPXCHG16B is more
heavy-handed than necessary for the 128-bit form, and the flushing
didn't get done along the lines of what the specification says. Mark
entries to be updated as not remapped (which will result in interrupt
requests getting target aborted, but the interrupts should be masked
anyway at that point in time), issue the flush, and only then write the
new entry. In the 128-bit IRTE case set RemapEn separately last, so that
the ordering of the writes of the two 64-bit halves won't matter.

In update_intremap_entry_from_msi_msg() also fold the duplicate initial
lock determination and acquire into just a single instance.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: Putting the flush invocations in loops isn't overly nice, but I
     don't think this can really be abused, since callers up the stack
     hold further locks. Nevertheless I'd like to ask for better
     suggestions.
---
v2: Parts morphed into earlier patch.
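
For readers following the description above, the intended update order can
be modelled roughly as below. This is only a single-threaded sketch: the
types, the flush counter and the name irte_replace() are made up for
illustration, and the locking that the real code drops and re-takes around
the flush is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for an IRTE and an IOMMU. */
struct irte_sketch {
    bool remap_en;
    uint8_t vector;
    uint32_t dest;
};

struct iommu_sketch {
    unsigned int flushes; /* counts flush commands issued */
};

/* Stand-in for the flush; the real one queues a command to the IOMMU. */
static void flush_intremap_sketch(struct iommu_sketch *iommu)
{
    iommu->flushes++;
}

void irte_replace(struct iommu_sketch *iommu, struct irte_sketch *e,
                  uint8_t vector, uint32_t dest)
{
    /*
     * Steps 1 and 2: take the entry out of service and flush. The patch
     * loops because the lock is dropped around the flush, so the entry
     * has to be re-checked afterwards; in this model one pass suffices.
     */
    while ( e->remap_en )
    {
        e->remap_en = false;
        flush_intremap_sketch(iommu);
    }

    /* Step 3: write the new contents while the entry is disabled. */
    e->vector = vector;
    e->dest = dest;

    /* Step 4: enable the entry last. */
    e->remap_en = true;
}
```

A fresh (not yet enabled) entry skips the loop entirely, which matches the
patch no longer flushing unconditionally after the update.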

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -238,8 +238,7 @@ static void update_intremap_entry(union
         break;
 
     case irte128:
-        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
-        barrier();
+        ASSERT(!entry.ptr128->full.remap_en);
         entry.ptr128->raw[1] =
             container_of(&full, union irte128, full)->raw[1];
         barrier();
@@ -308,6 +307,20 @@ static int update_intremap_entry_from_io
     }
 
     entry = get_intremap_entry(iommu->seg, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.ptr32->basic.remap_en )
+    {
+        entry.ptr32->basic.remap_en = 0;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
     if ( fresh )
         /* nothing */;
     else if ( !lo_update )
@@ -337,13 +350,6 @@ static int update_intremap_entry_from_io
 
     spin_unlock_irqrestore(lock, flags);
 
-    if ( iommu->enabled && !fresh )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
     set_rte_index(rte, offset);
 
     return 0;
@@ -608,19 +614,27 @@ static int update_intremap_entry_from_ms
     req_id = get_dma_requestor_id(iommu->seg, bdf);
     alias_id = get_intremap_requestor_id(iommu->seg, bdf);
 
+    lock = get_intremap_lock(iommu->seg, req_id);
+    spin_lock_irqsave(lock, flags);
+
     if ( msg == NULL )
     {
-        lock = get_intremap_lock(iommu->seg, req_id);
-        spin_lock_irqsave(lock, flags);
         for ( i = 0; i < nr; ++i )
             free_intremap_entry(iommu->seg, req_id, *remap_index + i);
         spin_unlock_irqrestore(lock, flags);
-        goto done;
-    }
 
-    lock = get_intremap_lock(iommu->seg, req_id);
+        if ( iommu->enabled )
+        {
+            spin_lock_irqsave(&iommu->lock, flags);
+            amd_iommu_flush_intremap(iommu, req_id);
+            if ( alias_id != req_id )
+                amd_iommu_flush_intremap(iommu, alias_id);
+            spin_unlock_irqrestore(&iommu->lock, flags);
+        }
+
+        return 0;
+    }
 
-    spin_lock_irqsave(lock, flags);
     dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
     delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
@@ -644,6 +658,22 @@ static int update_intremap_entry_from_ms
     }
 
     entry = get_intremap_entry(iommu->seg, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.ptr32->basic.remap_en )
+    {
+        entry.ptr32->basic.remap_en = 0;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        if ( alias_id != req_id )
+            amd_iommu_flush_intremap(iommu, alias_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
     spin_unlock_irqrestore(lock, flags);
 
@@ -663,16 +693,6 @@ static int update_intremap_entry_from_ms
                get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
     }
 
-done:
-    if ( iommu->enabled )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        if ( alias_id != req_id )
-            amd_iommu_flush_intremap(iommu, alias_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
     return 0;
 }
 





^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging
  2019-06-27 15:19   ` [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging Jan Beulich
@ 2019-07-01 15:37     ` Andrew Cooper
  2019-07-01 15:59     ` Woods, Brian
  1 sibling, 0 replies; 54+ messages in thread
From: Andrew Cooper @ 2019-07-01 15:37 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 27/06/2019 16:19, Jan Beulich wrote:
> The common case is all IOMMUs having the same features. Log them only
> for the first IOMMU, or for any that have a differing feature set.
>
> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging
  2019-06-27 15:19   ` [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging Jan Beulich
  2019-07-01 15:37     ` Andrew Cooper
@ 2019-07-01 15:59     ` Woods, Brian
  1 sibling, 0 replies; 54+ messages in thread
From: Woods, Brian @ 2019-07-01 15:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Woods, Brian, Suthikulpanit, Suravee, Andrew Cooper

On Thu, Jun 27, 2019 at 09:19:06AM -0600, Jan Beulich wrote:
> The common case is all IOMMUs having the same features. Log them only
> for the first IOMMU, or for any that have a differing feature set.
> 
> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Brian Woods <brian.woods@amd.com>

> ---
> v2: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_detect.c
> +++ b/xen/drivers/passthrough/amd/iommu_detect.c
> @@ -62,6 +62,7 @@ void __init get_iommu_features(struct am
>  {
>      u32 low, high;
>      int i = 0 ;
> +    const struct amd_iommu *first;
>      static const char *__initdata feature_str[] = {
>          "- Prefetch Pages Command", 
>          "- Peripheral Page Service Request", 
> @@ -89,6 +90,11 @@ void __init get_iommu_features(struct am
>  
>      iommu->features = ((u64)high << 32) | low;
>  
> +    /* Don't log the same set of features over and over. */
> +    first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
> +    if ( iommu != first && iommu->features == first->features )
> +        return;
> +
>      printk("AMD-Vi: IOMMU Extended Features:\n");
>  
>      while ( feature_str[i] )
> 
> 
> 

-- 
Brian Woods


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register
  2019-06-27 15:19   ` [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register Jan Beulich
@ 2019-07-02 12:09     ` Andrew Cooper
  2019-07-02 13:48       ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-07-02 12:09 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

[-- Attachment #1.1: Type: text/plain, Size: 3388 bytes --]

On 27/06/2019 16:19, Jan Beulich wrote:
>      printk("AMD-Vi: IOMMU Extended Features:\n");
>  
> -    while ( feature_str[i] )
> +#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
> +#define FEAT(fld, str) do { \
> +    if ( MASK(fld) & (MASK(fld) - 1) ) \
> +        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
> +    else if ( iommu->features.raw & MASK(fld) ) \
> +        printk( "- " str "\n"); \
> +} while ( false )

Sadly, Clang dislikes this construct.

https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/243795095 
(Click on the "Complete Raw" button)

iommu_detect.c:90:5: error: implicit truncation from 'int' to bitfield changes value from -1 to 1 [-Werror,-Wbitfield-constant-conversion]
    FEAT(pref_sup,           "Prefetch Pages Command");
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
iommu_detect.c:84:10: note: expanded from macro 'FEAT'
    if ( MASK(fld) & (MASK(fld) - 1) ) \
         ^~~~~~~~~
iommu_detect.c:82:64: note: expanded from macro 'MASK'
#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
                                                               ^~


which is a shame.  Furthermore, switching to ~(0u) won't work either,
because that will then get a truncation warning.

Clever as this trick is, this is write-once code and isn't going to
change moving forward.  I'd do away with the compile-time cleverness and
have simple FEAT() and MASK() macros, and use the correct one below.
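
One possible reading of that suggestion, as a rough sketch only: use one
macro for single-bit fields and another for multi-bit fields, so that no
constant ever gets assigned to a bit field and the Clang diagnostic has
nothing to fire on. The names FEAT_BOOL() and FEAT_VAL(), the trimmed
union, and the use of a buffer instead of printk() are all assumptions
made for this illustration, not what the series ends up doing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Trimmed stand-in for union amd_iommu_ext_features (illustrative layout). */
union ext_features_sketch {
    uint64_t raw;
    struct {
        unsigned int pref_sup:1;
        unsigned int xt_sup:1;
        unsigned int hats:2;
    } flds;
};

/*
 * Write one line per reported feature into buf. Returns the number of
 * characters produced, as counted by snprintf().
 */
size_t describe_features(const union ext_features_sketch *f,
                         char *buf, size_t len)
{
    size_t pos = 0;

    if ( len )
        buf[0] = '\0';

/* Single-bit field: mention it only when the bit is set. */
#define FEAT_BOOL(fld, str) do {                                          \
    if ( f->flds.fld && pos < len )                                       \
        pos += (size_t)snprintf(buf + pos, len - pos, "- " str "\n");     \
} while ( 0 )

/* Multi-bit field: always print its value. */
#define FEAT_VAL(fld, str) do {                                           \
    if ( pos < len )                                                      \
        pos += (size_t)snprintf(buf + pos, len - pos, "- " str ": %#x\n", \
                                (unsigned int)f->flds.fld);               \
} while ( 0 )

    FEAT_BOOL(pref_sup, "Prefetch Pages Command");
    FEAT_BOOL(xt_sup,   "x2APIC");
    FEAT_VAL(hats,      "Host Address Translation Size");

#undef FEAT_BOOL
#undef FEAT_VAL

    return pos;
}
```

The price is that whoever adds a field has to pick the right macro by
hand, which is the fragility the original construct was avoiding.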

> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> @@ -346,26 +346,57 @@ struct amd_iommu_dte {
> +union amd_iommu_ext_features {
> +    uint64_t raw;
> +    struct {
> +        unsigned int pref_sup:1;
> +        unsigned int ppr_sup:1;
> +        unsigned int xt_sup:1;
> +        unsigned int nx_sup:1;
> +        unsigned int gt_sup:1;
> +        unsigned int gappi_sup:1;
> +        unsigned int ia_sup:1;
> +        unsigned int ga_sup:1;
> +        unsigned int he_sup:1;
> +        unsigned int pc_sup:1;
> +        unsigned int hats:2;
> +        unsigned int gats:2;
> +        unsigned int glx_sup:2;
> +        unsigned int smif_sup:2;
> +        unsigned int smif_rc:3;
> +        unsigned int gam_sup:3;
> +        unsigned int dual_ppr_log_sup:2;
> +        unsigned int :2;
> +        unsigned int dual_event_log_sup:2;
> +        unsigned int :1;
> +        unsigned int sats_sup:1;
> +        unsigned int pas_max:5;
> +        unsigned int us_sup:1;
> +        unsigned int dev_tbl_seg_sup:2;
> +        unsigned int ppr_early_of_sup:1;
> +        unsigned int ppr_auto_rsp_sup:1;
> +        unsigned int marc_sup:2;
> +        unsigned int blk_stop_mrk_sup:1;
> +        unsigned int perf_opt_sup:1;
> +        unsigned int msi_cap_mmio_sup:1;
> +        unsigned int :1;
> +        unsigned int gio_sup:1;
> +        unsigned int ha_sup:1;
> +        unsigned int eph_sup:1;
> +        unsigned int attr_fw_sup:1;
> +        unsigned int hd_sup:1;
> +        unsigned int :1;
> +        unsigned int inv_iotlb_type_sup:1;
> +        unsigned int viommu_sup:1;
> +        unsigned int vm_guard_io_sup:1;
> +        unsigned int vm_table_size:4;
> +        unsigned int ga_update_dis_sup:1;
> +        unsigned int :2;
> +    } flds;

Why the .flds name?  What is wrong with this becoming anonymous?

~Andrew


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH v2 03/10] AMD/IOMMU: use bit field for control register
  2019-06-27 15:20   ` [Xen-devel] [PATCH v2 03/10] AMD/IOMMU: use bit field for control register Jan Beulich
@ 2019-07-02 12:20     ` Andrew Cooper
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Cooper @ 2019-07-02 12:20 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 27/06/2019 16:20, Jan Beulich wrote:
> Also introduce a field in struct amd_iommu caching the most recently
> written control register. All writes should now happen exclusively from
> that cached value, such that it is guaranteed to be up to date.
>
> Take the opportunity and add further fields. Also convert a few boolean
> function parameters to bool, such that use of !! can be avoided.
>
> Because of there now being definitions beyond bit 31, writel() also gets
> replaced by writeq() when updating hardware.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Add domain_id_pne field. Mention writel() -> writeq() change.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>, subject to the
resolution of similarities with the previous patch.

I'm still concerned that not using bool bitfields is a recipe for
subtle mistakes.

~Andrew


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE
  2019-06-27 15:20   ` [Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE Jan Beulich
@ 2019-07-02 12:33     ` Andrew Cooper
  2019-07-02 13:56       ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-07-02 12:33 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 27/06/2019 16:20, Jan Beulich wrote:
> At the same time restrict its scope to just the single source file
> actually using it, and abstract accesses by introducing a union of
> pointers. (A union of the actual table entries is not used to make it
> impossible to [wrongly, once the 128-bit form gets added] perform
> pointer arithmetic / array accesses on derived types.)
>
> Also move away from updating the entries piecemeal: Construct a full new
> entry, and write it out.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: name {get,free}_intremap_entry()'s last parameter "index" instead of
>     "offset". Introduce union irte32.
> ---
> It would have been nice to use write_atomic() or ACCESS_ONCE() for the
> actual writes, but both cast the value to a scalar one, which doesn't
> suit us here (and I also didn't want to make the compound type a union
> with a raw member just for this).

This comment is stale.  However, I'm still confused as to what the
problem with putting a raw in union irte_basic is.

In particular, the container_of() usage is complicated to follow, and I
don't see it as being necessary.

~Andrew


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register
  2019-07-02 12:09     ` Andrew Cooper
@ 2019-07-02 13:48       ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-07-02 13:48 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 02.07.2019 14:09, Andrew Cooper wrote:
> On 27/06/2019 16:19, Jan Beulich wrote:
>>       printk("AMD-Vi: IOMMU Extended Features:\n");
>>   
>> -    while ( feature_str[i] )
>> +#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
>> +#define FEAT(fld, str) do { \
>> +    if ( MASK(fld) & (MASK(fld) - 1) ) \
>> +        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
>> +    else if ( iommu->features.raw & MASK(fld) ) \
>> +        printk( "- " str "\n"); \
>> +} while ( false )
> 
> Sadly, Clang dislikes this construct.
> 
> https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/243795095
> (Click on the "Complete Raw" button)
> 
> iommu_detect.c:90:5: error: implicit truncation from 'int' to bitfield changes value from -1 to 1 [-Werror,-Wbitfield-constant-conversion]
>      FEAT(pref_sup,           "Prefetch Pages Command");
>      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> iommu_detect.c:84:10: note: expanded from macro 'FEAT'
>      if ( MASK(fld) & (MASK(fld) - 1) ) \
>           ^~~~~~~~~
> iommu_detect.c:82:64: note: expanded from macro 'MASK'
> #define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
>                                                                 ^~
> 
> 
> which is a shame.  Furthermore, switching to ~(0u) won't work either,
> because that will then get a truncation warning.
> 
> Clever as this trick is, this is write-once code and isn't going to
> change moving forward.  I'd do away with the compile-time cleverness and
> have simple FEAT() and MASK() macros, and use the correct one below.

I don't immediately see what you mean by "simple FEAT() and MASK()
macros", but perhaps I'll figure it out when I actually make this change. What
I'm concerned about when changing away from the chosen model is that
there'll likely be a need to explicitly know whether a field is just a
boolean or holds an actual (wider) value. I.e. that's what is not "write
once" about this code, since future additions equally become more
fragile.

I was actually hoping to use this "mask from bitfield" approach
elsewhere, so this is yet another case where I wonder whether us wanting
to be able to build with clang is actually becoming an increasing
hindrance.

I'll see if I can come up with something else, still matching the
original idea. Clearly clang can't be consistent with its value
truncation warnings, or else Xen wouldn't build with it at all.

>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> @@ -346,26 +346,57 @@ struct amd_iommu_dte {
>> +union amd_iommu_ext_features {
>> +    uint64_t raw;
>> +    struct {
>> +        unsigned int pref_sup:1;
>> +        unsigned int ppr_sup:1;
>> +        unsigned int xt_sup:1;
>> +        unsigned int nx_sup:1;
>> +        unsigned int gt_sup:1;
>> +        unsigned int gappi_sup:1;
>> +        unsigned int ia_sup:1;
>> +        unsigned int ga_sup:1;
>> +        unsigned int he_sup:1;
>> +        unsigned int pc_sup:1;
>> +        unsigned int hats:2;
>> +        unsigned int gats:2;
>> +        unsigned int glx_sup:2;
>> +        unsigned int smif_sup:2;
>> +        unsigned int smif_rc:3;
>> +        unsigned int gam_sup:3;
>> +        unsigned int dual_ppr_log_sup:2;
>> +        unsigned int :2;
>> +        unsigned int dual_event_log_sup:2;
>> +        unsigned int :1;
>> +        unsigned int sats_sup:1;
>> +        unsigned int pas_max:5;
>> +        unsigned int us_sup:1;
>> +        unsigned int dev_tbl_seg_sup:2;
>> +        unsigned int ppr_early_of_sup:1;
>> +        unsigned int ppr_auto_rsp_sup:1;
>> +        unsigned int marc_sup:2;
>> +        unsigned int blk_stop_mrk_sup:1;
>> +        unsigned int perf_opt_sup:1;
>> +        unsigned int msi_cap_mmio_sup:1;
>> +        unsigned int :1;
>> +        unsigned int gio_sup:1;
>> +        unsigned int ha_sup:1;
>> +        unsigned int eph_sup:1;
>> +        unsigned int attr_fw_sup:1;
>> +        unsigned int hd_sup:1;
>> +        unsigned int :1;
>> +        unsigned int inv_iotlb_type_sup:1;
>> +        unsigned int viommu_sup:1;
>> +        unsigned int vm_guard_io_sup:1;
>> +        unsigned int vm_table_size:4;
>> +        unsigned int ga_update_dis_sup:1;
>> +        unsigned int :2;
>> +    } flds;
> 
> Why the .flds name?  What is wrong with this becoming anonymous?

The initializer in guest_iommu_reg_init() (with old gcc).
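
To illustrate the constraint (a sketch under assumptions, not code from
the tree): as far as I'm aware, older gcc versions don't accept designated
initializers that name members of an anonymous struct or union, whereas
going through a named member such as .flds works everywhere. The trimmed
union and the names below are made up for the example.

```c
#include <stdint.h>

/* Trimmed, illustrative union; the bit-field struct is a named member. */
union features_named_sketch {
    uint64_t raw;
    struct {
        unsigned int pref_sup:1;
        unsigned int xt_sup:1;
        unsigned int hats:2;
    } flds;
};

/*
 * Initializing via the named member. With an anonymous struct this would
 * have to read "{ .xt_sup = 1, .hats = 2 }", which is the form assumed
 * here to be a problem for old compilers.
 */
static const union features_named_sketch example_features = {
    .flds = { .xt_sup = 1, .hats = 2 },
};

unsigned int example_hats(void)
{
    return example_features.flds.hats;
}

unsigned int example_xt_sup(void)
{
    return example_features.flds.xt_sup;
}
```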

Jan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE
  2019-07-02 12:33     ` Andrew Cooper
@ 2019-07-02 13:56       ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-07-02 13:56 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 02.07.2019 14:33, Andrew Cooper wrote:
> On 27/06/2019 16:20, Jan Beulich wrote:
>> At the same time restrict its scope to just the single source file
>> actually using it, and abstract accesses by introducing a union of
>> pointers. (A union of the actual table entries is not used to make it
>> impossible to [wrongly, once the 128-bit form gets added] perform
>> pointer arithmetic / array accesses on derived types.)
>>
>> Also move away from updating the entries piecemeal: Construct a full new
>> entry, and write it out.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2: name {get,free}_intremap_entry()'s last parameter "index" instead of
>>      "offset". Introduce union irte32.
>> ---
>> It would have been nice to use write_atomic() or ACCESS_ONCE() for the
>> actual writes, but both cast the value to a scalar one, which doesn't
>> suit us here (and I also didn't want to make the compound type a union
>> with a raw member just for this).
> 
> This comment is stale.  However, I'm still confused as to what the
> problem with putting a raw in union irte_basic is.

That'll again require an intermediate "flds" (or however we choose to
name it) union field name for the bitfield structure, or else once
again initializers won't work with old gcc.

> In particular, the container_of() usage is complicated to follow, and I
> don't see it as being necessary.

Well, I can drop it if we're happy about the extra intermediate field
name (personally I'm not, but I'd accept it if it's considered less bad
than the container_of() approach).

Jan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
  2019-06-27 15:21   ` [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format Jan Beulich
@ 2019-07-02 14:41     ` Andrew Cooper
  2019-07-03  8:46       ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-07-02 14:41 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 27/06/2019 16:21, Jan Beulich wrote:
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -40,12 +40,45 @@ union irte32 {
>
> -#define INTREMAP_TABLE_ORDER    1
> +union irte_cptr {
> +    const void *ptr;
> +    const union irte32 *ptr32;
> +    const union irte128 *ptr128;
> +} __transparent__;
> +
> +#define INTREMAP_TABLE_ORDER (irte_mode == irte32 ? 1 : 3)

This is problematic for irte_mode == irteUNK.  As this "constant" is
used in exactly two places, I'd suggest a tiny static function along the
same lines as {get,update}_intremap_entry(), which can sensibly prevent
code looking for a size before irte_mode is set up.
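
Something along these lines, as a minimal sketch: the enum and function
names are placeholders, the mode is passed in as a parameter only to keep
the example self-contained, and the orders 1 and 3 are simply the two
values from the macro quoted above.

```c
/* Placeholder mirror of the three irte_mode states discussed here. */
enum irte_fmt_sketch { FMT_UNKNOWN, FMT_32, FMT_128 };

/*
 * Allocation order of an interrupt remapping table for the given format.
 * Returns -1 while the format has not been established, so a caller
 * asking too early gets an obvious error instead of a silently wrong
 * size. (Hypervisor code would more likely assert here.)
 */
int intremap_table_order(enum irte_fmt_sketch fmt)
{
    switch ( fmt )
    {
    case FMT_32:
        return 1;

    case FMT_128:
        return 3;

    case FMT_UNKNOWN:
        break;
    }

    return -1;
}
```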

> @@ -142,7 +187,21 @@ static void free_intremap_entry(unsigned
>  {
>      union irte_ptr entry = get_intremap_entry(seg, bdf, index);
>  
> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
> +    switch ( irte_mode )
> +    {
> +    case irte32:
> +        ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
> +        break;
> +
> +    case irte128:
> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
> +        barrier();

smp_wmb().

Using barrier here isn't technically correct, because what matters is
the external visibility of the write.

It functions correctly on x86 because smp_wmb() is barrier(), but this
code doesn't work correctly on e.g. ARM.

I'd go further and leave an explanation.

smp_wmb(); /* Ensure the clear of .remap_en is visible to the IOMMU first. */
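
For illustration only, the same ordering written in portable C11, with
release fences standing in for smp_wmb(). The struct and function names
are made up, and C11 fences only describe ordering between CPU threads;
whether that is enough for a device observing memory is a platform
question this sketch does not answer.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 128-bit entry; RemapEn is assumed to live in raw[0]. */
struct irte128_sketch {
    _Atomic uint64_t raw[2];
};

void irte128_update_sketch(struct irte128_sketch *e, uint64_t lo, uint64_t hi)
{
    /* Clear the half holding RemapEn first. */
    atomic_store_explicit(&e->raw[0], 0, memory_order_relaxed);

    /* Order the clear before the following stores. */
    atomic_thread_fence(memory_order_release);

    /* The entry is disabled now, so the high half can change freely. */
    atomic_store_explicit(&e->raw[1], hi, memory_order_relaxed);

    /* Order the high half before the store that re-enables the entry. */
    atomic_thread_fence(memory_order_release);

    /* Finally write the low half, which carries the new RemapEn. */
    atomic_store_explicit(&e->raw[0], lo, memory_order_relaxed);
}
```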

> @@ -444,9 +601,9 @@ static int update_intremap_entry_from_ms
>      unsigned long flags;
>      union irte_ptr entry;
>      u16 req_id, alias_id;
> -    u8 delivery_mode, dest, vector, dest_mode;
> +    uint8_t delivery_mode, vector, dest_mode;

For the ioapic version, you used unsigned int, rather than uint8_t.  I'd
expect them to at least be consistent.

~Andrew


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH v2 09/10] AMD/IOMMU: enable x2APIC mode when available
  2019-06-27 15:23   ` [Xen-devel] [PATCH v2 09/10] AMD/IOMMU: enable x2APIC mode when available Jan Beulich
@ 2019-07-02 14:50     ` Andrew Cooper
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Cooper @ 2019-07-02 14:50 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 27/06/2019 16:23, Jan Beulich wrote:
> In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
> switched into a suitable state.
>
> The post-AP-bringup IRQ affinity adjustment is done also for the non-
> x2APIC case.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating
  2019-06-27 15:23   ` [Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating Jan Beulich
@ 2019-07-02 15:08     ` Andrew Cooper
  2019-07-03  8:55       ` Jan Beulich
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2019-07-02 15:08 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 27/06/2019 16:23, Jan Beulich wrote:
> While for 32-bit IRTEs I think we can safely continue to assume that the
> writes will translate to a single MOV, the use of CMPXCHG16B is more

The CMPXCHG16B here is stale.

> heavy handed than necessary for the 128-bit form, and the flushing
> didn't get done along the lines of what the specification says. Mark
> entries to be updated as not remapped (which will result in interrupt
> requests to get target aborted, but the interrupts should be masked
> anyway at that point in time), issue the flush, and only then write the
> new entry. In the 128-bit IRTE case set RemapEn separately last, so that
> the ordering of the writes of the two 64-bit halves won't matter.
>
> In update_intremap_entry_from_msi_msg() also fold the duplicate initial
> lock determination and acquire into just a single instance.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: Putting the flush invocations in loops isn't overly nice, but I
>      don't think this can really be abused, since callers up the stack
>      hold further locks. Nevertheless I'd like to ask for better
>      suggestions.
> ---
> v2: Parts morphed into earlier patch.
>
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -238,8 +238,7 @@ static void update_intremap_entry(union
>          break;
>  
>      case irte128:
> -        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
> -        barrier();
> +        ASSERT(!entry.ptr128->full.remap_en);
>          entry.ptr128->raw[1] =
>              container_of(&full, union irte128, full)->raw[1];
>          barrier();
> @@ -308,6 +307,20 @@ static int update_intremap_entry_from_io
>      }
>  
>      entry = get_intremap_entry(iommu->seg, req_id, offset);
> +
> +    /* The RemapEn fields match for all formats. */
> +    while ( iommu->enabled && entry.ptr32->basic.remap_en )

Why while?  (and by this, what I mean is that this definitely needs a
comment, because the code looks like it ought to be an if.)

~Andrew


* Re: [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
  2019-07-02 14:41     ` Andrew Cooper
@ 2019-07-03  8:46       ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-07-03  8:46 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 02.07.2019 16:41, Andrew Cooper wrote:
> On 27/06/2019 16:21, Jan Beulich wrote:
>> @@ -142,7 +187,21 @@ static void free_intremap_entry(unsigned
>>   {
>>       union irte_ptr entry = get_intremap_entry(seg, bdf, index);
>>   
>> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>> +    switch ( irte_mode )
>> +    {
>> +    case irte32:
>> +        ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>> +        break;
>> +
>> +    case irte128:
>> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
>> +        barrier();
> 
> smp_wmb().
> 
> Using barrier here isn't technically correct, because what matters is
> the external visibility of the write.
> 
> It functions correctly on x86 because smp_wmb() is barrier(), but this
> code doesn't work correctly on e.g. ARM.

Well, I did reply to a similar earlier comment of yours, and I
had hoped to get a reply from you in turn before actually sending
out v2. As said there, smp_wmb() isn't correct either, yet you
also don't want wmb() here. Even if we don't patch them ourselves,
we should still follow the abstract Linux model and _assume_
smp_*mb() convert to no-ops when running on a UP system. The
barrier, however, is needed even in that case.

What I'm okay to do is accompany the barrier() (or, if you insist,
smp_wmb()) use with a comment clarifying that this is fine for x86,
but would need changing if the code was included in builds for
other architectures.
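
A minimal sketch of the distinction being argued, under stated assumptions:
the observer of these writes is the IOMMU rather than another CPU, so the
ordering has to hold even on a uniprocessor build. The macros below are
hypothetical stand-ins written in GCC/Clang syntax, the union is an
illustrative layout rather than the real IRTE definition, and the second
store is assumed, since the quoted hunk ends at the barrier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hypervisor's macros (GCC/Clang syntax). */
#define barrier()      __asm__ __volatile__ ( "" ::: "memory" )
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Illustrative layout only: a 128-bit IRTE seen as two 64-bit halves. */
union irte128_sketch {
    uint64_t raw[2];
};

static void free_irte128_sketch(union irte128_sketch *entry)
{
    /* The low half carries RemapEn; clear it with a single store. */
    ACCESS_ONCE(entry->raw[0]) = 0;

    /*
     * Compiler barrier: stops the compiler from moving the store below
     * ahead of the one above. Ordinary x86 stores are not reordered with
     * older stores, so no instruction is needed here; another
     * architecture would need a real write barrier at this point.
     */
    barrier();

    /* Assumed second step: the quoted hunk ends at the barrier. */
    entry->raw[1] = 0;
}
```

The barrier is kept unconditional in this sketch because a device can observe
the two halves in between the stores regardless of how many CPUs are online.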

>> @@ -444,9 +601,9 @@ static int update_intremap_entry_from_ms
>>       unsigned long flags;
>>       union irte_ptr entry;
>>       u16 req_id, alias_id;
>> -    u8 delivery_mode, dest, vector, dest_mode;
>> +    uint8_t delivery_mode, vector, dest_mode;
> 
> For the ioapic version, you used unsigned int, rather than uint8_t.  I'd
> expect them to at least be consistent.

The type change on the I/O-APIC side is because "dest" is among
the variables there. But looking at both changes again, I guess
I'll rather use the approach here also in the I/O-APIC function,
moving "dest" down together with "offset".

Jan

* Re: [Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating
  2019-07-02 15:08     ` Andrew Cooper
@ 2019-07-03  8:55       ` Jan Beulich
  0 siblings, 0 replies; 54+ messages in thread
From: Jan Beulich @ 2019-07-03  8:55 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 02.07.2019 17:08, Andrew Cooper wrote:
> On 27/06/2019 16:23, Jan Beulich wrote:
>> While for 32-bit IRTEs I think we can safely continue to assume that the
>> writes will translate to a single MOV, the use of CMPXCHG16B is more
> 
> The CMPXCHG16B here is stale.

Indeed, as is the 32-bit IRTE part of the sentence (now that I
use ACCESS_ONCE() already before this patch).

>> heavy handed than necessary for the 128-bit form, and the flushing
>> didn't get done along the lines of what the specification says. Mark
>> entries to be updated as not remapped (which will result in interrupt
>> requests to get target aborted, but the interrupts should be masked
>> anyway at that point in time), issue the flush, and only then write the
>> new entry. In the 128-bit IRTE case set RemapEn separately last, so that
>> the ordering of the writes of the two 64-bit halves won't matter.

This last sentence is stale too, and hence I've now removed it.

>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>> @@ -238,8 +238,7 @@ static void update_intremap_entry(union
>>           break;
>>   
>>       case irte128:
>> -        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
>> -        barrier();
>> +        ASSERT(!entry.ptr128->full.remap_en);
>>           entry.ptr128->raw[1] =
>>               container_of(&full, union irte128, full)->raw[1];
>>           barrier();
>> @@ -308,6 +307,20 @@ static int update_intremap_entry_from_io
>>       }
>>   
>>       entry = get_intremap_entry(iommu->seg, req_id, offset);
>> +
>> +    /* The RemapEn fields match for all formats. */
>> +    while ( iommu->enabled && entry.ptr32->basic.remap_en )
> 
> Why while?  (and by this, what I mean is that this definitely needs a
> comment, because the code looks like it ought to be an if.)

Well - see the RFC remark after the description. I'd be happy to
change to if(), but only on solid grounds. Without clear
guarantees that no races between IRTE updates can occur, we need
to continue flushing as long as we find RemapEn to have got set
again after a flush. Note how the necessary lock guarding against
such is getting dropped and re-acquired in the loop bodies.
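
The reasoning for the loop can be sketched as follows. This is an
illustration under assumptions, not the patch itself: the structures, the
boolean "lock" and flush_sketch() are hypothetical, and a racing updater is
simulated by a counter that re-sets RemapEn while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct irte_sketch {
    bool remap_en;
};

struct iommu_sketch {
    bool enabled;
    bool locked;        /* models the lock guarding IRTE updates */
    int flushes;        /* number of flushes issued so far */
    int pending_races;  /* simulated concurrent re-enables still to happen */
};

static void lock_sketch(struct iommu_sketch *iommu)
{
    assert(!iommu->locked);
    iommu->locked = true;
}

static void unlock_sketch(struct iommu_sketch *iommu)
{
    assert(iommu->locked);
    iommu->locked = false;
}

/* The flush runs with the update lock dropped, so an update may race it. */
static void flush_sketch(struct iommu_sketch *iommu, struct irte_sketch *entry)
{
    assert(!iommu->locked);
    iommu->flushes++;
    if ( iommu->pending_races > 0 )
    {
        iommu->pending_races--;
        entry->remap_en = true;  /* simulated racing updater */
    }
}

/* Called with the update lock held; returns with it held again. */
static void disable_and_flush_sketch(struct iommu_sketch *iommu,
                                     struct irte_sketch *entry)
{
    /* An if() would miss a RemapEn that got set again during the flush. */
    while ( iommu->enabled && entry->remap_en )
    {
        entry->remap_en = false;
        unlock_sketch(iommu);
        flush_sketch(iommu, entry);
        lock_sketch(iommu);
    }
}
```

With one simulated race the loop body runs twice; with none it runs once,
which is the case where an if() would have been sufficient.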

Jan

Thread overview: 54+ messages
2019-06-13 13:14 [Xen-devel] [PATCH 0/9] x86: AMD x2APIC support Jan Beulich
2019-06-13 13:22 ` [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register Jan Beulich
2019-06-17 19:07   ` Woods, Brian
2019-06-18  9:37     ` Jan Beulich
2019-06-17 20:23   ` Andrew Cooper
2019-06-18  9:33     ` Jan Beulich
2019-06-13 13:22 ` [Xen-devel] [PATCH 2/9] AMD/IOMMU: use bit field for control register Jan Beulich
2019-06-18  9:54   ` Andrew Cooper
2019-06-18 10:45     ` Jan Beulich
2019-06-13 13:23 ` [Xen-devel] [PATCH 3/9] AMD/IOMMU: use bit field for IRTE Jan Beulich
2019-06-18 10:37   ` Andrew Cooper
2019-06-18 11:53     ` Jan Beulich
2019-06-18 12:16       ` Andrew Cooper
2019-06-18 12:55         ` Jan Beulich
2019-06-18 11:31   ` Andrew Cooper
2019-06-18 11:47     ` Jan Beulich
2019-06-13 13:23 ` [Xen-devel] [PATCH 4/9] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format Jan Beulich
2019-06-18 11:57   ` Andrew Cooper
2019-06-18 15:31     ` Jan Beulich
2019-06-13 13:24 ` [Xen-devel] [PATCH 5/9] AMD/IOMMU: split amd_iommu_init_one() Jan Beulich
2019-06-18 12:17   ` Andrew Cooper
2019-06-13 13:25 ` [Xen-devel] [PATCH 6/9] AMD/IOMMU: allow enabling with IRQ not yet set up Jan Beulich
2019-06-18 12:22   ` Andrew Cooper
2019-06-13 13:26 ` [Xen-devel] [PATCH 7/9] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode Jan Beulich
2019-06-18 12:35   ` Andrew Cooper
2019-06-13 13:27 ` [Xen-devel] [PATCH 8/9] AMD/IOMMU: enable x2APIC mode when available Jan Beulich
2019-06-18 13:40   ` Andrew Cooper
2019-06-18 14:02     ` Jan Beulich
2019-06-13 13:28 ` [Xen-devel] [PATCH RFC 9/9] AMD/IOMMU: correct IRTE updating Jan Beulich
2019-06-18 13:28   ` Andrew Cooper
2019-06-18 14:58     ` Jan Beulich
2019-06-27 15:15 ` [Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support Jan Beulich
2019-06-27 15:19   ` [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging Jan Beulich
2019-07-01 15:37     ` Andrew Cooper
2019-07-01 15:59     ` Woods, Brian
2019-06-27 15:19   ` [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register Jan Beulich
2019-07-02 12:09     ` Andrew Cooper
2019-07-02 13:48       ` Jan Beulich
2019-06-27 15:20   ` [Xen-devel] [PATCH v2 03/10] AMD/IOMMU: use bit field for control register Jan Beulich
2019-07-02 12:20     ` Andrew Cooper
2019-06-27 15:20   ` [Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE Jan Beulich
2019-07-02 12:33     ` Andrew Cooper
2019-07-02 13:56       ` Jan Beulich
2019-06-27 15:21   ` [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format Jan Beulich
2019-07-02 14:41     ` Andrew Cooper
2019-07-03  8:46       ` Jan Beulich
2019-06-27 15:21   ` [Xen-devel] [PATCH v2 06/10] AMD/IOMMU: split amd_iommu_init_one() Jan Beulich
2019-06-27 15:22   ` [Xen-devel] [PATCH v2 07/10] AMD/IOMMU: allow enabling with IRQ not yet set up Jan Beulich
2019-06-27 15:22   ` [Xen-devel] [PATCH v2 08/10] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode Jan Beulich
2019-06-27 15:23   ` [Xen-devel] [PATCH v2 09/10] AMD/IOMMU: enable x2APIC mode when available Jan Beulich
2019-07-02 14:50     ` Andrew Cooper
2019-06-27 15:23   ` [Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating Jan Beulich
2019-07-02 15:08     ` Andrew Cooper
2019-07-03  8:55       ` Jan Beulich
