[Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR
@ 2018-12-20  5:40 Peter Xu
  2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default Peter Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Peter Xu @ 2018-12-20  5:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Eduardo Habkost, Igor Mammedov, peterx,
	Paolo Bonzini

v2:
- drop patch 1, so we don't even need to consider old kernels for the
  default value [Paolo, Eduardo]
- fix up commit message of the split irqchip patch to mention that

============= Original cover letter ================

This only changes q35.  Nothing else.

Before this series, we have these default parameters:

  - machine kernel-irqchip: on
  - intel-iommu IR: off

This series wants to change these default variables into:

  - machine kernel-irqchip: split
  - intel-iommu IR: on

and at the meantime we should keep compatibility with old kernels and
old versions of QEMU.

For old versions of QEMU: we used machine compat properties.

For old kernels (<4.4): if user didn't specify split kernel irqchip,
we'll take it only as the first priority if it's supported by the
kernel; otherwise, we will continue with complete kernel irqchip.

Both of these parameters should be good to have (split irqchip gains
more security, while IR gets it too but even more, like x2apic).  So
let's try to make them as default if capable.

Tests ("split" stands for whether kernel split irqchip enabled, "IR"
stands for whether IR is turned on):

   |--------------------------------------------------------+-------+----|
   | params                                                 | split | IR |
   |--------------------------------------------------------+-------+----|
   | -M q35                                                 |     0 | /  |
   | -M q35,accel=kvm                                       |     1 | /  |
   | -M pc-q35-3.1,accel=kvm                                |     0 | /  |
   | -M q35,accel=kvm,kernel-irqchip=off                    |     0 | /  |
   | -M q35,accel=kvm,kernel-irqchip=on                     |     0 | /  |
   | -M q35 -device intel-iommu                             |     0 | 1  |
   | -M q35,accel=kvm -device intel-iommu                   |     1 | 1  |
   | -M q35,accel=kvm,kernel-irqchip=on -device intel-iommu |     0 | 0  |
   |--------------------------------------------------------+-------+----|

I didn't try old kernels, though.

Any comment would be welcomed, thanks.

Peter Xu (3):
  q35: set split kernel irqchip as default
  x86-iommu: switch intr_supported to OnOffAuto type
  x86-iommu: turn on IR by default if proper

 hw/core/machine.c           |  2 ++
 hw/i386/acpi-build.c        |  6 +++---
 hw/i386/amd_iommu.c         |  2 +-
 hw/i386/intel_iommu.c       |  6 +++---
 hw/i386/pc.c                |  2 +-
 hw/i386/pc_q35.c            |  2 ++
 hw/i386/x86-iommu.c         | 18 +++++++++++++++---
 include/hw/boards.h         |  1 +
 include/hw/i386/x86-iommu.h |  4 +++-
 9 files changed, 31 insertions(+), 12 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
  2018-12-20  5:40 [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Peter Xu
@ 2018-12-20  5:40 ` Peter Xu
  2018-12-20 12:13   ` Eduardo Habkost
  2019-04-26 19:27     ` Alex Williamson
  2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 2/3] x86-iommu: switch intr_supported to OnOffAuto type Peter Xu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 38+ messages in thread
From: Peter Xu @ 2018-12-20  5:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Eduardo Habkost, Igor Mammedov, peterx,
	Paolo Bonzini

Starting from QEMU 4.0, let's specify "split" as the default value for
kernel-irqchip.

So for QEMU>=4.0 we'll have: allowed=Y,required=N,split=Y
   for QEMU<=3.1 we'll have: allowed=Y,required=N,split=N
   (omitting all the "kernel_irqchip_" prefix)

Note that this will let the default q35 machine type to depend on
Linux version 4.4 or newer because that's where split irqchip is
introduced in kernel.  But it's fine since we're boosting supported
Linux version for QEMU 4.0 to around Linux 4.5.  For more information
please refer to the discussion on AMD's RDTSCP:

  https://lore.kernel.org/lkml/20181210181328.GA762@zn.tnic/

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/core/machine.c   | 2 ++
 hw/i386/pc_q35.c    | 2 ++
 include/hw/boards.h | 1 +
 3 files changed, 5 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index c51423b647..4439ea663f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -653,8 +653,10 @@ static void machine_class_base_init(ObjectClass *oc, void *data)
 static void machine_initfn(Object *obj)
 {
     MachineState *ms = MACHINE(obj);
+    MachineClass *mc = MACHINE_GET_CLASS(obj);
 
     ms->kernel_irqchip_allowed = true;
+    ms->kernel_irqchip_split = mc->default_kernel_irqchip_split;
     ms->kvm_shadow_mem = -1;
     ms->dump_guest_core = true;
     ms->mem_merge = true;
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 58459bdab5..d2fb0fa49f 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -304,6 +304,7 @@ static void pc_q35_machine_options(MachineClass *m)
     m->units_per_default_bus = 1;
     m->default_machine_opts = "firmware=bios-256k.bin";
     m->default_display = "std";
+    m->default_kernel_irqchip_split = true;
     m->no_floppy = 1;
     machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
     machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
@@ -323,6 +324,7 @@ DEFINE_Q35_MACHINE(v4_0, "pc-q35-4.0", NULL,
 static void pc_q35_3_1_machine_options(MachineClass *m)
 {
     pc_q35_4_0_machine_options(m);
+    m->default_kernel_irqchip_split = false;
     m->alias = NULL;
     SET_MACHINE_COMPAT(m, PC_COMPAT_3_1);
 }
diff --git a/include/hw/boards.h b/include/hw/boards.h
index f82f28468b..362384815e 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -195,6 +195,7 @@ struct MachineClass {
     const char *hw_version;
     ram_addr_t default_ram_size;
     const char *default_cpu_type;
+    bool default_kernel_irqchip_split;
     bool option_rom_has_mr;
     bool rom_file_has_mr;
     int minimum_page_bits;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 2/3] x86-iommu: switch intr_supported to OnOffAuto type
  2018-12-20  5:40 [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Peter Xu
  2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default Peter Xu
@ 2018-12-20  5:40 ` Peter Xu
  2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 3/3] x86-iommu: turn on IR by default if proper Peter Xu
  2018-12-20 10:00 ` [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Paolo Bonzini
  3 siblings, 0 replies; 38+ messages in thread
From: Peter Xu @ 2018-12-20  5:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Eduardo Habkost, Igor Mammedov, peterx,
	Paolo Bonzini

Switch the intr_supported variable from a boolean to OnOffAuto type so
that we can know whether the user specified it or not.  With that
we'll have a chance to help the user to choose more wisely where
possible.  Introduce x86_iommu_ir_supported() to mask these changes.

No functional change at all.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/acpi-build.c        |  6 +++---
 hw/i386/amd_iommu.c         |  2 +-
 hw/i386/intel_iommu.c       |  6 +++---
 hw/i386/pc.c                |  2 +-
 hw/i386/x86-iommu.c         | 15 +++++++++++++--
 include/hw/i386/x86-iommu.h |  4 +++-
 6 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 236a20eaa8..7012f97cac 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2426,7 +2426,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
     IntelIOMMUState *intel_iommu = INTEL_IOMMU_DEVICE(iommu);
 
     assert(iommu);
-    if (iommu->intr_supported) {
+    if (x86_iommu_ir_supported(iommu)) {
         dmar_flags |= 0x1;      /* Flags: 0x1: INT_REMAP */
     }
 
@@ -2499,7 +2499,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
      * When interrupt remapping is supported, we add a special IVHD device
      * for type IO-APIC.
      */
-    if (x86_iommu_get_default()->intr_supported) {
+    if (x86_iommu_ir_supported(x86_iommu_get_default())) {
         ivhd_table_len += 8;
     }
     /* IVHD length */
@@ -2535,7 +2535,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
      * Linux IOMMU driver checks for the special IVHD device (type IO-APIC).
      * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059'
      */
-    if (x86_iommu_get_default()->intr_supported) {
+    if (x86_iommu_ir_supported(x86_iommu_get_default())) {
         build_append_int_noprefix(table_data,
                                  (0x1ull << 56) |           /* type IOAPIC */
                                  (IOAPIC_SB_DEVID << 40) |  /* IOAPIC devid */
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 353a810e6b..8ad707aba0 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1233,7 +1233,7 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
     }
 
     /* validate that we are configure with intremap=on */
-    if (!X86_IOMMU_DEVICE(iommu)->intr_supported) {
+    if (!x86_iommu_ir_supported(X86_IOMMU_DEVICE(iommu))) {
         trace_amdvi_err("Interrupt remapping is enabled in the guest but "
                         "not in the host. Use intremap=on to enable interrupt "
                         "remapping in amd-iommu.");
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d97bcbc2f7..3df4b0a550 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3137,7 +3137,7 @@ static void vtd_init(IntelIOMMUState *s)
     vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
     vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
 
-    if (x86_iommu->intr_supported) {
+    if (x86_iommu_ir_supported(x86_iommu)) {
         s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
         if (s->intr_eim == ON_OFF_AUTO_ON) {
             s->ecap |= VTD_ECAP_EIM;
@@ -3238,14 +3238,14 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
 
-    if (s->intr_eim == ON_OFF_AUTO_ON && !x86_iommu->intr_supported) {
+    if (s->intr_eim == ON_OFF_AUTO_ON && !x86_iommu_ir_supported(x86_iommu)) {
         error_setg(errp, "eim=on cannot be selected without intremap=on");
         return false;
     }
 
     if (s->intr_eim == ON_OFF_AUTO_AUTO) {
         s->intr_eim = (kvm_irqchip_in_kernel() || s->buggy_eim)
-                      && x86_iommu->intr_supported ?
+                      && x86_iommu_ir_supported(x86_iommu) ?
                                               ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
     }
     if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 115bc2825c..d95a0e3ad1 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1244,7 +1244,7 @@ void pc_machine_done(Notifier *notifier, void *data)
     if (pcms->apic_id_limit > 255 && !xen_enabled()) {
         IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
 
-        if (!iommu || !iommu->x86_iommu.intr_supported ||
+        if (!iommu || !x86_iommu_ir_supported(X86_IOMMU_DEVICE(iommu)) ||
             iommu->intr_eim != ON_OFF_AUTO_ON) {
             error_report("current -smp configuration requires "
                          "Extended Interrupt Mode enabled. "
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index abc3c03158..61ee0f1eaa 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -119,8 +119,13 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /* If the user didn't specify IR, choose a default value for it */
+    if (x86_iommu->intr_supported == ON_OFF_AUTO_AUTO) {
+        x86_iommu->intr_supported = ON_OFF_AUTO_OFF;
+    }
+
     /* Both Intel and AMD IOMMU IR only support "kernel-irqchip={off|split}" */
-    if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
+    if (x86_iommu_ir_supported(x86_iommu) && kvm_irqchip_in_kernel() &&
         !kvm_irqchip_is_split()) {
         error_setg(errp, "Interrupt Remapping cannot work with "
                          "kernel-irqchip=on, please use 'split|off'.");
@@ -135,7 +140,8 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
 }
 
 static Property x86_iommu_properties[] = {
-    DEFINE_PROP_BOOL("intremap", X86IOMMUState, intr_supported, false),
+    DEFINE_PROP_ON_OFF_AUTO("intremap", X86IOMMUState,
+                            intr_supported, ON_OFF_AUTO_AUTO),
     DEFINE_PROP_BOOL("device-iotlb", X86IOMMUState, dt_supported, false),
     DEFINE_PROP_BOOL("pt", X86IOMMUState, pt_supported, true),
     DEFINE_PROP_END_OF_LIST(),
@@ -148,6 +154,11 @@ static void x86_iommu_class_init(ObjectClass *klass, void *data)
     dc->props = x86_iommu_properties;
 }
 
+bool x86_iommu_ir_supported(X86IOMMUState *s)
+{
+    return s->intr_supported == ON_OFF_AUTO_ON;
+}
+
 static const TypeInfo x86_iommu_info = {
     .name          = TYPE_X86_IOMMU_DEVICE,
     .parent        = TYPE_SYS_BUS_DEVICE,
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 2b22a579a3..dcd9719a2c 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -74,13 +74,15 @@ typedef struct IEC_Notifier IEC_Notifier;
 
 struct X86IOMMUState {
     SysBusDevice busdev;
-    bool intr_supported;        /* Whether vIOMMU supports IR */
+    OnOffAuto intr_supported;   /* Whether vIOMMU supports IR */
     bool dt_supported;          /* Whether vIOMMU supports DT */
     bool pt_supported;          /* Whether vIOMMU supports pass-through */
     IommuType type;             /* IOMMU type - AMD/Intel     */
     QLIST_HEAD(, IEC_Notifier) iec_notifiers; /* IEC notify list */
 };
 
+bool x86_iommu_ir_supported(X86IOMMUState *s);
+
 /* Generic IRQ entry information when interrupt remapping is enabled */
 struct X86IOMMUIrq {
     /* Used by both IOAPIC/MSI interrupt remapping */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v2 3/3] x86-iommu: turn on IR by default if proper
  2018-12-20  5:40 [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Peter Xu
  2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default Peter Xu
  2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 2/3] x86-iommu: switch intr_supported to OnOffAuto type Peter Xu
@ 2018-12-20  5:40 ` Peter Xu
  2018-12-20 10:00 ` [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Paolo Bonzini
  3 siblings, 0 replies; 38+ messages in thread
From: Peter Xu @ 2018-12-20  5:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Eduardo Habkost, Igor Mammedov, peterx,
	Paolo Bonzini

When the user didn't specify "intremap" for the IOMMU device, we turn
it on by default if it is supported.  This will turn IR on for the
default Q35 platform as long as the IOMMU device is specified on new
kernels.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/x86-iommu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 61ee0f1eaa..d1534c1ae0 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -112,6 +112,7 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
     PCMachineState *pcms =
         PC_MACHINE(object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE));
     QLIST_INIT(&x86_iommu->iec_notifiers);
+    bool irq_all_kernel = kvm_irqchip_in_kernel() && !kvm_irqchip_is_split();
 
     if (!pcms || !pcms->bus) {
         error_setg(errp, "Machine-type '%s' not supported by IOMMU",
@@ -121,12 +122,12 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
 
     /* If the user didn't specify IR, choose a default value for it */
     if (x86_iommu->intr_supported == ON_OFF_AUTO_AUTO) {
-        x86_iommu->intr_supported = ON_OFF_AUTO_OFF;
+        x86_iommu->intr_supported = irq_all_kernel ?
+            ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
     }
 
     /* Both Intel and AMD IOMMU IR only support "kernel-irqchip={off|split}" */
-    if (x86_iommu_ir_supported(x86_iommu) && kvm_irqchip_in_kernel() &&
-        !kvm_irqchip_is_split()) {
+    if (x86_iommu_ir_supported(x86_iommu) && irq_all_kernel) {
         error_setg(errp, "Interrupt Remapping cannot work with "
                          "kernel-irqchip=on, please use 'split|off'.");
         return;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR
  2018-12-20  5:40 [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Peter Xu
                   ` (2 preceding siblings ...)
  2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 3/3] x86-iommu: turn on IR by default if proper Peter Xu
@ 2018-12-20 10:00 ` Paolo Bonzini
  2018-12-20 10:16   ` Peter Xu
  3 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2018-12-20 10:00 UTC (permalink / raw)
  To: Peter Xu, qemu-devel; +Cc: Michael S . Tsirkin, Eduardo Habkost, Igor Mammedov

On 20/12/18 06:40, Peter Xu wrote:
> For old versions of QEMU: we used machine compat properties.
> 
> For old kernels (<4.4): if user didn't specify split kernel irqchip,
> we'll take it only as the first priority if it's supported by the
> kernel; otherwise, we will continue with complete kernel irqchip.
> 
> Both of these parameters should be good to have (split irqchip gains
> more security, while IR gets it too but even more, like x2apic).  So
> let's try to make them as default if capable.

This is not correct anymore, but the commit message in patch 1 is
correct, so

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR
  2018-12-20 10:00 ` [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Paolo Bonzini
@ 2018-12-20 10:16   ` Peter Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Xu @ 2018-12-20 10:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Michael S . Tsirkin, Eduardo Habkost, Igor Mammedov

On Thu, Dec 20, 2018 at 11:00:21AM +0100, Paolo Bonzini wrote:
> On 20/12/18 06:40, Peter Xu wrote:
> > For old versions of QEMU: we used machine compat properties.
> > 
> > For old kernels (<4.4): if user didn't specify split kernel irqchip,
> > we'll take it only as the first priority if it's supported by the
> > kernel; otherwise, we will continue with complete kernel irqchip.
> > 
> > Both of these parameters should be good to have (split irqchip gains
> > more security, while IR gets it too but even more, like x2apic).  So
> > let's try to make them as default if capable.
> 
> This is not correct anymore, but the commit message in patch 1 is
> correct, so

Indeed; I should remove the old cover letter to avoid confusion.

> 
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
  2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default Peter Xu
@ 2018-12-20 12:13   ` Eduardo Habkost
  2019-04-26 19:27     ` Alex Williamson
  1 sibling, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2018-12-20 12:13 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Michael S . Tsirkin, Igor Mammedov, Paolo Bonzini

On Thu, Dec 20, 2018 at 01:40:35PM +0800, Peter Xu wrote:
> Starting from QEMU 4.0, let's specify "split" as the default value for
> kernel-irqchip.
> 
> So for QEMU>=4.0 we'll have: allowed=Y,required=N,split=Y
>    for QEMU<=3.1 we'll have: allowed=Y,required=N,split=N
>    (omitting all the "kernel_irqchip_" prefix)
> 
> Note that this will let the default q35 machine type to depend on
> Linux version 4.4 or newer because that's where split irqchip is
> introduced in kernel.  But it's fine since we're boosting supported
> Linux version for QEMU 4.0 to around Linux 4.5.  For more information
> please refer to the discussion on AMD's RDTSCP:
> 
>   https://lore.kernel.org/lkml/20181210181328.GA762@zn.tnic/
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-26 19:27     ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-26 19:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Igor Mammedov, Paolo Bonzini, Eduardo Habkost,
	Michael S . Tsirkin

On Thu, 20 Dec 2018 13:40:35 +0800
Peter Xu <peterx@redhat.com> wrote:

> Starting from QEMU 4.0, let's specify "split" as the default value for
> kernel-irqchip.
> 
> So for QEMU>=4.0 we'll have: allowed=Y,required=N,split=Y
>    for QEMU<=3.1 we'll have: allowed=Y,required=N,split=N
>    (omitting all the "kernel_irqchip_" prefix)
> 
> Note that this will let the default q35 machine type to depend on
> Linux version 4.4 or newer because that's where split irqchip is
> introduced in kernel.  But it's fine since we're boosting supported
> Linux version for QEMU 4.0 to around Linux 4.5.  For more information
> please refer to the discussion on AMD's RDTSCP:
> 
>   https://lore.kernel.org/lkml/20181210181328.GA762@zn.tnic/

It looks like this broke INTx for vfio-pci, see:

https://bugs.launchpad.net/qemu/+bug/1826422

In my testing it looks like KVM advertises supporting the KVM_IRQFD
resample feature, but vfio never gets the unmask notification, so the
device remains with DisINTx set and no further interrupts are
generated.  Do we expect KVM's IRQFD with resampler to work in the
split IRQ mode?  We can certainly hope that "high performance" devices
use MSI or MSI/X, but this would be quite a performance regression with
split mode if our userspace bypass for INTx goes away.  Thanks,

Alex

PS - the least impact workaround for users is to re-enable kernel mode
irqchip with kernel_irqchip=on or <ioapic driver='kvm'/>.  We can also
force QEMU routing of INTx with the x-no-kvm-intx=on option per
vfio-pci device, but this disables the userspace bypass and brings
along higher interrupt latency and overhead.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-26 19:27     ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-26 19:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: Igor Mammedov, Michael S . Tsirkin, qemu-devel, Eduardo Habkost,
	Paolo Bonzini

On Thu, 20 Dec 2018 13:40:35 +0800
Peter Xu <peterx@redhat.com> wrote:

> Starting from QEMU 4.0, let's specify "split" as the default value for
> kernel-irqchip.
> 
> So for QEMU>=4.0 we'll have: allowed=Y,required=N,split=Y
>    for QEMU<=3.1 we'll have: allowed=Y,required=N,split=N
>    (omitting all the "kernel_irqchip_" prefix)
> 
> Note that this will let the default q35 machine type to depend on
> Linux version 4.4 or newer because that's where split irqchip is
> introduced in kernel.  But it's fine since we're boosting supported
> Linux version for QEMU 4.0 to around Linux 4.5.  For more information
> please refer to the discussion on AMD's RDTSCP:
> 
>   https://lore.kernel.org/lkml/20181210181328.GA762@zn.tnic/

It looks like this broke INTx for vfio-pci, see:

https://bugs.launchpad.net/qemu/+bug/1826422

In my testing it looks like KVM advertises supporting the KVM_IRQFD
resample feature, but vfio never gets the unmask notification, so the
device remains with DisINTx set and no further interrupts are
generated.  Do we expect KVM's IRQFD with resampler to work in the
split IRQ mode?  We can certainly hope that "high performance" devices
use MSI or MSI/X, but this would be quite a performance regression with
split mode if our userspace bypass for INTx goes away.  Thanks,

Alex

PS - the least impact workaround for users is to re-enable kernel mode
irqchip with kernel_irqchip=on or <ioapic driver='kvm'/>.  We can also
force QEMU routing of INTx with the x-no-kvm-intx=on option per
vfio-pci device, but this disables the userspace bypass and brings
along higher interrupt latency and overhead.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-26 21:02       ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-26 21:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Igor Mammedov, Paolo Bonzini, Eduardo Habkost,
	Michael S . Tsirkin

On Fri, 26 Apr 2019 13:27:44 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Thu, 20 Dec 2018 13:40:35 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Starting from QEMU 4.0, let's specify "split" as the default value for
> > kernel-irqchip.
> > 
> > So for QEMU>=4.0 we'll have: allowed=Y,required=N,split=Y
> >    for QEMU<=3.1 we'll have: allowed=Y,required=N,split=N
> >    (omitting all the "kernel_irqchip_" prefix)
> > 
> > Note that this will let the default q35 machine type to depend on
> > Linux version 4.4 or newer because that's where split irqchip is
> > introduced in kernel.  But it's fine since we're boosting supported
> > Linux version for QEMU 4.0 to around Linux 4.5.  For more information
> > please refer to the discussion on AMD's RDTSCP:
> > 
> >   https://lore.kernel.org/lkml/20181210181328.GA762@zn.tnic/  
> 
> It looks like this broke INTx for vfio-pci, see:
> 
> https://bugs.launchpad.net/qemu/+bug/1826422
> 
> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> resample feature, but vfio never gets the unmask notification, so the
> device remains with DisINTx set and no further interrupts are
> generated.  Do we expect KVM's IRQFD with resampler to work in the
> split IRQ mode?  We can certainly hope that "high performance" devices
> use MSI or MSI/X, but this would be quite a performance regression with
> split mode if our userspace bypass for INTx goes away.  Thanks,

arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
kvm_notify_acked_gsi(), so it looks like KVM really ought to return an
error when trying to register a resample IRQFD when irqchip_split(),
but do we have better options?  EOI handling in QEMU is pretty much
non-existent and even if it was in place, it's a big performance
regression for vfio INTx handling.  Is a split irqchip that retains
performant resampling (EOI) support possible?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-26 21:02       ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-26 21:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: Igor Mammedov, Michael S . Tsirkin, qemu-devel, Eduardo Habkost,
	Paolo Bonzini

On Fri, 26 Apr 2019 13:27:44 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Thu, 20 Dec 2018 13:40:35 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Starting from QEMU 4.0, let's specify "split" as the default value for
> > kernel-irqchip.
> > 
> > So for QEMU>=4.0 we'll have: allowed=Y,required=N,split=Y
> >    for QEMU<=3.1 we'll have: allowed=Y,required=N,split=N
> >    (omitting all the "kernel_irqchip_" prefix)
> > 
> > Note that this will let the default q35 machine type to depend on
> > Linux version 4.4 or newer because that's where split irqchip is
> > introduced in kernel.  But it's fine since we're boosting supported
> > Linux version for QEMU 4.0 to around Linux 4.5.  For more information
> > please refer to the discussion on AMD's RDTSCP:
> > 
> >   https://lore.kernel.org/lkml/20181210181328.GA762@zn.tnic/  
> 
> It looks like this broke INTx for vfio-pci, see:
> 
> https://bugs.launchpad.net/qemu/+bug/1826422
> 
> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> resample feature, but vfio never gets the unmask notification, so the
> device remains with DisINTx set and no further interrupts are
> generated.  Do we expect KVM's IRQFD with resampler to work in the
> split IRQ mode?  We can certainly hope that "high performance" devices
> use MSI or MSI/X, but this would be quite a performance regression with
> split mode if our userspace bypass for INTx goes away.  Thanks,

arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
kvm_notify_acked_gsi(), so it looks like KVM really ought to return an
error when trying to register a resample IRQFD when irqchip_split(),
but do we have better options?  EOI handling in QEMU is pretty much
non-existent and even if it was in place, it's a big performance
regression for vfio INTx handling.  Is a split irqchip that retains
performant resampling (EOI) support possible?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-27  5:29         ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2019-04-27  5:29 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Peter Xu, qemu-devel, Igor Mammedov, Eduardo Habkost,
	Michael S . Tsirkin


> > In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > resample feature, but vfio never gets the unmask notification, so the
> > device remains with DisINTx set and no further interrupts are
> > generated.  Do we expect KVM's IRQFD with resampler to work in the
> > split IRQ mode?  We can certainly hope that "high performance" devices
> > use MSI or MSI/X, but this would be quite a performance regression with
> > split mode if our userspace bypass for INTx goes away.  Thanks,
> 
> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> kvm_notify_acked_gsi(),

That wouldn't help because kvm_ioapic_update_eoi would not even be
able to access vcpu->kvm->arch.vioapic (it's NULL).

The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
before requesting the exit to userspace.  However I am not sure how QEMU
sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
correct we need to trigger resampling from hw/intc/ioapic.c.

Something like:

1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly

2) add a NotifierList in kvm_irqchip_assign_irqfd

3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c which
stores the resamplefd as an EventNotifier*

4) ioapic_eoi_broadcast writes to the resamplefd

BTW, how do non-x86 platforms handle intx resampling?

Paolo

---- 8< -----
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes

diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index ea1a4e0297da..be337e06e3fd 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
 			   ulong *ioapic_handled_vectors);
 void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			    ulong *ioapic_handled_vectors);
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector);
+
 #endif
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 3cc3b2d130a0..46ea79a0dd8a 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 	srcu_read_unlock(&kvm->irq_srcu, idx);
 }
 
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector)
+{
+        struct kvm_kernel_irq_routing_entry *entry;
+        struct kvm_irq_routing_table *table;
+        u32 i, nr_ioapic_pins;
+        int idx;
+
+        idx = srcu_read_lock(&kvm->irq_srcu);
+        table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+        nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
+                               kvm->arch.nr_reserved_ioapic_pins);
+
+	for (i = 0; i < nr_ioapic_pins; i++) {
+                hlist_for_each_entry(entry, &table->map[i], link) {
+                        struct kvm_lapic_irq irq;
+
+                        if (entry->type != KVM_IRQ_ROUTING_MSI)
+                                continue;
+
+			kvm_set_msi_irq(kvm, entry, &irq);
+			if (irq.level && irq.vector == vector)
+				kvm_notify_acked_gsi(kvm, i);
+		}
+	}
+
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+}
+
 void kvm_arch_irq_routing_update(struct kvm *kvm)
 {
 	kvm_hv_irq_routing_update(kvm);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9f089e2e09d0..8f8e76d77925 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector)
 
 static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 {
+	struct kvm_vcpu *vcpu = apic->vcpu;
 	int trigger_mode;
 
 	/* Eoi the ioapic only if the ioapic doesn't own the vector. */
@@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 		return;
 
 	/* Request a KVM exit to inform the userspace IOAPIC. */
-	if (irqchip_split(apic->vcpu->kvm)) {
-		apic->vcpu->arch.pending_ioapic_eoi = vector;
-		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
+	if (irqchip_split(vcpu->kvm)) {
+		kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector);
+		vcpu->arch.pending_ioapic_eoi = vector;
+		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu);
 		return;
 	}
 
@@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 	else
 		trigger_mode = IOAPIC_EDGE_TRIG;
 
-	kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
+	kvm_ioapic_update_eoi(vcpu, vector, trigger_mode);
 }
 
 static int apic_set_eoi(struct kvm_lapic *apic)


> so it looks like KVM really ought to return an
> error when trying to register a resample IRQFD when irqchip_split(),
> but do we have better options?  EOI handling in QEMU is pretty much
> non-existent and even if it was in place, it's a big performance
> regression for vfio INTx handling.  Is a split irqchip that retains
> performant resampling (EOI) support possible?  Thanks,

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-27  5:29         ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2019-04-27  5:29 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Igor Mammedov, Michael S . Tsirkin, qemu-devel, Peter Xu,
	Eduardo Habkost


> > In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > resample feature, but vfio never gets the unmask notification, so the
> > device remains with DisINTx set and no further interrupts are
> > generated.  Do we expect KVM's IRQFD with resampler to work in the
> > split IRQ mode?  We can certainly hope that "high performance" devices
> > use MSI or MSI/X, but this would be quite a performance regression with
> > split mode if our userspace bypass for INTx goes away.  Thanks,
> 
> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> kvm_notify_acked_gsi(),

That wouldn't help because kvm_ioapic_update_eoi would not even be
able to access vcpu->kvm->arch.vioapic (it's NULL).

The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
before requesting the exit to userspace.  However I am not sure how QEMU
sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
correct we need to trigger resampling from hw/intc/ioapic.c.

Something like:

1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly

2) add a NotifierList in kvm_irqchip_assign_irqfd

3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c which
stores the resamplefd as an EventNotifier*

4) ioapic_eoi_broadcast writes to the resamplefd

BTW, how do non-x86 platforms handle intx resampling?

Paolo

---- 8< -----
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes

diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index ea1a4e0297da..be337e06e3fd 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
 			   ulong *ioapic_handled_vectors);
 void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			    ulong *ioapic_handled_vectors);
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector);
+
 #endif
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 3cc3b2d130a0..46ea79a0dd8a 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 	srcu_read_unlock(&kvm->irq_srcu, idx);
 }
 
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector)
+{
+        struct kvm_kernel_irq_routing_entry *entry;
+        struct kvm_irq_routing_table *table;
+        u32 i, nr_ioapic_pins;
+        int idx;
+
+        idx = srcu_read_lock(&kvm->irq_srcu);
+        table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+        nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
+                               kvm->arch.nr_reserved_ioapic_pins);
+
+	for (i = 0; i < nr_ioapic_pins; i++) {
+                hlist_for_each_entry(entry, &table->map[i], link) {
+                        struct kvm_lapic_irq irq;
+
+                        if (entry->type != KVM_IRQ_ROUTING_MSI)
+                                continue;
+
+			kvm_set_msi_irq(kvm, entry, &irq);
+			if (irq.level && irq.vector == vector)
+				kvm_notify_acked_gsi(kvm, i);
+		}
+	}
+
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+}
+
 void kvm_arch_irq_routing_update(struct kvm *kvm)
 {
 	kvm_hv_irq_routing_update(kvm);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9f089e2e09d0..8f8e76d77925 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector)
 
 static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 {
+	struct kvm_vcpu *vcpu = apic->vcpu;
 	int trigger_mode;
 
 	/* Eoi the ioapic only if the ioapic doesn't own the vector. */
@@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 		return;
 
 	/* Request a KVM exit to inform the userspace IOAPIC. */
-	if (irqchip_split(apic->vcpu->kvm)) {
-		apic->vcpu->arch.pending_ioapic_eoi = vector;
-		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
+	if (irqchip_split(vcpu->kvm)) {
+		kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector);
+		vcpu->arch.pending_ioapic_eoi = vector;
+		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu);
 		return;
 	}
 
@@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 	else
 		trigger_mode = IOAPIC_EDGE_TRIG;
 
-	kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
+	kvm_ioapic_update_eoi(vcpu, vector, trigger_mode);
 }
 
 static int apic_set_eoi(struct kvm_lapic *apic)


> so it looks like KVM really ought to return an
> error when trying to register a resample IRQFD when irqchip_split(),
> but do we have better options?  EOI handling in QEMU is pretty much
> non-existent and even if it was in place, it's a big performance
> regression for vfio INTx handling.  Is a split irqchip that retains
> performant resampling (EOI) support possible?  Thanks,


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-27  8:09           ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2019-04-27  8:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Peter Xu, qemu-devel, Igor Mammedov, Eduardo Habkost,
	Michael S . Tsirkin

On 27/04/19 07:29, Paolo Bonzini wrote:
> 
>>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
>>> resample feature, but vfio never gets the unmask notification, so the
>>> device remains with DisINTx set and no further interrupts are
>>> generated.  Do we expect KVM's IRQFD with resampler to work in the
>>> split IRQ mode?  We can certainly hope that "high performance" devices
>>> use MSI or MSI/X, but this would be quite a performance regression with
>>> split mode if our userspace bypass for INTx goes away.  Thanks,
>>
>> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
>> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
>> kvm_notify_acked_gsi(),
> 
> That wouldn't help because kvm_ioapic_update_eoi would not even be
> able to access vcpu->kvm->arch.vioapic (it's NULL).
> 
> The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> before requesting the exit to userspace.  However I am not sure how QEMU
> sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> correct we need to trigger resampling from hw/intc/ioapic.c.

Actually it's worse: because you're bypassing IOAPIC when raising the
irq, the IOAPIC's remote_irr for example will not be set.  So split
irqchip currently must disable the intx fast path completely.

I guess we could also reimplement irqfd and resamplefd in the userspace
IOAPIC, and run the listener in a separate thread (using "-object
iothread" on the command line and AioContext in the code).

Paolo

> 
> Something like:
> 
> 1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly
> 
> 2) add a NotifierList in kvm_irqchip_assign_irqfd
> 
> 3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c which
> stores the resamplefd as an EventNotifier*
> 
> 4) ioapic_eoi_broadcast writes to the resamplefd
> 
> BTW, how do non-x86 platforms handle intx resampling?
> 
> Paolo
> 
> ---- 8< -----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes
> 
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index ea1a4e0297da..be337e06e3fd 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
>  			   ulong *ioapic_handled_vectors);
>  void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>  			    ulong *ioapic_handled_vectors);
> +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector);
> +
>  #endif
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 3cc3b2d130a0..46ea79a0dd8a 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>  	srcu_read_unlock(&kvm->irq_srcu, idx);
>  }
>  
> +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector)
> +{
> +        struct kvm_kernel_irq_routing_entry *entry;
> +        struct kvm_irq_routing_table *table;
> +        u32 i, nr_ioapic_pins;
> +        int idx;
> +
> +        idx = srcu_read_lock(&kvm->irq_srcu);
> +        table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> +        nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
> +                               kvm->arch.nr_reserved_ioapic_pins);
> +
> +	for (i = 0; i < nr_ioapic_pins; i++) {
> +                hlist_for_each_entry(entry, &table->map[i], link) {
> +                        struct kvm_lapic_irq irq;
> +
> +                        if (entry->type != KVM_IRQ_ROUTING_MSI)
> +                                continue;
> +
> +			kvm_set_msi_irq(kvm, entry, &irq);
> +			if (irq.level && irq.vector == vector)
> +				kvm_notify_acked_gsi(kvm, i);
> +		}
> +	}
> +
> +	srcu_read_unlock(&kvm->irq_srcu, idx);
> +}
> +
>  void kvm_arch_irq_routing_update(struct kvm *kvm)
>  {
>  	kvm_hv_irq_routing_update(kvm);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9f089e2e09d0..8f8e76d77925 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector)
>  
>  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  {
> +	struct kvm_vcpu *vcpu = apic->vcpu;
>  	int trigger_mode;
>  
>  	/* Eoi the ioapic only if the ioapic doesn't own the vector. */
> @@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  		return;
>  
>  	/* Request a KVM exit to inform the userspace IOAPIC. */
> -	if (irqchip_split(apic->vcpu->kvm)) {
> -		apic->vcpu->arch.pending_ioapic_eoi = vector;
> -		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
> +	if (irqchip_split(vcpu->kvm)) {
> +		kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector);
> +		vcpu->arch.pending_ioapic_eoi = vector;
> +		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu);
>  		return;
>  	}
>  
> @@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  	else
>  		trigger_mode = IOAPIC_EDGE_TRIG;
>  
> -	kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
> +	kvm_ioapic_update_eoi(vcpu, vector, trigger_mode);
>  }
>  
>  static int apic_set_eoi(struct kvm_lapic *apic)
> 
> 
>> so it looks like KVM really ought to return an
>> error when trying to register a resample IRQFD when irqchip_split(),
>> but do we have better options?  EOI handling in QEMU is pretty much
>> non-existent and even if it was in place, it's a big performance
>> regression for vfio INTx handling.  Is a split irqchip that retains
>> performant resampling (EOI) support possible?  Thanks,

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-27  8:09           ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2019-04-27  8:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Igor Mammedov, Michael S . Tsirkin, qemu-devel, Peter Xu,
	Eduardo Habkost

On 27/04/19 07:29, Paolo Bonzini wrote:
> 
>>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
>>> resample feature, but vfio never gets the unmask notification, so the
>>> device remains with DisINTx set and no further interrupts are
>>> generated.  Do we expect KVM's IRQFD with resampler to work in the
>>> split IRQ mode?  We can certainly hope that "high performance" devices
>>> use MSI or MSI/X, but this would be quite a performance regression with
>>> split mode if our userspace bypass for INTx goes away.  Thanks,
>>
>> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
>> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
>> kvm_notify_acked_gsi(),
> 
> That wouldn't help because kvm_ioapic_update_eoi would not even be
> able to access vcpu->kvm->arch.vioapic (it's NULL).
> 
> The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> before requesting the exit to userspace.  However I am not sure how QEMU
> sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> correct we need to trigger resampling from hw/intc/ioapic.c.

Actually it's worse: because you're bypassing IOAPIC when raising the
irq, the IOAPIC's remote_irr for example will not be set.  So split
irqchip currently must disable the intx fast path completely.

I guess we could also reimplement irqfd and resamplefd in the userspace
IOAPIC, and run the listener in a separate thread (using "-object
iothread" on the command line and AioContext in the code).

Paolo

> 
> Something like:
> 
> 1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly
> 
> 2) add a NotifierList in kvm_irqchip_assign_irqfd
> 
> 3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c which
> stores the resamplefd as an EventNotifier*
> 
> 4) ioapic_eoi_broadcast writes to the resamplefd
> 
> BTW, how do non-x86 platforms handle intx resampling?
> 
> Paolo
> 
> ---- 8< -----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes
> 
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index ea1a4e0297da..be337e06e3fd 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
>  			   ulong *ioapic_handled_vectors);
>  void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>  			    ulong *ioapic_handled_vectors);
> +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector);
> +
>  #endif
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 3cc3b2d130a0..46ea79a0dd8a 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>  	srcu_read_unlock(&kvm->irq_srcu, idx);
>  }
>  
> +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector)
> +{
> +        struct kvm_kernel_irq_routing_entry *entry;
> +        struct kvm_irq_routing_table *table;
> +        u32 i, nr_ioapic_pins;
> +        int idx;
> +
> +        idx = srcu_read_lock(&kvm->irq_srcu);
> +        table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> +        nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
> +                               kvm->arch.nr_reserved_ioapic_pins);
> +
> +	for (i = 0; i < nr_ioapic_pins; i++) {
> +                hlist_for_each_entry(entry, &table->map[i], link) {
> +                        struct kvm_lapic_irq irq;
> +
> +                        if (entry->type != KVM_IRQ_ROUTING_MSI)
> +                                continue;
> +
> +			kvm_set_msi_irq(kvm, entry, &irq);
> +			if (irq.level && irq.vector == vector)
> +				kvm_notify_acked_gsi(kvm, i);
> +		}
> +	}
> +
> +	srcu_read_unlock(&kvm->irq_srcu, idx);
> +}
> +
>  void kvm_arch_irq_routing_update(struct kvm *kvm)
>  {
>  	kvm_hv_irq_routing_update(kvm);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9f089e2e09d0..8f8e76d77925 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector)
>  
>  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  {
> +	struct kvm_vcpu *vcpu = apic->vcpu;
>  	int trigger_mode;
>  
>  	/* Eoi the ioapic only if the ioapic doesn't own the vector. */
> @@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  		return;
>  
>  	/* Request a KVM exit to inform the userspace IOAPIC. */
> -	if (irqchip_split(apic->vcpu->kvm)) {
> -		apic->vcpu->arch.pending_ioapic_eoi = vector;
> -		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
> +	if (irqchip_split(vcpu->kvm)) {
> +		kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector);
> +		vcpu->arch.pending_ioapic_eoi = vector;
> +		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu);
>  		return;
>  	}
>  
> @@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  	else
>  		trigger_mode = IOAPIC_EDGE_TRIG;
>  
> -	kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
> +	kvm_ioapic_update_eoi(vcpu, vector, trigger_mode);
>  }
>  
>  static int apic_set_eoi(struct kvm_lapic *apic)
> 
> 
>> so it looks like KVM really ought to return an
>> error when trying to register a resample IRQFD when irqchip_split(),
>> but do we have better options?  EOI handling in QEMU is pretty much
>> non-existent and even if it was in place, it's a big performance
>> regression for vfio INTx handling.  Is a split irqchip that retains
>> performant resampling (EOI) support possible?  Thanks,



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-29 14:21             ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-29 14:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Xu, qemu-devel, Igor Mammedov, Eduardo Habkost,
	Michael S . Tsirkin

On Sat, 27 Apr 2019 10:09:51 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 27/04/19 07:29, Paolo Bonzini wrote:
> >   
> >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> >>> resample feature, but vfio never gets the unmask notification, so the
> >>> device remains with DisINTx set and no further interrupts are
> >>> generated.  Do we expect KVM's IRQFD with resampler to work in the
> >>> split IRQ mode?  We can certainly hope that "high performance" devices
> >>> use MSI or MSI/X, but this would be quite a performance regression with
> >>> split mode if our userspace bypass for INTx goes away.  Thanks,  
> >>
> >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> >> kvm_notify_acked_gsi(),  
> > 
> > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > able to access vcpu->kvm->arch.vioapic (it's NULL).
> > 
> > The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> > before requesting the exit to userspace.  However I am not sure how QEMU
> > sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> > the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> > only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> > correct we need to trigger resampling from hw/intc/ioapic.c.  
> 
> Actually it's worse: because you're bypassing IOAPIC when raising the
> irq, the IOAPIC's remote_irr for example will not be set.  So split
> irqchip currently must disable the intx fast path completely.
> 
> I guess we could also reimplement irqfd and resamplefd in the userspace
> IOAPIC, and run the listener in a separate thread (using "-object
> iothread" on the command line and AioContext in the code).

This sounds like a performance regression vs KVM irqchip any way we
slice it.  Was this change a mistake?  Without KVM support, the
universal support in QEMU kicks in, where device mmaps are disabled
when an INTx occurs, forcing trapped access to the device, and we
assume that the next access is in response to an interrupt and trigger
our own internal EOI and re-enable mmaps.  A timer acts as a
catch-all.  Needless to say, this is functional but not fast.  It would
be a massive performance regression for devices depending on INTx and
previously using the KVM bypass to switch to this.  INTx is largely
considered a legacy interrupt, so non-x86 archs don't encounter it as
often, S390 even explicitly disables INTx support.  ARM and POWER
likely just don't see a lot of these devices, but nearly all devices
(except SR-IOV VFs) on x86 expect an INTx fallback mode and some
drivers may run the device in INTx for compatibility.  This split
irqchip change was likely fine for "enterprise" users concerned only
with modern high speed devices, but very much not for device assignment
used for compatibility use cases or commodity hardware users.

What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
as the Q35 default?  I can't see that simply switching to current QEMU
handling is a viable option for performance?  What about 4.1?  We could
certainly improve EOI support in QEMU, there's essentially no support
currently, but it seems like an uphill battle for an iothread based
userspace ioapic to ever compare to KVM handling?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-29 14:21             ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-29 14:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Igor Mammedov, Michael S . Tsirkin, qemu-devel, Peter Xu,
	Eduardo Habkost

On Sat, 27 Apr 2019 10:09:51 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 27/04/19 07:29, Paolo Bonzini wrote:
> >   
> >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> >>> resample feature, but vfio never gets the unmask notification, so the
> >>> device remains with DisINTx set and no further interrupts are
> >>> generated.  Do we expect KVM's IRQFD with resampler to work in the
> >>> split IRQ mode?  We can certainly hope that "high performance" devices
> >>> use MSI or MSI/X, but this would be quite a performance regression with
> >>> split mode if our userspace bypass for INTx goes away.  Thanks,  
> >>
> >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> >> kvm_notify_acked_gsi(),  
> > 
> > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > able to access vcpu->kvm->arch.vioapic (it's NULL).
> > 
> > The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> > before requesting the exit to userspace.  However I am not sure how QEMU
> > sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> > the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> > only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> > correct we need to trigger resampling from hw/intc/ioapic.c.  
> 
> Actually it's worse: because you're bypassing IOAPIC when raising the
> irq, the IOAPIC's remote_irr for example will not be set.  So split
> irqchip currently must disable the intx fast path completely.
> 
> I guess we could also reimplement irqfd and resamplefd in the userspace
> IOAPIC, and run the listener in a separate thread (using "-object
> iothread" on the command line and AioContext in the code).

This sounds like a performance regression vs KVM irqchip any way we
slice it.  Was this change a mistake?  Without KVM support, the
universal support in QEMU kicks in, where device mmaps are disabled
when an INTx occurs, forcing trapped access to the device, and we
assume that the next access is in response to an interrupt and trigger
our own internal EOI and re-enable mmaps.  A timer acts as a
catch-all.  Needless to say, this is functional but not fast.  It would
be a massive performance regression for devices depending on INTx and
previously using the KVM bypass to switch to this.  INTx is largely
considered a legacy interrupt, so non-x86 archs don't encounter it as
often, S390 even explicitly disables INTx support.  ARM and POWER
likely just don't see a lot of these devices, but nearly all devices
(except SR-IOV VFs) on x86 expect an INTx fallback mode and some
drivers may run the device in INTx for compatibility.  This split
irqchip change was likely fine for "enterprise" users concerned only
with modern high speed devices, but very much not for device assignment
used for compatibility use cases or commodity hardware users.

What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
as the Q35 default?  I can't see that simply switching to current QEMU
handling is a viable option for performance?  What about 4.1?  We could
certainly improve EOI support in QEMU, there's essentially no support
currently, but it seems like an uphill battle for an iothread based
userspace ioapic to ever compare to KVM handling?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-29 14:55               ` Eduardo Habkost
  0 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2019-04-29 14:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Paolo Bonzini, Igor Mammedov, Michael S . Tsirkin, qemu-devel, Peter Xu

On Mon, Apr 29, 2019 at 08:21:06AM -0600, Alex Williamson wrote:
> On Sat, 27 Apr 2019 10:09:51 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > On 27/04/19 07:29, Paolo Bonzini wrote:
> > >   
> > >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > >>> resample feature, but vfio never gets the unmask notification, so the
> > >>> device remains with DisINTx set and no further interrupts are
> > >>> generated.  Do we expect KVM's IRQFD with resampler to work in the
> > >>> split IRQ mode?  We can certainly hope that "high performance" devices
> > >>> use MSI or MSI/X, but this would be quite a performance regression with
> > >>> split mode if our userspace bypass for INTx goes away.  Thanks,  
> > >>
> > >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> > >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> > >> kvm_notify_acked_gsi(),  
> > > 
> > > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > > able to access vcpu->kvm->arch.vioapic (it's NULL).
> > > 
> > > The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> > > before requesting the exit to userspace.  However I am not sure how QEMU
> > > sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> > > the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> > > only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> > > correct we need to trigger resampling from hw/intc/ioapic.c.  
> > 
> > Actually it's worse: because you're bypassing IOAPIC when raising the
> > irq, the IOAPIC's remote_irr for example will not be set.  So split
> > irqchip currently must disable the intx fast path completely.
> > 
> > I guess we could also reimplement irqfd and resamplefd in the userspace
> > IOAPIC, and run the listener in a separate thread (using "-object
> > iothread" on the command line and AioContext in the code).
> 
> This sounds like a performance regression vs KVM irqchip any way we
> slice it.  Was this change a mistake?  Without KVM support, the
> universal support in QEMU kicks in, where device mmaps are disabled
> when an INTx occurs, forcing trapped access to the device, and we
> assume that the next access is in response to an interrupt and trigger
> our own internal EOI and re-enable mmaps.  A timer acts as a
> catch-all.  Needless to say, this is functional but not fast.  It would
> be a massive performance regression for devices depending on INTx and
> previously using the KVM bypass to switch to this.  INTx is largely
> considered a legacy interrupt, so non-x86 archs don't encounter it as
> often, S390 even explicitly disables INTx support.  ARM and POWER
> likely just don't see a lot of these devices, but nearly all devices
> (except SR-IOV VFs) on x86 expect an INTx fallback mode and some
> drivers may run the device in INTx for compatibility.  This split
> irqchip change was likely fine for "enterprise" users concerned only
> with modern high speed devices, but very much not for device assignment
> used for compatibility use cases or commodity hardware users.
> 
> What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> as the Q35 default?  I can't see that simply switching to current QEMU
> handling is a viable option for performance?  What about 4.1?  We could
> certainly improve EOI support in QEMU, there's essentially no support
> currently, but it seems like an uphill battle for an iothread based
> userspace ioapic to ever compare to KVM handling?  Thanks,

irqchip=split and irqchip=kernel aren't guest ABI compatible, are
they?  That would make it impossible to fix this in pc-q35-4.0
for a 4.0.1 update.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-29 14:55               ` Eduardo Habkost
  0 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2019-04-29 14:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Paolo Bonzini, Michael S . Tsirkin, qemu-devel, Peter Xu, Igor Mammedov

On Mon, Apr 29, 2019 at 08:21:06AM -0600, Alex Williamson wrote:
> On Sat, 27 Apr 2019 10:09:51 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > On 27/04/19 07:29, Paolo Bonzini wrote:
> > >   
> > >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > >>> resample feature, but vfio never gets the unmask notification, so the
> > >>> device remains with DisINTx set and no further interrupts are
> > >>> generated.  Do we expect KVM's IRQFD with resampler to work in the
> > >>> split IRQ mode?  We can certainly hope that "high performance" devices
> > >>> use MSI or MSI/X, but this would be quite a performance regression with
> > >>> split mode if our userspace bypass for INTx goes away.  Thanks,  
> > >>
> > >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> > >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> > >> kvm_notify_acked_gsi(),  
> > > 
> > > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > > able to access vcpu->kvm->arch.vioapic (it's NULL).
> > > 
> > > The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> > > before requesting the exit to userspace.  However I am not sure how QEMU
> > > sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> > > the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> > > only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> > > correct we need to trigger resampling from hw/intc/ioapic.c.  
> > 
> > Actually it's worse: because you're bypassing IOAPIC when raising the
> > irq, the IOAPIC's remote_irr for example will not be set.  So split
> > irqchip currently must disable the intx fast path completely.
> > 
> > I guess we could also reimplement irqfd and resamplefd in the userspace
> > IOAPIC, and run the listener in a separate thread (using "-object
> > iothread" on the command line and AioContext in the code).
> 
> This sounds like a performance regression vs KVM irqchip any way we
> slice it.  Was this change a mistake?  Without KVM support, the
> universal support in QEMU kicks in, where device mmaps are disabled
> when an INTx occurs, forcing trapped access to the device, and we
> assume that the next access is in response to an interrupt and trigger
> our own internal EOI and re-enable mmaps.  A timer acts as a
> catch-all.  Needless to say, this is functional but not fast.  It would
> be a massive performance regression for devices depending on INTx and
> previously using the KVM bypass to switch to this.  INTx is largely
> considered a legacy interrupt, so non-x86 archs don't encounter it as
> often, S390 even explicitly disables INTx support.  ARM and POWER
> likely just don't see a lot of these devices, but nearly all devices
> (except SR-IOV VFs) on x86 expect an INTx fallback mode and some
> drivers may run the device in INTx for compatibility.  This split
> irqchip change was likely fine for "enterprise" users concerned only
> with modern high speed devices, but very much not for device assignment
> used for compatibility use cases or commodity hardware users.
> 
> What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> as the Q35 default?  I can't see that simply switching to current QEMU
> handling is a viable option for performance?  What about 4.1?  We could
> certainly improve EOI support in QEMU, there's essentially no support
> currently, but it seems like an uphill battle for an iothread based
> userspace ioapic to ever compare to KVM handling?  Thanks,

irqchip=split and irqchip=kernel aren't guest ABI compatible, are
they?  That would make it impossible to fix this in pc-q35-4.0
for a 4.0.1 update.

-- 
Eduardo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-29 15:22                 ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-29 15:22 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, Igor Mammedov, Michael S . Tsirkin, qemu-devel, Peter Xu

On Mon, 29 Apr 2019 11:55:56 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Mon, Apr 29, 2019 at 08:21:06AM -0600, Alex Williamson wrote:
> > On Sat, 27 Apr 2019 10:09:51 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> >   
> > > On 27/04/19 07:29, Paolo Bonzini wrote:  
> > > >     
> > > >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > > >>> resample feature, but vfio never gets the unmask notification, so the
> > > >>> device remains with DisINTx set and no further interrupts are
> > > >>> generated.  Do we expect KVM's IRQFD with resampler to work in the
> > > >>> split IRQ mode?  We can certainly hope that "high performance" devices
> > > >>> use MSI or MSI/X, but this would be quite a performance regression with
> > > >>> split mode if our userspace bypass for INTx goes away.  Thanks,    
> > > >>
> > > >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> > > >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> > > >> kvm_notify_acked_gsi(),    
> > > > 
> > > > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > > > able to access vcpu->kvm->arch.vioapic (it's NULL).
> > > > 
> > > > The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> > > > before requesting the exit to userspace.  However I am not sure how QEMU
> > > > sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> > > > the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> > > > only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> > > > correct we need to trigger resampling from hw/intc/ioapic.c.    
> > > 
> > > Actually it's worse: because you're bypassing IOAPIC when raising the
> > > irq, the IOAPIC's remote_irr for example will not be set.  So split
> > > irqchip currently must disable the intx fast path completely.
> > > 
> > > I guess we could also reimplement irqfd and resamplefd in the userspace
> > > IOAPIC, and run the listener in a separate thread (using "-object
> > > iothread" on the command line and AioContext in the code).  
> > 
> > This sounds like a performance regression vs KVM irqchip any way we
> > slice it.  Was this change a mistake?  Without KVM support, the
> > universal support in QEMU kicks in, where device mmaps are disabled
> > when an INTx occurs, forcing trapped access to the device, and we
> > assume that the next access is in response to an interrupt and trigger
> > our own internal EOI and re-enable mmaps.  A timer acts as a
> > catch-all.  Needless to say, this is functional but not fast.  It would
> > be a massive performance regression for devices depending on INTx and
> > previously using the KVM bypass to switch to this.  INTx is largely
> > considered a legacy interrupt, so non-x86 archs don't encounter it as
> > often, S390 even explicitly disables INTx support.  ARM and POWER
> > likely just don't see a lot of these devices, but nearly all devices
> > (except SR-IOV VFs) on x86 expect an INTx fallback mode and some
> > drivers may run the device in INTx for compatibility.  This split
> > irqchip change was likely fine for "enterprise" users concerned only
> > with modern high speed devices, but very much not for device assignment
> > used for compatibility use cases or commodity hardware users.
> > 
> > What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> > as the Q35 default?  I can't see that simply switching to current QEMU
> > handling is a viable option for performance?  What about 4.1?  We could
> > certainly improve EOI support in QEMU, there's essentially no support
> > currently, but it seems like an uphill battle for an iothread based
> > userspace ioapic to ever compare to KVM handling?  Thanks,  
> 
> irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> they?  That would make it impossible to fix this in pc-q35-4.0
> for a 4.0.1 update.

I suppose it would require a pc-q35-4.0.1 machine type :-\  Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-29 15:22                 ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-29 15:22 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, Michael S . Tsirkin, qemu-devel, Peter Xu, Igor Mammedov

On Mon, 29 Apr 2019 11:55:56 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Mon, Apr 29, 2019 at 08:21:06AM -0600, Alex Williamson wrote:
> > On Sat, 27 Apr 2019 10:09:51 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> >   
> > > On 27/04/19 07:29, Paolo Bonzini wrote:  
> > > >     
> > > >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > > >>> resample feature, but vfio never gets the unmask notification, so the
> > > >>> device remains with DisINTx set and no further interrupts are
> > > >>> generated.  Do we expect KVM's IRQFD with resampler to work in the
> > > >>> split IRQ mode?  We can certainly hope that "high performance" devices
> > > >>> use MSI or MSI/X, but this would be quite a performance regression with
> > > >>> split mode if our userspace bypass for INTx goes away.  Thanks,    
> > > >>
> > > >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> > > >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> > > >> kvm_notify_acked_gsi(),    
> > > > 
> > > > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > > > able to access vcpu->kvm->arch.vioapic (it's NULL).
> > > > 
> > > > The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> > > > before requesting the exit to userspace.  However I am not sure how QEMU
> > > > sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> > > > the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> > > > only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> > > > correct we need to trigger resampling from hw/intc/ioapic.c.    
> > > 
> > > Actually it's worse: because you're bypassing IOAPIC when raising the
> > > irq, the IOAPIC's remote_irr for example will not be set.  So split
> > > irqchip currently must disable the intx fast path completely.
> > > 
> > > I guess we could also reimplement irqfd and resamplefd in the userspace
> > > IOAPIC, and run the listener in a separate thread (using "-object
> > > iothread" on the command line and AioContext in the code).  
> > 
> > This sounds like a performance regression vs KVM irqchip any way we
> > slice it.  Was this change a mistake?  Without KVM support, the
> > universal support in QEMU kicks in, where device mmaps are disabled
> > when an INTx occurs, forcing trapped access to the device, and we
> > assume that the next access is in response to an interrupt and trigger
> > our own internal EOI and re-enable mmaps.  A timer acts as a
> > catch-all.  Needless to say, this is functional but not fast.  It would
> > be a massive performance regression for devices depending on INTx and
> > previously using the KVM bypass to switch to this.  INTx is largely
> > considered a legacy interrupt, so non-x86 archs don't encounter it as
> > often, S390 even explicitly disables INTx support.  ARM and POWER
> > likely just don't see a lot of these devices, but nearly all devices
> > (except SR-IOV VFs) on x86 expect an INTx fallback mode and some
> > drivers may run the device in INTx for compatibility.  This split
> > irqchip change was likely fine for "enterprise" users concerned only
> > with modern high speed devices, but very much not for device assignment
> > used for compatibility use cases or commodity hardware users.
> > 
> > What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> > as the Q35 default?  I can't see that simply switching to current QEMU
> > handling is a viable option for performance?  What about 4.1?  We could
> > certainly improve EOI support in QEMU, there's essentially no support
> > currently, but it seems like an uphill battle for an iothread based
> > userspace ioapic to ever compare to KVM handling?  Thanks,  
> 
> irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> they?  That would make it impossible to fix this in pc-q35-4.0
> for a 4.0.1 update.

I suppose it would require a pc-q35-4.0.1 machine type :-\  Thanks,

Alex


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-30 23:01               ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-30 23:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Xu, qemu-devel, Igor Mammedov, Eduardo Habkost,
	Michael S . Tsirkin

On Mon, 29 Apr 2019 08:21:06 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Sat, 27 Apr 2019 10:09:51 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > On 27/04/19 07:29, Paolo Bonzini wrote:  
> > >     
> > >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > >>> resample feature, but vfio never gets the unmask notification, so the
> > >>> device remains with DisINTx set and no further interrupts are
> > >>> generated.  Do we expect KVM's IRQFD with resampler to work in the
> > >>> split IRQ mode?  We can certainly hope that "high performance" devices
> > >>> use MSI or MSI/X, but this would be quite a performance regression with
> > >>> split mode if our userspace bypass for INTx goes away.  Thanks,    
> > >>
> > >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> > >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> > >> kvm_notify_acked_gsi(),    
> > > 
> > > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > > able to access vcpu->kvm->arch.vioapic (it's NULL).
> > > 
> > > The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> > > before requesting the exit to userspace.  However I am not sure how QEMU
> > > sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> > > the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> > > only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> > > correct we need to trigger resampling from hw/intc/ioapic.c.    
> > 
> > Actually it's worse: because you're bypassing IOAPIC when raising the
> > irq, the IOAPIC's remote_irr for example will not be set.  So split
> > irqchip currently must disable the intx fast path completely.
> > 
> > I guess we could also reimplement irqfd and resamplefd in the userspace
> > IOAPIC, and run the listener in a separate thread (using "-object
> > iothread" on the command line and AioContext in the code).  
> 
> This sounds like a performance regression vs KVM irqchip any way we
> slice it.  Was this change a mistake?  Without KVM support, the
> universal support in QEMU kicks in, where device mmaps are disabled
> when an INTx occurs, forcing trapped access to the device, and we
> assume that the next access is in response to an interrupt and trigger
> our own internal EOI and re-enable mmaps.  A timer acts as a
> catch-all.  Needless to say, this is functional but not fast.  It would
> be a massive performance regression for devices depending on INTx and
> previously using the KVM bypass to switch to this.  INTx is largely
> considered a legacy interrupt, so non-x86 archs don't encounter it as
> often, S390 even explicitly disables INTx support.  ARM and POWER
> likely just don't see a lot of these devices, but nearly all devices
> (except SR-IOV VFs) on x86 expect an INTx fallback mode and some
> drivers may run the device in INTx for compatibility.  This split
> irqchip change was likely fine for "enterprise" users concerned only
> with modern high speed devices, but very much not for device assignment
> used for compatibility use cases or commodity hardware users.
> 
> What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> as the Q35 default?  I can't see that simply switching to current QEMU
> handling is a viable option for performance?  What about 4.1?  We could
> certainly improve EOI support in QEMU, there's essentially no support
> currently, but it seems like an uphill battle for an iothread based
> userspace ioapic to ever compare to KVM handling?  Thanks,

Poking at this a bit, we can add kvm_irqchip_is_split() to the set of
things we test for in hw/vfio/pci.c:vfio_intx_enable_kvm() to avoid the
KVM INTx bypass when using split IRQ chip.  This at least avoids paths
that cannot work currently.  We'll fall back to vfio's universal EOI
detection of toggling direct mapped MemoryRegions, which is enough for
simple devices like NICs.  However, it's barely functional with an
NVIDIA GeForce card assigned to a Windows VM, it only takes a graphics
test program to send it over the edge and trigger a TDR.  Even with TDR
disabled, the VM will hang.  I also played with ioapic_eoi_broadcast()
calling directly into vfio code to trigger the EOI (without the mmap
toggling), it's even worse, Windows can't even get to the desktop
before it hangs.

So while I was initially impressed that netperf TCP_RR results for a
gigabit NIC were not that different between in-kernel ioapic and split
irqchip, the graphics results have me again wondering why we made this
change and how userspace handling can get us back to a functional level.

The only way I can get the GPU/Windows configuration usable is to
assert the IRQ, immediately de-assert, and unmask the device all from
vfio_intx_interrupt().  An interrupt intensive graphics benchmark runs
at ~80% of KVM irqchip with about 10% more CPU load with this
experiment (but it actually runs!).  Potentially some devices could
mimic INTx using MSI, like legacy KVM device assignment used to do with
this mode, eliminating the unmask ioctl, but even the legacy driver
noted compatibility issues with that mode and neither is a good
reproduction of how INTx is supposed to work.

Any other insights appreciated, and I really would like to understand
what we've gained with split irqchip and whether it's worth this.
Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-30 23:01               ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-04-30 23:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Igor Mammedov, Michael S . Tsirkin, qemu-devel, Peter Xu,
	Eduardo Habkost

On Mon, 29 Apr 2019 08:21:06 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Sat, 27 Apr 2019 10:09:51 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > On 27/04/19 07:29, Paolo Bonzini wrote:  
> > >     
> > >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > >>> resample feature, but vfio never gets the unmask notification, so the
> > >>> device remains with DisINTx set and no further interrupts are
> > >>> generated.  Do we expect KVM's IRQFD with resampler to work in the
> > >>> split IRQ mode?  We can certainly hope that "high performance" devices
> > >>> use MSI or MSI/X, but this would be quite a performance regression with
> > >>> split mode if our userspace bypass for INTx goes away.  Thanks,    
> > >>
> > >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> > >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> > >> kvm_notify_acked_gsi(),    
> > > 
> > > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > > able to access vcpu->kvm->arch.vioapic (it's NULL).
> > > 
> > > The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
> > > before requesting the exit to userspace.  However I am not sure how QEMU
> > > sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
> > > the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
> > > only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> > > correct we need to trigger resampling from hw/intc/ioapic.c.    
> > 
> > Actually it's worse: because you're bypassing IOAPIC when raising the
> > irq, the IOAPIC's remote_irr for example will not be set.  So split
> > irqchip currently must disable the intx fast path completely.
> > 
> > I guess we could also reimplement irqfd and resamplefd in the userspace
> > IOAPIC, and run the listener in a separate thread (using "-object
> > iothread" on the command line and AioContext in the code).  
> 
> This sounds like a performance regression vs KVM irqchip any way we
> slice it.  Was this change a mistake?  Without KVM support, the
> universal support in QEMU kicks in, where device mmaps are disabled
> when an INTx occurs, forcing trapped access to the device, and we
> assume that the next access is in response to an interrupt and trigger
> our own internal EOI and re-enable mmaps.  A timer acts as a
> catch-all.  Needless to say, this is functional but not fast.  It would
> be a massive performance regression for devices depending on INTx and
> previously using the KVM bypass to switch to this.  INTx is largely
> considered a legacy interrupt, so non-x86 archs don't encounter it as
> often, S390 even explicitly disables INTx support.  ARM and POWER
> likely just don't see a lot of these devices, but nearly all devices
> (except SR-IOV VFs) on x86 expect an INTx fallback mode and some
> drivers may run the device in INTx for compatibility.  This split
> irqchip change was likely fine for "enterprise" users concerned only
> with modern high speed devices, but very much not for device assignment
> used for compatibility use cases or commodity hardware users.
> 
> What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> as the Q35 default?  I can't see that simply switching to current QEMU
> handling is a viable option for performance?  What about 4.1?  We could
> certainly improve EOI support in QEMU, there's essentially no support
> currently, but it seems like an uphill battle for an iothread based
> userspace ioapic to ever compare to KVM handling?  Thanks,

Poking at this a bit, we can add kvm_irqchip_is_split() to the set of
things we test for in hw/vfio/pci.c:vfio_intx_enable_kvm() to avoid the
KVM INTx bypass when using split IRQ chip.  This at least avoids paths
that cannot work currently.  We'll fall back to vfio's universal EOI
detection of toggling direct mapped MemoryRegions, which is enough for
simple devices like NICs.  However, it's barely functional with an
NVIDIA GeForce card assigned to a Windows VM, it only takes a graphics
test program to send it over the edge and trigger a TDR.  Even with TDR
disabled, the VM will hang.  I also played with ioapic_eoi_broadcast()
calling directly into vfio code to trigger the EOI (without the mmap
toggling), it's even worse, Windows can't even get to the desktop
before it hangs.

So while I was initially impressed that netperf TCP_RR results for a
gigabit NIC were not that different between in-kernel ioapic and split
irqchip, the graphics results have me again wondering why we made this
change and how userspace handling can get us back to a functional level.

The only way I can get the GPU/Windows configuration usable is to
assert the IRQ, immediately de-assert, and unmask the device all from
vfio_intx_interrupt().  An interrupt intensive graphics benchmark runs
at ~80% of KVM irqchip with about 10% more CPU load with this
experiment (but it actually runs!).  Potentially some devices could
mimic INTx using MSI, like legacy KVM device assignment used to do with
this mode, eliminating the unmask ioctl, but even the legacy driver
noted compatibility issues with that mode and neither is a good
reproduction of how INTx is supposed to work.

Any other insights appreciated, and I really would like to understand
what we've gained with split irqchip and whether it's worth this.
Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-30 23:09                 ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2019-04-30 23:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Peter Xu, qemu-devel, Igor Mammedov, Eduardo Habkost,
	Michael S . Tsirkin

On 01/05/19 01:01, Alex Williamson wrote:
> Poking at this a bit, we can add kvm_irqchip_is_split() to the set of
> things we test for in hw/vfio/pci.c:vfio_intx_enable_kvm() to avoid the
> KVM INTx bypass when using split IRQ chip.

Yes, this should be enough.

> The only way I can get the GPU/Windows configuration usable is to
> assert the IRQ, immediately de-assert, and unmask the device all from
> vfio_intx_interrupt().  An interrupt intensive graphics benchmark runs
> at ~80% of KVM irqchip with about 10% more CPU load with this
> experiment (but it actually runs!).

Nice.  If you can do it directly from hw/vfio there may be no need to do
more changes to the IOAPIC, and least not immediately.  But that is not
a good emulation of INTX, isn't it?  IIUC, it relies on the
level-triggered interrupt never being masked in the IOAPIC.

> Any other insights appreciated, and I really would like to understand
> what we've gained with split irqchip and whether it's worth this.

We've gained guest interrupt remapping support, we are not relying on
newer kernel versions, and the attack surface from the guest to the
hypervisor is smaller.

Paolo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-04-30 23:09                 ` Paolo Bonzini
  0 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2019-04-30 23:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Igor Mammedov, Michael S . Tsirkin, qemu-devel, Peter Xu,
	Eduardo Habkost

On 01/05/19 01:01, Alex Williamson wrote:
> Poking at this a bit, we can add kvm_irqchip_is_split() to the set of
> things we test for in hw/vfio/pci.c:vfio_intx_enable_kvm() to avoid the
> KVM INTx bypass when using split IRQ chip.

Yes, this should be enough.

> The only way I can get the GPU/Windows configuration usable is to
> assert the IRQ, immediately de-assert, and unmask the device all from
> vfio_intx_interrupt().  An interrupt intensive graphics benchmark runs
> at ~80% of KVM irqchip with about 10% more CPU load with this
> experiment (but it actually runs!).

Nice.  If you can do it directly from hw/vfio there may be no need to do
more changes to the IOAPIC, and least not immediately.  But that is not
a good emulation of INTX, isn't it?  IIUC, it relies on the
level-triggered interrupt never being masked in the IOAPIC.

> Any other insights appreciated, and I really would like to understand
> what we've gained with split irqchip and whether it's worth this.

We've gained guest interrupt remapping support, we are not relying on
newer kernel versions, and the attack surface from the guest to the
hypervisor is smaller.

Paolo



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-01  0:28                   ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-05-01  0:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Xu, qemu-devel, Igor Mammedov, Eduardo Habkost,
	Michael S . Tsirkin

On Wed, 1 May 2019 01:09:48 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 01/05/19 01:01, Alex Williamson wrote:
> > Poking at this a bit, we can add kvm_irqchip_is_split() to the set of
> > things we test for in hw/vfio/pci.c:vfio_intx_enable_kvm() to avoid the
> > KVM INTx bypass when using split IRQ chip.  
> 
> Yes, this should be enough.
> 
> > The only way I can get the GPU/Windows configuration usable is to
> > assert the IRQ, immediately de-assert, and unmask the device all from
> > vfio_intx_interrupt().  An interrupt intensive graphics benchmark runs
> > at ~80% of KVM irqchip with about 10% more CPU load with this
> > experiment (but it actually runs!).  
> 
> Nice.  If you can do it directly from hw/vfio there may be no need to do
> more changes to the IOAPIC, and least not immediately.  But that is not
> a good emulation of INTX, isn't it?  IIUC, it relies on the
> level-triggered interrupt never being masked in the IOAPIC.

Yeah, that's the problem, well one of the problems in addition to the
lower performance and increased overhead, is that it's a rather poor
emulation of INTx.  It matches pci_irq_pulse(), which has been
discouraged from use.  Anything but kernel irqchip makes me nervous for
INTx, but emulating it with a different physical IRQ type definitely
more so.

> > Any other insights appreciated, and I really would like to understand
> > what we've gained with split irqchip and whether it's worth this.  
> 
> We've gained guest interrupt remapping support, we are not relying on
> newer kernel versions, and the attack surface from the guest to the
> hypervisor is smaller.

Interrupt remapping is really only necessary with high vCPU counts or
maybe nested device assignment, seems doubtful that has a larger user
base.  I did re-watch the video of Steve Rutherford's talk from a
couple years ago, but he does re-iterate that pulling the ioapic from
KVM does have drawbacks for general purpose VMs that I thought would
have precluded it from becoming the default for the base QEMU machine
type.  Getting a GeForce card to run with MSI requires (repeated)
regedit'ing on Windows or module options on Linux.  The audio device
can require the same, otherwise we're probably mostly looking at old
devices assigned for compatibility using INTx, NICs or USB devices
would of course more likely use MSI/X for anything reasonably modern.
Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-01  0:28                   ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2019-05-01  0:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Igor Mammedov, Michael S . Tsirkin, qemu-devel, Peter Xu,
	Eduardo Habkost

On Wed, 1 May 2019 01:09:48 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 01/05/19 01:01, Alex Williamson wrote:
> > Poking at this a bit, we can add kvm_irqchip_is_split() to the set of
> > things we test for in hw/vfio/pci.c:vfio_intx_enable_kvm() to avoid the
> > KVM INTx bypass when using split IRQ chip.  
> 
> Yes, this should be enough.
> 
> > The only way I can get the GPU/Windows configuration usable is to
> > assert the IRQ, immediately de-assert, and unmask the device all from
> > vfio_intx_interrupt().  An interrupt intensive graphics benchmark runs
> > at ~80% of KVM irqchip with about 10% more CPU load with this
> > experiment (but it actually runs!).  
> 
> Nice.  If you can do it directly from hw/vfio there may be no need to do
> more changes to the IOAPIC, and least not immediately.  But that is not
> a good emulation of INTX, isn't it?  IIUC, it relies on the
> level-triggered interrupt never being masked in the IOAPIC.

Yeah, that's the problem, well one of the problems in addition to the
lower performance and increased overhead, is that it's a rather poor
emulation of INTx.  It matches pci_irq_pulse(), which has been
discouraged from use.  Anything but kernel irqchip makes me nervous for
INTx, but emulating it with a different physical IRQ type definitely
more so.

> > Any other insights appreciated, and I really would like to understand
> > what we've gained with split irqchip and whether it's worth this.  
> 
> We've gained guest interrupt remapping support, we are not relying on
> newer kernel versions, and the attack surface from the guest to the
> hypervisor is smaller.

Interrupt remapping is really only necessary with high vCPU counts or
maybe nested device assignment, seems doubtful that has a larger user
base.  I did re-watch the video of Steve Rutherford's talk from a
couple years ago, but he does re-iterate that pulling the ioapic from
KVM does have drawbacks for general purpose VMs that I thought would
have precluded it from becoming the default for the base QEMU machine
type.  Getting a GeForce card to run with MSI requires (repeated)
regedit'ing on Windows or module options on Linux.  The audio device
can require the same, otherwise we're probably mostly looking at old
devices assigned for compatibility using INTx, NICs or USB devices
would of course more likely use MSI/X for anything reasonably modern.
Thanks,

Alex

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-03 18:42                   ` Eduardo Habkost
  0 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2019-05-03 18:42 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Paolo Bonzini, Igor Mammedov, Michael S . Tsirkin, qemu-devel, Peter Xu

On Mon, Apr 29, 2019 at 09:22:12AM -0600, Alex Williamson wrote:
[...]
> > > What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> > > as the Q35 default?  I can't see that simply switching to current QEMU
> > > handling is a viable option for performance?  What about 4.1?  We could
> > > certainly improve EOI support in QEMU, there's essentially no support
> > > currently, but it seems like an uphill battle for an iothread based
> > > userspace ioapic to ever compare to KVM handling?  Thanks,  
> > 
> > irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> > they?  That would make it impossible to fix this in pc-q35-4.0
> > for a 4.0.1 update.
> 
> I suppose it would require a pc-q35-4.0.1 machine type :-\  Thanks,

I wonder if it's possible to untangle this and make the irqchip
option stop affecting guest ABI on 4.1+ machine-types?  This way
QEMU could choose smarter defaults in the future without the
compatibility code hassle.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-03 18:42                   ` Eduardo Habkost
  0 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2019-05-03 18:42 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Paolo Bonzini, Michael S . Tsirkin, qemu-devel, Peter Xu, Igor Mammedov

On Mon, Apr 29, 2019 at 09:22:12AM -0600, Alex Williamson wrote:
[...]
> > > What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> > > as the Q35 default?  I can't see that simply switching to current QEMU
> > > handling is a viable option for performance?  What about 4.1?  We could
> > > certainly improve EOI support in QEMU, there's essentially no support
> > > currently, but it seems like an uphill battle for an iothread based
> > > userspace ioapic to ever compare to KVM handling?  Thanks,  
> > 
> > irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> > they?  That would make it impossible to fix this in pc-q35-4.0
> > for a 4.0.1 update.
> 
> I suppose it would require a pc-q35-4.0.1 machine type :-\  Thanks,

I wonder if it's possible to untangle this and make the irqchip
option stop affecting guest ABI on 4.1+ machine-types?  This way
QEMU could choose smarter defaults in the future without the
compatibility code hassle.

-- 
Eduardo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-03 20:00                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-05-03 20:00 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Alex Williamson, Paolo Bonzini, Igor Mammedov, qemu-devel, Peter Xu

On Mon, Apr 29, 2019 at 11:55:56AM -0300, Eduardo Habkost wrote:
> irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> they?

Can you remind me why they aren't?

> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-03 20:00                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-05-03 20:00 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, Alex Williamson, qemu-devel, Peter Xu, Igor Mammedov

On Mon, Apr 29, 2019 at 11:55:56AM -0300, Eduardo Habkost wrote:
> irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> they?

Can you remind me why they aren't?

> -- 
> Eduardo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-03 20:23                   ` Eduardo Habkost
  0 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2019-05-03 20:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Paolo Bonzini, Igor Mammedov, qemu-devel, Peter Xu

On Fri, May 03, 2019 at 04:00:33PM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 29, 2019 at 11:55:56AM -0300, Eduardo Habkost wrote:
> > irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> > they?
> 
> Can you remind me why they aren't?

We have the code introduced by patch 3/3 from this series, but I
don't know if it's the only difference:

hw/i386/x86-iommu.c=static void x86_iommu_realize(DeviceState *dev, Error **errp)
[...]
hw/i386/x86-iommu.c:    bool irq_all_kernel = kvm_irqchip_in_kernel() && !kvm_irqchip_is_split();
[...]
hw/i386/x86-iommu.c-    /* If the user didn't specify IR, choose a default value for it */
hw/i386/x86-iommu.c-    if (x86_iommu->intr_supported == ON_OFF_AUTO_AUTO) {
hw/i386/x86-iommu.c-        x86_iommu->intr_supported = irq_all_kernel ?
hw/i386/x86-iommu.c-            ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
hw/i386/x86-iommu.c-    }

-- 
Eduardo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-03 20:23                   ` Eduardo Habkost
  0 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2019-05-03 20:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Alex Williamson, qemu-devel, Peter Xu, Igor Mammedov

On Fri, May 03, 2019 at 04:00:33PM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 29, 2019 at 11:55:56AM -0300, Eduardo Habkost wrote:
> > irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> > they?
> 
> Can you remind me why they aren't?

We have the code introduced by patch 3/3 from this series, but I
don't know if it's the only difference:

hw/i386/x86-iommu.c=static void x86_iommu_realize(DeviceState *dev, Error **errp)
[...]
hw/i386/x86-iommu.c:    bool irq_all_kernel = kvm_irqchip_in_kernel() && !kvm_irqchip_is_split();
[...]
hw/i386/x86-iommu.c-    /* If the user didn't specify IR, choose a default value for it */
hw/i386/x86-iommu.c-    if (x86_iommu->intr_supported == ON_OFF_AUTO_AUTO) {
hw/i386/x86-iommu.c-        x86_iommu->intr_supported = irq_all_kernel ?
hw/i386/x86-iommu.c-            ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
hw/i386/x86-iommu.c-    }

-- 
Eduardo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-05  9:06                     ` Peter Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Xu @ 2019-05-05  9:06 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Alex Williamson, Paolo Bonzini, Igor Mammedov,
	Michael S . Tsirkin, qemu-devel

On Fri, May 03, 2019 at 03:42:06PM -0300, Eduardo Habkost wrote:
> On Mon, Apr 29, 2019 at 09:22:12AM -0600, Alex Williamson wrote:
> [...]
> > > > What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> > > > as the Q35 default?  I can't see that simply switching to current QEMU
> > > > handling is a viable option for performance?  What about 4.1?  We could
> > > > certainly improve EOI support in QEMU, there's essentially no support
> > > > currently, but it seems like an uphill battle for an iothread based
> > > > userspace ioapic to ever compare to KVM handling?  Thanks,  
> > > 
> > > irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> > > they?  That would make it impossible to fix this in pc-q35-4.0
> > > for a 4.0.1 update.
> > 
> > I suppose it would require a pc-q35-4.0.1 machine type :-\  Thanks,
> 
> I wonder if it's possible to untangle this and make the irqchip
> option stop affecting guest ABI on 4.1+ machine-types?  This way
> QEMU could choose smarter defaults in the future without the
> compatibility code hassle.

Hi, Eduardo,

Do you mean to enable IOMMU IR for kernel-irqchip=on?  If so, I would
say it's not trivial...  The major issue is that we probably need to
explicitly kick QEMU for every kernel IOAPIC setups since QEMU is the
only one who knows everything about interrupt remapping, while KVM
don't even have such a mechanism so far.

I'm thinking whether we should try to fix the functional problem first
as proposed by Alex?  The problem is even if we switch the default
mode for Q35 the user could still specify that in the command line and
I feel like we'd better fix the functional issue first before we
consider the possible performance regression?  The worst case I
thought of is with the fix we offer a good QEMU 4.1/4.0.1 for users (I
believe in most cases for individual users since this issue seems to
not affect most enterprise users and with modern hardwares) then we
also tell people to explicitly enable kernel-irqchip=on to avoid the
potential performance degradation.

(Sorry for the late chim-in, and of course sorry for breaking VFIO
 from the very beginning...)

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
@ 2019-05-05  9:06                     ` Peter Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Xu @ 2019-05-05  9:06 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, Alex Williamson, Michael S . Tsirkin, qemu-devel,
	Igor Mammedov

On Fri, May 03, 2019 at 03:42:06PM -0300, Eduardo Habkost wrote:
> On Mon, Apr 29, 2019 at 09:22:12AM -0600, Alex Williamson wrote:
> [...]
> > > > What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> > > > as the Q35 default?  I can't see that simply switching to current QEMU
> > > > handling is a viable option for performance?  What about 4.1?  We could
> > > > certainly improve EOI support in QEMU, there's essentially no support
> > > > currently, but it seems like an uphill battle for an iothread based
> > > > userspace ioapic to ever compare to KVM handling?  Thanks,  
> > > 
> > > irqchip=split and irqchip=kernel aren't guest ABI compatible, are
> > > they?  That would make it impossible to fix this in pc-q35-4.0
> > > for a 4.0.1 update.
> > 
> > I suppose it would require a pc-q35-4.0.1 machine type :-\  Thanks,
> 
> I wonder if it's possible to untangle this and make the irqchip
> option stop affecting guest ABI on 4.1+ machine-types?  This way
> QEMU could choose smarter defaults in the future without the
> compatibility code hassle.

Hi, Eduardo,

Do you mean to enable IOMMU IR for kernel-irqchip=on?  If so, I would
say it's not trivial...  The major issue is that we probably need to
explicitly kick QEMU for every kernel IOAPIC setups since QEMU is the
only one who knows everything about interrupt remapping, while KVM
don't even have such a mechanism so far.

I'm thinking whether we should try to fix the functional problem first
as proposed by Alex?  The problem is even if we switch the default
mode for Q35 the user could still specify that in the command line and
I feel like we'd better fix the functional issue first before we
consider the possible performance regression?  The worst case I
thought of is with the fix we offer a good QEMU 4.1/4.0.1 for users (I
believe in most cases for individual users since this issue seems to
not affect most enterprise users and with modern hardwares) then we
also tell people to explicitly enable kernel-irqchip=on to avoid the
potential performance degradation.

(Sorry for the late chim-in, and of course sorry for breaking VFIO
 from the very beginning...)

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
  2019-05-05  9:06                     ` Peter Xu
  (?)
@ 2019-05-06 16:13                     ` Paolo Bonzini
  2019-05-06 21:16                       ` Eduardo Habkost
  -1 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2019-05-06 16:13 UTC (permalink / raw)
  To: Peter Xu, Eduardo Habkost
  Cc: Igor Mammedov, Alex Williamson, qemu-devel, Michael S . Tsirkin

On 05/05/19 04:06, Peter Xu wrote:
>> I wonder if it's possible to untangle this and make the irqchip
>> option stop affecting guest ABI on 4.1+ machine-types?  This way
>> QEMU could choose smarter defaults in the future without the
>> compatibility code hassle.
> Hi, Eduardo,
> 
> Do you mean to enable IOMMU IR for kernel-irqchip=on?  If so, I would
> say it's not trivial...  The major issue is that we probably need to
> explicitly kick QEMU for every kernel IOAPIC setups since QEMU is the
> only one who knows everything about interrupt remapping, while KVM
> don't even have such a mechanism so far.

Right, it's not easy and it would be anyway possible only on kernels.
There would have to be a mechanism to setup IOAPIC->MSI routes, similar
to irqchip=split, and as Peter mentions an MMIO exit on writes to the
routing table.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
  2019-05-03 20:00                 ` Michael S. Tsirkin
  (?)
  (?)
@ 2019-05-06 16:17                 ` Paolo Bonzini
  -1 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2019-05-06 16:17 UTC (permalink / raw)
  To: Michael S. Tsirkin, Eduardo Habkost
  Cc: Igor Mammedov, Alex Williamson, qemu-devel, Peter Xu

On 03/05/19 15:00, Michael S. Tsirkin wrote:
> On Mon, Apr 29, 2019 at 11:55:56AM -0300, Eduardo Habkost wrote:
>> irqchip=split and irqchip=kernel aren't guest ABI compatible, are
>> they?
> 
> Can you remind me why they aren't?

They are compatible if the userspace IOAPIC is configured with "-global
ioapic.version=0x11".  The userspace IOAPIC in addition supports version
0x20, which adds the EOI register (a requirement for interrupt remapping
but not necessary outside that case), and makes it the default.

Paolo


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
  2019-05-06 16:13                     ` Paolo Bonzini
@ 2019-05-06 21:16                       ` Eduardo Habkost
  0 siblings, 0 replies; 38+ messages in thread
From: Eduardo Habkost @ 2019-05-06 21:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Igor Mammedov, Alex Williamson, qemu-devel, Peter Xu,
	Michael S . Tsirkin

On Mon, May 06, 2019 at 11:13:28AM -0500, Paolo Bonzini wrote:
> On 05/05/19 04:06, Peter Xu wrote:
> >> I wonder if it's possible to untangle this and make the irqchip
> >> option stop affecting guest ABI on 4.1+ machine-types?  This way
> >> QEMU could choose smarter defaults in the future without the
> >> compatibility code hassle.
> > Hi, Eduardo,
> > 
> > Do you mean to enable IOMMU IR for kernel-irqchip=on?  If so, I would
> > say it's not trivial...  The major issue is that we probably need to
> > explicitly kick QEMU for every kernel IOAPIC setups since QEMU is the
> > only one who knows everything about interrupt remapping, while KVM
> > don't even have such a mechanism so far.
> 
> Right, it's not easy and it would be anyway possible only on kernels.
> There would have to be a mechanism to setup IOAPIC->MSI routes, similar
> to irqchip=split, and as Peter mentions an MMIO exit on writes to the
> routing table.

I don't mean we necessarily should enable IR for
kernel-irqchip=on too.  I'd just prefer the default setting to
not depend on the kernel-irqchip option.

x86-iommu could either have intremap=on as the default (and
refuse to run with kernel-irqchip=on without explicit
intremap=off), or simply default to intremap=off.

But as Paolo indicated elsewhere, this is not the only guest ABI
difference between "on" and "split".  Probably it's not worth the
hassle to try to to untangle this.

-- 
Eduardo


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2019-05-06 21:17 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-20  5:40 [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Peter Xu
2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default Peter Xu
2018-12-20 12:13   ` Eduardo Habkost
2019-04-26 19:27   ` Alex Williamson
2019-04-26 19:27     ` Alex Williamson
2019-04-26 21:02     ` Alex Williamson
2019-04-26 21:02       ` Alex Williamson
2019-04-27  5:29       ` Paolo Bonzini
2019-04-27  5:29         ` Paolo Bonzini
2019-04-27  8:09         ` Paolo Bonzini
2019-04-27  8:09           ` Paolo Bonzini
2019-04-29 14:21           ` Alex Williamson
2019-04-29 14:21             ` Alex Williamson
2019-04-29 14:55             ` Eduardo Habkost
2019-04-29 14:55               ` Eduardo Habkost
2019-04-29 15:22               ` Alex Williamson
2019-04-29 15:22                 ` Alex Williamson
2019-05-03 18:42                 ` Eduardo Habkost
2019-05-03 18:42                   ` Eduardo Habkost
2019-05-05  9:06                   ` Peter Xu
2019-05-05  9:06                     ` Peter Xu
2019-05-06 16:13                     ` Paolo Bonzini
2019-05-06 21:16                       ` Eduardo Habkost
2019-05-03 20:00               ` Michael S. Tsirkin
2019-05-03 20:00                 ` Michael S. Tsirkin
2019-05-03 20:23                 ` Eduardo Habkost
2019-05-03 20:23                   ` Eduardo Habkost
2019-05-06 16:17                 ` Paolo Bonzini
2019-04-30 23:01             ` Alex Williamson
2019-04-30 23:01               ` Alex Williamson
2019-04-30 23:09               ` Paolo Bonzini
2019-04-30 23:09                 ` Paolo Bonzini
2019-05-01  0:28                 ` Alex Williamson
2019-05-01  0:28                   ` Alex Williamson
2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 2/3] x86-iommu: switch intr_supported to OnOffAuto type Peter Xu
2018-12-20  5:40 ` [Qemu-devel] [PATCH v2 3/3] x86-iommu: turn on IR by default if proper Peter Xu
2018-12-20 10:00 ` [Qemu-devel] [PATCH v2 0/3] q35: change defaults for kernel irqchip and IR Paolo Bonzini
2018-12-20 10:16   ` Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.