* [RFC][PATCH 00/45] qemu-kvm: MSI layer rework for in-kernel irqchip support
@ 2011-10-17  9:27 ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel,
	Alexander Graf, Gerd Hoffmann, Isaku Yamahata

As previously indicated, I was working for quite a while on a major
refactoring of the MSI "additions" we have in qemu-kvm to support
in-kernel irqchip, vhost and device assignment. This is now the outcome.

I'm quite happy with it, things are still working (apparently), and the
invasiveness of KVM hooks into the MSI layer is significantly reduced.
Moreover, I was able to port the device assignment code onto the generic
MSI support, reducing the size of that file a bit further.

Some further highlights:
 - fix for HPET MSI support with in-kernel irqchip
 - fully configurable MSI-X (allows 1:1 mapping for assigned devices)
 - refactored KVM core API for device assignment and IRQ routing

I'm sending the whole series in one chunk so that you can see what the
result will be. It's RFC as I bet that there are regressions included
and maybe still room left for improvements. Once all is fine (the series
can be broken up into multiple chunks for the merge), I would suggest
patching qemu-kvm first and then starting to port things over to upstream.

Comments & review welcome.

CC: Alexander Graf <agraf@suse.de>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Isaku Yamahata <yamahata@valinux.co.jp>

Jan Kiszka (45):
  msi: Guard msi/msix_write_config with msi_present
  msi: Guard msi_reset with msi_present
  msi: Use msi/msix_present more consistently
  msi: Invoke msi/msix_reset from PCI core
  msi: Invoke msi/msix_write_config from PCI core
  msix: Prevent bogus mask updates on MMIO accesses
  msi: Generalize msix_supported to msi_supported
  Introduce MSIMessage structure
  msi: Factor out msi_message_from_vector
  msix: Factor out msix_message_from_vector
  msi: Factor out delivery hook
  msi: Introduce MSIRoutingCache
  hpet: Use msi_deliver
  qemu-kvm: Drop useless kvm_clear_gsi_routes
  qemu-kvm: Drop unused kvm_del_irq_route
  qemu-kvm: Use MSIMessage and MSIRoutingCache
  qemu-kvm: Track MSIRoutingCache in KVM routing table
  qemu-kvm: Hook into MSI delivery at APIC level
  qemu-kvm: Factor out kvm_msi_irqfd_set
  qemu-kvm: msix: Only invoke msix_handle_mask_update on changes
  qemu-kvm: msix: Don't fire notifier spuriously on set/unset
  qemu-kvm: msix: Fire mask notifier on global mask changes
  qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
  qemu-kvm: msix: Don't handle mask updated while disabled
  qemu-kvm: Update MSI cache on kvm_msi_irqfd_set
  qemu-kvm: Use g_realloc for irq_routes extension
  qemu-kvm: Lazily update MSI caches
  qemu-kvm: msix: Drop tracking of used vectors
  pci-assign: Drop kvm_assigned_irq::host_irq initialization
  pci-assign: Rename assign_irq to assign_intx
  qemu-kvm: Refactor kvm_deassign_irq to kvm_device_irq_deassign
  pci-assign: Factor out deassign_irq
  qemu-kvm: Factor out kvm_device_intx_assign
  qemu-kvm: Factor out kvm_device_msi_assign
  pci-assign: Polish assigned_dev_update_msix_mmio
  qemu-kvm: Factor out kvm_device_msix_* services
  qemu-kvm: Clean up irqrouting API
  msi: Implement config notifiers for legacy MSI
  pci-assign: Use generic MSI support
  qemu-kvm: msix: Drop check for preexisting cap from msix_add_config
  msix: Drop unused msix_bar_size
  msix: Introduce msix_init_simple
  msix: Allow to customize capability on init
  pci-assign: Use generic MSI-X support
  pci-assign: Fix coding style issues

 hw/apic.c               |   28 ++-
 hw/apic.h               |    1 +
 hw/device-assignment.c  |  751 +++++++++++++++++------------------------------
 hw/device-assignment.h  |   29 +--
 hw/hpet.c               |    7 +-
 hw/ide/ich.c            |    8 -
 hw/intel-hda.c          |   12 -
 hw/ioh3420.c            |    3 +-
 hw/ivshmem.c            |   22 +--
 hw/msi.c                |  329 +++++++++++++---------
 hw/msi.h                |   30 ++-
 hw/msix.c               |  626 ++++++++++++++++++----------------------
 hw/msix.h               |   29 +-
 hw/pc.c                 |   15 +-
 hw/pci.c                |    9 +-
 hw/pci.h                |   34 ++-
 hw/pci_bridge.c         |    4 +
 hw/virtio-pci.c         |   75 ++---
 hw/virtio-pci.h         |    1 -
 hw/xio3130_downstream.c |    3 +-
 hw/xio3130_upstream.c   |    2 -
 kvm-all.c               |    1 +
 kvm-stub.c              |   23 +--
 kvm.h                   |   17 +-
 qemu-common.h           |    2 +
 qemu-kvm-x86.c          |    1 -
 qemu-kvm.c              |  281 ++++++++++++------
 qemu-kvm.h              |   81 +-----
 28 files changed, 1110 insertions(+), 1314 deletions(-)

-- 
1.7.3.4


* [RFC][PATCH 01/45] msi: Guard msi/msix_write_config with msi_present
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

Return early from msi/msix_write_config if the capability is not present.
This allows removing checks at the call site when MSI is optional.
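
For illustration, the guard can be sketched outside of QEMU as follows.
This is a minimal, self-contained sketch, not the actual hw/msi.c code:
Dev, CAP_MSI, MSI_CAP_LEN, overlap() and the dev_* functions are
hypothetical stand-ins for PCIDevice, QEMU_PCI_CAP_MSI, msi_cap_sizeof(),
ranges_overlap() and msi_present()/msi_write_config().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's definitions; names are assumptions. */
#define CAP_MSI     0x1u
#define MSI_CAP_LEN 10u

typedef struct {
    uint32_t cap_present;   /* bitmask of capabilities the device exposes */
    uint8_t  msi_cap;       /* config-space offset of the MSI capability */
    unsigned updates;       /* how often the MSI state was re-evaluated */
} Dev;

static bool dev_msi_present(const Dev *dev)
{
    return (dev->cap_present & CAP_MSI) != 0;
}

/* True if [a, a + alen) and [b, b + blen) share at least one byte. */
static bool overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
    return a < b + blen && b < a + alen;
}

static void dev_msi_write_config(Dev *dev, uint32_t addr, int len)
{
    /* Presence is tested first, so msi_cap is only trusted when valid. */
    if (!dev_msi_present(dev) ||
        !overlap(addr, (uint32_t)len, dev->msi_cap, MSI_CAP_LEN)) {
        return;
    }
    dev->updates++;
}
```

In this sketch a device without the capability has msi_cap == 0, so a
write to the start of config space would pass the range check alone; the
presence test is what makes an unconditional call safe.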

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msi.c  |    3 ++-
 hw/msix.c |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 56a4698..bbc9cd7 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -378,7 +378,8 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
     unsigned int vector;
     uint32_t pending;
 
-    if (!ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
+    if (!msi_present(dev) ||
+        !ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
         return;
     }
 
diff --git a/hw/msix.c b/hw/msix.c
index 60d6d1e..ebd5aee 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -240,7 +240,7 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
     unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET;
     int vector;
 
-    if (!range_covers_byte(addr, len, enable_pos)) {
+    if (!msix_present(dev) || !range_covers_byte(addr, len, enable_pos)) {
         return;
     }
 
-- 
1.7.3.4

* [RFC][PATCH 02/45] msi: Guard msi_reset with msi_present
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msi.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index bbc9cd7..5dbcccc 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -288,6 +288,9 @@ void msi_reset(PCIDevice *dev)
     uint16_t flags;
     bool msi64bit;
 
+    if (!msi_present(dev)) {
+        return;
+    }
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msi_free(dev);
     }
-- 
1.7.3.4


* [RFC][PATCH 03/45] msi: Use msi/msix_present more consistently
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Replace some open-coded msi/msix_present checks and drop redundant
msix_supported tests (present implies supported).
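
The parenthesized claim can be pictured like this, assuming (as a
simplification) that the capability bit is only ever set by an init
function that first checks the global supported flag. All names below
are hypothetical stand-ins, not the hw/msix.c definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit, standing in for QEMU_PCI_CAP_MSIX. */
#define CAP_MSIX 0x2u

typedef struct {
    uint32_t cap_present;   /* bitmask of capabilities the device exposes */
} Dev;

/* Global switch, standing in for msix_supported. */
static bool msix_supported_flag;

/* The only place in this sketch that can set the capability bit. */
static int dev_msix_init(Dev *dev)
{
    if (!msix_supported_flag) {
        return -1;          /* capability is never added */
    }
    dev->cap_present |= CAP_MSIX;
    return 0;
}

/* Because of the check above, "present" can only be true if "supported". */
static bool dev_msix_present(const Dev *dev)
{
    return (dev->cap_present & CAP_MSIX) != 0;
}
```

Under that assumption, a separate test of the global flag before the
presence test cannot change the outcome, which is why it can go.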

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msi.c  |    2 +-
 hw/msix.c |   20 ++++++++------------
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 5dbcccc..b117f69 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -266,7 +266,7 @@ void msi_uninit(struct PCIDevice *dev)
     uint16_t flags;
     uint8_t cap_size;
 
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSI)) {
+    if (!msi_present(dev)) {
         return;
     }
     flags = pci_get_word(dev->config + msi_flags_off(dev));
diff --git a/hw/msix.c b/hw/msix.c
index ebd5aee..2c4de21 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -383,8 +383,9 @@ static void msix_free_irq_entries(PCIDevice *dev)
 /* Clean up resources for the device. */
 int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
 {
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+    if (!msix_present(dev)) {
         return 0;
+    }
     pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
     dev->msix_cap = 0;
     msix_free_irq_entries(dev);
@@ -405,11 +406,7 @@ void msix_save(PCIDevice *dev, QEMUFile *f)
 {
     unsigned n = dev->msix_entries_nr;
 
-    if (!msix_supported) {
-        return;
-    }
-
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
+    if (!msix_present(dev)) {
         return;
     }
     qemu_put_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
@@ -421,10 +418,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
 {
     unsigned n = dev->msix_entries_nr;
 
-    if (!msix_supported)
-        return;
-
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) {
+    if (!msix_present(dev)) {
         return;
     }
 
@@ -480,8 +474,9 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 
 void msix_reset(PCIDevice *dev)
 {
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+    if (!msix_present(dev)) {
         return;
+    }
     msix_free_irq_entries(dev);
     dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
 	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
@@ -531,8 +526,9 @@ void msix_vector_unuse(PCIDevice *dev, unsigned vector)
 
 void msix_unuse_all_vectors(PCIDevice *dev)
 {
-    if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+    if (!msix_present(dev)) {
         return;
+    }
     msix_free_irq_entries(dev);
 }
 
-- 
1.7.3.4


* [RFC][PATCH 04/45] msi: Invoke msi/msix_reset from PCI core
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel,
	Alexander Graf, Gerd Hoffmann, Isaku Yamahata

There is no point in pushing this burden onto the devices; they may
well forget to call the reset functions (as intel-hda and ahci
currently do). Instead, the reset functions are now called from
pci_device_reset and pci_bridge_reset. They do nothing if MSI/MSI-X is
not in use.
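
The idea can be sketched in isolation as below. This is a hedged
illustration only: Dev, core_device_reset(), dev_msi_reset() and
forgetful_reset() are made-up names that mimic the roles of PCIDevice,
pci_device_reset(), msi_reset() and a device reset handler.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Dev Dev;
struct Dev {
    bool has_msi;               /* capability present on this device */
    bool msi_enabled;           /* guest-programmed enable state */
    void (*reset)(Dev *dev);    /* optional device-specific reset hook */
    int regs;                   /* some device-private register state */
};

/* Does nothing when the capability is absent, so it is always safe. */
static void dev_msi_reset(Dev *dev)
{
    if (!dev->has_msi) {
        return;
    }
    dev->msi_enabled = false;
}

/* Core reset path: device hook first, then the MSI state, every time. */
static void core_device_reset(Dev *dev)
{
    if (dev->reset) {
        dev->reset(dev);
    }
    dev_msi_reset(dev);
}

/* A device hook that only knows about its own registers. */
static void forgetful_reset(Dev *dev)
{
    dev->regs = 0;
}
```

Even though forgetful_reset() never mentions MSI, a device using it
still gets its MSI state cleared, because the core path does it.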

CC: Alexander Graf <agraf@suse.de>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/ioh3420.c            |    2 +-
 hw/pci.c                |    4 ++++
 hw/pci_bridge.c         |    4 ++++
 hw/virtio-pci.c         |    1 -
 hw/xio3130_downstream.c |    2 +-
 hw/xio3130_upstream.c   |    1 -
 6 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/ioh3420.c b/hw/ioh3420.c
index a6bfbb9..fc2fb3b 100644
--- a/hw/ioh3420.c
+++ b/hw/ioh3420.c
@@ -81,7 +81,7 @@ static void ioh3420_write_config(PCIDevice *d,
 static void ioh3420_reset(DeviceState *qdev)
 {
     PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
-    msi_reset(d);
+
     ioh3420_aer_vector_update(d);
     pcie_cap_root_reset(d);
     pcie_cap_deverr_reset(d);
diff --git a/hw/pci.c b/hw/pci.c
index 3c5d642..933d49e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -35,6 +35,7 @@
 #include "qemu-objects.h"
 #include "range.h"
 #include "msi.h"
+#include "msix.h"
 
 //#define DEBUG_PCI
 #ifdef DEBUG_PCI
@@ -195,6 +196,9 @@ void pci_device_reset(PCIDevice *dev)
         }
     }
     pci_update_mappings(dev);
+
+    msi_reset(dev);
+    msix_reset(dev);
 }
 
 /*
diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index b6287cd..e03c871 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -32,6 +32,8 @@
 #include "pci_bridge.h"
 #include "pci_internals.h"
 #include "range.h"
+#include "msi.h"
+#include "msix.h"
 
 /* PCI bridge subsystem vendor ID helper functions */
 #define PCI_SSVID_SIZEOF        8
@@ -296,6 +298,8 @@ void pci_bridge_reset(DeviceState *qdev)
 {
     PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev);
     pci_bridge_reset_reg(dev);
+    msi_reset(dev);
+    msix_reset(dev);
 }
 
 /* default qdev initialization function for PCI-to-PCI bridge */
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 6582099..3fb250f 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -276,7 +276,6 @@ static void virtio_pci_reset(DeviceState *d)
     VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev);
     virtio_pci_stop_ioeventfd(proxy);
     virtio_reset(proxy->vdev);
-    msix_reset(&proxy->pci_dev);
     proxy->flags &= ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
 }
 
diff --git a/hw/xio3130_downstream.c b/hw/xio3130_downstream.c
index d3c387d..464eefa 100644
--- a/hw/xio3130_downstream.c
+++ b/hw/xio3130_downstream.c
@@ -48,7 +48,7 @@ static void xio3130_downstream_write_config(PCIDevice *d, uint32_t address,
 static void xio3130_downstream_reset(DeviceState *qdev)
 {
     PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
-    msi_reset(d);
+
     pcie_cap_deverr_reset(d);
     pcie_cap_slot_reset(d);
     pcie_cap_ari_reset(d);
diff --git a/hw/xio3130_upstream.c b/hw/xio3130_upstream.c
index 8283695..0d8d254 100644
--- a/hw/xio3130_upstream.c
+++ b/hw/xio3130_upstream.c
@@ -47,7 +47,6 @@ static void xio3130_upstream_write_config(PCIDevice *d, uint32_t address,
 static void xio3130_upstream_reset(DeviceState *qdev)
 {
     PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev);
-    msi_reset(d);
     pci_bridge_reset(qdev);
     pcie_cap_deverr_reset(d);
 }
-- 
1.7.3.4


* [RFC][PATCH 05/45] msi: Invoke msi/msix_write_config from PCI core
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel,
	Alexander Graf, Gerd Hoffmann, Isaku Yamahata

These functions, too, are better invoked by the core than by each and
every device. This allows dropping the config_write callbacks from ich
and intel-hda.
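
Why the callbacks become droppable can be sketched as follows, assuming
that a device without its own config_write callback falls back to the
core default. The names (Dev, core_write_config(),
core_default_write_config(), dev_msi*_write_config()) are hypothetical
and only mirror the roles of the functions touched by this patch.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Dev Dev;
typedef void (*ConfigWriteFn)(Dev *dev, uint32_t addr, uint32_t val, int len);

struct Dev {
    ConfigWriteFn config_write; /* NULL means "use the core default" */
    unsigned msi_calls;         /* counts MSI hook invocations */
    unsigned msix_calls;        /* counts MSI-X hook invocations */
};

static void dev_msi_write_config(Dev *dev, uint32_t addr, uint32_t val, int len)
{
    (void)addr; (void)val; (void)len;
    dev->msi_calls++;
}

static void dev_msix_write_config(Dev *dev, uint32_t addr, uint32_t val, int len)
{
    (void)addr; (void)val; (void)len;
    dev->msix_calls++;
}

/* Core default: generic handling, then both MSI hooks unconditionally. */
static void core_default_write_config(Dev *dev, uint32_t addr, uint32_t val,
                                      int len)
{
    /* ...the generic config-space update would happen here... */
    dev_msi_write_config(dev, addr, val, len);
    dev_msix_write_config(dev, addr, val, len);
}

/* Dispatch: a device-specific callback wins, otherwise the default runs. */
static void core_write_config(Dev *dev, uint32_t addr, uint32_t val, int len)
{
    ConfigWriteFn fn = dev->config_write ? dev->config_write
                                         : core_default_write_config;
    fn(dev, addr, val, len);
}
```

A device whose callback did nothing but chain the default and the MSI
hook can then leave config_write unset and behave the same.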

CC: Alexander Graf <agraf@suse.de>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/ide/ich.c            |    8 --------
 hw/intel-hda.c          |   12 ------------
 hw/ioh3420.c            |    1 -
 hw/msi.c                |    2 +-
 hw/pci.c                |    3 +++
 hw/virtio-pci.c         |    2 --
 hw/xio3130_downstream.c |    1 -
 hw/xio3130_upstream.c   |    1 -
 8 files changed, 4 insertions(+), 26 deletions(-)

diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 3f7510f..a470c01 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -139,13 +139,6 @@ static int pci_ich9_uninit(PCIDevice *dev)
     return 0;
 }
 
-static void pci_ich9_write_config(PCIDevice *pci, uint32_t addr,
-                                  uint32_t val, int len)
-{
-    pci_default_write_config(pci, addr, val, len);
-    msi_write_config(pci, addr, val, len);
-}
-
 static PCIDeviceInfo ich_ahci_info[] = {
     {
         .qdev.name    = "ich9-ahci",
@@ -154,7 +147,6 @@ static PCIDeviceInfo ich_ahci_info[] = {
         .qdev.vmsd    = &vmstate_ahci,
         .init         = pci_ich9_ahci_init,
         .exit         = pci_ich9_uninit,
-        .config_write = pci_ich9_write_config,
         .vendor_id    = PCI_VENDOR_ID_INTEL,
         .device_id    = PCI_DEVICE_ID_INTEL_82801IR,
         .revision     = 0x02,
diff --git a/hw/intel-hda.c b/hw/intel-hda.c
index 4272204..0453039 100644
--- a/hw/intel-hda.c
+++ b/hw/intel-hda.c
@@ -1156,17 +1156,6 @@ static int intel_hda_exit(PCIDevice *pci)
     return 0;
 }
 
-static void intel_hda_write_config(PCIDevice *pci, uint32_t addr,
-                                   uint32_t val, int len)
-{
-    IntelHDAState *d = DO_UPCAST(IntelHDAState, pci, pci);
-
-    pci_default_write_config(pci, addr, val, len);
-    if (d->msi) {
-        msi_write_config(pci, addr, val, len);
-    }
-}
-
 static int intel_hda_post_load(void *opaque, int version)
 {
     IntelHDAState* d = opaque;
@@ -1250,7 +1239,6 @@ static PCIDeviceInfo intel_hda_info = {
     .qdev.reset   = intel_hda_reset,
     .init         = intel_hda_init,
     .exit         = intel_hda_exit,
-    .config_write = intel_hda_write_config,
     .vendor_id    = PCI_VENDOR_ID_INTEL,
     .device_id    = 0x2668,
     .revision     = 1,
diff --git a/hw/ioh3420.c b/hw/ioh3420.c
index fc2fb3b..886ede8 100644
--- a/hw/ioh3420.c
+++ b/hw/ioh3420.c
@@ -71,7 +71,6 @@ static void ioh3420_write_config(PCIDevice *d,
         pci_get_long(d->config + d->exp.aer_cap + PCI_ERR_ROOT_COMMAND);
 
     pci_bridge_write_config(d, address, val, len);
-    msi_write_config(d, address, val, len);
     ioh3420_aer_vector_update(d);
     pcie_cap_slot_write_config(d, address, val, len);
     pcie_aer_write_config(d, address, val, len);
diff --git a/hw/msi.c b/hw/msi.c
index b117f69..c924e38 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -369,7 +369,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
     stl_le_phys(address, data);
 }
 
-/* call this function after updating configs by pci_default_write_config(). */
+/* Normally called by pci_default_write_config(). */
 void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
diff --git a/hw/pci.c b/hw/pci.c
index 933d49e..6673989 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1154,6 +1154,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 
     if (range_covers_byte(addr, l, PCI_COMMAND))
         pci_update_irq_disabled(d, was_irq_disabled);
+
+    msi_write_config(d, addr, val, l);
+    msix_write_config(d, addr, val, l);
 }
 
 /***********************************************************/
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 3fb250f..615295e 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -502,8 +502,6 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
         virtio_set_status(proxy->vdev,
                           proxy->vdev->status & ~VIRTIO_CONFIG_S_DRIVER_OK);
     }
-
-    msix_write_config(pci_dev, address, val, len);
 }
 
 static unsigned virtio_pci_get_features(void *opaque)
diff --git a/hw/xio3130_downstream.c b/hw/xio3130_downstream.c
index 464eefa..8e9117d 100644
--- a/hw/xio3130_downstream.c
+++ b/hw/xio3130_downstream.c
@@ -41,7 +41,6 @@ static void xio3130_downstream_write_config(PCIDevice *d, uint32_t address,
     pci_bridge_write_config(d, address, val, len);
     pcie_cap_flr_write_config(d, address, val, len);
     pcie_cap_slot_write_config(d, address, val, len);
-    msi_write_config(d, address, val, len);
     pcie_aer_write_config(d, address, val, len);
 }
 
diff --git a/hw/xio3130_upstream.c b/hw/xio3130_upstream.c
index 0d8d254..707401e 100644
--- a/hw/xio3130_upstream.c
+++ b/hw/xio3130_upstream.c
@@ -40,7 +40,6 @@ static void xio3130_upstream_write_config(PCIDevice *d, uint32_t address,
 {
     pci_bridge_write_config(d, address, val, len);
     pcie_cap_flr_write_config(d, address, val, len);
-    msi_write_config(d, address, val, len);
     pcie_aer_write_config(d, address, val, len);
 }
 
-- 
1.7.3.4



* [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Only accesses to the MSI-X table must trigger a call to
msix_handle_mask_update or a notifier invocation.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 2c4de21..33cb716 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -264,18 +264,22 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
 {
     PCIDevice *dev = opaque;
     unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
-    int vector = offset / PCI_MSIX_ENTRY_SIZE;
+    unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
     int was_masked = msix_is_masked(dev, vector);
     pci_set_long(dev->msix_table_page + offset, val);
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
     }
-    if (was_masked != msix_is_masked(dev, vector) && dev->msix_mask_notifier) {
-        int r = dev->msix_mask_notifier(dev, vector,
-					msix_is_masked(dev, vector));
-        assert(r >= 0);
+
+    if (vector < dev->msix_entries_nr) {
+        if (was_masked != msix_is_masked(dev, vector) &&
+            dev->msix_mask_notifier) {
+            int r = dev->msix_mask_notifier(dev, vector,
+                                            msix_is_masked(dev, vector));
+            assert(r >= 0);
+        }
+        msix_handle_mask_update(dev, vector);
     }
-    msix_handle_mask_update(dev, vector);
 }
 
 static const MemoryRegionOps msix_mmio_ops = {
-- 
1.7.3.4
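
The bounds check above can be illustrated with a small stand-alone model.
This is only a sketch under stated assumptions, not the QEMU code: every
toy_* name is hypothetical, the page layout (table in the lower half,
16-byte entries, mask bit in the vector control dword at offset 12) follows
the MSI-X definition, and the counter stands in for the mask notifier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   0x1000u  /* table in the lower half, PBA in the upper */
#define TOY_ENTRY_SIZE  16u      /* addr lo, addr hi, data, vector control */
#define TOY_CTRL_OFF    12u      /* vector control dword within an entry */
#define TOY_MASKBIT     0x1u

/* Hypothetical stand-in for the MSI-X state of a PCIDevice. */
struct toy_dev {
    uint8_t  page[TOY_PAGE_SIZE];
    unsigned entries_nr;     /* entries that really belong to the table */
    unsigned notifications;  /* how often a mask change was reported */
};

static uint32_t toy_get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void toy_set_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static int toy_is_masked(const struct toy_dev *d, unsigned vector)
{
    return toy_get_le32(d->page + vector * TOY_ENTRY_SIZE + TOY_CTRL_OFF)
           & TOY_MASKBIT;
}

static void toy_mmio_write(struct toy_dev *d, uint32_t addr, uint32_t val)
{
    unsigned offset = addr & (TOY_PAGE_SIZE - 1) & ~0x3u;
    unsigned vector = offset / TOY_ENTRY_SIZE;
    int was_masked = toy_is_masked(d, vector);

    toy_set_le32(d->page + offset, val);

    /* Only slots inside the table may report a mask change. */
    if (vector < d->entries_nr && was_masked != toy_is_masked(d, vector)) {
        d->notifications++;
    }
}

static unsigned toy_demo(void)
{
    struct toy_dev d;

    memset(&d, 0, sizeof(d));
    d.entries_nr = 4;

    /* Masking vector 0 is a real table access and gets reported. */
    toy_mmio_write(&d, 0 * TOY_ENTRY_SIZE + TOY_CTRL_OFF, TOY_MASKBIT);
    assert(d.notifications == 1);

    /* The same bit pattern written into the PBA half maps to "vector" 128,
     * which is outside the table and must not be reported. */
    toy_mmio_write(&d, 0x800 + TOY_CTRL_OFF, TOY_MASKBIT);
    assert(d.notifications == 1);

    return d.notifications;
}
```

Without the vector < entries_nr test, the second write in toy_demo() would
look like a mask change of a vector that does not exist.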



* [RFC][PATCH 07/45] msi: Generalize msix_supported to msi_supported
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

Rename msix_supported to msi_supported and control MSI and MSI-X
activation this way. That was likely the original intention for this
flag, but MSI support came after MSI-X.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msi.c  |    8 ++++++++
 hw/msi.h  |    2 ++
 hw/msix.c |    8 +++-----
 hw/msix.h |    2 --
 hw/pc.c   |    4 ++--
 5 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index c924e38..2b7b6e3 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -37,6 +37,9 @@
 
 #define PCI_MSI_VECTORS_MAX     32
 
+/* Flag for interrupt controller to declare MSI/MSI-X support */
+bool msi_supported;
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -205,6 +208,11 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
     uint16_t flags;
     uint8_t cap_size;
     int config_offset;
+
+    if (!msi_supported) {
+        return -ENOTSUP;
+    }
+
     MSI_DEV_PRINTF(dev,
                    "init offset: 0x%"PRIx8" vector: %"PRId8
                    " 64bit %d mask %d\n",
diff --git a/hw/msi.h b/hw/msi.h
index 6ff0607..e5e821f 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -24,6 +24,8 @@
 #include "qemu-common.h"
 #include "pci.h"
 
+extern bool msi_supported;
+
 bool msi_enabled(const PCIDevice *dev);
 int msi_init(struct PCIDevice *dev, uint8_t offset,
              unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
diff --git a/hw/msix.c b/hw/msix.c
index 33cb716..04e08e5 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -12,6 +12,7 @@
  */
 
 #include "hw.h"
+#include "msi.h"
 #include "msix.h"
 #include "pci.h"
 #include "range.h"
@@ -32,10 +33,6 @@
 #define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
 #define MSIX_MAX_ENTRIES 32
 
-
-/* Flag for interrupt controller to declare MSI-X support */
-int msix_supported;
-
 /* KVM specific MSIX helpers */
 static void kvm_msix_free(PCIDevice *dev)
 {
@@ -327,8 +324,9 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
               unsigned bar_nr, unsigned bar_size)
 {
     int ret;
+
     /* Nothing to do if MSI is not supported by interrupt controller */
-    if (!msix_supported ||
+    if (!msi_supported ||
         (kvm_enabled() && kvm_irqchip_in_kernel() && !kvm_has_gsi_routing())) {
         return -ENOTSUP;
     }
diff --git a/hw/msix.h b/hw/msix.h
index 189bb3f..a8661e1 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,8 +29,6 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
-extern int msix_supported;
-
 int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func);
 int msix_unset_mask_notifier(PCIDevice *dev);
 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 70e0d08..768a20c 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -36,7 +36,7 @@
 #include "elf.h"
 #include "multiboot.h"
 #include "mc146818rtc.h"
-#include "msix.h"
+#include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
 #include "kvm.h"
@@ -892,7 +892,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
         apic_mapped = 1;
     }
 
-    msix_supported = 1;
+    msi_supported = true;
 
     return dev;
 }
-- 
1.7.3.4
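
The effect of the shared flag can be sketched in isolation. The toy_*
names below are hypothetical; the sketch assumes only what the diff shows,
namely that both init paths fail with -ENOTSUP until the interrupt
controller sets the flag. The additional KVM GSI routing check in
msix_init is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counterpart of msi_supported: set by the interrupt
 * controller model, consulted by both MSI and MSI-X setup. */
static bool toy_msi_supported;

static int toy_msi_init(void)
{
    if (!toy_msi_supported) {
        return -ENOTSUP;
    }
    return 0;
}

static int toy_msix_init(void)
{
    if (!toy_msi_supported) {
        return -ENOTSUP;
    }
    return 0;
}

/* Stands in for apic_init() declaring MSI/MSI-X support. */
static void toy_apic_init(void)
{
    toy_msi_supported = true;
}

static void toy_demo(void)
{
    /* Before the controller is set up, neither capability is offered. */
    assert(toy_msi_init() == -ENOTSUP);
    assert(toy_msix_init() == -ENOTSUP);

    toy_apic_init();

    /* One flag now enables both. */
    assert(toy_msi_init() == 0);
    assert(toy_msix_init() == 0);
}
```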


* [RFC][PATCH 08/45] Introduce MSIMessage structure
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Will be used for generating and distributing MSI messages, both in
emulation mode and under KVM.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msi.h      |    5 +++++
 qemu-common.h |    1 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/msi.h b/hw/msi.h
index e5e821f..22e3932 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -24,6 +24,11 @@
 #include "qemu-common.h"
 #include "pci.h"
 
+struct MSIMessage {
+    uint64_t address;
+    uint32_t data;
+};
+
 extern bool msi_supported;
 
 bool msi_enabled(const PCIDevice *dev);
diff --git a/qemu-common.h b/qemu-common.h
index 5e87bdf..d3901bd 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -15,6 +15,7 @@ typedef struct QEMUTimer QEMUTimer;
 typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
 typedef struct DeviceState DeviceState;
+typedef struct MSIMessage MSIMessage;
 
 struct Monitor;
 typedef struct Monitor Monitor;
-- 
1.7.3.4



* [RFC][PATCH 09/45] msi: Factor out msi_message_from_vector
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

This helper will also be used by the upcoming config notifier.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msi.c |   43 +++++++++++++++++++++++++------------------
 1 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 2b7b6e3..3c7ebc3 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -113,6 +113,25 @@ bool msi_enabled(const PCIDevice *dev)
          PCI_MSI_FLAGS_ENABLE);
 }
 
+static void msi_message_from_vector(PCIDevice *dev, uint16_t msi_flags,
+                                    unsigned vector, MSIMessage *msg)
+{
+    bool msi64bit = msi_flags & PCI_MSI_FLAGS_64BIT;
+    unsigned int nr_vectors = msi_nr_vectors(msi_flags);
+
+    msg->address = pci_get_long(dev->config + msi_address_lo_off(dev));
+    if (msi64bit) {
+        msg->address |= (uint64_t)pci_get_long(dev->config +
+                                               msi_address_hi_off(dev)) << 32;
+    }
+
+    msg->data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
+    if (nr_vectors > 1) {
+        msg->data &= ~(nr_vectors - 1);
+        msg->data |= vector;
+    }
+}
+
 static void kvm_msi_message_from_vector(PCIDevice *dev, unsigned vector,
                                         KVMMsiMessage *kmm)
 {
@@ -339,11 +358,10 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
-    unsigned int nr_vectors = msi_nr_vectors(flags);
-    uint64_t address;
-    uint32_t data;
+    MSIMessage msg;
+
+    assert(vector < msi_nr_vectors(flags));
 
-    assert(vector < nr_vectors);
     if (msi_is_masked(dev, vector)) {
         assert(flags & PCI_MSI_FLAGS_MASKBIT);
         pci_long_test_and_set_mask(
@@ -357,24 +375,13 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
         return;
     }
 
-    if (msi64bit) {
-        address = pci_get_quad(dev->config + msi_address_lo_off(dev));
-    } else {
-        address = pci_get_long(dev->config + msi_address_lo_off(dev));
-    }
-
-    /* upper bit 31:16 is zero */
-    data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
-    if (nr_vectors > 1) {
-        data &= ~(nr_vectors - 1);
-        data |= vector;
-    }
+    msi_message_from_vector(dev, flags, vector, &msg);
 
     MSI_DEV_PRINTF(dev,
                    "notify vector 0x%x"
                    " address: 0x%"PRIx64" data: 0x%"PRIx32"\n",
-                   vector, address, data);
-    stl_le_phys(address, data);
+                   vector, msg.address, msg.data);
+    stl_le_phys(msg.address, msg.data);
 }
 
 /* Normally called by pci_default_write_config(). */
-- 
1.7.3.4
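
As a worked example of what the new helper computes, the following
stand-alone sketch redoes the same arithmetic on plain register values
instead of PCI config space. The toy_* names and the sample register
values are made up for illustration; nr_vectors is assumed to be a power
of two, as it is for MSI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as MSIMessage, kept local so the sketch is self-contained. */
struct toy_msi_message {
    uint64_t address;
    uint32_t data;
};

static struct toy_msi_message toy_msi_compose(uint32_t addr_lo,
                                              uint32_t addr_hi,
                                              uint16_t data_reg,
                                              bool is_64bit,
                                              unsigned nr_vectors,
                                              unsigned vector)
{
    struct toy_msi_message msg;

    assert(vector < nr_vectors);

    /* The upper address dword only exists for 64-bit capable functions. */
    msg.address = addr_lo;
    if (is_64bit) {
        msg.address |= (uint64_t)addr_hi << 32;
    }

    /* With multiple vectors, the low bits of the data register are
     * replaced by the vector number. */
    msg.data = data_reg;
    if (nr_vectors > 1) {
        msg.data &= ~(uint32_t)(nr_vectors - 1);
        msg.data |= vector;
    }
    return msg;
}

static void toy_demo(void)
{
    /* 64-bit function, 4 vectors, sending vector 2. */
    struct toy_msi_message a =
        toy_msi_compose(0xfee01000u, 0x1u, 0x4123, true, 4, 2);
    assert(a.address == 0x1fee01000ULL);
    assert(a.data == 0x4122u);

    /* 32-bit function, single vector: upper dword and vector are ignored. */
    struct toy_msi_message b =
        toy_msi_compose(0xfee01000u, 0xdeadbeefu, 0x4041, false, 1, 0);
    assert(b.address == 0xfee01000ULL);
    assert(b.data == 0x4041u);
}
```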



* [RFC][PATCH 10/45] msix: Factor out msix_message_from_vector
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

This helper will also be used by the upcoming config notifier.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 04e08e5..50fa504 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -33,6 +33,15 @@
 #define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
 #define MSIX_MAX_ENTRIES 32
 
+static void msix_message_from_vector(PCIDevice *dev, unsigned vector,
+                                     MSIMessage *msg)
+{
+    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
+
+    msg->address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
+    msg->data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
+}
+
 /* KVM specific MSIX helpers */
 static void kvm_msix_free(PCIDevice *dev)
 {
@@ -453,9 +462,7 @@ uint32_t msix_bar_size(PCIDevice *dev)
 /* Send an MSI-X message */
 void msix_notify(PCIDevice *dev, unsigned vector)
 {
-    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
-    uint64_t address;
-    uint32_t data;
+    MSIMessage msg;
 
     if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
         return;
@@ -469,9 +476,9 @@ void msix_notify(PCIDevice *dev, unsigned vector)
         return;
     }
 
-    address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
-    data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
-    stl_le_phys(address, data);
+    msix_message_from_vector(dev, vector, &msg);
+
+    stl_le_phys(msg.address, msg.data);
 }
 
 void msix_reset(PCIDevice *dev)
-- 
1.7.3.4
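
The MSI-X variant is simpler because each vector has its own table entry.
A minimal sketch of the lookup, assuming the standard 16-byte entry layout
(64-bit address at offset 0, data at offset 8, little endian); the toy_*
names are hypothetical and the table is a plain byte array rather than
dev->msix_table_page:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ENTRY_SIZE        16u
#define TOY_ENTRY_LOWER_ADDR  0u
#define TOY_ENTRY_DATA        8u

struct toy_msi_message {
    uint64_t address;
    uint32_t data;
};

static uint32_t toy_get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t toy_get_le64(const uint8_t *p)
{
    return (uint64_t)toy_get_le32(p) | (uint64_t)toy_get_le32(p + 4) << 32;
}

/* Read address and data of one vector straight from its table entry. */
static struct toy_msi_message toy_msix_compose(const uint8_t *table,
                                               unsigned vector)
{
    const uint8_t *entry = table + vector * TOY_ENTRY_SIZE;
    struct toy_msi_message msg;

    msg.address = toy_get_le64(entry + TOY_ENTRY_LOWER_ADDR);
    msg.data = toy_get_le32(entry + TOY_ENTRY_DATA);
    return msg;
}

static void toy_demo(void)
{
    uint8_t table[2 * TOY_ENTRY_SIZE] = { 0 };
    struct toy_msi_message msg;

    /* Program entry 1 with address 0xfee00000 and data 0x41. */
    table[TOY_ENTRY_SIZE + 2] = 0xe0;
    table[TOY_ENTRY_SIZE + 3] = 0xfe;
    table[TOY_ENTRY_SIZE + TOY_ENTRY_DATA] = 0x41;

    msg = toy_msix_compose(table, 1);
    assert(msg.address == 0xfee00000ULL);
    assert(msg.data == 0x41u);

    /* Entry 0 was left untouched. */
    msg = toy_msix_compose(table, 0);
    assert(msg.address == 0 && msg.data == 0);
}
```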


* [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

So far we deliver MSI messages by writing them into the target MMIO
area. This reflects what happens on hardware, but imposes some
limitations on the emulation when introducing KVM in-kernel irqchip
models. For those we will need to track the message origin. Moreover,
different architectures or accelerators may want to override the
delivery handler.

Therefore, this commit introduces a delivery hook that is called by the
MSI/MSI-X layer when devices send normal messages, but also on spurious
deliveries that ended up on the APIC MMIO handler. Our default delivery
handler for APIC-based PCs then dispatches between real MSIs and other
DMA requests that happened to take the MSI path.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
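Note: the hook pattern described above can be sketched in isolation as
follows. This is only an illustration under stated assumptions, not the
code from the patch: all sketch_* names and the two counters are
hypothetical stand-ins (the counters replace apic_deliver_msi() and
stl_phys()), and SKETCH_MSI_ADDR_BASE assumes the usual x86 value of
MSI_ADDR_BASE.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the MSIMessage structure used by this series. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Assumption: the x86 MSI window base that MSI_ADDR_BASE stands for. */
#define SKETCH_MSI_ADDR_BASE 0xfee00000u

/* Hypothetical counters standing in for apic_deliver_msi()/stl_phys(). */
static int sketch_apic_hits;
static int sketch_dma_writes;

/* Default handler: reaching it means no board registered a handler. */
static void sketch_unsupported(MSIMessage *msg)
{
    (void)msg;
    abort();
}

/* The global hook; the MSI/MSI-X layer only ever calls through this. */
void (*sketch_msi_deliver)(MSIMessage *msg) = sketch_unsupported;

/* Board handler: real MSIs go to the APIC, anything else is plain DMA. */
static void sketch_pc_msi_deliver(MSIMessage *msg)
{
    if ((msg->address & 0xfff00000u) == SKETCH_MSI_ADDR_BASE) {
        sketch_apic_hits++;
    } else {
        sketch_dma_writes++;
    }
}

/* Board setup installs its handler once, before any device can fire. */
void sketch_board_init(void)
{
    sketch_msi_deliver = sketch_pc_msi_deliver;
}
```

A caller would run sketch_board_init() once and afterwards deliver with
sketch_msi_deliver(&msg); a board that never registers a handler ends up
in abort() on the first message.
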
 hw/apic.c |   19 ++++++++++++-------
 hw/apic.h |    1 +
 hw/msi.c  |   10 +++++++++-
 hw/msi.h  |    2 ++
 hw/msix.c |    2 +-
 hw/pc.c   |   11 +++++++++++
 6 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index e43219f..c1d557d 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -19,6 +19,7 @@
 #include "hw.h"
 #include "apic.h"
 #include "ioapic.h"
+#include "msi.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 #include "sysbus.h"
@@ -803,13 +804,15 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
     return val;
 }
 
-static void apic_send_msi(target_phys_addr_t addr, uint32_t data)
+void apic_deliver_msi(MSIMessage *msg)
 {
-    uint8_t dest = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
-    uint8_t vector = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
-    uint8_t dest_mode = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
-    uint8_t trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
-    uint8_t delivery = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
+    uint8_t dest =
+        (msg->address & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    uint8_t vector =
+        (msg->data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
+    uint8_t dest_mode = (msg->address >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
+    uint8_t trigger_mode = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+    uint8_t delivery = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
     /* XXX: Ignore redirection hint. */
     apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
 }
@@ -825,7 +828,9 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
          * APIC is connected directly to the CPU.
          * Mapping them on the global bus happens to work because
          * MSI registers are reserved in APIC MMIO and vice versa. */
-        apic_send_msi(addr, val);
+        MSIMessage msg = { .address = addr, .data = val };
+
+        msi_deliver(&msg);
         return;
     }
 
diff --git a/hw/apic.h b/hw/apic.h
index c398c83..fa848fd 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -18,6 +18,7 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
 uint8_t cpu_get_apic_tpr(DeviceState *s);
 void apic_init_reset(DeviceState *s);
 void apic_sipi(DeviceState *s);
+void apic_deliver_msi(MSIMessage *msg);
 
 /* pc.c */
 int cpu_is_bsp(CPUState *env);
diff --git a/hw/msi.c b/hw/msi.c
index 3c7ebc3..9055155 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -40,6 +40,14 @@
 /* Flag for interrupt controller to declare MSI/MSI-X support */
 bool msi_supported;
 
+static void msi_unsupported(MSIMessage *msg)
+{
+    /* If we get here, the board failed to register a delivery handler. */
+    abort();
+}
+
+void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -381,7 +389,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
                    "notify vector 0x%x"
                    " address: 0x%"PRIx64" data: 0x%"PRIx32"\n",
                    vector, msg.address, msg.data);
-    stl_le_phys(msg.address, msg.data);
+    msi_deliver(&msg);
 }
 
 /* Normally called by pci_default_write_config(). */
diff --git a/hw/msi.h b/hw/msi.h
index 22e3932..f3152f3 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -46,4 +46,6 @@ static inline bool msi_present(const PCIDevice *dev)
     return dev->cap_present & QEMU_PCI_CAP_MSI;
 }
 
+extern void (*msi_deliver)(MSIMessage *msg);
+
 #endif /* QEMU_MSI_H */
diff --git a/hw/msix.c b/hw/msix.c
index 50fa504..08cc526 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -478,7 +478,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 
     msix_message_from_vector(dev, vector, &msg);
 
-    stl_le_phys(msg.address, msg.data);
+    msi_deliver(&msg);
 }
 
 void msix_reset(PCIDevice *dev)
diff --git a/hw/pc.c b/hw/pc.c
index 768a20c..7d29a4a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -24,6 +24,7 @@
 #include "hw.h"
 #include "pc.h"
 #include "apic.h"
+#include "msi.h"
 #include "fdc.h"
 #include "ide.h"
 #include "pci.h"
@@ -102,6 +103,15 @@ void isa_irq_handler(void *opaque, int n, int level)
         qemu_set_irq(isa->ioapic[n], level);
 };
 
+static void pc_msi_deliver(MSIMessage *msg)
+{
+    if ((msg->address & 0xfff00000) == MSI_ADDR_BASE) {
+        apic_deliver_msi(msg);
+    } else {
+        stl_phys(msg->address, msg->data);
+    }
+}
+
 static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
 {
 }
@@ -889,6 +899,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
            on the global memory bus. */
         /* XXX: what if the base changes? */
         sysbus_mmio_map(d, 0, MSI_ADDR_BASE);
+        msi_deliver = pc_msi_deliver;
         apic_mapped = 1;
     }
 
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

This cache will help us implement KVM in-kernel irqchip support
without spreading hooks all over the place.

KVM requires us to register an MSI message first and then deliver it by
raising a pseudo IRQ line returned on registration. While this could be
changed for QEMU-originated MSI messages by adding direct MSI injection,
we will still need this translation for irqfd-originated messages. The
MSIRoutingCache will allow us to track those registrations and update
them lazily before the actual delivery. This avoids having to track MSI
vectors at device level (like qemu-kvm currently does).

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
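Note: a minimal sketch of what "update lazily before the actual
delivery" could look like with the cache structure from this patch. The
KVM side is not part of this patch, so everything named sketch_* is a
hypothetical stand-in: the GSI allocator, the update counter and the
delivery function are assumptions for illustration only.

```c
#include <stdint.h>

typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

typedef enum {
    MSI_ROUTE_NONE = 0,
    MSI_ROUTE_STATIC,
} MSIRouteType;

typedef struct MSIRoutingCache {
    MSIMessage msg;
    MSIRouteType type;
    int kvm_gsi;
    int kvm_irqfd;   /* unused in this sketch */
} MSIRoutingCache;

/* Hypothetical stand-ins for the KVM routing calls. */
static int sketch_next_gsi = 24;
static int sketch_route_updates;

/* Returns the pseudo IRQ line (GSI) that would be raised for msg. */
int sketch_deliver_cached(const MSIMessage *msg, MSIRoutingCache *cache)
{
    int stale = cache->msg.address != msg->address ||
                cache->msg.data != msg->data;

    if (cache->type == MSI_ROUTE_NONE) {
        /* First delivery through this cache: allocate a pseudo line. */
        cache->kvm_gsi = sketch_next_gsi++;
        cache->type = MSI_ROUTE_STATIC;
        stale = 1;
    }
    if (stale) {
        /* Message changed since registration: refresh the route once. */
        cache->msg = *msg;
        sketch_route_updates++;
    }
    return cache->kvm_gsi;
}
```

Because the cache is zero-initialized (g_malloc0 in the patch), its type
starts out as MSI_ROUTE_NONE, so the first delivery registers the route
and later deliveries of an unchanged message reuse it.
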
 hw/apic.c     |    5 +++--
 hw/apic.h     |    2 +-
 hw/msi.c      |   10 +++++++---
 hw/msi.h      |   14 +++++++++++++-
 hw/msix.c     |    7 ++++++-
 hw/pc.c       |    4 ++--
 hw/pci.h      |    4 ++++
 qemu-common.h |    1 +
 8 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index c1d557d..6811ae1 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -804,7 +804,7 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
     return val;
 }
 
-void apic_deliver_msi(MSIMessage *msg)
+void apic_deliver_msi(MSIMessage *msg, MSIRoutingCache *cache)
 {
     uint8_t dest =
         (msg->address & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
@@ -829,8 +829,9 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
          * Mapping them on the global bus happens to work because
          * MSI registers are reserved in APIC MMIO and vice versa. */
         MSIMessage msg = { .address = addr, .data = val };
+        static MSIRoutingCache cache;
 
-        msi_deliver(&msg);
+        msi_deliver(&msg, &cache);
         return;
     }
 
diff --git a/hw/apic.h b/hw/apic.h
index fa848fd..353ea3a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -18,7 +18,7 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
 uint8_t cpu_get_apic_tpr(DeviceState *s);
 void apic_init_reset(DeviceState *s);
 void apic_sipi(DeviceState *s);
-void apic_deliver_msi(MSIMessage *msg);
+void apic_deliver_msi(MSIMessage *msg, MSIRoutingCache *cache);
 
 /* pc.c */
 int cpu_is_bsp(CPUState *env);
diff --git a/hw/msi.c b/hw/msi.c
index 9055155..c8ccb17 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -40,13 +40,13 @@
 /* Flag for interrupt controller to declare MSI/MSI-X support */
 bool msi_supported;
 
-static void msi_unsupported(MSIMessage *msg)
+static void msi_unsupported(MSIMessage *msg, MSIRoutingCache *cache)
 {
     /* If we get here, the board failed to register a delivery handler. */
     abort();
 }
 
-void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
+void (*msi_deliver)(MSIMessage *msg, MSIRoutingCache *cache) = msi_unsupported;
 
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
@@ -288,6 +288,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
                      0xffffffff >> (PCI_MSI_VECTORS_MAX - nr_vectors));
     }
 
+    dev->msi_cache = g_malloc0(nr_vectors * sizeof(*dev->msi_cache));
+
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         dev->msi_irq_entries = g_malloc(nr_vectors *
                                         sizeof(*dev->msix_irq_entries));
@@ -312,6 +314,8 @@ void msi_uninit(struct PCIDevice *dev)
         g_free(dev->msi_irq_entries);
     }
 
+    g_free(dev->msi_cache);
+
     pci_del_capability(dev, PCI_CAP_ID_MSI, cap_size);
     dev->cap_present &= ~QEMU_PCI_CAP_MSI;
 
@@ -389,7 +393,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
                    "notify vector 0x%x"
                    " address: 0x%"PRIx64" data: 0x%"PRIx32"\n",
                    vector, msg.address, msg.data);
-    msi_deliver(&msg);
+    msi_deliver(&msg, &dev->msi_cache[vector]);
 }
 
 /* Normally called by pci_default_write_config(). */
diff --git a/hw/msi.h b/hw/msi.h
index f3152f3..20ae215 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -29,6 +29,18 @@ struct MSIMessage {
     uint32_t data;
 };
 
+typedef enum {
+    MSI_ROUTE_NONE = 0,
+    MSI_ROUTE_STATIC,
+} MSIRouteType;
+
+struct MSIRoutingCache {
+    MSIMessage msg;
+    MSIRouteType type;
+    int kvm_gsi;
+    int kvm_irqfd;
+};
+
 extern bool msi_supported;
 
 bool msi_enabled(const PCIDevice *dev);
@@ -46,6 +58,6 @@ static inline bool msi_present(const PCIDevice *dev)
     return dev->cap_present & QEMU_PCI_CAP_MSI;
 }
 
-extern void (*msi_deliver)(MSIMessage *msg);
+extern void (*msi_deliver)(MSIMessage *msg, MSIRoutingCache *cache);
 
 #endif /* QEMU_MSI_H */
diff --git a/hw/msix.c b/hw/msix.c
index 08cc526..e824aef 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -358,6 +358,8 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
     if (ret)
         goto err_config;
 
+    dev->msix_cache = g_malloc0(nentries * sizeof *dev->msix_cache);
+
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         dev->msix_irq_entries = g_malloc(nentries *
                                          sizeof *dev->msix_irq_entries);
@@ -409,6 +411,9 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     dev->msix_entry_used = NULL;
     g_free(dev->msix_irq_entries);
     dev->msix_irq_entries = NULL;
+
+    g_free(dev->msix_cache);
+
     dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
     return 0;
 }
@@ -478,7 +483,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 
     msix_message_from_vector(dev, vector, &msg);
 
-    msi_deliver(&msg);
+    msi_deliver(&msg, &dev->msix_cache[vector]);
 }
 
 void msix_reset(PCIDevice *dev)
diff --git a/hw/pc.c b/hw/pc.c
index 7d29a4a..4d8b524 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -103,10 +103,10 @@ void isa_irq_handler(void *opaque, int n, int level)
         qemu_set_irq(isa->ioapic[n], level);
 };
 
-static void pc_msi_deliver(MSIMessage *msg)
+static void pc_msi_deliver(MSIMessage *msg, MSIRoutingCache *cache)
 {
     if ((msg->address & 0xfff00000) == MSI_ADDR_BASE) {
-        apic_deliver_msi(msg);
+        apic_deliver_msi(msg, cache);
     } else {
         stl_phys(msg->address, msg->data);
     }
diff --git a/hw/pci.h b/hw/pci.h
index 329ab32..5b5d2fd 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -197,6 +197,10 @@ struct PCIDevice {
     MemoryRegion rom;
     uint32_t rom_bar;
 
+    /* MSI routing caches */
+    MSIRoutingCache *msi_cache;
+    MSIRoutingCache *msix_cache;
+
     /* MSI entries */
     int msi_entries_nr;
     struct KVMMsiMessage *msi_irq_entries;
diff --git a/qemu-common.h b/qemu-common.h
index d3901bd..c1d1614 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -16,6 +16,7 @@ typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
 typedef struct DeviceState DeviceState;
 typedef struct MSIMessage MSIMessage;
+typedef struct MSIRoutingCache MSIRoutingCache;
 
 struct Monitor;
 typedef struct Monitor Monitor;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 13/45] hpet: Use msi_deliver
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Avoid the slow-path MSI delivery via stl_phys by switching to
msi_deliver. This also allows us to prepare these rarely changing
messages in advance.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
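Note: the "prepare in advance" part can be sketched as below. The low
half of the 64-bit FSB route register carries the message data and the
high half the message address, matching the stl_le_phys() call being
replaced. SketchTimer and sketch_write_route() are hypothetical names
used only for this illustration.

```c
#include <stdint.h>

typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Hypothetical, reduced stand-in for HPETTimer. */
typedef struct SketchTimer {
    uint64_t fsb;        /* raw FSB route register */
    MSIMessage msi_msg;  /* message kept ready for delivery */
} SketchTimer;

/* Handle one 32-bit guest write to the low or high register half. */
void sketch_write_route(SketchTimer *t, int high_half, uint32_t new_val)
{
    if (high_half) {
        /* HPET_TN_ROUTE + 4: upper 32 bits hold the MSI address. */
        t->fsb = ((uint64_t)new_val << 32) | (t->fsb & 0xffffffffULL);
        t->msi_msg.address = new_val;
    } else {
        /* HPET_TN_ROUTE: lower 32 bits hold the MSI data. */
        t->fsb = (t->fsb & 0xffffffff00000000ULL) | new_val;
        t->msi_msg.data = new_val;
    }
}
```

After both halves are written, msi_msg.address equals fsb >> 32 and
msi_msg.data equals fsb & 0xffffffff, so the timer path no longer has
to split the register on every interrupt.
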
 hw/hpet.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index d8e6b8e..c6d6e35 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -31,6 +31,7 @@
 #include "hpet_emul.h"
 #include "sysbus.h"
 #include "mc146818rtc.h"
+#include "msi.h"
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -55,6 +56,8 @@ typedef struct HPETTimer {  /* timers */
     uint8_t wrap_flag;      /* timer pop will indicate wrap for one-shot 32-bit
                              * mode. Next pop will be actual timer expiration.
                              */
+    MSIMessage msi_msg;
+    MSIRoutingCache msi_cache;
 } HPETTimer;
 
 typedef struct HPETState {
@@ -192,7 +195,7 @@ static void update_irq(struct HPETTimer *timer, int set)
             qemu_irq_lower(s->irqs[route]);
         }
     } else if (timer_fsb_route(timer)) {
-        stl_le_phys(timer->fsb >> 32, timer->fsb & 0xffffffff);
+        msi_deliver(&timer->msi_msg, &timer->msi_cache);
     } else if (timer->config & HPET_TN_TYPE_LEVEL) {
         s->isr |= mask;
         qemu_irq_raise(s->irqs[route]);
@@ -533,9 +536,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
                 break;
         case HPET_TN_ROUTE:
             timer->fsb = (timer->fsb & 0xffffffff00000000ULL) | new_val;
+            timer->msi_msg.data = new_val;
             break;
         case HPET_TN_ROUTE + 4:
             timer->fsb = (new_val << 32) | (timer->fsb & 0xffffffff);
+            timer->msi_msg.address = new_val;
             break;
         default:
             DPRINTF("qemu: invalid hpet_ram_writel\n");
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 14/45] qemu-kvm: Drop useless kvm_clear_gsi_routes
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

There are no routes to clear at this point; we are just creating the VM.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 qemu-kvm-x86.c |    1 -
 qemu-kvm.c     |   10 ----------
 qemu-kvm.h     |    9 ---------
 3 files changed, 0 insertions(+), 20 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index a7981b1..bab4307 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -167,7 +167,6 @@ int kvm_arch_init_irq_routing(void)
     int i, r;
 
     if (kvm_has_gsi_routing()) {
-        kvm_clear_gsi_routes();
         for (i = 0; i < 8; ++i) {
             if (i == 2) {
                 continue;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index f5b129a..70481de 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -252,16 +252,6 @@ int kvm_has_gsi_routing(void)
     return r;
 }
 
-int kvm_clear_gsi_routes(void)
-{
-#ifdef KVM_CAP_IRQ_ROUTING
-    kvm_state->irq_routes->nr = 0;
-    return 0;
-#else
-    return -EINVAL;
-#endif
-}
-
 int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry)
 {
 #ifdef KVM_CAP_IRQ_ROUTING
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 2bd5602..8032388 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -174,15 +174,6 @@ int kvm_deassign_pci_device(KVMState *s,
                             struct kvm_assigned_pci_dev *assigned_dev);
 
 /*!
- * \brief Clears the temporary irq routing table
- *
- * Clears the temporary irq routing table.  Nothing is committed to the
- * running VM.
- *
- */
-int kvm_clear_gsi_routes(void);
-
-/*!
  * \brief Adds an irq route to the temporary irq routing table
  *
  * Adds an irq route to the temporary irq routing table.  Nothing is
-- 
1.7.3.4
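The reasoning in the commit message above can be illustrated with a small stand-alone sketch. It is not qemu-kvm code: `route_table`, `stage_route`, `init_static_routes` and `commit_routes` are hypothetical names, and the irqchip/pin values are placeholders. It only assumes what the removed doc comments state, namely that routes are staged in a temporary table and reach the VM only on commit. A freshly created table is already empty, so clearing it before the first add changes nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the qemu-kvm or kernel types. */
#define MAX_ROUTES 16

struct route {
    int gsi;
    int irqchip;
    int pin;
};

struct route_table {
    size_t nr;                          /* number of valid entries */
    struct route entries[MAX_ROUTES];
};

/* File-scope objects are zero-initialized, so both tables start empty. */
static struct route_table staged;       /* edited by add operations */
static struct route_table committed;    /* what the "VM" would see */

/* Stage one route; returns 0 on success, -1 if the table is full. */
static int stage_route(int gsi, int irqchip, int pin)
{
    if (staged.nr == MAX_ROUTES) {
        return -1;
    }
    staged.entries[staged.nr].gsi = gsi;
    staged.entries[staged.nr].irqchip = irqchip;
    staged.entries[staged.nr].pin = pin;
    staged.nr++;
    return 0;
}

/* Same loop shape as the diff context: GSIs 0..7, skipping 2. */
static int init_static_routes(void)
{
    int i;

    for (i = 0; i < 8; ++i) {
        if (i == 2) {
            continue;
        }
        /* irqchip 0 and pin == gsi are placeholder values. */
        if (stage_route(i, 0, i) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Publish the staged table; nothing is visible before this call. */
static void commit_routes(void)
{
    committed = staged;
}
```

In this model, calling `init_static_routes()` on a new table stages seven entries, and `committed` stays empty until `commit_routes()` runs.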


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 15/45] qemu-kvm: Drop unused kvm_del_irq_route
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

kvm_add_irq_route only exists to create platform-specific static routes,
so there is no need for a corresponding delete.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 qemu-kvm.c |   16 ----------------
 qemu-kvm.h |    8 --------
 2 files changed, 0 insertions(+), 24 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 70481de..e8dc537 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -410,22 +410,6 @@ int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
 #endif
 }
 
-int kvm_del_irq_route(int gsi, int irqchip, int pin)
-{
-#ifdef KVM_CAP_IRQ_ROUTING
-    struct kvm_irq_routing_entry e;
-
-    e.gsi = gsi;
-    e.type = KVM_IRQ_ROUTING_IRQCHIP;
-    e.flags = 0;
-    e.u.irqchip.irqchip = irqchip;
-    e.u.irqchip.pin = pin;
-    return kvm_del_routing_entry(&e);
-#else
-    return -ENOSYS;
-#endif
-}
-
 int kvm_commit_irq_routes(void)
 {
 #ifdef KVM_CAP_IRQ_ROUTING
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 8032388..68a921e 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -181,14 +181,6 @@ int kvm_deassign_pci_device(KVMState *s,
  */
 int kvm_add_irq_route(int gsi, int irqchip, int pin);
 
-/*!
- * \brief Removes an irq route from the temporary irq routing table
- *
- * Adds an irq route to the temporary irq routing table.  Nothing is
- * committed to the running VM.
- */
-int kvm_del_irq_route(int gsi, int irqchip, int pin);
-
 struct kvm_irq_routing_entry;
 /*!
  * \brief Adds a routing entry to the temporary irq routing table
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 16/45] qemu-kvm: Use MSIMessage and MSIRoutingCache
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Start benefiting from the new abstractions: convert the KVM-specific
vector tracking to the generic MSIMessage and MSIRoutingCache data
structures and helpers. This also reduces the diff to upstream.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msi.c        |   49 +++++++++++--------------------------------------
 hw/msix.c       |   37 +++++++++----------------------------
 hw/pci.h        |    4 ----
 hw/virtio-pci.c |    3 ++-
 kvm-stub.c      |    6 +++---
 kvm.h           |   13 +++----------
 qemu-kvm.c      |   46 +++++++++++++++++++++++++++++-----------------
 7 files changed, 57 insertions(+), 101 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index c8ccb17..b947104 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -140,49 +140,29 @@ static void msi_message_from_vector(PCIDevice *dev, uint16_t msi_flags,
     }
 }
 
-static void kvm_msi_message_from_vector(PCIDevice *dev, unsigned vector,
-                                        KVMMsiMessage *kmm)
-{
-    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
-    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
-    unsigned int nr_vectors = msi_nr_vectors(flags);
-
-    kmm->addr_lo = pci_get_long(dev->config + msi_address_lo_off(dev));
-    if (msi64bit) {
-        kmm->addr_hi = pci_get_long(dev->config + msi_address_hi_off(dev));
-    } else {
-        kmm->addr_hi = 0;
-    }
-
-    kmm->data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
-    if (nr_vectors > 1) {
-        kmm->data &= ~(nr_vectors - 1);
-        kmm->data |= vector;
-    }
-}
-
 static void kvm_msi_update(PCIDevice *dev)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     unsigned int max_vectors = 1 <<
         ((flags & PCI_MSI_FLAGS_QMASK) >> (ffs(PCI_MSI_FLAGS_QMASK) - 1));
     unsigned int nr_vectors = msi_nr_vectors(flags);
-    KVMMsiMessage new_entry, *entry;
+    MSIRoutingCache *cache;
     bool changed = false;
     unsigned int vector;
+    MSIMessage msg;
     int r;
 
     for (vector = 0; vector < max_vectors; vector++) {
-        entry = dev->msi_irq_entries + vector;
+        cache = &dev->msi_cache[vector];
 
         if (vector >= nr_vectors) {
             if (vector < dev->msi_entries_nr) {
-                kvm_msi_message_del(entry);
+                kvm_msi_message_del(cache);
                 changed = true;
             }
         } else if (vector >= dev->msi_entries_nr) {
-            kvm_msi_message_from_vector(dev, vector, entry);
-            r = kvm_msi_message_add(entry);
+            msi_message_from_vector(dev, flags, vector, &msg);
+            r = kvm_msi_message_add(&msg, cache);
             if (r) {
                 fprintf(stderr, "%s: kvm_msi_add failed: %s\n", __func__,
                         strerror(-r));
@@ -190,15 +170,14 @@ static void kvm_msi_update(PCIDevice *dev)
             }
             changed = true;
         } else {
-            kvm_msi_message_from_vector(dev, vector, &new_entry);
-            r = kvm_msi_message_update(entry, &new_entry);
+            msi_message_from_vector(dev, flags, vector, &msg);
+            r = kvm_msi_message_update(&msg, cache);
             if (r < 0) {
                 fprintf(stderr, "%s: kvm_update_msi failed: %s\n",
                         __func__, strerror(-r));
                 exit(1);
             }
             if (r > 0) {
-                *entry = new_entry;
                 changed = true;
             }
         }
@@ -220,7 +199,7 @@ static void kvm_msi_free(PCIDevice *dev)
     unsigned int vector;
 
     for (vector = 0; vector < dev->msi_entries_nr; ++vector) {
-        kvm_msi_message_del(&dev->msi_irq_entries[vector]);
+        kvm_msi_message_del(&dev->msi_cache[vector]);
     }
     if (dev->msi_entries_nr > 0) {
         kvm_commit_irq_routes();
@@ -290,11 +269,6 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
 
     dev->msi_cache = g_malloc0(nr_vectors * sizeof(*dev->msi_cache));
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        dev->msi_irq_entries = g_malloc(nr_vectors *
-                                        sizeof(*dev->msix_irq_entries));
-    }
-
     return config_offset;
 }
 
@@ -311,7 +285,6 @@ void msi_uninit(struct PCIDevice *dev)
 
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msi_free(dev);
-        g_free(dev->msi_irq_entries);
     }
 
     g_free(dev->msi_cache);
@@ -383,7 +356,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
     }
 
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_set_irq(dev->msi_irq_entries[vector].gsi, 1, NULL);
+        kvm_set_irq(dev->msi_cache[vector].kvm_gsi, 1, NULL);
         return;
     }
 
@@ -504,7 +477,7 @@ void msi_post_load(PCIDevice *dev)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
 
-    if (kvm_enabled() && dev->msi_irq_entries) {
+    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msi_free(dev);
 
         if (flags & PCI_MSI_FLAGS_ENABLE) {
diff --git a/hw/msix.c b/hw/msix.c
index e824aef..0be022e 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -49,7 +49,7 @@ static void kvm_msix_free(PCIDevice *dev)
 
     for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
         if (dev->msix_entry_used[vector]) {
-            kvm_msi_message_del(&dev->msix_irq_entries[vector]);
+            kvm_msi_message_del(&dev->msix_cache[vector]);
             changed = 1;
         }
     }
@@ -58,21 +58,11 @@ static void kvm_msix_free(PCIDevice *dev)
     }
 }
 
-static void kvm_msix_message_from_vector(PCIDevice *dev, unsigned vector,
-                                         KVMMsiMessage *kmm)
-{
-    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
-
-    kmm->addr_lo = pci_get_long(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
-    kmm->addr_hi = pci_get_long(table_entry + PCI_MSIX_ENTRY_UPPER_ADDR);
-    kmm->data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
-}
-
 static void kvm_msix_update(PCIDevice *dev, int vector,
                             int was_masked, int is_masked)
 {
-    KVMMsiMessage new_entry, *entry;
     int mask_cleared = was_masked && !is_masked;
+    MSIMessage msg;
     int r;
 
     /* It is only legal to change an entry when it is masked. Therefore, it is
@@ -84,16 +74,14 @@ static void kvm_msix_update(PCIDevice *dev, int vector,
         return;
     }
 
-    entry = dev->msix_irq_entries + vector;
-    kvm_msix_message_from_vector(dev, vector, &new_entry);
-    r = kvm_msi_message_update(entry, &new_entry);
+    msix_message_from_vector(dev, vector, &msg);
+    r = kvm_msi_message_update(&msg, &dev->msix_cache[vector]);
     if (r < 0) {
         fprintf(stderr, "%s: kvm_update_msix failed: %s\n", __func__,
                 strerror(-r));
         exit(1);
     }
     if (r > 0) {
-        *entry = new_entry;
         r = kvm_commit_irq_routes();
         if (r) {
             fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
@@ -105,11 +93,11 @@ static void kvm_msix_update(PCIDevice *dev, int vector,
 
 static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
 {
-    KVMMsiMessage *kmm = dev->msix_irq_entries + vector;
+    MSIMessage msg;
     int r;
 
-    kvm_msix_message_from_vector(dev, vector, kmm);
-    r = kvm_msi_message_add(kmm);
+    msix_message_from_vector(dev, vector, &msg);
+    r = kvm_msi_message_add(&msg, &dev->msix_cache[vector]);
     if (r < 0) {
         fprintf(stderr, "%s: kvm_add_msix failed: %s\n", __func__, strerror(-r));
         return r;
@@ -125,7 +113,7 @@ static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
 
 static void kvm_msix_vector_del(PCIDevice *dev, unsigned vector)
 {
-    kvm_msi_message_del(&dev->msix_irq_entries[vector]);
+    kvm_msi_message_del(&dev->msix_cache[vector]);
     kvm_commit_irq_routes();
 }
 
@@ -360,11 +348,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
 
     dev->msix_cache = g_malloc0(nentries * sizeof *dev->msix_cache);
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        dev->msix_irq_entries = g_malloc(nentries *
-                                         sizeof *dev->msix_irq_entries);
-    }
-
     dev->cap_present |= QEMU_PCI_CAP_MSIX;
     msix_mmio_setup(dev, bar);
     return 0;
@@ -409,8 +392,6 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     dev->msix_table_page = NULL;
     g_free(dev->msix_entry_used);
     dev->msix_entry_used = NULL;
-    g_free(dev->msix_irq_entries);
-    dev->msix_irq_entries = NULL;
 
     g_free(dev->msix_cache);
 
@@ -477,7 +458,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
     }
 
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_set_irq(dev->msix_irq_entries[vector].gsi, 1, NULL);
+        kvm_set_irq(dev->msix_cache[vector].kvm_gsi, 1, NULL);
         return;
     }
 
diff --git a/hw/pci.h b/hw/pci.h
index 5b5d2fd..0177df4 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -6,7 +6,6 @@
 
 #include "qdev.h"
 #include "memory.h"
-#include "kvm.h"
 
 /* PCI includes legacy ISA access.  */
 #include "isa.h"
@@ -203,7 +202,6 @@ struct PCIDevice {
 
     /* MSI entries */
     int msi_entries_nr;
-    struct KVMMsiMessage *msi_irq_entries;
 
     /* How much space does an MSIX table need. */
     /* The spec requires giving the table structure
@@ -212,8 +210,6 @@ struct PCIDevice {
      * on the rest of the region. */
     target_phys_addr_t msix_page_size;
 
-    KVMMsiMessage *msix_irq_entries;
-
     msix_mask_notifier_func msix_mask_notifier;
 };
 
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 615295e..23880e0 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -21,6 +21,7 @@
 #include "virtio-serial.h"
 #include "pci.h"
 #include "qemu-error.h"
+#include "msi.h"
 #include "msix.h"
 #include "net.h"
 #include "loader.h"
@@ -523,7 +524,7 @@ static int virtio_pci_mask_vq(PCIDevice *dev, unsigned vector,
                               VirtQueue *vq, int masked)
 {
     EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
-    int r = kvm_set_irqfd(dev->msix_irq_entries[vector].gsi,
+    int r = kvm_set_irqfd(dev->msix_cache[vector].kvm_gsi,
                           event_notifier_get_fd(notifier),
                           !masked);
     if (r < 0) {
diff --git a/kvm-stub.c b/kvm-stub.c
index c98170e..ca4382a 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -140,17 +140,17 @@ int kvm_get_irq_route_gsi(void)
     return -ENOSYS;
 }
 
-int kvm_msi_message_add(KVMMsiMessage *msg)
+int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
 {
     return -ENOSYS;
 }
 
-int kvm_msi_message_del(KVMMsiMessage *msg)
+int kvm_msi_message_del(MSIRoutingCache *cache)
 {
     return -ENOSYS;
 }
 
-int kvm_msi_message_update(KVMMsiMessage *old, KVMMsiMessage *new)
+int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
 {
     return -ENOSYS;
 }
diff --git a/kvm.h b/kvm.h
index b15e1dd..3706fc6 100644
--- a/kvm.h
+++ b/kvm.h
@@ -200,20 +200,13 @@ int kvm_set_irqfd(int gsi, int fd, bool assigned);
 
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign);
 
-typedef struct KVMMsiMessage {
-    uint32_t gsi;
-    uint32_t addr_lo;
-    uint32_t addr_hi;
-    uint32_t data;
-} KVMMsiMessage;
-
 int kvm_has_gsi_routing(void);
 int kvm_allows_irq0_override(void);
 int kvm_get_irq_route_gsi(void);
 
-int kvm_msi_message_add(KVMMsiMessage *msg);
-int kvm_msi_message_del(KVMMsiMessage *msg);
-int kvm_msi_message_update(KVMMsiMessage *old, KVMMsiMessage *new);
+int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache);
+int kvm_msi_message_del(MSIRoutingCache *cache);
+int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache);
 
 int kvm_commit_irq_routes(void);
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index e8dc537..253cf75 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -19,6 +19,7 @@
 #include "gdbstub.h"
 #include "monitor.h"
 #include "cpus.h"
+#include "hw/msi.h"
 
 #include "qemu-kvm.h"
 
@@ -442,18 +443,18 @@ int kvm_get_irq_route_gsi(void)
 }
 
 static void kvm_msi_routing_entry(struct kvm_irq_routing_entry *e,
-                                  KVMMsiMessage *msg)
+                                  MSIRoutingCache *cache)
 
 {
-    e->gsi = msg->gsi;
+    e->gsi = cache->kvm_gsi;
     e->type = KVM_IRQ_ROUTING_MSI;
     e->flags = 0;
-    e->u.msi.address_lo = msg->addr_lo;
-    e->u.msi.address_hi = msg->addr_hi;
-    e->u.msi.data = msg->data;
+    e->u.msi.address_lo = (uint32_t)cache->msg.address;
+    e->u.msi.address_hi = cache->msg.address >> 32;
+    e->u.msi.data = cache->msg.data;
 }
 
-int kvm_msi_message_add(KVMMsiMessage *msg)
+int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
 {
     struct kvm_irq_routing_entry e;
     int ret;
@@ -462,37 +463,48 @@ int kvm_msi_message_add(KVMMsiMessage *msg)
     if (ret < 0) {
         return ret;
     }
-    msg->gsi = ret;
+    cache->msg = *msg;
+    cache->type = MSI_ROUTE_STATIC;
+    cache->kvm_gsi = ret;
+    cache->kvm_irqfd = -1;
 
-    kvm_msi_routing_entry(&e, msg);
+    kvm_msi_routing_entry(&e, cache);
     return kvm_add_routing_entry(&e);
 }
 
-int kvm_msi_message_del(KVMMsiMessage *msg)
+int kvm_msi_message_del(MSIRoutingCache *cache)
 {
     struct kvm_irq_routing_entry e;
 
-    kvm_msi_routing_entry(&e, msg);
+    kvm_msi_routing_entry(&e, cache);
     return kvm_del_routing_entry(&e);
 }
 
-int kvm_msi_message_update(KVMMsiMessage *old, KVMMsiMessage *new)
+int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
 {
-    struct kvm_irq_routing_entry e1, e2;
+    struct kvm_irq_routing_entry old, new;
+    MSIRoutingCache new_cache;
     int ret;
 
-    new->gsi = old->gsi;
-    if (memcmp(old, new, sizeof(KVMMsiMessage)) == 0) {
+    assert(cache->type != MSI_ROUTE_NONE);
+
+    if (msg->address == cache->msg.address && msg->data == cache->msg.data) {
         return 0;
     }
 
-    kvm_msi_routing_entry(&e1, old);
-    kvm_msi_routing_entry(&e2, new);
+    kvm_msi_routing_entry(&old, cache);
+
+    new_cache.msg = *msg;
+    new_cache.type = cache->type;
+    new_cache.kvm_gsi = cache->kvm_gsi;
+    new_cache.kvm_irqfd = cache->kvm_irqfd;
+    kvm_msi_routing_entry(&new, &new_cache);
 
-    ret = kvm_update_routing_entry(&e1, &e2);
+    ret = kvm_update_routing_entry(&old, &new);
     if (ret < 0) {
         return ret;
     }
+    *cache = new_cache;
 
     return 1;
 }
-- 
1.7.3.4
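The calling convention this patch introduces can be sketched in isolation. The types below are simplified stand-ins, not the real definitions: only the fields visible in the diff (`address`, `data`, `kvm_gsi`) are modelled, and `struct msi_route`, `route_from_cache` and `cache_update` are hypothetical names. The sketch shows two things from the diff: the 64-bit message address is split into low and high halves for the routing entry, and the update helper returns 0 when nothing changed and 1 when the cache was rewritten and a commit is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins modelled on the fields used in the patch. */
typedef struct {
    uint64_t address;
    uint32_t data;
} MSIMessage;

typedef struct {
    MSIMessage msg;
    int kvm_gsi;
} MSIRoutingCache;

/* Hypothetical flat view of an MSI routing entry. */
struct msi_route {
    uint32_t gsi;
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/* Build a routing entry from the cached message and GSI. */
static void route_from_cache(struct msi_route *e, const MSIRoutingCache *c)
{
    e->gsi = (uint32_t)c->kvm_gsi;
    e->address_lo = (uint32_t)c->msg.address;          /* low 32 bits */
    e->address_hi = (uint32_t)(c->msg.address >> 32);  /* high 32 bits */
    e->data = c->msg.data;
}

/*
 * Returns 0 if the message matches the cache, 1 if the cache was
 * updated. The GSI is kept, so the caller never copies it around.
 */
static int cache_update(const MSIMessage *msg, MSIRoutingCache *c)
{
    if (msg->address == c->msg.address && msg->data == c->msg.data) {
        return 0;
    }
    c->msg = *msg;
    return 1;
}
```

Because the cache owns both the last message and the GSI, callers such as kvm_msi_update no longer need the `*entry = new_entry` write-back that the old KVMMsiMessage interface required.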


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [Qemu-devel] [RFC][PATCH 16/45] qemu-kvm: Use MSIMessage and MSIRoutingCache
@ 2011-10-17  9:27   ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

Start benefiting from the new abstractions: convert the KVM-specific
vector tracking to the generic MSIMessage and MSIRoutingCache data
structures and helpers. This also reduces the diff to upstream.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msi.c        |   49 +++++++++++--------------------------------------
 hw/msix.c       |   37 +++++++++----------------------------
 hw/pci.h        |    4 ----
 hw/virtio-pci.c |    3 ++-
 kvm-stub.c      |    6 +++---
 kvm.h           |   13 +++----------
 qemu-kvm.c      |   46 +++++++++++++++++++++++++++++-----------------
 7 files changed, 57 insertions(+), 101 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index c8ccb17..b947104 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -140,49 +140,29 @@ static void msi_message_from_vector(PCIDevice *dev, uint16_t msi_flags,
     }
 }
 
-static void kvm_msi_message_from_vector(PCIDevice *dev, unsigned vector,
-                                        KVMMsiMessage *kmm)
-{
-    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
-    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
-    unsigned int nr_vectors = msi_nr_vectors(flags);
-
-    kmm->addr_lo = pci_get_long(dev->config + msi_address_lo_off(dev));
-    if (msi64bit) {
-        kmm->addr_hi = pci_get_long(dev->config + msi_address_hi_off(dev));
-    } else {
-        kmm->addr_hi = 0;
-    }
-
-    kmm->data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
-    if (nr_vectors > 1) {
-        kmm->data &= ~(nr_vectors - 1);
-        kmm->data |= vector;
-    }
-}
-
 static void kvm_msi_update(PCIDevice *dev)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     unsigned int max_vectors = 1 <<
         ((flags & PCI_MSI_FLAGS_QMASK) >> (ffs(PCI_MSI_FLAGS_QMASK) - 1));
     unsigned int nr_vectors = msi_nr_vectors(flags);
-    KVMMsiMessage new_entry, *entry;
+    MSIRoutingCache *cache;
     bool changed = false;
     unsigned int vector;
+    MSIMessage msg;
     int r;
 
     for (vector = 0; vector < max_vectors; vector++) {
-        entry = dev->msi_irq_entries + vector;
+        cache = &dev->msi_cache[vector];
 
         if (vector >= nr_vectors) {
             if (vector < dev->msi_entries_nr) {
-                kvm_msi_message_del(entry);
+                kvm_msi_message_del(cache);
                 changed = true;
             }
         } else if (vector >= dev->msi_entries_nr) {
-            kvm_msi_message_from_vector(dev, vector, entry);
-            r = kvm_msi_message_add(entry);
+            msi_message_from_vector(dev, flags, vector, &msg);
+            r = kvm_msi_message_add(&msg, cache);
             if (r) {
                 fprintf(stderr, "%s: kvm_msi_add failed: %s\n", __func__,
                         strerror(-r));
@@ -190,15 +170,14 @@ static void kvm_msi_update(PCIDevice *dev)
             }
             changed = true;
         } else {
-            kvm_msi_message_from_vector(dev, vector, &new_entry);
-            r = kvm_msi_message_update(entry, &new_entry);
+            msi_message_from_vector(dev, flags, vector, &msg);
+            r = kvm_msi_message_update(&msg, cache);
             if (r < 0) {
                 fprintf(stderr, "%s: kvm_update_msi failed: %s\n",
                         __func__, strerror(-r));
                 exit(1);
             }
             if (r > 0) {
-                *entry = new_entry;
                 changed = true;
             }
         }
@@ -220,7 +199,7 @@ static void kvm_msi_free(PCIDevice *dev)
     unsigned int vector;
 
     for (vector = 0; vector < dev->msi_entries_nr; ++vector) {
-        kvm_msi_message_del(&dev->msi_irq_entries[vector]);
+        kvm_msi_message_del(&dev->msi_cache[vector]);
     }
     if (dev->msi_entries_nr > 0) {
         kvm_commit_irq_routes();
@@ -290,11 +269,6 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
 
     dev->msi_cache = g_malloc0(nr_vectors * sizeof(*dev->msi_cache));
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        dev->msi_irq_entries = g_malloc(nr_vectors *
-                                        sizeof(*dev->msix_irq_entries));
-    }
-
     return config_offset;
 }
 
@@ -311,7 +285,6 @@ void msi_uninit(struct PCIDevice *dev)
 
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msi_free(dev);
-        g_free(dev->msi_irq_entries);
     }
 
     g_free(dev->msi_cache);
@@ -383,7 +356,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
     }
 
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_set_irq(dev->msi_irq_entries[vector].gsi, 1, NULL);
+        kvm_set_irq(dev->msi_cache[vector].kvm_gsi, 1, NULL);
         return;
     }
 
@@ -504,7 +477,7 @@ void msi_post_load(PCIDevice *dev)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
 
-    if (kvm_enabled() && dev->msi_irq_entries) {
+    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msi_free(dev);
 
         if (flags & PCI_MSI_FLAGS_ENABLE) {
diff --git a/hw/msix.c b/hw/msix.c
index e824aef..0be022e 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -49,7 +49,7 @@ static void kvm_msix_free(PCIDevice *dev)
 
     for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
         if (dev->msix_entry_used[vector]) {
-            kvm_msi_message_del(&dev->msix_irq_entries[vector]);
+            kvm_msi_message_del(&dev->msix_cache[vector]);
             changed = 1;
         }
     }
@@ -58,21 +58,11 @@ static void kvm_msix_free(PCIDevice *dev)
     }
 }
 
-static void kvm_msix_message_from_vector(PCIDevice *dev, unsigned vector,
-                                         KVMMsiMessage *kmm)
-{
-    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
-
-    kmm->addr_lo = pci_get_long(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
-    kmm->addr_hi = pci_get_long(table_entry + PCI_MSIX_ENTRY_UPPER_ADDR);
-    kmm->data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
-}
-
 static void kvm_msix_update(PCIDevice *dev, int vector,
                             int was_masked, int is_masked)
 {
-    KVMMsiMessage new_entry, *entry;
     int mask_cleared = was_masked && !is_masked;
+    MSIMessage msg;
     int r;
 
     /* It is only legal to change an entry when it is masked. Therefore, it is
@@ -84,16 +74,14 @@ static void kvm_msix_update(PCIDevice *dev, int vector,
         return;
     }
 
-    entry = dev->msix_irq_entries + vector;
-    kvm_msix_message_from_vector(dev, vector, &new_entry);
-    r = kvm_msi_message_update(entry, &new_entry);
+    msix_message_from_vector(dev, vector, &msg);
+    r = kvm_msi_message_update(&msg, &dev->msix_cache[vector]);
     if (r < 0) {
         fprintf(stderr, "%s: kvm_update_msix failed: %s\n", __func__,
                 strerror(-r));
         exit(1);
     }
     if (r > 0) {
-        *entry = new_entry;
         r = kvm_commit_irq_routes();
         if (r) {
             fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
@@ -105,11 +93,11 @@ static void kvm_msix_update(PCIDevice *dev, int vector,
 
 static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
 {
-    KVMMsiMessage *kmm = dev->msix_irq_entries + vector;
+    MSIMessage msg;
     int r;
 
-    kvm_msix_message_from_vector(dev, vector, kmm);
-    r = kvm_msi_message_add(kmm);
+    msix_message_from_vector(dev, vector, &msg);
+    r = kvm_msi_message_add(&msg, &dev->msix_cache[vector]);
     if (r < 0) {
         fprintf(stderr, "%s: kvm_add_msix failed: %s\n", __func__, strerror(-r));
         return r;
@@ -125,7 +113,7 @@ static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
 
 static void kvm_msix_vector_del(PCIDevice *dev, unsigned vector)
 {
-    kvm_msi_message_del(&dev->msix_irq_entries[vector]);
+    kvm_msi_message_del(&dev->msix_cache[vector]);
     kvm_commit_irq_routes();
 }
 
@@ -360,11 +348,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
 
     dev->msix_cache = g_malloc0(nentries * sizeof *dev->msix_cache);
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        dev->msix_irq_entries = g_malloc(nentries *
-                                         sizeof *dev->msix_irq_entries);
-    }
-
     dev->cap_present |= QEMU_PCI_CAP_MSIX;
     msix_mmio_setup(dev, bar);
     return 0;
@@ -409,8 +392,6 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     dev->msix_table_page = NULL;
     g_free(dev->msix_entry_used);
     dev->msix_entry_used = NULL;
-    g_free(dev->msix_irq_entries);
-    dev->msix_irq_entries = NULL;
 
     g_free(dev->msix_cache);
 
@@ -477,7 +458,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
     }
 
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_set_irq(dev->msix_irq_entries[vector].gsi, 1, NULL);
+        kvm_set_irq(dev->msix_cache[vector].kvm_gsi, 1, NULL);
         return;
     }
 
diff --git a/hw/pci.h b/hw/pci.h
index 5b5d2fd..0177df4 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -6,7 +6,6 @@
 
 #include "qdev.h"
 #include "memory.h"
-#include "kvm.h"
 
 /* PCI includes legacy ISA access.  */
 #include "isa.h"
@@ -203,7 +202,6 @@ struct PCIDevice {
 
     /* MSI entries */
     int msi_entries_nr;
-    struct KVMMsiMessage *msi_irq_entries;
 
     /* How much space does an MSIX table need. */
     /* The spec requires giving the table structure
@@ -212,8 +210,6 @@ struct PCIDevice {
      * on the rest of the region. */
     target_phys_addr_t msix_page_size;
 
-    KVMMsiMessage *msix_irq_entries;
-
     msix_mask_notifier_func msix_mask_notifier;
 };
 
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 615295e..23880e0 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -21,6 +21,7 @@
 #include "virtio-serial.h"
 #include "pci.h"
 #include "qemu-error.h"
+#include "msi.h"
 #include "msix.h"
 #include "net.h"
 #include "loader.h"
@@ -523,7 +524,7 @@ static int virtio_pci_mask_vq(PCIDevice *dev, unsigned vector,
                               VirtQueue *vq, int masked)
 {
     EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
-    int r = kvm_set_irqfd(dev->msix_irq_entries[vector].gsi,
+    int r = kvm_set_irqfd(dev->msix_cache[vector].kvm_gsi,
                           event_notifier_get_fd(notifier),
                           !masked);
     if (r < 0) {
diff --git a/kvm-stub.c b/kvm-stub.c
index c98170e..ca4382a 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -140,17 +140,17 @@ int kvm_get_irq_route_gsi(void)
     return -ENOSYS;
 }
 
-int kvm_msi_message_add(KVMMsiMessage *msg)
+int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
 {
     return -ENOSYS;
 }
 
-int kvm_msi_message_del(KVMMsiMessage *msg)
+int kvm_msi_message_del(MSIRoutingCache *cache)
 {
     return -ENOSYS;
 }
 
-int kvm_msi_message_update(KVMMsiMessage *old, KVMMsiMessage *new)
+int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
 {
     return -ENOSYS;
 }
diff --git a/kvm.h b/kvm.h
index b15e1dd..3706fc6 100644
--- a/kvm.h
+++ b/kvm.h
@@ -200,20 +200,13 @@ int kvm_set_irqfd(int gsi, int fd, bool assigned);
 
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign);
 
-typedef struct KVMMsiMessage {
-    uint32_t gsi;
-    uint32_t addr_lo;
-    uint32_t addr_hi;
-    uint32_t data;
-} KVMMsiMessage;
-
 int kvm_has_gsi_routing(void);
 int kvm_allows_irq0_override(void);
 int kvm_get_irq_route_gsi(void);
 
-int kvm_msi_message_add(KVMMsiMessage *msg);
-int kvm_msi_message_del(KVMMsiMessage *msg);
-int kvm_msi_message_update(KVMMsiMessage *old, KVMMsiMessage *new);
+int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache);
+int kvm_msi_message_del(MSIRoutingCache *cache);
+int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache);
 
 int kvm_commit_irq_routes(void);
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index e8dc537..253cf75 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -19,6 +19,7 @@
 #include "gdbstub.h"
 #include "monitor.h"
 #include "cpus.h"
+#include "hw/msi.h"
 
 #include "qemu-kvm.h"
 
@@ -442,18 +443,18 @@ int kvm_get_irq_route_gsi(void)
 }
 
 static void kvm_msi_routing_entry(struct kvm_irq_routing_entry *e,
-                                  KVMMsiMessage *msg)
+                                  MSIRoutingCache *cache)
 
 {
-    e->gsi = msg->gsi;
+    e->gsi = cache->kvm_gsi;
     e->type = KVM_IRQ_ROUTING_MSI;
     e->flags = 0;
-    e->u.msi.address_lo = msg->addr_lo;
-    e->u.msi.address_hi = msg->addr_hi;
-    e->u.msi.data = msg->data;
+    e->u.msi.address_lo = (uint32_t)cache->msg.address;
+    e->u.msi.address_hi = cache->msg.address >> 32;
+    e->u.msi.data = cache->msg.data;
 }
 
-int kvm_msi_message_add(KVMMsiMessage *msg)
+int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
 {
     struct kvm_irq_routing_entry e;
     int ret;
@@ -462,37 +463,48 @@ int kvm_msi_message_add(KVMMsiMessage *msg)
     if (ret < 0) {
         return ret;
     }
-    msg->gsi = ret;
+    cache->msg = *msg;
+    cache->type = MSI_ROUTE_STATIC;
+    cache->kvm_gsi = ret;
+    cache->kvm_irqfd = -1;
 
-    kvm_msi_routing_entry(&e, msg);
+    kvm_msi_routing_entry(&e, cache);
     return kvm_add_routing_entry(&e);
 }
 
-int kvm_msi_message_del(KVMMsiMessage *msg)
+int kvm_msi_message_del(MSIRoutingCache *cache)
 {
     struct kvm_irq_routing_entry e;
 
-    kvm_msi_routing_entry(&e, msg);
+    kvm_msi_routing_entry(&e, cache);
     return kvm_del_routing_entry(&e);
 }
 
-int kvm_msi_message_update(KVMMsiMessage *old, KVMMsiMessage *new)
+int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
 {
-    struct kvm_irq_routing_entry e1, e2;
+    struct kvm_irq_routing_entry old, new;
+    MSIRoutingCache new_cache;
     int ret;
 
-    new->gsi = old->gsi;
-    if (memcmp(old, new, sizeof(KVMMsiMessage)) == 0) {
+    assert(cache->type != MSI_ROUTE_NONE);
+
+    if (msg->address == cache->msg.address && msg->data == cache->msg.data) {
         return 0;
     }
 
-    kvm_msi_routing_entry(&e1, old);
-    kvm_msi_routing_entry(&e2, new);
+    kvm_msi_routing_entry(&old, cache);
+
+    new_cache.msg = *msg;
+    new_cache.type = cache->type;
+    new_cache.kvm_gsi = cache->kvm_gsi;
+    new_cache.kvm_irqfd = cache->kvm_irqfd;
+    kvm_msi_routing_entry(&new, &new_cache);
 
-    ret = kvm_update_routing_entry(&e1, &e2);
+    ret = kvm_update_routing_entry(&old, &new);
     if (ret < 0) {
         return ret;
     }
+    *cache = new_cache;
 
     return 1;
 }
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 17/45] qemu-kvm: Track MSIRoutingCache in KVM routing table
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Keep a link from each entry of the internal KVM routing table to the MSI
routing cache entry that refers to it, if any. So far, the link is only
used to invalidate the cache content when the routing entry is dropped.
Later, it will allow us to build MSI routing entries on demand and to
flush existing ones on table overflow.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |    4 ++--
 kvm-all.c              |    1 +
 qemu-kvm.c             |   25 ++++++++++++++++++-------
 qemu-kvm.h             |    3 ++-
 4 files changed, 23 insertions(+), 10 deletions(-)
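
Note (illustration only, not part of the patch): the back-link scheme
described in the commit message can be sketched in isolation as below.
This is a minimal stand-alone model, not the qemu-kvm code. Table, Cache,
table_add, table_del and the ROUTE_* constants are hypothetical stand-ins
for KVMState, MSIRoutingCache, kvm_add_routing_entry,
kvm_del_routing_entry and MSI_ROUTE_*. As a simplification, table_add
also marks the cache as valid, which the patch leaves to
kvm_msi_message_add.

```c
#include <stddef.h>

/* Hypothetical stand-ins for MSI_ROUTE_NONE / MSI_ROUTE_STATIC. */
enum { ROUTE_NONE = 0, ROUTE_STATIC = 1 };

/* Hypothetical stand-in for MSIRoutingCache. */
typedef struct Cache {
    int type;
} Cache;

#define MAX_ROUTES 8

/* Routing entries and their back-links live in two parallel arrays. */
typedef struct Table {
    int    gsi[MAX_ROUTES];    /* stand-in for the routing entries   */
    Cache *cache[MAX_ROUTES];  /* back-link per entry, may be NULL   */
    int    nr;                 /* number of entries currently in use */
} Table;

/* Append an entry and remember which cache (if any) refers to it. */
static int table_add(Table *t, int gsi, Cache *c)
{
    if (t->nr == MAX_ROUTES) {
        return -1;
    }
    t->gsi[t->nr] = gsi;
    t->cache[t->nr] = c;
    if (c) {
        c->type = ROUTE_STATIC;
    }
    return t->nr++;
}

/* Remove the first entry with the given GSI by moving the last entry
 * into its slot. The back-link array is compacted the same way, and the
 * cache of the removed entry is invalidated. */
static int table_del(Table *t, int gsi)
{
    for (int i = 0; i < t->nr; i++) {
        if (t->gsi[i] != gsi) {
            continue;
        }
        int last = --t->nr;
        if (t->cache[i]) {
            t->cache[i]->type = ROUTE_NONE;
        }
        t->gsi[i] = t->gsi[last];
        t->cache[i] = t->cache[last];
        return 0;
    }
    return -1;
}
```

Entries added without a cache (NULL back-link), such as irqchip routes,
are removed the same way but leave no cache to invalidate.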

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 11efd16..07e9f5a 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -951,7 +951,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
         }
         assigned_dev->entry->gsi = r;
 
-        kvm_add_routing_entry(assigned_dev->entry);
+        kvm_add_routing_entry(assigned_dev->entry, NULL);
         if (kvm_commit_irq_routes() < 0) {
             perror("assigned_dev_update_msi: kvm_commit_irq_routes");
             assigned_dev->cap.state &= ~ASSIGNED_DEVICE_MSI_ENABLED;
@@ -1039,7 +1039,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
         adev->entry[entries_nr].u.msi.data = msg_data;
         DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
-	kvm_add_routing_entry(&adev->entry[entries_nr]);
+        kvm_add_routing_entry(&adev->entry[entries_nr], NULL);
 
         msix_entry.gsi = adev->entry[entries_nr].gsi;
         msix_entry.entry = i;
diff --git a/kvm-all.c b/kvm-all.c
index c34263b..c4186a5 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -81,6 +81,7 @@ struct KVMState
     int irqchip_inject_ioctl;
 #ifdef KVM_CAP_IRQ_ROUTING
     struct kvm_irq_routing *irq_routes;
+    MSIRoutingCache **msi_cache;
     int nr_allocated_irq_routes;
 #endif
     void *used_gsi_bitmap;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 253cf75..13d4f90 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -253,7 +253,8 @@ int kvm_has_gsi_routing(void)
     return r;
 }
 
-int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry)
+int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry,
+                          MSIRoutingCache *msi_cache)
 {
 #ifdef KVM_CAP_IRQ_ROUTING
     KVMState *s = kvm_state;
@@ -274,6 +275,8 @@ int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry)
         }
         s->nr_allocated_irq_routes = n;
         s->irq_routes = z;
+
+        s->msi_cache = g_realloc(s->msi_cache, sizeof(*s->msi_cache) * n);
     }
     n = s->irq_routes->nr++;
     new = &s->irq_routes->entries[n];
@@ -282,6 +285,7 @@ int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry)
     new->type = entry->type;
     new->flags = entry->flags;
     new->u = entry->u;
+    s->msi_cache[n] = msi_cache;
 
     set_gsi(s, entry->gsi);
 
@@ -301,7 +305,7 @@ int kvm_add_irq_route(int gsi, int irqchip, int pin)
     e.flags = 0;
     e.u.irqchip.irqchip = irqchip;
     e.u.irqchip.pin = pin;
-    return kvm_add_routing_entry(&e);
+    return kvm_add_routing_entry(&e, NULL);
 #else
     return -ENOSYS;
 #endif
@@ -312,6 +316,7 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry)
 #ifdef KVM_CAP_IRQ_ROUTING
     KVMState *s = kvm_state;
     struct kvm_irq_routing_entry *e, *p;
+    MSIRoutingCache *cache;
     int i, gsi, found = 0;
 
     gsi = entry->gsi;
@@ -324,8 +329,6 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry)
                     if (e->u.irqchip.irqchip ==
                         entry->u.irqchip.irqchip
                         && e->u.irqchip.pin == entry->u.irqchip.pin) {
-                        p = &s->irq_routes->entries[--s->irq_routes->nr];
-                        *e = *p;
                         found = 1;
                     }
                     break;
@@ -336,8 +339,6 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry)
                         && e->u.msi.address_hi ==
                         entry->u.msi.address_hi
                         && e->u.msi.data == entry->u.msi.data) {
-                        p = &s->irq_routes->entries[--s->irq_routes->nr];
-                        *e = *p;
                         found = 1;
                     }
                     break;
@@ -346,6 +347,16 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry)
                 break;
             }
             if (found) {
+                s->irq_routes->nr--;
+                p = &s->irq_routes->entries[s->irq_routes->nr];
+                *e = *p;
+
+                cache = s->msi_cache[i];
+                if (cache) {
+                    cache->type = MSI_ROUTE_NONE;
+                }
+                s->msi_cache[i] = s->msi_cache[s->irq_routes->nr];
+
                 /* If there are no other users of this GSI
                  * mark it available in the bitmap */
                 for (i = 0; i < s->irq_routes->nr; i++) {
@@ -469,7 +480,7 @@ int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
     cache->kvm_irqfd = -1;
 
     kvm_msi_routing_entry(&e, cache);
-    return kvm_add_routing_entry(&e);
+    return kvm_add_routing_entry(&e, cache);
 }
 
 int kvm_msi_message_del(MSIRoutingCache *cache)
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 68a921e..b2ae5da 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -188,7 +188,8 @@ struct kvm_irq_routing_entry;
  * Adds a filled routing entry to the temporary irq routing table. Nothing is
  * committed to the running VM.
  */
-int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry);
+int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry,
+                          MSIRoutingCache *msi_cache);
 
 /*!
  * \brief Removes a routing from the temporary irq routing table
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 18/45] qemu-kvm: Hook into MSI delivery at APIC level
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Move the two hooks for MSI delivery to in-kernel irqchips from the MSI
layer to a single place: the APIC.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/apic.c |   24 +++++++++++++++---------
 hw/msi.c  |    5 -----
 hw/msix.c |    5 -----
 3 files changed, 15 insertions(+), 19 deletions(-)
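
Note (illustration only, not part of the patch): the user-space branch of
apic_deliver_msi decodes the MSI message into APIC delivery parameters.
The sketch below shows that decoding in isolation. The bit positions are
written out from the x86 MSI message format and are assumed to match the
MSI_ADDR_* and MSI_DATA_* macros used in hw/apic.c. Msg, Decoded and
msi_decode are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for MSIMessage. */
typedef struct {
    uint64_t address;
    uint32_t data;
} Msg;

typedef struct {
    uint8_t dest;          /* destination APIC ID, address bits 19:12  */
    uint8_t dest_mode;     /* 0 = physical, 1 = logical, address bit 2 */
    uint8_t delivery;      /* delivery mode, data bits 10:8            */
    uint8_t vector;        /* interrupt vector, data bits 7:0          */
    uint8_t trigger_mode;  /* 0 = edge, 1 = level, data bit 15         */
} Decoded;

/* Split an x86 MSI address/data pair into its APIC fields. The
 * redirection hint (address bit 3) is ignored, as in the patch. */
static Decoded msi_decode(const Msg *m)
{
    Decoded d;

    d.dest         = (uint8_t)((m->address >> 12) & 0xff);
    d.dest_mode    = (uint8_t)((m->address >> 2) & 0x1);
    d.vector       = (uint8_t)(m->data & 0xff);
    d.delivery     = (uint8_t)((m->data >> 8) & 0x7);
    d.trigger_mode = (uint8_t)((m->data >> 15) & 0x1);
    return d;
}
```

With an in-kernel irqchip none of this decoding happens in user space;
the patch only raises the GSI stored in the routing cache.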

diff --git a/hw/apic.c b/hw/apic.c
index 6811ae1..cb6662c 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -806,15 +806,21 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
 
 void apic_deliver_msi(MSIMessage *msg, MSIRoutingCache *cache)
 {
-    uint8_t dest =
-        (msg->address & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
-    uint8_t vector =
-        (msg->data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
-    uint8_t dest_mode = (msg->address >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
-    uint8_t trigger_mode = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
-    uint8_t delivery = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
-    /* XXX: Ignore redirection hint. */
-    apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
+    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
+        if (kvm_set_irq(cache->kvm_gsi, 1, NULL) < 0) {
+            abort();
+        }
+    } else {
+        uint8_t dest =
+            (msg->address & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+        uint8_t vector =
+            (msg->data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
+        uint8_t dest_mode = (msg->address >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
+        uint8_t trigger_mode = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+        uint8_t delivery = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
+        /* XXX: Ignore redirection hint. */
+        apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
+    }
 }
 
 static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
diff --git a/hw/msi.c b/hw/msi.c
index b947104..1328903 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -355,11 +355,6 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
         return;
     }
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_set_irq(dev->msi_cache[vector].kvm_gsi, 1, NULL);
-        return;
-    }
-
     msi_message_from_vector(dev, flags, vector, &msg);
 
     MSI_DEV_PRINTF(dev,
diff --git a/hw/msix.c b/hw/msix.c
index 0be022e..6886255 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -457,11 +457,6 @@ void msix_notify(PCIDevice *dev, unsigned vector)
         return;
     }
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_set_irq(dev->msix_cache[vector].kvm_gsi, 1, NULL);
-        return;
-    }
-
     msix_message_from_vector(dev, vector, &msg);
 
     msi_deliver(&msg, &dev->msix_cache[vector]);
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 19/45] qemu-kvm: Factor out kvm_msi_irqfd_set
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Make the KVM core layer aware of the irqfd associated with an MSI routing
cache entry. kvm_msi_irqfd_set is introduced for this purpose, so that
virtio no longer needs to peek into the cache to extract the GSI.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/virtio-pci.c |    6 +++---
 kvm.h           |    2 ++
 qemu-kvm.c      |   14 +++++++++++++-
 3 files changed, 18 insertions(+), 4 deletions(-)
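
Note (illustration only, not part of the patch): the behaviour of
kvm_msi_irqfd_set can be modelled in isolation as below. This is a
sketch under stated assumptions. Cache, cache_irqfd_set and the ROUTE_*
constants are hypothetical stand-ins, and stub_set_irqfd replaces the
real kvm_set_irqfd with a stub that only counts calls.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MSI_ROUTE_NONE / MSI_ROUTE_STATIC. */
enum { ROUTE_NONE = 0, ROUTE_STATIC = 1 };

/* Hypothetical stand-in for MSIRoutingCache. */
typedef struct {
    int type;
    int gsi;
    int irqfd;  /* -1 while no irqfd is attached */
} Cache;

static int stub_calls;

/* Stub replacing kvm_set_irqfd: records the call and reports success. */
static int stub_set_irqfd(int gsi, int fd, bool assigned)
{
    (void)gsi;
    (void)fd;
    (void)assigned;
    stub_calls++;
    return 0;
}

/* Attach or detach an irqfd through the cache instead of a raw GSI.
 * Without a valid route, attaching fails and detaching is a no-op. */
static int cache_irqfd_set(Cache *c, int fd, bool assigned)
{
    if (c->type == ROUTE_NONE) {
        return assigned ? -EINVAL : 0;
    }
    c->irqfd = assigned ? fd : -1;
    return stub_set_irqfd(c->gsi, fd, assigned);
}
```

Because the cache remembers the attached fd, the core can detach the
irqfd on its own when the underlying routing entry is dropped.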

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 23880e0..ad6a002 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -524,9 +524,9 @@ static int virtio_pci_mask_vq(PCIDevice *dev, unsigned vector,
                               VirtQueue *vq, int masked)
 {
     EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
-    int r = kvm_set_irqfd(dev->msix_cache[vector].kvm_gsi,
-                          event_notifier_get_fd(notifier),
-                          !masked);
+    int r = kvm_msi_irqfd_set(&dev->msix_cache[vector],
+                              event_notifier_get_fd(notifier),
+                              !masked);
     if (r < 0) {
         return (r == -ENOSYS) ? 0 : r;
     }
diff --git a/kvm.h b/kvm.h
index 3706fc6..fe2eec5 100644
--- a/kvm.h
+++ b/kvm.h
@@ -208,6 +208,8 @@ int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache);
 int kvm_msi_message_del(MSIRoutingCache *cache);
 int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache);
 
+int kvm_msi_irqfd_set(MSIRoutingCache *cache, int fd, bool assigned);
+
 int kvm_commit_irq_routes(void);
 
 int kvm_irqchip_in_kernel(void);
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 13d4f90..ab7703b 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -352,8 +352,11 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry)
                 *e = *p;
 
                 cache = s->msi_cache[i];
-                if (cache) {
+                if (cache && cache->type != MSI_ROUTE_NONE) {
                     cache->type = MSI_ROUTE_NONE;
+                    if (cache->kvm_irqfd >= 0) {
+                        kvm_set_irqfd(cache->kvm_gsi, cache->kvm_irqfd, false);
+                    }
                 }
                 s->msi_cache[i] = s->msi_cache[s->irq_routes->nr];
 
@@ -521,6 +524,15 @@ int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
 }
 
 
+int kvm_msi_irqfd_set(MSIRoutingCache *cache, int fd, bool assigned)
+{
+    if (cache->type == MSI_ROUTE_NONE) {
+        return assigned ? -EINVAL : 0;
+    }
+    cache->kvm_irqfd = assigned ? fd : -1;
+    return kvm_set_irqfd(cache->kvm_gsi, fd, assigned);
+}
+
 #ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(KVMState *s, struct kvm_assigned_msix_nr *msix_nr)
 {
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 20/45] qemu-kvm: msix: Only invoke msix_handle_mask_update on changes
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Reorganize msix_mmio_write so that msix_handle_mask_update is only
called on mask changes. Pass the previous config space value to
msix_write_config so that it can check if a mask change took place.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |   36 ++++++++++++++++++++----------------
 hw/msix.h |    2 +-
 hw/pci.c  |    3 ++-
 3 files changed, 23 insertions(+), 18 deletions(-)
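
Note (illustration only, not part of the patch): the check on the
previous config space value can be sketched in isolation as below.
function_was_masked is a hypothetical helper. CTRL_ENABLE and
CTRL_MASKALL are assumed to equal MSIX_ENABLE_MASK and MSIX_MASKALL_MASK,
that is, the MSI-X Enable and Function Mask bits as seen in the upper
byte of the Message Control register. The caller is assumed to have
verified that the write covers ctrl_pos, as msix_write_config does via
range_covers_byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: bits 15 and 14 of Message Control, seen in its
 * upper byte. */
#define CTRL_ENABLE  0x80u
#define CTRL_MASKALL 0x40u

/* Given the old value of a config write of up to 4 bytes at addr, tell
 * whether all vectors were effectively masked before the write: either
 * MSI-X was disabled or the function mask was set.
 * Precondition: addr <= ctrl_pos < addr + len, with len <= 4. */
static bool function_was_masked(uint32_t old_val, uint32_t addr,
                                uint32_t ctrl_pos)
{
    /* Move the control byte down to bits 7:0. */
    uint8_t ctrl = (uint8_t)(old_val >> ((ctrl_pos - addr) * 8));

    return (ctrl & (CTRL_MASKALL | CTRL_ENABLE)) != CTRL_ENABLE;
}
```

For a capability at offset 0x40, the control byte sits at 0x43, so a
4-byte write at 0x40 needs a shift of 24 bits, while a 1-byte write at
0x43 needs none.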

diff --git a/hw/msix.c b/hw/msix.c
index 6886255..57d0aac 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -206,12 +206,12 @@ static void msix_clr_pending(PCIDevice *dev, int vector)
     *msix_pending_byte(dev, vector) &= ~msix_pending_mask(vector);
 }
 
-static int msix_function_masked(PCIDevice *dev)
+static bool msix_function_masked(PCIDevice *dev)
 {
     return dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK;
 }
 
-static int msix_is_masked(PCIDevice *dev, int vector)
+static bool msix_is_masked(PCIDevice *dev, int vector)
 {
     unsigned offset =
         vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
@@ -229,9 +229,10 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector)
 
 /* Handle MSI-X capability config write. */
 void msix_write_config(PCIDevice *dev, uint32_t addr,
-                       uint32_t val, int len)
+                       uint32_t old_val, int len)
 {
     unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET;
+    bool was_masked;
     int vector;
 
     if (!msix_present(dev) || !range_covers_byte(addr, len, enable_pos)) {
@@ -244,12 +245,13 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
 
     pci_device_deassert_intx(dev);
 
-    if (msix_function_masked(dev)) {
-        return;
-    }
-
-    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
-        msix_handle_mask_update(dev, vector);
+    old_val >>= (enable_pos - addr) * 8;
+    was_masked =
+        (old_val & (MSIX_MASKALL_MASK | MSIX_ENABLE_MASK)) != MSIX_ENABLE_MASK;
+    if (was_masked != msix_function_masked(dev)) {
+        for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
+            msix_handle_mask_update(dev, vector);
+        }
     }
 }
 
@@ -259,17 +261,19 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     PCIDevice *dev = opaque;
     unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
     unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
-    int was_masked = msix_is_masked(dev, vector);
+    bool was_masked = msix_is_masked(dev, vector);
+    int r;
+
     pci_set_long(dev->msix_table_page + offset, val);
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
     }
 
-    if (vector < dev->msix_entries_nr) {
-        if (was_masked != msix_is_masked(dev, vector) &&
-            dev->msix_mask_notifier) {
-            int r = dev->msix_mask_notifier(dev, vector,
-                                            msix_is_masked(dev, vector));
+    if (vector < dev->msix_entries_nr &&
+        was_masked != msix_is_masked(dev, vector)) {
+        if (dev->msix_mask_notifier) {
+            r = dev->msix_mask_notifier(dev, vector,
+                                        msix_is_masked(dev, vector));
             assert(r >= 0);
         }
         msix_handle_mask_update(dev, vector);
@@ -303,7 +307,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
     for (vector = 0; vector < nentries; ++vector) {
         unsigned offset =
             vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
-        int was_masked = msix_is_masked(dev, vector);
+        bool was_masked = msix_is_masked(dev, vector);
         dev->msix_table_page[offset] |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
         if (was_masked != msix_is_masked(dev, vector) &&
             dev->msix_mask_notifier) {
diff --git a/hw/msix.h b/hw/msix.h
index a8661e1..685dbe2 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -9,7 +9,7 @@ int msix_init(PCIDevice *pdev, unsigned short nentries,
               unsigned bar_nr, unsigned bar_size);
 
 void msix_write_config(PCIDevice *pci_dev, uint32_t address,
-                       uint32_t val, int len);
+                       uint32_t old_val, int len);
 
 int msix_uninit(PCIDevice *d, MemoryRegion *bar);
 
diff --git a/hw/pci.c b/hw/pci.c
index 6673989..39b2173 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1129,6 +1129,7 @@ uint32_t pci_default_read_config(PCIDevice *d,
 
 void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
+    uint32_t old_val = pci_default_read_config(d, addr, l);
     int i, was_irq_disabled = pci_irq_disabled(d);
 
     for (i = 0; i < l; val >>= 8, ++i) {
@@ -1156,7 +1157,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
         pci_update_irq_disabled(d, was_irq_disabled);
 
     msi_write_config(d, addr, val, l);
-    msix_write_config(d, addr, val, l);
+    msix_write_config(d, addr, old_val, l);
 }
 
 /***********************************************************/
-- 
1.7.3.4
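
For readers following the msix_write_config() hunk above, here is a
minimal standalone sketch of the was_masked computation. It is not code
from hw/msix.c: the EX_* constants and ex_function_was_masked() are
hypothetical names, and the bit values assume that the enable and
function-mask bits are bits 15 and 14 of the MSI-X Message Control word
(0x80 and 0x40 in its upper byte).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: enable / function-mask bits as seen in the upper byte
 * of the MSI-X Message Control word. */
#define EX_MSIX_ENABLE_MASK  0x80u
#define EX_MSIX_MASKALL_MASK 0x40u

/*
 * old_val:    config space content at [addr, addr + len) before the write
 * addr:       first byte covered by the write
 * enable_pos: offset of the byte holding the enable/maskall bits; the
 *             caller must ensure addr <= enable_pos < addr + len
 */
static bool ex_function_was_masked(uint32_t old_val, uint32_t addr,
                                   uint32_t enable_pos)
{
    /* Move the control byte down to bits 7..0. */
    old_val >>= (enable_pos - addr) * 8;

    /* Only "enabled and global mask clear" counts as unmasked. */
    return (old_val & (EX_MSIX_MASKALL_MASK | EX_MSIX_ENABLE_MASK))
           != EX_MSIX_ENABLE_MASK;
}
```

With addr = 0x40 and enable_pos = 0x43, a previous value of 0x80000000
yields false (enabled, not masked), while 0xc0000000 or 0 yield true. A
previously disabled function counts as masked in this check, so enabling
MSI-X with the global mask clear is treated like an unmask.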



* [RFC][PATCH 21/45] qemu-kvm: msix: Don't fire notifier spuriously on set/unset
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

If MSI-X is disabled or the global mask is set, don't fire the notifier
during registration or removal, as that would report a wrong state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |   22 ++++++++++++++--------
 1 files changed, 14 insertions(+), 8 deletions(-)
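
The guard added below can be illustrated with a small standalone model.
This is only a sketch under assumptions: ExDev, the EX_* constants and
the ex_* helpers are hypothetical and not part of hw/msix.c, and the
rollback ("undo") path of the real function is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MSIX_ENABLE_MASK  0x80u  /* assumed bit layout */
#define EX_MSIX_MASKALL_MASK 0x40u
#define EX_MAX_VECTORS       4

typedef struct ExDev ExDev;
typedef int (*ex_mask_notifier)(ExDev *dev, unsigned vector, bool masked);

struct ExDev {
    uint8_t control;                     /* upper byte of Message Control */
    bool vector_masked[EX_MAX_VECTORS];  /* per-vector mask bits */
    unsigned nr_vectors;
    ex_mask_notifier notifier;
    unsigned notify_count;               /* bookkeeping for the example */
};

/* True only if MSI-X is enabled and the global mask is clear. */
static bool ex_function_unmasked(const ExDev *dev)
{
    return (dev->control & (EX_MSIX_ENABLE_MASK | EX_MSIX_MASKALL_MASK))
           == EX_MSIX_ENABLE_MASK;
}

/* Example notifier that just counts its invocations. */
static int ex_count_notifier(ExDev *dev, unsigned vector, bool masked)
{
    (void)vector;
    (void)masked;
    dev->notify_count++;
    return 0;
}

/* Register the notifier; report "unmasked" only where that is true. */
static int ex_set_mask_notifier(ExDev *dev, ex_mask_notifier f)
{
    unsigned n;
    int r;

    dev->notifier = f;
    if (!ex_function_unmasked(dev)) {
        return 0;   /* every vector is effectively masked: stay silent */
    }
    for (n = 0; n < dev->nr_vectors; n++) {
        if (dev->vector_masked[n]) {
            continue;
        }
        r = f(dev, n, false);
        if (r < 0) {
            dev->notifier = NULL;   /* no rollback in this sketch */
            return r;
        }
    }
    return 0;
}
```

With control = 0x80 and one of three vectors masked, the notifier runs
twice; with control = 0xc0 or 0x00 it is registered but never called.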

diff --git a/hw/msix.c b/hw/msix.c
index 57d0aac..739b56f 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -553,10 +553,13 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
     int r, n;
     assert(!dev->msix_mask_notifier);
     dev->msix_mask_notifier = f;
-    for (n = 0; n < dev->msix_entries_nr; ++n) {
-        r = msix_set_mask_notifier_for_vector(dev, n);
-        if (r < 0) {
-            goto undo;
+    if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
+        (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
+        for (n = 0; n < dev->msix_entries_nr; ++n) {
+            r = msix_set_mask_notifier_for_vector(dev, n);
+            if (r < 0) {
+                goto undo;
+            }
         }
     }
     return 0;
@@ -573,10 +576,13 @@ int msix_unset_mask_notifier(PCIDevice *dev)
 {
     int r, n;
     assert(dev->msix_mask_notifier);
-    for (n = 0; n < dev->msix_entries_nr; ++n) {
-        r = msix_unset_mask_notifier_for_vector(dev, n);
-        if (r < 0) {
-            goto undo;
+    if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
+        (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
+        for (n = 0; n < dev->msix_entries_nr; ++n) {
+            r = msix_unset_mask_notifier_for_vector(dev, n);
+            if (r < 0) {
+                goto undo;
+            }
         }
     }
     dev->msix_mask_notifier = NULL;
-- 
1.7.3.4



* [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Also invoke the mask notifier if the global MSI-X mask is modified. For
this purpose, we push the notifier call from the per-vector mask update
to the central msix_handle_mask_update.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)
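
To illustrate why the notifier call moves into the central handler, here
is a simplified standalone model in which both the per-vector and the
global mask path end up in one update function. All names (ExDev, ex_*)
are hypothetical, and the assumption that a vector is masked when either
the global mask or its own mask bit is set is mine, not taken verbatim
from hw/msix.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_MAX_VECTORS 4

typedef struct ExDev ExDev;

struct ExDev {
    bool function_masked;                /* global mask */
    bool vector_masked[EX_MAX_VECTORS];  /* per-vector mask bits */
    bool pending[EX_MAX_VECTORS];
    unsigned nr_vectors;
    int (*notifier)(ExDev *dev, unsigned vector, bool masked);
    unsigned notified;                   /* bookkeeping for the example */
    unsigned delivered;
};

/* Assumed rule: global mask or per-vector mask bit masks the vector. */
static bool ex_is_masked(const ExDev *dev, unsigned vector)
{
    return dev->function_masked || dev->vector_masked[vector];
}

/* Central handler: notify first, then deliver a pending message. */
static void ex_handle_mask_update(ExDev *dev, unsigned vector)
{
    bool masked = ex_is_masked(dev, vector);

    if (dev->notifier) {
        int ret = dev->notifier(dev, vector, masked);
        assert(ret >= 0);
        (void)ret;
    }
    if (!masked && dev->pending[vector]) {
        dev->pending[vector] = false;
        dev->delivered++;
    }
}

/* Global path: a change affects every vector. */
static void ex_set_function_mask(ExDev *dev, bool masked)
{
    bool was_masked = dev->function_masked;
    unsigned v;

    dev->function_masked = masked;
    if (was_masked != masked) {
        for (v = 0; v < dev->nr_vectors; v++) {
            ex_handle_mask_update(dev, v);
        }
    }
}

/* Per-vector path: only act if the effective state changed. */
static void ex_set_vector_mask(ExDev *dev, unsigned vector, bool masked)
{
    bool was_masked = ex_is_masked(dev, vector);

    dev->vector_masked[vector] = masked;
    if (was_masked != ex_is_masked(dev, vector)) {
        ex_handle_mask_update(dev, vector);
    }
}

/* Example notifier that just counts its invocations. */
static int ex_count_notifier(ExDev *dev, unsigned vector, bool masked)
{
    (void)vector;
    (void)masked;
    dev->notified++;
    return 0;
}
```

Because the notifier lives in ex_handle_mask_update(), clearing the
global mask on a two-vector device notifies both vectors, and a pending
message on one of them is delivered in the same pass.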

diff --git a/hw/msix.c b/hw/msix.c
index 739b56f..247b255 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -221,7 +221,15 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
 
 static void msix_handle_mask_update(PCIDevice *dev, int vector)
 {
-    if (!msix_is_masked(dev, vector) && msix_is_pending(dev, vector)) {
+    bool masked = msix_is_masked(dev, vector);
+    int ret;
+
+    if (dev->msix_mask_notifier) {
+        ret = dev->msix_mask_notifier(dev, vector,
+                                      msix_is_masked(dev, vector));
+        assert(ret >= 0);
+    }
+    if (!masked && msix_is_pending(dev, vector)) {
         msix_clr_pending(dev, vector);
         msix_notify(dev, vector);
     }
@@ -262,7 +270,6 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
     unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
     bool was_masked = msix_is_masked(dev, vector);
-    int r;
 
     pci_set_long(dev->msix_table_page + offset, val);
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
@@ -271,11 +278,6 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
 
     if (vector < dev->msix_entries_nr &&
         was_masked != msix_is_masked(dev, vector)) {
-        if (dev->msix_mask_notifier) {
-            r = dev->msix_mask_notifier(dev, vector,
-                                        msix_is_masked(dev, vector));
-            assert(r >= 0);
-        }
         msix_handle_mask_update(dev, vector);
     }
 }
-- 
1.7.3.4



* [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

MSI config notifiers are supposed to be triggered on every relevant
configuration change of MSI vectors or if MSI is enabled/disabled.

Two notifiers are established, one for vector changes and one for general
enabling. The former notifier additionally passes the currently active
MSI message, which allows potential in-kernel IRQ routes to be updated
on changes. The latter notifier is optional and will only be used by a
subset of clients.

These notifiers are currently only available for MSI-X but will be
extended to legacy MSI as well.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c       |  119 +++++++++++++++++++++++++++++++++++++-----------------
 hw/msix.h       |    6 ++-
 hw/pci.h        |    8 ++-
 hw/virtio-pci.c |   24 ++++++------
 4 files changed, 102 insertions(+), 55 deletions(-)
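
The split into two notifiers can be pictured with the following
standalone sketch. It is an illustration under assumptions, not the
qemu-kvm code: ExDev and the ex_* functions are hypothetical, and the
address/data layout of ExMSIMessage is assumed rather than copied from
the MSIMessage structure introduced earlier in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_VECTORS 2

typedef struct ExDev ExDev;

/* Assumed message layout: target address plus payload. */
typedef struct {
    uint64_t address;
    uint32_t data;
} ExMSIMessage;

/* Optional: called when the enable state flips. */
typedef void (*ExEnableNotifier)(ExDev *dev, bool enabled);
/* Called for vector changes; also receives the active message. */
typedef int (*ExVectorConfigNotifier)(ExDev *dev, unsigned vector,
                                      ExMSIMessage *msg, bool masked);

struct ExDev {
    bool enabled;
    ExMSIMessage table[EX_MAX_VECTORS];
    bool vector_masked[EX_MAX_VECTORS];
    ExEnableNotifier enable_notifier;
    ExVectorConfigNotifier vector_config_notifier;
    int enable_events;       /* bookkeeping for the example */
    uint32_t last_data;
};

static void ex_fire_vector_config(ExDev *dev, unsigned vector)
{
    if (dev->vector_config_notifier) {
        ExMSIMessage msg = dev->table[vector];  /* snapshot for the client */
        int ret = dev->vector_config_notifier(dev, vector, &msg,
                                              dev->vector_masked[vector]);
        assert(ret >= 0);
        (void)ret;
    }
}

/* Enable path: notify only on a real transition, and only if set. */
static void ex_set_enabled(ExDev *dev, bool enabled)
{
    bool was_enabled = dev->enabled;

    dev->enabled = enabled;
    if (was_enabled != enabled && dev->enable_notifier) {
        dev->enable_notifier(dev, enabled);
    }
}

/* Vector path: a message update fires even if the mask is unchanged. */
static void ex_write_vector(ExDev *dev, unsigned vector, ExMSIMessage msg)
{
    dev->table[vector] = msg;
    ex_fire_vector_config(dev, vector);
}

/* Example clients. */
static void ex_on_enable(ExDev *dev, bool enabled)
{
    (void)enabled;
    dev->enable_events++;
}

static int ex_on_vector(ExDev *dev, unsigned vector, ExMSIMessage *msg,
                        bool masked)
{
    (void)vector;
    (void)masked;
    dev->last_data = msg->data;
    return 0;
}
```

A client that only cares about vectors can leave the enable notifier
NULL, as virtio-pci does in this patch; enable transitions are then
simply not reported to it.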

diff --git a/hw/msix.c b/hw/msix.c
index 247b255..176bc76 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -219,16 +219,24 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
 	   dev->msix_table_page[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
 }
 
-static void msix_handle_mask_update(PCIDevice *dev, int vector)
+static void msix_fire_vector_config_notifier(PCIDevice *dev,
+                                             unsigned int vector, bool masked)
 {
-    bool masked = msix_is_masked(dev, vector);
+    MSIMessage msg;
     int ret;
 
-    if (dev->msix_mask_notifier) {
-        ret = dev->msix_mask_notifier(dev, vector,
-                                      msix_is_masked(dev, vector));
+    if (dev->msix_vector_config_notifier) {
+        msix_message_from_vector(dev, vector, &msg);
+        ret = dev->msix_vector_config_notifier(dev, vector, &msg, masked);
         assert(ret >= 0);
     }
+}
+
+static void msix_handle_mask_update(PCIDevice *dev, int vector)
+{
+    bool masked = msix_is_masked(dev, vector);
+
+    msix_fire_vector_config_notifier(dev, vector, masked);
     if (!masked && msix_is_pending(dev, vector)) {
         msix_clr_pending(dev, vector);
         msix_notify(dev, vector);
@@ -240,20 +248,27 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
                        uint32_t old_val, int len)
 {
     unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET;
-    bool was_masked;
+    bool was_masked, was_enabled, is_enabled;
     int vector;
 
     if (!msix_present(dev) || !range_covers_byte(addr, len, enable_pos)) {
         return;
     }
 
-    if (!msix_enabled(dev)) {
+    old_val >>= (enable_pos - addr) * 8;
+
+    was_enabled = old_val & MSIX_ENABLE_MASK;
+    is_enabled = msix_enabled(dev);
+    if (was_enabled != is_enabled && dev->msix_enable_notifier) {
+        dev->msix_enable_notifier(dev, is_enabled);
+    }
+
+    if (!is_enabled) {
         return;
     }
 
     pci_device_deassert_intx(dev);
 
-    old_val >>= (enable_pos - addr) * 8;
     was_masked =
         (old_val & (MSIX_MASKALL_MASK | MSIX_ENABLE_MASK)) != MSIX_ENABLE_MASK;
     if (was_masked != msix_function_masked(dev)) {
@@ -270,15 +285,20 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
     unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
     bool was_masked = msix_is_masked(dev, vector);
+    bool is_masked;
 
     pci_set_long(dev->msix_table_page + offset, val);
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
     }
 
-    if (vector < dev->msix_entries_nr &&
-        was_masked != msix_is_masked(dev, vector)) {
-        msix_handle_mask_update(dev, vector);
+    if (vector < dev->msix_entries_nr) {
+        is_masked = msix_is_masked(dev, vector);
+        if (was_masked != is_masked) {
+            msix_handle_mask_update(dev, vector);
+        } else {
+            msix_fire_vector_config_notifier(dev, vector, is_masked);
+        }
     }
 }
 
@@ -305,17 +325,17 @@ static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
 
 static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
 {
-    int vector, r;
+    int vector;
+
     for (vector = 0; vector < nentries; ++vector) {
         unsigned offset =
             vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
         bool was_masked = msix_is_masked(dev, vector);
+
         dev->msix_table_page[offset] |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
-        if (was_masked != msix_is_masked(dev, vector) &&
-            dev->msix_mask_notifier) {
-            r = dev->msix_mask_notifier(dev, vector,
-                                        msix_is_masked(dev, vector));
-            assert(r >= 0);
+
+        if (!was_masked) {
+            msix_handle_mask_update(dev, vector);
         }
     }
 }
@@ -337,7 +357,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
     if (nentries > MSIX_MAX_ENTRIES)
         return -EINVAL;
 
-    dev->msix_mask_notifier = NULL;
     dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
                                         sizeof *dev->msix_entry_used);
 
@@ -529,36 +548,50 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 }
 
 /* Invoke the notifier if vector entry is used and unmasked. */
-static int msix_notify_if_unmasked(PCIDevice *dev, unsigned vector, int masked)
+static int
+msix_notify_if_unmasked(PCIDevice *dev, unsigned int vector, bool masked)
 {
-    assert(dev->msix_mask_notifier);
+    MSIMessage msg;
+
+    assert(dev->msix_vector_config_notifier);
+
     if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
         return 0;
     }
-    return dev->msix_mask_notifier(dev, vector, masked);
+    msix_message_from_vector(dev, vector, &msg);
+    return dev->msix_vector_config_notifier(dev, vector, &msg, masked);
 }
 
-static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
+static int
+msix_set_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
 {
-	/* Notifier has been set. Invoke it on unmasked vectors. */
-	return msix_notify_if_unmasked(dev, vector, 0);
+    /* Notifier has been set. Invoke it on unmasked vectors. */
+    return msix_notify_if_unmasked(dev, vector, false);
 }
 
-static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
+static int
+msix_unset_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
 {
-	/* Notifier will be unset. Invoke it to mask unmasked entries. */
-	return msix_notify_if_unmasked(dev, vector, 1);
+    /* Notifier will be unset. Invoke it to mask unmasked entries. */
+    return msix_notify_if_unmasked(dev, vector, true);
 }
 
-int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
+int msix_set_config_notifiers(PCIDevice *dev,
+                              MSIEnableNotifier enable_notifier,
+                              MSIVectorConfigNotifier vector_config_notifier)
 {
     int r, n;
-    assert(!dev->msix_mask_notifier);
-    dev->msix_mask_notifier = f;
+
+    dev->msix_enable_notifier = enable_notifier;
+    dev->msix_vector_config_notifier = vector_config_notifier;
+
+    if (enable_notifier && msix_enabled(dev)) {
+        enable_notifier(dev, true);
+    }
     if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
         (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
         for (n = 0; n < dev->msix_entries_nr; ++n) {
-            r = msix_set_mask_notifier_for_vector(dev, n);
+            r = msix_set_config_notifier_for_vector(dev, n);
             if (r < 0) {
                 goto undo;
             }
@@ -568,31 +601,41 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
 
 undo:
     while (--n >= 0) {
-        msix_unset_mask_notifier_for_vector(dev, n);
+        msix_unset_config_notifier_for_vector(dev, n);
     }
-    dev->msix_mask_notifier = NULL;
+    if (enable_notifier && msix_enabled(dev)) {
+        enable_notifier(dev, false);
+    }
+    dev->msix_enable_notifier = NULL;
+    dev->msix_vector_config_notifier = NULL;
     return r;
 }
 
-int msix_unset_mask_notifier(PCIDevice *dev)
+int msix_unset_config_notifiers(PCIDevice *dev)
 {
     int r, n;
-    assert(dev->msix_mask_notifier);
+
+    assert(dev->msix_vector_config_notifier);
+
     if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
         (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
         for (n = 0; n < dev->msix_entries_nr; ++n) {
-            r = msix_unset_mask_notifier_for_vector(dev, n);
+            r = msix_unset_config_notifier_for_vector(dev, n);
             if (r < 0) {
                 goto undo;
             }
         }
     }
-    dev->msix_mask_notifier = NULL;
+    if (dev->msix_enable_notifier && msix_enabled(dev)) {
+        dev->msix_enable_notifier(dev, false);
+    }
+    dev->msix_enable_notifier = NULL;
+    dev->msix_vector_config_notifier = NULL;
     return 0;
 
 undo:
     while (--n >= 0) {
-        msix_set_mask_notifier_for_vector(dev, n);
+        msix_set_config_notifier_for_vector(dev, n);
     }
     return r;
 }
diff --git a/hw/msix.h b/hw/msix.h
index 685dbe2..978f417 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,6 +29,8 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
-int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func);
-int msix_unset_mask_notifier(PCIDevice *dev);
+int msix_set_config_notifiers(PCIDevice *dev,
+                              MSIEnableNotifier enable_notifier,
+                              MSIVectorConfigNotifier vector_config_notifier);
+int msix_unset_config_notifiers(PCIDevice *dev);
 #endif
diff --git a/hw/pci.h b/hw/pci.h
index 0177df4..4249c6a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -127,8 +127,9 @@ enum {
     QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
 };
 
-typedef int (*msix_mask_notifier_func)(PCIDevice *, unsigned vector,
-				       int masked);
+typedef void (*MSIEnableNotifier)(PCIDevice *dev, bool enabled);
+typedef int (*MSIVectorConfigNotifier)(PCIDevice *dev, unsigned int vector,
+                                       MSIMessage *msg, bool masked);
 
 struct PCIDevice {
     DeviceState qdev;
@@ -210,7 +211,8 @@ struct PCIDevice {
      * on the rest of the region. */
     target_phys_addr_t msix_page_size;
 
-    msix_mask_notifier_func msix_mask_notifier;
+    MSIEnableNotifier msix_enable_notifier;
+    MSIVectorConfigNotifier msix_vector_config_notifier;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index ad6a002..6718945 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -520,8 +520,8 @@ static void virtio_pci_guest_notifier_read(void *opaque)
     }
 }
 
-static int virtio_pci_mask_vq(PCIDevice *dev, unsigned vector,
-                              VirtQueue *vq, int masked)
+static int virtio_pci_mask_vq(PCIDevice *dev, unsigned int vector,
+                              VirtQueue *vq, bool masked)
 {
     EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
     int r = kvm_msi_irqfd_set(&dev->msix_cache[vector],
@@ -540,8 +540,8 @@ static int virtio_pci_mask_vq(PCIDevice *dev, unsigned vector,
     return 0;
 }
 
-static int virtio_pci_mask_notifier(PCIDevice *dev, unsigned vector,
-                                    int masked)
+static int virtio_pci_msi_vector_config(PCIDevice *dev, unsigned int vector,
+                                        MSIMessage *msg, bool masked)
 {
     VirtIOPCIProxy *proxy = container_of(dev, VirtIOPCIProxy, pci_dev);
     VirtIODevice *vdev = proxy->vdev;
@@ -608,11 +608,11 @@ static int virtio_pci_set_guest_notifiers(void *opaque, bool assign)
     VirtIODevice *vdev = proxy->vdev;
     int r, n;
 
-    /* Must unset mask notifier while guest notifier
+    /* Must unset vector config notifier while guest notifier
      * is still assigned */
     if (!assign) {
-	    r = msix_unset_mask_notifier(&proxy->pci_dev);
-            assert(r >= 0);
+        r = msix_unset_config_notifiers(&proxy->pci_dev);
+        assert(r >= 0);
     }
 
     for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
@@ -626,11 +626,11 @@ static int virtio_pci_set_guest_notifiers(void *opaque, bool assign)
         }
     }
 
-    /* Must set mask notifier after guest notifier
+    /* Must set vector config notifier after guest notifier
      * has been assigned */
     if (assign) {
-        r = msix_set_mask_notifier(&proxy->pci_dev,
-                                   virtio_pci_mask_notifier);
+        r = msix_set_config_notifiers(&proxy->pci_dev, NULL,
+                                      virtio_pci_msi_vector_config);
         if (r < 0) {
             goto assign_error;
         }
@@ -645,8 +645,8 @@ assign_error:
     }
 
     if (!assign) {
-        msix_set_mask_notifier(&proxy->pci_dev,
-                               virtio_pci_mask_notifier);
+        msix_set_config_notifiers(&proxy->pci_dev, NULL,
+                                  virtio_pci_msi_vector_config);
     }
     return r;
 }
-- 
1.7.3.4



* [Qemu-devel] [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
@ 2011-10-17  9:27   ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

MSI config notifiers are supposed to be triggered on every relevant
configuration change of MSI vectors or if MSI is enabled/disabled.

Two notifiers are established, one for vector changes and one for general
enabling. The former notifier additionally passes the currently active
MSI message, which allows potential in-kernel IRQ routes to be updated
on changes. The latter notifier is optional and will only be used by a
subset of clients.

These notifiers are currently only available for MSI-X but will be
extended to legacy MSI as well.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c       |  119 +++++++++++++++++++++++++++++++++++++-----------------
 hw/msix.h       |    6 ++-
 hw/pci.h        |    8 ++-
 hw/virtio-pci.c |   24 ++++++------
 4 files changed, 102 insertions(+), 55 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 247b255..176bc76 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -219,16 +219,24 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
 	   dev->msix_table_page[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
 }
 
-static void msix_handle_mask_update(PCIDevice *dev, int vector)
+static void msix_fire_vector_config_notifier(PCIDevice *dev,
+                                             unsigned int vector, bool masked)
 {
-    bool masked = msix_is_masked(dev, vector);
+    MSIMessage msg;
     int ret;
 
-    if (dev->msix_mask_notifier) {
-        ret = dev->msix_mask_notifier(dev, vector,
-                                      msix_is_masked(dev, vector));
+    if (dev->msix_vector_config_notifier) {
+        msix_message_from_vector(dev, vector, &msg);
+        ret = dev->msix_vector_config_notifier(dev, vector, &msg, masked);
         assert(ret >= 0);
     }
+}
+
+static void msix_handle_mask_update(PCIDevice *dev, int vector)
+{
+    bool masked = msix_is_masked(dev, vector);
+
+    msix_fire_vector_config_notifier(dev, vector, masked);
     if (!masked && msix_is_pending(dev, vector)) {
         msix_clr_pending(dev, vector);
         msix_notify(dev, vector);
@@ -240,20 +248,27 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
                        uint32_t old_val, int len)
 {
     unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET;
-    bool was_masked;
+    bool was_masked, was_enabled, is_enabled;
     int vector;
 
     if (!msix_present(dev) || !range_covers_byte(addr, len, enable_pos)) {
         return;
     }
 
-    if (!msix_enabled(dev)) {
+    old_val >>= (enable_pos - addr) * 8;
+
+    was_enabled = old_val & MSIX_ENABLE_MASK;
+    is_enabled = msix_enabled(dev);
+    if (was_enabled != is_enabled && dev->msix_enable_notifier) {
+        dev->msix_enable_notifier(dev, is_enabled);
+    }
+
+    if (!is_enabled) {
         return;
     }
 
     pci_device_deassert_intx(dev);
 
-    old_val >>= (enable_pos - addr) * 8;
     was_masked =
         (old_val & (MSIX_MASKALL_MASK | MSIX_ENABLE_MASK)) != MSIX_ENABLE_MASK;
     if (was_masked != msix_function_masked(dev)) {
@@ -270,15 +285,20 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
     unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
     bool was_masked = msix_is_masked(dev, vector);
+    bool is_masked;
 
     pci_set_long(dev->msix_table_page + offset, val);
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
     }
 
-    if (vector < dev->msix_entries_nr &&
-        was_masked != msix_is_masked(dev, vector)) {
-        msix_handle_mask_update(dev, vector);
+    if (vector < dev->msix_entries_nr) {
+        is_masked = msix_is_masked(dev, vector);
+        if (was_masked != is_masked) {
+            msix_handle_mask_update(dev, vector);
+        } else {
+            msix_fire_vector_config_notifier(dev, vector, is_masked);
+        }
     }
 }
 
@@ -305,17 +325,17 @@ static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
 
 static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
 {
-    int vector, r;
+    int vector;
+
     for (vector = 0; vector < nentries; ++vector) {
         unsigned offset =
             vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
         bool was_masked = msix_is_masked(dev, vector);
+
         dev->msix_table_page[offset] |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
-        if (was_masked != msix_is_masked(dev, vector) &&
-            dev->msix_mask_notifier) {
-            r = dev->msix_mask_notifier(dev, vector,
-                                        msix_is_masked(dev, vector));
-            assert(r >= 0);
+
+        if (!was_masked) {
+            msix_handle_mask_update(dev, vector);
         }
     }
 }
@@ -337,7 +357,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
     if (nentries > MSIX_MAX_ENTRIES)
         return -EINVAL;
 
-    dev->msix_mask_notifier = NULL;
     dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
                                         sizeof *dev->msix_entry_used);
 
@@ -529,36 +548,50 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 }
 
 /* Invoke the notifier if vector entry is used and unmasked. */
-static int msix_notify_if_unmasked(PCIDevice *dev, unsigned vector, int masked)
+static int
+msix_notify_if_unmasked(PCIDevice *dev, unsigned int vector, bool masked)
 {
-    assert(dev->msix_mask_notifier);
+    MSIMessage msg;
+
+    assert(dev->msix_vector_config_notifier);
+
     if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
         return 0;
     }
-    return dev->msix_mask_notifier(dev, vector, masked);
+    msix_message_from_vector(dev, vector, &msg);
+    return dev->msix_vector_config_notifier(dev, vector, &msg, masked);
 }
 
-static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
+static int
+msix_set_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
 {
-	/* Notifier has been set. Invoke it on unmasked vectors. */
-	return msix_notify_if_unmasked(dev, vector, 0);
+    /* Notifier has been set. Invoke it on unmasked vectors. */
+    return msix_notify_if_unmasked(dev, vector, false);
 }
 
-static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
+static int
+msix_unset_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
 {
-	/* Notifier will be unset. Invoke it to mask unmasked entries. */
-	return msix_notify_if_unmasked(dev, vector, 1);
+    /* Notifier will be unset. Invoke it to mask unmasked entries. */
+    return msix_notify_if_unmasked(dev, vector, true);
 }
 
-int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
+int msix_set_config_notifiers(PCIDevice *dev,
+                              MSIEnableNotifier enable_notifier,
+                              MSIVectorConfigNotifier vector_config_notifier)
 {
     int r, n;
-    assert(!dev->msix_mask_notifier);
-    dev->msix_mask_notifier = f;
+
+    dev->msix_enable_notifier = enable_notifier;
+    dev->msix_vector_config_notifier = vector_config_notifier;
+
+    if (enable_notifier && msix_enabled(dev)) {
+        enable_notifier(dev, true);
+    }
     if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
         (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
         for (n = 0; n < dev->msix_entries_nr; ++n) {
-            r = msix_set_mask_notifier_for_vector(dev, n);
+            r = msix_set_config_notifier_for_vector(dev, n);
             if (r < 0) {
                 goto undo;
             }
@@ -568,31 +601,41 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
 
 undo:
     while (--n >= 0) {
-        msix_unset_mask_notifier_for_vector(dev, n);
+        msix_unset_config_notifier_for_vector(dev, n);
     }
-    dev->msix_mask_notifier = NULL;
+    if (enable_notifier && msix_enabled(dev)) {
+        enable_notifier(dev, false);
+    }
+    dev->msix_enable_notifier = NULL;
+    dev->msix_vector_config_notifier = NULL;
     return r;
 }
 
-int msix_unset_mask_notifier(PCIDevice *dev)
+int msix_unset_config_notifiers(PCIDevice *dev)
 {
     int r, n;
-    assert(dev->msix_mask_notifier);
+
+    assert(dev->msix_vector_config_notifier);
+
     if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
         (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
         for (n = 0; n < dev->msix_entries_nr; ++n) {
-            r = msix_unset_mask_notifier_for_vector(dev, n);
+            r = msix_unset_config_notifier_for_vector(dev, n);
             if (r < 0) {
                 goto undo;
             }
         }
     }
-    dev->msix_mask_notifier = NULL;
+    if (dev->msix_enable_notifier && msix_enabled(dev)) {
+        dev->msix_enable_notifier(dev, false);
+    }
+    dev->msix_enable_notifier = NULL;
+    dev->msix_vector_config_notifier = NULL;
     return 0;
 
 undo:
     while (--n >= 0) {
-        msix_set_mask_notifier_for_vector(dev, n);
+        msix_set_config_notifier_for_vector(dev, n);
     }
     return r;
 }
diff --git a/hw/msix.h b/hw/msix.h
index 685dbe2..978f417 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,6 +29,8 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
-int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func);
-int msix_unset_mask_notifier(PCIDevice *dev);
+int msix_set_config_notifiers(PCIDevice *dev,
+                              MSIEnableNotifier enable_notifier,
+                              MSIVectorConfigNotifier vector_config_notifier);
+int msix_unset_config_notifiers(PCIDevice *dev);
 #endif
diff --git a/hw/pci.h b/hw/pci.h
index 0177df4..4249c6a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -127,8 +127,9 @@ enum {
     QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
 };
 
-typedef int (*msix_mask_notifier_func)(PCIDevice *, unsigned vector,
-				       int masked);
+typedef void (*MSIEnableNotifier)(PCIDevice *dev, bool enabled);
+typedef int (*MSIVectorConfigNotifier)(PCIDevice *dev, unsigned int vector,
+                                       MSIMessage *msg, bool masked);
 
 struct PCIDevice {
     DeviceState qdev;
@@ -210,7 +211,8 @@ struct PCIDevice {
      * on the rest of the region. */
     target_phys_addr_t msix_page_size;
 
-    msix_mask_notifier_func msix_mask_notifier;
+    MSIEnableNotifier msix_enable_notifier;
+    MSIVectorConfigNotifier msix_vector_config_notifier;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index ad6a002..6718945 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -520,8 +520,8 @@ static void virtio_pci_guest_notifier_read(void *opaque)
     }
 }
 
-static int virtio_pci_mask_vq(PCIDevice *dev, unsigned vector,
-                              VirtQueue *vq, int masked)
+static int virtio_pci_mask_vq(PCIDevice *dev, unsigned int vector,
+                              VirtQueue *vq, bool masked)
 {
     EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
     int r = kvm_msi_irqfd_set(&dev->msix_cache[vector],
@@ -540,8 +540,8 @@ static int virtio_pci_mask_vq(PCIDevice *dev, unsigned vector,
     return 0;
 }
 
-static int virtio_pci_mask_notifier(PCIDevice *dev, unsigned vector,
-                                    int masked)
+static int virtio_pci_msi_vector_config(PCIDevice *dev, unsigned int vector,
+                                        MSIMessage *msg, bool masked)
 {
     VirtIOPCIProxy *proxy = container_of(dev, VirtIOPCIProxy, pci_dev);
     VirtIODevice *vdev = proxy->vdev;
@@ -608,11 +608,11 @@ static int virtio_pci_set_guest_notifiers(void *opaque, bool assign)
     VirtIODevice *vdev = proxy->vdev;
     int r, n;
 
-    /* Must unset mask notifier while guest notifier
+    /* Must unset vector config notifier while guest notifier
      * is still assigned */
     if (!assign) {
-	    r = msix_unset_mask_notifier(&proxy->pci_dev);
-            assert(r >= 0);
+        r = msix_unset_config_notifiers(&proxy->pci_dev);
+        assert(r >= 0);
     }
 
     for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
@@ -626,11 +626,11 @@ static int virtio_pci_set_guest_notifiers(void *opaque, bool assign)
         }
     }
 
-    /* Must set mask notifier after guest notifier
+    /* Must set vector config notifier after guest notifier
      * has been assigned */
     if (assign) {
-        r = msix_set_mask_notifier(&proxy->pci_dev,
-                                   virtio_pci_mask_notifier);
+        r = msix_set_config_notifiers(&proxy->pci_dev, NULL,
+                                      virtio_pci_msi_vector_config);
         if (r < 0) {
             goto assign_error;
         }
@@ -645,8 +645,8 @@ assign_error:
     }
 
     if (!assign) {
-        msix_set_mask_notifier(&proxy->pci_dev,
-                               virtio_pci_mask_notifier);
+        msix_set_config_notifiers(&proxy->pci_dev, NULL,
+                                  virtio_pci_msi_vector_config);
     }
     return r;
 }
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 24/45] qemu-kvm: msix: Don't handle mask updated while disabled
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

As long as MSI-X is disabled, it's incorrect to invoke
msix_handle_mask_update on per-vector mask changes. Doing so may mislead
the config notifier callback or spuriously trigger an MSI event.
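
The effect of the guard can be illustrated with a small standalone
model. This is only a sketch under simplifying assumptions: Dev,
write_mask and handle_mask_update below are hypothetical stand-ins, and
"delivery" is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VECTORS 4

typedef struct {
    bool enabled;               /* function-level MSI-X enable bit */
    bool masked[NR_VECTORS];    /* per-vector mask bits */
    bool pending[NR_VECTORS];   /* per-vector pending bits */
    int delivered;              /* number of interrupts sent */
} Dev;

/* Deliver a pending interrupt once its vector is unmasked. */
static void handle_mask_update(Dev *dev, unsigned int vector)
{
    if (!dev->masked[vector] && dev->pending[vector]) {
        dev->pending[vector] = false;
        dev->delivered++;
    }
}

/* Guest write to the per-vector mask bit. */
void write_mask(Dev *dev, unsigned int vector, bool masked)
{
    bool was_masked;

    assert(vector < NR_VECTORS);
    was_masked = dev->masked[vector];
    dev->masked[vector] = masked;

    /* While the function is disabled, the write only updates table state. */
    if (dev->enabled && was_masked != masked) {
        handle_mask_update(dev, vector);
    }
}
```

Without the dev->enabled check, unmasking a vector with a stale pending
bit would send an interrupt even though the guest has not enabled MSI-X.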

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 176bc76..7d45760 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -292,7 +292,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
         kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
     }
 
-    if (vector < dev->msix_entries_nr) {
+    if (msix_enabled(dev) && vector < dev->msix_entries_nr) {
         is_masked = msix_is_masked(dev, vector);
         if (was_masked != is_masked) {
             msix_handle_mask_update(dev, vector);
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 25/45] qemu-kvm: Update MSI cache on kvm_msi_irqfd_set
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:27   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:27 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Updating the MSI message registration on kvm_msi_irqfd_set will allow us
to switch to a lazy mode and remove the need to track message changes in
the device config space.
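
As a rough standalone illustration of the intended control flow (not the
qemu-kvm implementation): the route registration is refreshed from the
passed message whenever an fd is attached, so the caller no longer has
to keep the cache in sync beforehand. Msg, Cache, route_update and
irqfd_set are hypothetical names, and the actual ioctl is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t address;
    uint32_t data;
} Msg;

typedef enum { ROUTE_NONE = 0, ROUTE_STATIC } RouteType;

typedef struct {
    RouteType type;
    Msg msg;        /* message the route was last registered with */
    int irqfd;      /* -1 when no fd is attached */
    int updates;    /* how often the registration was rewritten */
} Cache;

/* Re-register the route if the message differs; returns 1 if it changed. */
static int route_update(const Msg *msg, Cache *cache)
{
    if (cache->msg.address == msg->address && cache->msg.data == msg->data) {
        return 0;
    }
    cache->msg = *msg;
    cache->updates++;
    return 1;
}

int irqfd_set(const Msg *msg, Cache *cache, int fd, bool assigned)
{
    int ret;

    if (cache->type == ROUTE_NONE) {
        /* Nothing to detach; attaching without a route is an error. */
        return assigned ? -EINVAL : 0;
    }
    if (assigned) {
        /* Bring the registration up to date before attaching the fd. */
        ret = route_update(msg, cache);
        if (ret < 0) {
            return ret;
        }
    }
    cache->irqfd = assigned ? fd : -1;
    return 0;
}
```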

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/virtio-pci.c |   10 ++++++----
 kvm.h           |    3 ++-
 qemu-kvm.c      |   17 ++++++++++++++---
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 6718945..85d6771 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -521,10 +521,10 @@ static void virtio_pci_guest_notifier_read(void *opaque)
 }
 
 static int virtio_pci_mask_vq(PCIDevice *dev, unsigned int vector,
-                              VirtQueue *vq, bool masked)
+                              MSIMessage *msg, VirtQueue *vq, bool masked)
 {
     EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
-    int r = kvm_msi_irqfd_set(&dev->msix_cache[vector],
+    int r = kvm_msi_irqfd_set(msg, &dev->msix_cache[vector],
                               event_notifier_get_fd(notifier),
                               !masked);
     if (r < 0) {
@@ -554,7 +554,8 @@ static int virtio_pci_msi_vector_config(PCIDevice *dev, unsigned int vector,
         if (virtio_queue_vector(vdev, n) != vector) {
             continue;
         }
-        r = virtio_pci_mask_vq(dev, vector, virtio_get_queue(vdev, n), masked);
+        r = virtio_pci_mask_vq(dev, vector, msg, virtio_get_queue(vdev, n),
+                               masked);
         if (r < 0) {
             goto undo;
         }
@@ -565,7 +566,8 @@ undo:
         if (virtio_queue_vector(vdev, n) != vector) {
             continue;
         }
-        virtio_pci_mask_vq(dev, vector, virtio_get_queue(vdev, n), !masked);
+        virtio_pci_mask_vq(dev, vector, msg, virtio_get_queue(vdev, n),
+                           !masked);
     }
     return r;
 }
diff --git a/kvm.h b/kvm.h
index fe2eec5..8647647 100644
--- a/kvm.h
+++ b/kvm.h
@@ -208,7 +208,8 @@ int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache);
 int kvm_msi_message_del(MSIRoutingCache *cache);
 int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache);
 
-int kvm_msi_irqfd_set(MSIRoutingCache *cache, int fd, bool assigned);
+int kvm_msi_irqfd_set(MSIMessage *msg, MSIRoutingCache *cache, int fd,
+                      bool assigned);
 
 int kvm_commit_irq_routes(void);
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index ab7703b..6bdd7b5 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -524,10 +524,21 @@ int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
 }
 
 
-int kvm_msi_irqfd_set(MSIRoutingCache *cache, int fd, bool assigned)
+int kvm_msi_irqfd_set(MSIMessage *msg, MSIRoutingCache *cache, int fd,
+                      bool assigned)
 {
-    if (cache->type == MSI_ROUTE_NONE) {
-        return assigned ? -EINVAL : 0;
+    int ret;
+
+    if (assigned) {
+        if (cache->type == MSI_ROUTE_NONE) {
+            return -EINVAL;
+        }
+        ret = kvm_msi_message_update(msg, cache);
+        if (ret < 0) {
+            return ret;
+        }
+    } else if (cache->type == MSI_ROUTE_NONE) {
+        return 0;
     }
     cache->kvm_irqfd = assigned ? fd : -1;
     return kvm_set_irqfd(cache->kvm_gsi, fd, assigned);
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 26/45] qemu-kvm: Use g_realloc for irq_routes extension
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

g_realloc aborts on allocation failure, so the explicit out-of-memory
check can be dropped.
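
The simplification relies on an allocator that terminates the program
instead of returning NULL. A plain-C sketch of the same idea, using a
hypothetical xrealloc wrapper in place of g_realloc so that it builds
without GLib (Table and table_reserve are made up for the example):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for g_realloc: never returns NULL for size > 0. */
void *xrealloc(void *ptr, size_t size)
{
    void *p = realloc(ptr, size);

    if (!p && size != 0) {
        fprintf(stderr, "out of memory (%zu bytes)\n", size);
        abort();
    }
    return p;
}

typedef struct {
    int *entries;
    size_t nr_allocated;
} Table;

/* Grow the table to at least n entries; no error path is needed. */
void table_reserve(Table *t, size_t n)
{
    if (n <= t->nr_allocated) {
        return;
    }
    t->entries = xrealloc(t->entries, n * sizeof(*t->entries));
    t->nr_allocated = n;
    assert(t->entries != NULL);
}
```

Because the wrapper cannot fail, the result can be assigned straight
back to the pointer without a temporary, which is the pattern the diff
below adopts.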

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 qemu-kvm.c |    7 +------
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 6bdd7b5..eb8f176 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -258,7 +258,6 @@ int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry,
 {
 #ifdef KVM_CAP_IRQ_ROUTING
     KVMState *s = kvm_state;
-    struct kvm_irq_routing *z;
     struct kvm_irq_routing_entry *new;
     int n, size;
 
@@ -269,12 +268,8 @@ int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry,
         }
         size = sizeof(struct kvm_irq_routing);
         size += n * sizeof(*new);
-        z = realloc(s->irq_routes, size);
-        if (!z) {
-            return -ENOMEM;
-        }
+        s->irq_routes = g_realloc(s->irq_routes, size);
         s->nr_allocated_irq_routes = n;
-        s->irq_routes = z;
 
         s->msi_cache = g_realloc(s->msi_cache, sizeof(*s->msi_cache) * n);
     }
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 27/45] qemu-kvm: Lazily update MSI caches
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Instead of registering every possible MSI message that is prepared in
some device's config space, this commit only registers those messages
that are actually sent.

Every message that runs through the delivery hook is first checked
against its cached data. If there is a mismatch, the registration is
created or updated; if it matches, delivery is performed directly.

To avoid exhausting limited KVM IRQ routes, devices are expected to
flush their MSI caches whenever the content is no longer used or valid.
If we run out of routes nevertheless, we flush all caches that were
created dynamically, i.e. via the MSI delivery hook. However, we keep
all cached routes that are static because they are associated with
external sources (irqfds).
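
The scheme described above can be modelled in a few lines. The following
is a simplified sketch, not the code in qemu-kvm.c: Router, Cache,
deliver, route_add_static and cache_invalidate are hypothetical names,
the route table is a fixed array of two slots so that the flush is easy
to trigger, and actual interrupt injection is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t address;
    uint32_t data;
} Msg;

typedef enum { ROUTE_NONE = 0, ROUTE_STATIC, ROUTE_DYNAMIC } RouteType;

typedef struct {
    RouteType type;
    Msg msg;                    /* message the route is registered for */
} Cache;

#define MAX_ROUTES 2            /* deliberately tiny to show the flush */

typedef struct {
    Cache *slots[MAX_ROUTES];   /* caches that currently own a route */
    int registrations;          /* routes created or updated so far */
    int deliveries;
} Router;

static int find_free(const Router *r)
{
    for (int i = 0; i < MAX_ROUTES; i++) {
        if (!r->slots[i]) {
            return i;
        }
    }
    return -1;
}

/* Release the route owned by one cache (device flushes its own cache). */
void cache_invalidate(Router *r, Cache *c)
{
    for (int i = 0; i < MAX_ROUTES; i++) {
        if (r->slots[i] == c) {
            r->slots[i] = NULL;
        }
    }
    c->type = ROUTE_NONE;
}

/* Out of routes: drop everything created via the delivery hook. */
static void flush_dynamic(Router *r)
{
    for (int i = 0; i < MAX_ROUTES; i++) {
        Cache *c = r->slots[i];

        if (c && c->type == ROUTE_DYNAMIC) {
            c->type = ROUTE_NONE;
            r->slots[i] = NULL;
        }
    }
}

/* Routes for external sources (e.g. irqfds) are pinned as static. */
int route_add_static(Router *r, Cache *c, const Msg *msg)
{
    int slot = find_free(r);

    if (slot < 0) {
        return -1;
    }
    r->slots[slot] = c;
    c->type = ROUTE_STATIC;
    c->msg = *msg;
    r->registrations++;
    return 0;
}

/* Delivery hook: register lazily, then deliver. */
int deliver(Router *r, Cache *c, const Msg *msg)
{
    if (c->type == ROUTE_NONE) {
        int slot = find_free(r);

        if (slot < 0) {
            flush_dynamic(r);
            slot = find_free(r);
            if (slot < 0) {
                return -1;      /* only static routes are left */
            }
        }
        r->slots[slot] = c;
        c->type = ROUTE_DYNAMIC;
        c->msg = *msg;
        r->registrations++;
    } else if (c->msg.address != msg->address || c->msg.data != msg->data) {
        c->msg = *msg;          /* mismatch: update the registration */
        r->registrations++;
    }
    r->deliveries++;            /* match or fresh route: deliver directly */
    return 0;
}
```

In this model a flushed cache simply falls back to ROUTE_NONE and is
re-registered on its next delivery, so losing a dynamic route costs one
extra registration rather than an interrupt.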

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/apic.c  |    4 +--
 hw/msi.c   |   93 ++++++------------------------------------------------------
 hw/msi.h   |    2 +-
 hw/msix.c  |   91 ++++------------------------------------------------------
 hw/pci.c   |    1 -
 hw/pci.h   |    3 --
 kvm-stub.c |   13 +--------
 kvm.h      |    6 ++--
 qemu-kvm.c |   69 ++++++++++++++++++++++++++++++++++---------
 9 files changed, 75 insertions(+), 207 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index cb6662c..2cafc49 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -807,9 +807,7 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
 void apic_deliver_msi(MSIMessage *msg, MSIRoutingCache *cache)
 {
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        if (kvm_set_irq(cache->kvm_gsi, 1, NULL) < 0) {
-            abort();
-        }
+        kvm_msi_deliver(msg, cache);
     } else {
         uint8_t dest =
             (msg->address & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
diff --git a/hw/msi.c b/hw/msi.c
index 1328903..23d79dd 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -140,71 +140,18 @@ static void msi_message_from_vector(PCIDevice *dev, uint16_t msi_flags,
     }
 }
 
-static void kvm_msi_update(PCIDevice *dev)
-{
-    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
-    unsigned int max_vectors = 1 <<
-        ((flags & PCI_MSI_FLAGS_QMASK) >> (ffs(PCI_MSI_FLAGS_QMASK) - 1));
-    unsigned int nr_vectors = msi_nr_vectors(flags);
-    MSIRoutingCache *cache;
-    bool changed = false;
-    unsigned int vector;
-    MSIMessage msg;
-    int r;
-
-    for (vector = 0; vector < max_vectors; vector++) {
-        cache = &dev->msi_cache[vector];
-
-        if (vector >= nr_vectors) {
-            if (vector < dev->msi_entries_nr) {
-                kvm_msi_message_del(cache);
-                changed = true;
-            }
-        } else if (vector >= dev->msi_entries_nr) {
-            msi_message_from_vector(dev, flags, vector, &msg);
-            r = kvm_msi_message_add(&msg, cache);
-            if (r) {
-                fprintf(stderr, "%s: kvm_msi_add failed: %s\n", __func__,
-                        strerror(-r));
-                exit(1);
-            }
-            changed = true;
-        } else {
-            msi_message_from_vector(dev, flags, vector, &msg);
-            r = kvm_msi_message_update(&msg, cache);
-            if (r < 0) {
-                fprintf(stderr, "%s: kvm_update_msi failed: %s\n",
-                        __func__, strerror(-r));
-                exit(1);
-            }
-            if (r > 0) {
-                changed = true;
-            }
-        }
-    }
-    dev->msi_entries_nr = nr_vectors;
-    if (changed) {
-        r = kvm_commit_irq_routes();
-        if (r) {
-            fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
-                    strerror(-r));
-            exit(1);
-        }
-    }
-}
-
-/* KVM specific MSI helpers */
 static void kvm_msi_free(PCIDevice *dev)
 {
-    unsigned int vector;
+    unsigned int vector, nr_vectors;
 
-    for (vector = 0; vector < dev->msi_entries_nr; ++vector) {
-        kvm_msi_message_del(&dev->msi_cache[vector]);
+    if (!kvm_enabled() || !kvm_irqchip_in_kernel()) {
+        return;
     }
-    if (dev->msi_entries_nr > 0) {
-        kvm_commit_irq_routes();
+    nr_vectors =
+        msi_nr_vectors(pci_get_word(dev->config + msi_flags_off(dev)));
+    for (vector = 0; vector < nr_vectors; ++vector) {
+        kvm_msi_cache_invalidate(&dev->msi_cache[vector]);
     }
-    dev->msi_entries_nr = 0;
 }
 
 int msi_init(struct PCIDevice *dev, uint8_t offset,
@@ -283,10 +230,7 @@ void msi_uninit(struct PCIDevice *dev)
     flags = pci_get_word(dev->config + msi_flags_off(dev));
     cap_size = msi_cap_sizeof(flags);
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msi_free(dev);
-    }
-
+    kvm_msi_free(dev);
     g_free(dev->msi_cache);
 
     pci_del_capability(dev, PCI_CAP_ID_MSI, cap_size);
@@ -303,9 +247,6 @@ void msi_reset(PCIDevice *dev)
     if (!msi_present(dev)) {
         return;
     }
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msi_free(dev);
-    }
 
     flags = pci_get_word(dev->config + msi_flags_off(dev));
     flags &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
@@ -402,6 +343,7 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
 #endif
 
     if (!(flags & PCI_MSI_FLAGS_ENABLE)) {
+        kvm_msi_free(dev);
         return;
     }
 
@@ -433,10 +375,6 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
         pci_set_word(dev->config + msi_flags_off(dev), flags);
     }
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msi_update(dev);
-    }
-
     if (!msi_per_vector_mask) {
         /* if per vector masking isn't supported,
            there is no pending interrupt. */
@@ -467,16 +405,3 @@ unsigned int msi_nr_vectors_allocated(const PCIDevice *dev)
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     return msi_nr_vectors(flags);
 }
-
-void msi_post_load(PCIDevice *dev)
-{
-    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
-
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msi_free(dev);
-
-        if (flags & PCI_MSI_FLAGS_ENABLE) {
-            kvm_msi_update(dev);
-        }
-    }
-}
diff --git a/hw/msi.h b/hw/msi.h
index 20ae215..74f6d52 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -32,6 +32,7 @@ struct MSIMessage {
 typedef enum {
     MSI_ROUTE_NONE = 0,
     MSI_ROUTE_STATIC,
+    MSI_ROUTE_DYNAMIC,
 } MSIRouteType;
 
 struct MSIRoutingCache {
@@ -51,7 +52,6 @@ void msi_reset(PCIDevice *dev);
 void msi_notify(PCIDevice *dev, unsigned int vector);
 void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len);
 unsigned int msi_nr_vectors_allocated(const PCIDevice *dev);
-void msi_post_load(PCIDevice *dev);
 
 static inline bool msi_present(const PCIDevice *dev)
 {
diff --git a/hw/msix.c b/hw/msix.c
index 7d45760..ce3375a 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -42,79 +42,16 @@ static void msix_message_from_vector(PCIDevice *dev, unsigned vector,
     msg->data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
 }
 
-/* KVM specific MSIX helpers */
 static void kvm_msix_free(PCIDevice *dev)
 {
-    int vector, changed = 0;
-
-    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
-        if (dev->msix_entry_used[vector]) {
-            kvm_msi_message_del(&dev->msix_cache[vector]);
-            changed = 1;
-        }
-    }
-    if (changed) {
-        kvm_commit_irq_routes();
-    }
-}
-
-static void kvm_msix_update(PCIDevice *dev, int vector,
-                            int was_masked, int is_masked)
-{
-    int mask_cleared = was_masked && !is_masked;
-    MSIMessage msg;
-    int r;
+    int vector;
 
-    /* It is only legal to change an entry when it is masked. Therefore, it is
-     * enough to update the routing in kernel when mask is being cleared. */
-    if (!mask_cleared) {
+    if (!kvm_enabled() || !kvm_irqchip_in_kernel()) {
         return;
     }
-    if (!dev->msix_entry_used[vector]) {
-        return;
-    }
-
-    msix_message_from_vector(dev, vector, &msg);
-    r = kvm_msi_message_update(&msg, &dev->msix_cache[vector]);
-    if (r < 0) {
-        fprintf(stderr, "%s: kvm_update_msix failed: %s\n", __func__,
-                strerror(-r));
-        exit(1);
-    }
-    if (r > 0) {
-        r = kvm_commit_irq_routes();
-        if (r) {
-            fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
-		    strerror(-r));
-            exit(1);
-        }
-    }
-}
-
-static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
-{
-    MSIMessage msg;
-    int r;
-
-    msix_message_from_vector(dev, vector, &msg);
-    r = kvm_msi_message_add(&msg, &dev->msix_cache[vector]);
-    if (r < 0) {
-        fprintf(stderr, "%s: kvm_add_msix failed: %s\n", __func__, strerror(-r));
-        return r;
-    }
-
-    r = kvm_commit_irq_routes();
-    if (r < 0) {
-        fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__, strerror(-r));
-        return r;
+    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
+        kvm_msi_cache_invalidate(&dev->msix_cache[vector]);
     }
-    return 0;
-}
-
-static void kvm_msix_vector_del(PCIDevice *dev, unsigned vector)
-{
-    kvm_msi_message_del(&dev->msix_cache[vector]);
-    kvm_commit_irq_routes();
 }
 
 /* Add MSI-X capability to the config space for the device. */
@@ -264,6 +201,7 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
     }
 
     if (!is_enabled) {
+        kvm_msix_free(dev);
         return;
     }
 
@@ -288,9 +226,6 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     bool is_masked;
 
     pci_set_long(dev->msix_table_page + offset, val);
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
-    }
 
     if (msix_enabled(dev) && vector < dev->msix_entries_nr) {
         is_masked = msix_is_masked(dev, vector);
@@ -391,10 +326,6 @@ static void msix_free_irq_entries(PCIDevice *dev)
 {
     int vector;
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msix_free(dev);
-    }
-
     for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
         dev->msix_entry_used[vector] = 0;
         msix_clr_pending(dev, vector);
@@ -418,6 +349,7 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     g_free(dev->msix_entry_used);
     dev->msix_entry_used = NULL;
 
+    kvm_msix_free(dev);
     g_free(dev->msix_cache);
 
     dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
@@ -510,16 +442,8 @@ void msix_reset(PCIDevice *dev)
 /* Mark vector as used. */
 int msix_vector_use(PCIDevice *dev, unsigned vector)
 {
-    int ret;
     if (vector >= dev->msix_entries_nr)
         return -EINVAL;
-    if (kvm_enabled() && kvm_irqchip_in_kernel() &&
-        !dev->msix_entry_used[vector]) {
-        ret = kvm_msix_vector_add(dev, vector);
-        if (ret) {
-            return ret;
-        }
-    }
     ++dev->msix_entry_used[vector];
     return 0;
 }
@@ -533,9 +457,6 @@ void msix_vector_unuse(PCIDevice *dev, unsigned vector)
     if (--dev->msix_entry_used[vector]) {
         return;
     }
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msix_vector_del(dev, vector);
-    }
     msix_clr_pending(dev, vector);
 }
 
diff --git a/hw/pci.c b/hw/pci.c
index 39b2173..4f0d7e1 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -362,7 +362,6 @@ static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
     memcpy(s->config, config, size);
 
     pci_update_mappings(s);
-    msi_post_load(s);
 
     g_free(config);
     return 0;
diff --git a/hw/pci.h b/hw/pci.h
index 4249c6a..d7a652e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -201,9 +201,6 @@ struct PCIDevice {
     MSIRoutingCache *msi_cache;
     MSIRoutingCache *msix_cache;
 
-    /* MSI entries */
-    int msi_entries_nr;
-
     /* How much space does an MSIX table need. */
     /* The spec requires giving the table structure
      * a 4K aligned region all by itself. Align it to
diff --git a/kvm-stub.c b/kvm-stub.c
index ca4382a..acd1446 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -140,19 +140,8 @@ int kvm_get_irq_route_gsi(void)
     return -ENOSYS;
 }
 
-int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
+void kvm_msi_cache_invalidate(MSIRoutingCache *cache)
 {
-    return -ENOSYS;
-}
-
-int kvm_msi_message_del(MSIRoutingCache *cache)
-{
-    return -ENOSYS;
-}
-
-int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
-{
-    return -ENOSYS;
 }
 
 int kvm_commit_irq_routes(void)
diff --git a/kvm.h b/kvm.h
index 8647647..61bcfec 100644
--- a/kvm.h
+++ b/kvm.h
@@ -204,9 +204,9 @@ int kvm_has_gsi_routing(void);
 int kvm_allows_irq0_override(void);
 int kvm_get_irq_route_gsi(void);
 
-int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache);
-int kvm_msi_message_del(MSIRoutingCache *cache);
-int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache);
+void kvm_msi_cache_invalidate(MSIRoutingCache *cache);
+
+void kvm_msi_deliver(MSIMessage *msg, MSIRoutingCache *cache);
 
 int kvm_msi_irqfd_set(MSIMessage *msg, MSIRoutingCache *cache, int fd,
                       bool assigned);
diff --git a/qemu-kvm.c b/qemu-kvm.c
index eb8f176..199564c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -432,12 +432,16 @@ int kvm_commit_irq_routes(void)
 #endif
 }
 
+static void kvm_msi_cache_flush(KVMState *s);
+
 int kvm_get_irq_route_gsi(void)
 {
     KVMState *s = kvm_state;
     int i, bit;
     uint32_t *buf = s->used_gsi_bitmap;
+    bool retry = true;
 
+again:
     /* Return the lowest unused GSI in the bitmap */
     for (i = 0; i < s->max_gsi / 32; i++) {
         bit = ffs(~buf[i]);
@@ -447,7 +451,11 @@ int kvm_get_irq_route_gsi(void)
 
         return bit - 1 + i * 32;
     }
-
+    if (retry) {
+        retry = false;
+        kvm_msi_cache_flush(s);
+        goto again;
+    }
     return -ENOSPC;
 }
 
@@ -463,7 +471,8 @@ static void kvm_msi_routing_entry(struct kvm_irq_routing_entry *e,
     e->u.msi.data = cache->msg.data;
 }
 
-int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
+static int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache,
+                               MSIRouteType type)
 {
     struct kvm_irq_routing_entry e;
     int ret;
@@ -473,30 +482,56 @@ int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
         return ret;
     }
     cache->msg = *msg;
-    cache->type = MSI_ROUTE_STATIC;
+    cache->type = type;
     cache->kvm_gsi = ret;
     cache->kvm_irqfd = -1;
 
     kvm_msi_routing_entry(&e, cache);
-    return kvm_add_routing_entry(&e, cache);
+    ret = kvm_add_routing_entry(&e, cache);
+    if (ret < 0) {
+        return ret;
+    }
+    return kvm_commit_irq_routes();
 }
 
-int kvm_msi_message_del(MSIRoutingCache *cache)
+void kvm_msi_cache_invalidate(MSIRoutingCache *cache)
 {
     struct kvm_irq_routing_entry e;
 
-    kvm_msi_routing_entry(&e, cache);
-    return kvm_del_routing_entry(&e);
+    if (cache->type != MSI_ROUTE_NONE) {
+        kvm_msi_routing_entry(&e, cache);
+        kvm_del_routing_entry(&e);
+    }
+}
+
+static void kvm_msi_cache_flush(KVMState *s)
+{
+    int nr_irq_routes = kvm_state->irq_routes->nr;
+    MSIRoutingCache *cache;
+    int i;
+
+    for (i = 0; i < nr_irq_routes; i++) {
+        cache = s->msi_cache[i];
+        if (cache && cache->type == MSI_ROUTE_DYNAMIC) {
+            kvm_msi_cache_invalidate(cache);
+        }
+    }
 }
 
-int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
+static int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache,
+                                  MSIRouteType type)
 {
     struct kvm_irq_routing_entry old, new;
     MSIRoutingCache new_cache;
     int ret;
 
-    assert(cache->type != MSI_ROUTE_NONE);
-
+    if (cache->type == MSI_ROUTE_NONE) {
+        ret = kvm_msi_message_add(msg, cache, type);
+        if (ret < 0) {
+            return ret;
+        }
+        return kvm_commit_irq_routes();
+    }
     if (msg->address == cache->msg.address && msg->data == cache->msg.data) {
         return 0;
     }
@@ -515,9 +550,16 @@ int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
     }
     *cache = new_cache;
 
-    return 1;
+    return kvm_commit_irq_routes();
 }
 
+void kvm_msi_deliver(MSIMessage *msg, MSIRoutingCache *cache)
+{
+    if (kvm_msi_message_update(msg, cache, MSI_ROUTE_DYNAMIC) < 0 ||
+        kvm_set_irq(cache->kvm_gsi, 1, NULL) < 0) {
+        abort();
+    }
+}
 
 int kvm_msi_irqfd_set(MSIMessage *msg, MSIRoutingCache *cache, int fd,
                       bool assigned)
@@ -525,10 +567,7 @@ int kvm_msi_irqfd_set(MSIMessage *msg, MSIRoutingCache *cache, int fd,
     int ret;
 
     if (assigned) {
-        if (cache->type == MSI_ROUTE_NONE) {
-            return -EINVAL;
-        }
-        ret = kvm_msi_message_update(msg, cache);
+        ret = kvm_msi_message_update(msg, cache, MSI_ROUTE_STATIC);
         if (ret < 0) {
             return ret;
         }
-- 
1.7.3.4



* [Qemu-devel] [RFC][PATCH 27/45] qemu-kvm: Lazily update MSI caches
@ 2011-10-17  9:28   ` Jan Kiszka
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

Instead of registering every possible MSI message that is prepared in
some device's config space, this commit only registers those messages
that are actually sent.

Every message that runs through the delivery hook is first checked
against its cached data. If there is a mismatch, the registration is
created or updated; if it matches, delivery is performed directly.

To avoid exhausting limited KVM IRQ routes, devices are expected to
flush their MSI caches whenever the content is no longer used or valid.
If we nevertheless run out of routes, we flush all caches that were
created dynamically, i.e. via the MSI delivery hook. However, we keep
static cached routes intact because they are associated with external
sources (irqfds).

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
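Not part of the patch: a minimal, self-contained sketch of the lazy scheme
described in the commit message, assuming a toy route table in place of the
KVM IRQ routing API. All names here (route_cache, lazy_route, alloc_gsi,
MAX_GSI) are hypothetical stand-ins, not qemu-kvm functions. It shows a cache
hit, on-demand registration, an in-place update on mismatch, and the flush of
dynamic entries (static ones are kept) when the table is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MSIRouteType, MSIMessage and MSIRoutingCache. */
enum route_type { ROUTE_NONE = 0, ROUTE_STATIC, ROUTE_DYNAMIC };

struct msi_msg {
    uint64_t address;
    uint32_t data;
};

struct route_cache {
    enum route_type type;
    struct msi_msg msg;
    int gsi;
};

#define MAX_GSI 4 /* deliberately tiny so that exhaustion is easy to hit */

/* Toy route table: which cache currently owns each GSI (NULL = free). */
static struct route_cache *gsi_owner[MAX_GSI];

static void cache_invalidate(struct route_cache *c)
{
    if (c->type != ROUTE_NONE) {
        gsi_owner[c->gsi] = NULL;
        c->type = ROUTE_NONE;
    }
}

static int find_free_gsi(void)
{
    for (int i = 0; i < MAX_GSI; i++) {
        if (gsi_owner[i] == NULL) {
            return i;
        }
    }
    return -1;
}

/* On exhaustion, drop every dynamically created route once and retry. */
static int alloc_gsi(void)
{
    int gsi = find_free_gsi();

    if (gsi < 0) {
        for (int i = 0; i < MAX_GSI; i++) {
            if (gsi_owner[i] != NULL && gsi_owner[i]->type == ROUTE_DYNAMIC) {
                cache_invalidate(gsi_owner[i]);
            }
        }
        gsi = find_free_gsi();
    }
    return gsi;
}

/* Return the GSI to inject on, or -1 if no route can be set up. */
static int lazy_route(const struct msi_msg *msg, struct route_cache *c,
                      enum route_type type)
{
    if (c->type == ROUTE_NONE) {
        /* No registration yet: create one on first use. */
        int gsi = alloc_gsi();

        if (gsi < 0) {
            return -1;
        }
        c->type = type;
        c->gsi = gsi;
        c->msg = *msg;
        gsi_owner[gsi] = c;
    } else if (c->msg.address != msg->address || c->msg.data != msg->data) {
        /* Stale entry: rewrite it in place, keeping GSI and type. */
        c->msg = *msg;
    }
    /* Otherwise the cached route matches and is used directly. */
    return c->gsi;
}
```

A caller would invoke lazy_route() with ROUTE_DYNAMIC from its delivery hook
and then inject on the returned GSI. A flushed cache simply re-registers on
its next delivery, which is what makes the flush safe for dynamic entries.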
 hw/apic.c  |    4 +--
 hw/msi.c   |   93 ++++++------------------------------------------------------
 hw/msi.h   |    2 +-
 hw/msix.c  |   91 ++++------------------------------------------------------
 hw/pci.c   |    1 -
 hw/pci.h   |    3 --
 kvm-stub.c |   13 +--------
 kvm.h      |    6 ++--
 qemu-kvm.c |   69 ++++++++++++++++++++++++++++++++++---------
 9 files changed, 75 insertions(+), 207 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index cb6662c..2cafc49 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -807,9 +807,7 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
 void apic_deliver_msi(MSIMessage *msg, MSIRoutingCache *cache)
 {
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        if (kvm_set_irq(cache->kvm_gsi, 1, NULL) < 0) {
-            abort();
-        }
+        kvm_msi_deliver(msg, cache);
     } else {
         uint8_t dest =
             (msg->address & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
diff --git a/hw/msi.c b/hw/msi.c
index 1328903..23d79dd 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -140,71 +140,18 @@ static void msi_message_from_vector(PCIDevice *dev, uint16_t msi_flags,
     }
 }
 
-static void kvm_msi_update(PCIDevice *dev)
-{
-    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
-    unsigned int max_vectors = 1 <<
-        ((flags & PCI_MSI_FLAGS_QMASK) >> (ffs(PCI_MSI_FLAGS_QMASK) - 1));
-    unsigned int nr_vectors = msi_nr_vectors(flags);
-    MSIRoutingCache *cache;
-    bool changed = false;
-    unsigned int vector;
-    MSIMessage msg;
-    int r;
-
-    for (vector = 0; vector < max_vectors; vector++) {
-        cache = &dev->msi_cache[vector];
-
-        if (vector >= nr_vectors) {
-            if (vector < dev->msi_entries_nr) {
-                kvm_msi_message_del(cache);
-                changed = true;
-            }
-        } else if (vector >= dev->msi_entries_nr) {
-            msi_message_from_vector(dev, flags, vector, &msg);
-            r = kvm_msi_message_add(&msg, cache);
-            if (r) {
-                fprintf(stderr, "%s: kvm_msi_add failed: %s\n", __func__,
-                        strerror(-r));
-                exit(1);
-            }
-            changed = true;
-        } else {
-            msi_message_from_vector(dev, flags, vector, &msg);
-            r = kvm_msi_message_update(&msg, cache);
-            if (r < 0) {
-                fprintf(stderr, "%s: kvm_update_msi failed: %s\n",
-                        __func__, strerror(-r));
-                exit(1);
-            }
-            if (r > 0) {
-                changed = true;
-            }
-        }
-    }
-    dev->msi_entries_nr = nr_vectors;
-    if (changed) {
-        r = kvm_commit_irq_routes();
-        if (r) {
-            fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
-                    strerror(-r));
-            exit(1);
-        }
-    }
-}
-
-/* KVM specific MSI helpers */
 static void kvm_msi_free(PCIDevice *dev)
 {
-    unsigned int vector;
+    unsigned int vector, nr_vectors;
 
-    for (vector = 0; vector < dev->msi_entries_nr; ++vector) {
-        kvm_msi_message_del(&dev->msi_cache[vector]);
+    if (!kvm_enabled() || !kvm_irqchip_in_kernel()) {
+        return;
     }
-    if (dev->msi_entries_nr > 0) {
-        kvm_commit_irq_routes();
+    nr_vectors =
+        msi_nr_vectors(pci_get_word(dev->config + msi_flags_off(dev)));
+    for (vector = 0; vector < nr_vectors; ++vector) {
+        kvm_msi_cache_invalidate(&dev->msi_cache[vector]);
     }
-    dev->msi_entries_nr = 0;
 }
 
 int msi_init(struct PCIDevice *dev, uint8_t offset,
@@ -283,10 +230,7 @@ void msi_uninit(struct PCIDevice *dev)
     flags = pci_get_word(dev->config + msi_flags_off(dev));
     cap_size = msi_cap_sizeof(flags);
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msi_free(dev);
-    }
-
+    kvm_msi_free(dev);
     g_free(dev->msi_cache);
 
     pci_del_capability(dev, PCI_CAP_ID_MSI, cap_size);
@@ -303,9 +247,6 @@ void msi_reset(PCIDevice *dev)
     if (!msi_present(dev)) {
         return;
     }
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msi_free(dev);
-    }
 
     flags = pci_get_word(dev->config + msi_flags_off(dev));
     flags &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
@@ -402,6 +343,7 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
 #endif
 
     if (!(flags & PCI_MSI_FLAGS_ENABLE)) {
+        kvm_msi_free(dev);
         return;
     }
 
@@ -433,10 +375,6 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
         pci_set_word(dev->config + msi_flags_off(dev), flags);
     }
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msi_update(dev);
-    }
-
     if (!msi_per_vector_mask) {
         /* if per vector masking isn't supported,
            there is no pending interrupt. */
@@ -467,16 +405,3 @@ unsigned int msi_nr_vectors_allocated(const PCIDevice *dev)
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     return msi_nr_vectors(flags);
 }
-
-void msi_post_load(PCIDevice *dev)
-{
-    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
-
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msi_free(dev);
-
-        if (flags & PCI_MSI_FLAGS_ENABLE) {
-            kvm_msi_update(dev);
-        }
-    }
-}
diff --git a/hw/msi.h b/hw/msi.h
index 20ae215..74f6d52 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -32,6 +32,7 @@ struct MSIMessage {
 typedef enum {
     MSI_ROUTE_NONE = 0,
     MSI_ROUTE_STATIC,
+    MSI_ROUTE_DYNAMIC,
 } MSIRouteType;
 
 struct MSIRoutingCache {
@@ -51,7 +52,6 @@ void msi_reset(PCIDevice *dev);
 void msi_notify(PCIDevice *dev, unsigned int vector);
 void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len);
 unsigned int msi_nr_vectors_allocated(const PCIDevice *dev);
-void msi_post_load(PCIDevice *dev);
 
 static inline bool msi_present(const PCIDevice *dev)
 {
diff --git a/hw/msix.c b/hw/msix.c
index 7d45760..ce3375a 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -42,79 +42,16 @@ static void msix_message_from_vector(PCIDevice *dev, unsigned vector,
     msg->data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
 }
 
-/* KVM specific MSIX helpers */
 static void kvm_msix_free(PCIDevice *dev)
 {
-    int vector, changed = 0;
-
-    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
-        if (dev->msix_entry_used[vector]) {
-            kvm_msi_message_del(&dev->msix_cache[vector]);
-            changed = 1;
-        }
-    }
-    if (changed) {
-        kvm_commit_irq_routes();
-    }
-}
-
-static void kvm_msix_update(PCIDevice *dev, int vector,
-                            int was_masked, int is_masked)
-{
-    int mask_cleared = was_masked && !is_masked;
-    MSIMessage msg;
-    int r;
+    int vector;
 
-    /* It is only legal to change an entry when it is masked. Therefore, it is
-     * enough to update the routing in kernel when mask is being cleared. */
-    if (!mask_cleared) {
+    if (!kvm_enabled() || !kvm_irqchip_in_kernel()) {
         return;
     }
-    if (!dev->msix_entry_used[vector]) {
-        return;
-    }
-
-    msix_message_from_vector(dev, vector, &msg);
-    r = kvm_msi_message_update(&msg, &dev->msix_cache[vector]);
-    if (r < 0) {
-        fprintf(stderr, "%s: kvm_update_msix failed: %s\n", __func__,
-                strerror(-r));
-        exit(1);
-    }
-    if (r > 0) {
-        r = kvm_commit_irq_routes();
-        if (r) {
-            fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
-		    strerror(-r));
-            exit(1);
-        }
-    }
-}
-
-static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
-{
-    MSIMessage msg;
-    int r;
-
-    msix_message_from_vector(dev, vector, &msg);
-    r = kvm_msi_message_add(&msg, &dev->msix_cache[vector]);
-    if (r < 0) {
-        fprintf(stderr, "%s: kvm_add_msix failed: %s\n", __func__, strerror(-r));
-        return r;
-    }
-
-    r = kvm_commit_irq_routes();
-    if (r < 0) {
-        fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__, strerror(-r));
-        return r;
+    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
+        kvm_msi_cache_invalidate(&dev->msix_cache[vector]);
     }
-    return 0;
-}
-
-static void kvm_msix_vector_del(PCIDevice *dev, unsigned vector)
-{
-    kvm_msi_message_del(&dev->msix_cache[vector]);
-    kvm_commit_irq_routes();
 }
 
 /* Add MSI-X capability to the config space for the device. */
@@ -264,6 +201,7 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
     }
 
     if (!is_enabled) {
+        kvm_msix_free(dev);
         return;
     }
 
@@ -288,9 +226,6 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     bool is_masked;
 
     pci_set_long(dev->msix_table_page + offset, val);
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
-    }
 
     if (msix_enabled(dev) && vector < dev->msix_entries_nr) {
         is_masked = msix_is_masked(dev, vector);
@@ -391,10 +326,6 @@ static void msix_free_irq_entries(PCIDevice *dev)
 {
     int vector;
 
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msix_free(dev);
-    }
-
     for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
         dev->msix_entry_used[vector] = 0;
         msix_clr_pending(dev, vector);
@@ -418,6 +349,7 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     g_free(dev->msix_entry_used);
     dev->msix_entry_used = NULL;
 
+    kvm_msix_free(dev);
     g_free(dev->msix_cache);
 
     dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
@@ -510,16 +442,8 @@ void msix_reset(PCIDevice *dev)
 /* Mark vector as used. */
 int msix_vector_use(PCIDevice *dev, unsigned vector)
 {
-    int ret;
     if (vector >= dev->msix_entries_nr)
         return -EINVAL;
-    if (kvm_enabled() && kvm_irqchip_in_kernel() &&
-        !dev->msix_entry_used[vector]) {
-        ret = kvm_msix_vector_add(dev, vector);
-        if (ret) {
-            return ret;
-        }
-    }
     ++dev->msix_entry_used[vector];
     return 0;
 }
@@ -533,9 +457,6 @@ void msix_vector_unuse(PCIDevice *dev, unsigned vector)
     if (--dev->msix_entry_used[vector]) {
         return;
     }
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_msix_vector_del(dev, vector);
-    }
     msix_clr_pending(dev, vector);
 }
 
diff --git a/hw/pci.c b/hw/pci.c
index 39b2173..4f0d7e1 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -362,7 +362,6 @@ static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
     memcpy(s->config, config, size);
 
     pci_update_mappings(s);
-    msi_post_load(s);
 
     g_free(config);
     return 0;
diff --git a/hw/pci.h b/hw/pci.h
index 4249c6a..d7a652e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -201,9 +201,6 @@ struct PCIDevice {
     MSIRoutingCache *msi_cache;
     MSIRoutingCache *msix_cache;
 
-    /* MSI entries */
-    int msi_entries_nr;
-
     /* How much space does an MSIX table need. */
     /* The spec requires giving the table structure
      * a 4K aligned region all by itself. Align it to
diff --git a/kvm-stub.c b/kvm-stub.c
index ca4382a..acd1446 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -140,19 +140,8 @@ int kvm_get_irq_route_gsi(void)
     return -ENOSYS;
 }
 
-int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
+void kvm_msi_cache_invalidate(MSIRoutingCache *cache)
 {
-    return -ENOSYS;
-}
-
-int kvm_msi_message_del(MSIRoutingCache *cache)
-{
-    return -ENOSYS;
-}
-
-int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
-{
-    return -ENOSYS;
 }
 
 int kvm_commit_irq_routes(void)
diff --git a/kvm.h b/kvm.h
index 8647647..61bcfec 100644
--- a/kvm.h
+++ b/kvm.h
@@ -204,9 +204,9 @@ int kvm_has_gsi_routing(void);
 int kvm_allows_irq0_override(void);
 int kvm_get_irq_route_gsi(void);
 
-int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache);
-int kvm_msi_message_del(MSIRoutingCache *cache);
-int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache);
+void kvm_msi_cache_invalidate(MSIRoutingCache *cache);
+
+void kvm_msi_deliver(MSIMessage *msg, MSIRoutingCache *cache);
 
 int kvm_msi_irqfd_set(MSIMessage *msg, MSIRoutingCache *cache, int fd,
                       bool assigned);
diff --git a/qemu-kvm.c b/qemu-kvm.c
index eb8f176..199564c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -432,12 +432,16 @@ int kvm_commit_irq_routes(void)
 #endif
 }
 
+static void kvm_msi_cache_flush(KVMState *s);
+
 int kvm_get_irq_route_gsi(void)
 {
     KVMState *s = kvm_state;
     int i, bit;
     uint32_t *buf = s->used_gsi_bitmap;
+    bool retry = true;
 
+again:
     /* Return the lowest unused GSI in the bitmap */
     for (i = 0; i < s->max_gsi / 32; i++) {
         bit = ffs(~buf[i]);
@@ -447,7 +451,11 @@ int kvm_get_irq_route_gsi(void)
 
         return bit - 1 + i * 32;
     }
-
+    if (retry) {
+        retry = false;
+        kvm_msi_cache_flush(s);
+        goto again;
+    }
     return -ENOSPC;
 }
 
@@ -463,7 +471,8 @@ static void kvm_msi_routing_entry(struct kvm_irq_routing_entry *e,
     e->u.msi.data = cache->msg.data;
 }
 
-int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
+static int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache,
+                               MSIRouteType type)
 {
     struct kvm_irq_routing_entry e;
     int ret;
@@ -473,30 +482,56 @@ int kvm_msi_message_add(MSIMessage *msg, MSIRoutingCache *cache)
         return ret;
     }
     cache->msg = *msg;
-    cache->type = MSI_ROUTE_STATIC;
+    cache->type = type;
     cache->kvm_gsi = ret;
     cache->kvm_irqfd = -1;
 
     kvm_msi_routing_entry(&e, cache);
-    return kvm_add_routing_entry(&e, cache);
+    ret = kvm_add_routing_entry(&e, cache);
+    if (ret < 0) {
+        return ret;
+    }
+    return kvm_commit_irq_routes();
 }
 
-int kvm_msi_message_del(MSIRoutingCache *cache)
+void kvm_msi_cache_invalidate(MSIRoutingCache *cache)
 {
     struct kvm_irq_routing_entry e;
 
-    kvm_msi_routing_entry(&e, cache);
-    return kvm_del_routing_entry(&e);
+    if (cache->type != MSI_ROUTE_NONE) {
+        kvm_msi_routing_entry(&e, cache);
+        kvm_del_routing_entry(&e);
+    }
+}
+
+static void kvm_msi_cache_flush(KVMState *s)
+{
+    int nr_irq_routes = kvm_state->irq_routes->nr;
+    MSIRoutingCache *cache;
+    int i;
+
+    for (i = 0; i < nr_irq_routes; i++) {
+        cache = s->msi_cache[i];
+        if (cache && cache->type == MSI_ROUTE_DYNAMIC) {
+            kvm_msi_cache_invalidate(cache);
+        }
+    }
 }
 
-int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
+static int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache,
+                                  MSIRouteType type)
 {
     struct kvm_irq_routing_entry old, new;
     MSIRoutingCache new_cache;
     int ret;
 
-    assert(cache->type != MSI_ROUTE_NONE);
-
+    if (cache->type == MSI_ROUTE_NONE) {
+        ret = kvm_msi_message_add(msg, cache, type);
+        if (ret < 0) {
+            return ret;
+        }
+        return kvm_commit_irq_routes();
+    }
     if (msg->address == cache->msg.address && msg->data == cache->msg.data) {
         return 0;
     }
@@ -515,9 +550,16 @@ int kvm_msi_message_update(MSIMessage *msg, MSIRoutingCache *cache)
     }
     *cache = new_cache;
 
-    return 1;
+    return kvm_commit_irq_routes();
 }
 
+void kvm_msi_deliver(MSIMessage *msg, MSIRoutingCache *cache)
+{
+    if (kvm_msi_message_update(msg, cache, MSI_ROUTE_DYNAMIC) < 0 ||
+        kvm_set_irq(cache->kvm_gsi, 1, NULL) < 0) {
+        abort();
+    }
+}
 
 int kvm_msi_irqfd_set(MSIMessage *msg, MSIRoutingCache *cache, int fd,
                       bool assigned)
@@ -525,10 +567,7 @@ int kvm_msi_irqfd_set(MSIMessage *msg, MSIRoutingCache *cache, int fd,
     int ret;
 
     if (assigned) {
-        if (cache->type == MSI_ROUTE_NONE) {
-            return -EINVAL;
-        }
-        ret = kvm_msi_message_update(msg, cache);
+        ret = kvm_msi_message_update(msg, cache, MSI_ROUTE_STATIC);
         if (ret < 0) {
             return ret;
         }
-- 
1.7.3.4


* [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

This optimization was only required to keep KVM route usage low. Now
that we solve that problem via lazy updates, we can drop the
msix_entry_used field. We still need interfaces to clear pending
vectors, though (and we have to make use of them more broadly, but that
is unrelated to this patch).

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
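Not part of the patch: a small sketch of what "clearing pending vectors"
amounts to once the use count is gone, assuming one pending bit per vector
packed into bytes. The struct and the pending_* helpers are hypothetical
names for illustration and do not mirror the msix.c internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VECTORS 32 /* hypothetical upper bound for this sketch */

struct pending_bits {
    unsigned nr_vectors;                 /* vectors actually exposed */
    uint8_t bits[(MAX_VECTORS + 7) / 8]; /* one pending bit per vector */
};

/* Mark a vector as pending; out-of-range vectors are ignored. */
static void pending_set(struct pending_bits *p, unsigned vector)
{
    if (vector < p->nr_vectors) {
        p->bits[vector / 8] |= (uint8_t)(1u << (vector % 8));
    }
}

static bool pending_test(const struct pending_bits *p, unsigned vector)
{
    return vector < p->nr_vectors &&
           (p->bits[vector / 8] & (1u << (vector % 8))) != 0;
}

/* Clear a single pending vector, with the same bounds check. */
static void pending_clear(struct pending_bits *p, unsigned vector)
{
    if (vector < p->nr_vectors) {
        p->bits[vector / 8] &= (uint8_t)~(1u << (vector % 8));
    }
}

/* Clear every pending vector, e.g. on device reset. */
static void pending_clear_all(struct pending_bits *p)
{
    for (unsigned vector = 0; vector < p->nr_vectors; vector++) {
        pending_clear(p, vector);
    }
}
```

With no per-vector use count left, the only state that needs explicit
cleanup in this sketch is the pending bit itself.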
 hw/ivshmem.c    |   16 ++-----------
 hw/msix.c       |   62 +++++++++++-------------------------------------------
 hw/msix.h       |    5 +--
 hw/pci.h        |    2 -
 hw/virtio-pci.c |   20 +++++++----------
 5 files changed, 26 insertions(+), 79 deletions(-)

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 242fbea..a402c98 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -535,10 +535,8 @@ static uint64_t ivshmem_get_size(IVShmemState * s) {
     return value;
 }
 
-static void ivshmem_setup_msi(IVShmemState * s) {
-
-    int i;
-
+static void ivshmem_setup_msi(IVShmemState *s)
+{
     /* allocate the MSI-X vectors */
 
     memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
@@ -551,11 +549,6 @@ static void ivshmem_setup_msi(IVShmemState * s) {
         exit(1);
     }
 
-    /* 'activate' the vectors */
-    for (i = 0; i < s->vectors; i++) {
-        msix_vector_use(&s->dev, i);
-    }
-
     /* allocate Qemu char devices for receiving interrupts */
     s->eventfd_table = g_malloc0(s->vectors * sizeof(EventfdEntry));
 }
@@ -581,7 +574,7 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
     IVSHMEM_DPRINTF("ivshmem_load\n");
 
     IVShmemState *proxy = opaque;
-    int ret, i;
+    int ret;
 
     if (version_id > 0) {
         return -EINVAL;
@@ -599,9 +592,6 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
 
     if (ivshmem_has_feature(proxy, IVSHMEM_MSI)) {
         msix_load(&proxy->dev, f);
-        for (i = 0; i < proxy->vectors; i++) {
-            msix_vector_use(&proxy->dev, i);
-        }
     } else {
         proxy->intrstatus = qemu_get_be32(f);
         proxy->intrmask = qemu_get_be32(f);
diff --git a/hw/msix.c b/hw/msix.c
index ce3375a..f1b97b5 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -292,9 +292,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
     if (nentries > MSIX_MAX_ENTRIES)
         return -EINVAL;
 
-    dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
-                                        sizeof *dev->msix_entry_used);
-
     dev->msix_table_page = g_malloc0(MSIX_PAGE_SIZE);
     msix_mask_all(dev, nentries);
 
@@ -317,21 +314,9 @@ err_config:
     memory_region_destroy(&dev->msix_mmio);
     g_free(dev->msix_table_page);
     dev->msix_table_page = NULL;
-    g_free(dev->msix_entry_used);
-    dev->msix_entry_used = NULL;
     return ret;
 }
 
-static void msix_free_irq_entries(PCIDevice *dev)
-{
-    int vector;
-
-    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
-        dev->msix_entry_used[vector] = 0;
-        msix_clr_pending(dev, vector);
-    }
-}
-
 /* Clean up resources for the device. */
 int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
 {
@@ -340,14 +325,11 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     }
     pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
     dev->msix_cap = 0;
-    msix_free_irq_entries(dev);
     dev->msix_entries_nr = 0;
     memory_region_del_subregion(bar, &dev->msix_mmio);
     memory_region_destroy(&dev->msix_mmio);
     g_free(dev->msix_table_page);
     dev->msix_table_page = NULL;
-    g_free(dev->msix_entry_used);
-    dev->msix_entry_used = NULL;
 
     kvm_msix_free(dev);
     g_free(dev->msix_cache);
@@ -376,7 +358,6 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
         return;
     }
 
-    msix_free_irq_entries(dev);
     qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
     qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
 }
@@ -407,7 +388,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 {
     MSIMessage msg;
 
-    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+    if (vector >= dev->msix_entries_nr)
         return;
     if (msix_is_masked(dev, vector)) {
         msix_set_pending(dev, vector);
@@ -424,48 +405,31 @@ void msix_reset(PCIDevice *dev)
     if (!msix_present(dev)) {
         return;
     }
-    msix_free_irq_entries(dev);
+    msix_clear_all_vectors(dev);
     dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
 	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
     memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE);
     msix_mask_all(dev, dev->msix_entries_nr);
 }
 
-/* PCI spec suggests that devices make it possible for software to configure
- * less vectors than supported by the device, but does not specify a standard
- * mechanism for devices to do so.
- *
- * We support this by asking devices to declare vectors software is going to
- * actually use, and checking this on the notification path. Devices that
- * don't want to follow the spec suggestion can declare all vectors as used. */
-
-/* Mark vector as used. */
-int msix_vector_use(PCIDevice *dev, unsigned vector)
+/* Clear pending vector. */
+void msix_clear_vector(PCIDevice *dev, unsigned vector)
 {
-    if (vector >= dev->msix_entries_nr)
-        return -EINVAL;
-    ++dev->msix_entry_used[vector];
-    return 0;
-}
-
-/* Mark vector as unused. */
-void msix_vector_unuse(PCIDevice *dev, unsigned vector)
-{
-    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) {
-        return;
-    }
-    if (--dev->msix_entry_used[vector]) {
-        return;
+    if (msix_present(dev) && vector < dev->msix_entries_nr) {
+        msix_clr_pending(dev, vector);
     }
-    msix_clr_pending(dev, vector);
 }
 
-void msix_unuse_all_vectors(PCIDevice *dev)
+void msix_clear_all_vectors(PCIDevice *dev)
 {
+    unsigned int vector;
+
     if (!msix_present(dev)) {
         return;
     }
-    msix_free_irq_entries(dev);
+    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
+        msix_clr_pending(dev, vector);
+    }
 }
 
 /* Invoke the notifier if vector entry is used and unmasked. */
@@ -476,7 +440,7 @@ msix_notify_if_unmasked(PCIDevice *dev, unsigned int vector, bool masked)
 
     assert(dev->msix_vector_config_notifier);
 
-    if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
+    if (msix_is_masked(dev, vector)) {
         return 0;
     }
     msix_message_from_vector(dev, vector, &msg);
diff --git a/hw/msix.h b/hw/msix.h
index 978f417..9cd54cf 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -21,9 +21,8 @@ int msix_present(PCIDevice *dev);
 
 uint32_t msix_bar_size(PCIDevice *dev);
 
-int msix_vector_use(PCIDevice *dev, unsigned vector);
-void msix_vector_unuse(PCIDevice *dev, unsigned vector);
-void msix_unuse_all_vectors(PCIDevice *dev);
+void msix_clear_vector(PCIDevice *dev, unsigned vector);
+void msix_clear_all_vectors(PCIDevice *dev);
 
 void msix_notify(PCIDevice *dev, unsigned vector);
 
diff --git a/hw/pci.h b/hw/pci.h
index d7a652e..5cf9a16 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -178,8 +178,6 @@ struct PCIDevice {
     uint8_t *msix_table_page;
     /* MMIO index used to map MSIX table and pending bit entries. */
     MemoryRegion msix_mmio;
-    /* Reference-count for entries actually in use by driver. */
-    unsigned *msix_entry_used;
     /* Region including the MSI-X table */
     uint32_t msix_bar_size;
     /* Version id needed for VMState */
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 85d6771..5004d7d 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -136,9 +136,6 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f)
     } else {
         proxy->vdev->config_vector = VIRTIO_NO_VECTOR;
     }
-    if (proxy->vdev->config_vector != VIRTIO_NO_VECTOR) {
-        return msix_vector_use(&proxy->pci_dev, proxy->vdev->config_vector);
-    }
     return 0;
 }
 
@@ -152,9 +149,6 @@ static int virtio_pci_load_queue(void * opaque, int n, QEMUFile *f)
         vector = VIRTIO_NO_VECTOR;
     }
     virtio_queue_set_vector(proxy->vdev, n, vector);
-    if (vector != VIRTIO_NO_VECTOR) {
-        return msix_vector_use(&proxy->pci_dev, vector);
-    }
     return 0;
 }
 
@@ -304,7 +298,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         if (pa == 0) {
             virtio_pci_stop_ioeventfd(proxy);
             virtio_reset(proxy->vdev);
-            msix_unuse_all_vectors(&proxy->pci_dev);
+            msix_clear_all_vectors(&proxy->pci_dev);
         }
         else
             virtio_queue_set_addr(vdev, vdev->queue_sel, pa);
@@ -331,7 +325,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 
         if (vdev->status == 0) {
             virtio_reset(proxy->vdev);
-            msix_unuse_all_vectors(&proxy->pci_dev);
+            msix_clear_all_vectors(&proxy->pci_dev);
         }
 
         /* Linux before 2.6.34 sets the device as OK without enabling
@@ -343,18 +337,20 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         }
         break;
     case VIRTIO_MSI_CONFIG_VECTOR:
-        msix_vector_unuse(&proxy->pci_dev, vdev->config_vector);
+        msix_clear_vector(&proxy->pci_dev, vdev->config_vector);
         /* Make it possible for guest to discover an error took place. */
-        if (msix_vector_use(&proxy->pci_dev, val) < 0)
+        if (val >= vdev->nvectors) {
             val = VIRTIO_NO_VECTOR;
+        }
         vdev->config_vector = val;
         break;
     case VIRTIO_MSI_QUEUE_VECTOR:
-        msix_vector_unuse(&proxy->pci_dev,
+        msix_clear_vector(&proxy->pci_dev,
                           virtio_queue_vector(vdev, vdev->queue_sel));
         /* Make it possible for guest to discover an error took place. */
-        if (msix_vector_use(&proxy->pci_dev, val) < 0)
+        if (val >= vdev->nvectors) {
             val = VIRTIO_NO_VECTOR;
+        }
         virtio_queue_set_vector(vdev, vdev->queue_sel, val);
         break;
     default:
-- 
1.7.3.4



* [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
@ 2011-10-17  9:28   ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

Tracking used vectors was an optimization that was only required to keep
KVM route usage low. Now that we solve that problem via lazy updates, we
can drop the msix_entry_used field. We still need interfaces to clear
pending vectors, though (and we have to make use of them more broadly,
but that's unrelated to this patch).

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/ivshmem.c    |   16 ++-----------
 hw/msix.c       |   62 +++++++++++-------------------------------------------
 hw/msix.h       |    5 +--
 hw/pci.h        |    2 -
 hw/virtio-pci.c |   20 +++++++----------
 5 files changed, 26 insertions(+), 79 deletions(-)
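
For illustration only, here is a minimal standalone sketch of what the
remaining "clear pending" interfaces boil down to once the use counter
is gone: a bounds check plus a bit clear in the pending array. The
toy_* names and TOY_MAX_VECTORS are hypothetical and not part of QEMU;
the real helpers operate on PCIDevice and msix_table_page.

```c
#include <stdint.h>

#define TOY_MAX_VECTORS 64   /* arbitrary size chosen for this sketch */

/* Hypothetical stand-in for the MSI-X state kept in PCIDevice.
 * entries_nr must not exceed TOY_MAX_VECTORS. */
struct toy_msix {
    unsigned entries_nr;                    /* number of table entries */
    uint8_t pending[TOY_MAX_VECTORS / 8];   /* one pending bit per vector */
};

/* Mark a vector as pending; out-of-range vectors are ignored. */
static void toy_set_pending(struct toy_msix *m, unsigned vector)
{
    if (vector < m->entries_nr) {
        m->pending[vector / 8] |= (uint8_t)(1u << (vector % 8));
    }
}

static int toy_is_pending(const struct toy_msix *m, unsigned vector)
{
    return vector < m->entries_nr &&
           ((m->pending[vector / 8] >> (vector % 8)) & 1);
}

/* Clear one pending bit. No use counter is consulted any more. */
static void toy_clear_vector(struct toy_msix *m, unsigned vector)
{
    if (vector < m->entries_nr) {
        m->pending[vector / 8] &= (uint8_t)~(1u << (vector % 8));
    }
}

/* Clear the pending bit of every configured vector. */
static void toy_clear_all_vectors(struct toy_msix *m)
{
    unsigned vector;

    for (vector = 0; vector < m->entries_nr; ++vector) {
        toy_clear_vector(m, vector);
    }
}
```

In this model, the only per-vector state left is the pending bit
itself, which is why a plain bounds check suffices on the notification
path.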

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 242fbea..a402c98 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -535,10 +535,8 @@ static uint64_t ivshmem_get_size(IVShmemState * s) {
     return value;
 }
 
-static void ivshmem_setup_msi(IVShmemState * s) {
-
-    int i;
-
+static void ivshmem_setup_msi(IVShmemState *s)
+{
     /* allocate the MSI-X vectors */
 
     memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
@@ -551,11 +549,6 @@ static void ivshmem_setup_msi(IVShmemState * s) {
         exit(1);
     }
 
-    /* 'activate' the vectors */
-    for (i = 0; i < s->vectors; i++) {
-        msix_vector_use(&s->dev, i);
-    }
-
     /* allocate Qemu char devices for receiving interrupts */
     s->eventfd_table = g_malloc0(s->vectors * sizeof(EventfdEntry));
 }
@@ -581,7 +574,7 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
     IVSHMEM_DPRINTF("ivshmem_load\n");
 
     IVShmemState *proxy = opaque;
-    int ret, i;
+    int ret;
 
     if (version_id > 0) {
         return -EINVAL;
@@ -599,9 +592,6 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
 
     if (ivshmem_has_feature(proxy, IVSHMEM_MSI)) {
         msix_load(&proxy->dev, f);
-        for (i = 0; i < proxy->vectors; i++) {
-            msix_vector_use(&proxy->dev, i);
-        }
     } else {
         proxy->intrstatus = qemu_get_be32(f);
         proxy->intrmask = qemu_get_be32(f);
diff --git a/hw/msix.c b/hw/msix.c
index ce3375a..f1b97b5 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -292,9 +292,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
     if (nentries > MSIX_MAX_ENTRIES)
         return -EINVAL;
 
-    dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
-                                        sizeof *dev->msix_entry_used);
-
     dev->msix_table_page = g_malloc0(MSIX_PAGE_SIZE);
     msix_mask_all(dev, nentries);
 
@@ -317,21 +314,9 @@ err_config:
     memory_region_destroy(&dev->msix_mmio);
     g_free(dev->msix_table_page);
     dev->msix_table_page = NULL;
-    g_free(dev->msix_entry_used);
-    dev->msix_entry_used = NULL;
     return ret;
 }
 
-static void msix_free_irq_entries(PCIDevice *dev)
-{
-    int vector;
-
-    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
-        dev->msix_entry_used[vector] = 0;
-        msix_clr_pending(dev, vector);
-    }
-}
-
 /* Clean up resources for the device. */
 int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
 {
@@ -340,14 +325,11 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     }
     pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
     dev->msix_cap = 0;
-    msix_free_irq_entries(dev);
     dev->msix_entries_nr = 0;
     memory_region_del_subregion(bar, &dev->msix_mmio);
     memory_region_destroy(&dev->msix_mmio);
     g_free(dev->msix_table_page);
     dev->msix_table_page = NULL;
-    g_free(dev->msix_entry_used);
-    dev->msix_entry_used = NULL;
 
     kvm_msix_free(dev);
     g_free(dev->msix_cache);
@@ -376,7 +358,6 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
         return;
     }
 
-    msix_free_irq_entries(dev);
     qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
     qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
 }
@@ -407,7 +388,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 {
     MSIMessage msg;
 
-    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+    if (vector >= dev->msix_entries_nr)
         return;
     if (msix_is_masked(dev, vector)) {
         msix_set_pending(dev, vector);
@@ -424,48 +405,31 @@ void msix_reset(PCIDevice *dev)
     if (!msix_present(dev)) {
         return;
     }
-    msix_free_irq_entries(dev);
+    msix_clear_all_vectors(dev);
     dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
 	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
     memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE);
     msix_mask_all(dev, dev->msix_entries_nr);
 }
 
-/* PCI spec suggests that devices make it possible for software to configure
- * less vectors than supported by the device, but does not specify a standard
- * mechanism for devices to do so.
- *
- * We support this by asking devices to declare vectors software is going to
- * actually use, and checking this on the notification path. Devices that
- * don't want to follow the spec suggestion can declare all vectors as used. */
-
-/* Mark vector as used. */
-int msix_vector_use(PCIDevice *dev, unsigned vector)
+/* Clear pending vector. */
+void msix_clear_vector(PCIDevice *dev, unsigned vector)
 {
-    if (vector >= dev->msix_entries_nr)
-        return -EINVAL;
-    ++dev->msix_entry_used[vector];
-    return 0;
-}
-
-/* Mark vector as unused. */
-void msix_vector_unuse(PCIDevice *dev, unsigned vector)
-{
-    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) {
-        return;
-    }
-    if (--dev->msix_entry_used[vector]) {
-        return;
+    if (msix_present(dev) && vector < dev->msix_entries_nr) {
+        msix_clr_pending(dev, vector);
     }
-    msix_clr_pending(dev, vector);
 }
 
-void msix_unuse_all_vectors(PCIDevice *dev)
+void msix_clear_all_vectors(PCIDevice *dev)
 {
+    unsigned int vector;
+
     if (!msix_present(dev)) {
         return;
     }
-    msix_free_irq_entries(dev);
+    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
+        msix_clr_pending(dev, vector);
+    }
 }
 
 /* Invoke the notifier if vector entry is used and unmasked. */
@@ -476,7 +440,7 @@ msix_notify_if_unmasked(PCIDevice *dev, unsigned int vector, bool masked)
 
     assert(dev->msix_vector_config_notifier);
 
-    if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
+    if (msix_is_masked(dev, vector)) {
         return 0;
     }
     msix_message_from_vector(dev, vector, &msg);
diff --git a/hw/msix.h b/hw/msix.h
index 978f417..9cd54cf 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -21,9 +21,8 @@ int msix_present(PCIDevice *dev);
 
 uint32_t msix_bar_size(PCIDevice *dev);
 
-int msix_vector_use(PCIDevice *dev, unsigned vector);
-void msix_vector_unuse(PCIDevice *dev, unsigned vector);
-void msix_unuse_all_vectors(PCIDevice *dev);
+void msix_clear_vector(PCIDevice *dev, unsigned vector);
+void msix_clear_all_vectors(PCIDevice *dev);
 
 void msix_notify(PCIDevice *dev, unsigned vector);
 
diff --git a/hw/pci.h b/hw/pci.h
index d7a652e..5cf9a16 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -178,8 +178,6 @@ struct PCIDevice {
     uint8_t *msix_table_page;
     /* MMIO index used to map MSIX table and pending bit entries. */
     MemoryRegion msix_mmio;
-    /* Reference-count for entries actually in use by driver. */
-    unsigned *msix_entry_used;
     /* Region including the MSI-X table */
     uint32_t msix_bar_size;
     /* Version id needed for VMState */
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 85d6771..5004d7d 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -136,9 +136,6 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f)
     } else {
         proxy->vdev->config_vector = VIRTIO_NO_VECTOR;
     }
-    if (proxy->vdev->config_vector != VIRTIO_NO_VECTOR) {
-        return msix_vector_use(&proxy->pci_dev, proxy->vdev->config_vector);
-    }
     return 0;
 }
 
@@ -152,9 +149,6 @@ static int virtio_pci_load_queue(void * opaque, int n, QEMUFile *f)
         vector = VIRTIO_NO_VECTOR;
     }
     virtio_queue_set_vector(proxy->vdev, n, vector);
-    if (vector != VIRTIO_NO_VECTOR) {
-        return msix_vector_use(&proxy->pci_dev, vector);
-    }
     return 0;
 }
 
@@ -304,7 +298,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         if (pa == 0) {
             virtio_pci_stop_ioeventfd(proxy);
             virtio_reset(proxy->vdev);
-            msix_unuse_all_vectors(&proxy->pci_dev);
+            msix_clear_all_vectors(&proxy->pci_dev);
         }
         else
             virtio_queue_set_addr(vdev, vdev->queue_sel, pa);
@@ -331,7 +325,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 
         if (vdev->status == 0) {
             virtio_reset(proxy->vdev);
-            msix_unuse_all_vectors(&proxy->pci_dev);
+            msix_clear_all_vectors(&proxy->pci_dev);
         }
 
         /* Linux before 2.6.34 sets the device as OK without enabling
@@ -343,18 +337,20 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         }
         break;
     case VIRTIO_MSI_CONFIG_VECTOR:
-        msix_vector_unuse(&proxy->pci_dev, vdev->config_vector);
+        msix_clear_vector(&proxy->pci_dev, vdev->config_vector);
         /* Make it possible for guest to discover an error took place. */
-        if (msix_vector_use(&proxy->pci_dev, val) < 0)
+        if (val >= vdev->nvectors) {
             val = VIRTIO_NO_VECTOR;
+        }
         vdev->config_vector = val;
         break;
     case VIRTIO_MSI_QUEUE_VECTOR:
-        msix_vector_unuse(&proxy->pci_dev,
+        msix_clear_vector(&proxy->pci_dev,
                           virtio_queue_vector(vdev, vdev->queue_sel));
         /* Make it possible for guest to discover an error took place. */
-        if (msix_vector_use(&proxy->pci_dev, val) < 0)
+        if (val >= vdev->nvectors) {
             val = VIRTIO_NO_VECTOR;
+        }
         virtio_queue_set_vector(vdev, vdev->queue_sel, val);
         break;
     default:
-- 
1.7.3.4


* [RFC][PATCH 29/45] pci-assign: Drop kvm_assigned_irq::host_irq initialization
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

real_device.irq is never set explicitly and thus remains 0. So we can
simply drop the host_irq assignment, as assigned_irq_data is
zero-initialized by memset anyway.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)
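
As a hedged illustration of why the assignment is redundant, the
following standalone sketch fills a toy request structure with and
without copying a host IRQ value that is always 0. struct
toy_assigned_irq and toy_fill() are hypothetical; the field layout is
only assumed to resemble struct kvm_assigned_irq.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the request structure; not the kernel type. */
struct toy_assigned_irq {
    uint32_t assigned_dev_id;
    uint32_t host_irq;
    uint32_t guest_irq;
    uint32_t flags;
};

/* Build a request. If copy_host_irq is set, also store real_irq,
 * mirroring the line this patch removes. */
static struct toy_assigned_irq toy_fill(uint32_t dev_id, uint32_t guest_irq,
                                        int copy_host_irq, uint32_t real_irq)
{
    struct toy_assigned_irq d;

    memset(&d, 0, sizeof(d));       /* every field starts out as 0 */
    d.assigned_dev_id = dev_id;
    d.guest_irq = guest_irq;
    if (copy_host_irq) {
        d.host_irq = real_irq;      /* storing 0 over 0 changes nothing */
    }
    return d;
}
```

With real_irq fixed at 0, both variants yield byte-identical
structures, so dropping the store does not change what is handed to
the kernel.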

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 07e9f5a..799b816 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -825,7 +825,6 @@ static int assign_irq(AssignedDevice *dev)
     memset(&assigned_irq_data, 0, sizeof(assigned_irq_data));
     assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(dev);
     assigned_irq_data.guest_irq = irq;
-    assigned_irq_data.host_irq = dev->real_device.irq;
     if (dev->irq_requested_type) {
         assigned_irq_data.flags = dev->irq_requested_type;
         r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
-- 
1.7.3.4



* [RFC][PATCH 30/45] pci-assign: Rename assign_irq to assign_intx
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

The previous name may incorrectly suggest that this function assigns all
types of IRQs, though it only deals with legacy INTx interrupts.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 799b816..4e4349b 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -807,7 +807,7 @@ static int assign_device(AssignedDevice *dev)
     return r;
 }
 
-static int assign_irq(AssignedDevice *dev)
+static int assign_intx(AssignedDevice *dev)
 {
     struct kvm_assigned_irq assigned_irq_data;
     int irq, r = 0;
@@ -829,7 +829,7 @@ static int assign_irq(AssignedDevice *dev)
         assigned_irq_data.flags = dev->irq_requested_type;
         r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
         if (r) {
-            perror("assign_irq: deassign");
+            perror("assign_intx: deassign");
         }
         dev->irq_requested_type = 0;
     }
@@ -898,7 +898,7 @@ void assigned_dev_update_irqs(void)
     while (dev) {
         next = QLIST_NEXT(dev, next);
         if (dev->irq_requested_type & KVM_DEV_IRQ_HOST_INTX) {
-            r = assign_irq(dev);
+            r = assign_intx(dev);
             if (r < 0) {
                 qdev_unplug(&dev->dev.qdev);
             }
@@ -967,7 +967,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
         assigned_dev->girq = -1;
         assigned_dev->irq_requested_type = assigned_irq_data.flags;
     } else {
-        assign_irq(assigned_dev);
+        assign_intx(assigned_dev);
     }
 }
 
@@ -1102,7 +1102,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
         assigned_dev->girq = -1;
         assigned_dev->irq_requested_type = assigned_irq_data.flags;
     } else {
-        assign_irq(assigned_dev);
+        assign_intx(assigned_dev);
     }
 }
 
@@ -1645,8 +1645,8 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
     if (r < 0)
         goto out;
 
-    /* assign irq for the device */
-    r = assign_irq(dev);
+    /* assign legacy INTx to the device */
+    r = assign_intx(dev);
     if (r < 0)
         goto assigned_out;
 
-- 
1.7.3.4



* [RFC][PATCH 31/45] qemu-kvm: Refactor kvm_deassign_irq to kvm_device_irq_deassign
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Don't pass the kvm_assigned_irq struct; instead, pass only the fields
that are actually required (device ID and IRQ type) through the
interface.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |   42 ++++++++++++++++++++++++------------------
 qemu-kvm.c             |   15 ++++++++++-----
 qemu-kvm.h             |   11 +----------
 3 files changed, 35 insertions(+), 33 deletions(-)
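
The shape of the new wrapper can be sketched in isolation as follows.
This is only a model under stated assumptions: toy_deassign_ioctl() is
a hypothetical stub standing in for the KVM_DEASSIGN_DEV_IRQ ioctl, and
struct toy_assigned_irq merely resembles struct kvm_assigned_irq. The
point is that a designated initializer zero-fills every field the
caller does not name, so callers no longer need their own memset.

```c
#include <stdint.h>

/* Hypothetical model of the request structure; not the kernel type. */
struct toy_assigned_irq {
    uint32_t assigned_dev_id;
    uint32_t host_irq;
    uint32_t guest_irq;
    uint32_t flags;
};

/* Records the last request so the result can be inspected. */
static struct toy_assigned_irq toy_last_request;

/* Hypothetical stub replacing the real ioctl call. */
static int toy_deassign_ioctl(const struct toy_assigned_irq *req)
{
    toy_last_request = *req;
    return 0;
}

/* Callers pass only the two values that matter for deassignment. */
static int toy_device_irq_deassign(uint32_t dev_id, uint32_t type)
{
    struct toy_assigned_irq req = {
        .assigned_dev_id = dev_id,
        .flags = type,
        /* host_irq and guest_irq are implicitly initialized to 0 */
    };

    return toy_deassign_ioctl(&req);
}
```

Keeping the struct private to the wrapper also means a caller can no
longer leak stale guest_irq or flags values from an earlier assignment
into the deassign request.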

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 4e4349b..e0b9cfe 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -810,7 +810,8 @@ static int assign_device(AssignedDevice *dev)
 static int assign_intx(AssignedDevice *dev)
 {
     struct kvm_assigned_irq assigned_irq_data;
-    int irq, r = 0;
+    uint32_t dev_id;
+    int irq, r;
 
     /* Interrupt PIN 0 means don't use INTx */
     if (assigned_dev_pci_read_byte(&dev->dev, PCI_INTERRUPT_PIN) == 0)
@@ -819,21 +820,24 @@ static int assign_intx(AssignedDevice *dev)
     irq = pci_map_irq(&dev->dev, dev->intpin);
     irq = piix_get_irq(irq);
 
-    if (dev->girq == irq)
-        return r;
+    if (dev->girq == irq) {
+        return 0;
+    }
+
+    dev_id = calc_assigned_dev_id(dev);
 
-    memset(&assigned_irq_data, 0, sizeof(assigned_irq_data));
-    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(dev);
-    assigned_irq_data.guest_irq = irq;
     if (dev->irq_requested_type) {
-        assigned_irq_data.flags = dev->irq_requested_type;
-        r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
+        r = kvm_device_irq_deassign(kvm_state, dev_id,
+                                    dev->irq_requested_type);
         if (r) {
             perror("assign_intx: deassign");
         }
         dev->irq_requested_type = 0;
     }
 
+    memset(&assigned_irq_data, 0, sizeof(assigned_irq_data));
+    assigned_irq_data.assigned_dev_id = dev_id;
+    assigned_irq_data.guest_irq = irq;
     assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
     if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
         dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
@@ -913,20 +917,19 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
     AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint8_t ctrl_byte = pci_get_byte(pci_dev->config + pci_dev->msi_cap +
                                      PCI_MSI_FLAGS);
+    uint32_t dev_id;
     int r;
 
-    memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
-    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(assigned_dev);
+    dev_id = calc_assigned_dev_id(assigned_dev);
 
     /* Some guests gratuitously disable MSI even if they're not using it,
      * try to catch this by only deassigning irqs if the guest is using
      * MSI or intends to start. */
     if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSI) ||
         (ctrl_byte & PCI_MSI_FLAGS_ENABLE)) {
-
-        assigned_irq_data.flags = assigned_dev->irq_requested_type;
         free_dev_irq_entries(assigned_dev);
-        r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
+        r = kvm_device_irq_deassign(kvm_state, dev_id,
+                                    assigned_dev->irq_requested_type);
         /* -ENXIO means no assigned irq */
         if (r && r != -ENXIO)
             perror("assigned_dev_update_msi: deassign irq");
@@ -958,6 +961,8 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
         }
 	assigned_dev->irq_entries_nr = 1;
 
+        memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
+        assigned_irq_data.assigned_dev_id = dev_id;
         assigned_irq_data.guest_irq = assigned_dev->entry->gsi;
 	assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_MSI;
         if (kvm_assign_irq(kvm_state, &assigned_irq_data) < 0) {
@@ -1066,20 +1071,19 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
     AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint16_t ctrl_word = pci_get_word(pci_dev->config + pci_dev->msix_cap +
                                       PCI_MSIX_FLAGS);
+    uint32_t dev_id;
     int r;
 
-    memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
-    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(assigned_dev);
+    dev_id = calc_assigned_dev_id(assigned_dev);
 
     /* Some guests gratuitously disable MSIX even if they're not using it,
      * try to catch this by only deassigning irqs if the guest is using
      * MSIX or intends to start. */
     if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
         (ctrl_word & PCI_MSIX_FLAGS_ENABLE)) {
-
-        assigned_irq_data.flags = assigned_dev->irq_requested_type;
         free_dev_irq_entries(assigned_dev);
-        r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
+        r = kvm_device_irq_deassign(kvm_state, dev_id,
+                                    assigned_dev->irq_requested_type);
         /* -ENXIO means no assigned irq */
         if (r && r != -ENXIO)
             perror("assigned_dev_update_msix: deassign irq");
@@ -1088,6 +1092,8 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
     }
 
     if (ctrl_word & PCI_MSIX_FLAGS_ENABLE) {
+        memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
+        assigned_irq_data.assigned_dev_id = dev_id;
         assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
                                   KVM_DEV_IRQ_GUEST_MSIX;
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 199564c..c24e93c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -206,11 +206,6 @@ int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
 
     return kvm_old_assign_irq(s, assigned_irq);
 }
-
-int kvm_deassign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
-{
-    return kvm_vm_ioctl(s, KVM_DEASSIGN_DEV_IRQ, assigned_irq);
-}
 #else
 int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
 {
@@ -219,6 +214,16 @@ int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
 #endif
 #endif
 
+int kvm_device_irq_deassign(KVMState *s, uint32_t dev_id, uint32_t type)
+{
+    struct kvm_assigned_irq assigned_irq = {
+        .assigned_dev_id = dev_id,
+        .flags = type,
+    };
+
+    return kvm_vm_ioctl(s, KVM_DEASSIGN_DEV_IRQ, &assigned_irq);
+}
+
 #ifdef KVM_CAP_DEVICE_DEASSIGNMENT
 int kvm_deassign_pci_device(KVMState *s,
                             struct kvm_assigned_pci_dev *assigned_dev)
diff --git a/qemu-kvm.h b/qemu-kvm.h
index b2ae5da..7cdb5a8 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -150,16 +150,7 @@ int kvm_assign_pci_device(KVMState *s,
  */
 int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
 
-/*!
- * \brief Deassign IRQ for an assigned device
- *
- * Used for PCI device assignment, this function deassigns IRQ numbers
- * for an assigned device.
- *
- * \param kvm Pointer to the current kvm_context
- * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
- */
-int kvm_deassign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
+int kvm_device_irq_deassign(KVMState *s, uint32_t dev_id, uint32_t type);
 
 /*!
  * \brief Notifies host kernel about a PCI device to be deassigned from a guest
-- 
1.7.3.4



* [Qemu-devel] [RFC][PATCH 31/45] qemu-kvm: Refactor kvm_deassign_irq to kvm_device_irq_deassign
@ 2011-10-17  9:28   ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

Don't pass the kvm_assigned_irq struct; instead, pass only the fields
that are actually required (device ID and IRQ type) through the
interface.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |   42 ++++++++++++++++++++++++------------------
 qemu-kvm.c             |   15 ++++++++++-----
 qemu-kvm.h             |   11 +----------
 3 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 4e4349b..e0b9cfe 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -810,7 +810,8 @@ static int assign_device(AssignedDevice *dev)
 static int assign_intx(AssignedDevice *dev)
 {
     struct kvm_assigned_irq assigned_irq_data;
-    int irq, r = 0;
+    uint32_t dev_id;
+    int irq, r;
 
     /* Interrupt PIN 0 means don't use INTx */
     if (assigned_dev_pci_read_byte(&dev->dev, PCI_INTERRUPT_PIN) == 0)
@@ -819,21 +820,24 @@ static int assign_intx(AssignedDevice *dev)
     irq = pci_map_irq(&dev->dev, dev->intpin);
     irq = piix_get_irq(irq);
 
-    if (dev->girq == irq)
-        return r;
+    if (dev->girq == irq) {
+        return 0;
+    }
+
+    dev_id = calc_assigned_dev_id(dev);
 
-    memset(&assigned_irq_data, 0, sizeof(assigned_irq_data));
-    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(dev);
-    assigned_irq_data.guest_irq = irq;
     if (dev->irq_requested_type) {
-        assigned_irq_data.flags = dev->irq_requested_type;
-        r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
+        r = kvm_device_irq_deassign(kvm_state, dev_id,
+                                    dev->irq_requested_type);
         if (r) {
             perror("assign_intx: deassign");
         }
         dev->irq_requested_type = 0;
     }
 
+    memset(&assigned_irq_data, 0, sizeof(assigned_irq_data));
+    assigned_irq_data.assigned_dev_id = dev_id;
+    assigned_irq_data.guest_irq = irq;
     assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
     if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
         dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
@@ -913,20 +917,19 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
     AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint8_t ctrl_byte = pci_get_byte(pci_dev->config + pci_dev->msi_cap +
                                      PCI_MSI_FLAGS);
+    uint32_t dev_id;
     int r;
 
-    memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
-    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(assigned_dev);
+    dev_id = calc_assigned_dev_id(assigned_dev);
 
     /* Some guests gratuitously disable MSI even if they're not using it,
      * try to catch this by only deassigning irqs if the guest is using
      * MSI or intends to start. */
     if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSI) ||
         (ctrl_byte & PCI_MSI_FLAGS_ENABLE)) {
-
-        assigned_irq_data.flags = assigned_dev->irq_requested_type;
         free_dev_irq_entries(assigned_dev);
-        r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
+        r = kvm_device_irq_deassign(kvm_state, dev_id,
+                                    assigned_dev->irq_requested_type);
         /* -ENXIO means no assigned irq */
         if (r && r != -ENXIO)
             perror("assigned_dev_update_msi: deassign irq");
@@ -958,6 +961,8 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
         }
 	assigned_dev->irq_entries_nr = 1;
 
+        memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
+        assigned_irq_data.assigned_dev_id = dev_id;
         assigned_irq_data.guest_irq = assigned_dev->entry->gsi;
 	assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_MSI;
         if (kvm_assign_irq(kvm_state, &assigned_irq_data) < 0) {
@@ -1066,20 +1071,19 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
     AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint16_t ctrl_word = pci_get_word(pci_dev->config + pci_dev->msix_cap +
                                       PCI_MSIX_FLAGS);
+    uint32_t dev_id;
     int r;
 
-    memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
-    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(assigned_dev);
+    dev_id = calc_assigned_dev_id(assigned_dev);
 
     /* Some guests gratuitously disable MSIX even if they're not using it,
      * try to catch this by only deassigning irqs if the guest is using
      * MSIX or intends to start. */
     if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
         (ctrl_word & PCI_MSIX_FLAGS_ENABLE)) {
-
-        assigned_irq_data.flags = assigned_dev->irq_requested_type;
         free_dev_irq_entries(assigned_dev);
-        r = kvm_deassign_irq(kvm_state, &assigned_irq_data);
+        r = kvm_device_irq_deassign(kvm_state, dev_id,
+                                    assigned_dev->irq_requested_type);
         /* -ENXIO means no assigned irq */
         if (r && r != -ENXIO)
             perror("assigned_dev_update_msix: deassign irq");
@@ -1088,6 +1092,8 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
     }
 
     if (ctrl_word & PCI_MSIX_FLAGS_ENABLE) {
+        memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
+        assigned_irq_data.assigned_dev_id = dev_id;
         assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
                                   KVM_DEV_IRQ_GUEST_MSIX;
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 199564c..c24e93c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -206,11 +206,6 @@ int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
 
     return kvm_old_assign_irq(s, assigned_irq);
 }
-
-int kvm_deassign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
-{
-    return kvm_vm_ioctl(s, KVM_DEASSIGN_DEV_IRQ, assigned_irq);
-}
 #else
 int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
 {
@@ -219,6 +214,16 @@ int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
 #endif
 #endif
 
+int kvm_device_irq_deassign(KVMState *s, uint32_t dev_id, uint32_t type)
+{
+    struct kvm_assigned_irq assigned_irq = {
+        .assigned_dev_id = dev_id,
+        .flags = type,
+    };
+
+    return kvm_vm_ioctl(s, KVM_DEASSIGN_DEV_IRQ, &assigned_irq);
+}
+
 #ifdef KVM_CAP_DEVICE_DEASSIGNMENT
 int kvm_deassign_pci_device(KVMState *s,
                             struct kvm_assigned_pci_dev *assigned_dev)
diff --git a/qemu-kvm.h b/qemu-kvm.h
index b2ae5da..7cdb5a8 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -150,16 +150,7 @@ int kvm_assign_pci_device(KVMState *s,
  */
 int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
 
-/*!
- * \brief Deassign IRQ for an assigned device
- *
- * Used for PCI device assignment, this function deassigns IRQ numbers
- * for an assigned device.
- *
- * \param kvm Pointer to the current kvm_context
- * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
- */
-int kvm_deassign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
+int kvm_device_irq_deassign(KVMState *s, uint32_t dev_id, uint32_t type);
 
 /*!
  * \brief Notifies host kernel about a PCI device to be deassigned from a guest
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 32/45] pci-assign: Factor out deassign_irq
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

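Move the IRQ deassignment sequence out of assign_intx into a new helper,
deassign_irq, which will have more users soon. The helper also resets
dev->girq to -1 after deassigning.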

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
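As a rough illustration of why the helper is safe to call unconditionally: it is a no-op when no IRQ is assigned, and it clears the cached state after a deassign. The sketch below uses a hypothetical trimmed-down device struct and a stub in place of kvm_device_irq_deassign. None of these names exist in qemu-kvm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed-down device state; the real AssignedDevice
 * carries many more fields. */
struct dev_sketch {
    uint32_t dev_id;
    uint32_t irq_requested_type;  /* 0 means no IRQ is assigned */
    int girq;                     /* guest IRQ, -1 when none */
};

static int deassign_calls;  /* counts calls into the stub below */

/* Stub standing in for kvm_device_irq_deassign(); always succeeds. */
static int stub_irq_deassign(uint32_t dev_id, uint32_t type)
{
    (void)dev_id;
    (void)type;
    deassign_calls++;
    return 0;
}

/* Same control flow as deassign_irq in the patch: only act when an
 * IRQ type is recorded, then reset the recorded state. */
static void deassign_irq_sketch(struct dev_sketch *dev)
{
    if (dev->irq_requested_type) {
        if (stub_irq_deassign(dev->dev_id, dev->irq_requested_type)) {
            fprintf(stderr, "deassign irq failed\n");
        }
        dev->girq = -1;
        dev->irq_requested_type = 0;
    }
}
```

Calling the sketch twice on the same device reaches the stub only once, because the first call clears irq_requested_type.
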
 hw/device-assignment.c |   30 ++++++++++++++++++------------
 1 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index e0b9cfe..e5ac54c 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -807,10 +807,25 @@ static int assign_device(AssignedDevice *dev)
     return r;
 }
 
+static void deassign_irq(AssignedDevice *dev)
+{
+    int ret;
+
+    if (dev->irq_requested_type) {
+        ret = kvm_device_irq_deassign(kvm_state,
+                                      calc_assigned_dev_id(dev),
+                                      dev->irq_requested_type);
+        if (ret) {
+            perror("assigned_dev: deassign irq");
+        }
+        dev->girq = -1;
+        dev->irq_requested_type = 0;
+    }
+}
+
 static int assign_intx(AssignedDevice *dev)
 {
     struct kvm_assigned_irq assigned_irq_data;
-    uint32_t dev_id;
     int irq, r;
 
     /* Interrupt PIN 0 means don't use INTx */
@@ -824,19 +839,10 @@ static int assign_intx(AssignedDevice *dev)
         return 0;
     }
 
-    dev_id = calc_assigned_dev_id(dev);
-
-    if (dev->irq_requested_type) {
-        r = kvm_device_irq_deassign(kvm_state, dev_id,
-                                    dev->irq_requested_type);
-        if (r) {
-            perror("assign_intx: deassign");
-        }
-        dev->irq_requested_type = 0;
-    }
+    deassign_irq(dev);
 
     memset(&assigned_irq_data, 0, sizeof(assigned_irq_data));
-    assigned_irq_data.assigned_dev_id = dev_id;
+    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(dev);
     assigned_irq_data.guest_irq = irq;
     assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
     if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
-- 
1.7.3.4


* [RFC][PATCH 33/45] qemu-kvm: Factor out kvm_device_intx_assign
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Avoid passing kvm_assigned_irq on INTx assignment and separate this
function from (to-be-refactored) MSI/MSI-X assignment.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
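To make the flag handling in kvm_device_intx_assign easier to follow: the guest side is always INTx, and only the host INTx and host MSI bits of the caller's type are passed through. Here is a small sketch of that masking. The SKETCH_* constants are illustrative values assumed for this example. The authoritative KVM_DEV_IRQ_* definitions live in <linux/kvm.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values, assumed for this sketch only. */
#define SKETCH_HOST_INTX   (1u << 0)
#define SKETCH_HOST_MSI    (1u << 1)
#define SKETCH_HOST_MSIX   (1u << 2)
#define SKETCH_GUEST_INTX  (1u << 8)
#define SKETCH_GUEST_MSI   (1u << 9)

/* Mirrors the flag expression in kvm_device_intx_assign: force the
 * guest INTx bit and keep only the host INTx/MSI bits of the input. */
static uint32_t intx_assign_flags(uint32_t host_irq_type)
{
    return SKETCH_GUEST_INTX |
        (host_irq_type & (SKETCH_HOST_INTX | SKETCH_HOST_MSI));
}
```

Since the guest INTx bit is ORed in unconditionally, it is harmless that assign_intx passes an irq_type that already contains it. Any stray guest MSI or host MSI-X bit is dropped by the mask.
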
 hw/device-assignment.c |   21 ++++++++++-----------
 qemu-kvm.c             |   17 +++++++++++++++++
 qemu-kvm.h             |    2 ++
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index e5ac54c..f145a84 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -825,7 +825,7 @@ static void deassign_irq(AssignedDevice *dev)
 
 static int assign_intx(AssignedDevice *dev)
 {
-    struct kvm_assigned_irq assigned_irq_data;
+    uint32_t irq_type = 0;
     int irq, r;
 
     /* Interrupt PIN 0 means don't use INTx */
@@ -841,17 +841,16 @@ static int assign_intx(AssignedDevice *dev)
 
     deassign_irq(dev);
 
-    memset(&assigned_irq_data, 0, sizeof(assigned_irq_data));
-    assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(dev);
-    assigned_irq_data.guest_irq = irq;
-    assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
+    irq_type = KVM_DEV_IRQ_GUEST_INTX;
     if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
-        dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
-        assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_MSI;
-    else
-        assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_INTX;
+        dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
+        irq_type |= KVM_DEV_IRQ_HOST_MSI;
+    } else {
+        irq_type |= KVM_DEV_IRQ_HOST_INTX;
+    }
 
-    r = kvm_assign_irq(kvm_state, &assigned_irq_data);
+    r = kvm_device_intx_assign(kvm_state, calc_assigned_dev_id(dev), irq_type,
+                               irq);
     if (r < 0) {
         fprintf(stderr, "Failed to assign irq for \"%s\": %s\n",
                 dev->dev.qdev.id, strerror(-r));
@@ -861,7 +860,7 @@ static int assign_intx(AssignedDevice *dev)
     }
 
     dev->girq = irq;
-    dev->irq_requested_type = assigned_irq_data.flags;
+    dev->irq_requested_type = irq_type;
     return r;
 }
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index c24e93c..0086514 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -194,6 +194,23 @@ static int kvm_old_assign_irq(KVMState *s,
     return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, assigned_irq);
 }
 
+int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
+                           uint32_t host_irq_type, uint32_t guest_irq)
+{
+    struct kvm_assigned_irq assigned_irq;
+
+    assigned_irq.assigned_dev_id = dev_id;
+    assigned_irq.guest_irq = guest_irq;
+    assigned_irq.flags = KVM_DEV_IRQ_GUEST_INTX |
+        (host_irq_type & (KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_HOST_MSI));
+    if (kvm_check_extension(s, KVM_CAP_ASSIGN_DEV_IRQ)) {
+        return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, &assigned_irq);
+    } else {
+        assigned_irq.host_irq = 0;
+        return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, &assigned_irq);
+    }
+}
+
 #ifdef KVM_CAP_ASSIGN_DEV_IRQ
 int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
 {
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 7cdb5a8..783df7f 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -150,6 +150,8 @@ int kvm_assign_pci_device(KVMState *s,
  */
 int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
 
+int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
+                           uint32_t host_irq_type, uint32_t guest_irq);
 int kvm_device_irq_deassign(KVMState *s, uint32_t dev_id, uint32_t type);
 
 /*!
-- 
1.7.3.4


* [RFC][PATCH 34/45] qemu-kvm: Factor out kvm_device_msi_assign
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

This reuses the MSI routing infrastructure of the KVM core by folding
both the routing setup and the IRQ assignment into a new function called
kvm_device_msi_assign. It's also a good chance to clean up the IRQ
deassignment before updates.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
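The reworked assigned_dev_update_msi reads the message address and data straight from the MSI capability in config space. Below is a minimal sketch of that step, assuming the 32-bit capability layout the patch uses. struct msi_message_sketch and the helpers are hypothetical stand-ins for MSIMessage and the pci_get_long/pci_get_word accessors.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MSI_ADDRESS_LO 4   /* offset of PCI_MSI_ADDRESS_LO */
#define SKETCH_MSI_DATA_32    8   /* offset of PCI_MSI_DATA_32 */

/* Hypothetical stand-in for MSIMessage. */
struct msi_message_sketch {
    uint64_t address;
    uint32_t data;
};

/* Config space is little-endian; read it byte by byte so the result
 * does not depend on host byte order or alignment. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* cap points at the start of the MSI capability. */
static struct msi_message_sketch msi_message_from_cap(const uint8_t *cap)
{
    struct msi_message_sketch msg = {
        .address = get_le32(cap + SKETCH_MSI_ADDRESS_LO),
        .data = get_le16(cap + SKETCH_MSI_DATA_32),
    };

    return msg;
}
```

In the patch, the resulting message is handed to kvm_device_msi_assign together with the device's routing cache entry, which sets up the route and the assignment in one call.
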
 hw/device-assignment.c |   70 +++++++++++++----------------------------------
 qemu-kvm.c             |   16 +++++++++++
 qemu-kvm.h             |    2 +
 3 files changed, 38 insertions(+), 50 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index f145a84..7a8f702 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -40,6 +40,7 @@
 #include "monitor.h"
 #include "range.h"
 #include "sysemu.h"
+#include "msi.h"
 
 #define MSIX_PAGE_SIZE 0x1000
 
@@ -701,6 +702,11 @@ static void free_assigned_device(AssignedDevice *dev)
     }
 
     free_dev_irq_entries(dev);
+
+    if (dev->dev.msi_cache) {
+        kvm_msi_cache_invalidate(&dev->dev.msi_cache[0]);
+        g_free(dev->dev.msi_cache);
+    }
 }
 
 static uint32_t calc_assigned_dev_id(AssignedDevice *dev)
@@ -918,66 +924,28 @@ void assigned_dev_update_irqs(void)
 
 static void assigned_dev_update_msi(PCIDevice *pci_dev)
 {
-    struct kvm_assigned_irq assigned_irq_data;
-    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
+    AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint8_t ctrl_byte = pci_get_byte(pci_dev->config + pci_dev->msi_cap +
                                      PCI_MSI_FLAGS);
-    uint32_t dev_id;
-    int r;
-
-    dev_id = calc_assigned_dev_id(assigned_dev);
-
-    /* Some guests gratuitously disable MSI even if they're not using it,
-     * try to catch this by only deassigning irqs if the guest is using
-     * MSI or intends to start. */
-    if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSI) ||
-        (ctrl_byte & PCI_MSI_FLAGS_ENABLE)) {
-        free_dev_irq_entries(assigned_dev);
-        r = kvm_device_irq_deassign(kvm_state, dev_id,
-                                    assigned_dev->irq_requested_type);
-        /* -ENXIO means no assigned irq */
-        if (r && r != -ENXIO)
-            perror("assigned_dev_update_msi: deassign irq");
-
-        assigned_dev->irq_requested_type = 0;
-    }
 
     if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
         uint8_t *pos = pci_dev->config + pci_dev->msi_cap;
+        MSIMessage msg;
 
-        assigned_dev->entry = g_malloc0(sizeof(*(assigned_dev->entry)));
-        assigned_dev->entry->u.msi.address_lo =
-            pci_get_long(pos + PCI_MSI_ADDRESS_LO);
-        assigned_dev->entry->u.msi.address_hi = 0;
-        assigned_dev->entry->u.msi.data = pci_get_word(pos + PCI_MSI_DATA_32);
-        assigned_dev->entry->type = KVM_IRQ_ROUTING_MSI;
-        r = kvm_get_irq_route_gsi();
-        if (r < 0) {
-            perror("assigned_dev_update_msi: kvm_get_irq_route_gsi");
-            return;
-        }
-        assigned_dev->entry->gsi = r;
+        deassign_irq(dev);
 
-        kvm_add_routing_entry(assigned_dev->entry, NULL);
-        if (kvm_commit_irq_routes() < 0) {
-            perror("assigned_dev_update_msi: kvm_commit_irq_routes");
-            assigned_dev->cap.state &= ~ASSIGNED_DEVICE_MSI_ENABLED;
-            return;
-        }
-	assigned_dev->irq_entries_nr = 1;
+        msg.address = pci_get_long(pos + PCI_MSI_ADDRESS_LO);
+        msg.data = pci_get_word(pos + PCI_MSI_DATA_32);
 
-        memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
-        assigned_irq_data.assigned_dev_id = dev_id;
-        assigned_irq_data.guest_irq = assigned_dev->entry->gsi;
-	assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_MSI;
-        if (kvm_assign_irq(kvm_state, &assigned_irq_data) < 0) {
-            perror("assigned_dev_enable_msi: assign irq");
+        if (kvm_device_msi_assign(kvm_state, calc_assigned_dev_id(dev), &msg,
+                                  &dev->dev.msi_cache[0]) < 0) {
+            perror("assigned_dev_update_msi: assign msi");
+            return;
         }
-
-        assigned_dev->girq = -1;
-        assigned_dev->irq_requested_type = assigned_irq_data.flags;
+        dev->irq_requested_type = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_MSI;
     } else {
-        assign_intx(assigned_dev);
+        kvm_msi_cache_invalidate(&dev->dev.msi_cache[0]);
+        assign_intx(dev);
     }
 }
 
@@ -1215,6 +1183,8 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
                      PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
         pci_set_long(pci_dev->wmask + pos + PCI_MSI_ADDRESS_LO, 0xfffffffc);
         pci_set_word(pci_dev->wmask + pos + PCI_MSI_DATA_32, 0xffff);
+
+        dev->dev.msi_cache = g_malloc0(sizeof(MSIRoutingCache));
     }
     /* Expose MSI-X capability */
     pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 0086514..27723a6 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -600,6 +600,22 @@ int kvm_msi_irqfd_set(MSIMessage *msg, MSIRoutingCache *cache, int fd,
     return kvm_set_irqfd(cache->kvm_gsi, fd, assigned);
 }
 
+int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, MSIMessage *msg,
+                          MSIRoutingCache *cache)
+{
+    struct kvm_assigned_irq assigned_irq;
+    int ret;
+
+    ret = kvm_msi_message_update(msg, cache, MSI_ROUTE_STATIC);
+    if (ret < 0) {
+        return ret;
+    }
+    assigned_irq.assigned_dev_id = dev_id;
+    assigned_irq.guest_irq = cache->kvm_gsi;
+    assigned_irq.flags = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_MSI;
+    return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, &assigned_irq);
+}
+
 #ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(KVMState *s, struct kvm_assigned_msix_nr *msix_nr)
 {
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 783df7f..d987d41 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -152,6 +152,8 @@ int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
 
 int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
                            uint32_t host_irq_type, uint32_t guest_irq);
+int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, MSIMessage *msg,
+                          MSIRoutingCache *cache);
 int kvm_device_irq_deassign(KVMState *s, uint32_t dev_id, uint32_t type);
 
 /*!
-- 
1.7.3.4


* [RFC][PATCH 35/45] pci-assign: Polish assigned_dev_update_msix_mmio
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

- rename to assigned_dev_set_msix_vectors
- drop unused msg_ctrl
- use pci_get_* accessors
- rename variable va to msix_page
- clarify comment on msg_data == 0 optimization
- fix coding style
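
For readers without the tree at hand, the accessor and msg_data == 0 items
above can be sketched as follows. This is an illustration only, not code
from the patch: get_le32() is a hypothetical stand-in for pci_get_long(),
and the MSIX_ENTRY_* constants are local copies whose values match the raw
offsets (i * 16, +4, +8) used by the old code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the PCI_MSIX_ENTRY_* constants (assumed names). */
enum {
    MSIX_ENTRY_SIZE       = 16,
    MSIX_ENTRY_LOWER_ADDR = 0,
    MSIX_ENTRY_UPPER_ADDR = 4,
    MSIX_ENTRY_DATA       = 8,
};

/* Hypothetical equivalent of pci_get_long(): a little-endian 32-bit load
 * that does not depend on host byte order or pointer alignment. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Count the table entries whose data word is non-zero. Assuming the IA-32
 * MSI message format, a data word of 0 encodes an invalid vector, so such
 * entries are treated as unused. */
static unsigned count_used_msix_entries(const uint8_t *table, unsigned max_nr)
{
    unsigned i, used = 0;

    assert(table != NULL);
    for (i = 0; i < max_nr; i++) {
        if (get_le32(table + i * MSIX_ENTRY_SIZE + MSIX_ENTRY_DATA) == 0) {
            continue;
        }
        used++;
    }
    return used;
}
```

Compared with the memcpy() variant, the explicit little-endian load keeps
the result independent of the host byte order, which is presumably the
reason for preferring the pci_get_* helpers here.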

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |   53 ++++++++++++++++++++++++++---------------------
 1 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 7a8f702..83951a3 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -949,42 +949,43 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
     }
 }
 
-static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+static int assigned_dev_set_msix_vectors(PCIDevice *pci_dev)
 {
     AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint16_t entries_nr = 0, entries_max_nr;
     int pos = 0, i, r = 0;
-    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+    uint32_t msg_addr, msg_upper_addr, msg_data;
     struct kvm_assigned_msix_nr msix_nr;
     struct kvm_assigned_msix_entry msix_entry;
-    void *va = adev->msix_table_page;
+    void *msix_page = adev->msix_table_page;
 
     pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
-    entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2);
+    entries_max_nr = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
     entries_max_nr &= PCI_MSIX_FLAGS_QSIZE;
     entries_max_nr += 1;
 
     /* Get the usable entry number for allocating */
     for (i = 0; i < entries_max_nr; i++) {
-        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-        memcpy(&msg_data, va + i * 16 + 8, 4);
-        /* Ignore unused entry even it's unmasked */
-        if (msg_data == 0)
+        /* Assuming IA-32 MSI message format:
+         * Ignore unused entry (invalid vector) */
+        if (pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
+                         PCI_MSIX_ENTRY_DATA) == 0) {
             continue;
-        entries_nr ++;
+        }
+        entries_nr++;
     }
-
     if (entries_nr == 0) {
         fprintf(stderr, "MSI-X entry number is zero!\n");
         return -EINVAL;
     }
+
     msix_nr.assigned_dev_id = calc_assigned_dev_id(adev);
     msix_nr.entry_nr = entries_nr;
     r = kvm_assign_set_msix_nr(kvm_state, &msix_nr);
     if (r != 0) {
         fprintf(stderr, "fail to set MSI-X entry number for MSIX! %s\n",
-			strerror(-r));
+                strerror(-r));
         return r;
     }
 
@@ -995,19 +996,23 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
     entries_nr = 0;
     for (i = 0; i < entries_max_nr; i++) {
-        if (entries_nr >= msix_nr.entry_nr)
+        if (entries_nr >= msix_nr.entry_nr) {
             break;
-        memcpy(&msg_ctrl, va + i * 16 + 12, 4);
-        memcpy(&msg_data, va + i * 16 + 8, 4);
-        if (msg_data == 0)
+        }
+        msg_data = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
+                                PCI_MSIX_ENTRY_DATA);
+        if (msg_data == 0) {
             continue;
-
-        memcpy(&msg_addr, va + i * 16, 4);
-        memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
+        }
+        msg_addr = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
+                                PCI_MSIX_ENTRY_LOWER_ADDR);
+        msg_upper_addr = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
+                                      PCI_MSIX_ENTRY_UPPER_ADDR);
 
         r = kvm_get_irq_route_gsi();
-        if (r < 0)
+        if (r < 0) {
             return r;
+        }
 
         adev->entry[entries_nr].gsi = r;
         adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
@@ -1026,13 +1031,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
             break;
         }
         DEBUG("MSI-X entry gsi 0x%x, entry %d\n!",
-                msix_entry.gsi, msix_entry.entry);
-        entries_nr ++;
+              msix_entry.gsi, msix_entry.entry);
+        entries_nr++;
     }
 
     if (r == 0 && kvm_commit_irq_routes() < 0) {
-	    perror("assigned_dev_update_msix_mmio: kvm_commit_irq_routes");
-	    return -EINVAL;
+        perror("assigned_dev_update_msix_mmio: kvm_commit_irq_routes");
+        return -EINVAL;
     }
 
     return r;
@@ -1070,7 +1075,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
         assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
                                   KVM_DEV_IRQ_GUEST_MSIX;
 
-        if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
+        if (assigned_dev_set_msix_vectors(pci_dev) < 0) {
             perror("assigned_dev_update_msix_mmio");
             return;
         }
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 36/45] qemu-kvm: Factor out kvm_device_msix_* services
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Create kvm_device_msix_{supported,init_vectors,set_vector,assign},
replacing the old kvm_assign_set_msix_{nr,entry} services. The new API
no longer requires direct fiddling with the KVM API data structures and
just takes the required parameters. kvm_device_msix_set_vector also
combines MSI route creation/update with registering the vector with the
device assignment kernel part. The routing information is now stored in
the msix_cache of the backing QEMU PCI device, maintained by the device
assignment code until we switch to generic MSI-X support.
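
The intended call sequence can be sketched roughly as below. This is a
minimal stand-alone model, not the qemu-kvm implementation: the stub_*
functions are hypothetical replacements for kvm_device_msix_init_vectors,
kvm_device_msix_set_vector and kvm_device_msix_assign that only record what
they were asked to do, MSIMessage and MSIRoutingCache are reduced to the
fields used in this patch, and the cache array is assumed to be sized for
the whole table so that indexing by table position stays in bounds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified versions of the series' types (only the fields used here). */
typedef struct { uint64_t address; uint32_t data; } MSIMessage;
typedef struct { int kvm_gsi; } MSIRoutingCache;

/* Hypothetical stubs: they model the three services without any ioctl. */
static unsigned stub_vectors_announced;
static unsigned stub_vectors_set;
static int stub_next_gsi = 24;

static int stub_msix_init_vectors(uint32_t dev_id, uint32_t nr_vectors)
{
    (void)dev_id;
    stub_vectors_announced = nr_vectors;
    stub_vectors_set = 0;
    return 0;
}

static int stub_msix_set_vector(uint32_t dev_id, uint32_t vector,
                                const MSIMessage *msg, MSIRoutingCache *cache)
{
    (void)dev_id;
    (void)vector;
    assert(msg->data != 0);
    cache->kvm_gsi = stub_next_gsi++;   /* models route creation/update */
    stub_vectors_set++;                 /* models vector registration */
    return 0;
}

static int stub_msix_assign(uint32_t dev_id)
{
    (void)dev_id;
    return stub_vectors_set == stub_vectors_announced ? 0 : -EINVAL;
}

/* Two passes over the table: count the used entries, announce that number,
 * then register each used entry under its table index. */
static int program_msix(uint32_t dev_id, const MSIMessage *table,
                        unsigned max_nr, MSIRoutingCache *cache)
{
    unsigned i, used = 0;
    int r;

    for (i = 0; i < max_nr; i++) {
        if (table[i].data != 0) {
            used++;
        }
    }
    if (used == 0) {
        return -EINVAL;
    }
    r = stub_msix_init_vectors(dev_id, used);
    if (r < 0) {
        return r;
    }
    for (i = 0; i < max_nr && used > 0; i++) {
        if (table[i].data == 0) {
            continue;
        }
        r = stub_msix_set_vector(dev_id, i, &table[i], &cache[i]);
        if (r < 0) {
            return r;
        }
        used--;
    }
    return stub_msix_assign(dev_id);
}
```

The caller only hands over a device ID, a message and a cache slot; the
kvm_assigned_msix_* structures stay inside the service functions, which is
the point of the new API.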

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |  103 +++++++++++++++--------------------------------
 hw/device-assignment.h |    1 -
 qemu-kvm.c             |   42 +++++++++++++++++--
 qemu-kvm.h             |   11 +++--
 4 files changed, 76 insertions(+), 81 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 83951a3..2484afd 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -648,15 +648,13 @@ again:
 
 static QLIST_HEAD(, AssignedDevice) devs = QLIST_HEAD_INITIALIZER(devs);
 
-static void free_dev_irq_entries(AssignedDevice *dev)
+static void invalidate_msix_vectors(AssignedDevice *dev)
 {
     int i;
 
-    for (i = 0; i < dev->irq_entries_nr; i++)
-        kvm_del_routing_entry(&dev->entry[i]);
-    g_free(dev->entry);
-    dev->entry = NULL;
-    dev->irq_entries_nr = 0;
+    for (i = 0; i < dev->irq_entries_nr; i++) {
+        kvm_msi_cache_invalidate(&dev->dev.msix_cache[i]);
+    }
 }
 
 static void free_assigned_device(AssignedDevice *dev)
@@ -701,12 +699,12 @@ static void free_assigned_device(AssignedDevice *dev)
         close(dev->real_device.config_fd);
     }
 
-    free_dev_irq_entries(dev);
-
     if (dev->dev.msi_cache) {
         kvm_msi_cache_invalidate(&dev->dev.msi_cache[0]);
         g_free(dev->dev.msi_cache);
     }
+    invalidate_msix_vectors(dev);
+    g_free(dev->dev.msix_cache);
 }
 
 static uint32_t calc_assigned_dev_id(AssignedDevice *dev)
@@ -953,11 +951,12 @@ static int assigned_dev_set_msix_vectors(PCIDevice *pci_dev)
 {
     AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint16_t entries_nr = 0, entries_max_nr;
-    int pos = 0, i, r = 0;
-    uint32_t msg_addr, msg_upper_addr, msg_data;
-    struct kvm_assigned_msix_nr msix_nr;
-    struct kvm_assigned_msix_entry msix_entry;
     void *msix_page = adev->msix_table_page;
+    uint32_t dev_id;
+    MSIMessage msg;
+    int pos, i, r;
+
+    assert(adev->irq_entries_nr == 0);
 
     pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
@@ -980,72 +979,40 @@ static int assigned_dev_set_msix_vectors(PCIDevice *pci_dev)
         return -EINVAL;
     }
 
-    msix_nr.assigned_dev_id = calc_assigned_dev_id(adev);
-    msix_nr.entry_nr = entries_nr;
-    r = kvm_assign_set_msix_nr(kvm_state, &msix_nr);
-    if (r != 0) {
-        fprintf(stderr, "fail to set MSI-X entry number for MSIX! %s\n",
-                strerror(-r));
+    dev_id = calc_assigned_dev_id(adev);
+
+    r = kvm_device_msix_init_vectors(kvm_state, dev_id, entries_nr);
+    if (r < 0) {
         return r;
     }
-
-    free_dev_irq_entries(adev);
+    pci_dev->msix_cache = g_malloc0(entries_nr * sizeof(MSIRoutingCache));
     adev->irq_entries_nr = entries_nr;
-    adev->entry = g_malloc0(entries_nr * sizeof(*(adev->entry)));
 
-    msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
-    entries_nr = 0;
     for (i = 0; i < entries_max_nr; i++) {
-        if (entries_nr >= msix_nr.entry_nr) {
+        if (entries_nr == 0) {
             break;
         }
-        msg_data = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
+        msg.data = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
                                 PCI_MSIX_ENTRY_DATA);
-        if (msg_data == 0) {
+        if (msg.data == 0) {
             continue;
         }
-        msg_addr = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                                PCI_MSIX_ENTRY_LOWER_ADDR);
-        msg_upper_addr = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                                      PCI_MSIX_ENTRY_UPPER_ADDR);
+        msg.address = pci_get_quad(msix_page + i * PCI_MSIX_ENTRY_SIZE +
+                                   PCI_MSIX_ENTRY_LOWER_ADDR);
 
-        r = kvm_get_irq_route_gsi();
+        r = kvm_device_msix_set_vector(kvm_state, dev_id, i, &msg,
+                                       &pci_dev->msix_cache[i]);
         if (r < 0) {
             return r;
         }
-
-        adev->entry[entries_nr].gsi = r;
-        adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
-        adev->entry[entries_nr].flags = 0;
-        adev->entry[entries_nr].u.msi.address_lo = msg_addr;
-        adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
-        adev->entry[entries_nr].u.msi.data = msg_data;
-        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
-        kvm_add_routing_entry(&adev->entry[entries_nr], NULL);
-
-        msix_entry.gsi = adev->entry[entries_nr].gsi;
-        msix_entry.entry = i;
-        r = kvm_assign_set_msix_entry(kvm_state, &msix_entry);
-        if (r) {
-            fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
-            break;
-        }
-        DEBUG("MSI-X entry gsi 0x%x, entry %d\n!",
-              msix_entry.gsi, msix_entry.entry);
-        entries_nr++;
-    }
-
-    if (r == 0 && kvm_commit_irq_routes() < 0) {
-        perror("assigned_dev_update_msix_mmio: kvm_commit_irq_routes");
-        return -EINVAL;
+        entries_nr--;
     }
 
-    return r;
+    return 0;
 }
 
 static void assigned_dev_update_msix(PCIDevice *pci_dev)
 {
-    struct kvm_assigned_irq assigned_irq_data;
     AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint16_t ctrl_word = pci_get_word(pci_dev->config + pci_dev->msix_cap +
                                       PCI_MSIX_FLAGS);
@@ -1059,7 +1026,10 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
      * MSIX or intends to start. */
     if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
         (ctrl_word & PCI_MSIX_FLAGS_ENABLE)) {
-        free_dev_irq_entries(assigned_dev);
+        invalidate_msix_vectors(assigned_dev);
+        g_free(pci_dev->msix_cache);
+        assigned_dev->irq_entries_nr = 0;
+
         r = kvm_device_irq_deassign(kvm_state, dev_id,
                                     assigned_dev->irq_requested_type);
         /* -ENXIO means no assigned irq */
@@ -1070,21 +1040,17 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
     }
 
     if (ctrl_word & PCI_MSIX_FLAGS_ENABLE) {
-        memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
-        assigned_irq_data.assigned_dev_id = dev_id;
-        assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
-                                  KVM_DEV_IRQ_GUEST_MSIX;
-
         if (assigned_dev_set_msix_vectors(pci_dev) < 0) {
             perror("assigned_dev_update_msix_mmio");
             return;
         }
-        if (kvm_assign_irq(kvm_state, &assigned_irq_data) < 0) {
+        if (kvm_device_msix_assign(kvm_state, dev_id) < 0) {
             perror("assigned_dev_enable_msix: assign irq");
             return;
         }
         assigned_dev->girq = -1;
-        assigned_dev->irq_requested_type = assigned_irq_data.flags;
+        assigned_dev->irq_requested_type = KVM_DEV_IRQ_HOST_MSIX |
+                                           KVM_DEV_IRQ_GUEST_MSIX;
     } else {
         assign_intx(assigned_dev);
     }
@@ -1193,10 +1159,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     }
     /* Expose MSI-X capability */
     pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
-    /* Would really like to test kvm_check_extension(, KVM_CAP_DEVICE_MSIX),
-     * but the kernel doesn't expose it.  Instead do a dummy call to
-     * KVM_ASSIGN_SET_MSIX_NR to see if it exists. */
-    if (pos != 0 && kvm_assign_set_msix_nr(kvm_state, NULL) == -EFAULT) {
+    if (pos != 0 && kvm_device_msix_supported(kvm_state)) {
         int bar_nr;
         uint32_t msix_table_entry;
 
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 1b4aecc..4b67f14 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -107,7 +107,6 @@ typedef struct AssignedDevice {
     uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
     uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
     int irq_entries_nr;
-    struct kvm_irq_routing_entry *entry;
     void *msix_table_page;
     target_phys_addr_t msix_table_addr;
     MemoryRegion mmio;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 27723a6..c9b348c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -617,15 +617,47 @@ int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, MSIMessage *msg,
 }
 
 #ifdef KVM_CAP_DEVICE_MSIX
-int kvm_assign_set_msix_nr(KVMState *s, struct kvm_assigned_msix_nr *msix_nr)
+bool kvm_device_msix_supported(KVMState *s)
 {
-    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, msix_nr);
+    /* Would really like to test kvm_check_extension(, KVM_CAP_DEVICE_MSIX),
+     * but the kernel doesn't expose it.  Instead do a dummy call to
+     * KVM_ASSIGN_SET_MSIX_NR to see if it exists. */
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, NULL) == -EFAULT;
 }
 
-int kvm_assign_set_msix_entry(KVMState *s,
-                              struct kvm_assigned_msix_entry *entry)
+int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
+                                 uint32_t nr_vectors)
 {
-    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
+    struct kvm_assigned_msix_nr msix_nr;
+
+    msix_nr.assigned_dev_id = dev_id;
+    msix_nr.entry_nr = nr_vectors;
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, &msix_nr);
+}
+
+int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
+                               MSIMessage *msg, MSIRoutingCache *cache)
+{
+    struct kvm_assigned_msix_entry msix_entry;
+    int ret;
+
+    ret = kvm_msi_message_update(msg, cache, MSI_ROUTE_STATIC);
+    if (ret < 0) {
+        return ret;
+    }
+    msix_entry.assigned_dev_id = dev_id;
+    msix_entry.gsi = cache->kvm_gsi;
+    msix_entry.entry = vector;
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, &msix_entry);
+}
+
+int kvm_device_msix_assign(KVMState *s, uint32_t dev_id)
+{
+    struct kvm_assigned_irq assigned_irq;
+
+    assigned_irq.assigned_dev_id = dev_id;
+    assigned_irq.flags = KVM_DEV_IRQ_HOST_MSIX | KVM_DEV_IRQ_GUEST_MSIX;
+    return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, &assigned_irq);
 }
 #endif
 
diff --git a/qemu-kvm.h b/qemu-kvm.h
index d987d41..552b668 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -154,6 +154,12 @@ int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
                            uint32_t host_irq_type, uint32_t guest_irq);
 int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, MSIMessage *msg,
                           MSIRoutingCache *cache);
+bool kvm_device_msix_supported(KVMState *s);
+int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
+                                 uint32_t nr_vectors);
+int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
+                               MSIMessage *msg, MSIRoutingCache *cache);
+int kvm_device_msix_assign(KVMState *s, uint32_t dev_id);
 int kvm_device_irq_deassign(KVMState *s, uint32_t dev_id, uint32_t type);
 
 /*!
@@ -204,11 +210,6 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry);
 int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
                              struct kvm_irq_routing_entry *newentry);
 
-
-int kvm_assign_set_msix_nr(KVMState *s, struct kvm_assigned_msix_nr *msix_nr);
-int kvm_assign_set_msix_entry(KVMState *s,
-                              struct kvm_assigned_msix_entry *entry);
-
 #else                           /* !CONFIG_KVM */
 
 struct kvm_pit_state {
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [Qemu-devel] [RFC][PATCH 36/45] qemu-kvm: Factor out kvm_device_msix_* services
@ 2011-10-17  9:28   ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

Create kvm_device_msix_{supported,init_vectors,set_vector,assign},
replacing the old kvm_assign_set_msix_{nr,entry} services. The new API
no longer requires direct fiddling with the KVM API data structures and
just takes the required parameters. kvm_device_msix_set_vector also
combines MSI route creation/update with registering the vector with the
device assignment kernel part. The routing information is now stored in
the msix_cache of the backing QEMU PCI device, maintained by the device
assignment code until we switch to generic MSI-X support.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |  103 +++++++++++++++--------------------------------
 hw/device-assignment.h |    1 -
 qemu-kvm.c             |   42 +++++++++++++++++--
 qemu-kvm.h             |   11 +++--
 4 files changed, 76 insertions(+), 81 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 83951a3..2484afd 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -648,15 +648,13 @@ again:
 
 static QLIST_HEAD(, AssignedDevice) devs = QLIST_HEAD_INITIALIZER(devs);
 
-static void free_dev_irq_entries(AssignedDevice *dev)
+static void invalidate_msix_vectors(AssignedDevice *dev)
 {
     int i;
 
-    for (i = 0; i < dev->irq_entries_nr; i++)
-        kvm_del_routing_entry(&dev->entry[i]);
-    g_free(dev->entry);
-    dev->entry = NULL;
-    dev->irq_entries_nr = 0;
+    for (i = 0; i < dev->irq_entries_nr; i++) {
+        kvm_msi_cache_invalidate(&dev->dev.msix_cache[i]);
+    }
 }
 
 static void free_assigned_device(AssignedDevice *dev)
@@ -701,12 +699,12 @@ static void free_assigned_device(AssignedDevice *dev)
         close(dev->real_device.config_fd);
     }
 
-    free_dev_irq_entries(dev);
-
     if (dev->dev.msi_cache) {
         kvm_msi_cache_invalidate(&dev->dev.msi_cache[0]);
         g_free(dev->dev.msi_cache);
     }
+    invalidate_msix_vectors(dev);
+    g_free(dev->dev.msix_cache);
 }
 
 static uint32_t calc_assigned_dev_id(AssignedDevice *dev)
@@ -953,11 +951,12 @@ static int assigned_dev_set_msix_vectors(PCIDevice *pci_dev)
 {
     AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint16_t entries_nr = 0, entries_max_nr;
-    int pos = 0, i, r = 0;
-    uint32_t msg_addr, msg_upper_addr, msg_data;
-    struct kvm_assigned_msix_nr msix_nr;
-    struct kvm_assigned_msix_entry msix_entry;
     void *msix_page = adev->msix_table_page;
+    uint32_t dev_id;
+    MSIMessage msg;
+    int pos, i, r;
+
+    assert(adev->irq_entries_nr == 0);
 
     pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
@@ -980,72 +979,40 @@ static int assigned_dev_set_msix_vectors(PCIDevice *pci_dev)
         return -EINVAL;
     }
 
-    msix_nr.assigned_dev_id = calc_assigned_dev_id(adev);
-    msix_nr.entry_nr = entries_nr;
-    r = kvm_assign_set_msix_nr(kvm_state, &msix_nr);
-    if (r != 0) {
-        fprintf(stderr, "fail to set MSI-X entry number for MSIX! %s\n",
-                strerror(-r));
+    dev_id = calc_assigned_dev_id(adev);
+
+    r = kvm_device_msix_init_vectors(kvm_state, dev_id, entries_nr);
+    if (r < 0) {
         return r;
     }
-
-    free_dev_irq_entries(adev);
+    pci_dev->msix_cache = g_malloc0(entries_nr * sizeof(MSIRoutingCache));
     adev->irq_entries_nr = entries_nr;
-    adev->entry = g_malloc0(entries_nr * sizeof(*(adev->entry)));
 
-    msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
-    entries_nr = 0;
     for (i = 0; i < entries_max_nr; i++) {
-        if (entries_nr >= msix_nr.entry_nr) {
+        if (entries_nr == 0) {
             break;
         }
-        msg_data = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
+        msg.data = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
                                 PCI_MSIX_ENTRY_DATA);
-        if (msg_data == 0) {
+        if (msg.data == 0) {
             continue;
         }
-        msg_addr = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                                PCI_MSIX_ENTRY_LOWER_ADDR);
-        msg_upper_addr = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                                      PCI_MSIX_ENTRY_UPPER_ADDR);
+        msg.address = pci_get_quad(msix_page + i * PCI_MSIX_ENTRY_SIZE +
+                                   PCI_MSIX_ENTRY_LOWER_ADDR);
 
-        r = kvm_get_irq_route_gsi();
+        r = kvm_device_msix_set_vector(kvm_state, dev_id, i, &msg,
+                                       &pci_dev->msix_cache[i]);
         if (r < 0) {
             return r;
         }
-
-        adev->entry[entries_nr].gsi = r;
-        adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
-        adev->entry[entries_nr].flags = 0;
-        adev->entry[entries_nr].u.msi.address_lo = msg_addr;
-        adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
-        adev->entry[entries_nr].u.msi.data = msg_data;
-        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
-        kvm_add_routing_entry(&adev->entry[entries_nr], NULL);
-
-        msix_entry.gsi = adev->entry[entries_nr].gsi;
-        msix_entry.entry = i;
-        r = kvm_assign_set_msix_entry(kvm_state, &msix_entry);
-        if (r) {
-            fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
-            break;
-        }
-        DEBUG("MSI-X entry gsi 0x%x, entry %d\n!",
-              msix_entry.gsi, msix_entry.entry);
-        entries_nr++;
-    }
-
-    if (r == 0 && kvm_commit_irq_routes() < 0) {
-        perror("assigned_dev_update_msix_mmio: kvm_commit_irq_routes");
-        return -EINVAL;
+        entries_nr--;
     }
 
-    return r;
+    return 0;
 }
 
 static void assigned_dev_update_msix(PCIDevice *pci_dev)
 {
-    struct kvm_assigned_irq assigned_irq_data;
     AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     uint16_t ctrl_word = pci_get_word(pci_dev->config + pci_dev->msix_cap +
                                       PCI_MSIX_FLAGS);
@@ -1059,7 +1026,10 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
      * MSIX or intends to start. */
     if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
         (ctrl_word & PCI_MSIX_FLAGS_ENABLE)) {
-        free_dev_irq_entries(assigned_dev);
+        invalidate_msix_vectors(assigned_dev);
+        g_free(pci_dev->msix_cache);
+        assigned_dev->irq_entries_nr = 0;
+
         r = kvm_device_irq_deassign(kvm_state, dev_id,
                                     assigned_dev->irq_requested_type);
         /* -ENXIO means no assigned irq */
@@ -1070,21 +1040,17 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
     }
 
     if (ctrl_word & PCI_MSIX_FLAGS_ENABLE) {
-        memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
-        assigned_irq_data.assigned_dev_id = dev_id;
-        assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
-                                  KVM_DEV_IRQ_GUEST_MSIX;
-
         if (assigned_dev_set_msix_vectors(pci_dev) < 0) {
             perror("assigned_dev_update_msix_mmio");
             return;
         }
-        if (kvm_assign_irq(kvm_state, &assigned_irq_data) < 0) {
+        if (kvm_device_msix_assign(kvm_state, dev_id) < 0) {
             perror("assigned_dev_enable_msix: assign irq");
             return;
         }
         assigned_dev->girq = -1;
-        assigned_dev->irq_requested_type = assigned_irq_data.flags;
+        assigned_dev->irq_requested_type = KVM_DEV_IRQ_HOST_MSIX |
+                                           KVM_DEV_IRQ_GUEST_MSIX;
     } else {
         assign_intx(assigned_dev);
     }
@@ -1193,10 +1159,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     }
     /* Expose MSI-X capability */
     pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
-    /* Would really like to test kvm_check_extension(, KVM_CAP_DEVICE_MSIX),
-     * but the kernel doesn't expose it.  Instead do a dummy call to
-     * KVM_ASSIGN_SET_MSIX_NR to see if it exists. */
-    if (pos != 0 && kvm_assign_set_msix_nr(kvm_state, NULL) == -EFAULT) {
+    if (pos != 0 && kvm_device_msix_supported(kvm_state)) {
         int bar_nr;
         uint32_t msix_table_entry;
 
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 1b4aecc..4b67f14 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -107,7 +107,6 @@ typedef struct AssignedDevice {
     uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
     uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
     int irq_entries_nr;
-    struct kvm_irq_routing_entry *entry;
     void *msix_table_page;
     target_phys_addr_t msix_table_addr;
     MemoryRegion mmio;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 27723a6..c9b348c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -617,15 +617,47 @@ int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, MSIMessage *msg,
 }
 
 #ifdef KVM_CAP_DEVICE_MSIX
-int kvm_assign_set_msix_nr(KVMState *s, struct kvm_assigned_msix_nr *msix_nr)
+bool kvm_device_msix_supported(KVMState *s)
 {
-    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, msix_nr);
+    /* Would really like to test kvm_check_extension(, KVM_CAP_DEVICE_MSIX),
+     * but the kernel doesn't expose it.  Instead do a dummy call to
+     * KVM_ASSIGN_SET_MSIX_NR to see if it exists. */
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, NULL) == -EFAULT;
 }
 
-int kvm_assign_set_msix_entry(KVMState *s,
-                              struct kvm_assigned_msix_entry *entry)
+int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
+                                 uint32_t nr_vectors)
 {
-    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
+    struct kvm_assigned_msix_nr msix_nr;
+
+    msix_nr.assigned_dev_id = dev_id;
+    msix_nr.entry_nr = nr_vectors;
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, &msix_nr);
+}
+
+int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
+                               MSIMessage *msg, MSIRoutingCache *cache)
+{
+    struct kvm_assigned_msix_entry msix_entry;
+    int ret;
+
+    ret = kvm_msi_message_update(msg, cache, MSI_ROUTE_STATIC);
+    if (ret < 0) {
+        return ret;
+    }
+    msix_entry.assigned_dev_id = dev_id;
+    msix_entry.gsi = cache->kvm_gsi;
+    msix_entry.entry = vector;
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, &msix_entry);
+}
+
+int kvm_device_msix_assign(KVMState *s, uint32_t dev_id)
+{
+    struct kvm_assigned_irq assigned_irq;
+
+    assigned_irq.assigned_dev_id = dev_id;
+    assigned_irq.flags = KVM_DEV_IRQ_HOST_MSIX | KVM_DEV_IRQ_GUEST_MSIX;
+    return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, &assigned_irq);
 }
 #endif
 
diff --git a/qemu-kvm.h b/qemu-kvm.h
index d987d41..552b668 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -154,6 +154,12 @@ int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
                            uint32_t host_irq_type, uint32_t guest_irq);
 int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, MSIMessage *msg,
                           MSIRoutingCache *cache);
+bool kvm_device_msix_supported(KVMState *s);
+int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
+                                 uint32_t nr_vectors);
+int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
+                               MSIMessage *msg, MSIRoutingCache *cache);
+int kvm_device_msix_assign(KVMState *s, uint32_t dev_id);
 int kvm_device_irq_deassign(KVMState *s, uint32_t dev_id, uint32_t type);
 
 /*!
@@ -204,11 +210,6 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry);
 int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
                              struct kvm_irq_routing_entry *newentry);
 
-
-int kvm_assign_set_msix_nr(KVMState *s, struct kvm_assigned_msix_nr *msix_nr);
-int kvm_assign_set_msix_entry(KVMState *s,
-                              struct kvm_assigned_msix_entry *entry);
-
 #else                           /* !CONFIG_KVM */
 
 struct kvm_pit_state {
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 37/45] qemu-kvm: Clean up irqrouting API
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Drop unused functions, privatize those which are only used internally now.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
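Note for readers (illustration only, not part of the patch): "privatize"
here means giving a helper internal linkage with `static` and dropping its
prototype from the header, so that only the higher-level entry point stays
callable from other files. The sketch below shows that shape in the spirit
of kvm_add_irq_route() staying exported while kvm_add_routing_entry()
becomes static. All toy_* names and the fixed-size table are made up for
this note and are not qemu-kvm code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ROUTES 8

struct toy_route {
    int gsi;
    int irqchip;
    int pin;
};

/* File-local state: invisible to other translation units. */
static struct toy_route toy_routes[TOY_MAX_ROUTES];
static size_t toy_nr_routes;

/* Internal helper. `static` keeps it out of the exported symbol table,
 * so no prototype is needed (or wanted) in a header. */
static int toy_add_routing_entry(const struct toy_route *entry)
{
    assert(entry != NULL);

    if (toy_nr_routes == TOY_MAX_ROUTES) {
        return -1; /* table full */
    }
    toy_routes[toy_nr_routes++] = *entry;
    return 0;
}

/* Public entry point: the only function a header would declare. */
int toy_add_irq_route(int gsi, int irqchip, int pin)
{
    struct toy_route entry = { .gsi = gsi, .irqchip = irqchip, .pin = pin };

    return toy_add_routing_entry(&entry);
}

/* Public accessor so callers never touch the table directly. */
size_t toy_route_count(void)
{
    return toy_nr_routes;
}
```

A caller in another file can use toy_add_irq_route() and toy_route_count(),
but a direct call to toy_add_routing_entry() no longer links.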
 kvm-stub.c |   10 ----------
 kvm.h      |    1 -
 qemu-kvm.c |   37 ++++++-------------------------------
 qemu-kvm.h |   39 ---------------------------------------
 4 files changed, 6 insertions(+), 81 deletions(-)

diff --git a/kvm-stub.c b/kvm-stub.c
index acd1446..a4225e0 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -135,20 +135,10 @@ int kvm_has_gsi_routing(void)
     return 0;
 }
 
-int kvm_get_irq_route_gsi(void)
-{
-    return -ENOSYS;
-}
-
 void kvm_msi_cache_invalidate(MSIRoutingCache *cache)
 {
 }
 
-int kvm_commit_irq_routes(void)
-{
-    return -ENOSYS;
-}
-
 int kvm_set_irq(int irq, int level, int *status)
 {
     assert(0);
diff --git a/kvm.h b/kvm.h
index 61bcfec..9780e53 100644
--- a/kvm.h
+++ b/kvm.h
@@ -202,7 +202,6 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign);
 
 int kvm_has_gsi_routing(void);
 int kvm_allows_irq0_override(void);
-int kvm_get_irq_route_gsi(void);
 
 void kvm_msi_cache_invalidate(MSIRoutingCache *cache);
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index c9b348c..34aebe5 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -188,12 +188,6 @@ int kvm_assign_pci_device(KVMState *s,
     return kvm_vm_ioctl(s, KVM_ASSIGN_PCI_DEVICE, assigned_dev);
 }
 
-static int kvm_old_assign_irq(KVMState *s,
-                              struct kvm_assigned_irq *assigned_irq)
-{
-    return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, assigned_irq);
-}
-
 int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
                            uint32_t host_irq_type, uint32_t guest_irq)
 {
@@ -210,25 +204,6 @@ int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
         return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, &assigned_irq);
     }
 }
-
-#ifdef KVM_CAP_ASSIGN_DEV_IRQ
-int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
-{
-    int ret;
-
-    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_ASSIGN_DEV_IRQ);
-    if (ret > 0) {
-        return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, assigned_irq);
-    }
-
-    return kvm_old_assign_irq(s, assigned_irq);
-}
-#else
-int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
-{
-    return kvm_old_assign_irq(s, assigned_irq);
-}
-#endif
 #endif
 
 int kvm_device_irq_deassign(KVMState *s, uint32_t dev_id, uint32_t type)
@@ -275,8 +250,8 @@ int kvm_has_gsi_routing(void)
     return r;
 }
 
-int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry,
-                          MSIRoutingCache *msi_cache)
+static int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry,
+                                 MSIRoutingCache *msi_cache)
 {
 #ifdef KVM_CAP_IRQ_ROUTING
     KVMState *s = kvm_state;
@@ -328,7 +303,7 @@ int kvm_add_irq_route(int gsi, int irqchip, int pin)
 #endif
 }
 
-int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry)
+static int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry)
 {
 #ifdef KVM_CAP_IRQ_ROUTING
     KVMState *s = kvm_state;
@@ -398,8 +373,8 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry)
 #endif
 }
 
-int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
-                             struct kvm_irq_routing_entry *newentry)
+static int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
+                                    struct kvm_irq_routing_entry *newentry)
 {
 #ifdef KVM_CAP_IRQ_ROUTING
     KVMState *s = kvm_state;
@@ -456,7 +431,7 @@ int kvm_commit_irq_routes(void)
 
 static void kvm_msi_cache_flush(KVMState *s);
 
-int kvm_get_irq_route_gsi(void)
+static int kvm_get_irq_route_gsi(void)
 {
     KVMState *s = kvm_state;
     int i, bit;
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 552b668..6b73ce1 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -139,17 +139,6 @@ int kvm_enable_vapic(CPUState *env, uint64_t vapic);
 int kvm_assign_pci_device(KVMState *s,
                           struct kvm_assigned_pci_dev *assigned_dev);
 
-/*!
- * \brief Assign IRQ for an assigned device
- *
- * Used for PCI device assignment, this function assigns IRQ numbers for
- * an physical device and guest IRQ handling.
- *
- * \param kvm Pointer to the current kvm_context
- * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
- */
-int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
-
 int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
                            uint32_t host_irq_type, uint32_t guest_irq);
 int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, MSIMessage *msg,
@@ -182,34 +171,6 @@ int kvm_deassign_pci_device(KVMState *s,
  */
 int kvm_add_irq_route(int gsi, int irqchip, int pin);
 
-struct kvm_irq_routing_entry;
-/*!
- * \brief Adds a routing entry to the temporary irq routing table
- *
- * Adds a filled routing entry to the temporary irq routing table. Nothing is
- * committed to the running VM.
- */
-int kvm_add_routing_entry(struct kvm_irq_routing_entry *entry,
-                          MSIRoutingCache *msi_cache);
-
-/*!
- * \brief Removes a routing from the temporary irq routing table
- *
- * Remove a routing to the temporary irq routing table.  Nothing is
- * committed to the running VM.
- */
-int kvm_del_routing_entry(struct kvm_irq_routing_entry *entry);
-
-/*!
- * \brief Updates a routing in the temporary irq routing table
- *
- * Update a routing in the temporary irq routing table
- * with a new value. entry type and GSI can not be changed.
- * Nothing is committed to the running VM.
- */
-int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
-                             struct kvm_irq_routing_entry *newentry);
-
 #else                           /* !CONFIG_KVM */
 
 struct kvm_pit_state {
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 38/45] msi: Implement config notifiers for legacy MSI
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Realize support for MSI config notifiers analogously to MSI-X. The logic
is slightly more complex for legacy MSI as per-vector masking is optional
here. Device assignment will be the first user.

Note that this change does not introduce per-vector masking support.
This can be added at some later point, using the notifications the
MSI layer provides now.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
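Note for readers (illustration only, not part of the patch): a minimal,
self-contained model of the notification order that msi_write_config()
follows for the enable bit after this change. The enable notifier fires
only when the bit actually changes, and on a 0 -> 1 transition the vector
config notifier is replayed for every allocated vector together with its
current mask state. ToyDev, the toy_* functions and the bookkeeping fields
are assumptions made for this sketch; only the enable bit value mirrors
PCI_MSI_FLAGS_ENABLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MSI_FLAGS_ENABLE 0x0001 /* same bit as PCI_MSI_FLAGS_ENABLE */

typedef struct ToyDev ToyDev;
typedef void (*ToyEnableNotifier)(ToyDev *dev, bool enabled);
typedef int (*ToyVectorNotifier)(ToyDev *dev, unsigned int vector,
                                 bool masked);

struct ToyDev {
    uint16_t flags;            /* flags word after the guest write */
    uint32_t mask;             /* per-vector mask bits, bit n = vector n */
    unsigned int nr_vectors;
    ToyEnableNotifier enable_notifier;
    ToyVectorNotifier vector_notifier;
    /* bookkeeping filled in by the demo notifiers below */
    int enable_calls;
    int vector_calls;
    uint32_t masked_seen;
};

/* Hypothetical stand-in for the enable/disable part of msi_write_config(). */
void toy_flags_written(ToyDev *dev, uint16_t old_flags)
{
    bool enabled = dev->flags & TOY_MSI_FLAGS_ENABLE;
    bool was_enabled = old_flags & TOY_MSI_FLAGS_ENABLE;
    unsigned int vector;

    if (!dev->enable_notifier || was_enabled == enabled) {
        return; /* nobody listening, or the enable bit did not change */
    }
    dev->enable_notifier(dev, enabled);
    if (!enabled || !dev->vector_notifier) {
        return;
    }
    /* On a 0 -> 1 transition, replay the configuration of every vector. */
    for (vector = 0; vector < dev->nr_vectors; ++vector) {
        bool masked = (dev->mask >> vector) & 1;
        int ret = dev->vector_notifier(dev, vector, masked);

        assert(ret >= 0);
    }
}

/* Demo notifiers that only record what they were told. */
void toy_count_enable(ToyDev *dev, bool enabled)
{
    (void)enabled;
    dev->enable_calls++;
}

int toy_count_vector(ToyDev *dev, unsigned int vector, bool masked)
{
    dev->vector_calls++;
    if (masked) {
        dev->masked_seen |= 1U << vector;
    }
    return 0;
}
```

In the patch itself the vector notifier is additionally fired when the
address or data registers are written, and per vector when its mask bit
changes; the model leaves those paths out to keep the ordering visible.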
 hw/msi.c |  171 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 hw/msi.h |    7 ++-
 hw/pci.c |    2 +-
 hw/pci.h |    3 +
 4 files changed, 166 insertions(+), 17 deletions(-)
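Note for readers (illustration only, not part of the patch): with this
change msi_write_config() receives the old contents of the written config
bytes instead of the new value, and recovers the old state of a register
by shifting old_val right by (register offset - write address) * 8. The
sketch below isolates that step. The toy_* names are hypothetical, and
toy_range_covers_byte() is only modeled on the range_covers_byte() helper
the patch calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `byte` lies inside the write [addr, addr + len). */
bool toy_range_covers_byte(uint32_t addr, uint32_t len, uint32_t byte)
{
    return byte >= addr && byte - addr < len;
}

/*
 * old_val holds the previous contents of the `len` bytes at `addr`,
 * little-endian, as PCI config space is. Store the old byte found at
 * `reg_off` in *out and return true, or return false if the write did
 * not touch that byte.
 */
bool toy_old_byte_at(uint32_t addr, uint32_t len, uint32_t old_val,
                     uint32_t reg_off, uint8_t *out)
{
    assert(out != NULL);
    assert(len >= 1 && len <= 4);

    if (!toy_range_covers_byte(addr, len, reg_off)) {
        return false;
    }
    /* reg_off >= addr is guaranteed here, so the shift count is 0..24. */
    *out = (uint8_t)(old_val >> ((reg_off - addr) * 8));
    return true;
}
```

Worked example, assuming the capability sits at offset 0x50 so that the
flags word is at 0x52: for a 4-byte write at 0x50 with old_val = 0x00817005
the old low flags byte is (0x00817005 >> 16) & 0xff = 0x81, i.e. the enable
bit was set before the write. The shift is only well defined when the
register offset is not below the write address. range_covers_byte()
ensures that for the flags word; the mask register path is guarded by
ranges_overlap(), which on its own appears not to guarantee it for
sub-dword writes, so that case may be worth double-checking in review.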

diff --git a/hw/msi.c b/hw/msi.c
index 23d79dd..2380ee3 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -241,15 +241,15 @@ void msi_uninit(struct PCIDevice *dev)
 
 void msi_reset(PCIDevice *dev)
 {
-    uint16_t flags;
+    uint16_t flags, old_flags;
     bool msi64bit;
 
     if (!msi_present(dev)) {
         return;
     }
 
-    flags = pci_get_word(dev->config + msi_flags_off(dev));
-    flags &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
+    old_flags = pci_get_word(dev->config + msi_flags_off(dev));
+    flags = old_flags & ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
     msi64bit = flags & PCI_MSI_FLAGS_64BIT;
 
     pci_set_word(dev->config + msi_flags_off(dev), flags);
@@ -262,6 +262,8 @@ void msi_reset(PCIDevice *dev)
         pci_set_long(dev->config + msi_mask_off(dev, msi64bit), 0);
         pci_set_long(dev->config + msi_pending_off(dev, msi64bit), 0);
     }
+    /* trigger notifier on potential changes */
+    msi_write_config(dev, msi_flags_off(dev), old_flags, 2);
     MSI_DEV_PRINTF(dev, "reset\n");
 }
 
@@ -306,16 +308,20 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
 }
 
 /* Normally called by pci_default_write_config(). */
-void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
+void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t old_val, int len)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
     bool msi_per_vector_mask = flags & PCI_MSI_FLAGS_MASKBIT;
+    bool fire_vector_notifier = false;
     unsigned int nr_vectors;
     uint8_t log_num_vecs;
     uint8_t log_max_vecs;
     unsigned int vector;
     uint32_t pending;
+    MSIMessage msg;
+    bool enabled;
+    int ret;
 
     if (!msi_present(dev) ||
         !ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
@@ -342,7 +348,35 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
     fprintf(stderr, "\n");
 #endif
 
-    if (!(flags & PCI_MSI_FLAGS_ENABLE)) {
+    enabled = flags & PCI_MSI_FLAGS_ENABLE;
+    nr_vectors = msi_nr_vectors(flags);
+
+    if (dev->msi_enable_notifier &&
+        range_covers_byte(addr, len, msi_flags_off(dev))) {
+        old_val >>= (msi_flags_off(dev) - addr) * 8;
+        if ((old_val & PCI_MSI_FLAGS_ENABLE) != enabled) {
+            dev->msi_enable_notifier(dev, enabled);
+            if (enabled && dev->msi_vector_config_notifier) {
+                fire_vector_notifier = true;
+            }
+        }
+    }
+    if (dev->msi_vector_config_notifier) {
+        if (ranges_overlap(addr, len, msi_address_lo_off(dev),
+                   msi64bit ? 10 : 6)) {
+            fire_vector_notifier = true;
+        }
+    }
+    if (fire_vector_notifier) {
+        for (vector = 0; vector < nr_vectors; ++vector) {
+            msi_message_from_vector(dev, flags, vector, &msg);
+            ret = dev->msi_vector_config_notifier(dev, vector, &msg,
+                                                  msi_is_masked(dev, vector));
+            assert(ret >= 0);
+        }
+    }
+
+    if (!enabled) {
         kvm_msi_free(dev);
         return;
     }
@@ -375,13 +409,12 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
         pci_set_word(dev->config + msi_flags_off(dev), flags);
     }
 
-    if (!msi_per_vector_mask) {
-        /* if per vector masking isn't supported,
-           there is no pending interrupt. */
+    if (!msi_per_vector_mask ||
+        !ranges_overlap(addr, len, msi_mask_off(dev, msi64bit), 4)) {
         return;
     }
 
-    nr_vectors = msi_nr_vectors(flags);
+    old_val >>= (msi_mask_off(dev, msi64bit) - addr) * 8;
 
     /* This will discard pending interrupts, if any. */
     pending = pci_get_long(dev->config + msi_pending_off(dev, msi64bit));
@@ -390,13 +423,22 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
 
     /* deliver pending interrupts which are unmasked */
     for (vector = 0; vector < nr_vectors; ++vector) {
-        if (msi_is_masked(dev, vector) || !(pending & (1U << vector))) {
-            continue;
+        bool is_masked = msi_is_masked(dev, vector);
+        unsigned int vector_mask = 1U << vector;
+
+        if (!fire_vector_notifier && dev->msi_vector_config_notifier &&
+            (bool)(old_val & vector_mask) != is_masked) {
+            msi_message_from_vector(dev, flags, vector, &msg);
+            ret = dev->msi_vector_config_notifier(dev, vector, &msg,
+                                                  is_masked);
+            assert(ret >= 0);
+        }
+        if (!is_masked && pending & vector_mask) {
+            pci_long_test_and_clear_mask(dev->config +
+                                         msi_pending_off(dev, msi64bit),
+                                         vector_mask);
+            msi_notify(dev, vector);
         }
-
-        pci_long_test_and_clear_mask(
-            dev->config + msi_pending_off(dev, msi64bit), 1U << vector);
-        msi_notify(dev, vector);
     }
 }
 
@@ -405,3 +447,102 @@ unsigned int msi_nr_vectors_allocated(const PCIDevice *dev)
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     return msi_nr_vectors(flags);
 }
+
+/* Invoke the notifier if vector entry is unmasked. */
+static int
+msi_notify_if_unmasked(PCIDevice *dev, unsigned int vector, int masked)
+{
+    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
+    MSIMessage msg;
+
+    assert(dev->msi_vector_config_notifier);
+
+    if (msi_is_masked(dev, vector)) {
+        return 0;
+    }
+    msi_message_from_vector(dev, flags, vector, &msg);
+    return dev->msi_vector_config_notifier(dev, vector, &msg, masked);
+}
+
+static int
+msi_set_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
+{
+    /* Notifier has been set. Invoke it on unmasked vectors. */
+    return msi_notify_if_unmasked(dev, vector, 0);
+}
+
+static int
+msi_unset_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
+{
+    /* Notifier will be unset. Invoke it to mask unmasked entries. */
+    return msi_notify_if_unmasked(dev, vector, 1);
+}
+
+int msi_set_config_notifiers(PCIDevice *dev, MSIEnableNotifier enable_notifier,
+                             MSIVectorConfigNotifier vector_config_notifier)
+{
+    unsigned int nr_vectors;
+    int r, vector;
+
+    assert(!dev->msi_vector_config_notifier);
+
+    dev->msi_enable_notifier = enable_notifier;
+    dev->msi_vector_config_notifier = vector_config_notifier;
+
+    if (enable_notifier && msi_enabled(dev)) {
+        enable_notifier(dev, true);
+    }
+    if (msi_enabled(dev)) {
+        nr_vectors =
+            msi_nr_vectors(pci_get_word(dev->config + msi_flags_off(dev)));
+        for (vector = 0; vector < nr_vectors; ++vector) {
+            r = msi_set_config_notifier_for_vector(dev, vector);
+            if (r < 0) {
+                goto undo;
+            }
+        }
+    }
+    return 0;
+
+undo:
+    while (--vector >= 0) {
+        msi_unset_config_notifier_for_vector(dev, vector);
+    }
+    if (enable_notifier && msi_enabled(dev)) {
+        enable_notifier(dev, false);
+    }
+    dev->msi_enable_notifier = NULL;
+    dev->msi_vector_config_notifier = NULL;
+    return r;
+}
+
+int msi_unset_config_notifiers(PCIDevice *dev)
+{
+    unsigned int nr_vectors;
+    int r, vector;
+
+    assert(dev->msi_vector_config_notifier);
+
+    if (msi_enabled(dev)) {
+        nr_vectors =
+            msi_nr_vectors(pci_get_word(dev->config + msi_flags_off(dev)));
+        for (vector = 0; vector < nr_vectors; ++vector) {
+            r = msi_unset_config_notifier_for_vector(dev, vector);
+            if (r < 0) {
+                goto undo;
+            }
+        }
+    }
+    if (dev->msi_enable_notifier && msi_enabled(dev)) {
+        dev->msi_enable_notifier(dev, false);
+    }
+    dev->msi_enable_notifier = NULL;
+    dev->msi_vector_config_notifier = NULL;
+    return 0;
+
+undo:
+    while (--vector >= 0) {
+        msi_set_config_notifier_for_vector(dev, vector);
+    }
+    return r;
+}
diff --git a/hw/msi.h b/hw/msi.h
index 74f6d52..c28665b 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -50,9 +50,14 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
 void msi_uninit(struct PCIDevice *dev);
 void msi_reset(PCIDevice *dev);
 void msi_notify(PCIDevice *dev, unsigned int vector);
-void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len);
+void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t old_val,
+                      int len);
 unsigned int msi_nr_vectors_allocated(const PCIDevice *dev);
 
+int msi_set_config_notifiers(PCIDevice *dev, MSIEnableNotifier enable_notifier,
+                             MSIVectorConfigNotifier vector_config_notifier);
+int msi_unset_config_notifiers(PCIDevice *dev);
+
 static inline bool msi_present(const PCIDevice *dev)
 {
     return dev->cap_present & QEMU_PCI_CAP_MSI;
diff --git a/hw/pci.c b/hw/pci.c
index 4f0d7e1..96cd334 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1155,7 +1155,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
     if (range_covers_byte(addr, l, PCI_COMMAND))
         pci_update_irq_disabled(d, was_irq_disabled);
 
-    msi_write_config(d, addr, val, l);
+    msi_write_config(d, addr, old_val, l);
     msix_write_config(d, addr, old_val, l);
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index 5cf9a16..266fe34 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -206,6 +206,9 @@ struct PCIDevice {
      * on the rest of the region. */
     target_phys_addr_t msix_page_size;
 
+    MSIEnableNotifier msi_enable_notifier;
+    MSIVectorConfigNotifier msi_vector_config_notifier;
+
     MSIEnableNotifier msix_enable_notifier;
     MSIVectorConfigNotifier msix_vector_config_notifier;
 };
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [Qemu-devel] [RFC][PATCH 38/45] msi: Implement config notifiers for legacy MSI
@ 2011-10-17  9:28   ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

Realize support for MSI config notifiers analogously to MSI-X. The logic
is slightly more complex for legacy MSI as per-vector masking is optional
here. Device assignment will be the first user.

Note that this change does not introduce per-vector masking support.
This can be added at some later point, using the notifications the
MSI layer provides now.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msi.c |  171 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 hw/msi.h |    7 ++-
 hw/pci.c |    2 +-
 hw/pci.h |    3 +
 4 files changed, 166 insertions(+), 17 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 23d79dd..2380ee3 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -241,15 +241,15 @@ void msi_uninit(struct PCIDevice *dev)
 
 void msi_reset(PCIDevice *dev)
 {
-    uint16_t flags;
+    uint16_t flags, old_flags;
     bool msi64bit;
 
     if (!msi_present(dev)) {
         return;
     }
 
-    flags = pci_get_word(dev->config + msi_flags_off(dev));
-    flags &= ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
+    old_flags = pci_get_word(dev->config + msi_flags_off(dev));
+    flags = old_flags & ~(PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
     msi64bit = flags & PCI_MSI_FLAGS_64BIT;
 
     pci_set_word(dev->config + msi_flags_off(dev), flags);
@@ -262,6 +262,8 @@ void msi_reset(PCIDevice *dev)
         pci_set_long(dev->config + msi_mask_off(dev, msi64bit), 0);
         pci_set_long(dev->config + msi_pending_off(dev, msi64bit), 0);
     }
+    /* trigger notifier on potential changes */
+    msi_write_config(dev, msi_flags_off(dev), old_flags, 2);
     MSI_DEV_PRINTF(dev, "reset\n");
 }
 
@@ -306,16 +308,20 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
 }
 
 /* Normally called by pci_default_write_config(). */
-void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
+void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t old_val, int len)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
     bool msi_per_vector_mask = flags & PCI_MSI_FLAGS_MASKBIT;
+    bool fire_vector_notifier = false;
     unsigned int nr_vectors;
     uint8_t log_num_vecs;
     uint8_t log_max_vecs;
     unsigned int vector;
     uint32_t pending;
+    MSIMessage msg;
+    bool enabled;
+    int ret;
 
     if (!msi_present(dev) ||
         !ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
@@ -342,7 +348,35 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
     fprintf(stderr, "\n");
 #endif
 
-    if (!(flags & PCI_MSI_FLAGS_ENABLE)) {
+    enabled = flags & PCI_MSI_FLAGS_ENABLE;
+    nr_vectors = msi_nr_vectors(flags);
+
+    if (dev->msi_enable_notifier &&
+        range_covers_byte(addr, len, msi_flags_off(dev))) {
+        old_val >>= (msi_flags_off(dev) - addr) * 8;
+        if ((old_val & PCI_MSI_FLAGS_ENABLE) != enabled) {
+            dev->msi_enable_notifier(dev, enabled);
+            if (enabled && dev->msi_vector_config_notifier) {
+                fire_vector_notifier = true;
+            }
+        }
+    }
+    if (dev->msi_vector_config_notifier) {
+        if (ranges_overlap(addr, len, msi_address_lo_off(dev),
+                   msi64bit ? 10 : 6)) {
+            fire_vector_notifier = true;
+        }
+    }
+    if (fire_vector_notifier) {
+        for (vector = 0; vector < nr_vectors; ++vector) {
+            msi_message_from_vector(dev, flags, vector, &msg);
+            ret = dev->msi_vector_config_notifier(dev, vector, &msg,
+                                                  msi_is_masked(dev, vector));
+            assert(ret >= 0);
+        }
+    }
+
+    if (!enabled) {
         kvm_msi_free(dev);
         return;
     }
@@ -375,13 +409,12 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
         pci_set_word(dev->config + msi_flags_off(dev), flags);
     }
 
-    if (!msi_per_vector_mask) {
-        /* if per vector masking isn't supported,
-           there is no pending interrupt. */
+    if (!msi_per_vector_mask ||
+        !ranges_overlap(addr, len, msi_mask_off(dev, msi64bit), 4)) {
         return;
     }
 
-    nr_vectors = msi_nr_vectors(flags);
+    old_val >>= (msi_mask_off(dev, msi64bit) - addr) * 8;
 
     /* This will discard pending interrupts, if any. */
     pending = pci_get_long(dev->config + msi_pending_off(dev, msi64bit));
@@ -390,13 +423,22 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
 
     /* deliver pending interrupts which are unmasked */
     for (vector = 0; vector < nr_vectors; ++vector) {
-        if (msi_is_masked(dev, vector) || !(pending & (1U << vector))) {
-            continue;
+        bool is_masked = msi_is_masked(dev, vector);
+        unsigned int vector_mask = 1U << vector;
+
+        if (!fire_vector_notifier && dev->msi_vector_config_notifier &&
+            (bool)(old_val & vector_mask) != is_masked) {
+            msi_message_from_vector(dev, flags, vector, &msg);
+            ret = dev->msi_vector_config_notifier(dev, vector, &msg,
+                                                  is_masked);
+            assert(ret >= 0);
+        }
+        if (!is_masked && pending & vector_mask) {
+            pci_long_test_and_clear_mask(dev->config +
+                                         msi_pending_off(dev, msi64bit),
+                                         vector_mask);
+            msi_notify(dev, vector);
         }
-
-        pci_long_test_and_clear_mask(
-            dev->config + msi_pending_off(dev, msi64bit), 1U << vector);
-        msi_notify(dev, vector);
     }
 }
 
@@ -405,3 +447,102 @@ unsigned int msi_nr_vectors_allocated(const PCIDevice *dev)
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     return msi_nr_vectors(flags);
 }
+
+/* Invoke the notifier if vector entry is unmasked. */
+static int
+msi_notify_if_unmasked(PCIDevice *dev, unsigned int vector, int masked)
+{
+    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
+    MSIMessage msg;
+
+    assert(dev->msi_vector_config_notifier);
+
+    if (msi_is_masked(dev, vector)) {
+        return 0;
+    }
+    msi_message_from_vector(dev, flags, vector, &msg);
+    return dev->msi_vector_config_notifier(dev, vector, &msg, masked);
+}
+
+static int
+msi_set_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
+{
+    /* Notifier has been set. Invoke it on unmasked vectors. */
+    return msi_notify_if_unmasked(dev, vector, 0);
+}
+
+static int
+msi_unset_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
+{
+    /* Notifier will be unset. Invoke it to mask unmasked entries. */
+    return msi_notify_if_unmasked(dev, vector, 1);
+}
+
+int msi_set_config_notifiers(PCIDevice *dev, MSIEnableNotifier enable_notifier,
+                             MSIVectorConfigNotifier vector_config_notifier)
+{
+    unsigned int nr_vectors;
+    int r, vector;
+
+    assert(!dev->msi_vector_config_notifier);
+
+    dev->msi_enable_notifier = enable_notifier;
+    dev->msi_vector_config_notifier = vector_config_notifier;
+
+    if (enable_notifier && msi_enabled(dev)) {
+        enable_notifier(dev, true);
+    }
+    if (msi_enabled(dev)) {
+        nr_vectors =
+            msi_nr_vectors(pci_get_word(dev->config + msi_flags_off(dev)));
+        for (vector = 0; vector < nr_vectors; ++vector) {
+            r = msi_set_config_notifier_for_vector(dev, vector);
+            if (r < 0) {
+                goto undo;
+            }
+        }
+    }
+    return 0;
+
+undo:
+    while (--vector >= 0) {
+        msi_unset_config_notifier_for_vector(dev, vector);
+    }
+    if (enable_notifier && msi_enabled(dev)) {
+        enable_notifier(dev, false);
+    }
+    dev->msi_enable_notifier = NULL;
+    dev->msi_vector_config_notifier = NULL;
+    return r;
+}
+
+int msi_unset_config_notifiers(PCIDevice *dev)
+{
+    unsigned int nr_vectors;
+    int r, vector;
+
+    assert(dev->msi_vector_config_notifier);
+
+    if (msi_enabled(dev)) {
+        nr_vectors =
+            msi_nr_vectors(pci_get_word(dev->config + msi_flags_off(dev)));
+        for (vector = 0; vector < nr_vectors; ++vector) {
+            r = msi_unset_config_notifier_for_vector(dev, vector);
+            if (r < 0) {
+                goto undo;
+            }
+        }
+    }
+    if (dev->msi_enable_notifier && msi_enabled(dev)) {
+        dev->msi_enable_notifier(dev, false);
+    }
+    dev->msi_enable_notifier = NULL;
+    dev->msi_vector_config_notifier = NULL;
+    return 0;
+
+undo:
+    while (--vector >= 0) {
+        msi_set_config_notifier_for_vector(dev, vector);
+    }
+    return r;
+}
diff --git a/hw/msi.h b/hw/msi.h
index 74f6d52..c28665b 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -50,9 +50,14 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
 void msi_uninit(struct PCIDevice *dev);
 void msi_reset(PCIDevice *dev);
 void msi_notify(PCIDevice *dev, unsigned int vector);
-void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len);
+void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t old_val,
+                      int len);
 unsigned int msi_nr_vectors_allocated(const PCIDevice *dev);
 
+int msi_set_config_notifiers(PCIDevice *dev, MSIEnableNotifier enable_notifier,
+                             MSIVectorConfigNotifier vector_config_notifier);
+int msi_unset_config_notifiers(PCIDevice *dev);
+
 static inline bool msi_present(const PCIDevice *dev)
 {
     return dev->cap_present & QEMU_PCI_CAP_MSI;
diff --git a/hw/pci.c b/hw/pci.c
index 4f0d7e1..96cd334 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1155,7 +1155,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
     if (range_covers_byte(addr, l, PCI_COMMAND))
         pci_update_irq_disabled(d, was_irq_disabled);
 
-    msi_write_config(d, addr, val, l);
+    msi_write_config(d, addr, old_val, l);
     msix_write_config(d, addr, old_val, l);
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index 5cf9a16..266fe34 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -206,6 +206,9 @@ struct PCIDevice {
      * on the rest of the region. */
     target_phys_addr_t msix_page_size;
 
+    MSIEnableNotifier msi_enable_notifier;
+    MSIVectorConfigNotifier msi_vector_config_notifier;
+
     MSIEnableNotifier msix_enable_notifier;
     MSIVectorConfigNotifier msix_vector_config_notifier;
 };
-- 
1.7.3.4

* [RFC][PATCH 39/45] pci-assign: Use generic MSI support
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Implement MSI support for assigned devices via the generic MSI layer of
QEMU. Use config notifiers to update the vector route or to switch back
to INTx when MSI gets disabled again.

Using the generic layer not only saves a bit of code, it also fixes
reset while legacy MSI is in use and adds 64-bit support.
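
The notifier flow described above can be sketched in isolation. This is
only a toy model under stated assumptions: every Toy* type and toy_*
function is hypothetical, the "host route" is a plain struct field
instead of a KVM call, and the order in which the two notifiers fire is
a simplification rather than what msi_write_config does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for illustration; these are not the QEMU types. */
typedef struct { uint64_t address; uint32_t data; } ToyMsg;
typedef enum { TOY_IRQ_INTX, TOY_IRQ_MSI } ToyIrqMode;

typedef struct ToyDev ToyDev;
typedef void (*ToyEnableNotifier)(ToyDev *dev, bool enabled);
typedef int (*ToyVectorNotifier)(ToyDev *dev, unsigned int vector,
                                 const ToyMsg *msg, bool masked);

struct ToyDev {
    bool msi_enabled;          /* guest-visible enable bit */
    ToyMsg msg;                /* message programmed by the guest */
    ToyIrqMode mode;           /* what the "host" currently routes */
    ToyMsg routed;             /* last message handed to the "host" */
    ToyEnableNotifier enable_notifier;
    ToyVectorNotifier vector_notifier;
};

/* Device side: fall back to INTx once MSI is switched off. */
static void toy_update_msi(ToyDev *dev, bool enabled)
{
    if (!enabled) {
        dev->mode = TOY_IRQ_INTX;
    }
}

/* Device side: (re)establish the route whenever a vector is unmasked. */
static int toy_update_msi_vector(ToyDev *dev, unsigned int vector,
                                 const ToyMsg *msg, bool masked)
{
    (void)vector; /* single-vector toy */
    if (!masked) {
        dev->routed = *msg;
        dev->mode = TOY_IRQ_MSI;
    }
    return 0;
}

/* Generic-layer side: react to a guest write of the enable bit. */
static void toy_write_enable(ToyDev *dev, bool enable)
{
    if (dev->msi_enabled == enable) {
        return;
    }
    dev->msi_enabled = enable;
    if (dev->enable_notifier) {
        dev->enable_notifier(dev, enable);
    }
    if (dev->vector_notifier) {
        /* Toy simplification: a disabled capability counts as masked. */
        int ret = dev->vector_notifier(dev, 0, &dev->msg, !enable);
        assert(ret >= 0);
    }
}

/* Wire the two callbacks into a fresh toy device. */
static void toy_dev_init(ToyDev *dev, uint64_t address, uint32_t data)
{
    dev->msi_enabled = false;
    dev->msg.address = address;
    dev->msg.data = data;
    dev->mode = TOY_IRQ_INTX;
    dev->routed.address = 0;
    dev->routed.data = 0;
    dev->enable_notifier = toy_update_msi;
    dev->vector_notifier = toy_update_msi_vector;
}
```

The point of the split is that the device code no longer parses the
capability registers itself: it only reacts to "enabled changed" and
"vector config changed" events raised by the generic layer.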

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |   77 +++++++++++++++++++----------------------------
 1 files changed, 31 insertions(+), 46 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 2484afd..10b30a3 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -699,10 +699,6 @@ static void free_assigned_device(AssignedDevice *dev)
         close(dev->real_device.config_fd);
     }
 
-    if (dev->dev.msi_cache) {
-        kvm_msi_cache_invalidate(&dev->dev.msi_cache[0]);
-        g_free(dev->dev.msi_cache);
-    }
     invalidate_msix_vectors(dev);
     g_free(dev->dev.msix_cache);
 }
@@ -847,7 +843,7 @@ static int assign_intx(AssignedDevice *dev)
 
     irq_type = KVM_DEV_IRQ_GUEST_INTX;
     if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
-        dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
+        msi_present(&dev->dev)) {
         irq_type |= KVM_DEV_IRQ_HOST_MSI;
     } else {
         irq_type |= KVM_DEV_IRQ_HOST_INTX;
@@ -920,31 +916,33 @@ void assigned_dev_update_irqs(void)
     }
 }
 
-static void assigned_dev_update_msi(PCIDevice *pci_dev)
+static void assigned_dev_update_msi(PCIDevice *pci_dev, bool enabled)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    uint8_t ctrl_byte = pci_get_byte(pci_dev->config + pci_dev->msi_cap +
-                                     PCI_MSI_FLAGS);
-
-    if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
-        uint8_t *pos = pci_dev->config + pci_dev->msi_cap;
-        MSIMessage msg;
 
-        deassign_irq(dev);
+    if (!enabled) {
+        assign_intx(dev);
+    }
+}
 
-        msg.address = pci_get_long(pos + PCI_MSI_ADDRESS_LO);
-        msg.data = pci_get_word(pos + PCI_MSI_DATA_32);
+static int assigned_dev_update_msi_vector(PCIDevice *pci_dev,
+                                          unsigned int vector,
+                                          MSIMessage *msg, bool masked)
+{
+    AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
+    int ret;
 
-        if (kvm_device_msi_assign(kvm_state, calc_assigned_dev_id(dev), &msg,
-                                  &dev->dev.msi_cache[0]) < 0) {
-            perror("assigned_dev_update_msi: assign msi");
-            return;
+    if (!masked) {
+        deassign_irq(dev);
+        ret = kvm_device_msi_assign(kvm_state, calc_assigned_dev_id(dev), msg,
+                                    &dev->dev.msi_cache[0]);
+        if (ret < 0) {
+            perror("assigned_dev_update_msi_vector: assign msi");
+            return ret;
         }
         dev->irq_requested_type = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_MSI;
-    } else {
-        kvm_msi_cache_invalidate(&dev->dev.msi_cache[0]);
-        assign_intx(dev);
     }
+    return 0;
 }
 
 static int assigned_dev_set_msix_vectors(PCIDevice *pci_dev)
@@ -1085,12 +1083,6 @@ static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
 
     pci_default_write_config(pci_dev, address, val, len);
 
-    if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
-        if (range_covers_byte(address, len,
-                              pci_dev->msi_cap + PCI_MSI_FLAGS)) {
-            assigned_dev_update_msi(pci_dev);
-        }
-    }
     if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
         if (range_covers_byte(address, len,
                               pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
@@ -1136,26 +1128,19 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
      * MSI capability is the 1st capability in capability config */
     pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI, 0);
     if (pos != 0 && kvm_check_extension(kvm_state, KVM_CAP_ASSIGN_DEV_IRQ)) {
-        dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
-        /* Only 32-bit/no-mask currently supported */
-        if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10)) < 0) {
+        uint16_t flags = pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS);
+
+        /* Note: KVM does not support multiple messages */
+        ret = msi_init(pci_dev, pos, 1, flags & PCI_MSI_FLAGS_64BIT,
+                       flags & PCI_MSI_FLAGS_MASKBIT);
+        if (ret < 0) {
+            return ret;
+        }
+        ret = msi_set_config_notifiers(pci_dev, assigned_dev_update_msi,
+                                       assigned_dev_update_msi_vector);
+        if (ret < 0) {
             return ret;
         }
-        pci_dev->msi_cap = pos;
-
-        pci_set_word(pci_dev->config + pos + PCI_MSI_FLAGS,
-                     pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS) &
-                     PCI_MSI_FLAGS_QMASK);
-        pci_set_long(pci_dev->config + pos + PCI_MSI_ADDRESS_LO, 0);
-        pci_set_word(pci_dev->config + pos + PCI_MSI_DATA_32, 0);
-
-        /* Set writable fields */
-        pci_set_word(pci_dev->wmask + pos + PCI_MSI_FLAGS,
-                     PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
-        pci_set_long(pci_dev->wmask + pos + PCI_MSI_ADDRESS_LO, 0xfffffffc);
-        pci_set_word(pci_dev->wmask + pos + PCI_MSI_DATA_32, 0xffff);
-
-        dev->dev.msi_cache = g_malloc0(sizeof(MSIRoutingCache));
     }
     /* Expose MSI-X capability */
     pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
-- 
1.7.3.4


* [RFC][PATCH 40/45] qemu-kvm: msix: Drop check for preexisting cap from msix_add_config
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

msix_add_config is only called from msix_init, which supports a single
initialization per device. Moreover, msix_add_config never checked
whether the provided parameters were compatible with the existing
capability entry, so the check was inconsistent anyway.
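
With the lookup gone, what remains in msix_add_config is parameter
validation plus the BAR layout calculation. A stand-alone sketch of that
arithmetic follows; toy_msix_layout and the TOY_* constants are
hypothetical names, and the page size (0x1000) and entry limit (2048)
are assumed values, not quoted from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed values for this sketch, not taken from the patch. */
#define TOY_MSIX_PAGE_SIZE   0x1000u
#define TOY_MSIX_MAX_ENTRIES 2048u

/*
 * Compute where the MSI-X table is placed and how large the BAR becomes.
 * bar_size is the space the device already uses in that BAR (0 if none).
 * Returns 0 on success or a negative errno value.
 */
static int toy_msix_layout(unsigned int nentries, uint32_t bar_size,
                           uint32_t *table_offset, uint32_t *new_size)
{
    assert(table_offset && new_size);

    if (nentries < 1 || nentries > TOY_MSIX_MAX_ENTRIES) {
        return -EINVAL;
    }
    if (bar_size > 0x80000000u) {
        return -ENOSPC;
    }

    if (!bar_size) {
        /* Dedicated BAR: one page holds table and pending bits. */
        *new_size = TOY_MSIX_PAGE_SIZE;
    } else if (bar_size < TOY_MSIX_PAGE_SIZE) {
        /* Round the existing part up to a page, then add one page. */
        bar_size = TOY_MSIX_PAGE_SIZE;
        *new_size = TOY_MSIX_PAGE_SIZE * 2;
    } else {
        /* Double the BAR; the table goes into the upper half. */
        *new_size = bar_size * 2;
    }

    /* Table sits on top of the (possibly rounded) existing contents. */
    *table_offset = bar_size;
    return 0;
}
```

For example, a device that already uses 100 bytes of the BAR would end
up with the table at offset 0x1000 in a BAR of 0x2000 bytes under these
assumptions.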

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |   72 +++++++++++++++++++++++++++++-------------------------------
 1 files changed, 35 insertions(+), 37 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index f1b97b5..5f0fa6a 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -63,48 +63,46 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
                            unsigned bar_nr, unsigned bar_size)
 {
     int config_offset;
+    uint32_t new_size;
     uint8_t *config;
 
-    pdev->msix_bar_size = bar_size;
-
-    config_offset = pci_find_capability(pdev, PCI_CAP_ID_MSIX);
-
-    if (!config_offset) {
-        uint32_t new_size;
-
-        if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1)
-            return -EINVAL;
-        if (bar_size > 0x80000000)
-            return -ENOSPC;
+    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
+        return -EINVAL;
+    }
+    if (bar_size > 0x80000000) {
+        return -ENOSPC;
+    }
 
-        /* Add space for MSI-X structures */
-        if (!bar_size) {
-            new_size = MSIX_PAGE_SIZE;
-        } else if (bar_size < MSIX_PAGE_SIZE) {
-            bar_size = MSIX_PAGE_SIZE;
-            new_size = MSIX_PAGE_SIZE * 2;
-        } else {
-            new_size = bar_size * 2;
-        }
+    /* Add space for MSI-X structures */
+    if (!bar_size) {
+        new_size = MSIX_PAGE_SIZE;
+    } else if (bar_size < MSIX_PAGE_SIZE) {
+        bar_size = MSIX_PAGE_SIZE;
+        new_size = MSIX_PAGE_SIZE * 2;
+    } else {
+        new_size = bar_size * 2;
+    }
 
-        pdev->msix_bar_size = new_size;
-        config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
-                                           0, MSIX_CAP_LENGTH);
-        if (config_offset < 0)
-            return config_offset;
-        config = pdev->config + config_offset;
-
-        pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
-        /* Table on top of BAR */
-        pci_set_long(config + PCI_MSIX_TABLE, bar_size | bar_nr);
-        /* Pending bits on top of that */
-        pci_set_long(config + PCI_MSIX_PBA, (bar_size + MSIX_PAGE_PENDING) |
-                     bar_nr);
+    pdev->msix_bar_size = new_size;
+    config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, 0,
+                                       MSIX_CAP_LENGTH);
+    if (config_offset < 0) {
+        return config_offset;
     }
     pdev->msix_cap = config_offset;
+
+    config = pdev->config + config_offset;
+    pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
+    /* Table on top of BAR */
+    pci_set_long(config + PCI_MSIX_TABLE, bar_size | bar_nr);
+    /* Pending bits on top of that */
+    pci_set_long(config + PCI_MSIX_PBA,
+                 (bar_size + MSIX_PAGE_PENDING) | bar_nr);
+
     /* Make flags bit writable. */
-    pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
-	    MSIX_MASKALL_MASK;
+    pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |=
+        MSIX_ENABLE_MASK | MSIX_MASKALL_MASK;
+
     return 0;
 }
 
-- 
1.7.3.4


* [RFC][PATCH 41/45] msix: Drop unused msix_bar_size
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

There is no user of it, even less so after the upcoming API changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |    8 --------
 hw/msix.h |    2 --
 hw/pci.h  |    2 --
 3 files changed, 0 insertions(+), 12 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 5f0fa6a..bccd8b1 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -83,7 +83,6 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
         new_size = bar_size * 2;
     }
 
-    pdev->msix_bar_size = new_size;
     config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, 0,
                                        MSIX_CAP_LENGTH);
     if (config_offset < 0) {
@@ -374,13 +373,6 @@ int msix_enabled(PCIDevice *dev)
          MSIX_ENABLE_MASK);
 }
 
-/* Size of bar where MSI-X table resides, or 0 if MSI-X not supported. */
-uint32_t msix_bar_size(PCIDevice *dev)
-{
-    return (dev->cap_present & QEMU_PCI_CAP_MSIX) ?
-        dev->msix_bar_size : 0;
-}
-
 /* Send an MSI-X message */
 void msix_notify(PCIDevice *dev, unsigned vector)
 {
diff --git a/hw/msix.h b/hw/msix.h
index 9cd54cf..dfc6087 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -19,8 +19,6 @@ void msix_load(PCIDevice *dev, QEMUFile *f);
 int msix_enabled(PCIDevice *dev);
 int msix_present(PCIDevice *dev);
 
-uint32_t msix_bar_size(PCIDevice *dev);
-
 void msix_clear_vector(PCIDevice *dev, unsigned vector);
 void msix_clear_all_vectors(PCIDevice *dev);
 
diff --git a/hw/pci.h b/hw/pci.h
index 266fe34..e2be271 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -178,8 +178,6 @@ struct PCIDevice {
     uint8_t *msix_table_page;
     /* MMIO index used to map MSIX table and pending bit entries. */
     MemoryRegion msix_mmio;
-    /* Region including the MSI-X table */
-    uint32_t msix_bar_size;
     /* Version id needed for VMState */
     int32_t version_id;
 
-- 
1.7.3.4


* [RFC][PATCH 42/45] msix: Introduce msix_init_simple
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Device models are usually not interested in specifying MSI-X
configuration details beyond the number of vectors to provide and the
BAR number to use. The layout of an exclusively used BAR and its
registration can also be handled centrally.

This is the purpose of msix_init_simple. It provides handy services to
the existing users. Future users like device assignment may require a
more detailed setup specification. For them we will (re-)introduce
msix_init with the full list of configuration options (in contrast to
the current code).
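
To illustrate the idea of folding BAR setup into a single call, here is
a toy model that does not use any QEMU code. ToyPciDev and
toy_msix_init_simple are hypothetical, the 2048-entry limit is an
assumed value, and the -EBUSY result for a reused BAR is a choice made
for this sketch only; the real function's error behavior may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NUM_BARS         6u     /* BARs in a type 0 PCI header */
#define TOY_MSIX_MAX_ENTRIES 2048u  /* assumed limit for this sketch */

/* Toy device state; not the QEMU PCIDevice. */
typedef struct {
    bool bar_registered[TOY_NUM_BARS];
    bool msix_present;
    unsigned int msix_entries_nr;
    unsigned int msix_bar_nr;
} ToyPciDev;

/*
 * One call does what callers previously did in three steps:
 * create a region, initialize MSI-X in it, register the BAR.
 */
static int toy_msix_init_simple(ToyPciDev *dev, unsigned short nentries,
                                unsigned int bar_nr)
{
    assert(dev);

    if (nentries < 1 || nentries > TOY_MSIX_MAX_ENTRIES ||
        bar_nr >= TOY_NUM_BARS) {
        return -EINVAL;
    }
    /* The BAR is used exclusively for MSI-X in the simple variant. */
    if (dev->msix_present || dev->bar_registered[bar_nr]) {
        return -EBUSY;
    }

    dev->msix_entries_nr = nentries;
    dev->msix_bar_nr = bar_nr;
    dev->msix_present = true;
    dev->bar_registered[bar_nr] = true; /* stands in for BAR registration */
    return 0;
}
```

A caller then only supplies the vector count and the BAR number, which
mirrors how the ivshmem hunk below shrinks to a single call.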

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/ivshmem.c    |    6 +-----
 hw/msix.c       |   35 ++++++++++++++---------------------
 hw/msix.h       |    7 +++----
 hw/virtio-pci.c |   15 +++++----------
 hw/virtio-pci.h |    1 -
 5 files changed, 23 insertions(+), 41 deletions(-)

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index a402c98..d9dbd18 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -65,7 +65,6 @@ typedef struct IVShmemState {
      */
     MemoryRegion bar;
     MemoryRegion ivshmem;
-    MemoryRegion msix_bar;
     uint64_t ivshmem_size; /* size of shared memory region */
     int shm_fd; /* shared memory file descriptor */
 
@@ -539,10 +538,7 @@ static void ivshmem_setup_msi(IVShmemState *s)
 {
     /* allocate the MSI-X vectors */
 
-    memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
-    if (!msix_init(&s->dev, s->vectors, &s->msix_bar, 1, 0)) {
-        pci_register_bar(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
-                         &s->msix_bar);
+    if (!msix_init_simple(&s->dev, s->vectors, 1)) {
         IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
     } else {
         IVSHMEM_DPRINTF("msix initialization failed\n");
diff --git a/hw/msix.c b/hw/msix.c
index bccd8b1..258b9c1 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -244,17 +244,6 @@ static const MemoryRegionOps msix_mmio_ops = {
     },
 };
 
-static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
-{
-    uint8_t *config = d->config + d->msix_cap;
-    uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
-    uint32_t offset = table & ~(MSIX_PAGE_SIZE - 1);
-    /* TODO: for assigned devices, we'll want to make it possible to map
-     * pending bits separately in case they are in a separate bar. */
-
-    memory_region_add_subregion(bar, offset, &d->msix_mmio);
-}
-
 static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
 {
     int vector;
@@ -272,11 +261,9 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
     }
 }
 
-/* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
- * modified, it should be retrieved with msix_bar_size. */
-int msix_init(struct PCIDevice *dev, unsigned short nentries,
-              MemoryRegion *bar,
-              unsigned bar_nr, unsigned bar_size)
+/* Initialize the MSI-X structures in a single dedicated BAR
+ * and register it. */
+int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr)
 {
     int ret;
 
@@ -296,14 +283,16 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
                           "msix", MSIX_PAGE_SIZE);
 
     dev->msix_entries_nr = nentries;
-    ret = msix_add_config(dev, nentries, bar_nr, bar_size);
+    ret = msix_add_config(dev, nentries, bar_nr, 0);
     if (ret)
         goto err_config;
 
     dev->msix_cache = g_malloc0(nentries * sizeof *dev->msix_cache);
 
     dev->cap_present |= QEMU_PCI_CAP_MSIX;
-    msix_mmio_setup(dev, bar);
+
+    pci_register_bar(dev, bar_nr, PCI_BASE_ADDRESS_SPACE_MEMORY,
+                     &dev->msix_mmio);
     return 0;
 
 err_config:
@@ -315,10 +304,10 @@ err_config:
 }
 
 /* Clean up resources for the device. */
-int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
+void msix_uninit(PCIDevice *dev, MemoryRegion *bar)
 {
     if (!msix_present(dev)) {
-        return 0;
+        return;
     }
     pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
     dev->msix_cap = 0;
@@ -332,7 +321,11 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     g_free(dev->msix_cache);
 
     dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
-    return 0;
+}
+
+void msix_uninit_simple(PCIDevice *dev)
+{
+    msix_uninit(dev, &dev->msix_mmio);
 }
 
 void msix_save(PCIDevice *dev, QEMUFile *f)
diff --git a/hw/msix.h b/hw/msix.h
index dfc6087..56e7ba5 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -4,14 +4,13 @@
 #include "qemu-common.h"
 #include "pci.h"
 
-int msix_init(PCIDevice *pdev, unsigned short nentries,
-              MemoryRegion *bar,
-              unsigned bar_nr, unsigned bar_size);
+int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr);
 
 void msix_write_config(PCIDevice *pci_dev, uint32_t address,
                        uint32_t old_val, int len);
 
-int msix_uninit(PCIDevice *d, MemoryRegion *bar);
+void msix_uninit(PCIDevice *d, MemoryRegion *bar);
+void msix_uninit_simple(PCIDevice *d);
 
 void msix_save(PCIDevice *dev, QEMUFile *f);
 void msix_load(PCIDevice *dev, QEMUFile *f);
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 5004d7d..6fe2b5e 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -713,13 +713,10 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
     pci_set_word(config + 0x2e, vdev->device_id);
     config[0x3d] = 1;
 
-    memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
-    if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
-                                     &proxy->msix_bar, 1, 0)) {
-        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
-                         &proxy->msix_bar);
-    } else
+    if (vdev->nvectors &&
+        msix_init_simple(&proxy->pci_dev, vdev->nvectors, 1)) {
         vdev->nvectors = 0;
+    }
 
     proxy->pci_dev.config_write = virtio_write_config;
 
@@ -766,12 +763,10 @@ static int virtio_blk_init_pci(PCIDevice *pci_dev)
 static int virtio_exit_pci(PCIDevice *pci_dev)
 {
     VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
-    int r;
 
     memory_region_destroy(&proxy->bar);
-    r = msix_uninit(pci_dev, &proxy->msix_bar);
-    memory_region_destroy(&proxy->msix_bar);
-    return r;
+    msix_uninit_simple(pci_dev);
+    return 0;
 }
 
 static int virtio_blk_exit_pci(PCIDevice *pci_dev)
diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
index 14c10f7..5af1c8c 100644
--- a/hw/virtio-pci.h
+++ b/hw/virtio-pci.h
@@ -22,7 +22,6 @@ typedef struct {
     PCIDevice pci_dev;
     VirtIODevice *vdev;
     MemoryRegion bar;
-    MemoryRegion msix_bar;
     uint32_t flags;
     uint32_t class_code;
     uint32_t nvectors;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [RFC][PATCH 43/45] msix: Allow to customize capability on init
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

This enables fully configurable MSI-X initialization: msix_init now takes
the config space offset, independent table and PBA BARs, and the offsets
inside them. Table and PBA are now realized as two memory subregions,
either of the passed BAR regions or of the single-page container that
msix_init_simple creates and registers.

Will be required for device assignment.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |  245 +++++++++++++++++++++++++++++++++---------------------------
 hw/msix.h |    7 ++-
 hw/pci.h  |   12 ++-
 3 files changed, 150 insertions(+), 114 deletions(-)
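
Reader's note, not part of the patch: a minimal sketch of the parameter
checks the new msix_init performs before touching the config space,
assuming 16-byte table entries and a PBA that is rounded up to whole
16-byte blocks as the comment in the patch describes. The sketch_* names
are hypothetical, and sketch_ranges_overlap only stands in for the
ranges_overlap helper the patch calls; it is not that helper's actual
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSIX_ENTRY_SIZE  16u     /* bytes per MSI-X table entry */
#define SKETCH_MSIX_MAX_VECTORS 2048u   /* 11-bit table size field + 1 */

static inline uint64_t sketch_table_size(unsigned nentries)
{
    return (uint64_t)nentries * SKETCH_MSIX_ENTRY_SIZE;
}

/* One pending bit per vector, rounded up to a multiple of 16 bytes. */
static inline uint64_t sketch_pba_size(unsigned nentries)
{
    return ((uint64_t)nentries + 127) / 128 * 16;
}

/* True if [a, a + a_len) and [b, b + b_len) share at least one byte. */
static inline bool sketch_ranges_overlap(uint64_t a, uint64_t a_len,
                                         uint64_t b, uint64_t b_len)
{
    return a < b + b_len && b < a + a_len;
}

/* Mirrors the -EINVAL conditions: bad vector count, bad BAR number,
 * or table and PBA colliding inside the same BAR. */
static inline bool sketch_layout_ok(unsigned nentries,
                                    unsigned table_bar_nr, uint64_t table_off,
                                    unsigned pba_bar_nr, uint64_t pba_off)
{
    if (nentries < 1 || nentries > SKETCH_MSIX_MAX_VECTORS ||
        table_bar_nr > 5 || pba_bar_nr > 5) {
        return false;
    }
    if (table_bar_nr == pba_bar_nr &&
        sketch_ranges_overlap(table_off, sketch_table_size(nentries),
                              pba_off, sketch_pba_size(nentries))) {
        return false;
    }
    return true;
}
```

With 32 vectors the table covers 0x200 bytes, so the msix_init_simple
layout (table at 0, PBA at 0x800 of the same BAR) passes, while a PBA
placed at 0x100 of that BAR would be rejected.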

diff --git a/hw/msix.c b/hw/msix.c
index 258b9c1..548e712 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -25,18 +25,12 @@
 #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
 #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
 
-/* How much space does an MSIX table need. */
-/* The spec requires giving the table structure
- * a 4K aligned region all by itself. */
 #define MSIX_PAGE_SIZE 0x1000
-/* Reserve second half of the page for pending bits */
-#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
-#define MSIX_MAX_ENTRIES 32
 
 static void msix_message_from_vector(PCIDevice *dev, unsigned vector,
                                      MSIMessage *msg)
 {
-    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
+    uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
 
     msg->address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
     msg->data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
@@ -54,67 +48,6 @@ static void kvm_msix_free(PCIDevice *dev)
     }
 }
 
-/* Add MSI-X capability to the config space for the device. */
-/* Given a bar and its size, add MSI-X table on top of it
- * and fill MSI-X capability in the config space.
- * Original bar size must be a power of 2 or 0.
- * New bar size is returned. */
-static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
-                           unsigned bar_nr, unsigned bar_size)
-{
-    int config_offset;
-    uint32_t new_size;
-    uint8_t *config;
-
-    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
-        return -EINVAL;
-    }
-    if (bar_size > 0x80000000) {
-        return -ENOSPC;
-    }
-
-    /* Add space for MSI-X structures */
-    if (!bar_size) {
-        new_size = MSIX_PAGE_SIZE;
-    } else if (bar_size < MSIX_PAGE_SIZE) {
-        bar_size = MSIX_PAGE_SIZE;
-        new_size = MSIX_PAGE_SIZE * 2;
-    } else {
-        new_size = bar_size * 2;
-    }
-
-    config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, 0,
-                                       MSIX_CAP_LENGTH);
-    if (config_offset < 0) {
-        return config_offset;
-    }
-    pdev->msix_cap = config_offset;
-
-    config = pdev->config + config_offset;
-    pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
-    /* Table on top of BAR */
-    pci_set_long(config + PCI_MSIX_TABLE, bar_size | bar_nr);
-    /* Pending bits on top of that */
-    pci_set_long(config + PCI_MSIX_PBA,
-                 (bar_size + MSIX_PAGE_PENDING) | bar_nr);
-
-    /* Make flags bit writable. */
-    pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |=
-        MSIX_ENABLE_MASK | MSIX_MASKALL_MASK;
-
-    return 0;
-}
-
-static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
-                               unsigned size)
-{
-    PCIDevice *dev = opaque;
-    unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
-    void *page = dev->msix_table_page;
-
-    return pci_get_long(page + offset);
-}
-
 static uint8_t msix_pending_mask(int vector)
 {
     return 1 << (vector % 8);
@@ -122,7 +55,7 @@ static uint8_t msix_pending_mask(int vector)
 
 static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
 {
-    return dev->msix_table_page + MSIX_PAGE_PENDING + vector / 8;
+    return dev->msix_pba + vector / 8;
 }
 
 static int msix_is_pending(PCIDevice *dev, int vector)
@@ -150,7 +83,7 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
     unsigned offset =
         vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
     return msix_function_masked(dev) ||
-	   dev->msix_table_page[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
+        dev->msix_table[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
 }
 
 static void msix_fire_vector_config_notifier(PCIDevice *dev,
@@ -213,18 +146,25 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
     }
 }
 
-static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
-                            uint64_t val, unsigned size)
+static uint64_t msix_table_read(void *opaque, target_phys_addr_t addr,
+                                unsigned size)
+{
+    PCIDevice *dev = opaque;
+
+    return pci_get_long(dev->msix_table + addr);
+}
+
+static void msix_table_write(void *opaque, target_phys_addr_t addr,
+                             uint64_t val, unsigned size)
 {
     PCIDevice *dev = opaque;
-    unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
-    unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
+    unsigned int vector = addr / PCI_MSIX_ENTRY_SIZE;
     bool was_masked = msix_is_masked(dev, vector);
     bool is_masked;
 
-    pci_set_long(dev->msix_table_page + offset, val);
+    pci_set_long(dev->msix_table + addr, val);
 
-    if (msix_enabled(dev) && vector < dev->msix_entries_nr) {
+    if (msix_enabled(dev)) {
         is_masked = msix_is_masked(dev, vector);
         if (was_masked != is_masked) {
             msix_handle_mask_update(dev, vector);
@@ -234,9 +174,35 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     }
 }
 
-static const MemoryRegionOps msix_mmio_ops = {
-    .read = msix_mmio_read,
-    .write = msix_mmio_write,
+static const MemoryRegionOps msix_table_ops = {
+    .read = msix_table_read,
+    .write = msix_table_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static uint64_t msix_pba_read(void *opaque, target_phys_addr_t addr,
+                              unsigned size)
+{
+    PCIDevice *dev = opaque;
+
+    return pci_get_long(dev->msix_pba + addr);
+}
+
+static void msix_pba_write(void *opaque, target_phys_addr_t addr,
+                           uint64_t val, unsigned size)
+{
+    PCIDevice *dev = opaque;
+
+    pci_set_long(dev->msix_pba + addr, val);
+}
+
+static const MemoryRegionOps msix_pba_ops = {
+    .read = msix_pba_read,
+    .write = msix_pba_write,
     .endianness = DEVICE_NATIVE_ENDIAN,
     .valid = {
         .min_access_size = 4,
@@ -253,7 +219,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
             vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
         bool was_masked = msix_is_masked(dev, vector);
 
-        dev->msix_table_page[offset] |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
+        dev->msix_table[offset] |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
 
         if (!was_masked) {
             msix_handle_mask_update(dev, vector);
@@ -261,10 +227,16 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
     }
 }
 
-/* Initialize the MSI-X structures in a single dedicated BAR
- * and register it. */
-int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr)
+/* Initialize the MSI-X structures with all degrees of freedom. The caller is
+ * responsible for providing the BAR regions and registering them. */
+int msix_init(PCIDevice *dev, uint8_t config_offset, unsigned int nentries,
+              MemoryRegion *table_bar, unsigned int table_bar_nr,
+              pcibus_t table_offset, MemoryRegion *pba_bar,
+              unsigned int pba_bar_nr, pcibus_t pba_offset)
 {
+    pcibus_t table_size;
+    pcibus_t pba_size;
+    uint8_t *config;
     int ret;
 
     /* Nothing to do if MSI is not supported by interrupt controller */
@@ -273,38 +245,86 @@ int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr)
         return -ENOTSUP;
     }
 
-    if (nentries > MSIX_MAX_ENTRIES)
+    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1 ||
+        table_bar_nr > 5 || pba_bar_nr > 5) {
         return -EINVAL;
+    }
+
+    table_size = nentries * PCI_MSIX_ENTRY_SIZE;
+    /* Round up to multiples of 16 byte as we cannot create smaller memory
+     * regions. */
+    pba_size = ((nentries + 127) / 128) * 16;
+
+    if (table_bar_nr == pba_bar_nr &&
+        ranges_overlap(table_offset, table_size, pba_offset, pba_size)) {
+        return -EINVAL;
+    }
 
-    dev->msix_table_page = g_malloc0(MSIX_PAGE_SIZE);
+    ret = pci_add_capability(dev, PCI_CAP_ID_MSIX, config_offset,
+                             MSIX_CAP_LENGTH);
+    if (ret < 0) {
+        return ret;
+    }
+    config_offset = ret;
+
+    dev->msix_table = g_malloc0(table_size);
+    dev->msix_pba = g_malloc0(pba_size);
     msix_mask_all(dev, nentries);
 
-    memory_region_init_io(&dev->msix_mmio, &msix_mmio_ops, dev,
-                          "msix", MSIX_PAGE_SIZE);
+    memory_region_init_io(&dev->msix_table_mem, &msix_table_ops, dev,
+                          "msix-table", table_size);
+    memory_region_add_subregion_overlap(table_bar, table_offset,
+                                        &dev->msix_table_mem, 1);
 
-    dev->msix_entries_nr = nentries;
-    ret = msix_add_config(dev, nentries, bar_nr, 0);
-    if (ret)
-        goto err_config;
+    memory_region_init_io(&dev->msix_pba_mem, &msix_pba_ops, dev,
+                          "msix-pba", pba_size);
+    memory_region_add_subregion_overlap(pba_bar, pba_offset,
+                                        &dev->msix_pba_mem, 1);
+
+    config = dev->config + config_offset;
+    pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
+    pci_set_long(config + PCI_MSIX_TABLE, table_offset | table_bar_nr);
+    pci_set_long(config + PCI_MSIX_PBA, pba_offset | pba_bar_nr);
+
+    /* Make flags bit writable. */
+    dev->wmask[config_offset + MSIX_CONTROL_OFFSET] |=
+        MSIX_ENABLE_MASK | MSIX_MASKALL_MASK;
 
     dev->msix_cache = g_malloc0(nentries * sizeof *dev->msix_cache);
 
+    dev->msix_cap = config_offset;
+    dev->msix_entries_nr = nentries;
     dev->cap_present |= QEMU_PCI_CAP_MSIX;
 
-    pci_register_bar(dev, bar_nr, PCI_BASE_ADDRESS_SPACE_MEMORY,
-                     &dev->msix_mmio);
     return 0;
+}
 
-err_config:
-    dev->msix_entries_nr = 0;
-    memory_region_destroy(&dev->msix_mmio);
-    g_free(dev->msix_table_page);
-    dev->msix_table_page = NULL;
-    return ret;
+/* Initialize the MSI-X structures in a single dedicated BAR
+ * and register it. */
+int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr)
+{
+    int ret;
+
+    assert(nentries * PCI_MSIX_ENTRY_SIZE <= MSIX_PAGE_SIZE / 2);
+
+    memory_region_init(&dev->msix_simple_container, "msix-container",
+                       MSIX_PAGE_SIZE);
+
+    ret = msix_init(dev, 0, nentries, &dev->msix_simple_container, bar_nr, 0,
+                    &dev->msix_simple_container, bar_nr, MSIX_PAGE_SIZE / 2);
+    if (ret < 0) {
+        memory_region_destroy(&dev->msix_simple_container);
+        return ret;
+    }
+
+    pci_register_bar(dev, bar_nr, PCI_BASE_ADDRESS_SPACE_MEMORY,
+                     &dev->msix_simple_container);
+    return 0;
 }
 
 /* Clean up resources for the device. */
-void msix_uninit(PCIDevice *dev, MemoryRegion *bar)
+void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar,
+                 MemoryRegion *pba_bar)
 {
     if (!msix_present(dev)) {
         return;
@@ -312,10 +332,14 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
     dev->msix_cap = 0;
     dev->msix_entries_nr = 0;
-    memory_region_del_subregion(bar, &dev->msix_mmio);
-    memory_region_destroy(&dev->msix_mmio);
-    g_free(dev->msix_table_page);
-    dev->msix_table_page = NULL;
+    memory_region_del_subregion(pba_bar, &dev->msix_pba_mem);
+    memory_region_destroy(&dev->msix_pba_mem);
+    memory_region_del_subregion(table_bar, &dev->msix_table_mem);
+    memory_region_destroy(&dev->msix_table_mem);
+    g_free(dev->msix_table);
+    dev->msix_table = NULL;
+    g_free(dev->msix_pba);
+    dev->msix_pba = NULL;
 
     kvm_msix_free(dev);
     g_free(dev->msix_cache);
@@ -325,7 +349,7 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *bar)
 
 void msix_uninit_simple(PCIDevice *dev)
 {
-    msix_uninit(dev, &dev->msix_mmio);
+    msix_uninit(dev, &dev->msix_simple_container, &dev->msix_simple_container);
 }
 
 void msix_save(PCIDevice *dev, QEMUFile *f)
@@ -335,8 +359,8 @@ void msix_save(PCIDevice *dev, QEMUFile *f)
     if (!msix_present(dev)) {
         return;
     }
-    qemu_put_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
-    qemu_put_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
+    qemu_put_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
+    qemu_put_buffer(f, dev->msix_pba, (n + 7) / 8);
 }
 
 /* Should be called after restoring the config space. */
@@ -348,8 +372,8 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
         return;
     }
 
-    qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
-    qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
+    qemu_get_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
+    qemu_get_buffer(f, dev->msix_pba, (n + 7) / 8);
 }
 
 /* Does device support MSI-X? */
@@ -391,7 +415,8 @@ void msix_reset(PCIDevice *dev)
     msix_clear_all_vectors(dev);
     dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
 	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
-    memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE);
+    memset(dev->msix_table, 0, dev->msix_entries_nr * PCI_MSIX_ENTRY_SIZE);
+    memset(dev->msix_pba, 0, (dev->msix_entries_nr + 7) / 8);
     msix_mask_all(dev, dev->msix_entries_nr);
 }
 
diff --git a/hw/msix.h b/hw/msix.h
index 56e7ba5..040b552 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -4,12 +4,17 @@
 #include "qemu-common.h"
 #include "pci.h"
 
+int msix_init(PCIDevice *pdev, uint8_t config_offset, unsigned int nentries,
+              MemoryRegion *table_bar, unsigned int table_bar_nr,
+              pcibus_t table_offset, MemoryRegion *pba_bar,
+              unsigned int pba_bar_nr, pcibus_t pba_offset);
 int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr);
 
 void msix_write_config(PCIDevice *pci_dev, uint32_t address,
                        uint32_t old_val, int len);
 
-void msix_uninit(PCIDevice *d, MemoryRegion *bar);
+void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar,
+                 MemoryRegion *pba_bar);
 void msix_uninit_simple(PCIDevice *d);
 
 void msix_save(PCIDevice *dev, QEMUFile *f);
diff --git a/hw/pci.h b/hw/pci.h
index e2be271..4b90f5c 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -175,9 +175,15 @@ struct PCIDevice {
     int msix_entries_nr;
 
     /* Space to store MSIX table */
-    uint8_t *msix_table_page;
-    /* MMIO index used to map MSIX table and pending bit entries. */
-    MemoryRegion msix_mmio;
+    uint8_t *msix_table;
+    /* Space to store MSIX PBA */
+    uint8_t *msix_pba;
+    /* single-page MSI-X MMIO container. */
+    MemoryRegion msix_simple_container;
+    /* Used to map MSIX table. */
+    MemoryRegion msix_table_mem;
+    /* Used to map PBA. */
+    MemoryRegion msix_pba_mem;
     /* Version id needed for VMState */
     int32_t version_id;
 
-- 
1.7.3.4
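
Reader's note, not part of the patch: with table and PBA now kept in
separate buffers, the per-vector lookups reduce to plain index
arithmetic. The sketch below assumes the standard MSI-X entry layout
(16-byte entries, Vector Control at byte 12, mask in bit 0) and one
pending bit per vector. It ignores the function-wide mask that
msix_is_masked also checks, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ENTRY_SIZE        16u   /* bytes per MSI-X table entry */
#define SKETCH_ENTRY_VECTOR_CTRL 12u   /* Vector Control offset in an entry */
#define SKETCH_CTRL_MASKBIT      0x1u  /* per-vector mask bit */

/* Per-vector mask state, read from the table buffer. */
static inline bool sketch_is_masked(const uint8_t *table, unsigned vector)
{
    unsigned offset = vector * SKETCH_ENTRY_SIZE + SKETCH_ENTRY_VECTOR_CTRL;

    return (table[offset] & SKETCH_CTRL_MASKBIT) != 0;
}

/* Pending state lives in the PBA buffer: byte vector / 8, bit vector % 8. */
static inline void sketch_set_pending(uint8_t *pba, unsigned vector)
{
    pba[vector / 8] |= (uint8_t)(1u << (vector % 8));
}

static inline bool sketch_is_pending(const uint8_t *pba, unsigned vector)
{
    return (pba[vector / 8] & (1u << (vector % 8))) != 0;
}
```

Vector 10, for example, maps to bit 2 of PBA byte 1, and its mask bit is
found at byte offset 172 of the table buffer.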


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* [Qemu-devel] [RFC][PATCH 43/45] msix: Allow to customize capability on init
@ 2011-10-17  9:28   ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

This enables fully configurable MSI-X initialization: msix_init now takes
the config space offset, independent table and PBA BARs, and the offsets
inside them. Table and PBA are now realized as two memory subregions,
either of the passed BAR regions or of the single-page container that
msix_init_simple creates and registers.

Will be required for device assignment.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/msix.c |  245 +++++++++++++++++++++++++++++++++---------------------------
 hw/msix.h |    7 ++-
 hw/pci.h  |   12 ++-
 3 files changed, 150 insertions(+), 114 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 258b9c1..548e712 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -25,18 +25,12 @@
 #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
 #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
 
-/* How much space does an MSIX table need. */
-/* The spec requires giving the table structure
- * a 4K aligned region all by itself. */
 #define MSIX_PAGE_SIZE 0x1000
-/* Reserve second half of the page for pending bits */
-#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
-#define MSIX_MAX_ENTRIES 32
 
 static void msix_message_from_vector(PCIDevice *dev, unsigned vector,
                                      MSIMessage *msg)
 {
-    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
+    uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
 
     msg->address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
     msg->data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
@@ -54,67 +48,6 @@ static void kvm_msix_free(PCIDevice *dev)
     }
 }
 
-/* Add MSI-X capability to the config space for the device. */
-/* Given a bar and its size, add MSI-X table on top of it
- * and fill MSI-X capability in the config space.
- * Original bar size must be a power of 2 or 0.
- * New bar size is returned. */
-static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
-                           unsigned bar_nr, unsigned bar_size)
-{
-    int config_offset;
-    uint32_t new_size;
-    uint8_t *config;
-
-    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
-        return -EINVAL;
-    }
-    if (bar_size > 0x80000000) {
-        return -ENOSPC;
-    }
-
-    /* Add space for MSI-X structures */
-    if (!bar_size) {
-        new_size = MSIX_PAGE_SIZE;
-    } else if (bar_size < MSIX_PAGE_SIZE) {
-        bar_size = MSIX_PAGE_SIZE;
-        new_size = MSIX_PAGE_SIZE * 2;
-    } else {
-        new_size = bar_size * 2;
-    }
-
-    config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, 0,
-                                       MSIX_CAP_LENGTH);
-    if (config_offset < 0) {
-        return config_offset;
-    }
-    pdev->msix_cap = config_offset;
-
-    config = pdev->config + config_offset;
-    pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
-    /* Table on top of BAR */
-    pci_set_long(config + PCI_MSIX_TABLE, bar_size | bar_nr);
-    /* Pending bits on top of that */
-    pci_set_long(config + PCI_MSIX_PBA,
-                 (bar_size + MSIX_PAGE_PENDING) | bar_nr);
-
-    /* Make flags bit writable. */
-    pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |=
-        MSIX_ENABLE_MASK | MSIX_MASKALL_MASK;
-
-    return 0;
-}
-
-static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
-                               unsigned size)
-{
-    PCIDevice *dev = opaque;
-    unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
-    void *page = dev->msix_table_page;
-
-    return pci_get_long(page + offset);
-}
-
 static uint8_t msix_pending_mask(int vector)
 {
     return 1 << (vector % 8);
@@ -122,7 +55,7 @@ static uint8_t msix_pending_mask(int vector)
 
 static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
 {
-    return dev->msix_table_page + MSIX_PAGE_PENDING + vector / 8;
+    return dev->msix_pba + vector / 8;
 }
 
 static int msix_is_pending(PCIDevice *dev, int vector)
@@ -150,7 +83,7 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
     unsigned offset =
         vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
     return msix_function_masked(dev) ||
-	   dev->msix_table_page[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
+        dev->msix_table[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
 }
 
 static void msix_fire_vector_config_notifier(PCIDevice *dev,
@@ -213,18 +146,25 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
     }
 }
 
-static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
-                            uint64_t val, unsigned size)
+static uint64_t msix_table_read(void *opaque, target_phys_addr_t addr,
+                                unsigned size)
+{
+    PCIDevice *dev = opaque;
+
+    return pci_get_long(dev->msix_table + addr);
+}
+
+static void msix_table_write(void *opaque, target_phys_addr_t addr,
+                             uint64_t val, unsigned size)
 {
     PCIDevice *dev = opaque;
-    unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
-    unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
+    unsigned int vector = addr / PCI_MSIX_ENTRY_SIZE;
     bool was_masked = msix_is_masked(dev, vector);
     bool is_masked;
 
-    pci_set_long(dev->msix_table_page + offset, val);
+    pci_set_long(dev->msix_table + addr, val);
 
-    if (msix_enabled(dev) && vector < dev->msix_entries_nr) {
+    if (msix_enabled(dev)) {
         is_masked = msix_is_masked(dev, vector);
         if (was_masked != is_masked) {
             msix_handle_mask_update(dev, vector);
@@ -234,9 +174,35 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     }
 }
 
-static const MemoryRegionOps msix_mmio_ops = {
-    .read = msix_mmio_read,
-    .write = msix_mmio_write,
+static const MemoryRegionOps msix_table_ops = {
+    .read = msix_table_read,
+    .write = msix_table_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static uint64_t msix_pba_read(void *opaque, target_phys_addr_t addr,
+                              unsigned size)
+{
+    PCIDevice *dev = opaque;
+
+    return pci_get_long(dev->msix_pba + addr);
+}
+
+static void msix_pba_write(void *opaque, target_phys_addr_t addr,
+                           uint64_t val, unsigned size)
+{
+    PCIDevice *dev = opaque;
+
+    pci_set_long(dev->msix_pba + addr, val);
+}
+
+static const MemoryRegionOps msix_pba_ops = {
+    .read = msix_pba_read,
+    .write = msix_pba_write,
     .endianness = DEVICE_NATIVE_ENDIAN,
     .valid = {
         .min_access_size = 4,
@@ -253,7 +219,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
             vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
         bool was_masked = msix_is_masked(dev, vector);
 
-        dev->msix_table_page[offset] |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
+        dev->msix_table[offset] |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
 
         if (!was_masked) {
             msix_handle_mask_update(dev, vector);
@@ -261,10 +227,16 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
     }
 }
 
-/* Initialize the MSI-X structures in a single dedicated BAR
- * and register it. */
-int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr)
+/* Initialize the MSI-X structures with all degrees of freedom. The caller is
+ * responsible for providing the BAR regions and registering them. */
+int msix_init(PCIDevice *dev, uint8_t config_offset, unsigned int nentries,
+              MemoryRegion *table_bar, unsigned int table_bar_nr,
+              pcibus_t table_offset, MemoryRegion *pba_bar,
+              unsigned int pba_bar_nr, pcibus_t pba_offset)
 {
+    pcibus_t table_size;
+    pcibus_t pba_size;
+    uint8_t *config;
     int ret;
 
     /* Nothing to do if MSI is not supported by interrupt controller */
@@ -273,38 +245,86 @@ int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr)
         return -ENOTSUP;
     }
 
-    if (nentries > MSIX_MAX_ENTRIES)
+    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1 ||
+        table_bar_nr > 5 || pba_bar_nr > 5) {
         return -EINVAL;
+    }
+
+    table_size = nentries * PCI_MSIX_ENTRY_SIZE;
+    /* Round up to multiples of 16 bytes as we cannot create smaller memory
+     * regions. */
+    pba_size = ((nentries + 127) / 128) * 16;
+
+    if (table_bar_nr == pba_bar_nr &&
+        ranges_overlap(table_offset, table_size, pba_offset, pba_size)) {
+        return -EINVAL;
+    }
 
-    dev->msix_table_page = g_malloc0(MSIX_PAGE_SIZE);
+    ret = pci_add_capability(dev, PCI_CAP_ID_MSIX, config_offset,
+                             MSIX_CAP_LENGTH);
+    if (ret < 0) {
+        return ret;
+    }
+    config_offset = ret;
+
+    dev->msix_table = g_malloc0(table_size);
+    dev->msix_pba = g_malloc0(pba_size);
     msix_mask_all(dev, nentries);
 
-    memory_region_init_io(&dev->msix_mmio, &msix_mmio_ops, dev,
-                          "msix", MSIX_PAGE_SIZE);
+    memory_region_init_io(&dev->msix_table_mem, &msix_table_ops, dev,
+                          "msix-table", table_size);
+    memory_region_add_subregion_overlap(table_bar, table_offset,
+                                        &dev->msix_table_mem, 1);
 
-    dev->msix_entries_nr = nentries;
-    ret = msix_add_config(dev, nentries, bar_nr, 0);
-    if (ret)
-        goto err_config;
+    memory_region_init_io(&dev->msix_pba_mem, &msix_pba_ops, dev,
+                          "msix-pba", pba_size);
+    memory_region_add_subregion_overlap(pba_bar, pba_offset,
+                                        &dev->msix_pba_mem, 1);
+
+    config = dev->config + config_offset;
+    pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
+    pci_set_long(config + PCI_MSIX_TABLE, table_offset | table_bar_nr);
+    pci_set_long(config + PCI_MSIX_PBA, pba_offset | pba_bar_nr);
+
+    /* Make flags bit writable. */
+    dev->wmask[config_offset + MSIX_CONTROL_OFFSET] |=
+        MSIX_ENABLE_MASK | MSIX_MASKALL_MASK;
 
     dev->msix_cache = g_malloc0(nentries * sizeof *dev->msix_cache);
 
+    dev->msix_cap = config_offset;
+    dev->msix_entries_nr = nentries;
     dev->cap_present |= QEMU_PCI_CAP_MSIX;
 
-    pci_register_bar(dev, bar_nr, PCI_BASE_ADDRESS_SPACE_MEMORY,
-                     &dev->msix_mmio);
     return 0;
+}
 
-err_config:
-    dev->msix_entries_nr = 0;
-    memory_region_destroy(&dev->msix_mmio);
-    g_free(dev->msix_table_page);
-    dev->msix_table_page = NULL;
-    return ret;
+/* Initialize the MSI-X structures in a single dedicated BAR
+ * and register it. */
+int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr)
+{
+    int ret;
+
+    assert(nentries * PCI_MSIX_ENTRY_SIZE <= MSIX_PAGE_SIZE / 2);
+
+    memory_region_init(&dev->msix_simple_container, "msix-container",
+                       MSIX_PAGE_SIZE);
+
+    ret = msix_init(dev, 0, nentries, &dev->msix_simple_container, bar_nr, 0,
+                    &dev->msix_simple_container, bar_nr, MSIX_PAGE_SIZE / 2);
+    if (ret < 0) {
+        memory_region_destroy(&dev->msix_simple_container);
+        return ret;
+    }
+
+    pci_register_bar(dev, bar_nr, PCI_BASE_ADDRESS_SPACE_MEMORY,
+                     &dev->msix_simple_container);
+    return 0;
 }
 
 /* Clean up resources for the device. */
-void msix_uninit(PCIDevice *dev, MemoryRegion *bar)
+void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar,
+                 MemoryRegion *pba_bar)
 {
     if (!msix_present(dev)) {
         return;
@@ -312,10 +332,14 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
     dev->msix_cap = 0;
     dev->msix_entries_nr = 0;
-    memory_region_del_subregion(bar, &dev->msix_mmio);
-    memory_region_destroy(&dev->msix_mmio);
-    g_free(dev->msix_table_page);
-    dev->msix_table_page = NULL;
+    memory_region_del_subregion(pba_bar, &dev->msix_pba_mem);
+    memory_region_destroy(&dev->msix_pba_mem);
+    memory_region_del_subregion(table_bar, &dev->msix_table_mem);
+    memory_region_destroy(&dev->msix_table_mem);
+    g_free(dev->msix_table);
+    dev->msix_table = NULL;
+    g_free(dev->msix_pba);
+    dev->msix_pba = NULL;
 
     kvm_msix_free(dev);
     g_free(dev->msix_cache);
@@ -325,7 +349,7 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *bar)
 
 void msix_uninit_simple(PCIDevice *dev)
 {
-    msix_uninit(dev, &dev->msix_mmio);
+    msix_uninit(dev, &dev->msix_simple_container, &dev->msix_simple_container);
 }
 
 void msix_save(PCIDevice *dev, QEMUFile *f)
@@ -335,8 +359,8 @@ void msix_save(PCIDevice *dev, QEMUFile *f)
     if (!msix_present(dev)) {
         return;
     }
-    qemu_put_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
-    qemu_put_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
+    qemu_put_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
+    qemu_put_buffer(f, dev->msix_pba, (n + 7) / 8);
 }
 
 /* Should be called after restoring the config space. */
@@ -348,8 +372,8 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
         return;
     }
 
-    qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
-    qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
+    qemu_get_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
+    qemu_get_buffer(f, dev->msix_pba, (n + 7) / 8);
 }
 
 /* Does device support MSI-X? */
@@ -391,7 +415,8 @@ void msix_reset(PCIDevice *dev)
     msix_clear_all_vectors(dev);
     dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
 	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
-    memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE);
+    memset(dev->msix_table, 0, dev->msix_entries_nr * PCI_MSIX_ENTRY_SIZE);
+    memset(dev->msix_pba, 0, (dev->msix_entries_nr + 7) / 8);
     msix_mask_all(dev, dev->msix_entries_nr);
 }
 
diff --git a/hw/msix.h b/hw/msix.h
index 56e7ba5..040b552 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -4,12 +4,17 @@
 #include "qemu-common.h"
 #include "pci.h"
 
+int msix_init(PCIDevice *pdev, uint8_t config_offset, unsigned int nentries,
+              MemoryRegion *table_bar, unsigned int table_bar_nr,
+              pcibus_t table_offset, MemoryRegion *pba_bar,
+              unsigned int pba_bar_nr, pcibus_t pba_offset);
 int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr);
 
 void msix_write_config(PCIDevice *pci_dev, uint32_t address,
                        uint32_t old_val, int len);
 
-void msix_uninit(PCIDevice *d, MemoryRegion *bar);
+void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar,
+                 MemoryRegion *pba_bar);
 void msix_uninit_simple(PCIDevice *d);
 
 void msix_save(PCIDevice *dev, QEMUFile *f);
diff --git a/hw/pci.h b/hw/pci.h
index e2be271..4b90f5c 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -175,9 +175,15 @@ struct PCIDevice {
     int msix_entries_nr;
 
     /* Space to store MSIX table */
-    uint8_t *msix_table_page;
-    /* MMIO index used to map MSIX table and pending bit entries. */
-    MemoryRegion msix_mmio;
+    uint8_t *msix_table;
+    /* Space to store MSIX PBA */
+    uint8_t *msix_pba;
+    /* single-page MSI-X MMIO container. */
+    MemoryRegion msix_simple_container;
+    /* Used to map MSIX table. */
+    MemoryRegion msix_table_mem;
+    /* Used to map PBA. */
+    MemoryRegion msix_pba_mem;
     /* Version id needed for VMState */
     int32_t version_id;
 
-- 
1.7.3.4
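
For readers following the new msix_init() parameter checks, here is a
minimal standalone sketch of the layout validation, not the QEMU code
itself. All sketch_* names and SKETCH_* constants are hypothetical
stand-ins (the patch uses PCI_MSIX_ENTRY_SIZE, PCI_MSIX_FLAGS_QSIZE and
ranges_overlap()). The PBA size follows the stated intent of the comment
in msix_init(), i.e. one pending bit per vector, rounded up to whole
16-byte blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the constants used by the patch. */
#define SKETCH_MSIX_ENTRY_SIZE  16u    /* bytes per MSI-X table entry */
#define SKETCH_MSIX_MAX_ENTRIES 2048u  /* 11-bit table size field, plus one */
#define SKETCH_MAX_BAR_NR       5u     /* BARs are numbered 0..5 */

/* Bytes occupied by an MSI-X table with nentries vectors. */
static uint64_t sketch_table_size(uint32_t nentries)
{
    return (uint64_t)nentries * SKETCH_MSIX_ENTRY_SIZE;
}

/* One pending bit per vector, rounded up to whole 16-byte blocks. */
static uint64_t sketch_pba_size(uint32_t nentries)
{
    return (((uint64_t)nentries + 127u) / 128u) * 16u;
}

/* True if [first1, first1 + len1) and [first2, first2 + len2) intersect. */
static bool sketch_ranges_overlap(uint64_t first1, uint64_t len1,
                                  uint64_t first2, uint64_t len2)
{
    return first1 < first2 + len2 && first2 < first1 + len1;
}

/* Mirrors the argument checks: entry count, BAR numbers, and no overlap
 * between table and PBA when both live in the same BAR. */
static bool sketch_layout_valid(uint32_t nentries,
                                unsigned int table_bar_nr,
                                uint64_t table_offset,
                                unsigned int pba_bar_nr,
                                uint64_t pba_offset)
{
    if (nentries < 1 || nentries > SKETCH_MSIX_MAX_ENTRIES ||
        table_bar_nr > SKETCH_MAX_BAR_NR || pba_bar_nr > SKETCH_MAX_BAR_NR) {
        return false;
    }
    if (table_bar_nr == pba_bar_nr &&
        sketch_ranges_overlap(table_offset, sketch_table_size(nentries),
                              pba_offset, sketch_pba_size(nentries))) {
        return false;
    }
    return true;
}
```

With the single-page layout used by msix_init_simple() (table at offset
0, PBA at half a page), 64 vectors fit, while 256 vectors would make the
table run into the PBA and the check would reject the layout.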


* [RFC][PATCH 44/45] pci-assign: Use generic MSI-X support
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Switch MSI-X support of the device assignment core to the generic layer
QEMU offers. As for legacy MSI, we use config notifiers to update IRQ
assignment and routes on guest changes. Quite a bit of code becomes
obsolete in the device assignment core, e.g. the maintenance of the MSI-X
vector masking MMIO page. Note that we have to reorder BAR mapping and
capability initialization in order to pass the BAR container to
msix_init.

In this case, too, we still do not support per-vector masking after
these changes.
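
To illustrate why the BAR containers must exist before the capability
is initialized: the table and PBA locations are read from the physical
device's MSI-X capability, and each register packs a BAR index into its
low three bits next to the offset. The following is a minimal sketch of
that decoding, assuming the standard MSI-X register layout. The
sketch_* names and SKETCH_* masks are hypothetical; the patch itself
uses PCI_MSIX_FLAGS_QSIZE and PCI_MSIX_FLAGS_BIRMASK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PCI_MSIX_FLAGS_QSIZE and
 * PCI_MSIX_FLAGS_BIRMASK. */
#define SKETCH_MSIX_QSIZE_MASK 0x07ffu
#define SKETCH_MSIX_BIR_MASK   0x7u

struct sketch_msix_location {
    unsigned int bar_nr;  /* BAR that holds the structure */
    uint32_t offset;      /* offset of the structure inside that BAR */
};

/* The table size field encodes "number of entries minus one". */
static unsigned int sketch_msix_nentries(uint16_t flags)
{
    return (flags & SKETCH_MSIX_QSIZE_MASK) + 1u;
}

/* Split a Table Offset/BIR or PBA Offset/BIR register value. */
static struct sketch_msix_location sketch_msix_decode(uint32_t reg)
{
    struct sketch_msix_location loc;

    loc.bar_nr = reg & SKETCH_MSIX_BIR_MASK;
    loc.offset = reg & ~(uint32_t)SKETCH_MSIX_BIR_MASK;
    return loc;
}
```

The decoded BAR number is what selects the container region handed to
msix_init(), so the container has to be set up first.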

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |  335 +++++++++++++-----------------------------------
 hw/device-assignment.h |   14 +--
 2 files changed, 88 insertions(+), 261 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 10b30a3..df554b3 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -24,6 +24,7 @@
  *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
  *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
  *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ *  Copyright (C) 2011, Siemens AG, Jan Kiszka (jan.kiszka@siemens.com)
  */
 #include <stdio.h>
 #include <unistd.h>
@@ -41,6 +42,7 @@
 #include "range.h"
 #include "sysemu.h"
 #include "msi.h"
+#include "msix.h"
 
 #define MSIX_PAGE_SIZE 0x1000
 
@@ -64,8 +66,6 @@
 
 static void assigned_dev_load_option_rom(AssignedDevice *dev);
 
-static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
-
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
                                        uint32_t addr, int len, uint32_t *val)
 {
@@ -238,24 +238,11 @@ static void assigned_dev_iomem_setup(PCIDevice *pci_dev, int region_num,
 {
     AssignedDevice *r_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     AssignedDevRegion *region = &r_dev->v_addrs[region_num];
-    PCIRegion *real_region = &r_dev->real_device.regions[region_num];
 
     if (e_size > 0) {
         memory_region_init(&region->container, "assigned-dev-container",
                            e_size);
         memory_region_add_subregion(&region->container, 0, &region->real_iomem);
-
-        /* deal with MSI-X MMIO page */
-        if (real_region->base_addr <= r_dev->msix_table_addr &&
-                real_region->base_addr + real_region->size >
-                r_dev->msix_table_addr) {
-            int offset = r_dev->msix_table_addr - real_region->base_addr;
-
-            memory_region_add_subregion_overlap(&region->container,
-                                                offset,
-                                                &r_dev->mmio,
-                                                1);
-        }
     }
 }
 
@@ -648,21 +635,20 @@ again:
 
 static QLIST_HEAD(, AssignedDevice) devs = QLIST_HEAD_INITIALIZER(devs);
 
-static void invalidate_msix_vectors(AssignedDevice *dev)
-{
-    int i;
-
-    for (i = 0; i < dev->irq_entries_nr; i++) {
-        kvm_msi_cache_invalidate(&dev->dev.msix_cache[i]);
-    }
-}
-
 static void free_assigned_device(AssignedDevice *dev)
 {
+    uint32_t table_bar_nr, pba_bar_nr;
+    uint8_t *msix_cap;
     int i;
 
-    if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
-        assigned_dev_unregister_msix_mmio(dev);
+    if (msix_present(&dev->dev)) {
+        msix_cap = dev->dev.config + dev->dev.msix_cap;
+        table_bar_nr = pci_get_long(msix_cap + PCI_MSIX_TABLE) &
+            PCI_MSIX_FLAGS_BIRMASK;
+        pba_bar_nr = pci_get_long(msix_cap + PCI_MSIX_PBA) &
+            PCI_MSIX_FLAGS_BIRMASK;
+        msix_uninit(&dev->dev, &dev->v_addrs[table_bar_nr].container,
+                    &dev->v_addrs[pba_bar_nr].container);
     }
     for (i = 0; i < dev->real_device.region_number; i++) {
         PCIRegion *pci_region = &dev->real_device.regions[i];
@@ -698,9 +684,6 @@ static void free_assigned_device(AssignedDevice *dev)
     if (dev->real_device.config_fd >= 0) {
         close(dev->real_device.config_fd);
     }
-
-    invalidate_msix_vectors(dev);
-    g_free(dev->dev.msix_cache);
 }
 
 static uint32_t calc_assigned_dev_id(AssignedDevice *dev)
@@ -916,11 +899,13 @@ void assigned_dev_update_irqs(void)
     }
 }
 
+/* used for both MSI and MSI-X */
 static void assigned_dev_update_msi(PCIDevice *pci_dev, bool enabled)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
     if (!enabled) {
+        dev->msix_vectors_in_use = 0;
         assign_intx(dev);
     }
 }
@@ -945,113 +930,66 @@ static int assigned_dev_update_msi_vector(PCIDevice *pci_dev,
     return 0;
 }
 
-static int assigned_dev_set_msix_vectors(PCIDevice *pci_dev)
+static int assigned_dev_update_msix_vector(PCIDevice *pci_dev,
+                                           unsigned int vector,
+                                           MSIMessage *msg, bool masked)
 {
-    AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    uint16_t entries_nr = 0, entries_max_nr;
-    void *msix_page = adev->msix_table_page;
+    AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
+    MSIRoutingCache *cache;
     uint32_t dev_id;
-    MSIMessage msg;
-    int pos, i, r;
-
-    assert(adev->irq_entries_nr == 0);
-
-    pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
+    unsigned int i;
+    int ret = 0;
 
-    entries_max_nr = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
-    entries_max_nr &= PCI_MSIX_FLAGS_QSIZE;
-    entries_max_nr += 1;
+    if (!masked) {
+        dev_id = calc_assigned_dev_id(dev);
 
-    /* Get the usable entry number for allocating */
-    for (i = 0; i < entries_max_nr; i++) {
         /* Assuming IA-32 MSI message format:
          * Ignore unused entry (invalid vector) */
-        if (pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                         PCI_MSIX_ENTRY_DATA) == 0) {
-            continue;
+        if (msg->data == 0) {
+            if (pci_dev->msix_cache[vector].type == MSI_ROUTE_NONE) {
+                return ret;
+            }
+            dev->msix_vectors_in_use--;
+            deassign_irq(dev);
+            kvm_msi_cache_invalidate(&pci_dev->msix_cache[vector]);
+        } else {
+            if (pci_dev->msix_cache[vector].type != MSI_ROUTE_NONE) {
+                ret = kvm_device_msix_set_vector(kvm_state, dev_id,
+                                                 vector, msg,
+                                                 &pci_dev->msix_cache[vector]);
+                return ret;
+            }
+            dev->msix_vectors_in_use++;
+            deassign_irq(dev);
         }
-        entries_nr++;
-    }
-    if (entries_nr == 0) {
-        fprintf(stderr, "MSI-X entry number is zero!\n");
-        return -EINVAL;
-    }
 
-    dev_id = calc_assigned_dev_id(adev);
-
-    r = kvm_device_msix_init_vectors(kvm_state, dev_id, entries_nr);
-    if (r < 0) {
-        return r;
-    }
-    pci_dev->msix_cache = g_malloc0(entries_nr * sizeof(MSIRoutingCache));
-    adev->irq_entries_nr = entries_nr;
-
-    for (i = 0; i < entries_max_nr; i++) {
-        if (entries_nr == 0) {
-            break;
-        }
-        msg.data = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                                PCI_MSIX_ENTRY_DATA);
-        if (msg.data == 0) {
-            continue;
+        ret = kvm_device_msix_init_vectors(kvm_state, dev_id,
+                                           dev->msix_vectors_in_use);
+        if (ret < 0) {
+            return ret;
         }
-        msg.address = pci_get_quad(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                                   PCI_MSIX_ENTRY_LOWER_ADDR);
 
-        r = kvm_device_msix_set_vector(kvm_state, dev_id, i, &msg,
-                                       &pci_dev->msix_cache[i]);
-        if (r < 0) {
-            return r;
+        for (i = 0; i < pci_dev->msix_entries_nr; i++) {
+            cache = &pci_dev->msix_cache[i];
+            if (i != vector && cache->type == MSI_ROUTE_NONE) {
+                continue;
+            }
+            ret = kvm_device_msix_set_vector(kvm_state, dev_id, i,
+                                             i == vector ? msg : &cache->msg,
+                                             cache);
+            if (ret < 0) {
+                return ret;
+            }
         }
-        entries_nr--;
-    }
-
-    return 0;
-}
-
-static void assigned_dev_update_msix(PCIDevice *pci_dev)
-{
-    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    uint16_t ctrl_word = pci_get_word(pci_dev->config + pci_dev->msix_cap +
-                                      PCI_MSIX_FLAGS);
-    uint32_t dev_id;
-    int r;
 
-    dev_id = calc_assigned_dev_id(assigned_dev);
-
-    /* Some guests gratuitously disable MSIX even if they're not using it,
-     * try to catch this by only deassigning irqs if the guest is using
-     * MSIX or intends to start. */
-    if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
-        (ctrl_word & PCI_MSIX_FLAGS_ENABLE)) {
-        invalidate_msix_vectors(assigned_dev);
-        g_free(pci_dev->msix_cache);
-        assigned_dev->irq_entries_nr = 0;
-
-        r = kvm_device_irq_deassign(kvm_state, dev_id,
-                                    assigned_dev->irq_requested_type);
-        /* -ENXIO means no assigned irq */
-        if (r && r != -ENXIO)
-            perror("assigned_dev_update_msix: deassign irq");
-
-        assigned_dev->irq_requested_type = 0;
-    }
-
-    if (ctrl_word & PCI_MSIX_FLAGS_ENABLE) {
-        if (assigned_dev_set_msix_vectors(pci_dev) < 0) {
-            perror("assigned_dev_update_msix_mmio");
-            return;
-        }
-        if (kvm_device_msix_assign(kvm_state, dev_id) < 0) {
-            perror("assigned_dev_enable_msix: assign irq");
-            return;
+        ret = kvm_device_msix_assign(kvm_state, dev_id);
+        if (ret < 0) {
+            return ret;
         }
-        assigned_dev->girq = -1;
-        assigned_dev->irq_requested_type = KVM_DEV_IRQ_HOST_MSIX |
-                                           KVM_DEV_IRQ_GUEST_MSIX;
-    } else {
-        assign_intx(assigned_dev);
+        dev->irq_requested_type =
+            KVM_DEV_IRQ_HOST_MSIX | KVM_DEV_IRQ_GUEST_MSIX;
     }
+    return ret;
 }
 
 static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
@@ -1083,13 +1021,6 @@ static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
 
     pci_default_write_config(pci_dev, address, val, len);
 
-    if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
-        if (range_covers_byte(address, len,
-                              pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
-            assigned_dev_update_msix(pci_dev);
-        }
-    }
-
     emulate_mask = 0;
     memcpy(&emulate_mask, assigned_dev->emulate_config_write + address, len);
     emulate_mask = le32_to_cpu(emulate_mask);
@@ -1115,7 +1046,6 @@ static void assigned_dev_setup_cap_read(AssignedDevice *dev, uint32_t offset,
 static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    PCIRegion *pci_region = dev->real_device.regions;
     int ret, pos;
 
     /* Clear initial capabilities pointer and status copied from hw */
@@ -1145,27 +1075,31 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     /* Expose MSI-X capability */
     pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
     if (pos != 0 && kvm_device_msix_supported(kvm_state)) {
-        int bar_nr;
-        uint32_t msix_table_entry;
-
-        dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
-        if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, 12)) < 0) {
+        unsigned int table_bar_nr, pba_bar_nr;
+        uint32_t table_offset, pba_offset;
+        uint16_t nentries;
+
+        nentries = (pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
+                    PCI_MSIX_FLAGS_QSIZE) + 1;
+        table_offset = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
+        table_bar_nr = table_offset & PCI_MSIX_FLAGS_BIRMASK;
+        table_offset &= ~PCI_MSIX_FLAGS_BIRMASK;
+        pba_offset = pci_get_long(pci_dev->config + pos + PCI_MSIX_PBA);
+        pba_bar_nr = pba_offset & PCI_MSIX_FLAGS_BIRMASK;
+        pba_offset &= ~PCI_MSIX_FLAGS_BIRMASK;
+
+        ret = msix_init(pci_dev, pos, nentries,
+                        &dev->v_addrs[table_bar_nr].container, table_bar_nr,
+                        table_offset, &dev->v_addrs[pba_bar_nr].container,
+                        pba_bar_nr, pba_offset);
+        if (ret < 0) {
+            return ret;
+        }
+        ret = msix_set_config_notifiers(pci_dev, assigned_dev_update_msi,
+                                        assigned_dev_update_msix_vector);
+        if (ret < 0) {
             return ret;
         }
-        pci_dev->msix_cap = pos;
-
-        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
-                     pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
-                     PCI_MSIX_FLAGS_QSIZE);
-
-        /* Only enable and function mask bits are writable */
-        pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
-                     PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
-
-        msix_table_entry = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
-        bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
-        msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
-        dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
     }
 
     /* Minimal PM support, nothing writable, device appears to NAK changes */
@@ -1378,94 +1312,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     return 0;
 }
 
-static uint32_t msix_mmio_readl(void *opaque, target_phys_addr_t addr)
-{
-    AssignedDevice *adev = opaque;
-    unsigned int offset = addr & 0xfff;
-    void *page = adev->msix_table_page;
-    uint32_t val = 0;
-
-    memcpy(&val, (void *)((char *)page + offset), 4);
-
-    return val;
-}
-
-static uint32_t msix_mmio_readb(void *opaque, target_phys_addr_t addr)
-{
-    return ((msix_mmio_readl(opaque, addr & ~3)) >>
-            (8 * (addr & 3))) & 0xff;
-}
-
-static uint32_t msix_mmio_readw(void *opaque, target_phys_addr_t addr)
-{
-    return ((msix_mmio_readl(opaque, addr & ~3)) >>
-            (8 * (addr & 3))) & 0xffff;
-}
-
-static void msix_mmio_writel(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
-{
-    AssignedDevice *adev = opaque;
-    unsigned int offset = addr & 0xfff;
-    void *page = adev->msix_table_page;
-
-    DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
-		    addr, val);
-    memcpy((void *)((char *)page + offset), &val, 4);
-}
-
-static void msix_mmio_writew(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
-{
-    msix_mmio_writel(opaque, addr & ~3,
-                     (val & 0xffff) << (8*(addr & 3)));
-}
-
-static void msix_mmio_writeb(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
-{
-    msix_mmio_writel(opaque, addr & ~3,
-                     (val & 0xff) << (8*(addr & 3)));
-}
-
-static const MemoryRegionOps msix_mmio_ops = {
-    .old_mmio = {
-        .read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, },
-        .write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, },
-    },
-    .endianness = DEVICE_NATIVE_ENDIAN,
-};
-
-static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
-{
-    dev->msix_table_page = mmap(NULL, 0x1000,
-                                PROT_READ|PROT_WRITE,
-                                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
-    if (dev->msix_table_page == MAP_FAILED) {
-        fprintf(stderr, "fail allocate msix_table_page! %s\n",
-                strerror(errno));
-        return -EFAULT;
-    }
-    memset(dev->msix_table_page, 0, 0x1000);
-    memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
-                          "assigned-dev-msix", MSIX_PAGE_SIZE);
-    return 0;
-}
-
-static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
-{
-    if (!dev->msix_table_page)
-        return;
-
-    memory_region_destroy(&dev->mmio);
-
-    if (munmap(dev->msix_table_page, 0x1000) == -1) {
-        fprintf(stderr, "error unmapping msix_table_page! %s\n",
-                strerror(errno));
-    }
-    dev->msix_table_page = NULL;
-}
-
 static const VMStateDescription vmstate_assigned_device = {
     .name = "pci-assign",
     .unmigratable = 1,
@@ -1548,23 +1394,16 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
         goto out;
     }
 
-    if (assigned_device_pci_cap_init(pci_dev) < 0) {
-        goto out;
-    }
-
-    /* intercept MSI-X entry page in the MMIO */
-    if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
-        if (assigned_dev_register_msix_mmio(dev)) {
-            goto out;
-        }
-    }
-
     /* handle real device's MMIO/PIO BARs */
     if (assigned_dev_register_regions(dev->real_device.regions,
                                       dev->real_device.region_number,
                                       dev))
         goto out;
 
+    if (assigned_device_pci_cap_init(pci_dev) < 0) {
+        goto out;
+    }
+
     /* handle interrupt routing */
     e_intx = dev->dev.config[0x3d] - 1;
     dev->intpin = e_intx;
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 4b67f14..c41ea33 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -95,21 +95,9 @@ typedef struct AssignedDevice {
     uint8_t h_devfn;
     int irq_requested_type;
     int bound;
-    struct {
-#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
-#define ASSIGNED_DEVICE_CAP_MSIX (1 << 1)
-        uint32_t available;
-#define ASSIGNED_DEVICE_MSI_ENABLED (1 << 0)
-#define ASSIGNED_DEVICE_MSIX_ENABLED (1 << 1)
-#define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
-        uint32_t state;
-    } cap;
     uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
     uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
-    int irq_entries_nr;
-    void *msix_table_page;
-    target_phys_addr_t msix_table_addr;
-    MemoryRegion mmio;
+    unsigned int msix_vectors_in_use;
     char *configfd_name;
     int32_t bootindex;
     QLIST_ENTRY(AssignedDevice) next;
-- 
1.7.3.4
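
As a reading aid for assigned_dev_update_msix_vector(), here is a
simplified model of its bookkeeping, with the KVM calls left out. It
only tracks which vectors have a route and what kind of action the
notifier would take. Everything named sketch_* or SKETCH_* is
hypothetical; in the patch, "has a route" corresponds to the routing
cache type not being MSI_ROUTE_NONE, and a message data value of 0 marks
an unused entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_VECTORS 8u

enum sketch_action {
    SKETCH_NOTHING,       /* masked, or unused entry without a route */
    SKETCH_UPDATE_ROUTE,  /* vector already routed: only refresh its message */
    SKETCH_REASSIGN       /* in-use count changed: deassign and set up again */
};

struct sketch_dev {
    unsigned int vectors_in_use;
    bool routed[SKETCH_MAX_VECTORS];
};

static enum sketch_action sketch_vector_update(struct sketch_dev *dev,
                                               unsigned int vector,
                                               uint32_t data, bool masked)
{
    assert(vector < SKETCH_MAX_VECTORS);

    if (masked) {
        return SKETCH_NOTHING;
    }
    if (data == 0) {
        /* Entry became unused: drop its route if it had one. */
        if (!dev->routed[vector]) {
            return SKETCH_NOTHING;
        }
        dev->routed[vector] = false;
        dev->vectors_in_use--;
        return SKETCH_REASSIGN;
    }
    if (dev->routed[vector]) {
        return SKETCH_UPDATE_ROUTE;
    }
    /* First use of this vector: the total changes, so start over. */
    dev->routed[vector] = true;
    dev->vectors_in_use++;
    return SKETCH_REASSIGN;
}
```

The point of the model is that a full deassign/assign cycle is needed
only when the number of vectors in use changes. A guest rewriting the
message of an already routed vector takes the cheaper path.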



* [Qemu-devel] [RFC][PATCH 44/45] pci-assign: Use generic MSI-X support
@ 2011-10-17  9:28   ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Alex Williamson, qemu-devel, kvm, Michael S. Tsirkin

Switch MSI-X support of the device assignment core to the generic layer
QEMU offers. As for legacy MSI, we use config notifiers to update IRQ
assignment and routes on guest changes. Quite a bit of code becomes
obsolete in the device assignment core, e.g. the maintenance of the MSI-X
vector masking MMIO page. Note that we have to reorder BAR mapping and
capability initialization in order to pass the BAR container to
msix_init.

In this case, too, we still do not support per-vector masking after
these changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |  335 +++++++++++++-----------------------------------
 hw/device-assignment.h |   14 +--
 2 files changed, 88 insertions(+), 261 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 10b30a3..df554b3 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -24,6 +24,7 @@
  *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
  *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
  *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ *  Copyright (C) 2011, Siemens AG, Jan Kiszka (jan.kiszka@siemens.com)
  */
 #include <stdio.h>
 #include <unistd.h>
@@ -41,6 +42,7 @@
 #include "range.h"
 #include "sysemu.h"
 #include "msi.h"
+#include "msix.h"
 
 #define MSIX_PAGE_SIZE 0x1000
 
@@ -64,8 +66,6 @@
 
 static void assigned_dev_load_option_rom(AssignedDevice *dev);
 
-static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
-
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
                                        uint32_t addr, int len, uint32_t *val)
 {
@@ -238,24 +238,11 @@ static void assigned_dev_iomem_setup(PCIDevice *pci_dev, int region_num,
 {
     AssignedDevice *r_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
     AssignedDevRegion *region = &r_dev->v_addrs[region_num];
-    PCIRegion *real_region = &r_dev->real_device.regions[region_num];
 
     if (e_size > 0) {
         memory_region_init(&region->container, "assigned-dev-container",
                            e_size);
         memory_region_add_subregion(&region->container, 0, &region->real_iomem);
-
-        /* deal with MSI-X MMIO page */
-        if (real_region->base_addr <= r_dev->msix_table_addr &&
-                real_region->base_addr + real_region->size >
-                r_dev->msix_table_addr) {
-            int offset = r_dev->msix_table_addr - real_region->base_addr;
-
-            memory_region_add_subregion_overlap(&region->container,
-                                                offset,
-                                                &r_dev->mmio,
-                                                1);
-        }
     }
 }
 
@@ -648,21 +635,20 @@ again:
 
 static QLIST_HEAD(, AssignedDevice) devs = QLIST_HEAD_INITIALIZER(devs);
 
-static void invalidate_msix_vectors(AssignedDevice *dev)
-{
-    int i;
-
-    for (i = 0; i < dev->irq_entries_nr; i++) {
-        kvm_msi_cache_invalidate(&dev->dev.msix_cache[i]);
-    }
-}
-
 static void free_assigned_device(AssignedDevice *dev)
 {
+    uint32_t table_bar_nr, pba_bar_nr;
+    uint8_t *msix_cap;
     int i;
 
-    if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
-        assigned_dev_unregister_msix_mmio(dev);
+    if (msix_present(&dev->dev)) {
+        msix_cap = dev->dev.config + dev->dev.msix_cap;
+        table_bar_nr = pci_get_long(msix_cap + PCI_MSIX_TABLE) &
+            PCI_MSIX_FLAGS_BIRMASK;
+        pba_bar_nr = pci_get_long(msix_cap + PCI_MSIX_PBA) &
+            PCI_MSIX_FLAGS_BIRMASK;
+        msix_uninit(&dev->dev, &dev->v_addrs[table_bar_nr].container,
+                    &dev->v_addrs[pba_bar_nr].container);
     }
     for (i = 0; i < dev->real_device.region_number; i++) {
         PCIRegion *pci_region = &dev->real_device.regions[i];
@@ -698,9 +684,6 @@ static void free_assigned_device(AssignedDevice *dev)
     if (dev->real_device.config_fd >= 0) {
         close(dev->real_device.config_fd);
     }
-
-    invalidate_msix_vectors(dev);
-    g_free(dev->dev.msix_cache);
 }
 
 static uint32_t calc_assigned_dev_id(AssignedDevice *dev)
@@ -916,11 +899,13 @@ void assigned_dev_update_irqs(void)
     }
 }
 
+/* used for both MSI and MSI-X */
 static void assigned_dev_update_msi(PCIDevice *pci_dev, bool enabled)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
     if (!enabled) {
+        dev->msix_vectors_in_use = 0;
         assign_intx(dev);
     }
 }
@@ -945,113 +930,66 @@ static int assigned_dev_update_msi_vector(PCIDevice *pci_dev,
     return 0;
 }
 
-static int assigned_dev_set_msix_vectors(PCIDevice *pci_dev)
+static int assigned_dev_update_msix_vector(PCIDevice *pci_dev,
+                                           unsigned int vector,
+                                           MSIMessage *msg, bool masked)
 {
-    AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    uint16_t entries_nr = 0, entries_max_nr;
-    void *msix_page = adev->msix_table_page;
+    AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
+    MSIRoutingCache *cache;
     uint32_t dev_id;
-    MSIMessage msg;
-    int pos, i, r;
-
-    assert(adev->irq_entries_nr == 0);
-
-    pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
+    unsigned int i;
+    int ret = 0;
 
-    entries_max_nr = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
-    entries_max_nr &= PCI_MSIX_FLAGS_QSIZE;
-    entries_max_nr += 1;
+    if (!masked) {
+        dev_id = calc_assigned_dev_id(dev);
 
-    /* Get the usable entry number for allocating */
-    for (i = 0; i < entries_max_nr; i++) {
         /* Assuming IA-32 MSI message format:
          * Ignore unused entry (invalid vector) */
-        if (pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                         PCI_MSIX_ENTRY_DATA) == 0) {
-            continue;
+        if (msg->data == 0) {
+            if (pci_dev->msix_cache[vector].type == MSI_ROUTE_NONE) {
+                return ret;
+            }
+            dev->msix_vectors_in_use--;
+            deassign_irq(dev);
+            kvm_msi_cache_invalidate(&pci_dev->msix_cache[vector]);
+        } else {
+            if (pci_dev->msix_cache[vector].type != MSI_ROUTE_NONE) {
+                ret = kvm_device_msix_set_vector(kvm_state, dev_id,
+                                                 vector, msg,
+                                                 &pci_dev->msix_cache[vector]);
+                return ret;
+            }
+            dev->msix_vectors_in_use++;
+            deassign_irq(dev);
         }
-        entries_nr++;
-    }
-    if (entries_nr == 0) {
-        fprintf(stderr, "MSI-X entry number is zero!\n");
-        return -EINVAL;
-    }
 
-    dev_id = calc_assigned_dev_id(adev);
-
-    r = kvm_device_msix_init_vectors(kvm_state, dev_id, entries_nr);
-    if (r < 0) {
-        return r;
-    }
-    pci_dev->msix_cache = g_malloc0(entries_nr * sizeof(MSIRoutingCache));
-    adev->irq_entries_nr = entries_nr;
-
-    for (i = 0; i < entries_max_nr; i++) {
-        if (entries_nr == 0) {
-            break;
-        }
-        msg.data = pci_get_long(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                                PCI_MSIX_ENTRY_DATA);
-        if (msg.data == 0) {
-            continue;
+        ret = kvm_device_msix_init_vectors(kvm_state, dev_id,
+                                           dev->msix_vectors_in_use);
+        if (ret < 0) {
+            return ret;
         }
-        msg.address = pci_get_quad(msix_page + i * PCI_MSIX_ENTRY_SIZE +
-                                   PCI_MSIX_ENTRY_LOWER_ADDR);
 
-        r = kvm_device_msix_set_vector(kvm_state, dev_id, i, &msg,
-                                       &pci_dev->msix_cache[i]);
-        if (r < 0) {
-            return r;
+        for (i = 0; i < pci_dev->msix_entries_nr; i++) {
+            cache = &pci_dev->msix_cache[i];
+            if (i != vector && cache->type == MSI_ROUTE_NONE) {
+                continue;
+            }
+            ret = kvm_device_msix_set_vector(kvm_state, dev_id, i,
+                                             i == vector ? msg : &cache->msg,
+                                             cache);
+            if (ret < 0) {
+                return ret;
+            }
         }
-        entries_nr--;
-    }
-
-    return 0;
-}
-
-static void assigned_dev_update_msix(PCIDevice *pci_dev)
-{
-    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    uint16_t ctrl_word = pci_get_word(pci_dev->config + pci_dev->msix_cap +
-                                      PCI_MSIX_FLAGS);
-    uint32_t dev_id;
-    int r;
 
-    dev_id = calc_assigned_dev_id(assigned_dev);
-
-    /* Some guests gratuitously disable MSIX even if they're not using it,
-     * try to catch this by only deassigning irqs if the guest is using
-     * MSIX or intends to start. */
-    if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
-        (ctrl_word & PCI_MSIX_FLAGS_ENABLE)) {
-        invalidate_msix_vectors(assigned_dev);
-        g_free(pci_dev->msix_cache);
-        assigned_dev->irq_entries_nr = 0;
-
-        r = kvm_device_irq_deassign(kvm_state, dev_id,
-                                    assigned_dev->irq_requested_type);
-        /* -ENXIO means no assigned irq */
-        if (r && r != -ENXIO)
-            perror("assigned_dev_update_msix: deassign irq");
-
-        assigned_dev->irq_requested_type = 0;
-    }
-
-    if (ctrl_word & PCI_MSIX_FLAGS_ENABLE) {
-        if (assigned_dev_set_msix_vectors(pci_dev) < 0) {
-            perror("assigned_dev_update_msix_mmio");
-            return;
-        }
-        if (kvm_device_msix_assign(kvm_state, dev_id) < 0) {
-            perror("assigned_dev_enable_msix: assign irq");
-            return;
+        ret = kvm_device_msix_assign(kvm_state, dev_id);
+        if (ret < 0) {
+            return ret;
         }
-        assigned_dev->girq = -1;
-        assigned_dev->irq_requested_type = KVM_DEV_IRQ_HOST_MSIX |
-                                           KVM_DEV_IRQ_GUEST_MSIX;
-    } else {
-        assign_intx(assigned_dev);
+        dev->irq_requested_type =
+            KVM_DEV_IRQ_HOST_MSIX | KVM_DEV_IRQ_GUEST_MSIX;
     }
+    return ret;
 }
 
 static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
@@ -1083,13 +1021,6 @@ static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
 
     pci_default_write_config(pci_dev, address, val, len);
 
-    if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
-        if (range_covers_byte(address, len,
-                              pci_dev->msix_cap + PCI_MSIX_FLAGS + 1)) {
-            assigned_dev_update_msix(pci_dev);
-        }
-    }
-
     emulate_mask = 0;
     memcpy(&emulate_mask, assigned_dev->emulate_config_write + address, len);
     emulate_mask = le32_to_cpu(emulate_mask);
@@ -1115,7 +1046,6 @@ static void assigned_dev_setup_cap_read(AssignedDevice *dev, uint32_t offset,
 static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
 {
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
-    PCIRegion *pci_region = dev->real_device.regions;
     int ret, pos;
 
     /* Clear initial capabilities pointer and status copied from hw */
@@ -1145,27 +1075,31 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     /* Expose MSI-X capability */
     pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
     if (pos != 0 && kvm_device_msix_supported(kvm_state)) {
-        int bar_nr;
-        uint32_t msix_table_entry;
-
-        dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
-        if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, pos, 12)) < 0) {
+        unsigned int table_bar_nr, pba_bar_nr;
+        uint32_t table_offset, pba_offset;
+        uint16_t nentries;
+
+        nentries = (pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
+                    PCI_MSIX_FLAGS_QSIZE) + 1;
+        table_offset = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
+        table_bar_nr = table_offset & PCI_MSIX_FLAGS_BIRMASK;
+        table_offset &= ~PCI_MSIX_FLAGS_BIRMASK;
+        pba_offset = pci_get_long(pci_dev->config + pos + PCI_MSIX_PBA);
+        pba_bar_nr = pba_offset & PCI_MSIX_FLAGS_BIRMASK;
+        pba_offset &= ~PCI_MSIX_FLAGS_BIRMASK;
+
+        ret = msix_init(pci_dev, pos, nentries,
+                        &dev->v_addrs[table_bar_nr].container, table_bar_nr,
+                        table_offset, &dev->v_addrs[pba_bar_nr].container,
+                        pba_bar_nr, pba_offset);
+        if (ret < 0) {
+            return ret;
+        }
+        ret = msix_set_config_notifiers(pci_dev, assigned_dev_update_msi,
+                                        assigned_dev_update_msix_vector);
+        if (ret < 0) {
             return ret;
         }
-        pci_dev->msix_cap = pos;
-
-        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
-                     pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
-                     PCI_MSIX_FLAGS_QSIZE);
-
-        /* Only enable and function mask bits are writable */
-        pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
-                     PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
-
-        msix_table_entry = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
-        bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
-        msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
-        dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
     }
 
     /* Minimal PM support, nothing writable, device appears to NAK changes */
@@ -1378,94 +1312,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     return 0;
 }
 
-static uint32_t msix_mmio_readl(void *opaque, target_phys_addr_t addr)
-{
-    AssignedDevice *adev = opaque;
-    unsigned int offset = addr & 0xfff;
-    void *page = adev->msix_table_page;
-    uint32_t val = 0;
-
-    memcpy(&val, (void *)((char *)page + offset), 4);
-
-    return val;
-}
-
-static uint32_t msix_mmio_readb(void *opaque, target_phys_addr_t addr)
-{
-    return ((msix_mmio_readl(opaque, addr & ~3)) >>
-            (8 * (addr & 3))) & 0xff;
-}
-
-static uint32_t msix_mmio_readw(void *opaque, target_phys_addr_t addr)
-{
-    return ((msix_mmio_readl(opaque, addr & ~3)) >>
-            (8 * (addr & 3))) & 0xffff;
-}
-
-static void msix_mmio_writel(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
-{
-    AssignedDevice *adev = opaque;
-    unsigned int offset = addr & 0xfff;
-    void *page = adev->msix_table_page;
-
-    DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
-		    addr, val);
-    memcpy((void *)((char *)page + offset), &val, 4);
-}
-
-static void msix_mmio_writew(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
-{
-    msix_mmio_writel(opaque, addr & ~3,
-                     (val & 0xffff) << (8*(addr & 3)));
-}
-
-static void msix_mmio_writeb(void *opaque,
-                             target_phys_addr_t addr, uint32_t val)
-{
-    msix_mmio_writel(opaque, addr & ~3,
-                     (val & 0xff) << (8*(addr & 3)));
-}
-
-static const MemoryRegionOps msix_mmio_ops = {
-    .old_mmio = {
-        .read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, },
-        .write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, },
-    },
-    .endianness = DEVICE_NATIVE_ENDIAN,
-};
-
-static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
-{
-    dev->msix_table_page = mmap(NULL, 0x1000,
-                                PROT_READ|PROT_WRITE,
-                                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
-    if (dev->msix_table_page == MAP_FAILED) {
-        fprintf(stderr, "fail allocate msix_table_page! %s\n",
-                strerror(errno));
-        return -EFAULT;
-    }
-    memset(dev->msix_table_page, 0, 0x1000);
-    memory_region_init_io(&dev->mmio, &msix_mmio_ops, dev,
-                          "assigned-dev-msix", MSIX_PAGE_SIZE);
-    return 0;
-}
-
-static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
-{
-    if (!dev->msix_table_page)
-        return;
-
-    memory_region_destroy(&dev->mmio);
-
-    if (munmap(dev->msix_table_page, 0x1000) == -1) {
-        fprintf(stderr, "error unmapping msix_table_page! %s\n",
-                strerror(errno));
-    }
-    dev->msix_table_page = NULL;
-}
-
 static const VMStateDescription vmstate_assigned_device = {
     .name = "pci-assign",
     .unmigratable = 1,
@@ -1548,23 +1394,16 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
         goto out;
     }
 
-    if (assigned_device_pci_cap_init(pci_dev) < 0) {
-        goto out;
-    }
-
-    /* intercept MSI-X entry page in the MMIO */
-    if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
-        if (assigned_dev_register_msix_mmio(dev)) {
-            goto out;
-        }
-    }
-
     /* handle real device's MMIO/PIO BARs */
     if (assigned_dev_register_regions(dev->real_device.regions,
                                       dev->real_device.region_number,
                                       dev))
         goto out;
 
+    if (assigned_device_pci_cap_init(pci_dev) < 0) {
+        goto out;
+    }
+
     /* handle interrupt routing */
     e_intx = dev->dev.config[0x3d] - 1;
     dev->intpin = e_intx;
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 4b67f14..c41ea33 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -95,21 +95,9 @@ typedef struct AssignedDevice {
     uint8_t h_devfn;
     int irq_requested_type;
     int bound;
-    struct {
-#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
-#define ASSIGNED_DEVICE_CAP_MSIX (1 << 1)
-        uint32_t available;
-#define ASSIGNED_DEVICE_MSI_ENABLED (1 << 0)
-#define ASSIGNED_DEVICE_MSIX_ENABLED (1 << 1)
-#define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
-        uint32_t state;
-    } cap;
     uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
     uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
-    int irq_entries_nr;
-    void *msix_table_page;
-    target_phys_addr_t msix_table_addr;
-    MemoryRegion mmio;
+    unsigned int msix_vectors_in_use;
     char *configfd_name;
     int32_t bootindex;
     QLIST_ENTRY(AssignedDevice) next;
-- 
1.7.3.4


* [RFC][PATCH 45/45] pci-assign: Fix coding style issues
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17  9:28   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17  9:28 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

Also remove the dead get_assigned_device function while at it. No
functional changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/device-assignment.c |  199 ++++++++++++++++++++++++------------------------
 hw/device-assignment.h |   14 ++--
 2 files changed, 107 insertions(+), 106 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index df554b3..c7930e4 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -58,10 +58,10 @@
 #ifdef DEVICE_ASSIGNMENT_DEBUG
 #define DEBUG(fmt, ...)                                       \
     do {                                                      \
-      fprintf(stderr, "%s: " fmt, __func__ , __VA_ARGS__);    \
+        fprintf(stderr, "%s: " fmt, __func__ , __VA_ARGS__);  \
     } while (0)
 #else
-#define DEBUG(fmt, ...) do { } while(0)
+#define DEBUG(fmt, ...) do { } while (0)
 #endif
 
 static void assigned_dev_load_option_rom(AssignedDevice *dev);
@@ -97,27 +97,27 @@ static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
             DEBUG("out val=%x, len=%d, e_phys=%x, host=%x\n",
                   *val, len, addr, port);
             switch (len) {
-                case 1:
-                    outb(*val, port);
-                    break;
-                case 2:
-                    outw(*val, port);
-                    break;
-                case 4:
-                    outl(*val, port);
-                    break;
+            case 1:
+                outb(*val, port);
+                break;
+            case 2:
+                outw(*val, port);
+                break;
+            case 4:
+                outl(*val, port);
+                break;
             }
         } else {
             switch (len) {
-                case 1:
-                    ret = inb(port);
-                    break;
-                case 2:
-                    ret = inw(port);
-                    break;
-                case 4:
-                    ret = inl(port);
-                    break;
+            case 1:
+                ret = inb(port);
+                break;
+            case 2:
+                ret = inw(port);
+                break;
+            case 4:
+                ret = inl(port);
+                break;
             }
             DEBUG("in val=%x, len=%d, e_phys=%x, host=%x\n",
                   ret, len, addr, port);
@@ -130,21 +130,18 @@ static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
                                        uint32_t value)
 {
     assigned_dev_ioport_rw(opaque, addr, 1, &value);
-    return;
 }
 
 static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
                                        uint32_t value)
 {
     assigned_dev_ioport_rw(opaque, addr, 2, &value);
-    return;
 }
 
 static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
                        uint32_t value)
 {
     assigned_dev_ioport_rw(opaque, addr, 4, &value);
-    return;
 }
 
 static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
@@ -295,13 +292,13 @@ static uint32_t assigned_dev_pci_read(PCIDevice *d, int pos, int len)
 again:
     ret = pread(fd, &val, len, pos);
     if (ret != len) {
-	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
-	    goto again;
-
-	fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n",
-		__func__, ret, errno);
+        if ((ret < 0) && (errno == EINTR || errno == EAGAIN)) {
+            goto again;
+        }
+        fprintf(stderr, "%s: pread failed, ret = %zd errno = %d\n",
+                __func__, ret, errno);
 
-	exit(1);
+        exit(1);
     }
 
     return val;
@@ -321,16 +318,14 @@ static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int len)
 again:
     ret = pwrite(fd, &val, len, pos);
     if (ret != len) {
-	if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
-	    goto again;
-
-	fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
-		__func__, ret, errno);
+        if ((ret < 0) && (errno == EINTR || errno == EAGAIN)) {
+            goto again;
+        }
+        fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
+                __func__, ret, errno);
 
-	exit(1);
+        exit(1);
     }
-
-    return;
 }
 
 static void assigned_dev_emulate_config_read(AssignedDevice *dev,
@@ -359,22 +354,24 @@ static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap, uint8_t start)
     int status;
 
     status = assigned_dev_pci_read_byte(d, PCI_STATUS);
-    if ((status & PCI_STATUS_CAP_LIST) == 0)
+    if ((status & PCI_STATUS_CAP_LIST) == 0) {
         return 0;
+    }
 
     while (max_cap--) {
         pos = assigned_dev_pci_read_byte(d, pos);
-        if (pos < 0x40)
+        if (pos < 0x40) {
             break;
-
+        }
         pos &= ~3;
         id = assigned_dev_pci_read_byte(d, pos + PCI_CAP_LIST_ID);
 
-        if (id == 0xff)
+        if (id == 0xff) {
             break;
-        if (id == cap)
+        }
+        if (id == cap) {
             return pos;
-
+        }
         pos += PCI_CAP_LIST_NEXT;
     }
     return 0;
@@ -388,8 +385,9 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     PCIRegion *cur_region = io_regions;
 
     for (i = 0; i < regions_num; i++, cur_region++) {
-        if (!cur_region->valid)
+        if (!cur_region->valid) {
             continue;
+        }
         pci_dev->v_addrs[i].num = i;
 
         /* handle memory io regions */
@@ -527,7 +525,7 @@ static int get_real_device(AssignedDevice *pci_dev, uint16_t r_seg,
     dev->region_number = 0;
 
     snprintf(dir, sizeof(dir), "/sys/bus/pci/devices/%04x:%02x:%02x.%x/",
-	     r_seg, r_bus, r_dev, r_func);
+             r_seg, r_bus, r_dev, r_func);
 
     snprintf(name, sizeof(name), "%sconfig", dir);
 
@@ -554,8 +552,9 @@ again:
     r = read(dev->config_fd, pci_dev->dev.config,
              pci_config_size(&pci_dev->dev));
     if (r < 0) {
-        if (errno == EINTR || errno == EAGAIN)
+        if (errno == EINTR || errno == EAGAIN) {
             goto again;
+        }
         fprintf(stderr, "%s: read failed, errno = %d\n", __func__, errno);
     }
 
@@ -574,16 +573,17 @@ again:
     }
 
     for (r = 0; r < PCI_ROM_SLOT; r++) {
-	if (fscanf(f, "%lli %lli %lli\n", &start, &end, &flags) != 3)
-	    break;
-
+        if (fscanf(f, "%lli %lli %lli\n", &start, &end, &flags) != 3) {
+            break;
+        }
         rp = dev->regions + r;
         rp->valid = 0;
         rp->resource_fd = -1;
         size = end - start + 1;
         flags &= IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH;
-        if (size == 0 || (flags & ~IORESOURCE_PREFETCH) == 0)
+        if (size == 0 || (flags & ~IORESOURCE_PREFETCH) == 0) {
             continue;
+        }
         if (flags & IORESOURCE_MEM) {
             flags &= ~IORESOURCE_IO;
         } else {
@@ -591,8 +591,9 @@ again:
         }
         snprintf(name, sizeof(name), "%sresource%d", dir, r);
         fd = open(name, O_RDWR);
-        if (fd == -1)
+        if (fd == -1) {
             continue;
+        }
         rp->resource_fd = fd;
 
         rp->type = flags;
@@ -704,7 +705,8 @@ static void assign_failed_examine(AssignedDevice *dev)
     sprintf(name, "%sdriver", dir);
 
     r = readlink(name, driver, sizeof(driver));
-    if ((r <= 0) || r >= sizeof(driver) || !(ns = strrchr(driver, '/'))) {
+    ns = strrchr(driver, '/');
+    if (r <= 0 || r >= sizeof(driver) || ns == NULL) {
         goto fail;
     }
 
@@ -780,11 +782,11 @@ static int assign_device(AssignedDevice *dev)
                 dev->dev.qdev.id, strerror(-r));
 
         switch (r) {
-            case -EBUSY:
-                assign_failed_examine(dev);
-                break;
-            default:
-                break;
+        case -EBUSY:
+            assign_failed_examine(dev);
+            break;
+        default:
+            break;
         }
     }
     return r;
@@ -812,9 +814,9 @@ static int assign_intx(AssignedDevice *dev)
     int irq, r;
 
     /* Interrupt PIN 0 means don't use INTx */
-    if (assigned_dev_pci_read_byte(&dev->dev, PCI_INTERRUPT_PIN) == 0)
+    if (assigned_dev_pci_read_byte(&dev->dev, PCI_INTERRUPT_PIN) == 0) {
         return 0;
-
+    }
     irq = pci_map_irq(&dev->dev, dev->intpin);
     irq = piix_get_irq(irq);
 
@@ -856,27 +858,11 @@ static void deassign_device(AssignedDevice *dev)
     assigned_dev_data.assigned_dev_id = calc_assigned_dev_id(dev);
 
     r = kvm_deassign_pci_device(kvm_state, &assigned_dev_data);
-    if (r < 0)
-	fprintf(stderr, "Failed to deassign device \"%s\" : %s\n",
+    if (r < 0) {
+        fprintf(stderr, "Failed to deassign device \"%s\" : %s\n",
                 dev->dev.qdev.id, strerror(-r));
-}
-
-#if 0
-AssignedDevInfo *get_assigned_device(int pcibus, int slot)
-{
-    AssignedDevice *assigned_dev = NULL;
-    AssignedDevInfo *adev = NULL;
-
-    QLIST_FOREACH(adev, &adev_head, next) {
-        assigned_dev = adev->assigned_dev;
-        if (pci_bus_num(assigned_dev->dev.bus) == pcibus &&
-            PCI_SLOT(assigned_dev->dev.devfn) == slot)
-            return adev;
     }
-
-    return NULL;
 }
-#endif
 
 /* The pci config space got updated. Check if irq numbers have changed
  * for our devices
@@ -1103,10 +1089,12 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     }
 
     /* Minimal PM support, nothing writable, device appears to NAK changes */
-    if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM, 0))) {
+    pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM, 0);
+    if (pos != 0) {
         uint16_t pmc;
-        if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, pos,
-                                      PCI_PM_SIZEOF)) < 0) {
+
+        ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, pos, PCI_PM_SIZEOF);
+        if (ret < 0) {
             return ret;
         }
 
@@ -1125,7 +1113,8 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         pci_set_byte(pci_dev->config + pos + PCI_PM_DATA_REGISTER, 0);
     }
 
-    if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
+    pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0);
+    if (pos != 0) {
         uint8_t version, size = 0;
         uint16_t type, devctl, lnkcap, lnksta;
         uint32_t devcap;
@@ -1144,13 +1133,13 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
             size = MIN(0x3c, PCI_CONFIG_SPACE_SIZE - pos);
             if (size < 0x34) {
                 fprintf(stderr,
-                        "%s: Invalid size PCIe cap-id 0x%x \n",
+                        "%s: Invalid size PCIe cap-id 0x%x\n",
                         __func__, PCI_CAP_ID_EXP);
                 return -EINVAL;
             } else if (size != 0x3c) {
                 fprintf(stderr,
                         "WARNING, %s: PCIe cap-id 0x%x has "
-                        "non-standard size 0x%x; std size should be 0x3c \n",
+                        "non-standard size 0x%x; std size should be 0x3c\n",
                          __func__, PCI_CAP_ID_EXP, size);
             }
         } else if (version == 0) {
@@ -1173,8 +1162,8 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
             return -EINVAL;
         }
 
-        if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_EXP,
-                                      pos, size)) < 0) {
+        ret = pci_add_capability(pci_dev, PCI_CAP_ID_EXP, pos, size);
+        if (ret < 0) {
             return ret;
         }
 
@@ -1246,12 +1235,14 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         }
     }
 
-    if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PCIX, 0))) {
+    pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PCIX, 0);
+    if (pos != 0) {
         uint16_t cmd;
         uint32_t status;
 
         /* Only expose the minimum, 8 byte capability */
-        if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_PCIX, pos, 8)) < 0) {
+        ret = pci_add_capability(pci_dev, PCI_CAP_ID_PCIX, pos, 8);
+        if (ret < 0) {
             return ret;
         }
 
@@ -1273,9 +1264,11 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         pci_set_long(pci_dev->config + pos + PCI_X_STATUS, status);
     }
 
-    if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VPD, 0))) {
+    pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VPD, 0);
+    if (pos != 0) {
         /* Direct R/W passthrough */
-        if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_VPD, pos, 8)) < 0) {
+        ret = pci_add_capability(pci_dev, PCI_CAP_ID_VPD, pos, 8);
+        if (ret < 0) {
             return ret;
         }
 
@@ -1290,8 +1283,8 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         pos += PCI_CAP_LIST_NEXT) {
         uint8_t len = pci_get_byte(pci_dev->config + pos + PCI_CAP_FLAGS);
         /* Direct R/W passthrough */
-        if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_VNDR,
-                                      pos, len)) < 0) {
+        ret = pci_add_capability(pci_dev, PCI_CAP_ID_VNDR, pos, len);
+        if (ret < 0) {
             return ret;
         }
 
@@ -1397,8 +1390,9 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
     /* handle real device's MMIO/PIO BARs */
     if (assigned_dev_register_regions(dev->real_device.regions,
                                       dev->real_device.region_number,
-                                      dev))
+                                      dev)) {
         goto out;
+    }
 
     if (assigned_device_pci_cap_init(pci_dev) < 0) {
         goto out;
@@ -1415,13 +1409,15 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 
     /* assign device to guest */
     r = assign_device(dev);
-    if (r < 0)
+    if (r < 0) {
         goto out;
+    }
 
     /* assign legacy INTx to the device */
     r = assign_intx(dev);
-    if (r < 0)
+    if (r < 0) {
         goto assigned_out;
+    }
 
     assigned_dev_load_option_rom(dev);
     QLIST_INSERT_HEAD(&devs, dev, next);
@@ -1452,13 +1448,16 @@ static int parse_hostaddr(DeviceState *dev, Property *prop, const char *str)
     PCIHostDevice *ptr = qdev_get_prop_ptr(dev, prop);
     int rc;
 
-    rc = pci_parse_host_devaddr(str, &ptr->seg, &ptr->bus, &ptr->dev, &ptr->func);
-    if (rc != 0)
+    rc = pci_parse_host_devaddr(str, &ptr->seg, &ptr->bus, &ptr->dev,
+                                &ptr->func);
+    if (rc != 0) {
         return -1;
+    }
     return 0;
 }
 
-static int print_hostaddr(DeviceState *dev, Property *prop, char *dest, size_t len)
+static int print_hostaddr(DeviceState *dev, Property *prop, char *dest,
+                          size_t len)
 {
     PCIHostDevice *ptr = qdev_get_prop_ptr(dev, prop);
 
@@ -1484,7 +1483,8 @@ static PCIDeviceInfo assign_info = {
     .config_read  = assigned_dev_pci_read_config,
     .config_write = assigned_dev_pci_write_config,
     .qdev.props   = (Property[]) {
-        DEFINE_PROP("host", AssignedDevice, host, qdev_prop_hostaddr, PCIHostDevice),
+        DEFINE_PROP("host", AssignedDevice, host, qdev_prop_hostaddr,
+                    PCIHostDevice),
         DEFINE_PROP_BIT("iommu", AssignedDevice, features,
                         ASSIGNED_DEVICE_USE_IOMMU_BIT, true),
         DEFINE_PROP_BIT("prefer_msi", AssignedDevice, features,
@@ -1516,8 +1516,9 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev)
     void *ptr;
 
     /* If loading ROM from file, pci handles it */
-    if (dev->dev.romfile || !dev->dev.rom_bar)
+    if (dev->dev.romfile || !dev->dev.rom_bar) {
         return;
+    }
 
     snprintf(rom_file, sizeof(rom_file),
              "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/rom",
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index c41ea33..1e8fa37 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -25,8 +25,8 @@
  *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
  */
 
-#ifndef __DEVICE_ASSIGNMENT_H__
-#define __DEVICE_ASSIGNMENT_H__
+#ifndef QEMU_DEVICE_ASSIGNMENT_H
+#define QEMU_DEVICE_ASSIGNMENT_H
 
 #include <sys/mman.h>
 #include "qemu-common.h"
@@ -74,11 +74,11 @@ typedef struct {
     PCIRegion *region;
 } AssignedDevRegion;
 
-#define ASSIGNED_DEVICE_USE_IOMMU_BIT	0
-#define ASSIGNED_DEVICE_PREFER_MSI_BIT	1
+#define ASSIGNED_DEVICE_USE_IOMMU_BIT   0
+#define ASSIGNED_DEVICE_PREFER_MSI_BIT  1
 
-#define ASSIGNED_DEVICE_USE_IOMMU_MASK	(1 << ASSIGNED_DEVICE_USE_IOMMU_BIT)
-#define ASSIGNED_DEVICE_PREFER_MSI_MASK	(1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
+#define ASSIGNED_DEVICE_USE_IOMMU_MASK  (1 << ASSIGNED_DEVICE_USE_IOMMU_BIT)
+#define ASSIGNED_DEVICE_PREFER_MSI_MASK (1 << ASSIGNED_DEVICE_PREFER_MSI_BIT)
 
 typedef struct AssignedDevice {
     PCIDevice dev;
@@ -105,4 +105,4 @@ typedef struct AssignedDevice {
 
 void assigned_dev_update_irqs(void);
 
-#endif              /* __DEVICE_ASSIGNMENT_H__ */
+#endif /* QEMU_DEVICE_ASSIGNMENT_H */
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 10:56     ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 10:56 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Williamson, Marcelo Tosatti, qemu-devel, kvm, Michael S. Tsirkin

On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> So far we deliver MSI messages by writing them into the target MMIO
> area. This reflects what happens on hardware, but imposes some
> limitations on the emulation when introducing KVM in-kernel irqchip
> models. For those we will need to track the message origin.

Why do we need to track the message origin?  Emulated interrupt remapping?

>  Moreover,
> different architecture or accelerators may want to overload the delivery
> handler.
>
> Therefore, this commit introduces a delivery hook that is called by the
> MSI/MSI-X layer when devices send normal messages, but also on spurious
> deliveries that ended up on the APIC MMIO handler. Our default delivery
> handler for APIC-based PCs then dispatches between real MSIs and other
> DMA requests that happened to take the MSI patch.

'path'

>  
> -static void apic_send_msi(target_phys_addr_t addr, uint32_t data)
> +void apic_deliver_msi(MSIMessage *msg)

In general, it is better these days to pass small structures by value.
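
A minimal sketch of the by-value form, for illustration. The MSIMessage layout used here (a 64-bit address plus 32-bit data) and both helper names are assumptions, not the definitions from the series. The only hardware fact relied on is that the low 8 bits of the MSI data word carry the vector on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* By pointer: the callee has to consider NULL and aliasing. */
static uint8_t sketch_vector_by_ptr(const MSIMessage *msg)
{
    assert(msg != NULL);
    return msg->data & 0xff;
}

/* By value: the callee works on its own copy of a 12-byte payload. */
static uint8_t sketch_vector_by_value(MSIMessage msg)
{
    return msg.data & 0xff;
}
```

With the by-value variant there is no lifetime or NULL question for the caller to answer, which is the usual argument for it with structures this small.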


Not sure what the gain is from intercepting the msi just before the
stl_phys() vs. in the apic handler.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 11:06     ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 11:06 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> This cache will help us implementing KVM in-kernel irqchip support
> without spreading hooks all over the place.
>
> KVM requires us to register it first and then deliver it by raising a
> pseudo IRQ line returned on registration. While this could be changed
> for QEMU-originated MSI messages by adding direct MSI injection, we will
> still need this translation for irqfd-originated messages. The
> MSIRoutingCache will allow to track those registrations and update them
> lazily before the actual delivery. This avoid having to track MSI
> vectors at device level (like qemu-kvm currently does).
>
>
> +typedef enum {
> +    MSI_ROUTE_NONE = 0,
> +    MSI_ROUTE_STATIC,
> +} MSIRouteType;
> +
> +struct MSIRoutingCache {
> +    MSIMessage msg;
> +    MSIRouteType type;
> +    int kvm_gsi;
> +    int kvm_irqfd;
> +};
> +
> diff --git a/hw/pci.h b/hw/pci.h
> index 329ab32..5b5d2fd 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -197,6 +197,10 @@ struct PCIDevice {
>      MemoryRegion rom;
>      uint32_t rom_bar;
>  
> +    /* MSI routing chaches */
> +    MSIRoutingCache *msi_cache;
> +    MSIRoutingCache *msix_cache;
> +
>      /* MSI entries */
>      int msi_entries_nr;
>      struct KVMMsiMessage *msi_irq_entries;

IMO this needlessly leaks kvm information into core qemu.  The cache
should be completely hidden in kvm code.

I think msi_deliver() can hide the use of the cache completely.  For
pre-registered events like kvm's irqfd, you can use something like

  qemu_irq qemu_msi_irq(MSIMessage msg)

For non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
for kvm, it allocates an irqfd and a permanent entry in the cache and
returns a qemu_irq that triggers the irqfd.
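
A rough sketch of that idea, under stated assumptions: SketchMsiIrq stands in for qemu_irq, sketch_stl_phys() and sketch_irqfd_signal() are recording stubs rather than the real stl_phys() or any KVM call, the irqfd numbers are fake, and the MSIMessage layout is assumed. It only shows how the choice of delivery path can be hidden behind one returned handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout, for illustration only. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Hypothetical stand-in for qemu_irq: a handle with a raise callback. */
typedef struct SketchMsiIrq SketchMsiIrq;
struct SketchMsiIrq {
    void (*raise)(SketchMsiIrq *irq);
    MSIMessage msg;
    int irqfd;              /* -1 when the memory-write path is used */
};

/* Recording stubs so the sketch runs without QEMU or KVM. */
static uint64_t sketch_last_addr;
static uint32_t sketch_last_data;
static int sketch_last_fd = -1;
static int sketch_next_fd = 100;

static void sketch_stl_phys(uint64_t addr, uint32_t val)
{
    sketch_last_addr = addr;
    sketch_last_data = val;
}

static void sketch_irqfd_signal(int fd)
{
    sketch_last_fd = fd;
}

static void sketch_raise_by_write(SketchMsiIrq *irq)
{
    sketch_stl_phys(irq->msg.address, irq->msg.data);
}

static void sketch_raise_by_irqfd(SketchMsiIrq *irq)
{
    sketch_irqfd_signal(irq->irqfd);
}

/* Callers get the same kind of handle regardless of the delivery path. */
static SketchMsiIrq *sketch_msi_irq(MSIMessage msg, bool in_kernel_irqchip)
{
    SketchMsiIrq *irq = calloc(1, sizeof(*irq));

    assert(irq != NULL);
    irq->msg = msg;
    if (in_kernel_irqchip) {
        irq->irqfd = sketch_next_fd++;   /* fake "allocation" */
        irq->raise = sketch_raise_by_irqfd;
    } else {
        irq->irqfd = -1;
        irq->raise = sketch_raise_by_write;
    }
    return irq;
}
```

A device would then call irq->raise(irq) without knowing which path is behind it; any routing cache would live entirely on the kvm side of sketch_msi_irq().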

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses
  2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 11:10     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 11:10 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 11:27:40AM +0200, Jan Kiszka wrote:
> Only accesses to the MSI-X table must trigger a call to
> msix_handle_mask_update or a notifier invocation.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Why would msix_mmio_write be called on an access
outside the table?

> ---
>  hw/msix.c |   16 ++++++++++------
>  1 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index 2c4de21..33cb716 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -264,18 +264,22 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>  {
>      PCIDevice *dev = opaque;
>      unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> -    int vector = offset / PCI_MSIX_ENTRY_SIZE;
> +    unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;

Why the int/unsigned change? This has no chance of overflowing, and using
unsigned causes a signed/unsigned comparison below
and an unsigned/signed conversion on calls such as msix_is_masked.

>      int was_masked = msix_is_masked(dev, vector);
>      pci_set_long(dev->msix_table_page + offset, val);
>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
>          kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
>      }

I would say if we need to check the address, check it first thing
and return if the address is out of a sensible range.
For example, are you worried about kvm_msix_update calls with
a sensible mask?
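
A minimal sketch of the check-first shape, assuming a simplified device model. SketchMsixDev, the helper names and the page size are made up for illustration; only the 16-byte entry size and the mask bit in the vector control dword at offset 12 come from the MSI-X table format. Note that returning early drops the store as well, which differs from the quoted patch, where the value is still written to the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   0x1000u  /* assumed size of the table page */
#define SKETCH_ENTRY_SIZE  16u      /* MSI-X table entries are 16 bytes */
#define SKETCH_VECTOR_CTRL 12u      /* vector control dword in an entry */

/* Hypothetical, stripped-down device state. */
typedef struct SketchMsixDev {
    uint8_t table_page[SKETCH_PAGE_SIZE];
    int entries_nr;
    int mask_updates;       /* counts mask state changes */
} SketchMsixDev;

static bool sketch_is_masked(const SketchMsixDev *dev, int vector)
{
    assert(vector >= 0 && vector < dev->entries_nr);
    return dev->table_page[vector * SKETCH_ENTRY_SIZE +
                           SKETCH_VECTOR_CTRL] & 1;
}

static void sketch_msix_mmio_write(SketchMsixDev *dev, uint64_t addr,
                                   uint32_t val)
{
    unsigned int offset =
        (unsigned int)(addr & (SKETCH_PAGE_SIZE - 1)) & ~0x3u;
    int vector = (int)(offset / SKETCH_ENTRY_SIZE);
    bool was_masked;
    int i;

    /* Range check first: nothing below runs for out-of-table accesses. */
    if (vector >= dev->entries_nr) {
        return;
    }

    was_masked = sketch_is_masked(dev, vector);
    /* Little-endian 32-bit store into the table page. */
    for (i = 0; i < 4; i++) {
        dev->table_page[offset + i] = (val >> (8 * i)) & 0xff;
    }
    if (was_masked != sketch_is_masked(dev, vector)) {
        dev->mask_updates++;
    }
}
```

Keeping vector as a plain int here also avoids the signed/unsigned comparison against entries_nr.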

> -    if (was_masked != msix_is_masked(dev, vector) && dev->msix_mask_notifier) {
> -        int r = dev->msix_mask_notifier(dev, vector,
> -					msix_is_masked(dev, vector));
> -        assert(r >= 0);
> +
> +    if (vector < dev->msix_entries_nr) {
> +        if (was_masked != msix_is_masked(dev, vector) &&
> +            dev->msix_mask_notifier) {
> +            int r = dev->msix_mask_notifier(dev, vector,
> +                                            msix_is_masked(dev, vector));
> +            assert(r >= 0);
> +        }
> +        msix_handle_mask_update(dev, vector);
>      }
> -    msix_handle_mask_update(dev, vector);
>  }
>  
>  static const MemoryRegionOps msix_mmio_ops = {
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 17/45] qemu-kvm: Track MSIRoutingCache in KVM routing table
  2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 11:13     ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 11:13 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> Keep a link from the internal KVM routing table to potential MSI routing
> cache entries. The link is used so far whenever the entry is dropped to
> invalidate the cache content. It will allow us to build MSI routing
> entries on demand and flush existing ones on table overflow.
>

Does this not require a destructor for MSIRoutingCache?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 10:56     ` [Qemu-devel] " Avi Kivity
@ 2011-10-17 11:15       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 11:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

On 2011-10-17 12:56, Avi Kivity wrote:
> On 10/17/2011 11:27 AM, Jan Kiszka wrote:
>> So far we deliver MSI messages by writing them into the target MMIO
>> area. This reflects what happens on hardware, but imposes some
>> limitations on the emulation when introducing KVM in-kernel irqchip
>> models. For those we will need to track the message origin.
> 
> Why do we need to track the message origin?  Emulated interrupt remapping?

The origin holds the routing cache, which we need in order to track
whether the message already has a route (without searching long lists)
and to update that route instead of adding another one.

> 
>>  Moreover,
>> different architecture or accelerators may want to overload the delivery
>> handler.
>>
>> Therefore, this commit introduces a delivery hook that is called by the
>> MSI/MSI-X layer when devices send normal messages, but also on spurious
>> deliveries that ended up on the APIC MMIO handler. Our default delivery
>> handler for APIC-based PCs then dispatches between real MSIs and other
>> DMA requests that happened to take the MSI patch.
> 
> 'path'
> 
>>  
>> -static void apic_send_msi(target_phys_addr_t addr, uint32_t data)
>> +void apic_deliver_msi(MSIMessage *msg)
> 
> In general, it is better these days to pass small structures by value.

OK, will adjust this.
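
For reference, a small sketch of the two calling conventions under discussion. The struct layout and the sketch_ names are assumptions for illustration only, not the MSIMessage definition from the series; taking the vector from the low 8 bits of the data word follows the x86 MSI convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout: an address/data pair. */
typedef struct SketchMSIMessage {
    uint64_t address;
    uint32_t data;
} SketchMSIMessage;

/* By value: the callee works on its own copy, so there is no NULL case
 * and no aliasing with the caller's object. */
uint8_t sketch_msi_vector(SketchMSIMessage msg)
{
    return (uint8_t)(msg.data & 0xff);
}

/* By pointer, for comparison: the callee has to trust or check the pointer. */
uint8_t sketch_msi_vector_ptr(const SketchMSIMessage *msg)
{
    assert(msg != NULL);
    return (uint8_t)(msg->data & 0xff);
}
```

For a struct of this size, the by-value form usually costs no more than passing the pointer on common ABIs.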

> 
> 
> Not sure what the gain is from intercepting the msi just before the
> stl_phys() vs. in the apic handler.

APIC is x86-specific, MSI is not. I think Xen will also want to make use
of this hook. I originally thought of using it for the KVM in-kernel
models as well, but I will now establish a callback at APIC-level
(upstream will look different from qemu-kvm in this regard).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17 11:06     ` [Qemu-devel] " Avi Kivity
@ 2011-10-17 11:19       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 11:19 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

On 2011-10-17 13:06, Avi Kivity wrote:
> On 10/17/2011 11:27 AM, Jan Kiszka wrote:
>> This cache will help us implementing KVM in-kernel irqchip support
>> without spreading hooks all over the place.
>>
>> KVM requires us to register it first and then deliver it by raising a
>> pseudo IRQ line returned on registration. While this could be changed
>> for QEMU-originated MSI messages by adding direct MSI injection, we will
>> still need this translation for irqfd-originated messages. The
>> MSIRoutingCache will allow to track those registrations and update them
>> lazily before the actual delivery. This avoid having to track MSI
>> vectors at device level (like qemu-kvm currently does).
>>
>>
>> +typedef enum {
>> +    MSI_ROUTE_NONE = 0,
>> +    MSI_ROUTE_STATIC,
>> +} MSIRouteType;
>> +
>> +struct MSIRoutingCache {
>> +    MSIMessage msg;
>> +    MSIRouteType type;
>> +    int kvm_gsi;
>> +    int kvm_irqfd;
>> +};
>> +
>> diff --git a/hw/pci.h b/hw/pci.h
>> index 329ab32..5b5d2fd 100644
>> --- a/hw/pci.h
>> +++ b/hw/pci.h
>> @@ -197,6 +197,10 @@ struct PCIDevice {
>>      MemoryRegion rom;
>>      uint32_t rom_bar;
>>  
>> +    /* MSI routing chaches */
>> +    MSIRoutingCache *msi_cache;
>> +    MSIRoutingCache *msix_cache;
>> +
>>      /* MSI entries */
>>      int msi_entries_nr;
>>      struct KVMMsiMessage *msi_irq_entries;
> 
> IMO this needlessly leaks kvm information into core qemu.  The cache
> should be completely hidden in kvm code.
> 
> I think msi_deliver() can hide the use of the cache completely.  For
> pre-registered events like kvm's irqfd, you can use something like
> 
>   qemu_irq qemu_msi_irq(MSIMessage msg)
> 
> for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
> for kvm, it allocates an irqfd and a permanent entry in the cache and
> returns a qemu_irq that triggers the irqfd.

See my previous mail: you want to track the life-cycle of an MSI
source to avoid generating routes for identical sources. A message is
not a source. Two identical messages can come from different sources. So
we need a separate data structure for that purpose.
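
A rough sketch of the distinction, with all Sketch/sketch_ names being hypothetical and not the MSIRoutingCache code from the patch: the cache belongs to the source, so a repeated message from the same source reuses its route, a changed message updates that route in place, and an identical message from a different source is tracked through its own cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct SketchMsg {
    uint64_t address;
    uint32_t data;
} SketchMsg;

/* One cache object per MSI source (hypothetical fields). */
typedef struct SketchRouteCache {
    SketchMsg msg;
    bool valid;
    int gsi;
} SketchRouteCache;

/* Stand-in for the routing table: hands out pseudo IRQ numbers. */
typedef struct SketchRouter {
    int next_gsi;
    int routes_added;
    int routes_updated;
} SketchRouter;

/* Returns the pseudo IRQ to raise for this source's message. */
int sketch_route(SketchRouter *r, SketchRouteCache *cache, SketchMsg msg)
{
    if (!cache->valid) {
        /* First message from this source: register a new route. */
        cache->gsi = r->next_gsi++;
        cache->msg = msg;
        cache->valid = true;
        r->routes_added++;
    } else if (cache->msg.address != msg.address ||
               cache->msg.data != msg.data) {
        /* Same source, new message: update the existing route in place. */
        cache->msg = msg;
        r->routes_updated++;
    }
    return cache->gsi;
}
```

The lookup is a field access on the source's cache, so no list of registered messages has to be searched.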

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 42/45] msix: Introduce msix_init_simple
  2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 11:22     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 11:22 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 11:28:16AM +0200, Jan Kiszka wrote:
> Devices models are usually not interested in specifying MSI-X
> configuration details beyond the number of vectors to provide and the
> BAR number to use. Layout of an exclusively used BAR and its
> registration can also be handled centrally.
> 
> This is the purpose of msix_init_simple. It provides handy services to
> the existing users. Future users like device assignment may require more
> detailed setup specification. For them we will (re-)introduce msix_init
> with the full list of configuration option (in contrast to the current
> code).
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Well, this seems a bit of a code churn then, doesn't it?
We are also discussing using memory BAR for virtio-pci for other
stuff besides MSI-X, so the last user of the _simple variant
will be ivshmem then?

> ---
>  hw/ivshmem.c    |    6 +-----
>  hw/msix.c       |   35 ++++++++++++++---------------------
>  hw/msix.h       |    7 +++----
>  hw/virtio-pci.c |   15 +++++----------
>  hw/virtio-pci.h |    1 -
>  5 files changed, 23 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> index a402c98..d9dbd18 100644
> --- a/hw/ivshmem.c
> +++ b/hw/ivshmem.c
> @@ -65,7 +65,6 @@ typedef struct IVShmemState {
>       */
>      MemoryRegion bar;
>      MemoryRegion ivshmem;
> -    MemoryRegion msix_bar;
>      uint64_t ivshmem_size; /* size of shared memory region */
>      int shm_fd; /* shared memory file descriptor */
>  
> @@ -539,10 +538,7 @@ static void ivshmem_setup_msi(IVShmemState *s)
>  {
>      /* allocate the MSI-X vectors */
>  
> -    memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
> -    if (!msix_init(&s->dev, s->vectors, &s->msix_bar, 1, 0)) {
> -        pci_register_bar(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
> -                         &s->msix_bar);
> +    if (!msix_init_simple(&s->dev, s->vectors, 1)) {
>          IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
>      } else {
>          IVSHMEM_DPRINTF("msix initialization failed\n");
> diff --git a/hw/msix.c b/hw/msix.c
> index bccd8b1..258b9c1 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -244,17 +244,6 @@ static const MemoryRegionOps msix_mmio_ops = {
>      },
>  };
>  
> -static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
> -{
> -    uint8_t *config = d->config + d->msix_cap;
> -    uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
> -    uint32_t offset = table & ~(MSIX_PAGE_SIZE - 1);
> -    /* TODO: for assigned devices, we'll want to make it possible to map
> -     * pending bits separately in case they are in a separate bar. */
> -
> -    memory_region_add_subregion(bar, offset, &d->msix_mmio);
> -}
> -
>  static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>  {
>      int vector;
> @@ -272,11 +261,9 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>      }
>  }
>  
> -/* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
> - * modified, it should be retrieved with msix_bar_size. */
> -int msix_init(struct PCIDevice *dev, unsigned short nentries,
> -              MemoryRegion *bar,
> -              unsigned bar_nr, unsigned bar_size)
> +/* Initialize the MSI-X structures in a single dedicated BAR
> + * and register it. */
> +int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr)
>  {
>      int ret;
>  
> @@ -296,14 +283,16 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>                            "msix", MSIX_PAGE_SIZE);
>  
>      dev->msix_entries_nr = nentries;
> -    ret = msix_add_config(dev, nentries, bar_nr, bar_size);
> +    ret = msix_add_config(dev, nentries, bar_nr, 0);
>      if (ret)
>          goto err_config;
>  
>      dev->msix_cache = g_malloc0(nentries * sizeof *dev->msix_cache);
>  
>      dev->cap_present |= QEMU_PCI_CAP_MSIX;
> -    msix_mmio_setup(dev, bar);
> +
> +    pci_register_bar(dev, bar_nr, PCI_BASE_ADDRESS_SPACE_MEMORY,
> +                     &dev->msix_mmio);
>      return 0;
>  
>  err_config:
> @@ -315,10 +304,10 @@ err_config:
>  }
>  
>  /* Clean up resources for the device. */
> -int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
> +void msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>  {
>      if (!msix_present(dev)) {
> -        return 0;
> +        return;
>      }
>      pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
>      dev->msix_cap = 0;
> @@ -332,7 +321,11 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>      g_free(dev->msix_cache);
>  
>      dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
> -    return 0;
> +}
> +
> +void msix_uninit_simple(PCIDevice *dev)
> +{
> +    msix_uninit(dev, &dev->msix_mmio);
>  }
>  
>  void msix_save(PCIDevice *dev, QEMUFile *f)
> diff --git a/hw/msix.h b/hw/msix.h
> index dfc6087..56e7ba5 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -4,14 +4,13 @@
>  #include "qemu-common.h"
>  #include "pci.h"
>  
> -int msix_init(PCIDevice *pdev, unsigned short nentries,
> -              MemoryRegion *bar,
> -              unsigned bar_nr, unsigned bar_size);
> +int msix_init_simple(PCIDevice *dev, unsigned short nentries, unsigned bar_nr);
>  
>  void msix_write_config(PCIDevice *pci_dev, uint32_t address,
>                         uint32_t old_val, int len);
>  
> -int msix_uninit(PCIDevice *d, MemoryRegion *bar);
> +void msix_uninit(PCIDevice *d, MemoryRegion *bar);
> +void msix_uninit_simple(PCIDevice *d);
>  
>  void msix_save(PCIDevice *dev, QEMUFile *f);
>  void msix_load(PCIDevice *dev, QEMUFile *f);
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index 5004d7d..6fe2b5e 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -713,13 +713,10 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
>      pci_set_word(config + 0x2e, vdev->device_id);
>      config[0x3d] = 1;
>  
> -    memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
> -    if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
> -                                     &proxy->msix_bar, 1, 0)) {
> -        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
> -                         &proxy->msix_bar);
> -    } else
> +    if (vdev->nvectors &&
> +        msix_init_simple(&proxy->pci_dev, vdev->nvectors, 1)) {
>          vdev->nvectors = 0;
> +    }
>  
>      proxy->pci_dev.config_write = virtio_write_config;
>  
> @@ -766,12 +763,10 @@ static int virtio_blk_init_pci(PCIDevice *pci_dev)
>  static int virtio_exit_pci(PCIDevice *pci_dev)
>  {
>      VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
> -    int r;
>  
>      memory_region_destroy(&proxy->bar);
> -    r = msix_uninit(pci_dev, &proxy->msix_bar);
> -    memory_region_destroy(&proxy->msix_bar);
> -    return r;
> +    msix_uninit_simple(pci_dev);
> +    return 0;
>  }
>  
>  static int virtio_blk_exit_pci(PCIDevice *pci_dev)
> diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
> index 14c10f7..5af1c8c 100644
> --- a/hw/virtio-pci.h
> +++ b/hw/virtio-pci.h
> @@ -22,7 +22,6 @@ typedef struct {
>      PCIDevice pci_dev;
>      VirtIODevice *vdev;
>      MemoryRegion bar;
> -    MemoryRegion msix_bar;
>      uint32_t flags;
>      uint32_t class_code;
>      uint32_t nvectors;
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 11:15       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 11:22         ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 11:22 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

On 10/17/2011 01:15 PM, Jan Kiszka wrote:
> On 2011-10-17 12:56, Avi Kivity wrote:
> > On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> >> So far we deliver MSI messages by writing them into the target MMIO
> >> area. This reflects what happens on hardware, but imposes some
> >> limitations on the emulation when introducing KVM in-kernel irqchip
> >> models. For those we will need to track the message origin.
> > 
> > Why do we need to track the message origin?  Emulated interrupt remapping?
>
> The origin holds the routing cache which we need to track if the message
> already has a route (and that without searching long lists) and to
> update that route instead of add another one.

Okay, having read more of the code I understand this better.  The
approach of providing an explicit cache entry, while more intrusive, is
simpler (at least, without std::unordered_map).  However you do need
destructors for the cache to let the core know that it can't reference
it anymore.
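
To illustrate the kind of destructor meant here, a minimal sketch under the assumption that the routing table keeps a back-link to each cache. All Sketch/sketch_ names are hypothetical and do not come from the series.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ROUTES 4

/* Per-source cache; gsi < 0 means no route is registered. */
typedef struct SketchCache {
    int gsi;
} SketchCache;

/* Core routing table with a back-link to the owning cache per slot. */
typedef struct SketchTable {
    SketchCache *owner[SKETCH_MAX_ROUTES];
} SketchTable;

void sketch_cache_init(SketchCache *c)
{
    c->gsi = -1;
}

/* Registers the cache in the first free slot; returns the slot or -1. */
int sketch_table_attach(SketchTable *t, SketchCache *c)
{
    for (int i = 0; i < SKETCH_MAX_ROUTES; i++) {
        if (t->owner[i] == NULL) {
            t->owner[i] = c;
            c->gsi = i;
            return i;
        }
    }
    return -1;
}

/* Destructor: must run before the cache memory goes away, so the table
 * never holds a dangling back-link. Safe to call more than once. */
void sketch_cache_destroy(SketchTable *t, SketchCache *c)
{
    if (c->gsi >= 0) {
        assert(t->owner[c->gsi] == c);
        t->owner[c->gsi] = NULL;
        c->gsi = -1;
    }
}
```

A device teardown path would call sketch_cache_destroy() for each of its caches before freeing them.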


>
> > 
> > 
> > Not sure what the gain is from intercepting the msi just before the
> > stl_phys() vs. in the apic handler.
>
> APIC is x86-specific, MSI is not. I think Xen will also want to make use
> of this hook. I originally though of using it for the KVM in-kernel
> models as well, but I will now establish a callback at APIC-level
> (upstream will look differently from qemu-kvm in this regard).
>

But you still have to handle it in the platform interrupt controller
(or whatever processes msi messages) since you can still DMA there.  So
you don't get away from doing it there anyway.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses
  2011-10-17 11:10     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 11:23       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 11:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On 2011-10-17 13:10, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 11:27:40AM +0200, Jan Kiszka wrote:
>> Only accesses to the MSI-X table must trigger a call to
>> msix_handle_mask_update or a notifier invocation.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Why would msix_mmio_write be called on an access
> outside the table?

Because it handles both the table and the PBA.

> 
>> ---
>>  hw/msix.c |   16 ++++++++++------
>>  1 files changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/msix.c b/hw/msix.c
>> index 2c4de21..33cb716 100644
>> --- a/hw/msix.c
>> +++ b/hw/msix.c
>> @@ -264,18 +264,22 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>>  {
>>      PCIDevice *dev = opaque;
>>      unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
>> -    int vector = offset / PCI_MSIX_ENTRY_SIZE;
>> +    unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
> 
> Why the int/unsigned change? this has no chance to overflow, and using
> unsigned causes signed/unsigned comparison below,
> and unsigned/signed conversion on calls such as msix_is_masked.

Vectors should be unsigned int; this is just one step in that direction
while we are at it. Even if the overflow is practically impossible, this
remains cleaner.

> 
>>      int was_masked = msix_is_masked(dev, vector);
>>      pci_set_long(dev->msix_table_page + offset, val);
>>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
>>          kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
>>      }
> 
> I would say if we need to check the address, check it first thing
> and return if the address is out of a sensible range.

Will do that later when generalizing MSI-X support.

> For example, are you worried about kvm_msix_update calls with
> a sensible mask?

No, that kvm code will die anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread
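
For readers following the table/PBA point above, here is a minimal sketch
of a write handler that only reports a mask change when the access lands
inside the vector table. It is not the hw/msix.c code: the toy_* names,
the page layout and the PBA offset are assumptions made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ENTRY_SIZE   16u    /* bytes per MSI-X table entry */
#define TOY_VECTOR_CTRL  12u    /* offset of the vector control dword */
#define TOY_CTRL_MASKBIT 0x1u
#define TOY_PAGE_SIZE    4096u
#define TOY_PBA_OFFSET   2048u  /* hypothetical: PBA in the upper half */

struct toy_msix {
    uint8_t page[TOY_PAGE_SIZE];  /* vector table followed by the PBA */
    unsigned int entries_nr;      /* number of valid table entries */
    unsigned int mask_updates;    /* counts reported mask changes */
};

static bool toy_is_masked(const struct toy_msix *d, unsigned int vector)
{
    return d->page[vector * TOY_ENTRY_SIZE + TOY_VECTOR_CTRL] &
           TOY_CTRL_MASKBIT;
}

static void toy_mmio_write(struct toy_msix *d, uint32_t addr, uint32_t val)
{
    unsigned int offset = addr & (TOY_PAGE_SIZE - 1) & ~0x3u;
    unsigned int vector = offset / TOY_ENTRY_SIZE;
    /* Only offsets below entries_nr * TOY_ENTRY_SIZE belong to the table. */
    bool in_table = vector < d->entries_nr;
    bool was_masked = in_table && toy_is_masked(d, vector);
    unsigned int i;

    /* Store the dword little-endian, one byte at a time. */
    for (i = 0; i < 4; i++) {
        d->page[offset + i] = (val >> (8 * i)) & 0xff;
    }

    /* Writes to the PBA (or any non-table offset) never count. */
    if (in_table && was_masked != toy_is_masked(d, vector)) {
        d->mask_updates++;
    }
}
```

With four table entries, a write to the control dword of vector 0 bumps
the counter, while the same write at TOY_PBA_OFFSET does not.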

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17 11:19       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 11:25         ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 11:25 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

On 10/17/2011 01:19 PM, Jan Kiszka wrote:
> > IMO this needlessly leaks kvm information into core qemu.  The cache
> > should be completely hidden in kvm code.
> > 
> > I think msi_deliver() can hide the use of the cache completely.  For
> > pre-registered events like kvm's irqfd, you can use something like
> > 
> >   qemu_irq qemu_msi_irq(MSIMessage msg)
> > 
> > for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
> > for kvm, it allocates an irqfd and a permanent entry in the cache and
> > returns a qemu_irq that triggers the irqfd.
>
> See my previous mail: you want to track the life-cycle of an MSI
> source to avoid generating routes for identical sources. A message is
> not a source. Two identical messages can come from different sources. So
> we need a separate data structure for that purpose.
>

Yes, I understand this now.

Just to make sure I understand this completely:  a hash table indexed by
MSIMessage in kvm code would avoid this?  You'd just allocate on demand
when seeing a new MSIMessage and free on an LRU basis, avoiding pinned
entries.

I'm not advocating this (yet), just want to understand the tradeoffs.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 288+ messages in thread
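
To make the alternative discussed above concrete, here is a rough sketch
of routes allocated on demand per message and recycled on an LRU basis.
Everything in it is hypothetical: the ToyMSIMessage field layout, the
two-slot table and the toy_* names are assumptions, and a linear scan
stands in for the hash lookup to keep it short.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyMSIMessage {   /* field layout is an assumption */
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

#define TOY_ROUTES 2

typedef struct ToyRoute {
    ToyMSIMessage msg;
    uint64_t last_used;          /* 0 means the slot is free */
    int gsi;                     /* identifier of the allocated route */
} ToyRoute;

typedef struct ToyRouteTable {
    ToyRoute slot[TOY_ROUTES];
    uint64_t clock;              /* logical time, bumped on every lookup */
    int next_gsi;
} ToyRouteTable;

/* Return the route for msg, allocating one (and evicting the least
 * recently used entry if the table is full) when none exists yet. */
static int toy_route_lookup(ToyRouteTable *t, ToyMSIMessage msg)
{
    ToyRoute *victim = &t->slot[0];
    int i;

    t->clock++;
    for (i = 0; i < TOY_ROUTES; i++) {
        ToyRoute *r = &t->slot[i];

        if (r->last_used && r->msg.address == msg.address &&
            r->msg.data == msg.data) {
            r->last_used = t->clock;
            return r->gsi;
        }
        /* Free slots have last_used == 0, so they are picked first. */
        if (r->last_used < victim->last_used) {
            victim = r;
        }
    }

    victim->msg = msg;
    victim->last_used = t->clock;
    victim->gsi = t->next_gsi++;
    return victim->gsi;
}
```

Note that the key is the message alone, so two sources sending an
identical message would share one entry here. That is the message versus
source distinction raised in the quoted text.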

* Re: [RFC][PATCH 17/45] qemu-kvm: Track MSIRoutingCache in KVM routing table
  2011-10-17 11:13     ` [Qemu-devel] " Avi Kivity
@ 2011-10-17 11:25       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 11:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

On 2011-10-17 13:13, Avi Kivity wrote:
> On 10/17/2011 11:27 AM, Jan Kiszka wrote:
>> Keep a link from the internal KVM routing table to potential MSI routing
>> cache entries. The link is used so far whenever the entry is dropped to
>> invalidate the cache content. It will allow us to build MSI routing
>> entries on demand and flush existing ones on table overflow.
>>
> 
> Does this not require a destructor for MSIRoutingCache?

Yes, kvm_msi_cache_invalidate. Cache providers are responsible for
invalidating used caches before freeing them. That also drops the
reference established here.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread
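
A small sketch of the ownership rule described above: the routing table
entry keeps a back-link to the cache, and the cache provider has to
invalidate before freeing so that the link is dropped. The toy_* types
and functions are invented for illustration, and whether invalidation
also releases the route itself is an assumption here, not something the
thread states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cache;

struct toy_route_entry {
    bool in_use;
    struct toy_cache *cache;        /* back-link to the owner, may be NULL */
};

struct toy_cache {
    struct toy_route_entry *route;  /* NULL when no route is cached */
};

/* Establish the link in both directions. */
static void toy_cache_attach(struct toy_cache *c, struct toy_route_entry *e)
{
    e->in_use = true;
    e->cache = c;
    c->route = e;
}

/* Provider side: must be called before the cache memory is freed. */
static void toy_cache_invalidate(struct toy_cache *c)
{
    if (c->route) {
        c->route->cache = NULL;     /* table no longer references us */
        c->route->in_use = false;   /* assumption: route is released too */
        c->route = NULL;
    }
}

/* Table side: dropping an entry invalidates the cache content. */
static void toy_route_drop(struct toy_route_entry *e)
{
    if (e->cache) {
        e->cache->route = NULL;
    }
    e->cache = NULL;
    e->in_use = false;
}
```

Either side can break the link, and after either call neither structure
holds a pointer to the other.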

* Re: [RFC][PATCH 42/45] msix: Introduce msix_init_simple
  2011-10-17 11:22     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 11:27       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 11:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-17 13:22, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 11:28:16AM +0200, Jan Kiszka wrote:
>> Device models are usually not interested in specifying MSI-X
>> configuration details beyond the number of vectors to provide and the
>> BAR number to use. Layout of an exclusively used BAR and its
>> registration can also be handled centrally.
>>
>> This is the purpose of msix_init_simple. It provides handy services to
>> the existing users. Future users like device assignment may require more
>> detailed setup specification. For them we will (re-)introduce msix_init
>> with the full list of configuration options (in contrast to the current
>> code).
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Well, this seems a bit of a code churn then, doesn't it?
> We are also discussing using memory BAR for virtio-pci for other
> stuff besides MSI-X, so the last user of the _simple variant
> will be ivshmem then?

We will surely see more MSI-X users over time. Not sure if they all
mix their MSI-X BARs with other stuff. But e.g. the e1000 variant I
have here does not. So there should be users in the future.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 11:22         ` [Qemu-devel] " Avi Kivity
@ 2011-10-17 11:29           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 11:29 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Marcelo Tosatti, qemu-devel, kvm, Michael S. Tsirkin

On 2011-10-17 13:22, Avi Kivity wrote:
> On 10/17/2011 01:15 PM, Jan Kiszka wrote:
>> On 2011-10-17 12:56, Avi Kivity wrote:
>>> On 10/17/2011 11:27 AM, Jan Kiszka wrote:
>>>> So far we deliver MSI messages by writing them into the target MMIO
>>>> area. This reflects what happens on hardware, but imposes some
>>>> limitations on the emulation when introducing KVM in-kernel irqchip
>>>> models. For those we will need to track the message origin.
>>>
>>> Why do we need to track the message origin?  Emulated interrupt remapping?
>>
>> The origin holds the routing cache which we need to track if the message
>> already has a route (and that without searching long lists) and to
>> update that route instead of adding another one.
> 
> Okay, having read more of the code I understand this better.  The
> approach of providing an explicit cache entry, while more intrusive, is
> simpler (at least, without std::unordered_map).  However you do need
> destructors for the cache to let the core know that it can't reference
> it anymore.

See my other mail.

> 
> 
>>
>>>
>>>
>>> Not sure what the gain is from intercepting the msi just before the
>>> stl_phys() vs. in the apic handler.
>>
>> APIC is x86-specific, MSI is not. I think Xen will also want to make use
>> of this hook. I originally thought of using it for the KVM in-kernel
>> models as well, but I will now establish a callback at APIC-level
>> (upstream will look different from qemu-kvm in this regard).
>>
> 
> But you still have to handle it in the platform interrupt controller
> (or whatever processes msi messages) since you can still DMA there.  So
> you don't get away from doing it there anyway.

Right, but that's the slow path (which is still handled - on x86 via the
MMIO region the APIC still maintains).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread
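
The hook idea debated above can be pictured roughly as follows: devices
call one notify function, and a replaceable delivery hook decides whether
the message is written to the target MMIO area or handed to an
accelerated path. This is only a sketch. All toy_* names and the
ToyMSIMessage layout are assumptions, and the MMIO write is modelled by a
plain variable.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyMSIMessage {   /* field layout is an assumption */
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

typedef void (*toy_msi_deliver_fn)(const ToyMSIMessage *msg);

static uint32_t toy_mmio_last_data;      /* stands in for the MMIO write */
static unsigned int toy_fast_path_hits;  /* counts hooked deliveries */

/* Default: deliver the way hardware does, by writing the message. */
static void toy_deliver_by_write(const ToyMSIMessage *msg)
{
    toy_mmio_last_data = msg->data;
}

/* Alternative: an in-kernel irqchip (or another backend) takes over. */
static void toy_deliver_in_kernel(const ToyMSIMessage *msg)
{
    (void)msg;
    toy_fast_path_hits++;
}

/* The hook itself; platform code may replace it at init time. */
static toy_msi_deliver_fn toy_msi_deliver = toy_deliver_by_write;

/* What a device model calls to raise an MSI. */
static void toy_msi_notify(uint64_t address, uint32_t data)
{
    ToyMSIMessage msg = { address, data };

    toy_msi_deliver(&msg);
}
```

Because the hook sits at the MSI layer rather than in the interrupt
controller, it is not tied to one architecture. Messages arriving by
other means, such as DMA to the controller's MMIO region, would still
need the controller-side path.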

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17 11:25         ` [Qemu-devel] " Avi Kivity
@ 2011-10-17 11:31           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 11:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

On 2011-10-17 13:25, Avi Kivity wrote:
> On 10/17/2011 01:19 PM, Jan Kiszka wrote:
>>> IMO this needlessly leaks kvm information into core qemu.  The cache
>>> should be completely hidden in kvm code.
>>>
>>> I think msi_deliver() can hide the use of the cache completely.  For
>>> pre-registered events like kvm's irqfd, you can use something like
>>>
>>>   qemu_irq qemu_msi_irq(MSIMessage msg)
>>>
>>> for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
>>> for kvm, it allocates an irqfd and a permanent entry in the cache and
>>> returns a qemu_irq that triggers the irqfd.
>>
>> See my previous mail: you want to track the life-cycle of an MSI
>> source to avoid generating routes for identical sources. A message is
>> not a source. Two identical messages can come from different sources. So
>> we need a separate data structure for that purpose.
>>
> 
> Yes, I understand this now.
> 
> Just to make sure I understand this completely:  a hash table indexed by
> MSIMessage in kvm code would avoid this?  You'd just allocate on demand
> when seeing a new MSIMessage and free on an LRU basis, avoiding pinned
> entries.
> 
> I'm not advocating this (yet), just want to understand the tradeoffs.

Practically, that may work. I just wanted to avoid searching. And for
static routes (irqfd, device assignment) you still need caches anyway, so
I decided to use them consistently.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
  2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 11:40     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 11:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 11:27:57AM +0200, Jan Kiszka wrote:
> MSI config notifiers are supposed to be triggered on every relevant
> configuration change of MSI vectors or if MSI is enabled/disabled.
> 
> Two notifiers are established, one for vector changes and one for general
> enabling. The former notifier additionally passes the currently active
> MSI message.
> This will allow updating potential in-kernel IRQ routes on
> changes. The latter notifier is optional and will only be used by a
> subset of clients.
> 
> These notifiers are currently only available for MSI-X but will be
> extended to legacy MSI as well.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Always passing the message does not seem to make sense: the message is
only valid if the vector is unmasked.
Further, IIRC the spec requires any changes to be done while the
message is masked. So a mask notifier makes more sense to me:
it does the same thing using one notifier that you do
using two notifiers.


> ---
>  hw/msix.c       |  119 +++++++++++++++++++++++++++++++++++++-----------------
>  hw/msix.h       |    6 ++-
>  hw/pci.h        |    8 ++-
>  hw/virtio-pci.c |   24 ++++++------
>  4 files changed, 102 insertions(+), 55 deletions(-)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index 247b255..176bc76 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -219,16 +219,24 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
>  	   dev->msix_table_page[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
>  }
>  
> -static void msix_handle_mask_update(PCIDevice *dev, int vector)
> +static void msix_fire_vector_config_notifier(PCIDevice *dev,
> +                                             unsigned int vector, bool masked)
>  {
> -    bool masked = msix_is_masked(dev, vector);
> +    MSIMessage msg;
>      int ret;
>  
> -    if (dev->msix_mask_notifier) {
> -        ret = dev->msix_mask_notifier(dev, vector,
> -                                      msix_is_masked(dev, vector));
> +    if (dev->msix_vector_config_notifier) {
> +        msix_message_from_vector(dev, vector, &msg);
> +        ret = dev->msix_vector_config_notifier(dev, vector, &msg, masked);
>          assert(ret >= 0);
>      }
> +}
> +
> +static void msix_handle_mask_update(PCIDevice *dev, int vector)
> +{
> +    bool masked = msix_is_masked(dev, vector);
> +
> +    msix_fire_vector_config_notifier(dev, vector, masked);
>      if (!masked && msix_is_pending(dev, vector)) {
>          msix_clr_pending(dev, vector);
>          msix_notify(dev, vector);
> @@ -240,20 +248,27 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
>                         uint32_t old_val, int len)
>  {
>      unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET;
> -    bool was_masked;
> +    bool was_masked, was_enabled, is_enabled;
>      int vector;
>  
>      if (!msix_present(dev) || !range_covers_byte(addr, len, enable_pos)) {
>          return;
>      }
>  
> -    if (!msix_enabled(dev)) {
> +    old_val >>= (enable_pos - addr) * 8;
> +
> +    was_enabled = old_val & MSIX_ENABLE_MASK;
> +    is_enabled = msix_enabled(dev);
> +    if (was_enabled != is_enabled && dev->msix_enable_notifier) {
> +        dev->msix_enable_notifier(dev, is_enabled);
> +    }
> +
> +    if (!is_enabled) {
>          return;
>      }
>  
>      pci_device_deassert_intx(dev);
>  
> -    old_val >>= (enable_pos - addr) * 8;
>      was_masked =
>          (old_val & (MSIX_MASKALL_MASK | MSIX_ENABLE_MASK)) != MSIX_ENABLE_MASK;
>      if (was_masked != msix_function_masked(dev)) {
> @@ -270,15 +285,20 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>      unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
>      unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
>      bool was_masked = msix_is_masked(dev, vector);
> +    bool is_masked;
>  
>      pci_set_long(dev->msix_table_page + offset, val);
>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
>          kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
>      }
>  
> -    if (vector < dev->msix_entries_nr &&
> -        was_masked != msix_is_masked(dev, vector)) {
> -        msix_handle_mask_update(dev, vector);
> +    if (vector < dev->msix_entries_nr) {
> +        is_masked = msix_is_masked(dev, vector);
> +        if (was_masked != is_masked) {
> +            msix_handle_mask_update(dev, vector);
> +        } else {
> +            msix_fire_vector_config_notifier(dev, vector, is_masked);
> +        }
>      }
>  }
>  
> @@ -305,17 +325,17 @@ static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
>  
>  static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>  {
> -    int vector, r;
> +    int vector;
> +
>      for (vector = 0; vector < nentries; ++vector) {
>          unsigned offset =
>              vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
>          bool was_masked = msix_is_masked(dev, vector);
> +
>          dev->msix_table_page[offset] |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> -        if (was_masked != msix_is_masked(dev, vector) &&
> -            dev->msix_mask_notifier) {
> -            r = dev->msix_mask_notifier(dev, vector,
> -                                        msix_is_masked(dev, vector));
> -            assert(r >= 0);
> +
> +        if (!was_masked) {
> +            msix_handle_mask_update(dev, vector);
>          }
>      }
>  }
> @@ -337,7 +357,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>      if (nentries > MSIX_MAX_ENTRIES)
>          return -EINVAL;
>  
> -    dev->msix_mask_notifier = NULL;
>      dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
>                                          sizeof *dev->msix_entry_used);
>  
> @@ -529,36 +548,50 @@ void msix_unuse_all_vectors(PCIDevice *dev)
>  }
>  
>  /* Invoke the notifier if vector entry is used and unmasked. */
> -static int msix_notify_if_unmasked(PCIDevice *dev, unsigned vector, int masked)
> +static int
> +msix_notify_if_unmasked(PCIDevice *dev, unsigned int vector, bool masked)
>  {
> -    assert(dev->msix_mask_notifier);
> +    MSIMessage msg;
> +
> +    assert(dev->msix_vector_config_notifier);
> +
>      if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
>          return 0;
>      }
> -    return dev->msix_mask_notifier(dev, vector, masked);
> +    msix_message_from_vector(dev, vector, &msg);
> +    return dev->msix_vector_config_notifier(dev, vector, &msg, masked);
>  }
>  
> -static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
> +static int
> +msix_set_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
>  {
> -	/* Notifier has been set. Invoke it on unmasked vectors. */
> -	return msix_notify_if_unmasked(dev, vector, 0);
> +    /* Notifier has been set. Invoke it on unmasked vectors. */
> +    return msix_notify_if_unmasked(dev, vector, false);
>  }
>  
> -static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
> +static int
> +msix_unset_config_notifier_for_vector(PCIDevice *dev, unsigned int vector)
>  {
> -	/* Notifier will be unset. Invoke it to mask unmasked entries. */
> -	return msix_notify_if_unmasked(dev, vector, 1);
> +    /* Notifier will be unset. Invoke it to mask unmasked entries. */
> +    return msix_notify_if_unmasked(dev, vector, true);
>  }
>  
> -int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
> +int msix_set_config_notifiers(PCIDevice *dev,
> +                              MSIEnableNotifier enable_notifier,
> +                              MSIVectorConfigNotifier vector_config_notifier)
>  {
>      int r, n;
> -    assert(!dev->msix_mask_notifier);
> -    dev->msix_mask_notifier = f;
> +
> +    dev->msix_enable_notifier = enable_notifier;
> +    dev->msix_vector_config_notifier = vector_config_notifier;
> +
> +    if (enable_notifier && msix_enabled(dev)) {
> +        enable_notifier(dev, true);
> +    }
>      if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
>          (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
>          for (n = 0; n < dev->msix_entries_nr; ++n) {
> -            r = msix_set_mask_notifier_for_vector(dev, n);
> +            r = msix_set_config_notifier_for_vector(dev, n);
>              if (r < 0) {
>                  goto undo;
>              }
> @@ -568,31 +601,41 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
>  
>  undo:
>      while (--n >= 0) {
> -        msix_unset_mask_notifier_for_vector(dev, n);
> +        msix_unset_config_notifier_for_vector(dev, n);
>      }
> -    dev->msix_mask_notifier = NULL;
> +    if (enable_notifier && msix_enabled(dev)) {
> +        enable_notifier(dev, false);
> +    }
> +    dev->msix_enable_notifier = NULL;
> +    dev->msix_vector_config_notifier = NULL;
>      return r;
>  }
>  
> -int msix_unset_mask_notifier(PCIDevice *dev)
> +int msix_unset_config_notifiers(PCIDevice *dev)
>  {
>      int r, n;
> -    assert(dev->msix_mask_notifier);
> +
> +    assert(dev->msix_vector_config_notifier);
> +
>      if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
>          (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
>          for (n = 0; n < dev->msix_entries_nr; ++n) {
> -            r = msix_unset_mask_notifier_for_vector(dev, n);
> +            r = msix_unset_config_notifier_for_vector(dev, n);
>              if (r < 0) {
>                  goto undo;
>              }
>          }
>      }
> -    dev->msix_mask_notifier = NULL;
> +    if (dev->msix_enable_notifier && msix_enabled(dev)) {
> +        dev->msix_enable_notifier(dev, false);
> +    }
> +    dev->msix_enable_notifier = NULL;
> +    dev->msix_vector_config_notifier = NULL;
>      return 0;
>  
>  undo:
>      while (--n >= 0) {
> -        msix_set_mask_notifier_for_vector(dev, n);
> +        msix_set_config_notifier_for_vector(dev, n);
>      }
>      return r;
>  }
> diff --git a/hw/msix.h b/hw/msix.h
> index 685dbe2..978f417 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -29,6 +29,8 @@ void msix_notify(PCIDevice *dev, unsigned vector);
>  
>  void msix_reset(PCIDevice *dev);
>  
> -int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func);
> -int msix_unset_mask_notifier(PCIDevice *dev);
> +int msix_set_config_notifiers(PCIDevice *dev,
> +                              MSIEnableNotifier enable_notifier,
> +                              MSIVectorConfigNotifier vector_config_notifier);
> +int msix_unset_config_notifiers(PCIDevice *dev);
>  #endif
> diff --git a/hw/pci.h b/hw/pci.h
> index 0177df4..4249c6a 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -127,8 +127,9 @@ enum {
>      QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
>  };
>  
> -typedef int (*msix_mask_notifier_func)(PCIDevice *, unsigned vector,
> -				       int masked);
> +typedef void (*MSIEnableNotifier)(PCIDevice *dev, bool enabled);
> +typedef int (*MSIVectorConfigNotifier)(PCIDevice *dev, unsigned int vector,
> +                                       MSIMessage *msg, bool masked);
>  
>  struct PCIDevice {
>      DeviceState qdev;
> @@ -210,7 +211,8 @@ struct PCIDevice {
>       * on the rest of the region. */
>      target_phys_addr_t msix_page_size;
>  
> -    msix_mask_notifier_func msix_mask_notifier;
> +    MSIEnableNotifier msix_enable_notifier;
> +    MSIVectorConfigNotifier msix_vector_config_notifier;
>  };
>  
>  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index ad6a002..6718945 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -520,8 +520,8 @@ static void virtio_pci_guest_notifier_read(void *opaque)
>      }
>  }
>  
> -static int virtio_pci_mask_vq(PCIDevice *dev, unsigned vector,
> -                              VirtQueue *vq, int masked)
> +static int virtio_pci_mask_vq(PCIDevice *dev, unsigned int vector,
> +                              VirtQueue *vq, bool masked)
>  {
>      EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
>      int r = kvm_msi_irqfd_set(&dev->msix_cache[vector],
> @@ -540,8 +540,8 @@ static int virtio_pci_mask_vq(PCIDevice *dev, unsigned vector,
>      return 0;
>  }
>  
> -static int virtio_pci_mask_notifier(PCIDevice *dev, unsigned vector,
> -                                    int masked)
> +static int virtio_pci_msi_vector_config(PCIDevice *dev, unsigned int vector,
> +                                        MSIMessage *msg, bool masked)
>  {
>      VirtIOPCIProxy *proxy = container_of(dev, VirtIOPCIProxy, pci_dev);
>      VirtIODevice *vdev = proxy->vdev;
> @@ -608,11 +608,11 @@ static int virtio_pci_set_guest_notifiers(void *opaque, bool assign)
>      VirtIODevice *vdev = proxy->vdev;
>      int r, n;
>  
> -    /* Must unset mask notifier while guest notifier
> +    /* Must unset vector config notifier while guest notifier
>       * is still assigned */
>      if (!assign) {
> -	    r = msix_unset_mask_notifier(&proxy->pci_dev);
> -            assert(r >= 0);
> +        r = msix_unset_config_notifiers(&proxy->pci_dev);
> +        assert(r >= 0);
>      }
>  
>      for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
> @@ -626,11 +626,11 @@ static int virtio_pci_set_guest_notifiers(void *opaque, bool assign)
>          }
>      }
>  
> -    /* Must set mask notifier after guest notifier
> +    /* Must set vector config notifier after guest notifier
>       * has been assigned */
>      if (assign) {
> -        r = msix_set_mask_notifier(&proxy->pci_dev,
> -                                   virtio_pci_mask_notifier);
> +        r = msix_set_config_notifiers(&proxy->pci_dev, NULL,
> +                                      virtio_pci_msi_vector_config);
>          if (r < 0) {
>              goto assign_error;
>          }
> @@ -645,8 +645,8 @@ assign_error:
>      }
>  
>      if (!assign) {
> -        msix_set_mask_notifier(&proxy->pci_dev,
> -                               virtio_pci_mask_notifier);
> +        msix_set_config_notifiers(&proxy->pci_dev, NULL,
> +                                  virtio_pci_msi_vector_config);
>      }
>      return r;
>  }
> -- 
> 1.7.3.4
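
For readers following the error handling in the quoted
msix_set_config_notifiers()/msix_unset_config_notifiers(), here is a
minimal standalone sketch of the same rollback pattern. The demo_*
names, the fixed vector count and the failure-injection variable are
assumptions for illustration only, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NUM_VECTORS 4          /* hypothetical table size */

static bool demo_routed[DEMO_NUM_VECTORS];
static int demo_fail_at = -1;       /* vector whose unmask fails, -1 = none */

/* Hypothetical per-vector callback: returns < 0 on failure. */
static int demo_vector_cb(unsigned int vector, bool masked)
{
    if (!masked && (int)vector == demo_fail_at) {
        return -1;
    }
    demo_routed[vector] = !masked;
    return 0;
}

/* Announce every vector as unmasked; on failure, re-mask the ones
 * already handled, in reverse order, and report the error. */
static int demo_set_notifier(void)
{
    int r, n;

    for (n = 0; n < DEMO_NUM_VECTORS; ++n) {
        r = demo_vector_cb(n, false);
        if (r < 0) {
            goto undo;
        }
    }
    return 0;

undo:
    /* n is deliberately signed so that the loop ends at -1. */
    while (--n >= 0) {
        demo_vector_cb(n, true);
    }
    return r;
}
```

The signed loop counter matters here: with an unsigned n the
condition --n >= 0 would always be true.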

* Re: [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
  2011-10-17 11:40     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 11:45       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 11:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-17 13:40, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 11:27:57AM +0200, Jan Kiszka wrote:
>> MSI config notifiers are supposed to be triggered on every relevant
>> configuration change of MSI vectors or if MSI is enabled/disabled.
>>
>> Two notifiers are established, one for vector changes and one for general
>> enabling. The former notifier additionally passes the currently active
>> MSI message.
>> This will allow to update potential in-kernel IRQ routes on
>> changes. The latter notifier is optional and will only be used by a
>> subset of clients.
>>
>> These notifiers are currently only available for MSI-X but will be
>> extended to legacy MSI as well.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Passing message, always, does not seem to make sense: message is only
> valid if it is unmasked.

If we go from unmasked to masked, the consumer could just ignore the
message.

> Further, IIRC the spec requires any changes to be done while
> message is masked. So mask notifier makes more sense to me:
> it does the same thing using one notifier that you do
> using two notifiers.

That's in fact a possible optimization (only invoke the callback on mask
transitions). Not sure if that applies to MSI as well, probably not. To
have common types, I would then prefer to keep "vector config notifiers"
as the name.
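
A rough sketch of what "the consumer could just ignore the message"
might look like on the receiving side. The MSIMessage layout is the one
from patch 08/45; DemoRoute, demo_routes and the callback itself are
hypothetical, and the PCIDevice argument of the real
MSIVectorConfigNotifier is left out to keep the example standalone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the MSIMessage introduced in patch 08/45. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

#define DEMO_NUM_VECTORS 8          /* hypothetical */

/* Hypothetical per-vector state a consumer might keep. */
typedef struct DemoRoute {
    bool active;
    MSIMessage msg;
} DemoRoute;

static DemoRoute demo_routes[DEMO_NUM_VECTORS];

/* Shaped like MSIVectorConfigNotifier, minus the PCIDevice pointer. */
static int demo_vector_config(unsigned int vector, MSIMessage *msg,
                              bool masked)
{
    DemoRoute *route;

    if (vector >= DEMO_NUM_VECTORS) {
        return -1;
    }
    route = &demo_routes[vector];

    if (masked) {
        /* Masked: the message may be stale, so it is not read at all. */
        route->active = false;
        return 0;
    }

    /* Unmasked: the message is valid, record it and activate the route. */
    route->msg = *msg;
    route->active = true;
    return 0;
}
```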

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

* Re: [RFC][PATCH 08/45] Introduce MSIMessage structure
  2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 11:46     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 11:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Mon, Oct 17, 2011 at 11:27:42AM +0200, Jan Kiszka wrote:
> Will be used for generating and distributing MSI messages, both in
> emulation mode and under KVM.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

I would add

uint64_t msix_get_address(dev, vector)
uint64_t msix_get_data(dev, vector)

and the same for MSI.

This would minimise the changes while still making it
possible to avoid code duplication in kvm.
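
To make the accessor idea concrete, a standalone sketch follows. It
assumes the standard MSI-X table entry layout (16 bytes per entry:
address low at offset 0, address high at 4, data at 8, vector control
at 12, little-endian) and reads from a plain byte array instead of a
PCIDevice. The demo_* names are illustrative, and the data accessor
returns uint32_t here since the data field is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* MSI-X table entry layout (offsets in bytes). */
#define DEMO_ENTRY_SIZE  16
#define DEMO_LOWER_ADDR  0
#define DEMO_UPPER_ADDR  4
#define DEMO_DATA        8

/* Load a little-endian 32-bit value from a byte buffer. */
static uint32_t demo_ldl_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical accessor: 64-bit message address of one vector. */
static uint64_t demo_msix_get_address(const uint8_t *table,
                                      unsigned int vector)
{
    const uint8_t *entry = table + vector * DEMO_ENTRY_SIZE;

    return ((uint64_t)demo_ldl_le(entry + DEMO_UPPER_ADDR) << 32) |
           demo_ldl_le(entry + DEMO_LOWER_ADDR);
}

/* Hypothetical accessor: 32-bit message data of one vector. */
static uint32_t demo_msix_get_data(const uint8_t *table,
                                   unsigned int vector)
{
    return demo_ldl_le(table + vector * DEMO_ENTRY_SIZE + DEMO_DATA);
}
```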

> ---
>  hw/msi.h      |    5 +++++
>  qemu-common.h |    1 +
>  2 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/msi.h b/hw/msi.h
> index e5e821f..22e3932 100644
> --- a/hw/msi.h
> +++ b/hw/msi.h
> @@ -24,6 +24,11 @@
>  #include "qemu-common.h"
>  #include "pci.h"
>  
> +struct MSIMessage {
> +    uint64_t address;
> +    uint32_t data;
> +};
> +
>  extern bool msi_supported;
>  
>  bool msi_enabled(const PCIDevice *dev);
> diff --git a/qemu-common.h b/qemu-common.h
> index 5e87bdf..d3901bd 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -15,6 +15,7 @@ typedef struct QEMUTimer QEMUTimer;
>  typedef struct QEMUFile QEMUFile;
>  typedef struct QEMUBH QEMUBH;
>  typedef struct DeviceState DeviceState;
> +typedef struct MSIMessage MSIMessage;
>  
>  struct Monitor;
>  typedef struct Monitor Monitor;
> -- 
> 1.7.3.4

* Re: [RFC][PATCH 08/45] Introduce MSIMessage structure
  2011-10-17 11:46     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 11:51       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 11:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On 2011-10-17 13:46, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 11:27:42AM +0200, Jan Kiszka wrote:
>> Will be used for generating and distributing MSI messages, both in
>> emulation mode and under KVM.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> I would add
> 
> uint64_t msix_get_address(dev, vector)
> uint64_t msix_get_data(dev, vector)
> 
> and same for msi.
> 
> this would minimise the changes while still making it
> possible to avoid code duplication in kvm.

I'm introducing msi[x]_message_from_vector for that purpose later on. Or
what do you mean?
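
For comparison with the two-accessor proposal, a single helper that
fills an MSIMessage in one call could look roughly like this. It is
only a sketch: DemoMSIXEntry is a hypothetical host-endian view of one
table entry, and demo_message_from_vector is not the
msi[x]_message_from_vector from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the MSIMessage introduced in patch 08/45. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Hypothetical host-endian view of one MSI-X table entry. */
typedef struct DemoMSIXEntry {
    uint32_t addr_lo;
    uint32_t addr_hi;
    uint32_t data;
    uint32_t ctrl;
} DemoMSIXEntry;

/* Fill msg with address and data of the given vector in one call. */
static void demo_message_from_vector(const DemoMSIXEntry *table,
                                     unsigned int vector, MSIMessage *msg)
{
    const DemoMSIXEntry *entry = &table[vector];

    msg->address = ((uint64_t)entry->addr_hi << 32) | entry->addr_lo;
    msg->data = entry->data;
}
```

Callers that need both fields get them from one call, while callers
that need just one field still pay for reading both.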

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

* Re: [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses
  2011-10-17 11:23       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 11:57         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 11:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 01:23:46PM +0200, Jan Kiszka wrote:
> On 2011-10-17 13:10, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 11:27:40AM +0200, Jan Kiszka wrote:
> >> Only accesses to the MSI-X table must trigger a call to
> >> msix_handle_mask_update or a notifier invocation.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > Why would msix_mmio_write be called on an access
> > outside the table?
> 
> Because it handles both the table and the PBA.

Hmm. Interesting. Is there a bug in how we handle PBA
updates then? If so, I'd like a separate patch for that,
to apply to the stable tree.

BTW, will this code go away if the PBA can be stored separately?
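
To illustrate the table-vs-PBA question: a sketch of the check that
decides whether a write into the shared MSI-X page lands on a table
entry. The offset/vector arithmetic follows the quoted
msix_mmio_write(). The 4 KiB page size, the demo_* names and the
assumption that everything beyond the used table entries (including
the PBA) must not trigger mask handling are mine.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SIZE   0x1000u    /* assumed size of the MSI-X page */
#define DEMO_ENTRY_SIZE  16u        /* bytes per MSI-X table entry */

/* Return true and set *vector if addr falls into one of the first
 * entries_nr table entries; false for anything beyond the table. */
static bool demo_write_hits_table(unsigned long addr,
                                  unsigned int entries_nr,
                                  unsigned int *vector)
{
    unsigned int offset = addr & (DEMO_PAGE_SIZE - 1) & ~0x3u;
    unsigned int v = offset / DEMO_ENTRY_SIZE;

    if (v >= entries_nr) {
        return false;               /* not a table access, e.g. the PBA */
    }
    *vector = v;
    return true;
}
```

Doing this check first thing and returning early would also keep
out-of-range vectors away from every later call in the write handler.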


> > 
> >> ---
> >>  hw/msix.c |   16 ++++++++++------
> >>  1 files changed, 10 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/msix.c b/hw/msix.c
> >> index 2c4de21..33cb716 100644
> >> --- a/hw/msix.c
> >> +++ b/hw/msix.c
> >> @@ -264,18 +264,22 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> >>  {
> >>      PCIDevice *dev = opaque;
> >>      unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> >> -    int vector = offset / PCI_MSIX_ENTRY_SIZE;
> >> +    unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
> > 
> > Why the int/unsigned change? this has no chance to overflow, and using
> > unsigned causes signed/unsigned comparison below,
> > and unsigned/signed conversion on calls such as msix_is_masked.
> 
> Vectors should be unsigned int, this is just one step in that direction

Not sure why, but if you want to rework this we
would be better off doing the conversion in one go. Making half the code
use unsigned and half signed is way worse.

> as we are at it.

Should be a separate patch.

> Even if the overflow is practically impossible, this
> remains cleaner.

I have to say this change, if done throughout, would introduce
a lot of code churn. The potential for introducing bugs
seems higher than the potential for finding or fixing them.
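
A small standalone illustration of the mixed signed/unsigned
comparison concern. The function names are made up. The first helper
spells out the conversion that C applies implicitly when an int is
compared with an unsigned int; the second adds the range check that
avoids the surprise.

```c
#include <assert.h>
#include <stdbool.h>

/* What "vector < entries_nr" amounts to when entries_nr is an int and
 * vector is unsigned: the int operand is converted to unsigned first. */
static bool demo_mixed_less(int entries_nr, unsigned int vector)
{
    return vector < (unsigned int)entries_nr;
}

/* Same comparison with an explicit guard against negative counts. */
static bool demo_checked_less(int entries_nr, unsigned int vector)
{
    return entries_nr > 0 && vector < (unsigned int)entries_nr;
}
```

With a count of -1, the first helper accepts every vector below
UINT_MAX, which is the kind of silent behaviour change a half-done
conversion can introduce.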

> > 
> >>      int was_masked = msix_is_masked(dev, vector);
> >>      pci_set_long(dev->msix_table_page + offset, val);
> >>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
> >>          kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
> >>      }
> > 
> > I would say if we need to check the address, check it first thing
> > and return if the address is out of a sensible range.
> 
> Will do that later when generalized MSI-X support.

But then do we need this patch at all?

> > For example, are you worried about kvm_msix_update calls with
> > a sensible mask?
> 
> No, that kvm code will die anyway.

Yes, but we care about stable too; if there's a bug there,
we need to fix it.

> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

* Re: [Qemu-devel] [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses
@ 2011-10-17 11:57         ` Michael S. Tsirkin
  0 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 11:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Mon, Oct 17, 2011 at 01:23:46PM +0200, Jan Kiszka wrote:
> On 2011-10-17 13:10, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 11:27:40AM +0200, Jan Kiszka wrote:
> >> Only accesses to the MSI-X table must trigger a call to
> >> msix_handle_mask_update or a notifier invocation.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > Why would msix_mmio_write be called on an access
> > outside the table?
> 
> Because it handles both the table and the PBA.

Hmm. Interesting. Is there a bug in how we handle PBA
updates then? If yes I'd like a separate patch for that
to apply to the stable tree.

BTW, this code will go away if PBA can get stored separately?


> > 
> >> ---
> >>  hw/msix.c |   16 ++++++++++------
> >>  1 files changed, 10 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/msix.c b/hw/msix.c
> >> index 2c4de21..33cb716 100644
> >> --- a/hw/msix.c
> >> +++ b/hw/msix.c
> >> @@ -264,18 +264,22 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> >>  {
> >>      PCIDevice *dev = opaque;
> >>      unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> >> -    int vector = offset / PCI_MSIX_ENTRY_SIZE;
> >> +    unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
> > 
> > Why the int/unsigned change? this has no chance to overflow, and using
> > unsigned causes signed/unsigned comparison below,
> > and unsigned/signed conversion on calls such as msix_is_masked.
> 
> Vectors should be unsigned int, this is just one step in that direction

Not sure why, but if you want to rework this we
would be better off doing the conversion in one go. Making half the code
use unsigned and half signed is way worse.

> as we are at it.

Should be a separate patch.

> Even if the overflow is practically impossible, this
> remains cleaner.

I have to say this change, if done throughout, would introduce
a lot of code churn. The potential of introducing bugs
seems higher than the potential to find/fix them.

> > 
> >>      int was_masked = msix_is_masked(dev, vector);
> >>      pci_set_long(dev->msix_table_page + offset, val);
> >>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
> >>          kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
> >>      }
> > 
> > I would say if we need to check the address, check it first thing
> > and return if the address is out of a sensible range.
> 
> > Will do that later when generalizing MSI-X support.

But then do we need this patch at all?

> > For example, are you worried about kvm_msix_update calls with
> > a sensible mask?
> 
> No, that kvm code will die anyway.

Yes but we care about stable too, if there's a bug there
we need to fix it.
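
The "check the address first thing" idea quoted above can be sketched roughly as follows. This is a standalone model, not the hw/msix.c code: ToyMSIXDev, toy_msix_mmio_write and the page layout (table in the first half of one 4 KiB page, PBA in the second half) are assumptions made for illustration. Only the 16-byte entry size and the vector-control offset are taken from the MSI-X table format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE    0x1000u
#define TOY_PBA_OFFSET   (TOY_PAGE_SIZE / 2)  /* assumed PBA placement */
#define TOY_ENTRY_SIZE   16u                  /* MSI-X table entry size */
#define TOY_VECTOR_CTRL  12u                  /* vector control dword */
#define TOY_MASKBIT      0x1u

/* Hypothetical device model: one page holding table and PBA. */
typedef struct ToyMSIXDev {
    uint8_t page[TOY_PAGE_SIZE];
    unsigned entries_nr;     /* number of valid table entries */
    unsigned mask_updates;   /* counts mask transitions we acted on */
} ToyMSIXDev;

static bool toy_is_masked(const ToyMSIXDev *dev, unsigned vector)
{
    uint32_t ctrl;

    memcpy(&ctrl, dev->page + vector * TOY_ENTRY_SIZE + TOY_VECTOR_CTRL,
           sizeof(ctrl));
    return (ctrl & TOY_MASKBIT) != 0;
}

static void toy_msix_mmio_write(ToyMSIXDev *dev, uint32_t addr, uint32_t val)
{
    unsigned offset = addr & (TOY_PAGE_SIZE - 1) & ~0x3u;
    unsigned vector = offset / TOY_ENTRY_SIZE;
    bool was_masked;

    /* Range check first: anything outside the table never yields a vector. */
    if (offset >= TOY_PBA_OFFSET || vector >= dev->entries_nr) {
        memcpy(dev->page + offset, &val, sizeof(val));
        return;
    }

    was_masked = toy_is_masked(dev, vector);
    memcpy(dev->page + offset, &val, sizeof(val));
    if (was_masked != toy_is_masked(dev, vector)) {
        dev->mask_updates++;  /* stands in for the mask update handling */
    }
}
```

With this ordering a write that lands in the PBA half of the page is stored but never reaches the mask handling, so no vector index derived from a PBA offset is ever used.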

> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 08/45] Introduce MSIMessage structure
  2011-10-17 11:51       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 12:04         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 12:04 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 01:51:00PM +0200, Jan Kiszka wrote:
> On 2011-10-17 13:46, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 11:27:42AM +0200, Jan Kiszka wrote:
> >> Will be used for generating and distributing MSI messages, both in
> >> emulation mode and under KVM.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > I would add
> > 
> > uint64_t msix_get_address(dev, vector)
> > uint64_t msix_get_data(dev, vector)
> > 
> > and same for msi.
> > 
> > this would minimise the changes while still making it
> > possible to avoid code duplication in kvm.
> 
> I'm introducing msi[x]_message_from_vector for that purpose later on. Or
> what do you mean?
> 
> Jan

It does not look like everyone actually wants the structure,
users seem to put it on stack and then immediately
unwrap it to get at the address/data.
So two accessors, get_data + get_address, instead of one will
remove the need to rework all code to use the structure.
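
To make the two options concrete, here is a minimal standalone sketch of both styles reading the same MSI-X table entry. All names (ToyMSIMessage, toy_msix_get_address, toy_msix_get_data, toy_msix_message_from_vector) are hypothetical stand-ins, the table is modelled as a plain byte array, and the data accessor returns uint32_t since the message data field is 32 bits wide. Only the entry layout (lower address at 0, upper address at 4, data at 8, 16 bytes per entry) is taken from the MSI-X table format.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ENTRY_SIZE  16u
#define TOY_LOWER_ADDR  0u
#define TOY_UPPER_ADDR  4u
#define TOY_DATA        8u

/* Hypothetical message structure: address and data as one unit. */
typedef struct ToyMSIMessage {
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

/* Read one little-endian dword from a table entry. */
static uint32_t toy_get_long(const uint8_t *table, unsigned vector,
                             unsigned field)
{
    const uint8_t *p = table + vector * TOY_ENTRY_SIZE + field;

    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Accessor style: two independent getters. */
static uint64_t toy_msix_get_address(const uint8_t *table, unsigned vector)
{
    return (uint64_t)toy_get_long(table, vector, TOY_UPPER_ADDR) << 32 |
           toy_get_long(table, vector, TOY_LOWER_ADDR);
}

static uint32_t toy_msix_get_data(const uint8_t *table, unsigned vector)
{
    return toy_get_long(table, vector, TOY_DATA);
}

/* Structure style: one call returning the whole message. */
static ToyMSIMessage toy_msix_message_from_vector(const uint8_t *table,
                                                  unsigned vector)
{
    ToyMSIMessage msg = {
        .address = toy_msix_get_address(table, vector),
        .data = toy_msix_get_data(table, vector),
    };

    return msg;
}
```

Both styles read the same three dwords; the difference is only whether callers receive the pieces separately or as one value they can pass along.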

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses
  2011-10-17 11:57         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 12:07           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 12:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-17 13:57, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 01:23:46PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 13:10, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 11:27:40AM +0200, Jan Kiszka wrote:
>>>> Only accesses to the MSI-X table must trigger a call to
>>>> msix_handle_mask_update or a notifier invocation.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> Why would msix_mmio_write be called on an access
>>> outside the table?
>>
>> Because it handles both the table and the PBA.
> 
> Hmm. Interesting. Is there a bug in how we handle PBA
> updates then? If yes I'd like a separate patch for that
> to apply to the stable tree.

I first thought it was a serious bug, but it only triggers if the guest
writes to the PBA (which is very uncommon) and that write actually causes a
spurious out-of-bounds vector injection. Highly unlikely.

> 
> BTW, this code will go away if PBA can get stored separately?

Hmm - yeah, true. Likely it's moot to discuss this change then.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 08/45] Introduce MSIMessage structure
  2011-10-17 12:04         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 12:09           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 12:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-17 14:04, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 01:51:00PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 13:46, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 11:27:42AM +0200, Jan Kiszka wrote:
>>>> Will be used for generating and distributing MSI messages, both in
>>>> emulation mode and under KVM.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> I would add
>>>
>>> uint64_t msix_get_address(dev, vector)
>>> uint64_t msix_get_data(dev, vector)
>>>
>>> and same for msi.
>>>
>>> this would minimise the changes while still making it
>>> possible to avoid code duplication in kvm.
>>
>> I'm introducing msi[x]_message_from_vector for that purpose later on. Or
>> what do you mean?
>>
>> Jan
> 
> It does not look like everyone actually wants the structure,
> users seem to put it on stack and then immediately
> unwrap it to get at the address/data.
> So two accessors, get_data + get_address, instead of one will
> remove the need to rework all code to use the structure.

The idea of this patch is to start handling MSI messages as a single
blob. There should be no need to ask a device for parts of that blob
this way. If you see use cases in this series, though, let me know.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 11:29           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 12:14             ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 12:14 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin, qemu-devel

On 10/17/2011 01:29 PM, Jan Kiszka wrote:
> >>
> >> APIC is x86-specific, MSI is not. I think Xen will also want to make use
> >> of this hook. I originally thought of using it for the KVM in-kernel
> >> models as well, but I will now establish a callback at APIC-level
> >> (upstream will look different from qemu-kvm in this regard).
> >>
> > 
> > But you still have to handle it in the platform interrupt controller
> > (or whatever processes msi messages) since you can still DMA there.  So
> > you don't get away from doing it there anyway.
>
> Right, but that's the slow path (which is still handled - on x86 via the
> MMIO region the APIC still maintains).
>

It's handled by caching and immediately uncaching the MSIMessage/kvm
route relationship?

Can you post a git tree?  It will be easier for me to understand the
whole thing this way.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 17/45] qemu-kvm: Track MSIRoutingCache in KVM routing table
  2011-10-17 11:25       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 12:15         ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 12:15 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Williamson, Marcelo Tosatti, qemu-devel, kvm, Michael S. Tsirkin

On 10/17/2011 01:25 PM, Jan Kiszka wrote:
> On 2011-10-17 13:13, Avi Kivity wrote:
> > On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> >> Keep a link from the internal KVM routing table to potential MSI routing
> >> cache entries. So far, the link is used to invalidate the cache content
> >> whenever the entry is dropped. It will allow us to build MSI routing
> >> entries on demand and flush existing ones on table overflow.
> >>
> > 
> > Does this not require a destructor for MSIRoutingCache?
>
> Yes, kvm_msi_cache_invalidate. Cache providers are responsible for
> invalidating used caches before freeing them. That also drops the
> reference established here.

Ah, apic.c's cache is static.  It looked to me as if you're immediately
leaking references to it, but you aren't.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes
  2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 12:16     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 12:16 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Mon, Oct 17, 2011 at 11:27:56AM +0200, Jan Kiszka wrote:
> Also invoke the mask notifier if the global MSI-X mask is modified. For
> this purpose, we push the notifier call from the per-vector mask update
> to the central msix_handle_mask_update.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

This is a bugfix, isn't it?
If yes it should be separated and put on -stable.

> ---
>  hw/msix.c |   16 +++++++++-------
>  1 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index 739b56f..247b255 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -221,7 +221,15 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
>  
>  static void msix_handle_mask_update(PCIDevice *dev, int vector)
>  {
> -    if (!msix_is_masked(dev, vector) && msix_is_pending(dev, vector)) {
> +    bool masked = msix_is_masked(dev, vector);
> +    int ret;
> +
> +    if (dev->msix_mask_notifier) {
> +        ret = dev->msix_mask_notifier(dev, vector,
> +                                      msix_is_masked(dev, vector));

Use 'masked' value here as well?

> +        assert(ret >= 0);
> +    }
> +    if (!masked && msix_is_pending(dev, vector)) {
>          msix_clr_pending(dev, vector);
>          msix_notify(dev, vector);
>      }
> @@ -262,7 +270,6 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>      unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
>      unsigned int vector = offset / PCI_MSIX_ENTRY_SIZE;
>      bool was_masked = msix_is_masked(dev, vector);
> -    int r;
>  
>      pci_set_long(dev->msix_table_page + offset, val);
>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
> @@ -271,11 +278,6 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>  
>      if (vector < dev->msix_entries_nr &&
>          was_masked != msix_is_masked(dev, vector)) {
> -        if (dev->msix_mask_notifier) {
> -            r = dev->msix_mask_notifier(dev, vector,
> -                                        msix_is_masked(dev, vector));
> -            assert(r >= 0);
> -        }
>          msix_handle_mask_update(dev, vector);
>      }
>  }
> -- 
> 1.7.3.4
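
On the "Use 'masked' value here as well?" remark: a minimal standalone sketch of the central update path, with the mask state evaluated once and reused for both the notifier call and the pending check. ToyDev, toy_handle_mask_update, toy_set_global_mask and the recording notifier are hypothetical names for illustration; the real code works on PCIDevice and the MSI-X table page instead of plain arrays.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_VECTORS 8

typedef struct ToyDev ToyDev;
typedef int (*ToyMaskNotifier)(ToyDev *dev, unsigned vector, bool masked);

/* Hypothetical device model with a global mask and per-vector state. */
struct ToyDev {
    bool global_mask;
    bool vector_mask[TOY_MAX_VECTORS];
    bool pending[TOY_MAX_VECTORS];
    unsigned delivered[TOY_MAX_VECTORS];
    ToyMaskNotifier mask_notifier;
};

static bool toy_is_masked(const ToyDev *dev, unsigned vector)
{
    return dev->global_mask || dev->vector_mask[vector];
}

/* Central update: notify first, then deliver a pending vector if unmasked. */
static void toy_handle_mask_update(ToyDev *dev, unsigned vector)
{
    bool masked = toy_is_masked(dev, vector);  /* evaluated exactly once */

    if (dev->mask_notifier) {
        int ret = dev->mask_notifier(dev, vector, masked);

        assert(ret >= 0);
        (void)ret;
    }
    if (!masked && dev->pending[vector]) {
        dev->pending[vector] = false;
        dev->delivered[vector]++;
    }
}

/* A global mask change runs the same central update for every vector. */
static void toy_set_global_mask(ToyDev *dev, bool masked, unsigned nr_vectors)
{
    dev->global_mask = masked;
    for (unsigned v = 0; v < nr_vectors; v++) {
        toy_handle_mask_update(dev, v);
    }
}

/* Sample notifier that only records what it was told. */
static unsigned toy_notify_calls;
static bool toy_last_masked;

static int toy_record_notifier(ToyDev *dev, unsigned vector, bool masked)
{
    (void)dev;
    (void)vector;
    toy_notify_calls++;
    toy_last_masked = masked;
    return 0;
}
```

Because the notifier now lives in the central function, per-vector writes and global mask changes reach it through the same path.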

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17 11:31           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 12:17             ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 12:17 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Williamson, Marcelo Tosatti, qemu-devel, kvm, Michael S. Tsirkin

On 10/17/2011 01:31 PM, Jan Kiszka wrote:
> > 
> > Just to make sure I understand this completely:  a hash table indexed by
> > MSIMessage in kvm code would avoid this?  You'd just allocate on demand
> > when seeing a new MSIMessage and free on an LRU basis, avoiding pinned
> > entries.
> > 
> > I'm not advocating this (yet), just want to understand the tradeoffs.
>
> Practically, that may work. I just wanted to avoid searching. And for
> static routes (irqfd, device assignment) you still need caches anyway, so
> I decided to use them consistently.

Okay.  Even if we do decide to go for transparent caches, it should be
done after this is merged, to avoid excessive churn.
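
For reference, the on-demand alternative discussed above (allocate a route when a new MSIMessage is seen, free on an LRU basis) could look roughly like this. It is only a sketch under stated assumptions: ToyRouteTable and toy_route_lookup are hypothetical, the slot index stands in for whatever the kernel route would be keyed by, and a linear scan over a tiny fixed table replaces the hash lookup to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ROUTES 4

typedef struct ToyMSIMessage {
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

typedef struct ToyRoute {
    bool used;
    ToyMSIMessage msg;
    uint64_t last_use;   /* logical timestamp of the last lookup hit */
} ToyRoute;

typedef struct ToyRouteTable {
    ToyRoute slot[TOY_ROUTES];
    uint64_t clock;
} ToyRouteTable;

/*
 * Return the slot for msg. On a miss, take the first free slot, or evict
 * the least recently used one if the table is full.
 */
static unsigned toy_route_lookup(ToyRouteTable *t, ToyMSIMessage msg)
{
    unsigned victim = 0;

    for (unsigned i = 0; i < TOY_ROUTES; i++) {
        ToyRoute *r = &t->slot[i];

        if (r->used && r->msg.address == msg.address &&
            r->msg.data == msg.data) {
            r->last_use = ++t->clock;
            return i;
        }
        if (!r->used) {
            if (t->slot[victim].used) {
                victim = i;          /* a free slot beats any used one */
            }
        } else if (t->slot[victim].used &&
                   r->last_use < t->slot[victim].last_use) {
            victim = i;              /* older used slot */
        }
    }

    t->slot[victim] = (ToyRoute){ .used = true, .msg = msg,
                                  .last_use = ++t->clock };
    return victim;
}
```

The search on every delivery is exactly the cost the explicit caches avoid, and pinned users such as irqfd would still need a way to exempt their entries from eviction.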

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 00/45] qemu-kvm: MSI layer rework for in-kernel irqchip support
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 12:18   ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 12:18 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Marcelo Tosatti, kvm, Alex Williamson, Michael S. Tsirkin,
	qemu-devel, Alexander Graf, Gerd Hoffmann, Isaku Yamahata

On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> As previously indicated, I was working for quite a while on a major
> refactoring of the MSI "additions" we have in qemu-kvm to support
> in-kernel irqchip, vhost and device assignment. This is now the outcome.
>
> I'm quite happy with it, things are still working (apparently), and the
> invasiveness of KVM hooks into the MSI layer is significantly reduced.
> Moreover, I was able to port the device assignment code over generic MSI
> support, reducing the size of that file a bit further.
>
> Some further highlights:
>  - fix for HPET MSI support with in-kernel irqchip
>  - fully configurable MSI-X (allows 1:1 mapping for assigned devices)
>  - refactored KVM core API for device assignment and IRQ routing
>
> I'm sending the whole series in one chunk so that you can see what the
> result will be. It's RFC as I bet that there are regressions included
> and maybe still room left for improvements. Once all is fine (can be
> broken up into multiple chunks for the merge), I would suggest patching
> qemu-kvm first and then start with porting things over to upstream.

Impressive patchset, let's merge this as quickly as possible.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
  2011-10-17 11:45       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 12:39         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 12:39 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 01:45:04PM +0200, Jan Kiszka wrote:
> On 2011-10-17 13:40, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 11:27:57AM +0200, Jan Kiszka wrote:
> >> MSI config notifiers are supposed to be triggered on every relevant
> >> configuration change of MSI vectors or if MSI is enabled/disabled.
> >>
> >> Two notifiers are established, one for vector changes and one for general
> >> enabling. The former notifier additionally passes the currently active
> >> MSI message.
> >> This will allow updating potential in-kernel IRQ routes on
> >> changes. The latter notifier is optional and will only be used by a
> >> subset of clients.
> >>
> >> These notifiers are currently only available for MSI-X but will be
> >> extended to legacy MSI as well.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > Passing message, always, does not seem to make sense: message is only
> > valid if it is unmasked.
> 
> If we go from unmasked to masked, the consumer could just ignore the
> message.

Why don't we let the consumer get the message if it needs it?

> > Further, IIRC the spec requires any changes to be done while
> > message is masked. So mask notifier makes more sense to me:
> > it does the same thing using one notifier that you do
> > using two notifiers.
> 
> That's in fact a possible optimization (only invoke the callback on mask
> transitions).

Further, it is one that is already implemented.
So I would prefer not to add work by removing it :)

> Not sure if that applies to MSI as well, probably not.

Probably not. However, if per-vector masking is
supported, the address/data values might not make
any sense while a vector is masked.

So I think even msi users need to know about masked state.

> To
> have common types, I would prefer to stay with vector config notifiers
> as name then.
> 
> Jan

So we pass in nonsense values and ask all users to know about MSIX rules.
Ugh.

I do realize msi might change the vector without masking.
We can either artificially call mask before value change
and unmask after, or use 3 notifiers: mask,unmask,config.
Add a comment that config is invoked when configuration
for an unmasked vector is changed, and that
it can only happen for msi, not msix.

This also removes the need to handle enable/disable specially:
you simply pretend all vectors are masked.
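
A rough standalone sketch of the three-notifier variant, to show how enable/disable folds into mask/unmask. Everything here is hypothetical (ToyMSIDev, ToyNotifiers, the toy_* functions and the logging consumer); the callbacks deliberately take only the vector number, on the assumption that a consumer fetches the message itself when it needs it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_VECTORS 4

typedef struct ToyMSIMessage {
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

typedef struct ToyNotifiers {
    void (*mask)(void *opaque, unsigned vector);
    void (*unmask)(void *opaque, unsigned vector);
    /* Only fired when an unmasked vector is reconfigured (MSI case). */
    void (*config)(void *opaque, unsigned vector);
    void *opaque;
} ToyNotifiers;

typedef struct ToyMSIDev {
    bool enabled;
    bool masked[TOY_NR_VECTORS];
    ToyMSIMessage msg[TOY_NR_VECTORS];
    ToyNotifiers n;
} ToyMSIDev;

/* A disabled device behaves as if every vector were masked. */
static bool toy_effectively_masked(const ToyMSIDev *d, unsigned v)
{
    return !d->enabled || d->masked[v];
}

static void toy_fire_transition(ToyMSIDev *d, unsigned v, bool was_masked)
{
    bool now_masked = toy_effectively_masked(d, v);

    if (was_masked == now_masked) {
        return;
    }
    if (now_masked) {
        d->n.mask(d->n.opaque, v);
    } else {
        d->n.unmask(d->n.opaque, v);
    }
}

static void toy_set_vector_mask(ToyMSIDev *d, unsigned v, bool masked)
{
    bool was_masked = toy_effectively_masked(d, v);

    d->masked[v] = masked;
    toy_fire_transition(d, v, was_masked);
}

static void toy_set_enabled(ToyMSIDev *d, bool enabled)
{
    bool was_masked[TOY_NR_VECTORS];

    for (unsigned v = 0; v < TOY_NR_VECTORS; v++) {
        was_masked[v] = toy_effectively_masked(d, v);
    }
    d->enabled = enabled;
    for (unsigned v = 0; v < TOY_NR_VECTORS; v++) {
        toy_fire_transition(d, v, was_masked[v]);
    }
}

static void toy_set_message(ToyMSIDev *d, unsigned v, ToyMSIMessage msg)
{
    d->msg[v] = msg;
    if (!toy_effectively_masked(d, v)) {
        d->n.config(d->n.opaque, v);
    }
}

/* Sample consumer that just counts events. */
typedef struct ToyLog {
    unsigned masks, unmasks, configs;
} ToyLog;

static void toy_log_mask(void *o, unsigned v) { (void)v; ((ToyLog *)o)->masks++; }
static void toy_log_unmask(void *o, unsigned v) { (void)v; ((ToyLog *)o)->unmasks++; }
static void toy_log_config(void *o, unsigned v) { (void)v; ((ToyLog *)o)->configs++; }
```

In this model a consumer never sees a message for a masked vector, and there is no separate enable notifier: enabling shows up as unmask events, disabling as mask events.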


> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
@ 2011-10-17 12:39         ` Michael S. Tsirkin
  0 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 12:39 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Mon, Oct 17, 2011 at 01:45:04PM +0200, Jan Kiszka wrote:
> On 2011-10-17 13:40, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 11:27:57AM +0200, Jan Kiszka wrote:
> >> MSI config notifiers are supposed to be triggered on every relevant
> >> configuration change of MSI vectors or if MSI is enabled/disabled.
> >>
> >> Two notifiers are established, one for vector changes and one for general
> >> enabling. The former notifier additionally passes the currently active
> >> MSI message.
> >> This will allow to update potential in-kernel IRQ routes on
> >> changes. The latter notifier is optional and will only be used by a
> >> subset of clients.
> >>
> >> These notifiers are currently only available for MSI-X but will be
> >> extended to legacy MSI as well.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > Passing message, always, does not seem to make sense: message is only
> > valid if it is unmasked.
> 
> If we go from unmasked to masked, the consumer could just ignore the
> message.

Why don't we let the consumer get the message if it needs it?

> > Further, IIRC the spec requires any changes to be done while
> > message is masked. So mask notifier makes more sense to me:
> > it does the same thing using one notifier that you do
> > using two notifiers.
> 
> That's in fact a possible optimization (only invoke the callback on mask
> transitions).

Further, it is one that is already implemented.
So I would prefer not to add work by removing it :)

> Not sure if that applies to MSI as well, probably not.

Probably not. However, if per vector masking is
supported, and while vector is masked, the address/
data values might not make any sense.

So I think even msi users needs to know about masked state.

> To
> have common types, I would prefer to stay with vector config notifiers
> as name then.
> 
> Jan

So we pass in nonsense values and ask all users to know about MSIX rules.
Ugh.

I do realize msi might change the vector without masking.
We can either artificially call mask before value change
and unmask after, or use 3 notifiers: mask,unmask,config.
Add a comment that config is invoked when configuration
for an unmasked vector is changed, and that
it can only happen for msi, not msix.

This also removes the need to handle enable/disable specially:
you simply pretend all vectors are masked.
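
To make the three-notifier variant concrete, here is a rough sketch.
All names (ToyMsg, ToyNotifiers, ToyVector, toy_*) are made up for
illustration and are not the msix.c API. It assumes that a disabled
function is simply treated as "all vectors masked", as suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not the QEMU MSI API. */
typedef struct ToyMsg {
    uint64_t address;
    uint32_t data;
} ToyMsg;

typedef struct ToyNotifiers {
    void (*mask)(void *opaque, unsigned vector);
    void (*unmask)(void *opaque, unsigned vector, const ToyMsg *msg);
    /* Message changed while the vector stayed unmasked (MSI only). */
    void (*config)(void *opaque, unsigned vector, const ToyMsg *msg);
    void *opaque;
} ToyNotifiers;

typedef struct ToyVector {
    ToyMsg msg;
    bool mask_bit;   /* per-vector mask bit as written by the guest */
    bool enabled;    /* function-wide MSI/MSI-X enable */
} ToyVector;

/* A disabled function is treated as if every vector were masked. */
static bool toy_is_masked(const ToyVector *v)
{
    return !v->enabled || v->mask_bit;
}

/* Apply new mask/enable state; notify only on effective transitions. */
static void toy_update_state(ToyVector *v, unsigned idx, bool mask_bit,
                             bool enabled, const ToyNotifiers *n)
{
    bool was_masked = toy_is_masked(v);

    v->mask_bit = mask_bit;
    v->enabled = enabled;
    if (was_masked == toy_is_masked(v)) {
        return;
    }
    if (toy_is_masked(v)) {
        n->mask(n->opaque, idx);
    } else {
        n->unmask(n->opaque, idx, &v->msg);
    }
}

/* Store a new message; consumers only hear about it if it is live. */
static void toy_update_msg(ToyVector *v, unsigned idx, ToyMsg msg,
                           const ToyNotifiers *n)
{
    v->msg = msg;
    if (!toy_is_masked(v)) {
        n->config(n->opaque, idx, &v->msg);
    }
}

/* Example consumer that just counts callbacks. */
typedef struct ToyCounts {
    unsigned mask, unmask, config;
    ToyMsg last;
} ToyCounts;

static void count_mask(void *opaque, unsigned vector)
{
    (void)vector;
    ((ToyCounts *)opaque)->mask++;
}

static void count_unmask(void *opaque, unsigned vector, const ToyMsg *msg)
{
    ToyCounts *c = opaque;

    (void)vector;
    assert(msg != NULL);   /* unmask always carries a valid message */
    c->unmask++;
    c->last = *msg;
}

static void count_config(void *opaque, unsigned vector, const ToyMsg *msg)
{
    ToyCounts *c = opaque;

    (void)vector;
    c->config++;
    c->last = *msg;
}
```

With this shape a consumer never sees a message for a masked vector,
and enable/disable needs no notifier of its own.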


> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses
  2011-10-17 12:07           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 12:50             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 12:50 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 02:07:10PM +0200, Jan Kiszka wrote:
> On 2011-10-17 13:57, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 01:23:46PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 13:10, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 11:27:40AM +0200, Jan Kiszka wrote:
> >>>> Only accesses to the MSI-X table must trigger a call to
> >>>> msix_handle_mask_update or a notifier invocation.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>
> >>> Why would msix_mmio_write be called on an access
> >>> outside the table?
> >>
> >> Because it handles both the table and the PBA.
> > 
> > Hmm. Interesting. Is there a bug in how we handle PBA
> > updates then? If yes I'd like a separate patch for that
> > to apply to the stable tree.
> 
> I first thought it was a serious bug, but it only triggers if the guest
> writes to the PBA (which is very uncommon) and that write actually triggers a
> spurious out-of-bounds vector injection. Highly unlikely.

Yes, guests don't really use the PBA at the moment. But is there something
bad a malicious guest can do? For example, what if
msix_clr_pending gets invoked with this huge vector value?

It does seem serious ...
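
For illustration, this is the kind of check I have in mind. It is a toy
model, not msix.c: the toy_* names, the 4-vector table and the layout
(table at offset 0, PBA somewhere above it) are assumptions. Only the
16-byte entry size and the vector control word at offset 12 come from
the PCI spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ENTRY_SIZE   16u  /* bytes per MSI-X table entry (PCI spec) */
#define TOY_VECTOR_CTRL  12u  /* offset of the vector control word */
#define TOY_NR_VECTORS    4u  /* assumed table size for this sketch */

typedef struct ToyMsix {
    uint32_t pending;        /* one pending bit per vector */
    unsigned mask_updates;   /* how often mask handling ran */
    unsigned delivered;      /* pending vectors flushed on unmask */
} ToyMsix;

/* Map an MMIO offset to a vector; false for the PBA or anything else
 * outside the table. */
static bool toy_offset_to_vector(uint32_t offset, unsigned *vector)
{
    unsigned v = offset / TOY_ENTRY_SIZE;

    if (v >= TOY_NR_VECTORS) {
        return false;
    }
    *vector = v;
    return true;
}

static void toy_clr_pending(ToyMsix *m, unsigned vector)
{
    assert(vector < TOY_NR_VECTORS);  /* callers must have validated */
    m->pending &= ~(1u << vector);
}

static void toy_mmio_write(ToyMsix *m, uint32_t offset, uint32_t val)
{
    unsigned vector;

    if (!toy_offset_to_vector(offset, &vector)) {
        return;  /* PBA or out-of-range write: no mask handling */
    }
    if (offset % TOY_ENTRY_SIZE != TOY_VECTOR_CTRL) {
        return;  /* address/data write, mask state unchanged */
    }
    m->mask_updates++;
    /* Bit 0 clear means unmasked: flush a pending interrupt, if any. */
    if (!(val & 1u) && (m->pending & (1u << vector))) {
        toy_clr_pending(m, vector);
        m->delivered++;
    }
}
```

The point is that the vector index is validated before anything that
indexes per-vector state gets to see it.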


> > 
> > BTW, this code will go away if PBA can get stored separately?
> 
> Hmm - yeah, true. Likely it's moot to discuss this change then.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 08/45] Introduce MSIMessage structure
  2011-10-17 12:09           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 13:01             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 13:01 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Mon, Oct 17, 2011 at 02:09:46PM +0200, Jan Kiszka wrote:
> On 2011-10-17 14:04, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 01:51:00PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 13:46, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 11:27:42AM +0200, Jan Kiszka wrote:
> >>>> Will be used for generating and distributing MSI messages, both in
> >>>> emulation mode and under KVM.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>
> >>> I would add
> >>>
> >>> uint64_t msix_get_address(dev, vector)
> >>> uint64_t msix_get_data(dev, vector)
> >>>
> >>> and same for msi.
> >>>
> >>> this would minimise the changes while still making it
> >>> possible to avoid code duplication in kvm.
> >>
> >> I'm introducing msi[x]_message_from_vector for that purpose later on. Or
> >> what do you mean?
> >>
> >> Jan
> > 
> > It does not look like everyone actually wants the structure,
> > users seem to put it on stack and then immediately
> > unwrap it to get at the address/data.
> > So two accessors, get_data + get_address, instead of one will
> > remove the need to rework all code to use the structure.
> 
> The idea of this patch is to start handling MSI messages as a single
> blob. There should be no need to ask a device for parts of that blob
> this way.

There should be no need to look at the message at all.
Devices really only care about vector numbers.
So we are left with msix.c, msi.c and kvm as the only users.
kvm has a cache of messages so it needs a struct of these;
msix/msi don't.

> If you see use cases in this series, though, let me know.
> 
> Jan

Yes, I see them. msix_notify is one example. msi_notify is another.

E.g. msi_notify would IMO look nicer as:
    stl_le_phys(msi_get_address(dev, vector), msi_get_data(dev, vector));
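
A rough sketch of such accessors, shown for MSI-X since its table
layout is simple (address at offset 0, upper address at 4, data at 8,
little endian, per the PCI spec). The toy_* names and the flat byte
array standing in for the table are made up; the real accessors would
take the PCIDevice.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ENTRY_SIZE  16u  /* bytes per MSI-X table entry */
#define TOY_ADDR_LO      0u  /* message address, low 32 bits */
#define TOY_ADDR_HI      4u  /* message address, high 32 bits */
#define TOY_DATA         8u  /* message data */

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t toy_ldl_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t toy_msix_get_address(const uint8_t *table, unsigned nr,
                                     unsigned vector)
{
    const uint8_t *entry;

    assert(vector < nr);
    entry = table + vector * TOY_ENTRY_SIZE;
    return ((uint64_t)toy_ldl_le(entry + TOY_ADDR_HI) << 32) |
           toy_ldl_le(entry + TOY_ADDR_LO);
}

static uint32_t toy_msix_get_data(const uint8_t *table, unsigned nr,
                                  unsigned vector)
{
    assert(vector < nr);
    return toy_ldl_le(table + vector * TOY_ENTRY_SIZE + TOY_DATA);
}
```

The notify path then needs no temporary structure; only a user that
caches messages (kvm) has to keep a struct around.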



> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 11:15       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 13:41         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 13:41 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 01:15:56PM +0200, Jan Kiszka wrote:
> On 2011-10-17 12:56, Avi Kivity wrote:
> > On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> >> So far we deliver MSI messages by writing them into the target MMIO
> >> area. This reflects what happens on hardware, but imposes some
> >> limitations on the emulation when introducing KVM in-kernel irqchip
> >> models. For those we will need to track the message origin.
> > 
> > Why do we need to track the message origin?  Emulated interrupt remapping?
> 
> The origin holds the routing cache which we need to track if the message
> already has a route (and that without searching long lists) and to
> update that route instead of adding another one.

Hmm, yes, but if the device does stl_phys or something like this,
it won't work with irqchip, will it? And it should, ideally.

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 13:41         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 13:41           ` Avi Kivity
  -1 siblings, 0 replies; 288+ messages in thread
From: Avi Kivity @ 2011-10-17 13:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 10/17/2011 03:41 PM, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 01:15:56PM +0200, Jan Kiszka wrote:
> > On 2011-10-17 12:56, Avi Kivity wrote:
> > > On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> > >> So far we deliver MSI messages by writing them into the target MMIO
> > >> area. This reflects what happens on hardware, but imposes some
> > >> limitations on the emulation when introducing KVM in-kernel irqchip
> > >> models. For those we will need to track the message origin.
> > > 
> > > Why do we need to track the message origin?  Emulated interrupt remapping?
> > 
> > The origin holds the routing cache which we need to track if the message
> > already has a route (and that without searching long lists) and to
> > update that route instead of adding another one.
>
> Hmm, yes, but if the device does stl_phys or something like this,
> it won't work with irqchip, will it? And it should, ideally.

Why not? It will fall back to the APIC path and use the local routing
cache entry there.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 13:43     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 13:43 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
> diff --git a/hw/msi.c b/hw/msi.c
> index 3c7ebc3..9055155 100644
> --- a/hw/msi.c
> +++ b/hw/msi.c
> @@ -40,6 +40,14 @@
>  /* Flag for interrupt controller to declare MSI/MSI-X support */
>  bool msi_supported;
>  
> +static void msi_unsupported(MSIMessage *msg)
> +{
> +    /* If we get here, the board failed to register a delivery handler. */
> +    abort();
> +}
> +
> +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
> +

How about we set this to NULL, and check it instead of the bool
flag?
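
Roughly like this (toy names, not the actual patch): the hook doubles
as the capability flag, so device init can refuse MSI up front instead
of aborting at delivery time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MSIMessage. */
typedef struct ToyMSIMessage {
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

/* NULL until the board registers a delivery handler. */
static void (*toy_msi_deliver)(ToyMSIMessage *msg);

/* Replaces the separate bool flag. */
static bool toy_msi_supported(void)
{
    return toy_msi_deliver != NULL;
}

/* Device init path: fail cleanly when nobody can deliver MSIs. */
static int toy_msi_init(void)
{
    return toy_msi_supported() ? 0 : -ENOTSUP;
}

static void toy_msi_notify(ToyMSIMessage *msg)
{
    assert(toy_msi_deliver != NULL);  /* init already checked this */
    toy_msi_deliver(msg);
}

/* Example board handler: remembers the last message it was given. */
static ToyMSIMessage toy_last;

static void toy_board_deliver(ToyMSIMessage *msg)
{
    toy_last = *msg;
}
```

A board would assign its handler (here toy_board_deliver) to the hook
during machine init, before any device calls the init function.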

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 13:41           ` [Qemu-devel] " Avi Kivity
@ 2011-10-17 13:48             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 13:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, Alex Williamson, Marcelo Tosatti, qemu-devel, kvm

On Mon, Oct 17, 2011 at 03:41:44PM +0200, Avi Kivity wrote:
> On 10/17/2011 03:41 PM, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 01:15:56PM +0200, Jan Kiszka wrote:
> > > On 2011-10-17 12:56, Avi Kivity wrote:
> > > > On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> > > >> So far we deliver MSI messages by writing them into the target MMIO
> > > >> area. This reflects what happens on hardware, but imposes some
> > > >> limitations on the emulation when introducing KVM in-kernel irqchip
> > > >> models. For those we will need to track the message origin.
> > > > 
> > > > Why do we need to track the message origin?  Emulated interrupt remapping?
> > > 
> > > The origin holds the routing cache which we need to track if the message
> > > already has a route (and that without searching long lists) and to
> > > update that route instead of adding another one.
> >
> > Hmm, yes, but if the device does stl_phys or something like this,
> > it won't work with irqchip, will it? And it should, ideally.
> 
> Why not?  it will fall back to the apic path, and use the local routing
> cache entry there.

Does it still work with irqchip enabled? I didn't realize ...

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 42/45] msix: Introduce msix_init_simple
  2011-10-17 11:27       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 14:28         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 14:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 01:27:31PM +0200, Jan Kiszka wrote:
> On 2011-10-17 13:22, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 11:28:16AM +0200, Jan Kiszka wrote:
> >> Device models are usually not interested in specifying MSI-X
> >> configuration details beyond the number of vectors to provide and the
> >> BAR number to use. Layout of an exclusively used BAR and its
> >> registration can also be handled centrally.
> >>
> >> This is the purpose of msix_init_simple. It provides handy services to
> >> the existing users. Future users like device assignment may require more
> >> detailed setup specification. For them we will (re-)introduce msix_init
> >> with the full list of configuration options (in contrast to the current
> >> code).
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > Well, this seems a bit of a code churn then, doesn't it?
> > We are also discussing using memory BAR for virtio-pci for other
> > stuff besides MSI-X, so the last user of the _simple variant
> > will be ivshmem then?
> 
> We will surely see more MSI-X users over time. Not sure if they all
> mix their MSI-X BARs with other stuff. But e.g. the e1000 variant I
> have here does not. So there should be users in the future.
> 
> Jan

The question is, how hard is it to pass in the BAR and the offset?

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17 11:19       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 15:37         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 15:37 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 01:19:56PM +0200, Jan Kiszka wrote:
> On 2011-10-17 13:06, Avi Kivity wrote:
> > On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> >> This cache will help us implement KVM in-kernel irqchip support
> >> without spreading hooks all over the place.
> >>
> >> KVM requires us to register it first and then deliver it by raising a
> >> pseudo IRQ line returned on registration. While this could be changed
> >> for QEMU-originated MSI messages by adding direct MSI injection, we will
> >> still need this translation for irqfd-originated messages. The
> >> MSIRoutingCache will allow tracking those registrations and updating them
> >> lazily before the actual delivery. This avoids having to track MSI
> >> vectors at device level (like qemu-kvm currently does).
> >>
> >>
> >> +typedef enum {
> >> +    MSI_ROUTE_NONE = 0,
> >> +    MSI_ROUTE_STATIC,
> >> +} MSIRouteType;
> >> +
> >> +struct MSIRoutingCache {
> >> +    MSIMessage msg;
> >> +    MSIRouteType type;
> >> +    int kvm_gsi;
> >> +    int kvm_irqfd;
> >> +};
> >> +
> >> diff --git a/hw/pci.h b/hw/pci.h
> >> index 329ab32..5b5d2fd 100644
> >> --- a/hw/pci.h
> >> +++ b/hw/pci.h
> >> @@ -197,6 +197,10 @@ struct PCIDevice {
> >>      MemoryRegion rom;
> >>      uint32_t rom_bar;
> >>  
> >> +    /* MSI routing chaches */
> >> +    MSIRoutingCache *msi_cache;
> >> +    MSIRoutingCache *msix_cache;
> >> +
> >>      /* MSI entries */
> >>      int msi_entries_nr;
> >>      struct KVMMsiMessage *msi_irq_entries;
> > 
> > IMO this needlessly leaks kvm information into core qemu.  The cache
> > should be completely hidden in kvm code.
> > 
> > I think msi_deliver() can hide the use of the cache completely.  For
> > pre-registered events like kvm's irqfd, you can use something like
> > 
> >   qemu_irq qemu_msi_irq(MSIMessage msg)
> > 
> > for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
> > for kvm, it allocates an irqfd and a permanent entry in the cache and
> > returns a qemu_irq that triggers the irqfd.
> 
> See my previous mail: you want to track the life-cycle of an MSI
> source to avoid generating routes for identical sources. A message is
> not a source. Two identical messages can come from different sources.

Since MSI messages are edge triggered, I don't see how this
would work without losing interrupts. And AFAIK,
existing guests do not use the same message for
different sources.

> So
> we need a separate data structure for that purpose.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 15:43     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 15:43 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 11:27:46AM +0200, Jan Kiszka wrote:
> This cache will help us implement KVM in-kernel irqchip support
> without spreading hooks all over the place.
> 
> KVM requires us to register it first and then deliver it by raising a
> pseudo IRQ line returned on registration. While this could be changed
> for QEMU-originated MSI messages by adding direct MSI injection, we will
> still need this translation for irqfd-originated messages. The
> MSIRoutingCache will allow tracking those registrations and updating them
> lazily before the actual delivery. This avoids having to track MSI
> vectors at device level (like qemu-kvm currently does).
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

So if many devices are added, exhausting the number of GSIs supported,
we get terrible performance instead of simply failing outright.

To me, this looks more like a bug than a feature ...


> ---
>  hw/apic.c     |    5 +++--
>  hw/apic.h     |    2 +-
>  hw/msi.c      |   10 +++++++---
>  hw/msi.h      |   14 +++++++++++++-
>  hw/msix.c     |    7 ++++++-
>  hw/pc.c       |    4 ++--
>  hw/pci.h      |    4 ++++
>  qemu-common.h |    1 +
>  8 files changed, 37 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/apic.c b/hw/apic.c
> index c1d557d..6811ae1 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -804,7 +804,7 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
>      return val;
>  }
>  
> -void apic_deliver_msi(MSIMessage *msg)
> +void apic_deliver_msi(MSIMessage *msg, MSIRoutingCache *cache)
>  {
>      uint8_t dest =
>          (msg->address & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
> @@ -829,8 +829,9 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
>           * Mapping them on the global bus happens to work because
>           * MSI registers are reserved in APIC MMIO and vice versa. */
>          MSIMessage msg = { .address = addr, .data = val };
> +        static MSIRoutingCache cache;
>  
> -        msi_deliver(&msg);
> +        msi_deliver(&msg, &cache);
>          return;
>      }
>  
> diff --git a/hw/apic.h b/hw/apic.h
> index fa848fd..353ea3a 100644
> --- a/hw/apic.h
> +++ b/hw/apic.h
> @@ -18,7 +18,7 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
>  uint8_t cpu_get_apic_tpr(DeviceState *s);
>  void apic_init_reset(DeviceState *s);
>  void apic_sipi(DeviceState *s);
> -void apic_deliver_msi(MSIMessage *msg);
> +void apic_deliver_msi(MSIMessage *msg, MSIRoutingCache *cache);
>  
>  /* pc.c */
>  int cpu_is_bsp(CPUState *env);
> diff --git a/hw/msi.c b/hw/msi.c
> index 9055155..c8ccb17 100644
> --- a/hw/msi.c
> +++ b/hw/msi.c
> @@ -40,13 +40,13 @@
>  /* Flag for interrupt controller to declare MSI/MSI-X support */
>  bool msi_supported;
>  
> -static void msi_unsupported(MSIMessage *msg)
> +static void msi_unsupported(MSIMessage *msg, MSIRoutingCache *cache)
>  {
>      /* If we get here, the board failed to register a delivery handler. */
>      abort();
>  }
>  
> -void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
> +void (*msi_deliver)(MSIMessage *msg, MSIRoutingCache *cache) = msi_unsupported;
>  
>  /* If we get rid of cap allocator, we won't need this. */
>  static inline uint8_t msi_cap_sizeof(uint16_t flags)
> @@ -288,6 +288,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>                       0xffffffff >> (PCI_MSI_VECTORS_MAX - nr_vectors));
>      }
>  
> +    dev->msi_cache = g_malloc0(nr_vectors * sizeof(*dev->msi_cache));
> +
>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
>          dev->msi_irq_entries = g_malloc(nr_vectors *
>                                          sizeof(*dev->msix_irq_entries));
> @@ -312,6 +314,8 @@ void msi_uninit(struct PCIDevice *dev)
>          g_free(dev->msi_irq_entries);
>      }
>  
> +    g_free(dev->msi_cache);
> +
>      pci_del_capability(dev, PCI_CAP_ID_MSI, cap_size);
>      dev->cap_present &= ~QEMU_PCI_CAP_MSI;
>  
> @@ -389,7 +393,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
>                     "notify vector 0x%x"
>                     " address: 0x%"PRIx64" data: 0x%"PRIx32"\n",
>                     vector, msg.address, msg.data);
> -    msi_deliver(&msg);
> +    msi_deliver(&msg, &dev->msi_cache[vector]);
>  }
>  
>  /* Normally called by pci_default_write_config(). */
> diff --git a/hw/msi.h b/hw/msi.h
> index f3152f3..20ae215 100644
> --- a/hw/msi.h
> +++ b/hw/msi.h
> @@ -29,6 +29,18 @@ struct MSIMessage {
>      uint32_t data;
>  };
>  
> +typedef enum {
> +    MSI_ROUTE_NONE = 0,
> +    MSI_ROUTE_STATIC,
> +} MSIRouteType;
> +
> +struct MSIRoutingCache {
> +    MSIMessage msg;
> +    MSIRouteType type;
> +    int kvm_gsi;
> +    int kvm_irqfd;
> +};
> +
>  extern bool msi_supported;
>  
>  bool msi_enabled(const PCIDevice *dev);
> @@ -46,6 +58,6 @@ static inline bool msi_present(const PCIDevice *dev)
>      return dev->cap_present & QEMU_PCI_CAP_MSI;
>  }
>  
> -extern void (*msi_deliver)(MSIMessage *msg);
> +extern void (*msi_deliver)(MSIMessage *msg, MSIRoutingCache *cache);
>  
>  #endif /* QEMU_MSI_H */
> diff --git a/hw/msix.c b/hw/msix.c
> index 08cc526..e824aef 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -358,6 +358,8 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>      if (ret)
>          goto err_config;
>  
> +    dev->msix_cache = g_malloc0(nentries * sizeof *dev->msix_cache);
> +
>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
>          dev->msix_irq_entries = g_malloc(nentries *
>                                           sizeof *dev->msix_irq_entries);
> @@ -409,6 +411,9 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>      dev->msix_entry_used = NULL;
>      g_free(dev->msix_irq_entries);
>      dev->msix_irq_entries = NULL;
> +
> +    g_free(dev->msix_cache);
> +
>      dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
>      return 0;
>  }
> @@ -478,7 +483,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>  
>      msix_message_from_vector(dev, vector, &msg);
>  
> -    msi_deliver(&msg);
> +    msi_deliver(&msg, &dev->msix_cache[vector]);
>  }
>  
>  void msix_reset(PCIDevice *dev)
> diff --git a/hw/pc.c b/hw/pc.c
> index 7d29a4a..4d8b524 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -103,10 +103,10 @@ void isa_irq_handler(void *opaque, int n, int level)
>          qemu_set_irq(isa->ioapic[n], level);
>  };
>  
> -static void pc_msi_deliver(MSIMessage *msg)
> +static void pc_msi_deliver(MSIMessage *msg, MSIRoutingCache *cache)
>  {
>      if ((msg->address & 0xfff00000) == MSI_ADDR_BASE) {
> -        apic_deliver_msi(msg);
> +        apic_deliver_msi(msg, cache);
>      } else {
>          stl_phys(msg->address, msg->data);
>      }
> diff --git a/hw/pci.h b/hw/pci.h
> index 329ab32..5b5d2fd 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -197,6 +197,10 @@ struct PCIDevice {
>      MemoryRegion rom;
>      uint32_t rom_bar;
>  
> +    /* MSI routing caches */
> +    MSIRoutingCache *msi_cache;
> +    MSIRoutingCache *msix_cache;
> +
>      /* MSI entries */
>      int msi_entries_nr;
>      struct KVMMsiMessage *msi_irq_entries;
> diff --git a/qemu-common.h b/qemu-common.h
> index d3901bd..c1d1614 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -16,6 +16,7 @@ typedef struct QEMUFile QEMUFile;
>  typedef struct QEMUBH QEMUBH;
>  typedef struct DeviceState DeviceState;
>  typedef struct MSIMessage MSIMessage;
> +typedef struct MSIRoutingCache MSIRoutingCache;
>  
>  struct Monitor;
>  typedef struct Monitor Monitor;
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
@ 2011-10-17 15:43     ` Michael S. Tsirkin
  0 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 15:43 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Mon, Oct 17, 2011 at 11:27:46AM +0200, Jan Kiszka wrote:
> This cache will help us implement KVM in-kernel irqchip support
> without spreading hooks all over the place.
> 
> KVM requires us to register an MSI message first and then deliver it by
> raising a pseudo IRQ line returned on registration. While this could be
> changed for QEMU-originated MSI messages by adding direct MSI injection,
> we will still need this translation for irqfd-originated messages. The
> MSIRoutingCache will allow us to track those registrations and update
> them lazily before the actual delivery. This avoids having to track MSI
> vectors at device level (like qemu-kvm currently does).
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

So if many devices are added, exhausting the number of GSIs supported,
we get terrible performance instead of simply failing outright.

To me, this looks more like a bug than a feature ...


> ---
>  hw/apic.c     |    5 +++--
>  hw/apic.h     |    2 +-
>  hw/msi.c      |   10 +++++++---
>  hw/msi.h      |   14 +++++++++++++-
>  hw/msix.c     |    7 ++++++-
>  hw/pc.c       |    4 ++--
>  hw/pci.h      |    4 ++++
>  qemu-common.h |    1 +
>  8 files changed, 37 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/apic.c b/hw/apic.c
> index c1d557d..6811ae1 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -804,7 +804,7 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
>      return val;
>  }
>  
> -void apic_deliver_msi(MSIMessage *msg)
> +void apic_deliver_msi(MSIMessage *msg, MSIRoutingCache *cache)
>  {
>      uint8_t dest =
>          (msg->address & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
> @@ -829,8 +829,9 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
>           * Mapping them on the global bus happens to work because
>           * MSI registers are reserved in APIC MMIO and vice versa. */
>          MSIMessage msg = { .address = addr, .data = val };
> +        static MSIRoutingCache cache;
>  
> -        msi_deliver(&msg);
> +        msi_deliver(&msg, &cache);
>          return;
>      }
>  
> diff --git a/hw/apic.h b/hw/apic.h
> index fa848fd..353ea3a 100644
> --- a/hw/apic.h
> +++ b/hw/apic.h
> @@ -18,7 +18,7 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
>  uint8_t cpu_get_apic_tpr(DeviceState *s);
>  void apic_init_reset(DeviceState *s);
>  void apic_sipi(DeviceState *s);
> -void apic_deliver_msi(MSIMessage *msg);
> +void apic_deliver_msi(MSIMessage *msg, MSIRoutingCache *cache);
>  
>  /* pc.c */
>  int cpu_is_bsp(CPUState *env);
> diff --git a/hw/msi.c b/hw/msi.c
> index 9055155..c8ccb17 100644
> --- a/hw/msi.c
> +++ b/hw/msi.c
> @@ -40,13 +40,13 @@
>  /* Flag for interrupt controller to declare MSI/MSI-X support */
>  bool msi_supported;
>  
> -static void msi_unsupported(MSIMessage *msg)
> +static void msi_unsupported(MSIMessage *msg, MSIRoutingCache *cache)
>  {
>      /* If we get here, the board failed to register a delivery handler. */
>      abort();
>  }
>  
> -void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
> +void (*msi_deliver)(MSIMessage *msg, MSIRoutingCache *cache) = msi_unsupported;
>  
>  /* If we get rid of cap allocator, we won't need this. */
>  static inline uint8_t msi_cap_sizeof(uint16_t flags)
> @@ -288,6 +288,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>                       0xffffffff >> (PCI_MSI_VECTORS_MAX - nr_vectors));
>      }
>  
> +    dev->msi_cache = g_malloc0(nr_vectors * sizeof(*dev->msi_cache));
> +
>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
>          dev->msi_irq_entries = g_malloc(nr_vectors *
>                                          sizeof(*dev->msix_irq_entries));
> @@ -312,6 +314,8 @@ void msi_uninit(struct PCIDevice *dev)
>          g_free(dev->msi_irq_entries);
>      }
>  
> +    g_free(dev->msi_cache);
> +
>      pci_del_capability(dev, PCI_CAP_ID_MSI, cap_size);
>      dev->cap_present &= ~QEMU_PCI_CAP_MSI;
>  
> @@ -389,7 +393,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
>                     "notify vector 0x%x"
>                     " address: 0x%"PRIx64" data: 0x%"PRIx32"\n",
>                     vector, msg.address, msg.data);
> -    msi_deliver(&msg);
> +    msi_deliver(&msg, &dev->msi_cache[vector]);
>  }
>  
>  /* Normally called by pci_default_write_config(). */
> diff --git a/hw/msi.h b/hw/msi.h
> index f3152f3..20ae215 100644
> --- a/hw/msi.h
> +++ b/hw/msi.h
> @@ -29,6 +29,18 @@ struct MSIMessage {
>      uint32_t data;
>  };
>  
> +typedef enum {
> +    MSI_ROUTE_NONE = 0,
> +    MSI_ROUTE_STATIC,
> +} MSIRouteType;
> +
> +struct MSIRoutingCache {
> +    MSIMessage msg;
> +    MSIRouteType type;
> +    int kvm_gsi;
> +    int kvm_irqfd;
> +};
> +
>  extern bool msi_supported;
>  
>  bool msi_enabled(const PCIDevice *dev);
> @@ -46,6 +58,6 @@ static inline bool msi_present(const PCIDevice *dev)
>      return dev->cap_present & QEMU_PCI_CAP_MSI;
>  }
>  
> -extern void (*msi_deliver)(MSIMessage *msg);
> +extern void (*msi_deliver)(MSIMessage *msg, MSIRoutingCache *cache);
>  
>  #endif /* QEMU_MSI_H */
> diff --git a/hw/msix.c b/hw/msix.c
> index 08cc526..e824aef 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -358,6 +358,8 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>      if (ret)
>          goto err_config;
>  
> +    dev->msix_cache = g_malloc0(nentries * sizeof *dev->msix_cache);
> +
>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
>          dev->msix_irq_entries = g_malloc(nentries *
>                                           sizeof *dev->msix_irq_entries);
> @@ -409,6 +411,9 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>      dev->msix_entry_used = NULL;
>      g_free(dev->msix_irq_entries);
>      dev->msix_irq_entries = NULL;
> +
> +    g_free(dev->msix_cache);
> +
>      dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
>      return 0;
>  }
> @@ -478,7 +483,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>  
>      msix_message_from_vector(dev, vector, &msg);
>  
> -    msi_deliver(&msg);
> +    msi_deliver(&msg, &dev->msix_cache[vector]);
>  }
>  
>  void msix_reset(PCIDevice *dev)
> diff --git a/hw/pc.c b/hw/pc.c
> index 7d29a4a..4d8b524 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -103,10 +103,10 @@ void isa_irq_handler(void *opaque, int n, int level)
>          qemu_set_irq(isa->ioapic[n], level);
>  };
>  
> -static void pc_msi_deliver(MSIMessage *msg)
> +static void pc_msi_deliver(MSIMessage *msg, MSIRoutingCache *cache)
>  {
>      if ((msg->address & 0xfff00000) == MSI_ADDR_BASE) {
> -        apic_deliver_msi(msg);
> +        apic_deliver_msi(msg, cache);
>      } else {
>          stl_phys(msg->address, msg->data);
>      }
> diff --git a/hw/pci.h b/hw/pci.h
> index 329ab32..5b5d2fd 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -197,6 +197,10 @@ struct PCIDevice {
>      MemoryRegion rom;
>      uint32_t rom_bar;
>  
> +    /* MSI routing caches */
> +    MSIRoutingCache *msi_cache;
> +    MSIRoutingCache *msix_cache;
> +
>      /* MSI entries */
>      int msi_entries_nr;
>      struct KVMMsiMessage *msi_irq_entries;
> diff --git a/qemu-common.h b/qemu-common.h
> index d3901bd..c1d1614 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -16,6 +16,7 @@ typedef struct QEMUFile QEMUFile;
>  typedef struct QEMUBH QEMUBH;
>  typedef struct DeviceState DeviceState;
>  typedef struct MSIMessage MSIMessage;
> +typedef struct MSIRoutingCache MSIRoutingCache;
>  
>  struct Monitor;
>  typedef struct Monitor Monitor;
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 15:48     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 15:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
> This optimization was only required to keep KVM route usage low. Now
> that we solve that problem via lazy updates, we can drop the field. We
> still need interfaces to clear pending vectors, though (and we have to
> make use of them more broadly - but that's unrelated to this patch).
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Lazy updates should be an implementation detail.
IMO resource tracking of vectors makes sense
as an API. Making devices deal with pending
vectors as a concept, IMO, does not.

> ---
>  hw/ivshmem.c    |   16 ++-----------
>  hw/msix.c       |   62 +++++++++++-------------------------------------------
>  hw/msix.h       |    5 +--
>  hw/pci.h        |    2 -
>  hw/virtio-pci.c |   20 +++++++----------
>  5 files changed, 26 insertions(+), 79 deletions(-)
> 
> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> index 242fbea..a402c98 100644
> --- a/hw/ivshmem.c
> +++ b/hw/ivshmem.c
> @@ -535,10 +535,8 @@ static uint64_t ivshmem_get_size(IVShmemState * s) {
>      return value;
>  }
>  
> -static void ivshmem_setup_msi(IVShmemState * s) {
> -
> -    int i;
> -
> +static void ivshmem_setup_msi(IVShmemState *s)
> +{
>      /* allocate the MSI-X vectors */
>  
>      memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
> @@ -551,11 +549,6 @@ static void ivshmem_setup_msi(IVShmemState * s) {
>          exit(1);
>      }
>  
> -    /* 'activate' the vectors */
> -    for (i = 0; i < s->vectors; i++) {
> -        msix_vector_use(&s->dev, i);
> -    }
> -
>      /* allocate Qemu char devices for receiving interrupts */
>      s->eventfd_table = g_malloc0(s->vectors * sizeof(EventfdEntry));
>  }
> @@ -581,7 +574,7 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
>      IVSHMEM_DPRINTF("ivshmem_load\n");
>  
>      IVShmemState *proxy = opaque;
> -    int ret, i;
> +    int ret;
>  
>      if (version_id > 0) {
>          return -EINVAL;
> @@ -599,9 +592,6 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
>  
>      if (ivshmem_has_feature(proxy, IVSHMEM_MSI)) {
>          msix_load(&proxy->dev, f);
> -        for (i = 0; i < proxy->vectors; i++) {
> -            msix_vector_use(&proxy->dev, i);
> -        }
>      } else {
>          proxy->intrstatus = qemu_get_be32(f);
>          proxy->intrmask = qemu_get_be32(f);
> diff --git a/hw/msix.c b/hw/msix.c
> index ce3375a..f1b97b5 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -292,9 +292,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>      if (nentries > MSIX_MAX_ENTRIES)
>          return -EINVAL;
>  
> -    dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
> -                                        sizeof *dev->msix_entry_used);
> -
>      dev->msix_table_page = g_malloc0(MSIX_PAGE_SIZE);
>      msix_mask_all(dev, nentries);
>  
> @@ -317,21 +314,9 @@ err_config:
>      memory_region_destroy(&dev->msix_mmio);
>      g_free(dev->msix_table_page);
>      dev->msix_table_page = NULL;
> -    g_free(dev->msix_entry_used);
> -    dev->msix_entry_used = NULL;
>      return ret;
>  }
>  
> -static void msix_free_irq_entries(PCIDevice *dev)
> -{
> -    int vector;
> -
> -    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
> -        dev->msix_entry_used[vector] = 0;
> -        msix_clr_pending(dev, vector);
> -    }
> -}
> -
>  /* Clean up resources for the device. */
>  int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>  {
> @@ -340,14 +325,11 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>      }
>      pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
>      dev->msix_cap = 0;
> -    msix_free_irq_entries(dev);
>      dev->msix_entries_nr = 0;
>      memory_region_del_subregion(bar, &dev->msix_mmio);
>      memory_region_destroy(&dev->msix_mmio);
>      g_free(dev->msix_table_page);
>      dev->msix_table_page = NULL;
> -    g_free(dev->msix_entry_used);
> -    dev->msix_entry_used = NULL;
>  
>      kvm_msix_free(dev);
>      g_free(dev->msix_cache);
> @@ -376,7 +358,6 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
>          return;
>      }
>  
> -    msix_free_irq_entries(dev);
>      qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
>      qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
>  }
> @@ -407,7 +388,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>  {
>      MSIMessage msg;
>  
> -    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
> +    if (vector >= dev->msix_entries_nr)
>          return;
>      if (msix_is_masked(dev, vector)) {
>          msix_set_pending(dev, vector);
> @@ -424,48 +405,31 @@ void msix_reset(PCIDevice *dev)
>      if (!msix_present(dev)) {
>          return;
>      }
> -    msix_free_irq_entries(dev);
> +    msix_clear_all_vectors(dev);
>      dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
>  	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
>      memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE);
>      msix_mask_all(dev, dev->msix_entries_nr);
>  }
>  
> -/* PCI spec suggests that devices make it possible for software to configure
> - * less vectors than supported by the device, but does not specify a standard
> - * mechanism for devices to do so.
> - *
> - * We support this by asking devices to declare vectors software is going to
> - * actually use, and checking this on the notification path. Devices that
> - * don't want to follow the spec suggestion can declare all vectors as used. */
> -
> -/* Mark vector as used. */
> -int msix_vector_use(PCIDevice *dev, unsigned vector)
> +/* Clear pending vector. */
> +void msix_clear_vector(PCIDevice *dev, unsigned vector)
>  {
> -    if (vector >= dev->msix_entries_nr)
> -        return -EINVAL;
> -    ++dev->msix_entry_used[vector];
> -    return 0;
> -}
> -
> -/* Mark vector as unused. */
> -void msix_vector_unuse(PCIDevice *dev, unsigned vector)
> -{
> -    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) {
> -        return;
> -    }
> -    if (--dev->msix_entry_used[vector]) {
> -        return;
> +    if (msix_present(dev) && vector < dev->msix_entries_nr) {
> +        msix_clr_pending(dev, vector);
>      }
> -    msix_clr_pending(dev, vector);
>  }
>  
> -void msix_unuse_all_vectors(PCIDevice *dev)
> +void msix_clear_all_vectors(PCIDevice *dev)
>  {
> +    unsigned int vector;
> +
>      if (!msix_present(dev)) {
>          return;
>      }
> -    msix_free_irq_entries(dev);
> +    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
> +        msix_clr_pending(dev, vector);
> +    }
>  }
>  
>  /* Invoke the notifier if vector entry is used and unmasked. */
> @@ -476,7 +440,7 @@ msix_notify_if_unmasked(PCIDevice *dev, unsigned int vector, bool masked)
>  
>      assert(dev->msix_vector_config_notifier);
>  
> -    if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
> +    if (msix_is_masked(dev, vector)) {
>          return 0;
>      }
>      msix_message_from_vector(dev, vector, &msg);
> diff --git a/hw/msix.h b/hw/msix.h
> index 978f417..9cd54cf 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -21,9 +21,8 @@ int msix_present(PCIDevice *dev);
>  
>  uint32_t msix_bar_size(PCIDevice *dev);
>  
> -int msix_vector_use(PCIDevice *dev, unsigned vector);
> -void msix_vector_unuse(PCIDevice *dev, unsigned vector);
> -void msix_unuse_all_vectors(PCIDevice *dev);
> +void msix_clear_vector(PCIDevice *dev, unsigned vector);
> +void msix_clear_all_vectors(PCIDevice *dev);
>  
>  void msix_notify(PCIDevice *dev, unsigned vector);
>  
> diff --git a/hw/pci.h b/hw/pci.h
> index d7a652e..5cf9a16 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -178,8 +178,6 @@ struct PCIDevice {
>      uint8_t *msix_table_page;
>      /* MMIO index used to map MSIX table and pending bit entries. */
>      MemoryRegion msix_mmio;
> -    /* Reference-count for entries actually in use by driver. */
> -    unsigned *msix_entry_used;
>      /* Region including the MSI-X table */
>      uint32_t msix_bar_size;
>      /* Version id needed for VMState */
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index 85d6771..5004d7d 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -136,9 +136,6 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f)
>      } else {
>          proxy->vdev->config_vector = VIRTIO_NO_VECTOR;
>      }
> -    if (proxy->vdev->config_vector != VIRTIO_NO_VECTOR) {
> -        return msix_vector_use(&proxy->pci_dev, proxy->vdev->config_vector);
> -    }
>      return 0;
>  }
>  
> @@ -152,9 +149,6 @@ static int virtio_pci_load_queue(void * opaque, int n, QEMUFile *f)
>          vector = VIRTIO_NO_VECTOR;
>      }
>      virtio_queue_set_vector(proxy->vdev, n, vector);
> -    if (vector != VIRTIO_NO_VECTOR) {
> -        return msix_vector_use(&proxy->pci_dev, vector);
> -    }
>      return 0;
>  }
>  
> @@ -304,7 +298,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>          if (pa == 0) {
>              virtio_pci_stop_ioeventfd(proxy);
>              virtio_reset(proxy->vdev);
> -            msix_unuse_all_vectors(&proxy->pci_dev);
> +            msix_clear_all_vectors(&proxy->pci_dev);
>          }
>          else
>              virtio_queue_set_addr(vdev, vdev->queue_sel, pa);
> @@ -331,7 +325,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  
>          if (vdev->status == 0) {
>              virtio_reset(proxy->vdev);
> -            msix_unuse_all_vectors(&proxy->pci_dev);
> +            msix_clear_all_vectors(&proxy->pci_dev);
>          }
>  
>          /* Linux before 2.6.34 sets the device as OK without enabling
> @@ -343,18 +337,20 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>          }
>          break;
>      case VIRTIO_MSI_CONFIG_VECTOR:
> -        msix_vector_unuse(&proxy->pci_dev, vdev->config_vector);
> +        msix_clear_vector(&proxy->pci_dev, vdev->config_vector);
>          /* Make it possible for guest to discover an error took place. */
> -        if (msix_vector_use(&proxy->pci_dev, val) < 0)
> +        if (val >= vdev->nvectors) {
>              val = VIRTIO_NO_VECTOR;
> +        }
>          vdev->config_vector = val;
>          break;
>      case VIRTIO_MSI_QUEUE_VECTOR:
> -        msix_vector_unuse(&proxy->pci_dev,
> +        msix_clear_vector(&proxy->pci_dev,
>                            virtio_queue_vector(vdev, vdev->queue_sel));
>          /* Make it possible for guest to discover an error took place. */
> -        if (msix_vector_use(&proxy->pci_dev, val) < 0)
> +        if (val >= vdev->nvectors) {
>              val = VIRTIO_NO_VECTOR;
> +        }
>          virtio_queue_set_vector(vdev, vdev->queue_sel, val);
>          break;
>      default:
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
@ 2011-10-17 15:48     ` Michael S. Tsirkin
  0 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 15:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
> This optimization was only required to keep KVM route usage low. Now
> that we solve that problem via lazy updates, we can drop the field. We
> still need interfaces to clear pending vectors, though (and we have to
> make use of them more broadly - but that's unrelated to this patch).
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Lazy updates should be an implementation detail.
IMO resource tracking of vectors makes sense
as an API. Making devices deal with pending
vectors as a concept, IMO, does not.

> ---
>  hw/ivshmem.c    |   16 ++-----------
>  hw/msix.c       |   62 +++++++++++-------------------------------------------
>  hw/msix.h       |    5 +--
>  hw/pci.h        |    2 -
>  hw/virtio-pci.c |   20 +++++++----------
>  5 files changed, 26 insertions(+), 79 deletions(-)
> 
> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> index 242fbea..a402c98 100644
> --- a/hw/ivshmem.c
> +++ b/hw/ivshmem.c
> @@ -535,10 +535,8 @@ static uint64_t ivshmem_get_size(IVShmemState * s) {
>      return value;
>  }
>  
> -static void ivshmem_setup_msi(IVShmemState * s) {
> -
> -    int i;
> -
> +static void ivshmem_setup_msi(IVShmemState *s)
> +{
>      /* allocate the MSI-X vectors */
>  
>      memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
> @@ -551,11 +549,6 @@ static void ivshmem_setup_msi(IVShmemState * s) {
>          exit(1);
>      }
>  
> -    /* 'activate' the vectors */
> -    for (i = 0; i < s->vectors; i++) {
> -        msix_vector_use(&s->dev, i);
> -    }
> -
>      /* allocate Qemu char devices for receiving interrupts */
>      s->eventfd_table = g_malloc0(s->vectors * sizeof(EventfdEntry));
>  }
> @@ -581,7 +574,7 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
>      IVSHMEM_DPRINTF("ivshmem_load\n");
>  
>      IVShmemState *proxy = opaque;
> -    int ret, i;
> +    int ret;
>  
>      if (version_id > 0) {
>          return -EINVAL;
> @@ -599,9 +592,6 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
>  
>      if (ivshmem_has_feature(proxy, IVSHMEM_MSI)) {
>          msix_load(&proxy->dev, f);
> -        for (i = 0; i < proxy->vectors; i++) {
> -            msix_vector_use(&proxy->dev, i);
> -        }
>      } else {
>          proxy->intrstatus = qemu_get_be32(f);
>          proxy->intrmask = qemu_get_be32(f);
> diff --git a/hw/msix.c b/hw/msix.c
> index ce3375a..f1b97b5 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -292,9 +292,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>      if (nentries > MSIX_MAX_ENTRIES)
>          return -EINVAL;
>  
> -    dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
> -                                        sizeof *dev->msix_entry_used);
> -
>      dev->msix_table_page = g_malloc0(MSIX_PAGE_SIZE);
>      msix_mask_all(dev, nentries);
>  
> @@ -317,21 +314,9 @@ err_config:
>      memory_region_destroy(&dev->msix_mmio);
>      g_free(dev->msix_table_page);
>      dev->msix_table_page = NULL;
> -    g_free(dev->msix_entry_used);
> -    dev->msix_entry_used = NULL;
>      return ret;
>  }
>  
> -static void msix_free_irq_entries(PCIDevice *dev)
> -{
> -    int vector;
> -
> -    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
> -        dev->msix_entry_used[vector] = 0;
> -        msix_clr_pending(dev, vector);
> -    }
> -}
> -
>  /* Clean up resources for the device. */
>  int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>  {
> @@ -340,14 +325,11 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>      }
>      pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
>      dev->msix_cap = 0;
> -    msix_free_irq_entries(dev);
>      dev->msix_entries_nr = 0;
>      memory_region_del_subregion(bar, &dev->msix_mmio);
>      memory_region_destroy(&dev->msix_mmio);
>      g_free(dev->msix_table_page);
>      dev->msix_table_page = NULL;
> -    g_free(dev->msix_entry_used);
> -    dev->msix_entry_used = NULL;
>  
>      kvm_msix_free(dev);
>      g_free(dev->msix_cache);
> @@ -376,7 +358,6 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
>          return;
>      }
>  
> -    msix_free_irq_entries(dev);
>      qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
>      qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
>  }
> @@ -407,7 +388,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>  {
>      MSIMessage msg;
>  
> -    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
> +    if (vector >= dev->msix_entries_nr)
>          return;
>      if (msix_is_masked(dev, vector)) {
>          msix_set_pending(dev, vector);
> @@ -424,48 +405,31 @@ void msix_reset(PCIDevice *dev)
>      if (!msix_present(dev)) {
>          return;
>      }
> -    msix_free_irq_entries(dev);
> +    msix_clear_all_vectors(dev);
>      dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
>  	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
>      memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE);
>      msix_mask_all(dev, dev->msix_entries_nr);
>  }
>  
> -/* PCI spec suggests that devices make it possible for software to configure
> - * less vectors than supported by the device, but does not specify a standard
> - * mechanism for devices to do so.
> - *
> - * We support this by asking devices to declare vectors software is going to
> - * actually use, and checking this on the notification path. Devices that
> - * don't want to follow the spec suggestion can declare all vectors as used. */
> -
> -/* Mark vector as used. */
> -int msix_vector_use(PCIDevice *dev, unsigned vector)
> +/* Clear pending vector. */
> +void msix_clear_vector(PCIDevice *dev, unsigned vector)
>  {
> -    if (vector >= dev->msix_entries_nr)
> -        return -EINVAL;
> -    ++dev->msix_entry_used[vector];
> -    return 0;
> -}
> -
> -/* Mark vector as unused. */
> -void msix_vector_unuse(PCIDevice *dev, unsigned vector)
> -{
> -    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) {
> -        return;
> -    }
> -    if (--dev->msix_entry_used[vector]) {
> -        return;
> +    if (msix_present(dev) && vector < dev->msix_entries_nr) {
> +        msix_clr_pending(dev, vector);
>      }
> -    msix_clr_pending(dev, vector);
>  }
>  
> -void msix_unuse_all_vectors(PCIDevice *dev)
> +void msix_clear_all_vectors(PCIDevice *dev)
>  {
> +    unsigned int vector;
> +
>      if (!msix_present(dev)) {
>          return;
>      }
> -    msix_free_irq_entries(dev);
> +    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
> +        msix_clr_pending(dev, vector);
> +    }
>  }
>  
>  /* Invoke the notifier if vector entry is used and unmasked. */
> @@ -476,7 +440,7 @@ msix_notify_if_unmasked(PCIDevice *dev, unsigned int vector, bool masked)
>  
>      assert(dev->msix_vector_config_notifier);
>  
> -    if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
> +    if (msix_is_masked(dev, vector)) {
>          return 0;
>      }
>      msix_message_from_vector(dev, vector, &msg);
> diff --git a/hw/msix.h b/hw/msix.h
> index 978f417..9cd54cf 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -21,9 +21,8 @@ int msix_present(PCIDevice *dev);
>  
>  uint32_t msix_bar_size(PCIDevice *dev);
>  
> -int msix_vector_use(PCIDevice *dev, unsigned vector);
> -void msix_vector_unuse(PCIDevice *dev, unsigned vector);
> -void msix_unuse_all_vectors(PCIDevice *dev);
> +void msix_clear_vector(PCIDevice *dev, unsigned vector);
> +void msix_clear_all_vectors(PCIDevice *dev);
>  
>  void msix_notify(PCIDevice *dev, unsigned vector);
>  
> diff --git a/hw/pci.h b/hw/pci.h
> index d7a652e..5cf9a16 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -178,8 +178,6 @@ struct PCIDevice {
>      uint8_t *msix_table_page;
>      /* MMIO index used to map MSIX table and pending bit entries. */
>      MemoryRegion msix_mmio;
> -    /* Reference-count for entries actually in use by driver. */
> -    unsigned *msix_entry_used;
>      /* Region including the MSI-X table */
>      uint32_t msix_bar_size;
>      /* Version id needed for VMState */
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index 85d6771..5004d7d 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -136,9 +136,6 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f)
>      } else {
>          proxy->vdev->config_vector = VIRTIO_NO_VECTOR;
>      }
> -    if (proxy->vdev->config_vector != VIRTIO_NO_VECTOR) {
> -        return msix_vector_use(&proxy->pci_dev, proxy->vdev->config_vector);
> -    }
>      return 0;
>  }
>  
> @@ -152,9 +149,6 @@ static int virtio_pci_load_queue(void * opaque, int n, QEMUFile *f)
>          vector = VIRTIO_NO_VECTOR;
>      }
>      virtio_queue_set_vector(proxy->vdev, n, vector);
> -    if (vector != VIRTIO_NO_VECTOR) {
> -        return msix_vector_use(&proxy->pci_dev, vector);
> -    }
>      return 0;
>  }
>  
> @@ -304,7 +298,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>          if (pa == 0) {
>              virtio_pci_stop_ioeventfd(proxy);
>              virtio_reset(proxy->vdev);
> -            msix_unuse_all_vectors(&proxy->pci_dev);
> +            msix_clear_all_vectors(&proxy->pci_dev);
>          }
>          else
>              virtio_queue_set_addr(vdev, vdev->queue_sel, pa);
> @@ -331,7 +325,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  
>          if (vdev->status == 0) {
>              virtio_reset(proxy->vdev);
> -            msix_unuse_all_vectors(&proxy->pci_dev);
> +            msix_clear_all_vectors(&proxy->pci_dev);
>          }
>  
>          /* Linux before 2.6.34 sets the device as OK without enabling
> @@ -343,18 +337,20 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>          }
>          break;
>      case VIRTIO_MSI_CONFIG_VECTOR:
> -        msix_vector_unuse(&proxy->pci_dev, vdev->config_vector);
> +        msix_clear_vector(&proxy->pci_dev, vdev->config_vector);
>          /* Make it possible for guest to discover an error took place. */
> -        if (msix_vector_use(&proxy->pci_dev, val) < 0)
> +        if (val >= vdev->nvectors) {
>              val = VIRTIO_NO_VECTOR;
> +        }
>          vdev->config_vector = val;
>          break;
>      case VIRTIO_MSI_QUEUE_VECTOR:
> -        msix_vector_unuse(&proxy->pci_dev,
> +        msix_clear_vector(&proxy->pci_dev,
>                            virtio_queue_vector(vdev, vdev->queue_sel));
>          /* Make it possible for guest to discover an error took place. */
> -        if (msix_vector_use(&proxy->pci_dev, val) < 0)
> +        if (val >= vdev->nvectors) {
>              val = VIRTIO_NO_VECTOR;
> +        }
>          virtio_queue_set_vector(vdev, vdev->queue_sel, val);
>          break;
>      default:
> -- 
> 1.7.3.4
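
The vector check that replaces msix_vector_use() in the virtio_ioport_write hunks above can be read in isolation as the following minimal sketch. The helper name virtio_sanitize_vector is hypothetical and not part of the patch; only the comparison against nvectors and the VIRTIO_NO_VECTOR fallback come from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Value the virtio PCI transport uses for "no MSI-X vector assigned". */
#define VIRTIO_NO_VECTOR 0xffff

/*
 * Hypothetical helper (not in the patch): validate a guest-written vector
 * number against the number of vectors the device exposes. An out-of-range
 * value is replaced by VIRTIO_NO_VECTOR so the guest can read the register
 * back and discover that the assignment failed.
 */
static uint16_t virtio_sanitize_vector(uint32_t val, unsigned nvectors)
{
    assert(nvectors <= VIRTIO_NO_VECTOR);
    if (val >= nvectors) {
        return VIRTIO_NO_VECTOR;
    }
    return (uint16_t)val;
}
```

Note that this check only bounds the vector number; unlike the removed msix_vector_use() it cannot report that a backend resource such as a GSI route ran out.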

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 00/45] qemu-kvm: MSI layer rework for in-kernel irqchip support
  2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 15:57   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 15:57 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel,
	Alexander Graf, Gerd Hoffmann, Isaku Yamahata

On Mon, Oct 17, 2011 at 11:27:34AM +0200, Jan Kiszka wrote:
> As previously indicated, I was working for quite a while on a major
> refactoring of the MSI "additions" we have in qemu-kvm to support
> in-kernel irqchip, vhost and device assignment. This is now the outcome.
> 
> I'm quite happy with it, things are still working (apparently), and the
> invasiveness of KVM hooks into the MSI layer is significantly reduced.
> Moreover, I was able to port the device assignment code over generic MSI
> support, reducing the size of that file a bit further.
> 
> Some further highlights:
>  - fix for HPET MSI support with in-kernel irqchip
>  - fully configurable MSI-X (allows 1:1 mapping for assigned devices)
>  - refactored KVM core API for device assignment and IRQ routing
> 
> I'm sending the whole series in one chunk so that you can see what the
> result will be. It's RFC as I bet that there are regressions included
> and maybe still room left for improvements. Once all is fine (can be
> broken up into multiple chunks for the merge), I would suggest patching
> qemu-kvm first and then start with porting things over to upstream.
> 
> Comments & review welcome.
> 
> CC: Alexander Graf <agraf@suse.de>
> CC: Gerd Hoffmann <kraxel@redhat.com>
> CC: Isaku Yamahata <yamahata@valinux.co.jp>


So the scheme where we lazily update kvm gsi table
on interrupts is interesting. There seems to be
very little point in it for virtio, and it does
seem to make it impossible to detect lack of resources
(at the moment we let guest know if we run out of GSIs
and linux guests can fall back to regular interrupts).

I am guessing the idea is to use it for device assignment
where it does make sense as there is no standard way
to track which vectors are actually used?
But how does it work there? kvm does not
propagate unmapped interrupts from an assigned device to qemu, does it?
 

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 12:14             ` [Qemu-devel] " Avi Kivity
@ 2011-10-17 18:59               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 18:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alex Williamson, Marcelo Tosatti, qemu-devel, kvm, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 218 bytes --]

On 2011-10-17 14:14, Avi Kivity wrote:
> Can you post a git tree?  It will be easier for me to understand the
> whole thing this way.

Pushed current state to git://git.kiszka.org/qemu-kvm.git queues/msi

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes
  2011-10-17 12:16     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:00       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1256 bytes --]

On 2011-10-17 14:16, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 11:27:56AM +0200, Jan Kiszka wrote:
>> Also invoke the mask notifier if the global MSI-X mask is modified. For
>> this purpose, we push the notifier call from the per-vector mask update
>> to the central msix_handle_mask_update.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This is a bugfix, isn't it?
> If yes it should be separated and put on -stable.

Yep, will pull this to the front.

> 
>> ---
>>  hw/msix.c |   16 +++++++++-------
>>  1 files changed, 9 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/msix.c b/hw/msix.c
>> index 739b56f..247b255 100644
>> --- a/hw/msix.c
>> +++ b/hw/msix.c
>> @@ -221,7 +221,15 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
>>  
>>  static void msix_handle_mask_update(PCIDevice *dev, int vector)
>>  {
>> -    if (!msix_is_masked(dev, vector) && msix_is_pending(dev, vector)) {
>> +    bool masked = msix_is_masked(dev, vector);
>> +    int ret;
>> +
>> +    if (dev->msix_mask_notifier) {
>> +        ret = dev->msix_mask_notifier(dev, vector,
>> +                                      msix_is_masked(dev, vector));
> 
> Use 'masked' value here as well?

Yes.

Jan
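
A minimal sketch of the agreed change, assuming a toy device model: the mask state is evaluated once and the same value is handed to the notifier and used for the pending check. All Toy* names and the struct layout are hypothetical stand-ins for PCIDevice and the msix_* helpers; this is not the code from the series.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_VECTORS 4

typedef struct ToyMSIXDev ToyMSIXDev;
typedef int (*ToyMaskNotifier)(ToyMSIXDev *dev, unsigned vector, bool masked);

/* Hypothetical stand-in for the MSI-X related state of a PCIDevice. */
struct ToyMSIXDev {
    bool masked[TOY_NR_VECTORS];    /* effective mask state per vector */
    bool pending[TOY_NR_VECTORS];   /* pending bit per vector */
    unsigned fired[TOY_NR_VECTORS]; /* number of delivered interrupts */
    int last_masked;                /* last value seen by the notifier */
    ToyMaskNotifier mask_notifier;
};

/* Example notifier that only records the mask state it was given. */
static int toy_record_notifier(ToyMSIXDev *dev, unsigned vector, bool masked)
{
    (void)vector;
    dev->last_masked = masked;
    return 0;
}

static void toy_handle_mask_update(ToyMSIXDev *dev, unsigned vector)
{
    bool masked;

    assert(vector < TOY_NR_VECTORS);
    masked = dev->masked[vector];   /* evaluate the mask state once */

    if (dev->mask_notifier) {
        int ret = dev->mask_notifier(dev, vector, masked);
        assert(ret >= 0);
    }
    /* Deliver an interrupt that was latched while the vector was masked. */
    if (!masked && dev->pending[vector]) {
        dev->pending[vector] = false;
        dev->fired[vector]++;
    }
}
```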


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
  2011-10-17 12:39         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:08           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3021 bytes --]

On 2011-10-17 14:39, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 01:45:04PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 13:40, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 11:27:57AM +0200, Jan Kiszka wrote:
>>>> MSI config notifiers are supposed to be triggered on every relevant
>>>> configuration change of MSI vectors or if MSI is enabled/disabled.
>>>>
>>>> Two notifiers are established, one for vector changes and one for general
>>>> enabling. The former notifier additionally passes the currently active
>>>> MSI message.
>>>> This will allow to update potential in-kernel IRQ routes on
>>>> changes. The latter notifier is optional and will only be used by a
>>>> subset of clients.
>>>>
>>>> These notifiers are currently only available for MSI-X but will be
>>>> extended to legacy MSI as well.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> Passing message, always, does not seem to make sense: message is only
>>> valid if it is unmasked.
>>
>> If we go from unmasked to masked, the consumer could just ignore the
>> message.
> 
> Why don't we let the consumer get the message if it needs it?

Because most consumers will need it, and because I want to keep the API simple.

> 
>>> Further, IIRC the spec requires any changes to be done while
>>> message is masked. So mask notifier makes more sense to me:
>>> it does the same thing using one notifier that you do
>>> using two notifiers.
>>
>> That's in fact a possible optimization (only invoke the callback on mask
>> transitions).
> 
> Further, it is one that is already implemented.
> So I would prefer not to add work by removing it :)

Generalization to cover MSI requires some changes. Unneeded behavioral
changes back and forth should and will of course be avoided. I will
rework this.

> 
>> Not sure if that applies to MSI as well, probably not.
> 
> Probably not. However, if per vector masking is
> supported, and while vector is masked, the address/
> data values might not make any sense.
> 
> So I think even msi users need to know about masked state.

Yes, and they get this information via the config notifier.

> 
>> To
>> have common types, I would prefer to stay with vector config notifiers
>> as name then.
>>
>> Jan
> 
> So we pass in nonsense values and ask all users to know about MSIX rules.
> Ugh.
> 
> I do realize msi might change the vector without masking.
> We can either artificially call mask before value change
> and unmask after, or use 3 notifiers: mask,unmask,config.
> Add a comment that config is invoked when configuration
> for an unmasked vector is changed, and that
> it can only happen for msi, not msix.

I see no need to complicate the API like this. MSI-X still needs the
config information on unmask, so let's just consistently pass it via the
unified config notifier instead of forcing the consumers to create yet
two more handlers. I really do not see the benefit for the consumer.

Jan
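
To illustrate the single-notifier position, here is a rough sketch of how a consumer might handle one unified callback that always receives the message plus the masked flag. The Toy* types, the route table and the callback signature are assumptions made for this example and are not taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message type: address/data pair of one MSI vector. */
typedef struct {
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

#define TOY_NR_ROUTES 4

/* Hypothetical consumer state, e.g. a cache of in-kernel IRQ routes. */
typedef struct {
    bool active[TOY_NR_ROUTES];
    ToyMSIMessage msg[TOY_NR_ROUTES];
} ToyRouteTable;

/*
 * One handler covers both transitions: on mask, the message is ignored and
 * the route is disabled; on unmask (or on a change while unmasked), the
 * route is updated from the message that was passed in.
 */
static int toy_vector_config_notifier(ToyRouteTable *rt, unsigned vector,
                                      const ToyMSIMessage *msg, bool masked)
{
    if (vector >= TOY_NR_ROUTES) {
        return -1;
    }
    if (masked) {
        rt->active[vector] = false;
        return 0;
    }
    assert(msg != NULL);
    rt->msg[vector] = *msg;
    rt->active[vector] = true;
    return 0;
}
```

The objection raised in the thread still applies to this shape: on the masked path the message argument carries no guaranteed meaning, so every consumer has to know it must not look at it.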


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses
  2011-10-17 12:50             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:11               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1447 bytes --]

On 2011-10-17 14:50, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 02:07:10PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 13:57, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 01:23:46PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-17 13:10, Michael S. Tsirkin wrote:
>>>>> On Mon, Oct 17, 2011 at 11:27:40AM +0200, Jan Kiszka wrote:
>>>>>> Only accesses to the MSI-X table must trigger a call to
>>>>>> msix_handle_mask_update or a notifier invocation.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>
>>>>> Why would msix_mmio_write be called on an access
>>>>> outside the table?
>>>>
>>>> Because it handles both the table and the PBA.
>>>
>>> Hmm. Interesting. Is there a bug in how we handle PBA
>>> updates then? If yes I'd like a separate patch for that
>>> to apply to the stable tree.
>>
>> I first thought it was a serious bug, but it only triggers if the guest
>> writes to the PBA (which is very uncommon) and that write actually triggers a
>> spurious out-of-bounds vector injection. Highly unlikely.
> 
> Yes guests don't really use PBA ATM. But is there something
> bad a malicious guest can do? For example, what if
> msix_clr_pending gets invoked with this huge vector value?
> 
> It does seem serious ...

I checked it before and I think it is harmless. The largest vector that
can be miscalculated is 255. But bit 255 in the PBA is still safe inside
our MMIO page.

Jan
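
The bounds argument can be checked with a little arithmetic. The sketch below assumes a 4 KiB MMIO page with the PBA starting at the half-page offset; both constants and all toy_* names are assumptions for illustration, not values confirmed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: one 4 KiB MMIO page, PBA in its upper half. */
#define TOY_MSIX_PAGE_SIZE    0x1000u
#define TOY_MSIX_PAGE_PENDING (TOY_MSIX_PAGE_SIZE / 2)

/* Byte offset inside the page that holds the pending bit of a vector. */
static uint32_t toy_pba_byte_offset(unsigned vector)
{
    uint32_t off = TOY_MSIX_PAGE_PENDING + vector / 8;

    assert(off < TOY_MSIX_PAGE_SIZE);
    return off;
}

/* Bit mask of the vector within that byte. */
static uint8_t toy_pba_bit_mask(unsigned vector)
{
    return (uint8_t)(1u << (vector % 8));
}
```

Under these assumptions vector 255 maps to byte 0x800 + 31 = 0x81f, bit 7, which is well inside the page, so clearing that bit touches no memory outside the MMIO page.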


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 08/45] Introduce MSIMessage structure
  2011-10-17 13:01             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:14               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2233 bytes --]

On 2011-10-17 15:01, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 02:09:46PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 14:04, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 01:51:00PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-17 13:46, Michael S. Tsirkin wrote:
>>>>> On Mon, Oct 17, 2011 at 11:27:42AM +0200, Jan Kiszka wrote:
>>>>>> Will be used for generating and distributing MSI messages, both in
>>>>>> emulation mode and under KVM.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>
>>>>> I would add
>>>>>
>>>>> uint64_t msix_get_address(dev, vector)
>>>>> uint64_t msix_get_data(dev, vector)
>>>>>
>>>>> and same for msi.
>>>>>
>>>>> this would minimise the changes while still making it
>>>>> possible to avoid code duplication in kvm.
>>>>
>>>> I'm introducing msi[x]_message_from_vector for that purpose later on. Or
>>>> what do you mean?
>>>>
>>>> Jan
>>>
>>> It does not look like everyone actually wants the structure,
>>> users seem to put it on stack and then immediately
>>> unwrap it to get at the address/data.
>>> So two accessors get_data + get_address instead of one, will
>>> remove the need to rework all code to use the structure.
>>
>> The idea of this patch is to start handling MSI messages as a single
>> blob. There should be no need to ask a device for parts of that blob
>> this way.
> 
> There should be no need to look at the message at all.
> devices really only care about vector numbers.
> So we are left with msix.c msi.c and kvm as the only users.
> kvm has a cache of messages so it needs a struct of these,
> msix/msi don't.

MSIMessage is primarily designed for the path from the device to the
interrupt controller. And there are multiple stops already in that path
after this patch set. Interrupt remapping, e.g., would add another stop.

> 
>> If you see use cases in this series, though, let me know.
>>
>> Jan
> 
> Yes, I see them. msix_notify is one example. msi_notify is another.
> 
> E.g. msi_notify would IMO look nicer as:
>     stl_le_phys(msi_get_address(dev, vector), msi_get_data(dev, vector));

This line does not exist anymore at the end of this series. See
pc_msi_deliver.

Jan
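
As a sketch of the "single blob" idea, the following reads one MSI-X table entry into a message structure in one step instead of exposing separate address and data accessors. The 16-byte little-endian entry layout (address low, address high, data, vector control) follows the PCI MSI-X definition; the Toy* type, the field names and the function name are assumptions and may differ from the MSIMessage introduced by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MSIX_ENTRY_SIZE 16

/* Hypothetical message blob handed from the device towards the irqchip. */
typedef struct {
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t toy_ldl_le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Fill msg from the table entry of the given vector. */
static void toy_msix_message_from_vector(const uint8_t *table,
                                         unsigned nr_entries,
                                         unsigned vector,
                                         ToyMSIMessage *msg)
{
    const uint8_t *entry;

    assert(vector < nr_entries);
    entry = table + vector * TOY_MSIX_ENTRY_SIZE;
    msg->address = (uint64_t)toy_ldl_le(entry + 4) << 32 | toy_ldl_le(entry);
    msg->data = toy_ldl_le(entry + 8);
}
```

Every later stop on the path (delivery hook, routing cache, a possible interrupt remapper) can then take the same structure without going back to the device.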


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 13:43     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:15       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 783 bytes --]

On 2011-10-17 15:43, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
>> diff --git a/hw/msi.c b/hw/msi.c
>> index 3c7ebc3..9055155 100644
>> --- a/hw/msi.c
>> +++ b/hw/msi.c
>> @@ -40,6 +40,14 @@
>>  /* Flag for interrupt controller to declare MSI/MSI-X support */
>>  bool msi_supported;
>>  
>> +static void msi_unsupported(MSIMessage *msg)
>> +{
>> +    /* If we get here, the board failed to register a delivery handler. */
>> +    abort();
>> +}
>> +
>> +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
>> +
> 
> How about we set this to NULL, and check it instead of the bool
> flag?
> 

Yeah. I will introduce

bool msi_supported(void)
{
    return msi_deliver != msi_unsupported;
}

OK?

Jan
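
Put together, the proposal would look roughly like the self-contained sketch below: the hook defaults to a handler that aborts, and support is detected by comparing the hook against that default rather than keeping a separate bool. The toy_ prefixes, the message layout and the board handler are hypothetical and only serve the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical message type for the example. */
typedef struct {
    uint64_t address;
    uint32_t data;
} ToyMSIMessage;

/* Default hook: reaching it means no board registered a handler. */
static void toy_msi_unsupported(ToyMSIMessage *msg)
{
    (void)msg;
    abort();
}

static void (*toy_msi_deliver)(ToyMSIMessage *msg) = toy_msi_unsupported;

/* MSI is supported once the default hook has been replaced. */
static bool toy_msi_supported(void)
{
    return toy_msi_deliver != toy_msi_unsupported;
}

/* Example board handler that just records the last message. */
static ToyMSIMessage toy_last_delivered;

static void toy_board_msi_deliver(ToyMSIMessage *msg)
{
    assert(msg != NULL);
    toy_last_delivered = *msg;
}
```

Compared with a NULL default, this keeps callers free of NULL checks: a missing handler fails loudly in one place instead of crashing on a NULL function pointer at an arbitrary call site.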


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 13:48             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:18               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1369 bytes --]

On 2011-10-17 15:48, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 03:41:44PM +0200, Avi Kivity wrote:
>> On 10/17/2011 03:41 PM, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 01:15:56PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-17 12:56, Avi Kivity wrote:
>>>>> On 10/17/2011 11:27 AM, Jan Kiszka wrote:
>>>>>> So far we deliver MSI messages by writing them into the target MMIO
>>>>>> area. This reflects what happens on hardware, but imposes some
>>>>>> limitations on the emulation when introducing KVM in-kernel irqchip
>>>>>> models. For those we will need to track the message origin.
>>>>>
>>>>> Why do we need to track the message origin?  Emulated interrupt remapping?
>>>>
>>>> The origin holds the routing cache, which we need in order to track
>>>> whether the message already has a route (without searching long lists)
>>>> and to update that route instead of adding another one.
>>>
>>> Hmm, yes, but if the device does stl_phys or something like this,
>>> it won't work with irqchip, will it? And it should, ideally.
>>
>> Why not?  it will fall back to the apic path, and use the local routing
>> cache entry there.
> 
> Does it still work with irqchip enabled? I didn't realize ...

Yep, as MSI requests that land in the APIC MMIO page are also fed into
msi_deliver and will take the normal path from there on.
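
A rough sketch of that fallback, assuming a simplified MSIMessage and a
hypothetical apic_mmio_writel entry point. The 0xfee00000 window is the
x86 MSI address range; everything else here is illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

#define APIC_MSI_BASE 0xfee00000u   /* start of the x86 MSI window */
#define APIC_MSI_SIZE 0x00100000u   /* 1 MiB window */

/* Stand-in delivery hook that just records what it was handed. */
static MSIMessage last_msg;
static int deliveries;

static void record_delivery(MSIMessage *msg)
{
    assert(msg != NULL);
    last_msg = *msg;
    deliveries++;
}

void (*msi_deliver)(MSIMessage *msg) = record_delivery;

/*
 * Hypothetical handler for a 32-bit write such as stl_phys(). A write
 * inside the MSI window is turned into a message and takes the same
 * path as a message raised through the MSI layer. Returns false when
 * the address is not an MSI target.
 */
bool apic_mmio_writel(uint64_t addr, uint32_t val)
{
    MSIMessage msg = { .address = addr, .data = val };

    if (addr < APIC_MSI_BASE || addr - APIC_MSI_BASE >= APIC_MSI_SIZE) {
        return false;
    }
    msi_deliver(&msg);
    return true;
}
```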

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17 15:37         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:19           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2651 bytes --]

On 2011-10-17 17:37, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 01:19:56PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 13:06, Avi Kivity wrote:
>>> On 10/17/2011 11:27 AM, Jan Kiszka wrote:
>>>> This cache will help us implement KVM in-kernel irqchip support
>>>> without spreading hooks all over the place.
>>>>
>>>> KVM requires us to register a message first and then deliver it by
>>>> raising a pseudo IRQ line returned on registration. While this could be
>>>> changed for QEMU-originated MSI messages by adding direct MSI injection,
>>>> we will still need this translation for irqfd-originated messages. The
>>>> MSIRoutingCache will allow us to track those registrations and update
>>>> them lazily before the actual delivery. This avoids having to track MSI
>>>> vectors at device level (like qemu-kvm currently does).
>>>>
>>>>
>>>> +typedef enum {
>>>> +    MSI_ROUTE_NONE = 0,
>>>> +    MSI_ROUTE_STATIC,
>>>> +} MSIRouteType;
>>>> +
>>>> +struct MSIRoutingCache {
>>>> +    MSIMessage msg;
>>>> +    MSIRouteType type;
>>>> +    int kvm_gsi;
>>>> +    int kvm_irqfd;
>>>> +};
>>>> +
>>>> diff --git a/hw/pci.h b/hw/pci.h
>>>> index 329ab32..5b5d2fd 100644
>>>> --- a/hw/pci.h
>>>> +++ b/hw/pci.h
>>>> @@ -197,6 +197,10 @@ struct PCIDevice {
>>>>      MemoryRegion rom;
>>>>      uint32_t rom_bar;
>>>>  
>>>> +    /* MSI routing caches */
>>>> +    MSIRoutingCache *msi_cache;
>>>> +    MSIRoutingCache *msix_cache;
>>>> +
>>>>      /* MSI entries */
>>>>      int msi_entries_nr;
>>>>      struct KVMMsiMessage *msi_irq_entries;
>>>
>>> IMO this needlessly leaks kvm information into core qemu.  The cache
>>> should be completely hidden in kvm code.
>>>
>>> I think msi_deliver() can hide the use of the cache completely.  For
>>> pre-registered events like kvm's irqfd, you can use something like
>>>
>>>   qemu_irq qemu_msi_irq(MSIMessage msg)
>>>
>>> for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
>>> for kvm, it allocates an irqfd and a permanent entry in the cache and
>>> returns a qemu_irq that triggers the irqfd.
>>
>> See my previous mail: you want to track the life-cycle of an MSI
>> source to avoid generating routes for identical sources. A message is
>> not a source. Two identical messages can come from different sources.
> 
> Since MSI messages are edge triggered, I don't see how this
> would work without losing interrupts. And AFAIK,
> existing guests do not use the same message for
> different sources.

Just like we handle shared edge-triggered line-based IRQs, shared MSIs
are in principle feasible as well.
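
To make the source-versus-message distinction concrete, here is a small
sketch of a per-source cache with lazy route updates. The KVM calls are
replaced by stubs, MSI_ROUTE_DYNAMIC and msi_route_for_source are
assumed names, and the struct is trimmed compared to the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

typedef enum {
    MSI_ROUTE_NONE = 0,
    MSI_ROUTE_STATIC,
    MSI_ROUTE_DYNAMIC,      /* assumed here, not in the quoted hunk */
} MSIRouteType;

/* One cache per message source, not per message. */
typedef struct MSIRoutingCache {
    MSIMessage msg;
    MSIRouteType type;
    int kvm_gsi;
} MSIRoutingCache;

/* Stubs standing in for the real KVM routing calls. */
static int next_gsi = 24;
static int route_updates;

static int fake_kvm_alloc_gsi(void)
{
    return next_gsi++;
}

static void fake_kvm_set_route(int gsi, const MSIMessage *msg)
{
    (void)gsi;
    (void)msg;
    route_updates++;
}

/* Return the GSI for this source, touching the route only on change. */
int msi_route_for_source(MSIRoutingCache *cache, const MSIMessage *msg)
{
    assert(cache != NULL && msg != NULL);

    if (cache->type == MSI_ROUTE_NONE) {
        cache->kvm_gsi = fake_kvm_alloc_gsi();
        cache->type = MSI_ROUTE_DYNAMIC;
        cache->msg = *msg;
        fake_kvm_set_route(cache->kvm_gsi, msg);
    } else if (cache->msg.address != msg->address ||
               cache->msg.data != msg->data) {
        cache->msg = *msg;
        fake_kvm_set_route(cache->kvm_gsi, msg);
    }
    return cache->kvm_gsi;
}
```

Two sources that happen to program the same message each get their own
route in this sketch, and the lookup is a pointer dereference rather
than a list search.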

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 42/45] msix: Introduce msix_init_simple
  2011-10-17 14:28         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:21           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1638 bytes --]

On 2011-10-17 16:28, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 01:27:31PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 13:22, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 11:28:16AM +0200, Jan Kiszka wrote:
>>>> Device models are usually not interested in specifying MSI-X
>>>> configuration details beyond the number of vectors to provide and the
>>>> BAR number to use. Layout of an exclusively used BAR and its
>>>> registration can also be handled centrally.
>>>>
>>>> This is the purpose of msix_init_simple. It provides handy services to
>>>> the existing users. Future users like device assignment may require more
>>>> detailed setup specification. For them we will (re-)introduce msix_init
>>>> with the full list of configuration options (in contrast to the current
>>>> code).
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> Well, this seems a bit of a code churn then, doesn't it?
>>> We are also discussing using memory BAR for virtio-pci for other
>>> stuff besides MSI-X, so the last user of the _simple variant
>>> will be ivshmem then?
>>
>> We will surely see more MSI-X users over time. Not sure if they all
>> mix their MSI-X BARs with other stuff. But e.g. the e1000 variant I
>> have here does not. So there should be users in the future.
>>
>> Jan
> 
> Question is, how hard is it to pass in the BAR and the offset?

That is trivial. But have a look at the final simple implementation. It
also manages the container memory region for the table and PBA and
registers/unregisters that container as a BAR. So there is measurable
added value.
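
As a sketch of what such central handling has to compute for an
exclusively used BAR: the 16-byte table entries and the
one-bit-per-vector PBA in 64-bit units follow the PCI MSI-X layout. The
placement of the PBA right behind the table, the 4 KiB minimum and all
names are assumptions for illustration, not the series' implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE   16u     /* bytes per MSI-X table entry */
#define MSIX_MAX_VECTORS  2048u   /* architectural maximum */
#define MIN_BAR_SIZE      4096u   /* assumed minimum, one page */

/* Hypothetical description of a BAR holding only table and PBA. */
typedef struct MSIXLayout {
    uint32_t table_offset;
    uint32_t table_size;
    uint32_t pba_offset;
    uint32_t pba_size;
    uint32_t bar_size;
} MSIXLayout;

/* Compute the layout; returns false for an invalid vector count. */
bool msix_simple_layout(unsigned nvectors, MSIXLayout *l)
{
    uint32_t needed, size;

    assert(l != NULL);
    if (nvectors == 0 || nvectors > MSIX_MAX_VECTORS) {
        return false;
    }
    l->table_offset = 0;
    l->table_size = nvectors * MSIX_ENTRY_SIZE;
    /* Table size is a multiple of 16, so the PBA stays 8-byte aligned. */
    l->pba_offset = l->table_offset + l->table_size;
    l->pba_size = ((nvectors + 63) / 64) * 8;

    /* BAR sizes are powers of two. */
    needed = l->pba_offset + l->pba_size;
    size = MIN_BAR_SIZE;
    while (size < needed) {
        size <<= 1;
    }
    l->bar_size = size;
    return true;
}
```

A device model would then only pass the vector count and BAR number,
which is the kind of interface msix_init_simple is described as
offering.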

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17 15:43     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:23       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1174 bytes --]

On 2011-10-17 17:43, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 11:27:46AM +0200, Jan Kiszka wrote:
>> This cache will help us implement KVM in-kernel irqchip support
>> without spreading hooks all over the place.
>>
>> KVM requires us to register a message first and then deliver it by
>> raising a pseudo IRQ line returned on registration. While this could be
>> changed for QEMU-originated MSI messages by adding direct MSI injection,
>> we will still need this translation for irqfd-originated messages. The
>> MSIRoutingCache will allow us to track those registrations and update
>> them lazily before the actual delivery. This avoids having to track MSI
>> vectors at device level (like qemu-kvm currently does).
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> So if many devices are added, exhausting the number of GSIs supported,
> we get terrible performance instead of simply failing outright.
> 
> To me, this looks more like a bug than a feature ...

If that ever turns out to be a bottleneck, failing looks like the worst
we can do. Reporting excessive cache flushes would make some sense and
could still be added.
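
One way such reporting could look, as a sketch only: a tiny GSI pool
that flushes all dynamic routes on exhaustion and counts how often that
happens. The pool size, the names and the threshold check are all
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define GSI_POOL_SIZE 4     /* deliberately tiny for illustration */

static bool gsi_used[GSI_POOL_SIZE];
static unsigned flush_count;

static int find_free_gsi(void)
{
    for (int i = 0; i < GSI_POOL_SIZE; i++) {
        if (!gsi_used[i]) {
            return i;
        }
    }
    return -1;
}

/*
 * Drop every dynamic route. A real implementation would also have to
 * invalidate the routing caches that still point at the freed GSIs.
 */
static void flush_dynamic_routes(void)
{
    memset(gsi_used, 0, sizeof(gsi_used));
    flush_count++;
}

/* Allocate a GSI, flushing the pool once if it is exhausted. */
int alloc_dynamic_gsi(void)
{
    int gsi = find_free_gsi();

    if (gsi < 0) {
        flush_dynamic_routes();
        gsi = find_free_gsi();
    }
    assert(gsi >= 0);
    gsi_used[gsi] = true;
    return gsi;
}

/* Report whether flushes exceeded a caller-chosen threshold. */
bool flushes_excessive(unsigned threshold)
{
    return flush_count > threshold;
}
```

Delivery keeps working after exhaustion in this model, and the counter
gives a hook for a warning once flushing becomes frequent.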

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-17 15:48     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:28       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1015 bytes --]

On 2011-10-17 17:48, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
>> This optimization was only required to keep KVM route usage low. Now
>> that we solve that problem via lazy updates, we can drop the field. We
>> still need interfaces to clear pending vectors, though (and we have to
>> make use of them more broadly - but that's unrelated to this patch).
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Lazy updates should be an implementation detail.
> IMO resource tracking of vectors makes sense
> as an API. Making devices deal with pending
> vectors as a concept, IMO, does not.

There is really no use for tracking the vector lifecycle once we have
lazy updates (except for static routes). It's far too invasive a
concept, and it's not needed for anything but KVM.

If you want an example, check
http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
compare it to the changes done to hpet in this series.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 00/45] qemu-kvm: MSI layer rework for in-kernel irqchip support
  2011-10-17 15:57   ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-17 19:35     ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-17 19:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, Marcelo Tosatti, Alexander Graf, qemu-devel, Isaku Yamahata,
	Alex Williamson, Avi Kivity, Gerd Hoffmann

[-- Attachment #1: Type: text/plain, Size: 2554 bytes --]

On 2011-10-17 17:57, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 11:27:34AM +0200, Jan Kiszka wrote:
>> As previously indicated, I was working for quite a while on a major
>> refactoring of the MSI "additions" we have in qemu-kvm to support
>> in-kernel irqchip, vhost and device assignment. This is now the outcome.
>>
>> I'm quite happy with it, things are still working (apparently), and the
>> invasiveness of KVM hooks into the MSI layer is significantly reduced.
>> Moreover, I was able to port the device assignment code over generic MSI
>> support, reducing the size of that file a bit further.
>>
>> Some further highlights:
>>  - fix for HPET MSI support with in-kernel irqchip
>>  - fully configurable MSI-X (allows 1:1 mapping for assigned devices)
>>  - refactored KVM core API for device assignment and IRQ routing
>>
>> I'm sending the whole series in one chunk so that you can see what the
>> result will be. It's RFC as I bet that there are regressions included
>> and maybe still room left for improvements. Once all is fine (can be
>> broken up into multiple chunks for the merge), I would suggest patching
>> qemu-kvm first and then start with porting things over to upstream.
>>
>> Comments & review welcome.
>>
>> CC: Alexander Graf <agraf@suse.de>
>> CC: Gerd Hoffmann <kraxel@redhat.com>
>> CC: Isaku Yamahata <yamahata@valinux.co.jp>
> 
> 
> So the scheme where we lazily update kvm gsi table
> on interrupts is interesting. There seems to be
> very little point in it for virtio, and it does
> seem to make it impossible to detect lack of resources
> (at the moment we let guest know if we run out of GSIs
> and Linux guests can fall back to regular interrupts).

Are we really at that limit already? Then I think it's rather time to
lift it at the kernel side.

> 
> I am guessing the idea is to use it for device assignment
> where it does make sense as there is no standard way
> to track which vectors are actually used?
> But how does it work there? kvm does not
> propagate unmapped interrupts from an assigned device to qemu, does it?

No, device assignment and irqfd (vhost, vfio, whatever-will-come) belong
to the static group. They need to hand out a static gsi to the irq
source, thus they cannot participate in lazy updating.

But they can benefit from the generic config notifiers: whenever the
guest changes some route (or disables it), they just need to inform the
core about this. So there is no need to track used vectors separately
anymore.
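
A sketch of how a static-route user might hang off such a notifier. The
notifier signature, MSIConfigHook and the helper names are invented for
illustration, and the KVM core update is reduced to a struct write:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Hypothetical notifier: msg is NULL when the vector gets disabled. */
typedef void (*MSIConfigNotifier)(void *opaque, unsigned vector,
                                  const MSIMessage *msg);

/* A static route keeps the GSI handed out to the IRQ source. */
typedef struct StaticRoute {
    int gsi;
    bool active;
    MSIMessage msg;
} StaticRoute;

/* Stand-in for informing the KVM core about a changed route. */
static void static_route_update(void *opaque, unsigned vector,
                                const MSIMessage *msg)
{
    StaticRoute *route = &((StaticRoute *)opaque)[vector];

    if (msg == NULL) {
        route->active = false;
    } else {
        route->msg = *msg;
        route->active = true;
    }
}

typedef struct MSIConfigHook {
    MSIConfigNotifier notifier;
    void *opaque;
} MSIConfigHook;

/* Called by the MSI layer when the guest rewrites or disables a vector. */
void msi_config_changed(const MSIConfigHook *hook, unsigned vector,
                        const MSIMessage *msg)
{
    assert(hook != NULL);
    if (hook->notifier) {
        hook->notifier(hook->opaque, vector, msg);
    }
}
```

The GSI never changes here; only the message behind it does, which is
why no separate used-vector bookkeeping is needed in this model.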

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses
  2011-10-17 19:11               ` [Qemu-devel] " Jan Kiszka
@ 2011-10-17 19:43                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-17 19:43 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 09:11:29PM +0200, Jan Kiszka wrote:
> On 2011-10-17 14:50, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 02:07:10PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 13:57, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 01:23:46PM +0200, Jan Kiszka wrote:
> >>>> On 2011-10-17 13:10, Michael S. Tsirkin wrote:
> >>>>> On Mon, Oct 17, 2011 at 11:27:40AM +0200, Jan Kiszka wrote:
> >>>>>> Only accesses to the MSI-X table must trigger a call to
> >>>>>> msix_handle_mask_update or a notifier invocation.
> >>>>>>
> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>
> >>>>> Why would msix_mmio_write be called on an access
> >>>>> outside the table?
> >>>>
> >>>> Because it handles both the table and the PBA.
> >>>
> >>> Hmm. Interesting. Is there a bug in how we handle PBA
> >>> updates then? If yes I'd like a separate patch for that
> >>> to apply to the stable tree.
> >>
> >> I first thought it was a serious bug, but it only triggers if the guest
> >> writes to the PBA (which is very uncommon) and that write actually
> >> causes a spurious out-of-bounds vector injection. Highly unlikely.
> > 
> > Yes guests don't really use PBA ATM. But is there something
> > bad a malicious guest can do? For example, what if
> > msix_clr_pending gets invoked with this huge vector value?
> > 
> > It does seem serious ...
> 
> I checked it before and I think it is harmless. The largest vector that
> can be miscalculated is 255. But bit 255 in the PBA is still safe inside
> our MMIO page.
> 
> Jan
> 

You are right. We got lucky.
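
The arithmetic behind that, spelled out. The constants are assumptions
about the MSI-X MMIO page layout in the code under discussion (a 4 KiB
page, the PBA in its upper half, 16-byte entries):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the MSI-X MMIO page. */
#define MSIX_PAGE_SIZE      0x1000u
#define MSIX_PAGE_PENDING   (MSIX_PAGE_SIZE / 2)
#define PCI_MSIX_ENTRY_SIZE 16u

/* Vector index derived from any offset in the page, table or not. */
static unsigned vector_from_offset(uint32_t offset)
{
    assert(offset < MSIX_PAGE_SIZE);
    return offset / PCI_MSIX_ENTRY_SIZE;
}

/* Byte offset in the page that holds the pending bit of a vector. */
static uint32_t pba_byte_offset(unsigned vector)
{
    return MSIX_PAGE_PENDING + vector / 8;
}
```

Under these assumptions the last dword of the page yields vector 255,
and its pending bit lives at byte offset 0x81f, still inside the page.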


^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 42/45] msix: Introduce msix_init_simple
  2011-10-17 19:21           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 10:52             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 10:52 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 09:21:34PM +0200, Jan Kiszka wrote:
> On 2011-10-17 16:28, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 01:27:31PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 13:22, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 11:28:16AM +0200, Jan Kiszka wrote:
> >>>> Device models are usually not interested in specifying MSI-X
> >>>> configuration details beyond the number of vectors to provide and the
> >>>> BAR number to use. Layout of an exclusively used BAR and its
> >>>> registration can also be handled centrally.
> >>>>
> >>>> This is the purpose of msix_init_simple. It provides handy services to
> >>>> the existing users. Future users like device assignment may require more
> >>>> detailed setup specification. For them we will (re-)introduce msix_init
> >>>> with the full list of configuration options (in contrast to the current
> >>>> code).
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>
> >>> Well, this seems a bit of a code churn then, doesn't it?
> >>> We are also discussing using memory BAR for virtio-pci for other
> >>> stuff besides MSI-X, so the last user of the _simple variant
> >>> will be ivshmem then?
> >>
> >> We will surely see more MSI-X users over time. Not sure if they all
> >> mix their MSI-X BARs with other stuff. But e.g. the e1000 variant I
> >> have here does not. So there should be users in the future.
> >>
> >> Jan
> > 
> > Question is, how hard is it to pass in the BAR and the offset?
> 
> That is trivial. But have a look at the final simple implementation. It
> also manages the container memory region for table and PBA and
> registers/unregisters that container as BAR. So there is measurable
> added-value.
> 
> Jan
> 

Yes, I agree. In particular it's not very nice that the user has to know
the size of the BAR to create. But the API is very unfortunate IMO.

I am also more interested in solutions that help all devices
and not just those that have a dedicated BAR for MSI-X + PBA.

We should probably pass in the size of the memory region allocated to
the MSI-X table, and verify that the table fits there.
We can also avoid passing in the BAR number, like this:

diff --git a/hw/pci.c b/hw/pci.c
index 749e8d8..d0d893e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -903,6 +903,17 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
         : pci_dev->bus->address_space_mem;
 }
 
+int pci_get_bar_nr(PCIDevice *pci_dev, MemoryRegion *bar)
+{
+    int region_num;
+    for (region_num = 0; region_num < PCI_NUM_REGIONS; ++region_num) {
+        if (pci_dev->io_regions[region_num].memory == bar) {
+            return region_num;
+        }
+    }
+    return -1;
+}
+
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
 {
     return pci_dev->io_regions[region_num].addr;
-- 
MST

^ permalink raw reply related	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 42/45] msix: Introduce msix_init_simple
  2011-10-18 10:52             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 11:02               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 11:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 12:52, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 09:21:34PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 16:28, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 01:27:31PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-17 13:22, Michael S. Tsirkin wrote:
>>>>> On Mon, Oct 17, 2011 at 11:28:16AM +0200, Jan Kiszka wrote:
>>>>>> Device models are usually not interested in specifying MSI-X
>>>>>> configuration details beyond the number of vectors to provide and the
>>>>>> BAR number to use. Layout of an exclusively used BAR and its
>>>>>> registration can also be handled centrally.
>>>>>>
>>>>>> This is the purpose of msix_init_simple. It provides handy services to
>>>>>> the existing users. Future users like device assignment may require more
>>>>>> detailed setup specification. For them we will (re-)introduce msix_init
>>>>>> with the full list of configuration options (in contrast to the current
>>>>>> code).
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>
>>>>> Well, this seems a bit of a code churn then, doesn't it?
>>>>> We are also discussing using memory BAR for virtio-pci for other
>>>>> stuff besides MSI-X, so the last user of the _simple variant
>>>>> will be ivshmem then?
>>>>
>>>> We will surely see more MSI-X users over time. Not sure if they all
>>>> mix their MSI-X BARs with other stuff. But e.g. the e1000 variant I
>>>> have here does not. So there should be users in the future.
>>>>
>>>> Jan
>>>
>>> Question is, how hard is it to pass in the BAR and the offset?
>>
>> That is trivial. But have a look at the final simple implementation. It
>> also manages the container memory region for table and PBA and
>> registers/unregisters that container as BAR. So there is measurable
>> added-value.
>>
>> Jan
>>
> 
> Yes, I agree. In particular it's not very nice that the user has to know
> the size of the BAR to create. But the API is very unfortunate IMO.

I'm open to seeing prototypes of a different one.

> 
> I am also more interested in solutions that help all devices
> and not just those that have a dedicated BAR for MSI-X + PBA.

That's what patch 43 is for.

> 
> We should probably pass in the size of the memory region allocated to
> the MSI-X table, and verify that the table fits there.

Well, if you specify table and PBA offset explicitly, I think it is fair
to expect the caller to have done the math.

There are some checks built into the core already, e.g. against table
and PBA overlap. We could add one for checking BAR limits. I'm not sure,
though, if requesting table and PBA sizes solves the issue that the user
may have done the calculation wrong.
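
For illustration, a BAR-limit check next to the existing overlap check
could look roughly like the sketch below. The helper names are
hypothetical and this is not the code in the series; the only facts
assumed from the PCI specification are the 16-byte table entry, the PBA
being one bit per vector stored in 64-bit words, and the limit of 2048
vectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSIX_ENTRY_SIZE  16u    /* bytes per MSI-X table entry */
#define SKETCH_MSIX_MAX_VECTORS 2048u  /* upper limit of an MSI-X table */

/* Bytes occupied by a table with nr_vectors entries. */
static uint32_t sketch_table_bytes(uint32_t nr_vectors)
{
    return nr_vectors * SKETCH_MSIX_ENTRY_SIZE;
}

/* Bytes occupied by the PBA: one bit per vector, rounded up to 64 bits. */
static uint32_t sketch_pba_bytes(uint32_t nr_vectors)
{
    return ((nr_vectors + 63u) / 64u) * 8u;
}

/* True if [off, off + len) lies completely inside a BAR of bar_size bytes. */
static bool sketch_fits(uint32_t bar_size, uint32_t off, uint32_t len)
{
    return off <= bar_size && len <= bar_size - off;
}

/* Hypothetical validation of a caller-supplied table/PBA layout. */
static bool sketch_msix_layout_ok(uint32_t bar_size, uint32_t nr_vectors,
                                  uint32_t table_off, uint32_t pba_off)
{
    uint32_t table_len, pba_len;

    if (nr_vectors == 0 || nr_vectors > SKETCH_MSIX_MAX_VECTORS) {
        return false;
    }
    table_len = sketch_table_bytes(nr_vectors);
    pba_len = sketch_pba_bytes(nr_vectors);

    if (!sketch_fits(bar_size, table_off, table_len) ||
        !sketch_fits(bar_size, pba_off, pba_len)) {
        return false;
    }
    /* Both ranges are inside the BAR here, so the sums cannot overflow. */
    return table_off + table_len <= pba_off ||
           pba_off + pba_len <= table_off;
}
```

Such a check catches a layout that runs past the BAR or overlaps, but, as
said above, it cannot tell whether the caller picked the offsets it
actually intended.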

> We can also avoid passing in the BAR number, like this:
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 749e8d8..d0d893e 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -903,6 +903,17 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
>          : pci_dev->bus->address_space_mem;
>  }
>  
> +int pci_get_bar_nr(PCIDevice *pci_dev, MemoryRegion *bar)
> +{
> +    int region_num;
> +    for (region_num = 0; region_num < PCI_NUM_REGIONS; ++region_num) {
> +        if (pci_dev->io_regions[region_num].memory == bar) {
> +            return region_num;
> +        }
> +    }
> +    return -1;
> +}
> +
>  pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
>  {
>      return pci_dev->io_regions[region_num].addr;

That enforces a specific registration order. If you call msix_init
before doing the BAR registration, the function above will not help.
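
The ordering dependency can be shown with a small stand-alone model.
Everything in it is hypothetical (SketchDevice, SketchRegion,
sketch_register_bar and sketch_get_bar_nr are stand-ins, not QEMU types
or functions); it only mirrors the lookup loop of the proposed
pci_get_bar_nr.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_REGIONS 7  /* assumed stand-in for PCI_NUM_REGIONS */

/* Stand-in for MemoryRegion; only its address matters here. */
typedef struct SketchRegion {
    int unused;
} SketchRegion;

/* Stand-in for PCIDevice: remembers which region backs which BAR. */
typedef struct SketchDevice {
    SketchRegion *bar_memory[SKETCH_NUM_REGIONS];
} SketchDevice;

/* Models BAR registration: records the region under its BAR number. */
static void sketch_register_bar(SketchDevice *dev, int nr, SketchRegion *mr)
{
    dev->bar_memory[nr] = mr;
}

/* Same search as the proposed helper: -1 if the region is not registered. */
static int sketch_get_bar_nr(const SketchDevice *dev, const SketchRegion *mr)
{
    int nr;

    for (nr = 0; nr < SKETCH_NUM_REGIONS; ++nr) {
        if (dev->bar_memory[nr] == mr) {
            return nr;
        }
    }
    return -1;
}
```

Calling sketch_get_bar_nr before sketch_register_bar yields -1 for the
very region that returns its BAR number afterwards, which is the order
constraint described above.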

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-17 19:28       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 11:58         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 11:58 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
> On 2011-10-17 17:48, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
> >> This optimization was only required to keep KVM route usage low. Now
> >> that we solve that problem via lazy updates, we can drop the field. We
> >> still need interfaces to clear pending vectors, though (and we have to
> >> make use of them more broadly - but that's unrelated to this patch).
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > Lazy updates should be an implementation detail.
> > IMO resource tracking of vectors makes sense
> > as an API. Making devices deal with pending
> > vectors as a concept, IMO, does not.
> 
> There is really no use for tracking the vector lifecycle once we have
> lazy updates (except for static routes). It's a way too invasive
> concept, and it's not needed for anything but KVM.

I think it's needed. The PCI spec states that when the device
does not need an interrupt anymore, it should clear the pending
bit. The use/unuse is IMO a decent API for this,
because it uses a familiar resource tracking concept.
Exposing this knowledge of msix to devices seems
like a worse API.
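
As a rough illustration of the resource-tracking idea, the sketch below
keeps a use count per vector and clears the pending bit when the last
user goes away. It is a simplified model with hypothetical names
(SketchMsix, sketch_vector_use, sketch_vector_unuse), not the qemu-kvm
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_VECTORS 64u

typedef struct SketchMsix {
    unsigned used[SKETCH_MAX_VECTORS];  /* use count per vector */
    uint64_t pending;                   /* one pending bit per vector */
} SketchMsix;

/* Device announces that it may raise `v`. */
static bool sketch_vector_use(SketchMsix *s, unsigned v)
{
    if (v >= SKETCH_MAX_VECTORS) {
        return false;
    }
    s->used[v]++;
    return true;
}

/* Marks `v` as pending (e.g. raised while masked). */
static void sketch_set_pending(SketchMsix *s, unsigned v)
{
    if (v < SKETCH_MAX_VECTORS) {
        s->pending |= UINT64_C(1) << v;
    }
}

static bool sketch_is_pending(const SketchMsix *s, unsigned v)
{
    return v < SKETCH_MAX_VECTORS && ((s->pending >> v) & 1u);
}

/* Device no longer needs `v`; the last unuse drops the pending bit. */
static void sketch_vector_unuse(SketchMsix *s, unsigned v)
{
    if (v >= SKETCH_MAX_VECTORS || s->used[v] == 0) {
        return;
    }
    if (--s->used[v] == 0) {
        s->pending &= ~(UINT64_C(1) << v);
    }
}
```

In this model the device never touches the pending bits directly; the
bit disappears as a side effect of releasing the vector.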

> 
> If you want an example, check
> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
> compare it to the changes done to hpet in this series.
> 
> Jan
> 

This seems to be a general argument that lazy updates are good?
I have no real problem with them, besides the fact that
we need an API to reserve space in the routing
table so that device setup can fail upfront.

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-17 19:15       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 12:05         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 12:05 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
> On 2011-10-17 15:43, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
> >> diff --git a/hw/msi.c b/hw/msi.c
> >> index 3c7ebc3..9055155 100644
> >> --- a/hw/msi.c
> >> +++ b/hw/msi.c
> >> @@ -40,6 +40,14 @@
> >>  /* Flag for interrupt controller to declare MSI/MSI-X support */
> >>  bool msi_supported;
> >>  
> >> +static void msi_unsupported(MSIMessage *msg)
> >> +{
> >> +    /* If we get here, the board failed to register a delivery handler. */
> >> +    abort();
> >> +}
> >> +
> >> +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
> >> +
> > 
> > How about we set this to NULL, and check it instead of the bool
> > flag?
> > 
> 
> Yeah. I will introduce
> 
> bool msi_supported(void)
> {
>     return msi_deliver != msi_unsupported;
> }
> 
> OK?
> 
> Jan
> 

Looks a bit weird ...
NULL is a pretty standard value for an invalid pointer, isn't it?
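
A minimal sketch of the NULL-based variant, for comparison with the
sentinel function quoted above. All names are hypothetical, and the
message layout (64-bit address, 32-bit data) is an assumption about
MSIMessage rather than a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of an MSI message: target address plus payload. */
typedef struct SketchMSIMessage {
    uint64_t address;
    uint32_t data;
} SketchMSIMessage;

typedef void (*SketchMSIDeliver)(SketchMSIMessage *msg);

/* No handler registered yet: NULL doubles as the "unsupported" marker. */
static SketchMSIDeliver sketch_msi_deliver = NULL;

static bool sketch_msi_supported(void)
{
    return sketch_msi_deliver != NULL;
}

/* Called by the board or interrupt controller to install its handler. */
static void sketch_msi_register(SketchMSIDeliver handler)
{
    sketch_msi_deliver = handler;
}

/* Delivers msg if a handler exists; reports failure instead of aborting. */
static bool sketch_msi_send(SketchMSIMessage *msg)
{
    if (!sketch_msi_supported()) {
        return false;
    }
    sketch_msi_deliver(msg);
    return true;
}

/* Example handler that just records the last payload it saw. */
static uint32_t sketch_last_data;

static void sketch_board_deliver(SketchMSIMessage *msg)
{
    sketch_last_data = msg->data;
}
```

The trade-off is that every caller must go through a checked path such
as sketch_msi_send, whereas the sentinel function makes an unchecked
call fail loudly via abort().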

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 11:58         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 12:08           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 12:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 13:58, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 17:48, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
>>>> This optimization was only required to keep KVM route usage low. Now
>>>> that we solve that problem via lazy updates, we can drop the field. We
>>>> still need interfaces to clear pending vectors, though (and we have to
>>>> make use of them more broadly - but that's unrelated to this patch).
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> Lazy updates should be an implementation detail.
>>> IMO resource tracking of vectors makes sense
>>> as an API. Making devices deal with pending
>>> vectors as a concept, IMO, does not.
>>
>> There is really no use for tracking the vector lifecycle once we have
>> lazy updates (except for static routes). It's a way too invasive
>> concept, and it's not needed for anything but KVM.
> 
> I think it's needed. The PCI spec states that when the device
> does not need an interrupt anymore, it should clear the pending
> bit.

That should be done explicitly if it is required outside the existing
clearing points. We already have that service; it's called
msix_clear_vector. That alone does not justify msix_vector_use and all
the state and logic behind it IMHO.

> The use/unuse is IMO a decent API for this,
> because it uses a familiar resource tracking concept.
> Exposing this knowledge of msix to devices seems
> like a worse API.
> 
>>
>> If you want an example, check
>> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
>> compare it to the changes done to hpet in this series.
>>
>> Jan
>>
> 
> This seems to be a general argument that lazy updates are good?
> I have no real problem with them, besides the fact that
> we need an API to reserve space in the routing
> table so that device setup can fail upfront.

That's not possible, even with used vectors, as devices change their
vector usage depending on how the guest configures the devices. If you
(pre-)allocate all possible vectors, you may actually run out of
resources earlier than necessary. That's also why we do those data == 0
checks to skip used but unconfigured vectors.
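
The difference to pre-allocation can be sketched like this: a route is
only taken from the (limited) pool when a configured message is actually
delivered, and it is only reprogrammed when the message changed. The
names and counters are hypothetical and merely model the idea; this is
not the code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct SketchMsg {
    uint64_t address;
    uint32_t data;
} SketchMsg;

/* Per-vector cache of the message its route was last programmed with. */
typedef struct SketchRouteCache {
    SketchMsg msg;
    bool has_route;
} SketchRouteCache;

static unsigned sketch_routes_in_use;  /* routes taken from the pool */
static unsigned sketch_route_updates;  /* times a route was (re)programmed */

/* Returns false if the vector is unconfigured and delivery was skipped. */
static bool sketch_deliver(SketchRouteCache *c, const SketchMsg *msg)
{
    if (msg->data == 0) {
        return false;               /* unconfigured: never takes a route */
    }
    if (!c->has_route) {
        c->has_route = true;        /* first delivery allocates the route */
        sketch_routes_in_use++;
        c->msg = *msg;
        sketch_route_updates++;
    } else if (c->msg.address != msg->address || c->msg.data != msg->data) {
        c->msg = *msg;              /* guest reprogrammed the vector */
        sketch_route_updates++;
    }
    return true;
}
```

In this model a vector the guest never configures costs nothing, and
repeated deliveries of an unchanged message do not touch the routing
table again.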

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-17 19:19           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 12:17             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 12:17 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 09:19:34PM +0200, Jan Kiszka wrote:
> On 2011-10-17 17:37, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 01:19:56PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 13:06, Avi Kivity wrote:
> >>> On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> >>>> This cache will help us implement KVM in-kernel irqchip support
> >>>> without spreading hooks all over the place.
> >>>>
> >>>> KVM requires us to register an MSI message first and then deliver it by
> >>>> raising a pseudo IRQ line returned on registration. While this could be
> >>>> changed for QEMU-originated MSI messages by adding direct MSI injection,
> >>>> we will still need this translation for irqfd-originated messages. The
> >>>> MSIRoutingCache will allow tracking those registrations and updating them
> >>>> lazily before the actual delivery. This avoids having to track MSI
> >>>> vectors at device level (like qemu-kvm currently does).
> >>>>
> >>>>
> >>>> +typedef enum {
> >>>> +    MSI_ROUTE_NONE = 0,
> >>>> +    MSI_ROUTE_STATIC,
> >>>> +} MSIRouteType;
> >>>> +
> >>>> +struct MSIRoutingCache {
> >>>> +    MSIMessage msg;
> >>>> +    MSIRouteType type;
> >>>> +    int kvm_gsi;
> >>>> +    int kvm_irqfd;
> >>>> +};
> >>>> +
> >>>> diff --git a/hw/pci.h b/hw/pci.h
> >>>> index 329ab32..5b5d2fd 100644
> >>>> --- a/hw/pci.h
> >>>> +++ b/hw/pci.h
> >>>> @@ -197,6 +197,10 @@ struct PCIDevice {
> >>>>      MemoryRegion rom;
> >>>>      uint32_t rom_bar;
> >>>>  
> >>>> +    /* MSI routing chaches */
> >>>> +    MSIRoutingCache *msi_cache;
> >>>> +    MSIRoutingCache *msix_cache;
> >>>> +
> >>>>      /* MSI entries */
> >>>>      int msi_entries_nr;
> >>>>      struct KVMMsiMessage *msi_irq_entries;
> >>>
> >>> IMO this needlessly leaks kvm information into core qemu.  The cache
> >>> should be completely hidden in kvm code.
> >>>
> >>> I think msi_deliver() can hide the use of the cache completely.  For
> >>> pre-registered events like kvm's irqfd, you can use something like
> >>>
> >>>   qemu_irq qemu_msi_irq(MSIMessage msg)
> >>>
> >>> for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
> >>> for kvm, it allocates an irqfd and a permanent entry in the cache and
> >>> returns a qemu_irq that triggers the irqfd.
> >>
> >> See my previous mail: you want to track the life-cycle of an MSI
> >> source to avoid generating routes for identical sources. A message is
> >> not a source. Two identical messages can come from different sources.
> > 
> > Since MSI messages are edge triggered, I don't see how this
> > would work without losing interrupts. And AFAIK,
> > existing guests do not use the same message for
> > different sources.
> 
> Just like we handle shared edge-triggered line-based IRQs, shared MSIs
> are in principle feasible as well.
> 
> Jan
> 

For this case it seems quite harmless to use multiple
routes for identical sources. Yes, it would use more resources,
but it never happens in practice.
So what Avi said originally is still true.

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
@ 2011-10-18 12:17             ` Michael S. Tsirkin
  0 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 12:17 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Mon, Oct 17, 2011 at 09:19:34PM +0200, Jan Kiszka wrote:
> On 2011-10-17 17:37, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 01:19:56PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 13:06, Avi Kivity wrote:
> >>> On 10/17/2011 11:27 AM, Jan Kiszka wrote:
> >>>> This cache will help us implementing KVM in-kernel irqchip support
> >>>> without spreading hooks all over the place.
> >>>>
> >>>> KVM requires us to register it first and then deliver it by raising a
> >>>> pseudo IRQ line returned on registration. While this could be changed
> >>>> for QEMU-originated MSI messages by adding direct MSI injection, we will
> >>>> still need this translation for irqfd-originated messages. The
> >>>> MSIRoutingCache will allow to track those registrations and update them
> >>>> lazily before the actual delivery. This avoid having to track MSI
> >>>> vectors at device level (like qemu-kvm currently does).
> >>>>
> >>>>
> >>>> +typedef enum {
> >>>> +    MSI_ROUTE_NONE = 0,
> >>>> +    MSI_ROUTE_STATIC,
> >>>> +} MSIRouteType;
> >>>> +
> >>>> +struct MSIRoutingCache {
> >>>> +    MSIMessage msg;
> >>>> +    MSIRouteType type;
> >>>> +    int kvm_gsi;
> >>>> +    int kvm_irqfd;
> >>>> +};
> >>>> +
> >>>> diff --git a/hw/pci.h b/hw/pci.h
> >>>> index 329ab32..5b5d2fd 100644
> >>>> --- a/hw/pci.h
> >>>> +++ b/hw/pci.h
> >>>> @@ -197,6 +197,10 @@ struct PCIDevice {
> >>>>      MemoryRegion rom;
> >>>>      uint32_t rom_bar;
> >>>>  
> >>>> +    /* MSI routing chaches */
> >>>> +    MSIRoutingCache *msi_cache;
> >>>> +    MSIRoutingCache *msix_cache;
> >>>> +
> >>>>      /* MSI entries */
> >>>>      int msi_entries_nr;
> >>>>      struct KVMMsiMessage *msi_irq_entries;
> >>>
> >>> IMO this needlessly leaks kvm information into core qemu.  The cache
> >>> should be completely hidden in kvm code.
> >>>
> >>> I think msi_deliver() can hide the use of the cache completely.  For
> >>> pre-registered events like kvm's irqfd, you can use something like
> >>>
> >>>   qemu_irq qemu_msi_irq(MSIMessage msg)
> >>>
> >>> for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
> >>> for kvm, it allocates an irqfd and a permanent entry in the cache and
> >>> returns a qemu_irq that triggers the irqfd.
> >>
> >> See my previous mail: you want to track the life-cycle of an MSI
> >> source to avoid generating routes for identical sources. A message is
> >> not a source. Two identical messages can come from different sources.
> > 
> > Since MSI messages are edge triggered, I don't see how this
> > would work without losing interrupts. And AFAIK,
> > existing guests do not use the same message for
> > different sources.
> 
> Just like we handle shared edge-triggered line-based IRQs, shared MSIs
> are in principle feasible as well.
> 
> Jan
> 

For this case it seems quite harmless to use multiple
routes for identical sources. Yes it would use more resources
but it never happens in practice.
So what Avi said originally is still true.

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-18 12:05         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 12:23           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 12:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 14:05, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 15:43, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
>>>> diff --git a/hw/msi.c b/hw/msi.c
>>>> index 3c7ebc3..9055155 100644
>>>> --- a/hw/msi.c
>>>> +++ b/hw/msi.c
>>>> @@ -40,6 +40,14 @@
>>>>  /* Flag for interrupt controller to declare MSI/MSI-X support */
>>>>  bool msi_supported;
>>>>  
>>>> +static void msi_unsupported(MSIMessage *msg)
>>>> +{
>>>> +    /* If we get here, the board failed to register a delivery handler. */
>>>> +    abort();
>>>> +}
>>>> +
>>>> +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
>>>> +
>>>
>>> How about we set this to NULL, and check it instead of the bool
>>> flag?
>>>
>>
>> Yeah. I will introduce
>>
>> bool msi_supported(void)
>> {
>>     return msi_deliver != msi_unsupported;
>> }
>>
>> OK?
>>
>> Jan
>>
> 
> Looks a bit weird ...
> NULL is a pretty standard value for an invalid pointer, isn't it?

It saves us the runtime check and is equally expressive and readable IMHO.
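
To make the sentinel-handler idea concrete, here is a minimal, self-contained sketch. It is not code from the patch: the simplified MSIMessage layout, the msi_register_deliver() helper and the demo_deliver() handler are assumptions made up for illustration. Only msi_unsupported(), msi_deliver and msi_supported() follow the names discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the MSIMessage type introduced by this series. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

typedef void (*MSIDeliverFn)(MSIMessage *msg);

/* Default handler: reaching it means no board registered a real one. */
static void msi_unsupported(MSIMessage *msg)
{
    (void)msg;
    abort();
}

/* The hook always points at a callable function, never at NULL. */
static MSIDeliverFn msi_deliver = msi_unsupported;

/* Support is derived from the hook itself, so no separate bool flag. */
static bool msi_supported(void)
{
    return msi_deliver != msi_unsupported;
}

/* Hypothetical registration helper (assumption, not in the patch). */
static void msi_register_deliver(MSIDeliverFn fn)
{
    assert(fn != NULL);
    msi_deliver = fn;
}

/* Hypothetical board handler that only counts deliveries. */
static unsigned demo_delivered;

static void demo_deliver(MSIMessage *msg)
{
    (void)msg;
    demo_delivered++;
}
```

With this layout a caller can invoke msi_deliver(&msg) unconditionally; a missing handler ends in a deliberate abort() with a comment explaining why, rather than in a jump through a NULL pointer.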

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache
  2011-10-18 12:17             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 12:26               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 12:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 14:17, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 09:19:34PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 17:37, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 01:19:56PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-17 13:06, Avi Kivity wrote:
>>>>> On 10/17/2011 11:27 AM, Jan Kiszka wrote:
>>>>>> This cache will help us implement KVM in-kernel irqchip support
>>>>>> without spreading hooks all over the place.
>>>>>>
>>>>>> KVM requires us to register an MSI message first and then deliver it by
>>>>>> raising a pseudo IRQ line returned on registration. While this could be
>>>>>> changed for QEMU-originated MSI messages by adding direct MSI injection,
>>>>>> we will still need this translation for irqfd-originated messages. The
>>>>>> MSIRoutingCache will allow us to track those registrations and update them
>>>>>> lazily before the actual delivery. This avoids having to track MSI
>>>>>> vectors at device level (like qemu-kvm currently does).
>>>>>>
>>>>>>
>>>>>> +typedef enum {
>>>>>> +    MSI_ROUTE_NONE = 0,
>>>>>> +    MSI_ROUTE_STATIC,
>>>>>> +} MSIRouteType;
>>>>>> +
>>>>>> +struct MSIRoutingCache {
>>>>>> +    MSIMessage msg;
>>>>>> +    MSIRouteType type;
>>>>>> +    int kvm_gsi;
>>>>>> +    int kvm_irqfd;
>>>>>> +};
>>>>>> +
>>>>>> diff --git a/hw/pci.h b/hw/pci.h
>>>>>> index 329ab32..5b5d2fd 100644
>>>>>> --- a/hw/pci.h
>>>>>> +++ b/hw/pci.h
>>>>>> @@ -197,6 +197,10 @@ struct PCIDevice {
>>>>>>      MemoryRegion rom;
>>>>>>      uint32_t rom_bar;
>>>>>>  
>>>>>> +    /* MSI routing caches */
>>>>>> +    MSIRoutingCache *msi_cache;
>>>>>> +    MSIRoutingCache *msix_cache;
>>>>>> +
>>>>>>      /* MSI entries */
>>>>>>      int msi_entries_nr;
>>>>>>      struct KVMMsiMessage *msi_irq_entries;
>>>>>
>>>>> IMO this needlessly leaks kvm information into core qemu.  The cache
>>>>> should be completely hidden in kvm code.
>>>>>
>>>>> I think msi_deliver() can hide the use of the cache completely.  For
>>>>> pre-registered events like kvm's irqfd, you can use something like
>>>>>
>>>>>   qemu_irq qemu_msi_irq(MSIMessage msg)
>>>>>
>>>>> for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
>>>>> for kvm, it allocates an irqfd and a permanent entry in the cache and
>>>>> returns a qemu_irq that triggers the irqfd.
>>>>
>>>> See my previous mail: you want to track the life-cycle of an MSI
>>>> source to avoid generating routes for identical sources. A message is
>>>> not a source. Two identical messages can come from different sources.
>>>
>>> Since MSI messages are edge triggered, I don't see how this
>>> would work without losing interrupts. And AFAIK,
>>> existing guests do not use the same message for
>>> different sources.
>>
>> Just like we handle shared edge-triggered line-based IRQs, shared MSIs
>> are in principle feasible as well.
>>
>> Jan
>>
> 
> For this case it seems quite harmless to use multiple
> routes for identical sources.

Unless we track the source (via the MSIRoutingCache abstraction), there
can be no multiple routes. The core cannot differentiate between
identical messages, thus will not create multiple routes.

But that's actually a corner case, and we could probably live with it.
The real question is if we want to search for MSI routes on each message
delivery.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 12:08           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 12:33             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 12:33 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Tue, Oct 18, 2011 at 02:08:59PM +0200, Jan Kiszka wrote:
> On 2011-10-18 13:58, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 17:48, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
> >>>> This optimization was only required to keep KVM route usage low. Now
> >>>> that we solve that problem via lazy updates, we can drop the field. We
> >>>> still need interfaces to clear pending vectors, though (and we have to
> >>>> make use of them more broadly - but that's unrelated to this patch).
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>
> >>> Lazy updates should be an implementation detail.
> >>> IMO resource tracking of vectors makes sense
> >>> as an API. Making devices deal with pending
> >>> vectors as a concept, IMO, does not.
> >>
> >> There is really no use for tracking the vector lifecycle once we have
> >> lazy updates (except for static routes). It's a way too invasive
> >> concept, and it's not needed for anything but KVM.
> > 
> > I think it's needed. The PCI spec states that when the device
> > does not need an interrupt anymore, it should clear the pending
> > bit.
> 
> That should be done explicitly if it is required outside existing
> clearing points. We already have that service, it's called
> msix_clear_vector.

We do? I don't seem to see it upstream...

> That alone does not justify msix_vector_use and all
> the state and logic behind it IMHO.

To me it looks like an abstraction that solves both
this problem and the resource allocation problem.
Resources are actually limited BTW, this is not just
a KVM thing. qemu.git currently lets guests decide
what to do with them, but it might turn out to
be beneficial to warn the management application
that it is shooting itself in the foot.

> > The use/unuse is IMO a decent API for this,
> > because it uses a familiar resource tracking concept.
> > Exposing this knowledge of msix to devices seems
> > like a worse API.
> > 
> >>
> >> If you want an example, check
> >> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
> >> compare it to the changes done to hpet in this series.
> >>
> >> Jan
> >>
> > 
> > This seems to be a general argument that lazy updates are good?
> > I have no real problem with them, besides the fact that
> > we need an API to reserve space in the routing
> > table so that device setup can fail upfront.
> 
> That's not possible, even with used vectors, as devices change their
> vector usage depending on how the guest configures the devices. If you
> (pre-)allocate all possible vectors, you may run out of resources
> earlier than needed actually.

This really depends, but please do look at how with virtio
we report resource shortage to guest and let it fall back to
level interrupts. You seem to remove that capability.

I actually would not mind preallocating everything upfront which is much
easier.  But with your patch we get a silent failure or a drastic
slowdown which is much more painful IMO.
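
To illustrate what I mean by use/unuse as resource tracking, here is a minimal sketch. It is not the qemu-kvm code: the DemoDevice type, the demo_ functions and the tiny route limit are assumptions chosen to show the failure path. The point is that the shortage is reported when the vector is claimed, not when a message is sent.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_ROUTES  2   /* deliberately tiny shared route budget */
#define DEMO_NUM_VECTORS 4

/* Hypothetical device state: one use count per vector. */
typedef struct DemoDevice {
    unsigned use_count[DEMO_NUM_VECTORS];
} DemoDevice;

static unsigned demo_routes_in_use;

/* Claim a vector; the first user also reserves a route. */
static int demo_vector_use(DemoDevice *dev, unsigned vector)
{
    assert(vector < DEMO_NUM_VECTORS);
    if (dev->use_count[vector] == 0) {
        if (demo_routes_in_use == DEMO_MAX_ROUTES) {
            return -ENOSPC;     /* caller can fall back, e.g. to INTx */
        }
        demo_routes_in_use++;
    }
    dev->use_count[vector]++;
    return 0;
}

/* Release a vector; the last user gives the route back. */
static void demo_vector_unuse(DemoDevice *dev, unsigned vector)
{
    assert(vector < DEMO_NUM_VECTORS);
    assert(dev->use_count[vector] > 0);
    if (--dev->use_count[vector] == 0) {
        demo_routes_in_use--;
    }
}
```

A device that gets -ENOSPC here can report the shortage to the guest at setup time instead of failing silently at delivery time.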

> That's also why we do those data == 0
> checks to skip used but unconfigured vectors.
> 
> Jan

These checks work more or less by luck BTW. It's
a hack which I hope lazy allocation will replace.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 12:33             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 12:38               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 12:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 14:33, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 02:08:59PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 13:58, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-17 17:48, Michael S. Tsirkin wrote:
>>>>> On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
>>>>>> This optimization was only required to keep KVM route usage low. Now
>>>>>> that we solve that problem via lazy updates, we can drop the field. We
>>>>>> still need interfaces to clear pending vectors, though (and we have to
>>>>>> make use of them more broadly - but that's unrelated to this patch).
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>
>>>>> Lazy updates should be an implementation detail.
>>>>> IMO resource tracking of vectors makes sense
>>>>> as an API. Making devices deal with pending
>>>>> vectors as a concept, IMO, does not.
>>>>
>>>> There is really no use for tracking the vector lifecycle once we have
>>>> lazy updates (except for static routes). It's a way too invasive
>>>> concept, and it's not needed for anything but KVM.
>>>
>>> I think it's needed. The PCI spec states that when the device
>>> does not need an interrupt anymore, it should clear the pending
>>> bit.
>>
>> That should be done explicitly if it is required outside existing
>> clearing points. We already have that service, it's called
>> msix_clear_vector.
> 
> We do? I don't seem to see it upstream...

True. From the device's POV, MSI-X (and also MSI!) vectors are actually
level-triggered. So we should communicate the level to the MSI core and
not just the edge. Needs more fixing.

> 
>> That alone does not justify msix_vector_use and all
>> the state and logic behind it IMHO.
> 
> To me it looks like an abstraction that solves both
> this problem and the resource allocation problem.
> Resources are actually limited BTW, this is not just
> a KVM thing. qemu.git currently lets guests decide
> what to do with them, but it might turn out to
> be beneficial to warn the management application
> that it is shooting itself in the foot.
> 
>>> The use/unuse is IMO a decent API for this,
>>> because it uses a familiar resource tracking concept.
>>> Exposing this knowledge of msix to devices seems
>>> like a worse API.
>>>
>>>>
>>>> If you want an example, check
>>>> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
>>>> compare it to the changes done to hpet in this series.
>>>>
>>>> Jan
>>>>
>>>
>>> This seems to be a general argument that lazy updates are good?
>>> I have no real problem with them, besides the fact that
>>> we need an API to reserve space in the routing
>>> table so that device setup can fail upfront.
>>
>> That's not possible, even with used vectors, as devices change their
>> vector usage depending on how the guest configures the devices. If you
>> (pre-)allocate all possible vectors, you may run out of resources
>> earlier than needed actually.
> 
> This really depends, but please do look at how with virtio
> we report resource shortage to guest and let it fall back to
> level interrupts. You seem to remove that capability.

To my understanding, virtio will be the exception as no other device
will have a chance to react to a resource shortage while sending(!) an MSI
message.

> 
> I actually would not mind preallocating everything upfront which is much
> easier.  But with your patch we get a silent failure or a drastic
> slowdown which is much more painful IMO.

Again: did we already see that limit? And where does it come from if not
from KVM?

> 
>> That's also why we do those data == 0
>> checks to skip used but unconfigured vectors.
>>
>> Jan
> 
> These checks work more or less by luck BTW. It's
> a hack which I hope lazy allocation will replace.

The check is still valid (for x86) when we have to use static routes
(device assignment, vhost). For lazy updates, it's obsolete, that's true.
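
For reference, the check boils down to something like the following sketch. It is illustrative only: DemoMsixEntry and demo_count_routable() are hypothetical names, and it assumes (as holds for the x86 case mentioned above) that a data word of 0 never describes a configured vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one MSI-X table entry. */
typedef struct DemoMsixEntry {
    uint64_t address;
    uint32_t data;
    bool in_use;        /* device marked the vector as used */
} DemoMsixEntry;

/* Count entries that need a static route: used AND configured by the guest. */
static size_t demo_count_routable(const DemoMsixEntry *table, size_t n)
{
    size_t count = 0;

    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!table[i].in_use) {
            continue;   /* device never claimed this vector */
        }
        if (table[i].data == 0) {
            continue;   /* used but not yet programmed by the guest */
        }
        count++;
    }
    return count;
}
```

Only the entries counted here would consume a static route; the rest are skipped until the guest programs them.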

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-18 12:23           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 12:38             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 12:38 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Tue, Oct 18, 2011 at 02:23:29PM +0200, Jan Kiszka wrote:
> On 2011-10-18 14:05, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 15:43, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
> >>>> diff --git a/hw/msi.c b/hw/msi.c
> >>>> index 3c7ebc3..9055155 100644
> >>>> --- a/hw/msi.c
> >>>> +++ b/hw/msi.c
> >>>> @@ -40,6 +40,14 @@
> >>>>  /* Flag for interrupt controller to declare MSI/MSI-X support */
> >>>>  bool msi_supported;
> >>>>  
> >>>> +static void msi_unsupported(MSIMessage *msg)
> >>>> +{
> >>>> +    /* If we get here, the board failed to register a delivery handler. */
> >>>> +    abort();
> >>>> +}
> >>>> +
> >>>> +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
> >>>> +
> >>>
> >>> How about we set this to NULL, and check it instead of the bool
> >>> flag?
> >>>
> >>
> >> Yeah. I will introduce
> >>
> >> bool msi_supported(void)
> >> {
> >>     return msi_deliver != msi_unsupported;
> >> }
> >>
> >> OK?
> >>
> >> Jan
> >>
> > 
> > Looks a bit weird ...
> > NULL is a pretty standard value for an invalid pointer, isn't it?
> 
> > It saves us the runtime check and is equally expressive and readable IMHO.
> 
> Jan

Do we need to check?
NULL dereference leads to a crash just as surely...

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes
  2011-10-17 19:00       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 12:40         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 12:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 09:00:12PM +0200, Jan Kiszka wrote:
> On 2011-10-17 14:16, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 11:27:56AM +0200, Jan Kiszka wrote:
> >> Also invoke the mask notifier if the global MSI-X mask is modified. For
> >> this purpose, we push the notifier call from the per-vector mask update
> >> to the central msix_handle_mask_update.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > This is a bugfix, isn't it?
> > If yes it should be separated and put on -stable.
> 
> Yep, will pull this to the front.

I'll apply this to qemu.git, no need to mix bugfixes
with features ...

> > 
> >> ---
> >>  hw/msix.c |   16 +++++++++-------
> >>  1 files changed, 9 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/hw/msix.c b/hw/msix.c
> >> index 739b56f..247b255 100644
> >> --- a/hw/msix.c
> >> +++ b/hw/msix.c
> >> @@ -221,7 +221,15 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
> >>  
> >>  static void msix_handle_mask_update(PCIDevice *dev, int vector)
> >>  {
> >> -    if (!msix_is_masked(dev, vector) && msix_is_pending(dev, vector)) {
> >> +    bool masked = msix_is_masked(dev, vector);
> >> +    int ret;
> >> +
> >> +    if (dev->msix_mask_notifier) {
> >> +        ret = dev->msix_mask_notifier(dev, vector,
> >> +                                      msix_is_masked(dev, vector));
> > 
> > Use 'masked' value here as well?
> 
> Yes.
> 
> Jan
> 
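
For illustration, here is a minimal, self-contained sketch of the shape the
hunk is heading towards once 'masked' is reused: compute the effective mask
once, pass it to the notifier, then fire a pending vector if it is now
unmasked. SketchDev, its fields, the counting notifier and the assert on the
return value are all assumptions made up for this sketch, not qemu-kvm code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_VECTORS 8   /* arbitrary table size for the sketch */

typedef struct SketchDev SketchDev;

/* Hypothetical device state; the real PCIDevice looks different. */
struct SketchDev {
    bool global_mask;                       /* function-wide MSI-X mask */
    bool vector_mask[SKETCH_MAX_VECTORS];   /* per-vector mask bits */
    bool pending[SKETCH_MAX_VECTORS];       /* pending bits */
    int (*mask_notifier)(SketchDev *dev, int vector, bool masked);
    int fired[SKETCH_MAX_VECTORS];          /* deliveries, for inspection */
};

static bool sketch_is_masked(const SketchDev *dev, int vector)
{
    return dev->global_mask || dev->vector_mask[vector];
}

/* Central update point: evaluates the mask once and reuses the value. */
static void sketch_handle_mask_update(SketchDev *dev, int vector)
{
    bool masked = sketch_is_masked(dev, vector);

    if (dev->mask_notifier) {
        int ret = dev->mask_notifier(dev, vector, masked);
        assert(ret >= 0);   /* assumption: notifier failure is fatal */
    }
    if (!masked && dev->pending[vector]) {
        dev->pending[vector] = false;
        dev->fired[vector]++;
    }
}

/* A global mask change funnels every vector through the same path. */
static void sketch_set_global_mask(SketchDev *dev, bool masked)
{
    dev->global_mask = masked;
    for (int v = 0; v < SKETCH_MAX_VECTORS; v++) {
        sketch_handle_mask_update(dev, v);
    }
}

/* Hypothetical notifier that only counts its invocations. */
static int sketch_notifier_calls;

static int sketch_count_notifier(SketchDev *dev, int vector, bool masked)
{
    (void)dev;
    (void)vector;
    (void)masked;
    sketch_notifier_calls++;
    return 0;
}
```

Because the notifier call sits in the central function, a change of the
global mask reaches it for every vector, which is the behaviour the commit
message describes.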



^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-18 12:38             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 12:41               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 12:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 14:38, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 02:23:29PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 14:05, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-17 15:43, Michael S. Tsirkin wrote:
>>>>> On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
>>>>>> diff --git a/hw/msi.c b/hw/msi.c
>>>>>> index 3c7ebc3..9055155 100644
>>>>>> --- a/hw/msi.c
>>>>>> +++ b/hw/msi.c
>>>>>> @@ -40,6 +40,14 @@
>>>>>>  /* Flag for interrupt controller to declare MSI/MSI-X support */
>>>>>>  bool msi_supported;
>>>>>>  
>>>>>> +static void msi_unsupported(MSIMessage *msg)
>>>>>> +{
>>>>>> +    /* If we get here, the board failed to register a delivery handler. */
>>>>>> +    abort();
>>>>>> +}
>>>>>> +
>>>>>> +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
>>>>>> +
>>>>>
>>>>> How about we set this to NULL, and check it instead of the bool
>>>>> flag?
>>>>>
>>>>
>>>> Yeah. I will introduce
>>>>
>>>> bool msi_supported(void)
>>>> {
>>>>     return msi_deliver != msi_unsupported;
>>>> }
>>>>
>>>> OK?
>>>>
>>>> Jan
>>>>
>>>
>>> Looks a bit weird ...
>>> NULL is a pretty standard value for an invalid pointer, isn't it?
>>
>> Saves us the runtime check and is equally expressive and readable IMHO.
>>
>> Jan
> 
> Do we need to check?
> NULL dereference leads to a crash just as surely...

There is no NULL state of msi_deliver. A) it would execute
msi_unsupported if all goes wrong (which will abort) and B)
msi_supported() is supposed to protect us in the absence of bugs from
ever executing msi_deliver() if it points to msi_unsupported.
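
To make the argument concrete, a minimal standalone sketch of the sentinel
pattern follows. msi_unsupported, msi_deliver and msi_supported() are taken
from the hunk quoted above; the MSIMessage layout, board_msi_deliver() and
msi_register_delivery() are hypothetical stand-ins added only so the sketch
compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the MSIMessage type introduced in the series. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Default handler: reaching it means no board registered a delivery hook. */
static void msi_unsupported(MSIMessage *msg)
{
    (void)msg;
    abort();
}

/* Never NULL: starts at the sentinel and is overwritten by the board. */
static void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;

/* Support query by comparing against the sentinel, no separate bool flag. */
static bool msi_supported(void)
{
    return msi_deliver != msi_unsupported;
}

/* Hypothetical board handler that just records the last message. */
static MSIMessage last_delivered;

static void board_msi_deliver(MSIMessage *msg)
{
    last_delivered = *msg;
}

/* Hypothetical registration helper, not part of the patch. */
static void msi_register_delivery(void (*handler)(MSIMessage *msg))
{
    assert(handler != NULL);
    msi_deliver = handler;
}
```

With this layout a stray call through msi_deliver lands in abort() on any
platform, and callers that consult msi_supported() first never reach it.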

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-18 12:38             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 12:44               ` malc
  -1 siblings, 0 replies; 288+ messages in thread
From: malc @ 2011-10-18 12:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Alex Williamson,
	Avi Kivity

On Tue, 18 Oct 2011, Michael S. Tsirkin wrote:

> On Tue, Oct 18, 2011 at 02:23:29PM +0200, Jan Kiszka wrote:
> > On 2011-10-18 14:05, Michael S. Tsirkin wrote:
> > > On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
> > >> On 2011-10-17 15:43, Michael S. Tsirkin wrote:
> > >>> On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
> > >>>> diff --git a/hw/msi.c b/hw/msi.c
> > >>>> index 3c7ebc3..9055155 100644
> > >>>> --- a/hw/msi.c
> > >>>> +++ b/hw/msi.c
> > >>>> @@ -40,6 +40,14 @@
> > >>>>  /* Flag for interrupt controller to declare MSI/MSI-X support */
> > >>>>  bool msi_supported;
> > >>>>  
> > >>>> +static void msi_unsupported(MSIMessage *msg)
> > >>>> +{
> > >>>> +    /* If we get here, the board failed to register a delivery handler. */
> > >>>> +    abort();
> > >>>> +}
> > >>>> +
> > >>>> +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
> > >>>> +
> > >>>
> > >>> How about we set this to NULL, and check it instead of the bool
> > >>> flag?
> > >>>
> > >>
> > >> Yeah. I will introduce
> > >>
> > >> bool msi_supported(void)
> > >> {
> > >>     return msi_deliver != msi_unsupported;
> > >> }
> > >>
> > >> OK?
> > >>
> > >> Jan
> > >>
> > > 
> > > Looks a bit weird ...
> > > NULL is a pretty standard value for an invalid pointer, isn't it?
> > 
> > Saves us the runtime check and is equally expressive and readable IMHO.
> > 
> > Jan
> 
> Do we need to check?
> NULL dereference leads to a crash just as surely...
> 

Not universally (not on AIX for instance (read)).

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes
  2011-10-18 12:40         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 12:45           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 12:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 14:40, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2011 at 09:00:12PM +0200, Jan Kiszka wrote:
>> On 2011-10-17 14:16, Michael S. Tsirkin wrote:
>>> On Mon, Oct 17, 2011 at 11:27:56AM +0200, Jan Kiszka wrote:
>>>> Also invoke the mask notifier if the global MSI-X mask is modified. For
>>>> this purpose, we push the notifier call from the per-vector mask update
>>>> to the central msix_handle_mask_update.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> This is a bugfix, isn't it?
>>> If yes it should be separated and put on -stable.
>>
>> Yep, will pull this to the front.
> 
> I'll apply this to qemu.git, no need to mix bugfixes
> with features ...

This doesn't apply to qemu.git (there are no notifiers upstream).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 12:38               ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 12:48                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 12:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Tue, Oct 18, 2011 at 02:38:36PM +0200, Jan Kiszka wrote:
> On 2011-10-18 14:33, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 02:08:59PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 13:58, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
> >>>> On 2011-10-17 17:48, Michael S. Tsirkin wrote:
> >>>>> On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
> >>>>>> This optimization was only required to keep KVM route usage low. Now
> >>>>>> that we solve that problem via lazy updates, we can drop the field. We
> >>>>>> still need interfaces to clear pending vectors, though (and we have to
> >>>>>> make use of them more broadly - but that's unrelated to this patch).
> >>>>>>
> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>
> >>>>> Lazy updates should be an implementation detail.
> >>>>> IMO resource tracking of vectors makes sense
> >>>>> as an API. Making devices deal with pending
> >>>>> vectors as a concept, IMO, does not.
> >>>>
> >>>> There is really no use for tracking the vector lifecycle once we have
> >>>> lazy updates (except for static routes). It's a way too invasive
> >>>> concept, and it's not needed for anything but KVM.
> >>>
> >>> I think it's needed. The PCI spec states that when the device
> >>> does not need an interrupt anymore, it should clear the pending
> >>> bit.
> >>
> >> That should be done explicitly if it is required outside existing
> >> clearing points. We already have that service, it's called
> >> msix_clear_vector.
> > 
> > We do? I don't seem to see it upstream...
> 
> True. From the device's POV, MSI-X (and also MSI!) vectors are actually
> level-triggered.

This definitely takes adjusting to.

> So we should communicate the level to the MSI core and
> not just the edge. Needs more fixing
>
> > 
> >> That alone does not justify msix_vector_use and all
> >> the state and logic behind it IMHO.
> > 
> > To me it looks like an abstraction that solves both
> > this problem and the resource allocation problem.
> > Resources are actually limited BTW, this is not just
> > a KVM thing. qemu.git currently lets guests decide
> > what to do with them, but it might turn out to
> > be beneficial to warn the management application
> > that it is shooting itself in the foot.
> > 
> >>> The use/unuse is IMO a decent API for this,
> >>> because it uses a familiar resource tracking concept.
> >>> Exposing this knowledge of msix to devices seems
> >>> like a worse API.
> >>>
> >>>>
> >>>> If you want an example, check
> >>>> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
> >>>> compare it to the changes done to hpet in this series.
> >>>>
> >>>> Jan
> >>>>
> >>>
> >>> This seems to be a general argument that lazy updates are good?
> >>> I have no real problem with them, besides the fact that
> >>> we need an API to reserve space in the routing
> >>> table so that device setup can fail upfront.
> >>
> >> That's not possible, even with used vectors, as devices change their
> >> vector usage depending on how the guest configures the devices. If you
> >> (pre-)allocate all possible vectors, you may run out of resources
> >> earlier than needed actually.
> > 
> > This really depends, but please do look at how with virtio
> > we report resource shortage to guest and let it fall back to
> > level interrupts. You seem to remove that capability.
> 
> To my understanding, virtio will be the exception as no other device
> will have a chance to react to resource shortage while sending(!) an MSI
> message.

Hmm, are you familiar with that spec? This is not what virtio does,
resource shortage is detected during setup.
This is exactly the problem with lazy registration as you don't
allocate until it's too late.
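
A rough sketch of the setup-time model I mean, under made-up names: the
device reserves its routes when it is configured and gets an error it can
act on, instead of discovering the shortage when a message is sent.
RouteTable, route_reserve() and route_release() are hypothetical and do not
correspond to any existing qemu or KVM interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical fixed-size routing table shared by all devices. */
typedef struct RouteTable {
    size_t capacity;   /* total number of routes available */
    size_t reserved;   /* routes already promised to devices */
} RouteTable;

/* Reserve 'count' routes at setup time; fails upfront if they do not fit. */
static int route_reserve(RouteTable *t, size_t count)
{
    if (count > t->capacity - t->reserved) {
        return -ENOSPC;   /* caller can fall back, e.g. to INTx */
    }
    t->reserved += count;
    return 0;
}

/* Give routes back when the device stops using its vectors. */
static void route_release(RouteTable *t, size_t count)
{
    assert(count <= t->reserved);
    t->reserved -= count;
}
```

A failed reservation leaves the table unchanged, so the device can report
the shortage to the guest and retry with fewer vectors.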

> > 
> > I actually would not mind preallocating everything upfront which is much
> > easier.  But with your patch we get a silent failure or a drastic
> > slowdown which is much more painful IMO.
> 
> Again: did we already see that limit? And where does it come from if not
> from KVM?

It's a hardware limitation of Intel APICs: the interrupt vector is encoded
in an 8-bit field in the MSI address, so you can have at most 256 of these.

> > 
> >> That's also why we do those data == 0
> >> checks to skip used but unconfigured vectors.
> >>
> >> Jan
> > 
> > These checks work more or less by luck BTW. It's
> > a hack which I hope lazy allocation will replace.
> 
> The check is still valid (for x86) when we have to use static routes
> (device assignment, vhost).

It's not valid at all - we are just lucky that Linux and
Windows guests seem to zero out the vector when it's not in use.
They do not have to do that.
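
To spell out what the check amounts to, here is a small sketch of the
heuristic. The entry layout and both helper names are assumptions for
illustration only; the point is that "data == 0" is a guess about guest
behaviour, not a rule the guest has to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one MSI-X table entry. */
typedef struct SketchMsixEntry {
    uint64_t address;
    uint32_t data;
    bool masked;
} SketchMsixEntry;

/*
 * Heuristic from the thread: treat an entry whose data word is zero as
 * unconfigured. This only holds while guests happen to zero unused
 * entries; nothing obliges them to.
 */
static bool entry_looks_configured(const SketchMsixEntry *e)
{
    return e->data != 0;
}

/* Count how many static routes the heuristic would set up. */
static size_t count_routes_needed(const SketchMsixEntry *table, size_t n)
{
    size_t needed = 0;

    for (size_t i = 0; i < n; i++) {
        if (entry_looks_configured(&table[i])) {
            needed++;
        }
    }
    return needed;
}
```

An entry with a programmed address but a zero data word is skipped by this
test, which is exactly the case where the guess can go wrong.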

> For lazy updates, it's obsolete, that's true.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 11/45] msi: Factor out delivery hook
  2011-10-18 12:44               ` [Qemu-devel] " malc
@ 2011-10-18 12:49                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 12:49 UTC (permalink / raw)
  To: malc
  Cc: kvm, Jan Kiszka, Marcelo Tosatti, qemu-devel, Alex Williamson,
	Avi Kivity

On Tue, Oct 18, 2011 at 04:44:47PM +0400, malc wrote:
> On Tue, 18 Oct 2011, Michael S. Tsirkin wrote:
> 
> > On Tue, Oct 18, 2011 at 02:23:29PM +0200, Jan Kiszka wrote:
> > > On 2011-10-18 14:05, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
> > > >> On 2011-10-17 15:43, Michael S. Tsirkin wrote:
> > > >>> On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
> > > >>>> diff --git a/hw/msi.c b/hw/msi.c
> > > >>>> index 3c7ebc3..9055155 100644
> > > >>>> --- a/hw/msi.c
> > > >>>> +++ b/hw/msi.c
> > > >>>> @@ -40,6 +40,14 @@
> > > >>>>  /* Flag for interrupt controller to declare MSI/MSI-X support */
> > > >>>>  bool msi_supported;
> > > >>>>  
> > > >>>> +static void msi_unsupported(MSIMessage *msg)
> > > >>>> +{
> > > >>>> +    /* If we get here, the board failed to register a delivery handler. */
> > > >>>> +    abort();
> > > >>>> +}
> > > >>>> +
> > > >>>> +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
> > > >>>> +
> > > >>>
> > > >>> How about we set this to NULL, and check it instead of the bool
> > > >>> flag?
> > > >>>
> > > >>
> > > >> Yeah. I will introduce
> > > >>
> > > >> bool msi_supported(void)
> > > >> {
> > > >>     return msi_deliver != msi_unsupported;
> > > >> }
> > > >>
> > > >> OK?
> > > >>
> > > >> Jan
> > > >>
> > > > 
> > > > Looks a bit weird ...
> > > > NULL is a pretty standard value for an invalid pointer, isn't it?
> > > 
> > > Saves us the runtime check and is equally expressive and readable IMHO.
> > > 
> > > Jan
> > 
> > Do we need to check?
> > NULL dereference leads to a crash just as surely...
> > 
> 
> Not universally (not on AIX for instance (read)).

This is a NULL function call though :)
Anyway, this was just nitpicking. Do it any way you like.

> -- 
> mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes
  2011-10-18 12:45           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 12:57             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 12:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Tue, Oct 18, 2011 at 02:45:58PM +0200, Jan Kiszka wrote:
> On 2011-10-18 14:40, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 09:00:12PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 14:16, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 11:27:56AM +0200, Jan Kiszka wrote:
> >>>> Also invoke the mask notifier if the global MSI-X mask is modified. For
> >>>> this purpose, we push the notifier call from the per-vector mask update
> >>>> to the central msix_handle_mask_update.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>
> >>> This is a bugfix, isn't it?
> >>> If yes it should be separated and put on -stable.
> >>
> >> Yep, will pull this to the front.
> > 
> > I'll apply this to qemu.git, no need to mix bugfixes
> > with features ...
> 
> This doesn't apply to qemu.git (there are no notifiers upstream).
> 
> Jan

Right, thanks for the reminder. Anyway, I'll take care
of this, don't worry :)

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 12:48                 ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 13:00                   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 13:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 14:48, Michael S. Tsirkin wrote:
>> To my understanding, virtio will be the exception as no other device
>> will have a chance to react on resource shortage while sending(!) an MSI
>> message.
> 
> Hmm, are you familiar with that spec?

Not by heart.

> This is not what virtio does,
> resource shortage is detected during setup.
> This is exactly the problem with lazy registration as you don't
> allocate until it's too late.

When is that setup phase? Does it actually come after every change to an
MSI vector? I doubt it. Thus virtio can only estimate the guest usage as
well (a guest may or may not actually write non-null data into a
vector and unmask it).

> 
>>>
>>> I actually would not mind preallocating everything upfront which is much
>>> easier.  But with your patch we get a silent failure or a drastic
>>> slowdown which is much more painful IMO.
>>
>> Again: did we already saw that limit? And where does it come from if not
>> from KVM?
> 
> It's a hardware limitation of intel APICs. interrupt vector is encoded
> in an 8 bit field in msi address. So you can have at most 256 of these.

There should be no such limitation with the pseudo GSIs we use for MSI
injection. They end up as MSI messages again, so the actual limit is
(256 - reserved vectors) * number-of-cpus (on x86).
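
To make the arithmetic concrete, here is a minimal sketch. The helper name
and the reserved count passed in are illustrative assumptions, not values
taken from KVM or QEMU; only the 8-bit vector space (256 per CPU) comes
from the discussion above.

```c
#include <assert.h>

/* 8-bit vector field: at most 256 distinct vectors per CPU. */
#define TOY_VECTORS_PER_CPU 256u

/* Hypothetical helper: upper bound on distinct (cpu, vector) targets when
 * `reserved` vectors per CPU are unavailable to devices. */
static unsigned toy_vector_capacity(unsigned reserved, unsigned ncpus)
{
    assert(reserved <= TOY_VECTORS_PER_CPU);
    return (TOY_VECTORS_PER_CPU - reserved) * ncpus;
}
```

Assuming 32 reserved vectors, a 4-CPU guest would get (256 - 32) * 4 = 896
targets, while a single-CPU guest is limited to 224.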

> 
>>>
>>>> That's also why we do those data == 0
>>>> checks to skip used but unconfigured vectors.
>>>>
>>>> Jan
>>>
>>> These checks work more or less by luck BTW. It's
>>> a hack which I hope lazy allocation will replace.
>>
>> The check is still valid (for x86) when we have to use static routes
>> (device assignment, vhost).
> 
> It's not valid at all - we are just lucky that linux and
> windows guests seem to zero out the vector when it's not in use.
> They do not have to do that.

It is valid as it is just an optimization. If an unused vector has a
non-null data field, we just redundantly register a route where we do
not actually have to. But we do need to be prepared for potentially
arriving messages on that virtual GSI, either via irqfd or kvm device
assignment.
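
A minimal sketch of the data == 0 check as an optimization, assuming a
simplified table layout. The struct and function names are hypothetical
and do not mirror the actual qemu-kvm code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one MSI-X table entry. */
struct toy_msix_entry {
    uint64_t addr;
    uint32_t data;
};

/* Count the static routes that would be registered when entries with a
 * zero data field are treated as unconfigured and skipped. An unused entry
 * with non-zero data is still counted: redundant, but not incorrect. */
static size_t toy_count_routes(const struct toy_msix_entry *table, size_t n)
{
    size_t routes = 0;

    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].data == 0) {
            continue; /* heuristic: guest has not programmed this vector */
        }
        routes++;
    }
    return routes;
}
```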

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 13:00                   ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 13:37                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 13:37 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Tue, Oct 18, 2011 at 03:00:29PM +0200, Jan Kiszka wrote:
> On 2011-10-18 14:48, Michael S. Tsirkin wrote:
> >> To my understanding, virtio will be the exception as no other device
> >> will have a chance to react on resource shortage while sending(!) an MSI
> >> message.
> > 
> > Hmm, are you familiar with that spec?
> 
> Not by heart.
> 
> > This is not what virtio does,
> > resource shortage is detected during setup.
> > This is exactly the problem with lazy registration as you don't
> > allocate until it's too late.
> 
> When is that setup phase? Does it actually come after every change to an
> MSI vector? I doubt so.

No. During setup, the driver requests vectors from the OS, and then tells
the device which vector each VQ should use.  It then checks that the
assignment was successful. If not, it retries with fewer vectors.

Other devices can do this during initialization, and signal
resource availability to guest using msix vector number field.
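
The setup-time negotiation described above can be sketched roughly as
follows. This is a toy model under stated assumptions: the device type,
the assignment check and the sharing policy (vq % want) are all
hypothetical and much simpler than what a real virtio driver does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: it can back at most `avail` vectors. */
struct toy_dev {
    unsigned avail;
};

/* Stand-in for "tell the device which vector a VQ uses and check the
 * result": succeeds only if the device has that resource. */
static bool toy_assign_vector(const struct toy_dev *dev, unsigned vector)
{
    return vector < dev->avail;
}

/* Try one vector per VQ first; on failure retry with fewer vectors, shared
 * among the VQs. Returns the number of vectors finally in use, or 0 if not
 * even one could be assigned. Failure is detected here, during setup, and
 * not later when a message has to be sent. */
static unsigned toy_setup_vectors(const struct toy_dev *dev, unsigned nvqs)
{
    assert(nvqs > 0);
    for (unsigned want = nvqs; want > 0; want--) {
        bool ok = true;

        for (unsigned vq = 0; vq < nvqs && ok; vq++) {
            ok = toy_assign_vector(dev, vq % want);
        }
        if (ok) {
            return want;
        }
    }
    return 0;
}
```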

> Thus virtio can only estimate the guest usage as
> well

At some level, this is fundamental: some guest operations
have no failure mode. So we must preallocate
some resources to make sure they won't fail.

> (a guest may or may not actually write a non-null data into a
> vector and unmask it).

Please, forget the non-NULL thing. virtio driver knows exactly
how many vectors we use and communicates this info to the device.
This is not uncommon at all.

> > 
> >>>
> >>> I actually would not mind preallocating everything upfront which is much
> >>> easier.  But with your patch we get a silent failure or a drastic
> >>> slowdown which is much more painful IMO.
> >>
> >> Again: did we already saw that limit? And where does it come from if not
> >> from KVM?
> > 
> > It's a hardware limitation of intel APICs. interrupt vector is encoded
> > in an 8 bit field in msi address. So you can have at most 256 of these.
> 
> There should be no such limitation with pseudo GSIs we use for MSI
> injection. They end up as MSI messages again, so actually 256 (-reserved
> vectors) * number-of-cpus (on x86).

This limits which CPUs can get the interrupt though.
Linux seems to have a global pool as it wants to be able to freely
balance vectors between CPUs. Or, consider a guest with a single CPU :)

Anyway, why argue - there is a limitation, and it's not coming from KVM,
right?

> > 
> >>>
> >>>> That's also why we do those data == 0
> >>>> checks to skip used but unconfigured vectors.
> >>>>
> >>>> Jan
> >>>
> >>> These checks work more or less by luck BTW. It's
> >>> a hack which I hope lazy allocation will replace.
> >>
> >> The check is still valid (for x86) when we have to use static routes
> >> (device assignment, vhost).
> > 
> > It's not valid at all - we are just lucky that linux and
> > windows guests seem to zero out the vector when it's not in use.
> > They do not have to do that.
> 
> It is valid as it is just an optimization. If an unused vector has a
> non-null data field, we just redundantly register a route where we do
> not actually have to.

Well, the only reason we even have this code is because
it was claimed that some devices declare support for a huge number
of vectors which then go unused. So if the guest does not
do this we'll run out of vectors ...

> But we do need to be prepared

And ATM, we aren't, and probably can't be without kernel
changes, right?

> for potentially
> arriving messages on that virtual GSI, either via irqfd or kvm device
> assignment.
> 
> Jan

Why irqfd?  Device assignment is ATM the only place where we use these
ugly hacks.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 13:37                     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 13:46                       ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 13:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On 2011-10-18 15:37, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 03:00:29PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 14:48, Michael S. Tsirkin wrote:
>>>> To my understanding, virtio will be the exception as no other device
>>>> will have a chance to react on resource shortage while sending(!) an MSI
>>>> message.
>>>
>>> Hmm, are you familiar with that spec?
>>
>> Not by heart.
>>
>>> This is not what virtio does,
>>> resource shortage is detected during setup.
>>> This is exactly the problem with lazy registration as you don't
>>> allocate until it's too late.
>>
>> When is that setup phase? Does it actually come after every change to an
>> MSI vector? I doubt so.
> 
> No. During setup, driver requests vectors from the OS, and then tells
> the device which vector should each VQ use.  It then checks that the
> assignment was successful. If not, it retries with less vectors.
> 
> Other devices can do this during initialization, and signal
> resource availability to guest using msix vector number field.
> 
>> Thus virtio can only estimate the guest usage as
>> well
> 
> At some level, this is fundamental: some guest operations
> have no failure mode. So we must preallocate
> some resources to make sure they won't fail.

We can still track the expected maximum number of active vectors at core
level, collect them from the KVM layer, and warn if we expect conflicts.
Anxious MSI users could then refrain from using this feature, others
might be fine with risking a slow-down on conflicts.
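
A minimal sketch of such core-level bookkeeping, assuming each device
reports its worst-case vector count once. The struct, the function and the
limit value are hypothetical; nothing here is an existing QEMU or KVM
interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical bookkeeping for the expected maximum of active vectors. */
struct toy_gsi_budget {
    unsigned limit;     /* GSIs assumed to be available for MSI routes */
    unsigned expected;  /* sum of the per-device worst-case vector counts */
};

/* Account for one more device. Returns false and prints a warning once the
 * expected total exceeds the limit, i.e. when conflicts become possible. */
static bool toy_budget_add_device(struct toy_gsi_budget *b,
                                  unsigned max_vectors)
{
    assert(b != NULL);
    b->expected += max_vectors;
    if (b->expected > b->limit) {
        fprintf(stderr,
                "warning: up to %u MSI vectors expected, only %u GSIs\n",
                b->expected, b->limit);
        return false;
    }
    return true;
}
```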

> 
>> (a guest may or may not actually write a non-null data into a
>> vector and unmask it).
> 
> Please, forget the non-NULL thing. virtio driver knows exactly
> how many vectors we use and communicates this info to the device.
> This is not uncommon at all.
> 
>>>
>>>>>
>>>>> I actually would not mind preallocating everything upfront which is much
>>>>> easier.  But with your patch we get a silent failure or a drastic
>>>>> slowdown which is much more painful IMO.
>>>>
>>>> Again: did we already saw that limit? And where does it come from if not
>>>> from KVM?
>>>
>>> It's a hardware limitation of intel APICs. interrupt vector is encoded
>>> in an 8 bit field in msi address. So you can have at most 256 of these.
>>
>> There should be no such limitation with pseudo GSIs we use for MSI
>> injection. They end up as MSI messages again, so actually 256 (-reserved
>> vectors) * number-of-cpus (on x86).
> 
> This limits which CPUs can get the interrupt though.
> Linux seems to have a global pool as it wants to be able to freely
> balance vectors between CPUs. Or, consider a guest with a single CPU :)
> 
> Anyway, why argue - there is a limitation, and it's not coming from KVM,
> right?

No, the limit we hit with MSI message routing is first of all the number
of KVM GSIs, and there only pseudo GSIs that do not go to any interrupt
controller with limited pins. That could easily be lifted in the kernel if
we run into shortages in practice.

> 
>>>
>>>>>
>>>>>> That's also why we do those data == 0
>>>>>> checks to skip used but unconfigured vectors.
>>>>>>
>>>>>> Jan
>>>>>
>>>>> These checks work more or less by luck BTW. It's
>>>>> a hack which I hope lazy allocation will replace.
>>>>
>>>> The check is still valid (for x86) when we have to use static routes
>>>> (device assignment, vhost).
>>>
>>> It's not valid at all - we are just lucky that linux and
>>> windows guests seem to zero out the vector when it's not in use.
>>> They do not have to do that.
>>
>> It is valid as it is just an optimization. If an unused vector has a
>> non-null data field, we just redundantly register a route where we do
>> not actually have to.
> 
> Well, the only reason we even have this code is because
> it was claimed that some devices declare support for a huge number
> of vectors which then go unused. So if the guest does not
> do this we'll run out of vectors ...
> 
>> But we do need to be prepared
> 
> And ATM, we aren't, and probably can't be without kernel
> changes, right?
> 
>> for potentially
>> arriving messages on that virtual GSI, either via irqfd or kvm device
>> assignment.
>>
>> Jan
> 
> Why irqfd?  Device assignment is ATM the only place where we use these
> ugly hacks.

vfio will use irqfds. And virtio is partly out of the picture only
because we know much more about virtio internals (specifically:
"will not advertise more vectors than guests will want to use").

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
  2011-10-17 19:08           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 13:46             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 13:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Mon, Oct 17, 2011 at 09:08:58PM +0200, Jan Kiszka wrote:
> On 2011-10-17 14:39, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 01:45:04PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 13:40, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 11:27:57AM +0200, Jan Kiszka wrote:
> >>>> MSI config notifiers are supposed to be triggered on every relevant
> >>>> configuration change of MSI vectors or if MSI is enabled/disabled.
> >>>>
> >>>> Two notifiers are established, one for vector changes and one for general
> >>>> enabling. The former notifier additionally passes the currently active
> >>>> MSI message.
> >>>> This will allow to update potential in-kernel IRQ routes on
> >>>> changes. The latter notifier is optional and will only be used by a
> >>>> subset of clients.
> >>>>
> >>>> These notifiers are currently only available for MSI-X but will be
> >>>> extended to legacy MSI as well.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>
> >>> Passing message, always, does not seem to make sense: message is only
> >>> valid if it is unmasked.
> >>
> >> If we go from unmasked to masked, the consumer could just ignore the
> >> message.
> > 
> > Why don't we let the consumer get the message if it needs it?
> 
> Because most consumer will need and because I want to keep the API simple.

The API seems to get more complex, as you are passing
in fields which are only valid sometimes.

> > 
> >>> Further, IIRC the spec requires any changes to be done while
> >>> message is masked. So mask notifier makes more sense to me:
> >>> it does the same thing using one notifier that you do
> >>> using two notifiers.
> >>
> >> That's in fact a possible optimization (only invoke the callback on mask
> >> transitions).
> > 
> > Further, it is one that is already implemented.
> > So I would prefer not to add work by removing it :)
> 
> Generalization to cover MSI requires some changes. Unneeded behavioral
> changes back and forth should and will of course be avoided. I will
> rework this.
> 
> > 
> >> Not sure if that applies to MSI as well, probably not.
> > 
> > Probably not. However, if per vector masking is
> > supported, and while vector is masked, the address/
> > data values might not make any sense.
> > 
> > So I think even msi users needs to know about masked state.
> 
> Yes, and they get this information via the config notifier.
> 
> > 
> >> To
> >> have common types, I would prefer to stay with vector config notifiers
> >> as name then.
> >>
> >> Jan
> > 
> > So we pass in nonsense values and ask all users to know about MSIX rules.
> > Ugh.
> > 
> > I do realize msi might change the vector without masking.
> > We can either artificially call mask before value change
> > and unmask after, or use 3 notifiers: mask,unmask,config.
> > Add a comment that config is invoked when configuration
> > for an unmasked vector is changed, and that
> > it can only happen for msi, not msix.
> 
> I see no need in complicating the API like this. MSI-X still needs the
> config information on unmask, so let's just consistently pass it via the
> unified config notifier instead of forcing the consumers to create yet
> two more handlers. I really do not see the benefit for the consumer.
> 
> Jan
> 

The benefit is a clearer API, where all parameters you get are valid,
so you do not need to go read the spec to see what is OK to use.
Generally, encoding events in flags is more error
prone than using different notifiers for different events.

E.g. _mask and _unmask make
it obvious that they are called on mask and on unmask
respectively.
OTOH _config_change(bool mask) is unclear: is 'mask' the new
state or the old state?

It might be just my taste, but I usually prefer multiple
functions doing one thing each rather than a single
function doing multiple things. It shouldn't be too hard ...

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
@ 2011-10-18 13:46             ` Michael S. Tsirkin
  0 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 13:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Mon, Oct 17, 2011 at 09:08:58PM +0200, Jan Kiszka wrote:
> On 2011-10-17 14:39, Michael S. Tsirkin wrote:
> > On Mon, Oct 17, 2011 at 01:45:04PM +0200, Jan Kiszka wrote:
> >> On 2011-10-17 13:40, Michael S. Tsirkin wrote:
> >>> On Mon, Oct 17, 2011 at 11:27:57AM +0200, Jan Kiszka wrote:
> >>>> MSI config notifiers are supposed to be triggered on every relevant
> >>>> configuration change of MSI vectors or if MSI is enabled/disabled.
> >>>>
> >>>> Two notifiers are established, one for vector changes and one for general
> >>>> enabling. The former notifier additionally passes the currently active
> >>>> MSI message.
> >>>> This will allow to update potential in-kernel IRQ routes on
> >>>> changes. The latter notifier is optional and will only be used by a
> >>>> subset of clients.
> >>>>
> >>>> These notifiers are currently only available for MSI-X but will be
> >>>> extended to legacy MSI as well.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>
> >>> Passing message, always, does not seem to make sense: message is only
> >>> valid if it is unmasked.
> >>
> >> If we go from unmasked to masked, the consumer could just ignore the
> >> message.
> > 
> > Why don't we let the consumer get the message if it needs it?
> 
> Because most consumer will need and because I want to keep the API simple.

The API seems to get more complex, as you are passing
in fields which are only valid sometimes.

> > 
> >>> Further, IIRC the spec requires any changes to be done while
> >>> message is masked. So mask notifier makes more sense to me:
> >>> it does the same thing using one notifier that you do
> >>> using two notifiers.
> >>
> >> That's in fact a possible optimization (only invoke the callback on mask
> >> transitions).
> > 
> > Further, it is one that is already implemented.
> > So I would prefer not to add work by removing it :)
> 
> Generalization to cover MSI requires some changes. Unneeded behavioral
> changes back and forth should and will of course be avoided. I will
> rework this.
> 
> > 
> >> Not sure if that applies to MSI as well, probably not.
> > 
> > Probably not. However, if per vector masking is
> > supported, and while vector is masked, the address/
> > data values might not make any sense.
> > 
> > So I think even msi users needs to know about masked state.
> 
> Yes, and they get this information via the config notifier.
> 
> > 
> >> To
> >> have common types, I would prefer to stay with vector config notifiers
> >> as name then.
> >>
> >> Jan
> > 
> > So we pass in nonsense values and ask all users to know about MSIX rules.
> > Ugh.
> > 
> > I do realize msi might change the vector without masking.
> > We can either artificially call mask before value change
> > and unmask after, or use 3 notifiers: mask,unmask,config.
> > Add a comment that config is invoked when configuration
> > for an unmasked vector is changed, and that
> > it can only happen for msi, not msix.
> 
> I see no need in complicating the API like this. MSI-X still needs the
> config information on unmask, so let's just consistently pass it via the
> unified config notifier instead of forcing the consumers to create yet
> two more handlers. I really do not see the benefit for the consumer.
> 
> Jan
> 

The benefit is a clearer API, where all parameters you get are valid,
so you do not need to go read the spec to see what is OK to use.
Generally, encoding events in flags is more error
prone than using different notifiers for different events.

E.g. _unmask and _mask make
it obvious that they are called on mask and on unmask
respectively.
OTOH _config_change(bool mask) is unclear: is 'mask' the new
state or the old state?

It might be just my taste, but I usually prefer multiple
functions doing one thing each rather than a single
function doing multiple things. It shouldn't be too hard ...
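
To make the comparison concrete, here is a rough sketch of the two API
shapes under discussion. All names and the MSIMessage layout are
assumptions for illustration only, not code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed message layout; the real MSIMessage in the series may differ. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Variant 1: one notifier for everything. The consumer has to know that
 * 'masked' is the new state and that *msg is only meaningful when
 * masked == false. */
typedef int (*MSIConfigNotifier)(void *opaque, unsigned vector,
                                 const MSIMessage *msg, bool masked);

/* Variant 2: one callback per event; every argument is always valid. */
typedef struct MSIVectorNotifiers {
    int  (*unmask)(void *opaque, unsigned vector, const MSIMessage *msg);
    void (*mask)(void *opaque, unsigned vector);
    /* Message changed while unmasked: possible for MSI, not for MSI-X. */
    int  (*config)(void *opaque, unsigned vector, const MSIMessage *msg);
} MSIVectorNotifiers;

/* Core-side dispatch for variant 2, driven by old and new mask state. */
static int msi_notify(const MSIVectorNotifiers *n, void *opaque,
                      unsigned vector, const MSIMessage *msg,
                      bool was_masked, bool is_masked)
{
    assert(n != NULL && n->unmask && n->mask && n->config);
    if (!was_masked && is_masked) {
        n->mask(opaque, vector);
        return 0;
    }
    if (was_masked && !is_masked) {
        return n->unmask(opaque, vector, msg);
    }
    if (!is_masked) {
        return n->config(opaque, vector, msg);
    }
    return 0; /* still masked: nothing the consumer needs to see */
}

/* Toy consumer that only counts events, to show the call pattern. */
typedef struct EventCounts {
    int masks;
    int unmasks;
    int configs;
} EventCounts;

static void count_mask(void *opaque, unsigned vector)
{
    (void)vector;
    ((EventCounts *)opaque)->masks++;
}

static int count_unmask(void *opaque, unsigned vector, const MSIMessage *msg)
{
    (void)vector;
    (void)msg;
    ((EventCounts *)opaque)->unmasks++;
    return 0;
}

static int count_config(void *opaque, unsigned vector, const MSIMessage *msg)
{
    (void)vector;
    (void)msg;
    ((EventCounts *)opaque)->configs++;
    return 0;
}

static const MSIVectorNotifiers counting_notifiers = {
    count_unmask, count_mask, count_config
};
```

With variant 2 the core decides which event occurred, so a consumer
never has to interpret a flag or guess whether the message is valid.
The cost, as noted in this thread, is that each consumer supplies up to
three handlers that may share most of their logic.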

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers
  2011-10-18 13:46             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 13:49               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 13:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 15:46, Michael S. Tsirkin wrote:
>>>> To
>>>> have common types, I would prefer to stay with vector config notifiers
>>>> as name then.
>>>>
>>>> Jan
>>>
>>> So we pass in nonsense values and ask all users to know about MSIX rules.
>>> Ugh.
>>>
>>> I do realize msi might change the vector without masking.
>>> We can either artificially call mask before value change
>>> and unmask after, or use 3 notifiers: mask,unmask,config.
>>> Add a comment that config is invoked when configuration
>>> for an unmasked vector is changed, and that
>>> it can only happen for msi, not msix.
>>
>> I see no need in complicating the API like this. MSI-X still needs the
>> config information on unmask, so let's just consistently pass it via the
>> unified config notifier instead of forcing the consumers to create yet
>> two more handlers. I really do not see the benefit for the consumer.
>>
>> Jan
>>
> 
> The benefit is a clearer API, where all parameters you get are valid,
> so you do not need to go read the spec to see what is OK to use.
> Generally, encoding events in flags is more error
> prone than using different notifiers for different events.
> 
> E.g. _unmask and _mask make
> it obvious that they are called on mask and on unmask
> respectively.
> OTOH _config_change(bool mask) is unclear: is 'mask' the new
> state or the old state?
> 
> It might be just my taste, but I usually prefer multiple
> functions doing one thing each rather than a single
> function doing multiple things. It shouldn't be too hard ...

The impact on the user side (device models) will be larger, while the
work in those different handlers will be largely the same (check my
series, e.g. the virtio handler). But that is still just a guess of mine
as well.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 13:46                       ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 14:01                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 14:01 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Tue, Oct 18, 2011 at 03:46:06PM +0200, Jan Kiszka wrote:
> On 2011-10-18 15:37, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 03:00:29PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 14:48, Michael S. Tsirkin wrote:
> >>>> To my understanding, virtio will be the exception as no other device
> >>>> will have a chance to react on resource shortage while sending(!) an MSI
> >>>> message.
> >>>
> >>> Hmm, are you familiar with that spec?
> >>
> >> Not by heart.
> >>
> >>> This is not what virtio does,
> >>> resource shortage is detected during setup.
> >>> This is exactly the problem with lazy registration as you don't
> >>> allocate until it's too late.
> >>
> >> When is that setup phase? Does it actually come after every change to an
> >> MSI vector? I doubt so.
> > 
> > No. During setup, driver requests vectors from the OS, and then tells
> > the device which vector should each VQ use.  It then checks that the
> > assignment was successful. If not, it retries with less vectors.
> > 
> > Other devices can do this during initialization, and signal
> > resource availability to guest using msix vector number field.
> > 
> >> Thus virtio can only estimate the guest usage as
> >> well
> > 
> > At some level, this is fundamental: some guest operations
> > have no failure mode. So we must preallocate
> > some resources to make sure they won't fail.
> 
> We can still track the expected maximum number of active vectors at core
> level, collect them from the KVM layer, and warn if we expect conflicts.
> Anxious MSI users could then refrain from using this feature, others
> might be fine with risking a slow-down on conflicts.

It seems like a nice feature until you have to debug it in the field :).
If you really think it's worthwhile, let's add a 'force' flag so that
advanced users at least can declare that they know what they are doing.

> > 
> >> (a guest may or may not actually write a non-null data into a
> >> vector and unmask it).
> > 
> > Please, forget the non-NULL thing. virtio driver knows exactly
> > how many vectors we use and communicates this info to the device.
> > This is not uncommon at all.
> > 
> >>>
> >>>>>
> >>>>> I actually would not mind preallocating everything upfront which is much
> >>>>> easier.  But with your patch we get a silent failure or a drastic
> >>>>> slowdown which is much more painful IMO.
> >>>>
> >>>> Again: did we already saw that limit? And where does it come from if not
> >>>> from KVM?
> >>>
> >>> It's a hardware limitation of intel APICs. interrupt vector is encoded
> >>> in an 8 bit field in msi address. So you can have at most 256 of these.
> >>
> >> There should be no such limitation with pseudo GSIs we use for MSI
> >> injection. They end up as MSI messages again, so actually 256 (-reserved
> >> vectors) * number-of-cpus (on x86).
> > 
> > This limits which CPUs can get the interrupt though.
> > Linux seems to have a global pool as it wants to be able to freely
> > balance vectors between CPUs. Or, consider a guest with a single CPU :)
> > 
> > Anyway, why argue - there is a limitation, and it's not coming from KVM,
> > right?
> 
> No, our limit we hit with MSI message routing are first of all KVM GSIs,
> and there only pseudo GSIs that do not go to any interrupt controller
> with limited pins.

I see KVM_MAX_IRQ_ROUTES 1024
This is > 256 so KVM does not seem to be the problem.

> That could easily be lifted in the kernel if we run
> into shortages in practice.

What I was saying is that resources are limited even without kvm.

> > 
> >>>
> >>>>>
> >>>>>> That's also why we do those data == 0
> >>>>>> checks to skip used but unconfigured vectors.
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> These checks work more or less by luck BTW. It's
> >>>>> a hack which I hope lazy allocation will replace.
> >>>>
> >>>> The check is still valid (for x86) when we have to use static routes
> >>>> (device assignment, vhost).
> >>>
> >>> It's not valid at all - we are just lucky that linux and
> >>> windows guests seem to zero out the vector when it's not in use.
> >>> They do not have to do that.
> >>
> >> It is valid as it is just an optimization. If an unused vector has a
> >> non-null data field, we just redundantly register a route where we do
> >> not actually have to.
> > 
> > Well, the only reason we even have this code is because
> > it was claimed that some devices declare support for a huge number
> > of vectors which then go unused. So if the guest does not
> > do this we'll run out of vectors ...
> > 
> >> But we do need to be prepared
> > 
> > And ATM, we aren't, and probably can't be without kernel
> > changes, right?
> > 
> >> for potentially
> >> arriving messages on that virtual GSI, either via irqfd or kvm device
> >> assignment.
> >>
> >> Jan
> > 
> > Why irqfd?  Device assignment is ATM the only place where we use these
> > ugly hacks.
> 
> vfio will use irqfds. And that virtio is partly out of the picture is
> only because we know much more about virtio internals (specifically:
> "will not advertise more vectors than guests will want to use").
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 14:01                         ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 14:08                           ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 14:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On 2011-10-18 16:01, Michael S. Tsirkin wrote:
>>>>>>>
>>>>>>> I actually would not mind preallocating everything upfront which is much
>>>>>>> easier.  But with your patch we get a silent failure or a drastic
>>>>>>> slowdown which is much more painful IMO.
>>>>>>
>>>>>> Again: did we already saw that limit? And where does it come from if not
>>>>>> from KVM?
>>>>>
>>>>> It's a hardware limitation of intel APICs. interrupt vector is encoded
>>>>> in an 8 bit field in msi address. So you can have at most 256 of these.
>>>>
>>>> There should be no such limitation with pseudo GSIs we use for MSI
>>>> injection. They end up as MSI messages again, so actually 256 (-reserved
>>>> vectors) * number-of-cpus (on x86).
>>>
>>> This limits which CPUs can get the interrupt though.
>>> Linux seems to have a global pool as it wants to be able to freely
>>> balance vectors between CPUs. Or, consider a guest with a single CPU :)
>>>
>>> Anyway, why argue - there is a limitation, and it's not coming from KVM,
>>> right?
>>
>> No, our limit we hit with MSI message routing are first of all KVM GSIs,
>> and there only pseudo GSIs that do not go to any interrupt controller
>> with limited pins.
> 
> I see KVM_MAX_IRQ_ROUTES 1024
> This is > 256 so KVM does not seem to be the problem.

We can generate far more than 256 different MSI messages. A message may
also encode the target CPU, so that number enters the equation as well.

> 
>> That could easily be lifted in the kernel if we run
>> into shortages in practice.
> 
> What I was saying is that resources are limited even without kvm.

What other resources related to this particular case are exhausted
before GSI numbers?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 14:08                           ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 15:08                             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 15:08 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Tue, Oct 18, 2011 at 04:08:46PM +0200, Jan Kiszka wrote:
> On 2011-10-18 16:01, Michael S. Tsirkin wrote:
> >>>>>>>
> >>>>>>> I actually would not mind preallocating everything upfront which is much
> >>>>>>> easier.  But with your patch we get a silent failure or a drastic
> >>>>>>> slowdown which is much more painful IMO.
> >>>>>>
> >>>>>> Again: did we already saw that limit? And where does it come from if not
> >>>>>> from KVM?
> >>>>>
> >>>>> It's a hardware limitation of intel APICs. interrupt vector is encoded
> >>>>> in an 8 bit field in msi address. So you can have at most 256 of these.
> >>>>
> >>>> There should be no such limitation with pseudo GSIs we use for MSI
> >>>> injection. They end up as MSI messages again, so actually 256 (-reserved
> >>>> vectors) * number-of-cpus (on x86).
> >>>
> >>> This limits which CPUs can get the interrupt though.
> >>> Linux seems to have a global pool as it wants to be able to freely
> >>> balance vectors between CPUs. Or, consider a guest with a single CPU :)
> >>>
> >>> Anyway, why argue - there is a limitation, and it's not coming from KVM,
> >>> right?
> >>
> >> No, our limit we hit with MSI message routing are first of all KVM GSIs,
> >> and there only pseudo GSIs that do not go to any interrupt controller
> >> with limited pins.
> > 
> > I see KVM_MAX_IRQ_ROUTES 1024
> > This is > 256 so KVM does not seem to be the problem.
> 
> We can generate far more than 256 different MSI messages. A message may
> also encode the target CPU, so that number enters the equation as well.

Yes, but the vector is encoded in 8 bits (256 values). The rest is
stuff like delivery mode, which won't affect which
handler is run AFAIK. So while the problem might
appear with vector sharing, in practice there is
no vector sharing so no problem :)
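
For reference, a minimal sketch of how an x86 MSI message splits into
fields, assuming the usual layout (vector in data bits 7:0, delivery
mode in data bits 10:8, destination APIC ID in address bits 19:12).
The type and function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct X86MSIFields {
    uint8_t vector;        /* data bits 7:0, selects the handler */
    uint8_t delivery_mode; /* data bits 10:8 */
    uint8_t dest_id;       /* address bits 19:12, target APIC ID */
} X86MSIFields;

/* Returns false if the address is outside the 0xFEExxxxx MSI window. */
static bool x86_msi_decode(uint64_t address, uint32_t data,
                           X86MSIFields *out)
{
    assert(out != NULL);
    if ((address >> 20) != 0xfee) {
        return false;
    }
    out->vector = data & 0xff;
    out->delivery_mode = (data >> 8) & 0x7;
    out->dest_id = (address >> 12) & 0xff;
    return true;
}
```

Two messages that differ only in dest_id decode to the same vector, so
the number of distinct messages can exceed 256 while the number of
distinct vectors per CPU cannot.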

> > 
> >> That could easily be lifted in the kernel if we run
> >> into shortages in practice.
> > 
> > What I was saying is that resources are limited even without kvm.
> 
> What other resources related to this particular case are exhausted
> before GSI numbers?
> 
> Jan

distinct vectors

> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 15:08                             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 15:22                               ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 15:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 17:08, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 04:08:46PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 16:01, Michael S. Tsirkin wrote:
>>>>>>>>>
>>>>>>>>> I actually would not mind preallocating everything upfront which is much
>>>>>>>>> easier.  But with your patch we get a silent failure or a drastic
>>>>>>>>> slowdown which is much more painful IMO.
>>>>>>>>
>>>>>>>> Again: did we already saw that limit? And where does it come from if not
>>>>>>>> from KVM?
>>>>>>>
>>>>>>> It's a hardware limitation of intel APICs. interrupt vector is encoded
>>>>>>> in an 8 bit field in msi address. So you can have at most 256 of these.
>>>>>>
>>>>>> There should be no such limitation with pseudo GSIs we use for MSI
>>>>>> injection. They end up as MSI messages again, so actually 256 (-reserved
>>>>>> vectors) * number-of-cpus (on x86).
>>>>>
>>>>> This limits which CPUs can get the interrupt though.
>>>>> Linux seems to have a global pool as it wants to be able to freely
>>>>> balance vectors between CPUs. Or, consider a guest with a single CPU :)
>>>>>
>>>>> Anyway, why argue - there is a limitation, and it's not coming from KVM,
>>>>> right?
>>>>
>>>> No, our limit we hit with MSI message routing are first of all KVM GSIs,
>>>> and there only pseudo GSIs that do not go to any interrupt controller
>>>> with limited pins.
>>>
>>> I see KVM_MAX_IRQ_ROUTES 1024
>>> This is > 256 so KVM does not seem to be the problem.
>>
>> We can generate far more than 256 different MSI messages. A message may
>> also encode the target CPU, so that number enters the equation as well.
> 
> Yes, but the vector is encoded in 8 bits (256 values). The rest is
> stuff like delivery mode, which won't affect which
> handler is run AFAIK. So while the problem might
> appear with vector sharing, in practice there is
> no vector sharing so no problem :)
> 
>>>
>>>> That could easily be lifted in the kernel if we run
>>>> into shortages in practice.
>>>
>>> What I was saying is that resources are limited even without kvm.
>>
>> What other resources related to this particular case are exhausted
>> before GSI numbers?
>>
>> Jan
> 
> distinct vectors

The guest is responsible for managing vectors, not KVM, not QEMU. And
the guest will notice first when it runs out of them. So a virtio guest
driver may not even request MSI-X support if that happens.

What KVM has to do is just map an arbitrary MSI message
(theoretically 64+32 bits, in practice of course much less) to
a single GSI and vice versa. As there are fewer GSIs than possible MSI
messages, we could run out of them when creating routes, statically or
lazily.
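
A toy model of that mapping, only to show where exhaustion occurs. The
table size and all names are assumptions for the sketch (the limit
mentioned in this thread is KVM_MAX_IRQ_ROUTES 1024); this is not how
KVM or qemu-kvm implement routing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_GSI 4 /* deliberately tiny so exhaustion is easy to hit */

typedef struct MSIRoute {
    uint64_t address;
    uint32_t data;
    int in_use;
} MSIRoute;

typedef struct MSIRouteTable {
    MSIRoute routes[SKETCH_MAX_GSI];
} MSIRouteTable;

static void msi_route_table_init(MSIRouteTable *t)
{
    assert(t != NULL);
    for (int gsi = 0; gsi < SKETCH_MAX_GSI; gsi++) {
        t->routes[gsi].in_use = 0;
    }
}

/* Return the GSI already routed to this message, or allocate a free
 * one. Returns -1 when every GSI is taken by a different message. */
static int msi_route_get(MSIRouteTable *t, uint64_t address, uint32_t data)
{
    int free_gsi = -1;

    assert(t != NULL);
    for (int gsi = 0; gsi < SKETCH_MAX_GSI; gsi++) {
        const MSIRoute *r = &t->routes[gsi];
        if (r->in_use) {
            if (r->address == address && r->data == data) {
                return gsi;
            }
        } else if (free_gsi < 0) {
            free_gsi = gsi;
        }
    }
    if (free_gsi < 0) {
        return -1;
    }
    t->routes[free_gsi].address = address;
    t->routes[free_gsi].data = data;
    t->routes[free_gsi].in_use = 1;
    return free_gsi;
}

/* Release a route so its GSI can be reused for another message. */
static void msi_route_put(MSIRouteTable *t, int gsi)
{
    assert(t != NULL && gsi >= 0 && gsi < SKETCH_MAX_GSI);
    t->routes[gsi].in_use = 0;
}
```

Whether msi_route_get() runs at setup time (static routes) or on first
delivery (lazy routes) only changes when the -1 case shows up, not
whether it can.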

What would probably help us long-term out of your concerns regarding
lazy routing is to bypass that redundant GSI translation for dynamic
messages, i.e. those that are not associated with an irqfd number or an
assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
address and data directly.
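
Purely as an illustration of what such a request could carry: the
struct layout, field names, and helper below are hypothetical. No such
ioctl exists in this series, and no ioctl number is defined here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL argument block for the proposed KVM_DELIVER_MSI. */
struct deliver_msi_req {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    uint32_t flags; /* reserved in this sketch, always 0 */
};

/* Fill the request from a raw MSI message; no GSI is involved. */
static void deliver_msi_prepare(struct deliver_msi_req *req,
                                uint64_t address, uint32_t data)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->address_lo = (uint32_t)address;
    req->address_hi = (uint32_t)(address >> 32);
    req->data = data;
}
```

The point of the sketch is that the caller passes address and data as
they are, so dynamic messages would not consume a route table entry.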

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
@ 2011-10-18 15:22                               ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 15:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On 2011-10-18 17:08, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 04:08:46PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 16:01, Michael S. Tsirkin wrote:
>>>>>>>>>
>>>>>>>>> I actually would not mind preallocating everything upfront which is much
>>>>>>>>> easier.  But with your patch we get a silent failure or a drastic
>>>>>>>>> slowdown which is much more painful IMO.
>>>>>>>>
>>>>>>>> Again: did we already saw that limit? And where does it come from if not
>>>>>>>> from KVM?
>>>>>>>
>>>>>>> It's a hardware limitation of intel APICs. interrupt vector is encoded
>>>>>>> in an 8 bit field in msi address. So you can have at most 256 of these.
>>>>>>
>>>>>> There should be no such limitation with pseudo GSIs we use for MSI
>>>>>> injection. They end up as MSI messages again, so actually 256 (-reserved
>>>>>> vectors) * number-of-cpus (on x86).
>>>>>
>>>>> This limits which CPUs can get the interrupt though.
>>>>> Linux seems to have a global pool as it wants to be able to freely
>>>>> balance vectors between CPUs. Or, consider a guest with a single CPU :)
>>>>>
>>>>> Anyway, why argue - there is a limitation, and it's not coming from KVM,
>>>>> right?
>>>>
>>>> No, our limit we hit with MSI message routing are first of all KVM GSIs,
>>>> and there only pseudo GSIs that do not go to any interrupt controller
>>>> with limited pins.
>>>
>>> I see KVM_MAX_IRQ_ROUTES 1024
>>> This is > 256 so KVM does not seem to be the problem.
>>
>> We can generate way more different MSI messages than 256. A message may
>> encode the target CPU, so you have this number in the equation e.g.
> 
> Yes but the vector is encoded in 256 bits. The rest is
> stuff like delivery mode, which won't affect which
> handler is run AFAIK. So while the problem might
> appear with vector sharing, in practice there is
> no vector sharing so no problem :)
> 
>>>
>>>> That could easily be lifted in the kernel if we run
>>>> into shortages in practice.
>>>
>>> What I was saying is that resources are limited even without kvm.
>>
>> What other resources related to this particular case are exhausted
>> before GSI numbers?
>>
>> Jan
> 
> distinct vectors

The guest is responsible for managing vectors, not KVM, not QEMU. And
the guest will notice first when it runs out of them. So a virtio guest
driver may not even request MSI-X support if that happens.

What KVM has to do is just map an arbitrary MSI message
(theoretically 64+32 bits, in practice of course much less) to
a single GSI and vice versa. As there are fewer GSIs than possible MSI
messages, we could run out of them when creating routes, statically or
lazily.

What would probably address your concerns regarding lazy routing in the
long term is to bypass that redundant GSI translation for dynamic
messages, i.e. those that are not associated with an irqfd number or an
assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
address and data directly.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 15:22                               ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 15:55                                 ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 15:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 17:22, Jan Kiszka wrote:
> What KVM has to do is just mapping an arbitrary MSI message
> (theoretically 64+32 bits, in practice it's much of course much less) to

( There are 24 distinguishing bits in an MSI message on x86, but that's
only a current interpretation of one specific arch. )

> a single GSI and vice versa. As there are less GSIs than possible MSI
> messages, we could run out of them when creating routes, statically or
> lazily.
> 
> What would probably help us long-term out of your concerns regarding
> lazy routing is to bypass that redundant GSI translation for dynamic
> messages, i.e. those that are not associated with an irqfd number or an
> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> address and data directly.

This would be a trivial extension in fact. Given its beneficial impact
on our GSI limitation issue, I think I will hack up something like that.

And maybe this makes a transparent cache more reasonable. Then only old
host kernels would force us to do searches for already cached messages.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 15:22                               ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 15:56                                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 15:56 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Tue, Oct 18, 2011 at 05:22:38PM +0200, Jan Kiszka wrote:
> On 2011-10-18 17:08, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 04:08:46PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 16:01, Michael S. Tsirkin wrote:
> >>>>>>>>>
> >>>>>>>>> I actually would not mind preallocating everything upfront which is much
> >>>>>>>>> easier.  But with your patch we get a silent failure or a drastic
> >>>>>>>>> slowdown which is much more painful IMO.
> >>>>>>>>
> >>>>>>>> Again: did we already saw that limit? And where does it come from if not
> >>>>>>>> from KVM?
> >>>>>>>
> >>>>>>> It's a hardware limitation of intel APICs. interrupt vector is encoded
> >>>>>>> in an 8 bit field in msi address. So you can have at most 256 of these.
> >>>>>>
> >>>>>> There should be no such limitation with pseudo GSIs we use for MSI
> >>>>>> injection. They end up as MSI messages again, so actually 256 (-reserved
> >>>>>> vectors) * number-of-cpus (on x86).
> >>>>>
> >>>>> This limits which CPUs can get the interrupt though.
> >>>>> Linux seems to have a global pool as it wants to be able to freely
> >>>>> balance vectors between CPUs. Or, consider a guest with a single CPU :)
> >>>>>
> >>>>> Anyway, why argue - there is a limitation, and it's not coming from KVM,
> >>>>> right?
> >>>>
> >>>> No, our limit we hit with MSI message routing are first of all KVM GSIs,
> >>>> and there only pseudo GSIs that do not go to any interrupt controller
> >>>> with limited pins.
> >>>
> >>> I see KVM_MAX_IRQ_ROUTES 1024
> >>> This is > 256 so KVM does not seem to be the problem.
> >>
> >> We can generate way more different MSI messages than 256. A message may
> >> encode the target CPU, so you have this number in the equation e.g.
> > 
> > > Yes but the vector is encoded in 8 bits (256 values). The rest is
> > stuff like delivery mode, which won't affect which
> > handler is run AFAIK. So while the problem might
> > appear with vector sharing, in practice there is
> > no vector sharing so no problem :)
> > 
> >>>
> >>>> That could easily be lifted in the kernel if we run
> >>>> into shortages in practice.
> >>>
> >>> What I was saying is that resources are limited even without kvm.
> >>
> >> What other resources related to this particular case are exhausted
> >> before GSI numbers?
> >>
> >> Jan
> > 
> > distinct vectors
> 
> The guest is responsible for managing vectors, not KVM, not QEMU. And
> the guest will notice first when it runs out of them. So a virtio guest
> driver may not even request MSI-X support if that happens.

Absolutely. You can solve the problem from the guest in theory.
But what I was saying is that in practice the first X devices get
MSI-X and the others don't. Guests aren't doing anything smart as
they are not designed with a huge number of devices in mind.

What we can do is solve the problem from management.
And to do that we can't delay allocation until it's used.

> What KVM has to do is just mapping an arbitrary MSI message
> (theoretically 64+32 bits, in practice it's much of course much less) to
> a single GSI and vice versa. As there are less GSIs than possible MSI
> messages, we could run out of them when creating routes, statically or
> lazily.

Possible MSI messages != possible MSI vectors.
If two devices share a vector, the APIC won't be able
to distinguish them even though e.g. the delivery mode is
different.
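
To make the distinction concrete, here is a minimal sketch that assumes
the conventional x86 MSI layout (destination ID in address bits 19:12,
vector in data bits 7:0, delivery mode in data bits 10:8). The type and
helper names are illustrative only and are not taken from the patch
series.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative message type; not the MSIMessage structure from the series. */
struct msi_msg {
    uint64_t address;
    uint32_t data;
};

/* Vector: data bits 7:0 (assumed x86 layout). */
unsigned msi_vector(struct msi_msg m)
{
    return m.data & 0xff;
}

/* Delivery mode: data bits 10:8 (0 = fixed, 1 = lowest priority). */
unsigned msi_delivery_mode(struct msi_msg m)
{
    return (m.data >> 8) & 0x7;
}

/* Destination ID: address bits 19:12. */
unsigned msi_dest_id(struct msi_msg m)
{
    return (unsigned)((m.address >> 12) & 0xff);
}

/* Two messages are the same only if every address and data bit matches. */
int msi_msg_equal(struct msi_msg a, struct msi_msg b)
{
    return a.address == b.address && a.data == b.data;
}
```

With address 0xfee01000 and data 0x0041 versus 0x0141, msi_msg_equal()
reports two different messages, yet msi_vector() yields 0x41 for both.
A router keyed on the full message would need two entries here, while
the guest still sees a single vector.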

> What would probably help us long-term out of your concerns regarding
> lazy routing is to bypass that redundant GSI translation for dynamic
> messages, i.e. those that are not associated with an irqfd number or an
> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> address and data directly.
> 
> Jan

You are trying to work around the problem by not requiring
any resources per MSI vector. This just might work for some
uses (ioctl) but isn't a generic solution (e.g. won't work for irqfd).

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 15:56                                 ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 15:58                                   ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 15:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 17:56, Michael S. Tsirkin wrote:
>> What would probably help us long-term out of your concerns regarding
>> lazy routing is to bypass that redundant GSI translation for dynamic
>> messages, i.e. those that are not associated with an irqfd number or an
>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>> address and data directly.
>>
>> Jan
> 
> You are trying to work around the problem by not requiring
> any resources per MSI vector. This just might work for some
> uses (ioctl) but isn't a generic solution (e.g. won't work for irqfd).

irqfd is not affected anymore in that model as it cannot participate in
lazy routing anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 15:55                                 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 17:06                                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 17:06 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
> On 2011-10-18 17:22, Jan Kiszka wrote:
> > What KVM has to do is just mapping an arbitrary MSI message
> > (theoretically 64+32 bits, in practice it's much of course much less) to
> 
> ( There are 24 distinguishing bits in an MSI message on x86, but that's
> only a current interpretation of one specific arch. )

Confused. The vector mask is 8 bits. The rest is destination ID etc.

> > a single GSI and vice versa. As there are less GSIs than possible MSI
> > messages, we could run out of them when creating routes, statically or
> > lazily.
> > 
> > What would probably help us long-term out of your concerns regarding
> > lazy routing is to bypass that redundant GSI translation for dynamic
> > messages, i.e. those that are not associated with an irqfd number or an
> > assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> > address and data directly.
> 
> This would be a trivial extension in fact. Given its beneficial impact
> on our GSI limitation issue, I think I will hack up something like that.
> 
> And maybe this makes a transparent cache more reasonable. Then only old
> host kernels would force us to do searches for already cached messages.
> 
> Jan

Hmm, I'm not all that sure. Existing design really allows
caching the route in various smart ways. We currently do
this for irqfd but this can be extended to ioctls.
If we just let the guest inject arbitrary messages,
that becomes much more complex.

Another concern is mask bit emulation. We currently
handle the mask bit in userspace, but patches
to do that in the kernel for assigned devices were seen
and IMO we might want to do that for virtio as well.

For that to work the mask bit needs to be tied to
a specific gsi or specific device, which does not
work if we just inject arbitrary writes.

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 17:06                                   ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 18:24                                     ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 18:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2648 bytes --]

On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>> What KVM has to do is just mapping an arbitrary MSI message
>>> (theoretically 64+32 bits, in practice it's much of course much less) to
>>
>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>> only a current interpretation of one specific arch. )
> 
> Confused. vector mask is 8 bits. the rest is destination id etc.

Right, but those additional bits like the destination make different
messages. We have to encode those 24 bits into a unique GSI number and
restore them (by table lookup) on APIC injection inside the kernel. If
we only had to encode 256 different vectors, we would be done already.
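
The encode-and-restore step can be sketched roughly as follows. This is
not the KVM or qemu-kvm implementation: the table type, the function
names and the linear search are assumptions made for illustration, and
MAX_ROUTES merely mirrors the KVM_MAX_IRQ_ROUTES value of 1024 quoted
earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ROUTES 1024 /* mirrors the KVM_MAX_IRQ_ROUTES value quoted above */

/* Illustrative message type; not the MSIMessage structure from the series. */
struct msi_msg {
    uint64_t address;
    uint32_t data;
};

/* Hypothetical route table: the index is the pseudo GSI number. */
struct route_table {
    struct msi_msg msg[MAX_ROUTES];
    int used; /* GSIs 0..used-1 are allocated */
};

/*
 * Return the GSI that stands for this message, allocating a new one
 * if the message has not been seen yet. Returns -1 once all GSIs are
 * taken, which is the exhaustion case discussed in this thread.
 */
int route_get_gsi(struct route_table *t, struct msi_msg m)
{
    for (int gsi = 0; gsi < t->used; gsi++) {
        if (t->msg[gsi].address == m.address && t->msg[gsi].data == m.data)
            return gsi;
    }
    if (t->used == MAX_ROUTES)
        return -1;
    t->msg[t->used] = m;
    return t->used++;
}

/* Restore the full message from a GSI at injection time (table lookup). */
const struct msi_msg *route_lookup(const struct route_table *t, int gsi)
{
    if (gsi < 0 || gsi >= t->used)
        return NULL;
    return &t->msg[gsi];
}
```

Two messages that share vector 0x41 but target different destinations
consume two GSIs in this model, which is why the number of distinct
messages, not the number of vectors, decides when the table fills up.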

> 
>>> a single GSI and vice versa. As there are less GSIs than possible MSI
>>> messages, we could run out of them when creating routes, statically or
>>> lazily.
>>>
>>> What would probably help us long-term out of your concerns regarding
>>> lazy routing is to bypass that redundant GSI translation for dynamic
>>> messages, i.e. those that are not associated with an irqfd number or an
>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>>> address and data directly.
>>
>> This would be a trivial extension in fact. Given its beneficial impact
>> on our GSI limitation issue, I think I will hack up something like that.
>>
>> And maybe this makes a transparent cache more reasonable. Then only old
>> host kernels would force us to do searches for already cached messages.
>>
>> Jan
> 
> Hmm, I'm not all that sure. Existing design really allows
> caching the route in various smart ways. We currently do
> this for irqfd but this can be extended to ioctls.
> If we just let the guest inject arbitrary messages,
> that becomes much more complex.

irqfd and kvm device assignment do not allow us to inject arbitrary
messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
kvm_device_msix_set_vector (etc.) for those scenarios to set static
routes from an MSI message to a GSI number (+they configure the related
backends).

> 
> Another concern is mask bit emulation. We currently
> handle mask bit in userspace but patches
> to do them in kernel for assigned devices where seen
> and IMO we might want to do that for virtio as well.
> 
> For that to work the mask bit needs to be tied to
> a specific gsi or specific device, which does not
> work if we just inject arbitrary writes.

Yes, but I do not see those valuable plans being negatively affected.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 17:06                                   ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 18:26                                     ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 18:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2648 bytes --]

On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>> What KVM has to do is just mapping an arbitrary MSI message
>>> (theoretically 64+32 bits, in practice it's much of course much less) to
>>
>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>> only a current interpretation of one specific arch. )
> 
> Confused. vector mask is 8 bits. the rest is destination id etc.

Right, but those additional bits like the destination make different
messages. We have to encode those 24 bits into a unique GSI number and
restore them (by table lookup) on APIC injection inside the kernel. If
we only had to encode 256 different vectors, we would be done already.

> 
>>> a single GSI and vice versa. As there are less GSIs than possible MSI
>>> messages, we could run out of them when creating routes, statically or
>>> lazily.
>>>
>>> What would probably help us long-term out of your concerns regarding
>>> lazy routing is to bypass that redundant GSI translation for dynamic
>>> messages, i.e. those that are not associated with an irqfd number or an
>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>>> address and data directly.
>>
>> This would be a trivial extension in fact. Given its beneficial impact
>> on our GSI limitation issue, I think I will hack up something like that.
>>
>> And maybe this makes a transparent cache more reasonable. Then only old
>> host kernels would force us to do searches for already cached messages.
>>
>> Jan
> 
> Hmm, I'm not all that sure. Existing design really allows
> caching the route in various smart ways. We currently do
> this for irqfd but this can be extended to ioctls.
> If we just let the guest inject arbitrary messages,
> that becomes much more complex.

irqfd and kvm device assignment do not allow us to inject arbitrary
messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
kvm_device_msix_set_vector (etc.) for those scenarios to set static
routes from an MSI message to a GSI number (+they configure the related
backends).

> 
> Another concern is mask bit emulation. We currently
> handle mask bit in userspace but patches
> to do them in kernel for assigned devices where seen
> and IMO we might want to do that for virtio as well.
> 
> For that to work the mask bit needs to be tied to
> a specific gsi or specific device, which does not
> work if we just inject arbitrary writes.

Yes, but I do not see those valuable plans being negatively affected.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
@ 2011-10-18 18:26                                     ` Jan Kiszka
  0 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 18:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2648 bytes --]

On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>> What KVM has to do is just mapping an arbitrary MSI message
>>> (theoretically 64+32 bits, in practice it's much of course much less) to
>>
>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>> only a current interpretation of one specific arch. )
> 
> Confused. vector mask is 8 bits. the rest is destination id etc.

Right, but those additional bits like the destination make different
messages. We have to encode those 24 bits into a unique GSI number and
restore them (by table lookup) on APIC injection inside the kernel. If
we only had to encode 256 different vectors, we would be done already.

> 
>>> a single GSI and vice versa. As there are less GSIs than possible MSI
>>> messages, we could run out of them when creating routes, statically or
>>> lazily.
>>>
>>> What would probably help us long-term out of your concerns regarding
>>> lazy routing is to bypass that redundant GSI translation for dynamic
>>> messages, i.e. those that are not associated with an irqfd number or an
>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>>> address and data directly.
>>
>> This would be a trivial extension in fact. Given its beneficial impact
>> on our GSI limitation issue, I think I will hack up something like that.
>>
>> And maybe this makes a transparent cache more reasonable. Then only old
>> host kernels would force us to do searches for already cached messages.
>>
>> Jan
> 
> Hmm, I'm not all that sure. Existing design really allows
> caching the route in various smart ways. We currently do
> this for irqfd but this can be extended to ioctls.
> If we just let the guest inject arbitrary messages,
> that becomes much more complex.

irqfd and kvm device assignment do not allow us to inject arbitrary
messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
kvm_device_msix_set_vector (etc.) for those scenarios to set static
routes from an MSI message to a GSI number (+they configure the related
backends).

> 
> Another concern is mask bit emulation. We currently
> handle mask bit in userspace but patches
> to do them in kernel for assigned devices where seen
> and IMO we might want to do that for virtio as well.
> 
> For that to work the mask bit needs to be tied to
> a specific gsi or specific device, which does not
> work if we just inject arbitrary writes.

Yes, but I do not see those valuable plans being negatively affected.

Jan



^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 18:24                                     ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 18:40                                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 18:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 17:22, Jan Kiszka wrote:
> >>> What KVM has to do is just mapping an arbitrary MSI message
> >>> (theoretically 64+32 bits, in practice it's much of course much less) to
> >>
> >> ( There are 24 distinguishing bits in an MSI message on x86, but that's
> >> only a current interpretation of one specific arch. )
> > 
> > Confused. vector mask is 8 bits. the rest is destination id etc.
> 
> Right, but those additional bits like the destination make different
> messages. We have to encode those 24 bits into a unique GSI number and
> restore them (by table lookup) on APIC injection inside the kernel. If
> we only had to encode 256 different vectors, we would be done already.

Right. But in practice guests always use distinct vectors (from the
256 available) for distinct messages. This is because
the vector seems to be the only thing that gets communicated by the APIC
to the software.

So e.g. a table with 256 entries, with extra 1024-256
used for spill-over for guests that do something unexpected,
would work really well.
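
A minimal sketch of such a table in C, under stated assumptions: the
names (route_table, route_get, msi_msg) are hypothetical and not KVM
code, the vector is taken from the low 8 bits of the MSI data word as
on x86, and the total of 1024 slots is the number used above.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256   /* one direct slot per x86 vector */
#define NR_ROUTES  1024  /* direct slots plus 768 spill-over slots */

/* Hypothetical types, for illustration only. */
struct msi_msg {
    uint64_t addr;
    uint32_t data;
};

struct route {
    bool used;
    struct msi_msg msg;
};

struct route_table {
    struct route slot[NR_ROUTES];
};

static bool msg_equal(const struct msi_msg *a, const struct msi_msg *b)
{
    return a->addr == b->addr && a->data == b->data;
}

/*
 * Return the slot for msg, allocating one if needed, or -1 if the
 * table is full. The common case is a single array access.
 */
static int route_get(struct route_table *t, const struct msi_msg *msg)
{
    int idx = msg->data & 0xff; /* x86: vector is data bits 7:0 */
    int free_idx = -1;

    if (!t->slot[idx].used) {
        t->slot[idx].used = true;
        t->slot[idx].msg = *msg;
        return idx;
    }
    if (msg_equal(&t->slot[idx].msg, msg)) {
        return idx;
    }

    /* Same vector, different message: search the spill-over area. */
    for (idx = NR_VECTORS; idx < NR_ROUTES; idx++) {
        if (t->slot[idx].used) {
            if (msg_equal(&t->slot[idx].msg, msg)) {
                return idx;
            }
        } else if (free_idx < 0) {
            free_idx = idx;
        }
    }
    if (free_idx >= 0) {
        t->slot[free_idx].used = true;
        t->slot[free_idx].msg = *msg;
    }
    return free_idx;
}
```

Only a guest that reuses a vector for a different message pays for the
linear search; everything else is resolved by the direct slot.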


> > 
> >>> a single GSI and vice versa. As there are less GSIs than possible MSI
> >>> messages, we could run out of them when creating routes, statically or
> >>> lazily.
> >>>
> >>> What would probably help us long-term out of your concerns regarding
> >>> lazy routing is to bypass that redundant GSI translation for dynamic
> >>> messages, i.e. those that are not associated with an irqfd number or an
> >>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> >>> address and data directly.
> >>
> >> This would be a trivial extension in fact. Given its beneficial impact
> >> on our GSI limitation issue, I think I will hack up something like that.
> >>
> >> And maybe this makes a transparent cache more reasonable. Then only old
> >> host kernels would force us to do searches for already cached messages.
> >>
> >> Jan
> > 
> > Hmm, I'm not all that sure. Existing design really allows
> > caching the route in various smart ways. We currently do
> > this for irqfd but this can be extended to ioctls.
> > If we just let the guest inject arbitrary messages,
> > that becomes much more complex.
> 
> irqfd and kvm device assignment do not allow us to inject arbitrary
> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
> kvm_device_msix_set_vector (etc.) for those scenarios to set static
> routes from an MSI message to a GSI number (+they configure the related
> backends).

Yes, it's a very flexible API, but it would be very hard to optimize.
GSIs make us do a slow-path setup, but in return they make it easy
to optimize the target lookup in the kernel.

An analogy would be if read/write operated on file paths.
An fd makes it easy to do permission checks and slow lookups
in one place. GSI happens to work like this (maybe by accident).

> > 
> > Another concern is mask bit emulation. We currently
> > handle mask bit in userspace but patches
> > to do them in kernel for assigned devices where seen
> > and IMO we might want to do that for virtio as well.
> > 
> > For that to work the mask bit needs to be tied to
> > a specific gsi or specific device, which does not
> > work if we just inject arbitrary writes.
> 
> Yes, but I do not see those valuable plans being negatively affected.
> 
> Jan
> 

I do.
How would we maintain a mask/pending bit in the kernel if we are not
even supplied info on all available vectors?
-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 18:40                                       ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 19:37                                         ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 19:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 20:40, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>>>> What KVM has to do is just mapping an arbitrary MSI message
>>>>> (theoretically 64+32 bits, in practice it's much of course much less) to
>>>>
>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>>>> only a current interpretation of one specific arch. )
>>>
>>> Confused. vector mask is 8 bits. the rest is destination id etc.
>>
>> Right, but those additional bits like the destination make different
>> messages. We have to encode those 24 bits into a unique GSI number and
>> restore them (by table lookup) on APIC injection inside the kernel. If
>> we only had to encode 256 different vectors, we would be done already.
> 
> Right. But in practice guests always use distinct vectors (from the
> 256 available) for distinct messages. This is because
> the vector seems to be the only thing that gets communicated by the APIC
> to the software.
> 
> So e.g. a table with 256 entries, with extra 1024-256
> used for spill-over for guests that do something unexpected,
> would work really well.

Linux already manages vectors on a per-CPU basis. For efficiency
reasons, it does not exploit the full range of 256 vectors but actually
allocates them in - IIRC - steps of 16. So I would not be surprised to
find lots of vector number "collisions" when looking over a full set of
CPUs in a system.

Really, these considerations do not help us. We must store all 96 bits,
if only for the sake of other KVM architectures that want MSI routing.
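
To make the bit counting concrete, here is a small C sketch of how an
x86 MSI message splits into fields beyond the vector. The field
positions follow the x86 MSI address/data layout; the struct and
function names are hypothetical and are not the MSIMessage code from
this series.

```c
#include <stdint.h>

/* Hypothetical container for the full 64+32 bits of a message. */
struct msi_message_sketch {
    uint64_t address;
    uint32_t data;
};

struct x86_msi_fields {
    uint8_t vector;        /* data bits 7:0 */
    uint8_t delivery_mode; /* data bits 10:8 */
    uint8_t trigger_mode;  /* data bit 15 */
    uint8_t dest_id;       /* address bits 19:12 */
    uint8_t dest_mode;     /* address bit 2 */
};

/* Extract the x86-specific fields from the raw address/data pair. */
static struct x86_msi_fields x86_msi_decode(const struct msi_message_sketch *m)
{
    struct x86_msi_fields f;

    f.vector        = m->data & 0xff;
    f.delivery_mode = (m->data >> 8) & 0x7;
    f.trigger_mode  = (m->data >> 15) & 0x1;
    f.dest_id       = (m->address >> 12) & 0xff;
    f.dest_mode     = (m->address >> 2) & 0x1;
    return f;
}
```

Two messages with the same vector but a different dest_id decode to
different fields, so the vector alone cannot serve as the routing key.
Other architectures may interpret the bits differently, which is why
the sketch keeps the raw address/data pair as the stored form.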

> 
> 
>>>
>>>>> a single GSI and vice versa. As there are less GSIs than possible MSI
>>>>> messages, we could run out of them when creating routes, statically or
>>>>> lazily.
>>>>>
>>>>> What would probably help us long-term out of your concerns regarding
>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
>>>>> messages, i.e. those that are not associated with an irqfd number or an
>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>>>>> address and data directly.
>>>>
>>>> This would be a trivial extension in fact. Given its beneficial impact
>>>> on our GSI limitation issue, I think I will hack up something like that.
>>>>
>>>> And maybe this makes a transparent cache more reasonable. Then only old
>>>> host kernels would force us to do searches for already cached messages.
>>>>
>>>> Jan
>>>
>>> Hmm, I'm not all that sure. Existing design really allows
>>> caching the route in various smart ways. We currently do
>>> this for irqfd but this can be extended to ioctls.
>>> If we just let the guest inject arbitrary messages,
>>> that becomes much more complex.
>>
>> irqfd and kvm device assignment do not allow us to inject arbitrary
>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
>> routes from an MSI message to a GSI number (+they configure the related
>> backends).
> 
> Yes, it's a very flexible API but it would be very hard to optimize.
> GSIs let us do the slow path setup, but they make it easy
> to optimize target lookup in kernel.

Users of the API above have no need to know anything about GSIs. They
are an artifact of the KVM-internal interface between user space and
kernel now - thanks to the MSIRoutingCache encapsulation.

> 
> An analogy would be if read/write operated on file paths.
> fd makes it easy to do permission checks and slow lookups
> in one place. GSI happens to work like this (maybe, by accident).

Think of an MSIRoutingCache object as an opaque file handle. And it
encodes not only the routing handle but also other useful associated
information we need from time to time - internally, not in the device
models.

>>>
>>> Another concern is mask bit emulation. We currently
>>> handle mask bit in userspace but patches
>>> to do them in kernel for assigned devices where seen
>>> and IMO we might want to do that for virtio as well.
>>>
>>> For that to work the mask bit needs to be tied to
>>> a specific gsi or specific device, which does not
>>> work if we just inject arbitrary writes.
>>
>> Yes, but I do not see those valuable plans being negatively affected.
>>
>> Jan
>>
> 
> I do.
> How would we maintain a mask/pending bit in kernel if we are not
> supplied info on all available vectors even?

It's tricky to discuss an undefined interface (there only exists an
outdated proposal for kvm device assignment). But I suppose that user
space will have to define the maximum number of vectors when creating an
in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.

The number of used vectors will correlate with the number of registered
irqfds (in the case of vhost or vfio; device assignment still has
SET_MSIX_NR). As kernel space would then be responsible for mask
processing, user space would keep vectors registered with irqfds, even
if they are masked. It could just continue to play the trick and drop
data=0 vectors.
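
A rough C sketch of the mask/pending bookkeeping this would imply,
assuming the vector count is fixed when the area is created. All names
are hypothetical, since no such kernel interface is defined yet; the
2048 limit is the MSI-X table size maximum.

```c
#include <stdbool.h>

#define MSIX_MAX_VECTORS 2048 /* architectural MSI-X table size limit */

struct msix_area_sketch {
    unsigned nr_vectors;
    bool masked[MSIX_MAX_VECTORS];
    bool pending[MSIX_MAX_VECTORS];
    unsigned delivered[MSIX_MAX_VECTORS]; /* stands in for real injection */
};

/* User space announces the maximum number of vectors up front. */
static bool msix_area_create(struct msix_area_sketch *s, unsigned nr_vectors)
{
    unsigned i;

    if (nr_vectors == 0 || nr_vectors > MSIX_MAX_VECTORS) {
        return false;
    }
    s->nr_vectors = nr_vectors;
    for (i = 0; i < nr_vectors; i++) {
        s->masked[i] = true;   /* MSI-X entries are masked after reset */
        s->pending[i] = false;
        s->delivered[i] = 0;
    }
    return true;
}

/* Device side raises a vector: deliver it, or latch it if masked. */
static bool msix_area_raise(struct msix_area_sketch *s, unsigned vector)
{
    if (vector >= s->nr_vectors) {
        return false;
    }
    if (s->masked[vector]) {
        s->pending[vector] = true;
    } else {
        s->delivered[vector]++;
    }
    return true;
}

/* Guest writes the mask bit: unmasking flushes a latched interrupt. */
static bool msix_area_set_mask(struct msix_area_sketch *s, unsigned vector,
                               bool masked)
{
    if (vector >= s->nr_vectors) {
        return false;
    }
    s->masked[vector] = masked;
    if (!masked && s->pending[vector]) {
        s->pending[vector] = false;
        s->delivered[vector]++;
    }
    return true;
}
```

The state is indexed by vector number within one device's area, so it
needs the vector count but no knowledge of the message contents.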

The point here is: All those steps have _nothing_ to do with the generic
MSI-X core. They are KVM-specific "side channels" for which KVM provides
an API. In contrast, msix_vector_use/unuse were generic services that
were actually created to please KVM requirements. But if we split that
up, we can address the generic MSI-X requirements in a way that makes
more sense for emulated devices (and particularly msix_vector_use makes
no sense for emulation).

Jan



^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 19:37                                         ` [Qemu-devel] " Jan Kiszka
@ 2011-10-18 21:40                                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 21:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> >>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
> >>>> On 2011-10-18 17:22, Jan Kiszka wrote:
> >>>>> What KVM has to do is just mapping an arbitrary MSI message
> >>>>> (theoretically 64+32 bits, in practice it's much of course much less) to
> >>>>
> >>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
> >>>> only a current interpretation of one specific arch. )
> >>>
> >>> Confused. vector mask is 8 bits. the rest is destination id etc.
> >>
> >> Right, but those additional bits like the destination make different
> >> messages. We have to encode those 24 bits into a unique GSI number and
> >> restore them (by table lookup) on APIC injection inside the kernel. If
> >> we only had to encode 256 different vectors, we would be done already.
> > 
> > Right. But in practice guests always use distinct vectors (from the
> > 256 available) for distinct messages. This is because
> > the vector seems to be the only thing that gets communicated by the APIC
> > to the software.
> > 
> > So e.g. a table with 256 entries, with extra 1024-256
> > used for spill-over for guests that do something unexpected,
> > would work really well.
> 
> Linux already manages vectors on a per-CPU basis. For efficiency
> reasons, it does not exploit the full range of 256 vectors but actually
> allocates them in - IIRC - steps of 16. So I would not be surprised to
> find lots of vector number "collisions" when looking over a full set of
> CPUs in a system.
> 
> Really, these considerations do not help us. We must store all 96 bits,
> already for the sake of other KVM architectures that want MSI routing.
> > 
> > 
> >>>
> >>>>> a single GSI and vice versa. As there are less GSIs than possible MSI
> >>>>> messages, we could run out of them when creating routes, statically or
> >>>>> lazily.
> >>>>>
> >>>>> What would probably help us long-term out of your concerns regarding
> >>>>> lazy routing is to bypass that redundant GSI translation for dynamic
> >>>>> messages, i.e. those that are not associated with an irqfd number or an
> >>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> >>>>> address and data directly.
> >>>>
> >>>> This would be a trivial extension in fact. Given its beneficial impact
> >>>> on our GSI limitation issue, I think I will hack up something like that.
> >>>>
> >>>> And maybe this makes a transparent cache more reasonable. Then only old
> >>>> host kernels would force us to do searches for already cached messages.
> >>>>
> >>>> Jan
> >>>
> >>> Hmm, I'm not all that sure. Existing design really allows
> >>> caching the route in various smart ways. We currently do
> >>> this for irqfd but this can be extended to ioctls.
> >>> If we just let the guest inject arbitrary messages,
> >>> that becomes much more complex.
> >>
> >> irqfd and kvm device assignment do not allow us to inject arbitrary
> >> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
> >> kvm_device_msix_set_vector (etc.) for those scenarios to set static
> >> routes from an MSI message to a GSI number (+they configure the related
> >> backends).
> > 
> > Yes, it's a very flexible API but it would be very hard to optimize.
> > GSIs let us do the slow path setup, but they make it easy
> > to optimize target lookup in kernel.
> 
> Users of the API above have no need to know anything about GSIs. They
> are an artifact of the KVM-internal interface between user space and
> kernel now - thanks to the MSIRoutingCache encapsulation.

Yes, but I am saying that the API above can't be implemented
more efficiently than now: you will have to scan all APICs on each MSI.
The GSI implementation can be optimized: decode the vector once and,
if it matches a single vcpu, store that vcpu and use it when sending
interrupts.
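
As a sketch of that optimization in C (hypothetical types, physical
destination mode only, not the actual KVM data structures): the route
is resolved on first use, and the matching vcpu is cached if it is
unique.

```c
#include <stdint.h>

#define NR_VCPUS 4
#define NO_VCPU  (-1)

struct vcpu_sketch {
    uint8_t apic_id;
    uint8_t last_vector;
    unsigned injected; /* stands in for real APIC injection */
};

struct vm_sketch {
    struct vcpu_sketch vcpu[NR_VCPUS];
    unsigned scans; /* how often the slow path ran */
};

struct gsi_route_sketch {
    uint8_t dest_id;
    uint8_t vector;
    int cached_vcpu; /* NO_VCPU until resolved to a single target */
};

/* Slow path: scan all vcpus; return the unique match or NO_VCPU. */
static int resolve_dest(struct vm_sketch *vm, uint8_t dest_id)
{
    int i, found = NO_VCPU, matches = 0;

    vm->scans++;
    for (i = 0; i < NR_VCPUS; i++) {
        if (vm->vcpu[i].apic_id == dest_id) {
            found = i;
            matches++;
        }
    }
    return matches == 1 ? found : NO_VCPU;
}

/* Returns 0 on delivery, -1 if no single target was found. */
static int deliver_gsi(struct vm_sketch *vm, struct gsi_route_sketch *r)
{
    if (r->cached_vcpu == NO_VCPU) {
        r->cached_vcpu = resolve_dest(vm, r->dest_id);
        if (r->cached_vcpu == NO_VCPU) {
            return -1;
        }
    }
    /* Fast path: one array access, no scan. */
    vm->vcpu[r->cached_vcpu].last_vector = r->vector;
    vm->vcpu[r->cached_vcpu].injected++;
    return 0;
}
```

A real implementation would also have to drop the cached target
whenever the route or the guest's APIC IDs change; the sketch leaves
that out.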


> > 
> > An analogy would be if read/write operated on file paths.
> > fd makes it easy to do permission checks and slow lookups
> > in one place. GSI happens to work like this (maybe, by accident).
> 
> Think of an opaque file handle as a MSIRoutingCache object. And it
> encodes not only the routing handle but also other useful associated
> information we need from time to time - internally, not in the device
> models.

Forget qemu abstractions, I am talking about data path
optimizations in the kernel, in kvm. From that POV the point of an fd is
not that it is opaque. It is that it's an index in an array that
can be used for fast lookups.

> >>>
> >>> Another concern is mask bit emulation. We currently
> >>> handle mask bit in userspace but patches
> >>> to do them in kernel for assigned devices where seen
> >>> and IMO we might want to do that for virtio as well.
> >>>
> >>> For that to work the mask bit needs to be tied to
> >>> a specific gsi or specific device, which does not
> >>> work if we just inject arbitrary writes.
> >>
> >> Yes, but I do not see those valuable plans being negatively affected.
> >>
> >> Jan
> >>
> > 
> > I do.
> > How would we maintain a mask/pending bit in kernel if we are not
> > supplied info on all available vectors even?
> 
> It's tricky to discuss an undefined interface (there only exists an
> outdated proposal for kvm device assignment). But I suppose that user
> space will have to define the maximum number of vectors when creating an
> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
> 
> The number of used vectors will correlate with the number of registered
> irqfds (in the case of vhost or vfio, device assignment still has
> SET_MSIX_NR). As kernel space would then be responsible for mask
> processing, user space would keep vectors registered with irqfds, even
> if they are masked. It could just continue to play the trick and drop
> data=0 vectors.

Which trick?  We don't play any tricks except for device assignment.

> The point here is: All those steps have _nothing_ to do with the generic
> MSI-X core. They are KVM-specific "side channels" for which KVM provides
> an API. In contrast, msix_vector_use/unuse were generic services that
> were actually created to please KVM requirements. But if we split that
> up, we can address the generic MSI-X requirements in a way that makes
> more sense for emulated devices (and particularly msix_vector_use makes
> no sense for emulation).
> 
> Jan
> 

We need at least msix_vector_unuse - IMO it makes more sense than "clear
pending vector". msix_vector_use is good to keep around for symmetry:
who knows whether we'll need to allocate resources per vector
in the future.

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
@ 2011-10-18 21:40                                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-18 21:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> >>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
> >>>> On 2011-10-18 17:22, Jan Kiszka wrote:
> >>>>> What KVM has to do is just mapping an arbitrary MSI message
> >>>>> (theoretically 64+32 bits, in practice it's much of course much less) to
> >>>>
> >>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
> >>>> only a current interpretation of one specific arch. )
> >>>
> >>> Confused. vector mask is 8 bits. the rest is destination id etc.
> >>
> >> Right, but those additional bits like the destination make different
> >> messages. We have to encode those 24 bits into a unique GSI number and
> >> restore them (by table lookup) on APIC injection inside the kernel. If
> >> we only had to encode 256 different vectors, we would be done already.
> > 
> > Right. But in practice guests always use distinct vectors (from the
> > 256 available) for distinct messages. This is because
> > the vector seems to be the only thing that gets communicated by the APIC
> > to the software.
> > 
> > So e.g. a table with 256 entries, with extra 1024-256
> > used for spill-over for guests that do something unexpected,
> > would work really well.
> 
> Linux already manages vectors on a per-CPU basis. For efficiency
> reasons, it does not exploit the full range of 256 vectors but actually
> allocates them in - IIRC - steps of 16. So I would not be surprised to
> find lots of vector number "collisions" when looking over a full set of
> CPUs in a system.
> 
> Really, these considerations do not help us. We must store all 96 bits,
> already for the sake of other KVM architectures that want MSI routing.
> > 
> > 
> >>>
> >>>>> a single GSI and vice versa. As there are less GSIs than possible MSI
> >>>>> messages, we could run out of them when creating routes, statically or
> >>>>> lazily.
> >>>>>
> >>>>> What would probably help us long-term out of your concerns regarding
> >>>>> lazy routing is to bypass that redundant GSI translation for dynamic
> >>>>> messages, i.e. those that are not associated with an irqfd number or an
> >>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> >>>>> address and data directly.
> >>>>
> >>>> This would be a trivial extension in fact. Given its beneficial impact
> >>>> on our GSI limitation issue, I think I will hack up something like that.
> >>>>
> >>>> And maybe this makes a transparent cache more reasonable. Then only old
> >>>> host kernels would force us to do searches for already cached messages.
> >>>>
> >>>> Jan
> >>>
> >>> Hmm, I'm not all that sure. Existing design really allows
> >>> caching the route in various smart ways. We currently do
> >>> this for irqfd but this can be extended to ioctls.
> >>> If we just let the guest inject arbitrary messages,
> >>> that becomes much more complex.
> >>
> >> irqfd and kvm device assignment do not allow us to inject arbitrary
> >> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
> >> kvm_device_msix_set_vector (etc.) for those scenarios to set static
> >> routes from an MSI message to a GSI number (+they configure the related
> >> backends).
> > 
> > Yes, it's a very flexible API but it would be very hard to optimize.
> > GSIs let us do the slow path setup, but they make it easy
> > to optimize target lookup in kernel.
> 
> Users of the API above have no need to know anything about GSIs. They
> are an artifact of the KVM-internal interface between user space and
> kernel now - thanks to the MSIRoutingCache encapsulation.

Yes, but I am saying that the API above can't be implemented
more efficiently than now: you will have to scan all APICs on each MSI.
The GSI implementation can be optimized: decode the vector once and,
if it matches a single vcpu, store that vcpu and use it when sending
interrupts.
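
To make the difference concrete, here is a minimal sketch of the
"decode once, cache the vcpu" idea in plain C. Every name in it
(gsi_route, resolve_dest, route_set, route_deliver, the apic_id table)
is hypothetical and not taken from the KVM sources, and it only models
a fixed message with a physical destination ID.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS  4
#define NR_GSIS   8
#define NO_VCPU   (-1)

/* Hypothetical APIC IDs of the vcpus; a real guest programs these. */
static const uint8_t apic_id[NR_VCPUS] = { 0, 2, 4, 6 };

struct gsi_route {
    uint32_t addr_lo;      /* MSI address, low 32 bits */
    uint32_t data;         /* MSI data */
    int      cached_vcpu;  /* resolved at setup time, NO_VCPU if unknown */
};

struct gsi_route routes[NR_GSIS];
unsigned scans;            /* counts slow-path APIC scans, for illustration */

/* Slow path: scan all APICs for the destination ID in address bits 19:12. */
int resolve_dest(uint32_t addr_lo)
{
    uint8_t dest = (addr_lo >> 12) & 0xff;
    int i;

    scans++;
    for (i = 0; i < NR_VCPUS; i++) {
        if (apic_id[i] == dest) {
            return i;
        }
    }
    return NO_VCPU;
}

/* Route setup: decode the message once and remember the target. */
void route_set(unsigned gsi, uint32_t addr_lo, uint32_t data)
{
    assert(gsi < NR_GSIS);
    routes[gsi].addr_lo = addr_lo;
    routes[gsi].data = data;
    routes[gsi].cached_vcpu = resolve_dest(addr_lo);
}

/* Fast path: one array index and a cached vcpu, no scan. */
int route_deliver(unsigned gsi)
{
    assert(gsi < NR_GSIS);
    return routes[gsi].cached_vcpu;
}
```

With an interface that takes address and data on every injection,
something like resolve_dest() would have to run each time unless a
second cache keyed by the message is added.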


> > 
> > An analogy would be if read/write operated on file paths.
> > fd makes it easy to do permission checks and slow lookups
> > in one place. GSI happens to work like this (maybe, by accident).
> 
> Think of an opaque file handle as a MSIRoutingCache object. And it
> encodes not only the routing handle but also other useful associated
> information we need from time to time - internally, not in the device
> models.

Forget qemu abstractions, I am talking about data path
optimizations in kernel in kvm. From that POV the point of an fd is not
that it is opaque. It is that it's an index in an array that
can be used for fast lookups.

> >>>
> >>> Another concern is mask bit emulation. We currently
> >>> handle mask bit in userspace but patches
> >>> to do them in kernel for assigned devices were seen
> >>> and IMO we might want to do that for virtio as well.
> >>>
> >>> For that to work the mask bit needs to be tied to
> >>> a specific gsi or specific device, which does not
> >>> work if we just inject arbitrary writes.
> >>
> >> Yes, but I do not see those valuable plans being negatively affected.
> >>
> >> Jan
> >>
> > 
> > I do.
> > How would we maintain a mask/pending bit in kernel if we are not
> > supplied info on all available vectors even?
> 
> It's tricky to discuss an undefined interface (there only exists an
> outdated proposal for kvm device assignment). But I suppose that user
> space will have to define the maximum number of vectors when creating an
> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
> 
> The number of used vectors will correlate with the number of registered
> irqfds (in the case of vhost or vfio, device assignment still has
> SET_MSIX_NR). As kernel space would then be responsible for mask
> processing, user space would keep vectors registered with irqfds, even
> if they are masked. It could just continue to play the trick and drop
> data=0 vectors.

Which trick?  We don't play any tricks except for device assignment.

> The point here is: All those steps have _nothing_ to do with the generic
> MSI-X core. They are KVM-specific "side channels" for which KVM provides
> an API. In contrast, msix_vector_use/unuse were generic services that
> were actually created to please KVM requirements. But if we split that
> up, we can address the generic MSI-X requirements in a way that makes
> more sense for emulated devices (and particularly msix_vector_use makes
> no sense for emulation).
> 
> Jan
> 

We need at least msix_vector_unuse - IMO it makes more sense than "clear
pending vector". msix_vector_use is good to keep around for symmetry:
who knows whether we'll need to allocate resources per vector
in the future.

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 21:40                                           ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-18 22:13                                             ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-18 22:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-18 23:40, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
>>> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
>>>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>>>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>>>>>> What KVM has to do is just mapping an arbitrary MSI message
>>>>>>> (theoretically 64+32 bits, in practice it's of course much less) to
>>>>>>
>>>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>>>>>> only a current interpretation of one specific arch. )
>>>>>
>>>>> Confused. vector mask is 8 bits. the rest is destination id etc.
>>>>
>>>> Right, but those additional bits like the destination make different
>>>> messages. We have to encode those 24 bits into a unique GSI number and
>>>> restore them (by table lookup) on APIC injection inside the kernel. If
>>>> we only had to encode 256 different vectors, we would be done already.
>>>
>>> Right. But in practice guests always use distinct vectors (from the
>>> 256 available) for distinct messages. This is because
>>> the vector seems to be the only thing that gets communicated by the APIC
>>> to the software.
>>>
>>> So e.g. a table with 256 entries, with extra 1024-256
>>> used for spill-over for guests that do something unexpected,
>>> would work really well.
>>
>> Already Linux manages vectors on a per-CPU basis. For efficiency
>> reasons, it does not exploit the full range of 256 vectors but actually
>> allocates them in - IIRC - steps of 16. So I would not be surprised to
>> find lots of vector number "collisions" when looking over a full set of
>> CPUs in a system.
>>
>> Really, these considerations do not help us. We must store all 96 bits,
>> already for the sake of other KVM architectures that want MSI routing.
>>>
>>>
>>>>>
>>>>>>> a single GSI and vice versa. As there are less GSIs than possible MSI
>>>>>>> messages, we could run out of them when creating routes, statically or
>>>>>>> lazily.
>>>>>>>
>>>>>>> What would probably help us long-term out of your concerns regarding
>>>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
>>>>>>> messages, i.e. those that are not associated with an irqfd number or an
>>>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>>>>>>> address and data directly.
>>>>>>
>>>>>> This would be a trivial extension in fact. Given its beneficial impact
>>>>>> on our GSI limitation issue, I think I will hack up something like that.
>>>>>>
>>>>>> And maybe this makes a transparent cache more reasonable. Then only old
>>>>>> host kernels would force us to do searches for already cached messages.
>>>>>>
>>>>>> Jan
>>>>>
>>>>> Hmm, I'm not all that sure. Existing design really allows
>>>>> caching the route in various smart ways. We currently do
>>>>> this for irqfd but this can be extended to ioctls.
>>>>> If we just let the guest inject arbitrary messages,
>>>>> that becomes much more complex.
>>>>
>>>> irqfd and kvm device assignment do not allow us to inject arbitrary
>>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
>>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
>>>> routes from an MSI message to a GSI number (+they configure the related
>>>> backends).
>>>
>>> Yes, it's a very flexible API but it would be very hard to optimize.
>>> GSIs let us do the slow path setup, but they make it easy
>>> to optimize target lookup in kernel.
>>
>> Users of the API above have no need to know anything about GSIs. They
>> are an artifact of the KVM-internal interface between user space and
>> kernel now - thanks to the MSIRoutingCache encapsulation.
> 
> Yes but I am saying that the API above can't be implemented
> more efficiently than now: you will have to scan all apics on each MSI.
> The GSI implementation can be optimized: decode the vector once,
> if it matches a single vcpu, store that vcpu and use when sending
> interrupts.

Sorry, missed that you switched to kernel.

What information do you want to cache there that cannot be easily
obtained by looking at a concrete message? I do not see any. Once you
have checked that the delivery mode targets a specific CPU, you could
address it directly. Or are you thinking about some cluster mode?
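
As a rough illustration of what "looking at a concrete message" means
on x86, the sketch below pulls the routing fields out of the low
address word and the data word, using the documented x86 MSI layout.
The struct and function names (msi_fields, msi_decode,
msi_targets_single_cpu) are made up for this example, and the
single-CPU test only covers the simple physical, fixed-delivery case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct msi_fields {
    uint8_t vector;         /* data bits 7:0 */
    uint8_t delivery_mode;  /* data bits 10:8, 0 = fixed, 1 = lowest priority */
    bool    level_trigger;  /* data bit 15 */
    uint8_t dest_id;        /* address bits 19:12 */
    bool    redir_hint;     /* address bit 3 */
    bool    logical_dest;   /* address bit 2 */
};

/* Split an x86 MSI message into its routing-relevant fields. */
struct msi_fields msi_decode(uint32_t addr_lo, uint32_t data)
{
    struct msi_fields f;

    f.vector        = data & 0xff;
    f.delivery_mode = (data >> 8) & 0x7;
    f.level_trigger = (data >> 15) & 0x1;
    f.dest_id       = (addr_lo >> 12) & 0xff;
    f.redir_hint    = (addr_lo >> 3) & 0x1;
    f.logical_dest  = (addr_lo >> 2) & 0x1;
    return f;
}

/* True if the message names exactly one APIC by physical ID. */
bool msi_targets_single_cpu(const struct msi_fields *f)
{
    assert(f);
    return f->delivery_mode == 0 &&   /* fixed delivery */
           !f->logical_dest &&        /* physical destination mode */
           !f->redir_hint &&          /* no redirection */
           f->dest_id != 0xff;        /* 0xff is the physical broadcast ID */
}
```

Mapping dest_id to a vcpu still needs some lookup, and logical
(flat or cluster) destinations need more than this check.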

> 
> 
>>>
>>> An analogy would be if read/write operated on file paths.
>>> fd makes it easy to do permission checks and slow lookups
>>> in one place. GSI happens to work like this (maybe, by accident).
>>
>> Think of an opaque file handle as a MSIRoutingCache object. And it
>> encodes not only the routing handle but also other useful associated
>> information we need from time to time - internally, not in the device
>> models.
> 
> Forget qemu abstractions, I am talking about data path
> optimizations in kernel in kvm. From that POV the point of an fd is not
> that it is opaque. It is that it's an index in an array that
> can be used for fast lookups.
> 
>>>>>
>>>>> Another concern is mask bit emulation. We currently
>>>>> handle mask bit in userspace but patches
>>>>> to do them in kernel for assigned devices were seen
>>>>> and IMO we might want to do that for virtio as well.
>>>>>
>>>>> For that to work the mask bit needs to be tied to
>>>>> a specific gsi or specific device, which does not
>>>>> work if we just inject arbitrary writes.
>>>>
>>>> Yes, but I do not see those valuable plans being negatively affected.
>>>>
>>>> Jan
>>>>
>>>
>>> I do.
>>> How would we maintain a mask/pending bit in kernel if we are not
>>> supplied info on all available vectors even?
>>
>> It's tricky to discuss an undefined interface (there only exists an
>> outdated proposal for kvm device assignment). But I suppose that user
>> space will have to define the maximum number of vectors when creating an
>> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
>>
>> The number of used vectors will correlate with the number of registered
>> irqfds (in the case of vhost or vfio, device assignment still has
>> SET_MSIX_NR). As kernel space would then be responsible for mask
>> processing, user space would keep vectors registered with irqfds, even
>> if they are masked. It could just continue to play the trick and drop
>> data=0 vectors.
> 
> Which trick?  We don't play any tricks except for device assignment.
> 
>> The point here is: All those steps have _nothing_ to do with the generic
>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
>> an API. In contrast, msix_vector_use/unuse were generic services that
>> were actually created to please KVM requirements. But if we split that
>> up, we can address the generic MSI-X requirements in a way that makes
>> more sense for emulated devices (and particularly msix_vector_use makes
>> no sense for emulation).
>>
>> Jan
>>
> 
> We need at least msix_vector_unuse

Not at all. We rather need some qemu_irq_set(level) for MSI. The spec
requires that the device clears pending when the reason for that is
removed. And any removal that is device model-originated should simply
be signaled like an irq de-assert. Vector "unusage" is just one reason here.

> - IMO it makes more sense than "clear
> pending vector". msix_vector_use is good to keep around for symmetry:
> who knows whether we'll need to allocate resources per vector
> in the future.

For MSI[-X], the spec is already there, and we know that there is no need
for further resources when emulating it. Only KVM has special needs.

Jan


^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-18 22:13                                             ` [Qemu-devel] " Jan Kiszka
@ 2011-10-19  0:56                                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-19  0:56 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Wed, Oct 19, 2011 at 12:13:49AM +0200, Jan Kiszka wrote:
> On 2011-10-18 23:40, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
> >>> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
> >>>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> >>>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
> >>>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
> >>>>>>> What KVM has to do is just mapping an arbitrary MSI message
> >>>>>>> (theoretically 64+32 bits, in practice it's of course much less) to
> >>>>>>
> >>>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
> >>>>>> only a current interpretation of one specific arch. )
> >>>>>
> >>>>> Confused. vector mask is 8 bits. the rest is destination id etc.
> >>>>
> >>>> Right, but those additional bits like the destination make different
> >>>> messages. We have to encode those 24 bits into a unique GSI number and
> >>>> restore them (by table lookup) on APIC injection inside the kernel. If
> >>>> we only had to encode 256 different vectors, we would be done already.
> >>>
> >>> Right. But in practice guests always use distinct vectors (from the
> >>> 256 available) for distinct messages. This is because
> >>> the vector seems to be the only thing that gets communicated by the APIC
> >>> to the software.
> >>>
> >>> So e.g. a table with 256 entries, with extra 1024-256
> >>> used for spill-over for guests that do something unexpected,
> >>> would work really well.
> >>
> >> Already Linux manages vectors on a per-CPU basis. For efficiency
> >> reasons, it does not exploit the full range of 256 vectors but actually
> >> allocates them in - IIRC - steps of 16. So I would not be surprised to
> >> find lots of vector number "collisions" when looking over a full set of
> >> CPUs in a system.
> >>
> >> Really, these considerations do not help us. We must store all 96 bits,
> >> already for the sake of other KVM architectures that want MSI routing.
> >>>
> >>>
> >>>>>
> >>>>>>> a single GSI and vice versa. As there are less GSIs than possible MSI
> >>>>>>> messages, we could run out of them when creating routes, statically or
> >>>>>>> lazily.
> >>>>>>>
> >>>>>>> What would probably help us long-term out of your concerns regarding
> >>>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
> >>>>>>> messages, i.e. those that are not associated with an irqfd number or an
> >>>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> >>>>>>> address and data directly.
> >>>>>>
> >>>>>> This would be a trivial extension in fact. Given its beneficial impact
> >>>>>> on our GSI limitation issue, I think I will hack up something like that.
> >>>>>>
> >>>>>> And maybe this makes a transparent cache more reasonable. Then only old
> >>>>>> host kernels would force us to do searches for already cached messages.
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> Hmm, I'm not all that sure. Existing design really allows
> >>>>> caching the route in various smart ways. We currently do
> >>>>> this for irqfd but this can be extended to ioctls.
> >>>>> If we just let the guest inject arbitrary messages,
> >>>>> that becomes much more complex.
> >>>>
> >>>> irqfd and kvm device assignment do not allow us to inject arbitrary
> >>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
> >>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
> >>>> routes from an MSI message to a GSI number (+they configure the related
> >>>> backends).
> >>>
> >>> Yes, it's a very flexible API but it would be very hard to optimize.
> >>> GSIs let us do the slow path setup, but they make it easy
> >>> to optimize target lookup in kernel.
> >>
> >> Users of the API above have no need to know anything about GSIs. They
> >> are an artifact of the KVM-internal interface between user space and
> >> kernel now - thanks to the MSIRoutingCache encapsulation.
> > 
> > Yes but I am saying that the API above can't be implemented
> > more efficiently than now: you will have to scan all apics on each MSI.
> > The GSI implementation can be optimized: decode the vector once,
> > if it matches a single vcpu, store that vcpu and use when sending
> > interrupts.
> 
> Sorry, missed that you switched to kernel.
> 
> What information do you want to cache there that cannot be easily
> obtained by looking at a concrete message? I do not see any. Once you
> checked that the delivery mode targets a specific cpu, you could address
> it directly.

I thought we need to match APIC ID. That needs a table lookup, no?

> Or are you thinking about some cluster mode?

That too.

> > 
> > 
> >>>
> >>> An analogy would be if read/write operated on file paths.
> >>> fd makes it easy to do permission checks and slow lookups
> >>> in one place. GSI happens to work like this (maybe, by accident).
> >>
> >> Think of an opaque file handle as a MSIRoutingCache object. And it
> >> encodes not only the routing handle but also other useful associated
> >> information we need from time to time - internally, not in the device
> >> models.
> > 
> > Forget qemu abstractions, I am talking about data path
> > optimizations in kernel in kvm. From that POV the point of an fd is not
> > that it is opaque. It is that it's an index in an array that
> > can be used for fast lookups.
> > 
> >>>>>
> >>>>> Another concern is mask bit emulation. We currently
> >>>>> handle mask bit in userspace but patches
> >>>>> to do them in kernel for assigned devices were seen
> >>>>> and IMO we might want to do that for virtio as well.
> >>>>>
> >>>>> For that to work the mask bit needs to be tied to
> >>>>> a specific gsi or specific device, which does not
> >>>>> work if we just inject arbitrary writes.
> >>>>
> >>>> Yes, but I do not see those valuable plans being negatively affected.
> >>>>
> >>>> Jan
> >>>>
> >>>
> >>> I do.
> >>> How would we maintain a mask/pending bit in kernel if we are not
> >>> supplied info on all available vectors even?
> >>
> >> It's tricky to discuss an undefined interface (there only exists an
> >> outdated proposal for kvm device assignment). But I suppose that user
> >> space will have to define the maximum number of vectors when creating an
> >> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
> >>
> >> The number of used vectors will correlate with the number of registered
> >> irqfds (in the case of vhost or vfio, device assignment still has
> >> SET_MSIX_NR). As kernel space would then be responsible for mask
> >> processing, user space would keep vectors registered with irqfds, even
> >> if they are masked. It could just continue to play the trick and drop
> >> data=0 vectors.
> > 
> > Which trick?  We don't play any tricks except for device assignment.
> > 
> >> The point here is: All those steps have _nothing_ to do with the generic
> >> MSI-X core. They are KVM-specific "side channels" for which KVM provides
> >> an API. In contrast, msix_vector_use/unuse were generic services that
> >> were actually created to please KVM requirements. But if we split that
> >> up, we can address the generic MSI-X requirements in a way that makes
> >> more sense for emulated devices (and particularly msix_vector_use makes
> >> no sense for emulation).
> >>
> >> Jan
> >>
> > 
> > We need at least msix_vector_unuse
> 
> Not at all. We rather need some qemu_irq_set(level) for MSI.
> The spec
> requires that the device clears pending when the reason for that is
> removed. And any removal that is device model-originated should simply
> be signaled like an irq de-assert.

OK, this is a good argument.
In particular virtio ISR read could clear msix pending bit
(note: it would also need to clear irqfd as that is where
 we get the pending bit).

I would prefer not to use qemu_irq_set for this though.
We can add a level flag to msix_notify.
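
A minimal sketch of what such a level flag could look like, assuming a
simplified device state with one mask bit and one pending bit per
vector. msix_notify_level, msix_set_mask and struct msix_state are
hypothetical names for this example and not existing QEMU functions;
the delivered counter stands in for actually sending the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 64

struct msix_state {
    uint64_t masked;     /* one mask bit per vector */
    uint64_t pending;    /* pending bit array (PBA) */
    unsigned delivered;  /* messages actually sent, for illustration */
};

/* level = true: raise the interrupt; level = false: reason is gone. */
void msix_notify_level(struct msix_state *s, unsigned vector, bool level)
{
    uint64_t bit;

    assert(vector < NR_VECTORS);
    bit = UINT64_C(1) << vector;

    if (!level) {
        s->pending &= ~bit;   /* device withdrew the interrupt */
    } else if (s->masked & bit) {
        s->pending |= bit;    /* remember it until the guest unmasks */
    } else {
        s->delivered++;       /* the MSI message would be sent here */
    }
}

/* Guest writes the per-vector mask bit. */
void msix_set_mask(struct msix_state *s, unsigned vector, bool mask)
{
    uint64_t bit;

    assert(vector < NR_VECTORS);
    bit = UINT64_C(1) << vector;

    if (mask) {
        s->masked |= bit;
        return;
    }
    s->masked &= ~bit;
    if (s->pending & bit) {   /* deliver what was held back */
        s->pending &= ~bit;
        s->delivered++;
    }
}
```

A virtio ISR read would then call the level = false variant for the
affected vector, so a masked vector that is no longer pending does not
fire on unmask.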

> Vector "unusage" is just one reason here.

I don't see removing the use/unuse functions as a priority though,
but if we add an API that also lets devices say
'reason for interrupt is removed', that would be nice.

Removing extra code can then be done separately, and on qemu.git
not on qemu-kvm.

> > - IMO it makes more sense than "clear
> > pending vector". msix_vector_use is good to keep around for symmetry:
> > who knows whether we'll need to allocate resources per vector
> > in the future.
> 
> For MSI[-X], the spec is already there, and we know that there is no need
> for further resources when emulating it.
> Only KVM has special needs.
> 
> Jan
> 

It's not hard to speculate.  Imagine an out of process device that
shares guest memory and sends interrupts to qemu using eventfd. Suddenly
we need an fd per vector, and this without KVM.
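
For illustration, a Linux-only sketch of that scenario using
eventfd(2): each vector that is in use owns one eventfd, allocated and
released in use/unuse-style hooks. The vector_fds struct and the
vector_* helpers are invented for this example; nothing here is
existing QEMU code.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define NR_VECTORS 4

struct vector_fds {
    int fd[NR_VECTORS];
};

/* use: allocate the per-vector resource, here one eventfd. */
int vector_use(struct vector_fds *v, unsigned vector)
{
    assert(vector < NR_VECTORS);
    v->fd[vector] = eventfd(0, EFD_NONBLOCK);
    return v->fd[vector] < 0 ? -1 : 0;
}

/* unuse: release it again. */
void vector_unuse(struct vector_fds *v, unsigned vector)
{
    assert(vector < NR_VECTORS);
    close(v->fd[vector]);
    v->fd[vector] = -1;
}

/* Device side: signal the vector once. */
int vector_signal(const struct vector_fds *v, unsigned vector)
{
    assert(vector < NR_VECTORS);
    return eventfd_write(v->fd[vector], 1);
}

/* Receiving side: number of signals since the last poll, 0 if none. */
uint64_t vector_poll(const struct vector_fds *v, unsigned vector)
{
    eventfd_t count;

    assert(vector < NR_VECTORS);
    if (eventfd_read(v->fd[vector], &count) < 0) {
        return 0;             /* nothing pending (EAGAIN) or error */
    }
    return count;
}
```

In a real setup the device process would receive the fd over a socket
and the receiving side would wait on it in its main loop rather than
poll.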

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
@ 2011-10-19  0:56                                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-19  0:56 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Wed, Oct 19, 2011 at 12:13:49AM +0200, Jan Kiszka wrote:
> On 2011-10-18 23:40, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
> >>> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
> >>>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> >>>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
> >>>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
> >>>>>>> What KVM has to do is just mapping an arbitrary MSI message
> >>>>>>> (theoretically 64+32 bits, in practice it's of course much less) to
> >>>>>>
> >>>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
> >>>>>> only a current interpretation of one specific arch. )
> >>>>>
> >>>>> Confused. vector mask is 8 bits. the rest is destination id etc.
> >>>>
> >>>> Right, but those additional bits like the destination make different
> >>>> messages. We have to encode those 24 bits into a unique GSI number and
> >>>> restore them (by table lookup) on APIC injection inside the kernel. If
> >>>> we only had to encode 256 different vectors, we would be done already.
> >>>
> >>> Right. But in practice guests always use distinct vectors (from the
> >>> 256 available) for distinct messages. This is because
> >>> the vector seems to be the only thing that gets communicated by the APIC
> >>> to the software.
> >>>
> >>> So e.g. a table with 256 entries, with extra 1024-256
> >>> used for spill-over for guests that do something unexpected,
> >>> would work really well.
> >>
> >> Already Linux manages vectors on a per-CPU basis. For efficiency
> >> reasons, it does not exploit the full range of 256 vectors but actually
> >> allocates them in - IIRC - steps of 16. So I would not be surprised to
> >> find lots of vector number "collisions" when looking over a full set of
> >> CPUs in a system.
> >>
> >> Really, these considerations do not help us. We must store all 96 bits,
> >> already for the sake of other KVM architectures that want MSI routing.
> >>>
> >>>
> >>>>>
> >>>>>>> a single GSI and vice versa. As there are fewer GSIs than possible MSI
> >>>>>>> messages, we could run out of them when creating routes, statically or
> >>>>>>> lazily.
> >>>>>>>
> >>>>>>> What would probably help us long-term out of your concerns regarding
> >>>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
> >>>>>>> messages, i.e. those that are not associated with an irqfd number or an
> >>>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> >>>>>>> address and data directly.
> >>>>>>
> >>>>>> This would be a trivial extension in fact. Given its beneficial impact
> >>>>>> on our GSI limitation issue, I think I will hack up something like that.
> >>>>>>
> >>>>>> And maybe this makes a transparent cache more reasonable. Then only old
> >>>>>> host kernels would force us to do searches for already cached messages.
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> Hmm, I'm not all that sure. Existing design really allows
> >>>>> caching the route in various smart ways. We currently do
> >>>>> this for irqfd but this can be extended to ioctls.
> >>>>> If we just let the guest inject arbitrary messages,
> >>>>> that becomes much more complex.
> >>>>
> >>>> irqfd and kvm device assignment do not allow us to inject arbitrary
> >>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
> >>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
> >>>> routes from an MSI message to a GSI number (+they configure the related
> >>>> backends).
> >>>
> >>> Yes, it's a very flexible API but it would be very hard to optimize.
> >>> GSIs let us do the slow path setup, but they make it easy
> >>> to optimize target lookup in kernel.
> >>
> >> Users of the API above have no need to know anything about GSIs. They
> >> are an artifact of the KVM-internal interface between user space and
> >> kernel now - thanks to the MSIRoutingCache encapsulation.
> > 
> > Yes but I am saying that the API above can't be implemented
> > more efficiently than now: you will have to scan all apics on each MSI.
> > The GSI implementation can be optimized: decode the vector once,
> > if it matches a single vcpu, store that vcpu and use when sending
> > interrupts.
> 
> Sorry, missed that you switched to kernel.
> 
> What information do you want to cache there that cannot be easily
> obtained by looking at a concrete message? I do not see any. Once you
> checked that the delivery mode targets a specific cpu, you could address
> it directly.

I thought we need to match APIC ID. That needs a table lookup, no?

> Or are you thinking about some cluster mode?

That too.

> > 
> > 
> >>>
> >>> An analogy would be if read/write operated on file paths.
> >>> fd makes it easy to do permission checks and slow lookups
> >>> in one place. GSI happens to work like this (maybe, by accident).
> >>
> >> Think of an opaque file handle as a MSIRoutingCache object. And it
> >> encodes not only the routing handle but also other useful associated
> >> information we need from time to time - internally, not in the device
> >> models.
> > 
> > Forget qemu abstractions, I am talking about data path
> > optimizations in kernel in kvm. From that POV the point of an fd is not
> > that it is opaque. It is that it's an index in an array that
> > can be used for fast lookups.
> > 
> >>>>>
> >>>>> Another concern is mask bit emulation. We currently
> >>>>> handle mask bit in userspace but patches
> >>>>> to do them in kernel for assigned devices were seen
> >>>>> and IMO we might want to do that for virtio as well.
> >>>>>
> >>>>> For that to work the mask bit needs to be tied to
> >>>>> a specific gsi or specific device, which does not
> >>>>> work if we just inject arbitrary writes.
> >>>>
> >>>> Yes, but I do not see those valuable plans being negatively affected.
> >>>>
> >>>> Jan
> >>>>
> >>>
> >>> I do.
> >>> How would we maintain a mask/pending bit in kernel if we are not
> >>> supplied info on all available vectors even?
> >>
> >> It's tricky to discuss an undefined interface (there only exists an
> >> outdated proposal for kvm device assignment). But I suppose that user
> >> space will have to define the maximum number of vectors when creating an
> >> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
> >>
> >> The number of used vectors will correlate with the number of registered
> >> irqfds (in the case of vhost or vfio, device assignment still has
> >> SET_MSIX_NR). As kernel space would then be responsible for mask
> >> processing, user space would keep vectors registered with irqfds, even
> >> if they are masked. It could just continue to play the trick and drop
> >> data=0 vectors.
> > 
> > Which trick?  We don't play any tricks except for device assignment.
> > 
> >> The point here is: All those steps have _nothing_ to do with the generic
> >> MSI-X core. They are KVM-specific "side channels" for which KVM provides
> >> an API. In contrast, msix_vector_use/unuse were generic services that
> >> were actually created to please KVM requirements. But if we split that
> >> up, we can address the generic MSI-X requirements in a way that makes
> >> more sense for emulated devices (and particularly msix_vector_use makes
> >> no sense for emulation).
> >>
> >> Jan
> >>
> > 
> > We need at least msix_vector_unuse
> 
> Not at all. We rather need some qemu_irq_set(level) for MSI. The spec
> requires that the device clears pending when the reason for that is
> removed. And any removal that is device model-originated should simply
> be signaled like an irq de-assert.

OK, this is a good argument.
In particular, a virtio ISR read could clear the MSI-X pending bit
(note: it would also need to clear the irqfd, as that is where
 we get the pending bit).

I would prefer not to use qemu_irq_set for this though.
We can add a level flag to msix_notify.
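
To make the idea concrete, here is a minimal sketch of how a level flag
on a notify call could interact with the mask and pending bits. This is
not QEMU's msix code: struct toy_msix and the toy_msix_* helpers are
hypothetical names invented for illustration, and the sketch assumes at
most 64 vectors and counts deliveries instead of sending a message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MSIX_VECTORS 64

/* Hypothetical per-device state; not QEMU's real MSI-X structures. */
struct toy_msix {
    uint64_t masked;                      /* one mask bit per vector */
    uint64_t pending;                     /* one pending bit per vector */
    unsigned delivered[TOY_MSIX_VECTORS]; /* messages actually sent */
};

static bool toy_msix_is_pending(const struct toy_msix *s, unsigned vector)
{
    assert(vector < TOY_MSIX_VECTORS);
    return (s->pending >> vector) & 1;
}

/* level = true: raise the interrupt; level = false: the reason is gone. */
static void toy_msix_notify(struct toy_msix *s, unsigned vector, bool level)
{
    assert(vector < TOY_MSIX_VECTORS);
    uint64_t bit = UINT64_C(1) << vector;

    if (!level) {
        s->pending &= ~bit;         /* de-assert: drop a latched pending bit */
    } else if (s->masked & bit) {
        s->pending |= bit;          /* masked: latch instead of delivering */
    } else {
        s->delivered[vector]++;     /* unmasked: deliver right away */
    }
}

static void toy_msix_set_mask(struct toy_msix *s, unsigned vector, bool mask)
{
    assert(vector < TOY_MSIX_VECTORS);
    uint64_t bit = UINT64_C(1) << vector;

    if (mask) {
        s->masked |= bit;
        return;
    }
    s->masked &= ~bit;
    if (s->pending & bit) {         /* unmask: flush a latched message */
        s->pending &= ~bit;
        s->delivered[vector]++;
    }
}
```

With this shape, a device model that raises a masked vector and then
calls toy_msix_notify(s, vector, false) before the guest unmasks it will
not see a stale message delivered on unmask.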

> Vector "unusage" is just one reason here.

I don't see removing the use/unuse functions as a priority though,
but if we add an API that also lets devices say
'reason for interrupt is removed', that would be nice.

Removing extra code can then be done separately, and on qemu.git,
not on qemu-kvm.

> > - IMO it makes more sense than "clear
> > pending vector". msix_vector_use is good to keep around for symmetry:
> > who knows whether we'll need to allocate resources per vector
> > in the future.
> 
> For MSI[-X], the spec is already there, and we know that there is no need
> for further resources when emulating it. Only KVM has special needs.
> 
> Jan
> 

It's not hard to speculate.  Imagine an out of process device that
shares guest memory and sends interrupts to qemu using eventfd. Suddenly
we need an fd per vector, and this without KVM.

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-19  0:56                                               ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-19  6:41                                                 ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-19  6:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-19 02:56, Michael S. Tsirkin wrote:
> On Wed, Oct 19, 2011 at 12:13:49AM +0200, Jan Kiszka wrote:
>> On 2011-10-18 23:40, Michael S. Tsirkin wrote:
>>> On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
>>>>> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
>>>>>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>>>>>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>>>>>>>> What KVM has to do is just mapping an arbitrary MSI message
>>>>>>>>> (theoretically 64+32 bits, in practice it's of course much less) to
>>>>>>>>
>>>>>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>>>>>>>> only a current interpretation of one specific arch. )
>>>>>>>
>>>>>>> Confused. vector mask is 8 bits. the rest is destination id etc.
>>>>>>
>>>>>> Right, but those additional bits like the destination make different
>>>>>> messages. We have to encode those 24 bits into a unique GSI number and
>>>>>> restore them (by table lookup) on APIC injection inside the kernel. If
>>>>>> we only had to encode 256 different vectors, we would be done already.
>>>>>
>>>>> Right. But in practice guests always use distinct vectors (from the
>>>>> 256 available) for distinct messages. This is because
>>>>> the vector seems to be the only thing that gets communicated by the APIC
>>>>> to the software.
>>>>>
>>>>> So e.g. a table with 256 entries, with extra 1024-256
>>>>> used for spill-over for guests that do something unexpected,
>>>>> would work really well.
>>>>
>>>> Already Linux manages vectors on a per-CPU basis. For efficiency
>>>> reasons, it does not exploit the full range of 256 vectors but actually
>>>> allocates them in - IIRC - steps of 16. So I would not be surprised to
>>>> find lots of vector number "collisions" when looking over a full set of
>>>> CPUs in a system.
>>>>
>>>> Really, these considerations do not help us. We must store all 96 bits,
>>>> already for the sake of other KVM architectures that want MSI routing.
>>>>>
>>>>>
>>>>>>>
>>>>>>>>> a single GSI and vice versa. As there are fewer GSIs than possible MSI
>>>>>>>>> messages, we could run out of them when creating routes, statically or
>>>>>>>>> lazily.
>>>>>>>>>
>>>>>>>>> What would probably help us long-term out of your concerns regarding
>>>>>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
>>>>>>>>> messages, i.e. those that are not associated with an irqfd number or an
>>>>>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>>>>>>>>> address and data directly.
>>>>>>>>
>>>>>>>> This would be a trivial extension in fact. Given its beneficial impact
>>>>>>>> on our GSI limitation issue, I think I will hack up something like that.
>>>>>>>>
>>>>>>>> And maybe this makes a transparent cache more reasonable. Then only old
>>>>>>>> host kernels would force us to do searches for already cached messages.
>>>>>>>>
>>>>>>>> Jan
>>>>>>>
>>>>>>> Hmm, I'm not all that sure. Existing design really allows
>>>>>>> caching the route in various smart ways. We currently do
>>>>>>> this for irqfd but this can be extended to ioctls.
>>>>>>> If we just let the guest inject arbitrary messages,
>>>>>>> that becomes much more complex.
>>>>>>
>>>>>> irqfd and kvm device assignment do not allow us to inject arbitrary
>>>>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
>>>>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
>>>>>> routes from an MSI message to a GSI number (+they configure the related
>>>>>> backends).
>>>>>
>>>>> Yes, it's a very flexible API but it would be very hard to optimize.
>>>>> GSIs let us do the slow path setup, but they make it easy
>>>>> to optimize target lookup in kernel.
>>>>
>>>> Users of the API above have no need to know anything about GSIs. They
>>>> are an artifact of the KVM-internal interface between user space and
>>>> kernel now - thanks to the MSIRoutingCache encapsulation.
>>>
>>> Yes but I am saying that the API above can't be implemented
>>> more efficiently than now: you will have to scan all apics on each MSI.
>>> The GSI implementation can be optimized: decode the vector once,
>>> if it matches a single vcpu, store that vcpu and use when sending
>>> interrupts.
>>
>> Sorry, missed that you switched to kernel.
>>
>> What information do you want to cache there that cannot be easily
>> obtained by looking at a concrete message? I do not see any. Once you
>> checked that the delivery mode targets a specific cpu, you could address
>> it directly.
> 
> I thought we need to match APIC ID. That needs a table lookup, no?

Yes. But that's completely independent of a concrete MSI message. In
fact, this is the same thing we need when interpreting an IOAPIC
redirection table entry. So let's create an APIC ID lookup table for the
destination ID field, maybe multiple of them to match different modes,
but not an MSI message table.
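
As a rough illustration of such a lookup keyed on the destination ID
rather than on whole messages, here is a minimal sketch. The bit
positions follow the x86 MSI address/data layout (destination ID in
address bits 19:12, destination mode in address bit 2, vector in data
bits 7:0, delivery mode in data bits 10:8). Everything else is an
assumption made for the example: struct vcpu, struct apic_id_map and
the function names are hypothetical, not KVM's, and only fixed delivery
in physical destination mode takes the fast path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct msi_fields {
    uint8_t dest_id;
    uint8_t vector;
    uint8_t delivery_mode;  /* 0 = fixed */
    uint8_t logical_dest;   /* 0 = physical, 1 = logical */
};

/* Pull the routing-relevant fields out of an x86 MSI address/data pair. */
static struct msi_fields msi_decode(uint64_t addr, uint32_t data)
{
    struct msi_fields f;

    f.dest_id = (addr >> 12) & 0xff;
    f.logical_dest = (addr >> 2) & 1;
    f.vector = data & 0xff;
    f.delivery_mode = (data >> 8) & 0x7;
    return f;
}

/* Hypothetical vCPU handle and per-VM table; not KVM's real structures. */
struct vcpu {
    int index;
};

struct apic_id_map {
    struct vcpu *phys[256]; /* indexed by 8-bit physical APIC ID */
};

static void apic_id_map_set(struct apic_id_map *m, uint8_t apic_id,
                            struct vcpu *v)
{
    assert(m != NULL);
    m->phys[apic_id] = v;
}

/*
 * Return the single target vCPU for a message, or NULL when the message
 * needs the slow path (logical mode, broadcast, non-fixed delivery, or
 * an unknown APIC ID).
 */
static struct vcpu *msi_fast_target(const struct apic_id_map *m,
                                    uint64_t addr, uint32_t data)
{
    struct msi_fields f = msi_decode(addr, data);

    if (f.logical_dest || f.delivery_mode != 0 || f.dest_id == 0xff) {
        return NULL;
    }
    return m->phys[f.dest_id];
}
```

The table is filled whenever a vCPU's APIC ID changes, so the per-message
cost is one decode plus one array index, independent of how many distinct
MSI messages the guest programs.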

> 
>> Or are you thinking about some cluster mode?
> 
> That too.
> 
>>>
>>>
>>>>>
>>>>> An analogy would be if read/write operated on file paths.
>>>>> fd makes it easy to do permission checks and slow lookups
>>>>> in one place. GSI happens to work like this (maybe, by accident).
>>>>
>>>> Think of an opaque file handle as a MSIRoutingCache object. And it
>>>> encodes not only the routing handle but also other useful associated
>>>> information we need from time to time - internally, not in the device
>>>> models.
>>>
>>> Forget qemu abstractions, I am talking about data path
>>> optimizations in kernel in kvm. From that POV the point of an fd is not
>>> that it is opaque. It is that it's an index in an array that
>>> can be used for fast lookups.
>>>
>>>>>>>
>>>>>>> Another concern is mask bit emulation. We currently
>>>>>>> handle mask bit in userspace but patches
>>>>>>> to do them in kernel for assigned devices were seen
>>>>>>> and IMO we might want to do that for virtio as well.
>>>>>>>
>>>>>>> For that to work the mask bit needs to be tied to
>>>>>>> a specific gsi or specific device, which does not
>>>>>>> work if we just inject arbitrary writes.
>>>>>>
>>>>>> Yes, but I do not see those valuable plans being negatively affected.
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>
>>>>> I do.
>>>>> How would we maintain a mask/pending bit in kernel if we are not
>>>>> supplied info on all available vectors even?
>>>>
>>>> It's tricky to discuss an undefined interface (there only exists an
>>>> outdated proposal for kvm device assignment). But I suppose that user
>>>> space will have to define the maximum number of vectors when creating an
>>>> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
>>>>
>>>> The number of used vectors will correlate with the number of registered
>>>> irqfds (in the case of vhost or vfio, device assignment still has
>>>> SET_MSIX_NR). As kernel space would then be responsible for mask
>>>> processing, user space would keep vectors registered with irqfds, even
>>>> if they are masked. It could just continue to play the trick and drop
>>>> data=0 vectors.
>>>
>>> Which trick?  We don't play any tricks except for device assignment.
>>>
>>>> The point here is: All those steps have _nothing_ to do with the generic
>>>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
>>>> an API. In contrast, msix_vector_use/unuse were generic services that
>>>> were actually created to please KVM requirements. But if we split that
>>>> up, we can address the generic MSI-X requirements in a way that makes
>>>> more sense for emulated devices (and particularly msix_vector_use makes
>>>> no sense for emulation).
>>>>
>>>> Jan
>>>>
>>>
>>> We need at least msix_vector_unuse
>>
>> Not at all. We rather need some qemu_irq_set(level) for MSI. The spec
>> requires that the device clears pending when the reason for that is
>> removed. And any removal that is device model-originated should simply
>> be signaled like an irq de-assert.
> 
> OK, this is a good argument.
> In particular virtio ISR read could clear msix pending bit
> (note: it would also need to clear irqfd as that is where
>  we get the pending bit).
> 
> I would prefer not to use qemu_irq_set for this though.
> We can add a level flag to msix_notify.

No concerns.

> 
>> Vector "unusage" is just one reason here.
> 
> I don't see removing the use/unuse functions as a priority though,
> but if we add an API that also lets devices say
> 'reason for interrupt is removed', that would be nice.
> 
> Removing extra code can then be done separately, and on qemu.git
> not on qemu-kvm.

If we refrain from hacking KVM logic into the use/unuse services
upstream, we can do this later on. For me it is important that those
obsolete services do not block or complicate further cleanups of the MSI
layer nor bother device model creators with tasks they should not worry
about.

> 
>>> - IMO it makes more sense than "clear
>>> pending vector". msix_vector_use is good to keep around for symmetry:
>>> who knows whether we'll need to allocate resources per vector
>>> in the future.
>>
>> For MSI[-X], the spec is already there, and we know that there is no need
>> for further resources when emulating it. Only KVM has special needs.
>>
>> Jan
>>
> 
> It's not hard to speculate.  Imagine an out of process device that
> shares guest memory and sends interrupts to qemu using eventfd. Suddenly
> we need an fd per vector, and this without KVM.

That's what irqfd was invented for. Already works for vhost, and there
is nothing that prevents communicating the irqfd fd between two
processes. But note: irqfd handle, not a KVM-internal GSI.
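
For reference, the per-vector signalling primitive underneath is a plain
Linux eventfd. The sketch below shows only that part: one fd per vector,
a write to signal and a read to collect the accumulated count. The
vector_fd_* names are made up for this example, and binding the fd to an
interrupt (the irqfd step) as well as passing it to another process are
deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create one eventfd to stand for one interrupt vector. */
static int vector_fd_create(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Sender side: add 1 to the eventfd counter. Returns 0 on success. */
static int vector_fd_signal(int fd)
{
    uint64_t one = 1;

    assert(fd >= 0);
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Receiver side: fetch and reset the counter. Returns the number of
 * signals seen since the last drain, or 0 if nothing was pending (the
 * non-blocking read fails with EAGAIN in that case).
 */
static uint64_t vector_fd_drain(int fd)
{
    uint64_t count = 0;

    assert(fd >= 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

Note that several signals sent before a drain collapse into one counter
value, which matches the "pending" semantics of an edge interrupt.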

Jan


^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-19  6:41                                                 ` [Qemu-devel] " Jan Kiszka
@ 2011-10-19  9:03                                                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-19  9:03 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Wed, Oct 19, 2011 at 08:41:48AM +0200, Jan Kiszka wrote:
> >>>>>>>>> a single GSI and vice versa. As there are fewer GSIs than possible MSI
> >>>>>>>>> messages, we could run out of them when creating routes, statically or
> >>>>>>>>> lazily.
> >>>>>>>>>
> >>>>>>>>> What would probably help us long-term out of your concerns regarding
> >>>>>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
> >>>>>>>>> messages, i.e. those that are not associated with an irqfd number or an
> >>>>>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> >>>>>>>>> address and data directly.
> >>>>>>>>
> >>>>>>>> This would be a trivial extension in fact. Given its beneficial impact
> >>>>>>>> on our GSI limitation issue, I think I will hack up something like that.
> >>>>>>>>
> >>>>>>>> And maybe this makes a transparent cache more reasonable. Then only old
> >>>>>>>> host kernels would force us to do searches for already cached messages.
> >>>>>>>>
> >>>>>>>> Jan
> >>>>>>>
> >>>>>>> Hmm, I'm not all that sure. Existing design really allows
> >>>>>>> caching the route in various smart ways. We currently do
> >>>>>>> this for irqfd but this can be extended to ioctls.
> >>>>>>> If we just let the guest inject arbitrary messages,
> >>>>>>> that becomes much more complex.
> >>>>>>
> >>>>>> irqfd and kvm device assignment do not allow us to inject arbitrary
> >>>>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
> >>>>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
> >>>>>> routes from an MSI message to a GSI number (+they configure the related
> >>>>>> backends).
> >>>>>
> >>>>> Yes, it's a very flexible API but it would be very hard to optimize.
> >>>>> GSIs let us do the slow path setup, but they make it easy
> >>>>> to optimize target lookup in kernel.
> >>>>
> >>>> Users of the API above have no need to know anything about GSIs. They
> >>>> are an artifact of the KVM-internal interface between user space and
> >>>> kernel now - thanks to the MSIRoutingCache encapsulation.
> >>>
> >>> Yes but I am saying that the API above can't be implemented
> >>> more efficiently than now: you will have to scan all apics on each MSI.
> >>> The GSI implementation can be optimized: decode the vector once,
> >>> if it matches a single vcpu, store that vcpu and use when sending
> >>> interrupts.
> >>
> >> Sorry, missed that you switched to kernel.
> >>
> >> What information do you want to cache there that cannot be easily
> >> obtained by looking at a concrete message? I do not see any. Once you
> >> checked that the delivery mode targets a specific cpu, you could address
> >> it directly.
> > 
> > I thought we need to match APIC ID. That needs a table lookup, no?
> 
> Yes. But that's completely independent of a concrete MSI message. In
> fact, this is the same thing we need when interpreting an IOAPIC
> redirection table entry. So let's create an APIC ID lookup table for the
> destination ID field, maybe multiple of them to match different modes,
> but not a MSI message table.
> > 
> >> Or are you thinking about some cluster mode?
> > 
> > That too.

Hmm, might be a good idea. APIC IDs are 8 bit, right?


> >>>
> >>>
> >>>>>
> >>>>> An analogy would be if read/write operated on file paths.
> >>>>> fd makes it easy to do permission checks and slow lookups
> >>>>> in one place. GSI happens to work like this (maybe, by accident).
> >>>>
> >>>> Think of an opaque file handle as a MSIRoutingCache object. And it
> >>>> encodes not only the routing handle but also other useful associated
> >>>> information we need from time to time - internally, not in the device
> >>>> models.
> >>>
> >>> Forget qemu abstractions, I am talking about data path
> >>> optimizations in kernel in kvm. From that POV the point of an fd is not
> >>> that it is opaque. It is that it's an index in an array that
> >>> can be used for fast lookups.
> >>>
> >>>>>>>
> >>>>>>> Another concern is mask bit emulation. We currently
> >>>>>>> handle mask bit in userspace but patches
> >>>>>>> to do them in kernel for assigned devices were seen
> >>>>>>> and IMO we might want to do that for virtio as well.
> >>>>>>>
> >>>>>>> For that to work the mask bit needs to be tied to
> >>>>>>> a specific gsi or specific device, which does not
> >>>>>>> work if we just inject arbitrary writes.
> >>>>>>
> >>>>>> Yes, but I do not see those valuable plans being negatively affected.
> >>>>>>
> >>>>>> Jan
> >>>>>>
> >>>>>
> >>>>> I do.
> >>>>> How would we maintain a mask/pending bit in kernel if we are not
> >>>>> supplied info on all available vectors even?
> >>>>
> >>>> It's tricky to discuss an undefined interface (there only exists an
> >>>> outdated proposal for kvm device assignment). But I suppose that user
> >>>> space will have to define the maximum number of vectors when creating an
> >>>> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
> >>>>
> >>>> The number of used vectors will correlate with the number of registered
> >>>> irqfds (in the case of vhost or vfio, device assignment still has
> >>>> SET_MSIX_NR). As kernel space would then be responsible for mask
> >>>> processing, user space would keep vectors registered with irqfds, even
> >>>> if they are masked. It could just continue to play the trick and drop
> >>>> data=0 vectors.
> >>>
> >>> Which trick?  We don't play any tricks except for device assignment.
> >>>
> >>>> The point here is: All those steps have _nothing_ to do with the generic
> >>>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
> >>>> an API. In contrast, msix_vector_use/unuse were generic services that
> >>>> were actually created to please KVM requirements. But if we split that
> >>>> up, we can address the generic MSI-X requirements in a way that makes
> >>>> more sense for emulated devices (and particularly msix_vector_use makes
> >>>> no sense for emulation).
> >>>>
> >>>> Jan
> >>>>
> >>>
> >>> We need at least msix_vector_unuse
> >>
> >> Not at all. We rather need some qemu_irq_set(level) for MSI. The spec
> >> requires that the device clears pending when the reason for that is
> >> removed. And any removal that is device model-originated should simply
> >> be signaled like an irq de-assert.
> > 
> > OK, this is a good argument.
> > In particular virtio ISR read could clear msix pending bit
> > (note: it would also need to clear irqfd as that is where
> >  we get the pending bit).
> > 
> > I would prefer not to use qemu_irq_set for this though.
> > We can add a level flag to msix_notify.
> 
> No concerns.
> 
> > 
> >> Vector "unusage" is just one reason here.
> > 
> > I don't see removing the use/unuse functions as a priority though,
> > but if we add an API that also lets devices say
> > 'reason for interrupt is removed', that would be nice.
> > 
> > Removing extra code can then be done separately, and on qemu.git
> > not on qemu-kvm.
> 
> If we refrain from hacking KVM logic into the use/unuse services
> upstream, we can do this later on. For me it is important that those
> obsolete services do not block or complicate further cleanups of the MSI
> layer nor bother device model creators with tasks they should not worry
> about.

My assumption is devices shall keep calling use/unuse until we drop it.
Does not seem like a major bother. If you like, use all vectors
or just those with message != 0.

> > 
> >>> - IMO it makes more sense than "clear
> >>> pending vector". msix_vector_use is good to keep around for symmetry:
> >>> who knows whether we'll need to allocate resources per vector
> >>> in the future.
> >>
> >> For MSI[-X], the spec is already there, and we know that there is no need
> >> for further resources when emulating it.
> >> Only KVM has special needs.
> >>
> >> Jan
> >>
> > 
> > It's not hard to speculate.  Imagine an out of process device that
> > shares guest memory and sends interrupts to qemu using eventfd. Suddenly
> > we need an fd per vector, and this without KVM.
> 
> That's what irqfd was invented for. Already works for vhost, and there
> is nothing that prevents communicating the irqfd fd between two
> processes. But note: irqfd handle, not a KVM-internal GSI.
> 
> Jan
> 

Yes. But this still makes an API for acquiring per-vector resources a requirement.

-- 
MST

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-19  9:03                                                   ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-19 11:17                                                     ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-19 11:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On 2011-10-19 11:03, Michael S. Tsirkin wrote:
>>> I thought we need to match APIC ID. That needs a table lookup, no?
>>
>> Yes. But that's completely independent of a concrete MSI message. In
>> fact, this is the same thing we need when interpreting an IOAPIC
>> redirection table entry. So let's create an APIC ID lookup table for the
>> destination ID field, maybe multiple of them to match different modes,
>> but not a MSI message table.
>>>
>>>> Or are you thinking about some cluster mode?
>>>
>>> That too.
> 
> Hmm, might be a good idea. APIC IDs are 8 bit, right?

Yep (more generally: destination IDs). So even if we have to create
multiple lookup tables for the various modes, that won't consume
megabytes of RAM.
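
A minimal sketch of such a lookup table, to make the size argument
concrete. Everything here is an assumption for illustration: the names
DestIdTable, dest_table_* and msi_dest_id are hypothetical and not taken
from KVM or QEMU, and only the simple case of one destination ID mapping
to one vcpu is modeled (logical or cluster modes would need a different
value type, for example a vcpu bitmask). The one hardware fact used is
that the x86 MSI address carries the 8-bit destination ID in bits 19:12.

```c
#include <stdint.h>

#define DEST_ID_COUNT 256   /* 8-bit destination ID field */
#define NO_VCPU       (-1)  /* marker for "no vcpu has this ID" */

/* Hypothetical table type; one instance per destination mode. */
typedef struct DestIdTable {
    int16_t vcpu[DEST_ID_COUNT];   /* vcpu index, or NO_VCPU */
} DestIdTable;

/* Mark every destination ID as unassigned. */
void dest_table_init(DestIdTable *t)
{
    for (int i = 0; i < DEST_ID_COUNT; i++) {
        t->vcpu[i] = NO_VCPU;
    }
}

/* Slow path: called when a vcpu's APIC ID is set or changed. */
void dest_table_set(DestIdTable *t, uint8_t dest_id, int16_t vcpu)
{
    t->vcpu[dest_id] = vcpu;
}

/* x86 MSI address layout: destination ID lives in bits 19:12. */
uint8_t msi_dest_id(uint64_t msi_addr)
{
    return (uint8_t)((msi_addr >> 12) & 0xff);
}

/* Fast path: one array index per message, no scan over all APICs. */
int dest_table_lookup(const DestIdTable *t, uint64_t msi_addr)
{
    return t->vcpu[msi_dest_id(msi_addr)];
}
```

With 16-bit entries, one such table occupies 256 * 2 = 512 bytes, so
even several tables for different modes stay in the kilobyte range. The
same table could serve IOAPIC redirection entries, since it is keyed by
destination ID rather than by MSI message.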

> 
> 
>>>>>
>>>>>
>>>>>>>
>>>>>>> An analogy would be if read/write operated on file paths.
>>>>>>> fd makes it easy to do permission checks and slow lookups
>>>>>>> in one place. GSI happens to work like this (maybe, by accident).
>>>>>>
>>>>>> Think of an opaque file handle as a MSIRoutingCache object. And it
>>>>>> encodes not only the routing handle but also other useful associated
>>>>>> information we need from time to time - internally, not in the device
>>>>>> models.
>>>>>
>>>>> Forget qemu abstractions, I am talking about data path
>>>>> optimizations in kernel in kvm. From that POV the point of an fd is not
>>>>> that it is opaque. It is that it's an index in an array that
>>>>> can be used for fast lookups.
>>>>>
>>>>>>>>>
>>>>>>>>> Another concern is mask bit emulation. We currently
>>>>>>>>> handle mask bit in userspace but patches
>>>>>>>>> to do them in kernel for assigned devices were seen
>>>>>>>>> and IMO we might want to do that for virtio as well.
>>>>>>>>>
>>>>>>>>> For that to work the mask bit needs to be tied to
>>>>>>>>> a specific gsi or specific device, which does not
>>>>>>>>> work if we just inject arbitrary writes.
>>>>>>>>
>>>>>>>> Yes, but I do not see those valuable plans being negatively affected.
>>>>>>>>
>>>>>>>> Jan
>>>>>>>>
>>>>>>>
>>>>>>> I do.
>>>>>>> How would we maintain a mask/pending bit in kernel if we are not
>>>>>>> supplied info on all available vectors even?
>>>>>>
>>>>>> It's tricky to discuss an undefined interface (there only exists an
>>>>>> outdated proposal for kvm device assignment). But I suppose that user
>>>>>> space will have to define the maximum number of vectors when creating an
>>>>>> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
>>>>>>
>>>>>> The number of used vectors will correlate with the number of registered
>>>>>> irqfds (in the case of vhost or vfio, device assignment still has
>>>>>> SET_MSIX_NR). As kernel space would then be responsible for mask
>>>>>> processing, user space would keep vectors registered with irqfds, even
>>>>>> if they are masked. It could just continue to play the trick and drop
>>>>>> data=0 vectors.
>>>>>
>>>>> Which trick?  We don't play any tricks except for device assignment.
>>>>>
>>>>>> The point here is: All those steps have _nothing_ to do with the generic
>>>>>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
>>>>>> an API. In contrast, msix_vector_use/unuse were generic services that
>>>>>> were actually created to please KVM requirements. But if we split that
>>>>>> up, we can address the generic MSI-X requirements in a way that makes
>>>>>> more sense for emulated devices (and particularly msix_vector_use makes
>>>>>> no sense for emulation).
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>
>>>>> We need at least msix_vector_unuse
>>>>
>>>> Not at all. We rather need some qemu_irq_set(level) for MSI. The spec
>>>> requires that the device clears pending when the reason for that is
>>>> removed. And any removal that is device model-originated should simply
>>>> be signaled like an irq de-assert.
>>>
>>> OK, this is a good argument.
>>> In particular virtio ISR read could clear msix pending bit
>>> (note: it would also need to clear irqfd as that is where
>>>  we get the pending bit).
>>>
>>> I would prefer not to use qemu_irq_set for this though.
>>> We can add a level flag to msix_notify.
>>
>> No concerns.
>>
>>>
>>>> Vector "unusage" is just one reason here.
>>>
>>> I don't see removing the use/unuse functions as a priority though,
>>> but if we add an API that also lets devices say
>>> 'reason for interrupt is removed', that would be nice.
>>>
>>> Removing extra code can then be done separately, and on qemu.git
>>> not on qemu-kvm.
>>
>> If we refrain from hacking KVM logic into the use/unuse services
>> upstream, we can do this later on. For me it is important that those
>> obsolete services do not block or complicate further cleanups of the MSI
>> layer nor bother device model creators with tasks they should not worry
>> about.
> 
> My assumption is devices shall keep calling use/unuse until we drop it.
> Does not seem like a major bother. If you like, use all vectors
> or just those with message != 0.

What about letting only those devices call use/unuse that sometimes need
fewer than the maximum number of vectors? All others would benefit from a
use_all executed on enable and an unuse_all on disable/reset/uninit.

> 
>>>
>>>>> - IMO it makes more sense than "clear
>>>>> pending vector". msix_vector_use is good to keep around for symmetry:
>>>>> who knows whether we'll need to allocate resources per vector
>>>>> in the future.
>>>>
>>>> For MSI[-X], the spec is already there, and we know that there is no need
>>>> for further resources when emulating it.
>>>> Only KVM has special needs.
>>>>
>>>> Jan
>>>>
>>>
>>> It's not hard to speculate.  Imagine an out of process device that
>>> shares guest memory and sends interrupts to qemu using eventfd. Suddenly
>>> we need an fd per vector, and this without KVM.
>>
>> That's what irqfd was invented for. Already works for vhost, and there
>> is nothing that prevents communicating the irqfd fd between two
>> processes. But note: irqfd handle, not a KVM-internal GSI.
>>
>> Jan
>>
> 
> Yes. But this still makes an API for acquiring per-vector resources a requirement.

Yes, but a different one than current use/unuse. And it will be an
optional one, only for those devices that need to establish irq/eventfd
channels.
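
As a rough sketch of what such an optional per-vector resource could
look like, assuming a Linux host: one eventfd per vector, created only
by devices that ask for it. The VectorChannel type and the
vector_channel_* helpers are hypothetical names for illustration and
are not part of QEMU or KVM; only eventfd(2), read(2), write(2) and
close(2) are real interfaces.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-vector channel backed by one eventfd. */
typedef struct VectorChannel {
    int fd;   /* eventfd descriptor, or -1 when not acquired */
} VectorChannel;

/* Acquire the channel; only devices that need one would call this. */
int vector_channel_acquire(VectorChannel *ch)
{
    ch->fd = eventfd(0, EFD_NONBLOCK);
    return ch->fd < 0 ? -1 : 0;
}

/* Release the channel again, e.g. on disable, reset or uninit. */
void vector_channel_release(VectorChannel *ch)
{
    if (ch->fd >= 0) {
        close(ch->fd);
        ch->fd = -1;
    }
}

/* Sender side: each 8-byte write adds its value to the eventfd counter. */
int vector_channel_signal(VectorChannel *ch)
{
    uint64_t one = 1;
    return write(ch->fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Receiver side: a read returns the accumulated count and resets it to
 * zero. With EFD_NONBLOCK, reading an empty counter fails with EAGAIN,
 * which is reported here as 0 pending signals.
 */
uint64_t vector_channel_drain(VectorChannel *ch)
{
    uint64_t count = 0;
    if (read(ch->fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

The descriptor can be passed to another process (for example over a
Unix socket), which is what makes it usable as a handle between an
out-of-process device and qemu without exposing any GSI number.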

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-19 11:17                                                     ` [Qemu-devel] " Jan Kiszka
@ 2011-10-20 22:02                                                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-20 22:02 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Wed, Oct 19, 2011 at 01:17:03PM +0200, Jan Kiszka wrote:
> On 2011-10-19 11:03, Michael S. Tsirkin wrote:
> >>> I thought we need to match APIC ID. That needs a table lookup, no?
> >>
> >> Yes. But that's completely independent of a concrete MSI message. In
> >> fact, this is the same thing we need when interpreting an IOAPIC
> >> redirection table entry. So let's create an APIC ID lookup table for the
> >> destination ID field, maybe multiple of them to match different modes,
> >> but not a MSI message table.
> >>>
> >>>> Or are you thinking about some cluster mode?
> >>>
> >>> That too.
> > 
> > Hmm, might be a good idea. APIC IDs are 8 bit, right?
> 
> Yep (more generally: destination IDs). So even if we have to create
> multiple lookup tables for the various modes, that won't consume
> megabytes of RAM.
> 
> > 
> > 
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>> An analogy would be if read/write operated on file paths.
> >>>>>>> fd makes it easy to do permission checks and slow lookups
> >>>>>>> in one place. GSI happens to work like this (maybe, by accident).
> >>>>>>
> >>>>>> Think of an opaque file handle as a MSIRoutingCache object. And it
> >>>>>> encodes not only the routing handle but also other useful associated
> >>>>>> information we need from time to time - internally, not in the device
> >>>>>> models.
> >>>>>
> >>>>> Forget qemu abstractions, I am talking about data path
> >>>>> optimizations in kernel in kvm. From that POV the point of an fd is not
> >>>>> that it is opaque. It is that it's an index in an array that
> >>>>> can be used for fast lookups.
> >>>>>
> >>>>>>>>>
> >>>>>>>>> Another concern is mask bit emulation. We currently
> >>>>>>>>> handle mask bit in userspace but patches
> >>>>>>>>> to do them in kernel for assigned devices were seen
> >>>>>>>>> and IMO we might want to do that for virtio as well.
> >>>>>>>>>
> >>>>>>>>> For that to work the mask bit needs to be tied to
> >>>>>>>>> a specific gsi or specific device, which does not
> >>>>>>>>> work if we just inject arbitrary writes.
> >>>>>>>>
> >>>>>>>> Yes, but I do not see those valuable plans being negatively affected.
> >>>>>>>>
> >>>>>>>> Jan
> >>>>>>>>
> >>>>>>>
> >>>>>>> I do.
> >>>>>>> How would we maintain a mask/pending bit in kernel if we are not
> >>>>>>> supplied info on all available vectors even?
> >>>>>>
> >>>>>> It's tricky to discuss an undefined interface (there only exists an
> >>>>>> outdated proposal for kvm device assignment). But I suppose that user
> >>>>>> space will have to define the maximum number of vectors when creating an
> >>>>>> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
> >>>>>>
> >>>>>> The number of used vectors will correlate with the number of registered
> >>>>>> irqfds (in the case of vhost or vfio, device assignment still has
> >>>>>> SET_MSIX_NR). As kernel space would then be responsible for mask
> >>>>>> processing, user space would keep vectors registered with irqfds, even
> >>>>>> if they are masked. It could just continue to play the trick and drop
> >>>>>> data=0 vectors.
> >>>>>
> >>>>> Which trick?  We don't play any tricks except for device assignment.
> >>>>>
> >>>>>> The point here is: All those steps have _nothing_ to do with the generic
> >>>>>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
> >>>>>> an API. In contrast, msix_vector_use/unuse were generic services that
> >>>>>> were actually created to please KVM requirements. But if we split that
> >>>>>> up, we can address the generic MSI-X requirements in a way that makes
> >>>>>> more sense for emulated devices (and particularly msix_vector_use makes
> >>>>>> no sense for emulation).
> >>>>>>
> >>>>>> Jan
> >>>>>>
> >>>>>
> >>>>> We need at least msix_vector_unuse
> >>>>
> >>>> Not at all. We rather need some qemu_irq_set(level) for MSI. The spec
> >>>> requires that the device clears pending when the reason for that is
> >>>> removed. And any removal that is device model-originated should simply
> >>>> be signaled like an irq de-assert.
> >>>
> >>> OK, this is a good argument.
> >>> In particular virtio ISR read could clear msix pending bit
> >>> (note: it would also need to clear irqfd as that is where
> >>>  we get the pending bit).
> >>>
> >>> I would prefer not to use qemu_irq_set for this though.
> >>> We can add a level flag to msix_notify.
> >>
> >> No concerns.
> >>
> >>>
> >>>> Vector "unusage" is just one reason here.
> >>>
> >>> I don't see removing the use/unuse functions as a priority though,
> >>> but if we add an API that also lets devices say
> >>> 'reason for interrupt is removed', that would be nice.
> >>>
> >>> Removing extra code can then be done separately, and on qemu.git
> >>> not on qemu-kvm.
> >>
> >> If we refrain from hacking KVM logic into the use/unuse services
> >> upstream, we can do this later on. For me it is important that those
> >> obsolete services do not block or complicate further cleanups of the MSI
> >> layer nor bother device model creators with tasks they should not worry
> >> about.
> > 
> > My assumption is devices shall keep calling use/unuse until we drop it.
> > Does not seem like a major bother. If you like, use all vectors
> > or just those with message != 0.
> 
> What about letting only those devices call use/unuse that sometimes need
> fewer vectors than the maximum? All others would benefit from a use_all
> executed on enable and an unuse_all on disable/reset/uninit.

Sure, I don't mind adding use_all/unuse_all wrappers.

> > 
> >>>
> >>>>> - IMO it makes more sense than "clear
> >>>>> pending vector". msix_vector_use is good to keep around for symmetry:
> >>>>> who knows whether we'll need to allocate resources per vector
> >>>>> in the future.
> >>>>
> >>>> For MSI[-X], the spec is already there, and we know that there is no
> >>>> need for further resources when emulating it. Only KVM has special
> >>>> needs.
> >>>>
> >>>> Jan
> >>>>
> >>>
> >>> It's not hard to speculate.  Imagine an out of process device that
> >>> shares guest memory and sends interrupts to qemu using eventfd. Suddenly
> >>> we need an fd per vector, and this without KVM.
> >>
> >> That's what irqfd was invented for. Already works for vhost, and there
> >> is nothing that prevents communicating the irqfd fd between two
> >> processes. But note: irqfd handle, not a KVM-internal GSI.
> >>
> >> Jan
> >>
> > 
> > Yes. But this still makes an API for acquiring per-vector resources a requirement.
> 
> Yes, but a different one than current use/unuse.

What's wrong with use/unuse as an API? It's already in place
and virtio calls it.

> And it will be an
> optional one, only for those devices that need to establish irq/eventfd
> channels.
> 
> Jan

Not sure this should be up to the device.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread
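
The "level flag for msix_notify" idea from the message above can be sketched
in C as follows. This is only an illustration under stated assumptions, not
code from the series: SketchMSIXState, sketch_msix_notify and
sketch_msix_set_mask are hypothetical names, and the state layout is far
simpler than QEMU's real MSI-X table. It shows a raise (level = true) being
latched as pending while the vector is masked, and a de-assert
(level = false) clearing the pending bit once the reason for the interrupt is
gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_VECTORS 64   /* hypothetical limit for this sketch */

/* Hypothetical, simplified per-device MSI-X state (not QEMU's layout). */
typedef struct {
    uint64_t masked;     /* bit n set: vector n is masked */
    uint64_t pending;    /* bit n set: vector n is pending */
    unsigned delivered;  /* messages that would have been sent to the guest */
} SketchMSIXState;

/*
 * level = true:  the device raises the interrupt.
 * level = false: the reason for the interrupt was removed, so any pending
 *                state for this vector is dropped.
 */
static void sketch_msix_notify(SketchMSIXState *s, unsigned vector, bool level)
{
    assert(vector < SKETCH_MAX_VECTORS);
    uint64_t bit = UINT64_C(1) << vector;

    if (!level) {
        s->pending &= ~bit;          /* de-assert: clear pending */
        return;
    }
    if (s->masked & bit) {
        s->pending |= bit;           /* masked: latch as pending */
        return;
    }
    s->delivered++;                  /* unmasked: deliver immediately */
}

/* Unmasking a vector delivers a latched pending interrupt, if any. */
static void sketch_msix_set_mask(SketchMSIXState *s, unsigned vector, bool mask)
{
    assert(vector < SKETCH_MAX_VECTORS);
    uint64_t bit = UINT64_C(1) << vector;

    if (mask) {
        s->masked |= bit;
        return;
    }
    s->masked &= ~bit;
    if (s->pending & bit) {
        s->pending &= ~bit;
        s->delivered++;
    }
}
```

With such an interface, a device model would call the notify function with
level = false wherever it removes the interrupt reason (for example on an ISR
read), instead of relying on a separate vector "unuse" call to drop pending
state.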

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-20 22:02                                                       ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-21  7:09                                                         ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-21  7:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On 2011-10-21 00:02, Michael S. Tsirkin wrote:
>>> Yes. But this still makes an API for acquiring per-vector resources a requirement.
>>
>> Yes, but a different one than current use/unuse.
> 
> What's wrong with use/unuse as an API? It's already in place
> and virtio calls it.

Not for that purpose. It remains a useless API in the absence of KVM's
requirements.

> 
>> And it will be an
>> optional one, only for those devices that need to establish irq/eventfd
>> channels.
>>
>> Jan
> 
> Not sure this should be up to the device.

The device provides the fd. At least it acquires and associates it.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-21  7:09                                                         ` [Qemu-devel] " Jan Kiszka
@ 2011-10-21  7:54                                                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-21  7:54 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On Fri, Oct 21, 2011 at 09:09:10AM +0200, Jan Kiszka wrote:
> On 2011-10-21 00:02, Michael S. Tsirkin wrote:
> >>> Yes. But this still makes an API for acquiring per-vector resources a requirement.
> >>
> >> Yes, but a different one than current use/unuse.
> > 
> > What's wrong with use/unuse as an API? It's already in place
> > and virtio calls it.
> 
> Not for that purpose. It remains a useless API in the absence of KVM's
> requirements.
> 

Sorry, I don't understand. This can acquire whatever resources
necessary. It does not seem to make sense to rip it out
only to add a different one back in.

> > 
> >> And it will be an
> >> optional one, only for those devices that need to establish irq/eventfd
> >> channels.
> >>
> >> Jan
> > 
> > Not sure this should be up to the device.
> 
> The device provides the fd. At least it acquires and associates it.
> 
> Jan

It would surely be beneficial to be able to have a uniform
API so that devices don't need to be recoded to be moved
in this way.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-21  7:54                                                           ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-10-21  9:27                                                             ` Jan Kiszka
  -1 siblings, 0 replies; 288+ messages in thread
From: Jan Kiszka @ 2011-10-21  9:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, kvm, qemu-devel

On 2011-10-21 09:54, Michael S. Tsirkin wrote:
> On Fri, Oct 21, 2011 at 09:09:10AM +0200, Jan Kiszka wrote:
>> On 2011-10-21 00:02, Michael S. Tsirkin wrote:
>>>>> Yes. But this still makes an API for acquiring per-vector resources a requirement.
>>>>
>>>> Yes, but a different one than current use/unuse.
>>>
>>> What's wrong with use/unuse as an API? It's already in place
>>> and virtio calls it.
>>
>> Not for that purpose. It remains a useless API in the absence of KVM's
>> requirements.
>>
> 
> Sorry, I don't understand. This can acquire whatever resources
> necessary. It does not seem to make sense to rip it out
> only to add a different one back in.
> 
>>>
>>>> And it will be an
>>>> optional one, only for those devices that need to establish irq/eventfd
>>>> channels.
>>>>
>>>> Jan
>>>
>>> Not sure this should be up to the device.
>>
>> The device provides the fd. At least it acquires and associates it.
>>
>> Jan
> 
> It would surely be beneficial to be able to have a uniform
> API so that devices don't need to be recoded to be moved
> in this way.

The point is that the current API is useless for devices that do not
have to declare any vector to the core. By forcing them to call into
that API, we solve no current problem automatically. We rather need
associate_vector_with_x (and the reverse). And that only for devices that
have different backends than user space models.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread
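
To make the "associate_vector_with_x (and the reverse)" proposal from the
message above more concrete, here is a minimal C sketch. It assumes that "x"
is a file descriptor such as an eventfd or irqfd handle. All names
(SketchAssocDev, sketch_associate_vector_with_fd, sketch_dissociate_vector)
are hypothetical and do not exist in QEMU. The point is that the association
is optional: a purely emulated device never calls these functions, and only
a device with a non-user-space backend registers a handle per vector.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ASSOC_VECTORS 8   /* hypothetical table size */
#define SKETCH_NO_FD (-1)        /* marks a vector without a handle */

/* Hypothetical per-device table mapping MSI-X vectors to backend handles. */
typedef struct {
    int vector_fd[SKETCH_ASSOC_VECTORS];
} SketchAssocDev;

/* Start with no vector associated to any handle. */
static void sketch_assoc_init(SketchAssocDev *d)
{
    for (size_t i = 0; i < SKETCH_ASSOC_VECTORS; i++) {
        d->vector_fd[i] = SKETCH_NO_FD;
    }
}

/*
 * Associate a vector with a backend handle.
 * Returns 0 on success, -1 if the vector is out of range, the handle is
 * invalid, or the vector already has a handle.
 */
static int sketch_associate_vector_with_fd(SketchAssocDev *d,
                                           unsigned vector, int fd)
{
    if (vector >= SKETCH_ASSOC_VECTORS || fd < 0) {
        return -1;
    }
    if (d->vector_fd[vector] != SKETCH_NO_FD) {
        return -1;
    }
    d->vector_fd[vector] = fd;
    return 0;
}

/*
 * Reverse operation: drop the association and hand the handle back to the
 * caller, which still owns it. Returns SKETCH_NO_FD if there was none.
 */
static int sketch_dissociate_vector(SketchAssocDev *d, unsigned vector)
{
    if (vector >= SKETCH_ASSOC_VECTORS) {
        return SKETCH_NO_FD;
    }
    int fd = d->vector_fd[vector];
    d->vector_fd[vector] = SKETCH_NO_FD;
    return fd;
}
```

In this sketch the device acquires the handle and passes it in, which matches
the position that "the device provides the fd"; whether the core or the
device should own that step is exactly what the thread leaves open.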

* Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
  2011-10-21  9:27                                                             ` [Qemu-devel] " Jan Kiszka
@ 2011-10-21 10:57                                                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 288+ messages in thread
From: Michael S. Tsirkin @ 2011-10-21 10:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson, qemu-devel

On Fri, Oct 21, 2011 at 11:27:48AM +0200, Jan Kiszka wrote:
> On 2011-10-21 09:54, Michael S. Tsirkin wrote:
> > On Fri, Oct 21, 2011 at 09:09:10AM +0200, Jan Kiszka wrote:
> >> On 2011-10-21 00:02, Michael S. Tsirkin wrote:
> >>>>> Yes. But this still makes an API for acquiring per-vector resources a requirement.
> >>>>
> >>>> Yes, but a different one than current use/unuse.
> >>>
> >>> What's wrong with use/unuse as an API? It's already in place
> >>> and virtio calls it.
> >>
> >> Not for that purpose. It remains a useless API in the absence of KVM's
> >> requirements.
> >>
> > 
> > Sorry, I don't understand. This can acquire whatever resources
> > necessary. It does not seem to make sense to rip it out
> > only to add a different one back in.
> > 
> >>>
> >>>> And it will be an
> >>>> optional one, only for those devices that need to establish irq/eventfd
> >>>> channels.
> >>>>
> >>>> Jan
> >>>
> >>> Not sure this should be up to the device.
> >>
> >> The device provides the fd. At least it acquires and associates it.
> >>
> >> Jan
> > 
> > It would surely be beneficial to be able to have a uniform
> > API so that devices don't need to be recoded to be moved
> > in this way.
> 
> The point is that the current API is useless for devices that do not
> have to declare any vector to the core.

Don't assigned devices want this as well?
They handle 0-address vectors specially, and
this hack absolutely doesn't belong in pci core ...

> By forcing them to call into
> that API, we solve no current problem automatically. We rather need
> associate_vector_with_x (and the reverse). And that only for devices that
> have different backends than user space models.
> 
> Jan

I'll need to think about this; I would prefer this series not
to get blocked on this issue. We more or less agreed
to add use_all/unuse_all for now?

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 288+ messages in thread
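
The use_all/unuse_all wrappers tentatively agreed on above could look roughly
like the following C sketch. It is an assumption-laden illustration, not the
eventual patch: sketch_vector_use and sketch_vector_unuse are simplified
stand-ins for the per-vector msix_vector_use/msix_vector_unuse services
discussed in the thread, and SketchUseDev is a hypothetical device structure.
The wrappers loop over every vector the device declared, and use_all rolls
back on failure so a device is never left half-registered.

```c
#include <assert.h>

#define SKETCH_TABLE_SIZE 4   /* hypothetical capacity of the vector table */

/* Hypothetical device state: declared vector count plus use counters. */
typedef struct {
    unsigned nr_vectors;
    unsigned use_count[SKETCH_TABLE_SIZE];
} SketchUseDev;

/* Stand-in for a per-vector "use" service; fails for out-of-range vectors. */
static int sketch_vector_use(SketchUseDev *d, unsigned vector)
{
    if (vector >= SKETCH_TABLE_SIZE) {
        return -1;
    }
    d->use_count[vector]++;
    return 0;
}

/* Stand-in for a per-vector "unuse" service; ignores unused vectors. */
static void sketch_vector_unuse(SketchUseDev *d, unsigned vector)
{
    if (vector < SKETCH_TABLE_SIZE && d->use_count[vector] > 0) {
        d->use_count[vector]--;
    }
}

/* Mark all declared vectors as used, e.g. when MSI-X gets enabled. */
static int sketch_use_all(SketchUseDev *d)
{
    for (unsigned v = 0; v < d->nr_vectors; v++) {
        if (sketch_vector_use(d, v) < 0) {
            /* Roll back the vectors that were already marked. */
            while (v-- > 0) {
                sketch_vector_unuse(d, v);
            }
            return -1;
        }
    }
    return 0;
}

/* Release all declared vectors, e.g. on disable, reset or uninit. */
static void sketch_unuse_all(SketchUseDev *d)
{
    for (unsigned v = 0; v < d->nr_vectors; v++) {
        sketch_vector_unuse(d, v);
    }
}
```

Devices that always need the maximum number of vectors would call only the
two wrappers, while devices that sometimes need fewer vectors would keep
calling the per-vector services directly.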

end of thread, other threads:[~2011-10-21 11:53 UTC | newest]

Thread overview: 288+ messages
2011-10-17  9:27 [RFC][PATCH 00/45] qemu-kvm: MSI layer rework for in-kernel irqchip support Jan Kiszka
2011-10-17  9:27 ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 01/45] msi: Guard msi/msix_write_config with msi_present Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 02/45] msi: Guard msi_reset " Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 03/45] msi: Use msi/msix_present more consistently Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 04/45] msi: Invoke msi/msix_reset from PCI core Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 05/45] msi: Invoke msi/msix_write_config " Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 06/45] msix: Prevent bogus mask updates on MMIO accesses Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17 11:10   ` Michael S. Tsirkin
2011-10-17 11:10     ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 11:23     ` Jan Kiszka
2011-10-17 11:23       ` [Qemu-devel] " Jan Kiszka
2011-10-17 11:57       ` Michael S. Tsirkin
2011-10-17 11:57         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 12:07         ` Jan Kiszka
2011-10-17 12:07           ` [Qemu-devel] " Jan Kiszka
2011-10-17 12:50           ` Michael S. Tsirkin
2011-10-17 12:50             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:11             ` Jan Kiszka
2011-10-17 19:11               ` [Qemu-devel] " Jan Kiszka
2011-10-17 19:43               ` Michael S. Tsirkin
2011-10-17 19:43                 ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17  9:27 ` [RFC][PATCH 07/45] msi: Generalize msix_supported to msi_supported Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 08/45] Introduce MSIMessage structure Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17 11:46   ` Michael S. Tsirkin
2011-10-17 11:46     ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 11:51     ` Jan Kiszka
2011-10-17 11:51       ` [Qemu-devel] " Jan Kiszka
2011-10-17 12:04       ` Michael S. Tsirkin
2011-10-17 12:04         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 12:09         ` Jan Kiszka
2011-10-17 12:09           ` [Qemu-devel] " Jan Kiszka
2011-10-17 13:01           ` Michael S. Tsirkin
2011-10-17 13:01             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:14             ` Jan Kiszka
2011-10-17 19:14               ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 09/45] msi: Factor out msi_message_from_vector Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 10/45] msix: Factor out msix_message_from_vector Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 11/45] msi: Factor out delivery hook Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17 10:56   ` Avi Kivity
2011-10-17 10:56     ` [Qemu-devel] " Avi Kivity
2011-10-17 11:15     ` Jan Kiszka
2011-10-17 11:15       ` [Qemu-devel] " Jan Kiszka
2011-10-17 11:22       ` Avi Kivity
2011-10-17 11:22         ` [Qemu-devel] " Avi Kivity
2011-10-17 11:29         ` Jan Kiszka
2011-10-17 11:29           ` [Qemu-devel] " Jan Kiszka
2011-10-17 12:14           ` Avi Kivity
2011-10-17 12:14             ` [Qemu-devel] " Avi Kivity
2011-10-17 18:59             ` Jan Kiszka
2011-10-17 18:59               ` [Qemu-devel] " Jan Kiszka
2011-10-17 13:41       ` Michael S. Tsirkin
2011-10-17 13:41         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 13:41         ` Avi Kivity
2011-10-17 13:41           ` [Qemu-devel] " Avi Kivity
2011-10-17 13:48           ` Michael S. Tsirkin
2011-10-17 13:48             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:18             ` Jan Kiszka
2011-10-17 19:18               ` [Qemu-devel] " Jan Kiszka
2011-10-17 13:43   ` Michael S. Tsirkin
2011-10-17 13:43     ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:15     ` Jan Kiszka
2011-10-17 19:15       ` [Qemu-devel] " Jan Kiszka
2011-10-18 12:05       ` Michael S. Tsirkin
2011-10-18 12:05         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 12:23         ` Jan Kiszka
2011-10-18 12:23           ` [Qemu-devel] " Jan Kiszka
2011-10-18 12:38           ` Michael S. Tsirkin
2011-10-18 12:38             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 12:41             ` Jan Kiszka
2011-10-18 12:41               ` [Qemu-devel] " Jan Kiszka
2011-10-18 12:44             ` malc
2011-10-18 12:44               ` [Qemu-devel] " malc
2011-10-18 12:49               ` Michael S. Tsirkin
2011-10-18 12:49                 ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17  9:27 ` [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17 11:06   ` Avi Kivity
2011-10-17 11:06     ` [Qemu-devel] " Avi Kivity
2011-10-17 11:19     ` Jan Kiszka
2011-10-17 11:19       ` [Qemu-devel] " Jan Kiszka
2011-10-17 11:25       ` Avi Kivity
2011-10-17 11:25         ` [Qemu-devel] " Avi Kivity
2011-10-17 11:31         ` Jan Kiszka
2011-10-17 11:31           ` [Qemu-devel] " Jan Kiszka
2011-10-17 12:17           ` Avi Kivity
2011-10-17 12:17             ` [Qemu-devel] " Avi Kivity
2011-10-17 15:37       ` Michael S. Tsirkin
2011-10-17 15:37         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:19         ` Jan Kiszka
2011-10-17 19:19           ` [Qemu-devel] " Jan Kiszka
2011-10-18 12:17           ` Michael S. Tsirkin
2011-10-18 12:17             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 12:26             ` Jan Kiszka
2011-10-18 12:26               ` [Qemu-devel] " Jan Kiszka
2011-10-17 15:43   ` Michael S. Tsirkin
2011-10-17 15:43     ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:23     ` Jan Kiszka
2011-10-17 19:23       ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 13/45] hpet: Use msi_deliver Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 14/45] qemu-kvm: Drop useless kvm_clear_gsi_routes Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 15/45] qemu-kvm: Drop unused kvm_del_irq_route Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 16/45] qemu-kvm: Use MSIMessage and MSIRoutingCache Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 17/45] qemu-kvm: Track MSIRoutingCache in KVM routing table Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17 11:13   ` Avi Kivity
2011-10-17 11:13     ` [Qemu-devel] " Avi Kivity
2011-10-17 11:25     ` Jan Kiszka
2011-10-17 11:25       ` [Qemu-devel] " Jan Kiszka
2011-10-17 12:15       ` Avi Kivity
2011-10-17 12:15         ` [Qemu-devel] " Avi Kivity
2011-10-17  9:27 ` [RFC][PATCH 18/45] qemu-kvm: Hook into MSI delivery at APIC level Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 19/45] qemu-kvm: Factor out kvm_msi_irqfd_set Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 20/45] qemu-kvm: msix: Only invoke msix_handle_mask_update on changes Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 21/45] qemu-kvm: msix: Don't fire notifier spuriously on set/unset Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17 12:16   ` Michael S. Tsirkin
2011-10-17 12:16     ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:00     ` Jan Kiszka
2011-10-17 19:00       ` [Qemu-devel] " Jan Kiszka
2011-10-18 12:40       ` Michael S. Tsirkin
2011-10-18 12:40         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 12:45         ` Jan Kiszka
2011-10-18 12:45           ` [Qemu-devel] " Jan Kiszka
2011-10-18 12:57           ` Michael S. Tsirkin
2011-10-18 12:57             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17  9:27 ` [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17 11:40   ` Michael S. Tsirkin
2011-10-17 11:40     ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 11:45     ` Jan Kiszka
2011-10-17 11:45       ` [Qemu-devel] " Jan Kiszka
2011-10-17 12:39       ` Michael S. Tsirkin
2011-10-17 12:39         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:08         ` Jan Kiszka
2011-10-17 19:08           ` [Qemu-devel] " Jan Kiszka
2011-10-18 13:46           ` Michael S. Tsirkin
2011-10-18 13:46             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 13:49             ` Jan Kiszka
2011-10-18 13:49               ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 24/45] qemu-kvm: msix: Don't handle mask updated while disabled Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:27 ` [RFC][PATCH 25/45] qemu-kvm: Update MSI cache on kvm_msi_irqfd_set Jan Kiszka
2011-10-17  9:27   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 26/45] qemu-kvm: Use g_realloc for irq_routes extension Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 27/45] qemu-kvm: Lazily update MSI caches Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17 15:48   ` Michael S. Tsirkin
2011-10-17 15:48     ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:28     ` Jan Kiszka
2011-10-17 19:28       ` [Qemu-devel] " Jan Kiszka
2011-10-18 11:58       ` Michael S. Tsirkin
2011-10-18 11:58         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 12:08         ` Jan Kiszka
2011-10-18 12:08           ` [Qemu-devel] " Jan Kiszka
2011-10-18 12:33           ` Michael S. Tsirkin
2011-10-18 12:33             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 12:38             ` Jan Kiszka
2011-10-18 12:38               ` [Qemu-devel] " Jan Kiszka
2011-10-18 12:48               ` Michael S. Tsirkin
2011-10-18 12:48                 ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 13:00                 ` Jan Kiszka
2011-10-18 13:00                   ` [Qemu-devel] " Jan Kiszka
2011-10-18 13:37                   ` Michael S. Tsirkin
2011-10-18 13:37                     ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 13:46                     ` Jan Kiszka
2011-10-18 13:46                       ` [Qemu-devel] " Jan Kiszka
2011-10-18 14:01                       ` Michael S. Tsirkin
2011-10-18 14:01                         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 14:08                         ` Jan Kiszka
2011-10-18 14:08                           ` [Qemu-devel] " Jan Kiszka
2011-10-18 15:08                           ` Michael S. Tsirkin
2011-10-18 15:08                             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 15:22                             ` Jan Kiszka
2011-10-18 15:22                               ` [Qemu-devel] " Jan Kiszka
2011-10-18 15:55                               ` Jan Kiszka
2011-10-18 15:55                                 ` [Qemu-devel] " Jan Kiszka
2011-10-18 17:06                                 ` Michael S. Tsirkin
2011-10-18 17:06                                   ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 18:24                                   ` Jan Kiszka
2011-10-18 18:24                                     ` [Qemu-devel] " Jan Kiszka
2011-10-18 18:40                                     ` Michael S. Tsirkin
2011-10-18 18:40                                       ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 19:37                                       ` Jan Kiszka
2011-10-18 19:37                                         ` [Qemu-devel] " Jan Kiszka
2011-10-18 21:40                                         ` Michael S. Tsirkin
2011-10-18 21:40                                           ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 22:13                                           ` Jan Kiszka
2011-10-18 22:13                                             ` [Qemu-devel] " Jan Kiszka
2011-10-19  0:56                                             ` Michael S. Tsirkin
2011-10-19  0:56                                               ` [Qemu-devel] " Michael S. Tsirkin
2011-10-19  6:41                                               ` Jan Kiszka
2011-10-19  6:41                                                 ` [Qemu-devel] " Jan Kiszka
2011-10-19  9:03                                                 ` Michael S. Tsirkin
2011-10-19  9:03                                                   ` [Qemu-devel] " Michael S. Tsirkin
2011-10-19 11:17                                                   ` Jan Kiszka
2011-10-19 11:17                                                     ` [Qemu-devel] " Jan Kiszka
2011-10-20 22:02                                                     ` Michael S. Tsirkin
2011-10-20 22:02                                                       ` [Qemu-devel] " Michael S. Tsirkin
2011-10-21  7:09                                                       ` Jan Kiszka
2011-10-21  7:09                                                         ` [Qemu-devel] " Jan Kiszka
2011-10-21  7:54                                                         ` Michael S. Tsirkin
2011-10-21  7:54                                                           ` [Qemu-devel] " Michael S. Tsirkin
2011-10-21  9:27                                                           ` Jan Kiszka
2011-10-21  9:27                                                             ` [Qemu-devel] " Jan Kiszka
2011-10-21 10:57                                                             ` Michael S. Tsirkin
2011-10-21 10:57                                                               ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 18:26                                   ` Jan Kiszka
2011-10-18 18:26                                     ` [Qemu-devel] " Jan Kiszka
2011-10-18 15:56                               ` Michael S. Tsirkin
2011-10-18 15:56                                 ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 15:58                                 ` Jan Kiszka
2011-10-18 15:58                                   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 29/45] pci-assign: Drop kvm_assigned_irq::host_irq initialization Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 30/45] pci-assign: Rename assign_irq to assign_intx Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 31/45] qemu-kvm: Refactor kvm_deassign_irq to kvm_device_irq_deassign Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 32/45] pci-assign: Factor out deassign_irq Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 33/45] qemu-kvm: Factor out kvm_device_intx_assign Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 34/45] qemu-kvm: Factor out kvm_device_msi_assign Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 35/45] pci-assign: Polish assigned_dev_update_msix_mmio Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 36/45] qemu-kvm: Factor out kvm_device_msix_* services Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 37/45] qemu-kvm: Clean up irqrouting API Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 38/45] msi: Implement config notifiers for legacy MSI Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 39/45] pci-assign: Use generic MSI support Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 40/45] qemu-kvm: msix: Drop check for preexisting cap from msix_add_config Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 41/45] msix: Drop unused msix_bar_size Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 42/45] msix: Introduce msix_init_simple Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17 11:22   ` Michael S. Tsirkin
2011-10-17 11:22     ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 11:27     ` Jan Kiszka
2011-10-17 11:27       ` [Qemu-devel] " Jan Kiszka
2011-10-17 14:28       ` Michael S. Tsirkin
2011-10-17 14:28         ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:21         ` Jan Kiszka
2011-10-17 19:21           ` [Qemu-devel] " Jan Kiszka
2011-10-18 10:52           ` Michael S. Tsirkin
2011-10-18 10:52             ` [Qemu-devel] " Michael S. Tsirkin
2011-10-18 11:02             ` Jan Kiszka
2011-10-18 11:02               ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 43/45] msix: Allow to customize capability on init Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 44/45] pci-assign: Use generic MSI-X support Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17  9:28 ` [RFC][PATCH 45/45] pci-assign: Fix coding style issues Jan Kiszka
2011-10-17  9:28   ` [Qemu-devel] " Jan Kiszka
2011-10-17 12:18 ` [RFC][PATCH 00/45] qemu-kvm: MSI layer rework for in-kernel irqchip support Avi Kivity
2011-10-17 12:18   ` [Qemu-devel] " Avi Kivity
2011-10-17 15:57 ` Michael S. Tsirkin
2011-10-17 15:57   ` [Qemu-devel] " Michael S. Tsirkin
2011-10-17 19:35   ` Jan Kiszka
2011-10-17 19:35     ` [Qemu-devel] " Jan Kiszka
