All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error
@ 2016-08-23  9:27 Cao jin
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error Cao jin
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Cao jin @ 2016-08-23  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jiri Pirko, Gerd Hoffmann, Dmitry Fleytman, Jason Wang,
	Michael S. Tsirkin, Hannes Reinecke, Paolo Bonzini,
	Alex Williamson, Markus Armbruster, Marcel Apfelbaum

v2 changelog:
1. Separate one patch out: "e1000e: remove internal interrupt flag"
2. Separate coding style related code into  patch 2
3. Add commit message body for patch 1
3. Add function comment for msix_init() in patch 3
4. Convert msix_init_exclusive_bar() to error
5. Other minor changes to patch 3, according to Markus's suggestion.

Cao jin (5):
  msix_init: assert programming error
  msix: Follow CODING_STYLE
  pci: Convert msix_init() to Error and fix callers to check it
  megasas: remove unnecessary megasas_use_msix()
  megasas: undo the overwrites of user configuration

CC: Jiri Pirko <jiri@resnulli.us>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Dmitry Fleytman <dmitry@daynix.com>
CC: Jason Wang <jasowang@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Hannes Reinecke <hare@suse.de>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Marcel Apfelbaum <marcel@redhat.com>

 hw/block/nvme.c        |  5 +++-
 hw/misc/ivshmem.c      |  8 +++---
 hw/net/e1000e.c        |  2 +-
 hw/net/rocker/rocker.c |  4 ++-
 hw/net/vmxnet3.c       | 42 +++++++++--------------------
 hw/pci/msix.c          | 45 ++++++++++++++++++++++++--------
 hw/scsi/megasas.c      | 49 +++++++++++++++++++---------------
 hw/usb/hcd-xhci.c      | 71 ++++++++++++++++++++++++++++++--------------------
 hw/vfio/pci.c          |  7 +++--
 hw/virtio/virtio-pci.c |  8 ++----
 include/hw/pci/msix.h  |  5 ++--
 11 files changed, 140 insertions(+), 106 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-08-23  9:27 [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin
@ 2016-08-23  9:27 ` Cao jin
  2016-09-12 13:29   ` Markus Armbruster
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 2/5] msix: Follow CODING_STYLE Cao jin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Cao jin @ 2016-08-23  9:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Markus Armbruster, Marcel Apfelbaum, Michael S. Tsirkin

The input parameters is used for creating the msix capable device, so
they must obey the PCI spec, or else, it should be programming error.

CC: Markus Armbruster <armbru@redhat.com>
CC: Marcel Apfelbaum <marcel@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/pci/msix.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 0ec1cb1..384a29d 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -253,9 +253,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
         return -ENOTSUP;
     }
 
-    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
-        return -EINVAL;
-    }
+    assert(nentries >= 1 && nentries <= PCI_MSIX_FLAGS_QSIZE + 1);
 
     table_size = nentries * PCI_MSIX_ENTRY_SIZE;
     pba_size = QEMU_ALIGN_UP(nentries, 64) / 8;
@@ -266,7 +264,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
         table_offset + table_size > memory_region_size(table_bar) ||
         pba_offset + pba_size > memory_region_size(pba_bar) ||
         (table_offset | pba_offset) & PCI_MSIX_FLAGS_BIRMASK) {
-        return -EINVAL;
+        assert(0);
     }
 
     cap = pci_add_capability(dev, PCI_CAP_ID_MSIX, cap_pos, MSIX_CAP_LENGTH);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v2 2/5] msix: Follow CODING_STYLE
  2016-08-23  9:27 [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error Cao jin
@ 2016-08-23  9:27 ` Cao jin
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 3/5] pci: Convert msix_init() to Error and fix callers to check it Cao jin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Cao jin @ 2016-08-23  9:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Markus Armbruster, Marcel Apfelbaum, Michael S. Tsirkin

CC: Markus Armbruster <armbru@redhat.com>
CC: Marcel Apfelbaum <marcel@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/pci/msix.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 384a29d..0aadc0c 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -445,8 +445,10 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 {
     MSIMessage msg;
 
-    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) {
         return;
+    }
+
     if (msix_is_masked(dev, vector)) {
         msix_set_pending(dev, vector);
         return;
@@ -481,8 +483,10 @@ void msix_reset(PCIDevice *dev)
 /* Mark vector as used. */
 int msix_vector_use(PCIDevice *dev, unsigned vector)
 {
-    if (vector >= dev->msix_entries_nr)
+    if (vector >= dev->msix_entries_nr) {
         return -EINVAL;
+    }
+
     dev->msix_entry_used[vector]++;
     return 0;
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v2 3/5] pci: Convert msix_init() to Error and fix callers to check it
  2016-08-23  9:27 [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error Cao jin
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 2/5] msix: Follow CODING_STYLE Cao jin
@ 2016-08-23  9:27 ` Cao jin
  2016-09-12 13:47   ` Markus Armbruster
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 4/5] megasas: remove unnecessary megasas_use_msix() Cao jin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Cao jin @ 2016-08-23  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jiri Pirko, Gerd Hoffmann, Dmitry Fleytman, Jason Wang,
	Michael S. Tsirkin, Hannes Reinecke, Paolo Bonzini,
	Alex Williamson, Markus Armbruster, Marcel Apfelbaum

msix_init() reports errors with error_report(), which is wrong when
it's used in realize().  The same issue was fixed for msi_init() in
commit 1108b2f.

For some devices like e1000e & vmxnet3 who won't fail because of
msi_init's failure, suppress the error report by passing NULL error object.

CC: Jiri Pirko <jiri@resnulli.us>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Dmitry Fleytman <dmitry@daynix.com>
CC: Jason Wang <jasowang@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Hannes Reinecke <hare@suse.de>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/block/nvme.c        |  5 +++-
 hw/misc/ivshmem.c      |  8 +++---
 hw/net/e1000e.c        |  2 +-
 hw/net/rocker/rocker.c |  4 ++-
 hw/net/vmxnet3.c       | 42 +++++++++--------------------
 hw/pci/msix.c          | 31 ++++++++++++++++++----
 hw/scsi/megasas.c      | 26 ++++++++++++++----
 hw/usb/hcd-xhci.c      | 71 ++++++++++++++++++++++++++++++--------------------
 hw/vfio/pci.c          |  7 +++--
 hw/virtio/virtio-pci.c |  8 ++----
 include/hw/pci/msix.h  |  5 ++--
 11 files changed, 125 insertions(+), 84 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index cef3bb4..ae84dc7 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -829,6 +829,7 @@ static int nvme_init(PCIDevice *pci_dev)
 {
     NvmeCtrl *n = NVME(pci_dev);
     NvmeIdCtrl *id = &n->id_ctrl;
+    Error *err = NULL;
 
     int i;
     int64_t bs_size;
@@ -870,7 +871,9 @@ static int nvme_init(PCIDevice *pci_dev)
     pci_register_bar(&n->parent_obj, 0,
         PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
         &n->iomem);
-    msix_init_exclusive_bar(&n->parent_obj, n->num_queues, 4);
+    if (msix_init_exclusive_bar(&n->parent_obj, n->num_queues, 4, &err)) {
+        error_report_err(err);
+    }
 
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 40a2ebc..a1060ec 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -750,13 +750,13 @@ static void ivshmem_reset(DeviceState *d)
     }
 }
 
-static int ivshmem_setup_interrupts(IVShmemState *s)
+static int ivshmem_setup_interrupts(IVShmemState *s, Error **errp)
 {
     /* allocate QEMU callback data for receiving interrupts */
     s->msi_vectors = g_malloc0(s->vectors * sizeof(MSIVector));
 
     if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
-        if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1)) {
+        if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1, errp)) {
             return -1;
         }
 
@@ -897,8 +897,8 @@ static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
         qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
                               ivshmem_read, NULL, s);
 
-        if (ivshmem_setup_interrupts(s) < 0) {
-            error_setg(errp, "failed to initialize interrupts");
+        if (ivshmem_setup_interrupts(s, errp) < 0) {
+            error_prepend(errp, "Failed to initialize interrupts: ");
             return;
         }
     }
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index bad43f4..72aad21 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -292,7 +292,7 @@ e1000e_init_msix(E1000EState *s)
                         E1000E_MSIX_IDX, E1000E_MSIX_TABLE,
                         &s->msix,
                         E1000E_MSIX_IDX, E1000E_MSIX_PBA,
-                        0xA0);
+                        0xA0, NULL);
 
     if (res < 0) {
         trace_e1000e_msix_init_fail(res);
diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index 30f2ce4..e421ebb 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1256,14 +1256,16 @@ static int rocker_msix_init(Rocker *r)
 {
     PCIDevice *dev = PCI_DEVICE(r);
     int err;
+    Error *local_err = NULL;
 
     err = msix_init(dev, ROCKER_MSIX_VEC_COUNT(r->fp_ports),
                     &r->msix_bar,
                     ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_TABLE_OFFSET,
                     &r->msix_bar,
                     ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_PBA_OFFSET,
-                    0);
+                    0, &local_err);
     if (err) {
+        error_report_err(local_err);
         return err;
     }
 
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 90f6943..4824f8d 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2181,32 +2181,6 @@ vmxnet3_use_msix_vectors(VMXNET3State *s, int num_vectors)
     return true;
 }
 
-static bool
-vmxnet3_init_msix(VMXNET3State *s)
-{
-    PCIDevice *d = PCI_DEVICE(s);
-    int res = msix_init(d, VMXNET3_MAX_INTRS,
-                        &s->msix_bar,
-                        VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_TABLE,
-                        &s->msix_bar,
-                        VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_PBA(s),
-                        VMXNET3_MSIX_OFFSET(s));
-
-    if (0 > res) {
-        VMW_WRPRN("Failed to initialize MSI-X, error %d", res);
-        s->msix_used = false;
-    } else {
-        if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
-            VMW_WRPRN("Failed to use MSI-X vectors, error %d", res);
-            msix_uninit(d, &s->msix_bar, &s->msix_bar);
-            s->msix_used = false;
-        } else {
-            s->msix_used = true;
-        }
-    }
-    return s->msix_used;
-}
-
 static void
 vmxnet3_cleanup_msix(VMXNET3State *s)
 {
@@ -2315,9 +2289,19 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
      * is a programming error. Fall back to INTx silently on -ENOTSUP */
     assert(!ret || ret == -ENOTSUP);
 
-    if (!vmxnet3_init_msix(s)) {
-        VMW_WRPRN("Failed to initialize MSI-X, configuration is inconsistent.");
-    }
+    ret = msix_init(pci_dev, VMXNET3_MAX_INTRS,
+                    &s->msix_bar,
+                    VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_TABLE,
+                    &s->msix_bar,
+                    VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_PBA(s),
+                    VMXNET3_MSIX_OFFSET(s), NULL);
+    /* Any error other than -ENOTSUP(board's MSI support is broken)
+     * is a programming error. Fall back to INTx silently on -ENOTSUP */
+    assert(!ret || ret == -ENOTSUP);
+    s->msix_used = !ret;
+    /* VMXNET3_MAX_INTRS is passed, so it will never fail when mark vector.
+     * For simplicity, no need to check return value. */
+    vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS);
 
     vmxnet3_net_init(s);
 
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 0aadc0c..568c051 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -21,6 +21,7 @@
 #include "hw/pci/pci.h"
 #include "hw/xen/xen.h"
 #include "qemu/range.h"
+#include "qapi/error.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -238,11 +239,29 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
     }
 }
 
-/* Initialize the MSI-X structures */
+/* Make PCI device @dev MSI-X capable
+ * @nentries is the max number of MSI-X vectors that the device support.
+ * @table_bar is the MemoryRegion that MSI-X table structure resides.
+ * @table_bar_nr is number of base address register corresponding to @table_bar.
+ * @table_offset indicates the offset that the MSI-X table structure starts with
+ * in @table_bar.
+ * @pba_bar is the MemoryRegion that the Pending Bit Array structure resides.
+ * @pba_bar_nr is number of base address register corresponding to @pba_bar.
+ * @pba_offset indicates the offset that the Pending Bit Array structure
+ * starts with in @pba_bar.
+ * Non-zero @cap_pos puts capability MSI-X at that offset in PCI config space.
+ * @errp is for returning errors.
+ *
+ * Return 0 on success; set @errp and return -errno on error.
+ * -ENOTSUP means lacking msi support for a msi-capable platform.
+ * -EINVAL means capability overlap, happens when @cap_pos is non-zero,
+ * also means a programming error, except device assignment, which can check
+ * if a real HW is broken.*/
 int msix_init(struct PCIDevice *dev, unsigned short nentries,
               MemoryRegion *table_bar, uint8_t table_bar_nr,
               unsigned table_offset, MemoryRegion *pba_bar,
-              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos)
+              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
+              Error **errp)
 {
     int cap;
     unsigned table_size, pba_size;
@@ -250,6 +269,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
 
     /* Nothing to do if MSI is not supported by interrupt controller */
     if (!msi_nonbroken) {
+        error_setg(errp, "MSI-X is not supported by interrupt controller");
         return -ENOTSUP;
     }
 
@@ -267,7 +287,8 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
         assert(0);
     }
 
-    cap = pci_add_capability(dev, PCI_CAP_ID_MSIX, cap_pos, MSIX_CAP_LENGTH);
+    cap = pci_add_capability2(dev, PCI_CAP_ID_MSIX,
+                              cap_pos, MSIX_CAP_LENGTH, errp);
     if (cap < 0) {
         return cap;
     }
@@ -304,7 +325,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
 }
 
 int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
-                            uint8_t bar_nr)
+                            uint8_t bar_nr, Error **errp)
 {
     int ret;
     char *name;
@@ -336,7 +357,7 @@ int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
     ret = msix_init(dev, nentries, &dev->msix_exclusive_bar, bar_nr,
                     0, &dev->msix_exclusive_bar,
                     bar_nr, bar_pba_offset,
-                    0);
+                    0, errp);
     if (ret) {
         return ret;
     }
diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index e968302..6d45025 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2349,16 +2349,32 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
 
     memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
                           "megasas-mmio", 0x4000);
+    if (megasas_use_msix(s)) {
+        ret = msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
+                        &s->mmio_io, b->mmio_bar, 0x3800, 0x68, &err);
+        /* Any error other than -ENOTSUP(board's MSI support is broken)
+         * is a programming error */
+        assert(!ret || ret == -ENOTSUP);
+        if (ret && s->msix == ON_OFF_AUTO_ON) {
+            /* Can't satisfy user's explicit msix=on request, fail */
+            error_append_hint(&err, "You have to use msix=auto (default) or "
+                    "msix=off with this machine type.\n");
+            /* No instance_finalize method, need to free the resource here */
+            object_unref(OBJECT(&s->mmio_io));
+            error_propagate(errp, err);
+            return;
+        } else if (ret) {
+            /* With msix=auto, we fall back to MSI off silently */
+            s->msix = ON_OFF_AUTO_OFF;
+            error_free(err);
+        }
+    }
+
     memory_region_init_io(&s->port_io, OBJECT(s), &megasas_port_ops, s,
                           "megasas-io", 256);
     memory_region_init_io(&s->queue_io, OBJECT(s), &megasas_queue_ops, s,
                           "megasas-queue", 0x40000);
 
-    if (megasas_use_msix(s) &&
-        msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
-                  &s->mmio_io, b->mmio_bar, 0x3800, 0x68)) {
-        s->msix = ON_OFF_AUTO_OFF;
-    }
     if (pci_is_express(dev)) {
         pcie_endpoint_cap_init(dev, 0xa0);
     }
diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 188f954..4280c5d 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3594,25 +3594,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
     dev->config[PCI_CACHE_LINE_SIZE] = 0x10;
     dev->config[0x60] = 0x30; /* release number */
 
-    usb_xhci_init(xhci);
-
-    if (xhci->msi != ON_OFF_AUTO_OFF) {
-        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
-        /* Any error other than -ENOTSUP(board's MSI support is broken)
-         * is a programming error */
-        assert(!ret || ret == -ENOTSUP);
-        if (ret && xhci->msi == ON_OFF_AUTO_ON) {
-            /* Can't satisfy user's explicit msi=on request, fail */
-            error_append_hint(&err, "You have to use msi=auto (default) or "
-                    "msi=off with this machine type.\n");
-            error_propagate(errp, err);
-            return;
-        }
-        assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
-        /* With msi=auto, we fall back to MSI off silently */
-        error_free(err);
-    }
-
     if (xhci->numintrs > MAXINTRS) {
         xhci->numintrs = MAXINTRS;
     }
@@ -3622,21 +3603,60 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
     if (xhci->numintrs < 1) {
         xhci->numintrs = 1;
     }
+
     if (xhci->numslots > MAXSLOTS) {
         xhci->numslots = MAXSLOTS;
     }
     if (xhci->numslots < 1) {
         xhci->numslots = 1;
     }
+
     if (xhci_get_flag(xhci, XHCI_FLAG_ENABLE_STREAMS)) {
         xhci->max_pstreams_mask = 7; /* == 256 primary streams */
     } else {
         xhci->max_pstreams_mask = 0;
     }
 
-    xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, xhci);
+    if (xhci->msi != ON_OFF_AUTO_OFF) {
+        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
+        /* Any error other than -ENOTSUP(board's MSI support is broken)
+         * is a programming error */
+        assert(!ret || ret == -ENOTSUP);
+        if (ret && xhci->msi == ON_OFF_AUTO_ON) {
+            /* Can't satisfy user's explicit msi=on request, fail */
+            error_append_hint(&err, "You have to use msi=auto (default) or "
+                    "msi=off with this machine type.\n");
+            error_propagate(errp, err);
+            return;
+        }
+        assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
+        /* With msi=auto, we fall back to MSI off silently */
+        error_free(err);
+    }
 
     memory_region_init(&xhci->mem, OBJECT(xhci), "xhci", LEN_REGS);
+    if (xhci->msix != ON_OFF_AUTO_OFF) {
+        ret = msix_init(dev, xhci->numintrs,
+                        &xhci->mem, 0, OFF_MSIX_TABLE,
+                        &xhci->mem, 0, OFF_MSIX_PBA,
+                        0x90, &err);
+        /* Any error other than -ENOTSUP(board's MSI support is broken)
+         * is a programming error */
+        assert(!ret || ret == -ENOTSUP);
+        if (ret && xhci->msix == ON_OFF_AUTO_ON) {
+            /* Can't satisfy user's explicit msix=on request, fail */
+            error_append_hint(&err, "You have to use msix=auto (default) or "
+                    "msic=off with this machine type.\n");
+            /* No instance_finalize method, need to free the resource here */
+            object_unref(OBJECT(&xhci->mem));
+            error_propagate(errp, err);
+            return;
+        }
+        assert(!err || xhci->msix == ON_OFF_AUTO_AUTO);
+        /* With msix=auto, we fall back to MSI off silently */
+        error_free(err);
+    }
+
     memory_region_init_io(&xhci->mem_cap, OBJECT(xhci), &xhci_cap_ops, xhci,
                           "capabilities", LEN_CAP);
     memory_region_init_io(&xhci->mem_oper, OBJECT(xhci), &xhci_oper_ops, xhci,
@@ -3664,19 +3684,14 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
                      PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64,
                      &xhci->mem);
 
+    usb_xhci_init(xhci);
+    xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, xhci);
+
     if (pci_bus_is_express(dev->bus) ||
         xhci_get_flag(xhci, XHCI_FLAG_FORCE_PCIE_ENDCAP)) {
         ret = pcie_endpoint_cap_init(dev, 0xa0);
         assert(ret >= 0);
     }
-
-    if (xhci->msix != ON_OFF_AUTO_OFF) {
-        /* TODO check for errors */
-        msix_init(dev, xhci->numintrs,
-                  &xhci->mem, 0, OFF_MSIX_TABLE,
-                  &xhci->mem, 0, OFF_MSIX_PBA,
-                  0x90);
-    }
 }
 
 static void usb_xhci_exit(PCIDevice *dev)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7bfa17c..87f4e11 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1349,6 +1349,7 @@ static int vfio_msix_early_setup(VFIOPCIDevice *vdev)
 static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
 {
     int ret;
+    Error *err = NULL;
 
     vdev->msix->pending = g_malloc0(BITS_TO_LONGS(vdev->msix->entries) *
                                     sizeof(unsigned long));
@@ -1356,12 +1357,14 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
                     vdev->bars[vdev->msix->table_bar].region.mem,
                     vdev->msix->table_bar, vdev->msix->table_offset,
                     vdev->bars[vdev->msix->pba_bar].region.mem,
-                    vdev->msix->pba_bar, vdev->msix->pba_offset, pos);
+                    vdev->msix->pba_bar, vdev->msix->pba_offset, pos,
+                    &err);
     if (ret < 0) {
         if (ret == -ENOTSUP) {
             return 0;
         }
-        error_report("vfio: msix_init failed");
+        error_prepend(&err, "vfio: msix_init failed: ");
+        error_report_err(err);
         return ret;
     }
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 755f921..2e6b9bc 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1657,13 +1657,9 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
 
     if (proxy->nvectors) {
         int err = msix_init_exclusive_bar(&proxy->pci_dev, proxy->nvectors,
-                                          proxy->msix_bar);
+                                          proxy->msix_bar, errp);
         if (err) {
-            /* Notice when a system that supports MSIx can't initialize it.  */
-            if (err != -ENOTSUP) {
-                error_report("unable to init msix vectors to %" PRIu32,
-                             proxy->nvectors);
-            }
+            error_report_err(*errp);
             proxy->nvectors = 0;
         }
     }
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 048a29d..1f27658 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -9,9 +9,10 @@ MSIMessage msix_get_message(PCIDevice *dev, unsigned int vector);
 int msix_init(PCIDevice *dev, unsigned short nentries,
               MemoryRegion *table_bar, uint8_t table_bar_nr,
               unsigned table_offset, MemoryRegion *pba_bar,
-              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos);
+              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
+              Error **errp);
 int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
-                            uint8_t bar_nr);
+                            uint8_t bar_nr, Error **errp);
 
 void msix_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int len);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v2 4/5] megasas: remove unnecessary megasas_use_msix()
  2016-08-23  9:27 [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin
                   ` (2 preceding siblings ...)
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 3/5] pci: Convert msix_init() to Error and fix callers to check it Cao jin
@ 2016-08-23  9:27 ` Cao jin
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 5/5] megasas: undo the overwrites of user configuration Cao jin
  2016-09-06 12:42 ` [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin
  5 siblings, 0 replies; 21+ messages in thread
From: Cao jin @ 2016-08-23  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Hannes Reinecke, Paolo Bonzini, Markus Armbruster,
	Marcel Apfelbaum, Michael S. Tsirkin

megasas overwrites user configuration when msix_init() fail,
to indicate internal msi state, which is unsuitable.
And megasa_use_msix() is unnecessary, so remove it.

CC: Hannes Reinecke <hare@suse.de>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Marcel Apfelbaum <marcel@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 6d45025..90cd873 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -155,11 +155,6 @@ static bool megasas_use_queue64(MegasasState *s)
     return s->flags & MEGASAS_MASK_USE_QUEUE64;
 }
 
-static bool megasas_use_msix(MegasasState *s)
-{
-    return s->msix != ON_OFF_AUTO_OFF;
-}
-
 static bool megasas_is_jbod(MegasasState *s)
 {
     return s->flags & MEGASAS_MASK_USE_JBOD;
@@ -2295,9 +2290,7 @@ static void megasas_scsi_uninit(PCIDevice *d)
 {
     MegasasState *s = MEGASAS(d);
 
-    if (megasas_use_msix(s)) {
-        msix_uninit(d, &s->mmio_io, &s->mmio_io);
-    }
+    msix_uninit(d, &s->mmio_io, &s->mmio_io);
     msi_uninit(d);
 }
 
@@ -2349,7 +2342,7 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
 
     memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
                           "megasas-mmio", 0x4000);
-    if (megasas_use_msix(s)) {
+    if (s->msix != ON_OFF_AUTO_OFF) {
         ret = msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
                         &s->mmio_io, b->mmio_bar, 0x3800, 0x68, &err);
         /* Any error other than -ENOTSUP(board's MSI support is broken)
@@ -2363,11 +2356,14 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
             object_unref(OBJECT(&s->mmio_io));
             error_propagate(errp, err);
             return;
-        } else if (ret) {
-            /* With msix=auto, we fall back to MSI off silently */
-            s->msix = ON_OFF_AUTO_OFF;
-            error_free(err);
         }
+        assert(!err || s->msix == ON_OFF_AUTO_AUTO);
+        /* With msix=auto, we fall back to MSI off silently */
+        error_free(err);
+    }
+
+    if (msix_enabled(dev)) {
+        msix_vector_use(dev, 0);
     }
 
     memory_region_init_io(&s->port_io, OBJECT(s), &megasas_port_ops, s,
@@ -2385,10 +2381,6 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
     pci_register_bar(dev, b->mmio_bar, bar_type, &s->mmio_io);
     pci_register_bar(dev, 3, bar_type, &s->queue_io);
 
-    if (megasas_use_msix(s)) {
-        msix_vector_use(dev, 0);
-    }
-
     s->fw_state = MFI_FWSTATE_READY;
     if (!s->sas_addr) {
         s->sas_addr = ((NAA_LOCALLY_ASSIGNED_ID << 24) |
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v2 5/5] megasas: undo the overwrites of user configuration
  2016-08-23  9:27 [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin
                   ` (3 preceding siblings ...)
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 4/5] megasas: remove unnecessary megasas_use_msix() Cao jin
@ 2016-08-23  9:27 ` Cao jin
  2016-09-06 12:42 ` [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin
  5 siblings, 0 replies; 21+ messages in thread
From: Cao jin @ 2016-08-23  9:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Hannes Reinecke, Paolo Bonzini, Markus Armbruster,
	Marcel Apfelbaum, Michael S. Tsirkin

Commit afea4e14 seems forgetting to undo the overwrites, which is
unsuitable.

CC: Hannes Reinecke <hare@suse.de>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Marcel Apfelbaum <marcel@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/scsi/megasas.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 90cd873..ff314a2 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2333,11 +2333,10 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
                     "msi=off with this machine type.\n");
             error_propagate(errp, err);
             return;
-        } else if (ret) {
-            /* With msi=auto, we fall back to MSI off silently */
-            s->msi = ON_OFF_AUTO_OFF;
-            error_free(err);
         }
+        assert(!err || s->msix == ON_OFF_AUTO_AUTO);
+        /* With msi=auto, we fall back to MSI off silently */
+        error_free(err);
     }
 
     memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error
  2016-08-23  9:27 [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin
                   ` (4 preceding siblings ...)
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 5/5] megasas: undo the overwrites of user configuration Cao jin
@ 2016-09-06 12:42 ` Cao jin
  5 siblings, 0 replies; 21+ messages in thread
From: Cao jin @ 2016-09-06 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jiri Pirko, Michael S. Tsirkin, Jason Wang, Markus Armbruster,
	Marcel Apfelbaum, Alex Williamson, Hannes Reinecke,
	Dmitry Fleytman, Paolo Bonzini, Gerd Hoffmann

ping

On 08/23/2016 05:27 PM, Cao jin wrote:
> v2 changelog:
> 1. Separate one patch out: "e1000e: remove internal interrupt flag"
> 2. Separate coding style related code into  patch 2
> 3. Add commit message body for patch 1
> 3. Add function comment for msix_init() in patch 3
> 4. Convert msix_init_exclusive_bar() to error
> 5. Other minor changes to patch 3, according to Markus's suggestion.
>
> Cao jin (5):
>    msix_init: assert programming error
>    msix: Follow CODING_STYLE
>    pci: Convert msix_init() to Error and fix callers to check it
>    megasas: remove unnecessary megasas_use_msix()
>    megasas: undo the overwrites of user configuration
>
> CC: Jiri Pirko <jiri@resnulli.us>
> CC: Gerd Hoffmann <kraxel@redhat.com>
> CC: Dmitry Fleytman <dmitry@daynix.com>
> CC: Jason Wang <jasowang@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Hannes Reinecke <hare@suse.de>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Alex Williamson <alex.williamson@redhat.com>
> CC: Markus Armbruster <armbru@redhat.com>
> CC: Marcel Apfelbaum <marcel@redhat.com>
>
>   hw/block/nvme.c        |  5 +++-
>   hw/misc/ivshmem.c      |  8 +++---
>   hw/net/e1000e.c        |  2 +-
>   hw/net/rocker/rocker.c |  4 ++-
>   hw/net/vmxnet3.c       | 42 +++++++++--------------------
>   hw/pci/msix.c          | 45 ++++++++++++++++++++++++--------
>   hw/scsi/megasas.c      | 49 +++++++++++++++++++---------------
>   hw/usb/hcd-xhci.c      | 71 ++++++++++++++++++++++++++++++--------------------
>   hw/vfio/pci.c          |  7 +++--
>   hw/virtio/virtio-pci.c |  8 ++----
>   include/hw/pci/msix.h  |  5 ++--
>   11 files changed, 140 insertions(+), 106 deletions(-)
>

-- 
Yours Sincerely,

Cao jin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error Cao jin
@ 2016-09-12 13:29   ` Markus Armbruster
  2016-09-13  2:51     ` Cao jin
  0 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2016-09-12 13:29 UTC (permalink / raw)
  To: Cao jin; +Cc: qemu-devel, Marcel Apfelbaum, Michael S. Tsirkin

Cao jin <caoj.fnst@cn.fujitsu.com> writes:

> The input parameters is used for creating the msix capable device, so
> they must obey the PCI spec, or else, it should be programming error.

True when the the parameters come from a device model attempting to
define a PCI device violating the spec.  But what if the parameters come
from an actual PCI device violating the spec, via device assignment?

For what it's worth, the new behavior seems consistent with msi_init(),
which is good.

> CC: Markus Armbruster <armbru@redhat.com>
> CC: Marcel Apfelbaum <marcel@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  hw/pci/msix.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 0ec1cb1..384a29d 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -253,9 +253,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>          return -ENOTSUP;
>      }
>  
> -    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
> -        return -EINVAL;
> -    }
> +    assert(nentries >= 1 && nentries <= PCI_MSIX_FLAGS_QSIZE + 1);
>  
>      table_size = nentries * PCI_MSIX_ENTRY_SIZE;
>      pba_size = QEMU_ALIGN_UP(nentries, 64) / 8;
> @@ -266,7 +264,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
       /* Sanity test: table & pba don't overlap, fit within BARs, min aligned */
       if ((table_bar_nr == pba_bar_nr &&
            ranges_overlap(table_offset, table_size, pba_offset, pba_size)) ||
>          table_offset + table_size > memory_region_size(table_bar) ||
>          pba_offset + pba_size > memory_region_size(pba_bar) ||
>          (table_offset | pba_offset) & PCI_MSIX_FLAGS_BIRMASK) {
> -        return -EINVAL;
> +        assert(0);
>      }

Instead of

    if (... complicated condition ...) {
        assert(0);
    }

let's write

    assert(... negation of the complicated condition ...);

>  
>      cap = pci_add_capability(dev, PCI_CAP_ID_MSIX, cap_pos, MSIX_CAP_LENGTH);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] pci: Convert msix_init() to Error and fix callers to check it
  2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 3/5] pci: Convert msix_init() to Error and fix callers to check it Cao jin
@ 2016-09-12 13:47   ` Markus Armbruster
  2016-09-13  6:04     ` Cao jin
  0 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2016-09-12 13:47 UTC (permalink / raw)
  To: Cao jin
  Cc: qemu-devel, Jiri Pirko, Michael S. Tsirkin, Jason Wang,
	Marcel Apfelbaum, Alex Williamson, Hannes Reinecke,
	Dmitry Fleytman, Paolo Bonzini, Gerd Hoffmann

Cao jin <caoj.fnst@cn.fujitsu.com> writes:

> msix_init() reports errors with error_report(), which is wrong when
> it's used in realize().  The same issue was fixed for msi_init() in
> commit 1108b2f.
>
> For some devices like e1000e & vmxnet3 who won't fail because of
> msi_init's failure, suppress the error report by passing NULL error object.
>
> CC: Jiri Pirko <jiri@resnulli.us>
> CC: Gerd Hoffmann <kraxel@redhat.com>
> CC: Dmitry Fleytman <dmitry@daynix.com>
> CC: Jason Wang <jasowang@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Hannes Reinecke <hare@suse.de>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Alex Williamson <alex.williamson@redhat.com>
> CC: Markus Armbruster <armbru@redhat.com>
> CC: Marcel Apfelbaum <marcel@redhat.com>
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  hw/block/nvme.c        |  5 +++-
>  hw/misc/ivshmem.c      |  8 +++---
>  hw/net/e1000e.c        |  2 +-
>  hw/net/rocker/rocker.c |  4 ++-
>  hw/net/vmxnet3.c       | 42 +++++++++--------------------
>  hw/pci/msix.c          | 31 ++++++++++++++++++----
>  hw/scsi/megasas.c      | 26 ++++++++++++++----
>  hw/usb/hcd-xhci.c      | 71 ++++++++++++++++++++++++++++++--------------------
>  hw/vfio/pci.c          |  7 +++--
>  hw/virtio/virtio-pci.c |  8 ++----
>  include/hw/pci/msix.h  |  5 ++--
>  11 files changed, 125 insertions(+), 84 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index cef3bb4..ae84dc7 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -829,6 +829,7 @@ static int nvme_init(PCIDevice *pci_dev)
>  {
>      NvmeCtrl *n = NVME(pci_dev);
>      NvmeIdCtrl *id = &n->id_ctrl;
> +    Error *err = NULL;
>  
>      int i;
>      int64_t bs_size;
> @@ -870,7 +871,9 @@ static int nvme_init(PCIDevice *pci_dev)
>      pci_register_bar(&n->parent_obj, 0,
>          PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
>          &n->iomem);
> -    msix_init_exclusive_bar(&n->parent_obj, n->num_queues, 4);
> +    if (msix_init_exclusive_bar(&n->parent_obj, n->num_queues, 4, &err)) {
> +        error_report_err(err);
> +    }
>  
>      id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
>      id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 40a2ebc..a1060ec 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -750,13 +750,13 @@ static void ivshmem_reset(DeviceState *d)
>      }
>  }
>  
> -static int ivshmem_setup_interrupts(IVShmemState *s)
> +static int ivshmem_setup_interrupts(IVShmemState *s, Error **errp)
>  {
>      /* allocate QEMU callback data for receiving interrupts */
>      s->msi_vectors = g_malloc0(s->vectors * sizeof(MSIVector));
>  
>      if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> -        if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1)) {
> +        if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1, errp)) {
>              return -1;
>          }
>  
> @@ -897,8 +897,8 @@ static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
>          qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
>                                ivshmem_read, NULL, s);
>  
> -        if (ivshmem_setup_interrupts(s) < 0) {
> -            error_setg(errp, "failed to initialize interrupts");
> +        if (ivshmem_setup_interrupts(s, errp) < 0) {
> +            error_prepend(errp, "Failed to initialize interrupts: ");
>              return;
>          }
>      }
> diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
> index bad43f4..72aad21 100644
> --- a/hw/net/e1000e.c
> +++ b/hw/net/e1000e.c
> @@ -292,7 +292,7 @@ e1000e_init_msix(E1000EState *s)
>                          E1000E_MSIX_IDX, E1000E_MSIX_TABLE,
>                          &s->msix,
>                          E1000E_MSIX_IDX, E1000E_MSIX_PBA,
> -                        0xA0);
> +                        0xA0, NULL);
>  
>      if (res < 0) {
>          trace_e1000e_msix_init_fail(res);
> diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
> index 30f2ce4..e421ebb 100644
> --- a/hw/net/rocker/rocker.c
> +++ b/hw/net/rocker/rocker.c
> @@ -1256,14 +1256,16 @@ static int rocker_msix_init(Rocker *r)
>  {
>      PCIDevice *dev = PCI_DEVICE(r);
>      int err;
> +    Error *local_err = NULL;
>  
>      err = msix_init(dev, ROCKER_MSIX_VEC_COUNT(r->fp_ports),
>                      &r->msix_bar,
>                      ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_TABLE_OFFSET,
>                      &r->msix_bar,
>                      ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_PBA_OFFSET,
> -                    0);
> +                    0, &local_err);
>      if (err) {
> +        error_report_err(local_err);
>          return err;
>      }
>  
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index 90f6943..4824f8d 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -2181,32 +2181,6 @@ vmxnet3_use_msix_vectors(VMXNET3State *s, int num_vectors)
>      return true;
>  }
>  
> -static bool
> -vmxnet3_init_msix(VMXNET3State *s)
> -{
> -    PCIDevice *d = PCI_DEVICE(s);
> -    int res = msix_init(d, VMXNET3_MAX_INTRS,
> -                        &s->msix_bar,
> -                        VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_TABLE,
> -                        &s->msix_bar,
> -                        VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_PBA(s),
> -                        VMXNET3_MSIX_OFFSET(s));
> -
> -    if (0 > res) {
> -        VMW_WRPRN("Failed to initialize MSI-X, error %d", res);
> -        s->msix_used = false;
> -    } else {
> -        if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
> -            VMW_WRPRN("Failed to use MSI-X vectors, error %d", res);
> -            msix_uninit(d, &s->msix_bar, &s->msix_bar);
> -            s->msix_used = false;
> -        } else {
> -            s->msix_used = true;
> -        }
> -    }
> -    return s->msix_used;
> -}
> -
>  static void
>  vmxnet3_cleanup_msix(VMXNET3State *s)
>  {
> @@ -2315,9 +2289,19 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
>       * is a programming error. Fall back to INTx silently on -ENOTSUP */
>      assert(!ret || ret == -ENOTSUP);
>  
> -    if (!vmxnet3_init_msix(s)) {
> -        VMW_WRPRN("Failed to initialize MSI-X, configuration is inconsistent.");
> -    }
> +    ret = msix_init(pci_dev, VMXNET3_MAX_INTRS,
> +                    &s->msix_bar,
> +                    VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_TABLE,
> +                    &s->msix_bar,
> +                    VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_PBA(s),
> +                    VMXNET3_MSIX_OFFSET(s), NULL);
> +    /* Any error other than -ENOTSUP(board's MSI support is broken)
> +     * is a programming error. Fall back to INTx silently on -ENOTSUP */
> +    assert(!ret || ret == -ENOTSUP);
> +    s->msix_used = !ret;
> +    /* VMXNET3_MAX_INTRS is passed, so it will never fail when mark vector.
> +     * For simplicity, no need to check return value. */
> +    vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS);
>  
>      vmxnet3_net_init(s);

Uh, this is more than just a conversion to Error.  Before, the code
falls back to not using MSI-X on any error, with a warning.  After, it
falls back on ENOTSUP only, silently, and crashes on any other error.
Such a change needs to be documented in the commit message, or be in a
separate patch.  I prefer separate patch.

>  
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 0aadc0c..568c051 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -21,6 +21,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/xen/xen.h"
>  #include "qemu/range.h"
> +#include "qapi/error.h"
>  
>  #define MSIX_CAP_LENGTH 12
>  
> @@ -238,11 +239,29 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>      }
>  }
>  
> -/* Initialize the MSI-X structures */
> +/* Make PCI device @dev MSI-X capable
> + * @nentries is the max number of MSI-X vectors that the device support.
> + * @table_bar is the MemoryRegion that MSI-X table structure resides.
> + * @table_bar_nr is number of base address register corresponding to @table_bar.
> + * @table_offset indicates the offset that the MSI-X table structure starts with
> + * in @table_bar.
> + * @pba_bar is the MemoryRegion that the Pending Bit Array structure resides.
> + * @pba_bar_nr is number of base address register corresponding to @pba_bar.
> + * @pba_offset indicates the offset that the Pending Bit Array structure
> + * starts with in @pba_bar.
> + * Non-zero @cap_pos puts capability MSI-X at that offset in PCI config space.
> + * @errp is for returning errors.
> + *
> + * Return 0 on success; set @errp and return -errno on error.
> + * -ENOTSUP means lacking msi support for a msi-capable platform.
> + * -EINVAL means capability overlap, happens when @cap_pos is non-zero,
> + * also means a programming error, except device assignment, which can check
> + * if a real HW is broken.*/
>  int msix_init(struct PCIDevice *dev, unsigned short nentries,
>                MemoryRegion *table_bar, uint8_t table_bar_nr,
>                unsigned table_offset, MemoryRegion *pba_bar,
> -              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos)
> +              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
> +              Error **errp)
>  {
>      int cap;
>      unsigned table_size, pba_size;
> @@ -250,6 +269,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>  
>      /* Nothing to do if MSI is not supported by interrupt controller */
>      if (!msi_nonbroken) {
> +        error_setg(errp, "MSI-X is not supported by interrupt controller");
>          return -ENOTSUP;
>      }
>  
> @@ -267,7 +287,8 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>          assert(0);
>      }
>  
> -    cap = pci_add_capability(dev, PCI_CAP_ID_MSIX, cap_pos, MSIX_CAP_LENGTH);
> +    cap = pci_add_capability2(dev, PCI_CAP_ID_MSIX,
> +                              cap_pos, MSIX_CAP_LENGTH, errp);
>      if (cap < 0) {
>          return cap;
>      }
> @@ -304,7 +325,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>  }
>  
>  int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
> -                            uint8_t bar_nr)
> +                            uint8_t bar_nr, Error **errp)
>  {
>      int ret;
>      char *name;
> @@ -336,7 +357,7 @@ int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
>      ret = msix_init(dev, nentries, &dev->msix_exclusive_bar, bar_nr,
>                      0, &dev->msix_exclusive_bar,
>                      bar_nr, bar_pba_offset,
> -                    0);
> +                    0, errp);
>      if (ret) {
>          return ret;
>      }
> diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
> index e968302..6d45025 100644
> --- a/hw/scsi/megasas.c
> +++ b/hw/scsi/megasas.c
> @@ -2349,16 +2349,32 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
>  
>      memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
>                            "megasas-mmio", 0x4000);
> +    if (megasas_use_msix(s)) {
> +        ret = msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
> +                        &s->mmio_io, b->mmio_bar, 0x3800, 0x68, &err);
> +        /* Any error other than -ENOTSUP(board's MSI support is broken)
> +         * is a programming error */
> +        assert(!ret || ret == -ENOTSUP);
> +        if (ret && s->msix == ON_OFF_AUTO_ON) {
> +            /* Can't satisfy user's explicit msix=on request, fail */
> +            error_append_hint(&err, "You have to use msix=auto (default) or "
> +                    "msix=off with this machine type.\n");
> +            /* No instance_finalize method, need to free the resource here */
> +            object_unref(OBJECT(&s->mmio_io));
> +            error_propagate(errp, err);
> +            return;
> +        } else if (ret) {
> +            /* With msix=auto, we fall back to MSI off silently */
> +            s->msix = ON_OFF_AUTO_OFF;
> +            error_free(err);
> +        }
> +    }
> +
>      memory_region_init_io(&s->port_io, OBJECT(s), &megasas_port_ops, s,
>                            "megasas-io", 256);
>      memory_region_init_io(&s->queue_io, OBJECT(s), &megasas_queue_ops, s,
>                            "megasas-queue", 0x40000);
>  
> -    if (megasas_use_msix(s) &&
> -        msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
> -                  &s->mmio_io, b->mmio_bar, 0x3800, 0x68)) {
> -        s->msix = ON_OFF_AUTO_OFF;
> -    }

Before your patch, msix=on behaves just like msix=auto.

Afterwards, msix=on fails when MSI-X can't be enabled.

That's a good change, but it needs to be documented in the commit
message, or be in a separate patch.  I prefer separate patch.

>      if (pci_is_express(dev)) {
>          pcie_endpoint_cap_init(dev, 0xa0);
>      }
> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
> index 188f954..4280c5d 100644
> --- a/hw/usb/hcd-xhci.c
> +++ b/hw/usb/hcd-xhci.c
> @@ -3594,25 +3594,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>      dev->config[PCI_CACHE_LINE_SIZE] = 0x10;
>      dev->config[0x60] = 0x30; /* release number */
>  
> -    usb_xhci_init(xhci);
> -
> -    if (xhci->msi != ON_OFF_AUTO_OFF) {
> -        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
> -        /* Any error other than -ENOTSUP(board's MSI support is broken)
> -         * is a programming error */
> -        assert(!ret || ret == -ENOTSUP);
> -        if (ret && xhci->msi == ON_OFF_AUTO_ON) {
> -            /* Can't satisfy user's explicit msi=on request, fail */
> -            error_append_hint(&err, "You have to use msi=auto (default) or "
> -                    "msi=off with this machine type.\n");
> -            error_propagate(errp, err);
> -            return;
> -        }
> -        assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
> -        /* With msi=auto, we fall back to MSI off silently */
> -        error_free(err);
> -    }
> -
>      if (xhci->numintrs > MAXINTRS) {
>          xhci->numintrs = MAXINTRS;
>      }
> @@ -3622,21 +3603,60 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>      if (xhci->numintrs < 1) {
>          xhci->numintrs = 1;
>      }
> +
>      if (xhci->numslots > MAXSLOTS) {
>          xhci->numslots = MAXSLOTS;
>      }
>      if (xhci->numslots < 1) {
>          xhci->numslots = 1;
>      }
> +
>      if (xhci_get_flag(xhci, XHCI_FLAG_ENABLE_STREAMS)) {
>          xhci->max_pstreams_mask = 7; /* == 256 primary streams */
>      } else {
>          xhci->max_pstreams_mask = 0;
>      }
>  
> -    xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, xhci);
> +    if (xhci->msi != ON_OFF_AUTO_OFF) {
> +        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
> +        /* Any error other than -ENOTSUP(board's MSI support is broken)
> +         * is a programming error */
> +        assert(!ret || ret == -ENOTSUP);
> +        if (ret && xhci->msi == ON_OFF_AUTO_ON) {
> +            /* Can't satisfy user's explicit msi=on request, fail */
> +            error_append_hint(&err, "You have to use msi=auto (default) or "
> +                    "msi=off with this machine type.\n");
> +            error_propagate(errp, err);
> +            return;
> +        }
> +        assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
> +        /* With msi=auto, we fall back to MSI off silently */
> +        error_free(err);
> +    }

Can you explain why you're moving this code?

>  
>      memory_region_init(&xhci->mem, OBJECT(xhci), "xhci", LEN_REGS);
> +    if (xhci->msix != ON_OFF_AUTO_OFF) {
> +        ret = msix_init(dev, xhci->numintrs,
> +                        &xhci->mem, 0, OFF_MSIX_TABLE,
> +                        &xhci->mem, 0, OFF_MSIX_PBA,
> +                        0x90, &err);
> +        /* Any error other than -ENOTSUP(board's MSI support is broken)
> +         * is a programming error */
> +        assert(!ret || ret == -ENOTSUP);
> +        if (ret && xhci->msix == ON_OFF_AUTO_ON) {
> +            /* Can't satisfy user's explicit msix=on request, fail */
> +            error_append_hint(&err, "You have to use msix=auto (default) or "
> +                    "msic=off with this machine type.\n");
> +            /* No instance_finalize method, need to free the resource here */
> +            object_unref(OBJECT(&xhci->mem));
> +            error_propagate(errp, err);
> +            return;
> +        }
> +        assert(!err || xhci->msix == ON_OFF_AUTO_AUTO);
> +        /* With msix=auto, we fall back to MSI off silently */
> +        error_free(err);
> +    }
> +
>      memory_region_init_io(&xhci->mem_cap, OBJECT(xhci), &xhci_cap_ops, xhci,
>                            "capabilities", LEN_CAP);
>      memory_region_init_io(&xhci->mem_oper, OBJECT(xhci), &xhci_oper_ops, xhci,
> @@ -3664,19 +3684,14 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>                       PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64,
>                       &xhci->mem);
>  
> +    usb_xhci_init(xhci);
> +    xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, xhci);
> +
>      if (pci_bus_is_express(dev->bus) ||
>          xhci_get_flag(xhci, XHCI_FLAG_FORCE_PCIE_ENDCAP)) {
>          ret = pcie_endpoint_cap_init(dev, 0xa0);
>          assert(ret >= 0);
>      }
> -
> -    if (xhci->msix != ON_OFF_AUTO_OFF) {
> -        /* TODO check for errors */
> -        msix_init(dev, xhci->numintrs,
> -                  &xhci->mem, 0, OFF_MSIX_TABLE,
> -                  &xhci->mem, 0, OFF_MSIX_PBA,
> -                  0x90);
> -    }

You're resolving the TODO.  Good, but it needs to be documented in the
commit message, or be in a separate patch.

>  }
>  
>  static void usb_xhci_exit(PCIDevice *dev)
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 7bfa17c..87f4e11 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1349,6 +1349,7 @@ static int vfio_msix_early_setup(VFIOPCIDevice *vdev)
>  static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
>  {
>      int ret;
> +    Error *err = NULL;
>  
>      vdev->msix->pending = g_malloc0(BITS_TO_LONGS(vdev->msix->entries) *
>                                      sizeof(unsigned long));
> @@ -1356,12 +1357,14 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
>                      vdev->bars[vdev->msix->table_bar].region.mem,
>                      vdev->msix->table_bar, vdev->msix->table_offset,
>                      vdev->bars[vdev->msix->pba_bar].region.mem,
> -                    vdev->msix->pba_bar, vdev->msix->pba_offset, pos);
> +                    vdev->msix->pba_bar, vdev->msix->pba_offset, pos,
> +                    &err);
>      if (ret < 0) {
>          if (ret == -ENOTSUP) {
>              return 0;
>          }
> -        error_report("vfio: msix_init failed");
> +        error_prepend(&err, "vfio: msix_init failed: ");
> +        error_report_err(err);
>          return ret;
>      }
>  
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 755f921..2e6b9bc 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1657,13 +1657,9 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
>  
>      if (proxy->nvectors) {
>          int err = msix_init_exclusive_bar(&proxy->pci_dev, proxy->nvectors,
> -                                          proxy->msix_bar);
> +                                          proxy->msix_bar, errp);
>          if (err) {
> -            /* Notice when a system that supports MSIx can't initialize it.  */
> -            if (err != -ENOTSUP) {
> -                error_report("unable to init msix vectors to %" PRIu32,
> -                             proxy->nvectors);
> -            }
> +            error_report_err(*errp);
>              proxy->nvectors = 0;
>          }
>      }
> diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
> index 048a29d..1f27658 100644
> --- a/include/hw/pci/msix.h
> +++ b/include/hw/pci/msix.h
> @@ -9,9 +9,10 @@ MSIMessage msix_get_message(PCIDevice *dev, unsigned int vector);
>  int msix_init(PCIDevice *dev, unsigned short nentries,
>                MemoryRegion *table_bar, uint8_t table_bar_nr,
>                unsigned table_offset, MemoryRegion *pba_bar,
> -              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos);
> +              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
> +              Error **errp);
>  int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
> -                            uint8_t bar_nr);
> +                            uint8_t bar_nr, Error **errp);
>  
>  void msix_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int len);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-09-12 13:29   ` Markus Armbruster
@ 2016-09-13  2:51     ` Cao jin
  2016-09-13  6:16       ` Markus Armbruster
  0 siblings, 1 reply; 21+ messages in thread
From: Cao jin @ 2016-09-13  2:51 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Marcel Apfelbaum, Michael S. Tsirkin



On 09/12/2016 09:29 PM, Markus Armbruster wrote:
> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>
>> The input parameters is used for creating the msix capable device, so
>> they must obey the PCI spec, or else, it should be programming error.
>
> True when the the parameters come from a device model attempting to
> define a PCI device violating the spec.  But what if the parameters come
> from an actual PCI device violating the spec, via device assignment?

Before the patch, on invalid param, the vfio behaviour is:
   error_report("vfio: msix_init failed");
   then, device create fail.

After the patch, its behaviour is:
   asserted.

Do you mean we should still report some useful info to user on invalid 
params?

Cao jin
>
> For what it's worth, the new behavior seems consistent with msi_init(),
> which is good.
>
>> CC: Markus Armbruster <armbru@redhat.com>
>> CC: Marcel Apfelbaum <marcel@redhat.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
>>   hw/pci/msix.c | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
>> index 0ec1cb1..384a29d 100644
>> --- a/hw/pci/msix.c
>> +++ b/hw/pci/msix.c
>> @@ -253,9 +253,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>>           return -ENOTSUP;
>>       }
>>
>> -    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
>> -        return -EINVAL;
>> -    }
>> +    assert(nentries >= 1 && nentries <= PCI_MSIX_FLAGS_QSIZE + 1);
>>
>>       table_size = nentries * PCI_MSIX_ENTRY_SIZE;
>>       pba_size = QEMU_ALIGN_UP(nentries, 64) / 8;
>> @@ -266,7 +264,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>         /* Sanity test: table & pba don't overlap, fit within BARs, min aligned */
>         if ((table_bar_nr == pba_bar_nr &&
>              ranges_overlap(table_offset, table_size, pba_offset, pba_size)) ||
>>           table_offset + table_size > memory_region_size(table_bar) ||
>>           pba_offset + pba_size > memory_region_size(pba_bar) ||
>>           (table_offset | pba_offset) & PCI_MSIX_FLAGS_BIRMASK) {
>> -        return -EINVAL;
>> +        assert(0);
>>       }
>
> Instead of
>
>      if (... complicated condition ...) {
>          assert(0);
>      }
>
> let's write
>
>      assert(... negation of the complicated condition ...);
>
>>
>>       cap = pci_add_capability(dev, PCI_CAP_ID_MSIX, cap_pos, MSIX_CAP_LENGTH);
>
>
> .
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] pci: Convert msix_init() to Error and fix callers to check it
  2016-09-12 13:47   ` Markus Armbruster
@ 2016-09-13  6:04     ` Cao jin
  2016-09-13  8:27       ` Markus Armbruster
  0 siblings, 1 reply; 21+ messages in thread
From: Cao jin @ 2016-09-13  6:04 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Jiri Pirko, Michael S. Tsirkin, Jason Wang,
	Marcel Apfelbaum, Alex Williamson, Hannes Reinecke,
	Dmitry Fleytman, Paolo Bonzini, Gerd Hoffmann



On 09/12/2016 09:47 PM, Markus Armbruster wrote:
> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>

>>   static void
>>   vmxnet3_cleanup_msix(VMXNET3State *s)
>>   {
>> @@ -2315,9 +2289,19 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
>>        * is a programming error. Fall back to INTx silently on -ENOTSUP */
>>       assert(!ret || ret == -ENOTSUP);
>>
>> -    if (!vmxnet3_init_msix(s)) {
>> -        VMW_WRPRN("Failed to initialize MSI-X, configuration is inconsistent.");
>> -    }
>> +    ret = msix_init(pci_dev, VMXNET3_MAX_INTRS,
>> +                    &s->msix_bar,
>> +                    VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_TABLE,
>> +                    &s->msix_bar,
>> +                    VMXNET3_MSIX_BAR_IDX, VMXNET3_OFF_MSIX_PBA(s),
>> +                    VMXNET3_MSIX_OFFSET(s), NULL);
>> +    /* Any error other than -ENOTSUP(board's MSI support is broken)
>> +     * is a programming error. Fall back to INTx silently on -ENOTSUP */
>> +    assert(!ret || ret == -ENOTSUP);
>> +    s->msix_used = !ret;
>> +    /* VMXNET3_MAX_INTRS is passed, so it will never fail when mark vector.
>> +     * For simplicity, no need to check return value. */
>> +    vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS);
>>
>>       vmxnet3_net_init(s);
>
> Uh, this is more than just a conversion to Error.  Before, the code
> falls back to not using MSI-X on any error, with a warning.  After, it
> falls back on ENOTSUP only, silently, and crashes on any other error.
> Such a change needs to be documented in the commit message, or be in a
> separate patch.  I prefer separate patch.
>

Dmitry has option that we should check the return value of 
msix_vector_use and prefer to keep init function, so I will withdraw 
this modification.

>> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
>> index 188f954..4280c5d 100644
>> --- a/hw/usb/hcd-xhci.c
>> +++ b/hw/usb/hcd-xhci.c
>> @@ -3594,25 +3594,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>>       dev->config[PCI_CACHE_LINE_SIZE] = 0x10;
>>       dev->config[0x60] = 0x30; /* release number */
>>
>> -    usb_xhci_init(xhci);
>> -
>> -    if (xhci->msi != ON_OFF_AUTO_OFF) {
>> -        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
>> -        /* Any error other than -ENOTSUP(board's MSI support is broken)
>> -         * is a programming error */
>> -        assert(!ret || ret == -ENOTSUP);
>> -        if (ret && xhci->msi == ON_OFF_AUTO_ON) {
>> -            /* Can't satisfy user's explicit msi=on request, fail */
>> -            error_append_hint(&err, "You have to use msi=auto (default) or "
>> -                    "msi=off with this machine type.\n");
>> -            error_propagate(errp, err);
>> -            return;
>> -        }
>> -        assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
>> -        /* With msi=auto, we fall back to MSI off silently */
>> -        error_free(err);
>> -    }
>> -
>>       if (xhci->numintrs > MAXINTRS) {
>>           xhci->numintrs = MAXINTRS;
>>       }
>> @@ -3622,21 +3603,60 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>>       if (xhci->numintrs < 1) {
>>           xhci->numintrs = 1;
>>       }
>> +
>>       if (xhci->numslots > MAXSLOTS) {
>>           xhci->numslots = MAXSLOTS;
>>       }
>>       if (xhci->numslots < 1) {
>>           xhci->numslots = 1;
>>       }
>> +
>>       if (xhci_get_flag(xhci, XHCI_FLAG_ENABLE_STREAMS)) {
>>           xhci->max_pstreams_mask = 7; /* == 256 primary streams */
>>       } else {
>>           xhci->max_pstreams_mask = 0;
>>       }
>>
>> -    xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, xhci);
>> +    if (xhci->msi != ON_OFF_AUTO_OFF) {
>> +        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
>> +        /* Any error other than -ENOTSUP(board's MSI support is broken)
>> +         * is a programming error */
>> +        assert(!ret || ret == -ENOTSUP);
>> +        if (ret && xhci->msi == ON_OFF_AUTO_ON) {
>> +            /* Can't satisfy user's explicit msi=on request, fail */
>> +            error_append_hint(&err, "You have to use msi=auto (default) or "
>> +                    "msi=off with this machine type.\n");
>> +            error_propagate(errp, err);
>> +            return;
>> +        }
>> +        assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
>> +        /* With msi=auto, we fall back to MSI off silently */
>> +        error_free(err);
>> +    }
>
> Can you explain why you're moving this code?
>

Sorry I forget to mention this: msi_init() uses xhci->numintrs, but 
there is value checking/correcting on xhci->numintrs, it should be done 
before using.

-- 
Yours Sincerely,

Cao jin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-09-13  2:51     ` Cao jin
@ 2016-09-13  6:16       ` Markus Armbruster
  2016-09-13 14:49         ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2016-09-13  6:16 UTC (permalink / raw)
  To: Cao jin; +Cc: Marcel Apfelbaum, qemu-devel, Michael S. Tsirkin, Alex Williamson

Cc: Alex for device assignment expertise.

Cao jin <caoj.fnst@cn.fujitsu.com> writes:

> On 09/12/2016 09:29 PM, Markus Armbruster wrote:
>> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>>
>>> The input parameters is used for creating the msix capable device, so
>>> they must obey the PCI spec, or else, it should be programming error.
>>
>> True when the the parameters come from a device model attempting to
>> define a PCI device violating the spec.  But what if the parameters come
>> from an actual PCI device violating the spec, via device assignment?
>
> Before the patch, on invalid param, the vfio behaviour is:
>   error_report("vfio: msix_init failed");
>   then, device create fail.
>
> After the patch, its behaviour is:
>   asserted.
>
> Do you mean we should still report some useful info to user on invalid
> params?

In the normal case, asking msix_init() to create MSI-X that are out of
spec is a programming error: the code that does it is broken and needs
fixing.

Device assignment might be the exception: there, the parameters for
msix_init() come from the assigned device, not the program.  If they
violate the spec, the device is broken.  This wouldn't be a programming
error.  Alex, can this happen?

If yes, we may want to handle it by failing device assignment.

> Cao jin
>>
>> For what it's worth, the new behavior seems consistent with msi_init(),
>> which is good.

Whatever behavior on out-of-spec parameters we choose, msi_init() and
msix_init() should behave the same.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] pci: Convert msix_init() to Error and fix callers to check it
  2016-09-13  6:04     ` Cao jin
@ 2016-09-13  8:27       ` Markus Armbruster
  0 siblings, 0 replies; 21+ messages in thread
From: Markus Armbruster @ 2016-09-13  8:27 UTC (permalink / raw)
  To: Cao jin
  Cc: Jiri Pirko, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Dmitry Fleytman, Alex Williamson, Hannes Reinecke,
	Marcel Apfelbaum, Paolo Bonzini, Gerd Hoffmann

Cao jin <caoj.fnst@cn.fujitsu.com> writes:

> On 09/12/2016 09:47 PM, Markus Armbruster wrote:
>> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
[...]
>>> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
>>> index 188f954..4280c5d 100644
>>> --- a/hw/usb/hcd-xhci.c
>>> +++ b/hw/usb/hcd-xhci.c
>>> @@ -3594,25 +3594,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>>>       dev->config[PCI_CACHE_LINE_SIZE] = 0x10;
>>>       dev->config[0x60] = 0x30; /* release number */
>>>
>>> -    usb_xhci_init(xhci);
>>> -
>>> -    if (xhci->msi != ON_OFF_AUTO_OFF) {
>>> -        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
>>> -        /* Any error other than -ENOTSUP(board's MSI support is broken)
>>> -         * is a programming error */
>>> -        assert(!ret || ret == -ENOTSUP);
>>> -        if (ret && xhci->msi == ON_OFF_AUTO_ON) {
>>> -            /* Can't satisfy user's explicit msi=on request, fail */
>>> -            error_append_hint(&err, "You have to use msi=auto (default) or "
>>> -                    "msi=off with this machine type.\n");
>>> -            error_propagate(errp, err);
>>> -            return;
>>> -        }
>>> -        assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
>>> -        /* With msi=auto, we fall back to MSI off silently */
>>> -        error_free(err);
>>> -    }
>>> -
>>>       if (xhci->numintrs > MAXINTRS) {
>>>           xhci->numintrs = MAXINTRS;
>>>       }
>>> @@ -3622,21 +3603,60 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>>>       if (xhci->numintrs < 1) {
>>>           xhci->numintrs = 1;
>>>       }
>>> +
>>>       if (xhci->numslots > MAXSLOTS) {
>>>           xhci->numslots = MAXSLOTS;
>>>       }
>>>       if (xhci->numslots < 1) {
>>>           xhci->numslots = 1;
>>>       }
>>> +
>>>       if (xhci_get_flag(xhci, XHCI_FLAG_ENABLE_STREAMS)) {
>>>           xhci->max_pstreams_mask = 7; /* == 256 primary streams */
>>>       } else {
>>>           xhci->max_pstreams_mask = 0;
>>>       }
>>>
>>> -    xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, xhci);
>>> +    if (xhci->msi != ON_OFF_AUTO_OFF) {
>>> +        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, &err);
>>> +        /* Any error other than -ENOTSUP(board's MSI support is broken)
>>> +         * is a programming error */
>>> +        assert(!ret || ret == -ENOTSUP);
>>> +        if (ret && xhci->msi == ON_OFF_AUTO_ON) {
>>> +            /* Can't satisfy user's explicit msi=on request, fail */
>>> +            error_append_hint(&err, "You have to use msi=auto (default) or "
>>> +                    "msi=off with this machine type.\n");
>>> +            error_propagate(errp, err);
>>> +            return;
>>> +        }
>>> +        assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
>>> +        /* With msi=auto, we fall back to MSI off silently */
>>> +        error_free(err);
>>> +    }
>>
>> Can you explain why you're moving this code?
>>
>
> Sorry I forget to mention this: msi_init() uses xhci->numintrs, but
> there is value checking/correcting on xhci->numintrs, it should be
> done before using.

If you do the move in a separate patch before this one, you can explain
it in its commit message.  Easier to review that way.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-09-13  6:16       ` Markus Armbruster
@ 2016-09-13 14:49         ` Alex Williamson
  2016-09-29 13:11           ` Markus Armbruster
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2016-09-13 14:49 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Cao jin, Marcel Apfelbaum, qemu-devel, Michael S. Tsirkin

On Tue, 13 Sep 2016 08:16:20 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> Cc: Alex for device assignment expertise.
> 
> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
> 
> > On 09/12/2016 09:29 PM, Markus Armbruster wrote:  
> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
> >>  
> >>> The input parameters is used for creating the msix capable device, so
> >>> they must obey the PCI spec, or else, it should be programming error.  
> >>
> >> True when the the parameters come from a device model attempting to
> >> define a PCI device violating the spec.  But what if the parameters come
> >> from an actual PCI device violating the spec, via device assignment?  
> >
> > Before the patch, on invalid param, the vfio behaviour is:
> >   error_report("vfio: msix_init failed");
> >   then, device create fail.
> >
> > After the patch, its behaviour is:
> >   asserted.
> >
> > Do you mean we should still report some useful info to user on invalid
> > params?  
> 
> In the normal case, asking msix_init() to create MSI-X that are out of
> spec is a programming error: the code that does it is broken and needs
> fixing.
> 
> Device assignment might be the exception: there, the parameters for
> msix_init() come from the assigned device, not the program.  If they
> violate the spec, the device is broken.  This wouldn't be a programming
> error.  Alex, can this happen?
> 
> If yes, we may want to handle it by failing device assignment.


Generally, I think the entire premise of these sorts of patches is
flawed.  We take a working error path that allows a driver to robustly
abort on unexpected date and turn it into a time bomb.  Often the
excuse for this is that "error handling is hard".  Tough.  Now a
hot-add of a device that triggers this changes from a simple failure to
a denial of service event.  Furthermore, we base that time bomb on our
interpretation of the spec, which we can only validate against in-tree
devices.

We have actually had assigned devices that fail the sanity test here,
there's a quirk in vfio_msix_early_setup() for a Chelsio device with
this bug.  Do we really want user experiencing aborts when a simple
device initialization failure is sufficient?

Generally abort code paths like this cause me to do my own sanity
testing, which is really poor practice since we should have that sanity
testing in the common code.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-09-13 14:49         ` Alex Williamson
@ 2016-09-29 13:11           ` Markus Armbruster
  2016-09-29 16:10             ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2016-09-29 13:11 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Marcel Apfelbaum, Cao jin, qemu-devel, Michael S. Tsirkin

Alex Williamson <alex.williamson@redhat.com> writes:

> On Tue, 13 Sep 2016 08:16:20 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> Cc: Alex for device assignment expertise.
>> 
>> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>> 
>> > On 09/12/2016 09:29 PM, Markus Armbruster wrote:  
>> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>> >>  
>> >>> The input parameters is used for creating the msix capable device, so
>> >>> they must obey the PCI spec, or else, it should be programming error.  
>> >>
>> >> True when the the parameters come from a device model attempting to
>> >> define a PCI device violating the spec.  But what if the parameters come
>> >> from an actual PCI device violating the spec, via device assignment?  
>> >
>> > Before the patch, on invalid param, the vfio behaviour is:
>> >   error_report("vfio: msix_init failed");
>> >   then, device create fail.
>> >
>> > After the patch, its behaviour is:
>> >   asserted.
>> >
>> > Do you mean we should still report some useful info to user on invalid
>> > params?  
>> 
>> In the normal case, asking msix_init() to create MSI-X that are out of
>> spec is a programming error: the code that does it is broken and needs
>> fixing.
>> 
>> Device assignment might be the exception: there, the parameters for
>> msix_init() come from the assigned device, not the program.  If they
>> violate the spec, the device is broken.  This wouldn't be a programming
>> error.  Alex, can this happen?
>> 
>> If yes, we may want to handle it by failing device assignment.
>
>
> Generally, I think the entire premise of these sorts of patches is
> flawed.  We take a working error path that allows a driver to robustly
> abort on unexpected date and turn it into a time bomb.  Often the
> excuse for this is that "error handling is hard".  Tough.  Now a
> hot-add of a device that triggers this changes from a simple failure to
> a denial of service event.  Furthermore, we base that time bomb on our
> interpretation of the spec, which we can only validate against in-tree
> devices.
>
> We have actually had assigned devices that fail the sanity test here,
> there's a quirk in vfio_msix_early_setup() for a Chelsio device with
> this bug.  Do we really want user experiencing aborts when a simple
> device initialization failure is sufficient?
>
> Generally abort code paths like this cause me to do my own sanity
> testing, which is really poor practice since we should have that sanity
> testing in the common code.  Thanks,

I prefer to assert on programming error, because 1. it does double duty
as documentation, 2. error handling of impossible conditions is commonly
wrong, and 3. assertion failures have a much better chance to get the
program fixed.  Even when presence of a working error path kills 2., the
other two make me stick to assertions.

However, input out-of-spec is not a programming error.  For most users
of msix_init(), the arguments are hard-coded, thus invalid arguments are
a programming error.  For device assignment, they come from a physical
device, thus invalid arguments can either be a programming error (our
idea of "invalid" is invalid) or bad input (the physical device is
out-of-spec).  Since we can't know, we better handle it rather than
assert.

Bottom line: you convinced me msix_init() should stay as it is.  But now
msi_init() looks like it needs a change: it asserts on invalid
nr_vectors parameter.  Does that need fixing, Alex?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-09-29 13:11           ` Markus Armbruster
@ 2016-09-29 16:10             ` Alex Williamson
  2016-09-30 14:06               ` Markus Armbruster
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2016-09-29 16:10 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Cao jin, qemu-devel, Michael S. Tsirkin

On Thu, 29 Sep 2016 15:11:27 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> Alex Williamson <alex.williamson@redhat.com> writes:
> 
> > On Tue, 13 Sep 2016 08:16:20 +0200
> > Markus Armbruster <armbru@redhat.com> wrote:
> >  
> >> Cc: Alex for device assignment expertise.
> >> 
> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
> >>   
> >> > On 09/12/2016 09:29 PM, Markus Armbruster wrote:    
> >> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
> >> >>    
> >> >>> The input parameters is used for creating the msix capable device, so
> >> >>> they must obey the PCI spec, or else, it should be programming error.    
> >> >>
> >> >> True when the the parameters come from a device model attempting to
> >> >> define a PCI device violating the spec.  But what if the parameters come
> >> >> from an actual PCI device violating the spec, via device assignment?    
> >> >
> >> > Before the patch, on invalid param, the vfio behaviour is:
> >> >   error_report("vfio: msix_init failed");
> >> >   then, device create fail.
> >> >
> >> > After the patch, its behaviour is:
> >> >   asserted.
> >> >
> >> > Do you mean we should still report some useful info to user on invalid
> >> > params?    
> >> 
> >> In the normal case, asking msix_init() to create MSI-X that are out of
> >> spec is a programming error: the code that does it is broken and needs
> >> fixing.
> >> 
> >> Device assignment might be the exception: there, the parameters for
> >> msix_init() come from the assigned device, not the program.  If they
> >> violate the spec, the device is broken.  This wouldn't be a programming
> >> error.  Alex, can this happen?
> >> 
> >> If yes, we may want to handle it by failing device assignment.  
> >
> >
> > Generally, I think the entire premise of these sorts of patches is
> > flawed.  We take a working error path that allows a driver to robustly
> > abort on unexpected date and turn it into a time bomb.  Often the
> > excuse for this is that "error handling is hard".  Tough.  Now a
> > hot-add of a device that triggers this changes from a simple failure to
> > a denial of service event.  Furthermore, we base that time bomb on our
> > interpretation of the spec, which we can only validate against in-tree
> > devices.
> >
> > We have actually had assigned devices that fail the sanity test here,
> > there's a quirk in vfio_msix_early_setup() for a Chelsio device with
> > this bug.  Do we really want user experiencing aborts when a simple
> > device initialization failure is sufficient?
> >
> > Generally abort code paths like this cause me to do my own sanity
> > testing, which is really poor practice since we should have that sanity
> > testing in the common code.  Thanks,  
> 
> I prefer to assert on programming error, because 1. it does double duty
> as documentation, 2. error handling of impossible conditions is commonly
> wrong, and 3. assertion failures have a much better chance to get the
> program fixed.  Even when presence of a working error path kills 2., the
> other two make me stick to assertions.

So we're looking at:

> -    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
> -        return -EINVAL;
> -    }

vs

> +    assert(nentries >= 1 && nentries <= PCI_MSIX_FLAGS_QSIZE + 1);

How do you argue that one of these provides better self documentation
than the other?

The assert may have a better chance of getting fixed, but it's because
the existence of the assert itself exposes a vulnerability in the code.
Which would you rather have in production, a VMM that crashes on the
slightest deviance from the input it expects or one that simply errors
the faulting code path and continues?

Error handling is hard, which is why we need to look at it as a
collection of smaller problems.  We return an error at a leaf function
and let callers of that function decide how to handle it.  If some of
those callers don't want to deal with error handling, abort there, we
can come back to them later, but let the code paths that do want proper
error handling to continue.  If we add aborts into the leaf function,
then any calling path that wants to be robust against an error needs to
fully sanitize the input itself, at which point we have different
drivers sanitizing in different ways, all building up walls to protect
themselves from the time bombs in these leaf functions.  It's crazy.

> However, input out-of-spec is not a programming error.  For most users
> of msix_init(), the arguments are hard-coded, thus invalid arguments are
> a programming error.  For device assignment, they come from a physical
> device, thus invalid arguments can either be a programming error (our
> idea of "invalid" is invalid) or bad input (the physical device is
> out-of-spec).  Since we can't know, we better handle it rather than
> assert.

So are we going to flag every call path that device assignment might
use as one that needs "proper" error handling any anything that's only
used by emulated devices can assert?  How will anyone ever know?  vfio
tries really hard to be just another device in the QEMU ecosystem.

> Bottom line: you convinced me msix_init() should stay as it is.  But now
> msi_init() looks like it needs a change: it asserts on invalid
> nr_vectors parameter.  Does that need fixing, Alex?

IMHO, they all need to be fixed.  Besides, look at the callers of
msi_init(), almost every one will assert on its own if msi_init()
fails, all we're doing is hindering drivers like vfio-pci that can
gracefully handle a failure.  I think that's exactly how each of these
should be handled, find a leaf function with asserts, convert it to
proper error handling, change the callers that don't already handle the
error or assert to assert, then work down through each code path to
figure out how they can more robustly handle an error.  I don't buy the
argument that error handling is too hard or that we're more likely to
get it wrong.  It needs to be handled as percolating small errors, each
of which is trivial to handle on its own.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-09-29 16:10             ` Alex Williamson
@ 2016-09-30 14:06               ` Markus Armbruster
  2016-09-30 18:06                 ` Dr. David Alan Gilbert
  2016-10-06  7:00                 ` Cao jin
  0 siblings, 2 replies; 21+ messages in thread
From: Markus Armbruster @ 2016-09-30 14:06 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Marcel Apfelbaum, Cao jin, qemu-devel, Michael S. Tsirkin

Alex Williamson <alex.williamson@redhat.com> writes:

> On Thu, 29 Sep 2016 15:11:27 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> Alex Williamson <alex.williamson@redhat.com> writes:
>> 
>> > On Tue, 13 Sep 2016 08:16:20 +0200
>> > Markus Armbruster <armbru@redhat.com> wrote:
>> >  
>> >> Cc: Alex for device assignment expertise.
>> >> 
>> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>> >>   
>> >> > On 09/12/2016 09:29 PM, Markus Armbruster wrote:    
>> >> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>> >> >>    
>> >> >>> The input parameters is used for creating the msix capable device, so
>> >> >>> they must obey the PCI spec, or else, it should be programming error.    
>> >> >>
>> >> >> True when the the parameters come from a device model attempting to
>> >> >> define a PCI device violating the spec.  But what if the parameters come
>> >> >> from an actual PCI device violating the spec, via device assignment?    
>> >> >
>> >> > Before the patch, on invalid param, the vfio behaviour is:
>> >> >   error_report("vfio: msix_init failed");
>> >> >   then, device create fail.
>> >> >
>> >> > After the patch, its behaviour is:
>> >> >   asserted.
>> >> >
>> >> > Do you mean we should still report some useful info to user on invalid
>> >> > params?    
>> >> 
>> >> In the normal case, asking msix_init() to create MSI-X that are out of
>> >> spec is a programming error: the code that does it is broken and needs
>> >> fixing.
>> >> 
>> >> Device assignment might be the exception: there, the parameters for
>> >> msix_init() come from the assigned device, not the program.  If they
>> >> violate the spec, the device is broken.  This wouldn't be a programming
>> >> error.  Alex, can this happen?
>> >> 
>> >> If yes, we may want to handle it by failing device assignment.  
>> >
>> >
>> > Generally, I think the entire premise of these sorts of patches is
>> > flawed.  We take a working error path that allows a driver to robustly
>> > abort on unexpected date and turn it into a time bomb.  Often the
>> > excuse for this is that "error handling is hard".  Tough.  Now a
>> > hot-add of a device that triggers this changes from a simple failure to
>> > a denial of service event.  Furthermore, we base that time bomb on our
>> > interpretation of the spec, which we can only validate against in-tree
>> > devices.
>> >
>> > We have actually had assigned devices that fail the sanity test here,
>> > there's a quirk in vfio_msix_early_setup() for a Chelsio device with
>> > this bug.  Do we really want user experiencing aborts when a simple
>> > device initialization failure is sufficient?
>> >
>> > Generally abort code paths like this cause me to do my own sanity
>> > testing, which is really poor practice since we should have that sanity
>> > testing in the common code.  Thanks,  
>> 
>> I prefer to assert on programming error, because 1. it does double duty
>> as documentation, 2. error handling of impossible conditions is commonly
>> wrong, and 3. assertion failures have a much better chance to get the
>> program fixed.  Even when presence of a working error path kills 2., the
>> other two make me stick to assertions.
>
> So we're looking at:
>
>> -    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
>> -        return -EINVAL;
>> -    }
>
> vs
>
>> +    assert(nentries >= 1 && nentries <= PCI_MSIX_FLAGS_QSIZE + 1);
>
> How do you argue that one of these provides better self documentation
> than the other?

The first one says "this can happen, and when it does, the function
fails cleanly."  For a genuine programming error, this is in part
misleading.

The second one says "I assert this can't happen.  We'd be toast if I was
wrong."

> The assert may have a better chance of getting fixed, but it's because
> the existence of the assert itself exposes a vulnerability in the code.
> Which would you rather have in production, a VMM that crashes on the
> slightest deviance from the input it expects or one that simply errors
> the faulting code path and continues?

Invalid input to a program should never be treated as programming error.

> Error handling is hard, which is why we need to look at it as a
> collection of smaller problems.  We return an error at a leaf function
> and let callers of that function decide how to handle it.  If some of
> those callers don't want to deal with error handling, abort there, we
> can come back to them later, but let the code paths that do want proper
> error handling to continue.  If we add aborts into the leaf function,
> then any calling path that wants to be robust against an error needs to
> fully sanitize the input itself, at which point we have different
> drivers sanitizing in different ways, all building up walls to protect
> themselves from the time bombs in these leaf functions.  It's crazy.

It depends on the kind of error in the leaf function.

I suspect we're talking past each other because we got different kinds
of errors in mind.

Programming is impossible without things like preconditions,
postconditions, invariants.

If a section of code is entered when its precondition doesn't hold,
we're toast.  This is the archetypical programming error.

If it can actually happen, the program is incorrect, and needs fixing.

Checking preconditions is often (but not always) practical.  In my
opinion, checking is good practice, and the proper way to check is
assert().  Makes the incorrect program fail before it can do further
damage, and helps with finding the programming error.

A preconditions is part of the contract between a function and its
users.  An strong precondition can make the function's job easier, but
that's no use if the resulting function is inconvenient to use.  On the
other hand, complicating the function to get a weaker precondition
nobody actually needs is just as dumb.

Returning an error is *not* checking preconditions.  Remember, if the
precondition doesn't hold, we're toast.  If we're toast when we return
an error, we're clearly doing it wrong.

You are arguing for weaker preconditions.  I'm not actually disagreeing
with you!  I'm merely expressing my opinion that checking preconditions
with assert() is a good idea.

>> However, input out-of-spec is not a programming error.  For most users
>> of msix_init(), the arguments are hard-coded, thus invalid arguments are
>> a programming error.  For device assignment, they come from a physical
>> device, thus invalid arguments can either be a programming error (our
>> idea of "invalid" is invalid) or bad input (the physical device is
>> out-of-spec).  Since we can't know, we better handle it rather than
>> assert.
>
> So are we going to flag every call path that device assignment might
> use as one that needs "proper" error handling any anything that's only
> used by emulated devices can assert?  How will anyone ever know?  vfio
> tries really hard to be just another device in the QEMU ecosystem.

It tries, but it can't help to add a few things.

Consider the number of MSI vectors.  It can only be 1, 2, 4, 8, 16 or
32.

When the callers of msi_init() pass literal numbers, making "the number
is valid" a precondition is quite sensible.

If the numbers come from the user via configuration, they need to be
checked.  Two sane ways to do that: check close to where the
configuration is processed, and check where it is used.  The former will
likely produce better error messages.  But the latter has its
advantages, too.  Checking next to its use in msi_init() involves making
it handle invalid numbers, i.e. weakening its precondition.

Making vectors configurable turned moves them from the realm of
preconditions to the realm of program input.  Code needs to be updated
for that.

What device assignment adds is moving many more bits to the program
input realm.  More code needs to be updated for that.

>> Bottom line: you convinced me msix_init() should stay as it is.  But now
>> msi_init() looks like it needs a change: it asserts on invalid
>> nr_vectors parameter.  Does that need fixing, Alex?
>
> IMHO, they all need to be fixed.  Besides, look at the callers of
> msi_init(), almost every one will assert on its own if msi_init()
> fails, all we're doing is hindering drivers like vfio-pci that can
> gracefully handle a failure.  I think that's exactly how each of these
> should be handled, find a leaf function with asserts, convert it to
> proper error handling, change the callers that don't already handle the
> error or assert to assert, then work down through each code path to
> figure out how they can more robustly handle an error.  I don't buy the
> argument that error handling is too hard or that we're more likely to
> get it wrong.  It needs to be handled as percolating small errors, each
> of which is trivial to handle on its own.  Thanks,

Once there's a need to handle a certain condition as an error, we should
do that, no argument.  This also provides a way to test the error path.

However, I wouldn't buy an argument that preconditions should be made as
weak as possible in leaf functions (let alone always) regardless of the
cost in complexity, and non-testability of error paths.  I'm strictly a
pay as you go person.

Back to the problem at hand.  Cao jin, would you be willing to fix
msi_init()?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-09-30 14:06               ` Markus Armbruster
@ 2016-09-30 18:06                 ` Dr. David Alan Gilbert
  2016-10-04  9:33                   ` Markus Armbruster
  2016-10-06  7:00                 ` Cao jin
  1 sibling, 1 reply; 21+ messages in thread
From: Dr. David Alan Gilbert @ 2016-09-30 18:06 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Alex Williamson, Marcel Apfelbaum, Cao jin, qemu-devel,
	Michael S. Tsirkin

* Markus Armbruster (armbru@redhat.com) wrote:
> Alex Williamson <alex.williamson@redhat.com> writes:
> 
> > On Thu, 29 Sep 2016 15:11:27 +0200
> > Markus Armbruster <armbru@redhat.com> wrote:
> >
> >> Alex Williamson <alex.williamson@redhat.com> writes:
> >> 
> >> > On Tue, 13 Sep 2016 08:16:20 +0200
> >> > Markus Armbruster <armbru@redhat.com> wrote:
> >> >  
> >> >> Cc: Alex for device assignment expertise.
> >> >> 
> >> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
> >> >>   
> >> >> > On 09/12/2016 09:29 PM, Markus Armbruster wrote:    
> >> >> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
> >> >> >>    
> >> >> >>> The input parameters is used for creating the msix capable device, so
> >> >> >>> they must obey the PCI spec, or else, it should be programming error.    
> >> >> >>
> >> >> >> True when the the parameters come from a device model attempting to
> >> >> >> define a PCI device violating the spec.  But what if the parameters come
> >> >> >> from an actual PCI device violating the spec, via device assignment?    
> >> >> >
> >> >> > Before the patch, on invalid param, the vfio behaviour is:
> >> >> >   error_report("vfio: msix_init failed");
> >> >> >   then, device create fail.
> >> >> >
> >> >> > After the patch, its behaviour is:
> >> >> >   asserted.
> >> >> >
> >> >> > Do you mean we should still report some useful info to user on invalid
> >> >> > params?    
> >> >> 
> >> >> In the normal case, asking msix_init() to create MSI-X that are out of
> >> >> spec is a programming error: the code that does it is broken and needs
> >> >> fixing.
> >> >> 
> >> >> Device assignment might be the exception: there, the parameters for
> >> >> msix_init() come from the assigned device, not the program.  If they
> >> >> violate the spec, the device is broken.  This wouldn't be a programming
> >> >> error.  Alex, can this happen?
> >> >> 
> >> >> If yes, we may want to handle it by failing device assignment.  
> >> >
> >> >
> >> > Generally, I think the entire premise of these sorts of patches is
> >> > flawed.  We take a working error path that allows a driver to robustly
> >> > abort on unexpected date and turn it into a time bomb.  Often the
> >> > excuse for this is that "error handling is hard".  Tough.  Now a
> >> > hot-add of a device that triggers this changes from a simple failure to
> >> > a denial of service event.  Furthermore, we base that time bomb on our
> >> > interpretation of the spec, which we can only validate against in-tree
> >> > devices.
> >> >
> >> > We have actually had assigned devices that fail the sanity test here,
> >> > there's a quirk in vfio_msix_early_setup() for a Chelsio device with
> >> > this bug.  Do we really want user experiencing aborts when a simple
> >> > device initialization failure is sufficient?
> >> >
> >> > Generally abort code paths like this cause me to do my own sanity
> >> > testing, which is really poor practice since we should have that sanity
> >> > testing in the common code.  Thanks,  
> >> 
> >> I prefer to assert on programming error, because 1. it does double duty
> >> as documentation, 2. error handling of impossible conditions is commonly
> >> wrong, and 3. assertion failures have a much better chance to get the
> >> program fixed.  Even when presence of a working error path kills 2., the
> >> other two make me stick to assertions.
> >
> > So we're looking at:
> >
> >> -    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
> >> -        return -EINVAL;
> >> -    }
> >
> > vs
> >
> >> +    assert(nentries >= 1 && nentries <= PCI_MSIX_FLAGS_QSIZE + 1);
> >
> > How do you argue that one of these provides better self documentation
> > than the other?
> 
> The first one says "this can happen, and when it does, the function
> fails cleanly."  For a genuine programming error, this is in part
> misleading.
> 
> The second one says "I assert this can't happen.  We'd be toast if I was
> wrong."
> 
> > The assert may have a better chance of getting fixed, but it's because
> > the existence of the assert itself exposes a vulnerability in the code.
> > Which would you rather have in production, a VMM that crashes on the
> > slightest deviance from the input it expects or one that simply errors
> > the faulting code path and continues?
> 
> Invalid input to a program should never be treated as programming error.
> 
> > Error handling is hard, which is why we need to look at it as a
> > collection of smaller problems.  We return an error at a leaf function
> > and let callers of that function decide how to handle it.  If some of
> > those callers don't want to deal with error handling, abort there, we
> > can come back to them later, but let the code paths that do want proper
> > error handling to continue.  If we add aborts into the leaf function,
> > then any calling path that wants to be robust against an error needs to
> > fully sanitize the input itself, at which point we have different
> > drivers sanitizing in different ways, all building up walls to protect
> > themselves from the time bombs in these leaf functions.  It's crazy.
> 
> It depends on the kind of error in the leaf function.
> 
> I suspect we're talking past each other because we got different kinds
> of errors in mind.
> 
> Programming is impossible without things like preconditions,
> postconditions, invariants.
> 
> If a section of code is entered when its precondition doesn't hold,
> we're toast.  This is the archetypical programming error.
> 
> If it can actually happen, the program is incorrect, and needs fixing.
> 
> Checking preconditions is often (but not always) practical.  In my
> opinion, checking is good practice, and the proper way to check is
> assert().  Makes the incorrect program fail before it can do further
> damage, and helps with finding the programming error.
> 
> A preconditions is part of the contract between a function and its
> users.  An strong precondition can make the function's job easier, but
> that's no use if the resulting function is inconvenient to use.  On the
> other hand, complicating the function to get a weaker precondition
> nobody actually needs is just as dumb.
> 
> Returning an error is *not* checking preconditions.  Remember, if the
> precondition doesn't hold, we're toast.  If we're toast when we return
> an error, we're clearly doing it wrong.
> 
> You are arguing for weaker preconditions.  I'm not actually disagreeing
> with you!  I'm merely expressing my opinion that checking preconditions
> with assert() is a good idea.

I have a fairly strong dislike for asserts in qemu, and although I'm not
always consistent, my reasoning is mainly to do with asserts once a guest
is running.

Lets imagine you have a happily running guest and then you try and do
something new and complex (e.g. hotplug a vfio-device); now lets say that
new thing has something very broken about it, do you really want the previously
running guest to die?

My view is it can very much depend on how broken you think the
world is; you've got to remember that crashing at this point
is going to lose the user a VM, and that could mean losing
data - so at that point you have to make a decision about whether
your lack of confidence in the state of the VM due to the failed
precondition is worse than your knowledge that the VM is going to fail.

Perhaps giving the user an error and disabling the device lets
the admin gravefully shutdown the VM and walk away with all
their data intact.

So I wouldn't argue for weaker preconditions, just what the
result is if the precondition fails.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-09-30 18:06                 ` Dr. David Alan Gilbert
@ 2016-10-04  9:33                   ` Markus Armbruster
  2016-10-04 11:19                     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2016-10-04  9:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Marcel Apfelbaum, Alex Williamson, Michael S. Tsirkin, Cao jin,
	qemu-devel

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Markus Armbruster (armbru@redhat.com) wrote:
>> Alex Williamson <alex.williamson@redhat.com> writes:
>> 
>> > On Thu, 29 Sep 2016 15:11:27 +0200
>> > Markus Armbruster <armbru@redhat.com> wrote:
>> >
>> >> Alex Williamson <alex.williamson@redhat.com> writes:
>> >> 
>> >> > On Tue, 13 Sep 2016 08:16:20 +0200
>> >> > Markus Armbruster <armbru@redhat.com> wrote:
>> >> >  
>> >> >> Cc: Alex for device assignment expertise.
>> >> >> 
>> >> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>> >> >>   
>> >> >> > On 09/12/2016 09:29 PM, Markus Armbruster wrote:    
>> >> >> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>> >> >> >>    
>> >> >> >>> The input parameters is used for creating the msix capable device, so
>> >> >> >>> they must obey the PCI spec, or else, it should be programming error.    
>> >> >> >>
>> >> >> >> True when the the parameters come from a device model attempting to
>> >> >> >> define a PCI device violating the spec.  But what if the parameters come
>> >> >> >> from an actual PCI device violating the spec, via device assignment?    
>> >> >> >
>> >> >> > Before the patch, on invalid param, the vfio behaviour is:
>> >> >> >   error_report("vfio: msix_init failed");
>> >> >> >   then, device create fail.
>> >> >> >
>> >> >> > After the patch, its behaviour is:
>> >> >> >   asserted.
>> >> >> >
>> >> >> > Do you mean we should still report some useful info to user on invalid
>> >> >> > params?    
>> >> >> 
>> >> >> In the normal case, asking msix_init() to create MSI-X that are out of
>> >> >> spec is a programming error: the code that does it is broken and needs
>> >> >> fixing.
>> >> >> 
>> >> >> Device assignment might be the exception: there, the parameters for
>> >> >> msix_init() come from the assigned device, not the program.  If they
>> >> >> violate the spec, the device is broken.  This wouldn't be a programming
>> >> >> error.  Alex, can this happen?
>> >> >> 
>> >> >> If yes, we may want to handle it by failing device assignment.  
>> >> >
>> >> >
>> >> > Generally, I think the entire premise of these sorts of patches is
>> >> > flawed.  We take a working error path that allows a driver to robustly
>> >> > abort on unexpected date and turn it into a time bomb.  Often the
>> >> > excuse for this is that "error handling is hard".  Tough.  Now a
>> >> > hot-add of a device that triggers this changes from a simple failure to
>> >> > a denial of service event.  Furthermore, we base that time bomb on our
>> >> > interpretation of the spec, which we can only validate against in-tree
>> >> > devices.
>> >> >
>> >> > We have actually had assigned devices that fail the sanity test here,
>> >> > there's a quirk in vfio_msix_early_setup() for a Chelsio device with
>> >> > this bug.  Do we really want user experiencing aborts when a simple
>> >> > device initialization failure is sufficient?
>> >> >
>> >> > Generally abort code paths like this cause me to do my own sanity
>> >> > testing, which is really poor practice since we should have that sanity
>> >> > testing in the common code.  Thanks,  
>> >> 
>> >> I prefer to assert on programming error, because 1. it does double duty
>> >> as documentation, 2. error handling of impossible conditions is commonly
>> >> wrong, and 3. assertion failures have a much better chance to get the
>> >> program fixed.  Even when presence of a working error path kills 2., the
>> >> other two make me stick to assertions.
>> >
>> > So we're looking at:
>> >
>> >> -    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
>> >> -        return -EINVAL;
>> >> -    }
>> >
>> > vs
>> >
>> >> +    assert(nentries >= 1 && nentries <= PCI_MSIX_FLAGS_QSIZE + 1);
>> >
>> > How do you argue that one of these provides better self documentation
>> > than the other?
>> 
>> The first one says "this can happen, and when it does, the function
>> fails cleanly."  For a genuine programming error, this is in part
>> misleading.
>> 
>> The second one says "I assert this can't happen.  We'd be toast if I was
>> wrong."
>> 
>> > The assert may have a better chance of getting fixed, but it's because
>> > the existence of the assert itself exposes a vulnerability in the code.
>> > Which would you rather have in production, a VMM that crashes on the
>> > slightest deviance from the input it expects or one that simply errors
>> > the faulting code path and continues?
>> 
>> Invalid input to a program should never be treated as programming error.
>> 
>> > Error handling is hard, which is why we need to look at it as a
>> > collection of smaller problems.  We return an error at a leaf function
>> > and let callers of that function decide how to handle it.  If some of
>> > those callers don't want to deal with error handling, abort there, we
>> > can come back to them later, but let the code paths that do want proper
>> > error handling to continue.  If we add aborts into the leaf function,
>> > then any calling path that wants to be robust against an error needs to
>> > fully sanitize the input itself, at which point we have different
>> > drivers sanitizing in different ways, all building up walls to protect
>> > themselves from the time bombs in these leaf functions.  It's crazy.
>> 
>> It depends on the kind of error in the leaf function.
>> 
>> I suspect we're talking past each other because we got different kinds
>> of errors in mind.
>> 
>> Programming is impossible without things like preconditions,
>> postconditions, invariants.
>> 
>> If a section of code is entered when its precondition doesn't hold,
>> we're toast.  This is the archetypical programming error.
>> 
>> If it can actually happen, the program is incorrect, and needs fixing.
>> 
>> Checking preconditions is often (but not always) practical.  In my
>> opinion, checking is good practice, and the proper way to check is
>> assert().  Makes the incorrect program fail before it can do further
>> damage, and helps with finding the programming error.
>> 
>> A preconditions is part of the contract between a function and its
>> users.  An strong precondition can make the function's job easier, but
>> that's no use if the resulting function is inconvenient to use.  On the
>> other hand, complicating the function to get a weaker precondition
>> nobody actually needs is just as dumb.
>> 
>> Returning an error is *not* checking preconditions.  Remember, if the
>> precondition doesn't hold, we're toast.  If we're toast when we return
>> an error, we're clearly doing it wrong.
>> 
>> You are arguing for weaker preconditions.  I'm not actually disagreeing
>> with you!  I'm merely expressing my opinion that checking preconditions
>> with assert() is a good idea.
>
> I have a fairly strong dislike for asserts in qemu, and although I'm not
> always consistent, my reasoning is mainly to do with asserts once a guest
> is running.
>
> Lets imagine you have a happily running guest and then you try and do
> something new and complex (e.g. hotplug a vfio-device); now lets say that
> new thing has something very broken about it, do you really want the previously
> running guest to die?

If a precondition doesn't hold, we're toast.  The best we can do is
crash before we mess up things further.

A problematic condition we can safely recover from can be made an error
condition.

I think the crux of our misunderstandings (I hesitate to call it an
argument) is confusing recoverable error conditions with violated
preconditions.  We all agree (violently, perhaps) that assert() is not
an acceptable error handling mechanism.

> My view is it can very much depend on how broken you think the
> world is; you've got to remember that crashing at this point
> is going to lose the user a VM, and that could mean losing
> data - so at that point you have to make a decision about whether
> your lack of confidence in the state of the VM due to the failed
> precondition is worse than your knowledge that the VM is going to fail.
>
> Perhaps giving the user an error and disabling the device lets
> the admin gravefully shutdown the VM and walk away with all
> their data intact.

This is risky business unless you can prove the problematic condition is
safely isolated.  To elaborate on your device example: say some logic
error in device emulation code put the device instance in some broken
state.  If you detect that before the device could mess up anything
else, fencing the device is safe.  But if device state is borked because
some other code overran an array, continuing risks making things worse.
Crashing the guest is bad.  Letting it first overwrite good data with
bad data is worse.

Sadly, such proof is hardly ever possible in unrestricted C.  So we're
down to probabilities and tradeoffs.

I'd reject a claim that once the guest is running the tradeoffs *always*
favour trying to hobble on.

If you want a less bleak isolation and recovery story, check out Erlang.
Note that its "let it crash" philosophy is very much in accordance with
my views on what can safely be done after detecting a programming error
/ violated precondition.

> So I wouldn't argue for weaker preconditions, just what the
> result is if the precondition fails.

I respectfully disagree with your use of the concept "precondition".

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-10-04  9:33                   ` Markus Armbruster
@ 2016-10-04 11:19                     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 21+ messages in thread
From: Dr. David Alan Gilbert @ 2016-10-04 11:19 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Marcel Apfelbaum, Alex Williamson, Michael S. Tsirkin, Cao jin,
	qemu-devel

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> Alex Williamson <alex.williamson@redhat.com> writes:
> >> 
> >> > On Thu, 29 Sep 2016 15:11:27 +0200
> >> > Markus Armbruster <armbru@redhat.com> wrote:
> >> >
> >> >> Alex Williamson <alex.williamson@redhat.com> writes:
> >> >> 
> >> >> > On Tue, 13 Sep 2016 08:16:20 +0200
> >> >> > Markus Armbruster <armbru@redhat.com> wrote:
> >> >> >  
> >> >> >> Cc: Alex for device assignment expertise.
> >> >> >> 
> >> >> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
> >> >> >>   
> >> >> >> > On 09/12/2016 09:29 PM, Markus Armbruster wrote:    
> >> >> >> >> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
> >> >> >> >>    
> >> >> >> >>> The input parameters is used for creating the msix capable device, so
> >> >> >> >>> they must obey the PCI spec, or else, it should be programming error.    
> >> >> >> >>
> >> >> >> >> True when the the parameters come from a device model attempting to
> >> >> >> >> define a PCI device violating the spec.  But what if the parameters come
> >> >> >> >> from an actual PCI device violating the spec, via device assignment?    
> >> >> >> >
> >> >> >> > Before the patch, on invalid param, the vfio behaviour is:
> >> >> >> >   error_report("vfio: msix_init failed");
> >> >> >> >   then, device create fail.
> >> >> >> >
> >> >> >> > After the patch, its behaviour is:
> >> >> >> >   asserted.
> >> >> >> >
> >> >> >> > Do you mean we should still report some useful info to user on invalid
> >> >> >> > params?    
> >> >> >> 
> >> >> >> In the normal case, asking msix_init() to create MSI-X that are out of
> >> >> >> spec is a programming error: the code that does it is broken and needs
> >> >> >> fixing.
> >> >> >> 
> >> >> >> Device assignment might be the exception: there, the parameters for
> >> >> >> msix_init() come from the assigned device, not the program.  If they
> >> >> >> violate the spec, the device is broken.  This wouldn't be a programming
> >> >> >> error.  Alex, can this happen?
> >> >> >> 
> >> >> >> If yes, we may want to handle it by failing device assignment.  
> >> >> >
> >> >> >
> >> >> > Generally, I think the entire premise of these sorts of patches is
> >> >> > flawed.  We take a working error path that allows a driver to robustly
> >> >> > abort on unexpected date and turn it into a time bomb.  Often the
> >> >> > excuse for this is that "error handling is hard".  Tough.  Now a
> >> >> > hot-add of a device that triggers this changes from a simple failure to
> >> >> > a denial of service event.  Furthermore, we base that time bomb on our
> >> >> > interpretation of the spec, which we can only validate against in-tree
> >> >> > devices.
> >> >> >
> >> >> > We have actually had assigned devices that fail the sanity test here,
> >> >> > there's a quirk in vfio_msix_early_setup() for a Chelsio device with
> >> >> > this bug.  Do we really want user experiencing aborts when a simple
> >> >> > device initialization failure is sufficient?
> >> >> >
> >> >> > Generally abort code paths like this cause me to do my own sanity
> >> >> > testing, which is really poor practice since we should have that sanity
> >> >> > testing in the common code.  Thanks,  
> >> >> 
> >> >> I prefer to assert on programming error, because 1. it does double duty
> >> >> as documentation, 2. error handling of impossible conditions is commonly
> >> >> wrong, and 3. assertion failures have a much better chance to get the
> >> >> program fixed.  Even when presence of a working error path kills 2., the
> >> >> other two make me stick to assertions.
> >> >
> >> > So we're looking at:
> >> >
> >> >> -    if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1) {
> >> >> -        return -EINVAL;
> >> >> -    }
> >> >
> >> > vs
> >> >
> >> >> +    assert(nentries >= 1 && nentries <= PCI_MSIX_FLAGS_QSIZE + 1);
> >> >
> >> > How do you argue that one of these provides better self documentation
> >> > than the other?
> >> 
> >> The first one says "this can happen, and when it does, the function
> >> fails cleanly."  For a genuine programming error, this is in part
> >> misleading.
> >> 
> >> The second one says "I assert this can't happen.  We'd be toast if I was
> >> wrong."
> >> 
> >> > The assert may have a better chance of getting fixed, but it's because
> >> > the existence of the assert itself exposes a vulnerability in the code.
> >> > Which would you rather have in production, a VMM that crashes on the
> >> > slightest deviance from the input it expects or one that simply errors
> >> > the faulting code path and continues?
> >> 
> >> Invalid input to a program should never be treated as programming error.
> >> 
> >> > Error handling is hard, which is why we need to look at it as a
> >> > collection of smaller problems.  We return an error at a leaf function
> >> > and let callers of that function decide how to handle it.  If some of
> >> > those callers don't want to deal with error handling, abort there, we
> >> > can come back to them later, but let the code paths that do want proper
> >> > error handling to continue.  If we add aborts into the leaf function,
> >> > then any calling path that wants to be robust against an error needs to
> >> > fully sanitize the input itself, at which point we have different
> >> > drivers sanitizing in different ways, all building up walls to protect
> >> > themselves from the time bombs in these leaf functions.  It's crazy.
> >> 
> >> It depends on the kind of error in the leaf function.
> >> 
> >> I suspect we're talking past each other because we got different kinds
> >> of errors in mind.
> >> 
> >> Programming is impossible without things like preconditions,
> >> postconditions, invariants.
> >> 
> >> If a section of code is entered when its precondition doesn't hold,
> >> we're toast.  This is the archetypical programming error.
> >> 
> >> If it can actually happen, the program is incorrect, and needs fixing.
> >> 
> >> Checking preconditions is often (but not always) practical.  In my
> >> opinion, checking is good practice, and the proper way to check is
> >> assert().  Makes the incorrect program fail before it can do further
> >> damage, and helps with finding the programming error.
> >> 
> >> A preconditions is part of the contract between a function and its
> >> users.  An strong precondition can make the function's job easier, but
> >> that's no use if the resulting function is inconvenient to use.  On the
> >> other hand, complicating the function to get a weaker precondition
> >> nobody actually needs is just as dumb.
> >> 
> >> Returning an error is *not* checking preconditions.  Remember, if the
> >> precondition doesn't hold, we're toast.  If we're toast when we return
> >> an error, we're clearly doing it wrong.
> >> 
> >> You are arguing for weaker preconditions.  I'm not actually disagreeing
> >> with you!  I'm merely expressing my opinion that checking preconditions
> >> with assert() is a good idea.
> >
> > I have a fairly strong dislike for asserts in qemu, and although I'm not
> > always consistent, my reasoning is mainly to do with asserts once a guest
> > is running.
> >
> > Lets imagine you have a happily running guest and then you try and do
> > something new and complex (e.g. hotplug a vfio-device); now lets say that
> > new thing has something very broken about it, do you really want the previously
> > running guest to die?
> 
> If a precondition doesn't hold, we're toast.  The best we can do is
> crash before we mess up things further.
> 
> A problematic condition we can safely recover from can be made an error
> condition.
> 
> I think the crux of our misunderstandings (I hesitate to call it an
> argument) is confusing recoverable error conditions with violated
> preconditions.  We all agree (violently, perhaps) that assert() is not
> an acceptable error handling mechanism.

I think perhaps part of the problem maybe trying to place all types of screwups
into only two categories; 'errors' and 'violations of preconditions'.
Consider some cases:
   a) The user tries to specify an out of range value to a setting;
      an error, probably not fatal (except if it was commandline)

   b) An inconsistency is found in the MMU state
      violation of precondition, fatal.

   c) A host device used for passthrough does something which according
      to the USB/PCI/SCSI specs is illegal
      violation of precondition - but you probably don't want that
      to be fatal.

   d) An inconsistency is found in a specific device emulation
      violation of precondition - but I might not want that to be fatal.

I think we agree on (a),(b), disagree on (d)  and I think this
case might be (c).

> > My view is it can very much depend on how broken you think the
> > world is; you've got to remember that crashing at this point
> > is going to lose the user a VM, and that could mean losing
> > data - so at that point you have to make a decision about whether
> > your lack of confidence in the state of the VM due to the failed
> > precondition is worse than your knowledge that the VM is going to fail.
> >
> > Perhaps giving the user an error and disabling the device lets
> > the admin gravefully shutdown the VM and walk away with all
> > their data intact.
> 
> This is risky business unless you can prove the problematic condition is
> safely isolated.  To elaborate on your device example: say some logic
> error in device emulation code put the device instance in some broken
> state.  If you detect that before the device could mess up anything
> else, fencing the device is safe.  But if device state is borked because
> some other code overran an array, continuing risks making things worse.
> Crashing the guest is bad.  Letting it first overwrite good data with
> bad data is worse.
> 
> Sadly, such proof is hardly ever possible in unrestricted C.  So we're
> down to probabilities and tradeoffs.

Agreed.

> I'd reject a claim that once the guest is running the tradeoffs *always*
> favour trying to hobble on.

Agreed; this is the difference between my case (b) and (d).
My preference is to fail the device in question if it's not a core device;
that way if it's a disk you can't write any more to it to mess it's contents
up further, and you won't read bad data from it - that's about as much
isolation as you're going to get.

However, some of it is also down to our expections of the stability of the
code in question - if the inconsistency is in some code that you know
is complex probably with untested cases and which isn't core to the VM
continuing (e.g. outgoing migration or hotplugging a host device) then
I believe it's OK to issue a scary warning, disable/error the device
in question and hobble on.

I'd say it's OK to argue that a piece of core code should be heavily
isolated from the bits you think are still a bit touchy - so it's
reasonable to me to have an assert in some core code (b) as long
as it's possible to stop any of the (c) and (d) cases triggering it
if they're coded defensively enough to error out before that assert
could be hit.  But then again someone might worry they just can't
deal with all the types of screwup (c) might present.

> If you want a less bleak isolation and recovery story, check out Erlang.
> Note that its "let it crash" philosophy is very much in accordance with
> my views on what can safely be done after detecting a programming error
> / violated precondition.
> 
> > So I wouldn't argue for weaker preconditions, just what the
> > result is if the precondition fails.
> 
> I respectfully disagree with your use of the concept "precondition".

I generally avoid using the word precondition; it's too formal for my
liking given the level we're programming at and the lack of any formal
defs.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
  2016-09-30 14:06               ` Markus Armbruster
  2016-09-30 18:06                 ` Dr. David Alan Gilbert
@ 2016-10-06  7:00                 ` Cao jin
  1 sibling, 0 replies; 21+ messages in thread
From: Cao jin @ 2016-10-06  7:00 UTC (permalink / raw)
  To: Markus Armbruster, Alex Williamson
  Cc: Marcel Apfelbaum, qemu-devel, Michael S. Tsirkin



On 09/30/2016 10:06 PM, Markus Armbruster wrote:
> Alex Williamson <alex.williamson@redhat.com> writes:
>


>
> Once there's a need to handle a certain condition as an error, we should
> do that, no argument.  This also provides a way to test the error path.
>
> However, I wouldn't buy an argument that preconditions should be made as
> weak as possible in leaf functions (let alone always) regardless of the
> cost in complexity, and non-testability of error paths.  I'm strictly a
> pay as you go person.
>
> Back to the problem at hand.  Cao jin, would you be willing to fix
> msi_init()?
>
>

Sorry for the holiday delay. Sure, will fix it, and add it as a new 
patch in this series.

-- 
Yours Sincerely,

Cao jin

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-10-06  6:59 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-23  9:27 [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin
2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error Cao jin
2016-09-12 13:29   ` Markus Armbruster
2016-09-13  2:51     ` Cao jin
2016-09-13  6:16       ` Markus Armbruster
2016-09-13 14:49         ` Alex Williamson
2016-09-29 13:11           ` Markus Armbruster
2016-09-29 16:10             ` Alex Williamson
2016-09-30 14:06               ` Markus Armbruster
2016-09-30 18:06                 ` Dr. David Alan Gilbert
2016-10-04  9:33                   ` Markus Armbruster
2016-10-04 11:19                     ` Dr. David Alan Gilbert
2016-10-06  7:00                 ` Cao jin
2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 2/5] msix: Follow CODING_STYLE Cao jin
2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 3/5] pci: Convert msix_init() to Error and fix callers to check it Cao jin
2016-09-12 13:47   ` Markus Armbruster
2016-09-13  6:04     ` Cao jin
2016-09-13  8:27       ` Markus Armbruster
2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 4/5] megasas: remove unnecessary megasas_use_msix() Cao jin
2016-08-23  9:27 ` [Qemu-devel] [PATCH v2 5/5] megasas: undo the overwrites of user configuration Cao jin
2016-09-06 12:42 ` [Qemu-devel] [PATCH v2 0/5] Convert msix_init() to error Cao jin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.