* [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features
@ 2017-02-17 19:53 Michael S. Tsirkin
  2017-02-17 19:53 ` [Qemu-devel] [PULL 01/23] pci/pcie: don't assume cap id 0 is reserved Michael S. Tsirkin
                   ` (23 more replies)
  0 siblings, 24 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

The following changes since commit ad584d37f2a86b392c25f3f00cc1f1532676c2d1:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2017-02-16 17:46:52 +0000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 7e58326ad7e79b8c5dbcc6f24e9dc1523d84c11b:

  intel_iommu: vtd_slpt_level_shift check level (2017-02-17 21:52:31 +0200)

----------------------------------------------------------------
virtio, pci: fixes, features

virtio now uses region caches for performance
IOMMU support for IOTLBs
misc fixes

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Aviv Ben-David (1):
      intel_iommu: add "caching-mode" option

Fam Zheng (1):
      virtio: Report real progress in VQ aio poll handler

Haozhong Zhang (1):
      docs: add document to explain the usage of vNVDIMM

Michael S. Tsirkin (2):
      pci/pcie: don't assume cap id 0 is reserved
      virtio: Fix no interrupt when not creating msi controller

Paolo Bonzini (9):
      memory: make memory_listener_unregister idempotent
      virtio: add virtio_*_phys_cached
      virtio: use address_space_map/unmap to access descriptors
      exec: make address_space_cache_destroy idempotent
      virtio: use MemoryRegionCache to access descriptors
      virtio: add MemoryListener to cache ring translations
      virtio: use VRingMemoryRegionCaches for descriptor ring
      virtio: check for vring setup in virtio_queue_update_used_idx
      virtio: use VRingMemoryRegionCaches for avail and used rings

Peter Xu (9):
      pcie: simplify pcie_add_capability()
      vfio: trace map/unmap for notify as well
      vfio: introduce vfio_get_vaddr()
      vfio: allow to notify unmap for very large region
      intel_iommu: simplify irq region translation
      intel_iommu: renaming gpa to iova where proper
      intel_iommu: convert dbg macros to traces for inv
      intel_iommu: convert dbg macros to trace for trans
      intel_iommu: vtd_slpt_level_shift check level

 docs/nvdimm.txt                   | 124 +++++++++++++
 hw/i386/intel_iommu_internal.h    |   1 +
 include/exec/memory.h             |   2 +
 include/hw/i386/intel_iommu.h     |   2 +
 include/hw/virtio/virtio-access.h |  52 ++++++
 include/hw/virtio/virtio-blk.h    |   2 +-
 include/hw/virtio/virtio-scsi.h   |   6 +-
 include/hw/virtio/virtio.h        |   5 +-
 exec.c                            |   1 +
 hw/block/dataplane/virtio-blk.c   |   4 +-
 hw/block/virtio-blk.c             |  12 +-
 hw/i386/intel_iommu.c             | 238 ++++++++++---------------
 hw/net/virtio-net.c               |  14 +-
 hw/pci/pcie.c                     |  23 +--
 hw/scsi/virtio-scsi-dataplane.c   |  14 +-
 hw/scsi/virtio-scsi.c             |  14 +-
 hw/vfio/common.c                  |  65 ++++---
 hw/virtio/virtio.c                | 364 ++++++++++++++++++++++++++++++--------
 memory.c                          |   5 +
 hw/i386/trace-events              |  28 +++
 hw/vfio/trace-events              |   2 +-
 21 files changed, 702 insertions(+), 276 deletions(-)
 create mode 100644 docs/nvdimm.txt

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PULL 01/23] pci/pcie: don't assume cap id 0 is reserved
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
@ 2017-02-17 19:53 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 02/23] virtio: Report real progress in VQ aio poll handler Michael S. Tsirkin
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Peter Xu, Alex Williamson, Marcel Apfelbaum

VFIO actually wants to create a capability with ID == 0.
This is done to make guest drivers skip the given capability.
pcie_add_capability then trips up on such a capability
when looking for the end of the capability list.

To support this use-case, it's easy enough to switch to
e.g. 0xffffffff for these comparisons - we can be sure
it will never match a 16-bit capability ID.
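
The sentinel trick can be sketched in miniature (hypothetical MiniCap
and find_cap names, not the QEMU code): because the needle is a 32-bit
value outside the 16-bit cap-id range, the walk always runs to the end
of the list and reports the last entry via *prev_p, even when a
capability legitimately uses ID 0, as VFIO's does.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Toy capability list: caps[] is indexed by "offset"; next == 0
 * terminates the chain.  Offset 0 is never a valid entry. */
typedef struct {
    uint16_t cap_id;
    uint16_t next;     /* offset of the next entry, 0 terminates */
} MiniCap;

/* A needle > 0xffff can never equal a 16-bit cap_id, so the loop
 * falls off the end of the list and leaves the last entry in *prev_p. */
static uint16_t find_cap(const MiniCap *caps, uint16_t first,
                         uint32_t needle, uint16_t *prev_p)
{
    uint16_t prev = 0;
    uint16_t off;

    for (off = first; off; off = caps[off].next) {
        if (caps[off].cap_id == needle) {
            break;
        }
        prev = off;
    }
    if (prev_p) {
        *prev_p = prev;
    }
    return off;
}
```

With a real capability of ID 0 at offset 1, searching for 0 still finds
it, while searching for the 0xffffffff sentinel correctly reports "not
found, last entry is the list tail".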

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/pci/pcie.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index cbd4bb4..f4dd177 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -610,7 +610,8 @@ bool pcie_cap_is_arifwd_enabled(const PCIDevice *dev)
  * uint16_t ext_cap_size
  */
 
-static uint16_t pcie_find_capability_list(PCIDevice *dev, uint16_t cap_id,
+/* Passing a cap_id value > 0xffff will return 0 and put end of list in prev */
+static uint16_t pcie_find_capability_list(PCIDevice *dev, uint32_t cap_id,
                                           uint16_t *prev_p)
 {
     uint16_t prev = 0;
@@ -679,9 +680,11 @@ void pcie_add_capability(PCIDevice *dev,
     } else {
         uint16_t prev;
 
-        /* 0 is reserved cap id. use internally to find the last capability
-           in the linked list */
-        next = pcie_find_capability_list(dev, 0, &prev);
+        /*
+         * 0xffffffff is not a valid cap id (it's a 16 bit field). use
+         * internally to find the last capability in the linked list.
+         */
+        next = pcie_find_capability_list(dev, 0xffffffff, &prev);
 
         assert(prev >= PCI_CONFIG_SPACE_SIZE);
         assert(next == 0);
-- 
MST


* [Qemu-devel] [PULL 02/23] virtio: Report real progress in VQ aio poll handler
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
  2017-02-17 19:53 ` [Qemu-devel] [PULL 01/23] pci/pcie: don't assume cap id 0 is reserved Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 03/23] docs: add document to explain the usage of vNVDIMM Michael S. Tsirkin
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Fam Zheng, Ed Swierk, Stefan Hajnoczi, Kevin Wolf,
	Max Reitz, Paolo Bonzini, qemu-block

From: Fam Zheng <famz@redhat.com>

In virtio_queue_host_notifier_aio_poll, not all "!virtio_queue_empty()"
cases make true progress.

Currently the offending one is the virtio-scsi event queue, whose handler
does nothing if no event is pending. As a result aio_poll() will spin on
the "non-empty" VQ and take 100% host CPU.

Fix this by reporting actual progress from the virtio queue aio handlers.
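
The shape of the fix can be sketched with a toy model (hypothetical
MiniQueue names, not the QEMU code): the poll callback forwards the
handler's real progress instead of answering "the queue was non-empty".

```c
#include <assert.h>
#include <stdbool.h>

/* A queue can be "non-empty" from the notifier's point of view while
 * its handler still has nothing to do -- the virtio-scsi event queue
 * case. */
typedef struct {
    bool empty;
    bool has_pending_event;
} MiniQueue;

static bool handle_event_vq(MiniQueue *q)
{
    if (q->has_pending_event) {
        q->has_pending_event = false;   /* push the event */
        return true;                    /* real progress */
    }
    return false;                       /* nothing to do */
}

static bool notifier_aio_poll(MiniQueue *q)
{
    if (q->empty) {
        return false;
    }
    /* After the fix: report what the handler actually did, not merely
     * that the queue looked non-empty.  Returning true here
     * unconditionally is what made aio_poll() spin. */
    return handle_event_vq(q);
}
```

A non-empty but idle queue now reports no progress, so the poll loop
can go idle instead of burning 100% of a host CPU.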

Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Ed Swierk <eswierk@skyportsystems.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-blk.h  |  2 +-
 include/hw/virtio/virtio-scsi.h |  6 +++---
 include/hw/virtio/virtio.h      |  4 ++--
 hw/block/dataplane/virtio-blk.c |  4 ++--
 hw/block/virtio-blk.c           | 12 ++++++++++--
 hw/scsi/virtio-scsi-dataplane.c | 14 +++++++-------
 hw/scsi/virtio-scsi.c           | 14 +++++++++++---
 hw/virtio/virtio.c              | 15 +++++++++------
 8 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 9734b4c..d3c8a6f 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -80,6 +80,6 @@ typedef struct MultiReqBuffer {
     bool is_write;
 } MultiReqBuffer;
 
-void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
+bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
 
 #endif
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 7375196..f536f77 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -126,9 +126,9 @@ void virtio_scsi_common_realize(DeviceState *dev, Error **errp,
                                 VirtIOHandleOutput cmd);
 
 void virtio_scsi_common_unrealize(DeviceState *dev, Error **errp);
-void virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq);
-void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq);
-void virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq);
+bool virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq);
+bool virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq);
+bool virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq);
 void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req);
 void virtio_scsi_free_req(VirtIOSCSIReq *req);
 void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 525da24..0863a25 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -154,6 +154,7 @@ void virtio_error(VirtIODevice *vdev, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
 void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name);
 
 typedef void (*VirtIOHandleOutput)(VirtIODevice *, VirtQueue *);
+typedef bool (*VirtIOHandleAIOOutput)(VirtIODevice *, VirtQueue *);
 
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                             VirtIOHandleOutput handle_output);
@@ -284,8 +285,7 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev);
 EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
 void virtio_queue_host_notifier_read(EventNotifier *n);
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
-                                                void (*fn)(VirtIODevice *,
-                                                           VirtQueue *));
+                                                VirtIOHandleAIOOutput handle_output);
 VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
 VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
 
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d1f9f63..5556f0e 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -147,7 +147,7 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
     g_free(s);
 }
 
-static void virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
+static bool virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
                                                 VirtQueue *vq)
 {
     VirtIOBlock *s = (VirtIOBlock *)vdev;
@@ -155,7 +155,7 @@ static void virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
     assert(s->dataplane);
     assert(s->dataplane_started);
 
-    virtio_blk_handle_vq(s, vq);
+    return virtio_blk_handle_vq(s, vq);
 }
 
 /* Context: QEMU global mutex held */
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 702eda8..baaa195 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -581,10 +581,11 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
     return 0;
 }
 
-void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
+bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 {
     VirtIOBlockReq *req;
     MultiReqBuffer mrb = {};
+    bool progress = false;
 
     blk_io_plug(s->blk);
 
@@ -592,6 +593,7 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
         virtio_queue_set_notification(vq, 0);
 
         while ((req = virtio_blk_get_request(s, vq))) {
+            progress = true;
             if (virtio_blk_handle_request(req, &mrb)) {
                 virtqueue_detach_element(req->vq, &req->elem, 0);
                 virtio_blk_free_request(req);
@@ -607,6 +609,12 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
     }
 
     blk_io_unplug(s->blk);
+    return progress;
+}
+
+static void virtio_blk_handle_output_do(VirtIOBlock *s, VirtQueue *vq)
+{
+    virtio_blk_handle_vq(s, vq);
 }
 
 static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
@@ -622,7 +630,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
             return;
         }
     }
-    virtio_blk_handle_vq(s, vq);
+    virtio_blk_handle_output_do(s, vq);
 }
 
 static void virtio_blk_dma_restart_bh(void *opaque)
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 6b8d0f0..74c95e0 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -49,35 +49,35 @@ void virtio_scsi_dataplane_setup(VirtIOSCSI *s, Error **errp)
     }
 }
 
-static void virtio_scsi_data_plane_handle_cmd(VirtIODevice *vdev,
+static bool virtio_scsi_data_plane_handle_cmd(VirtIODevice *vdev,
                                               VirtQueue *vq)
 {
     VirtIOSCSI *s = (VirtIOSCSI *)vdev;
 
     assert(s->ctx && s->dataplane_started);
-    virtio_scsi_handle_cmd_vq(s, vq);
+    return virtio_scsi_handle_cmd_vq(s, vq);
 }
 
-static void virtio_scsi_data_plane_handle_ctrl(VirtIODevice *vdev,
+static bool virtio_scsi_data_plane_handle_ctrl(VirtIODevice *vdev,
                                                VirtQueue *vq)
 {
     VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
     assert(s->ctx && s->dataplane_started);
-    virtio_scsi_handle_ctrl_vq(s, vq);
+    return virtio_scsi_handle_ctrl_vq(s, vq);
 }
 
-static void virtio_scsi_data_plane_handle_event(VirtIODevice *vdev,
+static bool virtio_scsi_data_plane_handle_event(VirtIODevice *vdev,
                                                 VirtQueue *vq)
 {
     VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
     assert(s->ctx && s->dataplane_started);
-    virtio_scsi_handle_event_vq(s, vq);
+    return virtio_scsi_handle_event_vq(s, vq);
 }
 
 static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
-                                  void (*fn)(VirtIODevice *vdev, VirtQueue *vq))
+                                  VirtIOHandleAIOOutput fn)
 {
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
     int rc;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index ce19eff..b01030b 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -436,13 +436,16 @@ static inline void virtio_scsi_release(VirtIOSCSI *s)
     }
 }
 
-void virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq)
+bool virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq)
 {
     VirtIOSCSIReq *req;
+    bool progress = false;
 
     while ((req = virtio_scsi_pop_req(s, vq))) {
+        progress = true;
         virtio_scsi_handle_ctrl_req(s, req);
     }
+    return progress;
 }
 
 static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
@@ -591,10 +594,11 @@ static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
     scsi_req_unref(sreq);
 }
 
-void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
+bool virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
 {
     VirtIOSCSIReq *req, *next;
     int ret = 0;
+    bool progress = false;
 
     QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
 
@@ -602,6 +606,7 @@ void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
         virtio_queue_set_notification(vq, 0);
 
         while ((req = virtio_scsi_pop_req(s, vq))) {
+            progress = true;
             ret = virtio_scsi_handle_cmd_req_prepare(s, req);
             if (!ret) {
                 QTAILQ_INSERT_TAIL(&reqs, req, next);
@@ -624,6 +629,7 @@ void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
     QTAILQ_FOREACH_SAFE(req, &reqs, next, next) {
         virtio_scsi_handle_cmd_req_submit(s, req);
     }
+    return progress;
 }
 
 static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)
@@ -752,11 +758,13 @@ out:
     virtio_scsi_release(s);
 }
 
-void virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq)
+bool virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq)
 {
     if (s->events_dropped) {
         virtio_scsi_push_event(s, NULL, VIRTIO_SCSI_T_NO_EVENT, 0);
+        return true;
     }
+    return false;
 }
 
 static void virtio_scsi_handle_event(VirtIODevice *vdev, VirtQueue *vq)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6365706..2461c06 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -97,7 +97,7 @@ struct VirtQueue
 
     uint16_t vector;
     VirtIOHandleOutput handle_output;
-    VirtIOHandleOutput handle_aio_output;
+    VirtIOHandleAIOOutput handle_aio_output;
     VirtIODevice *vdev;
     EventNotifier guest_notifier;
     EventNotifier host_notifier;
@@ -1287,14 +1287,16 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align)
     virtio_queue_update_rings(vdev, n);
 }
 
-static void virtio_queue_notify_aio_vq(VirtQueue *vq)
+static bool virtio_queue_notify_aio_vq(VirtQueue *vq)
 {
     if (vq->vring.desc && vq->handle_aio_output) {
         VirtIODevice *vdev = vq->vdev;
 
         trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
-        vq->handle_aio_output(vdev, vq);
+        return vq->handle_aio_output(vdev, vq);
     }
+
+    return false;
 }
 
 static void virtio_queue_notify_vq(VirtQueue *vq)
@@ -2125,16 +2127,17 @@ static bool virtio_queue_host_notifier_aio_poll(void *opaque)
 {
     EventNotifier *n = opaque;
     VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
+    bool progress;
 
     if (virtio_queue_empty(vq)) {
         return false;
     }
 
-    virtio_queue_notify_aio_vq(vq);
+    progress = virtio_queue_notify_aio_vq(vq);
 
     /* In case the handler function re-enabled notifications */
     virtio_queue_set_notification(vq, 0);
-    return true;
+    return progress;
 }
 
 static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
@@ -2146,7 +2149,7 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
 }
 
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
-                                                VirtIOHandleOutput handle_output)
+                                                VirtIOHandleAIOOutput handle_output)
 {
     if (handle_output) {
         vq->handle_aio_output = handle_output;
-- 
MST


* [Qemu-devel] [PULL 03/23] docs: add document to explain the usage of vNVDIMM
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
  2017-02-17 19:53 ` [Qemu-devel] [PULL 01/23] pci/pcie: don't assume cap id 0 is reserved Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 02/23] virtio: Report real progress in VQ aio poll handler Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 04/23] memory: make memory_listener_unregister idempotent Michael S. Tsirkin
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Haozhong Zhang, Xiao Guangrong, Stefan Hajnoczi

From: Haozhong Zhang <haozhong.zhang@intel.com>

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/nvdimm.txt | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)
 create mode 100644 docs/nvdimm.txt

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
new file mode 100644
index 0000000..2d9f8c0
--- /dev/null
+++ b/docs/nvdimm.txt
@@ -0,0 +1,124 @@
+QEMU Virtual NVDIMM
+===================
+
+This document explains the usage of the virtual NVDIMM (vNVDIMM)
+feature, which has been available since QEMU v2.6.0.
+
+QEMU currently implements only the persistent memory mode of the
+vNVDIMM device, not the block window mode.
+
+Basic Usage
+-----------
+
+The storage of a vNVDIMM device in QEMU is provided by a memory
+backend (i.e. memory-backend-file or memory-backend-ram). A simple
+way to create a vNVDIMM device at startup time is via the
+following command line options:
+
+ -machine pc,nvdimm
+ -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
+ -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE
+ -device nvdimm,id=nvdimm1,memdev=mem1
+
+Where,
+
+ - the "nvdimm" machine option enables the vNVDIMM feature.
+
+ - "slots=$N" should be equal to or larger than the total number of
+   normal RAM devices and vNVDIMM devices, e.g. $N should be >= 2 here.
+
+ - "maxmem=$MAX_SIZE" should be equal to or larger than the total size
+   of normal RAM devices and vNVDIMM devices, e.g. $MAX_SIZE should be
+   >= $RAM_SIZE + $NVDIMM_SIZE here.
+
+ - "object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE"
+   creates a backend storage of size $NVDIMM_SIZE on a file $PATH. All
+   accesses to the virtual NVDIMM device go to the file $PATH.
+
+   "share=on/off" controls the visibility of guest writes. If
+   "share=on", then guest writes will be applied to the backend
+   file. If another guest uses the same backend file with option
+   "share=on", then above writes will be visible to it as well. If
+   "share=off", then guest writes won't be applied to the backend
+   file and thus will be invisible to other guests.
+
+ - "device nvdimm,id=nvdimm1,memdev=mem1" creates a virtual NVDIMM
+   device whose storage is provided by the above memory backend device.
+
+Multiple vNVDIMM devices can be created if multiple pairs of "-object"
+and "-device" are provided.
+
+With the above command line options, if the guest OS has the proper
+NVDIMM driver, it should be able to detect an NVDIMM device in
+persistent memory mode whose size is $NVDIMM_SIZE.
+
+Note:
+
+1. Prior to QEMU v2.8.0, if memory-backend-file is used and the actual
+   backend file size is not equal to the size given by "size" option,
+   QEMU will truncate the backend file by ftruncate(2), which will
+   corrupt the existing data in the backend file, especially for the
+   shrink case.
+
+   QEMU v2.8.0 and later check the backend file size and the "size"
+   option. If they do not match, QEMU will report errors and abort in
+   order to avoid the data corruption.
+
+2. QEMU v2.6.0 only puts a basic alignment requirement on the "size"
+   option of memory-backend-file, e.g. 4KB alignment on x86.  However,
+   QEMU v2.7.0 adds an additional alignment requirement, which may
+   require a larger value than the basic one, e.g. 2MB on x86. This
+   change breaks the usage of memory-backend-file that only satisfies
+   the basic alignment.
+
+   QEMU v2.8.0 and later remove the additional alignment on non-s390x
+   architectures, so previously broken memory-backend-file usage works again.
+
+Label
+-----
+
+QEMU v2.7.0 and later implement label support for vNVDIMM devices.
+To enable labels on vNVDIMM devices, users can simply add the
+"label-size=$SZ" option to "-device nvdimm", e.g.
+
+ -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K
+
+Note:
+
+1. The minimal label size is 128KB.
+
+2. QEMU v2.7.0 and later store labels at the end of backend storage.
+   If a memory backend file, which was previously used as the backend
+   of a vNVDIMM device without labels, is now used for a vNVDIMM
+   device with labels, the data in the label area at the end of the
+   file will be inaccessible to the guest. If any useful data (e.g.
+   the meta-data of the file system) was stored there, the latter
+   usage may result in guest data corruption (e.g. breakage of the
+   guest file system).
+
+Hotplug
+-------
+
+QEMU v2.8.0 and later implement hotplug support for vNVDIMM
+devices. As with RAM hotplug, vNVDIMM hotplug is accomplished
+by the two monitor commands "object_add" and "device_add".
+
+For example, the following commands add another 4GB vNVDIMM device to
+the guest:
+
+ (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=new_nvdimm.img,size=4G
+ (qemu) device_add nvdimm,id=nvdimm2,memdev=mem2
+
+Note:
+
+1. Each hotplugged vNVDIMM device consumes one memory slot. Users
+   should always ensure the memory option "-m ...,slots=N" specifies
+   enough slots, i.e.
+     N >= number of RAM devices +
+          number of statically plugged vNVDIMM devices +
+          number of hotplugged vNVDIMM devices
+
+2. A similar requirement applies to the memory option "-m ...,maxmem=M", i.e.
+     M >= size of RAM devices +
+          size of statically plugged vNVDIMM devices +
+          size of hotplugged vNVDIMM devices
-- 
MST


* [Qemu-devel] [PULL 04/23] memory: make memory_listener_unregister idempotent
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 03/23] docs: add document to explain the usage of vNVDIMM Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 05/23] virtio: add virtio_*_phys_cached Michael S. Tsirkin
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Paolo Bonzini, Philippe Mathieu-Daudé

From: Paolo Bonzini <pbonzini@redhat.com>

Make it easy to unregister a MemoryListener without tracking whether it
had been registered before.
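
A minimal sketch of the idempotency guard (hypothetical MiniListener
type, not the real MemoryListener): unregister both tests and clears a
back-pointer, so a second call, or a call on a never-registered
listener, becomes a no-op instead of a corrupting double list-removal.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void *address_space;   /* NULL when not registered */
    int removals;          /* counts actual list removals */
} MiniListener;

static void listener_register(MiniListener *l, void *as)
{
    l->address_space = as;
}

static void listener_unregister(MiniListener *l)
{
    if (!l->address_space) {
        return;            /* already (or never) registered: no-op */
    }
    l->removals++;         /* stands in for the QTAILQ_REMOVE calls */
    l->address_space = NULL;
}
```

The caller no longer needs its own "was this registered?" bookkeeping;
unregistering any number of times removes the listener exactly once.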

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 memory.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/memory.c b/memory.c
index 6c58373..ed8b5aa 100644
--- a/memory.c
+++ b/memory.c
@@ -2371,8 +2371,13 @@ void memory_listener_register(MemoryListener *listener, AddressSpace *as)
 
 void memory_listener_unregister(MemoryListener *listener)
 {
+    if (!listener->address_space) {
+        return;
+    }
+
     QTAILQ_REMOVE(&memory_listeners, listener, link);
     QTAILQ_REMOVE(&listener->address_space->listeners, listener, link_as);
+    listener->address_space = NULL;
 }
 
 void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
-- 
MST


* [Qemu-devel] [PULL 05/23] virtio: add virtio_*_phys_cached
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 04/23] memory: make memory_listener_unregister idempotent Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 06/23] virtio: use address_space_map/unmap to access descriptors Michael S. Tsirkin
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Paolo Bonzini, Stefan Hajnoczi

From: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-access.h | 52 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
index 91ae14d..2e92074 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -156,6 +156,58 @@ static inline uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s)
 #endif
 }
 
+static inline uint16_t virtio_lduw_phys_cached(VirtIODevice *vdev,
+                                               MemoryRegionCache *cache,
+                                               hwaddr pa)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        return lduw_be_phys_cached(cache, pa);
+    }
+    return lduw_le_phys_cached(cache, pa);
+}
+
+static inline uint32_t virtio_ldl_phys_cached(VirtIODevice *vdev,
+                                              MemoryRegionCache *cache,
+                                              hwaddr pa)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        return ldl_be_phys_cached(cache, pa);
+    }
+    return ldl_le_phys_cached(cache, pa);
+}
+
+static inline uint64_t virtio_ldq_phys_cached(VirtIODevice *vdev,
+                                              MemoryRegionCache *cache,
+                                              hwaddr pa)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        return ldq_be_phys_cached(cache, pa);
+    }
+    return ldq_le_phys_cached(cache, pa);
+}
+
+static inline void virtio_stw_phys_cached(VirtIODevice *vdev,
+                                          MemoryRegionCache *cache,
+                                          hwaddr pa, uint16_t value)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        stw_be_phys_cached(cache, pa, value);
+    } else {
+        stw_le_phys_cached(cache, pa, value);
+    }
+}
+
+static inline void virtio_stl_phys_cached(VirtIODevice *vdev,
+                                          MemoryRegionCache *cache,
+                                          hwaddr pa, uint32_t value)
+{
+    if (virtio_access_is_big_endian(vdev)) {
+        stl_be_phys_cached(cache, pa, value);
+    } else {
+        stl_le_phys_cached(cache, pa, value);
+    }
+}
+
 static inline void virtio_tswap16s(VirtIODevice *vdev, uint16_t *s)
 {
     *s = virtio_tswap16(vdev, *s);
-- 
MST


* [Qemu-devel] [PULL 06/23] virtio: use address_space_map/unmap to access descriptors
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 05/23] virtio: add virtio_*_phys_cached Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 07/23] exec: make address_space_cache_destroy idempotent Michael S. Tsirkin
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Paolo Bonzini, Stefan Hajnoczi

From: Paolo Bonzini <pbonzini@redhat.com>

This makes little difference, but it makes the code change smaller
for the next patch that introduces MemoryRegionCache.  This is
because map/unmap are similar to MemoryRegionCache init/destroy.
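
In miniature (hypothetical MiniDesc and desc_read names, not the QEMU
code): once the descriptor ring has been mapped, reading a descriptor
is a plain memcpy at an offset into the mapped pointer, rather than a
separate address_space_read per descriptor.

```c
#include <assert.h>
#include <string.h>
#include <stdint.h>

/* Same layout idea as a vring descriptor: 16 bytes, no padding. */
typedef struct {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
} MiniDesc;

/* Per-element copy out of an already-mapped ring.  The expensive
 * translate-and-read happens once at map time, not once per element. */
static void desc_read(MiniDesc *out, const uint8_t *ring_ptr, int i)
{
    memcpy(out, ring_ptr + i * (int)sizeof(MiniDesc), sizeof(MiniDesc));
}
```

The map/copy/unmap lifecycle here is deliberately the same shape as
MemoryRegionCache init/read/destroy, which is what keeps the next
patch's diff small.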

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio.c | 103 ++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 75 insertions(+), 28 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 2461c06..6ce6a26 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -120,10 +120,9 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
 }
 
 static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
-                            hwaddr desc_pa, int i)
+                            uint8_t *desc_ptr, int i)
 {
-    address_space_read(vdev->dma_as, desc_pa + i * sizeof(VRingDesc),
-                       MEMTXATTRS_UNSPECIFIED, (void *)desc, sizeof(VRingDesc));
+    memcpy(desc, desc_ptr + i * sizeof(VRingDesc), sizeof(VRingDesc));
     virtio_tswap64s(vdev, &desc->addr);
     virtio_tswap32s(vdev, &desc->len);
     virtio_tswap16s(vdev, &desc->flags);
@@ -408,7 +407,7 @@ enum {
 };
 
 static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
-                                    hwaddr desc_pa, unsigned int max,
+                                    void *desc_ptr, unsigned int max,
                                     unsigned int *next)
 {
     /* If this descriptor says it doesn't chain, we're done. */
@@ -426,7 +425,7 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
         return VIRTQUEUE_READ_DESC_ERROR;
     }
 
-    vring_desc_read(vdev, desc, desc_pa, *next);
+    vring_desc_read(vdev, desc, desc_ptr, *next);
     return VIRTQUEUE_READ_DESC_MORE;
 }
 
@@ -434,31 +433,41 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                                unsigned int *out_bytes,
                                unsigned max_in_bytes, unsigned max_out_bytes)
 {
-    unsigned int idx;
+    VirtIODevice *vdev = vq->vdev;
+    unsigned int max, idx;
     unsigned int total_bufs, in_total, out_total;
+    void *vring_desc_ptr;
+    void *indirect_desc_ptr = NULL;
+    hwaddr len = 0;
     int rc;
 
     idx = vq->last_avail_idx;
-
     total_bufs = in_total = out_total = 0;
+
+    max = vq->vring.num;
+    len = max * sizeof(VRingDesc);
+    vring_desc_ptr = address_space_map(vdev->dma_as, vq->vring.desc, &len, false);
+    if (len < max * sizeof(VRingDesc)) {
+        virtio_error(vdev, "Cannot map descriptor ring");
+        goto err;
+    }
+
     while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
-        VirtIODevice *vdev = vq->vdev;
-        unsigned int max, num_bufs, indirect = 0;
+        void *desc_ptr = vring_desc_ptr;
+        unsigned int num_bufs;
         VRingDesc desc;
-        hwaddr desc_pa;
         unsigned int i;
 
-        max = vq->vring.num;
         num_bufs = total_bufs;
 
         if (!virtqueue_get_head(vq, idx++, &i)) {
             goto err;
         }
 
-        desc_pa = vq->vring.desc;
-        vring_desc_read(vdev, &desc, desc_pa, i);
+        vring_desc_read(vdev, &desc, desc_ptr, i);
 
         if (desc.flags & VRING_DESC_F_INDIRECT) {
+            len = desc.len;
             if (desc.len % sizeof(VRingDesc)) {
                 virtio_error(vdev, "Invalid size for indirect buffer table");
                 goto err;
@@ -471,11 +480,17 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
             }
 
             /* loop over the indirect descriptor table */
-            indirect = 1;
+            indirect_desc_ptr = address_space_map(vdev->dma_as, desc.addr,
+                                                  &len, false);
+            desc_ptr = indirect_desc_ptr;
+            if (len < desc.len) {
+                virtio_error(vdev, "Cannot map indirect buffer");
+                goto err;
+            }
+
             max = desc.len / sizeof(VRingDesc);
-            desc_pa = desc.addr;
             num_bufs = i = 0;
-            vring_desc_read(vdev, &desc, desc_pa, i);
+            vring_desc_read(vdev, &desc, desc_ptr, i);
         }
 
         do {
@@ -494,17 +509,20 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                 goto done;
             }
 
-            rc = virtqueue_read_next_desc(vdev, &desc, desc_pa, max, &i);
+            rc = virtqueue_read_next_desc(vdev, &desc, desc_ptr, max, &i);
         } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
         if (rc == VIRTQUEUE_READ_DESC_ERROR) {
             goto err;
         }
 
-        if (!indirect)
-            total_bufs = num_bufs;
-        else
+        if (desc_ptr == indirect_desc_ptr) {
+            address_space_unmap(vdev->dma_as, desc_ptr, len, false, 0);
+            indirect_desc_ptr = NULL;
             total_bufs++;
+        } else {
+            total_bufs = num_bufs;
+        }
     }
 
     if (rc < 0) {
@@ -512,6 +530,10 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     }
 
 done:
+    if (indirect_desc_ptr) {
+        address_space_unmap(vdev->dma_as, indirect_desc_ptr, len, false, 0);
+    }
+    address_space_unmap(vdev->dma_as, vring_desc_ptr, len, false, 0);
     if (in_bytes) {
         *in_bytes = in_total;
     }
@@ -651,9 +673,12 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
 void *virtqueue_pop(VirtQueue *vq, size_t sz)
 {
     unsigned int i, head, max;
-    hwaddr desc_pa = vq->vring.desc;
+    void *vring_desc_ptr;
+    void *indirect_desc_ptr = NULL;
+    void *desc_ptr;
+    hwaddr len;
     VirtIODevice *vdev = vq->vdev;
-    VirtQueueElement *elem;
+    VirtQueueElement *elem = NULL;
     unsigned out_num, in_num;
     hwaddr addr[VIRTQUEUE_MAX_SIZE];
     struct iovec iov[VIRTQUEUE_MAX_SIZE];
@@ -689,18 +714,34 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     }
 
     i = head;
-    vring_desc_read(vdev, &desc, desc_pa, i);
+
+    len = max * sizeof(VRingDesc);
+    vring_desc_ptr = address_space_map(vdev->dma_as, vq->vring.desc, &len, false);
+    if (len < max * sizeof(VRingDesc)) {
+        virtio_error(vdev, "Cannot map descriptor ring");
+        goto done;
+    }
+
+    desc_ptr = vring_desc_ptr;
+    vring_desc_read(vdev, &desc, desc_ptr, i);
     if (desc.flags & VRING_DESC_F_INDIRECT) {
         if (desc.len % sizeof(VRingDesc)) {
             virtio_error(vdev, "Invalid size for indirect buffer table");
-            return NULL;
+            goto done;
         }
 
         /* loop over the indirect descriptor table */
+        len = desc.len;
+        indirect_desc_ptr = address_space_map(vdev->dma_as, desc.addr, &len, false);
+        desc_ptr = indirect_desc_ptr;
+        if (len < desc.len) {
+            virtio_error(vdev, "Cannot map indirect buffer");
+            goto done;
+        }
+
         max = desc.len / sizeof(VRingDesc);
-        desc_pa = desc.addr;
         i = 0;
-        vring_desc_read(vdev, &desc, desc_pa, i);
+        vring_desc_read(vdev, &desc, desc_ptr, i);
     }
 
     /* Collect all the descriptors */
@@ -731,7 +772,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
             goto err_undo_map;
         }
 
-        rc = virtqueue_read_next_desc(vdev, &desc, desc_pa, max, &i);
+        rc = virtqueue_read_next_desc(vdev, &desc, desc_ptr, max, &i);
     } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
     if (rc == VIRTQUEUE_READ_DESC_ERROR) {
@@ -753,11 +794,17 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     vq->inuse++;
 
     trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
+done:
+    if (indirect_desc_ptr) {
+        address_space_unmap(vdev->dma_as, indirect_desc_ptr, len, false, 0);
+    }
+    address_space_unmap(vdev->dma_as, vring_desc_ptr, len, false, 0);
+
     return elem;
 
 err_undo_map:
     virtqueue_undo_map_desc(out_num, in_num, iov);
-    return NULL;
+    goto done;
 }
 
 /* virtqueue_drop_all:
-- 
MST



* [Qemu-devel] [PULL 07/23] exec: make address_space_cache_destroy idempotent
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 06/23] virtio: use address_space_map/unmap to access descriptors Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 08/23] virtio: use MemoryRegionCache to access descriptors Michael S. Tsirkin
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Paolo Bonzini, Stefan Hajnoczi,
	Philippe Mathieu-Daudé,
	Peter Crosthwaite, Richard Henderson

From: Paolo Bonzini <pbonzini@redhat.com>

Clear cache->mr so that address_space_cache_destroy does nothing
the second time it is called.
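
The pattern is easy to see in isolation (a sketch with a hypothetical pared-down cache type, not the actual exec.c code): the first destroy drops the reference and clears the pointer, so any later destroy returns before touching the region.

```c
#include <stddef.h>

/* Hypothetical, pared-down region: just a refcount. */
typedef struct {
    int refcount;
} Region;

typedef struct {
    Region *mr;
} Cache;

static void region_unref(Region *mr)
{
    if (mr) {
        mr->refcount--;
    }
}

/* Destroy is idempotent: the first call releases the reference and
 * clears cache->mr, so a second call hits the early return. */
static void cache_destroy(Cache *cache)
{
    if (!cache->mr) {
        return;
    }
    region_unref(cache->mr);
    cache->mr = NULL;
}
```

Without the `cache->mr = NULL` line, a second destroy would drop the refcount again, the exact double-unref this one-line patch prevents.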

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 exec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/exec.c b/exec.c
index 6fa337b..865a1e8 100644
--- a/exec.c
+++ b/exec.c
@@ -3166,6 +3166,7 @@ void address_space_cache_destroy(MemoryRegionCache *cache)
         xen_invalidate_map_cache_entry(cache->ptr);
     }
     memory_region_unref(cache->mr);
+    cache->mr = NULL;
 }
 
 /* Called from RCU critical section.  This function has the same
-- 
MST


* [Qemu-devel] [PULL 08/23] virtio: use MemoryRegionCache to access descriptors
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (6 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 07/23] exec: make address_space_cache_destroy idempotent Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 09/23] virtio: add MemoryListener to cache ring translations Michael S. Tsirkin
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Paolo Bonzini, Stefan Hajnoczi

From: Paolo Bonzini <pbonzini@redhat.com>

For now, the cache is created on every virtqueue_pop.  Later on,
direct descriptors will be able to reuse it.
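
The MEMORY_REGION_CACHE_INVALID macro added to memory.h is a C99 compound literal with a designated initializer; a minimal sketch of the same idiom (with a hypothetical `Cache` struct, not the real MemoryRegionCache) looks like:

```c
#include <stddef.h>

typedef struct {
    void *mr;      /* NULL marks the cache as invalid/uninitialized */
    size_t len;
} Cache;

/* Designated-initializer compound literal: the named field is set and
 * every remaining field is zero-initialized, so a cache assigned from
 * this value is safe to hand straight to a destroy function. */
#define CACHE_INVALID ((Cache) { .mr = NULL })

static int cache_is_valid(const Cache *c)
{
    return c->mr != NULL;
}
```

This is what lets the patch drop the `if (indirect_desc_ptr)` guards: destroying an untouched cache is well-defined because it starts out in the invalid state.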

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/exec/memory.h |  2 ++
 hw/virtio/virtio.c    | 80 +++++++++++++++++++++++++--------------------------
 2 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 987f925..6911023 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1426,6 +1426,8 @@ struct MemoryRegionCache {
     bool is_write;
 };
 
+#define MEMORY_REGION_CACHE_INVALID ((MemoryRegionCache) { .mr = NULL })
+
 /* address_space_cache_init: prepare for repeated access to a physical
  * memory region
  *
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6ce6a26..71e41f6 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -120,9 +120,10 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
 }
 
 static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
-                            uint8_t *desc_ptr, int i)
+                            MemoryRegionCache *cache, int i)
 {
-    memcpy(desc, desc_ptr + i * sizeof(VRingDesc), sizeof(VRingDesc));
+    address_space_read_cached(cache, i * sizeof(VRingDesc),
+                              desc, sizeof(VRingDesc));
     virtio_tswap64s(vdev, &desc->addr);
     virtio_tswap32s(vdev, &desc->len);
     virtio_tswap16s(vdev, &desc->flags);
@@ -407,7 +408,7 @@ enum {
 };
 
 static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
-                                    void *desc_ptr, unsigned int max,
+                                    MemoryRegionCache *desc_cache, unsigned int max,
                                     unsigned int *next)
 {
     /* If this descriptor says it doesn't chain, we're done. */
@@ -425,7 +426,7 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
         return VIRTQUEUE_READ_DESC_ERROR;
     }
 
-    vring_desc_read(vdev, desc, desc_ptr, *next);
+    vring_desc_read(vdev, desc, desc_cache, *next);
     return VIRTQUEUE_READ_DESC_MORE;
 }
 
@@ -436,24 +437,25 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     VirtIODevice *vdev = vq->vdev;
     unsigned int max, idx;
     unsigned int total_bufs, in_total, out_total;
-    void *vring_desc_ptr;
-    void *indirect_desc_ptr = NULL;
-    hwaddr len = 0;
+    MemoryRegionCache vring_desc_cache;
+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+    int64_t len = 0;
     int rc;
 
     idx = vq->last_avail_idx;
     total_bufs = in_total = out_total = 0;
 
     max = vq->vring.num;
-    len = max * sizeof(VRingDesc);
-    vring_desc_ptr = address_space_map(vdev->dma_as, vq->vring.desc, &len, false);
+    len = address_space_cache_init(&vring_desc_cache, vdev->dma_as,
+                                   vq->vring.desc, max * sizeof(VRingDesc),
+                                   false);
     if (len < max * sizeof(VRingDesc)) {
         virtio_error(vdev, "Cannot map descriptor ring");
         goto err;
     }
 
     while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
-        void *desc_ptr = vring_desc_ptr;
+        MemoryRegionCache *desc_cache = &vring_desc_cache;
         unsigned int num_bufs;
         VRingDesc desc;
         unsigned int i;
@@ -464,10 +466,9 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
             goto err;
         }
 
-        vring_desc_read(vdev, &desc, desc_ptr, i);
+        vring_desc_read(vdev, &desc, desc_cache, i);
 
         if (desc.flags & VRING_DESC_F_INDIRECT) {
-            len = desc.len;
             if (desc.len % sizeof(VRingDesc)) {
                 virtio_error(vdev, "Invalid size for indirect buffer table");
                 goto err;
@@ -480,9 +481,10 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
             }
 
             /* loop over the indirect descriptor table */
-            indirect_desc_ptr = address_space_map(vdev->dma_as, desc.addr,
-                                                  &len, false);
-            desc_ptr = indirect_desc_ptr;
+            len = address_space_cache_init(&indirect_desc_cache,
+                                           vdev->dma_as,
+                                           desc.addr, desc.len, false);
+            desc_cache = &indirect_desc_cache;
             if (len < desc.len) {
                 virtio_error(vdev, "Cannot map indirect buffer");
                 goto err;
@@ -490,7 +492,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
 
             max = desc.len / sizeof(VRingDesc);
             num_bufs = i = 0;
-            vring_desc_read(vdev, &desc, desc_ptr, i);
+            vring_desc_read(vdev, &desc, desc_cache, i);
         }
 
         do {
@@ -509,16 +511,15 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
                 goto done;
             }
 
-            rc = virtqueue_read_next_desc(vdev, &desc, desc_ptr, max, &i);
+            rc = virtqueue_read_next_desc(vdev, &desc, desc_cache, max, &i);
         } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
         if (rc == VIRTQUEUE_READ_DESC_ERROR) {
             goto err;
         }
 
-        if (desc_ptr == indirect_desc_ptr) {
-            address_space_unmap(vdev->dma_as, desc_ptr, len, false, 0);
-            indirect_desc_ptr = NULL;
+        if (desc_cache == &indirect_desc_cache) {
+            address_space_cache_destroy(&indirect_desc_cache);
             total_bufs++;
         } else {
             total_bufs = num_bufs;
@@ -530,10 +531,8 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     }
 
 done:
-    if (indirect_desc_ptr) {
-        address_space_unmap(vdev->dma_as, indirect_desc_ptr, len, false, 0);
-    }
-    address_space_unmap(vdev->dma_as, vring_desc_ptr, len, false, 0);
+    address_space_cache_destroy(&indirect_desc_cache);
+    address_space_cache_destroy(&vring_desc_cache);
     if (in_bytes) {
         *in_bytes = in_total;
     }
@@ -673,10 +672,10 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
 void *virtqueue_pop(VirtQueue *vq, size_t sz)
 {
     unsigned int i, head, max;
-    void *vring_desc_ptr;
-    void *indirect_desc_ptr = NULL;
-    void *desc_ptr;
-    hwaddr len;
+    MemoryRegionCache vring_desc_cache;
+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+    MemoryRegionCache *desc_cache;
+    int64_t len;
     VirtIODevice *vdev = vq->vdev;
     VirtQueueElement *elem = NULL;
     unsigned out_num, in_num;
@@ -715,15 +714,16 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
     i = head;
 
-    len = max * sizeof(VRingDesc);
-    vring_desc_ptr = address_space_map(vdev->dma_as, vq->vring.desc, &len, false);
+    len = address_space_cache_init(&vring_desc_cache, vdev->dma_as,
+                                   vq->vring.desc, max * sizeof(VRingDesc),
+                                   false);
     if (len < max * sizeof(VRingDesc)) {
         virtio_error(vdev, "Cannot map descriptor ring");
         goto done;
     }
 
-    desc_ptr = vring_desc_ptr;
-    vring_desc_read(vdev, &desc, desc_ptr, i);
+    desc_cache = &vring_desc_cache;
+    vring_desc_read(vdev, &desc, desc_cache, i);
     if (desc.flags & VRING_DESC_F_INDIRECT) {
         if (desc.len % sizeof(VRingDesc)) {
             virtio_error(vdev, "Invalid size for indirect buffer table");
@@ -731,9 +731,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         }
 
         /* loop over the indirect descriptor table */
-        len = desc.len;
-        indirect_desc_ptr = address_space_map(vdev->dma_as, desc.addr, &len, false);
-        desc_ptr = indirect_desc_ptr;
+        len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as,
+                                       desc.addr, desc.len, false);
+        desc_cache = &indirect_desc_cache;
         if (len < desc.len) {
             virtio_error(vdev, "Cannot map indirect buffer");
             goto done;
@@ -741,7 +741,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
         max = desc.len / sizeof(VRingDesc);
         i = 0;
-        vring_desc_read(vdev, &desc, desc_ptr, i);
+        vring_desc_read(vdev, &desc, desc_cache, i);
     }
 
     /* Collect all the descriptors */
@@ -772,7 +772,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
             goto err_undo_map;
         }
 
-        rc = virtqueue_read_next_desc(vdev, &desc, desc_ptr, max, &i);
+        rc = virtqueue_read_next_desc(vdev, &desc, desc_cache, max, &i);
     } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
     if (rc == VIRTQUEUE_READ_DESC_ERROR) {
@@ -795,10 +795,8 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
     trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 done:
-    if (indirect_desc_ptr) {
-        address_space_unmap(vdev->dma_as, indirect_desc_ptr, len, false, 0);
-    }
-    address_space_unmap(vdev->dma_as, vring_desc_ptr, len, false, 0);
+    address_space_cache_destroy(&indirect_desc_cache);
+    address_space_cache_destroy(&vring_desc_cache);
 
     return elem;
 
-- 
MST


* [Qemu-devel] [PULL 09/23] virtio: add MemoryListener to cache ring translations
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (7 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 08/23] virtio: use MemoryRegionCache to access descriptors Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 10/23] virtio: use VRingMemoryRegionCaches for descriptor ring Michael S. Tsirkin
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Paolo Bonzini, Stefan Hajnoczi

From: Paolo Bonzini <pbonzini@redhat.com>

The cached translations are RCU-protected to allow efficient use
when processing virtqueues.
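
The update protocol in virtio_init_region_cache() is the classic RCU publish pattern: build the new caches, publish them with a single atomic store, and defer freeing the old ones until readers have finished. A simplified single-threaded sketch (standing in atomic_exchange for atomic_rcu_set and an immediate free for call_rcu, which is only safe here because the sketch has no concurrent readers):

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    int generation;   /* stand-in for the cached ring translations */
} Caches;

static _Atomic(Caches *) current_caches;

/* Publish a new snapshot; readers that already loaded the old pointer
 * keep using it until their critical section ends. */
static Caches *caches_update(int generation)
{
    Caches *next = malloc(sizeof(*next));
    next->generation = generation;
    /* atomic_rcu_set() equivalent: swap in the new pointer */
    Caches *old = atomic_exchange(&current_caches, next);
    /* QEMU uses call_rcu(old, ...) here so the free waits for a grace
     * period; freeing immediately is a simplification for this sketch */
    free(old);
    return next;
}

static int caches_read_generation(void)
{
    /* atomic_rcu_read() equivalent */
    Caches *c = atomic_load(&current_caches);
    return c ? c->generation : -1;
}
```

The MemoryListener commit hook then only has to call the update function for each configured queue; readers on the data path never take a lock.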

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio.h |   1 +
 hw/virtio/virtio.c         | 105 +++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 0863a25..15efcf2 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -85,6 +85,7 @@ struct VirtIODevice
     uint32_t generation;
     int nvectors;
     VirtQueue *vq;
+    MemoryListener listener;
     uint16_t device_id;
     bool vm_running;
     bool broken; /* device in invalid state, needs reset */
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 71e41f6..b75cb52 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -60,6 +60,13 @@ typedef struct VRingUsed
     VRingUsedElem ring[0];
 } VRingUsed;
 
+typedef struct VRingMemoryRegionCaches {
+    struct rcu_head rcu;
+    MemoryRegionCache desc;
+    MemoryRegionCache avail;
+    MemoryRegionCache used;
+} VRingMemoryRegionCaches;
+
 typedef struct VRing
 {
     unsigned int num;
@@ -68,6 +75,7 @@ typedef struct VRing
     hwaddr desc;
     hwaddr avail;
     hwaddr used;
+    VRingMemoryRegionCaches *caches;
 } VRing;
 
 struct VirtQueue
@@ -104,6 +112,51 @@ struct VirtQueue
     QLIST_ENTRY(VirtQueue) node;
 };
 
+static void virtio_free_region_cache(VRingMemoryRegionCaches *caches)
+{
+    if (!caches) {
+        return;
+    }
+
+    address_space_cache_destroy(&caches->desc);
+    address_space_cache_destroy(&caches->avail);
+    address_space_cache_destroy(&caches->used);
+    g_free(caches);
+}
+
+static void virtio_init_region_cache(VirtIODevice *vdev, int n)
+{
+    VirtQueue *vq = &vdev->vq[n];
+    VRingMemoryRegionCaches *old = vq->vring.caches;
+    VRingMemoryRegionCaches *new;
+    hwaddr addr, size;
+    int event_size;
+
+    event_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
+
+    addr = vq->vring.desc;
+    if (!addr) {
+        return;
+    }
+    new = g_new0(VRingMemoryRegionCaches, 1);
+    size = virtio_queue_get_desc_size(vdev, n);
+    address_space_cache_init(&new->desc, vdev->dma_as,
+                             addr, size, false);
+
+    size = virtio_queue_get_used_size(vdev, n) + event_size;
+    address_space_cache_init(&new->used, vdev->dma_as,
+                             vq->vring.used, size, true);
+
+    size = virtio_queue_get_avail_size(vdev, n) + event_size;
+    address_space_cache_init(&new->avail, vdev->dma_as,
+                             vq->vring.avail, size, false);
+
+    atomic_rcu_set(&vq->vring.caches, new);
+    if (old) {
+        call_rcu(old, virtio_free_region_cache, rcu);
+    }
+}
+
 /* virt queue functions */
 void virtio_queue_update_rings(VirtIODevice *vdev, int n)
 {
@@ -117,6 +170,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
     vring->used = vring_align(vring->avail +
                               offsetof(VRingAvail, ring[vring->num]),
                               vring->align);
+    virtio_init_region_cache(vdev, n);
 }
 
 static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
@@ -1264,6 +1318,7 @@ void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
     vdev->vq[n].vring.desc = desc;
     vdev->vq[n].vring.avail = avail;
     vdev->vq[n].vring.used = used;
+    virtio_init_region_cache(vdev, n);
 }
 
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
@@ -1984,9 +2039,6 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 void virtio_cleanup(VirtIODevice *vdev)
 {
     qemu_del_vm_change_state_handler(vdev->vmstate);
-    g_free(vdev->config);
-    g_free(vdev->vq);
-    g_free(vdev->vector_queues);
 }
 
 static void virtio_vmstate_change(void *opaque, int running, RunState state)
@@ -2248,6 +2300,19 @@ void GCC_FMT_ATTR(2, 3) virtio_error(VirtIODevice *vdev, const char *fmt, ...)
     }
 }
 
+static void virtio_memory_listener_commit(MemoryListener *listener)
+{
+    VirtIODevice *vdev = container_of(listener, VirtIODevice, listener);
+    int i;
+
+    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+        if (vdev->vq[i].vring.num == 0) {
+            break;
+        }
+        virtio_init_region_cache(vdev, i);
+    }
+}
+
 static void virtio_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -2270,6 +2335,9 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
         error_propagate(errp, err);
         return;
     }
+
+    vdev->listener.commit = virtio_memory_listener_commit;
+    memory_listener_register(&vdev->listener, vdev->dma_as);
 }
 
 static void virtio_device_unrealize(DeviceState *dev, Error **errp)
@@ -2292,6 +2360,36 @@ static void virtio_device_unrealize(DeviceState *dev, Error **errp)
     vdev->bus_name = NULL;
 }
 
+static void virtio_device_free_virtqueues(VirtIODevice *vdev)
+{
+    int i;
+    if (!vdev->vq) {
+        return;
+    }
+
+    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+        VRingMemoryRegionCaches *caches;
+        if (vdev->vq[i].vring.num == 0) {
+            break;
+        }
+        caches = atomic_read(&vdev->vq[i].vring.caches);
+        atomic_set(&vdev->vq[i].vring.caches, NULL);
+        virtio_free_region_cache(caches);
+    }
+    g_free(vdev->vq);
+}
+
+static void virtio_device_instance_finalize(Object *obj)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(obj);
+
+    memory_listener_unregister(&vdev->listener);
+    virtio_device_free_virtqueues(vdev);
+
+    g_free(vdev->config);
+    g_free(vdev->vector_queues);
+}
+
 static Property virtio_properties[] = {
     DEFINE_VIRTIO_COMMON_FEATURES(VirtIODevice, host_features),
     DEFINE_PROP_END_OF_LIST(),
@@ -2418,6 +2516,7 @@ static const TypeInfo virtio_device_info = {
     .parent = TYPE_DEVICE,
     .instance_size = sizeof(VirtIODevice),
     .class_init = virtio_device_class_init,
+    .instance_finalize = virtio_device_instance_finalize,
     .abstract = true,
     .class_size = sizeof(VirtioDeviceClass),
 };
-- 
MST


* [Qemu-devel] [PULL 10/23] virtio: use VRingMemoryRegionCaches for descriptor ring
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (8 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 09/23] virtio: add MemoryListener to cache ring translations Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 11/23] virtio: check for vring setup in virtio_queue_update_used_idx Michael S. Tsirkin
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Paolo Bonzini, Stefan Hajnoczi

From: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index b75cb52..d62509d 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -491,25 +491,24 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
     VirtIODevice *vdev = vq->vdev;
     unsigned int max, idx;
     unsigned int total_bufs, in_total, out_total;
-    MemoryRegionCache vring_desc_cache;
+    VRingMemoryRegionCaches *caches;
     MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
     int64_t len = 0;
     int rc;
 
+    rcu_read_lock();
     idx = vq->last_avail_idx;
     total_bufs = in_total = out_total = 0;
 
     max = vq->vring.num;
-    len = address_space_cache_init(&vring_desc_cache, vdev->dma_as,
-                                   vq->vring.desc, max * sizeof(VRingDesc),
-                                   false);
-    if (len < max * sizeof(VRingDesc)) {
+    caches = atomic_rcu_read(&vq->vring.caches);
+    if (caches->desc.len < max * sizeof(VRingDesc)) {
         virtio_error(vdev, "Cannot map descriptor ring");
         goto err;
     }
 
     while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
-        MemoryRegionCache *desc_cache = &vring_desc_cache;
+        MemoryRegionCache *desc_cache = &caches->desc;
         unsigned int num_bufs;
         VRingDesc desc;
         unsigned int i;
@@ -586,13 +585,13 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
 
 done:
     address_space_cache_destroy(&indirect_desc_cache);
-    address_space_cache_destroy(&vring_desc_cache);
     if (in_bytes) {
         *in_bytes = in_total;
     }
     if (out_bytes) {
         *out_bytes = out_total;
     }
+    rcu_read_unlock();
     return;
 
 err:
@@ -726,7 +725,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
 void *virtqueue_pop(VirtQueue *vq, size_t sz)
 {
     unsigned int i, head, max;
-    MemoryRegionCache vring_desc_cache;
+    VRingMemoryRegionCaches *caches;
     MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
     MemoryRegionCache *desc_cache;
     int64_t len;
@@ -768,15 +767,14 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
     i = head;
 
-    len = address_space_cache_init(&vring_desc_cache, vdev->dma_as,
-                                   vq->vring.desc, max * sizeof(VRingDesc),
-                                   false);
-    if (len < max * sizeof(VRingDesc)) {
+    rcu_read_lock();
+    caches = atomic_rcu_read(&vq->vring.caches);
+    if (caches->desc.len < max * sizeof(VRingDesc)) {
         virtio_error(vdev, "Cannot map descriptor ring");
         goto done;
     }
 
-    desc_cache = &vring_desc_cache;
+    desc_cache = &caches->desc;
     vring_desc_read(vdev, &desc, desc_cache, i);
     if (desc.flags & VRING_DESC_F_INDIRECT) {
         if (desc.len % sizeof(VRingDesc)) {
@@ -850,7 +848,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 done:
     address_space_cache_destroy(&indirect_desc_cache);
-    address_space_cache_destroy(&vring_desc_cache);
+    rcu_read_unlock();
 
     return elem;
 
-- 
MST


* [Qemu-devel] [PULL 11/23] virtio: check for vring setup in virtio_queue_update_used_idx
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (9 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 10/23] virtio: use VRingMemoryRegionCaches for descriptor ring Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings Michael S. Tsirkin
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Paolo Bonzini, Philippe Mathieu-Daudé

From: Paolo Bonzini <pbonzini@redhat.com>

If the vring has not been set up, virtio_queue_update_used_idx need not
read vring_used_idx at all (virtio_load already skips the call in that
case).  Reading anyway is harmless for now, but it will become a problem
once the read goes through a MemoryRegionCache that has not been set up.
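
The fix itself is just a guard; a sketch with hypothetical pared-down types (the real vring_used_idx reads guest memory, modelled here by a plain field):

```c
#include <stdint.h>

typedef struct {
    uint64_t desc;             /* 0 until the guest configures the ring */
    uint16_t used_idx_in_ring; /* stand-in for the guest-visible index */
} Vring;

typedef struct {
    Vring vring;
    uint16_t used_idx;
} VirtQueue;

/* Mirror the patched virtio_queue_update_used_idx(): only touch the
 * ring (i.e. guest memory) if it has actually been set up. */
static void update_used_idx(VirtQueue *vq)
{
    if (vq->vring.desc) {
        vq->used_idx = vq->vring.used_idx_in_ring;
    }
}
```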

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d62509d..cdafcec 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2156,7 +2156,9 @@ void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
 
 void virtio_queue_update_used_idx(VirtIODevice *vdev, int n)
 {
-    vdev->vq[n].used_idx = vring_used_idx(&vdev->vq[n]);
+    if (vdev->vq[n].vring.desc) {
+        vdev->vq[n].used_idx = vring_used_idx(&vdev->vq[n]);
+    }
 }
 
 void virtio_queue_invalidate_signalled_used(VirtIODevice *vdev, int n)
-- 
MST

* [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Paolo Bonzini, Stefan Hajnoczi, Jason Wang

From: Paolo Bonzini <pbonzini@redhat.com>

The virtio-net change is necessary because it uses virtqueue_fill
and virtqueue_flush instead of the more convenient virtqueue_push.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/net/virtio-net.c |  14 +++++-
 hw/virtio/virtio.c  | 132 ++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 109 insertions(+), 37 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 354a19e..c321680 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1130,7 +1130,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
     return 0;
 }
 
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
+                                      size_t size)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
     VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1233,6 +1234,17 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
     return size;
 }
 
+static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf,
+                                  size_t size)
+{
+    ssize_t r;
+
+    rcu_read_lock();
+    r = virtio_net_receive_rcu(nc, buf, size);
+    rcu_read_unlock();
+    return r;
+}
+
 static int32_t virtio_net_flush_tx(VirtIONetQueue *q);
 
 static void virtio_net_tx_complete(NetClientState *nc, ssize_t len)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index cdafcec..c08e50f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -173,6 +173,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
     virtio_init_region_cache(vdev, n);
 }
 
+/* Called within rcu_read_lock().  */
 static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
                             MemoryRegionCache *cache, int i)
 {
@@ -184,88 +185,110 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
     virtio_tswap16s(vdev, &desc->next);
 }
 
+/* Called within rcu_read_lock().  */
 static inline uint16_t vring_avail_flags(VirtQueue *vq)
 {
-    hwaddr pa;
-    pa = vq->vring.avail + offsetof(VRingAvail, flags);
-    return virtio_lduw_phys(vq->vdev, pa);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingAvail, flags);
+    return virtio_lduw_phys_cached(vq->vdev, &caches->avail, pa);
 }
 
+/* Called within rcu_read_lock().  */
 static inline uint16_t vring_avail_idx(VirtQueue *vq)
 {
-    hwaddr pa;
-    pa = vq->vring.avail + offsetof(VRingAvail, idx);
-    vq->shadow_avail_idx = virtio_lduw_phys(vq->vdev, pa);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingAvail, idx);
+    vq->shadow_avail_idx = virtio_lduw_phys_cached(vq->vdev, &caches->avail, pa);
     return vq->shadow_avail_idx;
 }
 
+/* Called within rcu_read_lock().  */
 static inline uint16_t vring_avail_ring(VirtQueue *vq, int i)
 {
-    hwaddr pa;
-    pa = vq->vring.avail + offsetof(VRingAvail, ring[i]);
-    return virtio_lduw_phys(vq->vdev, pa);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingAvail, ring[i]);
+    return virtio_lduw_phys_cached(vq->vdev, &caches->avail, pa);
 }
 
+/* Called within rcu_read_lock().  */
 static inline uint16_t vring_get_used_event(VirtQueue *vq)
 {
     return vring_avail_ring(vq, vq->vring.num);
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_used_write(VirtQueue *vq, VRingUsedElem *uelem,
                                     int i)
 {
-    hwaddr pa;
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingUsed, ring[i]);
     virtio_tswap32s(vq->vdev, &uelem->id);
     virtio_tswap32s(vq->vdev, &uelem->len);
-    pa = vq->vring.used + offsetof(VRingUsed, ring[i]);
-    address_space_write(vq->vdev->dma_as, pa, MEMTXATTRS_UNSPECIFIED,
-                       (void *)uelem, sizeof(VRingUsedElem));
+    address_space_write_cached(&caches->used, pa, uelem, sizeof(VRingUsedElem));
+    address_space_cache_invalidate(&caches->used, pa, sizeof(VRingUsedElem));
 }
 
+/* Called within rcu_read_lock().  */
 static uint16_t vring_used_idx(VirtQueue *vq)
 {
-    hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, idx);
-    return virtio_lduw_phys(vq->vdev, pa);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingUsed, idx);
+    return virtio_lduw_phys_cached(vq->vdev, &caches->used, pa);
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val)
 {
-    hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, idx);
-    virtio_stw_phys(vq->vdev, pa, val);
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
+    hwaddr pa = offsetof(VRingUsed, idx);
+    virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
+    address_space_cache_invalidate(&caches->used, pa, sizeof(val));
     vq->used_idx = val;
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask)
 {
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
     VirtIODevice *vdev = vq->vdev;
-    hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, flags);
-    virtio_stw_phys(vdev, pa, virtio_lduw_phys(vdev, pa) | mask);
+    hwaddr pa = offsetof(VRingUsed, flags);
+    uint16_t flags = virtio_lduw_phys_cached(vq->vdev, &caches->used, pa);
+
+    virtio_stw_phys_cached(vdev, &caches->used, pa, flags | mask);
+    address_space_cache_invalidate(&caches->used, pa, sizeof(flags));
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask)
 {
+    VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
     VirtIODevice *vdev = vq->vdev;
-    hwaddr pa;
-    pa = vq->vring.used + offsetof(VRingUsed, flags);
-    virtio_stw_phys(vdev, pa, virtio_lduw_phys(vdev, pa) & ~mask);
+    hwaddr pa = offsetof(VRingUsed, flags);
+    uint16_t flags = virtio_lduw_phys_cached(vq->vdev, &caches->used, pa);
+
+    virtio_stw_phys_cached(vdev, &caches->used, pa, flags & ~mask);
+    address_space_cache_invalidate(&caches->used, pa, sizeof(flags));
 }
 
+/* Called within rcu_read_lock().  */
 static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
 {
+    VRingMemoryRegionCaches *caches;
     hwaddr pa;
     if (!vq->notification) {
         return;
     }
-    pa = vq->vring.used + offsetof(VRingUsed, ring[vq->vring.num]);
-    virtio_stw_phys(vq->vdev, pa, val);
+
+    caches = atomic_rcu_read(&vq->vring.caches);
+    pa = offsetof(VRingUsed, ring[vq->vring.num]);
+    virtio_stw_phys_cached(vq->vdev, &caches->used, pa, val);
 }
 
 void virtio_queue_set_notification(VirtQueue *vq, int enable)
 {
     vq->notification = enable;
+
+    rcu_read_lock();
     if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
         vring_set_avail_event(vq, vring_avail_idx(vq));
     } else if (enable) {
@@ -277,6 +300,7 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
         /* Expose avail event/used flags before caller checks the avail idx. */
         smp_mb();
     }
+    rcu_read_unlock();
 }
 
 int virtio_queue_ready(VirtQueue *vq)
@@ -285,8 +309,9 @@ int virtio_queue_ready(VirtQueue *vq)
 }
 
 /* Fetch avail_idx from VQ memory only when we really need to know if
- * guest has added some buffers. */
-int virtio_queue_empty(VirtQueue *vq)
+ * guest has added some buffers.
+ * Called within rcu_read_lock().  */
+static int virtio_queue_empty_rcu(VirtQueue *vq)
 {
     if (vq->shadow_avail_idx != vq->last_avail_idx) {
         return 0;
@@ -295,6 +320,20 @@ int virtio_queue_empty(VirtQueue *vq)
     return vring_avail_idx(vq) == vq->last_avail_idx;
 }
 
+int virtio_queue_empty(VirtQueue *vq)
+{
+    bool empty;
+
+    if (vq->shadow_avail_idx != vq->last_avail_idx) {
+        return 0;
+    }
+
+    rcu_read_lock();
+    empty = vring_avail_idx(vq) == vq->last_avail_idx;
+    rcu_read_unlock();
+    return empty;
+}
+
 static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
                                unsigned int len)
 {
@@ -373,6 +412,7 @@ bool virtqueue_rewind(VirtQueue *vq, unsigned int num)
     return true;
 }
 
+/* Called within rcu_read_lock().  */
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx)
 {
@@ -393,6 +433,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
     vring_used_write(vq, &uelem, idx);
 }
 
+/* Called within rcu_read_lock().  */
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
     uint16_t old, new;
@@ -416,10 +457,13 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len)
 {
+    rcu_read_lock();
     virtqueue_fill(vq, elem, len, 0);
     virtqueue_flush(vq, 1);
+    rcu_read_unlock();
 }
 
+/* Called within rcu_read_lock().  */
 static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
 {
     uint16_t num_heads = vring_avail_idx(vq) - idx;
@@ -439,6 +483,7 @@ static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
     return num_heads;
 }
 
+/* Called within rcu_read_lock().  */
 static bool virtqueue_get_head(VirtQueue *vq, unsigned int idx,
                                unsigned int *head)
 {
@@ -740,8 +785,9 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     if (unlikely(vdev->broken)) {
         return NULL;
     }
-    if (virtio_queue_empty(vq)) {
-        return NULL;
+    rcu_read_lock();
+    if (virtio_queue_empty_rcu(vq)) {
+        goto done;
     }
     /* Needed after virtio_queue_empty(), see comment in
      * virtqueue_num_heads(). */
@@ -754,11 +800,11 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
     if (vq->inuse >= vq->vring.num) {
         virtio_error(vdev, "Virtqueue size exceeded");
-        return NULL;
+        goto done;
     }
 
     if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
-        return NULL;
+        goto done;
     }
 
     if (virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
@@ -767,7 +813,6 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
 
     i = head;
 
-    rcu_read_lock();
     caches = atomic_rcu_read(&vq->vring.caches);
     if (caches->desc.len < max * sizeof(VRingDesc)) {
         virtio_error(vdev, "Cannot map descriptor ring");
@@ -1483,6 +1528,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
     }
 }
 
+/* Called within rcu_read_lock().  */
 static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     uint16_t old, new;
@@ -1508,7 +1554,12 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
-    if (!virtio_should_notify(vdev, vq)) {
+    bool should_notify;
+    rcu_read_lock();
+    should_notify = virtio_should_notify(vdev, vq);
+    rcu_read_unlock();
+
+    if (!should_notify) {
         return;
     }
 
@@ -1535,7 +1586,12 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
-    if (!virtio_should_notify(vdev, vq)) {
+    bool should_notify;
+    rcu_read_lock();
+    should_notify = virtio_should_notify(vdev, vq);
+    rcu_read_unlock();
+
+    if (!should_notify) {
         return;
     }
 
@@ -1996,6 +2052,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
         }
     }
 
+    rcu_read_lock();
     for (i = 0; i < num; i++) {
         if (vdev->vq[i].vring.desc) {
             uint16_t nheads;
@@ -2030,6 +2087,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
             }
         }
     }
+    rcu_read_unlock();
 
     return 0;
 }
@@ -2156,9 +2214,11 @@ void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
 
 void virtio_queue_update_used_idx(VirtIODevice *vdev, int n)
 {
+    rcu_read_lock();
     if (vdev->vq[n].vring.desc) {
         vdev->vq[n].used_idx = vring_used_idx(&vdev->vq[n]);
     }
+    rcu_read_unlock();
 }
 
 void virtio_queue_invalidate_signalled_used(VirtIODevice *vdev, int n)
-- 
MST

* [Qemu-devel] [PULL 13/23] virtio: Fix no interrupt when not creating msi controller
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Shannon Zhao

On the ARM virt machine, if we use virt-2.7, which does not create an ITS
node, virtio-net cannot receive interrupts, so it cannot get an IP address
through DHCP.
This fixes commit 83d768b ("virtio: set ISR on dataplane notifications").

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index c08e50f..23483c7 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1584,6 +1584,12 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
     event_notifier_set(&vq->guest_notifier);
 }
 
+static void virtio_irq(VirtQueue *vq)
+{
+    virtio_set_isr(vq->vdev, 0x1);
+    virtio_notify_vector(vq->vdev, vq->vector);
+}
+
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     bool should_notify;
@@ -1596,8 +1602,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
     }
 
     trace_virtio_notify(vdev, vq);
-    virtio_set_isr(vq->vdev, 0x1);
-    virtio_notify_vector(vdev, vq->vector);
+    virtio_irq(vq);
 }
 
 void virtio_notify_config(VirtIODevice *vdev)
@@ -2240,7 +2245,7 @@ static void virtio_queue_guest_notifier_read(EventNotifier *n)
 {
     VirtQueue *vq = container_of(n, VirtQueue, guest_notifier);
     if (event_notifier_test_and_clear(n)) {
-        virtio_notify_vector(vq->vdev, vq->vector);
+        virtio_irq(vq);
     }
 }
 
-- 
MST

* [Qemu-devel] [PULL 14/23] pcie: simplify pcie_add_capability()
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Peter Xu, Marcel Apfelbaum

From: Peter Xu <peterx@redhat.com>

When we add PCIe extended capabilities, we follow the rule that the head
extended capability (at offset 0x100) is added first, then the rest.
Meanwhile, new capabilities are always appended at the end of the list,
so here "next" is meaningless in all cases: it should always be zero
(along with the "header").

Simplify the function a bit; it is more readable now.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/pci/pcie.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index f4dd177..fc54bfd 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -665,32 +665,24 @@ void pcie_add_capability(PCIDevice *dev,
                          uint16_t cap_id, uint8_t cap_ver,
                          uint16_t offset, uint16_t size)
 {
-    uint32_t header;
-    uint16_t next;
-
     assert(offset >= PCI_CONFIG_SPACE_SIZE);
     assert(offset < offset + size);
     assert(offset + size <= PCIE_CONFIG_SPACE_SIZE);
     assert(size >= 8);
     assert(pci_is_express(dev));
 
-    if (offset == PCI_CONFIG_SPACE_SIZE) {
-        header = pci_get_long(dev->config + offset);
-        next = PCI_EXT_CAP_NEXT(header);
-    } else {
+    if (offset != PCI_CONFIG_SPACE_SIZE) {
         uint16_t prev;
 
         /*
          * 0xffffffff is not a valid cap id (it's a 16 bit field). use
          * internally to find the last capability in the linked list.
          */
-        next = pcie_find_capability_list(dev, 0xffffffff, &prev);
-
+        pcie_find_capability_list(dev, 0xffffffff, &prev);
         assert(prev >= PCI_CONFIG_SPACE_SIZE);
-        assert(next == 0);
         pcie_ext_cap_set_next(dev, prev, offset);
     }
-    pci_set_long(dev->config + offset, PCI_EXT_CAP(cap_id, cap_ver, next));
+    pci_set_long(dev->config + offset, PCI_EXT_CAP(cap_id, cap_ver, 0));
 
     /* Make capability read-only by default */
     memset(dev->wmask + offset, 0, size);
-- 
MST

* [Qemu-devel] [PULL 15/23] vfio: trace map/unmap for notify as well
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Peter Xu, Alex Williamson, David Gibson

From: Peter Xu <peterx@redhat.com>

We trace the range, but we don't know whether it's a MAP or an UNMAP.
Let's dump that as well.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/vfio/common.c     | 3 ++-
 hw/vfio/trace-events | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 801578b..174f351 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -305,7 +305,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     void *vaddr;
     int ret;
 
-    trace_vfio_iommu_map_notify(iova, iova + iotlb->addr_mask);
+    trace_vfio_iommu_map_notify(iotlb->perm == IOMMU_NONE ? "UNMAP" : "MAP",
+                                iova, iova + iotlb->addr_mask);
 
     if (iotlb->target_as != &address_space_memory) {
         error_report("Wrong target AS \"%s\", only system memory is allowed",
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 8de8281..2561c6d 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -84,7 +84,7 @@ vfio_pci_igd_lpc_bridge_enabled(const char *name) "%s"
 # hw/vfio/common.c
 vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
 vfio_region_read(char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
-vfio_iommu_map_notify(uint64_t iova_start, uint64_t iova_end) "iommu map @ %"PRIx64" - %"PRIx64
+vfio_iommu_map_notify(const char *op, uint64_t iova_start, uint64_t iova_end) "iommu %s @ %"PRIx64" - %"PRIx64
 vfio_listener_region_add_skip(uint64_t start, uint64_t end) "SKIPPING region_add %"PRIx64" - %"PRIx64
 vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add [iommu] %"PRIx64" - %"PRIx64
 vfio_listener_region_add_ram(uint64_t iova_start, uint64_t iova_end, void *vaddr) "region_add [ram] %"PRIx64" - %"PRIx64" [%p]"
-- 
MST

* [Qemu-devel] [PULL 16/23] vfio: introduce vfio_get_vaddr()
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Peter Xu, Alex Williamson, David Gibson

From: Peter Xu <peterx@redhat.com>

A cleanup for vfio_iommu_map_notify(). Now we fetch vaddr even if the
operation is an unmap, but it won't hurt much.

One thing to mention is that we need the RCU read lock to protect the
whole translation and map/unmap procedure.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/vfio/common.c | 65 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 20 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 174f351..42c4790 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -294,54 +294,79 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
            section->offset_within_address_space & (1ULL << 63);
 }
 
-static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+/* Called with rcu_read_lock held.  */
+static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
+                           bool *read_only)
 {
-    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
-    VFIOContainer *container = giommu->container;
-    hwaddr iova = iotlb->iova + giommu->iommu_offset;
     MemoryRegion *mr;
     hwaddr xlat;
     hwaddr len = iotlb->addr_mask + 1;
-    void *vaddr;
-    int ret;
-
-    trace_vfio_iommu_map_notify(iotlb->perm == IOMMU_NONE ? "UNMAP" : "MAP",
-                                iova, iova + iotlb->addr_mask);
-
-    if (iotlb->target_as != &address_space_memory) {
-        error_report("Wrong target AS \"%s\", only system memory is allowed",
-                     iotlb->target_as->name ? iotlb->target_as->name : "none");
-        return;
-    }
+    bool writable = iotlb->perm & IOMMU_WO;
 
     /*
      * The IOMMU TLB entry we have just covers translation through
      * this IOMMU to its immediate target.  We need to translate
      * it the rest of the way through to memory.
      */
-    rcu_read_lock();
     mr = address_space_translate(&address_space_memory,
                                  iotlb->translated_addr,
-                                 &xlat, &len, iotlb->perm & IOMMU_WO);
+                                 &xlat, &len, writable);
     if (!memory_region_is_ram(mr)) {
         error_report("iommu map to non memory area %"HWADDR_PRIx"",
                      xlat);
-        goto out;
+        return false;
     }
+
     /*
      * Translation truncates length to the IOMMU page size,
      * check that it did not truncate too much.
      */
     if (len & iotlb->addr_mask) {
         error_report("iommu has granularity incompatible with target AS");
+        return false;
+    }
+
+    *vaddr = memory_region_get_ram_ptr(mr) + xlat;
+    *read_only = !writable || mr->readonly;
+
+    return true;
+}
+
+static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+    VFIOContainer *container = giommu->container;
+    hwaddr iova = iotlb->iova + giommu->iommu_offset;
+    bool read_only;
+    void *vaddr;
+    int ret;
+
+    trace_vfio_iommu_map_notify(iotlb->perm == IOMMU_NONE ? "UNMAP" : "MAP",
+                                iova, iova + iotlb->addr_mask);
+
+    if (iotlb->target_as != &address_space_memory) {
+        error_report("Wrong target AS \"%s\", only system memory is allowed",
+                     iotlb->target_as->name ? iotlb->target_as->name : "none");
+        return;
+    }
+
+    rcu_read_lock();
+
+    if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
         goto out;
     }
 
     if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
-        vaddr = memory_region_get_ram_ptr(mr) + xlat;
+        /*
+         * vaddr is only valid until rcu_read_unlock(). But after
+         * vfio_dma_map has set up the mapping the pages will be
+         * pinned by the kernel. This makes sure that the RAM backend
+         * of vaddr will always be there, even if the memory object is
+         * destroyed and its backing memory munmap-ed.
+         */
         ret = vfio_dma_map(container, iova,
                            iotlb->addr_mask + 1, vaddr,
-                           !(iotlb->perm & IOMMU_WO) || mr->readonly);
+                           read_only);
         if (ret) {
             error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx", %p) = %d (%m)",
-- 
MST

* [Qemu-devel] [PULL 17/23] vfio: allow to notify unmap for very large region
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Peter Xu, Alex Williamson, David Gibson

From: Peter Xu <peterx@redhat.com>

The Linux vfio driver supports VFIO_IOMMU_UNMAP_DMA on a very big
region. This can be leveraged by a QEMU IOMMU implementation to clean up
existing page mappings for an entire iova address space (by notifying
with an IOTLB with an extremely huge addr_mask). However, the current
vfio_iommu_map_notify() does not allow that: it makes sure that every
translated address in the IOTLB falls into a RAM range.

The check makes sense, but only for mapping operations; it means little
for unmap operations.

This patch moves the check into the map path only, so that we get
faster unmap handling (no need to translate again) and can better
support unmapping a very big region even when it covers non-RAM or
nonexistent ranges.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/vfio/common.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 42c4790..f3ba9b9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -352,11 +352,10 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 
     rcu_read_lock();
 
-    if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
-        goto out;
-    }
-
     if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+        if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
+            goto out;
+        }
         /*
          * vaddr is only valid until rcu_read_unlock(). But after
          * vfio_dma_map has set up the mapping the pages will be
-- 
MST

* [Qemu-devel] [PULL 18/23] intel_iommu: add "caching-mode" option
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Aviv Ben-David, Jason Wang, Peter Xu,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

From: Aviv Ben-David <bd.aviv@gmail.com>

This capability asks the guest to invalidate its cache before each map
operation. We can use these invalidations to trap map operations in the
hypervisor.

Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
[peterx: using "caching-mode" instead of "cache-mode" to align with spec]
[peterx: re-write the subject to make it short and clear]
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/intel_iommu_internal.h | 1 +
 include/hw/i386/intel_iommu.h  | 2 ++
 hw/i386/intel_iommu.c          | 5 +++++
 3 files changed, 8 insertions(+)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 356f188..4104121 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -202,6 +202,7 @@
 #define VTD_CAP_MAMV                (VTD_MAMV << 48)
 #define VTD_CAP_PSI                 (1ULL << 39)
 #define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
+#define VTD_CAP_CM                  (1ULL << 7)
 
 /* Supported Adjusted Guest Address Widths */
 #define VTD_CAP_SAGAW_SHIFT         8
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 405c9d1..fe645aa 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -257,6 +257,8 @@ struct IntelIOMMUState {
     uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
     uint32_t version;
 
+    bool caching_mode;          /* RO - is cap CM enabled? */
+
     dma_addr_t root;                /* Current root table pointer */
     bool root_extended;             /* Type of root table (extended or not) */
     bool dmar_enabled;              /* Set if DMA remapping is enabled */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3270fb9..50251c3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2115,6 +2115,7 @@ static Property vtd_properties[] = {
     DEFINE_PROP_ON_OFF_AUTO("eim", IntelIOMMUState, intr_eim,
                             ON_OFF_AUTO_AUTO),
     DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
+    DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2496,6 +2497,10 @@ static void vtd_init(IntelIOMMUState *s)
         s->ecap |= VTD_ECAP_DT;
     }
 
+    if (s->caching_mode) {
+        s->cap |= VTD_CAP_CM;
+    }
+
     vtd_reset_context_cache(s);
     vtd_reset_iotlb(s);
 
-- 
MST

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PULL 19/23] intel_iommu: simplify irq region translation
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (17 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 18/23] intel_iommu: add "caching-mode" option Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:54 ` [Qemu-devel] [PULL 20/23] intel_iommu: renaming gpa to iova where proper Michael S. Tsirkin
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Xu, Jason Wang, David Gibson, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

From: Peter Xu <peterx@redhat.com>

Now that we have a standalone memory region for MSI, all irq region
requests should be redirected there. Clean up the block by replacing
it with an assertion.

Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/intel_iommu.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 50251c3..86d19bb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -818,28 +818,12 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     bool writes = true;
     VTDIOTLBEntry *iotlb_entry;
 
-    /* Check if the request is in interrupt address range */
-    if (vtd_is_interrupt_addr(addr)) {
-        if (is_write) {
-            /* FIXME: since we don't know the length of the access here, we
-             * treat Non-DWORD length write requests without PASID as
-             * interrupt requests, too. Withoud interrupt remapping support,
-             * we just use 1:1 mapping.
-             */
-            VTD_DPRINTF(MMU, "write request to interrupt address "
-                        "gpa 0x%"PRIx64, addr);
-            entry->iova = addr & VTD_PAGE_MASK_4K;
-            entry->translated_addr = addr & VTD_PAGE_MASK_4K;
-            entry->addr_mask = ~VTD_PAGE_MASK_4K;
-            entry->perm = IOMMU_WO;
-            return;
-        } else {
-            VTD_DPRINTF(GENERAL, "error: read request from interrupt address "
-                        "gpa 0x%"PRIx64, addr);
-            vtd_report_dmar_fault(s, source_id, addr, VTD_FR_READ, is_write);
-            return;
-        }
-    }
+    /*
+     * We have standalone memory region for interrupt addresses, we
+     * should never receive translation requests in this region.
+     */
+    assert(!vtd_is_interrupt_addr(addr));
+
     /* Try to fetch slpte form IOTLB */
     iotlb_entry = vtd_lookup_iotlb(s, source_id, addr);
     if (iotlb_entry) {
-- 
MST


* [Qemu-devel] [PULL 20/23] intel_iommu: renaming gpa to iova where proper
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (18 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 19/23] intel_iommu: simplify irq region translation Michael S. Tsirkin
@ 2017-02-17 19:54 ` Michael S. Tsirkin
  2017-02-17 19:55 ` [Qemu-devel] [PULL 21/23] intel_iommu: convert dbg macros to traces for inv Michael S. Tsirkin
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Xu, Jason Wang, David Gibson, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

From: Peter Xu <peterx@redhat.com>

There are lots of places in the current intel_iommu.c code where an
"iova" is named "gpa". Using the name "gpa" in these places is really
confusing, since it is easily read as "Guest Physical Address" when it
is not one. To make the code (much) easier to read, I decided to do
the rename once and for all.

No functional change is made, only literal ones.

Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/intel_iommu.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 86d19bb..0c94b79 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -259,7 +259,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
     uint64_t *key = g_malloc(sizeof(*key));
     uint64_t gfn = vtd_get_iotlb_gfn(addr, level);
 
-    VTD_DPRINTF(CACHE, "update iotlb sid 0x%"PRIx16 " gpa 0x%"PRIx64
+    VTD_DPRINTF(CACHE, "update iotlb sid 0x%"PRIx16 " iova 0x%"PRIx64
                 " slpte 0x%"PRIx64 " did 0x%"PRIx16, source_id, addr, slpte,
                 domain_id);
     if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
@@ -575,12 +575,12 @@ static uint64_t vtd_get_slpte(dma_addr_t base_addr, uint32_t index)
     return slpte;
 }
 
-/* Given a gpa and the level of paging structure, return the offset of current
- * level.
+/* Given an iova and the level of paging structure, return the offset
+ * of current level.
  */
-static inline uint32_t vtd_gpa_level_offset(uint64_t gpa, uint32_t level)
+static inline uint32_t vtd_iova_level_offset(uint64_t iova, uint32_t level)
 {
-    return (gpa >> vtd_slpt_level_shift(level)) &
+    return (iova >> vtd_slpt_level_shift(level)) &
             ((1ULL << VTD_SL_LEVEL_BITS) - 1);
 }
 
@@ -628,12 +628,12 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
     }
 }
 
-/* Given the @gpa, get relevant @slptep. @slpte_level will be the last level
+/* Given the @iova, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
-static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
-                            uint64_t *slptep, uint32_t *slpte_level,
-                            bool *reads, bool *writes)
+static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
+                             uint64_t *slptep, uint32_t *slpte_level,
+                             bool *reads, bool *writes)
 {
     dma_addr_t addr = vtd_get_slpt_base_from_context(ce);
     uint32_t level = vtd_get_level_from_context_entry(ce);
@@ -642,11 +642,11 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
     uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
     uint64_t access_right_check;
 
-    /* Check if @gpa is above 2^X-1, where X is the minimum of MGAW in CAP_REG
-     * and AW in context-entry.
+    /* Check if @iova is above 2^X-1, where X is the minimum of MGAW
+     * in CAP_REG and AW in context-entry.
      */
-    if (gpa & ~((1ULL << MIN(ce_agaw, VTD_MGAW)) - 1)) {
-        VTD_DPRINTF(GENERAL, "error: gpa 0x%"PRIx64 " exceeds limits", gpa);
+    if (iova & ~((1ULL << MIN(ce_agaw, VTD_MGAW)) - 1)) {
+        VTD_DPRINTF(GENERAL, "error: iova 0x%"PRIx64 " exceeds limits", iova);
         return -VTD_FR_ADDR_BEYOND_MGAW;
     }
 
@@ -654,13 +654,13 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
     access_right_check = is_write ? VTD_SL_W : VTD_SL_R;
 
     while (true) {
-        offset = vtd_gpa_level_offset(gpa, level);
+        offset = vtd_iova_level_offset(iova, level);
         slpte = vtd_get_slpte(addr, offset);
 
         if (slpte == (uint64_t)-1) {
             VTD_DPRINTF(GENERAL, "error: fail to access second-level paging "
-                        "entry at level %"PRIu32 " for gpa 0x%"PRIx64,
-                        level, gpa);
+                        "entry at level %"PRIu32 " for iova 0x%"PRIx64,
+                        level, iova);
             if (level == vtd_get_level_from_context_entry(ce)) {
                 /* Invalid programming of context-entry */
                 return -VTD_FR_CONTEXT_ENTRY_INV;
@@ -672,8 +672,8 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa, bool is_write,
         *writes = (*writes) && (slpte & VTD_SL_W);
         if (!(slpte & access_right_check)) {
             VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
-                        "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
-                        (is_write ? "write" : "read"), gpa, slpte);
+                        "iova 0x%"PRIx64 " slpte 0x%"PRIx64,
+                        (is_write ? "write" : "read"), iova, slpte);
             return is_write ? -VTD_FR_WRITE : -VTD_FR_READ;
         }
         if (vtd_slpte_nonzero_rsvd(slpte, level)) {
@@ -827,7 +827,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     /* Try to fetch slpte form IOTLB */
     iotlb_entry = vtd_lookup_iotlb(s, source_id, addr);
     if (iotlb_entry) {
-        VTD_DPRINTF(CACHE, "hit iotlb sid 0x%"PRIx16 " gpa 0x%"PRIx64
+        VTD_DPRINTF(CACHE, "hit iotlb sid 0x%"PRIx16 " iova 0x%"PRIx64
                     " slpte 0x%"PRIx64 " did 0x%"PRIx16, source_id, addr,
                     iotlb_entry->slpte, iotlb_entry->domain_id);
         slpte = iotlb_entry->slpte;
@@ -867,8 +867,8 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         cc_entry->context_cache_gen = s->context_cache_gen;
     }
 
-    ret_fr = vtd_gpa_to_slpte(&ce, addr, is_write, &slpte, &level,
-                              &reads, &writes);
+    ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level,
+                               &reads, &writes);
     if (ret_fr) {
         ret_fr = -ret_fr;
         if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
@@ -2033,7 +2033,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
                            is_write, &ret);
     VTD_DPRINTF(MMU,
                 "bus %"PRIu8 " slot %"PRIu8 " func %"PRIu8 " devfn %"PRIu8
-                " gpa 0x%"PRIx64 " hpa 0x%"PRIx64, pci_bus_num(vtd_as->bus),
+                " iova 0x%"PRIx64 " hpa 0x%"PRIx64, pci_bus_num(vtd_as->bus),
                 VTD_PCI_SLOT(vtd_as->devfn), VTD_PCI_FUNC(vtd_as->devfn),
                 vtd_as->devfn, addr, ret.translated_addr);
     return ret;
-- 
MST


* [Qemu-devel] [PULL 21/23] intel_iommu: convert dbg macros to traces for inv
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (19 preceding siblings ...)
  2017-02-17 19:54 ` [Qemu-devel] [PULL 20/23] intel_iommu: renaming gpa to iova where proper Michael S. Tsirkin
@ 2017-02-17 19:55 ` Michael S. Tsirkin
  2017-02-17 19:55 ` [Qemu-devel] [PULL 22/23] intel_iommu: convert dbg macros to trace for trans Michael S. Tsirkin
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Xu, Jason Wang, David Gibson, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

From: Peter Xu <peterx@redhat.com>

The VT-d code still uses the static DEBUG_INTEL_IOMMU macro. That's
not good: we should end the days when we need to recompile the code
before getting useful debugging information for VT-d. Time to switch
to the trace system. This is the first patch to do it.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/intel_iommu.c | 95 +++++++++++++++++++++------------------------------
 hw/i386/trace-events  | 18 ++++++++++
 2 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 0c94b79..08e43b6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -35,6 +35,7 @@
 #include "sysemu/kvm.h"
 #include "hw/i386/apic_internal.h"
 #include "kvm_i386.h"
+#include "trace.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -474,22 +475,19 @@ static void vtd_handle_inv_queue_error(IntelIOMMUState *s)
 /* Set the IWC field and try to generate an invalidation completion interrupt */
 static void vtd_generate_completion_event(IntelIOMMUState *s)
 {
-    VTD_DPRINTF(INV, "completes an invalidation wait command with "
-                "Interrupt Flag");
     if (vtd_get_long_raw(s, DMAR_ICS_REG) & VTD_ICS_IWC) {
-        VTD_DPRINTF(INV, "there is a previous interrupt condition to be "
-                    "serviced by software, "
-                    "new invalidation event is not generated");
+        trace_vtd_inv_desc_wait_irq("One pending, skip current");
         return;
     }
     vtd_set_clear_mask_long(s, DMAR_ICS_REG, 0, VTD_ICS_IWC);
     vtd_set_clear_mask_long(s, DMAR_IECTL_REG, 0, VTD_IECTL_IP);
     if (vtd_get_long_raw(s, DMAR_IECTL_REG) & VTD_IECTL_IM) {
-        VTD_DPRINTF(INV, "IM filed in IECTL_REG is set, new invalidation "
-                    "event is not generated");
+        trace_vtd_inv_desc_wait_irq("IM in IECTL_REG is set, "
+                                    "new event not generated");
         return;
     } else {
         /* Generate the interrupt event */
+        trace_vtd_inv_desc_wait_irq("Generating complete event");
         vtd_generate_interrupt(s, DMAR_IEADDR_REG, DMAR_IEDATA_REG);
         vtd_set_clear_mask_long(s, DMAR_IECTL_REG, VTD_IECTL_IP, 0);
     }
@@ -923,6 +921,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 
 static void vtd_context_global_invalidate(IntelIOMMUState *s)
 {
+    trace_vtd_inv_desc_cc_global();
     s->context_cache_gen++;
     if (s->context_cache_gen == VTD_CONTEXT_CACHE_GEN_MAX) {
         vtd_reset_context_cache(s);
@@ -962,9 +961,11 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
     uint16_t mask;
     VTDBus *vtd_bus;
     VTDAddressSpace *vtd_as;
-    uint16_t devfn;
+    uint8_t bus_n, devfn;
     uint16_t devfn_it;
 
+    trace_vtd_inv_desc_cc_devices(source_id, func_mask);
+
     switch (func_mask & 3) {
     case 0:
         mask = 0;   /* No bits in the SID field masked */
@@ -980,16 +981,16 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
         break;
     }
     mask = ~mask;
-    VTD_DPRINTF(INV, "device-selective invalidation source 0x%"PRIx16
-                    " mask %"PRIu16, source_id, mask);
-    vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
+
+    bus_n = VTD_SID_TO_BUS(source_id);
+    vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
     if (vtd_bus) {
         devfn = VTD_SID_TO_DEVFN(source_id);
         for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
             vtd_as = vtd_bus->dev_as[devfn_it];
             if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
-                VTD_DPRINTF(INV, "invalidate context-cahce of devfn 0x%"PRIx16,
-                            devfn_it);
+                trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(devfn_it),
+                                             VTD_PCI_FUNC(devfn_it));
                 vtd_as->context_cache_entry.context_cache_gen = 0;
             }
         }
@@ -1302,9 +1303,7 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
 {
     if ((inv_desc->hi & VTD_INV_DESC_WAIT_RSVD_HI) ||
         (inv_desc->lo & VTD_INV_DESC_WAIT_RSVD_LO)) {
-        VTD_DPRINTF(GENERAL, "error: non-zero reserved field in Invalidation "
-                    "Wait Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
-                    inv_desc->hi, inv_desc->lo);
+        trace_vtd_inv_desc_wait_invalid(inv_desc->hi, inv_desc->lo);
         return false;
     }
     if (inv_desc->lo & VTD_INV_DESC_WAIT_SW) {
@@ -1316,21 +1315,18 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
 
         /* FIXME: need to be masked with HAW? */
         dma_addr_t status_addr = inv_desc->hi;
-        VTD_DPRINTF(INV, "status data 0x%x, status addr 0x%"PRIx64,
-                    status_data, status_addr);
+        trace_vtd_inv_desc_wait_sw(status_addr, status_data);
         status_data = cpu_to_le32(status_data);
         if (dma_memory_write(&address_space_memory, status_addr, &status_data,
                              sizeof(status_data))) {
-            VTD_DPRINTF(GENERAL, "error: fail to perform a coherent write");
+            trace_vtd_inv_desc_wait_write_fail(inv_desc->hi, inv_desc->lo);
             return false;
         }
     } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
         /* Interrupt flag */
-        VTD_DPRINTF(INV, "Invalidation Wait Descriptor interrupt completion");
         vtd_generate_completion_event(s);
     } else {
-        VTD_DPRINTF(GENERAL, "error: invalid Invalidation Wait Descriptor: "
-                    "hi 0x%"PRIx64 " lo 0x%"PRIx64, inv_desc->hi, inv_desc->lo);
+        trace_vtd_inv_desc_wait_invalid(inv_desc->hi, inv_desc->lo);
         return false;
     }
     return true;
@@ -1339,30 +1335,29 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
 static bool vtd_process_context_cache_desc(IntelIOMMUState *s,
                                            VTDInvDesc *inv_desc)
 {
+    uint16_t sid, fmask;
+
     if ((inv_desc->lo & VTD_INV_DESC_CC_RSVD) || inv_desc->hi) {
-        VTD_DPRINTF(GENERAL, "error: non-zero reserved field in Context-cache "
-                    "Invalidate Descriptor");
+        trace_vtd_inv_desc_cc_invalid(inv_desc->hi, inv_desc->lo);
         return false;
     }
     switch (inv_desc->lo & VTD_INV_DESC_CC_G) {
     case VTD_INV_DESC_CC_DOMAIN:
-        VTD_DPRINTF(INV, "domain-selective invalidation domain 0x%"PRIx16,
-                    (uint16_t)VTD_INV_DESC_CC_DID(inv_desc->lo));
+        trace_vtd_inv_desc_cc_domain(
+            (uint16_t)VTD_INV_DESC_CC_DID(inv_desc->lo));
         /* Fall through */
     case VTD_INV_DESC_CC_GLOBAL:
-        VTD_DPRINTF(INV, "global invalidation");
         vtd_context_global_invalidate(s);
         break;
 
     case VTD_INV_DESC_CC_DEVICE:
-        vtd_context_device_invalidate(s, VTD_INV_DESC_CC_SID(inv_desc->lo),
-                                      VTD_INV_DESC_CC_FM(inv_desc->lo));
+        sid = VTD_INV_DESC_CC_SID(inv_desc->lo);
+        fmask = VTD_INV_DESC_CC_FM(inv_desc->lo);
+        vtd_context_device_invalidate(s, sid, fmask);
         break;
 
     default:
-        VTD_DPRINTF(GENERAL, "error: invalid granularity in Context-cache "
-                    "Invalidate Descriptor hi 0x%"PRIx64  " lo 0x%"PRIx64,
-                    inv_desc->hi, inv_desc->lo);
+        trace_vtd_inv_desc_cc_invalid(inv_desc->hi, inv_desc->lo);
         return false;
     }
     return true;
@@ -1376,22 +1371,19 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
 
     if ((inv_desc->lo & VTD_INV_DESC_IOTLB_RSVD_LO) ||
         (inv_desc->hi & VTD_INV_DESC_IOTLB_RSVD_HI)) {
-        VTD_DPRINTF(GENERAL, "error: non-zero reserved field in IOTLB "
-                    "Invalidate Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
-                    inv_desc->hi, inv_desc->lo);
+        trace_vtd_inv_desc_iotlb_invalid(inv_desc->hi, inv_desc->lo);
         return false;
     }
 
     switch (inv_desc->lo & VTD_INV_DESC_IOTLB_G) {
     case VTD_INV_DESC_IOTLB_GLOBAL:
-        VTD_DPRINTF(INV, "global invalidation");
+        trace_vtd_inv_desc_iotlb_global();
         vtd_iotlb_global_invalidate(s);
         break;
 
     case VTD_INV_DESC_IOTLB_DOMAIN:
         domain_id = VTD_INV_DESC_IOTLB_DID(inv_desc->lo);
-        VTD_DPRINTF(INV, "domain-selective invalidation domain 0x%"PRIx16,
-                    domain_id);
+        trace_vtd_inv_desc_iotlb_domain(domain_id);
         vtd_iotlb_domain_invalidate(s, domain_id);
         break;
 
@@ -1399,20 +1391,16 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
         domain_id = VTD_INV_DESC_IOTLB_DID(inv_desc->lo);
         addr = VTD_INV_DESC_IOTLB_ADDR(inv_desc->hi);
         am = VTD_INV_DESC_IOTLB_AM(inv_desc->hi);
-        VTD_DPRINTF(INV, "page-selective invalidation domain 0x%"PRIx16
-                    " addr 0x%"PRIx64 " mask %"PRIu8, domain_id, addr, am);
+        trace_vtd_inv_desc_iotlb_pages(domain_id, addr, am);
         if (am > VTD_MAMV) {
-            VTD_DPRINTF(GENERAL, "error: supported max address mask value is "
-                        "%"PRIu8, (uint8_t)VTD_MAMV);
+            trace_vtd_inv_desc_iotlb_invalid(inv_desc->hi, inv_desc->lo);
             return false;
         }
         vtd_iotlb_page_invalidate(s, domain_id, addr, am);
         break;
 
     default:
-        VTD_DPRINTF(GENERAL, "error: invalid granularity in IOTLB Invalidate "
-                    "Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
-                    inv_desc->hi, inv_desc->lo);
+        trace_vtd_inv_desc_iotlb_invalid(inv_desc->hi, inv_desc->lo);
         return false;
     }
     return true;
@@ -1511,33 +1499,28 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 
     switch (desc_type) {
     case VTD_INV_DESC_CC:
-        VTD_DPRINTF(INV, "Context-cache Invalidate Descriptor hi 0x%"PRIx64
-                    " lo 0x%"PRIx64, inv_desc.hi, inv_desc.lo);
+        trace_vtd_inv_desc("context-cache", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_context_cache_desc(s, &inv_desc)) {
             return false;
         }
         break;
 
     case VTD_INV_DESC_IOTLB:
-        VTD_DPRINTF(INV, "IOTLB Invalidate Descriptor hi 0x%"PRIx64
-                    " lo 0x%"PRIx64, inv_desc.hi, inv_desc.lo);
+        trace_vtd_inv_desc("iotlb", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_iotlb_desc(s, &inv_desc)) {
             return false;
         }
         break;
 
     case VTD_INV_DESC_WAIT:
-        VTD_DPRINTF(INV, "Invalidation Wait Descriptor hi 0x%"PRIx64
-                    " lo 0x%"PRIx64, inv_desc.hi, inv_desc.lo);
+        trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_wait_desc(s, &inv_desc)) {
             return false;
         }
         break;
 
     case VTD_INV_DESC_IEC:
-        VTD_DPRINTF(INV, "Invalidation Interrupt Entry Cache "
-                    "Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
-                    inv_desc.hi, inv_desc.lo);
+        trace_vtd_inv_desc("iec", inv_desc.hi, inv_desc.lo);
         if (!vtd_process_inv_iec_desc(s, &inv_desc)) {
             return false;
         }
@@ -1552,9 +1535,7 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         break;
 
     default:
-        VTD_DPRINTF(GENERAL, "error: unkonw Invalidation Descriptor type "
-                    "hi 0x%"PRIx64 " lo 0x%"PRIx64 " type %"PRIu8,
-                    inv_desc.hi, inv_desc.lo, desc_type);
+        trace_vtd_inv_desc_invalid(inv_desc.hi, inv_desc.lo);
         return false;
     }
     s->iq_head++;
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 1cc4a10..02aeaab 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -3,6 +3,24 @@
 # hw/i386/x86-iommu.c
 x86_iommu_iec_notify(bool global, uint32_t index, uint32_t mask) "Notify IEC invalidation: global=%d index=%" PRIu32 " mask=%" PRIu32
 
+# hw/i386/intel_iommu.c
+vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
+vtd_inv_desc(const char *type, uint64_t hi, uint64_t lo) "invalidate desc type %s high 0x%"PRIx64" low 0x%"PRIx64
+vtd_inv_desc_invalid(uint64_t hi, uint64_t lo) "invalid inv desc hi 0x%"PRIx64" lo 0x%"PRIx64
+vtd_inv_desc_cc_domain(uint16_t domain) "context invalidate domain 0x%"PRIx16
+vtd_inv_desc_cc_global(void) "context invalidate globally"
+vtd_inv_desc_cc_device(uint8_t bus, uint8_t dev, uint8_t fn) "context invalidate device %02"PRIx8":%02"PRIx8".%02"PRIx8
+vtd_inv_desc_cc_devices(uint16_t sid, uint16_t fmask) "context invalidate devices sid 0x%"PRIx16" fmask 0x%"PRIx16
+vtd_inv_desc_cc_invalid(uint64_t hi, uint64_t lo) "invalid context-cache desc hi 0x%"PRIx64" lo 0x%"PRIx64
+vtd_inv_desc_iotlb_global(void) "iotlb invalidate global"
+vtd_inv_desc_iotlb_domain(uint16_t domain) "iotlb invalidate whole domain 0x%"PRIx16
+vtd_inv_desc_iotlb_pages(uint16_t domain, uint64_t addr, uint8_t mask) "iotlb invalidate domain 0x%"PRIx16" addr 0x%"PRIx64" mask 0x%"PRIx8
+vtd_inv_desc_iotlb_invalid(uint64_t hi, uint64_t lo) "invalid iotlb desc hi 0x%"PRIx64" lo 0x%"PRIx64
+vtd_inv_desc_wait_sw(uint64_t addr, uint32_t data) "wait invalidate status write addr 0x%"PRIx64" data 0x%"PRIx32
+vtd_inv_desc_wait_irq(const char *msg) "%s"
+vtd_inv_desc_wait_invalid(uint64_t hi, uint64_t lo) "invalid wait desc hi 0x%"PRIx64" lo 0x%"PRIx64
+vtd_inv_desc_wait_write_fail(uint64_t hi, uint64_t lo) "write fail for wait desc hi 0x%"PRIx64" lo 0x%"PRIx64
+
 # hw/i386/amd_iommu.c
 amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 0x%"PRIx64" +  offset 0x%"PRIx32
 amdvi_cache_update(uint16_t domid, uint8_t bus, uint8_t slot, uint8_t func, uint64_t gpa, uint64_t txaddr) " update iotlb domid 0x%"PRIx16" devid: %02x:%02x.%x gpa 0x%"PRIx64" hpa 0x%"PRIx64
-- 
MST


* [Qemu-devel] [PULL 22/23] intel_iommu: convert dbg macros to trace for trans
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (20 preceding siblings ...)
  2017-02-17 19:55 ` [Qemu-devel] [PULL 21/23] intel_iommu: convert dbg macros to traces for inv Michael S. Tsirkin
@ 2017-02-17 19:55 ` Michael S. Tsirkin
  2017-02-17 19:55 ` [Qemu-devel] [PULL 23/23] intel_iommu: vtd_slpt_level_shift check level Michael S. Tsirkin
  2017-02-20 11:55 ` [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Peter Maydell
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Xu, Jason Wang, David Gibson, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

From: Peter Xu <peterx@redhat.com>

Another patch to convert the DPRINTF() calls, this one focusing on
the address translation path and caching.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/intel_iommu.c | 69 ++++++++++++++++++---------------------------------
 hw/i386/trace-events  | 10 ++++++++
 2 files changed, 34 insertions(+), 45 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 08e43b6..ad304f6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -260,11 +260,9 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
     uint64_t *key = g_malloc(sizeof(*key));
     uint64_t gfn = vtd_get_iotlb_gfn(addr, level);
 
-    VTD_DPRINTF(CACHE, "update iotlb sid 0x%"PRIx16 " iova 0x%"PRIx64
-                " slpte 0x%"PRIx64 " did 0x%"PRIx16, source_id, addr, slpte,
-                domain_id);
+    trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
     if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
-        VTD_DPRINTF(CACHE, "iotlb exceeds size limit, forced to reset");
+        trace_vtd_iotlb_reset("iotlb exceeds size limit");
         vtd_reset_iotlb(s);
     }
 
@@ -505,8 +503,7 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
 
     addr = s->root + index * sizeof(*re);
     if (dma_memory_read(&address_space_memory, addr, re, sizeof(*re))) {
-        VTD_DPRINTF(GENERAL, "error: fail to access root-entry at 0x%"PRIx64
-                    " + %"PRIu8, s->root, index);
+        trace_vtd_re_invalid(re->rsvd, re->val);
         re->val = 0;
         return -VTD_FR_ROOT_TABLE_INV;
     }
@@ -524,15 +521,10 @@ static int vtd_get_context_entry_from_root(VTDRootEntry *root, uint8_t index,
 {
     dma_addr_t addr;
 
-    if (!vtd_root_entry_present(root)) {
-        VTD_DPRINTF(GENERAL, "error: root-entry is not present");
-        return -VTD_FR_ROOT_ENTRY_P;
-    }
+    /* we have checked that root entry is present */
     addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
     if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
-        VTD_DPRINTF(GENERAL, "error: fail to access context-entry at 0x%"PRIx64
-                    " + %"PRIu8,
-                    (uint64_t)(root->val & VTD_ROOT_ENTRY_CTP), index);
+        trace_vtd_re_invalid(root->rsvd, root->val);
         return -VTD_FR_CONTEXT_TABLE_INV;
     }
     ce->lo = le64_to_cpu(ce->lo);
@@ -704,12 +696,11 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
     }
 
     if (!vtd_root_entry_present(&re)) {
-        VTD_DPRINTF(GENERAL, "error: root-entry #%"PRIu8 " is not present",
-                    bus_num);
+        /* Not error - it's okay we don't have root entry. */
+        trace_vtd_re_not_present(bus_num);
         return -VTD_FR_ROOT_ENTRY_P;
     } else if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) {
-        VTD_DPRINTF(GENERAL, "error: non-zero reserved field in root-entry "
-                    "hi 0x%"PRIx64 " lo 0x%"PRIx64, re.rsvd, re.val);
+        trace_vtd_re_invalid(re.rsvd, re.val);
         return -VTD_FR_ROOT_ENTRY_RSVD;
     }
 
@@ -719,22 +710,17 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
     }
 
     if (!vtd_context_entry_present(ce)) {
-        VTD_DPRINTF(GENERAL,
-                    "error: context-entry #%"PRIu8 "(bus #%"PRIu8 ") "
-                    "is not present", devfn, bus_num);
+        /* Not error - it's okay we don't have context entry. */
+        trace_vtd_ce_not_present(bus_num, devfn);
         return -VTD_FR_CONTEXT_ENTRY_P;
     } else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
                (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
-        VTD_DPRINTF(GENERAL,
-                    "error: non-zero reserved field in context-entry "
-                    "hi 0x%"PRIx64 " lo 0x%"PRIx64, ce->hi, ce->lo);
+        trace_vtd_ce_invalid(ce->hi, ce->lo);
         return -VTD_FR_CONTEXT_ENTRY_RSVD;
     }
     /* Check if the programming of context-entry is valid */
     if (!vtd_is_level_supported(s, vtd_get_level_from_context_entry(ce))) {
-        VTD_DPRINTF(GENERAL, "error: unsupported Address Width value in "
-                    "context-entry hi 0x%"PRIx64 " lo 0x%"PRIx64,
-                    ce->hi, ce->lo);
+        trace_vtd_ce_invalid(ce->hi, ce->lo);
         return -VTD_FR_CONTEXT_ENTRY_INV;
     } else {
         switch (ce->lo & VTD_CONTEXT_ENTRY_TT) {
@@ -743,9 +729,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
         case VTD_CONTEXT_TT_DEV_IOTLB:
             break;
         default:
-            VTD_DPRINTF(GENERAL, "error: unsupported Translation Type in "
-                        "context-entry hi 0x%"PRIx64 " lo 0x%"PRIx64,
-                        ce->hi, ce->lo);
+            trace_vtd_ce_invalid(ce->hi, ce->lo);
             return -VTD_FR_CONTEXT_ENTRY_INV;
         }
     }
@@ -825,9 +809,8 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     /* Try to fetch slpte form IOTLB */
     iotlb_entry = vtd_lookup_iotlb(s, source_id, addr);
     if (iotlb_entry) {
-        VTD_DPRINTF(CACHE, "hit iotlb sid 0x%"PRIx16 " iova 0x%"PRIx64
-                    " slpte 0x%"PRIx64 " did 0x%"PRIx16, source_id, addr,
-                    iotlb_entry->slpte, iotlb_entry->domain_id);
+        trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
+                                 iotlb_entry->domain_id);
         slpte = iotlb_entry->slpte;
         reads = iotlb_entry->read_flags;
         writes = iotlb_entry->write_flags;
@@ -836,10 +819,9 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     }
     /* Try to fetch context-entry from cache first */
     if (cc_entry->context_cache_gen == s->context_cache_gen) {
-        VTD_DPRINTF(CACHE, "hit context-cache bus %d devfn %d "
-                    "(hi %"PRIx64 " lo %"PRIx64 " gen %"PRIu32 ")",
-                    bus_num, devfn, cc_entry->context_entry.hi,
-                    cc_entry->context_entry.lo, cc_entry->context_cache_gen);
+        trace_vtd_iotlb_cc_hit(bus_num, devfn, cc_entry->context_entry.hi,
+                               cc_entry->context_entry.lo,
+                               cc_entry->context_cache_gen);
         ce = cc_entry->context_entry;
         is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
     } else {
@@ -848,19 +830,16 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
         if (ret_fr) {
             ret_fr = -ret_fr;
             if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
-                VTD_DPRINTF(FLOG, "fault processing is disabled for DMA "
-                            "requests through this context-entry "
-                            "(with FPD Set)");
+                trace_vtd_fault_disabled();
             } else {
                 vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);
             }
             return;
         }
         /* Update context-cache */
-        VTD_DPRINTF(CACHE, "update context-cache bus %d devfn %d "
-                    "(hi %"PRIx64 " lo %"PRIx64 " gen %"PRIu32 "->%"PRIu32 ")",
-                    bus_num, devfn, ce.hi, ce.lo,
-                    cc_entry->context_cache_gen, s->context_cache_gen);
+        trace_vtd_iotlb_cc_update(bus_num, devfn, ce.hi, ce.lo,
+                                  cc_entry->context_cache_gen,
+                                  s->context_cache_gen);
         cc_entry->context_entry = ce;
         cc_entry->context_cache_gen = s->context_cache_gen;
     }
@@ -870,8 +849,7 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
     if (ret_fr) {
         ret_fr = -ret_fr;
         if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
-            VTD_DPRINTF(FLOG, "fault processing is disabled for DMA requests "
-                        "through this context-entry (with FPD Set)");
+            trace_vtd_fault_disabled();
         } else {
             vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);
         }
@@ -1031,6 +1009,7 @@ static uint64_t vtd_context_cache_invalidate(IntelIOMMUState *s, uint64_t val)
 
 static void vtd_iotlb_global_invalidate(IntelIOMMUState *s)
 {
+    trace_vtd_iotlb_reset("global invalidation recved");
     vtd_reset_iotlb(s);
 }
 
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 02aeaab..88ad5e4 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -20,6 +20,16 @@ vtd_inv_desc_wait_sw(uint64_t addr, uint32_t data) "wait invalidate status write
 vtd_inv_desc_wait_irq(const char *msg) "%s"
 vtd_inv_desc_wait_invalid(uint64_t hi, uint64_t lo) "invalid wait desc hi 0x%"PRIx64" lo 0x%"PRIx64
 vtd_inv_desc_wait_write_fail(uint64_t hi, uint64_t lo) "write fail for wait desc hi 0x%"PRIx64" lo 0x%"PRIx64
+vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"
+vtd_re_invalid(uint64_t hi, uint64_t lo) "invalid root entry hi 0x%"PRIx64" lo 0x%"PRIx64
+vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present"
+vtd_ce_invalid(uint64_t hi, uint64_t lo) "invalid context entry hi 0x%"PRIx64" lo 0x%"PRIx64
+vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16
+vtd_iotlb_page_update(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page update sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16
+vtd_iotlb_cc_hit(uint8_t bus, uint8_t devfn, uint64_t high, uint64_t low, uint32_t gen) "IOTLB context hit bus 0x%"PRIx8" devfn 0x%"PRIx8" high 0x%"PRIx64" low 0x%"PRIx64" gen %"PRIu32
+vtd_iotlb_cc_update(uint8_t bus, uint8_t devfn, uint64_t high, uint64_t low, uint32_t gen1, uint32_t gen2) "IOTLB context update bus 0x%"PRIx8" devfn 0x%"PRIx8" high 0x%"PRIx64" low 0x%"PRIx64" gen %"PRIu32" -> gen %"PRIu32
+vtd_iotlb_reset(const char *reason) "IOTLB reset (reason: %s)"
+vtd_fault_disabled(void) "Fault processing disabled for context entry"
 
 # hw/i386/amd_iommu.c
 amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 0x%"PRIx64" +  offset 0x%"PRIx32
-- 
MST

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PULL 23/23] intel_iommu: vtd_slpt_level_shift check level
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (21 preceding siblings ...)
  2017-02-17 19:55 ` [Qemu-devel] [PULL 22/23] intel_iommu: convert dbg macros to trace for trans Michael S. Tsirkin
@ 2017-02-17 19:55 ` Michael S. Tsirkin
  2017-02-20 11:55 ` [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Peter Maydell
  23 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2017-02-17 19:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Peter Xu, Jason Wang, David Gibson, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost

From: Peter Xu <peterx@redhat.com>

This helps with debugging when an incorrect level is passed in.

Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/intel_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ad304f6..22d8226 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -168,6 +168,7 @@ static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value,
 /* The shift of an addr for a certain level of paging structure */
 static inline uint32_t vtd_slpt_level_shift(uint32_t level)
 {
+    assert(level != 0);
     return VTD_PAGE_SHIFT_4K + (level - 1) * VTD_SL_LEVEL_BITS;
 }
 
-- 
MST

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features
  2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
                   ` (22 preceding siblings ...)
  2017-02-17 19:55 ` [Qemu-devel] [PULL 23/23] intel_iommu: vtd_slpt_level_shift check level Michael S. Tsirkin
@ 2017-02-20 11:55 ` Peter Maydell
  23 siblings, 0 replies; 36+ messages in thread
From: Peter Maydell @ 2017-02-20 11:55 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: QEMU Developers

On 17 February 2017 at 19:53, Michael S. Tsirkin <mst@redhat.com> wrote:
> The following changes since commit ad584d37f2a86b392c25f3f00cc1f1532676c2d1:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2017-02-16 17:46:52 +0000)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to 7e58326ad7e79b8c5dbcc6f24e9dc1523d84c11b:
>
>   intel_iommu: vtd_slpt_level_shift check level (2017-02-17 21:52:31 +0200)
>
> ----------------------------------------------------------------
> virtio, pci: fixes, features
>
> virtio is using region caches for performance
> iommu support for IOTLBs
> misc fixes
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>

Applied, thanks.

-- PMM

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-17 19:54 ` [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings Michael S. Tsirkin
@ 2017-02-21 12:57   ` Gerd Hoffmann
  2017-02-21 16:25     ` Laszlo Ersek
  2017-02-21 18:08     ` Paolo Bonzini
  0 siblings, 2 replies; 36+ messages in thread
From: Gerd Hoffmann @ 2017-02-21 12:57 UTC (permalink / raw)
  To: Michael S. Tsirkin, Laszlo Ersek
  Cc: qemu-devel, Peter Maydell, Jason Wang, Stefan Hajnoczi, Paolo Bonzini

On Fr, 2017-02-17 at 21:54 +0200, Michael S. Tsirkin wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> The virtio-net change is necessary because it uses virtqueue_fill
> and virtqueue_flush instead of the more convenient virtqueue_push.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

This change breaks ovmf for me, although it isn't obvious to me why.
Bisect landed here, and reverting indeed makes things work again.

Using q35 machine type, pcie virtio devices, with the rhel ovmf build
(OVMF-20160608b-1.git988715a.el7.noarch).

First thing I've tried is swapping virtio-net for another nic,
suspecting this change might trigger a bug in the ovmf virtio-net
driver, but that didn't change things.

The effect is that qemu just exits without logging any error; it looks like a
normal guest shutdown.  The firmware log doesn't give a clue either: it just
stops at some point, again without any error message.  Here are the last
lines of the log:

SataControllerStart START
SataControllerStart error return status = Already started
SetPciIntLine: [00:1C.0] PciRoot(0x0)/Pci(0x1C,0x0) -> 0x0A
SetPciIntLine: [01:00.0] PciRoot(0x0)/Pci(0x1C,0x0)/Pci(0x0,0x0) -> 0x0A
SetPciIntLine: [00:1C.1] PciRoot(0x0)/Pci(0x1C,0x1) -> 0x0A
SetPciIntLine: [02:00.0] PciRoot(0x0)/Pci(0x1C,0x1)/Pci(0x0,0x0) -> 0x0A
SetPciIntLine: [00:1C.2] PciRoot(0x0)/Pci(0x1C,0x2) -> 0x0A
SetPciIntLine: [00:1C.3] PciRoot(0x0)/Pci(0x1C,0x3) -> 0x0A
SetPciIntLine: [00:1C.4] PciRoot(0x0)/Pci(0x1C,0x4) -> 0x0A
SetPciIntLine: [05:00.0] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x0) -> 0x0A
SetPciIntLine: [05:00.1] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x1) -> 0x0A
SetPciIntLine: [05:00.2] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x2) -> 0x0A
SetPciIntLine: [00:1C.5] PciRoot(0x0)/Pci(0x1C,0x5) -> 0x0A
SetPciIntLine: [06:00.0] PciRoot(0x0)/Pci(0x1C,0x5)/Pci(0x0,0x0) -> 0x0A
SetPciIntLine: [00:1C.6] PciRoot(0x0)/Pci(0x1C,0x6) -> 0x0A
SetPciIntLine: [00:1C.7] PciRoot(0x0)/Pci(0x1C,0x7) -> 0x0A
SetPciIntLine: [00:1F.2] PciRoot(0x0)/Pci(0x1F,0x2) -> 0x0A
SetPciIntLine: [00:1F.3] PciRoot(0x0)/Pci(0x1F,0x3) -> 0x0A
Select Item: 0x8
Select Item: 0x17
qemu -kernel was not used.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-21 12:57   ` Gerd Hoffmann
@ 2017-02-21 16:25     ` Laszlo Ersek
  2017-02-21 17:54       ` Laszlo Ersek
  2017-02-21 18:08     ` Paolo Bonzini
  1 sibling, 1 reply; 36+ messages in thread
From: Laszlo Ersek @ 2017-02-21 16:25 UTC (permalink / raw)
  To: Gerd Hoffmann, Michael S. Tsirkin
  Cc: Peter Maydell, Jason Wang, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 02/21/17 13:57, Gerd Hoffmann wrote:
> On Fr, 2017-02-17 at 21:54 +0200, Michael S. Tsirkin wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> The virtio-net change is necessary because it uses virtqueue_fill
>> and virtqueue_flush instead of the more convenient virtqueue_push.
>>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> This change breaks ovmf for me, although it isn't obvious to me why.
> Bisect landed here, and reverting indeed makes things work again.

I looked at the patch (on the list) and I don't have the slightest idea
what's going on. I read the word "cache" in it, so I guess it introduces
(or exposes) some cache coherency issue.

> Using q35 machine type, pcie virtio devices, with the rhel ovmf build
> (OVMF-20160608b-1.git988715a.el7.noarch).
> 
> First thing I've tried is swapping virtio-net for another nic,
> suspecting this change might trigger a bug in the ovmf virtio-net
> driver, but that didn't change things.
> 
> The effect is that qemu just exits without logging any error; it looks like a
> normal guest shutdown.

That's very strange (especially given the OVMF log below).

> The firmware log doesn't give a clue either: it just
> stops at some point, again without any error message.  Here are the last
> lines of the log:
> 
> SataControllerStart START
> SataControllerStart error return status = Already started
> SetPciIntLine: [00:1C.0] PciRoot(0x0)/Pci(0x1C,0x0) -> 0x0A
> SetPciIntLine: [01:00.0] PciRoot(0x0)/Pci(0x1C,0x0)/Pci(0x0,0x0) -> 0x0A
> SetPciIntLine: [00:1C.1] PciRoot(0x0)/Pci(0x1C,0x1) -> 0x0A
> SetPciIntLine: [02:00.0] PciRoot(0x0)/Pci(0x1C,0x1)/Pci(0x0,0x0) -> 0x0A
> SetPciIntLine: [00:1C.2] PciRoot(0x0)/Pci(0x1C,0x2) -> 0x0A
> SetPciIntLine: [00:1C.3] PciRoot(0x0)/Pci(0x1C,0x3) -> 0x0A
> SetPciIntLine: [00:1C.4] PciRoot(0x0)/Pci(0x1C,0x4) -> 0x0A
> SetPciIntLine: [05:00.0] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x0) -> 0x0A
> SetPciIntLine: [05:00.1] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x1) -> 0x0A
> SetPciIntLine: [05:00.2] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x2) -> 0x0A
> SetPciIntLine: [00:1C.5] PciRoot(0x0)/Pci(0x1C,0x5) -> 0x0A
> SetPciIntLine: [06:00.0] PciRoot(0x0)/Pci(0x1C,0x5)/Pci(0x0,0x0) -> 0x0A
> SetPciIntLine: [00:1C.6] PciRoot(0x0)/Pci(0x1C,0x6) -> 0x0A
> SetPciIntLine: [00:1C.7] PciRoot(0x0)/Pci(0x1C,0x7) -> 0x0A
> SetPciIntLine: [00:1F.2] PciRoot(0x0)/Pci(0x1F,0x2) -> 0x0A
> SetPciIntLine: [00:1F.3] PciRoot(0x0)/Pci(0x1F,0x3) -> 0x0A
> Select Item: 0x8
> Select Item: 0x17
> qemu -kernel was not used.

The next action would be the EfiBootManagerRefreshAllBootOption()
function call in PlatformBootManagerAfterConsole(), in file
"OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c".

That function (from "MdeModulePkg/Library/UefiBootManagerLib/BmBoot.c")
"enumerates all boot options, creates them and registers them in the
BootOrder variable". While doing that, it definitely looks (indirectly)
at any UEFI-bootable virtio-scsi or virtio-blk device.

The direct symptom you are seeing ("qemu just exits / shuts down") is
inexplicable. If there were a virtio-de-sync between guest and host, I'd
expect OVMF to hang, and/or emit error messages.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-21 16:25     ` Laszlo Ersek
@ 2017-02-21 17:54       ` Laszlo Ersek
  2017-02-22  9:03         ` Paolo Bonzini
  0 siblings, 1 reply; 36+ messages in thread
From: Laszlo Ersek @ 2017-02-21 17:54 UTC (permalink / raw)
  To: Gerd Hoffmann, Michael S. Tsirkin
  Cc: Peter Maydell, Jason Wang, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 02/21/17 17:25, Laszlo Ersek wrote:
> On 02/21/17 13:57, Gerd Hoffmann wrote:
>> On Fr, 2017-02-17 at 21:54 +0200, Michael S. Tsirkin wrote:
>>> From: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>> The virtio-net change is necessary because it uses virtqueue_fill
>>> and virtqueue_flush instead of the more convenient virtqueue_push.
>>>
>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> This change breaks ovmf for me, although it isn't obvious to me why.
>> Bisect landed here, and reverting indeed makes things work again.
> 
> I looked at the patch (on the list) and I don't have the slightest idea
> what's going on. I read the word "cache" in it, so I guess it introduces
> (or exposes) some cache coherency issue.
> 
>> Using q35 machine type, pcie virtio devices, with the rhel ovmf build
>> (OVMF-20160608b-1.git988715a.el7.noarch).
>>
>> First thing I've tried is swapping virtio-net for another nic,
>> suspecting this change might trigger a bug in the ovmf virtio-net
>> driver, but that didn't change things.
>>
>> The effect is that qemu just exits without logging any error; it looks like a
>> normal guest shutdown.
> 
> That's very strange (especially given the OVMF log below).
> 
>> The firmware log doesn't give a clue either: it just
>> stops at some point, again without any error message.  Here are the last
>> lines of the log:
>>
>> SataControllerStart START
>> SataControllerStart error return status = Already started
>> SetPciIntLine: [00:1C.0] PciRoot(0x0)/Pci(0x1C,0x0) -> 0x0A
>> SetPciIntLine: [01:00.0] PciRoot(0x0)/Pci(0x1C,0x0)/Pci(0x0,0x0) -> 0x0A
>> SetPciIntLine: [00:1C.1] PciRoot(0x0)/Pci(0x1C,0x1) -> 0x0A
>> SetPciIntLine: [02:00.0] PciRoot(0x0)/Pci(0x1C,0x1)/Pci(0x0,0x0) -> 0x0A
>> SetPciIntLine: [00:1C.2] PciRoot(0x0)/Pci(0x1C,0x2) -> 0x0A
>> SetPciIntLine: [00:1C.3] PciRoot(0x0)/Pci(0x1C,0x3) -> 0x0A
>> SetPciIntLine: [00:1C.4] PciRoot(0x0)/Pci(0x1C,0x4) -> 0x0A
>> SetPciIntLine: [05:00.0] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x0) -> 0x0A
>> SetPciIntLine: [05:00.1] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x1) -> 0x0A
>> SetPciIntLine: [05:00.2] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x2) -> 0x0A
>> SetPciIntLine: [00:1C.5] PciRoot(0x0)/Pci(0x1C,0x5) -> 0x0A
>> SetPciIntLine: [06:00.0] PciRoot(0x0)/Pci(0x1C,0x5)/Pci(0x0,0x0) -> 0x0A
>> SetPciIntLine: [00:1C.6] PciRoot(0x0)/Pci(0x1C,0x6) -> 0x0A
>> SetPciIntLine: [00:1C.7] PciRoot(0x0)/Pci(0x1C,0x7) -> 0x0A
>> SetPciIntLine: [00:1F.2] PciRoot(0x0)/Pci(0x1F,0x2) -> 0x0A
>> SetPciIntLine: [00:1F.3] PciRoot(0x0)/Pci(0x1F,0x3) -> 0x0A
>> Select Item: 0x8
>> Select Item: 0x17
>> qemu -kernel was not used.
> 
> The next action would be the EfiBootManagerRefreshAllBootOption()
> function call in PlatformBootManagerAfterConsole(), in file
> "OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c".
> 
> That function (from "MdeModulePkg/Library/UefiBootManagerLib/BmBoot.c")
> "enumerates all boot options, creates them and registers them in the
> BootOrder variable". While doing that, it definitely looks (indirectly)
> at any UEFI-bootable virtio-scsi or virtio-blk device.
> 
> The direct symptom you are seeing ("qemu just exits / shuts down") is
> inexplicable. If there were a virtio-de-sync between guest and host, I'd
> expect OVMF to hang, and/or emit error messages.

Actually, QEMU segfaults. From the dmesg:

[Tue Feb 21 18:47:28 2017] CPU 0/KVM[8298]: segfault at 48 ip 00007fcb5dd02105 sp 00007fcb49efc270 error 4 in qemu-system-x86_64[7fcb5dae3000+905000]

Complete backtrace below. (Thread 11 seems to be the one segfaulting.)

Thanks
Laszlo

> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f651dcbb700 (LWP 8553)]
> 0x00007f6531ac0105 in address_space_translate_cached (cache=0x38, addr=2, xlat=0x7f651dcba2d0, plen=0x7f651dcba2d8,
>     is_write=false) at .../exec.c:3181
> 3181        assert(addr < cache->len && *plen <= cache->len - addr);
> (gdb) thread apply all bt full
>
> Thread 13 (Thread 0x7f651f5d8700 (LWP 8549)):
> #0  0x00007f6528937bdd in nanosleep () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x00007f6526f316f8 in g_usleep () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #2  0x00007f6531f8791e in call_rcu_thread (opaque=0x0) at .../util/rcu.c:244
>         tries = 1
>         n = 1
>         node = 0x7f65180053b0
> #3  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 12 (Thread 0x7f651ebc4700 (LWP 8551)):
> #0  0x00007f65289369b1 in do_futex_wait () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x00007f6528936a77 in __new_sem_wait_slow () from /lib64/libpthread.so.0
> No symbol table info available.
> #2  0x00007f6528936b15 in sem_timedwait () from /lib64/libpthread.so.0
> No symbol table info available.
> #3  0x00007f6531f7249a in qemu_sem_timedwait (sem=0x7f6532c1f488, ms=10000) at .../util/qemu-thread-posix.c:255
>         rc = 0
>         ts = {tv_sec = 1487699392, tv_nsec = 569220000}
>         __func__ = "qemu_sem_timedwait"
> #4  0x00007f6531f6c7a4 in worker_thread (opaque=0x7f6532c1f420) at .../util/thread-pool.c:92
>         req = 0x7f65184f1a20
>         ret = 0
>         pool = 0x7f6532c1f420
> #5  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #6  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 11 (Thread 0x7f651dcbb700 (LWP 8553)):
> #0  0x00007f6531ac0105 in address_space_translate_cached (cache=0x38, addr=2, xlat=0x7f651dcba2d0, plen=0x7f651dcba2d8, is_write=false) at .../exec.c:3181
>         __PRETTY_FUNCTION__ = "address_space_translate_cached"
> #1  0x00007f6531ac07aa in address_space_lduw_internal_cached (cache=0x38, addr=2, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at .../memory_ldst.inc.c:264
>         ptr = 0x7f6476c73802 "H\004"
>         val = 1096
>         mr = 0x7f6532d91260
>         l = 2
>         addr1 = 3202824194
>         r = 0
>         release_lock = false
> #2  0x00007f6531ac0917 in address_space_lduw_le_cached (cache=0x38, addr=2, attrs=..., result=0x0) at .../memory_ldst.inc.c:315
> No locals.
> #3  0x00007f6531ac09c3 in lduw_le_phys_cached (cache=0x38, addr=2) at .../memory_ldst.inc.c:334
> No locals.
> #4  0x00007f6531b737b1 in virtio_lduw_phys_cached (vdev=0x7f65343fa4e0, cache=0x38, pa=2) at .../include/hw/virtio/virtio-access.h:166
> No locals.
> #5  0x00007f6531b73d40 in vring_avail_idx (vq=0x7f651c09c090) at .../hw/virtio/virtio.c:201
>         caches = 0x0
>         pa = 2
> #6  0x00007f6531b7421f in virtio_queue_empty (vq=0x7f651c09c090) at .../hw/virtio/virtio.c:332
>         empty = true
> #7  0x00007f6531b78b82 in virtio_queue_host_notifier_aio_poll (opaque=0x7f651c09c0f8) at .../hw/virtio/virtio.c:2294
>         n = 0x7f651c09c0f8
>         vq = 0x7f651c09c090
>         progress = false
> #8  0x00007f6531f6fe2c in run_poll_handlers_once (ctx=0x7f6532bcd940) at .../util/aio-posix.c:490
>         progress = false
>         node = 0x7f6518478650
> #9  0x00007f6531f7002d in try_poll_mode (ctx=0x7f6532bcd940, blocking=true) at .../util/aio-posix.c:566
> No locals.
> #10 0x00007f6531f700c1 in aio_poll (ctx=0x7f6532bcd940, blocking=true) at .../util/aio-posix.c:595
>         node = 0x7f6531ed48fc <aio_context_in_iothread+17>
>         i = 32613
>         ret = 0
>         progress = false
>         timeout = 140072268314064
>         start = 0
>         __PRETTY_FUNCTION__ = "aio_poll"
> #11 0x00007f6531ed6157 in blk_prw (blk=0x7f6532bf7ee0, offset=16896, buf=0x7f650d404200 " ", bytes=512, co_entry=0x7f6531ed5fe3 <blk_write_entry>, flags=0) at .../block/block-backend.c:905
>         waited_ = false
>         bs_ = 0x7f6532c07980
>         ctx_ = 0x7f6532bcd940
>         co = 0x7f6518184010
>         qiov = {iov = 0x7f651dcba600, niov = 1, nalloc = -1, size = 512}
>         iov = {iov_base = 0x7f650d404200, iov_len = 512}
>         rwco = {blk = 0x7f6532bf7ee0, offset = 16896, qiov = 0x7f651dcba610, ret = 2147483647, flags = 0}
>         __PRETTY_FUNCTION__ = "blk_prw"
> #12 0x00007f6531ed67c3 in blk_pwrite (blk=0x7f6532bf7ee0, offset=16896, buf=0x7f650d404200, count=512, flags=0) at .../block/block-backend.c:1064
>         ret = 0
> #13 0x00007f6531cad498 in pflash_update (pfl=0x7f6532e5fbb0, offset=16896, size=1) at .../hw/block/pflash_cfi01.c:420
>         offset_end = 17408
> #14 0x00007f6531cad8e8 in pflash_write (pfl=0x7f6532e5fbb0, offset=17378, value=62, width=1, be=0) at .../hw/block/pflash_cfi01.c:545
>         p = 0x7f651dcba760 "°§Ë\035e\177"
>         cmd = 62 '>'
>         __func__ = "pflash_write"
> #15 0x00007f6531caddbd in pflash_mem_write_with_attrs (opaque=0x7f6532e5fbb0, addr=17378, value=62, len=1, attrs=...) at .../hw/block/pflash_cfi01.c:691
>         pfl = 0x7f6532e5fbb0
>         be = false
> #16 0x00007f6531b10524 in memory_region_write_with_attrs_accessor (mr=0x7f6532e5ff50, addr=17378, value=0x7f651dcba838, size=1, shift=0, mask=255, attrs=...) at .../memory.c:552
>         tmp = 62
> #17 0x00007f6531b10643 in access_with_adjusted_size (addr=17378, value=0x7f651dcba838, size=1, access_size_min=1, access_size_max=4, access=0x7f6531b1043f <memory_region_write_with_attrs_accessor>, mr=0x7f6532e5ff50, attrs=...) at .../memory.c:592
>         access_mask = 255
>         access_size = 1
>         i = 0
>         r = 0
> #18 0x00007f6531b12ca3 in memory_region_dispatch_write (mr=0x7f6532e5ff50, addr=17378, data=62, size=1, attrs=...) at .../memory.c:1329
> No locals.
> #19 0x00007f6531abdbff in address_space_write_continue (as=0x7f65325ddd80 <address_space_memory>, addr=4292887522, attrs=..., buf=0x7f6531894028 ">\020", len=1, addr1=17378, l=1, mr=0x7f6532e5ff50) at .../exec.c:2647
>         ptr = 0x0
>         val = 62
>         result = 0
>         release_lock = true
> #20 0x00007f6531abdd4e in address_space_write (as=0x7f65325ddd80 <address_space_memory>, addr=4292887522, attrs=..., buf=0x7f6531894028 ">\020", len=1) at .../exec.c:2692
>         l = 1
>         addr1 = 17378
>         mr = 0x7f6532e5ff50
>         result = 0
> #21 0x00007f6531abe078 in address_space_rw (as=0x7f65325ddd80 <address_space_memory>, addr=4292887522, attrs=..., buf=0x7f6531894028 ">\020", len=1, is_write=true) at .../exec.c:2794
> No locals.
> #22 0x00007f6531b0d039 in kvm_cpu_exec (cpu=0x7f6532c4f9b0) at .../kvm-all.c:1968
>         attrs = {unspecified = 0, secure = 0, user = 0, requester_id = 0}
>         run = 0x7f6531894000
>         ret = 0
>         run_ret = 0
> #23 0x00007f6531af4f0a in qemu_kvm_cpu_thread_fn (arg=0x7f6532c4f9b0) at .../cpus.c:1000
>         cpu = 0x7f6532c4f9b0
>         r = 0
> #24 0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #25 0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 10 (Thread 0x7f651d4ba700 (LWP 8554)):
> #0  0x00007f6525ba4507 in ioctl () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f6531b0d526 in kvm_vcpu_ioctl (cpu=0x7f6532cb2810, type=44672) at .../kvm-all.c:2080
>         ret = 32613
>         arg = 0x0
>         ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7f651d4b99b0, reg_save_area = 0x7f651d4b98f0}}
> #2  0x00007f6531b0cede in kvm_cpu_exec (cpu=0x7f6532cb2810) at .../kvm-all.c:1929
>         attrs = {unspecified = 0, secure = 0, user = 0, requester_id = 29500}
>         run = 0x7f6531891000
>         ret = 32613
>         run_ret = 32613
> #3  0x00007f6531af4f0a in qemu_kvm_cpu_thread_fn (arg=0x7f6532cb2810) at .../cpus.c:1000
>         cpu = 0x7f6532cb2810
>         r = 65536
> #4  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 9 (Thread 0x7f651ccb9700 (LWP 8555)):
> #0  0x00007f6525ba4507 in ioctl () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f6531b0d526 in kvm_vcpu_ioctl (cpu=0x7f6532cd2310, type=44672) at .../kvm-all.c:2080
>         ret = 32613
>         arg = 0x0
>         ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7f651ccb89b0, reg_save_area = 0x7f651ccb88f0}}
> #2  0x00007f6531b0cede in kvm_cpu_exec (cpu=0x7f6532cd2310) at .../kvm-all.c:1929
>         attrs = {unspecified = 0, secure = 0, user = 0, requester_id = 28988}
>         run = 0x7f653188e000
>         ret = 32613
>         run_ret = 32613
> #3  0x00007f6531af4f0a in qemu_kvm_cpu_thread_fn (arg=0x7f6532cd2310) at .../cpus.c:1000
>         cpu = 0x7f6532cd2310
>         r = 65536
> #4  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 8 (Thread 0x7f650ffff700 (LWP 8556)):
> #0  0x00007f6525ba4507 in ioctl () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f6531b0d526 in kvm_vcpu_ioctl (cpu=0x7f6532cf1e40, type=44672) at .../kvm-all.c:2080
>         ret = 32613
>         arg = 0x0
>         ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7f650fffe9b0, reg_save_area = 0x7f650fffe8f0}}
> #2  0x00007f6531b0cede in kvm_cpu_exec (cpu=0x7f6532cf1e40) at .../kvm-all.c:1929
>         attrs = {unspecified = 0, secure = 0, user = 0, requester_id = 64828}
>         run = 0x7f653188b000
>         ret = 32613
>         run_ret = 32613
> #3  0x00007f6531af4f0a in qemu_kvm_cpu_thread_fn (arg=0x7f6532cf1e40) at .../cpus.c:1000
>         cpu = 0x7f6532cf1e40
>         r = 65536
> #4  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 7 (Thread 0x7f650f7fe700 (LWP 8557)):
> #0  0x00007f6525ba4507 in ioctl () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f6531b0d526 in kvm_vcpu_ioctl (cpu=0x7f6532d11750, type=44672) at .../kvm-all.c:2080
>         ret = 32613
>         arg = 0x0
>         ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7f650f7fd9b0, reg_save_area = 0x7f650f7fd8f0}}
> #2  0x00007f6531b0cede in kvm_cpu_exec (cpu=0x7f6532d11750) at .../kvm-all.c:1929
>         attrs = {unspecified = 0, secure = 0, user = 0, requester_id = 64316}
>         run = 0x7f6531888000
>         ret = 32613
>         run_ret = 32613
> #3  0x00007f6531af4f0a in qemu_kvm_cpu_thread_fn (arg=0x7f6532d11750) at .../cpus.c:1000
>         cpu = 0x7f6532d11750
>         r = 65536
> #4  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 6 (Thread 0x7f650effd700 (LWP 8558)):
> #0  0x00007f6525ba4507 in ioctl () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f6531b0d526 in kvm_vcpu_ioctl (cpu=0x7f6532d31060, type=44672) at .../kvm-all.c:2080
>         ret = 32613
>         arg = 0x0
>         ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7f650effc9b0, reg_save_area = 0x7f650effc8f0}}
> #2  0x00007f6531b0cede in kvm_cpu_exec (cpu=0x7f6532d31060) at .../kvm-all.c:1929
>         attrs = {unspecified = 0, secure = 0, user = 0, requester_id = 63804}
>         run = 0x7f6531885000
>         ret = 32613
>         run_ret = 32613
> #3  0x00007f6531af4f0a in qemu_kvm_cpu_thread_fn (arg=0x7f6532d31060) at .../cpus.c:1000
>         cpu = 0x7f6532d31060
>         r = 65536
> #4  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 5 (Thread 0x7f650e7fc700 (LWP 8559)):
> #0  0x00007f6525ba4507 in ioctl () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f6531b0d526 in kvm_vcpu_ioctl (cpu=0x7f6532d51190, type=44672) at .../kvm-all.c:2080
>         ret = 32613
>         arg = 0x0
>         ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7f650e7fb9b0, reg_save_area = 0x7f650e7fb8f0}}
> #2  0x00007f6531b0cede in kvm_cpu_exec (cpu=0x7f6532d51190) at .../kvm-all.c:1929
>         attrs = {unspecified = 0, secure = 0, user = 0, requester_id = 63292}
>         run = 0x7f6531882000
>         ret = 32613
>         run_ret = 32613
> #3  0x00007f6531af4f0a in qemu_kvm_cpu_thread_fn (arg=0x7f6532d51190) at .../cpus.c:1000
>         cpu = 0x7f6532d51190
>         r = 65536
> #4  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 4 (Thread 0x7f650dffb700 (LWP 8560)):
> #0  0x00007f6525ba4507 in ioctl () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f6531b0d526 in kvm_vcpu_ioctl (cpu=0x7f6532d70a90, type=44672) at .../kvm-all.c:2080
>         ret = 32613
>         arg = 0x0
>         ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7f650dffa9b0, reg_save_area = 0x7f650dffa8f0}}
> #2  0x00007f6531b0cede in kvm_cpu_exec (cpu=0x7f6532d70a90) at .../kvm-all.c:1929
>         attrs = {unspecified = 0, secure = 0, user = 0, requester_id = 62780}
>         run = 0x7f653187f000
>         ret = 32613
>         run_ret = 32613
> #3  0x00007f6531af4f0a in qemu_kvm_cpu_thread_fn (arg=0x7f6532d70a90) at .../cpus.c:1000
>         cpu = 0x7f6532d70a90
>         r = 65536
> #4  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 3 (Thread 0x7f63af22c700 (LWP 8562)):
> #0  0x00007f6525ba2dfd in poll () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x00007f652744a327 in red_worker_main () from /lib64/libspice-server.so.1
> No symbol table info available.
> #2  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #3  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 2 (Thread 0x7f63ae9ff700 (LWP 8563)):
> #0  0x00007f65289346d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x00007f6531f72293 in qemu_cond_wait (cond=0x7f653443c710, mutex=0x7f653443c740) at .../util/qemu-thread-posix.c:133
>         err = 32613
>         __func__ = "qemu_cond_wait"
> #2  0x00007f6531e7fd47 in vnc_worker_thread_loop (queue=0x7f653443c710) at .../ui/vnc-jobs.c:205
>         job = 0x7f6534ca4700
>         entry = 0x0
>         tmp = 0x0
>         vs = {sioc = 0x0, ioc = 0x0, ioc_tag = 0, disconnecting = 0,
>         dirty = {{0, 0, 0} <repeats 2048 times>}, lossy_rect = 0x0, vd = 0x0,
>         need_update = 0, force_update = 0, has_dirty = 0, features = 0, absolute
>         = 0, last_x = 0, last_y = 0, last_bmask = 0, client_width = 0,
>         client_height = 0, share_mode = 0, vnc_encoding = 0, major = 0, minor =
>         0, auth = 0, subauth = 0, challenge = '\000' <repeats 15 times>, tls =
>         0x0, sasl = {conn = 0x0, wantSSF = false, runSSF = false, waitWriteSSF =
>         0, encoded = 0x0, encodedLength = 0, encodedOffset = 0, username = 0x0,
>         mechlist = 0x0}, encode_ws = false, websocket = false, info = 0x0,
>         output = {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer =
>         0x0}, input = {name = 0x0, capacity = 0, offset = 0, avg_size = 0,
>         buffer = 0x0}, write_pixels = 0x0, client_pf = {bits_per_pixel = 0
>         '\000', bytes_per_pixel = 0 '\000', depth = 0 '\000', rmask = 0, gmask =
>         0, bmask = 0, amask = 0, rshift = 0 '\000', gshift = 0 '\000', bshift =
>         0 '\000', ashift = 0 '\000', rmax = 0 '\000', gmax = 0 '\000', bmax = 0
>         '\000', amax = 0 '\000', rbits = 0 '\000', gbits = 0 '\000', bbits = 0
>         '\000', abits = 0 '\000'}, client_format = 0, client_be = false,
>         audio_cap = 0x0, as = {freq = 0, nchannels = 0, fmt = AUD_FMT_U8,
>         endianness = 0}, read_handler = 0x0, read_handler_expect = 0,
>         modifiers_state = '\000' <repeats 255 times>, abort = false,
>         output_mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0,
>         __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next =
>         0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, bh = 0x0,
>         jobs_buffer = {name = 0x0, capacity = 0, offset = 0, avg_size = 0,
>         buffer = 0x0}, tight = {type = 0, quality = 0 '\000', compression = 0
>         '\000', pixel24 = 0 '\000', tight = {name = 0x0, capacity = 0, offset =
>         0, avg_size = 0, buffer = 0x0}, tmp = {name = 0x0, capacity = 0, offset
>         = 0, avg_size = 0, buffer = 0x0}, zlib = {name = 0x0, capacity = 0,
>         offset = 0, avg_size = 0, buffer = 0x0}, gradient = {name = 0x0,
>         capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, jpeg = {name =
>         0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, png = {name
>         = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, levels =
>         {0, 0, 0, 0}, stream = {{next_in = 0x0, avail_in = 0, total_in = 0,
>         next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0,
>         zalloc = 0x0, zfree = 0x0, opaque = 0x0, data_type = 0, adler = 0,
>         reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out =
>         0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0,
>         zfree = 0x0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0},
>         {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out =
>         0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0, zfree = 0x0,
>         opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0,
>         avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out =
>         0, msg = 0x0, state = 0x0, zalloc = 0x0, zfree = 0x0, opaque = 0x0,
>         data_type = 0, adler = 0, reserved = 0}}}, zlib = {zlib = {name = 0x0,
>         capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, tmp = {name =
>         0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, stream =
>         {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out =
>         0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0, zfree = 0x0,
>         opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, level = 0},
>         hextile = {send_tile = 0x0}, zrle = {type = 0, fb = {name = 0x0,
>         capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, zrle = {name =
>         0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, tmp = {name
>         = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0}, zlib =
>         {name = 0x0, capacity = 0, offset = 0, avg_size = 0, buffer = 0x0},
>         stream = {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0,
>         avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0x0,
>         zfree = 0x0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0},
>         palette = {pool = {{idx = 0, color = 0, next = {le_next = 0x0, le_prev =
>         0x0}} <repeats 256 times>}, size = 0, max = 0, bpp = 0, table =
>         {{lh_first = 0x0} <repeats 256 times>}}}, zywrle = {buf = {0 <repeats
>         4096 times>}}, mouse_mode_notifier = {notify = 0x0, node = {le_next =
>         0x0, le_prev = 0x0}}, next = {tqe_next = 0x0, tqe_prev = 0x0}}
>         n_rectangles = 4
>         saved_offset = 2
> #3  0x00007f6531e801cb in vnc_worker_thread (arg=0x7f653443c710) at .../ui/vnc-jobs.c:312
>         queue = 0x7f653443c710
> #4  0x00007f6528930dc5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #5  0x00007f6525bad73d in clone () from /lib64/libc.so.6
> No symbol table info available.
>
> Thread 1 (Thread 0x7f6531835c40 (LWP 8527)):
> #0  0x00007f65289371bd in __lll_lock_wait () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x00007f6528932d02 in _L_lock_791 () from /lib64/libpthread.so.0
> No symbol table info available.
> #2  0x00007f6528932c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
> No symbol table info available.
> #3  0x00007f6531f720cd in qemu_mutex_lock (mutex=0x7f6532606940 <qemu_global_mutex>) at .../util/qemu-thread-posix.c:60
>         err = 0
>         __func__ = "qemu_mutex_lock"
> #4  0x00007f6531af5830 in qemu_mutex_lock_iothread () at .../cpus.c:1351
> No locals.
> #5  0x00007f6531f6e8af in os_host_main_loop_wait (timeout=902507) at .../util/main-loop.c:257
>         ret = 1
>         spin_counter = 0
> #6  0x00007f6531f6e955 in main_loop_wait (nonblocking=0) at .../util/main-loop.c:508
>         ret = 0
>         timeout = 4294967295
>         timeout_ns = 902507
> #7  0x00007f6531c2bd89 in main_loop () at .../vl.c:1877
>         nonblocking = false
>         last_io = 0
> #8  0x00007f6531c335e0 in main (argc=88, argv=0x7ffd256a2918, envp=0x7ffd256a2be0) at .../vl.c:4628
>         i = 32613
>         snapshot = 0
>         linux_boot = 0
>         initrd_filename = 0x0
>         kernel_filename = 0x0
>         kernel_cmdline = 0x7f6531fc713e ""
>         boot_order = 0x7f6531faf741 "cad"
>         boot_once = 0x0
>         ds = 0x7f6534ca4120
>         cyls = 0
>         heads = 0
>         secs = 0
>         translation = 0
>         hda_opts = 0x0
>         opts = 0x7f6532b6ec50
>         machine_opts = 0x7f6532b6d840
>         icount_opts = 0x0
>         olist = 0x7f65324903a0 <qemu_machine_opts>
>         optind = 88
>         optarg = 0x7ffd256a4f54 "timestamp=on"
>         loadvm = 0x0
>         machine_class = 0x7f6532b96800
>         cpu_model = 0x7ffd256a464a "Haswell-noTSX,+vmx"
>         vga_model = 0x0
>         qtest_chrdev = 0x0
>         qtest_log = 0x0
>         pid_file = 0x0
>         incoming = 0x0
>         defconfig = true
>         userconfig = false
>         nographic = false
>         display_type = DT_DEFAULT
>         display_remote = 1
>         log_mask = 0x0
>         log_file = 0x0
>         trace_file = 0x0
>         maxram_size = 5368709120
>         ram_slots = 0
>         vmstate_dump_file = 0x0
>         main_loop_err = 0x0
>         err = 0x0
>         list_data_dirs = false
>         __func__ = "main"
>         __FUNCTION__ = "main"

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-21 12:57   ` Gerd Hoffmann
  2017-02-21 16:25     ` Laszlo Ersek
@ 2017-02-21 18:08     ` Paolo Bonzini
  2017-02-21 19:07       ` Laszlo Ersek
  1 sibling, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2017-02-21 18:08 UTC (permalink / raw)
  To: Gerd Hoffmann, Michael S. Tsirkin, Laszlo Ersek
  Cc: Peter Maydell, Jason Wang, qemu-devel, Stefan Hajnoczi



On 21/02/2017 13:57, Gerd Hoffmann wrote:
> This change breaks ovmf for me, although it isn't obvious to me why.
> Bisect landed here, and reverting indeed makes things work again.
> 
> Using q35 machine type, pcie virtio devices, with the rhel ovmf build
> (OVMF-20160608b-1.git988715a.el7.noarch).
> 
> First thing I've tried is swapping virtio-net for another nic,
> suspecting this change might trigger a bug in the ovmf virtio-net
> driver, but that didn't change things.
> 
> The effect is that qemu just exits without logging any error; it looks
> like a normal guest shutdown.  The firmware log doesn't give a clue
> either; it just stops at some point, again without any error message.
> Here are the last lines of the log:
> 
> SataControllerStart START
> SataControllerStart error return status = Already started
> SetPciIntLine: [00:1C.0] PciRoot(0x0)/Pci(0x1C,0x0) -> 0x0A
> SetPciIntLine: [01:00.0] PciRoot(0x0)/Pci(0x1C,0x0)/Pci(0x0,0x0) -> 0x0A
> SetPciIntLine: [00:1C.1] PciRoot(0x0)/Pci(0x1C,0x1) -> 0x0A
> SetPciIntLine: [02:00.0] PciRoot(0x0)/Pci(0x1C,0x1)/Pci(0x0,0x0) -> 0x0A
> SetPciIntLine: [00:1C.2] PciRoot(0x0)/Pci(0x1C,0x2) -> 0x0A
> SetPciIntLine: [00:1C.3] PciRoot(0x0)/Pci(0x1C,0x3) -> 0x0A
> SetPciIntLine: [00:1C.4] PciRoot(0x0)/Pci(0x1C,0x4) -> 0x0A
> SetPciIntLine: [05:00.0] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x0) -> 0x0A
> SetPciIntLine: [05:00.1] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x1) -> 0x0A
> SetPciIntLine: [05:00.2] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x2) -> 0x0A
> SetPciIntLine: [00:1C.5] PciRoot(0x0)/Pci(0x1C,0x5) -> 0x0A
> SetPciIntLine: [06:00.0] PciRoot(0x0)/Pci(0x1C,0x5)/Pci(0x0,0x0) -> 0x0A
> SetPciIntLine: [00:1C.6] PciRoot(0x0)/Pci(0x1C,0x6) -> 0x0A
> SetPciIntLine: [00:1C.7] PciRoot(0x0)/Pci(0x1C,0x7) -> 0x0A
> SetPciIntLine: [00:1F.2] PciRoot(0x0)/Pci(0x1F,0x2) -> 0x0A
> SetPciIntLine: [00:1F.3] PciRoot(0x0)/Pci(0x1F,0x3) -> 0x0A
> Select Item: 0x8
> Select Item: 0x17
> qemu -kernel was not used.

Command line?  I tried with

x86_64-softmmu/qemu-system-x86_64 \
  -pflash /usr/share/edk2/ovmf/OVMF_CODE.fd \
  -pflash /usr/share/edk2/ovmf/OVMF_VARS.fd  \
  --enable-kvm -M q35 -debugcon stdio \
  -global isa-debugcon.iobase=0x402 \
  -net nic,model=virtio \
  -net bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0 \
  -snapshot foo.raw

and after the above lines I get:

Select Item: 0x19
[Bds]OsIndication: 0000000000000000
[Bds]=============Begin Load Options Dumping ...=============
  Driver Options:
  SysPrep Options:
  Boot Options:
    Boot0000: UiApp              0x0109
    Boot0001: UEFI QEMU DVD-ROM QM00005                  0x0001
    Boot0002: UEFI QEMU HARDDISK QM00001                 0x0001
    Boot0003: EFI Internal Shell                 0x0001
  PlatformRecovery Options:
    PlatformRecovery0000: Default PlatformRecovery               0x0001

(but I'm using a different OVMF build,
edk2-ovmf-20161105git3b25ca8-2.fc25.noarch; dinner time now).

Thanks,

Paolo


* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-21 18:08     ` Paolo Bonzini
@ 2017-02-21 19:07       ` Laszlo Ersek
  2017-02-21 20:04         ` Gerd Hoffmann
  0 siblings, 1 reply; 36+ messages in thread
From: Laszlo Ersek @ 2017-02-21 19:07 UTC (permalink / raw)
  To: Paolo Bonzini, Gerd Hoffmann, Michael S. Tsirkin
  Cc: Peter Maydell, Jason Wang, qemu-devel, Stefan Hajnoczi

On 02/21/17 19:08, Paolo Bonzini wrote:
> 
> 
> On 21/02/2017 13:57, Gerd Hoffmann wrote:
>> This change breaks ovmf for me, although it isn't obvious to me why.
>> Bisect landed here, and reverting indeed makes things work again.
>>
>> Using q35 machine type, pcie virtio devices, with the rhel ovmf build
>> (OVMF-20160608b-1.git988715a.el7.noarch).
>>
>> First thing I've tried is swapping virtio-net for another nic,
>> suspecting this change might trigger a bug in the ovmf virtio-net
>> driver, but that didn't change things.
>>
>> The effect is that qemu just exits without logging any error; it looks
>> like a normal guest shutdown.  The firmware log doesn't give a clue
>> either; it just stops at some point, again without any error message.
>> Here are the last lines of the log:
>>
>> SataControllerStart START
>> SataControllerStart error return status = Already started
>> SetPciIntLine: [00:1C.0] PciRoot(0x0)/Pci(0x1C,0x0) -> 0x0A
>> SetPciIntLine: [01:00.0] PciRoot(0x0)/Pci(0x1C,0x0)/Pci(0x0,0x0) -> 0x0A
>> SetPciIntLine: [00:1C.1] PciRoot(0x0)/Pci(0x1C,0x1) -> 0x0A
>> SetPciIntLine: [02:00.0] PciRoot(0x0)/Pci(0x1C,0x1)/Pci(0x0,0x0) -> 0x0A
>> SetPciIntLine: [00:1C.2] PciRoot(0x0)/Pci(0x1C,0x2) -> 0x0A
>> SetPciIntLine: [00:1C.3] PciRoot(0x0)/Pci(0x1C,0x3) -> 0x0A
>> SetPciIntLine: [00:1C.4] PciRoot(0x0)/Pci(0x1C,0x4) -> 0x0A
>> SetPciIntLine: [05:00.0] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x0) -> 0x0A
>> SetPciIntLine: [05:00.1] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x1) -> 0x0A
>> SetPciIntLine: [05:00.2] PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x2) -> 0x0A
>> SetPciIntLine: [00:1C.5] PciRoot(0x0)/Pci(0x1C,0x5) -> 0x0A
>> SetPciIntLine: [06:00.0] PciRoot(0x0)/Pci(0x1C,0x5)/Pci(0x0,0x0) -> 0x0A
>> SetPciIntLine: [00:1C.6] PciRoot(0x0)/Pci(0x1C,0x6) -> 0x0A
>> SetPciIntLine: [00:1C.7] PciRoot(0x0)/Pci(0x1C,0x7) -> 0x0A
>> SetPciIntLine: [00:1F.2] PciRoot(0x0)/Pci(0x1F,0x2) -> 0x0A
>> SetPciIntLine: [00:1F.3] PciRoot(0x0)/Pci(0x1F,0x3) -> 0x0A
>> Select Item: 0x8
>> Select Item: 0x17
>> qemu -kernel was not used.
> 
> Command line?  I tried with
> 
> x86_64-softmmu/qemu-system-x86_64 \
>   -pflash /usr/share/edk2/ovmf/OVMF_CODE.fd \
>   -pflash /usr/share/edk2/ovmf/OVMF_VARS.fd  \
>   --enable-kvm -M q35 -debugcon stdio \
>   -global isa-debugcon.iobase=0x402 \
>   -net nic,model=virtio \
>   -net bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0 \
>   -snapshot foo.raw
> 
> and after the above lines I get:
> 
> Select Item: 0x19
> [Bds]OsIndication: 0000000000000000
> [Bds]=============Begin Load Options Dumping ...=============
>   Driver Options:
>   SysPrep Options:
>   Boot Options:
>     Boot0000: UiApp              0x0109
>     Boot0001: UEFI QEMU DVD-ROM QM00005                  0x0001
>     Boot0002: UEFI QEMU HARDDISK QM00001                 0x0001
>     Boot0003: EFI Internal Shell                 0x0001
>   PlatformRecovery Options:
>     PlatformRecovery0000: Default PlatformRecovery               0x0001
> 
> (but I'm using a different OVMF build,
> edk2-ovmf-20161105git3b25ca8-2.fc25.noarch; dinner time now).

The bug reproduces for me with current upstream OVMF. I sent a full QEMU
backtrace (SIGSEGV) in another message. I use libvirt (and I assume Gerd
does that too, because otherwise the shell would report the SIGSEGV
immediately).

Interestingly, in that call stack, I see pflash_update -> blk_pwrite ->
blk_prw -> aio_poll -> ... -> virtio_queue_host_notifier_aio_poll.
Therefore the direct trigger seems to be a UEFI variable update, but I
don't understand how that lands in virtio code in QEMU.

Thanks
Laszlo


* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-21 19:07       ` Laszlo Ersek
@ 2017-02-21 20:04         ` Gerd Hoffmann
  0 siblings, 0 replies; 36+ messages in thread
From: Gerd Hoffmann @ 2017-02-21 20:04 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Paolo Bonzini, Michael S. Tsirkin, Peter Maydell, Jason Wang,
	qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 306 bytes --]

  Hi,

> The bug reproduces for me with current upstream OVMF. I sent a full QEMU
> backtrace (SIGSEGV) in another message. I use libvirt (and I assume Gerd
> does that too, because otherwise the shell would report the SIGSEGV
> immediately).

libvirt indeed, config attached.

cheers,
  Gerd


[-- Attachment #2: Type: application/xml, Size: 5774 bytes --]


* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-21 17:54       ` Laszlo Ersek
@ 2017-02-22  9:03         ` Paolo Bonzini
  2017-02-23  0:32           ` Alex Williamson
  0 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2017-02-22  9:03 UTC (permalink / raw)
  To: Laszlo Ersek, Gerd Hoffmann, Michael S. Tsirkin
  Cc: Peter Maydell, Jason Wang, qemu-devel, Stefan Hajnoczi



On 21/02/2017 18:54, Laszlo Ersek wrote:
> Actually, QEMU segfaults. From the dmesg:
> 
> [Tue Feb 21 18:47:28 2017] CPU 0/KVM[8298]: segfault at 48 ip
> 00007fcb5dd02105 sp 00007fcb49efc270 error 4 in
> qemu-system-x86_64[7fcb5dae3000+905000]
> 
> Complete backtrace below. (Thread 11 seems to be the one
> segfaulting.)

Indeed.  It's the virtio-blk device that is segfaulting; please try
this one-liner (I haven't reproduced it, but it seems obvious given the
backtrace and some sleep).

commit c1aa478c7181c543606ca81404c59e126d66213d
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Feb 22 10:02:37 2017 +0100

    virtio: check for vring setup in virtio_queue_empty
    
    If the vring has not been set up, there is nothing in the virtqueue.
    virtio_queue_host_notifier_aio_poll calls virtio_queue_empty even in
    this case; we have to filter it out just like virtio_queue_notify_aio_vq.
    
    Reported-by: Gerd Hoffmann <kraxel@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 23483c7..e487e36 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2291,7 +2291,7 @@ static bool virtio_queue_host_notifier_aio_poll(void *opaque)
     VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
     bool progress;
 
-    if (virtio_queue_empty(vq)) {
+    if (!vq->vring.desc || virtio_queue_empty(vq)) {
         return false;
     }
 


* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-22  9:03         ` Paolo Bonzini
@ 2017-02-23  0:32           ` Alex Williamson
  2017-02-23  9:33             ` Cédric Le Goater
  0 siblings, 1 reply; 36+ messages in thread
From: Alex Williamson @ 2017-02-23  0:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Laszlo Ersek, Gerd Hoffmann, Michael S. Tsirkin, Peter Maydell,
	Jason Wang, qemu-devel, Stefan Hajnoczi

On Wed, 22 Feb 2017 10:03:56 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 21/02/2017 18:54, Laszlo Ersek wrote:
> > Actually, QEMU segfaults. From the dmesg:
> > 
> > [Tue Feb 21 18:47:28 2017] CPU 0/KVM[8298]: segfault at 48 ip
> > 00007fcb5dd02105 sp 00007fcb49efc270 error 4 in
> > qemu-system-x86_64[7fcb5dae3000+905000]
> > 
> > Complete backtrace below. (Thread 11 seems to be the one
> > segfaulting.)  
> 
> Indeed.  It's the virtio-blk device that is segfaulting; please try
> this one-liner (I haven't reproduced it, but it seems obvious given the
> backtrace and some sleep).

I hit this as well; the patch below resolves it.

Tested-by: Alex Williamson <alex.williamson@redhat.com>
 
> commit c1aa478c7181c543606ca81404c59e126d66213d
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Feb 22 10:02:37 2017 +0100
> 
>     virtio: check for vring setup in virtio_queue_empty
>     
>     If the vring has not been set up, there is nothing in the virtqueue.
>     virtio_queue_host_notifier_aio_poll calls virtio_queue_empty even in
>     this case; we have to filter it out just like virtio_queue_notify_aio_vq.
>     
>     Reported-by: Gerd Hoffmann <kraxel@redhat.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 23483c7..e487e36 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2291,7 +2291,7 @@ static bool virtio_queue_host_notifier_aio_poll(void *opaque)
>      VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
>      bool progress;
>  
> -    if (virtio_queue_empty(vq)) {
> +    if (!vq->vring.desc || virtio_queue_empty(vq)) {
>          return false;
>      }
>  
> 


* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-23  0:32           ` Alex Williamson
@ 2017-02-23  9:33             ` Cédric Le Goater
  2017-02-23  9:47               ` Paolo Bonzini
  0 siblings, 1 reply; 36+ messages in thread
From: Cédric Le Goater @ 2017-02-23  9:33 UTC (permalink / raw)
  To: Alex Williamson, Paolo Bonzini
  Cc: Peter Maydell, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Gerd Hoffmann, Stefan Hajnoczi, Laszlo Ersek, David Gibson

Hello,

On 02/23/2017 01:32 AM, Alex Williamson wrote:
> On Wed, 22 Feb 2017 10:03:56 +0100
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 21/02/2017 18:54, Laszlo Ersek wrote:
>>> Actually, QEMU segfaults. From the dmesg:
>>>
>>> [Tue Feb 21 18:47:28 2017] CPU 0/KVM[8298]: segfault at 48 ip
>>> 00007fcb5dd02105 sp 00007fcb49efc270 error 4 in
>>> qemu-system-x86_64[7fcb5dae3000+905000]
>>>
>>> Complete backtrace below. (Thread 11 seems to be the one
>>> segfaulting.)  
>>
>> Indeed.  It's the virtio-blk device that is segfaulting; please try
>> this one-liner (I haven't reproduced it, but it seems obvious given the
>> backtrace and some sleep).
> 
> I hit this as well, the patch below resolves it.

I still have migration issues with this patch on ppc; migration fails
on the target guest with:

qemu-system-ppc64: VQ 0 size 0x80 < last_avail_idx 0xe1e - used_idx 0x0
qemu-system-ppc64: Failed to load virtio-blk:virtio
qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-blk'
qemu-system-ppc64: load of migration failed: Operation not permitted

Thanks,

C. 

> Tested-by: Alex Williamson <alex.williamson@redhat.com>
>  
>> commit c1aa478c7181c543606ca81404c59e126d66213d
>> Author: Paolo Bonzini <pbonzini@redhat.com>
>> Date:   Wed Feb 22 10:02:37 2017 +0100
>>
>>     virtio: check for vring setup in virtio_queue_empty
>>     
>>     If the vring has not been set up, there is nothing in the virtqueue.
>>     virtio_queue_host_notifier_aio_poll calls virtio_queue_empty even in
>>     this case; we have to filter it out just like virtio_queue_notify_aio_vq.
>>     
>>     Reported-by: Gerd Hoffmann <kraxel@redhat.com>
>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index 23483c7..e487e36 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -2291,7 +2291,7 @@ static bool virtio_queue_host_notifier_aio_poll(void *opaque)
>>      VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
>>      bool progress;
>>  
>> -    if (virtio_queue_empty(vq)) {
>> +    if (!vq->vring.desc || virtio_queue_empty(vq)) {
>>          return false;
>>      }
>>  
>>
> 
> 


* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-23  9:33             ` Cédric Le Goater
@ 2017-02-23  9:47               ` Paolo Bonzini
  2017-02-23 11:56                 ` Cédric Le Goater
  0 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2017-02-23  9:47 UTC (permalink / raw)
  To: Cédric Le Goater, Alex Williamson
  Cc: Peter Maydell, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Gerd Hoffmann, Stefan Hajnoczi, Laszlo Ersek, David Gibson



On 23/02/2017 10:33, Cédric Le Goater wrote:
> I still have migration issues with this patch on ppc; migration fails
> on the target guest with:
> 
> qemu-system-ppc64: VQ 0 size 0x80 < last_avail_idx 0xe1e - used_idx 0x0
> qemu-system-ppc64: Failed to load virtio-blk:virtio
> qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-blk'
> qemu-system-ppc64: load of migration failed: Operation not permitted

Stefan sent patches for that.  As a workaround, perhaps you can try
disabling virtio-modern.

Paolo


* Re: [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings
  2017-02-23  9:47               ` Paolo Bonzini
@ 2017-02-23 11:56                 ` Cédric Le Goater
  0 siblings, 0 replies; 36+ messages in thread
From: Cédric Le Goater @ 2017-02-23 11:56 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Williamson
  Cc: Peter Maydell, Michael S. Tsirkin, Jason Wang, qemu-devel,
	Gerd Hoffmann, Stefan Hajnoczi, Laszlo Ersek, David Gibson

On 02/23/2017 10:47 AM, Paolo Bonzini wrote:
> 
> 
> On 23/02/2017 10:33, Cédric Le Goater wrote:
>> I still have migration issues with this patch on ppc; migration fails
>> on the target guest with:
>>
>> qemu-system-ppc64: VQ 0 size 0x80 < last_avail_idx 0xe1e - used_idx 0x0
>> qemu-system-ppc64: Failed to load virtio-blk:virtio
>> qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-blk'
>> qemu-system-ppc64: load of migration failed: Operation not permitted
> 
> Stefan sent patches for that.  As a workaround, perhaps you can try
> disabling virtio-modern.

I just took the patches. All good now.

Thanks,

C.


end of thread, other threads:[~2017-02-23 12:05 UTC | newest]

Thread overview: 36+ messages
-- links below jump to the message on this page --
2017-02-17 19:53 [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Michael S. Tsirkin
2017-02-17 19:53 ` [Qemu-devel] [PULL 01/23] pci/pcie: don't assume cap id 0 is reserved Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 02/23] virtio: Report real progress in VQ aio poll handler Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 03/23] docs: add document to explain the usage of vNVDIMM Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 04/23] memory: make memory_listener_unregister idempotent Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 05/23] virtio: add virtio_*_phys_cached Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 06/23] virtio: use address_space_map/unmap to access descriptors Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 07/23] exec: make address_space_cache_destroy idempotent Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 08/23] virtio: use MemoryRegionCache to access descriptors Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 09/23] virtio: add MemoryListener to cache ring translations Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 10/23] virtio: use VRingMemoryRegionCaches for descriptor ring Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 11/23] virtio: check for vring setup in virtio_queue_update_used_idx Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 12/23] virtio: use VRingMemoryRegionCaches for avail and used rings Michael S. Tsirkin
2017-02-21 12:57   ` Gerd Hoffmann
2017-02-21 16:25     ` Laszlo Ersek
2017-02-21 17:54       ` Laszlo Ersek
2017-02-22  9:03         ` Paolo Bonzini
2017-02-23  0:32           ` Alex Williamson
2017-02-23  9:33             ` Cédric Le Goater
2017-02-23  9:47               ` Paolo Bonzini
2017-02-23 11:56                 ` Cédric Le Goater
2017-02-21 18:08     ` Paolo Bonzini
2017-02-21 19:07       ` Laszlo Ersek
2017-02-21 20:04         ` Gerd Hoffmann
2017-02-17 19:54 ` [Qemu-devel] [PULL 13/23] virtio: Fix no interrupt when not creating msi controller Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 14/23] pcie: simplify pcie_add_capability() Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 15/23] vfio: trace map/unmap for notify as well Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 16/23] vfio: introduce vfio_get_vaddr() Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 17/23] vfio: allow to notify unmap for very large region Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 18/23] intel_iommu: add "caching-mode" option Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 19/23] intel_iommu: simplify irq region translation Michael S. Tsirkin
2017-02-17 19:54 ` [Qemu-devel] [PULL 20/23] intel_iommu: renaming gpa to iova where proper Michael S. Tsirkin
2017-02-17 19:55 ` [Qemu-devel] [PULL 21/23] intel_iommu: convert dbg macros to traces for inv Michael S. Tsirkin
2017-02-17 19:55 ` [Qemu-devel] [PULL 22/23] intel_iommu: convert dbg macros to trace for trans Michael S. Tsirkin
2017-02-17 19:55 ` [Qemu-devel] [PULL 23/23] intel_iommu: vtd_slpt_level_shift check level Michael S. Tsirkin
2017-02-20 11:55 ` [Qemu-devel] [PULL 00/23] virtio, pci: fixes, features Peter Maydell
