* [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support
@ 2016-03-25  2:13 Jason Wang
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 1/8] virtio: convert to use DMA api Jason Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Jason Wang @ 2016-03-25  2:13 UTC (permalink / raw)
  To: mst, qemu-devel; +Cc: cornelia.huck, pbonzini, Jason Wang, peterx

Hi all:

As userspace virtio drivers become popular, there is a growing need for
a secure DMA environment (DMAR). This series tries to make DMAR work
for virtio/vhost. The idea is to let virtio/vhost cooperate with the
userspace (QEMU) IOMMU implementation. This is done through:

- for virtio, stop assuming address_space_memory and convert to the
  DMA helpers.
- for vhost kernel, implement a device IOTLB using the device IOTLB API
  supported by the kernel. With this API, kernel vhost can query the
  IOTLB entry for a given iova from qemu, and qemu can invalidate an
  arbitrary iova range in kernel vhost.

The device IOTLB API is totally architecture independent; an example
implementation was done with intel_iommu by:

- implement basic ATS (Address Translation Service) support for
  virtio-pci; this makes the device IOTLB visible to the IOMMU driver
  in the guest.
- implement device IOTLB descriptor processing in intel_iommu, and
  trigger device IOTLB invalidation in vhost through the IOMMU
  notifier.

This could easily be ported to another IOMMU or architecture even if it
does not support a device IOTLB (e.g. just invalidate the vhost IOTLB
during IOMMU IOTLB invalidation), as sketched below.
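
A rough sketch of such a port, assuming an IOMMU emulation that hooks
its own IOTLB invalidation path (the function name and surrounding
details are made up; only memory_region_notify_iommu() and the vhost
unmap notifier from patch 8 are real):

    static void my_iommu_iotlb_invalidate(MemoryRegion *iommu_mr,
                                          hwaddr iova, hwaddr size)
    {
        /* Assumes a power-of-two sized, aligned range. */
        IOMMUTLBEntry entry = {
            .target_as = &address_space_memory,
            .iova = iova,
            .addr_mask = size - 1,
            .perm = IOMMU_NONE, /* unmap */
        };

        /* Fans out to every notifier registered on this region,
         * including vhost's, which then invalidates its device IOTLB. */
        memory_region_notify_iommu(iommu_mr, entry);
    }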

Testing was done with:

- intel_iommu=on/strict in guest.
- vfio (unsafe interrupt mode) l2fwd in guest.

The main use case is programs that use fixed mappings in the guest
(e.g. dpdk). If 1G hugepages are used in the guest, thanks to the SLLPS
support we can get a 100% TLB hit rate for l2fwd in the guest.

For normal kernel drivers, which do lots of dynamic mapping and
unmapping, we may see a performance penalty; this could be optimized in
the future.

Please review.

Jason Wang (8):
  virtio: convert to use DMA api
  intel_iommu: name vtd address space with devfn
  intel_iommu: allocate new key when creating new address space
  exec: introduce address_space_get_iotlb_entry()
  virtio-pci: address space translation service (ATS) support
  intel_iommu: support device iotlb descriptor
  memory: handle alias for iommu notifier
  vhost_net: device IOTLB support

 exec.c                                    |  30 +++++
 hw/block/virtio-blk.c                     |   2 +-
 hw/char/virtio-serial-bus.c               |   3 +-
 hw/i386/intel_iommu.c                     |  92 ++++++++++++--
 hw/i386/intel_iommu_internal.h            |  13 +-
 hw/scsi/virtio-scsi.c                     |   4 +-
 hw/virtio/vhost-backend.c                 |  33 +++++
 hw/virtio/vhost.c                         | 203 ++++++++++++++++++++++++++----
 hw/virtio/virtio-pci.c                    |  23 +++-
 hw/virtio/virtio-pci.h                    |   4 +
 hw/virtio/virtio.c                        |  58 +++++----
 include/exec/memory.h                     |   7 ++
 include/hw/virtio/vhost-backend.h         |  14 +++
 include/hw/virtio/vhost.h                 |   6 +
 include/hw/virtio/virtio-access.h         |  64 ++++++++--
 include/hw/virtio/virtio-bus.h            |   1 +
 include/hw/virtio/virtio.h                |   4 +-
 include/standard-headers/linux/pci_regs.h |   1 +
 linux-headers/linux/vhost.h               |  35 ++++++
 memory.c                                  |   3 +
 20 files changed, 525 insertions(+), 75 deletions(-)

-- 
2.5.0


* [Qemu-devel] [RFC PATCH 1/8] virtio: convert to use DMA api
  2016-03-25  2:13 [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support Jason Wang
@ 2016-03-25  2:13 ` Jason Wang
  2016-04-19 13:37   ` Michael S. Tsirkin
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 2/8] intel_iommu: name vtd address space with devfn Jason Wang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2016-03-25  2:13 UTC (permalink / raw)
  To: mst, qemu-devel
  Cc: Kevin Wolf, qemu-block, Jason Wang, peterx, Amit Shah,
	Stefan Hajnoczi, cornelia.huck, pbonzini

Currently, all virtio devices bypass the IOMMU completely because
address_space_memory is assumed and used during DMA emulation. This
patch converts the virtio core to the DMA API. The idea is:

- introduce a new transport-specific helper to query the DMA address
  space (only the PCI version is implemented);
- query and use this address space for virtio device accesses to guest
  memory (a short illustrative sketch follows).
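
For a device model, the only visible change is which address space its
DMA goes through; a minimal sketch (not part of this patch, the helper
below is made up, and it assumes "sysemu/dma.h" and
"hw/virtio/virtio-access.h" are included):

    /* Read a guest buffer through the transport's DMA address space
     * instead of assuming address_space_memory. */
    static void my_dev_dma_read(VirtIODevice *vdev, hwaddr pa,
                                void *buf, dma_addr_t len)
    {
        dma_memory_read(virtio_get_dma_as(vdev), pa, buf, len);
    }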

With this, virtio devices no longer bypass the IOMMU. Tested with
intel_iommu=on/strict and:

- virtio guest DMA series posted in https://lkml.org/lkml/2015/10/28/64.
- vfio (unsafe interrupt mode) dpdk l2fwd in guest

TODO:
- Feature bit for this
- Implement this for all transports

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/block/virtio-blk.c             |  2 +-
 hw/char/virtio-serial-bus.c       |  3 +-
 hw/scsi/virtio-scsi.c             |  4 ++-
 hw/virtio/virtio-pci.c            |  9 ++++++
 hw/virtio/virtio.c                | 58 +++++++++++++++++++++++----------------
 include/hw/virtio/virtio-access.h | 42 +++++++++++++++++++++-------
 include/hw/virtio/virtio-bus.h    |  1 +
 include/hw/virtio/virtio.h        |  4 +--
 8 files changed, 85 insertions(+), 38 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cb710f1..9411f99 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -829,7 +829,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
 
     while (qemu_get_sbyte(f)) {
         VirtIOBlockReq *req;
-        req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
+        req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
         virtio_blk_init_request(s, req);
         req->next = s->rq;
         s->rq = req;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 99cb683..bdc5393 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -687,6 +687,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 static int fetch_active_ports_list(QEMUFile *f, int version_id,
                                    VirtIOSerial *s, uint32_t nr_active_ports)
 {
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
     uint32_t i;
 
     s->post_load = g_malloc0(sizeof(*s->post_load));
@@ -722,7 +723,7 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
                 qemu_get_be64s(f, &port->iov_offset);
 
                 port->elem =
-                    qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
+                    qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement));
 
                 /*
                  *  Port was throttled on source machine.  Let's
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 0c30d2e..26ce701 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -196,12 +196,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
     SCSIBus *bus = sreq->bus;
     VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
     VirtIOSCSIReq *req;
     uint32_t n;
 
     qemu_get_be32s(f, &n);
     assert(n < vs->conf.num_queues);
-    req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
+    req = qemu_get_virtqueue_element(vdev, f,
+                                     sizeof(VirtIOSCSIReq) + vs->cdb_size);
     virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
 
     if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 0dadb66..5508b1c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1211,6 +1211,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
     return proxy->nvectors;
 }
 
+static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
+{
+    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+    PCIDevice *dev = &proxy->pci_dev;
+
+    return pci_get_address_space(dev);
+}
+
 static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
                                    struct virtio_pci_cap *cap)
 {
@@ -2495,6 +2503,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
     k->device_plugged = virtio_pci_device_plugged;
     k->device_unplugged = virtio_pci_device_unplugged;
     k->query_nvectors = virtio_pci_query_nvectors;
+    k->get_dma_as = virtio_pci_get_dma_as;
 }
 
 static const TypeInfo virtio_pci_bus_info = {
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 08275a9..37c9951 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -21,6 +21,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "migration/migration.h"
 #include "hw/virtio/virtio-access.h"
+#include "sysemu/dma.h"
 
 /*
  * The alignment to use between consumer and producer parts of vring.
@@ -118,7 +119,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
 static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
                             hwaddr desc_pa, int i)
 {
-    address_space_read(&address_space_memory, desc_pa + i * sizeof(VRingDesc),
+    address_space_read(virtio_get_dma_as(vdev), desc_pa + i * sizeof(VRingDesc),
                        MEMTXATTRS_UNSPECIFIED, (void *)desc, sizeof(VRingDesc));
     virtio_tswap64s(vdev, &desc->addr);
     virtio_tswap32s(vdev, &desc->len);
@@ -160,7 +161,7 @@ static inline void vring_used_write(VirtQueue *vq, VRingUsedElem *uelem,
     virtio_tswap32s(vq->vdev, &uelem->id);
     virtio_tswap32s(vq->vdev, &uelem->len);
     pa = vq->vring.used + offsetof(VRingUsed, ring[i]);
-    address_space_write(&address_space_memory, pa, MEMTXATTRS_UNSPECIFIED,
+    address_space_write(virtio_get_dma_as(vq->vdev), pa, MEMTXATTRS_UNSPECIFIED,
                        (void *)uelem, sizeof(VRingUsedElem));
 }
 
@@ -240,6 +241,7 @@ int virtio_queue_empty(VirtQueue *vq)
 static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
                                unsigned int len)
 {
+    AddressSpace *dma_as = virtio_get_dma_as(vq->vdev);
     unsigned int offset;
     int i;
 
@@ -247,17 +249,17 @@ static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
     for (i = 0; i < elem->in_num; i++) {
         size_t size = MIN(len - offset, elem->in_sg[i].iov_len);
 
-        cpu_physical_memory_unmap(elem->in_sg[i].iov_base,
-                                  elem->in_sg[i].iov_len,
-                                  1, size);
+        dma_memory_unmap(dma_as, elem->in_sg[i].iov_base, elem->in_sg[i].iov_len,
+                         DMA_DIRECTION_FROM_DEVICE, size);
 
         offset += size;
     }
 
     for (i = 0; i < elem->out_num; i++)
-        cpu_physical_memory_unmap(elem->out_sg[i].iov_base,
-                                  elem->out_sg[i].iov_len,
-                                  0, elem->out_sg[i].iov_len);
+        dma_memory_unmap(dma_as, elem->out_sg[i].iov_base,
+                         elem->out_sg[i].iov_len,
+                         DMA_DIRECTION_TO_DEVICE,
+                         elem->out_sg[i].iov_len);
 }
 
 void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
@@ -447,7 +449,8 @@ int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
     return in_bytes <= in_total && out_bytes <= out_total;
 }
 
-static void virtqueue_map_desc(unsigned int *p_num_sg, hwaddr *addr, struct iovec *iov,
+static void virtqueue_map_desc(VirtIODevice *vdev,
+                               unsigned int *p_num_sg, hwaddr *addr, struct iovec *iov,
                                unsigned int max_num_sg, bool is_write,
                                hwaddr pa, size_t sz)
 {
@@ -462,7 +465,10 @@ static void virtqueue_map_desc(unsigned int *p_num_sg, hwaddr *addr, struct iove
             exit(1);
         }
 
-        iov[num_sg].iov_base = cpu_physical_memory_map(pa, &len, is_write);
+        iov[num_sg].iov_base = dma_memory_map(virtio_get_dma_as(vdev), pa, &len,
+                                              is_write ?
+                                              DMA_DIRECTION_FROM_DEVICE:
+                                              DMA_DIRECTION_TO_DEVICE);
         iov[num_sg].iov_len = len;
         addr[num_sg] = pa;
 
@@ -473,9 +479,9 @@ static void virtqueue_map_desc(unsigned int *p_num_sg, hwaddr *addr, struct iove
     *p_num_sg = num_sg;
 }
 
-static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
-                                unsigned int *num_sg, unsigned int max_size,
-                                int is_write)
+static void virtqueue_map_iovec(VirtIODevice *vdev, struct iovec *sg,
+                                hwaddr *addr, unsigned int *num_sg,
+                                unsigned int max_size, int is_write)
 {
     unsigned int i;
     hwaddr len;
@@ -494,7 +500,10 @@ static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
 
     for (i = 0; i < *num_sg; i++) {
         len = sg[i].iov_len;
-        sg[i].iov_base = cpu_physical_memory_map(addr[i], &len, is_write);
+        sg[i].iov_base = dma_memory_map(virtio_get_dma_as(vdev),
+                                        addr[i], &len, is_write ?
+                                        DMA_DIRECTION_FROM_DEVICE :
+                                        DMA_DIRECTION_TO_DEVICE);
         if (!sg[i].iov_base) {
             error_report("virtio: error trying to map MMIO memory");
             exit(1);
@@ -506,12 +515,15 @@ static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
     }
 }
 
-void virtqueue_map(VirtQueueElement *elem)
+void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem)
 {
-    virtqueue_map_iovec(elem->in_sg, elem->in_addr, &elem->in_num,
-                        VIRTQUEUE_MAX_SIZE, 1);
-    virtqueue_map_iovec(elem->out_sg, elem->out_addr, &elem->out_num,
-                        VIRTQUEUE_MAX_SIZE, 0);
+    virtqueue_map_iovec(vdev, elem->in_sg, elem->in_addr, &elem->in_num,
+                        MIN(ARRAY_SIZE(elem->in_sg), ARRAY_SIZE(elem->in_addr)),
+                        1);
+    virtqueue_map_iovec(vdev, elem->out_sg, elem->out_addr, &elem->out_num,
+                        MIN(ARRAY_SIZE(elem->out_sg),
+                        ARRAY_SIZE(elem->out_addr)),
+                        0);
 }
 
 void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
@@ -580,14 +592,14 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     /* Collect all the descriptors */
     do {
         if (desc.flags & VRING_DESC_F_WRITE) {
-            virtqueue_map_desc(&in_num, addr + out_num, iov + out_num,
+            virtqueue_map_desc(vdev, &in_num, addr + out_num, iov + out_num,
                                VIRTQUEUE_MAX_SIZE - out_num, true, desc.addr, desc.len);
         } else {
             if (in_num) {
                 error_report("Incorrect order for descriptors");
                 exit(1);
             }
-            virtqueue_map_desc(&out_num, addr, iov,
+            virtqueue_map_desc(vdev, &out_num, addr, iov,
                                VIRTQUEUE_MAX_SIZE, false, desc.addr, desc.len);
         }
 
@@ -633,7 +645,7 @@ typedef struct VirtQueueElementOld {
     struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
 } VirtQueueElementOld;
 
-void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz)
+void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz)
 {
     VirtQueueElement *elem;
     VirtQueueElementOld data;
@@ -664,7 +676,7 @@ void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz)
         elem->out_sg[i].iov_len = data.out_sg[i].iov_len;
     }
 
-    virtqueue_map(elem);
+    virtqueue_map(vdev, elem);
     return elem;
 }
 
diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
index 8dc84f5..967cc75 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -15,8 +15,20 @@
 #ifndef _QEMU_VIRTIO_ACCESS_H
 #define _QEMU_VIRTIO_ACCESS_H
 #include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-bus.h"
 #include "exec/address-spaces.h"
 
+static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev)
+{
+    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+    if (k->get_dma_as) {
+        return k->get_dma_as(qbus->parent);
+    }
+    return &address_space_memory;
+}
+
 static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
 {
 #if defined(TARGET_IS_BIENDIAN)
@@ -34,45 +46,55 @@ static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
 
 static inline uint16_t virtio_lduw_phys(VirtIODevice *vdev, hwaddr pa)
 {
+    AddressSpace *dma_as = virtio_get_dma_as(vdev);
+
     if (virtio_access_is_big_endian(vdev)) {
-        return lduw_be_phys(&address_space_memory, pa);
+        return lduw_be_phys(dma_as, pa);
     }
-    return lduw_le_phys(&address_space_memory, pa);
+    return lduw_le_phys(dma_as, pa);
 }
 
 static inline uint32_t virtio_ldl_phys(VirtIODevice *vdev, hwaddr pa)
 {
+    AddressSpace *dma_as = virtio_get_dma_as(vdev);
+
     if (virtio_access_is_big_endian(vdev)) {
-        return ldl_be_phys(&address_space_memory, pa);
+        return ldl_be_phys(dma_as, pa);
     }
-    return ldl_le_phys(&address_space_memory, pa);
+    return ldl_le_phys(dma_as, pa);
 }
 
 static inline uint64_t virtio_ldq_phys(VirtIODevice *vdev, hwaddr pa)
 {
+    AddressSpace *dma_as = virtio_get_dma_as(vdev);
+
     if (virtio_access_is_big_endian(vdev)) {
-        return ldq_be_phys(&address_space_memory, pa);
+        return ldq_be_phys(dma_as, pa);
     }
-    return ldq_le_phys(&address_space_memory, pa);
+    return ldq_le_phys(dma_as, pa);
 }
 
 static inline void virtio_stw_phys(VirtIODevice *vdev, hwaddr pa,
                                    uint16_t value)
 {
+    AddressSpace *dma_as = virtio_get_dma_as(vdev);
+
     if (virtio_access_is_big_endian(vdev)) {
-        stw_be_phys(&address_space_memory, pa, value);
+        stw_be_phys(dma_as, pa, value);
     } else {
-        stw_le_phys(&address_space_memory, pa, value);
+        stw_le_phys(dma_as, pa, value);
     }
 }
 
 static inline void virtio_stl_phys(VirtIODevice *vdev, hwaddr pa,
                                    uint32_t value)
 {
+    AddressSpace *dma_as = virtio_get_dma_as(vdev);
+
     if (virtio_access_is_big_endian(vdev)) {
-        stl_be_phys(&address_space_memory, pa, value);
+        stl_be_phys(dma_as, pa, value);
     } else {
-        stl_le_phys(&address_space_memory, pa, value);
+        stl_le_phys(dma_as, pa, value);
     }
 }
 
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 3f2c136..17c07af 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -76,6 +76,7 @@ typedef struct VirtioBusClass {
      * Note that changing this will break migration for this transport.
      */
     bool has_variable_vring_alignment;
+    AddressSpace *(*get_dma_as)(DeviceState *d);
 } VirtioBusClass;
 
 struct VirtioBusState {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 2b5b248..0908bf6 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -153,9 +153,9 @@ void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx);
 
-void virtqueue_map(VirtQueueElement *elem);
+void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem);
 void *virtqueue_pop(VirtQueue *vq, size_t sz);
-void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz);
+void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz);
 void qemu_put_virtqueue_element(QEMUFile *f, VirtQueueElement *elem);
 int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
                           unsigned int out_bytes);
-- 
2.5.0


* [Qemu-devel] [RFC PATCH 2/8] intel_iommu: name vtd address space with devfn
  2016-03-25  2:13 [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support Jason Wang
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 1/8] virtio: convert to use DMA api Jason Wang
@ 2016-03-25  2:13 ` Jason Wang
  2016-03-28  2:02   ` Peter Xu
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 3/8] intel_iommu: allocate new key when creating new address space Jason Wang
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2016-03-25  2:13 UTC (permalink / raw)
  To: mst, qemu-devel
  Cc: Eduardo Habkost, Jason Wang, peterx, cornelia.huck, pbonzini,
	Richard Henderson

To avoid duplicated names and to ease debugging.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/i386/intel_iommu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 347718f..d647b42 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1901,6 +1901,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
     uintptr_t key = (uintptr_t)bus;
     VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
     VTDAddressSpace *vtd_dev_as;
+    char name[128];
 
     if (!vtd_bus) {
         /* No corresponding free() */
@@ -1913,6 +1914,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
     vtd_dev_as = vtd_bus->dev_as[devfn];
 
     if (!vtd_dev_as) {
+        sprintf(name, "intel_iommu_devfn_%d", devfn);
         vtd_bus->dev_as[devfn] = vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
 
         vtd_dev_as->bus = bus;
@@ -1920,9 +1922,9 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
         vtd_dev_as->iommu_state = s;
         vtd_dev_as->context_cache_entry.context_cache_gen = 0;
         memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
-                                 &s->iommu_ops, "intel_iommu", UINT64_MAX);
+                                 &s->iommu_ops, name, UINT64_MAX);
         address_space_init(&vtd_dev_as->as,
-                           &vtd_dev_as->iommu, "intel_iommu");
+                           &vtd_dev_as->iommu, name);
     }
     return vtd_dev_as;
 }
-- 
2.5.0


* [Qemu-devel] [RFC PATCH 3/8] intel_iommu: allocate new key when creating new address space
  2016-03-25  2:13 [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support Jason Wang
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 1/8] virtio: convert to use DMA api Jason Wang
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 2/8] intel_iommu: name vtd address space with devfn Jason Wang
@ 2016-03-25  2:13 ` Jason Wang
  2016-03-28  2:07   ` Peter Xu
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 4/8] exec: introduce address_space_get_iotlb_entry() Jason Wang
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2016-03-25  2:13 UTC (permalink / raw)
  To: mst, qemu-devel
  Cc: Eduardo Habkost, Jason Wang, peterx, cornelia.huck, pbonzini,
	Richard Henderson

We used a pointer to an on-stack variable as the hash table key for the
new address space; once the function returns, the stored key pointer
dangles, which breaks hash table lookups. Fix this by g_malloc()ing a
new key instead.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/i386/intel_iommu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d647b42..36b2072 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1904,11 +1904,12 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
     char name[128];
 
     if (!vtd_bus) {
+        uintptr_t *new_key = g_malloc(sizeof(*new_key));
+        *new_key = (uintptr_t)bus;
         /* No corresponding free() */
         vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * VTD_PCI_DEVFN_MAX);
         vtd_bus->bus = bus;
-        key = (uintptr_t)bus;
-        g_hash_table_insert(s->vtd_as_by_busptr, &key, vtd_bus);
+        g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus);
     }
 
     vtd_dev_as = vtd_bus->dev_as[devfn];
-- 
2.5.0


* [Qemu-devel] [RFC PATCH 4/8] exec: introduce address_space_get_iotlb_entry()
  2016-03-25  2:13 [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support Jason Wang
                   ` (2 preceding siblings ...)
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 3/8] intel_iommu: allocate new key when creating new address space Jason Wang
@ 2016-03-25  2:13 ` Jason Wang
  2016-03-28  2:18   ` Peter Xu
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 5/8] virtio-pci: address space translation service (ATS) support Jason Wang
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2016-03-25  2:13 UTC (permalink / raw)
  To: mst, qemu-devel
  Cc: Peter Crosthwaite, Jason Wang, peterx, cornelia.huck, pbonzini,
	Richard Henderson

This patch introduces a helper to query the IOTLB entry for a given
iova. It will be used by the device IOTLB API in later patches to allow
a dataplane backend (e.g. vhost) to query the IOTLB.
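
A minimal usage sketch (illustrative only, not part of this patch):
look up the mapping that covers a guest iova from a dataplane backend;
the call must happen inside an RCU critical section:

    static bool lookup_iova(AddressSpace *dma_as, hwaddr iova,
                            IOMMUTLBEntry *out)
    {
        IOMMUTLBEntry entry;

        rcu_read_lock();
        entry = address_space_get_iotlb_entry(dma_as, iova, false);
        rcu_read_unlock();

        if (!entry.target_as) {
            return false; /* no readable mapping for this iova */
        }
        *out = entry;     /* translated_addr, addr_mask, perm are valid */
        return true;
    }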

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 exec.c                | 30 ++++++++++++++++++++++++++++++
 include/exec/memory.h |  7 +++++++
 2 files changed, 37 insertions(+)

diff --git a/exec.c b/exec.c
index f398d21..31fac9f 100644
--- a/exec.c
+++ b/exec.c
@@ -411,6 +411,36 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
 }
 
 /* Called from RCU critical section */
+IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
+                                            bool is_write)
+{
+    IOMMUTLBEntry iotlb = {0};
+    MemoryRegionSection *section;
+    MemoryRegion *mr;
+    hwaddr plen;
+
+    for (;;) {
+        AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
+        section = address_space_translate_internal(d, addr, &addr, &plen, true);
+        mr = section->mr;
+
+        if (!mr->iommu_ops) {
+            break;
+        }
+
+        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        if (!(iotlb.perm & (1 << is_write))) {
+            iotlb.target_as = NULL;
+            break;
+        }
+
+        as = iotlb.target_as;
+    }
+
+    return iotlb;
+}
+
+/* Called from RCU critical section */
 MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
                                       hwaddr *xlat, hwaddr *plen,
                                       bool is_write)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2de7898..0411a59 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1310,6 +1310,13 @@ void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
                             MemTxAttrs attrs, MemTxResult *result);
 #endif
 
+
+/* address_space_get_iotlb_entry: translate an address into an IOTLB
+ * entry. Should be called from an RCU critical section.
+ */
+IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
+                                            bool is_write);
+
 /* address_space_translate: translate an address range into an address space
  * into a MemoryRegion and an address range into that section.  Should be
  * called from an RCU critical section, to avoid that the last reference
-- 
2.5.0


* [Qemu-devel] [RFC PATCH 5/8] virtio-pci: address space translation service (ATS) support
  2016-03-25  2:13 [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support Jason Wang
                   ` (3 preceding siblings ...)
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 4/8] exec: introduce address_space_get_iotlb_entry() Jason Wang
@ 2016-03-25  2:13 ` Jason Wang
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 6/8] intel_iommu: support device iotlb descriptor Jason Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2016-03-25  2:13 UTC (permalink / raw)
  To: mst, qemu-devel; +Cc: cornelia.huck, pbonzini, Jason Wang, peterx

This patch enables Address Translation Service (ATS) support for
virtio-pci devices. This is needed for a guest-visible device IOTLB
implementation and will be used by the vhost device IOTLB API.
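
Judging from the diff below, ATS is off by default and is enabled per
device through the new "ats" property (e.g. ats=on on a virtio-pci
device); the capability is only added when the device is exposed as a
PCI Express endpoint.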

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/virtio-pci.c                    | 14 ++++++++++++--
 hw/virtio/virtio-pci.h                    |  4 ++++
 include/standard-headers/linux/pci_regs.h |  1 +
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 5508b1c..0c4212c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1821,8 +1821,10 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
 
     address_space_init(&proxy->modern_as, &proxy->modern_cfg, "virtio-pci-cfg-as");
 
-    if (pci_is_express(pci_dev) && pci_bus_is_express(pci_dev->bus) &&
-        !pci_bus_is_root(pci_dev->bus)) {
+    if (pci_is_express(pci_dev) && pci_bus_is_express(pci_dev->bus)) {
+        /* FIXME:
+         * &&!pci_bus_is_root(pci_dev->bus)) {
+         */
         int pos;
 
         pos = pcie_endpoint_cap_init(pci_dev, 0);
@@ -1836,6 +1838,12 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
          * PCI Power Management Interface Specification.
          */
         pci_set_word(pci_dev->config + pos + PCI_PM_PMC, 0x3);
+
+        if (proxy->flags & VIRTIO_PCI_FLAG_ATS) {
+            pcie_add_capability(pci_dev, PCI_EXT_CAP_ID_ATS, 0x1,
+                                256, PCI_EXT_CAP_ATS_SIZEOF);
+        }
+
     } else {
         /*
          * make future invocations of pci_is_express() return false
@@ -1886,6 +1894,8 @@ static Property virtio_pci_properties[] = {
                     VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT, false),
     DEFINE_PROP_BIT("x-disable-pcie", VirtIOPCIProxy, flags,
                     VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT, false),
+    DEFINE_PROP_BIT("ats", VirtIOPCIProxy, flags,
+                    VIRTIO_PCI_FLAG_ATS_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index e4548c2..f3b93a0 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -66,6 +66,7 @@ enum {
     VIRTIO_PCI_FLAG_MIGRATE_EXTRA_BIT,
     VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT,
     VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT,
+    VIRTIO_PCI_FLAG_ATS_BIT,
 };
 
 /* Need to activate work-arounds for buggy guests at vmstate load. */
@@ -88,6 +89,9 @@ enum {
 #define VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY \
     (1 << VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY_BIT)
 
+/* address space translation service */
+#define VIRTIO_PCI_FLAG_ATS (1 << VIRTIO_PCI_FLAG_ATS_BIT)
+
 typedef struct {
     MSIMessage msg;
     int virq;
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index 1becea8..cfb798a 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -673,6 +673,7 @@
 #define PCI_EXT_CAP_ID_MAX	PCI_EXT_CAP_ID_PASID
 
 #define PCI_EXT_CAP_DSN_SIZEOF	12
+#define PCI_EXT_CAP_ATS_SIZEOF	8
 #define PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF 40
 
 /* Advanced Error Reporting */
-- 
2.5.0


* [Qemu-devel] [RFC PATCH 6/8] intel_iommu: support device iotlb descriptor
  2016-03-25  2:13 [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support Jason Wang
                   ` (4 preceding siblings ...)
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 5/8] virtio-pci: address space translation service (ATS) support Jason Wang
@ 2016-03-25  2:13 ` Jason Wang
  2016-03-28  3:37   ` Peter Xu
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 7/8] memory: handle alias for iommu notifier Jason Wang
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 8/8] vhost_net: device IOTLB support Jason Wang
  7 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2016-03-25  2:13 UTC (permalink / raw)
  To: mst, qemu-devel
  Cc: Eduardo Habkost, Jason Wang, peterx, cornelia.huck, pbonzini,
	Richard Henderson

This patch enables device IOTLB support for intel_iommu. The major
work is implementing QI device IOTLB descriptor processing and
notifying the device through the IOMMU notifier.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/i386/intel_iommu.c          | 81 ++++++++++++++++++++++++++++++++++++++----
 hw/i386/intel_iommu_internal.h | 13 +++++--
 2 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 36b2072..e23bf2c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -728,11 +728,18 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
                     "context-entry hi 0x%"PRIx64 " lo 0x%"PRIx64,
                     ce->hi, ce->lo);
         return -VTD_FR_CONTEXT_ENTRY_INV;
-    } else if (ce->lo & VTD_CONTEXT_ENTRY_TT) {
-        VTD_DPRINTF(GENERAL, "error: unsupported Translation Type in "
-                    "context-entry hi 0x%"PRIx64 " lo 0x%"PRIx64,
-                    ce->hi, ce->lo);
-        return -VTD_FR_CONTEXT_ENTRY_INV;
+    } else {
+        switch (ce->lo & VTD_CONTEXT_ENTRY_TT) {
+        case VTD_CONTEXT_TT_MULTI_LEVEL:
+            /* fall through */
+        case VTD_CONTEXT_TT_DEV_IOTLB:
+            break;
+        default:
+            VTD_DPRINTF(GENERAL, "error: unsupported Translation Type in "
+                        "context-entry hi 0x%"PRIx64 " lo 0x%"PRIx64,
+                        ce->hi, ce->lo);
+            return -VTD_FR_CONTEXT_ENTRY_INV;
+        }
     }
     return 0;
 }
@@ -1361,6 +1368,60 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
+                                          VTDInvDesc *inv_desc)
+{
+    VTDAddressSpace *vtd_dev_as;
+    IOMMUTLBEntry entry;
+    struct VTDBus *vtd_bus;
+    hwaddr addr;
+    uint64_t sz;
+    uint16_t sid;
+    uint8_t devfn;
+    bool size;
+    uint8_t bus_num;
+
+    addr = VTD_INV_DESC_DEVICE_IOTLB_ADDR(inv_desc->hi);
+    sid = VTD_INV_DESC_DEVICE_IOTLB_SID(inv_desc->lo);
+    devfn = sid & 0xff;
+    bus_num = sid >> 8;
+    size = VTD_INV_DESC_DEVICE_IOTLB_SIZE(inv_desc->hi);
+
+    if ((inv_desc->lo & VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO) ||
+        (inv_desc->hi & VTD_INV_DESC_DEVICE_IOTLB_RSVD_HI)) {
+        VTD_DPRINTF(GENERAL, "error: non-zero reserved field in Device "
+                    "IOTLB Invalidate Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
+                    inv_desc->hi, inv_desc->lo);
+        return false;
+    }
+
+    vtd_bus = vtd_find_as_from_bus_num(s, bus_num);
+    if (!vtd_bus) {
+        goto done;
+    }
+
+    vtd_dev_as = vtd_bus->dev_as[devfn];
+    if (!vtd_dev_as) {
+        goto done;
+    }
+
+    if (size) {
+        sz = ffsll(~(addr >> VTD_PAGE_SHIFT));
+        addr = addr & ~((1 << (sz + VTD_PAGE_SHIFT)) - 1);
+        sz = VTD_PAGE_SIZE << sz;
+    } else {
+        sz = VTD_PAGE_SIZE;
+    }
+
+    entry.target_as = &vtd_dev_as->as;
+    entry.addr_mask = sz - 1;
+    entry.iova = addr;
+    memory_region_notify_iommu(entry.target_as->root, entry);
+
+done:
+    return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
     VTDInvDesc inv_desc;
@@ -1400,6 +1461,14 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
         }
         break;
 
+    case VTD_INV_DESC_DEVICE:
+        VTD_DPRINTF(INV, "Device IOTLB Invalidation Descriptor hi 0x%"PRIx64
+                    " lo 0x%"PRIx64, inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_device_iotlb_desc(s, &inv_desc)) {
+            return false;
+        }
+        break;
+
     default:
         VTD_DPRINTF(GENERAL, "error: unkonw Invalidation Descriptor type "
                     "hi 0x%"PRIx64 " lo 0x%"PRIx64 " type %"PRIu8,
@@ -1953,7 +2022,7 @@ static void vtd_init(IntelIOMMUState *s)
     s->next_frcd_reg = 0;
     s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | VTD_CAP_MGAW |
              VTD_CAP_SAGAW | VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS;
-    s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
+    s->ecap = VTD_ECAP_QI | VTD_ECAP_DT | VTD_ECAP_IRO;
 
     vtd_reset_context_cache(s);
     vtd_reset_iotlb(s);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e5f514c..5b803d5 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -176,6 +176,7 @@
 /* (offset >> 4) << 8 */
 #define VTD_ECAP_IRO                (DMAR_IOTLB_REG_OFFSET << 4)
 #define VTD_ECAP_QI                 (1ULL << 1)
+#define VTD_ECAP_DT                 (1ULL << 2)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
@@ -286,6 +287,7 @@ typedef struct VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_TYPE               0xf
 #define VTD_INV_DESC_CC                 0x1 /* Context-cache Invalidate Desc */
 #define VTD_INV_DESC_IOTLB              0x2
+#define VTD_INV_DESC_DEVICE             0x3
 #define VTD_INV_DESC_WAIT               0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_NONE               0   /* Not an Invalidate Descriptor */
 
@@ -319,6 +321,13 @@ typedef struct VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_IOTLB_RSVD_LO      0xffffffff0000ff00ULL
 #define VTD_INV_DESC_IOTLB_RSVD_HI      0xf80ULL
 
+/* Mask for Device IOTLB Invalidate Descriptor */
+#define VTD_INV_DESC_DEVICE_IOTLB_ADDR(val) ((val) & 0xfffffffffffff000ULL)
+#define VTD_INV_DESC_DEVICE_IOTLB_SIZE(val) ((val) & 0x1)
+#define VTD_INV_DESC_DEVICE_IOTLB_SID(val) (((val) >> 32) & 0xFFFFULL)
+#define VTD_INV_DESC_DEVICE_IOTLB_RSVD_HI 0xffeULL
+#define VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO 0xffff0000ffe0fff8
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
     uint16_t domain_id;
@@ -357,8 +366,8 @@ typedef struct VTDRootEntry VTDRootEntry;
 #define VTD_CONTEXT_ENTRY_FPD       (1ULL << 1) /* Fault Processing Disable */
 #define VTD_CONTEXT_ENTRY_TT        (3ULL << 2) /* Translation Type */
 #define VTD_CONTEXT_TT_MULTI_LEVEL  0
-#define VTD_CONTEXT_TT_DEV_IOTLB    1
-#define VTD_CONTEXT_TT_PASS_THROUGH 2
+#define VTD_CONTEXT_TT_DEV_IOTLB    (1ULL << 2)
+#define VTD_CONTEXT_TT_PASS_THROUGH (2ULL << 2)
 /* Second Level Page Translation Pointer*/
 #define VTD_CONTEXT_ENTRY_SLPTPTR   (~0xfffULL)
 #define VTD_CONTEXT_ENTRY_RSVD_LO   (0xff0ULL | ~VTD_HAW_MASK)
-- 
2.5.0


* [Qemu-devel] [RFC PATCH 7/8] memory: handle alias for iommu notifier
  2016-03-25  2:13 [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support Jason Wang
                   ` (5 preceding siblings ...)
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 6/8] intel_iommu: support device iotlb descriptor Jason Wang
@ 2016-03-25  2:13 ` Jason Wang
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 8/8] vhost_net: device IOTLB support Jason Wang
  7 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2016-03-25  2:13 UTC (permalink / raw)
  To: mst, qemu-devel; +Cc: cornelia.huck, pbonzini, Jason Wang, peterx

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/memory.c b/memory.c
index 95f7209..858b390 100644
--- a/memory.c
+++ b/memory.c
@@ -1509,6 +1509,9 @@ bool memory_region_is_logging(MemoryRegion *mr, uint8_t client)
 
 void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
 {
+    if (mr->alias) {
+        memory_region_register_iommu_notifier(mr->alias, n);
+    }
     notifier_list_add(&mr->iommu_notify, n);
 }
 
-- 
2.5.0


* [Qemu-devel] [RFC PATCH 8/8] vhost_net: device IOTLB support
  2016-03-25  2:13 [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support Jason Wang
                   ` (6 preceding siblings ...)
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 7/8] memory: handle alias for iommu notifier Jason Wang
@ 2016-03-25  2:13 ` Jason Wang
  7 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2016-03-25  2:13 UTC (permalink / raw)
  To: mst, qemu-devel; +Cc: cornelia.huck, pbonzini, Jason Wang, peterx

This patch implements device IOTLB support for kernel vhost. This is
done through:

1) switching to the DMA helpers when mapping/unmapping vrings in the
   vhost code
2) kernel support for a device IOTLB API:

- allow vhost-net to query an IOMMU IOTLB entry through an eventfd
- allow qemu to update a specified mapping in vhost through an ioctl
- allow qemu to invalidate a specified iova range in the vhost device
  IOTLB through an ioctl. In the x86/intel_iommu case this is triggered
  through the IOMMU memory region notifier from the device IOTLB
  invalidation descriptor processing routine.

With all the above, kernel vhost_net can cooperate with the IOMMU.
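
At runtime the resulting flow, as implemented below, is roughly: kernel
vhost signals an IOTLB miss through the per-virtqueue eventfd; QEMU's
handler translates the iova with address_space_get_iotlb_entry(), maps
the result to a vhost userspace address and pushes it back with
VHOST_UPDATE_IOTLB; IOMMU invalidations arrive through the registered
IOMMU notifier and are forwarded as VHOST_IOTLB_INVALIDATE updates.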

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-backend.c         |  33 +++++++
 hw/virtio/vhost.c                 | 203 +++++++++++++++++++++++++++++++++-----
 include/hw/virtio/vhost-backend.h |  14 +++
 include/hw/virtio/vhost.h         |   6 ++
 include/hw/virtio/virtio-access.h |  22 +++++
 linux-headers/linux/vhost.h       |  35 +++++++
 6 files changed, 290 insertions(+), 23 deletions(-)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index b358902..a1e4848 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -167,6 +167,35 @@ static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx)
     return idx - dev->vq_index;
 }
 
+static int vhost_kernel_set_vring_iotlb_request(struct vhost_dev *dev,
+                                                struct
+                                                vhost_vring_iotlb_entry
+                                                *entry)
+{
+    int r = vhost_kernel_call(dev, VHOST_SET_VRING_IOTLB_REQUEST, entry);
+    return r;
+}
+
+static int vhost_kernel_update_iotlb(struct vhost_dev *dev,
+                                     struct vhost_iotlb_entry *entry)
+{
+    int r = vhost_kernel_call(dev, VHOST_UPDATE_IOTLB, entry);
+    return r;
+}
+
+static int vhost_kernel_run_iotlb(struct vhost_dev *dev,
+                                  int *enabled)
+{
+    int r = vhost_kernel_call(dev, VHOST_RUN_IOTLB, enabled);
+    return r;
+}
+
+static int vhost_kernel_set_vring_iotlb_call(struct vhost_dev *dev,
+                                             struct vhost_vring_file *file)
+{
+    return vhost_kernel_call(dev, VHOST_SET_VRING_IOTLB_CALL, file);
+}
+
 static const VhostOps kernel_ops = {
         .backend_type = VHOST_BACKEND_TYPE_KERNEL,
         .vhost_backend_init = vhost_kernel_init,
@@ -190,6 +219,10 @@ static const VhostOps kernel_ops = {
         .vhost_set_owner = vhost_kernel_set_owner,
         .vhost_reset_device = vhost_kernel_reset_device,
         .vhost_get_vq_index = vhost_kernel_get_vq_index,
+        .vhost_set_vring_iotlb_request = vhost_kernel_set_vring_iotlb_request,
+        .vhost_update_iotlb = vhost_kernel_update_iotlb,
+        .vhost_set_vring_iotlb_call = vhost_kernel_set_vring_iotlb_call,
+        .vhost_run_iotlb = vhost_kernel_run_iotlb,
 };
 
 int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 392d848..653b210 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -22,6 +22,7 @@
 #include "qemu/memfd.h"
 #include <linux/vhost.h>
 #include "exec/address-spaces.h"
+#include "exec/ram_addr.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/migration.h"
@@ -407,6 +408,7 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
                                       uint64_t start_addr,
                                       uint64_t size)
 {
+    #if 0
     int i;
     int r = 0;
 
@@ -419,7 +421,7 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
             continue;
         }
         l = vq->ring_size;
-        p = cpu_physical_memory_map(vq->ring_phys, &l, 1);
+        p = virtio_memory_map(dev->vdev, vq->ring_phys, &l, 1);
         if (!p || l != vq->ring_size) {
             fprintf(stderr, "Unable to map ring buffer for ring %d\n", i);
             r = -ENOMEM;
@@ -428,9 +430,11 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
             fprintf(stderr, "Ring buffer relocated for ring %d\n", i);
             r = -EBUSY;
         }
-        cpu_physical_memory_unmap(p, l, 0, 0);
+        virtio_memory_unmap(dev->vdev, p, l, 0, 0);
     }
     return r;
+    #endif
+    return 0;
 }
 
 static struct vhost_memory_region *vhost_dev_find_reg(struct vhost_dev *dev,
@@ -662,6 +666,22 @@ static int vhost_dev_set_features(struct vhost_dev *dev, bool enable_log)
     return r < 0 ? -errno : 0;
 }
 
+static int vhost_dev_update_iotlb(struct vhost_dev *dev,
+                                  struct vhost_iotlb_entry *entry)
+{
+    int r;
+    r = dev->vhost_ops->vhost_update_iotlb(dev, entry);
+    return r < 0 ? -errno : 0;
+}
+
+static int vhost_run_iotlb(struct vhost_dev *dev,
+                           int *enabled)
+{
+    int r;
+    r = dev->vhost_ops->vhost_run_iotlb(dev, enabled);
+    return r < 0 ? -errno : 0;
+}
+
 static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
 {
     int r, t, i, idx;
@@ -798,6 +818,73 @@ static int vhost_virtqueue_set_vring_endian_legacy(struct vhost_dev *dev,
     return -errno;
 }
 
+static int vhost_memory_region_lookup(struct vhost_dev *hdev,
+                                      __u64 gpa, __u64 *uaddr, __u64 *len)
+{
+    int i;
+
+    for (i = 0; i < hdev->mem->nregions; i++) {
+        struct vhost_memory_region *reg = hdev->mem->regions + i;
+
+        if (gpa >= reg->guest_phys_addr &&
+            reg->guest_phys_addr + reg->memory_size > gpa) {
+            *uaddr = reg->userspace_addr + gpa - reg->guest_phys_addr;
+            *len = reg->guest_phys_addr + reg->memory_size - gpa;
+            return 0;
+        }
+    }
+
+    return -EFAULT;
+}
+
+static void vhost_device_iotlb_request(void *opaque)
+{
+    IOMMUTLBEntry iotlb;
+    struct vhost_virtqueue *vq = opaque;
+    struct vhost_dev *hdev = vq->dev;
+    struct vhost_iotlb_entry *request = vq->iotlb_req;
+    struct vhost_iotlb_entry reply = *request;
+
+    rcu_read_lock();
+
+    event_notifier_test_and_clear(&vq->iotlb_notifier);
+
+    reply.flags.type = VHOST_IOTLB_UPDATE;
+    reply.flags.valid = VHOST_IOTLB_INVALID;
+
+    if (request->flags.type != VHOST_IOTLB_MISS) {
+        goto done;
+    }
+
+    iotlb = address_space_get_iotlb_entry(virtio_get_dma_as(hdev->vdev),
+                                          request->iova,
+                                          false);
+    if (iotlb.target_as != NULL) {
+        if (vhost_memory_region_lookup(hdev, iotlb.translated_addr,
+                                       &reply.userspace_addr,
+                                       &reply.size)) {
+            goto done;
+        }
+        reply.iova = reply.iova & ~iotlb.addr_mask;
+        reply.size = MIN(iotlb.addr_mask + 1, reply.size);
+        if (iotlb.perm == IOMMU_RO) {
+            reply.flags.perm = VHOST_ACCESS_RO;
+        } else if (iotlb.perm == IOMMU_WO) {
+            reply.flags.perm = VHOST_ACCESS_WO;
+        } else if (iotlb.perm == IOMMU_RW) {
+            reply.flags.perm = VHOST_ACCESS_RW;
+        } else {
+            fprintf(stderr, "unknown iotlb perm!\n");
+        }
+        reply.flags.type = VHOST_IOTLB_UPDATE;
+        reply.flags.valid = VHOST_IOTLB_VALID;
+    }
+
+done:
+    vhost_dev_update_iotlb(hdev, &reply);
+    rcu_read_unlock();
+}
+
 static int vhost_virtqueue_start(struct vhost_dev *dev,
                                 struct VirtIODevice *vdev,
                                 struct vhost_virtqueue *vq,
@@ -838,21 +925,21 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
 
     s = l = virtio_queue_get_desc_size(vdev, idx);
     a = virtio_queue_get_desc_addr(vdev, idx);
-    vq->desc = cpu_physical_memory_map(a, &l, 0);
+    vq->desc = virtio_memory_map(vdev, a, &l, 0);
     if (!vq->desc || l != s) {
         r = -ENOMEM;
         goto fail_alloc_desc;
     }
     s = l = virtio_queue_get_avail_size(vdev, idx);
     a = virtio_queue_get_avail_addr(vdev, idx);
-    vq->avail = cpu_physical_memory_map(a, &l, 0);
+    vq->avail = virtio_memory_map(vdev, a, &l, 0);
     if (!vq->avail || l != s) {
         r = -ENOMEM;
         goto fail_alloc_avail;
     }
     vq->used_size = s = l = virtio_queue_get_used_size(vdev, idx);
     vq->used_phys = a = virtio_queue_get_used_addr(vdev, idx);
-    vq->used = cpu_physical_memory_map(a, &l, 1);
+    vq->used = virtio_memory_map(vdev, a, &l, 1);
     if (!vq->used || l != s) {
         r = -ENOMEM;
         goto fail_alloc_used;
@@ -860,7 +947,7 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
 
     vq->ring_size = s = l = virtio_queue_get_ring_size(vdev, idx);
     vq->ring_phys = a = virtio_queue_get_ring_addr(vdev, idx);
-    vq->ring = cpu_physical_memory_map(a, &l, 1);
+    vq->ring = virtio_memory_map(vdev, a, &l, 1);
     if (!vq->ring || l != s) {
         r = -ENOMEM;
         goto fail_alloc_ring;
@@ -891,20 +978,19 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
     }
 
     return 0;
-
 fail_kick:
 fail_alloc:
-    cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
-                              0, 0);
+    virtio_memory_unmap(vdev, vq->ring, virtio_queue_get_ring_size(vdev, idx),
+                        0, 0);
 fail_alloc_ring:
-    cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
-                              0, 0);
+    virtio_memory_unmap(vdev, vq->used, virtio_queue_get_used_size(vdev, idx),
+                        0, 0);
 fail_alloc_used:
-    cpu_physical_memory_unmap(vq->avail, virtio_queue_get_avail_size(vdev, idx),
-                              0, 0);
+    virtio_memory_unmap(vdev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
+                        0, 0);
 fail_alloc_avail:
-    cpu_physical_memory_unmap(vq->desc, virtio_queue_get_desc_size(vdev, idx),
-                              0, 0);
+    virtio_memory_unmap(vdev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
+                        0, 0);
 fail_alloc_desc:
     return r;
 }
@@ -941,14 +1027,14 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
     }
 
     assert (r >= 0);
-    cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
-                              0, virtio_queue_get_ring_size(vdev, idx));
-    cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
-                              1, virtio_queue_get_used_size(vdev, idx));
-    cpu_physical_memory_unmap(vq->avail, virtio_queue_get_avail_size(vdev, idx),
-                              0, virtio_queue_get_avail_size(vdev, idx));
-    cpu_physical_memory_unmap(vq->desc, virtio_queue_get_desc_size(vdev, idx),
-                              0, virtio_queue_get_desc_size(vdev, idx));
+    virtio_memory_unmap(vdev, vq->ring, virtio_queue_get_ring_size(vdev, idx),
+                        0, virtio_queue_get_ring_size(vdev, idx));
+    virtio_memory_unmap(vdev, vq->used, virtio_queue_get_used_size(vdev, idx),
+                        1, virtio_queue_get_used_size(vdev, idx));
+    virtio_memory_unmap(vdev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
+                        0, virtio_queue_get_avail_size(vdev, idx));
+    virtio_memory_unmap(vdev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
+                         0, virtio_queue_get_desc_size(vdev, idx));
 }
 
 static void vhost_eventfd_add(MemoryListener *listener,
@@ -970,6 +1056,9 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
     struct vhost_vring_file file = {
         .index = vhost_vq_index,
     };
+    struct vhost_vring_iotlb_entry request = {
+        .index = vhost_vq_index,
+    };
     int r = event_notifier_init(&vq->masked_notifier, 0);
     if (r < 0) {
         return r;
@@ -981,7 +1070,37 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
         r = -errno;
         goto fail_call;
     }
+
+    r = event_notifier_init(&vq->iotlb_notifier, 0);
+    if (r < 0) {
+        r = -errno;
+        goto fail_call;
+    }
+
+    file.fd = event_notifier_get_fd(&vq->iotlb_notifier);
+    r = dev->vhost_ops->vhost_set_vring_iotlb_call(dev, &file);
+    if (r) {
+        r = -errno;
+        goto fail_iotlb;
+    }
+    qemu_set_fd_handler(event_notifier_get_fd(&vq->iotlb_notifier),
+                        vhost_device_iotlb_request, NULL, vq);
+
+    vq->iotlb_req = g_malloc0(sizeof(*vq->iotlb_req));
+    request.userspace_addr = (uint64_t)(unsigned long)vq->iotlb_req;
+    r = dev->vhost_ops->vhost_set_vring_iotlb_request(dev, &request);
+    if (r) {
+        r = -errno;
+        goto fail_req;
+    }
+
+    vq->dev = dev;
+
     return 0;
+fail_req:
+    qemu_set_fd_handler(file.fd, NULL, NULL, NULL);
+fail_iotlb:
+    event_notifier_cleanup(&vq->iotlb_notifier);
 fail_call:
     event_notifier_cleanup(&vq->masked_notifier);
     return r;
@@ -989,7 +1108,24 @@ fail_call:
 
 static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
 {
+    qemu_set_fd_handler(event_notifier_get_fd(&vq->iotlb_notifier),
+                        NULL, NULL, NULL);
     event_notifier_cleanup(&vq->masked_notifier);
+    event_notifier_cleanup(&vq->iotlb_notifier);
+    g_free(vq->iotlb_req);
+}
+
+static void vhost_iommu_unmap_notify(Notifier *n, void *data)
+{
+    struct vhost_dev *hdev = container_of(n, struct vhost_dev, n);
+    IOMMUTLBEntry *iotlb = data;
+    struct vhost_iotlb_entry inv = {
+        .flags.type = VHOST_IOTLB_INVALIDATE,
+        .iova = iotlb->iova,
+        .size = iotlb->addr_mask + 1,
+    };
+
+    vhost_dev_update_iotlb(hdev, &inv);
 }
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
@@ -998,6 +1134,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     uint64_t features;
     int i, r;
 
+    hdev->vdev = NULL;
     hdev->migration_blocker = NULL;
 
     if (vhost_set_backend_type(hdev, backend_type) < 0) {
@@ -1052,6 +1189,8 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
         .priority = 10
     };
 
+    hdev->n.notify = vhost_iommu_unmap_notify;
+
     if (hdev->migration_blocker == NULL) {
         if (!(hdev->features & (0x1ULL << VHOST_F_LOG_ALL))) {
             error_setg(&hdev->migration_blocker,
@@ -1231,6 +1370,10 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
     if (r < 0) {
         goto fail_features;
     }
+
+    memory_region_register_iommu_notifier(virtio_get_dma_as(vdev)->root,
+                                          &hdev->n);
+
     r = hdev->vhost_ops->vhost_set_mem_table(hdev, hdev->mem);
     if (r < 0) {
         r = -errno;
@@ -1262,7 +1405,18 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         }
     }
 
+    /* FIXME: conditionally */
+    r = vhost_run_iotlb(hdev, NULL);
+    if (r < 0) {
+        goto fail_iotlb;
+    }
+
+    hdev->vdev = vdev;
     return 0;
+fail_iotlb:
+    if (hdev->vhost_ops->vhost_set_vring_enable) {
+        hdev->vhost_ops->vhost_set_vring_enable(hdev, 0);
+    }
 fail_log:
     vhost_log_put(hdev, false);
 fail_vq:
@@ -1273,6 +1427,7 @@ fail_vq:
                              hdev->vq_index + i);
     }
     i = hdev->nvqs;
+
 fail_mem:
 fail_features:
 
@@ -1292,9 +1447,11 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
                              hdev->vq_index + i);
     }
 
+    memory_region_unregister_iommu_notifier(&hdev->n);
     vhost_log_put(hdev, true);
     hdev->started = false;
     hdev->log = NULL;
     hdev->log_size = 0;
+    hdev->vdev = NULL;
 }
 
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 95fcc96..db2931c 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -22,9 +22,11 @@ typedef enum VhostBackendType {
 struct vhost_dev;
 struct vhost_log;
 struct vhost_memory;
+struct vhost_iotlb_entry;
 struct vhost_vring_file;
 struct vhost_vring_state;
 struct vhost_vring_addr;
+struct vhost_vring_iotlb_entry;
 struct vhost_scsi_target;
 
 typedef int (*vhost_backend_init)(struct vhost_dev *dev, void *opaque);
@@ -72,6 +74,14 @@ typedef int (*vhost_migration_done_op)(struct vhost_dev *dev,
 typedef bool (*vhost_backend_can_merge_op)(struct vhost_dev *dev,
                                            uint64_t start1, uint64_t size1,
                                            uint64_t start2, uint64_t size2);
+typedef int (*vhost_set_vring_iotlb_request_op)(struct vhost_dev *dev,
+                                                struct vhost_vring_iotlb_entry *entry);
+typedef int (*vhost_update_iotlb_op)(struct vhost_dev *dev,
+                                     struct vhost_iotlb_entry *entry);
+typedef int (*vhost_set_vring_iotlb_call_op)(struct vhost_dev *dev,
+                                             struct vhost_vring_file *file);
+typedef int (*vhost_run_iotlb_op)(struct vhost_dev *dev,
+                                  int *enabled);
 
 typedef struct VhostOps {
     VhostBackendType backend_type;
@@ -100,6 +110,10 @@ typedef struct VhostOps {
     vhost_requires_shm_log_op vhost_requires_shm_log;
     vhost_migration_done_op vhost_migration_done;
     vhost_backend_can_merge_op vhost_backend_can_merge;
+    vhost_set_vring_iotlb_request_op vhost_set_vring_iotlb_request;
+    vhost_update_iotlb_op vhost_update_iotlb;
+    vhost_set_vring_iotlb_call_op vhost_set_vring_iotlb_call;
+    vhost_run_iotlb_op vhost_run_iotlb;
 } VhostOps;
 
 extern const VhostOps user_ops;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index b60d758..60d0706 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -20,6 +20,9 @@ struct vhost_virtqueue {
     unsigned long long ring_phys;
     unsigned ring_size;
     EventNotifier masked_notifier;
+    EventNotifier iotlb_notifier;
+    struct vhost_iotlb_entry *iotlb_req;
+    struct vhost_dev *dev;
 };
 
 typedef unsigned long vhost_log_chunk_t;
@@ -36,7 +39,9 @@ struct vhost_log {
 };
 
 struct vhost_memory;
+struct vhost_iotlb_entry;
 struct vhost_dev {
+    VirtIODevice *vdev;
     MemoryListener memory_listener;
     struct vhost_memory *mem;
     int n_mem_sections;
@@ -61,6 +66,7 @@ struct vhost_dev {
     void *opaque;
     struct vhost_log *log;
     QLIST_ENTRY(vhost_dev) entry;
+    Notifier n;
 };
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
index 967cc75..6b4b45a 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -16,6 +16,7 @@
 #define _QEMU_VIRTIO_ACCESS_H
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-bus.h"
+#include "sysemu/dma.h"
 #include "exec/address-spaces.h"
 
 static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev)
@@ -193,4 +194,25 @@ static inline void virtio_tswap64s(VirtIODevice *vdev, uint64_t *s)
 {
     *s = virtio_tswap64(vdev, *s);
 }
+
+static inline void *virtio_memory_map(VirtIODevice *vdev, hwaddr addr,
+                                      hwaddr *plen, int is_write)
+{
+    AddressSpace *dma_as = virtio_get_dma_as(vdev);
+
+    return dma_memory_map(dma_as, addr, plen, is_write ?
+                          DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE);
+}
+
+static inline void virtio_memory_unmap(VirtIODevice *vdev, void *buffer,
+                                       hwaddr len, int is_write,
+                                       hwaddr access_len)
+{
+    AddressSpace *dma_as = virtio_get_dma_as(vdev);
+
+    dma_memory_unmap(dma_as, buffer, len, is_write ?
+                     DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE,
+                     access_len);
+}
+
 #endif /* _QEMU_VIRTIO_ACCESS_H */
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index ead86db..0987c87 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -27,6 +27,32 @@ struct vhost_vring_file {
 
 };
 
+struct vhost_iotlb_entry {
+	__u64 iova;
+	__u64 size;
+	__u64 userspace_addr;
+	struct {
+#define VHOST_ACCESS_RO      0x1
+#define VHOST_ACCESS_WO      0x2
+#define VHOST_ACCESS_RW      0x3
+		__u8  perm;
+#define VHOST_IOTLB_MISS           1
+#define VHOST_IOTLB_UPDATE         2
+#define VHOST_IOTLB_INVALIDATE     3
+		__u8  type;
+#define VHOST_IOTLB_INVALID        0x1
+#define VHOST_IOTLB_VALID          0x2
+		__u8  valid;
+		__u8  u8_padding;
+		__u32 padding;
+	} flags;
+};
+
+struct vhost_vring_iotlb_entry {
+    unsigned int index;
+    __u64 userspace_addr;
+};
+
 struct vhost_vring_addr {
 	unsigned int index;
 	/* Option flags. */
@@ -127,6 +153,15 @@ struct vhost_memory {
 /* Set eventfd to signal an error */
 #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
 
+/* IOTLB */
+/* Specify an eventfd file descriptor to signal on IOTLB miss */
+#define VHOST_SET_VRING_IOTLB_CALL _IOW(VHOST_VIRTIO, 0x23, struct      \
+                                        vhost_vring_file)
+#define VHOST_SET_VRING_IOTLB_REQUEST _IOW(VHOST_VIRTIO, 0x25, struct   \
+                                           vhost_vring_iotlb_entry)
+#define VHOST_UPDATE_IOTLB _IOW(VHOST_VIRTIO, 0x24, struct vhost_iotlb_entry)
+#define VHOST_RUN_IOTLB _IOW(VHOST_VIRTIO, 0x26, int)
+
 /* VHOST_NET specific defines */
 
 /* Attach virtio net ring to a raw socket, or tap device.
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/8] intel_iommu: name vtd address space with devfn
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 2/8] intel_iommu: name vtd address space with devfn Jason Wang
@ 2016-03-28  2:02   ` Peter Xu
  2016-03-30  1:12     ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2016-03-28  2:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eduardo Habkost, mst, qemu-devel, cornelia.huck, pbonzini,
	Richard Henderson

On Fri, Mar 25, 2016 at 10:13:23AM +0800, Jason Wang wrote:
> To avoid duplicated name and ease debugging.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/i386/intel_iommu.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 347718f..d647b42 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1901,6 +1901,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>      uintptr_t key = (uintptr_t)bus;
>      VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
>      VTDAddressSpace *vtd_dev_as;
> +    char name[128];
>  
>      if (!vtd_bus) {
>          /* No corresponding free() */
> @@ -1913,6 +1914,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>      vtd_dev_as = vtd_bus->dev_as[devfn];
>  
>      if (!vtd_dev_as) {
> +        sprintf(name, "intel_iommu_devfn_%d", devfn);

It's safe here, but would snprintf() look better?
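
For illustration, a minimal sketch of the bounded form (assuming the
same 128-byte "name" buffer; not taken from the posted patch):

    snprintf(name, sizeof(name), "intel_iommu_devfn_%d", devfn);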

>          vtd_bus->dev_as[devfn] = vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
>  
>          vtd_dev_as->bus = bus;
> @@ -1920,9 +1922,9 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>          vtd_dev_as->iommu_state = s;
>          vtd_dev_as->context_cache_entry.context_cache_gen = 0;
>          memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
> -                                 &s->iommu_ops, "intel_iommu", UINT64_MAX);
> +                                 &s->iommu_ops, name, UINT64_MAX);
>          address_space_init(&vtd_dev_as->as,
> -                           &vtd_dev_as->iommu, "intel_iommu");
> +                           &vtd_dev_as->iommu, name);
>      }
>      return vtd_dev_as;
>  }
> -- 
> 2.5.0
> 

Besides the nit-pick:

Acked-by: Peter Xu <peterx@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 3/8] intel_iommu: allocate new key when creating new address space
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 3/8] intel_iommu: allocate new key when creating new address space Jason Wang
@ 2016-03-28  2:07   ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2016-03-28  2:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eduardo Habkost, mst, qemu-devel, cornelia.huck, pbonzini,
	Richard Henderson

On Fri, Mar 25, 2016 at 10:13:24AM +0800, Jason Wang wrote:
> We use the pointer to stack for key for new address space, this will break hash
> table searching, fixing by g_malloc() a new key instead.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Peter Xu <peterx@redhat.com>

> ---
>  hw/i386/intel_iommu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index d647b42..36b2072 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1904,11 +1904,12 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>      char name[128];
>  
>      if (!vtd_bus) {
> +        uintptr_t *new_key = g_malloc(sizeof(*new_key));
> +        *new_key = (uintptr_t)bus;
>          /* No corresponding free() */
>          vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * VTD_PCI_DEVFN_MAX);
>          vtd_bus->bus = bus;
> -        key = (uintptr_t)bus;
> -        g_hash_table_insert(s->vtd_as_by_busptr, &key, vtd_bus);
> +        g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus);
>      }
>  
>      vtd_dev_as = vtd_bus->dev_as[devfn];
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 4/8] exec: introduce address_space_get_iotlb_entry()
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 4/8] exec: introduce address_space_get_iotlb_entry() Jason Wang
@ 2016-03-28  2:18   ` Peter Xu
  2016-03-30  1:13     ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2016-03-28  2:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: Peter Crosthwaite, mst, qemu-devel, cornelia.huck, pbonzini,
	Richard Henderson

On Fri, Mar 25, 2016 at 10:13:25AM +0800, Jason Wang wrote:
> This patch introduces a helper to query the iotlb entry for a
> possible iova. This will be used by later device IOTLB API to enable
> the capability for a dataplane (e.g vhost) to query the IOTLB.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  exec.c                | 30 ++++++++++++++++++++++++++++++
>  include/exec/memory.h |  7 +++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/exec.c b/exec.c
> index f398d21..31fac9f 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -411,6 +411,36 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
>  }
>  
>  /* Called from RCU critical section */
> +IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
> +                                            bool is_write)
> +{
> +    IOMMUTLBEntry iotlb = {0};
> +    MemoryRegionSection *section;
> +    MemoryRegion *mr;
> +    hwaddr plen;
> +
> +    for (;;) {
> +        AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
> +        section = address_space_translate_internal(d, addr, &addr, &plen, true);
> +        mr = section->mr;
> +
> +        if (!mr->iommu_ops) {
> +            break;
> +        }
> +
> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +        if (!(iotlb.perm & (1 << is_write))) {
> +            iotlb.target_as = NULL;
> +            break;
> +        }

Here, do we still need something like:

        addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
                | (addr & iotlb.addr_mask));

Just as address_space_translate() does? At this point "addr" is the
offset within memory region "mr", while we need it to be the offset in
the address space if the loop runs again, right?
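
A minimal sketch of the loop with that adjustment folded in, mirroring
what address_space_translate() does (illustrative only, including the
"as = iotlb.target_as" step that lets the next iteration walk the
right address space):

    for (;;) {
        AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
        section = address_space_translate_internal(d, addr, &addr, &plen, true);
        mr = section->mr;

        if (!mr->iommu_ops) {
            break;
        }

        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
        if (!(iotlb.perm & (1 << is_write))) {
            iotlb.target_as = NULL;
            break;
        }

        /* convert back to an offset usable in the next address space */
        addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
                | (addr & iotlb.addr_mask));
        as = iotlb.target_as;
    }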

Also, not sure whether we can abstract a shared function out of this
function and address_space_translate().

Thanks.

-- peterx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] intel_iommu: support device iotlb descriptor
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 6/8] intel_iommu: support device iotlb descriptor Jason Wang
@ 2016-03-28  3:37   ` Peter Xu
  2016-03-30  5:08     ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2016-03-28  3:37 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eduardo Habkost, mst, qemu-devel, cornelia.huck, pbonzini,
	Richard Henderson

On Fri, Mar 25, 2016 at 10:13:27AM +0800, Jason Wang wrote:
> This patch enables device IOTLB support for intel iommu. The major
> work is to implement QI device IOTLB descriptor processing and notify
> the device through iommu notifier.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/i386/intel_iommu.c          | 81 ++++++++++++++++++++++++++++++++++++++----
>  hw/i386/intel_iommu_internal.h | 13 +++++--
>  2 files changed, 86 insertions(+), 8 deletions(-)
> 

[...]

> +static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
> +                                          VTDInvDesc *inv_desc)
> +{
> +    VTDAddressSpace *vtd_dev_as;
> +    IOMMUTLBEntry entry;
> +    struct VTDBus *vtd_bus;
> +    hwaddr addr;
> +    uint64_t sz;
> +    uint16_t sid;
> +    uint8_t devfn;
> +    bool size;
> +    uint8_t bus_num;
> +
> +    addr = VTD_INV_DESC_DEVICE_IOTLB_ADDR(inv_desc->hi);
> +    sid = VTD_INV_DESC_DEVICE_IOTLB_SID(inv_desc->lo);
> +    devfn = sid & 0xff;
> +    bus_num = sid >> 8;
> +    size = VTD_INV_DESC_DEVICE_IOTLB_SIZE(inv_desc->hi);
> +
> +    if ((inv_desc->lo & VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO) ||
> +        (inv_desc->hi & VTD_INV_DESC_DEVICE_IOTLB_RSVD_HI)) {
> +        VTD_DPRINTF(GENERAL, "error: non-zero reserved field in Device "
> +                    "IOTLB Invalidate Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
> +                    inv_desc->hi, inv_desc->lo);
> +        return false;
> +    }
> +
> +    vtd_bus = vtd_find_as_from_bus_num(s, bus_num);
> +    if (!vtd_bus) {
> +        goto done;
> +    }
> +
> +    vtd_dev_as = vtd_bus->dev_as[devfn];
> +    if (!vtd_dev_as) {
> +        goto done;
> +    }
> +
> +    if (size) {
> +        sz = ffsll(~(addr >> VTD_PAGE_SHIFT));
> +        addr = addr & ~((1 << (sz + VTD_PAGE_SHIFT)) - 1);
> +        sz = VTD_PAGE_SIZE << sz;

For these three lines, could it be shorter like:

    sz = 1 << ffsll(~addr);
    addr &= ~(sz - 1);

It seems that we can avoid using VTD_PAGE_*.

> +    } else {
> +        sz = VTD_PAGE_SIZE;
> +    }
> +
> +    entry.target_as = &vtd_dev_as->as;
> +    entry.addr_mask = sz - 1;
> +    entry.iova = addr;
> +    memory_region_notify_iommu(entry.target_as->root, entry);

Here, we seem to be posting this invalidation to all registered
notifiers. Since this is a device-TLB invalidation, and we should
know which device (BDF) to invalidate, is there any way to route
this info directly to that specific device?

E.g., if we enable VFIO with the current patch, this notify will
possibly be passed to VFIO devices as well, even if it's actually
meant for vhost devices.  Not sure whether there would be a problem.

Another thing, totally unrelated to this patch: I see that the
second parameter of memory_region_notify_iommu() is an IOMMUTLBEntry
rather than a pointer to one.  Yet inside the function, it just
passes the pointer along:

void memory_region_notify_iommu(MemoryRegion *mr,
                                IOMMUTLBEntry entry)
{
    assert(memory_region_is_iommu(mr));
    notifier_list_notify(&mr->iommu_notify, &entry);
}

Shall we change "entry" into a pointer as well? I see no reason
why we need to keep this IOMMUTLBEntry on the stack twice...

Thanks.

-- peterx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/8] intel_iommu: name vtd address space with devfn
  2016-03-28  2:02   ` Peter Xu
@ 2016-03-30  1:12     ` Jason Wang
  2016-03-30 11:12       ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2016-03-30  1:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, mst, qemu-devel, cornelia.huck, pbonzini,
	Richard Henderson



On 03/28/2016 10:02 AM, Peter Xu wrote:
> On Fri, Mar 25, 2016 at 10:13:23AM +0800, Jason Wang wrote:
>> To avoid duplicated name and ease debugging.
>>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  hw/i386/intel_iommu.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 347718f..d647b42 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -1901,6 +1901,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>>      uintptr_t key = (uintptr_t)bus;
>>      VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
>>      VTDAddressSpace *vtd_dev_as;
>> +    char name[128];
>>  
>>      if (!vtd_bus) {
>>          /* No corresponding free() */
>> @@ -1913,6 +1914,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>>      vtd_dev_as = vtd_bus->dev_as[devfn];
>>  
>>      if (!vtd_dev_as) {
>> +        sprintf(name, "intel_iommu_devfn_%d", devfn);
> It's safe here, but would snprintf() look better?

Not sure; we know that name is large enough here.

>
>>          vtd_bus->dev_as[devfn] = vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
>>  
>>          vtd_dev_as->bus = bus;
>> @@ -1920,9 +1922,9 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
>>          vtd_dev_as->iommu_state = s;
>>          vtd_dev_as->context_cache_entry.context_cache_gen = 0;
>>          memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
>> -                                 &s->iommu_ops, "intel_iommu", UINT64_MAX);
>> +                                 &s->iommu_ops, name, UINT64_MAX);
>>          address_space_init(&vtd_dev_as->as,
>> -                           &vtd_dev_as->iommu, "intel_iommu");
>> +                           &vtd_dev_as->iommu, name);
>>      }
>>      return vtd_dev_as;
>>  }
>> -- 
>> 2.5.0
>>
> Besides the nit-pick:
>
> Acked-by: Peter Xu <peterx@redhat.com>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 4/8] exec: introduce address_space_get_iotlb_entry()
  2016-03-28  2:18   ` Peter Xu
@ 2016-03-30  1:13     ` Jason Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2016-03-30  1:13 UTC (permalink / raw)
  To: Peter Xu
  Cc: mst, Peter Crosthwaite, qemu-devel, cornelia.huck, pbonzini,
	Richard Henderson



On 03/28/2016 10:18 AM, Peter Xu wrote:
> On Fri, Mar 25, 2016 at 10:13:25AM +0800, Jason Wang wrote:
>> This patch introduces a helper to query the iotlb entry for a
>> possible iova. This will be used by later device IOTLB API to enable
>> the capability for a dataplane (e.g vhost) to query the IOTLB.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  exec.c                | 30 ++++++++++++++++++++++++++++++
>>  include/exec/memory.h |  7 +++++++
>>  2 files changed, 37 insertions(+)
>>
>> diff --git a/exec.c b/exec.c
>> index f398d21..31fac9f 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -411,6 +411,36 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
>>  }
>>  
>>  /* Called from RCU critical section */
>> +IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
>> +                                            bool is_write)
>> +{
>> +    IOMMUTLBEntry iotlb = {0};
>> +    MemoryRegionSection *section;
>> +    MemoryRegion *mr;
>> +    hwaddr plen;
>> +
>> +    for (;;) {
>> +        AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
>> +        section = address_space_translate_internal(d, addr, &addr, &plen, true);
>> +        mr = section->mr;
>> +
>> +        if (!mr->iommu_ops) {
>> +            break;
>> +        }
>> +
>> +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
>> +        if (!(iotlb.perm & (1 << is_write))) {
>> +            iotlb.target_as = NULL;
>> +            break;
>> +        }
> Here, do we still need something like:
>
>         addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>                 | (addr & iotlb.addr_mask));
>
> Just as address_space_translate() does? Now "addr" should be the
> offset in memory region "mr", while we need it to be the offset in
> address space if there are more loops, right?

Right, will address this in the next version.

>
> Also, not sure whether we can abstract a shared function out of this
> function and address_space_translate().

Looks possible.

Thanks

>
> Thanks.
>
> -- peterx
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] intel_iommu: support device iotlb descriptor
  2016-03-28  3:37   ` Peter Xu
@ 2016-03-30  5:08     ` Jason Wang
  2016-03-30  5:21       ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2016-03-30  5:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, mst, qemu-devel, cornelia.huck, pbonzini,
	Richard Henderson



On 03/28/2016 11:37 AM, Peter Xu wrote:
> On Fri, Mar 25, 2016 at 10:13:27AM +0800, Jason Wang wrote:
>> This patch enables device IOTLB support for intel iommu. The major
>> work is to implement QI device IOTLB descriptor processing and notify
>> the device through iommu notifier.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  hw/i386/intel_iommu.c          | 81 ++++++++++++++++++++++++++++++++++++++----
>>  hw/i386/intel_iommu_internal.h | 13 +++++--
>>  2 files changed, 86 insertions(+), 8 deletions(-)
>>
> [...]
>
>> +static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
>> +                                          VTDInvDesc *inv_desc)
>> +{
>> +    VTDAddressSpace *vtd_dev_as;
>> +    IOMMUTLBEntry entry;
>> +    struct VTDBus *vtd_bus;
>> +    hwaddr addr;
>> +    uint64_t sz;
>> +    uint16_t sid;
>> +    uint8_t devfn;
>> +    bool size;
>> +    uint8_t bus_num;
>> +
>> +    addr = VTD_INV_DESC_DEVICE_IOTLB_ADDR(inv_desc->hi);
>> +    sid = VTD_INV_DESC_DEVICE_IOTLB_SID(inv_desc->lo);
>> +    devfn = sid & 0xff;
>> +    bus_num = sid >> 8;
>> +    size = VTD_INV_DESC_DEVICE_IOTLB_SIZE(inv_desc->hi);
>> +
>> +    if ((inv_desc->lo & VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO) ||
>> +        (inv_desc->hi & VTD_INV_DESC_DEVICE_IOTLB_RSVD_HI)) {
>> +        VTD_DPRINTF(GENERAL, "error: non-zero reserved field in Device "
>> +                    "IOTLB Invalidate Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
>> +                    inv_desc->hi, inv_desc->lo);
>> +        return false;
>> +    }
>> +
>> +    vtd_bus = vtd_find_as_from_bus_num(s, bus_num);
>> +    if (!vtd_bus) {
>> +        goto done;
>> +    }
>> +
>> +    vtd_dev_as = vtd_bus->dev_as[devfn];
>> +    if (!vtd_dev_as) {
>> +        goto done;
>> +    }
>> +
>> +    if (size) {
>> +        sz = ffsll(~(addr >> VTD_PAGE_SHIFT));
>> +        addr = addr & ~((1 << (sz + VTD_PAGE_SHIFT)) - 1);
>> +        sz = VTD_PAGE_SIZE << sz;
> For these three lines, could it be shorter like:
>
>     sz = 1 << ffsll(~addr);
>     addr &= ~(sz - 1);
>
> It seems that we can avoid using VTD_PAGE_*.

Some lower bits of addr are zero (since they are reserved), so this may not
work. Looks like it could be optimized to:

sz = 1 << ffsll(~(addr | (VTD_PAGE_MASK - 1)));
addr &= ~(sz - 1);
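
As a quick sanity check with hypothetical numbers: addr = 0x8b000 has
page-address bits 12 and 13 set and bit 14 clear, so either form should
decode to a 32KB entry at 0x88000, i.e. addr_mask == 0x7fff.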

>
>> +    } else {
>> +        sz = VTD_PAGE_SIZE;
>> +    }
>> +
>> +    entry.target_as = &vtd_dev_as->as;
>> +    entry.addr_mask = sz - 1;
>> +    entry.iova = addr;
>> +    memory_region_notify_iommu(entry.target_as->root, entry);
> Here, we seems to be posting this invalidation to all registered
> notifiers.

Yes, but only for the address space of the specified device.

>  Since this is a device-tlb invalidation, and we should
> know which device (BDF) that we should invalidate, is there any way
> that we can directly route this info to that specific device?

Looks like the code already does this: the target_as is found by
bus number and devfn.

>
> E.g., if we enable VFIO with current patch, this notify will
> possibly be passed to VFIO devices as well, even it's actually for
> vhost devices.  Not sure whether there would be problem.

Not sure, but if the underlying device has the ATS capability, we probably
need to propagate the invalidation to the device itself too.

>
> Another thing totally not related to this patch: I see that the
> second parameter for memory_region_notify_iommu() is IOMMUTLBEntry,
> rather than its pointer.  While inside of the funccall, it only
> passes in the pointer directly:
>
> void memory_region_notify_iommu(MemoryRegion *mr,
>                                 IOMMUTLBEntry entry)
> {
>     assert(memory_region_is_iommu(mr));
>     notifier_list_notify(&mr->iommu_notify, &entry);
> }
>
> Shall we change "entry" into a pointer as well? I found no reason
> why we need to keep this IOMMUTLBEntry in stack twice...
>
> Thanks.
>
> -- peterx
>

Right, it looks OK to change it to use a pointer.
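
For reference, a minimal sketch of the pointer-taking variant being
discussed (illustrative only; every existing caller would need the
matching &entry change):

    void memory_region_notify_iommu(MemoryRegion *mr,
                                    IOMMUTLBEntry *entry)
    {
        assert(memory_region_is_iommu(mr));
        notifier_list_notify(&mr->iommu_notify, entry);
    }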

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 6/8] intel_iommu: support device iotlb descriptor
  2016-03-30  5:08     ` Jason Wang
@ 2016-03-30  5:21       ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2016-03-30  5:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eduardo Habkost, mst, qemu-devel, cornelia.huck, pbonzini,
	Richard Henderson

On Wed, Mar 30, 2016 at 01:08:45PM +0800, Jason Wang wrote:

[...]

> >> +    } else {
> >> +        sz = VTD_PAGE_SIZE;
> >> +    }
> >> +
> >> +    entry.target_as = &vtd_dev_as->as;
> >> +    entry.addr_mask = sz - 1;
> >> +    entry.iova = addr;
> >> +    memory_region_notify_iommu(entry.target_as->root, entry);
> > Here, we seems to be posting this invalidation to all registered
> > notifiers.
> 
> Yes, but only for a device specified address space.
> 
> >  Since this is a device-tlb invalidation, and we should
> > know which device (BDF) that we should invalidate, is there any way
> > that we can directly route this info to that specific device?
> 
> Looks like the codes has already done this, the target_as was found by
> bus num and devfn.

Yes, seems you are right. :)

-- peterx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/8] intel_iommu: name vtd address space with devfn
  2016-03-30  1:12     ` Jason Wang
@ 2016-03-30 11:12       ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2016-03-30 11:12 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eduardo Habkost, qemu-devel, Peter Xu, cornelia.huck, pbonzini,
	Richard Henderson

On Wed, Mar 30, 2016 at 09:12:46AM +0800, Jason Wang wrote:
> 
> 
> On 03/28/2016 10:02 AM, Peter Xu wrote:
> > On Fri, Mar 25, 2016 at 10:13:23AM +0800, Jason Wang wrote:
> >> To avoid duplicated name and ease debugging.
> >>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: Richard Henderson <rth@twiddle.net>
> >> Cc: Eduardo Habkost <ehabkost@redhat.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >>  hw/i386/intel_iommu.c | 6 ++++--
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >> index 347718f..d647b42 100644
> >> --- a/hw/i386/intel_iommu.c
> >> +++ b/hw/i386/intel_iommu.c
> >> @@ -1901,6 +1901,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> >>      uintptr_t key = (uintptr_t)bus;
> >>      VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
> >>      VTDAddressSpace *vtd_dev_as;
> >> +    char name[128];
> >>  
> >>      if (!vtd_bus) {
> >>          /* No corresponding free() */
> >> @@ -1913,6 +1914,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> >>      vtd_dev_as = vtd_bus->dev_as[devfn];
> >>  
> >>      if (!vtd_dev_as) {
> >> +        sprintf(name, "intel_iommu_devfn_%d", devfn);
> > It's safe here, but would snprintf() look better?
> 
> Not sure, we're sure that name is large enough here.

It's generally good practice, pls use snprintf.

> >
> >>          vtd_bus->dev_as[devfn] = vtd_dev_as = g_malloc0(sizeof(VTDAddressSpace));
> >>  
> >>          vtd_dev_as->bus = bus;
> >> @@ -1920,9 +1922,9 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> >>          vtd_dev_as->iommu_state = s;
> >>          vtd_dev_as->context_cache_entry.context_cache_gen = 0;
> >>          memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
> >> -                                 &s->iommu_ops, "intel_iommu", UINT64_MAX);
> >> +                                 &s->iommu_ops, name, UINT64_MAX);
> >>          address_space_init(&vtd_dev_as->as,
> >> -                           &vtd_dev_as->iommu, "intel_iommu");
> >> +                           &vtd_dev_as->iommu, name);
> >>      }
> >>      return vtd_dev_as;
> >>  }
> >> -- 
> >> 2.5.0
> >>
> > Besides the nit-pick:
> >
> > Acked-by: Peter Xu <peterx@redhat.com>
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/8] virtio: convert to use DMA api
  2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 1/8] virtio: convert to use DMA api Jason Wang
@ 2016-04-19 13:37   ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2016-04-19 13:37 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, pbonzini, peterx, cornelia.huck, Stefan Hajnoczi,
	Kevin Wolf, Amit Shah, qemu-block

On Fri, Mar 25, 2016 at 10:13:22AM +0800, Jason Wang wrote:
> Currently, all virtio devices bypass IOMMU completely. This is because
> address_space_memory is assumed and used during DMA emulation. This
> patch converts the virtio core API to use DMA API. This idea is
> 
> - introducing a new transport specific helper to query the dma address
>   space. (only pci version is implemented).
> - query and use this address space during virtio device guest memory
>   accessing
> 
> With this virtio devices will not bypass IOMMU anymore. Tested with
> intel_iommu=on/strict with:
> 
> - virtio guest DMA series posted in https://lkml.org/lkml/2015/10/28/64.
> - vfio (unsafe interrupt mode) dpdk l2fwd in guest
> 
> TODO:
> - Feature bit for this
> - Implement this for all transports

Nice. The only thing that worries me here is that ring
addresses are only translated once at DRIVER_OK.

In theory, rings could be non-contiguous and
mapped to contiguous ranges of bus addresses by
the IOMMU. Might be useful for very large rings
or memory-constrained guests.

Thoughts? Is this worth worrying about?

> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: qemu-block@nongnu.org
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/block/virtio-blk.c             |  2 +-
>  hw/char/virtio-serial-bus.c       |  3 +-
>  hw/scsi/virtio-scsi.c             |  4 ++-
>  hw/virtio/virtio-pci.c            |  9 ++++++
>  hw/virtio/virtio.c                | 58 +++++++++++++++++++++++----------------
>  include/hw/virtio/virtio-access.h | 42 +++++++++++++++++++++-------
>  include/hw/virtio/virtio-bus.h    |  1 +
>  include/hw/virtio/virtio.h        |  4 +--
>  8 files changed, 85 insertions(+), 38 deletions(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cb710f1..9411f99 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -829,7 +829,7 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
>  
>      while (qemu_get_sbyte(f)) {
>          VirtIOBlockReq *req;
> -        req = qemu_get_virtqueue_element(f, sizeof(VirtIOBlockReq));
> +        req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
>          virtio_blk_init_request(s, req);
>          req->next = s->rq;
>          s->rq = req;
> diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> index 99cb683..bdc5393 100644
> --- a/hw/char/virtio-serial-bus.c
> +++ b/hw/char/virtio-serial-bus.c
> @@ -687,6 +687,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
>  static int fetch_active_ports_list(QEMUFile *f, int version_id,
>                                     VirtIOSerial *s, uint32_t nr_active_ports)
>  {
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
>      uint32_t i;
>  
>      s->post_load = g_malloc0(sizeof(*s->post_load));
> @@ -722,7 +723,7 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
>                  qemu_get_be64s(f, &port->iov_offset);
>  
>                  port->elem =
> -                    qemu_get_virtqueue_element(f, sizeof(VirtQueueElement));
> +                    qemu_get_virtqueue_element(vdev, f, sizeof(VirtQueueElement));
>  
>                  /*
>                   *  Port was throttled on source machine.  Let's
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 0c30d2e..26ce701 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -196,12 +196,14 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
>      SCSIBus *bus = sreq->bus;
>      VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
>      VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
>      VirtIOSCSIReq *req;
>      uint32_t n;
>  
>      qemu_get_be32s(f, &n);
>      assert(n < vs->conf.num_queues);
> -    req = qemu_get_virtqueue_element(f, sizeof(VirtIOSCSIReq) + vs->cdb_size);
> +    req = qemu_get_virtqueue_element(vdev, f,
> +                                     sizeof(VirtIOSCSIReq) + vs->cdb_size);
>      virtio_scsi_init_req(s, vs->cmd_vqs[n], req);
>  
>      if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 0dadb66..5508b1c 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1211,6 +1211,14 @@ static int virtio_pci_query_nvectors(DeviceState *d)
>      return proxy->nvectors;
>  }
>  
> +static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
> +{
> +    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
> +    PCIDevice *dev = &proxy->pci_dev;
> +
> +    return pci_get_address_space(dev);
> +}
> +
>  static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
>                                     struct virtio_pci_cap *cap)
>  {
> @@ -2495,6 +2503,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
>      k->device_plugged = virtio_pci_device_plugged;
>      k->device_unplugged = virtio_pci_device_unplugged;
>      k->query_nvectors = virtio_pci_query_nvectors;
> +    k->get_dma_as = virtio_pci_get_dma_as;
>  }
>  
>  static const TypeInfo virtio_pci_bus_info = {
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 08275a9..37c9951 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -21,6 +21,7 @@
>  #include "hw/virtio/virtio-bus.h"
>  #include "migration/migration.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "sysemu/dma.h"
>  
>  /*
>   * The alignment to use between consumer and producer parts of vring.
> @@ -118,7 +119,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
>  static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
>                              hwaddr desc_pa, int i)
>  {
> -    address_space_read(&address_space_memory, desc_pa + i * sizeof(VRingDesc),
> +    address_space_read(virtio_get_dma_as(vdev), desc_pa + i * sizeof(VRingDesc),
>                         MEMTXATTRS_UNSPECIFIED, (void *)desc, sizeof(VRingDesc));
>      virtio_tswap64s(vdev, &desc->addr);
>      virtio_tswap32s(vdev, &desc->len);
> @@ -160,7 +161,7 @@ static inline void vring_used_write(VirtQueue *vq, VRingUsedElem *uelem,
>      virtio_tswap32s(vq->vdev, &uelem->id);
>      virtio_tswap32s(vq->vdev, &uelem->len);
>      pa = vq->vring.used + offsetof(VRingUsed, ring[i]);
> -    address_space_write(&address_space_memory, pa, MEMTXATTRS_UNSPECIFIED,
> +    address_space_write(virtio_get_dma_as(vq->vdev), pa, MEMTXATTRS_UNSPECIFIED,
>                         (void *)uelem, sizeof(VRingUsedElem));
>  }
>  
> @@ -240,6 +241,7 @@ int virtio_queue_empty(VirtQueue *vq)
>  static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
>                                 unsigned int len)
>  {
> +    AddressSpace *dma_as = virtio_get_dma_as(vq->vdev);
>      unsigned int offset;
>      int i;
>  
> @@ -247,17 +249,17 @@ static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
>      for (i = 0; i < elem->in_num; i++) {
>          size_t size = MIN(len - offset, elem->in_sg[i].iov_len);
>  
> -        cpu_physical_memory_unmap(elem->in_sg[i].iov_base,
> -                                  elem->in_sg[i].iov_len,
> -                                  1, size);
> +        dma_memory_unmap(dma_as, elem->in_sg[i].iov_base, elem->in_sg[i].iov_len,
> +                         DMA_DIRECTION_FROM_DEVICE, size);
>  
>          offset += size;
>      }
>  
>      for (i = 0; i < elem->out_num; i++)
> -        cpu_physical_memory_unmap(elem->out_sg[i].iov_base,
> -                                  elem->out_sg[i].iov_len,
> -                                  0, elem->out_sg[i].iov_len);
> +        dma_memory_unmap(dma_as, elem->out_sg[i].iov_base,
> +                         elem->out_sg[i].iov_len,
> +                         DMA_DIRECTION_TO_DEVICE,
> +                         elem->out_sg[i].iov_len);
>  }
>  
>  void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
> @@ -447,7 +449,8 @@ int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
>      return in_bytes <= in_total && out_bytes <= out_total;
>  }
>  
> -static void virtqueue_map_desc(unsigned int *p_num_sg, hwaddr *addr, struct iovec *iov,
> +static void virtqueue_map_desc(VirtIODevice *vdev,
> +                               unsigned int *p_num_sg, hwaddr *addr, struct iovec *iov,
>                                 unsigned int max_num_sg, bool is_write,
>                                 hwaddr pa, size_t sz)
>  {
> @@ -462,7 +465,10 @@ static void virtqueue_map_desc(unsigned int *p_num_sg, hwaddr *addr, struct iove
>              exit(1);
>          }
>  
> -        iov[num_sg].iov_base = cpu_physical_memory_map(pa, &len, is_write);
> +        iov[num_sg].iov_base = dma_memory_map(virtio_get_dma_as(vdev), pa, &len,
> +                                              is_write ?
> +                                              DMA_DIRECTION_FROM_DEVICE:
> +                                              DMA_DIRECTION_TO_DEVICE);
>          iov[num_sg].iov_len = len;
>          addr[num_sg] = pa;
>  
> @@ -473,9 +479,9 @@ static void virtqueue_map_desc(unsigned int *p_num_sg, hwaddr *addr, struct iove
>      *p_num_sg = num_sg;
>  }
>  
> -static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
> -                                unsigned int *num_sg, unsigned int max_size,
> -                                int is_write)
> +static void virtqueue_map_iovec(VirtIODevice *vdev, struct iovec *sg,
> +                                hwaddr *addr, unsigned int *num_sg,
> +                                unsigned int max_size, int is_write)
>  {
>      unsigned int i;
>      hwaddr len;
> @@ -494,7 +500,10 @@ static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
>  
>      for (i = 0; i < *num_sg; i++) {
>          len = sg[i].iov_len;
> -        sg[i].iov_base = cpu_physical_memory_map(addr[i], &len, is_write);
> +        sg[i].iov_base = dma_memory_map(virtio_get_dma_as(vdev),
> +                                        addr[i], &len, is_write ?
> +                                        DMA_DIRECTION_FROM_DEVICE :
> +                                        DMA_DIRECTION_TO_DEVICE);
>          if (!sg[i].iov_base) {
>              error_report("virtio: error trying to map MMIO memory");
>              exit(1);
> @@ -506,12 +515,15 @@ static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
>      }
>  }
>  
> -void virtqueue_map(VirtQueueElement *elem)
> +void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem)
>  {
> -    virtqueue_map_iovec(elem->in_sg, elem->in_addr, &elem->in_num,
> -                        VIRTQUEUE_MAX_SIZE, 1);
> -    virtqueue_map_iovec(elem->out_sg, elem->out_addr, &elem->out_num,
> -                        VIRTQUEUE_MAX_SIZE, 0);
> +    virtqueue_map_iovec(vdev, elem->in_sg, elem->in_addr, &elem->in_num,
> +                        MIN(ARRAY_SIZE(elem->in_sg), ARRAY_SIZE(elem->in_addr)),
> +                        1);
> +    virtqueue_map_iovec(vdev, elem->out_sg, elem->out_addr, &elem->out_num,
> +                        MIN(ARRAY_SIZE(elem->out_sg),
> +                        ARRAY_SIZE(elem->out_addr)),
> +                        0);
>  }
>  
>  void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
> @@ -580,14 +592,14 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>      /* Collect all the descriptors */
>      do {
>          if (desc.flags & VRING_DESC_F_WRITE) {
> -            virtqueue_map_desc(&in_num, addr + out_num, iov + out_num,
> +            virtqueue_map_desc(vdev, &in_num, addr + out_num, iov + out_num,
>                                 VIRTQUEUE_MAX_SIZE - out_num, true, desc.addr, desc.len);
>          } else {
>              if (in_num) {
>                  error_report("Incorrect order for descriptors");
>                  exit(1);
>              }
> -            virtqueue_map_desc(&out_num, addr, iov,
> +            virtqueue_map_desc(vdev, &out_num, addr, iov,
>                                 VIRTQUEUE_MAX_SIZE, false, desc.addr, desc.len);
>          }
>  
> @@ -633,7 +645,7 @@ typedef struct VirtQueueElementOld {
>      struct iovec out_sg[VIRTQUEUE_MAX_SIZE];
>  } VirtQueueElementOld;
>  
> -void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz)
> +void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz)
>  {
>      VirtQueueElement *elem;
>      VirtQueueElementOld data;
> @@ -664,7 +676,7 @@ void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz)
>          elem->out_sg[i].iov_len = data.out_sg[i].iov_len;
>      }
>  
> -    virtqueue_map(elem);
> +    virtqueue_map(vdev, elem);
>      return elem;
>  }
>  
> diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
> index 8dc84f5..967cc75 100644
> --- a/include/hw/virtio/virtio-access.h
> +++ b/include/hw/virtio/virtio-access.h
> @@ -15,8 +15,20 @@
>  #ifndef _QEMU_VIRTIO_ACCESS_H
>  #define _QEMU_VIRTIO_ACCESS_H
>  #include "hw/virtio/virtio.h"
> +#include "hw/virtio/virtio-bus.h"
>  #include "exec/address-spaces.h"
>  
> +static inline AddressSpace *virtio_get_dma_as(VirtIODevice *vdev)
> +{
> +    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
> +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> +
> +    if (k->get_dma_as) {
> +        return k->get_dma_as(qbus->parent);
> +    }
> +    return &address_space_memory;
> +}
> +
>  static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
>  {
>  #if defined(TARGET_IS_BIENDIAN)
> @@ -34,45 +46,55 @@ static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
>  
>  static inline uint16_t virtio_lduw_phys(VirtIODevice *vdev, hwaddr pa)
>  {
> +    AddressSpace *dma_as = virtio_get_dma_as(vdev);
> +
>      if (virtio_access_is_big_endian(vdev)) {
> -        return lduw_be_phys(&address_space_memory, pa);
> +        return lduw_be_phys(dma_as, pa);
>      }
> -    return lduw_le_phys(&address_space_memory, pa);
> +    return lduw_le_phys(dma_as, pa);
>  }
>  
>  static inline uint32_t virtio_ldl_phys(VirtIODevice *vdev, hwaddr pa)
>  {
> +    AddressSpace *dma_as = virtio_get_dma_as(vdev);
> +
>      if (virtio_access_is_big_endian(vdev)) {
> -        return ldl_be_phys(&address_space_memory, pa);
> +        return ldl_be_phys(dma_as, pa);
>      }
> -    return ldl_le_phys(&address_space_memory, pa);
> +    return ldl_le_phys(dma_as, pa);
>  }
>  
>  static inline uint64_t virtio_ldq_phys(VirtIODevice *vdev, hwaddr pa)
>  {
> +    AddressSpace *dma_as = virtio_get_dma_as(vdev);
> +
>      if (virtio_access_is_big_endian(vdev)) {
> -        return ldq_be_phys(&address_space_memory, pa);
> +        return ldq_be_phys(dma_as, pa);
>      }
> -    return ldq_le_phys(&address_space_memory, pa);
> +    return ldq_le_phys(dma_as, pa);
>  }
>  
>  static inline void virtio_stw_phys(VirtIODevice *vdev, hwaddr pa,
>                                     uint16_t value)
>  {
> +    AddressSpace *dma_as = virtio_get_dma_as(vdev);
> +
>      if (virtio_access_is_big_endian(vdev)) {
> -        stw_be_phys(&address_space_memory, pa, value);
> +        stw_be_phys(dma_as, pa, value);
>      } else {
> -        stw_le_phys(&address_space_memory, pa, value);
> +        stw_le_phys(dma_as, pa, value);
>      }
>  }
>  
>  static inline void virtio_stl_phys(VirtIODevice *vdev, hwaddr pa,
>                                     uint32_t value)
>  {
> +    AddressSpace *dma_as = virtio_get_dma_as(vdev);
> +
>      if (virtio_access_is_big_endian(vdev)) {
> -        stl_be_phys(&address_space_memory, pa, value);
> +        stl_be_phys(dma_as, pa, value);
>      } else {
> -        stl_le_phys(&address_space_memory, pa, value);
> +        stl_le_phys(dma_as, pa, value);
>      }
>  }
>  
> diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
> index 3f2c136..17c07af 100644
> --- a/include/hw/virtio/virtio-bus.h
> +++ b/include/hw/virtio/virtio-bus.h
> @@ -76,6 +76,7 @@ typedef struct VirtioBusClass {
>       * Note that changing this will break migration for this transport.
>       */
>      bool has_variable_vring_alignment;
> +    AddressSpace *(*get_dma_as)(DeviceState *d);
>  } VirtioBusClass;
>  
>  struct VirtioBusState {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 2b5b248..0908bf6 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -153,9 +153,9 @@ void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
>  void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
>                      unsigned int len, unsigned int idx);
>  
> -void virtqueue_map(VirtQueueElement *elem);
> +void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem);
>  void *virtqueue_pop(VirtQueue *vq, size_t sz);
> -void *qemu_get_virtqueue_element(QEMUFile *f, size_t sz);
> +void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz);
>  void qemu_put_virtqueue_element(QEMUFile *f, VirtQueueElement *elem);
>  int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
>                            unsigned int out_bytes);
> -- 
> 2.5.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-04-19 13:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-25  2:13 [Qemu-devel] [RFC PATCH 0/8] virtio/vhost DMAR support Jason Wang
2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 1/8] virtio: convert to use DMA api Jason Wang
2016-04-19 13:37   ` Michael S. Tsirkin
2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 2/8] intel_iommu: name vtd address space with devfn Jason Wang
2016-03-28  2:02   ` Peter Xu
2016-03-30  1:12     ` Jason Wang
2016-03-30 11:12       ` Michael S. Tsirkin
2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 3/8] intel_iommu: allocate new key when creating new address space Jason Wang
2016-03-28  2:07   ` Peter Xu
2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 4/8] exec: introduce address_space_get_iotlb_entry() Jason Wang
2016-03-28  2:18   ` Peter Xu
2016-03-30  1:13     ` Jason Wang
2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 5/8] virtio-pci: address space translation service (ATS) support Jason Wang
2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 6/8] intel_iommu: support device iotlb descriptor Jason Wang
2016-03-28  3:37   ` Peter Xu
2016-03-30  5:08     ` Jason Wang
2016-03-30  5:21       ` Peter Xu
2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 7/8] memory: handle alias for iommu notifier Jason Wang
2016-03-25  2:13 ` [Qemu-devel] [RFC PATCH 8/8] vhost_net: device IOTLB support Jason Wang
