* [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
@ 2017-06-07 16:01 Eric Auger
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 1/8] update-linux-headers: import virtio_iommu.h Eric Auger
                   ` (8 more replies)
  0 siblings, 9 replies; 73+ messages in thread
From: Eric Auger @ 2017-06-07 16:01 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

This series implements the virtio-iommu device. It is a proof
of concept based on the virtio-iommu specification written by
Jean-Philippe Brucker [1]. It was tested with a guest running
the virtio-iommu driver [2] and a virtio-net-pci device
using DMA ops.

The device is instantiated with the "-device virtio-iommu-device"
option. It currently works only with the ARM virt machine, because the
machine must handle the DT binding between the virtio-mmio "iommu" node
and the PCI host bridge node. ACPI boot is not yet supported.

This should make it possible to start benchmarking against a
purely emulated IOMMU (especially the ARM SMMU).

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2

References:
[1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU
[2] [RFC PATCH linux] iommu: Add virtio-iommu driver
[3] [RFC PATCH kvmtool 00/15] Add virtio-iommu

History:
v1 -> v2:
- fix redefinition of viommu_as typedef

Eric Auger (8):
  update-linux-headers: import virtio_iommu.h
  linux-headers: Update for virtio-iommu
  virtio_iommu: add skeleton
  virtio-iommu: Decode the command payload
  virtio_iommu: Add the iommu regions
  virtio-iommu: Implement the translation and commands
  hw/arm/virt: Add 2.10 machine type
  hw/arm/virt: Add virtio-iommu the virt board

 hw/arm/virt.c                                 | 116 ++++-
 hw/virtio/Makefile.objs                       |   1 +
 hw/virtio/trace-events                        |  14 +
 hw/virtio/virtio-iommu.c                      | 623 ++++++++++++++++++++++++++
 include/hw/arm/virt.h                         |   5 +
 include/hw/virtio/virtio-iommu.h              |  60 +++
 include/standard-headers/linux/virtio_ids.h   |   1 +
 include/standard-headers/linux/virtio_iommu.h | 142 ++++++
 linux-headers/linux/virtio_iommu.h            |   1 +
 scripts/update-linux-headers.sh               |   3 +
 10 files changed, 957 insertions(+), 9 deletions(-)
 create mode 100644 hw/virtio/virtio-iommu.c
 create mode 100644 include/hw/virtio/virtio-iommu.h
 create mode 100644 include/standard-headers/linux/virtio_iommu.h
 create mode 100644 linux-headers/linux/virtio_iommu.h

-- 
2.5.5


* [Qemu-devel] [RFC v2 1/8] update-linux-headers: import virtio_iommu.h
  2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
@ 2017-06-07 16:01 ` Eric Auger
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 2/8] linux-headers: Update for virtio-iommu Eric Auger
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Eric Auger @ 2017-06-07 16:01 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

Update the script so that it also generates the virtio_iommu.h header.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 scripts/update-linux-headers.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 2f906c4..03f6712 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -134,6 +134,9 @@ EOF
 cat <<EOF >$output/linux-headers/linux/virtio_config.h
 #include "standard-headers/linux/virtio_config.h"
 EOF
+cat <<EOF >$output/linux-headers/linux/virtio_iommu.h
+#include "standard-headers/linux/virtio_iommu.h"
+EOF
 cat <<EOF >$output/linux-headers/linux/virtio_ring.h
 #include "standard-headers/linux/virtio_ring.h"
 EOF
-- 
2.5.5


* [Qemu-devel] [RFC v2 2/8] linux-headers: Update for virtio-iommu
  2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 1/8] update-linux-headers: import virtio_iommu.h Eric Auger
@ 2017-06-07 16:01 ` Eric Auger
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 3/8] virtio_iommu: add skeleton Eric Auger
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Eric Auger @ 2017-06-07 16:01 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

This is a partial Linux header update against Jean-Philippe's branch:
git://linux-arm.org/linux-jpb.git virtio-iommu/base (unstable)

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
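Note: the request layout imported below can be illustrated with a small
stand-alone sketch. It is not part of the patch. All demo_* names are
hypothetical local mirrors of the header's definitions, packing uses the
GCC/Clang attribute syntax as an assumption, and a little-endian host is
assumed so that no byte swapping is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the header in this patch; demo_ names are
 * illustration-only and do not exist in QEMU or Linux. */
#define DEMO_T_MAP        0x03
#define DEMO_MAP_F_READ   (1 << 0)
#define DEMO_MAP_F_WRITE  (1 << 1)

struct __attribute__((packed)) demo_req_head {
    uint8_t type;
    uint8_t reserved[3];
};

struct __attribute__((packed)) demo_req_tail {
    uint8_t status;
    uint8_t reserved[3];
};

struct __attribute__((packed)) demo_req_map {
    struct demo_req_head head;
    uint32_t address_space;
    uint32_t flags;
    uint64_t virt_addr;
    uint64_t phys_addr;
    uint64_t size;
    struct demo_req_tail tail;
};

/* Bytes written by the driver: everything except the status tail,
 * which is written back by the device. */
static size_t demo_map_payload_size(void)
{
    return sizeof(struct demo_req_map) - sizeof(struct demo_req_tail);
}

/* Runtime check that packing gives the expected wire layout. */
static void demo_check_layout(void)
{
    assert(sizeof(struct demo_req_map) == 40);
    assert(offsetof(struct demo_req_map, virt_addr) == 12);
}

/* Fill a MAP request in host byte order (assumption: little-endian
 * host; portable code would convert each field to little-endian). */
static void demo_fill_map(struct demo_req_map *req, uint32_t asid,
                          uint64_t virt_addr, uint64_t phys_addr,
                          uint64_t size, uint32_t flags)
{
    memset(req, 0, sizeof(*req));
    req->head.type = DEMO_T_MAP;
    req->address_space = asid;
    req->flags = flags;
    req->virt_addr = virt_addr;
    req->phys_addr = phys_addr;
    req->size = size;
}
```

Without packing, virt_addr would normally be aligned to 8 bytes and land
at offset 16 instead of 12, which is why the layout check is worth
keeping next to any local copy of these structures.
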
 include/standard-headers/linux/virtio_ids.h   |   1 +
 include/standard-headers/linux/virtio_iommu.h | 142 ++++++++++++++++++++++++++
 linux-headers/linux/virtio_iommu.h            |   1 +
 3 files changed, 144 insertions(+)
 create mode 100644 include/standard-headers/linux/virtio_iommu.h
 create mode 100644 linux-headers/linux/virtio_iommu.h

diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
index 6d5c3b2..934ed3d 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -43,5 +43,6 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_IOMMU	    61216 /* virtio IOMMU (temporary) */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/standard-headers/linux/virtio_iommu.h b/include/standard-headers/linux/virtio_iommu.h
new file mode 100644
index 0000000..e139587
--- /dev/null
+++ b/include/standard-headers/linux/virtio_iommu.h
@@ -0,0 +1,142 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This header is BSD licensed so anyone can use the definitions
+ * to implement compatible drivers/servers:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of ARM Ltd. nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#ifndef _LINUX_VIRTIO_IOMMU_H
+#define _LINUX_VIRTIO_IOMMU_H
+
+/* Feature bits */
+#define VIRTIO_IOMMU_F_INPUT_RANGE		0
+#define VIRTIO_IOMMU_F_IOASID_BITS		1
+#define VIRTIO_IOMMU_F_MAP_UNMAP		2
+#define VIRTIO_IOMMU_F_BYPASS			3
+
+QEMU_PACKED
+struct virtio_iommu_config {
+	/* Supported page sizes */
+	uint64_t					page_sizes;
+	struct virtio_iommu_range {
+		uint64_t				start;
+		uint64_t				end;
+	} input_range;
+	uint8_t 					ioasid_bits;
+};
+
+/* Request types */
+#define VIRTIO_IOMMU_T_ATTACH			0x01
+#define VIRTIO_IOMMU_T_DETACH			0x02
+#define VIRTIO_IOMMU_T_MAP			0x03
+#define VIRTIO_IOMMU_T_UNMAP			0x04
+
+/* Status types */
+#define VIRTIO_IOMMU_S_OK			0x00
+#define VIRTIO_IOMMU_S_IOERR			0x01
+#define VIRTIO_IOMMU_S_UNSUPP			0x02
+#define VIRTIO_IOMMU_S_DEVERR			0x03
+#define VIRTIO_IOMMU_S_INVAL			0x04
+#define VIRTIO_IOMMU_S_RANGE			0x05
+#define VIRTIO_IOMMU_S_NOENT			0x06
+#define VIRTIO_IOMMU_S_FAULT			0x07
+
+QEMU_PACKED
+struct virtio_iommu_req_head {
+	uint8_t					type;
+	uint8_t					reserved[3];
+};
+
+QEMU_PACKED
+struct virtio_iommu_req_tail {
+	uint8_t					status;
+	uint8_t					reserved[3];
+};
+
+QEMU_PACKED
+struct virtio_iommu_req_attach {
+	struct virtio_iommu_req_head		head;
+
+	uint32_t					address_space;
+	uint32_t					device;
+	uint32_t					reserved;
+
+	struct virtio_iommu_req_tail		tail;
+};
+
+QEMU_PACKED
+struct virtio_iommu_req_detach {
+	struct virtio_iommu_req_head		head;
+
+	uint32_t					device;
+	uint32_t					reserved;
+
+	struct virtio_iommu_req_tail		tail;
+};
+
+#define VIRTIO_IOMMU_MAP_F_READ			(1 << 0)
+#define VIRTIO_IOMMU_MAP_F_WRITE		(1 << 1)
+#define VIRTIO_IOMMU_MAP_F_EXEC			(1 << 2)
+
+#define VIRTIO_IOMMU_MAP_F_MASK			(VIRTIO_IOMMU_MAP_F_READ |	\
+						 VIRTIO_IOMMU_MAP_F_WRITE |	\
+						 VIRTIO_IOMMU_MAP_F_EXEC)
+
+QEMU_PACKED
+struct virtio_iommu_req_map {
+	struct virtio_iommu_req_head		head;
+
+	uint32_t					address_space;
+	uint32_t					flags;
+	uint64_t					virt_addr;
+	uint64_t					phys_addr;
+	uint64_t					size;
+
+	struct virtio_iommu_req_tail		tail;
+};
+
+QEMU_PACKED
+struct virtio_iommu_req_unmap {
+	struct virtio_iommu_req_head		head;
+
+	uint32_t					address_space;
+	uint32_t					flags;
+	uint64_t					virt_addr;
+	uint64_t					size;
+
+	struct virtio_iommu_req_tail		tail;
+};
+
+union virtio_iommu_req {
+	struct virtio_iommu_req_head		head;
+
+	struct virtio_iommu_req_attach		attach;
+	struct virtio_iommu_req_detach		detach;
+	struct virtio_iommu_req_map		map;
+	struct virtio_iommu_req_unmap		unmap;
+};
+
+#endif
diff --git a/linux-headers/linux/virtio_iommu.h b/linux-headers/linux/virtio_iommu.h
new file mode 100644
index 0000000..2dc4609
--- /dev/null
+++ b/linux-headers/linux/virtio_iommu.h
@@ -0,0 +1 @@
+#include "standard-headers/linux/virtio_iommu.h"
-- 
2.5.5


* [Qemu-devel] [RFC v2 3/8] virtio_iommu: add skeleton
  2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 1/8] update-linux-headers: import virtio_iommu.h Eric Auger
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 2/8] linux-headers: Update for virtio-iommu Eric Auger
@ 2017-06-07 16:01 ` Eric Auger
  2017-06-08 11:09   ` Bharat Bhushan
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 4/8] virtio-iommu: Decode the command payload Eric Auger
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Eric Auger @ 2017-06-07 16:01 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

This patch adds the skeleton for the virtio-iommu device.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
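Note: the request flow of the skeleton (read the head from the
driver-written buffer, dispatch on its type, write the status tail into
the device-written buffer) can be sketched outside QEMU as follows. This
is a simplified illustration, not the patch code: flat byte buffers
stand in for the virtqueue element, all demo_* names are hypothetical,
and the placeholder handler returning "OK" is an assumption made only so
that known and unknown types can be told apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Request and status codes copied from the imported header. */
enum { DEMO_T_ATTACH = 0x01, DEMO_T_DETACH = 0x02,
       DEMO_T_MAP = 0x03, DEMO_T_UNMAP = 0x04 };
enum { DEMO_S_OK = 0x00, DEMO_S_UNSUPP = 0x02 };

struct demo_head { uint8_t type; uint8_t reserved[3]; };
struct demo_tail { uint8_t status; uint8_t reserved[3]; };

/* Placeholder for the per-command handlers (hypothetical behaviour). */
static uint8_t demo_handle_stub(const uint8_t *out, size_t out_len)
{
    (void)out;
    (void)out_len;
    return DEMO_S_OK;
}

/*
 * Process one request. "out" is the buffer written by the driver,
 * "in" is the buffer the device writes the status tail into.
 * Returns -1 when either buffer is too small to hold a head or a
 * tail (the case the skeleton reports as a device error), else 0.
 */
static int demo_handle_request(const uint8_t *out, size_t out_len,
                               uint8_t *in, size_t in_len)
{
    struct demo_head head;
    struct demo_tail tail;

    assert(out != NULL && in != NULL);
    if (out_len < sizeof(head) || in_len < sizeof(tail)) {
        return -1;
    }

    memset(&tail, 0, sizeof(tail));
    memcpy(&head, out, sizeof(head));

    switch (head.type) {
    case DEMO_T_ATTACH:
    case DEMO_T_DETACH:
    case DEMO_T_MAP:
    case DEMO_T_UNMAP:
        tail.status = demo_handle_stub(out, out_len);
        break;
    default:
        tail.status = DEMO_S_UNSUPP;
        break;
    }

    memcpy(in, &tail, sizeof(tail));
    return 0;
}
```
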
 hw/virtio/Makefile.objs          |   1 +
 hw/virtio/virtio-iommu.c         | 247 +++++++++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-iommu.h |  60 ++++++++++
 3 files changed, 308 insertions(+)
 create mode 100644 hw/virtio/virtio-iommu.c
 create mode 100644 include/hw/virtio/virtio-iommu.h

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 765d363..8967a4a 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-y += virtio-mmio.o
 
 obj-y += virtio.o virtio-balloon.o 
 obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
+obj-$(CONFIG_LINUX) += virtio-iommu.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
 obj-y += virtio-crypto.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
new file mode 100644
index 0000000..86129ef
--- /dev/null
+++ b/hw/virtio/virtio-iommu.c
@@ -0,0 +1,247 @@
+/*
+ * virtio-iommu device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iov.h"
+#include "qemu-common.h"
+#include "hw/virtio/virtio.h"
+#include "sysemu/kvm.h"
+#include "qapi-event.h"
+#include "trace.h"
+
+#include "standard-headers/linux/virtio_ids.h"
+#include <linux/virtio_iommu.h>
+
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-iommu.h"
+
+/* Max size */
+#define VIOMMU_DEFAULT_QUEUE_SIZE 256
+
+static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
+                                      struct iovec *iov,
+                                      unsigned int iov_cnt)
+{
+    return -ENOENT;
+}
+static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
+                                      struct iovec *iov,
+                                      unsigned int iov_cnt)
+{
+    return -ENOENT;
+}
+static int virtio_iommu_handle_map(VirtIOIOMMU *s,
+                                   struct iovec *iov,
+                                   unsigned int iov_cnt)
+{
+    return -ENOENT;
+}
+static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
+                                     struct iovec *iov,
+                                     unsigned int iov_cnt)
+{
+    return -ENOENT;
+}
+
+static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
+    VirtQueueElement *elem;
+    struct virtio_iommu_req_head head;
+    struct virtio_iommu_req_tail tail;
+    unsigned int iov_cnt;
+    struct iovec *iov;
+    size_t sz;
+
+    for (;;) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            return;
+        }
+
+        if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
+            iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
+            virtio_error(vdev, "virtio-iommu erroneous head or tail");
+            virtqueue_detach_element(vq, elem, 0);
+            g_free(elem);
+            break;
+        }
+
+        iov_cnt = elem->out_num;
+        iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
+        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
+        if (sz != sizeof(head)) {
+            tail.status = VIRTIO_IOMMU_S_UNSUPP;
+        }
+        qemu_mutex_lock(&s->mutex);
+        switch (head.type) {
+        case VIRTIO_IOMMU_T_ATTACH:
+            tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
+            break;
+        case VIRTIO_IOMMU_T_DETACH:
+            tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
+            break;
+        case VIRTIO_IOMMU_T_MAP:
+            tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
+            break;
+        case VIRTIO_IOMMU_T_UNMAP:
+            tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
+            break;
+        default:
+            tail.status = VIRTIO_IOMMU_S_UNSUPP;
+        }
+        qemu_mutex_unlock(&s->mutex);
+
+        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+                          &tail, sizeof(tail));
+        assert(sz == sizeof(tail));
+
+        virtqueue_push(vq, elem, sizeof(tail));
+        virtio_notify(vdev, vq);
+        g_free(elem);
+    }
+}
+
+static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
+{
+    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+
+    memcpy(config_data, &dev->config, sizeof(struct virtio_iommu_config));
+}
+
+static void virtio_iommu_set_config(VirtIODevice *vdev,
+                                      const uint8_t *config_data)
+{
+    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+    struct virtio_iommu_config config;
+
+    memcpy(&config, config_data, sizeof(struct virtio_iommu_config));
+
+    dev->config.page_sizes = le64_to_cpu(config.page_sizes);
+    dev->config.input_range.end = le64_to_cpu(config.input_range.end);
+}
+
+static uint64_t virtio_iommu_get_features(VirtIODevice *vdev, uint64_t f,
+                                            Error **errp)
+{
+    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+    f |= dev->host_features;
+    virtio_add_feature(&f, VIRTIO_RING_F_EVENT_IDX);
+    virtio_add_feature(&f, VIRTIO_RING_F_INDIRECT_DESC);
+    virtio_add_feature(&f, VIRTIO_IOMMU_F_INPUT_RANGE);
+    return f;
+}
+
+static int virtio_iommu_post_load_device(void *opaque, int version_id)
+{
+    return 0;
+}
+
+static const VMStateDescription vmstate_virtio_iommu_device = {
+    .name = "virtio-iommu-device",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .post_load = virtio_iommu_post_load_device,
+    .fields = (VMStateField[]) {
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
+
+    virtio_init(vdev, "virtio-iommu", VIRTIO_ID_IOMMU,
+                sizeof(struct virtio_iommu_config));
+
+    s->vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE,
+                             virtio_iommu_handle_command);
+
+    s->config.page_sizes = ~((1ULL << 12) - 1);
+    s->config.input_range.end = -1UL;
+}
+
+static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+
+    virtio_cleanup(vdev);
+}
+
+static void virtio_iommu_device_reset(VirtIODevice *vdev)
+{
+}
+
+static void virtio_iommu_set_status(VirtIODevice *vdev, uint8_t status)
+{
+}
+
+static void virtio_iommu_instance_init(Object *obj)
+{
+}
+
+static const VMStateDescription vmstate_virtio_iommu = {
+    .name = "virtio-iommu",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static Property virtio_iommu_properties[] = {
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_iommu_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+
+    dc->props = virtio_iommu_properties;
+    dc->vmsd = &vmstate_virtio_iommu;
+
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    vdc->realize = virtio_iommu_device_realize;
+    vdc->unrealize = virtio_iommu_device_unrealize;
+    vdc->reset = virtio_iommu_device_reset;
+    vdc->get_config = virtio_iommu_get_config;
+    vdc->set_config = virtio_iommu_set_config;
+    vdc->get_features = virtio_iommu_get_features;
+    vdc->set_status = virtio_iommu_set_status;
+    vdc->vmsd = &vmstate_virtio_iommu_device;
+}
+
+static const TypeInfo virtio_iommu_info = {
+    .name = TYPE_VIRTIO_IOMMU,
+    .parent = TYPE_VIRTIO_DEVICE,
+    .instance_size = sizeof(VirtIOIOMMU),
+    .instance_init = virtio_iommu_instance_init,
+    .class_init = virtio_iommu_class_init,
+};
+
+static void virtio_register_types(void)
+{
+    type_register_static(&virtio_iommu_info);
+}
+
+type_init(virtio_register_types)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
new file mode 100644
index 0000000..2259413
--- /dev/null
+++ b/include/hw/virtio/virtio-iommu.h
@@ -0,0 +1,60 @@
+/*
+ * virtio-iommu device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef QEMU_VIRTIO_IOMMU_H
+#define QEMU_VIRTIO_IOMMU_H
+
+#include "standard-headers/linux/virtio_iommu.h"
+#include "hw/virtio/virtio.h"
+#include "hw/pci/pci.h"
+
+#define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
+#define VIRTIO_IOMMU(obj) \
+        OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
+
+#define IOMMU_PCI_BUS_MAX      256
+#define IOMMU_PCI_DEVFN_MAX    256
+
+typedef struct IOMMUDevice {
+    void         *viommu;
+    PCIBus       *bus;
+    int           devfn;
+    MemoryRegion  iommu_mr;
+    AddressSpace  as;
+} IOMMUDevice;
+
+typedef struct IOMMUPciBus {
+    PCIBus       *bus;
+    IOMMUDevice  *pbdev[0]; /* Parent array is sparse, so dynamically alloc */
+} IOMMUPciBus;
+
+typedef struct VirtIOIOMMU {
+    VirtIODevice parent_obj;
+    VirtQueue *vq;
+    struct virtio_iommu_config config;
+    MemoryRegionIOMMUOps iommu_ops;
+    uint32_t host_features;
+    GHashTable *as_by_busptr;
+    IOMMUPciBus *as_by_bus_num[IOMMU_PCI_BUS_MAX];
+    GTree *address_spaces;
+    QemuMutex mutex;
+    GTree *devices;
+} VirtIOIOMMU;
+
+#endif
-- 
2.5.5


* [Qemu-devel] [RFC v2 4/8] virtio-iommu: Decode the command payload
  2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
                   ` (2 preceding siblings ...)
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 3/8] virtio_iommu: add skeleton Eric Auger
@ 2017-06-07 16:01 ` Eric Auger
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 5/8] virtio_iommu: Add the iommu regions Eric Auger
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Eric Auger @ 2017-06-07 16:01 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

This patch adds the command payload decoding and
introduces the functions that will do the actual
command handling. Those functions are not yet implemented.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
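Note: the decoding step (gather the payload from a scatter-gather list,
then reject the request if fewer bytes than expected were available) can
be sketched as below. This is a stand-alone illustration, not the patch
code: demo_iov_to_buf() is a hypothetical stand-in for QEMU's
iov_to_buf() with the offset argument dropped, the demo_* structures
mirror the imported header, and a little-endian host is assumed so the
device field is read without a byte swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

enum { DEMO_S_OK = 0x00, DEMO_S_INVAL = 0x04 };

struct __attribute__((packed)) demo_req_head {
    uint8_t type;
    uint8_t reserved[3];
};

struct __attribute__((packed)) demo_req_tail {
    uint8_t status;
    uint8_t reserved[3];
};

struct __attribute__((packed)) demo_req_detach {
    struct demo_req_head head;
    uint32_t device;
    uint32_t reserved;
    struct demo_req_tail tail;
};

/* Copy up to "bytes" bytes from the iovec array into "buf" and
 * return how many bytes were actually copied. */
static size_t demo_iov_to_buf(const struct iovec *iov, unsigned int iov_cnt,
                              void *buf, size_t bytes)
{
    size_t done = 0;

    assert(buf != NULL);
    for (unsigned int i = 0; i < iov_cnt && done < bytes; i++) {
        size_t n = iov[i].iov_len;

        if (n > bytes - done) {
            n = bytes - done;
        }
        memcpy((uint8_t *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/* Decode a DETACH payload; the tail is excluded because it is
 * written by the device, not read from the driver. */
static int demo_decode_detach(const struct iovec *iov, unsigned int iov_cnt,
                              uint32_t *devid)
{
    struct demo_req_detach req;
    size_t payload_sz = sizeof(req) - sizeof(struct demo_req_tail);
    size_t sz = demo_iov_to_buf(iov, iov_cnt, &req, payload_sz);

    if (sz != payload_sz) {
        return DEMO_S_INVAL;
    }
    *devid = req.device; /* assumption: little-endian host */
    return DEMO_S_OK;
}
```

A short copy is the only failure detected at this level; validating the
decoded values is left to the command handlers themselves.
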
 hw/virtio/trace-events   |  7 ++++
 hw/virtio/virtio-iommu.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 100 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index e24d8fa..fba1da6 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -25,3 +25,10 @@ virtio_balloon_handle_output(const char *name, uint64_t gpa) "section name: %s g
 virtio_balloon_get_config(uint32_t num_pages, uint32_t actual) "num_pages: %d actual: %d"
 virtio_balloon_set_config(uint32_t actual, uint32_t oldactual) "actual: %d oldactual: %d"
 virtio_balloon_to_target(uint64_t target, uint32_t num_pages) "balloon target: %"PRIx64" num_pages: %d"
+
+# hw/virtio/virtio-iommu.c
+#
+virtio_iommu_attach(uint32_t as, uint32_t dev, uint32_t flags) "as=%d dev=%d flags=%d"
+virtio_iommu_detach(uint32_t dev, uint32_t flags) "dev=%d flags=%d"
+virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
+virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 86129ef..ea1caa7 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -35,29 +35,118 @@
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+static int virtio_iommu_attach(VirtIOIOMMU *s,
+                               struct virtio_iommu_req_attach *req)
+{
+    uint32_t asid = le32_to_cpu(req->address_space);
+    uint32_t devid = le32_to_cpu(req->device);
+    uint32_t reserved = le32_to_cpu(req->reserved);
+
+    trace_virtio_iommu_attach(asid, devid, reserved);
+
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_detach(VirtIOIOMMU *s,
+                               struct virtio_iommu_req_detach *req)
+{
+    uint32_t devid = le32_to_cpu(req->device);
+    uint32_t reserved = le32_to_cpu(req->reserved);
+
+    trace_virtio_iommu_detach(devid, reserved);
+
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_map(VirtIOIOMMU *s,
+                            struct virtio_iommu_req_map *req)
+{
+    uint32_t asid = le32_to_cpu(req->address_space);
+    uint64_t phys_addr = le64_to_cpu(req->phys_addr);
+    uint64_t virt_addr = le64_to_cpu(req->virt_addr);
+    uint64_t size = le64_to_cpu(req->size);
+    uint32_t flags = le32_to_cpu(req->flags);
+
+    trace_virtio_iommu_map(asid, phys_addr, virt_addr, size, flags);
+
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_unmap(VirtIOIOMMU *s,
+                              struct virtio_iommu_req_unmap *req)
+{
+    uint32_t asid = le32_to_cpu(req->address_space);
+    uint64_t virt_addr = le64_to_cpu(req->virt_addr);
+    uint64_t size = le64_to_cpu(req->size);
+    uint32_t flags = le32_to_cpu(req->flags);
+
+    trace_virtio_iommu_unmap(asid, virt_addr, size, flags);
+
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+#define get_payload_size(req) (\
+sizeof((req)) - sizeof(struct virtio_iommu_req_tail))
+
 static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
                                       struct iovec *iov,
                                       unsigned int iov_cnt)
 {
-    return -ENOENT;
+    struct virtio_iommu_req_attach req;
+    size_t sz, payload_sz;
+
+    payload_sz = get_payload_size(req);
+
+    sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+    if (sz != payload_sz) {
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+    return virtio_iommu_attach(s, &req);
 }
 static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
                                       struct iovec *iov,
                                       unsigned int iov_cnt)
 {
-    return -ENOENT;
+    struct virtio_iommu_req_detach req;
+    size_t sz, payload_sz;
+
+    payload_sz = get_payload_size(req);
+
+    sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+    if (sz != payload_sz) {
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+    return virtio_iommu_detach(s, &req);
 }
 static int virtio_iommu_handle_map(VirtIOIOMMU *s,
                                    struct iovec *iov,
                                    unsigned int iov_cnt)
 {
-    return -ENOENT;
+    struct virtio_iommu_req_map req;
+    size_t sz, payload_sz;
+
+    payload_sz = get_payload_size(req);
+
+    sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+    if (sz != payload_sz) {
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+    return virtio_iommu_map(s, &req);
 }
 static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
                                      struct iovec *iov,
                                      unsigned int iov_cnt)
 {
-    return -ENOENT;
+    struct virtio_iommu_req_unmap req;
+    size_t sz, payload_sz;
+
+    payload_sz = get_payload_size(req);
+
+    sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+    if (sz != payload_sz) {
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+    return virtio_iommu_unmap(s, &req);
 }
 
 static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
-- 
2.5.5


* [Qemu-devel] [RFC v2 5/8] virtio_iommu: Add the iommu regions
  2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
                   ` (3 preceding siblings ...)
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 4/8] virtio-iommu: Decode the command payload Eric Auger
@ 2017-06-07 16:01 ` Eric Auger
  2017-06-12  5:59   ` Bharat Bhushan
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands Eric Auger
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Eric Auger @ 2017-06-07 16:01 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

This patch initializes the iommu memory regions so that
PCIe endpoint transactions get translated. The translation function
is not yet implemented at this stage.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
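Note: each PCI function gets its own IOMMU region, and the translate
callback identifies the requester by an ID built from the bus number and
devfn. The arithmetic can be sketched as below. The demo_* helpers are
hypothetical names used only for illustration; demo_sid() follows the
same formula as smmu_get_sid() in this patch, and the devfn encoding
(5-bit slot, 3-bit function) is the standard PCI one.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a PCI slot (5 bits) and function (3 bits) into a devfn byte. */
static uint8_t demo_devfn(unsigned int slot, unsigned int fn)
{
    assert(slot < 32 && fn < 8);
    return (uint8_t)((slot << 3) | fn);
}

/* Bus number in the high byte, devfn in the low byte. */
static uint16_t demo_sid(uint8_t bus_num, uint8_t devfn)
{
    return (uint16_t)(((unsigned int)bus_num << 8) | devfn);
}

/* Inverse helpers: recover the bus number and devfn from an ID. */
static uint8_t demo_sid_bus(uint16_t sid)
{
    return (uint8_t)(sid >> 8);
}

static uint8_t demo_sid_devfn(uint16_t sid)
{
    return (uint8_t)(sid & 0xff);
}
```

For example, function 3 of slot 2 on bus 1 has devfn 0x13 and therefore
ID 0x0113 under this scheme.
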
 hw/virtio/trace-events   |  1 +
 hw/virtio/virtio-iommu.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index fba1da6..341dbdf 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -32,3 +32,4 @@ virtio_iommu_attach(uint32_t as, uint32_t dev, uint32_t flags) "as=%d dev=%d fla
 virtio_iommu_detach(uint32_t dev, uint32_t flags) "dev=%d flags=%d"
 virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
 virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
+virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index ea1caa7..902c779 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -23,6 +23,7 @@
 #include "hw/virtio/virtio.h"
 #include "sysemu/kvm.h"
 #include "qapi-event.h"
+#include "qemu/error-report.h"
 #include "trace.h"
 
 #include "standard-headers/linux/virtio_ids.h"
@@ -35,6 +36,59 @@
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+static inline uint16_t smmu_get_sid(IOMMUDevice *dev)
+{
+    return  ((pci_bus_num(dev->bus) & 0xff) << 8) | dev->devfn;
+}
+
+static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
+                                              int devfn)
+{
+    VirtIOIOMMU *s = opaque;
+    uintptr_t key = (uintptr_t)bus;
+    IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, &key);
+    IOMMUDevice *sdev;
+
+    if (!sbus) {
+        uintptr_t *new_key = g_malloc(sizeof(*new_key));
+
+        *new_key = (uintptr_t)bus;
+        sbus = g_malloc0(sizeof(IOMMUPciBus) +
+                         sizeof(IOMMUDevice *) * IOMMU_PCI_DEVFN_MAX);
+        sbus->bus = bus;
+        g_hash_table_insert(s->as_by_busptr, new_key, sbus);
+    }
+
+    sdev = sbus->pbdev[devfn];
+    if (!sdev) {
+        sdev = sbus->pbdev[devfn] = g_malloc0(sizeof(IOMMUDevice));
+
+        sdev->viommu = s;
+        sdev->bus = bus;
+        sdev->devfn = devfn;
+
+        memory_region_init_iommu(&sdev->iommu_mr, OBJECT(s),
+                                 &s->iommu_ops, TYPE_VIRTIO_IOMMU,
+                                 UINT64_MAX);
+        address_space_init(&sdev->as, &sdev->iommu_mr, TYPE_VIRTIO_IOMMU);
+    }
+
+    return &sdev->as;
+
+}
+
+static void virtio_iommu_init_as(VirtIOIOMMU *s)
+{
+    PCIBus *pcibus = pci_find_primary_bus();
+
+    if (pcibus) {
+        pci_setup_iommu(pcibus, virtio_iommu_find_add_as, s);
+    } else {
+        error_report("No PCI bus, virtio-iommu is not registered");
+    }
+}
+
+
 static int virtio_iommu_attach(VirtIOIOMMU *s,
                                struct virtio_iommu_req_attach *req)
 {
@@ -208,6 +262,26 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
     }
 }
 
+static IOMMUTLBEntry virtio_iommu_translate(MemoryRegion *mr, hwaddr addr,
+                                            IOMMUAccessFlags flag)
+{
+    IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+    uint32_t sid;
+
+    IOMMUTLBEntry entry = {
+        .target_as = &address_space_memory,
+        .iova = addr,
+        .translated_addr = addr,
+        .addr_mask = ~(hwaddr)0,
+        .perm = IOMMU_NONE,
+    };
+
+    sid = smmu_get_sid(sdev);
+
+    trace_virtio_iommu_translate(mr->name, sid, addr, flag);
+    return entry;
+}
+
 static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
@@ -253,6 +327,21 @@ static const VMStateDescription vmstate_virtio_iommu_device = {
     },
 };
 
+/*****************************
+ * Hash Table
+ *****************************/
+
+static inline gboolean as_uint64_equal(gconstpointer v1, gconstpointer v2)
+{
+    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
+}
+
+static inline guint as_uint64_hash(gconstpointer v)
+{
+    return (guint)*(const uint64_t *)v;
+}
+
+
 static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -266,6 +355,14 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 
     s->config.page_sizes = ~((1ULL << 12) - 1);
     s->config.input_range.end = -1UL;
+
+    s->iommu_ops.translate = virtio_iommu_translate;
+    memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
+    s->as_by_busptr = g_hash_table_new_full(as_uint64_hash,
+                                            as_uint64_equal,
+                                            g_free, g_free);
+
+    virtio_iommu_init_as(s);
 }
 
 static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
                   ` (4 preceding siblings ...)
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 5/8] virtio_iommu: Add the iommu regions Eric Auger
@ 2017-06-07 16:01 ` Eric Auger
  2017-06-23 16:09   ` Jean-Philippe Brucker
                     ` (2 more replies)
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 7/8] hw/arm/virt: Add 2.10 machine type Eric Auger
                   ` (2 subsequent siblings)
  8 siblings, 3 replies; 73+ messages in thread
From: Eric Auger @ 2017-06-07 16:01 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

This patch adds the actual implementation for the translation routine
and the virtio-iommu commands.
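
The lookup idea behind the translation routine can be sketched outside
of QEMU as follows. This is only an illustration under stated
assumptions: it uses a sorted array and bsearch() instead of the GTree
used in the patch, it treats both interval bounds as inclusive and
compares with a strict '<' (the patch's interval_cmp uses '<='), and
the demo_* names are hypothetical and do not exist in the series. The
point it shows is that a comparator which reports overlapping intervals
as equal lets a single-address key find the mapping that contains it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for viommu_interval + viommu_mapping. */
typedef struct {
    uint64_t low;   /* first IOVA covered (inclusive) */
    uint64_t high;  /* last IOVA covered (inclusive) */
    uint64_t phys;  /* physical address that 'low' translates to */
} demo_mapping;

/* Overlapping intervals compare equal; disjoint ones are ordered. */
static int demo_interval_cmp(const void *a, const void *b)
{
    const demo_mapping *ia = a;
    const demo_mapping *ib = b;

    if (ia->high < ib->low) {
        return -1;
    } else if (ib->high < ia->low) {
        return 1;
    }
    return 0;
}

/*
 * Translate 'iova' using an array of non-overlapping mappings sorted
 * by 'low'. Returns 0 and stores the result in *out on success, or -1
 * when no mapping covers 'iova'.
 */
static int demo_translate(const demo_mapping *maps, size_t n,
                          uint64_t iova, uint64_t *out)
{
    /* A one-address interval used as the search key. */
    demo_mapping key = { .low = iova, .high = iova, .phys = 0 };
    const demo_mapping *m;

    assert(out != NULL);

    m = bsearch(&key, maps, n, sizeof(*maps), demo_interval_cmp);
    if (!m) {
        return -1;
    }
    /* Same arithmetic as the patch: offset within the mapping + base. */
    *out = iova - m->low + m->phys;
    return 0;
}
```

With mappings [0x1000, 0x1fff] and [0x4000, 0x5fff], a lookup of 0x1234
falls in the first one, while 0x3000 is covered by neither and is
reported as a miss.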

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v1 -> v2:
- fix compilation issue reported by autobuild system
---
 hw/virtio/trace-events   |   6 ++
 hw/virtio/virtio-iommu.c | 202 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 202 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 341dbdf..9196b63 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -33,3 +33,9 @@ virtio_iommu_detach(uint32_t dev, uint32_t flags) "dev=%d flags=%d"
 virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
 virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
 virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
+virtio_iommu_new_asid(uint32_t asid) "Allocate a new asid=%d"
+virtio_iommu_new_devid(uint32_t devid) "Allocate a new devid=%d"
+virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_translate_result(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 902c779..cd188fc 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -32,10 +32,37 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
 
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+typedef struct viommu_as viommu_as;
+
+typedef struct viommu_mapping {
+    uint64_t virt_addr;
+    uint64_t phys_addr;
+    uint64_t size;
+    uint32_t flags;
+} viommu_mapping;
+
+typedef struct viommu_interval {
+    uint64_t low;
+    uint64_t high;
+} viommu_interval;
+
+typedef struct viommu_dev {
+    uint32_t id;
+    viommu_as *as;
+} viommu_dev;
+
+struct viommu_as {
+    uint32_t id;
+    uint32_t nr_devices;
+    GTree *mappings;
+};
+
 static inline uint16_t smmu_get_sid(IOMMUDevice *dev)
 {
     return  ((pci_bus_num(dev->bus) & 0xff) << 8) | dev->devfn;
@@ -88,6 +115,19 @@ static void virtio_iommu_init_as(VirtIOIOMMU *s)
     }
 }
 
+static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+    viommu_interval *inta = (viommu_interval *)a;
+    viommu_interval *intb = (viommu_interval *)b;
+
+    if (inta->high <= intb->low) {
+        return -1;
+    } else if (intb->high <= inta->low) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
 
 static int virtio_iommu_attach(VirtIOIOMMU *s,
                                struct virtio_iommu_req_attach *req)
@@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
     uint32_t asid = le32_to_cpu(req->address_space);
     uint32_t devid = le32_to_cpu(req->device);
     uint32_t reserved = le32_to_cpu(req->reserved);
+    viommu_as *as;
+    viommu_dev *dev;
 
     trace_virtio_iommu_attach(asid, devid, reserved);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+    if (dev) {
+        return -1;
+    }
+
+    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
+    if (!as) {
+        as = g_malloc0(sizeof(*as));
+        as->id = asid;
+        as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
+                                         NULL, NULL, (GDestroyNotify)g_free);
+        g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
+        trace_virtio_iommu_new_asid(asid);
+    }
+
+    dev = g_malloc0(sizeof(*dev));
+    dev->as = as;
+    dev->id = devid;
+    as->nr_devices++;
+    trace_virtio_iommu_new_devid(devid);
+    g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
+
+    return VIRTIO_IOMMU_S_OK;
 }
 
 static int virtio_iommu_detach(VirtIOIOMMU *s,
@@ -106,10 +170,13 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
 {
     uint32_t devid = le32_to_cpu(req->device);
     uint32_t reserved = le32_to_cpu(req->reserved);
+    int ret;
 
     trace_virtio_iommu_detach(devid, reserved);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    ret = g_tree_remove(s->devices, GUINT_TO_POINTER(devid));
+
+    return ret ? VIRTIO_IOMMU_S_OK : VIRTIO_IOMMU_S_INVAL;
 }
 
 static int virtio_iommu_map(VirtIOIOMMU *s,
@@ -120,10 +187,37 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
     uint64_t virt_addr = le64_to_cpu(req->virt_addr);
     uint64_t size = le64_to_cpu(req->size);
     uint32_t flags = le32_to_cpu(req->flags);
+    viommu_as *as;
+    viommu_interval *interval;
+    viommu_mapping *mapping;
+
+    interval = g_malloc0(sizeof(*interval));
+
+    interval->low = virt_addr;
+    interval->high = virt_addr + size - 1;
+
+    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
+    if (!as) {
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+
+    mapping = g_tree_lookup(as->mappings, (gpointer)interval);
+    if (mapping) {
+        g_free(interval);
+        return VIRTIO_IOMMU_S_INVAL;
+    }
 
     trace_virtio_iommu_map(asid, phys_addr, virt_addr, size, flags);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    mapping = g_malloc0(sizeof(*mapping));
+    mapping->virt_addr = virt_addr;
+    mapping->phys_addr = phys_addr;
+    mapping->size = size;
+    mapping->flags = flags;
+
+    g_tree_insert(as->mappings, interval, mapping);
+
+    return VIRTIO_IOMMU_S_OK;
 }
 
 static int virtio_iommu_unmap(VirtIOIOMMU *s,
@@ -133,10 +227,64 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
     uint64_t virt_addr = le64_to_cpu(req->virt_addr);
     uint64_t size = le64_to_cpu(req->size);
     uint32_t flags = le32_to_cpu(req->flags);
+    viommu_mapping *mapping;
+    viommu_interval interval;
+    viommu_as *as;
 
     trace_virtio_iommu_unmap(asid, virt_addr, size, flags);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
+    if (!as) {
+        error_report("%s: no as", __func__);
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+    interval.low = virt_addr;
+    interval.high = virt_addr + size - 1;
+
+    mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
+
+    while (mapping) {
+        viommu_interval current;
+        uint64_t low  = mapping->virt_addr;
+        uint64_t high = mapping->virt_addr + mapping->size - 1;
+
+        current.low = low;
+        current.high = high;
+
+        if (low == interval.low && size >= mapping->size) {
+            g_tree_remove(as->mappings, (gpointer)&current);
+            interval.low = high + 1;
+            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
+                interval.low, interval.high);
+        } else if (high == interval.high && size >= mapping->size) {
+            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
+                interval.low, interval.high);
+            g_tree_remove(as->mappings, (gpointer)&current);
+            interval.high = low - 1;
+        } else if (low > interval.low && high < interval.high) {
+            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
+            g_tree_remove(as->mappings, (gpointer)&current);
+        } else {
+            break;
+        }
+        if (interval.low >= interval.high) {
+            return VIRTIO_IOMMU_S_OK;
+        } else {
+            mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
+        }
+    }
+
+    if (mapping) {
+        error_report("****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
+                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported",
+                     __func__, interval.low, size,
+                     mapping->virt_addr, mapping->size);
+    } else {
+        error_report("****** %s: no mapping for [0x%"PRIx64",0x%"PRIx64"]",
+                     __func__, interval.low, interval.high);
+    }
+
+    return VIRTIO_IOMMU_S_INVAL;
 }
 
 #define get_payload_size(req) (\
@@ -266,19 +414,46 @@ static IOMMUTLBEntry virtio_iommu_translate(MemoryRegion *mr, hwaddr addr,
                                             IOMMUAccessFlags flag)
 {
     IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+    VirtIOIOMMU *s = sdev->viommu;
     uint32_t sid;
+    viommu_dev *dev;
+    viommu_mapping *mapping;
+    viommu_interval interval;
+
+    interval.low = addr;
+    interval.high = addr + 1;
 
     IOMMUTLBEntry entry = {
         .target_as = &address_space_memory,
         .iova = addr,
         .translated_addr = addr,
-        .addr_mask = ~(hwaddr)0,
-        .perm = IOMMU_NONE,
+        .addr_mask = (1 << 12) - 1, /* TODO */
+        .perm = 3,
     };
 
     sid = smmu_get_sid(sdev);
 
     trace_virtio_iommu_translate(mr->name, sid, addr, flag);
+    qemu_mutex_lock(&s->mutex);
+
+    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(sid));
+    if (!dev) {
+        /* device cannot be attached to another as */
+        printf("%s sid=%d is not known!!\n", __func__, sid);
+        goto unlock;
+    }
+
+    mapping = g_tree_lookup(dev->as->mappings, (gpointer)&interval);
+    if (!mapping) {
+        printf("%s no mapping for 0x%"PRIx64" for sid=%d\n", __func__,
+               addr, sid);
+        goto unlock;
+    }
+    entry.translated_addr = addr - mapping->virt_addr + mapping->phys_addr;
+    trace_virtio_iommu_translate_result(addr, entry.translated_addr, sid);
+
+unlock:
+    qemu_mutex_unlock(&s->mutex);
     return entry;
 }
 
@@ -341,6 +516,12 @@ static inline guint as_uint64_hash(gconstpointer v)
     return (guint)*(const uint64_t *)v;
 }
 
+static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+    uint ua = GPOINTER_TO_UINT(a);
+    uint ub = GPOINTER_TO_UINT(b);
+    return (ua > ub) - (ua < ub);
+}
 
 static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 {
@@ -362,12 +543,21 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
                                             as_uint64_equal,
                                             g_free, g_free);
 
+    s->address_spaces = g_tree_new_full((GCompareDataFunc)int_cmp,
+                                         NULL, NULL, (GDestroyNotify)g_free);
+    s->devices = g_tree_new_full((GCompareDataFunc)int_cmp,
+                                         NULL, NULL, (GDestroyNotify)g_free);
+
     virtio_iommu_init_as(s);
 }
 
 static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
+
+    g_tree_destroy(s->address_spaces);
+    g_tree_destroy(s->devices);
 
     virtio_cleanup(vdev);
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [Qemu-devel] [RFC v2 7/8] hw/arm/virt: Add 2.10 machine type
  2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
                   ` (5 preceding siblings ...)
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands Eric Auger
@ 2017-06-07 16:01 ` Eric Auger
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 8/8] hw/arm/virt: Add virtio-iommu the virt board Eric Auger
  2017-06-09  6:16 ` [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Bharat Bhushan
  8 siblings, 0 replies; 73+ messages in thread
From: Eric Auger @ 2017-06-07 16:01 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

The new machine type allows virtio-iommu instantiation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

---
 hw/arm/virt.c         | 24 ++++++++++++++++++++++--
 include/hw/arm/virt.h |  1 +
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 010f724..6eb0d2a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1639,7 +1639,7 @@ static void machvirt_machine_init(void)
 }
 type_init(machvirt_machine_init);
 
-static void virt_2_9_instance_init(Object *obj)
+static void virt_2_10_instance_init(Object *obj)
 {
     VirtMachineState *vms = VIRT_MACHINE(obj);
     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
@@ -1699,10 +1699,30 @@ static void virt_2_9_instance_init(Object *obj)
     vms->irqmap = a15irqmap;
 }
 
+static void virt_machine_2_10_options(MachineClass *mc)
+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(2, 10)
+
+#define VIRT_COMPAT_2_9 \
+    HW_COMPAT_2_9
+
+static void virt_2_9_instance_init(Object *obj)
+{
+    virt_2_10_instance_init(obj);
+}
+
 static void virt_machine_2_9_options(MachineClass *mc)
 {
+    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
+    virt_machine_2_10_options(mc);
+    SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_9);
+
+    vmc->no_iommu = true;
 }
-DEFINE_VIRT_MACHINE_AS_LATEST(2, 9)
+DEFINE_VIRT_MACHINE(2, 9)
+
 
 #define VIRT_COMPAT_2_8 \
     HW_COMPAT_2_8
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 33b0ff3..ff27551 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -84,6 +84,7 @@ typedef struct {
     bool disallow_affinity_adjustment;
     bool no_its;
     bool no_pmu;
+    bool no_iommu;
     bool claim_edge_triggered_timers;
 } VirtMachineClass;
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [Qemu-devel] [RFC v2 8/8] hw/arm/virt: Add virtio-iommu the virt board
  2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
                   ` (6 preceding siblings ...)
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 7/8] hw/arm/virt: Add 2.10 machine type Eric Auger
@ 2017-06-07 16:01 ` Eric Auger
  2017-06-09  6:16 ` [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Bharat Bhushan
  8 siblings, 0 replies; 73+ messages in thread
From: Eric Auger @ 2017-06-07 16:01 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

The dedicated virtio-mmio node is unconditionally added at
machine init, while the binding between this node and the
PCIe host bridge is done in the machine init done notifier,
and only if -device virtio-iommu-device was passed on the
QEMU command line.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
---
 hw/arm/virt.c         | 92 +++++++++++++++++++++++++++++++++++++++++++++++----
 include/hw/arm/virt.h |  4 +++
 2 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6eb0d2a..6bcfbcd 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -52,6 +52,7 @@
 #include "hw/arm/fdt.h"
 #include "hw/intc/arm_gic.h"
 #include "hw/intc/arm_gicv3_common.h"
+#include "hw/virtio/virtio-iommu.h"
 #include "kvm_arm.h"
 #include "hw/smbios/smbios.h"
 #include "qapi/visitor.h"
@@ -139,6 +140,7 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_FW_CFG] =             { 0x09020000, 0x00000018 },
     [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
     [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
+    [VIRT_SMMU] =               { 0x09050000, 0x00000200 },
     [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
     /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
     [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },
@@ -159,6 +161,7 @@ static const int a15irqmap[] = {
     [VIRT_SECURE_UART] = 8,
     [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
     [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
+    [VIRT_SMMU] = 74,
     [VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS -1 */
 };
 
@@ -991,7 +994,81 @@ static void create_pcie_irq_map(const VirtMachineState *vms,
                            0x7           /* PCI irq */);
 }
 
-static void create_pcie(const VirtMachineState *vms, qemu_irq *pic)
+static int bind_virtio_iommu_device(Object *obj, void *opaque)
+{
+    VirtMachineState *vms = (VirtMachineState *)opaque;
+    struct arm_boot_info *info = &vms->bootinfo;
+    int dtb_size;
+    void *fdt = info->get_dtb(info, &dtb_size);
+    Object *dev;
+
+    dev = object_dynamic_cast(obj, TYPE_VIRTIO_IOMMU);
+
+    if (!dev) {
+        /* Container, traverse it for children */
+        return object_child_foreach(obj, bind_virtio_iommu_device, opaque);
+    }
+
+    qemu_fdt_setprop_cells(fdt, vms->pcie_host_nodename, "iommu-map",
+                           0x0, vms->smmu_phandle, 0x0, 0x10000);
+
+    return true;
+}
+
+static
+void virtio_iommu_notifier(Notifier *notifier, void *data)
+{
+    VirtMachineState *vms = container_of(notifier, VirtMachineState,
+                                         virtio_iommu_done);
+    VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+    Object *container;
+
+
+    if (vmc->no_iommu) {
+        return;
+    }
+
+    container = container_get(qdev_get_machine(), "/peripheral");
+    bind_virtio_iommu_device(container, vms);
+    container = container_get(qdev_get_machine(), "/peripheral-anon");
+    bind_virtio_iommu_device(container, vms);
+}
+
+static void create_virtio_iommu(VirtMachineState *vms, qemu_irq *pic)
+{
+    char *smmu;
+    const char compat[] = "virtio,mmio";
+    int irq =  vms->irqmap[VIRT_SMMU];
+    hwaddr base = vms->memmap[VIRT_SMMU].base;
+    hwaddr size = vms->memmap[VIRT_SMMU].size;
+    VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+
+    if (vmc->no_iommu) {
+        return;
+    }
+
+    vms->smmu_phandle = qemu_fdt_alloc_phandle(vms->fdt);
+
+    sysbus_create_simple("virtio-mmio", base, pic[irq]);
+
+    smmu = g_strdup_printf("/virtio_mmio@%" PRIx64, base);
+    qemu_fdt_add_subnode(vms->fdt, smmu);
+    qemu_fdt_setprop(vms->fdt, smmu, "compatible", compat, sizeof(compat));
+    qemu_fdt_setprop_sized_cells(vms->fdt, smmu, "reg", 2, base, 2, size);
+
+    qemu_fdt_setprop_cells(vms->fdt, smmu, "interrupts",
+            GIC_FDT_IRQ_TYPE_SPI, irq, GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
+
+    qemu_fdt_setprop(vms->fdt, smmu, "dma-coherent", NULL, 0);
+    qemu_fdt_setprop_cell(vms->fdt, smmu, "#iommu-cells", 1);
+    qemu_fdt_setprop_cell(vms->fdt, smmu, "phandle", vms->smmu_phandle);
+    g_free(smmu);
+
+    vms->virtio_iommu_done.notify = virtio_iommu_notifier;
+    qemu_add_machine_init_done_notifier(&vms->virtio_iommu_done);
+}
+
+static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
 {
     hwaddr base_mmio = vms->memmap[VIRT_PCIE_MMIO].base;
     hwaddr size_mmio = vms->memmap[VIRT_PCIE_MMIO].size;
@@ -1064,7 +1141,8 @@ static void create_pcie(const VirtMachineState *vms, qemu_irq *pic)
         }
     }
 
-    nodename = g_strdup_printf("/pcie@%" PRIx64, base);
+    vms->pcie_host_nodename = g_strdup_printf("/pcie@%" PRIx64, base);
+    nodename = vms->pcie_host_nodename;
     qemu_fdt_add_subnode(vms->fdt, nodename);
     qemu_fdt_setprop_string(vms->fdt, nodename,
                             "compatible", "pci-host-ecam-generic");
@@ -1103,7 +1181,6 @@ static void create_pcie(const VirtMachineState *vms, qemu_irq *pic)
     qemu_fdt_setprop_cell(vms->fdt, nodename, "#interrupt-cells", 1);
     create_pcie_irq_map(vms, vms->gic_phandle, irq, nodename);
 
-    g_free(nodename);
 }
 
 static void create_platform_bus(VirtMachineState *vms, qemu_irq *pic)
@@ -1448,16 +1525,16 @@ static void machvirt_init(MachineState *machine)
 
     create_rtc(vms, pic);
 
-    create_pcie(vms, pic);
-
-    create_gpio(vms, pic);
-
     /* Create mmio transports, so the user can create virtio backends
      * (which will be automatically plugged in to the transports). If
      * no backend is created the transport will just sit harmlessly idle.
      */
     create_virtio_devices(vms, pic);
 
+    create_pcie(vms, pic);
+
+    create_gpio(vms, pic);
+
     vms->fw_cfg = create_fw_cfg(vms, &address_space_memory);
     rom_set_fw(vms->fw_cfg);
 
@@ -1482,6 +1559,7 @@ static void machvirt_init(MachineState *machine)
      * Notifiers are executed in registration reverse order.
      */
     create_platform_bus(vms, pic);
+    create_virtio_iommu(vms, pic);
 }
 
 static bool virt_get_secure(Object *obj, Error **errp)
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ff27551..070cb39 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -59,6 +59,7 @@ enum {
     VIRT_GIC_V2M,
     VIRT_GIC_ITS,
     VIRT_GIC_REDIST,
+    VIRT_SMMU,
     VIRT_UART,
     VIRT_MMIO,
     VIRT_RTC,
@@ -91,6 +92,7 @@ typedef struct {
 typedef struct {
     MachineState parent;
     Notifier machine_done;
+    Notifier virtio_iommu_done;
     FWCfgState *fw_cfg;
     bool secure;
     bool highmem;
@@ -106,6 +108,8 @@ typedef struct {
     uint32_t clock_phandle;
     uint32_t gic_phandle;
     uint32_t msi_phandle;
+    uint32_t smmu_phandle;
+    char *pcie_host_nodename;
     int psci_conduit;
 } VirtMachineState;
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 3/8] virtio_iommu: add skeleton
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 3/8] virtio_iommu: add skeleton Eric Auger
@ 2017-06-08 11:09   ` Bharat Bhushan
  2017-06-23 16:08     ` Jean-Philippe Brucker
  0 siblings, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-06-08 11:09 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn



> -----Original Message-----
> From: Eric Auger [mailto:eric.auger@redhat.com]
> Sent: Wednesday, June 07, 2017 9:31 PM
> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
> philippe.brucker@arm.com
> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
> wei@redhat.com; tn@semihalf.com; Bharat Bhushan
> <bharat.bhushan@nxp.com>
> Subject: [RFC v2 3/8] virtio_iommu: add skeleton
> 
> This patch adds the skeleton for the virtio-iommu device.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/virtio/Makefile.objs          |   1 +
>  hw/virtio/virtio-iommu.c         | 247
> +++++++++++++++++++++++++++++++++++++++
>  include/hw/virtio/virtio-iommu.h |  60 ++++++++++
>  3 files changed, 308 insertions(+)
>  create mode 100644 hw/virtio/virtio-iommu.c
>  create mode 100644 include/hw/virtio/virtio-iommu.h
> 
> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> index 765d363..8967a4a 100644
> --- a/hw/virtio/Makefile.objs
> +++ b/hw/virtio/Makefile.objs
> @@ -6,6 +6,7 @@ common-obj-y += virtio-mmio.o
> 
>  obj-y += virtio.o virtio-balloon.o
>  obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> +obj-$(CONFIG_LINUX) += virtio-iommu.o
>  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
>  obj-y += virtio-crypto.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> new file mode 100644
> index 0000000..86129ef
> --- /dev/null
> +++ b/hw/virtio/virtio-iommu.c
> @@ -0,0 +1,247 @@
> +/*
> + * virtio-iommu device
> + *
> + * Copyright (c) 2017 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/iov.h"
> +#include "qemu-common.h"
> +#include "hw/virtio/virtio.h"
> +#include "sysemu/kvm.h"
> +#include "qapi-event.h"
> +#include "trace.h"
> +
> +#include "standard-headers/linux/virtio_ids.h"
> +#include <linux/virtio_iommu.h>
> +
> +#include "hw/virtio/virtio-bus.h"
> +#include "hw/virtio/virtio-access.h"
> +#include "hw/virtio/virtio-iommu.h"
> +
> +/* Max size */
> +#define VIOMMU_DEFAULT_QUEUE_SIZE 256
> +
> +static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
> +                                      struct iovec *iov,
> +                                      unsigned int iov_cnt)
> +{
> +    return -ENOENT;
> +}
> +static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
> +                                      struct iovec *iov,
> +                                      unsigned int iov_cnt)
> +{
> +    return -ENOENT;
> +}
> +static int virtio_iommu_handle_map(VirtIOIOMMU *s,
> +                                   struct iovec *iov,
> +                                   unsigned int iov_cnt)
> +{
> +    return -ENOENT;
> +}
> +static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
> +                                     struct iovec *iov,
> +                                     unsigned int iov_cnt)
> +{
> +    return -ENOENT;
> +}
> +
> +static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue
> *vq)
> +{
> +    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
> +    VirtQueueElement *elem;
> +    struct virtio_iommu_req_head head;
> +    struct virtio_iommu_req_tail tail;
> +    unsigned int iov_cnt;
> +    struct iovec *iov;
> +    size_t sz;
> +
> +    for (;;) {
> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +        if (!elem) {
> +            return;
> +        }
> +
> +        if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
> +            iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
> +            virtio_error(vdev, "virtio-iommu erroneous head or tail");
> +            virtqueue_detach_element(vq, elem, 0);
> +            g_free(elem);
> +            break;
> +        }
> +
> +        iov_cnt = elem->out_num;
> +        iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem-
> >out_num);
> +        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
> +        if (sz != sizeof(head)) {
> +            tail.status = VIRTIO_IOMMU_S_UNSUPP;
> +        }
> +        qemu_mutex_lock(&s->mutex);
> +        switch (head.type) {
> +        case VIRTIO_IOMMU_T_ATTACH:
> +            tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
> +            break;
> +        case VIRTIO_IOMMU_T_DETACH:
> +            tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
> +            break;
> +        case VIRTIO_IOMMU_T_MAP:
> +            tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
> +            break;
> +        case VIRTIO_IOMMU_T_UNMAP:
> +            tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
> +            break;
> +        default:
> +            tail.status = VIRTIO_IOMMU_S_UNSUPP;
> +        }
> +        qemu_mutex_unlock(&s->mutex);
> +
> +        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
> +                          &tail, sizeof(tail));
> +        assert(sz == sizeof(tail));
> +
> +        virtqueue_push(vq, elem, sizeof(tail));
> +        virtio_notify(vdev, vq);
> +        g_free(elem);
> +    }
> +}
> +
> +static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t
> *config_data)
> +{
> +    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
> +
> +    memcpy(config_data, &dev->config, sizeof(struct virtio_iommu_config));
> +}
> +
> +static void virtio_iommu_set_config(VirtIODevice *vdev,
> +                                      const uint8_t *config_data)
> +{
> +    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
> +    struct virtio_iommu_config config;
> +
> +    memcpy(&config, config_data, sizeof(struct virtio_iommu_config));
> +
> +    dev->config.page_sizes = le64_to_cpu(config.page_sizes);
> +    dev->config.input_range.end = le64_to_cpu(config.input_range.end);
> +}
> +
> +static uint64_t virtio_iommu_get_features(VirtIODevice *vdev, uint64_t f,
> +                                            Error **errp)
> +{
> +    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
> +    f |= dev->host_features;
> +    virtio_add_feature(&f, VIRTIO_RING_F_EVENT_IDX);
> +    virtio_add_feature(&f, VIRTIO_RING_F_INDIRECT_DESC);
> +    virtio_add_feature(&f, VIRTIO_IOMMU_F_INPUT_RANGE);
> +    return f;
> +}
> +
> +static int virtio_iommu_post_load_device(void *opaque, int version_id)
> +{
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_virtio_iommu_device = {
> +    .name = "virtio-iommu-device",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .post_load = virtio_iommu_post_load_device,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
> +{
> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
> +
> +    virtio_init(vdev, "virtio-iommu", VIRTIO_ID_IOMMU,
> +                sizeof(struct virtio_iommu_config));
> +
> +    s->vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE,
> +                             virtio_iommu_handle_command);
> +
> +    s->config.page_sizes = ~((1ULL << 12) - 1);

This is hardcoded to 4K. Should this be aligned to the host page size?
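
To make the question concrete, here is a minimal sketch (not taken from the
patch) of how the mask could be derived from a page shift instead of the
literal 12. The helper names viommu_page_size_mask() and viommu_min_granule()
are hypothetical, and where the shift would come from (for example the host
page size) is deliberately left open:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: build a page_sizes bitmap
 * that advertises every power-of-two granule >= (1 << page_shift).
 * With page_shift == 12 this yields the same value as the hardcoded
 * ~((1ULL << 12) - 1) in virtio_iommu_device_realize().
 */
uint64_t viommu_page_size_mask(unsigned int page_shift)
{
    assert(page_shift < 64);
    return ~((UINT64_C(1) << page_shift) - 1);
}

/*
 * Smallest granule advertised by a page_sizes bitmap (its lowest set bit),
 * or 0 if the bitmap is empty.
 */
uint64_t viommu_min_granule(uint64_t page_sizes)
{
    return page_sizes & (~page_sizes + 1);
}
```

With a shift of 16 (64K pages) the smallest advertised granule becomes 64K,
so the guest driver would no longer be allowed to create 4K mappings.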

Thanks
-Bharat

> +    s->config.input_range.end = -1UL;
> +}
> +
> +static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
> +{
> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +
> +    virtio_cleanup(vdev);
> +}
> +
> +static void virtio_iommu_device_reset(VirtIODevice *vdev)
> +{
> +}
> +
> +static void virtio_iommu_set_status(VirtIODevice *vdev, uint8_t status)
> +{
> +}
> +
> +static void virtio_iommu_instance_init(Object *obj)
> +{
> +}
> +
> +static const VMStateDescription vmstate_virtio_iommu = {
> +    .name = "virtio-iommu",
> +    .minimum_version_id = 1,
> +    .version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_VIRTIO_DEVICE,
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static Property virtio_iommu_properties[] = {
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void virtio_iommu_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> +
> +    dc->props = virtio_iommu_properties;
> +    dc->vmsd = &vmstate_virtio_iommu;
> +
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    vdc->realize = virtio_iommu_device_realize;
> +    vdc->unrealize = virtio_iommu_device_unrealize;
> +    vdc->reset = virtio_iommu_device_reset;
> +    vdc->get_config = virtio_iommu_get_config;
> +    vdc->set_config = virtio_iommu_set_config;
> +    vdc->get_features = virtio_iommu_get_features;
> +    vdc->set_status = virtio_iommu_set_status;
> +    vdc->vmsd = &vmstate_virtio_iommu_device;
> +}
> +
> +static const TypeInfo virtio_iommu_info = {
> +    .name = TYPE_VIRTIO_IOMMU,
> +    .parent = TYPE_VIRTIO_DEVICE,
> +    .instance_size = sizeof(VirtIOIOMMU),
> +    .instance_init = virtio_iommu_instance_init,
> +    .class_init = virtio_iommu_class_init,
> +};
> +
> +static void virtio_register_types(void)
> +{
> +    type_register_static(&virtio_iommu_info);
> +}
> +
> +type_init(virtio_register_types)
> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> new file mode 100644
> index 0000000..2259413
> --- /dev/null
> +++ b/include/hw/virtio/virtio-iommu.h
> @@ -0,0 +1,60 @@
> +/*
> + * virtio-iommu device
> + *
> + * Copyright (c) 2017 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +
> +#ifndef QEMU_VIRTIO_IOMMU_H
> +#define QEMU_VIRTIO_IOMMU_H
> +
> +#include "standard-headers/linux/virtio_iommu.h"
> +#include "hw/virtio/virtio.h"
> +#include "hw/pci/pci.h"
> +
> +#define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
> +#define VIRTIO_IOMMU(obj) \
> +        OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
> +
> +#define IOMMU_PCI_BUS_MAX      256
> +#define IOMMU_PCI_DEVFN_MAX    256
> +
> +typedef struct IOMMUDevice {
> +    void         *viommu;
> +    PCIBus       *bus;
> +    int           devfn;
> +    MemoryRegion  iommu_mr;
> +    AddressSpace  as;
> +} IOMMUDevice;
> +
> +typedef struct IOMMUPciBus {
> +    PCIBus       *bus;
> +    IOMMUDevice  *pbdev[0]; /* Parent array is sparse, so dynamically alloc */
> +} IOMMUPciBus;
> +
> +typedef struct VirtIOIOMMU {
> +    VirtIODevice parent_obj;
> +    VirtQueue *vq;
> +    struct virtio_iommu_config config;
> +    MemoryRegionIOMMUOps iommu_ops;
> +    uint32_t host_features;
> +    GHashTable *as_by_busptr;
> +    IOMMUPciBus *as_by_bus_num[IOMMU_PCI_BUS_MAX];
> +    GTree *address_spaces;
> +    QemuMutex mutex;
> +    GTree *devices;
> +} VirtIOIOMMU;
> +
> +#endif
> --
> 2.5.5


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
                   ` (7 preceding siblings ...)
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 8/8] hw/arm/virt: Add virtio-iommu the virt board Eric Auger
@ 2017-06-09  6:16 ` Bharat Bhushan
  2017-06-09  6:43   ` Auger Eric
  8 siblings, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-06-09  6:16 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn

Hi Eric,

> -----Original Message-----
> From: Eric Auger [mailto:eric.auger@redhat.com]
> Sent: Wednesday, June 07, 2017 9:31 PM
> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
> philippe.brucker@arm.com
> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
> wei@redhat.com; tn@semihalf.com; Bharat Bhushan
> <bharat.bhushan@nxp.com>
> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
> 
> This series implements the virtio-iommu device. This is a proof of concept
> based on the virtio-iommu specification written by Jean-Philippe Brucker [1].
> This was tested with a guest using the virtio-iommu driver [2] and exposed
> with a virtio-net-pci using dma ops.
> 
> The device gets instantiated using the "-device virtio-iommu-device"
> option. It currently works with ARM virt machine only as the machine must
> handle the dt binding between the virtio-mmio "iommu" node and the PCI
> host bridge node. ACPI booting is not yet supported.
> 
> This should allow to start some benchmarking activities against pure
> emulated IOMMU (especially ARM SMMU).

I am testing this on ARM64 and see the following error printed continuously:

	virtio_iommu_translate sid=8 is not known!!
	virtio_iommu_translate sid=8 is not known!!
	virtio_iommu_translate sid=8 is not known!!
	virtio_iommu_translate sid=8 is not known!!
	virtio_iommu_translate sid=8 is not known!!
	virtio_iommu_translate sid=8 is not known!!
	virtio_iommu_translate sid=8 is not known!!
	virtio_iommu_translate sid=8 is not known!!
	virtio_iommu_translate sid=8 is not known!!
	virtio_iommu_translate sid=8 is not known!! 
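
For reference when reading these messages: assuming the sid is the usual PCI
requester ID encoding (bus number in bits 15:8, devfn in bits 7:0, with
devfn = slot << 3 | function), sid=8 would be the device at 00:01.0. That
encoding is an assumption here, as it is not shown in this part of the
thread, and the names sid_to_bdf() and format_sid() below are hypothetical.
A minimal sketch of the decoding:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded PCI bus/device/function triple. */
struct pci_bdf {
    unsigned int bus;
    unsigned int slot;
    unsigned int func;
};

/*
 * Hypothetical helper: split a 16-bit requester ID into bus, slot and
 * function, assuming sid = (bus << 8) | (slot << 3) | func.
 */
struct pci_bdf sid_to_bdf(uint32_t sid)
{
    struct pci_bdf bdf;

    assert(sid <= 0xffff);
    bdf.bus = (sid >> 8) & 0xff;
    bdf.slot = (sid >> 3) & 0x1f;
    bdf.func = sid & 0x7;
    return bdf;
}

/* Format a sid in the bb:ss.f notation used by lspci. */
int format_sid(uint32_t sid, char *buf, size_t len)
{
    struct pci_bdf bdf = sid_to_bdf(sid);

    return snprintf(buf, len, "%02x:%02x.%x", bdf.bus, bdf.slot, bdf.func);
}
```

Under that assumption the translate callback is being invoked for the device
in slot 1 of bus 0 before any attach request has been seen for it.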


Also, in the guest I do not see a device-tree node for virtio-iommu.
I am using the QEMU tree you mention below and the IOMMU driver patches published by Jean-Philippe.
The QEMU command line has the additional "-device virtio-iommu-device" option. What am I missing?

Thanks
-Bharat

> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
> 
> References:
> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH linux]
> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add virtio-
> iommu
> 
> History:
> v1 -> v2:
> - fix redifinition of viommu_as typedef
> 
> Eric Auger (8):
>   update-linux-headers: import virtio_iommu.h
>   linux-headers: Update for virtio-iommu
>   virtio_iommu: add skeleton
>   virtio-iommu: Decode the command payload
>   virtio_iommu: Add the iommu regions
>   virtio-iommu: Implement the translation and commands
>   hw/arm/virt: Add 2.10 machine type
>   hw/arm/virt: Add virtio-iommu the virt board
> 
>  hw/arm/virt.c                                 | 116 ++++-
>  hw/virtio/Makefile.objs                       |   1 +
>  hw/virtio/trace-events                        |  14 +
>  hw/virtio/virtio-iommu.c                      | 623 ++++++++++++++++++++++++++
>  include/hw/arm/virt.h                         |   5 +
>  include/hw/virtio/virtio-iommu.h              |  60 +++
>  include/standard-headers/linux/virtio_ids.h   |   1 +
>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
>  linux-headers/linux/virtio_iommu.h            |   1 +
>  scripts/update-linux-headers.sh               |   3 +
>  10 files changed, 957 insertions(+), 9 deletions(-)  create mode 100644
> hw/virtio/virtio-iommu.c  create mode 100644 include/hw/virtio/virtio-
> iommu.h  create mode 100644 include/standard-
> headers/linux/virtio_iommu.h
>  create mode 100644 linux-headers/linux/virtio_iommu.h
> 
> --
> 2.5.5


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-09  6:16 ` [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Bharat Bhushan
@ 2017-06-09  6:43   ` Auger Eric
  2017-06-09 11:30     ` Bharat Bhushan
  0 siblings, 1 reply; 73+ messages in thread
From: Auger Eric @ 2017-06-09  6:43 UTC (permalink / raw)
  To: Bharat Bhushan, eric.auger.pro, peter.maydell, alex.williamson,
	mst, qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn

Hi Bharat,

On 09/06/2017 08:16, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Eric Auger [mailto:eric.auger@redhat.com]
>> Sent: Wednesday, June 07, 2017 9:31 PM
>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
>> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
>> philippe.brucker@arm.com
>> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
>> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
>> wei@redhat.com; tn@semihalf.com; Bharat Bhushan
>> <bharat.bhushan@nxp.com>
>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> This series implements the virtio-iommu device. This is a proof of concept
>> based on the virtio-iommu specification written by Jean-Philippe Brucker [1].
>> This was tested with a guest using the virtio-iommu driver [2] and exposed
>> with a virtio-net-pci using dma ops.
>>
>> The device gets instantiated using the "-device virtio-iommu-device"
>> option. It currently works with ARM virt machine only as the machine must
>> handle the dt binding between the virtio-mmio "iommu" node and the PCI
>> host bridge node. ACPI booting is not yet supported.
>>
>> This should allow to start some benchmarking activities against pure
>> emulated IOMMU (especially ARM SMMU).
> 
> I am testing this on ARM64 and see below continuous error prints:
> 
> 	virtio_iommu_translate sid=8 is not known!!
> 	virtio_iommu_translate sid=8 is not known!!
> 	virtio_iommu_translate sid=8 is not known!!
> 	virtio_iommu_translate sid=8 is not known!!
> 	virtio_iommu_translate sid=8 is not known!!
> 	virtio_iommu_translate sid=8 is not known!!
> 	virtio_iommu_translate sid=8 is not known!!
> 	virtio_iommu_translate sid=8 is not known!!
> 	virtio_iommu_translate sid=8 is not known!!
> 	virtio_iommu_translate sid=8 is not known!! 
> 
> 
> Also in guest I do not see device-tree node with virtio-iommu.
Do you mean the virtio-mmio node with the #iommu-cells property?

This one is created statically by the virt machine. I would be surprised if
it were not there. Are you using the virt (i.e. virt-2.10) machine? Earlier
machine versions do not support its instantiation.

Could you add a printf in hw/arm/virt.c create_virtio_mmio() at the point
where this node is created? You can also add a printf in
bind_virtio_iommu_device() to make sure the binding with the PCI host
bridge is added at machine init done.

It is also worth checking that CONFIG_VIRTIO_IOMMU=y is set on the guest side.

Thanks

Eric

> I am using qemu-tree you mentioned below and iommu-driver patches published by Jean-P.
> Qemu command line have additional ""-device virtio-iommu-device". What I am missing ?


> 
> Thanks
> -Bharat
> 
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
>>
>> References:
>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH linux]
>> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add virtio-
>> iommu
>>
>> History:
>> v1 -> v2:
>> - fix redifinition of viommu_as typedef
>>
>> Eric Auger (8):
>>   update-linux-headers: import virtio_iommu.h
>>   linux-headers: Update for virtio-iommu
>>   virtio_iommu: add skeleton
>>   virtio-iommu: Decode the command payload
>>   virtio_iommu: Add the iommu regions
>>   virtio-iommu: Implement the translation and commands
>>   hw/arm/virt: Add 2.10 machine type
>>   hw/arm/virt: Add virtio-iommu the virt board
>>
>>  hw/arm/virt.c                                 | 116 ++++-
>>  hw/virtio/Makefile.objs                       |   1 +
>>  hw/virtio/trace-events                        |  14 +
>>  hw/virtio/virtio-iommu.c                      | 623 ++++++++++++++++++++++++++
>>  include/hw/arm/virt.h                         |   5 +
>>  include/hw/virtio/virtio-iommu.h              |  60 +++
>>  include/standard-headers/linux/virtio_ids.h   |   1 +
>>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
>>  linux-headers/linux/virtio_iommu.h            |   1 +
>>  scripts/update-linux-headers.sh               |   3 +
>>  10 files changed, 957 insertions(+), 9 deletions(-)  create mode 100644
>> hw/virtio/virtio-iommu.c  create mode 100644 include/hw/virtio/virtio-
>> iommu.h  create mode 100644 include/standard-
>> headers/linux/virtio_iommu.h
>>  create mode 100644 linux-headers/linux/virtio_iommu.h
>>
>> --
>> 2.5.5
> 


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-09  6:43   ` Auger Eric
@ 2017-06-09 11:30     ` Bharat Bhushan
  2017-06-09 11:53       ` Auger Eric
  2017-06-09 12:15       ` Auger Eric
  0 siblings, 2 replies; 73+ messages in thread
From: Bharat Bhushan @ 2017-06-09 11:30 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Friday, June 09, 2017 12:14 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
> wei@redhat.com; tn@semihalf.com
> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 09/06/2017 08:16, Bharat Bhushan wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Eric Auger [mailto:eric.auger@redhat.com]
> >> Sent: Wednesday, June 07, 2017 9:31 PM
> >> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> >> peter.maydell@linaro.org; alex.williamson@redhat.com;
> mst@redhat.com;
> >> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
> >> philippe.brucker@arm.com
> >> Cc: will.deacon@arm.com; robin.murphy@arm.com;
> kevin.tian@intel.com;
> >> marc.zyngier@arm.com; christoffer.dall@linaro.org;
> >> drjones@redhat.com; wei@redhat.com; tn@semihalf.com; Bharat
> Bhushan
> >> <bharat.bhushan@nxp.com>
> >> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> This series implements the virtio-iommu device. This is a proof of
> >> concept based on the virtio-iommu specification written by Jean-Philippe
> Brucker [1].
> >> This was tested with a guest using the virtio-iommu driver [2] and
> >> exposed with a virtio-net-pci using dma ops.
> >>
> >> The device gets instantiated using the "-device virtio-iommu-device"
> >> option. It currently works with ARM virt machine only as the machine
> >> must handle the dt binding between the virtio-mmio "iommu" node and
> >> the PCI host bridge node. ACPI booting is not yet supported.
> >>
> >> This should allow to start some benchmarking activities against pure
> >> emulated IOMMU (especially ARM SMMU).
> >
> > I am testing this on ARM64 and see below continuous error prints:
> >
> > 	virtio_iommu_translate sid=8 is not known!!
> > 	virtio_iommu_translate sid=8 is not known!!
> > 	virtio_iommu_translate sid=8 is not known!!
> > 	virtio_iommu_translate sid=8 is not known!!
> > 	virtio_iommu_translate sid=8 is not known!!
> > 	virtio_iommu_translate sid=8 is not known!!
> > 	virtio_iommu_translate sid=8 is not known!!
> > 	virtio_iommu_translate sid=8 is not known!!
> > 	virtio_iommu_translate sid=8 is not known!!
> > 	virtio_iommu_translate sid=8 is not known!!
> >
> >
> > Also in guest I do not see device-tree node with virtio-iommu.
> do you mean the virtio-mmio with #iommu-cells property?
> 
> This one is created statically by virt machine. I would be surprised if it were
> not there. Are you using the virt = virt2.10 machine. Machines before do not
> support its instantiation.
> 
> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
> moment when this node is created. Also you can add a printf in
> bind_virtio_iommu_device() to make sure the binding with the PCI host
> bridge is added on machine init done.
> 
> Also worth to check, CONFIG_VIRTIO_IOMMU=y on guest side.

It works on my side now. The driver config option was disabled, and the guest kernel I was using did not have deferred probing. After fixing both, it works.
I added some prints and confirmed that the dma-map calls map regions through virtio-iommu, so the emulated IOMMU is being used.

I will now continue with adding VFIO support on top of this, and with more testing!

Thanks
-Bharat

> 
> Thanks
> 
> Eric
> 
> > I am using qemu-tree you mentioned below and iommu-driver patches
> published by Jean-P.
> > Qemu command line have additional ""-device virtio-iommu-device". What
> I am missing ?
> 
> 
> >
> > Thanks
> > -Bharat
> >
> >>
> >> Best Regards
> >>
> >> Eric
> >>
> >> This series can be found at:
> >> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
> >>
> >> References:
> >> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH
> >> linux]
> >> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add
> >> virtio- iommu
> >>
> >> History:
> >> v1 -> v2:
> >> - fix redifinition of viommu_as typedef
> >>
> >> Eric Auger (8):
> >>   update-linux-headers: import virtio_iommu.h
> >>   linux-headers: Update for virtio-iommu
> >>   virtio_iommu: add skeleton
> >>   virtio-iommu: Decode the command payload
> >>   virtio_iommu: Add the iommu regions
> >>   virtio-iommu: Implement the translation and commands
> >>   hw/arm/virt: Add 2.10 machine type
> >>   hw/arm/virt: Add virtio-iommu the virt board
> >>
> >>  hw/arm/virt.c                                 | 116 ++++-
> >>  hw/virtio/Makefile.objs                       |   1 +
> >>  hw/virtio/trace-events                        |  14 +
> >>  hw/virtio/virtio-iommu.c                      | 623
> ++++++++++++++++++++++++++
> >>  include/hw/arm/virt.h                         |   5 +
> >>  include/hw/virtio/virtio-iommu.h              |  60 +++
> >>  include/standard-headers/linux/virtio_ids.h   |   1 +
> >>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
> >>  linux-headers/linux/virtio_iommu.h            |   1 +
> >>  scripts/update-linux-headers.sh               |   3 +
> >>  10 files changed, 957 insertions(+), 9 deletions(-)  create mode
> >> 100644 hw/virtio/virtio-iommu.c  create mode 100644
> >> include/hw/virtio/virtio- iommu.h  create mode 100644
> >> include/standard- headers/linux/virtio_iommu.h  create mode 100644
> >> linux-headers/linux/virtio_iommu.h
> >>
> >> --
> >> 2.5.5
> >


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-09 11:30     ` Bharat Bhushan
@ 2017-06-09 11:53       ` Auger Eric
  2017-06-19  7:54         ` Bharat Bhushan
  2017-06-09 12:15       ` Auger Eric
  1 sibling, 1 reply; 73+ messages in thread
From: Auger Eric @ 2017-06-09 11:53 UTC (permalink / raw)
  To: Bharat Bhushan, eric.auger.pro, peter.maydell, alex.williamson,
	mst, qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Bharat,

On 09/06/2017 13:30, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: Friday, June 09, 2017 12:14 PM
>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
>> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
>> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
>> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
>> wei@redhat.com; tn@semihalf.com
>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 09/06/2017 08:16, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: Eric Auger [mailto:eric.auger@redhat.com]
>>>> Sent: Wednesday, June 07, 2017 9:31 PM
>>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>>>> peter.maydell@linaro.org; alex.williamson@redhat.com;
>> mst@redhat.com;
>>>> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
>>>> philippe.brucker@arm.com
>>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
>> kevin.tian@intel.com;
>>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
>>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com; Bharat
>> Bhushan
>>>> <bharat.bhushan@nxp.com>
>>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
>>>> This series implements the virtio-iommu device. This is a proof of
>>>> concept based on the virtio-iommu specification written by Jean-Philippe
>> Brucker [1].
>>>> This was tested with a guest using the virtio-iommu driver [2] and
>>>> exposed with a virtio-net-pci using dma ops.
>>>>
>>>> The device gets instantiated using the "-device virtio-iommu-device"
>>>> option. It currently works with ARM virt machine only as the machine
>>>> must handle the dt binding between the virtio-mmio "iommu" node and
>>>> the PCI host bridge node. ACPI booting is not yet supported.
>>>>
>>>> This should allow to start some benchmarking activities against pure
>>>> emulated IOMMU (especially ARM SMMU).
>>>
>>> I am testing this on ARM64 and see below continuous error prints:
>>>
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>>
>>>
>>> Also in guest I do not see device-tree node with virtio-iommu.
>> do you mean the virtio-mmio with #iommu-cells property?
>>
>> This one is created statically by virt machine. I would be surprised if it were
>> not there. Are you using the virt = virt2.10 machine. Machines before do not
>> support its instantiation.
>>
>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
>> moment when this node is created. Also you can add a printf in
>> bind_virtio_iommu_device() to make sure the binding with the PCI host
>> bridge is added on machine init done.
>>
>> Also worth to check, CONFIG_VIRTIO_IOMMU=y on guest side.
> 
> It works on my side.
Great.

> The driver config was disabled and also I was using guest kernel which
> was not have deferred-probing.
Yes, I did not mention in my cover letter that the guest I have been using
is based on Jean-Philippe's branch, featuring deferred IOMMU probing. I
have not tried an upstream guest yet.
> Now after fixing it works on my side
> I placed some prints to see dma-map are mapping regions in virtio-iommu, it uses emulated iommu.
> 
> I will continue to add VFIO support now on this and more testing !!

OK. I will do the VFIO integration first on the vsmmuv3 device as I
already prepared the VFIO replay and hopefully we will sync ;-)

Thanks

Eric
> 
> Thanks
> -Bharat
> 
>>
>> Thanks
>>
>> Eric
>>
>>> I am using qemu-tree you mentioned below and iommu-driver patches
>> published by Jean-P.
>>> Qemu command line have additional ""-device virtio-iommu-device". What
>> I am missing ?
>>
>>
>>>
>>> Thanks
>>> -Bharat
>>>
>>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>
>>>> This series can be found at:
>>>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
>>>>
>>>> References:
>>>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH
>>>> linux]
>>>> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add
>>>> virtio- iommu
>>>>
>>>> History:
>>>> v1 -> v2:
>>>> - fix redifinition of viommu_as typedef
>>>>
>>>> Eric Auger (8):
>>>>   update-linux-headers: import virtio_iommu.h
>>>>   linux-headers: Update for virtio-iommu
>>>>   virtio_iommu: add skeleton
>>>>   virtio-iommu: Decode the command payload
>>>>   virtio_iommu: Add the iommu regions
>>>>   virtio-iommu: Implement the translation and commands
>>>>   hw/arm/virt: Add 2.10 machine type
>>>>   hw/arm/virt: Add virtio-iommu the virt board
>>>>
>>>>  hw/arm/virt.c                                 | 116 ++++-
>>>>  hw/virtio/Makefile.objs                       |   1 +
>>>>  hw/virtio/trace-events                        |  14 +
>>>>  hw/virtio/virtio-iommu.c                      | 623
>> ++++++++++++++++++++++++++
>>>>  include/hw/arm/virt.h                         |   5 +
>>>>  include/hw/virtio/virtio-iommu.h              |  60 +++
>>>>  include/standard-headers/linux/virtio_ids.h   |   1 +
>>>>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
>>>>  linux-headers/linux/virtio_iommu.h            |   1 +
>>>>  scripts/update-linux-headers.sh               |   3 +
>>>>  10 files changed, 957 insertions(+), 9 deletions(-)  create mode
>>>> 100644 hw/virtio/virtio-iommu.c  create mode 100644
>>>> include/hw/virtio/virtio- iommu.h  create mode 100644
>>>> include/standard- headers/linux/virtio_iommu.h  create mode 100644
>>>> linux-headers/linux/virtio_iommu.h
>>>>
>>>> --
>>>> 2.5.5
>>>
> 


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-09 11:30     ` Bharat Bhushan
  2017-06-09 11:53       ` Auger Eric
@ 2017-06-09 12:15       ` Auger Eric
  1 sibling, 0 replies; 73+ messages in thread
From: Auger Eric @ 2017-06-09 12:15 UTC (permalink / raw)
  To: Bharat Bhushan, eric.auger.pro, peter.maydell, alex.williamson,
	mst, qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, Peter Xu

Hi,

On 09/06/2017 13:30, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: Friday, June 09, 2017 12:14 PM
>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
>> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
>> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
>> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
>> wei@redhat.com; tn@semihalf.com
>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 09/06/2017 08:16, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: Eric Auger [mailto:eric.auger@redhat.com]
>>>> Sent: Wednesday, June 07, 2017 9:31 PM
>>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>>>> peter.maydell@linaro.org; alex.williamson@redhat.com;
>> mst@redhat.com;
>>>> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
>>>> philippe.brucker@arm.com
>>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
>> kevin.tian@intel.com;
>>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
>>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com; Bharat
>> Bhushan
>>>> <bharat.bhushan@nxp.com>
>>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
>>>> This series implements the virtio-iommu device. This is a proof of
>>>> concept based on the virtio-iommu specification written by Jean-Philippe
>> Brucker [1].
>>>> This was tested with a guest using the virtio-iommu driver [2] and
>>>> exposed with a virtio-net-pci using dma ops.
>>>>
>>>> The device gets instantiated using the "-device virtio-iommu-device"
>>>> option. It currently works with ARM virt machine only as the machine
>>>> must handle the dt binding between the virtio-mmio "iommu" node and
>>>> the PCI host bridge node. ACPI booting is not yet supported.

For those who want to play with the device: this was tested with a
virtio-net-pci device using the following option:

-device virtio-net-pci,netdev=tap0,mac=<MAC>,iommu_platform,disable-modern=off,disable-legacy=on \

I also tried to run the guest with a virtio-blk-pci device using

-device virtio-blk-pci,scsi=off,drive=<>,iommu_platform=off,disable-modern=off,disable-legacy=on,werror=stop,rerror=stop \

and the guest does *not* boot, whereas it does without any iommu.

However, I am not sure the issue is related to the virtual iommu device
itself, as I have the exact same issue with the emulated vsmmuv3 device
(this was originally reported by Tomasz). So the issue may come from the
surrounding infrastructure. To be investigated further ...

Thanks

Eric

>>>>
>>>> This should allow to start some benchmarking activities against pure
>>>> emulated IOMMU (especially ARM SMMU).
>>>
>>> I am testing this on ARM64 and see below continuous error prints:
>>>
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>> 	virtio_iommu_translate sid=8 is not known!!
>>>
>>>
>>> Also in guest I do not see device-tree node with virtio-iommu.
>> do you mean the virtio-mmio with #iommu-cells property?
>>
>> This one is created statically by virt machine. I would be surprised if it were
>> not there. Are you using the virt = virt2.10 machine. Machines before do not
>> support its instantiation.
>>
>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
>> moment when this node is created. Also you can add a printf in
>> bind_virtio_iommu_device() to make sure the binding with the PCI host
>> bridge is added on machine init done.
>>
>> Also worth to check, CONFIG_VIRTIO_IOMMU=y on guest side.
> 
> It works on my side. The driver config was disabled and also I was using guest kernel which was not have deferred-probing. Now after fixing it works on my side
> I placed some prints to see dma-map are mapping regions in virtio-iommu, it uses emulated iommu.
> 
> I will continue to add VFIO support now on this and more testing !!
> 
> Thanks
> -Bharat
> 
>>
>> Thanks
>>
>> Eric
>>
>>> I am using qemu-tree you mentioned below and iommu-driver patches
>> published by Jean-P.
>>> Qemu command line have additional ""-device virtio-iommu-device". What
>> I am missing ?
>>
>>
>>>
>>> Thanks
>>> -Bharat
>>>
>>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>
>>>> This series can be found at:
>>>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
>>>>
>>>> References:
>>>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH
>>>> linux]
>>>> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add
>>>> virtio- iommu
>>>>
>>>> History:
>>>> v1 -> v2:
>>>> - fix redifinition of viommu_as typedef
>>>>
>>>> Eric Auger (8):
>>>>   update-linux-headers: import virtio_iommu.h
>>>>   linux-headers: Update for virtio-iommu
>>>>   virtio_iommu: add skeleton
>>>>   virtio-iommu: Decode the command payload
>>>>   virtio_iommu: Add the iommu regions
>>>>   virtio-iommu: Implement the translation and commands
>>>>   hw/arm/virt: Add 2.10 machine type
>>>>   hw/arm/virt: Add virtio-iommu the virt board
>>>>
>>>>  hw/arm/virt.c                                 | 116 ++++-
>>>>  hw/virtio/Makefile.objs                       |   1 +
>>>>  hw/virtio/trace-events                        |  14 +
>>>>  hw/virtio/virtio-iommu.c                      | 623
>> ++++++++++++++++++++++++++
>>>>  include/hw/arm/virt.h                         |   5 +
>>>>  include/hw/virtio/virtio-iommu.h              |  60 +++
>>>>  include/standard-headers/linux/virtio_ids.h   |   1 +
>>>>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
>>>>  linux-headers/linux/virtio_iommu.h            |   1 +
>>>>  scripts/update-linux-headers.sh               |   3 +
>>>>  10 files changed, 957 insertions(+), 9 deletions(-)  create mode
>>>> 100644 hw/virtio/virtio-iommu.c  create mode 100644
>>>> include/hw/virtio/virtio- iommu.h  create mode 100644
>>>> include/standard- headers/linux/virtio_iommu.h  create mode 100644
>>>> linux-headers/linux/virtio_iommu.h
>>>>
>>>> --
>>>> 2.5.5
>>>

* Re: [Qemu-devel] [RFC v2 5/8] virtio_iommu: Add the iommu regions
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 5/8] virtio_iommu: Add the iommu regions Eric Auger
@ 2017-06-12  5:59   ` Bharat Bhushan
  0 siblings, 0 replies; 73+ messages in thread
From: Bharat Bhushan @ 2017-06-12  5:59 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn

Hi Eric,

> -----Original Message-----
> From: Eric Auger [mailto:eric.auger@redhat.com]
> Sent: Wednesday, June 07, 2017 9:31 PM
> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
> philippe.brucker@arm.com
> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
> wei@redhat.com; tn@semihalf.com; Bharat Bhushan
> <bharat.bhushan@nxp.com>
> Subject: [RFC v2 5/8] virtio_iommu: Add the iommu regions
> 
> This patch initializes the iommu memory regions so that
> PCIe end point transactions get translated. The translation function
> is not yet implemented at that stage.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/virtio/trace-events   |  1 +
>  hw/virtio/virtio-iommu.c | 97
> ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 98 insertions(+)
> 
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index fba1da6..341dbdf 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -32,3 +32,4 @@ virtio_iommu_attach(uint32_t as, uint32_t dev, uint32_t
> flags) "as=%d dev=%d fla
>  virtio_iommu_detach(uint32_t dev, uint32_t flags) "dev=%d flags=%d"
>  virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr,
> uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64"
> virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
>  virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t
> reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
> +virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int
> flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index ea1caa7..902c779 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -23,6 +23,7 @@
>  #include "hw/virtio/virtio.h"
>  #include "sysemu/kvm.h"
>  #include "qapi-event.h"
> +#include "qemu/error-report.h"
>  #include "trace.h"
> 
>  #include "standard-headers/linux/virtio_ids.h"
> @@ -35,6 +36,59 @@
>  /* Max size */
>  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
> 
> +static inline uint16_t smmu_get_sid(IOMMUDevice *dev)

Should this be named after virtio-iommu rather than smmu?

Thanks
-Bharat
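
For reference, here is a minimal standalone sketch of the stream ID
computation done by the quoted helper. The names viommu_sid() and
pci_devfn() are hypothetical and only serve the illustration; the layout
(bus number in bits 15:8, devfn in bits 7:0) is the one used in the patch.
With it, the "sid=8" seen in the traces earlier in this thread would
correspond to bus 0, slot 1, function 0.

```c
#include <stdint.h>

/*
 * Hypothetical standalone equivalent of the quoted helper: the stream ID
 * is built from the PCI bus number (bits 15:8) and the devfn (bits 7:0).
 */
static inline uint16_t viommu_sid(uint8_t bus_num, uint8_t devfn)
{
    return (uint16_t)(((uint16_t)bus_num << 8) | devfn);
}

/* A PCI devfn packs the slot number in bits 7:3 and the function in 2:0. */
static inline uint8_t pci_devfn(uint8_t slot, uint8_t func)
{
    return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}
```

Whatever prefix the helper ends up with, the value it returns is the PCI
requester ID, so nothing in it is specific to the SMMU.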

> +{
> +    return  ((pci_bus_num(dev->bus) & 0xff) << 8) | dev->devfn;
> +}
> +
> +static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void
> *opaque,
> +                                              int devfn)
> +{
> +    VirtIOIOMMU *s = opaque;
> +    uintptr_t key = (uintptr_t)bus;
> +    IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, &key);
> +    IOMMUDevice *sdev;
> +
> +    if (!sbus) {
> +        uintptr_t *new_key = g_malloc(sizeof(*new_key));
> +
> +        *new_key = (uintptr_t)bus;
> +        sbus = g_malloc0(sizeof(IOMMUPciBus) +
> +                         sizeof(IOMMUDevice *) * IOMMU_PCI_DEVFN_MAX);
> +        sbus->bus = bus;
> +        g_hash_table_insert(s->as_by_busptr, new_key, sbus);
> +    }
> +
> +    sdev = sbus->pbdev[devfn];
> +    if (!sdev) {
> +        sdev = sbus->pbdev[devfn] = g_malloc0(sizeof(IOMMUDevice));
> +
> +        sdev->viommu = s;
> +        sdev->bus = bus;
> +        sdev->devfn = devfn;
> +
> +        memory_region_init_iommu(&sdev->iommu_mr, OBJECT(s),
> +                                 &s->iommu_ops, TYPE_VIRTIO_IOMMU,
> +                                 UINT64_MAX);
> +        address_space_init(&sdev->as, &sdev->iommu_mr,
> TYPE_VIRTIO_IOMMU);
> +    }
> +
> +    return &sdev->as;
> +
> +}
> +
> +static void virtio_iommu_init_as(VirtIOIOMMU *s)
> +{
> +    PCIBus *pcibus = pci_find_primary_bus();
> +
> +    if (pcibus) {
> +        pci_setup_iommu(pcibus, virtio_iommu_find_add_as, s);
> +    } else {
> +        error_report("No PCI bus, virtio-iommu is not registered");
> +    }
> +}
> +
> +
>  static int virtio_iommu_attach(VirtIOIOMMU *s,
>                                 struct virtio_iommu_req_attach *req)
>  {
> @@ -208,6 +262,26 @@ static void
> virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
>      }
>  }
> 
> +static IOMMUTLBEntry virtio_iommu_translate(MemoryRegion *mr,
> hwaddr addr,
> +                                            IOMMUAccessFlags flag)
> +{
> +    IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
> +    uint32_t sid;
> +
> +    IOMMUTLBEntry entry = {
> +        .target_as = &address_space_memory,
> +        .iova = addr,
> +        .translated_addr = addr,
> +        .addr_mask = ~(hwaddr)0,
> +        .perm = IOMMU_NONE,
> +    };
> +
> +    sid = smmu_get_sid(sdev);
> +
> +    trace_virtio_iommu_translate(mr->name, sid, addr, flag);
> +    return entry;
> +}
> +
>  static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t
> *config_data)
>  {
>      VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
> @@ -253,6 +327,21 @@ static const VMStateDescription
> vmstate_virtio_iommu_device = {
>      },
>  };
> 
> +/*****************************
> + * Hash Table
> + *****************************/
> +
> +static inline gboolean as_uint64_equal(gconstpointer v1, gconstpointer v2)
> +{
> +    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
> +}
> +
> +static inline guint as_uint64_hash(gconstpointer v)
> +{
> +    return (guint)*(const uint64_t *)v;
> +}
> +
> +
>  static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -266,6 +355,14 @@ static void virtio_iommu_device_realize(DeviceState
> *dev, Error **errp)
> 
>      s->config.page_sizes = ~((1ULL << 12) - 1);
>      s->config.input_range.end = -1UL;
> +
> +    s->iommu_ops.translate = virtio_iommu_translate;
> +    memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
> +    s->as_by_busptr = g_hash_table_new_full(as_uint64_hash,
> +                                            as_uint64_equal,
> +                                            g_free, g_free);
> +
> +    virtio_iommu_init_as(s);
>  }
> 
>  static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
> --
> 2.5.5

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-09 11:53       ` Auger Eric
@ 2017-06-19  7:54         ` Bharat Bhushan
  2017-06-19 10:15           ` Jean-Philippe Brucker
  2017-06-26  7:54           ` Auger Eric
  0 siblings, 2 replies; 73+ messages in thread
From: Bharat Bhushan @ 2017-06-19  7:54 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Eric,

I started adding replay support to virtio-iommu and ran into the question of how MSI interrupts will work with VFIO.
I understand that this works differently on Intel, but vsmmu will have the same requirement.
The kvm-msi-irq-route entries are added using the MSI address that is still to be translated by the viommu, not the final translated address,
and the irqfd framework currently does not know about emulated IOMMUs (virtio-iommu, vsmmuv3/vintel-iommu).
So in my view we have the following options:
- Program the translated address when setting up kvm-msi-irq-route
- Route the interrupts via QEMU, which is bad for performance
- vhost-virtio-iommu may solve the problem in the long term

Is there a better option that I am missing?

Thanks
-Bharat

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Friday, June 09, 2017 5:24 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 09/06/2017 13:30, Bharat Bhushan wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.auger@redhat.com]
> >> Sent: Friday, June 09, 2017 12:14 PM
> >> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> >> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> >> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
> >> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> >> Cc: will.deacon@arm.com; robin.murphy@arm.com;
> kevin.tian@intel.com;
> >> marc.zyngier@arm.com; christoffer.dall@linaro.org;
> >> drjones@redhat.com; wei@redhat.com; tn@semihalf.com
> >> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> Hi Bharat,
> >>
> >> On 09/06/2017 08:16, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>>> -----Original Message-----
> >>>> From: Eric Auger [mailto:eric.auger@redhat.com]
> >>>> Sent: Wednesday, June 07, 2017 9:31 PM
> >>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> >>>> peter.maydell@linaro.org; alex.williamson@redhat.com;
> >> mst@redhat.com;
> >>>> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
> >>>> philippe.brucker@arm.com
> >>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
> >> kevin.tian@intel.com;
> >>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
> >>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com; Bharat
> >> Bhushan
> >>>> <bharat.bhushan@nxp.com>
> >>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>
> >>>> This series implements the virtio-iommu device. This is a proof of
> >>>> concept based on the virtio-iommu specification written by
> >>>> Jean-Philippe
> >> Brucker [1].
> >>>> This was tested with a guest using the virtio-iommu driver [2] and
> >>>> exposed with a virtio-net-pci using dma ops.
> >>>>
> >>>> The device gets instantiated using the "-device virtio-iommu-device"
> >>>> option. It currently works with ARM virt machine only as the
> >>>> machine must handle the dt binding between the virtio-mmio "iommu"
> >>>> node and the PCI host bridge node. ACPI booting is not yet supported.
> >>>>
> >>>> This should allow to start some benchmarking activities against
> >>>> pure emulated IOMMU (especially ARM SMMU).
> >>>
> >>> I am testing this on ARM64 and see below continuous error prints:
> >>>
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>> 	virtio_iommu_translate sid=8 is not known!!
> >>>
> >>>
> >>> Also in guest I do not see device-tree node with virtio-iommu.
> >> do you mean the virtio-mmio with #iommu-cells property?
> >>
> >> This one is created statically by virt machine. I would be surprised
> >> if it were not there. Are you using the virt = virt2.10 machine.
> >> Machines before do not support its instantiation.
> >>
> >> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at
> >> the moment when this node is created. Also you can add a printf in
> >> bind_virtio_iommu_device() to make sure the binding with the PCI host
> >> bridge is added on machine init done.
> >>
> >> Also worth to check, CONFIG_VIRTIO_IOMMU=y on guest side.
> >
> > It works on my side.
> Great.
> 
>  The driver config was disabled and also I was using guest kernel which was
> not have deferred-probing.
> Yes I did not mention in my cover letter the guest I have been using is based
> on Jean-Philippe's branch, featuring deferred IOMMU probing. I I have not
> tried yet with an upstream guest.
>  Now after fixing it works on my side
> > I placed some prints to see dma-map are mapping regions in virtio-iommu,
> it uses emulated iommu.
> >
> > I will continue to add VFIO support now on this and more testing !!
> 
> OK. I will do the VFIO integration first on the vsmmuv3 device as I already
> prepared the VFIO replay and hopefully we will sync ;-)
> 
> Thanks
> 
> Eric
> >
> > Thanks
> > -Bharat
> >
> >>
> >> Thanks
> >>
> >> Eric
> >>
> >>> I am using qemu-tree you mentioned below and iommu-driver patches
> >> published by Jean-P.
> >>> Qemu command line have additional ""-device virtio-iommu-device".
> >>> What
> >> I am missing ?
> >>
> >>
> >>>
> >>> Thanks
> >>> -Bharat
> >>>
> >>>>
> >>>> Best Regards
> >>>>
> >>>> Eric
> >>>>
> >>>> This series can be found at:
> >>>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
> >>>>
> >>>> References:
> >>>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH
> >>>> linux]
> >>>> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add
> >>>> virtio- iommu
> >>>>
> >>>> History:
> >>>> v1 -> v2:
> >>>> - fix redifinition of viommu_as typedef
> >>>>
> >>>> Eric Auger (8):
> >>>>   update-linux-headers: import virtio_iommu.h
> >>>>   linux-headers: Update for virtio-iommu
> >>>>   virtio_iommu: add skeleton
> >>>>   virtio-iommu: Decode the command payload
> >>>>   virtio_iommu: Add the iommu regions
> >>>>   virtio-iommu: Implement the translation and commands
> >>>>   hw/arm/virt: Add 2.10 machine type
> >>>>   hw/arm/virt: Add virtio-iommu the virt board
> >>>>
> >>>>  hw/arm/virt.c                                 | 116 ++++-
> >>>>  hw/virtio/Makefile.objs                       |   1 +
> >>>>  hw/virtio/trace-events                        |  14 +
> >>>>  hw/virtio/virtio-iommu.c                      | 623
> >> ++++++++++++++++++++++++++
> >>>>  include/hw/arm/virt.h                         |   5 +
> >>>>  include/hw/virtio/virtio-iommu.h              |  60 +++
> >>>>  include/standard-headers/linux/virtio_ids.h   |   1 +
> >>>>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
> >>>>  linux-headers/linux/virtio_iommu.h            |   1 +
> >>>>  scripts/update-linux-headers.sh               |   3 +
> >>>>  10 files changed, 957 insertions(+), 9 deletions(-)  create mode
> >>>> 100644 hw/virtio/virtio-iommu.c  create mode 100644
> >>>> include/hw/virtio/virtio- iommu.h  create mode 100644
> >>>> include/standard- headers/linux/virtio_iommu.h  create mode 100644
> >>>> linux-headers/linux/virtio_iommu.h
> >>>>
> >>>> --
> >>>> 2.5.5
> >>>
> >

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-19  7:54         ` Bharat Bhushan
@ 2017-06-19 10:15           ` Jean-Philippe Brucker
  2017-06-26  8:22             ` Auger Eric
                               ` (2 more replies)
  2017-06-26  7:54           ` Auger Eric
  1 sibling, 3 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-19 10:15 UTC (permalink / raw)
  To: Bharat Bhushan, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On 19/06/17 08:54, Bharat Bhushan wrote:
> Hi Eric,
> 
> I started added replay in virtio-iommu and came across how MSI interrupts with work with VFIO. 
> I understand that on intel this works differently but vsmmu will have same requirement. 
> kvm-msi-irq-route are added using the msi-address to be translated by viommu and not the final translated address.
> While currently the irqfd framework does not know about emulated iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> So in my view we have following options:
> - Programming with translated address when setting up kvm-msi-irq-route
> - Route the interrupts via QEMU, which is bad from performance
> - vhost-virtio-iommu may solve the problem in long term
> 
> Is there any other better option I am missing?

Since we're on the topic of MSIs... I'm currently trying to figure out how
we'll handle MSIs in the nested translation mode, where the guest manages
S1 page tables and the host doesn't know about GVA->GPA translation.

I'm also wondering about the benefits of having SW-mapped MSIs in the
guest. It seems unavoidable for vSMMU since that's what a physical system
would do. But in a paravirtualized solution there doesn't seem to be any
compelling reason for having the guest map MSI doorbells. These addresses
are never accessed directly, they are only used for setting up IRQ routing
(at least on kvmtool). So here's what I'd like to have. Note that I
haven't investigated the feasibility in Qemu yet, I don't know how it
deals with MSIs.

(1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
mappings when handling writes to PCI MSI-X tables.

(2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs
via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like
to use the (currently unused) TTB1 tables in that case. In addition, using
TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs and we
don't want to map them in user address space.

This means that the host needs to use different doorbell addresses in
nested mode, since it would be unable to map at S1 the same IOVA as S2
(TTB1 manages negative addresses - 0xffff............, which are not
representable as GPAs). It also requires using 32-bit page tables for
endpoints that are not capable of using 64-bit MSI addresses.
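
To make the address-range argument concrete, here is a small sketch. It
assumes, purely for illustration, a 48-bit input address size for both
TTB0 and TTB1; VA_BITS, the helper names and the example addresses are
hypothetical and not taken from any driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 48-bit input addresses for TTB0 and TTB1. */
#define VA_BITS 48

/* TTB0 covers addresses whose upper (64 - VA_BITS) bits are all zero. */
static bool in_ttb0_range(uint64_t addr)
{
    return (addr >> VA_BITS) == 0;
}

/* TTB1 covers addresses whose upper (64 - VA_BITS) bits are all one. */
static bool in_ttb1_range(uint64_t addr)
{
    return (addr >> VA_BITS) == (UINT64_MAX >> VA_BITS);
}

/* A guest-physical address must fit in the given number of address bits. */
static bool fits_in_gpa(uint64_t addr, unsigned int ipa_bits)
{
    return ipa_bits >= 64 || (addr >> ipa_bits) == 0;
}
```

A doorbell mapped at, say, 0xffff000008080000 would be in TTB1 range but
could not be expressed as a GPA, which is why a different doorbell address
is needed at S1 than at S2.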


Now (2) is entirely handled in the host kernel, so it's more a Linux
question. But does (1) seem acceptable for virtio-iommu in Qemu?

Thanks,
Jean

* Re: [Qemu-devel] [RFC v2 3/8] virtio_iommu: add skeleton
  2017-06-08 11:09   ` Bharat Bhushan
@ 2017-06-23 16:08     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-23 16:08 UTC (permalink / raw)
  To: Bharat Bhushan, Eric Auger, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn

On 06/08/2017 12:09 PM, Bharat Bhushan wrote:
>> From: Eric Auger [mailto:eric.auger@redhat.com]
>> Sent: Wednesday, June 07, 2017 9:31 PM
>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
>> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
>> philippe.brucker@arm.com
>> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
>> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
>> wei@redhat.com; tn@semihalf.com; Bharat Bhushan
>> <bharat.bhushan@nxp.com>
>> Subject: [RFC v2 3/8] virtio_iommu: add skeleton
>>
>> This patchs adds the skeleton for the virtio-iommu device.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> +static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>> +{
>> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> +    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
>> +
>> +    virtio_init(vdev, "virtio-iommu", VIRTIO_ID_IOMMU,
>> +                sizeof(struct virtio_iommu_config));
>> +
>> +    s->vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE,
>> +                             virtio_iommu_handle_command);
>> +
>> +    s->config.page_sizes = ~((1ULL << 12) - 1);
> 
> This is hardcoded to 4K, Should this be aligned to Host-page size ?

I wonder if we should introduce per-address-space page sizes, to cater
for emulated and VFIO devices being managed by the same IOMMU.

For an emulated device, it seems that the page granularity can be
arbitrary, so maybe TARGET_PAGE_MASK would be more convenient. But for
VFIO, the page granularity is a property of the physical IOMMU.

In kvmtool I instantiate two virtio-iommus for vfio and virtio devices,
so the page size issue hasn't come up, but here things won't work if the
page granularity advertised in config.page_sizes is smaller than the
pIOMMU page size.

Adding address space properties is tricky because they change when
attaching devices, and I wanted to avoid this complication. In nested
mode I have to add one AS state, where the AS is active and properties
are frozen (attaching an incompatible device is then rejected). Maybe
we need to do the same for map/unmap.

A simpler solution (for me, that is), would be to put the greatest page
granularity of all VFIO devices into page_sizes, but it doesn't take
hotplug into account.
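
As a rough sketch of that last option: page_sizes is a bitmap where bit N
set means that pages of 2^N bytes are supported, so ~((1ULL << 12) - 1)
advertises every size from 4K up. The helpers below are hypothetical
(none of these names exist in the patch) and only show how the advertised
bitmap could be restricted to the largest granule found among the
physical IOMMUs.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitmap with every page size >= 2^shift set; shift must be below 64. */
static uint64_t page_sizes_from_shift(unsigned int shift)
{
    return ~((1ULL << shift) - 1);
}

/* The smallest supported page size is the lowest set bit of the bitmap. */
static uint64_t min_page_size(uint64_t page_sizes)
{
    return page_sizes & (~page_sizes + 1);
}

/*
 * Hypothetical helper: drop from the advertised bitmap every page size
 * smaller than the largest minimum granule of the physical IOMMUs.
 */
static uint64_t restrict_to_largest_granule(uint64_t viommu_sizes,
                                            const uint64_t *piommu_sizes,
                                            size_t n)
{
    uint64_t granule = min_page_size(viommu_sizes);

    for (size_t i = 0; i < n; i++) {
        uint64_t g = min_page_size(piommu_sizes[i]);

        if (g > granule) {
            granule = g;
        }
    }
    return viommu_sizes & ~(granule - 1);
}
```

With one pIOMMU using a 4K granule and another using 64K, the result
advertises 64K as the smallest page size. As said above, this would have
to be computed before the guest reads the config, so it does not cover
hotplug.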

Thanks,
Jean

* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands Eric Auger
@ 2017-06-23 16:09   ` Jean-Philippe Brucker
  2017-07-04  9:13   ` Bharat Bhushan
  2017-07-14  2:17   ` Peter Xu
  2 siblings, 0 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-23 16:09 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn, bharat.bhushan

On 07/06/17 17:01, Eric Auger wrote:
> This patch adds the actual implementation for the translation routine
> and the virtio-iommu commands.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---[...]
>  static int virtio_iommu_attach(VirtIOIOMMU *s,
>                                 struct virtio_iommu_req_attach *req)
> @@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>      uint32_t asid = le32_to_cpu(req->address_space);
>      uint32_t devid = le32_to_cpu(req->device);
>      uint32_t reserved = le32_to_cpu(req->reserved);
> +    viommu_as *as;
> +    viommu_dev *dev;
>  
>      trace_virtio_iommu_attach(asid, devid, reserved);
>  
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
> +    if (dev) {
> +        return -1;

I guess you could return S_INVAL here. However, if the device is already
attached to AS0, it should be detached and attached to AS1. The Linux
driver relies on this behavior when moving a device from kernel to user
address space and back.
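
A minimal sketch of the behavior I have in mind, detached from the QEMU
data structures: everything prefixed demo_ or DEMO_ is hypothetical, the
status values are placeholders rather than the ones from the spec, and a
fixed-size array stands in for the device tree lookup.

```c
#include <stdint.h>

#define DEMO_MAX_DEVS 8
#define DEMO_NO_AS    UINT32_MAX

/* Placeholder status codes, not the values defined by the specification. */
enum { DEMO_S_OK = 0, DEMO_S_INVAL = 1 };

/* For each device ID, the address space it is attached to, if any. */
struct demo_iommu {
    uint32_t dev_as[DEMO_MAX_DEVS];
};

static void demo_init(struct demo_iommu *s)
{
    for (uint32_t i = 0; i < DEMO_MAX_DEVS; i++) {
        s->dev_as[i] = DEMO_NO_AS;
    }
}

static void demo_detach(struct demo_iommu *s, uint32_t devid)
{
    if (devid < DEMO_MAX_DEVS) {
        s->dev_as[devid] = DEMO_NO_AS;
    }
}

static int demo_attach(struct demo_iommu *s, uint32_t devid, uint32_t asid)
{
    if (devid >= DEMO_MAX_DEVS || asid == DEMO_NO_AS) {
        return DEMO_S_INVAL;
    }
    /* Already attached elsewhere: detach first instead of failing. */
    if (s->dev_as[devid] != DEMO_NO_AS && s->dev_as[devid] != asid) {
        demo_detach(s, devid);
    }
    s->dev_as[devid] = asid;
    return DEMO_S_OK;
}
```

Attaching device 3 to AS0 and then to AS1 succeeds both times and leaves
the device in AS1 only.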

Thanks,
Jean

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-19  7:54         ` Bharat Bhushan
  2017-06-19 10:15           ` Jean-Philippe Brucker
@ 2017-06-26  7:54           ` Auger Eric
  2017-07-05  8:23             ` Bharat Bhushan
  1 sibling, 1 reply; 73+ messages in thread
From: Auger Eric @ 2017-06-26  7:54 UTC (permalink / raw)
  To: Bharat Bhushan, eric.auger.pro, peter.maydell, alex.williamson,
	mst, qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Bharat,

On 19/06/2017 09:54, Bharat Bhushan wrote:
> Hi Eric,
> 
> I started added replay in virtio-iommu and came across how MSI interrupts with work with VFIO. 
> I understand that on intel this works differently but vsmmu will have same requirement. 
> kvm-msi-irq-route are added using the msi-address to be translated by viommu and not the final translated address.
> While currently the irqfd framework does not know about emulated iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> So in my view we have following options:
> - Programming with translated address when setting up kvm-msi-irq-route
> - Route the interrupts via QEMU, which is bad from performance
> - vhost-virtio-iommu may solve the problem in long term

Sorry for the delay. With regard to the vsmmuv3/vfio integration I think
we need to use the guest physical address otherwise the MSI address will
not be recognized as an MSI doorbell.

Also, the fact that on ARM we map the MSI doorbell causes an assert in
vfio_get_vaddr(), as the vITS doorbell is not a RAM region. We will need
to handle this specifically.

Besides, I have not looked specifically at the virtio-iommu/vfio
integration yet.

Thanks

Eric
> 
> Is there any other better option I am missing?
> 
> Thanks
> -Bharat
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: Friday, June 09, 2017 5:24 PM
>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
>> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
>> robin.murphy@arm.com; christoffer.dall@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 09/06/2017 13:30, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: Auger Eric [mailto:eric.auger@redhat.com]
>>>> Sent: Friday, June 09, 2017 12:14 PM
>>>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
>>>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
>>>> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
>>>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
>>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
>> kevin.tian@intel.com;
>>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
>>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com
>>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
>>>> Hi Bharat,
>>>>
>>>> On 09/06/2017 08:16, Bharat Bhushan wrote:
>>>>> Hi Eric,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Eric Auger [mailto:eric.auger@redhat.com]
>>>>>> Sent: Wednesday, June 07, 2017 9:31 PM
>>>>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>>>>>> peter.maydell@linaro.org; alex.williamson@redhat.com;
>>>> mst@redhat.com;
>>>>>> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
>>>>>> philippe.brucker@arm.com
>>>>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
>>>> kevin.tian@intel.com;
>>>>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
>>>>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com; Bharat
>>>> Bhushan
>>>>>> <bharat.bhushan@nxp.com>
>>>>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
>>>>>>
>>>>>> This series implements the virtio-iommu device. This is a proof of
>>>>>> concept based on the virtio-iommu specification written by
>>>>>> Jean-Philippe
>>>> Brucker [1].
>>>>>> This was tested with a guest using the virtio-iommu driver [2] and
>>>>>> exposed with a virtio-net-pci using dma ops.
>>>>>>
>>>>>> The device gets instantiated using the "-device virtio-iommu-device"
>>>>>> option. It currently works with ARM virt machine only as the
>>>>>> machine must handle the dt binding between the virtio-mmio "iommu"
>>>>>> node and the PCI host bridge node. ACPI booting is not yet supported.
>>>>>>
>>>>>> This should allow to start some benchmarking activities against
>>>>>> pure emulated IOMMU (especially ARM SMMU).
>>>>>
>>>>> I am testing this on ARM64 and see below continuous error prints:
>>>>>
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>
>>>>>
>>>>> Also in guest I do not see device-tree node with virtio-iommu.
>>>> do you mean the virtio-mmio with #iommu-cells property?
>>>>
>>>> This one is created statically by virt machine. I would be surprised
>>>> if it were not there. Are you using the virt = virt2.10 machine.
>>>> Machines before do not support its instantiation.
>>>>
>>>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at
>>>> the moment when this node is created. Also you can add a printf in
>>>> bind_virtio_iommu_device() to make sure the binding with the PCI host
>>>> bridge is added on machine init done.
>>>>
>>>> It is also worth checking that CONFIG_VIRTIO_IOMMU=y is set on the guest side.
>>>
>>> It works on my side.
>> Great.
>>
>>> The driver config was disabled, and I was also using a guest kernel that
>>> did not have deferred probing.
>> Yes, I did not mention in my cover letter that the guest I have been using is
>> based on Jean-Philippe's branch, featuring deferred IOMMU probing. I have not
>> tried yet with an upstream guest.
>>> Now, after fixing that, it works on my side.
>>> I placed some prints to see that dma-map is mapping regions in virtio-iommu;
>>> it uses the emulated iommu.
>>>
>>> I will continue to add VFIO support now on this and more testing !!
>>
>> OK. I will do the VFIO integration first on the vsmmuv3 device as I already
>> prepared the VFIO replay and hopefully we will sync ;-)
>>
>> Thanks
>>
>> Eric
>>>
>>> Thanks
>>> -Bharat
>>>
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>
>>>>> I am using the qemu tree you mentioned below and the iommu-driver patches
>>>>> published by Jean-P.
>>>>> The Qemu command line has the additional "-device virtio-iommu-device" option.
>>>>> What am I missing?
>>>>
>>>>
>>>>>
>>>>> Thanks
>>>>> -Bharat
>>>>>
>>>>>>
>>>>>> Best Regards
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>> This series can be found at:
>>>>>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
>>>>>>
>>>>>> References:
>>>>>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU,
>>>>>> [2] [RFC PATCH linux] iommu: Add virtio-iommu driver
>>>>>> [3] [RFC PATCH kvmtool 00/15] Add virtio-iommu
>>>>>>
>>>>>> History:
>>>>>> v1 -> v2:
>>>>>> - fix redifinition of viommu_as typedef
>>>>>>
>>>>>> Eric Auger (8):
>>>>>>   update-linux-headers: import virtio_iommu.h
>>>>>>   linux-headers: Update for virtio-iommu
>>>>>>   virtio_iommu: add skeleton
>>>>>>   virtio-iommu: Decode the command payload
>>>>>>   virtio_iommu: Add the iommu regions
>>>>>>   virtio-iommu: Implement the translation and commands
>>>>>>   hw/arm/virt: Add 2.10 machine type
>>>>>>   hw/arm/virt: Add virtio-iommu the virt board
>>>>>>
>>>>>>  hw/arm/virt.c                                 | 116 ++++-
>>>>>>  hw/virtio/Makefile.objs                       |   1 +
>>>>>>  hw/virtio/trace-events                        |  14 +
>>>>>>  hw/virtio/virtio-iommu.c                      | 623 ++++++++++++++++++++++++++
>>>>>>  include/hw/arm/virt.h                         |   5 +
>>>>>>  include/hw/virtio/virtio-iommu.h              |  60 +++
>>>>>>  include/standard-headers/linux/virtio_ids.h   |   1 +
>>>>>>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
>>>>>>  linux-headers/linux/virtio_iommu.h            |   1 +
>>>>>>  scripts/update-linux-headers.sh               |   3 +
>>>>>>  10 files changed, 957 insertions(+), 9 deletions(-)
>>>>>>  create mode 100644 hw/virtio/virtio-iommu.c
>>>>>>  create mode 100644 include/hw/virtio/virtio-iommu.h
>>>>>>  create mode 100644 include/standard-headers/linux/virtio_iommu.h
>>>>>>  create mode 100644 linux-headers/linux/virtio_iommu.h
>>>>>>
>>>>>> --
>>>>>> 2.5.5
>>>>>
>>>
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-19 10:15           ` Jean-Philippe Brucker
@ 2017-06-26  8:22             ` Auger Eric
  2017-06-26 16:13               ` Jean-Philippe Brucker
  2017-07-05  7:14             ` Tian, Kevin
  2017-07-05  7:15             ` Bharat Bhushan
  2 siblings, 1 reply; 73+ messages in thread
From: Auger Eric @ 2017-06-26  8:22 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Bharat Bhushan, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Jean-Philippe,

On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
> On 19/06/17 08:54, Bharat Bhushan wrote:
>> Hi Eric,
>>
>> I started added replay in virtio-iommu and came across how MSI interrupts with work with VFIO. 
>> I understand that on intel this works differently but vsmmu will have same requirement. 
>> kvm-msi-irq-route are added using the msi-address to be translated by viommu and not the final translated address.
>> While currently the irqfd framework does not know about emulated iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>> So in my view we have following options:
>> - Programming with translated address when setting up kvm-msi-irq-route
>> - Route the interrupts via QEMU, which is bad from performance
>> - vhost-virtio-iommu may solve the problem in long term
>>
>> Is there any other better option I am missing?
> 
> Since we're on the topic of MSIs... I'm currently trying to figure out how
> we'll handle MSIs in the nested translation mode, where the guest manages
> S1 page tables and the host doesn't know about GVA->GPA translation.

I have a question about the "nested translation mode" terminology. Do
you mean that you use stage 1 + stage 2 of the physical IOMMU
(which the ARM spec normally advises or was meant for), or do you mean
stage 1 implemented in the vIOMMU and stage 2 implemented in the pIOMMU?
At the moment my understanding is that for VFIO integration the pIOMMU uses
a single stage combining both the stage 1 and stage 2 mappings, but the host
is not aware of those 2 stages.
> 
> I'm also wondering about the benefits of having SW-mapped MSIs in the
> guest. It seems unavoidable for vSMMU since that's what a physical system
> would do. But in a paravirtualized solution there doesn't seem to be any
> compelling reason for having the guest map MSI doorbells.

If I understand correctly the virtio-iommu would not expose MSI reserved
regions (saying it does not translate MSIs). In that case the VFIO
kernel code will not check the irq_domain_check_msi_remap() but will
check iommu_capable(bus, IOMMU_CAP_INTR_REMAP) instead. Would the
virtio-iommu expose this capability? How would it isolate MSI
transactions from different devices?

Thanks

Eric


> These addresses
> are never accessed directly, they are only used for setting up IRQ routing
> (at least on kvmtool). So here's what I'd like to have. Note that I
> haven't investigated the feasibility in Qemu yet, I don't know how it
> deals with MSIs.
> 
> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
> mappings when handling writes to PCI MSI-X tables.
> 
> (2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs
> via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like
> to use the (currently unused) TTB1 tables in that case. In addition, using
> TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs and we
> don't want to map them in user address space.
> 
> This means that the host needs to use different doorbell addresses in
> nested mode, since it would be unable to map at S1 the same IOVA as S2
> (TTB1 manages negative addresses - 0xffff............, which are not
> representable as GPAs.) It also requires to use 32-bit page tables for
> endpoints that are not capable of using 64-bit MSI addresses.
> 
> 
> Now (2) is entirely handled in the host kernel, so it's more a Linux
> question. But does (1) seem acceptable for virtio-iommu in Qemu?
> 
> Thanks,
> Jean
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-26  8:22             ` Auger Eric
@ 2017-06-26 16:13               ` Jean-Philippe Brucker
  2017-06-27  6:38                 ` Auger Eric
  2017-07-05  7:25                 ` Tian, Kevin
  0 siblings, 2 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-26 16:13 UTC (permalink / raw)
  To: Auger Eric, Bharat Bhushan, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On 26/06/17 09:22, Auger Eric wrote:
> Hi Jean-Philippe,
> 
> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>> I started added replay in virtio-iommu and came across how MSI interrupts with work with VFIO. 
>>> I understand that on intel this works differently but vsmmu will have same requirement. 
>>> kvm-msi-irq-route are added using the msi-address to be translated by viommu and not the final translated address.
>>> While currently the irqfd framework does not know about emulated iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>>> So in my view we have following options:
>>> - Programming with translated address when setting up kvm-msi-irq-route
>>> - Route the interrupts via QEMU, which is bad from performance
>>> - vhost-virtio-iommu may solve the problem in long term
>>>
>>> Is there any other better option I am missing?
>>
>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>> we'll handle MSIs in the nested translation mode, where the guest manages
>> S1 page tables and the host doesn't know about GVA->GPA translation.
> 
> I have a question about the "nested translation mode" terminology. Do
> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
> (which the ARM spec normally advises or was meant for) or do you mean
> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the
> moment my understanding is for VFIO integration the pIOMMU uses a single
> stage combining both the stage 1 and stage2 mappings but the host is not
> aware of those 2 stages.

Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.
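
The merge described above can be sketched as follows. This is only an
illustration under simplifying assumptions (page-granular mappings held in
small arrays); struct page_map, lookup() and merge_stages() are hypothetical
names, not QEMU or VFIO interfaces, and a real VMM would push each combined
entry with VFIO_IOMMU_MAP_DMA instead of storing it in an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL

/* One page-sized mapping from an input page to an output page (hypothetical). */
struct page_map {
    uint64_t in;   /* input address, page aligned */
    uint64_t out;  /* output address, page aligned */
};

/* Look up a page-aligned address; returns 0 on success, -1 if unmapped. */
static int lookup(const struct page_map *tbl, size_t n,
                  uint64_t in, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].in == in) {
            *out = tbl[i].out;
            return 0;
        }
    }
    return -1;
}

/*
 * Combine the guest's stage-1 table (GVA->GPA) with the VMM's stage-2
 * table (GPA->HPA) into a single GVA->HPA table. Stage-1 entries whose
 * GPA has no stage-2 mapping are skipped. Returns the number of entries
 * written to 'out' (at most out_cap).
 */
static size_t merge_stages(const struct page_map *s1, size_t n1,
                           const struct page_map *s2, size_t n2,
                           struct page_map *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < n1 && n < out_cap; i++) {
        uint64_t hpa;

        /* The sketch only handles page-aligned, page-sized mappings. */
        assert(s1[i].in % SKETCH_PAGE_SIZE == 0);
        assert(s1[i].out % SKETCH_PAGE_SIZE == 0);

        if (lookup(s2, n2, s1[i].out, &hpa) == 0) {
            out[n].in = s1[i].in;   /* GVA */
            out[n].out = hpa;       /* HPA */
            n++;
        }
    }
    return n;
}
```

With this scheme every guest map or unmap request has to be replayed into the
combined table, which is the cost that the nested mode discussed next avoids.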

What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
I'm referring to the "Page Table Sharing" bit of the Future Work in the
initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
by the guest, and the VMM only maps GPA->HPA.

Since both s1 and s2 are then enabled in the pIOMMU, MSIs reaching the
pIOMMU will be translated at s1 then s2. To create nested translation for
MSIs, I see two solutions:

A. The GPA of the doorbell that is exposed to the guest is mapped by the
VMM at S2. This mapping is GPA->(PA of the doorbell) with Dev-nGnRE memory
attributes. The guest creates a GVA->GPA mapping, then writes GVA in the
MSI-X tables.
- If the MSI-X table is emulated (as we currently do), VMM has to force
  the host to rewrite the physical MSIX entry with the GVA.
- If the MSI-X table is mapped (see [3]), then the guest writes
  the GVA into the physical MSI-X entry. (How does this work with lazy MSI
  routing setup, that is based on trapping MSIX table?)

B. The VMM exposes a fake doorbell. Hardware MSI vectors are programmed
upfront by the host. Since TTB0 is assigned to the guest, the host must
use TTB1 to create the GVA->GPA mapping.

Solution B was my proposal (2) below, but I didn't take vSMMU into account
at the time. I think that for virtual SVM with the vSMMU, the VMM has to
hand the whole PASID table over to the guest. This is what Intel seems to
do [2]. Even if we emulated the PASID table instead of handing it over, we
wouldn't have a way to hide TTB1 from the guest. So with vSMMU we lose
control over TTB1 and (2) doesn't work.

I don't really like A, but it might be the only way with vSMMU:
- Guest maps doorbell at S1,
- Guest writes the GVA in its virtual MSI-X tables,
- Host handles the GVA write and reprograms the hardware MSI-X tables,
- Device issues an MSI, which gets translated at S1+S2, then hits the
  doorbell,
- VFIO handles the IRQ, which is forwarded to KVM via IRQFD, finds the
  corresponding irqchip by GPA, then injects the MSI.
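
The address handling in the steps above can be sketched as below. This is a
toy model, not code from QEMU, VFIO or the SMMU driver: struct range_map,
struct msix_entry and the three functions are hypothetical, and the addresses
used with them are arbitrary. It only shows that the GVA written by the guest
is copied unchanged into the hardware entry and is then resolved by S1
(guest-owned) followed by S2 (VMM-owned).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A contiguous mapping [in, in + size) -> [out, out + size) (hypothetical). */
struct range_map {
    uint64_t in;
    uint64_t size;
    uint64_t out;
};

/* An MSI-X table entry reduced to the two fields that matter here. */
struct msix_entry {
    uint64_t addr;
    uint32_t data;
};

/* Translate through one stage; returns 0 on success, -1 on a fault. */
static int stage_translate(const struct range_map *tbl, size_t n,
                           uint64_t addr, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= tbl[i].in && addr - tbl[i].in < tbl[i].size) {
            *out = tbl[i].out + (addr - tbl[i].in);
            return 0;
        }
    }
    return -1;
}

/*
 * Host side of a trapped guest write to the virtual MSI-X table. In
 * solution A the guest-provided address is a GVA and is copied as-is.
 */
static void host_reprogram_msix(struct msix_entry *hw,
                                const struct msix_entry *guest)
{
    hw->addr = guest->addr;
    hw->data = guest->data;
}

/*
 * What the pIOMMU does to the device's MSI write in nested mode:
 * S1 (GVA->GPA) then S2 (GPA->PA of the doorbell).
 */
static int msi_nested_translate(const struct range_map *s1, size_t n1,
                                const struct range_map *s2, size_t n2,
                                const struct msix_entry *hw, uint64_t *pa)
{
    uint64_t gpa;

    assert(pa != NULL);
    if (stage_translate(s1, n1, hw->addr, &gpa) != 0) {
        return -1;  /* guest never mapped the doorbell at S1 */
    }
    return stage_translate(s2, n2, gpa, pa);
}
```

If the guest forgets the S1 mapping, the MSI write faults at stage 1, which is
the behaviour a physical system with a SW-mapped doorbell would show as well.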

>> I'm also wondering about the benefits of having SW-mapped MSIs in the
>> guest. It seems unavoidable for vSMMU since that's what a physical system
>> would do. But in a paravirtualized solution there doesn't seem to be any
>> compelling reason for having the guest map MSI doorbells.
> 
> If I understand correctly the virtio-iommu would not expose MSI reserved
> regions (saying it does not translate MSIs). In that case the VFIO
> kernel code will not check the irq_domain_check_msi_remap() but will
> check iommu_capable(bus, IOMMU_CAP_INTR_REMAP) instead. Would the
> virtio-iommu expose this capability? How would it isolate MSI
> transactions from different devices?

Yes, the virtio-iommu would expose IOMMU_CAP_INTR_REMAP to keep VFIO
happy. But the virtio-iommu device wouldn't do any MSI isolation. We have
software-mapped doorbell on ARM because MSI transactions are translated by
the SMMU before reaching the GIC, which then performs device isolation.
With virtio-iommu on ARM, the address translation stage seems unnecessary
if you already have a vGIC handling device isolation. So maybe MSIs could
bypass the vIOMMU in that case.

However on x86, I think we will need to handle MSI remapping in the
virtio-iommu itself, since that's what x86 platforms expect. In this case
the guest doesn't have to create an IOMMU mapping for doorbell addresses
either, but might need to manage some form of IRQ remapping table (on
which I still have a tonne of research to do.)

Thanks,
Jean

[1] https://www.spinics.net/lists/kvm/msg147993.html
[2] https://www.spinics.net/lists/kvm/msg148798.html
[3] https://lkml.org/lkml/2017/6/15/34

> Thanks
> 
> Eric
> 
> 
>> These addresses
>> are never accessed directly, they are only used for setting up IRQ routing
>> (at least on kvmtool). So here's what I'd like to have. Note that I
>> haven't investigated the feasibility in Qemu yet, I don't know how it
>> deals with MSIs.
>>
>> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
>> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
>> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
>> mappings when handling writes to PCI MSI-X tables.
>>
>> (2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs
>> via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like
>> to use the (currently unused) TTB1 tables in that case. In addition, using
>> TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs and we
>> don't want to map them in user address space.
>>
>> This means that the host needs to use different doorbell addresses in
>> nested mode, since it would be unable to map at S1 the same IOVA as S2
>> (TTB1 manages negative addresses - 0xffff............, which are not
>> representable as GPAs.) It also requires to use 32-bit page tables for
>> endpoints that are not capable of using 64-bit MSI addresses.
>>
>>
>> Now (2) is entirely handled in the host kernel, so it's more a Linux
>> question. But does (1) seem acceptable for virtio-iommu in Qemu?
>>
>> Thanks,
>> Jean
>>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-26 16:13               ` Jean-Philippe Brucker
@ 2017-06-27  6:38                 ` Auger Eric
  2017-06-27  8:46                   ` Will Deacon
  2017-07-05  7:25                 ` Tian, Kevin
  1 sibling, 1 reply; 73+ messages in thread
From: Auger Eric @ 2017-06-27  6:38 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Bharat Bhushan, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Jean-Philippe,

On 26/06/2017 18:13, Jean-Philippe Brucker wrote:
> On 26/06/17 09:22, Auger Eric wrote:
>> Hi Jean-Philippe,
>>
>> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
>>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>>> Hi Eric,
>>>>
>>>> I started added replay in virtio-iommu and came across how MSI interrupts with work with VFIO. 
>>>> I understand that on intel this works differently but vsmmu will have same requirement. 
>>>> kvm-msi-irq-route are added using the msi-address to be translated by viommu and not the final translated address.
>>>> While currently the irqfd framework does not know about emulated iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>>>> So in my view we have following options:
>>>> - Programming with translated address when setting up kvm-msi-irq-route
>>>> - Route the interrupts via QEMU, which is bad from performance
>>>> - vhost-virtio-iommu may solve the problem in long term
>>>>
>>>> Is there any other better option I am missing?
>>>
>>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>>> we'll handle MSIs in the nested translation mode, where the guest manages
>>> S1 page tables and the host doesn't know about GVA->GPA translation.
>>
>> I have a question about the "nested translation mode" terminology. Do
>> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
>> (which the ARM spec normally advises or was meant for) or do you mean
>> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the
>> moment my understanding is for VFIO integration the pIOMMU uses a single
>> stage combining both the stage 1 and stage2 mappings but the host is not
>> aware of those 2 stages.
> 
> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.
> 
> What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
> I'm referring to the "Page Table Sharing" bit of the Future Work in the
> initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
> case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
> by the guest, and the VMM only maps GPA->HPA.

OK I need to read that part more thoroughly. I was told in the past
handling nested stages at pIOMMU was considered too complex and
difficult to maintain. But the SMMU architecture is definitely devised
for that. Michael asked why we did not use that already for vsmmu
(nested stages are used on AMD IOMMU I think).
> 
> Since both s1 and s2 are then enabled in the pIOMMU, MSIs reaching the
> pIOMMU will be translated at s1 then s2. To create nested translation for
> MSIs, I see two solutions:
> 
> A. The GPA of the doorbell that is exposed to the guest is mapped by the
> VMM at S2. This mapping is GPA->(PA of the doorbell) with Dev-nGnRE memory
> attributes. The guest creates a GVA->GPA mapping, then writes GVA in the
> MSI-X tables.
> - If the MSI-X table is emulated (as we currently do), VMM has to force
>   the host to rewrite the physical MSIX entry with the GVA.
> - If the MSI-X table is mapped (see [3]), then the guest writes
>   the GVA into the physical MSI-X entry. (How does this work with lazy MSI
>   routing setup, that is based on trapping MSIX table?)
> 
> B. The VMM exposes a fake doorbell. Hardware MSI vectors are programmed
> upfront by the host. Since TTB0 is assigned to the guest, the host must
> use TTB1 to create the GVA->GPA mapping.
> 
> Solution B was my proposal (2) below, but I didn't take vSMMU into account
> at the time. I think that for virtual SVM with the vSMMU, the VMM has to
> hand the whole PASID table over to the guest. This is what Intel seems to
> do [2]. Even if we emulated the PASID table instead of handing it over, we
> wouldn't have a way to hide TTB1 from the guest. So with vSMMU we lose
> control over TTB1 and (2) doesn't work.
> 
> I don't really like A, but it might be the only way with vSMMU:
> - Guest maps doorbell at S1,
> - Guest writes the GVA in its virtual MSI-X tables,
> - Host handles the GVA write and reprograms the hardware MSI-X tables,
> - Device issues an MSI, which gets translated at S1+S2, then hits the
>   doorbell,
> - VFIO handles the IRQ, which is forwarded to KVM via IRQFD, finds the
>   corresponding irqchip by GPA, then injects the MSI.

I am about to experiment with A) on vsmmu/VFIO. Please give me a few days
before I answer this part accurately.
> 
>>> I'm also wondering about the benefits of having SW-mapped MSIs in the
>>> guest. It seems unavoidable for vSMMU since that's what a physical system
>>> would do. But in a paravirtualized solution there doesn't seem to be any
>>> compelling reason for having the guest map MSI doorbells.
>>
>> If I understand correctly the virtio-iommu would not expose MSI reserved
>> regions (saying it does not translate MSIs). In that case the VFIO
>> kernel code will not check the irq_domain_check_msi_remap() but will
>> check iommu_capable(bus, IOMMU_CAP_INTR_REMAP) instead. Would the
>> virtio-iommu expose this capability? How would it isolate MSI
>> transactions from different devices?
> 
> Yes, the virtio-iommu would expose IOMMU_CAP_INTR_REMAP to keep VFIO
> happy. But the virtio-iommu device wouldn't do any MSI isolation. We have
> software-mapped doorbell on ARM because MSI transactions are translated by
> the SMMU before reaching the GIC, which then performs device isolation.
> With virtio-iommu on ARM, the address translation stage seems unnecessary
> if you already have a vGIC handling device isolation. So maybe MSIs could
> bypass the vIOMMU in that case.
Only the vITS performs device isolation. With a vGICv2m there is no IRQ
translation. So yes, MSI isolation also needs to be handled in the x86
and GICv2m cases.
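
The difference can be illustrated with a small sketch. It is a simplified
model and not the vGIC code: struct its_mapping, its_translate() and
gicv2m_translate() are hypothetical names, and the flat table stands in for
the ITS device table and interrupt translation tables. The ITS lookup is keyed
on the requester's DeviceID, whereas a GICv2m frame takes the interrupt number
straight from the written data and never looks at who wrote it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One ITS translation: (DeviceID, EventID) -> interrupt ID (hypothetical). */
struct its_mapping {
    uint32_t device_id;
    uint32_t event_id;
    uint32_t intid;
};

/*
 * ITS-style translation: an MSI is only delivered if a mapping exists for
 * the DeviceID of the device that issued the write. Returns 0 on success,
 * -1 if the (DeviceID, EventID) pair is unmapped and the MSI is dropped.
 */
static int its_translate(const struct its_mapping *tbl, size_t n,
                         uint32_t device_id, uint32_t event_id,
                         uint32_t *intid)
{
    assert(intid != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].device_id == device_id && tbl[i].event_id == event_id) {
            *intid = tbl[i].intid;
            return 0;
        }
    }
    return -1;
}

/*
 * GICv2m-style handling: the written data is the SPI number itself. Only
 * the range owned by the frame is checked; the requester is not considered,
 * so any device able to reach the doorbell can raise any SPI of the frame.
 */
static int gicv2m_translate(uint32_t spi_base, uint32_t nr_spis,
                            uint32_t data, uint32_t *intid)
{
    assert(intid != NULL);
    if (data < spi_base || data - spi_base >= nr_spis) {
        return -1;
    }
    *intid = data;
    return 0;
}
```

In the second case isolation has to come from whatever sits in front of the
doorbell, which is why the IOMMU mapping of the doorbell still matters there.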

Thanks

Eric
> 
> However on x86, I think we will need to handle MSI remapping in the
> virtio-iommu itself, since that's what x86 platforms expect. In this case
> the guest doesn't have to create an IOMMU mapping for doorbell addresses
> either, but might need to manage some form of IRQ remapping table (on
> which I still have a tonne of research to do.)
> 
> Thanks,
> Jean
> 
> [1] https://www.spinics.net/lists/kvm/msg147993.html
> [2] https://www.spinics.net/lists/kvm/msg148798.html
> [3] https://lkml.org/lkml/2017/6/15/34
> 
>> Thanks
>>
>> Eric
>>
>>
>>> These addresses
>>> are never accessed directly, they are only used for setting up IRQ routing
>>> (at least on kvmtool). So here's what I'd like to have. Note that I
>>> haven't investigated the feasibility in Qemu yet, I don't know how it
>>> deals with MSIs.
>>>
>>> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
>>> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
>>> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
>>> mappings when handling writes to PCI MSI-X tables.
>>>
>>> (2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs
>>> via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like
>>> to use the (currently unused) TTB1 tables in that case. In addition, using
>>> TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs and we
>>> don't want to map them in user address space.
>>>
>>> This means that the host needs to use different doorbell addresses in
>>> nested mode, since it would be unable to map at S1 the same IOVA as S2
>>> (TTB1 manages negative addresses - 0xffff............, which are not
>>> representable as GPAs.) It also requires to use 32-bit page tables for
>>> endpoints that are not capable of using 64-bit MSI addresses.
>>>
>>>
>>> Now (2) is entirely handled in the host kernel, so it's more a Linux
>>> question. But does (1) seem acceptable for virtio-iommu in Qemu?
>>>
>>> Thanks,
>>> Jean
>>>
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-27  6:38                 ` Auger Eric
@ 2017-06-27  8:46                   ` Will Deacon
  2017-06-27  8:59                     ` Auger Eric
  0 siblings, 1 reply; 73+ messages in thread
From: Will Deacon @ 2017-06-27  8:46 UTC (permalink / raw)
  To: Auger Eric
  Cc: Jean-Philippe Brucker, Bharat Bhushan, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel, wei,
	kevin.tian, marc.zyngier, tn, drjones, robin.murphy,
	christoffer.dall

Hi Eric,

On Tue, Jun 27, 2017 at 08:38:48AM +0200, Auger Eric wrote:
> On 26/06/2017 18:13, Jean-Philippe Brucker wrote:
> > On 26/06/17 09:22, Auger Eric wrote:
> >> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
> >>> On 19/06/17 08:54, Bharat Bhushan wrote:
> >>>> I started added replay in virtio-iommu and came across how MSI interrupts with work with VFIO. 
> >>>> I understand that on intel this works differently but vsmmu will have same requirement. 
> >>>> kvm-msi-irq-route are added using the msi-address to be translated by viommu and not the final translated address.
> >>>> While currently the irqfd framework does not know about emulated iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> >>>> So in my view we have following options:
> >>>> - Programming with translated address when setting up kvm-msi-irq-route
> >>>> - Route the interrupts via QEMU, which is bad from performance
> >>>> - vhost-virtio-iommu may solve the problem in long term
> >>>>
> >>>> Is there any other better option I am missing?
> >>>
> >>> Since we're on the topic of MSIs... I'm currently trying to figure out how
> >>> we'll handle MSIs in the nested translation mode, where the guest manages
> >>> S1 page tables and the host doesn't know about GVA->GPA translation.
> >>
> >> I have a question about the "nested translation mode" terminology. Do
> >> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
> >> (which the ARM spec normally advises or was meant for) or do you mean
> >> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the
> >> moment my understanding is for VFIO integration the pIOMMU uses a single
> >> stage combining both the stage 1 and stage2 mappings but the host is not
> >> aware of those 2 stages.
> > 
> > Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
> > its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
> > in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.
> > 
> > What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
> > I'm referring to the "Page Table Sharing" bit of the Future Work in the
> > initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
> > case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
> > by the guest, and the VMM only maps GPA->HPA.
> 
> OK I need to read that part more thoroughly. I was told in the past
> handling nested stages at pIOMMU was considered too complex and
> > difficult to maintain. But the SMMU architecture is definitely devised
> for that. Michael asked why we did not use that already for vsmmu
> (nested stages are used on AMD IOMMU I think).

Curious -- but what gave you that idea? I worry that something I might have
said wasn't clear or has been misunderstood.

Will

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-27  8:46                   ` Will Deacon
@ 2017-06-27  8:59                     ` Auger Eric
  0 siblings, 0 replies; 73+ messages in thread
From: Auger Eric @ 2017-06-27  8:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: wei, peter.maydell, kevin.tian, drjones, mst,
	Jean-Philippe Brucker, tn, qemu-devel, marc.zyngier,
	alex.williamson, qemu-arm, robin.murphy, Bharat Bhushan,
	christoffer.dall, eric.auger.pro

Hi,

On 27/06/2017 10:46, Will Deacon wrote:
> Hi Eric,
> 
> On Tue, Jun 27, 2017 at 08:38:48AM +0200, Auger Eric wrote:
>> On 26/06/2017 18:13, Jean-Philippe Brucker wrote:
>>> On 26/06/17 09:22, Auger Eric wrote:
>>>> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
>>>>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>>>>> I started added replay in virtio-iommu and came across how MSI interrupts with work with VFIO. 
>>>>>> I understand that on intel this works differently but vsmmu will have same requirement. 
>>>>>> kvm-msi-irq-route are added using the msi-address to be translated by viommu and not the final translated address.
>>>>>> While currently the irqfd framework does not know about emulated iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>>>>>> So in my view we have following options:
>>>>>> - Programming with translated address when setting up kvm-msi-irq-route
>>>>>> - Route the interrupts via QEMU, which is bad from performance
>>>>>> - vhost-virtio-iommu may solve the problem in long term
>>>>>>
>>>>>> Is there any other better option I am missing?
>>>>>
>>>>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>>>>> we'll handle MSIs in the nested translation mode, where the guest manages
>>>>> S1 page tables and the host doesn't know about GVA->GPA translation.
>>>>
>>>> I have a question about the "nested translation mode" terminology. Do
>>>> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
>>>> (which the ARM spec normally advises or was meant for) or do you mean
>>>> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the
>>>> moment my understanding is for VFIO integration the pIOMMU uses a single
>>>> stage combining both the stage 1 and stage2 mappings but the host is not
>>>> aware of those 2 stages.
>>>
>>> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
>>> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
>>> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.
>>>
>>> What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
>>> I'm referring to the "Page Table Sharing" bit of the Future Work in the
>>> initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
>>> case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
>>> by the guest, and the VMM only maps GPA->HPA.
>>
>> OK I need to read that part more thoroughly. I was told in the past
>> handling nested stages at pIOMMU was considered too complex and
>> difficult to maintain. But the SMMU architecture is definitely devised
>> for that. Michael asked why we did not use that already for vsmmu
>> (nested stages are used on AMD IOMMU I think).
> 
> Curious -- but what gave you that idea? I worry that something I might have
> said wasn't clear or has been misunderstood.

Lobby discussions I might not have correctly understood ;-) Anyway
that's a new direction that I am happy to investigate then.

Thanks

Eric
> 
> Will
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands Eric Auger
  2017-06-23 16:09   ` Jean-Philippe Brucker
@ 2017-07-04  9:13   ` Bharat Bhushan
  2017-07-05  6:40     ` Auger Eric
  2017-07-14  2:17   ` Peter Xu
  2 siblings, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-04  9:13 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: will.deacon, robin.murphy, kevin.tian, marc.zyngier,
	christoffer.dall, drjones, wei, tn

Hi Eric,

> -----Original Message-----
> From: Eric Auger [mailto:eric.auger@redhat.com]
> Sent: Wednesday, June 07, 2017 9:31 PM
> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> jean-philippe.brucker@arm.com
> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
> wei@redhat.com; tn@semihalf.com; Bharat Bhushan
> <bharat.bhushan@nxp.com>
> Subject: [RFC v2 6/8] virtio-iommu: Implement the translation and
> commands
> 
> This patch adds the actual implementation for the translation routine
> and the virtio-iommu commands.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v1 -> v2:
> - fix compilation issue reported by autobuild system
> ---
>  hw/virtio/trace-events   |   6 ++
>  hw/virtio/virtio-iommu.c | 202 +++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 202 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 341dbdf..9196b63 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -33,3 +33,9 @@ virtio_iommu_detach(uint32_t dev, uint32_t flags) "dev=%d flags=%d"
>  virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
>  virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
>  virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
> +virtio_iommu_new_asid(uint32_t asid) "Allocate a new asid=%d"
> +virtio_iommu_new_devid(uint32_t devid) "Allocate a new devid=%d"
> +virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
> +virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
> +virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
> +virtio_iommu_translate_result(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 902c779..cd188fc 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -32,10 +32,37 @@
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
>  #include "hw/virtio/virtio-iommu.h"
> +#include "hw/pci/pci_bus.h"
> +#include "hw/pci/pci.h"
> 
>  /* Max size */
>  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
> 
> +typedef struct viommu_as viommu_as;
> +
> +typedef struct viommu_mapping {
> +    uint64_t virt_addr;
> +    uint64_t phys_addr;
> +    uint64_t size;
> +    uint32_t flags;
> +} viommu_mapping;
> +
> +typedef struct viommu_interval {
> +    uint64_t low;
> +    uint64_t high;
> +} viommu_interval;
> +
> +typedef struct viommu_dev {
> +    uint32_t id;
> +    viommu_as *as;
> +} viommu_dev;
> +
> +struct viommu_as {
> +    uint32_t id;
> +    uint32_t nr_devices;
> +    GTree *mappings;
> +};
> +
>  static inline uint16_t smmu_get_sid(IOMMUDevice *dev)
>  {
>      return  ((pci_bus_num(dev->bus) & 0xff) << 8) | dev->devfn;
> @@ -88,6 +115,19 @@ static void virtio_iommu_init_as(VirtIOIOMMU *s)
>      }
>  }
> 
> +static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
> +{
> +    viommu_interval *inta = (viommu_interval *)a;
> +    viommu_interval *intb = (viommu_interval *)b;
> +
> +    if (inta->high <= intb->low) {
> +        return -1;
> +    } else if (intb->high <= inta->low) {
> +        return 1;
> +    } else {
> +        return 0;
> +    }
> +}
> 
>  static int virtio_iommu_attach(VirtIOIOMMU *s,
>                                 struct virtio_iommu_req_attach *req)
> @@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>      uint32_t asid = le32_to_cpu(req->address_space);
>      uint32_t devid = le32_to_cpu(req->device);
>      uint32_t reserved = le32_to_cpu(req->reserved);
> +    viommu_as *as;
> +    viommu_dev *dev;
> 
>      trace_virtio_iommu_attach(asid, devid, reserved);
> 
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
> +    if (dev) {
> +        return -1;
> +    }
> +
> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
> +    if (!as) {
> +        as = g_malloc0(sizeof(*as));
> +        as->id = asid;
> +        as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
> +                                         NULL, NULL, (GDestroyNotify)g_free);

The tree is created here, but it does not appear to be destroyed on detach.

Thanks
-Bharat
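
As a rough illustration of the cleanup being asked for, here is a minimal
standalone sketch in plain C. It is not the QEMU code: a linked list stands
in for the GTree of mappings, and every sketch_* name is hypothetical. It
assumes the address space is released once its last device detaches, which
is a design choice this thread does not settle.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for viommu_mapping and viommu_as in the patch. */
typedef struct sketch_mapping {
    uint64_t virt_addr;
    uint64_t phys_addr;
    uint64_t size;
    struct sketch_mapping *next;
} sketch_mapping;

typedef struct sketch_as {
    uint32_t id;
    uint32_t nr_devices;
    sketch_mapping *mappings; /* linked list standing in for the GTree */
} sketch_as;

/* Number of live allocations, so a caller can check that nothing leaks. */
static int sketch_live_allocs;

static void *sketch_alloc(size_t size)
{
    void *p = calloc(1, size);

    if (!p) {
        abort();
    }
    sketch_live_allocs++;
    return p;
}

static void sketch_free(void *p)
{
    free(p);
    sketch_live_allocs--;
}

static sketch_as *sketch_as_new(uint32_t id)
{
    sketch_as *as = sketch_alloc(sizeof(*as));

    as->id = id;
    return as;
}

/* Record one mapping in the address space (no overlap check in this sketch). */
static void sketch_as_map(sketch_as *as, uint64_t virt_addr,
                          uint64_t phys_addr, uint64_t size)
{
    sketch_mapping *m = sketch_alloc(sizeof(*m));

    m->virt_addr = virt_addr;
    m->phys_addr = phys_addr;
    m->size = size;
    m->next = as->mappings;
    as->mappings = m;
}

/* Attach one device: only the reference count changes. */
static void sketch_as_attach(sketch_as *as)
{
    as->nr_devices++;
}

/*
 * Detach one device. When the last device goes away, free every mapping
 * and then the address space itself. Returns 1 if the address space was
 * released, 0 if other devices still use it.
 */
static int sketch_as_detach(sketch_as *as)
{
    assert(as->nr_devices > 0);
    if (--as->nr_devices > 0) {
        return 0;
    }
    while (as->mappings) {
        sketch_mapping *next = as->mappings->next;

        sketch_free(as->mappings);
        as->mappings = next;
    }
    sketch_free(as);
    return 1;
}
```

In the patch itself the equivalent would presumably be a g_tree_destroy()
on as->mappings before the viommu_as is freed, but that is an assumption
rather than something confirmed in this thread.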

> +        g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
> +        trace_virtio_iommu_new_asid(asid);
> +    }
> +
> +    dev = g_malloc0(sizeof(*dev));
> +    dev->as = as;
> +    dev->id = devid;
> +    as->nr_devices++;
> +    trace_virtio_iommu_new_devid(devid);
> +    g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
> +
> +    return VIRTIO_IOMMU_S_OK;
>  }
> 
>  static int virtio_iommu_detach(VirtIOIOMMU *s,
> @@ -106,10 +170,13 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
>  {
>      uint32_t devid = le32_to_cpu(req->device);
>      uint32_t reserved = le32_to_cpu(req->reserved);
> +    int ret;
> 
>      trace_virtio_iommu_detach(devid, reserved);
> 
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    ret = g_tree_remove(s->devices, GUINT_TO_POINTER(devid));
> +
> +    return ret ? VIRTIO_IOMMU_S_OK : VIRTIO_IOMMU_S_INVAL;
>  }
> 
>  static int virtio_iommu_map(VirtIOIOMMU *s,
> @@ -120,10 +187,37 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
>      uint64_t virt_addr = le64_to_cpu(req->virt_addr);
>      uint64_t size = le64_to_cpu(req->size);
>      uint32_t flags = le32_to_cpu(req->flags);
> +    viommu_as *as;
> +    viommu_interval *interval;
> +    viommu_mapping *mapping;
> +
> +    interval = g_malloc0(sizeof(*interval));
> +
> +    interval->low = virt_addr;
> +    interval->high = virt_addr + size - 1;
> +
> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
> +    if (!as) {
> +        return VIRTIO_IOMMU_S_INVAL;
> +    }
> +
> +    mapping = g_tree_lookup(as->mappings, (gpointer)interval);
> +    if (mapping) {
> +        g_free(interval);
> +        return VIRTIO_IOMMU_S_INVAL;
> +    }
> 
>      trace_virtio_iommu_map(asid, phys_addr, virt_addr, size, flags);
> 
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    mapping = g_malloc0(sizeof(*mapping));
> +    mapping->virt_addr = virt_addr;
> +    mapping->phys_addr = phys_addr;
> +    mapping->size = size;
> +    mapping->flags = flags;
> +
> +    g_tree_insert(as->mappings, interval, mapping);
> +
> +    return VIRTIO_IOMMU_S_OK;
>  }
> 
>  static int virtio_iommu_unmap(VirtIOIOMMU *s,
> @@ -133,10 +227,64 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
>      uint64_t virt_addr = le64_to_cpu(req->virt_addr);
>      uint64_t size = le64_to_cpu(req->size);
>      uint32_t flags = le32_to_cpu(req->flags);
> +    viommu_mapping *mapping;
> +    viommu_interval interval;
> +    viommu_as *as;
> 
>      trace_virtio_iommu_unmap(asid, virt_addr, size, flags);
> 
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
> +    if (!as) {
> +        error_report("%s: no as", __func__);
> +        return VIRTIO_IOMMU_S_INVAL;
> +    }
> +    interval.low = virt_addr;
> +    interval.high = virt_addr + size - 1;
> +
> +    mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
> +
> +    while (mapping) {
> +        viommu_interval current;
> +        uint64_t low  = mapping->virt_addr;
> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> +
> +        current.low = low;
> +        current.high = high;
> +
> +        if (low == interval.low && size >= mapping->size) {
> +            g_tree_remove(as->mappings, (gpointer)&current);
> +            interval.low = high + 1;
> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
> +                interval.low, interval.high);
> +        } else if (high == interval.high && size >= mapping->size) {
> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
> +                interval.low, interval.high);
> +            g_tree_remove(as->mappings, (gpointer)&current);
> +            interval.high = low - 1;
> +        } else if (low > interval.low && high < interval.high) {
> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
> +            g_tree_remove(as->mappings, (gpointer)&current);
> +        } else {
> +            break;
> +        }
> +        if (interval.low >= interval.high) {
> +            return VIRTIO_IOMMU_S_OK;
> +        } else {
> +            mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
> +        }
> +    }
> +
> +    if (mapping) {
> +        error_report("****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported",
> +                     __func__, interval.low, size,
> +                     mapping->virt_addr, mapping->size);
> +    } else {
> +        error_report("****** %s: no mapping for [0x%"PRIx64",0x%"PRIx64"]",
> +                     __func__, interval.low, interval.high);
> +    }
> +
> +    return VIRTIO_IOMMU_S_INVAL;
>  }
> 
>  #define get_payload_size(req) (\
> @@ -266,19 +414,46 @@ static IOMMUTLBEntry virtio_iommu_translate(MemoryRegion *mr, hwaddr addr,
>                                              IOMMUAccessFlags flag)
>  {
>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
> +    VirtIOIOMMU *s = sdev->viommu;
>      uint32_t sid;
> +    viommu_dev *dev;
> +    viommu_mapping *mapping;
> +    viommu_interval interval;
> +
> +    interval.low = addr;
> +    interval.high = addr + 1;
> 
>      IOMMUTLBEntry entry = {
>          .target_as = &address_space_memory,
>          .iova = addr,
>          .translated_addr = addr,
> -        .addr_mask = ~(hwaddr)0,
> -        .perm = IOMMU_NONE,
> +        .addr_mask = (1 << 12) - 1, /* TODO */
> +        .perm = 3,
>      };
> 
>      sid = smmu_get_sid(sdev);
> 
>      trace_virtio_iommu_translate(mr->name, sid, addr, flag);
> +    qemu_mutex_lock(&s->mutex);
> +
> +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(sid));
> +    if (!dev) {
> +        /* device cannot be attached to another as */
> +        printf("%s sid=%d is not known!!\n", __func__, sid);
> +        goto unlock;
> +    }
> +
> +    mapping = g_tree_lookup(dev->as->mappings, (gpointer)&interval);
> +    if (!mapping) {
> +        printf("%s no mapping for 0x%"PRIx64" for sid=%d\n", __func__,
> +               addr, sid);
> +        goto unlock;
> +    }
> +    entry.translated_addr = addr - mapping->virt_addr + mapping->phys_addr,
> +    trace_virtio_iommu_translate_result(addr, entry.translated_addr, sid);
> +
> +unlock:
> +    qemu_mutex_unlock(&s->mutex);
>      return entry;
>  }
> 
> @@ -341,6 +516,12 @@ static inline guint as_uint64_hash(gconstpointer v)
>      return (guint)*(const uint64_t *)v;
>  }
> 
> +static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
> +{
> +    uint ua = GPOINTER_TO_UINT(a);
> +    uint ub = GPOINTER_TO_UINT(b);
> +    return (ua > ub) - (ua < ub);
> +}
> 
>  static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>  {
> @@ -362,12 +543,21 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>                                              as_uint64_equal,
>                                              g_free, g_free);
> 
> +    s->address_spaces = g_tree_new_full((GCompareDataFunc)int_cmp,
> +                                         NULL, NULL, (GDestroyNotify)g_free);
> +    s->devices = g_tree_new_full((GCompareDataFunc)int_cmp,
> +                                         NULL, NULL, (GDestroyNotify)g_free);
> +
>      virtio_iommu_init_as(s);
>  }
> 
>  static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
> +
> +    g_tree_destroy(s->address_spaces);
> +    g_tree_destroy(s->devices);
> 
>      virtio_cleanup(vdev);
>  }
> --
> 2.5.5
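
For readers following the interval-based lookup in the patch above, the
sketch below reproduces the ordering rule of interval_cmp() in plain C,
without GLib. The sketch_* names are hypothetical. Keys are built the way
the patch builds them: [virt_addr, virt_addr + size - 1] in
virtio_iommu_map() and [addr, addr + 1] in virtio_iommu_translate().

```c
#include <stdint.h>

/* Hypothetical stand-in for viommu_interval: both bounds are stored as given. */
typedef struct sketch_interval {
    uint64_t low;
    uint64_t high;
} sketch_interval;

/*
 * Same ordering rule as interval_cmp() in the patch: a sorts before b when
 * a ends at or before the start of b, after b in the mirrored case, and the
 * two compare equal (0) otherwise, i.e. when they overlap.
 */
static int sketch_interval_cmp(const sketch_interval *a,
                               const sketch_interval *b)
{
    if (a->high <= b->low) {
        return -1;
    } else if (b->high <= a->low) {
        return 1;
    } else {
        return 0;
    }
}

/* Key for a mapping, built as in virtio_iommu_map(). */
static sketch_interval sketch_mapping_key(uint64_t virt_addr, uint64_t size)
{
    sketch_interval key = { virt_addr, virt_addr + size - 1 };

    return key;
}

/* Key for a single-address lookup, built as in virtio_iommu_translate(). */
static sketch_interval sketch_lookup_key(uint64_t addr)
{
    sketch_interval key = { addr, addr + 1 };

    return key;
}
```

Because overlapping intervals compare equal, a tree lookup with a small key
finds the mapping that contains it. One property of this standalone model is
worth noting: a key built for the last byte of a mapping (for example 0x1fff
against [0x1000, 0x1fff]) compares as disjoint. Whether that matters in
practice depends on how the translate callback is invoked.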


* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-07-04  9:13   ` Bharat Bhushan
@ 2017-07-05  6:40     ` Auger Eric
  0 siblings, 0 replies; 73+ messages in thread
From: Auger Eric @ 2017-07-05  6:40 UTC (permalink / raw)
  To: Bharat Bhushan, eric.auger.pro, peter.maydell, alex.williamson,
	mst, qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Bharat,

On 04/07/2017 11:13, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Eric Auger [mailto:eric.auger@redhat.com]
>> Sent: Wednesday, June 07, 2017 9:31 PM
>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
>> qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>> jean-philippe.brucker@arm.com
>> Cc: will.deacon@arm.com; robin.murphy@arm.com; kevin.tian@intel.com;
>> marc.zyngier@arm.com; christoffer.dall@linaro.org; drjones@redhat.com;
>> wei@redhat.com; tn@semihalf.com; Bharat Bhushan
>> <bharat.bhushan@nxp.com>
>> Subject: [RFC v2 6/8] virtio-iommu: Implement the translation and
>> commands
>>
>> This patch adds the actual implementation for the translation routine
>> and the virtio-iommu commands.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v1 -> v2:
>> - fix compilation issue reported by autobuild system
>> ---
>>  hw/virtio/trace-events   |   6 ++
>>  hw/virtio/virtio-iommu.c | 202 +++++++++++++++++++++++++++++++++++++++++++++--
>>  2 files changed, 202 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>> index 341dbdf..9196b63 100644
>> --- a/hw/virtio/trace-events
>> +++ b/hw/virtio/trace-events
>> @@ -33,3 +33,9 @@ virtio_iommu_detach(uint32_t dev, uint32_t flags) "dev=%d flags=%d"
>>  virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
>>  virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
>>  virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
>> +virtio_iommu_new_asid(uint32_t asid) "Allocate a new asid=%d"
>> +virtio_iommu_new_devid(uint32_t devid) "Allocate a new devid=%d"
>> +virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
>> +virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
>> +virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
>> +virtio_iommu_translate_result(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index 902c779..cd188fc 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -32,10 +32,37 @@
>>  #include "hw/virtio/virtio-bus.h"
>>  #include "hw/virtio/virtio-access.h"
>>  #include "hw/virtio/virtio-iommu.h"
>> +#include "hw/pci/pci_bus.h"
>> +#include "hw/pci/pci.h"
>>
>>  /* Max size */
>>  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
>>
>> +typedef struct viommu_as viommu_as;
>> +
>> +typedef struct viommu_mapping {
>> +    uint64_t virt_addr;
>> +    uint64_t phys_addr;
>> +    uint64_t size;
>> +    uint32_t flags;
>> +} viommu_mapping;
>> +
>> +typedef struct viommu_interval {
>> +    uint64_t low;
>> +    uint64_t high;
>> +} viommu_interval;
>> +
>> +typedef struct viommu_dev {
>> +    uint32_t id;
>> +    viommu_as *as;
>> +} viommu_dev;
>> +
>> +struct viommu_as {
>> +    uint32_t id;
>> +    uint32_t nr_devices;
>> +    GTree *mappings;
>> +};
>> +
>>  static inline uint16_t smmu_get_sid(IOMMUDevice *dev)
>>  {
>>      return  ((pci_bus_num(dev->bus) & 0xff) << 8) | dev->devfn;
>> @@ -88,6 +115,19 @@ static void virtio_iommu_init_as(VirtIOIOMMU *s)
>>      }
>>  }
>>
>> +static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
>> +{
>> +    viommu_interval *inta = (viommu_interval *)a;
>> +    viommu_interval *intb = (viommu_interval *)b;
>> +
>> +    if (inta->high <= intb->low) {
>> +        return -1;
>> +    } else if (intb->high <= inta->low) {
>> +        return 1;
>> +    } else {
>> +        return 0;
>> +    }
>> +}
>>
>>  static int virtio_iommu_attach(VirtIOIOMMU *s,
>>                                 struct virtio_iommu_req_attach *req)
>> @@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>>      uint32_t asid = le32_to_cpu(req->address_space);
>>      uint32_t devid = le32_to_cpu(req->device);
>>      uint32_t reserved = le32_to_cpu(req->reserved);
>> +    viommu_as *as;
>> +    viommu_dev *dev;
>>
>>      trace_virtio_iommu_attach(asid, devid, reserved);
>>
>> -    return VIRTIO_IOMMU_S_UNSUPP;
>> +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
>> +    if (dev) {
>> +        return -1;
>> +    }
>> +
>> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
>> +    if (!as) {
>> +        as = g_malloc0(sizeof(*as));
>> +        as->id = asid;
>> +        as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
>> +                                         NULL, NULL, (GDestroyNotify)g_free);
> 
> The tree is created here, but it does not appear to be destroyed on detach.
Sure, I will fix that.

Thanks!

Eric
> 
> Thanks
> -Bharat
> 
>> +        g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
>> +        trace_virtio_iommu_new_asid(asid);
>> +    }
>> +
>> +    dev = g_malloc0(sizeof(*dev));
>> +    dev->as = as;
>> +    dev->id = devid;
>> +    as->nr_devices++;
>> +    trace_virtio_iommu_new_devid(devid);
>> +    g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
>> +
>> +    return VIRTIO_IOMMU_S_OK;
>>  }
>>
>>  static int virtio_iommu_detach(VirtIOIOMMU *s,
>> @@ -106,10 +170,13 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
>>  {
>>      uint32_t devid = le32_to_cpu(req->device);
>>      uint32_t reserved = le32_to_cpu(req->reserved);
>> +    int ret;
>>
>>      trace_virtio_iommu_detach(devid, reserved);
>>
>> -    return VIRTIO_IOMMU_S_UNSUPP;
>> +    ret = g_tree_remove(s->devices, GUINT_TO_POINTER(devid));
>> +
>> +    return ret ? VIRTIO_IOMMU_S_OK : VIRTIO_IOMMU_S_INVAL;
>>  }
>>
>>  static int virtio_iommu_map(VirtIOIOMMU *s,
>> @@ -120,10 +187,37 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
>>      uint64_t virt_addr = le64_to_cpu(req->virt_addr);
>>      uint64_t size = le64_to_cpu(req->size);
>>      uint32_t flags = le32_to_cpu(req->flags);
>> +    viommu_as *as;
>> +    viommu_interval *interval;
>> +    viommu_mapping *mapping;
>> +
>> +    interval = g_malloc0(sizeof(*interval));
>> +
>> +    interval->low = virt_addr;
>> +    interval->high = virt_addr + size - 1;
>> +
>> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
>> +    if (!as) {
>> +        return VIRTIO_IOMMU_S_INVAL;
>> +    }
>> +
>> +    mapping = g_tree_lookup(as->mappings, (gpointer)interval);
>> +    if (mapping) {
>> +        g_free(interval);
>> +        return VIRTIO_IOMMU_S_INVAL;
>> +    }
>>
>>      trace_virtio_iommu_map(asid, phys_addr, virt_addr, size, flags);
>>
>> -    return VIRTIO_IOMMU_S_UNSUPP;
>> +    mapping = g_malloc0(sizeof(*mapping));
>> +    mapping->virt_addr = virt_addr;
>> +    mapping->phys_addr = phys_addr;
>> +    mapping->size = size;
>> +    mapping->flags = flags;
>> +
>> +    g_tree_insert(as->mappings, interval, mapping);
>> +
>> +    return VIRTIO_IOMMU_S_OK;
>>  }
>>
>>  static int virtio_iommu_unmap(VirtIOIOMMU *s,
>> @@ -133,10 +227,64 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
>>      uint64_t virt_addr = le64_to_cpu(req->virt_addr);
>>      uint64_t size = le64_to_cpu(req->size);
>>      uint32_t flags = le32_to_cpu(req->flags);
>> +    viommu_mapping *mapping;
>> +    viommu_interval interval;
>> +    viommu_as *as;
>>
>>      trace_virtio_iommu_unmap(asid, virt_addr, size, flags);
>>
>> -    return VIRTIO_IOMMU_S_UNSUPP;
>> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
>> +    if (!as) {
>> +        error_report("%s: no as", __func__);
>> +        return VIRTIO_IOMMU_S_INVAL;
>> +    }
>> +    interval.low = virt_addr;
>> +    interval.high = virt_addr + size - 1;
>> +
>> +    mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
>> +
>> +    while (mapping) {
>> +        viommu_interval current;
>> +        uint64_t low  = mapping->virt_addr;
>> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
>> +
>> +        current.low = low;
>> +        current.high = high;
>> +
>> +        if (low == interval.low && size >= mapping->size) {
>> +            g_tree_remove(as->mappings, (gpointer)&current);
>> +            interval.low = high + 1;
>> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
>> +                interval.low, interval.high);
>> +        } else if (high == interval.high && size >= mapping->size) {
>> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
>> +                interval.low, interval.high);
>> +            g_tree_remove(as->mappings, (gpointer)&current);
>> +            interval.high = low - 1;
>> +        } else if (low > interval.low && high < interval.high) {
>> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
>> +            g_tree_remove(as->mappings, (gpointer)&current);
>> +        } else {
>> +            break;
>> +        }
>> +        if (interval.low >= interval.high) {
>> +            return VIRTIO_IOMMU_S_OK;
>> +        } else {
>> +            mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
>> +        }
>> +    }
>> +
>> +    if (mapping) {
>> +        error_report("****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
>> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported",
>> +                     __func__, interval.low, size,
>> +                     mapping->virt_addr, mapping->size);
>> +    } else {
>> +        error_report("****** %s: no mapping for [0x%"PRIx64",0x%"PRIx64"]",
>> +                     __func__, interval.low, interval.high);
>> +    }
>> +
>> +    return VIRTIO_IOMMU_S_INVAL;
>>  }
>>
>>  #define get_payload_size(req) (\
>> @@ -266,19 +414,46 @@ static IOMMUTLBEntry virtio_iommu_translate(MemoryRegion *mr, hwaddr addr,
>>                                              IOMMUAccessFlags flag)
>>  {
>>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
>> +    VirtIOIOMMU *s = sdev->viommu;
>>      uint32_t sid;
>> +    viommu_dev *dev;
>> +    viommu_mapping *mapping;
>> +    viommu_interval interval;
>> +
>> +    interval.low = addr;
>> +    interval.high = addr + 1;
>>
>>      IOMMUTLBEntry entry = {
>>          .target_as = &address_space_memory,
>>          .iova = addr,
>>          .translated_addr = addr,
>> -        .addr_mask = ~(hwaddr)0,
>> -        .perm = IOMMU_NONE,
>> +        .addr_mask = (1 << 12) - 1, /* TODO */
>> +        .perm = 3,
>>      };
>>
>>      sid = smmu_get_sid(sdev);
>>
>>      trace_virtio_iommu_translate(mr->name, sid, addr, flag);
>> +    qemu_mutex_lock(&s->mutex);
>> +
>> +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(sid));
>> +    if (!dev) {
>> +        /* device cannot be attached to another as */
>> +        printf("%s sid=%d is not known!!\n", __func__, sid);
>> +        goto unlock;
>> +    }
>> +
>> +    mapping = g_tree_lookup(dev->as->mappings, (gpointer)&interval);
>> +    if (!mapping) {
>> +        printf("%s no mapping for 0x%"PRIx64" for sid=%d\n", __func__,
>> +               addr, sid);
>> +        goto unlock;
>> +    }
>> +    entry.translated_addr = addr - mapping->virt_addr + mapping->phys_addr,
>> +    trace_virtio_iommu_translate_result(addr, entry.translated_addr, sid);
>> +
>> +unlock:
>> +    qemu_mutex_unlock(&s->mutex);
>>      return entry;
>>  }
>>
>> @@ -341,6 +516,12 @@ static inline guint as_uint64_hash(gconstpointer v)
>>      return (guint)*(const uint64_t *)v;
>>  }
>>
>> +static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
>> +{
>> +    uint ua = GPOINTER_TO_UINT(a);
>> +    uint ub = GPOINTER_TO_UINT(b);
>> +    return (ua > ub) - (ua < ub);
>> +}
>>
>>  static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>>  {
>> @@ -362,12 +543,21 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>>                                              as_uint64_equal,
>>                                              g_free, g_free);
>>
>> +    s->address_spaces = g_tree_new_full((GCompareDataFunc)int_cmp,
>> +                                         NULL, NULL, (GDestroyNotify)g_free);
>> +    s->devices = g_tree_new_full((GCompareDataFunc)int_cmp,
>> +                                         NULL, NULL, (GDestroyNotify)g_free);
>> +
>>      virtio_iommu_init_as(s);
>>  }
>>
>>  static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
>>  {
>>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> +    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
>> +
>> +    g_tree_destroy(s->address_spaces);
>> +    g_tree_destroy(s->devices);
>>
>>      virtio_cleanup(vdev);
>>  }
>> --
>> 2.5.5
> 
> 


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-19 10:15           ` Jean-Philippe Brucker
  2017-06-26  8:22             ` Auger Eric
@ 2017-07-05  7:14             ` Tian, Kevin
  2017-07-05 12:44               ` Jean-Philippe Brucker
  2017-07-05  7:15             ` Bharat Bhushan
  2 siblings, 1 reply; 73+ messages in thread
From: Tian, Kevin @ 2017-07-05  7:14 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Bharat Bhushan, Auger Eric,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: wei, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Monday, June 19, 2017 6:15 PM
> 
> On 19/06/17 08:54, Bharat Bhushan wrote:
> > Hi Eric,
> >
> > I started adding replay in virtio-iommu and came across how MSI
> > interrupts will work with VFIO.
> > I understand that on Intel this works differently, but vsmmu will have
> > the same requirement.
> > kvm-msi-irq-route entries are added using the MSI address to be
> > translated by the viommu, not the final translated address.
> > Meanwhile, the irqfd framework currently does not know about emulated
> > IOMMUs (virtio-iommu, vsmmuv3/vintel-iommu).
> > So in my view we have the following options:
> > - Programming with the translated address when setting up kvm-msi-irq-route
> > - Routing the interrupts via QEMU, which is bad for performance
> > - vhost-virtio-iommu may solve the problem in the long term
> >
> > Is there any other better option I am missing?
> 
> Since we're on the topic of MSIs... I'm currently trying to figure out how
> we'll handle MSIs in the nested translation mode, where the guest manages
> S1 page tables and the host doesn't know about GVA->GPA translation.
> 
> I'm also wondering about the benefits of having SW-mapped MSIs in the
> guest. It seems unavoidable for vSMMU since that's what a physical system
> would do. But in a paravirtualized solution there doesn't seem to be any
> compelling reason for having the guest map MSI doorbells. These addresses
> are never accessed directly, they are only used for setting up IRQ routing
> (at least on kvmtool). So here's what I'd like to have. Note that I
> haven't investigated the feasibility in Qemu yet, I don't know how it
> deals with MSIs.
> 
> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
> mappings when handling writes to PCI MSI-X tables.
> 

What do you mean by "fixed MSI doorbell"? The PCI MSI-X table is part of
a PCI MMIO BAR. Accessing it is just a memory virtualization issue (e.g.
trapped by KVM and then emulated in QEMU) on x86. It's not an IOMMU
problem. I guess you may mean the same thing, but I want to double-check
here given the terminology confusion. Or do you mean the interrupt
triggered by the IOMMU itself?

Thanks
Kevin


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-19 10:15           ` Jean-Philippe Brucker
  2017-06-26  8:22             ` Auger Eric
  2017-07-05  7:14             ` Tian, Kevin
@ 2017-07-05  7:15             ` Bharat Bhushan
  2 siblings, 0 replies; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-05  7:15 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Monday, June 19, 2017 3:45 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 19/06/17 08:54, Bharat Bhushan wrote:
> > Hi Eric,
> >
> > I started adding replay in virtio-iommu and came across how MSI
> > interrupts will work with VFIO.
> > I understand that on Intel this works differently, but vsmmu will have
> > the same requirement.
> > kvm-msi-irq-route entries are added using the MSI address to be
> > translated by the viommu, not the final translated address.
> > Meanwhile, the irqfd framework currently does not know about emulated
> > IOMMUs (virtio-iommu, vsmmuv3/vintel-iommu).
> > So in my view we have the following options:
> > - Programming with the translated address when setting up kvm-msi-irq-route
> > - Routing the interrupts via QEMU, which is bad for performance
> > - vhost-virtio-iommu may solve the problem in the long term
> >
> > Is there any other better option I am missing?
> 
> Since we're on the topic of MSIs... I'm currently trying to figure out how we'll
> handle MSIs in the nested translation mode, where the guest manages
> S1 page tables and the host doesn't know about GVA->GPA translation.
> 
> I'm also wondering about the benefits of having SW-mapped MSIs in the
> guest. It seems unavoidable for vSMMU since that's what a physical system
> would do. But in a paravirtualized solution there doesn't seem to be any
> compelling reason for having the guest map MSI doorbells. These addresses
> are never accessed directly, they are only used for setting up IRQ routing (at
> least on kvmtool). So here's what I'd like to have. Note that I haven't
> investigated the feasibility in Qemu yet, I don't know how it deals with MSIs.
> 
> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
> mappings when handling writes to PCI MSI-X tables.

Sorry for the late reply. Does this mean that we can use IOMMU_RESV_MSI for the virtio-iommu driver? So this will not create a mapping in the IOMMU?
I tried this with PCI pass-through using QEMU (VFIO integrated with virtio-iommu), and MSI interrupts work without any change.

Thanks
-Bharat

> 
> (2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs
> via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like to
> use the (currently unused) TTB1 tables in that case. In addition, using
> TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs and
> we don't want to map them in user address space.
> 
> This means that the host needs to use different doorbell addresses in nested
> mode, since it would be unable to map at S1 the same IOVA as S2
> (TTB1 manages negative addresses - 0xffff............, which are not
> representable as GPAs.) It also requires to use 32-bit page tables for
> endpoints that are not capable of using 64-bit MSI addresses.
> 
> Now (2) is entirely handled in the host kernel, so it's more a Linux question.
> But does (1) seem acceptable for virtio-iommu in Qemu?
> 
> Thanks,
> Jean

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-26 16:13               ` Jean-Philippe Brucker
  2017-06-27  6:38                 ` Auger Eric
@ 2017-07-05  7:25                 ` Tian, Kevin
  2017-07-05 12:44                   ` Jean-Philippe Brucker
  1 sibling, 1 reply; 73+ messages in thread
From: Tian, Kevin @ 2017-07-05  7:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric, Bharat Bhushan,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: wei, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Tuesday, June 27, 2017 12:13 AM
> 
> On 26/06/17 09:22, Auger Eric wrote:
> > Hi Jean-Philippe,
> >
> > On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
> >> On 19/06/17 08:54, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>> I started added replay in virtio-iommu and came across how MSI
> interrupts with work with VFIO.
> >>> I understand that on intel this works differently but vsmmu will have
> same requirement.
> >>> kvm-msi-irq-route are added using the msi-address to be translated by
> viommu and not the final translated address.
> >>> While currently the irqfd framework does not know about emulated
> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> >>> So in my view we have following options:
> >>> - Programming with translated address when setting up kvm-msi-irq-
> route
> >>> - Route the interrupts via QEMU, which is bad from performance
> >>> - vhost-virtio-iommu may solve the problem in long term
> >>>
> >>> Is there any other better option I am missing?
> >>
> >> Since we're on the topic of MSIs... I'm currently trying to figure out how
> >> we'll handle MSIs in the nested translation mode, where the guest
> manages
> >> S1 page tables and the host doesn't know about GVA->GPA translation.
> >
> > I have a question about the "nested translation mode" terminology. Do
> > you mean in that case you use stage 1 + stage 2 of the physical IOMMU
> > (which the ARM spec normally advises or was meant for) or do you mean
> > stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At
> the
> > moment my understanding is for VFIO integration the pIOMMU uses a
> single
> > stage combining both the stage 1 and stage2 mappings but the host is not
> > aware of those 2 stages.
> 
> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the
> pIOMMU.
> 

I'm curious whether you are describing the current SMMU status or a
general vIOMMU status that also applies to other vendors...

The usage you described is about SVM, and SVM requires PASID.
At least on Intel VT-d, PASID is tied to stage-1. Only DMA without PASID,
or nested translation from stage-1, goes through stage-2. Unless the
ARM SMMU has a completely different implementation, I'm not sure
how SVM can be virtualized with stage-1 translation disabled. There
are multiple stage-1 page tables but only one stage-2 page table per
device. Could merging actually work here?

The only case where merging happens today is guest stage-2 usage,
the so-called GIOVA usage. The guest programs GIOVA->GPA into the
vIOMMU stage-2. The vIOMMU then invokes the VFIO map/unmap APIs to
translate/merge this into GIOVA->HPA in the pIOMMU stage-2. Maybe this
is what you actually meant?
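
The merging described above can be sketched as a composition of two
page-granular tables. This is only a toy model, not QEMU or VFIO code:
the function names, the 4 KiB page size and the example addresses are
all assumptions made for illustration.

```python
PAGE = 0x1000  # assumed 4 KiB page size


def merge_mappings(giova_to_gpa, gpa_to_hpa):
    """Compose guest GIOVA->GPA pages with host GPA->HPA pages.

    Both arguments are dicts keyed by page-aligned addresses. The result
    is the GIOVA->HPA table that would be installed in the physical
    IOMMU's single stage. Guest pages whose GPA has no host backing are
    skipped.
    """
    merged = {}
    for giova, gpa in giova_to_gpa.items():
        hpa = gpa_to_hpa.get(gpa)
        if hpa is not None:
            merged[giova] = hpa
    return merged


def translate(table, addr):
    """Translate addr through a page-granular table, or return None."""
    base = addr & ~(PAGE - 1)
    target = table.get(base)
    if target is None:
        return None
    return target | (addr & (PAGE - 1))


# Example values are made up.
guest = {0x10000: 0x40000000, 0x11000: 0x40005000}
host = {0x40000000: 0x880000000, 0x40005000: 0x912345000}
merged = merge_mappings(guest, host)
print(hex(translate(merged, 0x11010)))
```

Because there is a single merged table per device, this model has no
room for several stage-1 address spaces, which is the limitation raised
above for the PASID case.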

Thanks
Kevin

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-06-26  7:54           ` Auger Eric
@ 2017-07-05  8:23             ` Bharat Bhushan
  2017-07-05  8:44               ` Auger Eric
  0 siblings, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-05  8:23 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Monday, June 26, 2017 1:25 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 19/06/2017 09:54, Bharat Bhushan wrote:
> > Hi Eric,
> >
> > I started added replay in virtio-iommu and came across how MSI interrupts
> with work with VFIO.
> > I understand that on intel this works differently but vsmmu will have same
> requirement.
> > kvm-msi-irq-route are added using the msi-address to be translated by
> viommu and not the final translated address.
> > While currently the irqfd framework does not know about emulated
> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> > So in my view we have following options:
> > - Programming with translated address when setting up
> > kvm-msi-irq-route
> > - Route the interrupts via QEMU, which is bad from performance
> > - vhost-virtio-iommu may solve the problem in long term
> 
> Sorry for the delay. With regard to the vsmmuv3/vfio integration I think we
> need to use the guest physical address otherwise the MSI address will not be
> recognized as an MSI doorbell.
> 
> Also the fact on ARM we map the MSI doorbell causes an assert in
> vfio_get_vaddr() as the vITS doorbell is not a RAM region. We will need to
> handle this specifically.

Also, when setting up the MSI route with kvm_irqchip_add_msi_route(), we needed to provide the translated address.
As I understand it, this is required because the kernel does not go through the vIOMMU translation when generating the interrupt, no?
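
A minimal sketch of that idea, assuming a toy vIOMMU model: the MSI
address written by the guest is an IOVA, so it is translated once in
user space and the resulting guest-physical doorbell address is what
would be handed to the route setup. ToyVIOMMU, msi_route_address() and
the addresses below are hypothetical and do not correspond to QEMU
functions.

```python
class ToyVIOMMU:
    """Hypothetical stand-in for an emulated IOMMU's mapping table."""

    def __init__(self):
        self._maps = []  # list of (iova_start, size, gpa_start)

    def map(self, iova, size, gpa):
        self._maps.append((iova, size, gpa))

    def translate(self, iova):
        """Return the GPA for iova, or None if the guest never mapped it."""
        for start, size, gpa in self._maps:
            if start <= iova < start + size:
                return gpa + (iova - start)
        return None


def msi_route_address(viommu, msi_iova):
    """Address to program into the in-kernel MSI route (assumed flow)."""
    gpa = viommu.translate(msi_iova)
    if gpa is None:
        raise LookupError(f"MSI IOVA {msi_iova:#x} is not mapped")
    return gpa


viommu = ToyVIOMMU()
# The guest maps one page covering the doorbell (made-up addresses).
viommu.map(iova=0x8000000, size=0x1000, gpa=0x8090000)
print(hex(msi_route_address(viommu, 0x8000040)))
```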

Thanks
-Bharat

> 
> Besides I have not looked specifically at the virtio-iommu/vfio integration
> yet.
> 
> Thanks
> 
> Eric
> >
> > Is there any other better option I am missing?
> >
> > Thanks
> > -Bharat
> >
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.auger@redhat.com]
> >> Sent: Friday, June 09, 2017 5:24 PM
> >> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> >> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> >> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
> >> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> >> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> >> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> >> robin.murphy@arm.com; christoffer.dall@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> Hi Bharat,
> >>
> >> On 09/06/2017 13:30, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>>> -----Original Message-----
> >>>> From: Auger Eric [mailto:eric.auger@redhat.com]
> >>>> Sent: Friday, June 09, 2017 12:14 PM
> >>>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> >>>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> >>>> alex.williamson@redhat.com; mst@redhat.com; qemu-
> arm@nongnu.org;
> >>>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> >>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
> >> kevin.tian@intel.com;
> >>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
> >>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com
> >>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>
> >>>> Hi Bharat,
> >>>>
> >>>> On 09/06/2017 08:16, Bharat Bhushan wrote:
> >>>>> Hi Eric,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Eric Auger [mailto:eric.auger@redhat.com]
> >>>>>> Sent: Wednesday, June 07, 2017 9:31 PM
> >>>>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> >>>>>> peter.maydell@linaro.org; alex.williamson@redhat.com;
> >>>> mst@redhat.com;
> >>>>>> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
> >>>>>> philippe.brucker@arm.com
> >>>>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
> >>>> kevin.tian@intel.com;
> >>>>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
> >>>>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com; Bharat
> >>>> Bhushan
> >>>>>> <bharat.bhushan@nxp.com>
> >>>>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>>>
> >>>>>> This series implements the virtio-iommu device. This is a proof
> >>>>>> of concept based on the virtio-iommu specification written by
> >>>>>> Jean-Philippe
> >>>> Brucker [1].
> >>>>>> This was tested with a guest using the virtio-iommu driver [2]
> >>>>>> and exposed with a virtio-net-pci using dma ops.
> >>>>>>
> >>>>>> The device gets instantiated using the "-device virtio-iommu-device"
> >>>>>> option. It currently works with ARM virt machine only as the
> >>>>>> machine must handle the dt binding between the virtio-mmio
> "iommu"
> >>>>>> node and the PCI host bridge node. ACPI booting is not yet
> supported.
> >>>>>>
> >>>>>> This should allow to start some benchmarking activities against
> >>>>>> pure emulated IOMMU (especially ARM SMMU).
> >>>>>
> >>>>> I am testing this on ARM64 and see below continuous error prints:
> >>>>>
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>
> >>>>>
> >>>>> Also in guest I do not see device-tree node with virtio-iommu.
> >>>> do you mean the virtio-mmio with #iommu-cells property?
> >>>>
> >>>> This one is created statically by virt machine. I would be
> >>>> surprised if it were not there. Are you using the virt = virt2.10 machine.
> >>>> Machines before do not support its instantiation.
> >>>>
> >>>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio()
> >>>> at the moment when this node is created. Also you can add a printf
> >>>> in
> >>>> bind_virtio_iommu_device() to make sure the binding with the PCI
> >>>> host bridge is added on machine init done.
> >>>>
> >>>> Also worth to check, CONFIG_VIRTIO_IOMMU=y on guest side.
> >>>
> >>> It works on my side.
> >> Great.
> >>
> >>  The driver config was disabled and also I was using guest kernel
> >> which was not have deferred-probing.
> >> Yes I did not mention in my cover letter the guest I have been using
> >> is based on Jean-Philippe's branch, featuring deferred IOMMU probing.
> >> I I have not tried yet with an upstream guest.
> >>  Now after fixing it works on my side
> >>> I placed some prints to see dma-map are mapping regions in
> >>> virtio-iommu,
> >> it uses emulated iommu.
> >>>
> >>> I will continue to add VFIO support now on this and more testing !!
> >>
> >> OK. I will do the VFIO integration first on the vsmmuv3 device as I
> >> already prepared the VFIO replay and hopefully we will sync ;-)
> >>
> >> Thanks
> >>
> >> Eric
> >>>
> >>> Thanks
> >>> -Bharat
> >>>
> >>>>
> >>>> Thanks
> >>>>
> >>>> Eric
> >>>>
> >>>>> I am using qemu-tree you mentioned below and iommu-driver
> patches
> >>>> published by Jean-P.
> >>>>> Qemu command line have additional ""-device virtio-iommu-device".
> >>>>> What
> >>>> I am missing ?
> >>>>
> >>>>
> >>>>>
> >>>>> Thanks
> >>>>> -Bharat
> >>>>>
> >>>>>>
> >>>>>> Best Regards
> >>>>>>
> >>>>>> Eric
> >>>>>>
> >>>>>> This series can be found at:
> >>>>>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
> >>>>>>
> >>>>>> References:
> >>>>>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC
> >>>>>> PATCH linux]
> >>>>>> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add
> >>>>>> virtio- iommu
> >>>>>>
> >>>>>> History:
> >>>>>> v1 -> v2:
> >>>>>> - fix redifinition of viommu_as typedef
> >>>>>>
> >>>>>> Eric Auger (8):
> >>>>>>   update-linux-headers: import virtio_iommu.h
> >>>>>>   linux-headers: Update for virtio-iommu
> >>>>>>   virtio_iommu: add skeleton
> >>>>>>   virtio-iommu: Decode the command payload
> >>>>>>   virtio_iommu: Add the iommu regions
> >>>>>>   virtio-iommu: Implement the translation and commands
> >>>>>>   hw/arm/virt: Add 2.10 machine type
> >>>>>>   hw/arm/virt: Add virtio-iommu the virt board
> >>>>>>
> >>>>>>  hw/arm/virt.c                                 | 116 ++++-
> >>>>>>  hw/virtio/Makefile.objs                       |   1 +
> >>>>>>  hw/virtio/trace-events                        |  14 +
> >>>>>>  hw/virtio/virtio-iommu.c                      | 623
> >>>> ++++++++++++++++++++++++++
> >>>>>>  include/hw/arm/virt.h                         |   5 +
> >>>>>>  include/hw/virtio/virtio-iommu.h              |  60 +++
> >>>>>>  include/standard-headers/linux/virtio_ids.h   |   1 +
> >>>>>>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
> >>>>>>  linux-headers/linux/virtio_iommu.h            |   1 +
> >>>>>>  scripts/update-linux-headers.sh               |   3 +
> >>>>>>  10 files changed, 957 insertions(+), 9 deletions(-)  create mode
> >>>>>> 100644 hw/virtio/virtio-iommu.c  create mode 100644
> >>>>>> include/hw/virtio/virtio- iommu.h  create mode 100644
> >>>>>> include/standard- headers/linux/virtio_iommu.h  create mode
> >>>>>> 100644 linux-headers/linux/virtio_iommu.h
> >>>>>>
> >>>>>> --
> >>>>>> 2.5.5
> >>>>>
> >>>
> >

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-05  8:23             ` Bharat Bhushan
@ 2017-07-05  8:44               ` Auger Eric
  2017-07-05  8:49                 ` Bharat Bhushan
  0 siblings, 1 reply; 73+ messages in thread
From: Auger Eric @ 2017-07-05  8:44 UTC (permalink / raw)
  To: Bharat Bhushan, eric.auger.pro, peter.maydell, alex.williamson,
	mst, qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Bharat,

On 05/07/2017 10:23, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: Monday, June 26, 2017 1:25 PM
>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
>> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
>> robin.murphy@arm.com; christoffer.dall@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 19/06/2017 09:54, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>> I started added replay in virtio-iommu and came across how MSI interrupts
>> with work with VFIO.
>>> I understand that on intel this works differently but vsmmu will have same
>> requirement.
>>> kvm-msi-irq-route are added using the msi-address to be translated by
>> viommu and not the final translated address.
>>> While currently the irqfd framework does not know about emulated
>> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>>> So in my view we have following options:
>>> - Programming with translated address when setting up
>>> kvm-msi-irq-route
>>> - Route the interrupts via QEMU, which is bad from performance
>>> - vhost-virtio-iommu may solve the problem in long term
>>
>> Sorry for the delay. With regard to the vsmmuv3/vfio integration I think we
>> need to use the guest physical address otherwise the MSI address will not be
>> recognized as an MSI doorbell.
>>
>> Also the fact on ARM we map the MSI doorbell causes an assert in
>> vfio_get_vaddr() as the vITS doorbell is not a RAM region. We will need to
>> handle this specifically.
> 
> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to provide the translated address.
> According to my understanding this is required because kernel does no go through viommu translation when generating interrupt, no? 

Yes, this is needed when KVM MSI routes are set up, i.e. along with the
GICv3 ITS. With GICv2M, QEMU direct GSI mapping is used and this is not needed.

So I do not understand your previous sentence saying "MSI interrupts
works without any change".

Thanks

Eric

> 
> Thanks
> -Bharat
> 
>>
>> Besides I have not looked specifically at the virtio-iommu/vfio integration
>> yet.
>>
>> Thanks
>>
>> Eric
>>>
>>> Is there any other better option I am missing?
>>>
>>> Thanks
>>> -Bharat
>>>
>>>> -----Original Message-----
>>>> From: Auger Eric [mailto:eric.auger@redhat.com]
>>>> Sent: Friday, June 09, 2017 5:24 PM
>>>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
>>>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
>>>> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
>>>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
>>>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
>>>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
>>>> robin.murphy@arm.com; christoffer.dall@linaro.org
>>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
>>>> Hi Bharat,
>>>>
>>>> On 09/06/2017 13:30, Bharat Bhushan wrote:
>>>>> Hi Eric,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com]
>>>>>> Sent: Friday, June 09, 2017 12:14 PM
>>>>>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
>>>>>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
>>>>>> alex.williamson@redhat.com; mst@redhat.com; qemu-
>> arm@nongnu.org;
>>>>>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
>>>>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
>>>> kevin.tian@intel.com;
>>>>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
>>>>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com
>>>>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
>>>>>>
>>>>>> Hi Bharat,
>>>>>>
>>>>>> On 09/06/2017 08:16, Bharat Bhushan wrote:
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Eric Auger [mailto:eric.auger@redhat.com]
>>>>>>>> Sent: Wednesday, June 07, 2017 9:31 PM
>>>>>>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>>>>>>>> peter.maydell@linaro.org; alex.williamson@redhat.com;
>>>>>> mst@redhat.com;
>>>>>>>> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
>>>>>>>> philippe.brucker@arm.com
>>>>>>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
>>>>>> kevin.tian@intel.com;
>>>>>>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
>>>>>>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com; Bharat
>>>>>> Bhushan
>>>>>>>> <bharat.bhushan@nxp.com>
>>>>>>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
>>>>>>>>
>>>>>>>> This series implements the virtio-iommu device. This is a proof
>>>>>>>> of concept based on the virtio-iommu specification written by
>>>>>>>> Jean-Philippe
>>>>>> Brucker [1].
>>>>>>>> This was tested with a guest using the virtio-iommu driver [2]
>>>>>>>> and exposed with a virtio-net-pci using dma ops.
>>>>>>>>
>>>>>>>> The device gets instantiated using the "-device virtio-iommu-device"
>>>>>>>> option. It currently works with ARM virt machine only as the
>>>>>>>> machine must handle the dt binding between the virtio-mmio
>> "iommu"
>>>>>>>> node and the PCI host bridge node. ACPI booting is not yet
>> supported.
>>>>>>>>
>>>>>>>> This should allow to start some benchmarking activities against
>>>>>>>> pure emulated IOMMU (especially ARM SMMU).
>>>>>>>
>>>>>>> I am testing this on ARM64 and see below continuous error prints:
>>>>>>>
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>> 	virtio_iommu_translate sid=8 is not known!!
>>>>>>>
>>>>>>>
>>>>>>> Also in guest I do not see device-tree node with virtio-iommu.
>>>>>> do you mean the virtio-mmio with #iommu-cells property?
>>>>>>
>>>>>> This one is created statically by virt machine. I would be
>>>>>> surprised if it were not there. Are you using the virt = virt2.10 machine.
>>>>>> Machines before do not support its instantiation.
>>>>>>
>>>>>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio()
>>>>>> at the moment when this node is created. Also you can add a printf
>>>>>> in
>>>>>> bind_virtio_iommu_device() to make sure the binding with the PCI
>>>>>> host bridge is added on machine init done.
>>>>>>
>>>>>> Also worth to check, CONFIG_VIRTIO_IOMMU=y on guest side.
>>>>>
>>>>> It works on my side.
>>>> Great.
>>>>
>>>>  The driver config was disabled and also I was using guest kernel
>>>> which was not have deferred-probing.
>>>> Yes I did not mention in my cover letter the guest I have been using
>>>> is based on Jean-Philippe's branch, featuring deferred IOMMU probing.
>>>> I I have not tried yet with an upstream guest.
>>>>  Now after fixing it works on my side
>>>>> I placed some prints to see dma-map are mapping regions in
>>>>> virtio-iommu,
>>>> it uses emulated iommu.
>>>>>
>>>>> I will continue to add VFIO support now on this and more testing !!
>>>>
>>>> OK. I will do the VFIO integration first on the vsmmuv3 device as I
>>>> already prepared the VFIO replay and hopefully we will sync ;-)
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>>
>>>>> Thanks
>>>>> -Bharat
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>>> I am using qemu-tree you mentioned below and iommu-driver
>> patches
>>>>>> published by Jean-P.
>>>>>>> Qemu command line have additional ""-device virtio-iommu-device".
>>>>>>> What
>>>>>> I am missing ?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Bharat
>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Eric
>>>>>>>>
>>>>>>>> This series can be found at:
>>>>>>>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
>>>>>>>>
>>>>>>>> References:
>>>>>>>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC
>>>>>>>> PATCH linux]
>>>>>>>> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add
>>>>>>>> virtio- iommu
>>>>>>>>
>>>>>>>> History:
>>>>>>>> v1 -> v2:
>>>>>>>> - fix redifinition of viommu_as typedef
>>>>>>>>
>>>>>>>> Eric Auger (8):
>>>>>>>>   update-linux-headers: import virtio_iommu.h
>>>>>>>>   linux-headers: Update for virtio-iommu
>>>>>>>>   virtio_iommu: add skeleton
>>>>>>>>   virtio-iommu: Decode the command payload
>>>>>>>>   virtio_iommu: Add the iommu regions
>>>>>>>>   virtio-iommu: Implement the translation and commands
>>>>>>>>   hw/arm/virt: Add 2.10 machine type
>>>>>>>>   hw/arm/virt: Add virtio-iommu the virt board
>>>>>>>>
>>>>>>>>  hw/arm/virt.c                                 | 116 ++++-
>>>>>>>>  hw/virtio/Makefile.objs                       |   1 +
>>>>>>>>  hw/virtio/trace-events                        |  14 +
>>>>>>>>  hw/virtio/virtio-iommu.c                      | 623
>>>>>> ++++++++++++++++++++++++++
>>>>>>>>  include/hw/arm/virt.h                         |   5 +
>>>>>>>>  include/hw/virtio/virtio-iommu.h              |  60 +++
>>>>>>>>  include/standard-headers/linux/virtio_ids.h   |   1 +
>>>>>>>>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
>>>>>>>>  linux-headers/linux/virtio_iommu.h            |   1 +
>>>>>>>>  scripts/update-linux-headers.sh               |   3 +
>>>>>>>>  10 files changed, 957 insertions(+), 9 deletions(-)  create mode
>>>>>>>> 100644 hw/virtio/virtio-iommu.c  create mode 100644
>>>>>>>> include/hw/virtio/virtio- iommu.h  create mode 100644
>>>>>>>> include/standard- headers/linux/virtio_iommu.h  create mode
>>>>>>>> 100644 linux-headers/linux/virtio_iommu.h
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.5.5
>>>>>>>
>>>>>
>>>

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-05  8:44               ` Auger Eric
@ 2017-07-05  8:49                 ` Bharat Bhushan
  2017-07-06 10:02                   ` Jean-Philippe Brucker
  0 siblings, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-05  8:49 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Wednesday, July 05, 2017 2:14 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 05/07/2017 10:23, Bharat Bhushan wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.auger@redhat.com]
> >> Sent: Monday, June 26, 2017 1:25 PM
> >> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> >> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> >> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
> >> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> >> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> >> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> >> robin.murphy@arm.com; christoffer.dall@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> Hi Bharat,
> >>
> >> On 19/06/2017 09:54, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>> I started added replay in virtio-iommu and came across how MSI
> >>> interrupts
> >> with work with VFIO.
> >>> I understand that on intel this works differently but vsmmu will
> >>> have same
> >> requirement.
> >>> kvm-msi-irq-route are added using the msi-address to be translated
> >>> by
> >> viommu and not the final translated address.
> >>> While currently the irqfd framework does not know about emulated
> >> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> >>> So in my view we have following options:
> >>> - Programming with translated address when setting up
> >>> kvm-msi-irq-route
> >>> - Route the interrupts via QEMU, which is bad from performance
> >>> - vhost-virtio-iommu may solve the problem in long term
> >>
> >> Sorry for the delay. With regard to the vsmmuv3/vfio integration I
> >> think we need to use the guest physical address otherwise the MSI
> >> address will not be recognized as an MSI doorbell.
> >>
> >> Also the fact on ARM we map the MSI doorbell causes an assert in
> >> vfio_get_vaddr() as the vITS doorbell is not a RAM region. We will
> >> need to handle this specifically.
> >
> > Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
> provide the translated address.
> > According to my understanding this is required because kernel does no go
> through viommu translation when generating interrupt, no?
> 
> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS.
> With GICv2M, qemu direct gsi mapping is used and this is not needed.
> 
> So I do not understand your previous sentence saying "MSI interrupts works
> without any change".

I have almost completed the VFIO integration with virtio-iommu and am now testing the changes by assigning an e1000 device to a VM. For this I changed the virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi, and this does not need changes in vfio_get_vaddr() and kvm_irqchip_add_msi_route().
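
One way to picture why no changes are needed, as a rough sketch rather
than the actual driver or QEMU behaviour: in Linux, IOMMU_RESV_MSI
denotes a hardware MSI region that is not translated, while
IOMMU_RESV_SW_MSI denotes a software-managed MSI translation window.
In the toy model below an address inside an "msi" reserved region
bypasses the mapping table, so the doorbell address the device uses is
already the guest-physical one. ReservedRegion,
translate_with_reserved() and all addresses are illustrative
assumptions.

```python
from dataclasses import dataclass

PAGE_MASK = 0xFFF  # assumed 4 KiB pages


@dataclass(frozen=True)
class ReservedRegion:
    start: int
    length: int
    kind: str  # "msi" (untranslated) or "sw-msi" (mapped by software)

    def contains(self, addr):
        return self.start <= addr < self.start + self.length


def translate_with_reserved(iova, mappings, reserved):
    """Return the GPA for iova, or None on a translation fault."""
    for region in reserved:
        if region.kind == "msi" and region.contains(iova):
            return iova  # untranslated: the IOVA is the doorbell GPA
    page = mappings.get(iova & ~PAGE_MASK)
    if page is None:
        return None
    return page | (iova & PAGE_MASK)


# Made-up doorbell page and one ordinary DMA mapping.
reserved = [ReservedRegion(start=0x8090000, length=0x1000, kind="msi")]
mappings = {0x10000: 0x40000000}

print(hex(translate_with_reserved(0x8090040, mappings, reserved)))
print(hex(translate_with_reserved(0x10008, mappings, reserved)))
```

With a "sw-msi" region instead, the doorbell lookup falls through to the
mapping table, which is where the extra handling discussed earlier in
the thread would come in.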

Thanks
-Bharat

> 
> Thanks
> 
> Eric
> 
> >
> > Thanks
> > -Bharat
> >
> >>
> >> Besides I have not looked specifically at the virtio-iommu/vfio
> >> integration yet.
> >>
> >> Thanks
> >>
> >> Eric
> >>>
> >>> Is there any other better option I am missing?
> >>>
> >>> Thanks
> >>> -Bharat
> >>>
> >>>> -----Original Message-----
> >>>> From: Auger Eric [mailto:eric.auger@redhat.com]
> >>>> Sent: Friday, June 09, 2017 5:24 PM
> >>>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> >>>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> >>>> alex.williamson@redhat.com; mst@redhat.com; qemu-
> arm@nongnu.org;
> >>>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> >>>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> >>>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> >>>> robin.murphy@arm.com; christoffer.dall@linaro.org
> >>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>
> >>>> Hi Bharat,
> >>>>
> >>>> On 09/06/2017 13:30, Bharat Bhushan wrote:
> >>>>> Hi Eric,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Auger Eric [mailto:eric.auger@redhat.com]
> >>>>>> Sent: Friday, June 09, 2017 12:14 PM
> >>>>>> To: Bharat Bhushan <bharat.bhushan@nxp.com>;
> >>>>>> eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> >>>>>> alex.williamson@redhat.com; mst@redhat.com; qemu-
> >> arm@nongnu.org;
> >>>>>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com
> >>>>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
> >>>> kevin.tian@intel.com;
> >>>>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
> >>>>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com
> >>>>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>>>
> >>>>>> Hi Bharat,
> >>>>>>
> >>>>>> On 09/06/2017 08:16, Bharat Bhushan wrote:
> >>>>>>> Hi Eric,
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Eric Auger [mailto:eric.auger@redhat.com]
> >>>>>>>> Sent: Wednesday, June 07, 2017 9:31 PM
> >>>>>>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> >>>>>>>> peter.maydell@linaro.org; alex.williamson@redhat.com;
> >>>>>> mst@redhat.com;
> >>>>>>>> qemu-arm@nongnu.org; qemu-devel@nongnu.org; jean-
> >>>>>>>> philippe.brucker@arm.com
> >>>>>>>> Cc: will.deacon@arm.com; robin.murphy@arm.com;
> >>>>>> kevin.tian@intel.com;
> >>>>>>>> marc.zyngier@arm.com; christoffer.dall@linaro.org;
> >>>>>>>> drjones@redhat.com; wei@redhat.com; tn@semihalf.com;
> Bharat
> >>>>>> Bhushan
> >>>>>>>> <bharat.bhushan@nxp.com>
> >>>>>>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>>>>>
> >>>>>>>> This series implements the virtio-iommu device. This is a proof
> >>>>>>>> of concept based on the virtio-iommu specification written by
> >>>>>>>> Jean-Philippe
> >>>>>> Brucker [1].
> >>>>>>>> This was tested with a guest using the virtio-iommu driver [2]
> >>>>>>>> and exposed with a virtio-net-pci using dma ops.
> >>>>>>>>
> >>>>>>>> The device gets instantiated using the "-device virtio-iommu-
> device"
> >>>>>>>> option. It currently works with ARM virt machine only as the
> >>>>>>>> machine must handle the dt binding between the virtio-mmio
> >> "iommu"
> >>>>>>>> node and the PCI host bridge node. ACPI booting is not yet
> >> supported.
> >>>>>>>>
> >>>>>>>> This should allow to start some benchmarking activities against
> >>>>>>>> pure emulated IOMMU (especially ARM SMMU).
> >>>>>>>
> >>>>>>> I am testing this on ARM64 and see below continuous error prints:
> >>>>>>>
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>> 	virtio_iommu_translate sid=8 is not known!!
> >>>>>>>
> >>>>>>>
> >>>>>>> Also in guest I do not see device-tree node with virtio-iommu.
> >>>>>> do you mean the virtio-mmio with #iommu-cells property?
> >>>>>>
> >>>>>> This one is created statically by virt machine. I would be
> >>>>>> surprised if it were not there. Are you using the virt = virt2.10
> machine.
> >>>>>> Machines before do not support its instantiation.
> >>>>>>
> >>>>>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio()
> >>>>>> at the moment when this node is created. Also you can add a
> >>>>>> printf in
> >>>>>> bind_virtio_iommu_device() to make sure the binding with the PCI
> >>>>>> host bridge is added on machine init done.
> >>>>>>
> >>>>>> Also worth to check, CONFIG_VIRTIO_IOMMU=y on guest side.
> >>>>>
> >>>>> It works on my side.
> >>>> Great.
> >>>>
> >>>>  The driver config was disabled and also I was using guest kernel
> >>>> which was not have deferred-probing.
> >>>> Yes I did not mention in my cover letter the guest I have been
> >>>> using is based on Jean-Philippe's branch, featuring deferred IOMMU
> probing.
> >>>> I I have not tried yet with an upstream guest.
> >>>>  Now after fixing it works on my side
> >>>>> I placed some prints to see dma-map are mapping regions in
> >>>>> virtio-iommu,
> >>>> it uses emulated iommu.
> >>>>>
> >>>>> I will continue to add VFIO support now on this and more testing !!
> >>>>
> >>>> OK. I will do the VFIO integration first on the vsmmuv3 device as I
> >>>> already prepared the VFIO replay and hopefully we will sync ;-)
> >>>>
> >>>> Thanks
> >>>>
> >>>> Eric
> >>>>>
> >>>>> Thanks
> >>>>> -Bharat
> >>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> Eric
> >>>>>>
> >>>>>>> I am using qemu-tree you mentioned below and iommu-driver
> >> patches
> >>>>>> published by Jean-P.
> >>>>>>> Qemu command line have additional ""-device virtio-iommu-
> device".
> >>>>>>> What
> >>>>>> I am missing ?
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> -Bharat
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Best Regards
> >>>>>>>>
> >>>>>>>> Eric
> >>>>>>>>
> >>>>>>>> This series can be found at:
> >>>>>>>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
> >>>>>>>>
> >>>>>>>> References:
> >>>>>>>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC
> >>>>>>>> PATCH linux]
> >>>>>>>> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15]
> >>>>>>>> Add
> >>>>>>>> virtio- iommu
> >>>>>>>>
> >>>>>>>> History:
> >>>>>>>> v1 -> v2:
> >>>>>>>> - fix redifinition of viommu_as typedef
> >>>>>>>>
> >>>>>>>> Eric Auger (8):
> >>>>>>>>   update-linux-headers: import virtio_iommu.h
> >>>>>>>>   linux-headers: Update for virtio-iommu
> >>>>>>>>   virtio_iommu: add skeleton
> >>>>>>>>   virtio-iommu: Decode the command payload
> >>>>>>>>   virtio_iommu: Add the iommu regions
> >>>>>>>>   virtio-iommu: Implement the translation and commands
> >>>>>>>>   hw/arm/virt: Add 2.10 machine type
> >>>>>>>>   hw/arm/virt: Add virtio-iommu the virt board
> >>>>>>>>
> >>>>>>>>  hw/arm/virt.c                                 | 116 ++++-
> >>>>>>>>  hw/virtio/Makefile.objs                       |   1 +
> >>>>>>>>  hw/virtio/trace-events                        |  14 +
> >>>>>>>>  hw/virtio/virtio-iommu.c                      | 623
> >>>>>> ++++++++++++++++++++++++++
> >>>>>>>>  include/hw/arm/virt.h                         |   5 +
> >>>>>>>>  include/hw/virtio/virtio-iommu.h              |  60 +++
> >>>>>>>>  include/standard-headers/linux/virtio_ids.h   |   1 +
> >>>>>>>>  include/standard-headers/linux/virtio_iommu.h | 142 ++++++
> >>>>>>>>  linux-headers/linux/virtio_iommu.h            |   1 +
> >>>>>>>>  scripts/update-linux-headers.sh               |   3 +
> >>>>>>>>  10 files changed, 957 insertions(+), 9 deletions(-)  create
> >>>>>>>> mode
> >>>>>>>> 100644 hw/virtio/virtio-iommu.c  create mode 100644
> >>>>>>>> include/hw/virtio/virtio- iommu.h  create mode 100644
> >>>>>>>> include/standard- headers/linux/virtio_iommu.h  create mode
> >>>>>>>> 100644 linux-headers/linux/virtio_iommu.h
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> 2.5.5
> >>>>>>>
> >>>>>
> >>>


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-05  7:14             ` Tian, Kevin
@ 2017-07-05 12:44               ` Jean-Philippe Brucker
  2017-07-07  6:21                 ` Tian, Kevin
  0 siblings, 1 reply; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-05 12:44 UTC (permalink / raw)
  To: Tian, Kevin, Bharat Bhushan, Auger Eric, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

On 05/07/17 08:14, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>> Sent: Monday, June 19, 2017 6:15 PM
>>
>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>> I started added replay in virtio-iommu and came across how MSI interrupts
>> with work with VFIO.
>>> I understand that on intel this works differently but vsmmu will have same
>> requirement.
>>> kvm-msi-irq-route are added using the msi-address to be translated by
>> viommu and not the final translated address.
>>> While currently the irqfd framework does not know about emulated
>> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>>> So in my view we have following options:
>>> - Programming with translated address when setting up kvm-msi-irq-route
>>> - Route the interrupts via QEMU, which is bad from performance
>>> - vhost-virtio-iommu may solve the problem in long term
>>>
>>> Is there any other better option I am missing?
>>
>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>> we'll handle MSIs in the nested translation mode, where the guest manages
>> S1 page tables and the host doesn't know about GVA->GPA translation.
>>
>> I'm also wondering about the benefits of having SW-mapped MSIs in the
>> guest. It seems unavoidable for vSMMU since that's what a physical system
>> would do. But in a paravirtualized solution there doesn't seem to be any
>> compelling reason for having the guest map MSI doorbells. These addresses
>> are never accessed directly, they are only used for setting up IRQ routing
>> (at least on kvmtool). So here's what I'd like to have. Note that I
>> haven't investigated the feasibility in Qemu yet, I don't know how it
>> deals with MSIs.
>>
>> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
>> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
>> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
>> mappings when handling writes to PCI MSI-X tables.
>>
> 
> What do you mean by "fixed MSI doorbell"? PCI MSI-X table is part of
> PCI MMIO bar. Accessing to it is just a memory virtualization issue (e.g.
> trap by KVM and then emulated in Qemu) on x86. It's not a IOMMU
> problem. I guess you may mean same thing but want to double confirm
> here given the terminology confusion. Or do you mean the interrupt
> triggered by IOMMU itself?

Yes, I didn't mean access to the MSI-X table, but how we interpret the
address in the MSI message. In kvmtool I create MSI routes for VFIO
devices when the guest accesses the MSI-X tables. And on ARM the tables
contain an IOVA that needs to be translated into a PA, so handling a
write to an MSI-X entry might mean doing the IOVA->PA translation of the
doorbell.

On x86 the MSI address is 0xfee...., whether there is an IOMMU or not.
That's what I meant by fixed. And it is the IOMMU that performs IRQ remapping.

On physical ARM systems, the SMMU doesn't treat any special address range
as "MSI window". For the SMMU, an MSI is simply a memory transaction. MSI
addresses are arbitrary IOVAs that get translated into PAs by the SMMU.
The SMMU doesn't perform any IRQ remapping, only address translation. This
PA is a doorbell register in the irqchip, which performs IRQ remapping and
triggers an interrupt.

Therefore in an emulated ARM system, when the guest writes the MSI-X
table, it writes an IOVA. In a strict emulation the MSI would have to
first go through the vIOMMU, and then into the irqchip. I was wondering if
with virtio-iommu we could skip the address translation and go to the MSI
remapping component immediately, effectively implementing a "hardware MSI
window". This is what x86 does, the difference being that MSI remapping is
done by the IOMMU on x86, and by the irqchip on ARM.
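
To make the two options concrete, here is a minimal sketch in plain C. It
is not Qemu or kvmtool code: struct toy_mapping, toy_translate() and
toy_resolve_doorbell() are hypothetical names invented for illustration,
and the window bounds are whatever the caller passes in. The first
function models strict emulation (the MSI address is an ordinary IOVA that
must be mapped), the second models the "hardware MSI window" idea where
addresses inside the window skip translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy mapping: one contiguous IOVA range -> PA range. */
struct toy_mapping {
    uint64_t iova;
    uint64_t pa;
    uint64_t size;
};

/*
 * Strict emulation: the MSI address is an ordinary IOVA and must be
 * found in the mappings installed by the guest. Returns false on a
 * translation fault.
 */
static bool toy_translate(const struct toy_mapping *maps, size_t n,
                          uint64_t iova, uint64_t *pa)
{
    assert(pa != NULL);
    for (size_t i = 0; i < n; i++) {
        if (iova >= maps[i].iova && iova - maps[i].iova < maps[i].size) {
            *pa = maps[i].pa + (iova - maps[i].iova);
            return true;
        }
    }
    return false;
}

/*
 * "Hardware MSI window" variant: addresses inside the inclusive range
 * [win_start, win_end] skip translation and are handed to the MSI
 * remapping component as-is. Everything else is translated normally.
 */
static bool toy_resolve_doorbell(const struct toy_mapping *maps, size_t n,
                                 uint64_t win_start, uint64_t win_end,
                                 uint64_t msi_addr, uint64_t *doorbell)
{
    assert(doorbell != NULL);
    if (msi_addr >= win_start && msi_addr <= win_end) {
        *doorbell = msi_addr;
        return true;
    }
    return toy_translate(maps, n, msi_addr, doorbell);
}
```

With strict emulation, a guest that writes an unmapped doorbell address
into the MSI-X table gets a fault; with the window, the same address is
accepted untranslated.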

My current take is that we should keep the current behavior, but I will
try to sort out the different ways of implementing MSIs with virtio-iommu
in the next specification draft.

Thanks,
Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-05  7:25                 ` Tian, Kevin
@ 2017-07-05 12:44                   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-05 12:44 UTC (permalink / raw)
  To: Tian, Kevin, Auger Eric, Bharat Bhushan, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

On 05/07/17 08:25, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>> Sent: Tuesday, June 27, 2017 12:13 AM
>>
>> On 26/06/17 09:22, Auger Eric wrote:
>>> Hi Jean-Philippe,
>>>
>>> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
>>>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>>>> Hi Eric,
>>>>>
>>>>> I started added replay in virtio-iommu and came across how MSI
>> interrupts with work with VFIO.
>>>>> I understand that on intel this works differently but vsmmu will have
>> same requirement.
>>>>> kvm-msi-irq-route are added using the msi-address to be translated by
>> viommu and not the final translated address.
>>>>> While currently the irqfd framework does not know about emulated
>> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>>>>> So in my view we have following options:
>>>>> - Programming with translated address when setting up kvm-msi-irq-
>> route
>>>>> - Route the interrupts via QEMU, which is bad from performance
>>>>> - vhost-virtio-iommu may solve the problem in long term
>>>>>
>>>>> Is there any other better option I am missing?
>>>>
>>>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>>>> we'll handle MSIs in the nested translation mode, where the guest
>> manages
>>>> S1 page tables and the host doesn't know about GVA->GPA translation.
>>>
>>> I have a question about the "nested translation mode" terminology. Do
>>> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
>>> (which the ARM spec normally advises or was meant for) or do you mean
>>> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At
>> the
>>> moment my understanding is for VFIO integration the pIOMMU uses a
>> single
>>> stage combining both the stage 1 and stage2 mappings but the host is not
>>> aware of those 2 stages.
>>
>> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
>> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
>> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the
>> pIOMMU.
>>
> 
> Curious whether you are describing current smmu status or general
> vIOMMU status also applying to other vendors...

This particular paragraph was about the non-SVM state of things. The rest
was about stage-1 + stage-2 (what I call nested), which would indeed be
required for SVM. I don't think SVM can work with software merging.
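
For reference, the software merging mentioned above can be sketched as a
composition of two page-granular tables. This is only an illustration
under simplifying assumptions (flat tables, one entry per 4K page); the
toy_* names are hypothetical and nothing here calls into VFIO. A real VMM
would hand each merged entry to the pIOMMU rather than store it in an
array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 0x1000ULL

/* Hypothetical single-page mapping entry: input page -> output page. */
struct toy_page_map {
    uint64_t in;
    uint64_t out;
};

/* Look up one page in a flat table; returns 0 on success, -1 on a miss. */
static int toy_lookup(const struct toy_page_map *tbl, size_t n,
                      uint64_t in, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].in == in) {
            *out = tbl[i].out;
            return 0;
        }
    }
    return -1;
}

/*
 * Compose the guest table s1 (GVA->GPA) with the host table s2
 * (GPA->HPA) into a single GVA->HPA table. Returns the number of merged
 * entries written; s1 entries whose GPA has no s2 mapping are skipped.
 */
static size_t toy_merge_stages(const struct toy_page_map *s1, size_t n1,
                               const struct toy_page_map *s2, size_t n2,
                               struct toy_page_map *merged, size_t cap)
{
    size_t count = 0;

    assert(merged != NULL);
    for (size_t i = 0; i < n1 && count < cap; i++) {
        uint64_t hpa;

        assert(s1[i].in % TOY_PAGE_SIZE == 0);
        if (toy_lookup(s2, n2, s1[i].out, &hpa) == 0) {
            merged[count].in = s1[i].in;
            merged[count].out = hpa;
            count++;
        }
    }
    return count;
}
```

The sketch produces exactly one merged table per device, which is why it
has no obvious place for several guest-managed stage-1 tables.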

Thanks,
Jean

> the usage what you described is about svm, while svm requires PASID.
> At least PASID is tied to stage-1 on Intel VT-d. Only DMA w/o PASID
> or nested translation from stage-1 will go through stage-2. Unless
> ARM smmu has a completely different implementation, I'm not sure
> how svm can be virtualized w/ stage-1 translation disabled. There
> are multiple stage-1 page tables while only one stage-2 page table per
> device. Could merging actually work here?
> 
> The only case with merging happen today is for guest stage-2 usage
> or so-called GIOVA usage. Guest programs GIOVA->GPA to vIOMMU 
> stage-2. Then vIOMMU invokes vfio map/unmap APIs to translate/
> merge to GIOVA->HPA to pIOMMU stage-2. Maybe what you
> actually meant is this one?
> 
> Thanks
> Kevin
> 


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-05  8:49                 ` Bharat Bhushan
@ 2017-07-06 10:02                   ` Jean-Philippe Brucker
  2017-07-06 11:24                     ` Bharat Bhushan
  2017-07-06 21:11                     ` Auger Eric
  0 siblings, 2 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-06 10:02 UTC (permalink / raw)
  To: Bharat Bhushan, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On 05/07/17 09:49, Bharat Bhushan wrote:
>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
>> provide the translated address.
>>> According to my understanding this is required because kernel does no go
>> through viommu translation when generating interrupt, no?
>>
>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS.
>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>
>> So I do not understand your previous sentence saying "MSI interrupts works
>> without any change".
> 
> I have almost completed vfio integration with virtio-iommu and now testing the changes by assigning e1000 device to VM. For this I have changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi and this does not need changed in vfio_get_addr()  and kvm_irqchip_add_msi_route()

I understand you're reserving region 0x08000000-0x08100000 as
IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's
not a coincidence if the addresses are the same, because Eric chose them
for the Linux SMMU drivers and I copied them.

We can't rely on that behavior, though, it will break MSIs in emulated
devices. And if Qemu happens to move the MSI doorbell in future machine
revisions, then it would also break VFIO.

Just for my own understanding -- what happens, I think, is that in Linux
iova_reserve_iommu_regions initially reserves the guest-physical doorbell
0x08000000-0x08100000. Then much later, when the device driver requests an
MSI, the irqchip driver calls iommu_dma_map_msi_msg with the
guest-physical gicv2m address 0x08020000. The function finds the right
page in msi_page_list, which was added by cookie_init_hw_msi_region,
therefore bypassing the viommu and the GPA gets written in the MSI-X table.
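
A rough userspace model of that lookup, as I understand it, is below. It
is not the kernel code: struct toy_cookie and the toy_* functions are
hypothetical stand-ins for the msi_page_list handling, and the bump
allocator for new IOVAs is an assumption made to keep the sketch short.
The n_mapped counter stands for the IOMMU mappings that would have to be
created on a miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 0x1000ULL
#define TOY_MAX_PAGES 512

/* One doorbell page: physical address and the IOVA the device must use. */
struct toy_msi_page {
    uint64_t phys;
    uint64_t iova;
};

struct toy_cookie {
    struct toy_msi_page pages[TOY_MAX_PAGES];
    size_t n_pages;
    uint64_t next_iova;     /* hypothetical bump allocator for new IOVAs */
    unsigned int n_mapped;  /* mappings that had to be created on a miss */
};

/* Pre-populate identity entries for [start, end): a hardware MSI region. */
static void toy_init_hw_msi_region(struct toy_cookie *c,
                                   uint64_t start, uint64_t end)
{
    for (uint64_t p = start; p < end; p += TOY_PAGE_SIZE) {
        assert(c->n_pages < TOY_MAX_PAGES);
        c->pages[c->n_pages].phys = p;
        c->pages[c->n_pages].iova = p;
        c->n_pages++;
    }
}

/* Return the address to write into the MSI message for this doorbell. */
static uint64_t toy_get_msi_iova(struct toy_cookie *c, uint64_t doorbell)
{
    uint64_t page = doorbell & ~(TOY_PAGE_SIZE - 1);
    uint64_t off = doorbell & (TOY_PAGE_SIZE - 1);

    /* Hit: the page was pre-populated, so no IOMMU mapping is created. */
    for (size_t i = 0; i < c->n_pages; i++) {
        if (c->pages[i].phys == page)
            return c->pages[i].iova + off;
    }

    /* Miss: allocate a fresh IOVA and record a (non-identity) mapping. */
    assert(c->n_pages < TOY_MAX_PAGES);
    c->pages[c->n_pages].phys = page;
    c->pages[c->n_pages].iova = c->next_iova;
    c->n_pages++;
    c->next_iova += TOY_PAGE_SIZE;
    c->n_mapped++;
    return c->pages[c->n_pages - 1].iova + off;
}
```

A doorbell inside the pre-populated region comes back unchanged and
n_mapped stays at zero; a doorbell outside it gets a new IOVA, so the
identity assumption silently stops holding.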

If an emulated device such as virtio-net-pci were to generate an MSI, then
Qemu would attempt to access the doorbell written by Linux into the MSI-X
table, 0x08020000, and fault because that address wasn't mapped in the viommu.

So for VFIO, you either need to translate the MSI-X entry using the
viommu, or just assume that the vaddr corresponds to the only MSI doorbell
accessible by this device (because how can we be certain that the guest
already mapped the doorbell before writing the entry?)

For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu
device to advertise identity-mapped/reserved regions, and bypass
translation on these regions. Then the driver could reserve those with
IOMMU_RESV_MSI. For x86 we will need such a system, with an added IRQ
remapping feature.

Thanks,
Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 10:02                   ` Jean-Philippe Brucker
@ 2017-07-06 11:24                     ` Bharat Bhushan
  2017-07-06 11:55                       ` Jean-Philippe Brucker
  2017-07-06 21:16                       ` Auger Eric
  2017-07-06 21:11                     ` Auger Eric
  1 sibling, 2 replies; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-06 11:24 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall



> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Thursday, July 06, 2017 3:33 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 05/07/17 09:49, Bharat Bhushan wrote:
> >>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
> >> provide the translated address.
> >>> According to my understanding this is required because kernel does
> >>> no go
> >> through viommu translation when generating interrupt, no?
> >>
> >> yes this is needed when KVM MSI routes are set up, ie. along with GICV3
> ITS.
> >> With GICv2M, qemu direct gsi mapping is used and this is not needed.
> >>
> >> So I do not understand your previous sentence saying "MSI interrupts
> >> works without any change".
> >
> > I have almost completed vfio integration with virtio-iommu and now
> > testing the changes by assigning e1000 device to VM. For this I have
> > changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi
> > and this does not need changed in vfio_get_addr()  and
> > kvm_irqchip_add_msi_route()
> 
> I understand you're reserving region 0x08000000-0x08100000 as
> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
> works because Qemu places the vgic in that area as well (in hw/arm/virt.c).
> It's not a coincidence if the addresses are the same, because Eric chose them
> for the Linux SMMU drivers and I copied them.
> 
> We can't rely on that behavior, though, it will break MSIs in emulated
> devices. And if Qemu happens to move the MSI doorbell in future machine
> revisions, then it would also break VFIO.

Yes, makes sense to me.

> 
> Just for my own understanding -- what happens, I think, is that in Linux
> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
> 0x08000000-0x08100000. Then much later, when the device driver requests
> an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest-
> physical gicv2m address 0x08020000. The function finds the right page in
> msi_page_list, which was added by cookie_init_hw_msi_region, therefore
> bypassing the viommu and the GPA gets written in the MSI-X table.

This means that if qemu tomorrow changes the virt machine address map so that the vgic-its address range (its-translator register address) no longer falls in the msi_page_list, then it will allocate a new iova and create a mapping in the iommu. So this will no longer be identity mapped and will fail to work with the new qemu?

> 
> If an emulated device such as virtio-net-pci were to generate an MSI, then
> Qemu would attempt to access the doorbell written by Linux into the MSI-X
> table, 0x08020000, and fault because that address wasn't mapped in the
> viommu.
> 
> So for VFIO, you either need to translate the MSI-X entry using the viommu,
> or just assume that the vaddr corresponds to the only MSI doorbell
> accessible by this device (because how can we be certain that the guest
> already mapped the doorbell before writing the entry?)
> 
> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
> iommu device to advertise identity-mapped/reserved regions, and bypass
> translation on these regions. Then the driver could reserve those with
> IOMMU_RESV_MSI.

Correct me if I did not understand you correctly: today the iommu-driver decides the msi-reserved region. What if we change this so that the virtio-iommu device provides the reserved msi region as per the emulated machine (virt/intel)? The virtio-iommu driver would then use the address advertised by the virtio-iommu device as IOMMU_RESV_MSI. In this case msi-page-list will always have the reserved region for MSI.
On the qemu side, for emulated devices we will let virtio-iommu return the same address as the translated address, as it falls in an MSI-reserved page already known to it.


> For x86 we will need such a system, with an added IRQ
> remapping feature.

I do not understand x86 MSI interrupt generation, but if the above understanding is correct, then why do we need IRQ remapping for x86?
Will the x86 machine emulated in QEMU provide a big address range for MSIs, and when actually generating an MSI does it need some extra processing (IRQ-remapping processing) before actually generating the write transaction for the MSI interrupt?

Thanks
-Bharat

> 
> Thanks,
> Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 11:24                     ` Bharat Bhushan
@ 2017-07-06 11:55                       ` Jean-Philippe Brucker
  2017-07-06 21:16                       ` Auger Eric
  1 sibling, 0 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-06 11:55 UTC (permalink / raw)
  To: Bharat Bhushan, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On 06/07/17 12:24, Bharat Bhushan wrote:
> 
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>> Sent: Thursday, July 06, 2017 3:33 PM
>> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
>> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
>> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
>> qemu-arm@nongnu.org; qemu-devel@nongnu.org
>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
>> robin.murphy@arm.com; christoffer.dall@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> On 05/07/17 09:49, Bharat Bhushan wrote:
>>>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
>>>> provide the translated address.
>>>>> According to my understanding this is required because kernel does
>>>>> no go
>>>> through viommu translation when generating interrupt, no?
>>>>
>>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3
>> ITS.
>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>
>>>> So I do not understand your previous sentence saying "MSI interrupts
>>>> works without any change".
>>>
>>> I have almost completed vfio integration with virtio-iommu and now
>>> testing the changes by assigning e1000 device to VM. For this I have
>>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi
>>> and this does not need changed in vfio_get_addr()  and
>>> kvm_irqchip_add_msi_route()
>>
>> I understand you're reserving region 0x08000000-0x08100000 as
>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
>> works because Qemu places the vgic in that area as well (in hw/arm/virt.c).
>> It's not a coincidence if the addresses are the same, because Eric chose them
>> for the Linux SMMU drivers and I copied them.
>>
>> We can't rely on that behavior, though, it will break MSIs in emulated
>> devices. And if Qemu happens to move the MSI doorbell in future machine
>> revisions, then it would also break VFIO.
> 
> Yes, make sense to me
> 
>>
>> Just for my own understanding -- what happens, I think, is that in Linux
>> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
>> 0x08000000-0x08100000. Then much later, when the device driver requests
>> an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest-
>> physical gicv2m address 0x08020000. The function finds the right page in
>> msi_page_list, which was added by cookie_init_hw_msi_region, therefore
>> bypassing the viommu and the GPA gets written in the MSI-X table.
> 
> This means in case tomorrow when qemu changes virt machine address map and vgic-its (its-translator register address) address range does not fall in the msi_page_list then it will allocate a new iova, create mapping in iommu. So this will no longer be identity mapped and fail to work with new qemu?

Precisely

>>
>> If an emulated device such as virtio-net-pci were to generate an MSI, then
>> Qemu would attempt to access the doorbell written by Linux into the MSI-X
>> table, 0x08020000, and fault because that address wasn't mapped in the
>> viommu.
>>
>> So for VFIO, you either need to translate the MSI-X entry using the viommu,
>> or just assume that the vaddr corresponds to the only MSI doorbell
>> accessible by this device (because how can we be certain that the guest
>> already mapped the doorbell before writing the entry?)
>>
>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
>> iommu device to advertise identity-mapped/reserved regions, and bypass
>> translation on these regions. Then the driver could reserve those with
>> IOMMU_RESV_MSI.
> 
> Correct me if I did not understood you correctly, today iommu-driver decides msi-reserved region, what if we change this and virtio-iommu device will provide the reserved msi region as per the emulated machine (virt/intel). So virtio-iommu driver will use the address advertised by virtio-iommu device as IOMMU_RESV_MSI. In this case msi-page-list will always have the reserved region for MSI.
> On qemu side, for emulated devices we will let virtio-iommu return same address as translated address as it falls in MSI-reserved page already known to it.

Yes, that's it. For example on x86, the virtio-iommu device will advertise
range 0xfee00000-0xfeefffff as IOMMU_RESV_MSI, which is the local APIC
(MSI) address range. And it will let any access to this region go through
untranslated. As far as I understand, that's what the AMD vIOMMU does.

In the x86 world this fee00000 window is known implicitly to be the MSI
window. But the virtio-iommu device needs a generic method to declare
which bypassed windows it implements (and actually implement it), and I
still have to think about that. That way an emulated ARM system could also
advertise the GIC doorbell as bypassed.
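
Purely as a thinking aid, such a declaration could be modelled as a list
of typed ranges that the device exposes and that both sides consult. The
layout below is a hypothetical sketch, not a proposal for the
specification: enum toy_resv_type, struct toy_resv_region and
toy_find_bypass() are invented names, and ranges are assumed to be
inclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region types a device might advertise. */
enum toy_resv_type {
    TOY_RESV_RESERVED,  /* never usable as IOVA */
    TOY_RESV_MSI        /* MSI doorbell window, bypasses translation */
};

struct toy_resv_region {
    uint64_t start;
    uint64_t end;       /* inclusive */
    enum toy_resv_type type;
};

/*
 * Return the advertised region that fully contains the access
 * [addr, addr + len - 1], or NULL if the access must be translated.
 */
static const struct toy_resv_region *
toy_find_bypass(const struct toy_resv_region *regions, size_t n,
                uint64_t addr, uint64_t len)
{
    uint64_t last;

    assert(len > 0);
    last = addr + len - 1;
    if (last < addr)
        return NULL;    /* the range wraps around the address space */

    for (size_t i = 0; i < n; i++) {
        if (addr >= regions[i].start && last <= regions[i].end)
            return &regions[i];
    }
    return NULL;
}
```

An x86 machine would advertise { 0xfee00000, 0xfeefffff, TOY_RESV_MSI },
an ARM machine the range covering its GIC doorbell. The driver reserves
whatever it is told, so moving the doorbell in a later machine revision
would not break the guest.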

>> For x86 we will need such a system, with an added IRQ
>> remapping feature.
> 
> I do not understand x86 MSI interrupt generation, but If above understand is correct, then why we need IRQ remapping for x86?
> Will the x86 machine emulated in QEMU provides a big address range for MSIs and when actually generating MSI it needed some extra processing (IRQ-remapping processing) before actually generating write transaction for MSI interrupt ? 

On x86 it's not the irqchip that performs MSI remapping, but the IOMMU.
For example the vtd emulation (intel viommu) performs MSI remapping, but
the AMD viommu doesn't yet. So if we want the virtio-iommu to be adequate
for the x86 world as well, we'll probably need rudimentary IRQ remapping
support, in parallel with the bypassed window feature described above.
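
To illustrate what the remapping component has to look at, here is a small
decoder for the x86 MSI address. The helper names are hypothetical. The
bit positions are the usual ones as far as I know: bits 31:20 are 0xfee,
the compatibility format carries the destination ID in bits 19:12, and
with VT-d bit 4 selects the remappable format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_X86_MSI_BASE 0xfee00000u
#define TOY_X86_MSI_MASK 0xfff00000u

/* True if addr falls in the fixed 0xfeeXXXXX MSI window. */
static bool toy_x86_msi_in_window(uint64_t addr)
{
    return (addr >> 32) == 0 &&
           ((uint32_t)addr & TOY_X86_MSI_MASK) == TOY_X86_MSI_BASE;
}

/* Compatibility format: destination ID lives in address bits 19:12. */
static uint8_t toy_x86_msi_dest_id(uint64_t addr)
{
    assert(toy_x86_msi_in_window(addr));
    return (uint8_t)((addr >> 12) & 0xff);
}

/* With VT-d, address bit 4 set means the remappable format is in use. */
static bool toy_x86_msi_remappable(uint64_t addr)
{
    assert(toy_x86_msi_in_window(addr));
    return ((addr >> 4) & 1) != 0;
}
```

A bypassed window only covers the first check. Interpreting the remaining
bits is the IRQ remapping part that virtio-iommu would have to provide on
x86.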

Thanks,
Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 10:02                   ` Jean-Philippe Brucker
  2017-07-06 11:24                     ` Bharat Bhushan
@ 2017-07-06 21:11                     ` Auger Eric
  2017-07-07  7:31                       ` Auger Eric
  2017-07-07 15:20                       ` Jean-Philippe Brucker
  1 sibling, 2 replies; 73+ messages in thread
From: Auger Eric @ 2017-07-06 21:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Bharat Bhushan, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hello Bharat, Jean-Philippe,
On 06/07/2017 12:02, Jean-Philippe Brucker wrote:
> On 05/07/17 09:49, Bharat Bhushan wrote:
>>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
>>> provide the translated address.
>>>> According to my understanding this is required because kernel does no go
>>> through viommu translation when generating interrupt, no?
>>>
>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS.
>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>
>>> So I do not understand your previous sentence saying "MSI interrupts works
>>> without any change".
>>
>> I have almost completed vfio integration with virtio-iommu and now testing the changes by assigning e1000 device to VM. For this I have changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi and this does not need changed in vfio_get_addr()  and kvm_irqchip_add_msi_route()
> 
> I understand you're reserving region 0x08000000-0x08100000 as
> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
> because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's
> not a coincidence if the addresses are the same, because Eric chose them
> for the Linux SMMU drivers and I copied them.

Yes, I chose this region because it does not overlap with any guest RAM
region.

> 
> We can't rely on that behavior, though, it will break MSIs in emulated
> devices. And if Qemu happens to move the MSI doorbell in future machine
> revisions, then it would also break VFIO.
> 
> Just for my own understanding -- what happens, I think, is that in Linux
> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
> 0x08000000-0x08100000. Then much later, when the device driver requests an
> MSI, the irqchip driver calls iommu_dma_map_msi_msg with the
> guest-physical gicv2m address 0x08020000. The function finds the right
> page in msi_page_list, which was added by cookie_init_hw_msi_region,
> therefore bypassing the viommu and the GPA gets written in the MSI-X table.

I share Jean's understanding. To me, using IOMMU_RESV_MSI in the
virtio-iommu means this region is not translated by the IOMMU. As
cookie_init_hw_msi_region() pre-allocates the msi_page array,
iommu_dma_get_msi_page() does not do any IOMMU mapping.

> 
> If an emulated device such as virtio-net-pci were to generate an MSI, then
> Qemu would attempt to access the doorbell written by Linux into the MSI-X
> table, 0x08020000, and fault because that address wasn't mapped in the viommu.
Yes, so I am confused: how can it work with a virtio-net-pci or
passthrough'ed e1000e device using MSIs?
> 
> So for VFIO, you either need to translate the MSI-X entry using the
> viommu,

For the vsmmuv3 I created a dedicated IOMMUNotifier to handle the fact
the MSI doorbell is translated and MSI routes need to be updated. This
seems to work.

> or just assume that the vaddr corresponds to the only MSI doorbell
> accessible by this device (because how can we be certain that the guest
> already mapped the doorbell before writing the entry?)
> 
> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu
> device to advertise identity-mapped/reserved regions, and bypass
> translation on these regions. Then the driver could reserve those with
> IOMMU_RESV_MSI.

At least we may need to configure the virtio-iommu to either bypass MSIs
(x86) or translate MSIs (ARM)?
> For x86 we will need such a system, with an added IRQ
> remapping feature.
Meaning this must live along with vIR, is that what you mean? Also on
ARM this must live with vITS anyway. This is an orthogonal feature, right?

Thanks

Eric
> 
> Thanks,
> Jean
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 11:24                     ` Bharat Bhushan
  2017-07-06 11:55                       ` Jean-Philippe Brucker
@ 2017-07-06 21:16                       ` Auger Eric
  2017-07-06 23:23                         ` [Qemu-devel] [Qemu-arm] " Kalra, Ashish
  2017-07-07  6:25                         ` [Qemu-devel] " Bharat Bhushan
  1 sibling, 2 replies; 73+ messages in thread
From: Auger Eric @ 2017-07-06 21:16 UTC (permalink / raw)
  To: Bharat Bhushan, Jean-Philippe Brucker, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Bharat,

On 06/07/2017 13:24, Bharat Bhushan wrote:
> 
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>> Sent: Thursday, July 06, 2017 3:33 PM
>> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
>> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
>> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
>> qemu-arm@nongnu.org; qemu-devel@nongnu.org
>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
>> robin.murphy@arm.com; christoffer.dall@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route
>> kvm_irqchip_add_msi_route() we needed to
>>>> provide the translated address.
>>>>> According to my understanding this is required because kernel does
>>>>> no go
>>>> through viommu translation when generating interrupt, no?
>>>>
>>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3
>> ITS.
>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>
>>>> So I do not understand your previous sentence saying "MSI interrupts
>>>> works without any change".
>>>
>>> I have almost completed vfio integration with virtio-iommu and now
>>> testing the changes by assigning e1000 device to VM. For this I have
>>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi
>>> and this does not need changed in vfio_get_addr()  and
>>> kvm_irqchip_add_msi_route()
>>
>> I understand you're reserving region 0x08000000-0x08100000 as
>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
>> works because Qemu places the vgic in that area as well (in hw/arm/virt.c).
>> It's not a coincidence if the addresses are the same, because Eric chose them
>> for the Linux SMMU drivers and I copied them.
>>
>> We can't rely on that behavior, though, it will break MSIs in emulated
>> devices. And if Qemu happens to move the MSI doorbell in future machine
>> revisions, then it would also break VFIO.
> 
> Yes, make sense to me
> 
>>
>> Just for my own understanding -- what happens, I think, is that in Linux
>> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
>> 0x08000000-0x08100000. Then much later, when the device driver requests
>> an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest-
>> physical gicv2m address 0x08020000. The function finds the right page in
>> msi_page_list, which was added by cookie_init_hw_msi_region, therefore
>> bypassing the viommu and the GPA gets written in the MSI-X table.
> 
> This means in case tomorrow when qemu changes virt machine address map and vgic-its (its-translator register address) address range does not fall in the msi_page_list then it will allocate a new iova, create mapping in iommu. So this will no longer be identity mapped and fail to work with new qemu?
> 
Yes that's correct.
>>
>> If an emulated device such as virtio-net-pci were to generate an MSI, then
>> Qemu would attempt to access the doorbell written by Linux into the MSI-X
>> table, 0x08020000, and fault because that address wasn't mapped in the
>> viommu.
>>
>> So for VFIO, you either need to translate the MSI-X entry using the viommu,
>> or just assume that the vaddr corresponds to the only MSI doorbell
>> accessible by this device (because how can we be certain that the guest
>> already mapped the doorbell before writing the entry?)
>>
>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
>> iommu device to advertise identity-mapped/reserved regions, and bypass
>> translation on these regions. Then the driver could reserve those with
>> IOMMU_RESV_MSI.
> 
> Correct me if I did not understood you correctly, today iommu-driver decides msi-reserved region, what if we change this and virtio-iommu device will provide the reserved msi region as per the emulated machine (virt/intel). So virtio-iommu driver will use the address advertised by virtio-iommu device as IOMMU_RESV_MSI. In this case msi-page-list will always have the reserved region for MSI.
> On qemu side, for emulated devices we will let virtio-iommu return same address as translated address as it falls in MSI-reserved page already known to it.

I think what you're proposing here corresponds to the 1st approach that
was followed for PCIe passthrough/MSI on ARM, ie. the userspace was
providing the reserved region base address & size. This was ruled out
and now this region is arbitrarily set by the smmu-driver. At the moment
this means this region cannot contain guest RAM.

> 
> 
>> For x86 we will need such a system, with an added IRQ
>> remapping feature.
> 
> I do not understand x86 MSI interrupt generation, but If above understand is correct, then why we need IRQ remapping for x86?
To me x86 IR simply corresponds to the ITS MSI controller
modality on ARM. So as you still need vITS along with virtio-iommu on
ARM, you need vIR along with virtio-iommu on Intel. Does that make sense?

So in any case we need to make sure the guest uses a vITS or vIR to make
sure MSIs are correctly isolated.


> Will the x86 machine emulated in QEMU provides a big address range for MSIs and when actually generating MSI it needed some extra processing (IRQ-remapping processing) before actually generating write transaction for MSI interrupt ? 
My understanding is that on x86 the MSI window is fixed and matches
[FEE0_0000h – FEF0_0000h]. MSIs are conveyed in an address space separate
from usual DMA accesses. And yes, they end up in IR if supported in the HW.
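
As a small illustration of the fixed window, the check below classifies an
address as MSI or ordinary DMA. This is only a sketch: the constant and
function names are hypothetical, and the end of the range is treated as
exclusive.

```python
X86_MSI_BASE = 0xFEE00000
X86_MSI_END = 0xFEF00000  # exclusive upper bound of the fixed window


def is_x86_msi_address(addr):
    """True if a device write to addr targets the fixed x86 MSI window."""
    return X86_MSI_BASE <= addr < X86_MSI_END


def classify_write(addr):
    """Route a device write: MSIs bypass DMA translation in this toy model."""
    return "msi" if is_x86_msi_address(addr) else "dma"
```

By contrast, an ARM doorbell such as 0x08020000 falls outside this window
and is treated like any other DMA address in this model.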

Thanks

Eric
> 
> Thanks
> -Bharat
> 
>>
>> Thanks,
>> Jean

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [Qemu-arm]  [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 21:16                       ` Auger Eric
@ 2017-07-06 23:23                         ` Kalra, Ashish
  2017-07-06 23:29                           ` Michael S. Tsirkin
  2017-07-06 23:33                           ` Tian, Kevin
  2017-07-07  6:25                         ` [Qemu-devel] " Bharat Bhushan
  1 sibling, 2 replies; 73+ messages in thread
From: Kalra, Ashish @ 2017-07-06 23:23 UTC (permalink / raw)
  To: Auger Eric, Bharat Bhushan, Jean-Philippe Brucker,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: kevin.tian, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

I have a generic question on vIOMMU support: is there any proposal/plan to add ATS/PRI extension support to vIOMMUs and allow
handling of end-to-end (v)IOMMU page faults (w/t the device side implementation on Vhost)?

Again, the motivation would be to do DMA on paged guest memory and potentially avoid the requirement of pinned/locked
guest physical memory for DMA.

Thanks,
Ashish


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [Qemu-arm]  [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 23:23                         ` [Qemu-devel] [Qemu-arm] " Kalra, Ashish
@ 2017-07-06 23:29                           ` Michael S. Tsirkin
  2017-07-06 23:33                           ` Tian, Kevin
  1 sibling, 0 replies; 73+ messages in thread
From: Michael S. Tsirkin @ 2017-07-06 23:29 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: Auger Eric, Bharat Bhushan, Jean-Philippe Brucker,
	eric.auger.pro, peter.maydell, alex.williamson, qemu-arm,
	qemu-devel, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On Thu, Jul 06, 2017 at 11:23:41PM +0000, Kalra, Ashish wrote:
> I have a generic question on vIOMMU support, is there any proposal/plan to add ATS/PRI extension support to vIOMMUs and allow
> handling for end to end (v)IOMMU Page faults (w/t the device side implementation on Vhost) ?
> 
> Again, the motivation will be to do DMA on paged guest memory and potentially avoiding the requirement of pinned/locked
> guest physical memory for DMA.
> 
> Thanks,
> Ashish

If that's your goal, first step would be support in VFIO.
End to end would only be helpful when running userspace
drivers within guest, or nested virt.

-- 
MST

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [Qemu-arm]  [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 23:23                         ` [Qemu-devel] [Qemu-arm] " Kalra, Ashish
  2017-07-06 23:29                           ` Michael S. Tsirkin
@ 2017-07-06 23:33                           ` Tian, Kevin
  2017-07-07 15:14                             ` Jean-Philippe Brucker
  1 sibling, 1 reply; 73+ messages in thread
From: Tian, Kevin @ 2017-07-06 23:33 UTC (permalink / raw)
  To: Kalra, Ashish, Auger Eric, Bharat Bhushan, Jean-Philippe Brucker,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: marc.zyngier, tn, will.deacon, drjones, robin.murphy, christoffer.dall

> From: Kalra, Ashish [mailto:Ashish.Kalra@cavium.com]
> Sent: Friday, July 7, 2017 7:24 AM
> 
> I have a generic question on vIOMMU support, is there any proposal/plan to
> add ATS/PRI extension support to vIOMMUs and allow
> handling for end to end (v)IOMMU Page faults (w/t the device side
> implementation on Vhost) ?
> 
> Again, the motivation will be to do DMA on paged guest memory and
> potentially avoiding the requirement of pinned/locked
> guest physical memory for DMA.

Yes, that's a necessary part of supporting SVM in both the virtio-iommu
approach and the fully emulated approach (e.g. for Intel VT-d). There
are already patches and discussions in another thread about how to
propagate IOMMU page faults to the vIOMMU. Once that is done,
vIOMMU page fault emulation will be added on top.

https://lkml.org/lkml/2017/6/27/964

Thanks
Kevin

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-05 12:44               ` Jean-Philippe Brucker
@ 2017-07-07  6:21                 ` Tian, Kevin
  2017-07-07 15:15                   ` Jean-Philippe Brucker
  0 siblings, 1 reply; 73+ messages in thread
From: Tian, Kevin @ 2017-07-07  6:21 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Bharat Bhushan, Auger Eric,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: wei, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Wednesday, July 5, 2017 8:45 PM
> 
> On 05/07/17 08:14, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> >> Sent: Monday, June 19, 2017 6:15 PM
> >>
> >> On 19/06/17 08:54, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>> I started added replay in virtio-iommu and came across how MSI
> interrupts
> >> with work with VFIO.
> >>> I understand that on intel this works differently but vsmmu will have
> same
> >> requirement.
> >>> kvm-msi-irq-route are added using the msi-address to be translated by
> >> viommu and not the final translated address.
> >>> While currently the irqfd framework does not know about emulated
> >> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> >>> So in my view we have following options:
> >>> - Programming with translated address when setting up kvm-msi-irq-
> route
> >>> - Route the interrupts via QEMU, which is bad from performance
> >>> - vhost-virtio-iommu may solve the problem in long term
> >>>
> >>> Is there any other better option I am missing?
> >>
> >> Since we're on the topic of MSIs... I'm currently trying to figure out how
> >> we'll handle MSIs in the nested translation mode, where the guest
> manages
> >> S1 page tables and the host doesn't know about GVA->GPA translation.
> >>
> >> I'm also wondering about the benefits of having SW-mapped MSIs in the
> >> guest. It seems unavoidable for vSMMU since that's what a physical
> system
> >> would do. But in a paravirtualized solution there doesn't seem to be any
> >> compelling reason for having the guest map MSI doorbells. These
> addresses
> >> are never accessed directly, they are only used for setting up IRQ routing
> >> (at least on kvmtool). So here's what I'd like to have. Note that I
> >> haven't investigated the feasibility in Qemu yet, I don't know how it
> >> deals with MSIs.
> >>
> >> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
> >> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
> >> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
> >> mappings when handling writes to PCI MSI-X tables.
> >>
> >
> > What do you mean by "fixed MSI doorbell"? PCI MSI-X table is part of
> > PCI MMIO bar. Accessing to it is just a memory virtualization issue (e.g.
> > trap by KVM and then emulated in Qemu) on x86. It's not a IOMMU
> > problem. I guess you may mean same thing but want to double confirm
> > here given the terminology confusion. Or do you mean the interrupt
> > triggered by IOMMU itself?
> 
> Yes I didn't mean access to the MSI-X table, but how we interpret the
> address in the MSI message. In kvmtool I create MSI routes for VFIO
> devices when the guest accesses the MSI-X tables. And on ARM the tables
> contains an IOVA that needs to be translated into a PA, so handling a
> write to an MSI-X entry might mean doing the IOVA->PA translation of the
> doorbell.
> 
> On x86 the MSI address is 0xfee...., whether there is an IOMMU or not.
> That's what I meant by fixed. And it is the IOMMU that performs IRQ
> remapping.
> 
> On physical ARM systems, the SMMU doesn't treat any special address range
> as "MSI window". For the SMMU, an MSI is simply a memory transaction.
> MSI
> addresses are arbitrary IOVAs that get translated into PAs by the SMMU.
> The SMMU doesn't perform any IRQ remapping, only address translation.
> This
> PA is a doorbell register in the irqchip, which performs IRQ remapping and
> triggers an interrupt.

Thanks for explanation. I see the background now.

> 
> Therefore in an emulated ARM system, when the guest writes the MSI-X
> table, it writes an IOVA. In a strict emulation the MSI would have to
> first go through the vIOMMU, and then into the irqchip. I was wondering if
> with virtio-iommu we could skip the address translation and go to the MSI
> remapping component immediately, effectively implementing a "hardware
> MSI
> window". This is what x86 does, the difference being that MSI remapping is
> done by the IOMMU on x86, and by the irqchip on ARM.

Sorry, I didn't quite get this part. Here is my understanding:

The guest programs the vIOMMU to map a gIOVA (used by MSI) to the GPA
of a doorbell register of the virtual irqchip. The vIOMMU then
triggers a VFIO map/unmap to update the physical IOMMU page
table with gIOVA -> HPA of the real doorbell of the physical irqchip
(assuming your irqchip provides multiple doorbells so each
device can have its own channel). Once this update is
done, later MSI interrupts from the assigned device go
through the physical IOMMU (gIOVA->HPA) and then reach the irqchip
for irq remapping. The vIOMMU is involved only in the configuration
path, not in the actual interrupt path.

If my understanding is correct, the above is the natural flow, so
why is an additional virtio-iommu change required? :-)
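
The flow described above can be sketched as a toy model in which the
vIOMMU only acts at configuration time. All class names, the page size and
the example addresses are assumptions for illustration; this is not how
QEMU or VFIO are structured.

```python
PAGE = 0x1000  # assumed 4 KiB pages


class PhysicalIommu:
    """Toy physical IOMMU: holds gIOVA page -> HPA page entries."""

    def __init__(self):
        self.table = {}

    def translate(self, giova):
        page, offset = giova & ~(PAGE - 1), giova & (PAGE - 1)
        return self.table[page] + offset  # a KeyError models a fault


class VirtualIommu:
    """Toy vIOMMU: records guest mappings and shadows them into the host."""

    def __init__(self, phys, doorbell_gpa_to_hpa):
        self.phys = phys
        # Hypothetical lookup from virtual doorbell GPA to real doorbell HPA.
        self.doorbell_gpa_to_hpa = doorbell_gpa_to_hpa
        self.guest_table = {}  # gIOVA page -> GPA page
        self.runtime_lookups = 0

    def map(self, giova, gpa):
        # Configuration path: remember the guest view of the mapping ...
        self.guest_table[giova] = gpa
        # ... and install gIOVA -> HPA in the physical IOMMU (the map step).
        self.phys.table[giova] = self.doorbell_gpa_to_hpa[gpa]

    def translate(self, giova):
        # Only emulated devices would reach this at run time.
        self.runtime_lookups += 1
        page, offset = giova & ~(PAGE - 1), giova & (PAGE - 1)
        return self.guest_table[page] + offset
```

In this model an MSI from an assigned device is resolved by
PhysicalIommu.translate() alone, while an emulated device would have to
call VirtualIommu.translate(), which is where the two cases differ.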

> 
> My current take is that we should keep the current behavior, but I will
> try to sort out the different ways of implementing MSIs with virtio-iommu
> in the next specification draft.
> 
> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 21:16                       ` Auger Eric
  2017-07-06 23:23                         ` [Qemu-devel] [Qemu-arm] " Kalra, Ashish
@ 2017-07-07  6:25                         ` Bharat Bhushan
  2017-07-07  7:25                           ` Auger Eric
  1 sibling, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-07  6:25 UTC (permalink / raw)
  To: Auger Eric, Jean-Philippe Brucker, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Friday, July 07, 2017 2:47 AM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Jean-Philippe Brucker
> <jean-philippe.brucker@arm.com>; eric.auger.pro@gmail.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 06/07/2017 13:24, Bharat Bhushan wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> >> Sent: Thursday, July 06, 2017 3:33 PM
> >> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> >> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> >> peter.maydell@linaro.org; alex.williamson@redhat.com;
> mst@redhat.com;
> >> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> >> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> >> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> >> robin.murphy@arm.com; christoffer.dall@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route
> >> kvm_irqchip_add_msi_route() we needed to
> >>>> provide the translated address.
> >>>>> According to my understanding this is required because kernel does
> >>>>> no go
> >>>> through viommu translation when generating interrupt, no?
> >>>>
> >>>> yes this is needed when KVM MSI routes are set up, ie. along with
> >>>> GICV3
> >> ITS.
> >>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
> >>>>
> >>>> So I do not understand your previous sentence saying "MSI
> >>>> interrupts works without any change".
> >>>
> >>> I have almost completed vfio integration with virtio-iommu and now
> >>> testing the changes by assigning e1000 device to VM. For this I have
> >>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-
> msi
> >>> and this does not need changed in vfio_get_addr()  and
> >>> kvm_irqchip_add_msi_route()
> >>
> >> I understand you're reserving region 0x08000000-0x08100000 as
> >> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
> works
> >> because Qemu places the vgic in that area as well (in hw/arm/virt.c).
> >> It's not a coincidence if the addresses are the same, because Eric
> >> chose them for the Linux SMMU drivers and I copied them.
> >>
> >> We can't rely on that behavior, though, it will break MSIs in
> >> emulated devices. And if Qemu happens to move the MSI doorbell in
> >> future machine revisions, then it would also break VFIO.
> >
> > Yes, make sense to me
> >
> >>
> >> Just for my own understanding -- what happens, I think, is that in
> >> Linux iova_reserve_iommu_regions initially reserves the
> >> guest-physical doorbell 0x08000000-0x08100000. Then much later, when
> >> the device driver requests an MSI, the irqchip driver calls
> >> iommu_dma_map_msi_msg with the guest- physical gicv2m address
> >> 0x08020000. The function finds the right page in msi_page_list, which
> >> was added by cookie_init_hw_msi_region, therefore bypassing the
> viommu and the GPA gets written in the MSI-X table.
> >
> > This means in case tomorrow when qemu changes virt machine address
> map and vgic-its (its-translator register address) address range does not fall
> in the msi_page_list then it will allocate a new iova, create mapping in
> iommu. So this will no longer be identity mapped and fail to work with new
> qemu?
> >
> Yes that's correct.
> >>
> >> If an emulated device such as virtio-net-pci were to generate an MSI,
> >> then Qemu would attempt to access the doorbell written by Linux into
> >> the MSI-X table, 0x08020000, and fault because that address wasn't
> >> mapped in the viommu.
> >>
> >> So for VFIO, you either need to translate the MSI-X entry using the
> >> viommu, or just assume that the vaddr corresponds to the only MSI
> >> doorbell accessible by this device (because how can we be certain
> >> that the guest already mapped the doorbell before writing the entry?)
> >>
> >> For ARM machines it's probably best to stick with
> IOMMU_RESV_SW_MSI.
> >> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
> >> iommu device to advertise identity-mapped/reserved regions, and
> >> bypass translation on these regions. Then the driver could reserve
> >> those with IOMMU_RESV_MSI.
> >
> > Correct me if I did not understood you correctly, today iommu-driver
> decides msi-reserved region, what if we change this and virtio-iommu device
> will provide the reserved msi region as per the emulated machine (virt/intel).
> So virtio-iommu driver will use the address advertised by virtio-iommu device
> as IOMMU_RESV_MSI. In this case msi-page-list will always have the
> reserved region for MSI.
> > On qemu side, for emulated devices we will let virtio-iommu return same
> address as translated address as it falls in MSI-reserved page already known
> to it.
> 
> I think what you're proposing here corresponds to the 1st approach that was
> followed for PCIe passthrough/MSI on ARM, ie. the userspace was providing
> the reserved region base address & size.
>  This was ruled out and now this
> region is arbitrarily set by the smmu-driver. At the moment this means this
> region cannot contain guest RAM.

In the rejected proposal, user-space used to choose a reserved region and provide it to host Linux, and host Linux used it for the MSI mapping. Just for my understanding: if tomorrow QEMU changes its address space, then it may not work without changing the SMMU msi-reserved-iova in the host driver, right? For example, if the emulated machine has RAM at this address, then it will not work?

In this proposal, QEMU reserves an iova range for the guest (not the host) and the guest kernel will use this as an untranslated msi-iova (IOMMU_RESV_MSI). This does not change the host interface, and the host will continue to use the host reserved mapping for actual interrupt generation, no?
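
A rough sketch of the device-side behaviour this proposal implies, as a toy model only. ToyVirtioIommu and its methods are hypothetical names, not the virtio-iommu interface, and page-granular mappings with 4 KiB pages are assumed.

```python
PAGE = 0x1000  # assumed 4 KiB pages


class ToyVirtioIommu:
    """Toy device that advertises one reserved MSI range and bypasses it."""

    def __init__(self, resv_start, resv_end):
        self.resv_start = resv_start
        self.resv_end = resv_end  # exclusive
        self.mappings = {}  # IOVA page -> GPA page, set up by the guest

    def reserved_region(self):
        # What the guest driver would register as IOMMU_RESV_MSI.
        return (self.resv_start, self.resv_end)

    def map(self, iova, gpa):
        self.mappings[iova & ~(PAGE - 1)] = gpa & ~(PAGE - 1)

    def translate(self, iova):
        # Addresses inside the advertised range are returned untranslated.
        if self.resv_start <= iova < self.resv_end:
            return iova
        page, offset = iova & ~(PAGE - 1), iova & (PAGE - 1)
        return self.mappings[page] + offset  # a KeyError models a fault
```

Because the range comes from the device and not from a constant in the driver, the machine model could move the doorbell without the guest driver changing, which is the point of the question above.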

Thanks
-Bharat

> 
> >
> >
> >> For x86 we will need such a system, with an added IRQ remapping
> >> feature.
> >
> > I do not understand x86 MSI interrupt generation, but If above understand
> is correct, then why we need IRQ remapping for x86?
> To me x86 IR corresponds simply corresponds to the ITS MSI controller
> modality on ARM. So as you still need vITS along with virtio-iommu on ARM,
> you need vIRQ alongs with virtio-iommu on Intel. Does that make sense?
> 
> So in any case we need to make sure the guest uses a vITS or vIR to make
> sure MSIs are correctly isolated.
> 
> 
> > Will the x86 machine emulated in QEMU provides a big address range for
> MSIs and when actually generating MSI it needed some extra processing
> (IRQ-remapping processing) before actually generating write transaction for
> MSI interrupt ?
> My understanding is on x86, the MSI window is fixed and matches
> [FEE0_0000h – FEF0_000h]. MSIs are conveyed on a separate address space
> than usual DMA accesses. And yes they end up in IR if supported in the HW.
> 
> Thanks
> 
> Eric
> >
> > Thanks
> > -Bharat
> >
> >>
> >> Thanks,
> >> Jean

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-07  6:25                         ` [Qemu-devel] " Bharat Bhushan
@ 2017-07-07  7:25                           ` Auger Eric
  2017-07-07 11:36                             ` Bharat Bhushan
  0 siblings, 1 reply; 73+ messages in thread
From: Auger Eric @ 2017-07-07  7:25 UTC (permalink / raw)
  To: Bharat Bhushan, Jean-Philippe Brucker, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall



On 07/07/2017 08:25, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: Friday, July 07, 2017 2:47 AM
>> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Jean-Philippe Brucker
>> <jean-philippe.brucker@arm.com>; eric.auger.pro@gmail.com;
>> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
>> qemu-arm@nongnu.org; qemu-devel@nongnu.org
>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
>> robin.murphy@arm.com; christoffer.dall@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 06/07/2017 13:24, Bharat Bhushan wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>>>> Sent: Thursday, July 06, 2017 3:33 PM
>>>> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
>>>> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
>>>> peter.maydell@linaro.org; alex.williamson@redhat.com;
>> mst@redhat.com;
>>>> qemu-arm@nongnu.org; qemu-devel@nongnu.org
>>>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
>>>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
>>>> robin.murphy@arm.com; christoffer.dall@linaro.org
>>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
>>>> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route
>>>> kvm_irqchip_add_msi_route() we needed to
>>>>>> provide the translated address.
>>>>>>> According to my understanding this is required because kernel does
>>>>>>> no go
>>>>>> through viommu translation when generating interrupt, no?
>>>>>>
>>>>>> yes this is needed when KVM MSI routes are set up, ie. along with
>>>>>> GICV3
>>>> ITS.
>>>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>>>
>>>>>> So I do not understand your previous sentence saying "MSI
>>>>>> interrupts works without any change".
>>>>>
>>>>> I have almost completed vfio integration with virtio-iommu and now
>>>>> testing the changes by assigning e1000 device to VM. For this I have
>>>>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-
>> msi
>>>>> and this does not need changed in vfio_get_addr()  and
>>>>> kvm_irqchip_add_msi_route()
>>>>
>>>> I understand you're reserving region 0x08000000-0x08100000 as
>>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
>> works
>>>> because Qemu places the vgic in that area as well (in hw/arm/virt.c).
>>>> It's not a coincidence if the addresses are the same, because Eric
>>>> chose them for the Linux SMMU drivers and I copied them.
>>>>
>>>> We can't rely on that behavior, though, it will break MSIs in
>>>> emulated devices. And if Qemu happens to move the MSI doorbell in
>>>> future machine revisions, then it would also break VFIO.
>>>
>>> Yes, make sense to me
>>>
>>>>
>>>> Just for my own understanding -- what happens, I think, is that in
>>>> Linux iova_reserve_iommu_regions initially reserves the
>>>> guest-physical doorbell 0x08000000-0x08100000. Then much later, when
>>>> the device driver requests an MSI, the irqchip driver calls
>>>> iommu_dma_map_msi_msg with the guest- physical gicv2m address
>>>> 0x08020000. The function finds the right page in msi_page_list, which
>>>> was added by cookie_init_hw_msi_region, therefore bypassing the
>> viommu and the GPA gets written in the MSI-X table.
>>>
>>> This means in case tomorrow when qemu changes virt machine address
>> map and vgic-its (its-translator register address) address range does not fall
>> in the msi_page_list then it will allocate a new iova, create mapping in
>> iommu. So this will no longer be identity mapped and fail to work with new
>> qemu?
>>>
>> Yes that's correct.
>>>>
>>>> If an emulated device such as virtio-net-pci were to generate an MSI,
>>>> then Qemu would attempt to access the doorbell written by Linux into
>>>> the MSI-X table, 0x08020000, and fault because that address wasn't
>>>> mapped in the viommu.
>>>>
>>>> So for VFIO, you either need to translate the MSI-X entry using the
>>>> viommu, or just assume that the vaddr corresponds to the only MSI
>>>> doorbell accessible by this device (because how can we be certain
>>>> that the guest already mapped the doorbell before writing the entry?)
>>>>
>>>> For ARM machines it's probably best to stick with
>> IOMMU_RESV_SW_MSI.
>>>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
>>>> iommu device to advertise identity-mapped/reserved regions, and
>>>> bypass translation on these regions. Then the driver could reserve
>>>> those with IOMMU_RESV_MSI.
>>>
>>> Correct me if I did not understood you correctly, today iommu-driver
>> decides msi-reserved region, what if we change this and virtio-iommu device
>> will provide the reserved msi region as per the emulated machine (virt/intel).
>> So virtio-iommu driver will use the address advertised by virtio-iommu device
>> as IOMMU_RESV_MSI. In this case msi-page-list will always have the
>> reserved region for MSI.
>>> On qemu side, for emulated devices we will let virtio-iommu return same
>> address as translated address as it falls in MSI-reserved page already known
>> to it.
>>
>> I think what you're proposing here corresponds to the 1st approach that was
>> followed for PCIe passthrough/MSI on ARM, ie. the userspace was providing
>> the reserved region base address & size.
>>  This was ruled out and now this
>> region is arbitrarily set by the smmu-driver. At the moment this means this
>> region cannot contain guest RAM.
> 
> In rejected proposal, user-space used to choose a reserve region and provide that to Host Linux. Host Linux uses that MSI mapping. Just for my understanding if tomorrow QEMU changes its address space then it may not work without changing SMMU msi-reserved-iova in host driver, right? For example if emulated machine have RAM at this address then it will not work?
Yes, that's correct. Note that MSI reserved regions are now exposed to
userspace through /sys/kernel/iommu_groups/<>/reserved_regions
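
As an illustration of how userspace could consume that file, here is a
minimal sketch that parses one line of it. It assumes each line has the
form "start end type" with hexadecimal addresses, an inclusive end address
and the type string "msi" for MSI regions. The struct and function names
are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of a reserved_regions file (hypothetical helper type). */
struct resv_line {
	unsigned long long start;
	unsigned long long end;		/* inclusive */
	char type[16];
};

/* Parse one "start end type" line; returns 0 on success, -1 otherwise. */
static int parse_resv_line(const char *line, struct resv_line *out)
{
	assert(line != NULL && out != NULL);

	if (sscanf(line, "%llx %llx %15s", &out->start, &out->end,
		   out->type) != 3)
		return -1;

	return out->end >= out->start ? 0 : -1;
}

/* True if the region is reported as an MSI region (assumed type string). */
static int resv_is_msi(const struct resv_line *r)
{
	return strcmp(r->type, "msi") == 0;
}
```

A caller would read the file line by line with fgets() and skip any line
for which parse_resv_line() fails.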
> 
> In this proposal, QEMU reserves a iova-range for guest (not host) and guest kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While this does not change host interface and it will continue to use host reserved mapping for actual interrupt generation, no?
But then userspace needs to provide the IOMMU_RESV_MSI range to the guest
kernel, right? How would that be done? It looks weird to me to have
different MSI handling on host and guest. Also, I still don't get how you
handle the case where virtio-net-pci emits accesses to the MSI doorbell
while the latter is not mapped.

Thanks

Eric
> 
> Thanks
> -Bharat
> 
>>
>>>
>>>
>>>> For x86 we will need such a system, with an added IRQ remapping
>>>> feature.
>>>
>>> I do not understand x86 MSI interrupt generation, but If above understand
>> is correct, then why we need IRQ remapping for x86?
>> To me x86 IR corresponds simply corresponds to the ITS MSI controller
>> modality on ARM. So as you still need vITS along with virtio-iommu on ARM,
>> you need vIRQ alongs with virtio-iommu on Intel. Does that make sense?
>>
>> So in any case we need to make sure the guest uses a vITS or vIR to make
>> sure MSIs are correctly isolated.
>>
>>
>>> Will the x86 machine emulated in QEMU provides a big address range for
>> MSIs and when actually generating MSI it needed some extra processing
>> (IRQ-remapping processing) before actually generating write transaction for
>> MSI interrupt ?
>> My understanding is on x86, the MSI window is fixed and matches
>> [FEE0_0000h – FEF0_000h]. MSIs are conveyed on a separate address space
>> than usual DMA accesses. And yes they end up in IR if supported in the HW.
>>
>> Thanks
>>
>> Eric
>>>
>>> Thanks
>>> -Bharat
>>>
>>>>
>>>> Thanks,
>>>> Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 21:11                     ` Auger Eric
@ 2017-07-07  7:31                       ` Auger Eric
  2017-07-07 15:20                       ` Jean-Philippe Brucker
  1 sibling, 0 replies; 73+ messages in thread
From: Auger Eric @ 2017-07-07  7:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Bharat Bhushan, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi,

On 06/07/2017 23:11, Auger Eric wrote:
> Hello Bharat, Jean-Philippe,
> On 06/07/2017 12:02, Jean-Philippe Brucker wrote:
>> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route
>> kvm_irqchip_add_msi_route() we needed to
>>>> provide the translated address.
>>>>> According to my understanding this is required because kernel does no go
>>>> through viommu translation when generating interrupt, no?
>>>>
>>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS.
>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>
>>>> So I do not understand your previous sentence saying "MSI interrupts works
>>>> without any change".
>>>
>>> I have almost completed vfio integration with virtio-iommu and now testing the changes by assigning e1000 device to VM. For this I have changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi and this does not need changed in vfio_get_addr()  and kvm_irqchip_add_msi_route()
>>
>> I understand you're reserving region 0x08000000-0x08100000 as
>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
>> because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's
>> not a coincidence if the addresses are the same, because Eric chose them
>> for the Linux SMMU drivers and I copied them.
> 
> Yes I chose this region because it does not overlap with any guest RAM
> region
> 
>>
>> We can't rely on that behavior, though, it will break MSIs in emulated
>> devices. And if Qemu happens to move the MSI doorbell in future machine
>> revisions, then it would also break VFIO.
>>
>> Just for my own understanding -- what happens, I think, is that in Linux
>> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
>> 0x08000000-0x08100000. Then much later, when the device driver requests an
>> MSI, the irqchip driver calls iommu_dma_map_msi_msg with the
>> guest-physical gicv2m address 0x08020000. The function finds the right
>> page in msi_page_list, which was added by cookie_init_hw_msi_region,
>> therefore bypassing the viommu and the GPA gets written in the MSI-X table.
> 
> I share Jean's understanding. To me using IOMMU_RESV_MSI in the
> virtio-iommu means this region is not translated by the IOMMU. as
> cookie_init_hw_msi_region() pre-allocates the msi_page array,
> iommu_dma_get_msi_page() does not do any IOMMU mapping.
> 
>>
>> If an emulated device such as virtio-net-pci were to generate an MSI, then
>> Qemu would attempt to access the doorbell written by Linux into the MSI-X
>> table, 0x08020000, and fault because that address wasn't mapped in the viommu.
> Yes so I am confused, how can it work with a virtio-net-pci or
> passthrough'ed e1000e device using MSIs?
>>
>> So for VFIO, you either need to translate the MSI-X entry using the
>> viommu,
> 
> For the vsmmuv3 I created a dedicated IOMMUNotifier to handle the fact
> the MSI doorbell is translated and MSI routes need to be updated. This
> seems to work.
> 
>  or just assume that the vaddr corresponds to the only MSI doorbell
>> accessible by this device (because how can we be certain that the guest
>> already mapped the doorbell before writing the entry?)
>>
>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu
>> device to advertise identity-mapped/reserved regions, and bypass
>> translation on these regions. Then the driver could reserve those with
>> IOMMU_RESV_MSI.
> 
> At least we may need to configure the virtio-iommu to either bypass MSIs
> (x86) or translate MSIs (ARM)?

Actually, on x86 no MSI controller will attempt to map MSIs, as opposed
to ARM GICv2M & ITS. So the only problem with exposing IOMMU_RESV_SW_MSI
regions is that vfio_iommu_type1 will assess IRQ assignment safety using
irq_domain_check_msi_remap() rather than the IOMMU_CAP_INTR_REMAP
capability.

Thanks

Eric
>  For x86 we will need such a system, with an added IRQ
>> remapping feature.
> Meaning this must live along with vIR, is that what you mean? Also on
> ARM this must live with vITS anyway. This is an orthogonal feature, right?
> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Jean
>>


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-07  7:25                           ` Auger Eric
@ 2017-07-07 11:36                             ` Bharat Bhushan
  2017-07-07 15:19                               ` Jean-Philippe Brucker
  0 siblings, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-07 11:36 UTC (permalink / raw)
  To: Auger Eric, Jean-Philippe Brucker, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Eric,

> >>>> -----Original Message-----
> >>>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> >>>> Sent: Thursday, July 06, 2017 3:33 PM
> >>>> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> >>>> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> >>>> peter.maydell@linaro.org; alex.williamson@redhat.com;
> >> mst@redhat.com;
> >>>> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> >>>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> >>>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> >>>> robin.murphy@arm.com; christoffer.dall@linaro.org
> >>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>
> >>>> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup
> >>>> msi-route
> >>>> kvm_irqchip_add_msi_route() we needed to
> >>>>>> provide the translated address.
> >>>>>>> According to my understanding this is required because kernel
> >>>>>>> does no go
> >>>>>> through viommu translation when generating interrupt, no?
> >>>>>>
> >>>>>> yes this is needed when KVM MSI routes are set up, ie. along with
> >>>>>> GICV3
> >>>> ITS.
> >>>>>> With GICv2M, qemu direct gsi mapping is used and this is not
> needed.
> >>>>>>
> >>>>>> So I do not understand your previous sentence saying "MSI
> >>>>>> interrupts works without any change".
> >>>>>
> >>>>> I have almost completed vfio integration with virtio-iommu and now
> >>>>> testing the changes by assigning e1000 device to VM. For this I
> >>>>> have changed virtio-iommu driver to use IOMMU_RESV_MSI rather
> than
> >>>>> sw-
> >> msi
> >>>>> and this does not need changed in vfio_get_addr()  and
> >>>>> kvm_irqchip_add_msi_route()
> >>>>
> >>>> I understand you're reserving region 0x08000000-0x08100000 as
> >>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this
> only
> >> works
> >>>> because Qemu places the vgic in that area as well (in hw/arm/virt.c).
> >>>> It's not a coincidence if the addresses are the same, because Eric
> >>>> chose them for the Linux SMMU drivers and I copied them.
> >>>>
> >>>> We can't rely on that behavior, though, it will break MSIs in
> >>>> emulated devices. And if Qemu happens to move the MSI doorbell in
> >>>> future machine revisions, then it would also break VFIO.
> >>>
> >>> Yes, make sense to me
> >>>
> >>>>
> >>>> Just for my own understanding -- what happens, I think, is that in
> >>>> Linux iova_reserve_iommu_regions initially reserves the
> >>>> guest-physical doorbell 0x08000000-0x08100000. Then much later,
> >>>> when the device driver requests an MSI, the irqchip driver calls
> >>>> iommu_dma_map_msi_msg with the guest- physical gicv2m address
> >>>> 0x08020000. The function finds the right page in msi_page_list,
> >>>> which was added by cookie_init_hw_msi_region, therefore bypassing
> >>>> the
> >> viommu and the GPA gets written in the MSI-X table.
> >>>
> >>> This means in case tomorrow when qemu changes virt machine address
> >> map and vgic-its (its-translator register address) address range does
> >> not fall in the msi_page_list then it will allocate a new iova,
> >> create mapping in iommu. So this will no longer be identity mapped
> >> and fail to work with new qemu?
> >>>
> >> Yes that's correct.
> >>>>
> >>>> If an emulated device such as virtio-net-pci were to generate an
> >>>> MSI, then Qemu would attempt to access the doorbell written by
> >>>> Linux into the MSI-X table, 0x08020000, and fault because that
> >>>> address wasn't mapped in the viommu.
> >>>>
> >>>> So for VFIO, you either need to translate the MSI-X entry using the
> >>>> viommu, or just assume that the vaddr corresponds to the only MSI
> >>>> doorbell accessible by this device (because how can we be certain
> >>>> that the guest already mapped the doorbell before writing the
> >>>> entry?)
> >>>>
> >>>> For ARM machines it's probably best to stick with
> >> IOMMU_RESV_SW_MSI.
> >>>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
> >>>> iommu device to advertise identity-mapped/reserved regions, and
> >>>> bypass translation on these regions. Then the driver could reserve
> >>>> those with IOMMU_RESV_MSI.
> >>>
> >>> Correct me if I did not understood you correctly, today iommu-driver
> >> decides msi-reserved region, what if we change this and virtio-iommu
> >> device will provide the reserved msi region as per the emulated machine
> (virt/intel).
> >> So virtio-iommu driver will use the address advertised by
> >> virtio-iommu device as IOMMU_RESV_MSI. In this case msi-page-list
> >> will always have the reserved region for MSI.
> >>> On qemu side, for emulated devices we will let virtio-iommu return
> >>> same
> >> address as translated address as it falls in MSI-reserved page
> >> already known to it.
> >>
> >> I think what you're proposing here corresponds to the 1st approach
> >> that was followed for PCIe passthrough/MSI on ARM, ie. the userspace
> >> was providing the reserved region base address & size.
> >>  This was ruled out and now this
> >> region is arbitrarily set by the smmu-driver. At the moment this
> >> means this region cannot contain guest RAM.
> >
> > In rejected proposal, user-space used to choose a reserve region and
> provide that to Host Linux. Host Linux uses that MSI mapping. Just for my
> understanding if tomorrow QEMU changes its address space then it may not
> work without changing SMMU msi-reserved-iova in host driver, right? For
> example if emulated machine have RAM at this address then it will not work?
> Yes that's correct. Note MSI reserved regions are now exposed to userspace
> through /sys/kernel/iommu_groups/<>/reserved_regions
> >
> > In this proposal, QEMU reserves a iova-range for guest (not host) and guest
> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While this
> does not change host interface and it will continue to use host reserved
> mapping for actual interrupt generation, no?
> But then userspace needs to provide IOMMU_RESV_MSI range to guest
> kernel, right? What would be the proposed manner?

Just an opinion: we can define a feature (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command (VIRTIO_IOMMU_T_MSI_RANGE). The guest iommu driver will issue this command during initialization and store the value. This value will simply replace the MSI_IOVA_BASE and MSI_IOVA_LENGTH #defines. The rest of the virtio-iommu driver will remain the same.
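
A minimal sketch of that driver-side choice, assuming the range has already
been fetched from the device. The feature bit position, the struct and the
function name are hypothetical and only serve this example; the default
values correspond to the 0x08000000-0x08100000 region discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Defaults used today when the driver picks the region itself. */
#define MSI_IOVA_BASE		0x8000000
#define MSI_IOVA_LENGTH		0x100000

/* Hypothetical bit position for the proposed feature (not from any spec). */
#define SKETCH_F_RES_MSI_RANGE	(1ULL << 5)

struct msi_range {
	uint64_t base;
	uint64_t length;
};

/*
 * Return the MSI range the driver should reserve: the one reported by the
 * device when the feature was negotiated, the built-in default otherwise.
 */
static struct msi_range sketch_pick_msi_range(uint64_t features,
					       const struct msi_range *dev)
{
	struct msi_range r = { MSI_IOVA_BASE, MSI_IOVA_LENGTH };

	if ((features & SKETCH_F_RES_MSI_RANGE) && dev != NULL &&
	    dev->length != 0)
		r = *dev;

	return r;
}
```

With this shape, the rest of the driver only ever sees a struct msi_range
and does not care where the values came from.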

> Looks weird to me to
> have different MSI handling on host and guest. Also I still don't get how you
> handle the case where virtio-net-pci emits accesses to the MSI doorbell
> while this latter is not mapped.

I think I will look at the virtio-net-pci code in detail. I was wondering whether we can handle this region differently, or do a 1:1 translation in virtio_iommu_translate() for this region. But my knowledge of this piece of code is limited. Is this possible, or ugly?

Thanks
-Bharat

> 
> Thanks
> 
> Eric
> >
> > Thanks
> > -Bharat
> >
> >>
> >>>
> >>>
> >>>> For x86 we will need such a system, with an added IRQ remapping
> >>>> feature.
> >>>
> >>> I do not understand x86 MSI interrupt generation, but If above
> >>> understand
> >> is correct, then why we need IRQ remapping for x86?
> >> To me x86 IR corresponds simply corresponds to the ITS MSI controller
> >> modality on ARM. So as you still need vITS along with virtio-iommu on
> >> ARM, you need vIRQ alongs with virtio-iommu on Intel. Does that make
> sense?
> >>
> >> So in any case we need to make sure the guest uses a vITS or vIR to
> >> make sure MSIs are correctly isolated.
> >>
> >>
> >>> Will the x86 machine emulated in QEMU provides a big address range
> >>> for
> >> MSIs and when actually generating MSI it needed some extra processing
> >> (IRQ-remapping processing) before actually generating write
> >> transaction for MSI interrupt ?
> >> My understanding is on x86, the MSI window is fixed and matches
> >> [FEE0_0000h – FEF0_000h]. MSIs are conveyed on a separate address
> >> space than usual DMA accesses. And yes they end up in IR if supported in
> the HW.
> >>
> >> Thanks
> >>
> >> Eric
> >>>
> >>> Thanks
> >>> -Bharat
> >>>
> >>>>
> >>>> Thanks,
> >>>> Jean


* Re: [Qemu-devel] [Qemu-arm]  [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 23:33                           ` Tian, Kevin
@ 2017-07-07 15:14                             ` Jean-Philippe Brucker
  2017-07-07 22:11                               ` Kalra, Ashish
  2017-07-14  6:58                               ` Tian, Kevin
  0 siblings, 2 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-07 15:14 UTC (permalink / raw)
  To: Tian, Kevin, Kalra, Ashish, Auger Eric, Bharat Bhushan,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: marc.zyngier, tn, will.deacon, drjones, robin.murphy, christoffer.dall

Hi Ashish,

On 07/07/17 00:33, Tian, Kevin wrote:
>> From: Kalra, Ashish [mailto:Ashish.Kalra@cavium.com]
>> Sent: Friday, July 7, 2017 7:24 AM
>>
>> I have a generic question on vIOMMU support, is there any proposal/plan to
>> add ATS/PRI extension support to vIOMMUs and allow
>> handling for end to end (v)IOMMU Page faults (w/t the device side
>> implementation on Vhost) ?
>>
>> Again, the motivation will be to do DMA on paged guest memory and
>> potentially avoiding the requirement of pinned/locked
>> guest physical memory for DMA.
> 
> yes, that's a necessary part to support SVM in both virtio-iommu 
> approach and fully emulated approach (e.g. for Intel VTd). There
> are already patches and discussions in other thread about how to
> propagate IOMMU page fault to vIOMMU. Then after it is done
> vIOMMU page fault emulation will be further added.
> 
> https://lkml.org/lkml/2017/6/27/964

For virtio-iommu, I'd like to add an event virtqueue for the device to
send page faults to the driver, in a format similar to a PRI Page Request.
The driver would then send a reply via the request virtqueue in a format
similar to a PRG Response.

In Qemu the device implementation would hopefully be based on the same
mechanism as VTd. The vhost implementation would receive IO Page Faults
from VFIO and forward them on the event virtqueue similarly to the
userspace implementation.

Thanks,
Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-07  6:21                 ` Tian, Kevin
@ 2017-07-07 15:15                   ` Jean-Philippe Brucker
  2017-07-14  7:20                     ` Tian, Kevin
  0 siblings, 1 reply; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-07 15:15 UTC (permalink / raw)
  To: Tian, Kevin, Bharat Bhushan, Auger Eric, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

On 07/07/17 07:21, Tian, Kevin wrote:
> sorry I didn't quite get this part, and here is my understanding:
> 
> Guest programs vIOMMU to map a gIOVA (used by MSI to a GPA 
> of doorbell register of virtual irqchip. vIOMMU then 
> triggers VFIO map/unmap to update physical IOMMU page
> table for gIOVA -> HPA of real doorbell of physical irqchip

At the moment (non-SVM), physical and virtual MSI doorbells are completely
dissociated. VFIO itself maps the doorbell GPA->HPA during container
initialization. The GPA, chosen arbitrarily by the host, is then removed
from the guest GPA space.

When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip
doorbell, I suppose Qemu will notice that the GPA doesn't correspond to
RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.

(For SVM I don't want to go into the details just now, but we will
probably need a separate VFIO mechanism to update the physical MSI-X
tables with whatever gIOVA the guest mapped in its private stage-1 page
tables.)

> (assume your irqchip will provide multiple doorbells so each
> device can have its own channel).

In existing irqchips the doorbell is shared by endpoints, which are
differentiated by their device ID (generally the BDF). I'm not sure why
this matters here?

> then once this update is
> done, later MSI interrupts from assigned device will go 
> through physical IOMMU (gIOVA->HPA) then reach irqchip 
> for irq remapping. vIOMMU is involved only in configuration
> path instead of actual interrupt path.

Yes, the vIOMMU is used to correlate the IOVA written by the guest in its
virtual MSI-X table with the MAP request received by the vIOMMU. That is
probably used to set up IRQFD routes with KVM. But the vIOMMU is not
involved further than that in MSIs.

> If my understanding is correct, above will be the natural flow then
> why is additional virtio-iommu change required? :-)

The change is not *required* for ARM systems; I only proposed removing the
doorbell address translation stage to make the host implementation simpler
(and since virtio-iommu on x86 won't translate the doorbell anyway, we
have to add support for this to virtio-iommu). But for Qemu, since vSMMU
needs to implement the natural flow anyway, it might not be a lot of
effort to also do it for virtio-iommu. Other implementations (e.g.
kvmtool) might piggy-back on the x86 way and declare the irqchip doorbell
as untranslated.

My proposal also breaks when confronted with virtual SVM in a physical ARM
system, where the guest owns stage-1 page tables and *has* to map the
doorbell if it wants MSIs to work, so you can disregard it :)

Thanks,
Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-07 11:36                             ` Bharat Bhushan
@ 2017-07-07 15:19                               ` Jean-Philippe Brucker
  2017-07-11  5:54                                 ` Bharat Bhushan
  0 siblings, 1 reply; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-07 15:19 UTC (permalink / raw)
  To: Bharat Bhushan, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On 07/07/17 12:36, Bharat Bhushan wrote:
>>> In this proposal, QEMU reserves a iova-range for guest (not host) and guest
>> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While this
>> does not change host interface and it will continue to use host reserved
>> mapping for actual interrupt generation, no?
>> But then userspace needs to provide IOMMU_RESV_MSI range to guest
>> kernel, right? What would be the proposed manner?
> 
> Just an opinion, we can define feature (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command (VIRTIO_IOMMU_T_MSI_RANGE). Guest iommu-driver will make this call during initialization and store the value. This value will just replace MSI_IOVA_BASE and MSI_IOVA_LENGTH hash define. Rest will remain same in virtio-iommu driver.

Yes, I had something similar in mind, although more generic since we'll
need to get other bits of information from the device in future extensions
(fault handling, page table formats and dynamic reserves of memory for
SVM), and maybe also for finding out per-address-space page granularity
(see my reply to patch 3/8). These are per-endpoint properties that cannot
be advertised in the virtio config space.

                                 ***

So I propose to add a per-endpoint probing mechanism on the request queue:

* The device advertises a new command VIRTIO_IOMMU_T_PROBE with feature
bit VIRTIO_IOMMU_F_PROBE.
* When this feature is advertised, the device sets the probe_size field in
the config space.
* When the device offers VIRTIO_IOMMU_F_PROBE, the driver should send a
VIRTIO_IOMMU_T_PROBE request for each new endpoint.
* The driver allocates a device-writeable buffer of probe_size (plus
framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
* The device fills the buffer with various information.

struct virtio_iommu_req_probe {
	/* device-readable */
	struct virtio_iommu_req_head head;
	le32 device;
	le32 flags;

	/* maybe also le32 content_size, but it must be equal to
	   probe_size */

	/* device-writeable */
	u8 content[];
	struct virtio_iommu_req_tail tail;
};

I'm still struggling with the content and layout of the probe request, and
would appreciate any feedback. To be easily extended, I think it should
contain a list of fields of variable size:

	|0           15|16           31|32               N|
	|     type     |    length     |      values      |

'length' might be made optional if it can be deduced from type, but might
make driver-side parsing more robust.
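
To make the driver-side parsing concrete, here is a minimal sketch of a
lookup over such a list of fields. It assumes a 16-bit little-endian type
followed by a 16-bit little-endian length that counts only the bytes of
'values', and that 'size' is the number of bytes the device actually
filled. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field header: 16-bit type + 16-bit length, both little-endian. */
#define PROBE_FIELD_HDR_SIZE	4

static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Return a pointer to the values of the first field of the given type and
 * store their size in *length. Return NULL if the field is absent or if a
 * field claims more bytes than remain in the buffer.
 */
static const uint8_t *probe_find_field(const uint8_t *buf, size_t size,
				       uint16_t type, uint16_t *length)
{
	size_t off = 0;

	assert(buf != NULL && length != NULL);

	while (size - off >= PROBE_FIELD_HDR_SIZE) {
		uint16_t t = read_le16(buf + off);
		uint16_t len = read_le16(buf + off + 2);

		off += PROBE_FIELD_HDR_SIZE;
		if (len > size - off)
			return NULL;	/* malformed: field overruns buffer */
		if (t == type) {
			*length = len;
			return buf + off;
		}
		off += len;	/* unknown or uninteresting field: skip it */
	}

	return NULL;
}
```

Carrying an explicit length is what lets this loop skip field types it
does not know about, which is the robustness argument made above.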

The probe could either be done for each endpoint, or for each address
space. I much prefer endpoint because it is the smallest granularity. The
driver can then decide what endpoints to put together in the same address
space based on their individual capabilities. The specification would
describe how each endpoint property is combined when endpoints are put in
the same address space. For example, take the minimum of all PASID sizes,
the maximum of all page granularities, combine doorbell addresses, etc.
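
As a rough sketch of the combination rule for the first two properties
(minimum PASID size, maximum page granularity), with a hypothetical
structure holding what the driver learned from each endpoint's probe:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-endpoint properties collected from probe replies. */
struct sketch_ep_props {
	uint8_t pasid_bits;	/* supported PASID width, in bits */
	uint64_t page_size;	/* smallest supported page size, in bytes */
};

/*
 * Merge the properties of endpoint 'ep' into those of address space 'as':
 * the address space can only use what every attached endpoint supports.
 */
static void sketch_as_combine(struct sketch_ep_props *as,
			      const struct sketch_ep_props *ep)
{
	assert(as != NULL && ep != NULL);

	if (ep->pasid_bits < as->pasid_bits)
		as->pasid_bits = ep->pasid_bits;
	if (ep->page_size > as->page_size)
		as->page_size = ep->page_size;
}
```

The driver would seed the address space with the first endpoint's
properties and call this for every endpoint attached afterwards.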

If we did the probe on address spaces instead, the driver would have to
re-send a probe request each time a new endpoint is attached to an
existing address space, to see if it is still capable of page table
handover or if the driver just combined a VFIO and an emulated endpoint by
accident.

                                 ***

Using this framework, the device can declare doorbell regions by adding
one or more RESV fields into the probe buffer:

/* 'type' */
#define VIRTIO_IOMMU_PROBE_T_RESV 	0x1

/* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
struct virtio_iommu_probe_resv {
	le64 gpa;
	le64 size;

#define VIRTIO_IOMMU_PROBE_RESV_MSI	0x1
	u8 type;
};

Such a region would be subject to the following rules:

* Driver should not use any IOVA declared as RESV_MSI in a mapping.
* Device should let any transaction matching a RESV_MSI region pass
through untranslated.
* If the device does not advertise any RESV region, then the driver should
assume that MSI doorbells, like any other GPA, must be mapped with an
arbitrary IOVA in order for the endpoint to access them.
* Given that the driver *should* perform a probe request if available, and
it *should* understand the VIRTIO_IOMMU_PROBE_T_RESV field, then this
field tells the guest how it should handle MSI doorbells, and whether it
should map the address via MAP requests or not.
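
On the device side, the second rule could look like the following sketch.
It is not taken from the Qemu series: the address space structure, the
linear lookups and the function names are assumptions made to keep the
example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a RESV_MSI region advertised to the driver. */
struct sketch_resv {
	uint64_t gpa;
	uint64_t size;
};

/* Hypothetical IOVA->GPA mapping created by a MAP request. */
struct sketch_mapping {
	uint64_t iova;
	uint64_t gpa;
	uint64_t size;
};

struct sketch_as {
	const struct sketch_resv *resv;
	size_t nr_resv;
	const struct sketch_mapping *maps;
	size_t nr_maps;
};

static bool in_range(uint64_t addr, uint64_t base, uint64_t size)
{
	return addr >= base && addr - base < size;
}

/* Translate 'iova'; returns false when the access should fault. */
static bool sketch_translate(const struct sketch_as *as, uint64_t iova,
			     uint64_t *gpa)
{
	size_t i;

	assert(as != NULL && gpa != NULL);

	/* RESV_MSI regions pass through untranslated. */
	for (i = 0; i < as->nr_resv; i++) {
		if (in_range(iova, as->resv[i].gpa, as->resv[i].size)) {
			*gpa = iova;
			return true;
		}
	}

	/* Everything else needs an explicit mapping. */
	for (i = 0; i < as->nr_maps; i++) {
		if (in_range(iova, as->maps[i].iova, as->maps[i].size)) {
			*gpa = as->maps[i].gpa + (iova - as->maps[i].iova);
			return true;
		}
	}

	return false;
}
```

With a RESV_MSI region covering 0x08000000-0x08100000, an emulated
endpoint writing to the doorbell at 0x08020000 would no longer fault even
though the guest never mapped that address.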

Does this make sense and did I overlook something?

Thanks,
Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-06 21:11                     ` Auger Eric
  2017-07-07  7:31                       ` Auger Eric
@ 2017-07-07 15:20                       ` Jean-Philippe Brucker
  1 sibling, 0 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-07 15:20 UTC (permalink / raw)
  To: Auger Eric, Bharat Bhushan, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On 06/07/17 22:11, Auger Eric wrote:
> Hello Bharat, Jean-Philippe,
> On 06/07/2017 12:02, Jean-Philippe Brucker wrote:
>> On 05/07/17 09:49, Bharat Bhushan wrote:
>>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
>>>> provide the translated address.
>>>>> According to my understanding this is required because kernel does not go
>>>> through viommu translation when generating interrupt, no?
>>>>
>>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS.
>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>
>>>> So I do not understand your previous sentence saying "MSI interrupts works
>>>> without any change".
>>>
>>> I have almost completed vfio integration with virtio-iommu and am now testing the changes by assigning an e1000 device to a VM. For this I have changed the virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi, and this does not need changes in vfio_get_addr() and kvm_irqchip_add_msi_route()
>>
>> I understand you're reserving region 0x08000000-0x08100000 as
>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
>> because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's
>> not a coincidence if the addresses are the same, because Eric chose them
>> for the Linux SMMU drivers and I copied them.
> 
> Yes I chose this region because it does not overlap with any guest RAM
> region
> 
>>
>> We can't rely on that behavior, though, it will break MSIs in emulated
>> devices. And if Qemu happens to move the MSI doorbell in future machine
>> revisions, then it would also break VFIO.
>>
>> Just for my own understanding -- what happens, I think, is that in Linux
>> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
>> 0x08000000-0x08100000. Then much later, when the device driver requests an
>> MSI, the irqchip driver calls iommu_dma_map_msi_msg with the
>> guest-physical gicv2m address 0x08020000. The function finds the right
>> page in msi_page_list, which was added by cookie_init_hw_msi_region,
>> therefore bypassing the viommu and the GPA gets written in the MSI-X table.
> 
> I share Jean's understanding. To me, using IOMMU_RESV_MSI in the
> virtio-iommu means this region is not translated by the IOMMU. As
> cookie_init_hw_msi_region() pre-allocates the msi_page array,
> iommu_dma_get_msi_page() does not do any IOMMU mapping.
> 
>>
>> If an emulated device such as virtio-net-pci were to generate an MSI, then
>> Qemu would attempt to access the doorbell written by Linux into the MSI-X
>> table, 0x08020000, and fault because that address wasn't mapped in the viommu.
> Yes so I am confused, how can it work with a virtio-net-pci or
> passthrough'ed e1000e device using MSIs?
>>
>> So for VFIO, you either need to translate the MSI-X entry using the
>> viommu,
> 
> For the vsmmuv3 I created a dedicated IOMMUNotifier to handle the fact
> the MSI doorbell is translated and MSI routes need to be updated. This
> seems to work.
> 
>> or just assume that the vaddr corresponds to the only MSI doorbell
>> accessible by this device (because how can we be certain that the guest
>> already mapped the doorbell before writing the entry?)
>>
>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu
>> device to advertise identity-mapped/reserved regions, and bypass
>> translation on these regions. Then the driver could reserve those with
>> IOMMU_RESV_MSI.
> 
> At least we may need to configure the virtio-iommu to either bypass MSIs
> (x86) or translate MSIs (ARM)?

Yes, see the VIRTIO_IOMMU_T_PROBE proposal in, er, my other reply.

>> For x86 we will need such a system, with an added IRQ
>> remapping feature.
> Meaning this must live along with vIR, is that what you mean? Also on
> ARM this must live with vITS anyway. This is an orthogonal feature, right?

Reserving doorbell regions on x86 is a must, otherwise MSIs won't work.
IRQ remapping would be nice to add in some distant future.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [Qemu-arm]  [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-07 15:14                             ` Jean-Philippe Brucker
@ 2017-07-07 22:11                               ` Kalra, Ashish
  2017-07-11 11:31                                 ` Jean-Philippe Brucker
  2017-07-14  6:58                               ` Tian, Kevin
  1 sibling, 1 reply; 73+ messages in thread
From: Kalra, Ashish @ 2017-07-07 22:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Tian, Kevin, Auger Eric, Bharat Bhushan,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: marc.zyngier, tn, will.deacon, drjones, robin.murphy, christoffer.dall

Hello Jean,

Thanks for the information.

Is someone already working on implementing this, or is this something I can look into and implement?

Also, as Michael mentioned, and as I saw in the vfio iommu (type1) driver implementation, it mainly supports
static mappings of user-space processes, with user-space pages pinned/locked in memory, so generic support
for ATS/PRI extensions needs to be added in VFIO too.

Thanks,
Ashish

-----Original Message-----
From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com] 
Sent: Friday, July 07, 2017 8:45 PM
To: Tian, Kevin; Kalra, Ashish; Auger Eric; Bharat Bhushan; eric.auger.pro@gmail.com; peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org; qemu-devel@nongnu.org
Cc: marc.zyngier@arm.com; tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com; robin.murphy@arm.com; christoffer.dall@linaro.org
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

Hi Ashish,

On 07/07/17 00:33, Tian, Kevin wrote:
>> From: Kalra, Ashish [mailto:Ashish.Kalra@cavium.com]
>> Sent: Friday, July 7, 2017 7:24 AM
>>
>> I have a generic question on vIOMMU support, is there any 
>> proposal/plan to add ATS/PRI extension support to vIOMMUs and allow 
>> handling for end to end (v)IOMMU Page faults (w/t the device side 
>> implementation on Vhost) ?
>>
>> Again, the motivation will be to do DMA on paged guest memory and 
>> potentially avoiding the requirement of pinned/locked guest physical 
>> memory for DMA.
> 
> yes, that's a necessary part to support SVM in both virtio-iommu 
> approach and fully emulated approach (e.g. for Intel VTd). There are 
> already patches and discussions in other thread about how to propagate 
> IOMMU page fault to vIOMMU. Then after it is done vIOMMU page fault 
> emulation will be further added.
> 
> https://lkml.org/lkml/2017/6/27/964

For virtio-iommu, I'd like to add an event virtqueue for the device to send page faults to the driver, in a format similar to a PRI Page Request.
The driver would then send a reply via the request virtqueue in a format similar to a PRG Response.

In Qemu the device implementation would hopefully be based on the same mechanism as VTd. The vhost implementation would receive IO Page Faults from VFIO and forward them on the event virtqueue similarly to the userspace implementation.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-07 15:19                               ` Jean-Philippe Brucker
@ 2017-07-11  5:54                                 ` Bharat Bhushan
  2017-07-11 12:51                                   ` Jean-Philippe Brucker
  0 siblings, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-11  5:54 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Friday, July 07, 2017 8:50 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 07/07/17 12:36, Bharat Bhushan wrote:
> >>> In this proposal, QEMU reserves a iova-range for guest (not host) and
> guest
> >> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While
> this
> >> does not change host interface and it will continue to use host reserved
> >> mapping for actual interrupt generation, no?
> >> But then userspace needs to provide IOMMU_RESV_MSI range to guest
> >> kernel, right? What would be the proposed manner?
> >
> > Just an opinion, we can define feature
> (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command
> (VIRTIO_IOMMU_T_MSI_RANGE). Guest iommu-driver will make this call
> during initialization and store the value. This value will just replace
> MSI_IOVA_BASE and MSI_IOVA_LENGTH hash define. Rest will remain same
> in virtio-iommu driver.
> 
> Yes I had something similar in mind, although more generic since we'll
> need to get other bits of information from the device in future extensions
> (fault handling, page table formats and dynamic reserves of memory for
> SVM), and maybe also for finding out per-address-space page granularity
> (see my reply of patch 3/8). These are per-endpoint properties that cannot
> be advertised in the virtio config space.
> 
>                                  ***
> 
> So I propose to add a per-endpoint probing mechanism on the request
> queue:

What is per-endpoint? Is it "per-pci/platform-device"?

> 
> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with
> feature
> bit VIRTIO_IOMMU_F_PROBE.
> * When this feature is advertised, the device sets the probe_size field in
> the config space.

I probably did not get how the virtio-iommu device emulation decides the value of "probe_size"; can you share more info?

> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send a
> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> * The driver allocates a device-writeable buffer of probe_size (plus
> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> * The device fills the buffer with various information.
> 
> struct virtio_iommu_req_probe {
> 	/* device-readable */
> 	struct virtio_iommu_req_head head;
> 	le32 device;
> 	le32 flags;
> 
> 	/* maybe also le32 content_size, but it must be equal to
> 	   probe_size */

Can you please describe why we need to pass size of "probe_size" in probe request?

> 
> 	/* device-writeable */
> 	u8 content[];

I assume content_size above is the size of array "content[]" and max value can be equal to probe_size advertised by device?

> 	struct virtio_iommu_req_tail tail;
> };
> 
> I'm still struggling with the content and layout of the probe request, and
> would appreciate any feedback. To be easily extended, I think it should
> contain a list of fields of variable size:
> 
> 	|0           15|16           31|32               N|
> 	|     type     |    length     |      values      |
> 
> 'length' might be made optional if it can be deduced from type, but might
> make driver-side parsing more robust.
> 
> The probe could either be done for each endpoint, or for each address
> space. I much prefer endpoint because it is the smallest granularity. The
> driver can then decide what endpoints to put together in the same address
> space based on their individual capabilities. The specification would
> describe how each endpoint property is combined when endpoints are put
> in
> the same address space. For example, take the minimum of all PASID size,
> the maximum of all page granularities, combine doorbell addresses, etc.
> 
> If we did the probe on address spaces instead, the driver would have to
> re-send a probe request each time a new endpoint is attached to an
> existing address space, to see if it is still capable of page table
> handover or if the driver just combined a VFIO and an emulated endpoint by
> accident.
> 
>                                  ***
> 
> Using this framework, the device can declare doorbell regions by adding
> one or more RESV fields into the probe buffer:
> 
> /* 'type' */
> #define VIRTIO_IOMMU_PROBE_T_RESV 	0x1
> 
> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
> struct virtio_iommu_probe_resv {
> 	le64 gpa;
> 	le64 size;
> 
> #define VIRTIO_IOMMU_PROBE_RESV_MSI	0x1
> 	u8 type;

type is 16 bit above?

> };
> 
> Such a region would be subject to the following rules:
> 
> * Driver should not use any IOVA declared as RESV_MSI in a mapping.
> * Device should leave any transaction matching a RESV_MSI region pass
> through untranslated.
> * If the device does not advertise any RESV region, then the driver should
> assume that MSI doorbells, like any other GPA, must be mapped with an
> arbitrary IOVA in order for the endpoint to access them.
> * Given that the driver *should* perform a probe request if available, and
> it *should* understand the VIRTIO_IOMMU_PROBE_T_RESV field, then this
> field tells the guest how it should handle MSI doorbells, and whether it
> should map the address via MAP requests or not.
> 
> Does this make sense and did I overlook something?

Overall it looks good to me. Do you have plans to implement this in virtio-iommu driver and kvmtool?

Thanks
-Bharat

> 
> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [Qemu-arm]  [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-07 22:11                               ` Kalra, Ashish
@ 2017-07-11 11:31                                 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-11 11:31 UTC (permalink / raw)
  To: Kalra, Ashish, Tian, Kevin, Auger Eric, Bharat Bhushan,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: marc.zyngier, tn, will.deacon, drjones, robin.murphy, christoffer.dall

On 07/07/17 23:11, Kalra, Ashish wrote:
> Hello Jean,
> 
> Thanks for the information.
> 
> Is someone already working on implementing this, or is this something I can look into and implement?

I'm still working on the specification, along with a prototype for the
Linux driver and the kvmtool device. I don't want to rush it because it's
a complicated subject, and we're barely supporting SVM in the host.
Supporting IO page faults is only part of the problem, we also need to
hand page tables over to the guest, which requires a new interface for
virtio-iommu. For the moment, my priority is on consolidating the base
device specification.

> Also, as Michael mentioned, and as I saw in the vfio iommu (type1) driver implementation, it mainly supports
> static mappings of user-space processes, with user-space pages pinned/locked in memory, so generic support
> for ATS/PRI extensions needs to be added in VFIO too.

That's also in progress. As far as I know the latest version for fault
reporting is http://www.spinics.net/lists/kvm/msg146615.html

Thanks,
Jean

> Thanks,
> Ashish
> 
> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com] 
> Sent: Friday, July 07, 2017 8:45 PM
> To: Tian, Kevin; Kalra, Ashish; Auger Eric; Bharat Bhushan; eric.auger.pro@gmail.com; peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org; qemu-devel@nongnu.org
> Cc: marc.zyngier@arm.com; tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com; robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-arm] [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Ashish,
> 
> On 07/07/17 00:33, Tian, Kevin wrote:
>>> From: Kalra, Ashish [mailto:Ashish.Kalra@cavium.com]
>>> Sent: Friday, July 7, 2017 7:24 AM
>>>
>>> I have a generic question on vIOMMU support, is there any 
>>> proposal/plan to add ATS/PRI extension support to vIOMMUs and allow 
>>> handling for end to end (v)IOMMU Page faults (w/t the device side 
>>> implementation on Vhost) ?
>>>
>>> Again, the motivation will be to do DMA on paged guest memory and 
>>> potentially avoiding the requirement of pinned/locked guest physical 
>>> memory for DMA.
>>
>> yes, that's a necessary part to support SVM in both virtio-iommu 
>> approach and fully emulated approach (e.g. for Intel VTd). There are 
>> already patches and discussions in other thread about how to propagate 
>> IOMMU page fault to vIOMMU. Then after it is done vIOMMU page fault 
>> emulation will be further added.
>>
>> https://lkml.org/lkml/2017/6/27/964
> 
> For virtio-iommu, I'd like to add an event virtqueue for the device to send page faults to the driver, in a format similar to a PRI Page Request.
> The driver would then send a reply via the request virtqueue in a format similar to a PRG Response.
> 
> In Qemu the device implementation would hopefully be based on the same mechanism as VTd. The vhost implementation would receive IO Page Faults from VFIO and forward them on the event virtqueue similarly to the userspace implementation.
> 
> Thanks,
> Jean
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-11  5:54                                 ` Bharat Bhushan
@ 2017-07-11 12:51                                   ` Jean-Philippe Brucker
  2017-07-12  3:50                                     ` Bharat Bhushan
  0 siblings, 1 reply; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-11 12:51 UTC (permalink / raw)
  To: Bharat Bhushan, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On 11/07/17 06:54, Bharat Bhushan wrote:
> Hi Jean,
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>> Sent: Friday, July 07, 2017 8:50 PM
>> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
>> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
>> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
>> qemu-arm@nongnu.org; qemu-devel@nongnu.org
>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
>> robin.murphy@arm.com; christoffer.dall@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> On 07/07/17 12:36, Bharat Bhushan wrote:
>>>>> In this proposal, QEMU reserves a iova-range for guest (not host) and
>> guest
>>>> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While
>> this
>>>> does not change host interface and it will continue to use host reserved
>>>> mapping for actual interrupt generation, no?
>>>> But then userspace needs to provide IOMMU_RESV_MSI range to guest
>>>> kernel, right? What would be the proposed manner?
>>>
>>> Just an opinion, we can define feature
>> (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command
>> (VIRTIO_IOMMU_T_MSI_RANGE). Guest iommu-driver will make this call
>> during initialization and store the value. This value will just replace
>> MSI_IOVA_BASE and MSI_IOVA_LENGTH hash define. Rest will remain same
>> in virtio-iommu driver.
>>
>> Yes I had something similar in mind, although more generic since we'll
>> need to get other bits of information from the device in future extensions
>> (fault handling, page table formats and dynamic reserves of memory for
>> SVM), and maybe also for finding out per-address-space page granularity
>> (see my reply of patch 3/8). These are per-endpoint properties that cannot
>> be advertised in the virtio config space.
>>
>>                                  ***
>>
>> So I propose to add a per-endpoint probing mechanism on the request
>> queue:
> 
> What is per-endpoint? Is it "per-pci/platform-device"?

Yes, it's a pci or platform device managed by the IOMMU. In the spec I'm
now using the term "endpoint" to easily differentiate from the
virtio-iommu device ("the device").

>> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with
>> feature
>> bit VIRTIO_IOMMU_F_PROBE.
>> * When this feature is advertised, the device sets the probe_size field in
>> the config space.
> 
> Probably I did not get how virtio-iommu device emulation decides value of "probe_size", can you share more info?

The size of the virtio_iommu_req_probe structure is variable, and depends
on which fields the device implements. So the device initially computes the
size it needs to fill virtio_iommu_req_probe, describes it in probe_size,
and the driver allocates that many bytes for virtio_iommu_req_probe.content[].
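
A minimal sketch of how a device could compute probe_size, assuming each
field carries a 4-byte header (16-bit type, 16-bit length) as in the
layout quoted below, and assuming the device sizes the buffer for its
most demanding endpoint. probe_field_desc and compute_probe_size() are
hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Field header from the proposed layout: 16-bit type + 16-bit length. */
#define PROBE_FIELD_HDR_SIZE 4

/* Hypothetical description of one kind of field the device implements. */
struct probe_field_desc {
	uint16_t length;	/* size of 'values', in bytes */
	size_t count;		/* number of instances to report */
};

/* Number of bytes needed in content[] to hold every field instance. */
size_t compute_probe_size(const struct probe_field_desc *fields, size_t n)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++)
		total += fields[i].count *
			 (PROBE_FIELD_HDR_SIZE + (size_t)fields[i].length);
	return total;
}
```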

>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send a
>> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
>> * The driver allocates a device-writeable buffer of probe_size (plus
>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
>> * The device fills the buffer with various information.
>>
>> struct virtio_iommu_req_probe {
>> 	/* device-readable */
>> 	struct virtio_iommu_req_head head;
>> 	le32 device;
>> 	le32 flags;
>>
>> 	/* maybe also le32 content_size, but it must be equal to
>> 	   probe_size */
> 
> Can you please describe why we need to pass size of "probe_size" in probe request?

We don't. I don't think we should add this 'content_size' field unless
there is a compelling reason to do so.

>>
>> 	/* device-writeable */
>> 	u8 content[];
> 
> I assume content_size above is the size of array "content[]" and max value can be equal to probe_size advertised by device?

probe_size is exactly the size of array content[]. The driver must
allocate a buffer of this size (plus the space needed for head, device,
flags and tail).

Then the device is free to leave parts of content[] empty. Field 'type' 0
will be reserved and mark the end of the array.
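
Driver-side parsing could then look roughly like the sketch below. It
assumes that 'type' and 'length' are little-endian and that 'length'
counts only the bytes of 'values'; neither is settled yet, and
probe_count_resv() is a made-up name:

```c
#include <stddef.h>
#include <stdint.h>

#define PROBE_T_END  0x0	/* reserved type, marks the end of the array */
#define PROBE_T_RESV 0x1	/* stands in for VIRTIO_IOMMU_PROBE_T_RESV */

/* Read a little-endian 16-bit value regardless of host endianness. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk content[] and count the RESV fields.
 * Returns the count, or -1 if a field overruns the buffer.
 */
int probe_count_resv(const uint8_t *content, size_t probe_size)
{
	size_t off = 0;
	int count = 0;

	while (probe_size - off >= 4) {
		uint16_t type = get_le16(content + off);
		uint16_t length = get_le16(content + off + 2);

		if (type == PROBE_T_END)
			break;
		off += 4;
		if (length > probe_size - off)
			return -1;
		if (type == PROBE_T_RESV)
			count++;
		/* Unknown types are skipped thanks to 'length'. */
		off += length;
	}
	return count;
}
```

This is where an explicit 'length' pays off: a driver that doesn't know
a field type can still step over it and keep parsing.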

>> 	struct virtio_iommu_req_tail tail;
>> };
>>
>> I'm still struggling with the content and layout of the probe request, and
>> would appreciate any feedback. To be easily extended, I think it should
>> contain a list of fields of variable size:
>>
>> 	|0           15|16           31|32               N|
>> 	|     type     |    length     |      values      |
>>
>> 'length' might be made optional if it can be deduced from type, but might
>> make driver-side parsing more robust.
>>
>> The probe could either be done for each endpoint, or for each address
>> space. I much prefer endpoint because it is the smallest granularity. The
>> driver can then decide what endpoints to put together in the same address
>> space based on their individual capabilities. The specification would
>> describe how each endpoint property is combined when endpoints are put
>> in
>> the same address space. For example, take the minimum of all PASID size,
>> the maximum of all page granularities, combine doorbell addresses, etc.
>>
>> If we did the probe on address spaces instead, the driver would have to
>> re-send a probe request each time a new endpoint is attached to an
>> existing address space, to see if it is still capable of page table
>> handover or if the driver just combined a VFIO and an emulated endpoint by
>> accident.
>>
>>                                  ***
>>
>> Using this framework, the device can declare doorbell regions by adding
>> one or more RESV fields into the probe buffer:
>>
>> /* 'type' */
>> #define VIRTIO_IOMMU_PROBE_T_RESV 	0x1
>>
>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
>> struct virtio_iommu_probe_resv {
>> 	le64 gpa;
>> 	le64 size;
>>
>> #define VIRTIO_IOMMU_PROBE_RESV_MSI	0x1
>> 	u8 type;
> 
> type is 16 bit above?

Ah, the naming isn't great. This is not the same as above, and could be
called 'subtype' to avoid confusion. The above 16-bit type describes the
field type, e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because
it seems easy to reach more than 255 kinds of endpoint properties, but
65535 should do.

This subtype describes which kind of resv region is described in the
structure. For the moment there is only VIRTIO_IOMMU_PROBE_RESV_MSI, but
we could for example add resv regions that the driver should never use or
that it should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT in
Linux). I think 8 bits should be enough to contain any future types,
unless we make this a bitfield. For identity-map, there may be an
additional flags field describing the protection.

>> };
>>
>> Such a region would be subject to the following rules:
>>
>> * Driver should not use any IOVA declared as RESV_MSI in a mapping.
>> * Device should leave any transaction matching a RESV_MSI region pass
>> through untranslated.
>> * If the device does not advertise any RESV region, then the driver should
>> assume that MSI doorbells, like any other GPA, must be mapped with an
>> arbitrary IOVA in order for the endpoint to access them.
>> * Given that the driver *should* perform a probe request if available, and
>> it *should* understand the VIRTIO_IOMMU_PROBE_T_RESV field, then this
>> field tells the guest how it should handle MSI doorbells, and whether it
>> should map the address via MAP requests or not.
>>
>> Does this make sense and did I overlook something?
> 
> Overall it looks good to me. Do you have plans to implement this in virtio-iommu driver and kvmtool?

Yes, if there is no objection I'll try to formalize it and implement it
right away.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-11 12:51                                   ` Jean-Philippe Brucker
@ 2017-07-12  3:50                                     ` Bharat Bhushan
  2017-07-12 10:18                                       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-12  3:50 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall



> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Tuesday, July 11, 2017 6:21 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 11/07/17 06:54, Bharat Bhushan wrote:
> > Hi Jean,
> >
> >> -----Original Message-----
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> >> Sent: Friday, July 07, 2017 8:50 PM
> >> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> >> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> >> peter.maydell@linaro.org; alex.williamson@redhat.com;
> mst@redhat.com;
> >> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> >> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> >> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> >> robin.murphy@arm.com; christoffer.dall@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> On 07/07/17 12:36, Bharat Bhushan wrote:
> >>>>> In this proposal, QEMU reserves a iova-range for guest (not host)
> >>>>> and
> >> guest
> >>>> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI).
> >>>> While
> >> this
> >>>> does not change host interface and it will continue to use host
> >>>> reserved mapping for actual interrupt generation, no?
> >>>> But then userspace needs to provide IOMMU_RESV_MSI range to
> guest
> >>>> kernel, right? What would be the proposed manner?
> >>>
> >>> Just an opinion, we can define feature
> >> (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a
> command
> >> (VIRTIO_IOMMU_T_MSI_RANGE). Guest iommu-driver will make this call
> >> during initialization and store the value. This value will just
> >> replace MSI_IOVA_BASE and MSI_IOVA_LENGTH hash define. Rest will
> >> remain same in virtio-iommu driver.
> >>
> >> Yes I had something similar in mind, although more generic since
> >> we'll need to get other bits of information from the device in future
> >> extensions (fault handling, page table formats and dynamic reserves
> >> of memory for SVM), and maybe also for finding out per-address-space
> >> page granularity (see my reply of patch 3/8). These are per-endpoint
> >> properties that cannot be advertised in the virtio config space.
> >>
> >>                                  ***
> >>
> >> So I propose to add a per-endpoint probing mechanism on the request
> >> queue:
> >
> > What is per-endpoint? Is it "per-pci/platform-device"?
> 
> Yes, it's a pci or platform device managed by the IOMMU. In the spec I'm
> now using the term "endpoint" to easily differentiate from the virtio-iommu
> device ("the device").
> 
> >> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with
> >> feature bit VIRTIO_IOMMU_F_PROBE.
> >> * When this feature is advertised, the device sets the probe_size field
> >> in the config space.
> >
> > Probably I did not get how virtio-iommu device emulation decides value of
> "probe_size", can you share more info?
> 
> The size of the virtio_iommu_req_probe structure is variable, and depends
> on what fields the device implements. So the device initially computes the size it
> needs to fill virtio_iommu_req_probe, describes it in probe_size, and the
> driver allocates that many bytes for virtio_iommu_req_probe.content[]
> 
> >> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send
> a
> >> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >> * The driver allocates a device-writeable buffer of probe_size (plus
> >> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >> * The device fills the buffer with various information.
> >>
> >> struct virtio_iommu_req_probe {
> >> 	/* device-readable */
> >> 	struct virtio_iommu_req_head head;
> >> 	le32 device;
> >> 	le32 flags;
> >>
> >> 	/* maybe also le32 content_size, but it must be equal to
> >> 	   probe_size */
> >
> > Can you please describe why we need to pass size of "probe_size" in probe
> request?
> 
> We don't. I don't think we should add this 'content_size' field unless there is
> a compelling reason to do so.
> 
> >>
> >> 	/* device-writeable */
> >> 	u8 content[];
> >
> > I assume content_size above is the size of array "content[]" and max value
> can be equal to probe_size advertised by device?
> 
> probe_size is exactly the size of array content[]. The driver must allocate a
> buffer of this size (plus the space needed for head, device, flags and tail).
> 
> Then the device is free to leave parts of content[] empty. Field 'type' 0 will be
> reserved and mark the end of the array.
> 
> >> 	struct virtio_iommu_req_tail tail;
> >> };
> >>
> >> I'm still struggling with the content and layout of the probe
> >> request, and would appreciate any feedback. To be easily extended, I
> >> think it should contain a list of fields of variable size:
> >>
> >> 	|0           15|16           31|32               N|
> >> 	|     type     |    length     |      values      |
> >>
> >> 'length' might be made optional if it can be deduced from type, but
> >> might make driver-side parsing more robust.
> >>
> >> The probe could either be done for each endpoint, or for each address
> >> space. I much prefer endpoint because it is the smallest granularity.
> >> The driver can then decide what endpoints to put together in the same
> >> address space based on their individual capabilities. The
> >> specification would describe how each endpoint property is combined
> >> when endpoints are put in the same address space. For example, take
> >> the minimum of all PASID size, the maximum of all page granularities,
> >> combine doorbell addresses, etc.
> >>
> >> If we did the probe on address spaces instead, the driver would have
> >> to re-send a probe request each time a new endpoint is attached to an
> >> existing address space, to see if it is still capable of page table
> >> handover or if the driver just combined a VFIO and an emulated
> >> endpoint by accident.
> >>
> >>                                  ***
> >>
> >> Using this framework, the device can declare doorbell regions by
> >> adding one or more RESV fields into the probe buffer:
> >>
> >> /* 'type' */
> >> #define VIRTIO_IOMMU_PROBE_T_RESV 	0x1
> >>
> >> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
> >> struct virtio_iommu_probe_resv {
> >> 	le64 gpa;
> >> 	le64 size;
> >>
> >> #define VIRTIO_IOMMU_PROBE_RESV_MSI	0x1
> >> 	u8 type;

To be sure I understand correctly: is this "type" the one in struct virtio_iommu_req_head?

Thanks
-Bharat

> >
> > type is 16 bit above?
> 
> Ah, the naming isn't great. This is not the same as above, and could be called
> 'subtype' to avoid confusion. The above 16-bit type describes the field type,
> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it seems
> easy to reach more than 255 kinds of endpoint properties, but
> 65535 should do.
> 
> This subtype describes which kind of resv region is described in the structure.
> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but we
> could for example add resv regions that the driver should never use or that it
> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT in
> Linux). I think 8 bits should be enough to contain any future types, unless we
> make this a bitfield. For identity-map, there may be an additional flags field
> describing the protection.
> 
> >> };
> >>
> >> Such a region would be subject to the following rules:
> >>
> >> * Driver should not use any IOVA declared as RESV_MSI in a mapping.
> >> * Device should leave any transaction matching a RESV_MSI region pass
> >> through untranslated.
> >> * If the device does not advertise any RESV region, then the driver
> >> should assume that MSI doorbells, like any other GPA, must be mapped
> >> with an arbitrary IOVA in order for the endpoint to access them.
> >> * Given that the driver *should* perform a probe request if
> >> available, and it *should* understand the
> VIRTIO_IOMMU_PROBE_T_RESV
> >> field, then this field tells the guest how it should handle MSI
> >> doorbells, and whether it should map the address via MAP requests or
> not.
> >>
> >> Does this make sense and did I overlook something?
> >
> > Overall it looks good to me. Do you have plans to implements this in virtio-
> iommu driver and kvmtool?
> 
> Yes, if there is no objection I'll try to formalize it and implement it right away.
> 
> Thanks,
> Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-12  3:50                                     ` Bharat Bhushan
@ 2017-07-12 10:18                                       ` Jean-Philippe Brucker
  2017-07-12 10:27                                         ` Bharat Bhushan
  0 siblings, 1 reply; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-12 10:18 UTC (permalink / raw)
  To: Bharat Bhushan, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On 12/07/17 04:50, Bharat Bhushan wrote:
[...]
>> The size of the virtio_iommu_req_probe structure is variable, and depends
>> what fields the device implements. So the device initially computes the size it
>> needs to fill virtio_iommu_req_probe, describes it in probe_size, and the
>> driver allocates that many bytes for virtio_iommu_req_probe.content[]
>>
>>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send
>> an
>>>> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
>>>> * The driver allocates a device-writeable buffer of probe_size (plus
>>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
>>>> * The device fills the buffer with various information.
>>>>
>>>> struct virtio_iommu_req_probe {
>>>> 	/* device-readable */
>>>> 	struct virtio_iommu_req_head head;
>>>> 	le32 device;
>>>> 	le32 flags;
>>>>
>>>> 	/* maybe also le32 content_size, but it must be equal to
>>>> 	   probe_size */
>>>
>>> Can you please describe why we need to pass size of "probe_size" in probe
>> request?
>>
>> We don't. I don't think we should add this 'content_size' field unless there is
>> a compelling reason to do so.
>>
>>>>
>>>> 	/* device-writeable */
>>>> 	u8 content[];
>>>
>>> I assume content_size above is the size of array "content[]" and max value
>> can be equal to probe_size advertised by device?
>>
>> probe_size is exactly the size of array content[]. The driver must allocate a
>> buffer of this size (plus the space needed for head, device, flags and tail).
>>
>> Then the device is free to leave parts of content[] empty. Field 'type' 0 will be
>> reserved and mark the end of the array.
>>
>>>> 	struct virtio_iommu_req_tail tail;
>>>> };
>>>>
>>>> I'm still struggling with the content and layout of the probe
>>>> request, and would appreciate any feedback. To be easily extended, I
>>>> think it should contain a list of fields of variable size:
>>>>
>>>> 	|0           15|16           31|32               N|
>>>> 	|     type     |    length     |      values      |
>>>>
>>>> 'length' might be made optional if it can be deduced from type, but
>>>> might make driver-side parsing more robust.
>>>>
>>>> The probe could either be done for each endpoint, or for each address
>>>> space. I much prefer endpoint because it is the smallest granularity.
>>>> The driver can then decide what endpoints to put together in the same
>>>> address space based on their individual capabilities. The
>>>> specification would describe how each endpoint property is combined
>>>> when endpoints are put in the same address space. For example, take
>>>> the minimum of all PASID size, the maximum of all page granularities,
>>>> combine doorbell addresses, etc.
>>>>
>>>> If we did the probe on address spaces instead, the driver would have
>>>> to re-send a probe request each time a new endpoint is attached to an
>>>> existing address space, to see if it is still capable of page table
>>>> handover or if the driver just combined a VFIO and an emulated
>>>> endpoint by accident.
>>>>
>>>>                                  ***
>>>>
>>>> Using this framework, the device can declare doorbell regions by
>>>> adding one or more RESV fields into the probe buffer:
>>>>
>>>> /* 'type' */
>>>> #define VIRTIO_IOMMU_PROBE_T_RESV 	0x1
>>>>
>>>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
>>>> struct virtio_iommu_probe_resv {
>>>> 	le64 gpa;
>>>> 	le64 size;
>>>>
>>>> #define VIRTIO_IOMMU_PROBE_RESV_MSI	0x1
>>>> 	u8 type;
> 
> To be sure I am understanding it correctly, Is this "type" in struct virtio_iommu_req_head?

No, virtio_iommu_req_head::type is the request type
(ATTACH/DETACH/MAP/UNMAP/PROBE).

Then virtio_iommu_probe_property::type is the property type (only RESV for
the moment).

And this is virtio_iommu_probe_resv::type, which is the type of the resv
region (MSI). I renamed it to 'subtype' below, but I think it is still
pretty confusing.


I made a number of changes to structures and naming when trying to
integrate it into the specification:

* Added 64 bytes of padding in virtio_iommu_req_probe, so that future
extensions can add fields in the device-readable part.
* Renamed "RESV" to "RESV_MEM".
* The resv_mem property now looks like this:
  struct virtio_iommu_probe_resv_mem {
        u8      subtype;
        u8      padding[3];
        le32    flags;
        le64    addr;
        le64    size;
  };
* The subtype for MSI doorbells is now VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS
(because transactions to this region bypass the IOMMU). 'flags' contains a
hint, VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the driver that this
region is used for MSIs.

Here is an example of a probe request returning an MSI doorbell property.

     31                       7      0
    +---------------------------------+
    |           0            |  type  | <- request type = PROBE (5)
    +---------------------------------+
    |             device              |
    +---------------------------------+
    :                                 :
    :          (64B padding)          :
    :                                 :
    +---------------------------------+
  ^ |  length = 24   |    type = 1    | <- property type = RESV_MEM (1)
  | +---------------------------------+
  | |           0            |subtype | <- RESV_MEM subtype = BYPASS (1)
 p| +---------------------------------+
 r| |           flags = MSI           |
 o| +---------------------------------+
 b| |         addr = 0xfee00000       |
 e| |                                 |
 _| +---------------------------------+
 s| |         size = 0x00100000       |
 i| |                                 |
 z| +---------------------------------+
 e| |    length      |      type      | <- another property may start
  | :                                 :    here
  v :               ...               :
    +---------------------------------+
    |           0            | status | <- request tail
    +---------------------------------+
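
As a rough illustration of the layout above, here is a minimal sketch of
how a device could serialize one RESV_MEM property into the probe buffer.
The macro names, the helper names and the MSI flag bit position are
assumptions made up for this example (only type = 1, subtype = 1 and
length = 24 come from the diagram), so please don't read it as the final
encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and values: the diagram only gives type = 1 and
 * subtype = 1; the MSI flag bit position is assumed for illustration. */
#define PROBE_T_RESV_MEM         1
#define PROBE_RESV_MEM_T_BYPASS  1
#define PROBE_RESV_MEM_F_MSI     (1u << 0)
#define PROBE_PROP_HDR_SIZE      4   /* le16 type + le16 length */

/* Mirrors struct virtio_iommu_probe_resv_mem, used here only for its size. */
struct resv_mem_layout {
    uint8_t  subtype;
    uint8_t  padding[3];
    uint32_t flags;
    uint64_t addr;
    uint64_t size;
};
static_assert(sizeof(struct resv_mem_layout) == 24,
              "RESV_MEM payload is expected to be 24 bytes");

/* Store the low 'nbytes' bytes of v in little-endian order, byte by byte,
 * so the result does not depend on host endianness or alignment. */
static void put_le(uint8_t *p, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Write one RESV_MEM property (4-byte header + 24-byte payload) at buf.
 * Returns the number of bytes written, or 0 if 'avail' is too small. */
static size_t put_resv_mem(uint8_t *buf, size_t avail, uint8_t subtype,
                           uint32_t flags, uint64_t addr, uint64_t size)
{
    const size_t len = sizeof(struct resv_mem_layout);

    if (avail < PROBE_PROP_HDR_SIZE + len)
        return 0;

    put_le(buf, PROBE_T_RESV_MEM, 2);   /* bits 0..15: property type   */
    put_le(buf + 2, len, 2);            /* bits 16..31: payload length */
    buf[4] = subtype;
    memset(buf + 5, 0, 3);              /* padding[3] */
    put_le(buf + 8, flags, 4);
    put_le(buf + 12, addr, 8);
    put_le(buf + 20, size, 8);
    return PROBE_PROP_HDR_SIZE + len;
}
```

Calling put_resv_mem(buf, avail, PROBE_RESV_MEM_T_BYPASS,
PROBE_RESV_MEM_F_MSI, 0xfee00000, 0x00100000) produces the 28 bytes drawn
in the diagram. Note that 'length' covers the payload only, not the 4-byte
type/length header.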


I'll try to send the next version of the spec out as soon as possible.

Thanks,
Jean


> Thanks
> -Bharat
> 
>>>
>>> type is 16 bit above?
>>
>> Ah, the naming isn't great. This is not the same as above, and could be called
>> 'subtype' to avoid confusion. The above 16-bit type describes the field type,
>> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it seems
>> easy to reach more than 255 kinds of endpoint properties, but
>> 65535 should do.
>>
>> This subtype describes which kind of resv region is described in the structure.
>> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but we
>> could for example add resv regions that the driver should never use or that it
>> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT in
>> Linux). I think 8 bits should be enough to contain any future types, unless we
>> make this a bitfield. For identity-map, there may be an additional flags field
>> describing the protection.
>>
>>>> };
>>>>
>>>> Such a region would be subject to the following rules:
>>>>
>>>> * Driver should not use any IOVA declared as RESV_MSI in a mapping.
>>>> * Device should leave any transaction matching a RESV_MSI region pass
>>>> through untranslated.
>>>> * If the device does not advertise any RESV region, then the driver
>>>> should assume that MSI doorbells, like any other GPA, must be mapped
>>>> with an arbitrary IOVA in order for the endpoint to access them.
>>>> * Given that the driver *should* perform a probe request if
>>>> available, and it *should* understand the
>> VIRTIO_IOMMU_PROBE_T_RESV
>>>> field, then this field tells the guest how it should handle MSI
>>>> doorbells, and whether it should map the address via MAP requests or
>> not.
>>>>
>>>> Does this make sense and did I overlook something?
>>>
>>> Overall it looks good to me. Do you have plans to implements this in virtio-
>> iommu driver and kvmtool?
>>
>> Yes, if there is no objection I'll try to formalize it and implement it right away.
>>
>> Thanks,
>> Jean


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-12 10:18                                       ` Jean-Philippe Brucker
@ 2017-07-12 10:27                                         ` Bharat Bhushan
  2017-07-12 10:58                                           ` Jean-Philippe Brucker
  0 siblings, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-12 10:27 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall



> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Wednesday, July 12, 2017 3:48 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 12/07/17 04:50, Bharat Bhushan wrote:
> [...]
> >> The size of the virtio_iommu_req_probe structure is variable, and
> depends
> >> what fields the device implements. So the device initially computes the
> size it
> >> needs to fill virtio_iommu_req_probe, describes it in probe_size, and the
> >> driver allocates that many bytes for virtio_iommu_req_probe.content[]
> >>
> >>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should
> send
> >> an
> >>>> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >>>> * The driver allocates a device-writeable buffer of probe_size (plus
> >>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >>>> * The device fills the buffer with various information.
> >>>>
> >>>> struct virtio_iommu_req_probe {
> >>>> 	/* device-readable */
> >>>> 	struct virtio_iommu_req_head head;
> >>>> 	le32 device;
> >>>> 	le32 flags;
> >>>>
> >>>> 	/* maybe also le32 content_size, but it must be equal to
> >>>> 	   probe_size */
> >>>
> >>> Can you please describe why we need to pass size of "probe_size" in
> probe
> >> request?
> >>
> >> We don't. I don't think we should add this 'content_size' field unless there
> is
> >> a compelling reason to do so.
> >>
> >>>>
> >>>> 	/* device-writeable */
> >>>> 	u8 content[];
> >>>
> >>> I assume content_size above is the size of array "content[]" and max
> value
> >> can be equal to probe_size advertised by device?
> >>
> >> probe_size is exactly the size of array content[]. The driver must allocate a
> >> buffer of this size (plus the space needed for head, device, flags and tail).
> >>
> >> Then the device is free to leave parts of content[] empty. Field 'type' 0 will
> be
> >> reserved and mark the end of the array.
> >>
> >>>> 	struct virtio_iommu_req_tail tail;
> >>>> };
> >>>>
> >>>> I'm still struggling with the content and layout of the probe
> >>>> request, and would appreciate any feedback. To be easily extended, I
> >>>> think it should contain a list of fields of variable size:
> >>>>
> >>>> 	|0           15|16           31|32               N|
> >>>> 	|     type     |    length     |      values      |
> >>>>
> >>>> 'length' might be made optional if it can be deduced from type, but
> >>>> might make driver-side parsing more robust.
> >>>>
> >>>> The probe could either be done for each endpoint, or for each address
> >>>> space. I much prefer endpoint because it is the smallest granularity.
> >>>> The driver can then decide what endpoints to put together in the same
> >>>> address space based on their individual capabilities. The
> >>>> specification would describe how each endpoint property is combined
> >>>> when endpoints are put in the same address space. For example, take
> >>>> the minimum of all PASID size, the maximum of all page granularities,
> >>>> combine doorbell addresses, etc.
> >>>>
> >>>> If we did the probe on address spaces instead, the driver would have
> >>>> to re-send a probe request each time a new endpoint is attached to an
> >>>> existing address space, to see if it is still capable of page table
> >>>> handover or if the driver just combined a VFIO and an emulated
> >>>> endpoint by accident.
> >>>>
> >>>>                                  ***
> >>>>
> >>>> Using this framework, the device can declare doorbell regions by
> >>>> adding one or more RESV fields into the probe buffer:
> >>>>
> >>>> /* 'type' */
> >>>> #define VIRTIO_IOMMU_PROBE_T_RESV 	0x1
> >>>>
> >>>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
> >>>> struct virtio_iommu_probe_resv {
> >>>> 	le64 gpa;
> >>>> 	le64 size;
> >>>>
> >>>> #define VIRTIO_IOMMU_PROBE_RESV_MSI	0x1
> >>>> 	u8 type;
> >
> > To be sure I am understanding it correctly, Is this "type" in struct
> virtio_iommu_req_head?
> 
> No, virtio_iommu_req_head::type is the request type
> (ATTACH/DETACH/MAP/UNMAP/PROBE).
> 
> Then virtio_iommu_probe_property::type is the property type (only RESV
> for
> the moment).
> 
> And this is virtio_iommu_probe_resv::type, which is the type of the resv
> region (MSI). I renamed it to 'subtype' below, but I think it still is
> pretty confusing.
> 
> 
> I did a number of changes to structures and naming when trying to
> integrate it to the specification:
> 
> * Added 64 bytes of padding in virtio_iommu_req_probe, so that future
> extensions can add fields in the device-readable part.
> * renamed "RESV" to "RESV_MEM".
> * The resv_mem property now looks like this:
>   struct virtio_iommu_probe_resv_mem {
>         u8      subtype;
>         u8      padding[3];
>         le32    flags;
>         le64    addr;
>         le64    size;
>   };
> * subtype for MSI doorbells is now
> VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS
> (because transactions to this region bypass the IOMMU). 'flags' contain a
> hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the driver that this
> region is used for MSIs.
> 
> Here is an example of a probe request returning an MSI doorbell property.
> 
>      31                       7      0
>     +---------------------------------+
>     |           0            |  type  | <- request type = PROBE (5)
>     +---------------------------------+
>     |             device              |
>     +---------------------------------+
>     :                                 :
>     :          (64B padding)          :
>     :                                 :
>     +---------------------------------+
>   ^ |  length = 24   |    type = 1    | <- property type = RESV_MEM (1)
>   | +---------------------------------+
>   | |           0            |subtype | <- RESV_MEM subtype = BYPASS (1)
>  p| +---------------------------------+
>  r| |           flags = MSI           |
>  o| +---------------------------------+
>  b| |         addr = 0xfee00000       |
>  e| |                                 |
>  _| +---------------------------------+
>  s| |         size = 0x00100000       |
>  i| |                                 |
>  z| +---------------------------------+
>  e| |    length      |      type      | <- another property may start
>   | :                                 :    here
>   v :               ...               :
>     +---------------------------------+
>     |           0            | status | <- request tail
>     +---------------------------------+

So a single probe will return information for all "types", and for each "subtype" of a given "type"? I was under the impression that, based on flags, there would be a separate probe request for each type.

Thanks
-Bharat

> 
> 
> I'll try to send the next version of the spec out as soon as possible.
> 
> Thanks,
> Jean
> 
> 
> > Thanks
> > -Bharat
> >
> >>>
> >>> type is 16 bit above?
> >>
> >> Ah, the naming isn't great. This is not the same as above, and could be
> called
> >> 'subtype' to avoid confusion. The above 16-bit type describes the field
> type,
> >> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it seems
> >> easy to reach more than 255 kinds of endpoint properties, but
> >> 65535 should do.
> >>
> >> This subtype describes which kind of resv region is described in the
> structure.
> >> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but we
> >> could for example add resv regions that the driver should never use or
> that it
> >> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT in
> >> Linux). I think 8 bits should be enough to contain any future types, unless
> we
> >> make this a bitfield. For identity-map, there may be an additional flags
> field
> >> describing the protection.
> >>
> >>>> };
> >>>>
> >>>> Such a region would be subject to the following rules:
> >>>>
> >>>> * Driver should not use any IOVA declared as RESV_MSI in a mapping.
> >>>> * Device should leave any transaction matching a RESV_MSI region pass
> >>>> through untranslated.
> >>>> * If the device does not advertise any RESV region, then the driver
> >>>> should assume that MSI doorbells, like any other GPA, must be
> mapped
> >>>> with an arbitrary IOVA in order for the endpoint to access them.
> >>>> * Given that the driver *should* perform a probe request if
> >>>> available, and it *should* understand the
> >> VIRTIO_IOMMU_PROBE_T_RESV
> >>>> field, then this field tells the guest how it should handle MSI
> >>>> doorbells, and whether it should map the address via MAP requests or
> >> not.
> >>>>
> >>>> Does this make sense and did I overlook something?
> >>>
> >>> Overall it looks good to me. Do you have plans to implements this in
> virtio-
> >> iommu driver and kvmtool?
> >>
> >> Yes, if there is no objection I'll try to formalize it and implement it right
> away.
> >>
> >> Thanks,
> >> Jean



* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-12 10:27                                         ` Bharat Bhushan
@ 2017-07-12 10:58                                           ` Jean-Philippe Brucker
  2017-07-12 11:12                                             ` Bharat Bhushan
  0 siblings, 1 reply; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-12 10:58 UTC (permalink / raw)
  To: Bharat Bhushan, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On 12/07/17 11:27, Bharat Bhushan wrote:
> 
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>> Sent: Wednesday, July 12, 2017 3:48 PM
>> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
>> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
>> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
>> qemu-arm@nongnu.org; qemu-devel@nongnu.org
>> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
>> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
>> robin.murphy@arm.com; christoffer.dall@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> On 12/07/17 04:50, Bharat Bhushan wrote:
>> [...]
>>>> The size of the virtio_iommu_req_probe structure is variable, and
>> depends
>>>> what fields the device implements. So the device initially computes the
>> size it
>>>> needs to fill virtio_iommu_req_probe, describes it in probe_size, and the
>>>> driver allocates that many bytes for virtio_iommu_req_probe.content[]
>>>>
>>>>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should
>> send
>>>> an
>>>>>> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
>>>>>> * The driver allocates a device-writeable buffer of probe_size (plus
>>>>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
>>>>>> * The device fills the buffer with various information.
>>>>>>
>>>>>> struct virtio_iommu_req_probe {
>>>>>> 	/* device-readable */
>>>>>> 	struct virtio_iommu_req_head head;
>>>>>> 	le32 device;
>>>>>> 	le32 flags;
>>>>>>
>>>>>> 	/* maybe also le32 content_size, but it must be equal to
>>>>>> 	   probe_size */
>>>>>
>>>>> Can you please describe why we need to pass size of "probe_size" in
>> probe
>>>> request?
>>>>
>>>> We don't. I don't think we should add this 'content_size' field unless there
>> is
>>>> a compelling reason to do so.
>>>>
>>>>>>
>>>>>> 	/* device-writeable */
>>>>>> 	u8 content[];
>>>>>
>>>>> I assume content_size above is the size of array "content[]" and max
>> value
>>>> can be equal to probe_size advertised by device?
>>>>
>>>> probe_size is exactly the size of array content[]. The driver must allocate a
>>>> buffer of this size (plus the space needed for head, device, flags and tail).
>>>>
>>>> Then the device is free to leave parts of content[] empty. Field 'type' 0 will
>> be
>>>> reserved and mark the end of the array.
>>>>
>>>>>> 	struct virtio_iommu_req_tail tail;
>>>>>> };
>>>>>>
>>>>>> I'm still struggling with the content and layout of the probe
>>>>>> request, and would appreciate any feedback. To be easily extended, I
>>>>>> think it should contain a list of fields of variable size:
>>>>>>
>>>>>> 	|0           15|16           31|32               N|
>>>>>> 	|     type     |    length     |      values      |
>>>>>>
>>>>>> 'length' might be made optional if it can be deduced from type, but
>>>>>> might make driver-side parsing more robust.
>>>>>>
>>>>>> The probe could either be done for each endpoint, or for each address
>>>>>> space. I much prefer endpoint because it is the smallest granularity.
>>>>>> The driver can then decide what endpoints to put together in the same
>>>>>> address space based on their individual capabilities. The
>>>>>> specification would describe how each endpoint property is combined
>>>>>> when endpoints are put in the same address space. For example, take
>>>>>> the minimum of all PASID size, the maximum of all page granularities,
>>>>>> combine doorbell addresses, etc.
>>>>>>
>>>>>> If we did the probe on address spaces instead, the driver would have
>>>>>> to re-send a probe request each time a new endpoint is attached to an
>>>>>> existing address space, to see if it is still capable of page table
>>>>>> handover or if the driver just combined a VFIO and an emulated
>>>>>> endpoint by accident.
>>>>>>
>>>>>>                                  ***
>>>>>>
>>>>>> Using this framework, the device can declare doorbell regions by
>>>>>> adding one or more RESV fields into the probe buffer:
>>>>>>
>>>>>> /* 'type' */
>>>>>> #define VIRTIO_IOMMU_PROBE_T_RESV 	0x1
>>>>>>
>>>>>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
>>>>>> struct virtio_iommu_probe_resv {
>>>>>> 	le64 gpa;
>>>>>> 	le64 size;
>>>>>>
>>>>>> #define VIRTIO_IOMMU_PROBE_RESV_MSI	0x1
>>>>>> 	u8 type;
>>>
>>> To be sure I am understanding it correctly, Is this "type" in struct
>> virtio_iommu_req_head?
>>
>> No, virtio_iommu_req_head::type is the request type
>> (ATTACH/DETACH/MAP/UNMAP/PROBE).
>>
>> Then virtio_iommu_probe_property::type is the property type (only RESV
>> for
>> the moment).
>>
>> And this is virtio_iommu_probe_resv::type, which is the type of the resv
>> region (MSI). I renamed it to 'subtype' below, but I think it still is
>> pretty confusing.
>>
>>
>> I did a number of changes to structures and naming when trying to
>> integrate it to the specification:
>>
>> * Added 64 bytes of padding in virtio_iommu_req_probe, so that future
>> extensions can add fields in the device-readable part.
>> * renamed "RESV" to "RESV_MEM".
>> * The resv_mem property now looks like this:
>>   struct virtio_iommu_probe_resv_mem {
>>         u8      subtype;
>>         u8      padding[3];
>>         le32    flags;
>>         le64    addr;
>>         le64    size;
>>   };
>> * subtype for MSI doorbells is now
>> VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS
>> (because transactions to this region bypass the IOMMU). 'flags' contain a
>> hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the driver that this
>> region is used for MSIs.
>>
>> Here is an example of a probe request returning an MSI doorbell property.
>>
>>      31                       7      0
>>     +---------------------------------+
>>     |           0            |  type  | <- request type = PROBE (5)
>>     +---------------------------------+
>>     |             device              |
>>     +---------------------------------+
>>     :                                 :
>>     :          (64B padding)          :
>>     :                                 :
>>     +---------------------------------+
>>   ^ |  length = 24   |    type = 1    | <- property type = RESV_MEM (1)
>>   | +---------------------------------+
>>   | |           0            |subtype | <- RESV_MEM subtype = BYPASS (1)
>>  p| +---------------------------------+
>>  r| |           flags = MSI           |
>>  o| +---------------------------------+
>>  b| |         addr = 0xfee00000       |
>>  e| |                                 |
>>  _| +---------------------------------+
>>  s| |         size = 0x00100000       |
>>  i| |                                 |
>>  z| +---------------------------------+
>>  e| |    length      |      type      | <- another property may start
>>   | :                                 :    here
>>   v :               ...               :
>>     +---------------------------------+
>>     |           0            | status | <- request tail
>>     +---------------------------------+
> 
> So we want a single probe will return info of all "types" and each "subtype" of given "type"? I was of impression that based on flags there will be separate probe request for a type.

Argh, now I'm lost :)

The virtio-iommu driver sends a single PROBE request for each endpoint.
The virtio-iommu device fills the 'properties' field with a list of
properties.

An endpoint may have one or more reserved virtual address regions. Each such
region is described by the virtio-iommu device with a RESV_MEM property in
the properties list.

There will be other types of properties in the future, carrying
information other than RESV_MEM. So the PROBE request is a generic channel
for different types of properties, all aggregated into a single list. The
virtio-iommu device chooses which properties it needs to communicate to the
driver.

The list does not have a fixed layout, and the driver knows what
properties it contains by reading the 'type' header of each property.

The virtio-iommu driver parses the 'properties' list filled by the device.
If it encounters one or more RESV_MEM properties, the driver should take
note of them. Thereafter, the driver should never create a mapping
overlapping RESV_MEM regions for this endpoint.

If, in addition, the RESV_MEM property is of subtype BYPASS and has flag
MSI, then the driver knows that it is an MSI doorbell and it doesn't need
to create a mapping (using a MAP request) for this MSI doorbell.
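
To make the driver side a bit more concrete, here is a minimal sketch of
the parsing described above: walk the properties list, stop at the
reserved type 0, skip unknown types using 'length', and collect RESV_MEM
regions. All names (parse_probe_properties, needs_msi_mapping, struct
resv_region) and the MSI flag bit are hypothetical, invented for this
example; the real driver will likely look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the numeric type/subtype values 0 and 1 come
 * from this thread. The MSI flag bit position is an assumption. */
#define PROBE_T_NONE             0   /* reserved: marks the end of the list */
#define PROBE_T_RESV_MEM         1
#define PROBE_RESV_MEM_T_BYPASS  1
#define PROBE_RESV_MEM_F_MSI     (1u << 0)
#define PROBE_PROP_HDR_SIZE      4   /* le16 type + le16 length */
#define RESV_MEM_SIZE            24  /* subtype, padding, flags, addr, size */

struct resv_region {
    uint8_t  subtype;
    uint32_t flags;
    uint64_t addr;
    uint64_t size;
};

/* Read an 'nbytes'-wide little-endian value, independent of alignment. */
static uint64_t get_le(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;

    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Walk the device-filled properties list. Returns the number of RESV_MEM
 * regions stored in out[] (at most max_out), or -1 if the list is malformed. */
static int parse_probe_properties(const uint8_t *buf, size_t probe_size,
                                  struct resv_region *out, int max_out)
{
    size_t off = 0;
    int n = 0;

    while (probe_size - off >= PROBE_PROP_HDR_SIZE) {
        uint16_t type = (uint16_t)get_le(buf + off, 2);
        uint16_t len = (uint16_t)get_le(buf + off + 2, 2);

        if (type == PROBE_T_NONE)
            break;                      /* rest of the buffer is unused */
        off += PROBE_PROP_HDR_SIZE;
        if (len > probe_size - off)
            return -1;                  /* property overruns the buffer */

        if (type == PROBE_T_RESV_MEM) {
            if (len < RESV_MEM_SIZE)
                return -1;
            if (n < max_out) {
                out[n].subtype = buf[off];
                out[n].flags = (uint32_t)get_le(buf + off + 4, 4);
                out[n].addr = get_le(buf + off + 8, 8);
                out[n].size = get_le(buf + off + 16, 8);
                n++;
            }
        }
        off += len;                     /* unknown types are simply skipped */
    }
    return n;
}

/* Returns 0 if some region is a BYPASS region flagged MSI (no MAP request
 * needed for the doorbell), 1 otherwise. */
static int needs_msi_mapping(const struct resv_region *r, int n)
{
    for (int i = 0; i < n; i++)
        if (r[i].subtype == PROBE_RESV_MEM_T_BYPASS &&
            (r[i].flags & PROBE_RESV_MEM_F_MSI))
            return 0;
    return 1;
}
```

The driver would keep the returned regions per endpoint and refuse any
mapping that overlaps one of them. Carrying 'length' in every property is
what lets an older driver step over property types it doesn't know about.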

Maybe my prototype implementation will make this clearer.

Thanks,
Jean


>>
>>
>> I'll try to send the next version of the spec out as soon as possible.
>>
>> Thanks,
>> Jean
>>
>>
>>> Thanks
>>> -Bharat
>>>
>>>>>
>>>>> type is 16 bit above?
>>>>
>>>> Ah, the naming isn't great. This is not the same as above, and could be
>> called
>>>> 'subtype' to avoid confusion. The above 16-bit type describes the field
>> type,
>>>> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it seems
>>>> easy to reach more than 255 kinds of endpoint properties, but
>>>> 65535 should do.
>>>>
>>>> This subtype describes which kind of resv region is described in the
>> structure.
>>>> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but we
>>>> could for example add resv regions that the driver should never use or
>> that it
>>>> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT in
>>>> Linux). I think 8 bits should be enough to contain any future types, unless
>> we
>>>> make this a bitfield. For identity-map, there may be an additional flags
>> field
>>>> describing the protection.
>>>>
>>>>>> };
>>>>>>
>>>>>> Such a region would be subject to the following rules:
>>>>>>
>>>>>> * Driver should not use any IOVA declared as RESV_MSI in a mapping.
>>>>>> * Device should leave any transaction matching a RESV_MSI region pass
>>>>>> through untranslated.
>>>>>> * If the device does not advertise any RESV region, then the driver
>>>>>> should assume that MSI doorbells, like any other GPA, must be
>> mapped
>>>>>> with an arbitrary IOVA in order for the endpoint to access them.
>>>>>> * Given that the driver *should* perform a probe request if
>>>>>> available, and it *should* understand the
>>>> VIRTIO_IOMMU_PROBE_T_RESV
>>>>>> field, then this field tells the guest how it should handle MSI
>>>>>> doorbells, and whether it should map the address via MAP requests or
>>>> not.
>>>>>>
>>>>>> Does this make sense and did I overlook something?
>>>>>
>>>>> Overall it looks good to me. Do you have plans to implements this in
>> virtio-
>>>> iommu driver and kvmtool?
>>>>
>>>> Yes, if there is no objection I'll try to formalize it and implement it right
>> away.
>>>>
>>>> Thanks,
>>>> Jean
> 


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-12 10:58                                           ` Jean-Philippe Brucker
@ 2017-07-12 11:12                                             ` Bharat Bhushan
  0 siblings, 0 replies; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-12 11:12 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric, eric.auger.pro, peter.maydell,
	alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, kevin.tian, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall



> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Wednesday, July 12, 2017 4:28 PM
> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> peter.maydell@linaro.org; alex.williamson@redhat.com; mst@redhat.com;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> robin.murphy@arm.com; christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 12/07/17 11:27, Bharat Bhushan wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> >> Sent: Wednesday, July 12, 2017 3:48 PM
> >> To: Bharat Bhushan <bharat.bhushan@nxp.com>; Auger Eric
> >> <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> >> peter.maydell@linaro.org; alex.williamson@redhat.com;
> mst@redhat.com;
> >> qemu-arm@nongnu.org; qemu-devel@nongnu.org
> >> Cc: wei@redhat.com; kevin.tian@intel.com; marc.zyngier@arm.com;
> >> tn@semihalf.com; will.deacon@arm.com; drjones@redhat.com;
> >> robin.murphy@arm.com; christoffer.dall@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> On 12/07/17 04:50, Bharat Bhushan wrote:
> >> [...]
> >>>> The size of the virtio_iommu_req_probe structure is variable, and
> >> depends
> >>>> what fields the device implements. So the device initially computes
> >>>> the
> >> size it
> >>>> needs to fill virtio_iommu_req_probe, describes it in probe_size,
> >>>> and the driver allocates that many bytes for
> >>>> virtio_iommu_req_probe.content[]
> >>>>
> >>>>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should
> >> send
> >>>> an
> >>>>>> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >>>>>> * The driver allocates a device-writeable buffer of probe_size
> >>>>>> (plus
> >>>>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >>>>>> * The device fills the buffer with various information.
> >>>>>>
> >>>>>> struct virtio_iommu_req_probe {
> >>>>>> 	/* device-readable */
> >>>>>> 	struct virtio_iommu_req_head head;
> >>>>>> 	le32 device;
> >>>>>> 	le32 flags;
> >>>>>>
> >>>>>> 	/* maybe also le32 content_size, but it must be equal to
> >>>>>> 	   probe_size */
> >>>>>
> >>>>> Can you please describe why we need to pass size of "probe_size"
> >>>>> in
> >> probe
> >>>> request?
> >>>>
> >>>> We don't. I don't think we should add this 'content_size' field
> >>>> unless there
> >> is
> >>>> a compelling reason to do so.
> >>>>
> >>>>>>
> >>>>>> 	/* device-writeable */
> >>>>>> 	u8 content[];
> >>>>>
> >>>>> I assume content_size above is the size of array "content[]" and
> >>>>> max
> >> value
> >>>> can be equal to probe_size advertised by device?
> >>>>
> >>>> probe_size is exactly the size of array content[]. The driver must
> >>>> allocate a buffer of this size (plus the space needed for head, device,
> flags and tail).
> >>>>
> >>>> Then the device is free to leave parts of content[] empty. Field
> >>>> 'type' 0 will
> >> be
> >>>> reserved and mark the end of the array.
> >>>>
> >>>>>> 	struct virtio_iommu_req_tail tail; };
> >>>>>>
> >>>>>> I'm still struggling with the content and layout of the probe
> >>>>>> request, and would appreciate any feedback. To be easily
> >>>>>> extended, I think it should contain a list of fields of variable size:
> >>>>>>
> >>>>>> 	|0           15|16           31|32               N|
> >>>>>> 	|     type     |    length     |      values      |
> >>>>>>
> >>>>>> 'length' might be made optional if it can be deduced from type,
> >>>>>> but might make driver-side parsing more robust.
> >>>>>>
> >>>>>> The probe could either be done for each endpoint, or for each
> >>>>>> address space. I much prefer endpoint because it is the smallest
> granularity.
> >>>>>> The driver can then decide what endpoints to put together in the
> >>>>>> same address space based on their individual capabilities. The
> >>>>>> specification would described how each endpoint property is
> >>>>>> combined when endpoints are put in the same address space. For
> >>>>>> example, take the minimum of all PASID size, the maximum of all
> >>>>>> page granularities, combine doorbell addresses, etc.
> >>>>>>
> >>>>>> If we did the probe on address spaces instead, the driver would
> >>>>>> have to re-send a probe request each time a new endpoint is
> >>>>>> attached to an existing address space, to see if it is still
> >>>>>> capable of page table handover or if the driver just combined a
> >>>>>> VFIO and an emulated endpoint by accident.
> >>>>>>
> >>>>>>                                  ***
> >>>>>>
> >>>>>> Using this framework, the device can declare doorbell regions by
> >>>>>> adding one or more RESV fields into the probe buffer:
> >>>>>>
> >>>>>> /* 'type' */
> >>>>>> #define VIRTIO_IOMMU_PROBE_T_RESV 	0x1
> >>>>>>
> >>>>>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv)
> >>>>>> */ struct virtio_iommu_probe_resv {
> >>>>>> 	le64 gpa;
> >>>>>> 	le64 size;
> >>>>>>
> >>>>>> #define VIRTIO_IOMMU_PROBE_RESV_MSI	0x1
> >>>>>> 	u8 type;
> >>>
> >>> To be sure I am understanding it correctly, Is this "type" in struct
> >> virtio_iommu_req_head?
> >>
> >> No, virtio_iommu_req_head::type is the request type
> >> (ATTACH/DETACH/MAP/UNMAP/PROBE).
> >>
> >> Then virtio_iommu_probe_property::type is the property type (only
> >> RESV for the moment).
> >>
> >> And this is virtio_iommu_probe_resv::type, which is the type of the
> >> resv region (MSI). I renamed it to 'subtype' below, but I think it
> >> still is pretty confusing.
> >>
> >>
> >> I did a number of changes to structures and naming when trying to
> >> integrate it to the specification:
> >>
> >> * Added 64 bytes of padding in virtio_iommu_req_probe, so that future
> >> extensions can add fields in the device-readable part.
> >> * renamed "RESV" to "RESV_MEM".
> >> * The resv_mem property now looks like this:
> >>   struct virtio_iommu_probe_resv_mem {
> >>         u8      subtype;
> >>         u8      padding[3];
> >>         le32    flags;
> >>         le64    addr;
> >>         le64    size;
> >>   };
> >> * subtype for MSI doorbells is now
> >> VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS
> >> (because transactions to this region bypass the IOMMU). 'flags'
> >> contain a hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the
> driver
> >> that this region is used for MSIs.
> >>
> >> Here is an example of a probe request returning an MSI doorbell
> property.
> >>
> >>      31                       7      0
> >>     +---------------------------------+
> >>     |           0            |  type  | <- request type = PROBE (5)
> >>     +---------------------------------+
> >>     |             device              |
> >>     +---------------------------------+
> >>     :                                 :
> >>     :          (64B padding)          :
> >>     :                                 :
> >>     +---------------------------------+
> >>   ^ |  length = 24   |    type = 1    | <- property type = RESV_MEM (1)
> >>   | +---------------------------------+
> >>   | |           0            |subtype | <- RESV_MEM subtype = BYPASS (1)
> >>  p| +---------------------------------+
> >>  r| |           flags = MSI           |
> >>  o| +---------------------------------+
> >>  b| |         addr = 0xfee00000       |
> >>  e| |                                 |
> >>  _| +---------------------------------+
> >>  s| |         size = 0x00100000       |
> >>  i| |                                 |
> >>  z| +---------------------------------+
> >>  e| |    length      |      type      | <- another property may start
> >>   | :                                 :    here
> >>   v :               ...               :
> >>     +---------------------------------+
> >>     |           0            | status | <- request tail
> >>     +---------------------------------+
> >
> > So a single probe will return info for all "types" and each "subtype" of a
> > given "type"? I was under the impression that, based on flags, there would
> > be a separate probe request for each type.
> 
> Argh, now I'm lost :)
> 
> The virtio-iommu driver sends a single PROBE request for each endpoint.
> The virtio-iommu device fills the 'properties' field with a list of properties.
> 
> An endpoint may have one or more reserved virtual addresses. Each such
> region is described by the virtio-iommu device with a RESV_MEM property in
> the properties list.
> 
> There will be other types of properties in the future, for other information
> than RESV_MEM. So the PROBE request is a generic channel for different
> types of properties, all aggregated into a single list. The virtio-iommu device
> chooses which property it needs to communicate to the driver.
> 
> The list does not have a fixed layout, and the driver knows what properties it
> contains by reading the 'type' header of each property.
> 
> The virtio-iommu driver parses the 'properties' list filled by the device.
> If it encounters one or more RESV_MEM properties, the driver should take
> note of them. Thereafter, the driver should never create a mapping
> overlapping RESV_MEM regions for this endpoint.
> 
> If, in addition, the RESV_MEM property is of subtype BYPASS and has flag
> MSI, then the driver knows that it is an MSI doorbell and it doesn't need to
> create a mapping (using a MAP request) for this MSI doorbell.

Thanks for the clarification; it is in line with my understanding.

Thanks
-Bharat

> 
> Maybe my prototype implementation will be more clear.
> 
> Thanks,
> Jean
> 
> 
> >>
> >>
> >> I'll try to send the next version of the spec out as soon as possible.
> >>
> >> Thanks,
> >> Jean
> >>
> >>
> >>> Thanks
> >>> -Bharat
> >>>
> >>>>>
> >>>>> type is 16 bit above?
> >>>>
> >>>> Ah, the naming isn't great. This is not the same as above, and
> >>>> could be
> >> called
> >>>> 'subtype' to avoid confusion. The above 16-bit type describes the
> >>>> field
> >> type,
> >>>> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it
> >>>> seems easy to reach more than 255 kinds of endpoint properties, but
> >>>> 65535 should do.
> >>>>
> >>>> This subtype describes which kind of resv region is described in
> >>>> the
> >> structure.
> >>>> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but
> we
> >>>> could for example add resv regions that the driver should never use
> >>>> or
> >> that it
> >>>> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT
> in
> >>>> Linux). I think 8 bits should be enough to contain any future
> >>>> types, unless
> >> we
> >>>> make this a bitfield. For identity-map, there may be an additional
> >>>> flags
> >> field
> >>>> describing the protection.
> >>>>
> >>>>>> };
> >>>>>>
> >>>>>> Such a region would be subject to the following rules:
> >>>>>>
> >>>>>> * Driver should not use any IOVA declared as RESV_MSI in a
> mapping.
> >>>>>> * Device should leave any transaction matching a RESV_MSI region
> >>>>>> pass through untranslated.
> >>>>>> * If the device does not advertise any RESV region, then the
> >>>>>> driver should assume that MSI doorbells, like any other GPA, must
> >>>>>> be
> >> mapped
> >>>>>> with an arbitrary IOVA in order for the endpoint to access them.
> >>>>>> * Given that the driver *should* perform a probe request if
> >>>>>> available, and it *should* understand the
> >>>> VIRTIO_IOMMU_PROBE_T_RESV
> >>>>>> field, then this field tells the guest how it should handle MSI
> >>>>>> doorbells, and whether it should map the address via MAP requests
> >>>>>> or
> >>>> not.
> >>>>>>
> >>>>>> Does this make sense and did I overlook something?
> >>>>>
> >>>>> Overall it looks good to me. Do you have plans to implements this
> >>>>> in
> >> virtio-
> >>>> iommu driver and kvmtool?
> >>>>
> >>>> Yes, if there is no objection I'll try to formalize it and
> >>>> implement it right
> >> away.
> >>>>
> >>>> Thanks,
> >>>> Jean
> >



* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-06-07 16:01 ` [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands Eric Auger
  2017-06-23 16:09   ` Jean-Philippe Brucker
  2017-07-04  9:13   ` Bharat Bhushan
@ 2017-07-14  2:17   ` Peter Xu
  2017-07-14  6:40     ` Bharat Bhushan
  2017-07-14 11:25     ` Jean-Philippe Brucker
  2 siblings, 2 replies; 73+ messages in thread
From: Peter Xu @ 2017-07-14  2:17 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel, jean-philippe.brucker, wei, kevin.tian,
	bharat.bhushan, marc.zyngier, tn, will.deacon, drjones,
	robin.murphy, christoffer.dall

On Wed, Jun 07, 2017 at 06:01:25PM +0200, Eric Auger wrote:
> This patch adds the actual implementation for the translation routine
> and the virtio-iommu commands.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

[...]

>  static int virtio_iommu_attach(VirtIOIOMMU *s,
>                                 struct virtio_iommu_req_attach *req)
> @@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>      uint32_t asid = le32_to_cpu(req->address_space);
>      uint32_t devid = le32_to_cpu(req->device);
>      uint32_t reserved = le32_to_cpu(req->reserved);
> +    viommu_as *as;
> +    viommu_dev *dev;
>  
>      trace_virtio_iommu_attach(asid, devid, reserved);
>  
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
> +    if (dev) {
> +        return -1;
> +    }
> +
> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
> +    if (!as) {
> +        as = g_malloc0(sizeof(*as));
> +        as->id = asid;
> +        as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
> +                                         NULL, NULL, (GDestroyNotify)g_free);
> +        g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
> +        trace_virtio_iommu_new_asid(asid);
> +    }
> +
> +    dev = g_malloc0(sizeof(*dev));
> +    dev->as = as;
> +    dev->id = devid;
> +    as->nr_devices++;
> +    trace_virtio_iommu_new_devid(devid);
> +    g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);

Here, do we need to record something like a refcount for the address
space? Since...

> +
> +    return VIRTIO_IOMMU_S_OK;
>  }
>  
>  static int virtio_iommu_detach(VirtIOIOMMU *s,
> @@ -106,10 +170,13 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
>  {
>      uint32_t devid = le32_to_cpu(req->device);
>      uint32_t reserved = le32_to_cpu(req->reserved);
> +    int ret;
>  
>      trace_virtio_iommu_detach(devid, reserved);
>  
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    ret = g_tree_remove(s->devices, GUINT_TO_POINTER(devid));

... here on detach, IMHO we should check the refcount: if no device is
using a given address space any more, we should release the address
space as well.

Otherwise there would be no way to destroy an address space?
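
A minimal sketch of what I mean, assuming the nr_devices counter from the
attach hunk above is used as the refcount. To keep it self-contained it
uses small fixed tables instead of the GTrees in the patch, and all names
(struct viommu, viommu_attach, viommu_detach, VIOMMU_S_*) are hypothetical
stand-ins rather than the patch's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIOMMU_MAX 8   /* arbitrary table size for this sketch */

enum { VIOMMU_S_OK = 0, VIOMMU_S_INVAL = 1 };  /* stand-ins for the status codes */

struct viommu_as  { int in_use; uint32_t id; unsigned nr_devices; };
struct viommu_dev { int in_use; uint32_t id; struct viommu_as *as; };

struct viommu {
    struct viommu_as  spaces[VIOMMU_MAX];
    struct viommu_dev devs[VIOMMU_MAX];
};

static struct viommu_dev *find_dev(struct viommu *s, uint32_t devid)
{
    for (int i = 0; i < VIOMMU_MAX; i++) {
        if (s->devs[i].in_use && s->devs[i].id == devid) {
            return &s->devs[i];
        }
    }
    return NULL;
}

static struct viommu_as *find_as(struct viommu *s, uint32_t asid)
{
    for (int i = 0; i < VIOMMU_MAX; i++) {
        if (s->spaces[i].in_use && s->spaces[i].id == asid) {
            return &s->spaces[i];
        }
    }
    return NULL;
}

int viommu_attach(struct viommu *s, uint32_t asid, uint32_t devid)
{
    struct viommu_dev *dev = NULL;
    struct viommu_as *as;

    assert(s != NULL);
    if (find_dev(s, devid)) {
        return VIOMMU_S_INVAL;          /* device is already attached */
    }
    for (int i = 0; i < VIOMMU_MAX && !dev; i++) {
        if (!s->devs[i].in_use) {
            dev = &s->devs[i];
        }
    }
    if (!dev) {
        return VIOMMU_S_INVAL;          /* device table is full */
    }
    as = find_as(s, asid);
    for (int i = 0; i < VIOMMU_MAX && !as; i++) {
        if (!s->spaces[i].in_use) {     /* create the address space on first use */
            as = &s->spaces[i];
            as->in_use = 1;
            as->id = asid;
            as->nr_devices = 0;
        }
    }
    if (!as) {
        return VIOMMU_S_INVAL;
    }
    dev->in_use = 1;
    dev->id = devid;
    dev->as = as;
    as->nr_devices++;
    return VIOMMU_S_OK;
}

int viommu_detach(struct viommu *s, uint32_t devid)
{
    struct viommu_dev *dev;

    assert(s != NULL);
    dev = find_dev(s, devid);
    if (!dev) {
        return VIOMMU_S_INVAL;
    }
    if (--dev->as->nr_devices == 0) {
        /* Last device gone: release the address space. A real device
         * would also destroy its mappings here. */
        dev->as->in_use = 0;
    }
    dev->in_use = 0;
    dev->as = NULL;
    return VIOMMU_S_OK;
}
```

With this, an address space lives exactly as long as at least one device
is attached to it.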

> +
> +    return ret ? VIRTIO_IOMMU_S_OK : VIRTIO_IOMMU_S_INVAL;
>  }

[...]

>  static int virtio_iommu_unmap(VirtIOIOMMU *s,
> @@ -133,10 +227,64 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
>      uint64_t virt_addr = le64_to_cpu(req->virt_addr);
>      uint64_t size = le64_to_cpu(req->size);
>      uint32_t flags = le32_to_cpu(req->flags);
> +    viommu_mapping *mapping;
> +    viommu_interval interval;
> +    viommu_as *as;
>  
>      trace_virtio_iommu_unmap(asid, virt_addr, size, flags);
>  
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
> +    if (!as) {
> +        error_report("%s: no as", __func__);
> +        return VIRTIO_IOMMU_S_INVAL;
> +    }
> +    interval.low = virt_addr;
> +    interval.high = virt_addr + size - 1;
> +
> +    mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
> +
> +    while (mapping) {
> +        viommu_interval current;
> +        uint64_t low  = mapping->virt_addr;
> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> +
> +        current.low = low;
> +        current.high = high;
> +
> +        if (low == interval.low && size >= mapping->size) {
> +            g_tree_remove(as->mappings, (gpointer)&current);
> +            interval.low = high + 1;
> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
> +                interval.low, interval.high);
> +        } else if (high == interval.high && size >= mapping->size) {
> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
> +                interval.low, interval.high);
> +            g_tree_remove(as->mappings, (gpointer)&current);
> +            interval.high = low - 1;
> +        } else if (low > interval.low && high < interval.high) {
> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
> +            g_tree_remove(as->mappings, (gpointer)&current);
> +        } else {
> +            break;
> +        }
> +        if (interval.low >= interval.high) {
> +            return VIRTIO_IOMMU_S_OK;
> +        } else {
> +            mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
> +        }
> +    }

IIUC we are very strict for unmap here. An extreme case: when an
address space has just been created (so it has no mappings), sending
an UNMAP for any range currently fails, right? Should we loosen
it?

IMHO as long as we make sure all the mappings in the range of an UNMAP
request are destroyed, we are good. I think at least both the vfio
API and the VT-d emulation make this assumption. But maybe I am wrong.
Please correct me if so.
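
As a rough sketch of the looser semantics, assuming a flat array of
mappings instead of the interval tree used in the patch (the names
struct mapping, unmap_range and UNMAP_S_* are hypothetical): every mapping
fully inside the requested range is destroyed, and a range that contains no
mapping at all simply succeeds. Rejecting a request that would split an
existing mapping is an assumption of this sketch, not something the patch
or the spec mandates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { UNMAP_S_OK = 0, UNMAP_S_INVAL = 1 };  /* stand-ins for the status codes */

struct mapping {
    int in_use;
    uint64_t virt_addr;
    uint64_t size;       /* assumed non-zero for every in-use mapping */
};

int unmap_range(struct mapping *maps, size_t n, uint64_t virt_addr,
                uint64_t size)
{
    uint64_t low = virt_addr;
    uint64_t high;

    assert(maps != NULL);
    if (size == 0 || virt_addr + size - 1 < virt_addr) {
        return UNMAP_S_INVAL;           /* empty or wrapping range */
    }
    high = virt_addr + size - 1;

    /* Pass 1: refuse requests that would have to split a mapping. */
    for (size_t i = 0; i < n; i++) {
        uint64_t m_low = maps[i].virt_addr;
        uint64_t m_high = m_low + maps[i].size - 1;

        if (!maps[i].in_use) {
            continue;
        }
        if (m_low <= high && m_high >= low &&      /* overlaps the range */
            !(m_low >= low && m_high <= high)) {   /* but is not contained */
            return UNMAP_S_INVAL;
        }
    }

    /* Pass 2: destroy every mapping contained in the range. */
    for (size_t i = 0; i < n; i++) {
        uint64_t m_low = maps[i].virt_addr;
        uint64_t m_high = m_low + maps[i].size - 1;

        if (maps[i].in_use && m_low >= low && m_high <= high) {
            maps[i].in_use = 0;
        }
    }
    return UNMAP_S_OK;   /* also reached when nothing was mapped */
}
```

Doing the check in a separate first pass means a rejected request leaves
the address space unchanged.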

> +
> +    if (mapping) {
> +        error_report("****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported",
> +                     __func__, interval.low, size,
> +                     mapping->virt_addr, mapping->size);
> +    } else {
> +        error_report("****** %s: no mapping for [0x%"PRIx64",0x%"PRIx64"]",
> +                     __func__, interval.low, interval.high);
> +    }
> +
> +    return VIRTIO_IOMMU_S_INVAL;
>  }

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-07-14  2:17   ` Peter Xu
@ 2017-07-14  6:40     ` Bharat Bhushan
  2017-07-17  1:28       ` Peter Xu
  2017-07-14 11:25     ` Jean-Philippe Brucker
  1 sibling, 1 reply; 73+ messages in thread
From: Bharat Bhushan @ 2017-07-14  6:40 UTC (permalink / raw)
  To: Peter Xu, Eric Auger
  Cc: eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel, jean-philippe.brucker, wei, kevin.tian, marc.zyngier,
	tn, will.deacon, drjones, robin.murphy, christoffer.dall

Hi Peter,

> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, July 14, 2017 7:48 AM
> To: Eric Auger <eric.auger@redhat.com>
> Cc: eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com;
> wei@redhat.com; kevin.tian@intel.com; Bharat Bhushan
> <bharat.bhushan@nxp.com>; marc.zyngier@arm.com; tn@semihalf.com;
> will.deacon@arm.com; drjones@redhat.com; robin.murphy@arm.com;
> christoffer.dall@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the
> translation and commands
> 
> On Wed, Jun 07, 2017 at 06:01:25PM +0200, Eric Auger wrote:
> > This patch adds the actual implementation for the translation routine
> > and the virtio-iommu commands.
> >
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> [...]
> 
> >  static int virtio_iommu_attach(VirtIOIOMMU *s,
> >                                 struct virtio_iommu_req_attach *req)
> > @@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
> >      uint32_t asid = le32_to_cpu(req->address_space);
> >      uint32_t devid = le32_to_cpu(req->device);
> >      uint32_t reserved = le32_to_cpu(req->reserved);
> > +    viommu_as *as;
> > +    viommu_dev *dev;
> >
> >      trace_virtio_iommu_attach(asid, devid, reserved);
> >
> > -    return VIRTIO_IOMMU_S_UNSUPP;
> > +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
> > +    if (dev) {
> > +        return -1;
> > +    }
> > +
> > +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
> > +    if (!as) {
> > +        as = g_malloc0(sizeof(*as));
> > +        as->id = asid;
> > +        as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
> > +                                         NULL, NULL, (GDestroyNotify)g_free);
> > +        g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
> > +        trace_virtio_iommu_new_asid(asid);
> > +    }
> > +
> > +    dev = g_malloc0(sizeof(*dev));
> > +    dev->as = as;
> > +    dev->id = devid;
> > +    as->nr_devices++;
> > +    trace_virtio_iommu_new_devid(devid);
> > +    g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
> 
> Here do we need to record something like a refcount for address space?
> Since...

We are using "nr_devices" to count the devices attached to an address space.

> 
> > +
> > +    return VIRTIO_IOMMU_S_OK;
> >  }
> >
> >  static int virtio_iommu_detach(VirtIOIOMMU *s, @@ -106,10 +170,13 @@
> > static int virtio_iommu_detach(VirtIOIOMMU *s,  {
> >      uint32_t devid = le32_to_cpu(req->device);
> >      uint32_t reserved = le32_to_cpu(req->reserved);
> > +    int ret;
> >
> >      trace_virtio_iommu_detach(devid, reserved);
> >
> > -    return VIRTIO_IOMMU_S_UNSUPP;
> > +    ret = g_tree_remove(s->devices, GUINT_TO_POINTER(devid));
> 
> ... here when detach, imho we should check the refcount: if there is no
> device using specific address space, we should release the address space as
> well.
> 
> Otherwise there would have no way to destroy an address space?


If nr_devices == 0 here, then release the address space; is that OK?

This is how I implemented it as part of the VFIO integration on top of this patch series:
	"[RFC PATCH 2/2] virtio-iommu: vfio integration with virtio-iommu"

Thanks
-Bharat
> 
> > +
> > +    return ret ? VIRTIO_IOMMU_S_OK : VIRTIO_IOMMU_S_INVAL;
> >  }
> 
> [...]
> 
> >  static int virtio_iommu_unmap(VirtIOIOMMU *s, @@ -133,10 +227,64 @@
> > static int virtio_iommu_unmap(VirtIOIOMMU *s,
> >      uint64_t virt_addr = le64_to_cpu(req->virt_addr);
> >      uint64_t size = le64_to_cpu(req->size);
> >      uint32_t flags = le32_to_cpu(req->flags);
> > +    viommu_mapping *mapping;
> > +    viommu_interval interval;
> > +    viommu_as *as;
> >
> >      trace_virtio_iommu_unmap(asid, virt_addr, size, flags);
> >
> > -    return VIRTIO_IOMMU_S_UNSUPP;
> > +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
> > +    if (!as) {
> > +        error_report("%s: no as", __func__);
> > +        return VIRTIO_IOMMU_S_INVAL;
> > +    }
> > +    interval.low = virt_addr;
> > +    interval.high = virt_addr + size - 1;
> > +
> > +    mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
> > +
> > +    while (mapping) {
> > +        viommu_interval current;
> > +        uint64_t low  = mapping->virt_addr;
> > +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> > +
> > +        current.low = low;
> > +        current.high = high;
> > +
> > +        if (low == interval.low && size >= mapping->size) {
> > +            g_tree_remove(as->mappings, (gpointer)&current);
> > +            interval.low = high + 1;
> > +            trace_virtio_iommu_unmap_left_interval(current.low,
> current.high,
> > +                interval.low, interval.high);
> > +        } else if (high == interval.high && size >= mapping->size) {
> > +            trace_virtio_iommu_unmap_right_interval(current.low,
> current.high,
> > +                interval.low, interval.high);
> > +            g_tree_remove(as->mappings, (gpointer)&current);
> > +            interval.high = low - 1;
> > +        } else if (low > interval.low && high < interval.high) {
> > +            trace_virtio_iommu_unmap_inc_interval(current.low,
> current.high);
> > +            g_tree_remove(as->mappings, (gpointer)&current);
> > +        } else {
> > +            break;
> > +        }
> > +        if (interval.low >= interval.high) {
> > +            return VIRTIO_IOMMU_S_OK;
> > +        } else {
> > +            mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
> > +        }
> > +    }
> 
> IIUC for unmap here we are very strict - a extreme case is that when an
> address space is just created (so no mapping inside), if we send one UNMAP
> to a range, it'll fail currently, right? Should we loosen it?
> 
> IMHO as long as we make sure all the mappings in the range of an UNMAP
> request are destroyed, then we are good. I think at least both vfio api and vt-
> d emuation have this assumption. But maybe I am wrong.
> Please correct me if so.
> 
> > +
> > +    if (mapping) {
> > +        error_report("****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
> > +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported",
> > +                     __func__, interval.low, size,
> > +                     mapping->virt_addr, mapping->size);
> > +    } else {
> > +        error_report("****** %s: no mapping for
> [0x%"PRIx64",0x%"PRIx64"]",
> > +                     __func__, interval.low, interval.high);
> > +    }
> > +
> > +    return VIRTIO_IOMMU_S_INVAL;
> >  }
> 
> Thanks,
> 
> --
> Peter Xu


* Re: [Qemu-devel] [Qemu-arm]  [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-07 15:14                             ` Jean-Philippe Brucker
  2017-07-07 22:11                               ` Kalra, Ashish
@ 2017-07-14  6:58                               ` Tian, Kevin
  1 sibling, 0 replies; 73+ messages in thread
From: Tian, Kevin @ 2017-07-14  6:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Kalra, Ashish, Auger Eric, Bharat Bhushan,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: marc.zyngier, tn, will.deacon, drjones, robin.murphy, christoffer.dall

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Friday, July 7, 2017 11:15 PM
> 
> Hi Ashish,
> 
> On 07/07/17 00:33, Tian, Kevin wrote:
> >> From: Kalra, Ashish [mailto:Ashish.Kalra@cavium.com]
> >> Sent: Friday, July 7, 2017 7:24 AM
> >>
> >> I have a generic question on vIOMMU support, is there any proposal/plan
> to
> >> add ATS/PRI extension support to vIOMMUs and allow
> >> handling for end to end (v)IOMMU Page faults (w/t the device side
> >> implementation on Vhost) ?
> >>
> >> Again, the motivation will be to do DMA on paged guest memory and
> >> potentially avoiding the requirement of pinned/locked
> >> guest physical memory for DMA.
> >
> > yes, that's a necessary part to support SVM in both virtio-iommu
> > approach and fully emulated approach (e.g. for Intel VTd). There
> > are already patches and discussions in other thread about how to
> > propagate IOMMU page fault to vIOMMU. Then after it is done
> > vIOMMU page fault emulation will be further added.
> >
> > https://lkml.org/lkml/2017/6/27/964
> 
> For virtio-iommu, I'd like to add an event virtqueue for the device to
> send page faults to the driver, in a format similar to a PRI Page Request.
> The driver would then send a reply via the request virtqueue in a format
> similar to a PRG Response.
> 
> In Qemu the device implementation would hopefully be based on the same
> mechanism as VTd. The vhost implementation would receive IO Page Faults
> from VFIO and forward them on the event virtqueue similarly to the
> userspace implementation.
> 

Agreed. I expect the path between Qemu and VFIO to be general enough
for both the emulated IOMMU and virtio-IOMMU. The difference lies in the
propagation path to the guest, which depends on the definition of each
virtual interface.

Thanks
Kevin


* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-07 15:15                   ` Jean-Philippe Brucker
@ 2017-07-14  7:20                     ` Tian, Kevin
  2017-07-14 11:25                       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 73+ messages in thread
From: Tian, Kevin @ 2017-07-14  7:20 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Bharat Bhushan, Auger Eric,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: wei, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Friday, July 7, 2017 11:15 PM
> 
> On 07/07/17 07:21, Tian, Kevin wrote:
> > sorry I didn't quite get this part, and here is my understanding:
> >
> > Guest programs vIOMMU to map a gIOVA (used by MSI to a GPA
> > of doorbell register of virtual irqchip. vIOMMU then
> > triggers VFIO map/unmap to update physical IOMMU page
> > table for gIOVA -> HPA of real doorbell of physical irqchip
> 
> At the moment (non-SVM), physical and virtual MSI doorbell are completely
> dissociated. VFIO itself maps the doorbell GPA->HPA during container
> initialization. The GPA, chosen arbitrarily by the host, is then removed
> from the guest GPA space.

Got it. I also got a basic understanding from the link below. :-)

https://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/

> 
> When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip
> doorbell, I suppose Qemu will notice that the GPA doesn't correspond to
> RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.
> 
> (For SVM I don't want to go into the details just now, but we will
> probably need a separate VFIO mechanism to update the physical MSI-X
> tables with whatever gIOVA the guest mapped in its private stage-1 page
> tables.)

I guess there may be either a terminology difference or a hardware
difference here, since I noticed that you mentioned IOVA together with
stage-1 multiple times.

For Intel VT-d:

- stage-1 is only for VA translation, tagged with PASID
- stage-2 can be used for IOVA translation on bare metal or GPA/gIOVA
translation in virtualization, w/o PASID tagged

Does ARM SMMU allow stage-1 to be used for both VA and IOVA? IIRC
you said PASID#0 is reserved for traffic w/o PASID in some mail...

> 
> > (assume your irqchip will provide multiple doorbells so each
> > device can have its own channel).
> 
> In existing irqchips the doorbell is shared by endpoints, which are
> differentiated by their device ID (generally the BDF). I'm not sure why
> this matters here?

It doesn't matter now, given the device ID.

> 
> > then once this update is
> > done, later MSI interrupts from assigned device will go
> > through physical IOMMU (gIOVA->HPA) then reach irqchip
> > for irq remapping. vIOMMU is involved only in configuration
> > path instead of actual interrupt path.
> 
> Yes the vIOMMU is used to correlate the IOVA written by the guest in its
> virtual MSI-X table with the MAP request received by the vIOMMU. That is
> probably used to setup IRQFD routes with KVM. But the vIOMMU is not
> involved further than that in MSIs.
> 
> > If my understanding is correct, above will be the natural flow then
> > why is additional virtio-iommu change required? :-)
> 
> The change is not *required* for ARM systems, I only proposed removing the
> doorbell address translation stage to make host implementation simpler
> (and since virtio-iommu on x86 won't translate the doorbell anyway, we
> have to add support for this to virtio-iommu). But for Qemu, since vSMMU
> needs to implement the natural flow anyway, it might not be a lot of
> effort to also do it for virtio-iommu. Other implementations (e.g.
> kvmtool) might piggy-back on the x86 way and declare the irqchip doorbell
> as untranslated.
> 
> My proposal also breaks when confronted to virtual SVM in a physical ARM
> system, where the guest owns stage-1 page tables and *has* to map the
> doorbell if it wants MSIs to work, so you can disregard it :)
> 

I learned a lot from this. Thanks.

* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-07-14  2:17   ` Peter Xu
  2017-07-14  6:40     ` Bharat Bhushan
@ 2017-07-14 11:25     ` Jean-Philippe Brucker
  2017-07-17  1:37       ` Peter Xu
  1 sibling, 1 reply; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-14 11:25 UTC (permalink / raw)
  To: Peter Xu, Eric Auger
  Cc: eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel, wei, kevin.tian, bharat.bhushan, marc.zyngier, tn,
	will.deacon, drjones, robin.murphy, christoffer.dall

Hi Peter,

On 14/07/17 03:17, Peter Xu wrote:
>
> [...]
> 
>>  static int virtio_iommu_unmap(VirtIOIOMMU *s,
>> @@ -133,10 +227,64 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
>>      uint64_t virt_addr = le64_to_cpu(req->virt_addr);
>>      uint64_t size = le64_to_cpu(req->size);
>>      uint32_t flags = le32_to_cpu(req->flags);
>> +    viommu_mapping *mapping;
>> +    viommu_interval interval;
>> +    viommu_as *as;
>>  
>>      trace_virtio_iommu_unmap(asid, virt_addr, size, flags);
>>  
>> -    return VIRTIO_IOMMU_S_UNSUPP;
>> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
>> +    if (!as) {
>> +        error_report("%s: no as", __func__);
>> +        return VIRTIO_IOMMU_S_INVAL;
>> +    }
>> +    interval.low = virt_addr;
>> +    interval.high = virt_addr + size - 1;
>> +
>> +    mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
>> +
>> +    while (mapping) {
>> +        viommu_interval current;
>> +        uint64_t low  = mapping->virt_addr;
>> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
>> +
>> +        current.low = low;
>> +        current.high = high;
>> +
>> +        if (low == interval.low && size >= mapping->size) {
>> +            g_tree_remove(as->mappings, (gpointer)&current);
>> +            interval.low = high + 1;
>> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
>> +                interval.low, interval.high);
>> +        } else if (high == interval.high && size >= mapping->size) {
>> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
>> +                interval.low, interval.high);
>> +            g_tree_remove(as->mappings, (gpointer)&current);
>> +            interval.high = low - 1;
>> +        } else if (low > interval.low && high < interval.high) {
>> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
>> +            g_tree_remove(as->mappings, (gpointer)&current);
>> +        } else {
>> +            break;
>> +        }
>> +        if (interval.low >= interval.high) {
>> +            return VIRTIO_IOMMU_S_OK;
>> +        } else {
>> +            mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
>> +        }
>> +    }
> 
> IIUC for unmap here we are very strict - an extreme case is that when
> an address space is just created (so no mapping inside), if we send
> one UNMAP to a range, it'll fail currently, right? Should we loosen
> it?

In the next specification version I'd like to slightly redefine unmap
semantics (to make them more deterministic). Unmapping a range that is
only partially mapped or not mapped at all would not return an error, and
would unmap everything that is covered by the range.

Note that unmapping a partial range (splitting a single mapping) is still
forbidden. The following pseudocode sequences attempt to illustrate the
rules I'd like to set. There are no mappings in the address space prior to
each sequence.

(1) unmap(addr=0, size=5)        -> succeeds, doesn't unmap anything

(2) a = map(addr=0, size=10);
    unmap(0, 10)                 -> succeeds, unmaps a

(3) a = map(0, 5);
    b = map(5, 5);
    unmap(0, 10)                 -> succeeds, unmaps a and b

(4) a = map(0, 10);
    unmap(0, 5)                  -> faults, doesn't unmap anything

(5) a = map(0, 5);
    b = map(5, 5);
    unmap(0, 5)                  -> succeeds, unmaps a

(6) a = map(0, 5);
    unmap(0, 10)                 -> succeeds, unmaps a

(7) a = map(0, 5);
    b = map(10, 5);
    unmap(0, 15)                 -> succeeds, unmaps a and b

Previously I left (1), (6) and (7) as an implementation choice. The device
could either succeed and unmap, or fail and not unmap. This means that if
a driver wanted guarantees, it had to issue strict map/unmap sequences.

I believe that VFIO type1 v2 has these semantics, but "v1" allowed (4) to
succeed and unmap a.
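
A minimal sketch of the rules above, assuming a toy fixed-size table in
place of the GTree used in the patch. The names as_map(), as_unmap() and
as_count() are hypothetical and only serve the illustration; address
overflow is not handled. The key point is the two passes: first reject
any range that would split a mapping, then drop every mapping that the
range fully covers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16

/* Toy model: one slot per mapping, covering [low, high] inclusive. */
struct mapping {
    uint64_t low;
    uint64_t high;
    bool used;
};

struct addr_space {
    struct mapping maps[MAX_MAPPINGS];
};

/* Two inclusive ranges overlap if neither ends before the other starts. */
static bool overlaps(const struct mapping *m, uint64_t low, uint64_t high)
{
    return m->used && m->low <= high && m->high >= low;
}

/* The mapping lies entirely inside [low, high]. */
static bool contained(const struct mapping *m, uint64_t low, uint64_t high)
{
    return m->used && m->low >= low && m->high <= high;
}

/* Returns 0 on success, -1 on empty size, overlap or full table. */
static int as_map(struct addr_space *as, uint64_t addr, uint64_t size)
{
    struct mapping *slot = NULL;
    uint64_t high;
    size_t i;

    if (size == 0) {
        return -1;
    }
    high = addr + size - 1;
    for (i = 0; i < MAX_MAPPINGS; i++) {
        if (overlaps(&as->maps[i], addr, high)) {
            return -1;
        }
        if (!as->maps[i].used && !slot) {
            slot = &as->maps[i];
        }
    }
    if (!slot) {
        return -1;
    }
    slot->low = addr;
    slot->high = high;
    slot->used = true;
    return 0;
}

/* Returns 0 on success, -1 if the range would split a mapping. */
static int as_unmap(struct addr_space *as, uint64_t addr, uint64_t size)
{
    uint64_t high;
    size_t i;

    if (size == 0) {
        return -1;
    }
    high = addr + size - 1;
    /* Pass 1: sequence (4), a partially covered mapping is an error. */
    for (i = 0; i < MAX_MAPPINGS; i++) {
        if (overlaps(&as->maps[i], addr, high) &&
            !contained(&as->maps[i], addr, high)) {
            return -1;
        }
    }
    /* Pass 2: remove everything covered; holes in the range are fine. */
    for (i = 0; i < MAX_MAPPINGS; i++) {
        if (contained(&as->maps[i], addr, high)) {
            as->maps[i].used = false;
        }
    }
    return 0;
}

/* Number of live mappings, handy for checking the sequences. */
static size_t as_count(const struct addr_space *as)
{
    size_t i, n = 0;

    for (i = 0; i < MAX_MAPPINGS; i++) {
        n += as->maps[i].used ? 1 : 0;
    }
    return n;
}
```

Checking for a split before removing anything is what keeps (4) from
unmapping part of the range and then failing halfway through.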

Thanks,
Jean


> IMHO as long as we make sure all the mappings in the range of an UNMAP
> request are destroyed, then we are good. I think at least both vfio
> api and vt-d emulation have this assumption. But maybe I am wrong.
> Please correct me if so.
> 
>> +
>> +    if (mapping) {
>> +        error_report("****** %s: Unmap 0x%"PRIx64" size=0x%"PRIx64
>> +                     " from 0x%"PRIx64" size=0x%"PRIx64" is not supported",
>> +                     __func__, interval.low, size,
>> +                     mapping->virt_addr, mapping->size);
>> +    } else {
>> +        error_report("****** %s: no mapping for [0x%"PRIx64",0x%"PRIx64"]",
>> +                     __func__, interval.low, interval.high);
>> +    }
>> +
>> +    return VIRTIO_IOMMU_S_INVAL;
>>  }
> 
> Thanks,
> 

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-14  7:20                     ` Tian, Kevin
@ 2017-07-14 11:25                       ` Jean-Philippe Brucker
  2017-07-17  2:20                         ` Tian, Kevin
  0 siblings, 1 reply; 73+ messages in thread
From: Jean-Philippe Brucker @ 2017-07-14 11:25 UTC (permalink / raw)
  To: Tian, Kevin, Bharat Bhushan, Auger Eric, eric.auger.pro,
	peter.maydell, alex.williamson, mst, qemu-arm, qemu-devel
  Cc: wei, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

On 14/07/17 08:20, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>> Sent: Friday, July 7, 2017 11:15 PM
>>
>> On 07/07/17 07:21, Tian, Kevin wrote:
>>> sorry I didn't quite get this part, and here is my understanding:
>>>
>>> Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA
>>> of doorbell register of virtual irqchip. vIOMMU then
>>> triggers VFIO map/unmap to update physical IOMMU page
>>> table for gIOVA -> HPA of real doorbell of physical irqchip
>>
>> At the moment (non-SVM), physical and virtual MSI doorbell are completely
>> dissociated. VFIO itself maps the doorbell GPA->HPA during container
>> initialization. The GPA, chosen arbitrarily by the host, is then removed
>> from the guest GPA space.
> 
> got you. I also got some basic understanding from below link. :-)
> 
> https://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
> 
>>
>> When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip
>> doorbell, I suppose Qemu will notice that the GPA doesn't correspond to
>> RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.
>>
>> (For SVM I don't want to go into the details just now, but we will
>> probably need a separate VFIO mechanism to update the physical MSI-X
>> tables with whatever gIOVA the guest mapped in its private stage-1 page
>> tables.)
> 
> I guess there may be either a terminology difference or a hardware
> difference here, since I noted you mentioned IOVA with stage-1
> multiple times.
> 
> For Intel VT-d:
> 
> - stage-1 is only for VA translation, tagged with PASID
> - stage-2 can be used for IOVA translation on bare metal or GPA/gIOVA
> translation in virtualization, w/o PASID tagged

The terminology is indeed a bit confusing, and the hardware slightly
different. For me IOVA is the address used as input of the pIOMMU, PA is
the output address, and GPA only exists if there is stage-1 + stage-2. So
I think what I meant by gIOVA above was VA in your description.

I understand your "stage-1" and "stage-2" are named "first-level" and
"second-level" in the VT-d spec?

If I read the VT-d spec correctly, I think the main difference on ARM SMMU
is that stage-2 always follows stage-1 translation, but either stage may
be disabled (or both, for bypass mode). There is no mode like in VT-d,
where non-PASID transactions go only through stage-2 and PASID
transactions go only through stage-1. I believe this is (NESTE=0,
T=000b/001b) in the Extended-Context-Entry.

Something equivalent in SMMU is disabling stage-2 and using the entry 0 in
the PASID table for non-PASID traffic. In this mode, traffic that uses
PASID#0 would be aborted. So using your terms, the SMMU can have VAs and
IOVAs be translated by stage-1 and then, if enabled, be translated by
stage-2 as well.
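
To make the description above concrete, here is a toy model of that one
mode only. It is a sketch under loud assumptions: a fixed offset stands
in for each page-table walk, and struct smmu_model, struct txn and
smmu_translate() are hypothetical names, not SMMU or QEMU definitions.
Stage-2 always follows stage-1, either stage may be disabled, entry 0
serves non-PASID traffic, and a transaction tagged with PASID#0 is
aborted.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CONTEXTS 4

/* One translation stage; the offset replaces a real page-table walk. */
struct stage {
    bool enabled;
    uint64_t offset;
};

struct smmu_model {
    struct stage s1[NUM_CONTEXTS]; /* stage-1 contexts, indexed by PASID */
    struct stage s2;               /* single stage-2, applied afterwards */
};

struct txn {
    uint64_t addr;
    bool has_pasid;
    uint32_t pasid;
};

/* A disabled stage passes the address through unchanged. */
static uint64_t stage_translate(const struct stage *st, uint64_t in)
{
    return st->enabled ? in + st->offset : in;
}

/* Returns false when the transaction is aborted. */
static bool smmu_translate(const struct smmu_model *m,
                           const struct txn *t, uint64_t *out)
{
    /* Non-PASID traffic uses entry 0 of the stage-1 context table. */
    uint32_t ctx = t->has_pasid ? t->pasid : 0;

    if (ctx >= NUM_CONTEXTS) {
        return false;
    }
    /* Entry 0 is taken by non-PASID traffic, so PASID#0 is aborted. */
    if (t->has_pasid && t->pasid == 0) {
        return false;
    }
    /* Stage-1 first, then stage-2 on its output. */
    *out = stage_translate(&m->s2, stage_translate(&m->s1[ctx], t->addr));
    return true;
}
```

With both stages disabled the model degenerates to bypass: the output
address equals the input address.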

Thanks,
Jean

> Does ARM SMMU allow stage-1 used for both VA and IOVA? IIRC
> you said PASID#0 reserved for traffic w/o PASID in some mail...
>
>>> (assume your irqchip will provide multiple doorbells so each
>>> device can have its own channel).
>>
>> In existing irqchips the doorbell is shared by endpoints, which are
>> differentiated by their device ID (generally the BDF). I'm not sure why
>> this matters here?
> 
> Not matter now with device ID
> 
>>
>>> then once this update is
>>> done, later MSI interrupts from assigned device will go
>>> through physical IOMMU (gIOVA->HPA) then reach irqchip
>>> for irq remapping. vIOMMU is involved only in configuration
>>> path instead of actual interrupt path.
>>
>> Yes the vIOMMU is used to correlate the IOVA written by the guest in its
>> virtual MSI-X table with the MAP request received by the vIOMMU. That is
>> probably used to setup IRQFD routes with KVM. But the vIOMMU is not
>> involved further than that in MSIs.
>>
>>> If my understanding is correct, above will be the natural flow then
>>> why is additional virtio-iommu change required? :-)
>>
>> The change is not *required* for ARM systems, I only proposed removing the
>> doorbell address translation stage to make host implementation simpler
>> (and since virtio-iommu on x86 won't translate the doorbell anyway, we
>> have to add support for this to virtio-iommu). But for Qemu, since vSMMU
>> needs to implement the natural flow anyway, it might not be a lot of
>> effort to also do it for virtio-iommu. Other implementations (e.g.
>> kvmtool) might piggy-back on the x86 way and declare the irqchip doorbell
>> as untranslated.
>>
>> My proposal also breaks when confronted to virtual SVM in a physical ARM
>> system, where the guest owns stage-1 page tables and *has* to map the
>> doorbell if it wants MSIs to work, so you can disregard it :)
>>
> 
> It is a good learning. thanks.
> 

* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-07-14  6:40     ` Bharat Bhushan
@ 2017-07-17  1:28       ` Peter Xu
  2017-07-31 13:08         ` Auger Eric
  0 siblings, 1 reply; 73+ messages in thread
From: Peter Xu @ 2017-07-17  1:28 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: Eric Auger, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, jean-philippe.brucker, wei, kevin.tian,
	marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

On Fri, Jul 14, 2017 at 06:40:34AM +0000, Bharat Bhushan wrote:
> Hi Peter,
> 
> > -----Original Message-----
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Friday, July 14, 2017 7:48 AM
> > To: Eric Auger <eric.auger@redhat.com>
> > Cc: eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> > alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
> > qemu-devel@nongnu.org; jean-philippe.brucker@arm.com;
> > wei@redhat.com; kevin.tian@intel.com; Bharat Bhushan
> > <bharat.bhushan@nxp.com>; marc.zyngier@arm.com; tn@semihalf.com;
> > will.deacon@arm.com; drjones@redhat.com; robin.murphy@arm.com;
> > christoffer.dall@linaro.org
> > Subject: Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the
> > translation and commands
> > 
> > On Wed, Jun 07, 2017 at 06:01:25PM +0200, Eric Auger wrote:
> > > This patch adds the actual implementation for the translation routine
> > > and the virtio-iommu commands.
> > >
> > > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > 
> > [...]
> > 
> > >  static int virtio_iommu_attach(VirtIOIOMMU *s,
> > >                                 struct virtio_iommu_req_attach *req)
> > > @@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
> > >      uint32_t asid = le32_to_cpu(req->address_space);
> > >      uint32_t devid = le32_to_cpu(req->device);
> > >      uint32_t reserved = le32_to_cpu(req->reserved);
> > > +    viommu_as *as;
> > > +    viommu_dev *dev;
> > >
> > >      trace_virtio_iommu_attach(asid, devid, reserved);
> > >
> > > -    return VIRTIO_IOMMU_S_UNSUPP;
> > > +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
> > > +    if (dev) {
> > > +        return -1;
> > > +    }
> > > +
> > > +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
> > > +    if (!as) {
> > > +        as = g_malloc0(sizeof(*as));
> > > +        as->id = asid;
> > > +        as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
> > > +                                         NULL, NULL, (GDestroyNotify)g_free);
> > > +        g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
> > > +        trace_virtio_iommu_new_asid(asid);
> > > +    }
> > > +
> > > +    dev = g_malloc0(sizeof(*dev));
> > > +    dev->as = as;
> > > +    dev->id = devid;
> > > +    as->nr_devices++;
> > > +    trace_virtio_iommu_new_devid(devid);
> > > +    g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
> > 
> > Here do we need to record something like a refcount for address space?
> > Since...
> 
> We are using "nr_devices" as number of devices attached to an address-space
> 
> > 
> > > +
> > > +    return VIRTIO_IOMMU_S_OK;
> > >  }
> > >
> > >  static int virtio_iommu_detach(VirtIOIOMMU *s,
> > > @@ -106,10 +170,13 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
> > >  {
> > >      uint32_t devid = le32_to_cpu(req->device);
> > >      uint32_t reserved = le32_to_cpu(req->reserved);
> > > +    int ret;
> > >
> > >      trace_virtio_iommu_detach(devid, reserved);
> > >
> > > -    return VIRTIO_IOMMU_S_UNSUPP;
> > > +    ret = g_tree_remove(s->devices, GUINT_TO_POINTER(devid));
> > 
> > ... here when detach, imho we should check the refcount: if there is no
> > device using specific address space, we should release the address space as
> > well.
> > 
> > Otherwise there would be no way to destroy an address space?
> 
> 
> Here if nr_devices == 0 then release the address space, is that ok? 
> 
> This is how I implemented it as part of VFIO integration over this patch series.
> 	"[RFC PATCH 2/2] virtio-iommu: vfio integration with virtio-iommu"

Sorry, I hadn't read that when posting. That is what I meant. Thanks,

-- 
Peter Xu

* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-07-14 11:25     ` Jean-Philippe Brucker
@ 2017-07-17  1:37       ` Peter Xu
  0 siblings, 0 replies; 73+ messages in thread
From: Peter Xu @ 2017-07-17  1:37 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Eric Auger, eric.auger.pro, peter.maydell, alex.williamson, mst,
	qemu-arm, qemu-devel, wei, kevin.tian, bharat.bhushan,
	marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

On Fri, Jul 14, 2017 at 12:25:13PM +0100, Jean-Philippe Brucker wrote:
> Hi Peter,
> 
> On 14/07/17 03:17, Peter Xu wrote:
> >
> > [...]
> > 
> >>  static int virtio_iommu_unmap(VirtIOIOMMU *s,
> >> @@ -133,10 +227,64 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
> >>      uint64_t virt_addr = le64_to_cpu(req->virt_addr);
> >>      uint64_t size = le64_to_cpu(req->size);
> >>      uint32_t flags = le32_to_cpu(req->flags);
> >> +    viommu_mapping *mapping;
> >> +    viommu_interval interval;
> >> +    viommu_as *as;
> >>  
> >>      trace_virtio_iommu_unmap(asid, virt_addr, size, flags);
> >>  
> >> -    return VIRTIO_IOMMU_S_UNSUPP;
> >> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
> >> +    if (!as) {
> >> +        error_report("%s: no as", __func__);
> >> +        return VIRTIO_IOMMU_S_INVAL;
> >> +    }
> >> +    interval.low = virt_addr;
> >> +    interval.high = virt_addr + size - 1;
> >> +
> >> +    mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
> >> +
> >> +    while (mapping) {
> >> +        viommu_interval current;
> >> +        uint64_t low  = mapping->virt_addr;
> >> +        uint64_t high = mapping->virt_addr + mapping->size - 1;
> >> +
> >> +        current.low = low;
> >> +        current.high = high;
> >> +
> >> +        if (low == interval.low && size >= mapping->size) {
> >> +            g_tree_remove(as->mappings, (gpointer)&current);
> >> +            interval.low = high + 1;
> >> +            trace_virtio_iommu_unmap_left_interval(current.low, current.high,
> >> +                interval.low, interval.high);
> >> +        } else if (high == interval.high && size >= mapping->size) {
> >> +            trace_virtio_iommu_unmap_right_interval(current.low, current.high,
> >> +                interval.low, interval.high);
> >> +            g_tree_remove(as->mappings, (gpointer)&current);
> >> +            interval.high = low - 1;
> >> +        } else if (low > interval.low && high < interval.high) {
> >> +            trace_virtio_iommu_unmap_inc_interval(current.low, current.high);
> >> +            g_tree_remove(as->mappings, (gpointer)&current);
> >> +        } else {
> >> +            break;
> >> +        }
> >> +        if (interval.low >= interval.high) {
> >> +            return VIRTIO_IOMMU_S_OK;
> >> +        } else {
> >> +            mapping = g_tree_lookup(as->mappings, (gpointer)&interval);
> >> +        }
> >> +    }
> > 
> > IIUC for unmap here we are very strict - an extreme case is that when
> > an address space is just created (so no mapping inside), if we send
> > one UNMAP to a range, it'll fail currently, right? Should we loosen
> > it?
> 
> In the next specification version I'd like to slightly redefine unmap
> semantics (to make them more deterministic). Unmapping a range that is
> only partially mapped or not mapped at all would not return an error, and
> would unmap everything that is covered by the range.
> 
> Note that unmapping a partial range (splitting a single mapping) is still
> forbidden. The following pseudocode sequences attempt to illustrate the
> rules I'd like to set. There are no mappings in the address space prior to
> each sequence.
> 
> (1) unmap(addr=0, size=5)        -> succeeds, doesn't unmap anything
> 
> (2) a = map(addr=0, size=10);
>     unmap(0, 10)                 -> succeeds, unmaps a
> 
> (3) a = map(0, 5);
>     b = map(5, 5);
>     unmap(0, 10)                 -> succeeds, unmaps a and b
> 
> (4) a = map(0, 10);
>     unmap(0, 5)                  -> faults, doesn't unmap anything
> 
> (5) a = map(0, 5);
>     b = map(5, 5);
>     unmap(0, 5)                  -> succeeds, unmaps a
> 
> (6) a = map(0, 5);
>     unmap(0, 10)                 -> succeeds, unmaps a
> 
> (7) a = map(0, 5);
>     b = map(10, 5);
>     unmap(0, 15)                 -> succeeds, unmaps a and b
> 
> Previously I left (1), (6) and (7) as an implementation choice. The device
> could either succeed and unmap, or fail and not unmap. This means that if
> a driver wanted guarantees, it had to issue strict map/unmap sequences.
> 
> I believe that VFIO type1 v2 has these semantics, but "v1" allowed (4) to
> succeed and unmap a.

Thanks Jean. It looks good.

Actually I asked mostly for (7). IMHO it is really helpful sometimes
to allow the upper layer to release the things it holds in some easy
way.

-- 
Peter Xu

* Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
  2017-07-14 11:25                       ` Jean-Philippe Brucker
@ 2017-07-17  2:20                         ` Tian, Kevin
  0 siblings, 0 replies; 73+ messages in thread
From: Tian, Kevin @ 2017-07-17  2:20 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Bharat Bhushan, Auger Eric,
	eric.auger.pro, peter.maydell, alex.williamson, mst, qemu-arm,
	qemu-devel
  Cc: wei, marc.zyngier, tn, will.deacon, drjones, robin.murphy,
	christoffer.dall

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Friday, July 14, 2017 7:26 PM
> 
> On 14/07/17 08:20, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> >> Sent: Friday, July 7, 2017 11:15 PM
> >>
> >> On 07/07/17 07:21, Tian, Kevin wrote:
> >>> sorry I didn't quite get this part, and here is my understanding:
> >>>
> >>> Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA
> >>> of doorbell register of virtual irqchip. vIOMMU then
> >>> triggers VFIO map/unmap to update physical IOMMU page
> >>> table for gIOVA -> HPA of real doorbell of physical irqchip
> >>
> >> At the moment (non-SVM), physical and virtual MSI doorbell are completely
> >> dissociated. VFIO itself maps the doorbell GPA->HPA during container
> >> initialization. The GPA, chosen arbitrarily by the host, is then removed
> >> from the guest GPA space.
> >
> > got you. I also got some basic understanding from below link. :-)
> >
> > https://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
> >
> >>
> >> When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip
> >> doorbell, I suppose Qemu will notice that the GPA doesn't correspond to
> >> RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.
> >>
> >> (For SVM I don't want to go into the details just now, but we will
> >> probably need a separate VFIO mechanism to update the physical MSI-X
> >> tables with whatever gIOVA the guest mapped in its private stage-1 page
> >> tables.)
> >
> > I guess there may be either a terminology difference or a hardware
> > difference here, since I noted you mentioned IOVA with stage-1
> > multiple times.
> >
> > For Intel VT-d:
> >
> > - stage-1 is only for VA translation, tagged with PASID
> > - stage-2 can be used for IOVA translation on bare metal or GPA/gIOVA
> > translation in virtualization, w/o PASID tagged
> 
> The terminology is indeed a bit confusing, and the hardware slightly
> different. For me IOVA is the address used as input of the pIOMMU, PA is
> the output address, and GPA only exists if there is stage-1 + stage-2. So
> I think what I meant by gIOVA above was VA in your description.

In the Linux kernel, IOVA specifically refers to a pseudo address space
remapped to PA (e.g. from pci_map), while VA is a real CPU virtual
address (so-called SVM). Either IOVA or VA can be the input to the pIOMMU,
depending on the usage. When running inside a VM, the input addresses
become gIOVA or GVA. What about following this convention here and in
future discussions, though I agree that conceptually IOVA can represent
any input of the pIOMMU? :-)

> 
> I understand your "stage-1" and "stage-2" are named "first-level" and
> "second level" in the VT-d spec?

Yes, VT-d uses first/second level.

> 
> If I read the VT-d spec correctly, I think the main difference on ARM SMMU
> is that stage-2 always follows stage-1 translation, but either stage may
> be disabled (or both, for bypass mode). There is no mode like in VT-d,
> where non-PASID transactions go only through stage-2 and PASID
> transactions go only through stage-1. I believe this is (NESTE=0,
> T=000b/001b) in the Extended-Context-Entry.
> 
> Something equivalent in SMMU is disabling stage-2 and using the entry 0 in
> the PASID table for non-PASID traffic. In this mode, traffic that uses
> PASID#0 would be aborted. So using your terms, the SMMU can have VAs and
> IOVAs be translated by stage-1 and then, if enabled, be translated by
> stage-2 as well.
> 

Clear to me. Thanks for the explanation.

Kevin

* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-07-17  1:28       ` Peter Xu
@ 2017-07-31 13:08         ` Auger Eric
  2017-08-03 10:48           ` Bharat Bhushan
  0 siblings, 1 reply; 73+ messages in thread
From: Auger Eric @ 2017-07-31 13:08 UTC (permalink / raw)
  To: Peter Xu, Bharat Bhushan
  Cc: wei, peter.maydell, kevin.tian, drjones, mst,
	jean-philippe.brucker, tn, will.deacon, qemu-devel,
	alex.williamson, qemu-arm, marc.zyngier, robin.murphy,
	christoffer.dall, eric.auger.pro

Hi Peter, Bharat,

On 17/07/2017 03:28, Peter Xu wrote:
> On Fri, Jul 14, 2017 at 06:40:34AM +0000, Bharat Bhushan wrote:
>> Hi Peter,
>>
>>> -----Original Message-----
>>> From: Peter Xu [mailto:peterx@redhat.com]
>>> Sent: Friday, July 14, 2017 7:48 AM
>>> To: Eric Auger <eric.auger@redhat.com>
>>> Cc: eric.auger.pro@gmail.com; peter.maydell@linaro.org;
>>> alex.williamson@redhat.com; mst@redhat.com; qemu-arm@nongnu.org;
>>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com;
>>> wei@redhat.com; kevin.tian@intel.com; Bharat Bhushan
>>> <bharat.bhushan@nxp.com>; marc.zyngier@arm.com; tn@semihalf.com;
>>> will.deacon@arm.com; drjones@redhat.com; robin.murphy@arm.com;
>>> christoffer.dall@linaro.org
>>> Subject: Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the
>>> translation and commands
>>>
>>> On Wed, Jun 07, 2017 at 06:01:25PM +0200, Eric Auger wrote:
>>>> This patch adds the actual implementation for the translation routine
>>>> and the virtio-iommu commands.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> [...]
>>>
>>>>  static int virtio_iommu_attach(VirtIOIOMMU *s,
>>>>                                 struct virtio_iommu_req_attach *req)
>>>> @@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>>>>      uint32_t asid = le32_to_cpu(req->address_space);
>>>>      uint32_t devid = le32_to_cpu(req->device);
>>>>      uint32_t reserved = le32_to_cpu(req->reserved);
>>>> +    viommu_as *as;
>>>> +    viommu_dev *dev;
>>>>
>>>>      trace_virtio_iommu_attach(asid, devid, reserved);
>>>>
>>>> -    return VIRTIO_IOMMU_S_UNSUPP;
>>>> +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
>>>> +    if (dev) {
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
>>>> +    if (!as) {
>>>> +        as = g_malloc0(sizeof(*as));
>>>> +        as->id = asid;
>>>> +        as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
>>>> +                                         NULL, NULL, (GDestroyNotify)g_free);
>>>> +        g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
>>>> +        trace_virtio_iommu_new_asid(asid);
>>>> +    }
>>>> +
>>>> +    dev = g_malloc0(sizeof(*dev));
>>>> +    dev->as = as;
>>>> +    dev->id = devid;
>>>> +    as->nr_devices++;
>>>> +    trace_virtio_iommu_new_devid(devid);
>>>> +    g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
>>>
>>> Here do we need to record something like a refcount for address space?
>>> Since...
>>
>> We are using "nr_devices" as number of devices attached to an address-space
>>
>>>
>>>> +
>>>> +    return VIRTIO_IOMMU_S_OK;
>>>>  }
>>>>
>>>>  static int virtio_iommu_detach(VirtIOIOMMU *s,
>>>> @@ -106,10 +170,13 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
>>>>  {
>>>>      uint32_t devid = le32_to_cpu(req->device);
>>>>      uint32_t reserved = le32_to_cpu(req->reserved);
>>>> +    int ret;
>>>>
>>>>      trace_virtio_iommu_detach(devid, reserved);
>>>>
>>>> -    return VIRTIO_IOMMU_S_UNSUPP;
>>>> +    ret = g_tree_remove(s->devices, GUINT_TO_POINTER(devid));
>>>
>>> ... here when detach, imho we should check the refcount: if there is no
>>> device using specific address space, we should release the address space as
>>> well.
>>>
>>> Otherwise there would be no way to destroy an address space?
>>
>>
>> Here if nr_devices == 0 then release the address space, is that ok? 
>>
>> This is how I implemented it as part of VFIO integration over this patch series.
>> 	"[RFC PATCH 2/2] virtio-iommu: vfio integration with virtio-iommu"

glib provides g_tree_ref/g_tree_unref. I think the most elegant approach
is to call g_tree_ref when adding a device and g_tree_unref when removing
one, as suggested by Peter.

"if the reference count drops to 0, all keys and values will be
destroyed (if destroy functions were specified) and all memory allocated
by tree will be released."

That way we should be able to remove nr_devices from viommu_as.
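
The lifetime rule can be sketched without glib as follows. This is only
a model of the idea: the plain counter stands in for the reference count
that g_tree_ref()/g_tree_unref() maintain on the mappings tree, and
struct toy_as, toy_as_get() and toy_as_put() are hypothetical names, not
part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy address space; refcount plays the role of the GTree refcount. */
struct toy_as {
    uint32_t id;
    unsigned int refcount;
};

/*
 * Attach path: create the address space for the first device (one
 * reference held), or take an extra reference for each further device.
 * Returns NULL only if allocation fails.
 */
static struct toy_as *toy_as_get(struct toy_as *as, uint32_t id)
{
    if (!as) {
        as = calloc(1, sizeof(*as));
        if (!as) {
            return NULL;
        }
        as->id = id;
        as->refcount = 1;
        return as;
    }
    as->refcount++;
    return as;
}

/*
 * Detach path: drop one reference. When the last device goes away the
 * address space is released; returns true in that case, and the caller
 * must not touch the pointer afterwards.
 */
static bool toy_as_put(struct toy_as *as)
{
    if (--as->refcount == 0) {
        free(as);
        return true;
    }
    return false;
}
```

With this shape the detach handler does not need a separate nr_devices
counter: the last put both detects that no device is left and frees the
address space.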

Bharat, if this breaks your implementation I will let you re-introduce
nr_devices as part of the VFIO patch.

Thanks

Eric
> 
> Sorry I didn't read that when posting. It is what I mean.  Thanks,
> 

* Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands
  2017-07-31 13:08         ` Auger Eric
@ 2017-08-03 10:48           ` Bharat Bhushan
  0 siblings, 0 replies; 73+ messages in thread
From: Bharat Bhushan @ 2017-08-03 10:48 UTC (permalink / raw)
  To: Auger Eric, Peter Xu
  Cc: wei, peter.maydell, kevin.tian, drjones, mst,
	jean-philippe.brucker, tn, will.deacon, qemu-devel,
	alex.williamson, qemu-arm, marc.zyngier, robin.murphy,
	christoffer.dall, eric.auger.pro

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: Monday, July 31, 2017 6:38 PM
> To: Peter Xu <peterx@redhat.com>; Bharat Bhushan
> <bharat.bhushan@nxp.com>
> Cc: wei@redhat.com; peter.maydell@linaro.org; kevin.tian@intel.com;
> drjones@redhat.com; mst@redhat.com; jean-philippe.brucker@arm.com;
> tn@semihalf.com; will.deacon@arm.com; qemu-devel@nongnu.org;
> alex.williamson@redhat.com; qemu-arm@nongnu.org;
> marc.zyngier@arm.com; robin.murphy@arm.com;
> christoffer.dall@linaro.org; eric.auger.pro@gmail.com
> Subject: Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the
> translation and commands
> 
> Hi Peter, Bharat,
> 
> On 17/07/2017 03:28, Peter Xu wrote:
> > On Fri, Jul 14, 2017 at 06:40:34AM +0000, Bharat Bhushan wrote:
> >> Hi Peter,
> >>
> >>> -----Original Message-----
> >>> From: Peter Xu [mailto:peterx@redhat.com]
> >>> Sent: Friday, July 14, 2017 7:48 AM
> >>> To: Eric Auger <eric.auger@redhat.com>
> >>> Cc: eric.auger.pro@gmail.com; peter.maydell@linaro.org;
> >>> alex.williamson@redhat.com; mst@redhat.com; qemu-
> arm@nongnu.org;
> >>> qemu-devel@nongnu.org; jean-philippe.brucker@arm.com;
> >>> wei@redhat.com; kevin.tian@intel.com; Bharat Bhushan
> >>> <bharat.bhushan@nxp.com>; marc.zyngier@arm.com;
> tn@semihalf.com;
> >>> will.deacon@arm.com; drjones@redhat.com; robin.murphy@arm.com;
> >>> christoffer.dall@linaro.org
> >>> Subject: Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the
> >>> translation and commands
> >>>
> >>> On Wed, Jun 07, 2017 at 06:01:25PM +0200, Eric Auger wrote:
> >>>> This patch adds the actual implementation for the translation
> >>>> routine and the virtio-iommu commands.
> >>>>
> >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>
> >>> [...]
> >>>
> >>>>  static int virtio_iommu_attach(VirtIOIOMMU *s,
> >>>>                                 struct virtio_iommu_req_attach
> >>>> *req) @@ -95,10 +135,34 @@ static int
> virtio_iommu_attach(VirtIOIOMMU *s,
> >>>>      uint32_t asid = le32_to_cpu(req->address_space);
> >>>>      uint32_t devid = le32_to_cpu(req->device);
> >>>>      uint32_t reserved = le32_to_cpu(req->reserved);
> >>>> +    viommu_as *as;
> >>>> +    viommu_dev *dev;
> >>>>
> >>>>      trace_virtio_iommu_attach(asid, devid, reserved);
> >>>>
> >>>> -    return VIRTIO_IOMMU_S_UNSUPP;
> >>>> +    dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
> >>>> +    if (dev) {
> >>>> +        return -1;
> >>>> +    }
> >>>> +
> >>>> +    as = g_tree_lookup(s->address_spaces,
> GUINT_TO_POINTER(asid));
> >>>> +    if (!as) {
> >>>> +        as = g_malloc0(sizeof(*as));
> >>>> +        as->id = asid;
> >>>> +        as->mappings =
> g_tree_new_full((GCompareDataFunc)interval_cmp,
> >>>> +                                         NULL, NULL, (GDestroyNotify)g_free);
> >>>> +        g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
> >>>> +        trace_virtio_iommu_new_asid(asid);
> >>>> +    }
> >>>> +
> >>>> +    dev = g_malloc0(sizeof(*dev));
> >>>> +    dev->as = as;
> >>>> +    dev->id = devid;
> >>>> +    as->nr_devices++;
> >>>> +    trace_virtio_iommu_new_devid(devid);
> >>>> +    g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
> >>>
> >>> Here do we need to record something like a refcount for address space?
> >>> Since...
> >>
> >> We are using "nr_devices" as number of devices attached to an
> >> address-space
> >>
> >>>
> >>>> +
> >>>> +    return VIRTIO_IOMMU_S_OK;
> >>>>  }
> >>>>
> >>>>  static int virtio_iommu_detach(VirtIOIOMMU *s, @@ -106,10 +170,13
> >>>> @@ static int virtio_iommu_detach(VirtIOIOMMU *s,  {
> >>>>      uint32_t devid = le32_to_cpu(req->device);
> >>>>      uint32_t reserved = le32_to_cpu(req->reserved);
> >>>> +    int ret;
> >>>>
> >>>>      trace_virtio_iommu_detach(devid, reserved);
> >>>>
> >>>> -    return VIRTIO_IOMMU_S_UNSUPP;
> >>>> +    ret = g_tree_remove(s->devices, GUINT_TO_POINTER(devid));
> >>>
> >>> ... here when detach, imho we should check the refcount: if there is
> >>> no device using specific address space, we should release the
> >>> address space as well.
> >>>
> >>> Otherwise there would have no way to destroy an address space?
> >>
> >>
> >> Here if nr_devices == 0 then release the address space, is that ok?
> >>
> >> This is how I implemented as part of VFIO integration over this patch
> series.
> >> 	"[RFC PATCH 2/2] virtio-iommu: vfio integration with virtio-iommu"
> 
> glib provides g_tree_ref/g_tree_unref. I think the most elegant is to use
> g_tree_ref when adding a device and decrementing when removing a
> device, as suggested by Peter.
> 
> "if the reference count drops to 0, all keys and values will be destroyed (if
> destroy functions were specified) and all memory allocated by tree will be
> released."
> 
> That way we should be able to remove nr_devices from viommu_as
> 
> Bharat, if this breaks your implementation I will let you re-introduce
> nr_devices as part of the VFIO patch.

We need to unmap() everything in the physical IOMMU when the last device
is detached, and g_tree_ref()/g_tree_unref() does not seem to provide a
hook for this. I will go ahead and add nr_devices with the VFIO
integration patches.
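
To illustrate the point about the last detach, here is a minimal sketch,
not the patch itself. It assumes a simplified stand-in for viommu_as; the
names viommu_as_sketch, as_new, as_attach, as_detach and unmap_all are
hypothetical, and unmap_all only marks the place where the VFIO
integration would unmap from the physical IOMMU. An explicit nr_devices
counter gives a point at which to run that step before the address space
is freed:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for viommu_as; the real struct keeps its
 * mappings in a GTree, reduced here to a plain counter. */
typedef struct viommu_as_sketch {
    uint32_t id;
    unsigned nr_devices;   /* devices currently attached */
    unsigned nr_mappings;  /* placeholder for the mappings tree */
} viommu_as_sketch;

/* Counts unmap_all() calls so the behaviour can be observed. */
static unsigned unmap_all_calls;

/* Hypothetical hook: the VFIO integration would unmap every mapping
 * from the physical IOMMU here. */
static void unmap_all(viommu_as_sketch *as)
{
    as->nr_mappings = 0;
    unmap_all_calls++;
}

static viommu_as_sketch *as_new(uint32_t id)
{
    viommu_as_sketch *as = calloc(1, sizeof(*as));

    if (as) {
        as->id = id;
    }
    return as;
}

static void as_attach(viommu_as_sketch *as)
{
    as->nr_devices++;
}

/* Returns 1 if this was the last device and the address space was
 * released (the pointer must not be used afterwards), 0 otherwise. */
static int as_detach(viommu_as_sketch *as)
{
    assert(as->nr_devices > 0);
    if (--as->nr_devices > 0) {
        return 0;
    }
    unmap_all(as);  /* last device gone: unmap before freeing */
    free(as);
    return 1;
}
```

With two devices attached, the first as_detach() returns 0 and leaves the
mappings alone; the second returns 1 after calling unmap_all() once.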

Thanks
-Bharat

> 
> Thanks
> 
> Eric
> >
> > Sorry I didn't read that when posting. It is what I mean.  Thanks,
> >


Thread overview: 73+ messages
2017-06-07 16:01 [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Eric Auger
2017-06-07 16:01 ` [Qemu-devel] [RFC v2 1/8] update-linux-headers: import virtio_iommu.h Eric Auger
2017-06-07 16:01 ` [Qemu-devel] [RFC v2 2/8] linux-headers: Update for virtio-iommu Eric Auger
2017-06-07 16:01 ` [Qemu-devel] [RFC v2 3/8] virtio_iommu: add skeleton Eric Auger
2017-06-08 11:09   ` Bharat Bhushan
2017-06-23 16:08     ` Jean-Philippe Brucker
2017-06-07 16:01 ` [Qemu-devel] [RFC v2 4/8] virtio-iommu: Decode the command payload Eric Auger
2017-06-07 16:01 ` [Qemu-devel] [RFC v2 5/8] virtio_iommu: Add the iommu regions Eric Auger
2017-06-12  5:59   ` Bharat Bhushan
2017-06-07 16:01 ` [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands Eric Auger
2017-06-23 16:09   ` Jean-Philippe Brucker
2017-07-04  9:13   ` Bharat Bhushan
2017-07-05  6:40     ` Auger Eric
2017-07-14  2:17   ` Peter Xu
2017-07-14  6:40     ` Bharat Bhushan
2017-07-17  1:28       ` Peter Xu
2017-07-31 13:08         ` Auger Eric
2017-08-03 10:48           ` Bharat Bhushan
2017-07-14 11:25     ` Jean-Philippe Brucker
2017-07-17  1:37       ` Peter Xu
2017-06-07 16:01 ` [Qemu-devel] [RFC v2 7/8] hw/arm/virt: Add 2.10 machine type Eric Auger
2017-06-07 16:01 ` [Qemu-devel] [RFC v2 8/8] hw/arm/virt: Add virtio-iommu the virt board Eric Auger
2017-06-09  6:16 ` [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device Bharat Bhushan
2017-06-09  6:43   ` Auger Eric
2017-06-09 11:30     ` Bharat Bhushan
2017-06-09 11:53       ` Auger Eric
2017-06-19  7:54         ` Bharat Bhushan
2017-06-19 10:15           ` Jean-Philippe Brucker
2017-06-26  8:22             ` Auger Eric
2017-06-26 16:13               ` Jean-Philippe Brucker
2017-06-27  6:38                 ` Auger Eric
2017-06-27  8:46                   ` Will Deacon
2017-06-27  8:59                     ` Auger Eric
2017-07-05  7:25                 ` Tian, Kevin
2017-07-05 12:44                   ` Jean-Philippe Brucker
2017-07-05  7:14             ` Tian, Kevin
2017-07-05 12:44               ` Jean-Philippe Brucker
2017-07-07  6:21                 ` Tian, Kevin
2017-07-07 15:15                   ` Jean-Philippe Brucker
2017-07-14  7:20                     ` Tian, Kevin
2017-07-14 11:25                       ` Jean-Philippe Brucker
2017-07-17  2:20                         ` Tian, Kevin
2017-07-05  7:15             ` Bharat Bhushan
2017-06-26  7:54           ` Auger Eric
2017-07-05  8:23             ` Bharat Bhushan
2017-07-05  8:44               ` Auger Eric
2017-07-05  8:49                 ` Bharat Bhushan
2017-07-06 10:02                   ` Jean-Philippe Brucker
2017-07-06 11:24                     ` Bharat Bhushan
2017-07-06 11:55                       ` Jean-Philippe Brucker
2017-07-06 21:16                       ` Auger Eric
2017-07-06 23:23                         ` [Qemu-devel] [Qemu-arm] " Kalra, Ashish
2017-07-06 23:29                           ` Michael S. Tsirkin
2017-07-06 23:33                           ` Tian, Kevin
2017-07-07 15:14                             ` Jean-Philippe Brucker
2017-07-07 22:11                               ` Kalra, Ashish
2017-07-11 11:31                                 ` Jean-Philippe Brucker
2017-07-14  6:58                               ` Tian, Kevin
2017-07-07  6:25                         ` [Qemu-devel] " Bharat Bhushan
2017-07-07  7:25                           ` Auger Eric
2017-07-07 11:36                             ` Bharat Bhushan
2017-07-07 15:19                               ` Jean-Philippe Brucker
2017-07-11  5:54                                 ` Bharat Bhushan
2017-07-11 12:51                                   ` Jean-Philippe Brucker
2017-07-12  3:50                                     ` Bharat Bhushan
2017-07-12 10:18                                       ` Jean-Philippe Brucker
2017-07-12 10:27                                         ` Bharat Bhushan
2017-07-12 10:58                                           ` Jean-Philippe Brucker
2017-07-12 11:12                                             ` Bharat Bhushan
2017-07-06 21:11                     ` Auger Eric
2017-07-07  7:31                       ` Auger Eric
2017-07-07 15:20                       ` Jean-Philippe Brucker
2017-06-09 12:15       ` Auger Eric
