[Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration
@ 2010-03-03 17:15 Michael S. Tsirkin
  2010-03-03 17:15 ` [Qemu-devel] [PATCHv4 01/12] tap: add interface to get device fd Michael S. Tsirkin
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:15 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

Here's a patchset with vhost support for upstream qemu,
rebased to latest bits, and with all comments I'm aware of
addressed.

Please consider for merging.  Anthony, if you are still deliberating
some issues, maybe the series can be merged partially?  This will at
least reduce the amount of noise from reposting the large patchset.

Changes from v3:
  vhost: vhost net support: use typedef instead of struct name
  virtio: add set_status callback: fix up non-PCI bindings

Changes from v2:
  Addressed style comments
  Detect mapping changes and abort
  Unmap ring on cleanup

Changes from v1:
  Addressed style comments
  Migration fixes.
  Gracefully fail with non-tap backends.

Michael S. Tsirkin (12):
  tap: add interface to get device fd
  kvm: add API to set ioeventfd
  notifier: event notifier implementation
  virtio: add notifier support
  virtio: add APIs for queue fields
  virtio: add set_status callback
  virtio: move typedef to qemu-common
  virtio-pci: fill in notifier support
  vhost: vhost net support
  tap: add vhost/vhostfd options
  tap: add API to retrieve vhost net header
  virtio-net: vhost net support

 Makefile.target      |    3 +
 configure            |   36 +++
 hw/notifier.c        |   62 +++++
 hw/notifier.h        |   16 ++
 hw/s390-virtio-bus.c |    2 +-
 hw/syborg_virtio.c   |    2 +-
 hw/vhost.c           |  706 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/vhost.h           |   48 ++++
 hw/vhost_net.c       |  198 ++++++++++++++
 hw/vhost_net.h       |   19 ++
 hw/virtio-net.c      |   71 +++++-
 hw/virtio-pci.c      |   67 +++++-
 hw/virtio.c          |   81 ++++++-
 hw/virtio.h          |   28 ++-
 kvm-all.c            |   22 ++
 kvm.h                |   16 ++
 net.c                |    8 +
 net/tap.c            |   43 +++
 net/tap.h            |    5 +
 qemu-common.h        |    2 +
 qemu-options.hx      |    4 +-
 21 files changed, 1429 insertions(+), 10 deletions(-)
 create mode 100644 hw/notifier.c
 create mode 100644 hw/notifier.h
 create mode 100644 hw/vhost.c
 create mode 100644 hw/vhost.h
 create mode 100644 hw/vhost_net.c
 create mode 100644 hw/vhost_net.h

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 01/12] tap: add interface to get device fd
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
@ 2010-03-03 17:15 ` Michael S. Tsirkin
  2010-03-03 17:15 ` [Qemu-devel] [PATCHv4 02/12] kvm: add API to set ioeventfd Michael S. Tsirkin
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:15 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

Will be used by vhost to attach/detach to backend.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/tap.c |    7 +++++++
 net/tap.h |    2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 7a7320c..fc59fd4 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -269,6 +269,13 @@ static void tap_poll(VLANClientState *nc, bool enable)
     tap_write_poll(s, enable);
 }
 
+int tap_get_fd(VLANClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    assert(nc->info->type == NET_CLIENT_TYPE_TAP);
+    return s->fd;
+}
+
 /* fd support */
 
 static NetClientInfo net_tap_info = {
diff --git a/net/tap.h b/net/tap.h
index 538a562..a244b28 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -48,4 +48,6 @@ int tap_probe_vnet_hdr(int fd);
 int tap_probe_has_ufo(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo);
 
+int tap_get_fd(VLANClientState *vc);
+
 #endif /* QEMU_NET_TAP_H */
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 02/12] kvm: add API to set ioeventfd
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
  2010-03-03 17:15 ` [Qemu-devel] [PATCHv4 01/12] tap: add interface to get device fd Michael S. Tsirkin
@ 2010-03-03 17:15 ` Michael S. Tsirkin
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 03/12] notifier: event notifier implementation Michael S. Tsirkin
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:15 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

Comment on kvm usage: rather than require users to do if (kvm_enabled())
and/or ifdefs, this patch adds an API that, internally, is defined to
stub function on non-kvm build, and checks kvm_enabled for non-kvm
run.

While rest of qemu code still uses if (kvm_enabled()), I think this
approach is cleaner, and we should convert rest of code to it
long term.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 kvm-all.c |   22 ++++++++++++++++++++++
 kvm.h     |   16 ++++++++++++++++
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 1a02076..f54af71 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1138,3 +1138,25 @@ int kvm_set_signal_mask(CPUState *env, const sigset_t *sigset)
 
     return r;
 }
+
+#ifdef KVM_IOEVENTFD
+int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
+{
+    struct kvm_ioeventfd kick = {
+        .datamatch = val,
+        .addr = addr,
+        .len = 2,
+        .flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO,
+        .fd = fd,
+    };
+    int r;
+    if (!kvm_enabled())
+        return -ENOSYS;
+    if (!assign)
+        kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
+    r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick);
+    if (r < 0)
+        return r;
+    return 0;
+}
+#endif
diff --git a/kvm.h b/kvm.h
index a74dfcb..45d087b 100644
--- a/kvm.h
+++ b/kvm.h
@@ -14,10 +14,16 @@
 #ifndef QEMU_KVM_H
 #define QEMU_KVM_H
 
+#include <stdbool.h>
+#include <errno.h>
 #include "config.h"
 #include "qemu-queue.h"
 
 #ifdef CONFIG_KVM
+#include <linux/kvm.h>
+#endif
+
+#ifdef CONFIG_KVM
 extern int kvm_allowed;
 
 #define kvm_enabled() (kvm_allowed)
@@ -135,4 +141,14 @@ static inline void cpu_synchronize_state(CPUState *env)
     }
 }
 
+#if defined(KVM_IOEVENTFD) && defined(CONFIG_KVM)
+int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign);
+#else
+static inline
+int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign)
+{
+    return -ENOSYS;
+}
+#endif
+
 #endif
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 03/12] notifier: event notifier implementation
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
  2010-03-03 17:15 ` [Qemu-devel] [PATCHv4 01/12] tap: add interface to get device fd Michael S. Tsirkin
  2010-03-03 17:15 ` [Qemu-devel] [PATCHv4 02/12] kvm: add API to set ioeventfd Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  2010-03-05 12:03   ` [Qemu-devel] " Amit Shah
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 04/12] virtio: add notifier support Michael S. Tsirkin
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

event notifiers are slightly generalized eventfd descriptors. Current
implementation depends on eventfd because vhost is the only user, and
vhost depends on eventfd anyway, but a stub is provided for non-eventfd
case.

We'll be able to further generalize this when another user comes along
and we see how to best do this.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 Makefile.target |    1 +
 hw/notifier.c   |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/notifier.h   |   16 ++++++++++++++
 qemu-common.h   |    1 +
 4 files changed, 80 insertions(+), 0 deletions(-)
 create mode 100644 hw/notifier.c
 create mode 100644 hw/notifier.h

diff --git a/Makefile.target b/Makefile.target
index 320f807..1f5aa9d 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -172,6 +172,7 @@ obj-y = vl.o async.o monitor.o pci.o pci_host.o pcie_host.o machine.o gdbstub.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
 obj-y += virtio-blk.o virtio-balloon.o virtio-net.o virtio-pci.o virtio-serial-bus.o
+obj-y += notifier.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
diff --git a/hw/notifier.c b/hw/notifier.c
new file mode 100644
index 0000000..c6db3c3
--- /dev/null
+++ b/hw/notifier.c
@@ -0,0 +1,62 @@
+/*
+ * event notifier support
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "hw.h"
+#include "notifier.h"
+#ifdef CONFIG_EVENTFD
+#include <sys/eventfd.h>
+#endif
+
+int event_notifier_init(EventNotifier *e, int active)
+{
+#ifdef CONFIG_EVENTFD
+	int fd = eventfd(!!active, EFD_NONBLOCK | EFD_CLOEXEC);
+	if (fd < 0)
+		return -errno;
+	e->fd = fd;
+	return 0;
+#else
+	return -ENOSYS;
+#endif
+}
+
+void event_notifier_cleanup(EventNotifier *e)
+{
+	close(e->fd);
+}
+
+int event_notifier_get_fd(EventNotifier *e)
+{
+	return e->fd;
+}
+
+int event_notifier_test_and_clear(EventNotifier *e)
+{
+	uint64_t value;
+	int r = read(e->fd, &value, sizeof(value));
+	return r == sizeof(value);
+}
+
+int event_notifier_test(EventNotifier *e)
+{
+	uint64_t value;
+	int r = read(e->fd, &value, sizeof(value));
+	if (r == sizeof(value)) {
+		/* restore previous value. */
+		int s = write(e->fd, &value, sizeof(value));
+		/* never blocks because we use EFD_SEMAPHORE.
+		 * If we didn't we'd get EAGAIN on overflow
+		 * and we'd have to write code to ignore it. */
+		assert(s == sizeof(value));
+	}
+	return r == sizeof(value);
+}
diff --git a/hw/notifier.h b/hw/notifier.h
new file mode 100644
index 0000000..24117ea
--- /dev/null
+++ b/hw/notifier.h
@@ -0,0 +1,16 @@
+#ifndef QEMU_EVENT_NOTIFIER_H
+#define QEMU_EVENT_NOTIFIER_H
+
+#include "qemu-common.h"
+
+struct EventNotifier {
+	int fd;
+};
+
+int event_notifier_init(EventNotifier *, int active);
+void event_notifier_cleanup(EventNotifier *);
+int event_notifier_get_fd(EventNotifier *);
+int event_notifier_test_and_clear(EventNotifier *);
+int event_notifier_test(EventNotifier *);
+
+#endif
diff --git a/qemu-common.h b/qemu-common.h
index 805be1a..f12a8f5 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -227,6 +227,7 @@ typedef struct uWireSlave uWireSlave;
 typedef struct I2SCodec I2SCodec;
 typedef struct DeviceState DeviceState;
 typedef struct SSIBus SSIBus;
+typedef struct EventNotifier EventNotifier;
 
 typedef uint64_t pcibus_t;
 
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 04/12] virtio: add notifier support
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 03/12] notifier: event notifier implementation Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  2010-03-05 12:04   ` [Qemu-devel] " Amit Shah
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 05/12] virtio: add APIs for queue fields Michael S. Tsirkin
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

Add binding API to set host/guest notifiers.
Will be used by vhost.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio.c |    5 ++++-
 hw/virtio.h |    3 +++
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 7c020a3..1f5e7be 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -73,6 +73,7 @@ struct VirtQueue
     int inuse;
     uint16_t vector;
     void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
+    VirtIODevice *vdev;
 };
 
 /* virt queue functions */
@@ -714,8 +715,10 @@ VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
     vdev->queue_sel = 0;
     vdev->config_vector = VIRTIO_NO_VECTOR;
     vdev->vq = qemu_mallocz(sizeof(VirtQueue) * VIRTIO_PCI_QUEUE_MAX);
-    for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++)
+    for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
         vdev->vq[i].vector = VIRTIO_NO_VECTOR;
+        vdev->vq[i].vdev = vdev;
+    }
 
     vdev->name = name;
     vdev->config_len = config_size;
diff --git a/hw/virtio.h b/hw/virtio.h
index 3baa2a3..af87889 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -19,6 +19,7 @@
 #include "qdev.h"
 #include "sysemu.h"
 #include "block_int.h"
+#include "notifier.h"
 
 /* from Linux's linux/virtio_config.h */
 
@@ -89,6 +90,8 @@ typedef struct {
     int (*load_config)(void * opaque, QEMUFile *f);
     int (*load_queue)(void * opaque, int n, QEMUFile *f);
     unsigned (*get_features)(void * opaque);
+    int (*guest_notifier)(void * opaque, int n, bool assigned);
+    int (*host_notifier)(void * opaque, int n, bool assigned);
 } VirtIOBindings;
 
 #define VIRTIO_PCI_QUEUE_MAX 64
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 05/12] virtio: add APIs for queue fields
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 04/12] virtio: add notifier support Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  2010-03-05 12:10   ` [Qemu-devel] " Amit Shah
  2010-03-05 13:08   ` Amit Shah
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 06/12] virtio: add set_status callback Michael S. Tsirkin
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

vhost needs physical addresses for ring and other queue fields,
so add APIs for these.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/virtio.h |   15 +++++++++++-
 2 files changed, 90 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 1f5e7be..e5787fa 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -74,6 +74,8 @@ struct VirtQueue
     uint16_t vector;
     void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
     VirtIODevice *vdev;
+    EventNotifier guest_notifier;
+    EventNotifier host_notifier;
 };
 
 /* virt queue functions */
@@ -593,6 +595,12 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
     return &vdev->vq[i];
 }
 
+void virtio_irq(VirtQueue *vq)
+{
+    vq->vdev->isr |= 0x01;
+    virtio_notify_vector(vq->vdev, vq->vector);
+}
+
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     /* Always notify when queue is empty (when feature acknowledge) */
@@ -736,3 +744,71 @@ void virtio_bind_device(VirtIODevice *vdev, const VirtIOBindings *binding,
     vdev->binding = binding;
     vdev->binding_opaque = opaque;
 }
+
+target_phys_addr_t virtio_queue_get_desc(VirtIODevice *vdev, int n)
+{
+    return vdev->vq[n].vring.desc;
+}
+
+target_phys_addr_t virtio_queue_get_avail(VirtIODevice *vdev, int n)
+{
+    return vdev->vq[n].vring.avail;
+}
+
+target_phys_addr_t virtio_queue_get_used(VirtIODevice *vdev, int n)
+{
+    return vdev->vq[n].vring.used;
+}
+
+target_phys_addr_t virtio_queue_get_ring(VirtIODevice *vdev, int n)
+{
+    return vdev->vq[n].vring.desc;
+}
+
+target_phys_addr_t virtio_queue_get_desc_size(VirtIODevice *vdev, int n)
+{
+    return sizeof(VRingDesc) * vdev->vq[n].vring.num;
+}
+
+target_phys_addr_t virtio_queue_get_avail_size(VirtIODevice *vdev, int n)
+{
+    return offsetof(VRingAvail, ring) +
+        sizeof(u_int64_t) * vdev->vq[n].vring.num;
+}
+
+target_phys_addr_t virtio_queue_get_used_size(VirtIODevice *vdev, int n)
+{
+    return offsetof(VRingUsed, ring) +
+        sizeof(VRingUsedElem) * vdev->vq[n].vring.num;
+}
+
+
+target_phys_addr_t virtio_queue_get_ring_size(VirtIODevice *vdev, int n)
+{
+    return vdev->vq[n].vring.used - vdev->vq[n].vring.desc +
+	    virtio_queue_get_used_size(vdev, n);
+}
+
+uint16_t virtio_queue_last_avail_idx(VirtIODevice *vdev, int n)
+{
+    return vdev->vq[n].last_avail_idx;
+}
+
+void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
+{
+    vdev->vq[n].last_avail_idx = idx;
+}
+
+VirtQueue *virtio_queue(VirtIODevice *vdev, int n)
+{
+    return vdev->vq + n;
+}
+
+EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq)
+{
+    return &vq->guest_notifier;
+}
+EventNotifier *virtio_queue_host_notifier(VirtQueue *vq)
+{
+    return &vq->host_notifier;
+}
diff --git a/hw/virtio.h b/hw/virtio.h
index af87889..26cf5fd 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -184,5 +184,18 @@ void virtio_net_exit(VirtIODevice *vdev);
 	DEFINE_PROP_BIT("indirect_desc", _state, _field, \
 			VIRTIO_RING_F_INDIRECT_DESC, true)
 
-
+target_phys_addr_t virtio_queue_get_desc(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_avail(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_used(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_ring(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_desc_size(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_used_size(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_ring_size(VirtIODevice *vdev, int n);
+uint16_t virtio_queue_last_avail_idx(VirtIODevice *vdev, int n);
+void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx);
+VirtQueue *virtio_queue(VirtIODevice *vdev, int n);
+EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq);
+EventNotifier *virtio_queue_host_notifier(VirtQueue *vq);
+void virtio_irq(VirtQueue *vq);
 #endif
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 06/12] virtio: add set_status callback
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 05/12] virtio: add APIs for queue fields Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  2010-03-04 12:19   ` Amit Shah
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 07/12] virtio: move typedef to qemu-common Michael S. Tsirkin
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

vhost net backend needs to be notified when
frontend status changes. Add a callback,
similar to set_features.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/s390-virtio-bus.c |    2 +-
 hw/syborg_virtio.c   |    2 +-
 hw/virtio-pci.c      |    5 +++--
 hw/virtio.h          |    9 +++++++++
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
index fa0a74f..6f93d67 100644
--- a/hw/s390-virtio-bus.c
+++ b/hw/s390-virtio-bus.c
@@ -242,7 +242,7 @@ void s390_virtio_device_update_status(VirtIOS390Device *dev)
     VirtIODevice *vdev = dev->vdev;
     uint32_t features;
 
-    vdev->status = ldub_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS);
+    virtio_set_status(vdev, ldub_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS));
 
     /* Update guest supported feature bitmap */
 
diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
index 65239a0..c7b1162 100644
--- a/hw/syborg_virtio.c
+++ b/hw/syborg_virtio.c
@@ -149,7 +149,7 @@ static void syborg_virtio_writel(void *opaque, target_phys_addr_t offset,
         virtio_queue_notify(vdev, value);
         break;
     case SYBORG_VIRTIO_STATUS:
-        vdev->status = value & 0xFF;
+        virtio_set_status(vdev, value);
         if (vdev->status == 0)
             virtio_reset(vdev);
         break;
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index bcd40f7..fd9759a 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -206,7 +206,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         virtio_queue_notify(vdev, val);
         break;
     case VIRTIO_PCI_STATUS:
-        vdev->status = val & 0xFF;
+        virtio_set_status(vdev, val & 0xFF);
         if (vdev->status == 0) {
             virtio_reset(proxy->vdev);
             msix_unuse_all_vectors(&proxy->pci_dev);
@@ -377,7 +377,8 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
 
     if (PCI_COMMAND == address) {
         if (!(val & PCI_COMMAND_MASTER)) {
-            proxy->vdev->status &= ~VIRTIO_CONFIG_S_DRIVER_OK;
+            virtio_set_status(proxy->vdev,
+                              proxy->vdev->status & ~VIRTIO_CONFIG_S_DRIVER_OK);
         }
     }
 
diff --git a/hw/virtio.h b/hw/virtio.h
index 26cf5fd..58b06bf 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -115,12 +115,21 @@ struct VirtIODevice
     void (*get_config)(VirtIODevice *vdev, uint8_t *config);
     void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
     void (*reset)(VirtIODevice *vdev);
+    void (*set_status)(VirtIODevice *vdev, uint8_t val);
     VirtQueue *vq;
     const VirtIOBindings *binding;
     void *binding_opaque;
     uint16_t device_id;
 };
 
+static inline void virtio_set_status(VirtIODevice *vdev, uint8_t val)
+{
+    if (vdev->set_status) {
+        vdev->set_status(vdev, val);
+    }
+    vdev->status = val;
+}
+
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                             void (*handle_output)(VirtIODevice *,
                                                   VirtQueue *));
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 07/12] virtio: move typedef to qemu-common
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 06/12] virtio: add set_status callback Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  2010-03-04 12:20   ` Amit Shah
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 08/12] virtio-pci: fill in notifier support Michael S. Tsirkin
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

make it possible to use type without header include

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio.h   |    1 -
 qemu-common.h |    1 +
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.h b/hw/virtio.h
index 58b06bf..6f2fab0 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -69,7 +69,6 @@ static inline target_phys_addr_t vring_align(target_phys_addr_t addr,
 }
 
 typedef struct VirtQueue VirtQueue;
-typedef struct VirtIODevice VirtIODevice;
 
 #define VIRTQUEUE_MAX_SIZE 1024
 
diff --git a/qemu-common.h b/qemu-common.h
index f12a8f5..90ca3b8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -228,6 +228,7 @@ typedef struct I2SCodec I2SCodec;
 typedef struct DeviceState DeviceState;
 typedef struct SSIBus SSIBus;
 typedef struct EventNotifier EventNotifier;
+typedef struct VirtIODevice VirtIODevice;
 
 typedef uint64_t pcibus_t;
 
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 08/12] virtio-pci: fill in notifier support
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
                   ` (6 preceding siblings ...)
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 07/12] virtio: move typedef to qemu-common Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 09/12] vhost: vhost net support Michael S. Tsirkin
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

Support host/guest notifiers in virtio-pci.
The last one only with kvm, that's okay
because vhost relies on kvm anyway.

Note on kvm usage: kvm ioeventfd API
is implemented on non-kvm systems as well,
this is the reason we don't need if (kvm_enabled())
around it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio-pci.c |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index fd9759a..596f056 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -24,6 +24,7 @@
 #include "net.h"
 #include "block_int.h"
 #include "loader.h"
+#include "kvm.h"
 
 /* from Linux's linux/virtio_pci.h */
 
@@ -392,6 +393,65 @@ static unsigned virtio_pci_get_features(void *opaque)
     return proxy->host_features;
 }
 
+static void virtio_pci_guest_notifier_read(void *opaque)
+{
+    VirtQueue *vq = opaque;
+    EventNotifier *n = virtio_queue_guest_notifier(vq);
+    if (event_notifier_test_and_clear(n)) {
+        virtio_irq(vq);
+    }
+}
+
+static int virtio_pci_guest_notifier(void *opaque, int n, bool assign)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    VirtQueue *vq = virtio_queue(proxy->vdev, n);
+    EventNotifier *notifier = virtio_queue_guest_notifier(vq);
+
+    if (assign) {
+        int r = event_notifier_init(notifier, 0);
+	if (r < 0)
+		return r;
+        qemu_set_fd_handler(event_notifier_get_fd(notifier),
+                            virtio_pci_guest_notifier_read, NULL, vq);
+    } else {
+        qemu_set_fd_handler(event_notifier_get_fd(notifier),
+                            NULL, NULL, NULL);
+        event_notifier_cleanup(notifier);
+    }
+
+    return 0;
+}
+
+static int virtio_pci_host_notifier(void *opaque, int n, bool assign)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    VirtQueue *vq = virtio_queue(proxy->vdev, n);
+    EventNotifier *notifier = virtio_queue_host_notifier(vq);
+    int r;
+    if (assign) {
+        r = event_notifier_init(notifier, 1);
+        if (r < 0) {
+            return r;
+        }
+        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
+                                       n, assign);
+        if (r < 0) {
+            event_notifier_cleanup(notifier);
+        }
+    } else {
+        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
+                                       n, assign);
+        if (r < 0) {
+            return r;
+        }
+        event_notifier_cleanup(notifier);
+    }
+    return r;
+}
+
 static const VirtIOBindings virtio_pci_bindings = {
     .notify = virtio_pci_notify,
     .save_config = virtio_pci_save_config,
@@ -399,6 +459,8 @@ static const VirtIOBindings virtio_pci_bindings = {
     .save_queue = virtio_pci_save_queue,
     .load_queue = virtio_pci_load_queue,
     .get_features = virtio_pci_get_features,
+    .host_notifier = virtio_pci_host_notifier,
+    .guest_notifier = virtio_pci_guest_notifier,
 };
 
 static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 09/12] vhost: vhost net support
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
                   ` (7 preceding siblings ...)
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 08/12] virtio-pci: fill in notifier support Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  2010-03-05 18:19   ` [Qemu-devel] " Amit Shah
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 10/12] tap: add vhost/vhostfd options Michael S. Tsirkin
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

This adds vhost net device support in qemu. Will be tied to tap device
and virtio by following patches.  Raw backend is currently missing,
will be worked on/submitted separately.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 Makefile.target |    2 +
 configure       |   36 +++
 hw/vhost.c      |  706 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/vhost.h      |   48 ++++
 hw/vhost_net.c  |  198 ++++++++++++++++
 hw/vhost_net.h  |   19 ++
 6 files changed, 1009 insertions(+), 0 deletions(-)
 create mode 100644 hw/vhost.c
 create mode 100644 hw/vhost.h
 create mode 100644 hw/vhost_net.c
 create mode 100644 hw/vhost_net.h

diff --git a/Makefile.target b/Makefile.target
index 1f5aa9d..f833735 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -173,6 +173,8 @@ obj-y = vl.o async.o monitor.o pci.o pci_host.o pcie_host.o machine.o gdbstub.o
 # need to fix this properly
 obj-y += virtio-blk.o virtio-balloon.o virtio-net.o virtio-pci.o virtio-serial-bus.o
 obj-y += notifier.o
+obj-y += vhost_net.o
+obj-$(CONFIG_VHOST_NET) += vhost.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
diff --git a/configure b/configure
index 8eb5f5b..66f0335 100755
--- a/configure
+++ b/configure
@@ -87,6 +87,7 @@ libs_softmmu=""
 libs_tools=""
 audio_pt_int=""
 audio_win_int=""
+audio_win_int=""
 
 # parse CC options first
 for opt do
@@ -263,6 +264,7 @@ vnc_tls=""
 vnc_sasl=""
 xen=""
 linux_aio=""
+vhost_net=""
 
 gprof="no"
 debug_tcg="no"
@@ -651,6 +653,10 @@ for opt do
   ;;
   --enable-docs) docs="yes"
   ;;
+  --disable-vhost-net) vhost_net="no"
+  ;;
+  --enable-vhost-net) vhost_net="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1498,6 +1504,32 @@ EOF
 fi
 
 ##########################################
+# test for vhost net
+
+if test "$vhost_net" != "no"; then
+    if test "$kvm" != "no"; then
+            cat > $TMPC <<EOF
+    #include <linux/vhost.h>
+    int main(void) { return 0; }
+EOF
+            if compile_prog "$kvm_cflags" "" ; then
+                vhost_net=yes
+            else
+                if "$vhost_net" == "yes" ; then
+                    feature_not_found "vhost-net"
+                fi
+                vhost_net=no
+            fi
+    else
+            if "$vhost_net" == "yes" ; then
+                echo -e "NOTE: vhost-net feature requires KVM (--enable-kvm)."
+                feature_not_found "vhost-net"
+            fi
+            vhost_net=no
+    fi
+fi
+
+##########################################
 # pthread probe
 PTHREADLIBS_LIST="-lpthread -lpthreadGC2"
 
@@ -1968,6 +2000,7 @@ echo "fdt support       $fdt"
 echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
 echo "uuid support      $uuid"
+echo "vhost-net support $vhost_net"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2492,6 +2525,9 @@ case "$target_arch2" in
       if test "$kvm_para" = "yes"; then
         echo "CONFIG_KVM_PARA=y" >> $config_target_mak
       fi
+      if test $vhost_net = "yes" ; then
+        echo "CONFIG_VHOST_NET=y" >> $config_target_mak
+      fi
     fi
 esac
 echo "TARGET_PHYS_ADDR_BITS=$target_phys_bits" >> $config_target_mak
diff --git a/hw/vhost.c b/hw/vhost.c
new file mode 100644
index 0000000..12a5757
--- /dev/null
+++ b/hw/vhost.c
@@ -0,0 +1,706 @@
+/*
+ * vhost support
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/vhost.h>
+#include <sys/ioctl.h>
+#include <sys/eventfd.h>
+#include "vhost.h"
+#include "hw/hw.h"
+/* For range_get_last */
+#include "pci.h"
+
+static void vhost_dev_sync_region(struct vhost_dev *dev,
+                                  uint64_t mfirst, uint64_t mlast,
+                                  uint64_t rfirst, uint64_t rlast)
+{
+    uint64_t start = MAX(mfirst, rfirst);
+    uint64_t end = MIN(mlast, rlast);
+    vhost_log_chunk_t *from = dev->log + start / VHOST_LOG_CHUNK;
+    vhost_log_chunk_t *to = dev->log + end / VHOST_LOG_CHUNK + 1;
+    uint64_t addr = (start / VHOST_LOG_CHUNK) * VHOST_LOG_CHUNK;
+
+    assert(end / VHOST_LOG_CHUNK < dev->log_size);
+    assert(start / VHOST_LOG_CHUNK < dev->log_size);
+    if (end < start) {
+        return;
+    }
+    for (;from < to; ++from) {
+        vhost_log_chunk_t log;
+        int bit;
+        /* We first check with non-atomic: much cheaper,
+         * and we expect non-dirty to be the common case. */
+        if (!*from) {
+            continue;
+        }
+        /* Data must be read atomically. We don't really
+         * need the barrier semantics of __sync
+         * builtins, but it's easier to use them than
+         * roll our own. */
+        log = __sync_fetch_and_and(from, 0);
+        while ((bit = sizeof(log) > sizeof(int) ?
+                ffsll(log) : ffs(log))) {
+            bit -= 1;
+            cpu_physical_memory_set_dirty(addr + bit * VHOST_LOG_PAGE);
+            log &= ~(0x1ull << bit);
+        }
+        addr += VHOST_LOG_CHUNK;
+    }
+}
+
+static int vhost_client_sync_dirty_bitmap(CPUPhysMemoryClient *client,
+                                          target_phys_addr_t start_addr,
+                                          target_phys_addr_t end_addr)
+{
+    struct vhost_dev *dev = container_of(client, struct vhost_dev, client);
+    int i;
+    if (!dev->log_enabled || !dev->started) {
+        return 0;
+    }
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        struct vhost_memory_region *reg = dev->mem->regions + i;
+        vhost_dev_sync_region(dev, start_addr, end_addr,
+                              reg->guest_phys_addr,
+                              range_get_last(reg->guest_phys_addr,
+                                             reg->memory_size));
+    }
+    for (i = 0; i < dev->nvqs; ++i) {
+        struct vhost_virtqueue *vq = dev->vqs + i;
+        vhost_dev_sync_region(dev, start_addr, end_addr, vq->used_phys,
+                              range_get_last(vq->used_phys, vq->used_size));
+    }
+    return 0;
+}
+
+/* Assign/unassign. Keep an unsorted array of non-overlapping
+ * memory regions in dev->mem. */
+static void vhost_dev_unassign_memory(struct vhost_dev *dev,
+                                      uint64_t start_addr,
+                                      uint64_t size)
+{
+    int from, to, n = dev->mem->nregions;
+    /* Track overlapping/split regions for sanity checking. */
+    int overlap_start = 0, overlap_end = 0, overlap_middle = 0, split = 0;
+
+    for (from = 0, to = 0; from < n; ++from, ++to) {
+        struct vhost_memory_region *reg = dev->mem->regions + to;
+        uint64_t reglast;
+        uint64_t memlast;
+        uint64_t change;
+
+        /* clone old region */
+        if (to != from) {
+            memcpy(reg, dev->mem->regions + from, sizeof *reg);
+        }
+
+        /* No overlap is simple */
+        if (!ranges_overlap(reg->guest_phys_addr, reg->memory_size,
+                            start_addr, size)) {
+            continue;
+        }
+
+        /* Split only happens if supplied region
+         * is in the middle of an existing one. Thus it can not
+         * overlap with any other existing region. */
+        assert(!split);
+
+        reglast = range_get_last(reg->guest_phys_addr, reg->memory_size);
+        memlast = range_get_last(start_addr, size);
+
+        /* Remove whole region */
+        if (start_addr <= reg->guest_phys_addr && memlast >= reglast) {
+            --dev->mem->nregions;
+            --to;
+            assert(to >= 0);
+            ++overlap_middle;
+            continue;
+        }
+
+        /* Shrink region */
+        if (memlast >= reglast) {
+            reg->memory_size = start_addr - reg->guest_phys_addr;
+            assert(reg->memory_size);
+            assert(!overlap_end);
+            ++overlap_end;
+            continue;
+        }
+
+        /* Shift region */
+        if (start_addr <= reg->guest_phys_addr) {
+            change = memlast + 1 - reg->guest_phys_addr;
+            reg->memory_size -= change;
+            reg->guest_phys_addr += change;
+            reg->userspace_addr += change;
+            assert(reg->memory_size);
+            assert(!overlap_start);
+            ++overlap_start;
+            continue;
+        }
+
+        /* This only happens if supplied region
+         * is in the middle of an existing one. Thus it can not
+         * overlap with any other existing region. */
+        assert(!overlap_start);
+        assert(!overlap_end);
+        assert(!overlap_middle);
+        /* Split region: shrink first part, shift second part. */
+        memcpy(dev->mem->regions + n, reg, sizeof *reg);
+        reg->memory_size = start_addr - reg->guest_phys_addr;
+        assert(reg->memory_size);
+        change = memlast + 1 - reg->guest_phys_addr;
+        reg = dev->mem->regions + n;
+        reg->memory_size -= change;
+        assert(reg->memory_size);
+        reg->guest_phys_addr += change;
+        reg->userspace_addr += change;
+        /* Never add more than 1 region */
+        assert(dev->mem->nregions == n);
+        ++dev->mem->nregions;
+        ++split;
+    }
+}
+
+/* Called after unassign, so no regions overlap the given range. */
+static void vhost_dev_assign_memory(struct vhost_dev *dev,
+                                    uint64_t start_addr,
+                                    uint64_t size,
+                                    uint64_t uaddr)
+{
+    int from, to;
+    struct vhost_memory_region *merged = NULL;
+    for (from = 0, to = 0; from < dev->mem->nregions; ++from, ++to) {
+        struct vhost_memory_region *reg = dev->mem->regions + to;
+        uint64_t prlast, urlast;
+        uint64_t pmlast, umlast;
+        uint64_t s, e, u;
+
+        /* clone old region */
+        if (to != from) {
+            memcpy(reg, dev->mem->regions + from, sizeof *reg);
+        }
+        prlast = range_get_last(reg->guest_phys_addr, reg->memory_size);
+        pmlast = range_get_last(start_addr, size);
+        urlast = range_get_last(reg->userspace_addr, reg->memory_size);
+        umlast = range_get_last(uaddr, size);
+
+        /* check for overlapping regions: should never happen. */
+        assert(prlast < start_addr || pmlast < reg->guest_phys_addr);
+        /* Not an adjacent or overlapping region - do not merge. */
+        if ((prlast + 1 != start_addr || urlast + 1 != uaddr) &&
+            (pmlast + 1 != reg->guest_phys_addr ||
+             umlast + 1 != reg->userspace_addr)) {
+            continue;
+        }
+
+        if (merged) {
+            --to;
+            assert(to >= 0);
+        } else {
+            merged = reg;
+        }
+        u = MIN(uaddr, reg->userspace_addr);
+        s = MIN(start_addr, reg->guest_phys_addr);
+        e = MAX(pmlast, prlast);
+        uaddr = merged->userspace_addr = u;
+        start_addr = merged->guest_phys_addr = s;
+        size = merged->memory_size = e - s + 1;
+        assert(merged->memory_size);
+    }
+
+    if (!merged) {
+        struct vhost_memory_region *reg = dev->mem->regions + to;
+        memset(reg, 0, sizeof *reg);
+        reg->memory_size = size;
+        assert(reg->memory_size);
+        reg->guest_phys_addr = start_addr;
+        reg->userspace_addr = uaddr;
+        ++to;
+    }
+    assert(to <= dev->mem->nregions + 1);
+    dev->mem->nregions = to;
+}
+
+static uint64_t vhost_get_log_size(struct vhost_dev *dev)
+{
+    uint64_t log_size = 0;
+    int i;
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        struct vhost_memory_region *reg = dev->mem->regions + i;
+        uint64_t last = range_get_last(reg->guest_phys_addr,
+                                       reg->memory_size);
+        log_size = MAX(log_size, last / VHOST_LOG_CHUNK + 1);
+    }
+    for (i = 0; i < dev->nvqs; ++i) {
+        struct vhost_virtqueue *vq = dev->vqs + i;
+        uint64_t last = vq->used_phys + vq->used_size - 1;
+        log_size = MAX(log_size, last / VHOST_LOG_CHUNK + 1);
+    }
+    return log_size;
+}
+
+static inline void vhost_dev_log_resize(struct vhost_dev* dev, uint64_t size)
+{
+    vhost_log_chunk_t *log;
+    uint64_t log_base;
+    int r;
+    if (size) {
+        log = qemu_mallocz(size * sizeof *log);
+    } else {
+        log = NULL;
+    }
+    log_base = (uint64_t)(unsigned long)log;
+    r = ioctl(dev->control, VHOST_SET_LOG_BASE, &log_base);
+    assert(r >= 0);
+    vhost_client_sync_dirty_bitmap(&dev->client, 0,
+                                   (target_phys_addr_t)~0x0ull);
+    if (dev->log) {
+        qemu_free(dev->log);
+    }
+    dev->log = log;
+    dev->log_size = size;
+}
+
+static int vhost_verify_ring_mappings(struct vhost_dev *dev,
+                                      uint64_t start_addr,
+                                      uint64_t size)
+{
+    int i;
+    for (i = 0; i < dev->nvqs; ++i) {
+        struct vhost_virtqueue *vq = dev->vqs + i;
+        target_phys_addr_t l;
+        void *p;
+
+        if (!ranges_overlap(start_addr, size, vq->ring_phys, vq->ring_size)) {
+            continue;
+        }
+        l = vq->ring_size;
+        p = cpu_physical_memory_map(vq->ring_phys, &l, 1);
+        if (!p || l != vq->ring_size) {
+            fprintf(stderr, "Unable to map ring buffer for ring %d\n", i);
+            return -ENOMEM;
+        }
+        if (p != vq->ring) {
+            fprintf(stderr, "Ring buffer relocated for ring %d\n", i);
+            return -EBUSY;
+        }
+        cpu_physical_memory_unmap(p, l, 0, 0);
+    }
+    return 0;
+}
+
+static void vhost_client_set_memory(CPUPhysMemoryClient *client,
+                                    target_phys_addr_t start_addr,
+                                    ram_addr_t size,
+                                    ram_addr_t phys_offset)
+{
+    struct vhost_dev *dev = container_of(client, struct vhost_dev, client);
+    ram_addr_t flags = phys_offset & ~TARGET_PAGE_MASK;
+    int s = offsetof(struct vhost_memory, regions) +
+        (dev->mem->nregions + 1) * sizeof dev->mem->regions[0];
+    uint64_t log_size;
+    int r;
+    dev->mem = qemu_realloc(dev->mem, s);
+
+    assert(size);
+
+    vhost_dev_unassign_memory(dev, start_addr, size);
+    if (flags == IO_MEM_RAM) {
+        /* Add given mapping, merging adjacent regions if any */
+        vhost_dev_assign_memory(dev, start_addr, size,
+                                (uintptr_t)qemu_get_ram_ptr(phys_offset));
+    } else {
+        /* Remove old mapping for this memory, if any. */
+        vhost_dev_unassign_memory(dev, start_addr, size);
+    }
+
+    if (!dev->started) {
+        return;
+    }
+
+    if (dev->started) {
+        r = vhost_verify_ring_mappings(dev, start_addr, size);
+        assert(r >= 0);
+    }
+
+    if (!dev->log_enabled) {
+        r = ioctl(dev->control, VHOST_SET_MEM_TABLE, dev->mem);
+        assert(r >= 0);
+        return;
+    }
+    log_size = vhost_get_log_size(dev);
+    /* We allocate an extra 4K bytes to log,
+     * to reduce the * number of reallocations. */
+#define VHOST_LOG_BUFFER (0x1000 / sizeof *dev->log)
+    /* To log more, must increase log size before table update. */
+    if (dev->log_size < log_size) {
+        vhost_dev_log_resize(dev, log_size + VHOST_LOG_BUFFER);
+    }
+    r = ioctl(dev->control, VHOST_SET_MEM_TABLE, dev->mem);
+    assert(r >= 0);
+    /* To log less, can only decrease log size after table update. */
+    if (dev->log_size > log_size + VHOST_LOG_BUFFER) {
+        vhost_dev_log_resize(dev, log_size);
+    }
+}
+
+static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
+                                    struct vhost_virtqueue *vq,
+                                    unsigned idx, bool enable_log)
+{
+    struct vhost_vring_addr addr = {
+        .index = idx,
+        .desc_user_addr = (u_int64_t)(unsigned long)vq->desc,
+        .avail_user_addr = (u_int64_t)(unsigned long)vq->avail,
+        .used_user_addr = (u_int64_t)(unsigned long)vq->used,
+        .log_guest_addr = vq->used_phys,
+        .flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0,
+    };
+    int r = ioctl(dev->control, VHOST_SET_VRING_ADDR, &addr);
+    if (r < 0) {
+        return -errno;
+    }
+    return 0;
+}
+
+static int vhost_dev_set_features(struct vhost_dev *dev, bool enable_log)
+{
+    uint64_t features = dev->acked_features;
+    int r;
+    if (enable_log) {
+        features |= 0x1 << VHOST_F_LOG_ALL;
+    }
+    r = ioctl(dev->control, VHOST_SET_FEATURES, &features);
+    return r < 0 ? -errno : 0;
+}
+
+static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
+{
+    int r, t, i;
+    r = vhost_dev_set_features(dev, enable_log);
+    if (r < 0) {
+        goto err_features;
+    }
+    for (i = 0; i < dev->nvqs; ++i) {
+        r = vhost_virtqueue_set_addr(dev, dev->vqs + i, i,
+                                     enable_log);
+        if (r < 0) {
+            goto err_vq;
+        }
+    }
+    return 0;
+err_vq:
+    for (; i >= 0; --i) {
+        t = vhost_virtqueue_set_addr(dev, dev->vqs + i, i,
+                                     dev->log_enabled);
+        assert(t >= 0);
+    }
+    t = vhost_dev_set_features(dev, dev->log_enabled);
+    assert(t >= 0);
+err_features:
+    return r;
+}
+
+static int vhost_client_migration_log(CPUPhysMemoryClient *client,
+                                      int enable)
+{
+    struct vhost_dev *dev = container_of(client, struct vhost_dev, client);
+    int r;
+    if (!!enable == dev->log_enabled) {
+        return 0;
+    }
+    if (!dev->started) {
+        dev->log_enabled = enable;
+        return 0;
+    }
+    if (!enable) {
+        r = vhost_dev_set_log(dev, false);
+        if (r < 0) {
+            return r;
+        }
+        if (dev->log) {
+            qemu_free(dev->log);
+        }
+        dev->log = NULL;
+        dev->log_size = 0;
+    } else {
+        vhost_dev_log_resize(dev, vhost_get_log_size(dev));
+        r = vhost_dev_set_log(dev, true);
+        if (r < 0) {
+            return r;
+        }
+    }
+    dev->log_enabled = enable;
+    return 0;
+}
+
+static int vhost_virtqueue_init(struct vhost_dev *dev,
+                                struct VirtIODevice *vdev,
+                                struct vhost_virtqueue *vq,
+                                unsigned idx)
+{
+    target_phys_addr_t s, l, a;
+    int r;
+    struct vhost_vring_file file = {
+        .index = idx,
+    };
+    struct vhost_vring_state state = {
+        .index = idx,
+    };
+    struct VirtQueue *q = virtio_queue(vdev, idx);
+
+    vq->num = state.num = virtio_queue_get_num(vdev, idx);
+    r = ioctl(dev->control, VHOST_SET_VRING_NUM, &state);
+    if (r) {
+        return -errno;
+    }
+
+    state.num = virtio_queue_last_avail_idx(vdev, idx);
+    r = ioctl(dev->control, VHOST_SET_VRING_BASE, &state);
+    if (r) {
+        return -errno;
+    }
+
+    s = l = virtio_queue_get_desc_size(vdev, idx);
+    a = virtio_queue_get_desc(vdev, idx);
+    vq->desc = cpu_physical_memory_map(a, &l, 0);
+    if (!vq->desc || l != s) {
+        r = -ENOMEM;
+        goto fail_alloc_desc;
+    }
+    s = l = virtio_queue_get_avail_size(vdev, idx);
+    a = virtio_queue_get_avail(vdev, idx);
+    vq->avail = cpu_physical_memory_map(a, &l, 0);
+    if (!vq->avail || l != s) {
+        r = -ENOMEM;
+        goto fail_alloc_avail;
+    }
+    vq->used_size = s = l = virtio_queue_get_used_size(vdev, idx);
+    vq->used_phys = a = virtio_queue_get_used(vdev, idx);
+    vq->used = cpu_physical_memory_map(a, &l, 1);
+    if (!vq->used || l != s) {
+        r = -ENOMEM;
+        goto fail_alloc_used;
+    }
+
+    vq->ring_size = s = l = virtio_queue_get_ring_size(vdev, idx);
+    vq->ring_phys = a = virtio_queue_get_ring(vdev, idx);
+    vq->ring = cpu_physical_memory_map(a, &l, 1);
+    if (!vq->ring || l != s) {
+        r = -ENOMEM;
+        goto fail_alloc_ring;
+    }
+
+    r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
+    if (r < 0) {
+        r = -errno;
+        goto fail_alloc;
+    }
+    if (!vdev->binding->guest_notifier || !vdev->binding->host_notifier) {
+        fprintf(stderr, "binding does not support irqfd/queuefd\n");
+        r = -ENOSYS;
+        goto fail_alloc;
+    }
+    r = vdev->binding->guest_notifier(vdev->binding_opaque, idx, true);
+    if (r < 0) {
+        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
+        goto fail_guest_notifier;
+    }
+
+    r = vdev->binding->host_notifier(vdev->binding_opaque, idx, true);
+    if (r < 0) {
+        fprintf(stderr, "Error binding host notifier: %d\n", -r);
+        goto fail_host_notifier;
+    }
+
+    file.fd = event_notifier_get_fd(virtio_queue_host_notifier(q));
+    r = ioctl(dev->control, VHOST_SET_VRING_KICK, &file);
+    if (r) {
+        goto fail_kick;
+    }
+
+    file.fd = event_notifier_get_fd(virtio_queue_guest_notifier(q));
+    r = ioctl(dev->control, VHOST_SET_VRING_CALL, &file);
+    if (r) {
+        goto fail_call;
+    }
+
+    return 0;
+
+fail_call:
+fail_kick:
+    vdev->binding->host_notifier(vdev->binding_opaque, idx, false);
+fail_host_notifier:
+    vdev->binding->guest_notifier(vdev->binding_opaque, idx, false);
+fail_guest_notifier:
+fail_alloc:
+    cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
+                              0, 0);
+fail_alloc_ring:
+    cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
+                              0, 0);
+fail_alloc_used:
+    cpu_physical_memory_unmap(vq->avail, virtio_queue_get_avail_size(vdev, idx),
+                              0, 0);
+fail_alloc_avail:
+    cpu_physical_memory_unmap(vq->desc, virtio_queue_get_desc_size(vdev, idx),
+                              0, 0);
+fail_alloc_desc:
+    return r;
+}
+
+static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
+                                    struct VirtIODevice *vdev,
+                                    struct vhost_virtqueue *vq,
+                                    unsigned idx)
+{
+    struct vhost_vring_state state = {
+        .index = idx,
+    };
+    int r;
+    r = vdev->binding->guest_notifier(vdev->binding_opaque, idx, false);
+    if (r < 0) {
+        fprintf(stderr, "vhost VQ %d guest cleanup failed: %d\n", idx, r);
+        fflush(stderr);
+    }
+    assert (r >= 0);
+
+    r = vdev->binding->host_notifier(vdev->binding_opaque, idx, false);
+    if (r < 0) {
+        fprintf(stderr, "vhost VQ %d host cleanup failed: %d\n", idx, r);
+        fflush(stderr);
+    }
+    assert (r >= 0);
+    r = ioctl(dev->control, VHOST_GET_VRING_BASE, &state);
+    if (r < 0) {
+        fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r);
+        fflush(stderr);
+    }
+    virtio_queue_set_last_avail_idx(vdev, idx, state.num);
+    assert (r >= 0);
+    cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
+                              0, virtio_queue_get_ring_size(vdev, idx));
+    cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
+                              1, virtio_queue_get_used_size(vdev, idx));
+    cpu_physical_memory_unmap(vq->avail, virtio_queue_get_avail_size(vdev, idx),
+                              0, virtio_queue_get_avail_size(vdev, idx));
+    cpu_physical_memory_unmap(vq->desc, virtio_queue_get_desc_size(vdev, idx),
+                              0, virtio_queue_get_desc_size(vdev, idx));
+}
+
+int vhost_dev_init(struct vhost_dev *hdev, int devfd)
+{
+    uint64_t features;
+    int r;
+    if (devfd >= 0) {
+        hdev->control = devfd;
+    } else {
+        hdev->control = open("/dev/vhost-net", O_RDWR);
+        if (hdev->control < 0) {
+            return -errno;
+        }
+    }
+    r = ioctl(hdev->control, VHOST_SET_OWNER, NULL);
+    if (r < 0) {
+        goto fail;
+    }
+
+    r = ioctl(hdev->control, VHOST_GET_FEATURES, &features);
+    if (r < 0) {
+        goto fail;
+    }
+    hdev->features = features;
+
+    hdev->client.set_memory = vhost_client_set_memory;
+    hdev->client.sync_dirty_bitmap = vhost_client_sync_dirty_bitmap;
+    hdev->client.migration_log = vhost_client_migration_log;
+    hdev->mem = qemu_mallocz(offsetof(struct vhost_memory, regions));
+    hdev->log = NULL;
+    hdev->log_size = 0;
+    hdev->log_enabled = false;
+    hdev->started = false;
+    cpu_register_phys_memory_client(&hdev->client);
+    return 0;
+fail:
+    r = -errno;
+    close(hdev->control);
+    return r;
+}
+
+void vhost_dev_cleanup(struct vhost_dev *hdev)
+{
+    cpu_unregister_phys_memory_client(&hdev->client);
+    qemu_free(hdev->mem);
+    close(hdev->control);
+}
+
+int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
+{
+    int i, r;
+
+    r = vhost_dev_set_features(hdev, hdev->log_enabled);
+    if (r < 0) {
+        goto fail;
+    }
+    r = ioctl(hdev->control, VHOST_SET_MEM_TABLE, hdev->mem);
+    if (r < 0) {
+        r = -errno;
+        goto fail;
+    }
+    if (hdev->log_enabled) {
+        hdev->log_size = vhost_get_log_size(hdev);
+        hdev->log = hdev->log_size ?
+            qemu_mallocz(hdev->log_size * sizeof *hdev->log) : NULL;
+        r = ioctl(hdev->control, VHOST_SET_LOG_BASE,
+                  (uint64_t)(unsigned long)hdev->log);
+        if (r < 0) {
+            r = -errno;
+            goto fail;
+        }
+    }
+
+    for (i = 0; i < hdev->nvqs; ++i) {
+        r = vhost_virtqueue_init(hdev,
+                                 vdev,
+                                 hdev->vqs + i,
+                                 i);
+        if (r < 0) {
+            goto fail_vq;
+        }
+    }
+    hdev->started = true;
+
+    return 0;
+fail_vq:
+    while (--i >= 0) {
+        vhost_virtqueue_cleanup(hdev,
+                                vdev,
+                                hdev->vqs + i,
+                                i);
+    }
+fail:
+    return r;
+}
+
+void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
+{
+    int i;
+    for (i = 0; i < hdev->nvqs; ++i) {
+        vhost_virtqueue_cleanup(hdev,
+                                vdev,
+                                hdev->vqs + i,
+                                i);
+    }
+    vhost_client_sync_dirty_bitmap(&hdev->client, 0,
+                                   (target_phys_addr_t)~0x0ull);
+    hdev->started = false;
+    qemu_free(hdev->log);
+    hdev->log_size = 0;
+}
diff --git a/hw/vhost.h b/hw/vhost.h
new file mode 100644
index 0000000..86dd834
--- /dev/null
+++ b/hw/vhost.h
@@ -0,0 +1,48 @@
+#ifndef VHOST_H
+#define VHOST_H
+
+#include "hw/hw.h"
+#include "hw/virtio.h"
+
+/* Generic structures common for any vhost based device. */
+struct vhost_virtqueue {
+    int kick;
+    int call;
+    void *desc;
+    void *avail;
+    void *used;
+    int num;
+    unsigned long long used_phys;
+    unsigned used_size;
+    void *ring;
+    unsigned long long ring_phys;
+    unsigned ring_size;
+};
+
+typedef unsigned long vhost_log_chunk_t;
+#define VHOST_LOG_PAGE 0x1000
+#define VHOST_LOG_BITS (8 * sizeof(vhost_log_chunk_t))
+#define VHOST_LOG_CHUNK (VHOST_LOG_PAGE * VHOST_LOG_BITS)
+
+struct vhost_memory;
+struct vhost_dev {
+    CPUPhysMemoryClient client;
+    int control;
+    struct vhost_memory *mem;
+    struct vhost_virtqueue *vqs;
+    int nvqs;
+    unsigned long long features;
+    unsigned long long acked_features;
+    unsigned long long backend_features;
+    bool started;
+    bool log_enabled;
+    vhost_log_chunk_t *log;
+    unsigned long long log_size;
+};
+
+int vhost_dev_init(struct vhost_dev *hdev, int devfd);
+void vhost_dev_cleanup(struct vhost_dev *hdev);
+int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
+void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
+
+#endif
diff --git a/hw/vhost_net.c b/hw/vhost_net.c
new file mode 100644
index 0000000..d84eed0
--- /dev/null
+++ b/hw/vhost_net.c
@@ -0,0 +1,198 @@
+/*
+ * vhost-net support
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "net.h"
+#include "net/tap.h"
+
+#include "virtio-net.h"
+#include "vhost_net.h"
+
+#include "config.h"
+
+#ifdef CONFIG_VHOST_NET
+#include <linux/vhost.h>
+#include <sys/eventfd.h>
+#include <sys/socket.h>
+#include <linux/kvm.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <linux/virtio_ring.h>
+#include <netpacket/packet.h>
+#include <net/ethernet.h>
+#include <net/if.h>
+#include <netinet/in.h>
+
+#include <stdio.h>
+
+#include "vhost.h"
+
+struct vhost_net {
+    struct vhost_dev dev;
+    struct vhost_virtqueue vqs[2];
+    int backend;
+    VLANClientState *vc;
+};
+
+unsigned vhost_net_get_features(struct vhost_net *net, unsigned features)
+{
+    /* Clear features not supported by host kernel. */
+    if (!(net->dev.features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY))) {
+        features &= ~(1 << VIRTIO_F_NOTIFY_ON_EMPTY);
+    }
+    if (!(net->dev.features & (1 << VIRTIO_RING_F_INDIRECT_DESC))) {
+        features &= ~(1 << VIRTIO_RING_F_INDIRECT_DESC);
+    }
+    if (!(net->dev.features & (1 << VIRTIO_NET_F_MRG_RXBUF))) {
+        features &= ~(1 << VIRTIO_NET_F_MRG_RXBUF);
+    }
+    return features;
+}
+
+void vhost_net_ack_features(struct vhost_net *net, unsigned features)
+{
+    net->dev.acked_features = net->dev.backend_features;
+    if (features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY)) {
+        net->dev.acked_features |= (1 << VIRTIO_F_NOTIFY_ON_EMPTY);
+    }
+    if (features & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
+        net->dev.acked_features |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
+    }
+}
+
+static int vhost_net_get_fd(VLANClientState *backend)
+{
+    switch (backend->info->type) {
+    case NET_CLIENT_TYPE_TAP:
+        return tap_get_fd(backend);
+    default:
+        fprintf(stderr, "vhost-net requires tap backend\n");
+        return -EBADFD;
+    }
+}
+
+struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
+{
+    int r;
+    struct vhost_net *net = qemu_malloc(sizeof *net);
+    if (!backend) {
+        fprintf(stderr, "vhost-net requires backend to be setup\n");
+        goto fail;
+    }
+    r = vhost_net_get_fd(backend);
+    if (r < 0) {
+        goto fail;
+    }
+    net->vc = backend;
+    net->dev.backend_features = tap_has_vnet_hdr(backend) ? 0 :
+        (1 << VHOST_NET_F_VIRTIO_NET_HDR);
+    net->backend = r;
+
+    r = vhost_dev_init(&net->dev, devfd);
+    if (r < 0) {
+        goto fail;
+    }
+    if (~net->dev.features & net->dev.backend_features) {
+        fprintf(stderr, "vhost lacks feature mask %llu for backend\n",
+                ~net->dev.features & net->dev.backend_features);
+        vhost_dev_cleanup(&net->dev);
+        goto fail;
+    }
+
+    /* Set sane init value. Override when guest acks. */
+    vhost_net_ack_features(net, 0);
+    return net;
+fail:
+    qemu_free(net);
+    return NULL;
+}
+
+int vhost_net_start(struct vhost_net *net,
+                    VirtIODevice *dev)
+{
+    struct vhost_vring_file file = { };
+    int r;
+
+    net->dev.nvqs = 2;
+    net->dev.vqs = net->vqs;
+    r = vhost_dev_start(&net->dev, dev);
+    if (r < 0) {
+        return r;
+    }
+
+    net->vc->info->poll(net->vc, false);
+    qemu_set_fd_handler(net->backend, NULL, NULL, NULL);
+    file.fd = net->backend;
+    for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
+        r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
+        if (r < 0) {
+            r = -errno;
+            goto fail;
+        }
+    }
+    return 0;
+fail:
+    file.fd = -1;
+    while (--file.index >= 0) {
+        int r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
+        assert(r >= 0);
+    }
+    net->vc->info->poll(net->vc, true);
+    vhost_dev_stop(&net->dev, dev);
+    return r;
+}
+
+void vhost_net_stop(struct vhost_net *net,
+                    VirtIODevice *dev)
+{
+    struct vhost_vring_file file = { .fd = -1 };
+
+    for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
+        int r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
+        assert(r >= 0);
+    }
+    net->vc->info->poll(net->vc, true);
+    vhost_dev_stop(&net->dev, dev);
+}
+
+void vhost_net_cleanup(struct vhost_net *net)
+{
+    vhost_dev_cleanup(&net->dev);
+    qemu_free(net);
+}
+#else
+struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
+{
+	return NULL;
+}
+
+int vhost_net_start(struct vhost_net *net,
+		    VirtIODevice *dev)
+{
+	return -ENOSYS;
+}
+void vhost_net_stop(struct vhost_net *net,
+		    VirtIODevice *dev)
+{
+}
+
+void vhost_net_cleanup(struct vhost_net *net)
+{
+}
+
+unsigned vhost_net_get_features(struct vhost_net *net, unsigned features)
+{
+	return features;
+}
+void vhost_net_ack_features(struct vhost_net *net, unsigned features)
+{
+}
+#endif
diff --git a/hw/vhost_net.h b/hw/vhost_net.h
new file mode 100644
index 0000000..6c18ff7
--- /dev/null
+++ b/hw/vhost_net.h
@@ -0,0 +1,19 @@
+#ifndef VHOST_NET_H
+#define VHOST_NET_H
+
+#include "net.h"
+
+struct vhost_net;
+typedef struct vhost_net VHostNetState;
+
+VHostNetState *vhost_net_init(VLANClientState *backend, int devfd);
+
+int vhost_net_start(VHostNetState *net, VirtIODevice *dev);
+void vhost_net_stop(VHostNetState *net, VirtIODevice *dev);
+
+void vhost_net_cleanup(VHostNetState *net);
+
+unsigned vhost_net_get_features(VHostNetState *net, unsigned features);
+void vhost_net_ack_features(VHostNetState *net, unsigned features);
+
+#endif
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 10/12] tap: add vhost/vhostfd options
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
                   ` (8 preceding siblings ...)
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 09/12] vhost: vhost net support Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 11/12] tap: add API to retrieve vhost net header Michael S. Tsirkin
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 12/12] virtio-net: vhost net support Michael S. Tsirkin
  11 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

This adds vhost binary option to tap, to enable vhost net accelerator.
Default is off for now, we'll be able to make default on long term
when we know it's stable.

vhostfd option can be used by management, to pass in the fd. Assigning
vhostfd implies vhost=on.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net.c           |    8 ++++++++
 net/tap.c       |   29 +++++++++++++++++++++++++++++
 qemu-options.hx |    4 +++-
 3 files changed, 40 insertions(+), 1 deletions(-)

diff --git a/net.c b/net.c
index a1bf49f..d1e23f1 100644
--- a/net.c
+++ b/net.c
@@ -973,6 +973,14 @@ static const struct {
                 .name = "vnet_hdr",
                 .type = QEMU_OPT_BOOL,
                 .help = "enable the IFF_VNET_HDR flag on the tap interface"
+            }, {
+                .name = "vhost",
+                .type = QEMU_OPT_BOOL,
+                .help = "enable vhost-net network accelerator",
+            }, {
+                .name = "vhostfd",
+                .type = QEMU_OPT_STRING,
+                .help = "file descriptor of an already opened vhost net device",
             },
 #endif /* _WIN32 */
             { /* end of list */ }
diff --git a/net/tap.c b/net/tap.c
index fc59fd4..19c4fa2 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -41,6 +41,8 @@
 
 #include "net/tap-linux.h"
 
+#include "hw/vhost_net.h"
+
 /* Maximum GSO packet size (64k) plus plenty of room for
  * the ethernet and virtio_net headers
  */
@@ -57,6 +59,7 @@ typedef struct TAPState {
     unsigned int has_vnet_hdr : 1;
     unsigned int using_vnet_hdr : 1;
     unsigned int has_ufo: 1;
+    VHostNetState *vhost_net;
 } TAPState;
 
 static int launch_script(const char *setup_script, const char *ifname, int fd);
@@ -252,6 +255,10 @@ static void tap_cleanup(VLANClientState *nc)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
+    if (s->vhost_net) {
+        vhost_net_cleanup(s->vhost_net);
+    }
+
     qemu_purge_queued_packets(nc);
 
     if (s->down_script[0])
@@ -307,6 +314,7 @@ static TAPState *net_tap_fd_init(VLANState *vlan,
     s->has_ufo = tap_probe_has_ufo(s->fd);
     tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
     tap_read_poll(s, 1);
+    s->vhost_net = NULL;
     return s;
 }
 
@@ -456,5 +464,26 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
         }
     }
 
+    if (qemu_opt_get_bool(opts, "vhost", !!qemu_opt_get(opts, "vhostfd"))) {
+        int vhostfd, r;
+        if (qemu_opt_get(opts, "vhostfd")) {
+            r = net_handle_fd_param(mon, qemu_opt_get(opts, "vhostfd"));
+            if (r == -1) {
+                return -1;
+            }
+            vhostfd = r;
+        } else {
+            vhostfd = -1;
+        }
+        s->vhost_net = vhost_net_init(&s->nc, vhostfd);
+        if (!s->vhost_net) {
+            qemu_error("vhost-net requested but could not be initialized\n");
+            return -1;
+        }
+    } else if (qemu_opt_get(opts, "vhostfd")) {
+        qemu_error("vhostfd= is not valid without vhost\n");
+        return -1;
+    }
+
     return 0;
 }
diff --git a/qemu-options.hx b/qemu-options.hx
index 7daa246..9dc81b4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -879,7 +879,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
     "-net tap[,vlan=n][,name=str],ifname=name\n"
     "                connect the host TAP network interface to VLAN 'n'\n"
 #else
-    "-net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n"
+    "-net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h]\n"
     "                connect the host TAP network interface to VLAN 'n' and use the\n"
     "                network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
     "                and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
@@ -889,6 +889,8 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
     "                default of 'sndbuf=1048576' can be disabled using 'sndbuf=0')\n"
     "                use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag\n"
     "                use vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n"
+    "                use vhost=on to enable experimental in kernel accelerator\n"
+    "                use 'vhostfd=h' to connect to an already opened vhost net device\n"
 #endif
     "-net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n"
     "                connect the vlan 'n' to another VLAN using a socket connection\n"
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 11/12] tap: add API to retrieve vhost net header
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
                   ` (9 preceding siblings ...)
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 10/12] tap: add vhost/vhostfd options Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 12/12] virtio-net: vhost net support Michael S. Tsirkin
  11 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

will be used by virtio-net for vhost net support

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/tap.c |    7 +++++++
 net/tap.h |    3 +++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 19c4fa2..35c05d7 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -487,3 +487,10 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
 
     return 0;
 }
+
+VHostNetState *tap_get_vhost_net(VLANClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    assert(nc->info->type == NET_CLIENT_TYPE_TAP);
+    return s->vhost_net;
+}
diff --git a/net/tap.h b/net/tap.h
index a244b28..b8cec83 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -50,4 +50,7 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo);
 
 int tap_get_fd(VLANClientState *vc);
 
+struct vhost_net;
+struct vhost_net *tap_get_vhost_net(VLANClientState *vc);
+
 #endif /* QEMU_NET_TAP_H */
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCHv4 12/12] virtio-net: vhost net support
  2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
                   ` (10 preceding siblings ...)
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 11/12] tap: add API to retrieve vhost net header Michael S. Tsirkin
@ 2010-03-03 17:16 ` Michael S. Tsirkin
  11 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-03 17:16 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: amit.shah, kraxel, quintela

This connects virtio-net to vhost net backend.
The code is structured in a way analogous to what we have with vnet
header capability in tap.

We start/stop backend on driver start/stop as
well as on save and vm start (for migration).

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio-net.c |   71 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 5c0093e..9ddd58c 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -17,6 +17,7 @@
 #include "net/tap.h"
 #include "qemu-timer.h"
 #include "virtio-net.h"
+#include "vhost_net.h"
 
 #define VIRTIO_NET_VM_VERSION    11
 
@@ -47,6 +48,8 @@ typedef struct VirtIONet
     uint8_t nomulti;
     uint8_t nouni;
     uint8_t nobcast;
+    uint8_t vhost_started;
+    VMChangeStateEntry *vmstate;
     struct {
         int in_use;
         int first_multi;
@@ -114,6 +117,10 @@ static void virtio_net_reset(VirtIODevice *vdev)
     n->nomulti = 0;
     n->nouni = 0;
     n->nobcast = 0;
+    if (n->vhost_started) {
+        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
+        n->vhost_started = 0;
+    }
 
     /* Flush any MAC and VLAN filter table state */
     n->mac_table.in_use = 0;
@@ -172,7 +179,14 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
         features &= ~(0x1 << VIRTIO_NET_F_HOST_UFO);
     }
 
-    return features;
+    if (!n->nic->nc.peer ||
+        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+        return features;
+    }
+    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+        return features;
+    }
+    return vhost_net_get_features(tap_get_vhost_net(n->nic->nc.peer), features);
 }
 
 static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
@@ -698,6 +712,12 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 {
     VirtIONet *n = opaque;
 
+    if (n->vhost_started) {
+        /* TODO: should we really stop the backend?
+         * If we don't, it might keep writing to memory. */
+        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
+        n->vhost_started = 0;
+    }
     virtio_save(&n->vdev, f);
 
     qemu_put_buffer(f, n->mac, ETH_ALEN);
@@ -810,7 +830,6 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
         qemu_mod_timer(n->tx_timer,
                        qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
     }
-
     return 0;
 }
 
@@ -830,6 +849,47 @@ static NetClientInfo net_virtio_info = {
     .link_status_changed = virtio_net_set_link_status,
 };
 
+static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
+{
+    VirtIONet *n = to_virtio_net(vdev);
+    if (!n->nic->nc.peer) {
+        return;
+    }
+    if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+        return;
+    }
+
+    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+        return;
+    }
+    if (!!n->vhost_started == !!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+        return;
+    }
+    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+        int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), vdev);
+        if (r < 0) {
+            fprintf(stderr, "unable to start vhost net: %d: "
+                    "falling back on userspace virtio\n", -r);
+        } else {
+            n->vhost_started = 1;
+        }
+    } else {
+        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
+        n->vhost_started = 0;
+    }
+}
+
+static void virtio_net_vmstate_change(void *opaque, int running, int reason)
+{
+    VirtIONet *n = opaque;
+    if (!running) {
+        return;
+    }
+    /* This is called when vm is started, it will start vhost backend if
+     * appropriate e.g. after migration. */
+    virtio_net_set_status(&n->vdev, n->vdev.status);
+}
+
 VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
 {
     VirtIONet *n;
@@ -845,6 +905,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
     n->vdev.set_features = virtio_net_set_features;
     n->vdev.bad_features = virtio_net_bad_features;
     n->vdev.reset = virtio_net_reset;
+    n->vdev.set_status = virtio_net_set_status;
     n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
     n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx);
     n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
@@ -867,6 +928,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
 
     register_savevm("virtio-net", virtio_net_id++, VIRTIO_NET_VM_VERSION,
                     virtio_net_save, virtio_net_load, n);
+    n->vmstate = qemu_add_vm_change_state_handler(virtio_net_vmstate_change, n);
 
     return &n->vdev;
 }
@@ -874,6 +936,11 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
 void virtio_net_exit(VirtIODevice *vdev)
 {
     VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
+    qemu_del_vm_change_state_handler(n->vmstate);
+
+    if (n->vhost_started) {
+        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
+    }
 
     qemu_purge_queued_packets(&n->nic->nc);
 
-- 
1.7.0.18.g0d53a5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCHv4 06/12] virtio: add set_status callback
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 06/12] virtio: add set_status callback Michael S. Tsirkin
@ 2010-03-04 12:19   ` Amit Shah
  2010-03-04 12:20     ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Amit Shah @ 2010-03-04 12:19 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Wed) Mar 03 2010 [19:16:13], Michael S. Tsirkin wrote:
> vhost net backend needs to be notified when
> frontend status changes. Add a callback,
> similar to set_features.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/s390-virtio-bus.c |    2 +-
>  hw/syborg_virtio.c   |    2 +-
>  hw/virtio-pci.c      |    5 +++--
>  hw/virtio.h          |    9 +++++++++
>  4 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
> index fa0a74f..6f93d67 100644
> --- a/hw/s390-virtio-bus.c
> +++ b/hw/s390-virtio-bus.c
> @@ -242,7 +242,7 @@ void s390_virtio_device_update_status(VirtIOS390Device *dev)
>      VirtIODevice *vdev = dev->vdev;
>      uint32_t features;
>  
> -    vdev->status = ldub_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS);
> +    virtio_set_status(vdev, ldub_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS));
>  
>      /* Update guest supported feature bitmap */
>  
> diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
> index 65239a0..c7b1162 100644
> --- a/hw/syborg_virtio.c
> +++ b/hw/syborg_virtio.c
> @@ -149,7 +149,7 @@ static void syborg_virtio_writel(void *opaque, target_phys_addr_t offset,
>          virtio_queue_notify(vdev, value);
>          break;
>      case SYBORG_VIRTIO_STATUS:
> -        vdev->status = value & 0xFF;
> +        virtio_set_status(vdev, value);

Dropping the '& 0xFF'? Should be OK, but in the next hunk it's retained:

>          if (vdev->status == 0)
>              virtio_reset(vdev);
>          break;
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index bcd40f7..fd9759a 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -206,7 +206,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>          virtio_queue_notify(vdev, val);
>          break;
>      case VIRTIO_PCI_STATUS:
> -        vdev->status = val & 0xFF;
> +        virtio_set_status(vdev, val & 0xFF);
>          if (vdev->status == 0) {
>              virtio_reset(proxy->vdev);
>              msix_unuse_all_vectors(&proxy->pci_dev);

		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCHv4 07/12] virtio: move typedef to qemu-common
  2010-03-04 12:20   ` Amit Shah
@ 2010-03-04 12:19     ` Michael S. Tsirkin
  2010-03-04 12:29       ` Amit Shah
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-04 12:19 UTC (permalink / raw)
  To: Amit Shah; +Cc: quintela, qemu-devel, kraxel

On Thu, Mar 04, 2010 at 05:50:19PM +0530, Amit Shah wrote:
> On (Wed) Mar 03 2010 [19:16:28], Michael S. Tsirkin wrote:
> > make it possible to use type without header include
> 
> Why?

So that vhost.h does not need to include virtio.h

> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/virtio.h   |    1 -
> >  qemu-common.h |    1 +
> >  2 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/hw/virtio.h b/hw/virtio.h
> > index 58b06bf..6f2fab0 100644
> > --- a/hw/virtio.h
> > +++ b/hw/virtio.h
> > @@ -69,7 +69,6 @@ static inline target_phys_addr_t vring_align(target_phys_addr_t addr,
> >  }
> >  
> >  typedef struct VirtQueue VirtQueue;
> > -typedef struct VirtIODevice VirtIODevice;
> >  
> >  #define VIRTQUEUE_MAX_SIZE 1024
> >  
> > diff --git a/qemu-common.h b/qemu-common.h
> > index f12a8f5..90ca3b8 100644
> > --- a/qemu-common.h
> > +++ b/qemu-common.h
> > @@ -228,6 +228,7 @@ typedef struct I2SCodec I2SCodec;
> >  typedef struct DeviceState DeviceState;
> >  typedef struct SSIBus SSIBus;
> >  typedef struct EventNotifier EventNotifier;
> > +typedef struct VirtIODevice VirtIODevice;
> >  
> >  typedef uint64_t pcibus_t;
> 
> 		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCHv4 07/12] virtio: move typedef to qemu-common
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 07/12] virtio: move typedef to qemu-common Michael S. Tsirkin
@ 2010-03-04 12:20   ` Amit Shah
  2010-03-04 12:19     ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Amit Shah @ 2010-03-04 12:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Wed) Mar 03 2010 [19:16:28], Michael S. Tsirkin wrote:
> make it possible to use type without header include

Why?

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/virtio.h   |    1 -
>  qemu-common.h |    1 +
>  2 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/virtio.h b/hw/virtio.h
> index 58b06bf..6f2fab0 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -69,7 +69,6 @@ static inline target_phys_addr_t vring_align(target_phys_addr_t addr,
>  }
>  
>  typedef struct VirtQueue VirtQueue;
> -typedef struct VirtIODevice VirtIODevice;
>  
>  #define VIRTQUEUE_MAX_SIZE 1024
>  
> diff --git a/qemu-common.h b/qemu-common.h
> index f12a8f5..90ca3b8 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -228,6 +228,7 @@ typedef struct I2SCodec I2SCodec;
>  typedef struct DeviceState DeviceState;
>  typedef struct SSIBus SSIBus;
>  typedef struct EventNotifier EventNotifier;
> +typedef struct VirtIODevice VirtIODevice;
>  
>  typedef uint64_t pcibus_t;

		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCHv4 06/12] virtio: add set_status callback
  2010-03-04 12:19   ` Amit Shah
@ 2010-03-04 12:20     ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-04 12:20 UTC (permalink / raw)
  To: Amit Shah; +Cc: quintela, qemu-devel, kraxel

On Thu, Mar 04, 2010 at 05:49:37PM +0530, Amit Shah wrote:
> On (Wed) Mar 03 2010 [19:16:13], Michael S. Tsirkin wrote:
> > vhost net backend needs to be notified when
> > frontend status changes. Add a callback,
> > similar to set_features.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/s390-virtio-bus.c |    2 +-
> >  hw/syborg_virtio.c   |    2 +-
> >  hw/virtio-pci.c      |    5 +++--
> >  hw/virtio.h          |    9 +++++++++
> >  4 files changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
> > index fa0a74f..6f93d67 100644
> > --- a/hw/s390-virtio-bus.c
> > +++ b/hw/s390-virtio-bus.c
> > @@ -242,7 +242,7 @@ void s390_virtio_device_update_status(VirtIOS390Device *dev)
> >      VirtIODevice *vdev = dev->vdev;
> >      uint32_t features;
> >  
> > -    vdev->status = ldub_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS);
> > +    virtio_set_status(vdev, ldub_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS));
> >  
> >      /* Update guest supported feature bitmap */
> >  
> > diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
> > index 65239a0..c7b1162 100644
> > --- a/hw/syborg_virtio.c
> > +++ b/hw/syborg_virtio.c
> > @@ -149,7 +149,7 @@ static void syborg_virtio_writel(void *opaque, target_phys_addr_t offset,
> >          virtio_queue_notify(vdev, value);
> >          break;
> >      case SYBORG_VIRTIO_STATUS:
> > -        vdev->status = value & 0xFF;
> > +        virtio_set_status(vdev, value);
> 
> Dropping the '& 0xFF'? Should be OK, but in the next hunk it's retained:
> 
> >          if (vdev->status == 0)
> >              virtio_reset(vdev);
> >          break;
> > diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> > index bcd40f7..fd9759a 100644
> > --- a/hw/virtio-pci.c
> > +++ b/hw/virtio-pci.c
> > @@ -206,7 +206,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
> >          virtio_queue_notify(vdev, val);
> >          break;
> >      case VIRTIO_PCI_STATUS:
> > -        vdev->status = val & 0xFF;
> > +        virtio_set_status(vdev, val & 0xFF);
> >          if (vdev->status == 0) {
> >              virtio_reset(proxy->vdev);
> >              msix_unuse_all_vectors(&proxy->pci_dev);
> 
> 		Amit

Yea, maybe both places should go.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCHv4 07/12] virtio: move typedef to qemu-common
  2010-03-04 12:19     ` Michael S. Tsirkin
@ 2010-03-04 12:29       ` Amit Shah
  2010-03-04 12:31         ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Amit Shah @ 2010-03-04 12:29 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kraxel, qemu-devel, quintela

On (Thu) Mar 04 2010 [14:19:58], Michael S. Tsirkin wrote:
> On Thu, Mar 04, 2010 at 05:50:19PM +0530, Amit Shah wrote:
> > On (Wed) Mar 03 2010 [19:16:28], Michael S. Tsirkin wrote:
> > > make it possible to use type without header include
> > 
> > Why?
> 
> So that vhost.h does not need to include virtio.h

But what's the benefit in doing that?

Either way, a better commit msg would help.

		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCHv4 07/12] virtio: move typedef to qemu-common
  2010-03-04 12:29       ` Amit Shah
@ 2010-03-04 12:31         ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-04 12:31 UTC (permalink / raw)
  To: Amit Shah; +Cc: kraxel, qemu-devel, quintela

On Thu, Mar 04, 2010 at 05:59:11PM +0530, Amit Shah wrote:
> On (Thu) Mar 04 2010 [14:19:58], Michael S. Tsirkin wrote:
> > On Thu, Mar 04, 2010 at 05:50:19PM +0530, Amit Shah wrote:
> > > On (Wed) Mar 03 2010 [19:16:28], Michael S. Tsirkin wrote:
> > > > make it possible to use type without header include
> > > 
> > > Why?
> > 
> > So that vhost.h does not need to include virtio.h
> 
> But what's the benefit in doing that?

Simpler dependencies. E.g. tap calls vhost
init/cleanup but we do not want it to include virtio.h
An alternative would be to split init/cleanup
(does not need virtio), and start/stop (needs virtio).
But typedef in qemu-common is already used
for everything else.

> Either way, a better commit msg would help.

ok

> 		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 03/12] notifier: event notifier implementation
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 03/12] notifier: event notifier implementation Michael S. Tsirkin
@ 2010-03-05 12:03   ` Amit Shah
  0 siblings, 0 replies; 32+ messages in thread
From: Amit Shah @ 2010-03-05 12:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Wed) Mar 03 2010 [19:16:00], Michael S. Tsirkin wrote:
> event notifiers are slightly generalized eventfd descriptors. Current
> implementation depends on eventfd because vhost is the only user, and
> vhost depends on eventfd anyway, but a stub is provided for non-eventfd
> case.
> 
> We'll be able to further generalize this when another user comes along
> and we see how to best do this.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  Makefile.target |    1 +
>  hw/notifier.c   |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/notifier.h   |   16 ++++++++++++++

Since these are *event* notifiers and all the functions are named
event_notifier_*, can we also name the files event_notifer.[ch]?

Just notifier isn't very helpful, since there are all sorts of notifiers
available.

		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 04/12] virtio: add notifier support
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 04/12] virtio: add notifier support Michael S. Tsirkin
@ 2010-03-05 12:04   ` Amit Shah
  0 siblings, 0 replies; 32+ messages in thread
From: Amit Shah @ 2010-03-05 12:04 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Wed) Mar 03 2010 [19:16:04], Michael S. Tsirkin wrote:
> Add binding API to set host/guest notifiers.
> Will be used by vhost.

I've already mentioned this on IRC, but I'll put it here too, for
tracking.

> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/virtio.c |    5 ++++-
>  hw/virtio.h |    3 +++
>  2 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/virtio.c b/hw/virtio.c
> index 7c020a3..1f5e7be 100644
> --- a/hw/virtio.c
> +++ b/hw/virtio.c
> @@ -73,6 +73,7 @@ struct VirtQueue
>      int inuse;
>      uint16_t vector;
>      void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
> +    VirtIODevice *vdev;
>  };
>  
>  /* virt queue functions */
> @@ -714,8 +715,10 @@ VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
>      vdev->queue_sel = 0;
>      vdev->config_vector = VIRTIO_NO_VECTOR;
>      vdev->vq = qemu_mallocz(sizeof(VirtQueue) * VIRTIO_PCI_QUEUE_MAX);
> -    for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++)
> +    for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
>          vdev->vq[i].vector = VIRTIO_NO_VECTOR;
> +        vdev->vq[i].vdev = vdev;
> +    }

These two hunks don't belong to this patchset (and are not part of the
commit description too..)

>  
>      vdev->name = name;
>      vdev->config_len = config_size;
> diff --git a/hw/virtio.h b/hw/virtio.h
> index 3baa2a3..af87889 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -19,6 +19,7 @@
>  #include "qdev.h"
>  #include "sysemu.h"
>  #include "block_int.h"
> +#include "notifier.h"
>  
>  /* from Linux's linux/virtio_config.h */
>  
> @@ -89,6 +90,8 @@ typedef struct {
>      int (*load_config)(void * opaque, QEMUFile *f);
>      int (*load_queue)(void * opaque, int n, QEMUFile *f);
>      unsigned (*get_features)(void * opaque);
> +    int (*guest_notifier)(void * opaque, int n, bool assigned);
> +    int (*host_notifier)(void * opaque, int n, bool assigned);
>  } VirtIOBindings;
>  
>  #define VIRTIO_PCI_QUEUE_MAX 64

And I think this whole patch could be merged with the next one.


		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 05/12] virtio: add APIs for queue fields
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 05/12] virtio: add APIs for queue fields Michael S. Tsirkin
@ 2010-03-05 12:10   ` Amit Shah
  2010-03-06 19:09     ` Michael S. Tsirkin
  2010-03-05 13:08   ` Amit Shah
  1 sibling, 1 reply; 32+ messages in thread
From: Amit Shah @ 2010-03-05 12:10 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Wed) Mar 03 2010 [19:16:09], Michael S. Tsirkin wrote:
> vhost needs physical addresses for ring and other queue fields,
> so add APIs for these.

Already discussed on IRC, but mentioning here so that it doesn't get
lost:

> +target_phys_addr_t virtio_queue_get_desc(VirtIODevice *vdev, int n)
> +{
> +    return vdev->vq[n].vring.desc;
> +}
> +
> +target_phys_addr_t virtio_queue_get_avail(VirtIODevice *vdev, int n)
> +{
> +    return vdev->vq[n].vring.avail;
> +}
> +
> +target_phys_addr_t virtio_queue_get_used(VirtIODevice *vdev, int n)
> +{
> +    return vdev->vq[n].vring.used;
> +}
> +
> +target_phys_addr_t virtio_queue_get_ring(VirtIODevice *vdev, int n)
> +{
> +    return vdev->vq[n].vring.desc;
> +}

All these functions return the start address of these fields; any better
way to name them?

eg, just by looking at, eg, 'virtio_queue_get_used()', I'd expect that
the function returns the number of used buffers in the ring, not the
start address of the used buffers.

> +target_phys_addr_t virtio_queue_get_desc_size(VirtIODevice *vdev, int n)
> +{
> +    return sizeof(VRingDesc) * vdev->vq[n].vring.num;
> +}
> +
> +target_phys_addr_t virtio_queue_get_avail_size(VirtIODevice *vdev, int n)
> +{
> +    return offsetof(VRingAvail, ring) +
> +        sizeof(u_int64_t) * vdev->vq[n].vring.num;
> +}
> +
> +target_phys_addr_t virtio_queue_get_used_size(VirtIODevice *vdev, int n)
> +{
> +    return offsetof(VRingUsed, ring) +
> +        sizeof(VRingUsedElem) * vdev->vq[n].vring.num;
> +}
> +
> +

Extra newline

> +target_phys_addr_t virtio_queue_get_ring_size(VirtIODevice *vdev, int n)
> +{
> +    return vdev->vq[n].vring.used - vdev->vq[n].vring.desc +
> +	    virtio_queue_get_used_size(vdev, n);
> +}
> +
> +uint16_t virtio_queue_last_avail_idx(VirtIODevice *vdev, int n)
> +{
> +    return vdev->vq[n].last_avail_idx;
> +}
> +
> +void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
> +{
> +    vdev->vq[n].last_avail_idx = idx;
> +}

virtio_queue_last_avail_idx() does make sense, but since you have a
'set_last_avail_idx', better name the previous one 'get_..'?

> +VirtQueue *virtio_queue(VirtIODevice *vdev, int n)
> +{
> +    return vdev->vq + n;
> +}

This really doesn't mean anything; I suggest virtio_queue_get_vq().

> +EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq)
> +{
> +    return &vq->guest_notifier;
> +}
> +EventNotifier *virtio_queue_host_notifier(VirtQueue *vq)
> +{
> +    return &vq->host_notifier;
> +}

Why drop the 'get_' for these functions?

virtio_queue_get_guest_notifier()
and
virtio_queue_get_host_notifier()

might be better.

		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 05/12] virtio: add APIs for queue fields
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 05/12] virtio: add APIs for queue fields Michael S. Tsirkin
  2010-03-05 12:10   ` [Qemu-devel] " Amit Shah
@ 2010-03-05 13:08   ` Amit Shah
  2010-03-06 19:07     ` Michael S. Tsirkin
  1 sibling, 1 reply; 32+ messages in thread
From: Amit Shah @ 2010-03-05 13:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Wed) Mar 03 2010 [19:16:09], Michael S. Tsirkin wrote:
> +
> +EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq)
> +{
> +    return &vq->guest_notifier;
> +}
> +EventNotifier *virtio_queue_host_notifier(VirtQueue *vq)
> +{
> +    return &vq->host_notifier;
> +}

Where are these host_notifier and guest_notifier fields set? Am I
completely missing it?

		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 09/12] vhost: vhost net support
  2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 09/12] vhost: vhost net support Michael S. Tsirkin
@ 2010-03-05 18:19   ` Amit Shah
  2010-03-06 19:06     ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Amit Shah @ 2010-03-05 18:19 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Wed) Mar 03 2010 [19:16:35], Michael S. Tsirkin wrote:

> +static int vhost_virtqueue_init(struct vhost_dev *dev,
> +                                struct VirtIODevice *vdev,
> +                                struct vhost_virtqueue *vq,
> +                                unsigned idx)
> +{
> +    target_phys_addr_t s, l, a;
> +    int r;
> +    struct vhost_vring_file file = {
> +        .index = idx,
> +    };
> +    struct vhost_vring_state state = {
> +        .index = idx,
> +    };
> +    struct VirtQueue *q = virtio_queue(vdev, idx);

Why depart from using 'vq' for VirtQueue? Why not use 'hvq' for
vhost_virtqueue? That'll make reading through this code easier... Also,
'hvdev' for vhost_dev will be apt as well.

> +    vq->num = state.num = virtio_queue_get_num(vdev, idx);

I think this should be named 'virtio_queue_get_vq_num' for clarity.

> +    r = ioctl(dev->control, VHOST_SET_VRING_NUM, &state);
> +    if (r) {
> +        return -errno;
> +    }
> +
> +    state.num = virtio_queue_last_avail_idx(vdev, idx);
> +    r = ioctl(dev->control, VHOST_SET_VRING_BASE, &state);
> +    if (r) {
> +        return -errno;
> +    }
> +
> +    s = l = virtio_queue_get_desc_size(vdev, idx);
> +    a = virtio_queue_get_desc(vdev, idx);
> +    vq->desc = cpu_physical_memory_map(a, &l, 0);
> +    if (!vq->desc || l != s) {
> +        r = -ENOMEM;
> +        goto fail_alloc_desc;
> +    }
> +    s = l = virtio_queue_get_avail_size(vdev, idx);
> +    a = virtio_queue_get_avail(vdev, idx);
> +    vq->avail = cpu_physical_memory_map(a, &l, 0);
> +    if (!vq->avail || l != s) {
> +        r = -ENOMEM;
> +        goto fail_alloc_avail;
> +    }
> +    vq->used_size = s = l = virtio_queue_get_used_size(vdev, idx);
> +    vq->used_phys = a = virtio_queue_get_used(vdev, idx);
> +    vq->used = cpu_physical_memory_map(a, &l, 1);
> +    if (!vq->used || l != s) {
> +        r = -ENOMEM;
> +        goto fail_alloc_used;
> +    }
> +
> +    vq->ring_size = s = l = virtio_queue_get_ring_size(vdev, idx);
> +    vq->ring_phys = a = virtio_queue_get_ring(vdev, idx);
> +    vq->ring = cpu_physical_memory_map(a, &l, 1);
> +    if (!vq->ring || l != s) {
> +        r = -ENOMEM;
> +        goto fail_alloc_ring;
> +    }
> +
> +    r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
> +    if (r < 0) {
> +        r = -errno;
> +        goto fail_alloc;
> +    }
> +    if (!vdev->binding->guest_notifier || !vdev->binding->host_notifier) {
> +        fprintf(stderr, "binding does not support irqfd/queuefd\n");
> +        r = -ENOSYS;
> +        goto fail_alloc;
> +    }

This could be checked much earlier on in the function; so that we avoid
doing all that stuff above and the cleanup.

		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 09/12] vhost: vhost net support
  2010-03-05 18:19   ` [Qemu-devel] " Amit Shah
@ 2010-03-06 19:06     ` Michael S. Tsirkin
  2010-03-08  6:20       ` Amit Shah
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-06 19:06 UTC (permalink / raw)
  To: Amit Shah; +Cc: quintela, qemu-devel, kraxel

On Fri, Mar 05, 2010 at 11:49:11PM +0530, Amit Shah wrote:
> On (Wed) Mar 03 2010 [19:16:35], Michael S. Tsirkin wrote:
> 
> > +static int vhost_virtqueue_init(struct vhost_dev *dev,
> > +                                struct VirtIODevice *vdev,
> > +                                struct vhost_virtqueue *vq,
> > +                                unsigned idx)
> > +{
> > +    target_phys_addr_t s, l, a;
> > +    int r;
> > +    struct vhost_vring_file file = {
> > +        .index = idx,
> > +    };
> > +    struct vhost_vring_state state = {
> > +        .index = idx,
> > +    };
> > +    struct VirtQueue *q = virtio_queue(vdev, idx);
> 
> Why depart from using 'vq' for VirtQueue? Why not use 'hvq' for
> vhost_virtqueue? That'll make reading through this code easier... Also,
> 'hvdev' for vhost_dev will be apt as well.

I'll think of better names, thanks.

> > +    vq->num = state.num = virtio_queue_get_num(vdev, idx);
> 
> I think this should be named 'virtio_queue_get_vq_num' for clarity.

This is the name upstream. If you like, send a patch to rename it :).

> > +    r = ioctl(dev->control, VHOST_SET_VRING_NUM, &state);
> > +    if (r) {
> > +        return -errno;
> > +    }
> > +
> > +    state.num = virtio_queue_last_avail_idx(vdev, idx);
> > +    r = ioctl(dev->control, VHOST_SET_VRING_BASE, &state);
> > +    if (r) {
> > +        return -errno;
> > +    }
> > +
> > +    s = l = virtio_queue_get_desc_size(vdev, idx);
> > +    a = virtio_queue_get_desc(vdev, idx);
> > +    vq->desc = cpu_physical_memory_map(a, &l, 0);
> > +    if (!vq->desc || l != s) {
> > +        r = -ENOMEM;
> > +        goto fail_alloc_desc;
> > +    }
> > +    s = l = virtio_queue_get_avail_size(vdev, idx);
> > +    a = virtio_queue_get_avail(vdev, idx);
> > +    vq->avail = cpu_physical_memory_map(a, &l, 0);
> > +    if (!vq->avail || l != s) {
> > +        r = -ENOMEM;
> > +        goto fail_alloc_avail;
> > +    }
> > +    vq->used_size = s = l = virtio_queue_get_used_size(vdev, idx);
> > +    vq->used_phys = a = virtio_queue_get_used(vdev, idx);
> > +    vq->used = cpu_physical_memory_map(a, &l, 1);
> > +    if (!vq->used || l != s) {
> > +        r = -ENOMEM;
> > +        goto fail_alloc_used;
> > +    }
> > +
> > +    vq->ring_size = s = l = virtio_queue_get_ring_size(vdev, idx);
> > +    vq->ring_phys = a = virtio_queue_get_ring(vdev, idx);
> > +    vq->ring = cpu_physical_memory_map(a, &l, 1);
> > +    if (!vq->ring || l != s) {
> > +        r = -ENOMEM;
> > +        goto fail_alloc_ring;
> > +    }
> > +
> > +    r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
> > +    if (r < 0) {
> > +        r = -errno;
> > +        goto fail_alloc;
> > +    }
> > +    if (!vdev->binding->guest_notifier || !vdev->binding->host_notifier) {
> > +        fprintf(stderr, "binding does not support irqfd/queuefd\n");
> > +        r = -ENOSYS;
> > +        goto fail_alloc;
> > +    }
> 
> This could be checked much earlier on in the function; so that we avoid
> doing all that stuff above and the cleanup.
> 
> 		Amit


Whatever order we put checks in, we'll have to undo stuff
done beforehand on error.

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 05/12] virtio: add APIs for queue fields
  2010-03-05 13:08   ` Amit Shah
@ 2010-03-06 19:07     ` Michael S. Tsirkin
  2010-03-08  6:16       ` Amit Shah
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-06 19:07 UTC (permalink / raw)
  To: Amit Shah; +Cc: quintela, qemu-devel, kraxel

On Fri, Mar 05, 2010 at 06:38:35PM +0530, Amit Shah wrote:
> On (Wed) Mar 03 2010 [19:16:09], Michael S. Tsirkin wrote:
> > +
> > +EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq)
> > +{
> > +    return &vq->guest_notifier;
> > +}
> > +EventNotifier *virtio_queue_host_notifier(VirtQueue *vq)
> > +{
> > +    return &vq->host_notifier;
> > +}
> 
> Where are these host_notifier and guest_notifier fields set? Am I
> completely missing it?
> 
> 		Amit

What do you mean by "set"?
You get a pointer, you can e.g. init it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 05/12] virtio: add APIs for queue fields
  2010-03-05 12:10   ` [Qemu-devel] " Amit Shah
@ 2010-03-06 19:09     ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-06 19:09 UTC (permalink / raw)
  To: Amit Shah; +Cc: quintela, qemu-devel, kraxel

On Fri, Mar 05, 2010 at 05:40:18PM +0530, Amit Shah wrote:
> On (Wed) Mar 03 2010 [19:16:09], Michael S. Tsirkin wrote:
> > vhost needs physical addresses for ring and other queue fields,
> > so add APIs for these.
> 
> Already discussed on IRC, but mentioning here so that it doesn't get
> lost:
> 
> > +target_phys_addr_t virtio_queue_get_desc(VirtIODevice *vdev, int n)
> > +{
> > +    return vdev->vq[n].vring.desc;
> > +}
> > +
> > +target_phys_addr_t virtio_queue_get_avail(VirtIODevice *vdev, int n)
> > +{
> > +    return vdev->vq[n].vring.avail;
> > +}
> > +
> > +target_phys_addr_t virtio_queue_get_used(VirtIODevice *vdev, int n)
> > +{
> > +    return vdev->vq[n].vring.used;
> > +}
> > +
> > +target_phys_addr_t virtio_queue_get_ring(VirtIODevice *vdev, int n)
> > +{
> > +    return vdev->vq[n].vring.desc;
> > +}
> 
> All these functions return the start address of these fields; any better
> way to name them?
> 
> eg, just by looking at, eg, 'virtio_queue_get_used()', I'd expect that
> the function returns the number of used buffers in the ring, not the
> start address of the used buffers.
> 
> > +target_phys_addr_t virtio_queue_get_desc_size(VirtIODevice *vdev, int n)
> > +{
> > +    return sizeof(VRingDesc) * vdev->vq[n].vring.num;
> > +}
> > +
> > +target_phys_addr_t virtio_queue_get_avail_size(VirtIODevice *vdev, int n)
> > +{
> > +    return offsetof(VRingAvail, ring) +
> > +        sizeof(u_int64_t) * vdev->vq[n].vring.num;
> > +}
> > +
> > +target_phys_addr_t virtio_queue_get_used_size(VirtIODevice *vdev, int n)
> > +{
> > +    return offsetof(VRingUsed, ring) +
> > +        sizeof(VRingUsedElem) * vdev->vq[n].vring.num;
> > +}
> > +
> > +
> 
> Extra newline
> 
> > +target_phys_addr_t virtio_queue_get_ring_size(VirtIODevice *vdev, int n)
> > +{
> > +    return vdev->vq[n].vring.used - vdev->vq[n].vring.desc +
> > +	    virtio_queue_get_used_size(vdev, n);
> > +}
> > +
> > +uint16_t virtio_queue_last_avail_idx(VirtIODevice *vdev, int n)
> > +{
> > +    return vdev->vq[n].last_avail_idx;
> > +}
> > +
> > +void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
> > +{
> > +    vdev->vq[n].last_avail_idx = idx;
> > +}
> 
> virtio_queue_last_avail_idx() does make sense, but since you have a
> 'set_last_avail_idx', better name the previous one 'get_..'?
> 
> > +VirtQueue *virtio_queue(VirtIODevice *vdev, int n)
> > +{
> > +    return vdev->vq + n;
> > +}
> 
> This really doesn't mean anything; I suggest virtio_queue_get_vq().
> 
> > +EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq)
> > +{
> > +    return &vq->guest_notifier;
> > +}
> > +EventNotifier *virtio_queue_host_notifier(VirtQueue *vq)
> > +{
> > +    return &vq->host_notifier;
> > +}
> 
> Why drop the 'get_' for these functions?
> 
> virtio_queue_get_guest_notifier()
> and
> virtio_queue_get_host_notifier()
> 
> might be better.
> 
> 		Amit

Not sure we want 'get' all around, but I'll think about the names, thanks!

-- 
MST

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 05/12] virtio: add APIs for queue fields
  2010-03-06 19:07     ` Michael S. Tsirkin
@ 2010-03-08  6:16       ` Amit Shah
  2010-03-08 18:11         ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Amit Shah @ 2010-03-08  6:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Sat) Mar 06 2010 [21:07:57], Michael S. Tsirkin wrote:
> On Fri, Mar 05, 2010 at 06:38:35PM +0530, Amit Shah wrote:
> > On (Wed) Mar 03 2010 [19:16:09], Michael S. Tsirkin wrote:
> > > +
> > > +EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq)
> > > +{
> > > +    return &vq->guest_notifier;
> > > +}
> > > +EventNotifier *virtio_queue_host_notifier(VirtQueue *vq)
> > > +{
> > > +    return &vq->host_notifier;
> > > +}
> > 
> > Where are these host_notifier and guest_notifier fields set? Am I
> > completely missing it?
> 
> What do you mean by "set"?
> You get a pointer, you can e.g. init it.

I mean where is vq->host_notifier initialised?

		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 09/12] vhost: vhost net support
  2010-03-06 19:06     ` Michael S. Tsirkin
@ 2010-03-08  6:20       ` Amit Shah
  2010-03-16 15:37         ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Amit Shah @ 2010-03-08  6:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Sat) Mar 06 2010 [21:06:35], Michael S. Tsirkin wrote:
> 
> > > +    r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
> > > +    if (r < 0) {
> > > +        r = -errno;
> > > +        goto fail_alloc;
> > > +    }
> > > +    if (!vdev->binding->guest_notifier || !vdev->binding->host_notifier) {
> > > +        fprintf(stderr, "binding does not support irqfd/queuefd\n");
> > > +        r = -ENOSYS;
> > > +        goto fail_alloc;
> > > +    }
> > 
> > This could be checked much earlier on in the function; so that we avoid
> > doing all that stuff above and the cleanup.
> 
> Whatever order we put checks in, we'll have to undo stuff
> done beforehand on error.

Not if you do this check before any ioctls or allocations.
!vdev->binding->guest_notifier is not dependent on anything you do above
it in this function, so just checking for this first thing in the
function will not need any cleanup.


		Amit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 05/12] virtio: add APIs for queue fields
  2010-03-08  6:16       ` Amit Shah
@ 2010-03-08 18:11         ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-08 18:11 UTC (permalink / raw)
  To: Amit Shah; +Cc: quintela, qemu-devel, kraxel

On Mon, Mar 08, 2010 at 11:46:46AM +0530, Amit Shah wrote:
> On (Sat) Mar 06 2010 [21:07:57], Michael S. Tsirkin wrote:
> > On Fri, Mar 05, 2010 at 06:38:35PM +0530, Amit Shah wrote:
> > > On (Wed) Mar 03 2010 [19:16:09], Michael S. Tsirkin wrote:
> > > > +
> > > > +EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq)
> > > > +{
> > > > +    return &vq->guest_notifier;
> > > > +}
> > > > +EventNotifier *virtio_queue_host_notifier(VirtQueue *vq)
> > > > +{
> > > > +    return &vq->host_notifier;
> > > > +}
> > > 
> > > Where are these host_notifier and guest_notifier fields set? Am I
> > > completely missing it?
> > 
> > What do you mean by "set"?
> > You get a pointer, you can e.g. init it.
> 
> I mean where is vq->host_notifier initialised?
> 
> 		Amit

There's a function to do this. Called from virtio-pci.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 09/12] vhost: vhost net support
  2010-03-08  6:20       ` Amit Shah
@ 2010-03-16 15:37         ` Michael S. Tsirkin
  2010-03-17  4:09           ` Amit Shah
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2010-03-16 15:37 UTC (permalink / raw)
  To: Amit Shah; +Cc: quintela, qemu-devel, kraxel

On Mon, Mar 08, 2010 at 11:50:23AM +0530, Amit Shah wrote:
> On (Sat) Mar 06 2010 [21:06:35], Michael S. Tsirkin wrote:
> > 
> > > > +    r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
> > > > +    if (r < 0) {
> > > > +        r = -errno;
> > > > +        goto fail_alloc;
> > > > +    }
> > > > +    if (!vdev->binding->guest_notifier || !vdev->binding->host_notifier) {
> > > > +        fprintf(stderr, "binding does not support irqfd/queuefd\n");
> > > > +        r = -ENOSYS;
> > > > +        goto fail_alloc;
> > > > +    }
> > > 
> > > This could be checked much earlier on in the function; so that we avoid
> > > doing all that stuff above and the cleanup.
> > 
> > Whatever order we put checks in, we'll have to undo stuff
> > done beforehand on error.
> 
> Not if you do this check before any ioctls or allocations.
> !vdev->binding->guest_notifier is not dependent on anything you do above
> it in this function, so just checking for this first thing in the
> function will not need any cleanup.
> 
> 
> 		Amit

Yes, but I think it's clearer to do check function just before
calling it. No?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] Re: [PATCHv4 09/12] vhost: vhost net support
  2010-03-16 15:37         ` Michael S. Tsirkin
@ 2010-03-17  4:09           ` Amit Shah
  0 siblings, 0 replies; 32+ messages in thread
From: Amit Shah @ 2010-03-17  4:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: quintela, qemu-devel, kraxel

On (Tue) Mar 16 2010 [17:37:39], Michael S. Tsirkin wrote:
> On Mon, Mar 08, 2010 at 11:50:23AM +0530, Amit Shah wrote:
> > On (Sat) Mar 06 2010 [21:06:35], Michael S. Tsirkin wrote:
> > > 
> > > > > +    r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
> > > > > +    if (r < 0) {
> > > > > +        r = -errno;
> > > > > +        goto fail_alloc;
> > > > > +    }
> > > > > +    if (!vdev->binding->guest_notifier || !vdev->binding->host_notifier) {
> > > > > +        fprintf(stderr, "binding does not support irqfd/queuefd\n");
> > > > > +        r = -ENOSYS;
> > > > > +        goto fail_alloc;
> > > > > +    }
> > > > 
> > > > This could be checked much earlier on in the function; so that we avoid
> > > > doing all that stuff above and the cleanup.
> > > 
> > > Whatever order we put checks in, we'll have to undo stuff
> > > done beforehand on error.
> > 
> > Not if you do this check before any ioctls or allocations.
> > !vdev->binding->guest_notifier is not dependent on anything you do above
> > it in this function, so just checking for this first thing in the
> > function will not need any cleanup.
> 
> Yes, but I think it's clearer to do check function just before
> calling it. No?

The good thing about doing it before is that we can check for whatever
conditions are not met first and exit instead of doing N things and
undoing them later and then exiting -- faster and more efficient in the
case when vhost is not being used.

		Amit
-- 
http://log.amitshah.net/

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2010-03-17  4:11 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-03 17:15 [Qemu-devel] [PATCHv4 00/12] vhost-net: upstream integration Michael S. Tsirkin
2010-03-03 17:15 ` [Qemu-devel] [PATCHv4 01/12] tap: add interface to get device fd Michael S. Tsirkin
2010-03-03 17:15 ` [Qemu-devel] [PATCHv4 02/12] kvm: add API to set ioeventfd Michael S. Tsirkin
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 03/12] notifier: event notifier implementation Michael S. Tsirkin
2010-03-05 12:03   ` [Qemu-devel] " Amit Shah
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 04/12] virtio: add notifier support Michael S. Tsirkin
2010-03-05 12:04   ` [Qemu-devel] " Amit Shah
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 05/12] virtio: add APIs for queue fields Michael S. Tsirkin
2010-03-05 12:10   ` [Qemu-devel] " Amit Shah
2010-03-06 19:09     ` Michael S. Tsirkin
2010-03-05 13:08   ` Amit Shah
2010-03-06 19:07     ` Michael S. Tsirkin
2010-03-08  6:16       ` Amit Shah
2010-03-08 18:11         ` Michael S. Tsirkin
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 06/12] virtio: add set_status callback Michael S. Tsirkin
2010-03-04 12:19   ` Amit Shah
2010-03-04 12:20     ` Michael S. Tsirkin
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 07/12] virtio: move typedef to qemu-common Michael S. Tsirkin
2010-03-04 12:20   ` Amit Shah
2010-03-04 12:19     ` Michael S. Tsirkin
2010-03-04 12:29       ` Amit Shah
2010-03-04 12:31         ` Michael S. Tsirkin
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 08/12] virtio-pci: fill in notifier support Michael S. Tsirkin
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 09/12] vhost: vhost net support Michael S. Tsirkin
2010-03-05 18:19   ` [Qemu-devel] " Amit Shah
2010-03-06 19:06     ` Michael S. Tsirkin
2010-03-08  6:20       ` Amit Shah
2010-03-16 15:37         ` Michael S. Tsirkin
2010-03-17  4:09           ` Amit Shah
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 10/12] tap: add vhost/vhostfd options Michael S. Tsirkin
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 11/12] tap: add API to retrieve vhost net header Michael S. Tsirkin
2010-03-03 17:16 ` [Qemu-devel] [PATCHv4 12/12] virtio-net: vhost net support Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.