linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] vringh
@ 2013-02-18 22:29 Rusty Russell
  2013-02-18 22:29 ` [PATCH 1/5] virtio_ring: expose virtio barriers for use in vringh Rusty Russell
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Rusty Russell @ 2013-02-18 22:29 UTC (permalink / raw)
  To: virtualization; +Cc: sjur.brandeland, mst, linux-kernel

This introduces vringh, which are generic accessors for virtio rings (host side).
There's a host-side implementation in vhost, but it assumes that the rings are
in userspace, and is tied to the vhost implementation.  I have patches to adapt
it to use vringh, but I'm pushing this in the next merge window for Sjur, who has
CAIF patches which need it.

This also includes a test program in tools/virtio.  There's a fixup patch for
the current tools, since they didn't compile for me.  MST?

Review and comments welcome!
Rusty.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/5] virtio_ring: expose virtio barriers for use in vringh.
  2013-02-18 22:29 [PATCH 0/5] vringh Rusty Russell
@ 2013-02-18 22:29 ` Rusty Russell
  2013-02-18 22:29 ` [PATCH 2/5] vringh: host-side implementation of virtio rings Rusty Russell
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Rusty Russell @ 2013-02-18 22:29 UTC (permalink / raw)
  To: virtualization; +Cc: sjur.brandeland, mst, linux-kernel, Rusty Russell

The host side of ring needs this logic too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_ring.c |   33 +++++-------------------
 include/linux/virtio_ring.h  |   57 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 27 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ffd7e7d..245177c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -24,27 +24,6 @@
 #include <linux/module.h>
 #include <linux/hrtimer.h>
 
-/* virtio guest is communicating with a virtual "device" that actually runs on
- * a host processor.  Memory barriers are used to control SMP effects. */
-#ifdef CONFIG_SMP
-/* Where possible, use SMP barriers which are more lightweight than mandatory
- * barriers, because mandatory barriers control MMIO effects on accesses
- * through relaxed memory I/O windows (which virtio-pci does not use). */
-#define virtio_mb(vq) \
-	do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
-#define virtio_rmb(vq) \
-	do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
-#define virtio_wmb(vq) \
-	do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)
-#else
-/* We must force memory ordering even if guest is UP since host could be
- * running on another CPU, but SMP barriers are defined to barrier() in that
- * configuration. So fall back to mandatory barriers instead. */
-#define virtio_mb(vq) mb()
-#define virtio_rmb(vq) rmb()
-#define virtio_wmb(vq) wmb()
-#endif
-
 #ifdef DEBUG
 /* For development, we want to crash whenever the ring is screwed. */
 #define BAD_RING(_vq, fmt, args...)				\
@@ -276,7 +255,7 @@ add_head:
 
 	/* Descriptors and available array need to be set before we expose the
 	 * new available array entries. */
-	virtio_wmb(vq);
+	virtio_wmb(vq->weak_barriers);
 	vq->vring.avail->idx++;
 	vq->num_added++;
 
@@ -312,7 +291,7 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
 	START_USE(vq);
 	/* We need to expose available array entries before checking avail
 	 * event. */
-	virtio_mb(vq);
+	virtio_mb(vq->weak_barriers);
 
 	old = vq->vring.avail->idx - vq->num_added;
 	new = vq->vring.avail->idx;
@@ -436,7 +415,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	}
 
 	/* Only get used array entries after they have been exposed by host. */
-	virtio_rmb(vq);
+	virtio_rmb(vq->weak_barriers);
 
 	last_used = (vq->last_used_idx & (vq->vring.num - 1));
 	i = vq->vring.used->ring[last_used].id;
@@ -460,7 +439,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	 * the read in the next get_buf call. */
 	if (!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
 		vring_used_event(&vq->vring) = vq->last_used_idx;
-		virtio_mb(vq);
+		virtio_mb(vq->weak_barriers);
 	}
 
 #ifdef DEBUG
@@ -513,7 +492,7 @@ bool virtqueue_enable_cb(struct virtqueue *_vq)
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	vring_used_event(&vq->vring) = vq->last_used_idx;
-	virtio_mb(vq);
+	virtio_mb(vq->weak_barriers);
 	if (unlikely(more_used(vq))) {
 		END_USE(vq);
 		return false;
@@ -553,7 +532,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 	/* TODO: tune this threshold */
 	bufs = (u16)(vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
 	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
-	virtio_mb(vq);
+	virtio_mb(vq->weak_barriers);
 	if (unlikely((u16)(vq->vring.used->idx - vq->last_used_idx) > bufs)) {
 		END_USE(vq);
 		return false;
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 63c6ea1..ca3ad41 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -4,6 +4,63 @@
 #include <linux/irqreturn.h>
 #include <uapi/linux/virtio_ring.h>
 
+/*
+ * Barriers in virtio are tricky.  Non-SMP virtio guests can't assume
+ * they're not on an SMP host system, so they need to assume real
+ * barriers.  Non-SMP virtio hosts could skip the barriers, but does
+ * anyone care?
+ *
+ * For virtio_pci on SMP, we don't need to order with respect to MMIO
+ * accesses through relaxed memory I/O windows, so smp_mb() et al are
+ * sufficient.
+ *
+ * For using virtio to talk to real devices (eg. other heterogeneous
+ * CPUs) we do need real barriers.  In theory, we could be using both
+ * kinds of virtio, so it's a runtime decision, and the branch is
+ * actually quite cheap.
+ */
+
+#ifdef CONFIG_SMP
+static inline void virtio_mb(bool weak_barriers)
+{
+	if (weak_barriers)
+		smp_mb();
+	else
+		mb();
+}
+
+static inline void virtio_rmb(bool weak_barriers)
+{
+	if (weak_barriers)
+		smp_rmb();
+	else
+		rmb();
+}
+
+static inline void virtio_wmb(bool weak_barriers)
+{
+	if (weak_barriers)
+		smp_wmb();
+	else
+		wmb();
+}
+#else
+static inline void virtio_mb(bool weak_barriers)
+{
+	mb();
+}
+
+static inline void virtio_rmb(bool weak_barriers)
+{
+	rmb();
+}
+
+static inline void virtio_wmb(bool weak_barriers)
+{
+	wmb();
+}
+#endif
+
 struct virtio_device;
 struct virtqueue;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/5] vringh: host-side implementation of virtio rings.
  2013-02-18 22:29 [PATCH 0/5] vringh Rusty Russell
  2013-02-18 22:29 ` [PATCH 1/5] virtio_ring: expose virtio barriers for use in vringh Rusty Russell
@ 2013-02-18 22:29 ` Rusty Russell
  2013-02-18 22:29 ` [PATCH 3/5] tools/virtio: fix compile Rusty Russell
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Rusty Russell @ 2013-02-18 22:29 UTC (permalink / raw)
  To: virtualization; +Cc: sjur.brandeland, mst, linux-kernel, Rusty Russell

Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).

This abstracts the business of dealing with the virtio ring layout
from the access (userspace or direct); to do this, we use function
pointers, which gcc inlines correctly.

There are patches pending to make vhost use this interface, but it's
being merged now for the CAIF virtio patches, which want it
immediately.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/Makefile          |    2 +-
 drivers/vhost/Kconfig     |    8 +
 drivers/vhost/Kconfig.tcm |    1 +
 drivers/vhost/Makefile    |    2 +
 drivers/vhost/vringh.c    | 1012 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vringh.h    |  185 +++++++++
 6 files changed, 1209 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vhost/vringh.c
 create mode 100644 include/linux/vringh.h

diff --git a/drivers/Makefile b/drivers/Makefile
index 7863b9f..351a34f 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -123,7 +123,7 @@ obj-$(CONFIG_PPC_PS3)		+= ps3/
 obj-$(CONFIG_OF)		+= of/
 obj-$(CONFIG_SSB)		+= ssb/
 obj-$(CONFIG_BCMA)		+= bcma/
-obj-$(CONFIG_VHOST_NET)		+= vhost/
+obj-$(CONFIG_VHOST_RING)	+= vhost/
 obj-$(CONFIG_VLYNQ)		+= vlynq/
 obj-$(CONFIG_STAGING)		+= staging/
 obj-y				+= platform/
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..613b074 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,7 @@
 config VHOST_NET
 	tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
 	depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
+	select VHOST_RING
 	---help---
 	  This kernel module can be loaded in host kernel to accelerate
 	  guest networking with virtio_net. Not to be confused with virtio_net
@@ -12,3 +13,10 @@ config VHOST_NET
 if STAGING
 source "drivers/vhost/Kconfig.tcm"
 endif
+
+config VHOST_RING
+	tristate
+	---help---
+	  This option is selected by any driver which needs to access
+	  the host side of a virtio ring.
+
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
index a9c6f76..0218f77 100644
--- a/drivers/vhost/Kconfig.tcm
+++ b/drivers/vhost/Kconfig.tcm
@@ -1,6 +1,7 @@
 config TCM_VHOST
 	tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
 	depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+	select VHOST_RING
 	default n
 	---help---
 	Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index a27b053..1d37f5e 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
 vhost_net-y := vhost.o net.o
 
 obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
+
+obj-$(CONFIG_VHOST_RING) += vringh.o
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
new file mode 100644
index 0000000..c05d918
--- /dev/null
+++ b/drivers/vhost/vringh.c
@@ -0,0 +1,1012 @@
+/*
+ * Helpers for the host side of a virtio ring.
+ *
+ * Since these may be in userspace, we use (inline) accessors.
+ */
+#include <linux/vringh.h>
+#include <linux/virtio_ring.h>
+#include <linux/kernel.h>
+#include <linux/ratelimit.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/export.h>
+
+static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
+{
+	static DEFINE_RATELIMIT_STATE(vringh_rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	if (__ratelimit(&vringh_rs)) {
+		va_list ap;
+		va_start(ap, fmt);
+		printk(KERN_NOTICE "vringh:");
+		vprintk(fmt, ap);
+		va_end(ap);
+	}
+}
+
+/* Returns vring->num if empty, -ve on error. */
+static inline int __vringh_get_head(const struct vringh *vrh,
+				    int (*getu16)(u16 *val, const u16 *p),
+				    u16 *last_avail_idx)
+{
+	u16 avail_idx, i, head;
+	int err;
+
+	err = getu16(&avail_idx, &vrh->vring.avail->idx);
+	if (err) {
+		vringh_bad("Failed to access avail idx at %p",
+			   &vrh->vring.avail->idx);
+		return err;
+	}
+
+	if (*last_avail_idx == avail_idx)
+		return vrh->vring.num;
+
+	/* Only get avail ring entries after they have been exposed by guest. */
+	virtio_rmb(vrh->weak_barriers);
+
+	i = *last_avail_idx & (vrh->vring.num - 1);
+
+	err = getu16(&head, &vrh->vring.avail->ring[i]);
+	if (err) {
+		vringh_bad("Failed to read head: idx %d address %p",
+			   *last_avail_idx, &vrh->vring.avail->ring[i]);
+		return err;
+	}
+
+	if (head >= vrh->vring.num) {
+		vringh_bad("Guest says index %u > %u is available",
+			   head, vrh->vring.num);
+		return -EINVAL;
+	}
+
+	(*last_avail_idx)++;
+	return head;
+}
+
+/* Copy some bytes to/from the iovec.  Returns num copied. */
+static inline ssize_t vringh_iov_xfer(struct vringh_kiov *iov,
+				      void *ptr, size_t len,
+				      int (*xfer)(void *addr, void *ptr,
+						  size_t len))
+{
+	int err, done = 0;
+
+	while (len && iov->i < iov->used) {
+		size_t partlen;
+
+		partlen = min(iov->iov[iov->i].iov_len - iov->off, len);
+		err = xfer(iov->iov[iov->i].iov_base + iov->off, ptr, partlen);
+		if (err)
+			return err;
+		done += partlen;
+		len -= partlen;
+		ptr += partlen;
+		iov->off += partlen;
+
+		if (iov->off == iov->iov[iov->i].iov_len) {
+			iov->off = 0;
+			iov->i++;
+		}
+	}
+	return done;
+}
+
+/* May reduce *len if range is shorter. */
+static inline bool range_check(struct vringh *vrh, u64 addr, size_t *len,
+			       struct vringh_range *range,
+			       bool (*getrange)(struct vringh *,
+						u64, struct vringh_range *))
+{
+	if (addr < range->start || addr > range->end_incl) {
+		if (!getrange(vrh, addr, range))
+			return false;
+	}
+	BUG_ON(addr < range->start || addr > range->end_incl);
+
+	/* To end of memory? */
+	if (unlikely(addr + *len == 0)) {
+		if (range->end_incl == -1ULL)
+			return true;
+		goto truncate;
+	}
+
+	/* Otherwise, don't wrap. */
+	if (addr + *len < addr) {
+		vringh_bad("Wrapping descriptor %zu@0x%llx",
+			   *len, (unsigned long long)addr);
+		return false;
+	}
+
+	if (unlikely(addr + *len - 1 > range->end_incl))
+		goto truncate;
+	return true;
+
+truncate:
+	*len = range->end_incl + 1 - addr;
+	return true;
+}
+
+static inline bool no_range_check(struct vringh *vrh, u64 addr, size_t *len,
+				  struct vringh_range *range,
+				  bool (*getrange)(struct vringh *,
+						   u64, struct vringh_range *))
+{
+	return true;
+}
+
+/* No reason for this code to be inline. */
+static int move_to_indirect(int *up_next, u16 *i, void *addr,
+			    const struct vring_desc *desc,
+			    struct vring_desc **descs, int *desc_max)
+{
+	/* Indirect tables can't have indirect. */
+	if (*up_next != -1) {
+		vringh_bad("Multilevel indirect %u->%u", *up_next, *i);
+		return -EINVAL;
+	}
+
+	if (unlikely(desc->len % sizeof(struct vring_desc))) {
+		vringh_bad("Strange indirect len %u", desc->len);
+		return -EINVAL;
+	}
+
+	/* We will check this when we follow it! */
+	if (desc->flags & VRING_DESC_F_NEXT)
+		*up_next = desc->next;
+	else
+		*up_next = -2;
+	*descs = addr;
+	*desc_max = desc->len / sizeof(struct vring_desc);
+
+	/* Now, start at the first indirect. */
+	*i = 0;
+	return 0;
+}
+
+static int resize_iovec(struct vringh_kiov *iov, gfp_t gfp)
+{
+	struct kvec *new;
+	unsigned int flag, new_num = (iov->max_num & ~VRINGH_IOV_ALLOCATED) * 2;
+
+	if (new_num < 8)
+		new_num = 8;
+
+	flag = (iov->max_num & VRINGH_IOV_ALLOCATED);
+	if (flag)
+		new = krealloc(iov->iov, new_num * sizeof(struct iovec), gfp);
+	else {
+		new = kmalloc(new_num * sizeof(struct iovec), gfp);
+		if (new) {
+			memcpy(new, iov->iov,
+			       iov->max_num * sizeof(struct iovec));
+			flag = VRINGH_IOV_ALLOCATED;
+		}
+	}
+	if (!new)
+		return -ENOMEM;
+	iov->iov = new;
+	iov->max_num = (new_num | flag);
+	return 0;
+}
+
+static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
+				       struct vring_desc **descs, int *desc_max)
+{
+	u16 i = *up_next;
+
+	*up_next = -1;
+	*descs = vrh->vring.desc;
+	*desc_max = vrh->vring.num;
+	return i;
+}
+
+static int slow_copy(struct vringh *vrh, void *dst, const void *src,
+		     bool (*rcheck)(struct vringh *vrh, u64 addr, size_t *len,
+				    struct vringh_range *range,
+				    bool (*getrange)(struct vringh *vrh,
+						     u64,
+						     struct vringh_range *)),
+		     bool (*getrange)(struct vringh *vrh,
+				      u64 addr,
+				      struct vringh_range *r),
+		     struct vringh_range *range,
+		     int (*copy)(void *dst, const void *src, size_t len))
+{
+	size_t part, len = sizeof(struct vring_desc);
+
+	do {
+		u64 addr;
+		int err;
+
+		part = len;
+		addr = (u64)(unsigned long)src - range->offset;
+
+		if (!rcheck(vrh, addr, &part, range, getrange))
+			return -EINVAL;
+
+		err = copy(dst, src, part);
+		if (err)
+			return err;
+
+		dst += part;
+		src += part;
+		len -= part;
+	} while (len);
+	return 0;
+}
+
+static inline int
+__vringh_iov(struct vringh *vrh, u16 i,
+	     struct vringh_kiov *riov,
+	     struct vringh_kiov *wiov,
+	     bool (*rcheck)(struct vringh *vrh, u64 addr, size_t *len,
+			    struct vringh_range *range,
+			    bool (*getrange)(struct vringh *, u64,
+					     struct vringh_range *)),
+	     bool (*getrange)(struct vringh *, u64, struct vringh_range *),
+	     gfp_t gfp,
+	     int (*copy)(void *dst, const void *src, size_t len))
+{
+	int err, count = 0, up_next, desc_max;
+	struct vring_desc desc, *descs;
+	struct vringh_range range = { -1ULL, 0 }, slowrange;
+	bool slow = false;
+
+	/* We start traversing vring's descriptor table. */
+	descs = vrh->vring.desc;
+	desc_max = vrh->vring.num;
+	up_next = -1;
+
+	if (riov)
+		riov->i = riov->used = 0;
+	else if (wiov)
+		wiov->i = wiov->used = 0;
+	else
+		/* You must want something! */
+		BUG();
+
+	for (;;) {
+		void *addr;
+		struct vringh_kiov *iov;
+		size_t len;
+
+		if (unlikely(slow))
+			err = slow_copy(vrh, &desc, &descs[i], rcheck, getrange,
+					&slowrange, copy);
+		else
+			err = copy(&desc, &descs[i], sizeof(desc));
+		if (unlikely(err))
+			goto fail;
+
+		if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
+			/* Make sure it's OK, and get offset. */
+			len = desc.len;
+			if (!rcheck(vrh, desc.addr, &len, &range, getrange)) {
+				err = -EINVAL;
+				goto fail;
+			}
+
+			if (unlikely(len != desc.len)) {
+				slow = true;
+				/* We need to save this range to use offset */
+				slowrange = range;
+			}
+
+			addr = (void *)(long)(desc.addr + range.offset);
+			err = move_to_indirect(&up_next, &i, addr, &desc,
+					       &descs, &desc_max);
+			if (err)
+				goto fail;
+			continue;
+		}
+
+		if (count++ == vrh->vring.num) {
+			vringh_bad("Descriptor loop in %p", descs);
+			err = -ELOOP;
+			goto fail;
+		}
+
+		if (desc.flags & VRING_DESC_F_WRITE)
+			iov = wiov;
+		else {
+			iov = riov;
+			if (unlikely(wiov && wiov->i)) {
+				vringh_bad("Readable desc %p after writable",
+					   &descs[i]);
+				err = -EINVAL;
+				goto fail;
+			}
+		}
+
+		if (!iov) {
+			vringh_bad("Unexpected %s desc",
+				   !wiov ? "writable" : "readable");
+			err = -EPROTO;
+			goto fail;
+		}
+
+	again:
+		/* Make sure it's OK, and get offset. */
+		len = desc.len;
+		if (!rcheck(vrh, desc.addr, &len, &range, getrange)) {
+			err = -EINVAL;
+			goto fail;
+		}
+		addr = (void *)(unsigned long)(desc.addr + range.offset);
+
+		if (unlikely(iov->used == (iov->max_num & ~VRINGH_IOV_ALLOCATED))) {
+			err = resize_iovec(iov, gfp);
+			if (err)
+				goto fail;
+		}
+
+		iov->iov[iov->used].iov_base = addr;
+		iov->iov[iov->used].iov_len = len;
+		iov->used++;
+
+		if (unlikely(len != desc.len)) {
+			desc.len -= len;
+			desc.addr += len;
+			goto again;
+		}
+
+		if (desc.flags & VRING_DESC_F_NEXT) {
+			i = desc.next;
+		} else {
+			/* Just in case we need to finish traversing above. */
+			if (unlikely(up_next > 0)) {
+				i = return_from_indirect(vrh, &up_next,
+							 &descs, &desc_max);
+				slow = false;
+			} else
+				break;
+		}
+
+		if (i >= desc_max) {
+			vringh_bad("Chained index %u > %u", i, desc_max);
+			err = -EINVAL;
+			goto fail;
+		}
+	}
+
+	return 0;
+
+fail:
+	return err;
+}
+
+static inline int __vringh_complete(struct vringh *vrh,
+				    const struct vring_used_elem *used,
+				    unsigned int num_used,
+				    int (*putu16)(u16 *p, u16 val),
+				    int (*putused)(struct vring_used_elem *dst,
+						   const struct vring_used_elem
+						   *src, unsigned num))
+{
+	struct vring_used *used_ring;
+	int err;
+	u16 used_idx, off;
+
+	used_ring = vrh->vring.used;
+	used_idx = vrh->last_used_idx + vrh->completed;
+
+	off = used_idx % vrh->vring.num;
+
+	/* Compiler knows num_used == 1 sometimes, hence extra check */
+	if (num_used > 1 && unlikely(off + num_used >= vrh->vring.num)) {
+		u16 part = vrh->vring.num - off;
+		err = putused(&used_ring->ring[off], used, part);
+		if (!err)
+			err = putused(&used_ring->ring[0], used + part,
+				      num_used - part);
+	} else
+		err = putused(&used_ring->ring[off], used, num_used);
+
+	if (err) {
+		vringh_bad("Failed to write %u used entries %u at %p",
+			   num_used, off, &used_ring->ring[off]);
+		return err;
+	}
+
+	/* Make sure buffer is written before we update index. */
+	virtio_wmb(vrh->weak_barriers);
+
+	err = putu16(&vrh->vring.used->idx, used_idx + num_used);
+	if (err) {
+		vringh_bad("Failed to update used index at %p",
+			   &vrh->vring.used->idx);
+		return err;
+	}
+
+	vrh->completed += num_used;
+	return 0;
+}
+
+
+static inline int __vringh_need_notify(struct vringh *vrh,
+				       int (*getu16)(u16 *val, const u16 *p))
+{
+	bool notify;
+	u16 used_event;
+	int err;
+
+	/* Flush out used index update. This is paired with the
+	 * barrier that the Guest executes when enabling
+	 * interrupts. */
+	virtio_mb(vrh->weak_barriers);
+
+	/* Old-style, without event indices. */
+	if (!vrh->event_indices) {
+		u16 flags;
+		err = getu16(&flags, &vrh->vring.avail->flags);
+		if (err) {
+			vringh_bad("Failed to get flags at %p",
+				   &vrh->vring.avail->flags);
+			return err;
+		}
+		return (!(flags & VRING_AVAIL_F_NO_INTERRUPT));
+	}
+
+	/* Modern: we know when other side wants to know. */
+	err = getu16(&used_event, &vring_used_event(&vrh->vring));
+	if (err) {
+		vringh_bad("Failed to get used event idx at %p",
+			   &vring_used_event(&vrh->vring));
+		return err;
+	}
+
+	/* Just in case we added so many that we wrap. */
+	if (unlikely(vrh->completed > 0xffff))
+		notify = true;
+	else
+		notify = vring_need_event(used_event,
+					  vrh->last_used_idx + vrh->completed,
+					  vrh->last_used_idx);
+
+	vrh->last_used_idx += vrh->completed;
+	vrh->completed = 0;
+	return notify;
+}
+
+static inline bool __vringh_notify_enable(struct vringh *vrh,
+					  int (*getu16)(u16 *val, const u16 *p),
+					  int (*putu16)(u16 *p, u16 val))
+{
+	u16 avail;
+
+	/* Already enabled? */
+	if (vrh->listening)
+		return false;
+
+	vrh->listening = true;
+
+	if (!vrh->event_indices) {
+		/* Old-school; update flags. */
+		if (putu16(&vrh->vring.used->flags, 0) != 0) {
+			vringh_bad("Clearing used flags %p",
+				   &vrh->vring.used->flags);
+			return false;
+		}
+	} else {
+		if (putu16(&vring_avail_event(&vrh->vring),
+			   vrh->last_avail_idx) != 0) {
+			vringh_bad("Updating avail event index %p",
+				   &vring_avail_event(&vrh->vring));
+			return false;
+		}
+	}
+
+	/* They could have slipped one in as we were doing that: make
+	 * sure it's written, then check again. */
+	virtio_mb(vrh->weak_barriers);
+
+	if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
+		vringh_bad("Failed to check avail idx at %p",
+			   &vrh->vring.avail->idx);
+		return false;
+	}
+
+	/* This is so unlikely, we just leave notifications enabled. */
+	return avail != vrh->last_avail_idx;
+}
+
+static inline void __vringh_notify_disable(struct vringh *vrh,
+					   int (*putu16)(u16 *p, u16 val))
+{
+	/* Already disabled? */
+	if (!vrh->listening)
+		return;
+
+	vrh->listening = false;
+	if (!vrh->event_indices) {
+		/* Old-school; update flags. */
+		if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
+			vringh_bad("Setting used flags %p",
+				   &vrh->vring.used->flags);
+		}
+	}
+}
+
+/* Userspace access helpers: in this case, addresses are really userspace. */
+static inline int getu16_user(u16 *val, const u16 *p)
+{
+	return get_user(*val, (__force u16 __user *)p);
+}
+
+static inline int putu16_user(u16 *p, u16 val)
+{
+	return put_user(val, (__force u16 __user *)p);
+}
+
+static inline int copydesc_user(void *dst, const void *src, size_t len)
+{
+	return copy_from_user(dst, (__force void __user *)src, len) ?
+		-EFAULT : 0;
+}
+
+static inline int putused_user(struct vring_used_elem *dst,
+			       const struct vring_used_elem *src,
+			       unsigned int num)
+{
+	return copy_to_user((__force void __user *)dst, src,
+			    sizeof(*dst) * num) ? -EFAULT : 0;
+}
+
+static inline int xfer_from_user(void *src, void *dst, size_t len)
+{
+	return copy_from_user(dst, (__force void __user *)src, len) ?
+		-EFAULT : 0;
+}
+
+static inline int xfer_to_user(void *dst, void *src, size_t len)
+{
+	return copy_to_user((__force void __user *)dst, src, len) ?
+		-EFAULT : 0;
+}
+
+/**
+ * vringh_init_user - initialize a vringh for a userspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @weak_barriers: true if we only need memory barriers, not I/O.
+ * @desc: the userpace descriptor pointer.
+ * @avail: the userpace avail pointer.
+ * @used: the userpace used pointer.
+ *
+ * Returns an error if num is invalid: you should check pointers
+ * yourself!
+ */
+int vringh_init_user(struct vringh *vrh, u32 features,
+		     unsigned int num, bool weak_barriers,
+		     struct vring_desc __user *desc,
+		     struct vring_avail __user *avail,
+		     struct vring_used __user *used)
+{
+	/* Sane power of 2 please! */
+	if (!num || num > 0xffff || (num & (num - 1))) {
+		vringh_bad("Bad ring size %u", num);
+		return -EINVAL;
+	}
+
+	vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
+	vrh->weak_barriers = weak_barriers;
+	vrh->listening = false;
+	vrh->completed = 0;
+	vrh->last_avail_idx = 0;
+	vrh->last_used_idx = 0;
+	vrh->vring.num = num;
+	/* vring expects kernel addresses, but only used via accessors. */
+	vrh->vring.desc = (__force struct vring_desc *)desc;
+	vrh->vring.avail = (__force struct vring_avail *)avail;
+	vrh->vring.used = (__force struct vring_used *)used;
+	return 0;
+}
+EXPORT_SYMBOL(vringh_init_user);
+
+/**
+ * vringh_getdesc_user - get next available descriptor from userspace ring.
+ * @vrh: the userspace vring.
+ * @riov: where to put the readable descriptors (or NULL)
+ * @wiov: where to put the writable descriptors (or NULL)
+ * @getrange: function to call to check ranges.
+ * @head: head index we received, for passing to vringh_complete_user().
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * Note that on error return, you can tell the difference between an
+ * invalid ring and a single invalid descriptor: in the former case,
+ * *head will be vrh->vring.num.  You may be able to ignore an invalid
+ * descriptor, but there's not much you can do with an invalid ring.
+ *
+ * Note that you may need to clean up riov and wiov, even on error!
+ */
+int vringh_getdesc_user(struct vringh *vrh,
+			struct vringh_iov *riov,
+			struct vringh_iov *wiov,
+			bool (*getrange)(struct vringh *vrh,
+					 u64 addr, struct vringh_range *r),
+			u16 *head)
+{
+	int err;
+
+	*head = vrh->vring.num;
+	err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
+	if (err < 0)
+		return err;
+
+	/* Empty... */
+	if (err == vrh->vring.num)
+		return 0;
+
+	/* We need the layouts to be the indentical for this to work */
+	BUILD_BUG_ON(sizeof(struct vringh_kiov) != sizeof(struct vringh_iov));
+	BUILD_BUG_ON(offsetof(struct vringh_kiov, iov) !=
+		     offsetof(struct vringh_iov, iov));
+	BUILD_BUG_ON(offsetof(struct vringh_kiov, i) !=
+		     offsetof(struct vringh_iov, i));
+	BUILD_BUG_ON(offsetof(struct vringh_kiov, used) !=
+		     offsetof(struct vringh_iov, used));
+	BUILD_BUG_ON(offsetof(struct vringh_kiov, max_num) !=
+		     offsetof(struct vringh_iov, max_num));
+	BUILD_BUG_ON(sizeof(struct iovec) != sizeof(struct kvec));
+	BUILD_BUG_ON(offsetof(struct iovec, iov_base) !=
+		     offsetof(struct kvec, iov_base));
+	BUILD_BUG_ON(offsetof(struct iovec, iov_len) !=
+		     offsetof(struct kvec, iov_len));
+	BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_base)
+		     != sizeof(((struct kvec *)NULL)->iov_base));
+	BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_len)
+		     != sizeof(((struct kvec *)NULL)->iov_len));
+
+	*head = err;
+	err = __vringh_iov(vrh, *head, (struct vringh_kiov *)riov,
+			   (struct vringh_kiov *)wiov,
+			   range_check, getrange, GFP_KERNEL, copydesc_user);
+	if (err)
+		return err;
+
+	return 1;
+}
+EXPORT_SYMBOL(vringh_getdesc_user);
+
+/**
+ * vringh_iov_pull_user - copy bytes from vring_iov.
+ * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
+{
+	return vringh_iov_xfer((struct vringh_kiov *)riov,
+			       dst, len, xfer_from_user);
+}
+EXPORT_SYMBOL(vringh_iov_pull_user);
+
+/**
+ * vringh_iov_push_user - copy bytes into vring_iov.
+ * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+			     const void *src, size_t len)
+{
+	return vringh_iov_xfer((struct vringh_kiov *)wiov,
+			       (void *)src, len, xfer_to_user);
+}
+EXPORT_SYMBOL(vringh_iov_push_user);
+
+/**
+ * vringh_abandon_user - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ *	 vringh_get_user() to undo).
+ *
+ * The next vringh_get_user() will return the old descriptor(s) again.
+ */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num)
+{
+	/* We only update vring_avail_event(vr) when we want to be notified,
+	 * so we haven't changed that yet. */
+	vrh->last_avail_idx -= num;
+}
+EXPORT_SYMBOL(vringh_abandon_user);
+
+/**
+ * vringh_complete_user - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_user.
+ * @len: the length of data we have written.
+ *
+ * You should check vringh_need_notify_user() after one or more calls
+ * to this function.
+ */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len)
+{
+	struct vring_used_elem used;
+
+	used.id = head;
+	used.len = len;
+	return __vringh_complete(vrh, &used, 1, putu16_user, putused_user);
+}
+EXPORT_SYMBOL(vringh_complete_user);
+
+/**
+ * vringh_complete_multi_user - we've finished with many descriptors.
+ * @vrh: the vring.
+ * @used: the head, length pairs.
+ * @num_used: the number of used elements.
+ *
+ * You should check vringh_need_notify_user() after one or more calls
+ * to this function.
+ */
+int vringh_complete_multi_user(struct vringh *vrh,
+			       const struct vring_used_elem used[],
+			       unsigned num_used)
+{
+	return __vringh_complete(vrh, used, num_used,
+				 putu16_user, putused_user);
+}
+EXPORT_SYMBOL(vringh_complete_multi_user);
+
+/**
+ * vringh_notify_enable_user - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_user(struct vringh *vrh)
+{
+	return __vringh_notify_enable(vrh, getu16_user, putu16_user);
+}
+EXPORT_SYMBOL(vringh_notify_enable_user);
+
+/**
+ * vringh_notify_disable_user - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_user(struct vringh *vrh)
+{
+	__vringh_notify_disable(vrh, putu16_user);
+}
+EXPORT_SYMBOL(vringh_notify_disable_user);
+
+/**
+ * vringh_need_notify_user - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_user() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_user(struct vringh *vrh)
+{
+	return __vringh_need_notify(vrh, getu16_user);
+}
+EXPORT_SYMBOL(vringh_need_notify_user);
+
+/* Kernelspace access helpers. */
+static inline int getu16_kern(u16 *val, const u16 *p)
+{
+	*val = *p;
+	return 0;
+}
+
+static inline int putu16_kern(u16 *p, u16 val)
+{
+	*p = val;
+	return 0;
+}
+
+static inline int copydesc_kern(void *dst, const void *src, size_t len)
+{
+	memcpy(dst, src, len);
+	return 0;
+}
+
+static inline int putused_kern(struct vring_used_elem *dst,
+			       const struct vring_used_elem *src,
+			       unsigned int num)
+{
+	memcpy(dst, src, num * sizeof(*dst));
+	return 0;
+}
+
+static inline int xfer_kern(void *src, void *dst, size_t len)
+{
+	memcpy(dst, src, len);
+	return 0;
+}
+
+/**
+ * vringh_init_kern - initialize a vringh for a kernelspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @weak_barriers: true if we only need memory barriers, not I/O.
+ * @desc: the userpace descriptor pointer.
+ * @avail: the userpace avail pointer.
+ * @used: the userpace used pointer.
+ *
+ * Returns an error if num is invalid.
+ */
+int vringh_init_kern(struct vringh *vrh, u32 features,
+		     unsigned int num, bool weak_barriers,
+		     struct vring_desc *desc,
+		     struct vring_avail *avail,
+		     struct vring_used *used)
+{
+	/* Sane power of 2 please! */
+	if (!num || num > 0xffff || (num & (num - 1))) {
+		vringh_bad("Bad ring size %u", num);
+		return -EINVAL;
+	}
+
+	vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
+	vrh->weak_barriers = weak_barriers;
+	vrh->listening = false;
+	vrh->completed = 0;
+	vrh->last_avail_idx = 0;
+	vrh->last_used_idx = 0;
+	vrh->vring.num = num;
+	vrh->vring.desc = desc;
+	vrh->vring.avail = avail;
+	vrh->vring.used = used;
+	return 0;
+}
+EXPORT_SYMBOL(vringh_init_kern);
+
+/**
+ * vringh_getdesc_kern - get next available descriptor from kernelspace ring.
+ * @vrh: the kernelspace vring.
+ * @riov: where to put the readable descriptors (or NULL)
+ * @wiov: where to put the writable descriptors (or NULL)
+ * @head: head index we received, for passing to vringh_complete_kern().
+ * @gfp: flags for allocating larger riov/wiov.
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * Note that on error return, you can tell the difference between an
+ * invalid ring and a single invalid descriptor: in the former case,
+ * *head will be vrh->vring.num.  You may be able to ignore an invalid
+ * descriptor, but there's not much you can do with an invalid ring.
+ *
+ * Note that you may need to clean up riov and wiov, even on error!
+ */
+int vringh_getdesc_kern(struct vringh *vrh,
+			struct vringh_kiov *riov,
+			struct vringh_kiov *wiov,
+			u16 *head,
+			gfp_t gfp)
+{
+	int err;
+
+	err = __vringh_get_head(vrh, getu16_kern, &vrh->last_avail_idx);
+	if (err < 0)
+		return err;
+
+	/* Empty... */
+	if (err == vrh->vring.num)
+		return 0;
+
+	*head = err;
+	err = __vringh_iov(vrh, *head, riov, wiov, no_range_check, NULL,
+			   gfp, copydesc_kern);
+	if (err)
+		return err;
+
+	return 1;
+}
+EXPORT_SYMBOL(vringh_getdesc_kern);
+
+/**
+ * vringh_iov_pull_kern - copy bytes from vring_iov.
+ * @riov: the riov as passed to vringh_getdesc_kern() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len)
+{
+	return vringh_iov_xfer(riov, dst, len, xfer_kern);
+}
+EXPORT_SYMBOL(vringh_iov_pull_kern);
+
+/**
+ * vringh_iov_push_kern - copy bytes into vring_iov.
+ * @wiov: the wiov as passed to vringh_getdesc_kern() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
+			     const void *src, size_t len)
+{
+	return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
+}
+EXPORT_SYMBOL(vringh_iov_push_kern);
+
+/**
+ * vringh_abandon_kern - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ *	 vringh_get_kern() to undo).
+ *
+ * The next vringh_get_kern() will return the old descriptor(s) again.
+ */
+void vringh_abandon_kern(struct vringh *vrh, unsigned int num)
+{
+	/* We only update vring_avail_event(vr) when we want to be notified,
+	 * so we haven't changed that yet. */
+	vrh->last_avail_idx -= num;
+}
+EXPORT_SYMBOL(vringh_abandon_kern);
+
+/**
+ * vringh_complete_kern - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_kern.
+ * @len: the length of data we have written.
+ *
+ * You should check vringh_need_notify_kern() after one or more calls
+ * to this function.
+ */
+int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len)
+{
+	struct vring_used_elem used;
+
+	used.id = head;
+	used.len = len;
+
+	return __vringh_complete(vrh, &used, 1, putu16_kern, putused_kern);
+}
+EXPORT_SYMBOL(vringh_complete_kern);
+
+/**
+ * vringh_notify_enable_kern - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_kern(struct vringh *vrh)
+{
+	return __vringh_notify_enable(vrh, getu16_kern, putu16_kern);
+}
+EXPORT_SYMBOL(vringh_notify_enable_kern);
+
+/**
+ * vringh_notify_disable_kern - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_kern(struct vringh *vrh)
+{
+	__vringh_notify_disable(vrh, putu16_kern);
+}
+EXPORT_SYMBOL(vringh_notify_disable_kern);
+
+/**
+ * vringh_need_notify_kern - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_kern() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_kern(struct vringh *vrh)
+{
+	return __vringh_need_notify(vrh, getu16_kern);
+}
+EXPORT_SYMBOL(vringh_need_notify_kern);
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
new file mode 100644
index 0000000..ab41185
--- /dev/null
+++ b/include/linux/vringh.h
@@ -0,0 +1,185 @@
+/*
+ * Linux host-side vring helpers; for when the kernel needs to access
+ * someone else's vring.
+ *
+ * Copyright IBM Corporation, 2013.
+ * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Written by: Rusty Russell <rusty@rustcorp.com.au>
+ */
+#ifndef _LINUX_VRINGH_H
+#define _LINUX_VRINGH_H
+#include <uapi/linux/virtio_ring.h>
+#include <linux/uio.h>
+#include <linux/slab.h>
+#include <asm/barrier.h>
+
+/* virtio_ring with information needed for host access. */
+struct vringh {
+	/* Guest publishes used event idx (note: we always do). */
+	bool event_indices;
+
+	/* Have we told the other end we want to be notified? */
+	bool listening;
+
+	/* Can we get away with weak barriers? */
+	bool weak_barriers;
+
+	/* Last available index we saw (ie. where we're up to). */
+	u16 last_avail_idx;
+
+	/* Last index we used. */
+	u16 last_used_idx;
+
+	/* How many descriptors we've completed since last need_notify(). */
+	u32 completed;
+
+	/* The vring (note: it may contain user pointers!) */
+	struct vring vring;
+};
+
+/* The memory the vring can access, and what offset to apply. */
+struct vringh_range {
+	u64 start, end_incl;
+	u64 offset;
+};
+
+/* All the information about an iovec. */
+struct vringh_iov {
+	struct iovec *iov;
+	size_t off; /* Within iov[i] */
+	unsigned i, used, max_num;
+};
+
+/* All the information about a kvec. */
+struct vringh_kiov {
+	struct kvec *iov;
+	size_t off; /* Within iov[i] */
+	unsigned i, used, max_num;
+};
+
+/* Flag on max_num to indicate we're kmalloced. */
+#define VRINGH_IOV_ALLOCATED 0x8000000
+
+/* Helpers for userspace vrings. */
+int vringh_init_user(struct vringh *vrh, u32 features,
+		     unsigned int num, bool weak_barriers,
+		     struct vring_desc __user *desc,
+		     struct vring_avail __user *avail,
+		     struct vring_used __user *used);
+
+static inline void vringh_iov_init(struct vringh_iov *iov,
+				   struct iovec *iovec, unsigned num)
+{
+	iov->used = iov->i = 0;
+	iov->off = 0;
+	iov->max_num = num;
+	iov->iov = iovec;
+}
+
+static inline void vringh_iov_reset(struct vringh_iov *iov)
+{
+	iov->off = 0;
+	iov->i = 0;
+}
+
+static inline void vringh_iov_cleanup(struct vringh_iov *iov)
+{
+	if (iov->max_num & VRINGH_IOV_ALLOCATED)
+		kfree(iov->iov);
+	iov->max_num = iov->used = iov->i = iov->off = 0;
+	iov->iov = NULL;
+}
+
+/* Convert a descriptor into iovecs. */
+int vringh_getdesc_user(struct vringh *vrh,
+			struct vringh_iov *riov,
+			struct vringh_iov *wiov,
+			bool (*getrange)(struct vringh *vrh,
+					 u64 addr, struct vringh_range *r),
+			u16 *head);
+
+/* Copy bytes from readable vsg, consuming it (and incrementing wiov->i). */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
+
+/* Copy bytes into writable vsg, consuming it (and incrementing wiov->i). */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+			     const void *src, size_t len);
+
+/* Mark a descriptor as used. */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len);
+int vringh_complete_multi_user(struct vringh *vrh,
+			       const struct vring_used_elem used[],
+			       unsigned num_used);
+
+/* Pretend we've never seen descriptor (for easy error handling). */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num);
+
+/* Do we need to fire the eventfd to notify the other side? */
+int vringh_need_notify_user(struct vringh *vrh);
+
+bool vringh_notify_enable_user(struct vringh *vrh);
+void vringh_notify_disable_user(struct vringh *vrh);
+
+/* Helpers for kernelspace vrings. */
+int vringh_init_kern(struct vringh *vrh, u32 features,
+		     unsigned int num, bool weak_barriers,
+		     struct vring_desc *desc,
+		     struct vring_avail *avail,
+		     struct vring_used *used);
+
+static inline void vringh_kiov_init(struct vringh_kiov *kiov,
+				    struct kvec *kvec, unsigned num)
+{
+	kiov->used = kiov->i = 0;
+	kiov->off = 0;
+	kiov->max_num = num;
+	kiov->iov = kvec;
+}
+
+static inline void vringh_kiov_reset(struct vringh_kiov *kiov)
+{
+	kiov->off = 0;
+	kiov->i = 0;
+}
+
+static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
+{
+	if (kiov->max_num & VRINGH_IOV_ALLOCATED)
+		kfree(kiov->iov);
+	kiov->max_num = kiov->used = kiov->i = kiov->off = 0;
+	kiov->iov = NULL;
+}
+
+int vringh_getdesc_kern(struct vringh *vrh,
+			struct vringh_kiov *riov,
+			struct vringh_kiov *wiov,
+			u16 *head,
+			gfp_t gfp);
+
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len);
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
+			     const void *src, size_t len);
+void vringh_abandon_kern(struct vringh *vrh, unsigned int num);
+int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len);
+
+bool vringh_notify_enable_kern(struct vringh *vrh);
+void vringh_notify_disable_kern(struct vringh *vrh);
+
+int vringh_need_notify_kern(struct vringh *vrh);
+
+#endif /* _LINUX_VRINGH_H */
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/5] tools/virtio: fix compile.
  2013-02-18 22:29 [PATCH 0/5] vringh Rusty Russell
  2013-02-18 22:29 ` [PATCH 1/5] virtio_ring: expose virtio barriers for use in vringh Rusty Russell
  2013-02-18 22:29 ` [PATCH 2/5] vringh: host-side implementation of virtio rings Rusty Russell
@ 2013-02-18 22:29 ` Rusty Russell
  2013-02-18 22:29 ` [PATCH 4/5] tools/virtio: separate headers more Rusty Russell
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Rusty Russell @ 2013-02-18 22:29 UTC (permalink / raw)
  To: virtualization; +Cc: sjur.brandeland, mst, linux-kernel, Rusty Russell

Didn't test that it actually works, mind you!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 tools/virtio/linux/virtio.h |    9 ++++++++-
 tools/virtio/virtio_test.c  |    6 +++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 81847dd..ccfc901 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -86,6 +86,10 @@ typedef enum {
 	GFP_KERNEL,
 	GFP_ATOMIC,
 } gfp_t;
+
+#define __GFP_HIGHMEM 0
+#define __GFP_HIGH 0
+
 typedef enum {
 	IRQ_NONE,
 	IRQ_HANDLED
@@ -163,6 +167,8 @@ struct virtqueue {
 	void (*callback)(struct virtqueue *vq);
 	const char *name;
 	struct virtio_device *vdev;
+	unsigned int index;
+	unsigned int num_free;
 	void *priv;
 };
 
@@ -206,7 +212,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
-struct virtqueue *vring_new_virtqueue(unsigned int num,
+struct virtqueue *vring_new_virtqueue(unsigned int index,
+				      unsigned int num,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
 				      bool weak_barriers,
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index fcc9aa2..a07142c 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -81,7 +81,7 @@ void vhost_vq_setup(struct vdev_info *dev, struct vq_info *info)
 	assert(r >= 0);
 }
 
-static void vq_info_add(struct vdev_info *dev, int num)
+static void vq_info_add(struct vdev_info *dev, int idx, int num)
 {
 	struct vq_info *info = &dev->vqs[dev->nvqs];
 	int r;
@@ -92,7 +92,7 @@ static void vq_info_add(struct vdev_info *dev, int num)
 	assert(r >= 0);
 	memset(info->ring, 0, vring_size(num, 4096));
 	vring_init(&info->vring, num, info->ring, 4096);
-	info->vq = vring_new_virtqueue(info->vring.num, 4096, &dev->vdev,
+	info->vq = vring_new_virtqueue(idx, info->vring.num, 4096, &dev->vdev,
 				       true, info->ring,
 				       vq_notify, vq_callback, "test");
 	assert(info->vq);
@@ -277,7 +277,7 @@ int main(int argc, char **argv)
 
 done:
 	vdev_info_init(&dev, features);
-	vq_info_add(&dev, 256);
+	vq_info_add(&dev, 0, 256);
 	run_test(&dev, &dev.vqs[0], delayed, 0x100000);
 	return 0;
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/5] tools/virtio: separate headers more.
  2013-02-18 22:29 [PATCH 0/5] vringh Rusty Russell
                   ` (2 preceding siblings ...)
  2013-02-18 22:29 ` [PATCH 3/5] tools/virtio: fix compile Rusty Russell
@ 2013-02-18 22:29 ` Rusty Russell
  2013-02-18 22:29 ` [PATCH 5/5] tools/virtio: add vring_test Rusty Russell
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Rusty Russell @ 2013-02-18 22:29 UTC (permalink / raw)
  To: virtualization; +Cc: sjur.brandeland, mst, linux-kernel, Rusty Russell

This makes them a bit more like the kernel headers, so we can include more
real kernel headers in our tests.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 tools/virtio/asm/barrier.h              |   14 +++
 tools/virtio/linux/bug.h                |   10 ++
 tools/virtio/linux/err.h                |   26 +++++
 tools/virtio/linux/export.h             |    5 +
 tools/virtio/linux/irqreturn.h          |    1 +
 tools/virtio/linux/kernel.h             |  113 ++++++++++++++++++++
 tools/virtio/linux/module.h             |    1 +
 tools/virtio/linux/printk.h             |    4 +
 tools/virtio/linux/ratelimit.h          |    4 +
 tools/virtio/linux/scatterlist.h        |  173 +++++++++++++++++++++++++++++++
 tools/virtio/linux/types.h              |   28 +++++
 tools/virtio/linux/uaccess.h            |   37 +++++++
 tools/virtio/linux/uio.h                |    1 +
 tools/virtio/linux/virtio.h             |  152 +--------------------------
 tools/virtio/linux/virtio_config.h      |    6 ++
 tools/virtio/linux/virtio_ring.h        |    1 +
 tools/virtio/linux/vringh.h             |    1 +
 tools/virtio/uapi/linux/uio.h           |    1 +
 tools/virtio/uapi/linux/virtio_config.h |    1 +
 tools/virtio/uapi/linux/virtio_ring.h   |    4 +
 tools/virtio/virtio_test.c              |    4 +
 21 files changed, 438 insertions(+), 149 deletions(-)
 create mode 100644 tools/virtio/asm/barrier.h
 create mode 100644 tools/virtio/linux/bug.h
 create mode 100644 tools/virtio/linux/err.h
 create mode 100644 tools/virtio/linux/export.h
 create mode 100644 tools/virtio/linux/irqreturn.h
 create mode 100644 tools/virtio/linux/kernel.h
 create mode 100644 tools/virtio/linux/printk.h
 create mode 100644 tools/virtio/linux/ratelimit.h
 create mode 100644 tools/virtio/linux/scatterlist.h
 create mode 100644 tools/virtio/linux/types.h
 create mode 100644 tools/virtio/linux/uaccess.h
 create mode 100644 tools/virtio/linux/uio.h
 create mode 100644 tools/virtio/linux/virtio_config.h
 create mode 100644 tools/virtio/linux/virtio_ring.h
 create mode 100644 tools/virtio/linux/vringh.h
 create mode 100644 tools/virtio/uapi/linux/uio.h
 create mode 100644 tools/virtio/uapi/linux/virtio_config.h
 create mode 100644 tools/virtio/uapi/linux/virtio_ring.h

diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
new file mode 100644
index 0000000..aff61e1
--- /dev/null
+++ b/tools/virtio/asm/barrier.h
@@ -0,0 +1,14 @@
+#if defined(__i386__) || defined(__x86_64__)
+#define barrier() asm volatile("" ::: "memory")
+#define mb() __sync_synchronize()
+
+#define smp_mb()	mb()
+# define smp_rmb()	barrier()
+# define smp_wmb()	barrier()
+/* Weak barriers should be used. If not - it's a bug */
+# define rmb()	abort()
+# define wmb()	abort()
+#else
+#error Please fill in barrier macros
+#endif
+
diff --git a/tools/virtio/linux/bug.h b/tools/virtio/linux/bug.h
new file mode 100644
index 0000000..fb94f07
--- /dev/null
+++ b/tools/virtio/linux/bug.h
@@ -0,0 +1,10 @@
+#ifndef BUG_H
+#define BUG_H
+
+#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond))
+
+#define BUILD_BUG_ON(x)
+
+#define BUG() abort()
+
+#endif /* BUG_H */
diff --git a/tools/virtio/linux/err.h b/tools/virtio/linux/err.h
new file mode 100644
index 0000000..e32eff8
--- /dev/null
+++ b/tools/virtio/linux/err.h
@@ -0,0 +1,26 @@
+#ifndef ERR_H
+#define ERR_H
+#define MAX_ERRNO	4095
+
+#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)
+
+static inline void * __must_check ERR_PTR(long error)
+{
+	return (void *) error;
+}
+
+static inline long __must_check PTR_ERR(const void *ptr)
+{
+	return (long) ptr;
+}
+
+static inline long __must_check IS_ERR(const void *ptr)
+{
+	return IS_ERR_VALUE((unsigned long)ptr);
+}
+
+static inline long __must_check IS_ERR_OR_NULL(const void *ptr)
+{
+	return !ptr || IS_ERR_VALUE((unsigned long)ptr);
+}
+#endif /* ERR_H */
diff --git a/tools/virtio/linux/export.h b/tools/virtio/linux/export.h
new file mode 100644
index 0000000..7311d32
--- /dev/null
+++ b/tools/virtio/linux/export.h
@@ -0,0 +1,5 @@
+#define EXPORT_SYMBOL(sym)
+#define EXPORT_SYMBOL_GPL(sym)
+#define EXPORT_SYMBOL_GPL_FUTURE(sym)
+#define EXPORT_UNUSED_SYMBOL(sym)
+#define EXPORT_UNUSED_SYMBOL_GPL(sym)
diff --git a/tools/virtio/linux/irqreturn.h b/tools/virtio/linux/irqreturn.h
new file mode 100644
index 0000000..a3c4e7b
--- /dev/null
+++ b/tools/virtio/linux/irqreturn.h
@@ -0,0 +1 @@
+#include "../../../include/linux/irqreturn.h"
diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
new file mode 100644
index 0000000..bc5f1c8
--- /dev/null
+++ b/tools/virtio/linux/kernel.h
@@ -0,0 +1,113 @@
+#ifndef KERNEL_H
+#define KERNEL_H
+#include <stdbool.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <stdarg.h>
+
+#include <linux/types.h>
+#include <linux/printk.h>
+#include <linux/bug.h>
+#include <errno.h>
+#include <unistd.h>
+#include <asm/barrier.h>
+
+#define CONFIG_SMP
+
+#define PAGE_SIZE getpagesize()
+#define PAGE_MASK (~(PAGE_SIZE-1))
+
+typedef unsigned long long dma_addr_t;
+typedef size_t __kernel_size_t;
+
+struct page {
+	unsigned long long dummy;
+};
+
+/* Physical == Virtual */
+#define virt_to_phys(p) ((unsigned long)p)
+#define phys_to_virt(a) ((void *)(unsigned long)(a))
+/* Page address: Virtual / 4K */
+#define page_to_phys(p) ((dma_addr_t)(unsigned long)(p))
+#define virt_to_page(p) ((struct page *)((unsigned long)p & PAGE_MASK))
+
+#define offset_in_page(p) (((unsigned long)p) % PAGE_SIZE)
+
+#define __printf(a,b) __attribute__((format(printf,a,b)))
+
+typedef enum {
+	GFP_KERNEL,
+	GFP_ATOMIC,
+} gfp_t;
+
+#define __GFP_HIGHMEM 0
+#define __GFP_HIGH 0
+
+#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
+
+extern void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+static inline void *kmalloc(size_t s, gfp_t gfp)
+{
+	if (__kmalloc_fake)
+		return __kmalloc_fake;
+	return malloc(s);
+}
+
+static inline void kfree(void *p)
+{
+	if (p >= __kfree_ignore_start && p < __kfree_ignore_end)
+		return;
+	free(p);
+}
+
+static inline void *krealloc(void *p, size_t s, gfp_t gfp)
+{
+	return realloc(p, s);
+}
+
+
+static inline unsigned long __get_free_page(gfp_t gfp)
+{
+	void *p;
+
+	posix_memalign(&p, PAGE_SIZE, PAGE_SIZE);
+	return (unsigned long)p;
+}
+
+static inline void free_page(unsigned long addr)
+{
+	free((void *)addr);
+}
+
+#define container_of(ptr, type, member) ({			\
+	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
+	(type *)( (char *)__mptr - offsetof(type,member) );})
+
+#define uninitialized_var(x) x = x
+
+# ifndef likely
+#  define likely(x)	(__builtin_expect(!!(x), 1))
+# endif
+# ifndef unlikely
+#  define unlikely(x)	(__builtin_expect(!!(x), 0))
+# endif
+
+#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#ifdef DEBUG
+#define pr_debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#else
+#define pr_debug(format, ...) do {} while (0)
+#endif
+#define dev_err(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#define dev_warn(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+
+#define min(x, y) ({				\
+	typeof(x) _min1 = (x);			\
+	typeof(y) _min2 = (y);			\
+	(void) (&_min1 == &_min2);		\
+	_min1 < _min2 ? _min1 : _min2; })
+
+#endif /* KERNEL_H */
diff --git a/tools/virtio/linux/module.h b/tools/virtio/linux/module.h
index e69de29..3039a7e 100644
--- a/tools/virtio/linux/module.h
+++ b/tools/virtio/linux/module.h
@@ -0,0 +1 @@
+#include <linux/export.h>
diff --git a/tools/virtio/linux/printk.h b/tools/virtio/linux/printk.h
new file mode 100644
index 0000000..9f2423b
--- /dev/null
+++ b/tools/virtio/linux/printk.h
@@ -0,0 +1,4 @@
+#include "../../../include/linux/kern_levels.h"
+
+#define printk printf
+#define vprintk vprintf
diff --git a/tools/virtio/linux/ratelimit.h b/tools/virtio/linux/ratelimit.h
new file mode 100644
index 0000000..dcce172
--- /dev/null
+++ b/tools/virtio/linux/ratelimit.h
@@ -0,0 +1,4 @@
+#define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init)	int name = 0
+
+#define __ratelimit(x) (*(x))
+
diff --git a/tools/virtio/linux/scatterlist.h b/tools/virtio/linux/scatterlist.h
new file mode 100644
index 0000000..b2cf7d0
--- /dev/null
+++ b/tools/virtio/linux/scatterlist.h
@@ -0,0 +1,173 @@
+#ifndef SCATTERLIST_H
+#define SCATTERLIST_H
+#include <linux/kernel.h>
+
+struct scatterlist {
+	unsigned long	page_link;
+	unsigned int	offset;
+	unsigned int	length;
+	dma_addr_t	dma_address;
+};
+
+/* Scatterlist helpers, stolen from linux/scatterlist.h */
+#define sg_is_chain(sg)		((sg)->page_link & 0x01)
+#define sg_is_last(sg)		((sg)->page_link & 0x02)
+#define sg_chain_ptr(sg)	\
+	((struct scatterlist *) ((sg)->page_link & ~0x03))
+
+/**
+ * sg_assign_page - Assign a given page to an SG entry
+ * @sg:		    SG entry
+ * @page:	    The page
+ *
+ * Description:
+ *   Assign page to sg entry. Also see sg_set_page(), the most commonly used
+ *   variant.
+ *
+ **/
+static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
+{
+	unsigned long page_link = sg->page_link & 0x3;
+
+	/*
+	 * In order for the low bit stealing approach to work, pages
+	 * must be aligned at a 32-bit boundary as a minimum.
+	 */
+	BUG_ON((unsigned long) page & 0x03);
+#ifdef CONFIG_DEBUG_SG
+	BUG_ON(sg->sg_magic != SG_MAGIC);
+	BUG_ON(sg_is_chain(sg));
+#endif
+	sg->page_link = page_link | (unsigned long) page;
+}
+
+/**
+ * sg_set_page - Set sg entry to point at given page
+ * @sg:		 SG entry
+ * @page:	 The page
+ * @len:	 Length of data
+ * @offset:	 Offset into page
+ *
+ * Description:
+ *   Use this function to set an sg entry pointing at a page, never assign
+ *   the page directly. We encode sg table information in the lower bits
+ *   of the page pointer. See sg_page() for looking up the page belonging
+ *   to an sg entry.
+ *
+ **/
+static inline void sg_set_page(struct scatterlist *sg, struct page *page,
+			       unsigned int len, unsigned int offset)
+{
+	sg_assign_page(sg, page);
+	sg->offset = offset;
+	sg->length = len;
+}
+
+static inline struct page *sg_page(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+	BUG_ON(sg->sg_magic != SG_MAGIC);
+	BUG_ON(sg_is_chain(sg));
+#endif
+	return (struct page *)((sg)->page_link & ~0x3);
+}
+
+/*
+ * Loop over each sg element, following the pointer to a new list if necessary
+ */
+#define for_each_sg(sglist, sg, nr, __i)	\
+	for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))
+
+/**
+ * sg_chain - Chain two sglists together
+ * @prv:	First scatterlist
+ * @prv_nents:	Number of entries in prv
+ * @sgl:	Second scatterlist
+ *
+ * Description:
+ *   Links @prv@ and @sgl@ together, to form a longer scatterlist.
+ *
+ **/
+static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
+			    struct scatterlist *sgl)
+{
+	/*
+	 * offset and length are unused for chain entry.  Clear them.
+	 */
+	prv[prv_nents - 1].offset = 0;
+	prv[prv_nents - 1].length = 0;
+
+	/*
+	 * Set lowest bit to indicate a link pointer, and make sure to clear
+	 * the termination bit if it happens to be set.
+	 */
+	prv[prv_nents - 1].page_link = ((unsigned long) sgl | 0x01) & ~0x02;
+}
+
+/**
+ * sg_mark_end - Mark the end of the scatterlist
+ * @sg:		 SG entryScatterlist
+ *
+ * Description:
+ *   Marks the passed in sg entry as the termination point for the sg
+ *   table. A call to sg_next() on this entry will return NULL.
+ *
+ **/
+static inline void sg_mark_end(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+	BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+	/*
+	 * Set termination bit, clear potential chain bit
+	 */
+	sg->page_link |= 0x02;
+	sg->page_link &= ~0x01;
+}
+
+static inline struct scatterlist *sg_next(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+	BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+	if (sg_is_last(sg))
+		return NULL;
+
+	sg++;
+	if (unlikely(sg_is_chain(sg)))
+		sg = sg_chain_ptr(sg);
+
+	return sg;
+}
+
+static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
+{
+	memset(sgl, 0, sizeof(*sgl) * nents);
+#ifdef CONFIG_DEBUG_SG
+	{
+		unsigned int i;
+		for (i = 0; i < nents; i++)
+			sgl[i].sg_magic = SG_MAGIC;
+	}
+#endif
+	sg_mark_end(&sgl[nents - 1]);
+}
+
+static inline dma_addr_t sg_phys(struct scatterlist *sg)
+{
+	return page_to_phys(sg_page(sg)) + sg->offset;
+}
+
+static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
+			      unsigned int buflen)
+{
+	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
+}
+
+static inline void sg_init_one(struct scatterlist *sg,
+			       const void *buf, unsigned int buflen)
+{
+	sg_init_table(sg, 1);
+	sg_set_buf(sg, buf, buflen);
+}
+#endif /* SCATTERLIST_H */
diff --git a/tools/virtio/linux/types.h b/tools/virtio/linux/types.h
new file mode 100644
index 0000000..f8ebb9a
--- /dev/null
+++ b/tools/virtio/linux/types.h
@@ -0,0 +1,28 @@
+#ifndef TYPES_H
+#define TYPES_H
+#include <stdint.h>
+
+#define __force
+#define __user
+#define __must_check
+#define __cold
+
+typedef uint64_t u64;
+typedef int64_t s64;
+typedef uint32_t u32;
+typedef int32_t s32;
+typedef uint16_t u16;
+typedef int16_t s16;
+typedef uint8_t u8;
+typedef int8_t s8;
+
+typedef uint64_t __u64;
+typedef int64_t __s64;
+typedef uint32_t __u32;
+typedef int32_t __s32;
+typedef uint16_t __u16;
+typedef int16_t __s16;
+typedef uint8_t __u8;
+typedef int8_t __s8;
+
+#endif /* TYPES_H */
diff --git a/tools/virtio/linux/uaccess.h b/tools/virtio/linux/uaccess.h
new file mode 100644
index 0000000..5874786
--- /dev/null
+++ b/tools/virtio/linux/uaccess.h
@@ -0,0 +1,37 @@
+#ifndef UACCESS_H
+#define UACCESS_H
+extern void *__user_addr_min, *__user_addr_max;
+
+#define get_user(dst, uptr) get_user_func(&(dst), (uptr), sizeof(*uptr))
+#define put_user(val, uptr) put_user_func(&(val), (uptr), sizeof(*uptr))
+
+static inline int get_user_func(void *dst, void *src, size_t size)
+{
+	assert(src >= __user_addr_min && src + size <= __user_addr_max);
+	memcpy(dst, src, size);
+	return 0;
+}
+
+static inline int put_user_func(void *val, void *dst, size_t size)
+{
+	assert(dst >= __user_addr_min && dst + size <= __user_addr_max);
+	memcpy(dst, val, size);
+	return 0;
+}
+
+static inline int copy_from_user(void *to, const void __user *from,
+				 unsigned long n)
+{
+	assert(from >= __user_addr_min && from + n <= __user_addr_max);
+	memcpy(to, from, n);
+	return 0;
+}
+
+static inline int copy_to_user(void __user *to, const void *from,
+			       unsigned long n)
+{
+	assert(to >= __user_addr_min && to + n <= __user_addr_max);
+	memcpy(to, from, n);
+	return 0;
+}
+#endif /* UACCESS_H */
diff --git a/tools/virtio/linux/uio.h b/tools/virtio/linux/uio.h
new file mode 100644
index 0000000..ecbdf92
--- /dev/null
+++ b/tools/virtio/linux/uio.h
@@ -0,0 +1 @@
+#include "../../../include/linux/uio.h"
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index ccfc901..61f313b 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -1,131 +1,7 @@
 #ifndef LINUX_VIRTIO_H
 #define LINUX_VIRTIO_H
-
-#include <stdbool.h>
-#include <stdlib.h>
-#include <stddef.h>
-#include <stdio.h>
-#include <string.h>
-#include <assert.h>
-
-#include <linux/types.h>
-#include <errno.h>
-
-typedef unsigned long long dma_addr_t;
-
-struct scatterlist {
-	unsigned long	page_link;
-	unsigned int	offset;
-	unsigned int	length;
-	dma_addr_t	dma_address;
-};
-
-struct page {
-	unsigned long long dummy;
-};
-
-#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond))
-
-/* Physical == Virtual */
-#define virt_to_phys(p) ((unsigned long)p)
-#define phys_to_virt(a) ((void *)(unsigned long)(a))
-/* Page address: Virtual / 4K */
-#define virt_to_page(p) ((struct page*)((virt_to_phys(p) / 4096) * \
-					sizeof(struct page)))
-#define offset_in_page(p) (((unsigned long)p) % 4096)
-#define sg_phys(sg) ((sg->page_link & ~0x3) / sizeof(struct page) * 4096 + \
-		     sg->offset)
-static inline void sg_mark_end(struct scatterlist *sg)
-{
-	/*
-	 * Set termination bit, clear potential chain bit
-	 */
-	sg->page_link |= 0x02;
-	sg->page_link &= ~0x01;
-}
-static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
-{
-	memset(sgl, 0, sizeof(*sgl) * nents);
-	sg_mark_end(&sgl[nents - 1]);
-}
-static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
-{
-	unsigned long page_link = sg->page_link & 0x3;
-
-	/*
-	 * In order for the low bit stealing approach to work, pages
-	 * must be aligned at a 32-bit boundary as a minimum.
-	 */
-	BUG_ON((unsigned long) page & 0x03);
-	sg->page_link = page_link | (unsigned long) page;
-}
-
-static inline void sg_set_page(struct scatterlist *sg, struct page *page,
-			       unsigned int len, unsigned int offset)
-{
-	sg_assign_page(sg, page);
-	sg->offset = offset;
-	sg->length = len;
-}
-
-static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
-			      unsigned int buflen)
-{
-	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
-}
-
-static inline void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen)
-{
-	sg_init_table(sg, 1);
-	sg_set_buf(sg, buf, buflen);
-}
-
-typedef __u16 u16;
-
-typedef enum {
-	GFP_KERNEL,
-	GFP_ATOMIC,
-} gfp_t;
-
-#define __GFP_HIGHMEM 0
-#define __GFP_HIGH 0
-
-typedef enum {
-	IRQ_NONE,
-	IRQ_HANDLED
-} irqreturn_t;
-
-static inline void *kmalloc(size_t s, gfp_t gfp)
-{
-	return malloc(s);
-}
-
-static inline void kfree(void *p)
-{
-	free(p);
-}
-
-#define container_of(ptr, type, member) ({			\
-	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
-	(type *)( (char *)__mptr - offsetof(type,member) );})
-
-#define uninitialized_var(x) x = x
-
-# ifndef likely
-#  define likely(x)	(__builtin_expect(!!(x), 1))
-# endif
-# ifndef unlikely
-#  define unlikely(x)	(__builtin_expect(!!(x), 0))
-# endif
-
-#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#ifdef DEBUG
-#define pr_debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#else
-#define pr_debug(format, ...) do {} while (0)
-#endif
-#define dev_err(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#define dev_warn(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#include <linux/scatterlist.h>
+#include <linux/kernel.h>
 
 /* TODO: empty stubs for now. Broken but enough for virtio_ring.c */
 #define list_add_tail(a, b) do {} while (0)
@@ -135,6 +11,7 @@ static inline void kfree(void *p)
 #define BITS_PER_BYTE		8
 #define BITS_PER_LONG (sizeof(long) * BITS_PER_BYTE)
 #define BIT_MASK(nr)		(1UL << ((nr) % BITS_PER_LONG))
+
 /* TODO: Not atomic as it should be:
  * we don't use this for anything important. */
 static inline void clear_bit(int nr, volatile unsigned long *addr)
@@ -149,10 +26,6 @@ static inline int test_bit(int nr, const volatile unsigned long *addr)
 {
         return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
-
-/* The only feature we care to support */
-#define virtio_has_feature(dev, feature) \
-	test_bit((feature), (dev)->features)
 /* end of stubs */
 
 struct virtio_device {
@@ -172,28 +45,9 @@ struct virtqueue {
 	void *priv;
 };
 
-#define EXPORT_SYMBOL_GPL(__EXPORT_SYMBOL_GPL_name) \
-	void __EXPORT_SYMBOL_GPL##__EXPORT_SYMBOL_GPL_name() { \
-}
 #define MODULE_LICENSE(__MODULE_LICENSE_value) \
 	const char *__MODULE_LICENSE_name = __MODULE_LICENSE_value
 
-#define CONFIG_SMP
-
-#if defined(__i386__) || defined(__x86_64__)
-#define barrier() asm volatile("" ::: "memory")
-#define mb() __sync_synchronize()
-
-#define smp_mb()	mb()
-# define smp_rmb()	barrier()
-# define smp_wmb()	barrier()
-/* Weak barriers should be used. If not - it's a bug */
-# define rmb()	abort()
-# define wmb()	abort()
-#else
-#error Please fill in barrier macros
-#endif
-
 /* Interfaces exported by virtio_ring. */
 int virtqueue_add_buf(struct virtqueue *vq,
 		      struct scatterlist sg[],
diff --git a/tools/virtio/linux/virtio_config.h b/tools/virtio/linux/virtio_config.h
new file mode 100644
index 0000000..5049967
--- /dev/null
+++ b/tools/virtio/linux/virtio_config.h
@@ -0,0 +1,6 @@
+#define VIRTIO_TRANSPORT_F_START	28
+#define VIRTIO_TRANSPORT_F_END		32
+
+#define virtio_has_feature(dev, feature) \
+	test_bit((feature), (dev)->features)
+
diff --git a/tools/virtio/linux/virtio_ring.h b/tools/virtio/linux/virtio_ring.h
new file mode 100644
index 0000000..8949c4e
--- /dev/null
+++ b/tools/virtio/linux/virtio_ring.h
@@ -0,0 +1 @@
+#include "../../../include/linux/virtio_ring.h"
diff --git a/tools/virtio/linux/vringh.h b/tools/virtio/linux/vringh.h
new file mode 100644
index 0000000..9348957
--- /dev/null
+++ b/tools/virtio/linux/vringh.h
@@ -0,0 +1 @@
+#include "../../../include/linux/vringh.h"
diff --git a/tools/virtio/uapi/linux/uio.h b/tools/virtio/uapi/linux/uio.h
new file mode 100644
index 0000000..7230e90
--- /dev/null
+++ b/tools/virtio/uapi/linux/uio.h
@@ -0,0 +1 @@
+#include <sys/uio.h>
diff --git a/tools/virtio/uapi/linux/virtio_config.h b/tools/virtio/uapi/linux/virtio_config.h
new file mode 100644
index 0000000..4c86675
--- /dev/null
+++ b/tools/virtio/uapi/linux/virtio_config.h
@@ -0,0 +1 @@
+#include "../../../../include/uapi/linux/virtio_config.h"
diff --git a/tools/virtio/uapi/linux/virtio_ring.h b/tools/virtio/uapi/linux/virtio_ring.h
new file mode 100644
index 0000000..4d99c78
--- /dev/null
+++ b/tools/virtio/uapi/linux/virtio_ring.h
@@ -0,0 +1,4 @@
+#ifndef VIRTIO_RING_H
+#define VIRTIO_RING_H
+#include "../../../../include/uapi/linux/virtio_ring.h"
+#endif /* VIRTIO_RING_H */
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index a07142c..c613f25 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -10,11 +10,15 @@
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <fcntl.h>
+#include <stdbool.h>
 #include <linux/vhost.h>
 #include <linux/virtio.h>
 #include <linux/virtio_ring.h>
 #include "../../drivers/vhost/test.h"
 
+/* Unused */
+void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+
 struct vq_info {
 	int kick;
 	int call;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 5/5] tools/virtio: add vring_test.
  2013-02-18 22:29 [PATCH 0/5] vringh Rusty Russell
                   ` (3 preceding siblings ...)
  2013-02-18 22:29 ` [PATCH 4/5] tools/virtio: separate headers more Rusty Russell
@ 2013-02-18 22:29 ` Rusty Russell
  2013-02-19  8:11 ` [PATCH 0/5] vringh Michael S. Tsirkin
  2013-02-24 22:30 ` Michael S. Tsirkin
  6 siblings, 0 replies; 9+ messages in thread
From: Rusty Russell @ 2013-02-18 22:29 UTC (permalink / raw)
  To: virtualization; +Cc: sjur.brandeland, mst, linux-kernel, Rusty Russell

This is mainly to test the drivers/vhost/vringh.c code, but it also
uses the drivers/virtio/virtio_ring.c code for the guest side.

Usage for testing the basic implementation:

	./vringh_test
	# Test with indirect descriptors
	./vringh_test --indirect
	# Test with indirect descriptors and event indexex
	./vringh_test --indirect --eventidx

You can run a parallel stress test by adding --parallel to any of the
above options.

eg ./vringh_test --parallel:
	Using CPUS 0 and 3
	Guest: notified 10107974, pinged 107970
	Host: notified 108158, pinged 3172148
	Time: R=17.659 U=6.640 S=6.640

./vringh_test --eventidx --parallel:
	Using CPUS 0 and 3
	Guest: notified 156357, pinged 156251
	Host: notified 156251, pinged 78179
	Time: R=4.518 U=3.536 S=3.536

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 tools/virtio/Makefile      |    6 +-
 tools/virtio/vringh_test.c |  659 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 662 insertions(+), 3 deletions(-)
 create mode 100644 tools/virtio/vringh_test.c

diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index d1d442e..e961545 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -1,12 +1,12 @@
 all: test mod
-test: virtio_test
+test: virtio_test vringh_test
 virtio_test: virtio_ring.o virtio_test.o
-CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow  -MMD
+CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -Wno-unused-result -MMD
 vpath %.c ../../drivers/virtio
 mod:
 	${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test
 .PHONY: all test mod clean
 clean:
-	${RM} *.o vhost_test/*.o vhost_test/.*.cmd \
+	${RM} *.o vringh_test virtio_test vhost_test/*.o vhost_test/.*.cmd \
               vhost_test/Module.symvers vhost_test/modules.order *.d
 -include *.d
diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
new file mode 100644
index 0000000..1536b8e
--- /dev/null
+++ b/tools/virtio/vringh_test.c
@@ -0,0 +1,659 @@
+/* Simple test of virtio code, entirely in userpsace. */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <err.h>
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <../../drivers/vhost/vringh.c>
+#include <../../drivers/virtio/virtio_ring.c>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <fcntl.h>
+
+#define USER_MEM (1024*1024)
+void *__user_addr_min, *__user_addr_max;
+void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+static u64 user_addr_offset;
+
+#define RINGSIZE 256
+#define ALIGN 4096
+
+static void never_notify_host(struct virtqueue *vq)
+{
+	abort();
+}
+
+static void never_callback_guest(struct virtqueue *vq)
+{
+	abort();
+}
+
+static bool getrange_iov(struct vringh *vrh, u64 addr, struct vringh_range *r)
+{
+	if (addr < (u64)(unsigned long)__user_addr_min - user_addr_offset)
+		return false;
+	if (addr >= (u64)(unsigned long)__user_addr_max - user_addr_offset)
+		return false;
+
+	r->start = (u64)(unsigned long)__user_addr_min - user_addr_offset;
+	r->end_incl = (u64)(unsigned long)__user_addr_max - 1 - user_addr_offset;
+	r->offset = user_addr_offset;
+	return true;
+}
+
+/* We return single byte ranges. */
+static bool getrange_slow(struct vringh *vrh, u64 addr, struct vringh_range *r)
+{
+	if (addr < (u64)(unsigned long)__user_addr_min - user_addr_offset)
+		return false;
+	if (addr >= (u64)(unsigned long)__user_addr_max - user_addr_offset)
+		return false;
+
+	r->start = addr;
+	r->end_incl = r->start;
+	r->offset = user_addr_offset;
+	return true;
+}
+
+struct guest_virtio_device {
+	struct virtio_device vdev;
+	int to_host_fd;
+	unsigned long notifies;
+};
+
+static void parallel_notify_host(struct virtqueue *vq)
+{
+	struct guest_virtio_device *gvdev;
+
+	gvdev = container_of(vq->vdev, struct guest_virtio_device, vdev);
+	write(gvdev->to_host_fd, "", 1);
+	gvdev->notifies++;
+}
+
+#define NUM_XFERS (10000000)
+
+/* We aim for two "distant" cpus. */
+static void find_cpus(unsigned int *first, unsigned int *last)
+{
+	unsigned int i;
+
+	*first = -1U;
+	*last = 0;
+	for (i = 0; i < 4096; i++) {
+		cpu_set_t set;
+		CPU_ZERO(&set);
+		CPU_SET(i, &set);
+		if (sched_setaffinity(getpid(), sizeof(set), &set) == 0) {
+			if (i < *first)
+				*first = i;
+			if (i > *last)
+				*last = i;
+		}
+	}
+}
+
+static int parallel_test(unsigned long features,
+			 bool (*getrange)(struct vringh *vrh,
+					  u64 addr, struct vringh_range *r))
+{
+	void *host_map, *guest_map;
+	int fd, mapsize, to_guest[2], to_host[2];
+	unsigned long xfers = 0, notifies = 0, receives = 0;
+	unsigned int first_cpu, last_cpu;
+	cpu_set_t cpu_set;
+
+	/* Create real file to mmap. */
+	fd = open("/tmp/vringh_test-file", O_RDWR|O_CREAT|O_TRUNC, 0600);
+	if (fd < 0)
+		err(1, "Opening /tmp/vringh_test-file");
+
+	/* Extra room at the end for some data, and indirects */
+	mapsize = vring_size(RINGSIZE, ALIGN)
+		+ RINGSIZE * 2 * sizeof(int)
+		+ RINGSIZE * 6 * sizeof(struct vring_desc);
+	mapsize = (mapsize + getpagesize() - 1) & ~(getpagesize() - 1);
+	ftruncate(fd, mapsize);
+
+	/* Parent and child use separate addresses, to check our mapping logic! */
+	host_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+	guest_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+
+	pipe(to_guest);
+	pipe(to_host);
+
+	CPU_ZERO(&cpu_set);
+	find_cpus(&first_cpu, &last_cpu);
+	printf("Using CPUS %u and %u\n", first_cpu, last_cpu);
+	fflush(stdout);
+
+	if (fork() != 0) {
+		struct vringh vrh;
+		int status, err;
+
+		/* We are the host: never access guest addresses! */
+		munmap(guest_map, mapsize);
+
+		__user_addr_min = host_map;
+		__user_addr_max = __user_addr_min + mapsize;
+		user_addr_offset = host_map - guest_map;
+		assert(user_addr_offset);
+
+		close(to_guest[0]);
+		close(to_host[1]);
+
+		vring_init(&vrh.vring, RINGSIZE, host_map, ALIGN);
+		vringh_init_user(&vrh, features, RINGSIZE, true,
+				 vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
+		CPU_SET(first_cpu, &cpu_set);
+		if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
+			errx(1, "Could not set affinity to cpu %u", first_cpu);
+
+		while (xfers < NUM_XFERS) {
+			struct iovec host_riov[2], host_wiov[2];
+			struct vringh_iov riov, wiov;
+			char buf[5];
+			u16 head;
+			int rlen, err;
+
+			vringh_iov_init(&riov,
+					host_riov, ARRAY_SIZE(host_riov));
+			vringh_iov_init(&wiov,
+					host_wiov, ARRAY_SIZE(host_wiov));
+
+			err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange,
+						  &head);
+			if (err == 0) {
+				char buf[128];
+
+				err = vringh_need_notify_user(&vrh);
+				if (err < 0)
+					errx(1, "vringh_need_notify_user: %i",
+					     err);
+				if (err) {
+					write(to_guest[1], "", 1);
+					notifies++;
+				}
+
+				if (vringh_notify_enable_user(&vrh))
+					continue;
+
+				/* Swallow all notifies at once. */
+				if (read(to_host[0], buf, sizeof(buf)) < 1)
+					break;
+
+				vringh_notify_disable_user(&vrh);
+				receives++;
+				continue;
+			}
+			if (err != 1)
+				errx(1, "vringh_getdesc_user: %i", err);
+
+			/* We simply copy bytes. */
+			rlen = vringh_iov_pull_user(&riov, buf, sizeof(buf));
+			if (rlen < 0)
+				errx(1, "vringh_iov_pull_user: %i", rlen);
+			err = vringh_iov_push_user(&wiov, buf, rlen);
+			if (err != rlen)
+				errx(1, "vringh_iov_push_user: %i", err);
+			xfers++;
+			assert(wiov.i == wiov.used);
+
+			err = vringh_complete_user(&vrh, head, rlen);
+			if (err != 0)
+				errx(1, "vringh_complete_user: %i", err);
+		}
+
+		err = vringh_need_notify_user(&vrh);
+		if (err < 0)
+			errx(1, "vringh_need_notify_user: %i", err);
+		if (err) {
+			write(to_guest[1], "", 1);
+			notifies++;
+		}
+		wait(&status);
+		if (!WIFEXITED(status))
+			errx(1, "Child died with signal %i?", WTERMSIG(status));
+		if (WEXITSTATUS(status) != 0)
+			errx(1, "Child exited %i?", WEXITSTATUS(status));
+		printf("Host: notified %lu, pinged %lu\n", notifies, receives);
+		return 0;
+	} else {
+		struct guest_virtio_device gvdev;
+		struct virtqueue *vq;
+		unsigned int *data;
+		struct vring_desc *indirects;
+		unsigned int finished = 0;
+
+		/* We pass sg[]s pointing into here, but we need RINGSIZE+1 */
+		data = guest_map + vring_size(RINGSIZE, ALIGN);
+		indirects = (void *)data + (RINGSIZE + 1) * 2 * sizeof(int);
+
+		/* We are the guest. */
+		munmap(host_map, mapsize);
+
+		close(to_guest[1]);
+		close(to_host[0]);
+
+		gvdev.vdev.features[0] = features;
+		gvdev.to_host_fd = to_host[1];
+
+		CPU_SET(first_cpu, &cpu_set);
+		if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
+			err(1, "Could not set affinity to cpu %u", first_cpu);
+
+		vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true,
+					 guest_map, parallel_notify_host,
+					 never_callback_guest, "guest vq");
+
+		/* Don't kfree indirects. */
+		__kfree_ignore_start = indirects;
+		__kfree_ignore_end = indirects + RINGSIZE * 6;
+
+		while (xfers < NUM_XFERS) {
+			struct scatterlist sg[6];
+			unsigned int num_sg, len;
+			int *din, *dout, err;
+
+			/* Consume bufs. */
+			while ((din = virtqueue_get_buf(vq, &len)) != NULL) {
+				dout = din + 1;
+				assert(*dout == *din);
+				assert(len == 4);
+				finished++;
+			}
+
+			/* Produce a buffer. */
+			din = data + (xfers % (RINGSIZE + 1)) * 2;
+			dout = din + 1;
+
+			*din = xfers;
+			switch ((xfers / sizeof(*din)) % 3) {
+			case 0:
+				/* Nasty three-element sg list. */
+				sg_init_table(sg, num_sg = 3);
+				sg_set_buf(&sg[0], (void *)din, 1);
+				sg_set_buf(&sg[1], (void *)din + 1, 2);
+				sg_set_buf(&sg[2], (void *)din + 3, 1);
+				sg_init_table(sg + num_sg, num_sg);
+				sg_set_buf(&sg[num_sg+0], (void *)dout, 1);
+				sg_set_buf(&sg[num_sg+1], (void *)dout + 1, 2);
+				sg_set_buf(&sg[num_sg+2], (void *)dout + 3, 1);
+				break;
+			case 1:
+				sg_init_table(sg, num_sg = 2);
+				sg_set_buf(&sg[0], (void *)din, 1);
+				sg_set_buf(&sg[1], (void *)din + 1, 3);
+				sg_init_table(sg + num_sg, num_sg);
+				sg_set_buf(&sg[num_sg+0], (void *)dout, 1);
+				sg_set_buf(&sg[num_sg+1], (void *)dout + 1, 3);
+				break;
+			case 2:
+				sg_init_table(sg, num_sg = 1);
+				sg_set_buf(&sg[0], (void *)din, 4);
+				sg_init_table(sg + num_sg, num_sg);
+				sg_set_buf(&sg[num_sg+0], (void *)dout, 4);
+				break;
+			}
+
+			/* May allocate an indirect, so force it to allocate
+			 * user addr */
+			__kmalloc_fake = indirects + (xfers % RINGSIZE) * 6;
+			err = virtqueue_add_buf(vq, sg, num_sg, num_sg, din,
+						GFP_KERNEL);
+			if (err == -ENOSPC) {
+				char buf[128];
+
+				if (!virtqueue_enable_cb_delayed(vq))
+					continue;
+				/* Swallow all notifies at once. */
+				if (read(to_guest[0], buf, sizeof(buf)) < 1)
+					break;
+				
+				receives++;
+				virtqueue_disable_cb(vq);
+				continue;
+			}
+
+			if (err)
+				errx(1, "virtqueue_add_buf: %i", err);
+
+			xfers++;
+			virtqueue_kick(vq);
+		}
+
+		/* Any extra? */
+		while (finished != xfers) {
+			char buf[128];
+			int *din, *dout;
+			unsigned int len;
+
+			/* Consume bufs. */
+			din = virtqueue_get_buf(vq, &len);
+			if (din) {
+				dout = din + 1;
+				assert(*dout == *din);
+				assert(len == 4);
+				finished++;
+				continue;
+			}
+
+			if (!virtqueue_enable_cb_delayed(vq))
+				continue;
+			if (read(to_guest[0], buf, sizeof(buf)) < 1)
+				break;
+				
+			receives++;
+			virtqueue_disable_cb(vq);
+		}
+
+		printf("Guest: notified %lu, pinged %lu\n",
+		       gvdev.notifies, receives);
+		return 0;
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	struct virtio_device vdev;
+	struct virtqueue *vq;
+	struct vringh vrh;
+	struct scatterlist guest_sg[RINGSIZE];
+	struct iovec host_riov[2], host_wiov[2];
+	struct vringh_iov riov, wiov;
+	struct vring_used_elem used[RINGSIZE];
+	char buf[28];
+	u16 head;
+	int err;
+	unsigned i;
+	void *ret;
+	bool (*getrange)(struct vringh *vrh, u64 addr, struct vringh_range *r);
+
+	getrange = getrange_iov;
+	vdev.features[0] = 0;
+
+	if (argv[1] && strcmp(argv[1], "--indirect") == 0) {
+		vdev.features[0] |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
+		argv++;
+	}
+
+	if (argv[1] && strcmp(argv[1], "--eventidx") == 0) {
+		vdev.features[0] |= (1 << VIRTIO_RING_F_EVENT_IDX);
+		argv++;
+	}
+
+	if (argv[1] && strcmp(argv[1], "--slow") == 0) {
+		getrange = getrange_slow;
+		argv++;
+	}
+
+	if (argv[1] && strcmp(argv[1], "--parallel") == 0)
+		return parallel_test(vdev.features[0], getrange);
+
+	if (posix_memalign(&__user_addr_min, PAGE_SIZE, USER_MEM) != 0)
+		abort();
+	__user_addr_max = __user_addr_min + USER_MEM;
+	memset(__user_addr_min, 0, vring_size(RINGSIZE, ALIGN));
+
+	/* Set up guest side. */
+	vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &vdev, true,
+				 __user_addr_min,
+				 never_notify_host, never_callback_guest,
+				 "guest vq");
+
+	/* Set up host side. */
+	vring_init(&vrh.vring, RINGSIZE, __user_addr_min, ALIGN);
+	vringh_init_user(&vrh, vdev.features[0], RINGSIZE, true,
+			 vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
+
+	/* No descriptor to get yet... */
+	err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+	if (err != 0)
+		errx(1, "vringh_getdesc_user: %i", err);
+
+	/* Guest puts in a descriptor. */
+	memcpy(__user_addr_max - 1, "a", 1);
+	sg_init_table(guest_sg, 1);
+	sg_set_buf(&guest_sg[0], __user_addr_max - 1, 1);
+	sg_init_table(guest_sg+1, 1);
+	sg_set_buf(&guest_sg[1], __user_addr_max - 3, 2);
+
+	/* May allocate an indirect, so force it to allocate user addr */
+	__kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+	err = virtqueue_add_buf(vq, guest_sg, 1, 1, &err, GFP_KERNEL);
+	if (err)
+		errx(1, "virtqueue_add_buf: %i", err);
+	__kmalloc_fake = NULL;
+
+	/* Host retreives it. */
+	vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+	vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+	err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+	if (err != 1)
+		errx(1, "vringh_getdesc_user: %i", err);
+
+	assert(riov.used == 1);
+	assert(riov.iov[0].iov_base == __user_addr_max - 1);
+	assert(riov.iov[0].iov_len == 1);
+	if (getrange != getrange_slow) {
+		assert(wiov.used == 1);
+		assert(wiov.iov[0].iov_base == __user_addr_max - 3);
+		assert(wiov.iov[0].iov_len == 2);
+	} else {
+		assert(wiov.used == 2);
+		assert(wiov.iov[0].iov_base == __user_addr_max - 3);
+		assert(wiov.iov[0].iov_len == 1);
+		assert(wiov.iov[1].iov_base == __user_addr_max - 2);
+		assert(wiov.iov[1].iov_len == 1);
+	}
+
+	err = vringh_iov_pull_user(&riov, buf, 5);
+	if (err != 1)
+		errx(1, "vringh_iov_pull_user: %i", err);
+	assert(buf[0] == 'a');
+	assert(riov.i == 1);
+	assert(vringh_iov_pull_user(&riov, buf, 5) == 0);
+
+	memcpy(buf, "bcdef", 5);
+	err = vringh_iov_push_user(&wiov, buf, 5);
+	if (err != 2)
+		errx(1, "vringh_iov_push_user: %i", err);
+	assert(memcmp(__user_addr_max - 3, "bc", 2) == 0);
+	assert(wiov.i == wiov.used);
+	assert(vringh_iov_push_user(&wiov, buf, 5) == 0);
+
+	/* Host is done. */
+	err = vringh_complete_user(&vrh, head, err);
+	if (err != 0)
+		errx(1, "vringh_complete_user: %i", err);
+
+	/* Guest should see used token now. */
+	__kfree_ignore_start = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+	__kfree_ignore_end = __kfree_ignore_start + 1;
+	ret = virtqueue_get_buf(vq, &i);
+	if (ret != &err)
+		errx(1, "virtqueue_get_buf: %p", ret);
+	assert(i == 2);
+
+	/* Guest puts in a huge descriptor. */
+	sg_init_table(guest_sg, RINGSIZE);
+	for (i = 0; i < RINGSIZE; i++) {
+		sg_set_buf(&guest_sg[i],
+			   __user_addr_max - USER_MEM/4, USER_MEM/4);
+	}
+
+	/* Fill contents with recognisable garbage. */
+	for (i = 0; i < USER_MEM/4; i++)
+		((char *)__user_addr_max - USER_MEM/4)[i] = i;
+
+	/* This will allocate an indirect, so force it to allocate user addr */
+	__kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+	err = virtqueue_add_buf(vq, guest_sg, RINGSIZE, 0, &err, GFP_KERNEL);
+	if (err)
+		errx(1, "virtqueue_add_buf (large): %i", err);
+	__kmalloc_fake = NULL;
+
+	/* Host picks it up (allocates new iov). */
+	vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+	vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+	err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+	if (err != 1)
+		errx(1, "vringh_getdesc_user: %i", err);
+
+	assert(riov.max_num & VRINGH_IOV_ALLOCATED);
+	assert(riov.iov != host_riov);
+	if (getrange != getrange_slow)
+		assert(riov.used == RINGSIZE);
+	else
+		assert(riov.used == RINGSIZE * USER_MEM/4);
+
+	assert(!(wiov.max_num & VRINGH_IOV_ALLOCATED));
+	assert(wiov.used == 0);
+
+	/* Pull data back out (in odd chunks), should be as expected. */
+	for (i = 0; i < RINGSIZE * USER_MEM/4; i += 3) {
+		err = vringh_iov_pull_user(&riov, buf, 3);
+		if (err != 3 && i + err != RINGSIZE * USER_MEM/4)
+			errx(1, "vringh_iov_pull_user large: %i", err);
+		assert(buf[0] == (char)i);
+		assert(err < 2 || buf[1] == (char)(i + 1));
+		assert(err < 3 || buf[2] == (char)(i + 2));
+	}
+	assert(riov.i == riov.used);
+	vringh_iov_cleanup(&riov);
+	vringh_iov_cleanup(&wiov);
+
+	/* Complete using multi interface, just because we can. */
+	used[0].id = head;
+	used[0].len = 0;
+	err = vringh_complete_multi_user(&vrh, used, 1);
+	if (err)
+		errx(1, "vringh_complete_multi_user(1): %i", err);
+
+	/* Free up those descriptors. */
+	ret = virtqueue_get_buf(vq, &i);
+	if (ret != &err)
+		errx(1, "virtqueue_get_buf: %p", ret);
+
+	/* Add lots of descriptors. */
+	sg_init_table(guest_sg, 1);
+	sg_set_buf(&guest_sg[0], __user_addr_max - 1, 1);
+	for (i = 0; i < RINGSIZE; i++) {
+		err = virtqueue_add_buf(vq, guest_sg, 1, 0, &err, GFP_KERNEL);
+		if (err)
+			errx(1, "virtqueue_add_buf (multiple): %i", err);
+	}
+
+	/* Now get many, and consume them all at once. */
+	vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+	vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+	for (i = 0; i < RINGSIZE; i++) {
+		err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+		if (err != 1)
+			errx(1, "vringh_getdesc_user: %i", err);
+		used[i].id = head;
+		used[i].len = 0;
+	}
+	/* Make sure it wraps around ring, to test! */
+	assert(vrh.vring.used->idx % RINGSIZE != 0);
+	err = vringh_complete_multi_user(&vrh, used, RINGSIZE);
+	if (err)
+		errx(1, "vringh_complete_multi_user: %i", err);
+
+	/* Free those buffers. */
+	for (i = 0; i < RINGSIZE; i++) {
+		unsigned len;
+		assert(virtqueue_get_buf(vq, &len) != NULL);
+	}
+
+	/* Test weird (but legal!) indirect. */
+	if (vdev.features[0] & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
+		struct vring_virtqueue *vvq = to_vvq(vq);
+		char *data = __user_addr_max - USER_MEM/4;
+		struct vring_desc *d = __user_addr_max - USER_MEM/2;
+		unsigned int n = vvq->free_head;
+
+		/* Force creation of direct, which we modify. */
+		vvq->indirect = false;
+		
+		sg_init_table(guest_sg, 4);
+		sg_set_buf(&guest_sg[0], d, sizeof(*d)*2);
+		sg_set_buf(&guest_sg[1], d + 2, sizeof(*d)*1);
+		sg_set_buf(&guest_sg[2], data + 6, 4);
+		sg_set_buf(&guest_sg[3], d + 3, sizeof(*d)*3);
+
+		err = virtqueue_add_buf(vq, guest_sg, 4, 0, &err, GFP_KERNEL);
+		if (err)
+			errx(1, "virtqueue_add_buf (indirect): %i", err);
+
+		/* They're used in order, but double-check... */
+		assert(vvq->vring.desc[n].addr == (unsigned long)d);
+		assert(vvq->vring.desc[n-1].addr == (unsigned long)(d+2));
+		assert(vvq->vring.desc[n-2].addr == (unsigned long)data + 6);
+		assert(vvq->vring.desc[n-3].addr == (unsigned long)(d+3));
+		vvq->vring.desc[n].flags |= VRING_DESC_F_INDIRECT;
+		vvq->vring.desc[n-1].flags |= VRING_DESC_F_INDIRECT;
+		vvq->vring.desc[n-3].flags |= VRING_DESC_F_INDIRECT;
+
+		/* First indirect */
+		d[0].addr = (unsigned long)data;
+		d[0].len = 1;
+		d[0].flags = VRING_DESC_F_NEXT;
+		d[0].next = 1;
+		d[1].addr = (unsigned long)data + 1;
+		d[1].len = 2;
+		d[1].flags = 0;
+
+		/* Second indirect */
+		d[2].addr = (unsigned long)data + 3;
+		d[2].len = 3;
+		d[2].flags = 0;
+
+		/* Third indirect */
+		d[3].addr = (unsigned long)data + 10;
+		d[3].len = 5;
+		d[3].flags = VRING_DESC_F_NEXT;
+		d[3].next = 1;
+		d[4].addr = (unsigned long)data + 15;
+		d[4].len = 6;
+		d[4].flags = VRING_DESC_F_NEXT;
+		d[4].next = 2;
+		d[5].addr = (unsigned long)data + 21;
+		d[5].len = 7;
+		d[5].flags = 0;
+
+		/* Host picks it up (allocates new iov). */
+		vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+		vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+		err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+		if (err != 1)
+			errx(1, "vringh_getdesc_user: %i", err);
+
+		if (head != n)
+			errx(1, "vringh_getdesc_user: head %i not %i", head, n);
+
+		assert(riov.max_num & VRINGH_IOV_ALLOCATED);
+		if (getrange != getrange_slow)
+			assert(riov.used == 7);
+		else
+			assert(riov.used == 28);
+		err = vringh_iov_pull_user(&riov, buf, 29);
+		assert(err == 28);
+
+		/* Data should be linear. */
+		for (i = 0; i < err; i++)
+			assert(buf[i] == i);
+		vringh_iov_cleanup(&riov);
+	}
+
+	/* Don't leak memory... */
+	vring_del_virtqueue(vq);
+	free(__user_addr_min);
+
+	return 0;
+}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/5] vringh
  2013-02-18 22:29 [PATCH 0/5] vringh Rusty Russell
                   ` (4 preceding siblings ...)
  2013-02-18 22:29 ` [PATCH 5/5] tools/virtio: add vring_test Rusty Russell
@ 2013-02-19  8:11 ` Michael S. Tsirkin
  2013-02-19 20:43   ` Rusty Russell
  2013-02-24 22:30 ` Michael S. Tsirkin
  6 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2013-02-19  8:11 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, sjur.brandeland, linux-kernel

On Tue, Feb 19, 2013 at 08:59:27AM +1030, Rusty Russell wrote:
> This introduces vringh, which are generic accessors for virtio rings (host side).
> There's a host-side implementation in vhost, but it assumes that the rings are
> in userspace, and is tied to the vhost implementation.  I have patches to adapt
> it to use vringh, but I'm pushing this in the next merge window for Sjur, who has
> CAIF patches which need it.
> 
> This also includes a test program in tools/virtio.  There's a fixup patch for
> the current tools, since they didn't compile for me.  MST?
> 
> Review and comments welcome!
> Rusty.

Will review, thanks.
Okay so the plan is to merge this changeset for 3.9 and rework vhost
for 3.10?

-- 
MST

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/5] vringh
  2013-02-19  8:11 ` [PATCH 0/5] vringh Michael S. Tsirkin
@ 2013-02-19 20:43   ` Rusty Russell
  0 siblings, 0 replies; 9+ messages in thread
From: Rusty Russell @ 2013-02-19 20:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization, sjur.brandeland, linux-kernel

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Tue, Feb 19, 2013 at 08:59:27AM +1030, Rusty Russell wrote:
>> This introduces vringh, which are generic accessors for virtio rings (host side).
>> There's a host-side implementation in vhost, but it assumes that the rings are
>> in userspace, and is tied to the vhost implementation.  I have patches to adapt
>> it to use vringh, but I'm pushing this in the next merge window for Sjur, who has
>> CAIF patches which need it.
>> 
>> This also includes a test program in tools/virtio.  There's a fixup patch for
>> the current tools, since they didn't compile for me.  MST?
>> 
>> Review and comments welcome!
>> Rusty.
>
> Will review, thanks.
> Okay so the plan is to merge this changeset for 3.9 and rework vhost
> for 3.10?

I hope so, yes.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/5] vringh
  2013-02-18 22:29 [PATCH 0/5] vringh Rusty Russell
                   ` (5 preceding siblings ...)
  2013-02-19  8:11 ` [PATCH 0/5] vringh Michael S. Tsirkin
@ 2013-02-24 22:30 ` Michael S. Tsirkin
  6 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2013-02-24 22:30 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, sjur.brandeland, linux-kernel

On Tue, Feb 19, 2013 at 08:59:27AM +1030, Rusty Russell wrote:
> This introduces vringh, which are generic accessors for virtio rings (host side).
> There's a host-side implementation in vhost, but it assumes that the rings are
> in userspace, and is tied to the vhost implementation.  I have patches to adapt
> it to use vringh, but I'm pushing this in the next merge window for Sjur, who has
> CAIF patches which need it.
> 
> This also includes a test program in tools/virtio.  There's a fixup patch for
> the current tools, since they didn't compile for me.  MST?
> 
> Review and comments welcome!
> Rusty.

Acked-by: Michael S. Tsirkin <mst@redhat.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-02-24 22:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-18 22:29 [PATCH 0/5] vringh Rusty Russell
2013-02-18 22:29 ` [PATCH 1/5] virtio_ring: expose virtio barriers for use in vringh Rusty Russell
2013-02-18 22:29 ` [PATCH 2/5] vringh: host-side implementation of virtio rings Rusty Russell
2013-02-18 22:29 ` [PATCH 3/5] tools/virtio: fix compile Rusty Russell
2013-02-18 22:29 ` [PATCH 4/5] tools/virtio: separate headers more Rusty Russell
2013-02-18 22:29 ` [PATCH 5/5] tools/virtio: add vring_test Rusty Russell
2013-02-19  8:11 ` [PATCH 0/5] vringh Michael S. Tsirkin
2013-02-19 20:43   ` Rusty Russell
2013-02-24 22:30 ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).