* [PATCH 1 of 5] virtio: document functions better
       [not found] <patchbomb.1320306168@localhost6.localdomain6>
@ 2011-11-03  7:42   ` Rusty Russell
  2011-11-03  7:42   ` Rusty Russell
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2011-11-03  7:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christoph Hellwig; +Cc: virtualization, kvm, linux-kernel

The old documentation is left over from when we used a structure with
strategy pointers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 include/linux/virtio.h |  130 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 87 insertions(+), 43 deletions(-)

diff --git a/include/linux/virtio.h b/include/linux/virtio.h
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -26,52 +26,22 @@ struct virtqueue {
 };
 
 /**
- * operations for virtqueue
- * virtqueue_add_buf: expose buffer to other end
- *	vq: the struct virtqueue we're talking about.
- *	sg: the description of the buffer(s).
- *	out_num: the number of sg readable by other side
- *	in_num: the number of sg which are writable (after readable ones)
- *	data: the token identifying the buffer.
- *	gfp: how to do memory allocations (if necessary).
- *      Returns remaining capacity of queue (sg segments) or a negative error.
- * virtqueue_kick: update after add_buf
- *	vq: the struct virtqueue
- *	After one or more add_buf calls, invoke this to kick the other side.
- * virtqueue_get_buf: get the next used buffer
- *	vq: the struct virtqueue we're talking about.
- *	len: the length written into the buffer
- *	Returns NULL or the "data" token handed to add_buf.
- * virtqueue_disable_cb: disable callbacks
- *	vq: the struct virtqueue we're talking about.
- *	Note that this is not necessarily synchronous, hence unreliable and only
- *	useful as an optimization.
- * virtqueue_enable_cb: restart callbacks after disable_cb.
- *	vq: the struct virtqueue we're talking about.
- *	This re-enables callbacks; it returns "false" if there are pending
- *	buffers in the queue, to detect a possible race between the driver
- *	checking for more work, and enabling callbacks.
- * virtqueue_enable_cb_delayed: restart callbacks after disable_cb.
- *	vq: the struct virtqueue we're talking about.
- *	This re-enables callbacks but hints to the other side to delay
- *	interrupts until most of the available buffers have been processed;
- *	it returns "false" if there are many pending buffers in the queue,
- *	to detect a possible race between the driver checking for more work,
- *	and enabling callbacks.
- * virtqueue_detach_unused_buf: detach first unused buffer
- * 	vq: the struct virtqueue we're talking about.
- * 	Returns NULL or the "data" token handed to add_buf
- * virtqueue_get_vring_size: return the size of the virtqueue's vring
- *	vq: the struct virtqueue containing the vring of interest.
- *	Returns the size of the vring.
+ * virtqueue_add_buf_gfp - expose buffer to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: the description of the buffer(s).
+ * @out_num: the number of sg readable by other side
+ * @in_num: the number of sg which are writable (after readable ones)
+ * @data: the token identifying the buffer.
+ * @gfp: how to do memory allocations (if necessary).
  *
- * Locking rules are straightforward: the driver is responsible for
- * locking.  No two operations may be invoked simultaneously, with the exception
- * of virtqueue_disable_cb.
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
  *
- * All operations can be called in any context.
+ * Returns remaining capacity of queue or a negative error
+ * (ie. ENOSPC).  Note that it only really makes sense to treat all
+ * positive return values as "available": indirect buffers mean that
+ * we can put an entire sg[] array inside a single queue entry.
  */
-
 int virtqueue_add_buf_gfp(struct virtqueue *vq,
 			  struct scatterlist sg[],
 			  unsigned int out_num,
@@ -88,18 +58,92 @@ static inline int virtqueue_add_buf(stru
 	return virtqueue_add_buf_gfp(vq, sg, out_num, in_num, data, GFP_ATOMIC);
 }
 
+/**
+ * virtqueue_kick - update after add_buf
+ * @vq: the struct virtqueue
+ *
+ * After one or more virtqueue_add_buf_gfp calls, invoke this to kick
+ * the other side.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ */
 void virtqueue_kick(struct virtqueue *vq);
 
+/**
+ * virtqueue_get_buf - get the next used buffer
+ * @vq: the struct virtqueue we're talking about.
+ * @len: the length written into the buffer
+ *
+ * If the driver wrote data into the buffer, @len will be set to the
+ * amount written.  This means you don't need to clear the buffer
+ * beforehand to ensure there's no data leakage in the case of short
+ * writes.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * Returns NULL if there are no used buffers, or the "data" token
+ * handed to virtqueue_add_buf_gfp().
+ */
 void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
 
+/**
+ * virtqueue_disable_cb - disable callbacks
+ * @vq: the struct virtqueue we're talking about.
+ *
+ * Note that this is not necessarily synchronous, hence unreliable and only
+ * useful as an optimization.
+ *
+ * Unlike other operations, this need not be serialized.
+ */
 void virtqueue_disable_cb(struct virtqueue *vq);
 
+/**
+ * virtqueue_enable_cb - restart callbacks after disable_cb.
+ * @vq: the struct virtqueue we're talking about.
+ *
+ * This re-enables callbacks; it returns "false" if there are pending
+ * buffers in the queue, to detect a possible race between the driver
+ * checking for more work, and enabling callbacks.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ */
 bool virtqueue_enable_cb(struct virtqueue *vq);
 
+/**
+ * virtqueue_enable_cb_delayed - restart callbacks after disable_cb.
+ * @vq: the struct virtqueue we're talking about.
+ *
+ * This re-enables callbacks but hints to the other side to delay
+ * interrupts until most of the available buffers have been processed;
+ * it returns "false" if there are many pending buffers in the queue,
+ * to detect a possible race between the driver checking for more work,
+ * and enabling callbacks.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ */
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 
+/**
+ * virtqueue_detach_unused_buf - detach first unused buffer
+ * @vq: the struct virtqueue we're talking about.
+ *
+ * Returns NULL or the "data" token handed to virtqueue_add_buf_gfp().
+ * This is not valid on an active queue; it is useful only for device
+ * shutdown.
+ */
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
+/**
+ * virtqueue_get_vring_size - return the size of the virtqueue's vring
+ * @vq: the struct virtqueue containing the vring of interest.
+ *
+ * Returns the size of the vring.  This is mainly used for boasting to
+ * userspace.  Unlike other operations, this need not be serialized.
+ */
 unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
 
 /**





* [PATCH 2 of 5] virtio: rename virtqueue_add_buf_gfp to virtqueue_add_buf
       [not found] <patchbomb.1320306168@localhost6.localdomain6>
@ 2011-11-03  7:42   ` Rusty Russell
  2011-11-03  7:42   ` Rusty Russell
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2011-11-03  7:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christoph Hellwig; +Cc: virtualization, kvm, linux-kernel

Remove wrapper functions. This makes the allocation type explicit in
all callers; I used GFP_KERNEL where it seemed obvious, and left it at
GFP_ATOMIC otherwise.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -171,7 +171,7 @@ static bool do_req(struct request_queue 
 		}
 	}
 
-	if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr) < 0) {
+	if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr, GFP_ATOMIC)<0) {
 		mempool_free(vbr, vblk->pool);
 		return false;
 	}
diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -46,7 +46,7 @@ static void register_buffer(u8 *buf, siz
 	sg_init_one(&sg, buf, size);
 
 	/* There should always be room for one buffer. */
-	if (virtqueue_add_buf(vq, &sg, 0, 1, buf) < 0)
+	if (virtqueue_add_buf(vq, &sg, 0, 1, buf, GFP_KERNEL) < 0)
 		BUG();
 
 	virtqueue_kick(vq);
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -391,7 +391,7 @@ static int add_inbuf(struct virtqueue *v
 
 	sg_init_one(sg, buf->buf, buf->size);
 
-	ret = virtqueue_add_buf(vq, sg, 0, 1, buf);
+	ret = virtqueue_add_buf(vq, sg, 0, 1, buf, GFP_ATOMIC);
 	virtqueue_kick(vq);
 	return ret;
 }
@@ -456,7 +456,7 @@ static ssize_t __send_control_msg(struct
 	vq = portdev->c_ovq;
 
 	sg_init_one(sg, &cpkt, sizeof(cpkt));
-	if (virtqueue_add_buf(vq, sg, 1, 0, &cpkt) >= 0) {
+	if (virtqueue_add_buf(vq, sg, 1, 0, &cpkt, GFP_ATOMIC) >= 0) {
 		virtqueue_kick(vq);
 		while (!virtqueue_get_buf(vq, &len))
 			cpu_relax();
@@ -505,7 +505,7 @@ static ssize_t send_buf(struct port *por
 	reclaim_consumed_buffers(port);
 
 	sg_init_one(sg, in_buf, in_count);
-	ret = virtqueue_add_buf(out_vq, sg, 1, 0, in_buf);
+	ret = virtqueue_add_buf(out_vq, sg, 1, 0, in_buf, GFP_ATOMIC);
 
 	/* Tell Host to go! */
 	virtqueue_kick(out_vq);
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -368,7 +368,7 @@ static int add_recvbuf_small(struct virt
 
 	skb_to_sgvec(skb, vi->rx_sg + 1, 0, skb->len);
 
-	err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 2, skb, gfp);
+	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 2, skb, gfp);
 	if (err < 0)
 		dev_kfree_skb(skb);
 
@@ -413,8 +413,8 @@ static int add_recvbuf_big(struct virtne
 
 	/* chain first in list head */
 	first->private = (unsigned long)list;
-	err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
-				    first, gfp);
+	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
+				first, gfp);
 	if (err < 0)
 		give_pages(vi, first);
 
@@ -432,7 +432,7 @@ static int add_recvbuf_mergeable(struct 
 
 	sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
 
-	err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 1, page, gfp);
+	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 1, page, gfp);
 	if (err < 0)
 		give_pages(vi, page);
 
@@ -601,7 +601,7 @@ static int xmit_skb(struct virtnet_info 
 
 	hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
 	return virtqueue_add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
-					0, skb);
+				 0, skb, GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -754,7 +754,7 @@ static bool virtnet_send_command(struct 
 		sg_set_buf(&sg[i + 1], sg_virt(s), s->length);
 	sg_set_buf(&sg[out + in - 1], &status, sizeof(status));
 
-	BUG_ON(virtqueue_add_buf(vi->cvq, sg, out, in, vi) < 0);
+	BUG_ON(virtqueue_add_buf(vi->cvq, sg, out, in, vi, GFP_ATOMIC) < 0);
 
 	virtqueue_kick(vi->cvq);
 
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -86,7 +86,7 @@ static void tell_host(struct virtio_ball
 	init_completion(&vb->acked);
 
 	/* We should always be able to add one buffer to an empty queue. */
-	if (virtqueue_add_buf(vq, &sg, 1, 0, vb) < 0)
+	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
 		BUG();
 	virtqueue_kick(vq);
 
@@ -219,7 +219,7 @@ static void stats_handle_request(struct 
 
 	vq = vb->stats_vq;
 	sg_init_one(&sg, vb->stats, sizeof(vb->stats));
-	if (virtqueue_add_buf(vq, &sg, 1, 0, vb) < 0)
+	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
 		BUG();
 	virtqueue_kick(vq);
 }
@@ -312,7 +312,8 @@ static int virtballoon_probe(struct virt
 		 * use it to signal us later.
 		 */
 		sg_init_one(&sg, vb->stats, sizeof vb->stats);
-		if (virtqueue_add_buf(vb->stats_vq, &sg, 1, 0, vb) < 0)
+		if (virtqueue_add_buf(vb->stats_vq, &sg, 1, 0, vb, GFP_KERNEL)
+		    < 0)
 			BUG();
 		virtqueue_kick(vb->stats_vq);
 	}
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -159,12 +159,12 @@ static int vring_add_indirect(struct vri
 	return head;
 }
 
-int virtqueue_add_buf_gfp(struct virtqueue *_vq,
-			  struct scatterlist sg[],
-			  unsigned int out,
-			  unsigned int in,
-			  void *data,
-			  gfp_t gfp)
+int virtqueue_add_buf(struct virtqueue *_vq,
+		      struct scatterlist sg[],
+		      unsigned int out,
+		      unsigned int in,
+		      void *data,
+		      gfp_t gfp)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i, avail, uninitialized_var(prev);
@@ -235,7 +235,7 @@ add_head:
 
 	return vq->num_free;
 }
-EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
+EXPORT_SYMBOL_GPL(virtqueue_add_buf);
 
 void virtqueue_kick(struct virtqueue *_vq)
 {
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -26,7 +26,7 @@ struct virtqueue {
 };
 
 /**
- * virtqueue_add_buf_gfp - expose buffer to other end
+ * virtqueue_add_buf - expose buffer to other end
  * @vq: the struct virtqueue we're talking about.
  * @sg: the description of the buffer(s).
  * @out_num: the number of sg readable by other side
@@ -42,27 +42,18 @@ struct virtqueue {
  * positive return values as "available": indirect buffers mean that
  * we can put an entire sg[] array inside a single queue entry.
  */
-int virtqueue_add_buf_gfp(struct virtqueue *vq,
-			  struct scatterlist sg[],
-			  unsigned int out_num,
-			  unsigned int in_num,
-			  void *data,
-			  gfp_t gfp);
-
-static inline int virtqueue_add_buf(struct virtqueue *vq,
-				    struct scatterlist sg[],
-				    unsigned int out_num,
-				    unsigned int in_num,
-				    void *data)
-{
-	return virtqueue_add_buf_gfp(vq, sg, out_num, in_num, data, GFP_ATOMIC);
-}
+int virtqueue_add_buf(struct virtqueue *vq,
+		      struct scatterlist sg[],
+		      unsigned int out_num,
+		      unsigned int in_num,
+		      void *data,
+		      gfp_t gfp);
 
 /**
  * virtqueue_kick - update after add_buf
  * @vq: the struct virtqueue
  *
- * After one or more virtqueue_add_buf_gfp calls, invoke this to kick
+ * After one or more virtqueue_add_buf calls, invoke this to kick
  * the other side.
  *
  * Caller must ensure we don't call this with other virtqueue
@@ -84,7 +75,7 @@ void virtqueue_kick(struct virtqueue *vq
  * operations at the same time (except where noted).
  *
  * Returns NULL if there are no used buffers, or the "data" token
- * handed to virtqueue_add_buf_gfp().
+ * handed to virtqueue_add_buf().
  */
 void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
 
@@ -131,7 +122,7 @@ bool virtqueue_enable_cb_delayed(struct 
  * virtqueue_detach_unused_buf - detach first unused buffer
  * @vq: the struct virtqueue we're talking about.
  *
- * Returns NULL or the "data" token handed to virtqueue_add_buf_gfp().
+ * Returns NULL or the "data" token handed to virtqueue_add_buf().
  * This is not valid on an active queue; it is useful only for device
  * shutdown.
  */
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -270,7 +270,8 @@ req_retry:
 	in = pack_sg_list(chan->sg, out,
 			  VIRTQUEUE_NUM, req->rc->sdata, req->rc->capacity);
 
-	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
+	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc,
+				GFP_ATOMIC);
 	if (err < 0) {
 		if (err == -ENOSPC) {
 			chan->ring_bufs_avail = 0;
@@ -413,7 +414,8 @@ req_retry_pinned:
 		in += pack_sg_list_p(chan->sg, out + in, VIRTQUEUE_NUM,
 				     in_pages, in_nr_pages, uidata, inlen);
 
-	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
+	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc,
+				GFP_ATOMIC);
 	if (err < 0) {
 		if (err == -ENOSPC) {
 			chan->ring_bufs_avail = 0;
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -186,21 +186,12 @@ struct virtqueue {
 #endif
 
 /* Interfaces exported by virtio_ring. */
-int virtqueue_add_buf_gfp(struct virtqueue *vq,
-			  struct scatterlist sg[],
-			  unsigned int out_num,
-			  unsigned int in_num,
-			  void *data,
-			  gfp_t gfp);
-
-static inline int virtqueue_add_buf(struct virtqueue *vq,
-				    struct scatterlist sg[],
-				    unsigned int out_num,
-				    unsigned int in_num,
-				    void *data)
-{
-	return virtqueue_add_buf_gfp(vq, sg, out_num, in_num, data, GFP_ATOMIC);
-}
+int virtqueue_add_buf(struct virtqueue *vq,
+		      struct scatterlist sg[],
+		      unsigned int out_num,
+		      unsigned int in_num,
+		      void *data,
+		      gfp_t gfp);
 
 void virtqueue_kick(struct virtqueue *vq);
 
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -160,7 +160,8 @@ static void run_test(struct vdev_info *d
 			if (started < bufs) {
 				sg_init_one(&sl, dev->buf, dev->buf_size);
 				r = virtqueue_add_buf(vq->vq, &sl, 1, 0,
-						      dev->buf + started);
+						      dev->buf + started,
+						      GFP_ATOMIC);
 				if (likely(r >= 0)) {
 					++started;
 					virtqueue_kick(vq->vq);




 
-	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
+	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc,
+				GFP_ATOMIC);
 	if (err < 0) {
 		if (err == -ENOSPC) {
 			chan->ring_bufs_avail = 0;
@@ -413,7 +414,8 @@ req_retry_pinned:
 		in += pack_sg_list_p(chan->sg, out + in, VIRTQUEUE_NUM,
 				     in_pages, in_nr_pages, uidata, inlen);
 
-	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc);
+	err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc,
+				GFP_ATOMIC);
 	if (err < 0) {
 		if (err == -ENOSPC) {
 			chan->ring_bufs_avail = 0;
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -186,21 +186,12 @@ struct virtqueue {
 #endif
 
 /* Interfaces exported by virtio_ring. */
-int virtqueue_add_buf_gfp(struct virtqueue *vq,
-			  struct scatterlist sg[],
-			  unsigned int out_num,
-			  unsigned int in_num,
-			  void *data,
-			  gfp_t gfp);
-
-static inline int virtqueue_add_buf(struct virtqueue *vq,
-				    struct scatterlist sg[],
-				    unsigned int out_num,
-				    unsigned int in_num,
-				    void *data)
-{
-	return virtqueue_add_buf_gfp(vq, sg, out_num, in_num, data, GFP_ATOMIC);
-}
+int virtqueue_add_buf(struct virtqueue *vq,
+		      struct scatterlist sg[],
+		      unsigned int out_num,
+		      unsigned int in_num,
+		      void *data,
+		      gfp_t gfp);
 
 void virtqueue_kick(struct virtqueue *vq);
 
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -160,7 +160,8 @@ static void run_test(struct vdev_info *d
 			if (started < bufs) {
 				sg_init_one(&sl, dev->buf, dev->buf_size);
 				r = virtqueue_add_buf(vq->vq, &sl, 1, 0,
-						      dev->buf + started);
+						      dev->buf + started,
+						      GFP_ATOMIC);
 				if (likely(r >= 0)) {
 					++started;
 					virtqueue_kick(vq->vq);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 3 of 5] virtio: support unlocked queue kick
       [not found] <patchbomb.1320306168@localhost6.localdomain6>
@ 2011-11-03  7:42   ` Rusty Russell
  2011-11-03  7:42   ` Rusty Russell
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2011-11-03  7:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christoph Hellwig; +Cc: virtualization, kvm, linux-kernel

Based on patch by Christoph for virtio_blk speedup:

	Split virtqueue_kick to be able to do the actual notification
	outside the lock protecting the virtqueue.  This patch was
	originally done by Stefan Hajnoczi, but I can't find the
	original one anymore and had to recreate it from memory.
	Pointers to the original or corrections for the commit message
	are welcome.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -237,10 +237,12 @@ add_head:
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_buf);
 
-void virtqueue_kick(struct virtqueue *_vq)
+bool virtqueue_kick_prepare(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	u16 new, old;
+	bool needs_kick;
+
 	START_USE(vq);
 	/* Descriptors and available array need to be set before we expose the
 	 * new available array entries. */
@@ -253,13 +255,30 @@ void virtqueue_kick(struct virtqueue *_v
 	/* Need to update avail index before checking if we should notify */
 	virtio_mb();
 
-	if (vq->event ?
-	    vring_need_event(vring_avail_event(&vq->vring), new, old) :
-	    !(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
-		/* Prod other side to tell it about changes. */
-		vq->notify(&vq->vq);
+	if (vq->event) {
+		needs_kick = vring_need_event(vring_avail_event(&vq->vring),
+					      new, old);
+	} else {
+		needs_kick = (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY));
+	}
+	END_USE(vq);
+	return needs_kick;
+}
+EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
 
-	END_USE(vq);
+void virtqueue_notify(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	/* Prod other side to tell it about changes. */
+	vq->notify(_vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_notify);
+
+void virtqueue_kick(struct virtqueue *vq)
+{
+	if (virtqueue_kick_prepare(vq))
+		virtqueue_notify(vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick);
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -62,6 +62,27 @@ int virtqueue_add_buf(struct virtqueue *
 void virtqueue_kick(struct virtqueue *vq);
 
 /**
+ * virtqueue_kick_prepare - first half of split virtqueue_kick call.
+ * @vq: the struct virtqueue
+ *
+ * Instead of virtqueue_kick(), you can do:
+ *	if (virtqueue_kick_prepare(vq))
+ *		virtqueue_notify(vq);
+ *
+ * This is sometimes useful because the virtqueue_kick_prepare() needs
+ * to be serialized, but the actual virtqueue_notify() call does not.
+ */
+bool virtqueue_kick_prepare(struct virtqueue *vq);
+
+/**
+ * virtqueue_notify - second half of split virtqueue_kick call.
+ * @vq: the struct virtqueue
+ *
+ * This does not need to be serialized.
+ */
+void virtqueue_notify(struct virtqueue *vq);
+
+/**
  * virtqueue_get_buf - get the next used buffer
  * @vq: the struct virtqueue we're talking about.
  * @len: the length written into the buffer



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 4 of 5] virtio: avoid modulus operation
       [not found] <patchbomb.1320306168@localhost6.localdomain6>
@ 2011-11-03  7:42   ` Rusty Russell
  2011-11-03  7:42   ` Rusty Russell
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2011-11-03  7:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christoph Hellwig; +Cc: virtualization, kvm, linux-kernel

Since we know vq->vring.num is a power of 2, modulus is lazy (it's asserted
in vring_new_virtqueue()).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_ring.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1420,7 +1420,7 @@ __init void lguest_init(void)
 	add_preferred_console("hvc", 0, NULL);
 
 	/* Register our very early console. */
-	virtio_cons_early_init(early_put_chars);
+//	virtio_cons_early_init(early_put_chars);
 
 	/*
 	 * Last of all, we set the power management poweroff hook to point to
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -226,8 +226,8 @@ add_head:
 	vq->data[head] = data;
 
 	/* Put entry in available array (but don't update avail->idx until they
-	 * do sync).  FIXME: avoid modulus here? */
-	avail = (vq->vring.avail->idx + vq->num_added++) % vq->vring.num;
+	 * do sync). */
+	avail = ((vq->vring.avail->idx + vq->num_added++) & (vq->vring.num-1));
 	vq->vring.avail->ring[avail] = head;
 
 	pr_debug("Added buffer head %i to %p\n", head, vq);
@@ -317,6 +317,7 @@ void *virtqueue_get_buf(struct virtqueue
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	void *ret;
 	unsigned int i;
+	u16 last_used;
 
 	START_USE(vq);
 
@@ -334,8 +335,9 @@ void *virtqueue_get_buf(struct virtqueue
 	/* Only get used array entries after they have been exposed by host. */
 	virtio_rmb();
 
-	i = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].id;
-	*len = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].len;
+	last_used = (vq->last_used_idx & (vq->vring.num - 1));
+	i = vq->vring.used->ring[last_used].id;
+	*len = vq->vring.used->ring[last_used].len;
 
 	if (unlikely(i >= vq->vring.num)) {
 		BAD_RING(vq, "id %u out of range\n", i);



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 5 of 5] virtio: expose added descriptors immediately
       [not found] <patchbomb.1320306168@localhost6.localdomain6>
@ 2011-11-03  7:42   ` Rusty Russell
  2011-11-03  7:42   ` Rusty Russell
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2011-11-03  7:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christoph Hellwig; +Cc: virtualization, kvm, linux-kernel

A virtio driver does virtqueue_add_buf() multiple times before finally
calling virtqueue_kick(); previously we only exposed the added buffers
in the virtqueue_kick() call.  This means we don't need a memory
barrier in virtqueue_add_buf(), but it reduces concurrency as the
device (ie. host) can't see the buffers until the kick.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_ring.c |   37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -227,9 +227,15 @@ add_head:
 
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
-	avail = ((vq->vring.avail->idx + vq->num_added++) & (vq->vring.num-1));
+	avail = (vq->vring.avail->idx & (vq->vring.num-1));
 	vq->vring.avail->ring[avail] = head;
 
+	/* Descriptors and available array need to be set before we expose the
+	 * new available array entries. */
+	virtio_wmb();
+	vq->vring.avail->idx++;
+	vq->num_added++;
+
 	pr_debug("Added buffer head %i to %p\n", head, vq);
 	END_USE(vq);
 
@@ -248,13 +254,10 @@ bool virtqueue_kick_prepare(struct virtq
 	 * new available array entries. */
 	virtio_wmb();
 
-	old = vq->vring.avail->idx;
-	new = vq->vring.avail->idx = old + vq->num_added;
+	old = vq->vring.avail->idx - vq->num_added;
+	new = vq->vring.avail->idx;
 	vq->num_added = 0;
 
-	/* Need to update avail index before checking if we should notify */
-	virtio_mb();
-
 	if (vq->event) {
 		needs_kick = vring_need_event(vring_avail_event(&vq->vring),
 					      new, old);



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1 of 5] virtio: document functions better
  2011-11-03  7:42   ` Rusty Russell
@ 2011-11-03  7:49     ` Christoph Hellwig
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2011-11-03  7:49 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization, kvm, linux-kernel

On Thu, Nov 03, 2011 at 06:12:49PM +1030, Rusty Russell wrote:
> The old documentation is left over from when we used a structure with
> strategy pointers.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2 of 5] virtio: rename virtqueue_add_buf_gfp to virtqueue_add_buf
  2011-11-03  7:42   ` Rusty Russell
@ 2011-11-03  7:50     ` Christoph Hellwig
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2011-11-03  7:50 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization, kvm, linux-kernel

On Thu, Nov 03, 2011 at 06:12:50PM +1030, Rusty Russell wrote:
> Remove wrapper functions. This makes the allocation type explicit in
> all callers; I used GFP_KERNEL where it seemed obvious, left it at
> GFP_ATOMIC otherwise.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 4 of 5] virtio: avoid modulus operation
  2011-11-03  7:42   ` Rusty Russell
  (?)
@ 2011-11-03  7:51   ` Pekka Enberg
  2011-11-03 10:18     ` Rusty Russell
  2011-11-03 10:18     ` Rusty Russell
  -1 siblings, 2 replies; 60+ messages in thread
From: Pekka Enberg @ 2011-11-03  7:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization, kvm, linux-kernel

On Thu, Nov 3, 2011 at 9:42 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Since we know vq->vring.num is a power of 2, modulus is lazy (it's asserted
> in vring_new_virtqueue()).
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/virtio/virtio_ring.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
> --- a/arch/x86/lguest/boot.c
> +++ b/arch/x86/lguest/boot.c
> @@ -1420,7 +1420,7 @@ __init void lguest_init(void)
>        add_preferred_console("hvc", 0, NULL);
>
>        /* Register our very early console. */
> -       virtio_cons_early_init(early_put_chars);
> +//     virtio_cons_early_init(early_put_chars);

What's this hunk here?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3 of 5] virtio: support unlocked queue kick
  2011-11-03  7:42   ` Rusty Russell
  (?)
  (?)
@ 2011-11-03  7:52   ` Christoph Hellwig
  2011-11-04 10:09       ` Stefan Hajnoczi
  2011-11-04 10:36       ` Rusty Russell
  -1 siblings, 2 replies; 60+ messages in thread
From: Christoph Hellwig @ 2011-11-03  7:52 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization, kvm, linux-kernel

On Thu, Nov 03, 2011 at 06:12:51PM +1030, Rusty Russell wrote:
> Based on patch by Christoph for virtio_blk speedup:

Please credit it to Stefan - he also sent a pointer to his original
version in reply to the previous thread.

Also shouldn't virtqueue_kick have kerneldoc comments?


I also noticed that you documented the functions both here and in
the first patch in the headers.  At least historically the kerneldoc
tools didn't parse comments at declarations, but only at the function
definitions.  Did you check these actually get picked up?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 4 of 5] virtio: avoid modulus operation
  2011-11-03  7:51   ` Pekka Enberg
  2011-11-03 10:18     ` Rusty Russell
@ 2011-11-03 10:18     ` Rusty Russell
  1 sibling, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2011-11-03 10:18 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: mst, Christoph Hellwig, virtualization, kvm, linux-kernel

On Thu, 3 Nov 2011 09:51:15 +0200, Pekka Enberg <penberg@kernel.org> wrote:
> On Thu, Nov 3, 2011 at 9:42 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> > Since we know vq->vring.num is a power of 2, modulus is lazy (it's asserted
> > in vring_new_virtqueue()).
> >
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > ---
> >  drivers/virtio/virtio_ring.c |   10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
> > --- a/arch/x86/lguest/boot.c
> > +++ b/arch/x86/lguest/boot.c
> > @@ -1420,7 +1420,7 @@ __init void lguest_init(void)
> >        add_preferred_console("hvc", 0, NULL);
> >
> >        /* Register our very early console. */
> > -       virtio_cons_early_init(early_put_chars);
> > +//     virtio_cons_early_init(early_put_chars);
> 
> What's this hunk here?

Ugly workaround for console until hvc_console breakage gets reverted.

Left in by mistake :)

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3 of 5] virtio: support unlocked queue kick
  2011-11-03  7:52   ` Christoph Hellwig
@ 2011-11-04 10:09       ` Stefan Hajnoczi
  2011-11-04 10:36       ` Rusty Russell
  1 sibling, 0 replies; 60+ messages in thread
From: Stefan Hajnoczi @ 2011-11-04 10:09 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, linux-kernel, kvm, virtualization, Christoph Hellwig

On Thu, Nov 3, 2011 at 7:52 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Nov 03, 2011 at 06:12:51PM +1030, Rusty Russell wrote:
>> Based on patch by Christoph for virtio_blk speedup:
>
> Please credit it to Stefan - he also sent a pointer to his original
> version in reply to the previous thread.

Thank you Christoph.  Here is the pointer to the original mailing list thread:

"Here is the patch:
https://github.com/stefanha/linux/commit/a6d06644e3a58e57a774e77d7dc34c4a5a2e7496

Or as an email if you want to track it down in your inbox:
http://www.spinics.net/lists/linux-virtualization/msg14616.html"

Stefan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3 of 5] virtio: support unlocked queue kick
  2011-11-03  7:52   ` Christoph Hellwig
  2011-11-04 10:09       ` Stefan Hajnoczi
@ 2011-11-04 10:36       ` Rusty Russell
  1 sibling, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2011-11-04 10:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mst, Christoph Hellwig, virtualization, kvm, linux-kernel

On Thu, 3 Nov 2011 03:52:11 -0400, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Nov 03, 2011 at 06:12:51PM +1030, Rusty Russell wrote:
> > Based on patch by Christoph for virtio_blk speedup:
> 
> Please credit it to Stefan - he also sent a pointer to his original
> version in reply to the previous thread.
> 
> Also shouldn't virtqueue_kick have kerneldoc comments?

Yep, and I've credited it properly.

> I also noticed that you documented the functions both here and in
> the first patch in the headers.  At least historically the kerneldoc
> tools didn't parse comments at declarations, but only at the function
> definitions.  Did you check these actually get picked up?

Gah, I'd forgotten that the kernel tendency is to put the interface
documentation next to the implementation.

Personally, I think extracting it is insane, but I've moved them.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-03  7:42   ` Rusty Russell
@ 2011-11-13 21:03     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2011-11-13 21:03 UTC (permalink / raw)
  To: Rusty Russell; +Cc: mst, Christoph Hellwig, linux-kernel, kvm, virtualization

On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> A virtio driver does virtqueue_add_buf() multiple times before finally
> calling virtqueue_kick(); previously we only exposed the added buffers
> in the virtqueue_kick() call.  This means we don't need a memory
> barrier in virtqueue_add_buf(), but it reduces concurrency as the
> device (ie. host) can't see the buffers until the kick.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

In the past I played with a patch like this, but I didn't see a
performance gain either way. Do you see any gain?

I'm a bit concerned that with this patch, a buggy driver that
adds more than 2^16 descriptors without a kick
would seem to work sometimes. Let's add WARN_ON(vq->num_added > (1 << 16))?

> ---
>  drivers/virtio/virtio_ring.c |   37 ++++++++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -227,9 +227,15 @@ add_head:
>  
>  	/* Put entry in available array (but don't update avail->idx until they
>  	 * do sync). */
> -	avail = ((vq->vring.avail->idx + vq->num_added++) & (vq->vring.num-1));
> +	avail = (vq->vring.avail->idx & (vq->vring.num-1));
>  	vq->vring.avail->ring[avail] = head;
>  
> +	/* Descriptors and available array need to be set before we expose the
> +	 * new available array entries. */
> +	virtio_wmb();
> +	vq->vring.avail->idx++;
> +	vq->num_added++;
> +
>  	pr_debug("Added buffer head %i to %p\n", head, vq);
>  	END_USE(vq);
>  
> @@ -248,13 +254,10 @@ bool virtqueue_kick_prepare(struct virtq
>  	 * new available array entries. */
>  	virtio_wmb();
>  
> -	old = vq->vring.avail->idx;
> -	new = vq->vring.avail->idx = old + vq->num_added;
> +	old = vq->vring.avail->idx - vq->num_added;
> +	new = vq->vring.avail->idx;
>  	vq->num_added = 0;
>  
> -	/* Need to update avail index before checking if we should notify */
> -	virtio_mb();
> -
>  	if (vq->event) {
>  		needs_kick = vring_need_event(vring_avail_event(&vq->vring),
>  					      new, old);
> 
> 
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-13 21:03     ` Michael S. Tsirkin
  (?)
@ 2011-11-14  0:43       ` Rusty Russell
  -1 siblings, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2011-11-14  0:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: mst, Christoph Hellwig, linux-kernel, kvm, virtualization

On Sun, 13 Nov 2011 23:03:14 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > A virtio driver does virtqueue_add_buf() multiple times before finally
> > calling virtqueue_kick(); previously we only exposed the added buffers
> > in the virtqueue_kick() call.  This means we don't need a memory
> > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > device (ie. host) can't see the buffers until the kick.
> > 
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> In the past I played with a patch like this, but I didn't see a
> performance gain either way. Do you see any gain?

No, but I haven't run it on real hardware.  lguest may see a win with
this in theory, since the virtqueue processing is fully async, so I'll
run some tests.

> I'm a bit concerned that with this patch, a buggy driver that
> adds more than 2^16 descriptors without a kick
> would seem to work sometimes. Let's add WARN_ON(vq->num_added > (1 << 16))?

Hmm, I guess it could wait for the add to fail before doing a kick, but
no one does that at the moment, so I've added a slight variant:

	WARN_ON(vq->num_added > vq->vring.num);

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-13 21:03     ` Michael S. Tsirkin
  (?)
  (?)
@ 2011-11-14  6:56     ` Michael S. Tsirkin
  2011-11-16  0:21       ` Rusty Russell
  -1 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2011-11-14  6:56 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization

On Sun, Nov 13, 2011 at 11:03:13PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > A virtio driver does virtqueue_add_buf() multiple times before finally
> > calling virtqueue_kick(); previously we only exposed the added buffers
> > in the virtqueue_kick() call.  This means we don't need a memory
> > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > device (ie. host) can't see the buffers until the kick.
> > 
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> In the past I played with a patch like this, but I didn't see a
> performance gain either way. Do you see any gain?
> 
> I'm a bit concerned that with this patch, a buggy driver that
> adds more than 2^16 descriptors without a kick
> would seem to work sometimes. Let's add WARN_ON(vq->num_added > (1 << 16))?

Thinking about it more - it might be tricky for drivers to ensure
this. add_buf() used to fail when the vq is full, but now the driver
might do a get between add and notify:
	lock
	add_buf * N
	prep
	unlock
	lock
	get_buf * N
	unlock
	lock
	add_buf
	prep
	unlock
	notify

and since add was followed by get, this doesn't fail.

So the right thing to do, I think, is to either ignore the indexes and
assume a kick is needed, something like:
if (vq->num_added >= (1 << 15))
	needs_kick = true;
(note: maybe it's 1<<16, and maybe >, but 1<<15 is plenty anyway)

Or alternatively, fail add when num_added is too large.

> > ---
> >  drivers/virtio/virtio_ring.c |   37 ++++++++++++++++++++++---------------
> >  1 file changed, 22 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -227,9 +227,15 @@ add_head:
> >  
> >  	/* Put entry in available array (but don't update avail->idx until they
> >  	 * do sync). */
> > -	avail = ((vq->vring.avail->idx + vq->num_added++) & (vq->vring.num-1));
> > +	avail = (vq->vring.avail->idx & (vq->vring.num-1));
> >  	vq->vring.avail->ring[avail] = head;
> >  
> > +	/* Descriptors and available array need to be set before we expose the
> > +	 * new available array entries. */
> > +	virtio_wmb();
> > +	vq->vring.avail->idx++;
> > +	vq->num_added++;
> > +
> >  	pr_debug("Added buffer head %i to %p\n", head, vq);
> >  	END_USE(vq);
> >  
> > @@ -248,13 +254,10 @@ bool virtqueue_kick_prepare(struct virtq
> >  	 * new available array entries. */
> >  	virtio_wmb();
> >  
> > -	old = vq->vring.avail->idx;
> > -	new = vq->vring.avail->idx = old + vq->num_added;
> > +	old = vq->vring.avail->idx - vq->num_added;
> > +	new = vq->vring.avail->idx;
> >  	vq->num_added = 0;
> >  
> > -	/* Need to update avail index before checking if we should notify */
> > -	virtio_mb();
> > -
> >  	if (vq->event) {
> >  		needs_kick = vring_need_event(vring_avail_event(&vq->vring),
> >  					      new, old);
> > 
> > 
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-14  6:56     ` Michael S. Tsirkin
@ 2011-11-16  0:21       ` Rusty Russell
  2011-11-16  7:18         ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Rusty Russell @ 2011-11-16  0:21 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization

On Mon, 14 Nov 2011 08:56:06 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Sun, Nov 13, 2011 at 11:03:13PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > > A virtio driver does virtqueue_add_buf() multiple times before finally
> > > calling virtqueue_kick(); previously we only exposed the added buffers
> > > in the virtqueue_kick() call.  This means we don't need a memory
> > > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > > device (ie. host) can't see the buffers until the kick.
> > > 
> > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > 
> > In the past I played with a patch like this, but I didn't see a
> > performance gain either way. Do you see any gain?
> > 
> > I'm a bit concerned that with this patch, a buggy driver that
> > adds more than 2^16 descriptors without a kick
> > would seem to work sometimes. Let's add WARN_ON(vq->num_added > (1 << 16))?
> 
> Thinking about it more - it might be tricky for drivers
> to ensure this. add used to fail when vq is full, but now
> driver might do get between add and notify:
> 	lock
> 	add_buf * N
> 	prep
> 	unlock
> 	lock
> 	get_buf * N
> 	unlock
> 	lock
> 	add_buf
> 	prep
> 	unlock
> 	notify
> 
> and since add was followed by get, this doesn't fail.

Right, the driver could, in theory, do:
        add_buf()
        if (!get_buf())
                notify()

But we don't allow that at the moment in our API: we insist on a notify
occasionally.  No one does this at the moment, so a WARN_ON is correct.

If you're just add_buf() without the get_buf() then add_buf() will fail
already.

Here's my current variant:

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -245,9 +245,19 @@ add_head:
 
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
-	avail = ((vq->vring.avail->idx + vq->num_added++) & (vq->vring.num-1));
+	avail = (vq->vring.avail->idx & (vq->vring.num-1));
 	vq->vring.avail->ring[avail] = head;
 
+	/* Descriptors and available array need to be set before we expose the
+	 * new available array entries. */
+	virtio_wmb();
+	vq->vring.avail->idx++;
+	vq->num_added++;
+
+	/* If you haven't kicked in this long, you're probably doing something
+	 * wrong. */
+	WARN_ON(vq->num_added > vq->vring.num);
+
 	pr_debug("Added buffer head %i to %p\n", head, vq);
 	END_USE(vq);
 
It's hard to write a useful WARN_ON() for the "you should kick more
regularly" case (we could take timestamps if DEBUG is defined, I guess),
so let's leave this until someone actually trips it.

Thanks,
Rusty.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-16  0:21       ` Rusty Russell
@ 2011-11-16  7:18         ` Michael S. Tsirkin
  2011-11-21  1:48           ` Rusty Russell
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2011-11-16  7:18 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization

On Wed, Nov 16, 2011 at 10:51:26AM +1030, Rusty Russell wrote:
> On Mon, 14 Nov 2011 08:56:06 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Sun, Nov 13, 2011 at 11:03:13PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > > > A virtio driver does virtqueue_add_buf() multiple times before finally
> > > > calling virtqueue_kick(); previously we only exposed the added buffers
> > > > in the virtqueue_kick() call.  This means we don't need a memory
> > > > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > > > device (ie. host) can't see the buffers until the kick.
> > > > 
> > > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > > 
> > > In the past I played with a patch like this, but I didn't see a
> > > performance gain either way. Do you see any gain?
> > > 
> > > I'm a bit concerned that with this patch, a buggy driver that
> > > adds more than 2^16 descriptors without a kick
> > > would seem to work sometimes. Let's add WARN_ON(vq->num_added > (1 << 16))?
> > 
> > Thinking about it more - it might be tricky for drivers
> > to ensure this. add used to fail when vq is full, but now
> > driver might do get between add and notify:
> > 	lock
> > 	add_buf * N
> > 	prep
> > 	unlock
> > 	lock
> > 	get_buf * N
> > 	unlock
> > 	lock
> > 	add_buf
> > 	prep
> > 	unlock
> > 	notify
> > 
> > and since add was followed by get, this doesn't fail.
> 
> Right, the driver could, in theory, do:
>         add_buf()
>         if (!get_buf())
>                 notify()
> 
> But we don't allow that at the moment in our API: we insist on a notify
> occasionally.  Noone does this at the moment, so a WARN_ON is correct.
> 
> If you're just add_buf() without the get_buf() then add_buf() will fail
> already.
> 
> Here's my current variant:
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -245,9 +245,19 @@ add_head:
>  
>  	/* Put entry in available array (but don't update avail->idx until they
>  	 * do sync). */
> -	avail = ((vq->vring.avail->idx + vq->num_added++) & (vq->vring.num-1));
> +	avail = (vq->vring.avail->idx & (vq->vring.num-1));
>  	vq->vring.avail->ring[avail] = head;
>  
> +	/* Descriptors and available array need to be set before we expose the
> +	 * new available array entries. */
> +	virtio_wmb();
> +	vq->vring.avail->idx++;
> +	vq->num_added++;
> +
> +	/* If you haven't kicked in this long, you're probably doing something
> +	 * wrong. */
> +	WARN_ON(vq->num_added > vq->vring.num);
> +
>  	pr_debug("Added buffer head %i to %p\n", head, vq);
>  	END_USE(vq);
>  
> It's hard to write a useful WARN_ON() for the "you should kick more
> regularly" case (we could take timestamps if DEBUG is defined, I guess),
> so let's leave this until someone actually trips it.
> 
> Thanks,
> Rusty.

My unlocked kick patches will trip this warning: they make
virtio-net do add + get without kick.

I think block with unlocked kick can trip it too:
add, the lock is dropped, and then an interrupt can do a get.

We also don't need a kick every num entries - every 2^15 is enough.
Why don't we do this at start of add_buf:
if (vq->num_added >= 0x7fff)
	return -ENOSPC;

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-16  7:18         ` Michael S. Tsirkin
@ 2011-11-21  1:48           ` Rusty Russell
  2011-11-21 11:57             ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Rusty Russell @ 2011-11-21  1:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization

On Wed, 16 Nov 2011 09:18:38 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> My unlocked kick patches will trip this warning: they make
> virtio-net do add + get without kick.

Heh, it's a good sign if they do, since that means you're running really
well :)

> I think block with unlocked kick can trip it too:
> add, lock is dropped and then an interrupt can get.
> 
> We also don't need a kick each num - each 2^15 is enough.
> Why don't we do this at start of add_buf:
> if (vq->num_added >= 0x7fff)
> 	return -ENOSPC;

The warning was there in case a driver is never doing a kick, and
getting away with it (mostly) because the device is polling.  Let's not
penalize good drivers to catch bad ones.

How about we do this properly, like so:

From: Rusty Russell <rusty@rustcorp.com.au>
Subject: virtio: add debugging if driver doesn't kick.

Under the existing #ifdef DEBUG, check that they don't have more than
1/10 of a second between an add_buf() and a
virtqueue_notify()/virtqueue_kick_prepare() call.

We could get false positives on a really busy system, but good for
development.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_ring.c |   31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -22,6 +22,7 @@
 #include <linux/device.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/hrtimer.h>
 
 /* virtio guest is communicating with a virtual "device" that actually runs on
  * a host processor.  Memory barriers are used to control SMP effects. */
@@ -102,6 +103,10 @@ struct vring_virtqueue
 #ifdef DEBUG
 	/* They're supposed to lock for us. */
 	unsigned int in_use;
+
+	/* Figure out if their kicks are too delayed. */
+	bool last_add_time_valid;
+	ktime_t last_add_time;
 #endif
 
 	/* Tokens for callbacks. */
@@ -192,6 +197,19 @@ int virtqueue_add_buf(struct virtqueue *
 
 	BUG_ON(data == NULL);
 
+#ifdef DEBUG
+	{
+		ktime_t now = ktime_get();
+
+		/* No kick or get, with .1 second between?  Warn. */
+		if (vq->last_add_time_valid)
+			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+					    > 100);
+		vq->last_add_time = now;
+		vq->last_add_time_valid = true;
+	}
+#endif
+
 	/* If the host supports indirect descriptor tables, and we have multiple
 	 * buffers, then go indirect. FIXME: tune this threshold */
 	if (vq->indirect && (out + in) > 1 && vq->num_free) {
@@ -291,6 +309,14 @@ bool virtqueue_kick_prepare(struct virtq
 	new = vq->vring.avail->idx;
 	vq->num_added = 0;
 
+#ifdef DEBUG
+	if (vq->last_add_time_valid) {
+		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
+					      vq->last_add_time)) > 100);
+	}
+	vq->last_add_time_valid = false;
+#endif
+
 	if (vq->event) {
 		needs_kick = vring_need_event(vring_avail_event(&vq->vring),
 					      new, old);
@@ -428,6 +454,10 @@ void *virtqueue_get_buf(struct virtqueue
 		virtio_mb();
 	}
 
+#ifdef DEBUG
+	vq->last_add_time_valid = false;
+#endif
+
 	END_USE(vq);
 	return ret;
 }
@@ -611,6 +641,7 @@ struct virtqueue *vring_new_virtqueue(un
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 #ifdef DEBUG
 	vq->in_use = false;
+	vq->last_add_time_valid = false;
 #endif
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-21  1:48           ` Rusty Russell
@ 2011-11-21 11:57             ` Michael S. Tsirkin
  2011-11-22  0:33               ` Rusty Russell
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2011-11-21 11:57 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization

On Mon, Nov 21, 2011 at 12:18:45PM +1030, Rusty Russell wrote:
> On Wed, 16 Nov 2011 09:18:38 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > My unlocked kick patches will trip this warning: they make
> > virtio-net do add + get without kick.
> 
> Heh, it's a good sign if they do, since that means you're running really
> well :)

They don't, in fact, in my testing :(. But I think they can with luck.

> > I think block with unlocked kick can trip it too:
> > add, lock is dropped and then an interrupt can get.
> > 
> > We also don't need a kick each num - each 2^15 is enough.
> > Why don't we do this at start of add_buf:
> > if (vq->num_added >= 0x7fff)
> > 	return -ENOSPC;
> 
> The warning was there in case a driver is never doing a kick, and
> getting away with it (mostly) because the device is polling.  Let's not
> penalize good drivers to catch bad ones.
> 
> How about we do this properly, like so:

Absolutely. But I think we also need to handle num_added
overflow of a 15-bit counter, no? Otherwise the
vring_need_event logic might give us false negatives...
I'm guessing we can just assume we need a kick in that case.

> From: Rusty Russell <rusty@rustcorp.com.au>
> Subject: virtio: add debugging if driver doesn't kick.
> 
> Under the existing #ifdef DEBUG, check that they don't have more than
> 1/10 of a second between an add_buf() and a
> virtqueue_notify()/virtqueue_kick_prepare() call.
> 
> We could get false positives on a really busy system, but good for
> development.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  drivers/virtio/virtio_ring.c |   31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -22,6 +22,7 @@
>  #include <linux/device.h>
>  #include <linux/slab.h>
>  #include <linux/module.h>
> +#include <linux/hrtimer.h>
>  
>  /* virtio guest is communicating with a virtual "device" that actually runs on
>   * a host processor.  Memory barriers are used to control SMP effects. */
> @@ -102,6 +103,10 @@ struct vring_virtqueue
>  #ifdef DEBUG
>  	/* They're supposed to lock for us. */
>  	unsigned int in_use;
> +
> +	/* Figure out if their kicks are too delayed. */
> +	bool last_add_time_valid;
> +	ktime_t last_add_time;
>  #endif
>  
>  	/* Tokens for callbacks. */
> @@ -192,6 +197,19 @@ int virtqueue_add_buf(struct virtqueue *
>  
>  	BUG_ON(data == NULL);
>  
> +#ifdef DEBUG
> +	{
> +		ktime_t now = ktime_get();
> +
> +		/* No kick or get, with .1 second between?  Warn. */
> +		if (vq->last_add_time_valid)
> +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> +					    > 100);
> +		vq->last_add_time = now;
> +		vq->last_add_time_valid = true;
> +	}
> +#endif
> +
>  	/* If the host supports indirect descriptor tables, and we have multiple
>  	 * buffers, then go indirect. FIXME: tune this threshold */
>  	if (vq->indirect && (out + in) > 1 && vq->num_free) {
> @@ -291,6 +309,14 @@ bool virtqueue_kick_prepare(struct virtq
>  	new = vq->vring.avail->idx;
>  	vq->num_added = 0;
>  
> +#ifdef DEBUG
> +	if (vq->last_add_time_valid) {
> +		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> +					      vq->last_add_time)) > 100);
> +	}
> +	vq->last_add_time_valid = false;
> +#endif
> +
>  	if (vq->event) {
>  		needs_kick = vring_need_event(vring_avail_event(&vq->vring),
>  					      new, old);
> @@ -428,6 +454,10 @@ void *virtqueue_get_buf(struct virtqueue
>  		virtio_mb();
>  	}
>  
> +#ifdef DEBUG
> +	vq->last_add_time_valid = false;
> +#endif
> +
>  	END_USE(vq);
>  	return ret;
>  }
> @@ -611,6 +641,7 @@ struct virtqueue *vring_new_virtqueue(un
>  	list_add_tail(&vq->vq.list, &vdev->vqs);
>  #ifdef DEBUG
>  	vq->in_use = false;
> +	vq->last_add_time_valid = false;
>  #endif
>  
>  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-21 11:57             ` Michael S. Tsirkin
@ 2011-11-22  0:33               ` Rusty Russell
  2011-11-22  6:29                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Rusty Russell @ 2011-11-22  0:33 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization

On Mon, 21 Nov 2011 13:57:04 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Nov 21, 2011 at 12:18:45PM +1030, Rusty Russell wrote:
> > On Wed, 16 Nov 2011 09:18:38 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > My unlocked kick patches will trip this warning: they make
> > > virtio-net do add + get without kick.
> > 
> > Heh, it's a good sign if they do, since that means you're running really
> > well :)
> 
> They don't in fact, in my testing :(. But I think they can with luck.
> 
> > > I think block with unlocked kick can trip it too:
> > > add, lock is dropped and then an interrupt can get the buffer.
> > > 
> > > We also don't need a kick every num buffers - every 2^15 is enough.
> > > Why don't we do this at start of add_buf:
> > > if (vq->num_added >= 0x7fff)
> > > 	return -ENOSPC;
> > 
> > The warning was there in case a driver is never doing a kick, and
> > getting away with it (mostly) because the device is polling.  Let's not
> > penalize good drivers to catch bad ones.
> > 
> > How about we do this properly, like so:
> 
> Absolutely. But I think we also need to handle num_added
> overflow of a 15 bit counter, no? Otherwise the
> vring_need_event logic might give us false negatives ....
> I'm guessing we can just assume we need a kick in that case.

You're right.  Thank you.  My immediate reaction of "make it an unsigned
long" doesn't work.

Here's the diff to what I posted before:

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -254,9 +254,10 @@ add_head:
 	vq->vring.avail->idx++;
 	vq->num_added++;
 
-	/* If you haven't kicked in this long, you're probably doing something
-	 * wrong. */
-	WARN_ON(vq->num_added > vq->vring.num);
+	/* This is very unlikely, but theoretically possible.  Kick
+	 * just in case. */
+	if (unlikely(vq->num_added == 65535))
+		virtqueue_kick(_vq);
 
 	pr_debug("Added buffer head %i to %p\n", head, vq);
 	END_USE(vq);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-22  0:33               ` Rusty Russell
@ 2011-11-22  6:29                 ` Michael S. Tsirkin
  2011-11-23  1:19                   ` Rusty Russell
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2011-11-22  6:29 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization

On Tue, Nov 22, 2011 at 11:03:04AM +1030, Rusty Russell wrote:
> On Mon, 21 Nov 2011 13:57:04 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Nov 21, 2011 at 12:18:45PM +1030, Rusty Russell wrote:
> > > On Wed, 16 Nov 2011 09:18:38 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > My unlocked kick patches will trip this warning: they make
> > > > virtio-net do add + get without kick.
> > > 
> > > Heh, it's a good sign if they do, since that means you're running really
> > > well :)
> > 
> > They don't in fact, in my testing :(. But I think they can with luck.
> > 
> > > > I think block with unlocked kick can trip it too:
> > > > add, lock is dropped and then an interrupt can get the buffer.
> > > > 
> > > > We also don't need a kick every num buffers - every 2^15 is enough.
> > > > Why don't we do this at start of add_buf:
> > > > if (vq->num_added >= 0x7fff)
> > > > 	return -ENOSPC;
> > > 
> > > The warning was there in case a driver is never doing a kick, and
> > > getting away with it (mostly) because the device is polling.  Let's not
> > > penalize good drivers to catch bad ones.
> > > 
> > > How about we do this properly, like so:
> > 
> > Absolutely. But I think we also need to handle num_added
> > overflow of a 15 bit counter, no? Otherwise the
> > vring_need_event logic might give us false negatives ....
> > I'm guessing we can just assume we need a kick in that case.
> 
> You're right.  Thank you.  My immediate reaction of "make it an unsigned
> long" doesn't work.
> 
> Here's the diff to what I posted before:
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -254,9 +254,10 @@ add_head:
>  	vq->vring.avail->idx++;
>  	vq->num_added++;
>  
> -	/* If you haven't kicked in this long, you're probably doing something
> -	 * wrong. */
> -	WARN_ON(vq->num_added > vq->vring.num);
> +	/* This is very unlikely, but theoretically possible.  Kick
> +	 * just in case. */
> +	if (unlikely(vq->num_added == 65535))

This is 0xffff but why use the decimal notation?

> +		virtqueue_kick(_vq);
>  
>  	pr_debug("Added buffer head %i to %p\n", head, vq);
>  	END_USE(vq);

We also still need to reset vq->num_added, right?

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-22  6:29                 ` Michael S. Tsirkin
@ 2011-11-23  1:19                   ` Rusty Russell
  2011-11-23  8:30                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Rusty Russell @ 2011-11-23  1:19 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization

On Tue, 22 Nov 2011 08:29:08 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Nov 22, 2011 at 11:03:04AM +1030, Rusty Russell wrote:
> > -	/* If you haven't kicked in this long, you're probably doing something
> > -	 * wrong. */
> > -	WARN_ON(vq->num_added > vq->vring.num);
> > +	/* This is very unlikely, but theoretically possible.  Kick
> > +	 * just in case. */
> > +	if (unlikely(vq->num_added == 65535))
> 
> This is 0xffff but why use the decimal notation?

Interesting.  Why use hex?  Feels more like binary?

But I've changed it to "(1 << 16) - 1" to be clear.

> > +		virtqueue_kick(_vq);
> >  
> >  	pr_debug("Added buffer head %i to %p\n", head, vq);
> >  	END_USE(vq);
> 
> We also still need to reset vq->num_added, right?

virtqueue_kick does that for us.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
  2011-11-23  1:19                   ` Rusty Russell
@ 2011-11-23  8:30                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2011-11-23  8:30 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Christoph Hellwig, linux-kernel, kvm, virtualization

On Wed, Nov 23, 2011 at 11:49:01AM +1030, Rusty Russell wrote:
> On Tue, 22 Nov 2011 08:29:08 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Nov 22, 2011 at 11:03:04AM +1030, Rusty Russell wrote:
> > > -	/* If you haven't kicked in this long, you're probably doing something
> > > -	 * wrong. */
> > > -	WARN_ON(vq->num_added > vq->vring.num);
> > > +	/* This is very unlikely, but theoretically possible.  Kick
> > > +	 * just in case. */
> > > +	if (unlikely(vq->num_added == 65535))
> > 
> > This is 0xffff but why use the decimal notation?
> 
> Interesting.  Why use hex?  Feels more like binary?

Just easier to see it's the largest 16 bit number.

> But I've changed it to "(1 << 16) - 1" to be clear.

That's even better.

> > > +		virtqueue_kick(_vq);
> > >  
> > >  	pr_debug("Added buffer head %i to %p\n", head, vq);
> > >  	END_USE(vq);
> > 
> > We also still need to reset vq->num_added, right?
> 
> virtqueue_kick does that for us.
> 
> Cheers,
> Rusty.

Right.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
  2011-11-03  7:42   ` Rusty Russell
@ 2012-07-01  9:20     ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2012-07-01  9:20 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Rafael Aquini, linux-kernel, kvm, virtualization

On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> A virtio driver does virtqueue_add_buf() multiple times before finally
> calling virtqueue_kick(); previously we only exposed the added buffers
> in the virtqueue_kick() call.  This means we don't need a memory
> barrier in virtqueue_add_buf(), but it reduces concurrency as the
> device (ie. host) can't see the buffers until the kick.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Looking at recent mm compaction patches made me look at locking
in balloon closely. And I noticed the referenced patch (commit
ee7cd8981e15bcb365fc762afe3fc47b8242f630 upstream) interacts strangely
with virtio balloon; balloon currently does:

static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
{
        struct scatterlist sg;

        sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);

        init_completion(&vb->acked);

        /* We should always be able to add one buffer to an empty queue. */
        if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
                BUG();
        virtqueue_kick(vq);

        /* When host has read buffer, this completes via balloon_ack */
        wait_for_completion(&vb->acked);
}


While vq callback does:

static void balloon_ack(struct virtqueue *vq)
{
        struct virtio_balloon *vb;
        unsigned int len;

        vb = virtqueue_get_buf(vq, &len);
        if (vb)
                complete(&vb->acked);
}


So virtqueue_get_buf might now run concurrently with virtqueue_kick.
I audited both and this seems safe in practice but I think
we need to either declare this legal at the API level
or add locking in driver.

Further, is there a guarantee that we never get
spurious callbacks? We currently check ring not empty
but esp for non shared MSI this might not be needed.
If a spurious callback triggers, virtqueue_get_buf can run
concurrently with virtqueue_add_buf which is known to be racy.
Again I think this is currently safe as no spurious callbacks in
practice but should we guarantee no spurious callbacks at the API level
or add locking in driver?

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
  2012-07-01  9:20     ` Michael S. Tsirkin
@ 2012-07-02  1:05       ` Rusty Russell
  0 siblings, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2012-07-02  1:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Rafael Aquini, linux-kernel, kvm, virtualization

On Sun, 1 Jul 2012 12:20:51 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > A virtio driver does virtqueue_add_buf() multiple times before finally
> > calling virtqueue_kick(); previously we only exposed the added buffers
> > in the virtqueue_kick() call.  This means we don't need a memory
> > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > device (ie. host) can't see the buffers until the kick.
> > 
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> Looking at recent mm compaction patches made me look at locking
> in balloon closely. And I noticed the referenced patch (commit
> ee7cd8981e15bcb365fc762afe3fc47b8242f630 upstream) interacts strangely
> with virtio balloon; balloon currently does:
> 
> static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> {
>         struct scatterlist sg;
> 
>         sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> 
>         init_completion(&vb->acked);
> 
>         /* We should always be able to add one buffer to an empty queue. */
>         if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
>                 BUG();
>         virtqueue_kick(vq);
> 
>         /* When host has read buffer, this completes via balloon_ack */
>         wait_for_completion(&vb->acked);
> }
> 
> 
> While vq callback does:
> 
> static void balloon_ack(struct virtqueue *vq)
> {
>         struct virtio_balloon *vb;
>         unsigned int len;
> 
>         vb = virtqueue_get_buf(vq, &len);
>         if (vb)
>                 complete(&vb->acked);
> }
> 
> 
> So virtqueue_get_buf might now run concurrently with virtqueue_kick.
> I audited both and this seems safe in practice but I think

Good spotting!

Agreed.  Because there's only add_buf, we get away with it: the add_buf
must be almost finished by the time get_buf runs because the device has
seen the buffer.

> we need to either declare this legal at the API level
> or add locking in driver.

I wonder if we should just lock in the balloon driver, rather than
document this corner case and set a bad example.  Are there other
drivers which take the same shortcut?

> Further, is there a guarantee that we never get
> spurious callbacks?  We currently check ring not empty
> but esp for non shared MSI this might not be needed.

Yes, I think this saves us.  A spurious interrupt won't trigger
a spurious callback.

> If a spurious callback triggers, virtqueue_get_buf can run
> concurrently with virtqueue_add_buf which is known to be racy.
> Again I think this is currently safe as no spurious callbacks in
> practice but should we guarantee no spurious callbacks at the API level
> or add locking in driver?

I think we should guarantee it, but is there a hole in the current
implementation?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
  2012-07-02  1:05       ` Rusty Russell
@ 2012-07-02  7:25         ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2012-07-02  7:25 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Rafael Aquini, linux-kernel, kvm, virtualization

On Mon, Jul 02, 2012 at 10:35:47AM +0930, Rusty Russell wrote:
> On Sun, 1 Jul 2012 12:20:51 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > > A virtio driver does virtqueue_add_buf() multiple times before finally
> > > calling virtqueue_kick(); previously we only exposed the added buffers
> > > in the virtqueue_kick() call.  This means we don't need a memory
> > > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > > device (ie. host) can't see the buffers until the kick.
> > > 
> > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > 
> > Looking at recent mm compaction patches made me look at locking
> > in balloon closely. And I noticed the referenced patch (commit
> > ee7cd8981e15bcb365fc762afe3fc47b8242f630 upstream) interacts strangely
> > with virtio balloon; balloon currently does:
> > 
> > static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> > {
> >         struct scatterlist sg;
> > 
> >         sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > 
> >         init_completion(&vb->acked);
> > 
> >         /* We should always be able to add one buffer to an empty queue. */
> >         if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
> >                 BUG();
> >         virtqueue_kick(vq);
> > 
> >         /* When host has read buffer, this completes via balloon_ack */
> >         wait_for_completion(&vb->acked);
> > }
> > 
> > 
> > While vq callback does:
> > 
> > static void balloon_ack(struct virtqueue *vq)
> > {
> >         struct virtio_balloon *vb;
> >         unsigned int len;
> > 
> >         vb = virtqueue_get_buf(vq, &len);
> >         if (vb)
> >                 complete(&vb->acked);
> > }
> > 
> > 
> > So virtqueue_get_buf might now run concurrently with virtqueue_kick.
> > I audited both and this seems safe in practice but I think
> 
> Good spotting!
> 
> Agreed.  Because there's only add_buf, we get away with it: the add_buf
> must be almost finished by the time get_buf runs because the device has
> seen the buffer.
> 
> > we need to either declare this legal at the API level
> > or add locking in driver.
> 
> I wonder if we should just lock in the balloon driver, rather than
> document this corner case and set a bad example.

We'll need to replace &vb->acked with a waitqueue
and do get_buf from the same thread.
But I note that stats_request has the same issue.
Let's see if we can fix it.

> Are there other
> drivers which take the same shortcut?

Not that I know.

> > Further, is there a guarantee that we never get
> > spurious callbacks?  We currently check ring not empty
> > but esp for non shared MSI this might not be needed.
> 
> Yes, I think this saves us.  A spurious interrupt won't trigger
> a spurious callback.
> 
> > If a spurious callback triggers, virtqueue_get_buf can run
> > concurrently with virtqueue_add_buf which is known to be racy.
> > Again I think this is currently safe as no spurious callbacks in
> > practice but should we guarantee no spurious callbacks at the API level
> > or add locking in driver?
> 
> I think we should guarantee it, but is there a hole in the current
> implementation?
> 
> Thanks,
> Rusty.

Could be. The check for ring empty looks somewhat suspicious.
It might be expensive to make it 100% robust - that check was
intended as an optimization for shared interrupts.
With per-vq interrupts we IMO do not need the check.
If we add locking in balloon I think there's no need
to guarantee no spurious interrupts.


-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH RFC] virtio-balloon: fix add/get API use
  2012-07-02  1:05       ` Rusty Russell
@ 2012-07-02  7:33         ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2012-07-02  7:33 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Rafael Aquini, linux-kernel, kvm, virtualization

In virtio balloon virtqueue_get_buf might now run concurrently with
virtqueue_kick.  I audited both and this seems safe in practice but
this is not guaranteed by the API.
Additionally, a spurious interrupt might in theory make
virtqueue_get_buf run in parallel with virtqueue_add_buf, which is racy.

While we might try to protect against spurious callbacks it's
easier to fix the driver: balloon seems to be the only one
(mis)using the API like this, so let's just fix balloon.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

Warning: completely untested.

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bfbc15c..a26eb4f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -47,7 +47,7 @@ struct virtio_balloon
 	struct task_struct *thread;
 
 	/* Waiting for host to ack the pages we released. */
-	struct completion acked;
+	wait_queue_head_t acked;
 
 	/* Number of balloon pages we've told the Host we're not using. */
 	unsigned int num_pages;
@@ -89,29 +89,26 @@ static struct page *balloon_pfn_to_page(u32 pfn)
 
 static void balloon_ack(struct virtqueue *vq)
 {
-	struct virtio_balloon *vb;
-	unsigned int len;
+	struct virtio_balloon *vb = vq->vdev->priv;
 
-	vb = virtqueue_get_buf(vq, &len);
-	if (vb)
-		complete(&vb->acked);
+	wake_up(&vb->acked);
 }
 
 static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
 {
 	struct scatterlist sg;
+	unsigned int len;
+	void *buf;
 
 	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
 
-	init_completion(&vb->acked);
-
 	/* We should always be able to add one buffer to an empty queue. */
 	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
 		BUG();
 	virtqueue_kick(vq);
 
 	/* When host has read buffer, this completes via balloon_ack */
-	wait_for_completion(&vb->acked);
+	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
 }
 
 static void set_page_pfns(u32 pfns[], struct page *page)
@@ -231,12 +228,8 @@ static void update_balloon_stats(struct virtio_balloon *vb)
  */
 static void stats_request(struct virtqueue *vq)
 {
-	struct virtio_balloon *vb;
-	unsigned int len;
+	struct virtio_balloon *vb = vq->vdev->priv;
 
-	vb = virtqueue_get_buf(vq, &len);
-	if (!vb)
-		return;
 	vb->need_stats_update = 1;
 	wake_up(&vb->config_change);
 }
@@ -245,11 +238,14 @@ static void stats_handle_request(struct virtio_balloon *vb)
 {
 	struct virtqueue *vq;
 	struct scatterlist sg;
+	unsigned int len;
 
 	vb->need_stats_update = 0;
 	update_balloon_stats(vb);
 
 	vq = vb->stats_vq;
+	if (!virtqueue_get_buf(vq, &len))
+		return;
 	sg_init_one(&sg, vb->stats, sizeof(vb->stats));
 	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
 		BUG();
@@ -358,6 +354,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	INIT_LIST_HEAD(&vb->pages);
 	vb->num_pages = 0;
 	init_waitqueue_head(&vb->config_change);
+	init_waitqueue_head(&vb->acked);
 	vb->vdev = vdev;
 	vb->need_stats_update = 0;
 

^ permalink raw reply related	[flat|nested] 60+ messages in thread

-	if (!vb)
-		return;
 	vb->need_stats_update = 1;
 	wake_up(&vb->config_change);
 }
@@ -245,11 +238,14 @@ static void stats_handle_request(struct virtio_balloon *vb)
 {
 	struct virtqueue *vq;
 	struct scatterlist sg;
+	unsigned int len;
 
 	vb->need_stats_update = 0;
 	update_balloon_stats(vb);
 
 	vq = vb->stats_vq;
+	if (!virtqueue_get_buf(vq, &len))
+		return;
 	sg_init_one(&sg, vb->stats, sizeof(vb->stats));
 	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
 		BUG();
@@ -358,6 +354,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	INIT_LIST_HEAD(&vb->pages);
 	vb->num_pages = 0;
 	init_waitqueue_head(&vb->config_change);
+	init_waitqueue_head(&vb->acked);
 	vb->vdev = vdev;
 	vb->need_stats_update = 0;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
  2012-07-02  7:25         ` Michael S. Tsirkin
@ 2012-07-02 16:08           ` Rafael Aquini
  -1 siblings, 0 replies; 60+ messages in thread
From: Rafael Aquini @ 2012-07-02 16:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Rusty Russell, linux-kernel, kvm, virtualization

On Mon, Jul 02, 2012 at 10:25:58AM +0300, Michael S. Tsirkin wrote:
> On Mon, Jul 02, 2012 at 10:35:47AM +0930, Rusty Russell wrote:
> > On Sun, 1 Jul 2012 12:20:51 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > > > A virtio driver does virtqueue_add_buf() multiple times before finally
> > > > calling virtqueue_kick(); previously we only exposed the added buffers
> > > > in the virtqueue_kick() call.  This means we don't need a memory
> > > > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > > > device (ie. host) can't see the buffers until the kick.
> > > > 
> > > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > > 
> > > Looking at recent mm compaction patches made me look at locking
> > > in balloon closely. And I noticed the referenced patch (commit
> > > ee7cd8981e15bcb365fc762afe3fc47b8242f630 upstream) interacts strangely
> > > with virtio balloon; balloon currently does:
> > > 
> > > static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> > > {
> > >         struct scatterlist sg;
> > > 
> > >         sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > > 
> > >         init_completion(&vb->acked);
> > > 
> > >         /* We should always be able to add one buffer to an empty queue. */
> > >         if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
> > >                 BUG();
> > >         virtqueue_kick(vq);
> > > 
> > >         /* When host has read buffer, this completes via balloon_ack */
> > >         wait_for_completion(&vb->acked);
> > > }
> > > 
> > > 
> > > While vq callback does:
> > > 
> > > static void balloon_ack(struct virtqueue *vq)
> > > {
> > >         struct virtio_balloon *vb;
> > >         unsigned int len;
> > > 
> > >         vb = virtqueue_get_buf(vq, &len);
> > >         if (vb)
> > >                 complete(&vb->acked);
> > > }
> > > 
> > > 
> > > So virtqueue_get_buf might now run concurrently with virtqueue_kick.
> > > I audited both and this seems safe in practice but I think
> > 
> > Good spotting!
> > 
> > Agreed.  Because there's only add_buf, we get away with it: the add_buf
> > must be almost finished by the time get_buf runs because the device has
> > seen the buffer.
> > 
> > > we need to either declare this legal at the API level
> > > or add locking in driver.
> > 
> > I wonder if we should just lock in the balloon driver, rather than
> > document this corner case and set a bad example.
> 
> We'll need to replace &vb->acked with a waitqueue
> and do get_buf from the same thread.
> But I note that stats_request has the same issue.
> Let's see if we can fix it.
> 
> > Are there other
> > drivers which take the same shortcut?
> 
> Not that I know.
> 
> > > Further, is there a guarantee that we never get
> > > spurious callbacks?  We currently check ring not empty
> > > but esp for non shared MSI this might not be needed.
> > 
> > Yes, I think this saves us.  A spurious interrupt won't trigger
> > a spurious callback.
> > 
> > > If a spurious callback triggers, virtqueue_get_buf can run
> > > concurrently with virtqueue_add_buf which is known to be racy.
> > > Again I think this is currently safe as no spurious callbacks in
> > > practice but should we guarantee no spurious callbacks at the API level
> > > or add locking in driver?
> > 
> > I think we should guarantee it, but is there a hole in the current
> > implementation?
> > 
> > Thanks,
> > Rusty.
> 
> Could be. The check for ring empty looks somewhat suspicious.
> It might be expensive to make it 100% robust - that check was
> intended as an optimization for shared interrupts.
> With per-vq interrupts we IMO do not need the check.
> If we add locking in balloon I think there's no need
> to guarantee no spurious interrupts.
> 

As 'locking in balloon', may I assume the approach I took for the compaction case
is OK and aligned to address these concerns of yours? If not, do not hesitate in
giving me your thoughts, please. I'm respinning a V3 series to address a couple
of extra nitpicks from the compaction standpoint, and I'd love to be able to
address any extra concern you might have on the balloon side of that work.


Thanks!
Rafael.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
  2012-07-02 16:08           ` Rafael Aquini
  (?)
@ 2012-07-03  0:47           ` Rusty Russell
  2012-07-03 16:26             ` Rafael Aquini
                               ` (2 more replies)
  -1 siblings, 3 replies; 60+ messages in thread
From: Rusty Russell @ 2012-07-03  0:47 UTC (permalink / raw)
  To: Rafael Aquini, Michael S. Tsirkin; +Cc: linux-kernel, kvm, virtualization

On Mon, 2 Jul 2012 13:08:19 -0300, Rafael Aquini <aquini@redhat.com> wrote:
> As 'locking in balloon', may I assume the approach I took for the compaction case
> is OK and aligned to address these concerns of yours? If not, do not hesitate in
> giving me your thoughts, please. I'm respinning a V3 series to address a couple
> of extra nitpicks from the compaction standpoint, and I'd love to be able to
> address any extra concern you might have on the balloon side of that work.

It's orthogonal, though looks like they clash textually :(

I'll re-spin MST's patch on top of yours, and include both in my tree,
otherwise linux-next will have to do the merge.  But I'll await your
push before pushing to Linus next merge window.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
  2012-07-03  0:47           ` Rusty Russell
@ 2012-07-03 16:26             ` Rafael Aquini
  2012-07-03 16:26             ` Rafael Aquini
  2012-07-04 10:55               ` Michael S. Tsirkin
  2 siblings, 0 replies; 60+ messages in thread
From: Rafael Aquini @ 2012-07-03 16:26 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Michael S. Tsirkin, linux-kernel, kvm, virtualization

On Tue, Jul 03, 2012 at 10:17:46AM +0930, Rusty Russell wrote:
> On Mon, 2 Jul 2012 13:08:19 -0300, Rafael Aquini <aquini@redhat.com> wrote:
> > As 'locking in balloon', may I assume the approach I took for the compaction case
> > is OK and aligned to address these concerns of yours? If not, do not hesitate in
> > giving me your thoughts, please. I'm respinning a V3 series to address a couple
> > of extra nitpicks from the compaction standpoint, and I'd love to be able to
> > address any extra concern you might have on the balloon side of that work.
> 
> It's orthogonal, though looks like they clash textually :(
> 
> I'll re-spin MST's patch on top of yours, and include both in my tree,
> otherwise linux-next will have to do the merge.  But I'll await your
> push before pushing to Linus next merge window.
>
Thanks, Rusty.

I'll post V3 series quite soon.

Cheers!
Rafael
 
> Thanks,
> Rusty.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH RFC] virtio-balloon: fix add/get API use
  2012-07-02  7:33         ` Michael S. Tsirkin
@ 2012-07-04  3:27           ` Rusty Russell
  -1 siblings, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2012-07-04  3:27 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Rafael Aquini, linux-kernel, kvm, virtualization

On Mon, 2 Jul 2012 10:33:08 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> In virtio balloon virtqueue_get_buf might now run concurrently with
> virtqueue_kick.  I audited both and this seems safe in practice but
> this is not guaranteed by the API.
> Additionally, a spurious interrupt might in theory make
> virtqueue_get_buf run in parallel with virtqueue_add_buf, which is racy.
> 
> While we might try to protect against spurious callbacks it's
> easier to fix the driver: balloon seems to be the only one
> (mis)using the API like this, so let's just fix balloon.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

I was thinking of a spinlock, but this is far more elegant.

And I added an explicit reference to the 'virtio: expose added
descriptors immediately.' commit in your commit msg.

Kudos!
Rusty.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
  2012-07-03  0:47           ` Rusty Russell
@ 2012-07-04 10:55               ` Michael S. Tsirkin
  2012-07-03 16:26             ` Rafael Aquini
  2012-07-04 10:55               ` Michael S. Tsirkin
  2 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2012-07-04 10:55 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Rafael Aquini, linux-kernel, kvm, virtualization

On Tue, Jul 03, 2012 at 10:17:46AM +0930, Rusty Russell wrote:
> On Mon, 2 Jul 2012 13:08:19 -0300, Rafael Aquini <aquini@redhat.com> wrote:
> > As 'locking in balloon', may I assume the approach I took for the compaction case
> > is OK and aligned to address these concerns of yours? If not, do not hesitate in
> > giving me your thoughts, please. I'm respinning a V3 series to address a couple
> > of extra nitpicks from the compaction standpoint, and I'd love to be able to
> > address any extra concern you might have on the balloon side of that work.
> 
> It's orthogonal, though looks like they clash textually :(
> 
> I'll re-spin MST's patch on top of yours, and include both in my tree,
> otherwise linux-next will have to do the merge.  But I'll await your
> push before pushing to Linus next merge window.
> 
> Thanks,
> Rusty.

While theoretical, mine is a bugfix, so it could be 3.5 material, no?

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
  2012-07-02 16:08           ` Rafael Aquini
@ 2012-07-04 10:55             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2012-07-04 10:55 UTC (permalink / raw)
  To: Rafael Aquini; +Cc: Rusty Russell, linux-kernel, kvm, virtualization

On Mon, Jul 02, 2012 at 01:08:19PM -0300, Rafael Aquini wrote:
> On Mon, Jul 02, 2012 at 10:25:58AM +0300, Michael S. Tsirkin wrote:
> > On Mon, Jul 02, 2012 at 10:35:47AM +0930, Rusty Russell wrote:
> > > On Sun, 1 Jul 2012 12:20:51 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > > > > A virtio driver does virtqueue_add_buf() multiple times before finally
> > > > > calling virtqueue_kick(); previously we only exposed the added buffers
> > > > > in the virtqueue_kick() call.  This means we don't need a memory
> > > > > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > > > > device (ie. host) can't see the buffers until the kick.
> > > > > 
> > > > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > > > 
> > > > Looking at recent mm compaction patches made me look at locking
> > > > in balloon closely. And I noticed the referenced patch (commit
> > > > ee7cd8981e15bcb365fc762afe3fc47b8242f630 upstream) interacts strangely
> > > > with virtio balloon; balloon currently does:
> > > > 
> > > > static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> > > > {
> > > >         struct scatterlist sg;
> > > > 
> > > >         sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > > > 
> > > >         init_completion(&vb->acked);
> > > > 
> > > >         /* We should always be able to add one buffer to an empty queue. */
> > > >         if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
> > > >                 BUG();
> > > >         virtqueue_kick(vq);
> > > > 
> > > >         /* When host has read buffer, this completes via balloon_ack */
> > > >         wait_for_completion(&vb->acked);
> > > > }
> > > > 
> > > > 
> > > > While vq callback does:
> > > > 
> > > > static void balloon_ack(struct virtqueue *vq)
> > > > {
> > > >         struct virtio_balloon *vb;
> > > >         unsigned int len;
> > > > 
> > > >         vb = virtqueue_get_buf(vq, &len);
> > > >         if (vb)
> > > >                 complete(&vb->acked);
> > > > }
> > > > 
> > > > 
> > > > So virtqueue_get_buf might now run concurrently with virtqueue_kick.
> > > > I audited both and this seems safe in practice but I think
> > > 
> > > Good spotting!
> > > 
> > > Agreed.  Because there's only add_buf, we get away with it: the add_buf
> > > must be almost finished by the time get_buf runs because the device has
> > > seen the buffer.
> > > 
> > > > we need to either declare this legal at the API level
> > > > or add locking in driver.
> > > 
> > > I wonder if we should just lock in the balloon driver, rather than
> > > document this corner case and set a bad example.
> > 
> > We'll need to replace &vb->acked with a waitqueue
> > and do get_buf from the same thread.
> > But I note that stats_request has the same issue.
> > Let's see if we can fix it.
> > 
> > > Are there other
> > > drivers which take the same shortcut?
> > 
> > Not that I know.
> > 
> > > > Further, is there a guarantee that we never get
> > > > spurious callbacks?  We currently check ring not empty
> > > > but esp for non shared MSI this might not be needed.
> > > 
> > > Yes, I think this saves us.  A spurious interrupt won't trigger
> > > a spurious callback.
> > > 
> > > > If a spurious callback triggers, virtqueue_get_buf can run
> > > > concurrently with virtqueue_add_buf which is known to be racy.
> > > > Again I think this is currently safe as no spurious callbacks in
> > > > practice but should we guarantee no spurious callbacks at the API level
> > > > or add locking in driver?
> > > 
> > > I think we should guarantee it, but is there a hole in the current
> > > implementation?
> > > 
> > > Thanks,
> > > Rusty.
> > 
> > Could be. The check for ring empty looks somewhat suspicious.
> > It might be expensive to make it 100% robust - that check was
> > intended as an optimization for shared interrupts.
> > With per-vq interrupts we IMO do not need the check.
> > If we add locking in balloon I think there's no need
> > to guarantee no spurious interrupts.
> > 
> 
> As 'locking in balloon', may I assume the approach I took for the compaction case
> is OK and aligned to address these concerns of yours?

No, I mean the patch I posted. Not so much locking as moving
get_buf to the thread itself.

> If not, do not hesitate in
> giving me your thoughts, please. I'm respinning a V3 series to address a couple
> of extra nitpicks from the compaction standpoint, and I'd love to be able to
> address any extra concern you might have on the balloon side of that work.
> 
> 
> Thanks!
> Rafael.

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
  2012-07-04 10:55               ` Michael S. Tsirkin
@ 2012-07-08 23:39                 ` Rusty Russell
  -1 siblings, 0 replies; 60+ messages in thread
From: Rusty Russell @ 2012-07-08 23:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Rafael Aquini, linux-kernel, kvm, virtualization

On Wed, 4 Jul 2012 13:55:33 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jul 03, 2012 at 10:17:46AM +0930, Rusty Russell wrote:
> > On Mon, 2 Jul 2012 13:08:19 -0300, Rafael Aquini <aquini@redhat.com> wrote:
> > > As for 'locking in balloon', may I assume the approach I took for the compaction case
> > > is OK and aligned to address these concerns of yours? If not, do not hesitate to
> > > give me your thoughts, please. I'm respinning a V3 series to address a couple
> > > of extra nitpicks from the compaction standpoint, and I'd love to be able to
> > > address any extra concern you might have on the balloon side of that work.
> > 
> > It's orthogonal, though it looks like they clash textually :(
> > 
> > I'll re-spin MST's patch on top of yours and include both in my tree,
> > otherwise linux-next will have to do the merge.  But I'll await your
> > push before pushing to Linus in the next merge window.
> > 
> > Thanks,
> > Rusty.
> 
> While theoretical, mine is a bugfix, so it could be 3.5 material, no?

It's a little thin, but it's a good idea.  I'll try.

Cheers,
Rusty.


end of thread, other threads:[~2012-07-09  0:39 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <patchbomb.1320306168@localhost6.localdomain6>
2011-11-03  7:42 ` [PATCH 1 of 5] virtio: document functions better Rusty Russell
2011-11-03  7:42   ` Rusty Russell
2011-11-03  7:49   ` Christoph Hellwig
2011-11-03  7:49     ` Christoph Hellwig
2011-11-03  7:42 ` [PATCH 2 of 5] virtio: rename virtqueue_add_buf_gfp to virtqueue_add_buf Rusty Russell
2011-11-03  7:42   ` Rusty Russell
2011-11-03  7:50   ` Christoph Hellwig
2011-11-03  7:50     ` Christoph Hellwig
2011-11-03  7:42 ` [PATCH 3 of 5] virtio: support unlocked queue kick Rusty Russell
2011-11-03  7:42   ` Rusty Russell
2011-11-03  7:52   ` Christoph Hellwig
2011-11-03  7:52   ` Christoph Hellwig
2011-11-04 10:09     ` Stefan Hajnoczi
2011-11-04 10:09       ` Stefan Hajnoczi
2011-11-04 10:36     ` Rusty Russell
2011-11-04 10:36       ` Rusty Russell
2011-11-04 10:36       ` Rusty Russell
2011-11-03  7:42 ` [PATCH 4 of 5] virtio: avoid modulus operation Rusty Russell
2011-11-03  7:42   ` Rusty Russell
2011-11-03  7:51   ` Pekka Enberg
2011-11-03 10:18     ` Rusty Russell
2011-11-03 10:18     ` Rusty Russell
2011-11-03  7:51   ` Pekka Enberg
2011-11-03  7:42 ` [PATCH 5 of 5] virtio: expose added descriptors immediately Rusty Russell
2011-11-03  7:42   ` Rusty Russell
2011-11-13 21:03   ` Michael S. Tsirkin
2011-11-13 21:03     ` Michael S. Tsirkin
2011-11-14  0:43     ` Rusty Russell
2011-11-14  0:43       ` Rusty Russell
2011-11-14  0:43       ` Rusty Russell
2011-11-14  6:56     ` Michael S. Tsirkin
2011-11-16  0:21       ` Rusty Russell
2011-11-16  7:18         ` Michael S. Tsirkin
2011-11-21  1:48           ` Rusty Russell
2011-11-21 11:57             ` Michael S. Tsirkin
2011-11-22  0:33               ` Rusty Russell
2011-11-22  6:29                 ` Michael S. Tsirkin
2011-11-23  1:19                   ` Rusty Russell
2011-11-23  8:30                     ` Michael S. Tsirkin
2012-07-01  9:20   ` RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately) Michael S. Tsirkin
2012-07-01  9:20     ` Michael S. Tsirkin
2012-07-02  1:05     ` Rusty Russell
2012-07-02  1:05       ` Rusty Russell
2012-07-02  7:25       ` Michael S. Tsirkin
2012-07-02  7:25         ` Michael S. Tsirkin
2012-07-02 16:08         ` Rafael Aquini
2012-07-02 16:08           ` Rafael Aquini
2012-07-03  0:47           ` Rusty Russell
2012-07-03 16:26             ` Rafael Aquini
2012-07-03 16:26             ` Rafael Aquini
2012-07-04 10:55             ` Michael S. Tsirkin
2012-07-04 10:55               ` Michael S. Tsirkin
2012-07-08 23:39               ` Rusty Russell
2012-07-08 23:39                 ` Rusty Russell
2012-07-04 10:55           ` Michael S. Tsirkin
2012-07-04 10:55             ` Michael S. Tsirkin
2012-07-02  7:33       ` [PATCH RFC] virtio-balloon: fix add/get API use Michael S. Tsirkin
2012-07-02  7:33         ` Michael S. Tsirkin
2012-07-04  3:27         ` Rusty Russell
2012-07-04  3:27           ` Rusty Russell
