linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes
@ 2013-02-12 12:23 Paolo Bonzini
  2013-02-12 12:23 ` [PATCH 1/9] virtio: add functions for piecewise addition of buffers Paolo Bonzini
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

Most device drivers do not need to perform any postprocessing on the
scatterlists they receive from higher-level drivers (e.g. the block
or SCSI layer), because they translate the request metadata directly
from the various C structs into the data that is required by the device.

virtio devices however do this translation in two steps: a device-specific
step in the device driver, and generic preparation of virtio direct or
indirect buffers in virtqueue_add_buf.  Because virtqueue_add_buf accepts
the outcome of the first step only as a single struct scatterlist array,
the drivers may need to put additional items at the front or back of
the data scatterlists.

On top of this, virtqueue_add_buf also has the limitation of not
supporting chained scatterlists: the buffers must be provided as an
array of struct scatterlist.  Chained scatterlists would at least help
in the case where additional items must be added at the front.

Because of this, virtio-scsi has to copy each request into a scatterlist
internal to the driver.  It cannot just use the one that was prepared
by the upper SCSI layers.  (virtio-blk has a little more flexibility in
that it handles the preparation of the scatterlist itself.)

This series adds a different set of APIs for adding a buffer to a
virtqueue.  The new API lets you pass the buffers piecewise, wrapping
multiple calls to virtqueue_add_sg between virtqueue_start_buf and
virtqueue_end_buf.  Letting drivers call virtqueue_add_sg multiple times
if they already have a scatterlist provided by someone else simplifies the
code and, for virtio-scsi, it saves the copying and related locking.
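As an illustration, the intended calling pattern looks like the sketch
below.  This is hypothetical driver code, not taken from any diff in this
series; the identifiers req, vq, data_sg and data_nents are made up, and
locking and error handling are elided:

```c
/* Sketch only: req, vq, data_sg, data_nents are hypothetical. */
struct scatterlist hdr, status;

sg_init_one(&hdr, &req->hdr, sizeof(req->hdr));
sg_init_one(&status, &req->status, sizeof(req->status));

/* 2 + data_nents total entries (nents), spread over 3 sg lists (nsg) */
if (virtqueue_start_buf(vq, req, 2 + data_nents, 3, GFP_ATOMIC) == 0) {
	virtqueue_add_sg(vq, &hdr, 1, DMA_TO_DEVICE);
	virtqueue_add_sg(vq, data_sg, data_nents, DMA_FROM_DEVICE);
	virtqueue_add_sg(vq, &status, 1, DMA_FROM_DEVICE);
	virtqueue_end_buf(vq);
}
```

The point is that data_sg can be the scatterlist handed down by the upper
layer, used as-is between the driver's own header and status entries.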

One major difference between virtqueue_add_buf and virtqueue_add_sg
is that the latter uses scatterlist iterators, which follow chained
scatterlist structs and stop at ending markers.  In order to avoid code
duplication, and use the new API from virtqueue_add_buf (patch 8), we need
to change all existing callers of virtqueue_add_buf to provide well-formed
scatterlists.  This is what patches 2-7 do.  For virtio-blk it is easiest
to simply switch to the new API, as virtio-scsi does.  For virtio-net
the ending marker must be reset after calling virtqueue_add_buf, in
preparation for the next usage of the scatterlist.  Other drivers are
safe already, but many can be converted to the "fast-path" function
virtqueue_add_buf_single suggested by mst; this is done in patch 8.

Compared to the RFC, I have moved the state of virtqueue_add_sg into
struct vring_desc, and I replaced virtqueue_add_sg_single with
virtqueue_add_buf_single.  I renamed the "count" and "count_sg"
to "nents" and "nsg" following a suggestion of Stefan Hajnoczi.
And of course the series is now well tested (including virtio-serial,
all three virtio-net receive paths, and virtio-blk SG_IO, which I hadn't
checked for the RFC); this testing did not require further code changes.

Ok for 3.9?

Paolo

Paolo Bonzini (9):
  virtio: add functions for piecewise addition of buffers
  virtio-blk: reorganize virtblk_add_req
  virtio-blk: use virtqueue_start_buf on bio path
  virtio-blk: use virtqueue_start_buf on req path
  scatterlist: introduce sg_unmark_end
  virtio-net: unmark scatterlist ending after virtqueue_add_buf
  virtio-scsi: use virtqueue_start_buf
  virtio: introduce and use virtqueue_add_buf_single
  virtio: reimplement virtqueue_add_buf using new functions

 block/blk-integrity.c               |    2 +-
 block/blk-merge.c                   |    2 +-
 drivers/block/virtio_blk.c          |  165 ++++++++-------
 drivers/char/hw_random/virtio-rng.c |    2 +-
 drivers/char/virtio_console.c       |    4 +-
 drivers/net/virtio_net.c            |   21 ++-
 drivers/rpmsg/virtio_rpmsg_bus.c    |    8 +-
 drivers/scsi/virtio_scsi.c          |  106 ++++------
 drivers/virtio/virtio_balloon.c     |    6 +-
 drivers/virtio/virtio_ring.c        |  396 ++++++++++++++++++++++-------------
 include/linux/scatterlist.h         |   16 ++
 include/linux/virtio.h              |   19 ++
 tools/virtio/virtio_test.c          |    6 +-
 13 files changed, 450 insertions(+), 303 deletions(-)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
@ 2013-02-12 12:23 ` Paolo Bonzini
  2013-02-12 14:56   ` Michael S. Tsirkin
  2013-02-12 18:03   ` [PATCH v2 " Paolo Bonzini
  2013-02-12 12:23 ` [PATCH 2/9] virtio-blk: reorganize virtblk_add_req Paolo Bonzini
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

virtio device drivers translate requests from higher layers in two steps:
a device-specific step in the device driver, and generic preparation
of virtio direct or indirect buffers in virtqueue_add_buf.  Because
virtqueue_add_buf accepts the outcome of the first step only as a single
struct scatterlist array, drivers may need to put additional items at the
front or back of the data scatterlists before handing them to virtqueue_add_buf.
Because of this, virtio-scsi has to copy each request into a scatterlist
internal to the driver.  It cannot just use the one that was prepared
by the upper SCSI layers. 

On top of this, virtqueue_add_buf also has the limitation of not
supporting chained scatterlists: the buffers must be provided as an
array of struct scatterlist.  Chained scatterlists, though not supported
on all architectures, would help for virtio-scsi, where all additional
items are placed at the front.

This patch adds a different set of APIs for adding a buffer to a virtqueue.
The new API lets you pass the buffers piecewise, wrapping multiple calls
to virtqueue_add_sg between virtqueue_start_buf and virtqueue_end_buf.
virtio-scsi can then call virtqueue_add_sg 3 or 4 times: for the request
header, for the write buffer (if present), for the response header, and
finally for the read buffer (again if present).  It saves the copying
and the related locking.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/virtio/virtio_ring.c |  211 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |   14 +++
 2 files changed, 225 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ffd7e7d..64184e5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -101,6 +101,10 @@ struct vring_virtqueue
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
+	/* State between virtqueue_start_buf and virtqueue_end_buf.  */
+	int head;
+	struct vring_desc *indirect_base, *tail;
+
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	void (*notify)(struct virtqueue *vq);
 
@@ -394,6 +398,213 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 	vq->vq.num_free++;
 }
 
+/**
+ * virtqueue_start_buf - start building buffer for the other end
+ * @vq: the struct virtqueue we're talking about.
+ * @data: the token identifying the buffer.
+ * @nents: the total number of scatterlist entries that will be added
+ * @nsg: the number of sg lists that will be added
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted), and that a successful call is
+ * followed by one or more calls to virtqueue_add_sg, and finally a call
+ * to virtqueue_end_buf.
+ *
+ * Returns zero or a negative error (e.g. -ENOSPC).
+ */
+int virtqueue_start_buf(struct virtqueue *_vq,
+			void *data,
+			unsigned int nents,
+			unsigned int nsg,
+			gfp_t gfp)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_desc *desc = NULL;
+	int head;
+	int ret = -ENOMEM;
+
+	START_USE(vq);
+
+	BUG_ON(data == NULL);
+
+#ifdef DEBUG
+	{
+		ktime_t now = ktime_get();
+
+		/* No kick or get, with .1 second between?  Warn. */
+		if (vq->last_add_time_valid)
+			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+					    > 100);
+		vq->last_add_time = now;
+		vq->last_add_time_valid = true;
+	}
+#endif
+
+	BUG_ON(nents < nsg);
+	BUG_ON(nsg == 0);
+
+	/*
+	 * If the host supports indirect descriptor tables, and there is
+	 * no space for direct buffers or there are multi-item scatterlists,
+	 * go indirect.
+	 */
+	head = vq->free_head;
+	if (vq->indirect && (nents > nsg || vq->vq.num_free < nents)) {
+		if (vq->vq.num_free == 0)
+			goto no_space;
+
+		desc = kmalloc(nents * sizeof(struct vring_desc), gfp);
+		if (!desc)
+			goto error;
+
+		/* We're about to use a buffer */
+		vq->vq.num_free--;
+
+		/* Use a single buffer which doesn't continue */
+		vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
+		vq->vring.desc[head].addr = virt_to_phys(desc);
+		vq->vring.desc[head].len = nents * sizeof(struct vring_desc);
+
+		/* Update free pointer */
+		vq->free_head = vq->vring.desc[head].next;
+	}
+
+	/* Set token. */
+	vq->data[head] = data;
+
+	pr_debug("Started buffer head %i for %p\n", head, vq);
+
+	vq->indirect_base = desc;
+	vq->tail = NULL;
+	vq->head = head;
+	return 0;
+
+no_space:
+	ret = -ENOSPC;
+error:
+	pr_debug("Can't add buf (%d) - nents = %i, avail = %i\n",
+		 ret, nents, vq->vq.num_free);
+	END_USE(vq);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtqueue_start_buf);
+
+/**
+ * virtqueue_add_sg - add sglist to buffer being built
+ * @_vq: the virtqueue for which the buffer is being built
+ * @sgl: the description of the buffer(s).
+ * @nents: the number of items to process in sgl
+ * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
+ *
+ * Note that, unlike virtqueue_add_buf, this function follows chained
+ * scatterlists, and stops before the @nents-th item if a scatterlist item
+ * has a marker.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ */
+void virtqueue_add_sg(struct virtqueue *_vq,
+		      struct scatterlist sgl[],
+		      unsigned int nents,
+		      enum dma_data_direction dir)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int i, n;
+	struct scatterlist *sg;
+	struct vring_desc *tail;
+	u32 flags;
+
+#ifdef DEBUG
+	BUG_ON(!vq->in_use);
+#endif
+
+	BUG_ON(dir != DMA_FROM_DEVICE && dir != DMA_TO_DEVICE);
+	BUG_ON(nents == 0);
+
+	flags = dir == DMA_FROM_DEVICE ? VRING_DESC_F_WRITE : 0;
+	flags |= VRING_DESC_F_NEXT;
+
+	/*
+	 * If using indirect descriptor tables, fill in the buffers
+	 * at vq->indirect_base.
+	 */
+	if (vq->indirect_base) {
+		i = 0;
+		if (likely(vq->tail))
+			i = vq->tail - vq->indirect_base + 1;
+
+		for_each_sg(sgl, sg, nents, n) {
+			tail = &vq->indirect_base[i];
+			tail->flags = flags;
+			tail->addr = sg_phys(sg);
+			tail->len = sg->length;
+			tail->next = ++i;
+		}
+	} else {
+		BUG_ON(vq->vq.num_free < nents);
+
+		i = vq->free_head;
+		for_each_sg(sgl, sg, nents, n) {
+			tail = &vq->vring.desc[i];
+			tail->flags = flags;
+			tail->addr = sg_phys(sg);
+			tail->len = sg->length;
+			i = tail->next;
+			vq->vq.num_free--;
+		}
+
+		vq->free_head = i;
+	}
+	vq->tail = tail;
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_sg);
+
+/**
+ * virtqueue_end_buf - expose buffer to other end
+ * @_vq: the virtqueue for which the buffer was built
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ */
+void virtqueue_end_buf(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int avail;
+	int head = vq->head;
+	struct vring_desc *tail = vq->tail;
+
+#ifdef DEBUG
+	BUG_ON(!vq->in_use);
+#endif
+	BUG_ON(tail == NULL);
+
+	/* The last one does not have the next flag set.  */
+	tail->flags &= ~VRING_DESC_F_NEXT;
+
+	/*
+	 * Put entry in available array.  Descriptors and available array
+	 * need to be set before we expose the new available array entries.
+	 */
+	avail = vq->vring.avail->idx & (vq->vring.num-1);
+	vq->vring.avail->ring[avail] = head;
+	virtio_wmb(vq);
+
+	vq->vring.avail->idx++;
+	vq->num_added++;
+
+	/*
+	 * This is very unlikely, but theoretically possible.  Kick
+	 * just in case.
+	 */
+	if (unlikely(vq->num_added == (1 << 16) - 1))
+		virtqueue_kick(&vq->vq);
+
+	pr_debug("Added buffer head %i to %p\n", head, vq);
+	END_USE(vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_end_buf);
+
 static inline bool more_used(const struct vring_virtqueue *vq)
 {
 	return vq->last_used_idx != vq->vring.used->idx;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index cf8adb1..43d6bc3 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -7,6 +7,7 @@
 #include <linux/spinlock.h>
 #include <linux/device.h>
 #include <linux/mod_devicetable.h>
+#include <linux/dma-direction.h>
 #include <linux/gfp.h>
 
 /**
@@ -40,6 +41,19 @@ int virtqueue_add_buf(struct virtqueue *vq,
 		      void *data,
 		      gfp_t gfp);
 
+int virtqueue_start_buf(struct virtqueue *_vq,
+			void *data,
+			unsigned int nents,
+			unsigned int nsg,
+			gfp_t gfp);
+
+void virtqueue_add_sg(struct virtqueue *_vq,
+		      struct scatterlist sgl[],
+		      unsigned int nents,
+		      enum dma_data_direction dir);
+
+void virtqueue_end_buf(struct virtqueue *_vq);
+
 void virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
1.7.1




* [PATCH 2/9] virtio-blk: reorganize virtblk_add_req
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
  2013-02-12 12:23 ` [PATCH 1/9] virtio: add functions for piecewise addition of buffers Paolo Bonzini
@ 2013-02-12 12:23 ` Paolo Bonzini
  2013-02-17  6:38   ` Asias He
  2013-02-12 12:23 ` [PATCH 3/9] virtio-blk: use virtqueue_start_buf on bio path Paolo Bonzini
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

Right now, both virtblk_add_req and virtblk_add_req_wait call
virtqueue_add_buf.  To prepare for the next patches, abstract the call
to virtqueue_add_buf into a new function __virtblk_add_req, and include
the waiting logic directly in virtblk_add_req.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/block/virtio_blk.c |   55 ++++++++++++++++----------------------------
 1 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 8ad21a2..fd8a689 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -100,50 +100,39 @@ static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
 	return vbr;
 }
 
-static void virtblk_add_buf_wait(struct virtio_blk *vblk,
-				 struct virtblk_req *vbr,
-				 unsigned long out,
-				 unsigned long in)
+static inline int __virtblk_add_req(struct virtqueue *vq,
+			     struct virtblk_req *vbr,
+			     unsigned long out,
+			     unsigned long in)
 {
+	return virtqueue_add_buf(vq, vbr->sg, out, in, vbr, GFP_ATOMIC);
+}
+
+static void virtblk_add_req(struct virtblk_req *vbr,
+			    unsigned int out, unsigned int in)
+{
+	struct virtio_blk *vblk = vbr->vblk;
 	DEFINE_WAIT(wait);
+	int ret;
 
-	for (;;) {
+	spin_lock_irq(vblk->disk->queue->queue_lock);
+	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr,
+						 out, in)) < 0)) {
 		prepare_to_wait_exclusive(&vblk->queue_wait, &wait,
 					  TASK_UNINTERRUPTIBLE);
 
+		spin_unlock_irq(vblk->disk->queue->queue_lock);
+		io_schedule();
 		spin_lock_irq(vblk->disk->queue->queue_lock);
-		if (virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
-				      GFP_ATOMIC) < 0) {
-			spin_unlock_irq(vblk->disk->queue->queue_lock);
-			io_schedule();
-		} else {
-			virtqueue_kick(vblk->vq);
-			spin_unlock_irq(vblk->disk->queue->queue_lock);
-			break;
-		}
 
+		finish_wait(&vblk->queue_wait, &wait);
 	}
 
-	finish_wait(&vblk->queue_wait, &wait);
-}
-
-static inline void virtblk_add_req(struct virtblk_req *vbr,
-				   unsigned int out, unsigned int in)
-{
-	struct virtio_blk *vblk = vbr->vblk;
-
-	spin_lock_irq(vblk->disk->queue->queue_lock);
-	if (unlikely(virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
-					GFP_ATOMIC) < 0)) {
-		spin_unlock_irq(vblk->disk->queue->queue_lock);
-		virtblk_add_buf_wait(vblk, vbr, out, in);
-		return;
-	}
 	virtqueue_kick(vblk->vq);
 	spin_unlock_irq(vblk->disk->queue->queue_lock);
 }
 
-static int virtblk_bio_send_flush(struct virtblk_req *vbr)
+static void virtblk_bio_send_flush(struct virtblk_req *vbr)
 {
 	unsigned int out = 0, in = 0;
 
@@ -155,11 +144,9 @@ static int virtblk_bio_send_flush(struct virtblk_req *vbr)
 	sg_set_buf(&vbr->sg[out + in++], &vbr->status, sizeof(vbr->status));
 
 	virtblk_add_req(vbr, out, in);
-
-	return 0;
 }
 
-static int virtblk_bio_send_data(struct virtblk_req *vbr)
+static void virtblk_bio_send_data(struct virtblk_req *vbr)
 {
 	struct virtio_blk *vblk = vbr->vblk;
 	unsigned int num, out = 0, in = 0;
@@ -188,8 +175,6 @@ static int virtblk_bio_send_data(struct virtblk_req *vbr)
 	}
 
 	virtblk_add_req(vbr, out, in);
-
-	return 0;
 }
 
 static void virtblk_bio_send_data_work(struct work_struct *work)
-- 
1.7.1




* [PATCH 3/9] virtio-blk: use virtqueue_start_buf on bio path
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
  2013-02-12 12:23 ` [PATCH 1/9] virtio: add functions for piecewise addition of buffers Paolo Bonzini
  2013-02-12 12:23 ` [PATCH 2/9] virtio-blk: reorganize virtblk_add_req Paolo Bonzini
@ 2013-02-12 12:23 ` Paolo Bonzini
  2013-02-17  6:39   ` Asias He
  2013-02-12 12:23 ` [PATCH 4/9] virtio-blk: use virtqueue_start_buf on req path Paolo Bonzini
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

Move the creation of the request header and response footer to
__virtblk_add_req.  vbr->sg only contains the data scatterlist,
the header/footer are added separately using the new piecewise
API for building virtqueue buffers.

With this change, virtio-blk (with use_bio) no longer relies on
the virtio functions ignoring the end markers in a scatterlist.
The next patch will do the same for the request path.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/block/virtio_blk.c |   74 ++++++++++++++++++++++++++-----------------
 1 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index fd8a689..4a31fcc 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -62,6 +62,7 @@ struct virtblk_req
 	struct virtio_blk *vblk;
 	int flags;
 	u8 status;
+	int nents;
 	struct scatterlist sg[];
 };
 
@@ -100,24 +101,52 @@ static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
 	return vbr;
 }
 
-static inline int __virtblk_add_req(struct virtqueue *vq,
-			     struct virtblk_req *vbr,
-			     unsigned long out,
-			     unsigned long in)
+static int __virtblk_add_req(struct virtqueue *vq,
+			     struct virtblk_req *vbr)
 {
-	return virtqueue_add_buf(vq, vbr->sg, out, in, vbr, GFP_ATOMIC);
+	struct scatterlist sg;
+	enum dma_data_direction dir;
+	int ret;
+
+	unsigned int nents = 2;
+	unsigned int nsg = 2;
+
+	if (vbr->nents) {
+		nsg++;
+		nents += vbr->nents;
+	}
+
+	ret = virtqueue_start_buf(vq, vbr, nents, nsg, GFP_ATOMIC);
+	if (ret < 0)
+		return ret;
+
+	dir = DMA_TO_DEVICE;
+	sg_init_one(&sg, &vbr->out_hdr, sizeof(vbr->out_hdr));
+	virtqueue_add_sg(vq, &sg, 1, dir);
+
+	if (vbr->nents) {
+		if ((vbr->out_hdr.type & VIRTIO_BLK_T_OUT) == 0)
+			dir = DMA_FROM_DEVICE;
+
+		virtqueue_add_sg(vq, vbr->sg, vbr->nents, dir);
+	}
+
+	dir = DMA_FROM_DEVICE;
+	sg_init_one(&sg, &vbr->status, sizeof(vbr->status));
+	virtqueue_add_sg(vq, &sg, 1, dir);
+
+	virtqueue_end_buf(vq);
+	return 0;
 }
 
-static void virtblk_add_req(struct virtblk_req *vbr,
-			    unsigned int out, unsigned int in)
+static void virtblk_add_req(struct virtblk_req *vbr)
 {
 	struct virtio_blk *vblk = vbr->vblk;
 	DEFINE_WAIT(wait);
 	int ret;
 
 	spin_lock_irq(vblk->disk->queue->queue_lock);
-	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr,
-						 out, in)) < 0)) {
+	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr)) < 0)) {
 		prepare_to_wait_exclusive(&vblk->queue_wait, &wait,
 					  TASK_UNINTERRUPTIBLE);
 
@@ -134,22 +163,18 @@ static void virtblk_add_req(struct virtblk_req *vbr,
 
 static void virtblk_bio_send_flush(struct virtblk_req *vbr)
 {
-	unsigned int out = 0, in = 0;
-
 	vbr->flags |= VBLK_IS_FLUSH;
 	vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH;
 	vbr->out_hdr.sector = 0;
 	vbr->out_hdr.ioprio = 0;
-	sg_set_buf(&vbr->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
-	sg_set_buf(&vbr->sg[out + in++], &vbr->status, sizeof(vbr->status));
+	vbr->nents = 0;
 
-	virtblk_add_req(vbr, out, in);
+	virtblk_add_req(vbr);
 }
 
 static void virtblk_bio_send_data(struct virtblk_req *vbr)
 {
 	struct virtio_blk *vblk = vbr->vblk;
-	unsigned int num, out = 0, in = 0;
 	struct bio *bio = vbr->bio;
 
 	vbr->flags &= ~VBLK_IS_FLUSH;
@@ -157,24 +182,15 @@ static void virtblk_bio_send_data(struct virtblk_req *vbr)
 	vbr->out_hdr.sector = bio->bi_sector;
 	vbr->out_hdr.ioprio = bio_prio(bio);
 
-	sg_set_buf(&vbr->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
-
-	num = blk_bio_map_sg(vblk->disk->queue, bio, vbr->sg + out);
-
-	sg_set_buf(&vbr->sg[num + out + in++], &vbr->status,
-		   sizeof(vbr->status));
-
-	if (num) {
-		if (bio->bi_rw & REQ_WRITE) {
+	vbr->nents = blk_bio_map_sg(vblk->disk->queue, bio, vbr->sg);
+	if (vbr->nents) {
+		if (bio->bi_rw & REQ_WRITE)
 			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
-			out += num;
-		} else {
+		else
 			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
-			in += num;
-		}
 	}
 
-	virtblk_add_req(vbr, out, in);
+	virtblk_add_req(vbr);
 }
 
 static void virtblk_bio_send_data_work(struct work_struct *work)
-- 
1.7.1




* [PATCH 4/9] virtio-blk: use virtqueue_start_buf on req path
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
                   ` (2 preceding siblings ...)
  2013-02-12 12:23 ` [PATCH 3/9] virtio-blk: use virtqueue_start_buf on bio path Paolo Bonzini
@ 2013-02-12 12:23 ` Paolo Bonzini
  2013-02-17  6:37   ` Asias He
  2013-02-12 12:23 ` [PATCH 5/9] scatterlist: introduce sg_unmark_end Paolo Bonzini
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

This is similar to the previous patch, but a bit more radical
because the bio and req paths now share the buffer construction
code.  Because the req path doesn't use vbr->sg, however, we
need to add a couple of arguments to __virtblk_add_req.

We also need to teach __virtblk_add_req how to build SCSI command
requests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/block/virtio_blk.c |   74 ++++++++++++++++++++++---------------------
 1 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4a31fcc..22deb65 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -102,18 +102,26 @@ static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
 }
 
 static int __virtblk_add_req(struct virtqueue *vq,
-			     struct virtblk_req *vbr)
+			     struct virtblk_req *vbr,
+			     struct scatterlist *data_sg,
+			     unsigned data_nents)
 {
 	struct scatterlist sg;
 	enum dma_data_direction dir;
 	int ret;
 
+	int type = vbr->out_hdr.type & ~VIRTIO_BLK_T_OUT;
 	unsigned int nents = 2;
 	unsigned int nsg = 2;
 
-	if (vbr->nents) {
+	if (type == VIRTIO_BLK_T_SCSI_CMD) {
+		BUG_ON(use_bio);
+		nsg += 3;
+		nents += 3;
+	}
+	if (data_nents) {
 		nsg++;
-		nents += vbr->nents;
+		nents += data_nents;
 	}
 
 	ret = virtqueue_start_buf(vq, vbr, nents, nsg, GFP_ATOMIC);
@@ -124,14 +132,32 @@ static int __virtblk_add_req(struct virtqueue *vq,
 	sg_init_one(&sg, &vbr->out_hdr, sizeof(vbr->out_hdr));
 	virtqueue_add_sg(vq, &sg, 1, dir);
 
-	if (vbr->nents) {
+	/*
+	 * If this is a packet command we need a couple of additional headers.
+	 * Behind the normal outhdr we put a segment with the scsi command
+	 * block, and before the normal inhdr we put the sense data and the
+	 * inhdr with additional status information.
+	 */
+	if (type == VIRTIO_BLK_T_SCSI_CMD) {
+		sg_init_one(&sg, vbr->req->cmd, vbr->req->cmd_len);
+		virtqueue_add_sg(vq, &sg, 1, dir);
+	}
+
+	if (data_nents) {
 		if ((vbr->out_hdr.type & VIRTIO_BLK_T_OUT) == 0)
 			dir = DMA_FROM_DEVICE;
 
-		virtqueue_add_sg(vq, vbr->sg, vbr->nents, dir);
+		virtqueue_add_sg(vq, data_sg, data_nents, dir);
 	}
 
 	dir = DMA_FROM_DEVICE;
+	if (type == VIRTIO_BLK_T_SCSI_CMD) {
+		sg_init_one(&sg, vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
+		virtqueue_add_sg(vq, &sg, 1, dir);
+		sg_init_one(&sg, &vbr->in_hdr, sizeof(vbr->in_hdr));
+		virtqueue_add_sg(vq, &sg, 1, dir);
+	}
+
 	sg_init_one(&sg, &vbr->status, sizeof(vbr->status));
 	virtqueue_add_sg(vq, &sg, 1, dir);
 
@@ -146,7 +172,8 @@ static void virtblk_add_req(struct virtblk_req *vbr)
 	int ret;
 
 	spin_lock_irq(vblk->disk->queue->queue_lock);
-	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr)) < 0)) {
+	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr, vbr->sg,
+						 vbr->nents)) < 0)) {
 		prepare_to_wait_exclusive(&vblk->queue_wait, &wait,
 					  TASK_UNINTERRUPTIBLE);
 
@@ -299,7 +326,7 @@ static void virtblk_done(struct virtqueue *vq)
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 		   struct request *req)
 {
-	unsigned long num, out = 0, in = 0;
+	unsigned int num;
 	struct virtblk_req *vbr;
 
 	vbr = virtblk_alloc_req(vblk, GFP_ATOMIC);
@@ -336,40 +363,15 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 		}
 	}
 
-	sg_set_buf(&vblk->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
-
-	/*
-	 * If this is a packet command we need a couple of additional headers.
-	 * Behind the normal outhdr we put a segment with the scsi command
-	 * block, and before the normal inhdr we put the sense data and the
-	 * inhdr with additional status information before the normal inhdr.
-	 */
-	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC)
-		sg_set_buf(&vblk->sg[out++], vbr->req->cmd, vbr->req->cmd_len);
-
-	num = blk_rq_map_sg(q, vbr->req, vblk->sg + out);
-
-	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
-		sg_set_buf(&vblk->sg[num + out + in++], vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
-		sg_set_buf(&vblk->sg[num + out + in++], &vbr->in_hdr,
-			   sizeof(vbr->in_hdr));
-	}
-
-	sg_set_buf(&vblk->sg[num + out + in++], &vbr->status,
-		   sizeof(vbr->status));
-
+	num = blk_rq_map_sg(q, vbr->req, vblk->sg);
 	if (num) {
-		if (rq_data_dir(vbr->req) == WRITE) {
+		if (rq_data_dir(vbr->req) == WRITE)
 			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
-			out += num;
-		} else {
+		else
 			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
-			in += num;
-		}
 	}
 
-	if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr,
-			      GFP_ATOMIC) < 0) {
+	if (__virtblk_add_req(vblk->vq, vbr, vblk->sg, num) < 0) {
 		mempool_free(vbr, vblk->pool);
 		return false;
 	}
-- 
1.7.1




* [PATCH 5/9] scatterlist: introduce sg_unmark_end
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
                   ` (3 preceding siblings ...)
  2013-02-12 12:23 ` [PATCH 4/9] virtio-blk: use virtqueue_start_buf on req path Paolo Bonzini
@ 2013-02-12 12:23 ` Paolo Bonzini
  2013-02-12 12:23 ` [PATCH 6/9] virtio-net: unmark scatterlist ending after virtqueue_add_buf Paolo Bonzini
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

This is useful in places that recycle the same scatterlist multiple
times, and do not want to incur the cost of sg_init_table every
time in hot paths.

Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/blk-integrity.c       |    2 +-
 block/blk-merge.c           |    2 +-
 include/linux/scatterlist.h |   16 ++++++++++++++++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index da2a818..0f5004f 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -110,7 +110,7 @@ new_segment:
 			if (!sg)
 				sg = sglist;
 			else {
-				sg->page_link &= ~0x02;
+				sg_unmark_end(sg);
 				sg = sg_next(sg);
 			}
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 936a110..5f24482 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -143,7 +143,7 @@ new_segment:
 			 * termination bit to avoid doing a full
 			 * sg_init_table() in drivers for each command.
 			 */
-			(*sg)->page_link &= ~0x02;
+			sg_unmark_end(*sg);
 			*sg = sg_next(*sg);
 		}
 
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 4bd6c06..9b83784 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -172,6 +172,22 @@ static inline void sg_mark_end(struct scatterlist *sg)
 }
 
 /**
+ * sg_unmark_end - Undo setting the end of the scatterlist
+ * @sg:		 SG entry
+ *
+ * Description:
+ *   Removes the termination marker from the given entry of the scatterlist.
+ *
+ **/
+static inline void sg_unmark_end(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+	BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+	sg->page_link &= ~0x02;
+}
+
+/**
  * sg_phys - Return physical address of an sg entry
  * @sg:	     SG entry
  *
-- 
1.7.1




* [PATCH 6/9] virtio-net: unmark scatterlist ending after virtqueue_add_buf
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
                   ` (4 preceding siblings ...)
  2013-02-12 12:23 ` [PATCH 5/9] scatterlist: introduce sg_unmark_end Paolo Bonzini
@ 2013-02-12 12:23 ` Paolo Bonzini
  2013-02-12 12:23 ` [PATCH 7/9] virtio-scsi: use virtqueue_start_buf Paolo Bonzini
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

Prepare for when virtqueue_add_buf will use sg_next instead of
ignoring ending markers.

Note that for_each_sg (and thus virtqueue_add_sg) allows you
to pass a "truncated" scatterlist that does not have a marker
on the last item.  We rely on this in add_recvbuf_mergeable.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/net/virtio_net.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 35c00c5..78b6f51 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -440,13 +440,16 @@ static int add_recvbuf_small(struct receive_queue *rq, gfp_t gfp)
 
 	hdr = skb_vnet_hdr(skb);
 	sg_set_buf(rq->sg, &hdr->hdr, sizeof hdr->hdr);
-
 	skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
 
 	err = virtqueue_add_buf(rq->vq, rq->sg, 0, 2, skb, gfp);
 	if (err < 0)
 		dev_kfree_skb(skb);
 
+	/* An optimization: clear the end bit set by skb_to_sgvec, so
+	 * we can simply re-use rq->sg[] next time.
+	 */
+	sg_unmark_end(rq->sg + 1);
 	return err;
 }
 
@@ -505,8 +508,7 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
 	if (!page)
 		return -ENOMEM;
 
-	sg_init_one(rq->sg, page_address(page), PAGE_SIZE);
-
+	sg_set_page(rq->sg, page, PAGE_SIZE, 0);
 	err = virtqueue_add_buf(rq->vq, rq->sg, 0, 1, page, gfp);
 	if (err < 0)
 		give_pages(rq, page);
@@ -671,6 +673,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	unsigned num_sg;
+	int ret;
 
 	pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
 
@@ -710,8 +713,14 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, &hdr->hdr, sizeof hdr->hdr);
 
 	num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
-	return virtqueue_add_buf(sq->vq, sq->sg, num_sg,
-				 0, skb, GFP_ATOMIC);
+	ret = virtqueue_add_buf(sq->vq, sq->sg, num_sg,
+				0, skb, GFP_ATOMIC);
+
+	/* An optimization: clear the end bit set by skb_to_sgvec, so
+	 * we can simply re-use sq->sg[] next time.
+	 */
+	sg_unmark_end(&sq->sg[num_sg-1]);
+	return ret;
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 7/9] virtio-scsi: use virtqueue_start_buf
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
                   ` (5 preceding siblings ...)
  2013-02-12 12:23 ` [PATCH 6/9] virtio-net: unmark scatterlist ending after virtqueue_add_buf Paolo Bonzini
@ 2013-02-12 12:23 ` Paolo Bonzini
  2013-02-12 12:23 ` [PATCH 8/9] virtio: introduce and use virtqueue_add_buf_single Paolo Bonzini
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

Using the new virtqueue_start_buf/virtqueue_add_sg API lets us simplify
the queueing
path.  In particular, all data protected by the tgt_lock is just gone
(multiqueue will find a new use for the lock).

The speedup is relatively small (2-4%) but it is worthwhile because of
the code simplification it enables.  And it will be particularly useful
when adding multiqueue support, because it keeps the tgt_lock critical
sections very slim while avoiding a proliferation of locks.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/scsi/virtio_scsi.c |  102 +++++++++++++++++++-------------------------
 1 files changed, 44 insertions(+), 58 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 3449a1f..6b752f7 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -59,11 +59,8 @@ struct virtio_scsi_vq {
 
 /* Per-target queue state */
 struct virtio_scsi_target_state {
-	/* Protects sg.  Lock hierarchy is tgt_lock -> vq_lock.  */
+	/* Never held at the same time as vq_lock.  */
 	spinlock_t tgt_lock;
-
-	/* For sglist construction when adding commands to the virtqueue.  */
-	struct scatterlist sg[];
 };
 
 /* Driver instance state */
@@ -351,89 +348,82 @@ static void virtscsi_event_done(struct virtqueue *vq)
 	spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
 };
 
-static void virtscsi_map_sgl(struct scatterlist *sg, unsigned int *p_idx,
-			     struct scsi_data_buffer *sdb)
-{
-	struct sg_table *table = &sdb->table;
-	struct scatterlist *sg_elem;
-	unsigned int idx = *p_idx;
-	int i;
-
-	for_each_sg(table->sgl, sg_elem, table->nents, i)
-		sg[idx++] = *sg_elem;
-
-	*p_idx = idx;
-}
-
 /**
- * virtscsi_map_cmd - map a scsi_cmd to a virtqueue scatterlist
+ * virtscsi_add_cmd - add a virtio_scsi_cmd to a virtqueue
  * @vscsi	: virtio_scsi state
  * @cmd		: command structure
- * @out_num	: number of read-only elements
- * @in_num	: number of write-only elements
  * @req_size	: size of the request buffer
  * @resp_size	: size of the response buffer
- *
- * Called with tgt_lock held.
+ * @gfp	: flags to use for memory allocations
  */
-static void virtscsi_map_cmd(struct virtio_scsi_target_state *tgt,
-			     struct virtio_scsi_cmd *cmd,
-			     unsigned *out_num, unsigned *in_num,
-			     size_t req_size, size_t resp_size)
+static int virtscsi_add_cmd(struct virtqueue *vq,
+			    struct virtio_scsi_cmd *cmd,
+			    size_t req_size, size_t resp_size, gfp_t gfp)
 {
 	struct scsi_cmnd *sc = cmd->sc;
-	struct scatterlist *sg = tgt->sg;
-	unsigned int idx = 0;
+	struct scatterlist sg;
+	unsigned int nents, nsg;
+	struct sg_table *out, *in;
+	int ret;
+
+	out = in = NULL;
+
+	if (sc && sc->sc_data_direction != DMA_NONE) {
+		if (sc->sc_data_direction != DMA_FROM_DEVICE)
+			out = &scsi_out(sc)->table;
+		if (sc->sc_data_direction != DMA_TO_DEVICE)
+			in = &scsi_in(sc)->table;
+	}
+
+	nsg   = 2 + (out ? 1 : 0)          + (in ? 1 : 0);
+	nents = 2 + (out ? out->nents : 0) + (in ? in->nents : 0);
+	ret = virtqueue_start_buf(vq, cmd, nents, nsg, gfp);
+	if (ret < 0)
+		return ret;
 
 	/* Request header.  */
-	sg_set_buf(&sg[idx++], &cmd->req, req_size);
+	sg_init_one(&sg, &cmd->req, req_size);
+	virtqueue_add_sg(vq, &sg, 1, DMA_TO_DEVICE);
 
 	/* Data-out buffer.  */
-	if (sc && sc->sc_data_direction != DMA_FROM_DEVICE)
-		virtscsi_map_sgl(sg, &idx, scsi_out(sc));
-
-	*out_num = idx;
+	if (out)
+		virtqueue_add_sg(vq, out->sgl, out->nents, DMA_TO_DEVICE);
 
 	/* Response header.  */
-	sg_set_buf(&sg[idx++], &cmd->resp, resp_size);
+	sg_init_one(&sg, &cmd->resp, resp_size);
+	virtqueue_add_sg(vq, &sg, 1, DMA_FROM_DEVICE);
 
 	/* Data-in buffer */
-	if (sc && sc->sc_data_direction != DMA_TO_DEVICE)
-		virtscsi_map_sgl(sg, &idx, scsi_in(sc));
+	if (in)
+		virtqueue_add_sg(vq, in->sgl, in->nents, DMA_FROM_DEVICE);
 
-	*in_num = idx - *out_num;
+	virtqueue_end_buf(vq);
+	return 0;
 }
 
-static int virtscsi_kick_cmd(struct virtio_scsi_target_state *tgt,
-			     struct virtio_scsi_vq *vq,
+static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq,
 			     struct virtio_scsi_cmd *cmd,
 			     size_t req_size, size_t resp_size, gfp_t gfp)
 {
-	unsigned int out_num, in_num;
 	unsigned long flags;
-	int err;
+	int ret;
 	bool needs_kick = false;
 
-	spin_lock_irqsave(&tgt->tgt_lock, flags);
-	virtscsi_map_cmd(tgt, cmd, &out_num, &in_num, req_size, resp_size);
-
-	spin_lock(&vq->vq_lock);
-	err = virtqueue_add_buf(vq->vq, tgt->sg, out_num, in_num, cmd, gfp);
-	spin_unlock(&tgt->tgt_lock);
-	if (!err)
+	spin_lock_irqsave(&vq->vq_lock, flags);
+	ret = virtscsi_add_cmd(vq->vq, cmd, req_size, resp_size, gfp);
+	if (!ret)
 		needs_kick = virtqueue_kick_prepare(vq->vq);
 
 	spin_unlock_irqrestore(&vq->vq_lock, flags);
 
 	if (needs_kick)
 		virtqueue_notify(vq->vq);
-	return err;
+	return ret;
 }
 
 static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 {
 	struct virtio_scsi *vscsi = shost_priv(sh);
-	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
 	struct virtio_scsi_cmd *cmd;
 	int ret;
 
@@ -467,7 +457,7 @@ static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 	BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE);
 	memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len);
 
-	if (virtscsi_kick_cmd(tgt, &vscsi->req_vq, cmd,
+	if (virtscsi_kick_cmd(&vscsi->req_vq, cmd,
 			      sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
 			      GFP_ATOMIC) == 0)
 		ret = 0;
@@ -481,11 +471,10 @@ out:
 static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd)
 {
 	DECLARE_COMPLETION_ONSTACK(comp);
-	struct virtio_scsi_target_state *tgt = vscsi->tgt[cmd->sc->device->id];
 	int ret = FAILED;
 
 	cmd->comp = &comp;
-	if (virtscsi_kick_cmd(tgt, &vscsi->ctrl_vq, cmd,
+	if (virtscsi_kick_cmd(&vscsi->ctrl_vq, cmd,
 			      sizeof cmd->req.tmf, sizeof cmd->resp.tmf,
 			      GFP_NOIO) < 0)
 		goto out;
@@ -592,14 +581,11 @@ static struct virtio_scsi_target_state *virtscsi_alloc_tgt(
 	gfp_t gfp_mask = GFP_KERNEL;
 
 	/* We need extra sg elements at head and tail.  */
-	tgt = kmalloc(sizeof(*tgt) + sizeof(tgt->sg[0]) * (sg_elems + 2),
-		      gfp_mask);
-
+	tgt = kmalloc(sizeof(*tgt), gfp_mask);
 	if (!tgt)
 		return NULL;
 
 	spin_lock_init(&tgt->tgt_lock);
-	sg_init_table(tgt->sg, sg_elems + 2);
 	return tgt;
 }
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 8/9] virtio: introduce and use virtqueue_add_buf_single
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
                   ` (6 preceding siblings ...)
  2013-02-12 12:23 ` [PATCH 7/9] virtio-scsi: use virtqueue_start_buf Paolo Bonzini
@ 2013-02-12 12:23 ` Paolo Bonzini
  2013-02-12 12:23 ` [PATCH 9/9] virtio: reimplement virtqueue_add_buf using new functions Paolo Bonzini
  2013-02-14  6:00 ` [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Rusty Russell
  9 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

Adding a single direct buffer is a very common case.  Introduce
an optimized function for that.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/char/hw_random/virtio-rng.c |    2 +-
 drivers/char/virtio_console.c       |    4 +-
 drivers/net/virtio_net.c            |    2 +-
 drivers/rpmsg/virtio_rpmsg_bus.c    |    8 +++---
 drivers/scsi/virtio_scsi.c          |    4 +-
 drivers/virtio/virtio_balloon.c     |    6 ++--
 drivers/virtio/virtio_ring.c        |   40 +++++++++++++++++++++++++++++++++++
 include/linux/virtio.h              |    5 ++++
 tools/virtio/virtio_test.c          |    6 ++--
 9 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index b65c103..8b851ed 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -47,7 +47,7 @@ static void register_buffer(u8 *buf, size_t size)
 	sg_init_one(&sg, buf, size);
 
 	/* There should always be room for one buffer. */
-	if (virtqueue_add_buf(vq, &sg, 0, 1, buf, GFP_KERNEL) < 0)
+	if (virtqueue_add_buf_single(vq, &sg, DMA_FROM_DEVICE, buf) < 0)
 		BUG();
 
 	virtqueue_kick(vq);
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 684b0d5..62bd945 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -508,7 +508,7 @@ static int add_inbuf(struct virtqueue *vq, struct port_buffer *buf)
 
 	sg_init_one(sg, buf->buf, buf->size);
 
-	ret = virtqueue_add_buf(vq, sg, 0, 1, buf, GFP_ATOMIC);
+	ret = virtqueue_add_buf_single(vq, sg, DMA_FROM_DEVICE, buf);
 	virtqueue_kick(vq);
 	if (!ret)
 		ret = vq->num_free;
@@ -575,7 +575,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 	vq = portdev->c_ovq;
 
 	sg_init_one(sg, &cpkt, sizeof(cpkt));
-	if (virtqueue_add_buf(vq, sg, 1, 0, &cpkt, GFP_ATOMIC) == 0) {
+	if (virtqueue_add_buf_single(vq, sg, DMA_TO_DEVICE, &cpkt) == 0) {
 		virtqueue_kick(vq);
 		while (!virtqueue_get_buf(vq, &len))
 			cpu_relax();
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 78b6f51..1a5186d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,7 +509,7 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
 		return -ENOMEM;
 
 	sg_set_page(rq->sg, page, PAGE_SIZE, 0);
-	err = virtqueue_add_buf(rq->vq, rq->sg, 0, 1, page, gfp);
+	err = virtqueue_add_buf_single(rq->vq, rq->sg, DMA_FROM_DEVICE, page);
 	if (err < 0)
 		give_pages(rq, page);
 
diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index f1e3239..3008987 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -763,7 +763,7 @@ int rpmsg_send_offchannel_raw(struct rpmsg_channel *rpdev, u32 src, u32 dst,
 	mutex_lock(&vrp->tx_lock);
 
 	/* add message to the remote processor's virtqueue */
-	err = virtqueue_add_buf(vrp->svq, &sg, 1, 0, msg, GFP_KERNEL);
+	err = virtqueue_add_buf_single(vrp->svq, &sg, DMA_TO_DEVICE, msg);
 	if (err) {
 		/*
 		 * need to reclaim the buffer here, otherwise it's lost
@@ -845,7 +845,7 @@ static void rpmsg_recv_done(struct virtqueue *rvq)
 	sg_init_one(&sg, msg, RPMSG_BUF_SIZE);
 
 	/* add the buffer back to the remote processor's virtqueue */
-	err = virtqueue_add_buf(vrp->rvq, &sg, 0, 1, msg, GFP_KERNEL);
+	err = virtqueue_add_buf_single(vrp->rvq, &sg, DMA_FROM_DEVICE, msg);
 	if (err < 0) {
 		dev_err(dev, "failed to add a virtqueue buffer: %d\n", err);
 		return;
@@ -976,8 +976,8 @@ static int rpmsg_probe(struct virtio_device *vdev)
 
 		sg_init_one(&sg, cpu_addr, RPMSG_BUF_SIZE);
 
-		err = virtqueue_add_buf(vrp->rvq, &sg, 0, 1, cpu_addr,
-								GFP_KERNEL);
+		err = virtqueue_add_buf_single(vrp->rvq, &sg, DMA_FROM_DEVICE,
+					       cpu_addr);
 		WARN_ON(err); /* sanity check; this can't really happen */
 	}
 
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 6b752f7..887902e 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -220,8 +220,8 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi,
 
 	spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
 
-	err = virtqueue_add_buf(vscsi->event_vq.vq, &sg, 0, 1, event_node,
-				GFP_ATOMIC);
+	err = virtqueue_add_buf_single(vscsi->event_vq.vq, &sg,
+				       DMA_FROM_DEVICE, event_node);
 	if (!err)
 		virtqueue_kick(vscsi->event_vq.vq);
 
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 797e1c7..c280948 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -108,7 +108,7 @@ static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
 	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
 
 	/* We should always be able to add one buffer to an empty queue. */
-	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
+	if (virtqueue_add_buf_single(vq, &sg, DMA_TO_DEVICE, vb) < 0)
 		BUG();
 	virtqueue_kick(vq);
 
@@ -256,7 +256,7 @@ static void stats_handle_request(struct virtio_balloon *vb)
 	if (!virtqueue_get_buf(vq, &len))
 		return;
 	sg_init_one(&sg, vb->stats, sizeof(vb->stats));
-	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
+	if (virtqueue_add_buf_single(vq, &sg, DMA_TO_DEVICE, vb) < 0)
 		BUG();
 	virtqueue_kick(vq);
 }
@@ -341,7 +341,7 @@ static int init_vqs(struct virtio_balloon *vb)
 		 * use it to signal us later.
 		 */
 		sg_init_one(&sg, vb->stats, sizeof vb->stats);
-		if (virtqueue_add_buf(vb->stats_vq, &sg, 1, 0, vb, GFP_KERNEL)
+		if (virtqueue_add_buf_single(vb->stats_vq, &sg, DMA_TO_DEVICE, vb)
 		    < 0)
 			BUG();
 		virtqueue_kick(vb->stats_vq);
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 64184e5..b803cf7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -181,6 +181,48 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
 }
 
 /**
+ * virtqueue_add_buf_single - expose a single scatterlist entry to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: the description of the buffer.
+ * @dir: whether the buffer will be written or read by the device
+ * @data: the token identifying the buffer.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).  No allocations are performed.
+ *
+ * Returns zero or a negative error (ENOSPC).
+ */
+int virtqueue_add_buf_single(struct virtqueue *_vq,
+			     struct scatterlist *sg,
+			     enum dma_data_direction dir,
+			     void *data)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_desc *tail;
+	int ret;
+
+	/* Always direct, the gfp argument is not used.  */
+	ret = virtqueue_start_buf(_vq, data, 1, 1, GFP_ATOMIC);
+	if (ret < 0)
+		return ret;
+
+	BUG_ON(vq->indirect_base);
+	BUG_ON(dir != DMA_FROM_DEVICE && dir != DMA_TO_DEVICE);
+
+	/* The direct case of virtqueue_add_sg, inlined.  */
+	tail = vq->tail = &vq->vring.desc[vq->free_head];
+	tail->flags = dir == DMA_FROM_DEVICE ? VRING_DESC_F_WRITE : 0;
+	tail->addr = sg_phys(sg);
+	tail->len = sg->length;
+	vq->free_head = tail->next;
+	vq->vq.num_free--;
+
+	virtqueue_end_buf(_vq);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_buf_single);
+
+/**
  * virtqueue_add_buf - expose buffer to other end
  * @vq: the struct virtqueue we're talking about.
  * @sg: the description of the buffer(s).
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 43d6bc3..4245bec 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -41,6 +41,11 @@ int virtqueue_add_buf(struct virtqueue *vq,
 		      void *data,
 		      gfp_t gfp);
 
+int virtqueue_add_buf_single(struct virtqueue *vq,
+			     struct scatterlist *sg,
+			     enum dma_data_direction dir,
+			     void *data);
+
 int virtqueue_start_buf(struct virtqueue *_vq,
 			void *data,
 			unsigned int nents,
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index fcc9aa2..727b6a1 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -161,9 +161,9 @@ static void run_test(struct vdev_info *dev, struct vq_info *vq,
 		do {
 			if (started < bufs) {
 				sg_init_one(&sl, dev->buf, dev->buf_size);
-				r = virtqueue_add_buf(vq->vq, &sl, 1, 0,
-						      dev->buf + started,
-						      GFP_ATOMIC);
+				r = virtqueue_add_buf_single(vq->vq, &sl,
+						      DMA_TO_DEVICE,
+						      dev->buf + started);
 				if (likely(r == 0)) {
 					++started;
 					virtqueue_kick(vq->vq);
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 9/9] virtio: reimplement virtqueue_add_buf using new functions
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
                   ` (7 preceding siblings ...)
  2013-02-12 12:23 ` [PATCH 8/9] virtio: introduce and use virtqueue_add_buf_single Paolo Bonzini
@ 2013-02-12 12:23 ` Paolo Bonzini
  2013-02-14  6:00 ` [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Rusty Russell
  9 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 12:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wanlong Gao, asias, mst, Rusty Russell, kvm, virtualization

Eliminate the code duplication between virtqueue_add_buf and
virtqueue_add_sg.  That's safe to do now that no devices will
pass scatterlists with a termination marker in the middle.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/virtio/virtio_ring.c |  159 +++---------------------------------------
 1 files changed, 11 insertions(+), 148 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b803cf7..71488c5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -123,63 +123,6 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
-/* Set up an indirect table of descriptors and add it to the queue. */
-static int vring_add_indirect(struct vring_virtqueue *vq,
-			      struct scatterlist sg[],
-			      unsigned int out,
-			      unsigned int in,
-			      gfp_t gfp)
-{
-	struct vring_desc *desc;
-	unsigned head;
-	int i;
-
-	/*
-	 * We require lowmem mappings for the descriptors because
-	 * otherwise virt_to_phys will give us bogus addresses in the
-	 * virtqueue.
-	 */
-	gfp &= ~(__GFP_HIGHMEM | __GFP_HIGH);
-
-	desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
-	if (!desc)
-		return -ENOMEM;
-
-	/* Transfer entries from the sg list into the indirect page */
-	for (i = 0; i < out; i++) {
-		desc[i].flags = VRING_DESC_F_NEXT;
-		desc[i].addr = sg_phys(sg);
-		desc[i].len = sg->length;
-		desc[i].next = i+1;
-		sg++;
-	}
-	for (; i < (out + in); i++) {
-		desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
-		desc[i].addr = sg_phys(sg);
-		desc[i].len = sg->length;
-		desc[i].next = i+1;
-		sg++;
-	}
-
-	/* Last one doesn't continue. */
-	desc[i-1].flags &= ~VRING_DESC_F_NEXT;
-	desc[i-1].next = 0;
-
-	/* We're about to use a buffer */
-	vq->vq.num_free--;
-
-	/* Use a single buffer which doesn't continue */
-	head = vq->free_head;
-	vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
-	vq->vring.desc[head].addr = virt_to_phys(desc);
-	vq->vring.desc[head].len = i * sizeof(struct vring_desc);
-
-	/* Update free pointer */
-	vq->free_head = vq->vring.desc[head].next;
-
-	return head;
-}
-
 /**
  * virtqueue_add_buf_single - expose a single scatterlist entry to other end
  * @vq: the struct virtqueue we're talking about.
@@ -234,104 +177,25 @@ EXPORT_SYMBOL_GPL(virtqueue_add_buf_single);
  *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM).
  */
-int virtqueue_add_buf(struct virtqueue *_vq,
+int virtqueue_add_buf(struct virtqueue *vq,
 		      struct scatterlist sg[],
 		      unsigned int out,
 		      unsigned int in,
 		      void *data,
 		      gfp_t gfp)
 {
-	struct vring_virtqueue *vq = to_vvq(_vq);
-	unsigned int i, avail, uninitialized_var(prev);
-	int head;
-
-	START_USE(vq);
-
-	BUG_ON(data == NULL);
-
-#ifdef DEBUG
-	{
-		ktime_t now = ktime_get();
-
-		/* No kick or get, with .1 second between?  Warn. */
-		if (vq->last_add_time_valid)
-			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
-					    > 100);
-		vq->last_add_time = now;
-		vq->last_add_time_valid = true;
-	}
-#endif
-
-	/* If the host supports indirect descriptor tables, and we have multiple
-	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && (out + in) > 1 && vq->vq.num_free) {
-		head = vring_add_indirect(vq, sg, out, in, gfp);
-		if (likely(head >= 0))
-			goto add_head;
-	}
-
-	BUG_ON(out + in > vq->vring.num);
-	BUG_ON(out + in == 0);
-
-	if (vq->vq.num_free < out + in) {
-		pr_debug("Can't add buf len %i - avail = %i\n",
-			 out + in, vq->vq.num_free);
-		/* FIXME: for historical reasons, we force a notify here if
-		 * there are outgoing parts to the buffer.  Presumably the
-		 * host should service the ring ASAP. */
-		if (out)
-			vq->notify(&vq->vq);
-		END_USE(vq);
-		return -ENOSPC;
-	}
-
-	/* We're about to use some buffers from the free list. */
-	vq->vq.num_free -= out + in;
-
-	head = vq->free_head;
-	for (i = vq->free_head; out; i = vq->vring.desc[i].next, out--) {
-		vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
-		vq->vring.desc[i].addr = sg_phys(sg);
-		vq->vring.desc[i].len = sg->length;
-		prev = i;
-		sg++;
-	}
-	for (; in; i = vq->vring.desc[i].next, in--) {
-		vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
-		vq->vring.desc[i].addr = sg_phys(sg);
-		vq->vring.desc[i].len = sg->length;
-		prev = i;
-		sg++;
-	}
-	/* Last one doesn't continue. */
-	vq->vring.desc[prev].flags &= ~VRING_DESC_F_NEXT;
-
-	/* Update free pointer */
-	vq->free_head = i;
-
-add_head:
-	/* Set token. */
-	vq->data[head] = data;
-
-	/* Put entry in available array (but don't update avail->idx until they
-	 * do sync). */
-	avail = (vq->vring.avail->idx & (vq->vring.num-1));
-	vq->vring.avail->ring[avail] = head;
-
-	/* Descriptors and available array need to be set before we expose the
-	 * new available array entries. */
-	virtio_wmb(vq);
-	vq->vring.avail->idx++;
-	vq->num_added++;
+	int ret;
 
-	/* This is very unlikely, but theoretically possible.  Kick
-	 * just in case. */
-	if (unlikely(vq->num_added == (1 << 16) - 1))
-		virtqueue_kick(_vq);
+	ret = virtqueue_start_buf(vq, data, out + in, !!out + !!in, gfp);
+	if (ret < 0)
+		return ret;
 
-	pr_debug("Added buffer head %i to %p\n", head, vq);
-	END_USE(vq);
+	if (out)
+		virtqueue_add_sg(vq, sg, out, DMA_TO_DEVICE);
+	if (in)
+		virtqueue_add_sg(vq, sg + out, in, DMA_FROM_DEVICE);
 
+	virtqueue_end_buf(vq);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_buf);
@@ -537,8 +401,7 @@ EXPORT_SYMBOL_GPL(virtqueue_start_buf);
  * @nents: the number of items to process in sgl
  * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
  *
- * Note that, unlike virtqueue_add_buf, this function follows chained
- * scatterlists, and stops before the @nents-th item if a scatterlist item
+ * This function will stop before the @nents-th item if a scatterlist item
  * has a marker.
  *
  * Caller must ensure we don't call this with other virtqueue operations
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 12:23 ` [PATCH 1/9] virtio: add functions for piecewise addition of buffers Paolo Bonzini
@ 2013-02-12 14:56   ` Michael S. Tsirkin
  2013-02-12 15:32     ` Paolo Bonzini
  2013-02-12 18:03   ` [PATCH v2 " Paolo Bonzini
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12 14:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

On Tue, Feb 12, 2013 at 01:23:27PM +0100, Paolo Bonzini wrote:
> virtio device drivers translate requests from higher layer in two steps:
> a device-specific step in the device driver, and generic preparation
> of virtio direct or indirect buffers in virtqueue_add_buf.  Because
> virtqueue_add_buf also accepts the outcome of the first step as a single
> struct scatterlist, drivers may need to put additional items at the
> front or back of the data scatterlists before handing it to virtqueue_add_buf.
> Because of this, virtio-scsi has to copy each request into a scatterlist
> internal to the driver.  It cannot just use the one that was prepared
> by the upper SCSI layers. 
> 
> On top of this, virtqueue_add_buf also has the limitation of not
> supporting chained scatterlists: the buffers must be provided as an
> array of struct scatterlist.  Chained scatterlist, though not supported
> on all architectures, would help for virtio-scsi where all additional
> items are placed at the front.
> 
> This patch adds a different set of APIs for adding a buffer to a virtqueue.
> The new API lets you pass the buffers piecewise, wrapping multiple calls
> to virtqueue_add_sg between virtqueue_start_buf and virtqueue_end_buf.
> virtio-scsi can then call virtqueue_add_sg 3/4 times: for the request
> header, for the write buffer (if present), for the response header, and
> finally for the read buffer (again if present).  It saves the copying
> and the related locking.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  drivers/virtio/virtio_ring.c |  211 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/virtio.h       |   14 +++
>  2 files changed, 225 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index ffd7e7d..64184e5 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -101,6 +101,10 @@ struct vring_virtqueue
>  	/* Last used index we've seen. */
>  	u16 last_used_idx;
>  
> +	/* State between virtqueue_start_buf and virtqueue_end_buf.  */
> +	int head;
> +	struct vring_desc *indirect_base, *tail;
> +
>  	/* How to notify other side. FIXME: commonalize hcalls! */
>  	void (*notify)(struct virtqueue *vq);
>  
> @@ -394,6 +398,213 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
>  	vq->vq.num_free++;
>  }
>  
> +/**
> + * virtqueue_start_buf - start building buffer for the other end
> + * @vq: the struct virtqueue we're talking about.
> + * @data: the token identifying the buffer.
> + * @nents: the number of buffers that will be added

This function starts building one buffer, number of buffers
is a bit weird here.

> + * @nsg: the number of sg lists that will be added

This means number of calls to add_sg ? Not sure why this matters.
How about we pass in in_num/out_num - that is total # of sg,
same as add_buf?


> + * @gfp: how to do memory allocations (if necessary).
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted), and that a successful call is
> + * followed by one or more calls to virtqueue_add_sg, and finally a call
> + * to virtqueue_end_buf.
> + *
> + * Returns zero or a negative error (ie. ENOSPC).
> + */
> +int virtqueue_start_buf(struct virtqueue *_vq,
> +			void *data,
> +			unsigned int nents,
> +			unsigned int nsg,
> +			gfp_t gfp)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	struct vring_desc *desc = NULL;
> +	int head;
> +	int ret = -ENOMEM;
> +
> +	START_USE(vq);
> +
> +	BUG_ON(data == NULL);
> +
> +#ifdef DEBUG
> +	{
> +		ktime_t now = ktime_get();
> +
> +		/* No kick or get, with .1 second between?  Warn. */
> +		if (vq->last_add_time_valid)
> +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> +					    > 100);
> +		vq->last_add_time = now;
> +		vq->last_add_time_valid = true;
> +	}
> +#endif
> +
> +	BUG_ON(nents < nsg);
> +	BUG_ON(nsg == 0);
> +
> +	/*
> +	 * If the host supports indirect descriptor tables, and there is
> +	 * no space for direct buffers or there are multi-item scatterlists,
> +	 * go indirect.
> +	 */
> +	head = vq->free_head;
> +	if (vq->indirect && (nents > nsg || vq->vq.num_free < nents)) {
> +		if (vq->vq.num_free == 0)
> +			goto no_space;
> +
> +		desc = kmalloc(nents * sizeof(struct vring_desc), gfp);
> +		if (!desc)
> +			goto error;
> +
> +		/* We're about to use a buffer */
> +		vq->vq.num_free--;
> +
> +		/* Use a single buffer which doesn't continue */
> +		vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
> +		vq->vring.desc[head].addr = virt_to_phys(desc);
> +		vq->vring.desc[head].len = nents * sizeof(struct vring_desc);
> +
> +		/* Update free pointer */
> +		vq->free_head = vq->vring.desc[head].next;
> +	}
> +
> +	/* Set token. */
> +	vq->data[head] = data;
> +
> +	pr_debug("Started buffer head %i for %p\n", head, vq);
> +
> +	vq->indirect_base = desc;
> +	vq->tail = NULL;
> +	vq->head = head;
> +	return 0;
> +
> +no_space:
> +	ret = -ENOSPC;
> +error:
> +	pr_debug("Can't add buf (%d) - nents = %i, avail = %i\n",
> +		 ret, nents, vq->vq.num_free);
> +	END_USE(vq);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_start_buf);
> +
> +/**
> + * virtqueue_add_sg - add sglist to buffer being built
> + * @_vq: the virtqueue for which the buffer is being built
> + * @sgl: the description of the buffer(s).
> + * @nents: the number of items to process in sgl
> + * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
> + *
> + * Note that, unlike virtqueue_add_buf, this function follows chained
> + * scatterlists, and stops before the @nents-th item if a scatterlist item
> + * has a marker.
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).

Hmm so if you want to add in and out, need separate calls?
in_num/out_num would be nicer?


> + */
> +void virtqueue_add_sg(struct virtqueue *_vq,
> +		      struct scatterlist sgl[],
> +		      unsigned int nents,
> +		      enum dma_data_direction dir)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	unsigned int i, n;
> +	struct scatterlist *sg;
> +	struct vring_desc *tail;
> +	u32 flags;
> +
> +#ifdef DEBUG
> +	BUG_ON(!vq->in_use);
> +#endif
> +
> +	BUG_ON(dir != DMA_FROM_DEVICE && dir != DMA_TO_DEVICE);
> +	BUG_ON(nents == 0);
> +
> +	flags = dir == DMA_FROM_DEVICE ? VRING_DESC_F_WRITE : 0;
> +	flags |= VRING_DESC_F_NEXT;
> +
> +	/*
> +	 * If using indirect descriptor tables, fill in the buffers
> +	 * at vq->indirect_base.
> +	 */
> +	if (vq->indirect_base) {
> +		i = 0;
> +		if (likely(vq->tail))
> +			i = vq->tail - vq->indirect_base + 1;
> +
> +		for_each_sg(sgl, sg, nents, n) {
> +			tail = &vq->indirect_base[i];
> +			tail->flags = flags;
> +			tail->addr = sg_phys(sg);
> +			tail->len = sg->length;
> +			tail->next = ++i;
> +		}
> +	} else {
> +		BUG_ON(vq->vq.num_free < nents);
> +
> +		i = vq->free_head;
> +		for_each_sg(sgl, sg, nents, n) {
> +			tail = &vq->vring.desc[i];
> +			tail->flags = flags;
> +			tail->addr = sg_phys(sg);
> +			tail->len = sg->length;
> +			i = tail->next;
> +			vq->vq.num_free--;
> +		}
> +
> +		vq->free_head = i;
> +	}
> +	vq->tail = tail;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_add_sg);
> +
> +/**
> + * virtqueue_end_buf - expose buffer to other end
> + * @_vq: the virtqueue for which the buffer was built
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + */
> +void virtqueue_end_buf(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	unsigned int avail;
> +	int head = vq->head;
> +	struct vring_desc *tail = vq->tail;
> +
> +#ifdef DEBUG
> +	BUG_ON(!vq->in_use);
> +#endif
> +	BUG_ON(tail == NULL);
> +
> +	/* The last one does not have the next flag set.  */
> +	tail->flags &= ~VRING_DESC_F_NEXT;
> +
> +	/*
> +	 * Put entry in available array.  Descriptors and available array
> +	 * need to be set before we expose the new available array entries.
> +	 */
> +	avail = vq->vring.avail->idx & (vq->vring.num-1);
> +	vq->vring.avail->ring[avail] = head;
> +	virtio_wmb(vq);
> +
> +	vq->vring.avail->idx++;
> +	vq->num_added++;
> +
> +	/*
> +	 * This is very unlikely, but theoretically possible.  Kick
> +	 * just in case.
> +	 */
> +	if (unlikely(vq->num_added == (1 << 16) - 1))
> +		virtqueue_kick(&vq->vq);
> +
> +	pr_debug("Added buffer head %i to %p\n", head, vq);
> +	END_USE(vq);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_end_buf);
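To make the descriptor bookkeeping above concrete: virtqueue_add_sg chains every
entry it writes with VRING_DESC_F_NEXT, and virtqueue_end_buf clears that flag on
the final descriptor before publishing the head. A minimal userspace model of that
indirect-table logic follows; the types, flag values, and two-function split are
stand-ins for illustration, not the kernel implementation:

```c
#include <assert.h>
#include <stdint.h>

#define F_NEXT  1u  /* stand-in for VRING_DESC_F_NEXT */
#define F_WRITE 2u  /* stand-in for VRING_DESC_F_WRITE */

struct desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };

/* Append n entries to the chain in table[], starting at *tail_idx.
 * Mirrors the indirect branch of virtqueue_add_sg: every entry is
 * chained with F_NEXT, and direction decides F_WRITE per call. */
static void add_sg(struct desc *table, int *tail_idx,
                   const uint64_t *addrs, const uint32_t *lens,
                   int n, int from_device)
{
    uint16_t flags = (from_device ? F_WRITE : 0) | F_NEXT;
    for (int k = 0; k < n; k++) {
        struct desc *d = &table[*tail_idx];
        d->flags = flags;
        d->addr  = addrs[k];
        d->len   = lens[k];
        d->next  = (uint16_t)(*tail_idx + 1);
        (*tail_idx)++;
    }
}

/* Mirrors virtqueue_end_buf: the tail descriptor must not chain on. */
static void end_buf(struct desc *table, int tail_idx)
{
    table[tail_idx - 1].flags &= ~F_NEXT;
}
```

The piecewise usage this enables is one add_sg call per scatterlist
(e.g. request header, then data, then response footer), with end_buf
terminating whichever entry happened to come last.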
> +
>  static inline bool more_used(const struct vring_virtqueue *vq)
>  {
>  	return vq->last_used_idx != vq->vring.used->idx;
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index cf8adb1..43d6bc3 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -7,6 +7,7 @@
>  #include <linux/spinlock.h>
>  #include <linux/device.h>
>  #include <linux/mod_devicetable.h>
> +#include <linux/dma-direction.h>
>  #include <linux/gfp.h>
>  
>  /**
> @@ -40,6 +41,19 @@ int virtqueue_add_buf(struct virtqueue *vq,
>  		      void *data,
>  		      gfp_t gfp);
>  
> +int virtqueue_start_buf(struct virtqueue *_vq,
> +			void *data,
> +			unsigned int nents,
> +			unsigned int nsg,
> +			gfp_t gfp);
> +
> +void virtqueue_add_sg(struct virtqueue *_vq,
> +		      struct scatterlist sgl[],
> +		      unsigned int nents,
> +		      enum dma_data_direction dir);
> +
> +void virtqueue_end_buf(struct virtqueue *_vq);
> +
>  void virtqueue_kick(struct virtqueue *vq);
>  
>  bool virtqueue_kick_prepare(struct virtqueue *vq);
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 14:56   ` Michael S. Tsirkin
@ 2013-02-12 15:32     ` Paolo Bonzini
  2013-02-12 15:43       ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 15:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

Il 12/02/2013 15:56, Michael S. Tsirkin ha scritto:
>> > +/**
>> > + * virtqueue_start_buf - start building buffer for the other end
>> > + * @vq: the struct virtqueue we're talking about.
>> > + * @data: the token identifying the buffer.
>> > + * @nents: the number of buffers that will be added
> This function starts building one buffer, number of buffers
> is a bit weird here.

Ok.

>> > + * @nsg: the number of sg lists that will be added
> This means number of calls to add_sg ? Not sure why this matters.
> How about we pass in in_num/out_num - that is total # of sg,
> same as add_buf?

It is used to choose between direct and indirect.

>> > +/**
>> > + * virtqueue_add_sg - add sglist to buffer being built
>> > + * @_vq: the virtqueue for which the buffer is being built
>> > + * @sgl: the description of the buffer(s).
>> > + * @nents: the number of items to process in sgl
>> > + * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
>> > + *
>> > + * Note that, unlike virtqueue_add_buf, this function follows chained
>> > + * scatterlists, and stops before the @nents-th item if a scatterlist item
>> > + * has a marker.
>> > + *
>> > + * Caller must ensure we don't call this with other virtqueue operations
>> > + * at the same time (except where noted).
> Hmm so if you want to add in and out, need separate calls?
> in_num/out_num would be nicer?

If you want to add in and out just use virtqueue_add_buf...

Paolo



* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 15:32     ` Paolo Bonzini
@ 2013-02-12 15:43       ` Michael S. Tsirkin
  2013-02-12 15:48         ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12 15:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

On Tue, Feb 12, 2013 at 04:32:27PM +0100, Paolo Bonzini wrote:
> Il 12/02/2013 15:56, Michael S. Tsirkin ha scritto:
> >> > +/**
> >> > + * virtqueue_start_buf - start building buffer for the other end
> >> > + * @vq: the struct virtqueue we're talking about.
> >> > + * @data: the token identifying the buffer.
> >> > + * @nents: the number of buffers that will be added
> > This function starts building one buffer, number of buffers
> > is a bit weird here.
> 
> Ok.
> 
> >> > + * @nsg: the number of sg lists that will be added
> > This means number of calls to add_sg ? Not sure why this matters.
> > How about we pass in in_num/out_num - that is total # of sg,
> > same as add_buf?
> 
> It is used to choose between direct and indirect.

total number of in and out should be enough for this, no?

> >> > +/**
> >> > + * virtqueue_add_sg - add sglist to buffer being built
> >> > + * @_vq: the virtqueue for which the buffer is being built
> >> > + * @sgl: the description of the buffer(s).
> >> > + * @nents: the number of items to process in sgl
> >> > + * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
> >> > + *
> >> > + * Note that, unlike virtqueue_add_buf, this function follows chained
> >> > + * scatterlists, and stops before the @nents-th item if a scatterlist item
> >> > + * has a marker.
> >> > + *
> >> > + * Caller must ensure we don't call this with other virtqueue operations
> >> > + * at the same time (except where noted).
> > Hmm so if you want to add in and out, need separate calls?
> > in_num/out_num would be nicer?
> 
> If you want to add in and out just use virtqueue_add_buf...
> 
> Paolo

I thought the point of this one is maximum flexibility.


* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 15:43       ` Michael S. Tsirkin
@ 2013-02-12 15:48         ` Paolo Bonzini
  2013-02-12 16:13           ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 15:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

Il 12/02/2013 16:43, Michael S. Tsirkin ha scritto:
> On Tue, Feb 12, 2013 at 04:32:27PM +0100, Paolo Bonzini wrote:
>> Il 12/02/2013 15:56, Michael S. Tsirkin ha scritto:
>>>>> +/**
>>>>> + * virtqueue_start_buf - start building buffer for the other end
>>>>> + * @vq: the struct virtqueue we're talking about.
>>>>> + * @data: the token identifying the buffer.
>>>>> + * @nents: the number of buffers that will be added
>>> This function starts building one buffer, number of buffers
>>> is a bit weird here.
>>
>> Ok.
>>
>>>>> + * @nsg: the number of sg lists that will be added
>>> This means number of calls to add_sg ? Not sure why this matters.
>>> How about we pass in in_num/out_num - that is total # of sg,
>>> same as add_buf?
>>
>> It is used to choose between direct and indirect.
> 
> total number of in and out should be enough for this, no?

Originally, I used nsg/nents because I wanted to use mixed direct and
indirect buffers.  nsg/nents let me choose between full direct (nsg ==
nents), mixed (num_free >= nsg), full indirect (num_free < nsg).  Then I
had to give up because QEMU does not support it, but I still would like
to keep that open in the API.

In this series, however, I am still using nsg to choose between direct
and indirect.  I would like to use direct for small scatterlists, even
if they are surrounded by request/response headers/footers.
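For concreteness, the direct-vs-indirect choice virtqueue_start_buf makes in this
series can be restated as a small predicate; this is a userspace sketch of the
condition quoted later in the thread, with illustrative names, not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Condensed restatement of the choice in virtqueue_start_buf: go
 * indirect when the caller will pass more scatterlist entries than
 * scatterlists (nents > nsg, i.e. at least one multi-entry data
 * sglist), or when the ring lacks room for a direct chain
 * (num_free < nents). */
static bool use_indirect(bool has_indirect, unsigned nents,
                         unsigned nsg, unsigned num_free)
{
    return has_indirect && (nents > nsg || num_free < nents);
}
```

With a typical request shaped as header sg + one data sglist + footer sg,
nsg is 3 while nents tracks the data size, so small requests stay direct
and large ones go indirect.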

>>>>> +/**
>>>>> + * virtqueue_add_sg - add sglist to buffer being built
>>>>> + * @_vq: the virtqueue for which the buffer is being built
>>>>> + * @sgl: the description of the buffer(s).
>>>>> + * @nents: the number of items to process in sgl
>>>>> + * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
>>>>> + *
>>>>> + * Note that, unlike virtqueue_add_buf, this function follows chained
>>>>> + * scatterlists, and stops before the @nents-th item if a scatterlist item
>>>>> + * has a marker.
>>>>> + *
>>>>> + * Caller must ensure we don't call this with other virtqueue operations
>>>>> + * at the same time (except where noted).
>>> Hmm so if you want to add in and out, need separate calls?
>>> in_num/out_num would be nicer?
>>
>> If you want to add in and out just use virtqueue_add_buf...
> 
> I thought the point of this one is maximum flexibility.

Maximum flexibility does not include doing everything in one call (the
other way round in fact: you already need to wrap with start/end, hence
doing one or two extra add_sg calls is not important).

Paolo



* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 15:48         ` Paolo Bonzini
@ 2013-02-12 16:13           ` Michael S. Tsirkin
  2013-02-12 16:17             ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12 16:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

On Tue, Feb 12, 2013 at 04:48:39PM +0100, Paolo Bonzini wrote:
> Il 12/02/2013 16:43, Michael S. Tsirkin ha scritto:
> > On Tue, Feb 12, 2013 at 04:32:27PM +0100, Paolo Bonzini wrote:
> >> Il 12/02/2013 15:56, Michael S. Tsirkin ha scritto:
> >>>>> +/**
> >>>>> + * virtqueue_start_buf - start building buffer for the other end
> >>>>> + * @vq: the struct virtqueue we're talking about.
> >>>>> + * @data: the token identifying the buffer.
> >>>>> + * @nents: the number of buffers that will be added
> >>> This function starts building one buffer, number of buffers
> >>> is a bit weird here.
> >>
> >> Ok.
> >>
> >>>>> + * @nsg: the number of sg lists that will be added
> >>> This means number of calls to add_sg ? Not sure why this matters.
> >>> How about we pass in in_num/out_num - that is total # of sg,
> >>> same as add_buf?
> >>
> >> It is used to choose between direct and indirect.
> > 
> > total number of in and out should be enough for this, no?
> 
> Originally, I used nsg/nents because I wanted to use mixed direct and
> indirect buffers.  nsg/nents let me choose between full direct (nsg ==
> nents), mixed (num_free >= nsg), full indirect (num_free < nsg).  Then I
> had to give up because QEMU does not support it, but I still would like
> to keep that open in the API.

Problem is it does not seem to make sense in the API.

> In this series, however, I am still using nsg to choose between direct
>> and indirect.  I would like to use direct for small scatterlists, even
> if they are surrounded by a request/response headers/footers.

Shouldn't we base this on total number of s/g entries?
I don't see why does it matter how many calls you use
to build up the list.

> >>>>> +/**
> >>>>> + * virtqueue_add_sg - add sglist to buffer being built
> >>>>> + * @_vq: the virtqueue for which the buffer is being built
> >>>>> + * @sgl: the description of the buffer(s).
> >>>>> + * @nents: the number of items to process in sgl
> >>>>> + * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
> >>>>> + *
> >>>>> + * Note that, unlike virtqueue_add_buf, this function follows chained
> >>>>> + * scatterlists, and stops before the @nents-th item if a scatterlist item
> >>>>> + * has a marker.
> >>>>> + *
> >>>>> + * Caller must ensure we don't call this with other virtqueue operations
> >>>>> + * at the same time (except where noted).
> >>> Hmm so if you want to add in and out, need separate calls?
> >>> in_num/out_num would be nicer?
> >>
> >> If you want to add in and out just use virtqueue_add_buf...
> > 
> > I thought the point of this one is maximum flexibility.
> 
> Maximum flexibility does not include doing everything in one call (the
> other way round in fact: you already need to wrap with start/end, hence
> doing one or two extra add_sg calls is not important).
> 
> Paolo

My point is, we have exactly same number of parameters:
in + out instead of nsg + direction, and we get more
functionality.

-- 
MST


* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 16:13           ` Michael S. Tsirkin
@ 2013-02-12 16:17             ` Paolo Bonzini
  2013-02-12 16:35               ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 16:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

Il 12/02/2013 17:13, Michael S. Tsirkin ha scritto:
>>>>>>> + * @nsg: the number of sg lists that will be added
>>>>> This means number of calls to add_sg ? Not sure why this matters.
>>>>> How about we pass in in_num/out_num - that is total # of sg,
>>>>> same as add_buf?
>>>>
>>>> It is used to choose between direct and indirect.
>>>
>>> total number of in and out should be enough for this, no?
>>
>> Originally, I used nsg/nents because I wanted to use mixed direct and
>> indirect buffers.  nsg/nents let me choose between full direct (nsg ==
>> nents), mixed (num_free >= nsg), full indirect (num_free < nsg).  Then I
>> had to give up because QEMU does not support it, but I still would like
>> to keep that open in the API.
> 
> Problem is it does not seem to make sense in the API.

Why not?  Perhaps in the idea you have of the implementation, but in the
API it definitely makes sense.  It's a fast-path API, it makes sense to
provide as much information as possible upfront.

>> In this series, however, I am still using nsg to choose between direct
> >> and indirect.  I would like to use direct for small scatterlists, even
>> if they are surrounded by a request/response headers/footers.
> 
> Shouldn't we base this on total number of s/g entries?
> I don't see why does it matter how many calls you use
> to build up the list.

The idea is that in general the headers/footers are few (so their number
doesn't really matter) and are in singleton scatterlists.  Hence, the
heuristic checks at the data part of the request, and chooses
direct/indirect depending on the size of that part.

>>>>>>> +/**
>>>>>>> + * virtqueue_add_sg - add sglist to buffer being built
>>>>>>> + * @_vq: the virtqueue for which the buffer is being built
>>>>>>> + * @sgl: the description of the buffer(s).
>>>>>>> + * @nents: the number of items to process in sgl
>>>>>>> + * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
>>>>>>> + *
>>>>>>> + * Note that, unlike virtqueue_add_buf, this function follows chained
>>>>>>> + * scatterlists, and stops before the @nents-th item if a scatterlist item
>>>>>>> + * has a marker.
>>>>>>> + *
>>>>>>> + * Caller must ensure we don't call this with other virtqueue operations
>>>>>>> + * at the same time (except where noted).
>>>>> Hmm so if you want to add in and out, need separate calls?
>>>>> in_num/out_num would be nicer?
>>>>
>>>> If you want to add in and out just use virtqueue_add_buf...
>>>
>>> I thought the point of this one is maximum flexibility.
>>
>> Maximum flexibility does not include doing everything in one call (the
>> other way round in fact: you already need to wrap with start/end, hence
>> doing one or two extra add_sg calls is not important).
> 
> My point is, we have exactly same number of parameters:
> in + out instead of nsg + direction, and we get more
> functionality.

And we also have more complex (and slower) code, that would never be
used.  You would never save more than one call, because you cannot
alternate out and in buffers arbitrarily.  And if you look at how the
API is actually used in virtio-blk and virtio-scsi, building the buffers
"two-piecewise" doesn't fit the flow of the code.

Paolo


* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 16:17             ` Paolo Bonzini
@ 2013-02-12 16:35               ` Michael S. Tsirkin
  2013-02-12 16:57                 ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12 16:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

On Tue, Feb 12, 2013 at 05:17:47PM +0100, Paolo Bonzini wrote:
> Il 12/02/2013 17:13, Michael S. Tsirkin ha scritto:
> >>>>>>> + * @nsg: the number of sg lists that will be added
> >>>>> This means number of calls to add_sg ? Not sure why this matters.
> >>>>> How about we pass in in_num/out_num - that is total # of sg,
> >>>>> same as add_buf?
> >>>>
> >>>> It is used to choose between direct and indirect.
> >>>
> >>> total number of in and out should be enough for this, no?
> >>
> >> Originally, I used nsg/nents because I wanted to use mixed direct and
> >> indirect buffers.  nsg/nents let me choose between full direct (nsg ==
> >> nents), mixed (num_free >= nsg), full indirect (num_free < nsg).  Then I
> >> had to give up because QEMU does not support it, but I still would like
> >> to keep that open in the API.
> > 
> > Problem is it does not seem to make sense in the API.
> 
> Why not?  Perhaps in the idea you have of the implementation, but in the
> API it definitely makes sense.  It's a fast-path API, it makes sense to
> provide as much information as possible upfront.

If we are ignoring some information, I think we are better off
without asking for it.

> >> In this series, however, I am still using nsg to choose between direct
> >> and indirect.  I would like to use direct for small scatterlists, even
> >> if they are surrounded by a request/response headers/footers.
> > 
> > Shouldn't we base this on total number of s/g entries?
> > I don't see why does it matter how many calls you use
> > to build up the list.
> 
> The idea is that in general the headers/footers are few (so their number
> doesn't really matter) and are in singleton scatterlists.  Hence, the
> heuristic checks at the data part of the request, and chooses
> direct/indirect depending on the size of that part.

Why? Why not the total size as we did before?

> >>>>>>> +/**
> >>>>>>> + * virtqueue_add_sg - add sglist to buffer being built
> >>>>>>> + * @_vq: the virtqueue for which the buffer is being built
> >>>>>>> + * @sgl: the description of the buffer(s).
> >>>>>>> + * @nents: the number of items to process in sgl
> >>>>>>> + * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
> >>>>>>> + *
> >>>>>>> + * Note that, unlike virtqueue_add_buf, this function follows chained
> >>>>>>> + * scatterlists, and stops before the @nents-th item if a scatterlist item
> >>>>>>> + * has a marker.
> >>>>>>> + *
> >>>>>>> + * Caller must ensure we don't call this with other virtqueue operations
> >>>>>>> + * at the same time (except where noted).
> >>>>> Hmm so if you want to add in and out, need separate calls?
> >>>>> in_num/out_num would be nicer?
> >>>>
> >>>> If you want to add in and out just use virtqueue_add_buf...
> >>>
> >>> I thought the point of this one is maximum flexibility.
> >>
> >> Maximum flexibility does not include doing everything in one call (the
> >> other way round in fact: you already need to wrap with start/end, hence
> >> doing one or two extra add_sg calls is not important).
> > 
> > My point is, we have exactly same number of parameters:
> > in + out instead of nsg + direction, and we get more
> > functionality.
> 
> And we also have more complex (and slower) code, that would never be
> used.

Instead of 
	flags = (direction == from_device) ? out : in;

you would do

	flags = idx > in ? out : in;

why is this slower?


> You would never save more than one call, because you cannot
> alternate out and in buffers arbitrarily.

That's the problem with the API: it apparently lets you do this, and
if you do, it will fail at run time.  If we specify in/out upfront in
start, there's no way to misuse the API.

>  And if you look at how the
> API is actually used in virtio-blk and virtio-scsi, building the buffers
> "two-piecewise" doesn't fit the flow of the code.
> 
> Paolo


-- 
MST


* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 16:35               ` Michael S. Tsirkin
@ 2013-02-12 16:57                 ` Paolo Bonzini
  2013-02-12 17:34                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 16:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

Il 12/02/2013 17:35, Michael S. Tsirkin ha scritto:
> On Tue, Feb 12, 2013 at 05:17:47PM +0100, Paolo Bonzini wrote:
>> Il 12/02/2013 17:13, Michael S. Tsirkin ha scritto:
>>>>>>>>> + * @nsg: the number of sg lists that will be added
>>>>>>> This means number of calls to add_sg ? Not sure why this matters.
>>>>>>> How about we pass in in_num/out_num - that is total # of sg,
>>>>>>> same as add_buf?
>>>>>>
>>>>>> It is used to choose between direct and indirect.
>>>>>
>>>>> total number of in and out should be enough for this, no?
>>>>
>>>> Originally, I used nsg/nents because I wanted to use mixed direct and
>>>> indirect buffers.  nsg/nents let me choose between full direct (nsg ==
>>>> nents), mixed (num_free >= nsg), full indirect (num_free < nsg).  Then I
>>>> had to give up because QEMU does not support it, but I still would like
>>>> to keep that open in the API.
>>>
>>> Problem is it does not seem to make sense in the API.
>>
>> Why not?  Perhaps in the idea you have of the implementation, but in the
>> API it definitely makes sense.  It's a fast-path API, it makes sense to
>> provide as much information as possible upfront.
> 
> If we are ignoring some information, I think we are better off
> without asking for it.

We're not ignoring it.  virtqueue_start_buf uses both nents and nsg:

        if (vq->indirect && (nents > nsg || vq->vq.num_free < nents)) {
		/* indirect */
	}

>>>> In this series, however, I am still using nsg to choose between direct
>>>> and indirect.  I would like to use direct for small scatterlists, even
>>>> if they are surrounded by a request/response headers/footers.
>>>
>>> Shouldn't we base this on total number of s/g entries?
>>> I don't see why does it matter how many calls you use
>>> to build up the list.
>>
>> The idea is that in general the headers/footers are few (so their number
>> doesn't really matter) and are in singleton scatterlists.  Hence, the
>> heuristic checks at the data part of the request, and chooses
>> direct/indirect depending on the size of that part.
> 
> Why? Why not the total size as we did before?

"More than one buffer" is not a great heuristic.  In particular, it
causes all virtio-blk and virtio-scsi requests to go indirect.

More than three buffers, or more than five buffers, is just an ad-hoc
hack, and similarly not great.

>>>>>>>>> +/**
>>>>>>>>> + * virtqueue_add_sg - add sglist to buffer being built
>>>>>>>>> + * @_vq: the virtqueue for which the buffer is being built
>>>>>>>>> + * @sgl: the description of the buffer(s).
>>>>>>>>> + * @nents: the number of items to process in sgl
>>>>>>>>> + * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
>>>>>>>>> + *
>>>>>>>>> + * Note that, unlike virtqueue_add_buf, this function follows chained
>>>>>>>>> + * scatterlists, and stops before the @nents-th item if a scatterlist item
>>>>>>>>> + * has a marker.
>>>>>>>>> + *
>>>>>>>>> + * Caller must ensure we don't call this with other virtqueue operations
>>>>>>>>> + * at the same time (except where noted).
>>>>>>> Hmm so if you want to add in and out, need separate calls?
>>>>>>> in_num/out_num would be nicer?
>>>>>>
>>>>>> If you want to add in and out just use virtqueue_add_buf...
>>>>>
>>>>> I thought the point of this one is maximum flexibility.
>>>>
>>>> Maximum flexibility does not include doing everything in one call (the
>>>> other way round in fact: you already need to wrap with start/end, hence
>>>> doing one or two extra add_sg calls is not important).
>>>
>>> My point is, we have exactly same number of parameters:
>>> in + out instead of nsg + direction, and we get more
>>> functionality.
>>
>> And we also have more complex (and slower) code, that would never be
>> used.
> 
> Instead of 
> 	flags = (direction == from_device) ? out : in;
> 
> you would do
> 
> 	flags = idx > in ? out : in;
> 
> why is this slower?

You said "in + out instead of nsg + direction", but now instead you're
talking about specifying in/out upfront in virtqueue_start_buf.

Specifying in/out in virtqueue_add_sg would have two loops instead of
one, one of them (you don't know which) unused on every call, and
wouldn't fix the problem of possibly misusing the API.

Specifying in/out upfront would look something like

	flags = vq->idx > vq->in ? VRING_DESC_F_WRITE : 0;

or with some optimization

	flags = vq->something > 0 ? VRING_DESC_F_WRITE : 0;

It is not clear to me whether you'd allow a single virtqueue_add_sg to
cover both out and in elements.  If so, the function would become much
more complex because the flags could change in the middle, and that's
what I was referring to.  If not, you have traded one possible misuse
for another.
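For what it's worth, the two flag computations under discussion differ only in
what they key on; a minimal userspace side-by-side, where F_WRITE is a stand-in
constant and the function names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F_WRITE 2u  /* stand-in for VRING_DESC_F_WRITE */

/* Per-call direction, as in this series' virtqueue_add_sg(): one
 * evaluation per call, uniform across the call's descriptors. */
static uint16_t flags_by_direction(bool from_device)
{
    return from_device ? F_WRITE : 0;
}

/* Per-descriptor position, as in the in/out-upfront proposal: the
 * first out_num descriptors are device-readable, the rest writable. */
static uint16_t flags_by_position(unsigned idx, unsigned out_num)
{
    return idx >= out_num ? F_WRITE : 0;
}
```

flags_by_direction is computed once per virtqueue_add_sg call, while
flags_by_position must be checked per descriptor against the out/in
boundary carried in the virtqueue state.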

>> You would never save more than one call, because you cannot
>> alternate out and in buffers arbitrarily.
> 
> That's the problem with the API: it apparently lets you do this, and
> if you do, it will fail at run time.  If we specify in/out upfront in
> start, there's no way to misuse the API.

Perhaps, but 3 or 4 arguments (in/out/nsg or in/out/nsg_in/nsg_out) just
for this are definitely too many and make the API harder to use.

You have to find a balance.  Having actually used the API, the
possibility of mixing in/out buffers by mistake never even occurred to
me, much less happened in practice, so I didn't consider it a problem.
Mixing in/out buffers in a single call wasn't a necessity, either.

Paolo


* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 16:57                 ` Paolo Bonzini
@ 2013-02-12 17:34                   ` Michael S. Tsirkin
  2013-02-12 18:04                     ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12 17:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

On Tue, Feb 12, 2013 at 05:57:55PM +0100, Paolo Bonzini wrote:
> Il 12/02/2013 17:35, Michael S. Tsirkin ha scritto:
> > On Tue, Feb 12, 2013 at 05:17:47PM +0100, Paolo Bonzini wrote:
> >> Il 12/02/2013 17:13, Michael S. Tsirkin ha scritto:
> >>>>>>>>> + * @nsg: the number of sg lists that will be added
> >>>>>>> This means number of calls to add_sg ? Not sure why this matters.
> >>>>>>> How about we pass in in_num/out_num - that is total # of sg,
> >>>>>>> same as add_buf?
> >>>>>>
> >>>>>> It is used to choose between direct and indirect.
> >>>>>
> >>>>> total number of in and out should be enough for this, no?
> >>>>
> >>>> Originally, I used nsg/nents because I wanted to use mixed direct and
> >>>> indirect buffers.  nsg/nents let me choose between full direct (nsg ==
> >>>> nents), mixed (num_free >= nsg), full indirect (num_free < nsg).  Then I
> >>>> had to give up because QEMU does not support it, but I still would like
> >>>> to keep that open in the API.
> >>>
> >>> Problem is it does not seem to make sense in the API.
> >>
> >> Why not?  Perhaps in the idea you have of the implementation, but in the
> >> API it definitely makes sense.  It's a fast-path API, it makes sense to
> >> provide as much information as possible upfront.
> > 
> > If we are ignoring some information, I think we are better off
> > without asking for it.
> 
> We're not ignoring it.  virtqueue_start_buf uses both nents and nsg:
> 
>         if (vq->indirect && (nents > nsg || vq->vq.num_free < nents)) {
> 		/* indirect */
> 	}
> >>>> In this series, however, I am still using nsg to choose between direct
> >>>> and indirect.  I would like to use direct for small scatterlists, even
> >>>> if they are surrounded by a request/response headers/footers.
> >>>
> >>> Shouldn't we base this on total number of s/g entries?
> >>> I don't see why does it matter how many calls you use
> >>> to build up the list.
> >>
> >> The idea is that in general the headers/footers are few (so their number
> >> doesn't really matter) and are in singleton scatterlists.  Hence, the
> >> heuristic checks at the data part of the request, and chooses
> >> direct/indirect depending on the size of that part.
> > 
> > Why? Why not the total size as we did before?
> 
> "More than one buffer" is not a great heuristic.  In particular, it
> causes all virtio-blk and virtio-scsi requests to go indirect.

If you don't do indirect you get at least 2x less space in the ring.
For blk there were workloads where we always were out of buffers.
Similarly for net, switching heuristics degrades some workloads.
Let's not change these things as part of unrelated API work,
it should be a separate patch with benchmarking showing this
is not a problem.
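The space argument is easy to quantify: a direct request occupies one ring
descriptor per scatterlist entry, while an indirect request occupies a single
ring slot whatever its length (the descriptor table itself lives in separately
allocated memory). A back-of-the-envelope helper, purely illustrative:

```c
#include <assert.h>

/* Requests of nents entries each that fit in a ring of ring_size
 * descriptors, direct vs indirect. */
static unsigned max_requests_direct(unsigned ring_size, unsigned nents)
{
    return ring_size / nents;  /* one ring slot per sg entry */
}

static unsigned max_requests_indirect(unsigned ring_size)
{
    return ring_size;          /* one ring slot per request */
}
```

E.g. with a 256-entry ring and 3-descriptor virtio-blk requests, direct fits
85 in-flight requests against 256 for indirect, which is the "at least 2x less
space" being referred to.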

> More than three buffers, or more than five buffers, is just an ad-hoc
> hack, and similarly not great.

If you want to expose control over indirect buffer to drivers,
we can do this. There were patches on list. How about doing that
and posting actual performance results?  In particular maybe this is
where all the performance wins come from?  This nsgs/nents hack just
seems to rely on how one specific driver uses the API.

> >>>>>>>>> +/**
> >>>>>>>>> + * virtqueue_add_sg - add sglist to buffer being built
> >>>>>>>>> + * @_vq: the virtqueue for which the buffer is being built
> >>>>>>>>> + * @sgl: the description of the buffer(s).
> >>>>>>>>> + * @nents: the number of items to process in sgl
> >>>>>>>>> + * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
> >>>>>>>>> + *
> >>>>>>>>> + * Note that, unlike virtqueue_add_buf, this function follows chained
> >>>>>>>>> + * scatterlists, and stops before the @nents-th item if a scatterlist item
> >>>>>>>>> + * has a marker.
> >>>>>>>>> + *
> >>>>>>>>> + * Caller must ensure we don't call this with other virtqueue operations
> >>>>>>>>> + * at the same time (except where noted).
> >>>>>>> Hmm so if you want to add in and out, need separate calls?
> >>>>>>> in_num/out_num would be nicer?
> >>>>>>
> >>>>>> If you want to add in and out just use virtqueue_add_buf...
> >>>>>
> >>>>> I thought the point of this one is maximum flexibility.
> >>>>
> >>>> Maximum flexibility does not include doing everything in one call (the
> >>>> other way round in fact: you already need to wrap with start/end, hence
> >>>> doing one or two extra add_sg calls is not important).
> >>>
> >>> My point is, we have exactly same number of parameters:
> >>> in + out instead of nsg + direction, and we get more
> >>> functionality.
> >>
> >> And we also have more complex (and slower) code, that would never be
> >> used.
> > 
> > Instead of 
> > 	flags = (direction == from_device) ? out : in;
> > 
> > you would do
> > 
> > 	flags = idx > in ? out : in;
> > 
> > why is this slower?
> 
> You said "in + out instead of nsg + direction", but now instead you're
> talking about specifying in/out upfront in virtqueue_start_buf.
> 
> Specifying in/out in virtqueue_add_sg would have two loops instead of
> one, one of them (you don't know which) unused on every call, and
> wouldn't fix the problem of possibly misusing the API.

One loop, and it also lets us avoid setting VRING_DESC_F_NEXT
and then clearing it later:

+               for_each_sg(sgl, sg, nents, n) {

+       		flags = idx > in_sg ? VRING_DESC_F_WRITE : 0;
+       		flags |= idx < (in_sg + out_sg - 1) ? VRING_DESC_F_NEXT : 0;
+                       tail = &vq->indirect_base[i];
+                       tail->flags = flags;
+                       tail->addr = sg_phys(sg);
+                       tail->len = sg->length;
+                       tail->next = ++i;
+               }                           



> Specifying in/out upfront would look something like
> 
> 	flags = vq->idx > vq->in ? VRING_DESC_F_WRITE : 0;
> 
> or with some optimization
> 
> 	flags = vq->something > 0 ? VRING_DESC_F_WRITE : 0;
> 
> It is not clear for me whether you'd allow a single virtqueue_add_sg to
> cover both out and in elements.  If so, the function would become much
> more complex because the flags could change in the middle, and that's
> what I was referring to.

You just move the flag assignment within the loop.
Does not seem more complex at all.

>  If not, you traded one possible misuse with another.
> 
> >> You would never save more than one call, because you cannot
> >> alternate out and in buffers arbitrarily.
> > 
> > That's the problem with the API, it apparently let you do this, and
> > if you do it will fail at run time.  If we specify in/out upfront in
> > start, there's no way to misuse the API.
> 
> Perhaps, but 3 or 4 arguments (in/out/nsg or in/out/nsg_in/nsg_out) just
> for this are definitely too many and make the API harder to use.
> 
> You have to find a balance.  Having actually used the API, the
> possibility of mixing in/out buffers by mistake never even occurred to
> me, much less happened in practice, so I didn't consider it a problem.
> Mixing in/out buffers in a single call wasn't a necessity, either.
> 
> Paolo

It is useful for virtqueue_add_buf implementation.
Basically the more consistent the interface is with virtqueue_add_buf,
the better.

I'm not against changing virtqueue_add_buf if you like but let's keep
it all consistent.

-- 
MST

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 12:23 ` [PATCH 1/9] virtio: add functions for piecewise addition of buffers Paolo Bonzini
  2013-02-12 14:56   ` Michael S. Tsirkin
@ 2013-02-12 18:03   ` Paolo Bonzini
  1 sibling, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 18:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: mst, gaowanlong, asias, rusty, kvm, virtualization

virtio device drivers translate requests from higher layers in two steps:
a device-specific step in the device driver, and generic preparation
of virtio direct or indirect buffers in virtqueue_add_buf.  Because
virtqueue_add_buf also accepts the outcome of the first step as a single
struct scatterlist, drivers may need to put additional items at the
front or back of the data scatterlists before handing them to virtqueue_add_buf.
Because of this, virtio-scsi has to copy each request into a scatterlist
internal to the driver.  It cannot just use the one that was prepared
by the upper SCSI layers.

On top of this, virtqueue_add_buf also has the limitation of not
supporting chained scatterlists: the buffers must be provided as an
array of struct scatterlist.  Chained scatterlists, though not supported
on all architectures, would help for virtio-scsi where all additional
items are placed at the front.

This patch adds a different set of APIs for adding a buffer to a virtqueue.
The new API lets you pass the buffers piecewise, wrapping multiple calls
to virtqueue_add_sg between virtqueue_start_buf and virtqueue_end_buf.
virtio-scsi can then call virtqueue_add_sg 3 or 4 times: for the request
header, for the write buffer (if present), for the response header, and
finally for the read buffer (again if present).  It saves the copying
and the related locking.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/virtio/virtio_ring.c |  211 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |   14 +++
 2 files changed, 225 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ffd7e7d..eede8a7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -101,6 +101,10 @@ struct vring_virtqueue
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
+	/* State between virtqueue_start_buf and virtqueue_end_buf.  */
+	int head;
+	struct vring_desc *indirect_base, *tail;
+
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	void (*notify)(struct virtqueue *vq);
 
@@ -394,6 +398,213 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 	vq->vq.num_free++;
 }
 
+/**
+ * virtqueue_start_buf - start building buffer for the other end
+ * @_vq: the struct virtqueue we're talking about.
+ * @data: the token identifying the buffer.
+ * @nents: the total number of scatterlist entries that will be added
+ * @nsg: the number of scatterlists (virtqueue_add_sg calls) that will be added
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted), and that a successful call is
+ * followed by one or more calls to virtqueue_add_sg, and finally a call
+ * to virtqueue_end_buf.
+ *
+ * Returns zero or a negative error (i.e. -ENOSPC).
+ */
+int virtqueue_start_buf(struct virtqueue *_vq,
+			void *data,
+			unsigned int nents,
+			unsigned int nsg,
+			gfp_t gfp)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_desc *desc = NULL;
+	int head;
+	int ret = -ENOMEM;
+
+	START_USE(vq);
+
+	BUG_ON(data == NULL);
+
+#ifdef DEBUG
+	{
+		ktime_t now = ktime_get();
+
+		/* No kick or get, with .1 second between?  Warn. */
+		if (vq->last_add_time_valid)
+			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+					    > 100);
+		vq->last_add_time = now;
+		vq->last_add_time_valid = true;
+	}
+#endif
+
+	BUG_ON(nents < nsg);
+	BUG_ON(nsg == 0);
+
+	/*
+	 * If the host supports indirect descriptor tables, and there are
+	 * multiple buffers, go indirect.  More sophisticated heuristics
+	 * could be used instead.
+	 */
+	head = vq->free_head;
+	if (vq->indirect && nents > 1) {
+		if (vq->vq.num_free == 0)
+			goto no_space;
+
+		desc = kmalloc(nents * sizeof(struct vring_desc), gfp);
+		if (!desc)
+			goto error;
+
+		/* We're about to use a buffer */
+		vq->vq.num_free--;
+
+		/* Use a single buffer which doesn't continue */
+		vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
+		vq->vring.desc[head].addr = virt_to_phys(desc);
+		vq->vring.desc[head].len = nents * sizeof(struct vring_desc);
+
+		/* Update free pointer */
+		vq->free_head = vq->vring.desc[head].next;
+	}
+
+	/* Set token. */
+	vq->data[head] = data;
+
+	pr_debug("Started buffer head %i for %p\n", head, vq);
+
+	vq->indirect_base = desc;
+	vq->tail = NULL;
+	vq->head = head;
+	return 0;
+
+no_space:
+	ret = -ENOSPC;
+error:
+	pr_debug("Can't add buf (%d) - nents = %i, avail = %i\n",
+		 ret, nents, vq->vq.num_free);
+	END_USE(vq);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtqueue_start_buf);
+
+/**
+ * virtqueue_add_sg - add sglist to buffer being built
+ * @_vq: the virtqueue for which the buffer is being built
+ * @sgl: the description of the buffer(s).
+ * @nents: the number of items to process in sgl
+ * @dir: whether the sgl is read or written (DMA_TO_DEVICE/DMA_FROM_DEVICE only)
+ *
+ * Note that, unlike virtqueue_add_buf, this function follows chained
+ * scatterlists, and stops before the @nents-th item if a scatterlist item
+ * has a marker.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ */
+void virtqueue_add_sg(struct virtqueue *_vq,
+		      struct scatterlist sgl[],
+		      unsigned int nents,
+		      enum dma_data_direction dir)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int i, n;
+	struct scatterlist *sg;
+	struct vring_desc *tail;
+	u32 flags;
+
+#ifdef DEBUG
+	BUG_ON(!vq->in_use);
+#endif
+
+	BUG_ON(dir != DMA_FROM_DEVICE && dir != DMA_TO_DEVICE);
+	BUG_ON(nents == 0);
+
+	flags = dir == DMA_FROM_DEVICE ? VRING_DESC_F_WRITE : 0;
+	flags |= VRING_DESC_F_NEXT;
+
+	/*
+	 * If using indirect descriptor tables, fill in the buffers
+	 * at vq->indirect_base.
+	 */
+	if (vq->indirect_base) {
+		i = 0;
+		if (likely(vq->tail))
+			i = vq->tail - vq->indirect_base + 1;
+
+		for_each_sg(sgl, sg, nents, n) {
+			tail = &vq->indirect_base[i];
+			tail->flags = flags;
+			tail->addr = sg_phys(sg);
+			tail->len = sg->length;
+			tail->next = ++i;
+		}
+	} else {
+		BUG_ON(vq->vq.num_free < nents);
+
+		i = vq->free_head;
+		for_each_sg(sgl, sg, nents, n) {
+			tail = &vq->vring.desc[i];
+			tail->flags = flags;
+			tail->addr = sg_phys(sg);
+			tail->len = sg->length;
+			i = tail->next;
+			vq->vq.num_free--;
+		}
+
+		vq->free_head = i;
+	}
+	vq->tail = tail;
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_sg);
+
+/**
+ * virtqueue_end_buf - expose buffer to other end
+ * @_vq: the virtqueue for which the buffer was built
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ */
+void virtqueue_end_buf(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int avail;
+	int head = vq->head;
+	struct vring_desc *tail = vq->tail;
+
+#ifdef DEBUG
+	BUG_ON(!vq->in_use);
+#endif
+	BUG_ON(tail == NULL);
+
+	/* The last one does not have the next flag set.  */
+	tail->flags &= ~VRING_DESC_F_NEXT;
+
+	/*
+	 * Put entry in available array.  Descriptors and available array
+	 * need to be set before we expose the new available array entries.
+	 */
+	avail = vq->vring.avail->idx & (vq->vring.num-1);
+	vq->vring.avail->ring[avail] = head;
+	virtio_wmb(vq);
+
+	vq->vring.avail->idx++;
+	vq->num_added++;
+
+	/*
+	 * This is very unlikely, but theoretically possible.  Kick
+	 * just in case.
+	 */
+	if (unlikely(vq->num_added == (1 << 16) - 1))
+		virtqueue_kick(&vq->vq);
+
+	pr_debug("Added buffer head %i to %p\n", head, vq);
+	END_USE(vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_end_buf);
+
 static inline bool more_used(const struct vring_virtqueue *vq)
 {
 	return vq->last_used_idx != vq->vring.used->idx;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index cf8adb1..b5c2c12 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -7,6 +7,7 @@
 #include <linux/spinlock.h>
 #include <linux/device.h>
 #include <linux/mod_devicetable.h>
+#include <linux/dma-direction.h>
 #include <linux/gfp.h>
 
 /**
@@ -40,6 +41,19 @@ int virtqueue_add_buf(struct virtqueue *vq,
 		      void *data,
 		      gfp_t gfp);
 
+int virtqueue_start_buf(struct virtqueue *_vq,
+			void *data,
+			unsigned int nents,
+			unsigned int nsg,
+			gfp_t gfp);
+
+void virtqueue_add_sg(struct virtqueue *_vq,
+		      struct scatterlist sgl[],
+		      unsigned int nents,
+		      enum dma_data_direction dir);
+
+void virtqueue_end_buf(struct virtqueue *_vq);
+
 void virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
1.7.1



* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 17:34                   ` Michael S. Tsirkin
@ 2013-02-12 18:04                     ` Paolo Bonzini
  2013-02-12 18:23                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 18:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

Il 12/02/2013 18:34, Michael S. Tsirkin ha scritto:
> On Tue, Feb 12, 2013 at 05:57:55PM +0100, Paolo Bonzini wrote:
>> Il 12/02/2013 17:35, Michael S. Tsirkin ha scritto:
>>> On Tue, Feb 12, 2013 at 05:17:47PM +0100, Paolo Bonzini wrote:
>>>> Il 12/02/2013 17:13, Michael S. Tsirkin ha scritto:
>>>>>> In this series, however, I am still using nsg to choose between direct
> >>>>>> and indirect.  I would like to use direct for small scatterlists, even
>>>>>> if they are surrounded by a request/response headers/footers.
>>>>>
>>>>> Shouldn't we base this on total number of s/g entries?
> >>>>> I don't see why it matters how many calls you use
>>>>> to build up the list.
>>>>
>>>> The idea is that in general the headers/footers are few (so their number
>>>> doesn't really matter) and are in singleton scatterlists.  Hence, the
>>>> heuristic checks at the data part of the request, and chooses
>>>> direct/indirect depending on the size of that part.
>>>
>>> Why? Why not the total size as we did before?
>>
>> "More than one buffer" is not a great heuristic.  In particular, it
>> causes all virtio-blk and virtio-scsi requests to go indirect.
> 
> If you don't do indirect you get at least 2x less space in the ring.
> For blk there were workloads where we always were out of buffers.

The heuristic is very conservative actually, and doesn't really get even
close to running out of buffers.  You can see that in the single-queue results:

# of targets    single-queue
1                  540
2                  795
4                  997
8                 1136
16                1440
24                1408
32                1515

These are with the patched code; if queue space was a problem, you would
see much worse performance as you increase the number of targets (and
benchmark threads).  These are for virtio-scsi, that puts all disks on a
single request queue.

> Similarly for net, switching heuristics degrades some workloads.

Net is not affected.  The mergeable-buffers case obviously always uses
direct, and for the others you can check that the direct/indirect choice
stays exactly as in the old scheme.  (Which is why I liked this
heuristic, too).

> Let's not change these things as part of unrelated API work,
> it should be a separate patch with benchmarking showing this
> is not a problem.

The change only happens for virtio-blk and virtio-scsi.  I benchmarked
it, if somebody found different results it would be easily bisectable.

>> More than three buffers, or more than five buffers, is just an ad-hoc
>> hack, and similarly not great.
> 
> If you want to expose control over indirect buffers to drivers,
> we can do this. There were patches on the list. How about doing that
> and posting actual performance results?  In particular maybe this is
> where all the performance wins come from?

No, it's not.  Code that was high-ish in the profile disappears
(completely, not just from the profile :)).  But it's not just
performance wins, it's also simplified code and locking, so it would be
worthwhile even with no performance win.

Anyhow, I've sent v2 of this patch with the old heuristic.  It's a
one-line change, it's fine.

>>>> And we also have more complex (and slower) code, that would never be
>>>> used.
>>>
>>> Instead of 
>>> 	flags = (direction == from_device) ? out : in;
>>>
>>> you would do
>>>
>>> 	flags = idx > in ? out : in;
>>>
>>> why is this slower?
>>
>> You said "in + out instead of nsg + direction", but now instead you're
>> talking about specifying in/out upfront in virtqueue_start_buf.
>>
>> Specifying in/out in virtqueue_add_sg would have two loops instead of
>> one, one of them (you don't know which) unused on every call, and
>> wouldn't fix the problem of possibly misusing the API.
> 
> One loop, and it also lets us avoid setting VRING_DESC_F_NEXT
> and then clearing it later:
> 
> +               for_each_sg(sgl, sg, nents, n) {
> 
> +       		flags = idx > in_sg ? VRING_DESC_F_WRITE : 0;
> +       		flags |= idx < (in_sg + out_sg - 1) ? VRING_DESC_F_NEXT : 0;
> +                       tail = &vq->indirect_base[i];
> +                       tail->flags = flags;
> +                       tail->addr = sg_phys(sg);
> +                       tail->len = sg->length;
> +                       tail->next = ++i;
> +               }                           
> 

And slower it becomes.

>>  If not, you traded one possible misuse with another.
>>
>>>> You would never save more than one call, because you cannot
>>>> alternate out and in buffers arbitrarily.
>>>
>>> That's the problem with the API, it apparently let you do this, and
>>> if you do it will fail at run time.  If we specify in/out upfront in
>>> start, there's no way to misuse the API.
>>
>> Perhaps, but 3 or 4 arguments (in/out/nsg or in/out/nsg_in/nsg_out) just
>> for this are definitely too many and make the API harder to use.
>>
>> You have to find a balance.  Having actually used the API, the
>> possibility of mixing in/out buffers by mistake never even occurred to
>> me, much less happened in practice, so I didn't consider it a problem.
>> Mixing in/out buffers in a single call wasn't a necessity, either.
> 
> It is useful for virtqueue_add_buf implementation.

        ret = virtqueue_start_buf(vq, data, out + in, !!out + !!in,
				  gfp);
        if (ret < 0)
                return ret;

        if (out)
                virtqueue_add_sg(vq, sg, out, DMA_TO_DEVICE);
        if (in)
                virtqueue_add_sg(vq, sg + out, in, DMA_FROM_DEVICE);

        virtqueue_end_buf(vq);
	return 0;

How can it be simpler and easier to understand than that?

> Basically the more consistent the interface is with virtqueue_add_buf,
> the better.

The interface is consistent with virtqueue_add_buf_single, where out/in
clearly doesn't make sense.

virtqueue_add_buf and virtqueue_add_sg are very different, despite the
similar name.

> I'm not against changing virtqueue_add_buf if you like but let's keep
> it all consistent.

How can you change virtqueue_add_buf?

Paolo



* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 18:04                     ` Paolo Bonzini
@ 2013-02-12 18:23                       ` Michael S. Tsirkin
  2013-02-12 20:08                         ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12 18:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Wanlong Gao, asias, Rusty Russell, kvm, virtualization

On Tue, Feb 12, 2013 at 07:04:27PM +0100, Paolo Bonzini wrote:
> >> Perhaps, but 3 or 4 arguments (in/out/nsg or in/out/nsg_in/nsg_out) just
> >> for this are definitely too many and make the API harder to use.
> >>
> >> You have to find a balance.  Having actually used the API, the
> >> possibility of mixing in/out buffers by mistake never even occurred to
> >> me, much less happened in practice, so I didn't consider it a problem.
> >> Mixing in/out buffers in a single call wasn't a necessity, either.
> > 
> > It is useful for virtqueue_add_buf implementation.
> 
>         ret = virtqueue_start_buf(vq, data, out + in, !!out + !!in,
> 				  gfp);
>         if (ret < 0)
>                 return ret;
> 
>         if (out)
>                 virtqueue_add_sg(vq, sg, out, DMA_TO_DEVICE);
>         if (in)
>                 virtqueue_add_sg(vq, sg + out, in, DMA_FROM_DEVICE);
> 
>         virtqueue_end_buf(vq);
> 	return 0;
> 
> How can it be simpler and easier to understand than that?

Like this:

         ret = virtqueue_start_buf(vq, data, in, out, gfp);
         if (ret < 0)
                 return ret;
 
         virtqueue_add_sg(vq, sg, in, out);
 
         virtqueue_end_buf(vq);


> > Basically the more consistent the interface is with virtqueue_add_buf,
> > the better.
> 
> The interface is consistent with virtqueue_add_buf_single, where out/in
> clearly doesn't make sense.

Hmm, we could make virtqueue_add_buf_single consistent by giving it 'bool in'.

> virtqueue_add_buf and virtqueue_add_sg are very different, despite the
> similar name.

True. The similarity is between _start and _add_buf.
And this is confusing too. Maybe this means
_start and _add_sg should be renamed.

> > I'm not against changing virtqueue_add_buf if you like but let's keep
> > it all consistent.
> 
> How can you change virtqueue_add_buf?

Dunno.

-- 
MST


* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 18:23                       ` Michael S. Tsirkin
@ 2013-02-12 20:08                         ` Paolo Bonzini
  2013-02-12 20:49                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-12 20:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, linux-kernel, virtualization

Il 12/02/2013 19:23, Michael S. Tsirkin ha scritto:
> On Tue, Feb 12, 2013 at 07:04:27PM +0100, Paolo Bonzini wrote:
>>>> Perhaps, but 3 or 4 arguments (in/out/nsg or in/out/nsg_in/nsg_out) just
>>>> for this are definitely too many and make the API harder to use.
>>>>
>>>> You have to find a balance.  Having actually used the API, the
>>>> possibility of mixing in/out buffers by mistake never even occurred to
>>>> me, much less happened in practice, so I didn't consider it a problem.
>>>> Mixing in/out buffers in a single call wasn't a necessity, either.
>>>
>>> It is useful for virtqueue_add_buf implementation.
>>
>>         ret = virtqueue_start_buf(vq, data, out + in, !!out + !!in,
>> 				  gfp);
>>         if (ret < 0)
>>                 return ret;
>>
>>         if (out)
>>                 virtqueue_add_sg(vq, sg, out, DMA_TO_DEVICE);
>>         if (in)
>>                 virtqueue_add_sg(vq, sg + out, in, DMA_FROM_DEVICE);
>>
>>         virtqueue_end_buf(vq);
>> 	return 0;
>>
>> How can it be simpler and easier to understand than that?
> 
> Like this:
> 
>          ret = virtqueue_start_buf(vq, data, in, out, gfp);
>          if (ret < 0)
>                  return ret;
>  
>          virtqueue_add_sg(vq, sg, in, out);
>  
>          virtqueue_end_buf(vq);

It's out/in, not in/out... I know you wrote it in a hurry, but it kind
of shows that the new API is easier to use.  Check out patch 8, it's a
real improvement in readability.

Plus you haven't solved the problem of alternating to/from-device
elements (which is also harder to spot with in/out than with the enum).

And no one else would use add_sg with in != 0 && out != 0, so you'd be
favoring one caller over all the others.  If you did this instead:

         virtqueue_add_sg(vq, sg, in + out);

it would really look like a hack IMHO.

>>> Basically the more consistent the interface is with virtqueue_add_buf,
>>> the better.
>>
>> The interface is consistent with virtqueue_add_buf_single, where out/in
>> clearly doesn't make sense.
> 
> Hmm, we could make virtqueue_add_buf_single consistent by giving it 'bool in'.

But is it "bool in" or "bool out"?

>> virtqueue_add_buf and virtqueue_add_sg are very different, despite the
>> similar name.
> 
> True. The similarity is between _start and _add_buf.
> And this is confusing too. Maybe this means
> _start and _add_sg should be renamed.

Maybe.  If you have any suggestions it's fine.

BTW I tried using out/in for start_buf, and the code in virtio-blk gets
messier: it has to do all the math twice.  Perhaps we just need to
acknowledge that the API is different and thus the optimal choice of
arguments is different.  C doesn't have keyword arguments, so there's not
much that we can do.

Paolo


* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 20:08                         ` Paolo Bonzini
@ 2013-02-12 20:49                           ` Michael S. Tsirkin
  2013-02-13  8:06                             ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12 20:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, virtualization

On Tue, Feb 12, 2013 at 09:08:27PM +0100, Paolo Bonzini wrote:
> Il 12/02/2013 19:23, Michael S. Tsirkin ha scritto:
> > On Tue, Feb 12, 2013 at 07:04:27PM +0100, Paolo Bonzini wrote:
> >>>> Perhaps, but 3 or 4 arguments (in/out/nsg or in/out/nsg_in/nsg_out) just
> >>>> for this are definitely too many and make the API harder to use.
> >>>>
> >>>> You have to find a balance.  Having actually used the API, the
> >>>> possibility of mixing in/out buffers by mistake never even occurred to
> >>>> me, much less happened in practice, so I didn't consider it a problem.
> >>>> Mixing in/out buffers in a single call wasn't a necessity, either.
> >>>
> >>> It is useful for virtqueue_add_buf implementation.
> >>
> >>         ret = virtqueue_start_buf(vq, data, out + in, !!out + !!in,
> >> 				  gfp);
> >>         if (ret < 0)
> >>                 return ret;
> >>
> >>         if (out)
> >>                 virtqueue_add_sg(vq, sg, out, DMA_TO_DEVICE);
> >>         if (in)
> >>                 virtqueue_add_sg(vq, sg + out, in, DMA_FROM_DEVICE);
> >>
> >>         virtqueue_end_buf(vq);
> >> 	return 0;
> >>
> >> How can it be simpler and easier to understand than that?
> > 
> > Like this:
> > 
> >          ret = virtqueue_start_buf(vq, data, in, out, gfp);
> >          if (ret < 0)
> >                  return ret;
> >  
> >          virtqueue_add_sg(vq, sg, in, out);
> >  
> >          virtqueue_end_buf(vq);
> 
> It's out/in, not in/out... I know you wrote it in a hurry, but it kind
> of shows that the new API is easier to use.  Check out patch 8, it's a
> real improvement in readability.

That's virtqueue_add_buf_single; that's a separate matter.
Another option for _single is just two wrappers:
virtqueue_add_buf_in
virtqueue_add_buf_out

> Plus you haven't solved the problem of alternating to/from-device
> elements (which is also harder to spot with in/out than with the enum).

Yes it does: if add_sg does not have in/out at all, there's no way to
request the impossible to/from mix.

> And no one else would use add_sg with in != 0 && out != 0, so you'd be
> favoring one caller over all the others.

Yes but it's an important caller as all drivers besides storage use this
path.

>  If you did this instead:
> 
>          virtqueue_add_sg(vq, sg, in + out);
> 
> it would really look like a hack IMHO.
> 
> >>> Basically the more consistent the interface is with virtqueue_add_buf,
> >>> the better.
> >>
> >> The interface is consistent with virtqueue_add_buf_single, where out/in
> >> clearly doesn't make sense.
> > 
> > Hmm, we could make virtqueue_add_buf_single consistent by giving it 'bool in'.
> 
> But is it "bool in" or "bool out"?

Agree, bool is a bit ugly anyway.

> >> virtqueue_add_buf and virtqueue_add_sg are very different, despite the
> >> similar name.
> > 
> > True. The similarity is between _start and _add_buf.
> > And this is confusing too. Maybe this means
> > _start and _add_sg should be renamed.
> 
> Maybe.  If you have any suggestions it's fine.
> 
> BTW I tried using out/in for start_buf, and the code in virtio-blk gets
> messier, it has to do all the math twice.

I'm pretty sure we can do this without duplication, if we want to.

> Perhaps we just need to
> acknowledge that the API is different and thus the optimal choice of
> arguments is different.  C doesn't have keyword arguments, so there's not
> much that we can do.
> 
> Paolo

Yea, maybe. I'm not the API guru here anyway, it's Rusty's street.

-- 
MST


* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-12 20:49                           ` Michael S. Tsirkin
@ 2013-02-13  8:06                             ` Paolo Bonzini
  2013-02-13 10:33                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-13  8:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, virtualization

Il 12/02/2013 21:49, Michael S. Tsirkin ha scritto:
> On Tue, Feb 12, 2013 at 09:08:27PM +0100, Paolo Bonzini wrote:
>> Il 12/02/2013 19:23, Michael S. Tsirkin ha scritto:
>>> On Tue, Feb 12, 2013 at 07:04:27PM +0100, Paolo Bonzini wrote:
>>>>>> Perhaps, but 3 or 4 arguments (in/out/nsg or in/out/nsg_in/nsg_out) just
>>>>>> for this are definitely too many and make the API harder to use.
>>>>>>
>>>>>> You have to find a balance.  Having actually used the API, the
>>>>>> possibility of mixing in/out buffers by mistake never even occurred to
>>>>>> me, much less happened in practice, so I didn't consider it a problem.
>>>>>> Mixing in/out buffers in a single call wasn't a necessity, either.
>>>>>
>>>>> It is useful for virtqueue_add_buf implementation.
>>>>
>>>>         ret = virtqueue_start_buf(vq, data, out + in, !!out + !!in,
>>>> 				  gfp);
>>>>         if (ret < 0)
>>>>                 return ret;
>>>>
>>>>         if (out)
>>>>                 virtqueue_add_sg(vq, sg, out, DMA_TO_DEVICE);
>>>>         if (in)
>>>>                 virtqueue_add_sg(vq, sg + out, in, DMA_FROM_DEVICE);
>>>>
>>>>         virtqueue_end_buf(vq);
>>>> 	return 0;
>>>>
>>>> How can it be simpler and easier to understand than that?
>>>
>>> Like this:
>>>
>>>          ret = virtqueue_start_buf(vq, data, in, out, gfp);
>>>          if (ret < 0)
>>>                  return ret;
>>>  
>>>          virtqueue_add_sg(vq, sg, in, out);
>>>  
>>>          virtqueue_end_buf(vq);
>>
>> It's out/in, not in/out... I know you wrote it in a hurry, but it kind
>> of shows that the new API is easier to use.  Check out patch 8, it's  a
>> real improvement in readability.
> 
> That's virtqueue_add_buf_single, that's a separate matter.
> Another option for _single is just two wrappers:
> virtqueue_add_buf_in
> virtqueue_add_buf_out

I like it less, but yes this one would be ok (no driver uses a variable
for the enum parameter).

>> Plus you haven't solved the problem of alternating to/from-device
>> elements (which is also harder to spot with in/out than with the enum).
> 
> Yes it does, if add_sg does not have in/out at all there's no way to
> request the impossible to/from mix.

In your example above it does have it.  I assume you meant

         ret = virtqueue_start_buf(vq, data, out, in, gfp);
         if (ret < 0)
                 return ret;

          virtqueue_add_sg(vq, sg, out + in);
          virtqueue_end_buf(vq);

>>>> virtqueue_add_buf and virtqueue_add_sg are very different, despite the
>>>> similar name.
>>>
>>> True. The similarity is between _start and _add_buf.
>>> And this is confusing too. Maybe this means
>>> _start and _add_sg should be renamed.
>>
>> Maybe.  If you have any suggestions it's fine.
>>
>> BTW I tried using out/in for start_buf, and the code in virtio-blk gets
>> messier, it has to do all the math twice.
> 
> I'm pretty sure we can do this without duplication, if we want to.

Indeed, if you remove the out/in arguments from _sg there is no
duplication in virtio-blk.  That's because it places data-out at the end
and data-in at the beginning (so data is always after the request header
and before the response footer).
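
The single-pass counting Paolo describes can be sketched outside the kernel as a toy helper (names here are illustrative, not the actual driver code):

```c
#include <assert.h>

/* Illustrative virtio-blk style request layout:
 *   [out_hdr (to device)] [optional data sgl] [status (from device)]
 * Because the data always sits between the request header and the
 * response footer, the piece/element counts are computed in one pass,
 * with no separate out/in arithmetic. */
struct vblk_layout {
	unsigned int nsg;	/* number of scatterlist pieces */
	unsigned int nents;	/* total elements across all pieces */
};

static struct vblk_layout vblk_req_layout(unsigned int data_nents)
{
	struct vblk_layout l = { 2, 2 };	/* out_hdr + status always present */

	if (data_nents) {
		l.nsg++;		/* one extra piece for the data */
		l.nents += data_nents;
	}
	return l;
}
```

This mirrors the counting done in __virtblk_add_req in patch 4 later in the thread, where SCSI packet commands simply add three more header/footer elements to the same single computation.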

>> Perhaps we just need to
>> acknowledge that the API is different and thus the optimal choice of
>> arguments is different.  C doesn't have keyword arguments, there's not
>> much that we can do.
> 
> Yea, maybe. I'm not the API guru here anyway, it's Rusty's street.

Let's wait for him.

Paolo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] virtio: add functions for piecewise addition of buffers
  2013-02-13  8:06                             ` Paolo Bonzini
@ 2013-02-13 10:33                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2013-02-13 10:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, virtualization

On Wed, Feb 13, 2013 at 09:06:27AM +0100, Paolo Bonzini wrote:
> Il 12/02/2013 21:49, Michael S. Tsirkin ha scritto:
> > On Tue, Feb 12, 2013 at 09:08:27PM +0100, Paolo Bonzini wrote:
> >> Il 12/02/2013 19:23, Michael S. Tsirkin ha scritto:
> >>> On Tue, Feb 12, 2013 at 07:04:27PM +0100, Paolo Bonzini wrote:
> >>>>>> Perhaps, but 3 or 4 arguments (in/out/nsg or in/out/nsg_in/nsg_out) just
> >>>>>> for this are definitely too many and make the API harder to use.
> >>>>>>
> >>>>>> You have to find a balance.  Having actually used the API, the
> >>>>>> possibility of mixing in/out buffers by mistake never even occurred to
> >>>>>> me, much less happened in practice, so I didn't consider it a problem.
> >>>>>> Mixing in/out buffers in a single call wasn't a necessity, either.
> >>>>>
> >>>>> It is useful for virtqueue_add_buf implementation.
> >>>>
> >>>>         ret = virtqueue_start_buf(vq, data, out + in, !!out + !!in,
> >>>> 				  gfp);
> >>>>         if (ret < 0)
> >>>>                 return ret;
> >>>>
> >>>>         if (out)
> >>>>                 virtqueue_add_sg(vq, sg, out, DMA_TO_DEVICE);
> >>>>         if (in)
> >>>>                 virtqueue_add_sg(vq, sg + out, in, DMA_FROM_DEVICE);
> >>>>
> >>>>         virtqueue_end_buf(vq);
> >>>> 	return 0;
> >>>>
> >>>> How can it be simpler and easier to understand than that?
> >>>
> >>> Like this:
> >>>
> >>>          ret = virtqueue_start_buf(vq, data, in, out, gfp);
> >>>          if (ret < 0)
> >>>                  return ret;
> >>>  
> >>>          virtqueue_add_sg(vq, sg, in, out);
> >>>  
> >>>          virtqueue_end_buf(vq);
> >>
> >> It's out/in, not in/out... I know you wrote it in a hurry, but it kind
> >> of shows that the new API is easier to use.  Check out patch 8, it's a
> >> real improvement in readability.
> > 
> > That's virtqueue_add_buf_single, that's a separate matter.
> > Another option for _single is just two wrappers:
> > virtqueue_add_buf_in
> > virtqueue_add_buf_out
> 
> I like it less, but yes this one would be ok (no driver uses a variable
> for the enum parameter).

OK, this has the advantage of being even shorter.

> >> Plus you haven't solved the problem of alternating to/from-device
> >> elements (which is also harder to spot with in/out than with the enum).
> > 
> > Yes it does, if add_sg does not have in/out at all there's no way to
> > request the impossible to/from mix.
> 
> In your example above it does have it.  I assume you meant
> 
>          ret = virtqueue_start_buf(vq, data, out, in, gfp);
>          if (ret < 0)
>                  return ret;
> 
>           virtqueue_add_sg(vq, sg, out + in);
>           virtqueue_end_buf(vq);
> 
> >>>> virtqueue_add_buf and virtqueue_add_sg are very different, despite the
> >>>> similar name.
> >>>
> >>> True. The similarity is between _start and _add_buf.
> >>> And this is confusing too. Maybe this means
> >>> _start and _add_sg should be renamed.
> >>
> >> Maybe.  If you have any suggestions it's fine.
> >>
> >> BTW I tried using out/in for start_buf, and the code in virtio-blk gets
> >> messier, it has to do all the math twice.
> > 
> > I'm pretty sure we can do this without duplication, if we want to.
> 
> Indeed, if you remove the out/in arguments from _sg there is no
> duplication in virtio-blk.  That's because it places data-out at the end
> and data-in at the beginning (so data is always after the request header
> and before the response footer).

Yes, it's not a virtio-blk thing, virtio spec and interface require that
we have in after out.
So yes, to me it seems cleaner to drop out/in arguments from _sg.

> >> acknowledge that the API is different and thus the optimal choice of
> >> arguments is different.  C doesn't have keyword arguments, there's not
> >> much that we can do.
> > 
> > Yea, maybe. I'm not the API guru here anyway, it's Rusty's street.
> 
> Let's wait for him.
> 
> Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes
  2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
                   ` (8 preceding siblings ...)
  2013-02-12 12:23 ` [PATCH 9/9] virtio: reimplement virtqueue_add_buf using new functions Paolo Bonzini
@ 2013-02-14  6:00 ` Rusty Russell
  2013-02-14  9:23   ` Paolo Bonzini
  9 siblings, 1 reply; 35+ messages in thread
From: Rusty Russell @ 2013-02-14  6:00 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel; +Cc: Wanlong Gao, asias, mst, kvm, virtualization

Paolo Bonzini <pbonzini@redhat.com> writes:
> This series adds a different set of APIs for adding a buffer to a
> virtqueue.  The new API lets you pass the buffers piecewise, wrapping
> multiple calls to virtqueue_add_sg between virtqueue_start_buf and
> virtqueue_end_buf.  Letting drivers call virtqueue_add_sg multiple times
> if they already have a scatterlist provided by someone else simplifies the
> code and, for virtio-scsi, it saves the copying and related locking.

They are ugly though.  It's convoluted because we do actually know all
the buffers at once, we don't need a piecemeal API.

As a result, you now have arbitrary changes to the indirect heuristic,
because the API is now piecemeal.

How about this as a first step?

virtio_ring: virtqueue_add_sgs, to add multiple sgs.

virtio_scsi and virtio_blk can really use these, to avoid their current
hack of copying the whole sg array.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ffd7e7d..c5afc5d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -121,14 +121,16 @@ struct vring_virtqueue
 
 /* Set up an indirect table of descriptors and add it to the queue. */
 static int vring_add_indirect(struct vring_virtqueue *vq,
-			      struct scatterlist sg[],
-			      unsigned int out,
-			      unsigned int in,
+			      struct scatterlist *sgs[],
+			      unsigned int total_sg,
+			      unsigned int out_sgs,
+			      unsigned int in_sgs,
 			      gfp_t gfp)
 {
 	struct vring_desc *desc;
 	unsigned head;
-	int i;
+	struct scatterlist *sg;
+	int i, n;
 
 	/*
 	 * We require lowmem mappings for the descriptors because
@@ -137,26 +139,31 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
 	 */
 	gfp &= ~(__GFP_HIGHMEM | __GFP_HIGH);
 
-	desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
+	desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
 	if (!desc)
 		return -ENOMEM;
 
-	/* Transfer entries from the sg list into the indirect page */
-	for (i = 0; i < out; i++) {
-		desc[i].flags = VRING_DESC_F_NEXT;
-		desc[i].addr = sg_phys(sg);
-		desc[i].len = sg->length;
-		desc[i].next = i+1;
-		sg++;
+	/* Transfer entries from the sg lists into the indirect page */
+	i = 0;
+	for (n = 0; n < out_sgs; n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			desc[i].flags = VRING_DESC_F_NEXT;
+			desc[i].addr = sg_phys(sg);
+			desc[i].len = sg->length;
+			desc[i].next = i+1;
+			i++;
+		}
 	}
-	for (; i < (out + in); i++) {
-		desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
-		desc[i].addr = sg_phys(sg);
-		desc[i].len = sg->length;
-		desc[i].next = i+1;
-		sg++;
+	for (; n < (out_sgs + in_sgs); n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
+			desc[i].addr = sg_phys(sg);
+			desc[i].len = sg->length;
+			desc[i].next = i+1;
+			i++;
+		}
 	}
 
+	BUG_ON(i != total_sg);
+
 	/* Last one doesn't continue. */
 	desc[i-1].flags &= ~VRING_DESC_F_NEXT;
 	desc[i-1].next = 0;
@@ -176,6 +183,15 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
 	return head;
 }
 
+/* FIXME */
+static inline void sg_unmark_end(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+	BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+	sg->page_link &= ~0x02;
+}
+
 /**
  * virtqueue_add_buf - expose buffer to other end
  * @vq: the struct virtqueue we're talking about.
@@ -197,8 +213,47 @@ int virtqueue_add_buf(struct virtqueue *_vq,
 		      void *data,
 		      gfp_t gfp)
 {
+	struct scatterlist *sgs[2];
+	unsigned int i;
+
+	sgs[0] = sg;
+	sgs[1] = sg + out;
+
+	/* Workaround until callers pass well-formed sgs. */
+	for (i = 0; i < out + in; i++)
+		sg_unmark_end(sg + i);
+
+	sg_unmark_end(sg + out + in);
+	if (out && in)
+		sg_unmark_end(sg + out);
+
+	return virtqueue_add_sgs(_vq, sgs, out ? 1 : 0, in ? 1 : 0, data, gfp);
+}
+
+/**
+ * virtqueue_add_sgs - expose buffers to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sgs: array of terminated scatterlists.
+ * @out_num: the number of scatterlists readable by other side
+ * @in_num: the number of scatterlists which are writable (after readable ones)
+ * @data: the token identifying the buffer.
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM).
+ */
+int virtqueue_add_sgs(struct virtqueue *_vq,
+		      struct scatterlist *sgs[],
+		      unsigned int out_sgs,
+		      unsigned int in_sgs,
+		      void *data,
+		      gfp_t gfp)
+{
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	unsigned int i, avail, uninitialized_var(prev);
+	struct scatterlist *sg;
+	unsigned int i, n, avail, uninitialized_var(prev), total_sg;
 	int head;
 
 	START_USE(vq);
@@ -218,46 +273,59 @@ int virtqueue_add_buf(struct virtqueue *_vq,
 	}
 #endif
 
+	/* Count them first. */
+	for (i = total_sg = 0; i < out_sgs + in_sgs; i++) {
+		struct scatterlist *sg;
+		for (sg = sgs[i]; sg; sg = sg_next(sg))
+			total_sg++;
+	}
+
+
 	/* If the host supports indirect descriptor tables, and we have multiple
 	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && (out + in) > 1 && vq->vq.num_free) {
-		head = vring_add_indirect(vq, sg, out, in, gfp);
+	if (vq->indirect && total_sg > 1 && vq->vq.num_free) {
+		head = vring_add_indirect(vq, sgs, total_sg, out_sgs, in_sgs,
+					  gfp);
 		if (likely(head >= 0))
 			goto add_head;
 	}
 
-	BUG_ON(out + in > vq->vring.num);
-	BUG_ON(out + in == 0);
+	BUG_ON(total_sg > vq->vring.num);
+	BUG_ON(total_sg == 0);
 
-	if (vq->vq.num_free < out + in) {
+	if (vq->vq.num_free < total_sg) {
 		pr_debug("Can't add buf len %i - avail = %i\n",
-			 out + in, vq->vq.num_free);
+			 total_sg, vq->vq.num_free);
 		/* FIXME: for historical reasons, we force a notify here if
 		 * there are outgoing parts to the buffer.  Presumably the
 		 * host should service the ring ASAP. */
-		if (out)
+		if (out_sgs)
 			vq->notify(&vq->vq);
 		END_USE(vq);
 		return -ENOSPC;
 	}
 
 	/* We're about to use some buffers from the free list. */
-	vq->vq.num_free -= out + in;
-
-	head = vq->free_head;
-	for (i = vq->free_head; out; i = vq->vring.desc[i].next, out--) {
-		vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
-		vq->vring.desc[i].addr = sg_phys(sg);
-		vq->vring.desc[i].len = sg->length;
-		prev = i;
-		sg++;
+	vq->vq.num_free -= total_sg;
+
+	head = i = vq->free_head;
+	for (n = 0; n < out_sgs; n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
+			vq->vring.desc[i].addr = sg_phys(sg);
+			vq->vring.desc[i].len = sg->length;
+			prev = i;
+			i = vq->vring.desc[i].next;
+		}
 	}
-	for (; in; i = vq->vring.desc[i].next, in--) {
-		vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
-		vq->vring.desc[i].addr = sg_phys(sg);
-		vq->vring.desc[i].len = sg->length;
-		prev = i;
-		sg++;
+	for (; n < (out_sgs + in_sgs); n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
+			vq->vring.desc[i].addr = sg_phys(sg);
+			vq->vring.desc[i].len = sg->length;
+			prev = i;
+			i = vq->vring.desc[i].next;
+		}
 	}
 	/* Last one doesn't continue. */
 	vq->vring.desc[prev].flags &= ~VRING_DESC_F_NEXT;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index ff6714e..6eff15b 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -40,6 +40,13 @@ int virtqueue_add_buf(struct virtqueue *vq,
 		      void *data,
 		      gfp_t gfp);
 
+int virtqueue_add_sgs(struct virtqueue *vq,
+		      struct scatterlist *sgs[],
+		      unsigned int out_sgs,
+		      unsigned int in_sgs,
+		      void *data,
+		      gfp_t gfp);
+
 void virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes
  2013-02-14  6:00 ` [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Rusty Russell
@ 2013-02-14  9:23   ` Paolo Bonzini
  2013-02-15 18:04     ` Paolo Bonzini
  2013-02-19  7:49     ` Rusty Russell
  0 siblings, 2 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-14  9:23 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, Wanlong Gao, asias, mst, kvm, virtualization

Il 14/02/2013 07:00, Rusty Russell ha scritto:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>> This series adds a different set of APIs for adding a buffer to a
>> virtqueue.  The new API lets you pass the buffers piecewise, wrapping
>> multiple calls to virtqueue_add_sg between virtqueue_start_buf and
>> virtqueue_end_buf.  Letting drivers call virtqueue_add_sg multiple times
>> if they already have a scatterlist provided by someone else simplifies the
>> code and, for virtio-scsi, it saves the copying and related locking.
> 
> They are ugly though.  It's convoluted because we do actually know all
> the buffers at once, we don't need a piecemeal API.
> 
> As a result, you now have arbitrary changes to the indirect heuristic,
> because the API is now piecemeal.

Note that I have sent v2 of patch 1/9, keeping the original indirect
heuristic.  It was indeed a bad idea to conflate it in this series (it
was born there because originally virtqueue_add_buf was not sharing any
code, but now it's a different story).

> How about this as a first step?
> 
> virtio_ring: virtqueue_add_sgs, to add multiple sgs.
> 
> virtio_scsi and virtio_blk can really use these, to avoid their current
> hack of copying the whole sg array.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 

It's much better than the other prototype you had posted, but I still
dislike this...  You pay for additional counting of scatterlists when
the caller knows the number of buffers; and the nested loops aren't
free, either.

My piecemeal API tried hard to keep things as fast as virtqueue_add_buf
when possible; I'm worried that this approach requires a lot more
benchmarking.  Probably you would also need a fast-path
virtqueue_add_buf_single, and (unlike my version) that one couldn't
share much code if any with virtqueue_add_sgs.

So I can resend based on this patch, but I'm not sure it's really better...

Also, see below for a comment.

> @@ -197,8 +213,47 @@ int virtqueue_add_buf(struct virtqueue *_vq,
>  		      void *data,
>  		      gfp_t gfp)
>  {
> +	struct scatterlist *sgs[2];
> +	unsigned int i;
> +
> +	sgs[0] = sg;
> +	sgs[1] = sg + out;
> +
> +	/* Workaround until callers pass well-formed sgs. */
> +	for (i = 0; i < out + in; i++)
> +		sg_unmark_end(sg + i);
> +
> +	sg_unmark_end(sg + out + in);
> +	if (out && in)
> +		sg_unmark_end(sg + out);

What's this second sg_unmark_end block for?  Doesn't it access after the
end of sg?  If you wanted it to be sg_mark_end, that must be:

if (out)
	sg_mark_end(sg + out - 1);
if (in)
	sg_mark_end(sg + out + in - 1);

with a corresponding unmark afterwards.
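
The corrected workaround can be exercised outside the kernel with a mocked end-marker bit (the struct and helpers below are stand-ins for struct scatterlist, purely for illustration):

```c
#include <assert.h>

/* Toy stand-in for struct scatterlist: only bit 0x02 (the end mark)
 * of page_link matters for this sketch. */
struct toy_sg {
	unsigned long page_link;
};

static void toy_sg_mark_end(struct toy_sg *s)      { s->page_link |= 0x02UL; }
static void toy_sg_unmark_end(struct toy_sg *s)    { s->page_link &= ~0x02UL; }
static int  toy_sg_is_last(const struct toy_sg *s) { return (s->page_link & 0x02UL) != 0; }

/* Terminate the out and in halves of a flat sg array as in the fixed
 * workaround: mark the last element of each half, and never touch
 * sg[out + in], which is past the end of the array. */
static void mark_halves(struct toy_sg *sg, unsigned int out, unsigned int in)
{
	if (out)
		toy_sg_mark_end(sg + out - 1);
	if (in)
		toy_sg_mark_end(sg + out + in - 1);
}

/* The corresponding unmark restores the flat, unterminated array. */
static void unmark_halves(struct toy_sg *sg, unsigned int out, unsigned int in)
{
	if (out)
		toy_sg_unmark_end(sg + out - 1);
	if (in)
		toy_sg_unmark_end(sg + out + in - 1);
}
```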

Paolo

> +	return virtqueue_add_sgs(_vq, sgs, out ? 1 : 0, in ? 1 : 0, data, gfp);

> +}
> +
> +/**
> + * virtqueue_add_sgs - expose buffers to other end
> + * @vq: the struct virtqueue we're talking about.
> + * @sgs: array of terminated scatterlists.
> + * @out_num: the number of scatterlists readable by other side
> + * @in_num: the number of scatterlists which are writable (after readable ones)
> + * @data: the token identifying the buffer.
> + * @gfp: how to do memory allocations (if necessary).
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error (ie. ENOSPC, ENOMEM).
> + */
> +int virtqueue_add_sgs(struct virtqueue *_vq,
> +		      struct scatterlist *sgs[],
> +		      unsigned int out_sgs,
> +		      unsigned int in_sgs,
> +		      void *data,
> +		      gfp_t gfp)
> +{
>  	struct vring_virtqueue *vq = to_vvq(_vq);
> -	unsigned int i, avail, uninitialized_var(prev);
> +	struct scatterlist *sg;
> +	unsigned int i, n, avail, uninitialized_var(prev), total_sg;
>  	int head;
>  
>  	START_USE(vq);
> @@ -218,46 +273,59 @@ int virtqueue_add_buf(struct virtqueue *_vq,
>  	}
>  #endif
>  
> +	/* Count them first. */
> +	for (i = total_sg = 0; i < out_sgs + in_sgs; i++) {
> +		struct scatterlist *sg;
> +		for (sg = sgs[i]; sg; sg = sg_next(sg))
> +			total_sg++;
> +	}
> +
> +
>  	/* If the host supports indirect descriptor tables, and we have multiple
>  	 * buffers, then go indirect. FIXME: tune this threshold */
> -	if (vq->indirect && (out + in) > 1 && vq->vq.num_free) {
> -		head = vring_add_indirect(vq, sg, out, in, gfp);
> +	if (vq->indirect && total_sg > 1 && vq->vq.num_free) {
> +		head = vring_add_indirect(vq, sgs, total_sg, out_sgs, in_sgs,
> +					  gfp);
>  		if (likely(head >= 0))
>  			goto add_head;
>  	}
>  
> -	BUG_ON(out + in > vq->vring.num);
> -	BUG_ON(out + in == 0);
> +	BUG_ON(total_sg > vq->vring.num);
> +	BUG_ON(total_sg == 0);
>  
> -	if (vq->vq.num_free < out + in) {
> +	if (vq->vq.num_free < total_sg) {
>  		pr_debug("Can't add buf len %i - avail = %i\n",
> -			 out + in, vq->vq.num_free);
> +			 total_sg, vq->vq.num_free);
>  		/* FIXME: for historical reasons, we force a notify here if
>  		 * there are outgoing parts to the buffer.  Presumably the
>  		 * host should service the ring ASAP. */
> -		if (out)
> +		if (out_sgs)
>  			vq->notify(&vq->vq);
>  		END_USE(vq);
>  		return -ENOSPC;
>  	}
>  
>  	/* We're about to use some buffers from the free list. */
> -	vq->vq.num_free -= out + in;
> -
> -	head = vq->free_head;
> -	for (i = vq->free_head; out; i = vq->vring.desc[i].next, out--) {
> -		vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
> -		vq->vring.desc[i].addr = sg_phys(sg);
> -		vq->vring.desc[i].len = sg->length;
> -		prev = i;
> -		sg++;
> +	vq->vq.num_free -= total_sg;
> +
> +	head = i = vq->free_head;
> +	for (n = 0; n < out_sgs; n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
> +			vq->vring.desc[i].addr = sg_phys(sg);
> +			vq->vring.desc[i].len = sg->length;
> +			prev = i;
> +			i = vq->vring.desc[i].next;
> +		}
>  	}
> -	for (; in; i = vq->vring.desc[i].next, in--) {
> -		vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> -		vq->vring.desc[i].addr = sg_phys(sg);
> -		vq->vring.desc[i].len = sg->length;
> -		prev = i;
> -		sg++;
> +	for (; n < (out_sgs + in_sgs); n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> +			vq->vring.desc[i].addr = sg_phys(sg);
> +			vq->vring.desc[i].len = sg->length;
> +			prev = i;
> +			i = vq->vring.desc[i].next;
> +		}
>  	}
>  	/* Last one doesn't continue. */
>  	vq->vring.desc[prev].flags &= ~VRING_DESC_F_NEXT;
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index ff6714e..6eff15b 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -40,6 +40,13 @@ int virtqueue_add_buf(struct virtqueue *vq,
>  		      void *data,
>  		      gfp_t gfp);
>  
> +int virtqueue_add_sgs(struct virtqueue *vq,
> +		      struct scatterlist *sgs[],
> +		      unsigned int out_sgs,
> +		      unsigned int in_sgs,
> +		      void *data,
> +		      gfp_t gfp);
> +
>  void virtqueue_kick(struct virtqueue *vq);
>  
>  bool virtqueue_kick_prepare(struct virtqueue *vq);
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes
  2013-02-14  9:23   ` Paolo Bonzini
@ 2013-02-15 18:04     ` Paolo Bonzini
  2013-02-19  7:49     ` Rusty Russell
  1 sibling, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-15 18:04 UTC (permalink / raw)
  Cc: Rusty Russell, kvm, mst, linux-kernel, virtualization

Il 14/02/2013 10:23, Paolo Bonzini ha scritto:
>> > How about this as a first step?
>> > 
>> > virtio_ring: virtqueue_add_sgs, to add multiple sgs.
>> > 
>> > virtio_scsi and virtio_blk can really use these, to avoid their current
>> > hack of copying the whole sg array.
>> > 
>> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 
> It's much better than the other prototype you had posted, but I still
> dislike this...  You pay for additional counting of scatterlists when
> the caller knows the number of buffers; and the nested loops aren't
> free, either.

Another problem is that you cannot pass "truncated" scatterlists.  You
must ensure there is an end marker on the last item.  I'm not sure if
the kernel ensures that, given that for_each_sg takes explicitly the
number of scatterlist elements; and it is not as trivial as
"sg_mark_end(foo + nsg - 1);" if the upper layers hand you a chained
scatterlist.
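
The chained-scatterlist complication can be illustrated with a toy model of the chain/end flags (a simplified mock, not the kernel's struct scatterlist; the bit values mirror the kernel's page_link encoding but nothing else here is authoritative):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: bit 0x01 marks a chain entry, bit 0x02 an end mark. */
struct toy_sg {
	unsigned long page_link;
	struct toy_sg *chain;	/* target array, meaningful only for chain entries */
};

static struct toy_sg *toy_sg_next(struct toy_sg *s)
{
	if (s->page_link & 0x02UL)	/* end mark: no next element */
		return NULL;
	s++;
	if (s->page_link & 0x01UL)	/* chain entry: jump to the chained array */
		s = s->chain;
	return s;
}

/* With chaining, "foo + nsg - 1" is not the last element: chain entries
 * sit in the middle of the arrays.  Finding the element to terminate
 * requires walking nents data entries through the chain. */
static struct toy_sg *toy_sg_last(struct toy_sg *s, unsigned int nents)
{
	struct toy_sg *last = NULL;
	unsigned int i;

	for (i = 0; s && i < nents; s = toy_sg_next(s), i++)
		last = s;
	return last;
}
```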

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] virtio-blk: use virtqueue_start_buf on req path
  2013-02-12 12:23 ` [PATCH 4/9] virtio-blk: use virtqueue_start_buf on req path Paolo Bonzini
@ 2013-02-17  6:37   ` Asias He
  2013-02-18  9:05     ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Asias He @ 2013-02-17  6:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Wanlong Gao, mst, Rusty Russell, kvm, virtualization

On 02/12/2013 08:23 PM, Paolo Bonzini wrote:
> This is similar to the previous patch, but a bit more radical
> because the bio and req paths now share the buffer construction
> code.  Because the req path doesn't use vbr->sg, however, we
> need to add a couple of arguments to __virtblk_add_req.
> 
> We also need to teach __virtblk_add_req how to build SCSI command
> requests.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  drivers/block/virtio_blk.c |   74 ++++++++++++++++++++++---------------------
>  1 files changed, 38 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 4a31fcc..22deb65 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -102,18 +102,26 @@ static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
>  }
>  
>  static int __virtblk_add_req(struct virtqueue *vq,
> -			     struct virtblk_req *vbr)
> +			     struct virtblk_req *vbr,
> +			     struct scatterlist *data_sg,
> +			     unsigned data_nents)
>  {
>  	struct scatterlist sg;
>  	enum dma_data_direction dir;
>  	int ret;
>  
> +	int type = vbr->out_hdr.type & ~VIRTIO_BLK_T_OUT;
>  	unsigned int nents = 2;
>  	unsigned int nsg = 2;
>  
> -	if (vbr->nents) {
> +	if (type == VIRTIO_BLK_T_SCSI_CMD) {
> +		BUG_ON(use_bio);

Do we really need the BUG_ON?  Even with use_bio=1, a
VIRTIO_BLK_T_SCSI_CMD can still be fired. See this:

# cat /proc/cmdline
root=/dev/mapper/rhel-root console=ttyS0 virtio_blk.use_bio=1

# sg_inq /dev/vda
[   36.042300] ------------[ cut here ]------------
[   36.043021] kernel BUG at drivers/block/virtio_blk.c:118!
[   36.043021] invalid opcode: 0000 [#1] SMP
[   36.043021] Modules linked in:
[   36.043021] CPU 2
[   36.043021] Pid: 3311, comm: sg_inq Not tainted 3.8.0-rc7+ #618 Bochs
Bochs
[   36.043021] RIP: 0010:[<ffffffff816d740d>]  [<ffffffff816d740d>]
__virtblk_add_req+0x1cd/0x1e0
[   36.043021] RSP: 0018:ffff88007b59b9d8  EFLAGS: 00010002
[   36.043021] RAX: 0000000000000001 RBX: 0000000000000002 RCX:
0000000000000002
[   36.043021] RDX: 0000000000000002 RSI: ffff88007a430000 RDI:
ffff88007b422000
[   36.043021] RBP: ffff88007b59ba28 R08: ffff88007b59b9e0 R09:
ffff88007b59b9f4
[   36.043021] R10: 0000000000000001 R11: ffff88007bf57900 R12:
ffff88007a430000
[   36.043021] R13: ffff88007b422000 R14: 0000000000000001 R15:
ffff880077d34088
[   36.043021] FS:  00007eff2efcb740(0000) GS:ffff88007eb00000(0000)
knlGS:0000000000000000
[   36.043021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.043021] CR2: 0000003f33e0f200 CR3: 000000007f023000 CR4:
00000000000006e0
[   36.043021] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   36.043021] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[   36.043021] Process sg_inq (pid: 3311, threadinfo ffff88007b59a000,
task ffff88007f47a3f0)
[   36.043021] Stack:
[   36.043021]  ffff88007b59ba40 ffff880077d34088 ffff88007bf57980
0000000100000001
[   36.043021]  ffff88007fffa6c0 ffff88007a430000 ffff88007f66abc0
ffff880077d34000
[   36.043021]  ffff88007bdab410 0000000000000000 ffff88007b59ba78
ffffffff816d7f6c
[   36.043021] Call Trace:
[   36.043021]  [<ffffffff816d7f6c>] virtblk_request+0xec/0x1c0
[   36.043021]  [<ffffffff8148b947>] __blk_run_queue+0x37/0x50
[   36.043021]  [<ffffffff81486c10>] __elv_add_request+0xb0/0x230
[   36.043021]  [<ffffffff81491c69>] blk_execute_rq_nowait+0x79/0x100
[   36.043021]  [<ffffffff811f06e1>] ? bio_phys_segments+0x21/0x30
[   36.043021]  [<ffffffff81491d5d>] blk_execute_rq+0x6d/0xf0
[   36.043021]  [<ffffffff81491798>] ? blk_rq_append_bio+0x28/0x70
[   36.043021]  [<ffffffff81491ad0>] ? blk_rq_map_user+0x1a0/0x280
[   36.043021]  [<ffffffff814979c4>] sg_io+0x274/0x3e0
[   36.043021]  [<ffffffff81497f15>] scsi_cmd_ioctl+0x3e5/0x460
[   36.043021]  [<ffffffff8118c816>] ? handle_pte_fault+0xf6/0x9c0
[   36.043021]  [<ffffffff8116e8e0>] ? release_pages+0x190/0x1e0
[   36.043021]  [<ffffffff81497fe1>] scsi_cmd_blk_ioctl+0x51/0x70
[   36.043021]  [<ffffffff816d80ac>] virtblk_ioctl+0x6c/0x90
[   36.043021]  [<ffffffff81494068>] __blkdev_driver_ioctl+0x28/0x30
[   36.043021]  [<ffffffff814946d0>] blkdev_ioctl+0x200/0x7b0
[   36.043021]  [<ffffffff811c0836>] ? cp_new_stat+0x116/0x130
[   36.043021]  [<ffffffff811f232c>] block_ioctl+0x3c/0x40
[   36.043021]  [<ffffffff811cceaa>] do_vfs_ioctl+0x9a/0x550
[   36.043021]  [<ffffffff811cd3b7>] sys_ioctl+0x57/0x90
[   36.043021]  [<ffffffff814b7c4e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[   36.043021]  [<ffffffff81c6ad99>] system_call_fastpath+0x16/0x1b
[   36.043021] Code: 24 20 48 8d 7d b0 ba 10 00 00 00 e8 2e 3d de ff 48
8d 75 b0 b9 02 00 00 00 ba 01 00 00 00 4c 89 ef e8 88 bb e6 ff
e9 d9 fe ff ff <0f> 0b eb fe 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00
00 55 48
[   36.043021] RIP  [<ffffffff816d740d>] __virtblk_add_req+0x1cd/0x1e0
[   36.043021]  RSP <ffff88007b59b9d8>
[   36.043021] ---[ end trace 93ac0a3ba2789369 ]---


> +		nsg += 3;
> +		nents += 3;
> +	}
> +	if (data_nents) {
>  		nsg++;
> -		nents += vbr->nents;
> +		nents += data_nents;
>  	}
>  
>  	ret = virtqueue_start_buf(vq, vbr, nents, nsg, GFP_ATOMIC);
> @@ -124,14 +132,32 @@ static int __virtblk_add_req(struct virtqueue *vq,
>  	sg_init_one(&sg, &vbr->out_hdr, sizeof(vbr->out_hdr));
>  	virtqueue_add_sg(vq, &sg, 1, dir);
>  
> -	if (vbr->nents) {
> +	/*
> +	 * If this is a packet command we need a couple of additional headers.
> +	 * Behind the normal outhdr we put a segment with the scsi command
> +	 * block, and before the normal inhdr we put the sense data and the
> +	 * inhdr with additional status information.
> +	 */
> +	if (type == VIRTIO_BLK_T_SCSI_CMD) {
> +		sg_init_one(&sg, vbr->req->cmd, vbr->req->cmd_len);
> +		virtqueue_add_sg(vq, &sg, 1, dir);
> +	}
> +
> +	if (data_nents) {
>  		if ((vbr->out_hdr.type & VIRTIO_BLK_T_OUT) == 0)
>  			dir = DMA_FROM_DEVICE;
>  
> -		virtqueue_add_sg(vq, vbr->sg, vbr->nents, dir);
> +		virtqueue_add_sg(vq, data_sg, data_nents, dir);
>  	}
>  
>  	dir = DMA_FROM_DEVICE;
> +	if (type == VIRTIO_BLK_T_SCSI_CMD) {
> +		sg_init_one(&sg, vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
> +		virtqueue_add_sg(vq, &sg, 1, dir);
> +		sg_init_one(&sg, &vbr->in_hdr, sizeof(vbr->in_hdr));
> +		virtqueue_add_sg(vq, &sg, 1, dir);
> +	}
> +
>  	sg_init_one(&sg, &vbr->status, sizeof(vbr->status));
>  	virtqueue_add_sg(vq, &sg, 1, dir);
>  
> @@ -146,7 +172,8 @@ static void virtblk_add_req(struct virtblk_req *vbr)
>  	int ret;
>  
>  	spin_lock_irq(vblk->disk->queue->queue_lock);
> -	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr)) < 0)) {
> +	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr, vbr->sg,
> +						 vbr->nents)) < 0)) {
>  		prepare_to_wait_exclusive(&vblk->queue_wait, &wait,
>  					  TASK_UNINTERRUPTIBLE);
>  
> @@ -299,7 +326,7 @@ static void virtblk_done(struct virtqueue *vq)
>  static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
>  		   struct request *req)
>  {
> -	unsigned long num, out = 0, in = 0;
> +	unsigned int num;
>  	struct virtblk_req *vbr;
>  
>  	vbr = virtblk_alloc_req(vblk, GFP_ATOMIC);
> @@ -336,40 +363,15 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
>  		}
>  	}
>  
> -	sg_set_buf(&vblk->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
> -
> -	/*
> -	 * If this is a packet command we need a couple of additional headers.
> -	 * Behind the normal outhdr we put a segment with the scsi command
> -	 * block, and before the normal inhdr we put the sense data and the
> -	 * inhdr with additional status information before the normal inhdr.
> -	 */
> -	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC)
> -		sg_set_buf(&vblk->sg[out++], vbr->req->cmd, vbr->req->cmd_len);
> -
> -	num = blk_rq_map_sg(q, vbr->req, vblk->sg + out);
> -
> -	if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
> -		sg_set_buf(&vblk->sg[num + out + in++], vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
> -		sg_set_buf(&vblk->sg[num + out + in++], &vbr->in_hdr,
> -			   sizeof(vbr->in_hdr));
> -	}
> -
> -	sg_set_buf(&vblk->sg[num + out + in++], &vbr->status,
> -		   sizeof(vbr->status));
> -
> +	num = blk_rq_map_sg(q, vbr->req, vblk->sg);
>  	if (num) {
> -		if (rq_data_dir(vbr->req) == WRITE) {
> +		if (rq_data_dir(vbr->req) == WRITE)
>  			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
> -			out += num;
> -		} else {
> +		else
>  			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
> -			in += num;
> -		}
>  	}
>  
> -	if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr,
> -			      GFP_ATOMIC) < 0) {
> +	if (__virtblk_add_req(vblk->vq, vbr, vblk->sg, num) < 0) {
>  		mempool_free(vbr, vblk->pool);
>  		return false;
>  	}
> 


-- 
Asias

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] virtio-blk: reorganize virtblk_add_req
  2013-02-12 12:23 ` [PATCH 2/9] virtio-blk: reorganize virtblk_add_req Paolo Bonzini
@ 2013-02-17  6:38   ` Asias He
  0 siblings, 0 replies; 35+ messages in thread
From: Asias He @ 2013-02-17  6:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Wanlong Gao, mst, Rusty Russell, kvm, virtualization

On 02/12/2013 08:23 PM, Paolo Bonzini wrote:
> Right now, both virtblk_add_req and virtblk_add_req_wait call
> virtqueue_add_buf.  To prepare for the next patches, abstract the call
> to virtqueue_add_buf into a new function __virtblk_add_req, and include
> the waiting logic directly in virtblk_add_req.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Asias He <asias@redhat.com>

> ---
>  drivers/block/virtio_blk.c |   55 ++++++++++++++++----------------------------
>  1 files changed, 20 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 8ad21a2..fd8a689 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -100,50 +100,39 @@ static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
>  	return vbr;
>  }
>  
> -static void virtblk_add_buf_wait(struct virtio_blk *vblk,
> -				 struct virtblk_req *vbr,
> -				 unsigned long out,
> -				 unsigned long in)
> +static inline int __virtblk_add_req(struct virtqueue *vq,
> +			     struct virtblk_req *vbr,
> +			     unsigned long out,
> +			     unsigned long in)
>  {
> +	return virtqueue_add_buf(vq, vbr->sg, out, in, vbr, GFP_ATOMIC);
> +}
> +
> +static void virtblk_add_req(struct virtblk_req *vbr,
> +			    unsigned int out, unsigned int in)
> +{
> +	struct virtio_blk *vblk = vbr->vblk;
>  	DEFINE_WAIT(wait);
> +	int ret;
>  
> -	for (;;) {
> +	spin_lock_irq(vblk->disk->queue->queue_lock);
> +	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr,
> +						 out, in)) < 0)) {
>  		prepare_to_wait_exclusive(&vblk->queue_wait, &wait,
>  					  TASK_UNINTERRUPTIBLE);
>  
> +		spin_unlock_irq(vblk->disk->queue->queue_lock);
> +		io_schedule();
>  		spin_lock_irq(vblk->disk->queue->queue_lock);
> -		if (virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
> -				      GFP_ATOMIC) < 0) {
> -			spin_unlock_irq(vblk->disk->queue->queue_lock);
> -			io_schedule();
> -		} else {
> -			virtqueue_kick(vblk->vq);
> -			spin_unlock_irq(vblk->disk->queue->queue_lock);
> -			break;
> -		}
>  
> +		finish_wait(&vblk->queue_wait, &wait);
>  	}
>  
> -	finish_wait(&vblk->queue_wait, &wait);
> -}
> -
> -static inline void virtblk_add_req(struct virtblk_req *vbr,
> -				   unsigned int out, unsigned int in)
> -{
> -	struct virtio_blk *vblk = vbr->vblk;
> -
> -	spin_lock_irq(vblk->disk->queue->queue_lock);
> -	if (unlikely(virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
> -					GFP_ATOMIC) < 0)) {
> -		spin_unlock_irq(vblk->disk->queue->queue_lock);
> -		virtblk_add_buf_wait(vblk, vbr, out, in);
> -		return;
> -	}
>  	virtqueue_kick(vblk->vq);
>  	spin_unlock_irq(vblk->disk->queue->queue_lock);
>  }
>  
> -static int virtblk_bio_send_flush(struct virtblk_req *vbr)
> +static void virtblk_bio_send_flush(struct virtblk_req *vbr)
>  {
>  	unsigned int out = 0, in = 0;
>  
> @@ -155,11 +144,9 @@ static int virtblk_bio_send_flush(struct virtblk_req *vbr)
>  	sg_set_buf(&vbr->sg[out + in++], &vbr->status, sizeof(vbr->status));
>  
>  	virtblk_add_req(vbr, out, in);
> -
> -	return 0;
>  }
>  
> -static int virtblk_bio_send_data(struct virtblk_req *vbr)
> +static void virtblk_bio_send_data(struct virtblk_req *vbr)
>  {
>  	struct virtio_blk *vblk = vbr->vblk;
>  	unsigned int num, out = 0, in = 0;
> @@ -188,8 +175,6 @@ static int virtblk_bio_send_data(struct virtblk_req *vbr)
>  	}
>  
>  	virtblk_add_req(vbr, out, in);
> -
> -	return 0;
>  }
>  
>  static void virtblk_bio_send_data_work(struct work_struct *work)
> 


-- 
Asias


* Re: [PATCH 3/9] virtio-blk: use virtqueue_start_buf on bio path
  2013-02-12 12:23 ` [PATCH 3/9] virtio-blk: use virtqueue_start_buf on bio path Paolo Bonzini
@ 2013-02-17  6:39   ` Asias He
  0 siblings, 0 replies; 35+ messages in thread
From: Asias He @ 2013-02-17  6:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Wanlong Gao, mst, Rusty Russell, kvm, virtualization

On 02/12/2013 08:23 PM, Paolo Bonzini wrote:
> Move the creation of the request header and response footer to
> __virtblk_add_req.  vbr->sg only contains the data scatterlist,
> the header/footer are added separately using the new piecewise
> API for building virtqueue buffers.
> 
> With this change, virtio-blk (with use_bio) is not relying anymore on
> the virtio functions ignoring the end markers in a scatterlist.
> The next patch will do the same for the other path.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Asias He <asias@redhat.com>

> ---
>  drivers/block/virtio_blk.c |   74 ++++++++++++++++++++++++++-----------------
>  1 files changed, 45 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index fd8a689..4a31fcc 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -62,6 +62,7 @@ struct virtblk_req
>  	struct virtio_blk *vblk;
>  	int flags;
>  	u8 status;
> +	int nents;
>  	struct scatterlist sg[];
>  };
>  
> @@ -100,24 +101,52 @@ static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
>  	return vbr;
>  }
>  
> -static inline int __virtblk_add_req(struct virtqueue *vq,
> -			     struct virtblk_req *vbr,
> -			     unsigned long out,
> -			     unsigned long in)
> +static int __virtblk_add_req(struct virtqueue *vq,
> +			     struct virtblk_req *vbr)
>  {
> -	return virtqueue_add_buf(vq, vbr->sg, out, in, vbr, GFP_ATOMIC);
> +	struct scatterlist sg;
> +	enum dma_data_direction dir;
> +	int ret;
> +
> +	unsigned int nents = 2;
> +	unsigned int nsg = 2;
> +
> +	if (vbr->nents) {
> +		nsg++;
> +		nents += vbr->nents;
> +	}
> +
> +	ret = virtqueue_start_buf(vq, vbr, nents, nsg, GFP_ATOMIC);
> +	if (ret < 0)
> +		return ret;
> +
> +	dir = DMA_TO_DEVICE;
> +	sg_init_one(&sg, &vbr->out_hdr, sizeof(vbr->out_hdr));
> +	virtqueue_add_sg(vq, &sg, 1, dir);
> +
> +	if (vbr->nents) {
> +		if ((vbr->out_hdr.type & VIRTIO_BLK_T_OUT) == 0)
> +			dir = DMA_FROM_DEVICE;
> +
> +		virtqueue_add_sg(vq, vbr->sg, vbr->nents, dir);
> +	}
> +
> +	dir = DMA_FROM_DEVICE;
> +	sg_init_one(&sg, &vbr->status, sizeof(vbr->status));
> +	virtqueue_add_sg(vq, &sg, 1, dir);
> +
> +	virtqueue_end_buf(vq);
> +	return 0;
>  }
>  
> -static void virtblk_add_req(struct virtblk_req *vbr,
> -			    unsigned int out, unsigned int in)
> +static void virtblk_add_req(struct virtblk_req *vbr)
>  {
>  	struct virtio_blk *vblk = vbr->vblk;
>  	DEFINE_WAIT(wait);
>  	int ret;
>  
>  	spin_lock_irq(vblk->disk->queue->queue_lock);
> -	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr,
> -						 out, in)) < 0)) {
> +	while (unlikely((ret = __virtblk_add_req(vblk->vq, vbr)) < 0)) {
>  		prepare_to_wait_exclusive(&vblk->queue_wait, &wait,
>  					  TASK_UNINTERRUPTIBLE);
>  
> @@ -134,22 +163,18 @@ static void virtblk_add_req(struct virtblk_req *vbr,
>  
>  static void virtblk_bio_send_flush(struct virtblk_req *vbr)
>  {
> -	unsigned int out = 0, in = 0;
> -
>  	vbr->flags |= VBLK_IS_FLUSH;
>  	vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH;
>  	vbr->out_hdr.sector = 0;
>  	vbr->out_hdr.ioprio = 0;
> -	sg_set_buf(&vbr->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
> -	sg_set_buf(&vbr->sg[out + in++], &vbr->status, sizeof(vbr->status));
> +	vbr->nents = 0;
>  
> -	virtblk_add_req(vbr, out, in);
> +	virtblk_add_req(vbr);
>  }
>  
>  static void virtblk_bio_send_data(struct virtblk_req *vbr)
>  {
>  	struct virtio_blk *vblk = vbr->vblk;
> -	unsigned int num, out = 0, in = 0;
>  	struct bio *bio = vbr->bio;
>  
>  	vbr->flags &= ~VBLK_IS_FLUSH;
> @@ -157,24 +182,15 @@ static void virtblk_bio_send_data(struct virtblk_req *vbr)
>  	vbr->out_hdr.sector = bio->bi_sector;
>  	vbr->out_hdr.ioprio = bio_prio(bio);
>  
> -	sg_set_buf(&vbr->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
> -
> -	num = blk_bio_map_sg(vblk->disk->queue, bio, vbr->sg + out);
> -
> -	sg_set_buf(&vbr->sg[num + out + in++], &vbr->status,
> -		   sizeof(vbr->status));
> -
> -	if (num) {
> -		if (bio->bi_rw & REQ_WRITE) {
> +	vbr->nents = blk_bio_map_sg(vblk->disk->queue, bio, vbr->sg);
> +	if (vbr->nents) {
> +		if (bio->bi_rw & REQ_WRITE)
>  			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
> -			out += num;
> -		} else {
> +		else
>  			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
> -			in += num;
> -		}
>  	}
>  
> -	virtblk_add_req(vbr, out, in);
> +	virtblk_add_req(vbr);
>  }
>  
>  static void virtblk_bio_send_data_work(struct work_struct *work)
> 


-- 
Asias


* Re: [PATCH 4/9] virtio-blk: use virtqueue_start_buf on req path
  2013-02-17  6:37   ` Asias He
@ 2013-02-18  9:05     ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-18  9:05 UTC (permalink / raw)
  To: Asias He
  Cc: linux-kernel, Wanlong Gao, mst, Rusty Russell, kvm, virtualization

On 17/02/2013 07:37, Asias He wrote:
>> >  static int __virtblk_add_req(struct virtqueue *vq,
>> > -			     struct virtblk_req *vbr)
>> > +			     struct virtblk_req *vbr,
>> > +			     struct scatterlist *data_sg,
>> > +			     unsigned data_nents)
>> >  {
>> >  	struct scatterlist sg;
>> >  	enum dma_data_direction dir;
>> >  	int ret;
>> >  
>> > +	int type = vbr->out_hdr.type & ~VIRTIO_BLK_T_OUT;
>> >  	unsigned int nents = 2;
>> >  	unsigned int nsg = 2;
>> >  
>> > -	if (vbr->nents) {
>> > +	if (type == VIRTIO_BLK_T_SCSI_CMD) {
>> > +		BUG_ON(use_bio);
> Do we really need the BUG_ON?  Even with use_bio=1, a
> VIRTIO_BLK_T_SCSI_CMD command can be fired. See this:

I stand corrected... will send the patch with this removed.

Paolo


* Re: [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes
  2013-02-14  9:23   ` Paolo Bonzini
  2013-02-15 18:04     ` Paolo Bonzini
@ 2013-02-19  7:49     ` Rusty Russell
  2013-02-19  9:11       ` Paolo Bonzini
  1 sibling, 1 reply; 35+ messages in thread
From: Rusty Russell @ 2013-02-19  7:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, Wanlong Gao, asias, mst, kvm, virtualization

Paolo Bonzini <pbonzini@redhat.com> writes:
>> virtio_ring: virtqueue_add_sgs, to add multiple sgs.
>> 
>> virtio_scsi and virtio_blk can really use these, to avoid their current
>> hack of copying the whole sg array.
>> 
>> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> It's much better than the other prototype you had posted, but I still
> dislike this...  You pay for additional counting of scatterlists when
> the caller knows the number of buffers

Yes, but I like the API simplicity.  We could use an array of sg_table,
which is what you have in virtio_scsi anyway, but I doubt it's
measurable on benchmarks.

> and the nested loops aren't free, either.

I think they'll win over multiple function calls :)

But modulo devastating benchmarks, this wins.  Because the three-part
API is really, really ugly.  It *looks* like a generic "start - work
... work - end" API, but it's not.  Because you need to declare exactly
what you're doing in virtqueue_start_buf!

And that's OK, because no one wants a generic API like that.

> > +	sg_unmark_end(sg + out + in);
> > +	if (out && in)
> > +		sg_unmark_end(sg + out);
>
> What's this second sg_unmark_end block for?  Doesn't it access after the
> end of sg?  If you wanted it to be sg_mark_end, that must be:
> 
>   if (out)
> 	  sg_mark_end(sg + out - 1);
>   if (in)
> 	  sg_mark_end(sg + out + in - 1);
> 
>   with a corresponding unmark afterwards.

Thanks, I fixed that after I actually tested it :)

But as we clean them every time, we don't need to unmark:

	/* Workaround until callers pass well-formed sgs. */
	for (i = 0; i < out + in; i++)
		sg_unmark_end(sg + i);

	sg_mark_end(sg + out + in - 1);
	if (out && in)
		sg_mark_end(sg + out - 1);

	return virtqueue_add_sgs(_vq, sgs, out ? 1 : 0, in ? 1 : 0, data, gfp);

This is a workaround until all callers fixed / replaced, of course.

> Another problem is that you cannot pass "truncated" scatterlists.  You
> must ensure there is an end marker on the last item.  I'm not sure if
> the kernel ensures that, given that for_each_sg takes explicitly the
> number of scatterlist elements; and it is not as trivial as
> "sg_mark_end(foo + nsg - 1);" if the upper layers hand you a chained
> scatterlist.

Makes you wonder why they have end markers at all.  But yes, the block
layer does the right thing with end markers in blk_bio_map_sg(), which
seems to carry through.

Cheers,
Rusty.
PS.  Patchbomb coming, lightly tested.


* Re: [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes
  2013-02-19  7:49     ` Rusty Russell
@ 2013-02-19  9:11       ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-02-19  9:11 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, Wanlong Gao, asias, mst, kvm, virtualization

On 19/02/2013 08:49, Rusty Russell wrote:
> 
> But modulo devastating benchmarks, this wins.  Because the three-part
> API is really, really ugly.  It *looks* like a generic "start - work
> ... work - end" API, but it's not.  Because you need to declare exactly
> what you're doing in virtqueue_start_buf!

Fair enough, and since you did the work for me I cannot really complain. :)

Paolo


end of thread, other threads:[~2013-02-19  9:11 UTC | newest]

Thread overview: 35+ messages
-- links below jump to the message on this page --
2013-02-12 12:23 [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Paolo Bonzini
2013-02-12 12:23 ` [PATCH 1/9] virtio: add functions for piecewise addition of buffers Paolo Bonzini
2013-02-12 14:56   ` Michael S. Tsirkin
2013-02-12 15:32     ` Paolo Bonzini
2013-02-12 15:43       ` Michael S. Tsirkin
2013-02-12 15:48         ` Paolo Bonzini
2013-02-12 16:13           ` Michael S. Tsirkin
2013-02-12 16:17             ` Paolo Bonzini
2013-02-12 16:35               ` Michael S. Tsirkin
2013-02-12 16:57                 ` Paolo Bonzini
2013-02-12 17:34                   ` Michael S. Tsirkin
2013-02-12 18:04                     ` Paolo Bonzini
2013-02-12 18:23                       ` Michael S. Tsirkin
2013-02-12 20:08                         ` Paolo Bonzini
2013-02-12 20:49                           ` Michael S. Tsirkin
2013-02-13  8:06                             ` Paolo Bonzini
2013-02-13 10:33                               ` Michael S. Tsirkin
2013-02-12 18:03   ` [PATCH v2 " Paolo Bonzini
2013-02-12 12:23 ` [PATCH 2/9] virtio-blk: reorganize virtblk_add_req Paolo Bonzini
2013-02-17  6:38   ` Asias He
2013-02-12 12:23 ` [PATCH 3/9] virtio-blk: use virtqueue_start_buf on bio path Paolo Bonzini
2013-02-17  6:39   ` Asias He
2013-02-12 12:23 ` [PATCH 4/9] virtio-blk: use virtqueue_start_buf on req path Paolo Bonzini
2013-02-17  6:37   ` Asias He
2013-02-18  9:05     ` Paolo Bonzini
2013-02-12 12:23 ` [PATCH 5/9] scatterlist: introduce sg_unmark_end Paolo Bonzini
2013-02-12 12:23 ` [PATCH 6/9] virtio-net: unmark scatterlist ending after virtqueue_add_buf Paolo Bonzini
2013-02-12 12:23 ` [PATCH 7/9] virtio-scsi: use virtqueue_start_buf Paolo Bonzini
2013-02-12 12:23 ` [PATCH 8/9] virtio: introduce and use virtqueue_add_buf_single Paolo Bonzini
2013-02-12 12:23 ` [PATCH 9/9] virtio: reimplement virtqueue_add_buf using new functions Paolo Bonzini
2013-02-14  6:00 ` [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes Rusty Russell
2013-02-14  9:23   ` Paolo Bonzini
2013-02-15 18:04     ` Paolo Bonzini
2013-02-19  7:49     ` Rusty Russell
2013-02-19  9:11       ` Paolo Bonzini
