All of lore.kernel.org
* [PATCH 0/3] virtio: indirect ring entries
@ 2008-12-18 17:10 Mark McLoughlin
  2008-12-18 17:10 ` [PATCH 1/3] virtio: teach virtio_has_feature() about transport features Mark McLoughlin
  0 siblings, 1 reply; 29+ messages in thread
From: Mark McLoughlin @ 2008-12-18 17:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, linux-kernel, Avi Kivity


Hi Rusty,
        Here's something that has (apparently) been kicked around for
a while now and I think makes some sense.

        Avi has been especially pushing for it lately in the context
of high performance block I/O - I'll let him explain his thinking
there.

        The patches themselves are fairly trivial, there shouldn't be
anything too surprising here.

Cheers,
Mark.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/3] virtio: teach virtio_has_feature() about transport features
  2008-12-18 17:10 [PATCH 0/3] virtio: indirect ring entries Mark McLoughlin
@ 2008-12-18 17:10 ` Mark McLoughlin
  2008-12-18 17:10   ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Mark McLoughlin
  0 siblings, 1 reply; 29+ messages in thread
From: Mark McLoughlin @ 2008-12-18 17:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, linux-kernel, Avi Kivity, Mark McLoughlin

Drivers don't add transport features to their table, so we
shouldn't check these with virtio_check_driver_offered_feature().

We could perhaps add an ->offered_feature() virtio_config_op,
but that would probably be overkill for a consistency check
like this.
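
(For illustration only: a minimal userspace sketch, not part of the patch.
It assumes the bit split this series relies on, with VIRTIO_TRANSPORT_F_START
at 28 and VIRTIO_RING_F_INDIRECT_DESC occupying bit 28, and shows why the
driver-table sanity check only makes sense below that boundary.)

#include <stdio.h>

#define VIRTIO_TRANSPORT_F_START	28
#define VIRTIO_RING_F_INDIRECT_DESC	28	/* transport (ring) feature   */
#define VIRTIO_BLK_F_BARRIER		0	/* example device feature bit */

/* Mirrors the new test in virtio_has_feature(): only device feature
 * bits are expected in a driver's feature table. */
static int needs_driver_table_check(int fbit)
{
	return fbit < VIRTIO_TRANSPORT_F_START;
}

int main(void)
{
	printf("bit %2d (device feature):    check driver table? %s\n",
	       VIRTIO_BLK_F_BARRIER,
	       needs_driver_table_check(VIRTIO_BLK_F_BARRIER) ? "yes" : "no");
	printf("bit %2d (transport feature): check driver table? %s\n",
	       VIRTIO_RING_F_INDIRECT_DESC,
	       needs_driver_table_check(VIRTIO_RING_F_INDIRECT_DESC) ? "yes" : "no");
	return 0;
}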

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 include/linux/virtio_config.h |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index bf8ec28..e4ba694 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -99,7 +99,9 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
 	if (__builtin_constant_p(fbit))
 		BUILD_BUG_ON(fbit >= 32);
 
-	virtio_check_driver_offered_feature(vdev, fbit);
+	if (fbit < VIRTIO_TRANSPORT_F_START)
+		virtio_check_driver_offered_feature(vdev, fbit);
+
 	return test_bit(fbit, vdev->features);
 }
 
-- 
1.6.0.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2008-12-18 17:10 ` [PATCH 1/3] virtio: teach virtio_has_feature() about transport features Mark McLoughlin
@ 2008-12-18 17:10   ` Mark McLoughlin
  2008-12-18 17:10     ` [PATCH 3/3] lguest: add support for indirect ring entries Mark McLoughlin
                       ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Mark McLoughlin @ 2008-12-18 17:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, linux-kernel, Avi Kivity, Mark McLoughlin

Add a new feature flag for indirect ring entries. These are ring
entries which point to a table of buffer descriptors.

The idea here is to increase the ring capacity by allowing a larger
effective ring size whereby the ring size dictates the number of
requests that may be outstanding, rather than the size of those
requests.

This should be most effective in the case of block I/O where we can
potentially benefit by concurrently dispatching a large number of
large requests. Even in the simple case of single segment block
requests, this results in a threefold increase in ring capacity.
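
(For illustration only: a minimal userspace sketch, not part of the patch.
struct vring_desc is redeclared to match the 16-byte layout used here, the
three-descriptor shape of a single-segment block request, header, data and
status byte, is an assumption about virtio_blk, and a plain host pointer
stands in for the guest-physical address.)

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT	1
#define VRING_DESC_F_WRITE	2
#define VRING_DESC_F_INDIRECT	4	/* new flag introduced by this patch */

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

int main(void)
{
	const unsigned int ring_size = 256, per_request = 3;

	/* One single-segment block request packed into its own table:
	 * [0] request header, [1] data buffer (device-writable, i.e. a
	 * read), [2] status byte. */
	struct vring_desc *tbl = calloc(per_request, sizeof(*tbl));
	if (!tbl)
		return 1;
	tbl[0].flags = VRING_DESC_F_NEXT;			tbl[0].next = 1;
	tbl[1].flags = VRING_DESC_F_NEXT | VRING_DESC_F_WRITE;	tbl[1].next = 2;
	tbl[2].flags = VRING_DESC_F_WRITE;			/* end of chain */

	/* The single ring slot that replaces the three chained ones. */
	struct vring_desc slot = {
		.addr  = (uintptr_t)tbl,	/* really a guest-physical address */
		.len   = per_request * sizeof(struct vring_desc),
		.flags = VRING_DESC_F_INDIRECT,
	};

	printf("direct descriptors:   %u outstanding requests\n",
	       ring_size / per_request);			/* 85  */
	printf("indirect descriptors: %u outstanding requests\n", ring_size);
	printf("(indirect slot: len=%u, flags=0x%x)\n", slot.len, slot.flags);
	free(tbl);
	return 0;
}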

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/virtio/virtio_ring.c |   75 ++++++++++++++++++++++++++++++++++++++++-
 include/linux/virtio_ring.h  |    5 +++
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5777196..2330c4b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -46,6 +46,9 @@ struct vring_virtqueue
 	/* Other side has made a mess, don't try any more. */
 	bool broken;
 
+	/* Host supports indirect buffers */
+	bool indirect;
+
 	/* Number of free buffers */
 	unsigned int num_free;
 	/* Head of free buffer list. */
@@ -70,6 +73,55 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+/* Set up an indirect table of descriptors and add it to the queue. */
+static int vring_add_indirect(struct vring_virtqueue *vq,
+			      struct scatterlist sg[],
+			      unsigned int out,
+			      unsigned int in)
+{
+	struct vring_desc *desc;
+	unsigned head;
+	int i;
+
+	desc = kmalloc((out + in) * sizeof(struct vring_desc), GFP_ATOMIC);
+	if (!desc)
+		return vq->vring.num;
+
+	/* Transfer entries from the sg list into the indirect page */
+	for (i = 0; i < out; i++) {
+		desc[i].flags = VRING_DESC_F_NEXT;
+		desc[i].addr = sg_phys(sg);
+		desc[i].len = sg->length;
+		desc[i].next = i+1;
+		sg++;
+	}
+	for (; i < (out + in); i++) {
+		desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
+		desc[i].addr = sg_phys(sg);
+		desc[i].len = sg->length;
+		desc[i].next = i+1;
+		sg++;
+	}
+
+	/* Last one doesn't continue. */
+	desc[i-1].flags &= ~VRING_DESC_F_NEXT;
+	desc[i-1].next = 0;
+
+	/* We're about to use a buffer */
+	vq->num_free--;
+
+	/* Use a single buffer which doesn't continue */
+	head = vq->free_head;
+	vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
+	vq->vring.desc[head].addr = virt_to_phys(desc);
+	vq->vring.desc[head].len = i * sizeof(struct vring_desc);
+
+	/* Update free pointer */
+	vq->free_head = vq->vring.desc[head].next;
+
+	return head;
+}
+
 static int vring_add_buf(struct virtqueue *_vq,
 			 struct scatterlist sg[],
 			 unsigned int out,
@@ -79,12 +131,21 @@ static int vring_add_buf(struct virtqueue *_vq,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i, avail, head, uninitialized_var(prev);
 
+	START_USE(vq);
+
 	BUG_ON(data == NULL);
+
+	/* If the host supports indirect descriptor tables, and we have multiple
+	 * buffers, then go indirect. FIXME: tune this threshold */
+	if (vq->indirect && (out + in) > 1 && vq->num_free) {
+		head = vring_add_indirect(vq, sg, out, in);
+		if (head != vq->vring.num)
+			goto add_head;
+	}
+
 	BUG_ON(out + in > vq->vring.num);
 	BUG_ON(out + in == 0);
 
-	START_USE(vq);
-
 	if (vq->num_free < out + in) {
 		pr_debug("Can't add buf len %i - avail = %i\n",
 			 out + in, vq->num_free);
@@ -121,6 +182,7 @@ static int vring_add_buf(struct virtqueue *_vq,
 	/* Update free pointer */
 	vq->free_head = i;
 
+add_head:
 	/* Set token. */
 	vq->data[head] = data;
 
@@ -164,6 +226,11 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 
 	/* Put back on free list: find end */
 	i = head;
+
+	/* Free the indirect table */
+	if (vq->vring.desc[i].flags & VRING_DESC_F_INDIRECT)
+		kfree(phys_to_virt(vq->vring.desc[i].addr));
+
 	while (vq->vring.desc[i].flags & VRING_DESC_F_NEXT) {
 		i = vq->vring.desc[i].next;
 		vq->num_free++;
@@ -305,6 +372,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
 	vq->in_use = false;
 #endif
 
+	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
+
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback)
 		vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
@@ -332,6 +401,8 @@ void vring_transport_features(struct virtio_device *vdev)
 
 	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
 		switch (i) {
+		case VIRTIO_RING_F_INDIRECT_DESC:
+			break;
 		default:
 			/* We don't understand this bit. */
 			clear_bit(i, vdev->features);
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 71e0372..3828ae2 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -14,6 +14,8 @@
 #define VRING_DESC_F_NEXT	1
 /* This marks a buffer as write-only (otherwise read-only). */
 #define VRING_DESC_F_WRITE	2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT	4
 
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
@@ -24,6 +26,9 @@
  * optimization.  */
 #define VRING_AVAIL_F_NO_INTERRUPT	1
 
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
 /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
 struct vring_desc
 {
-- 
1.6.0.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/3] lguest: add support for indirect ring entries
  2008-12-18 17:10   ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Mark McLoughlin
@ 2008-12-18 17:10     ` Mark McLoughlin
  2008-12-20 11:38     ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Ingo Oeser
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Mark McLoughlin @ 2008-12-18 17:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, linux-kernel, Avi Kivity, Mark McLoughlin

Support the VIRTIO_RING_F_INDIRECT_DESC feature.

This is a simple matter of changing the descriptor walking
code to operate on a struct vring_desc* and supplying it
with an indirect table if detected.
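
(For illustration only: a minimal userspace sketch of the walking change,
not taken from lguest. struct vring_desc is redeclared locally, plain host
pointers stand in for guest addresses and the sanity checks are left out.
The point is that the loop only ever sees a (desc, max) pair; an indirect
head simply swaps in the external table and its length.)

#include <stdio.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT	1
#define VRING_DESC_F_INDIRECT	4

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

static void walk_chain(struct vring_desc *ring, unsigned int num, unsigned int head)
{
	struct vring_desc *desc = ring;
	unsigned int i = head, max = num;

	/* Indirect head: restart the same walk on the external table. */
	if (desc[i].flags & VRING_DESC_F_INDIRECT) {
		max  = desc[i].len / sizeof(struct vring_desc);
		desc = (struct vring_desc *)(uintptr_t)desc[i].addr;
		i = 0;
	}

	do {
		printf("segment %u: %u bytes\n", i, desc[i].len);
		if (!(desc[i].flags & VRING_DESC_F_NEXT))
			break;
		i = desc[i].next;
	} while (i < max);
}

int main(void)
{
	struct vring_desc table[2] = {
		{ .len = 512, .flags = VRING_DESC_F_NEXT, .next = 1 },
		{ .len = 1 },
	};
	struct vring_desc ring[8] = {
		[0] = { .addr  = (uintptr_t)table,
			.len   = sizeof(table),
			.flags = VRING_DESC_F_INDIRECT },
	};

	walk_chain(ring, 8, 0);		/* prints both segments from the table */
	return 0;
}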

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 Documentation/lguest/lguest.c |   41 +++++++++++++++++++++++++++++------------
 1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index f2dbbf3..3cd388f 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -623,20 +623,21 @@ static void *_check_pointer(unsigned long addr, unsigned int size,
 /* Each buffer in the virtqueues is actually a chain of descriptors.  This
  * function returns the next descriptor in the chain, or vq->vring.num if we're
  * at the end. */
-static unsigned next_desc(struct virtqueue *vq, unsigned int i)
+static unsigned next_desc(struct vring_desc *desc,
+			  unsigned int i, unsigned int max)
 {
 	unsigned int next;
 
 	/* If this descriptor says it doesn't chain, we're done. */
-	if (!(vq->vring.desc[i].flags & VRING_DESC_F_NEXT))
-		return vq->vring.num;
+	if (!(desc[i].flags & VRING_DESC_F_NEXT))
+		return max;
 
 	/* Check they're not leading us off end of descriptors. */
-	next = vq->vring.desc[i].next;
+	next = desc[i].next;
 	/* Make sure compiler knows to grab that: we don't want it changing! */
 	wmb();
 
-	if (next >= vq->vring.num)
+	if (next >= max)
 		errx(1, "Desc next is %u", next);
 
 	return next;
@@ -653,7 +654,8 @@ static unsigned get_vq_desc(struct virtqueue *vq,
 			    struct iovec iov[],
 			    unsigned int *out_num, unsigned int *in_num)
 {
-	unsigned int i, head;
+	struct vring_desc *desc;
+	unsigned int i, head, max;
 	u16 last_avail;
 
 	/* Check it isn't doing very strange things with descriptor numbers. */
@@ -678,15 +680,28 @@ static unsigned get_vq_desc(struct virtqueue *vq,
 	/* When we start there are none of either input nor output. */
 	*out_num = *in_num = 0;
 
+	max = vq->vring.num;
+	desc = vq->vring.desc;
 	i = head;
+
+	/* If this is an indirect entry, then this buffer contains a descriptor
+	 * table which we handle as if it's any normal descriptor chain. */
+	if (desc[i].flags & VRING_DESC_F_INDIRECT) {
+		if (desc[i].len % sizeof(struct vring_desc))
+			errx(1, "Invalid size for indirect buffer table");
+
+		max = desc[i].len / sizeof(struct vring_desc);
+		desc = check_pointer(desc[i].addr, desc[i].len);
+		i = 0;
+	}
+
 	do {
 		/* Grab the first descriptor, and check it's OK. */
-		iov[*out_num + *in_num].iov_len = vq->vring.desc[i].len;
+		iov[*out_num + *in_num].iov_len = desc[i].len;
 		iov[*out_num + *in_num].iov_base
-			= check_pointer(vq->vring.desc[i].addr,
-					vq->vring.desc[i].len);
+			= check_pointer(desc[i].addr, desc[i].len);
 		/* If this is an input descriptor, increment that count. */
-		if (vq->vring.desc[i].flags & VRING_DESC_F_WRITE)
+		if (desc[i].flags & VRING_DESC_F_WRITE)
 			(*in_num)++;
 		else {
 			/* If it's an output descriptor, they're all supposed
@@ -697,9 +712,9 @@ static unsigned get_vq_desc(struct virtqueue *vq,
 		}
 
 		/* If we've got too many, that implies a descriptor loop. */
-		if (*out_num + *in_num > vq->vring.num)
+		if (*out_num + *in_num > max)
 			errx(1, "Looped descriptor");
-	} while ((i = next_desc(vq, i)) != vq->vring.num);
+	} while ((i = next_desc(desc, i, max)) != max);
 
 	vq->inflight++;
 	return head;
@@ -1502,6 +1517,8 @@ static void setup_tun_net(char *arg)
 	add_feature(dev, VIRTIO_NET_F_HOST_TSO4);
 	add_feature(dev, VIRTIO_NET_F_HOST_TSO6);
 	add_feature(dev, VIRTIO_NET_F_HOST_ECN);
+	/* we handle indirect ring entries */
+	add_feature(dev, VIRTIO_RING_F_INDIRECT_DESC);
 	set_config(dev, sizeof(conf), &conf);
 
 	/* We don't need the socket any more; setup is done. */
-- 
1.6.0.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2008-12-18 17:10   ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Mark McLoughlin
  2008-12-18 17:10     ` [PATCH 3/3] lguest: add support for indirect ring entries Mark McLoughlin
  2008-12-20 11:38     ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Ingo Oeser
@ 2008-12-20 11:38     ` Ingo Oeser
  2008-12-22 10:17       ` Mark McLoughlin
  2008-12-22 10:17       ` Mark McLoughlin
  2009-04-21 12:59     ` Mark McLoughlin
  2009-04-21 12:59     ` Mark McLoughlin
  4 siblings, 2 replies; 29+ messages in thread
From: Ingo Oeser @ 2008-12-20 11:38 UTC (permalink / raw)
  To: Mark McLoughlin; +Cc: Rusty Russell, virtualization, linux-kernel, Avi Kivity

Hi Mark,

On Thursday 18 December 2008, Mark McLoughlin wrote:
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 5777196..2330c4b 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -70,6 +73,55 @@ struct vring_virtqueue
>  
>  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>  
> +/* Set up an indirect table of descriptors and add it to the queue. */
> +static int vring_add_indirect(struct vring_virtqueue *vq,
> +			      struct scatterlist sg[],
> +			      unsigned int out,
> +			      unsigned int in)
> +{
> +	struct vring_desc *desc;
> +	unsigned head;
> +	int i;
> +
> +	desc = kmalloc((out + in) * sizeof(struct vring_desc), GFP_ATOMIC);

kmalloc() returns ZERO_SIZE_PTR if (out + in) == 0

> +	if (!desc)
> +		return vq->vring.num;
> +
> +	/* Transfer entries from the sg list into the indirect page */
> +	for (i = 0; i < out; i++) {
> +		desc[i].flags = VRING_DESC_F_NEXT;
> +		desc[i].addr = sg_phys(sg);
> +		desc[i].len = sg->length;
> +		desc[i].next = i+1;
> +		sg++;
> +	}
> +	for (; i < (out + in); i++) {
> +		desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> +		desc[i].addr = sg_phys(sg);
> +		desc[i].len = sg->length;
> +		desc[i].next = i+1;
> +		sg++;
> +	}
> +
> +	/* Last one doesn't continue. */
> +	desc[i-1].flags &= ~VRING_DESC_F_NEXT;
> +	desc[i-1].next = 0;

So this array index can underflow (desc[i-1] becomes desc[-1] when
out + in == 0). Please check for and avoid that within this function.


Best Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2008-12-20 11:38     ` Ingo Oeser
@ 2008-12-22 10:17       ` Mark McLoughlin
  2008-12-22 10:17       ` Mark McLoughlin
  1 sibling, 0 replies; 29+ messages in thread
From: Mark McLoughlin @ 2008-12-22 10:17 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Rusty Russell, virtualization, linux-kernel, Avi Kivity

Hi Ingo,

On Sat, 2008-12-20 at 12:38 +0100, Ingo Oeser wrote:
> Hi Mark,
> 
> On Thursday 18 December 2008, Mark McLoughlin wrote:
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 5777196..2330c4b 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -70,6 +73,55 @@ struct vring_virtqueue
> >  
> >  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
> >  
> > +/* Set up an indirect table of descriptors and add it to the queue. */
> > +static int vring_add_indirect(struct vring_virtqueue *vq,
> > +			      struct scatterlist sg[],
> > +			      unsigned int out,
> > +			      unsigned int in)
> > +{
> > +	struct vring_desc *desc;
> > +	unsigned head;
> > +	int i;
> > +
> > +	desc = kmalloc((out + in) * sizeof(struct vring_desc), GFP_ATOMIC);
> 
> kmalloc() returns ZERO_SIZE_PTR, if (out + in) == 0

vring_add_buf() has:

  BUG_ON(out + in == 0)

I should just add that here too before the kmalloc() call.

Thanks,
Mark.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2008-12-18 17:10   ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Mark McLoughlin
                       ` (3 preceding siblings ...)
  2009-04-21 12:59     ` Mark McLoughlin
@ 2009-04-21 12:59     ` Mark McLoughlin
  2009-04-27  7:43       ` Dor Laor
  4 siblings, 1 reply; 29+ messages in thread
From: Mark McLoughlin @ 2009-04-21 12:59 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, netdev, Avi Kivity, Dor Laor

Hi Rusty,

On Thu, 2008-12-18 at 17:10 +0000, Mark McLoughlin wrote:
> Add a new feature flag for indirect ring entries. These are ring
> entries which point to a table of buffer descriptors.
> 
> The idea here is to increase the ring capacity by allowing a larger
> effective ring size whereby the ring size dictates the number of
> requests that may be outstanding, rather than the size of those
> requests.
> 
> This should be most effective in the case of block I/O where we can
> potentially benefit by concurrently dispatching a large number of
> large requests. Even in the simple case of single segment block
> requests, this results in a threefold increase in ring capacity.

Apparently, this would also be useful for the Windows virtio-net
drivers.

Dor can explain further, but apparently Windows has been observed
passing the driver a packet with >256 fragments when using TSO.

With a ring size of 256, the guest can either drop the packet or copy it
into a single buffer. We'd much rather use an indirect ring entry
to pass this number of fragments without copying.

For reference the original patch was here:

  http://lkml.org/lkml/2008/12/18/212

Cheers,
Mark.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-04-21 12:59     ` Mark McLoughlin
@ 2009-04-27  7:43       ` Dor Laor
  2009-05-04  2:19         ` Rusty Russell
  0 siblings, 1 reply; 29+ messages in thread
From: Dor Laor @ 2009-04-27  7:43 UTC (permalink / raw)
  To: Mark McLoughlin; +Cc: netdev, Dor Laor, Avi Kivity, virtualization

Mark McLoughlin wrote:
> Hi Rusty,
>
> On Thu, 2008-12-18 at 17:10 +0000, Mark McLoughlin wrote:
>   
>> Add a new feature flag for indirect ring entries. These are ring
>> entries which point to a table of buffer descriptors.
>>
>> The idea here is to increase the ring capacity by allowing a larger
>> effective ring size whereby the ring size dictates the number of
>> requests that may be outstanding, rather than the size of those
>> requests.
>>
>> This should be most effective in the case of block I/O where we can
>> potentially benefit by concurrently dispatching a large number of
>> large requests. Even in the simple case of single segment block
>> requests, this results in a threefold increase in ring capacity.
>>     
>
> Apparently, this would also be useful for the windows virtio-net
> drivers.
>
> Dor can explain further, but apparently Windows has been observed
> passing the driver a packet with >256 fragments when using TSO.
>
> With a ring size of 256, the guest can either drop the packet or copy it
> into a single buffer. We'd much rather if we could use an indirect ring
> entry to pass this number of fragments without copying.
>   
Correct. This is what we do in Windows today.
The problem arises when sending lots of small packets from the
Windows guest with TSO: Windows prepares a very big scatter-gather
list, bigger than the ring size (270 fragments).
Having indirect ring entries is good both for this and for block
I/O, as described above.

Cheers,
Dor
> For reference the original patch was here:
>
>   http://lkml.org/lkml/2008/12/18/212
>
> Cheers,
> Mark.
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/virtualization
>   

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-04-27  7:43       ` Dor Laor
@ 2009-05-04  2:19         ` Rusty Russell
  2009-05-11 17:10           ` Mark McLoughlin
  2009-05-11 17:10           ` Mark McLoughlin
  0 siblings, 2 replies; 29+ messages in thread
From: Rusty Russell @ 2009-05-04  2:19 UTC (permalink / raw)
  To: dlaor; +Cc: Mark McLoughlin, Dor Laor, virtualization, Avi Kivity, netdev

On Mon, 27 Apr 2009 05:13:53 pm Dor Laor wrote:
> Mark McLoughlin wrote:
> > Hi Rusty,
> >
> > On Thu, 2008-12-18 at 17:10 +0000, Mark McLoughlin wrote:
> >   
> >> Add a new feature flag for indirect ring entries. These are ring
> >> entries which point to a table of buffer descriptors.
> >>
> >> The idea here is to increase the ring capacity by allowing a larger
> >> effective ring size whereby the ring size dictates the number of
> >> requests that may be outstanding, rather than the size of those
> >> requests.

OK, just so we track our mistakes.

1) virtio_rings must be physically contiguous, even though they actually
   have two independent parts.
2) The number of elements in a ring must be a power of 2.
3) virtio_pci tells the guest what number of elements to use.
4) The guest has to allocate that much physically contiguous memory, or fail.

In practice, 128 elements = 2 pages, 256 elements = 3 pages, 512 elements
= 5 pages.  Order 1, order 2, order 3 under Linux.  1 is OK, 2 is iffy, 3 is
hard.
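
(For reference, a quick userspace check of that arithmetic. The assumptions
are 4096-byte pages and the legacy split-ring layout: 16-byte descriptors
plus a 16-bit avail ring with two header fields, padded to a page boundary,
followed by a used ring of two 16-bit fields plus 8-byte elements.)

#include <stdio.h>

int main(void)
{
	const unsigned int page = 4096;
	unsigned int num;

	for (num = 128; num <= 512; num *= 2) {
		/* descriptors + avail ring (flags, idx, ring[num]), ... */
		unsigned int part1 = num * 16 + 2 * (2 + num);
		/* ... padded to a page, then the used ring. */
		unsigned int used  = 2 * 2 + num * 8;
		unsigned int bytes = (part1 + page - 1) / page * page + used;
		unsigned int pages = (bytes + page - 1) / page;

		printf("%3u elements: %5u bytes, %u pages\n", num, bytes, pages);
	}
	return 0;	/* prints 2, 3 and 5 pages respectively */
}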

Blocked from doing the simpler thing, we've decided to go with a layer
of indirection.  But the patch is simple and clean, so there's nothing
fundamental to object to.

I can't find 3/3, did it go missing?

Thanks,
Rusty.



> >>
> >> This should be most effective in the case of block I/O where we can
> >> potentially benefit by concurrently dispatching a large number of
> >> large requests. Even in the simple case of single segment block
> >> requests, this results in a threefold increase in ring capacity.
> >>     
> >
> > Apparently, this would also be useful for the windows virtio-net
> > drivers.
> >
> > Dor can explain further, but apparently Windows has been observed
> > passing the driver a packet with >256 fragments when using TSO.
> >
> > With a ring size of 256, the guest can either drop the packet or copy it
> > into a single buffer. We'd much rather if we could use an indirect ring
> > entry to pass this number of fragments without copying.
> >   
> Correct. This is what we do in Windows today.
> The problem arises when sending lots of small packets from the
> Windows guest with TSO: Windows prepares a very big scatter-gather
> list, bigger than the ring size (270 fragments).
> Having indirect ring entries is good both for this and for block
> I/O, as described above.
> 
> Cheers,
> Dor
> > For reference the original patch was here:
> >
> >   http://lkml.org/lkml/2008/12/18/212
> >
> > Cheers,
> > Mark.
> >
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization@lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/virtualization
> >   
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-05-04  2:19         ` Rusty Russell
  2009-05-11 17:10           ` Mark McLoughlin
@ 2009-05-11 17:10           ` Mark McLoughlin
  2009-05-11 17:11             ` [PATCH 1/3] virtio: teach virtio_has_feature() about transport features Mark McLoughlin
                               ` (4 more replies)
  1 sibling, 5 replies; 29+ messages in thread
From: Mark McLoughlin @ 2009-05-11 17:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: dlaor, netdev, Dor Laor, Avi Kivity, virtualization

On Mon, 2009-05-04 at 11:49 +0930, Rusty Russell wrote:
> On Mon, 27 Apr 2009 05:13:53 pm Dor Laor wrote:
> > Mark McLoughlin wrote:
> > > Hi Rusty,
> > >
> > > On Thu, 2008-12-18 at 17:10 +0000, Mark McLoughlin wrote:
> > >   
> > >> Add a new feature flag for indirect ring entries. These are ring
> > >> entries which point to a table of buffer descriptors.
> > >>
> > >> The idea here is to increase the ring capacity by allowing a larger
> > >> effective ring size whereby the ring size dictates the number of
> > >> requests that may be outstanding, rather than the size of those
> > >> requests.
> 
> OK, just so we track our mistakes.
> 
> 1) virtio_rings must be physically contiguous, even though they actually
>    have two independent parts.
> 2) The number of elements in a ring must be a power of 2.
> 3) virtio_pci tells the guest what number of elements to use.
> 4) The guest has to allocate that much physically contiguous memory, or fail.
> 
> In practice, 128 elements = 2 pages, 256 elements = 3 pages, 512 elements
> = 5 pages.  Order 1, order 2, order 3 under Linux.  1 is OK, 2 is iffy, 3 is
> hard.
> 
> Blocked from doing the simpler thing, we've decided to go with a layer
> of indirection.  But the patch is simple and clean, so there's nothing
> fundamental to object to.

Still have one FIXME in the patch worth looking at - at what point
should we use an indirect entry rather than consuming N entries? 

> I can't find 3/3, did it go missing?

Following up with all three patches again.

Cheers,
Mark.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/3] virtio: teach virtio_has_feature() about transport features
  2009-05-11 17:10           ` Mark McLoughlin
@ 2009-05-11 17:11             ` Mark McLoughlin
  2009-05-11 17:11               ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Mark McLoughlin
  2009-05-12 14:23             ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Rusty Russell
                               ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Mark McLoughlin @ 2009-05-11 17:11 UTC (permalink / raw)
  To: Rusty Russell
  Cc: netdev, Dor Laor, Avi Kivity, virtualization, Mark McLoughlin

Drivers don't add transport features to their table, so we
shouldn't check these with virtio_check_driver_offered_feature().

We could perhaps add an ->offered_feature() virtio_config_op,
but that would probably be overkill for a consistency check
like this.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 include/linux/virtio_config.h |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index bf8ec28..e4ba694 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -99,7 +99,9 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
 	if (__builtin_constant_p(fbit))
 		BUILD_BUG_ON(fbit >= 32);
 
-	virtio_check_driver_offered_feature(vdev, fbit);
+	if (fbit < VIRTIO_TRANSPORT_F_START)
+		virtio_check_driver_offered_feature(vdev, fbit);
+
 	return test_bit(fbit, vdev->features);
 }
 
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-05-11 17:11             ` [PATCH 1/3] virtio: teach virtio_has_feature() about transport features Mark McLoughlin
@ 2009-05-11 17:11               ` Mark McLoughlin
  2009-05-11 17:11                 ` [PATCH 3/3] lguest: add support for indirect ring entries Mark McLoughlin
  0 siblings, 1 reply; 29+ messages in thread
From: Mark McLoughlin @ 2009-05-11 17:11 UTC (permalink / raw)
  To: Rusty Russell
  Cc: netdev, Dor Laor, Avi Kivity, virtualization, Mark McLoughlin

Add a new feature flag for indirect ring entries. These are ring
entries which point to a table of buffer descriptors.

The idea here is to increase the ring capacity by allowing a larger
effective ring size whereby the ring size dictates the number of
requests that may be outstanding, rather than the size of those
requests.

This should be most effective in the case of block I/O where we can
potentially benefit by concurrently dispatching a large number of
large requests. Even in the simple case of single segment block
requests, this results in a threefold increase in ring capacity.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 drivers/virtio/virtio_ring.c |   75 ++++++++++++++++++++++++++++++++++++++++-
 include/linux/virtio_ring.h  |    5 +++
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5c52369..ebccea8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -52,6 +52,9 @@ struct vring_virtqueue
 	/* Other side has made a mess, don't try any more. */
 	bool broken;
 
+	/* Host supports indirect buffers */
+	bool indirect;
+
 	/* Number of free buffers */
 	unsigned int num_free;
 	/* Head of free buffer list. */
@@ -76,6 +79,55 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+/* Set up an indirect table of descriptors and add it to the queue. */
+static int vring_add_indirect(struct vring_virtqueue *vq,
+			      struct scatterlist sg[],
+			      unsigned int out,
+			      unsigned int in)
+{
+	struct vring_desc *desc;
+	unsigned head;
+	int i;
+
+	desc = kmalloc((out + in) * sizeof(struct vring_desc), GFP_ATOMIC);
+	if (!desc)
+		return vq->vring.num;
+
+	/* Transfer entries from the sg list into the indirect page */
+	for (i = 0; i < out; i++) {
+		desc[i].flags = VRING_DESC_F_NEXT;
+		desc[i].addr = sg_phys(sg);
+		desc[i].len = sg->length;
+		desc[i].next = i+1;
+		sg++;
+	}
+	for (; i < (out + in); i++) {
+		desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
+		desc[i].addr = sg_phys(sg);
+		desc[i].len = sg->length;
+		desc[i].next = i+1;
+		sg++;
+	}
+
+	/* Last one doesn't continue. */
+	desc[i-1].flags &= ~VRING_DESC_F_NEXT;
+	desc[i-1].next = 0;
+
+	/* We're about to use a buffer */
+	vq->num_free--;
+
+	/* Use a single buffer which doesn't continue */
+	head = vq->free_head;
+	vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
+	vq->vring.desc[head].addr = virt_to_phys(desc);
+	vq->vring.desc[head].len = i * sizeof(struct vring_desc);
+
+	/* Update free pointer */
+	vq->free_head = vq->vring.desc[head].next;
+
+	return head;
+}
+
 static int vring_add_buf(struct virtqueue *_vq,
 			 struct scatterlist sg[],
 			 unsigned int out,
@@ -85,12 +137,21 @@ static int vring_add_buf(struct virtqueue *_vq,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i, avail, head, uninitialized_var(prev);
 
+	START_USE(vq);
+
 	BUG_ON(data == NULL);
+
+	/* If the host supports indirect descriptor tables, and we have multiple
+	 * buffers, then go indirect. FIXME: tune this threshold */
+	if (vq->indirect && (out + in) > 1 && vq->num_free) {
+		head = vring_add_indirect(vq, sg, out, in);
+		if (head != vq->vring.num)
+			goto add_head;
+	}
+
 	BUG_ON(out + in > vq->vring.num);
 	BUG_ON(out + in == 0);
 
-	START_USE(vq);
-
 	if (vq->num_free < out + in) {
 		pr_debug("Can't add buf len %i - avail = %i\n",
 			 out + in, vq->num_free);
@@ -127,6 +188,7 @@ static int vring_add_buf(struct virtqueue *_vq,
 	/* Update free pointer */
 	vq->free_head = i;
 
+add_head:
 	/* Set token. */
 	vq->data[head] = data;
 
@@ -170,6 +232,11 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 
 	/* Put back on free list: find end */
 	i = head;
+
+	/* Free the indirect table */
+	if (vq->vring.desc[i].flags & VRING_DESC_F_INDIRECT)
+		kfree(phys_to_virt(vq->vring.desc[i].addr));
+
 	while (vq->vring.desc[i].flags & VRING_DESC_F_NEXT) {
 		i = vq->vring.desc[i].next;
 		vq->num_free++;
@@ -311,6 +378,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
 	vq->in_use = false;
 #endif
 
+	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
+
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback)
 		vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
@@ -338,6 +407,8 @@ void vring_transport_features(struct virtio_device *vdev)
 
 	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
 		switch (i) {
+		case VIRTIO_RING_F_INDIRECT_DESC:
+			break;
 		default:
 			/* We don't understand this bit. */
 			clear_bit(i, vdev->features);
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 71e0372..3828ae2 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -14,6 +14,8 @@
 #define VRING_DESC_F_NEXT	1
 /* This marks a buffer as write-only (otherwise read-only). */
 #define VRING_DESC_F_WRITE	2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT	4
 
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
@@ -24,6 +26,9 @@
  * optimization.  */
 #define VRING_AVAIL_F_NO_INTERRUPT	1
 
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
 /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
 struct vring_desc
 {
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/3] lguest: add support for indirect ring entries
  2009-05-11 17:11               ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Mark McLoughlin
@ 2009-05-11 17:11                 ` Mark McLoughlin
  0 siblings, 0 replies; 29+ messages in thread
From: Mark McLoughlin @ 2009-05-11 17:11 UTC (permalink / raw)
  To: Rusty Russell
  Cc: netdev, Dor Laor, Avi Kivity, virtualization, Mark McLoughlin

Support the VIRTIO_RING_F_INDIRECT_DESC feature.

This is a simple matter of changing the descriptor walking
code to operate on a struct vring_desc* and supplying it
with an indirect table if detected.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
---
 Documentation/lguest/lguest.c |   41 +++++++++++++++++++++++++++++------------
 1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index d36fcc0..010c8bc 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -623,20 +623,21 @@ static void *_check_pointer(unsigned long addr, unsigned int size,
 /* Each buffer in the virtqueues is actually a chain of descriptors.  This
  * function returns the next descriptor in the chain, or vq->vring.num if we're
  * at the end. */
-static unsigned next_desc(struct virtqueue *vq, unsigned int i)
+static unsigned next_desc(struct vring_desc *desc,
+			  unsigned int i, unsigned int max)
 {
 	unsigned int next;
 
 	/* If this descriptor says it doesn't chain, we're done. */
-	if (!(vq->vring.desc[i].flags & VRING_DESC_F_NEXT))
-		return vq->vring.num;
+	if (!(desc[i].flags & VRING_DESC_F_NEXT))
+		return max;
 
 	/* Check they're not leading us off end of descriptors. */
-	next = vq->vring.desc[i].next;
+	next = desc[i].next;
 	/* Make sure compiler knows to grab that: we don't want it changing! */
 	wmb();
 
-	if (next >= vq->vring.num)
+	if (next >= max)
 		errx(1, "Desc next is %u", next);
 
 	return next;
@@ -653,7 +654,8 @@ static unsigned get_vq_desc(struct virtqueue *vq,
 			    struct iovec iov[],
 			    unsigned int *out_num, unsigned int *in_num)
 {
-	unsigned int i, head;
+	struct vring_desc *desc;
+	unsigned int i, head, max;
 	u16 last_avail;
 
 	/* Check it isn't doing very strange things with descriptor numbers. */
@@ -678,15 +680,28 @@ static unsigned get_vq_desc(struct virtqueue *vq,
 	/* When we start there are none of either input nor output. */
 	*out_num = *in_num = 0;
 
+	max = vq->vring.num;
+	desc = vq->vring.desc;
 	i = head;
+
+	/* If this is an indirect entry, then this buffer contains a descriptor
+	 * table which we handle as if it's any normal descriptor chain. */
+	if (desc[i].flags & VRING_DESC_F_INDIRECT) {
+		if (desc[i].len % sizeof(struct vring_desc))
+			errx(1, "Invalid size for indirect buffer table");
+
+		max = desc[i].len / sizeof(struct vring_desc);
+		desc = check_pointer(desc[i].addr, desc[i].len);
+		i = 0;
+	}
+
 	do {
 		/* Grab the first descriptor, and check it's OK. */
-		iov[*out_num + *in_num].iov_len = vq->vring.desc[i].len;
+		iov[*out_num + *in_num].iov_len = desc[i].len;
 		iov[*out_num + *in_num].iov_base
-			= check_pointer(vq->vring.desc[i].addr,
-					vq->vring.desc[i].len);
+			= check_pointer(desc[i].addr, desc[i].len);
 		/* If this is an input descriptor, increment that count. */
-		if (vq->vring.desc[i].flags & VRING_DESC_F_WRITE)
+		if (desc[i].flags & VRING_DESC_F_WRITE)
 			(*in_num)++;
 		else {
 			/* If it's an output descriptor, they're all supposed
@@ -697,9 +712,9 @@ static unsigned get_vq_desc(struct virtqueue *vq,
 		}
 
 		/* If we've got too many, that implies a descriptor loop. */
-		if (*out_num + *in_num > vq->vring.num)
+		if (*out_num + *in_num > max)
 			errx(1, "Looped descriptor");
-	} while ((i = next_desc(vq, i)) != vq->vring.num);
+	} while ((i = next_desc(desc, i, max)) != max);
 
 	vq->inflight++;
 	return head;
@@ -1502,6 +1517,8 @@ static void setup_tun_net(char *arg)
 	add_feature(dev, VIRTIO_NET_F_HOST_TSO4);
 	add_feature(dev, VIRTIO_NET_F_HOST_TSO6);
 	add_feature(dev, VIRTIO_NET_F_HOST_ECN);
+	/* we handle indirect ring entries */
+	add_feature(dev, VIRTIO_RING_F_INDIRECT_DESC);
 	set_config(dev, sizeof(conf), &conf);
 
 	/* We don't need the socket any more; setup is done. */
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-05-11 17:10           ` Mark McLoughlin
  2009-05-11 17:11             ` [PATCH 1/3] virtio: teach virtio_has_feature() about transport features Mark McLoughlin
@ 2009-05-12 14:23             ` Rusty Russell
  2009-05-12 14:23             ` Rusty Russell
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Rusty Russell @ 2009-05-12 14:23 UTC (permalink / raw)
  To: Mark McLoughlin; +Cc: dlaor, netdev, Dor Laor, Avi Kivity, virtualization

On Tue, 12 May 2009 02:40:38 am Mark McLoughlin wrote:
> > Blocked from doing the simpler thing, we've decided to go with a layer
> > of indirection.  But the patch is simple and clean, so there's nothing
> > fundamental to object to.
>
> Still have one FIXME in the patch worth looking at - at what point
> should we use an indirect entry rather than consuming N entries?

OK, I've applied these as is.  I'm doing some virtio net benchmarking (under 
lguest); I'll see if I can get a reasonable figure.  I don't think there's an 
obvious right answer; it depends how many more packets are coming as well as 
how many descriptors each will use.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-05-11 17:10           ` Mark McLoughlin
                               ` (3 preceding siblings ...)
  2009-05-17  2:04             ` Rusty Russell
@ 2009-05-17  2:04             ` Rusty Russell
  2009-05-17  6:27               ` Avi Kivity
  2009-05-17  6:27               ` Avi Kivity
  4 siblings, 2 replies; 29+ messages in thread
From: Rusty Russell @ 2009-05-17  2:04 UTC (permalink / raw)
  To: Mark McLoughlin; +Cc: dlaor, netdev, Dor Laor, Avi Kivity, virtualization

On Tue, 12 May 2009 02:40:38 am Mark McLoughlin wrote:
> Still have one FIXME in the patch worth looking at - at what point
> should we use an indirect entry rather than consuming N entries?

Is this overkill?

Rusty.

virtio: use indirect buffers based on demand (heuristic)

virtio_ring uses a ring buffer of descriptors: indirect support allows
a single descriptor to refer to a table of descriptors.  This saves
space in the ring, but requires a kmalloc/kfree.

Rather than try to figure out the right threshold at which to use
indirect buffers, we drop the threshold dynamically when the ring is
under stress.

Note: to stress this, I reduced the ring size to 32 in lguest, and a
1G send reduced the threshold to 9.
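
(A quick userspace replay of the heuristic, assuming the threshold starts at
the ring size and each stress event applies the halve-then-set-low-bit step
from adjust_threshold(); two such events account for the 9 quoted above.)

#include <stdio.h>

int main(void)
{
	unsigned int threshold = 32;	/* the reduced lguest ring size */
	int event = 0;

	while (threshold > 1) {
		threshold /= 2;
		threshold |= 1;		/* never drops below 1 */
		printf("after stress event %d: threshold = %u\n",
		       ++event, threshold);
	}
	return 0;	/* 17, 9, 5, 3, 1 */
}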

Note2: I moved the BUG_ON()s above the indirect test, where they belong
(indirect falls thru on OOM, so the constraints still apply).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -63,6 +63,8 @@ struct vring_virtqueue
 
 	/* Host supports indirect buffers */
 	bool indirect;
+	/* Threshold before we go indirect. */
+	unsigned int indirect_threshold;
 
 	/* Number of free buffers */
 	unsigned int num_free;
@@ -137,6 +141,32 @@ static int vring_add_indirect(struct vri
 	return head;
 }
 
+static void adjust_threshold(struct vring_virtqueue *vq,
+			     unsigned int out, unsigned int in)
+{
+	/* There are really two species of virtqueue, and it matters here.
+	 * If there are no output parts, it's a "normally full" receive queue,
+	 * otherwise it's a "normally empty" send queue. */
+	if (out) {
+		/* Leave threshold unless we're full. */
+		if (out + in < vq->num_free)
+			return;
+	} else {
+		/* Leave threshold unless we're empty. */
+		if (vq->num_free != vq->vring.num)
+			return;
+	}
+
+	/* Never drop threshold below 1 */
+	vq->indirect_threshold /= 2;
+	vq->indirect_threshold |= 1;
+
+	printk("%s %s: indirect threshold %u (%u+%u vs %u)\n", 
+	       dev_name(&vq->vq.vdev->dev),
+	       vq->vq.name, vq->indirect_threshold,
+	       out, in, vq->num_free);
+}
+
 static int vring_add_buf(struct virtqueue *_vq,
 			 struct scatterlist sg[],
 			 unsigned int out,
@@ -149,18 +179,31 @@ static int vring_add_buf(struct virtqueu
 	START_USE(vq);
 
 	BUG_ON(data == NULL);
-
-	/* If the host supports indirect descriptor tables, and we have multiple
-	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && (out + in) > 1 && vq->num_free) {
-		head = vring_add_indirect(vq, sg, out, in);
-		if (head != vq->vring.num)
-			goto add_head;
-	}
-
 	BUG_ON(out + in > vq->vring.num);
 	BUG_ON(out + in == 0);
 
+	/* If the host supports indirect descriptor tables, consider it. */
+	if (vq->indirect) {
+		bool try_indirect;
+
+		/* We tweak the threshold automatically. */
+		adjust_threshold(vq, out, in);
+
+		/* If we can't fit any at all, fall through. */
+		if (vq->num_free == 0)
+			try_indirect = false;
+		else if (out + in > vq->num_free)
+			try_indirect = true;
+		else
+			try_indirect = (out + in > vq->indirect_threshold);
+
+		if (try_indirect) {
+			head = vring_add_indirect(vq, sg, out, in);
+			if (head != vq->vring.num)
+				goto add_head;
+		}
+	}
+
 	if (vq->num_free < out + in) {
 		pr_debug("Can't add buf len %i - avail = %i\n",
 			 out + in, vq->num_free);
@@ -391,6 +434,7 @@ struct virtqueue *vring_new_virtqueue(un
 #endif
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
+	vq->indirect_threshold = num;
 
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-05-11 17:10           ` Mark McLoughlin
                               ` (2 preceding siblings ...)
  2009-05-12 14:23             ` Rusty Russell
@ 2009-05-17  2:04             ` Rusty Russell
  2009-05-17  2:04             ` Rusty Russell
  4 siblings, 0 replies; 29+ messages in thread
From: Rusty Russell @ 2009-05-17  2:04 UTC (permalink / raw)
  To: Mark McLoughlin; +Cc: netdev, Dor Laor, Avi Kivity, virtualization

On Tue, 12 May 2009 02:40:38 am Mark McLoughlin wrote:
> Still have one FIXME in the patch worth looking at - at what point
> should we use an indirect entry rather than consuming N entries?

Is this overkill?

Rusty.

virtio: use indirect buffers based on demand (heuristic)

virtio_ring uses a ring buffer of descriptors: indirect support allows
a single descriptor to refer to a table of descriptors.  This saves
space in the ring, but requires a kmalloc/kfree.

Rather than try to figure out what the right threshold at which to use
indirect buffers, we drop the threshold dynamically when the ring is
under stress.

Note: to stress this, I reduced the ring size to 32 in lguest, and a
1G send reduced the threshold to 9.

Note2: I moved the BUG_ON()s above the indirect test, where they belong
(indirect falls thru on OOM, so the constraints still apply).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -63,6 +63,8 @@ struct vring_virtqueue
 
 	/* Host supports indirect buffers */
 	bool indirect;
+	/* Threshold before we go indirect. */
+	unsigned int indirect_threshold;
 
 	/* Number of free buffers */
 	unsigned int num_free;
@@ -137,6 +141,32 @@ static int vring_add_indirect(struct vri
 	return head;
 }
 
+static void adjust_threshold(struct vring_virtqueue *vq,
+			     unsigned int out, unsigned int in)
+{
+	/* There are really two species of virtqueue, and it matters here.
+	 * If there are no output parts, it's a "normally full" receive queue,
+	 * otherwise it's a "normally empty" send queue. */
+	if (out) {
+		/* Leave threshold unless we're full. */
+		if (out + in < vq->num_free)
+			return;
+	} else {
+		/* Leave threshold unless we're empty. */
+		if (vq->num_free != vq->vring.num)
+			return;
+	}
+
+	/* Never drop threshold below 1 */
+	vq->indirect_threshold /= 2;
+	vq->indirect_threshold |= 1;
+
+	printk("%s %s: indirect threshold %u (%u+%u vs %u)\n", 
+	       dev_name(&vq->vq.vdev->dev),
+	       vq->vq.name, vq->indirect_threshold,
+	       out, in, vq->num_free);
+}
+
 static int vring_add_buf(struct virtqueue *_vq,
 			 struct scatterlist sg[],
 			 unsigned int out,
@@ -149,18 +179,31 @@ static int vring_add_buf(struct virtqueu
 	START_USE(vq);
 
 	BUG_ON(data == NULL);
-
-	/* If the host supports indirect descriptor tables, and we have multiple
-	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && (out + in) > 1 && vq->num_free) {
-		head = vring_add_indirect(vq, sg, out, in);
-		if (head != vq->vring.num)
-			goto add_head;
-	}
-
 	BUG_ON(out + in > vq->vring.num);
 	BUG_ON(out + in == 0);
 
+	/* If the host supports indirect descriptor tables, consider it. */
+	if (vq->indirect) {
+		bool try_indirect;
+
+		/* We tweak the threshold automatically. */
+		adjust_threshold(vq, out, in);
+
+		/* If we can't fit any at all, fall through. */
+		if (vq->num_free == 0)
+			try_indirect = false;
+		else if (out + in > vq->num_free)
+			try_indirect = true;
+		else
+			try_indirect = (out + in > vq->indirect_threshold);
+
+		if (try_indirect) {
+			head = vring_add_indirect(vq, sg, out, in);
+			if (head != vq->vring.num)
+				goto add_head;
+		}
+	}
+
 	if (vq->num_free < out + in) {
 		pr_debug("Can't add buf len %i - avail = %i\n",
 			 out + in, vq->num_free);
@@ -391,6 +434,7 @@ struct virtqueue *vring_new_virtqueue(un
 #endif
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
+	vq->indirect_threshold = num;
 
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback)
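
For reference, the decay rule above (halve the threshold, then OR with 1)
walks 32 -> 17 -> 9 -> 5 -> 3 -> 1 on a 32-entry ring, which is consistent
with the threshold of 9 seen after two reductions in the lguest test.  A
stand-alone sketch of just that rule, outside the kernel (the helper name
is illustrative, not part of the patch):

	/* Halve the threshold but never let it drop below 1: OR-ing
	 * with 1 keeps the low bit set, so 1 is a fixed point. */
	static unsigned int decay_threshold(unsigned int threshold)
	{
		threshold /= 2;
		threshold |= 1;
		return threshold;
	}

	/* Starting from 32: 17, 9, 5, 3, 1, 1, ... */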

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-05-17  2:04             ` Rusty Russell
  2009-05-17  6:27               ` Avi Kivity
@ 2009-05-17  6:27               ` Avi Kivity
  2009-05-17 14:16                 ` Rusty Russell
  2009-05-17 14:16                 ` Rusty Russell
  1 sibling, 2 replies; 29+ messages in thread
From: Avi Kivity @ 2009-05-17  6:27 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Mark McLoughlin, dlaor, netdev, Dor Laor, virtualization

Rusty Russell wrote:
>  
> +static void adjust_threshold(struct vring_virtqueue *vq,
> +			     unsigned int out, unsigned int in)
> +{
> +	/* There are really two species of virtqueue, and it matters here.
> +	 * If there are no output parts, it's a "normally full" receive queue,
> +	 * otherwise it's a "normally empty" send queue. */
>   

This comment is true for networking, but not for block. ++overkill with 
a ->adjust_threshold op.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-05-17  6:27               ` Avi Kivity
  2009-05-17 14:16                 ` Rusty Russell
@ 2009-05-17 14:16                 ` Rusty Russell
  2009-05-17 15:05                   ` Avi Kivity
  2009-05-17 15:05                   ` Avi Kivity
  1 sibling, 2 replies; 29+ messages in thread
From: Rusty Russell @ 2009-05-17 14:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Mark McLoughlin, dlaor, netdev, Dor Laor, virtualization

On Sun, 17 May 2009 03:57:01 pm Avi Kivity wrote:
> Rusty Russell wrote:
> > +static void adjust_threshold(struct vring_virtqueue *vq,
> > +			     unsigned int out, unsigned int in)
> > +{
> > +	/* There are really two species of virtqueue, and it matters here.
> > +	 * If there are no output parts, it's a "normally full" receive queue,
> > +	 * otherwise it's a "normally empty" send queue. */
>
> This comment is true for networking, but not for block. ++overkill with
> a ->adjust_threshold op.

No, it's true for block.  It has output parts, so we should reduce threshold 
when it's full.  Network recvq is an example which should reduce threshold 
when it's empty.

->adjust_threshold is better as an arg to vring_new_virtqueue, but it's still 
not clear what the answer would be.

Rusty.
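
One possible shape for such an argument, purely as a sketch: the hook name,
its signature, and the idea of passing it at queue-creation time are all
hypothetical here, not something the patches define.

	/* Hypothetical per-queue policy hook a driver could supply. */
	typedef unsigned int (*vring_threshold_fn)(unsigned int threshold,
						   unsigned int out,
						   unsigned int in,
						   unsigned int num_free,
						   unsigned int num);

	/* Default policy, mirroring the heuristic in the patch above:
	 * only decay the threshold when the queue looks stressed. */
	static unsigned int default_threshold(unsigned int threshold,
					      unsigned int out,
					      unsigned int in,
					      unsigned int num_free,
					      unsigned int num)
	{
		/* Send-style queue (has out parts): stressed when full.
		 * Receive-style queue (in only): stressed when empty. */
		if (out ? (out + in < num_free) : (num_free != num))
			return threshold;
		return (threshold / 2) | 1;	/* halve, floor of 1 */
	}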


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-05-17 14:16                 ` Rusty Russell
@ 2009-05-17 15:05                   ` Avi Kivity
  2009-05-19  8:15                     ` Rusty Russell
  2009-05-19  8:15                     ` Rusty Russell
  2009-05-17 15:05                   ` Avi Kivity
  1 sibling, 2 replies; 29+ messages in thread
From: Avi Kivity @ 2009-05-17 15:05 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Mark McLoughlin, dlaor, netdev, Dor Laor, virtualization

Rusty Russell wrote:
> On Sun, 17 May 2009 03:57:01 pm Avi Kivity wrote:
>   
>> Rusty Russell wrote:
>>     
>>> +static void adjust_threshold(struct vring_virtqueue *vq,
>>> +			     unsigned int out, unsigned int in)
>>> +{
>>> +	/* There are really two species of virtqueue, and it matters here.
>>> +	 * If there are no output parts, it's a "normally full" receive queue,
>>> +	 * otherwise it's a "normally empty" send queue. */
>>>       
>> This comment is true for networking, but not for block. ++overkill with
>> a ->adjust_threshold op.
>>     
>
> No, it's true for block.  It has output parts, so we should reduce threshold 
> when it's full.  Network recvq is an example which should reduce threshold 
> when it's empty.
>   

You mean the header that contains the sector number?  It's a little 
incidental, but I guess it works.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
  2009-05-17 15:05                   ` Avi Kivity
  2009-05-19  8:15                     ` Rusty Russell
@ 2009-05-19  8:15                     ` Rusty Russell
  1 sibling, 0 replies; 29+ messages in thread
From: Rusty Russell @ 2009-05-19  8:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Mark McLoughlin, dlaor, netdev, Dor Laor, virtualization

On Mon, 18 May 2009 12:35:39 am Avi Kivity wrote:
> Rusty Russell wrote:
> > No, it's true for block.  It has output parts, so we should reduce
> > threshold when it's full.  Network recvq is an example which should
> > reduce threshold when it's empty.
>
> You mean the header that contains the sector number?  It's a little
> incidental, but I guess it works.

Yes, the one that breaks the rule is the randomness device.  You really do
just throw buffers with no metadata in them and the request is implied.
If/when we care, we can add a hint flag.

Cheers,
Rusty.
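
To make that classification concrete, a tiny illustrative helper (the
function is mine, not part of the patches; the device examples come from
this thread):

	/* Illustrative only: the queue "species" test the heuristic relies on. */
	static bool is_send_style_queue(unsigned int out_parts)
	{
		/* Any out part (e.g. the virtio_blk request header, or a
		 * virtio_net tx buffer) marks a normally-empty send queue;
		 * in-only queues (virtio_net rx, the randomness device) are
		 * normally-full receive queues. */
		return out_parts != 0;
	}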

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2009-05-19  8:15 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-18 17:10 [PATCH 0/3] virtio: indirect ring entries Mark McLoughlin
2008-12-18 17:10 ` [PATCH 1/3] virtio: teach virtio_has_feature() about transport features Mark McLoughlin
2008-12-18 17:10   ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Mark McLoughlin
2008-12-18 17:10     ` [PATCH 3/3] lguest: add support for indirect ring entries Mark McLoughlin
2008-12-20 11:38     ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Ingo Oeser
2008-12-20 11:38     ` Ingo Oeser
2008-12-22 10:17       ` Mark McLoughlin
2008-12-22 10:17       ` Mark McLoughlin
2009-04-21 12:59     ` Mark McLoughlin
2009-04-21 12:59     ` Mark McLoughlin
2009-04-27  7:43       ` Dor Laor
2009-05-04  2:19         ` Rusty Russell
2009-05-11 17:10           ` Mark McLoughlin
2009-05-11 17:10           ` Mark McLoughlin
2009-05-11 17:11             ` [PATCH 1/3] virtio: teach virtio_has_feature() about transport features Mark McLoughlin
2009-05-11 17:11               ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Mark McLoughlin
2009-05-11 17:11                 ` [PATCH 3/3] lguest: add support for indirect ring entries Mark McLoughlin
2009-05-12 14:23             ` [PATCH 2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) Rusty Russell
2009-05-12 14:23             ` Rusty Russell
2009-05-17  2:04             ` Rusty Russell
2009-05-17  2:04             ` Rusty Russell
2009-05-17  6:27               ` Avi Kivity
2009-05-17  6:27               ` Avi Kivity
2009-05-17 14:16                 ` Rusty Russell
2009-05-17 14:16                 ` Rusty Russell
2009-05-17 15:05                   ` Avi Kivity
2009-05-19  8:15                     ` Rusty Russell
2009-05-19  8:15                     ` Rusty Russell
2009-05-17 15:05                   ` Avi Kivity
