* [RFC PATCH v2] ptr_ring: linked list fallback
@ 2018-02-26  1:17 Michael S. Tsirkin
  2018-02-26  3:15 ` Jason Wang
  2018-02-27 17:53 ` Eric Dumazet
  0 siblings, 2 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2018-02-26  1:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: John Fastabend, netdev, Jason Wang, David Miller

So pointer rings work fine, but they have a problem: make them too small
and not enough entries fit.  Make them too large and you start flushing
your cache and running out of memory.

This is a new idea of mine: a ring backed by a linked list. Once you run
out of ring entries, instead of a drop you fall back on a list with a
common lock.

Should work well for the case where the ring is typically sized
correctly, but will help address the fact that some users try to set e.g.
the tx queue length to 1000000.

In other words, the idea is that if a user sets a really huge TX queue
length, we allocate a ptr_ring which is smaller, and use the backup
linked list when necessary to provide the requested TX queue length
legitimately.

My hope is that this will move us closer to a direction where e.g. fq_codel
can use ptr rings without locking at all.  The API is still very rough, and
I really need to take a hard look at lock nesting.

Compiled only, sending for early feedback/flames.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

changes from v1:
- added clarifications by DaveM in the commit log
- build fixes

 include/linux/ptr_ring.h | 64 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 61 insertions(+), 3 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index d72b2e7..8aa8882 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -31,11 +31,18 @@
 #include <asm/errno.h>
 #endif
 
+/* entries must start with the following structure */
+struct plist {
+	struct plist *next;
+	struct plist *last; /* only valid in the 1st entry */
+};
+
 struct ptr_ring {
 	int producer ____cacheline_aligned_in_smp;
 	spinlock_t producer_lock;
 	int consumer_head ____cacheline_aligned_in_smp; /* next valid entry */
 	int consumer_tail; /* next entry to invalidate */
+	struct plist *consumer_list;
 	spinlock_t consumer_lock;
 	/* Shared consumer/producer data */
 	/* Read-only by both the producer and the consumer */
@@ -120,10 +127,40 @@ static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
 }
 
 /*
- * Note: resize (below) nests producer lock within consumer lock, so if you
- * consume in interrupt or BH context, you must disable interrupts/BH when
- * calling this.
+ * Note: resize API with the _fallback should be used when calling this.
  */
+static inline int ptr_ring_produce_fallback(struct ptr_ring *r, void *ptr)
+{
+	int ret;
+	unsigned long flags;
+	struct plist *p = ptr;
+
+	p->next = NULL;
+	p->last = p;
+
+	spin_lock_irqsave(&r->producer_lock, flags);
+	ret = __ptr_ring_produce(r, ptr);
+	if (ret) {
+		spin_lock(&r->consumer_lock);
+		ret = __ptr_ring_produce(r, ptr);
+		if (ret) {
+			int producer = r->producer ? r->producer - 1 :
+				r->size - 1;
+			struct plist *first = r->queue[producer];
+
+			BUG_ON(!first);
+
+			first->last->next = p;
+			first->last = p;
+		}
+		spin_unlock(&r->consumer_lock);
+	}
+
+	spin_unlock_irqrestore(&r->producer_lock, flags);
+
+	return ret;
+}
+
 static inline int ptr_ring_produce(struct ptr_ring *r, void *ptr)
 {
 	int ret;
@@ -135,6 +172,7 @@ static inline int ptr_ring_produce(struct ptr_ring *r, void *ptr)
 	return ret;
 }
 
+
 static inline int ptr_ring_produce_irq(struct ptr_ring *r, void *ptr)
 {
 	int ret;
@@ -359,6 +397,26 @@ static inline void *ptr_ring_consume_bh(struct ptr_ring *r)
 	return ptr;
 }
 
+static inline void *ptr_ring_consume_fallback(struct ptr_ring *r)
+{
+	unsigned long flags;
+	struct plist *ptr;
+
+	spin_lock_irqsave(&r->consumer_lock, flags);
+	if (r->consumer_list) {
+		ptr = r->consumer_list;
+		r->consumer_list = ptr->next;
+	} else {
+		ptr = __ptr_ring_consume(r);
+		if (ptr) {
+			r->consumer_list = ptr->next;
+		}
+	}
+	spin_unlock_irqrestore(&r->consumer_lock, flags);
+
+	return ptr;
+}
+
 static inline int ptr_ring_consume_batched(struct ptr_ring *r,
 					   void **array, int n)
 {
-- 
MST
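
A minimal usage sketch of the proposed API (not part of the patch; the
demo_entry type and helper names below are purely illustrative, and the only
requirement relied on is the one the patch states: queued objects must begin
with struct plist):

#include <linux/ptr_ring.h>

/* Illustrative only: an entry type whose first member is the required plist. */
struct demo_entry {
	struct plist list;	/* must be the first field of every queued object */
	void *payload;
};

/* Producer side: the ring is tried under the producer lock; on overflow the
 * entry is chained behind the newest ring slot under the consumer lock. */
static int demo_enqueue(struct ptr_ring *r, struct demo_entry *e)
{
	return ptr_ring_produce_fallback(r, e);
}

/* Consumer side: entries previously chained onto consumer_list are drained
 * before the next ring slot is consumed. */
static struct demo_entry *demo_dequeue(struct ptr_ring *r)
{
	return ptr_ring_consume_fallback(r);
}

Note that, as posted, ptr_ring_produce_fallback() still returns the ring's
error code even when the entry ends up chained onto the fallback list, so the
exact return-value convention is one of the rough edges the RFC mentions.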


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-26  1:17 [RFC PATCH v2] ptr_ring: linked list fallback Michael S. Tsirkin
@ 2018-02-26  3:15 ` Jason Wang
  2018-02-26 20:34   ` Michael S. Tsirkin
  2018-02-27 17:53 ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: Jason Wang @ 2018-02-26  3:15 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel; +Cc: John Fastabend, netdev, David Miller



On 2018-02-26 09:17, Michael S. Tsirkin wrote:
> So pointer rings work fine, but they have a problem: make them too small
> and not enough entries fit.  Make them too large and you start flushing
> your cache and running out of memory.
>
> This is a new idea of mine: a ring backed by a linked list. Once you run
> out of ring entries, instead of a drop you fall back on a list with a
> common lock.
>
> Should work well for the case where the ring is typically sized
> correctly, but will help address the fact that some user try to set e.g.
> tx queue length to 1000000.
>
> In other words, the idea is that if a user sets a really huge TX queue
> length, we allocate a ptr_ring which is smaller, and use the backup
> linked list when necessary to provide the requested TX queue length
> legitimately.
>
> My hope this will move us closer to direction where e.g. fw codel can
> use ptr rings without locking at all.  The API is still very rough, and
> I really need to take a hard look at lock nesting.
>
> Compiled only, sending for early feedback/flames.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> changes from v1:
> - added clarifications by DaveM in the commit log
> - build fixes
>
>   include/linux/ptr_ring.h | 64 +++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 61 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index d72b2e7..8aa8882 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -31,11 +31,18 @@
>   #include <asm/errno.h>
>   #endif
>   
> +/* entries must start with the following structure */
> +struct plist {
> +	struct plist *next;
> +	struct plist *last; /* only valid in the 1st entry */
> +};

So I wonder whether or not it's better to do this in e.g. the skb_array 
implementation. Then it can use its own prev/next fields.

> +
>   struct ptr_ring {
>   	int producer ____cacheline_aligned_in_smp;
>   	spinlock_t producer_lock;
>   	int consumer_head ____cacheline_aligned_in_smp; /* next valid entry */
>   	int consumer_tail; /* next entry to invalidate */
> +	struct plist *consumer_list;
>   	spinlock_t consumer_lock;
>   	/* Shared consumer/producer data */
>   	/* Read-only by both the producer and the consumer */
> @@ -120,10 +127,40 @@ static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
>   }
>   
>   /*
> - * Note: resize (below) nests producer lock within consumer lock, so if you
> - * consume in interrupt or BH context, you must disable interrupts/BH when
> - * calling this.
> + * Note: resize API with the _fallback should be used when calling this.
>    */
> +static inline int ptr_ring_produce_fallback(struct ptr_ring *r, void *ptr)
> +{
> +	int ret;
> +	unsigned long flags;
> +	struct plist *p = ptr;
> +
> +	p->next = NULL;
> +	p->last = p;
> +
> +	spin_lock_irqsave(&r->producer_lock, flags);
> +	ret = __ptr_ring_produce(r, ptr);
> +	if (ret) {
> +		spin_lock(&r->consumer_lock);
> +		ret = __ptr_ring_produce(r, ptr);
> +		if (ret) {
> +			int producer = r->producer ? r->producer - 1 :
> +				r->size - 1;
> +			struct plist *first = r->queue[producer];
> +
> +			BUG_ON(!first);
> +
> +			first->last->next = p;
> +			first->last = p;

I believe we still need a limitation on the total size of the queue.

Thanks

> +		}
> +		spin_unlock(&r->consumer_lock);
> +	}
> +
> +	spin_unlock_irqrestore(&r->producer_lock, flags);
> +
> +	return ret;
> +}
> +
>   static inline int ptr_ring_produce(struct ptr_ring *r, void *ptr)
>   {
>   	int ret;
> @@ -135,6 +172,7 @@ static inline int ptr_ring_produce(struct ptr_ring *r, void *ptr)
>   	return ret;
>   }
>   
> +
>   static inline int ptr_ring_produce_irq(struct ptr_ring *r, void *ptr)
>   {
>   	int ret;
> @@ -359,6 +397,26 @@ static inline void *ptr_ring_consume_bh(struct ptr_ring *r)
>   	return ptr;
>   }
>   
> +static inline void *ptr_ring_consume_fallback(struct ptr_ring *r)
> +{
> +	unsigned long flags;
> +	struct plist *ptr;
> +
> +	spin_lock_irqsave(&r->consumer_lock, flags);
> +	if (r->consumer_list) {
> +		ptr = r->consumer_list;
> +		r->consumer_list = ptr->next;
> +	} else {
> +		ptr = __ptr_ring_consume(r);
> +		if (ptr) {
> +			r->consumer_list = ptr->next;
> +		}
> +	}
> +	spin_unlock_irqrestore(&r->consumer_lock, flags);
> +
> +	return ptr;
> +}
> +
>   static inline int ptr_ring_consume_batched(struct ptr_ring *r,
>   					   void **array, int n)
>   {


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-26  3:15 ` Jason Wang
@ 2018-02-26 20:34   ` Michael S. Tsirkin
  2018-02-27  2:29     ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2018-02-26 20:34 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, John Fastabend, netdev, David Miller

On Mon, Feb 26, 2018 at 11:15:42AM +0800, Jason Wang wrote:
> 
> 
> On 2018年02月26日 09:17, Michael S. Tsirkin wrote:
> > So pointer rings work fine, but they have a problem: make them too small
> > and not enough entries fit.  Make them too large and you start flushing
> > your cache and running out of memory.
> > 
> > This is a new idea of mine: a ring backed by a linked list. Once you run
> > out of ring entries, instead of a drop you fall back on a list with a
> > common lock.
> > 
> > Should work well for the case where the ring is typically sized
> > correctly, but will help address the fact that some user try to set e.g.
> > tx queue length to 1000000.
> > 
> > In other words, the idea is that if a user sets a really huge TX queue
> > length, we allocate a ptr_ring which is smaller, and use the backup
> > linked list when necessary to provide the requested TX queue length
> > legitimately.
> > 
> > My hope this will move us closer to direction where e.g. fw codel can
> > use ptr rings without locking at all.  The API is still very rough, and
> > I really need to take a hard look at lock nesting.
> > 
> > Compiled only, sending for early feedback/flames.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > changes from v1:
> > - added clarifications by DaveM in the commit log
> > - build fixes
> > 
> >   include/linux/ptr_ring.h | 64 +++++++++++++++++++++++++++++++++++++++++++++---
> >   1 file changed, 61 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index d72b2e7..8aa8882 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -31,11 +31,18 @@
> >   #include <asm/errno.h>
> >   #endif
> > +/* entries must start with the following structure */
> > +struct plist {
> > +	struct plist *next;
> > +	struct plist *last; /* only valid in the 1st entry */
> > +};
> 
> So I wonder whether or not it's better to do this in e.g skb_array
> implementation. Then it can use its own prev/next field.

XDP uses ptr ring directly, doesn't it?

> > +
> >   struct ptr_ring {
> >   	int producer ____cacheline_aligned_in_smp;
> >   	spinlock_t producer_lock;
> >   	int consumer_head ____cacheline_aligned_in_smp; /* next valid entry */
> >   	int consumer_tail; /* next entry to invalidate */
> > +	struct plist *consumer_list;
> >   	spinlock_t consumer_lock;
> >   	/* Shared consumer/producer data */
> >   	/* Read-only by both the producer and the consumer */
> > @@ -120,10 +127,40 @@ static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
> >   }
> >   /*
> > - * Note: resize (below) nests producer lock within consumer lock, so if you
> > - * consume in interrupt or BH context, you must disable interrupts/BH when
> > - * calling this.
> > + * Note: resize API with the _fallback should be used when calling this.
> >    */
> > +static inline int ptr_ring_produce_fallback(struct ptr_ring *r, void *ptr)
> > +{
> > +	int ret;
> > +	unsigned long flags;
> > +	struct plist *p = ptr;
> > +
> > +	p->next = NULL;
> > +	p->last = p;
> > +
> > +	spin_lock_irqsave(&r->producer_lock, flags);
> > +	ret = __ptr_ring_produce(r, ptr);
> > +	if (ret) {
> > +		spin_lock(&r->consumer_lock);
> > +		ret = __ptr_ring_produce(r, ptr);
> > +		if (ret) {
> > +			int producer = r->producer ? r->producer - 1 :
> > +				r->size - 1;
> > +			struct plist *first = r->queue[producer];
> > +
> > +			BUG_ON(!first);
> > +
> > +			first->last->next = p;
> > +			first->last = p;
> 
> I believe we still need a limitation on the total size of the queue.

OK, I'll implement that - it's pretty easy to do.
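
For illustration, one possible shape of such a cap (a sketch only, not an
actual follow-up patch; the list_len and list_max fields are hypothetical
additions to struct ptr_ring, protected by the consumer lock):

/* Sketch: bound the overflow list.  Assumes hypothetical fields
 * int list_len, list_max; were added to struct ptr_ring.
 */
static inline int ptr_ring_produce_fallback_capped(struct ptr_ring *r, void *ptr)
{
	int ret;
	unsigned long flags;
	struct plist *p = ptr;

	p->next = NULL;
	p->last = p;

	spin_lock_irqsave(&r->producer_lock, flags);
	ret = __ptr_ring_produce(r, ptr);
	if (ret) {
		spin_lock(&r->consumer_lock);
		ret = __ptr_ring_produce(r, ptr);
		if (ret && r->list_len < r->list_max) {
			int producer = r->producer ? r->producer - 1 :
				r->size - 1;
			struct plist *first = r->queue[producer];

			BUG_ON(!first);

			first->last->next = p;
			first->last = p;
			r->list_len++;
			ret = 0; /* queued on the overflow list */
		}
		spin_unlock(&r->consumer_lock);
	}
	spin_unlock_irqrestore(&r->producer_lock, flags);

	return ret;
}

The consume side would then decrement list_len under the consumer lock
whenever it pulls an entry off consumer_list.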

> Thanks
> 
> > +		}
> > +		spin_unlock(&r->consumer_lock);
> > +	}
> > +
> > +	spin_unlock_irqrestore(&r->producer_lock, flags);
> > +
> > +	return ret;
> > +}
> > +
> >   static inline int ptr_ring_produce(struct ptr_ring *r, void *ptr)
> >   {
> >   	int ret;
> > @@ -135,6 +172,7 @@ static inline int ptr_ring_produce(struct ptr_ring *r, void *ptr)
> >   	return ret;
> >   }
> > +
> >   static inline int ptr_ring_produce_irq(struct ptr_ring *r, void *ptr)
> >   {
> >   	int ret;
> > @@ -359,6 +397,26 @@ static inline void *ptr_ring_consume_bh(struct ptr_ring *r)
> >   	return ptr;
> >   }
> > +static inline void *ptr_ring_consume_fallback(struct ptr_ring *r)
> > +{
> > +	unsigned long flags;
> > +	struct plist *ptr;
> > +
> > +	spin_lock_irqsave(&r->consumer_lock, flags);
> > +	if (r->consumer_list) {
> > +		ptr = r->consumer_list;
> > +		r->consumer_list = ptr->next;
> > +	} else {
> > +		ptr = __ptr_ring_consume(r);
> > +		if (ptr) {
> > +			r->consumer_list = ptr->next;
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&r->consumer_lock, flags);
> > +
> > +	return ptr;
> > +}
> > +
> >   static inline int ptr_ring_consume_batched(struct ptr_ring *r,
> >   					   void **array, int n)
> >   {


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-26 20:34   ` Michael S. Tsirkin
@ 2018-02-27  2:29     ` Jason Wang
  2018-02-27 17:12       ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2018-02-27  2:29 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, John Fastabend, netdev, David Miller



On 2018-02-27 04:34, Michael S. Tsirkin wrote:
> On Mon, Feb 26, 2018 at 11:15:42AM +0800, Jason Wang wrote:
>> On 2018年02月26日 09:17, Michael S. Tsirkin wrote:
>>> So pointer rings work fine, but they have a problem: make them too small
>>> and not enough entries fit.  Make them too large and you start flushing
>>> your cache and running out of memory.
>>>
>>> This is a new idea of mine: a ring backed by a linked list. Once you run
>>> out of ring entries, instead of a drop you fall back on a list with a
>>> common lock.
>>>
>>> Should work well for the case where the ring is typically sized
>>> correctly, but will help address the fact that some user try to set e.g.
>>> tx queue length to 1000000.
>>>
>>> In other words, the idea is that if a user sets a really huge TX queue
>>> length, we allocate a ptr_ring which is smaller, and use the backup
>>> linked list when necessary to provide the requested TX queue length
>>> legitimately.
>>>
>>> My hope this will move us closer to direction where e.g. fw codel can
>>> use ptr rings without locking at all.  The API is still very rough, and
>>> I really need to take a hard look at lock nesting.
>>>
>>> Compiled only, sending for early feedback/flames.
>>>
>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>> ---
>>>
>>> changes from v1:
>>> - added clarifications by DaveM in the commit log
>>> - build fixes
>>>
>>>    include/linux/ptr_ring.h | 64 +++++++++++++++++++++++++++++++++++++++++++++---
>>>    1 file changed, 61 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>>> index d72b2e7..8aa8882 100644
>>> --- a/include/linux/ptr_ring.h
>>> +++ b/include/linux/ptr_ring.h
>>> @@ -31,11 +31,18 @@
>>>    #include <asm/errno.h>
>>>    #endif
>>> +/* entries must start with the following structure */
>>> +struct plist {
>>> +	struct plist *next;
>>> +	struct plist *last; /* only valid in the 1st entry */
>>> +};
>> So I wonder whether or not it's better to do this in e.g skb_array
>> implementation. Then it can use its own prev/next field.
> XDP uses ptr ring directly, doesn't it?
>

Well, I believe the main user for this is qdisc, which uses skb_array. And 
we cannot use what is implemented in this patch directly for sk_buff 
without some changes to the data structure.

For XDP, we need to embed plist in struct xdp_buff too, so it looks to 
me that the better approach is to have separate functions for ptr_ring 
and skb_array.

Thanks


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-27  2:29     ` Jason Wang
@ 2018-02-27 17:12       ` Michael S. Tsirkin
  2018-02-28  3:28         ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2018-02-27 17:12 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, John Fastabend, netdev, David Miller

On Tue, Feb 27, 2018 at 10:29:26AM +0800, Jason Wang wrote:
> 
> 
> On 2018年02月27日 04:34, Michael S. Tsirkin wrote:
> > On Mon, Feb 26, 2018 at 11:15:42AM +0800, Jason Wang wrote:
> > > On 2018年02月26日 09:17, Michael S. Tsirkin wrote:
> > > > So pointer rings work fine, but they have a problem: make them too small
> > > > and not enough entries fit.  Make them too large and you start flushing
> > > > your cache and running out of memory.
> > > > 
> > > > This is a new idea of mine: a ring backed by a linked list. Once you run
> > > > out of ring entries, instead of a drop you fall back on a list with a
> > > > common lock.
> > > > 
> > > > Should work well for the case where the ring is typically sized
> > > > correctly, but will help address the fact that some user try to set e.g.
> > > > tx queue length to 1000000.
> > > > 
> > > > In other words, the idea is that if a user sets a really huge TX queue
> > > > length, we allocate a ptr_ring which is smaller, and use the backup
> > > > linked list when necessary to provide the requested TX queue length
> > > > legitimately.
> > > > 
> > > > My hope this will move us closer to direction where e.g. fw codel can
> > > > use ptr rings without locking at all.  The API is still very rough, and
> > > > I really need to take a hard look at lock nesting.
> > > > 
> > > > Compiled only, sending for early feedback/flames.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> > > > ---
> > > > 
> > > > changes from v1:
> > > > - added clarifications by DaveM in the commit log
> > > > - build fixes
> > > > 
> > > >    include/linux/ptr_ring.h | 64 +++++++++++++++++++++++++++++++++++++++++++++---
> > > >    1 file changed, 61 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > > index d72b2e7..8aa8882 100644
> > > > --- a/include/linux/ptr_ring.h
> > > > +++ b/include/linux/ptr_ring.h
> > > > @@ -31,11 +31,18 @@
> > > >    #include <asm/errno.h>
> > > >    #endif
> > > > +/* entries must start with the following structure */
> > > > +struct plist {
> > > > +	struct plist *next;
> > > > +	struct plist *last; /* only valid in the 1st entry */
> > > > +};
> > > So I wonder whether or not it's better to do this in e.g skb_array
> > > implementation. Then it can use its own prev/next field.
> > XDP uses ptr ring directly, doesn't it?
> > 
> 
> Well I believe the main user for this is qdisc, which use skb array. And we
> can not use what implemented in this patch directly for sk_buff without some
> changes on the data structure.

Why not? skb has next and prev pointers as its first two fields:

struct sk_buff {
        union { 
                struct {
                        /* These two members must be first. */
                        struct sk_buff          *next;
                        struct sk_buff          *prev;
...
}

so it's just a question of casting to struct plist.
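
To make the casting idea concrete, a hypothetical skb_array-side helper might
look like this (the name is made up; it assumes only that skb_array wraps a
ptr_ring in its .ring member and that next/prev stay the first two fields of
sk_buff):

/* Hypothetical helper, not in the patch: treat the queued skb as a plist. */
static inline int skb_array_produce_fallback(struct skb_array *a,
					     struct sk_buff *skb)
{
	BUILD_BUG_ON(offsetof(struct sk_buff, next) !=
		     offsetof(struct plist, next));
	BUILD_BUG_ON(offsetof(struct sk_buff, prev) !=
		     offsetof(struct plist, last));
	return ptr_ring_produce_fallback(&a->ring, skb);
}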

Or we can add plist to a union:


struct sk_buff {
        union { 
                struct {
                        /* These two members must be first. */
                        struct sk_buff          *next;
                        struct sk_buff          *prev;
                        
                        union { 
                                struct net_device       *dev;
                                /* Some protocols might use this space to store information,
                                 * while device pointer would be NULL.
                                 * UDP receive path is one user.
                                 */
                                unsigned long           dev_scratch;
                        };
                };
                struct rb_node  rbnode; /* used in netem & tcp stack */
+		struct plist plist; /* For use with ptr_ring */
        };



> For XDP, we need to embed plist in struct xdp_buff too,

Right - that's pretty straightforward, isn't it?

> so it looks to me
> that the better approach is to have separated function for ptr ring and skb
> array.
> 
> Thanks


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-26  1:17 [RFC PATCH v2] ptr_ring: linked list fallback Michael S. Tsirkin
  2018-02-26  3:15 ` Jason Wang
@ 2018-02-27 17:53 ` Eric Dumazet
  2018-02-27 19:35   ` Michael S. Tsirkin
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2018-02-27 17:53 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: John Fastabend, netdev, Jason Wang, David Miller

On Mon, 2018-02-26 at 03:17 +0200, Michael S. Tsirkin wrote:
> So pointer rings work fine, but they have a problem: make them too small
> and not enough entries fit.  Make them too large and you start flushing
> your cache and running out of memory.
> 
> This is a new idea of mine: a ring backed by a linked list. Once you run
> out of ring entries, instead of a drop you fall back on a list with a
> common lock.
> 
> Should work well for the case where the ring is typically sized
> correctly, but will help address the fact that some user try to set e.g.
> tx queue length to 1000000.
> 
> In other words, the idea is that if a user sets a really huge TX queue
> length, we allocate a ptr_ring which is smaller, and use the backup
> linked list when necessary to provide the requested TX queue length
> legitimately.
> 
> My hope this will move us closer to direction where e.g. fw codel can
> use ptr rings without locking at all.  The API is still very rough, and
> I really need to take a hard look at lock nesting.
> 
> Compiled only, sending for early feedback/flames.

Okay I'll bite then ;)

High performance will be achieved only if nothing is added to the (fallback)
list.

Under stress, list operations will be the bottleneck: allowing XXXX
items on the list probably wastes cpu caches by always dequeueing
cold objects.

Since systems need to be provisioned to cope with the stress, why
try to optimize the light-load case when we know the CPU has plenty of
cycles to use?

If something uses ptr_ring and needs a list for the fallback, it might
simply go back to the old-and-simple list stuff.

Note that this old-and-simple stuff can be greatly optimized with the
use of two lists, as was shown in the UDP stack lately, to decouple
producer and consumer (batching effects).
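
(For reference, a very rough sketch of that two-list pattern — none of this is
taken from the UDP code; the names and fields are illustrative. Producers
append to an input list under a lock, and the single consumer only takes the
lock to splice that list over when its private list runs dry, so dequeues are
otherwise lock-free and naturally batched.)

#include <linux/list.h>
#include <linux/spinlock.h>

struct twolist_queue {
	spinlock_t lock;
	struct list_head input;		/* appended to by producers */
	struct list_head process;	/* drained by the single consumer */
};

static void twolist_init(struct twolist_queue *q)
{
	spin_lock_init(&q->lock);
	INIT_LIST_HEAD(&q->input);
	INIT_LIST_HEAD(&q->process);
}

static void twolist_produce(struct twolist_queue *q, struct list_head *node)
{
	spin_lock(&q->lock);
	list_add_tail(node, &q->input);
	spin_unlock(&q->lock);
}

static struct list_head *twolist_consume(struct twolist_queue *q)
{
	struct list_head *node;

	if (list_empty(&q->process)) {
		/* Batch: take everything queued since the last splice. */
		spin_lock(&q->lock);
		list_splice_tail_init(&q->input, &q->process);
		spin_unlock(&q->lock);
	}
	if (list_empty(&q->process))
		return NULL;
	node = q->process.next;
	list_del(node);
	return node;
}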


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-27 17:53 ` Eric Dumazet
@ 2018-02-27 19:35   ` Michael S. Tsirkin
  0 siblings, 0 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2018-02-27 19:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, John Fastabend, netdev, Jason Wang, David Miller

On Tue, Feb 27, 2018 at 09:53:49AM -0800, Eric Dumazet wrote:
> On Mon, 2018-02-26 at 03:17 +0200, Michael S. Tsirkin wrote:
> > So pointer rings work fine, but they have a problem: make them too small
> > and not enough entries fit.  Make them too large and you start flushing
> > your cache and running out of memory.
> > 
> > This is a new idea of mine: a ring backed by a linked list. Once you run
> > out of ring entries, instead of a drop you fall back on a list with a
> > common lock.
> > 
> > Should work well for the case where the ring is typically sized
> > correctly, but will help address the fact that some user try to set e.g.
> > tx queue length to 1000000.
> > 
> > In other words, the idea is that if a user sets a really huge TX queue
> > length, we allocate a ptr_ring which is smaller, and use the backup
> > linked list when necessary to provide the requested TX queue length
> > legitimately.
> > 
> > My hope this will move us closer to direction where e.g. fw codel can
> > use ptr rings without locking at all.  The API is still very rough, and
> > I really need to take a hard look at lock nesting.
> > 
> > Compiled only, sending for early feedback/flames.
> 
> Okay I'll bite then ;)

Let me start by saying that there's no intent to merge this
before any numbers show a performance gain.

> High performance will be hit only if nothing is added in the (fallback)
> list.
> 
> Under stress, list operations will be the bottleneck, allowing XXXX
> items in the list, probably wasting cpu caches by always dequeue-ing
> cold objects.
>
> Since systems need to be provisioned to cope with the stress, why
> trying to optimize the light load case, while we know CPU has plenty of
> cycles to use ?

E.g. with tun, people configure huge rx rings to avoid packet drops, but
in practice tens of packets is the maximum we see even under heavy load,
except for <1% of the time.

So the list will get used a very small % of the time, and yes, during
that time it will be slower.

> If something uses ptr_ring and needs a list for the fallback, it might
> simply go back to the old-and-simple list stuff.


So for size > 512 we use a list, for size < 512 we use a ptr ring?

That is absolutely an option.

My concern is that this means that simply by increasing the size
using ethtool, the user suddenly sees a slowdown.
This did not use to be the case, so users might be confused.

> Note that this old-and-simple stuff can greatly be optimized with the
> use of two lists, as was shown in UDP stack lately, to decouple
> producer and consumer (batching effects)

Please note that such batching is already built into this patch:
packets are added to the last skb, then dequeued as a batch
and moved to consumer_list.

-- 
MST


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-27 17:12       ` Michael S. Tsirkin
@ 2018-02-28  3:28         ` Jason Wang
  2018-02-28  3:39           ` Jason Wang
  2018-02-28  4:09           ` Michael S. Tsirkin
  0 siblings, 2 replies; 16+ messages in thread
From: Jason Wang @ 2018-02-28  3:28 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, John Fastabend, netdev, David Miller



On 2018-02-28 01:12, Michael S. Tsirkin wrote:
> On Tue, Feb 27, 2018 at 10:29:26AM +0800, Jason Wang wrote:
>>
>> On 2018年02月27日 04:34, Michael S. Tsirkin wrote:
>>> On Mon, Feb 26, 2018 at 11:15:42AM +0800, Jason Wang wrote:
>>>> On 2018年02月26日 09:17, Michael S. Tsirkin wrote:
>>>>> So pointer rings work fine, but they have a problem: make them too small
>>>>> and not enough entries fit.  Make them too large and you start flushing
>>>>> your cache and running out of memory.
>>>>>
>>>>> This is a new idea of mine: a ring backed by a linked list. Once you run
>>>>> out of ring entries, instead of a drop you fall back on a list with a
>>>>> common lock.
>>>>>
>>>>> Should work well for the case where the ring is typically sized
>>>>> correctly, but will help address the fact that some user try to set e.g.
>>>>> tx queue length to 1000000.
>>>>>
>>>>> In other words, the idea is that if a user sets a really huge TX queue
>>>>> length, we allocate a ptr_ring which is smaller, and use the backup
>>>>> linked list when necessary to provide the requested TX queue length
>>>>> legitimately.
>>>>>
>>>>> My hope this will move us closer to direction where e.g. fw codel can
>>>>> use ptr rings without locking at all.  The API is still very rough, and
>>>>> I really need to take a hard look at lock nesting.
>>>>>
>>>>> Compiled only, sending for early feedback/flames.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>>>> ---
>>>>>
>>>>> changes from v1:
>>>>> - added clarifications by DaveM in the commit log
>>>>> - build fixes
>>>>>
>>>>>     include/linux/ptr_ring.h | 64 +++++++++++++++++++++++++++++++++++++++++++++---
>>>>>     1 file changed, 61 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>>>>> index d72b2e7..8aa8882 100644
>>>>> --- a/include/linux/ptr_ring.h
>>>>> +++ b/include/linux/ptr_ring.h
>>>>> @@ -31,11 +31,18 @@
>>>>>     #include <asm/errno.h>
>>>>>     #endif
>>>>> +/* entries must start with the following structure */
>>>>> +struct plist {
>>>>> +	struct plist *next;
>>>>> +	struct plist *last; /* only valid in the 1st entry */
>>>>> +};
>>>> So I wonder whether or not it's better to do this in e.g skb_array
>>>> implementation. Then it can use its own prev/next field.
>>> XDP uses ptr ring directly, doesn't it?
>>>
>> Well I believe the main user for this is qdisc, which use skb array. And we
>> can not use what implemented in this patch directly for sk_buff without some
>> changes on the data structure.
> Why not? skb has next and prev pointers at 1st two fields:
>
> struct sk_buff {
>          union {
>                  struct {
>                          /* These two members must be first. */
>                          struct sk_buff          *next;
>                          struct sk_buff          *prev;
> ...
> }
>
> so it's just a question of casting to struct plist.

Well, then the casting can only be done in the skb_array implementation?

>
> Or we can add plist to a union:
>
>
> struct sk_buff {
>          union {
>                  struct {
>                          /* These two members must be first. */
>                          struct sk_buff          *next;
>                          struct sk_buff          *prev;
>                          
>                          union {
>                                  struct net_device       *dev;
>                                  /* Some protocols might use this space to store information,
>                                   * while device pointer would be NULL.
>                                   * UDP receive path is one user.
>                                   */
>                                  unsigned long           dev_scratch;
>                          };
>                  };
>                  struct rb_node  rbnode; /* used in netem & tcp stack */
> +		struct plist plist; /* For use with ptr_ring */
>          };
>

This looks ok.

>
>> For XDP, we need to embed plist in struct xdp_buff too,
> Right - that's pretty straightforward, isn't it?

Yes, it's not clear to me this is really needed for XDP, considering the 
lock contention it brings.

Thanks

>> so it looks to me
>> that the better approach is to have separated function for ptr ring and skb
>> array.
>>
>> Thanks


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-28  3:28         ` Jason Wang
@ 2018-02-28  3:39           ` Jason Wang
  2018-02-28  4:11             ` Michael S. Tsirkin
  2018-02-28  4:09           ` Michael S. Tsirkin
  1 sibling, 1 reply; 16+ messages in thread
From: Jason Wang @ 2018-02-28  3:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, John Fastabend, netdev, David Miller



On 2018-02-28 11:28, Jason Wang wrote:
>>> Well I believe the main user for this is qdisc, which use skb array. 
>>> And we
>>> can not use what implemented in this patch directly for sk_buff 
>>> without some
>>> changes on the data structure.
>> Why not? skb has next and prev pointers at 1st two fields:
>>
>> struct sk_buff {
>>          union {
>>                  struct {
>>                          /* These two members must be first. */
>>                          struct sk_buff          *next;
>>                          struct sk_buff          *prev;
>> ...
>> }
>>
>> so it's just a question of casting to struct plist.
>
> Well, then the casting can only be done in skb_array implementation? 

Ok, it could be done in ptr_ring. But it still looks tricky, because of the 
different meanings of prev and last.

Thanks


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-28  3:28         ` Jason Wang
  2018-02-28  3:39           ` Jason Wang
@ 2018-02-28  4:09           ` Michael S. Tsirkin
  2018-02-28  6:28             ` Jason Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2018-02-28  4:09 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, John Fastabend, netdev, David Miller

On Wed, Feb 28, 2018 at 11:28:57AM +0800, Jason Wang wrote:
> 
> 
> On 2018年02月28日 01:12, Michael S. Tsirkin wrote:
> > On Tue, Feb 27, 2018 at 10:29:26AM +0800, Jason Wang wrote:
> > > 
> > > On 2018年02月27日 04:34, Michael S. Tsirkin wrote:
> > > > On Mon, Feb 26, 2018 at 11:15:42AM +0800, Jason Wang wrote:
> > > > > On 2018年02月26日 09:17, Michael S. Tsirkin wrote:
> > > > > > So pointer rings work fine, but they have a problem: make them too small
> > > > > > and not enough entries fit.  Make them too large and you start flushing
> > > > > > your cache and running out of memory.
> > > > > > 
> > > > > > This is a new idea of mine: a ring backed by a linked list. Once you run
> > > > > > out of ring entries, instead of a drop you fall back on a list with a
> > > > > > common lock.
> > > > > > 
> > > > > > Should work well for the case where the ring is typically sized
> > > > > > correctly, but will help address the fact that some user try to set e.g.
> > > > > > tx queue length to 1000000.
> > > > > > 
> > > > > > In other words, the idea is that if a user sets a really huge TX queue
> > > > > > length, we allocate a ptr_ring which is smaller, and use the backup
> > > > > > linked list when necessary to provide the requested TX queue length
> > > > > > legitimately.
> > > > > > 
> > > > > > My hope this will move us closer to direction where e.g. fw codel can
> > > > > > use ptr rings without locking at all.  The API is still very rough, and
> > > > > > I really need to take a hard look at lock nesting.
> > > > > > 
> > > > > > Compiled only, sending for early feedback/flames.
> > > > > > 
> > > > > > Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> > > > > > ---
> > > > > > 
> > > > > > changes from v1:
> > > > > > - added clarifications by DaveM in the commit log
> > > > > > - build fixes
> > > > > > 
> > > > > >     include/linux/ptr_ring.h | 64 +++++++++++++++++++++++++++++++++++++++++++++---
> > > > > >     1 file changed, 61 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > > > > index d72b2e7..8aa8882 100644
> > > > > > --- a/include/linux/ptr_ring.h
> > > > > > +++ b/include/linux/ptr_ring.h
> > > > > > @@ -31,11 +31,18 @@
> > > > > >     #include <asm/errno.h>
> > > > > >     #endif
> > > > > > +/* entries must start with the following structure */
> > > > > > +struct plist {
> > > > > > +	struct plist *next;
> > > > > > +	struct plist *last; /* only valid in the 1st entry */
> > > > > > +};
> > > > > So I wonder whether or not it's better to do this in e.g skb_array
> > > > > implementation. Then it can use its own prev/next field.
> > > > XDP uses ptr ring directly, doesn't it?
> > > > 
> > > Well I believe the main user for this is qdisc, which use skb array. And we
> > > can not use what implemented in this patch directly for sk_buff without some
> > > changes on the data structure.
> > Why not? skb has next and prev pointers at 1st two fields:
> > 
> > struct sk_buff {
> >          union {
> >                  struct {
> >                          /* These two members must be first. */
> >                          struct sk_buff          *next;
> >                          struct sk_buff          *prev;
> > ...
> > }
> > 
> > so it's just a question of casting to struct plist.
> 
> Well, then the casting can only be done in skb_array implementation?

why not?

> > 
> > Or we can add plist to a union:
> > 
> > 
> > struct sk_buff {
> >          union {
> >                  struct {
> >                          /* These two members must be first. */
> >                          struct sk_buff          *next;
> >                          struct sk_buff          *prev;
> >                          union {
> >                                  struct net_device       *dev;
> >                                  /* Some protocols might use this space to store information,
> >                                   * while device pointer would be NULL.
> >                                   * UDP receive path is one user.
> >                                   */
> >                                  unsigned long           dev_scratch;
> >                          };
> >                  };
> >                  struct rb_node  rbnode; /* used in netem & tcp stack */
> > +		struct plist plist; /* For use with ptr_ring */
> >          };
> > 
> 
> This look ok.
> 
> > 
> > > For XDP, we need to embed plist in struct xdp_buff too,
> > Right - that's pretty straightforward, isn't it?
> 
> Yes, it's not clear to me this is really needed for XDP consider the lock
> contention it brings.
> 
> Thanks

The contention is only when the ring overflows into the list though.

> > > so it looks to me
> > > that the better approach is to have separated function for ptr ring and skb
> > > array.
> > > 
> > > Thanks


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-28  3:39           ` Jason Wang
@ 2018-02-28  4:11             ` Michael S. Tsirkin
  0 siblings, 0 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2018-02-28  4:11 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, John Fastabend, netdev, David Miller

On Wed, Feb 28, 2018 at 11:39:15AM +0800, Jason Wang wrote:
> 
> 
> On 2018年02月28日 11:28, Jason Wang wrote:
> > > > Well I believe the main user for this is qdisc, which use skb
> > > > array. And we
> > > > can not use what implemented in this patch directly for sk_buff
> > > > without some
> > > > changes on the data structure.
> > > Why not? skb has next and prev pointers at 1st two fields:
> > > 
> > > struct sk_buff {
> > >          union {
> > >                  struct {
> > >                          /* These two members must be first. */
> > >                          struct sk_buff          *next;
> > >                          struct sk_buff          *prev;
> > > ...
> > > }
> > > 
> > > so it's just a question of casting to struct plist.
> > 
> > Well, then the casting can only be done in skb_array implementation?
> 
> Ok, could be done in ptr ring. But still looks tricky, because of the
> different meaning of prev and last.
> 
> Thanks

It doesn't matter much - the meaning is up to whoever has the skb
queued. But sure - adding struct plist into the union is cleaner
and just as easy.


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-28  4:09           ` Michael S. Tsirkin
@ 2018-02-28  6:28             ` Jason Wang
  2018-02-28 14:01               ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2018-02-28  6:28 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, John Fastabend, netdev, David Miller



On 2018-02-28 12:09, Michael S. Tsirkin wrote:
>>> Or we can add plist to a union:
>>>
>>>
>>> struct sk_buff {
>>>           union {
>>>                   struct {
>>>                           /* These two members must be first. */
>>>                           struct sk_buff          *next;
>>>                           struct sk_buff          *prev;
>>>                           union {
>>>                                   struct net_device       *dev;
>>>                                   /* Some protocols might use this space to store information,
>>>                                    * while device pointer would be NULL.
>>>                                    * UDP receive path is one user.
>>>                                    */
>>>                                   unsigned long           dev_scratch;
>>>                           };
>>>                   };
>>>                   struct rb_node  rbnode; /* used in netem & tcp stack */
>>> +		struct plist plist; /* For use with ptr_ring */
>>>           };
>>>
>> This look ok.
>>
>>>> For XDP, we need to embed plist in struct xdp_buff too,
>>> Right - that's pretty straightforward, isn't it?
>> Yes, it's not clear to me this is really needed for XDP consider the lock
>> contention it brings.
>>
>> Thanks
> The contention is only when the ring overflows into the list though.
>

Right, but there's usually a mismatch in speed between producer and 
consumer. In the case of a fast producer, we may get this contention very 
frequently.

Thanks


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-28  6:28             ` Jason Wang
@ 2018-02-28 14:01               ` Michael S. Tsirkin
  2018-02-28 14:20                 ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2018-02-28 14:01 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, John Fastabend, netdev, David Miller

On Wed, Feb 28, 2018 at 02:28:21PM +0800, Jason Wang wrote:
> 
> 
> On 2018年02月28日 12:09, Michael S. Tsirkin wrote:
> > > > Or we can add plist to a union:
> > > > 
> > > > 
> > > > struct sk_buff {
> > > >           union {
> > > >                   struct {
> > > >                           /* These two members must be first. */
> > > >                           struct sk_buff          *next;
> > > >                           struct sk_buff          *prev;
> > > >                           union {
> > > >                                   struct net_device       *dev;
> > > >                                   /* Some protocols might use this space to store information,
> > > >                                    * while device pointer would be NULL.
> > > >                                    * UDP receive path is one user.
> > > >                                    */
> > > >                                   unsigned long           dev_scratch;
> > > >                           };
> > > >                   };
> > > >                   struct rb_node  rbnode; /* used in netem & tcp stack */
> > > > +		struct plist plist; /* For use with ptr_ring */
> > > >           };
> > > > 
> > > This look ok.
> > > 
> > > > > For XDP, we need to embed plist in struct xdp_buff too,
> > > > Right - that's pretty straightforward, isn't it?
> > > Yes, it's not clear to me this is really needed for XDP consider the lock
> > > contention it brings.
> > > 
> > > Thanks
> > The contention is only when the ring overflows into the list though.
> > 
> 
> Right, but there's usually a mismatch of speed between producer and
> consumer. In case of a fast producer, we may get this contention very
> frequently.
> 
> Thanks

This is not true in my experiments.  In my experiments, a ring size of 4k
bytes is enough to see packet drops in only a single-digit percentage of cases.

Do you have workloads where rings are full most of the time?

One other nice side effect of this patch is that instead of dropping
packets quickly, it slows down the producer to match consumer speeds.

IOW, it can go either way in theory; we will need to test and see the effect.

-- 
MST


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-28 14:01               ` Michael S. Tsirkin
@ 2018-02-28 14:20                 ` Jason Wang
  2018-02-28 15:43                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2018-02-28 14:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, John Fastabend, netdev, David Miller



On 2018-02-28 22:01, Michael S. Tsirkin wrote:
> On Wed, Feb 28, 2018 at 02:28:21PM +0800, Jason Wang wrote:
>>
>> On 2018年02月28日 12:09, Michael S. Tsirkin wrote:
>>>>> Or we can add plist to a union:
>>>>>
>>>>>
>>>>> struct sk_buff {
>>>>>            union {
>>>>>                    struct {
>>>>>                            /* These two members must be first. */
>>>>>                            struct sk_buff          *next;
>>>>>                            struct sk_buff          *prev;
>>>>>                            union {
>>>>>                                    struct net_device       *dev;
>>>>>                                    /* Some protocols might use this space to store information,
>>>>>                                     * while device pointer would be NULL.
>>>>>                                     * UDP receive path is one user.
>>>>>                                     */
>>>>>                                    unsigned long           dev_scratch;
>>>>>                            };
>>>>>                    };
>>>>>                    struct rb_node  rbnode; /* used in netem & tcp stack */
>>>>> +		struct plist plist; /* For use with ptr_ring */
>>>>>            };
>>>>>
>>>> This look ok.
>>>>
>>>>>> For XDP, we need to embed plist in struct xdp_buff too,
>>>>> Right - that's pretty straightforward, isn't it?
>>>> Yes, it's not clear to me this is really needed for XDP consider the lock
>>>> contention it brings.
>>>>
>>>> Thanks
>>> The contention is only when the ring overflows into the list though.
>>>
>> Right, but there's usually a mismatch of speed between producer and
>> consumer. In case of a fast producer, we may get this contention very
>> frequently.
>>
>> Thanks
> This is not true in my experiments.  In my experiments, ring size of 4k
> bytes is enough to see packet drops in single %s of cases.
>
> To you have workloads where rings are full most of the time?

E.g. using xdp_redirect to redirect packets from ixgbe to tap. In my 
test, ixgbe can produce ~8Mpps, but vhost can only consume ~3.5Mpps.

>
> One other nice side effect of this patch is that instead of dropping
> packets quickly it slows down producer to match consumer speeds.

In some cases, the producer may not want to be slowed down, e.g. in devmap, 
which can redirect packets to several different interfaces.

> IOW, it can go either way in theory, we will need to test and see the effect.
>

Yes.

Thanks


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-28 14:20                 ` Jason Wang
@ 2018-02-28 15:43                   ` Michael S. Tsirkin
  2018-03-01  6:41                     ` Jason Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2018-02-28 15:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, John Fastabend, netdev, David Miller

On Wed, Feb 28, 2018 at 10:20:33PM +0800, Jason Wang wrote:
> 
> 
> On 2018年02月28日 22:01, Michael S. Tsirkin wrote:
> > On Wed, Feb 28, 2018 at 02:28:21PM +0800, Jason Wang wrote:
> > > 
> > > On 2018年02月28日 12:09, Michael S. Tsirkin wrote:
> > > > > > Or we can add plist to a union:
> > > > > > 
> > > > > > 
> > > > > > struct sk_buff {
> > > > > >            union {
> > > > > >                    struct {
> > > > > >                            /* These two members must be first. */
> > > > > >                            struct sk_buff          *next;
> > > > > >                            struct sk_buff          *prev;
> > > > > >                            union {
> > > > > >                                    struct net_device       *dev;
> > > > > >                                    /* Some protocols might use this space to store information,
> > > > > >                                     * while device pointer would be NULL.
> > > > > >                                     * UDP receive path is one user.
> > > > > >                                     */
> > > > > >                                    unsigned long           dev_scratch;
> > > > > >                            };
> > > > > >                    };
> > > > > >                    struct rb_node  rbnode; /* used in netem & tcp stack */
> > > > > > +		struct plist plist; /* For use with ptr_ring */
> > > > > >            };
> > > > > > 
> > > > > This look ok.
> > > > > 
> > > > > > > For XDP, we need to embed plist in struct xdp_buff too,
> > > > > > Right - that's pretty straightforward, isn't it?
> > > > > Yes, it's not clear to me this is really needed for XDP consider the lock
> > > > > contention it brings.
> > > > > 
> > > > > Thanks
> > > > The contention is only when the ring overflows into the list though.
> > > > 
> > > Right, but there's usually a mismatch of speed between producer and
> > > consumer. In case of a fast producer, we may get this contention very
> > > frequently.
> > > 
> > > Thanks
> > This is not true in my experiments.  In my experiments, ring size of 4k
> > bytes is enough to see packet drops in single %s of cases.
> > 
> > To you have workloads where rings are full most of the time?
> 
> E.g using xdp_redirect to redirect packets from ixgbe to tap. In my test,
> ixgeb can produce ~8Mpps. But vhost can only consume ~3.5Mpps.

Then you are better off just using a small ring and dropping
packets early, right?

> > 
> > One other nice side effect of this patch is that instead of dropping
> > packets quickly it slows down producer to match consumer speeds.
> 
> In some case, producer may not want to be slowed down, e.g in devmap which
> can redirect packets into several different interfaces.
> > IOW, it can go either way in theory, we will need to test and see the effect.
> > 
> 
> Yes.
> 
> Thanks


* Re: [RFC PATCH v2] ptr_ring: linked list fallback
  2018-02-28 15:43                   ` Michael S. Tsirkin
@ 2018-03-01  6:41                     ` Jason Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Wang @ 2018-03-01  6:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, John Fastabend, netdev, David Miller



On 2018-02-28 23:43, Michael S. Tsirkin wrote:
> On Wed, Feb 28, 2018 at 10:20:33PM +0800, Jason Wang wrote:
>>
>> On 2018年02月28日 22:01, Michael S. Tsirkin wrote:
>>> On Wed, Feb 28, 2018 at 02:28:21PM +0800, Jason Wang wrote:
>>>> On 2018年02月28日 12:09, Michael S. Tsirkin wrote:
>>>>>>> Or we can add plist to a union:
>>>>>>>
>>>>>>>
>>>>>>> struct sk_buff {
>>>>>>>             union {
>>>>>>>                     struct {
>>>>>>>                             /* These two members must be first. */
>>>>>>>                             struct sk_buff          *next;
>>>>>>>                             struct sk_buff          *prev;
>>>>>>>                             union {
>>>>>>>                                     struct net_device       *dev;
>>>>>>>                                     /* Some protocols might use this space to store information,
>>>>>>>                                      * while device pointer would be NULL.
>>>>>>>                                      * UDP receive path is one user.
>>>>>>>                                      */
>>>>>>>                                     unsigned long           dev_scratch;
>>>>>>>                             };
>>>>>>>                     };
>>>>>>>                     struct rb_node  rbnode; /* used in netem & tcp stack */
>>>>>>> +		struct plist plist; /* For use with ptr_ring */
>>>>>>>             };
>>>>>>>
>>>>>> This look ok.
>>>>>>
>>>>>>>> For XDP, we need to embed plist in struct xdp_buff too,
>>>>>>> Right - that's pretty straightforward, isn't it?
>>>>>> Yes, it's not clear to me this is really needed for XDP consider the lock
>>>>>> contention it brings.
>>>>>>
>>>>>> Thanks
>>>>> The contention is only when the ring overflows into the list though.
>>>>>
>>>> Right, but there's usually a mismatch of speed between producer and
>>>> consumer. In case of a fast producer, we may get this contention very
>>>> frequently.
>>>>
>>>> Thanks
>>> This is not true in my experiments.  In my experiments, ring size of 4k
>>> bytes is enough to see packet drops in single %s of cases.
>>>
>>> To you have workloads where rings are full most of the time?
>> E.g using xdp_redirect to redirect packets from ixgbe to tap. In my test,
>> ixgeb can produce ~8Mpps. But vhost can only consume ~3.5Mpps.
> Then you are better off just using a small ring and dropping
> packets early, right?

Yes, so I believe we won't use this for XDP.

Thanks

>>> One other nice side effect of this patch is that instead of dropping
>>> packets quickly it slows down producer to match consumer speeds.
>> In some case, producer may not want to be slowed down, e.g in devmap which
>> can redirect packets into several different interfaces.
>>> IOW, it can go either way in theory, we will need to test and see the effect.
>>>
>> Yes.
>>
>> Thanks

