From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: linux-kernel@vger.kernel.org, Jason Wang <jasowang@redhat.com>
Subject: Re: [PATCH 1/3] ptr_ring: batch ring zeroing
Date: Tue, 9 May 2017 16:33:14 +0300
Message-ID: <20170509163238-mutt-send-email-mst@kernel.org>
In-Reply-To: <20170408141408.2101017e@redhat.com>

On Sat, Apr 08, 2017 at 02:14:08PM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 7 Apr 2017 08:49:57 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > A known weakness in the ptr_ring design is that it does not handle the
> > almost-full ring well: as entries are consumed they are immediately
> > reused by the producer, so consumer and producer end up writing to a
> > shared cache line.
> > 
> > To fix this, add batching to consume calls: as entries are consumed,
> > do not write NULL into the ring until the consumer is a number of
> > cache lines (2x in the current implementation) away from the
> > producer. At that point, write them all out.
> > 
> > We do the write out in the reverse order to keep
> > producer from sharing cache with consumer for as long
> > as possible.
> > 
> > Writeout also triggers when ring wraps around - there's
> > no special reason to do this but it helps keep the code
> > a bit simpler.
> > 
> > What should we do if getting away from the producer by 2 cache lines
> > would mean keeping the ring more than half empty?
> > Maybe that case deserves different handling; the current
> > patch simply reduces the batching.
> > 
> > Notes:
> > - it is no longer true that a call to consume guarantees
> >   that the following call to produce will succeed.
> >   No users seem to assume that.
> > - batching can also in theory reduce the signalling rate:
> >   users that would previously send interrupts to the producer
> >   to wake it up after consuming each entry would now only
> >   need to do this once per batch.
> >   This would be easy to do by returning a flag to the caller.
> >   No users seem to do signalling on consume yet, so this was
> >   not implemented.
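
To make that last note concrete, here is a rough sketch (not part of the patch) of how a caller could approximate per-batch signalling on top of the existing API by counting consumed entries itself; kick_producer() and the external counter are hypothetical, and locking around r->batch is ignored:

#include <linux/ptr_ring.h>

/* Sketch only: wake the producer once per batch instead of once per entry. */
static inline void *consume_and_maybe_kick(struct ptr_ring *r, int *consumed,
					   void (*kick_producer)(void *ctx),
					   void *ctx)
{
	void *ptr = ptr_ring_consume(r);

	if (ptr && ++(*consumed) >= r->batch) {
		*consumed = 0;
		kick_producer(ctx);	/* one wakeup per batch */
	}
	return ptr;
}

The flag-returning variant suggested in the note would be cleaner, since only __ptr_ring_discard_one knows when a batch boundary has actually been crossed.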
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Jason, I am curious whether the following gives you some of
> > the performance boost that you see with vhost batching
> > patches. Is vhost batching on top still helpful?
> > 
> >  include/linux/ptr_ring.h | 63 +++++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 54 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index 6c70444..6b2e0dd 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -34,11 +34,13 @@
> >  struct ptr_ring {
> >  	int producer ____cacheline_aligned_in_smp;
> >  	spinlock_t producer_lock;
> > -	int consumer ____cacheline_aligned_in_smp;
> > +	int consumer_head ____cacheline_aligned_in_smp; /* next valid entry */
> > +	int consumer_tail; /* next entry to invalidate */
> >  	spinlock_t consumer_lock;
> >  	/* Shared consumer/producer data */
> >  	/* Read-only by both the producer and the consumer */
> >  	int size ____cacheline_aligned_in_smp; /* max entries in queue */
> > +	int batch; /* number of entries to consume in a batch */
> >  	void **queue;
> >  };
> >  
> > @@ -170,7 +172,7 @@ static inline int ptr_ring_produce_bh(struct ptr_ring *r, void *ptr)
> >  static inline void *__ptr_ring_peek(struct ptr_ring *r)
> >  {
> >  	if (likely(r->size))
> > -		return r->queue[r->consumer];
> > +		return r->queue[r->consumer_head];
> >  	return NULL;
> >  }
> >  
> > @@ -231,9 +233,38 @@ static inline bool ptr_ring_empty_bh(struct ptr_ring *r)
> >  /* Must only be called after __ptr_ring_peek returned !NULL */
> >  static inline void __ptr_ring_discard_one(struct ptr_ring *r)
> >  {
> > -	r->queue[r->consumer++] = NULL;
> > -	if (unlikely(r->consumer >= r->size))
> > -		r->consumer = 0;
> > +	/* Fundamentally, what we want to do is update consumer
> > +	 * index and zero out the entry so producer can reuse it.
> > +	 * Doing it naively at each consume would be as simple as:
> > +	 *       r->queue[r->consumer++] = NULL;
> > +	 *       if (unlikely(r->consumer >= r->size))
> > +	 *               r->consumer = 0;
> > +	 * but that is suboptimal when the ring is full as producer is writing
> > +	 * out new entries in the same cache line.  Defer these updates until a
> > +	 * batch of entries has been consumed.
> > +	 */
> > +	int head = r->consumer_head++;
> > +
> > +	/* Once we have processed enough entries invalidate them in
> > +	 * the ring all at once so producer can reuse their space in the ring.
> > +	 * We also do this when we reach end of the ring - not mandatory
> > +	 * but helps keep the implementation simple.
> > +	 */
> > +	if (unlikely(r->consumer_head - r->consumer_tail >= r->batch ||
> > +		     r->consumer_head >= r->size)) {
> > +		/* Zero out entries in reverse order: this way the cache line
> > +		 * that the producer might currently be reading is touched last;
> > +		 * the producer won't make progress and touch other cache lines
> > +		 * besides the first one until we have written out all entries.
> > +		 */
> > +		while (likely(head >= r->consumer_tail))
> > +			r->queue[head--] = NULL;
> > +		r->consumer_tail = r->consumer_head;
> > +	}
> > +	if (unlikely(r->consumer_head >= r->size)) {
> > +		r->consumer_head = 0;
> > +		r->consumer_tail = 0;
> > +	}
> >  }
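
A sizing aside: the hunks quoted above do not show how r->batch gets initialized. A plausible sketch, following the changelog's description of staying roughly two cache lines away from the producer (the helper name and the exact fallback policy here are assumptions, not necessarily what the merged patch does):

static inline void example_ptr_ring_set_batch(struct ptr_ring *r, int size)
{
	/* Roughly two cache lines' worth of pointers per batch. */
	r->batch = SMP_CACHE_BYTES * 2 / sizeof(*(r->queue));
	/* Keep the batch at least 1, and avoid batching altogether when two
	 * cache lines' worth of entries would exceed half of a small ring,
	 * since that would keep the ring mostly empty.
	 */
	if (r->batch > size / 2 || !r->batch)
		r->batch = 1;
}

With 64-byte cache lines and 8-byte pointers this works out to 2 * 64 / 8 = 16 entries per batch.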
> 
> I love this idea.  Reviewed and discussed the idea in person with MST
> during netdevconf[1] at this laptop.  I promised I will also run it
> through my micro-benchmarking[2] once I return home (hint: ptr_ring gets
> used in the network stack as skb_array).

I'm merging this through my tree. Any objections?

> Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> [1] http://netdevconf.org/2.1/
> [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_bench01.c
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
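
For readers following up on the skb_array hint in the quoted review: skb_array is essentially a thin sk_buff-typed wrapper around ptr_ring, roughly along the lines below (a from-memory sketch rather than a verbatim copy of include/linux/skb_array.h), which is why it picks up the batched zeroing for free.

#include <linux/ptr_ring.h>
#include <linux/skbuff.h>

struct skb_array {
	struct ptr_ring ring;
};

/* Both operations simply delegate to the underlying ptr_ring. */
static inline int skb_array_produce(struct skb_array *a, struct sk_buff *skb)
{
	return ptr_ring_produce(&a->ring, skb);
}

static inline struct sk_buff *skb_array_consume(struct skb_array *a)
{
	return ptr_ring_consume(&a->ring);
}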

Thread overview: 17+ messages
2017-04-07  5:49 [PATCH 1/3] ptr_ring: batch ring zeroing Michael S. Tsirkin
2017-04-07  5:50 ` [PATCH 2/3] ringtest: support test specific parameters Michael S. Tsirkin
2017-04-07  5:50   ` Michael S. Tsirkin
2017-04-07  5:50 ` [PATCH 3/3] ptr_ring: support testing different batching sizes Michael S. Tsirkin
2017-04-07  5:50   ` Michael S. Tsirkin
2017-04-08 12:14 ` [PATCH 1/3] ptr_ring: batch ring zeroing Jesper Dangaard Brouer
2017-05-09 13:33   ` Michael S. Tsirkin [this message]
2017-05-10  3:30     ` Jason Wang
2017-05-10 12:22       ` Michael S. Tsirkin
2017-05-10  9:18     ` Jesper Dangaard Brouer
2017-05-10 12:20       ` Michael S. Tsirkin
2017-04-12  8:03 ` Jason Wang
2017-04-14  7:52   ` Jason Wang
2017-04-14 21:00     ` Michael S. Tsirkin
2017-04-18  2:16       ` Jason Wang
2017-04-14 22:50     ` Michael S. Tsirkin
2017-04-18  2:18       ` Jason Wang
