All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	John Fastabend <john.fastabend@gmail.com>,
	netdev@vger.kernel.org, Jason Wang <jasowang@redhat.com>,
	David Miller <davem@davemloft.net>
Subject: Re: [RFC PATCH v2] ptr_ring: linked list fallback
Date: Tue, 27 Feb 2018 21:35:42 +0200	[thread overview]
Message-ID: <20180227212120-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <1519754029.7296.11.camel@gmail.com>

On Tue, Feb 27, 2018 at 09:53:49AM -0800, Eric Dumazet wrote:
> On Mon, 2018-02-26 at 03:17 +0200, Michael S. Tsirkin wrote:
> > So pointer rings work fine, but they have a problem: make them too small
> > and not enough entries fit.  Make them too large and you start flushing
> > your cache and running out of memory.
> > 
> > This is a new idea of mine: a ring backed by a linked list. Once you run
> > out of ring entries, instead of a drop you fall back on a list with a
> > common lock.
> > 
> > Should work well for the case where the ring is typically sized
> > correctly, but will help address the fact that some user try to set e.g.
> > tx queue length to 1000000.
> > 
> > In other words, the idea is that if a user sets a really huge TX queue
> > length, we allocate a ptr_ring which is smaller, and use the backup
> > linked list when necessary to provide the requested TX queue length
> > legitimately.
> > 
> > My hope this will move us closer to direction where e.g. fw codel can
> > use ptr rings without locking at all.  The API is still very rough, and
> > I really need to take a hard look at lock nesting.
> > 
> > Compiled only, sending for early feedback/flames.
> 
> Okay I'll bite then ;)

Let me start by saying that there's no intent to merge this
before any numbers show a performance gain.

> High performance will be hit only if nothing is added in the (fallback)
> list.
> 
> Under stress, list operations will be the bottleneck, allowing XXXX
> items in the list, probably wasting cpu caches by always dequeue-ing
> cold objects.
>
> Since systems need to be provisioned to cope with the stress, why
> trying to optimize the light load case, while we know CPU has plenty of
> cycles to use ?

E.g. with tun people configure huge rx rings to avoid packet drops, but
in practice tens of packets is the maximum we see even under heavy load
except <1% of time.

So the list will get used a very small % of time and yes, that
time it will be slower.

> If something uses ptr_ring and needs a list for the fallback, it might
> simply go back to the old-and-simple list stuff.


So for size > 512 we use a list, for size < 512 we use a ptr ring?

That is absolutely an option.

My concern is that this means that simply by increasing the size
using ethtool suddenly user sees a slowdown.
This did not use to be the case so users might be confused.

> Note that this old-and-simple stuff can greatly be optimized with the
> use of two lists, as was shown in UDP stack lately, to decouple
> producer and consumer (batching effects)

Pls note that such a batching is already built in to this patch:
packets are added to the last skb, then dequeued as a batch
and moved to consumer_list.

-- 
MST

      reply	other threads:[~2018-02-27 19:35 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-26  1:17 [RFC PATCH v2] ptr_ring: linked list fallback Michael S. Tsirkin
2018-02-26  3:15 ` Jason Wang
2018-02-26 20:34   ` Michael S. Tsirkin
2018-02-27  2:29     ` Jason Wang
2018-02-27 17:12       ` Michael S. Tsirkin
2018-02-28  3:28         ` Jason Wang
2018-02-28  3:39           ` Jason Wang
2018-02-28  4:11             ` Michael S. Tsirkin
2018-02-28  4:09           ` Michael S. Tsirkin
2018-02-28  6:28             ` Jason Wang
2018-02-28 14:01               ` Michael S. Tsirkin
2018-02-28 14:20                 ` Jason Wang
2018-02-28 15:43                   ` Michael S. Tsirkin
2018-03-01  6:41                     ` Jason Wang
2018-02-27 17:53 ` Eric Dumazet
2018-02-27 19:35   ` Michael S. Tsirkin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180227212120-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=jasowang@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.