From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	Willem de Bruijn <willemb@google.com>,
	Florian Westphal <fw@strlen.de>,
	Tom Herbert <tom@herbertland.com>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
Date: Fri, 22 Mar 2019 07:28:33 -0400
Message-ID: <20190322072802-mutt-send-email-mst@kernel.org>
In-Reply-To: <20190322001444.182463-1-edumazet@google.com>

On Thu, Mar 21, 2019 at 05:14:41PM -0700, Eric Dumazet wrote:
> On hosts with many cpus we can observe a very serious contention
> on spinlocks used in mm slab layer.
> 
> The following can happen quite often :
> 
> 1) TX path
>   sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
>   ACK is received on CPU B, and consumes the skb that was in the retransmit
>   queue.
> 
> 2) RX path
>   network driver allocates skb on CPU C
>   recvmsg() happens on CPU D, freeing the skb after it has been delivered
>   to user space.
> 
> In both cases, we are hitting the asymmetric alloc/free pattern
> for which slab has to drain alien caches. At 8 Mpps, this represents
> 16 million alloc/free operations per second and has a huge penalty.
> 
> In an interesting experiment, I tried to use a single kmem_cache for all the skbs
> (in skb_init() : skbuff_fclone_cache = skbuff_head_cache =
>                   kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones),);
> and most of the contention disappeared, since cpus could better use
> their local slab per-cpu cache.
> 
> But we can actually do better, in the following patches.
> 
> TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
>      so that next sendmsg() can reuse it immediately.
> 
> RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
>    so that it can be freed by the cpu feeding the incoming packets in BH.
> 
> This increased the performance of a small RPC benchmark by about 10 % on a host
> with 112 hyperthreads.
> 
> v2 : - Solved a race condition : sk_stream_alloc_skb() now makes sure the prior
>        clone has been freed.
>      - Really test rps_needed in sk_eat_skb() as claimed.
>      - Fixed rps_needed use in drivers/net/tun.c

Just a thought: would it make sense to flush the cache
in enter_memory_pressure?
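
To make the question concrete, here is a rough userspace model of what I
have in mind. It is only a sketch: every name in it (model_sock, model_buf,
model_alloc, model_release, model_flush_on_pressure) is made up for
illustration and none of it is taken from the patches or from the kernel.
It assumes a single cached buffer per socket, parked on release and reused
on the next allocation, plus a hook that drops the parked buffer when
memory gets tight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: these are NOT kernel types or kernel functions. */
struct model_buf {
	char data[256];
};

struct model_sock {
	struct model_buf *tx_cache;	/* at most one parked buffer */
};

/* Allocation path: reuse the parked buffer if there is one, else malloc. */
static struct model_buf *model_alloc(struct model_sock *sk, bool *from_cache)
{
	struct model_buf *buf;

	assert(sk && from_cache);
	buf = sk->tx_cache;
	if (buf) {
		sk->tx_cache = NULL;
		*from_cache = true;
		return buf;
	}
	*from_cache = false;
	return malloc(sizeof(*buf));
}

/* Release path (the "ACK time" step): park the buffer instead of freeing. */
static void model_release(struct model_sock *sk, struct model_buf *buf)
{
	if (!sk->tx_cache)
		sk->tx_cache = buf;	/* slot empty: keep it for reuse */
	else
		free(buf);		/* slot taken: fall back to a real free */
}

/*
 * The suggested hook: drop the parked buffer under memory pressure.
 * Returns true if a buffer was actually freed.
 */
static bool model_flush_on_pressure(struct model_sock *sk)
{
	if (!sk->tx_cache)
		return false;
	free(sk->tx_cache);
	sk->tx_cache = NULL;
	return true;
}
```

In this model the flush costs one pointer test per socket, so the open
question is mostly whether holding one buffer per idle socket matters
under pressure, not whether the flush itself is expensive.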


> Eric Dumazet (3):
>   net: convert rps_needed and rfs_needed to new static branch api
>   tcp: add one skb cache for tx
>   tcp: add one skb cache for rx
> 
>  drivers/net/tun.c          |  2 +-
>  include/linux/netdevice.h  |  4 +--
>  include/net/sock.h         | 13 ++++++++-
>  net/core/dev.c             | 10 +++----
>  net/core/net-sysfs.c       |  4 +--
>  net/core/sysctl_net_core.c |  8 +++---
>  net/ipv4/af_inet.c         |  4 +++
>  net/ipv4/tcp.c             | 54 +++++++++++++++++++-------------------
>  net/ipv4/tcp_ipv4.c        | 11 ++++++--
>  net/ipv6/tcp_ipv6.c        | 12 ++++++---
>  10 files changed, 75 insertions(+), 47 deletions(-)
> 
> -- 
> 2.21.0.225.g810b269d1ac-goog

