From: Eric Dumazet <edumazet@google.com>
To: "David S . Miller" <davem@davemloft.net>
Cc: netdev <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: [PATCH v3 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
Date: Fri, 22 Mar 2019 08:56:37 -0700
Message-ID: <20190322155640.248144-1-edumazet@google.com>

On hosts with many cpus, we can observe very serious contention
on the spinlocks used in the mm slab layer.

The following can happen quite often :

1) TX path
  sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
  ACK is received on CPU B, and consumes the skb that was in the retransmit
  queue.

2) RX path
  network driver allocates skb on CPU C
  recvmsg() happens on CPU D, freeing the skb after it has been delivered
  to user space.

In both cases, we are hitting the asymmetric alloc/free pattern
for which slab has to drain alien caches. At 8 Mpps,
this represents 16 million alloc/free operations per second and has a huge penalty.

In an interesting experiment, I tried to use a single kmem_cache for all the skbs
(in skb_init() : skbuff_fclone_cache = skbuff_head_cache =
                  kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones), ...);)
and most of the contention disappeared, since cpus could better use
their local slab per-cpu cache.

But we can actually do better, as shown in the following patches.

TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
     so that next sendmsg() can reuse it immediately.
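
To illustrate the TX idea, here is a minimal userspace sketch, not the
kernel code. All names (tx_toy_skb, tx_toy_sock, tx_skb_cache, the tx_toy_*
helpers) are hypothetical stand-ins, and malloc()/free() stand in for the slab
allocator. It only shows the one-slot pattern: the ACK side parks the buffer
in the socket instead of freeing it, and the next send reuses it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the kernel's struct sk_buff / struct sock. */
struct tx_toy_skb {
	size_t len;
	char data[256];
};

struct tx_toy_sock {
	struct tx_toy_skb *tx_skb_cache; /* holds at most one parked buffer */
	unsigned long allocs;            /* calls that reached the allocator */
	unsigned long frees;             /* buffers handed back to the allocator */
};

/* sendmsg() side: reuse the parked buffer if there is one, else allocate. */
static struct tx_toy_skb *tx_toy_alloc(struct tx_toy_sock *sk)
{
	struct tx_toy_skb *skb = sk->tx_skb_cache;

	if (skb) {
		sk->tx_skb_cache = NULL;
		skb->len = 0;            /* reset the buffer before reuse */
		return skb;
	}
	skb = calloc(1, sizeof(*skb));
	if (skb)
		sk->allocs++;
	return skb;
}

/* ACK side: park the buffer in the empty slot instead of freeing it. */
static void tx_toy_ack(struct tx_toy_sock *sk, struct tx_toy_skb *skb)
{
	if (!sk->tx_skb_cache) {
		sk->tx_skb_cache = skb;
		return;
	}
	free(skb);                       /* slot already occupied */
	sk->frees++;
}

/* Send and ACK `rounds` packets one at a time; return allocator calls. */
static unsigned long tx_toy_demo(unsigned int rounds)
{
	struct tx_toy_sock sk = {0};
	unsigned int i;

	for (i = 0; i < rounds; i++) {
		struct tx_toy_skb *skb = tx_toy_alloc(&sk);

		assert(skb != NULL);
		skb->len = 100;
		tx_toy_ack(&sk, skb);
	}
	free(sk.tx_skb_cache);           /* socket teardown; free(NULL) is fine */
	return sk.allocs;
}
```

In this toy model, a request/response workload that sends one packet and
waits for its ACK reaches the allocator only for the first packet. The sketch
ignores locking and clones entirely.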

RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
   so that it can be freed by the cpu feeding the incoming packets in BH.
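
The RX side can be sketched in the same spirit. Again this is a userspace
toy under stated assumptions, not the kernel code: the rx_toy_* names, the
alloc_cpu field and the integer cpu ids are all hypothetical, and
malloc()/free() stand in for slab. The point is only that the reader parks
the buffer, and the cpu that allocates incoming packets is the one that frees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the kernel's struct sk_buff / struct sock. */
struct rx_toy_skb {
	int alloc_cpu;                   /* cpu that allocated this buffer */
};

struct rx_toy_sock {
	struct rx_toy_skb *rx_skb_cache; /* holds at most one parked buffer */
	unsigned long local_frees;       /* freed on the allocating cpu */
	unsigned long remote_frees;      /* freed on a different cpu */
};

/* Free a buffer and record whether the free was local or remote. */
static void rx_toy_free(struct rx_toy_sock *sk, struct rx_toy_skb *skb, int cpu)
{
	if (skb->alloc_cpu == cpu)
		sk->local_frees++;
	else
		sk->remote_frees++;
	free(skb);
}

/* BH side: free any parked buffer first, then allocate the new packet. */
static struct rx_toy_skb *rx_toy_bh_receive(struct rx_toy_sock *sk, int cpu)
{
	struct rx_toy_skb *skb;

	if (sk->rx_skb_cache) {
		rx_toy_free(sk, sk->rx_skb_cache, cpu);
		sk->rx_skb_cache = NULL;
	}
	skb = malloc(sizeof(*skb));
	if (skb)
		skb->alloc_cpu = cpu;
	return skb;
}

/* recvmsg() side: park the consumed buffer if the slot is empty. */
static void rx_toy_eat(struct rx_toy_sock *sk, struct rx_toy_skb *skb, int cpu)
{
	if (!sk->rx_skb_cache) {
		sk->rx_skb_cache = skb;
		return;
	}
	rx_toy_free(sk, skb, cpu);       /* slot occupied: free on this cpu */
}

/* Receive and read `rounds` packets; return the number of remote frees. */
static unsigned long rx_toy_demo(unsigned int rounds)
{
	struct rx_toy_sock sk = {0};
	const int bh_cpu = 2, app_cpu = 3; /* arbitrary distinct cpu ids */
	unsigned int i;

	for (i = 0; i < rounds; i++) {
		struct rx_toy_skb *skb = rx_toy_bh_receive(&sk, bh_cpu);

		assert(skb != NULL);
		rx_toy_eat(&sk, skb, app_cpu);
	}
	if (sk.rx_skb_cache)             /* teardown, modeled on the BH cpu */
		rx_toy_free(&sk, sk.rx_skb_cache, bh_cpu);
	return sk.remote_frees;
}
```

With one packet read between arrivals, every free in this model happens on
the allocating cpu. If the reader consumes several packets before the next
arrival, only the first is parked and the rest are still freed remotely.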

This increased the performance of a small RPC benchmark by about 10 % on a host
with 112 hyperthreads.

v2 : - Fixed a race condition : sk_stream_alloc_skb() now makes sure the prior
       clone has been freed.
     - Really test rps_needed in sk_eat_skb() as claimed.
     - Fixed rps_needed use in drivers/net/tun.c

v3 : Added an #ifdef CONFIG_RPS to avoid a compile error (kbuild robot)

Eric Dumazet (3):
  net: convert rps_needed and rfs_needed to new static branch api
  tcp: add one skb cache for tx
  tcp: add one skb cache for rx

 drivers/net/tun.c          |  2 +-
 include/linux/netdevice.h  |  4 +--
 include/net/sock.h         | 17 +++++++++++-
 net/core/dev.c             | 10 +++----
 net/core/net-sysfs.c       |  4 +--
 net/core/sysctl_net_core.c |  8 +++---
 net/ipv4/af_inet.c         |  4 +++
 net/ipv4/tcp.c             | 54 +++++++++++++++++++-------------------
 net/ipv4/tcp_ipv4.c        | 11 ++++++--
 net/ipv6/tcp_ipv6.c        | 12 ++++++---
 10 files changed, 79 insertions(+), 47 deletions(-)

-- 
2.21.0.392.gf8f6787159e-goog


Thread overview: 11+ messages
2019-03-22 15:56 Eric Dumazet [this message]
2019-03-22 15:56 ` [PATCH v3 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
2019-03-22 15:56 ` [PATCH v3 net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
2019-03-22 15:56 ` [PATCH v3 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
2019-04-03  1:17   ` Jakub Kicinski
2019-04-03  8:15     ` Eric Dumazet
2019-04-12 14:43   ` [tcp] 01b4c2aab8: lmbench3.TCP.socket.bandwidth.10MB.MB/sec -20.2% regression kernel test robot
2019-04-12 14:43     ` kernel test robot
2019-03-22 16:37 ` [PATCH v3 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Tariq Toukan
2019-03-22 16:55   ` Eric Dumazet
2019-03-24  1:58 ` David Miller