* [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
@ 2019-03-22 0:14 Eric Dumazet
2019-03-22 0:14 ` [PATCH v2 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22 0:14 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
Florian Westphal, Tom Herbert, Eric Dumazet
On hosts with many cpus we can observe very serious contention
on spinlocks used in the mm slab layer.
The following can happen quite often :
1) TX path
sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
ACK is received on CPU B, and consumes the skb that was in the retransmit
queue.
2) RX path
network driver allocates skb on CPU C
recvmsg() happens on CPU D, freeing the skb after it has been delivered
to user space.
In both cases, we are hitting the asymmetric alloc/free pattern
for which slab has to drain alien caches. At 8 Mpps, this
represents 16 M alloc/free operations per second and has a huge penalty.
In an interesting experiment, I tried to use a single kmem_cache for all the skbs
(in skb_init() : skbuff_fclone_cache = skbuff_head_cache =
kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones),);
and most of the contention disappeared, since cpus could better use
their local slab per-cpu cache.
But we can actually do better, as the following patches show.
TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
so that next sendmsg() can reuse it immediately.
RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
so that it can be freed by the cpu feeding the incoming packets in BH.
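The idea behind both caches can be sketched as a one-slot, per-socket buffer cache. The following is a hedged userspace model with illustrative names (struct buf, cached_alloc()/cached_free() are invented here, not the kernel's), not the actual patches:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative userspace model (not kernel code): a socket keeps at
 * most one spare buffer, so a free followed by an alloc on the same
 * socket never touches the underlying allocator. */
struct buf { char data[256]; };
struct sock_cache { struct buf *slot; };

static void cached_free(struct sock_cache *c, struct buf *b)
{
	if (!c->slot) {		/* slot empty: park the buffer */
		c->slot = b;
		return;
	}
	free(b);		/* slot full: fall back to the allocator */
}

static struct buf *cached_alloc(struct sock_cache *c)
{
	struct buf *b = c->slot;

	if (b) {		/* reuse the parked buffer */
		c->slot = NULL;
		return b;
	}
	return malloc(sizeof(struct buf));
}
```

The point is that the common free-then-alloc sequence on the same socket becomes two pointer assignments, instead of a round trip through (possibly remote) slab caches.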
This increased the performance of a small RPC benchmark by about 10 % on a host
with 112 hyperthreads.
v2 : - Solved a race condition : sk_stream_alloc_skb() now makes sure the prior
clone has been freed.
- Really test rps_needed in sk_eat_skb() as claimed.
- Fixed rps_needed use in drivers/net/tun.c
Eric Dumazet (3):
net: convert rps_needed and rfs_needed to new static branch api
tcp: add one skb cache for tx
tcp: add one skb cache for rx
drivers/net/tun.c | 2 +-
include/linux/netdevice.h | 4 +--
include/net/sock.h | 13 ++++++++-
net/core/dev.c | 10 +++----
net/core/net-sysfs.c | 4 +--
net/core/sysctl_net_core.c | 8 +++---
net/ipv4/af_inet.c | 4 +++
net/ipv4/tcp.c | 54 +++++++++++++++++++-------------------
net/ipv4/tcp_ipv4.c | 11 ++++++--
net/ipv6/tcp_ipv6.c | 12 ++++++---
10 files changed, 75 insertions(+), 47 deletions(-)
--
2.21.0.225.g810b269d1ac-goog
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api
2019-03-22 0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
@ 2019-03-22 0:14 ` Eric Dumazet
2019-03-22 0:14 ` [PATCH v2 net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22 0:14 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
Florian Westphal, Tom Herbert, Eric Dumazet
We prefer static_branch_unlikely() over static_key_false() these days.
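static_branch_unlikely() belongs to the jump-label API: the kernel patches the instruction stream so a disabled key costs roughly a no-op, and static_branch_inc()/static_branch_dec() keep a reference count on the key. As a rough userspace analogy only (a plain hinted flag with a counter; names invented here, and the real implementation does not even load the flag):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for a false-by-default static key: a flag that
 * is almost always false, tested with a compiler branch hint. The
 * kernel's static_branch_unlikely() goes further and patches the
 * instruction stream, so the disabled case has near-zero cost. */
static bool rps_enabled;
static int refcount;	/* models static_branch_inc()/static_branch_dec() */

static void branch_inc(void) { if (refcount++ == 0) rps_enabled = true; }
static void branch_dec(void) { if (--refcount == 0) rps_enabled = false; }

static int handle_packet(int pkt)
{
	if (__builtin_expect(rps_enabled, 0))
		return pkt + 1000;	/* steering path (rare) */
	return pkt;			/* fast path */
}
```

The counting matters because several users (rps maps, the rfs sysctl) can enable the same key independently; the key only goes false again when the last user drops it.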
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/net/tun.c | 2 +-
include/linux/netdevice.h | 4 ++--
include/net/sock.h | 2 +-
net/core/dev.c | 10 +++++-----
net/core/net-sysfs.c | 4 ++--
net/core/sysctl_net_core.c | 8 ++++----
6 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 27798aacb671e3e8d754ea60dac528e8efdb52da..24d0220b9ba00724ebad94fbc58858a4abffb207 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1042,7 +1042,7 @@ static int tun_net_close(struct net_device *dev)
static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
{
#ifdef CONFIG_RPS
- if (tun->numqueues == 1 && static_key_false(&rps_needed)) {
+ if (tun->numqueues == 1 && static_branch_unlikely(&rps_needed)) {
/* Select queue was not called for the skbuff, so we extract the
* RPS hash and save it into the flow_table here.
*/
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 823762291ebf59d2a8a0502f71d6591b5cd7839f..166fdc0a78b49c9df984b767169c3babce24462e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -194,8 +194,8 @@ struct net_device_stats {
#ifdef CONFIG_RPS
#include <linux/static_key.h>
-extern struct static_key rps_needed;
-extern struct static_key rfs_needed;
+extern struct static_key_false rps_needed;
+extern struct static_key_false rfs_needed;
#endif
struct neighbour;
diff --git a/include/net/sock.h b/include/net/sock.h
index 8de5ee258b93a50b2fdcde796bae3a5b53ce4d6a..fecdf639225c2d4995ee2e2cd9be57f3d4f22777 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -966,7 +966,7 @@ static inline void sock_rps_record_flow_hash(__u32 hash)
static inline void sock_rps_record_flow(const struct sock *sk)
{
#ifdef CONFIG_RPS
- if (static_key_false(&rfs_needed)) {
+ if (static_branch_unlikely(&rfs_needed)) {
/* Reading sk->sk_rxhash might incur an expensive cache line
* miss.
*
diff --git a/net/core/dev.c b/net/core/dev.c
index 357111431ec9a6a5873830b89dd137d5eba6f2f0..c71b0998fa3ac8ae9d28aa1131852032a5cd0008 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3973,9 +3973,9 @@ EXPORT_SYMBOL(rps_sock_flow_table);
u32 rps_cpu_mask __read_mostly;
EXPORT_SYMBOL(rps_cpu_mask);
-struct static_key rps_needed __read_mostly;
+struct static_key_false rps_needed __read_mostly;
EXPORT_SYMBOL(rps_needed);
-struct static_key rfs_needed __read_mostly;
+struct static_key_false rfs_needed __read_mostly;
EXPORT_SYMBOL(rfs_needed);
static struct rps_dev_flow *
@@ -4501,7 +4501,7 @@ static int netif_rx_internal(struct sk_buff *skb)
}
#ifdef CONFIG_RPS
- if (static_key_false(&rps_needed)) {
+ if (static_branch_unlikely(&rps_needed)) {
struct rps_dev_flow voidflow, *rflow = &voidflow;
int cpu;
@@ -5170,7 +5170,7 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
rcu_read_lock();
#ifdef CONFIG_RPS
- if (static_key_false(&rps_needed)) {
+ if (static_branch_unlikely(&rps_needed)) {
struct rps_dev_flow voidflow, *rflow = &voidflow;
int cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -5218,7 +5218,7 @@ static void netif_receive_skb_list_internal(struct list_head *head)
rcu_read_lock();
#ifdef CONFIG_RPS
- if (static_key_false(&rps_needed)) {
+ if (static_branch_unlikely(&rps_needed)) {
list_for_each_entry_safe(skb, next, head, list) {
struct rps_dev_flow voidflow, *rflow = &voidflow;
int cpu = get_rps_cpu(skb->dev, skb, &rflow);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 4ff661f6f989ae10ca49a1e81c825be56683d026..851cabb90bce66f30a5868d6b7499f240202d1eb 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -754,9 +754,9 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
rcu_assign_pointer(queue->rps_map, map);
if (map)
- static_key_slow_inc(&rps_needed);
+ static_branch_inc(&rps_needed);
if (old_map)
- static_key_slow_dec(&rps_needed);
+ static_branch_dec(&rps_needed);
mutex_unlock(&rps_map_mutex);
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 84bf2861f45f76f162d661298991f13ac0e8b592..1a2685694abd537d7ae304754b84b237928fd298 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -95,12 +95,12 @@ static int rps_sock_flow_sysctl(struct ctl_table *table, int write,
if (sock_table != orig_sock_table) {
rcu_assign_pointer(rps_sock_flow_table, sock_table);
if (sock_table) {
- static_key_slow_inc(&rps_needed);
- static_key_slow_inc(&rfs_needed);
+ static_branch_inc(&rps_needed);
+ static_branch_inc(&rfs_needed);
}
if (orig_sock_table) {
- static_key_slow_dec(&rps_needed);
- static_key_slow_dec(&rfs_needed);
+ static_branch_dec(&rps_needed);
+ static_branch_dec(&rfs_needed);
synchronize_rcu();
vfree(orig_sock_table);
}
--
2.21.0.225.g810b269d1ac-goog
* [PATCH v2 net-next 2/3] tcp: add one skb cache for tx
2019-03-22 0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
2019-03-22 0:14 ` [PATCH v2 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
@ 2019-03-22 0:14 ` Eric Dumazet
2019-03-22 0:14 ` [PATCH v2 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22 0:14 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
Florian Westphal, Tom Herbert, Eric Dumazet
On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks.
20.69% [kernel] [k] queued_spin_lock_slowpath
5.64% [kernel] [k] _raw_spin_lock
3.83% [kernel] [k] syscall_return_via_sysret
3.48% [kernel] [k] __entry_text_start
1.76% [kernel] [k] __netif_receive_skb_core
1.64% [kernel] [k] __fget
For each sendmsg(), we allocate one skb, and free it when the ACK packet arrives.
In many cases, ACK packets are handled by other cpus, and this unfortunately
incurs heavy costs in the slab layer.
This patch uses an extra pointer in the socket structure, so that we try to reuse
the same skb and avoid this expensive cost.
We cache at most one skb per socket, so this should be safe as far as
memory pressure is concerned.
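The tx-side subtlety is that the cached skb still has a clone sitting in the retransmit queue; it may only be reused once that clone has been freed, which v2 checks via the fclone refcount. A userspace sketch of that rule (illustrative names and a plain int refcount, not the kernel code):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace sketch: the cached tx skb can only be handed back once
 * its clone, kept for retransmit, has been freed -- modeled here as
 * a refcount that has dropped back to 1. */
struct skb { int refcnt; };
struct sock { struct skb *tx_cache; };

static struct skb *stream_alloc(struct sock *sk)
{
	struct skb *skb = sk->tx_cache;

	if (skb && skb->refcnt == 1) {	/* prior clone already freed */
		sk->tx_cache = NULL;
		return skb;		/* reuse: no allocator round trip */
	}
	skb = calloc(1, sizeof(*skb));	/* slow path: fresh allocation */
	skb->refcnt = 1;
	return skb;
}

static void ack_free(struct sock *sk, struct skb *skb)
{
	if (!sk->tx_cache) {		/* park instead of freeing */
		sk->tx_cache = skb;
		return;
	}
	free(skb);
}
```

If the refcount is still elevated, the slow path simply allocates a fresh skb and leaves the cached one in place until a later sendmsg() finds it reusable.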
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/sock.h | 5 +++++
net/ipv4/tcp.c | 50 +++++++++++++++++++++-------------------------
2 files changed, 28 insertions(+), 27 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index fecdf639225c2d4995ee2e2cd9be57f3d4f22777..314c47a8f5d19918393aa854a95e6e0f7ec6b604 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -414,6 +414,7 @@ struct sock {
struct sk_buff *sk_send_head;
struct rb_root tcp_rtx_queue;
};
+ struct sk_buff *sk_tx_skb_cache;
struct sk_buff_head sk_write_queue;
__s32 sk_peek_off;
int sk_write_pending;
@@ -1463,6 +1464,10 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
{
+ if (!sk->sk_tx_skb_cache) {
+ sk->sk_tx_skb_cache = skb;
+ return;
+ }
sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
sk->sk_wmem_queued -= skb->truesize;
sk_mem_uncharge(sk, skb->truesize);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6baa6dc1b13b0b94b1da238668b93e167cf444fe..f0b5a599914514fee2ee14c7083796dfcd3614cd 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -865,6 +865,21 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp,
{
struct sk_buff *skb;
+ skb = sk->sk_tx_skb_cache;
+ if (skb && !size) {
+ const struct sk_buff_fclones *fclones;
+
+ fclones = container_of(skb, struct sk_buff_fclones, skb1);
+ if (refcount_read(&fclones->fclone_ref) == 1) {
+ sk->sk_wmem_queued -= skb->truesize;
+ sk_mem_uncharge(sk, skb->truesize);
+ skb->truesize -= skb->data_len;
+ sk->sk_tx_skb_cache = NULL;
+ pskb_trim(skb, 0);
+ INIT_LIST_HEAD(&skb->tcp_tsorted_anchor);
+ return skb;
+ }
+ }
/* The TCP header must be at least 32-bit aligned. */
size = ALIGN(size, 4);
@@ -1098,30 +1113,6 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
}
EXPORT_SYMBOL(tcp_sendpage);
-/* Do not bother using a page frag for very small frames.
- * But use this heuristic only for the first skb in write queue.
- *
- * Having no payload in skb->head allows better SACK shifting
- * in tcp_shift_skb_data(), reducing sack/rack overhead, because
- * write queue has less skbs.
- * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
- * This also speeds up tso_fragment(), since it wont fallback
- * to tcp_fragment().
- */
-static int linear_payload_sz(bool first_skb)
-{
- if (first_skb)
- return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
- return 0;
-}
-
-static int select_size(bool first_skb, bool zc)
-{
- if (zc)
- return 0;
- return linear_payload_sz(first_skb);
-}
-
void tcp_free_fastopen_req(struct tcp_sock *tp)
{
if (tp->fastopen_req) {
@@ -1272,7 +1263,6 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
bool first_skb;
- int linear;
new_segment:
if (!sk_stream_memory_free(sk))
@@ -1283,8 +1273,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
goto restart;
}
first_skb = tcp_rtx_and_write_queues_empty(sk);
- linear = select_size(first_skb, zc);
- skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
+ skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation,
first_skb);
if (!skb)
goto wait_for_memory;
@@ -2552,6 +2541,13 @@ void tcp_write_queue_purge(struct sock *sk)
sk_wmem_free_skb(sk, skb);
}
tcp_rtx_queue_purge(sk);
+ skb = sk->sk_tx_skb_cache;
+ if (skb) {
+ sk->sk_wmem_queued -= skb->truesize;
+ sk_mem_uncharge(sk, skb->truesize);
+ __kfree_skb(skb);
+ sk->sk_tx_skb_cache = NULL;
+ }
INIT_LIST_HEAD(&tcp_sk(sk)->tsorted_sent_queue);
sk_mem_reclaim(sk);
tcp_clear_all_retrans_hints(tcp_sk(sk));
--
2.21.0.225.g810b269d1ac-goog
* [PATCH v2 net-next 3/3] tcp: add one skb cache for rx
2019-03-22 0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
2019-03-22 0:14 ` [PATCH v2 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
2019-03-22 0:14 ` [PATCH v2 net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
@ 2019-03-22 0:14 ` Eric Dumazet
2019-03-22 14:57 ` kbuild test robot
2019-03-22 15:00 ` kbuild test robot
2019-03-22 1:54 ` [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Willem de Bruijn
2019-03-22 11:28 ` Michael S. Tsirkin
4 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22 0:14 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
Florian Westphal, Tom Herbert, Eric Dumazet
Oftentimes, recvmsg() system calls and BH handling for a particular
TCP socket are done on different cpus.
This means the incoming skb had to be allocated on one cpu,
but freed on another.
This incurs high spinlock contention in the slab layer for small RPC
messages, and also many cache line ping pongs for larger packets.
A full size GRO packet might use 45 page fragments, meaning
that up to 45 put_page() calls can be involved.
Moreover, performing the __kfree_skb() in the recvmsg() context
adds latency for user applications, and increases the probability
of trapping them in backlog processing, since the BH handler
might find the socket owned by the user.
This patch, combined with the prior one, increases RPC
performance by about 10 % on servers with a large number of cores.
(tcp_rr workload with 10,000 flows and 112 threads reaches 9 Mpps
instead of 8 Mpps)
This also increases single bulk flow performance on 40Gbit+ links,
since in this case there are often two cpus working in tandem :
- CPU handling the NIC rx interrupts, feeding the receive queue,
and (after this patch) freeing the skbs that were consumed.
- CPU in recvmsg() system call, essentially 100 % busy copying out
data to user space.
Having at most one skb in a per-socket cache has very little risk
of memory exhaustion, and since it is protected by socket lock,
its management is essentially free.
Note that if rps/rfs is used, we do not enable this feature, because
there is a high chance that the same cpu is handling both the recvmsg()
system call and the TCP rx path, but that another cpu did the skb
allocations in the device driver right before the RPS/RFS logic.
To properly handle this case, it seems we would need to record
on which cpu the skb was allocated, and use a different channel
to give skbs back to that cpu.
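The rx-side flow can be sketched the same way as the tx side: the consumer parks, the producer frees. A hedged userspace model (names like sk_eat()/bh_rcv() are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace sketch: recvmsg() parks the consumed skb in a one-slot
 * per-socket cache instead of freeing it; the BH path, running on
 * the cpu that allocated it, frees the parked skb before handling
 * the next packet, keeping alloc and free cpu-local. */
struct skb { char payload[64]; };
struct sock { struct skb *rx_cache; int freed_in_bh; };

/* recvmsg() side: defer the free */
static void sk_eat(struct sock *sk, struct skb *skb)
{
	if (!sk->rx_cache) {
		sk->rx_cache = skb;
		return;
	}
	free(skb);
}

/* BH side: reclaim the parked skb on the allocating cpu */
static void bh_rcv(struct sock *sk)
{
	struct skb *to_free = sk->rx_cache;

	sk->rx_cache = NULL;
	if (to_free) {
		free(to_free);
		sk->freed_in_bh++;
	}
}
```

This also mirrors why the free is cheap in the real patch: the slot is only touched under the socket lock (or with the socket owned), so no extra synchronization is needed.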
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/sock.h | 6 ++++++
net/ipv4/af_inet.c | 4 ++++
net/ipv4/tcp.c | 4 ++++
net/ipv4/tcp_ipv4.c | 11 +++++++++--
net/ipv6/tcp_ipv6.c | 12 +++++++++---
5 files changed, 32 insertions(+), 5 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 314c47a8f5d19918393aa854a95e6e0f7ec6b604..0840f4b27b91eddb205ff42c03f787e5914f755d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -368,6 +368,7 @@ struct sock {
atomic_t sk_drops;
int sk_rcvlowat;
struct sk_buff_head sk_error_queue;
+ struct sk_buff *sk_rx_skb_cache;
struct sk_buff_head sk_receive_queue;
/*
* The backlog queue is special, it is always used with
@@ -2438,6 +2439,11 @@ static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags)
static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
{
__skb_unlink(skb, &sk->sk_receive_queue);
+ if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
+ sk->sk_rx_skb_cache = skb;
+ skb_orphan(skb);
+ return;
+ }
__kfree_skb(skb);
}
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index eab3ebde981e78a6a0a4852c3b4374c02ede1187..7f3a984ad618580ae28501c3fe3dd3fa915a66a2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -136,6 +136,10 @@ void inet_sock_destruct(struct sock *sk)
struct inet_sock *inet = inet_sk(sk);
__skb_queue_purge(&sk->sk_receive_queue);
+ if (sk->sk_rx_skb_cache) {
+ __kfree_skb(sk->sk_rx_skb_cache);
+ sk->sk_rx_skb_cache = NULL;
+ }
__skb_queue_purge(&sk->sk_error_queue);
sk_mem_reclaim(sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f0b5a599914514fee2ee14c7083796dfcd3614cd..29b94edf05f9357d3a33744d677827ce624738ae 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2583,6 +2583,10 @@ int tcp_disconnect(struct sock *sk, int flags)
tcp_clear_xmit_timers(sk);
__skb_queue_purge(&sk->sk_receive_queue);
+ if (sk->sk_rx_skb_cache) {
+ __kfree_skb(sk->sk_rx_skb_cache);
+ sk->sk_rx_skb_cache = NULL;
+ }
tp->copied_seq = tp->rcv_nxt;
tp->urg_data = 0;
tcp_write_queue_purge(sk);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 277d71239d755d858be70663320d8de2ab23dfcc..3979939804b70b805655d94c598a6cb397e35947 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1774,6 +1774,7 @@ static void tcp_v4_fill_cb(struct sk_buff *skb, const struct iphdr *iph,
int tcp_v4_rcv(struct sk_buff *skb)
{
struct net *net = dev_net(skb->dev);
+ struct sk_buff *skb_to_free;
int sdif = inet_sdif(skb);
const struct iphdr *iph;
const struct tcphdr *th;
@@ -1905,11 +1906,17 @@ int tcp_v4_rcv(struct sk_buff *skb)
tcp_segs_in(tcp_sk(sk), skb);
ret = 0;
if (!sock_owned_by_user(sk)) {
+ skb_to_free = sk->sk_rx_skb_cache;
+ sk->sk_rx_skb_cache = NULL;
ret = tcp_v4_do_rcv(sk, skb);
- } else if (tcp_add_backlog(sk, skb)) {
- goto discard_and_relse;
+ } else {
+ if (tcp_add_backlog(sk, skb))
+ goto discard_and_relse;
+ skb_to_free = NULL;
}
bh_unlock_sock(sk);
+ if (skb_to_free)
+ __kfree_skb(skb_to_free);
put_and_return:
if (refcounted)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 983ad7a751027cb8fbaee095b90225d71fbaa698..77d723bbe05085881d3d5d4ca0cb4dbcede8d11d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1436,6 +1436,7 @@ static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr,
static int tcp_v6_rcv(struct sk_buff *skb)
{
+ struct sk_buff *skb_to_free;
int sdif = inet6_sdif(skb);
const struct tcphdr *th;
const struct ipv6hdr *hdr;
@@ -1562,12 +1563,17 @@ static int tcp_v6_rcv(struct sk_buff *skb)
tcp_segs_in(tcp_sk(sk), skb);
ret = 0;
if (!sock_owned_by_user(sk)) {
+ skb_to_free = sk->sk_rx_skb_cache;
+ sk->sk_rx_skb_cache = NULL;
ret = tcp_v6_do_rcv(sk, skb);
- } else if (tcp_add_backlog(sk, skb)) {
- goto discard_and_relse;
+ } else {
+ if (tcp_add_backlog(sk, skb))
+ goto discard_and_relse;
+ skb_to_free = NULL;
}
bh_unlock_sock(sk);
-
+ if (skb_to_free)
+ __kfree_skb(skb_to_free);
put_and_return:
if (refcounted)
sock_put(sk);
--
2.21.0.225.g810b269d1ac-goog
* Re: [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
2019-03-22 0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
` (2 preceding siblings ...)
2019-03-22 0:14 ` [PATCH v2 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
@ 2019-03-22 1:54 ` Willem de Bruijn
2019-03-22 7:04 ` Soheil Hassas Yeganeh
2019-03-22 11:28 ` Michael S. Tsirkin
4 siblings, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2019-03-22 1:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, netdev, Soheil Hassas Yeganeh,
Willem de Bruijn, Florian Westphal, Tom Herbert, Eric Dumazet
On Thu, Mar 21, 2019 at 8:16 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On hosts with many cpus we can observe a very serious contention
> on spinlocks used in mm slab layer.
>
> The following can happen quite often :
>
> 1) TX path
> sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
> ACK is received on CPU B, and consumes the skb that was in the retransmit
> queue.
>
> 2) RX path
> network driver allocates skb on CPU C
> recvmsg() happens on CPU D, freeing the skb after it has been delivered
> to user space.
>
> In both cases, we are hitting the asymmetric alloc/free pattern
> for which slab has to drain alien caches. At 8 Mpps, this
> represents 16 M alloc/free operations per second and has a huge penalty.
>
> In an interesting experiment, I tried to use a single kmem_cache for all the skbs
> (in skb_init() : skbuff_fclone_cache = skbuff_head_cache =
> kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones),);
> and most of the contention disappeared, since cpus could better use
> their local slab per-cpu cache.
>
> But we can do actually better, in the following patches.
>
> TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
> so that next sendmsg() can reuse it immediately.
>
> RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
> so that it can be freed by the cpu feeding the incoming packets in BH.
>
> This increased the performance of small RPC benchmark by about 10 % on a host
> with 112 hyperthreads.
>
> v2 : - Solved a race condition : sk_stream_alloc_skb() to make sure the prior
> clone has been freed.
> - Really test rps_needed in sk_eat_skb() as claimed.
> - Fixed rps_needed use in drivers/net/tun.c
>
> Eric Dumazet (3):
> net: convert rps_needed and rfs_needed to new static branch api
> tcp: add one skb cache for tx
> tcp: add one skb cache for rx
Acked-by: Willem de Bruijn <willemb@google.com>
Thanks Eric!
* Re: [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
2019-03-22 1:54 ` [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Willem de Bruijn
@ 2019-03-22 7:04 ` Soheil Hassas Yeganeh
0 siblings, 0 replies; 11+ messages in thread
From: Soheil Hassas Yeganeh @ 2019-03-22 7:04 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Eric Dumazet, David S . Miller, netdev, Willem de Bruijn,
Florian Westphal, Tom Herbert, Eric Dumazet
On Thu, Mar 21, 2019 at 9:55 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Mar 21, 2019 at 8:16 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On hosts with many cpus we can observe a very serious contention
> > on spinlocks used in mm slab layer.
> >
> > The following can happen quite often :
> >
> > 1) TX path
> > sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
> > ACK is received on CPU B, and consumes the skb that was in the retransmit
> > queue.
> >
> > 2) RX path
> > network driver allocates skb on CPU C
> > recvmsg() happens on CPU D, freeing the skb after it has been delivered
> > to user space.
> >
> > In both cases, we are hitting the asymmetric alloc/free pattern
> > for which slab has to drain alien caches. At 8 Mpps, this
> > represents 16 M alloc/free operations per second and has a huge penalty.
> >
> > In an interesting experiment, I tried to use a single kmem_cache for all the skbs
> > (in skb_init() : skbuff_fclone_cache = skbuff_head_cache =
> > kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones),);
> > and most of the contention disappeared, since cpus could better use
> > their local slab per-cpu cache.
> >
> > But we can do actually better, in the following patches.
> >
> > TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
> > so that next sendmsg() can reuse it immediately.
> >
> > RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
> > so that it can be freed by the cpu feeding the incoming packets in BH.
> >
> > This increased the performance of small RPC benchmark by about 10 % on a host
> > with 112 hyperthreads.
> >
> > v2 : - Solved a race condition : sk_stream_alloc_skb() to make sure the prior
> > clone has been freed.
> > - Really test rps_needed in sk_eat_skb() as claimed.
> > - Fixed rps_needed use in drivers/net/tun.c
> >
> > Eric Dumazet (3):
> > net: convert rps_needed and rfs_needed to new static branch api
> > tcp: add one skb cache for tx
> > tcp: add one skb cache for rx
>
> Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Thanks again!
> Thanks Eric!
* Re: [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
2019-03-22 0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
` (3 preceding siblings ...)
2019-03-22 1:54 ` [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Willem de Bruijn
@ 2019-03-22 11:28 ` Michael S. Tsirkin
2019-03-22 12:49 ` Eric Dumazet
4 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2019-03-22 11:28 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, netdev, Soheil Hassas Yeganeh,
Willem de Bruijn, Florian Westphal, Tom Herbert, Eric Dumazet
On Thu, Mar 21, 2019 at 05:14:41PM -0700, Eric Dumazet wrote:
> On hosts with many cpus we can observe a very serious contention
> on spinlocks used in mm slab layer.
>
> The following can happen quite often :
>
> 1) TX path
> sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
> ACK is received on CPU B, and consumes the skb that was in the retransmit
> queue.
>
> 2) RX path
> network driver allocates skb on CPU C
> recvmsg() happens on CPU D, freeing the skb after it has been delivered
> to user space.
>
> In both cases, we are hitting the asymmetric alloc/free pattern
> for which slab has to drain alien caches. At 8 Mpps, this
> represents 16 M alloc/free operations per second and has a huge penalty.
>
> In an interesting experiment, I tried to use a single kmem_cache for all the skbs
> (in skb_init() : skbuff_fclone_cache = skbuff_head_cache =
> kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones),);
> and most of the contention disappeared, since cpus could better use
> their local slab per-cpu cache.
>
> But we can do actually better, in the following patches.
>
> TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
> so that next sendmsg() can reuse it immediately.
>
> RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
> so that it can be freed by the cpu feeding the incoming packets in BH.
>
> This increased the performance of small RPC benchmark by about 10 % on a host
> with 112 hyperthreads.
>
> v2 : - Solved a race condition : sk_stream_alloc_skb() to make sure the prior
> clone has been freed.
> - Really test rps_needed in sk_eat_skb() as claimed.
> - Fixed rps_needed use in drivers/net/tun.c
Just a thought: would it make sense to flush the cache
in enter_memory_pressure?
> Eric Dumazet (3):
> net: convert rps_needed and rfs_needed to new static branch api
> tcp: add one skb cache for tx
> tcp: add one skb cache for rx
>
> drivers/net/tun.c | 2 +-
> include/linux/netdevice.h | 4 +--
> include/net/sock.h | 13 ++++++++-
> net/core/dev.c | 10 +++----
> net/core/net-sysfs.c | 4 +--
> net/core/sysctl_net_core.c | 8 +++---
> net/ipv4/af_inet.c | 4 +++
> net/ipv4/tcp.c | 54 +++++++++++++++++++-------------------
> net/ipv4/tcp_ipv4.c | 11 ++++++--
> net/ipv6/tcp_ipv6.c | 12 ++++++---
> 10 files changed, 75 insertions(+), 47 deletions(-)
>
> --
> 2.21.0.225.g810b269d1ac-goog
* Re: [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
2019-03-22 11:28 ` Michael S. Tsirkin
@ 2019-03-22 12:49 ` Eric Dumazet
0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22 12:49 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: David S . Miller, netdev, Soheil Hassas Yeganeh,
Willem de Bruijn, Florian Westphal, Tom Herbert, Eric Dumazet
On Fri, Mar 22, 2019 at 4:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
>
> Just a thought: would it make sense to flush the cache
> in enter_memory_pressure?
>
Good question, thanks !
Willem asked me something similar yesterday.
The argument of keeping one skb for tx and one for rx makes some sense
to me, since it will more or less facilitate forward progress.
References :
commit 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
commit eb9344781a2f8 ("tcp: add a force_schedule argument to
sk_stream_alloc_skb()")
The global tcp_memory_pressure status should play its role for
elephant flows (not the RPC workload targeted by this patch series),
so that they gradually reduce their memory usage to the minimum of one
packet per TCP flow.
* Re: [PATCH v2 net-next 3/3] tcp: add one skb cache for rx
2019-03-22 0:14 ` [PATCH v2 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
@ 2019-03-22 14:57 ` kbuild test robot
2019-03-22 15:00 ` kbuild test robot
1 sibling, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2019-03-22 14:57 UTC (permalink / raw)
To: Eric Dumazet
Cc: kbuild-all, David S . Miller, netdev, Eric Dumazet,
Soheil Hassas Yeganeh, Willem de Bruijn, Florian Westphal,
Tom Herbert, Eric Dumazet
[-- Attachment #1: Type: text/plain, Size: 3591 bytes --]
Hi Eric,
I love your patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-add-rx-tx-cache-to-reduce-lock-contention/20190322-215506
config: i386-randconfig-x005-201911 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All warnings (new ones prefixed by >>):
In file included from include/linux/kernel.h:11:0,
from include/linux/delay.h:22,
from drivers//w1/w1.c:15:
include/net/sock.h: In function 'sk_eat_skb':
include/net/sock.h:2442:31: error: 'rps_needed' undeclared (first use in this function); did you mean 'free_netdev'?
if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
^
include/linux/compiler.h:33:34: note: in definition of macro '__branch_check__'
______r = __builtin_expect(!!(x), expect); \
^
include/linux/jump_label.h:478:35: note: in expansion of macro 'unlikely'
#define static_branch_unlikely(x) unlikely(static_key_enabled(&(x)->key))
^~~~~~~~
include/linux/jump_label.h:478:44: note: in expansion of macro 'static_key_enabled'
#define static_branch_unlikely(x) unlikely(static_key_enabled(&(x)->key))
^~~~~~~~~~~~~~~~~~
>> include/net/sock.h:2442:7: note: in expansion of macro 'static_branch_unlikely'
if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
^~~~~~~~~~~~~~~~~~~~~~
include/net/sock.h:2442:31: note: each undeclared identifier is reported only once for each function it appears in
if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
^
include/linux/compiler.h:33:34: note: in definition of macro '__branch_check__'
______r = __builtin_expect(!!(x), expect); \
^
include/linux/jump_label.h:478:35: note: in expansion of macro 'unlikely'
#define static_branch_unlikely(x) unlikely(static_key_enabled(&(x)->key))
^~~~~~~~
include/linux/jump_label.h:478:44: note: in expansion of macro 'static_key_enabled'
#define static_branch_unlikely(x) unlikely(static_key_enabled(&(x)->key))
^~~~~~~~~~~~~~~~~~
>> include/net/sock.h:2442:7: note: in expansion of macro 'static_branch_unlikely'
if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
^~~~~~~~~~~~~~~~~~~~~~
vim +/static_branch_unlikely +2442 include/net/sock.h
2430
2431 /**
2432 * sk_eat_skb - Release a skb if it is no longer needed
2433 * @sk: socket to eat this skb from
2434 * @skb: socket buffer to eat
2435 *
2436 * This routine must be called with interrupts disabled or with the socket
2437 * locked so that the sk_buff queue operation is ok.
2438 */
2439 static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
2440 {
2441 __skb_unlink(skb, &sk->sk_receive_queue);
> 2442 if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
2443 sk->sk_rx_skb_cache = skb;
2444 skb_orphan(skb);
2445 return;
2446 }
2447 __kfree_skb(skb);
2448 }
2449
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 net-next 3/3] tcp: add one skb cache for rx
2019-03-22 0:14 ` [PATCH v2 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
2019-03-22 14:57 ` kbuild test robot
@ 2019-03-22 15:00 ` kbuild test robot
2019-03-22 15:20 ` Eric Dumazet
1 sibling, 1 reply; 11+ messages in thread
From: kbuild test robot @ 2019-03-22 15:00 UTC (permalink / raw)
To: Eric Dumazet
Cc: kbuild-all, David S . Miller, netdev, Eric Dumazet,
Soheil Hassas Yeganeh, Willem de Bruijn, Florian Westphal,
Tom Herbert, Eric Dumazet
Hi Eric,
I love your patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-add-rx-tx-cache-to-reduce-lock-contention/20190322-215506
config: x86_64-randconfig-x016-201911 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
In file included from include/linux/dynamic_debug.h:6:0,
from include/linux/printk.h:330,
from include/linux/kernel.h:15,
from include/linux/list.h:9,
from include/linux/module.h:9,
from drivers//net/ethernet/intel/e1000/e1000.h:10,
from drivers//net/ethernet/intel/e1000/e1000_main.c:4:
include/net/sock.h: In function 'sk_eat_skb':
>> include/net/sock.h:2442:31: error: 'rps_needed' undeclared (first use in this function); did you mean 'free_netdev'?
if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
^
include/linux/jump_label.h:466:43: note: in definition of macro 'static_branch_unlikely'
if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
^
include/net/sock.h:2442:31: note: each undeclared identifier is reported only once for each function it appears in
if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
^
include/linux/jump_label.h:466:43: note: in definition of macro 'static_branch_unlikely'
if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
^
vim +2442 include/net/sock.h
2430
2431 /**
2432 * sk_eat_skb - Release a skb if it is no longer needed
2433 * @sk: socket to eat this skb from
2434 * @skb: socket buffer to eat
2435 *
2436 * This routine must be called with interrupts disabled or with the socket
2437 * locked so that the sk_buff queue operation is ok.
2438 */
2439 static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
2440 {
2441 __skb_unlink(skb, &sk->sk_receive_queue);
> 2442 if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
2443 sk->sk_rx_skb_cache = skb;
2444 skb_orphan(skb);
2445 return;
2446 }
2447 __kfree_skb(skb);
2448 }
2449
---
* Re: [PATCH v2 net-next 3/3] tcp: add one skb cache for rx
2019-03-22 15:00 ` kbuild test robot
@ 2019-03-22 15:20 ` Eric Dumazet
0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22 15:20 UTC (permalink / raw)
To: kbuild test robot, Eric Dumazet
Cc: kbuild-all, David S . Miller, netdev, Soheil Hassas Yeganeh,
Willem de Bruijn, Florian Westphal, Tom Herbert, Eric Dumazet
On 03/22/2019 08:00 AM, kbuild test robot wrote:
> Hi Eric,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on net-next/master]
>
> url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-add-rx-tx-cache-to-reduce-lock-contention/20190322-215506
> config: x86_64-randconfig-x016-201911 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All errors (new ones prefixed by >>):
>
> In file included from include/linux/dynamic_debug.h:6:0,
> from include/linux/printk.h:330,
> from include/linux/kernel.h:15,
> from include/linux/list.h:9,
> from include/linux/module.h:9,
> from drivers//net/ethernet/intel/e1000/e1000.h:10,
> from drivers//net/ethernet/intel/e1000/e1000_main.c:4:
> include/net/sock.h: In function 'sk_eat_skb':
>>> include/net/sock.h:2442:31: error: 'rps_needed' undeclared (first use in this function); did you mean 'free_netdev'?
> if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
> ^
> include/linux/jump_label.h:466:43: note: in definition of macro 'static_branch_unlikely'
> if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
> ^
> include/net/sock.h:2442:31: note: each undeclared identifier is reported only once for each function it appears in
> if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
> ^
> include/linux/jump_label.h:466:43: note: in definition of macro 'static_branch_unlikely'
> if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
> ^
>
> vim +2442 include/net/sock.h
>
> 2430
> 2431 /**
> 2432 * sk_eat_skb - Release a skb if it is no longer needed
> 2433 * @sk: socket to eat this skb from
> 2434 * @skb: socket buffer to eat
> 2435 *
> 2436 * This routine must be called with interrupts disabled or with the socket
> 2437 * locked so that the sk_buff queue operation is ok.
> 2438 */
> 2439 static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
> 2440 {
> 2441 __skb_unlink(skb, &sk->sk_receive_queue);
I guess all this makes sense on SMP systems only :)
>> 2442 if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
> 2443 sk->sk_rx_skb_cache = skb;
> 2444 skb_orphan(skb);
> 2445 return;
> 2446 }
> 2447 __kfree_skb(skb);
> 2448 }
> 2449
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
>
end of thread, other threads:[~2019-03-22 15:20 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-22 0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
2019-03-22 0:14 ` [PATCH v2 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
2019-03-22 0:14 ` [PATCH v2 net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
2019-03-22 0:14 ` [PATCH v2 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
2019-03-22 14:57 ` kbuild test robot
2019-03-22 15:00 ` kbuild test robot
2019-03-22 15:20 ` Eric Dumazet
2019-03-22 1:54 ` [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Willem de Bruijn
2019-03-22 7:04 ` Soheil Hassas Yeganeh
2019-03-22 11:28 ` Michael S. Tsirkin
2019-03-22 12:49 ` Eric Dumazet