* [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
@ 2019-03-22  0:14 Eric Dumazet
  2019-03-22  0:14 ` [PATCH v2 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22  0:14 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
	Florian Westphal, Tom Herbert, Eric Dumazet

On hosts with many cpus, we can observe very serious contention
on the spinlocks used in the mm slab layer.

The following can happen quite often:

1) TX path
  sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
  ACK is received on CPU B, and consumes the skb that was in the retransmit
  queue.

2) RX path
  network driver allocates skb on CPU C
  recvmsg() happens on CPU D, freeing the skb after it has been delivered
  to user space.

In both cases, we are hitting the asymmetric alloc/free pattern
for which slab has to drain alien caches. At 8 Mpps, this represents
16 million alloc/free operations per second and carries a huge penalty.

In an interesting experiment, I tried to use a single kmem_cache for all the skbs
(in skb_init(): skbuff_fclone_cache = skbuff_head_cache =
                  kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones), ...);)
and most of the contention disappeared, since cpus could better use
their local slab per-cpu cache.
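
A rough sketch of that experiment, for reference (not part of this series;
the align/flags arguments below are assumptions mirroring the existing
skb_init()):

    /* Experimental hack: back all skbs by a single kmem_cache, sized for
     * the larger fclone object, so that an skb freed on a cpu can be
     * reused by the next skb allocation on that cpu, whatever its type.
     */
    static void __init skb_init(void)
    {
        skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
                                                sizeof(struct sk_buff_fclones),
                                                0,
                                                SLAB_HWCACHE_ALIGN | SLAB_PANIC,
                                                NULL);
        skbuff_head_cache = skbuff_fclone_cache;
    }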

But we can actually do better, as the following patches show.

TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
     so that next sendmsg() can reuse it immediately.

RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
   so that it can be freed by the cpu feeding the incoming packets in BH.
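
In rough terms, the two hooks look like this (a simplified sketch of the
patches that follow; memory accounting, the fclone refcount check and the
rps_needed test are omitted):

    /* TX: park the acked skb on the socket instead of freeing it,
     * so that the next sendmsg() can reuse it (patch 2/3).
     */
    static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
    {
        if (!sk->sk_tx_skb_cache) {
            sk->sk_tx_skb_cache = skb;
            return;
        }
        /* ... existing uncharge + __kfree_skb() path ... */
    }

    /* RX: recvmsg() parks the consumed skb; the cpu feeding packets in BH
     * frees or reuses it later (patch 3/3).
     */
    static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
    {
        __skb_unlink(skb, &sk->sk_receive_queue);
        if (!sk->sk_rx_skb_cache) {
            sk->sk_rx_skb_cache = skb;
            skb_orphan(skb);
            return;
        }
        __kfree_skb(skb);
    }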

This increased the performance of a small-RPC benchmark by about 10% on a host
with 112 hyperthreads.

v2 : - Solved a race condition : sk_stream_alloc_skb() to make sure the prior
       clone has been freed.
     - Really test rps_needed in sk_eat_skb() as claimed.
     - Fixed rps_needed use in drivers/net/tun.c

Eric Dumazet (3):
  net: convert rps_needed and rfs_needed to new static branch api
  tcp: add one skb cache for tx
  tcp: add one skb cache for rx

 drivers/net/tun.c          |  2 +-
 include/linux/netdevice.h  |  4 +--
 include/net/sock.h         | 13 ++++++++-
 net/core/dev.c             | 10 +++----
 net/core/net-sysfs.c       |  4 +--
 net/core/sysctl_net_core.c |  8 +++---
 net/ipv4/af_inet.c         |  4 +++
 net/ipv4/tcp.c             | 54 +++++++++++++++++++-------------------
 net/ipv4/tcp_ipv4.c        | 11 ++++++--
 net/ipv6/tcp_ipv6.c        | 12 ++++++---
 10 files changed, 75 insertions(+), 47 deletions(-)

-- 
2.21.0.225.g810b269d1ac-goog



* [PATCH v2 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api
  2019-03-22  0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
@ 2019-03-22  0:14 ` Eric Dumazet
  2019-03-22  0:14 ` [PATCH v2 net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22  0:14 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
	Florian Westphal, Tom Herbert, Eric Dumazet

We prefer static_branch_unlikely() over static_key_false() these days.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/tun.c          |  2 +-
 include/linux/netdevice.h  |  4 ++--
 include/net/sock.h         |  2 +-
 net/core/dev.c             | 10 +++++-----
 net/core/net-sysfs.c       |  4 ++--
 net/core/sysctl_net_core.c |  8 ++++----
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 27798aacb671e3e8d754ea60dac528e8efdb52da..24d0220b9ba00724ebad94fbc58858a4abffb207 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1042,7 +1042,7 @@ static int tun_net_close(struct net_device *dev)
 static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
 {
 #ifdef CONFIG_RPS
-	if (tun->numqueues == 1 && static_key_false(&rps_needed)) {
+	if (tun->numqueues == 1 && static_branch_unlikely(&rps_needed)) {
 		/* Select queue was not called for the skbuff, so we extract the
 		 * RPS hash and save it into the flow_table here.
 		 */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 823762291ebf59d2a8a0502f71d6591b5cd7839f..166fdc0a78b49c9df984b767169c3babce24462e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -194,8 +194,8 @@ struct net_device_stats {
 
 #ifdef CONFIG_RPS
 #include <linux/static_key.h>
-extern struct static_key rps_needed;
-extern struct static_key rfs_needed;
+extern struct static_key_false rps_needed;
+extern struct static_key_false rfs_needed;
 #endif
 
 struct neighbour;
diff --git a/include/net/sock.h b/include/net/sock.h
index 8de5ee258b93a50b2fdcde796bae3a5b53ce4d6a..fecdf639225c2d4995ee2e2cd9be57f3d4f22777 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -966,7 +966,7 @@ static inline void sock_rps_record_flow_hash(__u32 hash)
 static inline void sock_rps_record_flow(const struct sock *sk)
 {
 #ifdef CONFIG_RPS
-	if (static_key_false(&rfs_needed)) {
+	if (static_branch_unlikely(&rfs_needed)) {
 		/* Reading sk->sk_rxhash might incur an expensive cache line
 		 * miss.
 		 *
diff --git a/net/core/dev.c b/net/core/dev.c
index 357111431ec9a6a5873830b89dd137d5eba6f2f0..c71b0998fa3ac8ae9d28aa1131852032a5cd0008 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3973,9 +3973,9 @@ EXPORT_SYMBOL(rps_sock_flow_table);
 u32 rps_cpu_mask __read_mostly;
 EXPORT_SYMBOL(rps_cpu_mask);
 
-struct static_key rps_needed __read_mostly;
+struct static_key_false rps_needed __read_mostly;
 EXPORT_SYMBOL(rps_needed);
-struct static_key rfs_needed __read_mostly;
+struct static_key_false rfs_needed __read_mostly;
 EXPORT_SYMBOL(rfs_needed);
 
 static struct rps_dev_flow *
@@ -4501,7 +4501,7 @@ static int netif_rx_internal(struct sk_buff *skb)
 	}
 
 #ifdef CONFIG_RPS
-	if (static_key_false(&rps_needed)) {
+	if (static_branch_unlikely(&rps_needed)) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu;
 
@@ -5170,7 +5170,7 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
 
 	rcu_read_lock();
 #ifdef CONFIG_RPS
-	if (static_key_false(&rps_needed)) {
+	if (static_branch_unlikely(&rps_needed)) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
 
@@ -5218,7 +5218,7 @@ static void netif_receive_skb_list_internal(struct list_head *head)
 
 	rcu_read_lock();
 #ifdef CONFIG_RPS
-	if (static_key_false(&rps_needed)) {
+	if (static_branch_unlikely(&rps_needed)) {
 		list_for_each_entry_safe(skb, next, head, list) {
 			struct rps_dev_flow voidflow, *rflow = &voidflow;
 			int cpu = get_rps_cpu(skb->dev, skb, &rflow);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 4ff661f6f989ae10ca49a1e81c825be56683d026..851cabb90bce66f30a5868d6b7499f240202d1eb 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -754,9 +754,9 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	rcu_assign_pointer(queue->rps_map, map);
 
 	if (map)
-		static_key_slow_inc(&rps_needed);
+		static_branch_inc(&rps_needed);
 	if (old_map)
-		static_key_slow_dec(&rps_needed);
+		static_branch_dec(&rps_needed);
 
 	mutex_unlock(&rps_map_mutex);
 
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 84bf2861f45f76f162d661298991f13ac0e8b592..1a2685694abd537d7ae304754b84b237928fd298 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -95,12 +95,12 @@ static int rps_sock_flow_sysctl(struct ctl_table *table, int write,
 		if (sock_table != orig_sock_table) {
 			rcu_assign_pointer(rps_sock_flow_table, sock_table);
 			if (sock_table) {
-				static_key_slow_inc(&rps_needed);
-				static_key_slow_inc(&rfs_needed);
+				static_branch_inc(&rps_needed);
+				static_branch_inc(&rfs_needed);
 			}
 			if (orig_sock_table) {
-				static_key_slow_dec(&rps_needed);
-				static_key_slow_dec(&rfs_needed);
+				static_branch_dec(&rps_needed);
+				static_branch_dec(&rfs_needed);
 				synchronize_rcu();
 				vfree(orig_sock_table);
 			}
-- 
2.21.0.225.g810b269d1ac-goog



* [PATCH v2 net-next 2/3] tcp: add one skb cache for tx
  2019-03-22  0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
  2019-03-22  0:14 ` [PATCH v2 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
@ 2019-03-22  0:14 ` Eric Dumazet
  2019-03-22  0:14 ` [PATCH v2 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22  0:14 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
	Florian Westphal, Tom Herbert, Eric Dumazet

On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks.

    20.69%  [kernel]       [k] queued_spin_lock_slowpath
     5.64%  [kernel]       [k] _raw_spin_lock
     3.83%  [kernel]       [k] syscall_return_via_sysret
     3.48%  [kernel]       [k] __entry_text_start
     1.76%  [kernel]       [k] __netif_receive_skb_core
     1.64%  [kernel]       [k] __fget

For each sendmsg(), we allocate one skb and free it when the ACK packet arrives.

In many cases, ACK packets are handled by other cpus, and this unfortunately
incurs heavy costs in the slab layer.

This patch uses an extra pointer in the socket structure, so that we can
try to reuse the same skb and avoid these expensive costs.

We cache at most one skb per socket, so this should be safe as far as
memory pressure is concerned.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h |  5 +++++
 net/ipv4/tcp.c     | 50 +++++++++++++++++++++-------------------------
 2 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index fecdf639225c2d4995ee2e2cd9be57f3d4f22777..314c47a8f5d19918393aa854a95e6e0f7ec6b604 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -414,6 +414,7 @@ struct sock {
 		struct sk_buff	*sk_send_head;
 		struct rb_root	tcp_rtx_queue;
 	};
+	struct sk_buff		*sk_tx_skb_cache;
 	struct sk_buff_head	sk_write_queue;
 	__s32			sk_peek_off;
 	int			sk_write_pending;
@@ -1463,6 +1464,10 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
 
 static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 {
+	if (!sk->sk_tx_skb_cache) {
+		sk->sk_tx_skb_cache = skb;
+		return;
+	}
 	sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
 	sk->sk_wmem_queued -= skb->truesize;
 	sk_mem_uncharge(sk, skb->truesize);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6baa6dc1b13b0b94b1da238668b93e167cf444fe..f0b5a599914514fee2ee14c7083796dfcd3614cd 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -865,6 +865,21 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp,
 {
 	struct sk_buff *skb;
 
+	skb = sk->sk_tx_skb_cache;
+	if (skb && !size) {
+		const struct sk_buff_fclones *fclones;
+
+		fclones = container_of(skb, struct sk_buff_fclones, skb1);
+		if (refcount_read(&fclones->fclone_ref) == 1) {
+			sk->sk_wmem_queued -= skb->truesize;
+			sk_mem_uncharge(sk, skb->truesize);
+			skb->truesize -= skb->data_len;
+			sk->sk_tx_skb_cache = NULL;
+			pskb_trim(skb, 0);
+			INIT_LIST_HEAD(&skb->tcp_tsorted_anchor);
+			return skb;
+		}
+	}
 	/* The TCP header must be at least 32-bit aligned.  */
 	size = ALIGN(size, 4);
 
@@ -1098,30 +1113,6 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 }
 EXPORT_SYMBOL(tcp_sendpage);
 
-/* Do not bother using a page frag for very small frames.
- * But use this heuristic only for the first skb in write queue.
- *
- * Having no payload in skb->head allows better SACK shifting
- * in tcp_shift_skb_data(), reducing sack/rack overhead, because
- * write queue has less skbs.
- * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
- * This also speeds up tso_fragment(), since it wont fallback
- * to tcp_fragment().
- */
-static int linear_payload_sz(bool first_skb)
-{
-	if (first_skb)
-		return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
-	return 0;
-}
-
-static int select_size(bool first_skb, bool zc)
-{
-	if (zc)
-		return 0;
-	return linear_payload_sz(first_skb);
-}
-
 void tcp_free_fastopen_req(struct tcp_sock *tp)
 {
 	if (tp->fastopen_req) {
@@ -1272,7 +1263,6 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 		if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
 			bool first_skb;
-			int linear;
 
 new_segment:
 			if (!sk_stream_memory_free(sk))
@@ -1283,8 +1273,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 				goto restart;
 			}
 			first_skb = tcp_rtx_and_write_queues_empty(sk);
-			linear = select_size(first_skb, zc);
-			skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
+			skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation,
 						  first_skb);
 			if (!skb)
 				goto wait_for_memory;
@@ -2552,6 +2541,13 @@ void tcp_write_queue_purge(struct sock *sk)
 		sk_wmem_free_skb(sk, skb);
 	}
 	tcp_rtx_queue_purge(sk);
+	skb = sk->sk_tx_skb_cache;
+	if (skb) {
+		sk->sk_wmem_queued -= skb->truesize;
+		sk_mem_uncharge(sk, skb->truesize);
+		__kfree_skb(skb);
+		sk->sk_tx_skb_cache = NULL;
+	}
 	INIT_LIST_HEAD(&tcp_sk(sk)->tsorted_sent_queue);
 	sk_mem_reclaim(sk);
 	tcp_clear_all_retrans_hints(tcp_sk(sk));
-- 
2.21.0.225.g810b269d1ac-goog



* [PATCH v2 net-next 3/3] tcp: add one skb cache for rx
  2019-03-22  0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
  2019-03-22  0:14 ` [PATCH v2 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
  2019-03-22  0:14 ` [PATCH v2 net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
@ 2019-03-22  0:14 ` Eric Dumazet
  2019-03-22 14:57   ` kbuild test robot
  2019-03-22 15:00   ` kbuild test robot
  2019-03-22  1:54 ` [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Willem de Bruijn
  2019-03-22 11:28 ` Michael S. Tsirkin
  4 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22  0:14 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
	Florian Westphal, Tom Herbert, Eric Dumazet

Oftentimes, recvmsg() system calls and BH handling for a particular
TCP socket are done on different cpus.

This means the incoming skb has to be allocated on one cpu,
but freed on another.

This incurs high spinlock contention in the slab layer for small RPCs,
and also a high number of cache line ping-pongs for larger packets.

A full-size GRO packet might use 45 page fragments, meaning
that up to 45 put_page() calls can be involved.

Moreover, performing the __kfree_skb() in the recvmsg() context
adds latency for user applications, and increases the probability
of trapping them in backlog processing, since the BH handler
might find the socket owned by the user.

This patch, combined with the prior one, increases RPC
performance by about 10% on servers with a large number of cores.

(A tcp_rr workload with 10,000 flows and 112 threads reaches 9 Mpps
 instead of 8 Mpps.)

This also increases single bulk flow performance on 40Gbit+ links,
since in this case there are often two cpus working in tandem:

 - CPU handling the NIC rx interrupts, feeding the receive queue,
  and (after this patch) freeing the skbs that were consumed.

 - CPU in the recvmsg() system call, essentially 100% busy copying out
  data to user space.

Having at most one skb in a per-socket cache has very little risk
of memory exhaustion, and since it is protected by the socket lock,
its management is essentially free.

Note that if RPS/RFS is used, we do not enable this feature, because
there is a high chance that the same cpu is handling both the recvmsg()
system call and the TCP rx path, but that another cpu did the skb
allocation in the device driver right before the RPS/RFS logic.

To properly handle this case, it seems we would need to record
on which cpu the skb was allocated, and use a different channel
to give skbs back to that cpu, as sketched below.
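
One purely illustrative shape such a channel could take (hypothetical names;
nothing like this is implemented in this series): record the allocating cpu
and queue freed skbs to a per-cpu list that this cpu drains from its own
BH context.

    /* Hypothetical sketch only: per-cpu return queues for skbs.
     * Each queue would need skb_queue_head_init() at boot, plus a drain
     * hook run by the owning cpu (e.g. from its NAPI context).
     */
    static DEFINE_PER_CPU(struct sk_buff_head, skb_return_queue);

    static void skb_return_to_alloc_cpu(struct sk_buff *skb, int alloc_cpu)
    {
        /* skb_queue_tail() takes the queue lock with irqs disabled,
         * so this is usable from process or BH context.
         */
        skb_queue_tail(&per_cpu(skb_return_queue, alloc_cpu), skb);
    }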

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h  |  6 ++++++
 net/ipv4/af_inet.c  |  4 ++++
 net/ipv4/tcp.c      |  4 ++++
 net/ipv4/tcp_ipv4.c | 11 +++++++++--
 net/ipv6/tcp_ipv6.c | 12 +++++++++---
 5 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 314c47a8f5d19918393aa854a95e6e0f7ec6b604..0840f4b27b91eddb205ff42c03f787e5914f755d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -368,6 +368,7 @@ struct sock {
 	atomic_t		sk_drops;
 	int			sk_rcvlowat;
 	struct sk_buff_head	sk_error_queue;
+	struct sk_buff		*sk_rx_skb_cache;
 	struct sk_buff_head	sk_receive_queue;
 	/*
 	 * The backlog queue is special, it is always used with
@@ -2438,6 +2439,11 @@ static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags)
 static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
 {
 	__skb_unlink(skb, &sk->sk_receive_queue);
+	if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
+		sk->sk_rx_skb_cache = skb;
+		skb_orphan(skb);
+		return;
+	}
 	__kfree_skb(skb);
 }
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index eab3ebde981e78a6a0a4852c3b4374c02ede1187..7f3a984ad618580ae28501c3fe3dd3fa915a66a2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -136,6 +136,10 @@ void inet_sock_destruct(struct sock *sk)
 	struct inet_sock *inet = inet_sk(sk);
 
 	__skb_queue_purge(&sk->sk_receive_queue);
+	if (sk->sk_rx_skb_cache) {
+		__kfree_skb(sk->sk_rx_skb_cache);
+		sk->sk_rx_skb_cache = NULL;
+	}
 	__skb_queue_purge(&sk->sk_error_queue);
 
 	sk_mem_reclaim(sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f0b5a599914514fee2ee14c7083796dfcd3614cd..29b94edf05f9357d3a33744d677827ce624738ae 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2583,6 +2583,10 @@ int tcp_disconnect(struct sock *sk, int flags)
 
 	tcp_clear_xmit_timers(sk);
 	__skb_queue_purge(&sk->sk_receive_queue);
+	if (sk->sk_rx_skb_cache) {
+		__kfree_skb(sk->sk_rx_skb_cache);
+		sk->sk_rx_skb_cache = NULL;
+	}
 	tp->copied_seq = tp->rcv_nxt;
 	tp->urg_data = 0;
 	tcp_write_queue_purge(sk);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 277d71239d755d858be70663320d8de2ab23dfcc..3979939804b70b805655d94c598a6cb397e35947 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1774,6 +1774,7 @@ static void tcp_v4_fill_cb(struct sk_buff *skb, const struct iphdr *iph,
 int tcp_v4_rcv(struct sk_buff *skb)
 {
 	struct net *net = dev_net(skb->dev);
+	struct sk_buff *skb_to_free;
 	int sdif = inet_sdif(skb);
 	const struct iphdr *iph;
 	const struct tcphdr *th;
@@ -1905,11 +1906,17 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	tcp_segs_in(tcp_sk(sk), skb);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
+		skb_to_free = sk->sk_rx_skb_cache;
+		sk->sk_rx_skb_cache = NULL;
 		ret = tcp_v4_do_rcv(sk, skb);
-	} else if (tcp_add_backlog(sk, skb)) {
-		goto discard_and_relse;
+	} else {
+		if (tcp_add_backlog(sk, skb))
+			goto discard_and_relse;
+		skb_to_free = NULL;
 	}
 	bh_unlock_sock(sk);
+	if (skb_to_free)
+		__kfree_skb(skb_to_free);
 
 put_and_return:
 	if (refcounted)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 983ad7a751027cb8fbaee095b90225d71fbaa698..77d723bbe05085881d3d5d4ca0cb4dbcede8d11d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1436,6 +1436,7 @@ static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr,
 
 static int tcp_v6_rcv(struct sk_buff *skb)
 {
+	struct sk_buff *skb_to_free;
 	int sdif = inet6_sdif(skb);
 	const struct tcphdr *th;
 	const struct ipv6hdr *hdr;
@@ -1562,12 +1563,17 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 	tcp_segs_in(tcp_sk(sk), skb);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
+		skb_to_free = sk->sk_rx_skb_cache;
+		sk->sk_rx_skb_cache = NULL;
 		ret = tcp_v6_do_rcv(sk, skb);
-	} else if (tcp_add_backlog(sk, skb)) {
-		goto discard_and_relse;
+	} else {
+		if (tcp_add_backlog(sk, skb))
+			goto discard_and_relse;
+		skb_to_free = NULL;
 	}
 	bh_unlock_sock(sk);
-
+	if (skb_to_free)
+		__kfree_skb(skb_to_free);
 put_and_return:
 	if (refcounted)
 		sock_put(sk);
-- 
2.21.0.225.g810b269d1ac-goog



* Re: [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
  2019-03-22  0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
                   ` (2 preceding siblings ...)
  2019-03-22  0:14 ` [PATCH v2 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
@ 2019-03-22  1:54 ` Willem de Bruijn
  2019-03-22  7:04   ` Soheil Hassas Yeganeh
  2019-03-22 11:28 ` Michael S. Tsirkin
  4 siblings, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2019-03-22  1:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Soheil Hassas Yeganeh,
	Willem de Bruijn, Florian Westphal, Tom Herbert, Eric Dumazet

On Thu, Mar 21, 2019 at 8:16 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On hosts with many cpus we can observe a very serious contention
> on spinlocks used in mm slab layer.
>
> The following can happen quite often :
>
> 1) TX path
>   sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
>   ACK is received on CPU B, and consumes the skb that was in the retransmit
>   queue.
>
> 2) RX path
>   network driver allocates skb on CPU C
>   recvmsg() happens on CPU D, freeing the skb after it has been delivered
>   to user space.
>
> In both cases, we are hitting the asymmetric alloc/free pattern
> for which slab has to drain alien caches. At 8 Mpps, this represents
> 16 million alloc/free operations per second and carries a huge penalty.
>
> In an interesting experiment, I tried to use a single kmem_cache for all the skbs
> (in skb_init(): skbuff_fclone_cache = skbuff_head_cache =
>                   kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones), ...);)
> and most of the contention disappeared, since cpus could better use
> their local slab per-cpu cache.
>
> But we can actually do better, as the following patches show.
>
> TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
>      so that next sendmsg() can reuse it immediately.
>
> RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
>    so that it can be freed by the cpu feeding the incoming packets in BH.
>
> This increased the performance of small RPC benchmark by about 10 % on a host
> with 112 hyperthreads.
>
> v2 : - Solved a race condition : sk_stream_alloc_skb() to make sure the prior
>        clone has been freed.
>      - Really test rps_needed in sk_eat_skb() as claimed.
>      - Fixed rps_needed use in drivers/net/tun.c
>
> Eric Dumazet (3):
>   net: convert rps_needed and rfs_needed to new static branch api
>   tcp: add one skb cache for tx
>   tcp: add one skb cache for rx

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks Eric!


* Re: [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
  2019-03-22  1:54 ` [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Willem de Bruijn
@ 2019-03-22  7:04   ` Soheil Hassas Yeganeh
  0 siblings, 0 replies; 11+ messages in thread
From: Soheil Hassas Yeganeh @ 2019-03-22  7:04 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Eric Dumazet, David S . Miller, netdev, Willem de Bruijn,
	Florian Westphal, Tom Herbert, Eric Dumazet

On Thu, Mar 21, 2019 at 9:55 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Mar 21, 2019 at 8:16 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On hosts with many cpus we can observe a very serious contention
> > on spinlocks used in mm slab layer.
> >
> > The following can happen quite often :
> >
> > 1) TX path
> >   sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
> >   ACK is received on CPU B, and consumes the skb that was in the retransmit
> >   queue.
> >
> > 2) RX path
> >   network driver allocates skb on CPU C
> >   recvmsg() happens on CPU D, freeing the skb after it has been delivered
> >   to user space.
> >
> > In both cases, we are hitting the asymmetric alloc/free pattern
> > for which slab has to drain alien caches. At 8 Mpps, this represents
> > 16 million alloc/free operations per second and carries a huge penalty.
> >
> > In an interesting experiment, I tried to use a single kmem_cache for all the skbs
> > (in skb_init(): skbuff_fclone_cache = skbuff_head_cache =
> >                   kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones), ...);)
> > and most of the contention disappeared, since cpus could better use
> > their local slab per-cpu cache.
> >
> > But we can actually do better, as the following patches show.
> >
> > TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
> >      so that next sendmsg() can reuse it immediately.
> >
> > RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
> >    so that it can be freed by the cpu feeding the incoming packets in BH.
> >
> > This increased the performance of small RPC benchmark by about 10 % on a host
> > with 112 hyperthreads.
> >
> > v2 : - Solved a race condition : sk_stream_alloc_skb() to make sure the prior
> >        clone has been freed.
> >      - Really test rps_needed in sk_eat_skb() as claimed.
> >      - Fixed rps_needed use in drivers/net/tun.c
> >
> > Eric Dumazet (3):
> >   net: convert rps_needed and rfs_needed to new static branch api
> >   tcp: add one skb cache for tx
> >   tcp: add one skb cache for rx
>
> Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thanks again!

> Thanks Eric!


* Re: [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
  2019-03-22  0:14 [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
                   ` (3 preceding siblings ...)
  2019-03-22  1:54 ` [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Willem de Bruijn
@ 2019-03-22 11:28 ` Michael S. Tsirkin
  2019-03-22 12:49   ` Eric Dumazet
  4 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2019-03-22 11:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Soheil Hassas Yeganeh,
	Willem de Bruijn, Florian Westphal, Tom Herbert, Eric Dumazet

On Thu, Mar 21, 2019 at 05:14:41PM -0700, Eric Dumazet wrote:
> On hosts with many cpus we can observe a very serious contention
> on spinlocks used in mm slab layer.
> 
> The following can happen quite often :
> 
> 1) TX path
>   sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
>   ACK is received on CPU B, and consumes the skb that was in the retransmit
>   queue.
> 
> 2) RX path
>   network driver allocates skb on CPU C
>   recvmsg() happens on CPU D, freeing the skb after it has been delivered
>   to user space.
> 
> In both cases, we are hitting the asymmetric alloc/free pattern
> for which slab has to drain alien caches. At 8 Mpps, this represents
> 16 million alloc/free operations per second and carries a huge penalty.
> 
> In an interesting experiment, I tried to use a single kmem_cache for all the skbs
> (in skb_init(): skbuff_fclone_cache = skbuff_head_cache =
>                   kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones), ...);)
> and most of the contention disappeared, since cpus could better use
> their local slab per-cpu cache.
> 
> But we can actually do better, as the following patches show.
> 
> TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
>      so that next sendmsg() can reuse it immediately.
> 
> RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
>    so that it can be freed by the cpu feeding the incoming packets in BH.
> 
> This increased the performance of small RPC benchmark by about 10 % on a host
> with 112 hyperthreads.
> 
> v2 : - Solved a race condition : sk_stream_alloc_skb() to make sure the prior
>        clone has been freed.
>      - Really test rps_needed in sk_eat_skb() as claimed.
>      - Fixed rps_needed use in drivers/net/tun.c

Just a thought: would it make sense to flush the cache
in enter_memory_pressure?


> Eric Dumazet (3):
>   net: convert rps_needed and rfs_needed to new static branch api
>   tcp: add one skb cache for tx
>   tcp: add one skb cache for rx
> 
>  drivers/net/tun.c          |  2 +-
>  include/linux/netdevice.h  |  4 +--
>  include/net/sock.h         | 13 ++++++++-
>  net/core/dev.c             | 10 +++----
>  net/core/net-sysfs.c       |  4 +--
>  net/core/sysctl_net_core.c |  8 +++---
>  net/ipv4/af_inet.c         |  4 +++
>  net/ipv4/tcp.c             | 54 +++++++++++++++++++-------------------
>  net/ipv4/tcp_ipv4.c        | 11 ++++++--
>  net/ipv6/tcp_ipv6.c        | 12 ++++++---
>  10 files changed, 75 insertions(+), 47 deletions(-)
> 
> -- 
> 2.21.0.225.g810b269d1ac-goog


* Re: [PATCH v2 net-next 0/3] tcp: add rx/tx cache to reduce lock contention
  2019-03-22 11:28 ` Michael S. Tsirkin
@ 2019-03-22 12:49   ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22 12:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David S . Miller, netdev, Soheil Hassas Yeganeh,
	Willem de Bruijn, Florian Westphal, Tom Herbert, Eric Dumazet

On Fri, Mar 22, 2019 at 4:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
>
> Just a thought: would it make sense to flush the cache
> in enter_memory_pressure?
>

Good question, thanks!

Willem asked me something similar yesterday.

The argument for keeping one skb for tx and one for rx makes some sense
to me, since it more or less facilitates forward progress.

References:
 commit 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
 commit eb9344781a2f8 ("tcp: add a force_schedule argument to sk_stream_alloc_skb()")

The global tcp_memory_pressure status should play its role for
elephant flows (not the RPC workload targeted by this patch series),
so that they gradually reduce their memory usage to the minimum of one
packet per TCP flow.
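
For reference, such a flush could look roughly like the purge logic already
used in this series (a sketch only, not something the series adds):

    static void sk_flush_skb_caches(struct sock *sk)
    {
        struct sk_buff *skb = sk->sk_rx_skb_cache;

        if (skb) {
            sk->sk_rx_skb_cache = NULL;
            __kfree_skb(skb);
        }
        skb = sk->sk_tx_skb_cache;
        if (skb) {
            sk->sk_tx_skb_cache = NULL;
            sk->sk_wmem_queued -= skb->truesize;
            sk_mem_uncharge(sk, skb->truesize);
            __kfree_skb(skb);
        }
    }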


* Re: [PATCH v2 net-next 3/3] tcp: add one skb cache for rx
  2019-03-22  0:14 ` [PATCH v2 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
@ 2019-03-22 14:57   ` kbuild test robot
  2019-03-22 15:00   ` kbuild test robot
  1 sibling, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2019-03-22 14:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kbuild-all, David S . Miller, netdev, Eric Dumazet,
	Soheil Hassas Yeganeh, Willem de Bruijn, Florian Westphal,
	Tom Herbert, Eric Dumazet

[-- Attachment #1: Type: text/plain, Size: 3591 bytes --]

Hi Eric,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-add-rx-tx-cache-to-reduce-lock-contention/20190322-215506
config: i386-randconfig-x005-201911 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:11:0,
                    from include/linux/delay.h:22,
                    from drivers//w1/w1.c:15:
   include/net/sock.h: In function 'sk_eat_skb':
   include/net/sock.h:2442:31: error: 'rps_needed' undeclared (first use in this function); did you mean 'free_netdev'?
     if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
                                  ^
   include/linux/compiler.h:33:34: note: in definition of macro '__branch_check__'
       ______r = __builtin_expect(!!(x), expect); \
                                     ^
   include/linux/jump_label.h:478:35: note: in expansion of macro 'unlikely'
    #define static_branch_unlikely(x) unlikely(static_key_enabled(&(x)->key))
                                      ^~~~~~~~
   include/linux/jump_label.h:478:44: note: in expansion of macro 'static_key_enabled'
    #define static_branch_unlikely(x) unlikely(static_key_enabled(&(x)->key))
                                               ^~~~~~~~~~~~~~~~~~
>> include/net/sock.h:2442:7: note: in expansion of macro 'static_branch_unlikely'
     if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
          ^~~~~~~~~~~~~~~~~~~~~~
   include/net/sock.h:2442:31: note: each undeclared identifier is reported only once for each function it appears in
     if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
                                  ^
   include/linux/compiler.h:33:34: note: in definition of macro '__branch_check__'
       ______r = __builtin_expect(!!(x), expect); \
                                     ^
   include/linux/jump_label.h:478:35: note: in expansion of macro 'unlikely'
    #define static_branch_unlikely(x) unlikely(static_key_enabled(&(x)->key))
                                      ^~~~~~~~
   include/linux/jump_label.h:478:44: note: in expansion of macro 'static_key_enabled'
    #define static_branch_unlikely(x) unlikely(static_key_enabled(&(x)->key))
                                               ^~~~~~~~~~~~~~~~~~
>> include/net/sock.h:2442:7: note: in expansion of macro 'static_branch_unlikely'
     if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
          ^~~~~~~~~~~~~~~~~~~~~~

vim +/static_branch_unlikely +2442 include/net/sock.h

  2430	
  2431	/**
  2432	 * sk_eat_skb - Release a skb if it is no longer needed
  2433	 * @sk: socket to eat this skb from
  2434	 * @skb: socket buffer to eat
  2435	 *
  2436	 * This routine must be called with interrupts disabled or with the socket
  2437	 * locked so that the sk_buff queue operation is ok.
  2438	*/
  2439	static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
  2440	{
  2441		__skb_unlink(skb, &sk->sk_receive_queue);
> 2442		if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
  2443			sk->sk_rx_skb_cache = skb;
  2444			skb_orphan(skb);
  2445			return;
  2446		}
  2447		__kfree_skb(skb);
  2448	}
  2449	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28194 bytes --]


* Re: [PATCH v2 net-next 3/3] tcp: add one skb cache for rx
  2019-03-22  0:14 ` [PATCH v2 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
  2019-03-22 14:57   ` kbuild test robot
@ 2019-03-22 15:00   ` kbuild test robot
  2019-03-22 15:20     ` Eric Dumazet
  1 sibling, 1 reply; 11+ messages in thread
From: kbuild test robot @ 2019-03-22 15:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kbuild-all, David S . Miller, netdev, Eric Dumazet,
	Soheil Hassas Yeganeh, Willem de Bruijn, Florian Westphal,
	Tom Herbert, Eric Dumazet

[-- Attachment #1: Type: text/plain, Size: 2664 bytes --]

Hi Eric,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-add-rx-tx-cache-to-reduce-lock-contention/20190322-215506
config: x86_64-randconfig-x016-201911 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from include/linux/dynamic_debug.h:6:0,
                    from include/linux/printk.h:330,
                    from include/linux/kernel.h:15,
                    from include/linux/list.h:9,
                    from include/linux/module.h:9,
                    from drivers//net/ethernet/intel/e1000/e1000.h:10,
                    from drivers//net/ethernet/intel/e1000/e1000_main.c:4:
   include/net/sock.h: In function 'sk_eat_skb':
>> include/net/sock.h:2442:31: error: 'rps_needed' undeclared (first use in this function); did you mean 'free_netdev'?
     if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
                                  ^
   include/linux/jump_label.h:466:43: note: in definition of macro 'static_branch_unlikely'
     if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
                                              ^
   include/net/sock.h:2442:31: note: each undeclared identifier is reported only once for each function it appears in
     if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
                                  ^
   include/linux/jump_label.h:466:43: note: in definition of macro 'static_branch_unlikely'
     if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
                                              ^

vim +2442 include/net/sock.h

  2430	
  2431	/**
  2432	 * sk_eat_skb - Release a skb if it is no longer needed
  2433	 * @sk: socket to eat this skb from
  2434	 * @skb: socket buffer to eat
  2435	 *
  2436	 * This routine must be called with interrupts disabled or with the socket
  2437	 * locked so that the sk_buff queue operation is ok.
  2438	*/
  2439	static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
  2440	{
  2441		__skb_unlink(skb, &sk->sk_receive_queue);
> 2442		if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
  2443			sk->sk_rx_skb_cache = skb;
  2444			skb_orphan(skb);
  2445			return;
  2446		}
  2447		__kfree_skb(skb);
  2448	}
  2449	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31146 bytes --]


* Re: [PATCH v2 net-next 3/3] tcp: add one skb cache for rx
  2019-03-22 15:00   ` kbuild test robot
@ 2019-03-22 15:20     ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2019-03-22 15:20 UTC (permalink / raw)
  To: kbuild test robot, Eric Dumazet
  Cc: kbuild-all, David S . Miller, netdev, Soheil Hassas Yeganeh,
	Willem de Bruijn, Florian Westphal, Tom Herbert, Eric Dumazet



On 03/22/2019 08:00 AM, kbuild test robot wrote:
> Hi Eric,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on net-next/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-add-rx-tx-cache-to-reduce-lock-contention/20190322-215506
> config: x86_64-randconfig-x016-201911 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>    In file included from include/linux/dynamic_debug.h:6:0,
>                     from include/linux/printk.h:330,
>                     from include/linux/kernel.h:15,
>                     from include/linux/list.h:9,
>                     from include/linux/module.h:9,
>                     from drivers//net/ethernet/intel/e1000/e1000.h:10,
>                     from drivers//net/ethernet/intel/e1000/e1000_main.c:4:
>    include/net/sock.h: In function 'sk_eat_skb':
>>> include/net/sock.h:2442:31: error: 'rps_needed' undeclared (first use in this function); did you mean 'free_netdev'?
>      if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
>                                   ^
>    include/linux/jump_label.h:466:43: note: in definition of macro 'static_branch_unlikely'
>      if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
>                                               ^
>    include/net/sock.h:2442:31: note: each undeclared identifier is reported only once for each function it appears in
>      if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
>                                   ^
>    include/linux/jump_label.h:466:43: note: in definition of macro 'static_branch_unlikely'
>      if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
>                                               ^
> 
> vim +2442 include/net/sock.h
> 
>   2430	
>   2431	/**
>   2432	 * sk_eat_skb - Release a skb if it is no longer needed
>   2433	 * @sk: socket to eat this skb from
>   2434	 * @skb: socket buffer to eat
>   2435	 *
>   2436	 * This routine must be called with interrupts disabled or with the socket
>   2437	 * locked so that the sk_buff queue operation is ok.
>   2438	*/
>   2439	static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
>   2440	{
>   2441		__skb_unlink(skb, &sk->sk_receive_queue);

I guess all this makes sense on SMP systems only :)
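
One possible shape for a fix, as a sketch only (the helper name is made up,
and this is not necessarily what a respin will do): compile the rps_needed
test out when CONFIG_RPS is not set, since the key is not declared otherwise.

    static inline bool sk_rx_skb_cache_usable(const struct sock *sk)
    {
    #ifdef CONFIG_RPS
        if (static_branch_unlikely(&rps_needed))
            return false;
    #endif
        return !sk->sk_rx_skb_cache;
    }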

>> 2442		if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
>   2443			sk->sk_rx_skb_cache = skb;
>   2444			skb_orphan(skb);
>   2445			return;
>   2446		}
>   2447		__kfree_skb(skb);
>   2448	}
>   2449	
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
> 

