* [PATCH net-next 0/9] tcp: receive path optimizations
@ 2021-10-21 16:22 Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 1/2] net: sched: fix logic error in qdisc_run_begin() Eric Dumazet
                   ` (11 more replies)
  0 siblings, 12 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

This series aims to reduce cache line misses in the RX path.

I am still working on better cache locality in tcp_sock, but
this will wait a few more weeks.

Eric Dumazet (9):
  tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex
  ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie
  net: avoid dirtying sk->sk_napi_id
  net: avoid dirtying sk->sk_rx_queue_mapping
  ipv6: annotate data races around np->min_hopcount
  ipv6: guard IPV6_MINHOPCOUNT with a static key
  ipv4: annotate data races arount inet->min_ttl
  ipv4: guard IP_MINTTL with a static key
  ipv6/tcp: small drop monitor changes

 include/linux/ipv6.h     |  1 -
 include/net/busy_poll.h  |  3 ++-
 include/net/inet_sock.h  |  3 +--
 include/net/ip.h         |  2 ++
 include/net/ipv6.h       |  1 +
 include/net/sock.h       | 11 +++++++----
 net/ipv4/ip_sockglue.c   | 11 ++++++++++-
 net/ipv4/tcp_ipv4.c      | 25 ++++++++++++++++---------
 net/ipv6/ipv6_sockglue.c | 11 ++++++++++-
 net/ipv6/tcp_ipv6.c      | 35 +++++++++++++++++++++--------------
 net/ipv6/udp.c           |  4 ++--
 11 files changed, 72 insertions(+), 35 deletions(-)

-- 
2.33.0.1079.g6e70778dc9-goog



* [PATCH net-next 1/2] net: sched: fix logic error in qdisc_run_begin()
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:24   ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 1/9] tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex Eric Dumazet
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh,
	Neal Cardwell, Ahmed S . Darwish, Sebastian Andrzej Siewior

From: Eric Dumazet <edumazet@google.com>

For non TCQ_F_NOLOCK qdiscs, qdisc_run_begin() tries to set
__QDISC_STATE_RUNNING and should return true if the bit was not set.

test_and_set_bit() returns the old bit value, so its result must be inverted.

Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/net/sch_generic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index baad2ab4d971cd3fdc8d59acdd72d39fa6230370..e0988c56dd8fd7aa3dff6bd971da3c81f1a20626 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -217,7 +217,7 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 		 */
 		return spin_trylock(&qdisc->seqlock);
 	}
-	return test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+	return !test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
 }
 
 static inline void qdisc_run_end(struct Qdisc *qdisc)
-- 
2.33.0.1079.g6e70778dc9-goog
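
The inversion fixed by the patch above can be illustrated outside the kernel.
This is only a userspace sketch: sketch_test_and_set_bit() is a hypothetical
stand-in built on C11 atomics, not the kernel helper, and the sketch_* names
and SKETCH_STATE_RUNNING are invented for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_STATE_RUNNING 0	/* hypothetical bit number */

/* Atomically set bit nr in *word and return the bit's PREVIOUS value,
 * mirroring the return convention described in the commit message.
 */
static bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Return true only to the caller that found the bit clear and set it. */
static bool sketch_run_begin(atomic_ulong *state)
{
	return !sketch_test_and_set_bit(SKETCH_STATE_RUNNING, state);
}

/* Clear the running bit so a later sketch_run_begin() can succeed. */
static void sketch_run_end(atomic_ulong *state)
{
	atomic_fetch_and(state, ~(1UL << SKETCH_STATE_RUNNING));
}
```

Without the negation, the first caller would be told it failed to start the
run and every later caller would be told it succeeded.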



* [PATCH net-next 1/9] tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 1/2] net: sched: fix logic error in qdisc_run_begin() Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 2/9] ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie Eric Dumazet
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

Increase cache locality by moving rx_dst_ifindex next to sk->sk_rx_dst.

This is part of an effort to reduce cache line misses in the TCP fast path.

This removes one cache line miss in early demux.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/inet_sock.h | 3 +--
 include/net/sock.h      | 3 +++
 net/ipv4/tcp_ipv4.c     | 6 +++---
 net/ipv6/tcp_ipv6.c     | 6 +++---
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 89163ef8cf4be2aaf99d09806749911a121a56e0..9e1111f5915bd03b6ec5e2e4a74ea0079ede8263 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -207,11 +207,10 @@ struct inet_sock {
 	__be32			inet_saddr;
 	__s16			uc_ttl;
 	__u16			cmsg_flags;
+	struct ip_options_rcu __rcu	*inet_opt;
 	__be16			inet_sport;
 	__u16			inet_id;
 
-	struct ip_options_rcu __rcu	*inet_opt;
-	int			rx_dst_ifindex;
 	__u8			tos;
 	__u8			min_ttl;
 	__u8			mc_ttl;
diff --git a/include/net/sock.h b/include/net/sock.h
index 596ba85611bc786affed2bf2b18e455b015f3774..0bfb3f138bdab01bd97498e1126d111743000c8c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -259,6 +259,7 @@ struct bpf_local_storage;
   *	@sk_rcvbuf: size of receive buffer in bytes
   *	@sk_wq: sock wait queue and async head
   *	@sk_rx_dst: receive input route used by early demux
+  *	@sk_rx_dst_ifindex: ifindex for @sk_rx_dst
   *	@sk_dst_cache: destination cache
   *	@sk_dst_pending_confirm: need to confirm neighbour
   *	@sk_policy: flow policy
@@ -430,6 +431,8 @@ struct sock {
 	struct xfrm_policy __rcu *sk_policy[2];
 #endif
 	struct dst_entry	*sk_rx_dst;
+	int			sk_rx_dst_ifindex;
+
 	struct dst_entry __rcu	*sk_dst_cache;
 	atomic_t		sk_omem_alloc;
 	int			sk_sndbuf;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 29a57bd159f0aa99e892bac56b75961c107f803a..e8ca8539b436cf8a8af5b53645a25923003afc41 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1684,7 +1684,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 		sock_rps_save_rxhash(sk, skb);
 		sk_mark_napi_id(sk, skb);
 		if (dst) {
-			if (inet_sk(sk)->rx_dst_ifindex != skb->skb_iif ||
+			if (sk->sk_rx_dst_ifindex != skb->skb_iif ||
 			    !INDIRECT_CALL_1(dst->ops->check, ipv4_dst_check,
 					     dst, 0)) {
 				dst_release(dst);
@@ -1769,7 +1769,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
 			if (dst)
 				dst = dst_check(dst, 0);
 			if (dst &&
-			    inet_sk(sk)->rx_dst_ifindex == skb->skb_iif)
+			    sk->sk_rx_dst_ifindex == skb->skb_iif)
 				skb_dst_set_noref(skb, dst);
 		}
 	}
@@ -2176,7 +2176,7 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 
 	if (dst && dst_hold_safe(dst)) {
 		sk->sk_rx_dst = dst;
-		inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
+		sk->sk_rx_dst_ifindex = skb->skb_iif;
 	}
 }
 EXPORT_SYMBOL(inet_sk_rx_dst_set);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 8cf5ff2e95043ec1a2b27661aae884eb13dcf9eb..833b5ca8cc83798e5303542fc7522a86d97518ae 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -108,7 +108,7 @@ static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 		const struct rt6_info *rt = (const struct rt6_info *)dst;
 
 		sk->sk_rx_dst = dst;
-		inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
+		sk->sk_rx_dst_ifindex = skb->skb_iif;
 		tcp_inet6_sk(sk)->rx_dst_cookie = rt6_get_cookie(rt);
 	}
 }
@@ -1506,7 +1506,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		sock_rps_save_rxhash(sk, skb);
 		sk_mark_napi_id(sk, skb);
 		if (dst) {
-			if (inet_sk(sk)->rx_dst_ifindex != skb->skb_iif ||
+			if (sk->sk_rx_dst_ifindex != skb->skb_iif ||
 			    INDIRECT_CALL_1(dst->ops->check, ip6_dst_check,
 					    dst, np->rx_dst_cookie) == NULL) {
 				dst_release(dst);
@@ -1871,7 +1871,7 @@ INDIRECT_CALLABLE_SCOPE void tcp_v6_early_demux(struct sk_buff *skb)
 			if (dst)
 				dst = dst_check(dst, tcp_inet6_sk(sk)->rx_dst_cookie);
 			if (dst &&
-			    inet_sk(sk)->rx_dst_ifindex == skb->skb_iif)
+			    sk->sk_rx_dst_ifindex == skb->skb_iif)
 				skb_dst_set_noref(skb, dst);
 		}
 	}
-- 
2.33.0.1079.g6e70778dc9-goog



* [PATCH net-next 2/9] ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 1/2] net: sched: fix logic error in qdisc_run_begin() Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 1/9] tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 2/2] net: sched: remove one pair of atomic operations Eric Dumazet
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

Increase cache locality by moving rx_dst_cookie next to sk->sk_rx_dst.

This removes one or two cache line misses in IPv6 early demux (TCP/UDP).

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/ipv6.h | 1 -
 include/net/sock.h   | 2 ++
 net/ipv6/tcp_ipv6.c  | 6 +++---
 net/ipv6/udp.c       | 4 ++--
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index ef4a69865737cee82a72c35f3421a535b607c7a6..c383630d3f0658908eac65c030daf97b0a0d0c7c 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -282,7 +282,6 @@ struct ipv6_pinfo {
 	__be32			rcv_flowinfo;
 
 	__u32			dst_cookie;
-	__u32			rx_dst_cookie;
 
 	struct ipv6_mc_socklist	__rcu *ipv6_mc_list;
 	struct ipv6_ac_socklist	*ipv6_ac_list;
diff --git a/include/net/sock.h b/include/net/sock.h
index 0bfb3f138bdab01bd97498e1126d111743000c8c..99c4194cb61add848e3a35db0f952c4193f5ea1f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -260,6 +260,7 @@ struct bpf_local_storage;
   *	@sk_wq: sock wait queue and async head
   *	@sk_rx_dst: receive input route used by early demux
   *	@sk_rx_dst_ifindex: ifindex for @sk_rx_dst
+  *	@sk_rx_dst_cookie: cookie for @sk_rx_dst
   *	@sk_dst_cache: destination cache
   *	@sk_dst_pending_confirm: need to confirm neighbour
   *	@sk_policy: flow policy
@@ -432,6 +433,7 @@ struct sock {
 #endif
 	struct dst_entry	*sk_rx_dst;
 	int			sk_rx_dst_ifindex;
+	u32			sk_rx_dst_cookie;
 
 	struct dst_entry __rcu	*sk_dst_cache;
 	atomic_t		sk_omem_alloc;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 833b5ca8cc83798e5303542fc7522a86d97518ae..360c79c8e3099e54d125d454b7f5eb406678c91f 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -109,7 +109,7 @@ static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 
 		sk->sk_rx_dst = dst;
 		sk->sk_rx_dst_ifindex = skb->skb_iif;
-		tcp_inet6_sk(sk)->rx_dst_cookie = rt6_get_cookie(rt);
+		sk->sk_rx_dst_cookie = rt6_get_cookie(rt);
 	}
 }
 
@@ -1508,7 +1508,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		if (dst) {
 			if (sk->sk_rx_dst_ifindex != skb->skb_iif ||
 			    INDIRECT_CALL_1(dst->ops->check, ip6_dst_check,
-					    dst, np->rx_dst_cookie) == NULL) {
+					    dst, sk->sk_rx_dst_cookie) == NULL) {
 				dst_release(dst);
 				sk->sk_rx_dst = NULL;
 			}
@@ -1869,7 +1869,7 @@ INDIRECT_CALLABLE_SCOPE void tcp_v6_early_demux(struct sk_buff *skb)
 			struct dst_entry *dst = READ_ONCE(sk->sk_rx_dst);
 
 			if (dst)
-				dst = dst_check(dst, tcp_inet6_sk(sk)->rx_dst_cookie);
+				dst = dst_check(dst, sk->sk_rx_dst_cookie);
 			if (dst &&
 			    sk->sk_rx_dst_ifindex == skb->skb_iif)
 				skb_dst_set_noref(skb, dst);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8d785232b4796b7cafe14a35dedcbb0aaa2c37c2..14a94cddcf0bcf63d8351c66b94a08770694a9c8 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -884,7 +884,7 @@ static void udp6_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
 	if (udp_sk_rx_dst_set(sk, dst)) {
 		const struct rt6_info *rt = (const struct rt6_info *)dst;
 
-		inet6_sk(sk)->rx_dst_cookie = rt6_get_cookie(rt);
+		sk->sk_rx_dst_cookie = rt6_get_cookie(rt);
 	}
 }
 
@@ -1073,7 +1073,7 @@ INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
 	dst = READ_ONCE(sk->sk_rx_dst);
 
 	if (dst)
-		dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
+		dst = dst_check(dst, sk->sk_rx_dst_cookie);
 	if (dst) {
 		/* set noref for now.
 		 * any place which wants to hold dst has to call
-- 
2.33.0.1079.g6e70778dc9-goog
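
The effect of placing rx_dst_ifindex and rx_dst_cookie next to sk_rx_dst can
be checked with offsetof(). The sketch below is illustrative only: both
structs are hypothetical, the 64-byte line size is an assumption (it varies
by CPU), and the structs are assumed to start on a cache line boundary. The
real struct sock layout is different.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE 64	/* assumed cache line size in bytes */

/* "Before": the ifindex lives far away from the cached route pointer. */
struct sketch_before {
	void		*rx_dst;
	char		unrelated[120];	/* stand-in for unrelated fields */
	int		rx_dst_ifindex;
};

/* "After": the fields read together in early demux sit side by side. */
struct sketch_after {
	void		*rx_dst;
	int		rx_dst_ifindex;
	unsigned int	rx_dst_cookie;
	char		unrelated[120];
};

/* True if two byte offsets fall in the same cache line. */
static bool sketch_same_line(size_t off_a, size_t off_b)
{
	return off_a / SKETCH_CACHE_LINE == off_b / SKETCH_CACHE_LINE;
}
```

On real kernel builds, a tool such as pahole reports the actual offsets and
cache line boundaries of struct sock.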



* [PATCH net-next 2/2] net: sched: remove one pair of atomic operations
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
                   ` (2 preceding siblings ...)
  2021-10-21 16:22 ` [PATCH net-next 2/9] ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:25   ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 3/9] net: avoid dirtying sk->sk_napi_id Eric Dumazet
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh,
	Neal Cardwell, Ahmed S . Darwish, Sebastian Andrzej Siewior

From: Eric Dumazet <edumazet@google.com>

__QDISC_STATE_RUNNING is only set/cleared from contexts owning the qdisc lock.

Thus we can use less expensive bit operations, as we were doing
before commit f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount").

Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/net/sch_generic.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e0988c56dd8fd7aa3dff6bd971da3c81f1a20626..ada02c4a4f518b732d62561a22b1d9033516b494 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -38,10 +38,13 @@ enum qdisc_state_t {
 	__QDISC_STATE_DEACTIVATED,
 	__QDISC_STATE_MISSED,
 	__QDISC_STATE_DRAINING,
+};
+
+enum qdisc_state2_t {
 	/* Only for !TCQ_F_NOLOCK qdisc. Never access it directly.
 	 * Use qdisc_run_begin/end() or qdisc_is_running() instead.
 	 */
-	__QDISC_STATE_RUNNING,
+	__QDISC_STATE2_RUNNING,
 };
 
 #define QDISC_STATE_MISSED	BIT(__QDISC_STATE_MISSED)
@@ -114,6 +117,7 @@ struct Qdisc {
 	struct gnet_stats_basic_sync bstats;
 	struct gnet_stats_queue	qstats;
 	unsigned long		state;
+	unsigned long		state2; /* must be written under qdisc spinlock */
 	struct Qdisc            *next_sched;
 	struct sk_buff_head	skb_bad_txq;
 
@@ -154,7 +158,7 @@ static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
 	if (qdisc->flags & TCQ_F_NOLOCK)
 		return spin_is_locked(&qdisc->seqlock);
-	return test_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+	return test_bit(__QDISC_STATE2_RUNNING, &qdisc->state2);
 }
 
 static inline bool nolock_qdisc_is_empty(const struct Qdisc *qdisc)
@@ -217,7 +221,7 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 		 */
 		return spin_trylock(&qdisc->seqlock);
 	}
-	return !test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+	return !__test_and_set_bit(__QDISC_STATE2_RUNNING, &qdisc->state2);
 }
 
 static inline void qdisc_run_end(struct Qdisc *qdisc)
@@ -229,7 +233,7 @@ static inline void qdisc_run_end(struct Qdisc *qdisc)
 				      &qdisc->state)))
 			__netif_schedule(qdisc);
 	} else {
-		clear_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+		__clear_bit(__QDISC_STATE2_RUNNING, &qdisc->state2);
 	}
 }
 
-- 
2.33.0.1079.g6e70778dc9-goog
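
The difference between the atomic and the double-underscore bit helpers used
in the patch above can be sketched in plain C. The sketch_* helpers below are
hypothetical userspace stand-ins, not the kernel implementations: they do an
ordinary, non-atomic read-modify-write, which is only safe when every writer
of the word holds the same lock. That is why the bit is moved into a separate
word in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a word that is only written under a lock. */
struct sketch_qdisc {
	unsigned long state2;	/* caller must hold the (not shown) lock */
};

/* Non-atomic test-and-set: returns the previous value of bit nr. */
static bool sketch_plain_test_and_set_bit(unsigned int nr, unsigned long *word)
{
	unsigned long mask = 1UL << nr;
	bool old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/* Non-atomic clear of bit nr. */
static void sketch_plain_clear_bit(unsigned int nr, unsigned long *word)
{
	*word &= ~(1UL << nr);
}

/* Caller must hold the lock protecting q->state2. */
static bool sketch_run_begin_locked(struct sketch_qdisc *q)
{
	return !sketch_plain_test_and_set_bit(0, &q->state2);
}

/* Caller must hold the lock protecting q->state2. */
static void sketch_run_end_locked(struct sketch_qdisc *q)
{
	sketch_plain_clear_bit(0, &q->state2);
}
```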



* [PATCH net-next 3/9] net: avoid dirtying sk->sk_napi_id
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
                   ` (3 preceding siblings ...)
  2021-10-21 16:22 ` [PATCH net-next 2/2] net: sched: remove one pair of atomic operations Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 4/9] net: avoid dirtying sk->sk_rx_queue_mapping Eric Dumazet
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

sk_napi_id is located in a cache line that can be kept read mostly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/busy_poll.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 40296ed976a9778ceb239b99ad783cb99b8b92ef..4202c609bb0b09345c0f1c5105adf409a3a89f74 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -130,7 +130,8 @@ static inline void skb_mark_napi_id(struct sk_buff *skb,
 static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_RX_BUSY_POLL
-	WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
+	if (unlikely(READ_ONCE(sk->sk_napi_id) != skb->napi_id))
+		WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
 #endif
 	sk_rx_queue_set(sk, skb);
 }
-- 
2.33.0.1079.g6e70778dc9-goog
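
The "compare before store" pattern used in sk_mark_napi_id() above can be
sketched in userspace as follows. sketch_update_if_changed() is a
hypothetical helper, and the volatile accesses only approximate the kernel's
READ_ONCE()/WRITE_ONCE(). The point is that when the value is unchanged, no
store is issued, so the cache line can stay in a shared (clean) state.

```c
#include <stdbool.h>

/* Store new_val into *field only if it differs from the current value.
 * Returns true if a store was issued, false if the field was left alone.
 */
static bool sketch_update_if_changed(unsigned int *field, unsigned int new_val)
{
	if (*(volatile unsigned int *)field != new_val) {
		*(volatile unsigned int *)field = new_val;
		return true;
	}
	return false;
}
```

In the common case of a flow that keeps arriving on the same NAPI context,
the helper returns false and the field is only read.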



* [PATCH net-next 4/9] net: avoid dirtying sk->sk_rx_queue_mapping
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
                   ` (4 preceding siblings ...)
  2021-10-21 16:22 ` [PATCH net-next 3/9] net: avoid dirtying sk->sk_napi_id Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 5/9] ipv6: annotate data races around np->min_hopcount Eric Dumazet
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

sk_rx_queue_mapping is located in a cache line that should be kept read mostly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 99c4194cb61add848e3a35db0f952c4193f5ea1f..b4d3744b188ad869b4ec55f78e04236b710898de 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1916,10 +1916,8 @@ static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb)
 	if (skb_rx_queue_recorded(skb)) {
 		u16 rx_queue = skb_get_rx_queue(skb);
 
-		if (WARN_ON_ONCE(rx_queue == NO_QUEUE_MAPPING))
-			return;
-
-		sk->sk_rx_queue_mapping = rx_queue;
+		if (unlikely(READ_ONCE(sk->sk_rx_queue_mapping) != rx_queue))
+			WRITE_ONCE(sk->sk_rx_queue_mapping, rx_queue);
 	}
 #endif
 }
-- 
2.33.0.1079.g6e70778dc9-goog



* [PATCH net-next 5/9] ipv6: annotate data races around np->min_hopcount
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
                   ` (5 preceding siblings ...)
  2021-10-21 16:22 ` [PATCH net-next 4/9] net: avoid dirtying sk->sk_rx_queue_mapping Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 6/9] ipv6: guard IPV6_MINHOPCOUNT with a static key Eric Dumazet
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

KCSAN has not reported these yet, but the races are worth documenting.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ipv6_sockglue.c | 5 ++++-
 net/ipv6/tcp_ipv6.c      | 6 ++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index e4bdb09c558670f342f1abad5dfd8252f497aa68..9c3d28764b5c3a47a73491ea5d656867ece4fed2 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -950,7 +950,10 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 			goto e_inval;
 		if (val < 0 || val > 255)
 			goto e_inval;
-		np->min_hopcount = val;
+		/* tcp_v6_err() and tcp_v6_rcv() might read min_hopcount
+		 * while we are changing it.
+		 */
+		WRITE_ONCE(np->min_hopcount, val);
 		retv = 0;
 		break;
 	case IPV6_DONTFRAG:
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 360c79c8e3099e54d125d454b7f5eb406678c91f..2247f525364b16e89afedbec8f4ec3367bf88aa8 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -414,7 +414,8 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	if (sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	if (ipv6_hdr(skb)->hop_limit < tcp_inet6_sk(sk)->min_hopcount) {
+	/* min_hopcount can be changed concurrently from do_ipv6_setsockopt() */
+	if (ipv6_hdr(skb)->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount)) {
 		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
 		goto out;
 	}
@@ -1723,7 +1724,8 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 			return 0;
 		}
 	}
-	if (hdr->hop_limit < tcp_inet6_sk(sk)->min_hopcount) {
+	/* min_hopcount can be changed concurrently from do_ipv6_setsockopt() */
+	if (hdr->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount)) {
 		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
 		goto discard_and_relse;
 	}
-- 
2.33.0.1079.g6e70778dc9-goog
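
For readers unfamiliar with the annotations used above, here is a rough
userspace approximation. The SKETCH_* macros are simplified stand-ins written
for this note (the kernel's READ_ONCE()/WRITE_ONCE() are more elaborate), they
rely on the GNU C __typeof__ extension, and struct sketch_pinfo with its
one-byte min_hopcount field is an assumption made for illustration.

```c
#include <stdbool.h>

/* Force a single, untorn-looking access that the compiler may not
 * duplicate, cache in a register across the call, or elide.
 */
#define SKETCH_READ_ONCE(x)	  (*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct sketch_pinfo {
	unsigned char min_hopcount;	/* 0 means "no minimum" */
};

/* Writer side, as in the setsockopt path: may race with readers. */
static void sketch_set_min_hopcount(struct sketch_pinfo *np, int val)
{
	SKETCH_WRITE_ONCE(np->min_hopcount, val);
}

/* Reader side, as in the receive path: lockless read of the field. */
static bool sketch_should_drop(const struct sketch_pinfo *np,
			       unsigned char hop_limit)
{
	return hop_limit < SKETCH_READ_ONCE(np->min_hopcount);
}
```

The annotations do not add any ordering or locking. They mark the access as
intentionally racy for tools and for human readers.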



* [PATCH net-next 6/9] ipv6: guard IPV6_MINHOPCOUNT with a static key
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
                   ` (6 preceding siblings ...)
  2021-10-21 16:22 ` [PATCH net-next 5/9] ipv6: annotate data races around np->min_hopcount Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 7/9] ipv4: annotate data races arount inet->min_ttl Eric Dumazet
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

The RFC 5082 IPV6_MINHOPCOUNT option is rarely used on hosts.

Add a static key so that the TCP fast path skips this check, and the
potential cache line miss to fetch tcp_inet6_sk(sk)->min_hopcount,
unless the option has been set to a nonzero value.

Note that once the ip6_min_hopcount static key has been enabled,
it stays enabled until the next boot.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ipv6.h       |  1 +
 net/ipv6/ipv6_sockglue.c |  6 ++++++
 net/ipv6/tcp_ipv6.c      | 21 +++++++++++++--------
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index f2d0ecc257bb28e6dd162d180c371e2b0487c8e3..c19bf51ded1d026e795a3f9ae0ff3be766fc174e 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1092,6 +1092,7 @@ struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
 /*
  *	socket options (ipv6_sockglue.c)
  */
+DECLARE_STATIC_KEY_FALSE(ip6_min_hopcount);
 
 int ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 		    unsigned int optlen);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 9c3d28764b5c3a47a73491ea5d656867ece4fed2..41efca817db4228f265235a471449a3790075ce7 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -55,6 +55,8 @@
 struct ip6_ra_chain *ip6_ra_chain;
 DEFINE_RWLOCK(ip6_ra_lock);
 
+DEFINE_STATIC_KEY_FALSE(ip6_min_hopcount);
+
 int ip6_ra_control(struct sock *sk, int sel)
 {
 	struct ip6_ra_chain *ra, *new_ra, **rap;
@@ -950,6 +952,10 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 			goto e_inval;
 		if (val < 0 || val > 255)
 			goto e_inval;
+
+		if (val)
+			static_branch_enable(&ip6_min_hopcount);
+
 		/* tcp_v6_err() and tcp_v6_rcv() might read min_hopcount
 		 * while we are changing it.
 		 */
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2247f525364b16e89afedbec8f4ec3367bf88aa8..bbff3df27d1c24d7a47849b28297ba129baafc99 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -414,10 +414,12 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	if (sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	/* min_hopcount can be changed concurrently from do_ipv6_setsockopt() */
-	if (ipv6_hdr(skb)->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount)) {
-		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
-		goto out;
+	if (static_branch_unlikely(&ip6_min_hopcount)) {
+		/* min_hopcount can be changed concurrently from do_ipv6_setsockopt() */
+		if (ipv6_hdr(skb)->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount)) {
+			__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
+			goto out;
+		}
 	}
 
 	tp = tcp_sk(sk);
@@ -1724,10 +1726,13 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 			return 0;
 		}
 	}
-	/* min_hopcount can be changed concurrently from do_ipv6_setsockopt() */
-	if (hdr->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount)) {
-		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
-		goto discard_and_relse;
+
+	if (static_branch_unlikely(&ip6_min_hopcount)) {
+		/* min_hopcount can be changed concurrently from do_ipv6_setsockopt() */
+		if (hdr->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount)) {
+			__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
+			goto discard_and_relse;
+		}
 	}
 
 	if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb))
-- 
2.33.0.1079.g6e70778dc9-goog
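
The control flow of the one-way guard above can be modelled in userspace.
This is a behavioural sketch only: a real static key patches the branch in
the instruction stream, whereas the plain bool used here still costs a load.
All sketch_* names are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the global static key: starts false, never goes back. */
static bool sketch_min_hopcount_key;

struct sketch_sock6 {
	unsigned char min_hopcount;	/* per-socket setting, 0 = disabled */
};

/* Models the setsockopt path: a nonzero value enables the global key. */
static void sketch_set_min_hopcount(struct sketch_sock6 *sk, unsigned char val)
{
	if (val)
		sketch_min_hopcount_key = true;
	sk->min_hopcount = val;
}

/* Models the receive path: the per-socket field is only read once some
 * socket on the host has asked for a minimum hop count.
 */
static bool sketch_rcv_should_drop(const struct sketch_sock6 *sk,
				   unsigned char hop_limit)
{
	if (!sketch_min_hopcount_key)
		return false;
	return hop_limit < sk->min_hopcount;
}
```

Setting the option back to zero on every socket leaves the key enabled, which
matches the "stays enabled until next boot" note in the commit message.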



* [PATCH net-next 7/9] ipv4: annotate data races arount inet->min_ttl
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
                   ` (7 preceding siblings ...)
  2021-10-21 16:22 ` [PATCH net-next 6/9] ipv6: guard IPV6_MINHOPCOUNT with a static key Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 8/9] ipv4: guard IP_MINTTL with a static key Eric Dumazet
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

KCSAN has not reported these yet, but the races are worth documenting.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ip_sockglue.c | 5 ++++-
 net/ipv4/tcp_ipv4.c    | 7 +++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b297bb28556ec5cf383068f67ee910af38591cc3..d5487c8580674a01df8c7d8ce88f97c9add846b6 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1352,7 +1352,10 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname,
 			goto e_inval;
 		if (val < 0 || val > 255)
 			goto e_inval;
-		inet->min_ttl = val;
+		/* tcp_v4_err() and tcp_v4_rcv() might read min_ttl
+		 * while we are changint it.
+		 */
+		WRITE_ONCE(inet->min_ttl, val);
 		break;
 
 	default:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e8ca8539b436cf8a8af5b53645a25923003afc41..97b8acf726d0cdcb6b87b6ef45e366591d997a2b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -508,7 +508,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
 	if (sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	if (unlikely(iph->ttl < inet_sk(sk)->min_ttl)) {
+	/* min_ttl can be changed concurrently from do_ip_setsockopt() */
+	if (unlikely(iph->ttl < READ_ONCE(inet_sk(sk)->min_ttl))) {
 		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
 		goto out;
 	}
@@ -2049,7 +2050,9 @@ int tcp_v4_rcv(struct sk_buff *skb)
 			return 0;
 		}
 	}
-	if (unlikely(iph->ttl < inet_sk(sk)->min_ttl)) {
+
+	/* min_ttl can be changed concurrently from do_ip_setsockopt() */
+	if (unlikely(iph->ttl < READ_ONCE(inet_sk(sk)->min_ttl))) {
 		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
 		goto discard_and_relse;
 	}
-- 
2.33.0.1079.g6e70778dc9-goog



* [PATCH net-next 8/9] ipv4: guard IP_MINTTL with a static key
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
                   ` (8 preceding siblings ...)
  2021-10-21 16:22 ` [PATCH net-next 7/9] ipv4: annotate data races arount inet->min_ttl Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 16:22 ` [PATCH net-next 9/9] ipv6/tcp: small drop monitor changes Eric Dumazet
  2021-10-21 17:01 ` [PATCH net-next 0/9] tcp: receive path optimizations Soheil Hassas Yeganeh
  11 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

The RFC 5082 IP_MINTTL option is rarely used on hosts.

Add a static key so that the TCP fast path skips this check, and the
potential cache line miss to fetch inet_sk(sk)->min_ttl,
unless the option has been set to a nonzero value.

Note that once the ip4_min_ttl static key has been enabled,
it stays enabled until the next boot.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ip.h       |  2 ++
 net/ipv4/ip_sockglue.c |  6 ++++++
 net/ipv4/tcp_ipv4.c    | 20 ++++++++++++--------
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index cf229a53119428307da898af4b0dc23e1cecc053..b71e88507c4a0907011c41e1ed0148eb873b5186 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -24,6 +24,7 @@
 #include <linux/skbuff.h>
 #include <linux/jhash.h>
 #include <linux/sockptr.h>
+#include <linux/static_key.h>
 
 #include <net/inet_sock.h>
 #include <net/route.h>
@@ -750,6 +751,7 @@ void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk,
 			 struct sk_buff *skb, int tlen, int offset);
 int ip_cmsg_send(struct sock *sk, struct msghdr *msg,
 		 struct ipcm_cookie *ipc, bool allow_ipv6);
+DECLARE_STATIC_KEY_FALSE(ip4_min_ttl);
 int ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 		  unsigned int optlen);
 int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index d5487c8580674a01df8c7d8ce88f97c9add846b6..38d29b175ca6646c280e0626e8e935b348f00f08 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -886,6 +886,8 @@ static int compat_ip_mcast_join_leave(struct sock *sk, int optname,
 	return ip_mc_leave_group(sk, &mreq);
 }
 
+DEFINE_STATIC_KEY_FALSE(ip4_min_ttl);
+
 static int do_ip_setsockopt(struct sock *sk, int level, int optname,
 		sockptr_t optval, unsigned int optlen)
 {
@@ -1352,6 +1354,10 @@ static int do_ip_setsockopt(struct sock *sk, int level, int optname,
 			goto e_inval;
 		if (val < 0 || val > 255)
 			goto e_inval;
+
+		if (val)
+			static_branch_enable(&ip4_min_ttl);
+
 		/* tcp_v4_err() and tcp_v4_rcv() might read min_ttl
 		 * while we are changint it.
 		 */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 97b8acf726d0cdcb6b87b6ef45e366591d997a2b..8e9f05d9c54c316e6f6d0603ad786399f9c6345c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -508,10 +508,12 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
 	if (sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	/* min_ttl can be changed concurrently from do_ip_setsockopt() */
-	if (unlikely(iph->ttl < READ_ONCE(inet_sk(sk)->min_ttl))) {
-		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
-		goto out;
+	if (static_branch_unlikely(&ip4_min_ttl)) {
+		/* min_ttl can be changed concurrently from do_ip_setsockopt() */
+		if (unlikely(iph->ttl < READ_ONCE(inet_sk(sk)->min_ttl))) {
+			__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
+			goto out;
+		}
 	}
 
 	tp = tcp_sk(sk);
@@ -2051,10 +2053,12 @@ int tcp_v4_rcv(struct sk_buff *skb)
 		}
 	}
 
-	/* min_ttl can be changed concurrently from do_ip_setsockopt() */
-	if (unlikely(iph->ttl < READ_ONCE(inet_sk(sk)->min_ttl))) {
-		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
-		goto discard_and_relse;
+	if (static_branch_unlikely(&ip4_min_ttl)) {
+		/* min_ttl can be changed concurrently from do_ip_setsockopt() */
+		if (unlikely(iph->ttl < READ_ONCE(inet_sk(sk)->min_ttl))) {
+			__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
+			goto discard_and_relse;
+		}
 	}
 
 	if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
-- 
2.33.0.1079.g6e70778dc9-goog
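
The guard pattern in the patch above can be illustrated with a small userspace sketch. This is not kernel code: a real static key patches the branch instruction at runtime, whereas here a plain C11 atomic flag stands in for it. The names min_ttl_key, struct fake_sock, set_min_ttl() and ttl_too_low() are hypothetical and exist only for this illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ip4_min_ttl static key: a global one-way flag. */
static atomic_bool min_ttl_key = false;

/* Hypothetical socket holding only the field of interest. */
struct fake_sock {
	_Atomic unsigned char min_ttl;	/* 0 means "no minimum TTL" */
};

/* Mirrors the setsockopt side: the key is enabled on the first non-zero value. */
static int set_min_ttl(struct fake_sock *sk, int val)
{
	if (val < 0 || val > 255)
		return -1;

	if (val)
		atomic_store(&min_ttl_key, true);	/* never cleared afterwards */

	atomic_store_explicit(&sk->min_ttl, (unsigned char)val,
			      memory_order_relaxed);
	return 0;
}

/* Mirrors the receive side: returns true if the packet must be dropped. */
static bool ttl_too_low(struct fake_sock *sk, unsigned char ttl)
{
	/* Fast path: while the key is off, sk->min_ttl is never read. */
	if (!atomic_load_explicit(&min_ttl_key, memory_order_relaxed))
		return false;

	return ttl < atomic_load_explicit(&sk->min_ttl, memory_order_relaxed);
}
```

In this sketch, setting min_ttl back to 0 leaves the flag enabled, which matches the note in the commit message: the check then runs on every packet but never drops one, since no TTL is below 0.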



* [PATCH net-next 9/9] ipv6/tcp: small drop monitor changes
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
                   ` (9 preceding siblings ...)
  2021-10-21 16:22 ` [PATCH net-next 8/9] ipv4: guard IP_MINTTL with a static key Eric Dumazet
@ 2021-10-21 16:22 ` Eric Dumazet
  2021-10-21 17:01 ` [PATCH net-next 0/9] tcp: receive path optimizations Soheil Hassas Yeganeh
  11 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:22 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

Two kfree_skb() calls must be replaced by consume_skb()
for skbs that are not technically dropped.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/tcp_ipv6.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index bbff3df27d1c24d7a47849b28297ba129baafc99..5504564f7e252e048df456156cf1183b5f01826c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -572,7 +572,7 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
 static void tcp_v6_reqsk_destructor(struct request_sock *req)
 {
 	kfree(inet_rsk(req)->ipv6_opt);
-	kfree_skb(inet_rsk(req)->pktopts);
+	consume_skb(inet_rsk(req)->pktopts);
 }
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -1591,7 +1591,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		}
 	}
 
-	kfree_skb(opt_skb);
+	consume_skb(opt_skb);
 	return 0;
 }
 
-- 
2.33.0.1079.g6e70778dc9-goog
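
The distinction the patch above relies on can be sketched in userspace as follows. In the kernel, kfree_skb() signals a drop to tracing tools such as drop monitor, while consume_skb() frees the buffer without doing so. The sketch below only models that accounting difference; struct fake_skb, drop_events, fake_kfree_skb() and fake_consume_skb() are hypothetical names, and reference counting is deliberately left out.

```c
#include <stdlib.h>

/* Hypothetical buffer type standing in for struct sk_buff. */
struct fake_skb {
	int len;
};

/* Hypothetical counter standing in for what a drop monitor would observe. */
static unsigned long drop_events;

/* Free a buffer and report it as a dropped packet. */
static void fake_kfree_skb(struct fake_skb *skb)
{
	if (!skb)
		return;		/* NULL is accepted and reports nothing */

	drop_events++;
	free(skb);
}

/* Free a buffer that was processed normally: no drop is reported. */
static void fake_consume_skb(struct fake_skb *skb)
{
	if (!skb)
		return;

	free(skb);
}
```

Under these assumptions, releasing a saved options buffer through fake_consume_skb() leaves drop_events untouched, so an observer counting drops is not misled by ordinary cleanup.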



* Re: [PATCH net-next 1/2] net: sched: fix logic error in qdisc_run_begin()
  2021-10-21 16:22 ` [PATCH net-next 1/2] net: sched: fix logic error in qdisc_run_begin() Eric Dumazet
@ 2021-10-21 16:24   ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, Soheil Hassas Yeganeh,
	Neal Cardwell, Ahmed S . Darwish, Sebastian Andrzej Siewior

On Thu, Oct 21, 2021 at 9:23 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> For non TCQ_F_NOLOCK qdisc, qdisc_run_begin() tries to set
> __QDISC_STATE_RUNNING and should return true if the bit was not set.
>
> test_and_set_bit() returns old bit value, therefore we need to invert.
>
> Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---

Please disregard, I have accidentally resent this already merged patch.
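
For readers following the quoted reasoning, the inversion can be shown with a minimal userspace sketch using C11 atomics. It is not the kernel implementation: fake_test_and_set_bit() takes a mask rather than a bit number, and every name here (FAKE_STATE_RUNNING, fake_run_begin(), fake_run_end()) is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the __QDISC_STATE_RUNNING bit. */
#define FAKE_STATE_RUNNING (1UL << 0)

/* Atomically set the masked bit and return its OLD value. */
static bool fake_test_and_set_bit(unsigned long mask, atomic_ulong *word)
{
	return atomic_fetch_or(word, mask) & mask;
}

/*
 * Returns true when the caller acquired the running state, that is,
 * when the bit was NOT set before. Hence the negation of the old value.
 */
static bool fake_run_begin(atomic_ulong *state)
{
	return !fake_test_and_set_bit(FAKE_STATE_RUNNING, state);
}

/* Release the running state. */
static void fake_run_end(atomic_ulong *state)
{
	atomic_fetch_and(state, ~FAKE_STATE_RUNNING);
}
```

Without the negation, the first caller would be told it failed to start running and every later caller would be told it succeeded, which is the opposite of the intended behaviour.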


* Re: [PATCH net-next 2/2] net: sched: remove one pair of atomic operations
  2021-10-21 16:22 ` [PATCH net-next 2/2] net: sched: remove one pair of atomic operations Eric Dumazet
@ 2021-10-21 16:25   ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2021-10-21 16:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, Soheil Hassas Yeganeh,
	Neal Cardwell, Ahmed S . Darwish, Sebastian Andrzej Siewior

On Thu, Oct 21, 2021 at 9:23 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> __QDISC_STATE_RUNNING is only set/cleared from contexts owning qdisc lock.
>
> Thus we can use less expensive bit operations, as we were doing
> before commit f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
>
> Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---


Please disregard, I have accidentally resent this patch while sending
another unrelated patch series.


* Re: [PATCH net-next 0/9] tcp: receive path optimizations
  2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
                   ` (10 preceding siblings ...)
  2021-10-21 16:22 ` [PATCH net-next 9/9] ipv6/tcp: small drop monitor changes Eric Dumazet
@ 2021-10-21 17:01 ` Soheil Hassas Yeganeh
  11 siblings, 0 replies; 15+ messages in thread
From: Soheil Hassas Yeganeh @ 2021-10-21 17:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, Eric Dumazet, Neal Cardwell

On Thu, Oct 21, 2021 at 12:23 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> This series aims to reduce cache line misses in RX path.
>
> I am still working on better cache locality in tcp_sock but
> this will wait few more weeks.
>
> Eric Dumazet (9):
>   tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex
>   ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie
>   net: avoid dirtying sk->sk_napi_id
>   net: avoid dirtying sk->sk_rx_queue_mapping
>   ipv6: annotate data races around np->min_hopcount
>   ipv6: guard IPV6_MINHOPCOUNT with a static key
>   ipv4: annotate data races arount inet->min_ttl
>   ipv4: guard IP_MINTTL with a static key
>   ipv6/tcp: small drop monitor changes

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Very nice patch series!  The IP_MINTTL patch is an excellent find. I
wonder how many more of these we have. Thank you, Eric!

>  include/linux/ipv6.h     |  1 -
>  include/net/busy_poll.h  |  3 ++-
>  include/net/inet_sock.h  |  3 +--
>  include/net/ip.h         |  2 ++
>  include/net/ipv6.h       |  1 +
>  include/net/sock.h       | 11 +++++++----
>  net/ipv4/ip_sockglue.c   | 11 ++++++++++-
>  net/ipv4/tcp_ipv4.c      | 25 ++++++++++++++++---------
>  net/ipv6/ipv6_sockglue.c | 11 ++++++++++-
>  net/ipv6/tcp_ipv6.c      | 35 +++++++++++++++++++++--------------
>  net/ipv6/udp.c           |  4 ++--
>  11 files changed, 72 insertions(+), 35 deletions(-)
>
> --
> 2.33.0.1079.g6e70778dc9-goog
>


end of thread, other threads:[~2021-10-21 17:02 UTC | newest]

Thread overview: 15+ messages
2021-10-21 16:22 [PATCH net-next 0/9] tcp: receive path optimizations Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 1/2] net: sched: fix logic error in qdisc_run_begin() Eric Dumazet
2021-10-21 16:24   ` Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 1/9] tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 2/9] ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 2/2] net: sched: remove one pair of atomic operations Eric Dumazet
2021-10-21 16:25   ` Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 3/9] net: avoid dirtying sk->sk_napi_id Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 4/9] net: avoid dirtying sk->sk_rx_queue_mapping Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 5/9] ipv6: annotate data races around np->min_hopcount Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 6/9] ipv6: guard IPV6_MINHOPCOUNT with a static key Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 7/9] ipv4: annotate data races arount inet->min_ttl Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 8/9] ipv4: guard IP_MINTTL with a static key Eric Dumazet
2021-10-21 16:22 ` [PATCH net-next 9/9] ipv6/tcp: small drop monitor changes Eric Dumazet
2021-10-21 17:01 ` [PATCH net-next 0/9] tcp: receive path optimizations Soheil Hassas Yeganeh
