netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default
@ 2021-08-09 18:53 Tom Herbert
  2021-08-09 18:53 ` [RFC PATCH net-next 1/3] txhash: Make rethinking txhash behavior configurable via sysctl Tom Herbert
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Tom Herbert @ 2021-08-09 18:53 UTC (permalink / raw)
  To: netdev, davem, brakmo, ycheng, eric.dumazet, a.e.azimov; +Cc: Tom Herbert

Alexander Azimov performed some nice analysis of the feature in Linux
stack where the IPv6 flow label is changed when the stack detects a
connection is failing. The idea of the algorithm is to try to find a
better path. His reults are quite impressive, and show that this form
of source routing can work effectively.

Alex raised an issue in that if the server endpoint is an IP anycast
address, the connection might break if the flow label changes routing
of packets on the connection. Anycast is known to be susceptible to
route changes, not just those caused be flow label. The concern is that
flow label modulation might increases the chances that anycast
connections might break, especially if the rethink occurs after just
one RTO which is the current behavior.

This patch set makes the rethink behavior granular and configurable.
It allows control of when to do the hash rethink: upon negative advice,
at RTO in SYN state, at RTO when not in SYN state. The behavior can
be configured by sysctl and by a socket option.

This patch set the defautl rethink behavior to be to do a rethink only
on negative advice. This is reverts back to the original behavior of
the hash rethink mechanism. This less aggressive with the intent of
mitigating potentail breakages when anycast addresses are present.
For those users that are benefitting from changing the hash at the
first RTO, they would retain that behavior by setting the sysctl.
*** BLURB HERE ***

Tom Herbert (3):
  txhash: Make rethinking txhash behavior configurable via sysctl
  txhash: Add socket option to control TX hash rethink behavior
  txhash: Change default rethink behavior to be less aggressive

 arch/alpha/include/uapi/asm/socket.h  |  2 ++
 arch/mips/include/uapi/asm/socket.h   |  2 ++
 arch/parisc/include/uapi/asm/socket.h |  2 ++
 arch/sparc/include/uapi/asm/socket.h  |  3 ++-
 include/net/netns/core.h              |  2 ++
 include/net/sock.h                    | 32 +++++++++++++++++++--------
 include/uapi/asm-generic/socket.h     |  2 ++
 include/uapi/linux/socket.h           | 13 +++++++++++
 net/core/net_namespace.c              |  4 ++++
 net/core/sock.c                       | 16 ++++++++++++++
 net/core/sysctl_net_core.c            |  7 ++++++
 net/ipv4/tcp_input.c                  |  2 +-
 net/ipv4/tcp_timer.c                  |  5 ++++-
 13 files changed, 80 insertions(+), 12 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH net-next 1/3] txhash: Make rethinking txhash behavior configurable via sysctl
  2021-08-09 18:53 [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default Tom Herbert
@ 2021-08-09 18:53 ` Tom Herbert
  2021-08-09 18:53 ` [RFC PATCH net-next 2/3] txhash: Add socket option to control TX hash rethink behavior Tom Herbert
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Tom Herbert @ 2021-08-09 18:53 UTC (permalink / raw)
  To: netdev, davem, brakmo, ycheng, eric.dumazet, a.e.azimov; +Cc: Tom Herbert

Add a per ns sysctl that controls the txhash rethink behavior,
sk_rethink_txhash. This sysctl value is a mask rethink modes that are:
rethink at negative advice, rethink at SYN RTO, and rethink at non-SYN
RTO. A value of zero disables hash rethink. The default mask is set
to rethink with all three modes (retains current default behavior)
---
 include/net/netns/core.h    |  2 ++
 include/net/sock.h          | 26 +++++++++++++++++---------
 include/uapi/linux/socket.h | 13 +++++++++++++
 net/core/net_namespace.c    |  6 ++++++
 net/core/sysctl_net_core.c  |  7 +++++++
 net/ipv4/tcp_input.c        |  2 +-
 net/ipv4/tcp_timer.c        |  5 ++++-
 7 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 36c2d998a43c..503f43bfc1d3 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -11,6 +11,8 @@ struct netns_core {
 
 	int	sysctl_somaxconn;
 
+	unsigend int sysctl_txrehash_mode;
+
 #ifdef CONFIG_PROC_FS
 	int __percpu *sock_inuse;
 	struct prot_inuse __percpu *prot_inuse;
diff --git a/include/net/sock.h b/include/net/sock.h
index 6e761451c927..6ef5314e8eed 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -577,6 +577,12 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
 			   __tmp | SK_USER_DATA_NOCOPY);		\
 })
 
+static inline
+struct net *sock_net(const struct sock *sk)
+{
+	return read_pnet(&sk->sk_net);
+}
+
 /*
  * SK_CAN_REUSE and SK_NO_REUSE on a socket mean that the socket is OK
  * or not whether his port will be reused by someone else. SK_FORCE_REUSE
@@ -1940,12 +1946,20 @@ static inline void sk_set_txhash(struct sock *sk)
 	WRITE_ONCE(sk->sk_txhash, net_tx_rndhash());
 }
 
-static inline bool sk_rethink_txhash(struct sock *sk)
+static inline bool sk_rethink_txhash(struct sock *sk, unsigned int level)
 {
-	if (sk->sk_txhash) {
+	unsigned int rehash_mode;
+
+	if (!sk->sk_txhash)
+		return false;
+
+	rehash_mode = READ_ONCE(sock_net(sk)->core.sysctl_txrehash_mode);
+
+	if (level & rehash_mode) {
 		sk_set_txhash(sk);
 		return true;
 	}
+
 	return false;
 }
 
@@ -1986,7 +2000,7 @@ static inline void __dst_negative_advice(struct sock *sk)
 
 static inline void dst_negative_advice(struct sock *sk)
 {
-	sk_rethink_txhash(sk);
+	sk_rethink_txhash(sk, SOCK_TXREHASH_MODE_NEG_ADVICE);
 	__dst_negative_advice(sk);
 }
 
@@ -2591,12 +2605,6 @@ static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
-static inline
-struct net *sock_net(const struct sock *sk)
-{
-	return read_pnet(&sk->sk_net);
-}
-
 static inline
 void sock_net_set(struct sock *sk, struct net *net)
 {
diff --git a/include/uapi/linux/socket.h b/include/uapi/linux/socket.h
index eb0a9a5b6e71..2c2cef795a9b 100644
--- a/include/uapi/linux/socket.h
+++ b/include/uapi/linux/socket.h
@@ -31,4 +31,17 @@ struct __kernel_sockaddr_storage {
 
 #define SOCK_BUF_LOCK_MASK (SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK)
 
+#define SOCK_TXREHASH_MODE_DISABLE	0
+
+/* Flag bits for individual rehash function modes */
+#define SOCK_TXREHASH_MODE_NEG_ADVICE	0x1
+#define SOCK_TXREHASH_MODE_SYN_RTO	0x2
+#define SOCK_TXREHASH_MODE_RTO		0x4
+
+#define SOCK_TXREHASH_MODE_DEFAULT	-1U
+
+#define SOCK_TXREHASH_MODE_MASK	(SOCK_TXREHASH_MODE_NEG_ADVICE |	\
+				 SOCK_TXREHASH_MODE_SYN_RTO |		\
+				 SOCK_TXREHASH_MODE_RTO)
+
 #endif /* _UAPI_LINUX_SOCKET_H */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 9b5a767eddd5..03d3767e6728 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -366,6 +366,12 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 static int __net_init net_defaults_init_net(struct net *net)
 {
 	net->core.sysctl_somaxconn = SOMAXCONN;
+
+	/* Default rethink mode is aggrssive (i.e. rethink on first RTO) */
+	net->core.sysctl_txrehash_mode = SOCK_TXREHASH_MODE_NEG_ADVICE |
+					 SOCK_TXREHASH_MODE_SYN_RTO |
+					 SOCK_TXREHASH_MODE_RTO;
+
 	return 0;
 }
 
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index c8496c1142c9..7e828a892bf5 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -592,6 +592,13 @@ static struct ctl_table netns_core_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.proc_handler	= proc_dointvec_minmax
 	},
+	{
+		.procname	= "txrehash_mode",
+		.data		= &init_net.core.sysctl_txrehash_mode,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3f7bd7ae7d7a..08eeb2523393 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4442,7 +4442,7 @@ static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
 	 * DSACK state and change the txhash to re-route speculatively.
 	 */
 	if (TCP_SKB_CB(skb)->seq == tcp_sk(sk)->duplicate_sack[0].start_seq &&
-	    sk_rethink_txhash(sk))
+	    sk_rethink_txhash(sk, SOCK_TXREHASH_MODE_RTO))
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPDUPLICATEDATAREHASH);
 }
 
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 20cf4a98c69d..53ae43ab5ebe 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -234,6 +234,7 @@ static int tcp_write_timeout(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct net *net = sock_net(sk);
 	bool expired = false, do_reset;
+	unsigned int rehash_mode;
 	int retry_until;
 
 	if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
@@ -241,6 +242,7 @@ static int tcp_write_timeout(struct sock *sk)
 			__dst_negative_advice(sk);
 		retry_until = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_syn_retries;
 		expired = icsk->icsk_retransmits >= retry_until;
+		rehash_mode = SOCK_TXREHASH_MODE_SYN_RTO;
 	} else {
 		if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1, 0)) {
 			/* Black hole detection */
@@ -260,6 +262,7 @@ static int tcp_write_timeout(struct sock *sk)
 			if (tcp_out_of_resources(sk, do_reset))
 				return 1;
 		}
+		rehash_mode = SOCK_TXREHASH_MODE_RTO;
 	}
 	if (!expired)
 		expired = retransmits_timed_out(sk, retry_until,
@@ -277,7 +280,7 @@ static int tcp_write_timeout(struct sock *sk)
 		return 1;
 	}
 
-	if (sk_rethink_txhash(sk)) {
+	if (sk_rethink_txhash(sk, rehash_mode)) {
 		tp->timeout_rehash++;
 		__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPTIMEOUTREHASH);
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH net-next 2/3] txhash: Add socket option to control TX hash rethink behavior
  2021-08-09 18:53 [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default Tom Herbert
  2021-08-09 18:53 ` [RFC PATCH net-next 1/3] txhash: Make rethinking txhash behavior configurable via sysctl Tom Herbert
@ 2021-08-09 18:53 ` Tom Herbert
  2021-08-09 18:53 ` [RFC PATCH net-next 3/3] txhash: Change default rethink behavior to be less aggressive Tom Herbert
  2021-08-09 21:56 ` [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default Yuchung Cheng
  3 siblings, 0 replies; 6+ messages in thread
From: Tom Herbert @ 2021-08-09 18:53 UTC (permalink / raw)
  To: netdev, davem, brakmo, ycheng, eric.dumazet, a.e.azimov; +Cc: Tom Herbert

Add the SO_TXREHASH_MODE socket option to control hash rethink behavior
per socket. The setsockopt argument is a mask of rethink modes
(SOCK_TXREHASH_MODE_NEG_ADVICE, SOCK_TXREHASH_MODE_SYN_RTO,
and SOCK_TXREHASH_MODE_RTO). The argument may also be -1U
(SOCK_TXREHASH_MODE_DEFAULT) which indicates that the default system
value should be used (see txrehash_mode sysctl)
---
 arch/alpha/include/uapi/asm/socket.h  |  2 ++
 arch/mips/include/uapi/asm/socket.h   |  2 ++
 arch/parisc/include/uapi/asm/socket.h |  2 ++
 arch/sparc/include/uapi/asm/socket.h  |  3 ++-
 include/net/sock.h                    |  8 +++++++-
 include/uapi/asm-generic/socket.h     |  2 ++
 net/core/sock.c                       | 16 ++++++++++++++++
 7 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 1dd9baf4a6c2..1165cdab5277 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -131,6 +131,8 @@
 
 #define SO_BUF_LOCK		72
 
+#define SO_TXREHASH_MODE	73
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 1eaf6a1ca561..91412f7725bd 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -142,6 +142,8 @@
 
 #define SO_BUF_LOCK		72
 
+#define SO_TXREHASH_MODE	73
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 8baaad52d799..80e0eddc6730 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -123,6 +123,8 @@
 
 #define SO_BUF_LOCK		0x4046
 
+#define SO_TXREHASH_MODE	0x4047
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index e80ee8641ac3..2fd5679e4116 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -124,8 +124,9 @@
 
 #define SO_BUF_LOCK              0x0051
 
-#if !defined(__KERNEL__)
+#define SO_TXREHASH_MODE	 0x0052
 
+#if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
 #define SO_TIMESTAMP		SO_TIMESTAMP_OLD
diff --git a/include/net/sock.h b/include/net/sock.h
index 6ef5314e8eed..b6ddb5278b8c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -313,6 +313,7 @@ struct bpf_local_storage;
   *	@sk_rcvtimeo: %SO_RCVTIMEO setting
   *	@sk_sndtimeo: %SO_SNDTIMEO setting
   *	@sk_txhash: computed flow hash for use on transmit
+  *	@sk_txrehash_mode: configuration bits for controlling TX hash rethink
   *	@sk_filter: socket filtering instructions
   *	@sk_timer: sock cleanup timer
   *	@sk_stamp: time stamp of last packet received
@@ -462,6 +463,7 @@ struct sock {
 	unsigned int		sk_gso_max_size;
 	gfp_t			sk_allocation;
 	__u32			sk_txhash;
+	unsigned int		sk_txrehash_mode;
 
 	/*
 	 * Because of non atomicity rules, all
@@ -1953,7 +1955,11 @@ static inline bool sk_rethink_txhash(struct sock *sk, unsigned int level)
 	if (!sk->sk_txhash)
 		return false;
 
-	rehash_mode = READ_ONCE(sock_net(sk)->core.sysctl_txrehash_mode);
+	if (sk->sk_txrehash_mode == SOCK_TXREHASH_MODE_DEFAULT)
+		rehash_mode =
+			READ_ONCE(sock_net(sk)->core.sysctl_txrehash_mode);
+	else
+		rehash_mode = sk->sk_txrehash_mode;
 
 	if (level & rehash_mode) {
 		sk_set_txhash(sk);
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 1f0a2b4864e4..daa775cc4108 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -126,6 +126,8 @@
 
 #define SO_BUF_LOCK		72
 
+#define SO_TXREHASH_MODE	73
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index aada649e07e8..946d9e9242c8 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1367,6 +1367,17 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 					  ~SOCK_BUF_LOCK_MASK);
 		break;
 
+	case SO_TXREHASH_MODE:
+		if (val == SOCK_TXREHASH_MODE_DEFAULT ||
+		    (val & ~SOCK_TXREHASH_MODE_MASK)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		sk->sk_txrehash_mode =  val;
+
+		break;
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1733,6 +1744,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_userlocks & SOCK_BUF_LOCK_MASK;
 		break;
 
+	case SO_TXREHASH_MODE:
+		v.val = sk->sk_txrehash_mode;
+		break;
+
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
@@ -3158,6 +3173,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_pacing_rate = ~0UL;
 	WRITE_ONCE(sk->sk_pacing_shift, 10);
 	sk->sk_incoming_cpu = -1;
+	sk->sk_txrehash_mode = SOCK_TXREHASH_MODE_DEFAULT;
 
 	sk_rx_queue_clear(sk);
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH net-next 3/3] txhash: Change default rethink behavior to be less aggressive
  2021-08-09 18:53 [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default Tom Herbert
  2021-08-09 18:53 ` [RFC PATCH net-next 1/3] txhash: Make rethinking txhash behavior configurable via sysctl Tom Herbert
  2021-08-09 18:53 ` [RFC PATCH net-next 2/3] txhash: Add socket option to control TX hash rethink behavior Tom Herbert
@ 2021-08-09 18:53 ` Tom Herbert
  2021-08-09 21:56 ` [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default Yuchung Cheng
  3 siblings, 0 replies; 6+ messages in thread
From: Tom Herbert @ 2021-08-09 18:53 UTC (permalink / raw)
  To: netdev, davem, brakmo, ycheng, eric.dumazet, a.e.azimov; +Cc: Tom Herbert

Revert the default rethink behavior to only do a rethink upon negative
advice (at three RTOs with current defaults). This is less aggressive
than the current default which is to rethink the hash at the first
RTO.

The rationale for this change is that IP anycast relies on consistent
routing and changing the hash may affect the routing of the packet
For instance, if the hash is changed then the flow label used for
a TCP connection is changed and so the routing of packets for the
connection may change. If the destination address is anycast, a
route change may direct packets to a different server than doesn't
have state for the connection thereby breaking the connection is broken.
---
 net/core/net_namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 03d3767e6728..bf9696dd7106 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -367,10 +367,8 @@ static int __net_init net_defaults_init_net(struct net *net)
 {
 	net->core.sysctl_somaxconn = SOMAXCONN;
 
-	/* Default rethink mode is aggrssive (i.e. rethink on first RTO) */
-	net->core.sysctl_txrehash_mode = SOCK_TXREHASH_MODE_NEG_ADVICE |
-					 SOCK_TXREHASH_MODE_SYN_RTO |
-					 SOCK_TXREHASH_MODE_RTO;
+	/* Default rethink mode is negative advice (i.e. not rthink on RTO) */
+	net->core.sysctl_txrehash_mode = SOCK_TXREHASH_MODE_NEG_ADVICE;
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default
  2021-08-09 18:53 [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default Tom Herbert
                   ` (2 preceding siblings ...)
  2021-08-09 18:53 ` [RFC PATCH net-next 3/3] txhash: Change default rethink behavior to be less aggressive Tom Herbert
@ 2021-08-09 21:56 ` Yuchung Cheng
  2021-08-09 21:58   ` Yuchung Cheng
  3 siblings, 1 reply; 6+ messages in thread
From: Yuchung Cheng @ 2021-08-09 21:56 UTC (permalink / raw)
  To: Tom Herbert; +Cc: netdev, davem, brakmo, eric.dumazet, a.e.azimov

On Mon, Aug 9, 2021 at 11:53 AM Tom Herbert <tom@herbertland.com> wrote:
>
> Alexander Azimov performed some nice analysis of the feature in Linux
> stack where the IPv6 flow label is changed when the stack detects a
> connection is failing. The idea of the algorithm is to try to find a
> better path. His reults are quite impressive, and show that this form
> of source routing can work effectively.
>
> Alex raised an issue in that if the server endpoint is an IP anycast
> address, the connection might break if the flow label changes routing
> of packets on the connection. Anycast is known to be susceptible to
> route changes, not just those caused be flow label. The concern is that
> flow label modulation might increases the chances that anycast
> connections might break, especially if the rethink occurs after just
> one RTO which is the current behavior.
>
> This patch set makes the rethink behavior granular and configurable.
> It allows control of when to do the hash rethink: upon negative advice,
> at RTO in SYN state, at RTO when not in SYN state. The behavior can
> be configured by sysctl and by a socket option.
>
> This patch set the defautl rethink behavior to be to do a rethink only
> on negative advice. This is reverts back to the original behavior of
> the hash rethink mechanism. This less aggressive with the intent of
Thanks for offering knobs to the txhash mechanism.

Any reason why reverting the default behavior (that was changed in
2013) is necessary? systems now rely on this RTO tx-rehash to work
around link failures will now have to manually re-enable it. Some
users may have to learn from higher connection failures to eventually
identify this kernel change.

> mitigating potentail breakages when anycast addresses are present.> For those users that are benefitting from changing the hash at the
> first RTO, they would retain that behavior by setting the sysctl.
> *** BLURB HERE ***
>
> Tom Herbert (3):
>   txhash: Make rethinking txhash behavior configurable via sysctl
>   txhash: Add socket option to control TX hash rethink behavior
>   txhash: Change default rethink behavior to be less aggressive
>
>  arch/alpha/include/uapi/asm/socket.h  |  2 ++
>  arch/mips/include/uapi/asm/socket.h   |  2 ++
>  arch/parisc/include/uapi/asm/socket.h |  2 ++
>  arch/sparc/include/uapi/asm/socket.h  |  3 ++-
>  include/net/netns/core.h              |  2 ++
>  include/net/sock.h                    | 32 +++++++++++++++++++--------
>  include/uapi/asm-generic/socket.h     |  2 ++
>  include/uapi/linux/socket.h           | 13 +++++++++++
>  net/core/net_namespace.c              |  4 ++++
>  net/core/sock.c                       | 16 ++++++++++++++
>  net/core/sysctl_net_core.c            |  7 ++++++
>  net/ipv4/tcp_input.c                  |  2 +-
>  net/ipv4/tcp_timer.c                  |  5 ++++-
>  13 files changed, 80 insertions(+), 12 deletions(-)
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default
  2021-08-09 21:56 ` [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default Yuchung Cheng
@ 2021-08-09 21:58   ` Yuchung Cheng
  0 siblings, 0 replies; 6+ messages in thread
From: Yuchung Cheng @ 2021-08-09 21:58 UTC (permalink / raw)
  To: Tom Herbert; +Cc: netdev, davem, brakmo, eric.dumazet, a.e.azimov

On Mon, Aug 9, 2021 at 2:56 PM Yuchung Cheng <ycheng@google.com> wrote:
>
> On Mon, Aug 9, 2021 at 11:53 AM Tom Herbert <tom@herbertland.com> wrote:
> >
> > Alexander Azimov performed some nice analysis of the feature in Linux
> > stack where the IPv6 flow label is changed when the stack detects a
> > connection is failing. The idea of the algorithm is to try to find a
> > better path. His reults are quite impressive, and show that this form
> > of source routing can work effectively.
> >
> > Alex raised an issue in that if the server endpoint is an IP anycast
> > address, the connection might break if the flow label changes routing
> > of packets on the connection. Anycast is known to be susceptible to
> > route changes, not just those caused be flow label. The concern is that
> > flow label modulation might increases the chances that anycast
> > connections might break, especially if the rethink occurs after just
> > one RTO which is the current behavior.
> >
> > This patch set makes the rethink behavior granular and configurable.
> > It allows control of when to do the hash rethink: upon negative advice,
> > at RTO in SYN state, at RTO when not in SYN state. The behavior can
> > be configured by sysctl and by a socket option.
> >
> > This patch set the defautl rethink behavior to be to do a rethink only
> > on negative advice. This is reverts back to the original behavior of
> > the hash rethink mechanism. This less aggressive with the intent of
> Thanks for offering knobs to the txhash mechanism.
>
> Any reason why reverting the default behavior (that was changed in
> 2013) is necessary? systems now rely on this RTO tx-rehash to work
> around link failures will now have to manually re-enable it. Some
> users may have to learn from higher connection failures to eventually
> identify this kernel change.
Just to be clear: I agree we should offer knobs to change the txhash
behavior so the first parts of this set looks good to me. I am only
concerned about the default behavior reversal.

>
> > mitigating potentail breakages when anycast addresses are present.> For those users that are benefitting from changing the hash at the
> > first RTO, they would retain that behavior by setting the sysctl.
> > *** BLURB HERE ***
> >
> > Tom Herbert (3):
> >   txhash: Make rethinking txhash behavior configurable via sysctl
> >   txhash: Add socket option to control TX hash rethink behavior
> >   txhash: Change default rethink behavior to be less aggressive
> >
> >  arch/alpha/include/uapi/asm/socket.h  |  2 ++
> >  arch/mips/include/uapi/asm/socket.h   |  2 ++
> >  arch/parisc/include/uapi/asm/socket.h |  2 ++
> >  arch/sparc/include/uapi/asm/socket.h  |  3 ++-
> >  include/net/netns/core.h              |  2 ++
> >  include/net/sock.h                    | 32 +++++++++++++++++++--------
> >  include/uapi/asm-generic/socket.h     |  2 ++
> >  include/uapi/linux/socket.h           | 13 +++++++++++
> >  net/core/net_namespace.c              |  4 ++++
> >  net/core/sock.c                       | 16 ++++++++++++++
> >  net/core/sysctl_net_core.c            |  7 ++++++
> >  net/ipv4/tcp_input.c                  |  2 +-
> >  net/ipv4/tcp_timer.c                  |  5 ++++-
> >  13 files changed, 80 insertions(+), 12 deletions(-)
> >
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-09 21:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-09 18:53 [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default Tom Herbert
2021-08-09 18:53 ` [RFC PATCH net-next 1/3] txhash: Make rethinking txhash behavior configurable via sysctl Tom Herbert
2021-08-09 18:53 ` [RFC PATCH net-next 2/3] txhash: Add socket option to control TX hash rethink behavior Tom Herbert
2021-08-09 18:53 ` [RFC PATCH net-next 3/3] txhash: Change default rethink behavior to be less aggressive Tom Herbert
2021-08-09 21:56 ` [RFC PATCH net-next 0/3] txhash: Make hash rethink configurable and change the default Yuchung Cheng
2021-08-09 21:58   ` Yuchung Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).