All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] netfilter: conntrack: improve SCTP multihoming
@ 2022-10-30 12:25 sriram.yagnaraman
  2022-10-30 12:25 ` [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry sriram.yagnaraman
  2022-10-30 12:25 ` [PATCH v2 2/2] netfilter: conntrack: add sctp DATA_SENT state sriram.yagnaraman
  0 siblings, 2 replies; 10+ messages in thread
From: sriram.yagnaraman @ 2022-10-30 12:25 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Sriram Yagnaraman

From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>

Changes since v1:
- Fixed kernel test robot reported issues on fallthrough

Original cover letter text:

This patch series introduces a couple of changes to improve SCTP multihoming support when running behind NAT.

An SCTP association having multiple alternative paths, will have different IP addreses but will still have to use the same SCTP port. This means all the paths that have an NAT/middlebox will have to co-ordinate and use the same source port after SNAT.
This patch series introduces a sysctl to disable source port randomization.

An SCTP endpoint is allowed to use alternative paths during the lifetime of an association. This makes it hard to write a stateful SCTP connection tracking module. This patch series adds a new conntrack state DATA_SENT that will be triggered on receiving a DATA/SACK chunk on a new conntrack entry. This state behaves similar to the existing HEARTBEAT_SENT state.

Sriram Yagnaraman (2):
  netfilter: conntrack: introduce no_random_port proc entry
  netfilter: conntrack: add sctp DATA_SENT state

 include/net/netns/conntrack.h                 |   1 +
 .../uapi/linux/netfilter/nf_conntrack_sctp.h  |   1 +
 .../linux/netfilter/nfnetlink_cttimeout.h     |   1 +
 net/netfilter/nf_conntrack_proto_sctp.c       | 107 +++++++++++-------
 net/netfilter/nf_conntrack_standalone.c       |  21 ++++
 net/netfilter/nf_nat_core.c                   |  10 +-
 6 files changed, 97 insertions(+), 44 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry
  2022-10-30 12:25 [PATCH v2 0/2] netfilter: conntrack: improve SCTP multihoming sriram.yagnaraman
@ 2022-10-30 12:25 ` sriram.yagnaraman
  2022-10-31  8:38   ` Florian Westphal
  2022-10-30 12:25 ` [PATCH v2 2/2] netfilter: conntrack: add sctp DATA_SENT state sriram.yagnaraman
  1 sibling, 1 reply; 10+ messages in thread
From: sriram.yagnaraman @ 2022-10-30 12:25 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Sriram Yagnaraman, kernel test robot

From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>

This patch introduces a new proc entry to disable source port
randomization for SCTP connections.

As specified in RFC9260 all transport addresses used by an SCTP endpoint
MUST use the same port number but can use multiple IP addresses. That
means that all paths taken within an SCTP association should have the
same port even if they pass through different NAT/middleboxes in the
network.

Disabling source port randomization provides a deterministic source port
for the remote SCTP endpoint for all paths used in the SCTP association.
On NAT/middlebox restarts we will always end up with the same port after
the restart, and the SCTP endpoints involved in the SCTP association can
remain transparent to restarts.

Of course, there is a downside as this makes it impossible to have
multiple SCTP endpoints behind NAT that use the same source port.
But, this is a lesser of a problem than losing an existing association
altogether.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
---
 include/net/netns/conntrack.h           |  1 +
 net/netfilter/nf_conntrack_proto_sctp.c |  3 +++
 net/netfilter/nf_conntrack_standalone.c | 13 +++++++++++++
 net/netfilter/nf_nat_core.c             | 10 +++++++++-
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index e1290c159184..097bed663805 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -60,6 +60,7 @@ struct nf_dccp_net {
 #ifdef CONFIG_NF_CT_PROTO_SCTP
 struct nf_sctp_net {
 	unsigned int timeouts[SCTP_CONNTRACK_MAX];
+	u8 sctp_no_random_port;
 };
 #endif
 
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index 5a936334b517..5e4d3215dcf6 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -699,6 +699,9 @@ void nf_conntrack_sctp_init_net(struct net *net)
 	 * 'new' timeout, like udp or icmp.
 	 */
 	sn->timeouts[0] = sctp_timeouts[SCTP_CONNTRACK_CLOSED];
+
+	/* leave source port randomization as true by default */
+	sn->sctp_no_random_port = 0;
 }
 
 const struct nf_conntrack_l4proto nf_conntrack_l4proto_sctp = {
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 4ffe84c5a82c..e35876ce418d 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -602,6 +602,7 @@ enum nf_ct_sysctl_index {
 	NF_SYSCTL_CT_PROTO_TIMEOUT_SCTP_SHUTDOWN_ACK_SENT,
 	NF_SYSCTL_CT_PROTO_TIMEOUT_SCTP_HEARTBEAT_SENT,
 	NF_SYSCTL_CT_PROTO_TIMEOUT_SCTP_HEARTBEAT_ACKED,
+	NF_SYSCTL_CT_PROTO_SCTP_NO_RANDOM_PORT,
 #endif
 #ifdef CONFIG_NF_CT_PROTO_DCCP
 	NF_SYSCTL_CT_PROTO_TIMEOUT_DCCP_REQUEST,
@@ -892,6 +893,14 @@ static struct ctl_table nf_ct_sysctl_table[] = {
 		.mode           = 0644,
 		.proc_handler   = proc_dointvec_jiffies,
 	},
+	[NF_SYSCTL_CT_PROTO_SCTP_NO_RANDOM_PORT] = {
+		.procname	= "nf_conntrack_sctp_no_random_port",
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 #endif
 #ifdef CONFIG_NF_CT_PROTO_DCCP
 	[NF_SYSCTL_CT_PROTO_TIMEOUT_DCCP_REQUEST] = {
@@ -1037,6 +1046,10 @@ static void nf_conntrack_standalone_init_sctp_sysctl(struct net *net,
 	XASSIGN(HEARTBEAT_SENT, sn);
 	XASSIGN(HEARTBEAT_ACKED, sn);
 #undef XASSIGN
+#define XASSIGN(XNAME, rval) \
+	table[NF_SYSCTL_CT_PROTO_SCTP_ ## XNAME].data = (rval)
+	XASSIGN(NO_RANDOM_PORT, &sn->sctp_no_random_port);
+#undef XASSIGN
 #endif
 }
 
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 18319a6e6806..cdcff0ed094c 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -19,6 +19,7 @@
 #include <net/netfilter/nf_conntrack_bpf.h>
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_helper.h>
+#include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_seqadj.h>
 #include <net/netfilter/nf_conntrack_zones.h>
 #include <net/netfilter/nf_nat.h>
@@ -422,10 +423,17 @@ static void nf_nat_l4proto_unique_tuple(struct nf_conntrack_tuple *tuple,
 		}
 		goto find_free_id;
 #endif
+	case IPPROTO_SCTP:
+		/* SCTP port randomization disabled, try to use the same source port
+		 * as in the original packet. Drop packets if another endpoint tries
+		 * to use same source port behind NAT.
+		 */
+		if (nf_sctp_pernet(nf_ct_net(ct))->sctp_no_random_port)
+			return;
+		fallthrough;
 	case IPPROTO_UDP:
 	case IPPROTO_UDPLITE:
 	case IPPROTO_TCP:
-	case IPPROTO_SCTP:
 	case IPPROTO_DCCP:
 		if (maniptype == NF_NAT_MANIP_SRC)
 			keyptr = &tuple->src.u.all;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/2] netfilter: conntrack: add sctp DATA_SENT state
  2022-10-30 12:25 [PATCH v2 0/2] netfilter: conntrack: improve SCTP multihoming sriram.yagnaraman
  2022-10-30 12:25 ` [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry sriram.yagnaraman
@ 2022-10-30 12:25 ` sriram.yagnaraman
  2022-11-02 14:02   ` Florian Westphal
  2022-11-21 11:20   ` Marcelo Ricardo Leitner
  1 sibling, 2 replies; 10+ messages in thread
From: sriram.yagnaraman @ 2022-10-30 12:25 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Sriram Yagnaraman

From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>

SCTP conntrack currently assumes that the SCTP endpoints will
probe secondary paths using HEARTBEAT before sending traffic.

But, according to RFC 9260, SCTP endpoints can send any traffic
on any of the confirmed paths after SCTP association is up.
SCTP endpoints that sends INIT will confirm all peer addresses
that upper layer configures, and the SCTP endpoint that receives
COOKIE_ECHO will only confirm the address it sent the INIT_ACK to.

So, we can have a situation where the INIT sender can start to
use secondary paths without the need to send HEARTBEAT. This patch
allows DATA/SACK packets to create new connection tracking entry.

A new state has been added to indicate that a DATA/SACK chunk has
been seen in the original direction - SCTP_CONNTRACK_DATA_SENT.
State transitions mostly follows the HEARTBEAT_SENT, except on
receiving HEARTBEAT/HEARTBEAT_ACK/DATA/SACK in the reply direction.

State transitions in original direction:
- DATA_SENT behaves similar to HEARTBEAT_SENT for all chunks,
   except that it remains in DATA_SENT on receving HEARTBEAT,
   HEARTBEAT_ACK/DATA/SACK chunks
State transitions in reply direction:
- DATA_SENT behaves similar to HEARTBEAT_SENT for all chunks,
   except that it moves to HEARTBEAT_ACKED on receiving
   HEARTBEAT/HEARTBEAT_ACK/DATA/SACK chunks

Note: This patch still doesn't solve the problem when the SCTP
endpoint decides to use primary paths for association establishment
but uses a secondary path for association shutdown. We still have
to depend on timeout for connections to expire in such a case.

Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
---
 .../uapi/linux/netfilter/nf_conntrack_sctp.h  |   1 +
 .../linux/netfilter/nfnetlink_cttimeout.h     |   1 +
 net/netfilter/nf_conntrack_proto_sctp.c       | 104 ++++++++++--------
 net/netfilter/nf_conntrack_standalone.c       |   8 ++
 4 files changed, 71 insertions(+), 43 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_conntrack_sctp.h b/include/uapi/linux/netfilter/nf_conntrack_sctp.h
index edc6ddab0de6..c742469afe21 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_sctp.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_sctp.h
@@ -16,6 +16,7 @@ enum sctp_conntrack {
 	SCTP_CONNTRACK_SHUTDOWN_ACK_SENT,
 	SCTP_CONNTRACK_HEARTBEAT_SENT,
 	SCTP_CONNTRACK_HEARTBEAT_ACKED,
+	SCTP_CONNTRACK_DATA_SENT,
 	SCTP_CONNTRACK_MAX
 };
 
diff --git a/include/uapi/linux/netfilter/nfnetlink_cttimeout.h b/include/uapi/linux/netfilter/nfnetlink_cttimeout.h
index 6b20fb22717b..94e74034706d 100644
--- a/include/uapi/linux/netfilter/nfnetlink_cttimeout.h
+++ b/include/uapi/linux/netfilter/nfnetlink_cttimeout.h
@@ -95,6 +95,7 @@ enum ctattr_timeout_sctp {
 	CTA_TIMEOUT_SCTP_SHUTDOWN_ACK_SENT,
 	CTA_TIMEOUT_SCTP_HEARTBEAT_SENT,
 	CTA_TIMEOUT_SCTP_HEARTBEAT_ACKED,
+	CTA_TIMEOUT_SCTP_DATA_SENT,
 	__CTA_TIMEOUT_SCTP_MAX
 };
 #define CTA_TIMEOUT_SCTP_MAX (__CTA_TIMEOUT_SCTP_MAX - 1)
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index 5e4d3215dcf6..d7f11145c7eb 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -60,6 +60,7 @@ static const unsigned int sctp_timeouts[SCTP_CONNTRACK_MAX] = {
 	[SCTP_CONNTRACK_SHUTDOWN_ACK_SENT]	= 3 SECS,
 	[SCTP_CONNTRACK_HEARTBEAT_SENT]		= 30 SECS,
 	[SCTP_CONNTRACK_HEARTBEAT_ACKED]	= 210 SECS,
+	[SCTP_CONNTRACK_DATA_SENT]		= 30 SECS,
 };
 
 #define	SCTP_FLAG_HEARTBEAT_VTAG_FAILED	1
@@ -74,6 +75,7 @@ static const unsigned int sctp_timeouts[SCTP_CONNTRACK_MAX] = {
 #define	sSA SCTP_CONNTRACK_SHUTDOWN_ACK_SENT
 #define	sHS SCTP_CONNTRACK_HEARTBEAT_SENT
 #define	sHA SCTP_CONNTRACK_HEARTBEAT_ACKED
+#define	sDS SCTP_CONNTRACK_DATA_SENT
 #define	sIV SCTP_CONNTRACK_MAX
 
 /*
@@ -90,15 +92,16 @@ COOKIE WAIT       - We have seen an INIT chunk in the original direction, or als
 COOKIE ECHOED     - We have seen a COOKIE_ECHO chunk in the original direction.
 ESTABLISHED       - We have seen a COOKIE_ACK in the reply direction.
 SHUTDOWN_SENT     - We have seen a SHUTDOWN chunk in the original direction.
-SHUTDOWN_RECD     - We have seen a SHUTDOWN chunk in the reply directoin.
+SHUTDOWN_RECD     - We have seen a SHUTDOWN chunk in the reply direction.
 SHUTDOWN_ACK_SENT - We have seen a SHUTDOWN_ACK chunk in the direction opposite
 		    to that of the SHUTDOWN chunk.
 CLOSED            - We have seen a SHUTDOWN_COMPLETE chunk in the direction of
 		    the SHUTDOWN chunk. Connection is closed.
 HEARTBEAT_SENT    - We have seen a HEARTBEAT in a new flow.
-HEARTBEAT_ACKED   - We have seen a HEARTBEAT-ACK in the direction opposite to
-		    that of the HEARTBEAT chunk. Secondary connection is
-		    established.
+HEARTBEAT_ACKED   - We have seen a HEARTBEAT-ACK/DATA/SACK in the direction
+		    opposite to that of the HEARTBEAT/DATA chunk. Secondary connection
+		 is established.
+DATA_SENT         - We have seen a DATA/SACK in a new flow.
 */
 
 /* TODO
@@ -112,36 +115,38 @@ cookie echoed to closed.
 */
 
 /* SCTP conntrack state transitions */
-static const u8 sctp_conntracks[2][11][SCTP_CONNTRACK_MAX] = {
+static const u8 sctp_conntracks[2][12][SCTP_CONNTRACK_MAX] = {
 	{
 /*	ORIGINAL	*/
-/*                  sNO, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA */
-/* init         */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCW, sHA},
-/* init_ack     */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCL, sHA},
-/* abort        */ {sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL},
-/* shutdown     */ {sCL, sCL, sCW, sCE, sSS, sSS, sSR, sSA, sCL, sSS},
-/* shutdown_ack */ {sSA, sCL, sCW, sCE, sES, sSA, sSA, sSA, sSA, sHA},
-/* error        */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCL, sHA},/* Can't have Stale cookie*/
-/* cookie_echo  */ {sCL, sCL, sCE, sCE, sES, sSS, sSR, sSA, sCL, sHA},/* 5.2.4 - Big TODO */
-/* cookie_ack   */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCL, sHA},/* Can't come in orig dir */
-/* shutdown_comp*/ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sCL, sCL, sHA},
-/* heartbeat    */ {sHS, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA},
-/* heartbeat_ack*/ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA}
+/*                  sNO, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA, sDS */
+/* init         */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCW, sHA, sCW},
+/* init_ack     */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCL, sHA, sCL},
+/* abort        */ {sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL},
+/* shutdown     */ {sCL, sCL, sCW, sCE, sSS, sSS, sSR, sSA, sCL, sSS, sCL},
+/* shutdown_ack */ {sSA, sCL, sCW, sCE, sES, sSA, sSA, sSA, sSA, sHA, sSA},
+/* error        */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCL, sHA, sCL},/* Can't have Stale cookie*/
+/* cookie_echo  */ {sCL, sCL, sCE, sCE, sES, sSS, sSR, sSA, sCL, sHA, sCL},/* 5.2.4 - Big TODO */
+/* cookie_ack   */ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sCL, sHA, sCL},/* Can't come in orig dir */
+/* shutdown_comp*/ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sCL, sCL, sHA, sCL},
+/* heartbeat    */ {sHS, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA, sDS},
+/* heartbeat_ack*/ {sCL, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA, sDS},
+/* data/sack    */ {sDS, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA, sDS}
 	},
 	{
 /*	REPLY	*/
-/*                  sNO, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA */
-/* init         */ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sIV, sHA},/* INIT in sCL Big TODO */
-/* init_ack     */ {sIV, sCW, sCW, sCE, sES, sSS, sSR, sSA, sIV, sHA},
-/* abort        */ {sIV, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sIV, sCL},
-/* shutdown     */ {sIV, sCL, sCW, sCE, sSR, sSS, sSR, sSA, sIV, sSR},
-/* shutdown_ack */ {sIV, sCL, sCW, sCE, sES, sSA, sSA, sSA, sIV, sHA},
-/* error        */ {sIV, sCL, sCW, sCL, sES, sSS, sSR, sSA, sIV, sHA},
-/* cookie_echo  */ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sIV, sHA},/* Can't come in reply dir */
-/* cookie_ack   */ {sIV, sCL, sCW, sES, sES, sSS, sSR, sSA, sIV, sHA},
-/* shutdown_comp*/ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sCL, sIV, sHA},
-/* heartbeat    */ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA},
-/* heartbeat_ack*/ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHA, sHA}
+/*                  sNO, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA, sDS */
+/* init         */ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sIV, sHA, sIV},/* INIT in sCL Big TODO */
+/* init_ack     */ {sIV, sCW, sCW, sCE, sES, sSS, sSR, sSA, sIV, sHA, sIV},
+/* abort        */ {sIV, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sIV, sCL, sIV},
+/* shutdown     */ {sIV, sCL, sCW, sCE, sSR, sSS, sSR, sSA, sIV, sSR, sIV},
+/* shutdown_ack */ {sIV, sCL, sCW, sCE, sES, sSA, sSA, sSA, sIV, sHA, sIV},
+/* error        */ {sIV, sCL, sCW, sCL, sES, sSS, sSR, sSA, sIV, sHA, sIV},
+/* cookie_echo  */ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sIV, sHA, sIV},/* Can't come in reply dir */
+/* cookie_ack   */ {sIV, sCL, sCW, sES, sES, sSS, sSR, sSA, sIV, sHA, sIV},
+/* shutdown_comp*/ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sCL, sIV, sHA, sIV},
+/* heartbeat    */ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHS, sHA, sHA},
+/* heartbeat_ack*/ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHA, sHA, sHA},
+/* data/sack    */ {sIV, sCL, sCW, sCE, sES, sSS, sSR, sSA, sHA, sHA, sHA},
 	}
 };
 
@@ -253,6 +258,11 @@ static int sctp_new_state(enum ip_conntrack_dir dir,
 		pr_debug("SCTP_CID_HEARTBEAT_ACK");
 		i = 10;
 		break;
+	case SCTP_CID_DATA:
+	case SCTP_CID_SACK:
+		pr_debug("SCTP_CID_DATA/SACK");
+		i = 11;
+		break;
 	default:
 		/* Other chunks like DATA or SACK do not change the state */
 		pr_debug("Unknown chunk type, Will stay in %s\n",
@@ -306,7 +316,9 @@ sctp_new(struct nf_conn *ct, const struct sk_buff *skb,
 				 ih->init_tag);
 
 			ct->proto.sctp.vtag[IP_CT_DIR_REPLY] = ih->init_tag;
-		} else if (sch->type == SCTP_CID_HEARTBEAT) {
+		} else if (sch->type == SCTP_CID_HEARTBEAT ||
+				sch->type == SCTP_CID_DATA ||
+				sch->type == SCTP_CID_SACK) {
 			pr_debug("Setting vtag %x for secondary conntrack\n",
 				 sh->vtag);
 			ct->proto.sctp.vtag[IP_CT_DIR_ORIGINAL] = sh->vtag;
@@ -392,19 +404,19 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct,
 
 		if (!sctp_new(ct, skb, sh, dataoff))
 			return -NF_ACCEPT;
-	}
-
-	/* Check the verification tag (Sec 8.5) */
-	if (!test_bit(SCTP_CID_INIT, map) &&
-	    !test_bit(SCTP_CID_SHUTDOWN_COMPLETE, map) &&
-	    !test_bit(SCTP_CID_COOKIE_ECHO, map) &&
-	    !test_bit(SCTP_CID_ABORT, map) &&
-	    !test_bit(SCTP_CID_SHUTDOWN_ACK, map) &&
-	    !test_bit(SCTP_CID_HEARTBEAT, map) &&
-	    !test_bit(SCTP_CID_HEARTBEAT_ACK, map) &&
-	    sh->vtag != ct->proto.sctp.vtag[dir]) {
-		pr_debug("Verification tag check failed\n");
-		goto out;
+	} else {
+		/* Check the verification tag (Sec 8.5) */
+		if (!test_bit(SCTP_CID_INIT, map) &&
+		    !test_bit(SCTP_CID_SHUTDOWN_COMPLETE, map) &&
+		    !test_bit(SCTP_CID_COOKIE_ECHO, map) &&
+		    !test_bit(SCTP_CID_ABORT, map) &&
+		    !test_bit(SCTP_CID_SHUTDOWN_ACK, map) &&
+		    !test_bit(SCTP_CID_HEARTBEAT, map) &&
+		    !test_bit(SCTP_CID_HEARTBEAT_ACK, map) &&
+		    sh->vtag != ct->proto.sctp.vtag[dir]) {
+			pr_debug("Verification tag check failed\n");
+			goto out;
+		}
 	}
 
 	old_state = new_state = SCTP_CONNTRACK_NONE;
@@ -464,6 +476,11 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct,
 			} else if (ct->proto.sctp.flags & SCTP_FLAG_HEARTBEAT_VTAG_FAILED) {
 				ct->proto.sctp.flags &= ~SCTP_FLAG_HEARTBEAT_VTAG_FAILED;
 			}
+		} else if (sch->type == SCTP_CID_DATA || sch->type == SCTP_CID_SACK) {
+			if (ct->proto.sctp.vtag[dir] == 0) {
+				pr_debug("Setting vtag %x for dir %d\n", sh->vtag, dir);
+				ct->proto.sctp.vtag[dir] = sh->vtag;
+			}
 		}
 
 		old_state = ct->proto.sctp.state;
@@ -684,6 +701,7 @@ sctp_timeout_nla_policy[CTA_TIMEOUT_SCTP_MAX+1] = {
 	[CTA_TIMEOUT_SCTP_SHUTDOWN_ACK_SENT]	= { .type = NLA_U32 },
 	[CTA_TIMEOUT_SCTP_HEARTBEAT_SENT]	= { .type = NLA_U32 },
 	[CTA_TIMEOUT_SCTP_HEARTBEAT_ACKED]	= { .type = NLA_U32 },
+	[CTA_TIMEOUT_SCTP_DATA_SENT]		= { .type = NLA_U32 },
 };
 #endif /* CONFIG_NF_CONNTRACK_TIMEOUT */
 
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index e35876ce418d..15199f00e33f 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -602,6 +602,7 @@ enum nf_ct_sysctl_index {
 	NF_SYSCTL_CT_PROTO_TIMEOUT_SCTP_SHUTDOWN_ACK_SENT,
 	NF_SYSCTL_CT_PROTO_TIMEOUT_SCTP_HEARTBEAT_SENT,
 	NF_SYSCTL_CT_PROTO_TIMEOUT_SCTP_HEARTBEAT_ACKED,
+	NF_SYSCTL_CT_PROTO_TIMEOUT_SCTP_DATA_SENT,
 	NF_SYSCTL_CT_PROTO_SCTP_NO_RANDOM_PORT,
 #endif
 #ifdef CONFIG_NF_CT_PROTO_DCCP
@@ -893,6 +894,12 @@ static struct ctl_table nf_ct_sysctl_table[] = {
 		.mode           = 0644,
 		.proc_handler   = proc_dointvec_jiffies,
 	},
+	[NF_SYSCTL_CT_PROTO_TIMEOUT_SCTP_DATA_SENT] = {
+		.procname       = "nf_conntrack_sctp_timeout_data_sent",
+		.maxlen         = sizeof(unsigned int),
+		.mode           = 0644,
+		.proc_handler   = proc_dointvec_jiffies,
+	},
 	[NF_SYSCTL_CT_PROTO_SCTP_NO_RANDOM_PORT] = {
 		.procname	= "nf_conntrack_sctp_no_random_port",
 		.maxlen		= sizeof(u8),
@@ -1045,6 +1052,7 @@ static void nf_conntrack_standalone_init_sctp_sysctl(struct net *net,
 	XASSIGN(SHUTDOWN_ACK_SENT, sn);
 	XASSIGN(HEARTBEAT_SENT, sn);
 	XASSIGN(HEARTBEAT_ACKED, sn);
+	XASSIGN(DATA_SENT, sn);
 #undef XASSIGN
 #define XASSIGN(XNAME, rval) \
 	table[NF_SYSCTL_CT_PROTO_SCTP_ ## XNAME].data = (rval)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry
  2022-10-30 12:25 ` [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry sriram.yagnaraman
@ 2022-10-31  8:38   ` Florian Westphal
  2022-10-31 18:41     ` Sriram Yagnaraman
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2022-10-31  8:38 UTC (permalink / raw)
  To: sriram.yagnaraman; +Cc: netfilter-devel, kernel test robot

sriram.yagnaraman@est.tech <sriram.yagnaraman@est.tech> wrote:
> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
> 
> This patch introduces a new proc entry to disable source port
> randomization for SCTP connections.

Hmm.  Can you elaborate?  The sport is never randomized, unless either
1. User explicitly requested it via "random" flag passed to snat rule, or
2. the is an existing connection, using the *same* sport:saddr -> daddr:dport
   quadruple as the new request.

In 2), this new toggle prevents communication.  So I wonder why ...

> As specified in RFC9260 all transport addresses used by an SCTP endpoint
> MUST use the same port number but can use multiple IP addresses. That
> means that all paths taken within an SCTP association should have the
> same port even if they pass through different NAT/middleboxes in the
> network.

... the rfc mandates this, especially given the fact that endpoints have
0 control on middlebox behaviour.

This flag will completely prevent communication in case another
middlebox does sport randomization, so I wonder why its needed -- I see
no advantages but I see a downside.

> Disabling source port randomization provides a deterministic source port
> for the remote SCTP endpoint for all paths used in the SCTP association.
> On NAT/middlebox restarts we will always end up with the same port after
> the restart, and the SCTP endpoints involved in the SCTP association can
> remain transparent to restarts.

Can you elaborate? If we're the middlebox and we restarted, we have no
record of the "old" incarnation so there is no sport reallocation.

> Of course, there is a downside as this makes it impossible to have
> multiple SCTP endpoints behind NAT that use the same source port.

Hmm?  Not following.

1.2.3.4:12345 -> 5.6.7.8:42
1.2.3.4:12345 -> 5.6.7.8:43

... should work fine. Same for
1.2.3.4:12345 -> 5.6.7.8:42
1.2.3.4:12345 -> 5.6.7.9:42

> But, this is a lesser of a problem than losing an existing association
> altogether.

Can you elaborate?  How is an existing assocation lost?
For example, what sequence of events is needed to result in loss of
an existing association?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry
  2022-10-31  8:38   ` Florian Westphal
@ 2022-10-31 18:41     ` Sriram Yagnaraman
  2022-11-02 14:00       ` Florian Westphal
  0 siblings, 1 reply; 10+ messages in thread
From: Sriram Yagnaraman @ 2022-10-31 18:41 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, kernel test robot, claudio.porfiri

On 2022-10-31 09:38, Florian Westphal wrote:

> sriram.yagnaraman@est.tech <sriram.yagnaraman@est.tech> wrote:
>> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
>>
>> This patch introduces a new proc entry to disable source port
>> randomization for SCTP connections.
> Hmm.  Can you elaborate?  The sport is never randomized, unless either
> 1. User explicitly requested it via "random" flag passed to snat rule, or
> 2. the is an existing connection, using the *same* sport:saddr -> daddr:dport
>    quadruple as the new request.
>
> In 2), this new toggle prevents communication.  So I wonder why ...

Thank you so much for the detailed review comments.

My use case for this flag originates from a deployment of SCTP client
endpoints on docker/kubernetes environments, where typically there exists
SNAT rules for the endpoints on egress. The *user* in this case are the
CNI plugins that configure the SNAT rules, and some of the most common
plugins use --random-fully regardless of the protocol.

Consider an SCTP association A -> B, which has two paths via NAT A and B
A: 1.2.3.4:12345
B: 5.6.7.8/9:42
NAT A: 1.2.31.4 (used for path towards 5.6.7.8)
NAT B: 1.2.32.4 (used for path towards 5.6.7.9)

              ┌───────┐   ┌───┐
           ┌──► NAT A ├───►   │
 ┌─────┐   │  └───────┘   │   │
 │  A  ├───┤              │ B │
 └─────┘   │  ┌───────┐   │   │
           └──► NAT B ├───►   │
              └───────┘   └───┘

Let us assume in NAT A (1.2.31.4), the connections is setup as
	ORIGINAL TUPLE		    REPLY TUPLE
1.2.3.4:12345 -> 5.6.7.8:42, 5.6.7.8.42 -> 1.2.31.4:33333

Let us assume in NAT B (1.2.32.4), the connections is setup as
	ORIGINAL TUPLE		    REPLY TUPLE
1.2.3.4:12345 -> 5.6.7.9:42, 5.6.7.8.42 -> 1.2.32.4:44444

Since the port numbers are different when viewed from B, the association
will not become multihomed, with only the primary path being active.
Moreover, on a NAT/middlebox restart, we will end up getting new ports.

I understand this is a problem in the way SNAT rules are configured, my
proposal was to have this flag as a means of preventing such a problem
even if the user wanted to.

>> As specified in RFC9260 all transport addresses used by an SCTP endpoint
>> MUST use the same port number but can use multiple IP addresses. That
>> means that all paths taken within an SCTP association should have the
>> same port even if they pass through different NAT/middleboxes in the
>> network.
> ... the rfc mandates this, especially given the fact that endpoints have
> 0 control on middlebox behaviour.
>
> This flag will completely prevent communication in case another
> middlebox does sport randomization, so I wonder why its needed -- I see
> no advantages but I see a downside.

Since the flag is optional, the idea is to enable it only on hosts that
are part of docker/kubernetes environments and use NAT in their datapath.

>> Disabling source port randomization provides a deterministic source port
>> for the remote SCTP endpoint for all paths used in the SCTP association.
>> On NAT/middlebox restarts we will always end up with the same port after
>> the restart, and the SCTP endpoints involved in the SCTP association can
>> remain transparent to restarts.
> Can you elaborate? If we're the middlebox and we restarted, we have no
> record of the "old" incarnation so there is no sport reallocation.
>
>> Of course, there is a downside as this makes it impossible to have
>> multiple SCTP endpoints behind NAT that use the same source port.
> Hmm?  Not following.
>
> 1.2.3.4:12345 -> 5.6.7.8:42
> 1.2.3.4:12345 -> 5.6.7.8:43
>
> ... should work fine. Same for
> 1.2.3.4:12345 -> 5.6.7.8:42
> 1.2.3.4:12345 -> 5.6.7.9:42

I meant to note the limitation you rightly pointed above, that when there
is an existing connection, using the *same* sport:saddr -> dport:daddr, the
new connection attempt will be dropped. For e.g.

1.2.3.41:12345 -> 5.6.7.8:42 (Existing connection)
1.2.3.42:12345 -> 5.6.7.8:42 (This connection request will fail)

Will be translated to conflicting reply tuples
1.2.3.40:12345 <- 5.6.7.8:42
1.2.3.40:12345 <- 5.6.7.8:42

>> But, this is a lesser of a problem than losing an existing association
>> altogether.
> Can you elaborate?  How is an existing assocation lost?
> For example, what sequence of events is needed to result in loss of
> an existing association?

Consider two SCTP associations A -> C and B -> C
A: 1.2.3.41:12345
B: 1.2.3.42:12345
C: 5.6.7.8:42 
NAT public IP: 1.2.3.40

  ┌─────┐
  │  A  ├───┐
  └─────┘   │    ┌─────┐   ┌─────┐
            ├────► NAT ├───►  C  │
  ┌─────┐   │    └─────┘   └─────┘
  │  B  ├───┘
  └─────┘

Let us assume in NAT (1.2.3.40), the connections are setup as
	ORIGINAL TUPLE		    REPLY TUPLE
1.2.3.41:12345 -> 5.6.7.8:42, 5.6.7.8.42 -> 1.2.3.40:12345
1.2.3.42:12345 -> 5.6.7.8:42, 5.6.7.8.42 -> 1.2.3.40:44444

After a restart there is a chance that the connections will be setup as
	ORIGINAL TUPLE		    REPLY TUPLE
1.2.3.41:12345 -> 5.6.7.8:42, 5.6.7.8.42 -> 1.2.3.40:55555
1.2.3.42:12345 -> 5.6.7.8:42, 5.6.7.8.42 -> 1.2.3.40:12345

From C's point of view the port numbers for the two associations will be 
different after a restart, hence there can be no communication using the
existing associations.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry
  2022-10-31 18:41     ` Sriram Yagnaraman
@ 2022-11-02 14:00       ` Florian Westphal
  2022-11-03 20:02         ` Sriram Yagnaraman
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2022-11-02 14:00 UTC (permalink / raw)
  To: Sriram Yagnaraman
  Cc: Florian Westphal, netfilter-devel, kernel test robot, claudio.porfiri

Sriram Yagnaraman <sriram.yagnaraman@est.tech> wrote:
> On 2022-10-31 09:38, Florian Westphal wrote:
> 
> > sriram.yagnaraman@est.tech <sriram.yagnaraman@est.tech> wrote:
> >> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
> >>
> >> This patch introduces a new proc entry to disable source port
> >> randomization for SCTP connections.
> > Hmm.  Can you elaborate?  The sport is never randomized, unless either
> > 1. User explicitly requested it via "random" flag passed to snat rule, or
> > 2. the is an existing connection, using the *same* sport:saddr -> daddr:dport
> >    quadruple as the new request.
> >
> > In 2), this new toggle prevents communication.  So I wonder why ...
> 
> Thank you so much for the detailed review comments.
> 
> My use case for this flag originates from a deployment of SCTP client
> endpoints on docker/kubernetes environments, where typically there exists
> SNAT rules for the endpoints on egress. The *user* in this case are the
> CNI plugins that configure the SNAT rules, and some of the most common
> plugins use --random-fully regardless of the protocol.
> 
> Consider an SCTP association A -> B, which has two paths via NAT A and B
> A: 1.2.3.4:12345
> B: 5.6.7.8/9:42
> NAT A: 1.2.31.4 (used for path towards 5.6.7.8)
> NAT B: 1.2.32.4 (used for path towards 5.6.7.9)
> 
>               ┌───────┐   ┌───┐
>            ┌──► NAT A ├───►   │
>  ┌─────┐   │  └───────┘   │   │
>  │  A  ├───┤              │ B │
>  └─────┘   │  ┌───────┐   │   │
>            └──► NAT B ├───►   │
>               └───────┘   └───┘
> 
> Let us assume in NAT A (1.2.31.4), the connections is setup as
> 	ORIGINAL TUPLE		    REPLY TUPLE
> 1.2.3.4:12345 -> 5.6.7.8:42, 5.6.7.8.42 -> 1.2.31.4:33333
> 
> Let us assume in NAT B (1.2.32.4), the connections is setup as
> 	ORIGINAL TUPLE		    REPLY TUPLE
> 1.2.3.4:12345 -> 5.6.7.9:42, 5.6.7.8.42 -> 1.2.32.4:44444
> 
> Since the port numbers are different when viewed from B, the association
> will not become multihomed, with only the primary path being active.
> Moreover, on a NAT/middlebox restart, we will end up getting new ports.
>
> I understand this is a problem in the way SNAT rules are configured, my
> proposal was to have this flag as a means of preventing such a problem
> even if the user wanted to.

Ugh, sorry, but that sounds just wrong.

> >> As specified in RFC9260 all transport addresses used by an SCTP endpoint
> >> MUST use the same port number but can use multiple IP addresses. That
> >> means that all paths taken within an SCTP association should have the
> >> same port even if they pass through different NAT/middleboxes in the
> >> network.

Hmm, I don't understand WHY this requirement exists, since endpoints
cannot control source port (or source address) seen by the peer;
NAT won't go away.

I read that snippet several times and its not clear to me if
"port number" refers to sport or dport.  Dport would make sense to me,
but sport...?  No, not really.

Won't the endpoints notice that the path is down and re-create the flow?

AFAIU the root cause of your problem is that:
1. NAT middleboxes remap source port AND
2. NAT middleboxes restart frequently

... so fixing either 1 or 2 would avoid the problem.

I don't think adding sysctls to override 1) is a sane option.

> Since the flag is optional, the idea is to enable it only on hosts that
> are part of docker/kubernetes environments and use NAT in their datapath.

We can't fix the ruleset but we can somehow cure it via sysctl in each netns?
I don't like this.

NAT middlebox restart with --random is a problem in any case, not just
for SCTP, because the chosen "random port" is lost.

I don't see a way to fix this, unless NOT using --random mode.
If connection is subject to sequence number rewrite (for tcp)
the connection won't survive either as the sejadj state is lost.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] netfilter: conntrack: add sctp DATA_SENT state
  2022-10-30 12:25 ` [PATCH v2 2/2] netfilter: conntrack: add sctp DATA_SENT state sriram.yagnaraman
@ 2022-11-02 14:02   ` Florian Westphal
  2022-11-21 11:20   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2022-11-02 14:02 UTC (permalink / raw)
  To: sriram.yagnaraman; +Cc: netfilter-devel

sriram.yagnaraman@est.tech <sriram.yagnaraman@est.tech> wrote:
> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
> 
> SCTP conntrack currently assumes that the SCTP endpoints will
> probe secondary paths using HEARTBEAT before sending traffic.
> 
> But, according to RFC 9260, SCTP endpoints can send any traffic
> on any of the confirmed paths after SCTP association is up.
> SCTP endpoints that sends INIT will confirm all peer addresses
> that upper layer configures, and the SCTP endpoint that receives
> COOKIE_ECHO will only confirm the address it sent the INIT_ACK to.
> 
> So, we can have a situation where the INIT sender can start to
> use secondary paths without the need to send HEARTBEAT. This patch
> allows DATA/SACK packets to create new connection tracking entry.
> 
> A new state has been added to indicate that a DATA/SACK chunk has
> been seen in the original direction - SCTP_CONNTRACK_DATA_SENT.
> State transitions mostly follows the HEARTBEAT_SENT, except on
> receiving HEARTBEAT/HEARTBEAT_ACK/DATA/SACK in the reply direction.
> 
> State transitions in original direction:
> - DATA_SENT behaves similar to HEARTBEAT_SENT for all chunks,
>    except that it remains in DATA_SENT on receving HEARTBEAT,
>    HEARTBEAT_ACK/DATA/SACK chunks
> State transitions in reply direction:
> - DATA_SENT behaves similar to HEARTBEAT_SENT for all chunks,
>    except that it moves to HEARTBEAT_ACKED on receiving
>    HEARTBEAT/HEARTBEAT_ACK/DATA/SACK chunks

Looks OK to me.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry
  2022-11-02 14:00       ` Florian Westphal
@ 2022-11-03 20:02         ` Sriram Yagnaraman
  2022-11-21 11:24           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 10+ messages in thread
From: Sriram Yagnaraman @ 2022-11-03 20:02 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, kernel test robot, claudio.porfiri

On 2022-11-02 15:00, Florian Westphal wrote:

> Sriram Yagnaraman <sriram.yagnaraman@est.tech> wrote:
>> On 2022-10-31 09:38, Florian Westphal wrote:
>>
>>> sriram.yagnaraman@est.tech <sriram.yagnaraman@est.tech> wrote:
>>>> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
>>>>
>>>> This patch introduces a new proc entry to disable source port
>>>> randomization for SCTP connections.
>>> Hmm.  Can you elaborate?  The sport is never randomized, unless either
>>> 1. User explicitly requested it via "random" flag passed to snat rule, or
>>> 2. the is an existing connection, using the *same* sport:saddr -> daddr:dport
>>>    quadruple as the new request.
>>>
>>> In 2), this new toggle prevents communication.  So I wonder why ...
>> Thank you so much for the detailed review comments.
>>
>> My use case for this flag originates from a deployment of SCTP client
>> endpoints on docker/kubernetes environments, where typically there exists
>> SNAT rules for the endpoints on egress. The *user* in this case are the
>> CNI plugins that configure the SNAT rules, and some of the most common
>> plugins use --random-fully regardless of the protocol.
>>
>> Consider an SCTP association A -> B, which has two paths via NAT A and B
>> A: 1.2.3.4:12345
>> B: 5.6.7.8/9:42
>> NAT A: 1.2.31.4 (used for path towards 5.6.7.8)
>> NAT B: 1.2.32.4 (used for path towards 5.6.7.9)
>>
>>               ┌───────┐   ┌───┐
>>            ┌──► NAT A ├───►   │
>>  ┌─────┐   │  └───────┘   │   │
>>  │  A  ├───┤              │ B │
>>  └─────┘   │  ┌───────┐   │   │
>>            └──► NAT B ├───►   │
>>               └───────┘   └───┘
>>
>> Let us assume in NAT A (1.2.31.4), the connections is setup as
>> 	ORIGINAL TUPLE		    REPLY TUPLE
>> 1.2.3.4:12345 -> 5.6.7.8:42, 5.6.7.8.42 -> 1.2.31.4:33333
>>
>> Let us assume in NAT B (1.2.32.4), the connections is setup as
>> 	ORIGINAL TUPLE		    REPLY TUPLE
>> 1.2.3.4:12345 -> 5.6.7.9:42, 5.6.7.8.42 -> 1.2.32.4:44444
>>
>> Since the port numbers are different when viewed from B, the association
>> will not become multihomed, with only the primary path being active.
>> Moreover, on a NAT/middlebox restart, we will end up getting new ports.
>>
>> I understand this is a problem in the way SNAT rules are configured, my
>> proposal was to have this flag as a means of preventing such a problem
>> even if the user wanted to.
> Ugh, sorry, but that sounds just wrong.

Ok, I hear that. :)

>
>>>> As specified in RFC9260 all transport addresses used by an SCTP endpoint
>>>> MUST use the same port number but can use multiple IP addresses. That
>>>> means that all paths taken within an SCTP association should have the
>>>> same port even if they pass through different NAT/middleboxes in the
>>>> network.
> Hmm, I don't understand WHY this requirement exists, since endpoints
> cannot control source port (or source address) seen by the peer;
> NAT won't go away.
>
> I read that snippet several times and its not clear to me if
> "port number" refers to sport or dport.  Dport would make sense to me,
> but sport...?  No, not really.

I am just an interpreter of the standard but AFAIU, port means both source
and destination port. Section 1.3 of RFC 9260 defining an SCTP endpoint.
In any case, running SCTP on UDP is probably the best way to workaround
the SCTP NAT problem.

>
> Won't the endpoints notice that the path is down and re-create the flow?
>
> AFAIU the root cause of your problem is that:
> 1. NAT middleboxes remap source port AND
> 2. NAT middleboxes restart frequently
>
> ... so fixing either 1 or 2 would avoid the problem.
>
> I don't think adding sysctls to override 1) is a sane option.

Yeah the endpoints does try to re-create the flows, but if we have
multiple middle boxes remapping the source port, there is no guarantee
that they will remap to the same source port.
1) is the main problem that I was trying to address with this patch.

>> Since the flag is optional, the idea is to enable it only on hosts that
>> are part of docker/kubernetes environments and use NAT in their datapath.
> We can't fix the ruleset but we can somehow cure it via sysctl in each netns?
> I don't like this.
>
> NAT middlebox restart with --random is a problem in any case, not just
> for SCTP, because the chosen "random port" is lost.
>
> I don't see a way to fix this, unless NOT using --random mode.
> If connection is subject to sequence number rewrite (for tcp)
> the connection won't survive either as the sejadj state is lost.

Ok, I understand your point. I agree it doesn't make sense to have an
alternative configuration option to avoid this problem. I will try to
convince the "users" if --random-fully is not used for SCTP.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] netfilter: conntrack: add sctp DATA_SENT state
  2022-10-30 12:25 ` [PATCH v2 2/2] netfilter: conntrack: add sctp DATA_SENT state sriram.yagnaraman
  2022-11-02 14:02   ` Florian Westphal
@ 2022-11-21 11:20   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 10+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-11-21 11:20 UTC (permalink / raw)
  To: sriram.yagnaraman; +Cc: netfilter-devel

On Sun, Oct 30, 2022 at 01:25:41PM +0100, sriram.yagnaraman@est.tech wrote:
> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
>
> SCTP conntrack currently assumes that the SCTP endpoints will
> probe secondary paths using HEARTBEAT before sending traffic.
...

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry
  2022-11-03 20:02         ` Sriram Yagnaraman
@ 2022-11-21 11:24           ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 10+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-11-21 11:24 UTC (permalink / raw)
  To: Sriram Yagnaraman
  Cc: Florian Westphal, netfilter-devel, kernel test robot,
	claudio.porfiri, lxin

On Thu, Nov 03, 2022 at 08:02:08PM +0000, Sriram Yagnaraman wrote:
> On 2022-11-02 15:00, Florian Westphal wrote:
>
> > Sriram Yagnaraman <sriram.yagnaraman@est.tech> wrote:
> >> On 2022-10-31 09:38, Florian Westphal wrote:
> >>
> >>> sriram.yagnaraman@est.tech <sriram.yagnaraman@est.tech> wrote:
> >>>> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
> >>>>
> >>>> This patch introduces a new proc entry to disable source port
> >>>> randomization for SCTP connections.
> >>> Hmm.  Can you elaborate?  The sport is never randomized, unless either
> >>> 1. User explicitly requested it via "random" flag passed to snat rule, or
> >>> 2. the is an existing connection, using the *same* sport:saddr -> daddr:dport
> >>>    quadruple as the new request.
> >>>
> >>> In 2), this new toggle prevents communication.  So I wonder why ...
> >> Thank you so much for the detailed review comments.
> >>
> >> My use case for this flag originates from a deployment of SCTP client
> >> endpoints on docker/kubernetes environments, where typically there exists
> >> SNAT rules for the endpoints on egress. The *user* in this case are the
> >> CNI plugins that configure the SNAT rules, and some of the most common
> >> plugins use --random-fully regardless of the protocol.
> >>
> >> Consider an SCTP association A -> B, which has two paths via NAT A and B
> >> A: 1.2.3.4:12345
> >> B: 5.6.7.8/9:42
> >> NAT A: 1.2.31.4 (used for path towards 5.6.7.8)
> >> NAT B: 1.2.32.4 (used for path towards 5.6.7.9)
> >>
> >>               ┌───────┐   ┌───┐
> >>            ┌──► NAT A ├───►   │
> >>  ┌─────┐   │  └───────┘   │   │
> >>  │  A  ├───┤              │ B │
> >>  └─────┘   │  ┌───────┐   │   │
> >>            └──► NAT B ├───►   │
> >>               └───────┘   └───┘
> >>
> >> Let us assume in NAT A (1.2.31.4), the connections is setup as
> >> 	ORIGINAL TUPLE		    REPLY TUPLE
> >> 1.2.3.4:12345 -> 5.6.7.8:42, 5.6.7.8.42 -> 1.2.31.4:33333
> >>
> >> Let us assume in NAT B (1.2.32.4), the connections is setup as
> >> 	ORIGINAL TUPLE		    REPLY TUPLE
> >> 1.2.3.4:12345 -> 5.6.7.9:42, 5.6.7.8.42 -> 1.2.32.4:44444
> >>
> >> Since the port numbers are different when viewed from B, the association
> >> will not become multihomed, with only the primary path being active.
> >> Moreover, on a NAT/middlebox restart, we will end up getting new ports.
> >>
> >> I understand this is a problem in the way SNAT rules are configured, my
> >> proposal was to have this flag as a means of preventing such a problem
> >> even if the user wanted to.
> > Ugh, sorry, but that sounds just wrong.
>
> Ok, I hear that. :)
>
> >
> >>>> As specified in RFC9260 all transport addresses used by an SCTP endpoint
> >>>> MUST use the same port number but can use multiple IP addresses. That
> >>>> means that all paths taken within an SCTP association should have the
> >>>> same port even if they pass through different NAT/middleboxes in the
> >>>> network.
> > Hmm, I don't understand WHY this requirement exists, since endpoints
> > cannot control source port (or source address) seen by the peer;
> > NAT won't go away.
> >
> > I read that snippet several times and its not clear to me if
> > "port number" refers to sport or dport.  Dport would make sense to me,
> > but sport...?  No, not really.
>
> I am just an interpreter of the standard but AFAIU, port means both source
> and destination port. Section 1.3 of RFC 9260 defining an SCTP endpoint.
> In any case, running SCTP on UDP is probably the best way to workaround
> the SCTP NAT problem.
>
> >
> > Won't the endpoints notice that the path is down and re-create the flow?
> >
> > AFAIU the root cause of your problem is that:
> > 1. NAT middleboxes remap source port AND
> > 2. NAT middleboxes restart frequently
> >
> > ... so fixing either 1 or 2 would avoid the problem.
> >
> > I don't think adding sysctls to override 1) is a sane option.
>
> Yeah the endpoints does try to re-create the flows, but if we have
> multiple middle boxes remapping the source port, there is no guarantee
> that they will remap to the same source port.
> 1) is the main problem that I was trying to address with this patch.
>
> >> Since the flag is optional, the idea is to enable it only on hosts that
> >> are part of docker/kubernetes environments and use NAT in their datapath.
> > We can't fix the ruleset but we can somehow cure it via sysctl in each netns?
> > I don't like this.
> >
> > NAT middlebox restart with --random is a problem in any case, not just
> > for SCTP, because the chosen "random port" is lost.
> >
> > I don't see a way to fix this, unless NOT using --random mode.
> > If connection is subject to sequence number rewrite (for tcp)
> > the connection won't survive either as the sejadj state is lost.
>
> Ok, I understand your point. I agree it doesn't make sense to have an
> alternative configuration option to avoid this problem. I will try to
> convince the "users" if --random-fully is not used for SCTP.

FWIW I share Florian's opinion here. With the explanations above, it
doesn't make sense to have an override in kernel for an option that
userspace is supplying at will.

  Marcelo


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-11-21 11:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-30 12:25 [PATCH v2 0/2] netfilter: conntrack: improve SCTP multihoming sriram.yagnaraman
2022-10-30 12:25 ` [PATCH v2 1/2] netfilter: conntrack: introduce no_random_port proc entry sriram.yagnaraman
2022-10-31  8:38   ` Florian Westphal
2022-10-31 18:41     ` Sriram Yagnaraman
2022-11-02 14:00       ` Florian Westphal
2022-11-03 20:02         ` Sriram Yagnaraman
2022-11-21 11:24           ` Marcelo Ricardo Leitner
2022-10-30 12:25 ` [PATCH v2 2/2] netfilter: conntrack: add sctp DATA_SENT state sriram.yagnaraman
2022-11-02 14:02   ` Florian Westphal
2022-11-21 11:20   ` Marcelo Ricardo Leitner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.