All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
@ 2013-08-07 17:42 Patrick McHardy
  2013-08-07 17:42 ` [PATCH 1/5] netfilter: nf_conntrack: make sequence number adjustments usuable without NAT Patrick McHardy
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 17:42 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, mph, jesper.brouer, as

The following patches against nf-next.git implement a SYN proxy for
netfilter. The series applies on top of the patches I sent last week
and is split into five patches:

- a patch to split out sequence number adjustment from NAT and make it
  usable from other netfilter subsystems. This is used to translate
  sequence numbers from the server to the client once the full connection
  has been established.

  This patch contains a bit of churn, but the core is to simply move the
  code to a new file and move the sequence number adjustment data into a
  ct extend.

- a patch to extract the TCP stack independant parts of syncookie generation
  and validation and make the usable from netfilter

- the SYN proxy core and IPv4 SYNPROXY target. See below for more details.

- a similar patch to the second one for IPv6

- an IPv6 version of the SYNPROXY target


The SYNPROXY operates by marking the initial SYN from the client as UNTRACKED
and directing it to the SYNPROXY target. The target responds with a SYN/ACK
containing a cookie and encodes options such as window scaling factor, SACK
perm etc. into the timestamp, if timestamps are used (similar to TCP). The
window size is set to zero. The response is also sent as untracked packet.

When the final ACK is received the cookie is validated, the original options
extracted and a SYN to the original destination is generated. The SYN to the
original destination uses the avertised window from the final ACK and the
options from the initial SYN packet. The SYN is not sent as untracked, so
from a connection tracking POV it will look like the original packet from
the client and instantiate a new connection. When the server responds with
a SYN/ACK a final ACK for the server is generated and a window update with
the window size announced by the server is sent to the client. At this
point the connection is handed of to conntrack and the only thing the
target is still involved in is timestamp translation through the registerd
hooks.

Since the SYN proxy can't know the options the server supports, they have
to be specified as parameters to the SYNPROXY target. The assumption is that
these options are constant as long as you don't change settings on the
server. Since the SYN proxy can't know the initial sequence number and
timestamp values the server will use, both have to be translated in the
direction server->client. Sequence number translation is done using the
standard sequence number translation mechanisms originally only used for
NAT, timestamps are translated in a hook registered by the SYNPROXY target.

Martin Topholm made some performance measurements with an earlier version
(that should still be valid, the only difference was that the core and IPv4
parts were in the same file) and measured a load of about 7% on a 8 way
system with 2 million SYNs per second, which without the target basically
killed the server (Martin, please correct me if I'm wrong).

The iptables patches will follow in a seperate thread, testing can be done
by:

iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK
iptables -A INPUT -i eth0 -p tcp --dport 80 -m state UNTRACKED,INVALID \
	-j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

The second rule catches untracked packets and directs them to the target.
The purpose of disabling loose tracking is to have the final ACK from the
client not be picked up by conntrack, so it won't create a new conntrack
entry and will be marked INVALID and also get directed to the target.

Unfortunately I couldn't come up with a nicer way to catch just the
first SYN and final ACK from the client and not have any more packets
hit the target, but even though it doesn't look to nice, it works well.

Comments welcome.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/5] netfilter: nf_conntrack: make sequence number adjustments usuable without NAT
  2013-08-07 17:42 [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Patrick McHardy
@ 2013-08-07 17:42 ` Patrick McHardy
  2013-08-07 20:02   ` Jesper Dangaard Brouer
  2013-08-07 17:42 ` [PATCH 2/5] net: syncookies: export cookie_v4_init_sequence/cookie_v4_check Patrick McHardy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 17:42 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, mph, jesper.brouer, as

Split out sequence number adjustments from NAT and move them to the conntrack
core to make them usable for SYN proxying. The sequence number adjustment
information is moved to a seperate extend. The extend is added to new
conntracks when a NAT mapping is set up for a connection using a helper.

As a side effect, this saves 24 bytes per connection with NAT in the common
case that a connection does not have a helper assigned.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter.h                          |   9 +-
 include/net/netfilter/nf_conntrack_extend.h        |   2 +
 include/net/netfilter/nf_conntrack_seqadj.h        |  49 +++++
 include/net/netfilter/nf_nat.h                     |  10 -
 include/net/netfilter/nf_nat_helper.h              |  19 --
 include/uapi/linux/netfilter/nf_conntrack_common.h |   3 +-
 include/uapi/linux/netfilter/nfnetlink_conntrack.h |  15 +-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c     |   7 +-
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c     |   7 +-
 net/netfilter/Makefile                             |   2 +-
 net/netfilter/nf_conntrack_core.c                  |  16 +-
 net/netfilter/nf_conntrack_netlink.c               | 115 +++++------
 net/netfilter/nf_conntrack_proto_tcp.c             |  18 +-
 net/netfilter/nf_conntrack_seqadj.c                | 218 ++++++++++++++++++++
 net/netfilter/nf_nat_core.c                        |  16 +-
 net/netfilter/nf_nat_helper.c                      | 228 +--------------------
 net/netfilter/nf_nat_sip.c                         |   3 +-
 net/netfilter/nfnetlink_queue_ct.c                 |   8 +-
 18 files changed, 369 insertions(+), 376 deletions(-)
 create mode 100644 include/net/netfilter/nf_conntrack_seqadj.h
 create mode 100644 net/netfilter/nf_conntrack_seqadj.c

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 655d5d1..c6cbbf1 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -319,20 +319,17 @@ extern void nf_ct_attach(struct sk_buff *, const struct sk_buff *);
 extern void (*nf_ct_destroy)(struct nf_conntrack *) __rcu;
 
 struct nf_conn;
+enum ip_conntrack_info;
 struct nlattr;
 
 struct nfq_ct_hook {
 	size_t (*build_size)(const struct nf_conn *ct);
 	int (*build)(struct sk_buff *skb, struct nf_conn *ct);
 	int (*parse)(const struct nlattr *attr, struct nf_conn *ct);
-};
-extern struct nfq_ct_hook __rcu *nfq_ct_hook;
-
-struct nfq_ct_nat_hook {
 	void (*seq_adjust)(struct sk_buff *skb, struct nf_conn *ct,
-			   u32 ctinfo, s32 off);
+			   enum ip_conntrack_info ctinfo, s32 off);
 };
-extern struct nfq_ct_nat_hook __rcu *nfq_ct_nat_hook;
+extern struct nfq_ct_hook __rcu *nfq_ct_hook;
 #else
 static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {}
 #endif
diff --git a/include/net/netfilter/nf_conntrack_extend.h b/include/net/netfilter/nf_conntrack_extend.h
index 977bc8a..2a22bcb 100644
--- a/include/net/netfilter/nf_conntrack_extend.h
+++ b/include/net/netfilter/nf_conntrack_extend.h
@@ -9,6 +9,7 @@ enum nf_ct_ext_id {
 	NF_CT_EXT_HELPER,
 #if defined(CONFIG_NF_NAT) || defined(CONFIG_NF_NAT_MODULE)
 	NF_CT_EXT_NAT,
+	NF_CT_EXT_SEQADJ,
 #endif
 	NF_CT_EXT_ACCT,
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
@@ -31,6 +32,7 @@ enum nf_ct_ext_id {
 
 #define NF_CT_EXT_HELPER_TYPE struct nf_conn_help
 #define NF_CT_EXT_NAT_TYPE struct nf_conn_nat
+#define NF_CT_EXT_SEQADJ_TYPE struct nf_conn_seqadj
 #define NF_CT_EXT_ACCT_TYPE struct nf_conn_counter
 #define NF_CT_EXT_ECACHE_TYPE struct nf_conntrack_ecache
 #define NF_CT_EXT_ZONE_TYPE struct nf_conntrack_zone
diff --git a/include/net/netfilter/nf_conntrack_seqadj.h b/include/net/netfilter/nf_conntrack_seqadj.h
new file mode 100644
index 0000000..30bfbbe
--- /dev/null
+++ b/include/net/netfilter/nf_conntrack_seqadj.h
@@ -0,0 +1,49 @@
+#ifndef _NF_CONNTRACK_SEQADJ_H
+#define _NF_CONNTRACK_SEQADJ_H
+
+#include <net/netfilter/nf_conntrack_extend.h>
+
+/**
+ * struct nf_ct_seqadj - sequence number adjustment information
+ *
+ * @correction_pos: position of the last TCP sequence number modification
+ * @offset_before: sequence number offset before last modification
+ * @offset_after: sequence number offset after last modification
+ */
+struct nf_ct_seqadj {
+	u32		correction_pos;
+	s32		offset_before;
+	s32		offset_after;
+};
+
+struct nf_conn_seqadj {
+	struct nf_ct_seqadj	seq[IP_CT_DIR_MAX];
+};
+
+static inline struct nf_conn_seqadj *nfct_seqadj(const struct nf_conn *ct)
+{
+	return nf_ct_ext_find(ct, NF_CT_EXT_SEQADJ);
+}
+
+static inline struct nf_conn_seqadj *nfct_seqadj_ext_add(struct nf_conn *ct)
+{
+	return nf_ct_ext_add(ct, NF_CT_EXT_SEQADJ, GFP_ATOMIC);
+}
+
+extern int nf_ct_seqadj_set(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+			    __be32 seq, s32 off);
+extern void nf_ct_tcp_seqadj_set(struct sk_buff *skb,
+				 struct nf_conn *ct,
+				 enum ip_conntrack_info ctinfo,
+				 s32 off);
+
+extern int nf_ct_seq_adjust(struct sk_buff *skb,
+			    struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+			    unsigned int protoff);
+extern s32 nf_ct_seq_offset(const struct nf_conn *ct, enum ip_conntrack_dir,
+			    u32 seq);
+
+extern int nf_conntrack_seqadj_init(void);
+extern void nf_conntrack_seqadj_fini(void);
+
+#endif /* _NF_CONNTRACK_SEQADJ_H */
diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index e244141..59a1924 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -13,15 +13,6 @@ enum nf_nat_manip_type {
 #define HOOK2MANIP(hooknum) ((hooknum) != NF_INET_POST_ROUTING && \
 			     (hooknum) != NF_INET_LOCAL_IN)
 
-/* NAT sequence number modifications */
-struct nf_nat_seq {
-	/* position of the last TCP sequence number modification (if any) */
-	u_int32_t correction_pos;
-
-	/* sequence number offset before and after last modification */
-	int32_t offset_before, offset_after;
-};
-
 #include <linux/list.h>
 #include <linux/netfilter/nf_conntrack_pptp.h>
 #include <net/netfilter/nf_conntrack_extend.h>
@@ -39,7 +30,6 @@ struct nf_conn;
 /* The structure embedded in the conntrack structure. */
 struct nf_conn_nat {
 	struct hlist_node bysource;
-	struct nf_nat_seq seq[IP_CT_DIR_MAX];
 	struct nf_conn *ct;
 	union nf_conntrack_nat_help help;
 #if defined(CONFIG_IP_NF_TARGET_MASQUERADE) || \
diff --git a/include/net/netfilter/nf_nat_helper.h b/include/net/netfilter/nf_nat_helper.h
index 194c347..404324d 100644
--- a/include/net/netfilter/nf_nat_helper.h
+++ b/include/net/netfilter/nf_nat_helper.h
@@ -39,28 +39,9 @@ extern int nf_nat_mangle_udp_packet(struct sk_buff *skb,
 				    const char *rep_buffer,
 				    unsigned int rep_len);
 
-extern void nf_nat_set_seq_adjust(struct nf_conn *ct,
-				  enum ip_conntrack_info ctinfo,
-				  __be32 seq, s32 off);
-extern int nf_nat_seq_adjust(struct sk_buff *skb,
-			     struct nf_conn *ct,
-			     enum ip_conntrack_info ctinfo,
-			     unsigned int protoff);
-extern int (*nf_nat_seq_adjust_hook)(struct sk_buff *skb,
-				     struct nf_conn *ct,
-				     enum ip_conntrack_info ctinfo,
-				     unsigned int protoff);
-
 /* Setup NAT on this expected conntrack so it follows master, but goes
  * to port ct->master->saved_proto. */
 extern void nf_nat_follow_master(struct nf_conn *ct,
 				 struct nf_conntrack_expect *this);
 
-extern s32 nf_nat_get_offset(const struct nf_conn *ct,
-			     enum ip_conntrack_dir dir,
-			     u32 seq);
-
-extern void nf_nat_tcp_seq_adjust(struct sk_buff *skb, struct nf_conn *ct,
-				  u32 dir, s32 off);
-
 #endif
diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h
index d69483f..8dd8038 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -99,7 +99,8 @@ enum ip_conntrack_events {
 	IPCT_PROTOINFO,		/* protocol information has changed */
 	IPCT_HELPER,		/* new helper has been set */
 	IPCT_MARK,		/* new mark has been set */
-	IPCT_NATSEQADJ,		/* NAT is doing sequence adjustment */
+	IPCT_SEQADJ,		/* sequence adjustment has changed */
+	IPCT_NATSEQADJ = IPCT_SEQADJ,
 	IPCT_SECMARK,		/* new security mark has been set */
 	IPCT_LABEL,		/* new connlabel has been set */
 };
diff --git a/include/uapi/linux/netfilter/nfnetlink_conntrack.h b/include/uapi/linux/netfilter/nfnetlink_conntrack.h
index 08fabc6..acad6c5 100644
--- a/include/uapi/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/uapi/linux/netfilter/nfnetlink_conntrack.h
@@ -42,8 +42,10 @@ enum ctattr_type {
 	CTA_ID,
 	CTA_NAT_DST,
 	CTA_TUPLE_MASTER,
-	CTA_NAT_SEQ_ADJ_ORIG,
-	CTA_NAT_SEQ_ADJ_REPLY,
+	CTA_SEQ_ADJ_ORIG,
+	CTA_NAT_SEQ_ADJ_ORIG	= CTA_SEQ_ADJ_ORIG,
+	CTA_SEQ_ADJ_REPLY,
+	CTA_NAT_SEQ_ADJ_REPLY	= CTA_SEQ_ADJ_REPLY,
 	CTA_SECMARK,		/* obsolete */
 	CTA_ZONE,
 	CTA_SECCTX,
@@ -165,6 +167,15 @@ enum ctattr_protonat {
 };
 #define CTA_PROTONAT_MAX (__CTA_PROTONAT_MAX - 1)
 
+enum ctattr_seqadj {
+	CTA_SEQADJ_UNSPEC,
+	CTA_SEQADJ_CORRECTION_POS,
+	CTA_SEQADJ_OFFSET_BEFORE,
+	CTA_SEQADJ_OFFSET_AFTER,
+	__CTA_SEQADJ_MAX
+};
+#define CTA_SEQADJ_MAX (__CTA_SEQADJ_MAX - 1)
+
 enum ctattr_natseq {
 	CTA_NAT_SEQ_UNSPEC,
 	CTA_NAT_SEQ_CORRECTION_POS,
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 0a2e0e3..86f5b34 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -25,6 +25,7 @@
 #include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_zones.h>
 #include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
 #include <net/netfilter/ipv4/nf_conntrack_ipv4.h>
 #include <net/netfilter/nf_nat_helper.h>
 #include <net/netfilter/ipv4/nf_defrag_ipv4.h>
@@ -136,11 +137,7 @@ static unsigned int ipv4_confirm(unsigned int hooknum,
 	/* adjust seqs for loopback traffic only in outgoing direction */
 	if (test_bit(IPS_SEQ_ADJUST_BIT, &ct->status) &&
 	    !nf_is_loopback_packet(skb)) {
-		typeof(nf_nat_seq_adjust_hook) seq_adjust;
-
-		seq_adjust = rcu_dereference(nf_nat_seq_adjust_hook);
-		if (!seq_adjust ||
-		    !seq_adjust(skb, ct, ctinfo, ip_hdrlen(skb))) {
+		if (!nf_ct_seq_adjust(skb, ct, ctinfo, ip_hdrlen(skb))) {
 			NF_CT_STAT_INC_ATOMIC(nf_ct_net(ct), drop);
 			return NF_DROP;
 		}
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index c9b6a6e..d6e4dd8 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -28,6 +28,7 @@
 #include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_zones.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
 #include <net/netfilter/ipv6/nf_conntrack_ipv6.h>
 #include <net/netfilter/nf_nat_helper.h>
 #include <net/netfilter/ipv6/nf_defrag_ipv6.h>
@@ -158,11 +159,7 @@ static unsigned int ipv6_confirm(unsigned int hooknum,
 	/* adjust seqs for loopback traffic only in outgoing direction */
 	if (test_bit(IPS_SEQ_ADJUST_BIT, &ct->status) &&
 	    !nf_is_loopback_packet(skb)) {
-		typeof(nf_nat_seq_adjust_hook) seq_adjust;
-
-		seq_adjust = rcu_dereference(nf_nat_seq_adjust_hook);
-		if (!seq_adjust ||
-		    !seq_adjust(skb, ct, ctinfo, protoff)) {
+		if (!nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) {
 			NF_CT_STAT_INC_ATOMIC(nf_ct_net(ct), drop);
 			return NF_DROP;
 		}
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ebfa7dc..89a9c16 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -1,6 +1,6 @@
 netfilter-objs := core.o nf_log.o nf_queue.o nf_sockopt.o
 
-nf_conntrack-y	:= nf_conntrack_core.o nf_conntrack_standalone.o nf_conntrack_expect.o nf_conntrack_helper.o nf_conntrack_proto.o nf_conntrack_l3proto_generic.o nf_conntrack_proto_generic.o nf_conntrack_proto_tcp.o nf_conntrack_proto_udp.o nf_conntrack_extend.o nf_conntrack_acct.o
+nf_conntrack-y	:= nf_conntrack_core.o nf_conntrack_standalone.o nf_conntrack_expect.o nf_conntrack_helper.o nf_conntrack_proto.o nf_conntrack_l3proto_generic.o nf_conntrack_proto_generic.o nf_conntrack_proto_tcp.o nf_conntrack_proto_udp.o nf_conntrack_extend.o nf_conntrack_acct.o nf_conntrack_seqadj.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMEOUT) += nf_conntrack_timeout.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMESTAMP) += nf_conntrack_timestamp.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_EVENTS) += nf_conntrack_ecache.o
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 0934611..31683ca 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -39,6 +39,7 @@
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <net/netfilter/nf_conntrack_helper.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_extend.h>
 #include <net/netfilter/nf_conntrack_acct.h>
@@ -1354,6 +1355,7 @@ void nf_conntrack_cleanup_end(void)
 	nf_ct_extend_unregister(&nf_ct_zone_extend);
 #endif
 	nf_conntrack_proto_fini();
+	nf_conntrack_seqadj_fini();
 	nf_conntrack_labels_fini();
 	nf_conntrack_helper_fini();
 	nf_conntrack_timeout_fini();
@@ -1559,6 +1561,10 @@ int nf_conntrack_init_start(void)
 	if (ret < 0)
 		goto err_labels;
 
+	ret = nf_conntrack_seqadj_init();
+	if (ret < 0)
+		goto err_seqadj;
+
 #ifdef CONFIG_NF_CONNTRACK_ZONES
 	ret = nf_ct_extend_register(&nf_ct_zone_extend);
 	if (ret < 0)
@@ -1583,6 +1589,8 @@ err_proto:
 	nf_ct_extend_unregister(&nf_ct_zone_extend);
 err_extend:
 #endif
+	nf_conntrack_seqadj_fini();
+err_seqadj:
 	nf_conntrack_labels_fini();
 err_labels:
 	nf_conntrack_helper_fini();
@@ -1605,9 +1613,6 @@ void nf_conntrack_init_end(void)
 	/* For use by REJECT target */
 	RCU_INIT_POINTER(ip_ct_attach, nf_conntrack_attach);
 	RCU_INIT_POINTER(nf_ct_destroy, destroy_conntrack);
-
-	/* Howto get NAT offsets */
-	RCU_INIT_POINTER(nf_ct_nat_offset, NULL);
 }
 
 /*
@@ -1694,8 +1699,3 @@ err_slabname:
 err_stat:
 	return ret;
 }
-
-s32 (*nf_ct_nat_offset)(const struct nf_conn *ct,
-			enum ip_conntrack_dir dir,
-			u32 seq);
-EXPORT_SYMBOL_GPL(nf_ct_nat_offset);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index e842c0d..3de5d3e 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -37,6 +37,7 @@
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 #include <net/netfilter/nf_conntrack_helper.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
 #include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_tuple.h>
@@ -381,9 +382,8 @@ nla_put_failure:
 	return -1;
 }
 
-#ifdef CONFIG_NF_NAT_NEEDED
 static int
-dump_nat_seq_adj(struct sk_buff *skb, const struct nf_nat_seq *natseq, int type)
+dump_ct_seq_adj(struct sk_buff *skb, const struct nf_ct_seqadj *seq, int type)
 {
 	struct nlattr *nest_parms;
 
@@ -391,12 +391,12 @@ dump_nat_seq_adj(struct sk_buff *skb, const struct nf_nat_seq *natseq, int type)
 	if (!nest_parms)
 		goto nla_put_failure;
 
-	if (nla_put_be32(skb, CTA_NAT_SEQ_CORRECTION_POS,
-			 htonl(natseq->correction_pos)) ||
-	    nla_put_be32(skb, CTA_NAT_SEQ_OFFSET_BEFORE,
-			 htonl(natseq->offset_before)) ||
-	    nla_put_be32(skb, CTA_NAT_SEQ_OFFSET_AFTER,
-			 htonl(natseq->offset_after)))
+	if (nla_put_be32(skb, CTA_SEQADJ_CORRECTION_POS,
+			 htonl(seq->correction_pos)) ||
+	    nla_put_be32(skb, CTA_SEQADJ_OFFSET_BEFORE,
+			 htonl(seq->offset_before)) ||
+	    nla_put_be32(skb, CTA_SEQADJ_OFFSET_AFTER,
+			 htonl(seq->offset_after)))
 		goto nla_put_failure;
 
 	nla_nest_end(skb, nest_parms);
@@ -408,27 +408,24 @@ nla_put_failure:
 }
 
 static inline int
-ctnetlink_dump_nat_seq_adj(struct sk_buff *skb, const struct nf_conn *ct)
+ctnetlink_dump_ct_seq_adj(struct sk_buff *skb, const struct nf_conn *ct)
 {
-	struct nf_nat_seq *natseq;
-	struct nf_conn_nat *nat = nfct_nat(ct);
+	struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
+	struct nf_ct_seqadj *seq;
 
-	if (!(ct->status & IPS_SEQ_ADJUST) || !nat)
+	if (!(ct->status & IPS_SEQ_ADJUST) || !seqadj)
 		return 0;
 
-	natseq = &nat->seq[IP_CT_DIR_ORIGINAL];
-	if (dump_nat_seq_adj(skb, natseq, CTA_NAT_SEQ_ADJ_ORIG) == -1)
+	seq = &seqadj->seq[IP_CT_DIR_ORIGINAL];
+	if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_ORIG) == -1)
 		return -1;
 
-	natseq = &nat->seq[IP_CT_DIR_REPLY];
-	if (dump_nat_seq_adj(skb, natseq, CTA_NAT_SEQ_ADJ_REPLY) == -1)
+	seq = &seqadj->seq[IP_CT_DIR_REPLY];
+	if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_REPLY) == -1)
 		return -1;
 
 	return 0;
 }
-#else
-#define ctnetlink_dump_nat_seq_adj(a, b) (0)
-#endif
 
 static inline int
 ctnetlink_dump_id(struct sk_buff *skb, const struct nf_conn *ct)
@@ -502,7 +499,7 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type,
 	    ctnetlink_dump_id(skb, ct) < 0 ||
 	    ctnetlink_dump_use(skb, ct) < 0 ||
 	    ctnetlink_dump_master(skb, ct) < 0 ||
-	    ctnetlink_dump_nat_seq_adj(skb, ct) < 0)
+	    ctnetlink_dump_ct_seq_adj(skb, ct) < 0)
 		goto nla_put_failure;
 
 	nlmsg_end(skb, nlh);
@@ -707,8 +704,8 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
 		    ctnetlink_dump_master(skb, ct) < 0)
 			goto nla_put_failure;
 
-		if (events & (1 << IPCT_NATSEQADJ) &&
-		    ctnetlink_dump_nat_seq_adj(skb, ct) < 0)
+		if (events & (1 << IPCT_SEQADJ) &&
+		    ctnetlink_dump_ct_seq_adj(skb, ct) < 0)
 			goto nla_put_failure;
 	}
 
@@ -1439,66 +1436,65 @@ ctnetlink_change_protoinfo(struct nf_conn *ct, const struct nlattr * const cda[]
 	return err;
 }
 
-#ifdef CONFIG_NF_NAT_NEEDED
-static const struct nla_policy nat_seq_policy[CTA_NAT_SEQ_MAX+1] = {
-	[CTA_NAT_SEQ_CORRECTION_POS]	= { .type = NLA_U32 },
-	[CTA_NAT_SEQ_OFFSET_BEFORE]	= { .type = NLA_U32 },
-	[CTA_NAT_SEQ_OFFSET_AFTER]	= { .type = NLA_U32 },
+static const struct nla_policy seqadj_policy[CTA_SEQADJ_MAX+1] = {
+	[CTA_SEQADJ_CORRECTION_POS]	= { .type = NLA_U32 },
+	[CTA_SEQADJ_OFFSET_BEFORE]	= { .type = NLA_U32 },
+	[CTA_SEQADJ_OFFSET_AFTER]	= { .type = NLA_U32 },
 };
 
 static inline int
-change_nat_seq_adj(struct nf_nat_seq *natseq, const struct nlattr * const attr)
+change_seq_adj(struct nf_ct_seqadj *seq, const struct nlattr * const attr)
 {
 	int err;
-	struct nlattr *cda[CTA_NAT_SEQ_MAX+1];
+	struct nlattr *cda[CTA_SEQADJ_MAX+1];
 
-	err = nla_parse_nested(cda, CTA_NAT_SEQ_MAX, attr, nat_seq_policy);
+	err = nla_parse_nested(cda, CTA_SEQADJ_MAX, attr, seqadj_policy);
 	if (err < 0)
 		return err;
 
-	if (!cda[CTA_NAT_SEQ_CORRECTION_POS])
+	if (!cda[CTA_SEQADJ_CORRECTION_POS])
 		return -EINVAL;
 
-	natseq->correction_pos =
-		ntohl(nla_get_be32(cda[CTA_NAT_SEQ_CORRECTION_POS]));
+	seq->correction_pos =
+		ntohl(nla_get_be32(cda[CTA_SEQADJ_CORRECTION_POS]));
 
-	if (!cda[CTA_NAT_SEQ_OFFSET_BEFORE])
+	if (!cda[CTA_SEQADJ_OFFSET_BEFORE])
 		return -EINVAL;
 
-	natseq->offset_before =
-		ntohl(nla_get_be32(cda[CTA_NAT_SEQ_OFFSET_BEFORE]));
+	seq->offset_before =
+		ntohl(nla_get_be32(cda[CTA_SEQADJ_OFFSET_BEFORE]));
 
-	if (!cda[CTA_NAT_SEQ_OFFSET_AFTER])
+	if (!cda[CTA_SEQADJ_OFFSET_AFTER])
 		return -EINVAL;
 
-	natseq->offset_after =
-		ntohl(nla_get_be32(cda[CTA_NAT_SEQ_OFFSET_AFTER]));
+	seq->offset_after =
+		ntohl(nla_get_be32(cda[CTA_SEQADJ_OFFSET_AFTER]));
 
 	return 0;
 }
 
 static int
-ctnetlink_change_nat_seq_adj(struct nf_conn *ct,
-			     const struct nlattr * const cda[])
+ctnetlink_change_seq_adj(struct nf_conn *ct,
+			 const struct nlattr * const cda[])
 {
+	struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
 	int ret = 0;
-	struct nf_conn_nat *nat = nfct_nat(ct);
 
-	if (!nat)
+	if (!seqadj)
 		return 0;
 
-	if (cda[CTA_NAT_SEQ_ADJ_ORIG]) {
-		ret = change_nat_seq_adj(&nat->seq[IP_CT_DIR_ORIGINAL],
-					 cda[CTA_NAT_SEQ_ADJ_ORIG]);
+	if (cda[CTA_SEQ_ADJ_ORIG]) {
+		ret = change_seq_adj(&seqadj->seq[IP_CT_DIR_ORIGINAL],
+				     cda[CTA_SEQ_ADJ_ORIG]);
 		if (ret < 0)
 			return ret;
 
 		ct->status |= IPS_SEQ_ADJUST;
 	}
 
-	if (cda[CTA_NAT_SEQ_ADJ_REPLY]) {
-		ret = change_nat_seq_adj(&nat->seq[IP_CT_DIR_REPLY],
-					 cda[CTA_NAT_SEQ_ADJ_REPLY]);
+	if (cda[CTA_SEQ_ADJ_REPLY]) {
+		ret = change_seq_adj(&seqadj->seq[IP_CT_DIR_REPLY],
+				     cda[CTA_SEQ_ADJ_REPLY]);
 		if (ret < 0)
 			return ret;
 
@@ -1507,7 +1503,6 @@ ctnetlink_change_nat_seq_adj(struct nf_conn *ct,
 
 	return 0;
 }
-#endif
 
 static int
 ctnetlink_attach_labels(struct nf_conn *ct, const struct nlattr * const cda[])
@@ -1573,13 +1568,12 @@ ctnetlink_change_conntrack(struct nf_conn *ct,
 		ct->mark = ntohl(nla_get_be32(cda[CTA_MARK]));
 #endif
 
-#ifdef CONFIG_NF_NAT_NEEDED
-	if (cda[CTA_NAT_SEQ_ADJ_ORIG] || cda[CTA_NAT_SEQ_ADJ_REPLY]) {
-		err = ctnetlink_change_nat_seq_adj(ct, cda);
+	if (cda[CTA_SEQ_ADJ_ORIG] || cda[CTA_SEQ_ADJ_REPLY]) {
+		err = ctnetlink_change_seq_adj(ct, cda);
 		if (err < 0)
 			return err;
 	}
-#endif
+
 	if (cda[CTA_LABELS]) {
 		err = ctnetlink_attach_labels(ct, cda);
 		if (err < 0)
@@ -1684,13 +1678,11 @@ ctnetlink_create_conntrack(struct net *net, u16 zone,
 			goto err2;
 	}
 
-#ifdef CONFIG_NF_NAT_NEEDED
-	if (cda[CTA_NAT_SEQ_ADJ_ORIG] || cda[CTA_NAT_SEQ_ADJ_REPLY]) {
-		err = ctnetlink_change_nat_seq_adj(ct, cda);
+	if (cda[CTA_SEQ_ADJ_ORIG] || cda[CTA_SEQ_ADJ_REPLY]) {
+		err = ctnetlink_change_seq_adj(ct, cda);
 		if (err < 0)
 			goto err2;
 	}
-#endif
 
 	memset(&ct->proto, 0, sizeof(ct->proto));
 	if (cda[CTA_PROTOINFO]) {
@@ -1804,7 +1796,7 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
 						      (1 << IPCT_ASSURED) |
 						      (1 << IPCT_HELPER) |
 						      (1 << IPCT_PROTOINFO) |
-						      (1 << IPCT_NATSEQADJ) |
+						      (1 << IPCT_SEQADJ) |
 						      (1 << IPCT_MARK) | events,
 						      ct, NETLINK_CB(skb).portid,
 						      nlmsg_report(nlh));
@@ -1827,7 +1819,7 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
 						      (1 << IPCT_HELPER) |
 						      (1 << IPCT_LABEL) |
 						      (1 << IPCT_PROTOINFO) |
-						      (1 << IPCT_NATSEQADJ) |
+						      (1 << IPCT_SEQADJ) |
 						      (1 << IPCT_MARK),
 						      ct, NETLINK_CB(skb).portid,
 						      nlmsg_report(nlh));
@@ -2061,7 +2053,7 @@ ctnetlink_nfqueue_build(struct sk_buff *skb, struct nf_conn *ct)
 		goto nla_put_failure;
 
 	if ((ct->status & IPS_SEQ_ADJUST) &&
-	    ctnetlink_dump_nat_seq_adj(skb, ct) < 0)
+	    ctnetlink_dump_ct_seq_adj(skb, ct) < 0)
 		goto nla_put_failure;
 
 #ifdef CONFIG_NF_CONNTRACK_MARK
@@ -2131,6 +2123,7 @@ static struct nfq_ct_hook ctnetlink_nfqueue_hook = {
 	.build_size	= ctnetlink_nfqueue_build_size,
 	.build		= ctnetlink_nfqueue_build,
 	.parse		= ctnetlink_nfqueue_parse,
+	.seq_adjust	= nf_ct_tcp_seqadj_set,
 };
 #endif /* CONFIG_NETFILTER_NETLINK_QUEUE_CT */
 
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 8f308d8..3c3401e 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -27,6 +27,7 @@
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
 #include <net/netfilter/nf_log.h>
 #include <net/netfilter/ipv4/nf_conntrack_ipv4.h>
 #include <net/netfilter/ipv6/nf_conntrack_ipv6.h>
@@ -495,21 +496,6 @@ static void tcp_sack(const struct sk_buff *skb, unsigned int dataoff,
 	}
 }
 
-#ifdef CONFIG_NF_NAT_NEEDED
-static inline s32 nat_offset(const struct nf_conn *ct,
-			     enum ip_conntrack_dir dir,
-			     u32 seq)
-{
-	typeof(nf_ct_nat_offset) get_offset = rcu_dereference(nf_ct_nat_offset);
-
-	return get_offset != NULL ? get_offset(ct, dir, seq) : 0;
-}
-#define NAT_OFFSET(ct, dir, seq) \
-	(nat_offset(ct, dir, seq))
-#else
-#define NAT_OFFSET(ct, dir, seq)	0
-#endif
-
 static bool tcp_in_window(const struct nf_conn *ct,
 			  struct ip_ct_tcp *state,
 			  enum ip_conntrack_dir dir,
@@ -540,7 +526,7 @@ static bool tcp_in_window(const struct nf_conn *ct,
 		tcp_sack(skb, dataoff, tcph, &sack);
 
 	/* Take into account NAT sequence number mangling */
-	receiver_offset = NAT_OFFSET(ct, !dir, ack - 1);
+	receiver_offset = nf_ct_seq_offset(ct, !dir, ack - 1);
 	ack -= receiver_offset;
 	sack -= receiver_offset;
 
diff --git a/net/netfilter/nf_conntrack_seqadj.c b/net/netfilter/nf_conntrack_seqadj.c
new file mode 100644
index 0000000..483eb9c
--- /dev/null
+++ b/net/netfilter/nf_conntrack_seqadj.c
@@ -0,0 +1,218 @@
+#include <linux/types.h>
+#include <linux/netfilter.h>
+#include <net/tcp.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_extend.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
+
+int nf_ct_seqadj_set(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+		     __be32 seq, s32 off)
+{
+	struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
+	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_ct_seqadj *this_way;
+
+	if (off == 0)
+		return 0;
+
+	set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
+
+	spin_lock_bh(&ct->lock);
+	this_way = &seqadj->seq[dir];
+	if (this_way->offset_before == this_way->offset_after ||
+	    before(this_way->correction_pos, seq)) {
+		this_way->correction_pos = seq;
+		this_way->offset_before	 = this_way->offset_after;
+		this_way->offset_after	+= off;
+	}
+	spin_unlock_bh(&ct->lock);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nf_ct_seqadj_set);
+
+void nf_ct_tcp_seqadj_set(struct sk_buff *skb,
+			  struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+			  s32 off)
+{
+	const struct tcphdr *th;
+
+	if (nf_ct_protonum(ct) != IPPROTO_TCP)
+		return;
+
+	th = (struct tcphdr *)(skb_network_header(skb) + ip_hdrlen(skb));
+	nf_ct_seqadj_set(ct, ctinfo, th->seq, off);
+}
+EXPORT_SYMBOL_GPL(nf_ct_tcp_seqadj_set);
+
+/* Adjust one found SACK option including checksum correction */
+static void nf_ct_sack_block_adjust(struct sk_buff *skb,
+				    struct tcphdr *tcph,
+				    unsigned int sackoff,
+				    unsigned int sackend,
+				    struct nf_ct_seqadj *seq)
+{
+	while (sackoff < sackend) {
+		struct tcp_sack_block_wire *sack;
+		__be32 new_start_seq, new_end_seq;
+
+		sack = (void *)skb->data + sackoff;
+		if (after(ntohl(sack->start_seq) - seq->offset_before,
+			  seq->correction_pos))
+			new_start_seq = htonl(ntohl(sack->start_seq) -
+					seq->offset_after);
+		else
+			new_start_seq = htonl(ntohl(sack->start_seq) -
+					seq->offset_before);
+
+		if (after(ntohl(sack->end_seq) - seq->offset_before,
+			  seq->correction_pos))
+			new_end_seq = htonl(ntohl(sack->end_seq) -
+				      seq->offset_after);
+		else
+			new_end_seq = htonl(ntohl(sack->end_seq) -
+				      seq->offset_before);
+
+		pr_debug("sack_adjust: start_seq: %d->%d, end_seq: %d->%d\n",
+			 ntohl(sack->start_seq), new_start_seq,
+			 ntohl(sack->end_seq), new_end_seq);
+
+		inet_proto_csum_replace4(&tcph->check, skb,
+					 sack->start_seq, new_start_seq, 0);
+		inet_proto_csum_replace4(&tcph->check, skb,
+					 sack->end_seq, new_end_seq, 0);
+		sack->start_seq = new_start_seq;
+		sack->end_seq = new_end_seq;
+		sackoff += sizeof(*sack);
+	}
+}
+
+/* TCP SACK sequence number adjustment */
+static unsigned int nf_ct_sack_adjust(struct sk_buff *skb,
+				      unsigned int protoff,
+				      struct tcphdr *tcph,
+				      struct nf_conn *ct,
+				      enum ip_conntrack_info ctinfo)
+{
+	unsigned int dir, optoff, optend;
+	struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
+
+	optoff = protoff + sizeof(struct tcphdr);
+	optend = protoff + tcph->doff * 4;
+
+	if (!skb_make_writable(skb, optend))
+		return 0;
+
+	dir = CTINFO2DIR(ctinfo);
+
+	while (optoff < optend) {
+		/* Usually: option, length. */
+		unsigned char *op = skb->data + optoff;
+
+		switch (op[0]) {
+		case TCPOPT_EOL:
+			return 1;
+		case TCPOPT_NOP:
+			optoff++;
+			continue;
+		default:
+			/* no partial options */
+			if (optoff + 1 == optend ||
+			    optoff + op[1] > optend ||
+			    op[1] < 2)
+				return 0;
+			if (op[0] == TCPOPT_SACK &&
+			    op[1] >= 2+TCPOLEN_SACK_PERBLOCK &&
+			    ((op[1] - 2) % TCPOLEN_SACK_PERBLOCK) == 0)
+				nf_ct_sack_block_adjust(skb, tcph, optoff + 2,
+							optoff+op[1],
+							&seqadj->seq[!dir]);
+			optoff += op[1];
+		}
+	}
+	return 1;
+}
+
+/* TCP sequence number adjustment.  Returns 1 on success, 0 on failure */
+int nf_ct_seq_adjust(struct sk_buff *skb,
+		     struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+		     unsigned int protoff)
+{
+	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct tcphdr *tcph;
+	__be32 newseq, newack;
+	s32 seqoff, ackoff;
+	struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
+	struct nf_ct_seqadj *this_way, *other_way;
+	int res;
+
+	this_way  = &seqadj->seq[dir];
+	other_way = &seqadj->seq[!dir];
+
+	if (!skb_make_writable(skb, protoff + sizeof(*tcph)))
+		return 0;
+
+	tcph = (void *)skb->data + protoff;
+	spin_lock_bh(&ct->lock);
+	if (after(ntohl(tcph->seq), this_way->correction_pos))
+		seqoff = this_way->offset_after;
+	else
+		seqoff = this_way->offset_before;
+
+	if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
+		  other_way->correction_pos))
+		ackoff = other_way->offset_after;
+	else
+		ackoff = other_way->offset_before;
+
+	newseq = htonl(ntohl(tcph->seq) + seqoff);
+	newack = htonl(ntohl(tcph->ack_seq) - ackoff);
+
+	inet_proto_csum_replace4(&tcph->check, skb, tcph->seq, newseq, 0);
+	inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq, newack, 0);
+
+	pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
+		 ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
+		 ntohl(newack));
+
+	tcph->seq = newseq;
+	tcph->ack_seq = newack;
+
+	res = nf_ct_sack_adjust(skb, protoff, tcph, ct, ctinfo);
+	spin_unlock_bh(&ct->lock);
+
+	return res;
+}
+EXPORT_SYMBOL_GPL(nf_ct_seq_adjust);
+
+s32 nf_ct_seq_offset(const struct nf_conn *ct,
+		     enum ip_conntrack_dir dir,
+		     u32 seq)
+{
+	struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
+	struct nf_ct_seqadj *this_way;
+
+	if (!seqadj)
+		return 0;
+
+	this_way = &seqadj->seq[dir];
+	return after(seq, this_way->correction_pos) ?
+		 this_way->offset_after : this_way->offset_before;
+}
+EXPORT_SYMBOL_GPL(nf_ct_seq_offset);
+
+static struct nf_ct_ext_type nf_ct_seqadj_extend __read_mostly = {
+	.len	= sizeof(struct nf_conn_seqadj),
+	.align	= __alignof__(struct nf_conn_seqadj),
+	.id	= NF_CT_EXT_SEQADJ,
+};
+
+int nf_conntrack_seqadj_init(void)
+{
+	return nf_ct_extend_register(&nf_ct_seqadj_extend);
+}
+
+void nf_conntrack_seqadj_fini(void)
+{
+	nf_ct_extend_unregister(&nf_ct_seqadj_extend);
+}
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 038eee5..ffdd26d 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -25,6 +25,7 @@
 #include <net/netfilter/nf_nat_core.h>
 #include <net/netfilter/nf_nat_helper.h>
 #include <net/netfilter/nf_conntrack_helper.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
 #include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_zones.h>
 #include <linux/netfilter/nf_nat.h>
@@ -402,6 +403,9 @@ nf_nat_setup_info(struct nf_conn *ct,
 			ct->status |= IPS_SRC_NAT;
 		else
 			ct->status |= IPS_DST_NAT;
+
+		if (nfct_help(ct))
+			nfct_seqadj_ext_add(ct);
 	}
 
 	if (maniptype == NF_NAT_MANIP_SRC) {
@@ -764,10 +768,6 @@ static struct nf_ct_helper_expectfn follow_master_nat = {
 	.expectfn	= nf_nat_follow_master,
 };
 
-static struct nfq_ct_nat_hook nfq_ct_nat = {
-	.seq_adjust	= nf_nat_tcp_seq_adjust,
-};
-
 static int __init nf_nat_init(void)
 {
 	int ret;
@@ -787,14 +787,9 @@ static int __init nf_nat_init(void)
 	/* Initialize fake conntrack so that NAT will skip it */
 	nf_ct_untracked_status_or(IPS_NAT_DONE_MASK);
 
-	BUG_ON(nf_nat_seq_adjust_hook != NULL);
-	RCU_INIT_POINTER(nf_nat_seq_adjust_hook, nf_nat_seq_adjust);
 	BUG_ON(nfnetlink_parse_nat_setup_hook != NULL);
 	RCU_INIT_POINTER(nfnetlink_parse_nat_setup_hook,
 			   nfnetlink_parse_nat_setup);
-	BUG_ON(nf_ct_nat_offset != NULL);
-	RCU_INIT_POINTER(nf_ct_nat_offset, nf_nat_get_offset);
-	RCU_INIT_POINTER(nfq_ct_nat_hook, &nfq_ct_nat);
 #ifdef CONFIG_XFRM
 	BUG_ON(nf_nat_decode_session_hook != NULL);
 	RCU_INIT_POINTER(nf_nat_decode_session_hook, __nf_nat_decode_session);
@@ -813,10 +808,7 @@ static void __exit nf_nat_cleanup(void)
 	unregister_pernet_subsys(&nf_nat_net_ops);
 	nf_ct_extend_unregister(&nat_extend);
 	nf_ct_helper_expectfn_unregister(&follow_master_nat);
-	RCU_INIT_POINTER(nf_nat_seq_adjust_hook, NULL);
 	RCU_INIT_POINTER(nfnetlink_parse_nat_setup_hook, NULL);
-	RCU_INIT_POINTER(nf_ct_nat_offset, NULL);
-	RCU_INIT_POINTER(nfq_ct_nat_hook, NULL);
 #ifdef CONFIG_XFRM
 	RCU_INIT_POINTER(nf_nat_decode_session_hook, NULL);
 #endif
diff --git a/net/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c
index 46b9baa..2840abb 100644
--- a/net/netfilter/nf_nat_helper.c
+++ b/net/netfilter/nf_nat_helper.c
@@ -20,67 +20,13 @@
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_ecache.h>
 #include <net/netfilter/nf_conntrack_expect.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_l3proto.h>
 #include <net/netfilter/nf_nat_l4proto.h>
 #include <net/netfilter/nf_nat_core.h>
 #include <net/netfilter/nf_nat_helper.h>
 
-#define DUMP_OFFSET(x) \
-	pr_debug("offset_before=%d, offset_after=%d, correction_pos=%u\n", \
-		 x->offset_before, x->offset_after, x->correction_pos);
-
-/* Setup TCP sequence correction given this change at this sequence */
-static inline void
-adjust_tcp_sequence(u32 seq,
-		    int sizediff,
-		    struct nf_conn *ct,
-		    enum ip_conntrack_info ctinfo)
-{
-	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
-	struct nf_conn_nat *nat = nfct_nat(ct);
-	struct nf_nat_seq *this_way = &nat->seq[dir];
-
-	pr_debug("adjust_tcp_sequence: seq = %u, sizediff = %d\n",
-		 seq, sizediff);
-
-	pr_debug("adjust_tcp_sequence: Seq_offset before: ");
-	DUMP_OFFSET(this_way);
-
-	spin_lock_bh(&ct->lock);
-
-	/* SYN adjust. If it's uninitialized, or this is after last
-	 * correction, record it: we don't handle more than one
-	 * adjustment in the window, but do deal with common case of a
-	 * retransmit */
-	if (this_way->offset_before == this_way->offset_after ||
-	    before(this_way->correction_pos, seq)) {
-		this_way->correction_pos = seq;
-		this_way->offset_before = this_way->offset_after;
-		this_way->offset_after += sizediff;
-	}
-	spin_unlock_bh(&ct->lock);
-
-	pr_debug("adjust_tcp_sequence: Seq_offset after: ");
-	DUMP_OFFSET(this_way);
-}
-
-/* Get the offset value, for conntrack. Caller must have the conntrack locked */
-s32 nf_nat_get_offset(const struct nf_conn *ct,
-		      enum ip_conntrack_dir dir,
-		      u32 seq)
-{
-	struct nf_conn_nat *nat = nfct_nat(ct);
-	struct nf_nat_seq *this_way;
-
-	if (!nat)
-		return 0;
-
-	this_way = &nat->seq[dir];
-	return after(seq, this_way->correction_pos)
-		 ? this_way->offset_after : this_way->offset_before;
-}
-
 /* Frobs data inside this packet, which is linear. */
 static void mangle_contents(struct sk_buff *skb,
 			    unsigned int dataoff,
@@ -135,30 +81,6 @@ static int enlarge_skb(struct sk_buff *skb, unsigned int extra)
 	return 1;
 }
 
-void nf_nat_set_seq_adjust(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
-			   __be32 seq, s32 off)
-{
-	if (!off)
-		return;
-	set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
-	adjust_tcp_sequence(ntohl(seq), off, ct, ctinfo);
-	nf_conntrack_event_cache(IPCT_NATSEQADJ, ct);
-}
-EXPORT_SYMBOL_GPL(nf_nat_set_seq_adjust);
-
-void nf_nat_tcp_seq_adjust(struct sk_buff *skb, struct nf_conn *ct,
-			   u32 ctinfo, int off)
-{
-	const struct tcphdr *th;
-
-	if (nf_ct_protonum(ct) != IPPROTO_TCP)
-		return;
-
-	th = (struct tcphdr *)(skb_network_header(skb)+ ip_hdrlen(skb));
-	nf_nat_set_seq_adjust(ct, ctinfo, th->seq, off);
-}
-EXPORT_SYMBOL_GPL(nf_nat_tcp_seq_adjust);
-
 /* Generic function for mangling variable-length address changes inside
  * NATed TCP connections (like the PORT XXX,XXX,XXX,XXX,XXX,XXX
  * command in FTP).
@@ -203,8 +125,8 @@ int __nf_nat_mangle_tcp_packet(struct sk_buff *skb,
 			     datalen, oldlen);
 
 	if (adjust && rep_len != match_len)
-		nf_nat_set_seq_adjust(ct, ctinfo, tcph->seq,
-				      (int)rep_len - (int)match_len);
+		nf_ct_seqadj_set(ct, ctinfo, tcph->seq,
+				 (int)rep_len - (int)match_len);
 
 	return 1;
 }
@@ -264,150 +186,6 @@ nf_nat_mangle_udp_packet(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(nf_nat_mangle_udp_packet);
 
-/* Adjust one found SACK option including checksum correction */
-static void
-sack_adjust(struct sk_buff *skb,
-	    struct tcphdr *tcph,
-	    unsigned int sackoff,
-	    unsigned int sackend,
-	    struct nf_nat_seq *natseq)
-{
-	while (sackoff < sackend) {
-		struct tcp_sack_block_wire *sack;
-		__be32 new_start_seq, new_end_seq;
-
-		sack = (void *)skb->data + sackoff;
-		if (after(ntohl(sack->start_seq) - natseq->offset_before,
-			  natseq->correction_pos))
-			new_start_seq = htonl(ntohl(sack->start_seq)
-					- natseq->offset_after);
-		else
-			new_start_seq = htonl(ntohl(sack->start_seq)
-					- natseq->offset_before);
-
-		if (after(ntohl(sack->end_seq) - natseq->offset_before,
-			  natseq->correction_pos))
-			new_end_seq = htonl(ntohl(sack->end_seq)
-				      - natseq->offset_after);
-		else
-			new_end_seq = htonl(ntohl(sack->end_seq)
-				      - natseq->offset_before);
-
-		pr_debug("sack_adjust: start_seq: %d->%d, end_seq: %d->%d\n",
-			 ntohl(sack->start_seq), new_start_seq,
-			 ntohl(sack->end_seq), new_end_seq);
-
-		inet_proto_csum_replace4(&tcph->check, skb,
-					 sack->start_seq, new_start_seq, 0);
-		inet_proto_csum_replace4(&tcph->check, skb,
-					 sack->end_seq, new_end_seq, 0);
-		sack->start_seq = new_start_seq;
-		sack->end_seq = new_end_seq;
-		sackoff += sizeof(*sack);
-	}
-}
-
-/* TCP SACK sequence number adjustment */
-static inline unsigned int
-nf_nat_sack_adjust(struct sk_buff *skb,
-		   unsigned int protoff,
-		   struct tcphdr *tcph,
-		   struct nf_conn *ct,
-		   enum ip_conntrack_info ctinfo)
-{
-	unsigned int dir, optoff, optend;
-	struct nf_conn_nat *nat = nfct_nat(ct);
-
-	optoff = protoff + sizeof(struct tcphdr);
-	optend = protoff + tcph->doff * 4;
-
-	if (!skb_make_writable(skb, optend))
-		return 0;
-
-	dir = CTINFO2DIR(ctinfo);
-
-	while (optoff < optend) {
-		/* Usually: option, length. */
-		unsigned char *op = skb->data + optoff;
-
-		switch (op[0]) {
-		case TCPOPT_EOL:
-			return 1;
-		case TCPOPT_NOP:
-			optoff++;
-			continue;
-		default:
-			/* no partial options */
-			if (optoff + 1 == optend ||
-			    optoff + op[1] > optend ||
-			    op[1] < 2)
-				return 0;
-			if (op[0] == TCPOPT_SACK &&
-			    op[1] >= 2+TCPOLEN_SACK_PERBLOCK &&
-			    ((op[1] - 2) % TCPOLEN_SACK_PERBLOCK) == 0)
-				sack_adjust(skb, tcph, optoff+2,
-					    optoff+op[1], &nat->seq[!dir]);
-			optoff += op[1];
-		}
-	}
-	return 1;
-}
-
-/* TCP sequence number adjustment.  Returns 1 on success, 0 on failure */
-int
-nf_nat_seq_adjust(struct sk_buff *skb,
-		  struct nf_conn *ct,
-		  enum ip_conntrack_info ctinfo,
-		  unsigned int protoff)
-{
-	struct tcphdr *tcph;
-	int dir;
-	__be32 newseq, newack;
-	s32 seqoff, ackoff;
-	struct nf_conn_nat *nat = nfct_nat(ct);
-	struct nf_nat_seq *this_way, *other_way;
-	int res;
-
-	dir = CTINFO2DIR(ctinfo);
-
-	this_way = &nat->seq[dir];
-	other_way = &nat->seq[!dir];
-
-	if (!skb_make_writable(skb, protoff + sizeof(*tcph)))
-		return 0;
-
-	tcph = (void *)skb->data + protoff;
-	spin_lock_bh(&ct->lock);
-	if (after(ntohl(tcph->seq), this_way->correction_pos))
-		seqoff = this_way->offset_after;
-	else
-		seqoff = this_way->offset_before;
-
-	if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
-		  other_way->correction_pos))
-		ackoff = other_way->offset_after;
-	else
-		ackoff = other_way->offset_before;
-
-	newseq = htonl(ntohl(tcph->seq) + seqoff);
-	newack = htonl(ntohl(tcph->ack_seq) - ackoff);
-
-	inet_proto_csum_replace4(&tcph->check, skb, tcph->seq, newseq, 0);
-	inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq, newack, 0);
-
-	pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
-		 ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
-		 ntohl(newack));
-
-	tcph->seq = newseq;
-	tcph->ack_seq = newack;
-
-	res = nf_nat_sack_adjust(skb, protoff, tcph, ct, ctinfo);
-	spin_unlock_bh(&ct->lock);
-
-	return res;
-}
-
 /* Setup NAT on this expected conntrack so it follows master. */
 /* If we fail to get a free NAT slot, we'll get dropped on confirm */
 void nf_nat_follow_master(struct nf_conn *ct,
diff --git a/net/netfilter/nf_nat_sip.c b/net/netfilter/nf_nat_sip.c
index dac11f7..f979040 100644
--- a/net/netfilter/nf_nat_sip.c
+++ b/net/netfilter/nf_nat_sip.c
@@ -20,6 +20,7 @@
 #include <net/netfilter/nf_nat_helper.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_expect.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
 #include <linux/netfilter/nf_conntrack_sip.h>
 
 MODULE_LICENSE("GPL");
@@ -308,7 +309,7 @@ static void nf_nat_sip_seq_adjust(struct sk_buff *skb, unsigned int protoff,
 		return;
 
 	th = (struct tcphdr *)(skb->data + protoff);
-	nf_nat_set_seq_adjust(ct, ctinfo, th->seq, off);
+	nf_ct_seqadj_set(ct, ctinfo, th->seq, off);
 }
 
 /* Handles expected signalling connections and media streams */
diff --git a/net/netfilter/nfnetlink_queue_ct.c b/net/netfilter/nfnetlink_queue_ct.c
index ab61d66..2cb9a28 100644
--- a/net/netfilter/nfnetlink_queue_ct.c
+++ b/net/netfilter/nfnetlink_queue_ct.c
@@ -87,12 +87,12 @@ nla_put_failure:
 void nfqnl_ct_seq_adjust(struct sk_buff *skb, struct nf_conn *ct,
 			 enum ip_conntrack_info ctinfo, int diff)
 {
-	struct nfq_ct_nat_hook *nfq_nat_ct;
+	struct nfq_ct_hook *nfq_ct;
 
-	nfq_nat_ct = rcu_dereference(nfq_ct_nat_hook);
-	if (nfq_nat_ct == NULL)
+	nfq_ct = rcu_dereference(nfq_ct_hook);
+	if (nfq_ct == NULL)
 		return;
 
 	if ((ct->status & IPS_NAT_MASK) && diff)
-		nfq_nat_ct->seq_adjust(skb, ct, ctinfo, diff);
+		nfq_ct->seq_adjust(skb, ct, ctinfo, diff);
 }
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/5] net: syncookies: export cookie_v4_init_sequence/cookie_v4_check
  2013-08-07 17:42 [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Patrick McHardy
  2013-08-07 17:42 ` [PATCH 1/5] netfilter: nf_conntrack: make sequence number adjustments usuable without NAT Patrick McHardy
@ 2013-08-07 17:42 ` Patrick McHardy
  2013-08-07 20:03   ` Jesper Dangaard Brouer
  2013-08-07 17:42 ` [PATCH 3/5] netfilter: add SYNPROXY core/target Patrick McHardy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 17:42 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, mph, jesper.brouer, as

Extract the local TCP stack independant parts of tcp_v4_init_sequence()
and cookie_v4_check() and export them for use by the upcoming SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/tcp.h     |  4 ++++
 net/ipv4/syncookies.c | 29 ++++++++++++++++++-----------
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 18fc999..63256c6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -480,9 +480,13 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb);
 
 /* From syncookies.c */
 extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
+extern int __cookie_v4_check(const struct iphdr *iph, const struct tcphdr *th,
+			     u32 cookie);
 extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, 
 				    struct ip_options *opt);
 #ifdef CONFIG_SYN_COOKIES
+extern u32 __cookie_v4_init_sequence(const struct iphdr *iph,
+				     const struct tcphdr *th, u16 *mssp);
 extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, 
 				     __u16 *mss);
 #else
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index b05c96e..14a15c4 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -160,26 +160,33 @@ static __u16 const msstab[] = {
  * Generate a syncookie.  mssp points to the mss, which is returned
  * rounded down to the value encoded in the cookie.
  */
-__u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
+u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th,
+			      u16 *mssp)
 {
-	const struct iphdr *iph = ip_hdr(skb);
-	const struct tcphdr *th = tcp_hdr(skb);
 	int mssind;
 	const __u16 mss = *mssp;
 
-	tcp_synq_overflow(sk);
-
 	for (mssind = ARRAY_SIZE(msstab) - 1; mssind ; mssind--)
 		if (mss >= msstab[mssind])
 			break;
 	*mssp = msstab[mssind];
 
-	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
-
 	return secure_tcp_syn_cookie(iph->saddr, iph->daddr,
 				     th->source, th->dest, ntohl(th->seq),
 				     jiffies / (HZ * 60), mssind);
 }
+EXPORT_SYMBOL_GPL(__cookie_v4_init_sequence);
+
+__u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	const struct tcphdr *th = tcp_hdr(skb);
+
+	tcp_synq_overflow(sk);
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
+
+	return __cookie_v4_init_sequence(iph, th, mssp);
+}
 
 /*
  * This (misnamed) value is the age of syncookie which is permitted.
@@ -192,10 +199,9 @@ __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
  * Check if a ack sequence number is a valid syncookie.
  * Return the decoded mss if it is, or 0 if not.
  */
-static inline int cookie_check(struct sk_buff *skb, __u32 cookie)
+int __cookie_v4_check(const struct iphdr *iph, const struct tcphdr *th,
+		      u32 cookie)
 {
-	const struct iphdr *iph = ip_hdr(skb);
-	const struct tcphdr *th = tcp_hdr(skb);
 	__u32 seq = ntohl(th->seq) - 1;
 	__u32 mssind = check_tcp_syn_cookie(cookie, iph->saddr, iph->daddr,
 					    th->source, th->dest, seq,
@@ -204,6 +210,7 @@ static inline int cookie_check(struct sk_buff *skb, __u32 cookie)
 
 	return mssind < ARRAY_SIZE(msstab) ? msstab[mssind] : 0;
 }
+EXPORT_SYMBOL_GPL(__cookie_v4_check);
 
 static inline struct sock *get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 					   struct request_sock *req,
@@ -284,7 +291,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 		goto out;
 
 	if (tcp_synq_no_recent_overflow(sk) ||
-	    (mss = cookie_check(skb, cookie)) == 0) {
+	    (mss = __cookie_v4_check(ip_hdr(skb), th, cookie)) == 0) {
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESFAILED);
 		goto out;
 	}
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-07 17:42 [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Patrick McHardy
  2013-08-07 17:42 ` [PATCH 1/5] netfilter: nf_conntrack: make sequence number adjustments usuable without NAT Patrick McHardy
  2013-08-07 17:42 ` [PATCH 2/5] net: syncookies: export cookie_v4_init_sequence/cookie_v4_check Patrick McHardy
@ 2013-08-07 17:42 ` Patrick McHardy
  2013-08-07 20:26   ` Jesper Dangaard Brouer
  2013-08-07 22:11   ` Eric Dumazet
  2013-08-07 17:42 ` [PATCH 4/5] net: syncookies: export cookie_v6_init_sequence/cookie_v6_check Patrick McHardy
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 17:42 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, mph, jesper.brouer, as

Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
core with common functions and an address family specific target.

The SYNPROXY receives the connection request from the client, responds with
a SYN/ACK containing a SYN cookie and announcing a zero window and checks
whether the final ACK from the client contains a valid cookie.

It then establishes a connection to the original destination and, if
successful, sends a window update to the client with the window size
announced by the server.

Support for timestamps, SACK, window scaling and MSS options can be
statically configured as target parameters if the features of the server
are known. If timestamps are used, the timestamp value sent back to
the client in the SYN/ACK will be different from the real timestamp of
the server. In order to now break PAWS, the timestamps are translated in
the direction server->client.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_conntrack_extend.h   |   6 +-
 include/net/netfilter/nf_conntrack_seqadj.h   |   2 +
 include/net/netfilter/nf_conntrack_synproxy.h |  74 +++++
 include/uapi/linux/netfilter/xt_SYNPROXY.h    |  16 +
 net/ipv4/netfilter/Kconfig                    |  13 +
 net/ipv4/netfilter/Makefile                   |   1 +
 net/ipv4/netfilter/ipt_SYNPROXY.c             | 428 ++++++++++++++++++++++++++
 net/netfilter/Kconfig                         |   3 +
 net/netfilter/Makefile                        |   3 +
 net/netfilter/nf_conntrack_core.c             |   6 +
 net/netfilter/nf_conntrack_seqadj.c           |  20 ++
 net/netfilter/nf_synproxy_core.c              | 428 ++++++++++++++++++++++++++
 12 files changed, 999 insertions(+), 1 deletion(-)
 create mode 100644 include/net/netfilter/nf_conntrack_synproxy.h
 create mode 100644 include/uapi/linux/netfilter/xt_SYNPROXY.h
 create mode 100644 net/ipv4/netfilter/ipt_SYNPROXY.c
 create mode 100644 net/netfilter/nf_synproxy_core.c

diff --git a/include/net/netfilter/nf_conntrack_extend.h b/include/net/netfilter/nf_conntrack_extend.h
index 2a22bcb..ff95434 100644
--- a/include/net/netfilter/nf_conntrack_extend.h
+++ b/include/net/netfilter/nf_conntrack_extend.h
@@ -9,8 +9,8 @@ enum nf_ct_ext_id {
 	NF_CT_EXT_HELPER,
 #if defined(CONFIG_NF_NAT) || defined(CONFIG_NF_NAT_MODULE)
 	NF_CT_EXT_NAT,
-	NF_CT_EXT_SEQADJ,
 #endif
+	NF_CT_EXT_SEQADJ,
 	NF_CT_EXT_ACCT,
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
 	NF_CT_EXT_ECACHE,
@@ -27,6 +27,9 @@ enum nf_ct_ext_id {
 #ifdef CONFIG_NF_CONNTRACK_LABELS
 	NF_CT_EXT_LABELS,
 #endif
+#if IS_ENABLED(CONFIG_NETFILTER_SYNPROXY)
+	NF_CT_EXT_SYNPROXY,
+#endif
 	NF_CT_EXT_NUM,
 };
 
@@ -39,6 +42,7 @@ enum nf_ct_ext_id {
 #define NF_CT_EXT_TSTAMP_TYPE struct nf_conn_tstamp
 #define NF_CT_EXT_TIMEOUT_TYPE struct nf_conn_timeout
 #define NF_CT_EXT_LABELS_TYPE struct nf_conn_labels
+#define NF_CT_EXT_SYNPROXY_TYPE struct nf_conn_synproxy
 
 /* Extensions: optional stuff which isn't permanently in struct. */
 struct nf_ct_ext {
diff --git a/include/net/netfilter/nf_conntrack_seqadj.h b/include/net/netfilter/nf_conntrack_seqadj.h
index 30bfbbe..f6177a5 100644
--- a/include/net/netfilter/nf_conntrack_seqadj.h
+++ b/include/net/netfilter/nf_conntrack_seqadj.h
@@ -30,6 +30,8 @@ static inline struct nf_conn_seqadj *nfct_seqadj_ext_add(struct nf_conn *ct)
 	return nf_ct_ext_add(ct, NF_CT_EXT_SEQADJ, GFP_ATOMIC);
 }
 
+extern int nf_ct_seqadj_init(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+			     s32 off);
 extern int nf_ct_seqadj_set(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
 			    __be32 seq, s32 off);
 extern void nf_ct_tcp_seqadj_set(struct sk_buff *skb,
diff --git a/include/net/netfilter/nf_conntrack_synproxy.h b/include/net/netfilter/nf_conntrack_synproxy.h
new file mode 100644
index 0000000..cc43895
--- /dev/null
+++ b/include/net/netfilter/nf_conntrack_synproxy.h
@@ -0,0 +1,74 @@
+#ifndef _NF_CONNTRACK_SYNPROXY_H
+#define _NF_CONNTRACK_SYNPROXY_H
+
+#include <net/netns/generic.h>
+
+struct nf_conn_synproxy {
+	u32	isn;
+	u32	tsoff;
+};
+
+static inline struct nf_conn_synproxy *nfct_synproxy(const struct nf_conn *ct)
+{
+#if IS_ENABLED(CONFIG_NETFILTER_SYNPROXY)
+	return nf_ct_ext_find(ct, NF_CT_EXT_SYNPROXY);
+#else
+	return NULL;
+#endif
+}
+
+static inline struct nf_conn_synproxy *nfct_synproxy_ext_add(struct nf_conn *ct)
+{
+#if IS_ENABLED(CONFIG_NETFILTER_SYNPROXY)
+	return nf_ct_ext_add(ct, NF_CT_EXT_SYNPROXY, GFP_ATOMIC);
+#else
+	return NULL;
+#endif
+}
+
+struct synproxy_stats {
+	unsigned int			syn_received;
+	unsigned int			cookie_invalid;
+	unsigned int			cookie_valid;
+};
+
+struct synproxy_net {
+	struct nf_conn			*tmpl;
+	struct synproxy_stats __percpu	*stats;
+};
+
+extern int synproxy_net_id;
+static inline struct synproxy_net *synproxy_pernet(struct net *net)
+{
+	return net_generic(net, synproxy_net_id);
+}
+
+struct synproxy_options {
+	u8				options;
+	u8				wscale;
+	u16				mss;
+	u32				tsval;
+	u32				tsecr;
+};
+
+struct tcphdr;
+struct xt_synproxy_info;
+extern void synproxy_parse_options(const struct sk_buff *skb, unsigned int doff,
+				   const struct tcphdr *th,
+				   struct synproxy_options *opts);
+extern unsigned int synproxy_options_size(const struct synproxy_options *opts);
+extern void synproxy_build_options(struct tcphdr *th,
+				   const struct synproxy_options *opts);
+
+extern void synproxy_init_timestamp_cookie(const struct xt_synproxy_info *info,
+					   struct synproxy_options *opts);
+extern void synproxy_check_timestamp_cookie(struct synproxy_options *opts);
+
+extern unsigned int synproxy_tstamp_adjust(struct sk_buff *skb,
+					   unsigned int protoff,
+					   struct tcphdr *th,
+					   struct nf_conn *ct,
+					   enum ip_conntrack_info ctinfo,
+					   const struct nf_conn_synproxy *synproxy);
+
+#endif /* _NF_CONNTRACK_SYNPROXY_H */
diff --git a/include/uapi/linux/netfilter/xt_SYNPROXY.h b/include/uapi/linux/netfilter/xt_SYNPROXY.h
new file mode 100644
index 0000000..2d59fba
--- /dev/null
+++ b/include/uapi/linux/netfilter/xt_SYNPROXY.h
@@ -0,0 +1,16 @@
+#ifndef _XT_SYNPROXY_H
+#define _XT_SYNPROXY_H
+
+#define XT_SYNPROXY_OPT_MSS		0x01
+#define XT_SYNPROXY_OPT_WSCALE		0x02
+#define XT_SYNPROXY_OPT_SACK_PERM	0x04
+#define XT_SYNPROXY_OPT_TIMESTAMP	0x08
+#define XT_SYNPROXY_OPT_ECN		0x10
+
+struct xt_synproxy_info {
+	__u8	options;
+	__u8	wscale;
+	__u16	mss;
+};
+
+#endif /* _XT_SYNPROXY_H */
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 4e90280..1657e39 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -110,6 +110,19 @@ config IP_NF_TARGET_REJECT
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_NF_TARGET_SYNPROXY
+	tristate "SYNPROXY target support"
+	depends on NF_CONNTRACK && NETFILTER_ADVANCED
+	select NETFILTER_SYNPROXY
+	select SYN_COOKIES
+	help
+	  The SYNPROXY target allows you to intercept TCP connections and
+	  establish them using syncookies before they are passed on to the
+	  server. This allows to avoid conntrack and server resource usage
+	  during SYN-flood attacks.
+
+	  To compile it as a module, choose M here. If unsure, say N.
+
 config IP_NF_TARGET_ULOG
 	tristate "ULOG target support (obsolete)"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 007b128..3622b24 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_IP_NF_TARGET_CLUSTERIP) += ipt_CLUSTERIP.o
 obj-$(CONFIG_IP_NF_TARGET_ECN) += ipt_ECN.o
 obj-$(CONFIG_IP_NF_TARGET_MASQUERADE) += ipt_MASQUERADE.o
 obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o
+obj-$(CONFIG_IP_NF_TARGET_SYNPROXY) += ipt_SYNPROXY.o
 obj-$(CONFIG_IP_NF_TARGET_ULOG) += ipt_ULOG.o
 
 # generic ARP tables
diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c b/net/ipv4/netfilter/ipt_SYNPROXY.c
new file mode 100644
index 0000000..5576f7c
--- /dev/null
+++ b/net/ipv4/netfilter/ipt_SYNPROXY.c
@@ -0,0 +1,428 @@
+/*
+ * Copyright (c) 2013 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <net/tcp.h>
+
+#include <linux/netfilter_ipv4/ip_tables.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_SYNPROXY.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
+#include <net/netfilter/nf_conntrack_synproxy.h>
+
+static struct iphdr *
+synproxy_build_ip(struct sk_buff *skb, u32 saddr, u32 daddr)
+{
+	struct iphdr *iph;
+
+	skb_reset_network_header(skb);
+	iph = (struct iphdr *)skb_put(skb, sizeof(*iph));
+	iph->version	= 4;
+	iph->ihl	= sizeof(*iph) / 4;
+	iph->tos	= 0;
+	iph->id		= 0;
+	iph->frag_off	= htons(IP_DF);
+	iph->ttl	= 64;
+	iph->protocol	= IPPROTO_TCP;
+	iph->check	= 0;
+	iph->saddr	= saddr;
+	iph->daddr	= daddr;
+
+	return iph;
+}
+
+static void
+synproxy_send_tcp(const struct sk_buff *skb, struct sk_buff *nskb,
+		  struct nf_conntrack *nfct, enum ip_conntrack_info ctinfo,
+		  struct iphdr *niph, struct tcphdr *nth,
+		  unsigned int tcp_hdr_size)
+{
+	nth->check = ~tcp_v4_check(tcp_hdr_size, niph->saddr, niph->daddr, 0);
+	nskb->ip_summed   = CHECKSUM_PARTIAL;
+	nskb->csum_start  = (unsigned char *)nth - nskb->head;
+	nskb->csum_offset = offsetof(struct tcphdr, check);
+
+	skb_dst_set_noref(nskb, skb_dst(skb));
+	nskb->protocol = htons(ETH_P_IP);
+	if (ip_route_me_harder(nskb, RTN_UNSPEC))
+		goto free_nskb;
+
+	nskb->nfct = nfct;
+	nskb->nfctinfo = ctinfo;
+	nf_conntrack_get(nfct);
+
+	ip_local_out(nskb);
+	return;
+
+free_nskb:
+	kfree_skb(nskb);
+}
+
+static void
+synproxy_send_client_synack(const struct sk_buff *skb, const struct tcphdr *th,
+			    const struct synproxy_options *opts)
+{
+	struct sk_buff *nskb;
+	struct iphdr *iph, *niph;
+	struct tcphdr *nth;
+	unsigned int tcp_hdr_size;
+	u16 mss = opts->mss;
+
+	iph = ip_hdr(skb);
+
+	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
+	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
+			 GFP_ATOMIC);
+	if (nskb == NULL)
+		return;
+	skb_reserve(nskb, LL_MAX_HEADER);
+
+	niph = synproxy_build_ip(nskb, iph->daddr, iph->saddr);
+
+	skb_reset_transport_header(nskb);
+	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
+	nth->source	= th->dest;
+	nth->dest	= th->source;
+	nth->seq	= htonl(__cookie_v4_init_sequence(iph, th, &mss));
+	nth->ack_seq	= htonl(ntohl(th->seq) + 1);
+	tcp_flag_word(nth) = TCP_FLAG_SYN | TCP_FLAG_ACK;
+	if (opts->options & XT_SYNPROXY_OPT_ECN)
+		tcp_flag_word(nth) |= TCP_FLAG_ECE;
+	nth->doff	= tcp_hdr_size / 4;
+	nth->window	= 0;
+	nth->check	= 0;
+	nth->urg_ptr	= 0;
+
+	synproxy_build_options(nth, opts);
+
+	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
+			  niph, nth, tcp_hdr_size);
+}
+
+static void
+synproxy_send_server_syn(const struct synproxy_net *snet,
+			 const struct sk_buff *skb, const struct tcphdr *th,
+			 const struct synproxy_options *opts)
+{
+	struct sk_buff *nskb;
+	struct iphdr *iph, *niph;
+	struct tcphdr *nth;
+	unsigned int tcp_hdr_size;
+
+	iph = ip_hdr(skb);
+
+	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
+	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
+			 GFP_ATOMIC);
+	if (nskb == NULL)
+		return;
+	skb_reserve(nskb, LL_MAX_HEADER);
+
+	niph = synproxy_build_ip(nskb, iph->saddr, iph->daddr);
+
+	skb_reset_transport_header(nskb);
+	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
+	nth->source	= th->source;
+	nth->dest	= th->dest;
+	nth->seq	= htonl(ntohl(th->seq) - 1);
+	nth->ack_seq	= htonl(ntohl(th->ack_seq) - 1);;
+	tcp_flag_word(nth) = TCP_FLAG_SYN;
+	if (opts->options & XT_SYNPROXY_OPT_ECN)
+		tcp_flag_word(nth) |= TCP_FLAG_ECE | TCP_FLAG_CWR;
+	nth->doff	= tcp_hdr_size / 4;
+	nth->window	= th->window;
+	nth->check	= 0;
+	nth->urg_ptr	= 0;
+
+	synproxy_build_options(nth, opts);
+
+	synproxy_send_tcp(skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
+			  niph, nth, tcp_hdr_size);
+}
+
+static void
+synproxy_send_server_ack(const struct synproxy_net *snet,
+			 const struct ip_ct_tcp *state,
+			 const struct sk_buff *skb, const struct tcphdr *th,
+			 const struct synproxy_options *opts)
+{
+	struct sk_buff *nskb;
+	struct iphdr *iph, *niph;
+	struct tcphdr *nth;
+	unsigned int tcp_hdr_size;
+
+	iph = ip_hdr(skb);
+
+	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
+	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
+			 GFP_ATOMIC);
+	if (nskb == NULL)
+		return;
+	skb_reserve(nskb, LL_MAX_HEADER);
+
+	niph = synproxy_build_ip(nskb, iph->daddr, iph->saddr);
+
+	skb_reset_transport_header(nskb);
+	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
+	nth->source	= th->dest;
+	nth->dest	= th->source;
+	nth->seq	= htonl(ntohl(th->ack_seq));
+	nth->ack_seq	= htonl(ntohl(th->seq) + 1);;
+	tcp_flag_word(nth) = TCP_FLAG_ACK;
+	nth->doff	= tcp_hdr_size / 4;
+	nth->window	= htons(state->seen[IP_CT_DIR_ORIGINAL].td_maxwin);
+	nth->check	= 0;
+	nth->urg_ptr	= 0;
+
+	synproxy_build_options(nth, opts);
+
+	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED,
+			  niph, nth, tcp_hdr_size);
+}
+
+static void
+synproxy_send_client_ack(const struct synproxy_net *snet,
+			 const struct sk_buff *skb, const struct tcphdr *th,
+			 const struct synproxy_options *opts)
+{
+	struct sk_buff *nskb;
+	struct iphdr *iph, *niph;
+	struct tcphdr *nth;
+	unsigned int tcp_hdr_size;
+
+	iph = ip_hdr(skb);
+
+	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
+	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
+			 GFP_ATOMIC);
+	if (nskb == NULL)
+		return;
+	skb_reserve(nskb, LL_MAX_HEADER);
+
+	niph = synproxy_build_ip(nskb, iph->saddr, iph->daddr);
+
+	skb_reset_transport_header(nskb);
+	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
+	nth->source	= th->source;
+	nth->dest	= th->dest;
+	nth->seq	= htonl(ntohl(th->seq) + 1);
+	nth->ack_seq	= th->ack_seq;
+	tcp_flag_word(nth) = TCP_FLAG_ACK;
+	nth->doff	= tcp_hdr_size / 4;
+	nth->window	= th->window;
+	nth->check	= 0;
+	nth->urg_ptr	= 0;
+
+	synproxy_build_options(nth, opts);
+
+	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
+			  niph, nth, tcp_hdr_size);
+}
+
+static unsigned int
+synproxy_tg4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_synproxy_info *info = par->targinfo;
+	struct synproxy_net *snet = synproxy_pernet(dev_net(par->in));
+	struct synproxy_options opts = {};
+	struct tcphdr *th, _th;
+
+	if (nf_ip_checksum(skb, par->hooknum, par->thoff, IPPROTO_TCP))
+		return NF_DROP;
+
+	th = skb_header_pointer(skb, par->thoff, sizeof(_th), &_th);
+	if (th == NULL)
+		return NF_DROP;
+
+	synproxy_parse_options(skb, par->thoff, th, &opts);
+
+	if (th->syn) {
+		/* Initial SYN from client */
+		this_cpu_inc(snet->stats->syn_received);
+
+		if (th->ece && th->cwr)
+			opts.options |= XT_SYNPROXY_OPT_ECN;
+
+		opts.options &= info->options;
+		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
+			synproxy_init_timestamp_cookie(info, &opts);
+		else
+			opts.options &= ~(XT_SYNPROXY_OPT_WSCALE |
+					  XT_SYNPROXY_OPT_SACK_PERM |
+					  XT_SYNPROXY_OPT_ECN);
+
+		synproxy_send_client_synack(skb, th, &opts);
+	} else if (th->ack && !(th->fin || th->rst)) {
+		/* ACK from client */
+		int mss = __cookie_v4_check(ip_hdr(skb), th,
+					    ntohl(th->ack_seq) - 1);
+		if (mss == 0) {
+			this_cpu_inc(snet->stats->cookie_invalid);
+			return NF_DROP;
+		}
+
+		this_cpu_inc(snet->stats->cookie_valid);
+		opts.mss = mss;
+
+		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
+			synproxy_check_timestamp_cookie(&opts);
+
+		synproxy_send_server_syn(snet, skb, th, &opts);
+	}
+
+	return NF_DROP;
+}
+
+static unsigned int ipv4_synproxy_hook(unsigned int hooknum,
+				       struct sk_buff *skb,
+				       const struct net_device *in,
+				       const struct net_device *out,
+				       int (*okfn)(struct sk_buff *))
+{
+	struct synproxy_net *snet = synproxy_pernet(dev_net(in ? : out));
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+	struct nf_conn_synproxy *synproxy;
+	struct synproxy_options opts = {};
+	const struct ip_ct_tcp *state;
+	struct tcphdr *th, _th;
+	unsigned int thoff;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct == NULL)
+		return NF_ACCEPT;
+
+	synproxy = nfct_synproxy(ct);
+	if (synproxy == NULL)
+		return NF_ACCEPT;
+
+	if (nf_is_loopback_packet(skb))
+		return NF_ACCEPT;
+
+	thoff = ip_hdrlen(skb);
+	th = skb_header_pointer(skb, thoff, sizeof(_th), &_th);
+	if (th == NULL)
+		return NF_DROP;
+
+	state = &ct->proto.tcp;
+	switch (state->state) {
+	case TCP_CONNTRACK_SYN_SENT:
+		synproxy_parse_options(skb, thoff, th, &opts);
+
+		synproxy->isn = ntohl(th->ack_seq);
+		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
+			synproxy->tsoff = -opts.tsecr;
+		break;
+	case TCP_CONNTRACK_SYN_RECV:
+		if (!th->syn)
+			break;
+
+		synproxy_parse_options(skb, thoff, th, &opts);
+		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
+			synproxy->tsoff += opts.tsval;
+
+		opts.options &= ~(XT_SYNPROXY_OPT_MSS |
+				  XT_SYNPROXY_OPT_WSCALE |
+				  XT_SYNPROXY_OPT_SACK_PERM);
+
+		swap(opts.tsval, opts.tsecr);
+		synproxy_send_server_ack(snet, state, skb, th, &opts);
+
+		nf_ct_seqadj_init(ct, ctinfo, synproxy->isn - ntohl(th->seq));
+
+		swap(opts.tsval, opts.tsecr);
+		synproxy_send_client_ack(snet, skb, th, &opts);
+
+		kfree_skb(skb);
+		return NF_STOLEN;
+	case TCP_CONNTRACK_CLOSE:
+		if (!th->rst)
+			break;
+
+		nf_ct_seqadj_init(ct, ctinfo, synproxy->isn -
+					      ntohl(th->seq) + 1);
+		break;
+	default:
+		break;
+	}
+
+	synproxy_tstamp_adjust(skb, thoff, th, ct, ctinfo, synproxy);
+	return NF_ACCEPT;
+}
+
+static int synproxy_tg4_check(const struct xt_tgchk_param *par)
+{
+	return nf_ct_l3proto_try_module_get(par->family);
+}
+
+static void synproxy_tg4_destroy(const struct xt_tgdtor_param *par)
+{
+	nf_ct_l3proto_module_put(par->family);
+}
+
+static struct xt_target synproxy_tg4_reg __read_mostly = {
+	.name		= "SYNPROXY",
+	.family		= NFPROTO_IPV4,
+	.target		= synproxy_tg4,
+	.targetsize	= sizeof(struct xt_synproxy_info),
+	.checkentry	= synproxy_tg4_check,
+	.destroy	= synproxy_tg4_destroy,
+	.me		= THIS_MODULE,
+};
+
+static struct nf_hook_ops ipv4_synproxy_ops[] __read_mostly = {
+	{
+		.hook		= ipv4_synproxy_hook,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV4,
+		.hooknum	= NF_INET_LOCAL_IN,
+		.priority	= NF_IP_PRI_CONNTRACK_CONFIRM - 1,
+	},
+	{
+		.hook		= ipv4_synproxy_hook,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV4,
+		.hooknum	= NF_INET_POST_ROUTING,
+		.priority	= NF_IP_PRI_CONNTRACK_CONFIRM - 1,
+	},
+};
+
+static int __init synproxy_tg4_init(void)
+{
+	int err;
+
+	err = nf_register_hooks(ipv4_synproxy_ops,
+				ARRAY_SIZE(ipv4_synproxy_ops));
+	if (err < 0)
+		goto err1;
+
+	err = xt_register_target(&synproxy_tg4_reg);
+	if (err < 0)
+		goto err2;
+
+	return 0;
+
+err2:
+	nf_unregister_hooks(ipv4_synproxy_ops, ARRAY_SIZE(ipv4_synproxy_ops));
+err1:
+	return err;
+}
+
+static void __exit synproxy_tg4_exit(void)
+{
+	xt_unregister_target(&synproxy_tg4_reg);
+	nf_unregister_hooks(ipv4_synproxy_ops, ARRAY_SIZE(ipv4_synproxy_ops));
+}
+
+module_init(synproxy_tg4_init);
+module_exit(synproxy_tg4_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c45fc1a..62a171a 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -408,6 +408,9 @@ config NF_NAT_TFTP
 	depends on NF_CONNTRACK && NF_NAT
 	default NF_NAT && NF_CONNTRACK_TFTP
 
+config NETFILTER_SYNPROXY
+	tristate
+
 endif # NF_CONNTRACK
 
 config NETFILTER_XTABLES
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 89a9c16..c3a0a12 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -61,6 +61,9 @@ obj-$(CONFIG_NF_NAT_IRC) += nf_nat_irc.o
 obj-$(CONFIG_NF_NAT_SIP) += nf_nat_sip.o
 obj-$(CONFIG_NF_NAT_TFTP) += nf_nat_tftp.o
 
+# SYNPROXY
+obj-$(CONFIG_NETFILTER_SYNPROXY) += nf_synproxy_core.o
+
 # generic X tables 
 obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o
 
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 31683ca..109d10b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -48,6 +48,7 @@
 #include <net/netfilter/nf_conntrack_timestamp.h>
 #include <net/netfilter/nf_conntrack_timeout.h>
 #include <net/netfilter/nf_conntrack_labels.h>
+#include <net/netfilter/nf_conntrack_synproxy.h>
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_core.h>
 #include <net/netfilter/nf_nat_helper.h>
@@ -799,6 +800,11 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	if (IS_ERR(ct))
 		return (struct nf_conntrack_tuple_hash *)ct;
 
+	if (tmpl && nfct_synproxy(tmpl)) {
+		nfct_seqadj_ext_add(ct);
+		nfct_synproxy_ext_add(ct);
+	}
+
 	timeout_ext = tmpl ? nf_ct_timeout_find(tmpl) : NULL;
 	if (timeout_ext)
 		timeouts = NF_CT_TIMEOUT_EXT_DATA(timeout_ext);
diff --git a/net/netfilter/nf_conntrack_seqadj.c b/net/netfilter/nf_conntrack_seqadj.c
index 483eb9c..5f9bfd0 100644
--- a/net/netfilter/nf_conntrack_seqadj.c
+++ b/net/netfilter/nf_conntrack_seqadj.c
@@ -6,6 +6,26 @@
 #include <net/netfilter/nf_conntrack_extend.h>
 #include <net/netfilter/nf_conntrack_seqadj.h>
 
+int nf_ct_seqadj_init(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+		      s32 off)
+{
+	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_conn_seqadj *seqadj;
+	struct nf_ct_seqadj *this_way;
+
+	if (off == 0)
+		return 0;
+
+	set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
+
+	seqadj = nfct_seqadj(ct);
+	this_way = &seqadj->seq[dir];
+	this_way->offset_before	 = off;
+	this_way->offset_after	 = off;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nf_ct_seqadj_init);
+
 int nf_ct_seqadj_set(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
 		     __be32 seq, s32 off)
 {
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
new file mode 100644
index 0000000..d887a84
--- /dev/null
+++ b/net/netfilter/nf_synproxy_core.c
@@ -0,0 +1,428 @@
+/*
+ * Copyright (c) 2013 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <asm/unaligned.h>
+#include <net/tcp.h>
+#include <net/netns/generic.h>
+
+#include <linux/netfilter_ipv4/ip_tables.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_tcpudp.h>
+#include <linux/netfilter/xt_SYNPROXY.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_extend.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
+#include <net/netfilter/nf_conntrack_synproxy.h>
+
+int synproxy_net_id;
+EXPORT_SYMBOL_GPL(synproxy_net_id);
+
+void
+synproxy_parse_options(const struct sk_buff *skb, unsigned int doff,
+		       const struct tcphdr *th, struct synproxy_options *opts)
+{
+	int length = (th->doff * 4) - sizeof(*th);
+	u8 buf[40], *ptr;
+
+	ptr = skb_header_pointer(skb, doff + sizeof(*th), length, buf);
+	BUG_ON(ptr == NULL);
+
+	opts->options = 0;
+	while (length > 0) {
+		int opcode = *ptr++;
+		int opsize;
+
+		switch (opcode) {
+		case TCPOPT_EOL:
+			return;
+		case TCPOPT_NOP:
+			length--;
+			continue;
+		default:
+			opsize = *ptr++;
+			if (opsize < 2)
+				return;
+			if (opsize > length)
+				return;
+
+			switch (opcode) {
+			case TCPOPT_MSS:
+				if (opsize == TCPOLEN_MSS) {
+					 opts->mss = get_unaligned_be16(ptr);
+					 opts->options |= XT_SYNPROXY_OPT_MSS;
+				}
+				break;
+			case TCPOPT_WINDOW:
+				if (opsize == TCPOLEN_WINDOW) {
+					opts->wscale = *ptr;
+					if (opts->wscale > 14)
+						opts->wscale = 14;
+					opts->options |= XT_SYNPROXY_OPT_WSCALE;
+				}
+				break;
+			case TCPOPT_TIMESTAMP:
+				if (opsize == TCPOLEN_TIMESTAMP) {
+					opts->tsval = get_unaligned_be32(ptr);
+					opts->tsecr = get_unaligned_be32(ptr + 4);
+					opts->options |= XT_SYNPROXY_OPT_TIMESTAMP;
+				}
+				break;
+			case TCPOPT_SACK_PERM:
+				if (opsize == TCPOLEN_SACK_PERM)
+					opts->options |= XT_SYNPROXY_OPT_SACK_PERM;
+				break;
+			}
+
+			ptr += opsize - 2;
+			length -= opsize;
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(synproxy_parse_options);
+
+unsigned int synproxy_options_size(const struct synproxy_options *opts)
+{
+	unsigned int size = 0;
+
+	if (opts->options & XT_SYNPROXY_OPT_MSS)
+		size += TCPOLEN_MSS_ALIGNED;
+	if (opts->options & XT_SYNPROXY_OPT_TIMESTAMP)
+		size += TCPOLEN_TSTAMP_ALIGNED;
+	else if (opts->options & XT_SYNPROXY_OPT_SACK_PERM)
+		size += TCPOLEN_SACKPERM_ALIGNED;
+	if (opts->options & XT_SYNPROXY_OPT_WSCALE)
+		size += TCPOLEN_WSCALE_ALIGNED;
+
+	return size;
+}
+EXPORT_SYMBOL_GPL(synproxy_options_size);
+
+void
+synproxy_build_options(struct tcphdr *th, const struct synproxy_options *opts)
+{
+	__be32 *ptr = (__be32 *)(th + 1);
+	u8 options = opts->options;
+
+	if (options & XT_SYNPROXY_OPT_MSS)
+		*ptr++ = htonl((TCPOPT_MSS << 24) |
+			       (TCPOLEN_MSS << 16) |
+			       opts->mss);
+
+	if (options & XT_SYNPROXY_OPT_TIMESTAMP) {
+		if (options & XT_SYNPROXY_OPT_SACK_PERM)
+			*ptr++ = htonl((TCPOPT_SACK_PERM << 24) |
+				       (TCPOLEN_SACK_PERM << 16) |
+				       (TCPOPT_TIMESTAMP << 8) |
+				       TCPOLEN_TIMESTAMP);
+		else
+			*ptr++ = htonl((TCPOPT_NOP << 24) |
+				       (TCPOPT_NOP << 16) |
+				       (TCPOPT_TIMESTAMP << 8) |
+				       TCPOLEN_TIMESTAMP);
+
+		*ptr++ = htonl(opts->tsval);
+		*ptr++ = htonl(opts->tsecr);
+	} else if (options & XT_SYNPROXY_OPT_SACK_PERM)
+		*ptr++ = htonl((TCPOPT_NOP << 24) |
+			       (TCPOPT_NOP << 16) |
+			       (TCPOPT_SACK_PERM << 8) |
+			       TCPOLEN_SACK_PERM);
+
+	if (options & XT_SYNPROXY_OPT_WSCALE)
+		*ptr++ = htonl((TCPOPT_NOP << 24) |
+			       (TCPOPT_WINDOW << 16) |
+			       (TCPOLEN_WINDOW << 8) |
+			       opts->wscale);
+}
+EXPORT_SYMBOL_GPL(synproxy_build_options);
+
+void synproxy_init_timestamp_cookie(const struct xt_synproxy_info *info,
+				    struct synproxy_options *opts)
+{
+	opts->tsecr = opts->tsval;
+	opts->tsval = tcp_time_stamp & ~0x3f;
+
+	if (opts->options & XT_SYNPROXY_OPT_WSCALE)
+		opts->tsval |= info->wscale;
+	else
+		opts->tsval |= 0xf;
+
+	if (opts->options & XT_SYNPROXY_OPT_SACK_PERM)
+		opts->tsval |= 1 << 4;
+
+	if (opts->options & XT_SYNPROXY_OPT_ECN)
+		opts->tsval |= 1 << 5;
+}
+EXPORT_SYMBOL_GPL(synproxy_init_timestamp_cookie);
+
+void synproxy_check_timestamp_cookie(struct synproxy_options *opts)
+{
+	opts->wscale = opts->tsecr & 0xf;
+	if (opts->wscale != 0xf)
+		opts->options |= XT_SYNPROXY_OPT_WSCALE;
+
+	opts->options |= opts->tsecr & (1 << 4) ? XT_SYNPROXY_OPT_SACK_PERM : 0;
+
+	opts->options |= opts->tsecr & (1 << 5) ? XT_SYNPROXY_OPT_ECN : 0;
+}
+EXPORT_SYMBOL_GPL(synproxy_check_timestamp_cookie);
+
+unsigned int synproxy_tstamp_adjust(struct sk_buff *skb,
+				    unsigned int protoff,
+				    struct tcphdr *th,
+				    struct nf_conn *ct,
+				    enum ip_conntrack_info ctinfo,
+				    const struct nf_conn_synproxy *synproxy)
+{
+	unsigned int optoff, optend;
+	u32 *ptr, old;
+
+	if (synproxy->tsoff == 0)
+		return 1;
+
+	optoff = protoff + sizeof(struct tcphdr);
+	optend = protoff + th->doff * 4;
+
+	if (!skb_make_writable(skb, optend))
+		return 0;
+
+	while (optoff < optend) {
+		unsigned char *op = skb->data + optoff;
+
+		switch (op[0]) {
+		case TCPOPT_EOL:
+			return 1;
+		case TCPOPT_NOP:
+			optoff++;
+			continue;
+		default:
+			if (optoff + 1 == optend ||
+			    optoff + op[1] > optend ||
+			    op[1] < 2)
+				return 0;
+			if (op[0] == TCPOPT_TIMESTAMP &&
+			    op[1] == TCPOLEN_TIMESTAMP) {
+				if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) {
+					ptr = (u32 *)&op[2];
+					old = *ptr;
+					*ptr = htonl(ntohl(*ptr) -
+						     synproxy->tsoff);
+				} else {
+					ptr = (u32 *)&op[6];
+					old = *ptr;
+					*ptr = htonl(ntohl(*ptr) +
+						     synproxy->tsoff);
+				}
+				inet_proto_csum_replace4(&th->check, skb,
+							 old, *ptr, 0);
+				return 1;
+			}
+			optoff += op[1];
+		}
+	}
+	return 1;
+}
+EXPORT_SYMBOL_GPL(synproxy_tstamp_adjust);
+
+static struct nf_ct_ext_type nf_ct_synproxy_extend __read_mostly = {
+	.len		= sizeof(struct nf_conn_synproxy),
+	.align		= __alignof__(struct nf_conn_synproxy),
+	.id		= NF_CT_EXT_SYNPROXY,
+};
+
+#ifdef CONFIG_PROC_FS
+static void *synproxy_cpu_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct synproxy_net *snet = synproxy_pernet(seq_file_net(seq));
+	int cpu;
+
+	if (*pos == 0)
+		return SEQ_START_TOKEN;
+
+	for (cpu = *pos - 1; cpu < nr_cpu_ids; cpu++) {
+		if (!cpu_possible(cpu))
+			continue;
+		*pos = cpu + 1;
+		return per_cpu_ptr(snet->stats, cpu);
+	}
+
+	return NULL;
+}
+
+static void *synproxy_cpu_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct synproxy_net *snet = synproxy_pernet(seq_file_net(seq));
+	int cpu;
+
+	for (cpu = *pos; cpu < nr_cpu_ids; cpu++) {
+		if (!cpu_possible(cpu))
+			continue;
+		*pos = cpu + 1;
+		return per_cpu_ptr(snet->stats, cpu);
+	}
+
+	return NULL;
+}
+
+static void synproxy_cpu_seq_stop(struct seq_file *seq, void *v)
+{
+	return;
+}
+
+static int synproxy_cpu_seq_show(struct seq_file *seq, void *v)
+{
+	struct synproxy_stats *stats = v;
+
+	if (v == SEQ_START_TOKEN) {
+		seq_printf(seq, "syn_received\tcookie_invalid\tcookie_valid\n");
+		return 0;
+	}
+
+	seq_printf(seq, "%08u\t%08x\t%08x\n",
+		   stats->syn_received,
+		   stats->cookie_invalid,
+		   stats->cookie_valid);
+
+	return 0;
+}
+
+static const struct seq_operations synproxy_cpu_seq_ops = {
+	.start		= synproxy_cpu_seq_start,
+	.next		= synproxy_cpu_seq_next,
+	.stop		= synproxy_cpu_seq_stop,
+	.show		= synproxy_cpu_seq_show,
+};
+
+static int synproxy_cpu_seq_open(struct inode *inode, struct file *file)
+{
+	return seq_open_net(inode, file, &synproxy_cpu_seq_ops,
+			    sizeof(struct seq_net_private));
+}
+
+static const struct file_operations synproxy_cpu_seq_fops = {
+	.owner		= THIS_MODULE,
+	.open		= synproxy_cpu_seq_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release_net,
+};
+
+static int __net_init synproxy_proc_init(struct net *net)
+{
+	if (!proc_create("synproxy", S_IRUGO, net->proc_net_stat,
+			 &synproxy_cpu_seq_fops))
+		return -ENOMEM;
+	return 0;
+}
+
+static void __net_exit synproxy_proc_exit(struct net *net)
+{
+	remove_proc_entry("synproxy", net->proc_net_stat);
+}
+#else
+static int __net_init synproxy_proc_init(struct net *net)
+{
+	return 0;
+}
+
+static void __net_exit synproxy_proc_exit(struct net *net)
+{
+	return;
+}
+#endif /* CONFIG_PROC_FS */
+
+static int __net_init synproxy_net_init(struct net *net)
+{
+	struct synproxy_net *snet = synproxy_pernet(net);
+	struct nf_conntrack_tuple t;
+	struct nf_conn *ct;
+	int err = -ENOMEM;
+
+	memset(&t, 0, sizeof(t));
+	ct = nf_conntrack_alloc(net, 0, &t, &t, GFP_KERNEL);
+	if (IS_ERR(ct)) {
+		err = PTR_ERR(ct);
+		goto err1;
+	}
+
+	__set_bit(IPS_TEMPLATE_BIT, &ct->status);
+	__set_bit(IPS_CONFIRMED_BIT, &ct->status);
+	if (!nfct_seqadj_ext_add(ct))
+		goto err2;
+	if (!nfct_synproxy_ext_add(ct))
+		goto err2;
+
+	snet->tmpl = ct;
+
+	snet->stats = alloc_percpu(struct synproxy_stats);
+	if (snet->stats == NULL)
+		goto err2;
+
+	err = synproxy_proc_init(net);
+	if (err < 0)
+		goto err3;
+
+	return 0;
+
+err3:
+	free_percpu(snet->stats);
+err2:
+	nf_conntrack_free(ct);
+err1:
+	return err;
+}
+
+static void __net_exit synproxy_net_exit(struct net *net)
+{
+	struct synproxy_net *snet = synproxy_pernet(net);
+
+	nf_conntrack_free(snet->tmpl);
+	synproxy_proc_exit(net);
+	free_percpu(snet->stats);
+}
+
+static struct pernet_operations synproxy_net_ops = {
+	.init		= synproxy_net_init,
+	.exit		= synproxy_net_exit,
+	.id		= &synproxy_net_id,
+	.size		= sizeof(struct synproxy_net),
+};
+
+static int __init synproxy_core_init(void)
+{
+	int err;
+
+	err = nf_ct_extend_register(&nf_ct_synproxy_extend);
+	if (err < 0)
+		goto err1;
+
+	err = register_pernet_subsys(&synproxy_net_ops);
+	if (err < 0)
+		goto err2;
+
+	return 0;
+
+err2:
+	nf_ct_extend_unregister(&nf_ct_synproxy_extend);
+err1:
+	return err;
+}
+
+static void __exit synproxy_core_exit(void)
+{
+	unregister_pernet_subsys(&synproxy_net_ops);
+	nf_ct_extend_unregister(&nf_ct_synproxy_extend);
+}
+
+module_init(synproxy_core_init);
+module_exit(synproxy_core_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/5] net: syncookies: export cookie_v6_init_sequence/cookie_v6_check
  2013-08-07 17:42 [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Patrick McHardy
                   ` (2 preceding siblings ...)
  2013-08-07 17:42 ` [PATCH 3/5] netfilter: add SYNPROXY core/target Patrick McHardy
@ 2013-08-07 17:42 ` Patrick McHardy
  2013-08-07 20:27   ` Jesper Dangaard Brouer
  2013-08-07 17:42 ` [PATCH 5/5] netfilter: add IPv6 SYNPROXY target Patrick McHardy
  2013-08-07 18:06 ` [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Eric Dumazet
  5 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 17:42 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, mph, jesper.brouer, as

Extract the local TCP stack independant parts of tcp_v6_init_sequence()
and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/tcp.h     |  4 ++++
 net/ipv6/syncookies.c | 25 ++++++++++++++++---------
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 63256c6..22872c7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -503,8 +503,12 @@ extern bool cookie_check_timestamp(struct tcp_options_received *opt,
 				struct net *net, bool *ecn_ok);
 
 /* From net/ipv6/syncookies.c */
+extern int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th,
+			     u32 cookie);
 extern struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
 #ifdef CONFIG_SYN_COOKIES
+extern u32 __cookie_v6_init_sequence(const struct ipv6hdr *iph,
+				     const struct tcphdr *th, u16 *mssp);
 extern __u32 cookie_v6_init_sequence(struct sock *sk, const struct sk_buff *skb,
 				     __u16 *mss);
 #else
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index d5dda20..bf63ac8 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -112,32 +112,38 @@ static __u32 check_tcp_syn_cookie(__u32 cookie, const struct in6_addr *saddr,
 		& COOKIEMASK;
 }
 
-__u32 cookie_v6_init_sequence(struct sock *sk, const struct sk_buff *skb, __u16 *mssp)
+u32 __cookie_v6_init_sequence(const struct ipv6hdr *iph,
+			      const struct tcphdr *th, __u16 *mssp)
 {
-	const struct ipv6hdr *iph = ipv6_hdr(skb);
-	const struct tcphdr *th = tcp_hdr(skb);
 	int mssind;
 	const __u16 mss = *mssp;
 
-	tcp_synq_overflow(sk);
-
 	for (mssind = ARRAY_SIZE(msstab) - 1; mssind ; mssind--)
 		if (mss >= msstab[mssind])
 			break;
 
 	*mssp = msstab[mssind];
 
-	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
-
 	return secure_tcp_syn_cookie(&iph->saddr, &iph->daddr, th->source,
 				     th->dest, ntohl(th->seq),
 				     jiffies / (HZ * 60), mssind);
 }
+EXPORT_SYMBOL_GPL(__cookie_v6_init_sequence);
 
-static inline int cookie_check(const struct sk_buff *skb, __u32 cookie)
+__u32 cookie_v6_init_sequence(struct sock *sk, const struct sk_buff *skb, __u16 *mssp)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
 	const struct tcphdr *th = tcp_hdr(skb);
+
+	tcp_synq_overflow(sk);
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
+
+	return __cookie_v6_init_sequence(iph, th, mssp);
+}
+
+int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th,
+		      __u32 cookie)
+{
 	__u32 seq = ntohl(th->seq) - 1;
 	__u32 mssind = check_tcp_syn_cookie(cookie, &iph->saddr, &iph->daddr,
 					    th->source, th->dest, seq,
@@ -145,6 +151,7 @@ static inline int cookie_check(const struct sk_buff *skb, __u32 cookie)
 
 	return mssind < ARRAY_SIZE(msstab) ? msstab[mssind] : 0;
 }
+EXPORT_SYMBOL_GPL(__cookie_v6_check);
 
 struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 {
@@ -167,7 +174,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 		goto out;
 
 	if (tcp_synq_no_recent_overflow(sk) ||
-		(mss = cookie_check(skb, cookie)) == 0) {
+		(mss = __cookie_v6_check(ipv6_hdr(skb), th, cookie)) == 0) {
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESFAILED);
 		goto out;
 	}
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/5] netfilter: add IPv6 SYNPROXY target
  2013-08-07 17:42 [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Patrick McHardy
                   ` (3 preceding siblings ...)
  2013-08-07 17:42 ` [PATCH 4/5] net: syncookies: export cookie_v6_init_sequence/cookie_v6_check Patrick McHardy
@ 2013-08-07 17:42 ` Patrick McHardy
  2013-08-07 20:34   ` Jesper Dangaard Brouer
  2013-08-07 18:06 ` [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Eric Dumazet
  5 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 17:42 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, mph, jesper.brouer, as

Add an IPv6 version of the SYNPROXY target. The main differences to the
IPv4 version is routing and IP header construction.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/ipv6/netfilter/Kconfig         |  13 ++
 net/ipv6/netfilter/Makefile        |   1 +
 net/ipv6/netfilter/ip6t_SYNPROXY.c | 451 +++++++++++++++++++++++++++++++++++++
 3 files changed, 465 insertions(+)
 create mode 100644 net/ipv6/netfilter/ip6t_SYNPROXY.c

diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 4433ab40..a7f842b 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -153,6 +153,19 @@ config IP6_NF_TARGET_REJECT
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP6_NF_TARGET_SYNPROXY
+	tristate "SYNPROXY target support"
+	depends on NF_CONNTRACK && NETFILTER_ADVANCED
+	select NETFILTER_SYNPROXY
+	select SYN_COOKIES
+	help
+	  The SYNPROXY target allows you to intercept TCP connections and
+	  establish them using syncookies before they are passed on to the
+	  server. This allows to avoid conntrack and server resource usage
+	  during SYN-flood attacks.
+
+	  To compile it as a module, choose M here. If unsure, say N.
+
 config IP6_NF_MANGLE
 	tristate "Packet mangling"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index 2d11fcc..58f031b 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o
 obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
 obj-$(CONFIG_IP6_NF_TARGET_NPT) += ip6t_NPT.o
 obj-$(CONFIG_IP6_NF_TARGET_REJECT) += ip6t_REJECT.o
+obj-$(CONFIG_IP6_NF_TARGET_SYNPROXY) += ip6t_SYNPROXY.o
diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c b/net/ipv6/netfilter/ip6t_SYNPROXY.c
new file mode 100644
index 0000000..ee773da
--- /dev/null
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -0,0 +1,451 @@
+/*
+ * Copyright (c) 2013 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <net/ip6_checksum.h>
+#include <net/ip6_route.h>
+#include <net/tcp.h>
+
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_SYNPROXY.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_seqadj.h>
+#include <net/netfilter/nf_conntrack_synproxy.h>
+
+static struct ipv6hdr *
+synproxy_build_ip(struct sk_buff *skb, const struct in6_addr *saddr,
+				       const struct in6_addr *daddr)
+{
+	struct ipv6hdr *iph;
+
+	skb_reset_network_header(skb);
+	iph = (struct ipv6hdr *)skb_put(skb, sizeof(*iph));
+	ip6_flow_hdr(iph, 0, 0);
+	iph->hop_limit	= 64;
+	iph->nexthdr	= IPPROTO_TCP;
+	iph->saddr	= *saddr;
+	iph->daddr	= *daddr;
+
+	return iph;
+}
+
+static void
+synproxy_send_tcp(const struct sk_buff *skb, struct sk_buff *nskb,
+		  struct nf_conntrack *nfct, enum ip_conntrack_info ctinfo,
+		  struct ipv6hdr *niph, struct tcphdr *nth,
+		  unsigned int tcp_hdr_size)
+{
+	struct net *net = nf_ct_net((struct nf_conn *)nfct);
+	struct dst_entry *dst;
+	struct flowi6 fl6;
+
+	nth->check = ~tcp_v6_check(tcp_hdr_size, &niph->saddr, &niph->daddr, 0);
+	nskb->ip_summed   = CHECKSUM_PARTIAL;
+	nskb->csum_start  = (unsigned char *)nth - nskb->head;
+	nskb->csum_offset = offsetof(struct tcphdr, check);
+
+	memset(&fl6, 0, sizeof(fl6));
+	fl6.flowi6_proto = IPPROTO_TCP;
+	fl6.saddr = niph->saddr;
+	fl6.daddr = niph->daddr;
+	fl6.fl6_sport = nth->source;
+	fl6.fl6_dport = nth->dest;
+	security_skb_classify_flow((struct sk_buff *)skb, flowi6_to_flowi(&fl6));
+	dst = ip6_route_output(net, NULL, &fl6);
+	if (dst == NULL || dst->error) {
+		dst_release(dst);
+		goto free_nskb;
+	}
+	dst = xfrm_lookup(net, dst, flowi6_to_flowi(&fl6), NULL, 0);
+	if (IS_ERR(dst))
+		goto free_nskb;
+
+	skb_dst_set(nskb, dst);
+
+	nskb->nfct = nfct;
+	nskb->nfctinfo = ctinfo;
+	nf_conntrack_get(nfct);
+
+	ip6_local_out(nskb);
+	return;
+
+free_nskb:
+	kfree_skb(nskb);
+}
+
+static void
+synproxy_send_client_synack(const struct sk_buff *skb, const struct tcphdr *th,
+			    const struct synproxy_options *opts)
+{
+	struct sk_buff *nskb;
+	struct ipv6hdr *iph, *niph;
+	struct tcphdr *nth;
+	unsigned int tcp_hdr_size;
+	u16 mss = opts->mss;
+
+	iph = ipv6_hdr(skb);
+
+	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
+	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
+			 GFP_ATOMIC);
+	if (nskb == NULL)
+		return;
+	skb_reserve(nskb, LL_MAX_HEADER);
+
+	niph = synproxy_build_ip(nskb, &iph->daddr, &iph->saddr);
+
+	skb_reset_transport_header(nskb);
+	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
+	nth->source	= th->dest;
+	nth->dest	= th->source;
+	nth->seq	= htonl(__cookie_v6_init_sequence(iph, th, &mss));
+	nth->ack_seq	= htonl(ntohl(th->seq) + 1);
+	tcp_flag_word(nth) = TCP_FLAG_SYN | TCP_FLAG_ACK;
+	if (opts->options & XT_SYNPROXY_OPT_ECN)
+		tcp_flag_word(nth) |= TCP_FLAG_ECE;
+	nth->doff	= tcp_hdr_size / 4;
+	nth->window	= 0;
+	nth->check	= 0;
+	nth->urg_ptr	= 0;
+
+	synproxy_build_options(nth, opts);
+
+	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
+			  niph, nth, tcp_hdr_size);
+}
+
+static void
+synproxy_send_server_syn(const struct synproxy_net *snet,
+			 const struct sk_buff *skb, const struct tcphdr *th,
+			 const struct synproxy_options *opts)
+{
+	struct sk_buff *nskb;
+	struct ipv6hdr *iph, *niph;
+	struct tcphdr *nth;
+	unsigned int tcp_hdr_size;
+
+	iph = ipv6_hdr(skb);
+
+	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
+	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
+			 GFP_ATOMIC);
+	if (nskb == NULL)
+		return;
+	skb_reserve(nskb, LL_MAX_HEADER);
+
+	niph = synproxy_build_ip(nskb, &iph->saddr, &iph->daddr);
+
+	skb_reset_transport_header(nskb);
+	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
+	nth->source	= th->source;
+	nth->dest	= th->dest;
+	nth->seq	= htonl(ntohl(th->seq) - 1);
+	nth->ack_seq	= htonl(ntohl(th->ack_seq) - 1);;
+	tcp_flag_word(nth) = TCP_FLAG_SYN;
+	if (opts->options & XT_SYNPROXY_OPT_ECN)
+		tcp_flag_word(nth) |= TCP_FLAG_ECE | TCP_FLAG_CWR;
+	nth->doff	= tcp_hdr_size / 4;
+	nth->window	= th->window;
+	nth->check	= 0;
+	nth->urg_ptr	= 0;
+
+	synproxy_build_options(nth, opts);
+
+	synproxy_send_tcp(skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
+			  niph, nth, tcp_hdr_size);
+}
+
+static void
+synproxy_send_server_ack(const struct synproxy_net *snet,
+			 const struct ip_ct_tcp *state,
+			 const struct sk_buff *skb, const struct tcphdr *th,
+			 const struct synproxy_options *opts)
+{
+	struct sk_buff *nskb;
+	struct ipv6hdr *iph, *niph;
+	struct tcphdr *nth;
+	unsigned int tcp_hdr_size;
+
+	iph = ipv6_hdr(skb);
+
+	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
+	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
+			 GFP_ATOMIC);
+	if (nskb == NULL)
+		return;
+	skb_reserve(nskb, LL_MAX_HEADER);
+
+	niph = synproxy_build_ip(nskb, &iph->daddr, &iph->saddr);
+
+	skb_reset_transport_header(nskb);
+	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
+	nth->source	= th->dest;
+	nth->dest	= th->source;
+	nth->seq	= htonl(ntohl(th->ack_seq));
+	nth->ack_seq	= htonl(ntohl(th->seq) + 1);;
+	tcp_flag_word(nth) = TCP_FLAG_ACK;
+	nth->doff	= tcp_hdr_size / 4;
+	nth->window	= htons(state->seen[IP_CT_DIR_ORIGINAL].td_maxwin);
+	nth->check	= 0;
+	nth->urg_ptr	= 0;
+
+	synproxy_build_options(nth, opts);
+
+	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED,
+			  niph, nth, tcp_hdr_size);
+}
+
+static void
+synproxy_send_client_ack(const struct synproxy_net *snet,
+			 const struct sk_buff *skb, const struct tcphdr *th,
+			 const struct synproxy_options *opts)
+{
+	struct sk_buff *nskb;
+	struct ipv6hdr *iph, *niph;
+	struct tcphdr *nth;
+	unsigned int tcp_hdr_size;
+
+	iph = ipv6_hdr(skb);
+
+	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
+	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
+			 GFP_ATOMIC);
+	if (nskb == NULL)
+		return;
+	skb_reserve(nskb, LL_MAX_HEADER);
+
+	niph = synproxy_build_ip(nskb, &iph->saddr, &iph->daddr);
+
+	skb_reset_transport_header(nskb);
+	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
+	nth->source	= th->source;
+	nth->dest	= th->dest;
+	nth->seq	= htonl(ntohl(th->seq) + 1);
+	nth->ack_seq	= th->ack_seq;
+	tcp_flag_word(nth) = TCP_FLAG_ACK;
+	nth->doff	= tcp_hdr_size / 4;
+	nth->window	= th->window;
+	nth->check	= 0;
+	nth->urg_ptr	= 0;
+
+	synproxy_build_options(nth, opts);
+
+	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
+			  niph, nth, tcp_hdr_size);
+}
+
+static unsigned int
+synproxy_tg6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_synproxy_info *info = par->targinfo;
+	struct synproxy_net *snet = synproxy_pernet(dev_net(par->in));
+	struct synproxy_options opts = {};
+	struct tcphdr *th, _th;
+
+	if (nf_ip6_checksum(skb, par->hooknum, par->thoff, IPPROTO_TCP))
+		return NF_DROP;
+
+	th = skb_header_pointer(skb, par->thoff, sizeof(_th), &_th);
+	if (th == NULL)
+		return NF_DROP;
+
+	synproxy_parse_options(skb, par->thoff, th, &opts);
+
+	if (th->syn) {
+		/* Initial SYN from client */
+		this_cpu_inc(snet->stats->syn_received);
+
+		if (th->ece && th->cwr)
+			opts.options |= XT_SYNPROXY_OPT_ECN;
+
+		opts.options &= info->options;
+		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
+			synproxy_init_timestamp_cookie(info, &opts);
+		else
+			opts.options &= ~(XT_SYNPROXY_OPT_WSCALE |
+					  XT_SYNPROXY_OPT_SACK_PERM |
+					  XT_SYNPROXY_OPT_ECN);
+
+		synproxy_send_client_synack(skb, th, &opts);
+	} else if (th->ack && !(th->fin || th->rst)) {
+		/* ACK from client */
+		int mss = __cookie_v6_check(ipv6_hdr(skb), th,
+					    ntohl(th->ack_seq) - 1);
+		if (mss == 0) {
+			this_cpu_inc(snet->stats->cookie_invalid);
+			return NF_DROP;
+		}
+
+		this_cpu_inc(snet->stats->cookie_valid);
+		opts.mss = mss;
+
+		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
+			synproxy_check_timestamp_cookie(&opts);
+
+		synproxy_send_server_syn(snet, skb, th, &opts);
+	}
+
+	return NF_DROP;
+}
+
+static unsigned int ipv6_synproxy_hook(unsigned int hooknum,
+				       struct sk_buff *skb,
+				       const struct net_device *in,
+				       const struct net_device *out,
+				       int (*okfn)(struct sk_buff *))
+{
+	struct synproxy_net *snet = synproxy_pernet(dev_net(in ? : out));
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+	struct nf_conn_synproxy *synproxy;
+	struct synproxy_options opts = {};
+	const struct ip_ct_tcp *state;
+	struct tcphdr *th, _th;
+	__be16 frag_off;
+	u8 nexthdr;
+	int thoff;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct == NULL)
+		return NF_ACCEPT;
+
+	synproxy = nfct_synproxy(ct);
+	if (synproxy == NULL)
+		return NF_ACCEPT;
+
+	if (nf_is_loopback_packet(skb))
+		return NF_ACCEPT;
+
+	nexthdr = ipv6_hdr(skb)->nexthdr;
+	thoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr,
+				 &frag_off);
+	if (thoff < 0)
+		return NF_ACCEPT;
+
+	th = skb_header_pointer(skb, thoff, sizeof(_th), &_th);
+	if (th == NULL)
+		return NF_DROP;
+
+	state = &ct->proto.tcp;
+	switch (state->state) {
+	case TCP_CONNTRACK_SYN_SENT:
+		synproxy_parse_options(skb, thoff, th, &opts);
+
+		synproxy->isn = ntohl(th->ack_seq);
+		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
+			synproxy->tsoff = -opts.tsecr;
+		break;
+	case TCP_CONNTRACK_SYN_RECV:
+		if (!th->syn)
+			break;
+
+		synproxy_parse_options(skb, thoff, th, &opts);
+		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
+			synproxy->tsoff += opts.tsval;
+
+		opts.options &= ~(XT_SYNPROXY_OPT_MSS |
+				  XT_SYNPROXY_OPT_WSCALE |
+				  XT_SYNPROXY_OPT_SACK_PERM);
+
+		swap(opts.tsval, opts.tsecr);
+		synproxy_send_server_ack(snet, state, skb, th, &opts);
+
+		nf_ct_seqadj_init(ct, ctinfo, synproxy->isn - ntohl(th->seq));
+
+		swap(opts.tsval, opts.tsecr);
+		synproxy_send_client_ack(snet, skb, th, &opts);
+
+		kfree_skb(skb);
+		return NF_STOLEN;
+	case TCP_CONNTRACK_CLOSE:
+		if (!th->rst)
+			break;
+
+		nf_ct_seqadj_init(ct, ctinfo, synproxy->isn -
+					      ntohl(th->seq) + 1);
+		break;
+	default:
+		break;
+	}
+
+	synproxy_tstamp_adjust(skb, thoff, th, ct, ctinfo, synproxy);
+	return NF_ACCEPT;
+}
+
+static int synproxy_tg6_check(const struct xt_tgchk_param *par)
+{
+	/// XXX PROTO match TCP
+	return nf_ct_l3proto_try_module_get(par->family);
+}
+
+static void synproxy_tg6_destroy(const struct xt_tgdtor_param *par)
+{
+	nf_ct_l3proto_module_put(par->family);
+}
+
+static struct xt_target synproxy_tg6_reg __read_mostly = {
+	.name		= "SYNPROXY",
+	.family		= NFPROTO_IPV6,
+	.target		= synproxy_tg6,
+	.targetsize	= sizeof(struct xt_synproxy_info),
+	.checkentry	= synproxy_tg6_check,
+	.destroy	= synproxy_tg6_destroy,
+	.me		= THIS_MODULE,
+};
+
+static struct nf_hook_ops ipv6_synproxy_ops[] __read_mostly = {
+	{
+		.hook		= ipv6_synproxy_hook,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV6,
+		.hooknum	= NF_INET_LOCAL_IN,
+		.priority	= NF_IP_PRI_CONNTRACK_CONFIRM - 1,
+	},
+	{
+		.hook		= ipv6_synproxy_hook,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV6,
+		.hooknum	= NF_INET_POST_ROUTING,
+		.priority	= NF_IP_PRI_CONNTRACK_CONFIRM - 1,
+	},
+};
+
+static int __init synproxy_tg6_init(void)
+{
+	int err;
+
+	err = nf_register_hooks(ipv6_synproxy_ops,
+				ARRAY_SIZE(ipv6_synproxy_ops));
+	if (err < 0)
+		goto err1;
+
+	err = xt_register_target(&synproxy_tg6_reg);
+	if (err < 0)
+		goto err2;
+
+	return 0;
+
+err2:
+	nf_unregister_hooks(ipv6_synproxy_ops, ARRAY_SIZE(ipv6_synproxy_ops));
+err1:
+	return err;
+}
+
+static void __exit synproxy_tg6_exit(void)
+{
+	xt_unregister_target(&synproxy_tg6_reg);
+	nf_unregister_hooks(ipv6_synproxy_ops, ARRAY_SIZE(ipv6_synproxy_ops));
+}
+
+module_init(synproxy_tg6_init);
+module_exit(synproxy_tg6_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
  2013-08-07 17:42 [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Patrick McHardy
                   ` (4 preceding siblings ...)
  2013-08-07 17:42 ` [PATCH 5/5] netfilter: add IPv6 SYNPROXY target Patrick McHardy
@ 2013-08-07 18:06 ` Eric Dumazet
  2013-08-07 20:59   ` Patrick McHardy
  5 siblings, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2013-08-07 18:06 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, jesper.brouer, as

On Wed, 2013-08-07 at 19:42 +0200, Patrick McHardy wrote:

> 
> The SYNPROXY operates by marking the initial SYN from the client as UNTRACKED
> and directing it to the SYNPROXY target. The target responds with a SYN/ACK
> containing a cookie and encodes options such as window scaling factor, SACK
> perm etc. into the timestamp, if timestamps are used (similar to TCP). The
> window size is set to zero. The response is also sent as untracked packet.

TCP timestamps are not really used, for various reasons ...

Have you taken a look at 

<http://lists.freebsd.org/pipermail/freebsd-net/2013-July/035999.html>




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/5] netfilter: nf_conntrack: make sequence number adjustments usuable without NAT
  2013-08-07 17:42 ` [PATCH 1/5] netfilter: nf_conntrack: make sequence number adjustments usuable without NAT Patrick McHardy
@ 2013-08-07 20:02   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-08-07 20:02 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, as

On Wed,  7 Aug 2013 19:42:47 +0200
Patrick McHardy <kaber@trash.net> wrote:

> Split out sequence number adjustments from NAT and move them to the
> conntrack core to make them usable for SYN proxying. The sequence
> number adjustment information is moved to a seperate extend. The
> extend is added to new conntracks when a NAT mapping is set up for a
> connection using a helper.
> 
> As a side effect, this saves 24 bytes per connection with NAT in the
> common case that a connection does not have a helper assigned.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/5] net: syncookies: export cookie_v4_init_sequence/cookie_v4_check
  2013-08-07 17:42 ` [PATCH 2/5] net: syncookies: export cookie_v4_init_sequence/cookie_v4_check Patrick McHardy
@ 2013-08-07 20:03   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-08-07 20:03 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, as

On Wed,  7 Aug 2013 19:42:48 +0200
Patrick McHardy <kaber@trash.net> wrote:

> Extract the local TCP stack independant parts of
> tcp_v4_init_sequence() and cookie_v4_check() and export them for use
> by the upcoming SYNPROXY target.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-07 17:42 ` [PATCH 3/5] netfilter: add SYNPROXY core/target Patrick McHardy
@ 2013-08-07 20:26   ` Jesper Dangaard Brouer
  2013-08-07 20:56     ` Patrick McHardy
  2013-08-07 22:11   ` Eric Dumazet
  1 sibling, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-08-07 20:26 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, as

On Wed,  7 Aug 2013 19:42:49 +0200
Patrick McHardy <kaber@trash.net> wrote:

> Add a SYNPROXY for netfilter. The code is split into two parts, the
> synproxy core with common functions and an address family specific
> target.
> 
> The SYNPROXY receives the connection request from the client,
> responds with a SYN/ACK containing a SYN cookie and announcing a zero
> window and checks whether the final ACK from the client contains a
> valid cookie.
> 
> It then establishes a connection to the original destination and, if
> successful, sends a window update to the client with the window size
> announced by the server.
> 
> Support for timestamps, SACK, window scaling and MSS options can be
> statically configured as target parameters if the features of the
> server are known. If timestamps are used, the timestamp value sent
> back to the client in the SYN/ACK will be different from the real
> timestamp of the server. In order to now break PAWS, the timestamps
> are translated in the direction server->client.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>
> ---

[...]

> +static void
> +synproxy_send_server_syn(const struct synproxy_net *snet,
> +			 const struct sk_buff *skb, const struct
> tcphdr *th,
> +			 const struct synproxy_options *opts)
> +{
> +	struct sk_buff *nskb;
> +	struct iphdr *iph, *niph;
> +	struct tcphdr *nth;
> +	unsigned int tcp_hdr_size;
> +
> +	iph = ip_hdr(skb);
> +
> +	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
> +	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
> +			 GFP_ATOMIC);
> +	if (nskb == NULL)
> +		return;
> +	skb_reserve(nskb, LL_MAX_HEADER);
> +
> +	niph = synproxy_build_ip(nskb, iph->saddr, iph->daddr);
> +
> +	skb_reset_transport_header(nskb);
> +	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
> +	nth->source	= th->source;
> +	nth->dest	= th->dest;
> +	nth->seq	= htonl(ntohl(th->seq) - 1);
> +	nth->ack_seq	= htonl(ntohl(th->ack_seq) - 1);;

Strange double ";;"

Besides shouldn't nth->ack_seq be zero, in a SYN packet? This is the
SYN "replayed" towards the server right?

I also pointed to this in an earlier patch Martin showed me, but he
reported that changing this resulted in bad behavior.  So, I would
request Martin to re-test this part.


> +	tcp_flag_word(nth) = TCP_FLAG_SYN;
> +	if (opts->options & XT_SYNPROXY_OPT_ECN)
> +		tcp_flag_word(nth) |= TCP_FLAG_ECE | TCP_FLAG_CWR;
> +	nth->doff	= tcp_hdr_size / 4;
> +	nth->window	= th->window;
> +	nth->check	= 0;
> +	nth->urg_ptr	= 0;
> +
> +	synproxy_build_options(nth, opts);
> +
> +	synproxy_send_tcp(skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
> +			  niph, nth, tcp_hdr_size);
> +}

[...]

> +static unsigned int
> +synproxy_tg4(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> +	const struct xt_synproxy_info *info = par->targinfo;
> +	struct synproxy_net *snet = synproxy_pernet(dev_net(par->in));
> +	struct synproxy_options opts = {};
> +	struct tcphdr *th, _th;
> +
> +	if (nf_ip_checksum(skb, par->hooknum, par->thoff, IPPROTO_TCP))
> +		return NF_DROP;
> +
> +	th = skb_header_pointer(skb, par->thoff, sizeof(_th), &_th);
> +	if (th == NULL)
> +		return NF_DROP;
> +
> +	synproxy_parse_options(skb, par->thoff, th, &opts);
> +
> +	if (th->syn) {
> +		/* Initial SYN from client */
> +		this_cpu_inc(snet->stats->syn_received);
> +
> +		if (th->ece && th->cwr)
> +			opts.options |= XT_SYNPROXY_OPT_ECN;
> +
> +		opts.options &= info->options;
> +		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
> +			synproxy_init_timestamp_cookie(info, &opts);
> +		else
> +			opts.options &= ~(XT_SYNPROXY_OPT_WSCALE |
> +					  XT_SYNPROXY_OPT_SACK_PERM |
> +					  XT_SYNPROXY_OPT_ECN);
> +
> +		synproxy_send_client_synack(skb, th, &opts);
> +	} else if (th->ack && !(th->fin || th->rst)) {

This could also match SYN+ACK... we are only interested in the ACK
(from the 3WHS) here, right?

> +		/* ACK from client */
> +		int mss = __cookie_v4_check(ip_hdr(skb), th,
> +					    ntohl(th->ack_seq) - 1);
> +		if (mss == 0) {
> +			this_cpu_inc(snet->stats->cookie_invalid);
> +			return NF_DROP;
> +		}
> +
> +		this_cpu_inc(snet->stats->cookie_valid);
> +		opts.mss = mss;
> +
> +		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
> +			synproxy_check_timestamp_cookie(&opts);
> +
> +		synproxy_send_server_syn(snet, skb, th, &opts);
> +	}
> +
> +	return NF_DROP;
> +}

[...]

> diff --git a/net/netfilter/nf_synproxy_core.c
> b/net/netfilter/nf_synproxy_core.c new file mode 100644
> index 0000000..d887a84
> --- /dev/null
> +++ b/net/netfilter/nf_synproxy_core.c
> @@ -0,0 +1,428 @@
> +/*
> + * Copyright (c) 2013 Patrick McHardy <kaber@trash.net>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
[...]
> +
> +void
> +synproxy_parse_options(const struct sk_buff *skb, unsigned int doff,
> +		       const struct tcphdr *th, struct synproxy_options *opts)
> +{
> +	int length = (th->doff * 4) - sizeof(*th);
> +	u8 buf[40], *ptr;
> +
> +	ptr = skb_header_pointer(skb, doff + sizeof(*th), length, buf);
> +	BUG_ON(ptr == NULL);
> +
> +	opts->options = 0;
> +	while (length > 0) {
> +		int opcode = *ptr++;
> +		int opsize;
> +
> +		switch (opcode) {
> +		case TCPOPT_EOL:
> +			return;
> +		case TCPOPT_NOP:
> +			length--;
> +			continue;
> +		default:
> +			opsize = *ptr++;
> +			if (opsize < 2)
> +				return;
> +			if (opsize > length)
> +				return;
> +
> +			switch (opcode) {
> +			case TCPOPT_MSS:
> +				if (opsize == TCPOLEN_MSS) {
> +					 opts->mss = get_unaligned_be16(ptr);

Strange indention, extra space before "opts->mss".

> +					 opts->options |= XT_SYNPROXY_OPT_MSS;

Strange indention, extra space before "opts->options".


> +				}
> +				break;
> +			case TCPOPT_WINDOW:
> +				if (opsize == TCPOLEN_WINDOW) {
> +					opts->wscale = *ptr;
> +					if (opts->wscale > 14)
> +						opts->wscale = 14;
> +					opts->options |= XT_SYNPROXY_OPT_WSCALE;
> +				}
> +				break;
> +			case TCPOPT_TIMESTAMP:
> +				if (opsize == TCPOLEN_TIMESTAMP) {
> +					opts->tsval = get_unaligned_be32(ptr);
> +					opts->tsecr = get_unaligned_be32(ptr + 4);
> +					opts->options |= XT_SYNPROXY_OPT_TIMESTAMP;
> +				}
> +				break;
> +			case TCPOPT_SACK_PERM:
> +				if (opsize == TCPOLEN_SACK_PERM)
> +					opts->options |= XT_SYNPROXY_OPT_SACK_PERM;
> +				break;
> +			}
> +
> +			ptr += opsize - 2;
> +			length -= opsize;
> +		}
> +	}
> +}
> +EXPORT_SYMBOL_GPL(synproxy_parse_options);

[...]

> +static int synproxy_cpu_seq_show(struct seq_file *seq, void *v)
> +{
> +	struct synproxy_stats *stats = v;
> +
> +	if (v == SEQ_START_TOKEN) {
> +		seq_printf(seq, "syn_received\tcookie_invalid\tcookie_valid\n");
> +		return 0;
> +	}
> +
> +	seq_printf(seq, "%08u\t%08x\t%08x\n",

Shouldn't all numbers be printed in hex? (%08u -> %08x)

Besides when using net->proc_net_stat, then the first entry is usually
"entries" which is not percpu, this will likely confusing the tool:
  lnstat -f synproxy -c 42


> +		   stats->syn_received,
> +		   stats->cookie_invalid,
> +		   stats->cookie_valid);
> +
> +	return 0;
> +}

[...]


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/5] net: syncookies: export cookie_v6_init_sequence/cookie_v6_check
  2013-08-07 17:42 ` [PATCH 4/5] net: syncookies: export cookie_v6_init_sequence/cookie_v6_check Patrick McHardy
@ 2013-08-07 20:27   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-08-07 20:27 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, jesper.brouer, as

On Wed,  7 Aug 2013 19:42:50 +0200
Patrick McHardy <kaber@trash.net> wrote:

> Extract the local TCP stack independant parts of
> tcp_v6_init_sequence() and cookie_v6_check() and export them for use
> by the upcoming IPv6 SYNPROXY target.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/5] netfilter: add IPv6 SYNPROXY target
  2013-08-07 17:42 ` [PATCH 5/5] netfilter: add IPv6 SYNPROXY target Patrick McHardy
@ 2013-08-07 20:34   ` Jesper Dangaard Brouer
  2013-08-07 20:57     ` Patrick McHardy
  0 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-08-07 20:34 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, as


On Wed,  7 Aug 2013 19:42:51 +0200 Patrick McHardy <kaber@trash.net> wrote:

> Add an IPv6 version of the SYNPROXY target. The main differences to
> the IPv4 version is routing and IP header construction.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

[...]

> diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c
> b/net/ipv6/netfilter/ip6t_SYNPROXY.c new file mode 100644
> index 0000000..ee773da
> --- /dev/null
> +++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
[...]

> +static void
> +synproxy_send_server_syn(const struct synproxy_net *snet,
> +			 const struct sk_buff *skb, const struct tcphdr *th,
> +			 const struct synproxy_options *opts)
> +{
> +	struct sk_buff *nskb;
> +	struct ipv6hdr *iph, *niph;
> +	struct tcphdr *nth;
> +	unsigned int tcp_hdr_size;
> +
> +	iph = ipv6_hdr(skb);
> +
> +	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
> +	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
> +			 GFP_ATOMIC);
> +	if (nskb == NULL)
> +		return;
> +	skb_reserve(nskb, LL_MAX_HEADER);
> +
> +	niph = synproxy_build_ip(nskb, &iph->saddr, &iph->daddr);
> +
> +	skb_reset_transport_header(nskb);
> +	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
> +	nth->source	= th->source;
> +	nth->dest	= th->dest;
> +	nth->seq	= htonl(ntohl(th->seq) - 1);
> +	nth->ack_seq	= htonl(ntohl(th->ack_seq) - 1);;

Strange double ";;".

And as IPv4, shouldn't this be zero? I might be wrong...


> +	tcp_flag_word(nth) = TCP_FLAG_SYN;
> +	if (opts->options & XT_SYNPROXY_OPT_ECN)
> +		tcp_flag_word(nth) |= TCP_FLAG_ECE | TCP_FLAG_CWR;
> +	nth->doff	= tcp_hdr_size / 4;
> +	nth->window	= th->window;
> +	nth->check	= 0;
> +	nth->urg_ptr	= 0;
> +
> +	synproxy_build_options(nth, opts);
> +
> +	synproxy_send_tcp(skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW,
> +			  niph, nth, tcp_hdr_size);
> +}
> +
> +static void
> +synproxy_send_server_ack(const struct synproxy_net *snet,
> +			 const struct ip_ct_tcp *state,
> +			 const struct sk_buff *skb, const struct tcphdr *th,
> +			 const struct synproxy_options *opts)
> +{
> +	struct sk_buff *nskb;
> +	struct ipv6hdr *iph, *niph;
> +	struct tcphdr *nth;
> +	unsigned int tcp_hdr_size;
> +
> +	iph = ipv6_hdr(skb);
> +
> +	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
> +	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
> +			 GFP_ATOMIC);
> +	if (nskb == NULL)
> +		return;
> +	skb_reserve(nskb, LL_MAX_HEADER);
> +
> +	niph = synproxy_build_ip(nskb, &iph->daddr, &iph->saddr);
> +
> +	skb_reset_transport_header(nskb);
> +	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
> +	nth->source	= th->dest;
> +	nth->dest	= th->source;
> +	nth->seq	= htonl(ntohl(th->ack_seq));
> +	nth->ack_seq	= htonl(ntohl(th->seq) + 1);;

Strange double ";;"


> +	tcp_flag_word(nth) = TCP_FLAG_ACK;
> +	nth->doff	= tcp_hdr_size / 4;
> +	nth->window	=
> htons(state->seen[IP_CT_DIR_ORIGINAL].td_maxwin);
> +	nth->check	= 0;
> +	nth->urg_ptr	= 0;
> +
> +	synproxy_build_options(nth, opts);
> +
> +	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED,
> +			  niph, nth, tcp_hdr_size);
> +}
> +


> +static int synproxy_tg6_check(const struct xt_tgchk_param *par)
> +{
> +	/// XXX PROTO match TCP

Ups, this looks like an comment to your self ;-)

> +	return nf_ct_l3proto_try_module_get(par->family);
> +}



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-07 20:26   ` Jesper Dangaard Brouer
@ 2013-08-07 20:56     ` Patrick McHardy
  2013-08-08  6:22       ` Patrick McHardy
  2013-08-08  8:04       ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 20:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: pablo, netfilter-devel, netdev, mph, as

On Wed, Aug 07, 2013 at 10:26:00PM +0200, Jesper Dangaard Brouer wrote:
> On Wed,  7 Aug 2013 19:42:49 +0200
> Patrick McHardy <kaber@trash.net> wrote:
> 
> > +	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
> > +	nth->source	= th->source;
> > +	nth->dest	= th->dest;
> > +	nth->seq	= htonl(ntohl(th->seq) - 1);
> > +	nth->ack_seq	= htonl(ntohl(th->ack_seq) - 1);;
> 
> Strange double ";;"

Thanks, fixed.

> Besides shouldn't nth->ack_seq be zero, in a SYN packet? This is the
> SYN "replayed" towards the server right?
> 
> I also pointed to this in an earlier patch Martin showed me, but he
> reported that changing this resulted in bad behavior.  So, I would
> request Martin to re-test this part.

Right, it should be zero, but it doesn't matter since the ACK flag isn't
set. This is used to propagate the sequence number to the hook function
to initialize the sequence adjustment data. While in the target function,
we don't have any connection tracking state to store this in. We could
set it to zero after that, but it shouldn't matter.

> > +static unsigned int
> > +synproxy_tg4(struct sk_buff *skb, const struct xt_action_param *par)
> > +{
> > +	const struct xt_synproxy_info *info = par->targinfo;
> > +	struct synproxy_net *snet = synproxy_pernet(dev_net(par->in));
> > +	struct synproxy_options opts = {};
> > +	struct tcphdr *th, _th;
> > +
> > +	if (nf_ip_checksum(skb, par->hooknum, par->thoff, IPPROTO_TCP))
> > +		return NF_DROP;
> > +
> > +	th = skb_header_pointer(skb, par->thoff, sizeof(_th), &_th);
> > +	if (th == NULL)
> > +		return NF_DROP;
> > +
> > +	synproxy_parse_options(skb, par->thoff, th, &opts);
> > +
> > +	if (th->syn) {
> > +		/* Initial SYN from client */
> > +		this_cpu_inc(snet->stats->syn_received);
> > +
> > +		if (th->ece && th->cwr)
> > +			opts.options |= XT_SYNPROXY_OPT_ECN;
> > +
> > +		opts.options &= info->options;
> > +		if (opts.options & XT_SYNPROXY_OPT_TIMESTAMP)
> > +			synproxy_init_timestamp_cookie(info, &opts);
> > +		else
> > +			opts.options &= ~(XT_SYNPROXY_OPT_WSCALE |
> > +					  XT_SYNPROXY_OPT_SACK_PERM |
> > +					  XT_SYNPROXY_OPT_ECN);
> > +
> > +		synproxy_send_client_synack(skb, th, &opts);
> > +	} else if (th->ack && !(th->fin || th->rst)) {
> 
> This could also match SYN+ACK... we are only interested in the ACK
> (from the 3WHS) here, right?

Right, we shouldn't see a SYN/ACK here, but I'll add an explicit check.

> > +			switch (opcode) {
> > +			case TCPOPT_MSS:
> > +				if (opsize == TCPOLEN_MSS) {
> > +					 opts->mss = get_unaligned_be16(ptr);
> 
> Strange indention, extra space before "opts->mss".
> 
> > +					 opts->options |= XT_SYNPROXY_OPT_MSS;
> 
> Strange indention, extra space before "opts->options".

Thanks, fixed.

> > +static int synproxy_cpu_seq_show(struct seq_file *seq, void *v)
> > +{
> > +	struct synproxy_stats *stats = v;
> > +
> > +	if (v == SEQ_START_TOKEN) {
> > +		seq_printf(seq, "syn_received\tcookie_invalid\tcookie_valid\n");
> > +		return 0;
> > +	}
> > +
> > +	seq_printf(seq, "%08u\t%08x\t%08x\n",
> 
> Shouldn't all numbers be printed in hex? (%08u -> %08x)

Right, fixed.

> Besides when using net->proc_net_stat, then the first entry is usually
> "entries" which is not percpu, this will likely confusing the tool:
>   lnstat -f synproxy -c 42

I'll look into that.

Thanks Jesper.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/5] netfilter: add IPv6 SYNPROXY target
  2013-08-07 20:34   ` Jesper Dangaard Brouer
@ 2013-08-07 20:57     ` Patrick McHardy
  0 siblings, 0 replies; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 20:57 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: pablo, netfilter-devel, netdev, mph, as

On Wed, Aug 07, 2013 at 10:34:40PM +0200, Jesper Dangaard Brouer wrote:
> 
> On Wed,  7 Aug 2013 19:42:51 +0200 Patrick McHardy <kaber@trash.net> wrote:
> 
> > Add an IPv6 version of the SYNPROXY target. The main differences to
> > the IPv4 version is routing and IP header construction.
> > 
> > Signed-off-by: Patrick McHardy <kaber@trash.net>
> 
> > +static int synproxy_tg6_check(const struct xt_tgchk_param *par)
> > +{
> > +	/// XXX PROTO match TCP
> 
> Ups, this looks like an comment to your self ;-)
> 
> > +	return nf_ct_l3proto_try_module_get(par->family);
> > +}

Oops right, I intended to add a check for proto TCP match in the rule
to make sure thoff is initialized by ip6tables.

I'll wait for more comments before sending an updated series.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
  2013-08-07 18:06 ` [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Eric Dumazet
@ 2013-08-07 20:59   ` Patrick McHardy
  2013-08-07 21:05     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 20:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: pablo, netfilter-devel, netdev, mph, jesper.brouer, as

On Wed, Aug 07, 2013 at 11:06:06AM -0700, Eric Dumazet wrote:
> On Wed, 2013-08-07 at 19:42 +0200, Patrick McHardy wrote:
> 
> > 
> > The SYNPROXY operates by marking the initial SYN from the client as UNTRACKED
> > and directing it to the SYNPROXY target. The target responds with a SYN/ACK
> > containing a cookie and encodes options such as window scaling factor, SACK
> > perm etc. into the timestamp, if timestamps are used (similar to TCP). The
> > window size is set to zero. The response is also sent as untracked packet.
> 
> TCP timestamps are not really used, for various reasons ...
> 
> Have you taken a look at 
> 
> <http://lists.freebsd.org/pipermail/freebsd-net/2013-July/035999.html>

No, not yet, will have a look. Not sure what you mean by "TCP timestamps
are not really used" though. I might be biased by usually only looking at
Linux traffic, but I was under that impression that everyone is using
TCP timestamps nowadays?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
  2013-08-07 20:59   ` Patrick McHardy
@ 2013-08-07 21:05     ` Hannes Frederic Sowa
  2013-08-07 21:24       ` Patrick McHardy
  2013-08-07 23:40       ` David Miller
  0 siblings, 2 replies; 29+ messages in thread
From: Hannes Frederic Sowa @ 2013-08-07 21:05 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Eric Dumazet, pablo, netfilter-devel, netdev, mph, jesper.brouer, as

On Wed, Aug 07, 2013 at 10:59:59PM +0200, Patrick McHardy wrote:
> On Wed, Aug 07, 2013 at 11:06:06AM -0700, Eric Dumazet wrote:
> > TCP timestamps are not really used, for various reasons ...
> > 
> > Have you taken a look at 
> > 
> > <http://lists.freebsd.org/pipermail/freebsd-net/2013-July/035999.html>
> 
> No, not yet, will have a look. Not sure what you mean by "TCP timestamps
> are not really used" though. I might be biased by usually only looking at
> Linux traffic, but I was under that impression that everyone is using
> TCP timestamps nowadays?

We had a thread here on netdev:
<http://thread.gmane.org/gmane.linux.network/275681/>

It seems, Windows stopped using tcp timestamps at least in windows 8 by
default.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
  2013-08-07 21:05     ` Hannes Frederic Sowa
@ 2013-08-07 21:24       ` Patrick McHardy
  2013-08-07 21:39         ` Eric Dumazet
  2013-08-07 23:40       ` David Miller
  1 sibling, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 21:24 UTC (permalink / raw)
  To: Eric Dumazet, pablo, netfilter-devel, netdev, mph, jesper.brouer, as

On Wed, Aug 07, 2013 at 11:05:40PM +0200, Hannes Frederic Sowa wrote:
> On Wed, Aug 07, 2013 at 10:59:59PM +0200, Patrick McHardy wrote:
> > On Wed, Aug 07, 2013 at 11:06:06AM -0700, Eric Dumazet wrote:
> > > TCP timestamps are not really used, for various reasons ...
> > > 
> > > Have you taken a look at 
> > > 
> > > <http://lists.freebsd.org/pipermail/freebsd-net/2013-July/035999.html>
> > 
> > No, not yet, will have a look. Not sure what you mean by "TCP timestamps
> > are not really used" though. I might be biased by usually only looking at
> > Linux traffic, but I was under that impression that everyone is using
> > TCP timestamps nowadays?
> 
> We had a thread here on netdev:
> <http://thread.gmane.org/gmane.linux.network/275681/>
> 
> It seems, Windows stopped using tcp timestamps at least in windows 8 by
> default.

I see. Well, that seems to be a general problem with SYN cookies, I guess
in that case the encoding Linux uses should be changed. I'll have a closer
look at the changes proposed in that thread tommorrow.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
  2013-08-07 21:24       ` Patrick McHardy
@ 2013-08-07 21:39         ` Eric Dumazet
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Dumazet @ 2013-08-07 21:39 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, jesper.brouer, as

On Wed, 2013-08-07 at 23:24 +0200, Patrick McHardy wrote:

> I see. Well, that seems to be a general problem with SYN cookies, I guess
> in that case the encoding Linux uses should be changed. I'll have a closer
> look at the changes proposed in that thread tommorrow.

I did a quick check on a host, and it turns out that 111664 SYN had TS
option (total of 146123 SYN messages received)

So maybe its not a big issue. We probably need a poll ;)



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-07 17:42 ` [PATCH 3/5] netfilter: add SYNPROXY core/target Patrick McHardy
  2013-08-07 20:26   ` Jesper Dangaard Brouer
@ 2013-08-07 22:11   ` Eric Dumazet
  2013-08-07 23:37     ` Patrick McHardy
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2013-08-07 22:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, jesper.brouer, as

On Wed, 2013-08-07 at 19:42 +0200, Patrick McHardy wrote:
> Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
> core with common functions and an address family specific target.
> 
> The SYNPROXY receives the connection request from the client, responds with
> a SYN/ACK containing a SYN cookie and announcing a zero window and checks
> whether the final ACK from the client contains a valid cookie.
> 
> It then establishes a connection to the original destination and, if
> successful, sends a window update to the client with the window size
> announced by the server.
> 
> Support for timestamps, SACK, window scaling and MSS options can be
> statically configured as target parameters if the features of the server
> are known. If timestamps are used, the timestamp value sent back to
> the client in the SYN/ACK will be different from the real timestamp of
> the server. In order to now break PAWS, the timestamps are translated in
> the direction server->client.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>


> +static struct iphdr *
> +synproxy_build_ip(struct sk_buff *skb, u32 saddr, u32 daddr)
> +{
> +	struct iphdr *iph;
> +
> +	skb_reset_network_header(skb);
> +	iph = (struct iphdr *)skb_put(skb, sizeof(*iph));
> +	iph->version	= 4;
> +	iph->ihl	= sizeof(*iph) / 4;
> +	iph->tos	= 0;
> +	iph->id		= 0;
> +	iph->frag_off	= htons(IP_DF);
> +	iph->ttl	= 64;

sysctl_ip_default_ttl ?

> +	iph->protocol	= IPPROTO_TCP;
> +	iph->check	= 0;
> +	iph->saddr	= saddr;
> +	iph->daddr	= daddr;
> +
> +	return iph;
> +}
> +


> +static void
> +synproxy_send_client_synack(const struct sk_buff *skb, const struct tcphdr *th,
> +			    const struct synproxy_options *opts)
> +{
> +	struct sk_buff *nskb;
> +	struct iphdr *iph, *niph;
> +	struct tcphdr *nth;
> +	unsigned int tcp_hdr_size;
> +	u16 mss = opts->mss;
> +
> +	iph = ip_hdr(skb);
> +
> +	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
> +	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
> +			 GFP_ATOMIC);
> +	if (nskb == NULL)
> +		return;
> +	skb_reserve(nskb, LL_MAX_HEADER);

s/LL_MAX_HEADER/MAX_TCP_HEADER ? 

> +
> +	niph = synproxy_build_ip(nskb, iph->daddr, iph->saddr);
> +
> +	skb_reset_transport_header(nskb);
> +	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
> +	nth->source	= th->dest;
> +	nth->dest	= th->source;
> +	nth->seq	= htonl(__cookie_v4_init_sequence(iph, th, &mss));
> +	nth->ack_seq	= htonl(ntohl(th->seq) + 1);
> +	tcp_flag_word(nth) = TCP_FLAG_SYN | TCP_FLAG_ACK;
> +	if (opts->options & XT_SYNPROXY_OPT_ECN)
> +		tcp_flag_word(nth) |= TCP_FLAG_ECE;
> +	nth->doff	= tcp_hdr_size / 4;
> +	nth->window	= 0;
> +	nth->check	= 0;
> +	nth->urg_ptr	= 0;
> +
> +	synproxy_build_options(nth, opts);
> +
> +	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
> +			  niph, nth, tcp_hdr_size);
> +}

Also please check your uses of kfree_skb() .

Some of them would better be consume_skb() (for example in
ipv4_synproxy_hook())

I wonder if this code could be generic for IPv4/IPv6, instead of
duplicating in IPv6




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-07 22:11   ` Eric Dumazet
@ 2013-08-07 23:37     ` Patrick McHardy
  2013-08-08  6:34       ` Patrick McHardy
  0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2013-08-07 23:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: pablo, netfilter-devel, netdev, mph, jesper.brouer, as

On Wed, Aug 07, 2013 at 03:11:54PM -0700, Eric Dumazet wrote:
> On Wed, 2013-08-07 at 19:42 +0200, Patrick McHardy wrote:
> > Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
> > core with common functions and an address family specific target.
> > 
> > The SYNPROXY receives the connection request from the client, responds with
> > a SYN/ACK containing a SYN cookie and announcing a zero window and checks
> > whether the final ACK from the client contains a valid cookie.
> > 
> > It then establishes a connection to the original destination and, if
> > successful, sends a window update to the client with the window size
> > announced by the server.
> > 
> > Support for timestamps, SACK, window scaling and MSS options can be
> > statically configured as target parameters if the features of the server
> > are known. If timestamps are used, the timestamp value sent back to
> > the client in the SYN/ACK will be different from the real timestamp of
> > the server. In order to now break PAWS, the timestamps are translated in
> > the direction server->client.
> > 
> > Signed-off-by: Patrick McHardy <kaber@trash.net>
> 
> 
> > +static struct iphdr *
> > +synproxy_build_ip(struct sk_buff *skb, u32 saddr, u32 daddr)
> > +{
> > +	struct iphdr *iph;
> > +
> > +	skb_reset_network_header(skb);
> > +	iph = (struct iphdr *)skb_put(skb, sizeof(*iph));
> > +	iph->version	= 4;
> > +	iph->ihl	= sizeof(*iph) / 4;
> > +	iph->tos	= 0;
> > +	iph->id		= 0;
> > +	iph->frag_off	= htons(IP_DF);
> > +	iph->ttl	= 64;
> 
> sysctl_ip_default_ttl ?

Will do, thanks.

> > +static void
> > +synproxy_send_client_synack(const struct sk_buff *skb, const struct tcphdr *th,
> > +			    const struct synproxy_options *opts)
> > +{
> > +	struct sk_buff *nskb;
> > +	struct iphdr *iph, *niph;
> > +	struct tcphdr *nth;
> > +	unsigned int tcp_hdr_size;
> > +	u16 mss = opts->mss;
> > +
> > +	iph = ip_hdr(skb);
> > +
> > +	tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts);
> > +	nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + LL_MAX_HEADER,
> > +			 GFP_ATOMIC);
> > +	if (nskb == NULL)
> > +		return;
> > +	skb_reserve(nskb, LL_MAX_HEADER);
> 
> s/LL_MAX_HEADER/MAX_TCP_HEADER ? 

ACK.

> > +	niph = synproxy_build_ip(nskb, iph->daddr, iph->saddr);
> > +
> > +	skb_reset_transport_header(nskb);
> > +	nth = (struct tcphdr *)skb_put(nskb, tcp_hdr_size);
> > +	nth->source	= th->dest;
> > +	nth->dest	= th->source;
> > +	nth->seq	= htonl(__cookie_v4_init_sequence(iph, th, &mss));
> > +	nth->ack_seq	= htonl(ntohl(th->seq) + 1);
> > +	tcp_flag_word(nth) = TCP_FLAG_SYN | TCP_FLAG_ACK;
> > +	if (opts->options & XT_SYNPROXY_OPT_ECN)
> > +		tcp_flag_word(nth) |= TCP_FLAG_ECE;
> > +	nth->doff	= tcp_hdr_size / 4;
> > +	nth->window	= 0;
> > +	nth->check	= 0;
> > +	nth->urg_ptr	= 0;
> > +
> > +	synproxy_build_options(nth, opts);
> > +
> > +	synproxy_send_tcp(skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
> > +			  niph, nth, tcp_hdr_size);
> > +}
> 
> Also please check your uses of kfree_skb() .
> 
> Some of them would better be consume_skb() (for example in
> ipv4_synproxy_hook())

I'll look into that.

> I wonder if this code could be generic for IPv4/IPv6, instead of
> duplicating in IPv6

I considered that, in fact I started under that assumption. But it would
result in lots of functions taking 10+ (long) arguments, so in my opinion
the code gets less readable. Alternative would be lots of callbacks.

This seemed like the cleanest way so far, but I'd certainly welcome
concrete proposals for removing duplicated code.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
  2013-08-07 21:05     ` Hannes Frederic Sowa
  2013-08-07 21:24       ` Patrick McHardy
@ 2013-08-07 23:40       ` David Miller
  2013-08-08  0:04         ` Hannes Frederic Sowa
  1 sibling, 1 reply; 29+ messages in thread
From: David Miller @ 2013-08-07 23:40 UTC (permalink / raw)
  To: hannes
  Cc: kaber, eric.dumazet, pablo, netfilter-devel, netdev, mph,
	jesper.brouer, as

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 7 Aug 2013 23:05:40 +0200

> It seems, Windows stopped using tcp timestamps at least in windows 8 by
> default.

Thankfully, Android device outnumber Windows 8 installs
by... something like 1,000 to 1, right?

I throw a huge "doesn't matter" to whatever Windows's TCP stack
decides to do.  It absolutely should not dictate whether we decide to
make use of this or that feature of TCP.  It's a bit player at best.

So if Windows 8 is the reason you're saying we shouldn't use
timestamps for anything, you're wrong.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
  2013-08-07 23:40       ` David Miller
@ 2013-08-08  0:04         ` Hannes Frederic Sowa
  2013-08-08  0:13           ` Patrick McHardy
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Frederic Sowa @ 2013-08-08  0:04 UTC (permalink / raw)
  To: David Miller
  Cc: kaber, eric.dumazet, pablo, netfilter-devel, netdev, mph,
	jesper.brouer, as

On Wed, Aug 07, 2013 at 04:40:56PM -0700, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Wed, 7 Aug 2013 23:05:40 +0200
> 
> > It seems, Windows stopped using tcp timestamps at least in windows 8 by
> > default.
> 
> Thankfully, Android device outnumber Windows 8 installs
> by... something like 1,000 to 1, right?

Heh, at minimum. :)

> I throw a huge "doesn't matter" to whatever Windows's TCP stack
> decides to do.  It absolutely should not dictate whether we decide to
> make use of this or that feature of TCP.  It's a bit player at best.
> 
> So if Windows 8 is the reason you're saying we shouldn't use
> timestamps for anything, you're wrong.

Actually, I don't care at all, because I don't do anything with windows
and don't get paid by anyone who wants me to care. ;)

But if we switch to a similar scheme as freebsd we can even care
less because even if some other operating systems or a major provider
decides to disable timestamps on their devices, we would still have
window scaling, sack (and ecn?) under syn dos. So, I do think it is an
improvement and don't see any disadvantages.

So, I don't care as long as the change (and siphash or maybe another
hashing scheme) is secure enough...

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
  2013-08-08  0:04         ` Hannes Frederic Sowa
@ 2013-08-08  0:13           ` Patrick McHardy
  2013-08-09 13:55             ` Neal Cardwell
  0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2013-08-08  0:13 UTC (permalink / raw)
  To: David Miller, eric.dumazet, pablo, netfilter-devel, netdev, mph,
	jesper.brouer, as

On Thu, Aug 08, 2013 at 02:04:12AM +0200, Hannes Frederic Sowa wrote:
> On Wed, Aug 07, 2013 at 04:40:56PM -0700, David Miller wrote:
> > From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Date: Wed, 7 Aug 2013 23:05:40 +0200
> > 
> > > It seems, Windows stopped using tcp timestamps at least in windows 8 by
> > > default.
> > 
> > Thankfully, Android device outnumber Windows 8 installs
> > by... something like 1,000 to 1, right?
> 
> Heh, at minimum. :)
> 
> > I throw a huge "doesn't matter" to whatever Windows's TCP stack
> > decides to do.  It absolutely should not dictate whether we decide to
> > make use of this or that feature of TCP.  It's a bit player at best.
> > 
> > So if Windows 8 is the reason you're saying we shouldn't use
> > timestamps for anything, you're wrong.
> 
> Actually, I don't care at all, because I don't do anything with windows
> and don't get paid by anyone who wants me to care. ;)
> 
> But if we switch to a similar scheme as freebsd we can even care
> less because even if some other operating systems or a major provider
> decides to disable timestamps on their devices, we would still have
> window scaling, sack (and ecn?) under syn dos. So, I do think it is an
> improvement and don't see any disadvantages.
> 
> So, I don't care as long as the change (and siphash or maybe another
> hashing scheme) is secure enough...

So maybe lets agree to this - we all don't care about Windows 8, but
more importantly, I'd prefer to do one thing at a time and cryptographic
hashing isn't exactly my area of expertise. So lets get these patches
in order and after that I'll look into making SYN cookies perform better
for the poor souls using Windows 8, if that's possible without using
timestamps ;)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-07 20:56     ` Patrick McHardy
@ 2013-08-08  6:22       ` Patrick McHardy
  2013-08-08 15:07         ` Jesper Dangaard Brouer
  2013-08-08  8:04       ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2013-08-08  6:22 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: pablo, netfilter-devel, netdev, mph, as

On Wed, Aug 07, 2013 at 10:56:03PM +0200, Patrick McHardy wrote:
> On Wed, Aug 07, 2013 at 10:26:00PM +0200, Jesper Dangaard Brouer wrote:
> > On Wed,  7 Aug 2013 19:42:49 +0200
> > Patrick McHardy <kaber@trash.net> wrote:
> > 
> > Besides when using net->proc_net_stat, then the first entry is usually
> > "entries" which is not percpu, this will likely confusing the tool:
> >   lnstat -f synproxy -c 42
> 
> I'll look into that.

Ok right, the first field must contains something that is not per-CPU.
Unfortunately I don't have anything to put there and I really don't want
to keep any global state. The two possibilities I see are:

- a dummy field
- the number of proxied connections, but not using a global counter but
  gathered by iterating over the entire conntrack hash.

Any opinions?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-07 23:37     ` Patrick McHardy
@ 2013-08-08  6:34       ` Patrick McHardy
  0 siblings, 0 replies; 29+ messages in thread
From: Patrick McHardy @ 2013-08-08  6:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: pablo, netfilter-devel, netdev, mph, jesper.brouer, as

On Thu, Aug 08, 2013 at 01:37:11AM +0200, Patrick McHardy wrote:
> On Wed, Aug 07, 2013 at 03:11:54PM -0700, Eric Dumazet wrote:
> > On Wed, 2013-08-07 at 19:42 +0200, Patrick McHardy wrote:
> > 
> > Also please check your uses of kfree_skb() .
> > 
> > Some of them would better be consume_skb() (for example in
> > ipv4_synproxy_hook())
> 
> I'll look into that.

Agreed for this case. The only other two kfree_skb()s are actually
error paths.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-07 20:56     ` Patrick McHardy
  2013-08-08  6:22       ` Patrick McHardy
@ 2013-08-08  8:04       ` Jesper Dangaard Brouer
  2013-08-08  8:24         ` Patrick McHardy
  1 sibling, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-08-08  8:04 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, as


On Wed, 7 Aug 2013 22:56:03 +0200 Patrick McHardy <kaber@trash.net> wrote:
> On Wed, Aug 07, 2013 at 10:26:00PM +0200, Jesper Dangaard Brouer wrote:
> > On Wed,  7 Aug 2013 19:42:49 +0200 Patrick McHardy <kaber@trash.net> wrote:

[...]
> > Besides shouldn't nth->ack_seq be zero, in a SYN packet? This is the
> > SYN "replayed" towards the server right?
> > 
> > I also pointed to this in an earlier patch Martin showed me, but he
> > reported that changing this resulted in bad behavior.  So, I would
> > request Martin to re-test this part.
> 
> Right, it should be zero, but it doesn't matter since the ACK flag isn't
> set. This is used to propagate the sequence number to the hook function
> to initialize the sequence adjustment data. While in the target function,
> we don't have any connection tracking state to store this in. We could
> set it to zero after that, but it shouldn't matter.

I think it deserves a comment in the code, that you are using ack_seq,
to relay this information to the hook, as its not obvious.

And I think we should set it to zero after that, else it will be
visible on the wire, and wireshark complains (with a warning) when it
sees pure SYN packets with a non-zero ACK number (Martin send me a dump
some time ago, and I just checked).

p.s. thanks for working on this module, which we discussed during the
Netfilter Workshop 2013.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-08  8:04       ` Jesper Dangaard Brouer
@ 2013-08-08  8:24         ` Patrick McHardy
  0 siblings, 0 replies; 29+ messages in thread
From: Patrick McHardy @ 2013-08-08  8:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: pablo, netfilter-devel, netdev, mph, as

On Thu, Aug 08, 2013 at 10:04:48AM +0200, Jesper Dangaard Brouer wrote:
> 
> On Wed, 7 Aug 2013 22:56:03 +0200 Patrick McHardy <kaber@trash.net> wrote:
> > On Wed, Aug 07, 2013 at 10:26:00PM +0200, Jesper Dangaard Brouer wrote:
> > > On Wed,  7 Aug 2013 19:42:49 +0200 Patrick McHardy <kaber@trash.net> wrote:
> 
> [...]
> > > Besides shouldn't nth->ack_seq be zero, in a SYN packet? This is the
> > > SYN "replayed" towards the server right?
> > > 
> > > I also pointed to this in an earlier patch Martin showed me, but he
> > > reported that changing this resulted in bad behavior.  So, I would
> > > request Martin to re-test this part.
> > 
> > Right, it should be zero, but it doesn't matter since the ACK flag isn't
> > set. This is used to propagate the sequence number to the hook function
> > to initialize the sequence adjustment data. While in the target function,
> > we don't have any connection tracking state to store this in. We could
> > set it to zero after that, but it shouldn't matter.
> 
> I think it deserves a comment in the code, that you are using ack_seq,
> to relay this information to the hook, as its not obvious.

Agreed, I've added a comment.

> And I think we should set it to zero after that, else it will be
> visible on the wire, and wireshark complains (with a warning) when it
> sees pure SYN packets with a non-zero ACK number (Martin send me a dump
> some time ago, and I just checked).

I'm a bit reluctant to do the entire "make skb writable, change packet,
update checksum" dance for a cosmetic issue when wireshark should in
fact ignore the value since the ACK flag is not set. I'll give it a try
and see how ugly it gets.

> p.s. thanks for working on this module, which we discussed during the
> Netfilter Workshop 2013.

Well, I think its pretty cool considering the numbers ;)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] netfilter: add SYNPROXY core/target
  2013-08-08  6:22       ` Patrick McHardy
@ 2013-08-08 15:07         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-08-08 15:07 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: pablo, netfilter-devel, netdev, mph, as

On Thu, 8 Aug 2013 08:22:55 +0200
Patrick McHardy <kaber@trash.net> wrote:

> On Wed, Aug 07, 2013 at 10:56:03PM +0200, Patrick McHardy wrote:
> > On Wed, Aug 07, 2013 at 10:26:00PM +0200, Jesper Dangaard Brouer wrote:
> > > On Wed,  7 Aug 2013 19:42:49 +0200
> > > Patrick McHardy <kaber@trash.net> wrote:
> > > 
> > > Besides when using net->proc_net_stat, then the first entry is usually
> > > "entries" which is not percpu, this will likely confusing the tool:
> > >   lnstat -f synproxy -c 42
> > 
> > I'll look into that.
> 
> Ok right, the first field must contains something that is not per-CPU.
> Unfortunately I don't have anything to put there and I really don't want
> to keep any global state. The two possibilities I see are:
> 
> - a dummy field
> - the number of proxied connections, but not using a global counter but
>   gathered by iterating over the entire conntrack hash.
> 
> Any opinions?

Well, I would of cause be nice to have some "entries" counter, e.g.
listing the number of active conntrack entries created by the SYNPROXY
target, but I don't think it's possible to identify those conntrack
entries, right.
So, I think it would be okay with just a dummy "entries" field which is
always zero.  

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy
  2013-08-08  0:13           ` Patrick McHardy
@ 2013-08-09 13:55             ` Neal Cardwell
  0 siblings, 0 replies; 29+ messages in thread
From: Neal Cardwell @ 2013-08-09 13:55 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: David Miller, Eric Dumazet, pablo, netfilter-devel, Netdev, mph,
	jesper.brouer, as

> It seems, Windows stopped using tcp timestamps at least in windows 8 by
> default.

I agree we should not let the Windows TCP stack behavior dictate the
approach, but just wanted to point out that it's not just Windows 8
that lacks timestamps. It's more that Windows 7 is about the only
sizable Windows deployment that does use timestamps. Windows NT,
Windows 2000, Windows XP, Windows Vista, and Windows 8 all do not use
timestamps by default, and that adds up to about 30-40% of Internet
clients, depending on who's measuring. Something to keep in mind.

neal

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2013-08-09 13:55 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-07 17:42 [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Patrick McHardy
2013-08-07 17:42 ` [PATCH 1/5] netfilter: nf_conntrack: make sequence number adjustments usuable without NAT Patrick McHardy
2013-08-07 20:02   ` Jesper Dangaard Brouer
2013-08-07 17:42 ` [PATCH 2/5] net: syncookies: export cookie_v4_init_sequence/cookie_v4_check Patrick McHardy
2013-08-07 20:03   ` Jesper Dangaard Brouer
2013-08-07 17:42 ` [PATCH 3/5] netfilter: add SYNPROXY core/target Patrick McHardy
2013-08-07 20:26   ` Jesper Dangaard Brouer
2013-08-07 20:56     ` Patrick McHardy
2013-08-08  6:22       ` Patrick McHardy
2013-08-08 15:07         ` Jesper Dangaard Brouer
2013-08-08  8:04       ` Jesper Dangaard Brouer
2013-08-08  8:24         ` Patrick McHardy
2013-08-07 22:11   ` Eric Dumazet
2013-08-07 23:37     ` Patrick McHardy
2013-08-08  6:34       ` Patrick McHardy
2013-08-07 17:42 ` [PATCH 4/5] net: syncookies: export cookie_v6_init_sequence/cookie_v6_check Patrick McHardy
2013-08-07 20:27   ` Jesper Dangaard Brouer
2013-08-07 17:42 ` [PATCH 5/5] netfilter: add IPv6 SYNPROXY target Patrick McHardy
2013-08-07 20:34   ` Jesper Dangaard Brouer
2013-08-07 20:57     ` Patrick McHardy
2013-08-07 18:06 ` [PATCH RFC 0/5] netfilter: implement netfilter SYN proxy Eric Dumazet
2013-08-07 20:59   ` Patrick McHardy
2013-08-07 21:05     ` Hannes Frederic Sowa
2013-08-07 21:24       ` Patrick McHardy
2013-08-07 21:39         ` Eric Dumazet
2013-08-07 23:40       ` David Miller
2013-08-08  0:04         ` Hannes Frederic Sowa
2013-08-08  0:13           ` Patrick McHardy
2013-08-09 13:55             ` Neal Cardwell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.