[PATCH RFC 0/7] netfilter: introduce new chain type for local socket input

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH RFC 0/7] netfilter: introduce new chain type for local socket input
@ 2015-09-29 11:12 Daniel Mack
  2015-09-29 11:12 ` [PATCH RFC 1/7] netfilter: add socket to struct nft_pktinfo Daniel Mack
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Daniel Mack @ 2015-09-29 11:12 UTC (permalink / raw)
  To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack

Here is a patch set that enables for full support for match rules
that take into account information about the local receiver socket.

Such rules allow administrators to implement per-application or
per-container firewalls which filter any type of network traffic
directed to or originated from a set of processes on a system,
independent of, for instance, local or remote port numbers.

In theory, such rules are already supported through the 'meta' and
'socket' rule types, but they currently do not work for ingress packets
delivered to unestablished listener sockets. NF_INET_LOCAL_IN chains
are iterated once the IP stack decides a packet is directed to the
local system, but before the local listener socket is determined.
Consequently, filter rules that are based on information derived from
the listener socket cannot be used reliably.

This patch set introduces a new chain type (NF_INET_LOCAL_SOCKET_IN)
that is iterated at a later point in time than NF_INET_LOCAL_IN, after
the listener socket demux has succeeded. Chains of this type are hence
only looked at _if_ there is a local listener.

The input paths for TCP and UDP for IPv4 and IPv6 are patched for
the new hook-up, as well as SCTP and DCCP.

Possible performance penalties for setups in which this new type is
not used need to be considered, but I lack a good test case for that.
I'm sure some people reading this do have proper test scenarios they
can run with these patches applied. I'd be very interested in these
numbers.

For SCTP and DCCP, I admittedly lack a proper test case as well, and
for UDP, I'm aware of a possible deadlock due to nf_hook() being called
under hslot->lock when the stack is flushed preliminarily from
__udp[46]_lib_mcast_deliver(). That's fixable, but I've kept it simple
for this RFC.

Only nftables is supported so far, but enabling iptables as well would
be straight forward.

I also have trivial patches for libnftnl and nftables to enable
the userspace part.

I'd appreciate some feedback about this approach.

Thanks,
Daniel

Daniel Mack (7):
  netfilter: add socket to struct nft_pktinfo
  netfilter: nft_meta: look at pkt->sk rather than skb->sk
  netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  net: tcp_ipv4, udp_ipv4: hook up LOCAL_SOCKET_IN netfilter chains
  net: tcp_ipv6, udp_ipv6: hook up LOCAL_SOCKET_IN netfilter chains
  net: sctp: hook up LOCAL_SOCKET_IN netfilter chains
  net: dccp: hook up LOCAL_SOCKET_IN netfilter chains

 include/net/netfilter/nf_tables.h   |  2 ++
 include/uapi/linux/netfilter.h      |  1 +
 net/dccp/ipv4.c                     | 14 +++++++++++++-
 net/dccp/ipv6.c                     | 14 +++++++++++++-
 net/ipv4/netfilter/iptable_filter.c |  1 +
 net/ipv4/netfilter/nf_tables_ipv4.c | 14 ++++++++------
 net/ipv4/tcp_ipv4.c                 |  8 ++++++++
 net/ipv4/udp.c                      | 15 +++++++++++++++
 net/ipv6/netfilter/nf_tables_ipv6.c | 14 ++++++++------
 net/ipv6/tcp_ipv6.c                 |  8 ++++++++
 net/ipv6/udp.c                      |  9 +++++++++
 net/netfilter/nf_tables_inet.c      |  3 ++-
 net/netfilter/nft_meta.c            |  7 ++++---
 net/sctp/input.c                    | 11 ++++++++++-
 14 files changed, 102 insertions(+), 19 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC 1/7] netfilter: add socket to struct nft_pktinfo
  2015-09-29 11:12 [PATCH RFC 0/7] netfilter: introduce new chain type for local socket input Daniel Mack
@ 2015-09-29 11:12 ` Daniel Mack
  2015-09-29 18:25   ` Eric W. Biederman
  2015-09-29 11:12 ` [PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk Daniel Mack
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Daniel Mack @ 2015-09-29 11:12 UTC (permalink / raw)
  To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack

The high-level netfilter hook API already enables users to pass a socket,
but that information is lost when the chains are walked.

In order to let internal eval callbacks use the passed filter rather than
skb->sk, add a pointer of type 'struct sock' to 'struct nft_pktinfo' and
set that field via nft_set_pktinfo().

This allows us to run filter chains from situations where skb->sk is unset.
Fall back to skb->sk in case state->sk is NULL, so filter callbacks can be
written in a generic way.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 include/net/netfilter/nf_tables.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index aa8bee7..05e97ed 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -13,6 +13,7 @@
 #define NFT_JUMP_STACK_SIZE	16
 
 struct nft_pktinfo {
+	struct sock			*sk;
 	struct sk_buff			*skb;
 	const struct net_device		*in;
 	const struct net_device		*out;
@@ -29,6 +30,7 @@ static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
 				   struct sk_buff *skb,
 				   const struct nf_hook_state *state)
 {
+	pkt->sk = state->sk ?: skb->sk;
 	pkt->skb = skb;
 	pkt->in = pkt->xt.in = state->in;
 	pkt->out = pkt->xt.out = state->out;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk
  2015-09-29 11:12 [PATCH RFC 0/7] netfilter: introduce new chain type for local socket input Daniel Mack
  2015-09-29 11:12 ` [PATCH RFC 1/7] netfilter: add socket to struct nft_pktinfo Daniel Mack
@ 2015-09-29 11:12 ` Daniel Mack
  2015-09-29 13:37   ` kbuild test robot
  2015-09-29 11:12 ` [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type Daniel Mack
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Daniel Mack @ 2015-09-29 11:12 UTC (permalink / raw)
  To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack

pkt->sk is set to whatever was passed to nh_hook() by the caller,
and for post demux chains, this is the one that should be looked
at, as skb->sk is still NULL at this point in time.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 net/netfilter/nft_meta.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index cb2f13e..f195bee 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -29,8 +29,9 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 		       const struct nft_pktinfo *pkt)
 {
 	const struct nft_meta *priv = nft_expr_priv(expr);
-	const struct sk_buff *skb = pkt->skb;
 	const struct net_device *in = pkt->in, *out = pkt->out;
+	struct sk_buff *skb = pkt->skb;
+	struct sock *sk = pkt->sk;
 	u32 *dest = &regs->data[priv->dreg];
 
 	switch (priv->key) {
@@ -168,9 +169,9 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 		break;
 #ifdef CONFIG_CGROUP_NET_CLASSID
 	case NFT_META_CGROUP:
-		if (skb->sk == NULL || !sk_fullsock(skb->sk))
+		if (sk == NULL || !sk_fullsock(sk))
 			goto err;
-		*dest = skb->sk->sk_classid;
+		*dest = sk->sk_classid;
 		break;
 #endif
 	default:
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-09-29 11:12 [PATCH RFC 0/7] netfilter: introduce new chain type for local socket input Daniel Mack
  2015-09-29 11:12 ` [PATCH RFC 1/7] netfilter: add socket to struct nft_pktinfo Daniel Mack
  2015-09-29 11:12 ` [PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk Daniel Mack
@ 2015-09-29 11:12 ` Daniel Mack
  2015-09-29 21:19   ` Florian Westphal
  2015-09-29 11:12 ` [PATCH RFC 4/7] net: tcp_ipv4, udp_ipv4: hook up LOCAL_SOCKET_IN netfilter chains Daniel Mack
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Daniel Mack @ 2015-09-29 11:12 UTC (permalink / raw)
  To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack

Add a new chain type NF_INET_LOCAL_SOCKET_IN which is ran after the
input demux is complete and the final destination socket (if any)
has been determined.

This helps filtering packets based on information stored in the
destination socket, such as cgroup controller supplied net class IDs.

Note that rules in such chains are not processed in case the local
listen socket cannot be determined. Hence, if no application is
listening on a specific task, the resulting error code that is sent
back to the remote peer can't be controlled with rules in
NF_INET_LOCAL_SOCKET_IN chains.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 include/uapi/linux/netfilter.h      | 1 +
 net/ipv4/netfilter/iptable_filter.c | 1 +
 net/ipv4/netfilter/nf_tables_ipv4.c | 4 +++-
 net/netfilter/nf_tables_inet.c      | 3 ++-
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index d93f949..96c3f8b 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -49,6 +49,7 @@ enum nf_inet_hooks {
 	NF_INET_FORWARD,
 	NF_INET_LOCAL_OUT,
 	NF_INET_POST_ROUTING,
+	NF_INET_LOCAL_SOCKET_IN,
 	NF_INET_NUMHOOKS
 };
 
diff --git a/net/ipv4/netfilter/iptable_filter.c b/net/ipv4/netfilter/iptable_filter.c
index a0f3bec..d65616a5 100644
--- a/net/ipv4/netfilter/iptable_filter.c
+++ b/net/ipv4/netfilter/iptable_filter.c
@@ -21,6 +21,7 @@ MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>");
 MODULE_DESCRIPTION("iptables filter table");
 
 #define FILTER_VALID_HOOKS ((1 << NF_INET_LOCAL_IN) | \
+			    (1 << NF_INET_LOCAL_SOCKET_IN) | \
 			    (1 << NF_INET_FORWARD) | \
 			    (1 << NF_INET_LOCAL_OUT))
 
diff --git a/net/ipv4/netfilter/nf_tables_ipv4.c b/net/ipv4/netfilter/nf_tables_ipv4.c
index aa180d3..abee60a 100644
--- a/net/ipv4/netfilter/nf_tables_ipv4.c
+++ b/net/ipv4/netfilter/nf_tables_ipv4.c
@@ -55,6 +55,7 @@ struct nft_af_info nft_af_ipv4 __read_mostly = {
 		[NF_INET_FORWARD]	= nft_do_chain_ipv4,
 		[NF_INET_PRE_ROUTING]	= nft_do_chain_ipv4,
 		[NF_INET_POST_ROUTING]	= nft_do_chain_ipv4,
+		[NF_INET_LOCAL_SOCKET_IN]	= nft_do_chain_ipv4,
 	},
 };
 EXPORT_SYMBOL_GPL(nft_af_ipv4);
@@ -96,7 +97,8 @@ static const struct nf_chain_type filter_ipv4 = {
 			  (1 << NF_INET_LOCAL_OUT) |
 			  (1 << NF_INET_FORWARD) |
 			  (1 << NF_INET_PRE_ROUTING) |
-			  (1 << NF_INET_POST_ROUTING),
+			  (1 << NF_INET_POST_ROUTING) |
+			  (1 << NF_INET_LOCAL_SOCKET_IN),
 };
 
 static int __init nf_tables_ipv4_init(void)
diff --git a/net/netfilter/nf_tables_inet.c b/net/netfilter/nf_tables_inet.c
index 9dd2d21..5544196 100644
--- a/net/netfilter/nf_tables_inet.c
+++ b/net/netfilter/nf_tables_inet.c
@@ -75,7 +75,8 @@ static const struct nf_chain_type filter_inet = {
 			  (1 << NF_INET_LOCAL_OUT) |
 			  (1 << NF_INET_FORWARD) |
 			  (1 << NF_INET_PRE_ROUTING) |
-			  (1 << NF_INET_POST_ROUTING),
+			  (1 << NF_INET_POST_ROUTING) |
+			  (1 << NF_INET_LOCAL_SOCKET_IN),
 };
 
 static int __init nf_tables_inet_init(void)
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH RFC 4/7] net: tcp_ipv4, udp_ipv4: hook up LOCAL_SOCKET_IN netfilter chains
  2015-09-29 11:12 [PATCH RFC 0/7] netfilter: introduce new chain type for local socket input Daniel Mack
                   ` (2 preceding siblings ...)
  2015-09-29 11:12 ` [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type Daniel Mack
@ 2015-09-29 11:12 ` Daniel Mack
  2015-09-29 11:12 ` [PATCH RFC 5/7] net: tcp_ipv6, udp_ipv6: " Daniel Mack
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Daniel Mack @ 2015-09-29 11:12 UTC (permalink / raw)
  To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack

Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the
destination socket for IPv4 unicast and multicast ports have been
looked up.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 net/ipv4/netfilter/nf_tables_ipv4.c | 10 +++++-----
 net/ipv4/tcp_ipv4.c                 |  8 ++++++++
 net/ipv4/udp.c                      | 15 +++++++++++++++
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/netfilter/nf_tables_ipv4.c b/net/ipv4/netfilter/nf_tables_ipv4.c
index abee60a..2e65664 100644
--- a/net/ipv4/netfilter/nf_tables_ipv4.c
+++ b/net/ipv4/netfilter/nf_tables_ipv4.c
@@ -50,11 +50,11 @@ struct nft_af_info nft_af_ipv4 __read_mostly = {
 	.owner		= THIS_MODULE,
 	.nops		= 1,
 	.hooks		= {
-		[NF_INET_LOCAL_IN]	= nft_do_chain_ipv4,
-		[NF_INET_LOCAL_OUT]	= nft_ipv4_output,
-		[NF_INET_FORWARD]	= nft_do_chain_ipv4,
-		[NF_INET_PRE_ROUTING]	= nft_do_chain_ipv4,
-		[NF_INET_POST_ROUTING]	= nft_do_chain_ipv4,
+		[NF_INET_LOCAL_IN]		= nft_do_chain_ipv4,
+		[NF_INET_LOCAL_OUT]		= nft_ipv4_output,
+		[NF_INET_FORWARD]		= nft_do_chain_ipv4,
+		[NF_INET_PRE_ROUTING]		= nft_do_chain_ipv4,
+		[NF_INET_POST_ROUTING]		= nft_do_chain_ipv4,
 		[NF_INET_LOCAL_SOCKET_IN]	= nft_do_chain_ipv4,
 	},
 };
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 93898e0..83bc7b3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -78,6 +78,7 @@
 
 #include <linux/inet.h>
 #include <linux/ipv6.h>
+#include <linux/netfilter.h>
 #include <linux/stddef.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -1594,6 +1595,13 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	if (!sk)
 		goto no_tcp_socket;
 
+	ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk,
+		      skb, skb->dev, NULL, NULL);
+	if (ret != 1) {
+		sock_put(sk);
+		return 0;
+	}
+
 process:
 	if (sk->sk_state == TCP_TIME_WAIT)
 		goto do_time_wait;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f7d1d5e..57c7571 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -97,6 +97,7 @@
 #include <linux/mm.h>
 #include <linux/inet.h>
 #include <linux/netdevice.h>
+#include <linux/netfilter.h>
 #include <linux/slab.h>
 #include <net/tcp_states.h>
 #include <linux/skbuff.h>
@@ -1633,7 +1634,14 @@ static void flush_stack(struct sock **stack, unsigned int count,
 	struct sock *sk;
 
 	for (i = 0; i < count; i++) {
+		int ret;
 		sk = stack[i];
+
+		ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk,
+			      skb, skb->dev, NULL, NULL);
+		if (ret != 1)
+			continue;
+
 		if (likely(!skb1))
 			skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
 
@@ -1820,6 +1828,13 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (sk) {
 		int ret;
 
+		ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk,
+			      skb, skb->dev, NULL, NULL);
+		if (ret != 1) {
+			sock_put(sk);
+			return 0;
+		}
+
 		if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
 			skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
 						 inet_compute_pseudo);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH RFC 5/7] net: tcp_ipv6, udp_ipv6: hook up LOCAL_SOCKET_IN netfilter chains
  2015-09-29 11:12 [PATCH RFC 0/7] netfilter: introduce new chain type for local socket input Daniel Mack
                   ` (3 preceding siblings ...)
  2015-09-29 11:12 ` [PATCH RFC 4/7] net: tcp_ipv4, udp_ipv4: hook up LOCAL_SOCKET_IN netfilter chains Daniel Mack
@ 2015-09-29 11:12 ` Daniel Mack
  2015-09-29 11:12 ` [PATCH RFC 6/7] net: sctp: " Daniel Mack
  2015-09-29 11:12 ` [PATCH RFC 7/7] net: dccp: " Daniel Mack
  6 siblings, 0 replies; 21+ messages in thread
From: Daniel Mack @ 2015-09-29 11:12 UTC (permalink / raw)
  To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack

Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the
destination socket for IPv6 unicast and multicast ports have been
looked up.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 net/ipv6/netfilter/nf_tables_ipv6.c | 14 ++++++++------
 net/ipv6/tcp_ipv6.c                 |  8 ++++++++
 net/ipv6/udp.c                      |  9 +++++++++
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/netfilter/nf_tables_ipv6.c b/net/ipv6/netfilter/nf_tables_ipv6.c
index c8148ba..53c7923 100644
--- a/net/ipv6/netfilter/nf_tables_ipv6.c
+++ b/net/ipv6/netfilter/nf_tables_ipv6.c
@@ -49,11 +49,12 @@ struct nft_af_info nft_af_ipv6 __read_mostly = {
 	.owner		= THIS_MODULE,
 	.nops		= 1,
 	.hooks		= {
-		[NF_INET_LOCAL_IN]	= nft_do_chain_ipv6,
-		[NF_INET_LOCAL_OUT]	= nft_ipv6_output,
-		[NF_INET_FORWARD]	= nft_do_chain_ipv6,
-		[NF_INET_PRE_ROUTING]	= nft_do_chain_ipv6,
-		[NF_INET_POST_ROUTING]	= nft_do_chain_ipv6,
+		[NF_INET_LOCAL_IN]		= nft_do_chain_ipv6,
+		[NF_INET_LOCAL_OUT]		= nft_ipv6_output,
+		[NF_INET_FORWARD]		= nft_do_chain_ipv6,
+		[NF_INET_PRE_ROUTING]		= nft_do_chain_ipv6,
+		[NF_INET_POST_ROUTING]		= nft_do_chain_ipv6,
+		[NF_INET_LOCAL_SOCKET_IN]	= nft_do_chain_ipv6,
 	},
 };
 EXPORT_SYMBOL_GPL(nft_af_ipv6);
@@ -95,7 +96,8 @@ static const struct nf_chain_type filter_ipv6 = {
 			  (1 << NF_INET_LOCAL_OUT) |
 			  (1 << NF_INET_FORWARD) |
 			  (1 << NF_INET_PRE_ROUTING) |
-			  (1 << NF_INET_POST_ROUTING),
+			  (1 << NF_INET_POST_ROUTING) |
+			  (1 << NF_INET_LOCAL_SOCKET_IN),
 };
 
 static int __init nf_tables_ipv6_init(void)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 97d9314..0b0706d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -37,6 +37,7 @@
 #include <linux/init.h>
 #include <linux/jhash.h>
 #include <linux/ipsec.h>
+#include <linux/netfilter.h>
 #include <linux/times.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
@@ -1392,6 +1393,13 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 	if (!sk)
 		goto no_tcp_socket;
 
+	ret = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_SOCKET_IN, sk,
+		      skb, skb->dev, NULL, NULL);
+	if (ret != 1) {
+		sock_put(sk);
+		return 0;
+	}
+
 process:
 	if (sk->sk_state == TCP_TIME_WAIT)
 		goto do_time_wait;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 0aba654..99df081 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -33,6 +33,7 @@
 #include <linux/icmpv6.h>
 #include <linux/init.h>
 #include <linux/module.h>
+#include <linux/netfilter.h>
 #include <linux/skbuff.h>
 #include <linux/slab.h>
 #include <asm/uaccess.h>
@@ -746,7 +747,15 @@ static void flush_stack(struct sock **stack, unsigned int count,
 	unsigned int i;
 
 	for (i = 0; i < count; i++) {
+		int ret;
+
 		sk = stack[i];
+
+		ret = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_SOCKET_IN, sk,
+			      skb, skb->dev, NULL, NULL);
+		if (ret != 1)
+			continue;
+
 		if (likely(!skb1))
 			skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC);
 		if (!skb1) {
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH RFC 6/7] net: sctp: hook up LOCAL_SOCKET_IN netfilter chains
  2015-09-29 11:12 [PATCH RFC 0/7] netfilter: introduce new chain type for local socket input Daniel Mack
                   ` (4 preceding siblings ...)
  2015-09-29 11:12 ` [PATCH RFC 5/7] net: tcp_ipv6, udp_ipv6: " Daniel Mack
@ 2015-09-29 11:12 ` Daniel Mack
  2015-09-29 11:12 ` [PATCH RFC 7/7] net: dccp: " Daniel Mack
  6 siblings, 0 replies; 21+ messages in thread
From: Daniel Mack @ 2015-09-29 11:12 UTC (permalink / raw)
  To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack

Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the
destination socket for SCTP packets have been looked up.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 net/sctp/input.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index b6493b3..0652406 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -45,6 +45,7 @@
 #include <linux/list.h> /* For struct list_head */
 #include <linux/socket.h>
 #include <linux/ip.h>
+#include <linux/netfilter.h>
 #include <linux/time.h> /* For struct timeval */
 #include <linux/slab.h>
 #include <net/ip.h>
@@ -115,7 +116,7 @@ int sctp_rcv(struct sk_buff *skb)
 	struct sctphdr *sh;
 	union sctp_addr src;
 	union sctp_addr dest;
-	int family;
+	int ret, family;
 	struct sctp_af *af;
 	struct net *net = dev_net(skb->dev);
 
@@ -180,6 +181,14 @@ int sctp_rcv(struct sk_buff *skb)
 	rcvr = asoc ? &asoc->base : &ep->base;
 	sk = rcvr->sk;
 
+	/* Iterate through rules in LOCAL_SOCKET_IN,
+	 * now that the receiver is known.
+	 */
+	ret = nf_hook(family == AF_INET ? NFPROTO_IPV4 : NFPROTO_IPV6,
+		      NF_INET_LOCAL_SOCKET_IN, sk, skb, skb->dev, NULL, NULL);
+	if (ret != 1)
+		goto discard_release;
+
 	/*
 	 * If a frame arrives on an interface and the receiving socket is
 	 * bound to another interface, via SO_BINDTODEVICE, treat it as OOTB
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH RFC 7/7] net: dccp: hook up LOCAL_SOCKET_IN netfilter chains
  2015-09-29 11:12 [PATCH RFC 0/7] netfilter: introduce new chain type for local socket input Daniel Mack
                   ` (5 preceding siblings ...)
  2015-09-29 11:12 ` [PATCH RFC 6/7] net: sctp: " Daniel Mack
@ 2015-09-29 11:12 ` Daniel Mack
  6 siblings, 0 replies; 21+ messages in thread
From: Daniel Mack @ 2015-09-29 11:12 UTC (permalink / raw)
  To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack

Run the NF_INET_LOCAL_SOCKET_IN netfilter chain rules after the
destination socket for DCCP packets have been looked up.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 net/dccp/ipv4.c | 14 +++++++++++++-
 net/dccp/ipv6.c | 14 +++++++++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ccf4c56..9746138 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -14,6 +14,7 @@
 #include <linux/icmp.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/netfilter.h>
 #include <linux/skbuff.h>
 #include <linux/random.h>
 
@@ -807,7 +808,7 @@ static int dccp_v4_rcv(struct sk_buff *skb)
 	const struct dccp_hdr *dh;
 	const struct iphdr *iph;
 	struct sock *sk;
-	int min_cov;
+	int ret, min_cov;
 
 	/* Step 1: Check header basics */
 
@@ -857,6 +858,17 @@ static int dccp_v4_rcv(struct sk_buff *skb)
 
 	/*
 	 * Step 2:
+	 *	... or any LOCAL_SOCKET_IN rule disagrees ...
+	 */
+	ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_SOCKET_IN, sk,
+		      skb, skb->dev, NULL, NULL);
+	if (ret != 1) {
+		sock_put(sk);
+		return 0;
+	}
+
+	/*
+	 * Step 2:
 	 *	... or S.state == TIMEWAIT,
 	 *		Generate Reset(No Connection) unless P.type == Reset
 	 *		Drop packet and return
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 5165571..63b51e6 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -14,6 +14,7 @@
 
 #include <linux/module.h>
 #include <linux/random.h>
+#include <linux/netfilter.h>
 #include <linux/slab.h>
 #include <linux/xfrm.h>
 
@@ -691,7 +692,7 @@ static int dccp_v6_rcv(struct sk_buff *skb)
 {
 	const struct dccp_hdr *dh;
 	struct sock *sk;
-	int min_cov;
+	int ret, min_cov;
 
 	/* Step 1: Check header basics */
 
@@ -732,6 +733,17 @@ static int dccp_v6_rcv(struct sk_buff *skb)
 
 	/*
 	 * Step 2:
+	 *	... or any LOCAL_SOCKET_IN rule disagrees ...
+	 */
+	ret = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_SOCKET_IN, sk,
+		      skb, skb->dev, NULL, NULL);
+	if (ret != 1) {
+		sock_put(sk);
+		return 0;
+	}
+
+	/*
+	 * Step 2:
 	 *	... or S.state == TIMEWAIT,
 	 *		Generate Reset(No Connection) unless P.type == Reset
 	 *		Drop packet and return
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk
  2015-09-29 11:12 ` [PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk Daniel Mack
@ 2015-09-29 13:37   ` kbuild test robot
  0 siblings, 0 replies; 21+ messages in thread
From: kbuild test robot @ 2015-09-29 13:37 UTC (permalink / raw)
  To: Daniel Mack
  Cc: kbuild-all, pablo, daniel, netfilter-devel, netdev, fw,
	balazs.scheidler, Daniel Mack

[-- Attachment #1: Type: text/plain, Size: 1726 bytes --]

Hi Daniel,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]

config: m68k-sun3_defconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout bcddf1d1557b51bef5ef395b5b7dd7b512794e2f
  # save the attached .config to linux build tree
  make.cross ARCH=m68k 

All warnings (new ones prefixed by >>):

   net/netfilter/nft_meta.c: In function 'nft_meta_get_eval':
>> net/netfilter/nft_meta.c:34:15: warning: unused variable 'sk' [-Wunused-variable]
     struct sock *sk = pkt->sk;
                  ^

vim +/sk +34 net/netfilter/nft_meta.c

    18	#include <linux/ip.h>
    19	#include <linux/ipv6.h>
    20	#include <linux/smp.h>
    21	#include <net/dst.h>
    22	#include <net/sock.h>
    23	#include <net/tcp_states.h> /* for TCP_TIME_WAIT */
    24	#include <net/netfilter/nf_tables.h>
    25	#include <net/netfilter/nft_meta.h>
    26	
    27	void nft_meta_get_eval(const struct nft_expr *expr,
    28			       struct nft_regs *regs,
    29			       const struct nft_pktinfo *pkt)
    30	{
    31		const struct nft_meta *priv = nft_expr_priv(expr);
    32		const struct net_device *in = pkt->in, *out = pkt->out;
    33		struct sk_buff *skb = pkt->skb;
  > 34		struct sock *sk = pkt->sk;
    35		u32 *dest = &regs->data[priv->dreg];
    36	
    37		switch (priv->key) {
    38		case NFT_META_LEN:
    39			*dest = skb->len;
    40			break;
    41		case NFT_META_PROTOCOL:
    42			*dest = 0;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 10998 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 1/7] netfilter: add socket to struct nft_pktinfo
  2015-09-29 11:12 ` [PATCH RFC 1/7] netfilter: add socket to struct nft_pktinfo Daniel Mack
@ 2015-09-29 18:25   ` Eric W. Biederman
  0 siblings, 0 replies; 21+ messages in thread
From: Eric W. Biederman @ 2015-09-29 18:25 UTC (permalink / raw)
  To: Daniel Mack; +Cc: pablo, daniel, netfilter-devel, netdev, fw, balazs.scheidler

Daniel Mack <daniel@zonque.org> writes:

> The high-level netfilter hook API already enables users to pass a socket,
> but that information is lost when the chains are walked.
>
> In order to let internal eval callbacks use the passed filter rather than
> skb->sk, add a pointer of type 'struct sock' to 'struct nft_pktinfo' and
> set that field via nft_set_pktinfo().
>
> This allows us to run filter chains from situations where skb->sk is unset.
> Fall back to skb->sk in case state->sk is NULL, so filter callbacks can be
> written in a generic way.

A word of caution here, and something I expect needs to be thought
about.  The reason sk is passed through netfilter and some of the other
paths is that there are tunnel cases where there are two sk's in play
and the passed in socket makes it possible to have both sk's available.

I really don't understand all of the details, but they exist in the git
history of the change.

Reading your change description it does not feel as though you are aware
of the subtleties and as such I suspect your code gets something wrong.

The placing of sk as the first word in struct nft_pktinfo is also a
little concerning, as I don't expect it will be as highly used as
the later fields, and everything that is highly used we want to keep
in one cache line if we can.

Eric

> Signed-off-by: Daniel Mack <daniel@zonque.org>
> ---
>  include/net/netfilter/nf_tables.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> index aa8bee7..05e97ed 100644
> --- a/include/net/netfilter/nf_tables.h
> +++ b/include/net/netfilter/nf_tables.h
> @@ -13,6 +13,7 @@
>  #define NFT_JUMP_STACK_SIZE	16
>  
>  struct nft_pktinfo {
> +	struct sock			*sk;
>  	struct sk_buff			*skb;
>  	const struct net_device		*in;
>  	const struct net_device		*out;
> @@ -29,6 +30,7 @@ static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
>  				   struct sk_buff *skb,
>  				   const struct nf_hook_state *state)
>  {
> +	pkt->sk = state->sk ?: skb->sk;
>  	pkt->skb = skb;
>  	pkt->in = pkt->xt.in = state->in;
>  	pkt->out = pkt->xt.out = state->out;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-09-29 11:12 ` [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type Daniel Mack
@ 2015-09-29 21:19   ` Florian Westphal
  2015-09-30  7:24     ` Daniel Mack
  0 siblings, 1 reply; 21+ messages in thread
From: Florian Westphal @ 2015-09-29 21:19 UTC (permalink / raw)
  To: Daniel Mack; +Cc: pablo, daniel, netfilter-devel, netdev, fw, balazs.scheidler

Daniel Mack <daniel@zonque.org> wrote:
> Add a new chain type NF_INET_LOCAL_SOCKET_IN which is ran after the
> input demux is complete and the final destination socket (if any)
> has been determined.
> 
> This helps filtering packets based on information stored in the
> destination socket, such as cgroup controller supplied net class IDs.

This still seems like the 'x y' problem ("want to do X, think Y is
correct solution; ask about Y, but thats a strange thing to do").

There is nothing that this offers over INPUT *except* that sk is
available.  But there is zero benefit as far as I am concerned --
why would you want to do any meaningful filtering based on the sk at
that point...?

Drop?  Makes no sense, else application would not be running in the first
place.
Allowing response packets?  Can already do that with conntrack.

So the only 'benefit' is that netcls id is available; but
a) why is that even needed and
b) is such a huge sledgehammer just for net cgroup accounting
worth it?

Another question is what other strange things come up once we would
open this door.

> listening on a specific task, the resulting error code that is sent
> back to the remote peer can't be controlled with rules in
> NF_INET_LOCAL_SOCKET_IN chains.

Right, and that makes this even weirder.

For deterministic ingress filtering you can only rely on what
is contained in the packet.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-09-29 21:19   ` Florian Westphal
@ 2015-09-30  7:24     ` Daniel Mack
  2015-09-30  7:40       ` Jan Engelhardt
                         ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Daniel Mack @ 2015-09-30  7:24 UTC (permalink / raw)
  To: Florian Westphal; +Cc: pablo, daniel, netfilter-devel, netdev, balazs.scheidler

On 09/29/2015 11:19 PM, Florian Westphal wrote:
> Daniel Mack <daniel@zonque.org> wrote:
>> Add a new chain type NF_INET_LOCAL_SOCKET_IN which is ran after the
>> input demux is complete and the final destination socket (if any)
>> has been determined.
>>
>> This helps filtering packets based on information stored in the
>> destination socket, such as cgroup controller supplied net class IDs.
> 
> This still seems like the 'x y' problem ("want to do X, think Y is
> correct solution; ask about Y, but thats a strange thing to do").
> 
> There is nothing that this offers over INPUT *except* that sk is
> available.  But there is zero benefit as far as I am concerned --
> why would you want to do any meaningful filtering based on the sk at
> that point...?

Well, INPUT and SOCKET_INPUT are just two different tools that help
solve different classes of problems. INPUT is for filtering all local
traffic while SOCKET_INPUT is just for such that actually has a
listener, and they both make sense in different scenarios.

> Drop?  Makes no sense, else application would not be running in the first
> place.

Of course you can drop certain packets at this point, depending on other
details. Say, for instance, you want to match all packets that are
received by a certain task and that are originated from IP addresses of
a specific subnet, and drop the rest. Rather than adding matches to your
global firewall configuration for all the ports that tasks may or may
not listen on, you can just do it on a higher level, from the
perspective of an administrator. If you decide to let your web server
listen on another port as well, no firewall rule configuration change is
needed at all.

Another use case is accounting. If you want to know how much traffic a
certain service or application in your system has caused, you don't want
to match all its ports to firewall rules just in order to get that
information. Instead, you can now derive that information on a
per-application base. With this patch set, this even works just fine for
multicast listeners, which is something that is currently impossible to
achieve otherwise.

> So the only 'benefit' is that netcls id is available; but
> a) why is that even needed and

It's currently the only way of realizing application-level firewalls,
and it'd be an awesome feature if it actually worked.

> b) is such a huge sledgehammer just for net cgroup accounting
> worth it?

I really don't know if this approach is intrusive enough to make it
qualify as sledgehammer. I'd like to see some real-world benchmarks and
have proof there is a performance decrease for setups that don't use
such chains.

> Another question is what other strange things come up once we would
> open this door.

So let's discuss the possible drawbacks.

Again, the deal with this new chain type is simple: if there is no local
listener, the rules are not looked at. If you need rules that are
processed either way, put them in LOCAL_IN, as you always did.

>> listening on a specific task, the resulting error code that is sent
>> back to the remote peer can't be controlled with rules in
>> NF_INET_LOCAL_SOCKET_IN chains.
> 
> Right, and that makes this even weirder.

Well, to be more specific: you can only control the resulting error code
that is sent back to the remote peer _if_ there is a local listener. You
can do _anything_ _if_ there is a local listener. This is in line with
the above description and shouldn't cause much surprises for users.

> For deterministic ingress filtering you can only rely on what
> is contained in the packet.

Why so? For deterministic ingress filtering of traffic directed to a
local socket, you can as well rely on information associated with that
socket. And this is what application-level firewall rule sets are all about.

Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-09-30  7:24     ` Daniel Mack
@ 2015-09-30  7:40       ` Jan Engelhardt
  2015-09-30  8:54         ` Daniel Mack
  2015-09-30 21:48       ` Florian Westphal
  2015-10-01 17:13       ` Marcelo Ricardo Leitner
  2 siblings, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2015-09-30  7:40 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Florian Westphal, pablo, daniel, netfilter-devel, netdev,
	balazs.scheidler


On Wednesday 2015-09-30 09:24, Daniel Mack wrote:
>
>> Drop?  Makes no sense, else application would not be running in the first
>> place.
>
>Of course you can drop certain packets at this point, depending on other
>details. Say, for instance, you want to match all packets that are
>received by a certain task [...]
>Another use case is accounting. If you want to know how much traffic a
>certain service or application in your system has caused

But the sk info would be available in INPUT already, would it not?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-09-30  7:40       ` Jan Engelhardt
@ 2015-09-30  8:54         ` Daniel Mack
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Mack @ 2015-09-30  8:54 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Florian Westphal, pablo, daniel, netfilter-devel, netdev,
	balazs.scheidler

On 09/30/2015 09:40 AM, Jan Engelhardt wrote:
> 
> On Wednesday 2015-09-30 09:24, Daniel Mack wrote:
>>
>>> Drop?  Makes no sense, else application would not be running in the first
>>> place.
>>
>> Of course you can drop certain packets at this point, depending on other
>> details. Say, for instance, you want to match all packets that are
>> received by a certain task [...]
>> Another use case is accounting. If you want to know how much traffic a
>> certain service or application in your system has caused
> 
> But the sk info would be available in INPUT already, would it not?

No, only for established connections, as those are subject to early
demux which sets skb->sk. For all other packets, netfilter callbacks are
called with skb->sk == NULL.

That's the whole point of this patch set ;)


Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-09-30  7:24     ` Daniel Mack
  2015-09-30  7:40       ` Jan Engelhardt
@ 2015-09-30 21:48       ` Florian Westphal
  2015-10-01  9:04         ` Daniel Mack
  2015-10-01 17:13       ` Marcelo Ricardo Leitner
  2 siblings, 1 reply; 21+ messages in thread
From: Florian Westphal @ 2015-09-30 21:48 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Florian Westphal, pablo, daniel, netfilter-devel, netdev,
	balazs.scheidler

Daniel Mack <daniel@zonque.org> wrote:
> Of course you can drop certain packets at this point, depending on other
> details. Say, for instance, you want to match all packets that are
> received by a certain task and that are originated from IP addresses of
> a specific subnet, and drop the rest. Rather than adding matches to your
> global firewall configuration for all the ports that tasks may or may
> not listen on, you can just do it on a higher level, from the
> perspective of an administrator. If you decide to let your web server
> listen on another port as well, no firewall rule configuration change is
> needed at all.

We don't have per application configurations, not even with these
patches...?
The configuration is still global/per netns?

> Another use case is accounting. If you want to know how much traffic a
> certain service or application in your system has caused, you don't want
> to match all its ports to firewall rules just in order to get that
> information. Instead, you can now derive that information on a
> per-application base. With this patch set, this even works just fine for
> multicast listeners, which is something that is currently impossible to
> achieve otherwise.

> > So the only 'benefit' is that netcls id is available; but
> > a) why is that even needed and
> 
> It's currently the only way of realizing application-level firewalls,
> and it'd be an awesome feature if it actually worked.

Application level firewall makes me think of proxies.
What exactly are we talking about?
Can you provide examples?

> > b) is such a huge sledgehammer just for net cgroup accounting
> > worth it?
> 
> I really don't know if this approach is intrusive enough to make it
> qualify as sledgehammer. I'd like to see some real-world benchmarks and
> have proof there is a performance decrease for setups that don't use
> such chains.

We have static keys for nf hooks so there is no performance decrease if
no netfilter hooks are active.

HOWEVER, this discussion seems to hint at the opposite, namely that this
will turn into a mandatory or at least auto-enabled feature.

But regardless, I called it a sledgehammer not because of performance
concerns but because just to get this proposed socket matching facility we
now have to put hooks into all protocol handlers and commit to keeping them
there forever.

Also, some time ago Eric suggested to allow binding to tcp ports
from packet sockets to implement userspace-driven TCP stacks (i.e.
kernel would just queue raw frames for the given quadruple withput
caring about ordering etc).

If we allow this proposed change, we'd also have to extend that
with these hooks.

Next time someone mentions that we don't have application matching
but only sk/cgroup tests. Are we then going to consider doing task lookups?

> > For deterministic ingress filtering you can only rely on what
> > is contained in the packet.
> 
> Why so? For deterministic ingress filtering of traffic directed to a
> local socket, you can as well rely on information associated with that
> socket. And this is what application-level firewall rule sets are all about.

Application not running -> different behaviour.  I'm sure thats obvious
to you, but I'm not sure if its obvious to users that their
'match on net clsid 42' won't work anymore when that service isn't running.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-09-30 21:48       ` Florian Westphal
@ 2015-10-01  9:04         ` Daniel Mack
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Mack @ 2015-10-01  9:04 UTC (permalink / raw)
  To: Florian Westphal; +Cc: pablo, daniel, netfilter-devel, netdev, balazs.scheidler

Hi Florian,

On 09/30/2015 11:48 PM, Florian Westphal wrote:
> Daniel Mack <daniel@zonque.org> wrote:
>> Of course you can drop certain packets at this point, depending on other
>> details. Say, for instance, you want to match all packets that are
>> received by a certain task and that are originated from IP addresses of
>> a specific subnet, and drop the rest. Rather than adding matches to your
>> global firewall configuration for all the ports that tasks may or may
>> not listen on, you can just do it on a higher level, from the
>> perspective of an administrator. If you decide to let your web server
>> listen on another port as well, no firewall rule configuration change is
>> needed at all.
> 
> We don't have per application configurations, not even with these
> patches...?
> The configuration is still global/per netns?

The configuration is per net_cls ID, and with a cgroup manager that
assigns unique IDs to each application/service/task, this effectively
becomes a per-application setting (or anything else for that matter, see
below).

> Application level firewall makes me think of proxies.
> What exactly are we talking about?
> Can you provide examples?

Ok, so the term is misleading, I agree. I am talking about a possibility
for userspace to install netfilter rules that match packets sent to or
sent by a task, or a set of tasks (you may call that an application or
service).

So for example, I want a way to redirect netfilter decisions that are
about tasks in net_cls cgroup 23 to a dedicated chain, so the admin can
configure rules that only apply to a certain service or application.

>> I really don't know if this approach is intrusive enough to make it
>> qualify as sledgehammer. I'd like to see some real-world benchmarks and
>> have proof there is a performance decrease for setups that don't use
>> such chains.
> 
> We have static keys for nf hooks so there is no performance decrease if
> no netfilter hooks are active.
> 
> HOWEVER, this discussion seems to hint at the opposite, namely that this
> will turn into a mandatory or at least auto-enabled feature.
>
> But regardless, I called it a sledgehammer not because of performance
> concerns but because just to get this proposed socket matching facility we
> now have to put hooks into all protocol handlers and commit to keeping them
> there forever.

Sure, but why is that a problem exactly? I still don't understand your
technical concerns.

> Also, some time ago Eric suggested to allow binding to tcp ports
> from packet sockets to implement userspace-driven TCP stacks (i.e.
> kernel would just queue raw frames for the given quadruple withput
> caring about ordering etc).
>
> If we allow this proposed change, we'd also have to extend that
> with these hooks.

In that case, the receiver socket would be the one of that
userspace-driven TCP stack, and the same rules apply. The feature might
not make much sense in such setups though, but that's not an issue.

> Next time someone mentions that we don't have application matching
> but only sk/cgroup tests. Are we then going to consider doing task lookups?

cgroups are the most comprehensive notion the kernel has about a group
of tasks, which may be called a service or an application. By allowing
users to make netfilter rules specific to a setting in a cgroup
controller, we open the doors for supporting a lot of interesting use
cases. Which tasks to put into a cgroup with a certain net_cls setting
is up to userspace, but it would be possible to group everything the OS
considers an 'application'.

>> Why so? For deterministic ingress filtering of traffic directed to a
>> local socket, you can as well rely on information associated with that
>> socket. And this is what application-level firewall rule sets are all about.
> 
> Application not running -> different behaviour.  I'm sure thats obvious
> to you, but I'm not sure if its obvious to users that their
> 'match on net clsid 42' won't work anymore when that service isn't running.

That's a matter of explaining that fact in the documentation, I'd say :)

In a sense, what happens is comparable to rules that are specific to an
ingress interface. Depending on whether that interface is up or down,
these netfilter rules are iterated or they aren't, right?

Anyway, note that socket uid/gid and cgroup matches already exist for
ingress today, so the exact same change in behavior that you describe is
in place since a number of releases. It's just that right now, what the
kernel does is to only match packets that had early demux treatment,
which makes it largely unusable for users. So, what this patch set aims
for is to make an existing feature actually usable.

I'm open to other approaches, as long as it allows us to install
netfilter rules that match against net_cls IDs. The demux target would
probably be another option, but I'm not convinced yet that would be any
better from a technical perspective.

Pablo, what do you think?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-09-30  7:24     ` Daniel Mack
  2015-09-30  7:40       ` Jan Engelhardt
  2015-09-30 21:48       ` Florian Westphal
@ 2015-10-01 17:13       ` Marcelo Ricardo Leitner
  2015-10-01 21:07         ` Daniel Mack
  2 siblings, 1 reply; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-10-01 17:13 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Florian Westphal, pablo, daniel, netfilter-devel, netdev,
	balazs.scheidler

On Wed, Sep 30, 2015 at 09:24:21AM +0200, Daniel Mack wrote:
> On 09/29/2015 11:19 PM, Florian Westphal wrote:
> > Daniel Mack <daniel@zonque.org> wrote:
> >> Add a new chain type NF_INET_LOCAL_SOCKET_IN which is ran after the
> >> input demux is complete and the final destination socket (if any)
> >> has been determined.
> >>
> >> This helps filtering packets based on information stored in the
> >> destination socket, such as cgroup controller supplied net class IDs.
> > 
> > This still seems like the 'x y' problem ("want to do X, think Y is
> > correct solution; ask about Y, but thats a strange thing to do").
> > 
> > There is nothing that this offers over INPUT *except* that sk is
> > available.  But there is zero benefit as far as I am concerned --
> > why would you want to do any meaningful filtering based on the sk at
> > that point...?
> 
> Well, INPUT and SOCKET_INPUT are just two different tools that help
> solve different classes of problems. INPUT is for filtering all local
> traffic while SOCKET_INPUT is just for such that actually has a
> listener, and they both make sense in different scenarios.

How is it better than -m socket ? It's used with tproxy, but not only,
and works quite well, thought it only supports TCP and UDP.

Something like
  iptables -N INPUT_SOCKET
  iptables -I INPUT -m socket -j INPUT_SOCKET
would achieve similar results, if I got you right.

-m socket implies in a double-lookup for the socket, yes, but that
sounds a reasonable price to pay for this while not inserting another
hook. I know of deployments using -m socket for tproxy and handling very
high rates, performance has not been a problem..

> > Drop?  Makes no sense, else application would not be running in the first
> > place.
> 
> Of course you can drop certain packets at this point, depending on other
> details. Say, for instance, you want to match all packets that are
> received by a certain task and that are originated from IP addresses of
> a specific subnet, and drop the rest. Rather than adding matches to your
> global firewall configuration for all the ports that tasks may or may
> not listen on, you can just do it on a higher level, from the
> perspective of an administrator. If you decide to let your web server
> listen on another port as well, no firewall rule configuration change is
> needed at all.
> 
> Another use case is accounting. If you want to know how much traffic a
> certain service or application in your system has caused, you don't want
> to match all its ports to firewall rules just in order to get that
> information. Instead, you can now derive that information on a
> per-application base. With this patch set, this even works just fine for
> multicast listeners, which is something that is currently impossible to
> achieve otherwise.

Sounds like you want a boosted socket match, perhaps borrowing some of
the features from owner match.

  Marcelo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-10-01 17:13       ` Marcelo Ricardo Leitner
@ 2015-10-01 21:07         ` Daniel Mack
  2015-10-01 21:34           ` Marcelo Ricardo Leitner
  2015-10-02 11:07           ` Pablo Neira Ayuso
  0 siblings, 2 replies; 21+ messages in thread
From: Daniel Mack @ 2015-10-01 21:07 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Florian Westphal, pablo, daniel, netfilter-devel, netdev,
	balazs.scheidler

On 10/01/2015 07:13 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Sep 30, 2015 at 09:24:21AM +0200, Daniel Mack wrote:
>> On 09/29/2015 11:19 PM, Florian Westphal wrote:
>>> Daniel Mack <daniel@zonque.org> wrote:
>>>> Add a new chain type NF_INET_LOCAL_SOCKET_IN which is ran after the
>>>> input demux is complete and the final destination socket (if any)
>>>> has been determined.
>>>>
>>>> This helps filtering packets based on information stored in the
>>>> destination socket, such as cgroup controller supplied net class IDs.
>>>
>>> This still seems like the 'x y' problem ("want to do X, think Y is
>>> correct solution; ask about Y, but thats a strange thing to do").
>>>
>>> There is nothing that this offers over INPUT *except* that sk is
>>> available.  But there is zero benefit as far as I am concerned --
>>> why would you want to do any meaningful filtering based on the sk at
>>> that point...?
>>
>> Well, INPUT and SOCKET_INPUT are just two different tools that help
>> solve different classes of problems. INPUT is for filtering all local
>> traffic while SOCKET_INPUT is just for such that actually has a
>> listener, and they both make sense in different scenarios.
> 
> How is it better than -m socket ? It's used with tproxy, but not only,
> and works quite well, thought it only supports TCP and UDP.

Yes, but not multicast.

> Something like
>   iptables -N INPUT_SOCKET
>   iptables -I INPUT -m socket -j INPUT_SOCKET
> would achieve similar results, if I got you right.
> 
> -m socket implies in a double-lookup for the socket, yes, but that
> sounds a reasonable price to pay for this while not inserting another
> hook. I know of deployments using -m socket for tproxy and handling very
> high rates, performance has not been a problem..

I know, and my primary attempt to get this fixed was to factor out the
early demux code from the socket matching code and make it available to
the cgroup matcher as well:


http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/58054

That, however, got rejected because it doesn't work for multicast. This
patch set implements one of the things Pablo suggested in his reply.


Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-10-01 21:07         ` Daniel Mack
@ 2015-10-01 21:34           ` Marcelo Ricardo Leitner
  2015-10-02 11:07           ` Pablo Neira Ayuso
  1 sibling, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-10-01 21:34 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Florian Westphal, pablo, daniel, netfilter-devel, netdev,
	balazs.scheidler

On Thu, Oct 01, 2015 at 11:07:30PM +0200, Daniel Mack wrote:
> On 10/01/2015 07:13 PM, Marcelo Ricardo Leitner wrote:
> > On Wed, Sep 30, 2015 at 09:24:21AM +0200, Daniel Mack wrote:
> >> On 09/29/2015 11:19 PM, Florian Westphal wrote:
> >>> Daniel Mack <daniel@zonque.org> wrote:
> >>>> Add a new chain type NF_INET_LOCAL_SOCKET_IN which is ran after the
> >>>> input demux is complete and the final destination socket (if any)
> >>>> has been determined.
> >>>>
> >>>> This helps filtering packets based on information stored in the
> >>>> destination socket, such as cgroup controller supplied net class IDs.
> >>>
> >>> This still seems like the 'x y' problem ("want to do X, think Y is
> >>> correct solution; ask about Y, but thats a strange thing to do").
> >>>
> >>> There is nothing that this offers over INPUT *except* that sk is
> >>> available.  But there is zero benefit as far as I am concerned --
> >>> why would you want to do any meaningful filtering based on the sk at
> >>> that point...?
> >>
> >> Well, INPUT and SOCKET_INPUT are just two different tools that help
> >> solve different classes of problems. INPUT is for filtering all local
> >> traffic while SOCKET_INPUT is just for such that actually has a
> >> listener, and they both make sense in different scenarios.
> > 
> > How is it better than -m socket ? It's used with tproxy, but not only,
> > and works quite well, thought it only supports TCP and UDP.
> 
> Yes, but not multicast.

Right

> > Something like
> >   iptables -N INPUT_SOCKET
> >   iptables -I INPUT -m socket -j INPUT_SOCKET
> > would achieve similar results, if I got you right.
> > 
> > -m socket implies in a double-lookup for the socket, yes, but that
> > sounds a reasonable price to pay for this while not inserting another
> > hook. I know of deployments using -m socket for tproxy and handling very
> > high rates, performance has not been a problem..
> 
> I know, and my primary attempt to get this fixed was to factor out the
> early demux code from the socket matching code and make it available to
> the cgroup matcher as well:
> 
> 
> http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/58054
> 
> That, however, got rejected because it doesn't work for multicast. This
> patch set implements one of the things Pablo suggested in his reply.

Ok, thanks for the info. Makes sense, hmm.

  Marcelo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-10-01 21:07         ` Daniel Mack
  2015-10-01 21:34           ` Marcelo Ricardo Leitner
@ 2015-10-02 11:07           ` Pablo Neira Ayuso
  2015-10-02 13:52             ` Daniel Mack
  1 sibling, 1 reply; 21+ messages in thread
From: Pablo Neira Ayuso @ 2015-10-02 11:07 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Marcelo Ricardo Leitner, Florian Westphal, daniel,
	netfilter-devel, netdev, balazs.scheidler

On Thu, Oct 01, 2015 at 11:07:30PM +0200, Daniel Mack wrote:
[...]
> That, however, got rejected because it doesn't work for multicast. This
> patch set implements one of the things Pablo suggested in his reply.

People are rising valid concerns here, so far we got a RFC where you
say that you don't have a proper setup to validate performance impact.

>From the locking front, you indicated that there are possible problems
in this RFC, although you claim those can be fixed.

No examples on how you will use this are shown, which has triggered
questions on how you plan to use this. Only one use-case that has been
described in natural language.

Rergading inconsistent behaviour when no process are listening, your
argument is that "that can be documented".

Frankly, I would expect you do the work from your side to justify the
inclusion of this, and that requires that your cover open fronts from
the technical side, not just arguing.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type
  2015-10-02 11:07           ` Pablo Neira Ayuso
@ 2015-10-02 13:52             ` Daniel Mack
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Mack @ 2015-10-02 13:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Marcelo Ricardo Leitner, Florian Westphal, daniel,
	netfilter-devel, netdev, balazs.scheidler

On 10/02/2015 01:07 PM, Pablo Neira Ayuso wrote:
> On Thu, Oct 01, 2015 at 11:07:30PM +0200, Daniel Mack wrote:
> [...]
>> That, however, got rejected because it doesn't work for multicast. This
>> patch set implements one of the things Pablo suggested in his reply.
> 
> People are rising valid concerns here, so far we got a RFC where you
> say that you don't have a proper setup to validate performance impact.
> 
> From the locking front, you indicated that there are possible problems
> in this RFC, although you claim those can be fixed.
> 
> No examples on how you will use this are shown, which has triggered
> questions on how you plan to use this. Only one use-case that has been
> described in natural language.
> 
> Rergading inconsistent behaviour when no process are listening, your
> argument is that "that can be documented".
> 
> Frankly, I would expect you do the work from your side to justify the
> inclusion of this, and that requires that your cover open fronts from
> the technical side, not just arguing.

Sure, I'm willing to do this of course. The purpose of this RFC was only
to outline where this approach would go, and to allow discussions about
possible show stoppers.

I'll respin this in a new series then.



Thanks,
Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-10-02 13:52 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-29 11:12 [PATCH RFC 0/7] netfilter: introduce new chain type for local socket input Daniel Mack
2015-09-29 11:12 ` [PATCH RFC 1/7] netfilter: add socket to struct nft_pktinfo Daniel Mack
2015-09-29 18:25   ` Eric W. Biederman
2015-09-29 11:12 ` [PATCH RFC 2/7] netfilter: nft_meta: look at pkt->sk rather than skb->sk Daniel Mack
2015-09-29 13:37   ` kbuild test robot
2015-09-29 11:12 ` [PATCH RFC 3/7] netfilter: add NF_INET_LOCAL_SOCKET_IN chain type Daniel Mack
2015-09-29 21:19   ` Florian Westphal
2015-09-30  7:24     ` Daniel Mack
2015-09-30  7:40       ` Jan Engelhardt
2015-09-30  8:54         ` Daniel Mack
2015-09-30 21:48       ` Florian Westphal
2015-10-01  9:04         ` Daniel Mack
2015-10-01 17:13       ` Marcelo Ricardo Leitner
2015-10-01 21:07         ` Daniel Mack
2015-10-01 21:34           ` Marcelo Ricardo Leitner
2015-10-02 11:07           ` Pablo Neira Ayuso
2015-10-02 13:52             ` Daniel Mack
2015-09-29 11:12 ` [PATCH RFC 4/7] net: tcp_ipv4, udp_ipv4: hook up LOCAL_SOCKET_IN netfilter chains Daniel Mack
2015-09-29 11:12 ` [PATCH RFC 5/7] net: tcp_ipv6, udp_ipv6: " Daniel Mack
2015-09-29 11:12 ` [PATCH RFC 6/7] net: sctp: " Daniel Mack
2015-09-29 11:12 ` [PATCH RFC 7/7] net: dccp: " Daniel Mack

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.