* [PATCH net-next 0/2] vrf: rework interaction with netfilter/conntrack
@ 2021-10-21 14:48 Florian Westphal
  2021-10-21 14:48 ` [PATCH net-next 1/2] netfilter: conntrack: skip confirmation and nat hooks in postrouting for vrf Florian Westphal
  2021-10-21 14:48 ` [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets Florian Westphal
  0 siblings, 2 replies; 7+ messages in thread
From: Florian Westphal @ 2021-10-21 14:48 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, dsahern, pablo, crosser, lschlesinger, Florian Westphal

This patch series aims to solve the problem addressed by the to-be-reverted
change 09e856d54bda5f288e ("vrf: Reset skb conntrack connection on VRF rcv")
in a different way.

Rather than have skbs pass through conntrack and nat hooks twice, suppress
conntrack invocation if the conntrack/nat hook is called from the vrf driver.

The first patch deals with the 'incoming connection' case:
1. suppress NAT transformations
2. skip conntrack confirmation

NAT and conntrack confirmation are done when the ip/ipv6 stack calls
the postrouting hook.
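A minimal userspace sketch of the first patch's logic, assuming simplified stand-ins: every `mock_` name below is hypothetical and only mirrors the shape of `nf_hook_state`, `netif_is_l3_master()` and the early return added to `ipv4_confirm()`/`ipv6_confirm()`. It shows confirmation being deferred while the VRF driver runs POSTROUTING, then happening in the IP stack's own pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; only the fields used here. */
enum mock_hook { MOCK_LOCAL_OUT, MOCK_POST_ROUTING };
enum mock_verdict { MOCK_ACCEPT = 1 };

struct mock_dev {
	int ifindex;
	bool is_l3_master;	/* true for a VRF device */
};

struct mock_hook_state {
	enum mock_hook hook;
	const struct mock_dev *out;
};

struct mock_conn {
	bool confirmed;
};

/* Mirrors in_vrf_postrouting(): POSTROUTING invoked with a VRF as output. */
static bool mock_in_vrf_postrouting(const struct mock_hook_state *state)
{
	assert(state != NULL && state->out != NULL);
	return state->hook == MOCK_POST_ROUTING && state->out->is_l3_master;
}

/* Confirmation is skipped while the VRF driver runs the hooks; the later
 * POSTROUTING pass issued by the IP stack (out = lower device) does it. */
static enum mock_verdict mock_confirm(struct mock_conn *ct,
				      const struct mock_hook_state *state)
{
	if (!mock_in_vrf_postrouting(state))
		ct->confirmed = true;
	return MOCK_ACCEPT;
}
```

In this model the first call, with the VRF as output device, accepts the packet without confirming it; the second call, with the lower device, confirms.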

The second patch deals with locally generated packets:
in the vrf driver, mark the skbs as 'untracked' so the conntrack output
hook ignores them.  This skips all nat hooks as well.

Afterwards, remove the untracked state again so the second
round will pick them up.
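The untracked-tag round trip can be modelled in userspace as below. This assumes a simplified `mock_skb` whose `nfct` word combines a conntrack pointer with the ctinfo in its low bits (as `skb->_nfct` does); `MOCK_CT_UNTRACKED` stands in for `IP_CT_UNTRACKED`, and the helpers are hypothetical analogues of the two vrf helpers, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of skb->_nfct: conntrack pointer | ctinfo. */
#define MOCK_CT_UNTRACKED 7UL	/* stands in for IP_CT_UNTRACKED */

struct mock_skb {
	uintptr_t nfct;
};

static void mock_ct_set(struct mock_skb *skb, const void *ct,
			unsigned long info)
{
	/* the low bits of the pointer must be free to hold ctinfo */
	assert(((uintptr_t)ct & 7) == 0);
	skb->nfct = (uintptr_t)ct | info;
}

/* Like vrf_nf_set_untracked(): only tag skbs with no conntrack state. */
static void mock_set_untracked(struct mock_skb *skb)
{
	if (skb->nfct == 0)
		mock_ct_set(skb, NULL, MOCK_CT_UNTRACKED);
}

/* Like vrf_nf_reset_ct(): only clear the bare 'untracked' tag. */
static void mock_reset_ct(struct mock_skb *skb)
{
	if (skb->nfct == MOCK_CT_UNTRACKED)
		skb->nfct = 0;
}
```

In this model an untracked skb carries a NULL pointer, so its whole `nfct` word equals the ctinfo value; a single equality test in the reset helper therefore leaves any skb with a real conntrack pointer alone.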

One alternative to the chosen implementation would be to add a 'caller
id' field to 'struct nf_hook_state' and use that; these patches instead
use the more straightforward check of the VRF flag on the state->out device.
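For comparison, a hypothetical sketch of the 'caller id' alternative. Neither the `caller` field nor the enum below exists in `struct nf_hook_state`; they are illustrative names only, and the patches do not take this route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: illustrative names, not kernel API. */
enum mock_hook_caller {
	MOCK_CALLER_IP_STACK,
	MOCK_CALLER_VRF,
};

struct mock_hook_state {
	int hook;
	/* would have to be filled in by every hook invocation */
	enum mock_hook_caller caller;
};

/* conntrack/nat would test the caller instead of the output device. */
static bool mock_called_from_vrf(const struct mock_hook_state *state)
{
	assert(state != NULL);
	return state->caller == MOCK_CALLER_VRF;
}
```

The check itself is trivial; the cost lies in getting every hook invocation to populate the field.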

The two patches apply to both net and net-next.  I am targeting net-next
because SNAT has not worked correctly for so long that I think we can
take the longer route.  If you disagree, apply to net at your
discretion.

The patches apply both with 09e856d54bda5f288e reverted and with it still
in place, but ingress conntrack settings (zone, notrack, etc.) only start
working again once the revert is in place.

I've already submitted selftests for vrf+nfqueue and conntrack+vrf.

Florian Westphal (2):
  netfilter: conntrack: skip confirmation and nat hooks in postrouting
    for vrf
  vrf: run conntrack only in context of lower/physdev for locally
    generated packets

 drivers/net/vrf.c                  | 28 ++++++++++++++++++++++++----
 net/netfilter/nf_conntrack_proto.c | 16 ++++++++++++++++
 net/netfilter/nf_nat_core.c        | 12 +++++++++++-
 3 files changed, 51 insertions(+), 5 deletions(-)

-- 
2.32.0



* [PATCH net-next 1/2] netfilter: conntrack: skip confirmation and nat hooks in postrouting for vrf
  2021-10-21 14:48 [PATCH net-next 0/2] vrf: rework interaction with netfilter/conntrack Florian Westphal
@ 2021-10-21 14:48 ` Florian Westphal
  2021-10-21 14:48 ` [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets Florian Westphal
  1 sibling, 0 replies; 7+ messages in thread
From: Florian Westphal @ 2021-10-21 14:48 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, dsahern, pablo, crosser, lschlesinger, Florian Westphal

The VRF driver invokes netfilter for output+postrouting hooks so that users
can create rules that check for 'oif $vrf' rather than lower device name.

Afterwards, ip stack calls those hooks again.

This is a problem when conntrack is used with IP masquerading.
Masquerading has an internal check that re-validates the output
interface to account for route changes.

This check will trigger in the vrf case.

If the -j MASQUERADE rule matched on the first iteration, then round 2
finds state->out->ifindex != nat->masq_index: the latter is the vrf
index, but out->ifindex is the lower device.

The packet gets dropped and the conntrack entry is invalidated.
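A userspace model of the failure, assuming a simplified `mock_nat` in which `masq_index` plays the role of `nat->masq_index` (0 meaning no binding yet). It is not the kernel's masquerade code, only the re-check described above: round 1 binds to the VRF ifindex, and round 2 arrives with the lower device's ifindex and is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
enum mock_verdict { MOCK_DROP = 0, MOCK_ACCEPT = 1 };

struct mock_nat {
	int masq_index;		/* 0: no masquerade binding yet */
	bool ct_dead;		/* conntrack entry invalidated */
};

/* One POSTROUTING pass through the masquerade logic. */
static enum mock_verdict mock_masq_postrouting(struct mock_nat *nat,
					       int out_ifindex)
{
	assert(out_ifindex > 0);
	if (nat->masq_index == 0) {
		/* -j MASQUERADE matched: bind to this output interface */
		nat->masq_index = out_ifindex;
		return MOCK_ACCEPT;
	}
	if (nat->masq_index != out_ifindex) {
		/* looks like a route change: invalidate and drop */
		nat->ct_dead = true;
		return MOCK_DROP;
	}
	return MOCK_ACCEPT;
}
```

Calling it with the VRF's ifindex and then with the lower device's ifindex reproduces the drop.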

This change makes the nat hooks return early in postrouting when the
output device is a vrf, and also skips confirmation there.  This allows
the second round (the postrouting invocation from ipv4/ipv6) to create
the nat bindings.

This also prevents the second round from seeing packets whose source
address was already changed by the nat hook.
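The same kind of model with this patch's early return applied, again using hypothetical `mock_` types rather than kernel code: POSTROUTING with a VRF output device leaves NAT untouched, so the binding is created in the IP stack's pass against the lower device and later packets pass the re-check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code. */
struct mock_dev {
	int ifindex;
	bool is_l3_master;	/* true for a VRF device */
};

struct mock_nat {
	int masq_index;		/* 0: no binding yet */
};

/* Returns true if the packet is accepted. */
static bool mock_nat_postrouting(struct mock_nat *nat,
				 const struct mock_dev *out)
{
	assert(out != NULL && out->ifindex > 0);
	if (out->is_l3_master)
		return true;			/* VRF round: no NAT at all */
	if (nat->masq_index == 0)
		nat->masq_index = out->ifindex;	/* bind to the lower device */
	return nat->masq_index == out->ifindex;	/* oif re-check */
}
```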

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_proto.c | 16 ++++++++++++++++
 net/netfilter/nf_nat_core.c        | 12 +++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index 8f7a9837349c..d1f2d3c8d2b1 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -155,6 +155,16 @@ unsigned int nf_confirm(struct sk_buff *skb, unsigned int protoff,
 }
 EXPORT_SYMBOL_GPL(nf_confirm);
 
+static bool in_vrf_postrouting(const struct nf_hook_state *state)
+{
+#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
+	if (state->hook == NF_INET_POST_ROUTING &&
+	    netif_is_l3_master(state->out))
+		return true;
+#endif
+	return false;
+}
+
 static unsigned int ipv4_confirm(void *priv,
 				 struct sk_buff *skb,
 				 const struct nf_hook_state *state)
@@ -166,6 +176,9 @@ static unsigned int ipv4_confirm(void *priv,
 	if (!ct || ctinfo == IP_CT_RELATED_REPLY)
 		return nf_conntrack_confirm(skb);
 
+	if (in_vrf_postrouting(state))
+		return NF_ACCEPT;
+
 	return nf_confirm(skb,
 			  skb_network_offset(skb) + ip_hdrlen(skb),
 			  ct, ctinfo);
@@ -374,6 +387,9 @@ static unsigned int ipv6_confirm(void *priv,
 	if (!ct || ctinfo == IP_CT_RELATED_REPLY)
 		return nf_conntrack_confirm(skb);
 
+	if (in_vrf_postrouting(state))
+		return NF_ACCEPT;
+
 	protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &pnum,
 				   &frag_off);
 	if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 273117683922..4d50d51db796 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -699,6 +699,16 @@ unsigned int nf_nat_packet(struct nf_conn *ct,
 }
 EXPORT_SYMBOL_GPL(nf_nat_packet);
 
+static bool in_vrf_postrouting(const struct nf_hook_state *state)
+{
+#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
+	if (state->hook == NF_INET_POST_ROUTING &&
+	    netif_is_l3_master(state->out))
+		return true;
+#endif
+	return false;
+}
+
 unsigned int
 nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 	       const struct nf_hook_state *state)
@@ -715,7 +725,7 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 	 * packet filter it out, or implement conntrack/NAT for that
 	 * protocol. 8) --RR
 	 */
-	if (!ct)
+	if (!ct || in_vrf_postrouting(state))
 		return NF_ACCEPT;
 
 	nat = nfct_nat(ct);
-- 
2.32.0



* [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets
  2021-10-21 14:48 [PATCH net-next 0/2] vrf: rework interaction with netfilter/conntrack Florian Westphal
  2021-10-21 14:48 ` [PATCH net-next 1/2] netfilter: conntrack: skip confirmation and nat hooks in postrouting for vrf Florian Westphal
@ 2021-10-21 14:48 ` Florian Westphal
  2021-10-21 22:25   ` Jakub Kicinski
  2021-10-21 23:03   ` Eugene Crosser
  1 sibling, 2 replies; 7+ messages in thread
From: Florian Westphal @ 2021-10-21 14:48 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, dsahern, pablo, crosser, lschlesinger, Florian Westphal

The VRF driver invokes netfilter for output+postrouting hooks so that users
can create rules that check for 'oif $vrf' rather than lower device name.

This is a problem when NAT rules are configured.

To avoid any conntrack involvement in round 1, tag skbs as 'untracked'
to prevent conntrack from picking them up.

This gets cleared before the packet gets handed to the ip stack so
conntrack will be active on the second iteration.

For ingress, conntrack has already been done before the packet reaches
the vrf driver.  With this patch, egress also does connection tracking
in the context of the lower/physical device.
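A sketch of the resulting egress sequence, under simplifying assumptions: hypothetical `mock_` names, ctinfo in the low bits of `nfct`, and an integer standing in for a conntrack pointer. The mock conntrack hook ignores any skb that already carries state, which is how this model represents conntrack passing untracked packets through; only the lower-device round creates an entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
#define MOCK_CT_UNTRACKED 7UL	/* stands in for IP_CT_UNTRACKED */
#define MOCK_CT_NEW 2UL		/* stands in for IP_CT_NEW */

struct mock_skb {
	uintptr_t nfct;
	int ct_lookups;		/* how many times conntrack processed it */
};

/* Mock conntrack LOCAL_OUT hook: skbs that already carry state
 * (including the bare untracked tag) pass through untouched. */
static void mock_conntrack_out(struct mock_skb *skb, uintptr_t new_ct)
{
	if (skb->nfct != 0)
		return;
	skb->nfct = new_ct | MOCK_CT_NEW;
	skb->ct_lookups++;
}

/* Round 1 (VRF device) followed by round 2 (lower device). */
static void mock_vrf_egress(struct mock_skb *skb, uintptr_t new_ct)
{
	assert((new_ct & 7) == 0);	/* low bits are reserved for ctinfo */

	if (skb->nfct == 0)		/* like vrf_nf_set_untracked() */
		skb->nfct = MOCK_CT_UNTRACKED;
	mock_conntrack_out(skb, new_ct);	/* VRF round: ignored */

	if (skb->nfct == MOCK_CT_UNTRACKED)	/* like vrf_nf_reset_ct() */
		skb->nfct = 0;
	mock_conntrack_out(skb, new_ct);	/* lower-device round: tracked */
}
```

Calling `mock_vrf_egress(&skb, 0x1000)` on a fresh `skb` leaves `ct_lookups` at 1: the VRF round is skipped and the lower-device round tracks the packet.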

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 drivers/net/vrf.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index bf2fac913942..c813d03159bf 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -35,6 +35,7 @@
 #include <net/l3mdev.h>
 #include <net/fib_rules.h>
 #include <net/netns/generic.h>
+#include <net/netfilter/nf_conntrack.h>
 
 #define DRV_NAME	"vrf"
 #define DRV_VERSION	"1.1"
@@ -424,12 +425,26 @@ static int vrf_local_xmit(struct sk_buff *skb, struct net_device *dev,
 	return NETDEV_TX_OK;
 }
 
+static void vrf_nf_set_untracked(struct sk_buff *skb)
+{
+	if (skb_get_nfct(skb) == 0)
+		nf_ct_set(skb, 0, IP_CT_UNTRACKED);
+}
+
+static void vrf_nf_reset_ct(struct sk_buff *skb)
+{
+	if (skb_get_nfct(skb) == IP_CT_UNTRACKED)
+		nf_reset_ct(skb);
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static int vrf_ip6_local_out(struct net *net, struct sock *sk,
 			     struct sk_buff *skb)
 {
 	int err;
 
+	vrf_nf_reset_ct(skb);
+
 	err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net,
 		      sk, skb, NULL, skb_dst(skb)->dev, dst_output);
 
@@ -508,6 +523,8 @@ static int vrf_ip_local_out(struct net *net, struct sock *sk,
 {
 	int err;
 
+	vrf_nf_reset_ct(skb);
+
 	err = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk,
 		      skb, NULL, skb_dst(skb)->dev, dst_output);
 	if (likely(err == 1))
@@ -626,8 +643,7 @@ static void vrf_finish_direct(struct sk_buff *skb)
 		skb_pull(skb, ETH_HLEN);
 	}
 
-	/* reset skb device */
-	nf_reset_ct(skb);
+	vrf_nf_reset_ct(skb);
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
@@ -641,7 +657,7 @@ static int vrf_finish_output6(struct net *net, struct sock *sk,
 	struct neighbour *neigh;
 	int ret;
 
-	nf_reset_ct(skb);
+	vrf_nf_reset_ct(skb);
 
 	skb->protocol = htons(ETH_P_IPV6);
 	skb->dev = dev;
@@ -752,6 +768,8 @@ static struct sk_buff *vrf_ip6_out_direct(struct net_device *vrf_dev,
 
 	skb->dev = vrf_dev;
 
+	vrf_nf_set_untracked(skb);
+
 	err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk,
 		      skb, NULL, vrf_dev, vrf_ip6_out_direct_finish);
 
@@ -858,7 +876,7 @@ static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
 	struct neighbour *neigh;
 	bool is_v6gw = false;
 
-	nf_reset_ct(skb);
+	vrf_nf_reset_ct(skb);
 
 	/* Be paranoid, rather than too clever. */
 	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
@@ -980,6 +998,8 @@ static struct sk_buff *vrf_ip_out_direct(struct net_device *vrf_dev,
 
 	skb->dev = vrf_dev;
 
+	vrf_nf_set_untracked(skb);
+
 	err = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk,
 		      skb, NULL, vrf_dev, vrf_ip_out_direct_finish);
 
-- 
2.32.0



* Re: [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets
  2021-10-21 14:48 ` [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets Florian Westphal
@ 2021-10-21 22:25   ` Jakub Kicinski
  2021-10-21 23:03   ` Eugene Crosser
  1 sibling, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2021-10-21 22:25 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, netfilter-devel, dsahern, pablo, crosser, lschlesinger

On Thu, 21 Oct 2021 16:48:57 +0200 Florian Westphal wrote:
> +		nf_ct_set(skb, 0, IP_CT_UNTRACKED);

drivers/net/vrf.c:431:32: warning: Using plain integer as NULL pointer
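sparse appears to be pointing at the literal `0` passed as the conntrack pointer argument. A userspace sketch of the straightforward fix, assuming a hypothetical `mock_ct_set()` with the same parameter shape as `nf_ct_set(skb, ct, info)`: pass `NULL` for the pointer. The `== 0` test on the `skb_get_nfct()` result compares an `unsigned long`, so it should not be affected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
#define MOCK_CT_UNTRACKED 7UL	/* stands in for IP_CT_UNTRACKED */

struct mock_conn;		/* opaque, like struct nf_conn */

struct mock_skb {
	uintptr_t nfct;
};

/* Same parameter shape as nf_ct_set(skb, ct, info). */
static void mock_ct_set(struct mock_skb *skb, struct mock_conn *ct,
			unsigned long info)
{
	skb->nfct = (uintptr_t)ct | info;
}

static void mock_set_untracked(struct mock_skb *skb)
{
	assert(skb != NULL);
	if (skb->nfct == 0)
		/* NULL rather than 0: the second argument is a pointer */
		mock_ct_set(skb, NULL, MOCK_CT_UNTRACKED);
}
```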


* Re: [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets
  2021-10-21 14:48 ` [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets Florian Westphal
  2021-10-21 22:25   ` Jakub Kicinski
@ 2021-10-21 23:03   ` Eugene Crosser
  2021-10-21 23:58     ` Florian Westphal
  1 sibling, 1 reply; 7+ messages in thread
From: Eugene Crosser @ 2021-10-21 23:03 UTC (permalink / raw)
  To: Florian Westphal, netdev; +Cc: netfilter-devel, dsahern, pablo, lschlesinger



On 21/10/2021 16:48, Florian Westphal wrote:
> The VRF driver invokes netfilter for output+postrouting hooks so that users
> can create rules that check for 'oif $vrf' rather than lower device name.
> 
> This is a problem when NAT rules are configured.
> 
> To avoid any conntrack involvement in round 1, tag skbs as 'untracked'
> to prevent conntrack from picking them up.
> 
> This gets cleared before the packet gets handed to the ip stack so
> conntrack will be active on the second iteration.
> 
> For ingress, conntrack has already been done before the packet makes it
> to the vrf driver, with this patch egress does connection tracking with
> lower/physical device as well.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  drivers/net/vrf.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index bf2fac913942..c813d03159bf 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -35,6 +35,7 @@
>  #include <net/l3mdev.h>
>  #include <net/fib_rules.h>
>  #include <net/netns/generic.h>
> +#include <net/netfilter/nf_conntrack.h>
>  
>  #define DRV_NAME	"vrf"
>  #define DRV_VERSION	"1.1"
> @@ -424,12 +425,26 @@ static int vrf_local_xmit(struct sk_buff *skb, struct net_device *dev,
>  	return NETDEV_TX_OK;
>  }
>  
> +static void vrf_nf_set_untracked(struct sk_buff *skb)
> +{
> +	if (skb_get_nfct(skb) == 0)
> +		nf_ct_set(skb, 0, IP_CT_UNTRACKED);
> +}
> +
> +static void vrf_nf_reset_ct(struct sk_buff *skb)
> +{
> +	if (skb_get_nfct(skb) == IP_CT_UNTRACKED)
> +		nf_reset_ct(skb);
> +}
> +

Isn't it possible that skb was marked UNTRACKED before entering this path, by a
rule? In such a case 'set_untracked' will do nothing, but 'reset_ct' will clear
UNTRACKED status that was set elsewhere. It seems wrong, am I missing something?
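The scenario can be reproduced in a small userspace model, assuming hypothetical `mock_` helpers that mirror `vrf_nf_set_untracked()`, a notrack rule evaluated in the vrf round, and `vrf_nf_reset_ct()`. Because the rule's 'untracked' tag and the driver's own tag are the same value, the reset helper cannot tell them apart and the lower-device round sees a clean skb.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
#define MOCK_CT_UNTRACKED 7UL	/* stands in for IP_CT_UNTRACKED */

struct mock_skb {
	uintptr_t nfct;
};

static void mock_set_untracked(struct mock_skb *skb)	/* vrf_nf_set_untracked() */
{
	if (skb->nfct == 0)
		skb->nfct = MOCK_CT_UNTRACKED;
}

static void mock_notrack_rule(struct mock_skb *skb)	/* e.g. a 'notrack' rule */
{
	skb->nfct = MOCK_CT_UNTRACKED;	/* resulting value is the same tag */
}

static void mock_reset_ct(struct mock_skb *skb)		/* vrf_nf_reset_ct() */
{
	if (skb->nfct == MOCK_CT_UNTRACKED)
		skb->nfct = 0;
}

/* vrf round with a notrack rule; returns what the lower-device round sees */
static uintptr_t mock_vrf_round_with_notrack(void)
{
	struct mock_skb skb = { 0 };

	mock_set_untracked(&skb);
	mock_notrack_rule(&skb);
	assert(skb.nfct == MOCK_CT_UNTRACKED);	/* indistinguishable */
	mock_reset_ct(&skb);
	return skb.nfct;	/* 0: the rule's notrack did not survive */
}
```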

>  #if IS_ENABLED(CONFIG_IPV6)
>  static int vrf_ip6_local_out(struct net *net, struct sock *sk,
>  			     struct sk_buff *skb)
>  {
>  	int err;
>  
> +	vrf_nf_reset_ct(skb);
> +
>  	err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net,
>  		      sk, skb, NULL, skb_dst(skb)->dev, dst_output);
>  
> @@ -508,6 +523,8 @@ static int vrf_ip_local_out(struct net *net, struct sock *sk,
>  {
>  	int err;
>  
> +	vrf_nf_reset_ct(skb);
> +
>  	err = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk,
>  		      skb, NULL, skb_dst(skb)->dev, dst_output);
>  	if (likely(err == 1))
> @@ -626,8 +643,7 @@ static void vrf_finish_direct(struct sk_buff *skb)
>  		skb_pull(skb, ETH_HLEN);
>  	}
>  
> -	/* reset skb device */
> -	nf_reset_ct(skb);
> +	vrf_nf_reset_ct(skb);
>  }
>  
>  #if IS_ENABLED(CONFIG_IPV6)
> @@ -641,7 +657,7 @@ static int vrf_finish_output6(struct net *net, struct sock *sk,
>  	struct neighbour *neigh;
>  	int ret;
>  
> -	nf_reset_ct(skb);
> +	vrf_nf_reset_ct(skb);
>  
>  	skb->protocol = htons(ETH_P_IPV6);
>  	skb->dev = dev;
> @@ -752,6 +768,8 @@ static struct sk_buff *vrf_ip6_out_direct(struct net_device *vrf_dev,
>  
>  	skb->dev = vrf_dev;
>  
> +	vrf_nf_set_untracked(skb);
> +
>  	err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk,
>  		      skb, NULL, vrf_dev, vrf_ip6_out_direct_finish);
>  
> @@ -858,7 +876,7 @@ static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
>  	struct neighbour *neigh;
>  	bool is_v6gw = false;
>  
> -	nf_reset_ct(skb);
> +	vrf_nf_reset_ct(skb);
>  
>  	/* Be paranoid, rather than too clever. */
>  	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
> @@ -980,6 +998,8 @@ static struct sk_buff *vrf_ip_out_direct(struct net_device *vrf_dev,
>  
>  	skb->dev = vrf_dev;
>  
> +	vrf_nf_set_untracked(skb);
> +
>  	err = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk,
>  		      skb, NULL, vrf_dev, vrf_ip_out_direct_finish);
>  
> 




* Re: [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets
  2021-10-21 23:03   ` Eugene Crosser
@ 2021-10-21 23:58     ` Florian Westphal
  2021-10-22  0:04       ` Florian Westphal
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2021-10-21 23:58 UTC (permalink / raw)
  To: Eugene Crosser
  Cc: Florian Westphal, netdev, netfilter-devel, dsahern, pablo, lschlesinger

Eugene Crosser <crosser@average.org> wrote:
> > +static void vrf_nf_set_untracked(struct sk_buff *skb)
> > +{
> > +	if (skb_get_nfct(skb) == 0)
> > +		nf_ct_set(skb, 0, IP_CT_UNTRACKED);
> > +}
> > +
> > +static void vrf_nf_reset_ct(struct sk_buff *skb)
> > +{
> > +	if (skb_get_nfct(skb) == IP_CT_UNTRACKED)
> > +		nf_reset_ct(skb);
> > +}
> > +
> 
> Isn't it possible that skb was marked UNTRACKED before entering this path, by a
> rule?

I don't think so, it should be called before any ruleset evaluation has
taken place.

> In such a case 'set_untracked' will do nothing, but 'reset_ct' will clear
> UNTRACKED status that was set elsewhere. It seems wrong, am I missing something?

No, that's the catch.  I can't find a better option.

I can add a patch to disable all of the NF_HOOK() invocations from vrf,
but that removes the ability to filter on vrf interface names.

The option to add a caller_id to the nf_hook_state struct (so conntrack/nat
can detect when they are called from the vrf hooks) needs either
copy-pasting the entire NF_HOOK* inline functions into vrf (so the 'is-vrf'
flag can be set) or yet another argument to NF_HOOK().

It also leaks even more 'is vrf' checks into conntrack.


* Re: [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets
  2021-10-21 23:58     ` Florian Westphal
@ 2021-10-22  0:04       ` Florian Westphal
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Westphal @ 2021-10-22  0:04 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Eugene Crosser, netdev, netfilter-devel, dsahern, pablo, lschlesinger

Florian Westphal <fw@strlen.de> wrote:
> Eugene Crosser <crosser@average.org> wrote:
> > In such a case 'set_untracked' will do nothing, but 'reset_ct' will clear
> > UNTRACKED status that was set elsewhere. It seems wrong, am I missing something?
> 
> No, that's the catch.  I can't find a better option.

To clarify, existing code has unconditional reset, so existing rulesets
that set 'notrack' in the first (vrf) round do not affect the second
round.

This feature/bug would remain, which sucks but I can't think of a saner
alternative.
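The difference between the old and the new reset can be sketched as two pure functions over a modelled `nfct` word (hypothetical names; `MOCK_CT_UNTRACKED` stands in for `IP_CT_UNTRACKED`). Both clear a bare untracked tag, so a notrack applied in the vrf round is dropped either way; only the new helper leaves a word that carries a real conntrack pointer untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
#define MOCK_CT_UNTRACKED 7UL	/* stands in for IP_CT_UNTRACKED */

/* Old behaviour: nf_reset_ct() unconditionally. */
static uintptr_t mock_old_reset(uintptr_t nfct)
{
	(void)nfct;
	return 0;
}

/* New behaviour: vrf_nf_reset_ct() clears only the bare untracked tag. */
static uintptr_t mock_new_reset(uintptr_t nfct)
{
	return nfct == MOCK_CT_UNTRACKED ? 0 : nfct;
}
```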



Thread overview: 7+ messages
2021-10-21 14:48 [PATCH net-next 0/2] vrf: rework interaction with netfilter/conntrack Florian Westphal
2021-10-21 14:48 ` [PATCH net-next 1/2] netfilter: conntrack: skip confirmation and nat hooks in postrouting for vrf Florian Westphal
2021-10-21 14:48 ` [PATCH net-next 2/2] vrf: run conntrack only in context of lower/physdev for locally generated packets Florian Westphal
2021-10-21 22:25   ` Jakub Kicinski
2021-10-21 23:03   ` Eugene Crosser
2021-10-21 23:58     ` Florian Westphal
2021-10-22  0:04       ` Florian Westphal
