All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 nf-next] netfilter: meta: add support for setting skb->pkttype
@ 2015-12-10 17:04 Florian Westphal
  2015-12-14 11:45 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2015-12-10 17:04 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This allows to redirect bridged packets to local machine:

ether type ip ether daddr set aa:53:08:12:34:56 meta pkttype set unicast
Without 'set unicast', ip stack discards PACKET_OTHERHOST skbs.

It is also useful to add support for a '-m cluster like' nft rule
(where switch floods packets to several nodes, and each cluster node
 node processes a subset of packets for load distribution).

Mangling is restricted to HOST/OTHER/BROAD/MULTICAST, i.e. you cannot set
skb->pkt_type to PACKET_KERNEL or change PACKET_LOOPBACK to PACKET_HOST.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes since v1:
  - allow setting pkttype to any of host/broadcast/multicast/other.

 net/netfilter/nft_meta.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 85a465b..c89f835 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -26,6 +26,8 @@
 #include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nft_meta.h>
 
+#include <uapi/linux/netfilter_bridge.h> /* NF_BR_PRE_ROUTING */
+
 void nft_meta_get_eval(const struct nft_expr *expr,
 		       struct nft_regs *regs,
 		       const struct nft_pktinfo *pkt)
@@ -190,6 +192,13 @@ err:
 }
 EXPORT_SYMBOL_GPL(nft_meta_get_eval);
 
+/* don't change or set _LOOPBACK, _USER, etc. */
+static bool pkt_type_ok(u32 p)
+{
+	return p == PACKET_HOST || p == PACKET_BROADCAST ||
+	       p == PACKET_MULTICAST || p == PACKET_OTHERHOST;
+}
+
 void nft_meta_set_eval(const struct nft_expr *expr,
 		       struct nft_regs *regs,
 		       const struct nft_pktinfo *pkt)
@@ -205,6 +214,11 @@ void nft_meta_set_eval(const struct nft_expr *expr,
 	case NFT_META_PRIORITY:
 		skb->priority = value;
 		break;
+	case NFT_META_PKTTYPE:
+		if (skb->pkt_type != value &&
+		    pkt_type_ok(value) && pkt_type_ok(skb->pkt_type))
+			skb->pkt_type = value;
+		break;
 	case NFT_META_NFTRACE:
 		skb->nf_trace = 1;
 		break;
@@ -273,6 +287,24 @@ int nft_meta_get_init(const struct nft_ctx *ctx,
 }
 EXPORT_SYMBOL_GPL(nft_meta_get_init);
 
+static int nft_meta_set_init_pkttype(const struct nft_ctx *ctx)
+{
+	unsigned int hooks;
+
+	switch (ctx->afi->family) {
+	case NFPROTO_BRIDGE:
+		hooks = 1 << NF_BR_PRE_ROUTING;
+		break;
+	case NFPROTO_NETDEV:
+		hooks = 1 << NF_NETDEV_INGRESS;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return nft_chain_validate_hooks(ctx->chain, hooks);
+}
+
 int nft_meta_set_init(const struct nft_ctx *ctx,
 		      const struct nft_expr *expr,
 		      const struct nlattr * const tb[])
@@ -290,6 +322,12 @@ int nft_meta_set_init(const struct nft_ctx *ctx,
 	case NFT_META_NFTRACE:
 		len = sizeof(u8);
 		break;
+	case NFT_META_PKTTYPE:
+		err = nft_meta_set_init_pkttype(ctx);
+		if (err)
+			return err;
+		len = sizeof(u8);
+		break;
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.4.10


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH V2 nf-next] netfilter: meta: add support for setting skb->pkttype
  2015-12-10 17:04 [PATCH V2 nf-next] netfilter: meta: add support for setting skb->pkttype Florian Westphal
@ 2015-12-14 11:45 ` Pablo Neira Ayuso
  2015-12-14 13:38   ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Pablo Neira Ayuso @ 2015-12-14 11:45 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, kaber

Cc'ing Patrick, I would like to hear him, this is bringing up again
the datatype issue that we discussed before.

On Thu, Dec 10, 2015 at 06:04:07PM +0100, Florian Westphal wrote:
> This allows to redirect bridged packets to local machine:
> 
> ether type ip ether daddr set aa:53:08:12:34:56 meta pkttype set unicast
> Without 'set unicast', ip stack discards PACKET_OTHERHOST skbs.
> 
> It is also useful to add support for a '-m cluster like' nft rule
> (where switch floods packets to several nodes, and each cluster node
>  node processes a subset of packets for load distribution).
> 
> Mangling is restricted to HOST/OTHER/BROAD/MULTICAST, i.e. you cannot set
> skb->pkt_type to PACKET_KERNEL or change PACKET_LOOPBACK to PACKET_HOST.
> @@ -190,6 +192,13 @@ err:
>  }
>  EXPORT_SYMBOL_GPL(nft_meta_get_eval);
>  
> +/* don't change or set _LOOPBACK, _USER, etc. */
> +static bool pkt_type_ok(u32 p)
> +{
> +	return p == PACKET_HOST || p == PACKET_BROADCAST ||
> +	       p == PACKET_MULTICAST || p == PACKET_OTHERHOST;
> +}

I think we should make use of the datatypes in the kernel side, ie.
add NFT_DATA_PKTTYPE. Currently we are using the raw NFT_DATA_VALUE
which doesn't semantically tell us anything on the kind of data.

The datatype will allow to perform validation from the configuration
plane on correct value that we support, otherwise return -EOPNOTSUPP.
This approach above seems a bit sloppy to me as the validation happens
from the packet path.

Then, we should enhance the validation code to make sure the
expression get the expected datatypes when passing data from one to
another.

I think we will need something similar for Tejun's cgroups2 since
PATH_MAX doesn't fit into our 128-bit register and copying 4096 bytes
for each packet is not an option through an immediate.

I already discussed this with Patrick. His concerns go in the
direction we didn't come up with a design for to optimization at
ruleset-level (eg. avoid fetching the same packet field several times
from different rules) and this validation at rule level may introduce
problems to this.

Using this datatypes will result in making the validation at (global)
ruleset level a bit more complicated. But after thinking a while on
this, such scenario will require a new 'constant' flag for the table
ruleset as making dynamic updates with an optimized ruleset would be
complicated.

I also have a patch to add a NFT_DATA_IFACE type to maintain the
mapping between index and device names from the kernel.

That makes three possible use cases of this datatype + validation at
rule level.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2 nf-next] netfilter: meta: add support for setting skb->pkttype
  2015-12-14 11:45 ` Pablo Neira Ayuso
@ 2015-12-14 13:38   ` Florian Westphal
  2015-12-14 17:26     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2015-12-14 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, kaber

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Cc'ing Patrick, I would like to hear him, this is bringing up again
> the datatype issue that we discussed before.
> 
> On Thu, Dec 10, 2015 at 06:04:07PM +0100, Florian Westphal wrote:
> > This allows to redirect bridged packets to local machine:
> > 
> > ether type ip ether daddr set aa:53:08:12:34:56 meta pkttype set unicast
> > Without 'set unicast', ip stack discards PACKET_OTHERHOST skbs.
> > 
> > It is also useful to add support for a '-m cluster like' nft rule
> > (where switch floods packets to several nodes, and each cluster node
> >  node processes a subset of packets for load distribution).
> > 
> > Mangling is restricted to HOST/OTHER/BROAD/MULTICAST, i.e. you cannot set
> > skb->pkt_type to PACKET_KERNEL or change PACKET_LOOPBACK to PACKET_HOST.
> > @@ -190,6 +192,13 @@ err:
> >  }
> >  EXPORT_SYMBOL_GPL(nft_meta_get_eval);
> >  
> > +/* don't change or set _LOOPBACK, _USER, etc. */
> > +static bool pkt_type_ok(u32 p)
> > +{
> > +	return p == PACKET_HOST || p == PACKET_BROADCAST ||
> > +	       p == PACKET_MULTICAST || p == PACKET_OTHERHOST;
> > +}
> 
> I think we should make use of the datatypes in the kernel side, ie.
> add NFT_DATA_PKTTYPE. Currently we are using the raw NFT_DATA_VALUE
> which doesn't semantically tell us anything on the kind of data.

Yes, but otherwise you can't parameterize the set operations anymore.

Its probably not a big deal for pkt_type but it would be f.e. for mark,
we dfinitely want to take that from a sreg.

> The datatype will allow to perform validation from the configuration
> plane on correct value that we support, otherwise return -EOPNOTSUPP.
> This approach above seems a bit sloppy to me as the validation happens
> from the packet path.

Yes, but again ...  I think it would be okay for pkt_type to force
specification of the target value at config stage, but it would mean
that meta set works differently depending on the key (somethis input
is sreg, sometimes a specified attribute value).

Patrick, Pablo, please battle th^W^W discuss this and let me know the
result.

I can redo this as suggested by Pablo but since its a bit more work
I'd only do this if its decided that its indeed the right path forward.
(I.e. meta set would work with an sreg or some pre-set attribute depending
 on the meta operation key).

Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2 nf-next] netfilter: meta: add support for setting skb->pkttype
  2015-12-14 13:38   ` Florian Westphal
@ 2015-12-14 17:26     ` Pablo Neira Ayuso
  2015-12-18 12:07       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 5+ messages in thread
From: Pablo Neira Ayuso @ 2015-12-14 17:26 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, kaber

On Mon, Dec 14, 2015 at 02:38:46PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Cc'ing Patrick, I would like to hear him, this is bringing up again
> > the datatype issue that we discussed before.
> > 
> > On Thu, Dec 10, 2015 at 06:04:07PM +0100, Florian Westphal wrote:
> > > This allows to redirect bridged packets to local machine:
> > > 
> > > ether type ip ether daddr set aa:53:08:12:34:56 meta pkttype set unicast
> > > Without 'set unicast', ip stack discards PACKET_OTHERHOST skbs.
> > > 
> > > It is also useful to add support for a '-m cluster like' nft rule
> > > (where switch floods packets to several nodes, and each cluster node
> > >  node processes a subset of packets for load distribution).
> > > 
> > > Mangling is restricted to HOST/OTHER/BROAD/MULTICAST, i.e. you cannot set
> > > skb->pkt_type to PACKET_KERNEL or change PACKET_LOOPBACK to PACKET_HOST.
> > > @@ -190,6 +192,13 @@ err:
> > >  }
> > >  EXPORT_SYMBOL_GPL(nft_meta_get_eval);
> > >  
> > > +/* don't change or set _LOOPBACK, _USER, etc. */
> > > +static bool pkt_type_ok(u32 p)
> > > +{
> > > +	return p == PACKET_HOST || p == PACKET_BROADCAST ||
> > > +	       p == PACKET_MULTICAST || p == PACKET_OTHERHOST;
> > > +}
> > 
> > I think we should make use of the datatypes in the kernel side, ie.
> > add NFT_DATA_PKTTYPE. Currently we are using the raw NFT_DATA_VALUE
> > which doesn't semantically tell us anything on the kind of data.
> 
> Yes, but otherwise you can't parameterize the set operations anymore.

We can still via nft_data_init(), we would need a nft_pkttype_init()
there (or some generic infrastructure to register datatypes from each
extensions). So as long as you indicate NFT_DATA_PKTTYPE, the data
will be validated. That will catch both store of pkttype into register
both from immediate and sets.

> Its probably not a big deal for pkt_type but it would be f.e. for mark,
> we dfinitely want to take that from a sreg.
>
> > The datatype will allow to perform validation from the configuration
> > plane on correct value that we support, otherwise return -EOPNOTSUPP.
> > This approach above seems a bit sloppy to me as the validation happens
> > from the packet path.
> 
> Yes, but again ...  I think it would be okay for pkt_type to force
> specification of the target value at config stage, but it would mean
> that meta set works differently depending on the key (somethis input
> is sreg, sometimes a specified attribute value).

Not talking here about passing the data via sreg (through immediate
expression) or as a netlink attribute. Although that is another
interesting topic.

Last time I met Patrick a bit more that one month ago, he mentioned it
would be good to evaluate this from performance point of view as we
would skip the handling of the immediate expression.

We may keep going with the existing approach, ie. just raw data with
no semantics, then if we decide to go this path, to retain backward
compatibility, we would need to pass two data attributes through
netlink: one with the raw data and another with the specific datatype
so older kernels keep handling what userspace generates.

Let me see if I can reach Patrick to discuss this.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2 nf-next] netfilter: meta: add support for setting skb->pkttype
  2015-12-14 17:26     ` Pablo Neira Ayuso
@ 2015-12-18 12:07       ` Pablo Neira Ayuso
  0 siblings, 0 replies; 5+ messages in thread
From: Pablo Neira Ayuso @ 2015-12-18 12:07 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, kaber

On Mon, Dec 14, 2015 at 06:26:42PM +0100, Pablo Neira Ayuso wrote:
> We may keep going with the existing approach, ie. just raw data with
> no semantics, then if we decide to go this path, to retain backward
> compatibility, we would need to pass two data attributes through
> netlink: one with the raw data and another with the specific datatype
> so older kernels keep handling what userspace generates.

I working on a RFC patchset that should cover these concerns above,
but I think that new infrastructure is too much for this specific
case.

So I'm reconsidering this, I'm going to apply this patch.

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-12-18 12:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-10 17:04 [PATCH V2 nf-next] netfilter: meta: add support for setting skb->pkttype Florian Westphal
2015-12-14 11:45 ` Pablo Neira Ayuso
2015-12-14 13:38   ` Florian Westphal
2015-12-14 17:26     ` Pablo Neira Ayuso
2015-12-18 12:07       ` Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.