* [PATCH 00/20] Netfilter updates for net-next
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains Netfilter updates for your net-next tree.
They are:

* nf_tables set timeout infrastructure from Patrick McHardy.

1) Add set timeout support to the netlink API.

2) Add support for set element timeouts using the new set extension
   infrastructure.

3) Add garbage collection helper functions to get rid of stale elements.
   Elements are accumulated in a batch that is asynchronously released
   via RCU when the batch is full.

4) Add garbage collection synchronization helpers. This introduces a new
   element busy bit to address concurrent access from the netlink API and the
   garbage collector.

5) Add timeout support for the nft_hash set implementation. The garbage
   collector periodically checks for stale elements from the workqueue.

* iptables/nftables cgroup fixes:

6) Ignore non full-socket objects from the input path, otherwise cgroup
   match may crash, from Daniel Borkmann.

7) Fix cgroup matching in nf_tables (nft_meta).

8) Save some cycles from xt_socket by skipping packet header parsing when
   skb->sk is already set because of early demux. Also from Daniel.

* br_netfilter updates from Florian Westphal.

9) Save frag_max_size and restore it from the forward path too.

10) Use a per-cpu area to restore the original source MAC address when traffic
    is DNAT'ed.

11) Add helper functions to access physical devices.

12) Use these new physdev helper functions from xt_physdev.

13) Add another nf_bridge_info_get() helper function to fetch the br_netfilter
    state information.

14) Annotate original layer 2 protocol number in nf_bridge info, instead of
    using kludgy flags.

15) Also annotate the pkttype mangling when the packet travels back and forth
    from the IP to the bridge layer, instead of using a flag.


* More nf_tables set enhancements from Patrick:

16) Fix possible usage of set variant that doesn't support timeouts.

17) Avoid spurious "set is full" errors from Netlink API when there are pending
    stale elements scheduled to be released.

18) Restrict loop checks to set maps.

19) Add support for dynamic set updates from the packet path.

20) Add support to store optional user data (e.g. comments) per set element.

BTW, I have also pulled net-next into nf-next to anticipate the conflict
resolution between your okfn() signature changes and Florian's br_netfilter
updates.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thanks!

----------------------------------------------------------------

The following changes since commit ee90b81203a91d4e5385622811ee7872b5bcfe76:

  hv_netvsc: Fix the packet free when it is in skb headroom (2015-04-07 18:45:33 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master

for you to fetch changes up to aadd51aa71f8d013c818a312bb2a0c5714830dbc:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next (2015-04-08 18:30:21 +0200)

----------------------------------------------------------------

Daniel Borkmann (2):
      netfilter: x_tables: fix cgroup matching on non-full sks
      netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match

Florian Westphal (7):
      netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING
      netfilter: bridge: don't use nf_bridge_info data to store mac header
      netfilter: bridge: add helpers for fetching physin/outdev
      netfilter: physdev: use helpers
      netfilter: bridge: add and use nf_bridge_info_get helper
      netfilter: bridge: start splitting mask into public/private chunks
      netfilter: bridge: make BRNF_PKT_TYPE flag a bool

Pablo Neira Ayuso (2):
      netfilter: nft_meta: fix cgroup matching
      Merge git://git.kernel.org/.../davem/net-next

Patrick McHardy (10):
      netfilter: nf_tables: add set timeout API support
      netfilter: nf_tables: add set element timeout support
      netfilter: nf_tables: add set garbage collection helpers
      netfilter: nf_tables: add GC synchronization helpers
      netfilter: nft_hash: add support for timeouts
      netfilter: nf_tables: fix set selection when timeouts are requested
      netfilter: nf_tables: prepare set element accounting for async updates
      netfilter: nf_tables: support different set binding types
      netfilter: nf_tables: add support for dynamic set updates
      netfilter: nf_tables: support optional userdata for set elements

 include/linux/netfilter_bridge.h           |   28 +++-
 include/linux/skbuff.h                     |    8 +-
 include/net/netfilter/nf_tables.h          |  155 +++++++++++++++++++-
 include/net/netfilter/nf_tables_core.h     |    3 +
 include/uapi/linux/netfilter/nf_tables.h   |   39 +++++
 net/bridge/br_netfilter.c                  |  144 +++++++++++-------
 net/ipv4/netfilter/nf_reject_ipv4.c        |    4 +-
 net/ipv6/netfilter/nf_reject_ipv6.c        |    4 +-
 net/netfilter/Makefile                     |    2 +-
 net/netfilter/ipset/ip_set_hash_netiface.c |   32 +++-
 net/netfilter/nf_log_common.c              |    5 +-
 net/netfilter/nf_queue.c                   |   18 ++-
 net/netfilter/nf_tables_api.c              |  186 +++++++++++++++++++++---
 net/netfilter/nf_tables_core.c             |    7 +
 net/netfilter/nfnetlink_log.c              |   17 ++-
 net/netfilter/nfnetlink_queue_core.c       |   28 ++--
 net/netfilter/nft_dynset.c                 |  218 ++++++++++++++++++++++++++++
 net/netfilter/nft_hash.c                   |  117 ++++++++++++++-
 net/netfilter/nft_lookup.c                 |    2 +
 net/netfilter/nft_meta.c                   |    5 +-
 net/netfilter/xt_cgroup.c                  |    2 +-
 net/netfilter/xt_physdev.c                 |   34 +++--
 net/netfilter/xt_socket.c                  |   95 ++++++------
 23 files changed, 973 insertions(+), 180 deletions(-)
 create mode 100644 net/netfilter/nft_dynset.c


* [PATCH 01/20] netfilter: nf_tables: add set timeout API support
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

Add set timeout support to the netlink API. Sets with timeout support
enabled can have a default timeout value and garbage collection interval
specified.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        |    9 +++++++++
 include/uapi/linux/netfilter/nf_tables.h |    6 ++++++
 net/netfilter/nf_tables_api.c            |   30 ++++++++++++++++++++++++++++--
 3 files changed, 43 insertions(+), 2 deletions(-)
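
The attribute handling described above can be modelled in userspace as a
minimal sketch. It assumes the same rules as the patch: both the default
timeout and the GC interval are only accepted when the timeout flag is set,
and an unset GC interval falls back to one second (the patch uses HZ). The
names parse_set_timeouts(), gc_interval_ms() and struct set_timeout_cfg are
hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SET_FLAG_TIMEOUT	0x10u	/* same value as NFT_SET_TIMEOUT */
#define DEFAULT_GC_INTERVAL_MS	1000u	/* assumption: HZ jiffies == one second */

/* Hypothetical userspace mirror of the two new struct nft_set fields. */
struct set_timeout_cfg {
	uint64_t timeout_ms;	/* 0: no default timeout */
	uint32_t gc_int_ms;	/* 0: use the default GC interval */
};

/*
 * Model of the checks added to nf_tables_newset(): a NULL pointer stands
 * for "attribute not present in the netlink message".
 */
static int parse_set_timeouts(uint32_t flags,
			      const uint64_t *timeout_attr,
			      const uint32_t *gc_int_attr,
			      struct set_timeout_cfg *cfg)
{
	assert(cfg != NULL);

	cfg->timeout_ms = 0;
	cfg->gc_int_ms = 0;

	if (timeout_attr != NULL) {
		if (!(flags & SET_FLAG_TIMEOUT))
			return -EINVAL;
		cfg->timeout_ms = *timeout_attr;
	}
	if (gc_int_attr != NULL) {
		if (!(flags & SET_FLAG_TIMEOUT))
			return -EINVAL;
		cfg->gc_int_ms = *gc_int_attr;
	}
	return 0;
}

/* Model of nft_set_gc_interval(), expressed in milliseconds. */
static uint32_t gc_interval_ms(const struct set_timeout_cfg *cfg)
{
	return cfg->gc_int_ms ? cfg->gc_int_ms : DEFAULT_GC_INTERVAL_MS;
}
```

A caller would pass NULL for any attribute that userspace did not supply and
treat a negative return value as a rejected request.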

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index b8cd60d..8936803 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -258,6 +258,8 @@ void nft_unregister_set(struct nft_set_ops *ops);
  * 	@dtype: data type (verdict or numeric type defined by userspace)
  * 	@size: maximum set size
  * 	@nelems: number of elements
+ * 	@timeout: default timeout value in msecs
+ * 	@gc_int: garbage collection interval in msecs
  *	@policy: set parameterization (see enum nft_set_policies)
  * 	@ops: set ops
  * 	@pnet: network namespace
@@ -274,6 +276,8 @@ struct nft_set {
 	u32				dtype;
 	u32				size;
 	u32				nelems;
+	u64				timeout;
+	u32				gc_int;
 	u16				policy;
 	/* runtime data below here */
 	const struct nft_set_ops	*ops ____cacheline_aligned;
@@ -295,6 +299,11 @@ struct nft_set *nf_tables_set_lookup(const struct nft_table *table,
 struct nft_set *nf_tables_set_lookup_byid(const struct net *net,
 					  const struct nlattr *nla);
 
+static inline unsigned long nft_set_gc_interval(const struct nft_set *set)
+{
+	return set->gc_int ? msecs_to_jiffies(set->gc_int) : HZ;
+}
+
 /**
  *	struct nft_set_binding - nf_tables set binding
  *
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index b978393..971d245 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -208,12 +208,14 @@ enum nft_rule_compat_attributes {
  * @NFT_SET_CONSTANT: set contents may not change while bound
  * @NFT_SET_INTERVAL: set contains intervals
  * @NFT_SET_MAP: set is used as a dictionary
+ * @NFT_SET_TIMEOUT: set uses timeouts
  */
 enum nft_set_flags {
 	NFT_SET_ANONYMOUS		= 0x1,
 	NFT_SET_CONSTANT		= 0x2,
 	NFT_SET_INTERVAL		= 0x4,
 	NFT_SET_MAP			= 0x8,
+	NFT_SET_TIMEOUT			= 0x10,
 };
 
 /**
@@ -252,6 +254,8 @@ enum nft_set_desc_attributes {
  * @NFTA_SET_POLICY: selection policy (NLA_U32)
  * @NFTA_SET_DESC: set description (NLA_NESTED)
  * @NFTA_SET_ID: uniquely identifies a set in a transaction (NLA_U32)
+ * @NFTA_SET_TIMEOUT: default timeout value (NLA_U64)
+ * @NFTA_SET_GC_INTERVAL: garbage collection interval (NLA_U32)
  */
 enum nft_set_attributes {
 	NFTA_SET_UNSPEC,
@@ -265,6 +269,8 @@ enum nft_set_attributes {
 	NFTA_SET_POLICY,
 	NFTA_SET_DESC,
 	NFTA_SET_ID,
+	NFTA_SET_TIMEOUT,
+	NFTA_SET_GC_INTERVAL,
 	__NFTA_SET_MAX
 };
 #define NFTA_SET_MAX		(__NFTA_SET_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 5604c2d..6320b64 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2216,6 +2216,8 @@ static const struct nla_policy nft_set_policy[NFTA_SET_MAX + 1] = {
 	[NFTA_SET_POLICY]		= { .type = NLA_U32 },
 	[NFTA_SET_DESC]			= { .type = NLA_NESTED },
 	[NFTA_SET_ID]			= { .type = NLA_U32 },
+	[NFTA_SET_TIMEOUT]		= { .type = NLA_U64 },
+	[NFTA_SET_GC_INTERVAL]		= { .type = NLA_U32 },
 };
 
 static const struct nla_policy nft_set_desc_policy[NFTA_SET_DESC_MAX + 1] = {
@@ -2366,6 +2368,13 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
 			goto nla_put_failure;
 	}
 
+	if (set->timeout &&
+	    nla_put_be64(skb, NFTA_SET_TIMEOUT, cpu_to_be64(set->timeout)))
+		goto nla_put_failure;
+	if (set->gc_int &&
+	    nla_put_be32(skb, NFTA_SET_GC_INTERVAL, htonl(set->gc_int)))
+		goto nla_put_failure;
+
 	if (set->policy != NFT_SET_POL_PERFORMANCE) {
 		if (nla_put_be32(skb, NFTA_SET_POLICY, htonl(set->policy)))
 			goto nla_put_failure;
@@ -2578,7 +2587,8 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	char name[IFNAMSIZ];
 	unsigned int size;
 	bool create;
-	u32 ktype, dtype, flags, policy;
+	u64 timeout;
+	u32 ktype, dtype, flags, policy, gc_int;
 	struct nft_set_desc desc;
 	int err;
 
@@ -2605,7 +2615,8 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	if (nla[NFTA_SET_FLAGS] != NULL) {
 		flags = ntohl(nla_get_be32(nla[NFTA_SET_FLAGS]));
 		if (flags & ~(NFT_SET_ANONYMOUS | NFT_SET_CONSTANT |
-			      NFT_SET_INTERVAL | NFT_SET_MAP))
+			      NFT_SET_INTERVAL | NFT_SET_MAP |
+			      NFT_SET_TIMEOUT))
 			return -EINVAL;
 	}
 
@@ -2631,6 +2642,19 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	} else if (flags & NFT_SET_MAP)
 		return -EINVAL;
 
+	timeout = 0;
+	if (nla[NFTA_SET_TIMEOUT] != NULL) {
+		if (!(flags & NFT_SET_TIMEOUT))
+			return -EINVAL;
+		timeout = be64_to_cpu(nla_get_be64(nla[NFTA_SET_TIMEOUT]));
+	}
+	gc_int = 0;
+	if (nla[NFTA_SET_GC_INTERVAL] != NULL) {
+		if (!(flags & NFT_SET_TIMEOUT))
+			return -EINVAL;
+		gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL]));
+	}
+
 	policy = NFT_SET_POL_PERFORMANCE;
 	if (nla[NFTA_SET_POLICY] != NULL)
 		policy = ntohl(nla_get_be32(nla[NFTA_SET_POLICY]));
@@ -2699,6 +2723,8 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	set->flags = flags;
 	set->size  = desc.size;
 	set->policy = policy;
+	set->timeout = timeout;
+	set->gc_int = gc_int;
 
 	err = ops->init(set, &desc, nla);
 	if (err < 0)
-- 
1.7.10.4



* [PATCH 02/20] netfilter: nf_tables: add set element timeout support
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

Add API support for set element timeouts. Elements can have an individual
timeout value specified, overriding the set's default.

Two new extension types are used for timeouts - the timeout value and
the expiration time. The timeout value only exists if it differs from
the default value.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        |   20 +++++++++++
 include/uapi/linux/netfilter/nf_tables.h |    4 +++
 net/netfilter/nf_tables_api.c            |   53 ++++++++++++++++++++++++++++--
 3 files changed, 75 insertions(+), 2 deletions(-)
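
The extension selection described in the commit message can be illustrated
with a small userspace sketch. It assumes that a set without timeout support
has a default of 0, and it models jiffies as a wrapping 32-bit tick counter.
The helper names and the EXT_* bit values are hypothetical; only the rules
(expiration whenever a timeout applies, explicit timeout only when it differs
from the default, remaining time clamped to zero) are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT_TIMEOUT	(1u << 0)	/* models NFT_SET_EXT_TIMEOUT */
#define EXT_EXPIRATION	(1u << 1)	/* models NFT_SET_EXT_EXPIRATION */

/* An explicit per-element timeout overrides the set's default. */
static uint64_t effective_timeout(bool has_elem_timeout, uint64_t elem_timeout,
				  uint64_t set_default)
{
	return has_elem_timeout ? elem_timeout : set_default;
}

/* Decide which timeout-related extensions an element needs. */
static unsigned int timeout_extensions(uint64_t timeout, uint64_t set_default)
{
	unsigned int exts = 0;

	if (timeout > 0) {
		exts |= EXT_EXPIRATION;
		/* The timeout value is only stored if it is not the default. */
		if (timeout != set_default)
			exts |= EXT_TIMEOUT;
	}
	return exts;
}

/*
 * Remaining ticks until expiry, 0 if already expired. The signed
 * difference keeps the comparison correct across counter wraparound,
 * in the same spirit as the kernel's time_before().
 */
static uint32_t remaining_ticks(uint32_t now, uint32_t expires)
{
	if ((int32_t)(now - expires) < 0)
		return expires - now;
	return 0;
}
```

Storing only the deviation from the default keeps elements that use the
set-wide timeout as small as possible.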

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 8936803..f2726c5 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -329,12 +329,16 @@ void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
  *	@NFT_SET_EXT_KEY: element key
  *	@NFT_SET_EXT_DATA: mapping data
  *	@NFT_SET_EXT_FLAGS: element flags
+ *	@NFT_SET_EXT_TIMEOUT: element timeout
+ *	@NFT_SET_EXT_EXPIRATION: element expiration time
  *	@NFT_SET_EXT_NUM: number of extension types
  */
 enum nft_set_extensions {
 	NFT_SET_EXT_KEY,
 	NFT_SET_EXT_DATA,
 	NFT_SET_EXT_FLAGS,
+	NFT_SET_EXT_TIMEOUT,
+	NFT_SET_EXT_EXPIRATION,
 	NFT_SET_EXT_NUM
 };
 
@@ -431,6 +435,22 @@ static inline u8 *nft_set_ext_flags(const struct nft_set_ext *ext)
 	return nft_set_ext(ext, NFT_SET_EXT_FLAGS);
 }
 
+static inline u64 *nft_set_ext_timeout(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_TIMEOUT);
+}
+
+static inline unsigned long *nft_set_ext_expiration(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_EXPIRATION);
+}
+
+static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
+{
+	return nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION) &&
+	       time_is_before_eq_jiffies(*nft_set_ext_expiration(ext));
+}
+
 static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
 						   void *elem)
 {
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 971d245..83441cc 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -290,12 +290,16 @@ enum nft_set_elem_flags {
  * @NFTA_SET_ELEM_KEY: key value (NLA_NESTED: nft_data)
  * @NFTA_SET_ELEM_DATA: data value of mapping (NLA_NESTED: nft_data_attributes)
  * @NFTA_SET_ELEM_FLAGS: bitmask of nft_set_elem_flags (NLA_U32)
+ * @NFTA_SET_ELEM_TIMEOUT: timeout value (NLA_U64)
+ * @NFTA_SET_ELEM_EXPIRATION: expiration time (NLA_U64)
  */
 enum nft_set_elem_attributes {
 	NFTA_SET_ELEM_UNSPEC,
 	NFTA_SET_ELEM_KEY,
 	NFTA_SET_ELEM_DATA,
 	NFTA_SET_ELEM_FLAGS,
+	NFTA_SET_ELEM_TIMEOUT,
+	NFTA_SET_ELEM_EXPIRATION,
 	__NFTA_SET_ELEM_MAX
 };
 #define NFTA_SET_ELEM_MAX	(__NFTA_SET_ELEM_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 6320b64..9e032db 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2863,6 +2863,14 @@ const struct nft_set_ext_type nft_set_ext_types[] = {
 		.len	= sizeof(u8),
 		.align	= __alignof__(u8),
 	},
+	[NFT_SET_EXT_TIMEOUT]		= {
+		.len	= sizeof(u64),
+		.align	= __alignof__(u64),
+	},
+	[NFT_SET_EXT_EXPIRATION]	= {
+		.len	= sizeof(unsigned long),
+		.align	= __alignof__(unsigned long),
+	},
 };
 EXPORT_SYMBOL_GPL(nft_set_ext_types);
 
@@ -2874,6 +2882,7 @@ static const struct nla_policy nft_set_elem_policy[NFTA_SET_ELEM_MAX + 1] = {
 	[NFTA_SET_ELEM_KEY]		= { .type = NLA_NESTED },
 	[NFTA_SET_ELEM_DATA]		= { .type = NLA_NESTED },
 	[NFTA_SET_ELEM_FLAGS]		= { .type = NLA_U32 },
+	[NFTA_SET_ELEM_TIMEOUT]		= { .type = NLA_U64 },
 };
 
 static const struct nla_policy nft_set_elem_list_policy[NFTA_SET_ELEM_LIST_MAX + 1] = {
@@ -2935,6 +2944,25 @@ static int nf_tables_fill_setelem(struct sk_buff *skb,
 		         htonl(*nft_set_ext_flags(ext))))
 		goto nla_put_failure;
 
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_TIMEOUT) &&
+	    nla_put_be64(skb, NFTA_SET_ELEM_TIMEOUT,
+			 cpu_to_be64(*nft_set_ext_timeout(ext))))
+		goto nla_put_failure;
+
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION)) {
+		unsigned long expires, now = jiffies;
+
+		expires = *nft_set_ext_expiration(ext);
+		if (time_before(now, expires))
+			expires -= now;
+		else
+			expires = 0;
+
+		if (nla_put_be64(skb, NFTA_SET_ELEM_EXPIRATION,
+				 cpu_to_be64(jiffies_to_msecs(expires))))
+			goto nla_put_failure;
+	}
+
 	nla_nest_end(skb, nest);
 	return 0;
 
@@ -3158,7 +3186,7 @@ static void *nft_set_elem_init(const struct nft_set *set,
 			       const struct nft_set_ext_tmpl *tmpl,
 			       const struct nft_data *key,
 			       const struct nft_data *data,
-			       gfp_t gfp)
+			       u64 timeout, gfp_t gfp)
 {
 	struct nft_set_ext *ext;
 	void *elem;
@@ -3173,6 +3201,11 @@ static void *nft_set_elem_init(const struct nft_set *set,
 	memcpy(nft_set_ext_key(ext), key, set->klen);
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA))
 		memcpy(nft_set_ext_data(ext), data, set->dlen);
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION))
+		*nft_set_ext_expiration(ext) =
+			jiffies + msecs_to_jiffies(timeout);
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_TIMEOUT))
+		*nft_set_ext_timeout(ext) = timeout;
 
 	return elem;
 }
@@ -3201,6 +3234,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	struct nft_data data;
 	enum nft_registers dreg;
 	struct nft_trans *trans;
+	u64 timeout;
 	u32 flags;
 	int err;
 
@@ -3241,6 +3275,15 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			return -EINVAL;
 	}
 
+	timeout = 0;
+	if (nla[NFTA_SET_ELEM_TIMEOUT] != NULL) {
+		if (!(set->flags & NFT_SET_TIMEOUT))
+			return -EINVAL;
+		timeout = be64_to_cpu(nla_get_be64(nla[NFTA_SET_ELEM_TIMEOUT]));
+	} else if (set->flags & NFT_SET_TIMEOUT) {
+		timeout = set->timeout;
+	}
+
 	err = nft_data_init(ctx, &elem.key, &d1, nla[NFTA_SET_ELEM_KEY]);
 	if (err < 0)
 		goto err1;
@@ -3249,6 +3292,11 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		goto err2;
 
 	nft_set_ext_add(&tmpl, NFT_SET_EXT_KEY);
+	if (timeout > 0) {
+		nft_set_ext_add(&tmpl, NFT_SET_EXT_EXPIRATION);
+		if (timeout != set->timeout)
+			nft_set_ext_add(&tmpl, NFT_SET_EXT_TIMEOUT);
+	}
 
 	if (nla[NFTA_SET_ELEM_DATA] != NULL) {
 		err = nft_data_init(ctx, &data, &d2, nla[NFTA_SET_ELEM_DATA]);
@@ -3277,7 +3325,8 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	}
 
 	err = -ENOMEM;
-	elem.priv = nft_set_elem_init(set, &tmpl, &elem.key, &data, GFP_KERNEL);
+	elem.priv = nft_set_elem_init(set, &tmpl, &elem.key, &data,
+				      timeout, GFP_KERNEL);
 	if (elem.priv == NULL)
 		goto err3;
 
-- 
1.7.10.4


* [PATCH 03/20] netfilter: nf_tables: add set garbage collection helpers
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

Add helpers for GC batch destruction: since element destruction needs
an RCU grace period for all set implementations, add some helper functions
for asynchronous batch destruction. Elements are collected in a batch
structure, which is asynchronously released using RCU once it is full.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |   56 +++++++++++++++++++++++++++++++++++++
 net/netfilter/nf_tables_api.c     |   25 +++++++++++++++++
 2 files changed, 81 insertions(+)
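
The batching pattern can be sketched in userspace as follows. This is only a
model under stated assumptions: the batch is released synchronously with
free() where the kernel defers the release through call_rcu(), the batch
size is a tiny fixed constant rather than page-derived, and the "is there
room" test is simplified and not necessarily the exact bound the patch uses.
All gc_batch_* names and the gc_destroyed counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define GC_BATCH_SIZE 4	/* tiny for illustration only */

struct gc_batch {
	unsigned int cnt;
	void *elems[GC_BATCH_SIZE];
};

/* Counts destroyed elements so the behaviour can be observed. */
static unsigned int gc_destroyed;

/* Destroy every collected element, then the batch itself. */
static void gc_batch_release(struct gc_batch *b)
{
	for (unsigned int i = 0; i < b->cnt; i++) {
		free(b->elems[i]);
		gc_destroyed++;
	}
	free(b);
}

/* Hand a (possibly NULL) batch over for destruction. */
static void gc_batch_complete(struct gc_batch *b)
{
	if (b != NULL)
		gc_batch_release(b);
}

/*
 * Return a batch with room for one more element: reuse the current one
 * if it has space, otherwise complete it and allocate a fresh one.
 * Returns NULL if the allocation fails.
 */
static struct gc_batch *gc_batch_check(struct gc_batch *b)
{
	if (b != NULL) {
		if (b->cnt < GC_BATCH_SIZE)
			return b;
		gc_batch_complete(b);
	}
	return calloc(1, sizeof(*b));
}

static void gc_batch_add(struct gc_batch *b, void *elem)
{
	assert(b->cnt < GC_BATCH_SIZE);
	b->elems[b->cnt++] = elem;
}
```

A collector would call gc_batch_check() before every gc_batch_add() and
finish with a single gc_batch_complete() to flush the last partial batch.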

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index f2726c5..6fd4495 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -460,6 +460,62 @@ static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
 void nft_set_elem_destroy(const struct nft_set *set, void *elem);
 
 /**
+ *	struct nft_set_gc_batch_head - nf_tables set garbage collection batch
+ *
+ *	@rcu: rcu head
+ *	@set: set the elements belong to
+ *	@cnt: count of elements
+ */
+struct nft_set_gc_batch_head {
+	struct rcu_head			rcu;
+	const struct nft_set		*set;
+	unsigned int			cnt;
+};
+
+#define NFT_SET_GC_BATCH_SIZE	((PAGE_SIZE -				  \
+				  sizeof(struct nft_set_gc_batch_head)) / \
+				 sizeof(void *))
+
+/**
+ *	struct nft_set_gc_batch - nf_tables set garbage collection batch
+ *
+ * 	@head: GC batch head
+ * 	@elems: garbage collection elements
+ */
+struct nft_set_gc_batch {
+	struct nft_set_gc_batch_head	head;
+	void				*elems[NFT_SET_GC_BATCH_SIZE];
+};
+
+struct nft_set_gc_batch *nft_set_gc_batch_alloc(const struct nft_set *set,
+						gfp_t gfp);
+void nft_set_gc_batch_release(struct rcu_head *rcu);
+
+static inline void nft_set_gc_batch_complete(struct nft_set_gc_batch *gcb)
+{
+	if (gcb != NULL)
+		call_rcu(&gcb->head.rcu, nft_set_gc_batch_release);
+}
+
+static inline struct nft_set_gc_batch *
+nft_set_gc_batch_check(const struct nft_set *set, struct nft_set_gc_batch *gcb,
+		       gfp_t gfp)
+{
+	if (gcb != NULL) {
+		if (gcb->head.cnt + 1 < ARRAY_SIZE(gcb->elems))
+			return gcb;
+		nft_set_gc_batch_complete(gcb);
+	}
+	return nft_set_gc_batch_alloc(set, gfp);
+}
+
+static inline void nft_set_gc_batch_add(struct nft_set_gc_batch *gcb,
+					void *elem)
+{
+	gcb->elems[gcb->head.cnt++] = elem;
+}
+
+/**
  *	struct nft_expr_type - nf_tables expression type
  *
  *	@select_ops: function to select nft_expr_ops
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9e032db..138e47f 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3482,6 +3482,31 @@ static int nf_tables_delsetelem(struct sock *nlsk, struct sk_buff *skb,
 	return err;
 }
 
+void nft_set_gc_batch_release(struct rcu_head *rcu)
+{
+	struct nft_set_gc_batch *gcb;
+	unsigned int i;
+
+	gcb = container_of(rcu, struct nft_set_gc_batch, head.rcu);
+	for (i = 0; i < gcb->head.cnt; i++)
+		nft_set_elem_destroy(gcb->head.set, gcb->elems[i]);
+	kfree(gcb);
+}
+EXPORT_SYMBOL_GPL(nft_set_gc_batch_release);
+
+struct nft_set_gc_batch *nft_set_gc_batch_alloc(const struct nft_set *set,
+						gfp_t gfp)
+{
+	struct nft_set_gc_batch *gcb;
+
+	gcb = kzalloc(sizeof(*gcb), gfp);
+	if (gcb == NULL)
+		return gcb;
+	gcb->head.set = set;
+	return gcb;
+}
+EXPORT_SYMBOL_GPL(nft_set_gc_batch_alloc);
+
 static int nf_tables_fill_gen_info(struct sk_buff *skb, struct net *net,
 				   u32 portid, u32 seq)
 {
-- 
1.7.10.4


* [PATCH 04/20] netfilter: nf_tables: add GC synchronization helpers
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

GC is expected to happen asynchronously to the netlink interface. In the
netlink path, both insertion and removal of elements consist of two
steps: insertion followed by activation, or deactivation followed by
removal. During these steps the element must not be freed by GC.

The synchronization helpers use an unused bit in the genmask field to
atomically mark an element as "busy", meaning it is either currently
being handled through the netlink API or by GC.

Elements being processed by GC will never survive; netlink will simply
ignore them. Elements currently being processed through netlink will be
skipped by GC and reprocessed during the next run.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |   35 +++++++++++++++++++++++++++++++++++
 net/netfilter/nf_tables_api.c     |    2 +-
 2 files changed, 36 insertions(+), 1 deletion(-)
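
The "busy" handshake can be illustrated with C11 atomics. This is a sketch
under assumptions, not the kernel code: it operates on a byte-wide atomic
field, whereas the patch applies test_and_set_bit() to the containing long
word. The names struct elem_ext, elem_mark_busy() and elem_clear_busy() are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ELEM_BUSY_MASK (1u << 2)	/* same bit as NFT_SET_ELEM_BUSY_MASK */

struct elem_ext {
	_Atomic unsigned char genmask;	/* generation bits plus the busy bit */
};

/*
 * Atomically set the busy bit. Returns true if the element was already
 * busy, which means another party (netlink or GC) owns it and the caller
 * must leave it alone.
 */
static bool elem_mark_busy(struct elem_ext *ext)
{
	unsigned char old;

	assert(ext != NULL);
	old = atomic_fetch_or(&ext->genmask, ELEM_BUSY_MASK);
	return (old & ELEM_BUSY_MASK) != 0;
}

/* Release ownership; the generation bits are left untouched. */
static void elem_clear_busy(struct elem_ext *ext)
{
	assert(ext != NULL);
	atomic_fetch_and(&ext->genmask, (unsigned char)~ELEM_BUSY_MASK);
}
```

Because the read-modify-write is a single atomic operation, exactly one of
two racing callers observes the bit as previously clear.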

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 6fd4495..1ea13fc 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -852,6 +852,41 @@ static inline void nft_set_elem_change_active(const struct nft_set *set,
 	ext->genmask ^= nft_genmask_next(read_pnet(&set->pnet));
 }
 
+/*
+ * We use a free bit in the genmask field to indicate the element
+ * is busy, meaning it is currently being processed either by
+ * the netlink API or GC.
+ *
+ * Even though the genmask is only a single byte wide, this works
+ * because the extension structure is fully constant once initialized,
+ * so there are no non-atomic write accesses unless it is already
+ * marked busy.
+ */
+#define NFT_SET_ELEM_BUSY_MASK	(1 << 2)
+
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+#define NFT_SET_ELEM_BUSY_BIT	2
+#elif defined(__BIG_ENDIAN_BITFIELD)
+#define NFT_SET_ELEM_BUSY_BIT	(BITS_PER_LONG - BITS_PER_BYTE + 2)
+#else
+#error
+#endif
+
+static inline int nft_set_elem_mark_busy(struct nft_set_ext *ext)
+{
+	unsigned long *word = (unsigned long *)ext;
+
+	BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0);
+	return test_and_set_bit(NFT_SET_ELEM_BUSY_BIT, word);
+}
+
+static inline void nft_set_elem_clear_busy(struct nft_set_ext *ext)
+{
+	unsigned long *word = (unsigned long *)ext;
+
+	clear_bit(NFT_SET_ELEM_BUSY_BIT, word);
+}
+
 /**
  *	struct nft_trans - nf_tables object update in transaction
  *
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 138e47f..3aa92b3 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3338,7 +3338,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	if (trans == NULL)
 		goto err4;
 
-	ext->genmask = nft_genmask_cur(ctx->net);
+	ext->genmask = nft_genmask_cur(ctx->net) | NFT_SET_ELEM_BUSY_MASK;
 	err = set->ops->insert(set, &elem);
 	if (err < 0)
 		goto err5;
-- 
1.7.10.4


* [PATCH 05/20] netfilter: nft_hash: add support for timeouts
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

Add support for element timeouts to nft_hash. The lookup and walking
functions are changed to ignore timed-out elements, and a periodic garbage
collection task cleans out expired entries.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |    5 +++
 net/netfilter/nft_hash.c          |   79 +++++++++++++++++++++++++++++++++++--
 2 files changed, 80 insertions(+), 4 deletions(-)
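
The per-element decisions made by the periodic collector can be sketched as
a single-threaded userspace model. It assumes a plain array in place of the
rhashtable, plain booleans in place of the atomic busy bit, and an expiry
value of 0 standing in for "no expiration extension present". struct
hash_elem, elem_expired() and gc_sweep() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hash_elem {
	unsigned long expires;	/* absolute deadline in ticks, 0 = never */
	bool busy;		/* owned by netlink or by GC */
	bool removed;		/* already unlinked from the table */
};

/* Wrap-safe "deadline reached" test, like time_is_before_eq_jiffies(). */
static bool elem_expired(const struct hash_elem *he, unsigned long now)
{
	return he->expires != 0 && (long)(now - he->expires) >= 0;
}

/*
 * One GC pass: unlink every expired element that nobody else owns.
 * Returns the number of elements collected in this pass.
 */
static size_t gc_sweep(struct hash_elem *elems, size_t n, unsigned long now)
{
	size_t collected = 0;

	assert(elems != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct hash_elem *he = &elems[i];

		if (he->removed || !elem_expired(he, now))
			continue;
		if (he->busy)
			continue;	/* in use elsewhere, retry on the next run */

		he->busy = true;	/* GC takes ownership and never gives it back */
		he->removed = true;
		collected++;
	}
	return collected;
}
```

Skipping busy elements instead of waiting keeps the pass non-blocking; such
elements are simply picked up by a later run once they are released.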

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 1ea13fc..a785699 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -294,6 +294,11 @@ static inline void *nft_set_priv(const struct nft_set *set)
 	return (void *)set->data;
 }
 
+static inline struct nft_set *nft_set_container_of(const void *priv)
+{
+	return (void *)priv - offsetof(struct nft_set, data);
+}
+
 struct nft_set *nf_tables_set_lookup(const struct nft_table *table,
 				     const struct nlattr *nla);
 struct nft_set *nf_tables_set_lookup_byid(const struct net *net,
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index c7e1a9d..5923ec5 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -15,6 +15,7 @@
 #include <linux/log2.h>
 #include <linux/jhash.h>
 #include <linux/netlink.h>
+#include <linux/workqueue.h>
 #include <linux/rhashtable.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
@@ -25,6 +26,7 @@
 
 struct nft_hash {
 	struct rhashtable		ht;
+	struct delayed_work		gc_work;
 };
 
 struct nft_hash_elem {
@@ -62,6 +64,8 @@ static inline int nft_hash_cmp(struct rhashtable_compare_arg *arg,
 
 	if (nft_data_cmp(nft_set_ext_key(&he->ext), x->key, x->set->klen))
 		return 1;
+	if (nft_set_elem_expired(&he->ext))
+		return 1;
 	if (!nft_set_elem_active(&he->ext, x->genmask))
 		return 1;
 	return 0;
@@ -107,6 +111,7 @@ static void nft_hash_activate(const struct nft_set *set,
 	struct nft_hash_elem *he = elem->priv;
 
 	nft_set_elem_change_active(set, &he->ext);
+	nft_set_elem_clear_busy(&he->ext);
 }
 
 static void *nft_hash_deactivate(const struct nft_set *set,
@@ -120,9 +125,15 @@ static void *nft_hash_deactivate(const struct nft_set *set,
 		.key	 = &elem->key,
 	};
 
+	rcu_read_lock();
 	he = rhashtable_lookup_fast(&priv->ht, &arg, nft_hash_params);
-	if (he != NULL)
-		nft_set_elem_change_active(set, &he->ext);
+	if (he != NULL) {
+		if (!nft_set_elem_mark_busy(&he->ext))
+			nft_set_elem_change_active(set, &he->ext);
+		else
+			he = NULL;
+	}
+	rcu_read_unlock();
 
 	return he;
 }
@@ -170,6 +181,8 @@ static void nft_hash_walk(const struct nft_ctx *ctx, const struct nft_set *set,
 
 		if (iter->count < iter->skip)
 			goto cont;
+		if (nft_set_elem_expired(&he->ext))
+			goto cont;
 		if (!nft_set_elem_active(&he->ext, genmask))
 			goto cont;
 
@@ -188,6 +201,54 @@ out:
 	rhashtable_walk_exit(&hti);
 }
 
+static void nft_hash_gc(struct work_struct *work)
+{
+	const struct nft_set *set;
+	struct nft_hash_elem *he;
+	struct nft_hash *priv;
+	struct nft_set_gc_batch *gcb = NULL;
+	struct rhashtable_iter hti;
+	int err;
+
+	priv = container_of(work, struct nft_hash, gc_work.work);
+	set  = nft_set_container_of(priv);
+
+	err = rhashtable_walk_init(&priv->ht, &hti);
+	if (err)
+		goto schedule;
+
+	err = rhashtable_walk_start(&hti);
+	if (err && err != -EAGAIN)
+		goto out;
+
+	while ((he = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(he)) {
+			if (PTR_ERR(he) != -EAGAIN)
+				goto out;
+			continue;
+		}
+
+		if (!nft_set_elem_expired(&he->ext))
+			continue;
+		if (nft_set_elem_mark_busy(&he->ext))
+			continue;
+
+		gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
+		if (gcb == NULL)
+			goto out;
+		rhashtable_remove_fast(&priv->ht, &he->node, nft_hash_params);
+		nft_set_gc_batch_add(gcb, he);
+	}
+out:
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+
+	nft_set_gc_batch_complete(gcb);
+schedule:
+	queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
+			   nft_set_gc_interval(set));
+}
+
 static unsigned int nft_hash_privsize(const struct nlattr * const nla[])
 {
 	return sizeof(struct nft_hash);
@@ -207,11 +268,20 @@ static int nft_hash_init(const struct nft_set *set,
 {
 	struct nft_hash *priv = nft_set_priv(set);
 	struct rhashtable_params params = nft_hash_params;
+	int err;
 
 	params.nelem_hint = desc->size ?: NFT_HASH_ELEMENT_HINT;
 	params.key_len	  = set->klen;
 
-	return rhashtable_init(&priv->ht, &params);
+	err = rhashtable_init(&priv->ht, &params);
+	if (err < 0)
+		return err;
+
+	INIT_DEFERRABLE_WORK(&priv->gc_work, nft_hash_gc);
+	if (set->flags & NFT_SET_TIMEOUT)
+		queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
+				   nft_set_gc_interval(set));
+	return 0;
 }
 
 static void nft_hash_elem_destroy(void *ptr, void *arg)
@@ -223,6 +293,7 @@ static void nft_hash_destroy(const struct nft_set *set)
 {
 	struct nft_hash *priv = nft_set_priv(set);
 
+	cancel_delayed_work_sync(&priv->gc_work);
 	rhashtable_free_and_destroy(&priv->ht, nft_hash_elem_destroy,
 				    (void *)set);
 }
@@ -264,7 +335,7 @@ static struct nft_set_ops nft_hash_ops __read_mostly = {
 	.remove		= nft_hash_remove,
 	.lookup		= nft_hash_lookup,
 	.walk		= nft_hash_walk,
-	.features	= NFT_SET_MAP,
+	.features	= NFT_SET_MAP | NFT_SET_TIMEOUT,
 	.owner		= THIS_MODULE,
 };
 
-- 
1.7.10.4
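
As a rough userspace model of the nft_hash_gc() pass in the patch above:
skip elements that have not expired, skip elements that somebody else
already marked busy, then unlink the rest and collect them in a batch
that is flushed when full. All gc_model_* names and the batch size are
invented for this sketch; the real code walks an rhashtable, uses an
atomic busy bit and releases batches via RCU, none of which is modeled
here.

```c
#include <stdbool.h>
#include <stddef.h>

#define GC_MODEL_BATCH 4	/* hypothetical batch capacity */

struct gc_model_elem {
	unsigned long expires;	/* 0 means "never expires" in this model */
	bool busy;
	bool linked;		/* still reachable through the set */
};

struct gc_model_batch {
	struct gc_model_elem *elems[GC_MODEL_BATCH];
	size_t count;
	size_t released;	/* total elements handed off so far */
};

/* Simplified expiry test; counter wraparound is ignored here. */
static bool gc_model_expired(const struct gc_model_elem *e, unsigned long now)
{
	return e->expires != 0 && now >= e->expires;
}

/* Test-and-set: returns the previous busy state (not atomic here). */
static bool gc_model_mark_busy(struct gc_model_elem *e)
{
	bool was_busy = e->busy;

	e->busy = true;
	return was_busy;
}

/* Stand-in for the deferred release of a batch. */
static void gc_model_flush(struct gc_model_batch *b)
{
	b->released += b->count;
	b->count = 0;
}

static void gc_model_run(struct gc_model_elem *set, size_t n,
			 unsigned long now, struct gc_model_batch *b)
{
	for (size_t i = 0; i < n; i++) {
		struct gc_model_elem *e = &set[i];

		if (!e->linked || !gc_model_expired(e, now))
			continue;
		if (gc_model_mark_busy(e))
			continue;	/* owned by a concurrent user */

		if (b->count == GC_MODEL_BATCH)
			gc_model_flush(b);
		e->linked = false;	/* unlink before batching */
		b->elems[b->count++] = e;
	}
	gc_model_flush(b);
}
```

An element that is expired but already busy stays linked in this model;
a later pass would pick it up once the busy mark has been cleared.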



* [PATCH 06/20] netfilter: x_tables: fix cgroup matching on non-full sks
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 05/20] netfilter: nft_hash: add support for timeouts Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:34 ` [PATCH 07/20] netfilter: nft_meta: fix cgroup matching Pablo Neira Ayuso
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Daniel Borkmann <daniel@iogearbox.net>

While originally only being intended for outgoing traffic, commit
a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for
LOCAL_IN nf hooks") enabled xt_cgroups for the NF_INET_LOCAL_IN hook
as well, in order to allow for nfacct accounting.

Besides being currently limited to early demuxes only, commit
a00e76349f35 forgot to add a check that we are dealing with a full
socket, i.e. in this case not with a time wait socket. TCP time wait
sockets have a different memory layout and a lower memory footprint
than full sockets, and consequently have no sk_classid member;
probing for sk_classid there could potentially lead to a crash.
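
The layout problem described above can be sketched in plain C. This is
a userspace model only: the cg_model_* types and state values are
invented for illustration and do not mirror the kernel's struct sock.
The point is that the classid may only be read after a state check has
shown that the object really has the larger, full-socket layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical socket states for this model only. */
enum cg_model_state { CG_MODEL_ESTABLISHED = 1, CG_MODEL_TIME_WAIT = 6 };

/* Fields shared by every socket flavour. */
struct cg_model_sock_common {
	int state;
};

/* A full socket carries the classid. */
struct cg_model_full_sock {
	struct cg_model_sock_common common;
	uint32_t classid;
};

/* A time wait socket is smaller and has no classid at all. */
struct cg_model_tw_sock {
	struct cg_model_sock_common common;
};

static bool cg_model_fullsock(const struct cg_model_sock_common *sk)
{
	return sk->state != CG_MODEL_TIME_WAIT;
}

static bool cg_model_match(const struct cg_model_sock_common *sk,
			   uint32_t id, bool invert)
{
	const struct cg_model_full_sock *full;

	if (sk == NULL || !cg_model_fullsock(sk))
		return false;

	/* Safe only because the state check proved the larger layout. */
	full = (const struct cg_model_full_sock *)sk;
	return (id == full->classid) != invert;
}
```

Note that, as in the patch, a missing or non-full socket never matches,
not even for an inverted rule.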

Fixes: a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks")
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_cgroup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
index 7198d66..a1d126f 100644
--- a/net/netfilter/xt_cgroup.c
+++ b/net/netfilter/xt_cgroup.c
@@ -39,7 +39,7 @@ cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
 {
 	const struct xt_cgroup_info *info = par->matchinfo;
 
-	if (skb->sk == NULL)
+	if (skb->sk == NULL || !sk_fullsock(skb->sk))
 		return false;
 
 	return (info->id == skb->sk->sk_classid) ^ info->invert;
-- 
1.7.10.4


* [PATCH 07/20] netfilter: nft_meta: fix cgroup matching
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 06/20] netfilter: x_tables: fix cgroup matching on non-full sks Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:34 ` [PATCH 08/20] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING Pablo Neira Ayuso
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

We have to stop iterating on the rule expressions if the cgroup
mismatches. Moreover, make sure a non-full socket from the input path
does not lead us to a crash.
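
Why a plain "break" is not enough can be shown with a small model of
rule evaluation. It assumes that the err label (not visible in the hunk
below) sets a break verdict that ends evaluation of the rule; the
rl_model_* names are hypothetical. If the meta expression merely
returned without writing the register, a following comparison would run
against whatever the register held before.

```c
#include <stdbool.h>
#include <stdint.h>

enum rl_model_verdict { RL_MODEL_CONTINUE, RL_MODEL_BREAK };

struct rl_model_regs {
	uint32_t data;
	enum rl_model_verdict verdict;
};

struct rl_model_pkt {
	bool has_full_sock;
	uint32_t classid;
};

/* Load the cgroup classid, or stop the rule if there is none. */
static void rl_model_meta_cgroup(const struct rl_model_pkt *pkt,
				 struct rl_model_regs *regs)
{
	if (!pkt->has_full_sock) {
		regs->verdict = RL_MODEL_BREAK;
		return;
	}
	regs->data = pkt->classid;
}

static void rl_model_cmp_eq(uint32_t want, struct rl_model_regs *regs)
{
	if (regs->data != want)
		regs->verdict = RL_MODEL_BREAK;
}

static bool rl_model_rule_matches(const struct rl_model_pkt *pkt,
				  uint32_t want)
{
	/* Worst case: the stale register already equals the wanted id. */
	struct rl_model_regs regs = { .data = want,
				      .verdict = RL_MODEL_CONTINUE };

	rl_model_meta_cgroup(pkt, &regs);
	if (regs.verdict == RL_MODEL_BREAK)
		return false;	/* remaining expressions are skipped */

	rl_model_cmp_eq(want, &regs);
	return regs.verdict == RL_MODEL_CONTINUE;
}
```

With the break verdict in place, a packet without a usable socket
cannot match by accident, even when the stale register content equals
the id the rule is looking for.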

Fixes: ce67417 ("netfilter: nft_meta: add cgroup support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_meta.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 5197874..d79ce88 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -166,9 +166,8 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 		dest->data[0] = out->group;
 		break;
 	case NFT_META_CGROUP:
-		if (skb->sk == NULL)
-			break;
-
+		if (skb->sk == NULL || !sk_fullsock(skb->sk))
+			goto err;
 		dest->data[0] = skb->sk->sk_classid;
 		break;
 	default:
-- 
1.7.10.4


* [PATCH 08/20] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (6 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 07/20] netfilter: nft_meta: fix cgroup matching Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:34 ` [PATCH 09/20] netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match Pablo Neira Ayuso
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

We also need to save/restore frag_max_size in the forward path,
otherwise the br_parse_ip_options() call will zero it there as well.
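
A minimal model of the save/restore pattern, assuming only what the
description states: option parsing wipes the IP control block, so the
value has to be read beforehand and written back afterwards. The
fr_model_* names are hypothetical, and the two control-block views are
modeled as separate fields rather than as overlays of one skb->cb area.

```c
#include <stdbool.h>
#include <string.h>

struct fr_model_ip_cb {
	int opt_len;
	int frag_max_size;
};

struct fr_model_skb {
	struct fr_model_ip_cb ipcb;	/* IP layer view */
	int br_frag_max_size;		/* bridge side copy */
};

/* Stand-in for option parsing: it clears the whole IP control block. */
static int fr_model_parse_ip_options(struct fr_model_skb *skb)
{
	memset(&skb->ipcb, 0, sizeof(skb->ipcb));
	return 0;
}

/* Forward hook: save before parsing, restore afterwards. */
static int fr_model_forward(struct fr_model_skb *skb, bool restore)
{
	int frag_max = skb->br_frag_max_size;

	if (fr_model_parse_ip_options(skb))
		return -1;
	if (restore)
		skb->ipcb.frag_max_size = frag_max;
	return 0;
}

/* Forward finish: copy the value back to the bridge side. */
static void fr_model_forward_finish(struct fr_model_skb *skb)
{
	skb->br_frag_max_size = skb->ipcb.frag_max_size;
}
```

Calling fr_model_forward() with restore set to false reproduces the
behaviour being fixed: the value is 0 by the time forward finish copies
it back.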

Fixes: 93fdd47e5 ('bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING')
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/br_netfilter.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index f3884a1..282ed76 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -686,6 +686,13 @@ static int br_nf_forward_finish(struct sk_buff *skb)
 	struct net_device *in;
 
 	if (!IS_ARP(skb) && !IS_VLAN_ARP(skb)) {
+		int frag_max_size;
+
+		if (skb->protocol == htons(ETH_P_IP)) {
+			frag_max_size = IPCB(skb)->frag_max_size;
+			BR_INPUT_SKB_CB(skb)->frag_max_size = frag_max_size;
+		}
+
 		in = nf_bridge->physindev;
 		if (nf_bridge->mask & BRNF_PKT_TYPE) {
 			skb->pkt_type = PACKET_OTHERHOST;
@@ -745,8 +752,14 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 		nf_bridge->mask |= BRNF_PKT_TYPE;
 	}
 
-	if (pf == NFPROTO_IPV4 && br_parse_ip_options(skb))
-		return NF_DROP;
+	if (pf == NFPROTO_IPV4) {
+		int frag_max = BR_INPUT_SKB_CB(skb)->frag_max_size;
+
+		if (br_parse_ip_options(skb))
+			return NF_DROP;
+
+		IPCB(skb)->frag_max_size = frag_max;
+	}
 
 	nf_bridge->physoutdev = skb->dev;
 	if (pf == NFPROTO_IPV4)
-- 
1.7.10.4


* [PATCH 09/20] netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (7 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 08/20] netfilter: bridge: really save frag_max_size between PRE and POST_ROUTING Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:34 ` [PATCH 10/20] netfilter: bridge: don't use nf_bridge_info data to store mac header Pablo Neira Ayuso
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Daniel Borkmann <daniel@iogearbox.net>

Currently in xt_socket, we take advantage of early demuxed sockets
since commit 00028aa37098 ("netfilter: xt_socket: use IP early demux")
in order to avoid a second socket lookup in the fast path, but we
only make partial use of this:

We still unnecessarily parse headers, extract proto, {s,d}addr and
{s,d}ports from the skb data, accessing possible conntrack information,
etc even though we were not even calling into the socket lookup via
xt_socket_get_sock_{v4,v6}() due to skb->sk hit, meaning those cycles
can be spared.

After this patch, we only proceed to the slower, manual lookup path
when we have a skb->sk miss, thus time to match verdict for early
demuxed sockets will improve further, which might be interesting e.g.
for use cases such as mentioned in 681f130f39e1 ("netfilter: xt_socket:
add XT_SOCKET_NOWILDCARD flag").
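
The restructuring can be sketched as follows: all header parsing moves
into a slow-path helper that is called only on a skb->sk miss. This is
a toy model with invented sk_model_* names; the parse counter stands in
for the cycles spent extracting addresses and ports, and the "lookup"
is a single hard-coded listener.

```c
#include <stdbool.h>
#include <stddef.h>

struct sk_model_sock {
	unsigned short port;
};

struct sk_model_skb {
	const struct sk_model_sock *sk;	/* set by early demux, or NULL */
	unsigned short dport;
	unsigned int parse_count;	/* how often headers were parsed */
};

/* The only socket known to this model's lookup. */
static const struct sk_model_sock sk_model_listener = { 80 };

/* Slow path: parse the packet, then look the socket up. */
static const struct sk_model_sock *
sk_model_lookup_slow(struct sk_model_skb *skb)
{
	skb->parse_count++;
	return skb->dport == sk_model_listener.port ?
	       &sk_model_listener : NULL;
}

static bool sk_model_socket_match(struct sk_model_skb *skb)
{
	const struct sk_model_sock *sk = skb->sk;

	if (!sk)
		sk = sk_model_lookup_slow(skb);

	return sk != NULL;
}
```

For an early demuxed packet the parse counter stays at zero, which is
the saving the patch is after.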

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_socket.c |   95 ++++++++++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 45 deletions(-)

diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 895534e..e092cb0 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -143,13 +143,10 @@ static bool xt_socket_sk_is_transparent(struct sock *sk)
 	}
 }
 
-static bool
-socket_match(const struct sk_buff *skb, struct xt_action_param *par,
-	     const struct xt_socket_mtinfo1 *info)
+static struct sock *xt_socket_lookup_slow_v4(const struct sk_buff *skb,
+					     const struct net_device *indev)
 {
 	const struct iphdr *iph = ip_hdr(skb);
-	struct udphdr _hdr, *hp = NULL;
-	struct sock *sk = skb->sk;
 	__be32 uninitialized_var(daddr), uninitialized_var(saddr);
 	__be16 uninitialized_var(dport), uninitialized_var(sport);
 	u8 uninitialized_var(protocol);
@@ -159,10 +156,12 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 #endif
 
 	if (iph->protocol == IPPROTO_UDP || iph->protocol == IPPROTO_TCP) {
+		struct udphdr _hdr, *hp;
+
 		hp = skb_header_pointer(skb, ip_hdrlen(skb),
 					sizeof(_hdr), &_hdr);
 		if (hp == NULL)
-			return false;
+			return NULL;
 
 		protocol = iph->protocol;
 		saddr = iph->saddr;
@@ -172,16 +171,17 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 
 	} else if (iph->protocol == IPPROTO_ICMP) {
 		if (extract_icmp4_fields(skb, &protocol, &saddr, &daddr,
-					&sport, &dport))
-			return false;
+					 &sport, &dport))
+			return NULL;
 	} else {
-		return false;
+		return NULL;
 	}
 
 #ifdef XT_SOCKET_HAVE_CONNTRACK
-	/* Do the lookup with the original socket address in case this is a
-	 * reply packet of an established SNAT-ted connection. */
-
+	/* Do the lookup with the original socket address in
+	 * case this is a reply packet of an established
+	 * SNAT-ted connection.
+	 */
 	ct = nf_ct_get(skb, &ctinfo);
 	if (ct && !nf_ct_is_untracked(ct) &&
 	    ((iph->protocol != IPPROTO_ICMP &&
@@ -197,10 +197,18 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 	}
 #endif
 
+	return xt_socket_get_sock_v4(dev_net(skb->dev), protocol, saddr, daddr,
+				     sport, dport, indev);
+}
+
+static bool
+socket_match(const struct sk_buff *skb, struct xt_action_param *par,
+	     const struct xt_socket_mtinfo1 *info)
+{
+	struct sock *sk = skb->sk;
+
 	if (!sk)
-		sk = xt_socket_get_sock_v4(dev_net(skb->dev), protocol,
-					   saddr, daddr, sport, dport,
-					   par->in);
+		sk = xt_socket_lookup_slow_v4(skb, par->in);
 	if (sk) {
 		bool wildcard;
 		bool transparent = true;
@@ -225,12 +233,7 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 			sk = NULL;
 	}
 
-	pr_debug("proto %hhu %pI4:%hu -> %pI4:%hu (orig %pI4:%hu) sock %p\n",
-		 protocol, &saddr, ntohs(sport),
-		 &daddr, ntohs(dport),
-		 &iph->daddr, hp ? ntohs(hp->dest) : 0, sk);
-
-	return (sk != NULL);
+	return sk != NULL;
 }
 
 static bool
@@ -327,28 +330,26 @@ xt_socket_get_sock_v6(struct net *net, const u8 protocol,
 	return NULL;
 }
 
-static bool
-socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
+static struct sock *xt_socket_lookup_slow_v6(const struct sk_buff *skb,
+					     const struct net_device *indev)
 {
-	struct ipv6hdr ipv6_var, *iph = ipv6_hdr(skb);
-	struct udphdr _hdr, *hp = NULL;
-	struct sock *sk = skb->sk;
-	const struct in6_addr *daddr = NULL, *saddr = NULL;
 	__be16 uninitialized_var(dport), uninitialized_var(sport);
-	int thoff = 0, uninitialized_var(tproto);
-	const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
+	const struct in6_addr *daddr = NULL, *saddr = NULL;
+	struct ipv6hdr *iph = ipv6_hdr(skb);
+	int thoff = 0, tproto;
 
 	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
 	if (tproto < 0) {
 		pr_debug("unable to find transport header in IPv6 packet, dropping\n");
-		return NF_DROP;
+		return NULL;
 	}
 
 	if (tproto == IPPROTO_UDP || tproto == IPPROTO_TCP) {
-		hp = skb_header_pointer(skb, thoff,
-					sizeof(_hdr), &_hdr);
+		struct udphdr _hdr, *hp;
+
+		hp = skb_header_pointer(skb, thoff, sizeof(_hdr), &_hdr);
 		if (hp == NULL)
-			return false;
+			return NULL;
 
 		saddr = &iph->saddr;
 		sport = hp->source;
@@ -356,17 +357,27 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
 		dport = hp->dest;
 
 	} else if (tproto == IPPROTO_ICMPV6) {
+		struct ipv6hdr ipv6_var;
+
 		if (extract_icmp6_fields(skb, thoff, &tproto, &saddr, &daddr,
 					 &sport, &dport, &ipv6_var))
-			return false;
+			return NULL;
 	} else {
-		return false;
+		return NULL;
 	}
 
+	return xt_socket_get_sock_v6(dev_net(skb->dev), tproto, saddr, daddr,
+				     sport, dport, indev);
+}
+
+static bool
+socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
+	struct sock *sk = skb->sk;
+
 	if (!sk)
-		sk = xt_socket_get_sock_v6(dev_net(skb->dev), tproto,
-					   saddr, daddr, sport, dport,
-					   par->in);
+		sk = xt_socket_lookup_slow_v6(skb, par->in);
 	if (sk) {
 		bool wildcard;
 		bool transparent = true;
@@ -391,13 +402,7 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
 			sk = NULL;
 	}
 
-	pr_debug("proto %hhd %pI6:%hu -> %pI6:%hu "
-		 "(orig %pI6:%hu) sock %p\n",
-		 tproto, saddr, ntohs(sport),
-		 daddr, ntohs(dport),
-		 &iph->daddr, hp ? ntohs(hp->dest) : 0, sk);
-
-	return (sk != NULL);
+	return sk != NULL;
 }
 #endif
 
-- 
1.7.10.4


* [PATCH 10/20] netfilter: bridge: don't use nf_bridge_info data to store mac header
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (8 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 09/20] netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:34 ` [PATCH 11/20] netfilter: bridge: add helpers for fetching physin/outdev Pablo Neira Ayuso
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

br_netfilter maintains an extra state, nf_bridge_info, which is attached
to skb via skb->nf_bridge pointer.

Amongst other things we use skb->nf_bridge->data to store the original
mac header for every processed skb.

This is required for ip refragmentation when using conntrack
on top of bridge, because ip_fragment doesn't copy it from original skb.

However there is no need anymore to do this unconditionally.

Move this to the one place where it's needed -- when br_netfilter calls
ip_fragment().

Also switch to percpu storage for this so we can handle fragmenting
without accessing nf_bridge meta data.

The only user left is neigh resolution when DNAT is detected, to hold
the original source mac address (neigh resolution builds a new mac
header using the bridge mac), so rename ->data and reduce its size to
what's needed.
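
The save-once, push-per-fragment idea can be modeled in userspace like
this. A single static slot stands in for the per-CPU area, which is
only valid under the assumption that saving and restoring happen back
to back on the same CPU. The mh_model_* names and sizes are chosen for
this sketch and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <string.h>

#define MH_MODEL_ETH_HLEN  14	/* Ethernet header length */
#define MH_MODEL_MAX_ENCAP 8	/* largest encapsulation in this model */

struct mh_model_frag_data {
	unsigned char mac[MH_MODEL_ETH_HLEN + MH_MODEL_MAX_ENCAP];
	unsigned char encap_size;
	unsigned char size;
};

/* Stand-in for the per-CPU slot. */
static struct mh_model_frag_data mh_model_storage;

/* Copy the L2 header that sits right in front of the network header. */
static bool mh_model_save_header(const unsigned char *network_hdr,
				 unsigned char encap_size)
{
	if (encap_size > MH_MODEL_MAX_ENCAP)
		return false;

	mh_model_storage.encap_size = encap_size;
	mh_model_storage.size = MH_MODEL_ETH_HLEN + encap_size;
	memcpy(mh_model_storage.mac,
	       network_hdr - mh_model_storage.size,
	       mh_model_storage.size);
	return true;
}

/* Write the saved L2 header in front of one fragment. */
static void mh_model_restore_header(unsigned char *frag_network_hdr)
{
	memcpy(frag_network_hdr - mh_model_storage.size,
	       mh_model_storage.mac, mh_model_storage.size);
}
```

The caller has to guarantee enough headroom in front of each fragment,
which is the role skb_cow_head() plays in the patch.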

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/skbuff.h    |    2 +-
 net/bridge/br_netfilter.c |   70 ++++++++++++++++++++++++++-------------------
 2 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 36f3f43..f66a089 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -169,7 +169,7 @@ struct nf_bridge_info {
 	unsigned int		mask;
 	struct net_device	*physindev;
 	struct net_device	*physoutdev;
-	unsigned long		data[32 / sizeof(unsigned long)];
+	char			neigh_header[8];
 };
 #endif
 
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 282ed76..ca1cb67 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -111,6 +111,19 @@ static inline __be16 pppoe_proto(const struct sk_buff *skb)
 	 pppoe_proto(skb) == htons(PPP_IPV6) && \
 	 brnf_filter_pppoe_tagged)
 
+/* largest possible L2 header, see br_nf_dev_queue_xmit() */
+#define NF_BRIDGE_MAX_MAC_HEADER_LENGTH (PPPOE_SES_HLEN + ETH_HLEN)
+
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
+struct brnf_frag_data {
+	char mac[NF_BRIDGE_MAX_MAC_HEADER_LENGTH];
+	u8 encap_size;
+	u8 size;
+};
+
+static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage);
+#endif
+
 static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
 {
 	struct net_bridge_port *port;
@@ -189,14 +202,6 @@ static inline void nf_bridge_pull_encap_header_rcsum(struct sk_buff *skb)
 	skb->network_header += len;
 }
 
-static inline void nf_bridge_save_header(struct sk_buff *skb)
-{
-	int header_size = ETH_HLEN + nf_bridge_encap_header_len(skb);
-
-	skb_copy_from_linear_data_offset(skb, -header_size,
-					 skb->nf_bridge->data, header_size);
-}
-
 /* When handing a packet over to the IP layer
  * check whether we have a skb that is in the
  * expected format
@@ -318,7 +323,7 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 			 */
 			skb_copy_from_linear_data_offset(skb,
 							 -(ETH_HLEN-ETH_ALEN),
-							 skb->nf_bridge->data,
+							 nf_bridge->neigh_header,
 							 ETH_HLEN-ETH_ALEN);
 			/* tell br_dev_xmit to continue with forwarding */
 			nf_bridge->mask |= BRNF_BRIDGED_DNAT;
@@ -810,30 +815,22 @@ static unsigned int br_nf_forward_arp(const struct nf_hook_ops *ops,
 }
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
-static bool nf_bridge_copy_header(struct sk_buff *skb)
+static int br_nf_push_frag_xmit(struct sk_buff *skb)
 {
+	struct brnf_frag_data *data;
 	int err;
-	unsigned int header_size;
 
-	nf_bridge_update_protocol(skb);
-	header_size = ETH_HLEN + nf_bridge_encap_header_len(skb);
-	err = skb_cow_head(skb, header_size);
-	if (err)
-		return false;
+	data = this_cpu_ptr(&brnf_frag_data_storage);
+	err = skb_cow_head(skb, data->size);
 
-	skb_copy_to_linear_data_offset(skb, -header_size,
-				       skb->nf_bridge->data, header_size);
-	__skb_push(skb, nf_bridge_encap_header_len(skb));
-	return true;
-}
-
-static int br_nf_push_frag_xmit(struct sk_buff *skb)
-{
-	if (!nf_bridge_copy_header(skb)) {
+	if (err) {
 		kfree_skb(skb);
 		return 0;
 	}
 
+	skb_copy_to_linear_data_offset(skb, -data->size, data->mac, data->size);
+	__skb_push(skb, data->encap_size);
+
 	return br_dev_queue_push_xmit(skb);
 }
 
@@ -851,14 +848,27 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 	 * boundaries by preserving frag_list rather than refragmenting.
 	 */
 	if (skb->len + mtu_reserved > skb->dev->mtu) {
+		struct brnf_frag_data *data;
+
 		frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
 		if (br_parse_ip_options(skb))
 			/* Drop invalid packet */
 			return NF_DROP;
 		IPCB(skb)->frag_max_size = frag_max_size;
+
+		nf_bridge_update_protocol(skb);
+
+		data = this_cpu_ptr(&brnf_frag_data_storage);
+		data->encap_size = nf_bridge_encap_header_len(skb);
+		data->size = ETH_HLEN + data->encap_size;
+
+		skb_copy_from_linear_data_offset(skb, -data->size, data->mac,
+						 data->size);
+
 		ret = ip_fragment(skb, br_nf_push_frag_xmit);
-	} else
+	} else {
 		ret = br_dev_queue_push_xmit(skb);
+	}
 
 	return ret;
 }
@@ -906,7 +916,6 @@ static unsigned int br_nf_post_routing(const struct nf_hook_ops *ops,
 	}
 
 	nf_bridge_pull_encap_header(skb);
-	nf_bridge_save_header(skb);
 	if (pf == NFPROTO_IPV4)
 		skb->protocol = htons(ETH_P_IP);
 	else
@@ -951,8 +960,11 @@ static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb)
 	skb_pull(skb, ETH_HLEN);
 	nf_bridge->mask &= ~BRNF_BRIDGED_DNAT;
 
-	skb_copy_to_linear_data_offset(skb, -(ETH_HLEN-ETH_ALEN),
-				       skb->nf_bridge->data, ETH_HLEN-ETH_ALEN);
+	BUILD_BUG_ON(sizeof(nf_bridge->neigh_header) != (ETH_HLEN - ETH_ALEN));
+
+	skb_copy_to_linear_data_offset(skb, -(ETH_HLEN - ETH_ALEN),
+				       nf_bridge->neigh_header,
+				       ETH_HLEN - ETH_ALEN);
 	skb->dev = nf_bridge->physindev;
 	br_handle_frame_finish(skb);
 }
-- 
1.7.10.4


* [PATCH 11/20] netfilter: bridge: add helpers for fetching physin/outdev
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (9 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 10/20] netfilter: bridge: don't use nf_bridge_info data to store mac header Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:34 ` [PATCH 12/20] netfilter: physdev: use helpers Pablo Neira Ayuso
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

Right now we store the physical in/out devices in the nf_bridge_info
struct, accessible via skb->nf_bridge.  This patch prepares removal of
this pointer from skb:

Instead of using skb->nf_bridge->x, we use helpers to obtain the in/out
device (or ifindexes).

Followup patches to netfilter will then allow nf_bridge_info to be
obtained by a call into the br_netfilter core, rather than keeping a
pointer to it in sk_buff.
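
The accessor pattern can be sketched in isolation as below, with
ph_model_* names invented for this example. One deliberate difference
from the helpers in this patch: the ifindex accessors here also check
the device pointer, which is a choice of the sketch and not something
the patch does.

```c
#include <stddef.h>

struct ph_model_net_device {
	int ifindex;
	const char *name;
};

struct ph_model_bridge_info {
	struct ph_model_net_device *physindev;
	struct ph_model_net_device *physoutdev;
};

struct ph_model_skb {
	struct ph_model_bridge_info *nf_bridge;	/* NULL if not bridged */
};

static struct ph_model_net_device *
ph_model_get_physindev(const struct ph_model_skb *skb)
{
	return skb->nf_bridge ? skb->nf_bridge->physindev : NULL;
}

static struct ph_model_net_device *
ph_model_get_physoutdev(const struct ph_model_skb *skb)
{
	return skb->nf_bridge ? skb->nf_bridge->physoutdev : NULL;
}

/* Interface index of the physical input device, or 0 if unknown. */
static int ph_model_get_physinif(const struct ph_model_skb *skb)
{
	const struct ph_model_net_device *dev = ph_model_get_physindev(skb);

	return dev ? dev->ifindex : 0;
}

/* Interface index of the physical output device, or 0 if unknown. */
static int ph_model_get_physoutif(const struct ph_model_skb *skb)
{
	const struct ph_model_net_device *dev = ph_model_get_physoutdev(skb);

	return dev ? dev->ifindex : 0;
}
```

Callers written against such helpers no longer care where the bridge
state lives, which is what allows the pointer to be moved out of the
skb later.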

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter_bridge.h           |   23 +++++++++++++++++++-
 net/ipv4/netfilter/nf_reject_ipv4.c        |    4 +++-
 net/ipv6/netfilter/nf_reject_ipv6.c        |    4 +++-
 net/netfilter/ipset/ip_set_hash_netiface.c |   32 +++++++++++++++++++++-------
 net/netfilter/nf_log_common.c              |    5 +++--
 net/netfilter/nf_queue.c                   |   18 +++++++++-------
 net/netfilter/nfnetlink_log.c              |   17 +++++++++++----
 net/netfilter/nfnetlink_queue_core.c       |   28 ++++++++++++++++--------
 8 files changed, 97 insertions(+), 34 deletions(-)

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index 2734977..e1d96bc 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -2,7 +2,7 @@
 #define __LINUX_BRIDGE_NETFILTER_H
 
 #include <uapi/linux/netfilter_bridge.h>
-
+#include <linux/skbuff.h>
 
 enum nf_br_hook_priorities {
 	NF_BR_PRI_FIRST = INT_MIN,
@@ -40,6 +40,27 @@ static inline void br_drop_fake_rtable(struct sk_buff *skb)
 		skb_dst_drop(skb);
 }
 
+static inline int nf_bridge_get_physinif(const struct sk_buff *skb)
+{
+	return skb->nf_bridge ? skb->nf_bridge->physindev->ifindex : 0;
+}
+
+static inline int nf_bridge_get_physoutif(const struct sk_buff *skb)
+{
+	return skb->nf_bridge ? skb->nf_bridge->physoutdev->ifindex : 0;
+}
+
+static inline struct net_device *
+nf_bridge_get_physindev(const struct sk_buff *skb)
+{
+	return skb->nf_bridge ? skb->nf_bridge->physindev : NULL;
+}
+
+static inline struct net_device *
+nf_bridge_get_physoutdev(const struct sk_buff *skb)
+{
+	return skb->nf_bridge ? skb->nf_bridge->physoutdev : NULL;
+}
 #else
 #define br_drop_fake_rtable(skb)	        do { } while (0)
 #endif /* CONFIG_BRIDGE_NETFILTER */
diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index c5b794d..3262e41 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -13,6 +13,7 @@
 #include <net/dst.h>
 #include <net/netfilter/ipv4/nf_reject.h>
 #include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_bridge.h>
 #include <net/netfilter/ipv4/nf_reject.h>
 
 const struct tcphdr *nf_reject_ip_tcphdr_get(struct sk_buff *oldskb,
@@ -146,7 +147,8 @@ void nf_send_reset(struct sk_buff *oldskb, int hook)
 	 */
 	if (oldskb->nf_bridge) {
 		struct ethhdr *oeth = eth_hdr(oldskb);
-		nskb->dev = oldskb->nf_bridge->physindev;
+
+		nskb->dev = nf_bridge_get_physindev(oldskb);
 		niph->tot_len = htons(nskb->len);
 		ip_send_check(niph);
 		if (dev_hard_header(nskb, nskb->dev, ntohs(nskb->protocol),
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 3afdce0..94b4c6d 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -13,6 +13,7 @@
 #include <net/ip6_checksum.h>
 #include <net/netfilter/ipv6/nf_reject.h>
 #include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_bridge.h>
 #include <net/netfilter/ipv6/nf_reject.h>
 
 const struct tcphdr *nf_reject_ip6_tcphdr_get(struct sk_buff *oldskb,
@@ -195,7 +196,8 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
 	 */
 	if (oldskb->nf_bridge) {
 		struct ethhdr *oeth = eth_hdr(oldskb);
-		nskb->dev = oldskb->nf_bridge->physindev;
+
+		nskb->dev = nf_bridge_get_physindev(oldskb);
 		nskb->protocol = htons(ETH_P_IPV6);
 		ip6h->payload_len = htons(sizeof(struct tcphdr));
 		if (dev_hard_header(nskb, nskb->dev, ntohs(nskb->protocol),
diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c
index 758b002..380ef51 100644
--- a/net/netfilter/ipset/ip_set_hash_netiface.c
+++ b/net/netfilter/ipset/ip_set_hash_netiface.c
@@ -19,6 +19,7 @@
 #include <net/netlink.h>
 
 #include <linux/netfilter.h>
+#include <linux/netfilter_bridge.h>
 #include <linux/netfilter/ipset/pfxlen.h>
 #include <linux/netfilter/ipset/ip_set.h>
 #include <linux/netfilter/ipset/ip_set_hash.h>
@@ -211,6 +212,22 @@ hash_netiface4_data_next(struct hash_netiface4_elem *next,
 #define HKEY_DATALEN	sizeof(struct hash_netiface4_elem_hashed)
 #include "ip_set_hash_gen.h"
 
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+static const char *get_physindev_name(const struct sk_buff *skb)
+{
+	struct net_device *dev = nf_bridge_get_physindev(skb);
+
+	return dev ? dev->name : NULL;
+}
+
+static const char *get_phyoutdev_name(const struct sk_buff *skb)
+{
+	struct net_device *dev = nf_bridge_get_physoutdev(skb);
+
+	return dev ? dev->name : NULL;
+}
+#endif
+
 static int
 hash_netiface4_kadt(struct ip_set *set, const struct sk_buff *skb,
 		    const struct xt_action_param *par,
@@ -234,16 +251,15 @@ hash_netiface4_kadt(struct ip_set *set, const struct sk_buff *skb,
 	e.ip &= ip_set_netmask(e.cidr);
 
 #define IFACE(dir)	(par->dir ? par->dir->name : NULL)
-#define PHYSDEV(dir)	(nf_bridge->dir ? nf_bridge->dir->name : NULL)
 #define SRCDIR		(opt->flags & IPSET_DIM_TWO_SRC)
 
 	if (opt->cmdflags & IPSET_FLAG_PHYSDEV) {
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-		const struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+		e.iface = SRCDIR ? get_physindev_name(skb) :
+				   get_phyoutdev_name(skb);
 
-		if (!nf_bridge)
+		if (!e.iface)
 			return -EINVAL;
-		e.iface = SRCDIR ? PHYSDEV(physindev) : PHYSDEV(physoutdev);
 		e.physdev = 1;
 #else
 		e.iface = NULL;
@@ -476,11 +492,11 @@ hash_netiface6_kadt(struct ip_set *set, const struct sk_buff *skb,
 
 	if (opt->cmdflags & IPSET_FLAG_PHYSDEV) {
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-		const struct nf_bridge_info *nf_bridge = skb->nf_bridge;
-
-		if (!nf_bridge)
+		e.iface = SRCDIR ? get_physindev_name(skb) :
+				   get_phyoutdev_name(skb);
+		if (!e.iface)
 			return -EINVAL;
-		e.iface = SRCDIR ? PHYSDEV(physindev) : PHYSDEV(physoutdev);
+
 		e.physdev = 1;
 #else
 		e.iface = NULL;
diff --git a/net/netfilter/nf_log_common.c b/net/netfilter/nf_log_common.c
index 2631876..a5aa596 100644
--- a/net/netfilter/nf_log_common.c
+++ b/net/netfilter/nf_log_common.c
@@ -17,6 +17,7 @@
 #include <net/route.h>
 
 #include <linux/netfilter.h>
+#include <linux/netfilter_bridge.h>
 #include <linux/netfilter/xt_LOG.h>
 #include <net/netfilter/nf_log.h>
 
@@ -163,10 +164,10 @@ nf_log_dump_packet_common(struct nf_log_buf *m, u_int8_t pf,
 		const struct net_device *physindev;
 		const struct net_device *physoutdev;
 
-		physindev = skb->nf_bridge->physindev;
+		physindev = nf_bridge_get_physindev(skb);
 		if (physindev && in != physindev)
 			nf_log_buf_add(m, "PHYSIN=%s ", physindev->name);
-		physoutdev = skb->nf_bridge->physoutdev;
+		physoutdev = nf_bridge_get_physoutdev(skb);
 		if (physoutdev && out != physoutdev)
 			nf_log_buf_add(m, "PHYSOUT=%s ", physoutdev->name);
 	}
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 4c8b68e..fb045b4 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -10,6 +10,7 @@
 #include <linux/proc_fs.h>
 #include <linux/skbuff.h>
 #include <linux/netfilter.h>
+#include <linux/netfilter_bridge.h>
 #include <linux/seq_file.h>
 #include <linux/rcupdate.h>
 #include <net/protocol.h>
@@ -54,12 +55,14 @@ void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
 		dev_put(entry->outdev);
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (entry->skb->nf_bridge) {
-		struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge;
+		struct net_device *physdev;
 
-		if (nf_bridge->physindev)
-			dev_put(nf_bridge->physindev);
-		if (nf_bridge->physoutdev)
-			dev_put(nf_bridge->physoutdev);
+		physdev = nf_bridge_get_physindev(entry->skb);
+		if (physdev)
+			dev_put(physdev);
+		physdev = nf_bridge_get_physoutdev(entry->skb);
+		if (physdev)
+			dev_put(physdev);
 	}
 #endif
 	/* Drop reference to owner of hook which queued us. */
@@ -79,13 +82,12 @@ bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
 		dev_hold(entry->outdev);
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (entry->skb->nf_bridge) {
-		struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge;
 		struct net_device *physdev;
 
-		physdev = nf_bridge->physindev;
+		physdev = nf_bridge_get_physindev(entry->skb);
 		if (physdev)
 			dev_hold(physdev);
-		physdev = nf_bridge->physoutdev;
+		physdev = nf_bridge_get_physoutdev(entry->skb);
 		if (physdev)
 			dev_hold(physdev);
 	}
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 957b83a..51afea4 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -23,6 +23,7 @@
 #include <linux/ipv6.h>
 #include <linux/netdevice.h>
 #include <linux/netfilter.h>
+#include <linux/netfilter_bridge.h>
 #include <net/netlink.h>
 #include <linux/netfilter/nfnetlink.h>
 #include <linux/netfilter/nfnetlink_log.h>
@@ -448,14 +449,18 @@ __build_packet_message(struct nfnl_log_net *log,
 					 htonl(br_port_get_rcu(indev)->br->dev->ifindex)))
 				goto nla_put_failure;
 		} else {
+			struct net_device *physindev;
+
 			/* Case 2: indev is bridge group, we need to look for
 			 * physical device (when called from ipv4) */
 			if (nla_put_be32(inst->skb, NFULA_IFINDEX_INDEV,
 					 htonl(indev->ifindex)))
 				goto nla_put_failure;
-			if (skb->nf_bridge && skb->nf_bridge->physindev &&
+
+			physindev = nf_bridge_get_physindev(skb);
+			if (physindev &&
 			    nla_put_be32(inst->skb, NFULA_IFINDEX_PHYSINDEV,
-					 htonl(skb->nf_bridge->physindev->ifindex)))
+					 htonl(physindev->ifindex)))
 				goto nla_put_failure;
 		}
 #endif
@@ -479,14 +484,18 @@ __build_packet_message(struct nfnl_log_net *log,
 					 htonl(br_port_get_rcu(outdev)->br->dev->ifindex)))
 				goto nla_put_failure;
 		} else {
+			struct net_device *physoutdev;
+
 			/* Case 2: indev is a bridge group, we need to look
 			 * for physical device (when called from ipv4) */
 			if (nla_put_be32(inst->skb, NFULA_IFINDEX_OUTDEV,
 					 htonl(outdev->ifindex)))
 				goto nla_put_failure;
-			if (skb->nf_bridge && skb->nf_bridge->physoutdev &&
+
+			physoutdev = nf_bridge_get_physoutdev(skb);
+			if (physoutdev &&
 			    nla_put_be32(inst->skb, NFULA_IFINDEX_PHYSOUTDEV,
-					 htonl(skb->nf_bridge->physoutdev->ifindex)))
+					 htonl(physoutdev->ifindex)))
 				goto nla_put_failure;
 		}
 #endif
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 86ee8b0..94e1aaf 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -25,6 +25,7 @@
 #include <linux/proc_fs.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_bridge.h>
 #include <linux/netfilter/nfnetlink.h>
 #include <linux/netfilter/nfnetlink_queue.h>
 #include <linux/list.h>
@@ -396,14 +397,18 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 					 htonl(br_port_get_rcu(indev)->br->dev->ifindex)))
 				goto nla_put_failure;
 		} else {
+			int physinif;
+
 			/* Case 2: indev is bridge group, we need to look for
 			 * physical device (when called from ipv4) */
 			if (nla_put_be32(skb, NFQA_IFINDEX_INDEV,
 					 htonl(indev->ifindex)))
 				goto nla_put_failure;
-			if (entskb->nf_bridge && entskb->nf_bridge->physindev &&
+
+			physinif = nf_bridge_get_physinif(entskb);
+			if (physinif &&
 			    nla_put_be32(skb, NFQA_IFINDEX_PHYSINDEV,
-					 htonl(entskb->nf_bridge->physindev->ifindex)))
+					 htonl(physinif)))
 				goto nla_put_failure;
 		}
 #endif
@@ -426,14 +431,18 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 					 htonl(br_port_get_rcu(outdev)->br->dev->ifindex)))
 				goto nla_put_failure;
 		} else {
+			int physoutif;
+
 			/* Case 2: outdev is bridge group, we need to look for
 			 * physical output device (when called from ipv4) */
 			if (nla_put_be32(skb, NFQA_IFINDEX_OUTDEV,
 					 htonl(outdev->ifindex)))
 				goto nla_put_failure;
-			if (entskb->nf_bridge && entskb->nf_bridge->physoutdev &&
+
+			physoutif = nf_bridge_get_physoutif(entskb);
+			if (physoutif &&
 			    nla_put_be32(skb, NFQA_IFINDEX_PHYSOUTDEV,
-					 htonl(entskb->nf_bridge->physoutdev->ifindex)))
+					 htonl(physoutif)))
 				goto nla_put_failure;
 		}
 #endif
@@ -765,11 +774,12 @@ dev_cmp(struct nf_queue_entry *entry, unsigned long ifindex)
 			return 1;
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	if (entry->skb->nf_bridge) {
-		if (entry->skb->nf_bridge->physindev &&
-		    entry->skb->nf_bridge->physindev->ifindex == ifindex)
-			return 1;
-		if (entry->skb->nf_bridge->physoutdev &&
-		    entry->skb->nf_bridge->physoutdev->ifindex == ifindex)
+		int physinif, physoutif;
+
+		physinif = nf_bridge_get_physinif(entry->skb);
+		physoutif = nf_bridge_get_physoutif(entry->skb);
+
+		if (physinif == ifindex || physoutif == ifindex)
 			return 1;
 	}
 #endif
-- 
1.7.10.4


* [PATCH 12/20] netfilter: physdev: use helpers
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (10 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 11/20] netfilter: bridge: add helpers for fetching physin/outdev Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:34 ` [PATCH 13/20] netfilter: bridge: add and use nf_bridge_info_get helper Pablo Neira Ayuso
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

Avoid skb->nf_bridge accesses where possible.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_physdev.c |   34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)
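
For readers without patch 11 at hand, the accessor pattern this patch relies on can be sketched in userspace C. The `model_` types and functions below are hypothetical stand-ins, not the kernel definitions. The only behaviour assumed is what the call sites in this series suggest: the device helper yields NULL when no bridge info or device is attached, and the ifindex helper yields 0 in that case.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_net_device {
	const char *name;
	int ifindex;
};

struct model_physdevs {
	struct model_net_device *physindev;
	struct model_net_device *physoutdev;
};

struct model_skb {
	struct model_physdevs *nf_bridge;	/* NULL for non-bridged packets */
};

/* Return the physical input device, or NULL if none is recorded. */
static struct model_net_device *model_get_physindev(const struct model_skb *skb)
{
	return skb->nf_bridge ? skb->nf_bridge->physindev : NULL;
}

/* Return the physical input ifindex, or 0 if none is recorded. */
static int model_get_physinif(const struct model_skb *skb)
{
	const struct model_net_device *dev = model_get_physindev(skb);

	return dev ? dev->ifindex : 0;
}
```

With accessors of this shape, a caller such as physdev_mt() only has to test the returned pointer and never dereferences the bridge info itself. The output-side helpers would be symmetric.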

diff --git a/net/netfilter/xt_physdev.c b/net/netfilter/xt_physdev.c
index 50a5204..1caaccb 100644
--- a/net/netfilter/xt_physdev.c
+++ b/net/netfilter/xt_physdev.c
@@ -25,16 +25,15 @@ MODULE_ALIAS("ip6t_physdev");
 static bool
 physdev_mt(const struct sk_buff *skb, struct xt_action_param *par)
 {
-	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	const struct xt_physdev_info *info = par->matchinfo;
+	const struct net_device *physdev;
 	unsigned long ret;
 	const char *indev, *outdev;
-	const struct nf_bridge_info *nf_bridge;
 
 	/* Not a bridged IP packet or no info available yet:
 	 * LOCAL_OUT/mangle and LOCAL_OUT/nat don't know if
 	 * the destination device will be a bridge. */
-	if (!(nf_bridge = skb->nf_bridge)) {
+	if (!skb->nf_bridge) {
 		/* Return MATCH if the invert flags of the used options are on */
 		if ((info->bitmask & XT_PHYSDEV_OP_BRIDGED) &&
 		    !(info->invert & XT_PHYSDEV_OP_BRIDGED))
@@ -54,30 +53,41 @@ physdev_mt(const struct sk_buff *skb, struct xt_action_param *par)
 		return true;
 	}
 
+	physdev = nf_bridge_get_physoutdev(skb);
+	outdev = physdev ? physdev->name : NULL;
+
 	/* This only makes sense in the FORWARD and POSTROUTING chains */
 	if ((info->bitmask & XT_PHYSDEV_OP_BRIDGED) &&
-	    (!!nf_bridge->physoutdev ^ !(info->invert & XT_PHYSDEV_OP_BRIDGED)))
+	    (!!outdev ^ !(info->invert & XT_PHYSDEV_OP_BRIDGED)))
 		return false;
 
+	physdev = nf_bridge_get_physindev(skb);
+	indev = physdev ? physdev->name : NULL;
+
 	if ((info->bitmask & XT_PHYSDEV_OP_ISIN &&
-	    (!nf_bridge->physindev ^ !!(info->invert & XT_PHYSDEV_OP_ISIN))) ||
+	    (!indev ^ !!(info->invert & XT_PHYSDEV_OP_ISIN))) ||
 	    (info->bitmask & XT_PHYSDEV_OP_ISOUT &&
-	    (!nf_bridge->physoutdev ^ !!(info->invert & XT_PHYSDEV_OP_ISOUT))))
+	    (!outdev ^ !!(info->invert & XT_PHYSDEV_OP_ISOUT))))
 		return false;
 
 	if (!(info->bitmask & XT_PHYSDEV_OP_IN))
 		goto match_outdev;
-	indev = nf_bridge->physindev ? nf_bridge->physindev->name : nulldevname;
-	ret = ifname_compare_aligned(indev, info->physindev, info->in_mask);
 
-	if (!ret ^ !(info->invert & XT_PHYSDEV_OP_IN))
-		return false;
+	if (indev) {
+		ret = ifname_compare_aligned(indev, info->physindev,
+					     info->in_mask);
+
+		if (!ret ^ !(info->invert & XT_PHYSDEV_OP_IN))
+			return false;
+	}
 
 match_outdev:
 	if (!(info->bitmask & XT_PHYSDEV_OP_OUT))
 		return true;
-	outdev = nf_bridge->physoutdev ?
-		 nf_bridge->physoutdev->name : nulldevname;
+
+	if (!outdev)
+		return false;
+
 	ret = ifname_compare_aligned(outdev, info->physoutdev, info->out_mask);
 
 	return (!!ret ^ !(info->invert & XT_PHYSDEV_OP_OUT));
-- 
1.7.10.4



* [PATCH 13/20] netfilter: bridge: add and use nf_bridge_info_get helper
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (11 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 12/20] netfilter: physdev: use helpers Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:34 ` [PATCH 14/20] netfilter: bridge: start splitting mask into public/private chunks Pablo Neira Ayuso
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

Don't access skb->nf_bridge directly; this pointer will be removed soon.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/br_netfilter.c |   24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index ca1cb67..301f12b 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -124,6 +124,11 @@ struct brnf_frag_data {
 static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage);
 #endif
 
+static struct nf_bridge_info *nf_bridge_info_get(const struct sk_buff *skb)
+{
+	return skb->nf_bridge;
+}
+
 static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
 {
 	struct net_bridge_port *port;
@@ -268,7 +273,7 @@ static void nf_bridge_update_protocol(struct sk_buff *skb)
  * bridge PRE_ROUTING hook. */
 static int br_nf_pre_routing_finish_ipv6(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct rtable *rt;
 
 	if (nf_bridge->mask & BRNF_PKT_TYPE) {
@@ -300,7 +305,6 @@ static int br_nf_pre_routing_finish_ipv6(struct sk_buff *skb)
  */
 static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
 	struct neighbour *neigh;
 	struct dst_entry *dst;
 
@@ -310,6 +314,7 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 	dst = skb_dst(skb);
 	neigh = dst_neigh_lookup_skb(dst, skb);
 	if (neigh) {
+		struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 		int ret;
 
 		if (neigh->hh.hh_len) {
@@ -396,7 +401,7 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
 {
 	struct net_device *dev = skb->dev;
 	struct iphdr *iph = ip_hdr(skb);
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct rtable *rt;
 	int err;
 	int frag_max_size;
@@ -488,7 +493,7 @@ static struct net_device *brnf_get_logical_dev(struct sk_buff *skb, const struct
 /* Some common code for IPv4/IPv6 */
 static struct net_device *setup_pre_routing(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
@@ -687,7 +692,7 @@ static unsigned int br_nf_local_in(const struct nf_hook_ops *ops,
 /* PF_BRIDGE/FORWARD *************************************************/
 static int br_nf_forward_finish(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct net_device *in;
 
 	if (!IS_ARP(skb) && !IS_VLAN_ARP(skb)) {
@@ -738,6 +743,10 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 	if (!nf_bridge_unshare(skb))
 		return NF_DROP;
 
+	nf_bridge = nf_bridge_info_get(skb);
+	if (!nf_bridge)
+		return NF_DROP;
+
 	parent = bridge_parent(out);
 	if (!parent)
 		return NF_DROP;
@@ -751,7 +760,6 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 
 	nf_bridge_pull_encap_header(skb);
 
-	nf_bridge = skb->nf_bridge;
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
 		nf_bridge->mask |= BRNF_PKT_TYPE;
@@ -886,7 +894,7 @@ static unsigned int br_nf_post_routing(const struct nf_hook_ops *ops,
 				       const struct net_device *out,
 				       int (*okfn)(struct sk_buff *))
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct net_device *realoutdev = bridge_parent(skb->dev);
 	u_int8_t pf;
 
@@ -955,7 +963,7 @@ static unsigned int ip_sabotage_in(const struct nf_hook_ops *ops,
  */
 static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb)
 {
-	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 
 	skb_pull(skb, ETH_HLEN);
 	nf_bridge->mask &= ~BRNF_BRIDGED_DNAT;
-- 
1.7.10.4


* [PATCH 14/20] netfilter: bridge: start splitting mask into public/private chunks
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (12 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 13/20] netfilter: bridge: add and use nf_bridge_info_get helper Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:34 ` [PATCH 15/20] netfilter: bridge: make BRNF_PKT_TYPE flag a bool Pablo Neira Ayuso
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

->mask is a bit info field that mixes various use cases.

In particular, we have flags that are mutually exclusive, and flags that
are only used within br_netfilter while others need to be exposed to
other parts of the kernel.

Remove BRNF_8021Q/PPPoE flags.  They're mutually exclusive and only
needed within br_netfilter context.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter_bridge.h |    4 +---
 include/linux/skbuff.h           |    5 +++++
 net/bridge/br_netfilter.c        |   15 +++++++++++----
 3 files changed, 17 insertions(+), 7 deletions(-)
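
The reasoning in the commit message (two mutually exclusive flag bits collapse into one enumerated field) can be illustrated with a small userspace sketch. All `model_`/`MODEL_` names are hypothetical, and the EtherType constants are written in host byte order for simplicity, whereas the kernel code compares against htons() values.

```c
#include <stdint.h>

/* Illustrative host-order EtherType values. */
#define MODEL_ETH_P_IP		0x0800
#define MODEL_ETH_P_8021Q	0x8100
#define MODEL_ETH_P_PPP_SES	0x8864

enum model_orig_proto {
	MODEL_PROTO_UNCHANGED,
	MODEL_PROTO_8021Q,
	MODEL_PROTO_PPPOE
};

/* Remember which encapsulation the frame arrived with. */
static enum model_orig_proto model_record_proto(uint16_t protocol)
{
	if (protocol == MODEL_ETH_P_8021Q)
		return MODEL_PROTO_8021Q;
	if (protocol == MODEL_ETH_P_PPP_SES)
		return MODEL_PROTO_PPPOE;
	return MODEL_PROTO_UNCHANGED;
}

/* Restore the original protocol; leave it alone if nothing was recorded. */
static uint16_t model_restore_proto(enum model_orig_proto orig, uint16_t current)
{
	switch (orig) {
	case MODEL_PROTO_8021Q:
		return MODEL_ETH_P_8021Q;
	case MODEL_PROTO_PPPOE:
		return MODEL_ETH_P_PPP_SES;
	case MODEL_PROTO_UNCHANGED:
		break;
	}
	return current;
}
```

An enum makes the "at most one of these" invariant explicit, and a switch over it lets the compiler warn about unhandled values, which a pair of independent mask bits cannot do.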

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index e1d96bc..d47a32d 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -20,12 +20,10 @@ enum nf_br_hook_priorities {
 #define BRNF_PKT_TYPE			0x01
 #define BRNF_BRIDGED_DNAT		0x02
 #define BRNF_NF_BRIDGE_PREROUTING	0x08
-#define BRNF_8021Q			0x10
-#define BRNF_PPPoE			0x20
 
 static inline unsigned int nf_bridge_mtu_reduction(const struct sk_buff *skb)
 {
-	if (unlikely(skb->nf_bridge->mask & BRNF_PPPoE))
+	if (skb->nf_bridge->orig_proto == BRNF_PROTO_PPPOE)
 		return PPPOE_SES_HLEN;
 	return 0;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f66a089..6f75fb5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -166,6 +166,11 @@ struct nf_conntrack {
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 struct nf_bridge_info {
 	atomic_t		use;
+	enum {
+		BRNF_PROTO_UNCHANGED,
+		BRNF_PROTO_8021Q,
+		BRNF_PROTO_PPPOE
+	} orig_proto;
 	unsigned int		mask;
 	struct net_device	*physindev;
 	struct net_device	*physoutdev;
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 301f12b..ab1e988 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -262,10 +262,16 @@ drop:
 
 static void nf_bridge_update_protocol(struct sk_buff *skb)
 {
-	if (skb->nf_bridge->mask & BRNF_8021Q)
+	switch (skb->nf_bridge->orig_proto) {
+	case BRNF_PROTO_8021Q:
 		skb->protocol = htons(ETH_P_8021Q);
-	else if (skb->nf_bridge->mask & BRNF_PPPoE)
+		break;
+	case BRNF_PROTO_PPPOE:
 		skb->protocol = htons(ETH_P_PPP_SES);
+		break;
+	case BRNF_PROTO_UNCHANGED:
+		break;
+	}
 }
 
 /* PF_BRIDGE/PRE_ROUTING *********************************************/
@@ -503,10 +509,11 @@ static struct net_device *setup_pre_routing(struct sk_buff *skb)
 	nf_bridge->mask |= BRNF_NF_BRIDGE_PREROUTING;
 	nf_bridge->physindev = skb->dev;
 	skb->dev = brnf_get_logical_dev(skb, skb->dev);
+
 	if (skb->protocol == htons(ETH_P_8021Q))
-		nf_bridge->mask |= BRNF_8021Q;
+		nf_bridge->orig_proto = BRNF_PROTO_8021Q;
 	else if (skb->protocol == htons(ETH_P_PPP_SES))
-		nf_bridge->mask |= BRNF_PPPoE;
+		nf_bridge->orig_proto = BRNF_PROTO_PPPOE;
 
 	/* Must drop socket now because of tproxy. */
 	skb_orphan(skb);
-- 
1.7.10.4



* [PATCH 15/20] netfilter: bridge: make BRNF_PKT_TYPE flag a bool
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (13 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 14/20] netfilter: bridge: start splitting mask into public/private chunks Pablo Neira Ayuso
@ 2015-04-09 11:34 ` Pablo Neira Ayuso
  2015-04-09 11:35 ` [PATCH 16/20] netfilter: nf_tables: fix set selection when timeouts are requested Pablo Neira Ayuso
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

nf_bridge_info->mask is used for several things, for example to
remember if skb->pkt_type was set to OTHER_HOST.

For a bridge, OTHER_HOST is the expected case. For IP forwarding it's a
non-starter, though -- routing expects PACKET_HOST.

Bridge netfilter thus changes OTHER_HOST to PACKET_HOST before hook
invocation and then un-does it after hook traversal.

This information is irrelevant outside of br_netfilter.

After this change, ->mask now only contains flags that need to be
known outside of br_netfilter in fast-path.

A future patch changes mask into a 2-bit state field in sk_buff, so that
we can remove the skb->nf_bridge pointer for good and treat all remaining
places that access nf_bridge info content as not-so-fast path.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter_bridge.h |    1 -
 include/linux/skbuff.h           |    1 +
 net/bridge/br_netfilter.c        |   18 +++++++++---------
 3 files changed, 10 insertions(+), 10 deletions(-)
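
The save/restore dance described in the commit message can be modelled in a few lines of userspace C. This is only a sketch: the `model_` names are hypothetical, and the two functions stand in for the code that runs before hook invocation and after hook traversal respectively.

```c
#include <stdbool.h>

enum model_pkt_type { MODEL_PACKET_HOST, MODEL_PACKET_OTHERHOST };

struct model_pkt {
	enum model_pkt_type pkt_type;
	bool pkt_otherhost;	/* private note: pkt_type was rewritten */
};

/* Before the IP hooks run: present the packet as addressed to us. */
static void model_before_hooks(struct model_pkt *p)
{
	if (p->pkt_type == MODEL_PACKET_OTHERHOST) {
		p->pkt_type = MODEL_PACKET_HOST;
		p->pkt_otherhost = true;
	}
}

/* After hook traversal: undo the rewrite, if one happened. */
static void model_after_hooks(struct model_pkt *p)
{
	if (p->pkt_otherhost) {
		p->pkt_type = MODEL_PACKET_OTHERHOST;
		p->pkt_otherhost = false;
	}
}
```

Because the note is a plain bool, clearing it is an assignment rather than an XOR on a shared mask, and nothing outside br_netfilter needs to know the bit exists.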

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index d47a32d..8912e8c 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -17,7 +17,6 @@ enum nf_br_hook_priorities {
 
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 
-#define BRNF_PKT_TYPE			0x01
 #define BRNF_BRIDGED_DNAT		0x02
 #define BRNF_NF_BRIDGE_PREROUTING	0x08
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6f75fb5..0991259 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -171,6 +171,7 @@ struct nf_bridge_info {
 		BRNF_PROTO_8021Q,
 		BRNF_PROTO_PPPOE
 	} orig_proto;
+	bool			pkt_otherhost;
 	unsigned int		mask;
 	struct net_device	*physindev;
 	struct net_device	*physoutdev;
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index ab1e988..e8ac743 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -282,9 +282,9 @@ static int br_nf_pre_routing_finish_ipv6(struct sk_buff *skb)
 	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct rtable *rt;
 
-	if (nf_bridge->mask & BRNF_PKT_TYPE) {
+	if (nf_bridge->pkt_otherhost) {
 		skb->pkt_type = PACKET_OTHERHOST;
-		nf_bridge->mask ^= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = false;
 	}
 	nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
 
@@ -415,9 +415,9 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
 	frag_max_size = IPCB(skb)->frag_max_size;
 	BR_INPUT_SKB_CB(skb)->frag_max_size = frag_max_size;
 
-	if (nf_bridge->mask & BRNF_PKT_TYPE) {
+	if (nf_bridge->pkt_otherhost) {
 		skb->pkt_type = PACKET_OTHERHOST;
-		nf_bridge->mask ^= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = false;
 	}
 	nf_bridge->mask ^= BRNF_NF_BRIDGE_PREROUTING;
 	if (dnat_took_place(skb)) {
@@ -503,7 +503,7 @@ static struct net_device *setup_pre_routing(struct sk_buff *skb)
 
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
-		nf_bridge->mask |= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = true;
 	}
 
 	nf_bridge->mask |= BRNF_NF_BRIDGE_PREROUTING;
@@ -711,9 +711,9 @@ static int br_nf_forward_finish(struct sk_buff *skb)
 		}
 
 		in = nf_bridge->physindev;
-		if (nf_bridge->mask & BRNF_PKT_TYPE) {
+		if (nf_bridge->pkt_otherhost) {
 			skb->pkt_type = PACKET_OTHERHOST;
-			nf_bridge->mask ^= BRNF_PKT_TYPE;
+			nf_bridge->pkt_otherhost = false;
 		}
 		nf_bridge_update_protocol(skb);
 	} else {
@@ -769,7 +769,7 @@ static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops,
 
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
-		nf_bridge->mask |= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = true;
 	}
 
 	if (pf == NFPROTO_IPV4) {
@@ -927,7 +927,7 @@ static unsigned int br_nf_post_routing(const struct nf_hook_ops *ops,
 	 * about the value of skb->pkt_type. */
 	if (skb->pkt_type == PACKET_OTHERHOST) {
 		skb->pkt_type = PACKET_HOST;
-		nf_bridge->mask |= BRNF_PKT_TYPE;
+		nf_bridge->pkt_otherhost = true;
 	}
 
 	nf_bridge_pull_encap_header(skb);
-- 
1.7.10.4


* [PATCH 16/20] netfilter: nf_tables: fix set selection when timeouts are requested
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (14 preceding siblings ...)
  2015-04-09 11:34 ` [PATCH 15/20] netfilter: bridge: make BRNF_PKT_TYPE flag a bool Pablo Neira Ayuso
@ 2015-04-09 11:35 ` Pablo Neira Ayuso
  2015-04-09 11:35 ` [PATCH 17/20] netfilter: nf_tables: prepare set element accounting for async updates Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

The NFT_SET_TIMEOUT flag is ignored in nft_select_set_ops, which may
lead to selection of a set implementation that doesn't actually
support timeouts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
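
Why a one-token change matters can be seen in a small userspace model. The flag values and `model_` names below are illustrative, and the matching rule (an implementation qualifies only if it offers every requested feature) is an assumption about how the selection loop uses the masked value; it is not shown in this diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the real ones live in the nf_tables uapi header. */
#define MODEL_SET_INTERVAL	0x4u
#define MODEL_SET_MAP		0x8u
#define MODEL_SET_TIMEOUT	0x10u

/* Keep only the flags that matter for choosing an implementation. */
static uint32_t model_requested_features(uint32_t set_flags, bool with_fix)
{
	uint32_t mask = MODEL_SET_INTERVAL | MODEL_SET_MAP;

	if (with_fix)
		mask |= MODEL_SET_TIMEOUT;
	return set_flags & mask;
}

/* Assumed rule: an implementation qualifies only if it offers every
 * requested feature. */
static bool model_ops_match(uint32_t ops_features, uint32_t requested)
{
	return (ops_features & requested) == requested;
}
```

Without the fix the timeout bit is masked away before matching, so an implementation lacking timeout support still looks acceptable; with the fix it is rejected.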

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 3aa92b3..0dab872 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2159,7 +2159,7 @@ nft_select_set_ops(const struct nlattr * const nla[],
 	features = 0;
 	if (nla[NFTA_SET_FLAGS] != NULL) {
 		features = ntohl(nla_get_be32(nla[NFTA_SET_FLAGS]));
-		features &= NFT_SET_INTERVAL | NFT_SET_MAP;
+		features &= NFT_SET_INTERVAL | NFT_SET_MAP | NFT_SET_TIMEOUT;
 	}
 
 	bops	   = NULL;
-- 
1.7.10.4



* [PATCH 17/20] netfilter: nf_tables: prepare set element accounting for async updates
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (15 preceding siblings ...)
  2015-04-09 11:35 ` [PATCH 16/20] netfilter: nf_tables: fix set selection when timeouts are requested Pablo Neira Ayuso
@ 2015-04-09 11:35 ` Pablo Neira Ayuso
  2015-04-09 11:35 ` [PATCH 18/20] netfilter: nf_tables: support different set binding types Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

Use atomic operations for the element count to avoid races with async
updates.

To properly handle the transactional semantics during netlink updates,
deleted but not yet committed elements are accounted for separately and
are treated as being already removed. This means for the duration of
a netlink transaction, the limit might be exceeded by the number of
elements deleted. Set implementations must be prepared to handle this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |    4 +++-
 net/netfilter/nf_tables_api.c     |   21 ++++++++++++---------
 net/netfilter/nft_hash.c          |    3 ++-
 3 files changed, 17 insertions(+), 11 deletions(-)
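
The accounting scheme from the commit message can be tried out in userspace with C11 atomics. This is a sketch under stated assumptions: `model_add_unless()` imitates the kernel's atomic_add_unless() (add unless the value equals a given bound, return whether the add happened), and the `model_` structure and functions are hypothetical reductions of the netlink paths touched here.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogue of atomic_add_unless(): add a to *v unless *v
 * equals u; return true if the addition happened. */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

struct model_acct {
	int size;		/* 0 means unlimited */
	atomic_int nelems;	/* still includes deactivated elements */
	int ndeact;		/* deleted in this transaction, not committed */
};

/* Reserve a slot for a new element, as in the newsetelem path. */
static int model_reserve(struct model_acct *set)
{
	if (set->size &&
	    !model_add_unless(&set->nelems, 1, set->size + set->ndeact))
		return -ENFILE;
	return 0;
}

/* Commit one pending deletion. */
static void model_commit_delete(struct model_acct *set)
{
	atomic_fetch_sub(&set->nelems, 1);
	set->ndeact--;
}
```

In this model a set of size 2 that is full rejects a third element, but accepts it once one element has been marked for deletion, so nelems briefly reaches 3 until the deletion is committed. That is the temporary overshoot the commit message warns set implementations about.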

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index a785699..7464233 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -258,6 +258,7 @@ void nft_unregister_set(struct nft_set_ops *ops);
  * 	@dtype: data type (verdict or numeric type defined by userspace)
  * 	@size: maximum set size
  * 	@nelems: number of elements
+ * 	@ndeact: number of deactivated elements queued for removal
  * 	@timeout: default timeout value in msecs
  * 	@gc_int: garbage collection interval in msecs
  *	@policy: set parameterization (see enum nft_set_policies)
@@ -275,7 +276,8 @@ struct nft_set {
 	u32				ktype;
 	u32				dtype;
 	u32				size;
-	u32				nelems;
+	atomic_t			nelems;
+	u32				ndeact;
 	u64				timeout;
 	u32				gc_int;
 	u16				policy;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 0dab872..27d1bf5 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3238,9 +3238,6 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	u32 flags;
 	int err;
 
-	if (set->size && set->nelems == set->size)
-		return -ENFILE;
-
 	err = nla_parse_nested(nla, NFTA_SET_ELEM_MAX, attr,
 			       nft_set_elem_policy);
 	if (err < 0)
@@ -3391,11 +3388,15 @@ static int nf_tables_newsetelem(struct sock *nlsk, struct sk_buff *skb,
 		return -EBUSY;
 
 	nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) {
+		if (set->size &&
+		    !atomic_add_unless(&set->nelems, 1, set->size + set->ndeact))
+			return -ENFILE;
+
 		err = nft_add_set_elem(&ctx, set, attr);
-		if (err < 0)
+		if (err < 0) {
+			atomic_dec(&set->nelems);
 			break;
-
-		set->nelems++;
+		}
 	}
 	return err;
 }
@@ -3477,7 +3478,7 @@ static int nf_tables_delsetelem(struct sock *nlsk, struct sk_buff *skb,
 		if (err < 0)
 			break;
 
-		set->nelems--;
+		set->ndeact++;
 	}
 	return err;
 }
@@ -3810,6 +3811,8 @@ static int nf_tables_commit(struct sk_buff *skb)
 						 &te->elem,
 						 NFT_MSG_DELSETELEM, 0);
 			te->set->ops->remove(te->set, &te->elem);
+			atomic_dec(&te->set->nelems);
+			te->set->ndeact--;
 			break;
 		}
 	}
@@ -3913,16 +3916,16 @@ static int nf_tables_abort(struct sk_buff *skb)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_NEWSETELEM:
-			nft_trans_elem_set(trans)->nelems--;
 			te = (struct nft_trans_elem *)trans->data;
 
 			te->set->ops->remove(te->set, &te->elem);
+			atomic_dec(&te->set->nelems);
 			break;
 		case NFT_MSG_DELSETELEM:
 			te = (struct nft_trans_elem *)trans->data;
 
-			nft_trans_elem_set(trans)->nelems++;
 			te->set->ops->activate(te->set, &te->elem);
+			te->set->ndeact--;
 
 			nft_trans_destroy(trans);
 			break;
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 5923ec5..c74e2bf 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -203,7 +203,7 @@ out:
 
 static void nft_hash_gc(struct work_struct *work)
 {
-	const struct nft_set *set;
+	struct nft_set *set;
 	struct nft_hash_elem *he;
 	struct nft_hash *priv;
 	struct nft_set_gc_batch *gcb = NULL;
@@ -237,6 +237,7 @@ static void nft_hash_gc(struct work_struct *work)
 		if (gcb == NULL)
 			goto out;
 		rhashtable_remove_fast(&priv->ht, &he->node, nft_hash_params);
+		atomic_dec(&set->nelems);
 		nft_set_gc_batch_add(gcb, he);
 	}
 out:
-- 
1.7.10.4


* [PATCH 18/20] netfilter: nf_tables: support different set binding types
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (16 preceding siblings ...)
  2015-04-09 11:35 ` [PATCH 17/20] netfilter: nf_tables: prepare set element accounting for async updates Pablo Neira Ayuso
@ 2015-04-09 11:35 ` Pablo Neira Ayuso
  2015-04-09 11:35 ` [PATCH 19/20] netfilter: nf_tables: add support for dynamic set updates Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

Currently a set binding is assumed to be related to a lookup and, in
case of maps, a data load.

In order to use bindings for set updates, the loop detection checks
must be restricted to map operations only. Add a flags member to the
binding struct to hold the set "action" flags such as NFT_SET_MAP,
and perform loop detection based on these.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |    2 ++
 net/netfilter/nf_tables_api.c     |   11 ++++++++---
 net/netfilter/nft_lookup.c        |    2 ++
 3 files changed, 12 insertions(+), 3 deletions(-)
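
The effect of moving the map check from the set to the individual binding can be sketched as follows. The `model_` names, the integer chain identifier, and the flag value are all illustrative and not taken from the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SET_MAP 0x8u	/* illustrative value */

struct model_binding {
	int chain_id;		/* stand-in for the chain pointer */
	uint32_t flags;		/* set action flags of this binding */
};

/* A binding takes part in loop detection only if it performs a map
 * lookup from the chain being checked. */
static bool model_needs_loop_check(const struct model_binding *b, int chain_id)
{
	return (b->flags & MODEL_SET_MAP) && b->chain_id == chain_id;
}

/* Count the bindings of one set that must be validated for a chain. */
static size_t model_count_checked(const struct model_binding *b, size_t n,
				  int chain_id)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (model_needs_loop_check(&b[i], chain_id))
			count++;
	return count;
}
```

A binding created for a plain set update would carry no map flag and is therefore skipped, even when the underlying set itself is a map.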

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 7464233..e7e6365 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -316,6 +316,7 @@ static inline unsigned long nft_set_gc_interval(const struct nft_set *set)
  *
  *	@list: set bindings list node
  *	@chain: chain containing the rule bound to the set
+ *	@flags: set action flags
  *
  *	A set binding contains all information necessary for validation
  *	of new elements added to a bound set.
@@ -323,6 +324,7 @@ static inline unsigned long nft_set_gc_interval(const struct nft_set *set)
 struct nft_set_binding {
 	struct list_head		list;
 	const struct nft_chain		*chain;
+	u32				flags;
 };
 
 int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 27d1bf5..90b8984 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2811,12 +2811,13 @@ int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
 	if (!list_empty(&set->bindings) && set->flags & NFT_SET_ANONYMOUS)
 		return -EBUSY;
 
-	if (set->flags & NFT_SET_MAP) {
+	if (binding->flags & NFT_SET_MAP) {
 		/* If the set is already bound to the same chain all
 		 * jumps are already validated for that chain.
 		 */
 		list_for_each_entry(i, &set->bindings, list) {
-			if (i->chain == binding->chain)
+			if (binding->flags & NFT_SET_MAP &&
+			    i->chain == binding->chain)
 				goto bind;
 		}
 
@@ -3312,6 +3313,9 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 				.chain	= (struct nft_chain *)binding->chain,
 			};
 
+			if (!(binding->flags & NFT_SET_MAP))
+				continue;
+
 			err = nft_validate_data_load(&bind_ctx, dreg,
 						     &data, d2.type);
 			if (err < 0)
@@ -4063,7 +4067,8 @@ static int nf_tables_check_loops(const struct nft_ctx *ctx,
 			continue;
 
 		list_for_each_entry(binding, &set->bindings, list) {
-			if (binding->chain != chain)
+			if (!(binding->flags & NFT_SET_MAP) ||
+			    binding->chain != chain)
 				continue;
 
 			iter.skip 	= 0;
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index a5f30b8..d8cf86f 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -92,6 +92,8 @@ static int nft_lookup_init(const struct nft_ctx *ctx,
 	} else if (set->flags & NFT_SET_MAP)
 		return -EINVAL;
 
+	priv->binding.flags = set->flags & NFT_SET_MAP;
+
 	err = nf_tables_bind_set(ctx, set, &priv->binding);
 	if (err < 0)
 		return err;
-- 
1.7.10.4


* [PATCH 19/20] netfilter: nf_tables: add support for dynamic set updates
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (17 preceding siblings ...)
  2015-04-09 11:35 ` [PATCH 18/20] netfilter: nf_tables: support different set binding types Pablo Neira Ayuso
@ 2015-04-09 11:35 ` Pablo Neira Ayuso
  2015-04-09 11:35 ` [PATCH 20/20] netfilter: nf_tables: support optional userdata for set elements Pablo Neira Ayuso
  2015-04-09 18:46 ` [PATCH 00/20] Netfilter updates for net-next David Miller
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

Add a new "dynset" expression for dynamic set updates.

A new set op ->update() is added which, for nonexistent elements,
invokes an initialization callback and inserts the new element.
For both new and existing elements the extension pointer is returned
to the caller to optionally perform timer updates or other actions.

Element removal is not supported so far, however that seems to be a
rather exotic need and can be added later on.
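
For illustration only (not part of the patch): a minimal userspace sketch
of the ->update() contract described above. A fixed-size array stands in
for the hash table, and every toy_* name is hypothetical. It assumes the
caller refreshes the expiration itself after a successful update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 8

struct toy_elem {
	bool     used;
	uint32_t key;
	uint64_t expiration;	/* stands in for the expiration extension */
};

struct toy_set {
	struct toy_elem slots[TOY_SLOTS];
	unsigned int    nelems;
	unsigned int    size;	/* 0 means no configured limit */
	uint64_t        timeout;
};

/* Initialization callback: invoked only when the key is not present. */
typedef struct toy_elem *(*toy_new_fn)(struct toy_set *set, uint32_t key,
				       uint64_t now);

static struct toy_elem *toy_lookup(struct toy_set *set, uint32_t key)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (set->slots[i].used && set->slots[i].key == key)
			return &set->slots[i];
	return NULL;
}

static struct toy_elem *toy_new(struct toy_set *set, uint32_t key, uint64_t now)
{
	if (set->size && set->nelems >= set->size)
		return NULL;	/* configured limit reached */
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!set->slots[i].used) {
			set->slots[i] = (struct toy_elem){
				.used = true,
				.key = key,
				.expiration = now + set->timeout,
			};
			set->nelems++;
			return &set->slots[i];
		}
	}
	return NULL;		/* backing array is full */
}

/* Returns true and hands back the element for both new and existing keys;
 * returns false only if a missing element could not be created. */
static bool toy_update(struct toy_set *set, uint32_t key, toy_new_fn init,
		       uint64_t now, struct toy_elem **out)
{
	struct toy_elem *e;

	assert(out != NULL);
	e = toy_lookup(set, key);
	if (e == NULL)
		e = init(set, key, now);
	if (e == NULL)
		return false;
	*out = e;
	return true;
}

/* Caller-side timer refresh, loosely analogous to NFT_DYNSET_OP_UPDATE. */
static bool toy_refresh(struct toy_set *set, uint32_t key, uint64_t now)
{
	struct toy_elem *e;

	if (!toy_update(set, key, toy_new, now, &e))
		return false;
	e->expiration = now + set->timeout;
	return true;
}
```

The point of the callback is that element construction stays with the
expression, while the set backend only decides whether it is needed.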

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        |   17 +++
 include/net/netfilter/nf_tables_core.h   |    3 +
 include/uapi/linux/netfilter/nf_tables.h |   27 ++++
 net/netfilter/Makefile                   |    2 +-
 net/netfilter/nf_tables_api.c            |   10 +-
 net/netfilter/nf_tables_core.c           |    7 +
 net/netfilter/nft_dynset.c               |  218 ++++++++++++++++++++++++++++++
 net/netfilter/nft_hash.c                 |   37 +++++
 8 files changed, 315 insertions(+), 6 deletions(-)
 create mode 100644 net/netfilter/nft_dynset.c

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index e7e6365..38c3496 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -196,6 +196,7 @@ struct nft_set_estimate {
 };
 
 struct nft_set_ext;
+struct nft_expr;
 
 /**
  *	struct nft_set_ops - nf_tables set operations
@@ -218,6 +219,15 @@ struct nft_set_ops {
 	bool				(*lookup)(const struct nft_set *set,
 						  const struct nft_data *key,
 						  const struct nft_set_ext **ext);
+	bool				(*update)(struct nft_set *set,
+						  const struct nft_data *key,
+						  void *(*new)(struct nft_set *,
+							       const struct nft_expr *,
+							       struct nft_data []),
+						  const struct nft_expr *expr,
+						  struct nft_data data[],
+						  const struct nft_set_ext **ext);
+
 	int				(*insert)(const struct nft_set *set,
 						  const struct nft_set_elem *elem);
 	void				(*activate)(const struct nft_set *set,
@@ -466,6 +476,11 @@ static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
 	return elem + set->ops->elemsize;
 }
 
+void *nft_set_elem_init(const struct nft_set *set,
+			const struct nft_set_ext_tmpl *tmpl,
+			const struct nft_data *key,
+			const struct nft_data *data,
+			u64 timeout, gfp_t gfp);
 void nft_set_elem_destroy(const struct nft_set *set, void *elem);
 
 /**
@@ -845,6 +860,8 @@ static inline u8 nft_genmask_cur(const struct net *net)
 	return 1 << ACCESS_ONCE(net->nft.gencursor);
 }
 
+#define NFT_GENMASK_ANY		((1 << 0) | (1 << 1))
+
 /*
  * Set element transaction helpers
  */
diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index a75fc8e..c6f400c 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -31,6 +31,9 @@ void nft_cmp_module_exit(void);
 int nft_lookup_module_init(void);
 void nft_lookup_module_exit(void);
 
+int nft_dynset_module_init(void);
+void nft_dynset_module_exit(void);
+
 int nft_bitwise_module_init(void);
 void nft_bitwise_module_exit(void);
 
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 83441cc..0b87b2f 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -515,6 +515,33 @@ enum nft_lookup_attributes {
 };
 #define NFTA_LOOKUP_MAX		(__NFTA_LOOKUP_MAX - 1)
 
+enum nft_dynset_ops {
+	NFT_DYNSET_OP_ADD,
+	NFT_DYNSET_OP_UPDATE,
+};
+
+/**
+ * enum nft_dynset_attributes - dynset expression attributes
+ *
+ * @NFTA_DYNSET_SET_NAME: name of the set to add data to (NLA_STRING)
+ * @NFTA_DYNSET_SET_ID: unique identifier of the set in the transaction (NLA_U32)
+ * @NFTA_DYNSET_OP: operation (NLA_U32)
+ * @NFTA_DYNSET_SREG_KEY: source register of the key (NLA_U32)
+ * @NFTA_DYNSET_SREG_DATA: source register of the data (NLA_U32)
+ * @NFTA_DYNSET_TIMEOUT: timeout value for the new element (NLA_U64)
+ */
+enum nft_dynset_attributes {
+	NFTA_DYNSET_UNSPEC,
+	NFTA_DYNSET_SET_NAME,
+	NFTA_DYNSET_SET_ID,
+	NFTA_DYNSET_OP,
+	NFTA_DYNSET_SREG_KEY,
+	NFTA_DYNSET_SREG_DATA,
+	NFTA_DYNSET_TIMEOUT,
+	__NFTA_DYNSET_MAX,
+};
+#define NFTA_DYNSET_MAX		(__NFTA_DYNSET_MAX - 1)
+
 /**
  * enum nft_payload_bases - nf_tables payload expression offset bases
  *
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 89f73a9..a87d8b8 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -70,7 +70,7 @@ obj-$(CONFIG_NETFILTER_SYNPROXY) += nf_synproxy_core.o
 
 # nf_tables
 nf_tables-objs += nf_tables_core.o nf_tables_api.o
-nf_tables-objs += nft_immediate.o nft_cmp.o nft_lookup.o
+nf_tables-objs += nft_immediate.o nft_cmp.o nft_lookup.o nft_dynset.o
 nf_tables-objs += nft_bitwise.o nft_byteorder.o nft_payload.o
 
 obj-$(CONFIG_NF_TABLES)		+= nf_tables.o
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 90b8984..598e53e 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3183,11 +3183,11 @@ static struct nft_trans *nft_trans_elem_alloc(struct nft_ctx *ctx,
 	return trans;
 }
 
-static void *nft_set_elem_init(const struct nft_set *set,
-			       const struct nft_set_ext_tmpl *tmpl,
-			       const struct nft_data *key,
-			       const struct nft_data *data,
-			       u64 timeout, gfp_t gfp)
+void *nft_set_elem_init(const struct nft_set *set,
+			const struct nft_set_ext_tmpl *tmpl,
+			const struct nft_data *key,
+			const struct nft_data *data,
+			u64 timeout, gfp_t gfp)
 {
 	struct nft_set_ext *ext;
 	void *elem;
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index ef4dfcb..7caf08a 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -239,8 +239,14 @@ int __init nf_tables_core_module_init(void)
 	if (err < 0)
 		goto err6;
 
+	err = nft_dynset_module_init();
+	if (err < 0)
+		goto err7;
+
 	return 0;
 
+err7:
+	nft_payload_module_exit();
 err6:
 	nft_byteorder_module_exit();
 err5:
@@ -257,6 +263,7 @@ err1:
 
 void nf_tables_core_module_exit(void)
 {
+	nft_dynset_module_exit();
 	nft_payload_module_exit();
 	nft_byteorder_module_exit();
 	nft_bitwise_module_exit();
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
new file mode 100644
index 0000000..eeb72de
--- /dev/null
+++ b/net/netfilter/nft_dynset.c
@@ -0,0 +1,218 @@
+/*
+ * Copyright (c) 2015 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
+
+struct nft_dynset {
+	struct nft_set			*set;
+	struct nft_set_ext_tmpl		tmpl;
+	enum nft_dynset_ops		op:8;
+	enum nft_registers		sreg_key:8;
+	enum nft_registers		sreg_data:8;
+	u64				timeout;
+	struct nft_set_binding		binding;
+};
+
+static void *nft_dynset_new(struct nft_set *set, const struct nft_expr *expr,
+			    struct nft_data data[NFT_REG_MAX + 1])
+{
+	const struct nft_dynset *priv = nft_expr_priv(expr);
+	u64 timeout;
+	void *elem;
+
+	if (set->size && !atomic_add_unless(&set->nelems, 1, set->size))
+		return NULL;
+
+	timeout = priv->timeout ? : set->timeout;
+	elem = nft_set_elem_init(set, &priv->tmpl,
+				 &data[priv->sreg_key], &data[priv->sreg_data],
+				 timeout, GFP_ATOMIC);
+	if (elem == NULL) {
+		if (set->size)
+			atomic_dec(&set->nelems);
+	}
+	return elem;
+}
+
+static void nft_dynset_eval(const struct nft_expr *expr,
+			    struct nft_data data[NFT_REG_MAX + 1],
+			    const struct nft_pktinfo *pkt)
+{
+	const struct nft_dynset *priv = nft_expr_priv(expr);
+	struct nft_set *set = priv->set;
+	const struct nft_set_ext *ext;
+	u64 timeout;
+
+	if (set->ops->update(set, &data[priv->sreg_key], nft_dynset_new,
+			     expr, data, &ext)) {
+		if (priv->op == NFT_DYNSET_OP_UPDATE &&
+		    nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION)) {
+			timeout = priv->timeout ? : set->timeout;
+			*nft_set_ext_expiration(ext) = jiffies + timeout;
+			return;
+		}
+	}
+
+	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
+}
+
+static const struct nla_policy nft_dynset_policy[NFTA_DYNSET_MAX + 1] = {
+	[NFTA_DYNSET_SET_NAME]	= { .type = NLA_STRING },
+	[NFTA_DYNSET_SET_ID]	= { .type = NLA_U32 },
+	[NFTA_DYNSET_OP]	= { .type = NLA_U32 },
+	[NFTA_DYNSET_SREG_KEY]	= { .type = NLA_U32 },
+	[NFTA_DYNSET_SREG_DATA]	= { .type = NLA_U32 },
+	[NFTA_DYNSET_TIMEOUT]	= { .type = NLA_U64 },
+};
+
+static int nft_dynset_init(const struct nft_ctx *ctx,
+			   const struct nft_expr *expr,
+			   const struct nlattr * const tb[])
+{
+	struct nft_dynset *priv = nft_expr_priv(expr);
+	struct nft_set *set;
+	u64 timeout;
+	int err;
+
+	if (tb[NFTA_DYNSET_SET_NAME] == NULL ||
+	    tb[NFTA_DYNSET_OP] == NULL ||
+	    tb[NFTA_DYNSET_SREG_KEY] == NULL)
+		return -EINVAL;
+
+	set = nf_tables_set_lookup(ctx->table, tb[NFTA_DYNSET_SET_NAME]);
+	if (IS_ERR(set)) {
+		if (tb[NFTA_DYNSET_SET_ID])
+			set = nf_tables_set_lookup_byid(ctx->net,
+							tb[NFTA_DYNSET_SET_ID]);
+		if (IS_ERR(set))
+			return PTR_ERR(set);
+	}
+
+	if (set->flags & NFT_SET_CONSTANT)
+		return -EBUSY;
+
+	priv->op = ntohl(nla_get_be32(tb[NFTA_DYNSET_OP]));
+	switch (priv->op) {
+	case NFT_DYNSET_OP_ADD:
+		break;
+	case NFT_DYNSET_OP_UPDATE:
+		if (!(set->flags & NFT_SET_TIMEOUT))
+			return -EOPNOTSUPP;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	timeout = 0;
+	if (tb[NFTA_DYNSET_TIMEOUT] != NULL) {
+		if (!(set->flags & NFT_SET_TIMEOUT))
+			return -EINVAL;
+		timeout = be64_to_cpu(nla_get_be64(tb[NFTA_DYNSET_TIMEOUT]));
+	}
+
+	priv->sreg_key = ntohl(nla_get_be32(tb[NFTA_DYNSET_SREG_KEY]));
+	err = nft_validate_input_register(priv->sreg_key);
+	if (err < 0)
+		return err;
+
+	if (tb[NFTA_DYNSET_SREG_DATA] != NULL) {
+		if (!(set->flags & NFT_SET_MAP))
+			return -EINVAL;
+		if (set->dtype == NFT_DATA_VERDICT)
+			return -EOPNOTSUPP;
+
+		priv->sreg_data = ntohl(nla_get_be32(tb[NFTA_DYNSET_SREG_DATA]));
+		err = nft_validate_input_register(priv->sreg_data);
+		if (err < 0)
+			return err;
+	} else if (set->flags & NFT_SET_MAP)
+		return -EINVAL;
+
+	nft_set_ext_prepare(&priv->tmpl);
+	nft_set_ext_add_length(&priv->tmpl, NFT_SET_EXT_KEY, set->klen);
+	if (set->flags & NFT_SET_MAP)
+		nft_set_ext_add_length(&priv->tmpl, NFT_SET_EXT_DATA, set->dlen);
+	if (set->flags & NFT_SET_TIMEOUT) {
+		if (timeout || set->timeout)
+			nft_set_ext_add(&priv->tmpl, NFT_SET_EXT_EXPIRATION);
+	}
+
+	priv->timeout = timeout;
+
+	err = nf_tables_bind_set(ctx, set, &priv->binding);
+	if (err < 0)
+		return err;
+
+	priv->set = set;
+	return 0;
+}
+
+static void nft_dynset_destroy(const struct nft_ctx *ctx,
+			       const struct nft_expr *expr)
+{
+	struct nft_dynset *priv = nft_expr_priv(expr);
+
+	nf_tables_unbind_set(ctx, priv->set, &priv->binding);
+}
+
+static int nft_dynset_dump(struct sk_buff *skb, const struct nft_expr *expr)
+{
+	const struct nft_dynset *priv = nft_expr_priv(expr);
+
+	if (nla_put_be32(skb, NFTA_DYNSET_SREG_KEY, htonl(priv->sreg_key)))
+		goto nla_put_failure;
+	if (priv->set->flags & NFT_SET_MAP &&
+	    nla_put_be32(skb, NFTA_DYNSET_SREG_DATA, htonl(priv->sreg_data)))
+		goto nla_put_failure;
+	if (nla_put_be32(skb, NFTA_DYNSET_OP, htonl(priv->op)))
+		goto nla_put_failure;
+	if (nla_put_string(skb, NFTA_DYNSET_SET_NAME, priv->set->name))
+		goto nla_put_failure;
+	if (nla_put_be64(skb, NFTA_DYNSET_TIMEOUT, cpu_to_be64(priv->timeout)))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+static struct nft_expr_type nft_dynset_type;
+static const struct nft_expr_ops nft_dynset_ops = {
+	.type		= &nft_dynset_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_dynset)),
+	.eval		= nft_dynset_eval,
+	.init		= nft_dynset_init,
+	.destroy	= nft_dynset_destroy,
+	.dump		= nft_dynset_dump,
+};
+
+static struct nft_expr_type nft_dynset_type __read_mostly = {
+	.name		= "dynset",
+	.ops		= &nft_dynset_ops,
+	.policy		= nft_dynset_policy,
+	.maxattr	= NFTA_DYNSET_MAX,
+	.owner		= THIS_MODULE,
+};
+
+int __init nft_dynset_module_init(void)
+{
+	return nft_register_expr(&nft_dynset_type);
+}
+
+void nft_dynset_module_exit(void)
+{
+	nft_unregister_expr(&nft_dynset_type);
+}
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index c74e2bf..bc23806 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -90,6 +90,42 @@ static bool nft_hash_lookup(const struct nft_set *set,
 	return !!he;
 }
 
+static bool nft_hash_update(struct nft_set *set, const struct nft_data *key,
+			    void *(*new)(struct nft_set *,
+					 const struct nft_expr *,
+					 struct nft_data []),
+			    const struct nft_expr *expr,
+			    struct nft_data data[],
+			    const struct nft_set_ext **ext)
+{
+	struct nft_hash *priv = nft_set_priv(set);
+	struct nft_hash_elem *he;
+	struct nft_hash_cmp_arg arg = {
+		.genmask = NFT_GENMASK_ANY,
+		.set	 = set,
+		.key	 = key,
+	};
+
+	he = rhashtable_lookup_fast(&priv->ht, &arg, nft_hash_params);
+	if (he != NULL)
+		goto out;
+
+	he = new(set, expr, data);
+	if (he == NULL)
+		goto err1;
+	if (rhashtable_lookup_insert_key(&priv->ht, &arg, &he->node,
+					 nft_hash_params))
+		goto err2;
+out:
+	*ext = &he->ext;
+	return true;
+
+err2:
+	nft_set_elem_destroy(set, he);
+err1:
+	return false;
+}
+
 static int nft_hash_insert(const struct nft_set *set,
 			   const struct nft_set_elem *elem)
 {
@@ -335,6 +371,7 @@ static struct nft_set_ops nft_hash_ops __read_mostly = {
 	.deactivate	= nft_hash_deactivate,
 	.remove		= nft_hash_remove,
 	.lookup		= nft_hash_lookup,
+	.update		= nft_hash_update,
 	.walk		= nft_hash_walk,
 	.features	= NFT_SET_MAP | NFT_SET_TIMEOUT,
 	.owner		= THIS_MODULE,
-- 
1.7.10.4



* [PATCH 20/20] netfilter: nf_tables: support optional userdata for set elements
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (18 preceding siblings ...)
  2015-04-09 11:35 ` [PATCH 19/20] netfilter: nf_tables: add support for dynamic set updates Pablo Neira Ayuso
@ 2015-04-09 11:35 ` Pablo Neira Ayuso
  2015-04-09 18:46 ` [PATCH 00/20] Netfilter updates for net-next David Miller
  20 siblings, 0 replies; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-09 11:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Patrick McHardy <kaber@trash.net>

Add a userdata set extension and allow the user to attach arbitrary
data to set elements. This is intended to hold TLV encoded data like
comments or DNS annotations that have no meaning to the kernel.
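
For illustration only (not part of the patch): a sketch of how userspace
might pack TLV records into such an opaque blob. The record layout (one
type byte, one length byte, then the value), the 256-byte cap and all
toy_* names are assumptions made for this example; the kernel does not
interpret the contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_UDATA_MAXLEN 256	/* assumed cap on the blob, hypothetical name */

/* Append one TLV record to buf. Returns the new used length, or 0 if the
 * record does not fit (a valid result is always at least 2). */
static size_t toy_tlv_put(uint8_t *buf, size_t used, uint8_t type,
			  const void *val, uint8_t len)
{
	assert(buf != NULL);
	if (used + 2u + len > TOY_UDATA_MAXLEN)
		return 0;
	buf[used] = type;
	buf[used + 1] = len;
	memcpy(buf + used + 2, val, len);
	return used + 2u + len;
}

/* Find the first record of the given type. Returns a pointer to its value
 * and stores the value length in *len, or returns NULL if absent. */
static const uint8_t *toy_tlv_find(const uint8_t *buf, size_t used,
				   uint8_t type, uint8_t *len)
{
	size_t off = 0;

	while (off + 2 <= used) {
		uint8_t t = buf[off];
		uint8_t l = buf[off + 1];

		if (off + 2 + l > used)
			return NULL;	/* truncated record */
		if (t == type) {
			*len = l;
			return buf + off + 2;
		}
		off += 2 + (size_t)l;
	}
	return NULL;
}
```

A parser that skips unknown types, as above, lets newer tools add record
types without breaking older ones.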

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        |    7 ++++++
 include/uapi/linux/netfilter/nf_tables.h |    2 ++
 net/netfilter/nf_tables_api.c            |   34 ++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 38c3496..63c44bdf 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -350,6 +350,7 @@ void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
  *	@NFT_SET_EXT_FLAGS: element flags
  *	@NFT_SET_EXT_TIMEOUT: element timeout
  *	@NFT_SET_EXT_EXPIRATION: element expiration time
+ *	@NFT_SET_EXT_USERDATA: user data associated with the element
  *	@NFT_SET_EXT_NUM: number of extension types
  */
 enum nft_set_extensions {
@@ -358,6 +359,7 @@ enum nft_set_extensions {
 	NFT_SET_EXT_FLAGS,
 	NFT_SET_EXT_TIMEOUT,
 	NFT_SET_EXT_EXPIRATION,
+	NFT_SET_EXT_USERDATA,
 	NFT_SET_EXT_NUM
 };
 
@@ -464,6 +466,11 @@ static inline unsigned long *nft_set_ext_expiration(const struct nft_set_ext *ex
 	return nft_set_ext(ext, NFT_SET_EXT_EXPIRATION);
 }
 
+static inline struct nft_userdata *nft_set_ext_userdata(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_USERDATA);
+}
+
 static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
 {
 	return nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION) &&
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 0b87b2f..05ee1e0 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -292,6 +292,7 @@ enum nft_set_elem_flags {
  * @NFTA_SET_ELEM_FLAGS: bitmask of nft_set_elem_flags (NLA_U32)
  * @NFTA_SET_ELEM_TIMEOUT: timeout value (NLA_U64)
  * @NFTA_SET_ELEM_EXPIRATION: expiration time (NLA_U64)
+ * @NFTA_SET_ELEM_USERDATA: user data (NLA_BINARY)
  */
 enum nft_set_elem_attributes {
 	NFTA_SET_ELEM_UNSPEC,
@@ -300,6 +301,7 @@ enum nft_set_elem_attributes {
 	NFTA_SET_ELEM_FLAGS,
 	NFTA_SET_ELEM_TIMEOUT,
 	NFTA_SET_ELEM_EXPIRATION,
+	NFTA_SET_ELEM_USERDATA,
 	__NFTA_SET_ELEM_MAX
 };
 #define NFTA_SET_ELEM_MAX	(__NFTA_SET_ELEM_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 598e53e..0b96fa0 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2872,6 +2872,10 @@ const struct nft_set_ext_type nft_set_ext_types[] = {
 		.len	= sizeof(unsigned long),
 		.align	= __alignof__(unsigned long),
 	},
+	[NFT_SET_EXT_USERDATA]		= {
+		.len	= sizeof(struct nft_userdata),
+		.align	= __alignof__(struct nft_userdata),
+	},
 };
 EXPORT_SYMBOL_GPL(nft_set_ext_types);
 
@@ -2884,6 +2888,8 @@ static const struct nla_policy nft_set_elem_policy[NFTA_SET_ELEM_MAX + 1] = {
 	[NFTA_SET_ELEM_DATA]		= { .type = NLA_NESTED },
 	[NFTA_SET_ELEM_FLAGS]		= { .type = NLA_U32 },
 	[NFTA_SET_ELEM_TIMEOUT]		= { .type = NLA_U64 },
+	[NFTA_SET_ELEM_USERDATA]	= { .type = NLA_BINARY,
+					    .len = NFT_USERDATA_MAXLEN },
 };
 
 static const struct nla_policy nft_set_elem_list_policy[NFTA_SET_ELEM_LIST_MAX + 1] = {
@@ -2964,6 +2970,15 @@ static int nf_tables_fill_setelem(struct sk_buff *skb,
 			goto nla_put_failure;
 	}
 
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_USERDATA)) {
+		struct nft_userdata *udata;
+
+		udata = nft_set_ext_userdata(ext);
+		if (nla_put(skb, NFTA_SET_ELEM_USERDATA,
+			    udata->len + 1, udata->data))
+			goto nla_put_failure;
+	}
+
 	nla_nest_end(skb, nest);
 	return 0;
 
@@ -3232,11 +3247,13 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	struct nft_set_ext *ext;
 	struct nft_set_elem elem;
 	struct nft_set_binding *binding;
+	struct nft_userdata *udata;
 	struct nft_data data;
 	enum nft_registers dreg;
 	struct nft_trans *trans;
 	u64 timeout;
 	u32 flags;
+	u8 ulen;
 	int err;
 
 	err = nla_parse_nested(nla, NFTA_SET_ELEM_MAX, attr,
@@ -3325,6 +3342,18 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		nft_set_ext_add(&tmpl, NFT_SET_EXT_DATA);
 	}
 
+	/* The full maximum length of userdata can exceed the maximum
+	 * offset value (U8_MAX) for following extensions, therefore it
+	 * must be the last extension added.
+	 */
+	ulen = 0;
+	if (nla[NFTA_SET_ELEM_USERDATA] != NULL) {
+		ulen = nla_len(nla[NFTA_SET_ELEM_USERDATA]);
+		if (ulen > 0)
+			nft_set_ext_add_length(&tmpl, NFT_SET_EXT_USERDATA,
+					       ulen);
+	}
+
 	err = -ENOMEM;
 	elem.priv = nft_set_elem_init(set, &tmpl, &elem.key, &data,
 				      timeout, GFP_KERNEL);
@@ -3334,6 +3363,11 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	ext = nft_set_elem_ext(set, elem.priv);
 	if (flags)
 		*nft_set_ext_flags(ext) = flags;
+	if (ulen > 0) {
+		udata = nft_set_ext_userdata(ext);
+		udata->len = ulen - 1;
+		nla_memcpy(&udata->data, nla[NFTA_SET_ELEM_USERDATA], ulen);
+	}
 
 	trans = nft_trans_elem_alloc(ctx, NFT_MSG_NEWSETELEM, set);
 	if (trans == NULL)
-- 
1.7.10.4



* RE: [PATCH 05/20] netfilter: nft_hash: add support for timeouts
  2015-04-09 11:34 ` [PATCH 05/20] netfilter: nft_hash: add support for timeouts Pablo Neira Ayuso
@ 2015-04-09 13:39   ` David Laight
  2015-04-11 13:40     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 25+ messages in thread
From: David Laight @ 2015-04-09 13:39 UTC (permalink / raw)
  To: 'Pablo Neira Ayuso', netfilter-devel; +Cc: davem, netdev

From: Pablo Neira Ayuso
> Sent: 09 April 2015 12:35
...
> Add support for element timeouts to nft_hash. The lookup and walking
> functions are changed to ignore timed out elements, a periodic garbage
> collection task cleans out expired entries.

You probably want to delete timed out entries during insert.
If you do that you don't really need a garbage collector.

I'd also worry about re-adding a timed out entry.

	David


* Re: [PATCH 00/20] Netfilter updates for net-next
  2015-04-09 11:34 [PATCH 00/20] Netfilter updates for net-next Pablo Neira Ayuso
                   ` (19 preceding siblings ...)
  2015-04-09 11:35 ` [PATCH 20/20] netfilter: nf_tables: support optional userdata for set elements Pablo Neira Ayuso
@ 2015-04-09 18:46 ` David Miller
  20 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2015-04-09 18:46 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu,  9 Apr 2015 13:34:44 +0200

> The following patchset contains Netfilter updates for your net-next tree.
> They are:
> 
> * nf_tables set timeout infrastructure from Patrick Mchardy.
 ...
> * More nf_tables set enhancement from Patrick:
 ...
> BTW, I have also pulled net-next into nf-next to anticipate the conflict
> resolution between your okfn() signature changes and Florian's br_netfilter
> updates.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Ok, pulled, thanks Pablo!


* Re: [PATCH 05/20] netfilter: nft_hash: add support for timeouts
  2015-04-09 13:39   ` David Laight
@ 2015-04-11 13:40     ` Pablo Neira Ayuso
  2015-04-11 13:45       ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Pablo Neira Ayuso @ 2015-04-11 13:40 UTC (permalink / raw)
  To: David Laight; +Cc: kaber, netfilter-devel, davem, netdev

On Thu, Apr 09, 2015 at 01:39:18PM +0000, David Laight wrote:
> From: Pablo Neira Ayuso
> > Sent: 09 April 2015 12:35
> ...
> > Add support for element timeouts to nft_hash. The lookup and walking
> > functions are changed to ignore timed out elements, a periodic garbage
> > collection task cleans out expired entries.
> 
> You probably want to delete timed out entries during insert.
> If you do that you don't really need a garbage collector.

Exploring a synchronous solution from the Netlink API sounds like an
interesting idea to me.

> I'd also worry about re-adding a timed out entry.

It seems we re-add it as a new entry.


* Re: [PATCH 05/20] netfilter: nft_hash: add support for timeouts
  2015-04-11 13:40     ` Pablo Neira Ayuso
@ 2015-04-11 13:45       ` Patrick McHardy
  0 siblings, 0 replies; 25+ messages in thread
From: Patrick McHardy @ 2015-04-11 13:45 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: David Laight, netfilter-devel, davem, netdev

On 11.04, Pablo Neira Ayuso wrote:
> On Thu, Apr 09, 2015 at 01:39:18PM +0000, David Laight wrote:
> > From: Pablo Neira Ayuso
> > > Sent: 09 April 2015 12:35
> > ...
> > > Add support for element timeouts to nft_hash. The lookup and walking
> > > functions are changed to ignore timed out elements, a periodic garbage
> > > collection task cleans out expired entries.
> > 
> > You probably want to delete timed out entries during insert.
> > If you do that you don't really need a garbage collector.
> 
> Exploring a synchronous solution from the Netlink API sounds like an
> interesting idea to me.

It's an optimization, but it cannot replace GC. There's no guarantee
further inserts will happen.

For dynamic updates, where this is mostly needed, it won't work at all
since those happen in the wrong context. Doesn't really seem worth it.

> > I'd also worry about re-adding a timed out entry.
> 
> It seems we re-add it as a new entry.

Sure, everything except the key might be different.
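
For illustration only: a userspace sketch of the trade-off discussed in
this subthread. Lookups ignore expired entries, inserts reclaim expired
slots opportunistically, and a separate sweep is still needed because
inserts may never happen. All toy_* names are hypothetical and the
fixed-size array is an assumption standing in for the real hash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 4

struct toy_entry {
	bool     used;
	uint32_t key;
	uint64_t expires;
};

struct toy_table {
	struct toy_entry slots[TOY_SLOTS];
};

static bool toy_expired(const struct toy_entry *e, uint64_t now)
{
	return e->used && now >= e->expires;
}

/* Lookup treats expired entries as absent but does not free them. */
static bool toy_lookup(const struct toy_table *t, uint32_t key, uint64_t now)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		const struct toy_entry *e = &t->slots[i];

		if (e->used && e->key == key && !toy_expired(e, now))
			return true;
	}
	return false;
}

/* Periodic sweep: frees every expired slot, returns how many were freed. */
static unsigned int toy_gc(struct toy_table *t, uint64_t now)
{
	unsigned int freed = 0;

	assert(t != NULL);
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (toy_expired(&t->slots[i], now)) {
			t->slots[i].used = false;
			freed++;
		}
	}
	return freed;
}

/* Insert reclaims expired slots it walks past. Re-adding a timed-out key
 * therefore creates a fresh entry with a new expiration. */
static bool toy_insert(struct toy_table *t, uint32_t key, uint64_t now,
		       uint64_t timeout)
{
	struct toy_entry *free_slot = NULL;

	for (size_t i = 0; i < TOY_SLOTS; i++) {
		struct toy_entry *e = &t->slots[i];

		if (toy_expired(e, now))
			e->used = false;	/* reclaim on insert */
		if (e->used && e->key == key)
			return false;		/* live duplicate */
		if (!e->used && free_slot == NULL)
			free_slot = e;
	}
	if (free_slot == NULL)
		return false;
	*free_slot = (struct toy_entry){
		.used = true, .key = key, .expires = now + timeout,
	};
	return true;
}
```

Without toy_gc(), an expired entry stays allocated until some later insert
happens to walk past it, which is the case Patrick points out.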
