* [PATCH 0/5] netfilter: nf_tables: set timeout support
From: Patrick McHardy @ 2015-03-26 12:39 UTC
  To: pablo; +Cc: netfilter-devel

These patches add support for set timeouts. Sets can have a default
timeout value that can be overridden by element-specific timeouts.

Removal of expired elements will usually be performed by a garbage
collector for two reasons: to avoid an excessive number of timers,
and because data deinit has to happen in process context.

The first two patches add the required netlink attributes, parsing,
dump etc. Patch three adds a set of GC helper functions for batched
RCU element destruction. Patch four adds synchronization helpers that
avoid races between async GC and netlink insertion and removal of
elements. Patch five adds timeout support to nft_hash.

Following patches will use this infrastructure to support set updates
from the packet classification path for dynamic sets and dynamic
flow state maintenance.

Please apply, thanks!


Patrick McHardy (5):
  netfilter: nf_tables: add set timeout API support
  netfilter: nf_tables: add set element timeout support
  netfilter: nf_tables: add set garbage collection helpers
  netfilter: nf_tables: add GC synchronization helpers
  netfilter: nft_hash: add support for timeouts

 include/net/netfilter/nf_tables.h        | 125 +++++++++++++++++++++++++++++++
 include/uapi/linux/netfilter/nf_tables.h |  10 +++
 net/netfilter/nf_tables_api.c            | 110 +++++++++++++++++++++++++--
 net/netfilter/nft_hash.c                 |  80 +++++++++++++++++++-
 4 files changed, 316 insertions(+), 9 deletions(-)

-- 
2.1.0



* [PATCH 1/5] netfilter: nf_tables: add set timeout API support
From: Patrick McHardy @ 2015-03-26 12:39 UTC
  To: pablo; +Cc: netfilter-devel

Add set timeout support to the netlink API. Sets with timeout support
enabled can have a default timeout value and garbage collection interval
specified.
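
As a rough userspace illustration of the GC interval defaulting described
above (not part of the patch): struct set_timeouts and gc_interval_ms()
are made-up names, and the 1000 ms fallback assumes the HZ fallback in
nft_set_gc_interval(), i.e. one second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the two new struct nft_set fields. */
struct set_timeouts {
	uint64_t timeout_ms;	/* default element timeout, 0 = none */
	uint32_t gc_int_ms;	/* GC interval, 0 = not configured */
};

/*
 * Mirrors the fallback in nft_set_gc_interval(): an unset interval
 * falls back to HZ jiffies, which is one second.
 */
static uint32_t gc_interval_ms(const struct set_timeouts *s)
{
	assert(s != NULL);
	return s->gc_int_ms ? s->gc_int_ms : 1000;
}
```

A set created without NFTA_SET_GC_INTERVAL would therefore be scanned
about once per second under this reading.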

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_tables.h        |  9 +++++++++
 include/uapi/linux/netfilter/nf_tables.h |  6 ++++++
 net/netfilter/nf_tables_api.c            | 30 ++++++++++++++++++++++++++++--
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index b8cd60d..8936803 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -258,6 +258,8 @@ void nft_unregister_set(struct nft_set_ops *ops);
  * 	@dtype: data type (verdict or numeric type defined by userspace)
  * 	@size: maximum set size
  * 	@nelems: number of elements
+ * 	@timeout: default timeout value in msecs
+ * 	@gc_int: garbage collection interval in msecs
  *	@policy: set parameterization (see enum nft_set_policies)
  * 	@ops: set ops
  * 	@pnet: network namespace
@@ -274,6 +276,8 @@ struct nft_set {
 	u32				dtype;
 	u32				size;
 	u32				nelems;
+	u64				timeout;
+	u32				gc_int;
 	u16				policy;
 	/* runtime data below here */
 	const struct nft_set_ops	*ops ____cacheline_aligned;
@@ -295,6 +299,11 @@ struct nft_set *nf_tables_set_lookup(const struct nft_table *table,
 struct nft_set *nf_tables_set_lookup_byid(const struct net *net,
 					  const struct nlattr *nla);
 
+static inline unsigned long nft_set_gc_interval(const struct nft_set *set)
+{
+	return set->gc_int ? msecs_to_jiffies(set->gc_int) : HZ;
+}
+
 /**
  *	struct nft_set_binding - nf_tables set binding
  *
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index b978393..971d245 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -208,12 +208,14 @@ enum nft_rule_compat_attributes {
  * @NFT_SET_CONSTANT: set contents may not change while bound
  * @NFT_SET_INTERVAL: set contains intervals
  * @NFT_SET_MAP: set is used as a dictionary
+ * @NFT_SET_TIMEOUT: set uses timeouts
  */
 enum nft_set_flags {
 	NFT_SET_ANONYMOUS		= 0x1,
 	NFT_SET_CONSTANT		= 0x2,
 	NFT_SET_INTERVAL		= 0x4,
 	NFT_SET_MAP			= 0x8,
+	NFT_SET_TIMEOUT			= 0x10,
 };
 
 /**
@@ -252,6 +254,8 @@ enum nft_set_desc_attributes {
  * @NFTA_SET_POLICY: selection policy (NLA_U32)
  * @NFTA_SET_DESC: set description (NLA_NESTED)
  * @NFTA_SET_ID: uniquely identifies a set in a transaction (NLA_U32)
+ * @NFTA_SET_TIMEOUT: default timeout value (NLA_U64)
+ * @NFTA_SET_GC_INTERVAL: garbage collection interval (NLA_U32)
  */
 enum nft_set_attributes {
 	NFTA_SET_UNSPEC,
@@ -265,6 +269,8 @@ enum nft_set_attributes {
 	NFTA_SET_POLICY,
 	NFTA_SET_DESC,
 	NFTA_SET_ID,
+	NFTA_SET_TIMEOUT,
+	NFTA_SET_GC_INTERVAL,
 	__NFTA_SET_MAX
 };
 #define NFTA_SET_MAX		(__NFTA_SET_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 5604c2d..6320b64 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2216,6 +2216,8 @@ static const struct nla_policy nft_set_policy[NFTA_SET_MAX + 1] = {
 	[NFTA_SET_POLICY]		= { .type = NLA_U32 },
 	[NFTA_SET_DESC]			= { .type = NLA_NESTED },
 	[NFTA_SET_ID]			= { .type = NLA_U32 },
+	[NFTA_SET_TIMEOUT]		= { .type = NLA_U64 },
+	[NFTA_SET_GC_INTERVAL]		= { .type = NLA_U32 },
 };
 
 static const struct nla_policy nft_set_desc_policy[NFTA_SET_DESC_MAX + 1] = {
@@ -2366,6 +2368,13 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
 			goto nla_put_failure;
 	}
 
+	if (set->timeout &&
+	    nla_put_be64(skb, NFTA_SET_TIMEOUT, cpu_to_be64(set->timeout)))
+		goto nla_put_failure;
+	if (set->gc_int &&
+	    nla_put_be32(skb, NFTA_SET_GC_INTERVAL, htonl(set->gc_int)))
+		goto nla_put_failure;
+
 	if (set->policy != NFT_SET_POL_PERFORMANCE) {
 		if (nla_put_be32(skb, NFTA_SET_POLICY, htonl(set->policy)))
 			goto nla_put_failure;
@@ -2578,7 +2587,8 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	char name[IFNAMSIZ];
 	unsigned int size;
 	bool create;
-	u32 ktype, dtype, flags, policy;
+	u64 timeout;
+	u32 ktype, dtype, flags, policy, gc_int;
 	struct nft_set_desc desc;
 	int err;
 
@@ -2605,7 +2615,8 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	if (nla[NFTA_SET_FLAGS] != NULL) {
 		flags = ntohl(nla_get_be32(nla[NFTA_SET_FLAGS]));
 		if (flags & ~(NFT_SET_ANONYMOUS | NFT_SET_CONSTANT |
-			      NFT_SET_INTERVAL | NFT_SET_MAP))
+			      NFT_SET_INTERVAL | NFT_SET_MAP |
+			      NFT_SET_TIMEOUT))
 			return -EINVAL;
 	}
 
@@ -2631,6 +2642,19 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	} else if (flags & NFT_SET_MAP)
 		return -EINVAL;
 
+	timeout = 0;
+	if (nla[NFTA_SET_TIMEOUT] != NULL) {
+		if (!(flags & NFT_SET_TIMEOUT))
+			return -EINVAL;
+		timeout = be64_to_cpu(nla_get_be64(nla[NFTA_SET_TIMEOUT]));
+	}
+	gc_int = 0;
+	if (nla[NFTA_SET_GC_INTERVAL] != NULL) {
+		if (!(flags & NFT_SET_TIMEOUT))
+			return -EINVAL;
+		gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL]));
+	}
+
 	policy = NFT_SET_POL_PERFORMANCE;
 	if (nla[NFTA_SET_POLICY] != NULL)
 		policy = ntohl(nla_get_be32(nla[NFTA_SET_POLICY]));
@@ -2699,6 +2723,8 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	set->flags = flags;
 	set->size  = desc.size;
 	set->policy = policy;
+	set->timeout = timeout;
+	set->gc_int = gc_int;
 
 	err = ops->init(set, &desc, nla);
 	if (err < 0)
-- 
2.1.0



* [PATCH 2/5] netfilter: nf_tables: add set element timeout support
From: Patrick McHardy @ 2015-03-26 12:39 UTC
  To: pablo; +Cc: netfilter-devel

Add API support for set element timeouts. Elements can have an individual
timeout value specified, overriding the set's default.

Two new extension types are used for timeouts - the timeout value and
the expiration time. The timeout value only exists if it differs from
the default value.
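
The extension selection described above can be sketched in plain C as
follows. This is only an illustration of the rule, not code from the
patch: struct elem_exts and choose_exts() are hypothetical names, and a
NULL elem_timeout_ms stands in for a request without
NFTA_SET_ELEM_TIMEOUT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Which timeout-related extensions an element would carry (illustrative). */
struct elem_exts {
	bool expiration;	/* NFT_SET_EXT_EXPIRATION present */
	bool timeout;		/* NFT_SET_EXT_TIMEOUT present */
	uint64_t timeout_ms;	/* effective timeout, 0 = never expires */
};

/*
 * The caller is assumed to have already rejected an explicit element
 * timeout on a set without NFT_SET_TIMEOUT (the patch returns -EINVAL).
 */
static struct elem_exts choose_exts(bool set_has_timeouts,
				    uint64_t set_default_ms,
				    const uint64_t *elem_timeout_ms)
{
	struct elem_exts e = { false, false, 0 };

	assert(elem_timeout_ms == NULL || set_has_timeouts);

	if (elem_timeout_ms != NULL)
		e.timeout_ms = *elem_timeout_ms;
	else if (set_has_timeouts)
		e.timeout_ms = set_default_ms;

	if (e.timeout_ms > 0) {
		/* Every expiring element stores its expiration time ... */
		e.expiration = true;
		/* ... but the timeout only if it differs from the default. */
		e.timeout = e.timeout_ms != set_default_ms;
	}
	return e;
}
```

Storing the timeout only when it differs keeps elements that use the
set default smaller.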

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_tables.h        | 20 ++++++++++++
 include/uapi/linux/netfilter/nf_tables.h |  4 +++
 net/netfilter/nf_tables_api.c            | 53 ++++++++++++++++++++++++++++++--
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 8936803..f2726c5 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -329,12 +329,16 @@ void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
  *	@NFT_SET_EXT_KEY: element key
  *	@NFT_SET_EXT_DATA: mapping data
  *	@NFT_SET_EXT_FLAGS: element flags
+ *	@NFT_SET_EXT_TIMEOUT: element timeout
+ *	@NFT_SET_EXT_EXPIRATION: element expiration time
  *	@NFT_SET_EXT_NUM: number of extension types
  */
 enum nft_set_extensions {
 	NFT_SET_EXT_KEY,
 	NFT_SET_EXT_DATA,
 	NFT_SET_EXT_FLAGS,
+	NFT_SET_EXT_TIMEOUT,
+	NFT_SET_EXT_EXPIRATION,
 	NFT_SET_EXT_NUM
 };
 
@@ -431,6 +435,22 @@ static inline u8 *nft_set_ext_flags(const struct nft_set_ext *ext)
 	return nft_set_ext(ext, NFT_SET_EXT_FLAGS);
 }
 
+static inline u64 *nft_set_ext_timeout(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_TIMEOUT);
+}
+
+static inline unsigned long *nft_set_ext_expiration(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_EXPIRATION);
+}
+
+static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
+{
+	return nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION) &&
+	       time_is_before_eq_jiffies(*nft_set_ext_expiration(ext));
+}
+
 static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
 						   void *elem)
 {
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 971d245..83441cc 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -290,12 +290,16 @@ enum nft_set_elem_flags {
  * @NFTA_SET_ELEM_KEY: key value (NLA_NESTED: nft_data)
  * @NFTA_SET_ELEM_DATA: data value of mapping (NLA_NESTED: nft_data_attributes)
  * @NFTA_SET_ELEM_FLAGS: bitmask of nft_set_elem_flags (NLA_U32)
+ * @NFTA_SET_ELEM_TIMEOUT: timeout value (NLA_U64)
+ * @NFTA_SET_ELEM_EXPIRATION: expiration time (NLA_U64)
  */
 enum nft_set_elem_attributes {
 	NFTA_SET_ELEM_UNSPEC,
 	NFTA_SET_ELEM_KEY,
 	NFTA_SET_ELEM_DATA,
 	NFTA_SET_ELEM_FLAGS,
+	NFTA_SET_ELEM_TIMEOUT,
+	NFTA_SET_ELEM_EXPIRATION,
 	__NFTA_SET_ELEM_MAX
 };
 #define NFTA_SET_ELEM_MAX	(__NFTA_SET_ELEM_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 6320b64..9e032db 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2863,6 +2863,14 @@ const struct nft_set_ext_type nft_set_ext_types[] = {
 		.len	= sizeof(u8),
 		.align	= __alignof__(u8),
 	},
+	[NFT_SET_EXT_TIMEOUT]		= {
+		.len	= sizeof(u64),
+		.align	= __alignof__(u64),
+	},
+	[NFT_SET_EXT_EXPIRATION]	= {
+		.len	= sizeof(unsigned long),
+		.align	= __alignof__(unsigned long),
+	},
 };
 EXPORT_SYMBOL_GPL(nft_set_ext_types);
 
@@ -2874,6 +2882,7 @@ static const struct nla_policy nft_set_elem_policy[NFTA_SET_ELEM_MAX + 1] = {
 	[NFTA_SET_ELEM_KEY]		= { .type = NLA_NESTED },
 	[NFTA_SET_ELEM_DATA]		= { .type = NLA_NESTED },
 	[NFTA_SET_ELEM_FLAGS]		= { .type = NLA_U32 },
+	[NFTA_SET_ELEM_TIMEOUT]		= { .type = NLA_U64 },
 };
 
 static const struct nla_policy nft_set_elem_list_policy[NFTA_SET_ELEM_LIST_MAX + 1] = {
@@ -2935,6 +2944,25 @@ static int nf_tables_fill_setelem(struct sk_buff *skb,
 		         htonl(*nft_set_ext_flags(ext))))
 		goto nla_put_failure;
 
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_TIMEOUT) &&
+	    nla_put_be64(skb, NFTA_SET_ELEM_TIMEOUT,
+			 cpu_to_be64(*nft_set_ext_timeout(ext))))
+		goto nla_put_failure;
+
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION)) {
+		unsigned long expires, now = jiffies;
+
+		expires = *nft_set_ext_expiration(ext);
+		if (time_before(now, expires))
+			expires -= now;
+		else
+			expires = 0;
+
+		if (nla_put_be64(skb, NFTA_SET_ELEM_EXPIRATION,
+				 cpu_to_be64(jiffies_to_msecs(expires))))
+			goto nla_put_failure;
+	}
+
 	nla_nest_end(skb, nest);
 	return 0;
 
@@ -3158,7 +3186,7 @@ static void *nft_set_elem_init(const struct nft_set *set,
 			       const struct nft_set_ext_tmpl *tmpl,
 			       const struct nft_data *key,
 			       const struct nft_data *data,
-			       gfp_t gfp)
+			       u64 timeout, gfp_t gfp)
 {
 	struct nft_set_ext *ext;
 	void *elem;
@@ -3173,6 +3201,11 @@ static void *nft_set_elem_init(const struct nft_set *set,
 	memcpy(nft_set_ext_key(ext), key, set->klen);
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA))
 		memcpy(nft_set_ext_data(ext), data, set->dlen);
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION))
+		*nft_set_ext_expiration(ext) =
+			jiffies + msecs_to_jiffies(timeout);
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_TIMEOUT))
+		*nft_set_ext_timeout(ext) = timeout;
 
 	return elem;
 }
@@ -3201,6 +3234,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	struct nft_data data;
 	enum nft_registers dreg;
 	struct nft_trans *trans;
+	u64 timeout;
 	u32 flags;
 	int err;
 
@@ -3241,6 +3275,15 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			return -EINVAL;
 	}
 
+	timeout = 0;
+	if (nla[NFTA_SET_ELEM_TIMEOUT] != NULL) {
+		if (!(set->flags & NFT_SET_TIMEOUT))
+			return -EINVAL;
+		timeout = be64_to_cpu(nla_get_be64(nla[NFTA_SET_ELEM_TIMEOUT]));
+	} else if (set->flags & NFT_SET_TIMEOUT) {
+		timeout = set->timeout;
+	}
+
 	err = nft_data_init(ctx, &elem.key, &d1, nla[NFTA_SET_ELEM_KEY]);
 	if (err < 0)
 		goto err1;
@@ -3249,6 +3292,11 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		goto err2;
 
 	nft_set_ext_add(&tmpl, NFT_SET_EXT_KEY);
+	if (timeout > 0) {
+		nft_set_ext_add(&tmpl, NFT_SET_EXT_EXPIRATION);
+		if (timeout != set->timeout)
+			nft_set_ext_add(&tmpl, NFT_SET_EXT_TIMEOUT);
+	}
 
 	if (nla[NFTA_SET_ELEM_DATA] != NULL) {
 		err = nft_data_init(ctx, &data, &d2, nla[NFTA_SET_ELEM_DATA]);
@@ -3277,7 +3325,8 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	}
 
 	err = -ENOMEM;
-	elem.priv = nft_set_elem_init(set, &tmpl, &elem.key, &data, GFP_KERNEL);
+	elem.priv = nft_set_elem_init(set, &tmpl, &elem.key, &data,
+				      timeout, GFP_KERNEL);
 	if (elem.priv == NULL)
 		goto err3;
 
-- 
2.1.0



* [PATCH 3/5] netfilter: nf_tables: add set garbage collection helpers
From: Patrick McHardy @ 2015-03-26 12:39 UTC
  To: pablo; +Cc: netfilter-devel

Add helpers for GC batch destruction: since element destruction needs
an RCU grace period for all set implementations, add some helper functions
for asynchronous batch destruction. Elements are collected in a batch
structure, which is asynchronously released using RCU once it is full.
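
The batching pattern can be tried out in userspace with a small sketch
like the one below. It is not the kernel code: there is no RCU here, so
gc_batch_release() frees immediately where the patch defers to
call_rcu(), BATCH_SIZE is an arbitrary small number, and all names are
made up for the example.

```c
#include <stdlib.h>

#define BATCH_SIZE 4	/* illustrative; the patch sizes a batch to one page */

struct gc_batch {
	unsigned int cnt;
	void *elems[BATCH_SIZE];
};

static unsigned int freed_elems;	/* bookkeeping for the example only */

/* Stand-in for the RCU callback: destroy every element, then the batch. */
static void gc_batch_release(struct gc_batch *b)
{
	unsigned int i;

	if (b == NULL)
		return;
	for (i = 0; i < b->cnt; i++) {
		free(b->elems[i]);
		freed_elems++;
	}
	free(b);
}

/* Return a batch with room for one more element, flushing a full one. */
static struct gc_batch *gc_batch_check(struct gc_batch *b)
{
	if (b != NULL) {
		/* Same comparison as the patch: flush once one slot is left. */
		if (b->cnt + 1 < BATCH_SIZE)
			return b;
		gc_batch_release(b);
	}
	return calloc(1, sizeof(*b));
}

static void gc_batch_add(struct gc_batch *b, void *elem)
{
	b->elems[b->cnt++] = elem;
}
```

A caller would loop over expired elements, call gc_batch_check() before
each gc_batch_add(), and release the last partial batch once the walk is
done.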

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_tables.h | 56 +++++++++++++++++++++++++++++++++++++++
 net/netfilter/nf_tables_api.c     | 25 +++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index f2726c5..6fd4495 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -460,6 +460,62 @@ static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
 void nft_set_elem_destroy(const struct nft_set *set, void *elem);
 
 /**
+ *	struct nft_set_gc_batch_head - nf_tables set garbage collection batch
+ *
+ *	@rcu: rcu head
+ *	@set: set the elements belong to
+ *	@cnt: count of elements
+ */
+struct nft_set_gc_batch_head {
+	struct rcu_head			rcu;
+	const struct nft_set		*set;
+	unsigned int			cnt;
+};
+
+#define NFT_SET_GC_BATCH_SIZE	((PAGE_SIZE -				  \
+				  sizeof(struct nft_set_gc_batch_head)) / \
+				 sizeof(void *))
+
+/**
+ *	struct nft_set_gc_batch - nf_tables set garbage collection batch
+ *
+ * 	@head: GC batch head
+ * 	@elems: garbage collection elements
+ */
+struct nft_set_gc_batch {
+	struct nft_set_gc_batch_head	head;
+	void				*elems[NFT_SET_GC_BATCH_SIZE];
+};
+
+struct nft_set_gc_batch *nft_set_gc_batch_alloc(const struct nft_set *set,
+						gfp_t gfp);
+void nft_set_gc_batch_release(struct rcu_head *rcu);
+
+static inline void nft_set_gc_batch_complete(struct nft_set_gc_batch *gcb)
+{
+	if (gcb != NULL)
+		call_rcu(&gcb->head.rcu, nft_set_gc_batch_release);
+}
+
+static inline struct nft_set_gc_batch *
+nft_set_gc_batch_check(const struct nft_set *set, struct nft_set_gc_batch *gcb,
+		       gfp_t gfp)
+{
+	if (gcb != NULL) {
+		if (gcb->head.cnt + 1 < ARRAY_SIZE(gcb->elems))
+			return gcb;
+		nft_set_gc_batch_complete(gcb);
+	}
+	return nft_set_gc_batch_alloc(set, gfp);
+}
+
+static inline void nft_set_gc_batch_add(struct nft_set_gc_batch *gcb,
+					void *elem)
+{
+	gcb->elems[gcb->head.cnt++] = elem;
+}
+
+/**
  *	struct nft_expr_type - nf_tables expression type
  *
  *	@select_ops: function to select nft_expr_ops
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9e032db..138e47f 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3482,6 +3482,31 @@ static int nf_tables_delsetelem(struct sock *nlsk, struct sk_buff *skb,
 	return err;
 }
 
+void nft_set_gc_batch_release(struct rcu_head *rcu)
+{
+	struct nft_set_gc_batch *gcb;
+	unsigned int i;
+
+	gcb = container_of(rcu, struct nft_set_gc_batch, head.rcu);
+	for (i = 0; i < gcb->head.cnt; i++)
+		nft_set_elem_destroy(gcb->head.set, gcb->elems[i]);
+	kfree(gcb);
+}
+EXPORT_SYMBOL_GPL(nft_set_gc_batch_release);
+
+struct nft_set_gc_batch *nft_set_gc_batch_alloc(const struct nft_set *set,
+						gfp_t gfp)
+{
+	struct nft_set_gc_batch *gcb;
+
+	gcb = kzalloc(sizeof(*gcb), gfp);
+	if (gcb == NULL)
+		return gcb;
+	gcb->head.set = set;
+	return gcb;
+}
+EXPORT_SYMBOL_GPL(nft_set_gc_batch_alloc);
+
 static int nf_tables_fill_gen_info(struct sk_buff *skb, struct net *net,
 				   u32 portid, u32 seq)
 {
-- 
2.1.0



* [PATCH 4/5] netfilter: nf_tables: add GC synchronization helpers
From: Patrick McHardy @ 2015-03-26 12:39 UTC
  To: pablo; +Cc: netfilter-devel

GC is expected to happen asynchronously to the netlink interface. In the
netlink path, both insertion and removal of elements consist of two
steps (insertion followed by activation, or deactivation followed by
removal), during which the element must not be freed by GC.

The synchronization helpers use an unused bit in the genmask field to
atomically mark an element as "busy", meaning it is either currently
being handled through the netlink API or by GC.

Elements being processed by GC will never survive; netlink will simply
ignore them. Elements currently being processed through netlink will be
skipped by GC and reprocessed during the next run.
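
For readers who want to experiment with the busy-bit handshake outside
the kernel, here is a minimal sketch using C11 atomics. It swaps the
kernel's test_and_set_bit()/clear_bit() on an unsigned long for
atomic_fetch_or()/atomic_fetch_and() on a single byte, so it is an
analogy rather than the patch's implementation; struct elem and the
function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define ELEM_BUSY_MASK	(1u << 2)	/* same bit position as the patch */

/* Hypothetical element header holding only the generation mask byte. */
struct elem {
	atomic_uchar genmask;
};

/*
 * Atomically set the busy bit. Returns true if the element was already
 * busy, i.e. the other side (netlink or GC) owns it and we must back off.
 */
static bool elem_mark_busy(struct elem *e)
{
	return atomic_fetch_or(&e->genmask, ELEM_BUSY_MASK) & ELEM_BUSY_MASK;
}

/* Drop ownership again, leaving the other genmask bits untouched. */
static void elem_clear_busy(struct elem *e)
{
	atomic_fetch_and(&e->genmask, (unsigned char)~ELEM_BUSY_MASK);
}
```

Whoever sees elem_mark_busy() return false owns the element until it
calls elem_clear_busy(); the loser skips the element.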

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_tables.h | 35 +++++++++++++++++++++++++++++++++++
 net/netfilter/nf_tables_api.c     |  2 +-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 6fd4495..1ea13fc 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -852,6 +852,41 @@ static inline void nft_set_elem_change_active(const struct nft_set *set,
 	ext->genmask ^= nft_genmask_next(read_pnet(&set->pnet));
 }
 
+/*
+ * We use a free bit in the genmask field to indicate the element
+ * is busy, meaning it is currently being processed either by
+ * the netlink API or GC.
+ *
+ * Even though the genmask is only a single byte wide, this works
+ * because the extension structure is fully constant once initialized,
+ * so there are no non-atomic write accesses unless it is already
+ * marked busy.
+ */
+#define NFT_SET_ELEM_BUSY_MASK	(1 << 2)
+
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+#define NFT_SET_ELEM_BUSY_BIT	2
+#elif defined(__BIG_ENDIAN_BITFIELD)
+#define NFT_SET_ELEM_BUSY_BIT	(BITS_PER_LONG - BITS_PER_BYTE + 2)
+#else
+#error
+#endif
+
+static inline int nft_set_elem_mark_busy(struct nft_set_ext *ext)
+{
+	unsigned long *word = (unsigned long *)ext;
+
+	BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0);
+	return test_and_set_bit(NFT_SET_ELEM_BUSY_BIT, word);
+}
+
+static inline void nft_set_elem_clear_busy(struct nft_set_ext *ext)
+{
+	unsigned long *word = (unsigned long *)ext;
+
+	clear_bit(NFT_SET_ELEM_BUSY_BIT, word);
+}
+
 /**
  *	struct nft_trans - nf_tables object update in transaction
  *
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 138e47f..3aa92b3 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3338,7 +3338,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	if (trans == NULL)
 		goto err4;
 
-	ext->genmask = nft_genmask_cur(ctx->net);
+	ext->genmask = nft_genmask_cur(ctx->net) | NFT_SET_ELEM_BUSY_MASK;
 	err = set->ops->insert(set, &elem);
 	if (err < 0)
 		goto err5;
-- 
2.1.0



* [PATCH 5/5] netfilter: nft_hash: add support for timeouts
From: Patrick McHardy @ 2015-03-26 12:39 UTC
  To: pablo; +Cc: netfilter-devel

Add support for element timeouts to nft_hash. The lookup and walking
functions are changed to ignore timed out elements, a periodic garbage
collection task cleans out expired entries.
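
The expiry test that lookup and walk rely on can be sketched in userspace
as below. This is an approximation under stated assumptions: a 32-bit
tick counter stands in for jiffies, the signed-difference comparison is
the usual wraparound-tolerant idiom behind the kernel's time_*() macros,
and struct elem, tick_expired() and elem_expired() are illustrative
names only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if `expires` is at or before `now`, even if the counter wrapped.
 * Relies on the conversion to int32_t wrapping, as it does on common
 * compilers (the conversion is implementation-defined in ISO C).
 */
static bool tick_expired(uint32_t now, uint32_t expires)
{
	return (int32_t)(now - expires) >= 0;
}

/* Hypothetical element: the expiration field is optional. */
struct elem {
	bool has_expiration;
	uint32_t expires;
};

/* Elements without an expiration time never expire. */
static bool elem_expired(const struct elem *e, uint32_t now)
{
	return e->has_expiration && tick_expired(now, e->expires);
}
```

A lookup would treat an element for which elem_expired() is true as a
miss, leaving the actual removal to the periodic GC run.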

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_tables.h |  5 +++
 net/netfilter/nft_hash.c          | 80 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 1ea13fc..a785699 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -294,6 +294,11 @@ static inline void *nft_set_priv(const struct nft_set *set)
 	return (void *)set->data;
 }
 
+static inline struct nft_set *nft_set_container_of(const void *priv)
+{
+	return (void *)priv - offsetof(struct nft_set, data);
+}
+
 struct nft_set *nf_tables_set_lookup(const struct nft_table *table,
 				     const struct nlattr *nla);
 struct nft_set *nf_tables_set_lookup_byid(const struct net *net,
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index c7e1a9d..2a00da9 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -15,6 +15,7 @@
 #include <linux/log2.h>
 #include <linux/jhash.h>
 #include <linux/netlink.h>
+#include <linux/workqueue.h>
 #include <linux/rhashtable.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
@@ -25,6 +26,7 @@
 
 struct nft_hash {
 	struct rhashtable		ht;
+	struct delayed_work		gc_work;
 };
 
 struct nft_hash_elem {
@@ -62,6 +64,8 @@ static inline int nft_hash_cmp(struct rhashtable_compare_arg *arg,
 
 	if (nft_data_cmp(nft_set_ext_key(&he->ext), x->key, x->set->klen))
 		return 1;
+	if (nft_set_elem_expired(&he->ext))
+		return 1;
 	if (!nft_set_elem_active(&he->ext, x->genmask))
 		return 1;
 	return 0;
@@ -107,6 +111,7 @@ static void nft_hash_activate(const struct nft_set *set,
 	struct nft_hash_elem *he = elem->priv;
 
 	nft_set_elem_change_active(set, &he->ext);
+	nft_set_elem_clear_busy(&he->ext);
 }
 
 static void *nft_hash_deactivate(const struct nft_set *set,
@@ -120,9 +125,15 @@ static void *nft_hash_deactivate(const struct nft_set *set,
 		.key	 = &elem->key,
 	};
 
+	rcu_read_lock();
 	he = rhashtable_lookup_fast(&priv->ht, &arg, nft_hash_params);
-	if (he != NULL)
-		nft_set_elem_change_active(set, &he->ext);
+	if (he != NULL) {
+		if (!nft_set_elem_mark_busy(&he->ext))
+			nft_set_elem_change_active(set, &he->ext);
+		else
+			he = NULL;
+	}
+	rcu_read_unlock();
 
 	return he;
 }
@@ -170,6 +181,8 @@ static void nft_hash_walk(const struct nft_ctx *ctx, const struct nft_set *set,
 
 		if (iter->count < iter->skip)
 			goto cont;
+		if (nft_set_elem_expired(&he->ext))
+			goto cont;
 		if (!nft_set_elem_active(&he->ext, genmask))
 			goto cont;
 
@@ -188,6 +201,54 @@ out:
 	rhashtable_walk_exit(&hti);
 }
 
+static void nft_hash_gc(struct work_struct *work)
+{
+	const struct nft_set *set;
+	struct nft_hash_elem *he;
+	struct nft_hash *priv;
+	struct nft_set_gc_batch *gcb = NULL;
+	struct rhashtable_iter hti;
+	int err;
+
+	priv = container_of(work, struct nft_hash, gc_work.work);
+	set  = nft_set_container_of(priv);
+
+	err = rhashtable_walk_init(&priv->ht, &hti);
+	if (err)
+		goto schedule;
+
+	err = rhashtable_walk_start(&hti);
+	if (err && err != -EAGAIN)
+		goto out;
+
+	while ((he = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(he)) {
+			if (PTR_ERR(he) != -EAGAIN)
+				goto out;
+			continue;
+		}
+
+		if (!nft_set_elem_expired(&he->ext))
+			continue;
+		if (nft_set_elem_mark_busy(&he->ext))
+			continue;
+
+		gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
+		if (gcb == NULL)
+			goto out;
+		rhashtable_remove_fast(&priv->ht, &he->node, nft_hash_params);
+		nft_set_gc_batch_add(gcb, he);
+	}
+out:
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+
+	nft_set_gc_batch_complete(gcb);
+schedule:
+	queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
+			   nft_set_gc_interval(set));
+}
+
 static unsigned int nft_hash_privsize(const struct nlattr * const nla[])
 {
 	return sizeof(struct nft_hash);
@@ -207,11 +268,20 @@ static int nft_hash_init(const struct nft_set *set,
 {
 	struct nft_hash *priv = nft_set_priv(set);
 	struct rhashtable_params params = nft_hash_params;
+	int err;
 
 	params.nelem_hint = desc->size ?: NFT_HASH_ELEMENT_HINT;
 	params.key_len	  = set->klen;
 
-	return rhashtable_init(&priv->ht, &params);
+	err = rhashtable_init(&priv->ht, &params);
+	if (err < 0)
+		return err;
+
+	INIT_DEFERRABLE_WORK(&priv->gc_work, nft_hash_gc);
+	if (set->flags & NFT_SET_TIMEOUT)
+		queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
+				   nft_set_gc_interval(set));
+	return 0;
 }
 
 static void nft_hash_elem_destroy(void *ptr, void *arg)
@@ -223,6 +293,7 @@ static void nft_hash_destroy(const struct nft_set *set)
 {
 	struct nft_hash *priv = nft_set_priv(set);
 
+	cancel_delayed_work_sync(&priv->gc_work);
 	rhashtable_free_and_destroy(&priv->ht, nft_hash_elem_destroy,
 				    (void *)set);
 }
@@ -264,7 +335,7 @@ static struct nft_set_ops nft_hash_ops __read_mostly = {
 	.remove		= nft_hash_remove,
 	.lookup		= nft_hash_lookup,
 	.walk		= nft_hash_walk,
-	.features	= NFT_SET_MAP,
+	.features	= NFT_SET_MAP | NFT_SET_TIMEOUT,
 	.owner		= THIS_MODULE,
 };
 
@@ -284,3 +355,4 @@ module_exit(nft_hash_module_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
 MODULE_ALIAS_NFT_SET();
+
-- 
2.1.0



* Re: [PATCH 0/5] netfilter: nf_tables: set timeout support
From: Pablo Neira Ayuso @ 2015-04-02  8:52 UTC
  To: Patrick McHardy; +Cc: netfilter-devel

On Thu, Mar 26, 2015 at 12:39:35PM +0000, Patrick McHardy wrote:
> These patches add support for set timeouts. Sets can have a default
> timeout value that can be overridden by element specific timeouts.
> 
> Removal of expired elements will usually be performed by a garbage
> collector for two reasons: avoiding an excessive number of timers
> and because data deinit has to happen in process context.
> 
> The first two patches add the required netlink attributes, parsing,
> dump etc. A set of GC helper functions for batched RCU element
> destruction is added in patch three, some synchronization helpers
> to avoid races between async GC and netlink insertion and removal
> of elements are added in patch four.
> 
> Following patches will use this infrastructure to support set updates
> from the packet classification path for dynamic sets and dynamic
> flow state maintenance.
> 
> Please apply, thanks!

Series applied, thanks Patrick.

BTW, what's your plan with the rbtree and timeouts?


* Re: [PATCH 0/5] netfilter: nf_tables: set timeout support
From: Patrick McHardy @ 2015-04-02  9:20 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

On 02.04, Pablo Neira Ayuso wrote:
> On Thu, Mar 26, 2015 at 12:39:35PM +0000, Patrick McHardy wrote:
> > These patches add support for set timeouts. Sets can have a default
> > timeout value that can be overridden by element specific timeouts.
> > 
> > Removal of expired elements will usually be performed by a garbage
> > collector for two reasons: avoiding an excessive number of timers
> > and because data deinit has to happen in process context.
> > 
> > The first two patches add the required netlink attributes, parsing,
> > dump etc. A set of GC helper functions for batched RCU element
> > destruction is added in patch three, some synchronization helpers
> > to avoid races between async GC and netlink insertion and removal
> > of elements are added in patch four.
> > 
> > Following patches will use this infrastructure to support set updates
> > from the packet classification path for dynamic sets and dynamic
> > flow state maintenance.
> > 
> > Please apply, thanks!
> 
> Series applied, thanks Patrick.

Thanks, I'll send the next batch soon.

> BTW, what's your plan with the rbtree and timeouts?

No specific plans so far. It would be fairly easy to add them. However,
we'll always choose nft_hash unless intervals are used anyway, and in
that case it doesn't make much sense to have timeouts as long as we
have no real knowledge of intervals within the kernel.

