* [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates
@ 2015-01-30  7:46 Patrick McHardy
  2015-01-30  7:46 ` [PATCH 1/9] rhashtable: simplify rhashtable_remove() Patrick McHardy
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Hi Herbert,

The following is the nftables patchset for dynamic set updates and timeouts.
Probably the only interesting part is in patch 7/9, which adds GC to
nft_hash. Nothing special, just walking the hash and zapping entries.

Please keep in mind that this is work in progress; some known
bugs and problems remain.

Cheers,
Patrick


* [PATCH 1/9] rhashtable: simplify rhashtable_remove()
  2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
@ 2015-01-30  7:46 ` Patrick McHardy
  2015-01-30 16:36   ` Thomas Graf
  2015-01-30  7:46 ` [PATCH 2/9] nftables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets Patrick McHardy
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Remove some duplicated code by moving the restart label up a few
lines. Also use rcu_access_pointer() for the pointer comparison
instead of rht_dereference_rcu().

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 lib/rhashtable.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
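
The control flow described above can be illustrated with a small userspace sketch. It is not the rhashtable code: struct table, struct hashtable, bucket_of() and the "locked" counter are hypothetical stand-ins, and the toy lock provides no real mutual exclusion. The point is only that with the restart label placed before the hash computation, the retry path needs nothing beyond switching tbl and jumping back. The final comparison uses only the pointer value, which is the kind of access rcu_access_pointer() is meant for.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the rhashtable API. */
struct node {
	struct node *next;
	unsigned int key;
};

struct table {
	size_t size;              /* number of buckets */
	struct node **buckets;
	int locked;               /* toy lock counter, for illustration only */
};

struct hashtable {
	struct table *tbl;        /* current table */
	struct table *future_tbl; /* differs from tbl while a resize is running */
};

static size_t bucket_of(const struct table *tbl, unsigned int key)
{
	return key % tbl->size;
}

static bool table_remove(struct hashtable *ht, struct node *obj)
{
	struct table *tbl = ht->tbl;
	struct node **pprev;
	size_t hash;

restart:
	/* Everything that depends on tbl is recomputed after a restart. */
	hash = bucket_of(tbl, obj->key);
	tbl->locked++;

	for (pprev = &tbl->buckets[hash]; *pprev != NULL; pprev = &(*pprev)->next) {
		if (*pprev != obj)
			continue;
		*pprev = obj->next;   /* unlink the object from its chain */
		obj->next = NULL;
		tbl->locked--;
		return true;
	}

	tbl->locked--;
	if (tbl != ht->future_tbl) {
		/* Only the pointer value was compared; nothing was dereferenced. */
		tbl = ht->future_tbl;
		goto restart;
	}
	return false;
}
```

Calling table_remove() on an object that lives only in the future table goes through the restart path exactly once before the object is found.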

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index aca6998..5f079f7 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -588,12 +588,12 @@ bool rhashtable_remove(struct rhashtable *ht, struct rhash_head *obj)
 
 	rcu_read_lock();
 	tbl = rht_dereference_rcu(ht->tbl, ht);
+restart:
 	hash = head_hashfn(ht, tbl, obj);
 
 	lock = bucket_lock(tbl, hash);
 	spin_lock_bh(lock);
 
-restart:
 	pprev = &tbl->buckets[hash];
 	rht_for_each(he, tbl, hash) {
 		if (he != obj) {
@@ -613,14 +613,10 @@ restart:
 		return true;
 	}
 
-	if (tbl != rht_dereference_rcu(ht->future_tbl, ht)) {
+	if (tbl != rcu_access_pointer(ht->future_tbl)) {
 		spin_unlock_bh(lock);
 
 		tbl = rht_dereference_rcu(ht->future_tbl, ht);
-		hash = head_hashfn(ht, tbl, obj);
-
-		lock = bucket_lock(tbl, hash);
-		spin_lock_bh(lock);
 		goto restart;
 	}
 
-- 
2.1.0


* [PATCH 2/9] nftables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets
  2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
  2015-01-30  7:46 ` [PATCH 1/9] rhashtable: simplify rhashtable_remove() Patrick McHardy
@ 2015-01-30  7:46 ` Patrick McHardy
  2015-01-30 17:31   ` Pablo Neira Ayuso
  2015-01-30  7:46 ` [PATCH 3/9] nftables: nft_rbtree: fix locking Patrick McHardy
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/nf_tables_api.c | 3 +++
 1 file changed, 3 insertions(+)
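
As a rough illustration of the check this patch adds, the following standalone sketch applies the same two tests to plain integers. The SKETCH_* constants and check_elem_flags() are made up for the example; the real flag values come from the uapi header and are not assumed here.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative values only; the real constants live in the uapi header. */
#define SKETCH_SET_INTERVAL          0x4u
#define SKETCH_SET_ELEM_INTERVAL_END 0x1u

static int check_elem_flags(uint32_t set_flags, uint32_t elem_flags)
{
	/* Unknown element flags are rejected outright. */
	if (elem_flags & ~SKETCH_SET_ELEM_INTERVAL_END)
		return -EINVAL;
	/* An interval end marker only makes sense in an interval set. */
	if (!(set_flags & SKETCH_SET_INTERVAL) &&
	    (elem_flags & SKETCH_SET_ELEM_INTERVAL_END))
		return -EINVAL;
	return 0;
}
```

The extra parentheses around the second operand are not required by C precedence rules (& binds tighter than &&); they are only there to make the intent obvious.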

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 129a8da..92ba4a0 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3112,6 +3112,9 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		elem.flags = ntohl(nla_get_be32(nla[NFTA_SET_ELEM_FLAGS]));
 		if (elem.flags & ~NFT_SET_ELEM_INTERVAL_END)
 			return -EINVAL;
+		if (!(set->flags & NFT_SET_INTERVAL) &&
+		    elem.flags & NFT_SET_ELEM_INTERVAL_END)
+			return -EINVAL;
 	}
 
 	if (set->flags & NFT_SET_MAP) {
-- 
2.1.0


* [PATCH 3/9] nftables: nft_rbtree: fix locking
  2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
  2015-01-30  7:46 ` [PATCH 1/9] rhashtable: simplify rhashtable_remove() Patrick McHardy
  2015-01-30  7:46 ` [PATCH 2/9] nftables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets Patrick McHardy
@ 2015-01-30  7:46 ` Patrick McHardy
  2015-01-30 10:52   ` Pablo Neira Ayuso
  2015-01-30  7:46 ` [PATCH 4/9] netfilter: nf_tables: add set extensions Patrick McHardy
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Fix a race condition and unnecessary locking:

* the root rb_node must only be accessed under the lock in nft_rbtree_lookup()
* the lock is not needed in lookup functions in netlink contexts

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/nft_rbtree.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)
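
The first point can be illustrated with a userspace sketch, assuming a plain unbalanced binary tree and a toy spinlock built on C11 atomic_flag in place of nft_rbtree_lock. All names are hypothetical. If the root pointer were sampled in the declaration, before the lock is taken, a concurrent insert that changes the root could leave the reader starting from a stale node; reading it after locking avoids that.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct tnode {
	struct tnode *left, *right;
	int key;
};

struct tree {
	struct tnode *root;
	atomic_flag lock;   /* toy spinlock standing in for nft_rbtree_lock */
};

static void tree_lock(struct tree *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void tree_unlock(struct tree *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static bool tree_lookup(struct tree *t, int key)
{
	const struct tnode *n;
	bool found = false;

	tree_lock(t);
	n = t->root;   /* read the root only once the lock is held */
	while (n != NULL) {
		if (key < n->key)
			n = n->left;
		else if (key > n->key)
			n = n->right;
		else {
			found = true;
			break;
		}
	}
	tree_unlock(t);
	return found;
}
```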

diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c
index 46214f2..417796f 100644
--- a/net/netfilter/nft_rbtree.c
+++ b/net/netfilter/nft_rbtree.c
@@ -37,10 +37,11 @@ static bool nft_rbtree_lookup(const struct nft_set *set,
 {
 	const struct nft_rbtree *priv = nft_set_priv(set);
 	const struct nft_rbtree_elem *rbe, *interval = NULL;
-	const struct rb_node *parent = priv->root.rb_node;
+	const struct rb_node *parent;
 	int d;
 
 	spin_lock_bh(&nft_rbtree_lock);
+	parent = priv->root.rb_node;
 	while (parent != NULL) {
 		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
 
@@ -158,7 +159,6 @@ static int nft_rbtree_get(const struct nft_set *set, struct nft_set_elem *elem)
 	struct nft_rbtree_elem *rbe;
 	int d;
 
-	spin_lock_bh(&nft_rbtree_lock);
 	while (parent != NULL) {
 		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
 
@@ -173,11 +173,9 @@ static int nft_rbtree_get(const struct nft_set *set, struct nft_set_elem *elem)
 			    !(rbe->flags & NFT_SET_ELEM_INTERVAL_END))
 				nft_data_copy(&elem->data, rbe->data);
 			elem->flags = rbe->flags;
-			spin_unlock_bh(&nft_rbtree_lock);
 			return 0;
 		}
 	}
-	spin_unlock_bh(&nft_rbtree_lock);
 	return -ENOENT;
 }
 
@@ -190,7 +188,6 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 	struct nft_set_elem elem;
 	struct rb_node *node;
 
-	spin_lock_bh(&nft_rbtree_lock);
 	for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
 		if (iter->count < iter->skip)
 			goto cont;
@@ -203,14 +200,11 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 		elem.flags = rbe->flags;
 
 		iter->err = iter->fn(ctx, set, iter, &elem);
-		if (iter->err < 0) {
-			spin_unlock_bh(&nft_rbtree_lock);
+		if (iter->err < 0)
 			return;
-		}
 cont:
 		iter->count++;
 	}
-	spin_unlock_bh(&nft_rbtree_lock);
 }
 
 static unsigned int nft_rbtree_privsize(const struct nlattr * const nla[])
-- 
2.1.0


* [PATCH 4/9] netfilter: nf_tables: add set extensions
  2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
                   ` (2 preceding siblings ...)
  2015-01-30  7:46 ` [PATCH 3/9] nftables: nft_rbtree: fix locking Patrick McHardy
@ 2015-01-30  7:46 ` Patrick McHardy
  2015-01-30  7:46 ` [PATCH 5/9] netfilter: nf_tables: convert hash and rbtree to " Patrick McHardy
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Add a simple set extension infrastructure for maintaining variable-sized,
optional per-element data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_tables.h | 109 ++++++++++++++++++++++++++++++++++++++
 net/netfilter/nf_tables_api.c     |  16 ++++++
 2 files changed, 125 insertions(+)
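
To show how the template and offset table fit together, here is a userspace sketch of the same scheme. The names (ext_types, tmpl_add(), ext_alloc() and so on) and the 16-byte/8-byte-aligned key and data sizes are assumptions for the example, not the kernel definitions. An offset of 0 can serve as "extension absent" because the offset table itself occupies the first bytes of the area, so no extension can ever start at 0.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

enum ext_id { EXT_KEY, EXT_DATA, EXT_FLAGS, EXT_NUM };

struct ext_type { uint8_t len; uint8_t align; };

/* Hypothetical sizes: 16-byte key/data blobs aligned to 8, 1-byte flags. */
static const struct ext_type ext_types[EXT_NUM] = {
	[EXT_KEY]   = { 16, 8 },
	[EXT_DATA]  = { 16, 8 },
	[EXT_FLAGS] = { 1, 1 },
};

struct ext_tmpl { uint8_t offset[EXT_NUM]; uint8_t len; };
struct ext      { uint8_t offset[EXT_NUM]; unsigned char data[]; };

static void tmpl_prepare(struct ext_tmpl *t)
{
	memset(t, 0, sizeof(*t));
	t->len = sizeof(struct ext);   /* the offset table comes first */
}

static void tmpl_add(struct ext_tmpl *t, enum ext_id id)
{
	/* Pad up to the extension's alignment, record where it starts, grow. */
	t->len = (uint8_t)ALIGN_UP((size_t)t->len, ext_types[id].align);
	t->offset[id] = t->len;
	t->len += ext_types[id].len;
}

static struct ext *ext_alloc(const struct ext_tmpl *t)
{
	struct ext *e = calloc(1, t->len);

	if (e != NULL)
		memcpy(e->offset, t->offset, sizeof(e->offset));
	return e;
}

static int ext_exists(const struct ext *e, enum ext_id id)
{
	return e->offset[id] != 0;
}

static void *ext_get(struct ext *e, enum ext_id id)
{
	return (unsigned char *)e + e->offset[id];
}
```

With these example sizes, adding EXT_FLAGS and then EXT_KEY to a fresh template places the flags right behind the 3-byte offset table and pads the key up to offset 8. Since the offsets and the total length are 8-bit values, the whole extension area has to stay below 256 bytes in this scheme.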

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 3ae969e..9f2d073 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -120,6 +120,115 @@ int nft_validate_data_load(const struct nft_ctx *ctx, enum nft_registers reg,
 			   enum nft_data_types type);
 
 /**
+ *	enum nft_set_extensions - set extension type IDs
+ *
+ *	@NFT_SET_EXT_KEY: element key
+ *	@NFT_SET_EXT_DATA: mapping data
+ *	@NFT_SET_EXT_FLAGS: element flags
+ *	@NFT_SET_EXT_NUM: number of extension types
+ */
+enum nft_set_extensions {
+	NFT_SET_EXT_KEY,
+	NFT_SET_EXT_DATA,
+	NFT_SET_EXT_FLAGS,
+	NFT_SET_EXT_NUM
+};
+
+/**
+ *	struct nft_set_ext_type - set extension types
+ *
+ * 	@len: fixed part length of the extension
+ * 	@align: alignment requirements of the extension
+ */
+struct nft_set_ext_type {
+	u8	len;
+	u8	align;
+};
+
+extern const struct nft_set_ext_type nft_set_ext_types[];
+
+static inline u8 nft_set_ext_size(u8 id)
+{
+	return nft_set_ext_types[id].len;
+}
+
+static inline u8 nft_set_ext_offset(u8 size, u8 id)
+{
+	return ALIGN(size, nft_set_ext_types[id].align);
+}
+
+/**
+ *	struct nft_set_ext_tmpl - set extension template
+ *
+ *	@offset: offsets of individual extension types
+ *	@len: length of extension area
+ */
+struct nft_set_ext_tmpl {
+	u8	offset[NFT_SET_EXT_NUM];
+	u8	len;
+};
+
+/**
+ *	struct nft_set_ext - set extensions
+ *
+ *	@offset: offsets of individual extension types
+ *	@data: beginning of extension data
+ */
+struct nft_set_ext {
+	u8	offset[NFT_SET_EXT_NUM];
+	char	data[0];
+};
+
+static inline void nft_set_ext_prepare(struct nft_set_ext_tmpl *tmpl)
+{
+	memset(tmpl, 0, sizeof(*tmpl));
+	tmpl->len = sizeof(struct nft_set_ext);
+}
+
+static inline void nft_set_ext_add(struct nft_set_ext_tmpl *tmpl, u8 id)
+{
+	tmpl->len	 = ALIGN(tmpl->len, nft_set_ext_types[id].align);
+	tmpl->offset[id] = tmpl->len;
+	tmpl->len	+= nft_set_ext_types[id].len;
+}
+
+static inline void nft_set_ext_init(struct nft_set_ext *ext,
+				    const struct nft_set_ext_tmpl *tmpl)
+{
+	memcpy(ext->offset, tmpl->offset, sizeof(ext->offset));
+}
+
+static inline bool __nft_set_ext_exists(const struct nft_set_ext *ext, u8 id)
+{
+	return !!ext->offset[id];
+}
+
+static inline bool nft_set_ext_exists(const struct nft_set_ext *ext, u8 id)
+{
+	return ext && __nft_set_ext_exists(ext, id);
+}
+
+static inline void *nft_set_ext(const struct nft_set_ext *ext, u8 id)
+{
+	return (void *)ext + ext->offset[id];
+}
+
+static inline struct nft_data *nft_set_ext_key(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_KEY);
+}
+
+static inline struct nft_data *nft_set_ext_data(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_DATA);
+}
+
+static inline u8 *nft_set_ext_flags(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_FLAGS);
+}
+
+/**
  *	struct nft_set_elem - generic representation of set elements
  *
  *	@cookie: implementation specific element cookie
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 92ba4a0..fbc73a0 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2799,6 +2799,22 @@ void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
 		nf_tables_set_destroy(ctx, set);
 }
 
+const struct nft_set_ext_type nft_set_ext_types[] = {
+	[NFT_SET_EXT_KEY]		= {
+		.len	= sizeof(struct nft_data),
+		.align	= __alignof__(struct nft_data),
+	},
+	[NFT_SET_EXT_DATA]		= {
+		.len	= sizeof(struct nft_data),
+		.align	= __alignof__(struct nft_data),
+	},
+	[NFT_SET_EXT_FLAGS]		= {
+		.len	= sizeof(u8),
+		.align	= __alignof__(u8),
+	},
+};
+EXPORT_SYMBOL_GPL(nft_set_ext_types);
+
 /*
  * Set elements
  */
-- 
2.1.0


* [PATCH 5/9] netfilter: nf_tables: convert hash and rbtree to set extensions
  2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
                   ` (3 preceding siblings ...)
  2015-01-30  7:46 ` [PATCH 4/9] netfilter: nf_tables: add set extensions Patrick McHardy
@ 2015-01-30  7:46 ` Patrick McHardy
  2015-01-30  7:46 ` [PATCH 6/9] netfilter: nf_tables: add set timeout support Patrick McHardy
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_tables.h | 15 ++++---
 net/netfilter/nf_tables_api.c     | 83 ++++++++++++++++++++++++++++-----------
 net/netfilter/nft_hash.c          | 52 +++++++-----------------
 net/netfilter/nft_lookup.c        |  6 ++-
 net/netfilter/nft_rbtree.c        | 68 +++++++++++---------------------
 5 files changed, 112 insertions(+), 112 deletions(-)
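
The element layout used by this conversion, a backend-private header followed directly by the extension area, can be sketched in userspace as follows. struct hash_elem, struct set_ops, elem_alloc() and elem_ext() are hypothetical stand-ins for the kernel structures and for nft_set_elem_ext(); the sketch only shows how a generic caller can allocate and address the extension area knowing nothing about the backend except its elemsize.

```c
#include <stddef.h>
#include <stdlib.h>

struct ext_hdr { unsigned char offset[4]; };

struct hash_elem {
	void *node;            /* stand-in for the backend's linkage */
	struct ext_hdr ext;    /* the extension area starts here */
};

struct set_ops { size_t elemsize; };

/* The backend only advertises where its private part ends. */
static const struct set_ops hash_ops = {
	.elemsize = offsetof(struct hash_elem, ext),
};

/* Generic code allocates private part and extension area in one block. */
static void *elem_alloc(const struct set_ops *ops, size_t ext_len)
{
	return calloc(1, ops->elemsize + ext_len);
}

/* ... and finds the extension area without knowing the backend type. */
static struct ext_hdr *elem_ext(const struct set_ops *ops, void *priv)
{
	return (struct ext_hdr *)((char *)priv + ops->elemsize);
}
```

In this sketch the backend can cast the same pointer to struct hash_elem and reach the extension area through its ext member, while generic code reaches it through elem_ext(); both arrive at the same address.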

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 9f2d073..60846da 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -233,8 +233,7 @@ static inline u8 *nft_set_ext_flags(const struct nft_set_ext *ext)
  *
  *	@cookie: implementation specific element cookie
  *	@key: element key
- *	@data: element data (maps only)
- *	@flags: element flags (end of interval)
+ *	@priv: element private data and extensions
  *
  *	The cookie can be used to store a handle to the element for subsequent
  *	removal.
@@ -242,8 +241,7 @@ static inline u8 *nft_set_ext_flags(const struct nft_set_ext *ext)
 struct nft_set_elem {
 	void			*cookie;
 	struct nft_data		key;
-	struct nft_data		data;
-	u32			flags;
+	void			*priv;
 };
 
 struct nft_set;
@@ -307,12 +305,13 @@ struct nft_set_estimate {
  *	@destroy: destroy private data of set instance
  *	@list: nf_tables_set_ops list node
  *	@owner: module reference
+ *	@elemsize: element private size
  *	@features: features supported by the implementation
  */
 struct nft_set_ops {
 	bool				(*lookup)(const struct nft_set *set,
 						  const struct nft_data *key,
-						  struct nft_data *data);
+						  const struct nft_set_ext **ext);
 	int				(*get)(const struct nft_set *set,
 					       struct nft_set_elem *elem);
 	int				(*insert)(const struct nft_set *set,
@@ -334,6 +333,7 @@ struct nft_set_ops {
 
 	struct list_head		list;
 	struct module			*owner;
+	unsigned int			elemsize;
 	u32				features;
 };
 
@@ -380,6 +380,11 @@ static inline void *nft_set_priv(const struct nft_set *set)
 	return (void *)set->data;
 }
 
+static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set, void *priv)
+{
+	return priv + set->ops->elemsize;
+}
+
 struct nft_set *nf_tables_set_lookup(const struct nft_table *table,
 				     const struct nlattr *nla);
 struct nft_set *nf_tables_set_lookup_byid(const struct net *net,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index fbc73a0..cad0184 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2743,10 +2743,11 @@ static int nf_tables_bind_check_setelem(const struct nft_ctx *ctx,
 					const struct nft_set_iter *iter,
 					const struct nft_set_elem *elem)
 {
+	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
 	enum nft_registers dreg;
 
 	dreg = nft_type_to_reg(set->dtype);
-	return nft_validate_data_load(ctx, dreg, &elem->data,
+	return nft_validate_data_load(ctx, dreg, nft_set_ext_data(ext),
 				      set->dtype == NFT_DATA_VERDICT ?
 				      NFT_DATA_VERDICT : NFT_DATA_VALUE);
 }
@@ -2861,6 +2862,7 @@ static int nf_tables_fill_setelem(struct sk_buff *skb,
 				  const struct nft_set *set,
 				  const struct nft_set_elem *elem)
 {
+	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
 	unsigned char *b = skb_tail_pointer(skb);
 	struct nlattr *nest;
 
@@ -2868,20 +2870,20 @@ static int nf_tables_fill_setelem(struct sk_buff *skb,
 	if (nest == NULL)
 		goto nla_put_failure;
 
-	if (nft_data_dump(skb, NFTA_SET_ELEM_KEY, &elem->key, NFT_DATA_VALUE,
-			  set->klen) < 0)
+	if (nft_data_dump(skb, NFTA_SET_ELEM_KEY, nft_set_ext_key(ext),
+			  NFT_DATA_VALUE, set->klen) < 0)
 		goto nla_put_failure;
 
-	if (set->flags & NFT_SET_MAP &&
-	    !(elem->flags & NFT_SET_ELEM_INTERVAL_END) &&
-	    nft_data_dump(skb, NFTA_SET_ELEM_DATA, &elem->data,
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
+	    nft_data_dump(skb, NFTA_SET_ELEM_DATA, nft_set_ext_data(ext),
 			  set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE,
 			  set->dlen) < 0)
 		goto nla_put_failure;
 
-	if (elem->flags != 0)
-		if (nla_put_be32(skb, NFTA_SET_ELEM_FLAGS, htonl(elem->flags)))
-			goto nla_put_failure;
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_FLAGS) &&
+	    nla_put_be32(skb, NFTA_SET_ELEM_FLAGS,
+		         htonl(*nft_set_ext_flags(ext))))
+		goto nla_put_failure;
 
 	nla_nest_end(skb, nest);
 	return 0;
@@ -3106,10 +3108,14 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 {
 	struct nlattr *nla[NFTA_SET_ELEM_MAX + 1];
 	struct nft_data_desc d1, d2;
+	struct nft_set_ext_tmpl tmpl;
+	struct nft_set_ext *ext;
 	struct nft_set_elem elem;
 	struct nft_set_binding *binding;
+	struct nft_data data;
 	enum nft_registers dreg;
 	struct nft_trans *trans;
+	u32 flags;
 	int err;
 
 	if (set->size && set->nelems == set->size)
@@ -3123,22 +3129,26 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	if (nla[NFTA_SET_ELEM_KEY] == NULL)
 		return -EINVAL;
 
-	elem.flags = 0;
+	nft_set_ext_prepare(&tmpl);
+
+	flags = 0;
 	if (nla[NFTA_SET_ELEM_FLAGS] != NULL) {
-		elem.flags = ntohl(nla_get_be32(nla[NFTA_SET_ELEM_FLAGS]));
-		if (elem.flags & ~NFT_SET_ELEM_INTERVAL_END)
+		flags = ntohl(nla_get_be32(nla[NFTA_SET_ELEM_FLAGS]));
+		if (flags & ~NFT_SET_ELEM_INTERVAL_END)
 			return -EINVAL;
 		if (!(set->flags & NFT_SET_INTERVAL) &&
-		    elem.flags & NFT_SET_ELEM_INTERVAL_END)
+		    flags & NFT_SET_ELEM_INTERVAL_END)
 			return -EINVAL;
+		if (flags != 0)
+			nft_set_ext_add(&tmpl, NFT_SET_EXT_FLAGS);
 	}
 
 	if (set->flags & NFT_SET_MAP) {
 		if (nla[NFTA_SET_ELEM_DATA] == NULL &&
-		    !(elem.flags & NFT_SET_ELEM_INTERVAL_END))
+		    !(flags & NFT_SET_ELEM_INTERVAL_END))
 			return -EINVAL;
 		if (nla[NFTA_SET_ELEM_DATA] != NULL &&
-		    elem.flags & NFT_SET_ELEM_INTERVAL_END)
+		    flags & NFT_SET_ELEM_INTERVAL_END)
 			return -EINVAL;
 	} else {
 		if (nla[NFTA_SET_ELEM_DATA] != NULL)
@@ -3156,8 +3166,10 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	if (set->ops->get(set, &elem) == 0)
 		goto err2;
 
+	nft_set_ext_add(&tmpl, NFT_SET_EXT_KEY);
+
 	if (nla[NFTA_SET_ELEM_DATA] != NULL) {
-		err = nft_data_init(ctx, &elem.data, &d2, nla[NFTA_SET_ELEM_DATA]);
+		err = nft_data_init(ctx, &data, &d2, nla[NFTA_SET_ELEM_DATA]);
 		if (err < 0)
 			goto err2;
 
@@ -3174,29 +3186,46 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			};
 
 			err = nft_validate_data_load(&bind_ctx, dreg,
-						     &elem.data, d2.type);
+						     &data, d2.type);
 			if (err < 0)
 				goto err3;
 		}
+
+		nft_set_ext_add(&tmpl, NFT_SET_EXT_DATA);
 	}
 
+	err = -ENOMEM;
+	elem.priv = kzalloc(set->ops->elemsize + tmpl.len, GFP_KERNEL);
+	if (elem.priv == NULL)
+		goto err3;
+	ext = elem.priv + set->ops->elemsize;
+
+	nft_set_ext_init(ext, &tmpl);
+	nft_data_copy(nft_set_ext_key(ext), &elem.key);
+	if (flags != 0)
+		*nft_set_ext_flags(ext) = flags;
+	if (nla[NFTA_SET_ELEM_DATA] != NULL)
+		nft_data_copy(nft_set_ext_data(ext), &data);
+
 	trans = nft_trans_elem_alloc(ctx, NFT_MSG_NEWSETELEM, set);
 	if (trans == NULL)
-		goto err3;
+		goto err4;
 
 	err = set->ops->insert(set, &elem);
 	if (err < 0)
-		goto err4;
+		goto err5;
 
 	nft_trans_elem(trans) = elem;
 	list_add_tail(&trans->list, &ctx->net->nft.commit_list);
 	return 0;
 
-err4:
+err5:
 	kfree(trans);
+err4:
+	kfree(elem.priv);
 err3:
 	if (nla[NFTA_SET_ELEM_DATA] != NULL)
-		nft_data_uninit(&elem.data, d2.type);
+		nft_data_uninit(&data, d2.type);
 err2:
 	nft_data_uninit(&elem.key, d1.type);
 err1:
@@ -3617,10 +3646,12 @@ static int nf_tables_commit(struct sk_buff *skb)
 			te->set->ops->get(te->set, &te->elem);
 			te->set->ops->remove(te->set, &te->elem);
 			nft_data_uninit(&te->elem.key, NFT_DATA_VALUE);
+#if 0
 			if (te->elem.flags & NFT_SET_MAP) {
 				nft_data_uninit(&te->elem.data,
 						te->set->dtype);
 			}
+#endif
 			nft_trans_destroy(trans);
 			break;
 		}
@@ -3785,13 +3816,17 @@ static int nf_tables_loop_check_setelem(const struct nft_ctx *ctx,
 					const struct nft_set_iter *iter,
 					const struct nft_set_elem *elem)
 {
-	if (elem->flags & NFT_SET_ELEM_INTERVAL_END)
+	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
+	const struct nft_data *data;
+
+	if (*nft_set_ext_flags(ext) & NFT_SET_ELEM_INTERVAL_END)
 		return 0;
 
-	switch (elem->data.verdict) {
+	data = nft_set_ext_data(ext);
+	switch (data->verdict) {
 	case NFT_JUMP:
 	case NFT_GOTO:
-		return nf_tables_check_loops(ctx, elem->data.chain);
+		return nf_tables_check_loops(ctx, data->chain);
 	default:
 		return 0;
 	}
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 75887d7..cba0ad2 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -25,20 +25,19 @@
 
 struct nft_hash_elem {
 	struct rhash_head		node;
-	struct nft_data			key;
-	struct nft_data			data[];
+	struct nft_set_ext		ext;
 };
 
 static bool nft_hash_lookup(const struct nft_set *set,
 			    const struct nft_data *key,
-			    struct nft_data *data)
+			    const struct nft_set_ext **ext)
 {
 	struct rhashtable *priv = nft_set_priv(set);
 	const struct nft_hash_elem *he;
 
 	he = rhashtable_lookup(priv, key);
-	if (he && set->flags & NFT_SET_MAP)
-		nft_data_copy(data, he->data);
+	if (he != NULL)
+		*ext = &he->ext;
 
 	return !!he;
 }
@@ -47,35 +46,18 @@ static int nft_hash_insert(const struct nft_set *set,
 			   const struct nft_set_elem *elem)
 {
 	struct rhashtable *priv = nft_set_priv(set);
-	struct nft_hash_elem *he;
-	unsigned int size;
-
-	if (elem->flags != 0)
-		return -EINVAL;
-
-	size = sizeof(*he);
-	if (set->flags & NFT_SET_MAP)
-		size += sizeof(he->data[0]);
-
-	he = kzalloc(size, GFP_KERNEL);
-	if (he == NULL)
-		return -ENOMEM;
-
-	nft_data_copy(&he->key, &elem->key);
-	if (set->flags & NFT_SET_MAP)
-		nft_data_copy(he->data, &elem->data);
+	struct nft_hash_elem *he = elem->priv;
 
 	rhashtable_insert(priv, &he->node);
-
 	return 0;
 }
 
 static void nft_hash_elem_destroy(const struct nft_set *set,
 				  struct nft_hash_elem *he)
 {
-	nft_data_uninit(&he->key, NFT_DATA_VALUE);
+	nft_data_uninit(nft_set_ext_key(&he->ext), NFT_DATA_VALUE);
 	if (set->flags & NFT_SET_MAP)
-		nft_data_uninit(he->data, set->dtype);
+		nft_data_uninit(nft_set_ext_data(&he->ext), set->dtype);
 	kfree(he);
 }
 
@@ -99,12 +81,10 @@ static bool nft_hash_compare(void *ptr, void *arg)
 	struct nft_hash_elem *he = ptr;
 	struct nft_compare_arg *x = arg;
 
-	if (!nft_data_cmp(&he->key, &x->elem->key, x->set->klen)) {
+	if (!nft_data_cmp(nft_set_ext_key(&he->ext), &x->elem->key,
+			  x->set->klen)) {
 		x->elem->cookie = he;
-		x->elem->flags = 0;
-		if (x->set->flags & NFT_SET_MAP)
-			nft_data_copy(&x->elem->data, he->data);
-
+		x->elem->priv  = he;
 		return true;
 	}
 
@@ -131,7 +111,7 @@ static void nft_hash_walk(const struct nft_ctx *ctx, const struct nft_set *set,
 {
 	struct rhashtable *priv = nft_set_priv(set);
 	const struct bucket_table *tbl;
-	const struct nft_hash_elem *he;
+	struct nft_hash_elem *he;
 	struct nft_set_elem elem;
 	unsigned int i;
 
@@ -143,10 +123,7 @@ static void nft_hash_walk(const struct nft_ctx *ctx, const struct nft_set *set,
 			if (iter->count < iter->skip)
 				goto cont;
 
-			memcpy(&elem.key, &he->key, sizeof(elem.key));
-			if (set->flags & NFT_SET_MAP)
-				memcpy(&elem.data, he->data, sizeof(elem.data));
-			elem.flags = 0;
+			elem.priv = he;
 
 			iter->err = iter->fn(ctx, set, iter, &elem);
 			if (iter->err < 0)
@@ -170,7 +147,7 @@ static int nft_hash_init(const struct nft_set *set,
 	struct rhashtable_params params = {
 		.nelem_hint = desc->size ? : NFT_HASH_ELEMENT_HINT,
 		.head_offset = offsetof(struct nft_hash_elem, node),
-		.key_offset = offsetof(struct nft_hash_elem, key),
+		.key_offset = offsetof(struct nft_hash_elem, ext) + 8,
 		.key_len = set->klen,
 		.hashfn = jhash,
 		.grow_decision = rht_grow_above_75,
@@ -209,7 +186,7 @@ static bool nft_hash_estimate(const struct nft_set_desc *desc, u32 features,
 
 	esize = sizeof(struct nft_hash_elem);
 	if (features & NFT_SET_MAP)
-		esize += FIELD_SIZEOF(struct nft_hash_elem, data[0]);
+		esize += sizeof(struct nft_data);
 
 	if (desc->size) {
 		est->size = sizeof(struct rhashtable) +
@@ -232,6 +209,7 @@ static bool nft_hash_estimate(const struct nft_set_desc *desc, u32 features,
 
 static struct nft_set_ops nft_hash_ops __read_mostly = {
 	.privsize       = nft_hash_privsize,
+	.elemsize	= offsetof(struct nft_hash_elem, ext),
 	.estimate	= nft_hash_estimate,
 	.init		= nft_hash_init,
 	.destroy	= nft_hash_destroy,
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 6404a72..cdbf050 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -31,9 +31,13 @@ static void nft_lookup_eval(const struct nft_expr *expr,
 {
 	const struct nft_lookup *priv = nft_expr_priv(expr);
 	const struct nft_set *set = priv->set;
+	const struct nft_set_ext *ext;
 
-	if (set->ops->lookup(set, &data[priv->sreg], &data[priv->dreg]))
+	if (set->ops->lookup(set, &data[priv->sreg], &ext)) {
+		if (set->flags & NFT_SET_MAP)
+			nft_data_copy(&data[priv->dreg], nft_set_ext_data(ext));
 		return;
+	}
 	data[NFT_REG_VERDICT].verdict = NFT_BREAK;
 }
 
diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c
index 417796f..e721744 100644
--- a/net/netfilter/nft_rbtree.c
+++ b/net/netfilter/nft_rbtree.c
@@ -26,14 +26,12 @@ struct nft_rbtree {
 
 struct nft_rbtree_elem {
 	struct rb_node		node;
-	u16			flags;
-	struct nft_data		key;
-	struct nft_data		data[];
+	struct nft_set_ext	ext;
 };
 
 static bool nft_rbtree_lookup(const struct nft_set *set,
 			      const struct nft_data *key,
-			      struct nft_data *data)
+			      const struct nft_set_ext **ext)
 {
 	const struct nft_rbtree *priv = nft_set_priv(set);
 	const struct nft_rbtree_elem *rbe, *interval = NULL;
@@ -45,7 +43,7 @@ static bool nft_rbtree_lookup(const struct nft_set *set,
 	while (parent != NULL) {
 		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
 
-		d = nft_data_cmp(&rbe->key, key, set->klen);
+		d = nft_data_cmp(nft_set_ext_key(&rbe->ext), key, set->klen);
 		if (d < 0) {
 			parent = parent->rb_left;
 			interval = rbe;
@@ -53,12 +51,12 @@ static bool nft_rbtree_lookup(const struct nft_set *set,
 			parent = parent->rb_right;
 		else {
 found:
-			if (rbe->flags & NFT_SET_ELEM_INTERVAL_END)
+			if (*nft_set_ext_flags(&rbe->ext) &
+			    NFT_SET_ELEM_INTERVAL_END)
 				goto out;
-			if (set->flags & NFT_SET_MAP)
-				nft_data_copy(data, rbe->data);
-
 			spin_unlock_bh(&nft_rbtree_lock);
+			// FIXME: valid?
+			*ext = &rbe->ext;
 			return true;
 		}
 	}
@@ -75,10 +73,10 @@ out:
 static void nft_rbtree_elem_destroy(const struct nft_set *set,
 				    struct nft_rbtree_elem *rbe)
 {
-	nft_data_uninit(&rbe->key, NFT_DATA_VALUE);
+	nft_data_uninit(nft_set_ext_key(&rbe->ext), NFT_DATA_VALUE);
 	if (set->flags & NFT_SET_MAP &&
-	    !(rbe->flags & NFT_SET_ELEM_INTERVAL_END))
-		nft_data_uninit(rbe->data, set->dtype);
+	    nft_set_ext_exists(&rbe->ext, NFT_SET_EXT_DATA))
+		nft_data_uninit(nft_set_ext_data(&rbe->ext), set->dtype);
 
 	kfree(rbe);
 }
@@ -96,7 +94,9 @@ static int __nft_rbtree_insert(const struct nft_set *set,
 	while (*p != NULL) {
 		parent = *p;
 		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
-		d = nft_data_cmp(&rbe->key, &new->key, set->klen);
+		d = nft_data_cmp(nft_set_ext_key(&rbe->ext),
+				 nft_set_ext_key(&new->ext),
+				 set->klen);
 		if (d < 0)
 			p = &parent->rb_left;
 		else if (d > 0)
@@ -112,31 +112,13 @@ static int __nft_rbtree_insert(const struct nft_set *set,
 static int nft_rbtree_insert(const struct nft_set *set,
 			     const struct nft_set_elem *elem)
 {
-	struct nft_rbtree_elem *rbe;
-	unsigned int size;
+	struct nft_rbtree_elem *rbe = elem->priv;
 	int err;
 
-	size = sizeof(*rbe);
-	if (set->flags & NFT_SET_MAP &&
-	    !(elem->flags & NFT_SET_ELEM_INTERVAL_END))
-		size += sizeof(rbe->data[0]);
-
-	rbe = kzalloc(size, GFP_KERNEL);
-	if (rbe == NULL)
-		return -ENOMEM;
-
-	rbe->flags = elem->flags;
-	nft_data_copy(&rbe->key, &elem->key);
-	if (set->flags & NFT_SET_MAP &&
-	    !(rbe->flags & NFT_SET_ELEM_INTERVAL_END))
-		nft_data_copy(rbe->data, &elem->data);
-
 	spin_lock_bh(&nft_rbtree_lock);
 	err = __nft_rbtree_insert(set, rbe);
-	if (err < 0)
-		kfree(rbe);
-
 	spin_unlock_bh(&nft_rbtree_lock);
+
 	return err;
 }
 
@@ -162,17 +144,16 @@ static int nft_rbtree_get(const struct nft_set *set, struct nft_set_elem *elem)
 	while (parent != NULL) {
 		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
 
-		d = nft_data_cmp(&rbe->key, &elem->key, set->klen);
+		d = nft_data_cmp(nft_set_ext_key(&rbe->ext), &elem->key,
+				 set->klen);
 		if (d < 0)
 			parent = parent->rb_left;
 		else if (d > 0)
 			parent = parent->rb_right;
 		else {
 			elem->cookie = rbe;
-			if (set->flags & NFT_SET_MAP &&
-			    !(rbe->flags & NFT_SET_ELEM_INTERVAL_END))
-				nft_data_copy(&elem->data, rbe->data);
-			elem->flags = rbe->flags;
+			elem->priv   = rbe;
+			spin_unlock_bh(&nft_rbtree_lock);
 			return 0;
 		}
 	}
@@ -184,7 +165,7 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 			    struct nft_set_iter *iter)
 {
 	const struct nft_rbtree *priv = nft_set_priv(set);
-	const struct nft_rbtree_elem *rbe;
+	struct nft_rbtree_elem *rbe;
 	struct nft_set_elem elem;
 	struct rb_node *node;
 
@@ -193,11 +174,7 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 			goto cont;
 
 		rbe = rb_entry(node, struct nft_rbtree_elem, node);
-		nft_data_copy(&elem.key, &rbe->key);
-		if (set->flags & NFT_SET_MAP &&
-		    !(rbe->flags & NFT_SET_ELEM_INTERVAL_END))
-			nft_data_copy(&elem.data, rbe->data);
-		elem.flags = rbe->flags;
+		elem.priv = rbe;
 
 		iter->err = iter->fn(ctx, set, iter, &elem);
 		if (iter->err < 0)
@@ -242,7 +219,7 @@ static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,
 
 	nsize = sizeof(struct nft_rbtree_elem);
 	if (features & NFT_SET_MAP)
-		nsize += FIELD_SIZEOF(struct nft_rbtree_elem, data[0]);
+		nsize += sizeof(struct nft_data);
 
 	if (desc->size)
 		est->size = sizeof(struct nft_rbtree) + desc->size * nsize;
@@ -256,6 +233,7 @@ static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,
 
 static struct nft_set_ops nft_rbtree_ops __read_mostly = {
 	.privsize	= nft_rbtree_privsize,
+	.elemsize	= offsetof(struct nft_rbtree_elem, ext),
 	.estimate	= nft_rbtree_estimate,
 	.init		= nft_rbtree_init,
 	.destroy	= nft_rbtree_destroy,
-- 
2.1.0


* [PATCH 6/9] netfilter: nf_tables: add set timeout support
  2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
                   ` (4 preceding siblings ...)
  2015-01-30  7:46 ` [PATCH 5/9] netfilter: nf_tables: convert hash and rbtree to " Patrick McHardy
@ 2015-01-30  7:46 ` Patrick McHardy
  2015-01-30  7:46 ` [PATCH 7/9] netfilter: nft_hash: add support for timeouts Patrick McHardy
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_tables.h        |  9 ++++++
 include/uapi/linux/netfilter/nf_tables.h |  6 ++++
 net/netfilter/nf_tables_api.c            | 51 ++++++++++++++++++++++++++++++--
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 60846da..735a59d 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -125,12 +125,14 @@ int nft_validate_data_load(const struct nft_ctx *ctx, enum nft_registers reg,
  *	@NFT_SET_EXT_KEY: element key
  *	@NFT_SET_EXT_DATA: mapping data
  *	@NFT_SET_EXT_FLAGS: element flags
+ *	@NFT_SET_EXT_TIMEOUT: element timeout
  *	@NFT_SET_EXT_NUM: number of extension types
  */
 enum nft_set_extensions {
 	NFT_SET_EXT_KEY,
 	NFT_SET_EXT_DATA,
 	NFT_SET_EXT_FLAGS,
+	NFT_SET_EXT_TIMEOUT,
 	NFT_SET_EXT_NUM
 };
 
@@ -228,6 +230,11 @@ static inline u8 *nft_set_ext_flags(const struct nft_set_ext *ext)
 	return nft_set_ext(ext, NFT_SET_EXT_FLAGS);
 }
 
+static inline unsigned long *nft_set_ext_timeout(const struct nft_set_ext *ext)
+{
+	return nft_set_ext(ext, NFT_SET_EXT_TIMEOUT);
+}
+
 /**
  *	struct nft_set_elem - generic representation of set elements
  *
@@ -350,6 +357,7 @@ void nft_unregister_set(struct nft_set_ops *ops);
  * 	@dtype: data type (verdict or numeric type defined by userspace)
  * 	@size: maximum set size
  * 	@nelems: number of elements
+ * 	@timeout: default timeout value
  *	@policy: set parameterization (see enum nft_set_policies)
  * 	@ops: set ops
  * 	@flags: set flags
@@ -365,6 +373,7 @@ struct nft_set {
 	u32				dtype;
 	u32				size;
 	u32				nelems;
+	u32				timeout;
 	u16				policy;
 	/* runtime data below here */
 	const struct nft_set_ops	*ops ____cacheline_aligned;
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 832bc46..144d8fe 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -207,12 +207,14 @@ enum nft_rule_compat_attributes {
  * @NFT_SET_CONSTANT: set contents may not change while bound
  * @NFT_SET_INTERVAL: set contains intervals
  * @NFT_SET_MAP: set is used as a dictionary
+ * @NFT_SET_TIMEOUT: set uses timeouts
  */
 enum nft_set_flags {
 	NFT_SET_ANONYMOUS		= 0x1,
 	NFT_SET_CONSTANT		= 0x2,
 	NFT_SET_INTERVAL		= 0x4,
 	NFT_SET_MAP			= 0x8,
+	NFT_SET_TIMEOUT			= 0x10,
 };
 
 /**
@@ -251,6 +253,7 @@ enum nft_set_desc_attributes {
  * @NFTA_SET_POLICY: selection policy (NLA_U32)
  * @NFTA_SET_DESC: set description (NLA_NESTED)
  * @NFTA_SET_ID: uniquely identifies a set in a transaction (NLA_U32)
+ * @NFTA_SET_TIMEOUT: timeout default value (NLA_U32)
  */
 enum nft_set_attributes {
 	NFTA_SET_UNSPEC,
@@ -264,6 +267,7 @@ enum nft_set_attributes {
 	NFTA_SET_POLICY,
 	NFTA_SET_DESC,
 	NFTA_SET_ID,
+	NFTA_SET_TIMEOUT,
 	__NFTA_SET_MAX
 };
 #define NFTA_SET_MAX		(__NFTA_SET_MAX - 1)
@@ -283,12 +287,14 @@ enum nft_set_elem_flags {
  * @NFTA_SET_ELEM_KEY: key value (NLA_NESTED: nft_data)
  * @NFTA_SET_ELEM_DATA: data value of mapping (NLA_NESTED: nft_data_attributes)
  * @NFTA_SET_ELEM_FLAGS: bitmask of nft_set_elem_flags (NLA_U32)
+ * @NFTA_SET_ELEM_TIMEOUT: timeout value (NLA_U32)
  */
 enum nft_set_elem_attributes {
 	NFTA_SET_ELEM_UNSPEC,
 	NFTA_SET_ELEM_KEY,
 	NFTA_SET_ELEM_DATA,
 	NFTA_SET_ELEM_FLAGS,
+	NFTA_SET_ELEM_TIMEOUT,
 	__NFTA_SET_ELEM_MAX
 };
 #define NFTA_SET_ELEM_MAX	(__NFTA_SET_ELEM_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index cad0184..95234a3 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2194,6 +2194,7 @@ static const struct nla_policy nft_set_policy[NFTA_SET_MAX + 1] = {
 	[NFTA_SET_POLICY]		= { .type = NLA_U32 },
 	[NFTA_SET_DESC]			= { .type = NLA_NESTED },
 	[NFTA_SET_ID]			= { .type = NLA_U32 },
+	[NFTA_SET_TIMEOUT]		= { .type = NLA_U32 },
 };
 
 static const struct nla_policy nft_set_desc_policy[NFTA_SET_DESC_MAX + 1] = {
@@ -2344,6 +2345,10 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
 			goto nla_put_failure;
 	}
 
+	if (set->timeout &&
+	    nla_put_be32(skb, NFTA_SET_TIMEOUT, htonl(set->timeout / HZ)))
+		goto nla_put_failure;
+
 	if (set->policy != NFT_SET_POL_PERFORMANCE) {
 		if (nla_put_be32(skb, NFTA_SET_POLICY, htonl(set->policy)))
 			goto nla_put_failure;
@@ -2557,6 +2562,7 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	bool create;
 	u32 ktype, dtype, flags, policy;
 	struct nft_set_desc desc;
+	unsigned int timeout;
 	int err;
 
 	if (nla[NFTA_SET_TABLE] == NULL ||
@@ -2582,7 +2588,8 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	if (nla[NFTA_SET_FLAGS] != NULL) {
 		flags = ntohl(nla_get_be32(nla[NFTA_SET_FLAGS]));
 		if (flags & ~(NFT_SET_ANONYMOUS | NFT_SET_CONSTANT |
-			      NFT_SET_INTERVAL | NFT_SET_MAP))
+			      NFT_SET_INTERVAL | NFT_SET_MAP |
+			      NFT_SET_TIMEOUT))
 			return -EINVAL;
 	}
 
@@ -2608,6 +2615,13 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	} else if (flags & NFT_SET_MAP)
 		return -EINVAL;
 
+	timeout = 0;
+	if (nla[NFTA_SET_TIMEOUT] != NULL) {
+		if (!(flags & NFT_SET_TIMEOUT))
+			return -EINVAL;
+		timeout = ntohl(nla_get_be32(nla[NFTA_SET_TIMEOUT])) * HZ;
+	}
+
 	policy = NFT_SET_POL_PERFORMANCE;
 	if (nla[NFTA_SET_POLICY] != NULL)
 		policy = ntohl(nla_get_be32(nla[NFTA_SET_POLICY]));
@@ -2675,6 +2689,7 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
 	set->flags = flags;
 	set->size  = desc.size;
 	set->policy = policy;
+	set->timeout = timeout;
 
 	err = ops->init(set, &desc, nla);
 	if (err < 0)
@@ -2813,6 +2828,10 @@ const struct nft_set_ext_type nft_set_ext_types[] = {
 		.len	= sizeof(u8),
 		.align	= __alignof__(u8),
 	},
+	[NFT_SET_EXT_TIMEOUT]		= {
+		.len	= sizeof(unsigned long),
+		.align	= __alignof__(unsigned long),
+	},
 };
 EXPORT_SYMBOL_GPL(nft_set_ext_types);
 
@@ -2824,6 +2843,7 @@ static const struct nla_policy nft_set_elem_policy[NFTA_SET_ELEM_MAX + 1] = {
 	[NFTA_SET_ELEM_KEY]		= { .type = NLA_NESTED },
 	[NFTA_SET_ELEM_DATA]		= { .type = NLA_NESTED },
 	[NFTA_SET_ELEM_FLAGS]		= { .type = NLA_U32 },
+	[NFTA_SET_ELEM_TIMEOUT]		= { .type = NLA_U32 },
 };
 
 static const struct nla_policy nft_set_elem_list_policy[NFTA_SET_ELEM_LIST_MAX + 1] = {
@@ -2885,6 +2905,20 @@ static int nf_tables_fill_setelem(struct sk_buff *skb,
 		         htonl(*nft_set_ext_flags(ext))))
 		goto nla_put_failure;
 
+	if (nft_set_ext_exists(ext, NFT_SET_EXT_TIMEOUT)) {
+		unsigned long timeout;
+
+		timeout = *nft_set_ext_timeout(ext);
+		if (time_after(timeout, jiffies))
+			timeout = (timeout - jiffies) / HZ;
+		else
+			timeout = 0;
+
+		if (timeout &&
+		    nla_put_be32(skb, NFTA_SET_ELEM_TIMEOUT, htonl(timeout)))
+			goto nla_put_failure;
+	}
+
 	nla_nest_end(skb, nest);
 	return 0;
 
@@ -3115,7 +3149,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	struct nft_data data;
 	enum nft_registers dreg;
 	struct nft_trans *trans;
-	u32 flags;
+	u32 flags, timeout;
 	int err;
 
 	if (set->size && set->nelems == set->size)
@@ -3155,6 +3189,15 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			return -EINVAL;
 	}
 
+	timeout = 0;
+	if (nla[NFTA_SET_ELEM_TIMEOUT] != NULL) {
+		if (!(set->flags & NFT_SET_TIMEOUT))
+			return -EINVAL;
+		timeout = ntohl(nla_get_be32(nla[NFTA_SET_ELEM_TIMEOUT])) * HZ;
+	} else if (set->flags & NFT_SET_TIMEOUT) {
+		timeout = set->timeout;
+	}
+
 	err = nft_data_init(ctx, &elem.key, &d1, nla[NFTA_SET_ELEM_KEY]);
 	if (err < 0)
 		goto err1;
@@ -3167,6 +3210,8 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		goto err2;
 
 	nft_set_ext_add(&tmpl, NFT_SET_EXT_KEY);
+	if (timeout > 0)
+		nft_set_ext_add(&tmpl, NFT_SET_EXT_TIMEOUT);
 
 	if (nla[NFTA_SET_ELEM_DATA] != NULL) {
 		err = nft_data_init(ctx, &data, &d2, nla[NFTA_SET_ELEM_DATA]);
@@ -3204,6 +3249,8 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	nft_data_copy(nft_set_ext_key(ext), &elem.key);
 	if (flags != 0)
 		*nft_set_ext_flags(ext) = flags;
+	if (timeout > 0)
+		*nft_set_ext_timeout(ext) = jiffies + timeout;
 	if (nla[NFTA_SET_ELEM_DATA] != NULL)
 		nft_data_copy(nft_set_ext_data(ext), &data);
 
-- 
2.1.0


* [PATCH 7/9] netfilter: nft_hash: add support for timeouts
  2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
                   ` (5 preceding siblings ...)
  2015-01-30  7:46 ` [PATCH 6/9] netfilter: nf_tables: add set timeout support Patrick McHardy
@ 2015-01-30  7:46 ` Patrick McHardy
  2015-01-31  4:29   ` Herbert Xu
  2015-01-30  7:46 ` [PATCH 8/9] netfilter: nft_lookup: add missing attribute validation for NFTA_LOOKUP_SET_ID Patrick McHardy
  2015-01-30  7:46 ` [PATCH 9/9] netfilter: nf_tables: add support for dynamic set updates Patrick McHardy
  8 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC (permalink / raw)
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/nft_hash.c | 153 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 107 insertions(+), 46 deletions(-)

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index cba0ad2..e7cf886 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -15,6 +15,7 @@
 #include <linux/log2.h>
 #include <linux/jhash.h>
 #include <linux/netlink.h>
+#include <linux/workqueue.h>
 #include <linux/rhashtable.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
@@ -23,19 +24,47 @@
 /* We target a hash table size of 4, element hint is 75% of final size */
 #define NFT_HASH_ELEMENT_HINT 3
 
+struct nft_hash {
+	struct rhashtable		ht;
+	struct delayed_work		gc_work;
+};
+
 struct nft_hash_elem {
 	struct rhash_head		node;
 	struct nft_set_ext		ext;
 };
 
+struct nft_hash_compare_arg {
+	const struct nft_data		*key;
+	unsigned int			len;
+};
+
+static bool nft_hash_compare(void *ptr, void *arg)
+{
+	const struct nft_hash_elem *he = ptr;
+	struct nft_hash_compare_arg *x = arg;
+
+	if (nft_data_cmp(nft_set_ext_key(&he->ext), x->key, x->len))
+		return false;
+	if (nft_set_ext_exists(&he->ext, NFT_SET_EXT_TIMEOUT) &&
+	    time_after_eq(jiffies, *nft_set_ext_timeout(&he->ext)))
+		return false;
+
+	return true;
+}
+
 static bool nft_hash_lookup(const struct nft_set *set,
 			    const struct nft_data *key,
 			    const struct nft_set_ext **ext)
 {
-	struct rhashtable *priv = nft_set_priv(set);
+	struct nft_hash *priv = nft_set_priv(set);
 	const struct nft_hash_elem *he;
+	struct nft_hash_compare_arg arg = {
+		.key	= key,
+		.len	= set->klen,
+	};
 
-	he = rhashtable_lookup(priv, key);
+	he = rhashtable_lookup_compare(&priv->ht, key, nft_hash_compare, &arg);
 	if (he != NULL)
 		*ext = &he->ext;
 
@@ -45,10 +74,10 @@ static bool nft_hash_lookup(const struct nft_set *set,
 static int nft_hash_insert(const struct nft_set *set,
 			   const struct nft_set_elem *elem)
 {
-	struct rhashtable *priv = nft_set_priv(set);
+	struct nft_hash *priv = nft_set_priv(set);
 	struct nft_hash_elem *he = elem->priv;
 
-	rhashtable_insert(priv, &he->node);
+	rhashtable_insert(&priv->ht, &he->node);
 	return 0;
 }
 
@@ -64,58 +93,43 @@ static void nft_hash_elem_destroy(const struct nft_set *set,
 static void nft_hash_remove(const struct nft_set *set,
 			    const struct nft_set_elem *elem)
 {
-	struct rhashtable *priv = nft_set_priv(set);
+	struct nft_hash *priv = nft_set_priv(set);
+	struct nft_hash_elem *he = elem->cookie;
 
-	rhashtable_remove(priv, elem->cookie);
+	rhashtable_remove(&priv->ht, &he->node);
 	synchronize_rcu();
 	kfree(elem->cookie);
 }
 
-struct nft_compare_arg {
-	const struct nft_set *set;
-	struct nft_set_elem *elem;
-};
-
-static bool nft_hash_compare(void *ptr, void *arg)
-{
-	struct nft_hash_elem *he = ptr;
-	struct nft_compare_arg *x = arg;
-
-	if (!nft_data_cmp(nft_set_ext_key(&he->ext), &x->elem->key,
-			  x->set->klen)) {
-		x->elem->cookie = he;
-		x->elem->priv  = he;
-		return true;
-	}
-
-	return false;
-}
-
 static int nft_hash_get(const struct nft_set *set, struct nft_set_elem *elem)
 {
-	struct rhashtable *priv = nft_set_priv(set);
-	struct nft_compare_arg arg = {
-		.set = set,
-		.elem = elem,
+	struct nft_hash *priv = nft_set_priv(set);
+	struct nft_hash_elem *he;
+	struct nft_hash_compare_arg arg = {
+		.key	= &elem->key,
+		.len	= set->klen,
 	};
 
-	if (rhashtable_lookup_compare(priv, &elem->key,
-				      &nft_hash_compare, &arg))
+	he = rhashtable_lookup_compare(&priv->ht, &elem->key,
+				       nft_hash_compare, &arg);
+	if (he != NULL) {
+		elem->cookie = he;
+		elem->priv   = he;
 		return 0;
-
+	}
 	return -ENOENT;
 }
 
 static void nft_hash_walk(const struct nft_ctx *ctx, const struct nft_set *set,
 			  struct nft_set_iter *iter)
 {
-	struct rhashtable *priv = nft_set_priv(set);
+	struct nft_hash *priv = nft_set_priv(set);
 	const struct bucket_table *tbl;
 	struct nft_hash_elem *he;
 	struct nft_set_elem elem;
 	unsigned int i;
 
-	tbl = rht_dereference_rcu(priv->tbl, priv);
+	tbl = rht_dereference_rcu(priv->ht.tbl, &priv->ht);
 	for (i = 0; i < tbl->size; i++) {
 		struct rhash_head *pos;
 
@@ -134,16 +148,48 @@ cont:
 	}
 }
 
+static void nft_hash_gc(struct work_struct *work)
+{
+	const struct nft_set *set;
+	const struct bucket_table *tbl;
+	struct rhash_head *pos, *next;
+	struct nft_hash_elem *he;
+	struct nft_hash *priv;
+	unsigned long timeout;
+	unsigned int i;
+
+	priv = container_of(work, struct nft_hash, gc_work.work);
+	set  = (void *)priv - offsetof(struct nft_set, data);
+
+	mutex_lock(&priv->ht.mutex);
+	tbl = rht_dereference(priv->ht.tbl, &priv->ht);
+	for (i = 0; i < tbl->size; i++) {
+		rht_for_each_entry_safe(he, pos, next, tbl, i, node) {
+			if (!nft_set_ext_exists(&he->ext, NFT_SET_EXT_TIMEOUT))
+				continue;
+			timeout = *nft_set_ext_timeout(&he->ext);
+			if (time_before(jiffies, timeout))
+				continue;
+
+			rhashtable_remove(&priv->ht, &he->node);
+			nft_hash_elem_destroy(set, he);
+		}
+	}
+	mutex_unlock(&priv->ht.mutex);
+
+	queue_delayed_work(system_power_efficient_wq, &priv->gc_work, HZ);
+}
+
 static unsigned int nft_hash_privsize(const struct nlattr * const nla[])
 {
-	return sizeof(struct rhashtable);
+	return sizeof(struct nft_hash);
 }
 
 static int nft_hash_init(const struct nft_set *set,
 			 const struct nft_set_desc *desc,
 			 const struct nlattr * const tb[])
 {
-	struct rhashtable *priv = nft_set_priv(set);
+	struct nft_hash *priv = nft_set_priv(set);
 	struct rhashtable_params params = {
 		.nelem_hint = desc->size ? : NFT_HASH_ELEMENT_HINT,
 		.head_offset = offsetof(struct nft_hash_elem, node),
@@ -153,30 +199,42 @@ static int nft_hash_init(const struct nft_set *set,
 		.grow_decision = rht_grow_above_75,
 		.shrink_decision = rht_shrink_below_30,
 	};
+	int err;
 
-	return rhashtable_init(priv, &params);
+	err = rhashtable_init(&priv->ht, &params);
+	if (err < 0)
+		return err;
+
+	INIT_DEFERRABLE_WORK(&priv->gc_work, nft_hash_gc);
+	if (set->flags & NFT_SET_TIMEOUT)
+		queue_delayed_work(system_power_efficient_wq,
+				   &priv->gc_work, HZ);
+
+	return 0;
 }
 
 static void nft_hash_destroy(const struct nft_set *set)
 {
-	struct rhashtable *priv = nft_set_priv(set);
+	struct nft_hash *priv = nft_set_priv(set);
 	const struct bucket_table *tbl;
 	struct nft_hash_elem *he;
 	struct rhash_head *pos, *next;
 	unsigned int i;
 
+	cancel_delayed_work_sync(&priv->gc_work);
+
 	/* Stop an eventual async resizing */
-	priv->being_destroyed = true;
-	mutex_lock(&priv->mutex);
+	priv->ht.being_destroyed = true;
+	mutex_lock(&priv->ht.mutex);
 
-	tbl = rht_dereference(priv->tbl, priv);
+	tbl = rht_dereference(priv->ht.tbl, &priv->ht);
 	for (i = 0; i < tbl->size; i++) {
 		rht_for_each_entry_safe(he, pos, next, tbl, i, node)
 			nft_hash_elem_destroy(set, he);
 	}
-	mutex_unlock(&priv->mutex);
+	mutex_unlock(&priv->ht.mutex);
 
-	rhashtable_destroy(priv);
+	rhashtable_destroy(&priv->ht);
 }
 
 static bool nft_hash_estimate(const struct nft_set_desc *desc, u32 features,
@@ -187,9 +245,11 @@ static bool nft_hash_estimate(const struct nft_set_desc *desc, u32 features,
 	esize = sizeof(struct nft_hash_elem);
 	if (features & NFT_SET_MAP)
 		esize += sizeof(struct nft_data);
+	if (features & NFT_SET_TIMEOUT)
+		esize += sizeof(unsigned long);
 
 	if (desc->size) {
-		est->size = sizeof(struct rhashtable) +
+		est->size = sizeof(struct nft_hash) +
 			    roundup_pow_of_two(desc->size * 4 / 3) *
 			    sizeof(struct nft_hash_elem *) +
 			    desc->size * esize;
@@ -218,7 +278,7 @@ static struct nft_set_ops nft_hash_ops __read_mostly = {
 	.remove		= nft_hash_remove,
 	.lookup		= nft_hash_lookup,
 	.walk		= nft_hash_walk,
-	.features	= NFT_SET_MAP,
+	.features	= NFT_SET_MAP | NFT_SET_TIMEOUT,
 	.owner		= THIS_MODULE,
 };
 
@@ -238,3 +298,4 @@ module_exit(nft_hash_module_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
 MODULE_ALIAS_NFT_SET();
+
-- 
2.1.0
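
The GC approach in this patch (walk every bucket, unlink and free entries whose expiry has passed, and treat expired entries as absent on lookup) can be sketched with a small chained hash table in userspace C. This is a simplified illustration, not the rhashtable code: all sketch_* names are hypothetical, locking, RCU and the workqueue are left out, and an expiry of 0 is assumed to mean "no timeout", whereas the patch tests for the presence of the timeout extension.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_BUCKETS 4

struct sketch_elem {
	struct sketch_elem *next;
	unsigned int key;
	unsigned long expires;	/* assumption: 0 means "no timeout" */
};

struct sketch_table {
	struct sketch_elem *buckets[SKETCH_BUCKETS];
};

/* Wrap-safe "now >= expires", mirroring time_after_eq(). */
static bool sketch_expired(const struct sketch_elem *e, unsigned long now)
{
	return e->expires != 0 && (long)(now - e->expires) >= 0;
}

/* Insert at the head of the bucket chain. */
static bool sketch_insert(struct sketch_table *t, unsigned int key,
			  unsigned long expires)
{
	struct sketch_elem *e = malloc(sizeof(*e));

	if (e == NULL)
		return false;
	e->key = key;
	e->expires = expires;
	e->next = t->buckets[key % SKETCH_BUCKETS];
	t->buckets[key % SKETCH_BUCKETS] = e;
	return true;
}

/* Expired entries are skipped even if GC has not collected them yet. */
static bool sketch_lookup(const struct sketch_table *t, unsigned int key,
			  unsigned long now)
{
	const struct sketch_elem *e;

	for (e = t->buckets[key % SKETCH_BUCKETS]; e != NULL; e = e->next)
		if (e->key == key && !sketch_expired(e, now))
			return true;
	return false;
}

/* Walk all buckets, unlink and free expired entries; returns the count. */
static unsigned int sketch_gc(struct sketch_table *t, unsigned long now)
{
	unsigned int i, freed = 0;

	for (i = 0; i < SKETCH_BUCKETS; i++) {
		struct sketch_elem **pp = &t->buckets[i];

		while (*pp != NULL) {
			struct sketch_elem *e = *pp;

			if (sketch_expired(e, now)) {
				*pp = e->next;
				free(e);
				freed++;
			} else {
				pp = &e->next;
			}
		}
	}
	return freed;
}
```

Checking the expiry in the lookup path as well means an entry stops matching as soon as it times out, regardless of how long the next GC run is delayed.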



* [PATCH 8/9] netfilter: nft_lookup: add missing attribute validation for NFTA_LOOKUP_SET_ID
  2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
                   ` (6 preceding siblings ...)
  2015-01-30  7:46 ` [PATCH 7/9] netfilter: nft_hash: add support for timeouts Patrick McHardy
@ 2015-01-30  7:46 ` Patrick McHardy
  2015-01-30  7:46 ` [PATCH 9/9] netfilter: nf_tables: add support for dynamic set updates Patrick McHardy
  8 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC (permalink / raw)
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/netfilter/nft_lookup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index cdbf050..a5f30b8 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -43,6 +43,7 @@ static void nft_lookup_eval(const struct nft_expr *expr,
 
 static const struct nla_policy nft_lookup_policy[NFTA_LOOKUP_MAX + 1] = {
 	[NFTA_LOOKUP_SET]	= { .type = NLA_STRING },
+	[NFTA_LOOKUP_SET_ID]	= { .type = NLA_U32 },
 	[NFTA_LOOKUP_SREG]	= { .type = NLA_U32 },
 	[NFTA_LOOKUP_DREG]	= { .type = NLA_U32 },
 };
-- 
2.1.0



* [PATCH 9/9] netfilter: nf_tables: add support for dynamic set updates
  2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
                   ` (7 preceding siblings ...)
  2015-01-30  7:46 ` [PATCH 8/9] netfilter: nft_lookup: add missing attribute validation for NFTA_LOOKUP_SET_ID Patrick McHardy
@ 2015-01-30  7:46 ` Patrick McHardy
  2015-01-30  9:28   ` Herbert Xu
  8 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30  7:46 UTC (permalink / raw)
  To: herbert
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/netfilter/nf_tables.h        |   4 +-
 include/uapi/linux/netfilter/nf_tables.h |  21 ++++
 net/netfilter/Kconfig                    |   7 ++
 net/netfilter/Makefile                   |   1 +
 net/netfilter/nf_tables_api.c            |   2 +
 net/netfilter/nft_hash.c                 |  20 +++-
 net/netfilter/nft_set.c                  | 176 +++++++++++++++++++++++++++++++
 7 files changed, 228 insertions(+), 3 deletions(-)
 create mode 100644 net/netfilter/nft_set.c

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 735a59d..5bbde43 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -319,6 +319,8 @@ struct nft_set_ops {
 	bool				(*lookup)(const struct nft_set *set,
 						  const struct nft_data *key,
 						  const struct nft_set_ext **ext);
+	bool				(*update)(const struct nft_set *set,
+						  void *elem);
 	int				(*get)(const struct nft_set *set,
 					       struct nft_set_elem *elem);
 	int				(*insert)(const struct nft_set *set,
@@ -373,13 +375,13 @@ struct nft_set {
 	u32				dtype;
 	u32				size;
 	u32				nelems;
-	u32				timeout;
 	u16				policy;
 	/* runtime data below here */
 	const struct nft_set_ops	*ops ____cacheline_aligned;
 	u16				flags;
 	u8				klen;
 	u8				dlen;
+	u32				timeout;
 	unsigned char			data[]
 		__attribute__((aligned(__alignof__(u64))));
 };
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 144d8fe..d8bad34 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -510,6 +510,27 @@ enum nft_lookup_attributes {
 };
 #define NFTA_LOOKUP_MAX		(__NFTA_LOOKUP_MAX - 1)
 
+enum nft_set_ops_ {
+	NFT_SET_OP_ADD,
+	NFT_SET_OP_UPDATE,
+	NFT_SET_OP_DELETE,
+};
+
+/**
+ * enum nft_set_attributes_ - set expression attributes
+ *
+ */
+enum nft_set_attributes_ {
+	NFTA_SET_UNSPEC_,
+	NFTA_SET_SET_NAME,
+	NFTA_SET_SET_ID,
+	NFTA_SET_OP,
+	NFTA_SET_SREG_KEY,
+	NFTA_SET_SREG_DATA,
+	__NFTA_SET_MAX_,
+};
+#define NFTA_SET_MAX_		(__NFTA_SET_MAX_ - 1)
+
 /**
  * enum nft_payload_bases - nf_tables payload expression offset bases
  *
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index b02660f..a6c5942 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -482,6 +482,13 @@ config NFT_HASH
 	  This option adds the "hash" set type that is used to build one-way
 	  mappings between matchings and actions.
 
+config NFT_SET
+	depends on NF_TABLES
+	tristate "Netfilter nf_tables set module"
+	help
+	  This option adds support for dynamic set updates during the packet
+	  classification process.
+
 config NFT_COUNTER
 	depends on NF_TABLES
 	tristate "Netfilter nf_tables counter module"
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 89f73a9..0ff329f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_NFT_REJECT) 	+= nft_reject.o
 obj-$(CONFIG_NFT_REJECT_INET)	+= nft_reject_inet.o
 obj-$(CONFIG_NFT_RBTREE)	+= nft_rbtree.o
 obj-$(CONFIG_NFT_HASH)		+= nft_hash.o
+obj-$(CONFIG_NFT_SET)		+= nft_set.o
 obj-$(CONFIG_NFT_COUNTER)	+= nft_counter.o
 obj-$(CONFIG_NFT_LOG)		+= nft_log.o
 obj-$(CONFIG_NFT_MASQ)		+= nft_masq.o
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 95234a3..7bdc626 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2246,6 +2246,7 @@ struct nft_set *nf_tables_set_lookup(const struct nft_table *table,
 	}
 	return ERR_PTR(-ENOENT);
 }
+EXPORT_SYMBOL_GPL(nf_tables_set_lookup);
 
 struct nft_set *nf_tables_set_lookup_byid(const struct net *net,
 					  const struct nlattr *nla)
@@ -2260,6 +2261,7 @@ struct nft_set *nf_tables_set_lookup_byid(const struct net *net,
 	}
 	return ERR_PTR(-ENOENT);
 }
+EXPORT_SYMBOL_GPL(nf_tables_set_lookup_byid);
 
 static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set,
 				    const char *name)
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index e7cf886..cc6750e 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -71,6 +71,19 @@ static bool nft_hash_lookup(const struct nft_set *set,
 	return !!he;
 }
 
+static bool nft_hash_update(const struct nft_set *set, void *elem)
+{
+	struct nft_hash *priv = nft_set_priv(set);
+	struct nft_hash_elem *he = elem;
+	struct nft_hash_compare_arg arg = {
+		.key	= nft_set_ext_key(&he->ext),
+		.len	= set->klen,
+	};
+
+	return rhashtable_lookup_compare_insert(&priv->ht, &he->node,
+						nft_hash_compare, &arg);
+}
+
 static int nft_hash_insert(const struct nft_set *set,
 			   const struct nft_set_elem *elem)
 {
@@ -96,7 +109,8 @@ static void nft_hash_remove(const struct nft_set *set,
 	struct nft_hash *priv = nft_set_priv(set);
 	struct nft_hash_elem *he = elem->cookie;
 
-	rhashtable_remove(&priv->ht, &he->node);
+	if (!rhashtable_remove(&priv->ht, &he->node))
+		return;
 	synchronize_rcu();
 	kfree(elem->cookie);
 }
@@ -171,7 +185,8 @@ static void nft_hash_gc(struct work_struct *work)
 			if (time_before(jiffies, timeout))
 				continue;
 
-			rhashtable_remove(&priv->ht, &he->node);
+			if (!rhashtable_remove(&priv->ht, &he->node))
+				continue;
 			nft_hash_elem_destroy(set, he);
 		}
 	}
@@ -277,6 +292,7 @@ static struct nft_set_ops nft_hash_ops __read_mostly = {
 	.insert		= nft_hash_insert,
 	.remove		= nft_hash_remove,
 	.lookup		= nft_hash_lookup,
+	.update		= nft_hash_update,
 	.walk		= nft_hash_walk,
 	.features	= NFT_SET_MAP | NFT_SET_TIMEOUT,
 	.owner		= THIS_MODULE,
diff --git a/net/netfilter/nft_set.c b/net/netfilter/nft_set.c
new file mode 100644
index 0000000..e94048e
--- /dev/null
+++ b/net/netfilter/nft_set.c
@@ -0,0 +1,176 @@
+/*
+ * Copyright (c) 2015 Patrick McHardy <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/netlink.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
+
+struct nft_set_expr {
+	struct nft_set			*set;
+	enum nft_registers		sreg_key:8;
+	enum nft_registers		sreg_data:8;
+	unsigned long			timeout;
+};
+
+static void nft_set_eval(const struct nft_expr *expr,
+			 struct nft_data data[NFT_REG_MAX + 1],
+			 const struct nft_pktinfo *pkt)
+{
+	const struct nft_set_expr *priv = nft_expr_priv(expr);
+	const struct nft_set *set = priv->set;
+	struct nft_set_ext_tmpl tmpl;
+	struct nft_set_ext *ext;
+	unsigned long timeout;
+	void *elem;
+
+	nft_set_ext_prepare(&tmpl);
+	nft_set_ext_add(&tmpl, NFT_SET_EXT_KEY);
+
+	timeout = 0;
+	if (set->flags & NFT_SET_TIMEOUT) {
+		timeout = priv->timeout ? : set->timeout;
+		if (timeout > 0)
+			nft_set_ext_add(&tmpl, NFT_SET_EXT_TIMEOUT);
+	}
+	if (set->flags & NFT_SET_MAP)
+		nft_set_ext_add(&tmpl, NFT_SET_EXT_DATA);
+
+	elem = kzalloc(set->ops->elemsize + tmpl.len, GFP_ATOMIC);
+	if (elem == NULL)
+		return;
+	ext = elem + set->ops->elemsize;
+	nft_set_ext_init(ext, &tmpl);
+
+	nft_data_copy(nft_set_ext_key(ext), &data[priv->sreg_key]);
+	if (set->flags & NFT_SET_MAP)
+		nft_data_copy(nft_set_ext_data(ext), &data[priv->sreg_data]);
+	if (timeout > 0)
+		*nft_set_ext_timeout(ext) = jiffies + timeout;
+
+	if (!set->ops->update(set, elem))
+		kfree(elem);
+}
+
+static const struct nla_policy nft_set_policy[NFTA_SET_MAX + 1] = {
+	[NFTA_SET_SET_NAME]	= { .type = NLA_STRING },
+	[NFTA_SET_SET_ID]	= { .type = NLA_U32 },
+	[NFTA_SET_OP]		= { .type = NLA_U32 },
+	[NFTA_SET_SREG_KEY]	= { .type = NLA_U32 },
+	[NFTA_SET_SREG_DATA]	= { .type = NLA_U32 },
+	[NFTA_SET_TIMEOUT]	= { .type = NLA_U32 },
+};
+
+static int nft_set_init(const struct nft_ctx *ctx,
+			const struct nft_expr *expr,
+			const struct nlattr * const tb[])
+{
+	struct nft_set_expr *priv = nft_expr_priv(expr);
+	struct nft_set *set;
+	enum nft_set_ops_ op;
+	int err;
+
+	if (tb[NFTA_SET_SET_NAME] == NULL ||
+	    tb[NFTA_SET_OP] == NULL ||
+	    tb[NFTA_SET_SREG_KEY] == NULL)
+		return -EINVAL;
+
+	op = ntohl(nla_get_be32(tb[NFTA_SET_OP]));
+	switch (op) {
+	case NFT_SET_OP_ADD:
+	case NFT_SET_OP_UPDATE:
+	case NFT_SET_OP_DELETE:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	set = nf_tables_set_lookup(ctx->table, tb[NFTA_SET_SET_NAME]);
+	if (IS_ERR(set)) {
+		if (tb[NFTA_SET_SET_ID])
+			set = nf_tables_set_lookup_byid(ctx->net,
+							tb[NFTA_SET_SET_ID]);
+		if (IS_ERR(set))
+			return PTR_ERR(set);
+	}
+
+	priv->sreg_key = ntohl(nla_get_be32(tb[NFTA_SET_SREG_KEY]));
+	err = nft_validate_input_register(priv->sreg_key);
+	if (err < 0)
+		return err;
+
+	if (tb[NFTA_SET_SREG_DATA] != NULL) {
+		if (!(set->flags & NFT_SET_MAP))
+			return -EINVAL;
+
+		priv->sreg_data = ntohl(nla_get_be32(tb[NFTA_SET_SREG_DATA]));
+		err = nft_validate_input_register(priv->sreg_data);
+		if (err < 0)
+			return err;
+	} else if (set->flags & NFT_SET_MAP)
+		return -EINVAL;
+
+	// FIXME: bind
+	priv->set = set;
+	return 0;
+}
+
+static int nft_set_dump(struct sk_buff *skb, const struct nft_expr *expr)
+{
+	const struct nft_set_expr *priv = nft_expr_priv(expr);
+
+	if (nla_put_be32(skb, NFTA_SET_SREG_KEY, htonl(priv->sreg_key)))
+		goto nla_put_failure;
+	if (priv->set->flags & NFT_SET_MAP &&
+	    nla_put_be32(skb, NFTA_SET_SREG_DATA, htonl(priv->sreg_data)))
+		goto nla_put_failure;
+	if (nla_put_be32(skb, NFTA_SET_OP, htonl(priv->op)))
+		goto nla_put_failure;
+	if (nla_put_string(skb, NFTA_SET_SET_NAME, priv->set->name))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+static struct nft_expr_type nft_set_type;
+static const struct nft_expr_ops nft_set_ops = {
+	.type		= &nft_set_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_set_expr)),
+	.eval		= nft_set_eval,
+	.init		= nft_set_init,
+	.dump		= nft_set_dump,
+};
+
+static struct nft_expr_type nft_set_type __read_mostly = {
+	.name		= "set",
+	.ops		= &nft_set_ops,
+	.policy		= nft_set_policy,
+	.maxattr	= NFTA_SET_MAX,
+	.owner		= THIS_MODULE,
+};
+
+int __init nft_set_module_init(void)
+{
+	return nft_register_expr(&nft_set_type);
+}
+
+void nft_set_module_exit(void)
+{
+	nft_unregister_expr(&nft_set_type);
+}
+
+module_init(nft_set_module_init);
+module_exit(nft_set_module_exit);
+MODULE_LICENSE("GPL");
-- 
2.1.0
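
The ownership rule in nft_set_eval() (allocate the element up front, hand it to the set's update operation, and free it if the set did not take it) can be sketched in userspace C. This is a minimal illustration under assumptions: the sketch_* names and the fixed-size slot array are hypothetical and stand in for the set backend; no claim is made about how rhashtable_lookup_compare_insert() is implemented.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_SLOTS 8

struct sketch_elem {
	unsigned int key;
	unsigned int data;
};

struct sketch_set {
	struct sketch_elem *slots[SKETCH_SLOTS];
};

/*
 * Insert e unless an element with the same key already exists.
 * Returns true only if the set took ownership of e.
 */
static bool sketch_update(struct sketch_set *s, struct sketch_elem *e)
{
	unsigned int i, free_slot = SKETCH_SLOTS;

	for (i = 0; i < SKETCH_SLOTS; i++) {
		if (s->slots[i] == NULL) {
			if (free_slot == SKETCH_SLOTS)
				free_slot = i;
		} else if (s->slots[i]->key == e->key) {
			return false;
		}
	}
	if (free_slot == SKETCH_SLOTS)
		return false;	/* set is full */
	s->slots[free_slot] = e;
	return true;
}

/* Build a candidate element and free it if the set rejects it. */
static bool sketch_eval(struct sketch_set *s, unsigned int key,
			unsigned int data)
{
	struct sketch_elem *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return false;
	e->key = key;
	e->data = data;
	if (!sketch_update(s, e)) {
		free(e);
		return false;
	}
	return true;
}
```

Because the caller frees the candidate whenever the update operation returns false, a duplicate key costs one allocation and one free but never leaks.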


* Re: [PATCH 9/9] netfilter: nf_tables: add support for dynamic set updates
  2015-01-30  7:46 ` [PATCH 9/9] netfilter: nf_tables: add support for dynamic set updates Patrick McHardy
@ 2015-01-30  9:28   ` Herbert Xu
  2015-01-30 10:08     ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2015-01-30  9:28 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

On Fri, Jan 30, 2015 at 07:46:34AM +0000, Patrick McHardy wrote:
> Signed-off-by: Patrick McHardy <kaber@trash.net>

I presume this can't create any jumps/gotos, right?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH 9/9] netfilter: nf_tables: add support for dynamic set updates
  2015-01-30  9:28   ` Herbert Xu
@ 2015-01-30 10:08     ` Patrick McHardy
  2015-01-30 10:18       ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30 10:08 UTC (permalink / raw)
  To: Herbert Xu
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

On 30 January 2015 09:28:45 GMT+00:00, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>On Fri, Jan 30, 2015 at 07:46:34AM +0000, Patrick McHardy wrote:
>> Signed-off-by: Patrick McHardy <kaber@trash.net>
>
>I presume this can't create any jumps/gotos, right?

So far not, just data mappings. Not sure yet if there is a valid use case for jumps.
>
>Thanks,




* Re: [PATCH 9/9] netfilter: nf_tables: add support for dynamic set updates
  2015-01-30 10:08     ` Patrick McHardy
@ 2015-01-30 10:18       ` Herbert Xu
  2015-01-30 11:29         ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2015-01-30 10:18 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

On Fri, Jan 30, 2015 at 10:08:46AM +0000, Patrick McHardy wrote:
> On 30 January 2015 09:28:45 GMT+00:00, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >On Fri, Jan 30, 2015 at 07:46:34AM +0000, Patrick McHardy wrote:
> >> Signed-off-by: Patrick McHardy <kaber@trash.net>
> >
> >I presume this can't create any jumps/gotos, right?
> 
> So far not, just data mappings. Not sure yet if there is a valid use case for jumps.

Well if they could create jumps/gotos in softirq context then
the loop verification would get pretty hairy :)

OK so it looks like you really do need a totally lockless walk.
So I'll reshape my iterators to do that.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/9] nftables: nft_rbtree: fix locking
  2015-01-30  7:46 ` [PATCH 3/9] nftables: nft_rbtree: fix locking Patrick McHardy
@ 2015-01-30 10:52   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 21+ messages in thread
From: Pablo Neira Ayuso @ 2015-01-30 10:52 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: herbert, tgraf, davem, David.Laight, ying.xue, paulmck, netdev,
	netfilter-devel

Hi Patrick,

On Fri, Jan 30, 2015 at 07:46:28AM +0000, Patrick McHardy wrote:
> Fix a race condition and unnecessary locking:
> 
> * the root rb_node must only be accessed under the lock in nft_rbtree_lookup()
> * the lock is not needed in lookup functions in netlink contexts
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>
> ---
>  net/netfilter/nft_rbtree.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c
> index 46214f2..417796f 100644
> --- a/net/netfilter/nft_rbtree.c
> +++ b/net/netfilter/nft_rbtree.c
> @@ -37,10 +37,11 @@ static bool nft_rbtree_lookup(const struct nft_set *set,
>  {
>  	const struct nft_rbtree *priv = nft_set_priv(set);
>  	const struct nft_rbtree_elem *rbe, *interval = NULL;
> -	const struct rb_node *parent = priv->root.rb_node;
> +	const struct rb_node *parent;
>  	int d;
>  
>  	spin_lock_bh(&nft_rbtree_lock);
> +	parent = priv->root.rb_node;

Good catch.

>  	while (parent != NULL) {
>  		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
>  
> @@ -158,7 +159,6 @@ static int nft_rbtree_get(const struct nft_set *set, struct nft_set_elem *elem)
>  	struct nft_rbtree_elem *rbe;
>  	int d;
>  
> -	spin_lock_bh(&nft_rbtree_lock);
>  	while (parent != NULL) {
>  		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
>  
> @@ -173,11 +173,9 @@ static int nft_rbtree_get(const struct nft_set *set, struct nft_set_elem *elem)
>  			    !(rbe->flags & NFT_SET_ELEM_INTERVAL_END))
>  				nft_data_copy(&elem->data, rbe->data);
>  			elem->flags = rbe->flags;
> -			spin_unlock_bh(&nft_rbtree_lock);
>  			return 0;
>  		}
>  	}
> -	spin_unlock_bh(&nft_rbtree_lock);
>  	return -ENOENT;

this chunk looks fine to me, we always hold the nfnetlink mutex.

>  }
> @@ -190,7 +188,6 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
>  	struct nft_set_elem elem;
>  	struct rb_node *node;
>  
> -	spin_lock_bh(&nft_rbtree_lock);
>  	for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
>  		if (iter->count < iter->skip)
>  			goto cont;
> @@ -203,14 +200,11 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
>  		elem.flags = rbe->flags;
>  
>  		iter->err = iter->fn(ctx, set, iter, &elem);
> -		if (iter->err < 0) {
> -			spin_unlock_bh(&nft_rbtree_lock);
> +		if (iter->err < 0)
>  			return;
> -		}
>  cont:
>  		iter->count++;
>  	}
> -	spin_unlock_bh(&nft_rbtree_lock);
>  }
>  

I think that _walk still needs the lock there. This is called from
nf_tables_dump_set() for each recvmsg() in netlink, and IIRC unlike
rtnetlink the dump path in nfnetlink is lockless.

^ permalink raw reply	[flat|nested] 21+ messages in thread
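
The point about the walk can be sketched in userspace: if the dump path runs without the nfnetlink mutex, the walk has to take the lock itself and drop it on every exit path, including the early return on a callback error. This is only a model under stated assumptions. A C11 atomic_flag stands in for nft_rbtree_lock, a plain array stands in for the tree, and model_walk(), struct model_iter and the callbacks are hypothetical names, not kernel API.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for nft_rbtree_lock: a trivial userspace spinlock. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

static void model_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&model_lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void model_lock_release(void)
{
	atomic_flag_clear_explicit(&model_lock, memory_order_release);
}

/* Hypothetical iterator, modelled on the skip/count/err/fn fields
 * used by nft_rbtree_walk() in the quoted patch. */
struct model_iter {
	unsigned int skip;	/* elements dumped by earlier calls */
	unsigned int count;	/* elements visited so far */
	int err;		/* first callback error, 0 if none */
	int (*fn)(int elem, void *arg);
	void *arg;
};

/* Walk elems[0..n) with the lock held for the whole traversal. */
static void model_walk(const int *elems, size_t n, struct model_iter *iter)
{
	size_t i;

	model_lock_acquire();
	for (i = 0; i < n; i++) {
		if (iter->count < iter->skip)
			goto cont;

		iter->err = iter->fn(elems[i], iter->arg);
		if (iter->err < 0) {
			/* The error path must release the lock as well. */
			model_lock_release();
			return;
		}
cont:
		iter->count++;
	}
	model_lock_release();
}

/* Example callbacks: accumulate a sum, or fail on a negative element. */
static int model_sum(int elem, void *arg)
{
	*(long *)arg += elem;
	return 0;
}

static int model_fail_on_negative(int elem, void *arg)
{
	(void)arg;
	return elem < 0 ? -1 : 0;
}
```

Whether the real dump path needs the lock depends on how nf_tables_dump_set() is invoked, which the sketch does not model. It only shows the shape of a walk that stays locked and unlocks on every return.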

* Re: [PATCH 9/9] netfilter: nf_tables: add support for dynamic set updates
  2015-01-30 10:18       ` Herbert Xu
@ 2015-01-30 11:29         ` Herbert Xu
  0 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2015-01-30 11:29 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

On Fri, Jan 30, 2015 at 09:18:02PM +1100, Herbert Xu wrote:
> 
> OK so it looks like you really do need a totally lockless walk.
> So I'll reshape my iterators to do that.

OK here is a completely untested patch that implements a totally
lockless walk with restart-on-resize support.

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 6d7e840..a0e5b08 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -18,6 +18,7 @@
 #ifndef _LINUX_RHASHTABLE_H
 #define _LINUX_RHASHTABLE_H
 
+#include <linux/compiler.h>
 #include <linux/list_nulls.h>
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
@@ -110,6 +111,7 @@ struct rhashtable_params {
  * @p: Configuration parameters
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
+ * @walkers: List of active walkers
  * @being_destroyed: True if table is set up for destruction
  */
 struct rhashtable {
@@ -120,9 +122,36 @@ struct rhashtable {
 	struct rhashtable_params	p;
 	struct work_struct		run_work;
 	struct mutex                    mutex;
+	struct list_head		walkers;
 	bool                            being_destroyed;
 };
 
+/**
+ * struct rhashtable_walker - Hash table walker
+ * @list: List entry on list of walkers
+ * @resize: Resize event occurred
+ */
+struct rhashtable_walker {
+	struct list_head list;
+	bool resize;
+};
+
+/**
+ * struct rhashtable_iter - Hash table iterator, fits into netlink cb
+ * @ht: Table to iterate through
+ * @p: Current pointer
+ * @walker: Associated rhashtable walker
+ * @slot: Current slot
+ * @skip: Number of entries to skip in slot
+ */
+struct rhashtable_iter {
+	struct rhashtable *ht;
+	struct rhash_head *p;
+	struct rhashtable_walker *walker;
+	unsigned int slot;
+	unsigned int skip;
+};
+
 static inline unsigned long rht_marker(const struct rhashtable *ht, u32 hash)
 {
 	return NULLS_MARKER(ht->p.nulls_base + hash);
@@ -178,6 +207,12 @@ bool rhashtable_lookup_compare_insert(struct rhashtable *ht,
 				      bool (*compare)(void *, void *),
 				      void *arg);
 
+int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter);
+void rhashtable_walk_exit(struct rhashtable_iter *iter);
+int rhashtable_walk_start(struct rhashtable_iter *iter) __acquires(RCU);
+void *rhashtable_walk_next(struct rhashtable_iter *iter);
+void rhashtable_walk_stop(struct rhashtable_iter *iter) __releases(RCU);
+
 void rhashtable_destroy(struct rhashtable *ht);
 
 #define rht_dereference(p, ht) \
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 71c6aa1..a3e9e5c 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -485,11 +485,15 @@ static void rht_deferred_worker(struct work_struct *work)
 {
 	struct rhashtable *ht;
 	struct bucket_table *tbl;
+	struct rhashtable_walker *walker;
 
 	ht = container_of(work, struct rhashtable, run_work);
 	mutex_lock(&ht->mutex);
 	tbl = rht_dereference(ht->tbl, ht);
 
+	list_for_each_entry(walker, &ht->walkers, list)
+		walker->resize = true;
+
 	if (ht->p.grow_decision && ht->p.grow_decision(ht, tbl->size))
 		rhashtable_expand(ht);
 	else if (ht->p.shrink_decision && ht->p.shrink_decision(ht, tbl->size))
@@ -813,6 +817,165 @@ exit:
 }
 EXPORT_SYMBOL_GPL(rhashtable_lookup_compare_insert);
 
+/**
+ * rhashtable_walk_init - Initialise an iterator
+ * @ht:		Table to walk over
+ * @iter:	Hash table Iterator
+ *
+ * This function prepares a hash table walk.
+ *
+ * Note that if you restart a walk after rhashtable_walk_stop you
+ * may see the same object twice.  Also, you may miss objects if
+ * there are removals in between rhashtable_walk_stop and the next
+ * call to rhashtable_walk_start.
+ *
+ * For a completely stable walk you should construct your own data
+ * structure outside the hash table.
+ *
+ * This function may sleep so you must not call it from interrupt
+ * context or with spin locks held.
+ *
+ * You must call rhashtable_walk_exit if this function returns
+ * successfully.
+ */
+int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter)
+{
+	int err;
+
+	iter->ht = ht;
+	iter->p = NULL;
+	iter->slot = 0;
+	iter->skip = 0;
+
+	iter->walker = kmalloc(sizeof(*iter->walker), GFP_KERNEL);
+	if (!iter->walker)
+		return -ENOMEM;
+
+	mutex_lock(&ht->mutex);
+	list_add(&iter->walker->list, &ht->walkers);
+	mutex_unlock(&ht->mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(rhashtable_walk_init);
+
+/**
+ * rhashtable_walk_exit - Free an iterator
+ * @iter:	Hash table Iterator
+ *
+ * This function frees resources allocated by rhashtable_walk_init.
+ */
+void rhashtable_walk_exit(struct rhashtable_iter *iter)
+{
+	mutex_lock(&iter->ht->mutex);
+	list_del(&iter->walker->list);
+	mutex_unlock(&iter->ht->mutex);
+	kfree(iter->walker);
+}
+EXPORT_SYMBOL_GPL(rhashtable_walk_exit);
+
+/**
+ * rhashtable_walk_start - Start a hash table walk
+ * @iter:	Hash table iterator
+ *
+ * Start a hash table walk.  Note that we take the RCU lock in all
+ * cases including when we return an error.  So you must always call
+ * rhashtable_walk_stop to clean up.
+ *
+ * Returns zero if successful.
+ *
+ * Returns -EAGAIN if a resize event occurred.  Note that the iterator
+ * will rewind back to the beginning and you may use it immediately
+ * by calling rhashtable_walk_next.
+ */
+int rhashtable_walk_start(struct rhashtable_iter *iter)
+{
+	rcu_read_lock();
+
+	if (iter->walker->resize) {
+		iter->slot = 0;
+		iter->skip = 0;
+		iter->walker->resize = false;
+		return -EAGAIN;
+	}
+
+	return 0;
+}
+
+/**
+ * rhashtable_walk_next - Return the next object and advance the iterator
+ * @iter:	Hash table iterator
+ *
+ * Note that you must call rhashtable_walk_stop when you are finished
+ * with the walk.
+ *
+ * Returns the next object or NULL when the end of the table is reached.
+ *
+ * Returns ERR_PTR(-EAGAIN) if a resize event occurred.  Note that the iterator
+ * will rewind back to the beginning and you may continue to use it.
+ */
+void *rhashtable_walk_next(struct rhashtable_iter *iter)
+{
+	const struct bucket_table *tbl;
+	struct rhashtable *ht = iter->ht;
+	struct rhash_head *p = iter->p;
+	void *obj = NULL;
+
+	tbl = rht_dereference_rcu(ht->tbl, ht);
+
+	if (p) {
+		p = rht_dereference_bucket_rcu(p->next, tbl, iter->slot);
+		goto next;
+	}
+
+	for (; iter->slot < tbl->size; iter->slot++) {
+		int skip = iter->skip;
+
+		rht_for_each_rcu(p, tbl, iter->slot) {
+			if (!skip)
+				break;
+			skip--;
+		}
+
+next:
+		if (!rht_is_a_nulls(p)) {
+			iter->skip++;
+			iter->p = p;
+			obj = rht_obj(ht, p);
+			goto out;
+		}
+
+		iter->skip = 0;
+	}
+
+	iter->p = NULL;
+
+out:
+	if (iter->walker->resize) {
+		iter->p = NULL;
+		iter->slot = 0;
+		iter->skip = 0;
+		iter->walker->resize = false;
+		return ERR_PTR(-EAGAIN);
+	}
+
+	return obj;
+}
+EXPORT_SYMBOL_GPL(rhashtable_walk_next);
+
+/**
+ * rhashtable_walk_stop - Finish a hash table walk
+ * @iter:	Hash table iterator
+ *
+ * Finish a hash table walk.
+ */
+void rhashtable_walk_stop(struct rhashtable_iter *iter)
+{
+	rcu_read_unlock();
+	iter->p = NULL;
+}
+EXPORT_SYMBOL_GPL(rhashtable_walk_stop);
+
 static size_t rounded_hashtable_size(struct rhashtable_params *params)
 {
 	return max(roundup_pow_of_two(params->nelem_hint * 4 / 3),

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 21+ messages in thread
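
The restart-on-resize behaviour of the walker posted above can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel code. The model keeps only the slot/skip position (no cached pointer), checks the resize flag on entry to the "next" call rather than on exit, reports the rewind through an int out-parameter instead of ERR_PTR(), and has no RCU. All model_* names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node {
	int value;
	struct model_node *next;
};

struct model_table {
	struct model_node **buckets;
	unsigned int size;
	bool resized;		/* stands in for walker->resize */
};

struct model_iter {
	struct model_table *tbl;
	unsigned int slot;	/* current bucket */
	unsigned int skip;	/* entries already returned from it */
};

static void model_walk_init(struct model_table *tbl, struct model_iter *iter)
{
	iter->tbl = tbl;
	iter->slot = 0;
	iter->skip = 0;
}

/* Rewind the iterator if a resize happened; true if it rewound. */
static bool model_rewind_on_resize(struct model_iter *iter)
{
	if (!iter->tbl->resized)
		return false;
	iter->slot = 0;
	iter->skip = 0;
	iter->tbl->resized = false;
	return true;
}

/* Returns 0, or -EAGAIN if the walk was rewound to the beginning. */
static int model_walk_start(struct model_iter *iter)
{
	return model_rewind_on_resize(iter) ? -EAGAIN : 0;
}

/* Returns the next node, or NULL at the end of the table.  On a
 * resize, returns NULL with *err set to -EAGAIN and the iterator
 * rewound, so the caller may simply keep calling. */
static struct model_node *model_walk_next(struct model_iter *iter, int *err)
{
	struct model_table *tbl = iter->tbl;

	*err = 0;
	if (model_rewind_on_resize(iter)) {
		*err = -EAGAIN;
		return NULL;
	}

	for (; iter->slot < tbl->size; iter->slot++) {
		struct model_node *p = tbl->buckets[iter->slot];
		unsigned int skip = iter->skip;

		/* Step past entries already returned from this bucket. */
		for (; p && skip; skip--)
			p = p->next;

		if (p) {
			iter->skip++;
			return p;
		}
		iter->skip = 0;
	}
	return NULL;
}

/* Caller pattern: discard partial results whenever the walk rewinds. */
static long model_sum_all(struct model_table *tbl)
{
	struct model_iter iter;
	struct model_node *n;
	long sum = 0;
	int err;

	model_walk_init(tbl, &iter);
	(void)model_walk_start(&iter);	/* nothing collected yet */
	for (;;) {
		n = model_walk_next(&iter, &err);
		if (err == -EAGAIN) {
			sum = 0;	/* start over after a resize */
			continue;
		}
		if (!n)
			break;
		sum += n->value;
	}
	return sum;
}
```

The caller-side consequence is the one the kernel-doc comments in the patch describe: a walk may report the same object twice across a rewind, so a caller that accumulates state has to reset it when it sees -EAGAIN.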

* Re: [PATCH 1/9] rhashtable: simplify rhashtable_remove()
  2015-01-30  7:46 ` [PATCH 1/9] rhashtable: simplify rhashtable_remove() Patrick McHardy
@ 2015-01-30 16:36   ` Thomas Graf
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-01-30 16:36 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: herbert, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

On 01/30/15 at 07:46am, Patrick McHardy wrote:
> Remove some duplicated code by moving the restart label up a few
> lines. Also use rcu_access_pointer() for the pointer comparison
> instead of rht_dereference_rcu().
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

BTW, everything except the rcu_access_pointer() optimization is
also covered in the "rhashtable fixes" series I posted.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/9] nftables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets
  2015-01-30  7:46 ` [PATCH 2/9] nftables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets Patrick McHardy
@ 2015-01-30 17:31   ` Pablo Neira Ayuso
  2015-01-30 17:55     ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Pablo Neira Ayuso @ 2015-01-30 17:31 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: herbert, tgraf, davem, David.Laight, ying.xue, paulmck, netdev,
	netfilter-devel

Hi Patrick,

Unless you have any concern, I'm going to apply this and 8/9 to
nf-next, so you don't need to resend these two sanitization fixes.

Thanks.

On Fri, Jan 30, 2015 at 07:46:27AM +0000, Patrick McHardy wrote:
> Signed-off-by: Patrick McHardy <kaber@trash.net>
> ---
>  net/netfilter/nf_tables_api.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 129a8da..92ba4a0 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -3112,6 +3112,9 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
>  		elem.flags = ntohl(nla_get_be32(nla[NFTA_SET_ELEM_FLAGS]));
>  		if (elem.flags & ~NFT_SET_ELEM_INTERVAL_END)
>  			return -EINVAL;
> +		if (!(set->flags & NFT_SET_INTERVAL) &&
> +		    elem.flags & NFT_SET_ELEM_INTERVAL_END)
> +			return -EINVAL;
>  	}
>  
>  	if (set->flags & NFT_SET_MAP) {
> -- 
> 2.1.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/9] nftables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets
  2015-01-30 17:31   ` Pablo Neira Ayuso
@ 2015-01-30 17:55     ` Patrick McHardy
  2015-01-30 18:00       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2015-01-30 17:55 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: herbert, tgraf, davem, David.Laight, ying.xue, paulmck, netdev,
	netfilter-devel

On 30.01, Pablo Neira Ayuso wrote:
> Hi Patrick,
> 
> Unless you have any concern, I'm going to apply this and 8/9 to
> nf-next, so you don't need to resend these two sanitization fixes.

This one is not needed for mainline so far since nft_hash validates
on its own. It is only required since my series centralizes that
validation once the set extensions are added.

For 8/9, sure.

^ permalink raw reply	[flat|nested] 21+ messages in thread
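
The check added by patch 2/9 can be restated as a standalone function. This is a minimal sketch: the two flag values below are local stand-ins assumed for illustration rather than taken from the uapi header, and model_check_elem_flags() is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for NFT_SET_INTERVAL and NFT_SET_ELEM_INTERVAL_END;
 * the numeric values are assumptions made for this sketch. */
#define MODEL_SET_INTERVAL		0x4u
#define MODEL_SET_ELEM_INTERVAL_END	0x1u

/* Returns 0 if elem_flags are acceptable for a set with set_flags,
 * -EINVAL otherwise. */
static int model_check_elem_flags(uint32_t set_flags, uint32_t elem_flags)
{
	/* Existing check: reject any unknown element flag. */
	if (elem_flags & ~MODEL_SET_ELEM_INTERVAL_END)
		return -EINVAL;

	/* Check added by the patch: an interval-end marker is only
	 * valid for elements of an interval set. */
	if (!(set_flags & MODEL_SET_INTERVAL) &&
	    (elem_flags & MODEL_SET_ELEM_INTERVAL_END))
		return -EINVAL;

	return 0;
}
```

Doing this once in the API layer is what lets the individual set backends drop their own copies of the check, which is the centralization mentioned in the reply above.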

* Re: [PATCH 2/9] nftables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets
  2015-01-30 17:55     ` Patrick McHardy
@ 2015-01-30 18:00       ` Pablo Neira Ayuso
  0 siblings, 0 replies; 21+ messages in thread
From: Pablo Neira Ayuso @ 2015-01-30 18:00 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: herbert, tgraf, davem, David.Laight, ying.xue, paulmck, netdev,
	netfilter-devel

On Fri, Jan 30, 2015 at 05:55:26PM +0000, Patrick McHardy wrote:
> On 30.01, Pablo Neira Ayuso wrote:
> > Hi Patrick,
> > 
> > Unless you have any concern, I'm going to apply this and 8/9 to
> > nf-next, so you don't need to resend these two sanitization fixes.
> 
> This one is not needed for mainline so far since nft_hash validates
> on its own. It is only required since my series centralizes that
> validation once the set extensions are added.
> 
> For 8/9, sure.

OK, I'll take 8/9 then, thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/9] netfilter: nft_hash: add support for timeouts
  2015-01-30  7:46 ` [PATCH 7/9] netfilter: nft_hash: add support for timeouts Patrick McHardy
@ 2015-01-31  4:29   ` Herbert Xu
  2015-01-31 12:16     ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2015-01-31  4:29 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

On Fri, Jan 30, 2015 at 07:46:32AM +0000, Patrick McHardy wrote:
>
> +	mutex_lock(&priv->ht.mutex);
> +	tbl = rht_dereference(priv->ht.tbl, &priv->ht);
> +	for (i = 0; i < tbl->size; i++) {
> +		rht_for_each_entry_safe(he, pos, next, tbl, i, node) {
> +			if (!nft_set_ext_exists(&he->ext, NFT_SET_EXT_TIMEOUT))
> +				continue;
> +			timeout = *nft_set_ext_timeout(&he->ext);
> +			if (time_before(jiffies, timeout))
> +				continue;
> +
> +			rhashtable_remove(&priv->ht, &he->node);
> +			nft_hash_elem_destroy(set, he);
> +		}
> +	}
> +	mutex_unlock(&priv->ht.mutex);

What if somebody is currently walking over the table? Shouldn't
you do an RCU free here instead of immediately destroying the
element?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/9] netfilter: nft_hash: add support for timeouts
  2015-01-31  4:29   ` Herbert Xu
@ 2015-01-31 12:16     ` Patrick McHardy
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2015-01-31 12:16 UTC (permalink / raw)
  To: Herbert Xu
  Cc: tgraf, davem, David.Laight, ying.xue, paulmck, netdev, netfilter-devel

On 31.01, Herbert Xu wrote:
> On Fri, Jan 30, 2015 at 07:46:32AM +0000, Patrick McHardy wrote:
> >
> > +	mutex_lock(&priv->ht.mutex);
> > +	tbl = rht_dereference(priv->ht.tbl, &priv->ht);
> > +	for (i = 0; i < tbl->size; i++) {
> > +		rht_for_each_entry_safe(he, pos, next, tbl, i, node) {
> > +			if (!nft_set_ext_exists(&he->ext, NFT_SET_EXT_TIMEOUT))
> > +				continue;
> > +			timeout = *nft_set_ext_timeout(&he->ext);
> > +			if (time_before(jiffies, timeout))
> > +				continue;
> > +
> > +			rhashtable_remove(&priv->ht, &he->node);
> > +			nft_hash_elem_destroy(set, he);
> > +		}
> > +	}
> > +	mutex_unlock(&priv->ht.mutex);
> 
> What if somebody is currently walking over the table? Shouldn't
> you do an RCU free here instead of immediately destroying the
> element?

Yes, that's what I meant in mail 0/x regarding the existing races.
Probably will add some fixed-size batching here.

^ permalink raw reply	[flat|nested] 21+ messages in thread
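
One way to picture the combination of deferred freeing and fixed-size batching discussed above is the following userspace sketch. It is not the eventual nft_hash code and makes several assumptions. A singly linked list stands in for the hash table, model_gc_release() stands in for whatever would run after an RCU grace period, and every model_* name and the batch size are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_GC_BATCH_SIZE 4	/* arbitrary size chosen for the sketch */

struct model_elem {
	unsigned long expires;
	struct model_elem *next;
};

struct model_gc_batch {
	struct model_elem *elems[MODEL_GC_BATCH_SIZE];
	unsigned int count;
};

/* Wraparound-safe "a is before b", in the spirit of time_before(). */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Unlink expired elements and park them in the batch instead of
 * freeing them.  Stops early when the batch is full.  Returns the
 * number of elements unlinked by this call. */
static unsigned int model_gc_collect(struct model_elem **head,
				     unsigned long now,
				     struct model_gc_batch *batch)
{
	struct model_elem **pp = head;
	unsigned int n = 0;

	while (*pp && batch->count < MODEL_GC_BATCH_SIZE) {
		struct model_elem *e = *pp;

		if (model_time_before(now, e->expires)) {
			pp = &e->next;	/* not expired yet, keep it */
			continue;
		}
		/* Unlink only; e->next stays intact so that a reader
		 * currently standing on e can still move on. */
		*pp = e->next;
		batch->elems[batch->count++] = e;
		n++;
	}
	return n;
}

/* Free a batch.  In the kernel this step would have to wait until no
 * reader can still hold a reference to the unlinked elements. */
static unsigned int model_gc_release(struct model_gc_batch *batch)
{
	unsigned int i, n = batch->count;

	for (i = 0; i < n; i++)
		free(batch->elems[i]);
	batch->count = 0;
	return n;
}
```

Splitting collection from release is the point of the sketch: between the two calls a concurrent walker that already holds a pointer to an unlinked element still sees valid memory, which immediate destruction inside the loop does not guarantee.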

end of thread, other threads:[~2015-01-31 12:16 UTC | newest]

Thread overview: 21+ messages
2015-01-30  7:46 [PATCH 0/9 WIP] nf_tables: set extensions and dynamic updates Patrick McHardy
2015-01-30  7:46 ` [PATCH 1/9] rhashtable: simplify rhashtable_remove() Patrick McHardy
2015-01-30 16:36   ` Thomas Graf
2015-01-30  7:46 ` [PATCH 2/9] nftables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets Patrick McHardy
2015-01-30 17:31   ` Pablo Neira Ayuso
2015-01-30 17:55     ` Patrick McHardy
2015-01-30 18:00       ` Pablo Neira Ayuso
2015-01-30  7:46 ` [PATCH 3/9] nftables: nft_rbtree: fix locking Patrick McHardy
2015-01-30 10:52   ` Pablo Neira Ayuso
2015-01-30  7:46 ` [PATCH 4/9] netfilter: nf_tables: add set extensions Patrick McHardy
2015-01-30  7:46 ` [PATCH 5/9] netfilter: nf_tables: convert hash and rbtree to " Patrick McHardy
2015-01-30  7:46 ` [PATCH 6/9] netfilter: nf_tables: add set timeout support Patrick McHardy
2015-01-30  7:46 ` [PATCH 7/9] netfilter: nft_hash: add support for timeouts Patrick McHardy
2015-01-31  4:29   ` Herbert Xu
2015-01-31 12:16     ` Patrick McHardy
2015-01-30  7:46 ` [PATCH 8/9] netfilter: nft_lookup: add missing attribute validation for NFTA_LOOKUP_SET_ID Patrick McHardy
2015-01-30  7:46 ` [PATCH 9/9] netfilter: nf_tables: add support for dynamic set updates Patrick McHardy
2015-01-30  9:28   ` Herbert Xu
2015-01-30 10:08     ` Patrick McHardy
2015-01-30 10:18       ` Herbert Xu
2015-01-30 11:29         ` Herbert Xu