* [PATCH RFC 0/6] net: ILA resolver and generic resolver backend
@ 2016-09-09 23:19 Tom Herbert
  2016-09-09 23:19 ` [PATCH RFC 1/6] spinlock: Add library function to allocate spinlock buckets array Tom Herbert
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Tom Herbert @ 2016-09-09 23:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, tgraf

This patch set implements an ILA host-side resolver. It uses LWT to
implement the hook to a userspace resolver and tracks pending unresolved
addresses using the backend net resolver.

This patch set contains:

- A new library function to allocate an array of spinlocks for use
  with locking hash buckets.
- Make the hash function in rhashtable directly callable.
- A generic resolver backend infrastructure. This primarily does two
  things: track unresolved addresses and implement a timeout for
  resolutions that do not complete. These mechanisms provide rate
  limiting control over resolution requests (for instance, in ILA they
  are used to rate limit requests to userspace to resolve addresses).
- The ILA resolver. This implements the path from the kernel ILA
  implementation to a userspace daemon, notifying it that an
  identifier address needs to be resolved.

Tom Herbert (6):
  spinlock: Add library function to allocate spinlock buckets array
  rhashtable: Call library function alloc_bucket_locks
  ila: Call library function alloc_bucket_locks
  rhashtable: abstract out function to get hash
  net: Generic resolver backend
  ila: Resolver mechanism

 include/linux/rhashtable.h     |  28 +++--
 include/linux/spinlock.h       |   6 +
 include/net/resolver.h         |  58 +++++++++
 include/uapi/linux/lwtunnel.h  |   1 +
 include/uapi/linux/rtnetlink.h |   5 +
 lib/Makefile                   |   2 +-
 lib/bucket_locks.c             |  63 ++++++++++
 lib/rhashtable.c               |  46 +------
 net/Kconfig                    |   4 +
 net/core/Makefile              |   1 +
 net/core/resolver.c            | 267 +++++++++++++++++++++++++++++++++++++++++
 net/ipv6/Kconfig               |   1 +
 net/ipv6/ila/Makefile          |   2 +-
 net/ipv6/ila/ila.h             |  16 +++
 net/ipv6/ila/ila_common.c      |   7 ++
 net/ipv6/ila/ila_lwt.c         |   9 ++
 net/ipv6/ila/ila_resolver.c    | 192 +++++++++++++++++++++++++++++
 net/ipv6/ila/ila_xlat.c        |  51 ++------
 18 files changed, 666 insertions(+), 93 deletions(-)
 create mode 100644 include/net/resolver.h
 create mode 100644 lib/bucket_locks.c
 create mode 100644 net/core/resolver.c
 create mode 100644 net/ipv6/ila/ila_resolver.c

-- 
2.8.0.rc2

* [PATCH RFC 1/6] spinlock: Add library function to allocate spinlock buckets array
  2016-09-09 23:19 [PATCH RFC 0/6] net: ILA resolver and generic resolver backend Tom Herbert
@ 2016-09-09 23:19 ` Tom Herbert
  2016-09-12 15:17   ` Greg
  2016-09-14  9:27   ` Thomas Graf
  2016-09-09 23:19 ` [PATCH RFC 2/6] rhashtable: Call library function alloc_bucket_locks Tom Herbert
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 14+ messages in thread
From: Tom Herbert @ 2016-09-09 23:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, tgraf

Add two new library functions, alloc_bucket_spinlocks and
free_bucket_spinlocks. These are used to allocate and free an array
of spinlocks that are useful as locks for hash buckets. The interface
specifies the maximum number of spinlocks in the array as well
as a CPU multiplier to derive the number of spinlocks to allocate.
The number to allocate is rounded up to a power of two to make
the array amenable to hash lookup.
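
The sizing rule can be modeled in userspace roughly as follows. This
is a sketch only: nr_cpus stands in for num_possible_cpus(), and the
helper names (roundup_pow2, bucket_locks_size, bucket_lock_index) are
made up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Round v up to the next power of two; v must be in 1..2^31. */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	assert(v != 0 && v <= 0x80000000u);
	while (p < v)
		p <<= 1;
	return p;
}

/* Userspace model of the sizing rule. Returns the number of locks
 * that would be allocated, or 0 if the request is invalid.
 */
static unsigned int bucket_locks_size(unsigned int nr_cpus,
				      unsigned int max_size,
				      unsigned int cpu_mult)
{
	unsigned int size;

	if (cpu_mult) {
		/* Cap the CPU count at 64, then cap the product at max_size */
		if (nr_cpus > 64)
			nr_cpus = 64;
		size = nr_cpus * cpu_mult;
		if (size > max_size)
			size = max_size;
	} else {
		size = max_size;
	}

	if (!size)
		return 0;

	return roundup_pow2(size);
}

/* Pick a lock from a hash value; size must be a power of two. */
static unsigned int bucket_lock_index(unsigned int hash, unsigned int size)
{
	return hash & (size - 1);
}
```

With 8 CPUs, a multiplier of 10 and a maximum of 1024, the model
yields 128 locks (80 rounded up), so a lock is selected with
hash & 127.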

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/spinlock.h |  6 +++++
 lib/Makefile             |  2 +-
 lib/bucket_locks.c       | 63 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100644 lib/bucket_locks.c

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 47dd0ce..4ebdfbf 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -416,4 +416,10 @@ extern int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
 #define atomic_dec_and_lock(atomic, lock) \
 		__cond_lock(lock, _atomic_dec_and_lock(atomic, lock))
 
+int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask,
+			   unsigned int max_size, unsigned int cpu_mult,
+			   gfp_t gfp);
+
+void free_bucket_spinlocks(spinlock_t *locks);
+
 #endif /* __LINUX_SPINLOCK_H */
diff --git a/lib/Makefile b/lib/Makefile
index cfa68eb..a1dedf1 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -37,7 +37,7 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
-	 once.o
+	 once.o bucket_locks.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
diff --git a/lib/bucket_locks.c b/lib/bucket_locks.c
new file mode 100644
index 0000000..bb9bf11
--- /dev/null
+++ b/lib/bucket_locks.c
@@ -0,0 +1,63 @@
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/export.h>
+
+/* Allocate an array of spinlocks to be accessed by a hash. Two arguments
+ * indicate the number of elements to allocate in the array. max_size
+ * gives the maximum number of elements to allocate. cpu_mult gives
+ * the number of locks per CPU to allocate. The size is rounded up
+ * to a power of 2 to be suitable as a hash table.
+ */
+int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *locks_mask,
+			   unsigned int max_size, unsigned int cpu_mult,
+			   gfp_t gfp)
+{
+	unsigned int i, size;
+#if defined(CONFIG_PROVE_LOCKING)
+	unsigned int nr_pcpus = 2;
+#else
+	unsigned int nr_pcpus = num_possible_cpus();
+#endif
+	spinlock_t *tlocks = NULL;
+
+	if (cpu_mult) {
+		nr_pcpus = min_t(unsigned int, nr_pcpus, 64UL);
+		size = min_t(unsigned int, nr_pcpus * cpu_mult, max_size);
+	} else {
+		size = max_size;
+	}
+	size = roundup_pow_of_two(size);
+
+	if (!size)
+		return -EINVAL;
+
+	if (sizeof(spinlock_t) != 0) {
+#ifdef CONFIG_NUMA
+		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+		    gfp == GFP_KERNEL)
+			tlocks = vmalloc(size * sizeof(spinlock_t));
+#endif
+		if (gfp != GFP_KERNEL)
+			gfp |= __GFP_NOWARN | __GFP_NORETRY;
+
+		if (!tlocks)
+			tlocks = kmalloc_array(size, sizeof(spinlock_t), gfp);
+		if (!tlocks)
+			return -ENOMEM;
+		for (i = 0; i < size; i++)
+			spin_lock_init(&tlocks[i]);
+	}
+	*locks = tlocks;
+	*locks_mask = size - 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(alloc_bucket_spinlocks);
+
+void free_bucket_spinlocks(spinlock_t *locks)
+{
+	kvfree(locks);
+}
+EXPORT_SYMBOL(free_bucket_spinlocks);
-- 
2.8.0.rc2

* [PATCH RFC 2/6] rhashtable: Call library function alloc_bucket_locks
  2016-09-09 23:19 [PATCH RFC 0/6] net: ILA resolver and generic resolver backend Tom Herbert
  2016-09-09 23:19 ` [PATCH RFC 1/6] spinlock: Add library function to allocate spinlock buckets array Tom Herbert
@ 2016-09-09 23:19 ` Tom Herbert
  2016-09-14  9:18   ` Thomas Graf
  2016-09-20  1:49   ` Herbert Xu
  2016-09-09 23:19 ` [PATCH RFC 3/6] ila: " Tom Herbert
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 14+ messages in thread
From: Tom Herbert @ 2016-09-09 23:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, tgraf

To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks. This function is
based on the old alloc_bucket_locks in rhashtable and should
produce the same effect.
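
One difference in ordering is worth checking: the old rhashtable code
rounded up first and then capped at half the table size, while the
library function caps first and then rounds up. The userspace sketch
below compares the two orders, assuming a power-of-two table size and
a nonzero locks_mul. All function names here are illustrative only.

```c
#include <assert.h>

/* Round v up to the next power of two; v must be in 1..2^31. */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	assert(v != 0 && v <= 0x80000000u);
	while (p < v)
		p <<= 1;
	return p;
}

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Old order: round up, then cap at 0.5 locks per bucket. */
static unsigned int old_locks_size(unsigned int nr_cpus,
				   unsigned int locks_mul,
				   unsigned int tbl_size)
{
	unsigned int size = roundup_pow2(min_u(nr_cpus, 64) * locks_mul);

	return min_u(size, tbl_size >> 1);
}

/* New order: cap at 0.5 locks per bucket, then round up. */
static unsigned int new_locks_size(unsigned int nr_cpus,
				   unsigned int locks_mul,
				   unsigned int tbl_size)
{
	unsigned int size = min_u(min_u(nr_cpus, 64) * locks_mul,
				  tbl_size >> 1);

	return roundup_pow2(size);
}

/* Returns 1 if both orders agree for every power-of-two table size
 * from 2 to 2^20, otherwise 0.
 */
static int orders_agree(unsigned int nr_cpus, unsigned int locks_mul)
{
	unsigned int tbl_size;

	for (tbl_size = 2; tbl_size <= (1u << 20); tbl_size <<= 1) {
		if (old_locks_size(nr_cpus, locks_mul, tbl_size) !=
		    new_locks_size(nr_cpus, locks_mul, tbl_size))
			return 0;
	}
	return 1;
}
```

The two agree because the cap is itself a power of two: rounding up a
value that is already at or below the cap can never exceed it.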

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 lib/rhashtable.c | 46 ++++------------------------------------------
 1 file changed, 4 insertions(+), 42 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 06c2872..5b53304 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -59,50 +59,10 @@ EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
 #define ASSERT_RHT_MUTEX(HT)
 #endif
 
-
-static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
-			      gfp_t gfp)
-{
-	unsigned int i, size;
-#if defined(CONFIG_PROVE_LOCKING)
-	unsigned int nr_pcpus = 2;
-#else
-	unsigned int nr_pcpus = num_possible_cpus();
-#endif
-
-	nr_pcpus = min_t(unsigned int, nr_pcpus, 64UL);
-	size = roundup_pow_of_two(nr_pcpus * ht->p.locks_mul);
-
-	/* Never allocate more than 0.5 locks per bucket */
-	size = min_t(unsigned int, size, tbl->size >> 1);
-
-	if (sizeof(spinlock_t) != 0) {
-		tbl->locks = NULL;
-#ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
-		    gfp == GFP_KERNEL)
-			tbl->locks = vmalloc(size * sizeof(spinlock_t));
-#endif
-		if (gfp != GFP_KERNEL)
-			gfp |= __GFP_NOWARN | __GFP_NORETRY;
-
-		if (!tbl->locks)
-			tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
-						   gfp);
-		if (!tbl->locks)
-			return -ENOMEM;
-		for (i = 0; i < size; i++)
-			spin_lock_init(&tbl->locks[i]);
-	}
-	tbl->locks_mask = size - 1;
-
-	return 0;
-}
-
 static void bucket_table_free(const struct bucket_table *tbl)
 {
 	if (tbl)
-		kvfree(tbl->locks);
+		free_bucket_spinlocks(tbl->locks);
 
 	kvfree(tbl);
 }
@@ -131,7 +91,9 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 
 	tbl->size = nbuckets;
 
-	if (alloc_bucket_locks(ht, tbl, gfp) < 0) {
+	/* Never allocate more than 0.5 locks per bucket */
+	if (alloc_bucket_spinlocks(&tbl->locks, &tbl->locks_mask,
+				   tbl->size >> 1, ht->p.locks_mul, gfp)) {
 		bucket_table_free(tbl);
 		return NULL;
 	}
-- 
2.8.0.rc2

* [PATCH RFC 3/6] ila: Call library function alloc_bucket_locks
  2016-09-09 23:19 [PATCH RFC 0/6] net: ILA resolver and generic resolver backend Tom Herbert
  2016-09-09 23:19 ` [PATCH RFC 1/6] spinlock: Add library function to allocate spinlock buckets array Tom Herbert
  2016-09-09 23:19 ` [PATCH RFC 2/6] rhashtable: Call library function alloc_bucket_locks Tom Herbert
@ 2016-09-09 23:19 ` Tom Herbert
  2016-09-09 23:19 ` [PATCH RFC 4/6] rhashtable: abstract out function to get hash Tom Herbert
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Tom Herbert @ 2016-09-09 23:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, tgraf

To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv6/ila/ila_xlat.c | 36 +++++-------------------------------
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index e604013..7d1c34b 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -30,34 +30,6 @@ struct ila_net {
 	bool hooks_registered;
 };
 
-#define	LOCKS_PER_CPU 10
-
-static int alloc_ila_locks(struct ila_net *ilan)
-{
-	unsigned int i, size;
-	unsigned int nr_pcpus = num_possible_cpus();
-
-	nr_pcpus = min_t(unsigned int, nr_pcpus, 32UL);
-	size = roundup_pow_of_two(nr_pcpus * LOCKS_PER_CPU);
-
-	if (sizeof(spinlock_t) != 0) {
-#ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE)
-			ilan->locks = vmalloc(size * sizeof(spinlock_t));
-		else
-#endif
-		ilan->locks = kmalloc_array(size, sizeof(spinlock_t),
-					    GFP_KERNEL);
-		if (!ilan->locks)
-			return -ENOMEM;
-		for (i = 0; i < size; i++)
-			spin_lock_init(&ilan->locks[i]);
-	}
-	ilan->locks_mask = size - 1;
-
-	return 0;
-}
-
 static u32 hashrnd __read_mostly;
 static __always_inline void __ila_hash_secret_init(void)
 {
@@ -561,14 +533,16 @@ static const struct genl_ops ila_nl_ops[] = {
 	},
 };
 
-#define ILA_HASH_TABLE_SIZE 1024
+#define LOCKS_PER_CPU 10
+#define MAX_LOCKS 1024
 
 static __net_init int ila_init_net(struct net *net)
 {
 	int err;
 	struct ila_net *ilan = net_generic(net, ila_net_id);
 
-	err = alloc_ila_locks(ilan);
+	err = alloc_bucket_spinlocks(&ilan->locks, &ilan->locks_mask,
+				     MAX_LOCKS, LOCKS_PER_CPU, GFP_KERNEL);
 	if (err)
 		return err;
 
@@ -583,7 +557,7 @@ static __net_exit void ila_exit_net(struct net *net)
 
 	rhashtable_free_and_destroy(&ilan->rhash_table, ila_free_cb, NULL);
 
-	kvfree(ilan->locks);
+	free_bucket_spinlocks(ilan->locks);
 
 	if (ilan->hooks_registered)
 		nf_unregister_net_hooks(net, ila_nf_hook_ops,
-- 
2.8.0.rc2

* [PATCH RFC 4/6] rhashtable: abstract out function to get hash
  2016-09-09 23:19 [PATCH RFC 0/6] net: ILA resolver and generic resolver backend Tom Herbert
                   ` (2 preceding siblings ...)
  2016-09-09 23:19 ` [PATCH RFC 3/6] ila: " Tom Herbert
@ 2016-09-09 23:19 ` Tom Herbert
  2016-09-14  9:23   ` Thomas Graf
  2016-09-09 23:19 ` [PATCH RFC 5/6] net: Generic resolver backend Tom Herbert
  2016-09-09 23:19 ` [PATCH RFC 6/6] ila: Resolver mechanism Tom Herbert
  5 siblings, 1 reply; 14+ messages in thread
From: Tom Herbert @ 2016-09-09 23:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, tgraf

Split out most of rht_key_hashfn, the part that calculates the hash,
into its own function. This way the hash function can be called
separately to get the hash value.
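
The point of the split can be illustrated in userspace: once the raw
hash is available on its own, the same value can index both the bucket
table and a separate lock array. This is a sketch under assumptions:
FNV-1a stands in for jhash, the reserved-bits shift in
rht_bucket_index is omitted, and every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the rhashtable hash: 32-bit FNV-1a mixed with a seed. */
static uint32_t key_get_hash(const void *key, size_t len, uint32_t seed)
{
	const unsigned char *p = key;
	uint32_t h = 2166136261u ^ seed;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Reduce a hash to a bucket index; tbl_size must be a power of two. */
static uint32_t bucket_index(uint32_t hash, uint32_t tbl_size)
{
	return hash & (tbl_size - 1);
}

/* The original combined operation: hash the key, then pick a bucket. */
static uint32_t key_hashfn(const void *key, size_t len, uint32_t seed,
			   uint32_t tbl_size)
{
	return bucket_index(key_get_hash(key, len, seed), tbl_size);
}

/* A second consumer of the raw hash: pick a lock from a lock array. */
static uint32_t lock_index(const void *key, size_t len, uint32_t seed,
			   uint32_t locks_mask)
{
	return key_get_hash(key, len, seed) & locks_mask;
}
```

The bucket table and the lock array can then have different sizes
while still being driven by one hash computation per key.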

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/rhashtable.h | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index fd82584..e398a62 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -208,34 +208,42 @@ static inline unsigned int rht_bucket_index(const struct bucket_table *tbl,
 	return (hash >> RHT_HASH_RESERVED_SPACE) & (tbl->size - 1);
 }
 
-static inline unsigned int rht_key_hashfn(
-	struct rhashtable *ht, const struct bucket_table *tbl,
-	const void *key, const struct rhashtable_params params)
+static inline unsigned int rht_key_get_hash(struct rhashtable *ht,
+	const void *key, const struct rhashtable_params params,
+	unsigned int hash_rnd)
 {
 	unsigned int hash;
 
 	/* params must be equal to ht->p if it isn't constant. */
 	if (!__builtin_constant_p(params.key_len))
-		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);
+		hash = ht->p.hashfn(key, ht->key_len, hash_rnd);
 	else if (params.key_len) {
 		unsigned int key_len = params.key_len;
 
 		if (params.hashfn)
-			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+			hash = params.hashfn(key, key_len, hash_rnd);
 		else if (key_len & (sizeof(u32) - 1))
-			hash = jhash(key, key_len, tbl->hash_rnd);
+			hash = jhash(key, key_len, hash_rnd);
 		else
-			hash = jhash2(key, key_len / sizeof(u32),
-				      tbl->hash_rnd);
+			hash = jhash2(key, key_len / sizeof(u32), hash_rnd);
 	} else {
 		unsigned int key_len = ht->p.key_len;
 
 		if (params.hashfn)
-			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+			hash = params.hashfn(key, key_len, hash_rnd);
 		else
-			hash = jhash(key, key_len, tbl->hash_rnd);
+			hash = jhash(key, key_len, hash_rnd);
 	}
 
+	return hash;
+}
+
+static inline unsigned int rht_key_hashfn(
+	struct rhashtable *ht, const struct bucket_table *tbl,
+	const void *key, const struct rhashtable_params params)
+{
+	unsigned int hash = rht_key_get_hash(ht, key, params, tbl->hash_rnd);
+
 	return rht_bucket_index(tbl, hash);
 }
 
-- 
2.8.0.rc2

* [PATCH RFC 5/6] net: Generic resolver backend
  2016-09-09 23:19 [PATCH RFC 0/6] net: ILA resolver and generic resolver backend Tom Herbert
                   ` (3 preceding siblings ...)
  2016-09-09 23:19 ` [PATCH RFC 4/6] rhashtable: abstract out function to get hash Tom Herbert
@ 2016-09-09 23:19 ` Tom Herbert
  2016-09-14  9:49   ` Thomas Graf
  2016-09-09 23:19 ` [PATCH RFC 6/6] ila: Resolver mechanism Tom Herbert
  5 siblings, 1 reply; 14+ messages in thread
From: Tom Herbert @ 2016-09-09 23:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, tgraf

This patch implements the backend of a resolver. Specifically, it
provides a means to track unresolved addresses and to time them out.

The resolver is mostly a frontend to an rhashtable where the key
of the table is whatever address type or object is tracked. A resolver
instance is created by net_rslv_create. A resolver is destroyed by
net_rslv_destroy.

There are two functions that are used to manipulate entries in the
table: net_rslv_lookup_and_create and net_rslv_resolved.

net_rslv_lookup_and_create is called with an unresolved address as
the argument. It returns a structure of type net_rslv_ent. When
called, a lookup is performed to see if an entry for the address
is already in the table. If it is, the entry is returned and
false is returned in the bool pointer argument (created) to indicate
that the entry was preexisting. If an entry is not found, one is
created and true is returned in the created argument. It is expected
that when an entry is new the address resolution protocol is
initiated (for instance a RTM_ADDR_RESOLVE message may be sent to a
userspace daemon as we will do in ILA). If net_rslv_lookup_and_create
does not return a valid entry (the code below returns an ERR_PTR
value on failure) then presumably the hash table has reached the
limit of number of outstanding unresolved addresses, and the caller
should take appropriate actions to avoid spamming the resolution
protocol.
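
The lookup-or-create contract can be modeled in userspace as below.
This is only a sketch of the semantics: a small fixed array stands in
for the rhashtable, a full table is reported as NULL, and all names
(model_rslv, model_lookup_and_create, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_ENTRIES 4	/* stand-in for max_size */

struct model_ent {
	bool in_use;
	uint64_t key;		/* the unresolved address */
};

struct model_rslv {
	struct model_ent ents[MODEL_MAX_ENTRIES];
};

/* Return the entry for key, creating it if absent. *created tells the
 * caller whether a resolution request should be sent. Returns NULL
 * when the table is full.
 */
static struct model_ent *model_lookup_and_create(struct model_rslv *r,
						 uint64_t key, bool *created)
{
	struct model_ent *free_slot = NULL;
	size_t i;

	*created = false;
	for (i = 0; i < MODEL_MAX_ENTRIES; i++) {
		struct model_ent *e = &r->ents[i];

		if (e->in_use && e->key == key)
			return e;	/* already pending */
		if (!e->in_use && !free_slot)
			free_slot = e;
	}
	if (!free_slot)
		return NULL;		/* limit reached: do not request */

	free_slot->in_use = true;
	free_slot->key = key;
	*created = true;
	return free_slot;
}

/* Resolution finished: drop the pending entry for key, if any. */
static void model_resolved(struct model_rslv *r, uint64_t key)
{
	size_t i;

	for (i = 0; i < MODEL_MAX_ENTRIES; i++)
		if (r->ents[i].in_use && r->ents[i].key == key)
			r->ents[i].in_use = false;
}

/* Count how many requests would reach the resolver for a sample run. */
static int model_demo(void)
{
	static const uint64_t keys[] = { 1, 1, 2, 3, 4, 5 };
	struct model_rslv r;
	bool created;
	int requests = 0;
	size_t i;

	memset(&r, 0, sizeof(r));
	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
		if (model_lookup_and_create(&r, keys[i], &created) && created)
			requests++;

	model_resolved(&r, 1);		/* frees one slot */
	if (model_lookup_and_create(&r, 5, &created) && created)
		requests++;

	return requests;
}
```

In the sample run the duplicate request for key 1 and the first
request for key 5 (table full) generate no resolver traffic; key 5 is
accepted only after key 1 has been resolved.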

net_rslv_resolved is called when resolution is complete (e.g. an
ILA locator mapping was instantiated for a locator). The entry is
removed from the hash table.

An argument to net_rslv_create indicates a timeout for a pending
resolution, in milliseconds. If the timer fires before resolution
then the entry is removed from the table. Subsequently, another
attempt to resolve the same address will result in a new entry in
the table.

net_rslv_lookup_and_create allocates a net_rslv_ent struct, including
space for related user data. This is the object[] field in the
structure. The key (unresolved address) is always the first
field in the object. Following that the caller may add its
own private field data. The key length and size of the user object
(including the key) are specified in net_rslv_create.
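
The object layout can be sketched in userspace with a flexible array
member. The structures and helpers below (model_ent, my_obj,
model_new_ent) are hypothetical stand-ins, not the kernel definitions;
the only rule carried over is that the key sits at offset zero of the
object and the object size is at least the key length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Model of an entry: bookkeeping followed by the caller's object. */
struct model_ent {
	void *bookkeeping;	/* stand-in for hash node, timer, etc. */
	char object[];		/* key first, then caller-private data */
};

/* Hypothetical caller object: an 8-byte key plus private fields. */
struct my_obj {
	unsigned char key[8];	/* must be the first field */
	unsigned int pending_pkts;
};

/* Allocate an entry and copy the key to the start of the object.
 * Returns NULL if the object cannot hold the key or on allocation
 * failure.
 */
static struct model_ent *model_new_ent(const void *key, size_t key_len,
				       size_t obj_size)
{
	struct model_ent *ent;

	if (obj_size < key_len)
		return NULL;

	ent = calloc(1, sizeof(*ent) + obj_size);
	if (!ent)
		return NULL;

	memcpy(ent->object, key, key_len);
	return ent;
}

/* Returns 1 if the key and private data land where expected. */
static int model_layout_demo(void)
{
	const unsigned char key[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	struct model_ent *ent;
	struct my_obj *obj;
	int ok;

	ent = model_new_ent(key, sizeof(key), sizeof(struct my_obj));
	if (!ent)
		return 0;

	/* The caller views object[] as its own structure */
	obj = (struct my_obj *)ent->object;
	obj->pending_pkts = 3;

	ok = offsetof(struct my_obj, key) == 0 &&
	     memcmp(obj->key, key, sizeof(key)) == 0 &&
	     obj->pending_pkts == 3;

	free(ent);
	return ok;
}
```

Because the key is the first field, the table can hash and compare
entries without knowing anything about the caller's private fields.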

There are three callback functions that can be set as arguments in
net_rslv_create:

   - cmp_fn: Compare function for hash table. Arguments are the
       key and an object in the table. If this is NULL then the
       default memcmp of rhashtable is used.

   - init_fn: Initialize a new net_rslv_ent structure. This allows
       initialization of the user portion of the structure
       (the object[]).

   - destroy_fn: Called right before a net_rslv_ent is freed.
       This allows cleanup of user data associated with the
       entry.

Note that the resolver backend only tracks unresolved addresses; it
is up to the caller to perform the mechanism of resolution. This
includes the possibility of queuing packets awaiting resolution; this
can be accomplished, for instance, by maintaining an skbuff queue
in the net_rslv_ent user object[] data.

DoS mitigation is done by limiting the number of entries in the
resolver table (the max_size argument of net_rslv_create)
and setting a timeout. If the timeout is set then the maximum rate
of new resolution requests is max_table_size / timeout. For
instance, with a maximum size of 1000 entries and a timeout of 100
msecs the maximum rate of resolution requests is 10000/s.
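
The bound is simple arithmetic, spelled out here as a small userspace
helper. The function name is made up for illustration; it only
restates max_table_size / timeout with the timeout given in
milliseconds.

```c
#include <assert.h>

/* Upper bound on new resolution requests per second for a table of
 * max_size entries whose entries expire after timeout_ms. Returns 0
 * when no timeout is set, since the formula does not apply then.
 */
static unsigned long max_resolve_rate(unsigned long max_size,
				      unsigned long timeout_ms)
{
	if (!timeout_ms)
		return 0;

	return max_size * 1000UL / timeout_ms;
}
```

For the figures above, max_resolve_rate(1000, 100) gives 10000
requests per second; raising the timeout to one second with the same
table size lowers the bound to 1000 per second.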

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/net/resolver.h |  58 +++++++++++
 net/Kconfig            |   4 +
 net/core/Makefile      |   1 +
 net/core/resolver.c    | 267 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 330 insertions(+)
 create mode 100644 include/net/resolver.h
 create mode 100644 net/core/resolver.c

diff --git a/include/net/resolver.h b/include/net/resolver.h
new file mode 100644
index 0000000..8f73b5c
--- /dev/null
+++ b/include/net/resolver.h
@@ -0,0 +1,58 @@
+#ifndef __NET_RESOLVER_H
+#define __NET_RESOLVER_H
+
+#include <linux/rhashtable.h>
+
+struct net_rslv;
+struct net_rslv_ent;
+
+typedef int (*net_rslv_cmpfn)(struct net_rslv *nrslv, const void *key,
+			      const void *object);
+typedef void (*net_rslv_initfn)(struct net_rslv *nrslv, void *object);
+typedef void (*net_rslv_destroyfn)(struct net_rslv_ent *nrent);
+
+struct net_rslv {
+	struct rhashtable rhash_table;
+	struct rhashtable_params params;
+	net_rslv_cmpfn rslv_cmp;
+	net_rslv_initfn rslv_init;
+	net_rslv_destroyfn rslv_destroy;
+	size_t obj_size;
+	spinlock_t *locks;
+	unsigned int locks_mask;
+	unsigned int hash_rnd;
+	long timeout;
+};
+
+struct net_rslv_ent {
+	struct rcu_head rcu;
+	union {
+		/* Fields set when entry is in hash table */
+		struct {
+			struct rhash_head node;
+			struct delayed_work timeout_work;
+			struct net_rslv *nrslv;
+		};
+
+		/* Fields set when rcu freeing structure */
+		struct {
+			net_rslv_destroyfn destroy;
+		};
+	};
+	char object[];
+};
+
+struct net_rslv *net_rslv_create(size_t size, size_t key_len,
+				 size_t max_size, long timeout,
+				 net_rslv_cmpfn cmp_fn,
+				 net_rslv_initfn init_fn,
+				 net_rslv_destroyfn destroy_fn);
+
+struct net_rslv_ent *net_rslv_lookup_and_create(struct net_rslv *nrslv,
+						void *key, bool *created);
+
+void net_rslv_resolved(struct net_rslv *nrslv, void *key);
+
+void net_rslv_destroy(struct net_rslv *nrslv);
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 7b6cd34..fad4fac 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -255,6 +255,10 @@ config XPS
 	depends on SMP
 	default y
 
+config NET_EXT_RESOLVER
+	bool
+	default n
+
 config HWBM
        bool
 
diff --git a/net/core/Makefile b/net/core/Makefile
index d6508c2..c0a0208 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NET_EXT_RESOLVER) += resolver.o
diff --git a/net/core/resolver.c b/net/core/resolver.c
new file mode 100644
index 0000000..61b36c5
--- /dev/null
+++ b/net/core/resolver.c
@@ -0,0 +1,267 @@
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/ip6_fib.h>
+#include <net/lwtunnel.h>
+#include <net/protocol.h>
+#include <net/resolver.h>
+#include <uapi/linux/ila.h>
+
+static void net_rslv_destroy_rcu(struct rcu_head *head)
+{
+	struct net_rslv_ent *nrent = container_of(head, struct net_rslv_ent,
+						  rcu);
+	if (nrent->destroy) {
+		/* Call user's destroy function just before freeing */
+		nrent->destroy(nrent);
+	}
+
+	kfree(nrent);
+}
+
+static void net_rslv_destroy_entry(struct net_rslv *nrslv,
+				   struct net_rslv_ent *nrent)
+{
+	nrent->destroy = nrslv->rslv_destroy;
+	call_rcu(&nrent->rcu, net_rslv_destroy_rcu);
+}
+
+static inline spinlock_t *net_rslv_get_lock(struct net_rslv *nrslv, void *key)
+{
+	unsigned int hash;
+
+	/* Use the rhashtable hash function */
+	hash = rht_key_get_hash(&nrslv->rhash_table, key, nrslv->params,
+				nrslv->hash_rnd);
+
+	return &nrslv->locks[hash & nrslv->locks_mask];
+}
+
+static void net_rslv_delayed_work(struct work_struct *w)
+{
+	struct delayed_work *delayed_work = to_delayed_work(w);
+	struct net_rslv_ent *nrent = container_of(delayed_work,
+						  struct net_rslv_ent,
+						  timeout_work);
+	struct net_rslv *nrslv = nrent->nrslv;
+	spinlock_t *lock = net_rslv_get_lock(nrslv, nrent->object);
+
+	spin_lock(lock);
+	rhashtable_remove_fast(&nrslv->rhash_table, &nrent->node,
+			       nrslv->params);
+	spin_unlock(lock);
+
+	net_rslv_destroy_entry(nrslv, nrent);
+}
+
+static void net_rslv_ent_free_cb(void *ptr, void *arg)
+{
+	struct net_rslv_ent *nrent = (struct net_rslv_ent *)ptr;
+	struct net_rslv *nrslv = nrent->nrslv;
+
+	net_rslv_destroy_entry(nrslv, nrent);
+}
+
+void net_rslv_resolved(struct net_rslv *nrslv, void *key)
+{
+	spinlock_t *lock = net_rslv_get_lock(nrslv, key);
+	struct net_rslv_ent *nrent;
+
+	rcu_read_lock();
+
+	nrent = rhashtable_lookup_fast(&nrslv->rhash_table, key,
+				       nrslv->params);
+	if (!nrent)
+		goto out;
+
+	/* Cancel timer first */
+	cancel_delayed_work_sync(&nrent->timeout_work);
+
+	spin_lock(lock);
+
+	/* Lookup again just in case someone already removed */
+	nrent = rhashtable_lookup_fast(&nrslv->rhash_table, key,
+				       nrslv->params);
+	if (unlikely(!nrent)) {
+		spin_unlock(lock);
+		goto out;
+	}
+
+	rhashtable_remove_fast(&nrslv->rhash_table, &nrent->node,
+			       nrslv->params);
+	spin_unlock(lock);
+
+	net_rslv_destroy_entry(nrslv, nrent);
+
+out:
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(net_rslv_resolved);
+
+static struct net_rslv_ent *net_rslv_new_ent(struct net_rslv *nrslv,
+					     void *key)
+{
+	struct net_rslv_ent *nrent;
+	int err;
+
+	nrent = kzalloc(sizeof(*nrent) + nrslv->obj_size, GFP_KERNEL);
+	if (!nrent)
+		return ERR_PTR(-EAGAIN);
+
+	/* Key is always at beginning of object data */
+	memcpy(nrent->object, key, nrslv->params.key_len);
+
+	/* Initialize user data */
+	if (nrslv->rslv_init)
+		nrslv->rslv_init(nrslv, nrent);
+
+	/* Put in hash table */
+	err = rhashtable_lookup_insert_fast(&nrslv->rhash_table,
+					    &nrent->node, nrslv->params);
+	if (err)
+		return ERR_PTR(err);
+
+	if (nrslv->timeout) {
+		/* Schedule timeout for resolver */
+		INIT_DELAYED_WORK(&nrent->timeout_work, net_rslv_delayed_work);
+		schedule_delayed_work(&nrent->timeout_work, nrslv->timeout);
+	}
+
+	nrent->nrslv = nrslv;
+
+	return nrent;
+}
+
+struct net_rslv_ent *net_rslv_lookup_and_create(struct net_rslv *nrslv,
+						void *key, bool *created)
+{
+	spinlock_t *lock = net_rslv_get_lock(nrslv, key);
+	struct net_rslv_ent *nrent;
+
+	*created = false;
+	nrent = rhashtable_lookup_fast(&nrslv->rhash_table, key,
+				       nrslv->params);
+	if (nrent)
+		return nrent;
+
+	spin_lock(lock);
+
+	/* Check if someone beat us to the punch */
+	nrent = rhashtable_lookup_fast(&nrslv->rhash_table, key,
+				       nrslv->params);
+	if (nrent) {
+		spin_unlock(lock);
+		return nrent;
+	}
+
+	nrent = net_rslv_new_ent(nrslv, key);
+
+	spin_unlock(lock);
+
+	*created = true;
+
+	return nrent;
+}
+EXPORT_SYMBOL_GPL(net_rslv_lookup_and_create);
+
+static int net_rslv_cmp(struct rhashtable_compare_arg *arg,
+			const void *obj)
+{
+	struct net_rslv *nrslv = container_of(arg->ht, struct net_rslv,
+					      rhash_table);
+
+	return nrslv->rslv_cmp(nrslv, arg->key, obj);
+}
+
+#define LOCKS_PER_CPU	10
+#define MAX_LOCKS 1024
+
+struct net_rslv *net_rslv_create(size_t obj_size, size_t key_len,
+				 size_t max_size, long timeout,
+				 net_rslv_cmpfn cmp_fn,
+				 net_rslv_initfn init_fn,
+				 net_rslv_destroyfn destroy_fn)
+{
+	struct net_rslv *nrslv;
+	int err;
+
+	if (obj_size < key_len)
+		return ERR_PTR(-EINVAL);
+
+	nrslv = kzalloc(sizeof(*nrslv), GFP_KERNEL);
+	if (!nrslv)
+		return ERR_PTR(-ENOMEM);
+
+	err = alloc_bucket_spinlocks(&nrslv->locks, &nrslv->locks_mask,
+				     MAX_LOCKS, LOCKS_PER_CPU, GFP_KERNEL);
+	if (err)
+		return ERR_PTR(err);
+
+	nrslv->obj_size = obj_size;
+	nrslv->rslv_init = init_fn;
+	nrslv->rslv_cmp = cmp_fn;
+	nrslv->rslv_destroy = destroy_fn;
+	nrslv->timeout = msecs_to_jiffies(timeout);
+	get_random_bytes(&nrslv->hash_rnd, sizeof(nrslv->hash_rnd));
+
+	nrslv->params.head_offset = offsetof(struct net_rslv_ent, node);
+	nrslv->params.key_offset = offsetof(struct net_rslv_ent, object);
+	nrslv->params.key_len = key_len;
+	nrslv->params.max_size = max_size;
+	nrslv->params.min_size = 256;
+	nrslv->params.automatic_shrinking = true;
+	nrslv->params.obj_cmpfn = cmp_fn ? net_rslv_cmp : NULL;
+
+	rhashtable_init(&nrslv->rhash_table, &nrslv->params);
+
+	return nrslv;
+}
+EXPORT_SYMBOL_GPL(net_rslv_create);
+
+static void net_rslv_cancel_all_delayed_work(struct net_rslv *nrslv)
+{
+	struct rhashtable_iter iter;
+	struct net_rslv_ent *nrent;
+	int ret;
+
+	ret = rhashtable_walk_init(&nrslv->rhash_table, &iter, GFP_ATOMIC);
+	if (WARN_ON(ret))
+		return;
+
+	ret = rhashtable_walk_start(&iter);
+	if (WARN_ON(ret && ret != -EAGAIN))
+		goto err;
+
+	while ((nrent = rhashtable_walk_next(&iter)))
+		cancel_delayed_work_sync(&nrent->timeout_work);
+
+err:
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+}
+
+void net_rslv_destroy(struct net_rslv *nrslv)
+{
+	/* First cancel delayed work in all the nodes. We don't want
+	 * delayed work trying to remove nodes from the table while
+	 * rhashtable_free_and_destroy is walking.
+	 */
+	net_rslv_cancel_all_delayed_work(nrslv);
+
+	rhashtable_free_and_destroy(&nrslv->rhash_table,
+				    net_rslv_ent_free_cb, NULL);
+
+	free_bucket_spinlocks(nrslv->locks);
+
+	kfree(nrslv);
+}
+EXPORT_SYMBOL_GPL(net_rslv_destroy);
+
-- 
2.8.0.rc2

* [PATCH RFC 6/6] ila: Resolver mechanism
  2016-09-09 23:19 [PATCH RFC 0/6] net: ILA resolver and generic resolver backend Tom Herbert
                   ` (4 preceding siblings ...)
  2016-09-09 23:19 ` [PATCH RFC 5/6] net: Generic resolver backend Tom Herbert
@ 2016-09-09 23:19 ` Tom Herbert
  5 siblings, 0 replies; 14+ messages in thread
From: Tom Herbert @ 2016-09-09 23:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, tgraf

Implement an ILA resolver. This uses LWT to implement the hook to a
userspace resolver and tracks pending unresolved addresses using the
backend net resolver.

The idea is that the kernel sets an ILA resolver route to the
SIR prefix, something like:

ip route add 3333::/64 encap ila-resolve \
     via 2401:db00:20:911a::27:0 dev eth0

When a packet hits the route the address is looked up in a resolver
table. If the entry is created (no entry with the address already
exists) then an rtnl message is generated with group
RTNLGRP_ILA_NOTIFY and type RTM_ADDR_RESOLVE. A userspace
daemon can listen for such messages and perform an ILA resolution
protocol to determine the ILA mapping. If the mapping is resolved
then a /128 ila encap route is set so that the host can perform
ILA translation and send directly to the destination.
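
The listening side of such a daemon might start out like the sketch
below. It uses only the standard netlink socket calls. The message
type value 95 is taken from this RFC and is not in mainline headers,
so it is defined locally under a hypothetical name; the multicast
group is passed in as a parameter because its numeric value depends
on the patched rtnetlink.h. The payload format is not shown here and
is not assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

/* RTM_ADDR_RESOLVE as proposed in this RFC (hypothetical local name) */
#define ILA_RTM_ADDR_RESOLVE 95

/* Open an rtnetlink socket and join one multicast group.
 * Returns the socket descriptor, or -1 on failure.
 */
static int ila_open_notify_socket(unsigned int group)
{
	struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
		       &group, sizeof(group)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Returns 1 if the buffer of len bytes starts with a well-formed
 * netlink header of the address-resolve type, otherwise 0.
 */
static int ila_is_resolve_request(const struct nlmsghdr *nlh, size_t len)
{
	return len >= sizeof(*nlh) &&
	       nlh->nlmsg_len >= sizeof(*nlh) &&
	       nlh->nlmsg_len <= len &&
	       nlh->nlmsg_type == ILA_RTM_ADDR_RESOLVE;
}
```

A daemon would call ila_open_notify_socket() with the value of
RTNLGRP_ILA_NOTIFY from the patched header, recv() into a buffer, and
hand each message that passes ila_is_resolve_request() to its
resolution protocol.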

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/uapi/linux/lwtunnel.h  |   1 +
 include/uapi/linux/rtnetlink.h |   5 ++
 net/ipv6/Kconfig               |   1 +
 net/ipv6/ila/Makefile          |   2 +-
 net/ipv6/ila/ila.h             |  16 ++++
 net/ipv6/ila/ila_common.c      |   7 ++
 net/ipv6/ila/ila_lwt.c         |   9 ++
 net/ipv6/ila/ila_resolver.c    | 192 +++++++++++++++++++++++++++++++++++++++++
 net/ipv6/ila/ila_xlat.c        |  15 ++--
 9 files changed, 239 insertions(+), 9 deletions(-)
 create mode 100644 net/ipv6/ila/ila_resolver.c

diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index a478fe8..d880e49 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -9,6 +9,7 @@ enum lwtunnel_encap_types {
 	LWTUNNEL_ENCAP_IP,
 	LWTUNNEL_ENCAP_ILA,
 	LWTUNNEL_ENCAP_IP6,
+	LWTUNNEL_ENCAP_ILA_NOTIFY,
 	__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 262f037..271215f 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -144,6 +144,9 @@ enum {
 	RTM_GETSTATS = 94,
 #define RTM_GETSTATS RTM_GETSTATS
 
+	RTM_ADDR_RESOLVE = 95,
+#define RTM_ADDR_RESOLVE RTM_ADDR_RESOLVE
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
@@ -656,6 +659,8 @@ enum rtnetlink_groups {
 #define RTNLGRP_MPLS_ROUTE	RTNLGRP_MPLS_ROUTE
 	RTNLGRP_NSID,
 #define RTNLGRP_NSID		RTNLGRP_NSID
+	RTNLGRP_ILA_NOTIFY,
+#define RTNLGRP_ILA_NOTIFY	RTNLGRP_ILA_NOTIFY
 	__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX	(__RTNLGRP_MAX - 1)
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 2343e4f..cf3ea8e 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -97,6 +97,7 @@ config IPV6_ILA
 	tristate "IPv6: Identifier Locator Addressing (ILA)"
 	depends on NETFILTER
 	select LWTUNNEL
+	select NET_EXT_RESOLVER
 	---help---
 	  Support for IPv6 Identifier Locator Addressing (ILA).
 
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
index 4b32e59..f2aadc3 100644
--- a/net/ipv6/ila/Makefile
+++ b/net/ipv6/ila/Makefile
@@ -4,4 +4,4 @@
 
 obj-$(CONFIG_IPV6_ILA) += ila.o
 
-ila-objs := ila_common.o ila_lwt.o ila_xlat.o
+ila-objs := ila_common.o ila_lwt.o ila_xlat.o ila_resolver.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index e0170f6..e369611 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -15,6 +15,7 @@
 #include <linux/ip.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/rhashtable.h>
 #include <linux/socket.h>
 #include <linux/skbuff.h>
 #include <linux/types.h>
@@ -23,6 +24,16 @@
 #include <net/protocol.h>
 #include <uapi/linux/ila.h>
 
+extern unsigned int ila_net_id;
+
+struct ila_net {
+	struct rhashtable rhash_table;
+	spinlock_t *locks; /* Bucket locks for entry manipulation */
+	unsigned int locks_mask;
+	bool hooks_registered;
+	struct net_rslv *nrslv;
+};
+
 struct ila_locator {
 	union {
 		__u8            v8[8];
@@ -114,9 +125,14 @@ void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p,
 
 void ila_init_saved_csum(struct ila_params *p);
 
+void ila_rslv_resolved(struct ila_net *ilan, struct ila_addr *iaddr);
 int ila_lwt_init(void);
 void ila_lwt_fini(void);
 int ila_xlat_init(void);
 void ila_xlat_fini(void);
+int ila_rslv_init(void);
+void ila_rslv_fini(void);
+int ila_init_resolver_net(struct ila_net *ilan);
+void ila_exit_resolver_net(struct ila_net *ilan);
 
 #endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index aba0998..83c7d4a 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -157,7 +157,13 @@ static int __init ila_init(void)
 	if (ret)
 		goto fail_xlat;
 
+	ret = ila_rslv_init();
+	if (ret)
+		goto fail_rslv;
+
 	return 0;
+fail_rslv:
+	ila_xlat_fini();
 fail_xlat:
 	ila_lwt_fini();
 fail_lwt:
@@ -168,6 +174,7 @@ static void __exit ila_fini(void)
 {
 	ila_xlat_fini();
 	ila_lwt_fini();
+	ila_rslv_fini();
 }
 
 module_init(ila_init);
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index e50c27a..02594aa 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -9,6 +9,7 @@
 #include <net/ip.h>
 #include <net/ip6_fib.h>
 #include <net/lwtunnel.h>
+#include <net/netns/generic.h>
 #include <net/protocol.h>
 #include <uapi/linux/ila.h>
 #include "ila.h"
@@ -122,6 +123,14 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla,
 
 	*ts = newts;
 
+	if (cfg6->fc_dst_len >= sizeof(struct ila_addr)) {
+		struct net *net = dev_net(dev);
+		struct ila_net *ilan = net_generic(net, ila_net_id);
+
+		/* Cancel any pending resolution on this address */
+		ila_rslv_resolved(ilan, iaddr);
+	}
+
 	return 0;
 }
 
diff --git a/net/ipv6/ila/ila_resolver.c b/net/ipv6/ila/ila_resolver.c
new file mode 100644
index 0000000..4dd6262
--- /dev/null
+++ b/net/ipv6/ila/ila_resolver.c
@@ -0,0 +1,192 @@
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/ip6_fib.h>
+#include <net/lwtunnel.h>
+#include <net/netns/generic.h>
+#include <net/protocol.h>
+#include <net/resolver.h>
+#include <uapi/linux/ila.h>
+#include "ila.h"
+
+struct ila_notify {
+	int type;
+	struct in6_addr addr;
+};
+
+#define ILA_NOTIFY_SIR_DEST 1
+
+static int ila_fill_notify(struct sk_buff *skb, struct in6_addr *addr,
+			   u32 pid, u32 seq, int event, int flags)
+{
+	struct ila_notify *nila;
+	struct nlmsghdr *nlh;
+
+	nlh = nlmsg_put(skb, pid, seq, event, sizeof(*nila), flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	nila = nlmsg_data(nlh);
+	nila->type = ILA_NOTIFY_SIR_DEST;
+	nila->addr = *addr;
+
+	nlmsg_end(skb, nlh);
+
+	return 0;
+}
+
+void ila_rslv_notify(struct net *net, struct sk_buff *skb)
+{
+	struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	struct sk_buff *nlskb;
+	int err = 0;
+
+	/* Send ILA notification to user */
+	nlskb = nlmsg_new(NLMSG_ALIGN(sizeof(struct ila_notify) +
+			nlmsg_total_size(1)), GFP_KERNEL);
+	if (!nlskb)
+		goto errout;
+
+	err = ila_fill_notify(nlskb, &ip6h->daddr, 0, 0, RTM_ADDR_RESOLVE,
+			      NLM_F_MULTI);
+	if (err < 0) {
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(nlskb);
+		goto errout;
+	}
+	rtnl_notify(nlskb, net, 0, RTNLGRP_ILA_NOTIFY, NULL, GFP_ATOMIC);
+	return;
+
+errout:
+	if (err < 0)
+		rtnl_set_sk_err(net, RTNLGRP_ILA_NOTIFY, err);
+}
+
+static int ila_rslv_output(struct net *net, struct sock *sk,
+			   struct sk_buff *skb)
+{
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+	struct dst_entry *dst = skb_dst(skb);
+	struct net_rslv_ent *nrent;
+	struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	bool new;
+
+	/* Don't bother taking rcu lock, we only want to know if the entry
+	 * exists or not.
+	 */
+	nrent = net_rslv_lookup_and_create(ilan->nrslv, &ip6h->daddr, &new);
+
+	if (nrent && new)
+		ila_rslv_notify(net, skb);
+
+	return dst->lwtstate->orig_output(net, sk, skb);
+}
+
+void ila_rslv_resolved(struct ila_net *ilan, struct ila_addr *iaddr)
+{
+	if (ilan->nrslv)
+		net_rslv_resolved(ilan->nrslv, iaddr);
+}
+
+static int ila_rslv_input(struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	return dst->lwtstate->orig_input(skb);
+}
+
+static int ila_rslv_build_state(struct net_device *dev, struct nlattr *nla,
+				unsigned int family, const void *cfg,
+				struct lwtunnel_state **ts)
+{
+	struct lwtunnel_state *newts;
+	struct ila_net *ilan = net_generic(dev_net(dev), ila_net_id);
+
+	if (unlikely(!ilan->nrslv)) {
+		int err;
+
+		/* Only create net resolver on demand */
+		err = ila_init_resolver_net(ilan);
+		if (err)
+			return err;
+	}
+
+	if (family != AF_INET6)
+		return -EINVAL;
+
+	newts = lwtunnel_state_alloc(0);
+	if (!newts)
+		return -ENOMEM;
+
+	newts->len = 0;
+	newts->type = LWTUNNEL_ENCAP_ILA_NOTIFY;
+	newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
+			LWTUNNEL_STATE_INPUT_REDIRECT;
+
+	*ts = newts;
+
+	return 0;
+}
+
+static int ila_rslv_fill_encap_info(struct sk_buff *skb,
+				    struct lwtunnel_state *lwtstate)
+{
+	return 0;
+}
+
+static int ila_rslv_nlsize(struct lwtunnel_state *lwtstate)
+{
+	return 0;
+}
+
+static int ila_rslv_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
+{
+	return 0;
+}
+
+static const struct lwtunnel_encap_ops ila_rslv_ops = {
+	.build_state = ila_rslv_build_state,
+	.output = ila_rslv_output,
+	.input = ila_rslv_input,
+	.fill_encap = ila_rslv_fill_encap_info,
+	.get_encap_size = ila_rslv_nlsize,
+	.cmp_encap = ila_rslv_cmp,
+};
+
+#define ILA_RESOLVER_TIMEOUT 100
+#define ILA_MAX_SIZE 8192
+
+int ila_init_resolver_net(struct ila_net *ilan)
+{
+	ilan->nrslv = net_rslv_create(sizeof(struct ila_addr),
+				      sizeof(struct ila_addr), ILA_MAX_SIZE,
+				      ILA_RESOLVER_TIMEOUT, NULL, NULL, NULL);
+
+	if (!ilan->nrslv)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void ila_exit_resolver_net(struct ila_net *ilan)
+{
+	if (ilan->nrslv)
+		net_rslv_destroy(ilan->nrslv);
+}
+
+int ila_rslv_init(void)
+{
+	return lwtunnel_encap_add_ops(&ila_rslv_ops, LWTUNNEL_ENCAP_ILA_NOTIFY);
+}
+
+void ila_rslv_fini(void)
+{
+	lwtunnel_encap_del_ops(&ila_rslv_ops, LWTUNNEL_ENCAP_ILA_NOTIFY);
+}
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index 7d1c34b..9fcb041 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -21,14 +21,7 @@ struct ila_map {
 	struct rcu_head rcu;
 };
 
-static unsigned int ila_net_id;
-
-struct ila_net {
-	struct rhashtable rhash_table;
-	spinlock_t *locks; /* Bucket locks for entry manipulation */
-	unsigned int locks_mask;
-	bool hooks_registered;
-};
+unsigned int ila_net_id;
 
 static u32 hashrnd __read_mostly;
 static __always_inline void __ila_hash_secret_init(void)
@@ -546,6 +539,10 @@ static __net_init int ila_init_net(struct net *net)
 	if (err)
 		return err;
 
+	/* Resolver net is created on demand when LWT ILA resolver route
+	 * is made.
+	 */
+
 	rhashtable_init(&ilan->rhash_table, &rht_params);
 
 	return 0;
@@ -557,6 +554,8 @@ static __net_exit void ila_exit_net(struct net *net)
 
 	rhashtable_free_and_destroy(&ilan->rhash_table, ila_free_cb, NULL);
 
+	ila_exit_resolver_net(ilan);
+
 	free_bucket_spinlocks(ilan->locks);
 
 	if (ilan->hooks_registered)
-- 
2.8.0.rc2

* Re: [PATCH RFC 1/6] spinlock: Add library function to allocate spinlock buckets array
  2016-09-09 23:19 ` [PATCH RFC 1/6] spinlock: Add library function to allocate spinlock buckets array Tom Herbert
@ 2016-09-12 15:17   ` Greg
  2016-09-14  9:27   ` Thomas Graf
  1 sibling, 0 replies; 14+ messages in thread
From: Greg @ 2016-09-12 15:17 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, kernel-team, tgraf

On Fri, 2016-09-09 at 16:19 -0700, Tom Herbert wrote:
> Add two new library functions alloc_bucket_spinlocks and
> free_bucket_spinlocks. These are used to allocate and free an array
> of spinlocks that are useful as locks for hash buckets. The interface
> specifies the maximum number of spinlocks in the array as well
> as a CPU multiplier to derive the number of spinlocks to allocate.
> The number to allocate is rounded up to a power of two to make
> the array amenable to hash lookup.
> 
> Signed-off-by: Tom Herbert <tom@herbertland.com>

I like this idea!!

Reviewed-by: Greg Rose <grose@lightfleet.com>

> ---
>  include/linux/spinlock.h |  6 +++++
>  lib/Makefile             |  2 +-
>  lib/bucket_locks.c       | 63 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 70 insertions(+), 1 deletion(-)
>  create mode 100644 lib/bucket_locks.c
> 
> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> index 47dd0ce..4ebdfbf 100644
> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -416,4 +416,10 @@ extern int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
>  #define atomic_dec_and_lock(atomic, lock) \
>  		__cond_lock(lock, _atomic_dec_and_lock(atomic, lock))
>  
> +int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask,
> +			   unsigned int max_size, unsigned int cpu_mult,
> +			   gfp_t gfp);
> +
> +void free_bucket_spinlocks(spinlock_t *locks);
> +
>  #endif /* __LINUX_SPINLOCK_H */
> diff --git a/lib/Makefile b/lib/Makefile
> index cfa68eb..a1dedf1 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -37,7 +37,7 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
>  	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
>  	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
>  	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
> -	 once.o
> +	 once.o bucket_locks.o
>  obj-y += string_helpers.o
>  obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
>  obj-y += hexdump.o
> diff --git a/lib/bucket_locks.c b/lib/bucket_locks.c
> new file mode 100644
> index 0000000..bb9bf11
> --- /dev/null
> +++ b/lib/bucket_locks.c
> @@ -0,0 +1,63 @@
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm.h>
> +#include <linux/export.h>
> +
> +/* Allocate an array of spinlocks to be accessed by a hash. Two arguments
> + * indicate the number of elements to allocate in the array. max_size
> + * gives the maximum number of elements to allocate. cpu_mult gives
> + * the number of locks per CPU to allocate. The size is rounded up
> + * to a power of 2 to be suitable as a hash table.
> + */
> +int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *locks_mask,
> +			   unsigned int max_size, unsigned int cpu_mult,
> +			   gfp_t gfp)
> +{
> +	unsigned int i, size;
> +#if defined(CONFIG_PROVE_LOCKING)
> +	unsigned int nr_pcpus = 2;
> +#else
> +	unsigned int nr_pcpus = num_possible_cpus();
> +#endif
> +	spinlock_t *tlocks = NULL;
> +
> +	if (cpu_mult) {
> +		nr_pcpus = min_t(unsigned int, nr_pcpus, 64UL);
> +		size = min_t(unsigned int, nr_pcpus * cpu_mult, max_size);
> +	} else {
> +		size = max_size;
> +	}
> +	size = roundup_pow_of_two(size);
> +
> +	if (!size)
> +		return -EINVAL;
> +
> +	if (sizeof(spinlock_t) != 0) {
> +#ifdef CONFIG_NUMA
> +		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
> +		    gfp == GFP_KERNEL)
> +			tlocks = vmalloc(size * sizeof(spinlock_t));
> +#endif
> +		if (gfp != GFP_KERNEL)
> +			gfp |= __GFP_NOWARN | __GFP_NORETRY;
> +
> +		if (!tlocks)
> +			tlocks = kmalloc_array(size, sizeof(spinlock_t), gfp);
> +		if (!tlocks)
> +			return -ENOMEM;
> +		for (i = 0; i < size; i++)
> +			spin_lock_init(&tlocks[i]);
> +	}
> +	*locks = tlocks;
> +	*locks_mask = size - 1;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(alloc_bucket_spinlocks);
> +
> +void free_bucket_spinlocks(spinlock_t *locks)
> +{
> +	kvfree(locks);
> +}
> +EXPORT_SYMBOL(free_bucket_spinlocks);
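
The sizing rule in the quoted function can be tried out in userspace. The
following is a minimal sketch, not the kernel code: the ex_ names are
hypothetical, the CPU count is passed in as a parameter instead of coming
from num_possible_cpus(), and the power-of-two helper is a simple stand-in
for the kernel's roundup_pow_of_two().

```c
#include <assert.h>

/* Userspace stand-in for roundup_pow_of_two(); n must be in [1, 2^31]. */
unsigned int ex_roundup_pow_of_two(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1 && n <= 0x80000000u);
	while (p < n)
		p <<= 1;
	return p;
}

/* Number of locks the quoted sizing logic would ask for.
 * Returns 0 where the quoted function would return -EINVAL.
 */
unsigned int ex_bucket_lock_count(unsigned int nr_cpus,
				  unsigned int max_size,
				  unsigned int cpu_mult)
{
	unsigned int size;

	if (cpu_mult) {
		/* The CPU count is capped at 64 before scaling. */
		if (nr_cpus > 64)
			nr_cpus = 64;
		size = nr_cpus * cpu_mult;
		if (size > max_size)
			size = max_size;
	} else {
		size = max_size;
	}

	if (!size)
		return 0;

	/* Rounding happens after clamping to max_size. */
	return ex_roundup_pow_of_two(size);
}
```

With 6 CPUs, cpu_mult 10 and max_size 1024 this gives 64 locks and a mask of
63. Because rounding follows the clamp, a max_size that is not a power of two
can be exceeded: 64 CPUs, cpu_mult 32 and max_size 1000 yields 1024.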

* Re: [PATCH RFC 2/6] rhashtable: Call library function alloc_bucket_locks
  2016-09-09 23:19 ` [PATCH RFC 2/6] rhashtable: Call library function alloc_bucket_locks Tom Herbert
@ 2016-09-14  9:18   ` Thomas Graf
  2016-09-20  1:49   ` Herbert Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Graf @ 2016-09-14  9:18 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, kernel-team

On 09/09/16 at 04:19pm, Tom Herbert wrote:
> To allocate the array of bucket locks for the hash table we now
> call library function alloc_bucket_spinlocks. This function is
> based on the old alloc_bucket_locks in rhashtable and should
> produce the same effect.
> 
> Signed-off-by: Tom Herbert <tom@herbertland.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

* Re: [PATCH RFC 4/6] rhashtable: abstract out function to get hash
  2016-09-09 23:19 ` [PATCH RFC 4/6] rhashtable: abstract out function to get hash Tom Herbert
@ 2016-09-14  9:23   ` Thomas Graf
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Graf @ 2016-09-14  9:23 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, kernel-team

On 09/09/16 at 04:19pm, Tom Herbert wrote:
> Split out most of rht_key_hashfn which is calculating the hash into
> its own function. This way the hash function can be called separately to
> get the hash value.
> 
> Signed-off-by: Tom Herbert <tom@herbertland.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

* Re: [PATCH RFC 1/6] spinlock: Add library function to allocate spinlock buckets array
  2016-09-09 23:19 ` [PATCH RFC 1/6] spinlock: Add library function to allocate spinlock buckets array Tom Herbert
  2016-09-12 15:17   ` Greg
@ 2016-09-14  9:27   ` Thomas Graf
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Graf @ 2016-09-14  9:27 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, kernel-team

On 09/09/16 at 04:19pm, Tom Herbert wrote:
> Add two new library functions alloc_bucket_spinlocks and
> free_bucket_spinlocks. These are used to allocate and free an array
> of spinlocks that are useful as locks for hash buckets. The interface
> specifies the maximum number of spinlocks in the array as well
> as a CPU multiplier to derive the number of spinlocks to allocate.
> The number to allocate is rounded up to a power of two to make
> the array amenable to hash lookup.
> 
> Signed-off-by: Tom Herbert <tom@herbertland.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

* Re: [PATCH RFC 5/6] net: Generic resolver backend
  2016-09-09 23:19 ` [PATCH RFC 5/6] net: Generic resolver backend Tom Herbert
@ 2016-09-14  9:49   ` Thomas Graf
  2016-09-14 19:56     ` Tom Herbert
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Graf @ 2016-09-14  9:49 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, kernel-team

On 09/09/16 at 04:19pm, Tom Herbert wrote:
> diff --git a/net/core/resolver.c b/net/core/resolver.c
> new file mode 100644
> index 0000000..61b36c5
> --- /dev/null
> +++ b/net/core/resolver.c
> @@ -0,0 +1,267 @@
> +#include <linux/errno.h>
> +#include <linux/ip.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/netlink.h>
> +#include <linux/skbuff.h>
> +#include <linux/socket.h>
> +#include <linux/types.h>
> +#include <linux/vmalloc.h>
> +#include <net/checksum.h>
> +#include <net/ip.h>
> +#include <net/ip6_fib.h>
> +#include <net/lwtunnel.h>
> +#include <net/protocol.h>
> +#include <net/resolver.h>
> +#include <uapi/linux/ila.h>

This include list could be stripped down a bit. ila, lwt, fib, ...

> +
> +static struct net_rslv_ent *net_rslv_new_ent(struct net_rslv *nrslv,
> +					     void *key)

Comment above that net_rslv_get_lock() must be held?

> +{
> +	struct net_rslv_ent *nrent;
> +	int err;
> +
> +	nrent = kzalloc(sizeof(*nrent) + nrslv->obj_size, GFP_KERNEL);

GFP_ATOMIC since you typically hold net_rslv_get_lock() spinlock?

> +	if (!nrent)
> +		return ERR_PTR(-EAGAIN);
> +
> +	/* Key is always at beginning of object data */
> +	memcpy(nrent->object, key, nrslv->params.key_len);
> +
> +	/* Initialize user data */
> +	if (nrslv->rslv_init)
> +		nrslv->rslv_init(nrslv, nrent);
> +
> +	/* Put in hash table */
> +	err = rhashtable_lookup_insert_fast(&nrslv->rhash_table,
> +					    &nrent->node, nrslv->params);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	if (nrslv->timeout) {
> +		/* Schedule timeout for resolver */
> +		INIT_DELAYED_WORK(&nrent->timeout_work, net_rslv_delayed_work);

Should this be done before inserting into rhashtable?

> +		schedule_delayed_work(&nrent->timeout_work, nrslv->timeout);
> +	}
> +
> +	nrent->nrslv = nrslv;

Same here.  net_rslv_cancel_all_delayed_work() walking the rhashtable could
see ->nrslv as NULL.
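
The ordering concern in the two comments above is the usual "initialize
fully, then publish" rule. Below is a userspace analogy using C11 atomics,
not the kernel code: all ex_ names are hypothetical, a single atomic pointer
stands in for the rhashtable, and two plain fields stand in for the
delayed-work setup and the ->nrslv back pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct ex_owner {
	int id;
};

struct ex_entry {
	struct ex_owner *owner;	/* plays the role of nrent->nrslv */
	int timer_ready;	/* plays the role of INIT_DELAYED_WORK() */
};

/* One-slot "table": readers find entries only through this pointer. */
static _Atomic(struct ex_entry *) ex_slot;

/* Set every field first, then make the entry visible. The release
 * store orders the field writes before the pointer becomes visible.
 */
void ex_publish(struct ex_entry *e, struct ex_owner *owner)
{
	assert(e != NULL && owner != NULL);

	e->owner = owner;
	e->timer_ready = 1;
	atomic_store_explicit(&ex_slot, e, memory_order_release);
}

/* A walker that finds an entry can rely on its fields being set. */
struct ex_entry *ex_lookup(void)
{
	return atomic_load_explicit(&ex_slot, memory_order_acquire);
}
```

If the two field assignments came after the store, a concurrent walker could
observe an entry whose owner is still NULL, which is the situation described
for net_rslv_cancel_all_delayed_work().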

* Re: [PATCH RFC 5/6] net: Generic resolver backend
  2016-09-14  9:49   ` Thomas Graf
@ 2016-09-14 19:56     ` Tom Herbert
  0 siblings, 0 replies; 14+ messages in thread
From: Tom Herbert @ 2016-09-14 19:56 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team

On Wed, Sep 14, 2016 at 2:49 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/09/16 at 04:19pm, Tom Herbert wrote:
>> diff --git a/net/core/resolver.c b/net/core/resolver.c
>> new file mode 100644
>> index 0000000..61b36c5
>> --- /dev/null
>> +++ b/net/core/resolver.c
>> @@ -0,0 +1,267 @@
>> +#include <linux/errno.h>
>> +#include <linux/ip.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/netlink.h>
>> +#include <linux/skbuff.h>
>> +#include <linux/socket.h>
>> +#include <linux/types.h>
>> +#include <linux/vmalloc.h>
>> +#include <net/checksum.h>
>> +#include <net/ip.h>
>> +#include <net/ip6_fib.h>
>> +#include <net/lwtunnel.h>
>> +#include <net/protocol.h>
>> +#include <net/resolver.h>
>> +#include <uapi/linux/ila.h>
>
> This include list could be stripped down a bit. ila, lwt, fib, ...
>
>> +
>> +static struct net_rslv_ent *net_rslv_new_ent(struct net_rslv *nrslv,
>> +                                          void *key)
>
> Comment above that net_rslv_get_lock() must be held?
>
>> +{
>> +     struct net_rslv_ent *nrent;
>> +     int err;
>> +
>> +     nrent = kzalloc(sizeof(*nrent) + nrslv->obj_size, GFP_KERNEL);
>
> GFP_ATOMIC since you typically hold net_rslv_get_lock() spinlock?
>
>> +     if (!nrent)
>> +             return ERR_PTR(-EAGAIN);
>> +
>> +     /* Key is always at beginning of object data */
>> +     memcpy(nrent->object, key, nrslv->params.key_len);
>> +
>> +     /* Initialize user data */
>> +     if (nrslv->rslv_init)
>> +             nrslv->rslv_init(nrslv, nrent);
>> +
>> +     /* Put in hash table */
>> +     err = rhashtable_lookup_insert_fast(&nrslv->rhash_table,
>> +                                         &nrent->node, nrslv->params);
>> +     if (err)
>> +             return ERR_PTR(err);
>> +
>> +     if (nrslv->timeout) {
>> +             /* Schedule timeout for resolver */
>> +             INIT_DELAYED_WORK(&nrent->timeout_work, net_rslv_delayed_work);
>
> Should this be done before inserting into rhashtable?
>
Adding to the table and setting delayed work are done under a lock so
I think it should be okay. I'll add a comment to the function that the
lock is held.

>> +             schedule_delayed_work(&nrent->timeout_work, nrslv->timeout);
>> +     }
>> +
>> +     nrent->nrslv = nrslv;
>
> Same here.  net_rslv_cancel_all_delayed_work() walking the rhashtable could
> see ->nrslv as NULL.

I'll move it up, but net_rslv_cancel_all_delayed_work is only called
when we're destroying the table so it would be a bug in the higher
layer if it is both destroying the table and adding entries at the
same time.

Thanks,
Tom


>

* Re: [PATCH RFC 2/6] rhashtable: Call library function alloc_bucket_locks
  2016-09-09 23:19 ` [PATCH RFC 2/6] rhashtable: Call library function alloc_bucket_locks Tom Herbert
  2016-09-14  9:18   ` Thomas Graf
@ 2016-09-20  1:49   ` Herbert Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Herbert Xu @ 2016-09-20  1:49 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, kernel-team, tgraf

Tom Herbert <tom@herbertland.com> wrote:
> To allocate the array of bucket locks for the hash table we now
> call library function alloc_bucket_spinlocks. This function is
> based on the old alloc_bucket_locks in rhashtable and should
> produce the same effect.
> 
> Signed-off-by: Tom Herbert <tom@herbertland.com>

This conflicts with the work I'm doing to fix the resize ENOMEM
issue.  I'll be making the hashtable as well as the spinlock table
nested, in which case you must not directly dereference it as an
array.

If you're just trying to share the spinlocks for another purpose,
what we can do is provide a helper function to return the right
lock for a given key/object.
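
One possible shape for such a helper is sketched below in plain userspace C.
This is an illustration only: the ex_ names are hypothetical, the lock type
is a placeholder rather than spinlock_t, and the flat array inside the table
is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder lock type; a kernel version would use spinlock_t. */
struct ex_lock {
	int held;
};

struct ex_lock_table {
	struct ex_lock *locks;
	unsigned int mask;	/* lock count minus one, count is a power of two */
};

/* Callers ask for "the lock for this hash" and never index the
 * array themselves, so the storage layout stays private to the table.
 */
struct ex_lock *ex_bucket_lock(struct ex_lock_table *t, unsigned int hash)
{
	assert(t != NULL && t->locks != NULL);
	return &t->locks[hash & t->mask];
}
```

Since callers only go through ex_bucket_lock(), the body could later walk a
nested table instead of indexing a flat array without touching any call site.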

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Thread overview: 14+ messages
2016-09-09 23:19 [PATCH RFC 0/6] net: ILA resolver and generic resolver backend Tom Herbert
2016-09-09 23:19 ` [PATCH RFC 1/6] spinlock: Add library function to allocate spinlock buckets array Tom Herbert
2016-09-12 15:17   ` Greg
2016-09-14  9:27   ` Thomas Graf
2016-09-09 23:19 ` [PATCH RFC 2/6] rhashtable: Call library function alloc_bucket_locks Tom Herbert
2016-09-14  9:18   ` Thomas Graf
2016-09-20  1:49   ` Herbert Xu
2016-09-09 23:19 ` [PATCH RFC 3/6] ila: " Tom Herbert
2016-09-09 23:19 ` [PATCH RFC 4/6] rhashtable: abstract out function to get hash Tom Herbert
2016-09-14  9:23   ` Thomas Graf
2016-09-09 23:19 ` [PATCH RFC 5/6] net: Generic resolver backend Tom Herbert
2016-09-14  9:49   ` Thomas Graf
2016-09-14 19:56     ` Tom Herbert
2016-09-09 23:19 ` [PATCH RFC 6/6] ila: Resolver mechanism Tom Herbert