* [PATCH ipsec-next 0/2] xfrm: scalability enhancements for policy database
@ 2014-05-12 13:45 Christophe Gouault
  2014-05-12 13:45 ` [PATCH ipsec-next 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-05-12 13:45 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev

This patchset makes it possible to hash more policies than just
non-prefixed ones: policies whose prefix lengths are greater than or
equal to configurable thresholds are hashed too.

These thresholds are configured via /proc entries:

/proc/sys/net/ipv4/xfrm4_policy_hash_thresh: 32 32
/proc/sys/net/ipv6/xfrm6_policy_hash_thresh: 128 128

Christophe Gouault (2):
      xfrm: hash prefixed policies based on preflen thresholds
      xfrm: configure policy hash table thresholds by /proc

 include/net/netns/xfrm.h |   8 +++
 include/net/xfrm.h       |   1 +
 net/ipv4/xfrm4_policy.c  |  67 ++++++++++++++++++++++
 net/ipv6/xfrm6_policy.c  |  67 ++++++++++++++++++++++
 net/xfrm/xfrm_hash.h     |  76 +++++++++++++++++++++----
 net/xfrm/xfrm_policy.c   | 142 +++++++++++++++++++++++++++++++++++++++++++++--
 net/xfrm/xfrm_sysctl.c   |   3 +
 7 files changed, 348 insertions(+), 16 deletions(-)

Best Regards,
Christophe


* [PATCH ipsec-next 1/2] xfrm: hash prefixed policies based on preflen thresholds
  2014-05-12 13:45 [PATCH ipsec-next 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
@ 2014-05-12 13:45 ` Christophe Gouault
  2014-05-12 13:45 ` [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc Christophe Gouault
  2014-08-01  9:12 ` [PATCH net-next v2 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  2 siblings, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-05-12 13:45 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev, Christophe Gouault

The idea is an extension of the current policy hashing.

Today only non-prefixed policies are stored in a hash table. This
patch relaxes that constraint and hashes policies whose prefix
lengths are greater than or equal to a configurable threshold.

Each hash table (one per direction) maintains its own set of IPv4 and
IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32,
128, 128).

For example, if the output hash table is configured with values (16, 24,
56, 64):

ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed

ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ...    => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ...  => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ...    => unhashed

The high order bits of the addresses (up to the threshold) are used to
compute the hash key.
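
The rule above can be illustrated with a minimal standalone sketch. It is not the kernel code: bits2mask32() mirrors __bits2mask32() from this patch, while policy_is_hashed() and self_check() are hypothetical helpers introduced only for illustration. The checks assume the example tuple (16, 24, 56, 64) is read source-first (sbits4 = 16, dbits4 = 24), which is the reading under which the outcomes listed above hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Keep only the `bits` high-order bits of a host-order IPv4 address. */
static inline uint32_t bits2mask32(uint8_t bits)
{
	if (bits == 0)
		return 0;
	if (bits >= 32)
		return 0xffffffffu;
	return (uint32_t)(0xffffffffu << (32 - bits));
}

/* Hypothetical helper: a policy goes into the hash table only when both
 * selector prefix lengths reach their thresholds; otherwise it stays on
 * the inexact list.
 */
static inline bool policy_is_hashed(uint8_t prefixlen_d, uint8_t prefixlen_s,
				    uint8_t dbits, uint8_t sbits)
{
	return prefixlen_d >= dbits && prefixlen_s >= sbits;
}

/* IPv4 examples from the commit message, assuming sbits4 = 16, dbits4 = 24. */
static inline void self_check(void)
{
	assert(policy_is_hashed(24, 20, 24, 16));	/* src /20, dst /24 */
	assert(policy_is_hashed(32, 16, 24, 16));	/* src /16, dst /32 */
	assert(!policy_is_hashed(16, 16, 24, 16));	/* src /16, dst /16 */
}
```

Only the masked bits feed the hash key, so with a destination threshold of 24 the addresses 10.24.1.0 and 10.24.1.77 contribute the same value (0x0a180100) to it.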

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
 include/net/netns/xfrm.h |  4 +++
 net/xfrm/xfrm_hash.h     | 76 +++++++++++++++++++++++++++++++++++++++++-------
 net/xfrm/xfrm_policy.c   | 53 +++++++++++++++++++++++++++++----
 3 files changed, 117 insertions(+), 16 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 3492434..41902a8 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -13,6 +13,10 @@ struct ctl_table_header;
 struct xfrm_policy_hash {
 	struct hlist_head	*table;
 	unsigned int		hmask;
+	u8			dbits4;
+	u8			sbits4;
+	u8			dbits6;
+	u8			sbits6;
 };
 
 struct netns_xfrm {
diff --git a/net/xfrm/xfrm_hash.h b/net/xfrm/xfrm_hash.h
index 0622d31..666c5ff 100644
--- a/net/xfrm/xfrm_hash.h
+++ b/net/xfrm/xfrm_hash.h
@@ -3,6 +3,7 @@
 
 #include <linux/xfrm.h>
 #include <linux/socket.h>
+#include <linux/jhash.h>
 
 static inline unsigned int __xfrm4_addr_hash(const xfrm_address_t *addr)
 {
@@ -28,6 +29,58 @@ static inline unsigned int __xfrm6_daddr_saddr_hash(const xfrm_address_t *daddr,
 		     saddr->a6[2] ^ saddr->a6[3]);
 }
 
+static inline u32 __bits2mask32(__u8 bits)
+{
+	u32 mask32 = 0xffffffff;
+
+	if (bits == 0)
+		mask32 = 0;
+	else if (bits < 32)
+		mask32 <<= (32 - bits);
+
+	return mask32;
+}
+
+static inline unsigned int __xfrm4_dpref_spref_hash(const xfrm_address_t *daddr,
+						    const xfrm_address_t *saddr,
+						    __u8 dbits,
+						    __u8 sbits)
+{
+	return jhash_2words(ntohl(daddr->a4) & __bits2mask32(dbits),
+			    ntohl(saddr->a4) & __bits2mask32(sbits),
+			    0);
+}
+
+static inline unsigned int __xfrm6_pref_hash(const xfrm_address_t *addr,
+					     __u8 prefixlen)
+{
+	int pdw;
+	int pbi;
+	u32 initval = 0;
+
+	pdw = prefixlen >> 5;     /* num of whole u32 in prefix */
+	pbi = prefixlen &  0x1f;  /* num of bits in incomplete u32 in prefix */
+
+	if (pbi) {
+		__be32 mask;
+
+		mask = htonl((0xffffffff) << (32 - pbi));
+
+		initval = (__force u32)(addr->a6[pdw] & mask);
+	}
+
+	return jhash2((__force u32 *)addr->a6, pdw, initval);
+}
+
+static inline unsigned int __xfrm6_dpref_spref_hash(const xfrm_address_t *daddr,
+						    const xfrm_address_t *saddr,
+						    __u8 dbits,
+						    __u8 sbits)
+{
+	return __xfrm6_pref_hash(daddr, dbits) ^
+	       __xfrm6_pref_hash(saddr, sbits);
+}
+
 static inline unsigned int __xfrm_dst_hash(const xfrm_address_t *daddr,
 					   const xfrm_address_t *saddr,
 					   u32 reqid, unsigned short family,
@@ -84,7 +137,8 @@ static inline unsigned int __idx_hash(u32 index, unsigned int hmask)
 }
 
 static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
-				      unsigned short family, unsigned int hmask)
+				      unsigned short family, unsigned int hmask,
+				      u8 dbits, u8 sbits)
 {
 	const xfrm_address_t *daddr = &sel->daddr;
 	const xfrm_address_t *saddr = &sel->saddr;
@@ -92,19 +146,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
 
 	switch (family) {
 	case AF_INET:
-		if (sel->prefixlen_d != 32 ||
-		    sel->prefixlen_s != 32)
+		if (sel->prefixlen_d < dbits ||
+		    sel->prefixlen_s < sbits)
 			return hmask + 1;
 
-		h = __xfrm4_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 
 	case AF_INET6:
-		if (sel->prefixlen_d != 128 ||
-		    sel->prefixlen_s != 128)
+		if (sel->prefixlen_d < dbits ||
+		    sel->prefixlen_s < sbits)
 			return hmask + 1;
 
-		h = __xfrm6_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 	}
 	h ^= (h >> 16);
@@ -113,17 +167,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
 
 static inline unsigned int __addr_hash(const xfrm_address_t *daddr,
 				       const xfrm_address_t *saddr,
-				       unsigned short family, unsigned int hmask)
+				       unsigned short family,
+				       unsigned int hmask,
+				       u8 dbits, u8 sbits)
 {
 	unsigned int h = 0;
 
 	switch (family) {
 	case AF_INET:
-		h = __xfrm4_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 
 	case AF_INET6:
-		h = __xfrm6_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 	}
 	h ^= (h >> 16);
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 375267d..d65e254 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -344,12 +344,39 @@ static inline unsigned int idx_hash(struct net *net, u32 index)
 	return __idx_hash(index, net->xfrm.policy_idx_hmask);
 }
 
+/* calculate policy hash thresholds */
+static void __get_hash_thresh(struct net *net,
+				   unsigned short family, int dir,
+				   u8 *dbits, u8 *sbits)
+{
+	switch (family) {
+	case AF_INET:
+		*dbits = net->xfrm.policy_bydst[dir].dbits4;
+		*sbits = net->xfrm.policy_bydst[dir].sbits4;
+		break;
+
+	case AF_INET6:
+		*dbits = net->xfrm.policy_bydst[dir].dbits6;
+		*sbits = net->xfrm.policy_bydst[dir].sbits6;
+		break;
+
+	default:
+		*dbits = 0;
+		*sbits = 0;
+	}
+}
+
 static struct hlist_head *policy_hash_bysel(struct net *net,
 					    const struct xfrm_selector *sel,
 					    unsigned short family, int dir)
 {
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
-	unsigned int hash = __sel_hash(sel, family, hmask);
+	unsigned int hash;
+	u8 dbits;
+	u8 sbits;
+
+	__get_hash_thresh(net, family, dir, &dbits, &sbits);
+	hash = __sel_hash(sel, family, hmask, dbits, sbits);
 
 	return (hash == hmask + 1 ?
 		&net->xfrm.policy_inexact[dir] :
@@ -362,25 +389,35 @@ static struct hlist_head *policy_hash_direct(struct net *net,
 					     unsigned short family, int dir)
 {
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
-	unsigned int hash = __addr_hash(daddr, saddr, family, hmask);
+	unsigned int hash;
+	u8 dbits;
+	u8 sbits;
+
+	__get_hash_thresh(net, family, dir, &dbits, &sbits);
+	hash = __addr_hash(daddr, saddr, family, hmask, dbits, sbits);
 
 	return net->xfrm.policy_bydst[dir].table + hash;
 }
 
-static void xfrm_dst_hash_transfer(struct hlist_head *list,
+static void xfrm_dst_hash_transfer(struct net *net,
+				   struct hlist_head *list,
 				   struct hlist_head *ndsttable,
-				   unsigned int nhashmask)
+				   unsigned int nhashmask,
+				   int dir)
 {
 	struct hlist_node *tmp, *entry0 = NULL;
 	struct xfrm_policy *pol;
 	unsigned int h0 = 0;
+	u8 dbits;
+	u8 sbits;
 
 redo:
 	hlist_for_each_entry_safe(pol, tmp, list, bydst) {
 		unsigned int h;
 
+		__get_hash_thresh(net, pol->family, dir, &dbits, &sbits);
 		h = __addr_hash(&pol->selector.daddr, &pol->selector.saddr,
-				pol->family, nhashmask);
+				pol->family, nhashmask, dbits, sbits);
 		if (!entry0) {
 			hlist_del(&pol->bydst);
 			hlist_add_head(&pol->bydst, ndsttable+h);
@@ -434,7 +471,7 @@ static void xfrm_bydst_resize(struct net *net, int dir)
 	write_lock_bh(&net->xfrm.xfrm_policy_lock);
 
 	for (i = hmask; i >= 0; i--)
-		xfrm_dst_hash_transfer(odst + i, ndst, nhashmask);
+		xfrm_dst_hash_transfer(net, odst + i, ndst, nhashmask, dir);
 
 	net->xfrm.policy_bydst[dir].table = ndst;
 	net->xfrm.policy_bydst[dir].hmask = nhashmask;
@@ -2830,6 +2867,10 @@ static int __net_init xfrm_policy_init(struct net *net)
 		if (!htab->table)
 			goto out_bydst;
 		htab->hmask = hmask;
+		htab->dbits4 = 32;
+		htab->sbits4 = 32;
+		htab->dbits6 = 128;
+		htab->sbits6 = 128;
 	}
 
 	INIT_LIST_HEAD(&net->xfrm.policy_all);
-- 
1.9.1


* [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc
  2014-05-12 13:45 [PATCH ipsec-next 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  2014-05-12 13:45 ` [PATCH ipsec-next 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
@ 2014-05-12 13:45 ` Christophe Gouault
  2014-05-15  8:34   ` Steffen Klassert
  2014-08-01  9:12 ` [PATCH net-next v2 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  2 siblings, 1 reply; 27+ messages in thread
From: Christophe Gouault @ 2014-05-12 13:45 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev, Christophe Gouault

Allow the local and remote prefix length thresholds of the policy
hash table to be set via /proc entries. Example:

echo 0 24 > /proc/sys/net/ipv4/xfrm4_policy_hash_thresh
echo 0 56 > /proc/sys/net/ipv6/xfrm6_policy_hash_thresh

The numbers are the minimum selector prefix lengths a policy needs in
order to be placed in the hash table.

The first number is the local threshold (source address for out
policies, destination address for in and fwd policies).

The second number is the remote threshold (destination address for out
policies, source address for in and fwd policies).
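
A minimal standalone sketch of this local/remote mapping follows. It mirrors the __src_side()/__dst_side() helpers in the diff below but is not the kernel code: thresh_for_dir(), struct thresh_pair and the shortened POLICY_* names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Direction values as in the kernel's uapi <linux/xfrm.h>, redefined
 * locally under shortened (hypothetical) names so the sketch stands alone.
 */
enum { POLICY_IN = 0, POLICY_OUT = 1, POLICY_FWD = 2, POLICY_MASK = 3 };

struct thresh_pair {
	uint8_t dbits;	/* threshold applied to the destination prefix */
	uint8_t sbits;	/* threshold applied to the source prefix */
};

/* thresh[0] is the local threshold, thresh[1] the remote one. For "out"
 * policies the source address is the local side; for "in" and "fwd"
 * policies the destination address is the local side.
 */
static inline struct thresh_pair thresh_for_dir(const uint8_t thresh[2], int dir)
{
	int out = (dir & POLICY_MASK) == POLICY_OUT;
	struct thresh_pair p;

	p.sbits = thresh[out ? 0 : 1];
	p.dbits = thresh[out ? 1 : 0];
	return p;
}

/* With "echo 0 24": local threshold 0, remote threshold 24. */
static inline void self_check(void)
{
	const uint8_t thresh4[2] = { 0, 24 };

	assert(thresh_for_dir(thresh4, POLICY_OUT).sbits == 0);
	assert(thresh_for_dir(thresh4, POLICY_OUT).dbits == 24);
	assert(thresh_for_dir(thresh4, POLICY_IN).sbits == 24);
	assert(thresh_for_dir(thresh4, POLICY_IN).dbits == 0);
}
```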

The default values are:

/proc/sys/net/ipv4/xfrm4_policy_hash_thresh: 32 32
/proc/sys/net/ipv6/xfrm6_policy_hash_thresh: 128 128

Dynamic re-building of the SPD is performed when the /proc values
are changed.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
 include/net/netns/xfrm.h |  4 +++
 include/net/xfrm.h       |  1 +
 net/ipv4/xfrm4_policy.c  | 67 ++++++++++++++++++++++++++++++++++++
 net/ipv6/xfrm6_policy.c  | 67 ++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_policy.c   | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_sysctl.c   |  3 ++
 6 files changed, 231 insertions(+)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 41902a8..0a23d02 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -45,6 +45,7 @@ struct netns_xfrm {
 	struct xfrm_policy_hash	policy_bydst[XFRM_POLICY_MAX * 2];
 	unsigned int		policy_count[XFRM_POLICY_MAX * 2];
 	struct work_struct	policy_hash_work;
+	struct work_struct	policy_hash_thresh_work;
 
 
 	struct sock		*nlsk;
@@ -54,6 +55,9 @@ struct netns_xfrm {
 	u32			sysctl_aevent_rseqth;
 	int			sysctl_larval_drop;
 	u32			sysctl_acq_expires;
+	u8			sysctl_xfrm4_policy_hash_thresh[2];
+	u8			sysctl_xfrm6_policy_hash_thresh[2];
+	seqlock_t		sysctl_policy_hash_thresh_lock;
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*sysctl_hdr;
 #endif
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 721e9c3..dc4865e 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1591,6 +1591,7 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark,
 struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8, int dir,
 				     u32 id, int delete, int *err);
 int xfrm_policy_flush(struct net *net, u8 type, bool task_valid);
+void xfrm_policy_hash_rebuild(struct net *net);
 u32 xfrm_get_acqseq(void);
 int verify_spi_info(u8 proto, u32 min, u32 max);
 int xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 6156f68..4b7b29d 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -256,6 +256,61 @@ static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
 };
 
 #ifdef CONFIG_SYSCTL
+static int xfrm4_policy_hash_thresh_min[] = { 0, 0 };
+static int xfrm4_policy_hash_thresh_max[] = { 32, 32 };
+
+/* Read xfrm4 policy hash table thresholds */
+static void get_xfrm4_policy_hash_thresh(struct net *net, int thresh[2])
+{
+	unsigned seq;
+
+	do {
+		seq = read_seqbegin(&net->xfrm.sysctl_policy_hash_thresh_lock);
+
+		thresh[0] = net->xfrm.sysctl_xfrm4_policy_hash_thresh[0];
+		thresh[1] = net->xfrm.sysctl_xfrm4_policy_hash_thresh[1];
+	} while (read_seqretry(&net->xfrm.sysctl_policy_hash_thresh_lock, seq));
+}
+
+/* Update xfrm4 policy hash table thresholds */
+static void set_xfrm4_policy_hash_thresh(struct net *net, int thresh[2])
+{
+	write_seqlock(&net->xfrm.sysctl_policy_hash_thresh_lock);
+	net->xfrm.sysctl_xfrm4_policy_hash_thresh[0] = thresh[0];
+	net->xfrm.sysctl_xfrm4_policy_hash_thresh[1] = thresh[1];
+	write_sequnlock(&net->xfrm.sysctl_policy_hash_thresh_lock);
+
+	xfrm_policy_hash_rebuild(net);
+}
+
+/* Validate changes from /proc interface. */
+static int xfrm4_policy_hash_thresh(struct ctl_table *table, int write,
+				 void __user *buffer,
+				 size_t *lenp, loff_t *ppos)
+{
+	struct net *net =
+		container_of(table->data, struct net,
+			     xfrm.sysctl_xfrm4_policy_hash_thresh);
+	int ret;
+	int thresh[2];
+	struct ctl_table tmp = {
+		.data = &thresh,
+		.maxlen = sizeof(thresh),
+		.mode = table->mode,
+		.extra1 = &xfrm4_policy_hash_thresh_min,
+		.extra2 = &xfrm4_policy_hash_thresh_max,
+	};
+
+	get_xfrm4_policy_hash_thresh(net, thresh);
+
+	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+
+	if (write && ret == 0)
+		set_xfrm4_policy_hash_thresh(net, thresh);
+
+	return ret;
+}
+
 static struct ctl_table xfrm4_policy_table[] = {
 	{
 		.procname       = "xfrm4_gc_thresh",
@@ -264,6 +319,13 @@ static struct ctl_table xfrm4_policy_table[] = {
 		.mode           = 0644,
 		.proc_handler   = proc_dointvec,
 	},
+	{
+		.procname       = "xfrm4_policy_hash_thresh",
+		.data           = &init_net.xfrm.sysctl_xfrm4_policy_hash_thresh,
+		.maxlen         = sizeof(init_net.xfrm.sysctl_xfrm4_policy_hash_thresh),
+		.mode           = 0644,
+		.proc_handler   = xfrm4_policy_hash_thresh,
+	},
 	{ }
 };
 
@@ -279,8 +341,13 @@ static int __net_init xfrm4_net_init(struct net *net)
 			goto err_alloc;
 
 		table[0].data = &net->xfrm.xfrm4_dst_ops.gc_thresh;
+		table[1].data = &net->xfrm.sysctl_xfrm4_policy_hash_thresh;
 	}
 
+	/* Set defaults for xfrm4 policy hash thresholds */
+	net->xfrm.sysctl_xfrm4_policy_hash_thresh[0] = 32;
+	net->xfrm.sysctl_xfrm4_policy_hash_thresh[1] = 32;
+
 	hdr = register_net_sysctl(net, "net/ipv4", table);
 	if (!hdr)
 		goto err_reg;
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 2a0bbda..7d7ca9af 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -316,6 +316,61 @@ static void xfrm6_policy_fini(void)
 }
 
 #ifdef CONFIG_SYSCTL
+static int xfrm6_policy_hash_thresh_min[] = { 0, 0 };
+static int xfrm6_policy_hash_thresh_max[] = { 128, 128 };
+
+/* Read xfrm6 policy hash table thresholds */
+static void get_xfrm6_policy_hash_thresh(struct net *net, int thresh[2])
+{
+	unsigned seq;
+
+	do {
+		seq = read_seqbegin(&net->xfrm.sysctl_policy_hash_thresh_lock);
+
+		thresh[0] = net->xfrm.sysctl_xfrm6_policy_hash_thresh[0];
+		thresh[1] = net->xfrm.sysctl_xfrm6_policy_hash_thresh[1];
+	} while (read_seqretry(&net->xfrm.sysctl_policy_hash_thresh_lock, seq));
+}
+
+/* Update xfrm6 policy hash table thresholds */
+static void set_xfrm6_policy_hash_thresh(struct net *net, int thresh[2])
+{
+	write_seqlock(&net->xfrm.sysctl_policy_hash_thresh_lock);
+	net->xfrm.sysctl_xfrm6_policy_hash_thresh[0] = thresh[0];
+	net->xfrm.sysctl_xfrm6_policy_hash_thresh[1] = thresh[1];
+	write_sequnlock(&net->xfrm.sysctl_policy_hash_thresh_lock);
+
+	xfrm_policy_hash_rebuild(net);
+}
+
+/* Validate changes from /proc interface. */
+static int xfrm6_policy_hash_thresh(struct ctl_table *table, int write,
+				 void __user *buffer,
+				 size_t *lenp, loff_t *ppos)
+{
+	struct net *net =
+		container_of(table->data, struct net,
+			     xfrm.sysctl_xfrm6_policy_hash_thresh);
+	int ret;
+	int thresh[2];
+	struct ctl_table tmp = {
+		.data = &thresh,
+		.maxlen = sizeof(thresh),
+		.mode = table->mode,
+		.extra1 = &xfrm6_policy_hash_thresh_min,
+		.extra2 = &xfrm6_policy_hash_thresh_max,
+	};
+
+	get_xfrm6_policy_hash_thresh(net, thresh);
+
+	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+
+	if (write && ret == 0)
+		set_xfrm6_policy_hash_thresh(net, thresh);
+
+	return ret;
+}
+
 static struct ctl_table xfrm6_policy_table[] = {
 	{
 		.procname       = "xfrm6_gc_thresh",
@@ -324,6 +379,13 @@ static struct ctl_table xfrm6_policy_table[] = {
 		.mode	   	= 0644,
 		.proc_handler   = proc_dointvec,
 	},
+	{
+		.procname       = "xfrm6_policy_hash_thresh",
+		.data           = &init_net.xfrm.sysctl_xfrm6_policy_hash_thresh,
+		.maxlen         = sizeof(init_net.xfrm.sysctl_xfrm6_policy_hash_thresh),
+		.mode           = 0644,
+		.proc_handler   = xfrm6_policy_hash_thresh,
+	},
 	{ }
 };
 
@@ -339,8 +401,13 @@ static int __net_init xfrm6_net_init(struct net *net)
 			goto err_alloc;
 
 		table[0].data = &net->xfrm.xfrm6_dst_ops.gc_thresh;
+		table[1].data = &net->xfrm.sysctl_xfrm6_policy_hash_thresh;
 	}
 
+	/* Set defaults for xfrm6 policy hash thresholds */
+	net->xfrm.sysctl_xfrm6_policy_hash_thresh[0] = 128;
+	net->xfrm.sysctl_xfrm6_policy_hash_thresh[1] = 128;
+
 	hdr = register_net_sysctl(net, "net/ipv6", table);
 	if (!hdr)
 		goto err_reg;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index d65e254..0b968ca 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -566,6 +566,90 @@ static void xfrm_hash_resize(struct work_struct *work)
 	mutex_unlock(&hash_resize_mutex);
 }
 
+/* selector source side (local/remote) according to direction (in/out/fwd) */
+static int __src_side(int dir)
+{
+	return (dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT ? 0 : 1;
+}
+
+/* selector dest side (local/remote) according to direction (in/out/fwd) */
+static int __dst_side(int dir)
+{
+	return (dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT ? 1 : 0;
+}
+
+static void xfrm_hash_rebuild(struct work_struct *work)
+{
+	struct net *net = container_of(work, struct net,
+				       xfrm.policy_hash_thresh_work);
+	unsigned int hmask;
+	struct xfrm_policy *pol;
+	struct xfrm_policy *policy;
+	struct hlist_head *chain;
+	struct hlist_head *odst;
+	struct hlist_node *newpos;
+	int i;
+	int dir;
+	unsigned seq;
+	u8 thresh4[2];
+	u8 thresh6[2];
+
+	mutex_lock(&hash_resize_mutex);
+
+	/* copy thresholds from sysctl */
+	do {
+		seq = read_seqbegin(&net->xfrm.sysctl_policy_hash_thresh_lock);
+
+		thresh4[0] = net->xfrm.sysctl_xfrm4_policy_hash_thresh[0];
+		thresh4[1] = net->xfrm.sysctl_xfrm4_policy_hash_thresh[1];
+		thresh6[0] = net->xfrm.sysctl_xfrm6_policy_hash_thresh[0];
+		thresh6[1] = net->xfrm.sysctl_xfrm6_policy_hash_thresh[1];
+	} while (read_seqretry(&net->xfrm.sysctl_policy_hash_thresh_lock, seq));
+
+	write_lock_bh(&net->xfrm.xfrm_policy_lock);
+
+	/* reset the bydst and inexact table in all directions */
+	for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
+
+		INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
+		hmask = net->xfrm.policy_bydst[dir].hmask;
+		odst = net->xfrm.policy_bydst[dir].table;
+		for (i = hmask; i >= 0; i--)
+			INIT_HLIST_HEAD(odst + i);
+		net->xfrm.policy_bydst[dir].dbits4 = thresh4[__dst_side(dir)];
+		net->xfrm.policy_bydst[dir].sbits4 = thresh4[__src_side(dir)];
+		net->xfrm.policy_bydst[dir].dbits6 = thresh6[__dst_side(dir)];
+		net->xfrm.policy_bydst[dir].sbits6 = thresh6[__src_side(dir)];
+	}
+
+	/* re-insert all policies by order of creation */
+	list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
+		newpos = NULL;
+		chain = policy_hash_bysel(net, &policy->selector,
+					  policy->family,
+					  xfrm_policy_id2dir(policy->index));
+		hlist_for_each_entry(pol, chain, bydst) {
+			if (policy->priority >= pol->priority)
+				newpos = &pol->bydst;
+			else
+				break;
+		}
+		if (newpos)
+			hlist_add_after(newpos, &policy->bydst);
+		else
+			hlist_add_head(&policy->bydst, chain);
+	}
+
+	write_unlock_bh(&net->xfrm.xfrm_policy_lock);
+
+	mutex_unlock(&hash_resize_mutex);
+}
+
+void xfrm_policy_hash_rebuild(struct net *net)
+{
+	schedule_work(&net->xfrm.policy_hash_thresh_work);
+}
+
 /* Generate new index... KAME seems to generate them ordered by cost
  * of an absolute inpredictability of ordering of rules. This will not pass. */
 static u32 xfrm_gen_index(struct net *net, int dir, u32 index)
@@ -2872,9 +2956,14 @@ static int __net_init xfrm_policy_init(struct net *net)
 		htab->dbits6 = 128;
 		htab->sbits6 = 128;
 	}
+	net->xfrm.sysctl_xfrm4_policy_hash_thresh[0] = 32;
+	net->xfrm.sysctl_xfrm4_policy_hash_thresh[1] = 32;
+	net->xfrm.sysctl_xfrm6_policy_hash_thresh[0] = 128;
+	net->xfrm.sysctl_xfrm6_policy_hash_thresh[1] = 128;
 
 	INIT_LIST_HEAD(&net->xfrm.policy_all);
 	INIT_WORK(&net->xfrm.policy_hash_work, xfrm_hash_resize);
+	INIT_WORK(&net->xfrm.policy_hash_thresh_work, xfrm_hash_rebuild);
 	if (net_eq(net, &init_net))
 		register_netdevice_notifier(&xfrm_dev_notifier);
 	return 0;
diff --git a/net/xfrm/xfrm_sysctl.c b/net/xfrm/xfrm_sysctl.c
index 05a6e3d..5fefb9d 100644
--- a/net/xfrm/xfrm_sysctl.c
+++ b/net/xfrm/xfrm_sysctl.c
@@ -54,6 +54,9 @@ int __net_init xfrm_sysctl_init(struct net *net)
 	table[2].data = &net->xfrm.sysctl_larval_drop;
 	table[3].data = &net->xfrm.sysctl_acq_expires;
 
+	/* initialize policy hash threshold sysctl lock */
+	seqlock_init(&net->xfrm.sysctl_policy_hash_thresh_lock);
+
 	/* Don't export sysctls to unprivileged users */
 	if (net->user_ns != &init_user_ns)
 		table[0].procname = NULL;
-- 
1.9.1


* Re: [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc
  2014-05-12 13:45 ` [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc Christophe Gouault
@ 2014-05-15  8:34   ` Steffen Klassert
  2014-05-19  7:41     ` Christophe Gouault
  0 siblings, 1 reply; 27+ messages in thread
From: Steffen Klassert @ 2014-05-15  8:34 UTC (permalink / raw)
  To: Christophe Gouault; +Cc: David S. Miller, netdev

On Mon, May 12, 2014 at 03:45:25PM +0200, Christophe Gouault wrote:
> Allow the local and remote prefix length thresholds of the policy
> hash table to be set via /proc entries. Example:
> 
> echo 0 24 > /proc/sys/net/ipv4/xfrm4_policy_hash_thresh
> echo 0 56 > /proc/sys/net/ipv6/xfrm6_policy_hash_thresh

I would not like to have this configurable from userspace.
First of all, a good threshold depends on the IPsec configuration
and can change during runtime. So it is not obvious for a user
which values are good for his configuration. Most users will
just leave the default, so they will not benefit from your
changes.

Second, in the long run we have to remove the IPsec flowcache
as this has the same limitation as our routing cache had.
To do this, we need to replace the hashlist based policy and
state lookups by a well performing lookup algorithm and I
would like to do that without any user visible changes.

Can't we tune the hash threshold internally? We could maintain
a per hashlist policy counter. If we have 'many' policies and
most of these policies are in the same hashlist we could change
the hash threshold. We could check this when we add policies
and update the hash threshold if needed.
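
The check proposed here could look roughly like the sketch below. It is only an illustration under stated assumptions: the thread does not define 'many' or 'most', so MANY_POLICIES, the "more than half" rule and the function name are all hypothetical, and the sketch says nothing about which new threshold to pick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-off for 'many' policies; not taken from the thread. */
#define MANY_POLICIES 1024u

/* counts[i] is the number of policies currently linked on hash list i.
 * Returns true when there are 'many' policies in total and more than
 * half of them sit on a single list, i.e. when the current thresholds
 * spread the policies poorly.
 */
static inline bool threshold_needs_retune(const unsigned int *counts,
					  size_t nlists)
{
	unsigned int total = 0;
	unsigned int largest = 0;
	size_t i;

	for (i = 0; i < nlists; i++) {
		total += counts[i];
		if (counts[i] > largest)
			largest = counts[i];
	}

	return total >= MANY_POLICIES && largest > total / 2;
}

static inline void self_check(void)
{
	const unsigned int skewed[4] = { 900, 50, 50, 24 };
	const unsigned int even[4] = { 256, 256, 256, 256 };

	assert(threshold_needs_retune(skewed, 4));
	assert(!threshold_needs_retune(even, 4));
}
```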

Everything else looks pretty good, thanks!


* Re: [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc
  2014-05-15  8:34   ` Steffen Klassert
@ 2014-05-19  7:41     ` Christophe Gouault
  2014-05-22 10:09       ` Steffen Klassert
  0 siblings, 1 reply; 27+ messages in thread
From: Christophe Gouault @ 2014-05-19  7:41 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev

On 05/15/2014 10:34 AM, Steffen Klassert wrote:
> On Mon, May 12, 2014 at 03:45:25PM +0200, Christophe Gouault wrote:
>> Allow the local and remote prefix length thresholds of the policy
>> hash table to be set via /proc entries. Example:
>>
>> echo 0 24 > /proc/sys/net/ipv4/xfrm4_policy_hash_thresh
>> echo 0 56 > /proc/sys/net/ipv6/xfrm6_policy_hash_thresh
>
> I would not like to have this configurable from userspace.
> First of all, a good threshold depends on the IPsec configuration
> and can change during runtime. So it is not obvious for a user
> which values are good for his configuration. Most users will
> just leave the default, so they will not benefit from your
> changes.

Hi Steffen,

As with several other /proc entries, the default values are suitable
for simple use cases and users can leave them unchanged. Users usually
only start tuning them when they have a specific use case (typically
scalability needs).

Moreover, I am concerned that any heuristic for automatic changes would
be a performance killer when the system is flapping. See below.

> Second, in the long run we have to remove the IPsec flowcache
> as this has the same limitation as our routing cache had.
> To do this, we need to replace the hashlist based policy and
> state lookups by a well performing lookup algorithm and I
> would like to do that without any user visible changes.

Efficient lookup is a field we have studied for a long time at my company.
There are many theses about multi-field classification, but none
covers all use cases. All suffer from limitations (building time,
memory consumption, number of fields, time and memory
unpredictability...) and each is suited to a specific use case.

The best approach seems to be to offer several methods and make it
possible to select and tune them according to the use case.

The main advantage of the hash table with configurable thresholds is
that it covers a wide variety of use cases by adjusting the
thresholds. And we have the benefit of "keep it simple".

> Can't we tune the hash threshold internally? We could maintain
> a per hashlist policy counter. If we have 'many' policies and
> most of these policies are in the same hashlist we could change
> the hash threshold. We could check this when we add policies
> and update the hash threshold if needed.

I think that finding a generic algorithm to determine a good trade-off for
the local and remote thresholds is quite tough. I'm afraid tracking the
number of entries in each hlist is not enough. It would help to trigger
a change, but not to choose the new values. Thresholds both
determine which SPs will actually be hashed (vs. ones that will just be
enqueued in the inexact list) and the number of bits that will be
included in the hash key (and hence the entropy of the key). Moreover,
it is a pair of thresholds, which makes the choice even harder.

A user who knows what his SPD contains would probably prefer to be able
to tune the hash thresholds instead of relying on an uncontrolled,
automatic algorithm.

Exporting a userland API (here via /proc) lets a user or a daemon
choose a strategy according to information the kernel does not
necessarily have, and makes it possible to implement various (possibly
complex) policies.

> Everything else looks pretty good, thanks!
>

You're welcome :)


* Re: [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc
  2014-05-19  7:41     ` Christophe Gouault
@ 2014-05-22 10:09       ` Steffen Klassert
  2014-05-22 10:15         ` David Laight
  0 siblings, 1 reply; 27+ messages in thread
From: Steffen Klassert @ 2014-05-22 10:09 UTC (permalink / raw)
  To: Christophe Gouault; +Cc: David S. Miller, netdev

On Mon, May 19, 2014 at 09:41:05AM +0200, Christophe Gouault wrote:
> On 05/15/2014 10:34 AM, Steffen Klassert wrote:
> 
> >Second, in the long run we have to remove the IPsec flowcache
> >as this has the same limitation as our routing cache had.
> >To do this, we need to replace the hashlist based policy and
> >state lookups by a well performing lookup algorithm and I
> >would like to do that without any user visible changes.
> 
> Efficient lookup is a field we have studied for a long time at my company.
> There are many theses about multi-field classification, but none
> covers all use cases. All suffer from limitations (building time,
> memory consumption, number of fields, time and memory
> unpredictability...) and each is suited to a specific use case.

Right, it is hard even to find an algorithm that covers the most
common use cases. That's why we still use this list + flowcache
based lookup mechanism. But in the long run we need an option
to disable/remove the flowcache without losing too much performance
in the fastpath lookup.

Like the ipv4 routing cache that was removed recently, the IPsec
flowcache gets its performance from the network traffic that
arrives and therefore it might be partly controllable by remote
entities. This can be critical for a security protocol like IPsec.

> 
> Exporting a userland API (here via /proc) lets a user or a daemon
> choose a strategy according to information the kernel does not
> necessarily have, and makes it possible to implement various (possibly
> complex) policies.
> 

If we add a user API for the current lookup mechanism, we will be stuck
with it because we can't change it anymore without breaking userspace.
So I don't want to add one before we have finally decided on a long term
lookup mechanism for IPsec.


* RE: [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc
  2014-05-22 10:09       ` Steffen Klassert
@ 2014-05-22 10:15         ` David Laight
  2014-05-23  8:30           ` Christophe Gouault
  0 siblings, 1 reply; 27+ messages in thread
From: David Laight @ 2014-05-22 10:15 UTC (permalink / raw)
  To: 'Steffen Klassert', Christophe Gouault; +Cc: David S. Miller, netdev

From: Klassert
...
> > Exporting a userland API (here by /proc) enables a user or a daemon to
> > choose a strategy according to information the kernel does not
> > necessarily have, and enables to implement various (possibly complex)
> > policies.
> >
> 
> If we add a user API for the current lookup mechanism, we will stick
> with this because we can't change it anymore without breaking userspace.
> So I don't want to add one before we finally decided on a long term
> lookup mechanism for IPsec.

You could have a user API call to find the list of available mechanisms
as well as one that returns/sets the current one.
Then there is no actual requirement to continue to support any specific one.

	David


* Re: [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc
  2014-05-22 10:15         ` David Laight
@ 2014-05-23  8:30           ` Christophe Gouault
  0 siblings, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-05-23  8:30 UTC (permalink / raw)
  To: David Laight, 'Steffen Klassert'; +Cc: David S. Miller, netdev

On 05/22/2014 12:15 PM, David Laight wrote:
> From: Klassert
> ...
>>> Exporting a userland API (here by /proc) enables a user or a daemon to
>>> choose a strategy according to information the kernel does not
>>> necessarily have, and enables to implement various (possibly complex)
>>> policies.
>>>
>>
>> If we add a user API for the current lookup mechanism, we will stick
>> with this because we can't change it anymore without breaking userspace.
>> So I don't want to add one before we finally decided on a long term
>> lookup mechanism for IPsec.
>
> You could have a user API call to find the list of available mechanisms
> as well as one that returns/sets the current one.
> Then there is no actual requirement to continue to support any specific one.
>
> 	David

Hi David,

It sounds like a brilliant idea, since we will probably need to support
several types of mechanisms. If nobody objects, I can start working on
such an API.

Any preference on the type of API? (/proc, netlink, ioctl?...)

Christophe


* [PATCH net-next v2 0/2] xfrm: scalability enhancements for policy database
  2014-05-12 13:45 [PATCH ipsec-next 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  2014-05-12 13:45 ` [PATCH ipsec-next 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
  2014-05-12 13:45 ` [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc Christophe Gouault
@ 2014-08-01  9:12 ` Christophe Gouault
  2014-08-01  9:12   ` [PATCH net-next v2 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
                     ` (2 more replies)
  2 siblings, 3 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-01  9:12 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert; +Cc: netdev

This patchset makes it possible to hash more policies than just
non-prefixed ones: it hashes policies whose prefix lengths are greater
than or equal to configurable thresholds.

These thresholds are configured via the netlink message
XFRM_MSG_NEWSPDINFO, attributes XFRMA_SPD_IPV4_HTHRESH and
XFRMA_SPD_IPV6_HTHRESH.

The related iproute2 patch for configuring the thresholds is available
on demand.

Best Regards,
Christophe
----
v2:
- changed configuration API from proc to netlink
---
 include/net/netns/xfrm.h  |  14 +++++++
 include/net/xfrm.h        |   1 +
 include/uapi/linux/xfrm.h |   7 ++++
 net/xfrm/xfrm_hash.h      |  76 ++++++++++++++++++++++++++++++-----
 net/xfrm/xfrm_policy.c    | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 net/xfrm/xfrm_user.c      |  91 ++++++++++++++++++++++++++++++++++++++++--
 6 files changed, 313 insertions(+), 19 deletions(-)


* [PATCH net-next v2 1/2] xfrm: hash prefixed policies based on preflen thresholds
  2014-08-01  9:12 ` [PATCH net-next v2 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
@ 2014-08-01  9:12   ` Christophe Gouault
  2014-08-01  9:12   ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
  2014-08-04 22:09   ` [PATCH net-next v2 " David Miller
  2 siblings, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-01  9:12 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert; +Cc: netdev, Christophe Gouault

The idea is an extension of the current policy hashing.

Today only non-prefixed policies are stored in a hash table. This
patch relaxes the constraints, and hashes policies whose prefix
lengths are greater than or equal to a configurable threshold.

Each hash table (one per direction) maintains its own set of IPv4 and
IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32,
128, 128).

Example, if the output hash table is configured with source thresholds
(sbits4, sbits6) = (16, 56) and destination thresholds (dbits4, dbits6) =
(24, 64):

ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed

ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ...    => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ...  => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ...    => unhashed

The high order bits of the addresses (up to the threshold) are used to
compute the hash key.
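
As a rough userspace illustration of the rule above (not the kernel code
itself): a policy is hashed only when both selector prefix lengths reach
their thresholds, and only the high-order bits up to the threshold feed the
hash key. The helper names prefix_mask32, policy_is_hashed and hash_key_part
are made up for this sketch. It reads the IPv4 example thresholds as 16 for
the source and 24 for the destination, which is the reading under which the
three listed outcomes hold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask that keeps the `bits` high-order bits of a 32-bit word.
 * Shifting a 32-bit value by 32 is undefined in C, so 0 and 32
 * are handled explicitly. */
static inline uint32_t prefix_mask32(uint8_t bits)
{
	if (bits == 0)
		return 0;
	if (bits >= 32)
		return 0xffffffffu;
	return 0xffffffffu << (32 - bits);
}

/* A policy goes to the hash table only if both selector prefixes
 * are at least as long as the configured thresholds; otherwise it
 * stays on the inexact list. */
static inline bool policy_is_hashed(uint8_t prefixlen_d, uint8_t prefixlen_s,
				    uint8_t dbits, uint8_t sbits)
{
	return prefixlen_d >= dbits && prefixlen_s >= sbits;
}

/* Part of the hash key contributed by one address (host byte order):
 * only the bits up to the threshold are kept. */
static inline uint32_t hash_key_part(uint32_t addr_host_order, uint8_t bits)
{
	return addr_host_order & prefix_mask32(bits);
}
```

With dbits = 24 and sbits = 16, policy_is_hashed(24, 20, 24, 16) and
policy_is_hashed(32, 16, 24, 16) are true, while
policy_is_hashed(16, 16, 24, 16) is false, matching the three IPv4 lines
above. Truncating to the threshold means that every flow covered by a hashed
policy produces the same key as the policy's own selector.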

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
 include/net/netns/xfrm.h |  4 +++
 net/xfrm/xfrm_hash.h     | 76 +++++++++++++++++++++++++++++++++++++++++-------
 net/xfrm/xfrm_policy.c   | 53 +++++++++++++++++++++++++++++----
 3 files changed, 117 insertions(+), 16 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 3492434..41902a8 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -13,6 +13,10 @@ struct ctl_table_header;
 struct xfrm_policy_hash {
 	struct hlist_head	*table;
 	unsigned int		hmask;
+	u8			dbits4;
+	u8			sbits4;
+	u8			dbits6;
+	u8			sbits6;
 };
 
 struct netns_xfrm {
diff --git a/net/xfrm/xfrm_hash.h b/net/xfrm/xfrm_hash.h
index 0622d31..666c5ff 100644
--- a/net/xfrm/xfrm_hash.h
+++ b/net/xfrm/xfrm_hash.h
@@ -3,6 +3,7 @@
 
 #include <linux/xfrm.h>
 #include <linux/socket.h>
+#include <linux/jhash.h>
 
 static inline unsigned int __xfrm4_addr_hash(const xfrm_address_t *addr)
 {
@@ -28,6 +29,58 @@ static inline unsigned int __xfrm6_daddr_saddr_hash(const xfrm_address_t *daddr,
 		     saddr->a6[2] ^ saddr->a6[3]);
 }
 
+static inline u32 __bits2mask32(__u8 bits)
+{
+	u32 mask32 = 0xffffffff;
+
+	if (bits == 0)
+		mask32 = 0;
+	else if (bits < 32)
+		mask32 <<= (32 - bits);
+
+	return mask32;
+}
+
+static inline unsigned int __xfrm4_dpref_spref_hash(const xfrm_address_t *daddr,
+						    const xfrm_address_t *saddr,
+						    __u8 dbits,
+						    __u8 sbits)
+{
+	return jhash_2words(ntohl(daddr->a4) & __bits2mask32(dbits),
+			    ntohl(saddr->a4) & __bits2mask32(sbits),
+			    0);
+}
+
+static inline unsigned int __xfrm6_pref_hash(const xfrm_address_t *addr,
+					     __u8 prefixlen)
+{
+	int pdw;
+	int pbi;
+	u32 initval = 0;
+
+	pdw = prefixlen >> 5;     /* num of whole u32 in prefix */
+	pbi = prefixlen &  0x1f;  /* num of bits in incomplete u32 in prefix */
+
+	if (pbi) {
+		__be32 mask;
+
+		mask = htonl((0xffffffff) << (32 - pbi));
+
+		initval = (__force u32)(addr->a6[pdw] & mask);
+	}
+
+	return jhash2((__force u32 *)addr->a6, pdw, initval);
+}
+
+static inline unsigned int __xfrm6_dpref_spref_hash(const xfrm_address_t *daddr,
+						    const xfrm_address_t *saddr,
+						    __u8 dbits,
+						    __u8 sbits)
+{
+	return __xfrm6_pref_hash(daddr, dbits) ^
+	       __xfrm6_pref_hash(saddr, sbits);
+}
+
 static inline unsigned int __xfrm_dst_hash(const xfrm_address_t *daddr,
 					   const xfrm_address_t *saddr,
 					   u32 reqid, unsigned short family,
@@ -84,7 +137,8 @@ static inline unsigned int __idx_hash(u32 index, unsigned int hmask)
 }
 
 static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
-				      unsigned short family, unsigned int hmask)
+				      unsigned short family, unsigned int hmask,
+				      u8 dbits, u8 sbits)
 {
 	const xfrm_address_t *daddr = &sel->daddr;
 	const xfrm_address_t *saddr = &sel->saddr;
@@ -92,19 +146,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
 
 	switch (family) {
 	case AF_INET:
-		if (sel->prefixlen_d != 32 ||
-		    sel->prefixlen_s != 32)
+		if (sel->prefixlen_d < dbits ||
+		    sel->prefixlen_s < sbits)
 			return hmask + 1;
 
-		h = __xfrm4_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 
 	case AF_INET6:
-		if (sel->prefixlen_d != 128 ||
-		    sel->prefixlen_s != 128)
+		if (sel->prefixlen_d < dbits ||
+		    sel->prefixlen_s < sbits)
 			return hmask + 1;
 
-		h = __xfrm6_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 	}
 	h ^= (h >> 16);
@@ -113,17 +167,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
 
 static inline unsigned int __addr_hash(const xfrm_address_t *daddr,
 				       const xfrm_address_t *saddr,
-				       unsigned short family, unsigned int hmask)
+				       unsigned short family,
+				       unsigned int hmask,
+				       u8 dbits, u8 sbits)
 {
 	unsigned int h = 0;
 
 	switch (family) {
 	case AF_INET:
-		h = __xfrm4_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 
 	case AF_INET6:
-		h = __xfrm6_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 	}
 	h ^= (h >> 16);
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index a8ef510..312828c 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -344,12 +344,39 @@ static inline unsigned int idx_hash(struct net *net, u32 index)
 	return __idx_hash(index, net->xfrm.policy_idx_hmask);
 }
 
+/* calculate policy hash thresholds */
+static void __get_hash_thresh(struct net *net,
+			      unsigned short family, int dir,
+			      u8 *dbits, u8 *sbits)
+{
+	switch (family) {
+	case AF_INET:
+		*dbits = net->xfrm.policy_bydst[dir].dbits4;
+		*sbits = net->xfrm.policy_bydst[dir].sbits4;
+		break;
+
+	case AF_INET6:
+		*dbits = net->xfrm.policy_bydst[dir].dbits6;
+		*sbits = net->xfrm.policy_bydst[dir].sbits6;
+		break;
+
+	default:
+		*dbits = 0;
+		*sbits = 0;
+	}
+}
+
 static struct hlist_head *policy_hash_bysel(struct net *net,
 					    const struct xfrm_selector *sel,
 					    unsigned short family, int dir)
 {
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
-	unsigned int hash = __sel_hash(sel, family, hmask);
+	unsigned int hash;
+	u8 dbits;
+	u8 sbits;
+
+	__get_hash_thresh(net, family, dir, &dbits, &sbits);
+	hash = __sel_hash(sel, family, hmask, dbits, sbits);
 
 	return (hash == hmask + 1 ?
 		&net->xfrm.policy_inexact[dir] :
@@ -362,25 +389,35 @@ static struct hlist_head *policy_hash_direct(struct net *net,
 					     unsigned short family, int dir)
 {
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
-	unsigned int hash = __addr_hash(daddr, saddr, family, hmask);
+	unsigned int hash;
+	u8 dbits;
+	u8 sbits;
+
+	__get_hash_thresh(net, family, dir, &dbits, &sbits);
+	hash = __addr_hash(daddr, saddr, family, hmask, dbits, sbits);
 
 	return net->xfrm.policy_bydst[dir].table + hash;
 }
 
-static void xfrm_dst_hash_transfer(struct hlist_head *list,
+static void xfrm_dst_hash_transfer(struct net *net,
+				   struct hlist_head *list,
 				   struct hlist_head *ndsttable,
-				   unsigned int nhashmask)
+				   unsigned int nhashmask,
+				   int dir)
 {
 	struct hlist_node *tmp, *entry0 = NULL;
 	struct xfrm_policy *pol;
 	unsigned int h0 = 0;
+	u8 dbits;
+	u8 sbits;
 
 redo:
 	hlist_for_each_entry_safe(pol, tmp, list, bydst) {
 		unsigned int h;
 
+		__get_hash_thresh(net, pol->family, dir, &dbits, &sbits);
 		h = __addr_hash(&pol->selector.daddr, &pol->selector.saddr,
-				pol->family, nhashmask);
+				pol->family, nhashmask, dbits, sbits);
 		if (!entry0) {
 			hlist_del(&pol->bydst);
 			hlist_add_head(&pol->bydst, ndsttable+h);
@@ -434,7 +471,7 @@ static void xfrm_bydst_resize(struct net *net, int dir)
 	write_lock_bh(&net->xfrm.xfrm_policy_lock);
 
 	for (i = hmask; i >= 0; i--)
-		xfrm_dst_hash_transfer(odst + i, ndst, nhashmask);
+		xfrm_dst_hash_transfer(net, odst + i, ndst, nhashmask, dir);
 
 	net->xfrm.policy_bydst[dir].table = ndst;
 	net->xfrm.policy_bydst[dir].hmask = nhashmask;
@@ -2828,6 +2865,10 @@ static int __net_init xfrm_policy_init(struct net *net)
 		if (!htab->table)
 			goto out_bydst;
 		htab->hmask = hmask;
+		htab->dbits4 = 32;
+		htab->sbits4 = 32;
+		htab->dbits6 = 128;
+		htab->sbits6 = 128;
 	}
 
 	INIT_LIST_HEAD(&net->xfrm.policy_all);
-- 
1.9.1


* [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink
  2014-08-01  9:12 ` [PATCH net-next v2 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  2014-08-01  9:12   ` [PATCH net-next v2 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
@ 2014-08-01  9:12   ` Christophe Gouault
  2014-08-01 13:01     ` [PATCH RFC iproute2 0/2] ipxfrm: configuration of SPD hash Christophe Gouault
  2014-08-21  6:09     ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Steffen Klassert
  2014-08-04 22:09   ` [PATCH net-next v2 " David Miller
  2 siblings, 2 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-01  9:12 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert; +Cc: netdev, Christophe Gouault

Allow specifying local and remote prefix length thresholds for the
policy hash table via a netlink XFRM_MSG_NEWSPDINFO message.

Prefix length thresholds are specified by the optional attributes
XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH (struct xfrmu_spdhthresh).

example:

    struct xfrmu_spdhthresh thresh4 = {
        .lbits = 0,
        .rbits = 24,
    };
    struct xfrmu_spdhthresh thresh6 = {
        .lbits = 0,
        .rbits = 56,
    };
    struct nlmsghdr *hdr;
    struct nl_msg *msg;

    msg = nlmsg_alloc();
    /* the message type is XFRM_MSG_NEWSPDINFO; the payload is a __u32 of flags */
    hdr = nlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, XFRM_MSG_NEWSPDINFO,
                    sizeof(__u32), NLM_F_REQUEST);
    nla_put(msg, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh4), &thresh4);
    nla_put(msg, XFRMA_SPD_IPV6_HTHRESH, sizeof(thresh6), &thresh6);
    nl_send_auto(sk, msg);

The numbers are the minimum selector prefix lengths required for a
policy to be placed in the hash table.

- lbits is the local threshold (source address for out policies,
  destination address for in and fwd policies).

- rbits is the remote threshold (destination address for out
  policies, source address for in and fwd policies).
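
The local/remote convention above can be sketched as a small mapping from
the user-visible pair (lbits, rbits) to the per-direction pair
(dbits, sbits) used for hashing. This is only an illustration: the
sketch_* types and thresh_for_dir() are hypothetical names, and the enum
stands in for the kernel's policy directions.

```c
#include <stdint.h>

/* Stand-in for the policy directions (hypothetical names for this sketch). */
enum sketch_dir { SKETCH_DIR_IN, SKETCH_DIR_OUT, SKETCH_DIR_FWD };

/* User-visible thresholds: local and remote prefix lengths. */
struct sketch_thresh {
	uint8_t lbits;
	uint8_t rbits;
};

/* Per-direction thresholds: destination and source prefix lengths. */
struct sketch_table_bits {
	uint8_t dbits;
	uint8_t sbits;
};

static inline struct sketch_table_bits
thresh_for_dir(struct sketch_thresh t, enum sketch_dir dir)
{
	struct sketch_table_bits b;

	if (dir == SKETCH_DIR_OUT) {
		/* out: destination is remote, source is local */
		b.dbits = t.rbits;
		b.sbits = t.lbits;
	} else {
		/* in and fwd: destination is local, source is remote */
		b.dbits = t.lbits;
		b.sbits = t.rbits;
	}
	return b;
}
```

For instance, with lbits = 0 and rbits = 24, outgoing policies are hashed on
the first 24 bits of the destination only, while incoming and forwarded
policies are hashed on the first 24 bits of the source only.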

The default values are:

XFRMA_SPD_IPV4_HTHRESH: 32 32
XFRMA_SPD_IPV6_HTHRESH: 128 128

Dynamic re-building of the SPD is performed when the threshold values
are changed.

The kernel replies to XFRM_MSG_GETSPDINFO and XFRM_MSG_NEWSPDINFO
requests with an XFRM_MSG_NEWSPDINFO message carrying both attributes,
XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
v2:
- use netlink instead of /proc
---
 include/net/netns/xfrm.h  | 10 ++++++
 include/net/xfrm.h        |  1 +
 include/uapi/linux/xfrm.h |  7 ++++
 net/xfrm/xfrm_policy.c    | 90 ++++++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c      | 91 +++++++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 196 insertions(+), 3 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 41902a8..9da7982 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -19,6 +19,15 @@ struct xfrm_policy_hash {
 	u8			sbits6;
 };
 
+struct xfrm_policy_hthresh {
+	struct work_struct	work;
+	seqlock_t		lock;
+	u8			lbits4;
+	u8			rbits4;
+	u8			lbits6;
+	u8			rbits6;
+};
+
 struct netns_xfrm {
 	struct list_head	state_all;
 	/*
@@ -45,6 +54,7 @@ struct netns_xfrm {
 	struct xfrm_policy_hash	policy_bydst[XFRM_POLICY_MAX * 2];
 	unsigned int		policy_count[XFRM_POLICY_MAX * 2];
 	struct work_struct	policy_hash_work;
+	struct xfrm_policy_hthresh policy_hthresh;
 
 
 	struct sock		*nlsk;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 721e9c3..dc4865e 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1591,6 +1591,7 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark,
 struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8, int dir,
 				     u32 id, int delete, int *err);
 int xfrm_policy_flush(struct net *net, u8 type, bool task_valid);
+void xfrm_policy_hash_rebuild(struct net *net);
 u32 xfrm_get_acqseq(void);
 int verify_spi_info(u8 proto, u32 min, u32 max);
 int xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index 25e5dd9..02d5125 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -328,6 +328,8 @@ enum xfrm_spdattr_type_t {
 	XFRMA_SPD_UNSPEC,
 	XFRMA_SPD_INFO,
 	XFRMA_SPD_HINFO,
+	XFRMA_SPD_IPV4_HTHRESH,
+	XFRMA_SPD_IPV6_HTHRESH,
 	__XFRMA_SPD_MAX
 
 #define XFRMA_SPD_MAX (__XFRMA_SPD_MAX - 1)
@@ -347,6 +349,11 @@ struct xfrmu_spdhinfo {
 	__u32 spdhmcnt;
 };
 
+struct xfrmu_spdhthresh {
+	__u8 lbits;
+	__u8 rbits;
+};
+
 struct xfrm_usersa_info {
 	struct xfrm_selector		sel;
 	struct xfrm_id			id;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 312828c..c7d7a7e 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -13,6 +13,8 @@
  *
  */
 
+#define pr_fmt(fmt) "IPsec: " fmt
+
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/kmod.h>
@@ -566,6 +568,89 @@ static void xfrm_hash_resize(struct work_struct *work)
 	mutex_unlock(&hash_resize_mutex);
 }
 
+static void xfrm_hash_rebuild(struct work_struct *work)
+{
+	struct net *net = container_of(work, struct net,
+				       xfrm.policy_hthresh.work);
+	unsigned int hmask;
+	struct xfrm_policy *pol;
+	struct xfrm_policy *policy;
+	struct hlist_head *chain;
+	struct hlist_head *odst;
+	struct hlist_node *newpos;
+	int i;
+	int dir;
+	unsigned seq;
+	u8 lbits4, rbits4, lbits6, rbits6;
+
+	mutex_lock(&hash_resize_mutex);
+
+	/* read selector prefixlen thresholds */
+	do {
+		seq = read_seqbegin(&net->xfrm.policy_hthresh.lock);
+
+		lbits4 = net->xfrm.policy_hthresh.lbits4;
+		rbits4 = net->xfrm.policy_hthresh.rbits4;
+		lbits6 = net->xfrm.policy_hthresh.lbits6;
+		rbits6 = net->xfrm.policy_hthresh.rbits6;
+	} while (read_seqretry(&net->xfrm.policy_hthresh.lock, seq));
+
+	write_lock_bh(&net->xfrm.xfrm_policy_lock);
+
+	pr_info("rebuilding SPD hash table: thresholds (%u,%u)(%u,%u)\n",
+		lbits4, rbits4, lbits6, rbits6);
+
+	/* reset the bydst and inexact table in all directions */
+	for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
+		INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
+		hmask = net->xfrm.policy_bydst[dir].hmask;
+		odst = net->xfrm.policy_bydst[dir].table;
+		for (i = hmask; i >= 0; i--)
+			INIT_HLIST_HEAD(odst + i);
+		if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) {
+			/* dir out => dst = remote, src = local */
+			net->xfrm.policy_bydst[dir].dbits4 = rbits4;
+			net->xfrm.policy_bydst[dir].sbits4 = lbits4;
+			net->xfrm.policy_bydst[dir].dbits6 = rbits6;
+			net->xfrm.policy_bydst[dir].sbits6 = lbits6;
+		} else {
+			/* dir in/fwd => dst = local, src = remote */
+			net->xfrm.policy_bydst[dir].dbits4 = lbits4;
+			net->xfrm.policy_bydst[dir].sbits4 = rbits4;
+			net->xfrm.policy_bydst[dir].dbits6 = lbits6;
+			net->xfrm.policy_bydst[dir].sbits6 = rbits6;
+		}
+	}
+
+	/* re-insert all policies by order of creation */
+	list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
+		newpos = NULL;
+		chain = policy_hash_bysel(net, &policy->selector,
+					  policy->family,
+					  xfrm_policy_id2dir(policy->index));
+		hlist_for_each_entry(pol, chain, bydst) {
+			if (policy->priority >= pol->priority)
+				newpos = &pol->bydst;
+			else
+				break;
+		}
+		if (newpos)
+			hlist_add_after(newpos, &policy->bydst);
+		else
+			hlist_add_head(&policy->bydst, chain);
+	}
+
+	write_unlock_bh(&net->xfrm.xfrm_policy_lock);
+
+	mutex_unlock(&hash_resize_mutex);
+}
+
+void xfrm_policy_hash_rebuild(struct net *net)
+{
+	schedule_work(&net->xfrm.policy_hthresh.work);
+}
+EXPORT_SYMBOL(xfrm_policy_hash_rebuild);
+
 /* Generate new index... KAME seems to generate them ordered by cost
  * of an absolute inpredictability of ordering of rules. This will not pass. */
 static u32 xfrm_gen_index(struct net *net, int dir, u32 index)
@@ -2870,9 +2955,14 @@ static int __net_init xfrm_policy_init(struct net *net)
 		htab->dbits6 = 128;
 		htab->sbits6 = 128;
 	}
+	net->xfrm.policy_hthresh.lbits4 = 32;
+	net->xfrm.policy_hthresh.rbits4 = 32;
+	net->xfrm.policy_hthresh.lbits6 = 128;
+	net->xfrm.policy_hthresh.rbits6 = 128;
 
 	INIT_LIST_HEAD(&net->xfrm.policy_all);
 	INIT_WORK(&net->xfrm.policy_hash_work, xfrm_hash_resize);
+	INIT_WORK(&net->xfrm.policy_hthresh.work, xfrm_hash_rebuild);
 	if (net_eq(net, &init_net))
 		register_netdevice_notifier(&xfrm_dev_notifier);
 	return 0;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 412d9dc..a3549fa 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -965,7 +965,9 @@ static inline size_t xfrm_spdinfo_msgsize(void)
 {
 	return NLMSG_ALIGN(4)
 	       + nla_total_size(sizeof(struct xfrmu_spdinfo))
-	       + nla_total_size(sizeof(struct xfrmu_spdhinfo));
+	       + nla_total_size(sizeof(struct xfrmu_spdhinfo))
+	       + nla_total_size(sizeof(struct xfrmu_spdhthresh))
+	       + nla_total_size(sizeof(struct xfrmu_spdhthresh));
 }
 
 static int build_spdinfo(struct sk_buff *skb, struct net *net,
@@ -974,9 +976,11 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	struct xfrmk_spdinfo si;
 	struct xfrmu_spdinfo spc;
 	struct xfrmu_spdhinfo sph;
+	struct xfrmu_spdhthresh spt4, spt6;
 	struct nlmsghdr *nlh;
 	int err;
 	u32 *f;
+	unsigned lseq;
 
 	nlh = nlmsg_put(skb, portid, seq, XFRM_MSG_NEWSPDINFO, sizeof(u32), 0);
 	if (nlh == NULL) /* shouldn't really happen ... */
@@ -994,9 +998,22 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	sph.spdhcnt = si.spdhcnt;
 	sph.spdhmcnt = si.spdhmcnt;
 
+	do {
+		lseq = read_seqbegin(&net->xfrm.policy_hthresh.lock);
+
+		spt4.lbits = net->xfrm.policy_hthresh.lbits4;
+		spt4.rbits = net->xfrm.policy_hthresh.rbits4;
+		spt6.lbits = net->xfrm.policy_hthresh.lbits6;
+		spt6.rbits = net->xfrm.policy_hthresh.rbits6;
+	} while (read_seqretry(&net->xfrm.policy_hthresh.lock, lseq));
+
 	err = nla_put(skb, XFRMA_SPD_INFO, sizeof(spc), &spc);
 	if (!err)
 		err = nla_put(skb, XFRMA_SPD_HINFO, sizeof(sph), &sph);
+	if (!err)
+		err = nla_put(skb, XFRMA_SPD_IPV4_HTHRESH, sizeof(spt4), &spt4);
+	if (!err)
+		err = nla_put(skb, XFRMA_SPD_IPV6_HTHRESH, sizeof(spt6), &spt6);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -1005,6 +1022,62 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	return nlmsg_end(skb, nlh);
 }
 
+static int xfrm_set_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
+			    struct nlattr **attrs)
+{
+	struct net *net = sock_net(skb->sk);
+	struct sk_buff *r_skb;
+	u32 *flags = nlmsg_data(nlh);
+	u32 sportid = NETLINK_CB(skb).portid;
+	u32 seq = nlh->nlmsg_seq;
+	struct xfrmu_spdhthresh *thresh4 = NULL;
+	struct xfrmu_spdhthresh *thresh6 = NULL;
+
+	/* selector prefixlen thresholds to hash policies */
+	if (attrs[XFRMA_SPD_IPV4_HTHRESH]) {
+		struct nlattr *rta = attrs[XFRMA_SPD_IPV4_HTHRESH];
+
+		if (nla_len(rta) < sizeof(*thresh4))
+			return -EINVAL;
+		thresh4 = nla_data(rta);
+		if (thresh4->lbits > 32 || thresh4->rbits > 32)
+			return -EINVAL;
+	}
+	if (attrs[XFRMA_SPD_IPV6_HTHRESH]) {
+		struct nlattr *rta = attrs[XFRMA_SPD_IPV6_HTHRESH];
+
+		if (nla_len(rta) < sizeof(*thresh6))
+			return -EINVAL;
+		thresh6 = nla_data(rta);
+		if (thresh6->lbits > 128 || thresh6->rbits > 128)
+			return -EINVAL;
+	}
+
+	if (thresh4 || thresh6) {
+		write_seqlock(&net->xfrm.policy_hthresh.lock);
+		if (thresh4) {
+			net->xfrm.policy_hthresh.lbits4 = thresh4->lbits;
+			net->xfrm.policy_hthresh.rbits4 = thresh4->rbits;
+		}
+		if (thresh6) {
+			net->xfrm.policy_hthresh.lbits6 = thresh6->lbits;
+			net->xfrm.policy_hthresh.rbits6 = thresh6->rbits;
+		}
+		write_sequnlock(&net->xfrm.policy_hthresh.lock);
+
+		xfrm_policy_hash_rebuild(net);
+	}
+
+	r_skb = nlmsg_new(xfrm_spdinfo_msgsize(), GFP_ATOMIC);
+	if (r_skb == NULL)
+		return -ENOMEM;
+
+	if (build_spdinfo(r_skb, net, sportid, seq, *flags) < 0)
+		BUG();
+
+	return nlmsg_unicast(net->xfrm.nlsk, r_skb, sportid);
+}
+
 static int xfrm_get_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
 		struct nlattr **attrs)
 {
@@ -2275,6 +2348,7 @@ static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
 	[XFRM_MSG_REPORT      - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report),
 	[XFRM_MSG_MIGRATE     - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
 	[XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = sizeof(u32),
+	[XFRM_MSG_NEWSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
 	[XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
 };
 
@@ -2309,10 +2383,17 @@ static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
 	[XFRMA_ADDRESS_FILTER]	= { .len = sizeof(struct xfrm_address_filter) },
 };
 
+static const struct nla_policy xfrma_spd_policy[XFRMA_SPD_MAX+1] = {
+	[XFRMA_SPD_IPV4_HTHRESH] = { .len = sizeof(struct xfrmu_spdhthresh) },
+	[XFRMA_SPD_IPV6_HTHRESH] = { .len = sizeof(struct xfrmu_spdhthresh) },
+};
+
 static const struct xfrm_link {
 	int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
 	int (*dump)(struct sk_buff *, struct netlink_callback *);
 	int (*done)(struct netlink_callback *);
+	const struct nla_policy *nla_pol;
+	int nla_max;
 } xfrm_dispatch[XFRM_NR_MSGTYPES] = {
 	[XFRM_MSG_NEWSA       - XFRM_MSG_BASE] = { .doit = xfrm_add_sa        },
 	[XFRM_MSG_DELSA       - XFRM_MSG_BASE] = { .doit = xfrm_del_sa        },
@@ -2336,6 +2417,9 @@ static const struct xfrm_link {
 	[XFRM_MSG_GETAE       - XFRM_MSG_BASE] = { .doit = xfrm_get_ae  },
 	[XFRM_MSG_MIGRATE     - XFRM_MSG_BASE] = { .doit = xfrm_do_migrate    },
 	[XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = { .doit = xfrm_get_sadinfo   },
+	[XFRM_MSG_NEWSPDINFO  - XFRM_MSG_BASE] = { .doit = xfrm_set_spdinfo,
+						   .nla_pol = xfrma_spd_policy,
+						   .nla_max = XFRMA_SPD_MAX },
 	[XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = { .doit = xfrm_get_spdinfo   },
 };
 
@@ -2372,8 +2456,9 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		}
 	}
 
-	err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs, XFRMA_MAX,
-			  xfrma_policy);
+	err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs,
+			  link->nla_max ? : XFRMA_MAX,
+			  link->nla_pol ? : xfrma_policy);
 	if (err < 0)
 		return err;
 
-- 
1.9.1


* [PATCH RFC iproute2 0/2] ipxfrm: configuration of SPD hash
  2014-08-01  9:12   ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
@ 2014-08-01 13:01     ` Christophe Gouault
  2014-08-01 13:01       ` [PATCH RFC iproute2 1/2] Update headers to net-next Christophe Gouault
  2014-08-01 13:01       ` [PATCH RFC iproute2 2/2] ipxfrm: add command for configuring SPD hash table Christophe Gouault
  2014-08-21  6:09     ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Steffen Klassert
  1 sibling, 2 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-01 13:01 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert; +Cc: netdev

This patchset is provided in order to test the kernel patchset
"[net-next v2 0/2] xfrm: scalability enhancements for policy database"
for those who would like to play with these new knobs.

Please note that I will be on vacation starting next week, so I will
not be very responsive to comments during August.

Best Regards,
Christophe
---
 include/linux/xfrm.h |   7 +++++
 ip/xfrm_policy.c     | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 109 insertions(+), 4 deletions(-)


* [PATCH RFC iproute2 1/2] Update headers to net-next
  2014-08-01 13:01     ` [PATCH RFC iproute2 0/2] ipxfrm: configuration of SPD hash Christophe Gouault
@ 2014-08-01 13:01       ` Christophe Gouault
  2014-08-01 13:01       ` [PATCH RFC iproute2 2/2] ipxfrm: add command for configuring SPD hash table Christophe Gouault
  1 sibling, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-01 13:01 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert; +Cc: netdev, Christophe Gouault

---
 include/linux/xfrm.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
index fa2ecb2..3a1fd32 100644
--- a/include/linux/xfrm.h
+++ b/include/linux/xfrm.h
@@ -328,6 +328,8 @@ enum xfrm_spdattr_type_t {
 	XFRMA_SPD_UNSPEC,
 	XFRMA_SPD_INFO,
 	XFRMA_SPD_HINFO,
+	XFRMA_SPD_IPV4_HTHRESH,
+	XFRMA_SPD_IPV6_HTHRESH,
 	__XFRMA_SPD_MAX
 
 #define XFRMA_SPD_MAX (__XFRMA_SPD_MAX - 1)
@@ -347,6 +349,11 @@ struct xfrmu_spdhinfo {
 	__u32 spdhmcnt;
 };
 
+struct xfrmu_spdhthresh {
+	__u8 lbits;
+	__u8 rbits;
+};
+
 struct xfrm_usersa_info {
 	struct xfrm_selector		sel;
 	struct xfrm_id			id;
-- 
1.9.1


* [PATCH RFC iproute2 2/2] ipxfrm: add command for configuring SPD hash table
  2014-08-01 13:01     ` [PATCH RFC iproute2 0/2] ipxfrm: configuration of SPD hash Christophe Gouault
  2014-08-01 13:01       ` [PATCH RFC iproute2 1/2] Update headers to net-next Christophe Gouault
@ 2014-08-01 13:01       ` Christophe Gouault
  1 sibling, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-01 13:01 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert; +Cc: netdev, Christophe Gouault

Add a new command to configure the SPD hash table:
   ip xfrm policy set [ hthresh4 LBITS RBITS ] [ hthresh6 LBITS RBITS ]

hthresh4: defines minimum local and remote IPv4 prefix lengths of
selectors to hash a policy. If prefix lengths are greater than or equal
to the thresholds, then the policy is hashed, otherwise it falls back
to the policy_inexact linked list.

hthresh6: defines minimum local and remote IPv6 prefix lengths of
selectors to hash a policy. If prefix lengths are greater than or equal
to the thresholds, then the policy is hashed, otherwise it falls back
to the policy_inexact linked list.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
 ip/xfrm_policy.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 102 insertions(+), 4 deletions(-)

diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index 2337d35..dbca894 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -63,7 +63,8 @@ static void usage(void)
 	fprintf(stderr, "        [ index INDEX ] [ ptype PTYPE ] [ action ACTION ] [ priority PRIORITY ]\n");
 	fprintf(stderr, "        [ flag FLAG-LIST ]\n");
 	fprintf(stderr, "Usage: ip xfrm policy flush [ ptype PTYPE ]\n");
-	fprintf(stderr, "Usage: ip xfrm count\n");
+	fprintf(stderr, "Usage: ip xfrm policy count\n");
+	fprintf(stderr, "Usage: ip xfrm policy set [ hthresh4 LBITS RBITS ] [ hthresh6 LBITS RBITS ]\n");
 	fprintf(stderr, "SELECTOR := [ src ADDR[/PLEN] ] [ dst ADDR[/PLEN] ] [ dev DEV ] [ UPSPEC ]\n");
 	fprintf(stderr, "UPSPEC := proto { { ");
 	fprintf(stderr, "%s | ", strxf_proto(IPPROTO_TCP));
@@ -933,9 +934,9 @@ static int print_spdinfo( struct nlmsghdr *n, void *arg)
 			fprintf(fp," FWD %d", si->fwdscnt);
 			fprintf(fp,")");
 		}
-
-		fprintf(fp,"\n");
 	}
+	fprintf(fp,"\n");
+
 	if (show_stats > 1) {
 		struct xfrmu_spdhinfo *sh;
 
@@ -948,13 +949,108 @@ static int print_spdinfo( struct nlmsghdr *n, void *arg)
 			fprintf(fp,"\t SPD buckets:");
 			fprintf(fp," count %d", sh->spdhcnt);
 			fprintf(fp," Max %d", sh->spdhmcnt);
+			fprintf(fp,"\n");
+		}
+		if (tb[XFRMA_SPD_IPV4_HTHRESH]) {
+			struct xfrmu_spdhthresh *th;
+			if (RTA_PAYLOAD(tb[XFRMA_SPD_IPV4_HTHRESH]) < sizeof(*th)) {
+				fprintf(stderr, "SPDinfo: Wrong len %d\n", len);
+				return -1;
+			}
+			th = RTA_DATA(tb[XFRMA_SPD_IPV4_HTHRESH]);
+			fprintf(fp,"\t SPD IPv4 thresholds:");
+			fprintf(fp," local %d", th->lbits);
+			fprintf(fp," remote %d", th->rbits);
+			fprintf(fp,"\n");
+
+		}
+		if (tb[XFRMA_SPD_IPV6_HTHRESH]) {
+			struct xfrmu_spdhthresh *th;
+			if (RTA_PAYLOAD(tb[XFRMA_SPD_IPV6_HTHRESH]) < sizeof(*th)) {
+				fprintf(stderr, "SPDinfo: Wrong len %d\n", len);
+				return -1;
+			}
+			th = RTA_DATA(tb[XFRMA_SPD_IPV6_HTHRESH]);
+			fprintf(fp,"\t SPD IPv6 thresholds:");
+			fprintf(fp," local %d", th->lbits);
+			fprintf(fp," remote %d", th->rbits);
+			fprintf(fp,"\n");
 		}
 	}
-	fprintf(fp,"\n");
 
         return 0;
 }
 
+static int xfrm_spd_setinfo(int argc, char **argv)
+{
+	struct rtnl_handle rth;
+	struct {
+		struct nlmsghdr			n;
+		__u32				flags;
+		char				buf[RTA_BUF_SIZE];
+	} req;
+
+	char *thr4 = NULL;
+	char *thr6 = NULL;
+
+	memset(&req, 0, sizeof(req));
+
+	req.n.nlmsg_len = NLMSG_LENGTH(sizeof(__u32));
+	req.n.nlmsg_flags = NLM_F_REQUEST;
+	req.n.nlmsg_type = XFRM_MSG_NEWSPDINFO;
+	req.flags = 0XFFFFFFFF;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "hthresh4") == 0) {
+			struct xfrmu_spdhthresh thr;
+
+			if (thr4)
+				duparg("hthresh4", *argv);
+			thr4 = *argv;
+			NEXT_ARG();
+			if (get_u8(&thr.lbits, *argv, 0) || thr.lbits > 32)
+				invarg("hthresh4 LBITS value is invalid", *argv);
+			NEXT_ARG();
+			if (get_u8(&thr.rbits, *argv, 0) || thr.rbits > 32)
+				invarg("hthresh4 RBITS value is invalid", *argv);
+
+			addattr_l(&req.n, sizeof(req), XFRMA_SPD_IPV4_HTHRESH,
+				  (void *)&thr, sizeof(thr));
+		} else if (strcmp(*argv, "hthresh6") == 0) {
+			struct xfrmu_spdhthresh thr;
+
+			if (thr6)
+				duparg("hthresh6", *argv);
+			thr6 = *argv;
+			NEXT_ARG();
+			if (get_u8(&thr.lbits, *argv, 0) || thr.lbits > 128)
+				invarg("hthresh6 LBITS value is invalid", *argv);
+			NEXT_ARG();
+			if (get_u8(&thr.rbits, *argv, 0) || thr.rbits > 128)
+				invarg("hthresh6 RBITS value is invalid", *argv);
+
+			addattr_l(&req.n, sizeof(req), XFRMA_SPD_IPV6_HTHRESH,
+				  (void *)&thr, sizeof(thr));
+		} else {
+			invarg("unknown", *argv);
+		}
+
+		argc--; argv++;
+	}
+
+	if (rtnl_open_byproto(&rth, 0, NETLINK_XFRM) < 0)
+		exit(1);
+
+	if (rtnl_talk(&rth, &req.n, 0, 0, &req.n) < 0)
+		exit(2);
+
+	print_spdinfo(&req.n, (void*)stdout);
+
+	rtnl_close(&rth);
+
+	return 0;
+}
+
 static int xfrm_spd_getinfo(int argc, char **argv)
 {
 	struct rtnl_handle rth;
@@ -1058,6 +1154,8 @@ int do_xfrm_policy(int argc, char **argv)
 		return xfrm_policy_flush(argc-1, argv+1);
 	if (matches(*argv, "count") == 0)
 		return xfrm_spd_getinfo(argc, argv);
+	if (matches(*argv, "set") == 0)
+		return xfrm_spd_setinfo(argc-1, argv+1);
 	if (matches(*argv, "help") == 0)
 		usage();
 	fprintf(stderr, "Command \"%s\" is unknown, try \"ip xfrm policy help\".\n", *argv);
-- 
1.9.1


* Re: [PATCH net-next v2 0/2] xfrm: scalability enhancements for policy database
  2014-08-01  9:12 ` [PATCH net-next v2 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  2014-08-01  9:12   ` [PATCH net-next v2 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
  2014-08-01  9:12   ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
@ 2014-08-04 22:09   ` David Miller
  2 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2014-08-04 22:09 UTC (permalink / raw)
  To: christophe.gouault; +Cc: steffen.klassert, netdev

From: Christophe Gouault <christophe.gouault@6wind.com>
Date: Fri,  1 Aug 2014 11:12:26 +0200

> This patchset enables to hash more policies than just non-prefixed
> ones: hash policies whose prefix lengths are greater or equal to
> configurable thresholds.
> 
> These thresholds are configured via netlink message
> XFRM_MSG_NEWSPDINFO, attributes XFRMA_SPD_IPV4_HTHRESH and
> XFRMA_SPD_IPV6_HTHRESH.
> 
> The related iproute2 patch for configuring the thresholds is available
> on demand.

Since this is not urgent, and a new feature, I'll let Steffen review this
when he gets back.


* Re: [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink
  2014-08-01  9:12   ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
  2014-08-01 13:01     ` [PATCH RFC iproute2 0/2] ipxfrm: configuration of SPD hash Christophe Gouault
@ 2014-08-21  6:09     ` Steffen Klassert
  2014-08-26  7:27       ` Christophe Gouault
  2014-08-27 15:48       ` [PATCH ipsec-next v3 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  1 sibling, 2 replies; 27+ messages in thread
From: Steffen Klassert @ 2014-08-21  6:09 UTC (permalink / raw)
  To: Christophe Gouault; +Cc: David S. Miller, netdev

On Fri, Aug 01, 2014 at 11:12:28AM +0200, Christophe Gouault wrote:
> diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
> index 41902a8..9da7982 100644
> --- a/include/net/netns/xfrm.h
> +++ b/include/net/netns/xfrm.h
> @@ -19,6 +19,15 @@ struct xfrm_policy_hash {
>  	u8			sbits6;
>  };
>  
> +struct xfrm_policy_hthresh {
> +	struct work_struct	work;
> +	seqlock_t		lock;

This newly introduced lock is not initialized. It triggers an
inconsistent lock state warning when acquired for the first time.

>  
> +static void xfrm_hash_rebuild(struct work_struct *work)
> +{
> +	struct net *net = container_of(work, struct net,
> +				       xfrm.policy_hthresh.work);
> +	unsigned int hmask;
> +	struct xfrm_policy *pol;
> +	struct xfrm_policy *policy;
> +	struct hlist_head *chain;
> +	struct hlist_head *odst;
> +	struct hlist_node *newpos;
> +	int i;
> +	int dir;
> +	unsigned seq;
> +	u8 lbits4, rbits4, lbits6, rbits6;
> +
> +	mutex_lock(&hash_resize_mutex);
> +
> +	/* read selector prefixlen thresholds */
> +	do {
> +		seq = read_seqbegin(&net->xfrm.policy_hthresh.lock);
> +
> +		lbits4 = net->xfrm.policy_hthresh.lbits4;
> +		rbits4 = net->xfrm.policy_hthresh.rbits4;
> +		lbits6 = net->xfrm.policy_hthresh.lbits6;
> +		rbits6 = net->xfrm.policy_hthresh.rbits6;
> +	} while (read_seqretry(&net->xfrm.policy_hthresh.lock, seq));
> +
> +	write_lock_bh(&net->xfrm.xfrm_policy_lock);
> +
> +	pr_info("rebuilding SPD hash table: thresholds (%u,%u)(%u,%u)\n",
> +		lbits4, rbits4, lbits6, rbits6);

Do we really need to print this?

> +
> +	/* reset the bydst and inexact table in all directions */
> +	for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
> +		INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
> +		hmask = net->xfrm.policy_bydst[dir].hmask;
> +		odst = net->xfrm.policy_bydst[dir].table;
> +		for (i = hmask; i >= 0; i--)
> +			INIT_HLIST_HEAD(odst + i);
> +		if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) {
> +			/* dir out => dst = remote, src = local */
> +			net->xfrm.policy_bydst[dir].dbits4 = rbits4;
> +			net->xfrm.policy_bydst[dir].sbits4 = lbits4;
> +			net->xfrm.policy_bydst[dir].dbits6 = rbits6;
> +			net->xfrm.policy_bydst[dir].sbits6 = lbits6;
> +		} else {
> +			/* dir in/fwd => dst = local, src = remote */
> +			net->xfrm.policy_bydst[dir].dbits4 = lbits4;
> +			net->xfrm.policy_bydst[dir].sbits4 = rbits4;
> +			net->xfrm.policy_bydst[dir].dbits6 = lbits6;
> +			net->xfrm.policy_bydst[dir].sbits6 = rbits6;
> +		}
> +	}
> +
> +	/* re-insert all policies by order of creation */
> +	list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
> +		newpos = NULL;
> +		chain = policy_hash_bysel(net, &policy->selector,
> +					  policy->family,
> +					  xfrm_policy_id2dir(policy->index));
> +		hlist_for_each_entry(pol, chain, bydst) {
> +			if (policy->priority >= pol->priority)
> +				newpos = &pol->bydst;
> +			else
> +				break;
> +		}
> +		if (newpos)
> +			hlist_add_after(newpos, &policy->bydst);

hlist_add_after() does not exist any more, it was replaced by
hlist_add_behind() recently.

>  
> +static int xfrm_set_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
> +			    struct nlattr **attrs)
> +{
> +	struct net *net = sock_net(skb->sk);
> +	struct sk_buff *r_skb;
> +	u32 *flags = nlmsg_data(nlh);
> +	u32 sportid = NETLINK_CB(skb).portid;
> +	u32 seq = nlh->nlmsg_seq;
> +	struct xfrmu_spdhthresh *thresh4 = NULL;
> +	struct xfrmu_spdhthresh *thresh6 = NULL;
> +
> +	/* selector prefixlen thresholds to hash policies */
> +	if (attrs[XFRMA_SPD_IPV4_HTHRESH]) {
> +		struct nlattr *rta = attrs[XFRMA_SPD_IPV4_HTHRESH];
> +
> +		if (nla_len(rta) < sizeof(*thresh4))
> +			return -EINVAL;
> +		thresh4 = nla_data(rta);
> +		if (thresh4->lbits > 32 || thresh4->rbits > 32)
> +			return -EINVAL;
> +	}
> +	if (attrs[XFRMA_SPD_IPV6_HTHRESH]) {
> +		struct nlattr *rta = attrs[XFRMA_SPD_IPV6_HTHRESH];
> +
> +		if (nla_len(rta) < sizeof(*thresh6))
> +			return -EINVAL;
> +		thresh6 = nla_data(rta);
> +		if (thresh6->lbits > 128 || thresh6->rbits > 128)
> +			return -EINVAL;
> +	}
> +
> +	if (thresh4 || thresh6) {
> +		write_seqlock(&net->xfrm.policy_hthresh.lock);
> +		if (thresh4) {
> +			net->xfrm.policy_hthresh.lbits4 = thresh4->lbits;
> +			net->xfrm.policy_hthresh.rbits4 = thresh4->rbits;
> +		}
> +		if (thresh6) {
> +			net->xfrm.policy_hthresh.lbits6 = thresh6->lbits;
> +			net->xfrm.policy_hthresh.rbits6 = thresh6->rbits;
> +		}
> +		write_sequnlock(&net->xfrm.policy_hthresh.lock);
> +
> +		xfrm_policy_hash_rebuild(net);
> +	}
> +
> +	r_skb = nlmsg_new(xfrm_spdinfo_msgsize(), GFP_ATOMIC);
> +	if (r_skb == NULL)
> +		return -ENOMEM;
> +
> +	if (build_spdinfo(r_skb, net, sportid, seq, *flags) < 0)
> +		BUG();
> +
> +	return nlmsg_unicast(net->xfrm.nlsk, r_skb, sportid);

Why do you send this information to userspace? This is a set
operation, not a get.


The rest looks quite good, thanks!


* Re: [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink
  2014-08-21  6:09     ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Steffen Klassert
@ 2014-08-26  7:27       ` Christophe Gouault
  2014-08-27 15:48       ` [PATCH ipsec-next v3 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  1 sibling, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-26  7:27 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev

2014-08-21 8:09 GMT+02:00 Steffen Klassert <steffen.klassert@secunet.com>:
> On Fri, Aug 01, 2014 at 11:12:28AM +0200, Christophe Gouault wrote:
>> diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
>> index 41902a8..9da7982 100644
>> --- a/include/net/netns/xfrm.h
>> +++ b/include/net/netns/xfrm.h
>> @@ -19,6 +19,15 @@ struct xfrm_policy_hash {
>>       u8                      sbits6;
>>  };
>>
>> +struct xfrm_policy_hthresh {
>> +     struct work_struct      work;
>> +     seqlock_t               lock;
>
> This newly introduced lock is not initialized. It triggers an
> inconsistent lock state warning when acquired for the first time.

oops! I'll fix that.

>> +     pr_info("rebuilding SPD hash table: thresholds (%u,%u)(%u,%u)\n",
>> +             lbits4, rbits4, lbits6, rbits6);
>
> Do we really need to print this?

No, it's not necessary, I will remove it.

>> +             hlist_for_each_entry(pol, chain, bydst) {
>> +                     if (policy->priority >= pol->priority)
>> +                             newpos = &pol->bydst;
>> +                     else
>> +                             break;
>> +             }
>> +             if (newpos)
>> +                     hlist_add_after(newpos, &policy->bydst);
>
> hlist_add_after() does not exist any more, it was replaced by
> hlist_add_behind() recently.

OK, I'll update the code accordingly.

>> +static int xfrm_set_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
>> +                         struct nlattr **attrs)
>> +{
>> +     struct net *net = sock_net(skb->sk);
>> +     struct sk_buff *r_skb;
>> +     u32 *flags = nlmsg_data(nlh);
>> +     u32 sportid = NETLINK_CB(skb).portid;
>> +     u32 seq = nlh->nlmsg_seq;
>> +     struct xfrmu_spdhthresh *thresh4 = NULL;
>> +     struct xfrmu_spdhthresh *thresh6 = NULL;
>> +
>> +     /* selector prefixlen thresholds to hash policies */
>> +     if (attrs[XFRMA_SPD_IPV4_HTHRESH]) {
>> +             struct nlattr *rta = attrs[XFRMA_SPD_IPV4_HTHRESH];
>> +
>> +             if (nla_len(rta) < sizeof(*thresh4))
>> +                     return -EINVAL;
>> +             thresh4 = nla_data(rta);
>> +             if (thresh4->lbits > 32 || thresh4->rbits > 32)
>> +                     return -EINVAL;
>> +     }
>> +     if (attrs[XFRMA_SPD_IPV6_HTHRESH]) {
>> +             struct nlattr *rta = attrs[XFRMA_SPD_IPV6_HTHRESH];
>> +
>> +             if (nla_len(rta) < sizeof(*thresh6))
>> +                     return -EINVAL;
>> +             thresh6 = nla_data(rta);
>> +             if (thresh6->lbits > 128 || thresh6->rbits > 128)
>> +                     return -EINVAL;
>> +     }
>> +
>> +     if (thresh4 || thresh6) {
>> +             write_seqlock(&net->xfrm.policy_hthresh.lock);
>> +             if (thresh4) {
>> +                     net->xfrm.policy_hthresh.lbits4 = thresh4->lbits;
>> +                     net->xfrm.policy_hthresh.rbits4 = thresh4->rbits;
>> +             }
>> +             if (thresh6) {
>> +                     net->xfrm.policy_hthresh.lbits6 = thresh6->lbits;
>> +                     net->xfrm.policy_hthresh.rbits6 = thresh6->rbits;
>> +             }
>> +             write_sequnlock(&net->xfrm.policy_hthresh.lock);
>> +
>> +             xfrm_policy_hash_rebuild(net);
>> +     }
>> +
>> +     r_skb = nlmsg_new(xfrm_spdinfo_msgsize(), GFP_ATOMIC);
>> +     if (r_skb == NULL)
>> +             return -ENOMEM;
>> +
>> +     if (build_spdinfo(r_skb, net, sportid, seq, *flags) < 0)
>> +             BUG();
>> +
>> +     return nlmsg_unicast(net->xfrm.nlsk, r_skb, sportid);
>
> Why do you send this information to userspace? This is a set
> operation, not a get.

You're right, I'll remove this reply message.

> The rest looks quite good, thanks!

Thanks. I'll send an update.

Christophe


* [PATCH ipsec-next v3 0/2] xfrm: scalability enhancements for policy database
  2014-08-21  6:09     ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Steffen Klassert
  2014-08-26  7:27       ` Christophe Gouault
@ 2014-08-27 15:48       ` Christophe Gouault
  2014-08-27 15:48         ` [PATCH ipsec-next v3 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
  2014-08-27 15:48         ` [PATCH ipsec-next v3 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
  1 sibling, 2 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-27 15:48 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev

This patchset makes it possible to hash more policies than just
non-prefixed ones: policies whose prefix lengths are greater than or
equal to configurable thresholds are hashed as well.

These thresholds are configured via netlink message
XFRM_MSG_NEWSPDINFO, attributes XFRMA_SPD_IPV4_HTHRESH and
XFRMA_SPD_IPV6_HTHRESH.

The related iproute2 patch for configuring the thresholds is available
on demand.

Best Regards,
Christophe
----
v2:
- change configuration API from proc to netlink
v3:
- initialize xfrm_policy_hthresh lock
- remove "rebuilding SPD hash table" log
- replace deprecated hlist_add_after by hlist_add_behind
- remove netlink reply to XFRM_MSG_NEWSPDINFO request
---
 include/net/netns/xfrm.h  |  14 +++++++
 include/net/xfrm.h        |   1 +
 include/uapi/linux/xfrm.h |   7 ++++
 net/xfrm/xfrm_hash.h      |  76 +++++++++++++++++++++++++++++++-----
 net/xfrm/xfrm_policy.c    | 140 +++++++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c      |  83 +++++++++++++++++++++++++++++++++++++--
 6 files changed, 302 insertions(+), 19 deletions(-)


* [PATCH ipsec-next v3 1/2] xfrm: hash prefixed policies based on preflen thresholds
  2014-08-27 15:48       ` [PATCH ipsec-next v3 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
@ 2014-08-27 15:48         ` Christophe Gouault
  2014-08-27 15:48         ` [PATCH ipsec-next v3 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
  1 sibling, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-27 15:48 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev, Christophe Gouault

The idea is an extension of the current policy hashing.

Today only non-prefixed policies are stored in a hash table. This
patch relaxes the constraints, and hashes policies whose prefix
lengths are greater than or equal to a configurable threshold.

Each hash table (one per direction) maintains its own set of IPv4 and
IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32,
128, 128).

Example: if the output hash table is configured with thresholds
sbits4 = 16, dbits4 = 24, sbits6 = 56 and dbits6 = 64:

ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed

ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ...    => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ...  => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ...    => unhashed

The high order bits of the addresses (up to the threshold) are used to
compute the hash key.
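
The IPv4 side of this rule can be sketched as a standalone program
fragment. This is a simplified illustration, not the patch itself:
the helper names are made up for the example, addresses are in host
byte order, and the actual hash function (jhash in the patch) is left
out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the `bits` high-order bits set, for 0 <= bits <= 32. */
static uint32_t bits2mask32(uint8_t bits)
{
	if (bits == 0)
		return 0;
	if (bits >= 32)
		return 0xffffffffu;
	return 0xffffffffu << (32 - bits);
}

/*
 * A policy goes to the hash table only if both selector prefix
 * lengths reach the thresholds; otherwise it stays in the inexact
 * list.
 */
static bool sel_is_hashed(uint8_t prefixlen_d, uint8_t prefixlen_s,
			  uint8_t dbits, uint8_t sbits)
{
	return prefixlen_d >= dbits && prefixlen_s >= sbits;
}

/* Only the high-order `bits` bits of an address feed the hash key. */
static uint32_t masked_key(uint32_t addr, uint8_t bits)
{
	return addr & bits2mask32(bits);
}
```

For instance, with dbits = 24 the destinations 10.24.1.1 and
10.24.1.200 both reduce to 10.24.1.0 before hashing, so a /24 policy
and the packets it covers land in the same bucket.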

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
 include/net/netns/xfrm.h |  4 +++
 net/xfrm/xfrm_hash.h     | 76 +++++++++++++++++++++++++++++++++++++++++-------
 net/xfrm/xfrm_policy.c   | 53 +++++++++++++++++++++++++++++----
 3 files changed, 117 insertions(+), 16 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 3492434..41902a8 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -13,6 +13,10 @@ struct ctl_table_header;
 struct xfrm_policy_hash {
 	struct hlist_head	*table;
 	unsigned int		hmask;
+	u8			dbits4;
+	u8			sbits4;
+	u8			dbits6;
+	u8			sbits6;
 };
 
 struct netns_xfrm {
diff --git a/net/xfrm/xfrm_hash.h b/net/xfrm/xfrm_hash.h
index 0622d31..666c5ff 100644
--- a/net/xfrm/xfrm_hash.h
+++ b/net/xfrm/xfrm_hash.h
@@ -3,6 +3,7 @@
 
 #include <linux/xfrm.h>
 #include <linux/socket.h>
+#include <linux/jhash.h>
 
 static inline unsigned int __xfrm4_addr_hash(const xfrm_address_t *addr)
 {
@@ -28,6 +29,58 @@ static inline unsigned int __xfrm6_daddr_saddr_hash(const xfrm_address_t *daddr,
 		     saddr->a6[2] ^ saddr->a6[3]);
 }
 
+static inline u32 __bits2mask32(__u8 bits)
+{
+	u32 mask32 = 0xffffffff;
+
+	if (bits == 0)
+		mask32 = 0;
+	else if (bits < 32)
+		mask32 <<= (32 - bits);
+
+	return mask32;
+}
+
+static inline unsigned int __xfrm4_dpref_spref_hash(const xfrm_address_t *daddr,
+						    const xfrm_address_t *saddr,
+						    __u8 dbits,
+						    __u8 sbits)
+{
+	return jhash_2words(ntohl(daddr->a4) & __bits2mask32(dbits),
+			    ntohl(saddr->a4) & __bits2mask32(sbits),
+			    0);
+}
+
+static inline unsigned int __xfrm6_pref_hash(const xfrm_address_t *addr,
+					     __u8 prefixlen)
+{
+	int pdw;
+	int pbi;
+	u32 initval = 0;
+
+	pdw = prefixlen >> 5;     /* num of whole u32 in prefix */
+	pbi = prefixlen &  0x1f;  /* num of bits in incomplete u32 in prefix */
+
+	if (pbi) {
+		__be32 mask;
+
+		mask = htonl((0xffffffff) << (32 - pbi));
+
+		initval = (__force u32)(addr->a6[pdw] & mask);
+	}
+
+	return jhash2((__force u32 *)addr->a6, pdw, initval);
+}
+
+static inline unsigned int __xfrm6_dpref_spref_hash(const xfrm_address_t *daddr,
+						    const xfrm_address_t *saddr,
+						    __u8 dbits,
+						    __u8 sbits)
+{
+	return __xfrm6_pref_hash(daddr, dbits) ^
+	       __xfrm6_pref_hash(saddr, sbits);
+}
+
 static inline unsigned int __xfrm_dst_hash(const xfrm_address_t *daddr,
 					   const xfrm_address_t *saddr,
 					   u32 reqid, unsigned short family,
@@ -84,7 +137,8 @@ static inline unsigned int __idx_hash(u32 index, unsigned int hmask)
 }
 
 static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
-				      unsigned short family, unsigned int hmask)
+				      unsigned short family, unsigned int hmask,
+				      u8 dbits, u8 sbits)
 {
 	const xfrm_address_t *daddr = &sel->daddr;
 	const xfrm_address_t *saddr = &sel->saddr;
@@ -92,19 +146,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
 
 	switch (family) {
 	case AF_INET:
-		if (sel->prefixlen_d != 32 ||
-		    sel->prefixlen_s != 32)
+		if (sel->prefixlen_d < dbits ||
+		    sel->prefixlen_s < sbits)
 			return hmask + 1;
 
-		h = __xfrm4_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 
 	case AF_INET6:
-		if (sel->prefixlen_d != 128 ||
-		    sel->prefixlen_s != 128)
+		if (sel->prefixlen_d < dbits ||
+		    sel->prefixlen_s < sbits)
 			return hmask + 1;
 
-		h = __xfrm6_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 	}
 	h ^= (h >> 16);
@@ -113,17 +167,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
 
 static inline unsigned int __addr_hash(const xfrm_address_t *daddr,
 				       const xfrm_address_t *saddr,
-				       unsigned short family, unsigned int hmask)
+				       unsigned short family,
+				       unsigned int hmask,
+				       u8 dbits, u8 sbits)
 {
 	unsigned int h = 0;
 
 	switch (family) {
 	case AF_INET:
-		h = __xfrm4_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 
 	case AF_INET6:
-		h = __xfrm6_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 	}
 	h ^= (h >> 16);
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index beeed60..e6ff7b4 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -344,12 +344,39 @@ static inline unsigned int idx_hash(struct net *net, u32 index)
 	return __idx_hash(index, net->xfrm.policy_idx_hmask);
 }
 
+/* calculate policy hash thresholds */
+static void __get_hash_thresh(struct net *net,
+			      unsigned short family, int dir,
+			      u8 *dbits, u8 *sbits)
+{
+	switch (family) {
+	case AF_INET:
+		*dbits = net->xfrm.policy_bydst[dir].dbits4;
+		*sbits = net->xfrm.policy_bydst[dir].sbits4;
+		break;
+
+	case AF_INET6:
+		*dbits = net->xfrm.policy_bydst[dir].dbits6;
+		*sbits = net->xfrm.policy_bydst[dir].sbits6;
+		break;
+
+	default:
+		*dbits = 0;
+		*sbits = 0;
+	}
+}
+
 static struct hlist_head *policy_hash_bysel(struct net *net,
 					    const struct xfrm_selector *sel,
 					    unsigned short family, int dir)
 {
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
-	unsigned int hash = __sel_hash(sel, family, hmask);
+	unsigned int hash;
+	u8 dbits;
+	u8 sbits;
+
+	__get_hash_thresh(net, family, dir, &dbits, &sbits);
+	hash = __sel_hash(sel, family, hmask, dbits, sbits);
 
 	return (hash == hmask + 1 ?
 		&net->xfrm.policy_inexact[dir] :
@@ -362,25 +389,35 @@ static struct hlist_head *policy_hash_direct(struct net *net,
 					     unsigned short family, int dir)
 {
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
-	unsigned int hash = __addr_hash(daddr, saddr, family, hmask);
+	unsigned int hash;
+	u8 dbits;
+	u8 sbits;
+
+	__get_hash_thresh(net, family, dir, &dbits, &sbits);
+	hash = __addr_hash(daddr, saddr, family, hmask, dbits, sbits);
 
 	return net->xfrm.policy_bydst[dir].table + hash;
 }
 
-static void xfrm_dst_hash_transfer(struct hlist_head *list,
+static void xfrm_dst_hash_transfer(struct net *net,
+				   struct hlist_head *list,
 				   struct hlist_head *ndsttable,
-				   unsigned int nhashmask)
+				   unsigned int nhashmask,
+				   int dir)
 {
 	struct hlist_node *tmp, *entry0 = NULL;
 	struct xfrm_policy *pol;
 	unsigned int h0 = 0;
+	u8 dbits;
+	u8 sbits;
 
 redo:
 	hlist_for_each_entry_safe(pol, tmp, list, bydst) {
 		unsigned int h;
 
+		__get_hash_thresh(net, pol->family, dir, &dbits, &sbits);
 		h = __addr_hash(&pol->selector.daddr, &pol->selector.saddr,
-				pol->family, nhashmask);
+				pol->family, nhashmask, dbits, sbits);
 		if (!entry0) {
 			hlist_del(&pol->bydst);
 			hlist_add_head(&pol->bydst, ndsttable+h);
@@ -434,7 +471,7 @@ static void xfrm_bydst_resize(struct net *net, int dir)
 	write_lock_bh(&net->xfrm.xfrm_policy_lock);
 
 	for (i = hmask; i >= 0; i--)
-		xfrm_dst_hash_transfer(odst + i, ndst, nhashmask);
+		xfrm_dst_hash_transfer(net, odst + i, ndst, nhashmask, dir);
 
 	net->xfrm.policy_bydst[dir].table = ndst;
 	net->xfrm.policy_bydst[dir].hmask = nhashmask;
@@ -2830,6 +2867,10 @@ static int __net_init xfrm_policy_init(struct net *net)
 		if (!htab->table)
 			goto out_bydst;
 		htab->hmask = hmask;
+		htab->dbits4 = 32;
+		htab->sbits4 = 32;
+		htab->dbits6 = 128;
+		htab->sbits6 = 128;
 	}
 
 	INIT_LIST_HEAD(&net->xfrm.policy_all);
-- 
1.9.1


* [PATCH ipsec-next v3 2/2] xfrm: configure policy hash table thresholds by netlink
  2014-08-27 15:48       ` [PATCH ipsec-next v3 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  2014-08-27 15:48         ` [PATCH ipsec-next v3 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
@ 2014-08-27 15:48         ` Christophe Gouault
  2014-08-29  9:54           ` Steffen Klassert
  1 sibling, 1 reply; 27+ messages in thread
From: Christophe Gouault @ 2014-08-27 15:48 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev, Christophe Gouault

Allow specifying local and remote prefix length thresholds for the
policy hash table via a netlink XFRM_MSG_NEWSPDINFO message.

Prefix length thresholds are specified by the optional attributes
XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH (struct
xfrmu_spdhthresh).

Example (using libnl):

    struct xfrmu_spdhthresh thresh4 = {
        .lbits = 0,
        .rbits = 24,
    };
    struct xfrmu_spdhthresh thresh6 = {
        .lbits = 0,
        .rbits = 56,
    };
    struct nlmsghdr *hdr;
    struct nl_msg *msg;

    msg = nlmsg_alloc();
    /* the message type is XFRM_MSG_NEWSPDINFO; payload is a __u32 of flags */
    hdr = nlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, XFRM_MSG_NEWSPDINFO,
                    sizeof(__u32), NLM_F_REQUEST);
    nla_put(msg, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh4), &thresh4);
    nla_put(msg, XFRMA_SPD_IPV6_HTHRESH, sizeof(thresh6), &thresh6);
    nl_send_auto(sk, msg);

The numbers are the minimum selector prefix lengths required for a
policy to be put in the hash table.

- lbits is the local threshold (source address for out policies,
  destination address for in and fwd policies).

- rbits is the remote threshold (destination address for out
  policies, source address for in and fwd policies).
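
The local/remote convention above can be sketched as a small mapping
function. This is an illustration only: the enum and struct names are
hypothetical stand-ins, not the kernel's XFRM_POLICY_* constants or
struct xfrm_policy_hash.

```c
#include <stdint.h>

/* Hypothetical direction names for the sketch. */
enum policy_dir { DIR_IN, DIR_OUT, DIR_FWD };

/* Thresholds as configured by the user: local and remote. */
struct hthresh {
	uint8_t lbits;
	uint8_t rbits;
};

/* Thresholds as applied to a selector: destination and source. */
struct dir_thresh {
	uint8_t dbits;
	uint8_t sbits;
};

static struct dir_thresh map_thresh(enum policy_dir dir, struct hthresh t)
{
	struct dir_thresh out;

	if (dir == DIR_OUT) {
		/* out: source is local, destination is remote */
		out.sbits = t.lbits;
		out.dbits = t.rbits;
	} else {
		/* in and fwd: destination is local, source is remote */
		out.dbits = t.lbits;
		out.sbits = t.rbits;
	}
	return out;
}
```

With lbits = 0 and rbits = 24, an out policy is hashed on a /24
destination and any source, while an in or fwd policy is hashed on a
/24 source and any destination.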

The default values are:

XFRMA_SPD_IPV4_HTHRESH: 32 32
XFRMA_SPD_IPV6_HTHRESH: 128 128

The SPD is rebuilt dynamically when the threshold values are changed.

The current thresholds can be read via an XFRM_MSG_GETSPDINFO request:
the kernel replies to XFRM_MSG_GETSPDINFO requests with an
XFRM_MSG_NEWSPDINFO message carrying both the XFRMA_SPD_IPV4_HTHRESH
and XFRMA_SPD_IPV6_HTHRESH attributes.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
 include/net/netns/xfrm.h  | 10 ++++++
 include/net/xfrm.h        |  1 +
 include/uapi/linux/xfrm.h |  7 ++++
 net/xfrm/xfrm_policy.c    | 87 +++++++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c      | 83 ++++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 185 insertions(+), 3 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 41902a8..9da7982 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -19,6 +19,15 @@ struct xfrm_policy_hash {
 	u8			sbits6;
 };
 
+struct xfrm_policy_hthresh {
+	struct work_struct	work;
+	seqlock_t		lock;
+	u8			lbits4;
+	u8			rbits4;
+	u8			lbits6;
+	u8			rbits6;
+};
+
 struct netns_xfrm {
 	struct list_head	state_all;
 	/*
@@ -45,6 +54,7 @@ struct netns_xfrm {
 	struct xfrm_policy_hash	policy_bydst[XFRM_POLICY_MAX * 2];
 	unsigned int		policy_count[XFRM_POLICY_MAX * 2];
 	struct work_struct	policy_hash_work;
+	struct xfrm_policy_hthresh policy_hthresh;
 
 
 	struct sock		*nlsk;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 721e9c3..dc4865e 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1591,6 +1591,7 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark,
 struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8, int dir,
 				     u32 id, int delete, int *err);
 int xfrm_policy_flush(struct net *net, u8 type, bool task_valid);
+void xfrm_policy_hash_rebuild(struct net *net);
 u32 xfrm_get_acqseq(void);
 int verify_spi_info(u8 proto, u32 min, u32 max);
 int xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index 25e5dd9..02d5125 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -328,6 +328,8 @@ enum xfrm_spdattr_type_t {
 	XFRMA_SPD_UNSPEC,
 	XFRMA_SPD_INFO,
 	XFRMA_SPD_HINFO,
+	XFRMA_SPD_IPV4_HTHRESH,
+	XFRMA_SPD_IPV6_HTHRESH,
 	__XFRMA_SPD_MAX
 
 #define XFRMA_SPD_MAX (__XFRMA_SPD_MAX - 1)
@@ -347,6 +349,11 @@ struct xfrmu_spdhinfo {
 	__u32 spdhmcnt;
 };
 
+struct xfrmu_spdhthresh {
+	__u8 lbits;
+	__u8 rbits;
+};
+
 struct xfrm_usersa_info {
 	struct xfrm_selector		sel;
 	struct xfrm_id			id;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index e6ff7b4..55bcb86 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -566,6 +566,86 @@ static void xfrm_hash_resize(struct work_struct *work)
 	mutex_unlock(&hash_resize_mutex);
 }
 
+static void xfrm_hash_rebuild(struct work_struct *work)
+{
+	struct net *net = container_of(work, struct net,
+				       xfrm.policy_hthresh.work);
+	unsigned int hmask;
+	struct xfrm_policy *pol;
+	struct xfrm_policy *policy;
+	struct hlist_head *chain;
+	struct hlist_head *odst;
+	struct hlist_node *newpos;
+	int i;
+	int dir;
+	unsigned seq;
+	u8 lbits4, rbits4, lbits6, rbits6;
+
+	mutex_lock(&hash_resize_mutex);
+
+	/* read selector prefixlen thresholds */
+	do {
+		seq = read_seqbegin(&net->xfrm.policy_hthresh.lock);
+
+		lbits4 = net->xfrm.policy_hthresh.lbits4;
+		rbits4 = net->xfrm.policy_hthresh.rbits4;
+		lbits6 = net->xfrm.policy_hthresh.lbits6;
+		rbits6 = net->xfrm.policy_hthresh.rbits6;
+	} while (read_seqretry(&net->xfrm.policy_hthresh.lock, seq));
+
+	write_lock_bh(&net->xfrm.xfrm_policy_lock);
+
+	/* reset the bydst and inexact table in all directions */
+	for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
+		INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
+		hmask = net->xfrm.policy_bydst[dir].hmask;
+		odst = net->xfrm.policy_bydst[dir].table;
+		for (i = hmask; i >= 0; i--)
+			INIT_HLIST_HEAD(odst + i);
+		if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) {
+			/* dir out => dst = remote, src = local */
+			net->xfrm.policy_bydst[dir].dbits4 = rbits4;
+			net->xfrm.policy_bydst[dir].sbits4 = lbits4;
+			net->xfrm.policy_bydst[dir].dbits6 = rbits6;
+			net->xfrm.policy_bydst[dir].sbits6 = lbits6;
+		} else {
+			/* dir in/fwd => dst = local, src = remote */
+			net->xfrm.policy_bydst[dir].dbits4 = lbits4;
+			net->xfrm.policy_bydst[dir].sbits4 = rbits4;
+			net->xfrm.policy_bydst[dir].dbits6 = lbits6;
+			net->xfrm.policy_bydst[dir].sbits6 = rbits6;
+		}
+	}
+
+	/* re-insert all policies by order of creation */
+	list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
+		newpos = NULL;
+		chain = policy_hash_bysel(net, &policy->selector,
+					  policy->family,
+					  xfrm_policy_id2dir(policy->index));
+		hlist_for_each_entry(pol, chain, bydst) {
+			if (policy->priority >= pol->priority)
+				newpos = &pol->bydst;
+			else
+				break;
+		}
+		if (newpos)
+			hlist_add_behind(&policy->bydst, newpos);
+		else
+			hlist_add_head(&policy->bydst, chain);
+	}
+
+	write_unlock_bh(&net->xfrm.xfrm_policy_lock);
+
+	mutex_unlock(&hash_resize_mutex);
+}
+
+void xfrm_policy_hash_rebuild(struct net *net)
+{
+	schedule_work(&net->xfrm.policy_hthresh.work);
+}
+EXPORT_SYMBOL(xfrm_policy_hash_rebuild);
+
 /* Generate new index... KAME seems to generate them ordered by cost
  * of an absolute inpredictability of ordering of rules. This will not pass. */
 static u32 xfrm_gen_index(struct net *net, int dir, u32 index)
@@ -2872,9 +2952,16 @@ static int __net_init xfrm_policy_init(struct net *net)
 		htab->dbits6 = 128;
 		htab->sbits6 = 128;
 	}
+	net->xfrm.policy_hthresh.lbits4 = 32;
+	net->xfrm.policy_hthresh.rbits4 = 32;
+	net->xfrm.policy_hthresh.lbits6 = 128;
+	net->xfrm.policy_hthresh.rbits6 = 128;
+
+	seqlock_init(&net->xfrm.policy_hthresh.lock);
 
 	INIT_LIST_HEAD(&net->xfrm.policy_all);
 	INIT_WORK(&net->xfrm.policy_hash_work, xfrm_hash_resize);
+	INIT_WORK(&net->xfrm.policy_hthresh.work, xfrm_hash_rebuild);
 	if (net_eq(net, &init_net))
 		register_netdevice_notifier(&xfrm_dev_notifier);
 	return 0;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d4db6eb..c6a7f44 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -964,7 +964,9 @@ static inline size_t xfrm_spdinfo_msgsize(void)
 {
 	return NLMSG_ALIGN(4)
 	       + nla_total_size(sizeof(struct xfrmu_spdinfo))
-	       + nla_total_size(sizeof(struct xfrmu_spdhinfo));
+	       + nla_total_size(sizeof(struct xfrmu_spdhinfo))
+	       + nla_total_size(sizeof(struct xfrmu_spdhthresh))
+	       + nla_total_size(sizeof(struct xfrmu_spdhthresh));
 }
 
 static int build_spdinfo(struct sk_buff *skb, struct net *net,
@@ -973,9 +975,11 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	struct xfrmk_spdinfo si;
 	struct xfrmu_spdinfo spc;
 	struct xfrmu_spdhinfo sph;
+	struct xfrmu_spdhthresh spt4, spt6;
 	struct nlmsghdr *nlh;
 	int err;
 	u32 *f;
+	unsigned lseq;
 
 	nlh = nlmsg_put(skb, portid, seq, XFRM_MSG_NEWSPDINFO, sizeof(u32), 0);
 	if (nlh == NULL) /* shouldn't really happen ... */
@@ -993,9 +997,22 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	sph.spdhcnt = si.spdhcnt;
 	sph.spdhmcnt = si.spdhmcnt;
 
+	do {
+		lseq = read_seqbegin(&net->xfrm.policy_hthresh.lock);
+
+		spt4.lbits = net->xfrm.policy_hthresh.lbits4;
+		spt4.rbits = net->xfrm.policy_hthresh.rbits4;
+		spt6.lbits = net->xfrm.policy_hthresh.lbits6;
+		spt6.rbits = net->xfrm.policy_hthresh.rbits6;
+	} while (read_seqretry(&net->xfrm.policy_hthresh.lock, lseq));
+
 	err = nla_put(skb, XFRMA_SPD_INFO, sizeof(spc), &spc);
 	if (!err)
 		err = nla_put(skb, XFRMA_SPD_HINFO, sizeof(sph), &sph);
+	if (!err)
+		err = nla_put(skb, XFRMA_SPD_IPV4_HTHRESH, sizeof(spt4), &spt4);
+	if (!err)
+		err = nla_put(skb, XFRMA_SPD_IPV6_HTHRESH, sizeof(spt6), &spt6);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -1004,6 +1021,54 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	return nlmsg_end(skb, nlh);
 }
 
+static int xfrm_set_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
+			    struct nlattr **attrs)
+{
+	struct net *net = sock_net(skb->sk);
+	u32 *flags = nlmsg_data(nlh);
+	u32 sportid = NETLINK_CB(skb).portid;
+	u32 seq = nlh->nlmsg_seq;
+	struct xfrmu_spdhthresh *thresh4 = NULL;
+	struct xfrmu_spdhthresh *thresh6 = NULL;
+
+	/* selector prefixlen thresholds to hash policies */
+	if (attrs[XFRMA_SPD_IPV4_HTHRESH]) {
+		struct nlattr *rta = attrs[XFRMA_SPD_IPV4_HTHRESH];
+
+		if (nla_len(rta) < sizeof(*thresh4))
+			return -EINVAL;
+		thresh4 = nla_data(rta);
+		if (thresh4->lbits > 32 || thresh4->rbits > 32)
+			return -EINVAL;
+	}
+	if (attrs[XFRMA_SPD_IPV6_HTHRESH]) {
+		struct nlattr *rta = attrs[XFRMA_SPD_IPV6_HTHRESH];
+
+		if (nla_len(rta) < sizeof(*thresh6))
+			return -EINVAL;
+		thresh6 = nla_data(rta);
+		if (thresh6->lbits > 128 || thresh6->rbits > 128)
+			return -EINVAL;
+	}
+
+	if (thresh4 || thresh6) {
+		write_seqlock(&net->xfrm.policy_hthresh.lock);
+		if (thresh4) {
+			net->xfrm.policy_hthresh.lbits4 = thresh4->lbits;
+			net->xfrm.policy_hthresh.rbits4 = thresh4->rbits;
+		}
+		if (thresh6) {
+			net->xfrm.policy_hthresh.lbits6 = thresh6->lbits;
+			net->xfrm.policy_hthresh.rbits6 = thresh6->rbits;
+		}
+		write_sequnlock(&net->xfrm.policy_hthresh.lock);
+
+		xfrm_policy_hash_rebuild(net);
+	}
+
+	return 0;
+}
+
 static int xfrm_get_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
 		struct nlattr **attrs)
 {
@@ -2274,6 +2339,7 @@ static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
 	[XFRM_MSG_REPORT      - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report),
 	[XFRM_MSG_MIGRATE     - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
 	[XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = sizeof(u32),
+	[XFRM_MSG_NEWSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
 	[XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
 };
 
@@ -2308,10 +2374,17 @@ static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
 	[XFRMA_ADDRESS_FILTER]	= { .len = sizeof(struct xfrm_address_filter) },
 };
 
+static const struct nla_policy xfrma_spd_policy[XFRMA_SPD_MAX+1] = {
+	[XFRMA_SPD_IPV4_HTHRESH] = { .len = sizeof(struct xfrmu_spdhthresh) },
+	[XFRMA_SPD_IPV6_HTHRESH] = { .len = sizeof(struct xfrmu_spdhthresh) },
+};
+
 static const struct xfrm_link {
 	int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
 	int (*dump)(struct sk_buff *, struct netlink_callback *);
 	int (*done)(struct netlink_callback *);
+	const struct nla_policy *nla_pol;
+	int nla_max;
 } xfrm_dispatch[XFRM_NR_MSGTYPES] = {
 	[XFRM_MSG_NEWSA       - XFRM_MSG_BASE] = { .doit = xfrm_add_sa        },
 	[XFRM_MSG_DELSA       - XFRM_MSG_BASE] = { .doit = xfrm_del_sa        },
@@ -2335,6 +2408,9 @@ static const struct xfrm_link {
 	[XFRM_MSG_GETAE       - XFRM_MSG_BASE] = { .doit = xfrm_get_ae  },
 	[XFRM_MSG_MIGRATE     - XFRM_MSG_BASE] = { .doit = xfrm_do_migrate    },
 	[XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = { .doit = xfrm_get_sadinfo   },
+	[XFRM_MSG_NEWSPDINFO  - XFRM_MSG_BASE] = { .doit = xfrm_set_spdinfo,
+						   .nla_pol = xfrma_spd_policy,
+						   .nla_max = XFRMA_SPD_MAX },
 	[XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = { .doit = xfrm_get_spdinfo   },
 };
 
@@ -2371,8 +2447,9 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		}
 	}
 
-	err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs, XFRMA_MAX,
-			  xfrma_policy);
+	err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs,
+			  link->nla_max ? : XFRMA_MAX,
+			  link->nla_pol ? : xfrma_policy);
 	if (err < 0)
 		return err;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH ipsec-next v3 2/2] xfrm: configure policy hash table thresholds by netlink
  2014-08-27 15:48         ` [PATCH ipsec-next v3 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
@ 2014-08-29  9:54           ` Steffen Klassert
  2014-08-29 10:02             ` Christophe Gouault
  2014-08-29 14:16             ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  0 siblings, 2 replies; 27+ messages in thread
From: Steffen Klassert @ 2014-08-29  9:54 UTC (permalink / raw)
  To: Christophe Gouault; +Cc: David S. Miller, netdev

On Wed, Aug 27, 2014 at 05:48:15PM +0200, Christophe Gouault wrote:
>  
> +static int xfrm_set_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
> +			    struct nlattr **attrs)
> +{
> +	struct net *net = sock_net(skb->sk);
> +	u32 *flags = nlmsg_data(nlh);
> +	u32 sportid = NETLINK_CB(skb).portid;
> +	u32 seq = nlh->nlmsg_seq;

flags, sportid and seq are unused now. Please remove them.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH ipsec-next v3 2/2] xfrm: configure policy hash table thresholds by netlink
  2014-08-29  9:54           ` Steffen Klassert
@ 2014-08-29 10:02             ` Christophe Gouault
  2014-08-29 14:16             ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  1 sibling, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-29 10:02 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev

2014-08-29 11:54 GMT+02:00 Steffen Klassert <steffen.klassert@secunet.com>:
> On Wed, Aug 27, 2014 at 05:48:15PM +0200, Christophe Gouault wrote:
>>
>> +static int xfrm_set_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
>> +                         struct nlattr **attrs)
>> +{
>> +     struct net *net = sock_net(skb->sk);
>> +     u32 *flags = nlmsg_data(nlh);
>> +     u32 sportid = NETLINK_CB(skb).portid;
>> +     u32 seq = nlh->nlmsg_seq;
>
> flags, sportid and seq are unused now. Please remove them.

OK, I'll send an update.

Christophe.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database
  2014-08-29  9:54           ` Steffen Klassert
  2014-08-29 10:02             ` Christophe Gouault
@ 2014-08-29 14:16             ` Christophe Gouault
  2014-08-29 14:16               ` [ipsec-next v4 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
                                 ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-29 14:16 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev

This patchset makes it possible to hash more policies than just the
non-prefixed ones: policies whose prefix lengths are greater than or
equal to configurable thresholds are hashed as well.

These thresholds are configured via netlink message
XFRM_MSG_NEWSPDINFO, attributes XFRMA_SPD_IPV4_HTHRESH and
XFRMA_SPD_IPV6_HTHRESH.

The related iproute2 patch for configuring the thresholds is available
on request.

Best Regards,
Christophe
----
v2:
- change configuration API from proc to netlink
v3:
- initialize xfrm_policy_hthresh lock
- remove "rebuilding SPD hash table" log
- replace deprecated hlist_add_after by hlist_add_behind
- remove netlink reply to XFRM_MSG_NEWSPDINFO request
v4:
- remove unused variables in xfrm_set_spdinfo
---
 include/net/netns/xfrm.h  |  14 +++++++
 include/net/xfrm.h        |   1 +
 include/uapi/linux/xfrm.h |   7 ++++
 net/xfrm/xfrm_hash.h      |  76 +++++++++++++++++++++++++++++++----- 
 net/xfrm/xfrm_policy.c    | 140 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 net/xfrm/xfrm_user.c      |  80 ++++++++++++++++++++++++++++++++++++--
 6 files changed, 299 insertions(+), 19 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [ipsec-next v4 1/2] xfrm: hash prefixed policies based on preflen thresholds
  2014-08-29 14:16             ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
@ 2014-08-29 14:16               ` Christophe Gouault
  2014-08-29 14:16               ` [ipsec-next v4 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
  2014-09-03 11:59               ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Steffen Klassert
  2 siblings, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-29 14:16 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev, Christophe Gouault

The idea is an extension of the current policy hashing.

Today only non-prefixed policies are stored in a hash table. This
patch relaxes that constraint and also hashes policies whose prefix
lengths are greater than or equal to a configurable threshold.

Each hash table (one per direction) maintains its own set of IPv4 and
IPv6 thresholds (dbits4, sbits4, dbits6, sbits6), by default (32, 32,
128, 128).

For example, if the output hash table is configured with sbits4 = 16,
dbits4 = 24, sbits6 = 56 and dbits6 = 64:

ip xfrm policy add dir out src 10.22.0.0/20 dst 10.24.1.0/24 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.1.1/32 ... => hashed
ip xfrm policy add dir out src 10.22.0.0/16 dst 10.24.0.0/16 ... => unhashed

ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/60 dst 3ffe:304:124:2401::/64 ...    => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2401::2/128 ...  => hashed
ip xfrm policy add dir out \
    src 3ffe:304:124:2200::/56 dst 3ffe:304:124:2400::/56 ...    => unhashed

The high order bits of the addresses (up to the threshold) are used to
compute the hash key.
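
The hashability test and the prefix truncation described above can be
sketched in plain user-space C. This is only an illustration for IPv4,
not the kernel code: the names bits2mask32, sel_is_hashable, make_key4
and struct hash_key4 are hypothetical, addresses are assumed to be in
host byte order, and the final hash function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the `bits` high-order bits of a host-order IPv4 address. */
static uint32_t bits2mask32(uint8_t bits)
{
	assert(bits <= 32);
	if (bits == 0)
		return 0;	/* avoid an undefined 32-bit shift */
	return 0xffffffffu << (32 - bits);
}

/* A policy is hashed only if both selector prefixes are at least as
 * long as the corresponding thresholds; otherwise it is "unhashed". */
static bool sel_is_hashable(uint8_t prefixlen_d, uint8_t prefixlen_s,
			    uint8_t dbits, uint8_t sbits)
{
	return prefixlen_d >= dbits && prefixlen_s >= sbits;
}

/* Hypothetical container for the two words fed to the hash function. */
struct hash_key4 {
	uint32_t d;
	uint32_t s;
};

/* Keep only the high-order bits up to each threshold, so that every
 * address covered by the same truncated prefix yields the same key. */
static struct hash_key4 make_key4(uint32_t daddr, uint32_t saddr,
				  uint8_t dbits, uint8_t sbits)
{
	struct hash_key4 k = {
		daddr & bits2mask32(dbits),
		saddr & bits2mask32(sbits),
	};
	return k;
}
```

With a destination threshold of 24 and a source threshold of 16, a
selector with src /20 and dst /24 passes sel_is_hashable(), while one
with src /16 and dst /16 does not, and any two packets whose addresses
share the same /24 destination and /16 source map to the same key.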

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
 include/net/netns/xfrm.h |  4 +++
 net/xfrm/xfrm_hash.h     | 76 +++++++++++++++++++++++++++++++++++++++++-------
 net/xfrm/xfrm_policy.c   | 53 +++++++++++++++++++++++++++++----
 3 files changed, 117 insertions(+), 16 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 3492434..41902a8 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -13,6 +13,10 @@ struct ctl_table_header;
 struct xfrm_policy_hash {
 	struct hlist_head	*table;
 	unsigned int		hmask;
+	u8			dbits4;
+	u8			sbits4;
+	u8			dbits6;
+	u8			sbits6;
 };
 
 struct netns_xfrm {
diff --git a/net/xfrm/xfrm_hash.h b/net/xfrm/xfrm_hash.h
index 0622d31..666c5ff 100644
--- a/net/xfrm/xfrm_hash.h
+++ b/net/xfrm/xfrm_hash.h
@@ -3,6 +3,7 @@
 
 #include <linux/xfrm.h>
 #include <linux/socket.h>
+#include <linux/jhash.h>
 
 static inline unsigned int __xfrm4_addr_hash(const xfrm_address_t *addr)
 {
@@ -28,6 +29,58 @@ static inline unsigned int __xfrm6_daddr_saddr_hash(const xfrm_address_t *daddr,
 		     saddr->a6[2] ^ saddr->a6[3]);
 }
 
+static inline u32 __bits2mask32(__u8 bits)
+{
+	u32 mask32 = 0xffffffff;
+
+	if (bits == 0)
+		mask32 = 0;
+	else if (bits < 32)
+		mask32 <<= (32 - bits);
+
+	return mask32;
+}
+
+static inline unsigned int __xfrm4_dpref_spref_hash(const xfrm_address_t *daddr,
+						    const xfrm_address_t *saddr,
+						    __u8 dbits,
+						    __u8 sbits)
+{
+	return jhash_2words(ntohl(daddr->a4) & __bits2mask32(dbits),
+			    ntohl(saddr->a4) & __bits2mask32(sbits),
+			    0);
+}
+
+static inline unsigned int __xfrm6_pref_hash(const xfrm_address_t *addr,
+					     __u8 prefixlen)
+{
+	int pdw;
+	int pbi;
+	u32 initval = 0;
+
+	pdw = prefixlen >> 5;     /* num of whole u32 in prefix */
+	pbi = prefixlen &  0x1f;  /* num of bits in incomplete u32 in prefix */
+
+	if (pbi) {
+		__be32 mask;
+
+		mask = htonl((0xffffffff) << (32 - pbi));
+
+		initval = (__force u32)(addr->a6[pdw] & mask);
+	}
+
+	return jhash2((__force u32 *)addr->a6, pdw, initval);
+}
+
+static inline unsigned int __xfrm6_dpref_spref_hash(const xfrm_address_t *daddr,
+						    const xfrm_address_t *saddr,
+						    __u8 dbits,
+						    __u8 sbits)
+{
+	return __xfrm6_pref_hash(daddr, dbits) ^
+	       __xfrm6_pref_hash(saddr, sbits);
+}
+
 static inline unsigned int __xfrm_dst_hash(const xfrm_address_t *daddr,
 					   const xfrm_address_t *saddr,
 					   u32 reqid, unsigned short family,
@@ -84,7 +137,8 @@ static inline unsigned int __idx_hash(u32 index, unsigned int hmask)
 }
 
 static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
-				      unsigned short family, unsigned int hmask)
+				      unsigned short family, unsigned int hmask,
+				      u8 dbits, u8 sbits)
 {
 	const xfrm_address_t *daddr = &sel->daddr;
 	const xfrm_address_t *saddr = &sel->saddr;
@@ -92,19 +146,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
 
 	switch (family) {
 	case AF_INET:
-		if (sel->prefixlen_d != 32 ||
-		    sel->prefixlen_s != 32)
+		if (sel->prefixlen_d < dbits ||
+		    sel->prefixlen_s < sbits)
 			return hmask + 1;
 
-		h = __xfrm4_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 
 	case AF_INET6:
-		if (sel->prefixlen_d != 128 ||
-		    sel->prefixlen_s != 128)
+		if (sel->prefixlen_d < dbits ||
+		    sel->prefixlen_s < sbits)
 			return hmask + 1;
 
-		h = __xfrm6_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 	}
 	h ^= (h >> 16);
@@ -113,17 +167,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
 
 static inline unsigned int __addr_hash(const xfrm_address_t *daddr,
 				       const xfrm_address_t *saddr,
-				       unsigned short family, unsigned int hmask)
+				       unsigned short family,
+				       unsigned int hmask,
+				       u8 dbits, u8 sbits)
 {
 	unsigned int h = 0;
 
 	switch (family) {
 	case AF_INET:
-		h = __xfrm4_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 
 	case AF_INET6:
-		h = __xfrm6_daddr_saddr_hash(daddr, saddr);
+		h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
 		break;
 	}
 	h ^= (h >> 16);
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index beeed60..e6ff7b4 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -344,12 +344,39 @@ static inline unsigned int idx_hash(struct net *net, u32 index)
 	return __idx_hash(index, net->xfrm.policy_idx_hmask);
 }
 
+/* calculate policy hash thresholds */
+static void __get_hash_thresh(struct net *net,
+			      unsigned short family, int dir,
+			      u8 *dbits, u8 *sbits)
+{
+	switch (family) {
+	case AF_INET:
+		*dbits = net->xfrm.policy_bydst[dir].dbits4;
+		*sbits = net->xfrm.policy_bydst[dir].sbits4;
+		break;
+
+	case AF_INET6:
+		*dbits = net->xfrm.policy_bydst[dir].dbits6;
+		*sbits = net->xfrm.policy_bydst[dir].sbits6;
+		break;
+
+	default:
+		*dbits = 0;
+		*sbits = 0;
+	}
+}
+
 static struct hlist_head *policy_hash_bysel(struct net *net,
 					    const struct xfrm_selector *sel,
 					    unsigned short family, int dir)
 {
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
-	unsigned int hash = __sel_hash(sel, family, hmask);
+	unsigned int hash;
+	u8 dbits;
+	u8 sbits;
+
+	__get_hash_thresh(net, family, dir, &dbits, &sbits);
+	hash = __sel_hash(sel, family, hmask, dbits, sbits);
 
 	return (hash == hmask + 1 ?
 		&net->xfrm.policy_inexact[dir] :
@@ -362,25 +389,35 @@ static struct hlist_head *policy_hash_direct(struct net *net,
 					     unsigned short family, int dir)
 {
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
-	unsigned int hash = __addr_hash(daddr, saddr, family, hmask);
+	unsigned int hash;
+	u8 dbits;
+	u8 sbits;
+
+	__get_hash_thresh(net, family, dir, &dbits, &sbits);
+	hash = __addr_hash(daddr, saddr, family, hmask, dbits, sbits);
 
 	return net->xfrm.policy_bydst[dir].table + hash;
 }
 
-static void xfrm_dst_hash_transfer(struct hlist_head *list,
+static void xfrm_dst_hash_transfer(struct net *net,
+				   struct hlist_head *list,
 				   struct hlist_head *ndsttable,
-				   unsigned int nhashmask)
+				   unsigned int nhashmask,
+				   int dir)
 {
 	struct hlist_node *tmp, *entry0 = NULL;
 	struct xfrm_policy *pol;
 	unsigned int h0 = 0;
+	u8 dbits;
+	u8 sbits;
 
 redo:
 	hlist_for_each_entry_safe(pol, tmp, list, bydst) {
 		unsigned int h;
 
+		__get_hash_thresh(net, pol->family, dir, &dbits, &sbits);
 		h = __addr_hash(&pol->selector.daddr, &pol->selector.saddr,
-				pol->family, nhashmask);
+				pol->family, nhashmask, dbits, sbits);
 		if (!entry0) {
 			hlist_del(&pol->bydst);
 			hlist_add_head(&pol->bydst, ndsttable+h);
@@ -434,7 +471,7 @@ static void xfrm_bydst_resize(struct net *net, int dir)
 	write_lock_bh(&net->xfrm.xfrm_policy_lock);
 
 	for (i = hmask; i >= 0; i--)
-		xfrm_dst_hash_transfer(odst + i, ndst, nhashmask);
+		xfrm_dst_hash_transfer(net, odst + i, ndst, nhashmask, dir);
 
 	net->xfrm.policy_bydst[dir].table = ndst;
 	net->xfrm.policy_bydst[dir].hmask = nhashmask;
@@ -2830,6 +2867,10 @@ static int __net_init xfrm_policy_init(struct net *net)
 		if (!htab->table)
 			goto out_bydst;
 		htab->hmask = hmask;
+		htab->dbits4 = 32;
+		htab->sbits4 = 32;
+		htab->dbits6 = 128;
+		htab->sbits6 = 128;
 	}
 
 	INIT_LIST_HEAD(&net->xfrm.policy_all);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [ipsec-next v4 2/2] xfrm: configure policy hash table thresholds by netlink
  2014-08-29 14:16             ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  2014-08-29 14:16               ` [ipsec-next v4 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
@ 2014-08-29 14:16               ` Christophe Gouault
  2014-09-03 11:59               ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Steffen Klassert
  2 siblings, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-08-29 14:16 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev, Christophe Gouault

Allow local and remote prefix length thresholds for the policy hash
table to be specified via a netlink XFRM_MSG_NEWSPDINFO message.

Prefix length thresholds are specified by the optional
XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH attributes
(struct xfrmu_spdhthresh).

Example:

    /* hash on 0 local bits and 24 (IPv4) / 56 (IPv6) remote bits */
    struct xfrmu_spdhthresh thresh4 = {
        .lbits = 0,
        .rbits = 24,
    };
    struct xfrmu_spdhthresh thresh6 = {
        .lbits = 0,
        .rbits = 56,
    };
    struct nlmsghdr *hdr;
    struct nl_msg *msg;

    msg = nlmsg_alloc();
    /* message type is XFRM_MSG_NEWSPDINFO; the payload starts with a __u32 of flags */
    hdr = nlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, XFRM_MSG_NEWSPDINFO, sizeof(__u32), NLM_F_REQUEST);
    /* both threshold attributes are optional */
    nla_put(msg, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh4), &thresh4);
    nla_put(msg, XFRMA_SPD_IPV6_HTHRESH, sizeof(thresh6), &thresh6);
    nl_send_auto(sk, msg);

The numbers are the minimum policy selector prefix lengths required
for a policy to be placed in the hash table.

- lbits is the local threshold (source address for out policies,
  destination address for in and fwd policies).

- rbits is the remote threshold (destination address for out
  policies, source address for in and fwd policies).
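
The local/remote convention above can be sketched as a small
user-space C function that turns a (lbits, rbits) pair into the
per-table destination and source thresholds. The types and names used
here (enum policy_dir, struct thresh, struct table_thresh, map_thresh)
are hypothetical and only illustrate the mapping:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the in/out/fwd policy directions. */
enum policy_dir { DIR_IN, DIR_OUT, DIR_FWD };

/* Thresholds as configured by the user: local and remote. */
struct thresh {
	uint8_t lbits;
	uint8_t rbits;
};

/* Thresholds as used by one hash table: destination and source. */
struct table_thresh {
	uint8_t dbits;
	uint8_t sbits;
};

static struct table_thresh map_thresh(enum policy_dir dir, struct thresh t)
{
	struct table_thresh out;

	assert(dir == DIR_IN || dir == DIR_OUT || dir == DIR_FWD);
	if (dir == DIR_OUT) {
		/* out: destination is remote, source is local */
		out.dbits = t.rbits;
		out.sbits = t.lbits;
	} else {
		/* in and fwd: destination is local, source is remote */
		out.dbits = t.lbits;
		out.sbits = t.rbits;
	}
	return out;
}
```

For instance, lbits = 0 and rbits = 24 give dbits = 24, sbits = 0 for
out policies, and dbits = 0, sbits = 24 for in and fwd policies.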

The default values are:

XFRMA_SPD_IPV4_HTHRESH: 32 32
XFRMA_SPD_IPV6_HTHRESH: 128 128

Dynamic re-building of the SPD is performed when the threshold values
are changed.
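
During such a rebuild, each policy has to be put back into its chain
so that the chain stays sorted by priority and policies of equal
priority keep their order of creation. A minimal user-space sketch of
that insertion step, using a plain singly linked list instead of the
kernel hlist (struct pol and chain_insert are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified policy: only what the ordering needs. */
struct pol {
	uint32_t priority;
	int id;
	struct pol *next;
};

/* Insert np after the last entry whose priority is lower than or
 * equal to np's, or at the head if there is no such entry. */
static void chain_insert(struct pol **head, struct pol *np)
{
	struct pol *pos = NULL;
	struct pol *it;

	assert(head != NULL && np != NULL);
	for (it = *head; it != NULL; it = it->next) {
		if (np->priority >= it->priority)
			pos = it;
		else
			break;
	}
	if (pos != NULL) {
		np->next = pos->next;
		pos->next = np;
	} else {
		np->next = *head;
		*head = np;
	}
}
```

Inserting policies in their order of creation with this rule yields a
chain ordered by ascending priority value, with the oldest policy first
among those of equal priority.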

The current thresholds can be read via an XFRM_MSG_GETSPDINFO request:
the kernel replies to XFRM_MSG_GETSPDINFO requests with an
XFRM_MSG_NEWSPDINFO message containing both the
XFRMA_SPD_IPV4_HTHRESH and XFRMA_SPD_IPV6_HTHRESH attributes.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
 include/net/netns/xfrm.h  | 10 ++++++
 include/net/xfrm.h        |  1 +
 include/uapi/linux/xfrm.h |  7 ++++
 net/xfrm/xfrm_policy.c    | 87 +++++++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c      | 80 +++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 182 insertions(+), 3 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 41902a8..9da7982 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -19,6 +19,15 @@ struct xfrm_policy_hash {
 	u8			sbits6;
 };
 
+struct xfrm_policy_hthresh {
+	struct work_struct	work;
+	seqlock_t		lock;
+	u8			lbits4;
+	u8			rbits4;
+	u8			lbits6;
+	u8			rbits6;
+};
+
 struct netns_xfrm {
 	struct list_head	state_all;
 	/*
@@ -45,6 +54,7 @@ struct netns_xfrm {
 	struct xfrm_policy_hash	policy_bydst[XFRM_POLICY_MAX * 2];
 	unsigned int		policy_count[XFRM_POLICY_MAX * 2];
 	struct work_struct	policy_hash_work;
+	struct xfrm_policy_hthresh policy_hthresh;
 
 
 	struct sock		*nlsk;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 721e9c3..dc4865e 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1591,6 +1591,7 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark,
 struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8, int dir,
 				     u32 id, int delete, int *err);
 int xfrm_policy_flush(struct net *net, u8 type, bool task_valid);
+void xfrm_policy_hash_rebuild(struct net *net);
 u32 xfrm_get_acqseq(void);
 int verify_spi_info(u8 proto, u32 min, u32 max);
 int xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index 25e5dd9..02d5125 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -328,6 +328,8 @@ enum xfrm_spdattr_type_t {
 	XFRMA_SPD_UNSPEC,
 	XFRMA_SPD_INFO,
 	XFRMA_SPD_HINFO,
+	XFRMA_SPD_IPV4_HTHRESH,
+	XFRMA_SPD_IPV6_HTHRESH,
 	__XFRMA_SPD_MAX
 
 #define XFRMA_SPD_MAX (__XFRMA_SPD_MAX - 1)
@@ -347,6 +349,11 @@ struct xfrmu_spdhinfo {
 	__u32 spdhmcnt;
 };
 
+struct xfrmu_spdhthresh {
+	__u8 lbits;
+	__u8 rbits;
+};
+
 struct xfrm_usersa_info {
 	struct xfrm_selector		sel;
 	struct xfrm_id			id;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index e6ff7b4..55bcb86 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -566,6 +566,86 @@ static void xfrm_hash_resize(struct work_struct *work)
 	mutex_unlock(&hash_resize_mutex);
 }
 
+static void xfrm_hash_rebuild(struct work_struct *work)
+{
+	struct net *net = container_of(work, struct net,
+				       xfrm.policy_hthresh.work);
+	unsigned int hmask;
+	struct xfrm_policy *pol;
+	struct xfrm_policy *policy;
+	struct hlist_head *chain;
+	struct hlist_head *odst;
+	struct hlist_node *newpos;
+	int i;
+	int dir;
+	unsigned seq;
+	u8 lbits4, rbits4, lbits6, rbits6;
+
+	mutex_lock(&hash_resize_mutex);
+
+	/* read selector prefixlen thresholds */
+	do {
+		seq = read_seqbegin(&net->xfrm.policy_hthresh.lock);
+
+		lbits4 = net->xfrm.policy_hthresh.lbits4;
+		rbits4 = net->xfrm.policy_hthresh.rbits4;
+		lbits6 = net->xfrm.policy_hthresh.lbits6;
+		rbits6 = net->xfrm.policy_hthresh.rbits6;
+	} while (read_seqretry(&net->xfrm.policy_hthresh.lock, seq));
+
+	write_lock_bh(&net->xfrm.xfrm_policy_lock);
+
+	/* reset the bydst and inexact table in all directions */
+	for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
+		INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
+		hmask = net->xfrm.policy_bydst[dir].hmask;
+		odst = net->xfrm.policy_bydst[dir].table;
+		for (i = hmask; i >= 0; i--)
+			INIT_HLIST_HEAD(odst + i);
+		if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) {
+			/* dir out => dst = remote, src = local */
+			net->xfrm.policy_bydst[dir].dbits4 = rbits4;
+			net->xfrm.policy_bydst[dir].sbits4 = lbits4;
+			net->xfrm.policy_bydst[dir].dbits6 = rbits6;
+			net->xfrm.policy_bydst[dir].sbits6 = lbits6;
+		} else {
+			/* dir in/fwd => dst = local, src = remote */
+			net->xfrm.policy_bydst[dir].dbits4 = lbits4;
+			net->xfrm.policy_bydst[dir].sbits4 = rbits4;
+			net->xfrm.policy_bydst[dir].dbits6 = lbits6;
+			net->xfrm.policy_bydst[dir].sbits6 = rbits6;
+		}
+	}
+
+	/* re-insert all policies by order of creation */
+	list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
+		newpos = NULL;
+		chain = policy_hash_bysel(net, &policy->selector,
+					  policy->family,
+					  xfrm_policy_id2dir(policy->index));
+		hlist_for_each_entry(pol, chain, bydst) {
+			if (policy->priority >= pol->priority)
+				newpos = &pol->bydst;
+			else
+				break;
+		}
+		if (newpos)
+			hlist_add_behind(&policy->bydst, newpos);
+		else
+			hlist_add_head(&policy->bydst, chain);
+	}
+
+	write_unlock_bh(&net->xfrm.xfrm_policy_lock);
+
+	mutex_unlock(&hash_resize_mutex);
+}
+
+void xfrm_policy_hash_rebuild(struct net *net)
+{
+	schedule_work(&net->xfrm.policy_hthresh.work);
+}
+EXPORT_SYMBOL(xfrm_policy_hash_rebuild);
+
 /* Generate new index... KAME seems to generate them ordered by cost
  * of an absolute inpredictability of ordering of rules. This will not pass. */
 static u32 xfrm_gen_index(struct net *net, int dir, u32 index)
@@ -2872,9 +2952,16 @@ static int __net_init xfrm_policy_init(struct net *net)
 		htab->dbits6 = 128;
 		htab->sbits6 = 128;
 	}
+	net->xfrm.policy_hthresh.lbits4 = 32;
+	net->xfrm.policy_hthresh.rbits4 = 32;
+	net->xfrm.policy_hthresh.lbits6 = 128;
+	net->xfrm.policy_hthresh.rbits6 = 128;
+
+	seqlock_init(&net->xfrm.policy_hthresh.lock);
 
 	INIT_LIST_HEAD(&net->xfrm.policy_all);
 	INIT_WORK(&net->xfrm.policy_hash_work, xfrm_hash_resize);
+	INIT_WORK(&net->xfrm.policy_hthresh.work, xfrm_hash_rebuild);
 	if (net_eq(net, &init_net))
 		register_netdevice_notifier(&xfrm_dev_notifier);
 	return 0;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d4db6eb..eaf8a8f 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -964,7 +964,9 @@ static inline size_t xfrm_spdinfo_msgsize(void)
 {
 	return NLMSG_ALIGN(4)
 	       + nla_total_size(sizeof(struct xfrmu_spdinfo))
-	       + nla_total_size(sizeof(struct xfrmu_spdhinfo));
+	       + nla_total_size(sizeof(struct xfrmu_spdhinfo))
+	       + nla_total_size(sizeof(struct xfrmu_spdhthresh))
+	       + nla_total_size(sizeof(struct xfrmu_spdhthresh));
 }
 
 static int build_spdinfo(struct sk_buff *skb, struct net *net,
@@ -973,9 +975,11 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	struct xfrmk_spdinfo si;
 	struct xfrmu_spdinfo spc;
 	struct xfrmu_spdhinfo sph;
+	struct xfrmu_spdhthresh spt4, spt6;
 	struct nlmsghdr *nlh;
 	int err;
 	u32 *f;
+	unsigned lseq;
 
 	nlh = nlmsg_put(skb, portid, seq, XFRM_MSG_NEWSPDINFO, sizeof(u32), 0);
 	if (nlh == NULL) /* shouldn't really happen ... */
@@ -993,9 +997,22 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	sph.spdhcnt = si.spdhcnt;
 	sph.spdhmcnt = si.spdhmcnt;
 
+	do {
+		lseq = read_seqbegin(&net->xfrm.policy_hthresh.lock);
+
+		spt4.lbits = net->xfrm.policy_hthresh.lbits4;
+		spt4.rbits = net->xfrm.policy_hthresh.rbits4;
+		spt6.lbits = net->xfrm.policy_hthresh.lbits6;
+		spt6.rbits = net->xfrm.policy_hthresh.rbits6;
+	} while (read_seqretry(&net->xfrm.policy_hthresh.lock, lseq));
+
 	err = nla_put(skb, XFRMA_SPD_INFO, sizeof(spc), &spc);
 	if (!err)
 		err = nla_put(skb, XFRMA_SPD_HINFO, sizeof(sph), &sph);
+	if (!err)
+		err = nla_put(skb, XFRMA_SPD_IPV4_HTHRESH, sizeof(spt4), &spt4);
+	if (!err)
+		err = nla_put(skb, XFRMA_SPD_IPV6_HTHRESH, sizeof(spt6), &spt6);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -1004,6 +1021,51 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	return nlmsg_end(skb, nlh);
 }
 
+static int xfrm_set_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
+			    struct nlattr **attrs)
+{
+	struct net *net = sock_net(skb->sk);
+	struct xfrmu_spdhthresh *thresh4 = NULL;
+	struct xfrmu_spdhthresh *thresh6 = NULL;
+
+	/* selector prefixlen thresholds to hash policies */
+	if (attrs[XFRMA_SPD_IPV4_HTHRESH]) {
+		struct nlattr *rta = attrs[XFRMA_SPD_IPV4_HTHRESH];
+
+		if (nla_len(rta) < sizeof(*thresh4))
+			return -EINVAL;
+		thresh4 = nla_data(rta);
+		if (thresh4->lbits > 32 || thresh4->rbits > 32)
+			return -EINVAL;
+	}
+	if (attrs[XFRMA_SPD_IPV6_HTHRESH]) {
+		struct nlattr *rta = attrs[XFRMA_SPD_IPV6_HTHRESH];
+
+		if (nla_len(rta) < sizeof(*thresh6))
+			return -EINVAL;
+		thresh6 = nla_data(rta);
+		if (thresh6->lbits > 128 || thresh6->rbits > 128)
+			return -EINVAL;
+	}
+
+	if (thresh4 || thresh6) {
+		write_seqlock(&net->xfrm.policy_hthresh.lock);
+		if (thresh4) {
+			net->xfrm.policy_hthresh.lbits4 = thresh4->lbits;
+			net->xfrm.policy_hthresh.rbits4 = thresh4->rbits;
+		}
+		if (thresh6) {
+			net->xfrm.policy_hthresh.lbits6 = thresh6->lbits;
+			net->xfrm.policy_hthresh.rbits6 = thresh6->rbits;
+		}
+		write_sequnlock(&net->xfrm.policy_hthresh.lock);
+
+		xfrm_policy_hash_rebuild(net);
+	}
+
+	return 0;
+}
+
 static int xfrm_get_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
 		struct nlattr **attrs)
 {
@@ -2274,6 +2336,7 @@ static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
 	[XFRM_MSG_REPORT      - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report),
 	[XFRM_MSG_MIGRATE     - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
 	[XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = sizeof(u32),
+	[XFRM_MSG_NEWSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
 	[XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
 };
 
@@ -2308,10 +2371,17 @@ static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
 	[XFRMA_ADDRESS_FILTER]	= { .len = sizeof(struct xfrm_address_filter) },
 };
 
+static const struct nla_policy xfrma_spd_policy[XFRMA_SPD_MAX+1] = {
+	[XFRMA_SPD_IPV4_HTHRESH] = { .len = sizeof(struct xfrmu_spdhthresh) },
+	[XFRMA_SPD_IPV6_HTHRESH] = { .len = sizeof(struct xfrmu_spdhthresh) },
+};
+
 static const struct xfrm_link {
 	int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
 	int (*dump)(struct sk_buff *, struct netlink_callback *);
 	int (*done)(struct netlink_callback *);
+	const struct nla_policy *nla_pol;
+	int nla_max;
 } xfrm_dispatch[XFRM_NR_MSGTYPES] = {
 	[XFRM_MSG_NEWSA       - XFRM_MSG_BASE] = { .doit = xfrm_add_sa        },
 	[XFRM_MSG_DELSA       - XFRM_MSG_BASE] = { .doit = xfrm_del_sa        },
@@ -2335,6 +2405,9 @@ static const struct xfrm_link {
 	[XFRM_MSG_GETAE       - XFRM_MSG_BASE] = { .doit = xfrm_get_ae  },
 	[XFRM_MSG_MIGRATE     - XFRM_MSG_BASE] = { .doit = xfrm_do_migrate    },
 	[XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = { .doit = xfrm_get_sadinfo   },
+	[XFRM_MSG_NEWSPDINFO  - XFRM_MSG_BASE] = { .doit = xfrm_set_spdinfo,
+						   .nla_pol = xfrma_spd_policy,
+						   .nla_max = XFRMA_SPD_MAX },
 	[XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = { .doit = xfrm_get_spdinfo   },
 };
 
@@ -2371,8 +2444,9 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		}
 	}
 
-	err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs, XFRMA_MAX,
-			  xfrma_policy);
+	err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs,
+			  link->nla_max ? : XFRMA_MAX,
+			  link->nla_pol ? : xfrma_policy);
 	if (err < 0)
 		return err;
 
-- 
1.9.1
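
Two small pieces of the patch above can be tried outside the kernel: the
range check that xfrm_set_spdinfo() applies to the thresholds, and the
per-message fallback that xfrm_user_rcv_msg() uses when a dispatch entry
does not provide its own attribute policy. What follows is only a userspace
sketch under stated assumptions, not the kernel code: struct spd_hthresh,
struct parse_params and all three helpers are hypothetical names, and the
u8 field width is assumed because the uapi definition of
struct xfrmu_spdhthresh is not part of this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two fields the patch reads from
 * struct xfrmu_spdhthresh; the u8 width is an assumption. */
struct spd_hthresh {
	uint8_t lbits;
	uint8_t rbits;
};

/* Hypothetical helper: returns 1 when both thresholds fit the address
 * width (32 for IPv4, 128 for IPv6). The patch returns -EINVAL when
 * either value exceeds that width. */
int spd_hthresh_valid(const struct spd_hthresh *t, unsigned int addr_bits)
{
	return t->lbits <= addr_bits && t->rbits <= addr_bits;
}

/* Hypothetical stand-in for the two fields added to struct xfrm_link. */
struct parse_params {
	const void *policy;	/* NULL: fall back to the default policy */
	int max_attr;		/* 0: fall back to the default maximum */
};

/* Portable spelling of the GNU "a ? : b" shorthand used in the patch. */
int effective_max(const struct parse_params *p, int default_max)
{
	return p->max_attr ? p->max_attr : default_max;
}

const void *effective_policy(const struct parse_params *p,
			     const void *default_policy)
{
	return p->policy ? p->policy : default_policy;
}
```

A dispatch entry that leaves both fields zeroed keeps the previous
behaviour (the default maximum and default policy), which is why only the
XFRM_MSG_NEWSPDINFO entry needs to set them.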

* Re: [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database
  2014-08-29 14:16             ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
  2014-08-29 14:16               ` [ipsec-next v4 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
  2014-08-29 14:16               ` [ipsec-next v4 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
@ 2014-09-03 11:59               ` Steffen Klassert
  2014-09-03 12:53                 ` Christophe Gouault
  2 siblings, 1 reply; 27+ messages in thread
From: Steffen Klassert @ 2014-09-03 11:59 UTC (permalink / raw)
  To: Christophe Gouault; +Cc: David S. Miller, netdev

On Fri, Aug 29, 2014 at 04:16:03PM +0200, Christophe Gouault wrote:
> This patchset makes it possible to hash more policies than just
> non-prefixed ones: policies whose prefix lengths are greater than or
> equal to configurable thresholds are hashed as well.
> 
> These thresholds are configured via netlink message
> XFRM_MSG_NEWSPDINFO, attributes XFRMA_SPD_IPV4_HTHRESH and
> XFRMA_SPD_IPV6_HTHRESH.
> 
> The related iproute2 patch for configuring the thresholds is available
> on request.
> 
> Best Regards,
> Christophe
> ----
> v2:
> - change configuration API from proc to netlink
> v3:
> - initialize xfrm_policy_hthresh lock
> - remove "rebuilding SPD hash table" log
> - replace deprecated hlist_add_after by hlist_add_behind
> - remove netlink reply to XFRM_MSG_NEWSPDINFO request
> v4:
> - remove unused variables in xfrm_set_spdinfo
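
For readers of the archive, the rule quoted above (hash a policy when its
selector prefix lengths are greater than or equal to the thresholds) can be
sketched in userspace C. This is an illustration under assumptions, not the
code from patch 1/2: the struct and function names are hypothetical, and it
assumes that both the destination and the source prefix length must meet
their respective threshold.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the prefix lengths of a policy selector. */
struct sel_prefixlen {
	uint8_t prefixlen_d;	/* destination prefix length */
	uint8_t prefixlen_s;	/* source prefix length */
};

/* Hypothetical stand-in for one pair of configured thresholds. */
struct hash_thresh {
	uint8_t dbits;
	uint8_t sbits;
};

/* Returns 1 when the policy would be placed in the hash table, i.e. both
 * prefix lengths are at least as long as the thresholds (assumed rule).
 * Otherwise returns 0, meaning the policy is not hashed. */
int policy_is_hashable(const struct sel_prefixlen *sel,
		       const struct hash_thresh *th)
{
	return sel->prefixlen_d >= th->dbits &&
	       sel->prefixlen_s >= th->sbits;
}
```

With IPv4 thresholds of 32/32, only host-to-host (non-prefixed) policies
satisfy the check; lowering the thresholds to, say, 24/16 lets a
192.168.1.0/24 to 10.0.0.0/16 style policy be hashed as well.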

Looks good. All applied to ipsec-next, thanks a lot Christophe!

* Re: [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database
  2014-09-03 11:59               ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Steffen Klassert
@ 2014-09-03 12:53                 ` Christophe Gouault
  0 siblings, 0 replies; 27+ messages in thread
From: Christophe Gouault @ 2014-09-03 12:53 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev

2014-09-03 13:59 GMT+02:00 Steffen Klassert <steffen.klassert@secunet.com>:
> On Fri, Aug 29, 2014 at 04:16:03PM +0200, Christophe Gouault wrote:
>> This patchset makes it possible to hash more policies than just
>> non-prefixed ones: policies whose prefix lengths are greater than or
>> equal to configurable thresholds are hashed as well.

> Looks good. All applied to ipsec-next, thanks a lot Christophe!

You're welcome.
Thanks for reviewing.

Christophe

End of thread: 2014-09-03 12:53 UTC

Thread overview: 27+ messages
2014-05-12 13:45 [PATCH ipsec-next 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
2014-05-12 13:45 ` [PATCH ipsec-next 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
2014-05-12 13:45 ` [PATCH ipsec-next 2/2] xfrm: configure policy hash table thresholds by /proc Christophe Gouault
2014-05-15  8:34   ` Steffen Klassert
2014-05-19  7:41     ` Christophe Gouault
2014-05-22 10:09       ` Steffen Klassert
2014-05-22 10:15         ` David Laight
2014-05-23  8:30           ` Christophe Gouault
2014-08-01  9:12 ` [PATCH net-next v2 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
2014-08-01  9:12   ` [PATCH net-next v2 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
2014-08-01  9:12   ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
2014-08-01 13:01     ` [PATCH RFC iproute2 0/2] ipxfrm: configuration of SPD hash Christophe Gouault
2014-08-01 13:01       ` [PATCH RFC iproute2 1/2] Update headers to net-next Christophe Gouault
2014-08-01 13:01       ` [PATCH RFC iproute2 2/2] ipxfrm: add command for configuring SPD hash table Christophe Gouault
2014-08-21  6:09     ` [PATCH net-next v2 2/2] xfrm: configure policy hash table thresholds by netlink Steffen Klassert
2014-08-26  7:27       ` Christophe Gouault
2014-08-27 15:48       ` [PATCH ipsec-next v3 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
2014-08-27 15:48         ` [PATCH ipsec-next v3 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
2014-08-27 15:48         ` [PATCH ipsec-next v3 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
2014-08-29  9:54           ` Steffen Klassert
2014-08-29 10:02             ` Christophe Gouault
2014-08-29 14:16             ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Christophe Gouault
2014-08-29 14:16               ` [ipsec-next v4 1/2] xfrm: hash prefixed policies based on preflen thresholds Christophe Gouault
2014-08-29 14:16               ` [ipsec-next v4 2/2] xfrm: configure policy hash table thresholds by netlink Christophe Gouault
2014-09-03 11:59               ` [ipsec-next v4 0/2] xfrm: scalability enhancements for policy database Steffen Klassert
2014-09-03 12:53                 ` Christophe Gouault
2014-08-04 22:09   ` [PATCH net-next v2 " David Miller
