netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
@ 2015-12-30 15:50 Xin Long
  2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem

for telecom center, the usual case is that a server is connected by thousands
of clients. but if the server with only one enpoint(udp style) use the same
sport and dport to communicate with every clients, and every assoc in server
will be hashed in the same chain of global assoc hashtable due to currently we
choose dport and sport as the hash key.

when a packet is received, sctp_rcv try to find the assoc with sport and dport,
since that chain is too long to find it fast, it make the performance turn to
very low, some test data is as follow:

in server:
$./ss [start a udp style server there]
in client:
$./cc [start 2500 sockets to connect server with same port and different ip,
       and use one of them to send data to server]

===== test on net-next
-- perf top
server:
  55.73%  [kernel]             [k] sctp_assoc_is_match
   6.80%  [kernel]             [k] sctp_assoc_lookup_paddr
   4.81%  [kernel]             [k] sctp_v4_cmp_addr
   3.12%  [kernel]             [k] _raw_spin_unlock_irqrestore
   1.94%  [kernel]             [k] sctp_cmp_addr_exact

client:
  46.01%  [kernel]                    [k] sctp_endpoint_lookup_assoc
   5.55%  libc-2.17.so                [.] __libc_calloc
   5.39%  libc-2.17.so                [.] _int_free
   3.92%  libc-2.17.so                [.] _int_malloc
   3.23%  [kernel]                    [k] __memset

-- spent time
time is 487s, send pkt is 10000000

we need to change the way to calculate the hash key, to use lport +
rport + paddr as the hash key can avoid this issue.

besides, this patchset will use transport hashtable to replace
association hashtable to lookup with rhashtable api. get transport
first then get association by t->asoc. and also it will make tcp
style work better.

===== test with this patchset:
-- perf top
server:
  15.98%  [kernel]                 [k] _raw_spin_unlock_irqrestore
   9.92%  [kernel]                 [k] __pv_queued_spin_lock_slowpath
   7.22%  [kernel]                 [k] copy_user_generic_string
   2.38%  libpthread-2.17.so       [.] __recvmsg_nocancel
   1.88%  [kernel]                 [k] sctp_recvmsg

client:
  11.90%  [kernel]                   [k] sctp_hash_cmp
   8.52%  [kernel]                   [k] rht_deferred_worker
   4.94%  [kernel]                   [k] __pv_queued_spin_lock_slowpath
   3.95%  [kernel]                   [k] sctp_bind_addr_match
   2.49%  [kernel]                   [k] __memset

-- spent time
time is 22s, send pkt is 10000000

Xin Long (5):
  sctp: add the rhashtable apis for sctp global transport hashtable
  sctp: apply rhashtable api to send/recv path
  sctp: apply rhashtable api to sctp procfs
  sctp: drop the old assoc hashtable of sctp
  sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc

 include/net/sctp/sctp.h    |  32 ++---
 include/net/sctp/structs.h |  10 +-
 net/sctp/associola.c       |   5 +
 net/sctp/endpointola.c     |  52 ++------
 net/sctp/input.c           | 187 +++++++++++++++++----------
 net/sctp/proc.c            | 316 +++++++++++++++++++++++++--------------------
 net/sctp/protocol.c        |  36 ++----
 net/sctp/sm_sideeffect.c   |   2 -
 net/sctp/socket.c          |   6 +-
 9 files changed, 331 insertions(+), 315 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2015-12-30 15:50 [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Xin Long
@ 2015-12-30 15:50 ` Xin Long
  2015-12-30 15:50   ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long
                     ` (4 more replies)
  2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet
  2016-01-04 22:30 ` David Miller
  2 siblings, 5 replies; 40+ messages in thread
From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem

tranport hashtbale will replace the association hashtable to do the
lookup for transport, and then get association by t->assoc, rhashtable
apis will be used because of it's resizable, scalable and using rcu.

lport + rport + paddr will be the base hashkey to locate the chain,
with net to protect one netns from another, then plus the laddr to
compare to get the target.

this patch will provider the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport

hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport

init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/net/sctp/sctp.h    |  11 ++++
 include/net/sctp/structs.h |   5 ++
 net/sctp/input.c           | 131 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 147 insertions(+)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index ce13cf2..7bbdfba 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
 				 struct sctp_transport *t);
 void sctp_backlog_migrate(struct sctp_association *assoc,
 			  struct sock *oldsk, struct sock *newsk);
+int sctp_transport_hashtable_init(void);
+void sctp_transport_hashtable_destroy(void);
+void sctp_hash_transport(struct sctp_transport *t);
+void sctp_unhash_transport(struct sctp_transport *t);
+struct sctp_transport *sctp_addrs_lookup_transport(
+				struct net *net,
+				const union sctp_addr *laddr,
+				const union sctp_addr *paddr);
+struct sctp_transport *sctp_epaddr_lookup_transport(
+				const struct sctp_endpoint *ep,
+				const union sctp_addr *paddr);
 
 /*
  * sctp/proc.c
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index eea9bde..4ab87d0 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -48,6 +48,7 @@
 #define __sctp_structs_h__
 
 #include <linux/ktime.h>
+#include <linux/rhashtable.h>
 #include <linux/socket.h>	/* linux/in.h needs this!!    */
 #include <linux/in.h>		/* We get struct sockaddr_in. */
 #include <linux/in6.h>		/* We get struct in6_addr     */
@@ -123,6 +124,8 @@ extern struct sctp_globals {
 	struct sctp_hashbucket *assoc_hashtable;
 	/* This is the sctp port control hash.	*/
 	struct sctp_bind_hashbucket *port_hashtable;
+	/* This is the hash of all transports. */
+	struct rhashtable transport_hashtable;
 
 	/* Sizes of above hashtables. */
 	int ep_hashsize;
@@ -147,6 +150,7 @@ extern struct sctp_globals {
 #define sctp_assoc_hashtable		(sctp_globals.assoc_hashtable)
 #define sctp_port_hashsize		(sctp_globals.port_hashsize)
 #define sctp_port_hashtable		(sctp_globals.port_hashtable)
+#define sctp_transport_hashtable	(sctp_globals.transport_hashtable)
 #define sctp_checksum_disable		(sctp_globals.checksum_disable)
 
 /* SCTP Socket type: UDP or TCP style. */
@@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet)
 struct sctp_transport {
 	/* A list of transports. */
 	struct list_head transports;
+	struct rhash_head node;
 
 	/* Reference counting. */
 	atomic_t refcnt;
diff --git a/net/sctp/input.c b/net/sctp/input.c
index b6493b3..bac8278 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -782,6 +782,137 @@ hit:
 	return ep;
 }
 
+/* rhashtable for transport */
+struct sctp_hash_cmp_arg {
+	const union sctp_addr		*laddr;
+	const union sctp_addr		*paddr;
+	const struct net		*net;
+};
+
+static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
+				const void *ptr)
+{
+	const struct sctp_hash_cmp_arg *x = arg->key;
+	const struct sctp_transport *t = ptr;
+	struct sctp_association *asoc = t->asoc;
+	const struct net *net = x->net;
+
+	if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
+		return 1;
+	if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
+		return 1;
+	if (!net_eq(sock_net(asoc->base.sk), net))
+		return 1;
+	if (!sctp_bind_addr_match(&asoc->base.bind_addr,
+				  x->laddr, sctp_sk(asoc->base.sk)))
+		return 1;
+
+	return 0;
+}
+
+static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
+{
+	const struct sctp_transport *t = data;
+	const union sctp_addr *paddr = &t->ipaddr;
+	const struct net *net = sock_net(t->asoc->base.sk);
+	u16 lport = htons(t->asoc->base.bind_addr.port);
+	u32 addr;
+
+	if (paddr->sa.sa_family == AF_INET6)
+		addr = jhash(&paddr->v6.sin6_addr, 16, seed);
+	else
+		addr = paddr->v4.sin_addr.s_addr;
+
+	return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
+			     (__force __u32)lport, net_hash_mix(net), seed);
+}
+
+static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
+{
+	const struct sctp_hash_cmp_arg *x = data;
+	const union sctp_addr *paddr = x->paddr;
+	const struct net *net = x->net;
+	u16 lport = x->laddr->v4.sin_port;
+	u32 addr;
+
+	if (paddr->sa.sa_family == AF_INET6)
+		addr = jhash(&paddr->v6.sin6_addr, 16, seed);
+	else
+		addr = paddr->v4.sin_addr.s_addr;
+
+	return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
+			     (__force __u32)lport, net_hash_mix(net), seed);
+}
+
+static const struct rhashtable_params sctp_hash_params = {
+	.head_offset		= offsetof(struct sctp_transport, node),
+	.hashfn			= sctp_hash_key,
+	.obj_hashfn		= sctp_hash_obj,
+	.obj_cmpfn		= sctp_hash_cmp,
+	.automatic_shrinking	= true,
+};
+
+int sctp_transport_hashtable_init(void)
+{
+	return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params);
+}
+
+void sctp_transport_hashtable_destroy(void)
+{
+	rhashtable_destroy(&sctp_transport_hashtable);
+}
+
+void sctp_hash_transport(struct sctp_transport *t)
+{
+	struct sctp_sockaddr_entry *addr;
+	struct sctp_hash_cmp_arg arg;
+
+	addr = list_entry(t->asoc->base.bind_addr.address_list.next,
+			  struct sctp_sockaddr_entry, list);
+	arg.laddr = &addr->a;
+	arg.paddr = &t->ipaddr;
+	arg.net   = sock_net(t->asoc->base.sk);
+
+reinsert:
+	if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg,
+					 &t->node, sctp_hash_params) == -EBUSY)
+		goto reinsert;
+}
+
+void sctp_unhash_transport(struct sctp_transport *t)
+{
+	rhashtable_remove_fast(&sctp_transport_hashtable, &t->node,
+			       sctp_hash_params);
+}
+
+struct sctp_transport *sctp_addrs_lookup_transport(
+				struct net *net,
+				const union sctp_addr *laddr,
+				const union sctp_addr *paddr)
+{
+	struct sctp_hash_cmp_arg arg = {
+		.laddr = laddr,
+		.paddr = paddr,
+		.net   = net,
+	};
+
+	return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg,
+				      sctp_hash_params);
+}
+
+struct sctp_transport *sctp_epaddr_lookup_transport(
+				const struct sctp_endpoint *ep,
+				const union sctp_addr *paddr)
+{
+	struct sctp_sockaddr_entry *addr;
+	struct net *net = sock_net(ep->base.sk);
+
+	addr = list_entry(ep->base.bind_addr.address_list.next,
+			  struct sctp_sockaddr_entry, list);
+
+	return sctp_addrs_lookup_transport(net, &addr->a, paddr);
+}
+
 /* Insert association into the hash table.  */
 static void __sctp_hash_established(struct sctp_association *asoc)
 {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path
  2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long
@ 2015-12-30 15:50   ` Xin Long
  2015-12-30 15:50     ` [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs Xin Long
  2016-01-05 19:07     ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich
  2015-12-30 16:57   ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Eric Dumazet
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 40+ messages in thread
From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem

apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
and __sctp_lookup_association, it's invoked in the protection of sock
lock, it will be safe, but sctp_lookup_association need to call
rcu_read_lock() and to detect the t->dead to protect it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/associola.c   |  5 +++++
 net/sctp/endpointola.c | 35 ++++++++---------------------------
 net/sctp/input.c       | 39 ++++++++++-----------------------------
 net/sctp/protocol.c    |  6 ++++++
 4 files changed, 29 insertions(+), 56 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 559afd0..2bf8ec9 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc)
 	list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) {
 		transport = list_entry(pos, struct sctp_transport, transports);
 		list_del_rcu(pos);
+		sctp_unhash_transport(transport);
 		sctp_transport_free(transport);
 	}
 
@@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
 
 	/* Remove this peer from the list. */
 	list_del_rcu(&peer->transports);
+	/* Remove this peer from the transport hashtable */
+	sctp_unhash_transport(peer);
 
 	/* Get the first transport of asoc. */
 	pos = asoc->peer.transport_addr_list.next;
@@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
 	/* Attach the remote transport to our asoc.  */
 	list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list);
 	asoc->peer.transport_count++;
+	/* Add this peer into the transport hashtable */
+	sctp_hash_transport(peer);
 
 	/* If we do not yet have a primary path, set one.  */
 	if (!asoc->peer.primary_path) {
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 9da76ba..8838bf4 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -314,8 +314,8 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
 }
 
 /* Find the association that goes with this chunk.
- * We do a linear search of the associations for this endpoint.
- * We return the matching transport address too.
+ * We lookup the transport from hashtable at first, then get association
+ * through t->assoc.
  */
 static struct sctp_association *__sctp_endpoint_lookup_assoc(
 	const struct sctp_endpoint *ep,
@@ -323,12 +323,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
 	struct sctp_transport **transport)
 {
 	struct sctp_association *asoc = NULL;
-	struct sctp_association *tmp;
-	struct sctp_transport *t = NULL;
-	struct sctp_hashbucket *head;
-	struct sctp_ep_common *epb;
-	int hash;
-	int rport;
+	struct sctp_transport *t;
 
 	*transport = NULL;
 
@@ -337,26 +332,12 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
 	 */
 	if (!ep->base.bind_addr.port)
 		goto out;
+	t = sctp_epaddr_lookup_transport(ep, paddr);
+	if (!t || t->asoc->temp)
+		goto out;
 
-	rport = ntohs(paddr->v4.sin_port);
-
-	hash = sctp_assoc_hashfn(sock_net(ep->base.sk), ep->base.bind_addr.port,
-				 rport);
-	head = &sctp_assoc_hashtable[hash];
-	read_lock(&head->lock);
-	sctp_for_each_hentry(epb, &head->chain) {
-		tmp = sctp_assoc(epb);
-		if (tmp->ep != ep || rport != tmp->peer.port)
-			continue;
-
-		t = sctp_assoc_lookup_paddr(tmp, paddr);
-		if (t) {
-			asoc = tmp;
-			*transport = t;
-			break;
-		}
-	}
-	read_unlock(&head->lock);
+	*transport = t;
+	asoc = t->asoc;
 out:
 	return asoc;
 }
diff --git a/net/sctp/input.c b/net/sctp/input.c
index bac8278..6f075d8 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -981,38 +981,19 @@ static struct sctp_association *__sctp_lookup_association(
 					const union sctp_addr *peer,
 					struct sctp_transport **pt)
 {
-	struct sctp_hashbucket *head;
-	struct sctp_ep_common *epb;
-	struct sctp_association *asoc;
-	struct sctp_transport *transport;
-	int hash;
+	struct sctp_transport *t;
 
-	/* Optimize here for direct hit, only listening connections can
-	 * have wildcards anyways.
-	 */
-	hash = sctp_assoc_hashfn(net, ntohs(local->v4.sin_port),
-				 ntohs(peer->v4.sin_port));
-	head = &sctp_assoc_hashtable[hash];
-	read_lock(&head->lock);
-	sctp_for_each_hentry(epb, &head->chain) {
-		asoc = sctp_assoc(epb);
-		transport = sctp_assoc_is_match(asoc, net, local, peer);
-		if (transport)
-			goto hit;
-	}
+	t = sctp_addrs_lookup_transport(net, local, peer);
+	if (!t || t->dead || t->asoc->temp)
+		return NULL;
 
-	read_unlock(&head->lock);
+	sctp_association_hold(t->asoc);
+	*pt = t;
 
-	return NULL;
-
-hit:
-	*pt = transport;
-	sctp_association_hold(asoc);
-	read_unlock(&head->lock);
-	return asoc;
+	return t->asoc;
 }
 
-/* Look up an association. BH-safe. */
+/* Look up an association. protected by RCU read lock */
 static
 struct sctp_association *sctp_lookup_association(struct net *net,
 						 const union sctp_addr *laddr,
@@ -1021,9 +1002,9 @@ struct sctp_association *sctp_lookup_association(struct net *net,
 {
 	struct sctp_association *asoc;
 
-	local_bh_disable();
+	rcu_read_lock();
 	asoc = __sctp_lookup_association(net, laddr, paddr, transportp);
-	local_bh_enable();
+	rcu_read_unlock();
 
 	return asoc;
 }
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 010aced..631cfb3 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1467,6 +1467,9 @@ static __init int sctp_init(void)
 		INIT_HLIST_HEAD(&sctp_port_hashtable[i].chain);
 	}
 
+	if (sctp_transport_hashtable_init())
+		goto err_thash_alloc;
+
 	pr_info("Hash tables configured (established %d bind %d)\n",
 		sctp_assoc_hashsize, sctp_port_hashsize);
 
@@ -1521,6 +1524,8 @@ err_register_defaults:
 		   get_order(sctp_port_hashsize *
 			     sizeof(struct sctp_bind_hashbucket)));
 err_bhash_alloc:
+	sctp_transport_hashtable_destroy();
+err_thash_alloc:
 	kfree(sctp_ep_hashtable);
 err_ehash_alloc:
 	free_pages((unsigned long)sctp_assoc_hashtable,
@@ -1567,6 +1572,7 @@ static __exit void sctp_exit(void)
 	free_pages((unsigned long)sctp_port_hashtable,
 		   get_order(sctp_port_hashsize *
 			     sizeof(struct sctp_bind_hashbucket)));
+	sctp_transport_hashtable_destroy();
 
 	percpu_counter_destroy(&sctp_sockets_allocated);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs
  2015-12-30 15:50   ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long
@ 2015-12-30 15:50     ` Xin Long
  2015-12-30 15:50       ` [PATCH net-next 4/5] sctp: drop the old assoc hashtable of sctp Xin Long
  2016-01-05 19:07     ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich
  1 sibling, 1 reply; 40+ messages in thread
From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem

Traversal the transport rhashtable, get the association only once through
the condition assoc->peer.primary_path != transport.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/proc.c | 316 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 173 insertions(+), 143 deletions(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 0697eda..dfa7eec 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -281,88 +281,136 @@ void sctp_eps_proc_exit(struct net *net)
 	remove_proc_entry("eps", net->sctp.proc_net_sctp);
 }
 
+struct sctp_ht_iter {
+	struct seq_net_private p;
+	struct rhashtable_iter hti;
+};
 
-static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
+static struct sctp_transport *sctp_transport_get_next(struct seq_file *seq)
 {
-	if (*pos >= sctp_assoc_hashsize)
-		return NULL;
+	struct sctp_ht_iter *iter = seq->private;
+	struct sctp_transport *t;
 
-	if (*pos < 0)
-		*pos = 0;
+	t = rhashtable_walk_next(&iter->hti);
+	for (; t; t = rhashtable_walk_next(&iter->hti)) {
+		if (IS_ERR(t)) {
+			if (PTR_ERR(t) == -EAGAIN)
+				continue;
+			break;
+		}
 
-	if (*pos == 0)
-		seq_printf(seq, " ASSOC     SOCK   STY SST ST HBKT "
-				"ASSOC-ID TX_QUEUE RX_QUEUE UID INODE LPORT "
-				"RPORT LADDRS <-> RADDRS "
-				"HBINT INS OUTS MAXRT T1X T2X RTXC "
-				"wmema wmemq sndbuf rcvbuf\n");
+		if (net_eq(sock_net(t->asoc->base.sk), seq_file_net(seq)) &&
+		    t->asoc->peer.primary_path == t)
+			break;
+	}
 
-	return (void *)pos;
+	return t;
 }
 
-static void sctp_assocs_seq_stop(struct seq_file *seq, void *v)
+static struct sctp_transport *sctp_transport_get_idx(struct seq_file *seq,
+						     loff_t pos)
+{
+	void *obj;
+
+	while (pos && (obj = sctp_transport_get_next(seq)) && !IS_ERR(obj))
+		pos--;
+
+	return obj;
+}
+
+static int sctp_transport_walk_start(struct seq_file *seq)
 {
+	struct sctp_ht_iter *iter = seq->private;
+	int err;
+
+	err = rhashtable_walk_init(&sctp_transport_hashtable, &iter->hti);
+	if (err)
+		return err;
+
+	err = rhashtable_walk_start(&iter->hti);
+
+	return err == -EAGAIN ? 0 : err;
 }
 
+static void sctp_transport_walk_stop(struct seq_file *seq)
+{
+	struct sctp_ht_iter *iter = seq->private;
+
+	rhashtable_walk_stop(&iter->hti);
+	rhashtable_walk_exit(&iter->hti);
+}
+
+static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	int err = sctp_transport_walk_start(seq);
+
+	if (err)
+		return ERR_PTR(err);
+
+	return *pos ? sctp_transport_get_idx(seq, *pos) : SEQ_START_TOKEN;
+}
+
+static void sctp_assocs_seq_stop(struct seq_file *seq, void *v)
+{
+	sctp_transport_walk_stop(seq);
+}
 
 static void *sctp_assocs_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	if (++*pos >= sctp_assoc_hashsize)
-		return NULL;
+	++*pos;
 
-	return pos;
+	return sctp_transport_get_next(seq);
 }
 
 /* Display sctp associations (/proc/net/sctp/assocs). */
 static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
 {
-	struct sctp_hashbucket *head;
-	struct sctp_ep_common *epb;
+	struct sctp_transport *transport;
 	struct sctp_association *assoc;
+	struct sctp_ep_common *epb;
 	struct sock *sk;
-	int    hash = *(loff_t *)v;
-
-	if (hash >= sctp_assoc_hashsize)
-		return -ENOMEM;
 
-	head = &sctp_assoc_hashtable[hash];
-	local_bh_disable();
-	read_lock(&head->lock);
-	sctp_for_each_hentry(epb, &head->chain) {
-		assoc = sctp_assoc(epb);
-		sk = epb->sk;
-		if (!net_eq(sock_net(sk), seq_file_net(seq)))
-			continue;
-		seq_printf(seq,
-			   "%8pK %8pK %-3d %-3d %-2d %-4d "
-			   "%4d %8d %8d %7u %5lu %-5d %5d ",
-			   assoc, sk, sctp_sk(sk)->type, sk->sk_state,
-			   assoc->state, hash,
-			   assoc->assoc_id,
-			   assoc->sndbuf_used,
-			   atomic_read(&assoc->rmem_alloc),
-			   from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk)),
-			   sock_i_ino(sk),
-			   epb->bind_addr.port,
-			   assoc->peer.port);
-		seq_printf(seq, " ");
-		sctp_seq_dump_local_addrs(seq, epb);
-		seq_printf(seq, "<-> ");
-		sctp_seq_dump_remote_addrs(seq, assoc);
-		seq_printf(seq, "\t%8lu %5d %5d %4d %4d %4d %8d "
-			   "%8d %8d %8d %8d",
-			assoc->hbinterval, assoc->c.sinit_max_instreams,
-			assoc->c.sinit_num_ostreams, assoc->max_retrans,
-			assoc->init_retries, assoc->shutdown_retries,
-			assoc->rtx_data_chunks,
-			atomic_read(&sk->sk_wmem_alloc),
-			sk->sk_wmem_queued,
-			sk->sk_sndbuf,
-			sk->sk_rcvbuf);
-		seq_printf(seq, "\n");
+	if (v == SEQ_START_TOKEN) {
+		seq_printf(seq, " ASSOC     SOCK   STY SST ST HBKT "
+				"ASSOC-ID TX_QUEUE RX_QUEUE UID INODE LPORT "
+				"RPORT LADDRS <-> RADDRS "
+				"HBINT INS OUTS MAXRT T1X T2X RTXC "
+				"wmema wmemq sndbuf rcvbuf\n");
+		return 0;
 	}
-	read_unlock(&head->lock);
-	local_bh_enable();
+
+	transport = (struct sctp_transport *)v;
+	assoc = transport->asoc;
+	epb = &assoc->base;
+	sk = epb->sk;
+
+	seq_printf(seq,
+		   "%8pK %8pK %-3d %-3d %-2d %-4d "
+		   "%4d %8d %8d %7u %5lu %-5d %5d ",
+		   assoc, sk, sctp_sk(sk)->type, sk->sk_state,
+		   assoc->state, 0,
+		   assoc->assoc_id,
+		   assoc->sndbuf_used,
+		   atomic_read(&assoc->rmem_alloc),
+		   from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk)),
+		   sock_i_ino(sk),
+		   epb->bind_addr.port,
+		   assoc->peer.port);
+	seq_printf(seq, " ");
+	sctp_seq_dump_local_addrs(seq, epb);
+	seq_printf(seq, "<-> ");
+	sctp_seq_dump_remote_addrs(seq, assoc);
+	seq_printf(seq, "\t%8lu %5d %5d %4d %4d %4d %8d "
+		   "%8d %8d %8d %8d",
+		assoc->hbinterval, assoc->c.sinit_max_instreams,
+		assoc->c.sinit_num_ostreams, assoc->max_retrans,
+		assoc->init_retries, assoc->shutdown_retries,
+		assoc->rtx_data_chunks,
+		atomic_read(&sk->sk_wmem_alloc),
+		sk->sk_wmem_queued,
+		sk->sk_sndbuf,
+		sk->sk_rcvbuf);
+	seq_printf(seq, "\n");
 
 	return 0;
 }
@@ -378,7 +426,7 @@ static const struct seq_operations sctp_assoc_ops = {
 static int sctp_assocs_seq_open(struct inode *inode, struct file *file)
 {
 	return seq_open_net(inode, file, &sctp_assoc_ops,
-			    sizeof(struct seq_net_private));
+			    sizeof(struct sctp_ht_iter));
 }
 
 static const struct file_operations sctp_assocs_seq_fops = {
@@ -409,112 +457,94 @@ void sctp_assocs_proc_exit(struct net *net)
 
 static void *sctp_remaddr_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	if (*pos >= sctp_assoc_hashsize)
-		return NULL;
-
-	if (*pos < 0)
-		*pos = 0;
+	int err = sctp_transport_walk_start(seq);
 
-	if (*pos == 0)
-		seq_printf(seq, "ADDR ASSOC_ID HB_ACT RTO MAX_PATH_RTX "
-				"REM_ADDR_RTX START STATE\n");
+	if (err)
+		return ERR_PTR(err);
 
-	return (void *)pos;
+	return *pos ? sctp_transport_get_idx(seq, *pos) : SEQ_START_TOKEN;
 }
 
 static void *sctp_remaddr_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	if (++*pos >= sctp_assoc_hashsize)
-		return NULL;
+	++*pos;
 
-	return pos;
+	return sctp_transport_get_next(seq);
 }
 
 static void sctp_remaddr_seq_stop(struct seq_file *seq, void *v)
 {
+	sctp_transport_walk_stop(seq);
 }
 
 static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
 {
-	struct sctp_hashbucket *head;
-	struct sctp_ep_common *epb;
 	struct sctp_association *assoc;
 	struct sctp_transport *tsp;
-	int    hash = *(loff_t *)v;
 
-	if (hash >= sctp_assoc_hashsize)
-		return -ENOMEM;
+	if (v == SEQ_START_TOKEN) {
+		seq_printf(seq, "ADDR ASSOC_ID HB_ACT RTO MAX_PATH_RTX "
+				"REM_ADDR_RTX START STATE\n");
+		return 0;
+	}
 
-	head = &sctp_assoc_hashtable[hash];
-	local_bh_disable();
-	read_lock(&head->lock);
-	rcu_read_lock();
-	sctp_for_each_hentry(epb, &head->chain) {
-		if (!net_eq(sock_net(epb->sk), seq_file_net(seq)))
+	tsp = (struct sctp_transport *)v;
+	assoc = tsp->asoc;
+
+	list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list,
+				transports) {
+		if (tsp->dead)
 			continue;
-		assoc = sctp_assoc(epb);
-		list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list,
-					transports) {
-			if (tsp->dead)
-				continue;
+		/*
+		 * The remote address (ADDR)
+		 */
+		tsp->af_specific->seq_dump_addr(seq, &tsp->ipaddr);
+		seq_printf(seq, " ");
+		/*
+		 * The association ID (ASSOC_ID)
+		 */
+		seq_printf(seq, "%d ", tsp->asoc->assoc_id);
+
+		/*
+		 * If the Heartbeat is active (HB_ACT)
+		 * Note: 1 = Active, 0 = Inactive
+		 */
+		seq_printf(seq, "%d ", timer_pending(&tsp->hb_timer));
+
+		/*
+		 * Retransmit time out (RTO)
+		 */
+		seq_printf(seq, "%lu ", tsp->rto);
+
+		/*
+		 * Maximum path retransmit count (PATH_MAX_RTX)
+		 */
+		seq_printf(seq, "%d ", tsp->pathmaxrxt);
+
+		/*
+		 * remote address retransmit count (REM_ADDR_RTX)
+		 * Note: We don't have a way to tally this at the moment
+		 * so lets just leave it as zero for the moment
+		 */
+		seq_puts(seq, "0 ");
+
+		/*
+		 * remote address start time (START).  This is also not
+		 * currently implemented, but we can record it with a
+		 * jiffies marker in a subsequent patch
+		 */
+		seq_puts(seq, "0 ");
+
+		/*
+		 * The current state of this destination. I.e.
+		 * SCTP_ACTIVE, SCTP_INACTIVE, ...
+		 */
+		seq_printf(seq, "%d", tsp->state);
 
-			/*
-			 * The remote address (ADDR)
-			 */
-			tsp->af_specific->seq_dump_addr(seq, &tsp->ipaddr);
-			seq_printf(seq, " ");
-
-			/*
-			 * The association ID (ASSOC_ID)
-			 */
-			seq_printf(seq, "%d ", tsp->asoc->assoc_id);
-
-			/*
-			 * If the Heartbeat is active (HB_ACT)
-			 * Note: 1 = Active, 0 = Inactive
-			 */
-			seq_printf(seq, "%d ", timer_pending(&tsp->hb_timer));
-
-			/*
-			 * Retransmit time out (RTO)
-			 */
-			seq_printf(seq, "%lu ", tsp->rto);
-
-			/*
-			 * Maximum path retransmit count (PATH_MAX_RTX)
-			 */
-			seq_printf(seq, "%d ", tsp->pathmaxrxt);
-
-			/*
-			 * remote address retransmit count (REM_ADDR_RTX)
-			 * Note: We don't have a way to tally this at the moment
-			 * so lets just leave it as zero for the moment
-			 */
-			seq_puts(seq, "0 ");
-
-			/*
-			 * remote address start time (START).  This is also not
-			 * currently implemented, but we can record it with a
-			 * jiffies marker in a subsequent patch
-			 */
-			seq_puts(seq, "0 ");
-
-			/*
-			 * The current state of this destination. I.e.
-			 * SCTP_ACTIVE, SCTP_INACTIVE, ...
-			 */
-			seq_printf(seq, "%d", tsp->state);
-
-			seq_printf(seq, "\n");
-		}
+		seq_printf(seq, "\n");
 	}
 
-	rcu_read_unlock();
-	read_unlock(&head->lock);
-	local_bh_enable();
-
 	return 0;
-
 }
 
 static const struct seq_operations sctp_remaddr_ops = {
@@ -533,7 +563,7 @@ void sctp_remaddr_proc_exit(struct net *net)
 static int sctp_remaddr_seq_open(struct inode *inode, struct file *file)
 {
 	return seq_open_net(inode, file, &sctp_remaddr_ops,
-			    sizeof(struct seq_net_private));
+			    sizeof(struct sctp_ht_iter));
 }
 
 static const struct file_operations sctp_remaddr_seq_fops = {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 4/5] sctp: drop the old assoc hashtable of sctp
  2015-12-30 15:50     ` [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs Xin Long
@ 2015-12-30 15:50       ` Xin Long
  2015-12-30 15:50         ` [PATCH net-next 5/5] sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc Xin Long
  0 siblings, 1 reply; 40+ messages in thread
From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem

transport hashtable will replace the association hashtable,
so association hashtable is not used in sctp any more, so
drop the codes about that.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/net/sctp/sctp.h    | 21 ----------------
 include/net/sctp/structs.h |  5 ----
 net/sctp/input.c           | 61 ----------------------------------------------
 net/sctp/protocol.c        | 30 ++---------------------
 net/sctp/sm_sideeffect.c   |  2 --
 net/sctp/socket.c          |  6 +----
 6 files changed, 3 insertions(+), 122 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 7bbdfba..835aa2e 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -126,8 +126,6 @@ int sctp_primitive_ASCONF(struct net *, struct sctp_association *, void *arg);
  */
 int sctp_rcv(struct sk_buff *skb);
 void sctp_v4_err(struct sk_buff *skb, u32 info);
-void sctp_hash_established(struct sctp_association *);
-void sctp_unhash_established(struct sctp_association *);
 void sctp_hash_endpoint(struct sctp_endpoint *);
 void sctp_unhash_endpoint(struct sctp_endpoint *);
 struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
@@ -530,25 +528,6 @@ static inline int sctp_ep_hashfn(struct net *net, __u16 lport)
 	return (net_hash_mix(net) + lport) & (sctp_ep_hashsize - 1);
 }
 
-/* This is the hash function for the association hash table. */
-static inline int sctp_assoc_hashfn(struct net *net, __u16 lport, __u16 rport)
-{
-	int h = (lport << 16) + rport + net_hash_mix(net);
-	h ^= h>>8;
-	return h & (sctp_assoc_hashsize - 1);
-}
-
-/* This is the hash function for the association hash table.  This is
- * not used yet, but could be used as a better hash function when
- * we have a vtag.
- */
-static inline int sctp_vtag_hashfn(__u16 lport, __u16 rport, __u32 vtag)
-{
-	int h = (lport << 16) + rport;
-	h ^= vtag;
-	return h & (sctp_assoc_hashsize - 1);
-}
-
 #define sctp_for_each_hentry(epb, head) \
 	hlist_for_each_entry(epb, head, node)
 
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 4ab87d0..20e7212 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -120,8 +120,6 @@ extern struct sctp_globals {
 
 	/* This is the hash of all endpoints. */
 	struct sctp_hashbucket *ep_hashtable;
-	/* This is the hash of all associations. */
-	struct sctp_hashbucket *assoc_hashtable;
 	/* This is the sctp port control hash.	*/
 	struct sctp_bind_hashbucket *port_hashtable;
 	/* This is the hash of all transports. */
@@ -129,7 +127,6 @@ extern struct sctp_globals {
 
 	/* Sizes of above hashtables. */
 	int ep_hashsize;
-	int assoc_hashsize;
 	int port_hashsize;
 
 	/* Default initialization values to be applied to new associations. */
@@ -146,8 +143,6 @@ extern struct sctp_globals {
 #define sctp_address_families		(sctp_globals.address_families)
 #define sctp_ep_hashsize		(sctp_globals.ep_hashsize)
 #define sctp_ep_hashtable		(sctp_globals.ep_hashtable)
-#define sctp_assoc_hashsize		(sctp_globals.assoc_hashsize)
-#define sctp_assoc_hashtable		(sctp_globals.assoc_hashtable)
 #define sctp_port_hashsize		(sctp_globals.port_hashsize)
 #define sctp_port_hashtable		(sctp_globals.port_hashtable)
 #define sctp_transport_hashtable	(sctp_globals.transport_hashtable)
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 6f075d8..d9a6e66 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -913,67 +913,6 @@ struct sctp_transport *sctp_epaddr_lookup_transport(
 	return sctp_addrs_lookup_transport(net, &addr->a, paddr);
 }
 
-/* Insert association into the hash table.  */
-static void __sctp_hash_established(struct sctp_association *asoc)
-{
-	struct net *net = sock_net(asoc->base.sk);
-	struct sctp_ep_common *epb;
-	struct sctp_hashbucket *head;
-
-	epb = &asoc->base;
-
-	/* Calculate which chain this entry will belong to. */
-	epb->hashent = sctp_assoc_hashfn(net, epb->bind_addr.port,
-					 asoc->peer.port);
-
-	head = &sctp_assoc_hashtable[epb->hashent];
-
-	write_lock(&head->lock);
-	hlist_add_head(&epb->node, &head->chain);
-	write_unlock(&head->lock);
-}
-
-/* Add an association to the hash. Local BH-safe. */
-void sctp_hash_established(struct sctp_association *asoc)
-{
-	if (asoc->temp)
-		return;
-
-	local_bh_disable();
-	__sctp_hash_established(asoc);
-	local_bh_enable();
-}
-
-/* Remove association from the hash table.  */
-static void __sctp_unhash_established(struct sctp_association *asoc)
-{
-	struct net *net = sock_net(asoc->base.sk);
-	struct sctp_hashbucket *head;
-	struct sctp_ep_common *epb;
-
-	epb = &asoc->base;
-
-	epb->hashent = sctp_assoc_hashfn(net, epb->bind_addr.port,
-					 asoc->peer.port);
-
-	head = &sctp_assoc_hashtable[epb->hashent];
-
-	write_lock(&head->lock);
-	hlist_del_init(&epb->node);
-	write_unlock(&head->lock);
-}
-
-/* Remove association from the hash table.  Local BH-safe. */
-void sctp_unhash_established(struct sctp_association *asoc)
-{
-	if (asoc->temp)
-		return;
-
-	local_bh_disable();
-	__sctp_unhash_established(asoc);
-	local_bh_enable();
-}
-
 /* Look up an association. */
 static struct sctp_association *__sctp_lookup_association(
 					struct net *net,
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 631cfb3..ab0d538 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1416,24 +1416,6 @@ static __init int sctp_init(void)
 	for (order = 0; (1UL << order) < goal; order++)
 		;
 
-	do {
-		sctp_assoc_hashsize = (1UL << order) * PAGE_SIZE /
-					sizeof(struct sctp_hashbucket);
-		if ((sctp_assoc_hashsize > (64 * 1024)) && order > 0)
-			continue;
-		sctp_assoc_hashtable = (struct sctp_hashbucket *)
-			__get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
-	} while (!sctp_assoc_hashtable && --order > 0);
-	if (!sctp_assoc_hashtable) {
-		pr_err("Failed association hash alloc\n");
-		status = -ENOMEM;
-		goto err_ahash_alloc;
-	}
-	for (i = 0; i < sctp_assoc_hashsize; i++) {
-		rwlock_init(&sctp_assoc_hashtable[i].lock);
-		INIT_HLIST_HEAD(&sctp_assoc_hashtable[i].chain);
-	}
-
 	/* Allocate and initialize the endpoint hash table.  */
 	sctp_ep_hashsize = 64;
 	sctp_ep_hashtable =
@@ -1470,8 +1452,7 @@ static __init int sctp_init(void)
 	if (sctp_transport_hashtable_init())
 		goto err_thash_alloc;
 
-	pr_info("Hash tables configured (established %d bind %d)\n",
-		sctp_assoc_hashsize, sctp_port_hashsize);
+	pr_info("Hash tables configured (bind %d)\n", sctp_port_hashsize);
 
 	sctp_sysctl_register();
 
@@ -1528,10 +1509,6 @@ err_bhash_alloc:
 err_thash_alloc:
 	kfree(sctp_ep_hashtable);
 err_ehash_alloc:
-	free_pages((unsigned long)sctp_assoc_hashtable,
-		   get_order(sctp_assoc_hashsize *
-			     sizeof(struct sctp_hashbucket)));
-err_ahash_alloc:
 	percpu_counter_destroy(&sctp_sockets_allocated);
 err_percpu_counter_init:
 	kmem_cache_destroy(sctp_chunk_cachep);
@@ -1565,13 +1542,10 @@ static __exit void sctp_exit(void)
 
 	sctp_sysctl_unregister();
 
-	free_pages((unsigned long)sctp_assoc_hashtable,
-		   get_order(sctp_assoc_hashsize *
-			     sizeof(struct sctp_hashbucket)));
-	kfree(sctp_ep_hashtable);
 	free_pages((unsigned long)sctp_port_hashtable,
 		   get_order(sctp_port_hashsize *
 			     sizeof(struct sctp_bind_hashbucket)));
+	kfree(sctp_ep_hashtable);
 	sctp_transport_hashtable_destroy();
 
 	percpu_counter_destroy(&sctp_sockets_allocated);
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 05cd164..4f170ad 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -866,7 +866,6 @@ static void sctp_cmd_delete_tcb(sctp_cmd_seq_t *cmds,
 	    (!asoc->temp) && (sk->sk_shutdown != SHUTDOWN_MASK))
 		return;
 
-	sctp_unhash_established(asoc);
 	sctp_association_free(asoc);
 }
 
@@ -1269,7 +1268,6 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 			asoc = cmd->obj.asoc;
 			BUG_ON(asoc->peer.primary_path == NULL);
 			sctp_endpoint_add_asoc(ep, asoc);
-			sctp_hash_established(asoc);
 			break;
 
 		case SCTP_CMD_UPDATE_ASSOC:
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 2a1e8ba..523b9fa 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1228,7 +1228,6 @@ out_free:
 		 * To the hash table, try to unhash it, just in case, its a noop
 		 * if it wasn't hashed so we're safe
 		 */
-		sctp_unhash_established(asoc);
 		sctp_association_free(asoc);
 	}
 	return err;
@@ -1501,7 +1500,6 @@ static void sctp_close(struct sock *sk, long timeout)
 			 * ABORT or SHUTDOWN based on the linger options.
 			 */
 			if (sctp_state(asoc, CLOSED)) {
-				sctp_unhash_established(asoc);
 				sctp_association_free(asoc);
 				continue;
 			}
@@ -1984,10 +1982,8 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 	goto out_unlock;
 
 out_free:
-	if (new_asoc) {
-		sctp_unhash_established(asoc);
+	if (new_asoc)
 		sctp_association_free(asoc);
-	}
 out_unlock:
 	release_sock(sk);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 5/5] sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc
  2015-12-30 15:50       ` [PATCH net-next 4/5] sctp: drop the old assoc hashtable of sctp Xin Long
@ 2015-12-30 15:50         ` Xin Long
  0 siblings, 0 replies; 40+ messages in thread
From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem

sctp_endpoint_lookup_assoc is called in the protection of sock lock
there is no need to call local_bh_disable in this function. so remove
them.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/endpointola.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 8838bf4..52838ea 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -317,7 +317,7 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
  * We lookup the transport from hashtable at first, then get association
  * through t->assoc.
  */
-static struct sctp_association *__sctp_endpoint_lookup_assoc(
+struct sctp_association *sctp_endpoint_lookup_assoc(
 	const struct sctp_endpoint *ep,
 	const union sctp_addr *paddr,
 	struct sctp_transport **transport)
@@ -342,21 +342,6 @@ out:
 	return asoc;
 }
 
-/* Lookup association on an endpoint based on a peer address.  BH-safe.  */
-struct sctp_association *sctp_endpoint_lookup_assoc(
-	const struct sctp_endpoint *ep,
-	const union sctp_addr *paddr,
-	struct sctp_transport **transport)
-{
-	struct sctp_association *asoc;
-
-	local_bh_disable();
-	asoc = __sctp_endpoint_lookup_assoc(ep, paddr, transport);
-	local_bh_enable();
-
-	return asoc;
-}
-
 /* Look for any peeled off association from the endpoint that matches the
  * given peer address.
  */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long
  2015-12-30 15:50   ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long
@ 2015-12-30 16:57   ` Eric Dumazet
  2015-12-30 17:50     ` David Miller
  2015-12-30 17:41   ` Marcelo Ricardo Leitner
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2015-12-30 16:57 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, mleitner, vyasevic, daniel, davem

On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote:
> tranport hashtbale will replace the association hashtable to do the
> lookup for transport, and then get association by t->assoc, rhashtable
> apis will be used because of it's resizable, scalable and using rcu.
> 
> lport + rport + paddr will be the base hashkey to locate the chain,
> with net to protect one netns from another, then plus the laddr to
> compare to get the target.
> 
> this patch will provider the lookup functions:
> - sctp_epaddr_lookup_transport
> - sctp_addrs_lookup_transport
> 
> hash/unhash functions:
> - sctp_hash_transport
> - sctp_unhash_transport
> 
> init/destroy functions:
> - sctp_transport_hashtable_init
> - sctp_transport_hashtable_destroy
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---


I am against using rhashtable in SCTP (or TCP) at this stage, given the
number of bugs we have with it.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 15:50 [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Xin Long
  2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long
@ 2015-12-30 17:19 ` Eric Dumazet
  2015-12-30 17:32   ` Marcelo Ricardo Leitner
  2015-12-30 17:52   ` David Miller
  2016-01-04 22:30 ` David Miller
  2 siblings, 2 replies; 40+ messages in thread
From: Eric Dumazet @ 2015-12-30 17:19 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, mleitner, vyasevic, daniel, davem

On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote:

> besides, this patchset will use transport hashtable to replace
> association hashtable to lookup with rhashtable api. get transport
> first then get association by t->asoc. and also it will make tcp
> style work better.

SCTP already has a hash table, why not simply changing the way items are
hashed into it ?

Sure, storing thousands of sockets in a single hash bucket is not wise.

Switching SCTP to rhashtable at this moment is premature, it is still
moving fast.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet
@ 2015-12-30 17:32   ` Marcelo Ricardo Leitner
  2015-12-30 19:11     ` Eric Dumazet
  2015-12-30 17:52   ` David Miller
  1 sibling, 1 reply; 40+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-12-30 17:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Xin Long, network dev, linux-sctp, mleitner, vyasevic, daniel, davem

On Wed, Dec 30, 2015 at 12:19:39PM -0500, Eric Dumazet wrote:
> On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote:
> 
> > besides, this patchset will use transport hashtable to replace
> > association hashtable to lookup with rhashtable api. get transport
> > first then get association by t->asoc. and also it will make tcp
> > style work better.
> 
> SCTP already has a hash table, why not simply changing the way items are
> hashed into it ?

Because Vlad asked to split the patch so it gets easier to review. The
direct change was quite big.

> Sure, storing thousands of sockets in a single hash bucket is not wise.
> 
> Switching SCTP to rhashtable at this moment is premature, it is still
> moving fast.

Dave and Vlad had asked in the first review for considering using
rhashtable (ok, Dave didn't mention it by name).  We did, and it seemed
nice beside 1 issue Xin found, regarding multiple rehashing, which I'll
highlight in a reply right away. 
Said all this, I know this was your second email already against this
usage, but I have to ask, sorry: still really against it?

Initial post was with subject:
[PATCH net] sctp: support global vtag assochash and per endpoint
s(d)port assochash table

Thanks,
Marcelo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long
  2015-12-30 15:50   ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long
  2015-12-30 16:57   ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Eric Dumazet
@ 2015-12-30 17:41   ` Marcelo Ricardo Leitner
  2016-01-05 10:10     ` Xin Long
  2016-01-05 18:38   ` Vlad Yasevich
  2016-01-11  9:30   ` Herbert Xu
  4 siblings, 1 reply; 40+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-12-30 17:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: network dev, linux-sctp, mleitner, vyasevic, daniel, davem,
	Herbert Xu, Xin Long

On Wed, Dec 30, 2015 at 11:50:46PM +0800, Xin Long wrote:
...
> +void sctp_hash_transport(struct sctp_transport *t)
> +{
> +	struct sctp_sockaddr_entry *addr;
> +	struct sctp_hash_cmp_arg arg;
> +
> +	addr = list_entry(t->asoc->base.bind_addr.address_list.next,
> +			  struct sctp_sockaddr_entry, list);
> +	arg.laddr = &addr->a;
> +	arg.paddr = &t->ipaddr;
> +	arg.net   = sock_net(t->asoc->base.sk);
> +
> +reinsert:
> +	if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg,
> +					 &t->node, sctp_hash_params) == -EBUSY)
> +		goto reinsert;
> +}

This is the nasty situation I mentioned in previous email. It seems that
a stress test can trigger a double rehash and cause an entry to not be
added.

This is in fact very near some bugs you caught on rhashtable in the past
few days/couple of weeks tops.

I'm actually against this loop as is. I may have not been clear with Xin
about not adding my signature to the patchset due to this.

Please take a look at Xin's emails on thread 'rhashtable: Prevent
spurious EBUSY errors on insertion' about this particular situation.
Cc'ing Herbert as he wanted to see the patches for that issue.

  Marcelo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2015-12-30 16:57   ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Eric Dumazet
@ 2015-12-30 17:50     ` David Miller
  2016-01-11  9:32       ` Herbert Xu
  0 siblings, 1 reply; 40+ messages in thread
From: David Miller @ 2015-12-30 17:50 UTC (permalink / raw)
  To: eric.dumazet; +Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 30 Dec 2015 11:57:31 -0500

> I am against using rhashtable in SCTP (or TCP) at this stage, given the
> number of bugs we have with it.

Come on Eric, we've largely dealt with all of these problems.  I haven't
seen a serious report in a while.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet
  2015-12-30 17:32   ` Marcelo Ricardo Leitner
@ 2015-12-30 17:52   ` David Miller
  2015-12-30 19:03     ` Eric Dumazet
  1 sibling, 1 reply; 40+ messages in thread
From: David Miller @ 2015-12-30 17:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 30 Dec 2015 12:19:39 -0500

> Switching SCTP to rhashtable at this moment is premature, it is
> still moving fast.

I completely, and totally, disagree.

rhashtable actually _needs_ a strong active user like one of the
protocol socket hashes.

It's a step backwards to keep rhashtable in the shadows by only
allowing certain subsystems to convert to it.  That's really
incredibly stupid if you ask me.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 17:52   ` David Miller
@ 2015-12-30 19:03     ` Eric Dumazet
  2015-12-30 20:40       ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2015-12-30 19:03 UTC (permalink / raw)
  To: David Miller; +Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel

On Wed, 2015-12-30 at 12:52 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 30 Dec 2015 12:19:39 -0500
> 
> > Switching SCTP to rhashtable at this moment is premature, it is
> > still moving fast.
> 
> I completely, and totally, disagree.
> 
> rhashtable actually _needs_ a strong active user like one of the
> protocol socket hashes.
> 
> It's a step backwards to keep rhashtable in the shadows by only
> allowing certain subsystems to convert to it.  That's really
> incredibly stupid if you ask me.

You sure can disagree with me, but calling my opinion 'incredily stupid'
is not wise.

Let me check how stable is rhashtable :

# git log --oneline v4.2.. lib/rhashtable.c
179ccc0a7364 rhashtable: Kill harmless RCU warning in rhashtable_walk_init
c6ff5268293e rhashtable: Fix walker list corruption
3a324606bbab rhashtable: Enforce minimum size on initial hash table
a90099d9fabd Revert "rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation"
d3716f18a7d8 rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation
3cf92222a39c rhashtable: Prevent spurious EBUSY errors on insertion
7def0f952ecc lib: fix data race in rhashtable_rehash_one


Seriously, I think we can wait one release before 'en masse'
conversions.

I understand we would love to do that, but what is the hurry for SCTP,
that needed rhashtable so desperately that it could not be done before
2016 ?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 17:32   ` Marcelo Ricardo Leitner
@ 2015-12-30 19:11     ` Eric Dumazet
  2015-12-30 20:44       ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2015-12-30 19:11 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Xin Long, network dev, linux-sctp, mleitner, vyasevic, daniel, davem

On Wed, 2015-12-30 at 15:32 -0200, Marcelo Ricardo Leitner wrote:
> On Wed, Dec 30, 2015 at 12:19:39PM -0500, Eric Dumazet wrote:
> > On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote:
> > 
> > > besides, this patchset will use transport hashtable to replace
> > > association hashtable to lookup with rhashtable api. get transport
> > > first then get association by t->asoc. and also it will make tcp
> > > style work better.
> > 
> > SCTP already has a hash table, why not simply changing the way items are
> > hashed into it ?
> 
> Because Vlad asked to split the patch so it gets easier to review. The
> direct change was quite big.
> 
> > Sure, storing thousands of sockets in a single hash bucket is not wise.
> > 
> > Switching SCTP to rhashtable at this moment is premature, it is still
> > moving fast.
> 
> Dave and Vlad had asked in the first review for considering using
> rhashtable (ok, Dave didn't mention it by name).  We did, and it seemed
> nice beside 1 issue Xin found, regarding multiple rehashing, which I'll
> highlight in a reply right away. 
> Said all this, I know this was your second email already against this
> usage, but I have to ask, sorry: still really against it?

Well, it seems that Dave is OK to fix all remaining bugs in rhashtable.

I was not aware that 'we' decided to force rhashtable all over the
places, because it looks so sexy and fun.

Let see how funny it will be then.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 19:03     ` Eric Dumazet
@ 2015-12-30 20:40       ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2015-12-30 20:40 UTC (permalink / raw)
  To: eric.dumazet; +Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 30 Dec 2015 14:03:43 -0500

> You sure can disagree with me, but calling my opinion 'incredily
> stupid' is not wise.

I think fundamentally giving facilities less rather than more coverage
is not a smart approach at all.

If the code is that bad that people are discouraged from even using
it, then it should be moved to drivers/staging or removed.  Otherwise
it must work and we must be able to make use of it.

> I understand we would love to do that, but what is the hurry for
> SCTP, that needed rhashtable so desperately that it could not be
> done before 2016 ?

There is no rush, but quite frankly finding people to do serious
work on SCTP is no easy task so I'm hesitant to "defer" someone's
work in this area... :-)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 19:11     ` Eric Dumazet
@ 2015-12-30 20:44       ` David Miller
  2015-12-30 21:57         ` Eric Dumazet
  0 siblings, 1 reply; 40+ messages in thread
From: David Miller @ 2015-12-30 20:44 UTC (permalink / raw)
  To: eric.dumazet
  Cc: marcelo.leitner, lucien.xin, netdev, linux-sctp, mleitner,
	vyasevic, daniel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 30 Dec 2015 14:11:20 -0500

> Let see how funny it will be then.

It is more fun than waiting longer for the more limited uses of it to
trigger problems.

I cannot be convinced that using it in more places in order to find
and fix more bugs is a bad thing.

I'm sorry if a lot of bug fixes in a short period of time concerns
you, but for me that's an even clearer sign that it needs help, and
exposing it to more use cases is one of the best forms of help it can
get.

It also tells me that the people actually working on those fixes, such
as Herbert Xu, are motivated and reliable when they are shown properly
formed bug reports.

I cannot think of a report Herbert and others did not resolve in a
timely manner.  They usually add test cases too.

And that matters more to me than anything else.  A subsystem can be
buggy as shit, but if someone is responsible about fixing the reported
bugs properly, then I have absolutely nothing to worry about.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 20:44       ` David Miller
@ 2015-12-30 21:57         ` Eric Dumazet
  2015-12-30 22:29           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2015-12-30 21:57 UTC (permalink / raw)
  To: David Miller
  Cc: marcelo.leitner, lucien.xin, netdev, linux-sctp, mleitner,
	vyasevic, daniel

On Wed, 2015-12-30 at 15:44 -0500, David Miller wrote:

> It is more fun than waiting longer for the more limited uses of it to
> trigger problems.
> 
> I cannot be convinced that using it in more places in order to find
> and fix more bugs is a bad thing.
> 
> I'm sorry if a lot of bug fixes in a short period of time concerns
> you, but for me that's an even clearer sign that it needs help, and
> exposing it to more use cases is one of the best forms of help it can
> get.
> 
> It also tells me that the people actually working on those fixes, such
> as Herbert Xu, are motivated and reliable when they are shown properly
> formed bug reports.
> 
> I cannot think of a report Herbert and others did not resolve in a
> timely manner.  They usually add test cases too.


I have no doubts we can fix bugs in upstream kernels in a few days (at
most).

The problem is when a customer is stuck using a distro, with a release
cycle of extra months after upstream fixes.

I had to deal with customers having issues with resolvers hitting the
netlink/rhashtable bugs, and I can tell you it was not pretty nor funny.

Seeing all these SCTP bugs being currently tracked/fixed (reports from
Dmitry Vyukov), I am concerned about having to backport fixes into old
kernels without proper rhashtable if now SCTP relies heavily on
rhashtable.

Hopefully nothing bad will happen.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 21:57         ` Eric Dumazet
@ 2015-12-30 22:29           ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 40+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-12-30 22:29 UTC (permalink / raw)
  To: Eric Dumazet, David Miller
  Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel

Em 30-12-2015 19:57, Eric Dumazet escreveu:
> On Wed, 2015-12-30 at 15:44 -0500, David Miller wrote:
>
>> It is more fun than waiting longer for the more limited uses of it to
>> trigger problems.
>>
>> I cannot be convinced that using it in more places in order to find
>> and fix more bugs is a bad thing.
>>
>> I'm sorry if a lot of bug fixes in a short period of time concerns
>> you, but for me that's an even clearer sign that it needs help, and
>> exposing it to more use cases is one of the best forms of help it can
>> get.
>>
>> It also tells me that the people actually working on those fixes, such
>> as Herbert Xu, are motivated and reliable when they are shown properly
>> formed bug reports.
>>
>> I cannot think of a report Herbert and others did not resolve in a
>> timely manner.  They usually add test cases too.
>
>
> I have no doubts we can fix bugs in upstream kernels in a few days (at
> most).
>
> The problem is when a customer is stuck using a distro, with a release
> cycle of extra months after upstream fixes.

If one takes extra months to have a fix delivered to a customer, they 
probably are also months late on security fixes as well, right? That 
would be pretty scary by itself already.

> I had to deal with customers having issues with resolvers hitting the
> netlink/rhashtable bugs, and I can tell you it was not pretty nor funny.
>
> Seeing all these SCTP bugs being currently tracked/fixed (reports from
> Dmitry Vyukov), I am concerned about having to backport fixes into old
> kernels without proper rhashtable if now SCTP relies heavily on
> rhashtable.

This happens with every major change in the kernel. Try backporting 
vxlan fixes to an older kernel, for example, to one without ip_tunnel.

Can't say about the future, but so far none of those bugs were related 
to the hash that we want to replace and they were all small/contained 
patches.

And at least for now, we are not adding new stuff which relies on this 
new hash. It's on a central part of sctp, yes, but somewhat contained. 
Like what happened with vxlan/ip_tunnel, which ended up growing together.

> Hopefully nothing bad will happen.

+1 :)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable
  2015-12-30 15:50 [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Xin Long
  2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long
  2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet
@ 2016-01-04 22:30 ` David Miller
  2 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2016-01-04 22:30 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, mleitner, vyasevic, daniel

From: Xin Long <lucien.xin@gmail.com>
Date: Wed, 30 Dec 2015 23:50:45 +0800

> for telecom center, the usual case is that a server is connected by thousands
> of clients. but if the server with only one enpoint(udp style) use the same
> sport and dport to communicate with every clients, and every assoc in server
> will be hashed in the same chain of global assoc hashtable due to currently we
> choose dport and sport as the hash key.
> 
> when a packet is received, sctp_rcv try to find the assoc with sport and dport,
> since that chain is too long to find it fast, it make the performance turn to
> very low, some test data is as follow:
 ...
> we need to change the way to calculate the hash key, to use lport +
> rport + paddr as the hash key can avoid this issue.
> 
> besides, this patchset will use transport hashtable to replace
> association hashtable to lookup with rhashtable api. get transport
> first then get association by t->asoc. and also it will make tcp
> style work better.
 ...

I've applied this series, but...

If anything regresses and is not dealt with in a timely manner I
will revert this series.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2015-12-30 17:41   ` Marcelo Ricardo Leitner
@ 2016-01-05 10:10     ` Xin Long
  2016-01-11  9:22       ` Herbert Xu
  0 siblings, 1 reply; 40+ messages in thread
From: Xin Long @ 2016-01-05 10:10 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Eric Dumazet, network dev, linux-sctp, Marcelo Ricardo Leitner,
	Vlad Yasevich, daniel, davem, Herbert Xu

[-- Attachment #1: Type: text/plain, Size: 1733 bytes --]

On Thu, Dec 31, 2015 at 1:41 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Wed, Dec 30, 2015 at 11:50:46PM +0800, Xin Long wrote:
> ...
>> +void sctp_hash_transport(struct sctp_transport *t)
>> +{
>> +     struct sctp_sockaddr_entry *addr;
>> +     struct sctp_hash_cmp_arg arg;
>> +
>> +     addr = list_entry(t->asoc->base.bind_addr.address_list.next,
>> +                       struct sctp_sockaddr_entry, list);
>> +     arg.laddr = &addr->a;
>> +     arg.paddr = &t->ipaddr;
>> +     arg.net   = sock_net(t->asoc->base.sk);
>> +
>> +reinsert:
>> +     if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg,
>> +                                      &t->node, sctp_hash_params) == -EBUSY)
>> +             goto reinsert;
>> +}
>
> This is the nasty situation I mentioned in previous email. It seems that
> a stress test can trigger a double rehash and cause an entry to not be
> added.
>
> This is in fact very near some bugs you caught on rhashtable in the past
> few days/couple of weeks tops.
>
> I'm actually against this loop as is. I may have not been clear with Xin
> about not adding my signature to the patchset due to this.
>
> Please take a look at Xin's emails on thread 'rhashtable: Prevent
> spurious EBUSY errors on insertion' about this particular situation.
> Cc'ing Herbert as he wanted to see the patches for that issue.
>
>   Marcelo
>
without this 'reinsert'.

we can reproduce this issue like this:
1. download the attachment
2. cp sctperf.tar.gz to server and client hosts
3. in each hosts.
#make
4. in server:
#sh saddr.sh $ethx
#./ss
5. in client:
#sh caddr.sh $ethx
#ulimit -n 20000
#./cc

when the number of  connections reach about 1600, this issue will be triggered.

[-- Attachment #2: sctperf.tar.gz --]
[-- Type: application/x-gzip, Size: 1932 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long
                     ` (2 preceding siblings ...)
  2015-12-30 17:41   ` Marcelo Ricardo Leitner
@ 2016-01-05 18:38   ` Vlad Yasevich
  2016-01-06 17:01     ` Xin Long
  2016-01-11  9:30   ` Herbert Xu
  4 siblings, 1 reply; 40+ messages in thread
From: Vlad Yasevich @ 2016-01-05 18:38 UTC (permalink / raw)
  To: Xin Long, network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem

On 12/30/2015 10:50 AM, Xin Long wrote:
> tranport hashtbale will replace the association hashtable to do the
> lookup for transport, and then get association by t->assoc, rhashtable
> apis will be used because of it's resizable, scalable and using rcu.
> 
> lport + rport + paddr will be the base hashkey to locate the chain,
> with net to protect one netns from another, then plus the laddr to
> compare to get the target.
> 
> this patch will provider the lookup functions:
> - sctp_epaddr_lookup_transport
> - sctp_addrs_lookup_transport
> 
> hash/unhash functions:
> - sctp_hash_transport
> - sctp_unhash_transport
> 
> init/destroy functions:
> - sctp_transport_hashtable_init
> - sctp_transport_hashtable_destroy
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
>  include/net/sctp/sctp.h    |  11 ++++
>  include/net/sctp/structs.h |   5 ++
>  net/sctp/input.c           | 131 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 147 insertions(+)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index ce13cf2..7bbdfba 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
>  				 struct sctp_transport *t);
>  void sctp_backlog_migrate(struct sctp_association *assoc,
>  			  struct sock *oldsk, struct sock *newsk);
> +int sctp_transport_hashtable_init(void);
> +void sctp_transport_hashtable_destroy(void);
> +void sctp_hash_transport(struct sctp_transport *t);
> +void sctp_unhash_transport(struct sctp_transport *t);
> +struct sctp_transport *sctp_addrs_lookup_transport(
> +				struct net *net,
> +				const union sctp_addr *laddr,
> +				const union sctp_addr *paddr);
> +struct sctp_transport *sctp_epaddr_lookup_transport(
> +				const struct sctp_endpoint *ep,
> +				const union sctp_addr *paddr);
>  
>  /*
>   * sctp/proc.c
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index eea9bde..4ab87d0 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -48,6 +48,7 @@
>  #define __sctp_structs_h__
>  
>  #include <linux/ktime.h>
> +#include <linux/rhashtable.h>
>  #include <linux/socket.h>	/* linux/in.h needs this!!    */
>  #include <linux/in.h>		/* We get struct sockaddr_in. */
>  #include <linux/in6.h>		/* We get struct in6_addr     */
> @@ -123,6 +124,8 @@ extern struct sctp_globals {
>  	struct sctp_hashbucket *assoc_hashtable;
>  	/* This is the sctp port control hash.	*/
>  	struct sctp_bind_hashbucket *port_hashtable;
> +	/* This is the hash of all transports. */
> +	struct rhashtable transport_hashtable;
>  
>  	/* Sizes of above hashtables. */
>  	int ep_hashsize;
> @@ -147,6 +150,7 @@ extern struct sctp_globals {
>  #define sctp_assoc_hashtable		(sctp_globals.assoc_hashtable)
>  #define sctp_port_hashsize		(sctp_globals.port_hashsize)
>  #define sctp_port_hashtable		(sctp_globals.port_hashtable)
> +#define sctp_transport_hashtable	(sctp_globals.transport_hashtable)
>  #define sctp_checksum_disable		(sctp_globals.checksum_disable)
>  
>  /* SCTP Socket type: UDP or TCP style. */
> @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet)
>  struct sctp_transport {
>  	/* A list of transports. */
>  	struct list_head transports;
> +	struct rhash_head node;
>  
>  	/* Reference counting. */
>  	atomic_t refcnt;
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index b6493b3..bac8278 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -782,6 +782,137 @@ hit:
>  	return ep;
>  }
>  
> +/* rhashtable for transport */
> +struct sctp_hash_cmp_arg {
> +	const union sctp_addr		*laddr;
> +	const union sctp_addr		*paddr;
> +	const struct net		*net;
> +};
> +
> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
> +				const void *ptr)
> +{
> +	const struct sctp_hash_cmp_arg *x = arg->key;
> +	const struct sctp_transport *t = ptr;
> +	struct sctp_association *asoc = t->asoc;
> +	const struct net *net = x->net;
> +
> +	if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
> +		return 1;
> +	if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
> +		return 1;
> +	if (!net_eq(sock_net(asoc->base.sk), net))
> +		return 1;
> +	if (!sctp_bind_addr_match(&asoc->base.bind_addr,
> +				  x->laddr, sctp_sk(asoc->base.sk)))
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
> +{
> +	const struct sctp_transport *t = data;
> +	const union sctp_addr *paddr = &t->ipaddr;
> +	const struct net *net = sock_net(t->asoc->base.sk);
> +	u16 lport = htons(t->asoc->base.bind_addr.port);
> +	u32 addr;
> +
> +	if (paddr->sa.sa_family == AF_INET6)
> +		addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> +	else
> +		addr = paddr->v4.sin_addr.s_addr;
> +
> +	return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> +			     (__force __u32)lport, net_hash_mix(net), seed);
> +}
> +
> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
> +{
> +	const struct sctp_hash_cmp_arg *x = data;
> +	const union sctp_addr *paddr = x->paddr;
> +	const struct net *net = x->net;
> +	u16 lport = x->laddr->v4.sin_port;
> +	u32 addr;
> +
> +	if (paddr->sa.sa_family == AF_INET6)
> +		addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> +	else
> +		addr = paddr->v4.sin_addr.s_addr;
> +
> +	return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> +			     (__force __u32)lport, net_hash_mix(net), seed);
> +}
> +
> +static const struct rhashtable_params sctp_hash_params = {
> +	.head_offset		= offsetof(struct sctp_transport, node),
> +	.hashfn			= sctp_hash_key,
> +	.obj_hashfn		= sctp_hash_obj,
> +	.obj_cmpfn		= sctp_hash_cmp,
> +	.automatic_shrinking	= true,
> +};
> +
> +int sctp_transport_hashtable_init(void)
> +{
> +	return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params);
> +}
> +
> +void sctp_transport_hashtable_destroy(void)
> +{
> +	rhashtable_destroy(&sctp_transport_hashtable);
> +}
> +
> +void sctp_hash_transport(struct sctp_transport *t)
> +{
> +	struct sctp_sockaddr_entry *addr;
> +	struct sctp_hash_cmp_arg arg;
> +
> +	addr = list_entry(t->asoc->base.bind_addr.address_list.next,
> +			  struct sctp_sockaddr_entry, list);
> +	arg.laddr = &addr->a;
> +	arg.paddr = &t->ipaddr;
> +	arg.net   = sock_net(t->asoc->base.sk);
> +
> +reinsert:
> +	if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg,
> +					 &t->node, sctp_hash_params) == -EBUSY)
> +		goto reinsert;
> +}
> +
> +void sctp_unhash_transport(struct sctp_transport *t)
> +{
> +	rhashtable_remove_fast(&sctp_transport_hashtable, &t->node,
> +			       sctp_hash_params);
> +}
> +
> +struct sctp_transport *sctp_addrs_lookup_transport(
> +				struct net *net,
> +				const union sctp_addr *laddr,
> +				const union sctp_addr *paddr)
> +{
> +	struct sctp_hash_cmp_arg arg = {
> +		.laddr = laddr,
> +		.paddr = paddr,
> +		.net   = net,
> +	};
> +
> +	return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg,
> +				      sctp_hash_params);
> +}
> +
> +struct sctp_transport *sctp_epaddr_lookup_transport(
> +				const struct sctp_endpoint *ep,
> +				const union sctp_addr *paddr)
> +{
> +	struct sctp_sockaddr_entry *addr;
> +	struct net *net = sock_net(ep->base.sk);
> +
> +	addr = list_entry(ep->base.bind_addr.address_list.next,
> +			  struct sctp_sockaddr_entry, list);
> +
> +	return sctp_addrs_lookup_transport(net, &addr->a, paddr);
> +} 
> +

I don't think that this right, mainly because not all endpoint
addresses will be copied to association bind_addr list.   As a result,
you may actually have an association on this endpoint to a given
peer, but may not be using the first address from the endpoint..

What might work is to pick an endpoint address that would be usable within
the scope of the peer address.

-vlad

>  /* Insert association into the hash table.  */
>  static void __sctp_hash_established(struct sctp_association *asoc)
>  {
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path
  2015-12-30 15:50   ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long
  2015-12-30 15:50     ` [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs Xin Long
@ 2016-01-05 19:07     ` Vlad Yasevich
  2016-01-06 16:18       ` Xin Long
  2016-01-06 17:42       ` mleitner
  1 sibling, 2 replies; 40+ messages in thread
From: Vlad Yasevich @ 2016-01-05 19:07 UTC (permalink / raw)
  To: Xin Long, network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem

On 12/30/2015 10:50 AM, Xin Long wrote:
> apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
> and __sctp_lookup_association, it's invoked in the protection of sock
> lock, it will be safe, but sctp_lookup_association need to call
> rcu_read_lock() and to detect the t->dead to protect it.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
>  net/sctp/associola.c   |  5 +++++
>  net/sctp/endpointola.c | 35 ++++++++---------------------------
>  net/sctp/input.c       | 39 ++++++++++-----------------------------
>  net/sctp/protocol.c    |  6 ++++++
>  4 files changed, 29 insertions(+), 56 deletions(-)
> 
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 559afd0..2bf8ec9 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc)
>  	list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) {
>  		transport = list_entry(pos, struct sctp_transport, transports);
>  		list_del_rcu(pos);
> +		sctp_unhash_transport(transport);
>  		sctp_transport_free(transport);
>  	}
>  
> @@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
>  
>  	/* Remove this peer from the list. */
>  	list_del_rcu(&peer->transports);
> +	/* Remove this peer from the transport hashtable */
> +	sctp_unhash_transport(peer);
>  
>  	/* Get the first transport of asoc. */
>  	pos = asoc->peer.transport_addr_list.next;
> @@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
>  	/* Attach the remote transport to our asoc.  */
>  	list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list);
>  	asoc->peer.transport_count++;
> +	/* Add this peer into the transport hashtable */
> +	sctp_hash_transport(peer);

This is actually problematic.  The issue is that transports are unhashed when removed.
however, transport removal happens after the association has been declared dead and
should have been removed from the hash and marked unreachable.

As a result, with the code above, you can now find and return a dead association.
Checking for 'dead' state is racy.

The best solution I've come up with is to hash the transports in sctp_hash_established()
and clean-up in __sctp_unhash_established(), and then handle ADD-IP case separately.

The above would also remove the necessity to check for temporary associations, since they
should never be hashed.

-vlad

>  
>  	/* If we do not yet have a primary path, set one.  */
>  	if (!asoc->peer.primary_path) {
> diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
> index 9da76ba..8838bf4 100644
> --- a/net/sctp/endpointola.c
> +++ b/net/sctp/endpointola.c
> @@ -314,8 +314,8 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
>  }
>  
>  /* Find the association that goes with this chunk.
> - * We do a linear search of the associations for this endpoint.
> - * We return the matching transport address too.
> + * We lookup the transport from hashtable at first, then get association
> + * through t->assoc.
>   */
>  static struct sctp_association *__sctp_endpoint_lookup_assoc(
>  	const struct sctp_endpoint *ep,
> @@ -323,12 +323,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
>  	struct sctp_transport **transport)
>  {
>  	struct sctp_association *asoc = NULL;
> -	struct sctp_association *tmp;
> -	struct sctp_transport *t = NULL;
> -	struct sctp_hashbucket *head;
> -	struct sctp_ep_common *epb;
> -	int hash;
> -	int rport;
> +	struct sctp_transport *t;
>  
>  	*transport = NULL;
>  
> @@ -337,26 +332,12 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
>  	 */
>  	if (!ep->base.bind_addr.port)
>  		goto out;
> +	t = sctp_epaddr_lookup_transport(ep, paddr);
> +	if (!t || t->asoc->temp)
> +		goto out;
>  
> -	rport = ntohs(paddr->v4.sin_port);
> -
> -	hash = sctp_assoc_hashfn(sock_net(ep->base.sk), ep->base.bind_addr.port,
> -				 rport);
> -	head = &sctp_assoc_hashtable[hash];
> -	read_lock(&head->lock);
> -	sctp_for_each_hentry(epb, &head->chain) {
> -		tmp = sctp_assoc(epb);
> -		if (tmp->ep != ep || rport != tmp->peer.port)
> -			continue;
> -
> -		t = sctp_assoc_lookup_paddr(tmp, paddr);
> -		if (t) {
> -			asoc = tmp;
> -			*transport = t;
> -			break;
> -		}
> -	}
> -	read_unlock(&head->lock);
> +	*transport = t;
> +	asoc = t->asoc;
>  out:
>  	return asoc;
>  }
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index bac8278..6f075d8 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -981,38 +981,19 @@ static struct sctp_association *__sctp_lookup_association(
>  					const union sctp_addr *peer,
>  					struct sctp_transport **pt)
>  {
> -	struct sctp_hashbucket *head;
> -	struct sctp_ep_common *epb;
> -	struct sctp_association *asoc;
> -	struct sctp_transport *transport;
> -	int hash;
> +	struct sctp_transport *t;
>  
> -	/* Optimize here for direct hit, only listening connections can
> -	 * have wildcards anyways.
> -	 */
> -	hash = sctp_assoc_hashfn(net, ntohs(local->v4.sin_port),
> -				 ntohs(peer->v4.sin_port));
> -	head = &sctp_assoc_hashtable[hash];
> -	read_lock(&head->lock);
> -	sctp_for_each_hentry(epb, &head->chain) {
> -		asoc = sctp_assoc(epb);
> -		transport = sctp_assoc_is_match(asoc, net, local, peer);
> -		if (transport)
> -			goto hit;
> -	}
> +	t = sctp_addrs_lookup_transport(net, local, peer);
> +	if (!t || t->dead || t->asoc->temp)
> +		return NULL;
>  
> -	read_unlock(&head->lock);
> +	sctp_association_hold(t->asoc);
> +	*pt = t;
>  
> -	return NULL;
> -
> -hit:
> -	*pt = transport;
> -	sctp_association_hold(asoc);
> -	read_unlock(&head->lock);
> -	return asoc;
> +	return t->asoc;
>  }
>  
> -/* Look up an association. BH-safe. */
> +/* Look up an association. protected by RCU read lock */
>  static
>  struct sctp_association *sctp_lookup_association(struct net *net,
>  						 const union sctp_addr *laddr,
> @@ -1021,9 +1002,9 @@ struct sctp_association *sctp_lookup_association(struct net *net,
>  {
>  	struct sctp_association *asoc;
>  
> -	local_bh_disable();
> +	rcu_read_lock();
>  	asoc = __sctp_lookup_association(net, laddr, paddr, transportp);
> -	local_bh_enable();
> +	rcu_read_unlock();
>  
>  	return asoc;
>  }
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index 010aced..631cfb3 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -1467,6 +1467,9 @@ static __init int sctp_init(void)
>  		INIT_HLIST_HEAD(&sctp_port_hashtable[i].chain);
>  	}
>  
> +	if (sctp_transport_hashtable_init())
> +		goto err_thash_alloc;
> +
>  	pr_info("Hash tables configured (established %d bind %d)\n",
>  		sctp_assoc_hashsize, sctp_port_hashsize);
>  
> @@ -1521,6 +1524,8 @@ err_register_defaults:
>  		   get_order(sctp_port_hashsize *
>  			     sizeof(struct sctp_bind_hashbucket)));
>  err_bhash_alloc:
> +	sctp_transport_hashtable_destroy();
> +err_thash_alloc:
>  	kfree(sctp_ep_hashtable);
>  err_ehash_alloc:
>  	free_pages((unsigned long)sctp_assoc_hashtable,
> @@ -1567,6 +1572,7 @@ static __exit void sctp_exit(void)
>  	free_pages((unsigned long)sctp_port_hashtable,
>  		   get_order(sctp_port_hashsize *
>  			     sizeof(struct sctp_bind_hashbucket)));
> +	sctp_transport_hashtable_destroy();
>  
>  	percpu_counter_destroy(&sctp_sockets_allocated);
>  
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path
  2016-01-05 19:07     ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich
@ 2016-01-06 16:18       ` Xin Long
  2016-01-06 17:42       ` mleitner
  1 sibling, 0 replies; 40+ messages in thread
From: Xin Long @ 2016-01-06 16:18 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem

On Wed, Jan 6, 2016 at 3:07 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> On 12/30/2015 10:50 AM, Xin Long wrote:
>> apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
>> and __sctp_lookup_association, it's invoked in the protection of sock
>> lock, it will be safe, but sctp_lookup_association need to call
>> rcu_read_lock() and to detect the t->dead to protect it.
>>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> ---
>>  net/sctp/associola.c   |  5 +++++
>>  net/sctp/endpointola.c | 35 ++++++++---------------------------
>>  net/sctp/input.c       | 39 ++++++++++-----------------------------
>>  net/sctp/protocol.c    |  6 ++++++
>>  4 files changed, 29 insertions(+), 56 deletions(-)
>>
>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>> index 559afd0..2bf8ec9 100644
>> --- a/net/sctp/associola.c
>> +++ b/net/sctp/associola.c
>> @@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc)
>>       list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) {
>>               transport = list_entry(pos, struct sctp_transport, transports);
>>               list_del_rcu(pos);
>> +             sctp_unhash_transport(transport);
>>               sctp_transport_free(transport);
>>       }
>>
>> @@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
>>
>>       /* Remove this peer from the list. */
>>       list_del_rcu(&peer->transports);
>> +     /* Remove this peer from the transport hashtable */
>> +     sctp_unhash_transport(peer);
>>
>>       /* Get the first transport of asoc. */
>>       pos = asoc->peer.transport_addr_list.next;
>> @@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
>>       /* Attach the remote transport to our asoc.  */
>>       list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list);
>>       asoc->peer.transport_count++;
>> +     /* Add this peer into the transport hashtable */
>> +     sctp_hash_transport(peer);
>
> This is actually problematic.  The issue is that transports are unhashed when removed.
> however, transport removal happens after the association has been declared dead and
> should have been removed from the hash and marked unreachable.
>
> As a result, with the code above, you can now find and return a dead association.
> Checking for 'dead' state is racy.
>
> The best solution I've come up with is to hash the transports in sctp_hash_established()
> and clean-up in __sctp_unhash_established(), and then handle ADD-IP case separately.
>
> The above would also remove the necessity to check for temporary associations, since they
> should never be hashed.
>
> -vlad
>
yes, you're right, im thinking if we can unhash transport before the association
declares dead in sctp_association_free, like:

       list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) {
               transport = list_entry(pos, struct sctp_transport, transports);
               sctp_unhash_transport(transport);
       }
       asoc->base.dead = true;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-05 18:38   ` Vlad Yasevich
@ 2016-01-06 17:01     ` Xin Long
  2016-01-06 18:19       ` Marcelo Ricardo Leitner
  2016-01-07 20:28       ` Vlad Yasevich
  0 siblings, 2 replies; 40+ messages in thread
From: Xin Long @ 2016-01-06 17:01 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem

On Wed, Jan 6, 2016 at 2:38 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> On 12/30/2015 10:50 AM, Xin Long wrote:
>> tranport hashtbale will replace the association hashtable to do the
>> lookup for transport, and then get association by t->assoc, rhashtable
>> apis will be used because of it's resizable, scalable and using rcu.
>>
>> lport + rport + paddr will be the base hashkey to locate the chain,
>> with net to protect one netns from another, then plus the laddr to
>> compare to get the target.
>>
>> this patch will provider the lookup functions:
>> - sctp_epaddr_lookup_transport
>> - sctp_addrs_lookup_transport
>>
>> hash/unhash functions:
>> - sctp_hash_transport
>> - sctp_unhash_transport
>>
>> init/destroy functions:
>> - sctp_transport_hashtable_init
>> - sctp_transport_hashtable_destroy
>>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> ---
>>  include/net/sctp/sctp.h    |  11 ++++
>>  include/net/sctp/structs.h |   5 ++
>>  net/sctp/input.c           | 131 +++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 147 insertions(+)
>>
>> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
>> index ce13cf2..7bbdfba 100644
>> --- a/include/net/sctp/sctp.h
>> +++ b/include/net/sctp/sctp.h
>> @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
>>                                struct sctp_transport *t);
>>  void sctp_backlog_migrate(struct sctp_association *assoc,
>>                         struct sock *oldsk, struct sock *newsk);
>> +int sctp_transport_hashtable_init(void);
>> +void sctp_transport_hashtable_destroy(void);
>> +void sctp_hash_transport(struct sctp_transport *t);
>> +void sctp_unhash_transport(struct sctp_transport *t);
>> +struct sctp_transport *sctp_addrs_lookup_transport(
>> +                             struct net *net,
>> +                             const union sctp_addr *laddr,
>> +                             const union sctp_addr *paddr);
>> +struct sctp_transport *sctp_epaddr_lookup_transport(
>> +                             const struct sctp_endpoint *ep,
>> +                             const union sctp_addr *paddr);
>>
>>  /*
>>   * sctp/proc.c
>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>> index eea9bde..4ab87d0 100644
>> --- a/include/net/sctp/structs.h
>> +++ b/include/net/sctp/structs.h
>> @@ -48,6 +48,7 @@
>>  #define __sctp_structs_h__
>>
>>  #include <linux/ktime.h>
>> +#include <linux/rhashtable.h>
>>  #include <linux/socket.h>    /* linux/in.h needs this!!    */
>>  #include <linux/in.h>                /* We get struct sockaddr_in. */
>>  #include <linux/in6.h>               /* We get struct in6_addr     */
>> @@ -123,6 +124,8 @@ extern struct sctp_globals {
>>       struct sctp_hashbucket *assoc_hashtable;
>>       /* This is the sctp port control hash.  */
>>       struct sctp_bind_hashbucket *port_hashtable;
>> +     /* This is the hash of all transports. */
>> +     struct rhashtable transport_hashtable;
>>
>>       /* Sizes of above hashtables. */
>>       int ep_hashsize;
>> @@ -147,6 +150,7 @@ extern struct sctp_globals {
>>  #define sctp_assoc_hashtable         (sctp_globals.assoc_hashtable)
>>  #define sctp_port_hashsize           (sctp_globals.port_hashsize)
>>  #define sctp_port_hashtable          (sctp_globals.port_hashtable)
>> +#define sctp_transport_hashtable     (sctp_globals.transport_hashtable)
>>  #define sctp_checksum_disable                (sctp_globals.checksum_disable)
>>
>>  /* SCTP Socket type: UDP or TCP style. */
>> @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet)
>>  struct sctp_transport {
>>       /* A list of transports. */
>>       struct list_head transports;
>> +     struct rhash_head node;
>>
>>       /* Reference counting. */
>>       atomic_t refcnt;
>> diff --git a/net/sctp/input.c b/net/sctp/input.c
>> index b6493b3..bac8278 100644
>> --- a/net/sctp/input.c
>> +++ b/net/sctp/input.c
>> @@ -782,6 +782,137 @@ hit:
>>       return ep;
>>  }
>>
>> +/* rhashtable for transport */
>> +struct sctp_hash_cmp_arg {
>> +     const union sctp_addr           *laddr;
>> +     const union sctp_addr           *paddr;
>> +     const struct net                *net;
>> +};
>> +
>> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
>> +                             const void *ptr)
>> +{
>> +     const struct sctp_hash_cmp_arg *x = arg->key;
>> +     const struct sctp_transport *t = ptr;
>> +     struct sctp_association *asoc = t->asoc;
>> +     const struct net *net = x->net;
>> +
>> +     if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
>> +             return 1;
>> +     if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
>> +             return 1;
>> +     if (!net_eq(sock_net(asoc->base.sk), net))
>> +             return 1;
>> +     if (!sctp_bind_addr_match(&asoc->base.bind_addr,
>> +                               x->laddr, sctp_sk(asoc->base.sk)))
>> +             return 1;
>> +
>> +     return 0;
>> +}
>> +
>> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
>> +{
>> +     const struct sctp_transport *t = data;
>> +     const union sctp_addr *paddr = &t->ipaddr;
>> +     const struct net *net = sock_net(t->asoc->base.sk);
>> +     u16 lport = htons(t->asoc->base.bind_addr.port);
>> +     u32 addr;
>> +
>> +     if (paddr->sa.sa_family == AF_INET6)
>> +             addr = jhash(&paddr->v6.sin6_addr, 16, seed);
>> +     else
>> +             addr = paddr->v4.sin_addr.s_addr;
>> +
>> +     return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
>> +                          (__force __u32)lport, net_hash_mix(net), seed);
>> +}
>> +
>> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
>> +{
>> +     const struct sctp_hash_cmp_arg *x = data;
>> +     const union sctp_addr *paddr = x->paddr;
>> +     const struct net *net = x->net;
>> +     u16 lport = x->laddr->v4.sin_port;
>> +     u32 addr;
>> +
>> +     if (paddr->sa.sa_family == AF_INET6)
>> +             addr = jhash(&paddr->v6.sin6_addr, 16, seed);
>> +     else
>> +             addr = paddr->v4.sin_addr.s_addr;
>> +
>> +     return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
>> +                          (__force __u32)lport, net_hash_mix(net), seed);
>> +}
>> +
>> +static const struct rhashtable_params sctp_hash_params = {
>> +     .head_offset            = offsetof(struct sctp_transport, node),
>> +     .hashfn                 = sctp_hash_key,
>> +     .obj_hashfn             = sctp_hash_obj,
>> +     .obj_cmpfn              = sctp_hash_cmp,
>> +     .automatic_shrinking    = true,
>> +};
>> +
>> +int sctp_transport_hashtable_init(void)
>> +{
>> +     return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params);
>> +}
>> +
>> +void sctp_transport_hashtable_destroy(void)
>> +{
>> +     rhashtable_destroy(&sctp_transport_hashtable);
>> +}
>> +
>> +void sctp_hash_transport(struct sctp_transport *t)
>> +{
>> +     struct sctp_sockaddr_entry *addr;
>> +     struct sctp_hash_cmp_arg arg;
>> +
>> +     addr = list_entry(t->asoc->base.bind_addr.address_list.next,
>> +                       struct sctp_sockaddr_entry, list);
>> +     arg.laddr = &addr->a;
>> +     arg.paddr = &t->ipaddr;
>> +     arg.net   = sock_net(t->asoc->base.sk);
>> +
>> +reinsert:
>> +     if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg,
>> +                                      &t->node, sctp_hash_params) == -EBUSY)
>> +             goto reinsert;
>> +}
>> +
>> +void sctp_unhash_transport(struct sctp_transport *t)
>> +{
>> +     rhashtable_remove_fast(&sctp_transport_hashtable, &t->node,
>> +                            sctp_hash_params);
>> +}
>> +
>> +struct sctp_transport *sctp_addrs_lookup_transport(
>> +                             struct net *net,
>> +                             const union sctp_addr *laddr,
>> +                             const union sctp_addr *paddr)
>> +{
>> +     struct sctp_hash_cmp_arg arg = {
>> +             .laddr = laddr,
>> +             .paddr = paddr,
>> +             .net   = net,
>> +     };
>> +
>> +     return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg,
>> +                                   sctp_hash_params);
>> +}
>> +
>> +struct sctp_transport *sctp_epaddr_lookup_transport(
>> +                             const struct sctp_endpoint *ep,
>> +                             const union sctp_addr *paddr)
>> +{
>> +     struct sctp_sockaddr_entry *addr;
>> +     struct net *net = sock_net(ep->base.sk);
>> +
>> +     addr = list_entry(ep->base.bind_addr.address_list.next,
>> +                       struct sctp_sockaddr_entry, list);
>> +
>> +     return sctp_addrs_lookup_transport(net, &addr->a, paddr);
>> +}
>> +
>
> I don't think that this right, mainly because not all endpoint
> addresses will be copied to association bind_addr list.   As a result,
> you may actually have an association on this endpoint to a given
> peer, but may not be using the first address from the endpoint..
>
> What might work is to pick an endpoint address that would be usable within
> the scope of the peer address.
it's not that easy, does it make sense to you if I change

     if (!sctp_bind_addr_match(&asoc->base.bind_addr,
                               x->laddr, sctp_sk(asoc->base.sk)))

to
     if (!sctp_bind_addr_match(&asoc->ep->base.bind_addr,
                               x->laddr, sctp_sk(asoc->base.sk)))

in sctp_hash_cmp() ?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path
  2016-01-05 19:07     ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich
  2016-01-06 16:18       ` Xin Long
@ 2016-01-06 17:42       ` mleitner
  2016-01-11 15:00         ` Vlad Yasevich
  1 sibling, 1 reply; 40+ messages in thread
From: mleitner @ 2016-01-06 17:42 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Xin Long, network dev, linux-sctp, vyasevic, daniel, davem

On Tue, Jan 05, 2016 at 02:07:11PM -0500, Vlad Yasevich wrote:
> On 12/30/2015 10:50 AM, Xin Long wrote:
> > apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
> > and __sctp_lookup_association, it's invoked in the protection of sock
> > lock, it will be safe, but sctp_lookup_association need to call
> > rcu_read_lock() and to detect the t->dead to protect it.
> > 
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > ---
> >  net/sctp/associola.c   |  5 +++++
> >  net/sctp/endpointola.c | 35 ++++++++---------------------------
> >  net/sctp/input.c       | 39 ++++++++++-----------------------------
> >  net/sctp/protocol.c    |  6 ++++++
> >  4 files changed, 29 insertions(+), 56 deletions(-)
> > 
> > diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> > index 559afd0..2bf8ec9 100644
> > --- a/net/sctp/associola.c
> > +++ b/net/sctp/associola.c
> > @@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc)
> >  	list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) {
> >  		transport = list_entry(pos, struct sctp_transport, transports);
> >  		list_del_rcu(pos);
> > +		sctp_unhash_transport(transport);
> >  		sctp_transport_free(transport);
> >  	}
> >  
> > @@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
> >  
> >  	/* Remove this peer from the list. */
> >  	list_del_rcu(&peer->transports);
> > +	/* Remove this peer from the transport hashtable */
> > +	sctp_unhash_transport(peer);
> >  
> >  	/* Get the first transport of asoc. */
> >  	pos = asoc->peer.transport_addr_list.next;
> > @@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
> >  	/* Attach the remote transport to our asoc.  */
> >  	list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list);
> >  	asoc->peer.transport_count++;
> > +	/* Add this peer into the transport hashtable */
> > +	sctp_hash_transport(peer);
> 
> This is actually problematic.  The issue is that transports are unhashed when removed.
> however, transport removal happens after the association has been declared dead and
> should have been removed from the hash and marked unreachable.
> 
> As a result, with the code above, you can now find and return a dead association.
> Checking for 'dead' state is racy.

Mind elaborating why you think this is racy? (More below)

> The best solution I've come up with is to hash the transports in sctp_hash_established()
> and clean-up in __sctp_unhash_established(), and then handle ADD-IP case separately.

The idea was exactly the opposite :) to avoid such calls. All calls to
sctp_unhash_established() were followed by sctp_association_free(), and
vice-versa, indicating that it could (and probably should) be embedded
in sctp_association_free() itself.

No extra locking was taken other than to protect the hash table itself,
which now is different. The check against ->dead should be quite as
effective as prior implementation, as it's marked dead quite early in
sctp_association_free(). We could do it a bit earlier if necessary.

Please correct if I'm wrong, but AFAIU rhashtable, it's possible to
return an element that is being removed by another CPU, due to RCU
usage. If that's right, the early removal is not enough anymore and
only a check like the the ->dead one can protect it then. Hmmm we
probably should use something more atomic there then.

> The above would also remove the necessity to check for temporary associations, since they
> should never be hashed.

Agreed. We could add a simple
	if (t->asoc->temp)
		return;
to the new sctp_hash/unhash_transport to fix that.

  Marcelo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-06 17:01     ` Xin Long
@ 2016-01-06 18:19       ` Marcelo Ricardo Leitner
  2016-01-07 17:23         ` Marcelo Ricardo Leitner
  2016-01-07 20:28       ` Vlad Yasevich
  1 sibling, 1 reply; 40+ messages in thread
From: Marcelo Ricardo Leitner @ 2016-01-06 18:19 UTC (permalink / raw)
  To: Xin Long
  Cc: Vlad Yasevich, network dev, linux-sctp, Vlad Yasevich, daniel, davem

On Thu, Jan 07, 2016 at 01:01:11AM +0800, Xin Long wrote:
> On Wed, Jan 6, 2016 at 2:38 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> > On 12/30/2015 10:50 AM, Xin Long wrote:
> >> tranport hashtbale will replace the association hashtable to do the
> >> lookup for transport, and then get association by t->assoc, rhashtable
> >> apis will be used because of it's resizable, scalable and using rcu.
> >>
> >> lport + rport + paddr will be the base hashkey to locate the chain,
> >> with net to protect one netns from another, then plus the laddr to
> >> compare to get the target.
> >>
> >> this patch will provider the lookup functions:
> >> - sctp_epaddr_lookup_transport
> >> - sctp_addrs_lookup_transport
> >>
> >> hash/unhash functions:
> >> - sctp_hash_transport
> >> - sctp_unhash_transport
> >>
> >> init/destroy functions:
> >> - sctp_transport_hashtable_init
> >> - sctp_transport_hashtable_destroy
> >>
> >> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> >> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> >> ---
> >>  include/net/sctp/sctp.h    |  11 ++++
> >>  include/net/sctp/structs.h |   5 ++
> >>  net/sctp/input.c           | 131 +++++++++++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 147 insertions(+)
> >>
> >> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> >> index ce13cf2..7bbdfba 100644
> >> --- a/include/net/sctp/sctp.h
> >> +++ b/include/net/sctp/sctp.h
> >> @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
> >>                                struct sctp_transport *t);
> >>  void sctp_backlog_migrate(struct sctp_association *assoc,
> >>                         struct sock *oldsk, struct sock *newsk);
> >> +int sctp_transport_hashtable_init(void);
> >> +void sctp_transport_hashtable_destroy(void);
> >> +void sctp_hash_transport(struct sctp_transport *t);
> >> +void sctp_unhash_transport(struct sctp_transport *t);
> >> +struct sctp_transport *sctp_addrs_lookup_transport(
> >> +                             struct net *net,
> >> +                             const union sctp_addr *laddr,
> >> +                             const union sctp_addr *paddr);
> >> +struct sctp_transport *sctp_epaddr_lookup_transport(
> >> +                             const struct sctp_endpoint *ep,
> >> +                             const union sctp_addr *paddr);
> >>
> >>  /*
> >>   * sctp/proc.c
> >> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> >> index eea9bde..4ab87d0 100644
> >> --- a/include/net/sctp/structs.h
> >> +++ b/include/net/sctp/structs.h
> >> @@ -48,6 +48,7 @@
> >>  #define __sctp_structs_h__
> >>
> >>  #include <linux/ktime.h>
> >> +#include <linux/rhashtable.h>
> >>  #include <linux/socket.h>    /* linux/in.h needs this!!    */
> >>  #include <linux/in.h>                /* We get struct sockaddr_in. */
> >>  #include <linux/in6.h>               /* We get struct in6_addr     */
> >> @@ -123,6 +124,8 @@ extern struct sctp_globals {
> >>       struct sctp_hashbucket *assoc_hashtable;
> >>       /* This is the sctp port control hash.  */
> >>       struct sctp_bind_hashbucket *port_hashtable;
> >> +     /* This is the hash of all transports. */
> >> +     struct rhashtable transport_hashtable;
> >>
> >>       /* Sizes of above hashtables. */
> >>       int ep_hashsize;
> >> @@ -147,6 +150,7 @@ extern struct sctp_globals {
> >>  #define sctp_assoc_hashtable         (sctp_globals.assoc_hashtable)
> >>  #define sctp_port_hashsize           (sctp_globals.port_hashsize)
> >>  #define sctp_port_hashtable          (sctp_globals.port_hashtable)
> >> +#define sctp_transport_hashtable     (sctp_globals.transport_hashtable)
> >>  #define sctp_checksum_disable                (sctp_globals.checksum_disable)
> >>
> >>  /* SCTP Socket type: UDP or TCP style. */
> >> @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet)
> >>  struct sctp_transport {
> >>       /* A list of transports. */
> >>       struct list_head transports;
> >> +     struct rhash_head node;
> >>
> >>       /* Reference counting. */
> >>       atomic_t refcnt;
> >> diff --git a/net/sctp/input.c b/net/sctp/input.c
> >> index b6493b3..bac8278 100644
> >> --- a/net/sctp/input.c
> >> +++ b/net/sctp/input.c
> >> @@ -782,6 +782,137 @@ hit:
> >>       return ep;
> >>  }
> >>
> >> +/* rhashtable for transport */
> >> +struct sctp_hash_cmp_arg {
> >> +     const union sctp_addr           *laddr;
> >> +     const union sctp_addr           *paddr;
> >> +     const struct net                *net;
> >> +};
> >> +
> >> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
> >> +                             const void *ptr)
> >> +{
> >> +     const struct sctp_hash_cmp_arg *x = arg->key;
> >> +     const struct sctp_transport *t = ptr;
> >> +     struct sctp_association *asoc = t->asoc;
> >> +     const struct net *net = x->net;
> >> +
> >> +     if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
> >> +             return 1;
> >> +     if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
> >> +             return 1;
> >> +     if (!net_eq(sock_net(asoc->base.sk), net))
> >> +             return 1;
> >> +     if (!sctp_bind_addr_match(&asoc->base.bind_addr,
> >> +                               x->laddr, sctp_sk(asoc->base.sk)))
> >> +             return 1;
> >> +
> >> +     return 0;
> >> +}
> >> +
> >> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
> >> +{
> >> +     const struct sctp_transport *t = data;
> >> +     const union sctp_addr *paddr = &t->ipaddr;
> >> +     const struct net *net = sock_net(t->asoc->base.sk);
> >> +     u16 lport = htons(t->asoc->base.bind_addr.port);
> >> +     u32 addr;
> >> +
> >> +     if (paddr->sa.sa_family == AF_INET6)
> >> +             addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> >> +     else
> >> +             addr = paddr->v4.sin_addr.s_addr;
> >> +
> >> +     return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> >> +                          (__force __u32)lport, net_hash_mix(net), seed);
> >> +}
> >> +
> >> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
> >> +{
> >> +     const struct sctp_hash_cmp_arg *x = data;
> >> +     const union sctp_addr *paddr = x->paddr;
> >> +     const struct net *net = x->net;
> >> +     u16 lport = x->laddr->v4.sin_port;
> >> +     u32 addr;
> >> +
> >> +     if (paddr->sa.sa_family == AF_INET6)
> >> +             addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> >> +     else
> >> +             addr = paddr->v4.sin_addr.s_addr;
> >> +
> >> +     return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> >> +                          (__force __u32)lport, net_hash_mix(net), seed);
> >> +}
> >> +
> >> +static const struct rhashtable_params sctp_hash_params = {
> >> +     .head_offset            = offsetof(struct sctp_transport, node),
> >> +     .hashfn                 = sctp_hash_key,
> >> +     .obj_hashfn             = sctp_hash_obj,
> >> +     .obj_cmpfn              = sctp_hash_cmp,
> >> +     .automatic_shrinking    = true,
> >> +};
> >> +
> >> +int sctp_transport_hashtable_init(void)
> >> +{
> >> +     return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params);
> >> +}
> >> +
> >> +void sctp_transport_hashtable_destroy(void)
> >> +{
> >> +     rhashtable_destroy(&sctp_transport_hashtable);
> >> +}
> >> +
> >> +void sctp_hash_transport(struct sctp_transport *t)
> >> +{
> >> +     struct sctp_sockaddr_entry *addr;
> >> +     struct sctp_hash_cmp_arg arg;
> >> +
> >> +     addr = list_entry(t->asoc->base.bind_addr.address_list.next,
> >> +                       struct sctp_sockaddr_entry, list);
> >> +     arg.laddr = &addr->a;
> >> +     arg.paddr = &t->ipaddr;
> >> +     arg.net   = sock_net(t->asoc->base.sk);
> >> +
> >> +reinsert:
> >> +     if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg,
> >> +                                      &t->node, sctp_hash_params) == -EBUSY)
> >> +             goto reinsert;
> >> +}
> >> +
> >> +void sctp_unhash_transport(struct sctp_transport *t)
> >> +{
> >> +     rhashtable_remove_fast(&sctp_transport_hashtable, &t->node,
> >> +                            sctp_hash_params);
> >> +}
> >> +
> >> +struct sctp_transport *sctp_addrs_lookup_transport(
> >> +                             struct net *net,
> >> +                             const union sctp_addr *laddr,
> >> +                             const union sctp_addr *paddr)
> >> +{
> >> +     struct sctp_hash_cmp_arg arg = {
> >> +             .laddr = laddr,
> >> +             .paddr = paddr,
> >> +             .net   = net,
> >> +     };
> >> +
> >> +     return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg,
> >> +                                   sctp_hash_params);
> >> +}
> >> +
> >> +struct sctp_transport *sctp_epaddr_lookup_transport(
> >> +                             const struct sctp_endpoint *ep,
> >> +                             const union sctp_addr *paddr)
> >> +{
> >> +     struct sctp_sockaddr_entry *addr;
> >> +     struct net *net = sock_net(ep->base.sk);
> >> +
> >> +     addr = list_entry(ep->base.bind_addr.address_list.next,
> >> +                       struct sctp_sockaddr_entry, list);
> >> +
> >> +     return sctp_addrs_lookup_transport(net, &addr->a, paddr);
> >> +}
> >> +
> >
> > I don't think that this right, mainly because not all endpoint
> > addresses will be copied to association bind_addr list.   As a result,
> > you may actually have an association on this endpoint to a given
> > peer, but may not be using the first address from the endpoint..

That shouldn't be a problem, as laddr is not considered in the hash
function and later on sctp_hash_cmp() will use sctp_bind_addr_match() as
below, which will traverse asoc->base.bind_addr in this case, meaning it
would still see all transport/asoc possibilities.

Or did I miss something?

  Marcelo

> > What might work is to pick an endpoint address that would be usable within
> > the scope of the peer address.
> it's not that easy, does it make sense to you if I change
> 
>      if (!sctp_bind_addr_match(&asoc->base.bind_addr,
>                                x->laddr, sctp_sk(asoc->base.sk)))
> 
> to
>      if (!sctp_bind_addr_match(&asoc->ep->base.bind_addr,
>                                x->laddr, sctp_sk(asoc->base.sk)))
> 
> in sctp_hash_cmp() ?
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-06 18:19       ` Marcelo Ricardo Leitner
@ 2016-01-07 17:23         ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 40+ messages in thread
From: Marcelo Ricardo Leitner @ 2016-01-07 17:23 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: Xin Long, network dev, linux-sctp, Vlad Yasevich, daniel, davem

On Wed, Jan 06, 2016 at 04:19:50PM -0200, Marcelo Ricardo Leitner wrote:
> On Thu, Jan 07, 2016 at 01:01:11AM +0800, Xin Long wrote:
> > On Wed, Jan 6, 2016 at 2:38 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
> > > On 12/30/2015 10:50 AM, Xin Long wrote:
> > >> tranport hashtbale will replace the association hashtable to do the
> > >> lookup for transport, and then get association by t->assoc, rhashtable
> > >> apis will be used because of it's resizable, scalable and using rcu.
> > >>
> > >> lport + rport + paddr will be the base hashkey to locate the chain,
> > >> with net to protect one netns from another, then plus the laddr to
> > >> compare to get the target.
> > >>
> > >> this patch will provider the lookup functions:
> > >> - sctp_epaddr_lookup_transport
> > >> - sctp_addrs_lookup_transport
> > >>
> > >> hash/unhash functions:
> > >> - sctp_hash_transport
> > >> - sctp_unhash_transport
> > >>
> > >> init/destroy functions:
> > >> - sctp_transport_hashtable_init
> > >> - sctp_transport_hashtable_destroy
> > >>
> > >> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > >> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > >> ---
> > >>  include/net/sctp/sctp.h    |  11 ++++
> > >>  include/net/sctp/structs.h |   5 ++
> > >>  net/sctp/input.c           | 131 +++++++++++++++++++++++++++++++++++++++++++++
> > >>  3 files changed, 147 insertions(+)
> > >>
> > >> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> > >> index ce13cf2..7bbdfba 100644
> > >> --- a/include/net/sctp/sctp.h
> > >> +++ b/include/net/sctp/sctp.h
> > >> @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
> > >>                                struct sctp_transport *t);
> > >>  void sctp_backlog_migrate(struct sctp_association *assoc,
> > >>                         struct sock *oldsk, struct sock *newsk);
> > >> +int sctp_transport_hashtable_init(void);
> > >> +void sctp_transport_hashtable_destroy(void);
> > >> +void sctp_hash_transport(struct sctp_transport *t);
> > >> +void sctp_unhash_transport(struct sctp_transport *t);
> > >> +struct sctp_transport *sctp_addrs_lookup_transport(
> > >> +                             struct net *net,
> > >> +                             const union sctp_addr *laddr,
> > >> +                             const union sctp_addr *paddr);
> > >> +struct sctp_transport *sctp_epaddr_lookup_transport(
> > >> +                             const struct sctp_endpoint *ep,
> > >> +                             const union sctp_addr *paddr);
> > >>
> > >>  /*
> > >>   * sctp/proc.c
> > >> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> > >> index eea9bde..4ab87d0 100644
> > >> --- a/include/net/sctp/structs.h
> > >> +++ b/include/net/sctp/structs.h
> > >> @@ -48,6 +48,7 @@
> > >>  #define __sctp_structs_h__
> > >>
> > >>  #include <linux/ktime.h>
> > >> +#include <linux/rhashtable.h>
> > >>  #include <linux/socket.h>    /* linux/in.h needs this!!    */
> > >>  #include <linux/in.h>                /* We get struct sockaddr_in. */
> > >>  #include <linux/in6.h>               /* We get struct in6_addr     */
> > >> @@ -123,6 +124,8 @@ extern struct sctp_globals {
> > >>       struct sctp_hashbucket *assoc_hashtable;
> > >>       /* This is the sctp port control hash.  */
> > >>       struct sctp_bind_hashbucket *port_hashtable;
> > >> +     /* This is the hash of all transports. */
> > >> +     struct rhashtable transport_hashtable;
> > >>
> > >>       /* Sizes of above hashtables. */
> > >>       int ep_hashsize;
> > >> @@ -147,6 +150,7 @@ extern struct sctp_globals {
> > >>  #define sctp_assoc_hashtable         (sctp_globals.assoc_hashtable)
> > >>  #define sctp_port_hashsize           (sctp_globals.port_hashsize)
> > >>  #define sctp_port_hashtable          (sctp_globals.port_hashtable)
> > >> +#define sctp_transport_hashtable     (sctp_globals.transport_hashtable)
> > >>  #define sctp_checksum_disable                (sctp_globals.checksum_disable)
> > >>
> > >>  /* SCTP Socket type: UDP or TCP style. */
> > >> @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet)
> > >>  struct sctp_transport {
> > >>       /* A list of transports. */
> > >>       struct list_head transports;
> > >> +     struct rhash_head node;
> > >>
> > >>       /* Reference counting. */
> > >>       atomic_t refcnt;
> > >> diff --git a/net/sctp/input.c b/net/sctp/input.c
> > >> index b6493b3..bac8278 100644
> > >> --- a/net/sctp/input.c
> > >> +++ b/net/sctp/input.c
> > >> @@ -782,6 +782,137 @@ hit:
> > >>       return ep;
> > >>  }
> > >>
> > >> +/* rhashtable for transport */
> > >> +struct sctp_hash_cmp_arg {
> > >> +     const union sctp_addr           *laddr;
> > >> +     const union sctp_addr           *paddr;
> > >> +     const struct net                *net;
> > >> +};
> > >> +
> > >> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
> > >> +                             const void *ptr)
> > >> +{
> > >> +     const struct sctp_hash_cmp_arg *x = arg->key;
> > >> +     const struct sctp_transport *t = ptr;
> > >> +     struct sctp_association *asoc = t->asoc;
> > >> +     const struct net *net = x->net;
> > >> +
> > >> +     if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
> > >> +             return 1;
> > >> +     if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
> > >> +             return 1;
> > >> +     if (!net_eq(sock_net(asoc->base.sk), net))
> > >> +             return 1;
> > >> +     if (!sctp_bind_addr_match(&asoc->base.bind_addr,
> > >> +                               x->laddr, sctp_sk(asoc->base.sk)))
> > >> +             return 1;
> > >> +
> > >> +     return 0;
> > >> +}
> > >> +
> > >> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
> > >> +{
> > >> +     const struct sctp_transport *t = data;
> > >> +     const union sctp_addr *paddr = &t->ipaddr;
> > >> +     const struct net *net = sock_net(t->asoc->base.sk);
> > >> +     u16 lport = htons(t->asoc->base.bind_addr.port);
> > >> +     u32 addr;
> > >> +
> > >> +     if (paddr->sa.sa_family == AF_INET6)
> > >> +             addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> > >> +     else
> > >> +             addr = paddr->v4.sin_addr.s_addr;
> > >> +
> > >> +     return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> > >> +                          (__force __u32)lport, net_hash_mix(net), seed);
> > >> +}
> > >> +
> > >> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
> > >> +{
> > >> +     const struct sctp_hash_cmp_arg *x = data;
> > >> +     const union sctp_addr *paddr = x->paddr;
> > >> +     const struct net *net = x->net;
> > >> +     u16 lport = x->laddr->v4.sin_port;
> > >> +     u32 addr;
> > >> +
> > >> +     if (paddr->sa.sa_family == AF_INET6)
> > >> +             addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> > >> +     else
> > >> +             addr = paddr->v4.sin_addr.s_addr;
> > >> +
> > >> +     return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> > >> +                          (__force __u32)lport, net_hash_mix(net), seed);
> > >> +}
> > >> +
> > >> +static const struct rhashtable_params sctp_hash_params = {
> > >> +     .head_offset            = offsetof(struct sctp_transport, node),
> > >> +     .hashfn                 = sctp_hash_key,
> > >> +     .obj_hashfn             = sctp_hash_obj,
> > >> +     .obj_cmpfn              = sctp_hash_cmp,
> > >> +     .automatic_shrinking    = true,
> > >> +};
> > >> +
> > >> +int sctp_transport_hashtable_init(void)
> > >> +{
> > >> +     return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params);
> > >> +}
> > >> +
> > >> +void sctp_transport_hashtable_destroy(void)
> > >> +{
> > >> +     rhashtable_destroy(&sctp_transport_hashtable);
> > >> +}
> > >> +
> > >> +void sctp_hash_transport(struct sctp_transport *t)
> > >> +{
> > >> +     struct sctp_sockaddr_entry *addr;
> > >> +     struct sctp_hash_cmp_arg arg;
> > >> +
> > >> +     addr = list_entry(t->asoc->base.bind_addr.address_list.next,
> > >> +                       struct sctp_sockaddr_entry, list);
> > >> +     arg.laddr = &addr->a;
> > >> +     arg.paddr = &t->ipaddr;
> > >> +     arg.net   = sock_net(t->asoc->base.sk);
> > >> +
> > >> +reinsert:
> > >> +     if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg,
> > >> +                                      &t->node, sctp_hash_params) == -EBUSY)
> > >> +             goto reinsert;
> > >> +}
> > >> +
> > >> +void sctp_unhash_transport(struct sctp_transport *t)
> > >> +{
> > >> +     rhashtable_remove_fast(&sctp_transport_hashtable, &t->node,
> > >> +                            sctp_hash_params);
> > >> +}
> > >> +
> > >> +struct sctp_transport *sctp_addrs_lookup_transport(
> > >> +                             struct net *net,
> > >> +                             const union sctp_addr *laddr,
> > >> +                             const union sctp_addr *paddr)
> > >> +{
> > >> +     struct sctp_hash_cmp_arg arg = {
> > >> +             .laddr = laddr,
> > >> +             .paddr = paddr,
> > >> +             .net   = net,
> > >> +     };
> > >> +
> > >> +     return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg,
> > >> +                                   sctp_hash_params);
> > >> +}
> > >> +
> > >> +struct sctp_transport *sctp_epaddr_lookup_transport(
> > >> +                             const struct sctp_endpoint *ep,
> > >> +                             const union sctp_addr *paddr)
> > >> +{
> > >> +     struct sctp_sockaddr_entry *addr;
> > >> +     struct net *net = sock_net(ep->base.sk);
> > >> +
> > >> +     addr = list_entry(ep->base.bind_addr.address_list.next,
> > >> +                       struct sctp_sockaddr_entry, list);
> > >> +
> > >> +     return sctp_addrs_lookup_transport(net, &addr->a, paddr);
> > >> +}
> > >> +
> > >
> > > I don't think that this right, mainly because not all endpoint
> > > addresses will be copied to association bind_addr list.   As a result,
> > > you may actually have an association on this endpoint to a given
> > > peer, but may not be using the first address from the endpoint..
> 
> That shouldn't be a problem, as laddr is not considered in the hash
> function and later on sctp_hash_cmp() will use sctp_bind_addr_match() as
> below, which will traverse asoc->base.bind_addr in this case, meaning it
> would still see all transport/asoc possibilities.
> 
> Or did I miss something?

I did.. Xin kindly explained the issue to me offline. Yup, this is an
issue, we will work on it. Thanks

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-06 17:01     ` Xin Long
  2016-01-06 18:19       ` Marcelo Ricardo Leitner
@ 2016-01-07 20:28       ` Vlad Yasevich
  1 sibling, 0 replies; 40+ messages in thread
From: Vlad Yasevich @ 2016-01-07 20:28 UTC (permalink / raw)
  To: Xin Long
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem

On 01/06/2016 12:01 PM, Xin Long wrote:
> On Wed, Jan 6, 2016 at 2:38 AM, Vlad Yasevich <vyasevich@gmail.com> wrote:
>> On 12/30/2015 10:50 AM, Xin Long wrote:
>>> tranport hashtbale will replace the association hashtable to do the
>>> lookup for transport, and then get association by t->assoc, rhashtable
>>> apis will be used because of it's resizable, scalable and using rcu.
>>>
>>> lport + rport + paddr will be the base hashkey to locate the chain,
>>> with net to protect one netns from another, then plus the laddr to
>>> compare to get the target.
>>>
>>> this patch will provider the lookup functions:
>>> - sctp_epaddr_lookup_transport
>>> - sctp_addrs_lookup_transport
>>>
>>> hash/unhash functions:
>>> - sctp_hash_transport
>>> - sctp_unhash_transport
>>>
>>> init/destroy functions:
>>> - sctp_transport_hashtable_init
>>> - sctp_transport_hashtable_destroy
>>>
>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> ---
>>>  include/net/sctp/sctp.h    |  11 ++++
>>>  include/net/sctp/structs.h |   5 ++
>>>  net/sctp/input.c           | 131 +++++++++++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 147 insertions(+)
>>>
>>> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
>>> index ce13cf2..7bbdfba 100644
>>> --- a/include/net/sctp/sctp.h
>>> +++ b/include/net/sctp/sctp.h
>>> @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
>>>                                struct sctp_transport *t);
>>>  void sctp_backlog_migrate(struct sctp_association *assoc,
>>>                         struct sock *oldsk, struct sock *newsk);
>>> +int sctp_transport_hashtable_init(void);
>>> +void sctp_transport_hashtable_destroy(void);
>>> +void sctp_hash_transport(struct sctp_transport *t);
>>> +void sctp_unhash_transport(struct sctp_transport *t);
>>> +struct sctp_transport *sctp_addrs_lookup_transport(
>>> +                             struct net *net,
>>> +                             const union sctp_addr *laddr,
>>> +                             const union sctp_addr *paddr);
>>> +struct sctp_transport *sctp_epaddr_lookup_transport(
>>> +                             const struct sctp_endpoint *ep,
>>> +                             const union sctp_addr *paddr);
>>>
>>>  /*
>>>   * sctp/proc.c
>>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>>> index eea9bde..4ab87d0 100644
>>> --- a/include/net/sctp/structs.h
>>> +++ b/include/net/sctp/structs.h
>>> @@ -48,6 +48,7 @@
>>>  #define __sctp_structs_h__
>>>
>>>  #include <linux/ktime.h>
>>> +#include <linux/rhashtable.h>
>>>  #include <linux/socket.h>    /* linux/in.h needs this!!    */
>>>  #include <linux/in.h>                /* We get struct sockaddr_in. */
>>>  #include <linux/in6.h>               /* We get struct in6_addr     */
>>> @@ -123,6 +124,8 @@ extern struct sctp_globals {
>>>       struct sctp_hashbucket *assoc_hashtable;
>>>       /* This is the sctp port control hash.  */
>>>       struct sctp_bind_hashbucket *port_hashtable;
>>> +     /* This is the hash of all transports. */
>>> +     struct rhashtable transport_hashtable;
>>>
>>>       /* Sizes of above hashtables. */
>>>       int ep_hashsize;
>>> @@ -147,6 +150,7 @@ extern struct sctp_globals {
>>>  #define sctp_assoc_hashtable         (sctp_globals.assoc_hashtable)
>>>  #define sctp_port_hashsize           (sctp_globals.port_hashsize)
>>>  #define sctp_port_hashtable          (sctp_globals.port_hashtable)
>>> +#define sctp_transport_hashtable     (sctp_globals.transport_hashtable)
>>>  #define sctp_checksum_disable                (sctp_globals.checksum_disable)
>>>
>>>  /* SCTP Socket type: UDP or TCP style. */
>>> @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet)
>>>  struct sctp_transport {
>>>       /* A list of transports. */
>>>       struct list_head transports;
>>> +     struct rhash_head node;
>>>
>>>       /* Reference counting. */
>>>       atomic_t refcnt;
>>> diff --git a/net/sctp/input.c b/net/sctp/input.c
>>> index b6493b3..bac8278 100644
>>> --- a/net/sctp/input.c
>>> +++ b/net/sctp/input.c
>>> @@ -782,6 +782,137 @@ hit:
>>>       return ep;
>>>  }
>>>
>>> +/* rhashtable for transport */
>>> +struct sctp_hash_cmp_arg {
>>> +     const union sctp_addr           *laddr;
>>> +     const union sctp_addr           *paddr;
>>> +     const struct net                *net;
>>> +};
>>> +
>>> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
>>> +                             const void *ptr)
>>> +{
>>> +     const struct sctp_hash_cmp_arg *x = arg->key;
>>> +     const struct sctp_transport *t = ptr;
>>> +     struct sctp_association *asoc = t->asoc;
>>> +     const struct net *net = x->net;
>>> +
>>> +     if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
>>> +             return 1;
>>> +     if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
>>> +             return 1;
>>> +     if (!net_eq(sock_net(asoc->base.sk), net))
>>> +             return 1;
>>> +     if (!sctp_bind_addr_match(&asoc->base.bind_addr,
>>> +                               x->laddr, sctp_sk(asoc->base.sk)))
>>> +             return 1;
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
>>> +{
>>> +     const struct sctp_transport *t = data;
>>> +     const union sctp_addr *paddr = &t->ipaddr;
>>> +     const struct net *net = sock_net(t->asoc->base.sk);
>>> +     u16 lport = htons(t->asoc->base.bind_addr.port);
>>> +     u32 addr;
>>> +
>>> +     if (paddr->sa.sa_family == AF_INET6)
>>> +             addr = jhash(&paddr->v6.sin6_addr, 16, seed);
>>> +     else
>>> +             addr = paddr->v4.sin_addr.s_addr;
>>> +
>>> +     return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
>>> +                          (__force __u32)lport, net_hash_mix(net), seed);
>>> +}
>>> +
>>> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
>>> +{
>>> +     const struct sctp_hash_cmp_arg *x = data;
>>> +     const union sctp_addr *paddr = x->paddr;
>>> +     const struct net *net = x->net;
>>> +     u16 lport = x->laddr->v4.sin_port;
>>> +     u32 addr;
>>> +
>>> +     if (paddr->sa.sa_family == AF_INET6)
>>> +             addr = jhash(&paddr->v6.sin6_addr, 16, seed);
>>> +     else
>>> +             addr = paddr->v4.sin_addr.s_addr;
>>> +
>>> +     return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
>>> +                          (__force __u32)lport, net_hash_mix(net), seed);
>>> +}
>>> +
>>> +static const struct rhashtable_params sctp_hash_params = {
>>> +     .head_offset            = offsetof(struct sctp_transport, node),
>>> +     .hashfn                 = sctp_hash_key,
>>> +     .obj_hashfn             = sctp_hash_obj,
>>> +     .obj_cmpfn              = sctp_hash_cmp,
>>> +     .automatic_shrinking    = true,
>>> +};
>>> +
>>> +int sctp_transport_hashtable_init(void)
>>> +{
>>> +     return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params);
>>> +}
>>> +
>>> +void sctp_transport_hashtable_destroy(void)
>>> +{
>>> +     rhashtable_destroy(&sctp_transport_hashtable);
>>> +}
>>> +
>>> +void sctp_hash_transport(struct sctp_transport *t)
>>> +{
>>> +     struct sctp_sockaddr_entry *addr;
>>> +     struct sctp_hash_cmp_arg arg;
>>> +
>>> +     addr = list_entry(t->asoc->base.bind_addr.address_list.next,
>>> +                       struct sctp_sockaddr_entry, list);
>>> +     arg.laddr = &addr->a;
>>> +     arg.paddr = &t->ipaddr;
>>> +     arg.net   = sock_net(t->asoc->base.sk);
>>> +
>>> +reinsert:
>>> +     if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg,
>>> +                                      &t->node, sctp_hash_params) == -EBUSY)
>>> +             goto reinsert;
>>> +}
>>> +
>>> +void sctp_unhash_transport(struct sctp_transport *t)
>>> +{
>>> +     rhashtable_remove_fast(&sctp_transport_hashtable, &t->node,
>>> +                            sctp_hash_params);
>>> +}
>>> +
>>> +struct sctp_transport *sctp_addrs_lookup_transport(
>>> +                             struct net *net,
>>> +                             const union sctp_addr *laddr,
>>> +                             const union sctp_addr *paddr)
>>> +{
>>> +     struct sctp_hash_cmp_arg arg = {
>>> +             .laddr = laddr,
>>> +             .paddr = paddr,
>>> +             .net   = net,
>>> +     };
>>> +
>>> +     return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg,
>>> +                                   sctp_hash_params);
>>> +}
>>> +
>>> +struct sctp_transport *sctp_epaddr_lookup_transport(
>>> +                             const struct sctp_endpoint *ep,
>>> +                             const union sctp_addr *paddr)
>>> +{
>>> +     struct sctp_sockaddr_entry *addr;
>>> +     struct net *net = sock_net(ep->base.sk);
>>> +
>>> +     addr = list_entry(ep->base.bind_addr.address_list.next,
>>> +                       struct sctp_sockaddr_entry, list);
>>> +
>>> +     return sctp_addrs_lookup_transport(net, &addr->a, paddr);
>>> +}
>>> +
>>
>> I don't think that this right, mainly because not all endpoint
>> addresses will be copied to association bind_addr list.   As a result,
>> you may actually have an association on this endpoint to a given
>> peer, but may not be using the first address from the endpoint..
>>
>> What might work is to pick an endpoint address that would be usable within
>> the scope of the peer address.
> it's not that easy, does it make sense to you if I change
> 
>      if (!sctp_bind_addr_match(&asoc->base.bind_addr,
>                                x->laddr, sctp_sk(asoc->base.sk)))
> 
> to
>      if (!sctp_bind_addr_match(&asoc->ep->base.bind_addr,
>                                x->laddr, sctp_sk(asoc->base.sk)))
> 
> in sctp_hash_cmp() ?
> 

No, the problem them becomes the accept/peel-off path.  The assoc->ep
linkage isn't protected in the lookup path and thus can race against
accept call moving the assoc from one endpoint to another.
This is why we've used the association list as it is under rcu protection.

We might be able to get away with making asoc->ep pointer rcu protected...
Need to think a bit more about it.

-vlad

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-05 10:10     ` Xin Long
@ 2016-01-11  9:22       ` Herbert Xu
  0 siblings, 0 replies; 40+ messages in thread
From: Herbert Xu @ 2016-01-11  9:22 UTC (permalink / raw)
  To: Xin Long
  Cc: Marcelo Ricardo Leitner, Eric Dumazet, network dev, linux-sctp,
	Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem

On Tue, Jan 05, 2016 at 06:10:15PM +0800, Xin Long wrote:
>
> without this 'reinsert'.
> 
> we can reproduce this issue like this:
> 1. download the attachment
> 2. cp sctperf.tar.gz to server and client hosts
> 3. in each hosts.
> #make
> 4. in server:
> #sh saddr.sh $ethx
> #./ss
> 5. in client:
> #sh caddr.sh $ethx
> #ulimit -n 20000
> #./cc
> 
> when the number of  connections reach about 1600, this issue will be triggered.

Just to confirm this is with the latest upstream kernel, right?
A git ID would be helpful.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long
                     ` (3 preceding siblings ...)
  2016-01-05 18:38   ` Vlad Yasevich
@ 2016-01-11  9:30   ` Herbert Xu
  2016-01-11 16:00     ` mleitner
  4 siblings, 1 reply; 40+ messages in thread
From: Herbert Xu @ 2016-01-11  9:30 UTC (permalink / raw)
  To: Xin Long; +Cc: netdev, linux-sctp, mleitner, vyasevic, daniel, davem

Xin Long <lucien.xin@gmail.com> wrote:
>
> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
> +                               const void *ptr)
> +{
> +       const struct sctp_hash_cmp_arg *x = arg->key;
> +       const struct sctp_transport *t = ptr;
> +       struct sctp_association *asoc = t->asoc;
> +       const struct net *net = x->net;
> +
> +       if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
> +               return 1;
> +       if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
> +               return 1;
> +       if (!net_eq(sock_net(asoc->base.sk), net))
> +               return 1;
> +       if (!sctp_bind_addr_match(&asoc->base.bind_addr,
> +                                 x->laddr, sctp_sk(asoc->base.sk)))
> +               return 1;
> +
> +       return 0;
> +}
> +
> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
> +{
> +       const struct sctp_transport *t = data;
> +       const union sctp_addr *paddr = &t->ipaddr;
> +       const struct net *net = sock_net(t->asoc->base.sk);
> +       u16 lport = htons(t->asoc->base.bind_addr.port);
> +       u32 addr;
> +
> +       if (paddr->sa.sa_family == AF_INET6)
> +               addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> +       else
> +               addr = paddr->v4.sin_addr.s_addr;
> +
> +       return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> +                            (__force __u32)lport, net_hash_mix(net), seed);
> +}
> +
> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
> +{
> +       const struct sctp_hash_cmp_arg *x = data;
> +       const union sctp_addr *paddr = x->paddr;
> +       const struct net *net = x->net;
> +       u16 lport = x->laddr->v4.sin_port;
> +       u32 addr;
> +
> +       if (paddr->sa.sa_family == AF_INET6)
> +               addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> +       else
> +               addr = paddr->v4.sin_addr.s_addr;
> +
> +       return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> +                            (__force __u32)lport, net_hash_mix(net), seed);
> +}

There's your problem.  You are allowing multiple objects to hash
to the same value.  This is unacceptable with rhashtable because
we use the hash chain length to determine if we're under attack
and need to rehash.  This is the reason why you would see EBUSY
during insertion.

So instead of inserting your objects as is, you should instead hash
a list object that then links to all the objects that has the same
hash key.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2015-12-30 17:50     ` David Miller
@ 2016-01-11  9:32       ` Herbert Xu
  2016-01-11 16:33         ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 40+ messages in thread
From: Herbert Xu @ 2016-01-11  9:32 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel

David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 30 Dec 2015 11:57:31 -0500
> 
>> I am against using rhashtable in SCTP (or TCP) at this stage, given the
>> number of bugs we have with it.
> 
> Come on Eric, we've largely dealt with all of these problems.  I haven't
> seen a serious report in a while.

Well there is still the outstanding issue with softirq insertion
potentially failing with ENOMEM if we fail to expand the hash
table using just kmalloc.

So if the target user does softirq insertions, I would wait until
the fix for that is ready.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path
  2016-01-06 17:42       ` mleitner
@ 2016-01-11 15:00         ` Vlad Yasevich
  0 siblings, 0 replies; 40+ messages in thread
From: Vlad Yasevich @ 2016-01-11 15:00 UTC (permalink / raw)
  To: mleitner; +Cc: Xin Long, network dev, linux-sctp, vyasevic, daniel, davem

On 01/06/2016 12:42 PM, mleitner@redhat.com wrote:
> On Tue, Jan 05, 2016 at 02:07:11PM -0500, Vlad Yasevich wrote:
>> On 12/30/2015 10:50 AM, Xin Long wrote:
>>> apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
>>> and __sctp_lookup_association, it's invoked in the protection of sock
>>> lock, it will be safe, but sctp_lookup_association need to call
>>> rcu_read_lock() and to detect the t->dead to protect it.
>>>
>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> ---
>>>  net/sctp/associola.c   |  5 +++++
>>>  net/sctp/endpointola.c | 35 ++++++++---------------------------
>>>  net/sctp/input.c       | 39 ++++++++++-----------------------------
>>>  net/sctp/protocol.c    |  6 ++++++
>>>  4 files changed, 29 insertions(+), 56 deletions(-)
>>>
>>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>>> index 559afd0..2bf8ec9 100644
>>> --- a/net/sctp/associola.c
>>> +++ b/net/sctp/associola.c
>>> @@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc)
>>>  	list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) {
>>>  		transport = list_entry(pos, struct sctp_transport, transports);
>>>  		list_del_rcu(pos);
>>> +		sctp_unhash_transport(transport);
>>>  		sctp_transport_free(transport);
>>>  	}
>>>  
>>> @@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
>>>  
>>>  	/* Remove this peer from the list. */
>>>  	list_del_rcu(&peer->transports);
>>> +	/* Remove this peer from the transport hashtable */
>>> +	sctp_unhash_transport(peer);
>>>  
>>>  	/* Get the first transport of asoc. */
>>>  	pos = asoc->peer.transport_addr_list.next;
>>> @@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
>>>  	/* Attach the remote transport to our asoc.  */
>>>  	list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list);
>>>  	asoc->peer.transport_count++;
>>> +	/* Add this peer into the transport hashtable */
>>> +	sctp_hash_transport(peer);
>>
>> This is actually problematic.  The issue is that transports are unhashed when removed.
>> however, transport removal happens after the association has been declared dead and
>> should have been removed from the hash and marked unreachable.
>>
>> As a result, with the code above, you can now find and return a dead association.
>> Checking for 'dead' state is racy.
> 
> Mind elaborating why you think this is racy? (More below)
> 
>> The best solution I've come up with is to hash the transports in sctp_hash_established()
>> and clean-up in __sctp_unhash_established(), and then handle ADD-IP case separately.
> 
> The idea was exactly the opposite :) to avoid such calls. All calls to
> sctp_unhash_established() were followed by sctp_association_free(), and
> vice-versa, indicating that it could (and probably should) be embedded
> in sctp_association_free() itself.

Oh, I understand the idea and the desire...

> 
> No extra locking was taken other than to protect the hash table itself,
> which now is different. The check against ->dead should be quite as
> effective as prior implementation, as it's marked dead quite early in
> sctp_association_free(). We could do it a bit earlier if necessary.
> 
> Please correct if I'm wrong, but AFAIU rhashtable, it's possible to
> return an element that is being removed by another CPU, due to RCU
> usage. If that's right, the early removal is not enough anymore and
> only a check like the the ->dead one can protect it then. Hmmm we
> probably should use something more atomic there then.

So, I've been looking at this problem trying to figure out what specifically
triggers this race.  It comes down to what seems like a small change in
in __sctp_lookup_association(), but it is significant.  The old code
took a ref on the found association while under a read lock for the hash bucket.
This guaranteed that the association still existed in the hash, if even though
it was in the process of being removed, we were guaranteed to have a proper ref
on it.  With the new code, the guarantee is gone.   It looks like we
illegally take a ref on the association without any requisite protections.  As
a result, the transport may have been unhashed and the association is going through
the destroy phase, when we bump the ref (technically from 0).

I think what Herbert recently suggested may help us here.  He suggested hashing a
list object and linking transports to it. I am thinking right now that f we take it
one step further and put a lock inside that list object that protects the list,
then we might be able to solve this race.

> 
>> The above would also remove the necessity to check for temporary associations, since they
>> should never be hashed.
> 
> Agreed. We could add a simple
> 	if (t->asoc->temp)
> 		return;
> to the new sctp_hash/unhash_transport to fix that.

Good idea.

-vlad

> 
>   Marcelo
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-11  9:30   ` Herbert Xu
@ 2016-01-11 16:00     ` mleitner
  2016-01-11 17:20       ` Vlad Yasevich
  0 siblings, 1 reply; 40+ messages in thread
From: mleitner @ 2016-01-11 16:00 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Xin Long, netdev, linux-sctp, vyasevic, daniel, davem

On Mon, Jan 11, 2016 at 05:30:12PM +0800, Herbert Xu wrote:
> Xin Long <lucien.xin@gmail.com> wrote:
> >
> > +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
> > +                               const void *ptr)
> > +{
> > +       const struct sctp_hash_cmp_arg *x = arg->key;
> > +       const struct sctp_transport *t = ptr;
> > +       struct sctp_association *asoc = t->asoc;
> > +       const struct net *net = x->net;
> > +
> > +       if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
> > +               return 1;
> > +       if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
> > +               return 1;
> > +       if (!net_eq(sock_net(asoc->base.sk), net))
> > +               return 1;
> > +       if (!sctp_bind_addr_match(&asoc->base.bind_addr,
> > +                                 x->laddr, sctp_sk(asoc->base.sk)))
> > +               return 1;
> > +
> > +       return 0;
> > +}
> > +
> > +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
> > +{
> > +       const struct sctp_transport *t = data;
> > +       const union sctp_addr *paddr = &t->ipaddr;
> > +       const struct net *net = sock_net(t->asoc->base.sk);
> > +       u16 lport = htons(t->asoc->base.bind_addr.port);
> > +       u32 addr;
> > +
> > +       if (paddr->sa.sa_family == AF_INET6)
> > +               addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> > +       else
> > +               addr = paddr->v4.sin_addr.s_addr;
> > +
> > +       return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> > +                            (__force __u32)lport, net_hash_mix(net), seed);
> > +}
> > +
> > +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
> > +{
> > +       const struct sctp_hash_cmp_arg *x = data;
> > +       const union sctp_addr *paddr = x->paddr;
> > +       const struct net *net = x->net;
> > +       u16 lport = x->laddr->v4.sin_port;
> > +       u32 addr;
> > +
> > +       if (paddr->sa.sa_family == AF_INET6)
> > +               addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> > +       else
> > +               addr = paddr->v4.sin_addr.s_addr;
> > +
> > +       return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> > +                            (__force __u32)lport, net_hash_mix(net), seed);
> > +}
> 
> There's your problem.  You are allowing multiple objects to hash
> to the same value.  This is unacceptable with rhashtable because
> we use the hash chain length to determine if we're under attack
> and need to rehash.  This is the reason why you would see EBUSY
> during insertion.

Cool. Then I guess we don't really have an issue here. The case that
fails is an artificial load test which is virtually impossible to be hit
in real world, or at least I really hope so. The test, as in Xin's
attachment, will load more than 1600 IP addresses in one host (2 vCPU
during the test) and attempt to start an assoc from each of those using
the very same (lport, daddr, dport)-tuple.

Doing so is just unreasonable. Note that net is also hashed, so
even if we consider it could be 1600 containers, it is fine.

> So instead of inserting your objects as is, you should instead hash
> a list object that then links to all the objects that has the same
> hash key.

Now that it is clarified, I'm thinking we should just get ride of that
loop on EBUSY. No real reason to have extra code only to support an
artificial test. Agree?

Thanks,
Marcelo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-11  9:32       ` Herbert Xu
@ 2016-01-11 16:33         ` Marcelo Ricardo Leitner
  2016-01-11 18:08           ` Vlad Yasevich
  0 siblings, 1 reply; 40+ messages in thread
From: Marcelo Ricardo Leitner @ 2016-01-11 16:33 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, eric.dumazet, lucien.xin, netdev, linux-sctp,
	mleitner, vyasevic, daniel

On Mon, Jan 11, 2016 at 05:32:10PM +0800, Herbert Xu wrote:
> David Miller <davem@davemloft.net> wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 30 Dec 2015 11:57:31 -0500
> > 
> >> I am against using rhashtable in SCTP (or TCP) at this stage, given the
> >> number of bugs we have with it.
> > 
> > Come on Eric, we've largely dealt with all of these problems.  I haven't
> > seen a serious report in a while.
> 
> Well there is still the outstanding issue with softirq insertion
> potentially failing with ENOMEM if we fail to expand the hash
> table using just kmalloc.
> 
> So if the target user does softirq insertions, I would wait until
> the fix for that is ready.

It does some, yes. If listening socket is not backlogged, there will be
N inserts at each new association, where N is the number of IP addresses
that the client is advertising.

This is done on the second stage of the SCTP handshake. Not easily
DoS-able as it requires receiving a packet from server and replying
based on it, plus N is limited by MTU.

AFAIK Xin's stress tests couldn't hit this situation of ENOMEM, btw.

Thanks,
Marcelo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-11 16:00     ` mleitner
@ 2016-01-11 17:20       ` Vlad Yasevich
  2016-01-11 18:09         ` mleitner
  2016-01-11 21:31         ` David Miller
  0 siblings, 2 replies; 40+ messages in thread
From: Vlad Yasevich @ 2016-01-11 17:20 UTC (permalink / raw)
  To: mleitner, Herbert Xu
  Cc: Xin Long, netdev, linux-sctp, vyasevic, daniel, davem

On 01/11/2016 11:00 AM, mleitner@redhat.com wrote:
> On Mon, Jan 11, 2016 at 05:30:12PM +0800, Herbert Xu wrote:
>> Xin Long <lucien.xin@gmail.com> wrote:
>>>
>>> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
>>> +                               const void *ptr)
>>> +{
>>> +       const struct sctp_hash_cmp_arg *x = arg->key;
>>> +       const struct sctp_transport *t = ptr;
>>> +       struct sctp_association *asoc = t->asoc;
>>> +       const struct net *net = x->net;
>>> +
>>> +       if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
>>> +               return 1;
>>> +       if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
>>> +               return 1;
>>> +       if (!net_eq(sock_net(asoc->base.sk), net))
>>> +               return 1;
>>> +       if (!sctp_bind_addr_match(&asoc->base.bind_addr,
>>> +                                 x->laddr, sctp_sk(asoc->base.sk)))
>>> +               return 1;
>>> +
>>> +       return 0;
>>> +}
>>> +
>>> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
>>> +{
>>> +       const struct sctp_transport *t = data;
>>> +       const union sctp_addr *paddr = &t->ipaddr;
>>> +       const struct net *net = sock_net(t->asoc->base.sk);
>>> +       u16 lport = htons(t->asoc->base.bind_addr.port);
>>> +       u32 addr;
>>> +
>>> +       if (paddr->sa.sa_family == AF_INET6)
>>> +               addr = jhash(&paddr->v6.sin6_addr, 16, seed);
>>> +       else
>>> +               addr = paddr->v4.sin_addr.s_addr;
>>> +
>>> +       return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
>>> +                            (__force __u32)lport, net_hash_mix(net), seed);
>>> +}
>>> +
>>> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
>>> +{
>>> +       const struct sctp_hash_cmp_arg *x = data;
>>> +       const union sctp_addr *paddr = x->paddr;
>>> +       const struct net *net = x->net;
>>> +       u16 lport = x->laddr->v4.sin_port;
>>> +       u32 addr;
>>> +
>>> +       if (paddr->sa.sa_family == AF_INET6)
>>> +               addr = jhash(&paddr->v6.sin6_addr, 16, seed);
>>> +       else
>>> +               addr = paddr->v4.sin_addr.s_addr;
>>> +
>>> +       return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
>>> +                            (__force __u32)lport, net_hash_mix(net), seed);
>>> +}
>>
>> There's your problem.  You are allowing multiple objects to hash
>> to the same value.  This is unacceptable with rhashtable because
>> we use the hash chain length to determine if we're under attack
>> and need to rehash.  This is the reason why you would see EBUSY
>> during insertion.
> 
> Cool. Then I guess we don't really have an issue here. The case that
> fails is an artificial load test which is virtually impossible to be hit
> in real world, or at least I really hope so. The test, as in Xin's
> attachment, will load more than 1600 IP addresses in one host (2 vCPU
> during the test) and attempt to start an assoc from each of those using
> the very same (lport, daddr, dport)-tuple.
> 
> Doing so is just unreasonable. Note that net is also hashed, so
> even if we consider it could be 1600 containers, it is fine.

I have a hard time excepting this argument.  Just because a given test
scenario may be unreasonable now, doesn't make so in the future.  If
there is a way to solve the problem, then it should be done.  Saying
this isn't really a problem isn't going to make it go away.

-vlad

> 
>> So instead of inserting your objects as is, you should instead hash
>> a list object that then links to all the objects that has the same
>> hash key.
> 
> Now that it is clarified, I'm thinking we should just get ride of that
> loop on EBUSY. No real reason to have extra code only to support an
> artificial test. Agree?
> 
> Thanks,
> Marcelo
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-11 16:33         ` Marcelo Ricardo Leitner
@ 2016-01-11 18:08           ` Vlad Yasevich
  2016-01-11 18:19             ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 40+ messages in thread
From: Vlad Yasevich @ 2016-01-11 18:08 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, Herbert Xu
  Cc: David Miller, eric.dumazet, lucien.xin, netdev, linux-sctp,
	mleitner, vyasevic, daniel

On 01/11/2016 11:33 AM, Marcelo Ricardo Leitner wrote:
> On Mon, Jan 11, 2016 at 05:32:10PM +0800, Herbert Xu wrote:
>> David Miller <davem@davemloft.net> wrote:
>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>> Date: Wed, 30 Dec 2015 11:57:31 -0500
>>>
>>>> I am against using rhashtable in SCTP (or TCP) at this stage, given the
>>>> number of bugs we have with it.
>>>
>>> Come on Eric, we've largely dealt with all of these problems.  I haven't
>>> seen a serious report in a while.
>>
>> Well there is still the outstanding issue with softirq insertion
>> potentially failing with ENOMEM if we fail to expand the hash
>> table using just kmalloc.
>>
>> So if the target user does softirq insertions, I would wait until
>> the fix for that is ready.
> 
> It does some, yes. If listening socket is not backlogged, there will be
> N inserts at each new association, where N is the number of IP addresses
> that the client is advertising.
> 
> This is done on the second stage of the SCTP handshake. Not easily
> DoS-able as it requires receiving a packet from server and replying
> based on it, plus N is limited by MTU.

How is N limited by MTU?  INIT and COOKIE chunks are allowed to exceed
mtu and will be IP fragmented.  So it seems that N is limited by 65535-overhead,
where overhead is all the headers plus the sctp cookie info.  That can
be quite a lot of addresses.

-vlad

> 
> AFAIK Xin's stress tests couldn't hit this situation of ENOMEM, btw.
> 
> Thanks,
> Marcelo
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-11 17:20       ` Vlad Yasevich
@ 2016-01-11 18:09         ` mleitner
  2016-01-11 21:35           ` David Miller
  2016-01-11 21:31         ` David Miller
  1 sibling, 1 reply; 40+ messages in thread
From: mleitner @ 2016-01-11 18:09 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: Herbert Xu, Xin Long, netdev, linux-sctp, vyasevic, daniel, davem

On Mon, Jan 11, 2016 at 12:20:17PM -0500, Vlad Yasevich wrote:
> On 01/11/2016 11:00 AM, mleitner@redhat.com wrote:
> > On Mon, Jan 11, 2016 at 05:30:12PM +0800, Herbert Xu wrote:
> >> Xin Long <lucien.xin@gmail.com> wrote:
> >>>
> >>> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
> >>> +                               const void *ptr)
> >>> +{
> >>> +       const struct sctp_hash_cmp_arg *x = arg->key;
> >>> +       const struct sctp_transport *t = ptr;
> >>> +       struct sctp_association *asoc = t->asoc;
> >>> +       const struct net *net = x->net;
> >>> +
> >>> +       if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
> >>> +               return 1;
> >>> +       if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr))
> >>> +               return 1;
> >>> +       if (!net_eq(sock_net(asoc->base.sk), net))
> >>> +               return 1;
> >>> +       if (!sctp_bind_addr_match(&asoc->base.bind_addr,
> >>> +                                 x->laddr, sctp_sk(asoc->base.sk)))
> >>> +               return 1;
> >>> +
> >>> +       return 0;
> >>> +}
> >>> +
> >>> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
> >>> +{
> >>> +       const struct sctp_transport *t = data;
> >>> +       const union sctp_addr *paddr = &t->ipaddr;
> >>> +       const struct net *net = sock_net(t->asoc->base.sk);
> >>> +       u16 lport = htons(t->asoc->base.bind_addr.port);
> >>> +       u32 addr;
> >>> +
> >>> +       if (paddr->sa.sa_family == AF_INET6)
> >>> +               addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> >>> +       else
> >>> +               addr = paddr->v4.sin_addr.s_addr;
> >>> +
> >>> +       return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> >>> +                            (__force __u32)lport, net_hash_mix(net), seed);
> >>> +}
> >>> +
> >>> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed)
> >>> +{
> >>> +       const struct sctp_hash_cmp_arg *x = data;
> >>> +       const union sctp_addr *paddr = x->paddr;
> >>> +       const struct net *net = x->net;
> >>> +       u16 lport = x->laddr->v4.sin_port;
> >>> +       u32 addr;
> >>> +
> >>> +       if (paddr->sa.sa_family == AF_INET6)
> >>> +               addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> >>> +       else
> >>> +               addr = paddr->v4.sin_addr.s_addr;
> >>> +
> >>> +       return  jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 |
> >>> +                            (__force __u32)lport, net_hash_mix(net), seed);
> >>> +}
> >>
> >> There's your problem.  You are allowing multiple objects to hash
> >> to the same value.  This is unacceptable with rhashtable because
> >> we use the hash chain length to determine if we're under attack
> >> and need to rehash.  This is the reason why you would see EBUSY
> >> during insertion.
> > 
> > Cool. Then I guess we don't really have an issue here. The case that
> > fails is an artificial load test which is virtually impossible to be hit
> > in real world, or at least I really hope so. The test, as in Xin's
> > attachment, will load more than 1600 IP addresses in one host (2 vCPU
> > during the test) and attempt to start an assoc from each of those using
> > the very same (lport, daddr, dport)-tuple.
> > 
> > Doing so is just unreasonable. Note that net is also hashed, so
> > even if we consider it could be 1600 containers, it is fine.
> 
> I have a hard time excepting this argument.  Just because a given test
> scenario may be unreasonable now, doesn't make so in the future.  If
> there is a way to solve the problem, then it should be done.  Saying
> this isn't really a problem isn't going to make it go away.

Heh, I understand..

There is still the other part of this thread to be worked on (re
->dead), maybe that will justify extra stuff in here but I really
wouldn't like to add extra structures and locks on this just to satisfy
an unreasonable scenario like this. This hash is very busy, the lean it
is, the better.

Maybe we could keep the loop as is for now, as a fail-safe, and add a
pr_warn_once() if it gets hit?

  Marcelo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-11 18:08           ` Vlad Yasevich
@ 2016-01-11 18:19             ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 40+ messages in thread
From: Marcelo Ricardo Leitner @ 2016-01-11 18:19 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: Herbert Xu, David Miller, eric.dumazet, lucien.xin, netdev,
	linux-sctp, mleitner, vyasevic, daniel

On Mon, Jan 11, 2016 at 01:08:56PM -0500, Vlad Yasevich wrote:
> On 01/11/2016 11:33 AM, Marcelo Ricardo Leitner wrote:
> > On Mon, Jan 11, 2016 at 05:32:10PM +0800, Herbert Xu wrote:
> >> David Miller <davem@davemloft.net> wrote:
> >>> From: Eric Dumazet <eric.dumazet@gmail.com>
> >>> Date: Wed, 30 Dec 2015 11:57:31 -0500
> >>>
> >>>> I am against using rhashtable in SCTP (or TCP) at this stage, given the
> >>>> number of bugs we have with it.
> >>>
> >>> Come on Eric, we've largely dealt with all of these problems.  I haven't
> >>> seen a serious report in a while.
> >>
> >> Well there is still the outstanding issue with softirq insertion
> >> potentially failing with ENOMEM if we fail to expand the hash
> >> table using just kmalloc.
> >>
> >> So if the target user does softirq insertions, I would wait until
> >> the fix for that is ready.
> > 
> > It does some, yes. If listening socket is not backlogged, there will be
> > N inserts at each new association, where N is the number of IP addresses
> > that the client is advertising.
> > 
> > This is done on the second stage of the SCTP handshake. Not easily
> > DoS-able as it requires receiving a packet from server and replying
> > based on it, plus N is limited by MTU.
> 
> How is N limited by MTU?  INIT and COOKIE chunks are allowed to exceed
> mtu and will be IP fragmented.  So it seems that N is limited by 65535-overhead,
> where overhead is all the headers plus the sctp cookie info.  That can
> be quite a lot of addresses.

Oups, you're right. One then can trigger quite some inserts with a
single new assoc attempt, yes.

  Marcelo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-11 17:20       ` Vlad Yasevich
  2016-01-11 18:09         ` mleitner
@ 2016-01-11 21:31         ` David Miller
  1 sibling, 0 replies; 40+ messages in thread
From: David Miller @ 2016-01-11 21:31 UTC (permalink / raw)
  To: vyasevich
  Cc: mleitner, herbert, lucien.xin, netdev, linux-sctp, vyasevic, daniel

From: Vlad Yasevich <vyasevich@gmail.com>
Date: Mon, 11 Jan 2016 12:20:17 -0500

> I have a hard time excepting this argument.  Just because a given test
> scenario may be unreasonable now, doesn't make so in the future.  If
> there is a way to solve the problem, then it should be done.  Saying
> this isn't really a problem isn't going to make it go away.

+1

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable
  2016-01-11 18:09         ` mleitner
@ 2016-01-11 21:35           ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2016-01-11 21:35 UTC (permalink / raw)
  To: mleitner
  Cc: vyasevich, herbert, lucien.xin, netdev, linux-sctp, vyasevic, daniel

From: mleitner@redhat.com
Date: Mon, 11 Jan 2016 16:09:27 -0200

> There is still the other part of this thread to be worked on (re
> ->dead), maybe that will justify extra stuff in here but I really
> wouldn't like to add extra structures and locks on this just to satisfy
> an unreasonable scenario like this. This hash is very busy, the lean it
> is, the better.

It is never "unreasonable" if it is necessary to fix a real bug, which
this is.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2016-01-11 21:35 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-30 15:50 [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Xin Long
2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long
2015-12-30 15:50   ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long
2015-12-30 15:50     ` [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs Xin Long
2015-12-30 15:50       ` [PATCH net-next 4/5] sctp: drop the old assoc hashtable of sctp Xin Long
2015-12-30 15:50         ` [PATCH net-next 5/5] sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc Xin Long
2016-01-05 19:07     ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich
2016-01-06 16:18       ` Xin Long
2016-01-06 17:42       ` mleitner
2016-01-11 15:00         ` Vlad Yasevich
2015-12-30 16:57   ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Eric Dumazet
2015-12-30 17:50     ` David Miller
2016-01-11  9:32       ` Herbert Xu
2016-01-11 16:33         ` Marcelo Ricardo Leitner
2016-01-11 18:08           ` Vlad Yasevich
2016-01-11 18:19             ` Marcelo Ricardo Leitner
2015-12-30 17:41   ` Marcelo Ricardo Leitner
2016-01-05 10:10     ` Xin Long
2016-01-11  9:22       ` Herbert Xu
2016-01-05 18:38   ` Vlad Yasevich
2016-01-06 17:01     ` Xin Long
2016-01-06 18:19       ` Marcelo Ricardo Leitner
2016-01-07 17:23         ` Marcelo Ricardo Leitner
2016-01-07 20:28       ` Vlad Yasevich
2016-01-11  9:30   ` Herbert Xu
2016-01-11 16:00     ` mleitner
2016-01-11 17:20       ` Vlad Yasevich
2016-01-11 18:09         ` mleitner
2016-01-11 21:35           ` David Miller
2016-01-11 21:31         ` David Miller
2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet
2015-12-30 17:32   ` Marcelo Ricardo Leitner
2015-12-30 19:11     ` Eric Dumazet
2015-12-30 20:44       ` David Miller
2015-12-30 21:57         ` Eric Dumazet
2015-12-30 22:29           ` Marcelo Ricardo Leitner
2015-12-30 17:52   ` David Miller
2015-12-30 19:03     ` Eric Dumazet
2015-12-30 20:40       ` David Miller
2016-01-04 22:30 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).