* [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable @ 2015-12-30 15:50 Xin Long 2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw) To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem for telecom center, the usual case is that a server is connected by thousands of clients. but if the server with only one enpoint(udp style) use the same sport and dport to communicate with every clients, and every assoc in server will be hashed in the same chain of global assoc hashtable due to currently we choose dport and sport as the hash key. when a packet is received, sctp_rcv try to find the assoc with sport and dport, since that chain is too long to find it fast, it make the performance turn to very low, some test data is as follow: in server: $./ss [start a udp style server there] in client: $./cc [start 2500 sockets to connect server with same port and different ip, and use one of them to send data to server] ===== test on net-next -- perf top server: 55.73% [kernel] [k] sctp_assoc_is_match 6.80% [kernel] [k] sctp_assoc_lookup_paddr 4.81% [kernel] [k] sctp_v4_cmp_addr 3.12% [kernel] [k] _raw_spin_unlock_irqrestore 1.94% [kernel] [k] sctp_cmp_addr_exact client: 46.01% [kernel] [k] sctp_endpoint_lookup_assoc 5.55% libc-2.17.so [.] __libc_calloc 5.39% libc-2.17.so [.] _int_free 3.92% libc-2.17.so [.] _int_malloc 3.23% [kernel] [k] __memset -- spent time time is 487s, send pkt is 10000000 we need to change the way to calculate the hash key, to use lport + rport + paddr as the hash key can avoid this issue. besides, this patchset will use transport hashtable to replace association hashtable to lookup with rhashtable api. get transport first then get association by t->asoc. and also it will make tcp style work better. ===== test with this patchset: -- perf top server: 15.98% [kernel] [k] _raw_spin_unlock_irqrestore 9.92% [kernel] [k] __pv_queued_spin_lock_slowpath 7.22% [kernel] [k] copy_user_generic_string 2.38% libpthread-2.17.so [.] __recvmsg_nocancel 1.88% [kernel] [k] sctp_recvmsg client: 11.90% [kernel] [k] sctp_hash_cmp 8.52% [kernel] [k] rht_deferred_worker 4.94% [kernel] [k] __pv_queued_spin_lock_slowpath 3.95% [kernel] [k] sctp_bind_addr_match 2.49% [kernel] [k] __memset -- spent time time is 22s, send pkt is 10000000 Xin Long (5): sctp: add the rhashtable apis for sctp global transport hashtable sctp: apply rhashtable api to send/recv path sctp: apply rhashtable api to sctp procfs sctp: drop the old assoc hashtable of sctp sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc include/net/sctp/sctp.h | 32 ++--- include/net/sctp/structs.h | 10 +- net/sctp/associola.c | 5 + net/sctp/endpointola.c | 52 ++------ net/sctp/input.c | 187 +++++++++++++++++---------- net/sctp/proc.c | 316 +++++++++++++++++++++++++-------------------- net/sctp/protocol.c | 36 ++---- net/sctp/sm_sideeffect.c | 2 - net/sctp/socket.c | 6 +- 9 files changed, 331 insertions(+), 315 deletions(-) -- 2.1.0 ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2015-12-30 15:50 [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Xin Long @ 2015-12-30 15:50 ` Xin Long 2015-12-30 15:50 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long ` (4 more replies) 2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet 2016-01-04 22:30 ` David Miller 2 siblings, 5 replies; 40+ messages in thread From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw) To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem tranport hashtbale will replace the association hashtable to do the lookup for transport, and then get association by t->assoc, rhashtable apis will be used because of it's resizable, scalable and using rcu. lport + rport + paddr will be the base hashkey to locate the chain, with net to protect one netns from another, then plus the laddr to compare to get the target. this patch will provider the lookup functions: - sctp_epaddr_lookup_transport - sctp_addrs_lookup_transport hash/unhash functions: - sctp_hash_transport - sctp_unhash_transport init/destroy functions: - sctp_transport_hashtable_init - sctp_transport_hashtable_destroy Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> --- include/net/sctp/sctp.h | 11 ++++ include/net/sctp/structs.h | 5 ++ net/sctp/input.c | 131 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 147 insertions(+) diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index ce13cf2..7bbdfba 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk, struct sctp_transport *t); void sctp_backlog_migrate(struct sctp_association *assoc, struct sock *oldsk, struct sock *newsk); +int sctp_transport_hashtable_init(void); +void sctp_transport_hashtable_destroy(void); +void sctp_hash_transport(struct sctp_transport *t); +void sctp_unhash_transport(struct sctp_transport *t); +struct sctp_transport *sctp_addrs_lookup_transport( + struct net *net, + const union sctp_addr *laddr, + const union sctp_addr *paddr); +struct sctp_transport *sctp_epaddr_lookup_transport( + const struct sctp_endpoint *ep, + const union sctp_addr *paddr); /* * sctp/proc.c diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index eea9bde..4ab87d0 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -48,6 +48,7 @@ #define __sctp_structs_h__ #include <linux/ktime.h> +#include <linux/rhashtable.h> #include <linux/socket.h> /* linux/in.h needs this!! */ #include <linux/in.h> /* We get struct sockaddr_in. */ #include <linux/in6.h> /* We get struct in6_addr */ @@ -123,6 +124,8 @@ extern struct sctp_globals { struct sctp_hashbucket *assoc_hashtable; /* This is the sctp port control hash. */ struct sctp_bind_hashbucket *port_hashtable; + /* This is the hash of all transports. */ + struct rhashtable transport_hashtable; /* Sizes of above hashtables. */ int ep_hashsize; @@ -147,6 +150,7 @@ extern struct sctp_globals { #define sctp_assoc_hashtable (sctp_globals.assoc_hashtable) #define sctp_port_hashsize (sctp_globals.port_hashsize) #define sctp_port_hashtable (sctp_globals.port_hashtable) +#define sctp_transport_hashtable (sctp_globals.transport_hashtable) #define sctp_checksum_disable (sctp_globals.checksum_disable) /* SCTP Socket type: UDP or TCP style. */ @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet) struct sctp_transport { /* A list of transports. */ struct list_head transports; + struct rhash_head node; /* Reference counting. */ atomic_t refcnt; diff --git a/net/sctp/input.c b/net/sctp/input.c index b6493b3..bac8278 100644 --- a/net/sctp/input.c +++ b/net/sctp/input.c @@ -782,6 +782,137 @@ hit: return ep; } +/* rhashtable for transport */ +struct sctp_hash_cmp_arg { + const union sctp_addr *laddr; + const union sctp_addr *paddr; + const struct net *net; +}; + +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, + const void *ptr) +{ + const struct sctp_hash_cmp_arg *x = arg->key; + const struct sctp_transport *t = ptr; + struct sctp_association *asoc = t->asoc; + const struct net *net = x->net; + + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) + return 1; + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) + return 1; + if (!net_eq(sock_net(asoc->base.sk), net)) + return 1; + if (!sctp_bind_addr_match(&asoc->base.bind_addr, + x->laddr, sctp_sk(asoc->base.sk))) + return 1; + + return 0; +} + +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) +{ + const struct sctp_transport *t = data; + const union sctp_addr *paddr = &t->ipaddr; + const struct net *net = sock_net(t->asoc->base.sk); + u16 lport = htons(t->asoc->base.bind_addr.port); + u32 addr; + + if (paddr->sa.sa_family == AF_INET6) + addr = jhash(&paddr->v6.sin6_addr, 16, seed); + else + addr = paddr->v4.sin_addr.s_addr; + + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | + (__force __u32)lport, net_hash_mix(net), seed); +} + +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) +{ + const struct sctp_hash_cmp_arg *x = data; + const union sctp_addr *paddr = x->paddr; + const struct net *net = x->net; + u16 lport = x->laddr->v4.sin_port; + u32 addr; + + if (paddr->sa.sa_family == AF_INET6) + addr = jhash(&paddr->v6.sin6_addr, 16, seed); + else + addr = paddr->v4.sin_addr.s_addr; + + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | + (__force __u32)lport, net_hash_mix(net), seed); +} + +static const struct rhashtable_params sctp_hash_params = { + .head_offset = offsetof(struct sctp_transport, node), + .hashfn = sctp_hash_key, + .obj_hashfn = sctp_hash_obj, + .obj_cmpfn = sctp_hash_cmp, + .automatic_shrinking = true, +}; + +int sctp_transport_hashtable_init(void) +{ + return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params); +} + +void sctp_transport_hashtable_destroy(void) +{ + rhashtable_destroy(&sctp_transport_hashtable); +} + +void sctp_hash_transport(struct sctp_transport *t) +{ + struct sctp_sockaddr_entry *addr; + struct sctp_hash_cmp_arg arg; + + addr = list_entry(t->asoc->base.bind_addr.address_list.next, + struct sctp_sockaddr_entry, list); + arg.laddr = &addr->a; + arg.paddr = &t->ipaddr; + arg.net = sock_net(t->asoc->base.sk); + +reinsert: + if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg, + &t->node, sctp_hash_params) == -EBUSY) + goto reinsert; +} + +void sctp_unhash_transport(struct sctp_transport *t) +{ + rhashtable_remove_fast(&sctp_transport_hashtable, &t->node, + sctp_hash_params); +} + +struct sctp_transport *sctp_addrs_lookup_transport( + struct net *net, + const union sctp_addr *laddr, + const union sctp_addr *paddr) +{ + struct sctp_hash_cmp_arg arg = { + .laddr = laddr, + .paddr = paddr, + .net = net, + }; + + return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg, + sctp_hash_params); +} + +struct sctp_transport *sctp_epaddr_lookup_transport( + const struct sctp_endpoint *ep, + const union sctp_addr *paddr) +{ + struct sctp_sockaddr_entry *addr; + struct net *net = sock_net(ep->base.sk); + + addr = list_entry(ep->base.bind_addr.address_list.next, + struct sctp_sockaddr_entry, list); + + return sctp_addrs_lookup_transport(net, &addr->a, paddr); +} + /* Insert association into the hash table. */ static void __sctp_hash_established(struct sctp_association *asoc) { -- 2.1.0 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path 2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long @ 2015-12-30 15:50 ` Xin Long 2015-12-30 15:50 ` [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs Xin Long 2016-01-05 19:07 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich 2015-12-30 16:57 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Eric Dumazet ` (3 subsequent siblings) 4 siblings, 2 replies; 40+ messages in thread From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw) To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc and __sctp_lookup_association, it's invoked in the protection of sock lock, it will be safe, but sctp_lookup_association need to call rcu_read_lock() and to detect the t->dead to protect it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> --- net/sctp/associola.c | 5 +++++ net/sctp/endpointola.c | 35 ++++++++--------------------------- net/sctp/input.c | 39 ++++++++++----------------------------- net/sctp/protocol.c | 6 ++++++ 4 files changed, 29 insertions(+), 56 deletions(-) diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 559afd0..2bf8ec9 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc) list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) { transport = list_entry(pos, struct sctp_transport, transports); list_del_rcu(pos); + sctp_unhash_transport(transport); sctp_transport_free(transport); } @@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc, /* Remove this peer from the list. */ list_del_rcu(&peer->transports); + /* Remove this peer from the transport hashtable */ + sctp_unhash_transport(peer); /* Get the first transport of asoc. */ pos = asoc->peer.transport_addr_list.next; @@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc, /* Attach the remote transport to our asoc. */ list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list); asoc->peer.transport_count++; + /* Add this peer into the transport hashtable */ + sctp_hash_transport(peer); /* If we do not yet have a primary path, set one. */ if (!asoc->peer.primary_path) { diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c index 9da76ba..8838bf4 100644 --- a/net/sctp/endpointola.c +++ b/net/sctp/endpointola.c @@ -314,8 +314,8 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep, } /* Find the association that goes with this chunk. - * We do a linear search of the associations for this endpoint. - * We return the matching transport address too. + * We lookup the transport from hashtable at first, then get association + * through t->assoc. */ static struct sctp_association *__sctp_endpoint_lookup_assoc( const struct sctp_endpoint *ep, @@ -323,12 +323,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc( struct sctp_transport **transport) { struct sctp_association *asoc = NULL; - struct sctp_association *tmp; - struct sctp_transport *t = NULL; - struct sctp_hashbucket *head; - struct sctp_ep_common *epb; - int hash; - int rport; + struct sctp_transport *t; *transport = NULL; @@ -337,26 +332,12 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc( */ if (!ep->base.bind_addr.port) goto out; + t = sctp_epaddr_lookup_transport(ep, paddr); + if (!t || t->asoc->temp) + goto out; - rport = ntohs(paddr->v4.sin_port); - - hash = sctp_assoc_hashfn(sock_net(ep->base.sk), ep->base.bind_addr.port, - rport); - head = &sctp_assoc_hashtable[hash]; - read_lock(&head->lock); - sctp_for_each_hentry(epb, &head->chain) { - tmp = sctp_assoc(epb); - if (tmp->ep != ep || rport != tmp->peer.port) - continue; - - t = sctp_assoc_lookup_paddr(tmp, paddr); - if (t) { - asoc = tmp; - *transport = t; - break; - } - } - read_unlock(&head->lock); + *transport = t; + asoc = t->asoc; out: return asoc; } diff --git a/net/sctp/input.c b/net/sctp/input.c index bac8278..6f075d8 100644 --- a/net/sctp/input.c +++ b/net/sctp/input.c @@ -981,38 +981,19 @@ static struct sctp_association *__sctp_lookup_association( const union sctp_addr *peer, struct sctp_transport **pt) { - struct sctp_hashbucket *head; - struct sctp_ep_common *epb; - struct sctp_association *asoc; - struct sctp_transport *transport; - int hash; + struct sctp_transport *t; - /* Optimize here for direct hit, only listening connections can - * have wildcards anyways. - */ - hash = sctp_assoc_hashfn(net, ntohs(local->v4.sin_port), - ntohs(peer->v4.sin_port)); - head = &sctp_assoc_hashtable[hash]; - read_lock(&head->lock); - sctp_for_each_hentry(epb, &head->chain) { - asoc = sctp_assoc(epb); - transport = sctp_assoc_is_match(asoc, net, local, peer); - if (transport) - goto hit; - } + t = sctp_addrs_lookup_transport(net, local, peer); + if (!t || t->dead || t->asoc->temp) + return NULL; - read_unlock(&head->lock); + sctp_association_hold(t->asoc); + *pt = t; - return NULL; - -hit: - *pt = transport; - sctp_association_hold(asoc); - read_unlock(&head->lock); - return asoc; + return t->asoc; } -/* Look up an association. BH-safe. */ +/* Look up an association. protected by RCU read lock */ static struct sctp_association *sctp_lookup_association(struct net *net, const union sctp_addr *laddr, @@ -1021,9 +1002,9 @@ struct sctp_association *sctp_lookup_association(struct net *net, { struct sctp_association *asoc; - local_bh_disable(); + rcu_read_lock(); asoc = __sctp_lookup_association(net, laddr, paddr, transportp); - local_bh_enable(); + rcu_read_unlock(); return asoc; } diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 010aced..631cfb3 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -1467,6 +1467,9 @@ static __init int sctp_init(void) INIT_HLIST_HEAD(&sctp_port_hashtable[i].chain); } + if (sctp_transport_hashtable_init()) + goto err_thash_alloc; + pr_info("Hash tables configured (established %d bind %d)\n", sctp_assoc_hashsize, sctp_port_hashsize); @@ -1521,6 +1524,8 @@ err_register_defaults: get_order(sctp_port_hashsize * sizeof(struct sctp_bind_hashbucket))); err_bhash_alloc: + sctp_transport_hashtable_destroy(); +err_thash_alloc: kfree(sctp_ep_hashtable); err_ehash_alloc: free_pages((unsigned long)sctp_assoc_hashtable, @@ -1567,6 +1572,7 @@ static __exit void sctp_exit(void) free_pages((unsigned long)sctp_port_hashtable, get_order(sctp_port_hashsize * sizeof(struct sctp_bind_hashbucket))); + sctp_transport_hashtable_destroy(); percpu_counter_destroy(&sctp_sockets_allocated); -- 2.1.0 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs 2015-12-30 15:50 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long @ 2015-12-30 15:50 ` Xin Long 2015-12-30 15:50 ` [PATCH net-next 4/5] sctp: drop the old assoc hashtable of sctp Xin Long 2016-01-05 19:07 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich 1 sibling, 1 reply; 40+ messages in thread From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw) To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem Traversal the transport rhashtable, get the association only once through the condition assoc->peer.primary_path != transport. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> --- net/sctp/proc.c | 316 +++++++++++++++++++++++++++++++------------------------- 1 file changed, 173 insertions(+), 143 deletions(-) diff --git a/net/sctp/proc.c b/net/sctp/proc.c index 0697eda..dfa7eec 100644 --- a/net/sctp/proc.c +++ b/net/sctp/proc.c @@ -281,88 +281,136 @@ void sctp_eps_proc_exit(struct net *net) remove_proc_entry("eps", net->sctp.proc_net_sctp); } +struct sctp_ht_iter { + struct seq_net_private p; + struct rhashtable_iter hti; +}; -static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos) +static struct sctp_transport *sctp_transport_get_next(struct seq_file *seq) { - if (*pos >= sctp_assoc_hashsize) - return NULL; + struct sctp_ht_iter *iter = seq->private; + struct sctp_transport *t; - if (*pos < 0) - *pos = 0; + t = rhashtable_walk_next(&iter->hti); + for (; t; t = rhashtable_walk_next(&iter->hti)) { + if (IS_ERR(t)) { + if (PTR_ERR(t) == -EAGAIN) + continue; + break; + } - if (*pos == 0) - seq_printf(seq, " ASSOC SOCK STY SST ST HBKT " - "ASSOC-ID TX_QUEUE RX_QUEUE UID INODE LPORT " - "RPORT LADDRS <-> RADDRS " - "HBINT INS OUTS MAXRT T1X T2X RTXC " - "wmema wmemq sndbuf rcvbuf\n"); + if (net_eq(sock_net(t->asoc->base.sk), seq_file_net(seq)) && + t->asoc->peer.primary_path == t) + break; + } - return (void *)pos; + return t; } -static void sctp_assocs_seq_stop(struct seq_file *seq, void *v) +static struct sctp_transport *sctp_transport_get_idx(struct seq_file *seq, + loff_t pos) +{ + void *obj; + + while (pos && (obj = sctp_transport_get_next(seq)) && !IS_ERR(obj)) + pos--; + + return obj; +} + +static int sctp_transport_walk_start(struct seq_file *seq) { + struct sctp_ht_iter *iter = seq->private; + int err; + + err = rhashtable_walk_init(&sctp_transport_hashtable, &iter->hti); + if (err) + return err; + + err = rhashtable_walk_start(&iter->hti); + + return err == -EAGAIN ? 0 : err; } +static void sctp_transport_walk_stop(struct seq_file *seq) +{ + struct sctp_ht_iter *iter = seq->private; + + rhashtable_walk_stop(&iter->hti); + rhashtable_walk_exit(&iter->hti); +} + +static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos) +{ + int err = sctp_transport_walk_start(seq); + + if (err) + return ERR_PTR(err); + + return *pos ? sctp_transport_get_idx(seq, *pos) : SEQ_START_TOKEN; +} + +static void sctp_assocs_seq_stop(struct seq_file *seq, void *v) +{ + sctp_transport_walk_stop(seq); +} static void *sctp_assocs_seq_next(struct seq_file *seq, void *v, loff_t *pos) { - if (++*pos >= sctp_assoc_hashsize) - return NULL; + ++*pos; - return pos; + return sctp_transport_get_next(seq); } /* Display sctp associations (/proc/net/sctp/assocs). */ static int sctp_assocs_seq_show(struct seq_file *seq, void *v) { - struct sctp_hashbucket *head; - struct sctp_ep_common *epb; + struct sctp_transport *transport; struct sctp_association *assoc; + struct sctp_ep_common *epb; struct sock *sk; - int hash = *(loff_t *)v; - - if (hash >= sctp_assoc_hashsize) - return -ENOMEM; - head = &sctp_assoc_hashtable[hash]; - local_bh_disable(); - read_lock(&head->lock); - sctp_for_each_hentry(epb, &head->chain) { - assoc = sctp_assoc(epb); - sk = epb->sk; - if (!net_eq(sock_net(sk), seq_file_net(seq))) - continue; - seq_printf(seq, - "%8pK %8pK %-3d %-3d %-2d %-4d " - "%4d %8d %8d %7u %5lu %-5d %5d ", - assoc, sk, sctp_sk(sk)->type, sk->sk_state, - assoc->state, hash, - assoc->assoc_id, - assoc->sndbuf_used, - atomic_read(&assoc->rmem_alloc), - from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk)), - sock_i_ino(sk), - epb->bind_addr.port, - assoc->peer.port); - seq_printf(seq, " "); - sctp_seq_dump_local_addrs(seq, epb); - seq_printf(seq, "<-> "); - sctp_seq_dump_remote_addrs(seq, assoc); - seq_printf(seq, "\t%8lu %5d %5d %4d %4d %4d %8d " - "%8d %8d %8d %8d", - assoc->hbinterval, assoc->c.sinit_max_instreams, - assoc->c.sinit_num_ostreams, assoc->max_retrans, - assoc->init_retries, assoc->shutdown_retries, - assoc->rtx_data_chunks, - atomic_read(&sk->sk_wmem_alloc), - sk->sk_wmem_queued, - sk->sk_sndbuf, - sk->sk_rcvbuf); - seq_printf(seq, "\n"); + if (v == SEQ_START_TOKEN) { + seq_printf(seq, " ASSOC SOCK STY SST ST HBKT " + "ASSOC-ID TX_QUEUE RX_QUEUE UID INODE LPORT " + "RPORT LADDRS <-> RADDRS " + "HBINT INS OUTS MAXRT T1X T2X RTXC " + "wmema wmemq sndbuf rcvbuf\n"); + return 0; } - read_unlock(&head->lock); - local_bh_enable(); + + transport = (struct sctp_transport *)v; + assoc = transport->asoc; + epb = &assoc->base; + sk = epb->sk; + + seq_printf(seq, + "%8pK %8pK %-3d %-3d %-2d %-4d " + "%4d %8d %8d %7u %5lu %-5d %5d ", + assoc, sk, sctp_sk(sk)->type, sk->sk_state, + assoc->state, 0, + assoc->assoc_id, + assoc->sndbuf_used, + atomic_read(&assoc->rmem_alloc), + from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk)), + sock_i_ino(sk), + epb->bind_addr.port, + assoc->peer.port); + seq_printf(seq, " "); + sctp_seq_dump_local_addrs(seq, epb); + seq_printf(seq, "<-> "); + sctp_seq_dump_remote_addrs(seq, assoc); + seq_printf(seq, "\t%8lu %5d %5d %4d %4d %4d %8d " + "%8d %8d %8d %8d", + assoc->hbinterval, assoc->c.sinit_max_instreams, + assoc->c.sinit_num_ostreams, assoc->max_retrans, + assoc->init_retries, assoc->shutdown_retries, + assoc->rtx_data_chunks, + atomic_read(&sk->sk_wmem_alloc), + sk->sk_wmem_queued, + sk->sk_sndbuf, + sk->sk_rcvbuf); + seq_printf(seq, "\n"); return 0; } @@ -378,7 +426,7 @@ static const struct seq_operations sctp_assoc_ops = { static int sctp_assocs_seq_open(struct inode *inode, struct file *file) { return seq_open_net(inode, file, &sctp_assoc_ops, - sizeof(struct seq_net_private)); + sizeof(struct sctp_ht_iter)); } static const struct file_operations sctp_assocs_seq_fops = { @@ -409,112 +457,94 @@ void sctp_assocs_proc_exit(struct net *net) static void *sctp_remaddr_seq_start(struct seq_file *seq, loff_t *pos) { - if (*pos >= sctp_assoc_hashsize) - return NULL; - - if (*pos < 0) - *pos = 0; + int err = sctp_transport_walk_start(seq); - if (*pos == 0) - seq_printf(seq, "ADDR ASSOC_ID HB_ACT RTO MAX_PATH_RTX " - "REM_ADDR_RTX START STATE\n"); + if (err) + return ERR_PTR(err); - return (void *)pos; + return *pos ? sctp_transport_get_idx(seq, *pos) : SEQ_START_TOKEN; } static void *sctp_remaddr_seq_next(struct seq_file *seq, void *v, loff_t *pos) { - if (++*pos >= sctp_assoc_hashsize) - return NULL; + ++*pos; - return pos; + return sctp_transport_get_next(seq); } static void sctp_remaddr_seq_stop(struct seq_file *seq, void *v) { + sctp_transport_walk_stop(seq); } static int sctp_remaddr_seq_show(struct seq_file *seq, void *v) { - struct sctp_hashbucket *head; - struct sctp_ep_common *epb; struct sctp_association *assoc; struct sctp_transport *tsp; - int hash = *(loff_t *)v; - if (hash >= sctp_assoc_hashsize) - return -ENOMEM; + if (v == SEQ_START_TOKEN) { + seq_printf(seq, "ADDR ASSOC_ID HB_ACT RTO MAX_PATH_RTX " + "REM_ADDR_RTX START STATE\n"); + return 0; + } - head = &sctp_assoc_hashtable[hash]; - local_bh_disable(); - read_lock(&head->lock); - rcu_read_lock(); - sctp_for_each_hentry(epb, &head->chain) { - if (!net_eq(sock_net(epb->sk), seq_file_net(seq))) + tsp = (struct sctp_transport *)v; + assoc = tsp->asoc; + + list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list, + transports) { + if (tsp->dead) continue; - assoc = sctp_assoc(epb); - list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list, - transports) { - if (tsp->dead) - continue; + /* + * The remote address (ADDR) + */ + tsp->af_specific->seq_dump_addr(seq, &tsp->ipaddr); + seq_printf(seq, " "); + /* + * The association ID (ASSOC_ID) + */ + seq_printf(seq, "%d ", tsp->asoc->assoc_id); + + /* + * If the Heartbeat is active (HB_ACT) + * Note: 1 = Active, 0 = Inactive + */ + seq_printf(seq, "%d ", timer_pending(&tsp->hb_timer)); + + /* + * Retransmit time out (RTO) + */ + seq_printf(seq, "%lu ", tsp->rto); + + /* + * Maximum path retransmit count (PATH_MAX_RTX) + */ + seq_printf(seq, "%d ", tsp->pathmaxrxt); + + /* + * remote address retransmit count (REM_ADDR_RTX) + * Note: We don't have a way to tally this at the moment + * so lets just leave it as zero for the moment + */ + seq_puts(seq, "0 "); + + /* + * remote address start time (START). This is also not + * currently implemented, but we can record it with a + * jiffies marker in a subsequent patch + */ + seq_puts(seq, "0 "); + + /* + * The current state of this destination. I.e. + * SCTP_ACTIVE, SCTP_INACTIVE, ... + */ + seq_printf(seq, "%d", tsp->state); - /* - * The remote address (ADDR) - */ - tsp->af_specific->seq_dump_addr(seq, &tsp->ipaddr); - seq_printf(seq, " "); - - /* - * The association ID (ASSOC_ID) - */ - seq_printf(seq, "%d ", tsp->asoc->assoc_id); - - /* - * If the Heartbeat is active (HB_ACT) - * Note: 1 = Active, 0 = Inactive - */ - seq_printf(seq, "%d ", timer_pending(&tsp->hb_timer)); - - /* - * Retransmit time out (RTO) - */ - seq_printf(seq, "%lu ", tsp->rto); - - /* - * Maximum path retransmit count (PATH_MAX_RTX) - */ - seq_printf(seq, "%d ", tsp->pathmaxrxt); - - /* - * remote address retransmit count (REM_ADDR_RTX) - * Note: We don't have a way to tally this at the moment - * so lets just leave it as zero for the moment - */ - seq_puts(seq, "0 "); - - /* - * remote address start time (START). This is also not - * currently implemented, but we can record it with a - * jiffies marker in a subsequent patch - */ - seq_puts(seq, "0 "); - - /* - * The current state of this destination. I.e. - * SCTP_ACTIVE, SCTP_INACTIVE, ... - */ - seq_printf(seq, "%d", tsp->state); - - seq_printf(seq, "\n"); - } + seq_printf(seq, "\n"); } - rcu_read_unlock(); - read_unlock(&head->lock); - local_bh_enable(); - return 0; - } static const struct seq_operations sctp_remaddr_ops = { @@ -533,7 +563,7 @@ void sctp_remaddr_proc_exit(struct net *net) static int sctp_remaddr_seq_open(struct inode *inode, struct file *file) { return seq_open_net(inode, file, &sctp_remaddr_ops, - sizeof(struct seq_net_private)); + sizeof(struct sctp_ht_iter)); } static const struct file_operations sctp_remaddr_seq_fops = { -- 2.1.0 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH net-next 4/5] sctp: drop the old assoc hashtable of sctp 2015-12-30 15:50 ` [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs Xin Long @ 2015-12-30 15:50 ` Xin Long 2015-12-30 15:50 ` [PATCH net-next 5/5] sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc Xin Long 0 siblings, 1 reply; 40+ messages in thread From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw) To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem transport hashtable will replace the association hashtable, so association hashtable is not used in sctp any more, so drop the codes about that. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> --- include/net/sctp/sctp.h | 21 ---------------- include/net/sctp/structs.h | 5 ---- net/sctp/input.c | 61 ---------------------------------------------- net/sctp/protocol.c | 30 ++--------------------- net/sctp/sm_sideeffect.c | 2 -- net/sctp/socket.c | 6 +---- 6 files changed, 3 insertions(+), 122 deletions(-) diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index 7bbdfba..835aa2e 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -126,8 +126,6 @@ int sctp_primitive_ASCONF(struct net *, struct sctp_association *, void *arg); */ int sctp_rcv(struct sk_buff *skb); void sctp_v4_err(struct sk_buff *skb, u32 info); -void sctp_hash_established(struct sctp_association *); -void sctp_unhash_established(struct sctp_association *); void sctp_hash_endpoint(struct sctp_endpoint *); void sctp_unhash_endpoint(struct sctp_endpoint *); struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *, @@ -530,25 +528,6 @@ static inline int sctp_ep_hashfn(struct net *net, __u16 lport) return (net_hash_mix(net) + lport) & (sctp_ep_hashsize - 1); } -/* This is the hash function for the association hash table. */ -static inline int sctp_assoc_hashfn(struct net *net, __u16 lport, __u16 rport) -{ - int h = (lport << 16) + rport + net_hash_mix(net); - h ^= h>>8; - return h & (sctp_assoc_hashsize - 1); -} - -/* This is the hash function for the association hash table. This is - * not used yet, but could be used as a better hash function when - * we have a vtag. - */ -static inline int sctp_vtag_hashfn(__u16 lport, __u16 rport, __u32 vtag) -{ - int h = (lport << 16) + rport; - h ^= vtag; - return h & (sctp_assoc_hashsize - 1); -} - #define sctp_for_each_hentry(epb, head) \ hlist_for_each_entry(epb, head, node) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 4ab87d0..20e7212 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -120,8 +120,6 @@ extern struct sctp_globals { /* This is the hash of all endpoints. */ struct sctp_hashbucket *ep_hashtable; - /* This is the hash of all associations. */ - struct sctp_hashbucket *assoc_hashtable; /* This is the sctp port control hash. */ struct sctp_bind_hashbucket *port_hashtable; /* This is the hash of all transports. */ @@ -129,7 +127,6 @@ extern struct sctp_globals { /* Sizes of above hashtables. */ int ep_hashsize; - int assoc_hashsize; int port_hashsize; /* Default initialization values to be applied to new associations. */ @@ -146,8 +143,6 @@ extern struct sctp_globals { #define sctp_address_families (sctp_globals.address_families) #define sctp_ep_hashsize (sctp_globals.ep_hashsize) #define sctp_ep_hashtable (sctp_globals.ep_hashtable) -#define sctp_assoc_hashsize (sctp_globals.assoc_hashsize) -#define sctp_assoc_hashtable (sctp_globals.assoc_hashtable) #define sctp_port_hashsize (sctp_globals.port_hashsize) #define sctp_port_hashtable (sctp_globals.port_hashtable) #define sctp_transport_hashtable (sctp_globals.transport_hashtable) diff --git a/net/sctp/input.c b/net/sctp/input.c index 6f075d8..d9a6e66 100644 --- a/net/sctp/input.c +++ b/net/sctp/input.c @@ -913,67 +913,6 @@ struct sctp_transport *sctp_epaddr_lookup_transport( return sctp_addrs_lookup_transport(net, &addr->a, paddr); } -/* Insert association into the hash table. */ -static void __sctp_hash_established(struct sctp_association *asoc) -{ - struct net *net = sock_net(asoc->base.sk); - struct sctp_ep_common *epb; - struct sctp_hashbucket *head; - - epb = &asoc->base; - - /* Calculate which chain this entry will belong to. */ - epb->hashent = sctp_assoc_hashfn(net, epb->bind_addr.port, - asoc->peer.port); - - head = &sctp_assoc_hashtable[epb->hashent]; - - write_lock(&head->lock); - hlist_add_head(&epb->node, &head->chain); - write_unlock(&head->lock); -} - -/* Add an association to the hash. Local BH-safe. */ -void sctp_hash_established(struct sctp_association *asoc) -{ - if (asoc->temp) - return; - - local_bh_disable(); - __sctp_hash_established(asoc); - local_bh_enable(); -} - -/* Remove association from the hash table. */ -static void __sctp_unhash_established(struct sctp_association *asoc) -{ - struct net *net = sock_net(asoc->base.sk); - struct sctp_hashbucket *head; - struct sctp_ep_common *epb; - - epb = &asoc->base; - - epb->hashent = sctp_assoc_hashfn(net, epb->bind_addr.port, - asoc->peer.port); - - head = &sctp_assoc_hashtable[epb->hashent]; - - write_lock(&head->lock); - hlist_del_init(&epb->node); - write_unlock(&head->lock); -} - -/* Remove association from the hash table. Local BH-safe. */ -void sctp_unhash_established(struct sctp_association *asoc) -{ - if (asoc->temp) - return; - - local_bh_disable(); - __sctp_unhash_established(asoc); - local_bh_enable(); -} - /* Look up an association. */ static struct sctp_association *__sctp_lookup_association( struct net *net, diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 631cfb3..ab0d538 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -1416,24 +1416,6 @@ static __init int sctp_init(void) for (order = 0; (1UL << order) < goal; order++) ; - do { - sctp_assoc_hashsize = (1UL << order) * PAGE_SIZE / - sizeof(struct sctp_hashbucket); - if ((sctp_assoc_hashsize > (64 * 1024)) && order > 0) - continue; - sctp_assoc_hashtable = (struct sctp_hashbucket *) - __get_free_pages(GFP_KERNEL | __GFP_NOWARN, order); - } while (!sctp_assoc_hashtable && --order > 0); - if (!sctp_assoc_hashtable) { - pr_err("Failed association hash alloc\n"); - status = -ENOMEM; - goto err_ahash_alloc; - } - for (i = 0; i < sctp_assoc_hashsize; i++) { - rwlock_init(&sctp_assoc_hashtable[i].lock); - INIT_HLIST_HEAD(&sctp_assoc_hashtable[i].chain); - } - /* Allocate and initialize the endpoint hash table. */ sctp_ep_hashsize = 64; sctp_ep_hashtable = @@ -1470,8 +1452,7 @@ static __init int sctp_init(void) if (sctp_transport_hashtable_init()) goto err_thash_alloc; - pr_info("Hash tables configured (established %d bind %d)\n", - sctp_assoc_hashsize, sctp_port_hashsize); + pr_info("Hash tables configured (bind %d)\n", sctp_port_hashsize); sctp_sysctl_register(); @@ -1528,10 +1509,6 @@ err_bhash_alloc: err_thash_alloc: kfree(sctp_ep_hashtable); err_ehash_alloc: - free_pages((unsigned long)sctp_assoc_hashtable, - get_order(sctp_assoc_hashsize * - sizeof(struct sctp_hashbucket))); -err_ahash_alloc: percpu_counter_destroy(&sctp_sockets_allocated); err_percpu_counter_init: kmem_cache_destroy(sctp_chunk_cachep); @@ -1565,13 +1542,10 @@ static __exit void sctp_exit(void) sctp_sysctl_unregister(); - free_pages((unsigned long)sctp_assoc_hashtable, - get_order(sctp_assoc_hashsize * - sizeof(struct sctp_hashbucket))); - kfree(sctp_ep_hashtable); free_pages((unsigned long)sctp_port_hashtable, get_order(sctp_port_hashsize * sizeof(struct sctp_bind_hashbucket))); + kfree(sctp_ep_hashtable); sctp_transport_hashtable_destroy(); percpu_counter_destroy(&sctp_sockets_allocated); diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c index 05cd164..4f170ad 100644 --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -866,7 +866,6 @@ static void sctp_cmd_delete_tcb(sctp_cmd_seq_t *cmds, (!asoc->temp) && (sk->sk_shutdown != SHUTDOWN_MASK)) return; - sctp_unhash_established(asoc); sctp_association_free(asoc); } @@ -1269,7 +1268,6 @@ static int sctp_cmd_interpreter(sctp_event_t event_type, asoc = cmd->obj.asoc; BUG_ON(asoc->peer.primary_path == NULL); sctp_endpoint_add_asoc(ep, asoc); - sctp_hash_established(asoc); break; case SCTP_CMD_UPDATE_ASSOC: diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 2a1e8ba..523b9fa 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -1228,7 +1228,6 @@ out_free: * To the hash table, try to unhash it, just in case, its a noop * if it wasn't hashed so we're safe */ - sctp_unhash_established(asoc); sctp_association_free(asoc); } return err; @@ -1501,7 +1500,6 @@ static void sctp_close(struct sock *sk, long timeout) * ABORT or SHUTDOWN based on the linger options. */ if (sctp_state(asoc, CLOSED)) { - sctp_unhash_established(asoc); sctp_association_free(asoc); continue; } @@ -1984,10 +1982,8 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len) goto out_unlock; out_free: - if (new_asoc) { - sctp_unhash_established(asoc); + if (new_asoc) sctp_association_free(asoc); - } out_unlock: release_sock(sk); -- 2.1.0 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH net-next 5/5] sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc 2015-12-30 15:50 ` [PATCH net-next 4/5] sctp: drop the old assoc hashtable of sctp Xin Long @ 2015-12-30 15:50 ` Xin Long 0 siblings, 0 replies; 40+ messages in thread From: Xin Long @ 2015-12-30 15:50 UTC (permalink / raw) To: network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem sctp_endpoint_lookup_assoc is called in the protection of sock lock there is no need to call local_bh_disable in this function. so remove them. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> --- net/sctp/endpointola.c | 17 +---------------- 1 file changed, 1 insertion(+), 16 deletions(-) diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c index 8838bf4..52838ea 100644 --- a/net/sctp/endpointola.c +++ b/net/sctp/endpointola.c @@ -317,7 +317,7 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep, * We lookup the transport from hashtable at first, then get association * through t->assoc. */ -static struct sctp_association *__sctp_endpoint_lookup_assoc( +struct sctp_association *sctp_endpoint_lookup_assoc( const struct sctp_endpoint *ep, const union sctp_addr *paddr, struct sctp_transport **transport) @@ -342,21 +342,6 @@ out: return asoc; } -/* Lookup association on an endpoint based on a peer address. BH-safe. */ -struct sctp_association *sctp_endpoint_lookup_assoc( - const struct sctp_endpoint *ep, - const union sctp_addr *paddr, - struct sctp_transport **transport) -{ - struct sctp_association *asoc; - - local_bh_disable(); - asoc = __sctp_endpoint_lookup_assoc(ep, paddr, transport); - local_bh_enable(); - - return asoc; -} - /* Look for any peeled off association from the endpoint that matches the * given peer address. */ -- 2.1.0 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path 2015-12-30 15:50 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long 2015-12-30 15:50 ` [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs Xin Long @ 2016-01-05 19:07 ` Vlad Yasevich 2016-01-06 16:18 ` Xin Long 2016-01-06 17:42 ` mleitner 1 sibling, 2 replies; 40+ messages in thread From: Vlad Yasevich @ 2016-01-05 19:07 UTC (permalink / raw) To: Xin Long, network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem On 12/30/2015 10:50 AM, Xin Long wrote: > apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc > and __sctp_lookup_association, it's invoked in the protection of sock > lock, it will be safe, but sctp_lookup_association need to call > rcu_read_lock() and to detect the t->dead to protect it. > > Signed-off-by: Xin Long <lucien.xin@gmail.com> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > --- > net/sctp/associola.c | 5 +++++ > net/sctp/endpointola.c | 35 ++++++++--------------------------- > net/sctp/input.c | 39 ++++++++++----------------------------- > net/sctp/protocol.c | 6 ++++++ > 4 files changed, 29 insertions(+), 56 deletions(-) > > diff --git a/net/sctp/associola.c b/net/sctp/associola.c > index 559afd0..2bf8ec9 100644 > --- a/net/sctp/associola.c > +++ b/net/sctp/associola.c > @@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc) > list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) { > transport = list_entry(pos, struct sctp_transport, transports); > list_del_rcu(pos); > + sctp_unhash_transport(transport); > sctp_transport_free(transport); > } > > @@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc, > > /* Remove this peer from the list. */ > list_del_rcu(&peer->transports); > + /* Remove this peer from the transport hashtable */ > + sctp_unhash_transport(peer); > > /* Get the first transport of asoc. */ > pos = asoc->peer.transport_addr_list.next; > @@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc, > /* Attach the remote transport to our asoc. */ > list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list); > asoc->peer.transport_count++; > + /* Add this peer into the transport hashtable */ > + sctp_hash_transport(peer); This is actually problematic. The issue is that transports are unhashed when removed. however, transport removal happens after the association has been declared dead and should have been removed from the hash and marked unreachable. As a result, with the code above, you can now find and return a dead association. Checking for 'dead' state is racy. The best solution I've come up with is to hash the transports in sctp_hash_established() and clean-up in __sctp_unhash_established(), and then handle ADD-IP case separately. The above would also remove the necessity to check for temporary associations, since they should never be hashed. -vlad > > /* If we do not yet have a primary path, set one. */ > if (!asoc->peer.primary_path) { > diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c > index 9da76ba..8838bf4 100644 > --- a/net/sctp/endpointola.c > +++ b/net/sctp/endpointola.c > @@ -314,8 +314,8 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep, > } > > /* Find the association that goes with this chunk. > - * We do a linear search of the associations for this endpoint. > - * We return the matching transport address too. > + * We lookup the transport from hashtable at first, then get association > + * through t->assoc. > */ > static struct sctp_association *__sctp_endpoint_lookup_assoc( > const struct sctp_endpoint *ep, > @@ -323,12 +323,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc( > struct sctp_transport **transport) > { > struct sctp_association *asoc = NULL; > - struct sctp_association *tmp; > - struct sctp_transport *t = NULL; > - struct sctp_hashbucket *head; > - struct sctp_ep_common *epb; > - int hash; > - int rport; > + struct sctp_transport *t; > > *transport = NULL; > > @@ -337,26 +332,12 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc( > */ > if (!ep->base.bind_addr.port) > goto out; > + t = sctp_epaddr_lookup_transport(ep, paddr); > + if (!t || t->asoc->temp) > + goto out; > > - rport = ntohs(paddr->v4.sin_port); > - > - hash = sctp_assoc_hashfn(sock_net(ep->base.sk), ep->base.bind_addr.port, > - rport); > - head = &sctp_assoc_hashtable[hash]; > - read_lock(&head->lock); > - sctp_for_each_hentry(epb, &head->chain) { > - tmp = sctp_assoc(epb); > - if (tmp->ep != ep || rport != tmp->peer.port) > - continue; > - > - t = sctp_assoc_lookup_paddr(tmp, paddr); > - if (t) { > - asoc = tmp; > - *transport = t; > - break; > - } > - } > - read_unlock(&head->lock); > + *transport = t; > + asoc = t->asoc; > out: > return asoc; > } > diff --git a/net/sctp/input.c b/net/sctp/input.c > index bac8278..6f075d8 100644 > --- a/net/sctp/input.c > +++ b/net/sctp/input.c > @@ -981,38 +981,19 @@ static struct sctp_association *__sctp_lookup_association( > const union sctp_addr *peer, > struct sctp_transport **pt) > { > - struct sctp_hashbucket *head; > - struct sctp_ep_common *epb; > - struct sctp_association *asoc; > - struct sctp_transport *transport; > - int hash; > + struct sctp_transport *t; > > - /* Optimize here for direct hit, only listening connections can > - * have wildcards anyways. > - */ > - hash = sctp_assoc_hashfn(net, ntohs(local->v4.sin_port), > - ntohs(peer->v4.sin_port)); > - head = &sctp_assoc_hashtable[hash]; > - read_lock(&head->lock); > - sctp_for_each_hentry(epb, &head->chain) { > - asoc = sctp_assoc(epb); > - transport = sctp_assoc_is_match(asoc, net, local, peer); > - if (transport) > - goto hit; > - } > + t = sctp_addrs_lookup_transport(net, local, peer); > + if (!t || t->dead || t->asoc->temp) > + return NULL; > > - read_unlock(&head->lock); > + sctp_association_hold(t->asoc); > + *pt = t; > > - return NULL; > - > -hit: > - *pt = transport; > - sctp_association_hold(asoc); > - read_unlock(&head->lock); > - return asoc; > + return t->asoc; > } > > -/* Look up an association. BH-safe. */ > +/* Look up an association. protected by RCU read lock */ > static > struct sctp_association *sctp_lookup_association(struct net *net, > const union sctp_addr *laddr, > @@ -1021,9 +1002,9 @@ struct sctp_association *sctp_lookup_association(struct net *net, > { > struct sctp_association *asoc; > > - local_bh_disable(); > + rcu_read_lock(); > asoc = __sctp_lookup_association(net, laddr, paddr, transportp); > - local_bh_enable(); > + rcu_read_unlock(); > > return asoc; > } > diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c > index 010aced..631cfb3 100644 > --- a/net/sctp/protocol.c > +++ b/net/sctp/protocol.c > @@ -1467,6 +1467,9 @@ static __init int sctp_init(void) > INIT_HLIST_HEAD(&sctp_port_hashtable[i].chain); > } > > + if (sctp_transport_hashtable_init()) > + goto err_thash_alloc; > + > pr_info("Hash tables configured (established %d bind %d)\n", > sctp_assoc_hashsize, sctp_port_hashsize); > > @@ -1521,6 +1524,8 @@ err_register_defaults: > get_order(sctp_port_hashsize * > sizeof(struct sctp_bind_hashbucket))); > err_bhash_alloc: > + sctp_transport_hashtable_destroy(); > +err_thash_alloc: > kfree(sctp_ep_hashtable); > err_ehash_alloc: > free_pages((unsigned long)sctp_assoc_hashtable, > @@ -1567,6 +1572,7 @@ static __exit void sctp_exit(void) > free_pages((unsigned long)sctp_port_hashtable, > get_order(sctp_port_hashsize * > sizeof(struct sctp_bind_hashbucket))); > + sctp_transport_hashtable_destroy(); > > percpu_counter_destroy(&sctp_sockets_allocated); > > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path 2016-01-05 19:07 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich @ 2016-01-06 16:18 ` Xin Long 2016-01-06 17:42 ` mleitner 1 sibling, 0 replies; 40+ messages in thread From: Xin Long @ 2016-01-06 16:18 UTC (permalink / raw) To: Vlad Yasevich Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem On Wed, Jan 6, 2016 at 3:07 AM, Vlad Yasevich <vyasevich@gmail.com> wrote: > On 12/30/2015 10:50 AM, Xin Long wrote: >> apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc >> and __sctp_lookup_association, it's invoked in the protection of sock >> lock, it will be safe, but sctp_lookup_association need to call >> rcu_read_lock() and to detect the t->dead to protect it. >> >> Signed-off-by: Xin Long <lucien.xin@gmail.com> >> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >> --- >> net/sctp/associola.c | 5 +++++ >> net/sctp/endpointola.c | 35 ++++++++--------------------------- >> net/sctp/input.c | 39 ++++++++++----------------------------- >> net/sctp/protocol.c | 6 ++++++ >> 4 files changed, 29 insertions(+), 56 deletions(-) >> >> diff --git a/net/sctp/associola.c b/net/sctp/associola.c >> index 559afd0..2bf8ec9 100644 >> --- a/net/sctp/associola.c >> +++ b/net/sctp/associola.c >> @@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc) >> list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) { >> transport = list_entry(pos, struct sctp_transport, transports); >> list_del_rcu(pos); >> + sctp_unhash_transport(transport); >> sctp_transport_free(transport); >> } >> >> @@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc, >> >> /* Remove this peer from the list. */ >> list_del_rcu(&peer->transports); >> + /* Remove this peer from the transport hashtable */ >> + sctp_unhash_transport(peer); >> >> /* Get the first transport of asoc. */ >> pos = asoc->peer.transport_addr_list.next; >> @@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc, >> /* Attach the remote transport to our asoc. */ >> list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list); >> asoc->peer.transport_count++; >> + /* Add this peer into the transport hashtable */ >> + sctp_hash_transport(peer); > > This is actually problematic. The issue is that transports are unhashed when removed. > however, transport removal happens after the association has been declared dead and > should have been removed from the hash and marked unreachable. > > As a result, with the code above, you can now find and return a dead association. > Checking for 'dead' state is racy. > > The best solution I've come up with is to hash the transports in sctp_hash_established() > and clean-up in __sctp_unhash_established(), and then handle ADD-IP case separately. > > The above would also remove the necessity to check for temporary associations, since they > should never be hashed. > > -vlad > yes, you're right, im thinking if we can unhash transport before the association declares dead in sctp_association_free, like: list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) { transport = list_entry(pos, struct sctp_transport, transports); sctp_unhash_transport(transport); } asoc->base.dead = true; ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path 2016-01-05 19:07 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich 2016-01-06 16:18 ` Xin Long @ 2016-01-06 17:42 ` mleitner 2016-01-11 15:00 ` Vlad Yasevich 1 sibling, 1 reply; 40+ messages in thread From: mleitner @ 2016-01-06 17:42 UTC (permalink / raw) To: Vlad Yasevich; +Cc: Xin Long, network dev, linux-sctp, vyasevic, daniel, davem On Tue, Jan 05, 2016 at 02:07:11PM -0500, Vlad Yasevich wrote: > On 12/30/2015 10:50 AM, Xin Long wrote: > > apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc > > and __sctp_lookup_association, it's invoked in the protection of sock > > lock, it will be safe, but sctp_lookup_association need to call > > rcu_read_lock() and to detect the t->dead to protect it. > > > > Signed-off-by: Xin Long <lucien.xin@gmail.com> > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > > --- > > net/sctp/associola.c | 5 +++++ > > net/sctp/endpointola.c | 35 ++++++++--------------------------- > > net/sctp/input.c | 39 ++++++++++----------------------------- > > net/sctp/protocol.c | 6 ++++++ > > 4 files changed, 29 insertions(+), 56 deletions(-) > > > > diff --git a/net/sctp/associola.c b/net/sctp/associola.c > > index 559afd0..2bf8ec9 100644 > > --- a/net/sctp/associola.c > > +++ b/net/sctp/associola.c > > @@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc) > > list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) { > > transport = list_entry(pos, struct sctp_transport, transports); > > list_del_rcu(pos); > > + sctp_unhash_transport(transport); > > sctp_transport_free(transport); > > } > > > > @@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc, > > > > /* Remove this peer from the list. */ > > list_del_rcu(&peer->transports); > > + /* Remove this peer from the transport hashtable */ > > + sctp_unhash_transport(peer); > > > > /* Get the first transport of asoc. */ > > pos = asoc->peer.transport_addr_list.next; > > @@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc, > > /* Attach the remote transport to our asoc. */ > > list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list); > > asoc->peer.transport_count++; > > + /* Add this peer into the transport hashtable */ > > + sctp_hash_transport(peer); > > This is actually problematic. The issue is that transports are unhashed when removed. > however, transport removal happens after the association has been declared dead and > should have been removed from the hash and marked unreachable. > > As a result, with the code above, you can now find and return a dead association. > Checking for 'dead' state is racy. Mind elaborating why you think this is racy? (More below) > The best solution I've come up with is to hash the transports in sctp_hash_established() > and clean-up in __sctp_unhash_established(), and then handle ADD-IP case separately. The idea was exactly the opposite :) to avoid such calls. All calls to sctp_unhash_established() were followed by sctp_association_free(), and vice-versa, indicating that it could (and probably should) be embedded in sctp_association_free() itself. No extra locking was taken other than to protect the hash table itself, which now is different. The check against ->dead should be quite as effective as prior implementation, as it's marked dead quite early in sctp_association_free(). We could do it a bit earlier if necessary. Please correct if I'm wrong, but AFAIU rhashtable, it's possible to return an element that is being removed by another CPU, due to RCU usage. If that's right, the early removal is not enough anymore and only a check like the the ->dead one can protect it then. Hmmm we probably should use something more atomic there then. > The above would also remove the necessity to check for temporary associations, since they > should never be hashed. Agreed. We could add a simple if (t->asoc->temp) return; to the new sctp_hash/unhash_transport to fix that. Marcelo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path 2016-01-06 17:42 ` mleitner @ 2016-01-11 15:00 ` Vlad Yasevich 0 siblings, 0 replies; 40+ messages in thread From: Vlad Yasevich @ 2016-01-11 15:00 UTC (permalink / raw) To: mleitner; +Cc: Xin Long, network dev, linux-sctp, vyasevic, daniel, davem On 01/06/2016 12:42 PM, mleitner@redhat.com wrote: > On Tue, Jan 05, 2016 at 02:07:11PM -0500, Vlad Yasevich wrote: >> On 12/30/2015 10:50 AM, Xin Long wrote: >>> apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc >>> and __sctp_lookup_association, it's invoked in the protection of sock >>> lock, it will be safe, but sctp_lookup_association need to call >>> rcu_read_lock() and to detect the t->dead to protect it. >>> >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >>> --- >>> net/sctp/associola.c | 5 +++++ >>> net/sctp/endpointola.c | 35 ++++++++--------------------------- >>> net/sctp/input.c | 39 ++++++++++----------------------------- >>> net/sctp/protocol.c | 6 ++++++ >>> 4 files changed, 29 insertions(+), 56 deletions(-) >>> >>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c >>> index 559afd0..2bf8ec9 100644 >>> --- a/net/sctp/associola.c >>> +++ b/net/sctp/associola.c >>> @@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc) >>> list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) { >>> transport = list_entry(pos, struct sctp_transport, transports); >>> list_del_rcu(pos); >>> + sctp_unhash_transport(transport); >>> sctp_transport_free(transport); >>> } >>> >>> @@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc, >>> >>> /* Remove this peer from the list. */ >>> list_del_rcu(&peer->transports); >>> + /* Remove this peer from the transport hashtable */ >>> + sctp_unhash_transport(peer); >>> >>> /* Get the first transport of asoc. */ >>> pos = asoc->peer.transport_addr_list.next; >>> @@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc, >>> /* Attach the remote transport to our asoc. */ >>> list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list); >>> asoc->peer.transport_count++; >>> + /* Add this peer into the transport hashtable */ >>> + sctp_hash_transport(peer); >> >> This is actually problematic. The issue is that transports are unhashed when removed. >> however, transport removal happens after the association has been declared dead and >> should have been removed from the hash and marked unreachable. >> >> As a result, with the code above, you can now find and return a dead association. >> Checking for 'dead' state is racy. > > Mind elaborating why you think this is racy? (More below) > >> The best solution I've come up with is to hash the transports in sctp_hash_established() >> and clean-up in __sctp_unhash_established(), and then handle ADD-IP case separately. > > The idea was exactly the opposite :) to avoid such calls. All calls to > sctp_unhash_established() were followed by sctp_association_free(), and > vice-versa, indicating that it could (and probably should) be embedded > in sctp_association_free() itself. Oh, I understand the idea and the desire... > > No extra locking was taken other than to protect the hash table itself, > which now is different. The check against ->dead should be quite as > effective as prior implementation, as it's marked dead quite early in > sctp_association_free(). We could do it a bit earlier if necessary. > > Please correct if I'm wrong, but AFAIU rhashtable, it's possible to > return an element that is being removed by another CPU, due to RCU > usage. If that's right, the early removal is not enough anymore and > only a check like the the ->dead one can protect it then. Hmmm we > probably should use something more atomic there then. So, I've been looking at this problem trying to figure out what specifically triggers this race. It comes down to what seems like a small change in in __sctp_lookup_association(), but it is significant. The old code took a ref on the found association while under a read lock for the hash bucket. This guaranteed that the association still existed in the hash, if even though it was in the process of being removed, we were guaranteed to have a proper ref on it. With the new code, the guarantee is gone. It looks like we illegally take a ref on the association without any requisite protections. As a result, the transport may have been unhashed and the association is going through the destroy phase, when we bump the ref (technically from 0). I think what Herbert recently suggested may help us here. He suggested hashing a list object and linking transports to it. I am thinking right now that f we take it one step further and put a lock inside that list object that protects the list, then we might be able to solve this race. > >> The above would also remove the necessity to check for temporary associations, since they >> should never be hashed. > > Agreed. We could add a simple > if (t->asoc->temp) > return; > to the new sctp_hash/unhash_transport to fix that. Good idea. -vlad > > Marcelo > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long 2015-12-30 15:50 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long @ 2015-12-30 16:57 ` Eric Dumazet 2015-12-30 17:50 ` David Miller 2015-12-30 17:41 ` Marcelo Ricardo Leitner ` (2 subsequent siblings) 4 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2015-12-30 16:57 UTC (permalink / raw) To: Xin Long; +Cc: network dev, linux-sctp, mleitner, vyasevic, daniel, davem On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote: > tranport hashtbale will replace the association hashtable to do the > lookup for transport, and then get association by t->assoc, rhashtable > apis will be used because of it's resizable, scalable and using rcu. > > lport + rport + paddr will be the base hashkey to locate the chain, > with net to protect one netns from another, then plus the laddr to > compare to get the target. > > this patch will provider the lookup functions: > - sctp_epaddr_lookup_transport > - sctp_addrs_lookup_transport > > hash/unhash functions: > - sctp_hash_transport > - sctp_unhash_transport > > init/destroy functions: > - sctp_transport_hashtable_init > - sctp_transport_hashtable_destroy > > Signed-off-by: Xin Long <lucien.xin@gmail.com> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > --- I am against using rhashtable in SCTP (or TCP) at this stage, given the number of bugs we have with it. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2015-12-30 16:57 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Eric Dumazet @ 2015-12-30 17:50 ` David Miller 2016-01-11 9:32 ` Herbert Xu 0 siblings, 1 reply; 40+ messages in thread From: David Miller @ 2015-12-30 17:50 UTC (permalink / raw) To: eric.dumazet; +Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 30 Dec 2015 11:57:31 -0500 > I am against using rhashtable in SCTP (or TCP) at this stage, given the > number of bugs we have with it. Come on Eric, we've largely dealt with all of these problems. I haven't seen a serious report in a while. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2015-12-30 17:50 ` David Miller @ 2016-01-11 9:32 ` Herbert Xu 2016-01-11 16:33 ` Marcelo Ricardo Leitner 0 siblings, 1 reply; 40+ messages in thread From: Herbert Xu @ 2016-01-11 9:32 UTC (permalink / raw) To: David Miller Cc: eric.dumazet, lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel David Miller <davem@davemloft.net> wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 30 Dec 2015 11:57:31 -0500 > >> I am against using rhashtable in SCTP (or TCP) at this stage, given the >> number of bugs we have with it. > > Come on Eric, we've largely dealt with all of these problems. I haven't > seen a serious report in a while. Well there is still the outstanding issue with softirq insertion potentially failing with ENOMEM if we fail to expand the hash table using just kmalloc. So if the target user does softirq insertions, I would wait until the fix for that is ready. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-11 9:32 ` Herbert Xu @ 2016-01-11 16:33 ` Marcelo Ricardo Leitner 2016-01-11 18:08 ` Vlad Yasevich 0 siblings, 1 reply; 40+ messages in thread From: Marcelo Ricardo Leitner @ 2016-01-11 16:33 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, eric.dumazet, lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel On Mon, Jan 11, 2016 at 05:32:10PM +0800, Herbert Xu wrote: > David Miller <davem@davemloft.net> wrote: > > From: Eric Dumazet <eric.dumazet@gmail.com> > > Date: Wed, 30 Dec 2015 11:57:31 -0500 > > > >> I am against using rhashtable in SCTP (or TCP) at this stage, given the > >> number of bugs we have with it. > > > > Come on Eric, we've largely dealt with all of these problems. I haven't > > seen a serious report in a while. > > Well there is still the outstanding issue with softirq insertion > potentially failing with ENOMEM if we fail to expand the hash > table using just kmalloc. > > So if the target user does softirq insertions, I would wait until > the fix for that is ready. It does some, yes. If listening socket is not backlogged, there will be N inserts at each new association, where N is the number of IP addresses that the client is advertising. This is done on the second stage of the SCTP handshake. Not easily DoS-able as it requires receiving a packet from server and replying based on it, plus N is limited by MTU. AFAIK Xin's stress tests couldn't hit this situation of ENOMEM, btw. Thanks, Marcelo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-11 16:33 ` Marcelo Ricardo Leitner @ 2016-01-11 18:08 ` Vlad Yasevich 2016-01-11 18:19 ` Marcelo Ricardo Leitner 0 siblings, 1 reply; 40+ messages in thread From: Vlad Yasevich @ 2016-01-11 18:08 UTC (permalink / raw) To: Marcelo Ricardo Leitner, Herbert Xu Cc: David Miller, eric.dumazet, lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel On 01/11/2016 11:33 AM, Marcelo Ricardo Leitner wrote: > On Mon, Jan 11, 2016 at 05:32:10PM +0800, Herbert Xu wrote: >> David Miller <davem@davemloft.net> wrote: >>> From: Eric Dumazet <eric.dumazet@gmail.com> >>> Date: Wed, 30 Dec 2015 11:57:31 -0500 >>> >>>> I am against using rhashtable in SCTP (or TCP) at this stage, given the >>>> number of bugs we have with it. >>> >>> Come on Eric, we've largely dealt with all of these problems. I haven't >>> seen a serious report in a while. >> >> Well there is still the outstanding issue with softirq insertion >> potentially failing with ENOMEM if we fail to expand the hash >> table using just kmalloc. >> >> So if the target user does softirq insertions, I would wait until >> the fix for that is ready. > > It does some, yes. If listening socket is not backlogged, there will be > N inserts at each new association, where N is the number of IP addresses > that the client is advertising. > > This is done on the second stage of the SCTP handshake. Not easily > DoS-able as it requires receiving a packet from server and replying > based on it, plus N is limited by MTU. How is N limited by MTU? INIT and COOKIE chunks are allowed to exceed mtu and will be IP fragmented. So it seems that N is limited by 65535-overhead, where overhead is all the headers plus the sctp cookie info. That can be quite a lot of addresses. -vlad > > AFAIK Xin's stress tests couldn't hit this situation of ENOMEM, btw. > > Thanks, > Marcelo > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-11 18:08 ` Vlad Yasevich @ 2016-01-11 18:19 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 40+ messages in thread From: Marcelo Ricardo Leitner @ 2016-01-11 18:19 UTC (permalink / raw) To: Vlad Yasevich Cc: Herbert Xu, David Miller, eric.dumazet, lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel On Mon, Jan 11, 2016 at 01:08:56PM -0500, Vlad Yasevich wrote: > On 01/11/2016 11:33 AM, Marcelo Ricardo Leitner wrote: > > On Mon, Jan 11, 2016 at 05:32:10PM +0800, Herbert Xu wrote: > >> David Miller <davem@davemloft.net> wrote: > >>> From: Eric Dumazet <eric.dumazet@gmail.com> > >>> Date: Wed, 30 Dec 2015 11:57:31 -0500 > >>> > >>>> I am against using rhashtable in SCTP (or TCP) at this stage, given the > >>>> number of bugs we have with it. > >>> > >>> Come on Eric, we've largely dealt with all of these problems. I haven't > >>> seen a serious report in a while. > >> > >> Well there is still the outstanding issue with softirq insertion > >> potentially failing with ENOMEM if we fail to expand the hash > >> table using just kmalloc. > >> > >> So if the target user does softirq insertions, I would wait until > >> the fix for that is ready. > > > > It does some, yes. If listening socket is not backlogged, there will be > > N inserts at each new association, where N is the number of IP addresses > > that the client is advertising. > > > > This is done on the second stage of the SCTP handshake. Not easily > > DoS-able as it requires receiving a packet from server and replying > > based on it, plus N is limited by MTU. > > How is N limited by MTU? INIT and COOKIE chunks are allowed to exceed > mtu and will be IP fragmented. So it seems that N is limited by 65535-overhead, > where overhead is all the headers plus the sctp cookie info. That can > be quite a lot of addresses. Oups, you're right. One then can trigger quite some inserts with a single new assoc attempt, yes. Marcelo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long 2015-12-30 15:50 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long 2015-12-30 16:57 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Eric Dumazet @ 2015-12-30 17:41 ` Marcelo Ricardo Leitner 2016-01-05 10:10 ` Xin Long 2016-01-05 18:38 ` Vlad Yasevich 2016-01-11 9:30 ` Herbert Xu 4 siblings, 1 reply; 40+ messages in thread From: Marcelo Ricardo Leitner @ 2015-12-30 17:41 UTC (permalink / raw) To: Eric Dumazet Cc: network dev, linux-sctp, mleitner, vyasevic, daniel, davem, Herbert Xu, Xin Long On Wed, Dec 30, 2015 at 11:50:46PM +0800, Xin Long wrote: ... > +void sctp_hash_transport(struct sctp_transport *t) > +{ > + struct sctp_sockaddr_entry *addr; > + struct sctp_hash_cmp_arg arg; > + > + addr = list_entry(t->asoc->base.bind_addr.address_list.next, > + struct sctp_sockaddr_entry, list); > + arg.laddr = &addr->a; > + arg.paddr = &t->ipaddr; > + arg.net = sock_net(t->asoc->base.sk); > + > +reinsert: > + if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg, > + &t->node, sctp_hash_params) == -EBUSY) > + goto reinsert; > +} This is the nasty situation I mentioned in previous email. It seems that a stress test can trigger a double rehash and cause an entry to not be added. This is in fact very near some bugs you caught on rhashtable in the past few days/couple of weeks tops. I'm actually against this loop as is. I may have not been clear with Xin about not adding my signature to the patchset due to this. Please take a look at Xin's emails on thread 'rhashtable: Prevent spurious EBUSY errors on insertion' about this particular situation. Cc'ing Herbert as he wanted to see the patches for that issue. Marcelo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2015-12-30 17:41 ` Marcelo Ricardo Leitner @ 2016-01-05 10:10 ` Xin Long 2016-01-11 9:22 ` Herbert Xu 0 siblings, 1 reply; 40+ messages in thread From: Xin Long @ 2016-01-05 10:10 UTC (permalink / raw) To: Marcelo Ricardo Leitner Cc: Eric Dumazet, network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem, Herbert Xu [-- Attachment #1: Type: text/plain, Size: 1733 bytes --] On Thu, Dec 31, 2015 at 1:41 AM, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote: > On Wed, Dec 30, 2015 at 11:50:46PM +0800, Xin Long wrote: > ... >> +void sctp_hash_transport(struct sctp_transport *t) >> +{ >> + struct sctp_sockaddr_entry *addr; >> + struct sctp_hash_cmp_arg arg; >> + >> + addr = list_entry(t->asoc->base.bind_addr.address_list.next, >> + struct sctp_sockaddr_entry, list); >> + arg.laddr = &addr->a; >> + arg.paddr = &t->ipaddr; >> + arg.net = sock_net(t->asoc->base.sk); >> + >> +reinsert: >> + if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg, >> + &t->node, sctp_hash_params) == -EBUSY) >> + goto reinsert; >> +} > > This is the nasty situation I mentioned in previous email. It seems that > a stress test can trigger a double rehash and cause an entry to not be > added. > > This is in fact very near some bugs you caught on rhashtable in the past > few days/couple of weeks tops. > > I'm actually against this loop as is. I may have not been clear with Xin > about not adding my signature to the patchset due to this. > > Please take a look at Xin's emails on thread 'rhashtable: Prevent > spurious EBUSY errors on insertion' about this particular situation. > Cc'ing Herbert as he wanted to see the patches for that issue. > > Marcelo > without this 'reinsert'. we can reproduce this issue like this: 1. download the attachment 2. cp sctperf.tar.gz to server and client hosts 3. in each hosts. #make 4. in server: #sh saddr.sh $ethx #./ss 5. in client: #sh caddr.sh $ethx #ulimit -n 20000 #./cc when the number of connections reach about 1600, this issue will be triggered. [-- Attachment #2: sctperf.tar.gz --] [-- Type: application/x-gzip, Size: 1932 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-05 10:10 ` Xin Long @ 2016-01-11 9:22 ` Herbert Xu 0 siblings, 0 replies; 40+ messages in thread From: Herbert Xu @ 2016-01-11 9:22 UTC (permalink / raw) To: Xin Long Cc: Marcelo Ricardo Leitner, Eric Dumazet, network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem On Tue, Jan 05, 2016 at 06:10:15PM +0800, Xin Long wrote: > > without this 'reinsert'. > > we can reproduce this issue like this: > 1. download the attachment > 2. cp sctperf.tar.gz to server and client hosts > 3. in each hosts. > #make > 4. in server: > #sh saddr.sh $ethx > #./ss > 5. in client: > #sh caddr.sh $ethx > #ulimit -n 20000 > #./cc > > when the number of connections reach about 1600, this issue will be triggered. Just to confirm this is with the latest upstream kernel, right? A git ID would be helpful. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long ` (2 preceding siblings ...) 2015-12-30 17:41 ` Marcelo Ricardo Leitner @ 2016-01-05 18:38 ` Vlad Yasevich 2016-01-06 17:01 ` Xin Long 2016-01-11 9:30 ` Herbert Xu 4 siblings, 1 reply; 40+ messages in thread From: Vlad Yasevich @ 2016-01-05 18:38 UTC (permalink / raw) To: Xin Long, network dev, linux-sctp; +Cc: mleitner, vyasevic, daniel, davem On 12/30/2015 10:50 AM, Xin Long wrote: > tranport hashtbale will replace the association hashtable to do the > lookup for transport, and then get association by t->assoc, rhashtable > apis will be used because of it's resizable, scalable and using rcu. > > lport + rport + paddr will be the base hashkey to locate the chain, > with net to protect one netns from another, then plus the laddr to > compare to get the target. > > this patch will provider the lookup functions: > - sctp_epaddr_lookup_transport > - sctp_addrs_lookup_transport > > hash/unhash functions: > - sctp_hash_transport > - sctp_unhash_transport > > init/destroy functions: > - sctp_transport_hashtable_init > - sctp_transport_hashtable_destroy > > Signed-off-by: Xin Long <lucien.xin@gmail.com> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > --- > include/net/sctp/sctp.h | 11 ++++ > include/net/sctp/structs.h | 5 ++ > net/sctp/input.c | 131 +++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 147 insertions(+) > > diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h > index ce13cf2..7bbdfba 100644 > --- a/include/net/sctp/sctp.h > +++ b/include/net/sctp/sctp.h > @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk, > struct sctp_transport *t); > void sctp_backlog_migrate(struct sctp_association *assoc, > struct sock *oldsk, struct sock *newsk); > +int sctp_transport_hashtable_init(void); > +void sctp_transport_hashtable_destroy(void); > +void sctp_hash_transport(struct sctp_transport *t); > +void sctp_unhash_transport(struct sctp_transport *t); > +struct sctp_transport *sctp_addrs_lookup_transport( > + struct net *net, > + const union sctp_addr *laddr, > + const union sctp_addr *paddr); > +struct sctp_transport *sctp_epaddr_lookup_transport( > + const struct sctp_endpoint *ep, > + const union sctp_addr *paddr); > > /* > * sctp/proc.c > diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h > index eea9bde..4ab87d0 100644 > --- a/include/net/sctp/structs.h > +++ b/include/net/sctp/structs.h > @@ -48,6 +48,7 @@ > #define __sctp_structs_h__ > > #include <linux/ktime.h> > +#include <linux/rhashtable.h> > #include <linux/socket.h> /* linux/in.h needs this!! */ > #include <linux/in.h> /* We get struct sockaddr_in. */ > #include <linux/in6.h> /* We get struct in6_addr */ > @@ -123,6 +124,8 @@ extern struct sctp_globals { > struct sctp_hashbucket *assoc_hashtable; > /* This is the sctp port control hash. */ > struct sctp_bind_hashbucket *port_hashtable; > + /* This is the hash of all transports. */ > + struct rhashtable transport_hashtable; > > /* Sizes of above hashtables. */ > int ep_hashsize; > @@ -147,6 +150,7 @@ extern struct sctp_globals { > #define sctp_assoc_hashtable (sctp_globals.assoc_hashtable) > #define sctp_port_hashsize (sctp_globals.port_hashsize) > #define sctp_port_hashtable (sctp_globals.port_hashtable) > +#define sctp_transport_hashtable (sctp_globals.transport_hashtable) > #define sctp_checksum_disable (sctp_globals.checksum_disable) > > /* SCTP Socket type: UDP or TCP style. */ > @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet) > struct sctp_transport { > /* A list of transports. */ > struct list_head transports; > + struct rhash_head node; > > /* Reference counting. */ > atomic_t refcnt; > diff --git a/net/sctp/input.c b/net/sctp/input.c > index b6493b3..bac8278 100644 > --- a/net/sctp/input.c > +++ b/net/sctp/input.c > @@ -782,6 +782,137 @@ hit: > return ep; > } > > +/* rhashtable for transport */ > +struct sctp_hash_cmp_arg { > + const union sctp_addr *laddr; > + const union sctp_addr *paddr; > + const struct net *net; > +}; > + > +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, > + const void *ptr) > +{ > + const struct sctp_hash_cmp_arg *x = arg->key; > + const struct sctp_transport *t = ptr; > + struct sctp_association *asoc = t->asoc; > + const struct net *net = x->net; > + > + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) > + return 1; > + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) > + return 1; > + if (!net_eq(sock_net(asoc->base.sk), net)) > + return 1; > + if (!sctp_bind_addr_match(&asoc->base.bind_addr, > + x->laddr, sctp_sk(asoc->base.sk))) > + return 1; > + > + return 0; > +} > + > +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) > +{ > + const struct sctp_transport *t = data; > + const union sctp_addr *paddr = &t->ipaddr; > + const struct net *net = sock_net(t->asoc->base.sk); > + u16 lport = htons(t->asoc->base.bind_addr.port); > + u32 addr; > + > + if (paddr->sa.sa_family == AF_INET6) > + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > + else > + addr = paddr->v4.sin_addr.s_addr; > + > + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > + (__force __u32)lport, net_hash_mix(net), seed); > +} > + > +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) > +{ > + const struct sctp_hash_cmp_arg *x = data; > + const union sctp_addr *paddr = x->paddr; > + const struct net *net = x->net; > + u16 lport = x->laddr->v4.sin_port; > + u32 addr; > + > + if (paddr->sa.sa_family == AF_INET6) > + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > + else > + addr = paddr->v4.sin_addr.s_addr; > + > + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > + (__force __u32)lport, net_hash_mix(net), seed); > +} > + > +static const struct rhashtable_params sctp_hash_params = { > + .head_offset = offsetof(struct sctp_transport, node), > + .hashfn = sctp_hash_key, > + .obj_hashfn = sctp_hash_obj, > + .obj_cmpfn = sctp_hash_cmp, > + .automatic_shrinking = true, > +}; > + > +int sctp_transport_hashtable_init(void) > +{ > + return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params); > +} > + > +void sctp_transport_hashtable_destroy(void) > +{ > + rhashtable_destroy(&sctp_transport_hashtable); > +} > + > +void sctp_hash_transport(struct sctp_transport *t) > +{ > + struct sctp_sockaddr_entry *addr; > + struct sctp_hash_cmp_arg arg; > + > + addr = list_entry(t->asoc->base.bind_addr.address_list.next, > + struct sctp_sockaddr_entry, list); > + arg.laddr = &addr->a; > + arg.paddr = &t->ipaddr; > + arg.net = sock_net(t->asoc->base.sk); > + > +reinsert: > + if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg, > + &t->node, sctp_hash_params) == -EBUSY) > + goto reinsert; > +} > + > +void sctp_unhash_transport(struct sctp_transport *t) > +{ > + rhashtable_remove_fast(&sctp_transport_hashtable, &t->node, > + sctp_hash_params); > +} > + > +struct sctp_transport *sctp_addrs_lookup_transport( > + struct net *net, > + const union sctp_addr *laddr, > + const union sctp_addr *paddr) > +{ > + struct sctp_hash_cmp_arg arg = { > + .laddr = laddr, > + .paddr = paddr, > + .net = net, > + }; > + > + return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg, > + sctp_hash_params); > +} > + > +struct sctp_transport *sctp_epaddr_lookup_transport( > + const struct sctp_endpoint *ep, > + const union sctp_addr *paddr) > +{ > + struct sctp_sockaddr_entry *addr; > + struct net *net = sock_net(ep->base.sk); > + > + addr = list_entry(ep->base.bind_addr.address_list.next, > + struct sctp_sockaddr_entry, list); > + > + return sctp_addrs_lookup_transport(net, &addr->a, paddr); > +} > + I don't think that this right, mainly because not all endpoint addresses will be copied to association bind_addr list. As a result, you may actually have an association on this endpoint to a given peer, but may not be using the first address from the endpoint.. What might work is to pick an endpoint address that would be usable within the scope of the peer address. -vlad > /* Insert association into the hash table. */ > static void __sctp_hash_established(struct sctp_association *asoc) > { > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-05 18:38 ` Vlad Yasevich @ 2016-01-06 17:01 ` Xin Long 2016-01-06 18:19 ` Marcelo Ricardo Leitner 2016-01-07 20:28 ` Vlad Yasevich 0 siblings, 2 replies; 40+ messages in thread From: Xin Long @ 2016-01-06 17:01 UTC (permalink / raw) To: Vlad Yasevich Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem On Wed, Jan 6, 2016 at 2:38 AM, Vlad Yasevich <vyasevich@gmail.com> wrote: > On 12/30/2015 10:50 AM, Xin Long wrote: >> tranport hashtbale will replace the association hashtable to do the >> lookup for transport, and then get association by t->assoc, rhashtable >> apis will be used because of it's resizable, scalable and using rcu. >> >> lport + rport + paddr will be the base hashkey to locate the chain, >> with net to protect one netns from another, then plus the laddr to >> compare to get the target. >> >> this patch will provider the lookup functions: >> - sctp_epaddr_lookup_transport >> - sctp_addrs_lookup_transport >> >> hash/unhash functions: >> - sctp_hash_transport >> - sctp_unhash_transport >> >> init/destroy functions: >> - sctp_transport_hashtable_init >> - sctp_transport_hashtable_destroy >> >> Signed-off-by: Xin Long <lucien.xin@gmail.com> >> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >> --- >> include/net/sctp/sctp.h | 11 ++++ >> include/net/sctp/structs.h | 5 ++ >> net/sctp/input.c | 131 +++++++++++++++++++++++++++++++++++++++++++++ >> 3 files changed, 147 insertions(+) >> >> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h >> index ce13cf2..7bbdfba 100644 >> --- a/include/net/sctp/sctp.h >> +++ b/include/net/sctp/sctp.h >> @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk, >> struct sctp_transport *t); >> void sctp_backlog_migrate(struct sctp_association *assoc, >> struct sock *oldsk, struct sock *newsk); >> +int sctp_transport_hashtable_init(void); >> +void sctp_transport_hashtable_destroy(void); >> +void sctp_hash_transport(struct sctp_transport *t); >> +void sctp_unhash_transport(struct sctp_transport *t); >> +struct sctp_transport *sctp_addrs_lookup_transport( >> + struct net *net, >> + const union sctp_addr *laddr, >> + const union sctp_addr *paddr); >> +struct sctp_transport *sctp_epaddr_lookup_transport( >> + const struct sctp_endpoint *ep, >> + const union sctp_addr *paddr); >> >> /* >> * sctp/proc.c >> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h >> index eea9bde..4ab87d0 100644 >> --- a/include/net/sctp/structs.h >> +++ b/include/net/sctp/structs.h >> @@ -48,6 +48,7 @@ >> #define __sctp_structs_h__ >> >> #include <linux/ktime.h> >> +#include <linux/rhashtable.h> >> #include <linux/socket.h> /* linux/in.h needs this!! */ >> #include <linux/in.h> /* We get struct sockaddr_in. */ >> #include <linux/in6.h> /* We get struct in6_addr */ >> @@ -123,6 +124,8 @@ extern struct sctp_globals { >> struct sctp_hashbucket *assoc_hashtable; >> /* This is the sctp port control hash. */ >> struct sctp_bind_hashbucket *port_hashtable; >> + /* This is the hash of all transports. */ >> + struct rhashtable transport_hashtable; >> >> /* Sizes of above hashtables. */ >> int ep_hashsize; >> @@ -147,6 +150,7 @@ extern struct sctp_globals { >> #define sctp_assoc_hashtable (sctp_globals.assoc_hashtable) >> #define sctp_port_hashsize (sctp_globals.port_hashsize) >> #define sctp_port_hashtable (sctp_globals.port_hashtable) >> +#define sctp_transport_hashtable (sctp_globals.transport_hashtable) >> #define sctp_checksum_disable (sctp_globals.checksum_disable) >> >> /* SCTP Socket type: UDP or TCP style. */ >> @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet) >> struct sctp_transport { >> /* A list of transports. */ >> struct list_head transports; >> + struct rhash_head node; >> >> /* Reference counting. */ >> atomic_t refcnt; >> diff --git a/net/sctp/input.c b/net/sctp/input.c >> index b6493b3..bac8278 100644 >> --- a/net/sctp/input.c >> +++ b/net/sctp/input.c >> @@ -782,6 +782,137 @@ hit: >> return ep; >> } >> >> +/* rhashtable for transport */ >> +struct sctp_hash_cmp_arg { >> + const union sctp_addr *laddr; >> + const union sctp_addr *paddr; >> + const struct net *net; >> +}; >> + >> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, >> + const void *ptr) >> +{ >> + const struct sctp_hash_cmp_arg *x = arg->key; >> + const struct sctp_transport *t = ptr; >> + struct sctp_association *asoc = t->asoc; >> + const struct net *net = x->net; >> + >> + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) >> + return 1; >> + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) >> + return 1; >> + if (!net_eq(sock_net(asoc->base.sk), net)) >> + return 1; >> + if (!sctp_bind_addr_match(&asoc->base.bind_addr, >> + x->laddr, sctp_sk(asoc->base.sk))) >> + return 1; >> + >> + return 0; >> +} >> + >> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) >> +{ >> + const struct sctp_transport *t = data; >> + const union sctp_addr *paddr = &t->ipaddr; >> + const struct net *net = sock_net(t->asoc->base.sk); >> + u16 lport = htons(t->asoc->base.bind_addr.port); >> + u32 addr; >> + >> + if (paddr->sa.sa_family == AF_INET6) >> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); >> + else >> + addr = paddr->v4.sin_addr.s_addr; >> + >> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | >> + (__force __u32)lport, net_hash_mix(net), seed); >> +} >> + >> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) >> +{ >> + const struct sctp_hash_cmp_arg *x = data; >> + const union sctp_addr *paddr = x->paddr; >> + const struct net *net = x->net; >> + u16 lport = x->laddr->v4.sin_port; >> + u32 addr; >> + >> + if (paddr->sa.sa_family == AF_INET6) >> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); >> + else >> + addr = paddr->v4.sin_addr.s_addr; >> + >> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | >> + (__force __u32)lport, net_hash_mix(net), seed); >> +} >> + >> +static const struct rhashtable_params sctp_hash_params = { >> + .head_offset = offsetof(struct sctp_transport, node), >> + .hashfn = sctp_hash_key, >> + .obj_hashfn = sctp_hash_obj, >> + .obj_cmpfn = sctp_hash_cmp, >> + .automatic_shrinking = true, >> +}; >> + >> +int sctp_transport_hashtable_init(void) >> +{ >> + return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params); >> +} >> + >> +void sctp_transport_hashtable_destroy(void) >> +{ >> + rhashtable_destroy(&sctp_transport_hashtable); >> +} >> + >> +void sctp_hash_transport(struct sctp_transport *t) >> +{ >> + struct sctp_sockaddr_entry *addr; >> + struct sctp_hash_cmp_arg arg; >> + >> + addr = list_entry(t->asoc->base.bind_addr.address_list.next, >> + struct sctp_sockaddr_entry, list); >> + arg.laddr = &addr->a; >> + arg.paddr = &t->ipaddr; >> + arg.net = sock_net(t->asoc->base.sk); >> + >> +reinsert: >> + if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg, >> + &t->node, sctp_hash_params) == -EBUSY) >> + goto reinsert; >> +} >> + >> +void sctp_unhash_transport(struct sctp_transport *t) >> +{ >> + rhashtable_remove_fast(&sctp_transport_hashtable, &t->node, >> + sctp_hash_params); >> +} >> + >> +struct sctp_transport *sctp_addrs_lookup_transport( >> + struct net *net, >> + const union sctp_addr *laddr, >> + const union sctp_addr *paddr) >> +{ >> + struct sctp_hash_cmp_arg arg = { >> + .laddr = laddr, >> + .paddr = paddr, >> + .net = net, >> + }; >> + >> + return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg, >> + sctp_hash_params); >> +} >> + >> +struct sctp_transport *sctp_epaddr_lookup_transport( >> + const struct sctp_endpoint *ep, >> + const union sctp_addr *paddr) >> +{ >> + struct sctp_sockaddr_entry *addr; >> + struct net *net = sock_net(ep->base.sk); >> + >> + addr = list_entry(ep->base.bind_addr.address_list.next, >> + struct sctp_sockaddr_entry, list); >> + >> + return sctp_addrs_lookup_transport(net, &addr->a, paddr); >> +} >> + > > I don't think that this right, mainly because not all endpoint > addresses will be copied to association bind_addr list. As a result, > you may actually have an association on this endpoint to a given > peer, but may not be using the first address from the endpoint.. > > What might work is to pick an endpoint address that would be usable within > the scope of the peer address. it's not that easy, does it make sense to you if I change if (!sctp_bind_addr_match(&asoc->base.bind_addr, x->laddr, sctp_sk(asoc->base.sk))) to if (!sctp_bind_addr_match(&asoc->ep->base.bind_addr, x->laddr, sctp_sk(asoc->base.sk))) in sctp_hash_cmp() ? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-06 17:01 ` Xin Long @ 2016-01-06 18:19 ` Marcelo Ricardo Leitner 2016-01-07 17:23 ` Marcelo Ricardo Leitner 2016-01-07 20:28 ` Vlad Yasevich 1 sibling, 1 reply; 40+ messages in thread From: Marcelo Ricardo Leitner @ 2016-01-06 18:19 UTC (permalink / raw) To: Xin Long Cc: Vlad Yasevich, network dev, linux-sctp, Vlad Yasevich, daniel, davem On Thu, Jan 07, 2016 at 01:01:11AM +0800, Xin Long wrote: > On Wed, Jan 6, 2016 at 2:38 AM, Vlad Yasevich <vyasevich@gmail.com> wrote: > > On 12/30/2015 10:50 AM, Xin Long wrote: > >> tranport hashtbale will replace the association hashtable to do the > >> lookup for transport, and then get association by t->assoc, rhashtable > >> apis will be used because of it's resizable, scalable and using rcu. > >> > >> lport + rport + paddr will be the base hashkey to locate the chain, > >> with net to protect one netns from another, then plus the laddr to > >> compare to get the target. > >> > >> this patch will provider the lookup functions: > >> - sctp_epaddr_lookup_transport > >> - sctp_addrs_lookup_transport > >> > >> hash/unhash functions: > >> - sctp_hash_transport > >> - sctp_unhash_transport > >> > >> init/destroy functions: > >> - sctp_transport_hashtable_init > >> - sctp_transport_hashtable_destroy > >> > >> Signed-off-by: Xin Long <lucien.xin@gmail.com> > >> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > >> --- > >> include/net/sctp/sctp.h | 11 ++++ > >> include/net/sctp/structs.h | 5 ++ > >> net/sctp/input.c | 131 +++++++++++++++++++++++++++++++++++++++++++++ > >> 3 files changed, 147 insertions(+) > >> > >> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h > >> index ce13cf2..7bbdfba 100644 > >> --- a/include/net/sctp/sctp.h > >> +++ b/include/net/sctp/sctp.h > >> @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk, > >> struct sctp_transport *t); > >> void sctp_backlog_migrate(struct sctp_association *assoc, > >> struct sock *oldsk, struct sock *newsk); > >> +int sctp_transport_hashtable_init(void); > >> +void sctp_transport_hashtable_destroy(void); > >> +void sctp_hash_transport(struct sctp_transport *t); > >> +void sctp_unhash_transport(struct sctp_transport *t); > >> +struct sctp_transport *sctp_addrs_lookup_transport( > >> + struct net *net, > >> + const union sctp_addr *laddr, > >> + const union sctp_addr *paddr); > >> +struct sctp_transport *sctp_epaddr_lookup_transport( > >> + const struct sctp_endpoint *ep, > >> + const union sctp_addr *paddr); > >> > >> /* > >> * sctp/proc.c > >> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h > >> index eea9bde..4ab87d0 100644 > >> --- a/include/net/sctp/structs.h > >> +++ b/include/net/sctp/structs.h > >> @@ -48,6 +48,7 @@ > >> #define __sctp_structs_h__ > >> > >> #include <linux/ktime.h> > >> +#include <linux/rhashtable.h> > >> #include <linux/socket.h> /* linux/in.h needs this!! */ > >> #include <linux/in.h> /* We get struct sockaddr_in. */ > >> #include <linux/in6.h> /* We get struct in6_addr */ > >> @@ -123,6 +124,8 @@ extern struct sctp_globals { > >> struct sctp_hashbucket *assoc_hashtable; > >> /* This is the sctp port control hash. */ > >> struct sctp_bind_hashbucket *port_hashtable; > >> + /* This is the hash of all transports. */ > >> + struct rhashtable transport_hashtable; > >> > >> /* Sizes of above hashtables. */ > >> int ep_hashsize; > >> @@ -147,6 +150,7 @@ extern struct sctp_globals { > >> #define sctp_assoc_hashtable (sctp_globals.assoc_hashtable) > >> #define sctp_port_hashsize (sctp_globals.port_hashsize) > >> #define sctp_port_hashtable (sctp_globals.port_hashtable) > >> +#define sctp_transport_hashtable (sctp_globals.transport_hashtable) > >> #define sctp_checksum_disable (sctp_globals.checksum_disable) > >> > >> /* SCTP Socket type: UDP or TCP style. */ > >> @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet) > >> struct sctp_transport { > >> /* A list of transports. */ > >> struct list_head transports; > >> + struct rhash_head node; > >> > >> /* Reference counting. */ > >> atomic_t refcnt; > >> diff --git a/net/sctp/input.c b/net/sctp/input.c > >> index b6493b3..bac8278 100644 > >> --- a/net/sctp/input.c > >> +++ b/net/sctp/input.c > >> @@ -782,6 +782,137 @@ hit: > >> return ep; > >> } > >> > >> +/* rhashtable for transport */ > >> +struct sctp_hash_cmp_arg { > >> + const union sctp_addr *laddr; > >> + const union sctp_addr *paddr; > >> + const struct net *net; > >> +}; > >> + > >> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, > >> + const void *ptr) > >> +{ > >> + const struct sctp_hash_cmp_arg *x = arg->key; > >> + const struct sctp_transport *t = ptr; > >> + struct sctp_association *asoc = t->asoc; > >> + const struct net *net = x->net; > >> + > >> + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) > >> + return 1; > >> + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) > >> + return 1; > >> + if (!net_eq(sock_net(asoc->base.sk), net)) > >> + return 1; > >> + if (!sctp_bind_addr_match(&asoc->base.bind_addr, > >> + x->laddr, sctp_sk(asoc->base.sk))) > >> + return 1; > >> + > >> + return 0; > >> +} > >> + > >> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) > >> +{ > >> + const struct sctp_transport *t = data; > >> + const union sctp_addr *paddr = &t->ipaddr; > >> + const struct net *net = sock_net(t->asoc->base.sk); > >> + u16 lport = htons(t->asoc->base.bind_addr.port); > >> + u32 addr; > >> + > >> + if (paddr->sa.sa_family == AF_INET6) > >> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > >> + else > >> + addr = paddr->v4.sin_addr.s_addr; > >> + > >> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > >> + (__force __u32)lport, net_hash_mix(net), seed); > >> +} > >> + > >> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) > >> +{ > >> + const struct sctp_hash_cmp_arg *x = data; > >> + const union sctp_addr *paddr = x->paddr; > >> + const struct net *net = x->net; > >> + u16 lport = x->laddr->v4.sin_port; > >> + u32 addr; > >> + > >> + if (paddr->sa.sa_family == AF_INET6) > >> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > >> + else > >> + addr = paddr->v4.sin_addr.s_addr; > >> + > >> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > >> + (__force __u32)lport, net_hash_mix(net), seed); > >> +} > >> + > >> +static const struct rhashtable_params sctp_hash_params = { > >> + .head_offset = offsetof(struct sctp_transport, node), > >> + .hashfn = sctp_hash_key, > >> + .obj_hashfn = sctp_hash_obj, > >> + .obj_cmpfn = sctp_hash_cmp, > >> + .automatic_shrinking = true, > >> +}; > >> + > >> +int sctp_transport_hashtable_init(void) > >> +{ > >> + return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params); > >> +} > >> + > >> +void sctp_transport_hashtable_destroy(void) > >> +{ > >> + rhashtable_destroy(&sctp_transport_hashtable); > >> +} > >> + > >> +void sctp_hash_transport(struct sctp_transport *t) > >> +{ > >> + struct sctp_sockaddr_entry *addr; > >> + struct sctp_hash_cmp_arg arg; > >> + > >> + addr = list_entry(t->asoc->base.bind_addr.address_list.next, > >> + struct sctp_sockaddr_entry, list); > >> + arg.laddr = &addr->a; > >> + arg.paddr = &t->ipaddr; > >> + arg.net = sock_net(t->asoc->base.sk); > >> + > >> +reinsert: > >> + if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg, > >> + &t->node, sctp_hash_params) == -EBUSY) > >> + goto reinsert; > >> +} > >> + > >> +void sctp_unhash_transport(struct sctp_transport *t) > >> +{ > >> + rhashtable_remove_fast(&sctp_transport_hashtable, &t->node, > >> + sctp_hash_params); > >> +} > >> + > >> +struct sctp_transport *sctp_addrs_lookup_transport( > >> + struct net *net, > >> + const union sctp_addr *laddr, > >> + const union sctp_addr *paddr) > >> +{ > >> + struct sctp_hash_cmp_arg arg = { > >> + .laddr = laddr, > >> + .paddr = paddr, > >> + .net = net, > >> + }; > >> + > >> + return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg, > >> + sctp_hash_params); > >> +} > >> + > >> +struct sctp_transport *sctp_epaddr_lookup_transport( > >> + const struct sctp_endpoint *ep, > >> + const union sctp_addr *paddr) > >> +{ > >> + struct sctp_sockaddr_entry *addr; > >> + struct net *net = sock_net(ep->base.sk); > >> + > >> + addr = list_entry(ep->base.bind_addr.address_list.next, > >> + struct sctp_sockaddr_entry, list); > >> + > >> + return sctp_addrs_lookup_transport(net, &addr->a, paddr); > >> +} > >> + > > > > I don't think that this right, mainly because not all endpoint > > addresses will be copied to association bind_addr list. As a result, > > you may actually have an association on this endpoint to a given > > peer, but may not be using the first address from the endpoint.. That shouldn't be a problem, as laddr is not considered in the hash function and later on sctp_hash_cmp() will use sctp_bind_addr_match() as below, which will traverse asoc->base.bind_addr in this case, meaning it would still see all transport/asoc possibilities. Or did I miss something? Marcelo > > What might work is to pick an endpoint address that would be usable within > > the scope of the peer address. > it's not that easy, does it make sense to you if I change > > if (!sctp_bind_addr_match(&asoc->base.bind_addr, > x->laddr, sctp_sk(asoc->base.sk))) > > to > if (!sctp_bind_addr_match(&asoc->ep->base.bind_addr, > x->laddr, sctp_sk(asoc->base.sk))) > > in sctp_hash_cmp() ? > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-06 18:19 ` Marcelo Ricardo Leitner @ 2016-01-07 17:23 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 40+ messages in thread From: Marcelo Ricardo Leitner @ 2016-01-07 17:23 UTC (permalink / raw) To: Vlad Yasevich Cc: Xin Long, network dev, linux-sctp, Vlad Yasevich, daniel, davem On Wed, Jan 06, 2016 at 04:19:50PM -0200, Marcelo Ricardo Leitner wrote: > On Thu, Jan 07, 2016 at 01:01:11AM +0800, Xin Long wrote: > > On Wed, Jan 6, 2016 at 2:38 AM, Vlad Yasevich <vyasevich@gmail.com> wrote: > > > On 12/30/2015 10:50 AM, Xin Long wrote: > > >> tranport hashtbale will replace the association hashtable to do the > > >> lookup for transport, and then get association by t->assoc, rhashtable > > >> apis will be used because of it's resizable, scalable and using rcu. > > >> > > >> lport + rport + paddr will be the base hashkey to locate the chain, > > >> with net to protect one netns from another, then plus the laddr to > > >> compare to get the target. > > >> > > >> this patch will provider the lookup functions: > > >> - sctp_epaddr_lookup_transport > > >> - sctp_addrs_lookup_transport > > >> > > >> hash/unhash functions: > > >> - sctp_hash_transport > > >> - sctp_unhash_transport > > >> > > >> init/destroy functions: > > >> - sctp_transport_hashtable_init > > >> - sctp_transport_hashtable_destroy > > >> > > >> Signed-off-by: Xin Long <lucien.xin@gmail.com> > > >> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > > >> --- > > >> include/net/sctp/sctp.h | 11 ++++ > > >> include/net/sctp/structs.h | 5 ++ > > >> net/sctp/input.c | 131 +++++++++++++++++++++++++++++++++++++++++++++ > > >> 3 files changed, 147 insertions(+) > > >> > > >> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h > > >> index ce13cf2..7bbdfba 100644 > > >> --- a/include/net/sctp/sctp.h > > >> +++ b/include/net/sctp/sctp.h > > >> @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk, > > >> struct sctp_transport *t); > > >> void sctp_backlog_migrate(struct sctp_association *assoc, > > >> struct sock *oldsk, struct sock *newsk); > > >> +int sctp_transport_hashtable_init(void); > > >> +void sctp_transport_hashtable_destroy(void); > > >> +void sctp_hash_transport(struct sctp_transport *t); > > >> +void sctp_unhash_transport(struct sctp_transport *t); > > >> +struct sctp_transport *sctp_addrs_lookup_transport( > > >> + struct net *net, > > >> + const union sctp_addr *laddr, > > >> + const union sctp_addr *paddr); > > >> +struct sctp_transport *sctp_epaddr_lookup_transport( > > >> + const struct sctp_endpoint *ep, > > >> + const union sctp_addr *paddr); > > >> > > >> /* > > >> * sctp/proc.c > > >> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h > > >> index eea9bde..4ab87d0 100644 > > >> --- a/include/net/sctp/structs.h > > >> +++ b/include/net/sctp/structs.h > > >> @@ -48,6 +48,7 @@ > > >> #define __sctp_structs_h__ > > >> > > >> #include <linux/ktime.h> > > >> +#include <linux/rhashtable.h> > > >> #include <linux/socket.h> /* linux/in.h needs this!! */ > > >> #include <linux/in.h> /* We get struct sockaddr_in. */ > > >> #include <linux/in6.h> /* We get struct in6_addr */ > > >> @@ -123,6 +124,8 @@ extern struct sctp_globals { > > >> struct sctp_hashbucket *assoc_hashtable; > > >> /* This is the sctp port control hash. */ > > >> struct sctp_bind_hashbucket *port_hashtable; > > >> + /* This is the hash of all transports. */ > > >> + struct rhashtable transport_hashtable; > > >> > > >> /* Sizes of above hashtables. */ > > >> int ep_hashsize; > > >> @@ -147,6 +150,7 @@ extern struct sctp_globals { > > >> #define sctp_assoc_hashtable (sctp_globals.assoc_hashtable) > > >> #define sctp_port_hashsize (sctp_globals.port_hashsize) > > >> #define sctp_port_hashtable (sctp_globals.port_hashtable) > > >> +#define sctp_transport_hashtable (sctp_globals.transport_hashtable) > > >> #define sctp_checksum_disable (sctp_globals.checksum_disable) > > >> > > >> /* SCTP Socket type: UDP or TCP style. */ > > >> @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet) > > >> struct sctp_transport { > > >> /* A list of transports. */ > > >> struct list_head transports; > > >> + struct rhash_head node; > > >> > > >> /* Reference counting. */ > > >> atomic_t refcnt; > > >> diff --git a/net/sctp/input.c b/net/sctp/input.c > > >> index b6493b3..bac8278 100644 > > >> --- a/net/sctp/input.c > > >> +++ b/net/sctp/input.c > > >> @@ -782,6 +782,137 @@ hit: > > >> return ep; > > >> } > > >> > > >> +/* rhashtable for transport */ > > >> +struct sctp_hash_cmp_arg { > > >> + const union sctp_addr *laddr; > > >> + const union sctp_addr *paddr; > > >> + const struct net *net; > > >> +}; > > >> + > > >> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, > > >> + const void *ptr) > > >> +{ > > >> + const struct sctp_hash_cmp_arg *x = arg->key; > > >> + const struct sctp_transport *t = ptr; > > >> + struct sctp_association *asoc = t->asoc; > > >> + const struct net *net = x->net; > > >> + > > >> + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) > > >> + return 1; > > >> + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) > > >> + return 1; > > >> + if (!net_eq(sock_net(asoc->base.sk), net)) > > >> + return 1; > > >> + if (!sctp_bind_addr_match(&asoc->base.bind_addr, > > >> + x->laddr, sctp_sk(asoc->base.sk))) > > >> + return 1; > > >> + > > >> + return 0; > > >> +} > > >> + > > >> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) > > >> +{ > > >> + const struct sctp_transport *t = data; > > >> + const union sctp_addr *paddr = &t->ipaddr; > > >> + const struct net *net = sock_net(t->asoc->base.sk); > > >> + u16 lport = htons(t->asoc->base.bind_addr.port); > > >> + u32 addr; > > >> + > > >> + if (paddr->sa.sa_family == AF_INET6) > > >> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > > >> + else > > >> + addr = paddr->v4.sin_addr.s_addr; > > >> + > > >> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > > >> + (__force __u32)lport, net_hash_mix(net), seed); > > >> +} > > >> + > > >> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) > > >> +{ > > >> + const struct sctp_hash_cmp_arg *x = data; > > >> + const union sctp_addr *paddr = x->paddr; > > >> + const struct net *net = x->net; > > >> + u16 lport = x->laddr->v4.sin_port; > > >> + u32 addr; > > >> + > > >> + if (paddr->sa.sa_family == AF_INET6) > > >> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > > >> + else > > >> + addr = paddr->v4.sin_addr.s_addr; > > >> + > > >> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > > >> + (__force __u32)lport, net_hash_mix(net), seed); > > >> +} > > >> + > > >> +static const struct rhashtable_params sctp_hash_params = { > > >> + .head_offset = offsetof(struct sctp_transport, node), > > >> + .hashfn = sctp_hash_key, > > >> + .obj_hashfn = sctp_hash_obj, > > >> + .obj_cmpfn = sctp_hash_cmp, > > >> + .automatic_shrinking = true, > > >> +}; > > >> + > > >> +int sctp_transport_hashtable_init(void) > > >> +{ > > >> + return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params); > > >> +} > > >> + > > >> +void sctp_transport_hashtable_destroy(void) > > >> +{ > > >> + rhashtable_destroy(&sctp_transport_hashtable); > > >> +} > > >> + > > >> +void sctp_hash_transport(struct sctp_transport *t) > > >> +{ > > >> + struct sctp_sockaddr_entry *addr; > > >> + struct sctp_hash_cmp_arg arg; > > >> + > > >> + addr = list_entry(t->asoc->base.bind_addr.address_list.next, > > >> + struct sctp_sockaddr_entry, list); > > >> + arg.laddr = &addr->a; > > >> + arg.paddr = &t->ipaddr; > > >> + arg.net = sock_net(t->asoc->base.sk); > > >> + > > >> +reinsert: > > >> + if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg, > > >> + &t->node, sctp_hash_params) == -EBUSY) > > >> + goto reinsert; > > >> +} > > >> + > > >> +void sctp_unhash_transport(struct sctp_transport *t) > > >> +{ > > >> + rhashtable_remove_fast(&sctp_transport_hashtable, &t->node, > > >> + sctp_hash_params); > > >> +} > > >> + > > >> +struct sctp_transport *sctp_addrs_lookup_transport( > > >> + struct net *net, > > >> + const union sctp_addr *laddr, > > >> + const union sctp_addr *paddr) > > >> +{ > > >> + struct sctp_hash_cmp_arg arg = { > > >> + .laddr = laddr, > > >> + .paddr = paddr, > > >> + .net = net, > > >> + }; > > >> + > > >> + return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg, > > >> + sctp_hash_params); > > >> +} > > >> + > > >> +struct sctp_transport *sctp_epaddr_lookup_transport( > > >> + const struct sctp_endpoint *ep, > > >> + const union sctp_addr *paddr) > > >> +{ > > >> + struct sctp_sockaddr_entry *addr; > > >> + struct net *net = sock_net(ep->base.sk); > > >> + > > >> + addr = list_entry(ep->base.bind_addr.address_list.next, > > >> + struct sctp_sockaddr_entry, list); > > >> + > > >> + return sctp_addrs_lookup_transport(net, &addr->a, paddr); > > >> +} > > >> + > > > > > > I don't think that this right, mainly because not all endpoint > > > addresses will be copied to association bind_addr list. As a result, > > > you may actually have an association on this endpoint to a given > > > peer, but may not be using the first address from the endpoint.. > > That shouldn't be a problem, as laddr is not considered in the hash > function and later on sctp_hash_cmp() will use sctp_bind_addr_match() as > below, which will traverse asoc->base.bind_addr in this case, meaning it > would still see all transport/asoc possibilities. > > Or did I miss something? I did.. Xin kindly explained the issue to me offline. Yup, this is an issue, we will work on it. Thanks ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-06 17:01 ` Xin Long 2016-01-06 18:19 ` Marcelo Ricardo Leitner @ 2016-01-07 20:28 ` Vlad Yasevich 1 sibling, 0 replies; 40+ messages in thread From: Vlad Yasevich @ 2016-01-07 20:28 UTC (permalink / raw) To: Xin Long Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem On 01/06/2016 12:01 PM, Xin Long wrote: > On Wed, Jan 6, 2016 at 2:38 AM, Vlad Yasevich <vyasevich@gmail.com> wrote: >> On 12/30/2015 10:50 AM, Xin Long wrote: >>> tranport hashtbale will replace the association hashtable to do the >>> lookup for transport, and then get association by t->assoc, rhashtable >>> apis will be used because of it's resizable, scalable and using rcu. >>> >>> lport + rport + paddr will be the base hashkey to locate the chain, >>> with net to protect one netns from another, then plus the laddr to >>> compare to get the target. >>> >>> this patch will provider the lookup functions: >>> - sctp_epaddr_lookup_transport >>> - sctp_addrs_lookup_transport >>> >>> hash/unhash functions: >>> - sctp_hash_transport >>> - sctp_unhash_transport >>> >>> init/destroy functions: >>> - sctp_transport_hashtable_init >>> - sctp_transport_hashtable_destroy >>> >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >>> --- >>> include/net/sctp/sctp.h | 11 ++++ >>> include/net/sctp/structs.h | 5 ++ >>> net/sctp/input.c | 131 +++++++++++++++++++++++++++++++++++++++++++++ >>> 3 files changed, 147 insertions(+) >>> >>> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h >>> index ce13cf2..7bbdfba 100644 >>> --- a/include/net/sctp/sctp.h >>> +++ b/include/net/sctp/sctp.h >>> @@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk, >>> struct sctp_transport *t); >>> void sctp_backlog_migrate(struct sctp_association *assoc, >>> struct sock *oldsk, struct sock *newsk); >>> +int sctp_transport_hashtable_init(void); >>> +void sctp_transport_hashtable_destroy(void); >>> +void sctp_hash_transport(struct sctp_transport *t); >>> +void sctp_unhash_transport(struct sctp_transport *t); >>> +struct sctp_transport *sctp_addrs_lookup_transport( >>> + struct net *net, >>> + const union sctp_addr *laddr, >>> + const union sctp_addr *paddr); >>> +struct sctp_transport *sctp_epaddr_lookup_transport( >>> + const struct sctp_endpoint *ep, >>> + const union sctp_addr *paddr); >>> >>> /* >>> * sctp/proc.c >>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h >>> index eea9bde..4ab87d0 100644 >>> --- a/include/net/sctp/structs.h >>> +++ b/include/net/sctp/structs.h >>> @@ -48,6 +48,7 @@ >>> #define __sctp_structs_h__ >>> >>> #include <linux/ktime.h> >>> +#include <linux/rhashtable.h> >>> #include <linux/socket.h> /* linux/in.h needs this!! */ >>> #include <linux/in.h> /* We get struct sockaddr_in. */ >>> #include <linux/in6.h> /* We get struct in6_addr */ >>> @@ -123,6 +124,8 @@ extern struct sctp_globals { >>> struct sctp_hashbucket *assoc_hashtable; >>> /* This is the sctp port control hash. */ >>> struct sctp_bind_hashbucket *port_hashtable; >>> + /* This is the hash of all transports. */ >>> + struct rhashtable transport_hashtable; >>> >>> /* Sizes of above hashtables. */ >>> int ep_hashsize; >>> @@ -147,6 +150,7 @@ extern struct sctp_globals { >>> #define sctp_assoc_hashtable (sctp_globals.assoc_hashtable) >>> #define sctp_port_hashsize (sctp_globals.port_hashsize) >>> #define sctp_port_hashtable (sctp_globals.port_hashtable) >>> +#define sctp_transport_hashtable (sctp_globals.transport_hashtable) >>> #define sctp_checksum_disable (sctp_globals.checksum_disable) >>> >>> /* SCTP Socket type: UDP or TCP style. */ >>> @@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet *packet) >>> struct sctp_transport { >>> /* A list of transports. */ >>> struct list_head transports; >>> + struct rhash_head node; >>> >>> /* Reference counting. */ >>> atomic_t refcnt; >>> diff --git a/net/sctp/input.c b/net/sctp/input.c >>> index b6493b3..bac8278 100644 >>> --- a/net/sctp/input.c >>> +++ b/net/sctp/input.c >>> @@ -782,6 +782,137 @@ hit: >>> return ep; >>> } >>> >>> +/* rhashtable for transport */ >>> +struct sctp_hash_cmp_arg { >>> + const union sctp_addr *laddr; >>> + const union sctp_addr *paddr; >>> + const struct net *net; >>> +}; >>> + >>> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, >>> + const void *ptr) >>> +{ >>> + const struct sctp_hash_cmp_arg *x = arg->key; >>> + const struct sctp_transport *t = ptr; >>> + struct sctp_association *asoc = t->asoc; >>> + const struct net *net = x->net; >>> + >>> + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) >>> + return 1; >>> + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) >>> + return 1; >>> + if (!net_eq(sock_net(asoc->base.sk), net)) >>> + return 1; >>> + if (!sctp_bind_addr_match(&asoc->base.bind_addr, >>> + x->laddr, sctp_sk(asoc->base.sk))) >>> + return 1; >>> + >>> + return 0; >>> +} >>> + >>> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) >>> +{ >>> + const struct sctp_transport *t = data; >>> + const union sctp_addr *paddr = &t->ipaddr; >>> + const struct net *net = sock_net(t->asoc->base.sk); >>> + u16 lport = htons(t->asoc->base.bind_addr.port); >>> + u32 addr; >>> + >>> + if (paddr->sa.sa_family == AF_INET6) >>> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); >>> + else >>> + addr = paddr->v4.sin_addr.s_addr; >>> + >>> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | >>> + (__force __u32)lport, net_hash_mix(net), seed); >>> +} >>> + >>> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) >>> +{ >>> + const struct sctp_hash_cmp_arg *x = data; >>> + const union sctp_addr *paddr = x->paddr; >>> + const struct net *net = x->net; >>> + u16 lport = x->laddr->v4.sin_port; >>> + u32 addr; >>> + >>> + if (paddr->sa.sa_family == AF_INET6) >>> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); >>> + else >>> + addr = paddr->v4.sin_addr.s_addr; >>> + >>> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | >>> + (__force __u32)lport, net_hash_mix(net), seed); >>> +} >>> + >>> +static const struct rhashtable_params sctp_hash_params = { >>> + .head_offset = offsetof(struct sctp_transport, node), >>> + .hashfn = sctp_hash_key, >>> + .obj_hashfn = sctp_hash_obj, >>> + .obj_cmpfn = sctp_hash_cmp, >>> + .automatic_shrinking = true, >>> +}; >>> + >>> +int sctp_transport_hashtable_init(void) >>> +{ >>> + return rhashtable_init(&sctp_transport_hashtable, &sctp_hash_params); >>> +} >>> + >>> +void sctp_transport_hashtable_destroy(void) >>> +{ >>> + rhashtable_destroy(&sctp_transport_hashtable); >>> +} >>> + >>> +void sctp_hash_transport(struct sctp_transport *t) >>> +{ >>> + struct sctp_sockaddr_entry *addr; >>> + struct sctp_hash_cmp_arg arg; >>> + >>> + addr = list_entry(t->asoc->base.bind_addr.address_list.next, >>> + struct sctp_sockaddr_entry, list); >>> + arg.laddr = &addr->a; >>> + arg.paddr = &t->ipaddr; >>> + arg.net = sock_net(t->asoc->base.sk); >>> + >>> +reinsert: >>> + if (rhashtable_lookup_insert_key(&sctp_transport_hashtable, &arg, >>> + &t->node, sctp_hash_params) == -EBUSY) >>> + goto reinsert; >>> +} >>> + >>> +void sctp_unhash_transport(struct sctp_transport *t) >>> +{ >>> + rhashtable_remove_fast(&sctp_transport_hashtable, &t->node, >>> + sctp_hash_params); >>> +} >>> + >>> +struct sctp_transport *sctp_addrs_lookup_transport( >>> + struct net *net, >>> + const union sctp_addr *laddr, >>> + const union sctp_addr *paddr) >>> +{ >>> + struct sctp_hash_cmp_arg arg = { >>> + .laddr = laddr, >>> + .paddr = paddr, >>> + .net = net, >>> + }; >>> + >>> + return rhashtable_lookup_fast(&sctp_transport_hashtable, &arg, >>> + sctp_hash_params); >>> +} >>> + >>> +struct sctp_transport *sctp_epaddr_lookup_transport( >>> + const struct sctp_endpoint *ep, >>> + const union sctp_addr *paddr) >>> +{ >>> + struct sctp_sockaddr_entry *addr; >>> + struct net *net = sock_net(ep->base.sk); >>> + >>> + addr = list_entry(ep->base.bind_addr.address_list.next, >>> + struct sctp_sockaddr_entry, list); >>> + >>> + return sctp_addrs_lookup_transport(net, &addr->a, paddr); >>> +} >>> + >> >> I don't think that this right, mainly because not all endpoint >> addresses will be copied to association bind_addr list. As a result, >> you may actually have an association on this endpoint to a given >> peer, but may not be using the first address from the endpoint.. >> >> What might work is to pick an endpoint address that would be usable within >> the scope of the peer address. > it's not that easy, does it make sense to you if I change > > if (!sctp_bind_addr_match(&asoc->base.bind_addr, > x->laddr, sctp_sk(asoc->base.sk))) > > to > if (!sctp_bind_addr_match(&asoc->ep->base.bind_addr, > x->laddr, sctp_sk(asoc->base.sk))) > > in sctp_hash_cmp() ? > No, the problem them becomes the accept/peel-off path. The assoc->ep linkage isn't protected in the lookup path and thus can race against accept call moving the assoc from one endpoint to another. This is why we've used the association list as it is under rcu protection. We might be able to get away with making asoc->ep pointer rcu protected... Need to think a bit more about it. -vlad ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long ` (3 preceding siblings ...) 2016-01-05 18:38 ` Vlad Yasevich @ 2016-01-11 9:30 ` Herbert Xu 2016-01-11 16:00 ` mleitner 4 siblings, 1 reply; 40+ messages in thread From: Herbert Xu @ 2016-01-11 9:30 UTC (permalink / raw) To: Xin Long; +Cc: netdev, linux-sctp, mleitner, vyasevic, daniel, davem Xin Long <lucien.xin@gmail.com> wrote: > > +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, > + const void *ptr) > +{ > + const struct sctp_hash_cmp_arg *x = arg->key; > + const struct sctp_transport *t = ptr; > + struct sctp_association *asoc = t->asoc; > + const struct net *net = x->net; > + > + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) > + return 1; > + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) > + return 1; > + if (!net_eq(sock_net(asoc->base.sk), net)) > + return 1; > + if (!sctp_bind_addr_match(&asoc->base.bind_addr, > + x->laddr, sctp_sk(asoc->base.sk))) > + return 1; > + > + return 0; > +} > + > +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) > +{ > + const struct sctp_transport *t = data; > + const union sctp_addr *paddr = &t->ipaddr; > + const struct net *net = sock_net(t->asoc->base.sk); > + u16 lport = htons(t->asoc->base.bind_addr.port); > + u32 addr; > + > + if (paddr->sa.sa_family == AF_INET6) > + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > + else > + addr = paddr->v4.sin_addr.s_addr; > + > + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > + (__force __u32)lport, net_hash_mix(net), seed); > +} > + > +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) > +{ > + const struct sctp_hash_cmp_arg *x = data; > + const union sctp_addr *paddr = x->paddr; > + const struct net *net = x->net; > + u16 lport = x->laddr->v4.sin_port; > + u32 addr; > + > + if (paddr->sa.sa_family == AF_INET6) > + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > + else > + addr = paddr->v4.sin_addr.s_addr; > + > + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > + (__force __u32)lport, net_hash_mix(net), seed); > +} There's your problem. You are allowing multiple objects to hash to the same value. This is unacceptable with rhashtable because we use the hash chain length to determine if we're under attack and need to rehash. This is the reason why you would see EBUSY during insertion. So instead of inserting your objects as is, you should instead hash a list object that then links to all the objects that has the same hash key. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-11 9:30 ` Herbert Xu @ 2016-01-11 16:00 ` mleitner 2016-01-11 17:20 ` Vlad Yasevich 0 siblings, 1 reply; 40+ messages in thread From: mleitner @ 2016-01-11 16:00 UTC (permalink / raw) To: Herbert Xu; +Cc: Xin Long, netdev, linux-sctp, vyasevic, daniel, davem On Mon, Jan 11, 2016 at 05:30:12PM +0800, Herbert Xu wrote: > Xin Long <lucien.xin@gmail.com> wrote: > > > > +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, > > + const void *ptr) > > +{ > > + const struct sctp_hash_cmp_arg *x = arg->key; > > + const struct sctp_transport *t = ptr; > > + struct sctp_association *asoc = t->asoc; > > + const struct net *net = x->net; > > + > > + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) > > + return 1; > > + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) > > + return 1; > > + if (!net_eq(sock_net(asoc->base.sk), net)) > > + return 1; > > + if (!sctp_bind_addr_match(&asoc->base.bind_addr, > > + x->laddr, sctp_sk(asoc->base.sk))) > > + return 1; > > + > > + return 0; > > +} > > + > > +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) > > +{ > > + const struct sctp_transport *t = data; > > + const union sctp_addr *paddr = &t->ipaddr; > > + const struct net *net = sock_net(t->asoc->base.sk); > > + u16 lport = htons(t->asoc->base.bind_addr.port); > > + u32 addr; > > + > > + if (paddr->sa.sa_family == AF_INET6) > > + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > > + else > > + addr = paddr->v4.sin_addr.s_addr; > > + > > + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > > + (__force __u32)lport, net_hash_mix(net), seed); > > +} > > + > > +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) > > +{ > > + const struct sctp_hash_cmp_arg *x = data; > > + const union sctp_addr *paddr = x->paddr; > > + const struct net *net = x->net; > > + u16 lport = x->laddr->v4.sin_port; > > + u32 addr; > > + > > + if (paddr->sa.sa_family == AF_INET6) > > + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > > + else > > + addr = paddr->v4.sin_addr.s_addr; > > + > > + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > > + (__force __u32)lport, net_hash_mix(net), seed); > > +} > > There's your problem. You are allowing multiple objects to hash > to the same value. This is unacceptable with rhashtable because > we use the hash chain length to determine if we're under attack > and need to rehash. This is the reason why you would see EBUSY > during insertion. Cool. Then I guess we don't really have an issue here. The case that fails is an artificial load test which is virtually impossible to be hit in real world, or at least I really hope so. The test, as in Xin's attachment, will load more than 1600 IP addresses in one host (2 vCPU during the test) and attempt to start an assoc from each of those using the very same (lport, daddr, dport)-tuple. Doing so is just unreasonable. Note that net is also hashed, so even if we consider it could be 1600 containers, it is fine. > So instead of inserting your objects as is, you should instead hash > a list object that then links to all the objects that has the same > hash key. Now that it is clarified, I'm thinking we should just get ride of that loop on EBUSY. No real reason to have extra code only to support an artificial test. Agree? Thanks, Marcelo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-11 16:00 ` mleitner @ 2016-01-11 17:20 ` Vlad Yasevich 2016-01-11 18:09 ` mleitner 2016-01-11 21:31 ` David Miller 0 siblings, 2 replies; 40+ messages in thread From: Vlad Yasevich @ 2016-01-11 17:20 UTC (permalink / raw) To: mleitner, Herbert Xu Cc: Xin Long, netdev, linux-sctp, vyasevic, daniel, davem On 01/11/2016 11:00 AM, mleitner@redhat.com wrote: > On Mon, Jan 11, 2016 at 05:30:12PM +0800, Herbert Xu wrote: >> Xin Long <lucien.xin@gmail.com> wrote: >>> >>> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, >>> + const void *ptr) >>> +{ >>> + const struct sctp_hash_cmp_arg *x = arg->key; >>> + const struct sctp_transport *t = ptr; >>> + struct sctp_association *asoc = t->asoc; >>> + const struct net *net = x->net; >>> + >>> + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) >>> + return 1; >>> + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) >>> + return 1; >>> + if (!net_eq(sock_net(asoc->base.sk), net)) >>> + return 1; >>> + if (!sctp_bind_addr_match(&asoc->base.bind_addr, >>> + x->laddr, sctp_sk(asoc->base.sk))) >>> + return 1; >>> + >>> + return 0; >>> +} >>> + >>> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) >>> +{ >>> + const struct sctp_transport *t = data; >>> + const union sctp_addr *paddr = &t->ipaddr; >>> + const struct net *net = sock_net(t->asoc->base.sk); >>> + u16 lport = htons(t->asoc->base.bind_addr.port); >>> + u32 addr; >>> + >>> + if (paddr->sa.sa_family == AF_INET6) >>> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); >>> + else >>> + addr = paddr->v4.sin_addr.s_addr; >>> + >>> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | >>> + (__force __u32)lport, net_hash_mix(net), seed); >>> +} >>> + >>> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) >>> +{ >>> + const struct sctp_hash_cmp_arg *x = data; >>> + const union sctp_addr *paddr = x->paddr; >>> + const struct net *net = x->net; >>> + u16 lport = x->laddr->v4.sin_port; >>> + u32 addr; >>> + >>> + if (paddr->sa.sa_family == AF_INET6) >>> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); >>> + else >>> + addr = paddr->v4.sin_addr.s_addr; >>> + >>> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | >>> + (__force __u32)lport, net_hash_mix(net), seed); >>> +} >> >> There's your problem. You are allowing multiple objects to hash >> to the same value. This is unacceptable with rhashtable because >> we use the hash chain length to determine if we're under attack >> and need to rehash. This is the reason why you would see EBUSY >> during insertion. > > Cool. Then I guess we don't really have an issue here. The case that > fails is an artificial load test which is virtually impossible to be hit > in real world, or at least I really hope so. The test, as in Xin's > attachment, will load more than 1600 IP addresses in one host (2 vCPU > during the test) and attempt to start an assoc from each of those using > the very same (lport, daddr, dport)-tuple. > > Doing so is just unreasonable. Note that net is also hashed, so > even if we consider it could be 1600 containers, it is fine. I have a hard time excepting this argument. Just because a given test scenario may be unreasonable now, doesn't make so in the future. If there is a way to solve the problem, then it should be done. Saying this isn't really a problem isn't going to make it go away. -vlad > >> So instead of inserting your objects as is, you should instead hash >> a list object that then links to all the objects that has the same >> hash key. > > Now that it is clarified, I'm thinking we should just get ride of that > loop on EBUSY. No real reason to have extra code only to support an > artificial test. Agree? > > Thanks, > Marcelo > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-11 17:20 ` Vlad Yasevich @ 2016-01-11 18:09 ` mleitner 2016-01-11 21:35 ` David Miller 2016-01-11 21:31 ` David Miller 1 sibling, 1 reply; 40+ messages in thread From: mleitner @ 2016-01-11 18:09 UTC (permalink / raw) To: Vlad Yasevich Cc: Herbert Xu, Xin Long, netdev, linux-sctp, vyasevic, daniel, davem On Mon, Jan 11, 2016 at 12:20:17PM -0500, Vlad Yasevich wrote: > On 01/11/2016 11:00 AM, mleitner@redhat.com wrote: > > On Mon, Jan 11, 2016 at 05:30:12PM +0800, Herbert Xu wrote: > >> Xin Long <lucien.xin@gmail.com> wrote: > >>> > >>> +static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg, > >>> + const void *ptr) > >>> +{ > >>> + const struct sctp_hash_cmp_arg *x = arg->key; > >>> + const struct sctp_transport *t = ptr; > >>> + struct sctp_association *asoc = t->asoc; > >>> + const struct net *net = x->net; > >>> + > >>> + if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port)) > >>> + return 1; > >>> + if (!sctp_cmp_addr_exact(&t->ipaddr, x->paddr)) > >>> + return 1; > >>> + if (!net_eq(sock_net(asoc->base.sk), net)) > >>> + return 1; > >>> + if (!sctp_bind_addr_match(&asoc->base.bind_addr, > >>> + x->laddr, sctp_sk(asoc->base.sk))) > >>> + return 1; > >>> + > >>> + return 0; > >>> +} > >>> + > >>> +static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed) > >>> +{ > >>> + const struct sctp_transport *t = data; > >>> + const union sctp_addr *paddr = &t->ipaddr; > >>> + const struct net *net = sock_net(t->asoc->base.sk); > >>> + u16 lport = htons(t->asoc->base.bind_addr.port); > >>> + u32 addr; > >>> + > >>> + if (paddr->sa.sa_family == AF_INET6) > >>> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > >>> + else > >>> + addr = paddr->v4.sin_addr.s_addr; > >>> + > >>> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > >>> + (__force __u32)lport, net_hash_mix(net), seed); > >>> +} > >>> + > >>> +static inline u32 sctp_hash_key(const void *data, u32 len, u32 seed) > >>> +{ > >>> + const struct sctp_hash_cmp_arg *x = data; > >>> + const union sctp_addr *paddr = x->paddr; > >>> + const struct net *net = x->net; > >>> + u16 lport = x->laddr->v4.sin_port; > >>> + u32 addr; > >>> + > >>> + if (paddr->sa.sa_family == AF_INET6) > >>> + addr = jhash(&paddr->v6.sin6_addr, 16, seed); > >>> + else > >>> + addr = paddr->v4.sin_addr.s_addr; > >>> + > >>> + return jhash_3words(addr, ((__u32)paddr->v4.sin_port) << 16 | > >>> + (__force __u32)lport, net_hash_mix(net), seed); > >>> +} > >> > >> There's your problem. You are allowing multiple objects to hash > >> to the same value. This is unacceptable with rhashtable because > >> we use the hash chain length to determine if we're under attack > >> and need to rehash. This is the reason why you would see EBUSY > >> during insertion. > > > > Cool. Then I guess we don't really have an issue here. The case that > > fails is an artificial load test which is virtually impossible to be hit > > in real world, or at least I really hope so. The test, as in Xin's > > attachment, will load more than 1600 IP addresses in one host (2 vCPU > > during the test) and attempt to start an assoc from each of those using > > the very same (lport, daddr, dport)-tuple. > > > > Doing so is just unreasonable. Note that net is also hashed, so > > even if we consider it could be 1600 containers, it is fine. > > I have a hard time excepting this argument. Just because a given test > scenario may be unreasonable now, doesn't make so in the future. If > there is a way to solve the problem, then it should be done. Saying > this isn't really a problem isn't going to make it go away. Heh, I understand.. There is still the other part of this thread to be worked on (re ->dead), maybe that will justify extra stuff in here but I really wouldn't like to add extra structures and locks on this just to satisfy an unreasonable scenario like this. This hash is very busy, the lean it is, the better. Maybe we could keep the loop as is for now, as a fail-safe, and add a pr_warn_once() if it gets hit? Marcelo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-11 18:09 ` mleitner @ 2016-01-11 21:35 ` David Miller 0 siblings, 0 replies; 40+ messages in thread From: David Miller @ 2016-01-11 21:35 UTC (permalink / raw) To: mleitner Cc: vyasevich, herbert, lucien.xin, netdev, linux-sctp, vyasevic, daniel From: mleitner@redhat.com Date: Mon, 11 Jan 2016 16:09:27 -0200 > There is still the other part of this thread to be worked on (re > ->dead), maybe that will justify extra stuff in here but I really > wouldn't like to add extra structures and locks on this just to satisfy > an unreasonable scenario like this. This hash is very busy, the lean it > is, the better. It is never "unreasonable" if it is necessary to fix a real bug, which this is. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable 2016-01-11 17:20 ` Vlad Yasevich 2016-01-11 18:09 ` mleitner @ 2016-01-11 21:31 ` David Miller 1 sibling, 0 replies; 40+ messages in thread From: David Miller @ 2016-01-11 21:31 UTC (permalink / raw) To: vyasevich Cc: mleitner, herbert, lucien.xin, netdev, linux-sctp, vyasevic, daniel From: Vlad Yasevich <vyasevich@gmail.com> Date: Mon, 11 Jan 2016 12:20:17 -0500 > I have a hard time excepting this argument. Just because a given test > scenario may be unreasonable now, doesn't make so in the future. If > there is a way to solve the problem, then it should be done. Saying > this isn't really a problem isn't going to make it go away. +1 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 15:50 [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Xin Long 2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long @ 2015-12-30 17:19 ` Eric Dumazet 2015-12-30 17:32 ` Marcelo Ricardo Leitner 2015-12-30 17:52 ` David Miller 2016-01-04 22:30 ` David Miller 2 siblings, 2 replies; 40+ messages in thread From: Eric Dumazet @ 2015-12-30 17:19 UTC (permalink / raw) To: Xin Long; +Cc: network dev, linux-sctp, mleitner, vyasevic, daniel, davem On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote: > besides, this patchset will use transport hashtable to replace > association hashtable to lookup with rhashtable api. get transport > first then get association by t->asoc. and also it will make tcp > style work better. SCTP already has a hash table, why not simply changing the way items are hashed into it ? Sure, storing thousands of sockets in a single hash bucket is not wise. Switching SCTP to rhashtable at this moment is premature, it is still moving fast. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet @ 2015-12-30 17:32 ` Marcelo Ricardo Leitner 2015-12-30 19:11 ` Eric Dumazet 2015-12-30 17:52 ` David Miller 1 sibling, 1 reply; 40+ messages in thread From: Marcelo Ricardo Leitner @ 2015-12-30 17:32 UTC (permalink / raw) To: Eric Dumazet Cc: Xin Long, network dev, linux-sctp, mleitner, vyasevic, daniel, davem On Wed, Dec 30, 2015 at 12:19:39PM -0500, Eric Dumazet wrote: > On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote: > > > besides, this patchset will use transport hashtable to replace > > association hashtable to lookup with rhashtable api. get transport > > first then get association by t->asoc. and also it will make tcp > > style work better. > > SCTP already has a hash table, why not simply changing the way items are > hashed into it ? Because Vlad asked to split the patch so it gets easier to review. The direct change was quite big. > Sure, storing thousands of sockets in a single hash bucket is not wise. > > Switching SCTP to rhashtable at this moment is premature, it is still > moving fast. Dave and Vlad had asked in the first review for considering using rhashtable (ok, Dave didn't mention it by name). We did, and it seemed nice beside 1 issue Xin found, regarding multiple rehashing, which I'll highlight in a reply right away. Said all this, I know this was your second email already against this usage, but I have to ask, sorry: still really against it? Initial post was with subject: [PATCH net] sctp: support global vtag assochash and per endpoint s(d)port assochash table Thanks, Marcelo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 17:32 ` Marcelo Ricardo Leitner @ 2015-12-30 19:11 ` Eric Dumazet 2015-12-30 20:44 ` David Miller 0 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2015-12-30 19:11 UTC (permalink / raw) To: Marcelo Ricardo Leitner Cc: Xin Long, network dev, linux-sctp, mleitner, vyasevic, daniel, davem On Wed, 2015-12-30 at 15:32 -0200, Marcelo Ricardo Leitner wrote: > On Wed, Dec 30, 2015 at 12:19:39PM -0500, Eric Dumazet wrote: > > On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote: > > > > > besides, this patchset will use transport hashtable to replace > > > association hashtable to lookup with rhashtable api. get transport > > > first then get association by t->asoc. and also it will make tcp > > > style work better. > > > > SCTP already has a hash table, why not simply changing the way items are > > hashed into it ? > > Because Vlad asked to split the patch so it gets easier to review. The > direct change was quite big. > > > Sure, storing thousands of sockets in a single hash bucket is not wise. > > > > Switching SCTP to rhashtable at this moment is premature, it is still > > moving fast. > > Dave and Vlad had asked in the first review for considering using > rhashtable (ok, Dave didn't mention it by name). We did, and it seemed > nice beside 1 issue Xin found, regarding multiple rehashing, which I'll > highlight in a reply right away. > Said all this, I know this was your second email already against this > usage, but I have to ask, sorry: still really against it? Well, it seems that Dave is OK to fix all remaining bugs in rhashtable. I was not aware that 'we' decided to force rhashtable all over the places, because it looks so sexy and fun. Let see how funny it will be then. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 19:11 ` Eric Dumazet @ 2015-12-30 20:44 ` David Miller 2015-12-30 21:57 ` Eric Dumazet 0 siblings, 1 reply; 40+ messages in thread From: David Miller @ 2015-12-30 20:44 UTC (permalink / raw) To: eric.dumazet Cc: marcelo.leitner, lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 30 Dec 2015 14:11:20 -0500 > Let see how funny it will be then. It is more fun than waiting longer for the more limited uses of it to trigger problems. I cannot be convinced that using it in more places in order to find and fix more bugs is a bad thing. I'm sorry if a lot of bug fixes in a short period of time concerns you, but for me that's an even clearer sign that it needs help, and exposing it to more use cases is one of the best forms of help it can get. It also tells me that the people actually working on those fixes, such as Herbert Xu, are motivated and reliable when they are shown properly formed bug reports. I cannot think of a report Herbert and others did not resolve in a timely manner. They usually add test cases too. And that matters more to me than anything else. A subsystem can be buggy as shit, but if someone is responsible about fixing the reported bugs properly, then I have absolutely nothing to worry about. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 20:44 ` David Miller @ 2015-12-30 21:57 ` Eric Dumazet 2015-12-30 22:29 ` Marcelo Ricardo Leitner 0 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2015-12-30 21:57 UTC (permalink / raw) To: David Miller Cc: marcelo.leitner, lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel On Wed, 2015-12-30 at 15:44 -0500, David Miller wrote: > It is more fun than waiting longer for the more limited uses of it to > trigger problems. > > I cannot be convinced that using it in more places in order to find > and fix more bugs is a bad thing. > > I'm sorry if a lot of bug fixes in a short period of time concerns > you, but for me that's an even clearer sign that it needs help, and > exposing it to more use cases is one of the best forms of help it can > get. > > It also tells me that the people actually working on those fixes, such > as Herbert Xu, are motivated and reliable when they are shown properly > formed bug reports. > > I cannot think of a report Herbert and others did not resolve in a > timely manner. They usually add test cases too. I have no doubts we can fix bugs in upstream kernels in a few days (at most). The problem is when a customer is stuck using a distro, with a release cycle of extra months after upstream fixes. I had to deal with customers having issues with resolvers hitting the netlink/rhashtable bugs, and I can tell you it was not pretty nor funny. Seeing all these SCTP bugs being currently tracked/fixed (reports from Dmitry Vyukov), I am concerned about having to backport fixes into old kernels without proper rhashtable if now SCTP relies heavily on rhashtable. Hopefully nothing bad will happen. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 21:57 ` Eric Dumazet @ 2015-12-30 22:29 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 40+ messages in thread From: Marcelo Ricardo Leitner @ 2015-12-30 22:29 UTC (permalink / raw) To: Eric Dumazet, David Miller Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel Em 30-12-2015 19:57, Eric Dumazet escreveu: > On Wed, 2015-12-30 at 15:44 -0500, David Miller wrote: > >> It is more fun than waiting longer for the more limited uses of it to >> trigger problems. >> >> I cannot be convinced that using it in more places in order to find >> and fix more bugs is a bad thing. >> >> I'm sorry if a lot of bug fixes in a short period of time concerns >> you, but for me that's an even clearer sign that it needs help, and >> exposing it to more use cases is one of the best forms of help it can >> get. >> >> It also tells me that the people actually working on those fixes, such >> as Herbert Xu, are motivated and reliable when they are shown properly >> formed bug reports. >> >> I cannot think of a report Herbert and others did not resolve in a >> timely manner. They usually add test cases too. > > > I have no doubts we can fix bugs in upstream kernels in a few days (at > most). > > The problem is when a customer is stuck using a distro, with a release > cycle of extra months after upstream fixes. If one takes extra months to have a fix delivered to a customer, they probably are also months late on security fixes as well, right? That would be pretty scary by itself already. > I had to deal with customers having issues with resolvers hitting the > netlink/rhashtable bugs, and I can tell you it was not pretty nor funny. > > Seeing all these SCTP bugs being currently tracked/fixed (reports from > Dmitry Vyukov), I am concerned about having to backport fixes into old > kernels without proper rhashtable if now SCTP relies heavily on > rhashtable. This happens with every major change in the kernel. Try backporting vxlan fixes to an older kernel, for example, to one without ip_tunnel. Can't say about the future, but so far none of those bugs were related to the hash that we want to replace and they were all small/contained patches. And at least for now, we are not adding new stuff which relies on this new hash. It's on a central part of sctp, yes, but somewhat contained. Like what happened with vxlan/ip_tunnel, which ended up growing together. > Hopefully nothing bad will happen. +1 :) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet 2015-12-30 17:32 ` Marcelo Ricardo Leitner @ 2015-12-30 17:52 ` David Miller 2015-12-30 19:03 ` Eric Dumazet 1 sibling, 1 reply; 40+ messages in thread From: David Miller @ 2015-12-30 17:52 UTC (permalink / raw) To: eric.dumazet; +Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 30 Dec 2015 12:19:39 -0500 > Switching SCTP to rhashtable at this moment is premature, it is > still moving fast. I completely, and totally, disagree. rhashtable actually _needs_ a strong active user like one of the protocol socket hashes. It's a step backwards to keep rhashtable in the shadows by only allowing certain subsystems to convert to it. That's really incredibly stupid if you ask me. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 17:52 ` David Miller @ 2015-12-30 19:03 ` Eric Dumazet 2015-12-30 20:40 ` David Miller 0 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2015-12-30 19:03 UTC (permalink / raw) To: David Miller; +Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel On Wed, 2015-12-30 at 12:52 -0500, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 30 Dec 2015 12:19:39 -0500 > > > Switching SCTP to rhashtable at this moment is premature, it is > > still moving fast. > > I completely, and totally, disagree. > > rhashtable actually _needs_ a strong active user like one of the > protocol socket hashes. > > It's a step backwards to keep rhashtable in the shadows by only > allowing certain subsystems to convert to it. That's really > incredibly stupid if you ask me. You sure can disagree with me, but calling my opinion 'incredily stupid' is not wise. Let me check how stable is rhashtable : # git log --oneline v4.2.. lib/rhashtable.c 179ccc0a7364 rhashtable: Kill harmless RCU warning in rhashtable_walk_init c6ff5268293e rhashtable: Fix walker list corruption 3a324606bbab rhashtable: Enforce minimum size on initial hash table a90099d9fabd Revert "rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation" d3716f18a7d8 rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation 3cf92222a39c rhashtable: Prevent spurious EBUSY errors on insertion 7def0f952ecc lib: fix data race in rhashtable_rehash_one Seriously, I think we can wait one release before 'en masse' conversions. I understand we would love to do that, but what is the hurry for SCTP, that needed rhashtable so desperately that it could not be done before 2016 ? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 19:03 ` Eric Dumazet @ 2015-12-30 20:40 ` David Miller 0 siblings, 0 replies; 40+ messages in thread From: David Miller @ 2015-12-30 20:40 UTC (permalink / raw) To: eric.dumazet; +Cc: lucien.xin, netdev, linux-sctp, mleitner, vyasevic, daniel From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 30 Dec 2015 14:03:43 -0500 > You sure can disagree with me, but calling my opinion 'incredily > stupid' is not wise. I think fundamentally giving facilities less rather than more coverage is not a smart approach at all. If the code is that bad that people are discouraged from even using it, then it should be moved to drivers/staging or removed. Otherwise it must work and we must be able to make use of it. > I understand we would love to do that, but what is the hurry for > SCTP, that needed rhashtable so desperately that it could not be > done before 2016 ? There is no rush, but quite frankly finding people to do serious work on SCTP is no easy task so I'm hesitant to "defer" someone's work in this area... :-) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable 2015-12-30 15:50 [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Xin Long 2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long 2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet @ 2016-01-04 22:30 ` David Miller 2 siblings, 0 replies; 40+ messages in thread From: David Miller @ 2016-01-04 22:30 UTC (permalink / raw) To: lucien.xin; +Cc: netdev, linux-sctp, mleitner, vyasevic, daniel From: Xin Long <lucien.xin@gmail.com> Date: Wed, 30 Dec 2015 23:50:45 +0800 > for telecom center, the usual case is that a server is connected by thousands > of clients. but if the server with only one enpoint(udp style) use the same > sport and dport to communicate with every clients, and every assoc in server > will be hashed in the same chain of global assoc hashtable due to currently we > choose dport and sport as the hash key. > > when a packet is received, sctp_rcv try to find the assoc with sport and dport, > since that chain is too long to find it fast, it make the performance turn to > very low, some test data is as follow: ... > we need to change the way to calculate the hash key, to use lport + > rport + paddr as the hash key can avoid this issue. > > besides, this patchset will use transport hashtable to replace > association hashtable to lookup with rhashtable api. get transport > first then get association by t->asoc. and also it will make tcp > style work better. ... I've applied this series, but... If anything regresses and is not dealt with in a timely manner I will revert this series. Thanks. ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2016-01-11 21:35 UTC | newest] Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-12-30 15:50 [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Xin Long 2015-12-30 15:50 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Xin Long 2015-12-30 15:50 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Xin Long 2015-12-30 15:50 ` [PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs Xin Long 2015-12-30 15:50 ` [PATCH net-next 4/5] sctp: drop the old assoc hashtable of sctp Xin Long 2015-12-30 15:50 ` [PATCH net-next 5/5] sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc Xin Long 2016-01-05 19:07 ` [PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path Vlad Yasevich 2016-01-06 16:18 ` Xin Long 2016-01-06 17:42 ` mleitner 2016-01-11 15:00 ` Vlad Yasevich 2015-12-30 16:57 ` [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable Eric Dumazet 2015-12-30 17:50 ` David Miller 2016-01-11 9:32 ` Herbert Xu 2016-01-11 16:33 ` Marcelo Ricardo Leitner 2016-01-11 18:08 ` Vlad Yasevich 2016-01-11 18:19 ` Marcelo Ricardo Leitner 2015-12-30 17:41 ` Marcelo Ricardo Leitner 2016-01-05 10:10 ` Xin Long 2016-01-11 9:22 ` Herbert Xu 2016-01-05 18:38 ` Vlad Yasevich 2016-01-06 17:01 ` Xin Long 2016-01-06 18:19 ` Marcelo Ricardo Leitner 2016-01-07 17:23 ` Marcelo Ricardo Leitner 2016-01-07 20:28 ` Vlad Yasevich 2016-01-11 9:30 ` Herbert Xu 2016-01-11 16:00 ` mleitner 2016-01-11 17:20 ` Vlad Yasevich 2016-01-11 18:09 ` mleitner 2016-01-11 21:35 ` David Miller 2016-01-11 21:31 ` David Miller 2015-12-30 17:19 ` [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable Eric Dumazet 2015-12-30 17:32 ` Marcelo Ricardo Leitner 2015-12-30 19:11 ` Eric Dumazet 2015-12-30 20:44 ` David Miller 2015-12-30 21:57 ` Eric Dumazet 2015-12-30 22:29 ` Marcelo Ricardo Leitner 2015-12-30 17:52 ` David Miller 2015-12-30 19:03 ` Eric Dumazet 2015-12-30 20:40 ` David Miller 2016-01-04 22:30 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).