linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/8] rxrpc: Rework endpoint record handling
@ 2016-06-15 16:41 David Howells
  2016-06-15 16:41 ` [PATCH net-next 1/8] rxrpc: Rework peer object handling to use hash table and RCU David Howells
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: David Howells @ 2016-06-15 16:41 UTC (permalink / raw)
  To: davem; +Cc: dhowells, netdev, linux-afs, linux-kernel



Here's the next part of the AF_RXRPC rewrite.  In this set I rework
endpoint record handling.  There are two types of endpoint record, local
and peer.  The local endpoint record is used as an anchor for the transport
socket that AF_RXRPC uses (at the moment a UDP socket).  Local endpoints
can be shared between AF_RXRPC sockets under certain restricted
circumstances.

The peer endpoint is a record of the remote end.  It is (or will be) used
to keep track MTU and RTT values and, with these changes, is used to find
the call(s) to abort when a network error occurs.

The following significant changes are made:

 (1) The local endpoint event handling code is split out into its own file.

 (2) The local endpoint list bottom half-excluding spinlock is removed as
     things are arranged such that sk_user_data will not change whilst the
     transport socket callbacks are in progress.

 (3) Local endpoints can now only be shared if they have the same transport
     address (as before) and have a local service ID of 0 (ie. they're not
     listening for incoming calls).  This prevents callbacks from a server
     to one process being picked up by another process.

 (4) Local endpoint destruction is now accomplished by the same work item
     as processes events, meaning that the destructor doesn't need to wait
     for the event processor.

 (5) Peer endpoints are now held in a hash table rather than a flat list.

 (6) Peer endpoints are now destroyed by RCU rather than by work item.

 (7) Peer endpoints are now differentiated by local endpoint and remote
     transport port in addition to remote transport address and transport
     type and family.

     This means that a firewall that excludes access between a particular
     local port and remote port won't cause calls to be aborted that use a
     different port pair.

 (8) Error report handling now no longer assumes that the source is always
     an IPv4 ICMP message from a UDP port and has assumptions that an ICMP
     message comes from an IPv4 socket removed.  At some point IPv6 support
     will be added.

 (9) Peer endpoints rather than local endpoints are now the anchor point
     for distributing network error reports.

(10) Both types of endpoint records are now disposed of as soon as all
     references to them are gone.  There is less hanging around and once
     their usage counts hit zero, records can no longer be resurrected.

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-rewrite-20160615


David
---
David Howells (8):
      rxrpc: Rework peer object handling to use hash table and RCU
      rxrpc: Rename rxrpc_UDP_error_report() to rxrpc_error_report()
      rxrpc: Break MTU determination from ICMP into its own function
      rxrpc: Don't assume anything about the address in an ICMP packet
      rxrpc: Do a little bit of tidying in the ICMP processing
      rxrpc: Use the peer record to distribute network errors
      rxrpc: Separate local endpoint event handling out into its own file
      rxrpc: Rework local endpoint management


 net/rxrpc/Makefile       |    4 
 net/rxrpc/af_rxrpc.c     |   22 ++
 net/rxrpc/ar-internal.h  |  124 +++++++++----
 net/rxrpc/call_accept.c  |   27 +--
 net/rxrpc/call_event.c   |   15 +-
 net/rxrpc/call_object.c  |    6 -
 net/rxrpc/conn_event.c   |   15 --
 net/rxrpc/input.c        |   42 ++--
 net/rxrpc/local_event.c  |  116 ++++++++++++
 net/rxrpc/local_object.c |  430 +++++++++++++++++++++-------------------------
 net/rxrpc/output.c       |    4 
 net/rxrpc/peer_event.c   |  251 ++++++++++++++++-----------
 net/rxrpc/peer_object.c  |  373 ++++++++++++++++++++--------------------
 net/rxrpc/transport.c    |   19 --
 net/rxrpc/utils.c        |   41 ++++
 15 files changed, 842 insertions(+), 647 deletions(-)
 create mode 100644 net/rxrpc/local_event.c
 create mode 100644 net/rxrpc/utils.c

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 1/8] rxrpc: Rework peer object handling to use hash table and RCU
  2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
@ 2016-06-15 16:41 ` David Howells
  2016-06-15 16:41 ` [PATCH net-next 2/8] rxrpc: Rename rxrpc_UDP_error_report() to rxrpc_error_report() David Howells
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2016-06-15 16:41 UTC (permalink / raw)
  To: davem; +Cc: dhowells, netdev, linux-afs, linux-kernel

Rework peer object handling to use a hash table instead of a flat list and
to use RCU.  Peer objects are no longer destroyed by passing them to a
workqueue to process, but rather are just passed to the RCU garbage
collector as kfree'able objects.

The hash function uses the local endpoint plus all the components of the
remote address, except for the RxRPC service ID.  Peers thus represent a
UDP port on the remote machine as contacted by a UDP port on this machine.

The RCU read lock is used to handle non-creating lookups so that they can
be called from bottom half context in the sk_error_report handler without
having to lock the hash table against modification.
rxrpc_lookup_peer_rcu() *does* take a reference on the peer object as in
the future, this will be passed to a work item for error distribution in
the error_report path and this function will cease being used in the
data_ready path.

Creating lookups are done under spinlock rather than mutex as they might be
set up due to an external stimulus if the local endpoint is a server.

Captured network error messages (ICMP) are handled with respect to this
struct and MTU size and RTT are cached here.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/Makefile      |    3 
 net/rxrpc/af_rxrpc.c    |    3 
 net/rxrpc/ar-internal.h |   46 +++++-
 net/rxrpc/call_accept.c |    2 
 net/rxrpc/input.c       |   13 +-
 net/rxrpc/peer_event.c  |   59 +++++++-
 net/rxrpc/peer_object.c |  369 ++++++++++++++++++++++++-----------------------
 net/rxrpc/transport.c   |    2 
 net/rxrpc/utils.c       |   41 +++++
 9 files changed, 335 insertions(+), 203 deletions(-)
 create mode 100644 net/rxrpc/utils.c

diff --git a/net/rxrpc/Makefile b/net/rxrpc/Makefile
index 7e1006a3bfa5..a6f6f21d8a59 100644
--- a/net/rxrpc/Makefile
+++ b/net/rxrpc/Makefile
@@ -20,7 +20,8 @@ af-rxrpc-y := \
 	recvmsg.o \
 	security.o \
 	skbuff.o \
-	transport.o
+	transport.o \
+	utils.o
 
 af-rxrpc-$(CONFIG_PROC_FS) += proc.o
 af-rxrpc-$(CONFIG_RXKAD) += rxkad.o
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index a1bcb0e17250..ba373caddbeb 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -244,7 +244,7 @@ struct rxrpc_transport *rxrpc_name_to_transport(struct rxrpc_sock *rx,
 		return ERR_PTR(-EAFNOSUPPORT);
 
 	/* find a remote transport endpoint from the local one */
-	peer = rxrpc_get_peer(srx, gfp);
+	peer = rxrpc_lookup_peer(rx->local, srx, gfp);
 	if (IS_ERR(peer))
 		return ERR_CAST(peer);
 
@@ -835,7 +835,6 @@ static void __exit af_rxrpc_exit(void)
 	rxrpc_destroy_all_calls();
 	rxrpc_destroy_all_connections();
 	rxrpc_destroy_all_transports();
-	rxrpc_destroy_all_peers();
 	rxrpc_destroy_all_locals();
 
 	ASSERTCMP(atomic_read(&rxrpc_n_skbs), ==, 0);
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 03919b9a8a31..7dba6677b9d5 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -9,7 +9,9 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <linux/atomic.h>
 #include <net/sock.h>
+#include <net/af_rxrpc.h>
 #include <rxrpc/packet.h>
 
 #if 0
@@ -193,15 +195,16 @@ struct rxrpc_local {
 
 /*
  * RxRPC remote transport endpoint definition
- * - matched by remote port, address and protocol type
- * - holds the connection ID counter for connections between the two endpoints
+ * - matched by local endpoint, remote port, address and protocol type
  */
 struct rxrpc_peer {
-	struct work_struct	destroyer;	/* peer destroyer */
-	struct list_head	link;		/* link in master peer list */
+	struct rcu_head		rcu;		/* This must be first */
+	atomic_t		usage;
+	unsigned long		hash_key;
+	struct hlist_node	hash_link;
+	struct rxrpc_local	*local;
 	struct list_head	error_targets;	/* targets for net error distribution */
 	spinlock_t		lock;		/* access lock */
-	atomic_t		usage;
 	unsigned int		if_mtu;		/* interface MTU for this peer */
 	unsigned int		mtu;		/* network MTU for this peer */
 	unsigned int		maxdata;	/* data size (MTU - hdrsize) */
@@ -611,10 +614,29 @@ void rxrpc_UDP_error_handler(struct work_struct *);
 /*
  * peer_object.c
  */
-struct rxrpc_peer *rxrpc_get_peer(struct sockaddr_rxrpc *, gfp_t);
-void rxrpc_put_peer(struct rxrpc_peer *);
-struct rxrpc_peer *rxrpc_find_peer(struct rxrpc_local *, __be32, __be16);
-void __exit rxrpc_destroy_all_peers(void);
+struct rxrpc_peer *rxrpc_lookup_peer_rcu(struct rxrpc_local *,
+					 const struct sockaddr_rxrpc *);
+struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *,
+				     struct sockaddr_rxrpc *, gfp_t);
+struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *, gfp_t);
+
+static inline void rxrpc_get_peer(struct rxrpc_peer *peer)
+{
+	atomic_inc(&peer->usage);
+}
+
+static inline
+struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *peer)
+{
+	return atomic_inc_not_zero(&peer->usage) ? peer : NULL;
+}
+
+extern void __rxrpc_put_peer(struct rxrpc_peer *peer);
+static inline void rxrpc_put_peer(struct rxrpc_peer *peer)
+{
+	if (atomic_dec_and_test(&peer->usage))
+		__rxrpc_put_peer(peer);
+}
 
 /*
  * proc.c
@@ -673,6 +695,12 @@ struct rxrpc_transport *rxrpc_find_transport(struct rxrpc_local *,
 					     struct rxrpc_peer *);
 
 /*
+ * utils.c
+ */
+void rxrpc_get_addr_from_skb(struct rxrpc_local *, const struct sk_buff *,
+			     struct sockaddr_rxrpc *);
+
+/*
  * debug tracing
  */
 extern unsigned int rxrpc_debug;
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index eea5f4a5d8b1..e5723f4dce89 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -95,7 +95,7 @@ static int rxrpc_accept_incoming_call(struct rxrpc_local *local,
 	rxrpc_new_skb(notification);
 	notification->mark = RXRPC_SKB_MARK_NEW_CALL;
 
-	peer = rxrpc_get_peer(srx, GFP_NOIO);
+	peer = rxrpc_lookup_peer(local, srx, GFP_NOIO);
 	if (IS_ERR(peer)) {
 		_debug("no peer");
 		ret = -EBUSY;
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index e0815a033999..3b405dbf3a05 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -635,14 +635,16 @@ static struct rxrpc_connection *rxrpc_conn_from_local(struct rxrpc_local *local,
 	struct rxrpc_peer *peer;
 	struct rxrpc_transport *trans;
 	struct rxrpc_connection *conn;
+	struct sockaddr_rxrpc srx;
 
-	peer = rxrpc_find_peer(local, ip_hdr(skb)->saddr,
-				udp_hdr(skb)->source);
+	rxrpc_get_addr_from_skb(local, skb, &srx);
+	rcu_read_lock();
+	peer = rxrpc_lookup_peer_rcu(local, &srx);
 	if (IS_ERR(peer))
-		goto cant_find_conn;
+		goto cant_find_peer;
 
 	trans = rxrpc_find_transport(local, peer);
-	rxrpc_put_peer(peer);
+	rcu_read_unlock();
 	if (!trans)
 		goto cant_find_conn;
 
@@ -652,6 +654,9 @@ static struct rxrpc_connection *rxrpc_conn_from_local(struct rxrpc_local *local,
 		goto cant_find_conn;
 
 	return conn;
+
+cant_find_peer:
+	rcu_read_unlock();
 cant_find_conn:
 	return NULL;
 }
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 3e82d6f0313c..24f5ec0fcd20 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -23,6 +23,55 @@
 #include "ar-internal.h"
 
 /*
+ * Find the peer associated with an ICMP packet.
+ */
+static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
+						     const struct sk_buff *skb)
+{
+	struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+	struct sockaddr_rxrpc srx;
+
+	_enter("");
+
+	memset(&srx, 0, sizeof(srx));
+	srx.transport_type = local->srx.transport_type;
+	srx.transport.family = local->srx.transport.family;
+
+	/* Can we see an ICMP4 packet on an ICMP6 listening socket?  and vice
+	 * versa?
+	 */
+	switch (srx.transport.family) {
+	case AF_INET:
+		srx.transport.sin.sin_port = serr->port;
+		srx.transport_len = sizeof(struct sockaddr_in);
+		switch (serr->ee.ee_origin) {
+		case SO_EE_ORIGIN_ICMP:
+			_net("Rx ICMP");
+			memcpy(&srx.transport.sin.sin_addr,
+			       skb_network_header(skb) + serr->addr_offset,
+			       sizeof(struct in_addr));
+			break;
+		case SO_EE_ORIGIN_ICMP6:
+			_net("Rx ICMP6 on v4 sock");
+			memcpy(&srx.transport.sin.sin_addr,
+			       skb_network_header(skb) + serr->addr_offset + 12,
+			       sizeof(struct in_addr));
+			break;
+		default:
+			memcpy(&srx.transport.sin.sin_addr, &ip_hdr(skb)->saddr,
+			       sizeof(struct in_addr));
+			break;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+	return rxrpc_lookup_peer_rcu(local, &srx);
+}
+
+/*
  * handle an error received on the local endpoint
  */
 void rxrpc_UDP_error_report(struct sock *sk)
@@ -57,8 +106,12 @@ void rxrpc_UDP_error_report(struct sock *sk)
 	_net("Rx UDP Error from %pI4:%hu", &addr, ntohs(port));
 	_debug("Msg l:%d d:%d", skb->len, skb->data_len);
 
-	peer = rxrpc_find_peer(local, addr, port);
-	if (IS_ERR(peer)) {
+	rcu_read_lock();
+	peer = rxrpc_lookup_peer_icmp_rcu(local, skb);
+	if (peer && !rxrpc_get_peer_maybe(peer))
+		peer = NULL;
+	if (!peer) {
+		rcu_read_unlock();
 		rxrpc_free_skb(skb);
 		_leave(" [no peer]");
 		return;
@@ -66,6 +119,7 @@ void rxrpc_UDP_error_report(struct sock *sk)
 
 	trans = rxrpc_find_transport(local, peer);
 	if (!trans) {
+		rcu_read_unlock();
 		rxrpc_put_peer(peer);
 		rxrpc_free_skb(skb);
 		_leave(" [no trans]");
@@ -110,6 +164,7 @@ void rxrpc_UDP_error_report(struct sock *sk)
 		}
 	}
 
+	rcu_read_unlock();
 	rxrpc_put_peer(peer);
 
 	/* pass the transport ref to error_handler to release */
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 0b54cda3d8e5..7fc50dc7d333 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -1,6 +1,6 @@
-/* RxRPC remote transport endpoint management
+/* RxRPC remote transport endpoint record management
  *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2007, 2016 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
  * This program is free software; you can redistribute it and/or
@@ -16,20 +16,132 @@
 #include <linux/skbuff.h>
 #include <linux/udp.h>
 #include <linux/in.h>
-#include <linux/in6.h>
-#include <linux/icmp.h>
 #include <linux/slab.h>
+#include <linux/hashtable.h>
 #include <net/sock.h>
 #include <net/af_rxrpc.h>
 #include <net/ip.h>
 #include <net/route.h>
 #include "ar-internal.h"
 
-static LIST_HEAD(rxrpc_peers);
-static DEFINE_RWLOCK(rxrpc_peer_lock);
-static DECLARE_WAIT_QUEUE_HEAD(rxrpc_peer_wq);
+static DEFINE_HASHTABLE(rxrpc_peer_hash, 10);
+static DEFINE_SPINLOCK(rxrpc_peer_hash_lock);
 
-static void rxrpc_destroy_peer(struct work_struct *work);
+/*
+ * Hash a peer key.
+ */
+static unsigned long rxrpc_peer_hash_key(struct rxrpc_local *local,
+					 const struct sockaddr_rxrpc *srx)
+{
+	const u16 *p;
+	unsigned int i, size;
+	unsigned long hash_key;
+
+	_enter("");
+
+	hash_key = (unsigned long)local / __alignof__(*local);
+	hash_key += srx->transport_type;
+	hash_key += srx->transport_len;
+	hash_key += srx->transport.family;
+
+	switch (srx->transport.family) {
+	case AF_INET:
+		hash_key += (u16 __force)srx->transport.sin.sin_port;
+		size = sizeof(srx->transport.sin.sin_addr);
+		p = (u16 *)&srx->transport.sin.sin_addr;
+		break;
+	}
+
+	/* Step through the peer address in 16-bit portions for speed */
+	for (i = 0; i < size; i += sizeof(*p), p++)
+		hash_key += *p;
+
+	_leave(" 0x%lx", hash_key);
+	return hash_key;
+}
+
+/*
+ * Compare a peer to a key.  Return -ve, 0 or +ve to indicate less than, same
+ * or greater than.
+ *
+ * Unfortunately, the primitives in linux/hashtable.h don't allow for sorted
+ * buckets and mid-bucket insertion, so we don't make full use of this
+ * information at this point.
+ */
+static long rxrpc_peer_cmp_key(const struct rxrpc_peer *peer,
+			       struct rxrpc_local *local,
+			       const struct sockaddr_rxrpc *srx,
+			       unsigned long hash_key)
+{
+	long diff;
+
+	diff = ((peer->hash_key - hash_key) ?:
+		((unsigned long)peer->local - (unsigned long)local) ?:
+		(peer->srx.transport_type - srx->transport_type) ?:
+		(peer->srx.transport_len - srx->transport_len) ?:
+		(peer->srx.transport.family - srx->transport.family));
+	if (diff != 0)
+		return diff;
+
+	switch (srx->transport.family) {
+	case AF_INET:
+		return ((u16 __force)peer->srx.transport.sin.sin_port -
+			(u16 __force)srx->transport.sin.sin_port) ?:
+			memcmp(&peer->srx.transport.sin.sin_addr,
+			       &srx->transport.sin.sin_addr,
+			       sizeof(struct in_addr));
+	default:
+		BUG();
+	}
+}
+
+/*
+ * Look up a remote transport endpoint for the specified address using RCU.
+ */
+static struct rxrpc_peer *__rxrpc_lookup_peer_rcu(
+	struct rxrpc_local *local,
+	const struct sockaddr_rxrpc *srx,
+	unsigned long hash_key)
+{
+	struct rxrpc_peer *peer;
+
+	hash_for_each_possible_rcu(rxrpc_peer_hash, peer, hash_link, hash_key) {
+		if (rxrpc_peer_cmp_key(peer, local, srx, hash_key) == 0) {
+			if (atomic_read(&peer->usage) == 0)
+				return NULL;
+			return peer;
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ * Look up a remote transport endpoint for the specified address using RCU.
+ */
+struct rxrpc_peer *rxrpc_lookup_peer_rcu(struct rxrpc_local *local,
+					 const struct sockaddr_rxrpc *srx)
+{
+	struct rxrpc_peer *peer;
+	unsigned long hash_key = rxrpc_peer_hash_key(local, srx);
+
+	peer = __rxrpc_lookup_peer_rcu(local, srx, hash_key);
+	if (peer) {
+		switch (srx->transport.family) {
+		case AF_INET:
+			_net("PEER %d {%d,%u,%pI4+%hu}",
+			     peer->debug_id,
+			     peer->srx.transport_type,
+			     peer->srx.transport.family,
+			     &peer->srx.transport.sin.sin_addr,
+			     ntohs(peer->srx.transport.sin.sin_port));
+			break;
+		}
+
+		_leave(" = %p {u=%d}", peer, atomic_read(&peer->usage));
+	}
+	return peer;
+}
 
 /*
  * assess the MTU size for the network interface through which this peer is
@@ -58,10 +170,9 @@ static void rxrpc_assess_MTU_size(struct rxrpc_peer *peer)
 }
 
 /*
- * allocate a new peer
+ * Allocate a peer.
  */
-static struct rxrpc_peer *rxrpc_alloc_peer(struct sockaddr_rxrpc *srx,
-					   gfp_t gfp)
+struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp)
 {
 	struct rxrpc_peer *peer;
 
@@ -69,12 +180,32 @@ static struct rxrpc_peer *rxrpc_alloc_peer(struct sockaddr_rxrpc *srx,
 
 	peer = kzalloc(sizeof(struct rxrpc_peer), gfp);
 	if (peer) {
-		INIT_WORK(&peer->destroyer, &rxrpc_destroy_peer);
-		INIT_LIST_HEAD(&peer->link);
+		atomic_set(&peer->usage, 1);
+		peer->local = local;
 		INIT_LIST_HEAD(&peer->error_targets);
 		spin_lock_init(&peer->lock);
-		atomic_set(&peer->usage, 1);
 		peer->debug_id = atomic_inc_return(&rxrpc_debug_id);
+	}
+
+	_leave(" = %p", peer);
+	return peer;
+}
+
+/*
+ * Set up a new peer.
+ */
+static struct rxrpc_peer *rxrpc_create_peer(struct rxrpc_local *local,
+					    struct sockaddr_rxrpc *srx,
+					    unsigned long hash_key,
+					    gfp_t gfp)
+{
+	struct rxrpc_peer *peer;
+
+	_enter("");
+
+	peer = rxrpc_alloc_peer(local, gfp);
+	if (peer) {
+		peer->hash_key = hash_key;
 		memcpy(&peer->srx, srx, sizeof(*srx));
 
 		rxrpc_assess_MTU_size(peer);
@@ -105,11 +236,11 @@ static struct rxrpc_peer *rxrpc_alloc_peer(struct sockaddr_rxrpc *srx,
 /*
  * obtain a remote transport endpoint for the specified address
  */
-struct rxrpc_peer *rxrpc_get_peer(struct sockaddr_rxrpc *srx, gfp_t gfp)
+struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local,
+				     struct sockaddr_rxrpc *srx, gfp_t gfp)
 {
 	struct rxrpc_peer *peer, *candidate;
-	const char *new = "old";
-	int usage;
+	unsigned long hash_key = rxrpc_peer_hash_key(local, srx);
 
 	_enter("{%d,%d,%pI4+%hu}",
 	       srx->transport_type,
@@ -118,188 +249,60 @@ struct rxrpc_peer *rxrpc_get_peer(struct sockaddr_rxrpc *srx, gfp_t gfp)
 	       ntohs(srx->transport.sin.sin_port));
 
 	/* search the peer list first */
-	read_lock_bh(&rxrpc_peer_lock);
-	list_for_each_entry(peer, &rxrpc_peers, link) {
-		_debug("check PEER %d { u=%d t=%d l=%d }",
-		       peer->debug_id,
-		       atomic_read(&peer->usage),
-		       peer->srx.transport_type,
-		       peer->srx.transport_len);
-
-		if (atomic_read(&peer->usage) > 0 &&
-		    peer->srx.transport_type == srx->transport_type &&
-		    peer->srx.transport_len == srx->transport_len &&
-		    memcmp(&peer->srx.transport,
-			   &srx->transport,
-			   srx->transport_len) == 0)
-			goto found_extant_peer;
-	}
-	read_unlock_bh(&rxrpc_peer_lock);
-
-	/* not yet present - create a candidate for a new record and then
-	 * redo the search */
-	candidate = rxrpc_alloc_peer(srx, gfp);
-	if (!candidate) {
-		_leave(" = -ENOMEM");
-		return ERR_PTR(-ENOMEM);
-	}
+	rcu_read_lock();
+	peer = __rxrpc_lookup_peer_rcu(local, srx, hash_key);
+	if (peer && !rxrpc_get_peer_maybe(peer))
+		peer = NULL;
+	rcu_read_unlock();
+
+	if (!peer) {
+		/* The peer is not yet present in hash - create a candidate
+		 * for a new record and then redo the search.
+		 */
+		candidate = rxrpc_create_peer(local, srx, hash_key, gfp);
+		if (!candidate) {
+			_leave(" = NULL [nomem]");
+			return NULL;
+		}
 
-	write_lock_bh(&rxrpc_peer_lock);
+		spin_lock(&rxrpc_peer_hash_lock);
 
-	list_for_each_entry(peer, &rxrpc_peers, link) {
-		if (atomic_read(&peer->usage) > 0 &&
-		    peer->srx.transport_type == srx->transport_type &&
-		    peer->srx.transport_len == srx->transport_len &&
-		    memcmp(&peer->srx.transport,
-			   &srx->transport,
-			   srx->transport_len) == 0)
-			goto found_extant_second;
-	}
+		/* Need to check that we aren't racing with someone else */
+		peer = __rxrpc_lookup_peer_rcu(local, srx, hash_key);
+		if (peer && !rxrpc_get_peer_maybe(peer))
+			peer = NULL;
+		if (!peer)
+			hash_add_rcu(rxrpc_peer_hash,
+				     &candidate->hash_link, hash_key);
 
-	/* we can now add the new candidate to the list */
-	peer = candidate;
-	candidate = NULL;
-	usage = atomic_read(&peer->usage);
+		spin_unlock(&rxrpc_peer_hash_lock);
 
-	list_add_tail(&peer->link, &rxrpc_peers);
-	write_unlock_bh(&rxrpc_peer_lock);
-	new = "new";
+		if (peer)
+			kfree(candidate);
+		else
+			peer = candidate;
+	}
 
-success:
-	_net("PEER %s %d {%d,%u,%pI4+%hu}",
-	     new,
+	_net("PEER %d {%d,%pI4+%hu}",
 	     peer->debug_id,
 	     peer->srx.transport_type,
-	     peer->srx.transport.family,
 	     &peer->srx.transport.sin.sin_addr,
 	     ntohs(peer->srx.transport.sin.sin_port));
 
-	_leave(" = %p {u=%d}", peer, usage);
-	return peer;
-
-	/* we found the peer in the list immediately */
-found_extant_peer:
-	usage = atomic_inc_return(&peer->usage);
-	read_unlock_bh(&rxrpc_peer_lock);
-	goto success;
-
-	/* we found the peer on the second time through the list */
-found_extant_second:
-	usage = atomic_inc_return(&peer->usage);
-	write_unlock_bh(&rxrpc_peer_lock);
-	kfree(candidate);
-	goto success;
-}
-
-/*
- * find the peer associated with a packet
- */
-struct rxrpc_peer *rxrpc_find_peer(struct rxrpc_local *local,
-				   __be32 addr, __be16 port)
-{
-	struct rxrpc_peer *peer;
-
-	_enter("");
-
-	/* search the peer list */
-	read_lock_bh(&rxrpc_peer_lock);
-
-	if (local->srx.transport.family == AF_INET &&
-	    local->srx.transport_type == SOCK_DGRAM
-	    ) {
-		list_for_each_entry(peer, &rxrpc_peers, link) {
-			if (atomic_read(&peer->usage) > 0 &&
-			    peer->srx.transport_type == SOCK_DGRAM &&
-			    peer->srx.transport.family == AF_INET &&
-			    peer->srx.transport.sin.sin_port == port &&
-			    peer->srx.transport.sin.sin_addr.s_addr == addr)
-				goto found_UDP_peer;
-		}
-
-		goto new_UDP_peer;
-	}
-
-	read_unlock_bh(&rxrpc_peer_lock);
-	_leave(" = -EAFNOSUPPORT");
-	return ERR_PTR(-EAFNOSUPPORT);
-
-found_UDP_peer:
-	_net("Rx UDP DGRAM from peer %d", peer->debug_id);
-	atomic_inc(&peer->usage);
-	read_unlock_bh(&rxrpc_peer_lock);
-	_leave(" = %p", peer);
+	_leave(" = %p {u=%d}", peer, atomic_read(&peer->usage));
 	return peer;
-
-new_UDP_peer:
-	_net("Rx UDP DGRAM from NEW peer");
-	read_unlock_bh(&rxrpc_peer_lock);
-	_leave(" = -EBUSY [new]");
-	return ERR_PTR(-EBUSY);
 }
 
 /*
- * release a remote transport endpoint
+ * Discard a ref on a remote peer record.
  */
-void rxrpc_put_peer(struct rxrpc_peer *peer)
+void __rxrpc_put_peer(struct rxrpc_peer *peer)
 {
-	_enter("%p{u=%d}", peer, atomic_read(&peer->usage));
+	ASSERT(list_empty(&peer->error_targets));
 
-	ASSERTCMP(atomic_read(&peer->usage), >, 0);
-
-	if (likely(!atomic_dec_and_test(&peer->usage))) {
-		_leave(" [in use]");
-		return;
-	}
-
-	rxrpc_queue_work(&peer->destroyer);
-	_leave("");
-}
-
-/*
- * destroy a remote transport endpoint
- */
-static void rxrpc_destroy_peer(struct work_struct *work)
-{
-	struct rxrpc_peer *peer =
-		container_of(work, struct rxrpc_peer, destroyer);
-
-	_enter("%p{%d}", peer, atomic_read(&peer->usage));
-
-	write_lock_bh(&rxrpc_peer_lock);
-	list_del(&peer->link);
-	write_unlock_bh(&rxrpc_peer_lock);
-
-	_net("DESTROY PEER %d", peer->debug_id);
-	kfree(peer);
-
-	if (list_empty(&rxrpc_peers))
-		wake_up_all(&rxrpc_peer_wq);
-	_leave("");
-}
-
-/*
- * preemptively destroy all the peer records from a transport endpoint rather
- * than waiting for them to time out
- */
-void __exit rxrpc_destroy_all_peers(void)
-{
-	DECLARE_WAITQUEUE(myself,current);
-
-	_enter("");
-
-	/* we simply have to wait for them to go away */
-	if (!list_empty(&rxrpc_peers)) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		add_wait_queue(&rxrpc_peer_wq, &myself);
-
-		while (!list_empty(&rxrpc_peers)) {
-			schedule();
-			set_current_state(TASK_UNINTERRUPTIBLE);
-		}
-
-		remove_wait_queue(&rxrpc_peer_wq, &myself);
-		set_current_state(TASK_RUNNING);
-	}
+	spin_lock(&rxrpc_peer_hash_lock);
+	hash_del_rcu(&peer->hash_link);
+	spin_unlock(&rxrpc_peer_hash_lock);
 
-	_leave("");
+	kfree_rcu(peer, rcu);
 }
diff --git a/net/rxrpc/transport.c b/net/rxrpc/transport.c
index a1b65183b07d..d33387dec0ce 100644
--- a/net/rxrpc/transport.c
+++ b/net/rxrpc/transport.c
@@ -121,7 +121,7 @@ struct rxrpc_transport *rxrpc_get_transport(struct rxrpc_local *local,
 	usage = atomic_read(&trans->usage);
 
 	rxrpc_get_local(trans->local);
-	atomic_inc(&trans->peer->usage);
+	rxrpc_get_peer(trans->peer);
 	list_add_tail(&trans->link, &rxrpc_transports);
 	write_unlock_bh(&rxrpc_transport_lock);
 	new = "new";
diff --git a/net/rxrpc/utils.c b/net/rxrpc/utils.c
new file mode 100644
index 000000000000..f28122a15a24
--- /dev/null
+++ b/net/rxrpc/utils.c
@@ -0,0 +1,41 @@
+/* Utility routines
+ *
+ * Copyright (C) 2015 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include "ar-internal.h"
+
+/*
+ * Set up an RxRPC address from a socket buffer.
+ */
+void rxrpc_get_addr_from_skb(struct rxrpc_local *local,
+			     const struct sk_buff *skb,
+			     struct sockaddr_rxrpc *srx)
+{
+	memset(srx, 0, sizeof(*srx));
+	srx->transport_type = local->srx.transport_type;
+	srx->transport.family = local->srx.transport.family;
+
+	/* Can we see an ipv4 UDP packet on an ipv6 UDP socket?  and vice
+	 * versa?
+	 */
+	switch (srx->transport.family) {
+	case AF_INET:
+		srx->transport.sin.sin_port = udp_hdr(skb)->source;
+		srx->transport_len = sizeof(struct sockaddr_in);
+		memcpy(&srx->transport.sin.sin_addr, &ip_hdr(skb)->saddr,
+		       sizeof(struct in_addr));
+		break;
+
+	default:
+		BUG();
+	}
+}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 2/8] rxrpc: Rename rxrpc_UDP_error_report() to rxrpc_error_report()
  2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
  2016-06-15 16:41 ` [PATCH net-next 1/8] rxrpc: Rework peer object handling to use hash table and RCU David Howells
@ 2016-06-15 16:41 ` David Howells
  2016-06-15 16:41 ` [PATCH net-next 3/8] rxrpc: Break MTU determination from ICMP into its own function David Howells
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2016-06-15 16:41 UTC (permalink / raw)
  To: davem; +Cc: dhowells, netdev, linux-afs, linux-kernel

Rename rxrpc_UDP_error_report() to rxrpc_error_report() as it might get
called for something other than UDP.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/ar-internal.h  |    4 ++--
 net/rxrpc/local_object.c |    2 +-
 net/rxrpc/peer_event.c   |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 7dba6677b9d5..1e5c15632f49 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -606,9 +606,9 @@ int rxrpc_send_packet(struct rxrpc_transport *, struct sk_buff *);
 int rxrpc_do_sendmsg(struct rxrpc_sock *, struct msghdr *, size_t);
 
 /*
- * peer_error.c
+ * peer_event.c
  */
-void rxrpc_UDP_error_report(struct sock *);
+void rxrpc_error_report(struct sock *);
 void rxrpc_UDP_error_handler(struct work_struct *);
 
 /*
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 111f250b045f..28f9efb3118f 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -120,7 +120,7 @@ static int rxrpc_create_local(struct rxrpc_local *local)
 	sock = local->socket->sk;
 	sock->sk_user_data	= local;
 	sock->sk_data_ready	= rxrpc_data_ready;
-	sock->sk_error_report	= rxrpc_UDP_error_report;
+	sock->sk_error_report	= rxrpc_error_report;
 	_leave(" = 0");
 	return 0;
 
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 24f5ec0fcd20..2c2df3a5d1b9 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -74,7 +74,7 @@ static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
 /*
  * handle an error received on the local endpoint
  */
-void rxrpc_UDP_error_report(struct sock *sk)
+void rxrpc_error_report(struct sock *sk)
 {
 	struct sock_exterr_skb *serr;
 	struct rxrpc_transport *trans;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 3/8] rxrpc: Break MTU determination from ICMP into its own function
  2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
  2016-06-15 16:41 ` [PATCH net-next 1/8] rxrpc: Rework peer object handling to use hash table and RCU David Howells
  2016-06-15 16:41 ` [PATCH net-next 2/8] rxrpc: Rename rxrpc_UDP_error_report() to rxrpc_error_report() David Howells
@ 2016-06-15 16:41 ` David Howells
  2016-06-15 16:42 ` [PATCH net-next 4/8] rxrpc: Don't assume anything about the address in an ICMP packet David Howells
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2016-06-15 16:41 UTC (permalink / raw)
  To: davem; +Cc: dhowells, netdev, linux-afs, linux-kernel

Break MTU determination from ICMP out into its own function to reduce the
complexity of the error report handler.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/peer_event.c |   93 ++++++++++++++++++++++++++++--------------------
 1 file changed, 54 insertions(+), 39 deletions(-)

diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 2c2df3a5d1b9..80de84257227 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -72,6 +72,45 @@ static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
 }
 
 /*
+ * Handle an MTU/fragmentation problem.
+ */
+static void rxrpc_adjust_mtu(struct rxrpc_peer *peer, struct sock_exterr_skb *serr)
+{
+	u32 mtu = serr->ee.ee_info;
+
+	_net("Rx ICMP Fragmentation Needed (%d)", mtu);
+
+	/* wind down the local interface MTU */
+	if (mtu > 0 && peer->if_mtu == 65535 && mtu < peer->if_mtu) {
+		peer->if_mtu = mtu;
+		_net("I/F MTU %u", mtu);
+	}
+
+	if (mtu == 0) {
+		/* they didn't give us a size, estimate one */
+		mtu = peer->if_mtu;
+		if (mtu > 1500) {
+			mtu >>= 1;
+			if (mtu < 1500)
+				mtu = 1500;
+		} else {
+			mtu -= 100;
+			if (mtu < peer->hdrsize)
+				mtu = peer->hdrsize + 4;
+		}
+	}
+
+	if (mtu < peer->mtu) {
+		spin_lock_bh(&peer->lock);
+		peer->mtu = mtu;
+		peer->maxdata = peer->mtu - peer->hdrsize;
+		spin_unlock_bh(&peer->lock);
+		_net("Net MTU %u (maxdata %u)",
+		     peer->mtu, peer->maxdata);
+	}
+}
+
+/*
  * handle an error received on the local endpoint
  */
 void rxrpc_error_report(struct sock *sk)
@@ -126,50 +165,26 @@ void rxrpc_error_report(struct sock *sk)
 		return;
 	}
 
-	if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP &&
-	    serr->ee.ee_type == ICMP_DEST_UNREACH &&
-	    serr->ee.ee_code == ICMP_FRAG_NEEDED
-	    ) {
-		u32 mtu = serr->ee.ee_info;
-
-		_net("Rx Received ICMP Fragmentation Needed (%d)", mtu);
-
-		/* wind down the local interface MTU */
-		if (mtu > 0 && peer->if_mtu == 65535 && mtu < peer->if_mtu) {
-			peer->if_mtu = mtu;
-			_net("I/F MTU %u", mtu);
-		}
-
-		if (mtu == 0) {
-			/* they didn't give us a size, estimate one */
-			mtu = peer->if_mtu;
-			if (mtu > 1500) {
-				mtu >>= 1;
-				if (mtu < 1500)
-					mtu = 1500;
-			} else {
-				mtu -= 100;
-				if (mtu < peer->hdrsize)
-					mtu = peer->hdrsize + 4;
-			}
-		}
-
-		if (mtu < peer->mtu) {
-			spin_lock_bh(&peer->lock);
-			peer->mtu = mtu;
-			peer->maxdata = peer->mtu - peer->hdrsize;
-			spin_unlock_bh(&peer->lock);
-			_net("Net MTU %u (maxdata %u)",
-			     peer->mtu, peer->maxdata);
-		}
+	if ((serr->ee.ee_origin == SO_EE_ORIGIN_ICMP &&
+	     serr->ee.ee_type == ICMP_DEST_UNREACH &&
+	     serr->ee.ee_code == ICMP_FRAG_NEEDED)) {
+		rxrpc_adjust_mtu(peer, serr);
+		rxrpc_free_skb(skb);
+		skb = NULL;
+		goto out;
 	}
 
+out:
 	rcu_read_unlock();
 	rxrpc_put_peer(peer);
 
-	/* pass the transport ref to error_handler to release */
-	skb_queue_tail(&trans->error_queue, skb);
-	rxrpc_queue_work(&trans->error_handler);
+	if (skb) {
+		/* pass the transport ref to error_handler to release */
+		skb_queue_tail(&trans->error_queue, skb);
+		rxrpc_queue_work(&trans->error_handler);
+	} else {
+		rxrpc_put_transport(trans);
+	}
 	_leave("");
 }
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 4/8] rxrpc: Don't assume anything about the address in an ICMP packet
  2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
                   ` (2 preceding siblings ...)
  2016-06-15 16:41 ` [PATCH net-next 3/8] rxrpc: Break MTU determination from ICMP into its own function David Howells
@ 2016-06-15 16:42 ` David Howells
  2016-06-15 16:42 ` [PATCH net-next 5/8] rxrpc: Do a little bit of tidying in the ICMP processing David Howells
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2016-06-15 16:42 UTC (permalink / raw)
  To: davem; +Cc: dhowells, netdev, linux-afs, linux-kernel

Don't assume anything about the address in an ICMP packet in
rxrpc_error_report() as the address may not be IPv4 in future, especially
since we're just printing these details.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/peer_event.c |    8 --------
 1 file changed, 8 deletions(-)

diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 80de84257227..6ba798d6659e 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -120,8 +120,6 @@ void rxrpc_error_report(struct sock *sk)
 	struct rxrpc_local *local = sk->sk_user_data;
 	struct rxrpc_peer *peer;
 	struct sk_buff *skb;
-	__be32 addr;
-	__be16 port;
 
 	_enter("%p{%d}", sk, local->debug_id);
 
@@ -139,12 +137,6 @@ void rxrpc_error_report(struct sock *sk)
 
 	rxrpc_new_skb(skb);
 
-	addr = *(__be32 *)(skb_network_header(skb) + serr->addr_offset);
-	port = serr->port;
-
-	_net("Rx UDP Error from %pI4:%hu", &addr, ntohs(port));
-	_debug("Msg l:%d d:%d", skb->len, skb->data_len);
-
 	rcu_read_lock();
 	peer = rxrpc_lookup_peer_icmp_rcu(local, skb);
 	if (peer && !rxrpc_get_peer_maybe(peer))

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 5/8] rxrpc: Do a little bit of tidying in the ICMP processing
  2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
                   ` (3 preceding siblings ...)
  2016-06-15 16:42 ` [PATCH net-next 4/8] rxrpc: Don't assume anything about the address in an ICMP packet David Howells
@ 2016-06-15 16:42 ` David Howells
  2016-06-15 16:42 ` [PATCH net-next 6/8] rxrpc: Use the peer record to distribute network errors David Howells
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2016-06-15 16:42 UTC (permalink / raw)
  To: davem; +Cc: dhowells, netdev, linux-afs, linux-kernel

Do a little bit of tidying in the ICMP processing code.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/peer_event.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 6ba798d6659e..31c440acd8c9 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -245,15 +245,13 @@ void rxrpc_UDP_error_handler(struct work_struct *work)
 		break;
 
 	case SO_EE_ORIGIN_LOCAL:
-		_proto("Rx Received local error { error=%d }",
-		       ee->ee_errno);
+		_proto("Rx Received local error { error=%d }", err);
 		break;
 
 	case SO_EE_ORIGIN_NONE:
 	case SO_EE_ORIGIN_ICMP6:
 	default:
-		_proto("Rx Received error report { orig=%u }",
-		       ee->ee_origin);
+		_proto("Rx Received error report { orig=%u }", ee->ee_origin);
 		break;
 	}
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 6/8] rxrpc: Use the peer record to distribute network errors
  2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
                   ` (4 preceding siblings ...)
  2016-06-15 16:42 ` [PATCH net-next 5/8] rxrpc: Do a little bit of tidying in the ICMP processing David Howells
@ 2016-06-15 16:42 ` David Howells
  2016-06-15 16:42 ` [PATCH net-next 7/8] rxrpc: Separate local endpoint event handling out into its own file David Howells
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2016-06-15 16:42 UTC (permalink / raw)
  To: davem; +Cc: dhowells, netdev, linux-afs, linux-kernel

Use the peer record to distribute network errors rather than the transport
object (which I want to get rid of).  An error from a particular peer
terminates all calls on that peer.

For future consideration:

 (1) For ICMP-induced errors it might be worth trying to extract the RxRPC
     header from the offending packet, if one is returned attached to the
     ICMP packet, to better direct the error.

     This may be overkill, though, since an ICMP packet would be expected
     to be relating to the destination port, machine or network.  RxRPC
     ABORT and BUSY packets give notice at RxRPC level.

 (2) To also abort connection-level communications (such as CHALLENGE
     packets) where indicted by an error - but that requires some revamping
     of the connection event handling first.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/ar-internal.h |   16 +++----
 net/rxrpc/call_event.c  |   15 +++++-
 net/rxrpc/call_object.c |    6 +--
 net/rxrpc/output.c      |    4 +-
 net/rxrpc/peer_event.c  |  109 ++++++++++++++++++++++-------------------------
 net/rxrpc/peer_object.c |    6 ++-
 net/rxrpc/transport.c   |   17 -------
 7 files changed, 79 insertions(+), 94 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 1e5c15632f49..a63bb7518fb5 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -189,7 +189,6 @@ struct rxrpc_local {
 	rwlock_t		services_lock;	/* lock for services list */
 	atomic_t		usage;
 	int			debug_id;	/* debug ID for printks */
-	volatile char		error_rcvd;	/* T if received ICMP error outstanding */
 	struct sockaddr_rxrpc	srx;		/* local address */
 };
 
@@ -203,14 +202,16 @@ struct rxrpc_peer {
 	unsigned long		hash_key;
 	struct hlist_node	hash_link;
 	struct rxrpc_local	*local;
-	struct list_head	error_targets;	/* targets for net error distribution */
+	struct hlist_head	error_targets;	/* targets for net error distribution */
+	struct work_struct	error_distributor;
 	spinlock_t		lock;		/* access lock */
 	unsigned int		if_mtu;		/* interface MTU for this peer */
 	unsigned int		mtu;		/* network MTU for this peer */
 	unsigned int		maxdata;	/* data size (MTU - hdrsize) */
 	unsigned short		hdrsize;	/* header size (IP + UDP + RxRPC) */
 	int			debug_id;	/* debug ID for printks */
-	int			net_error;	/* network error distributed */
+	int			error_report;	/* Net (+0) or local (+1000000) to distribute */
+#define RXRPC_LOCAL_ERROR_OFFSET 1000000
 	struct sockaddr_rxrpc	srx;		/* remote address */
 
 	/* calculated RTT cache */
@@ -229,12 +230,10 @@ struct rxrpc_peer {
 struct rxrpc_transport {
 	struct rxrpc_local	*local;		/* local transport endpoint */
 	struct rxrpc_peer	*peer;		/* remote transport endpoint */
-	struct work_struct	error_handler;	/* network error distributor */
 	struct rb_root		bundles;	/* client connection bundles on this transport */
 	struct rb_root		client_conns;	/* client connections on this transport */
 	struct rb_root		server_conns;	/* server connections on this transport */
 	struct list_head	link;		/* link in master session list */
-	struct sk_buff_head	error_queue;	/* error packets awaiting processing */
 	unsigned long		put_time;	/* time at which to reap */
 	spinlock_t		client_lock;	/* client connection allocation lock */
 	rwlock_t		conn_lock;	/* lock for active/dead connections */
@@ -393,7 +392,7 @@ struct rxrpc_call {
 	struct work_struct	destroyer;	/* call destroyer */
 	struct work_struct	processor;	/* packet processor and ACK generator */
 	struct list_head	link;		/* link in master call list */
-	struct list_head	error_link;	/* link in error distribution list */
+	struct hlist_node	error_link;	/* link in error distribution list */
 	struct list_head	accept_link;	/* calls awaiting acceptance */
 	struct rb_node		sock_node;	/* node in socket call tree */
 	struct rb_node		conn_node;	/* node in connection call tree */
@@ -411,7 +410,8 @@ struct rxrpc_call {
 	atomic_t		sequence;	/* Tx data packet sequence counter */
 	u32			local_abort;	/* local abort code */
 	u32			remote_abort;	/* remote abort code */
-	int			error;		/* local error incurred */
+	int			error_report;	/* Network error (ICMP/local transport) */
+	int			error;		/* Local error incurred */
 	enum rxrpc_call_state	state : 8;	/* current state of call */
 	int			debug_id;	/* debug ID for printks */
 	u8			channel;	/* connection channel occupied by this call */
@@ -609,7 +609,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *, struct msghdr *, size_t);
  * peer_event.c
  */
 void rxrpc_error_report(struct sock *);
-void rxrpc_UDP_error_handler(struct work_struct *);
+void rxrpc_peer_error_distributor(struct work_struct *);
 
 /*
  * peer_object.c
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 18381783c2b1..e610b106c913 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -864,17 +864,24 @@ void rxrpc_process_call(struct work_struct *work)
 	}
 
 	if (test_bit(RXRPC_CALL_EV_RCVD_ERROR, &call->events)) {
+		enum rxrpc_skb_mark mark;
 		int error;
 
 		clear_bit(RXRPC_CALL_EV_CONN_ABORT, &call->events);
 		clear_bit(RXRPC_CALL_EV_REJECT_BUSY, &call->events);
 		clear_bit(RXRPC_CALL_EV_ABORT, &call->events);
 
-		error = call->conn->trans->peer->net_error;
-		_debug("post net error %d", error);
+		error = call->error_report;
+		if (error < RXRPC_LOCAL_ERROR_OFFSET) {
+			mark = RXRPC_SKB_MARK_NET_ERROR;
+			_debug("post net error %d", error);
+		} else {
+			mark = RXRPC_SKB_MARK_LOCAL_ERROR;
+			error -= RXRPC_LOCAL_ERROR_OFFSET;
+			_debug("post net local error %d", error);
+		}
 
-		if (rxrpc_post_message(call, RXRPC_SKB_MARK_NET_ERROR,
-				       error, true) < 0)
+		if (rxrpc_post_message(call, mark, error, true) < 0)
 			goto no_mem;
 		clear_bit(RXRPC_CALL_EV_RCVD_ERROR, &call->events);
 		goto kill_ACKs;
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 68125dc4cb7c..8b4d47b3ccac 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -334,7 +334,7 @@ static struct rxrpc_call *rxrpc_alloc_client_call(
 	rxrpc_call_hash_add(call);
 
 	spin_lock(&call->conn->trans->peer->lock);
-	list_add(&call->error_link, &call->conn->trans->peer->error_targets);
+	hlist_add_head(&call->error_link, &call->conn->trans->peer->error_targets);
 	spin_unlock(&call->conn->trans->peer->lock);
 
 	call->lifetimer.expires = jiffies + rxrpc_max_call_lifetime;
@@ -516,7 +516,7 @@ struct rxrpc_call *rxrpc_incoming_call(struct rxrpc_sock *rx,
 	write_unlock_bh(&conn->lock);
 
 	spin_lock(&conn->trans->peer->lock);
-	list_add(&call->error_link, &conn->trans->peer->error_targets);
+	hlist_add_head(&call->error_link, &conn->trans->peer->error_targets);
 	spin_unlock(&conn->trans->peer->lock);
 
 	write_lock_bh(&rxrpc_call_lock);
@@ -812,7 +812,7 @@ static void rxrpc_cleanup_call(struct rxrpc_call *call)
 
 	if (call->conn) {
 		spin_lock(&call->conn->trans->peer->lock);
-		list_del(&call->error_link);
+		hlist_del_init(&call->error_link);
 		spin_unlock(&call->conn->trans->peer->lock);
 
 		write_lock_bh(&call->conn->lock);
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 2e3c4064e29c..e6fb3863b0bc 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -707,7 +707,9 @@ out:
 call_aborted:
 	rxrpc_free_skb(skb);
 	if (call->state == RXRPC_CALL_NETWORK_ERROR)
-		ret = call->conn->trans->peer->net_error;
+		ret = call->error_report < RXRPC_LOCAL_ERROR_OFFSET ?
+			call->error_report :
+			call->error_report - RXRPC_LOCAL_ERROR_OFFSET;
 	else
 		ret = -ECONNABORTED;
 	_leave(" = %d", ret);
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 31c440acd8c9..8940674b5e08 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -1,4 +1,4 @@
-/* Error message handling (ICMP)
+/* Peer event handling, typically ICMP messages.
  *
  * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
@@ -22,6 +22,8 @@
 #include <net/ip.h>
 #include "ar-internal.h"
 
+static void rxrpc_store_error(struct rxrpc_peer *, struct sock_exterr_skb *);
+
 /*
  * Find the peer associated with an ICMP packet.
  */
@@ -111,12 +113,11 @@ static void rxrpc_adjust_mtu(struct rxrpc_peer *peer, struct sock_exterr_skb *se
 }
 
 /*
- * handle an error received on the local endpoint
+ * Handle an error received on the local endpoint.
  */
 void rxrpc_error_report(struct sock *sk)
 {
 	struct sock_exterr_skb *serr;
-	struct rxrpc_transport *trans;
 	struct rxrpc_local *local = sk->sk_user_data;
 	struct rxrpc_peer *peer;
 	struct sk_buff *skb;
@@ -148,57 +149,37 @@ void rxrpc_error_report(struct sock *sk)
 		return;
 	}
 
-	trans = rxrpc_find_transport(local, peer);
-	if (!trans) {
-		rcu_read_unlock();
-		rxrpc_put_peer(peer);
-		rxrpc_free_skb(skb);
-		_leave(" [no trans]");
-		return;
-	}
-
 	if ((serr->ee.ee_origin == SO_EE_ORIGIN_ICMP &&
 	     serr->ee.ee_type == ICMP_DEST_UNREACH &&
 	     serr->ee.ee_code == ICMP_FRAG_NEEDED)) {
 		rxrpc_adjust_mtu(peer, serr);
+		rcu_read_unlock();
 		rxrpc_free_skb(skb);
-		skb = NULL;
-		goto out;
+		rxrpc_put_peer(peer);
+		_leave(" [MTU update]");
+		return;
 	}
 
-out:
+	rxrpc_store_error(peer, serr);
 	rcu_read_unlock();
-	rxrpc_put_peer(peer);
+	rxrpc_free_skb(skb);
 
-	if (skb) {
-		/* pass the transport ref to error_handler to release */
-		skb_queue_tail(&trans->error_queue, skb);
-		rxrpc_queue_work(&trans->error_handler);
-	} else {
-		rxrpc_put_transport(trans);
-	}
+	/* The ref we obtained is passed off to the work item */
+	rxrpc_queue_work(&peer->error_distributor);
 	_leave("");
 }
 
 /*
- * deal with UDP error messages
+ * Map an error report to error codes on the peer record.
  */
-void rxrpc_UDP_error_handler(struct work_struct *work)
+static void rxrpc_store_error(struct rxrpc_peer *peer,
+			      struct sock_exterr_skb *serr)
 {
 	struct sock_extended_err *ee;
-	struct sock_exterr_skb *serr;
-	struct rxrpc_transport *trans =
-		container_of(work, struct rxrpc_transport, error_handler);
-	struct sk_buff *skb;
 	int err;
 
 	_enter("");
 
-	skb = skb_dequeue(&trans->error_queue);
-	if (!skb)
-		return;
-
-	serr = SKB_EXT_ERR(skb);
 	ee = &serr->ee;
 
 	_net("Rx Error o=%d t=%d c=%d e=%d",
@@ -244,47 +225,57 @@ void rxrpc_UDP_error_handler(struct work_struct *work)
 		}
 		break;
 
+	case SO_EE_ORIGIN_NONE:
 	case SO_EE_ORIGIN_LOCAL:
 		_proto("Rx Received local error { error=%d }", err);
+		err += RXRPC_LOCAL_ERROR_OFFSET;
 		break;
 
-	case SO_EE_ORIGIN_NONE:
 	case SO_EE_ORIGIN_ICMP6:
 	default:
 		_proto("Rx Received error report { orig=%u }", ee->ee_origin);
 		break;
 	}
 
-	/* terminate all the affected calls if there's an unrecoverable
-	 * error */
-	if (err) {
-		struct rxrpc_call *call, *_n;
+	peer->error_report = err;
+}
+
+/*
+ * Distribute an error that occurred on a peer
+ */
+void rxrpc_peer_error_distributor(struct work_struct *work)
+{
+	struct rxrpc_peer *peer =
+		container_of(work, struct rxrpc_peer, error_distributor);
+	struct rxrpc_call *call;
+	int error_report;
+
+	_enter("");
 
-		_debug("ISSUE ERROR %d", err);
+	error_report = READ_ONCE(peer->error_report);
 
-		spin_lock_bh(&trans->peer->lock);
-		trans->peer->net_error = err;
+	_debug("ISSUE ERROR %d", error_report);
 
-		list_for_each_entry_safe(call, _n, &trans->peer->error_targets,
-					 error_link) {
-			write_lock(&call->state_lock);
-			if (call->state != RXRPC_CALL_COMPLETE &&
-			    call->state < RXRPC_CALL_NETWORK_ERROR) {
-				call->state = RXRPC_CALL_NETWORK_ERROR;
-				set_bit(RXRPC_CALL_EV_RCVD_ERROR, &call->events);
-				rxrpc_queue_call(call);
-			}
-			write_unlock(&call->state_lock);
-			list_del_init(&call->error_link);
-		}
+	spin_lock_bh(&peer->lock);
 
-		spin_unlock_bh(&trans->peer->lock);
+	while (!hlist_empty(&peer->error_targets)) {
+		call = hlist_entry(peer->error_targets.first,
+				   struct rxrpc_call, error_link);
+		hlist_del_init(&call->error_link);
+
+		write_lock(&call->state_lock);
+		if (call->state != RXRPC_CALL_COMPLETE &&
+		    call->state < RXRPC_CALL_NETWORK_ERROR) {
+			call->error_report = error_report;
+			call->state = RXRPC_CALL_NETWORK_ERROR;
+			set_bit(RXRPC_CALL_EV_RCVD_ERROR, &call->events);
+			rxrpc_queue_call(call);
+		}
+		write_unlock(&call->state_lock);
 	}
 
-	if (!skb_queue_empty(&trans->error_queue))
-		rxrpc_queue_work(&trans->error_handler);
+	spin_unlock_bh(&peer->lock);
 
-	rxrpc_free_skb(skb);
-	rxrpc_put_transport(trans);
+	rxrpc_put_peer(peer);
 	_leave("");
 }
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 7fc50dc7d333..faf222c21698 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -182,7 +182,9 @@ struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp)
 	if (peer) {
 		atomic_set(&peer->usage, 1);
 		peer->local = local;
-		INIT_LIST_HEAD(&peer->error_targets);
+		INIT_HLIST_HEAD(&peer->error_targets);
+		INIT_WORK(&peer->error_distributor,
+			  &rxrpc_peer_error_distributor);
 		spin_lock_init(&peer->lock);
 		peer->debug_id = atomic_inc_return(&rxrpc_debug_id);
 	}
@@ -298,7 +300,7 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local,
  */
 void __rxrpc_put_peer(struct rxrpc_peer *peer)
 {
-	ASSERT(list_empty(&peer->error_targets));
+	ASSERT(hlist_empty(&peer->error_targets));
 
 	spin_lock(&rxrpc_peer_hash_lock);
 	hash_del_rcu(&peer->hash_link);
diff --git a/net/rxrpc/transport.c b/net/rxrpc/transport.c
index d33387dec0ce..24c71218a6f8 100644
--- a/net/rxrpc/transport.c
+++ b/net/rxrpc/transport.c
@@ -49,26 +49,11 @@ static struct rxrpc_transport *rxrpc_alloc_transport(struct rxrpc_local *local,
 		trans->bundles = RB_ROOT;
 		trans->client_conns = RB_ROOT;
 		trans->server_conns = RB_ROOT;
-		skb_queue_head_init(&trans->error_queue);
 		spin_lock_init(&trans->client_lock);
 		rwlock_init(&trans->conn_lock);
 		atomic_set(&trans->usage, 1);
 		trans->conn_idcounter = peer->srx.srx_service << 16;
 		trans->debug_id = atomic_inc_return(&rxrpc_debug_id);
-
-		if (peer->srx.transport.family == AF_INET) {
-			switch (peer->srx.transport_type) {
-			case SOCK_DGRAM:
-				INIT_WORK(&trans->error_handler,
-					  rxrpc_UDP_error_handler);
-				break;
-			default:
-				BUG();
-				break;
-			}
-		} else {
-			BUG();
-		}
 	}
 
 	_leave(" = %p", trans);
@@ -210,8 +195,6 @@ static void rxrpc_cleanup_transport(struct rxrpc_transport *trans)
 {
 	_net("DESTROY TRANS %d", trans->debug_id);
 
-	rxrpc_purge_queue(&trans->error_queue);
-
 	rxrpc_put_local(trans->local);
 	rxrpc_put_peer(trans->peer);
 	kfree(trans);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 7/8] rxrpc: Separate local endpoint event handling out into its own file
  2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
                   ` (5 preceding siblings ...)
  2016-06-15 16:42 ` [PATCH net-next 6/8] rxrpc: Use the peer record to distribute network errors David Howells
@ 2016-06-15 16:42 ` David Howells
  2016-06-15 16:42 ` [PATCH net-next 8/8] rxrpc: Rework local endpoint management David Howells
  2016-06-16  5:25 ` [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Miller
  8 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2016-06-15 16:42 UTC (permalink / raw)
  To: davem; +Cc: dhowells, netdev, linux-afs, linux-kernel

Separate local endpoint event handling out into its own file preparatory to
overhauling the object management aspect (which remains in the original
file).

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/Makefile       |    1 
 net/rxrpc/ar-internal.h  |    5 ++
 net/rxrpc/local_event.c  |  120 ++++++++++++++++++++++++++++++++++++++++++++++
 net/rxrpc/local_object.c |  105 +---------------------------------------
 4 files changed, 129 insertions(+), 102 deletions(-)
 create mode 100644 net/rxrpc/local_event.c

diff --git a/net/rxrpc/Makefile b/net/rxrpc/Makefile
index a6f6f21d8a59..b005027f80cf 100644
--- a/net/rxrpc/Makefile
+++ b/net/rxrpc/Makefile
@@ -12,6 +12,7 @@ af-rxrpc-y := \
 	input.o \
 	insecure.o \
 	key.o \
+	local_event.o \
 	local_object.o \
 	misc.o \
 	output.o \
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index a63bb7518fb5..fa50b09eaa63 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -573,6 +573,11 @@ int rxrpc_get_server_data_key(struct rxrpc_connection *, const void *, time_t,
 			      u32);
 
 /*
+ * local_event.c
+ */
+extern void rxrpc_process_local_events(struct work_struct *);
+
+/*
  * local_object.c
  */
 extern rwlock_t rxrpc_local_lock;
diff --git a/net/rxrpc/local_event.c b/net/rxrpc/local_event.c
new file mode 100644
index 000000000000..194db2e6d548
--- /dev/null
+++ b/net/rxrpc/local_event.c
@@ -0,0 +1,120 @@
+/* AF_RXRPC local endpoint management
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/net.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/udp.h>
+#include <linux/ip.h>
+#include <net/sock.h>
+#include <net/af_rxrpc.h>
+#include <generated/utsrelease.h>
+#include "ar-internal.h"
+
+static const char rxrpc_version_string[65] = "linux-" UTS_RELEASE " AF_RXRPC";
+
+/*
+ * Reply to a version request
+ */
+static void rxrpc_send_version_request(struct rxrpc_local *local,
+				       struct rxrpc_host_header *hdr,
+				       struct sk_buff *skb)
+{
+	struct rxrpc_wire_header whdr;
+	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+	struct sockaddr_in sin;
+	struct msghdr msg;
+	struct kvec iov[2];
+	size_t len;
+	int ret;
+
+	_enter("");
+
+	sin.sin_family = AF_INET;
+	sin.sin_port = udp_hdr(skb)->source;
+	sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
+
+	msg.msg_name	= &sin;
+	msg.msg_namelen	= sizeof(sin);
+	msg.msg_control	= NULL;
+	msg.msg_controllen = 0;
+	msg.msg_flags	= 0;
+
+	whdr.epoch	= htonl(sp->hdr.epoch);
+	whdr.cid	= htonl(sp->hdr.cid);
+	whdr.callNumber	= htonl(sp->hdr.callNumber);
+	whdr.seq	= 0;
+	whdr.serial	= 0;
+	whdr.type	= RXRPC_PACKET_TYPE_VERSION;
+	whdr.flags	= RXRPC_LAST_PACKET | (~hdr->flags & RXRPC_CLIENT_INITIATED);
+	whdr.userStatus	= 0;
+	whdr.securityIndex = 0;
+	whdr._rsvd	= 0;
+	whdr.serviceId	= htons(sp->hdr.serviceId);
+
+	iov[0].iov_base	= &whdr;
+	iov[0].iov_len	= sizeof(whdr);
+	iov[1].iov_base	= (char *)rxrpc_version_string;
+	iov[1].iov_len	= sizeof(rxrpc_version_string);
+
+	len = iov[0].iov_len + iov[1].iov_len;
+
+	_proto("Tx VERSION (reply)");
+
+	ret = kernel_sendmsg(local->socket, &msg, iov, 2, len);
+	if (ret < 0)
+		_debug("sendmsg failed: %d", ret);
+
+	_leave("");
+}
+
+/*
+ * Process event packets targetted at a local endpoint.
+ */
+void rxrpc_process_local_events(struct work_struct *work)
+{
+	struct rxrpc_local *local = container_of(work, struct rxrpc_local, event_processor);
+	struct sk_buff *skb;
+	char v;
+
+	_enter("");
+
+	atomic_inc(&local->usage);
+
+	while ((skb = skb_dequeue(&local->event_queue))) {
+		struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+
+		_debug("{%d},{%u}", local->debug_id, sp->hdr.type);
+
+		switch (sp->hdr.type) {
+		case RXRPC_PACKET_TYPE_VERSION:
+			if (skb_copy_bits(skb, 0, &v, 1) < 0)
+				return;
+			_proto("Rx VERSION { %02x }", v);
+			if (v == 0)
+				rxrpc_send_version_request(local, &sp->hdr, skb);
+			break;
+
+		default:
+			/* Just ignore anything we don't understand */
+			break;
+		}
+
+		rxrpc_put_local(local);
+		rxrpc_free_skb(skb);
+	}
+
+	rxrpc_put_local(local);
+	_leave("");
+}
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 28f9efb3118f..c1b8d745bf5e 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -1,12 +1,12 @@
-/* AF_RXRPC local endpoint management
+/* Local endpoint object management
  *
  * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
  * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
+ * modify it under the terms of the GNU General Public Licence
  * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
+ * 2 of the Licence, or (at your option) any later version.
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -19,18 +19,14 @@
 #include <linux/ip.h>
 #include <net/sock.h>
 #include <net/af_rxrpc.h>
-#include <generated/utsrelease.h>
 #include "ar-internal.h"
 
-static const char rxrpc_version_string[65] = "linux-" UTS_RELEASE " AF_RXRPC";
-
 static LIST_HEAD(rxrpc_locals);
 DEFINE_RWLOCK(rxrpc_local_lock);
 static DECLARE_RWSEM(rxrpc_local_sem);
 static DECLARE_WAIT_QUEUE_HEAD(rxrpc_local_wq);
 
 static void rxrpc_destroy_local(struct work_struct *work);
-static void rxrpc_process_local_events(struct work_struct *work);
 
 /*
  * allocate a new local
@@ -320,98 +316,3 @@ void __exit rxrpc_destroy_all_locals(void)
 
 	_leave("");
 }
-
-/*
- * Reply to a version request
- */
-static void rxrpc_send_version_request(struct rxrpc_local *local,
-				       struct rxrpc_host_header *hdr,
-				       struct sk_buff *skb)
-{
-	struct rxrpc_wire_header whdr;
-	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
-	struct sockaddr_in sin;
-	struct msghdr msg;
-	struct kvec iov[2];
-	size_t len;
-	int ret;
-
-	_enter("");
-
-	sin.sin_family = AF_INET;
-	sin.sin_port = udp_hdr(skb)->source;
-	sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
-
-	msg.msg_name	= &sin;
-	msg.msg_namelen	= sizeof(sin);
-	msg.msg_control	= NULL;
-	msg.msg_controllen = 0;
-	msg.msg_flags	= 0;
-
-	whdr.epoch	= htonl(sp->hdr.epoch);
-	whdr.cid	= htonl(sp->hdr.cid);
-	whdr.callNumber	= htonl(sp->hdr.callNumber);
-	whdr.seq	= 0;
-	whdr.serial	= 0;
-	whdr.type	= RXRPC_PACKET_TYPE_VERSION;
-	whdr.flags	= RXRPC_LAST_PACKET | (~hdr->flags & RXRPC_CLIENT_INITIATED);
-	whdr.userStatus	= 0;
-	whdr.securityIndex = 0;
-	whdr._rsvd	= 0;
-	whdr.serviceId	= htons(sp->hdr.serviceId);
-
-	iov[0].iov_base	= &whdr;
-	iov[0].iov_len	= sizeof(whdr);
-	iov[1].iov_base	= (char *)rxrpc_version_string;
-	iov[1].iov_len	= sizeof(rxrpc_version_string);
-
-	len = iov[0].iov_len + iov[1].iov_len;
-
-	_proto("Tx VERSION (reply)");
-
-	ret = kernel_sendmsg(local->socket, &msg, iov, 2, len);
-	if (ret < 0)
-		_debug("sendmsg failed: %d", ret);
-
-	_leave("");
-}
-
-/*
- * Process event packets targetted at a local endpoint.
- */
-static void rxrpc_process_local_events(struct work_struct *work)
-{
-	struct rxrpc_local *local = container_of(work, struct rxrpc_local, event_processor);
-	struct sk_buff *skb;
-	char v;
-
-	_enter("");
-
-	atomic_inc(&local->usage);
-
-	while ((skb = skb_dequeue(&local->event_queue))) {
-		struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
-
-		_debug("{%d},{%u}", local->debug_id, sp->hdr.type);
-
-		switch (sp->hdr.type) {
-		case RXRPC_PACKET_TYPE_VERSION:
-			if (skb_copy_bits(skb, 0, &v, 1) < 0)
-				return;
-			_proto("Rx VERSION { %02x }", v);
-			if (v == 0)
-				rxrpc_send_version_request(local, &sp->hdr, skb);
-			break;
-
-		default:
-			/* Just ignore anything we don't understand */
-			break;
-		}
-
-		rxrpc_put_local(local);
-		rxrpc_free_skb(skb);
-	}
-
-	rxrpc_put_local(local);
-	_leave("");
-}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 8/8] rxrpc: Rework local endpoint management
  2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
                   ` (6 preceding siblings ...)
  2016-06-15 16:42 ` [PATCH net-next 7/8] rxrpc: Separate local endpoint event handling out into its own file David Howells
@ 2016-06-15 16:42 ` David Howells
  2016-06-16  5:25 ` [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Miller
  8 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2016-06-15 16:42 UTC (permalink / raw)
  To: davem; +Cc: dhowells, netdev, linux-afs, linux-kernel

Rework the local RxRPC endpoint management.

Local endpoint objects are maintained in a flat list as before.  This
should be okay as there shouldn't be more than one per open AF_RXRPC socket
(there can be fewer as local endpoints can be shared if their local service
ID is 0 and they share the same local transport parameters).

Changes:

 (1) Local endpoints may now only be shared if they have local service ID 0
     (ie. they're not being used for listening).

     This prevents a scenario where process A is listening of the Cache
     Manager port and process B contacts a fileserver - which may then
     attempt to send CM requests back to B.  But if A and B are sharing a
     local endpoint, A will get the CM requests meant for B.

 (2) We use a mutex to handle lookups and don't provide RCU-only lookups
     since we only expect to access the list when opening a socket or
     destroying an endpoint.

     The local endpoint object is pointed to by the transport socket's
     sk_user_data for the life of the transport socket - allowing us to
     refer to it directly from the sk_data_ready and sk_error_report
     callbacks.

 (3) atomic_inc_not_zero() now exists and can be used to only share a local
     endpoint if the last reference hasn't yet gone.

 (4) We can remove rxrpc_local_lock - a spinlock that had to be taken with
     BH processing disabled given that we assume sk_user_data won't change
     under us.

 (5) The transport socket is shut down before we clear the sk_user_data
     pointer so that we can be sure that the transport socket's callbacks
     won't be invoked once the RCU destruction is scheduled.

 (6) Local endpoints have a work item that handles both destruction and
     event processing.  The means that destruction doesn't then need to
     wait for event processing.  The event queues can then be cleared after
     the transport socket is shut down.

 (7) Local endpoints are no longer available for resurrection beyond the
     life of the sockets that had them open.  As soon as their last ref
     goes, they are scheduled for destruction and may not have their usage
     count moved from 0.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/af_rxrpc.c     |   19 ++
 net/rxrpc/ar-internal.h  |   55 ++++---
 net/rxrpc/call_accept.c  |   25 +--
 net/rxrpc/conn_event.c   |   15 --
 net/rxrpc/input.c        |   29 +---
 net/rxrpc/local_event.c  |   10 -
 net/rxrpc/local_object.c |  353 +++++++++++++++++++++++++++-------------------
 7 files changed, 276 insertions(+), 230 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index ba373caddbeb..c83c3c75d665 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -102,6 +102,8 @@ static int rxrpc_validate_address(struct rxrpc_sock *rx,
 
 	switch (srx->transport.family) {
 	case AF_INET:
+		if (srx->transport_len < sizeof(struct sockaddr_in))
+			return -EINVAL;
 		_debug("INET: %x @ %pI4",
 		       ntohs(srx->transport.sin.sin_port),
 		       &srx->transport.sin.sin_addr);
@@ -835,12 +837,27 @@ static void __exit af_rxrpc_exit(void)
 	rxrpc_destroy_all_calls();
 	rxrpc_destroy_all_connections();
 	rxrpc_destroy_all_transports();
-	rxrpc_destroy_all_locals();
 
 	ASSERTCMP(atomic_read(&rxrpc_n_skbs), ==, 0);
 
+	/* We need to flush the scheduled work twice because the local endpoint
+	 * records involve a work item in their destruction as they can only be
+	 * destroyed from process context.  However, a connection may have a
+	 * work item outstanding - and this will pin the local endpoint record
+	 * until the connection goes away.
+	 *
+	 * Peers don't pin locals and calls pin sockets - which prevents the
+	 * module from being unloaded - so we should only need two flushes.
+	 */
 	_debug("flush scheduled work");
 	flush_workqueue(rxrpc_workqueue);
+	_debug("flush scheduled work 2");
+	flush_workqueue(rxrpc_workqueue);
+	_debug("synchronise RCU");
+	rcu_barrier();
+	_debug("destroy locals");
+	rxrpc_destroy_all_locals();
+
 	remove_proc_entry("rxrpc_conns", init_net.proc_net);
 	remove_proc_entry("rxrpc_calls", init_net.proc_net);
 	destroy_workqueue(rxrpc_workqueue);
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index fa50b09eaa63..c168268467cd 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -170,25 +170,26 @@ struct rxrpc_security {
 };
 
 /*
- * RxRPC local transport endpoint definition
- * - matched by local port, address and protocol type
+ * RxRPC local transport endpoint description
+ * - owned by a single AF_RXRPC socket
+ * - pointed to by transport socket struct sk_user_data
  */
 struct rxrpc_local {
+	struct rcu_head		rcu;
+	atomic_t		usage;
+	struct list_head	link;
 	struct socket		*socket;	/* my UDP socket */
-	struct work_struct	destroyer;	/* endpoint destroyer */
-	struct work_struct	acceptor;	/* incoming call processor */
-	struct work_struct	rejecter;	/* packet reject writer */
-	struct work_struct	event_processor; /* endpoint event processor */
+	struct work_struct	processor;
 	struct list_head	services;	/* services listening on this endpoint */
-	struct list_head	link;		/* link in endpoint list */
 	struct rw_semaphore	defrag_sem;	/* control re-enablement of IP DF bit */
 	struct sk_buff_head	accept_queue;	/* incoming calls awaiting acceptance */
 	struct sk_buff_head	reject_queue;	/* packets awaiting rejection */
 	struct sk_buff_head	event_queue;	/* endpoint event packets awaiting processing */
+	struct mutex		conn_lock;	/* Client connection creation lock */
 	spinlock_t		lock;		/* access lock */
 	rwlock_t		services_lock;	/* lock for services list */
-	atomic_t		usage;
 	int			debug_id;	/* debug ID for printks */
+	bool			dead;
 	struct sockaddr_rxrpc	srx;		/* local address */
 };
 
@@ -487,7 +488,7 @@ extern struct rxrpc_transport *rxrpc_name_to_transport(struct rxrpc_sock *,
 /*
  * call_accept.c
  */
-void rxrpc_accept_incoming_calls(struct work_struct *);
+void rxrpc_accept_incoming_calls(struct rxrpc_local *);
 struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *, unsigned long);
 int rxrpc_reject_call(struct rxrpc_sock *);
 
@@ -527,7 +528,7 @@ void __exit rxrpc_destroy_all_calls(void);
  */
 void rxrpc_process_connection(struct work_struct *);
 void rxrpc_reject_packet(struct rxrpc_local *, struct sk_buff *);
-void rxrpc_reject_packets(struct work_struct *);
+void rxrpc_reject_packets(struct rxrpc_local *);
 
 /*
  * conn_object.c
@@ -575,17 +576,32 @@ int rxrpc_get_server_data_key(struct rxrpc_connection *, const void *, time_t,
 /*
  * local_event.c
  */
-extern void rxrpc_process_local_events(struct work_struct *);
+extern void rxrpc_process_local_events(struct rxrpc_local *);
 
 /*
  * local_object.c
  */
-extern rwlock_t rxrpc_local_lock;
-
-struct rxrpc_local *rxrpc_lookup_local(struct sockaddr_rxrpc *);
-void rxrpc_put_local(struct rxrpc_local *);
+struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *);
+void __rxrpc_put_local(struct rxrpc_local *);
 void __exit rxrpc_destroy_all_locals(void);
 
+static inline void rxrpc_get_local(struct rxrpc_local *local)
+{
+	atomic_inc(&local->usage);
+}
+
+static inline
+struct rxrpc_local *rxrpc_get_local_maybe(struct rxrpc_local *local)
+{
+	return atomic_inc_not_zero(&local->usage) ? local : NULL;
+}
+
+static inline void rxrpc_put_local(struct rxrpc_local *local)
+{
+	if (atomic_dec_and_test(&local->usage))
+		__rxrpc_put_local(local);
+}
+
 /*
  * misc.c
  */
@@ -874,15 +890,6 @@ static inline void rxrpc_purge_queue(struct sk_buff_head *list)
 		rxrpc_free_skb(skb);
 }
 
-static inline void __rxrpc_get_local(struct rxrpc_local *local, const char *f)
-{
-	CHECK_SLAB_OKAY(&local->usage);
-	if (atomic_inc_return(&local->usage) == 1)
-		printk("resurrected (%s)\n", f);
-}
-
-#define rxrpc_get_local(LOCAL) __rxrpc_get_local((LOCAL), __func__)
-
 #define rxrpc_get_call(CALL)				\
 do {							\
 	CHECK_SLAB_OKAY(&(CALL)->usage);		\
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index e5723f4dce89..50136c76ebd1 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -202,10 +202,8 @@ error_nofree:
  * accept incoming calls that need peer, transport and/or connection setting up
  * - the packets we get are all incoming client DATA packets that have seq == 1
  */
-void rxrpc_accept_incoming_calls(struct work_struct *work)
+void rxrpc_accept_incoming_calls(struct rxrpc_local *local)
 {
-	struct rxrpc_local *local =
-		container_of(work, struct rxrpc_local, acceptor);
 	struct rxrpc_skb_priv *sp;
 	struct sockaddr_rxrpc srx;
 	struct rxrpc_sock *rx;
@@ -215,21 +213,8 @@ void rxrpc_accept_incoming_calls(struct work_struct *work)
 
 	_enter("%d", local->debug_id);
 
-	read_lock_bh(&rxrpc_local_lock);
-	if (atomic_read(&local->usage) > 0)
-		rxrpc_get_local(local);
-	else
-		local = NULL;
-	read_unlock_bh(&rxrpc_local_lock);
-	if (!local) {
-		_leave(" [local dead]");
-		return;
-	}
-
-process_next_packet:
 	skb = skb_dequeue(&local->accept_queue);
 	if (!skb) {
-		rxrpc_put_local(local);
 		_leave("\n");
 		return;
 	}
@@ -292,7 +277,7 @@ found_service:
 	case -ECONNRESET: /* old calls are ignored */
 	case -ECONNABORTED: /* aborted calls are reaborted or ignored */
 	case 0:
-		goto process_next_packet;
+		return;
 	case -ECONNREFUSED:
 		goto invalid_service;
 	case -EBUSY:
@@ -308,18 +293,18 @@ backlog_full:
 busy:
 	rxrpc_busy(local, &srx, &whdr);
 	rxrpc_free_skb(skb);
-	goto process_next_packet;
+	return;
 
 invalid_service:
 	skb->priority = RX_INVALID_OPERATION;
 	rxrpc_reject_packet(local, skb);
-	goto process_next_packet;
+	return;
 
 	/* can't change connection security type mid-flow */
 security_mismatch:
 	skb->priority = RX_PROTOCOL_ERROR;
 	rxrpc_reject_packet(local, skb);
-	goto process_next_packet;
+	return;
 }
 
 /*
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index 8bdd692d4862..00c92b614485 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -314,19 +314,14 @@ void rxrpc_reject_packet(struct rxrpc_local *local, struct sk_buff *skb)
 {
 	CHECK_SLAB_OKAY(&local->usage);
 
-	if (!atomic_inc_not_zero(&local->usage)) {
-		printk("resurrected on reject\n");
-		BUG();
-	}
-
 	skb_queue_tail(&local->reject_queue, skb);
-	rxrpc_queue_work(&local->rejecter);
+	rxrpc_queue_work(&local->processor);
 }
 
 /*
  * reject packets through the local endpoint
  */
-void rxrpc_reject_packets(struct work_struct *work)
+void rxrpc_reject_packets(struct rxrpc_local *local)
 {
 	union {
 		struct sockaddr sa;
@@ -334,16 +329,12 @@ void rxrpc_reject_packets(struct work_struct *work)
 	} sa;
 	struct rxrpc_skb_priv *sp;
 	struct rxrpc_wire_header whdr;
-	struct rxrpc_local *local;
 	struct sk_buff *skb;
 	struct msghdr msg;
 	struct kvec iov[2];
 	size_t size;
 	__be32 code;
 
-	local = container_of(work, struct rxrpc_local, rejecter);
-	rxrpc_get_local(local);
-
 	_enter("%d", local->debug_id);
 
 	iov[0].iov_base = &whdr;
@@ -395,9 +386,7 @@ void rxrpc_reject_packets(struct work_struct *work)
 		}
 
 		rxrpc_free_skb(skb);
-		rxrpc_put_local(local);
 	}
 
-	rxrpc_put_local(local);
 	_leave("");
 }
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 3b405dbf3a05..47fb167af3e4 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -594,9 +594,8 @@ static void rxrpc_post_packet_to_local(struct rxrpc_local *local,
 {
 	_enter("%p,%p", local, skb);
 
-	atomic_inc(&local->usage);
 	skb_queue_tail(&local->event_queue, skb);
-	rxrpc_queue_work(&local->event_processor);
+	rxrpc_queue_work(&local->processor);
 }
 
 /*
@@ -664,11 +663,15 @@ cant_find_conn:
 /*
  * handle data received on the local endpoint
  * - may be called in interrupt context
+ *
+ * The socket is locked by the caller and this prevents the socket from being
+ * shut down and the local endpoint from going away, thus sk_user_data will not
+ * be cleared until this function returns.
  */
 void rxrpc_data_ready(struct sock *sk)
 {
 	struct rxrpc_skb_priv *sp;
-	struct rxrpc_local *local;
+	struct rxrpc_local *local = sk->sk_user_data;
 	struct sk_buff *skb;
 	int ret;
 
@@ -676,21 +679,8 @@ void rxrpc_data_ready(struct sock *sk)
 
 	ASSERT(!irqs_disabled());
 
-	read_lock_bh(&rxrpc_local_lock);
-	local = sk->sk_user_data;
-	if (local && atomic_read(&local->usage) > 0)
-		rxrpc_get_local(local);
-	else
-		local = NULL;
-	read_unlock_bh(&rxrpc_local_lock);
-	if (!local) {
-		_leave(" [local dead]");
-		return;
-	}
-
 	skb = skb_recv_datagram(sk, 0, 1, &ret);
 	if (!skb) {
-		rxrpc_put_local(local);
 		if (ret == -EAGAIN)
 			return;
 		_debug("UDP socket error %d", ret);
@@ -704,7 +694,6 @@ void rxrpc_data_ready(struct sock *sk)
 	/* we'll probably need to checksum it (didn't call sock_recvmsg) */
 	if (skb_checksum_complete(skb)) {
 		rxrpc_free_skb(skb);
-		rxrpc_put_local(local);
 		__UDP_INC_STATS(&init_net, UDP_MIB_INERRORS, 0);
 		_leave(" [CSUM failed]");
 		return;
@@ -769,7 +758,6 @@ void rxrpc_data_ready(struct sock *sk)
 	}
 
 out:
-	rxrpc_put_local(local);
 	return;
 
 cant_route_call:
@@ -779,8 +767,7 @@ cant_route_call:
 		if (sp->hdr.seq == 1) {
 			_debug("first packet");
 			skb_queue_tail(&local->accept_queue, skb);
-			rxrpc_queue_work(&local->acceptor);
-			rxrpc_put_local(local);
+			rxrpc_queue_work(&local->processor);
 			_leave(" [incoming]");
 			return;
 		}
@@ -793,13 +780,11 @@ cant_route_call:
 		_debug("reject type %d",sp->hdr.type);
 		rxrpc_reject_packet(local, skb);
 	}
-	rxrpc_put_local(local);
 	_leave(" [no call]");
 	return;
 
 bad_message:
 	skb->priority = RX_PROTOCOL_ERROR;
 	rxrpc_reject_packet(local, skb);
-	rxrpc_put_local(local);
 	_leave(" [badmsg]");
 }
diff --git a/net/rxrpc/local_event.c b/net/rxrpc/local_event.c
index 194db2e6d548..31a3f86ef2f6 100644
--- a/net/rxrpc/local_event.c
+++ b/net/rxrpc/local_event.c
@@ -82,17 +82,15 @@ static void rxrpc_send_version_request(struct rxrpc_local *local,
 /*
  * Process event packets targetted at a local endpoint.
  */
-void rxrpc_process_local_events(struct work_struct *work)
+void rxrpc_process_local_events(struct rxrpc_local *local)
 {
-	struct rxrpc_local *local = container_of(work, struct rxrpc_local, event_processor);
 	struct sk_buff *skb;
 	char v;
 
 	_enter("");
 
-	atomic_inc(&local->usage);
-
-	while ((skb = skb_dequeue(&local->event_queue))) {
+	skb = skb_dequeue(&local->event_queue);
+	if (skb) {
 		struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 
 		_debug("{%d},{%u}", local->debug_id, sp->hdr.type);
@@ -111,10 +109,8 @@ void rxrpc_process_local_events(struct work_struct *work)
 			break;
 		}
 
-		rxrpc_put_local(local);
 		rxrpc_free_skb(skb);
 	}
 
-	rxrpc_put_local(local);
 	_leave("");
 }
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index c1b8d745bf5e..009b321712bc 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -1,6 +1,6 @@
 /* Local endpoint object management
  *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
  * This program is free software; you can redistribute it and/or
@@ -17,40 +17,72 @@
 #include <linux/slab.h>
 #include <linux/udp.h>
 #include <linux/ip.h>
+#include <linux/hashtable.h>
 #include <net/sock.h>
 #include <net/af_rxrpc.h>
 #include "ar-internal.h"
 
-static LIST_HEAD(rxrpc_locals);
-DEFINE_RWLOCK(rxrpc_local_lock);
-static DECLARE_RWSEM(rxrpc_local_sem);
-static DECLARE_WAIT_QUEUE_HEAD(rxrpc_local_wq);
+static void rxrpc_local_processor(struct work_struct *);
+static void rxrpc_local_rcu(struct rcu_head *);
 
-static void rxrpc_destroy_local(struct work_struct *work);
+static DEFINE_MUTEX(rxrpc_local_mutex);
+static LIST_HEAD(rxrpc_local_endpoints);
 
 /*
- * allocate a new local
+ * Compare a local to an address.  Return -ve, 0 or +ve to indicate less than,
+ * same or greater than.
+ *
+ * We explicitly don't compare the RxRPC service ID as we want to reject
+ * conflicting uses by differing services.  Further, we don't want to share
+ * addresses with different options (IPv6), so we don't compare those bits
+ * either.
  */
-static
-struct rxrpc_local *rxrpc_alloc_local(struct sockaddr_rxrpc *srx)
+static long rxrpc_local_cmp_key(const struct rxrpc_local *local,
+				const struct sockaddr_rxrpc *srx)
+{
+	long diff;
+
+	diff = ((local->srx.transport_type - srx->transport_type) ?:
+		(local->srx.transport_len - srx->transport_len) ?:
+		(local->srx.transport.family - srx->transport.family));
+	if (diff != 0)
+		return diff;
+
+	switch (srx->transport.family) {
+	case AF_INET:
+		/* If the choice of UDP port is left up to the transport, then
+		 * the endpoint record doesn't match.
+		 */
+		return ((u16 __force)local->srx.transport.sin.sin_port -
+			(u16 __force)srx->transport.sin.sin_port) ?:
+			memcmp(&local->srx.transport.sin.sin_addr,
+			       &srx->transport.sin.sin_addr,
+			       sizeof(struct in_addr));
+	default:
+		BUG();
+	}
+}
+
+/*
+ * Allocate a new local endpoint.
+ */
+static struct rxrpc_local *rxrpc_alloc_local(const struct sockaddr_rxrpc *srx)
 {
 	struct rxrpc_local *local;
 
 	local = kzalloc(sizeof(struct rxrpc_local), GFP_KERNEL);
 	if (local) {
-		INIT_WORK(&local->destroyer, &rxrpc_destroy_local);
-		INIT_WORK(&local->acceptor, &rxrpc_accept_incoming_calls);
-		INIT_WORK(&local->rejecter, &rxrpc_reject_packets);
-		INIT_WORK(&local->event_processor, &rxrpc_process_local_events);
-		INIT_LIST_HEAD(&local->services);
+		atomic_set(&local->usage, 1);
 		INIT_LIST_HEAD(&local->link);
+		INIT_WORK(&local->processor, rxrpc_local_processor);
+		INIT_LIST_HEAD(&local->services);
 		init_rwsem(&local->defrag_sem);
 		skb_queue_head_init(&local->accept_queue);
 		skb_queue_head_init(&local->reject_queue);
 		skb_queue_head_init(&local->event_queue);
+		mutex_init(&local->conn_lock);
 		spin_lock_init(&local->lock);
 		rwlock_init(&local->services_lock);
-		atomic_set(&local->usage, 1);
 		local->debug_id = atomic_inc_return(&rxrpc_debug_id);
 		memcpy(&local->srx, srx, sizeof(*srx));
 	}
@@ -61,9 +93,9 @@ struct rxrpc_local *rxrpc_alloc_local(struct sockaddr_rxrpc *srx)
 
 /*
  * create the local socket
- * - must be called with rxrpc_local_sem writelocked
+ * - must be called with rxrpc_local_mutex locked
  */
-static int rxrpc_create_local(struct rxrpc_local *local)
+static int rxrpc_open_socket(struct rxrpc_local *local)
 {
 	struct sock *sock;
 	int ret, opt;
@@ -82,10 +114,10 @@ static int rxrpc_create_local(struct rxrpc_local *local)
 	if (local->srx.transport_len > sizeof(sa_family_t)) {
 		_debug("bind");
 		ret = kernel_bind(local->socket,
-				  (struct sockaddr *) &local->srx.transport,
+				  (struct sockaddr *)&local->srx.transport,
 				  local->srx.transport_len);
 		if (ret < 0) {
-			_debug("bind failed");
+			_debug("bind failed %d", ret);
 			goto error;
 		}
 	}
@@ -108,10 +140,6 @@ static int rxrpc_create_local(struct rxrpc_local *local)
 		goto error;
 	}
 
-	write_lock_bh(&rxrpc_local_lock);
-	list_add(&local->link, &rxrpc_locals);
-	write_unlock_bh(&rxrpc_local_lock);
-
 	/* set the socket up */
 	sock = local->socket->sk;
 	sock->sk_user_data	= local;
@@ -131,188 +159,227 @@ error:
 }
 
 /*
- * create a new local endpoint using the specified UDP address
+ * Look up or create a new local endpoint using the specified local address.
  */
-struct rxrpc_local *rxrpc_lookup_local(struct sockaddr_rxrpc *srx)
+struct rxrpc_local *rxrpc_lookup_local(const struct sockaddr_rxrpc *srx)
 {
 	struct rxrpc_local *local;
+	struct list_head *cursor;
+	const char *age;
+	long diff;
 	int ret;
 
-	_enter("{%d,%u,%pI4+%hu}",
-	       srx->transport_type,
-	       srx->transport.family,
-	       &srx->transport.sin.sin_addr,
-	       ntohs(srx->transport.sin.sin_port));
-
-	down_write(&rxrpc_local_sem);
+	if (srx->transport.family == AF_INET) {
+		_enter("{%d,%u,%pI4+%hu}",
+		       srx->transport_type,
+		       srx->transport.family,
+		       &srx->transport.sin.sin_addr,
+		       ntohs(srx->transport.sin.sin_port));
+	} else {
+		_enter("{%d,%u}",
+		       srx->transport_type,
+		       srx->transport.family);
+		return ERR_PTR(-EAFNOSUPPORT);
+	}
 
-	/* see if we have a suitable local local endpoint already */
-	read_lock_bh(&rxrpc_local_lock);
+	mutex_lock(&rxrpc_local_mutex);
 
-	list_for_each_entry(local, &rxrpc_locals, link) {
-		_debug("CMP {%d,%u,%pI4+%hu}",
-		       local->srx.transport_type,
-		       local->srx.transport.family,
-		       &local->srx.transport.sin.sin_addr,
-		       ntohs(local->srx.transport.sin.sin_port));
+	for (cursor = rxrpc_local_endpoints.next;
+	     cursor != &rxrpc_local_endpoints;
+	     cursor = cursor->next) {
+		local = list_entry(cursor, struct rxrpc_local, link);
 
-		if (local->srx.transport_type != srx->transport_type ||
-		    local->srx.transport.family != srx->transport.family)
+		diff = rxrpc_local_cmp_key(local, srx);
+		if (diff < 0)
 			continue;
+		if (diff > 0)
+			break;
+
+		/* Services aren't allowed to share transport sockets, so
+		 * reject that here.  It is possible that the object is dying -
+		 * but it may also still have the local transport address that
+		 * we want bound.
+		 */
+		if (srx->srx_service) {
+			local = NULL;
+			goto addr_in_use;
+		}
 
-		switch (srx->transport.family) {
-		case AF_INET:
-			if (local->srx.transport.sin.sin_port !=
-			    srx->transport.sin.sin_port)
-				continue;
-			if (memcmp(&local->srx.transport.sin.sin_addr,
-				   &srx->transport.sin.sin_addr,
-				   sizeof(struct in_addr)) != 0)
-				continue;
-			goto found_local;
-
-		default:
-			BUG();
+		/* Found a match.  We replace a dying object.  Attempting to
+		 * bind the transport socket may still fail if we're attempting
+		 * to use a local address that the dying object is still using.
+		 */
+		if (!atomic_inc_not_zero(&local->usage)) {
+			cursor = cursor->next;
+			list_del_init(&local->link);
+			break;
 		}
-	}
 
-	read_unlock_bh(&rxrpc_local_lock);
+		age = "old";
+		goto found;
+	}
 
-	/* we didn't find one, so we need to create one */
 	local = rxrpc_alloc_local(srx);
-	if (!local) {
-		up_write(&rxrpc_local_sem);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!local)
+		goto nomem;
 
-	ret = rxrpc_create_local(local);
-	if (ret < 0) {
-		up_write(&rxrpc_local_sem);
-		kfree(local);
-		_leave(" = %d", ret);
-		return ERR_PTR(ret);
-	}
+	ret = rxrpc_open_socket(local);
+	if (ret < 0)
+		goto sock_error;
+
+	list_add_tail(&local->link, cursor);
+	age = "new";
 
-	up_write(&rxrpc_local_sem);
+found:
+	mutex_unlock(&rxrpc_local_mutex);
 
-	_net("LOCAL new %d {%d,%u,%pI4+%hu}",
+	_net("LOCAL %s %d {%d,%u,%pI4+%hu}",
+	     age,
 	     local->debug_id,
 	     local->srx.transport_type,
 	     local->srx.transport.family,
 	     &local->srx.transport.sin.sin_addr,
 	     ntohs(local->srx.transport.sin.sin_port));
 
-	_leave(" = %p [new]", local);
+	_leave(" = %p", local);
 	return local;
 
-found_local:
-	rxrpc_get_local(local);
-	read_unlock_bh(&rxrpc_local_lock);
-	up_write(&rxrpc_local_sem);
+nomem:
+	ret = -ENOMEM;
+sock_error:
+	mutex_unlock(&rxrpc_local_mutex);
+	kfree(local);
+	_leave(" = %d", ret);
+	return ERR_PTR(ret);
 
-	_net("LOCAL old %d {%d,%u,%pI4+%hu}",
-	     local->debug_id,
-	     local->srx.transport_type,
-	     local->srx.transport.family,
-	     &local->srx.transport.sin.sin_addr,
-	     ntohs(local->srx.transport.sin.sin_port));
+addr_in_use:
+	mutex_unlock(&rxrpc_local_mutex);
+	_leave(" = -EADDRINUSE");
+	return ERR_PTR(-EADDRINUSE);
+}
 
-	_leave(" = %p [reuse]", local);
-	return local;
+/*
+ * A local endpoint reached its end of life.
+ */
+void __rxrpc_put_local(struct rxrpc_local *local)
+{
+	_enter("%d", local->debug_id);
+	rxrpc_queue_work(&local->processor);
 }
 
 /*
- * release a local endpoint
+ * Destroy a local endpoint's socket and then hand the record to RCU to dispose
+ * of.
+ *
+ * Closing the socket cannot be done from bottom half context or RCU callback
+ * context because it might sleep.
  */
-void rxrpc_put_local(struct rxrpc_local *local)
+static void rxrpc_local_destroyer(struct rxrpc_local *local)
 {
-	_enter("%p{u=%d}", local, atomic_read(&local->usage));
+	struct socket *socket = local->socket;
 
-	ASSERTCMP(atomic_read(&local->usage), >, 0);
+	_enter("%d", local->debug_id);
 
-	/* to prevent a race, the decrement and the dequeue must be effectively
-	 * atomic */
-	write_lock_bh(&rxrpc_local_lock);
-	if (unlikely(atomic_dec_and_test(&local->usage))) {
-		_debug("destroy local");
-		rxrpc_queue_work(&local->destroyer);
+	/* We can get a race between an incoming call packet queueing the
+	 * processor again and the work processor starting the destruction
+	 * process which will shut down the UDP socket.
+	 */
+	if (local->dead) {
+		_leave(" [already dead]");
+		return;
 	}
-	write_unlock_bh(&rxrpc_local_lock);
-	_leave("");
+	local->dead = true;
+
+	mutex_lock(&rxrpc_local_mutex);
+	list_del_init(&local->link);
+	mutex_unlock(&rxrpc_local_mutex);
+
+	ASSERT(list_empty(&local->services));
+
+	if (socket) {
+		local->socket = NULL;
+		kernel_sock_shutdown(socket, SHUT_RDWR);
+		socket->sk->sk_user_data = NULL;
+		sock_release(socket);
+	}
+
+	/* At this point, there should be no more packets coming in to the
+	 * local endpoint.
+	 */
+	rxrpc_purge_queue(&local->accept_queue);
+	rxrpc_purge_queue(&local->reject_queue);
+	rxrpc_purge_queue(&local->event_queue);
+
+	_debug("rcu local %d", local->debug_id);
+	call_rcu(&local->rcu, rxrpc_local_rcu);
 }
 
 /*
- * destroy a local endpoint
+ * Process events on an endpoint
  */
-static void rxrpc_destroy_local(struct work_struct *work)
+static void rxrpc_local_processor(struct work_struct *work)
 {
 	struct rxrpc_local *local =
-		container_of(work, struct rxrpc_local, destroyer);
+		container_of(work, struct rxrpc_local, processor);
+	bool again;
 
-	_enter("%p{%d}", local, atomic_read(&local->usage));
+	_enter("%d", local->debug_id);
 
-	down_write(&rxrpc_local_sem);
+	do {
+		again = false;
+		if (atomic_read(&local->usage) == 0)
+			return rxrpc_local_destroyer(local);
 
-	write_lock_bh(&rxrpc_local_lock);
-	if (atomic_read(&local->usage) > 0) {
-		write_unlock_bh(&rxrpc_local_lock);
-		up_read(&rxrpc_local_sem);
-		_leave(" [resurrected]");
-		return;
-	}
+		if (!skb_queue_empty(&local->accept_queue)) {
+			rxrpc_accept_incoming_calls(local);
+			again = true;
+		}
 
-	list_del(&local->link);
-	local->socket->sk->sk_user_data = NULL;
-	write_unlock_bh(&rxrpc_local_lock);
+		if (!skb_queue_empty(&local->reject_queue)) {
+			rxrpc_reject_packets(local);
+			again = true;
+		}
 
-	downgrade_write(&rxrpc_local_sem);
+		if (!skb_queue_empty(&local->event_queue)) {
+			rxrpc_process_local_events(local);
+			again = true;
+		}
+	} while (again);
+}
 
-	ASSERT(list_empty(&local->services));
-	ASSERT(!work_pending(&local->acceptor));
-	ASSERT(!work_pending(&local->rejecter));
-	ASSERT(!work_pending(&local->event_processor));
+/*
+ * Destroy a local endpoint after the RCU grace period expires.
+ */
+static void rxrpc_local_rcu(struct rcu_head *rcu)
+{
+	struct rxrpc_local *local = container_of(rcu, struct rxrpc_local, rcu);
 
-	/* finish cleaning up the local descriptor */
-	rxrpc_purge_queue(&local->accept_queue);
-	rxrpc_purge_queue(&local->reject_queue);
-	rxrpc_purge_queue(&local->event_queue);
-	kernel_sock_shutdown(local->socket, SHUT_RDWR);
-	sock_release(local->socket);
+	_enter("%d", local->debug_id);
 
-	up_read(&rxrpc_local_sem);
+	ASSERT(!work_pending(&local->processor));
 
 	_net("DESTROY LOCAL %d", local->debug_id);
 	kfree(local);
-
-	if (list_empty(&rxrpc_locals))
-		wake_up_all(&rxrpc_local_wq);
-
 	_leave("");
 }
 
 /*
- * preemptively destroy all local local endpoint rather than waiting for
- * them to be destroyed
+ * Verify the local endpoint list is empty by this point.
  */
 void __exit rxrpc_destroy_all_locals(void)
 {
-	DECLARE_WAITQUEUE(myself,current);
+	struct rxrpc_local *local;
 
 	_enter("");
 
-	/* we simply have to wait for them to go away */
-	if (!list_empty(&rxrpc_locals)) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		add_wait_queue(&rxrpc_local_wq, &myself);
-
-		while (!list_empty(&rxrpc_locals)) {
-			schedule();
-			set_current_state(TASK_UNINTERRUPTIBLE);
-		}
+	if (list_empty(&rxrpc_local_endpoints))
+		return;
 
-		remove_wait_queue(&rxrpc_local_wq, &myself);
-		set_current_state(TASK_RUNNING);
+	mutex_lock(&rxrpc_local_mutex);
+	list_for_each_entry(local, &rxrpc_local_endpoints, link) {
+		pr_err("AF_RXRPC: Leaked local %p {%d}\n",
+		       local, atomic_read(&local->usage));
 	}
-
-	_leave("");
+	mutex_unlock(&rxrpc_local_mutex);
+	BUG();
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/8] rxrpc: Rework endpoint record handling
  2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
                   ` (7 preceding siblings ...)
  2016-06-15 16:42 ` [PATCH net-next 8/8] rxrpc: Rework local endpoint management David Howells
@ 2016-06-16  5:25 ` David Miller
  8 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2016-06-16  5:25 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs, linux-kernel

From: David Howells <dhowells@redhat.com>
Date: Wed, 15 Jun 2016 17:41:38 +0100

> Here's the next part of the AF_RXRPC rewrite.  In this set I rework
> endpoint record handling.  There are two types of endpoint record, local
> and peer.  The local endpoint record is used as an anchor for the transport
> socket that AF_RXRPC uses (at the moment a UDP socket).  Local endpoints
> can be shared between AF_RXRPC sockets under certain restricted
> circumstances.
> 
> The peer endpoint is a record of the remote end.  It is (or will be) used
> to keep track MTU and RTT values and, with these changes, is used to find
> the call(s) to abort when a network error occurs.
> 
> The following significant changes are made:
 ...
> The patches can be found here also:
> 
> 	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite
> 
> Tagged thusly:
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
> 	rxrpc-rewrite-20160615

Pulled, thanks David.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-06-16  5:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-15 16:41 [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Howells
2016-06-15 16:41 ` [PATCH net-next 1/8] rxrpc: Rework peer object handling to use hash table and RCU David Howells
2016-06-15 16:41 ` [PATCH net-next 2/8] rxrpc: Rename rxrpc_UDP_error_report() to rxrpc_error_report() David Howells
2016-06-15 16:41 ` [PATCH net-next 3/8] rxrpc: Break MTU determination from ICMP into its own function David Howells
2016-06-15 16:42 ` [PATCH net-next 4/8] rxrpc: Don't assume anything about the address in an ICMP packet David Howells
2016-06-15 16:42 ` [PATCH net-next 5/8] rxrpc: Do a little bit of tidying in the ICMP processing David Howells
2016-06-15 16:42 ` [PATCH net-next 6/8] rxrpc: Use the peer record to distribute network errors David Howells
2016-06-15 16:42 ` [PATCH net-next 7/8] rxrpc: Separate local endpoint event handling out into its own file David Howells
2016-06-15 16:42 ` [PATCH net-next 8/8] rxrpc: Rework local endpoint management David Howells
2016-06-16  5:25 ` [PATCH net-next 0/8] rxrpc: Rework endpoint record handling David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).