linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 00/10] rxrpc: Miscellaneous fixes
@ 2016-09-13 22:20 David Howells
  2016-09-13 22:21 ` [PATCH net-next 01/10] rxrpc: Make sure we initialise the peer hash key David Howells
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:20 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel


Here's a set of miscellaneous fix patches.  There are a couple of points of
note:

 (1) There is one non-fix patch that adjusts the call ref tracking
     tracepoint to make kernel API-held refs on calls more obvious.  This
     is a prerequisite for the patch that fixes prealloc refcounting.

 (2) The final patch alters how jumbo packets that partially exceed the
     receive window are handled.  Previously, space was being left in the
     Rx buffer for them, but this significantly hurts performance as the Rx
     window can't be increased to match the OpenAFS Tx window size.

     Instead, the excess subpackets are discarded and an EXCEEDS_WINDOW ACK
     is generated for the first.  To avoid the problem of someone trying to
     run the kernel out of space by feeding the kernel a series of
     overlapping maximal jumbo packets, we stop allowing jumbo packets on a
     call if we encounter more than three jumbo packets with duplicate or
     excessive subpackets.

The patches can be found here also (non-terminally on the branch):

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-rewrite-20160913-1

David
---
David Howells (10):
      rxrpc: Make sure we initialise the peer hash key
      rxrpc: Add missing wakeup on Tx window rotation
      rxrpc: The IDLE ACK packet should use rxrpc_idle_ack_delay
      rxrpc: Requeue call for recvmsg if more data
      rxrpc: Add missing unlock in rxrpc_call_accept()
      rxrpc: Use skb->len not skb->data_len
      rxrpc: Allow tx_winsize to grow in response to an ACK
      rxrpc: Adjust the call ref tracepoint to show kernel API refs
      rxrpc: Fix prealloc refcounting
      rxrpc: Correctly initialise, limit and transmit call->rx_winsize


 net/rxrpc/af_rxrpc.c    |    2 +-
 net/rxrpc/ar-internal.h |    5 ++++-
 net/rxrpc/call_accept.c |   20 +++++++++++++++-----
 net/rxrpc/call_event.c  |    2 +-
 net/rxrpc/call_object.c |    7 +++----
 net/rxrpc/input.c       |   41 +++++++++++++++++++++++++++--------------
 net/rxrpc/misc.c        |    5 ++++-
 net/rxrpc/output.c      |    4 ++--
 net/rxrpc/peer_object.c |    2 +-
 net/rxrpc/recvmsg.c     |    5 +++++
 net/rxrpc/sysctl.c      |    2 +-
 11 files changed, 64 insertions(+), 31 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 01/10] rxrpc: Make sure we initialise the peer hash key
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
@ 2016-09-13 22:21 ` David Howells
  2016-09-13 22:21 ` [PATCH net-next 02/10] rxrpc: Add missing wakeup on Tx window rotation David Howells
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

Peer records created for incoming connections weren't getting their hash
key set.  This meant that incoming calls wouldn't see more than one DATA
packet - which is not a problem for AFS CM calls with small request data
blobs.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/peer_object.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 2efe29a4c232..3e6cd174b53d 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -203,6 +203,7 @@ struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp)
  */
 static void rxrpc_init_peer(struct rxrpc_peer *peer, unsigned long hash_key)
 {
+	peer->hash_key = hash_key;
 	rxrpc_assess_MTU_size(peer);
 	peer->mtu = peer->if_mtu;
 
@@ -238,7 +239,6 @@ static struct rxrpc_peer *rxrpc_create_peer(struct rxrpc_local *local,
 
 	peer = rxrpc_alloc_peer(local, gfp);
 	if (peer) {
-		peer->hash_key = hash_key;
 		memcpy(&peer->srx, srx, sizeof(*srx));
 		rxrpc_init_peer(peer, hash_key);
 	}

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 02/10] rxrpc: Add missing wakeup on Tx window rotation
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
  2016-09-13 22:21 ` [PATCH net-next 01/10] rxrpc: Make sure we initialise the peer hash key David Howells
@ 2016-09-13 22:21 ` David Howells
  2016-09-13 22:21 ` [PATCH net-next 03/10] rxrpc: The IDLE ACK packet should use rxrpc_idle_ack_delay David Howells
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

We need to wake up the sender when Tx window rotation due to an incoming
ACK makes space in the buffer otherwise the sender is liable to just hang
endlessly.

This problem isn't noticeable if the Tx phase transfers no more than will
fit in a single window or the Tx window rotates fast enough that it doesn't
get full.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/input.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index afeba98004b1..a707d5952164 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -59,6 +59,8 @@ static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to)
 
 	spin_unlock(&call->lock);
 
+	wake_up(&call->waitq);
+
 	while (list) {
 		skb = list;
 		list = skb->next;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 03/10] rxrpc: The IDLE ACK packet should use rxrpc_idle_ack_delay
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
  2016-09-13 22:21 ` [PATCH net-next 01/10] rxrpc: Make sure we initialise the peer hash key David Howells
  2016-09-13 22:21 ` [PATCH net-next 02/10] rxrpc: Add missing wakeup on Tx window rotation David Howells
@ 2016-09-13 22:21 ` David Howells
  2016-09-13 22:21 ` [PATCH net-next 04/10] rxrpc: Requeue call for recvmsg if more data David Howells
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

The IDLE ACK packet should use the rxrpc_idle_ack_delay setting when the
timer is set for it.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/call_event.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 2b976e789562..61432049869b 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -95,7 +95,7 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
 		break;
 
 	case RXRPC_ACK_IDLE:
-		if (rxrpc_soft_ack_delay < expiry)
+		if (rxrpc_idle_ack_delay < expiry)
 			expiry = rxrpc_idle_ack_delay;
 		break;
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 04/10] rxrpc: Requeue call for recvmsg if more data
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
                   ` (2 preceding siblings ...)
  2016-09-13 22:21 ` [PATCH net-next 03/10] rxrpc: The IDLE ACK packet should use rxrpc_idle_ack_delay David Howells
@ 2016-09-13 22:21 ` David Howells
  2016-09-13 22:21 ` [PATCH net-next 05/10] rxrpc: Add missing unlock in rxrpc_call_accept() David Howells
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

rxrpc_recvmsg() needs to make sure that the call it has just been
processing gets requeued for further attention if the buffer has been
filled and there's more data to be consumed.  The softirq producer only
queues the call and wakes the socket if it fills the first slot in the
window, so userspace might end up sleeping forever otherwise, despite there
being data available.

This is not a problem provided the userspace buffer is big enough or it
empties the buffer completely before more data comes in.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/recvmsg.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 20d0b5c6f81b..16ff56f69256 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -463,6 +463,10 @@ try_again:
 					 flags, &copied);
 		if (ret == -EAGAIN)
 			ret = 0;
+
+		if (after(call->rx_top, call->rx_hard_ack) &&
+		    call->rxtx_buffer[(call->rx_hard_ack + 1) & RXRPC_RXTX_BUFF_MASK])
+			rxrpc_notify_socket(call);
 		break;
 	default:
 		ret = 0;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 05/10] rxrpc: Add missing unlock in rxrpc_call_accept()
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
                   ` (3 preceding siblings ...)
  2016-09-13 22:21 ` [PATCH net-next 04/10] rxrpc: Requeue call for recvmsg if more data David Howells
@ 2016-09-13 22:21 ` David Howells
  2016-09-13 22:21 ` [PATCH net-next 06/10] rxrpc: Use skb->len not skb->data_len David Howells
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

Add a missing unlock in rxrpc_call_accept() in the path taken if there's no
call to wake up.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/call_accept.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index b8acec0d596e..06e328f6b0f0 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -425,9 +425,11 @@ struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *rx,
 
 	write_lock(&rx->call_lock);
 
-	ret = -ENODATA;
-	if (list_empty(&rx->to_be_accepted))
-		goto out;
+	if (list_empty(&rx->to_be_accepted)) {
+		write_unlock(&rx->call_lock);
+		kleave(" = -ENODATA [empty]");
+		return ERR_PTR(-ENODATA);
+	}
 
 	/* check the user ID isn't already in use */
 	pp = &rx->calls.rb_node;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 06/10] rxrpc: Use skb->len not skb->data_len
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
                   ` (4 preceding siblings ...)
  2016-09-13 22:21 ` [PATCH net-next 05/10] rxrpc: Add missing unlock in rxrpc_call_accept() David Howells
@ 2016-09-13 22:21 ` David Howells
  2016-09-13 22:21 ` [PATCH net-next 07/10] rxrpc: Allow tx_winsize to grow in response to an ACK David Howells
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

skb->len should be used rather than skb->data_len when referring to the
amount of data in a packet.  This will only cause a malfunction in the
following cases:

 (1) We receive a jumbo packet (validation and splitting both are wrong).

 (2) We see if there's extra ACK info in an ACK packet (we think it's not
     there and just ignore it).

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/input.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index a707d5952164..5958ef8ba2a0 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -127,7 +127,7 @@ static bool rxrpc_validate_jumbo(struct sk_buff *skb)
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	unsigned int offset = sp->offset;
-	unsigned int len = skb->data_len;
+	unsigned int len = skb->len;
 	int nr_jumbo = 1;
 	u8 flags = sp->hdr.flags;
 
@@ -196,7 +196,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
 	u8 ack = 0, flags, annotation = 0;
 
 	_enter("{%u,%u},{%u,%u}",
-	       call->rx_hard_ack, call->rx_top, skb->data_len, seq);
+	       call->rx_hard_ack, call->rx_top, skb->len, seq);
 
 	_proto("Rx DATA %%%u { #%u f=%02x }",
 	       sp->hdr.serial, seq, sp->hdr.flags);
@@ -233,7 +233,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
 next_subpacket:
 	queued = false;
 	ix = seq & RXRPC_RXTX_BUFF_MASK;
-	len = skb->data_len;
+	len = skb->len;
 	if (flags & RXRPC_JUMBO_PACKET)
 		len = RXRPC_JUMBO_DATALEN;
 
@@ -444,7 +444,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
 	}
 
 	offset = sp->offset + nr_acks + 3;
-	if (skb->data_len >= offset + sizeof(buf.info)) {
+	if (skb->len >= offset + sizeof(buf.info)) {
 		if (skb_copy_bits(skb, offset, &buf.info, sizeof(buf.info)) < 0)
 			return rxrpc_proto_abort("XAI", call, 0);
 		rxrpc_input_ackinfo(call, skb, &buf.info);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 07/10] rxrpc: Allow tx_winsize to grow in response to an ACK
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
                   ` (5 preceding siblings ...)
  2016-09-13 22:21 ` [PATCH net-next 06/10] rxrpc: Use skb->len not skb->data_len David Howells
@ 2016-09-13 22:21 ` David Howells
  2016-09-13 22:21 ` [PATCH net-next 08/10] rxrpc: Adjust the call ref tracepoint to show kernel API refs David Howells
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

Allow tx_winsize to grow when the ACK info packet shows a larger receive
window at the other end rather than only permitting it to shrink.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/input.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 5958ef8ba2a0..8e529afcd6c1 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -333,14 +333,16 @@ static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb,
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_peer *peer;
 	unsigned int mtu;
+	u32 rwind = ntohl(ackinfo->rwind);
 
 	_proto("Rx ACK %%%u Info { rx=%u max=%u rwin=%u jm=%u }",
 	       sp->hdr.serial,
 	       ntohl(ackinfo->rxMTU), ntohl(ackinfo->maxMTU),
-	       ntohl(ackinfo->rwind), ntohl(ackinfo->jumbo_max));
+	       rwind, ntohl(ackinfo->jumbo_max));
 
-	if (call->tx_winsize > ntohl(ackinfo->rwind))
-		call->tx_winsize = ntohl(ackinfo->rwind);
+	if (rwind > RXRPC_RXTX_BUFF_SIZE - 1)
+		rwind = RXRPC_RXTX_BUFF_SIZE - 1;
+	call->tx_winsize = rwind;
 
 	mtu = min(ntohl(ackinfo->rxMTU), ntohl(ackinfo->maxMTU));
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 08/10] rxrpc: Adjust the call ref tracepoint to show kernel API refs
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
                   ` (6 preceding siblings ...)
  2016-09-13 22:21 ` [PATCH net-next 07/10] rxrpc: Allow tx_winsize to grow in response to an ACK David Howells
@ 2016-09-13 22:21 ` David Howells
  2016-09-13 22:21 ` [PATCH net-next 09/10] rxrpc: Fix prealloc refcounting David Howells
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

Adjust the call ref tracepoint to show references held on a call by the
kernel API separately as much as possible and add an additional trace to at
the allocation point from the preallocation buffer for an incoming call.

Note that this doesn't show the allocation of a client call for the kernel
separately at the moment.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/af_rxrpc.c    |    2 +-
 net/rxrpc/ar-internal.h |    2 ++
 net/rxrpc/call_accept.c |    3 ++-
 net/rxrpc/call_object.c |    2 ++
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index caa226dd436e..25d00ded24bc 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -299,7 +299,7 @@ void rxrpc_kernel_end_call(struct socket *sock, struct rxrpc_call *call)
 {
 	_enter("%d{%d}", call->debug_id, atomic_read(&call->usage));
 	rxrpc_release_call(rxrpc_sk(sock->sk), call);
-	rxrpc_put_call(call, rxrpc_call_put);
+	rxrpc_put_call(call, rxrpc_call_put_kernel);
 }
 EXPORT_SYMBOL(rxrpc_kernel_end_call);
 
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index b1cb79ec4e96..47c74a581a0f 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -540,8 +540,10 @@ enum rxrpc_call_trace {
 	rxrpc_call_seen,
 	rxrpc_call_got,
 	rxrpc_call_got_userid,
+	rxrpc_call_got_kernel,
 	rxrpc_call_put,
 	rxrpc_call_put_userid,
+	rxrpc_call_put_kernel,
 	rxrpc_call_put_noqueue,
 	rxrpc_call__nr_trace
 };
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 06e328f6b0f0..5fd9d2c89b7f 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -121,7 +121,7 @@ static int rxrpc_service_prealloc_one(struct rxrpc_sock *rx,
 
 		call->user_call_ID = user_call_ID;
 		call->notify_rx = notify_rx;
-		rxrpc_get_call(call, rxrpc_call_got);
+		rxrpc_get_call(call, rxrpc_call_got_kernel);
 		user_attach_call(call, user_call_ID);
 		rxrpc_get_call(call, rxrpc_call_got_userid);
 		rb_link_node(&call->sock_node, parent, pp);
@@ -300,6 +300,7 @@ static struct rxrpc_call *rxrpc_alloc_incoming_call(struct rxrpc_sock *rx,
 	smp_store_release(&b->call_backlog_tail,
 			  (call_tail + 1) & (RXRPC_BACKLOG_MAX - 1));
 
+	rxrpc_see_call(call);
 	call->conn = conn;
 	call->peer = rxrpc_get_peer(conn->params.peer);
 	return call;
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 18ab13f82f6e..3f9476508204 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -56,8 +56,10 @@ const char rxrpc_call_traces[rxrpc_call__nr_trace][4] = {
 	[rxrpc_call_seen]		= "SEE",
 	[rxrpc_call_got]		= "GOT",
 	[rxrpc_call_got_userid]		= "Gus",
+	[rxrpc_call_got_kernel]		= "Gke",
 	[rxrpc_call_put]		= "PUT",
 	[rxrpc_call_put_userid]		= "Pus",
+	[rxrpc_call_put_kernel]		= "Pke",
 	[rxrpc_call_put_noqueue]	= "PNQ",
 };
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 09/10] rxrpc: Fix prealloc refcounting
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
                   ` (7 preceding siblings ...)
  2016-09-13 22:21 ` [PATCH net-next 08/10] rxrpc: Adjust the call ref tracepoint to show kernel API refs David Howells
@ 2016-09-13 22:21 ` David Howells
  2016-09-13 22:22 ` [PATCH net-next 10/10] rxrpc: Correctly initialise, limit and transmit call->rx_winsize David Howells
  2016-09-16  5:55 ` [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Miller
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:21 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

The preallocated call buffer holds a ref on the calls within that buffer.
The ref was being released in the wrong place - it worked okay for incoming
calls to the AFS cache manager service, but doesn't work right for incoming
calls to a userspace service.

Instead of releasing an extra ref service calls in rxrpc_release_call(),
the ref needs to be released during the acceptance/rejectance process.  To
this end:

 (1) The prealloc ref is now normally released during
     rxrpc_new_incoming_call().

 (2) For preallocated kernel API calls, the kernel API's ref needs to be
     released when the call is discarded on socket close.

 (3) We shouldn't take a second ref in rxrpc_accept_call().

 (4) rxrpc_recvmsg_new_call() needs to get a ref of its own when it adds
     the call to the to_be_accepted socket queue.

In doing (4) above, we would prefer not to put the call's refcount down to
0 as that entails doing cleanup in softirq context, but it's unlikely as
there are several refs held elsewhere, at least one of which must be put by
someone in process context calling rxrpc_release_call().  However, it's not
a problem if we do have to do that.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/call_accept.c |    9 ++++++++-
 net/rxrpc/call_object.c |    3 ---
 net/rxrpc/recvmsg.c     |    1 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 5fd9d2c89b7f..26c293ef98eb 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -221,6 +221,7 @@ void rxrpc_discard_prealloc(struct rxrpc_sock *rx)
 		if (rx->discard_new_call) {
 			_debug("discard %lx", call->user_call_ID);
 			rx->discard_new_call(call, call->user_call_ID);
+			rxrpc_put_call(call, rxrpc_call_put_kernel);
 		}
 		rxrpc_call_completed(call);
 		rxrpc_release_call(rx, call);
@@ -402,6 +403,13 @@ found_service:
 	if (call->state == RXRPC_CALL_SERVER_ACCEPTING)
 		rxrpc_notify_socket(call);
 
+	/* We have to discard the prealloc queue's ref here and rely on a
+	 * combination of the RCU read lock and refs held either by the socket
+	 * (recvmsg queue, to-be-accepted queue or user ID tree) or the kernel
+	 * service to prevent the call from being deallocated too early.
+	 */
+	rxrpc_put_call(call, rxrpc_call_put);
+
 	_leave(" = %p{%d}", call, call->debug_id);
 out:
 	spin_unlock(&rx->incoming_lock);
@@ -469,7 +477,6 @@ struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *rx,
 	}
 
 	/* formalise the acceptance */
-	rxrpc_get_call(call, rxrpc_call_got);
 	call->notify_rx = notify_rx;
 	call->user_call_ID = user_call_ID;
 	rxrpc_get_call(call, rxrpc_call_got_userid);
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 3f9476508204..9aa1c4b53563 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -464,9 +464,6 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)
 		call->rxtx_buffer[i] = NULL;
 	}
 
-	/* We have to release the prealloc backlog ref */
-	if (rxrpc_is_service_call(call))
-		rxrpc_put_call(call, rxrpc_call_put);
 	_leave("");
 }
 
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 16ff56f69256..a284205b8ecf 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -118,6 +118,7 @@ static int rxrpc_recvmsg_new_call(struct rxrpc_sock *rx,
 		list_del_init(&call->recvmsg_link);
 		write_unlock_bh(&rx->recvmsg_lock);
 
+		rxrpc_get_call(call, rxrpc_call_got);
 		write_lock(&rx->call_lock);
 		list_add_tail(&call->accept_link, &rx->to_be_accepted);
 		write_unlock(&rx->call_lock);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 10/10] rxrpc: Correctly initialise, limit and transmit call->rx_winsize
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
                   ` (8 preceding siblings ...)
  2016-09-13 22:21 ` [PATCH net-next 09/10] rxrpc: Fix prealloc refcounting David Howells
@ 2016-09-13 22:22 ` David Howells
  2016-09-16  5:55 ` [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Miller
  10 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2016-09-13 22:22 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

call->rx_winsize should be initialised to the sysctl setting and the sysctl
setting should be limited to the maximum we want to permit.  Further, we
need to place this in the ACK info instead of the sysctl setting.

Furthermore, discard the idea of accepting the subpackets of a jumbo packet
that lie beyond the receive window when the first packet of the jumbo is
within the window.  Just discard the excess subpackets instead.  This
allows the receive window to be opened up right to the buffer size less one
for the dead slot.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/ar-internal.h |    3 ++-
 net/rxrpc/call_object.c |    2 +-
 net/rxrpc/input.c       |   23 ++++++++++++++++-------
 net/rxrpc/misc.c        |    5 ++++-
 net/rxrpc/output.c      |    4 ++--
 net/rxrpc/sysctl.c      |    2 +-
 6 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 47c74a581a0f..e78c40b37db5 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -498,6 +498,7 @@ struct rxrpc_call {
 	 */
 #define RXRPC_RXTX_BUFF_SIZE	64
 #define RXRPC_RXTX_BUFF_MASK	(RXRPC_RXTX_BUFF_SIZE - 1)
+#define RXRPC_INIT_RX_WINDOW_SIZE 32
 	struct sk_buff		**rxtx_buffer;
 	u8			*rxtx_annotations;
 #define RXRPC_TX_ANNO_ACK	0
@@ -518,7 +519,7 @@ struct rxrpc_call {
 	rxrpc_seq_t		rx_expect_next;	/* Expected next packet sequence number */
 	u8			rx_winsize;	/* Size of Rx window */
 	u8			tx_winsize;	/* Maximum size of Tx window */
-	u8			nr_jumbo_dup;	/* Number of jumbo duplicates */
+	u8			nr_jumbo_bad;	/* Number of jumbo dups/exceeds-windows */
 
 	/* receive-phase ACK management */
 	u8			ackr_reason;	/* reason to ACK */
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 9aa1c4b53563..22f9b0d1a138 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -152,7 +152,7 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
 	memset(&call->sock_node, 0xed, sizeof(call->sock_node));
 
 	/* Leave space in the ring to handle a maxed-out jumbo packet */
-	call->rx_winsize = RXRPC_RXTX_BUFF_SIZE - 1 - 46;
+	call->rx_winsize = rxrpc_rx_window_size;
 	call->tx_winsize = 16;
 	call->rx_expect_next = 1;
 	return call;
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 8e529afcd6c1..75af0bd316c7 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -164,7 +164,7 @@ protocol_error:
  * (that information is encoded in the ACK packet).
  */
 static void rxrpc_input_dup_data(struct rxrpc_call *call, rxrpc_seq_t seq,
-				 u8 annotation, bool *_jumbo_dup)
+				 u8 annotation, bool *_jumbo_bad)
 {
 	/* Discard normal packets that are duplicates. */
 	if (annotation == 0)
@@ -174,9 +174,9 @@ static void rxrpc_input_dup_data(struct rxrpc_call *call, rxrpc_seq_t seq,
 	 * more partially duplicate jumbo packets, we refuse to take any more
 	 * jumbos for this call.
 	 */
-	if (!*_jumbo_dup) {
-		call->nr_jumbo_dup++;
-		*_jumbo_dup = true;
+	if (!*_jumbo_bad) {
+		call->nr_jumbo_bad++;
+		*_jumbo_bad = true;
 	}
 }
 
@@ -191,7 +191,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
 	unsigned int ix;
 	rxrpc_serial_t serial = sp->hdr.serial, ack_serial = 0;
 	rxrpc_seq_t seq = sp->hdr.seq, hard_ack;
-	bool immediate_ack = false, jumbo_dup = false, queued;
+	bool immediate_ack = false, jumbo_bad = false, queued;
 	u16 len;
 	u8 ack = 0, flags, annotation = 0;
 
@@ -222,7 +222,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
 
 	flags = sp->hdr.flags;
 	if (flags & RXRPC_JUMBO_PACKET) {
-		if (call->nr_jumbo_dup > 3) {
+		if (call->nr_jumbo_bad > 3) {
 			ack = RXRPC_ACK_NOSPACE;
 			ack_serial = serial;
 			goto ack;
@@ -259,7 +259,7 @@ next_subpacket:
 	}
 
 	if (call->rxtx_buffer[ix]) {
-		rxrpc_input_dup_data(call, seq, annotation, &jumbo_dup);
+		rxrpc_input_dup_data(call, seq, annotation, &jumbo_bad);
 		if (ack != RXRPC_ACK_DUPLICATE) {
 			ack = RXRPC_ACK_DUPLICATE;
 			ack_serial = serial;
@@ -304,6 +304,15 @@ skip:
 		annotation++;
 		if (flags & RXRPC_JUMBO_PACKET)
 			annotation |= RXRPC_RX_ANNO_JLAST;
+		if (after(seq, hard_ack + call->rx_winsize)) {
+			ack = RXRPC_ACK_EXCEEDS_WINDOW;
+			ack_serial = serial;
+			if (!jumbo_bad) {
+				call->nr_jumbo_bad++;
+				jumbo_bad = true;
+			}
+			goto ack;
+		}
 
 		_proto("Rx DATA Jumbo %%%u", serial);
 		goto next_subpacket;
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index fd096f742e4b..8b910780f1ac 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -50,7 +50,10 @@ unsigned int rxrpc_idle_ack_delay = 0.5 * HZ;
  * limit is hit, we should generate an EXCEEDS_WINDOW ACK and discard further
  * packets.
  */
-unsigned int rxrpc_rx_window_size = RXRPC_RXTX_BUFF_SIZE - 46;
+unsigned int rxrpc_rx_window_size = RXRPC_INIT_RX_WINDOW_SIZE;
+#if (RXRPC_RXTX_BUFF_SIZE - 1) < RXRPC_INIT_RX_WINDOW_SIZE
+#error Need to reduce RXRPC_INIT_RX_WINDOW_SIZE
+#endif
 
 /*
  * Maximum Rx MTU size.  This indicates to the sender the size of jumbo packet
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 719a4c23f09d..90c7722d5779 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -71,10 +71,10 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
 
 	mtu = call->conn->params.peer->if_mtu;
 	mtu -= call->conn->params.peer->hdrsize;
-	jmax = (call->nr_jumbo_dup > 3) ? 1 : rxrpc_rx_jumbo_max;
+	jmax = (call->nr_jumbo_bad > 3) ? 1 : rxrpc_rx_jumbo_max;
 	pkt->ackinfo.rxMTU	= htonl(rxrpc_rx_mtu);
 	pkt->ackinfo.maxMTU	= htonl(mtu);
-	pkt->ackinfo.rwind	= htonl(rxrpc_rx_window_size);
+	pkt->ackinfo.rwind	= htonl(call->rx_winsize);
 	pkt->ackinfo.jumbo_max	= htonl(jmax);
 
 	*ackp++ = 0;
diff --git a/net/rxrpc/sysctl.c b/net/rxrpc/sysctl.c
index b7ca8cf13c84..a03c61c672f5 100644
--- a/net/rxrpc/sysctl.c
+++ b/net/rxrpc/sysctl.c
@@ -20,7 +20,7 @@ static const unsigned int one = 1;
 static const unsigned int four = 4;
 static const unsigned int thirtytwo = 32;
 static const unsigned int n_65535 = 65535;
-static const unsigned int n_max_acks = RXRPC_MAXACKS;
+static const unsigned int n_max_acks = RXRPC_RXTX_BUFF_SIZE - 1;
 
 /*
  * RxRPC operating parameters.

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 00/10] rxrpc: Miscellaneous fixes
  2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
                   ` (9 preceding siblings ...)
  2016-09-13 22:22 ` [PATCH net-next 10/10] rxrpc: Correctly initialise, limit and transmit call->rx_winsize David Howells
@ 2016-09-16  5:55 ` David Miller
  10 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2016-09-16  5:55 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs, linux-kernel

From: David Howells <dhowells@redhat.com>
Date: Tue, 13 Sep 2016 23:20:56 +0100

> 
> Here's a set of miscellaneous fix patches.  There are a couple of points of
> note:
> 
>  (1) There is one non-fix patch that adjusts the call ref tracking
>      tracepoint to make kernel API-held refs on calls more obvious.  This
>      is a prerequisite for the patch that fixes prealloc refcounting.
> 
>  (2) The final patch alters how jumbo packets that partially exceed the
>      receive window are handled.  Previously, space was being left in the
>      Rx buffer for them, but this significantly hurts performance as the Rx
>      window can't be increased to match the OpenAFS Tx window size.
> 
>      Instead, the excess subpackets are discarded and an EXCEEDS_WINDOW ACK
>      is generated for the first.  To avoid the problem of someone trying to
>      run the kernel out of space by feeding the kernel a series of
>      overlapping maximal jumbo packets, we stop allowing jumbo packets on a
>      call if we encounter more than three jumbo packets with duplicate or
>      excessive subpackets.
 ...
> Tagged thusly:
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
> 	rxrpc-rewrite-20160913-1

Pulled, thanks David.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-09-16  5:55 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-13 22:20 [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Howells
2016-09-13 22:21 ` [PATCH net-next 01/10] rxrpc: Make sure we initialise the peer hash key David Howells
2016-09-13 22:21 ` [PATCH net-next 02/10] rxrpc: Add missing wakeup on Tx window rotation David Howells
2016-09-13 22:21 ` [PATCH net-next 03/10] rxrpc: The IDLE ACK packet should use rxrpc_idle_ack_delay David Howells
2016-09-13 22:21 ` [PATCH net-next 04/10] rxrpc: Requeue call for recvmsg if more data David Howells
2016-09-13 22:21 ` [PATCH net-next 05/10] rxrpc: Add missing unlock in rxrpc_call_accept() David Howells
2016-09-13 22:21 ` [PATCH net-next 06/10] rxrpc: Use skb->len not skb->data_len David Howells
2016-09-13 22:21 ` [PATCH net-next 07/10] rxrpc: Allow tx_winsize to grow in response to an ACK David Howells
2016-09-13 22:21 ` [PATCH net-next 08/10] rxrpc: Adjust the call ref tracepoint to show kernel API refs David Howells
2016-09-13 22:21 ` [PATCH net-next 09/10] rxrpc: Fix prealloc refcounting David Howells
2016-09-13 22:22 ` [PATCH net-next 10/10] rxrpc: Correctly initialise, limit and transmit call->rx_winsize David Howells
2016-09-16  5:55 ` [PATCH net-next 00/10] rxrpc: Miscellaneous fixes David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).