linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2] hvsock: fix epollout hang from race condition
@ 2019-06-17 19:26 Sunil Muthuswamy
  2019-06-19  1:42 ` David Miller
  0 siblings, 1 reply; 2+ messages in thread
From: Sunil Muthuswamy @ 2019-06-17 19:26 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	David S. Miller, Dexuan Cui, Michael Kelley
  Cc: netdev, linux-hyperv, linux-kernel

Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
not return even when the hvsock socket is writable, under some race
condition. This can happen under the following sequence:
- fd = socket(hvsocket)
- fd_out = dup(fd)
- fd_in = dup(fd)
- start a writer thread that writes data to fd_out with a combination of
  epoll_wait(fd_out, EPOLLOUT) and
- start a reader thread that reads data from fd_in with a combination of
  epoll_wait(fd_in, EPOLLIN)
- On the host, there are two threads that are reading/writing data to the
  hvsocket

stack:
hvs_stream_has_space
hvs_notify_poll_out
vsock_poll
sock_poll
ep_poll

Race condition:
check for epollout from ep_poll():
	assume no writable space in the socket
	hvs_stream_has_space() returns 0
check for epollin from ep_poll():
	assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE)
	hvs_stream_has_space() will clear the channel pending send size
	host will not notify the guest because the pending send size has
		been cleared and so the hvsocket will never mark the
		socket writable

Now, the EPOLLOUT will never return even if the socket write buffer is
empty.

The fix is to set the pending size to the default size and never change it.
This way the host will always notify the guest whenever the writable space
is bigger than the pending size. The host is already optimized to *only*
notify the guest when the pending size threshold boundary is crossed and
not everytime.

This change also reduces the cpu usage somewhat since hv_stream_has_space()
is in the hotpath of send:
vsock_stream_sendmsg()->hv_stream_has_space()
Earlier hv_stream_has_space was setting/clearing the pending size on every
call.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
---
- Resubmitting the patch after taking care of some spurious warnings.

 net/vmw_vsock/hyperv_transport.c | 39 ++++++++-------------------------------
 1 file changed, 8 insertions(+), 31 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index e4801c7..cd3f47f 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -220,18 +220,6 @@ static void hvs_set_channel_pending_send_size(struct vmbus_channel *chan)
 	set_channel_pending_send_size(chan,
 				      HVS_PKT_LEN(HVS_SEND_BUF_SIZE));
 
-	/* See hvs_stream_has_space(): we must make sure the host has seen
-	 * the new pending send size, before we can re-check the writable
-	 * bytes.
-	 */
-	virt_mb();
-}
-
-static void hvs_clear_channel_pending_send_size(struct vmbus_channel *chan)
-{
-	set_channel_pending_send_size(chan, 0);
-
-	/* Ditto */
 	virt_mb();
 }
 
@@ -301,9 +289,6 @@ static void hvs_channel_cb(void *ctx)
 	if (hvs_channel_readable(chan))
 		sk->sk_data_ready(sk);
 
-	/* See hvs_stream_has_space(): when we reach here, the writable bytes
-	 * may be already less than HVS_PKT_LEN(HVS_SEND_BUF_SIZE).
-	 */
 	if (hv_get_bytes_to_write(&chan->outbound) > 0)
 		sk->sk_write_space(sk);
 }
@@ -404,6 +389,13 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 	set_per_channel_state(chan, conn_from_host ? new : sk);
 	vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
 
+	/* Set the pending send size to max packet size to always get
+	 * notifications from the host when there is enough writable space.
+	 * The host is optimized to send notifications only when the pending
+	 * size boundary is crossed, and not always.
+	 */
+	hvs_set_channel_pending_send_size(chan);
+
 	if (conn_from_host) {
 		new->sk_state = TCP_ESTABLISHED;
 		sk->sk_ack_backlog++;
@@ -697,23 +689,8 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk)
 static s64 hvs_stream_has_space(struct vsock_sock *vsk)
 {
 	struct hvsock *hvs = vsk->trans;
-	struct vmbus_channel *chan = hvs->chan;
-	s64 ret;
-
-	ret = hvs_channel_writable_bytes(chan);
-	if (ret > 0)  {
-		hvs_clear_channel_pending_send_size(chan);
-	} else {
-		/* See hvs_channel_cb() */
-		hvs_set_channel_pending_send_size(chan);
-
-		/* Re-check the writable bytes to avoid race */
-		ret = hvs_channel_writable_bytes(chan);
-		if (ret > 0)
-			hvs_clear_channel_pending_send_size(chan);
-	}
 
-	return ret;
+	return hvs_channel_writable_bytes(hvs->chan);
 }
 
 static u64 hvs_stream_rcvhiwat(struct vsock_sock *vsk)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH net v2] hvsock: fix epollout hang from race condition
  2019-06-17 19:26 [PATCH net v2] hvsock: fix epollout hang from race condition Sunil Muthuswamy
@ 2019-06-19  1:42 ` David Miller
  0 siblings, 0 replies; 2+ messages in thread
From: David Miller @ 2019-06-19  1:42 UTC (permalink / raw)
  To: sunilmut
  Cc: kys, haiyangz, sthemmin, sashal, decui, mikelley, netdev,
	linux-hyperv, linux-kernel

From: Sunil Muthuswamy <sunilmut@microsoft.com>
Date: Mon, 17 Jun 2019 19:26:25 +0000

> Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
> not return even when the hvsock socket is writable, under some race
> condition. This can happen under the following sequence:
...
> Now, the EPOLLOUT will never return even if the socket write buffer is
> empty.
> 
> The fix is to set the pending size to the default size and never change it.
> This way the host will always notify the guest whenever the writable space
> is bigger than the pending size. The host is already optimized to *only*
> notify the guest when the pending size threshold boundary is crossed and
> not everytime.
> 
> This change also reduces the cpu usage somewhat since hv_stream_has_space()
> is in the hotpath of send:
> vsock_stream_sendmsg()->hv_stream_has_space()
> Earlier hv_stream_has_space was setting/clearing the pending size on every
> call.
> 
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
> ---
> - Resubmitting the patch after taking care of some spurious warnings.

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-06-19  1:42 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-17 19:26 [PATCH net v2] hvsock: fix epollout hang from race condition Sunil Muthuswamy
2019-06-19  1:42 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).