linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
@ 2022-12-17 19:42 Arseniy Krasnov
  2022-12-17 19:45 ` [RFC PATCH v1 1/2] virtio/vsock: send credit update depending on SO_RCVLOWAT Arseniy Krasnov
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-17 19:42 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi, edumazet, David S. Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy, Arseniy Krasnov

Hello,

it seems I found a strange thing (maybe a bug) where the sender ('tx' below)
and the receiver ('rx' below) can get stuck forever. A potential fix is in the
first patch; the second patch contains a reproducer based on the vsock test
suite. The reproducer is simple: tx just sends data to rx with the 'write()'
syscall, rx dequeues it using the 'read()' syscall and uses 'poll()' for
waiting. I run the server in the host and the client in the guest.

rx side params:
1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb (i.e. the default).
2) SO_RCVLOWAT is 128Kb.
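
For reference, a minimal sketch of how the receiver could set these two
parameters ('fd' is assumed to be an already created vsock socket; the buffer
size option lives at the AF_VSOCK level and takes a 64-bit value, while the
reproducer below only sets SO_RCVLOWAT since 256Kb is already the default):

	#include <sys/socket.h>
	#include <linux/vm_sockets.h>

	unsigned long long buf_size = 256 * 1024;	/* SO_VM_SOCKETS_BUFFER_SIZE */
	unsigned long lowat_val = 128 * 1024;		/* SO_RCVLOWAT */

	setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, &buf_size, sizeof(buf_size));
	setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat_val, sizeof(lowat_val));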

What happens in the reproducer step by step:

1) tx tries to send 256Kb + 1 byte (in a single 'write()')
2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
3) tx waits for space in 'write()' to send last 1 byte
4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
5) rx reads 64Kb, credit update is not sent due to *
6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
7) rx reads 64Kb, credit update is not sent due to *
8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
9) rx reads 64Kb, credit update is not sent due to *
10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()

* is the optimization in 'virtio_transport_stream_do_dequeue()' which
  sends OP_CREDIT_UPDATE only when there is not much free space left -
  less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
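
For reference, the relevant check in 'virtio_transport_stream_do_dequeue()'
(copied from the context lines of the first patch below) is roughly:

	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
	...
	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
		virtio_transport_send_credit_update(vsk);

In the scenario above 'buf_alloc' stays at 256Kb and 'last_fwd_cnt' stays at 0
(rx sends nothing back), so 'free_space' never drops below 64Kb and the
condition never triggers.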

Now the tx side waits for space inside write() and rx waits in poll() for
'rx_bytes' to reach the SO_RCVLOWAT value. Both sides will wait forever. I
think a possible fix is to send a credit update not only when the free space
is too small, but also when the number of bytes in the receive queue is
smaller than SO_RCVLOWAT and thus not enough to wake up the sleeping reader.
I'm not sure about the correctness of this idea, but anyway - I think the
problem above exists. What do you think?

Patchset was rebased and tested on skbuff v7 patch from Bobby Eshleman:
https://lore.kernel.org/netdev/20221213192843.421032-1-bobby.eshleman@bytedance.com/

Arseniy Krasnov (2):
 virtio/vsock: send credit update depending on SO_RCVLOWAT
 vsock_test: mutual hungup reproducer

 net/vmw_vsock/virtio_transport_common.c |  9 +++-
 tools/testing/vsock/vsock_test.c        | 78 +++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 2 deletions(-)
-- 
2.25.1


* [RFC PATCH v1 1/2] virtio/vsock: send credit update depending on SO_RCVLOWAT
  2022-12-17 19:42 [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Arseniy Krasnov
@ 2022-12-17 19:45 ` Arseniy Krasnov
  2022-12-17 19:47 ` [RFC PATCH v1 2/2] vsock_test: mutual hungup reproducer Arseniy Krasnov
  2022-12-19 15:41 ` [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Stefano Garzarella
  2 siblings, 0 replies; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-17 19:45 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi, edumazet, David S. Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy, Arseniy Krasnov

This adds an extra condition for sending a credit update message during data
read by userspace. The problem arises when the sender waits for free space
on the receiver, while the receiver waits in 'poll()' until 'rx_bytes' reaches
the SO_RCVLOWAT value of the socket. With this patch, the receiver sends a
credit update message when the number of bytes in its rx queue is too small
to wake up the reader sleeping in 'poll()'.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/virtio_transport_common.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a1581c77cf84..4cf26cf8a754 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -362,6 +362,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	struct virtio_vsock_sock *vvs = vsk->trans;
 	size_t bytes, total = 0;
 	struct sk_buff *skb;
+	bool low_rx_bytes;
 	int err = -EFAULT;
 	u32 free_space;
 
@@ -396,6 +397,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	}
 
 	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+	low_rx_bytes = (vvs->rx_bytes <
+			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
 
 	spin_unlock_bh(&vvs->rx_lock);
 
@@ -405,9 +408,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	 * too high causes extra messages. Too low causes transmitter
 	 * stalls. As stalls are in theory more expensive than extra
 	 * messages, we set the limit to a high value. TODO: experiment
-	 * with different values.
+	 * with different values. Also send a credit update message when
+	 * the rx queue has too few bytes to wake up the sleeping reader.
 	 */
-	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE ||
+	    low_rx_bytes)
 		virtio_transport_send_credit_update(vsk);
 
 	return total;
-- 
2.25.1


* [RFC PATCH v1 2/2] vsock_test: mutual hungup reproducer
  2022-12-17 19:42 [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Arseniy Krasnov
  2022-12-17 19:45 ` [RFC PATCH v1 1/2] virtio/vsock: send credit update depending on SO_RCVLOWAT Arseniy Krasnov
@ 2022-12-17 19:47 ` Arseniy Krasnov
  2022-12-19 15:41 ` [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Stefano Garzarella
  2 siblings, 0 replies; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-17 19:47 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi, edumazet, David S. Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy, Arseniy Krasnov

This is not for merge, just demo.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 tools/testing/vsock/vsock_test.c | 78 ++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index bb6d691cb30d..320ecf4db74b 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -699,7 +699,85 @@ static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
 	close(fd);
 }
 
+static void test_stall_client(const struct test_opts *opts)
+{
+	int fd;
+	unsigned long lowat_val = 128 * 1024;
+	size_t data_size = 64 * 1024;
+	void *data;
+
+
+	fd = vsock_stream_connect(opts->peer_cid, 1234);
+	assert(fd != -1);
+
+	assert(!setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
+		       &lowat_val, sizeof(lowat_val)));
+
+	data = malloc(data_size);
+	assert(data);
+
+	/* Wait for tx to send data. */
+	sleep(3);
+
+	while (1) {
+		struct pollfd fds = {0};
+
+		fds.fd = fd;
+		fds.events = POLLIN | POLLRDNORM | POLLERR | POLLRDHUP | POLLHUP;
+
+		/* Block until there is something to handle. */
+		printf("[RX] ENTER POLL\n");
+		assert(poll(&fds, 1, -1) >= 0);
+		printf("[RX] LEAVE POLL\n");
+
+		if (fds.revents & (POLLIN | POLLRDNORM)) {
+			read(fd, data, data_size);
+		}
+
+		if (fds.revents & POLLERR) {
+			printf("[RX] POLL ERR\n");
+			break;
+		}
+
+		if (fds.revents & (POLLRDHUP | POLLHUP)) {
+			printf("[RX] POLL DONE\n");
+			break;
+		}
+	}
+
+	close(fd);
+	exit(0);
+}
+
+static void test_stall_server(const struct test_opts *opts)
+{
+	size_t data_size = (256 * 1024) + 1;
+	void *data;
+	int fd;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+	assert(fd != -1);
+
+	data = malloc(data_size);
+	assert(data);
+
+	printf("[TX] ENTER WRITE\n");
+	assert(write(fd, data, data_size) == data_size);
+
+	/* Never get here without kernel patch. */
+	printf("[TX] LEAVE WRITE\n");
+
+	close(fd);
+	exit(0);
+}
+
+
 static struct test_case test_cases[] = {
+	{
+		.name = "Test stall",
+		.run_client = test_stall_client,
+		.run_server = test_stall_server,
+	},
 	{
 		.name = "SOCK_STREAM connection reset",
 		.run_client = test_stream_connection_reset,
-- 
2.25.1


* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-17 19:42 [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Arseniy Krasnov
  2022-12-17 19:45 ` [RFC PATCH v1 1/2] virtio/vsock: send credit update depending on SO_RCVLOWAT Arseniy Krasnov
  2022-12-17 19:47 ` [RFC PATCH v1 2/2] vsock_test: mutual hungup reproducer Arseniy Krasnov
@ 2022-12-19 15:41 ` Stefano Garzarella
  2022-12-20  7:14   ` Arseniy Krasnov
  2 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2022-12-19 15:41 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy, Arseniy Krasnov

Hi Arseniy,

On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>
> Hello,
>
> seems I found strange thing(may be a bug) where sender('tx' later) and
> receiver('rx' later) could stuck forever. Potential fix is in the first
> patch, second patch contains reproducer, based on vsock test suite.
> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
> server in host and client in guest.
>
> rx side params:
> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
> 2) SO_RCVLOWAT is 128Kb.
>
> What happens in the reproducer step by step:
>

I put the values of the variables involved to facilitate understanding:

RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
    free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB

The credit update is sent if
free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]

> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
> 3) tx waits for space in 'write()' to send last 1 byte
> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
> 5) rx reads 64Kb, credit update is not sent due to *

RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
    free_space = 192 KB

> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
> 7) rx reads 64Kb, credit update is not sent due to *

RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
    free_space = 128 KB

> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
> 9) rx reads 64Kb, credit update is not sent due to *

Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.

RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
    free_space = 64 KB

> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()

I agree that the TX is stuck because we are not sending the credit 
update, but also if RX sends the credit update at step 9, RX won't be 
woken up at step 10, right?

>
> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>   sends OP_CREDIT_UPDATE only when we have not too much space -
>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>
> Now tx side waits for space inside write() and rx waits in poll() for
> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
> think, possible fix is to send credit update not only when we have too
> small space, but also when number of bytes in receive queue is smaller
> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
> sure about correctness of this idea, but anyway - I think that problem
> above exists. What do You think?

I'm not sure, I have to think more about it, but if RX reads less than 
SO_RCVLOWAT, I expect it's normal to end up stuck.

In this case we are only unblocking TX, but even if it sends that single 
byte, RX is still stuck and not consuming it, so it was useless to wake 
up TX if RX won't consume it anyway, right?

If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB, 
then it would still send the credit update even without this patch and 
TX will send the 1 byte.

Thanks,
Stefano



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-19 15:41 ` [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Stefano Garzarella
@ 2022-12-20  7:14   ` Arseniy Krasnov
  2022-12-20  8:33     ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-20  7:14 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On 19.12.2022 18:41, Stefano Garzarella wrote:

Hello!

> Hi Arseniy,
> 
> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>
>> Hello,
>>
>> seems I found strange thing(may be a bug) where sender('tx' later) and
>> receiver('rx' later) could stuck forever. Potential fix is in the first
>> patch, second patch contains reproducer, based on vsock test suite.
>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>> server in host and client in guest.
>>
>> rx side params:
>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>> 2) SO_RCVLOWAT is 128Kb.
>>
>> What happens in the reproducer step by step:
>>
> 
> I put the values of the variables involved to facilitate understanding:
> 
> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
> 
> The credit update is sent if
> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
> 
>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>> 3) tx waits for space in 'write()' to send last 1 byte
>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>> 5) rx reads 64Kb, credit update is not sent due to *
> 
> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>     free_space = 192 KB
> 
>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>> 7) rx reads 64Kb, credit update is not sent due to *
> 
> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>     free_space = 128 KB
> 
>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>> 9) rx reads 64Kb, credit update is not sent due to *
> 
> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
> 
> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>     free_space = 64 KB
> 
>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
> 
> I agree that the TX is stuck because we are not sending the credit 
> update, but also if RX sends the credit update at step 9, RX won't be 
> woken up at step 10, right?

Yes, RX will sleep, but TX will wake up, and since we inform TX how much
free space we have, there are now two cases for TX:
1) the rest of the data is "small" (TX can send it without blocking again),
   so TX leaves 'write()' and continues execution. RX still waits in
   'poll()'. Later TX will send enough data to wake up RX.
2) the rest of the data is "big" - too big to leave 'write()', so TX waits
   again for free space - but now it will be able to send enough data to
   wake up RX, because on the RX side we compared 'rx_bytes' with the
   rcvlowat value.
> 
>>
>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>
>> Now tx side waits for space inside write() and rx waits in poll() for
>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>> think, possible fix is to send credit update not only when we have too
>> small space, but also when number of bytes in receive queue is smaller
>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>> sure about correctness of this idea, but anyway - I think that problem
>> above exists. What do You think?
> 
> I'm not sure, I have to think more about it, but if RX reads less than 
> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
> 
> In this case we are only unstucking TX, but even if it sends that single 
> byte, RX is still stuck and not consuming it, so it was useless to wake 
> up TX if RX won't consume it anyway, right?

1) I think it is not useless, because we inform (not just wake up) TX that
there is free space at the RX side - as I mentioned above.
2) Anyway, I think this situation is a little bit strange: TX thinks that
there is no free space at RX and waits for it, but there is free space at RX!
At the same time, RX waits in poll() forever - it is ready to get a new portion
of data and return POLLIN, but TX "thinks" exactly the opposite - that RX is
full of data. Of course, if this were just a stall in TX data handling it would
be ok - only a performance degradation - but here TX is stuck forever.

> 
> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB, 
> then it would still send the credit update even without this patch and 
> TX will send the 1 byte.

But how will RX wake up in this case? E.g. it calls poll() without a timeout,
the connection is established, and RX ignores signals.

Thanks, Arseniy
> 
> Thanks,
> Stefano
> 



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-20  7:14   ` Arseniy Krasnov
@ 2022-12-20  8:33     ` Stefano Garzarella
  2022-12-20  9:23       ` Arseniy Krasnov
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2022-12-20  8:33 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On Tue, Dec 20, 2022 at 07:14:27AM +0000, Arseniy Krasnov wrote:
>On 19.12.2022 18:41, Stefano Garzarella wrote:
>
>Hello!
>
>> Hi Arseniy,
>>
>> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>>
>>> Hello,
>>>
>>> seems I found strange thing(may be a bug) where sender('tx' later) and
>>> receiver('rx' later) could stuck forever. Potential fix is in the first
>>> patch, second patch contains reproducer, based on vsock test suite.
>>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>>> server in host and client in guest.
>>>
>>> rx side params:
>>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>>> 2) SO_RCVLOWAT is 128Kb.
>>>
>>> What happens in the reproducer step by step:
>>>
>>
>> I put the values of the variables involved to facilitate understanding:
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
>>
>> The credit update is sent if
>> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
>>
>>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>>> 3) tx waits for space in 'write()' to send last 1 byte
>>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>>> 5) rx reads 64Kb, credit update is not sent due to *
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>>     free_space = 192 KB
>>
>>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>>> 7) rx reads 64Kb, credit update is not sent due to *
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>>     free_space = 128 KB
>>
>>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>>> 9) rx reads 64Kb, credit update is not sent due to *
>>
>> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>>     free_space = 64 KB
>>
>>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
>>
>> I agree that the TX is stuck because we are not sending the credit
>> update, but also if RX sends the credit update at step 9, RX won't be
>> woken up at step 10, right?
>
>Yes, RX will sleep, but TX will wake up and as we inform TX how much
>free space we have, now there are two cases for TX:
>1) send "small" rest of data(e.g. without blocking again), leave 'write()'
>   and continue execution. RX still waits in 'poll()'. Later TX will
>   send enough data to wake up RX.
>2) send "big" rest of data - if rest is too big to leave 'write()' and TX
>   will wait again for the free space - it will be able to send enough data
>   to wake up RX as we compared 'rx_bytes' with rcvlowat value in RX.

Right, so I'd update the test to behave like this.
And I'd explain better the problem we are going to fix in the commit 
message.

>>
>>>
>>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>>
>>> Now tx side waits for space inside write() and rx waits in poll() for
>>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>>> think, possible fix is to send credit update not only when we have too
>>> small space, but also when number of bytes in receive queue is smaller
>>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>>> sure about correctness of this idea, but anyway - I think that problem
>>> above exists. What do You think?
>>
>> I'm not sure, I have to think more about it, but if RX reads less 
>> than
>> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
>>
>> In this case we are only unstucking TX, but even if it sends that single
>> byte, RX is still stuck and not consuming it, so it was useless to wake
>> up TX if RX won't consume it anyway, right?
>
>1) I think it is not useless, because we inform(not just wake up) TX that
>there is free space at RX side - as i mentioned above.
>2) Anyway i think that this situation is a little bit strange: TX thinks that
>there is no free space at RX and waits for it, but there is free space at RX!
>At the same time, RX waits in poll() forever - it is ready to get new portion
>of data to return POLLIN, but TX "thinks" exactly opposite thing - RX is full
>of data. Of course, if there will be just stalls in TX data handling - it will
>be ok - just performance degradation, but TX stucks forever.

We did it to avoid a lot of credit update messages.
Anyway I think the main point here is: why is RX setting SO_RCVLOWAT to 
128 KB and then reading only half of it?

So I think if the user sets SO_RCVLOWAT to a value and then RX reads 
less than it, it is expected to get stuck.

Anyway, since the change will not impact the default behaviour 
(SO_RCVLOWAT = 1) we can merge this patch, but IMHO we need to explain 
the case better and improve the test.

>
>>
>> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB,
>> then it would still send the credit update even without this patch and
>> TX will send the 1 byte.
>
>But how RX will wake up in this case? E.g. it calls poll() without timeout,
>connection is established, RX ignores signal

RX will wake up because SO_RCVLOWAT is 64KB and there are 64 KB in the 
buffer. Then RX will read it and send the credit update to TX because
free_space is 0.

Thanks,
Stefano



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-20  8:33     ` Stefano Garzarella
@ 2022-12-20  9:23       ` Arseniy Krasnov
  2022-12-20 10:43         ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-20  9:23 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On 20.12.2022 11:33, Stefano Garzarella wrote:
> On Tue, Dec 20, 2022 at 07:14:27AM +0000, Arseniy Krasnov wrote:
>> On 19.12.2022 18:41, Stefano Garzarella wrote:
>>
>> Hello!
>>
>>> Hi Arseniy,
>>>
>>> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>>>
>>>> Hello,
>>>>
>>>> seems I found strange thing(may be a bug) where sender('tx' later) and
>>>> receiver('rx' later) could stuck forever. Potential fix is in the first
>>>> patch, second patch contains reproducer, based on vsock test suite.
>>>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>>>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>>>> server in host and client in guest.
>>>>
>>>> rx side params:
>>>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>>>> 2) SO_RCVLOWAT is 128Kb.
>>>>
>>>> What happens in the reproducer step by step:
>>>>
>>>
>>> I put the values of the variables involved to facilitate understanding:
>>>
>>> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>>>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
>>>
>>> The credit update is sent if
>>> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
>>>
>>>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>>>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>>>> 3) tx waits for space in 'write()' to send last 1 byte
>>>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>>>> 5) rx reads 64Kb, credit update is not sent due to *
>>>
>>> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>>>     free_space = 192 KB
>>>
>>>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>>>> 7) rx reads 64Kb, credit update is not sent due to *
>>>
>>> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>>>     free_space = 128 KB
>>>
>>>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>>>> 9) rx reads 64Kb, credit update is not sent due to *
>>>
>>> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
>>>
>>> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>>>     free_space = 64 KB
>>>
>>>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
>>>
>>> I agree that the TX is stuck because we are not sending the credit
>>> update, but also if RX sends the credit update at step 9, RX won't be
>>> woken up at step 10, right?
>>
>> Yes, RX will sleep, but TX will wake up and as we inform TX how much
>> free space we have, now there are two cases for TX:
>> 1) send "small" rest of data(e.g. without blocking again), leave 'write()'
>>   and continue execution. RX still waits in 'poll()'. Later TX will
>>   send enough data to wake up RX.
>> 2) send "big" rest of data - if rest is too big to leave 'write()' and TX
>>   will wait again for the free space - it will be able to send enough data
>>   to wake up RX as we compared 'rx_bytes' with rcvlowat value in RX.
> 
> Right, so I'd update the test to behave like this.
Sorry, you mean vsock_test? To cover TX waiting for free space at RX, thus checking
this kernel patch's logic?
> And I'd explain better the problem we are going to fix in the commit message.
Ok
> 
>>>
>>>>
>>>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>>>
>>>> Now tx side waits for space inside write() and rx waits in poll() for
>>>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>>>> think, possible fix is to send credit update not only when we have too
>>>> small space, but also when number of bytes in receive queue is smaller
>>>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>>>> sure about correctness of this idea, but anyway - I think that problem
>>>> above exists. What do You think?
>>>
>>> I'm not sure, I have to think more about it, but if RX reads less than
>>> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
>>>
>>> In this case we are only unstucking TX, but even if it sends that single
>>> byte, RX is still stuck and not consuming it, so it was useless to wake
>>> up TX if RX won't consume it anyway, right?
>>
>> 1) I think it is not useless, because we inform(not just wake up) TX that
>> there is free space at RX side - as i mentioned above.
>> 2) Anyway i think that this situation is a little bit strange: TX thinks that
>> there is no free space at RX and waits for it, but there is free space at RX!
>> At the same time, RX waits in poll() forever - it is ready to get new portion
>> of data to return POLLIN, but TX "thinks" exactly opposite thing - RX is full
>> of data. Of course, if there will be just stalls in TX data handling - it will
>> be ok - just performance degradation, but TX stucks forever.
> 
> We did it to avoid a lot of credit update messages.
Yes, i see
> Anyway I think here the main point is why RX is setting SO_RCVLOWAT to 128 KB and then reads only half of it?
> 
> So I think if the users set SO_RCVLOWAT to a value and then RX reads less then it, is expected to get stuck.
That's a really interesting question; I've found nothing about this case in Google (not 100% sure) or
POSIX. But I can modify the reproducer: it sets SO_RCVLOWAT to 128Kb only BEFORE entering its last
poll(), where it will get stuck. In this case the behaviour looks more legitimate: it uses the default
SO_RCVLOWAT of 1 and reads 64Kb each time. Finally it sets SO_RCVLOWAT to 128Kb (imagine that it prepares
a 128Kb 'read()' buffer) and enters poll() - we get the same effect: TX waits for space, RX waits in 'poll()'.
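
A rough sketch of that modified client (purely illustrative, reusing the
helpers and port already used in vsock_test.c; the real test in the next
version may differ):

	int fd = vsock_stream_connect(opts->peer_cid, 1234);
	unsigned long lowat_val = 128 * 1024;
	char buf[64 * 1024];
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	/* Default SO_RCVLOWAT (1): drain three 64Kb chunks while tx is
	 * blocked in write() trying to send 256Kb + 1 byte.
	 */
	for (int i = 0; i < 3; i++)
		read(fd, buf, sizeof(buf));

	/* Only now raise SO_RCVLOWAT: 64Kb are left in the rx queue, so
	 * poll() blocks, while tx still waits for a credit update before
	 * it can send the last byte.
	 */
	setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat_val, sizeof(lowat_val));
	poll(&pfd, 1, -1);	/* stuck here without the kernel patch */
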
> 
> Anyway, since the change will not impact the default behaviour (SO_RCVLOWAT = 1) we can merge this patch, but IMHO we need to explain the case better and improve the test.
I see. Of course I'm not sure about this change; I just wanted to ask someone who knows this code better.
> 
>>
>>>
>>> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB,
>>> then it would still send the credit update even without this patch and
>>> TX will send the 1 byte.
>>
>> But how RX will wake up in this case? E.g. it calls poll() without timeout,
>> connection is established, RX ignores signal
> 
> RX will wake up because SO_RCVLOWAT is 64KB and there are 64 KB in the buffer. Then RX will read it and send the credit update to TX because
> free_space is 0.
IIUC, I'm talking about the 10 steps above, i.e. RX will never wake up, because TX is waiting for space.
> 
> Thanks,
> Stefano
> 
Thanks, Arseniy



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-20  9:23       ` Arseniy Krasnov
@ 2022-12-20 10:43         ` Stefano Garzarella
  2022-12-20 12:09           ` Arseniy Krasnov
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2022-12-20 10:43 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On Tue, Dec 20, 2022 at 09:23:17AM +0000, Arseniy Krasnov wrote:
>On 20.12.2022 11:33, Stefano Garzarella wrote:
>> On Tue, Dec 20, 2022 at 07:14:27AM +0000, Arseniy Krasnov wrote:
>>> On 19.12.2022 18:41, Stefano Garzarella wrote:
>>>
>>> Hello!
>>>
>>>> Hi Arseniy,
>>>>
>>>> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> seems I found strange thing(may be a bug) where sender('tx' later) and
>>>>> receiver('rx' later) could stuck forever. Potential fix is in the first
>>>>> patch, second patch contains reproducer, based on vsock test suite.
>>>>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>>>>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>>>>> server in host and client in guest.
>>>>>
>>>>> rx side params:
>>>>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>>>>> 2) SO_RCVLOWAT is 128Kb.
>>>>>
>>>>> What happens in the reproducer step by step:
>>>>>
>>>>
>>>> I put the values of the variables involved to facilitate understanding:
>>>>
>>>> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>>>>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
>>>>
>>>> The credit update is sent if
>>>> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
>>>>
>>>>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>>>>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>>>>> 3) tx waits for space in 'write()' to send last 1 byte
>>>>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>>>>> 5) rx reads 64Kb, credit update is not sent due to *
>>>>
>>>> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>>>>     free_space = 192 KB
>>>>
>>>>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>>>>> 7) rx reads 64Kb, credit update is not sent due to *
>>>>
>>>> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>>>>     free_space = 128 KB
>>>>
>>>>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>>>>> 9) rx reads 64Kb, credit update is not sent due to *
>>>>
>>>> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
>>>>
>>>> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>>>>     free_space = 64 KB
>>>>
>>>>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
>>>>
>>>> I agree that the TX is stuck because we are not sending the credit
>>>> update, but also if RX sends the credit update at step 9, RX won't be
>>>> woken up at step 10, right?
>>>
>>> Yes, RX will sleep, but TX will wake up and as we inform TX how much
>>> free space we have, now there are two cases for TX:
>>> 1) send "small" rest of data(e.g. without blocking again), leave 'write()'
>>>   and continue execution. RX still waits in 'poll()'. Later TX will
>>>   send enough data to wake up RX.
>>> 2) send "big" rest of data - if rest is too big to leave 'write()' and TX
>>>   will wait again for the free space - it will be able to send enough data
>>>   to wake up RX as we compared 'rx_bytes' with rcvlowat value in RX.
>>
>> Right, so I'd update the test to behave like this.
>Sorry, You mean vsock_test? To cover TX waiting for free space at RX, thus checking
>this kernel patch logic?

Yep, I mean the test that you added in this series.

>> And I'd explain better the problem we are going to fix in the commit message.
>Ok
>>
>>>>
>>>>>
>>>>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>>>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>>>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>>>>
>>>>> Now tx side waits for space inside write() and rx waits in poll() for
>>>>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>>>>> think, possible fix is to send credit update not only when we have too
>>>>> small space, but also when number of bytes in receive queue is smaller
>>>>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>>>>> sure about correctness of this idea, but anyway - I think that problem
>>>>> above exists. What do You think?
>>>>
>>>> I'm not sure, I have to think more about it, but if RX reads less than
>>>> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
>>>>
>>>> In this case we are only unstucking TX, but even if it sends that single
>>>> byte, RX is still stuck and not consuming it, so it was useless to wake
>>>> up TX if RX won't consume it anyway, right?
>>>
>>> 1) I think it is not useless, because we inform(not just wake up) TX that
>>> there is free space at RX side - as i mentioned above.
>>> 2) Anyway i think that this situation is a little bit strange: TX thinks that
>>> there is no free space at RX and waits for it, but there is free space at RX!
>>> At the same time, RX waits in poll() forever - it is ready to get new portion
>>> of data to return POLLIN, but TX "thinks" exactly opposite thing - RX is full
>>> of data. Of course, if there will be just stalls in TX data handling - it will
>>> be ok - just performance degradation, but TX stucks forever.
>>
>> We did it to avoid a lot of credit update messages.
>Yes, i see
>> Anyway I think here the main point is why RX is setting SO_RCVLOWAT to 128 KB and then reads only half of it?
>>
>> So I think if the users set SO_RCVLOWAT to a value and then RX reads less then it, is expected to get stuck.
>That a really interesting question, I've found nothing about this case in Google(not sure for 100%) or POSIX. But,
>i can modify reproducer: it sets SO_RCVLOWAT to 128Kb BEFORE entering its last poll where it will stuck. In this
>case behaviour looks more legal: it uses default SO_RCVLOWAT of 1, read 64Kb each time. Finally it sets SO_RCVLOWAT
>to 128Kb(and imagine that it prepares 128Kb 'read()' buffer) and enters poll() - we will get same effect: TX will wait
>for space, RX waits in 'poll()'.

Good point!

>>
>> Anyway, since the change will not impact the default behaviour (SO_RCVLOWAT = 1) we can merge this patch, but IMHO we need to explain the case better and improve the test.
>I see, of course I'm not sure about this change, just want to ask 
>someone who knows this code better

Yes, it's an RFC, so you did well! :-)

>>
>>>
>>>>
>>>> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB,
>>>> then it would still send the credit update even without this patch and
>>>> TX will send the 1 byte.
>>>
>>> But how RX will wake up in this case? E.g. it calls poll() without timeout,
>>> connection is established, RX ignores signal
>>
>> RX will wake up because SO_RCVLOWAT is 64KB and there are 64 KB in the buffer. Then RX will read it and send the credit update to TX because
>> free_space is 0.
>IIUC, i'm talking about 10 steps above, e.g. RX will never wake up, 
>because TX is waiting for space.

Yep, but if RX uses SO_RCVLOWAT = 64 KB instead of 128 KB (I mean if RX 
reads all the bytes that it is waiting for, as specified in SO_RCVLOWAT), 
then RX will send the credit message.

But there is the case that you mentioned, when SO_RCVLOWAT is changed 
while executing.

Thanks,
Stefano



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-20 10:43         ` Stefano Garzarella
@ 2022-12-20 12:09           ` Arseniy Krasnov
  0 siblings, 0 replies; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-20 12:09 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On 20.12.2022 13:43, Stefano Garzarella wrote:
> On Tue, Dec 20, 2022 at 09:23:17AM +0000, Arseniy Krasnov wrote:
>> On 20.12.2022 11:33, Stefano Garzarella wrote:
>>> On Tue, Dec 20, 2022 at 07:14:27AM +0000, Arseniy Krasnov wrote:
>>>> On 19.12.2022 18:41, Stefano Garzarella wrote:
>>>>
>>>> Hello!
>>>>
>>>>> Hi Arseniy,
>>>>>
>>>>> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> seems I found strange thing(may be a bug) where sender('tx' later) and
>>>>>> receiver('rx' later) could stuck forever. Potential fix is in the first
>>>>>> patch, second patch contains reproducer, based on vsock test suite.
>>>>>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>>>>>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>>>>>> server in host and client in guest.
>>>>>>
>>>>>> rx side params:
>>>>>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>>>>>> 2) SO_RCVLOWAT is 128Kb.
>>>>>>
>>>>>> What happens in the reproducer step by step:
>>>>>>
>>>>>
>>>>> I put the values of the variables involved to facilitate understanding:
>>>>>
>>>>> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>>>>>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
>>>>>
>>>>> The credit update is sent if
>>>>> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
>>>>>
>>>>>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>>>>>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>>>>>> 3) tx waits for space in 'write()' to send last 1 byte
>>>>>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>>>>>> 5) rx reads 64Kb, credit update is not sent due to *
>>>>>
>>>>> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>>>>>     free_space = 192 KB
>>>>>
>>>>>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>>>>>> 7) rx reads 64Kb, credit update is not sent due to *
>>>>>
>>>>> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>>>>>     free_space = 128 KB
>>>>>
>>>>>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>>>>>> 9) rx reads 64Kb, credit update is not sent due to *
>>>>>
>>>>> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
>>>>>
>>>>> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>>>>>     free_space = 64 KB
>>>>>
>>>>>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
>>>>>
>>>>> I agree that the TX is stuck because we are not sending the credit
>>>>> update, but also if RX sends the credit update at step 9, RX won't be
>>>>> woken up at step 10, right?
>>>>
>>>> Yes, RX will sleep, but TX will wake up and as we inform TX how much
>>>> free space we have, now there are two cases for TX:
>>>> 1) send "small" rest of data(e.g. without blocking again), leave 'write()'
>>>>   and continue execution. RX still waits in 'poll()'. Later TX will
>>>>   send enough data to wake up RX.
>>>> 2) send "big" rest of data - if rest is too big to leave 'write()' and TX
>>>>   will wait again for the free space - it will be able to send enough data
>>>>   to wake up RX as we compared 'rx_bytes' with rcvlowat value in RX.
>>>
>>> Right, so I'd update the test to behave like this.
>> Sorry, You mean vsock_test? To cover TX waiting for free space at RX, thus checking
>> this kernel patch logic?
> 
> Yep, I mean the test that you added in this series.
Ok
> 
>>> And I'd explain better the problem we are going to fix in the commit message.
>> Ok
>>>
>>>>>
>>>>>>
>>>>>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>>>>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>>>>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>>>>>
>>>>>> Now tx side waits for space inside write() and rx waits in poll() for
>>>>>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>>>>>> think, possible fix is to send credit update not only when we have too
>>>>>> small space, but also when number of bytes in receive queue is smaller
>>>>>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>>>>>> sure about correctness of this idea, but anyway - I think that problem
>>>>>> above exists. What do You think?
>>>>>
>>>>> I'm not sure, I have to think more about it, but if RX reads less than
>>>>> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
>>>>>
>>>>> In this case we are only unstucking TX, but even if it sends that single
>>>>> byte, RX is still stuck and not consuming it, so it was useless to wake
>>>>> up TX if RX won't consume it anyway, right?
>>>>
>>>> 1) I think it is not useless, because we inform(not just wake up) TX that
>>>> there is free space at RX side - as i mentioned above.
>>>> 2) Anyway i think that this situation is a little bit strange: TX thinks that
>>>> there is no free space at RX and waits for it, but there is free space at RX!
>>>> At the same time, RX waits in poll() forever - it is ready to get new portion
>>>> of data to return POLLIN, but TX "thinks" exactly opposite thing - RX is full
>>>> of data. Of course, if there will be just stalls in TX data handling - it will
>>>> be ok - just performance degradation, but TX stucks forever.
>>>
>>> We did it to avoid a lot of credit update messages.
>> Yes, i see
>>> Anyway I think here the main point is why RX is setting SO_RCVLOWAT to 128 KB and then reads only half of it?
>>>
>>> So I think if the users set SO_RCVLOWAT to a value and then RX reads less then it, is expected to get stuck.
>> That a really interesting question, I've found nothing about this case in Google(not sure for 100%) or POSIX. But,
>> i can modify reproducer: it sets SO_RCVLOWAT to 128Kb BEFORE entering its last poll where it will stuck. In this
>> case behaviour looks more legal: it uses default SO_RCVLOWAT of 1, read 64Kb each time. Finally it sets SO_RCVLOWAT
>> to 128Kb(and imagine that it prepares 128Kb 'read()' buffer) and enters poll() - we will get same effect: TX will wait
>> for space, RX waits in 'poll()'.
> 
> Good point!
> 
>>>
>>> Anyway, since the change will not impact the default behaviour (SO_RCVLOWAT = 1) we can merge this patch, but IMHO we need to explain the case better and improve the test.
>> I see, of course I'm not sure about this change, just want to ask someone who knows this code better
> 
> Yes, it's an RFC, so you did well! :-)
So ok, I'll prepare a new RFC version of this patchset (i.e. a cover letter with the explanation, the kernel patch and a test for it).
> 
>>>
>>>>
>>>>>
>>>>> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB,
>>>>> then it would still send the credit update even without this patch and
>>>>> TX will send the 1 byte.
>>>>
>>>> But how RX will wake up in this case? E.g. it calls poll() without timeout,
>>>> connection is established, RX ignores signal
>>>
>>> RX will wake up because SO_RCVLOWAT is 64KB and there are 64 KB in the buffer. Then RX will read it and send the credit update to TX because
>>> free_space is 0.
>> IIUC, i'm talking about 10 steps above, e.g. RX will never wake up, because TX is waiting for space.
> 
> Yep, but if RX uses SO_RCVLOWAT = 64 KB instead of 128 KB (I mean if RX reads all the bytes that it's waiting as it specified in SO_RCVLOWAT), then RX will send the credit message.
> 
> But there is the case that you mentioned, when SO_RCVLOWAT is changed while executing.
I'll use this case for the test.
> 
> Thanks,
> Stefano
> 
Thanks, Arseniy



