linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
@ 2022-12-17 19:42 Arseniy Krasnov
  2022-12-17 19:45 ` [RFC PATCH v1 1/2] virtio/vsock: send credit update depending on SO_RCVLOWAT Arseniy Krasnov
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-17 19:42 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi, edumazet, David S. Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy, Arseniy Krasnov

Hello,

it seems I found a strange thing (maybe a bug) where the sender ('tx' below)
and the receiver ('rx' below) can get stuck forever. A potential fix is in the
first patch; the second patch contains a reproducer based on the vsock test
suite. The reproducer is simple: tx just sends data to rx with the 'write()'
syscall, rx dequeues it using the 'read()' syscall and uses 'poll()' for
waiting. I run the server in the host and the client in the guest.

rx side params:
1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb (i.e. the default).
2) SO_RCVLOWAT is 128Kb.
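
For reference, a minimal sketch of how the receiver could set these two
parameters ('fd' is assumed to be an already created vsock socket; the buffer
size option lives at the AF_VSOCK level and takes a 64-bit value, while the
reproducer below only sets SO_RCVLOWAT since 256Kb is already the default):

	#include <sys/socket.h>
	#include <linux/vm_sockets.h>

	unsigned long long buf_size = 256 * 1024;	/* SO_VM_SOCKETS_BUFFER_SIZE */
	unsigned long lowat_val = 128 * 1024;		/* SO_RCVLOWAT */

	setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, &buf_size, sizeof(buf_size));
	setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat_val, sizeof(lowat_val));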

What happens in the reproducer step by step:

1) tx tries to send 256Kb + 1 byte (in a single 'write()')
2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
3) tx waits for space in 'write()' to send last 1 byte
4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
5) rx reads 64Kb, credit update is not sent due to *
6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
7) rx reads 64Kb, credit update is not sent due to *
8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
9) rx reads 64Kb, credit update is not sent due to *
10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()

* is the optimization in 'virtio_transport_stream_do_dequeue()' which
  sends OP_CREDIT_UPDATE only when there is not much free space left -
  less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
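
For reference, the relevant check in 'virtio_transport_stream_do_dequeue()'
(copied from the context lines of the first patch below) is roughly:

	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
	...
	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
		virtio_transport_send_credit_update(vsk);

In the scenario above 'buf_alloc' stays at 256Kb and 'last_fwd_cnt' stays at 0
(rx sends nothing back), so 'free_space' never drops below 64Kb and the
condition never triggers.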

Now the tx side waits for space inside write() and rx waits in poll() for
'rx_bytes' to reach the SO_RCVLOWAT value. Both sides will wait forever. I
think a possible fix is to send a credit update not only when the free space
is too small, but also when the number of bytes in the receive queue is
smaller than SO_RCVLOWAT and thus not enough to wake up the sleeping reader.
I'm not sure about the correctness of this idea, but anyway - I think the
problem above exists. What do you think?

Patchset was rebased and tested on skbuff v7 patch from Bobby Eshleman:
https://lore.kernel.org/netdev/20221213192843.421032-1-bobby.eshleman@bytedance.com/

Arseniy Krasnov (2):
 virtio/vsock: send credit update depending on SO_RCVLOWAT
 vsock_test: mutual hungup reproducer

 net/vmw_vsock/virtio_transport_common.c |  9 +++-
 tools/testing/vsock/vsock_test.c        | 78 +++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 2 deletions(-)
-- 
2.25.1


* [RFC PATCH v1 1/2] virtio/vsock: send credit update depending on SO_RCVLOWAT
  2022-12-17 19:42 [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Arseniy Krasnov
@ 2022-12-17 19:45 ` Arseniy Krasnov
  2022-12-17 19:47 ` [RFC PATCH v1 2/2] vsock_test: mutual hungup reproducer Arseniy Krasnov
  2022-12-19 15:41 ` [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Stefano Garzarella
  2 siblings, 0 replies; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-17 19:45 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi, edumazet, David S. Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy, Arseniy Krasnov

This adds an extra condition for sending a credit update message during data
read by userspace. The problem arises when the sender waits for free space
on the receiver, while the receiver waits in 'poll()' until 'rx_bytes' reaches
the SO_RCVLOWAT value of the socket. With this patch, the receiver sends a
credit update message when the number of bytes in its rx queue is too small
to wake up the reader sleeping in 'poll()'.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/virtio_transport_common.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a1581c77cf84..4cf26cf8a754 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -362,6 +362,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	struct virtio_vsock_sock *vvs = vsk->trans;
 	size_t bytes, total = 0;
 	struct sk_buff *skb;
+	bool low_rx_bytes;
 	int err = -EFAULT;
 	u32 free_space;
 
@@ -396,6 +397,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	}
 
 	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+	low_rx_bytes = (vvs->rx_bytes <
+			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
 
 	spin_unlock_bh(&vvs->rx_lock);
 
@@ -405,9 +408,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	 * too high causes extra messages. Too low causes transmitter
 	 * stalls. As stalls are in theory more expensive than extra
 	 * messages, we set the limit to a high value. TODO: experiment
-	 * with different values.
+	 * with different values. Also send a credit update message when
+	 * the rx queue has too few bytes to wake up the sleeping reader.
 	 */
-	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE ||
+	    low_rx_bytes)
 		virtio_transport_send_credit_update(vsk);
 
 	return total;
-- 
2.25.1


* [RFC PATCH v1 2/2] vsock_test: mutual hungup reproducer
  2022-12-17 19:42 [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Arseniy Krasnov
  2022-12-17 19:45 ` [RFC PATCH v1 1/2] virtio/vsock: send credit update depending on SO_RCVLOWAT Arseniy Krasnov
@ 2022-12-17 19:47 ` Arseniy Krasnov
  2022-12-19 15:41 ` [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Stefano Garzarella
  2 siblings, 0 replies; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-17 19:47 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi, edumazet, David S. Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy, Arseniy Krasnov

This is not for merge, just demo.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 tools/testing/vsock/vsock_test.c | 78 ++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index bb6d691cb30d..320ecf4db74b 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -699,7 +699,85 @@ static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
 	close(fd);
 }
 
+static void test_stall_client(const struct test_opts *opts)
+{
+	int fd;
+	unsigned long lowat_val = 128 * 1024;
+	size_t data_size = 64 * 1024;
+	void *data;
+
+
+	fd = vsock_stream_connect(opts->peer_cid, 1234);
+	assert(fd != -1);
+
+	assert(!setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
+		       &lowat_val, sizeof(lowat_val)));
+
+	data = malloc(data_size);
+	assert(data);
+
+	/* Wait for tx to send data. */
+	sleep(3);
+
+	while (1) {
+		struct pollfd fds = {0};
+
+		fds.fd = fd;
+		fds.events = POLLIN | POLLRDNORM | POLLERR | POLLRDHUP | POLLHUP;
+
+		/* Block until there is something to handle. */
+		printf("[RX] ENTER POLL\n");
+		assert(poll(&fds, 1, -1) >= 0);
+		printf("[RX] LEAVE POLL\n");
+
+		if (fds.revents & (POLLIN | POLLRDNORM)) {
+			read(fd, data, data_size);
+		}
+
+		if (fds.revents & POLLERR) {
+			printf("[RX] POLL ERR\n");
+			break;
+		}
+
+		if (fds.revents & (POLLRDHUP | POLLHUP)) {
+			printf("[RX] POLL DONE\n");
+			break;
+		}
+	}
+
+	close(fd);
+	exit(0);
+}
+
+static void test_stall_server(const struct test_opts *opts)
+{
+	size_t data_size = (256 * 1024) + 1;
+	void *data;
+	int fd;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+	assert(fd != -1);
+
+	data = malloc(data_size);
+	assert(data);
+
+	printf("[TX] ENTER WRITE\n");
+	assert(write(fd, data, data_size) == data_size);
+
+	/* Never get here without kernel patch. */
+	printf("[TX] LEAVE WRITE\n");
+
+	close(fd);
+	exit(0);
+}
+
+
 static struct test_case test_cases[] = {
+	{
+		.name = "Test stall",
+		.run_client = test_stall_client,
+		.run_server = test_stall_server,
+	},
 	{
 		.name = "SOCK_STREAM connection reset",
 		.run_client = test_stream_connection_reset,
-- 
2.25.1


* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-17 19:42 [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Arseniy Krasnov
  2022-12-17 19:45 ` [RFC PATCH v1 1/2] virtio/vsock: send credit update depending on SO_RCVLOWAT Arseniy Krasnov
  2022-12-17 19:47 ` [RFC PATCH v1 2/2] vsock_test: mutual hungup reproducer Arseniy Krasnov
@ 2022-12-19 15:41 ` Stefano Garzarella
  2022-12-20  7:14   ` Arseniy Krasnov
  2 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2022-12-19 15:41 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy, Arseniy Krasnov

Hi Arseniy,

On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>
> Hello,
>
> seems I found strange thing(may be a bug) where sender('tx' later) and
> receiver('rx' later) could stuck forever. Potential fix is in the first
> patch, second patch contains reproducer, based on vsock test suite.
> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
> server in host and client in guest.
>
> rx side params:
> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
> 2) SO_RCVLOWAT is 128Kb.
>
> What happens in the reproducer step by step:
>

I put the values of the variables involved to facilitate understanding:

RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
    free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB

The credit update is sent if
free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]

> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
> 3) tx waits for space in 'write()' to send last 1 byte
> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
> 5) rx reads 64Kb, credit update is not sent due to *

RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
    free_space = 192 KB

> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
> 7) rx reads 64Kb, credit update is not sent due to *

RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
    free_space = 128 KB

> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
> 9) rx reads 64Kb, credit update is not sent due to *

Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.

RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
    free_space = 64 KB

> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()

I agree that the TX is stuck because we are not sending the credit 
update, but also if RX sends the credit update at step 9, RX won't be 
woken up at step 10, right?

>
> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>   sends OP_CREDIT_UPDATE only when we have not too much space -
>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>
> Now tx side waits for space inside write() and rx waits in poll() for
> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
> think, possible fix is to send credit update not only when we have too
> small space, but also when number of bytes in receive queue is smaller
> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
> sure about correctness of this idea, but anyway - I think that problem
> above exists. What do You think?

I'm not sure, I have to think more about it, but if RX reads less than 
SO_RCVLOWAT, I expect it's normal to end up stuck.

In this case we are only unblocking TX, but even if it sends that single 
byte, RX is still stuck and not consuming it, so it was useless to wake 
up TX if RX won't consume it anyway, right?

If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB, 
then it would still send the credit update even without this patch and 
TX will send the 1 byte.

Thanks,
Stefano



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-19 15:41 ` [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup Stefano Garzarella
@ 2022-12-20  7:14   ` Arseniy Krasnov
  2022-12-20  8:33     ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-20  7:14 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On 19.12.2022 18:41, Stefano Garzarella wrote:

Hello!

> Hi Arseniy,
> 
> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>
>> Hello,
>>
>> seems I found strange thing(may be a bug) where sender('tx' later) and
>> receiver('rx' later) could stuck forever. Potential fix is in the first
>> patch, second patch contains reproducer, based on vsock test suite.
>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>> server in host and client in guest.
>>
>> rx side params:
>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>> 2) SO_RCVLOWAT is 128Kb.
>>
>> What happens in the reproducer step by step:
>>
> 
> I put the values of the variables involved to facilitate understanding:
> 
> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
> 
> The credit update is sent if
> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
> 
>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>> 3) tx waits for space in 'write()' to send last 1 byte
>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>> 5) rx reads 64Kb, credit update is not sent due to *
> 
> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>     free_space = 192 KB
> 
>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>> 7) rx reads 64Kb, credit update is not sent due to *
> 
> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>     free_space = 128 KB
> 
>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>> 9) rx reads 64Kb, credit update is not sent due to *
> 
> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
> 
> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>     free_space = 64 KB
> 
>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
> 
> I agree that the TX is stuck because we are not sending the credit 
> update, but also if RX sends the credit update at step 9, RX won't be 
> woken up at step 10, right?

Yes, RX will sleep, but TX will wake up, and since we inform TX how much
free space we have, there are now two cases for TX:
1) the rest of the data is "small" (TX can send it without blocking again),
   so TX leaves 'write()' and continues execution. RX still waits in
   'poll()'. Later TX will send enough data to wake up RX.
2) the rest of the data is "big" - too big to leave 'write()', so TX waits
   again for free space - but now it will be able to send enough data to
   wake up RX, because on the RX side we compared 'rx_bytes' with the
   rcvlowat value.
> 
>>
>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>
>> Now tx side waits for space inside write() and rx waits in poll() for
>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>> think, possible fix is to send credit update not only when we have too
>> small space, but also when number of bytes in receive queue is smaller
>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>> sure about correctness of this idea, but anyway - I think that problem
>> above exists. What do You think?
> 
> I'm not sure, I have to think more about it, but if RX reads less than 
> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
> 
> In this case we are only unstucking TX, but even if it sends that single 
> byte, RX is still stuck and not consuming it, so it was useless to wake 
> up TX if RX won't consume it anyway, right?

1) I think it is not useless, because we inform (not just wake up) TX that
there is free space at the RX side - as I mentioned above.
2) Anyway, I think this situation is a little bit strange: TX thinks that
there is no free space at RX and waits for it, but there is free space at RX!
At the same time, RX waits in poll() forever - it is ready to get a new portion
of data and return POLLIN, but TX "thinks" exactly the opposite - that RX is
full of data. Of course, if this were just a stall in TX data handling it would
be ok - only a performance degradation - but here TX is stuck forever.

> 
> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB, 
> then it would still send the credit update even without this patch and 
> TX will send the 1 byte.

But how will RX wake up in this case? E.g. it calls poll() without a timeout,
the connection is established, and RX ignores signals.

Thanks, Arseniy
> 
> Thanks,
> Stefano
> 



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-20  7:14   ` Arseniy Krasnov
@ 2022-12-20  8:33     ` Stefano Garzarella
  2022-12-20  9:23       ` Arseniy Krasnov
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2022-12-20  8:33 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On Tue, Dec 20, 2022 at 07:14:27AM +0000, Arseniy Krasnov wrote:
>On 19.12.2022 18:41, Stefano Garzarella wrote:
>
>Hello!
>
>> Hi Arseniy,
>>
>> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>>
>>> Hello,
>>>
>>> seems I found strange thing(may be a bug) where sender('tx' later) and
>>> receiver('rx' later) could stuck forever. Potential fix is in the first
>>> patch, second patch contains reproducer, based on vsock test suite.
>>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>>> server in host and client in guest.
>>>
>>> rx side params:
>>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>>> 2) SO_RCVLOWAT is 128Kb.
>>>
>>> What happens in the reproducer step by step:
>>>
>>
>> I put the values of the variables involved to facilitate understanding:
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
>>
>> The credit update is sent if
>> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
>>
>>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>>> 3) tx waits for space in 'write()' to send last 1 byte
>>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>>> 5) rx reads 64Kb, credit update is not sent due to *
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>>     free_space = 192 KB
>>
>>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>>> 7) rx reads 64Kb, credit update is not sent due to *
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>>     free_space = 128 KB
>>
>>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>>> 9) rx reads 64Kb, credit update is not sent due to *
>>
>> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
>>
>> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>>     free_space = 64 KB
>>
>>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
>>
>> I agree that the TX is stuck because we are not sending the credit
>> update, but also if RX sends the credit update at step 9, RX won't be
>> woken up at step 10, right?
>
>Yes, RX will sleep, but TX will wake up and as we inform TX how much
>free space we have, now there are two cases for TX:
>1) send "small" rest of data(e.g. without blocking again), leave 'write()'
>   and continue execution. RX still waits in 'poll()'. Later TX will
>   send enough data to wake up RX.
>2) send "big" rest of data - if rest is too big to leave 'write()' and TX
>   will wait again for the free space - it will be able to send enough data
>   to wake up RX as we compared 'rx_bytes' with rcvlowat value in RX.

Right, so I'd update the test to behave like this.
And I'd explain better the problem we are going to fix in the commit 
message.

>>
>>>
>>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>>
>>> Now tx side waits for space inside write() and rx waits in poll() for
>>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>>> think, possible fix is to send credit update not only when we have too
>>> small space, but also when number of bytes in receive queue is smaller
>>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>>> sure about correctness of this idea, but anyway - I think that problem
>>> above exists. What do You think?
>>
>> I'm not sure, I have to think more about it, but if RX reads less 
>> than
>> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
>>
>> In this case we are only unstucking TX, but even if it sends that single
>> byte, RX is still stuck and not consuming it, so it was useless to wake
>> up TX if RX won't consume it anyway, right?
>
>1) I think it is not useless, because we inform(not just wake up) TX that
>there is free space at RX side - as i mentioned above.
>2) Anyway i think that this situation is a little bit strange: TX thinks that
>there is no free space at RX and waits for it, but there is free space at RX!
>At the same time, RX waits in poll() forever - it is ready to get new portion
>of data to return POLLIN, but TX "thinks" exactly opposite thing - RX is full
>of data. Of course, if there will be just stalls in TX data handling - it will
>be ok - just performance degradation, but TX stucks forever.

We did it to avoid a lot of credit update messages.
Anyway I think the main point here is: why is RX setting SO_RCVLOWAT to 
128 KB and then reading only half of it?

So I think if the user sets SO_RCVLOWAT to a value and then RX reads 
less than it, it is expected to get stuck.

Anyway, since the change will not impact the default behaviour 
(SO_RCVLOWAT = 1) we can merge this patch, but IMHO we need to explain 
the case better and improve the test.

>
>>
>> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB,
>> then it would still send the credit update even without this patch and
>> TX will send the 1 byte.
>
>But how RX will wake up in this case? E.g. it calls poll() without timeout,
>connection is established, RX ignores signal

RX will wake up because SO_RCVLOWAT is 64KB and there are 64 KB in the 
buffer. Then RX will read it and send the credit update to TX because
free_space is 0.

Thanks,
Stefano



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-20  8:33     ` Stefano Garzarella
@ 2022-12-20  9:23       ` Arseniy Krasnov
  2022-12-20 10:43         ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-20  9:23 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On 20.12.2022 11:33, Stefano Garzarella wrote:
> On Tue, Dec 20, 2022 at 07:14:27AM +0000, Arseniy Krasnov wrote:
>> On 19.12.2022 18:41, Stefano Garzarella wrote:
>>
>> Hello!
>>
>>> Hi Arseniy,
>>>
>>> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>>>
>>>> Hello,
>>>>
>>>> seems I found strange thing(may be a bug) where sender('tx' later) and
>>>> receiver('rx' later) could stuck forever. Potential fix is in the first
>>>> patch, second patch contains reproducer, based on vsock test suite.
>>>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>>>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>>>> server in host and client in guest.
>>>>
>>>> rx side params:
>>>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>>>> 2) SO_RCVLOWAT is 128Kb.
>>>>
>>>> What happens in the reproducer step by step:
>>>>
>>>
>>> I put the values of the variables involved to facilitate understanding:
>>>
>>> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>>>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
>>>
>>> The credit update is sent if
>>> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
>>>
>>>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>>>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>>>> 3) tx waits for space in 'write()' to send last 1 byte
>>>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>>>> 5) rx reads 64Kb, credit update is not sent due to *
>>>
>>> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>>>     free_space = 192 KB
>>>
>>>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>>>> 7) rx reads 64Kb, credit update is not sent due to *
>>>
>>> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>>>     free_space = 128 KB
>>>
>>>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>>>> 9) rx reads 64Kb, credit update is not sent due to *
>>>
>>> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
>>>
>>> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>>>     free_space = 64 KB
>>>
>>>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
>>>
>>> I agree that the TX is stuck because we are not sending the credit
>>> update, but also if RX sends the credit update at step 9, RX won't be
>>> woken up at step 10, right?
>>
>> Yes, RX will sleep, but TX will wake up and as we inform TX how much
>> free space we have, now there are two cases for TX:
>> 1) send "small" rest of data(e.g. without blocking again), leave 'write()'
>>   and continue execution. RX still waits in 'poll()'. Later TX will
>>   send enough data to wake up RX.
>> 2) send "big" rest of data - if rest is too big to leave 'write()' and TX
>>   will wait again for the free space - it will be able to send enough data
>>   to wake up RX as we compared 'rx_bytes' with rcvlowat value in RX.
> 
> Right, so I'd update the test to behave like this.
Sorry, you mean vsock_test? To cover TX waiting for free space at RX, thus checking
this kernel patch's logic?
> And I'd explain better the problem we are going to fix in the commit message.
Ok
> 
>>>
>>>>
>>>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>>>
>>>> Now tx side waits for space inside write() and rx waits in poll() for
>>>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>>>> think, possible fix is to send credit update not only when we have too
>>>> small space, but also when number of bytes in receive queue is smaller
>>>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>>>> sure about correctness of this idea, but anyway - I think that problem
>>>> above exists. What do You think?
>>>
>>> I'm not sure, I have to think more about it, but if RX reads less than
>>> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
>>>
>>> In this case we are only unstucking TX, but even if it sends that single
>>> byte, RX is still stuck and not consuming it, so it was useless to wake
>>> up TX if RX won't consume it anyway, right?
>>
>> 1) I think it is not useless, because we inform(not just wake up) TX that
>> there is free space at RX side - as i mentioned above.
>> 2) Anyway i think that this situation is a little bit strange: TX thinks that
>> there is no free space at RX and waits for it, but there is free space at RX!
>> At the same time, RX waits in poll() forever - it is ready to get new portion
>> of data to return POLLIN, but TX "thinks" exactly opposite thing - RX is full
>> of data. Of course, if there will be just stalls in TX data handling - it will
>> be ok - just performance degradation, but TX stucks forever.
> 
> We did it to avoid a lot of credit update messages.
Yes, i see
> Anyway I think here the main point is why RX is setting SO_RCVLOWAT to 128 KB and then reads only half of it?
> 
> So I think if the users set SO_RCVLOWAT to a value and then RX reads less then it, is expected to get stuck.
That's a really interesting question; I've found nothing about this case in Google (not 100% sure) or
POSIX. But I can modify the reproducer: it sets SO_RCVLOWAT to 128Kb only BEFORE entering its last
poll(), where it will get stuck. In this case the behaviour looks more legitimate: it uses the default
SO_RCVLOWAT of 1 and reads 64Kb each time. Finally it sets SO_RCVLOWAT to 128Kb (imagine that it prepares
a 128Kb 'read()' buffer) and enters poll() - we get the same effect: TX waits for space, RX waits in 'poll()'.
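
A rough sketch of that modified client (purely illustrative, reusing the
helpers and port already used in vsock_test.c; the real test in the next
version may differ):

	int fd = vsock_stream_connect(opts->peer_cid, 1234);
	unsigned long lowat_val = 128 * 1024;
	char buf[64 * 1024];
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	/* Default SO_RCVLOWAT (1): drain three 64Kb chunks while tx is
	 * blocked in write() trying to send 256Kb + 1 byte.
	 */
	for (int i = 0; i < 3; i++)
		read(fd, buf, sizeof(buf));

	/* Only now raise SO_RCVLOWAT: 64Kb are left in the rx queue, so
	 * poll() blocks, while tx still waits for a credit update before
	 * it can send the last byte.
	 */
	setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat_val, sizeof(lowat_val));
	poll(&pfd, 1, -1);	/* stuck here without the kernel patch */
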
> 
> Anyway, since the change will not impact the default behaviour (SO_RCVLOWAT = 1) we can merge this patch, but IMHO we need to explain the case better and improve the test.
I see. Of course I'm not sure about this change; I just wanted to ask someone who knows this code better.
> 
>>
>>>
>>> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB,
>>> then it would still send the credit update even without this patch and
>>> TX will send the 1 byte.
>>
>> But how RX will wake up in this case? E.g. it calls poll() without timeout,
>> connection is established, RX ignores signal
> 
> RX will wake up because SO_RCVLOWAT is 64KB and there are 64 KB in the buffer. Then RX will read it and send the credit update to TX because
> free_space is 0.
IIUC, I'm talking about the 10 steps above, i.e. RX will never wake up, because TX is waiting for space.
> 
> Thanks,
> Stefano
> 
Thanks, Arseniy



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-20  9:23       ` Arseniy Krasnov
@ 2022-12-20 10:43         ` Stefano Garzarella
  2022-12-20 12:09           ` Arseniy Krasnov
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2022-12-20 10:43 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On Tue, Dec 20, 2022 at 09:23:17AM +0000, Arseniy Krasnov wrote:
>On 20.12.2022 11:33, Stefano Garzarella wrote:
>> On Tue, Dec 20, 2022 at 07:14:27AM +0000, Arseniy Krasnov wrote:
>>> On 19.12.2022 18:41, Stefano Garzarella wrote:
>>>
>>> Hello!
>>>
>>>> Hi Arseniy,
>>>>
>>>> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> seems I found strange thing(may be a bug) where sender('tx' later) and
>>>>> receiver('rx' later) could stuck forever. Potential fix is in the first
>>>>> patch, second patch contains reproducer, based on vsock test suite.
>>>>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>>>>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>>>>> server in host and client in guest.
>>>>>
>>>>> rx side params:
>>>>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>>>>> 2) SO_RCVLOWAT is 128Kb.
>>>>>
>>>>> What happens in the reproducer step by step:
>>>>>
>>>>
>>>> I put the values of the variables involved to facilitate understanding:
>>>>
>>>> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>>>>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
>>>>
>>>> The credit update is sent if
>>>> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
>>>>
>>>>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>>>>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>>>>> 3) tx waits for space in 'write()' to send last 1 byte
>>>>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>>>>> 5) rx reads 64Kb, credit update is not sent due to *
>>>>
>>>> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>>>>     free_space = 192 KB
>>>>
>>>>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>>>>> 7) rx reads 64Kb, credit update is not sent due to *
>>>>
>>>> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>>>>     free_space = 128 KB
>>>>
>>>>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>>>>> 9) rx reads 64Kb, credit update is not sent due to *
>>>>
>>>> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
>>>>
>>>> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>>>>     free_space = 64 KB
>>>>
>>>>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
>>>>
>>>> I agree that the TX is stuck because we are not sending the credit
>>>> update, but also if RX sends the credit update at step 9, RX won't be
>>>> woken up at step 10, right?
>>>
>>> Yes, RX will sleep, but TX will wake up and as we inform TX how much
>>> free space we have, now there are two cases for TX:
>>> 1) send "small" rest of data(e.g. without blocking again), leave 'write()'
>>>   and continue execution. RX still waits in 'poll()'. Later TX will
>>>   send enough data to wake up RX.
>>> 2) send "big" rest of data - if rest is too big to leave 'write()' and TX
>>>   will wait again for the free space - it will be able to send enough data
>>>   to wake up RX as we compared 'rx_bytes' with rcvlowat value in RX.
>>
>> Right, so I'd update the test to behave like this.
>Sorry, You mean vsock_test? To cover TX waiting for free space at RX, thus checking
>this kernel patch logic?

Yep, I mean the test that you added in this series.

>> And I'd explain better the problem we are going to fix in the commit message.
>Ok
>>
>>>>
>>>>>
>>>>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>>>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>>>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>>>>
>>>>> Now tx side waits for space inside write() and rx waits in poll() for
>>>>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>>>>> think, possible fix is to send credit update not only when we have too
>>>>> small space, but also when number of bytes in receive queue is smaller
>>>>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>>>>> sure about correctness of this idea, but anyway - I think that problem
>>>>> above exists. What do You think?
>>>>
>>>> I'm not sure, I have to think more about it, but if RX reads less than
>>>> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
>>>>
>>>> In this case we are only unstucking TX, but even if it sends that single
>>>> byte, RX is still stuck and not consuming it, so it was useless to wake
>>>> up TX if RX won't consume it anyway, right?
>>>
>>> 1) I think it is not useless, because we inform(not just wake up) TX that
>>> there is free space at RX side - as i mentioned above.
>>> 2) Anyway i think that this situation is a little bit strange: TX thinks that
>>> there is no free space at RX and waits for it, but there is free space at RX!
>>> At the same time, RX waits in poll() forever - it is ready to get new portion
>>> of data to return POLLIN, but TX "thinks" exactly opposite thing - RX is full
>>> of data. Of course, if there will be just stalls in TX data handling - it will
>>> be ok - just performance degradation, but TX stucks forever.
>>
>> We did it to avoid a lot of credit update messages.
>Yes, i see
>> Anyway I think here the main point is why RX is setting SO_RCVLOWAT to 128 KB and then reads only half of it?
>>
>> So I think if the users set SO_RCVLOWAT to a value and then RX reads less then it, is expected to get stuck.
>That a really interesting question, I've found nothing about this case in Google(not sure for 100%) or POSIX. But,
>i can modify reproducer: it sets SO_RCVLOWAT to 128Kb BEFORE entering its last poll where it will stuck. In this
>case behaviour looks more legal: it uses default SO_RCVLOWAT of 1, read 64Kb each time. Finally it sets SO_RCVLOWAT
>to 128Kb(and imagine that it prepares 128Kb 'read()' buffer) and enters poll() - we will get same effect: TX will wait
>for space, RX waits in 'poll()'.

Good point!

>>
>> Anyway, since the change will not impact the default behaviour (SO_RCVLOWAT = 1) we can merge this patch, but IMHO we need to explain the case better and improve the test.
>I see, of course I'm not sure about this change, just want to ask 
>someone who knows this code better

Yes, it's an RFC, so you did well! :-)

>>
>>>
>>>>
>>>> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB,
>>>> then it would still send the credit update even without this patch and
>>>> TX will send the 1 byte.
>>>
>>> But how RX will wake up in this case? E.g. it calls poll() without timeout,
>>> connection is established, RX ignores signal
>>
>> RX will wake up because SO_RCVLOWAT is 64KB and there are 64 KB in the buffer. Then RX will read it and send the credit update to TX because
>> free_space is 0.
>IIUC, i'm talking about 10 steps above, e.g. RX will never wake up, 
>because TX is waiting for space.

Yep, but if RX uses SO_RCVLOWAT = 64 KB instead of 128 KB (I mean if RX 
reads all the bytes that it is waiting for, as specified in SO_RCVLOWAT), 
then RX will send the credit message.

But there is the case that you mentioned, when SO_RCVLOWAT is changed 
while executing.

Thanks,
Stefano



* Re: [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
  2022-12-20 10:43         ` Stefano Garzarella
@ 2022-12-20 12:09           ` Arseniy Krasnov
  0 siblings, 0 replies; 9+ messages in thread
From: Arseniy Krasnov @ 2022-12-20 12:09 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, edumazet, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-kernel, netdev, virtualization, kvm, kernel,
	Krasnov Arseniy

On 20.12.2022 13:43, Stefano Garzarella wrote:
> On Tue, Dec 20, 2022 at 09:23:17AM +0000, Arseniy Krasnov wrote:
>> On 20.12.2022 11:33, Stefano Garzarella wrote:
>>> On Tue, Dec 20, 2022 at 07:14:27AM +0000, Arseniy Krasnov wrote:
>>>> On 19.12.2022 18:41, Stefano Garzarella wrote:
>>>>
>>>> Hello!
>>>>
>>>>> Hi Arseniy,
>>>>>
>>>>> On Sat, Dec 17, 2022 at 8:42 PM Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> seems I found strange thing(may be a bug) where sender('tx' later) and
>>>>>> receiver('rx' later) could stuck forever. Potential fix is in the first
>>>>>> patch, second patch contains reproducer, based on vsock test suite.
>>>>>> Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
>>>>>> dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
>>>>>> server in host and client in guest.
>>>>>>
>>>>>> rx side params:
>>>>>> 1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
>>>>>> 2) SO_RCVLOWAT is 128Kb.
>>>>>>
>>>>>> What happens in the reproducer step by step:
>>>>>>
>>>>>
>>>>> I put the values of the variables involved to facilitate understanding:
>>>>>
>>>>> RX: buf_alloc = 256 KB; fwd_cnt = 0; last_fwd_cnt = 0;
>>>>>     free_space = buf_alloc - (fwd_cnt - last_fwd_cnt) = 256 KB
>>>>>
>>>>> The credit update is sent if
>>>>> free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [64 KB]
>>>>>
>>>>>> 1) tx tries to send 256Kb + 1 byte (in a single 'write()')
>>>>>> 2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
>>>>>> 3) tx waits for space in 'write()' to send last 1 byte
>>>>>> 4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
>>>>>> 5) rx reads 64Kb, credit update is not sent due to *
>>>>>
>>>>> RX: buf_alloc = 256 KB; fwd_cnt = 64 KB; last_fwd_cnt = 0;
>>>>>     free_space = 192 KB
>>>>>
>>>>>> 6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
>>>>>> 7) rx reads 64Kb, credit update is not sent due to *
>>>>>
>>>>> RX: buf_alloc = 256 KB; fwd_cnt = 128 KB; last_fwd_cnt = 0;
>>>>>     free_space = 128 KB
>>>>>
>>>>>> 8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
>>>>>> 9) rx reads 64Kb, credit update is not sent due to *
>>>>>
>>>>> Right, (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) is still false.
>>>>>
>>>>> RX: buf_alloc = 256 KB; fwd_cnt = 192 KB; last_fwd_cnt = 0;
>>>>>     free_space = 64 KB
>>>>>
>>>>>> 10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()
>>>>>
>>>>> I agree that the TX is stuck because we are not sending the credit
>>>>> update, but also if RX sends the credit update at step 9, RX won't be
>>>>> woken up at step 10, right?
>>>>
>>>> Yes, RX will sleep, but TX will wake up and as we inform TX how much
>>>> free space we have, now there are two cases for TX:
>>>> 1) send "small" rest of data(e.g. without blocking again), leave 'write()'
>>>>   and continue execution. RX still waits in 'poll()'. Later TX will
>>>>   send enough data to wake up RX.
>>>> 2) send "big" rest of data - if rest is too big to leave 'write()' and TX
>>>>   will wait again for the free space - it will be able to send enough data
>>>>   to wake up RX as we compared 'rx_bytes' with rcvlowat value in RX.
>>>
>>> Right, so I'd update the test to behave like this.
>> Sorry, You mean vsock_test? To cover TX waiting for free space at RX, thus checking
>> this kernel patch logic?
> 
> Yep, I mean the test that you added in this series.
Ok
> 
>>> And I'd explain better the problem we are going to fix in the commit message.
>> Ok
>>>
>>>>>
>>>>>>
>>>>>> * is optimization in 'virtio_transport_stream_do_dequeue()' which
>>>>>>   sends OP_CREDIT_UPDATE only when we have not too much space -
>>>>>>   less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>>>>>>
>>>>>> Now tx side waits for space inside write() and rx waits in poll() for
>>>>>> 'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
>>>>>> think, possible fix is to send credit update not only when we have too
>>>>>> small space, but also when number of bytes in receive queue is smaller
>>>>>> than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
>>>>>> sure about correctness of this idea, but anyway - I think that problem
>>>>>> above exists. What do You think?
>>>>>
>>>>> I'm not sure, I have to think more about it, but if RX reads less than
>>>>> SO_RCVLOWAT, I expect it's normal to get to a case of stuck.
>>>>>
>>>>> In this case we are only unstucking TX, but even if it sends that single
>>>>> byte, RX is still stuck and not consuming it, so it was useless to wake
>>>>> up TX if RX won't consume it anyway, right?
>>>>
>>>> 1) I think it is not useless, because we inform(not just wake up) TX that
>>>> there is free space at RX side - as i mentioned above.
>>>> 2) Anyway i think that this situation is a little bit strange: TX thinks that
>>>> there is no free space at RX and waits for it, but there is free space at RX!
>>>> At the same time, RX waits in poll() forever - it is ready to get new portion
>>>> of data to return POLLIN, but TX "thinks" exactly opposite thing - RX is full
>>>> of data. Of course, if there will be just stalls in TX data handling - it will
>>>> be ok - just performance degradation, but TX stucks forever.
>>>
>>> We did it to avoid a lot of credit update messages.
>> Yes, i see
>>> Anyway I think here the main point is why RX is setting SO_RCVLOWAT to 128 KB and then reads only half of it?
>>>
>>> So I think if the users set SO_RCVLOWAT to a value and then RX reads less then it, is expected to get stuck.
>> That a really interesting question, I've found nothing about this case in Google(not sure for 100%) or POSIX. But,
>> i can modify reproducer: it sets SO_RCVLOWAT to 128Kb BEFORE entering its last poll where it will stuck. In this
>> case behaviour looks more legal: it uses default SO_RCVLOWAT of 1, read 64Kb each time. Finally it sets SO_RCVLOWAT
>> to 128Kb(and imagine that it prepares 128Kb 'read()' buffer) and enters poll() - we will get same effect: TX will wait
>> for space, RX waits in 'poll()'.
> 
> Good point!
> 
>>>
>>> Anyway, since the change will not impact the default behaviour (SO_RCVLOWAT = 1) we can merge this patch, but IMHO we need to explain the case better and improve the test.
>> I see, of course I'm not sure about this change, just want to ask someone who knows this code better
> 
> Yes, it's an RFC, so you did well! :-)
So ok, I'll prepare a new RFC version of this patchset (i.e. a cover letter with the explanation, the kernel patch and a test for it).
> 
>>>
>>>>
>>>>>
>>>>> If RX woke up (e.g. SO_RCVLOWAT = 64KB) and read the remaining 64KB,
>>>>> then it would still send the credit update even without this patch and
>>>>> TX will send the 1 byte.
>>>>
>>>> But how RX will wake up in this case? E.g. it calls poll() without timeout,
>>>> connection is established, RX ignores signal
>>>
>>> RX will wake up because SO_RCVLOWAT is 64KB and there are 64 KB in the buffer. Then RX will read it and send the credit update to TX because
>>> free_space is 0.
>> IIUC, i'm talking about 10 steps above, e.g. RX will never wake up, because TX is waiting for space.
> 
> Yep, but if RX uses SO_RCVLOWAT = 64 KB instead of 128 KB (I mean if RX reads all the bytes that it's waiting as it specified in SO_RCVLOWAT), then RX will send the credit message.
> 
> But there is the case that you mentioned, when SO_RCVLOWAT is changed while executing.
I'll use this case for the test.
> 
> Thanks,
> Stefano
> 
Thanks, Arseniy



