linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM
@ 2022-07-18  8:12 Arseniy Krasnov
  2022-07-18  8:15 ` [RFC PATCH v1 1/3] vsock: use sk_skrcvlowat to set POLLIN,POLLRDNORM, bits Arseniy Krasnov
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-18  8:12 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Arseniy Krasnov,
	Krasnov Arseniy
  Cc: kvm, netdev, virtualization, linux-kernel, kernel

Hello,

during my experiments with zerocopy receive, i found, that in some
cases, poll() implementation violates POSIX: when socket has non-
default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
POLLRDNORM bits in 'revents' even number of bytes available to read
on socket is smaller than SO_RCVLOWAT value. In this case,user sees
POLLIN flag and then tries to read data(for example using  'read()'
call), but read call will be blocked, because  SO_RCVLOWAT logic is
supported in dequeue loop in af_vsock.c. But the same time,  POSIX
requires that:

"POLLIN     Data other than high-priority data may be read without
            blocking.
 POLLRDNORM Normal data may be read without blocking."

See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.

So, we have, that poll() syscall returns POLLIN, but read call will
be blocked.

Also in man page socket(7) i found that:

"Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
socket as readable only if at least SO_RCVLOWAT bytes are available."

I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
this case for TCP socket, it works as POSIX required.

I've added some fixes to af_vsock.c and virtio_transport_common.c,
test is also implemented.

What do You think guys?

Thank You

Arseniy Krasnov(3):
 vsock_test: POLLIN + SO_RCVLOWAT test.
 virtio/vsock: use 'target' in notify_poll_in callback.
 vsock: use sk_skrcvlowat to set POLLIN,POLLRDNORM bits.

 net/vmw_vsock/af_vsock.c                |  2 +-
 net/vmw_vsock/virtio_transport_common.c |  2 +-
 tools/testing/vsock/vsock_test.c        | 90 +++++++++++++++++++++++++++++++++
 3 files changed, 92 insertions(+), 2 deletions(-)

-- 
2.25.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v1 1/3] vsock: use sk_skrcvlowat to set POLLIN,POLLRDNORM, bits.
  2022-07-18  8:12 [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM Arseniy Krasnov
@ 2022-07-18  8:15 ` Arseniy Krasnov
  2022-07-19 12:44   ` Stefano Garzarella
  2022-07-18  8:17 ` [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback Arseniy Krasnov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-18  8:15 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Arseniy Krasnov,
	Krasnov Arseniy, edumazet
  Cc: linux-kernel, virtualization, netdev, kvm, kernel

Both bits indicate, that next data read call won't be blocked,
but when sk_rcvlowat is not 1,these bits will be set by poll
anyway,thus when user tries to dequeue data, it will wait until
sk_rcvlowat bytes of data will be available.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/af_vsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f04abf662ec6..0225f3558e30 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1067,7 +1067,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
 		    !(sk->sk_shutdown & RCV_SHUTDOWN)) {
 			bool data_ready_now = false;
 			int ret = transport->notify_poll_in(
-					vsk, 1, &data_ready_now);
+					vsk, sk->sk_rcvlowat, &data_ready_now);
 			if (ret < 0) {
 				mask |= EPOLLERR;
 			} else {
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback.
  2022-07-18  8:12 [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM Arseniy Krasnov
  2022-07-18  8:15 ` [RFC PATCH v1 1/3] vsock: use sk_skrcvlowat to set POLLIN,POLLRDNORM, bits Arseniy Krasnov
@ 2022-07-18  8:17 ` Arseniy Krasnov
  2022-07-19 12:48   ` Stefano Garzarella
  2022-07-18  8:19 ` [RFC PATCH v1 3/3] vsock_test: POLLIN + SO_RCVLOWAT test Arseniy Krasnov
  2022-07-19 12:58 ` [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM Stefano Garzarella
  3 siblings, 1 reply; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-18  8:17 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseniy Krasnov, Krasnov Arseniy
  Cc: linux-kernel, virtualization, netdev, kvm, kernel

This callback controls setting of POLLIN,POLLRDNORM output bits
of poll() syscall,but in some cases,it is incorrectly to set it,
when socket has at least 1 bytes of available data. Use 'target'
which is already exists and equal to sk_rcvlowat in this case.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/virtio_transport_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index ec2c2afbf0d0..591908740992 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -634,7 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock *vsk,
 				size_t target,
 				bool *data_ready_now)
 {
-	if (vsock_stream_has_data(vsk))
+	if (vsock_stream_has_data(vsk) >= target)
 		*data_ready_now = true;
 	else
 		*data_ready_now = false;
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v1 3/3] vsock_test: POLLIN + SO_RCVLOWAT test.
  2022-07-18  8:12 [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM Arseniy Krasnov
  2022-07-18  8:15 ` [RFC PATCH v1 1/3] vsock: use sk_skrcvlowat to set POLLIN,POLLRDNORM, bits Arseniy Krasnov
  2022-07-18  8:17 ` [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback Arseniy Krasnov
@ 2022-07-18  8:19 ` Arseniy Krasnov
  2022-07-19 12:52   ` Stefano Garzarella
  2022-07-19 12:58 ` [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM Stefano Garzarella
  3 siblings, 1 reply; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-18  8:19 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseniy Krasnov, Krasnov Arseniy
  Cc: linux-kernel, virtualization, netdev, kvm, kernel

This adds test to check, that when poll() returns POLLIN and
POLLRDNORM bits, next read call won't block.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 tools/testing/vsock/vsock_test.c | 90 ++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index dc577461afc2..8e394443eaf6 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -18,6 +18,7 @@
 #include <sys/socket.h>
 #include <time.h>
 #include <sys/mman.h>
+#include <poll.h>
 
 #include "timeout.h"
 #include "control.h"
@@ -596,6 +597,90 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
 	close(fd);
 }
 
+static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
+{
+#define RCVLOWAT_BUF_SIZE 128
+	int fd;
+	int i;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+	if (fd < 0) {
+		perror("accept");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Send 1 byte. */
+	send_byte(fd, 1, 0);
+
+	control_writeln("SRVSENT");
+
+	/* Just empirically delay value. */
+	sleep(4);
+
+	for (i = 0; i < RCVLOWAT_BUF_SIZE - 1; i++)
+		send_byte(fd, 1, 0);
+
+	/* Keep socket in active state. */
+	control_expectln("POLLDONE");
+
+	close(fd);
+}
+
+static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
+{
+	unsigned long lowat_val = RCVLOWAT_BUF_SIZE;
+	char buf[RCVLOWAT_BUF_SIZE];
+	struct pollfd fds;
+	ssize_t read_res;
+	short poll_flags;
+	int fd;
+
+	fd = vsock_stream_connect(opts->peer_cid, 1234);
+	if (fd < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
+			&lowat_val, sizeof(lowat_val))) {
+		perror("setsockopt");
+		exit(EXIT_FAILURE);
+	}
+
+	control_expectln("SRVSENT");
+
+	/* At this point, server sent 1 byte. */
+	fds.fd = fd;
+	poll_flags = POLLIN | POLLRDNORM;
+	fds.events = poll_flags;
+
+	if (poll(&fds, 1, -1) < 0) {
+		perror("poll");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Only these two bits are expected. */
+	if (fds.revents != poll_flags) {
+		fprintf(stderr, "Unexpected poll result %hx\n",
+			fds.revents);
+		exit(EXIT_FAILURE);
+	}
+
+	/* Use MSG_DONTWAIT, if call is going to wait, EAGAIN
+	 * will be returned.
+	 */
+	read_res = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
+	if (read_res != RCVLOWAT_BUF_SIZE) {
+		fprintf(stderr, "Unexpected recv result %zi\n",
+			read_res);
+		exit(EXIT_FAILURE);
+	}
+
+	control_writeln("POLLDONE");
+
+	close(fd);
+}
+
 static struct test_case test_cases[] = {
 	{
 		.name = "SOCK_STREAM connection reset",
@@ -646,6 +731,11 @@ static struct test_case test_cases[] = {
 		.run_client = test_seqpacket_invalid_rec_buffer_client,
 		.run_server = test_seqpacket_invalid_rec_buffer_server,
 	},
+	{
+		.name = "SOCK_STREAM poll() + SO_RCVLOWAT",
+		.run_client = test_stream_poll_rcvlowat_client,
+		.run_server = test_stream_poll_rcvlowat_server,
+	},
 	{},
 };
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 1/3] vsock: use sk_skrcvlowat to set POLLIN,POLLRDNORM, bits.
  2022-07-18  8:15 ` [RFC PATCH v1 1/3] vsock: use sk_skrcvlowat to set POLLIN,POLLRDNORM, bits Arseniy Krasnov
@ 2022-07-19 12:44   ` Stefano Garzarella
  2022-07-20  5:35     ` Arseniy Krasnov
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2022-07-19 12:44 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi,
	Michael S. Tsirkin, Krasnov Arseniy, edumazet, linux-kernel,
	virtualization, netdev, kvm, kernel

On Mon, Jul 18, 2022 at 08:15:42AM +0000, Arseniy Krasnov wrote:
>Both bits indicate, that next data read call won't be blocked,
>but when sk_rcvlowat is not 1,these bits will be set by poll
>anyway,thus when user tries to dequeue data, it will wait until
>sk_rcvlowat bytes of data will be available.
>
>Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>---
> net/vmw_vsock/af_vsock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index f04abf662ec6..0225f3558e30 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1067,7 +1067,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
> 		    !(sk->sk_shutdown & RCV_SHUTDOWN)) {
> 			bool data_ready_now = false;
> 			int ret = transport->notify_poll_in(
>-					vsk, 1, &data_ready_now);
>+					vsk, sk->sk_rcvlowat, &data_ready_now);

In tcp_poll() we have the following:
     int target = sock_rcvlowat(sk, 0, INT_MAX);

Maybe we can do the same here.

Thanks,
Stefano

> 			if (ret < 0) {
> 				mask |= EPOLLERR;
> 			} else {
>-- 
>2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback.
  2022-07-18  8:17 ` [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback Arseniy Krasnov
@ 2022-07-19 12:48   ` Stefano Garzarella
  2022-07-20  5:38     ` Arseniy Krasnov
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2022-07-19 12:48 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy,
	linux-kernel, virtualization, netdev, kvm, kernel

On Mon, Jul 18, 2022 at 08:17:31AM +0000, Arseniy Krasnov wrote:
>This callback controls setting of POLLIN,POLLRDNORM output bits
>of poll() syscall,but in some cases,it is incorrectly to set it,
>when socket has at least 1 bytes of available data. Use 'target'
>which is already exists and equal to sk_rcvlowat in this case.
>
>Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>---
> net/vmw_vsock/virtio_transport_common.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index ec2c2afbf0d0..591908740992 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -634,7 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock *vsk,
> 				size_t target,
> 				bool *data_ready_now)
> {
>-	if (vsock_stream_has_data(vsk))
>+	if (vsock_stream_has_data(vsk) >= target)
> 		*data_ready_now = true;
> 	else
> 		*data_ready_now = false;

Perhaps we can take the opportunity to clean up the code in this way:

	*data_ready_now = vsock_stream_has_data(vsk) >= target;

Anyway, I think we also need to fix the other transports (vmci and 
hyperv), what do you think?

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 3/3] vsock_test: POLLIN + SO_RCVLOWAT test.
  2022-07-18  8:19 ` [RFC PATCH v1 3/3] vsock_test: POLLIN + SO_RCVLOWAT test Arseniy Krasnov
@ 2022-07-19 12:52   ` Stefano Garzarella
  2022-07-20  5:46     ` Arseniy Krasnov
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2022-07-19 12:52 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy,
	linux-kernel, virtualization, netdev, kvm, kernel

On Mon, Jul 18, 2022 at 08:19:06AM +0000, Arseniy Krasnov wrote:
>This adds test to check, that when poll() returns POLLIN and
>POLLRDNORM bits, next read call won't block.
>
>Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>---
> tools/testing/vsock/vsock_test.c | 90 ++++++++++++++++++++++++++++++++
> 1 file changed, 90 insertions(+)
>
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index dc577461afc2..8e394443eaf6 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -18,6 +18,7 @@
> #include <sys/socket.h>
> #include <time.h>
> #include <sys/mman.h>
>+#include <poll.h>
>
> #include "timeout.h"
> #include "control.h"
>@@ -596,6 +597,90 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
> 	close(fd);
> }
>
>+static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
>+{
>+#define RCVLOWAT_BUF_SIZE 128
>+	int fd;
>+	int i;
>+
>+	fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>+	if (fd < 0) {
>+		perror("accept");
>+		exit(EXIT_FAILURE);
>+	}
>+
>+	/* Send 1 byte. */
>+	send_byte(fd, 1, 0);
>+
>+	control_writeln("SRVSENT");
>+
>+	/* Just empirically delay value. */
>+	sleep(4);

Why we need this sleep()?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM
  2022-07-18  8:12 [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM Arseniy Krasnov
                   ` (2 preceding siblings ...)
  2022-07-18  8:19 ` [RFC PATCH v1 3/3] vsock_test: POLLIN + SO_RCVLOWAT test Arseniy Krasnov
@ 2022-07-19 12:58 ` Stefano Garzarella
  2022-07-20  6:07   ` Arseniy Krasnov
  3 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2022-07-19 12:58 UTC (permalink / raw)
  To: Arseniy Krasnov, David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy, kvm,
	netdev, virtualization, linux-kernel, kernel

On Mon, Jul 18, 2022 at 08:12:52AM +0000, Arseniy Krasnov wrote:
>Hello,
>
>during my experiments with zerocopy receive, i found, that in some
>cases, poll() implementation violates POSIX: when socket has non-
>default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>POLLRDNORM bits in 'revents' even number of bytes available to read
>on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>POLLIN flag and then tries to read data(for example using  'read()'
>call), but read call will be blocked, because  SO_RCVLOWAT logic is
>supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>requires that:
>
>"POLLIN     Data other than high-priority data may be read without
>            blocking.
> POLLRDNORM Normal data may be read without blocking."
>
>See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>
>So, we have, that poll() syscall returns POLLIN, but read call will
>be blocked.
>
>Also in man page socket(7) i found that:
>
>"Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>socket as readable only if at least SO_RCVLOWAT bytes are available."
>
>I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>this case for TCP socket, it works as POSIX required.

I tried to look at the code and it seems that only TCP complies with it 
or am I wrong?

>
>I've added some fixes to af_vsock.c and virtio_transport_common.c,
>test is also implemented.
>
>What do You think guys?

Nice, thanks for fixing this and for the test!

I left some comments, but I think the series is fine if we will support 
it in all transports.

I'd just like to understand if it's just TCP complying with it or I'm 
missing some check included in the socket layer that we could reuse.

@David, @Jakub, @Paolo, any advice?

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 1/3] vsock: use sk_skrcvlowat to set POLLIN,POLLRDNORM, bits.
  2022-07-19 12:44   ` Stefano Garzarella
@ 2022-07-20  5:35     ` Arseniy Krasnov
  0 siblings, 0 replies; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-20  5:35 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi,
	Michael S. Tsirkin, Krasnov Arseniy, edumazet, linux-kernel,
	virtualization, netdev, kvm, kernel

On 19.07.2022 15:44, Stefano Garzarella wrote:
> On Mon, Jul 18, 2022 at 08:15:42AM +0000, Arseniy Krasnov wrote:
>> Both bits indicate, that next data read call won't be blocked,
>> but when sk_rcvlowat is not 1,these bits will be set by poll
>> anyway,thus when user tries to dequeue data, it will wait until
>> sk_rcvlowat bytes of data will be available.
>>
>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>> ---
>> net/vmw_vsock/af_vsock.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index f04abf662ec6..0225f3558e30 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -1067,7 +1067,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
>>             !(sk->sk_shutdown & RCV_SHUTDOWN)) {
>>             bool data_ready_now = false;
>>             int ret = transport->notify_poll_in(
>> -                    vsk, 1, &data_ready_now);
>> +                    vsk, sk->sk_rcvlowat, &data_ready_now);
> 
> In tcp_poll() we have the following:
>     int target = sock_rcvlowat(sk, 0, INT_MAX);
> 
> Maybe we can do the same here.

You are right, ack

> 
> Thanks,
> Stefano
> 
>>             if (ret < 0) {
>>                 mask |= EPOLLERR;
>>             } else {
>> -- 
>> 2.25.1
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback.
  2022-07-19 12:48   ` Stefano Garzarella
@ 2022-07-20  5:38     ` Arseniy Krasnov
  2022-07-20  8:23       ` Stefano Garzarella
  0 siblings, 1 reply; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-20  5:38 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy,
	linux-kernel, virtualization, netdev, kvm, kernel

On 19.07.2022 15:48, Stefano Garzarella wrote:
> On Mon, Jul 18, 2022 at 08:17:31AM +0000, Arseniy Krasnov wrote:
>> This callback controls setting of POLLIN,POLLRDNORM output bits
>> of poll() syscall,but in some cases,it is incorrectly to set it,
>> when socket has at least 1 bytes of available data. Use 'target'
>> which is already exists and equal to sk_rcvlowat in this case.
>>
>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>> ---
>> net/vmw_vsock/virtio_transport_common.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index ec2c2afbf0d0..591908740992 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -634,7 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock *vsk,
>>                 size_t target,
>>                 bool *data_ready_now)
>> {
>> -    if (vsock_stream_has_data(vsk))
>> +    if (vsock_stream_has_data(vsk) >= target)
>>         *data_ready_now = true;
>>     else
>>         *data_ready_now = false;
> 
> Perhaps we can take the opportunity to clean up the code in this way:
> 
>     *data_ready_now = vsock_stream_has_data(vsk) >= target;
Ack
> 
> Anyway, I think we also need to fix the other transports (vmci and hyperv), what do you think?
For vmci it is look clear to fix it. For hyperv i need to check it more, because it already
uses some internal target value.
> 
> Thanks,
> Stefano
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 3/3] vsock_test: POLLIN + SO_RCVLOWAT test.
  2022-07-19 12:52   ` Stefano Garzarella
@ 2022-07-20  5:46     ` Arseniy Krasnov
  2022-07-20  8:56       ` Stefano Garzarella
  0 siblings, 1 reply; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-20  5:46 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy,
	linux-kernel, virtualization, netdev, kvm, kernel

On 19.07.2022 15:52, Stefano Garzarella wrote:
> On Mon, Jul 18, 2022 at 08:19:06AM +0000, Arseniy Krasnov wrote:
>> This adds test to check, that when poll() returns POLLIN and
>> POLLRDNORM bits, next read call won't block.
>>
>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>> ---
>> tools/testing/vsock/vsock_test.c | 90 ++++++++++++++++++++++++++++++++
>> 1 file changed, 90 insertions(+)
>>
>> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>> index dc577461afc2..8e394443eaf6 100644
>> --- a/tools/testing/vsock/vsock_test.c
>> +++ b/tools/testing/vsock/vsock_test.c
>> @@ -18,6 +18,7 @@
>> #include <sys/socket.h>
>> #include <time.h>
>> #include <sys/mman.h>
>> +#include <poll.h>
>>
>> #include "timeout.h"
>> #include "control.h"
>> @@ -596,6 +597,90 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
>>     close(fd);
>> }
>>
>> +static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
>> +{
>> +#define RCVLOWAT_BUF_SIZE 128
>> +    int fd;
>> +    int i;
>> +
>> +    fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>> +    if (fd < 0) {
>> +        perror("accept");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    /* Send 1 byte. */
>> +    send_byte(fd, 1, 0);
>> +
>> +    control_writeln("SRVSENT");
>> +
>> +    /* Just empirically delay value. */
>> +    sleep(4);
> 
> Why we need this sleep()?
Purpose of sleep() is to move client in state, when it has 1 byte of rx data
and poll() won't wake. For example:
client:                        server:
waits for "SRVSENT"
                               send 1 byte
                               send "SRVSENT"
poll()
                               sleep
...
poll sleeps
...
                               send rest of data
poll wake up

I think, without sleep there is chance, that client enters poll() when whole
data from server is already received, thus test will be useless(it just tests
poll()). May be i can remove "SRVSENT" as sleep is enough.

> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM
  2022-07-19 12:58 ` [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM Stefano Garzarella
@ 2022-07-20  6:07   ` Arseniy Krasnov
  2022-07-20  9:30     ` Stefano Garzarella
  0 siblings, 1 reply; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-20  6:07 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy, kvm,
	netdev, virtualization, linux-kernel, kernel

On 19.07.2022 15:58, Stefano Garzarella wrote:
> On Mon, Jul 18, 2022 at 08:12:52AM +0000, Arseniy Krasnov wrote:
>> Hello,
>>
>> during my experiments with zerocopy receive, i found, that in some
>> cases, poll() implementation violates POSIX: when socket has non-
>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>> POLLRDNORM bits in 'revents' even number of bytes available to read
>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>> POLLIN flag and then tries to read data(for example using  'read()'
>> call), but read call will be blocked, because  SO_RCVLOWAT logic is
>> supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>> requires that:
>>
>> "POLLIN     Data other than high-priority data may be read without
>>            blocking.
>> POLLRDNORM Normal data may be read without blocking."
>>
>> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>>
>> So, we have, that poll() syscall returns POLLIN, but read call will
>> be blocked.
>>
>> Also in man page socket(7) i found that:
>>
>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>
>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>> this case for TCP socket, it works as POSIX required.
> 
> I tried to look at the code and it seems that only TCP complies with it or am I wrong?
Yes, i checked AF_UNIX, it also don't care about that. It calls skb_queue_empty() that of
course ignores SO_RCVLOWAT.
> 
>>
>> I've added some fixes to af_vsock.c and virtio_transport_common.c,
>> test is also implemented.
>>
>> What do You think guys?
> 
> Nice, thanks for fixing this and for the test!
> 
> I left some comments, but I think the series is fine if we will support it in all transports.
Ack
> 
> I'd just like to understand if it's just TCP complying with it or I'm missing some check included in the socket layer that we could reuse.
Seems sock_poll() which is socket layer entry point for poll() doesn't contain any such checks
> 
> @David, @Jakub, @Paolo, any advice?
> 
> Thanks,
> Stefano
> 

PS: moreover, i found one more interesting thing with TCP and poll: TCP receive logic wakes up poll waiter
only when number of available bytes > SO_RCVLOWAT. E.g. it prevents "spurious" wake ups, when poll will be
woken up because new data arrived, but POLLIN to allow user dequeue this data won't be set(as amount of data
is too small).
See tcp_data_ready() in net/ipv4/tcp_input.c

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback.
  2022-07-20  5:38     ` Arseniy Krasnov
@ 2022-07-20  8:23       ` Stefano Garzarella
  2022-07-20 18:54         ` Dexuan Cui
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2022-07-20  8:23 UTC (permalink / raw)
  To: Arseniy Krasnov, Dexuan Cui
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy,
	linux-kernel, virtualization, netdev, kvm, kernel

On Wed, Jul 20, 2022 at 05:38:03AM +0000, Arseniy Krasnov wrote:
>On 19.07.2022 15:48, Stefano Garzarella wrote:
>> On Mon, Jul 18, 2022 at 08:17:31AM +0000, Arseniy Krasnov wrote:
>>> This callback controls setting of POLLIN,POLLRDNORM output bits
>>> of poll() syscall,but in some cases,it is incorrectly to set it,
>>> when socket has at least 1 bytes of available data. Use 'target'
>>> which is already exists and equal to sk_rcvlowat in this case.
>>>
>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>>> ---
>>> net/vmw_vsock/virtio_transport_common.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index ec2c2afbf0d0..591908740992 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -634,7 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock *vsk,
>>>                 size_t target,
>>>                 bool *data_ready_now)
>>> {
>>> -    if (vsock_stream_has_data(vsk))
>>> +    if (vsock_stream_has_data(vsk) >= target)
>>>         *data_ready_now = true;
>>>     else
>>>         *data_ready_now = false;
>>
>> Perhaps we can take the opportunity to clean up the code in this way:
>>
>>     *data_ready_now = vsock_stream_has_data(vsk) >= target;
>Ack
>>
>> Anyway, I think we also need to fix the other transports (vmci and hyperv), what do you think?
>For vmci it is look clear to fix it. For hyperv i need to check it more, because it already
>uses some internal target value.

Yep, I see. Maybe you can pass `target` to hvs_channel_readable() and 
use it as parameter of HVS_PKT_LEN().

@Dexuan what do you think?

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 3/3] vsock_test: POLLIN + SO_RCVLOWAT test.
  2022-07-20  5:46     ` Arseniy Krasnov
@ 2022-07-20  8:56       ` Stefano Garzarella
  0 siblings, 0 replies; 18+ messages in thread
From: Stefano Garzarella @ 2022-07-20  8:56 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy,
	linux-kernel, virtualization, netdev, kvm, kernel

On Wed, Jul 20, 2022 at 05:46:01AM +0000, Arseniy Krasnov wrote:
>On 19.07.2022 15:52, Stefano Garzarella wrote:
>> On Mon, Jul 18, 2022 at 08:19:06AM +0000, Arseniy Krasnov wrote:
>>> This adds test to check, that when poll() returns POLLIN and
>>> POLLRDNORM bits, next read call won't block.
>>>
>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>>> ---
>>> tools/testing/vsock/vsock_test.c | 90 ++++++++++++++++++++++++++++++++
>>> 1 file changed, 90 insertions(+)
>>>
>>> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>>> index dc577461afc2..8e394443eaf6 100644
>>> --- a/tools/testing/vsock/vsock_test.c
>>> +++ b/tools/testing/vsock/vsock_test.c
>>> @@ -18,6 +18,7 @@
>>> #include <sys/socket.h>
>>> #include <time.h>
>>> #include <sys/mman.h>
>>> +#include <poll.h>
>>>
>>> #include "timeout.h"
>>> #include "control.h"
>>> @@ -596,6 +597,90 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
>>>     close(fd);
>>> }
>>>
>>> +static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
>>> +{
>>> +#define RCVLOWAT_BUF_SIZE 128
>>> +    int fd;
>>> +    int i;
>>> +
>>> +    fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>>> +    if (fd < 0) {
>>> +        perror("accept");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    /* Send 1 byte. */
>>> +    send_byte(fd, 1, 0);
>>> +
>>> +    control_writeln("SRVSENT");
>>> +
>>> +    /* Just empirically delay value. */
>>> +    sleep(4);
>>
>> Why we need this sleep()?
>Purpose of sleep() is to move client in state, when it has 1 byte of rx data
>and poll() won't wake. For example:
>client:                        server:
>waits for "SRVSENT"
>                               send 1 byte
>                               send "SRVSENT"
>poll()
>                               sleep
>...
>poll sleeps
>...
>                               send rest of data
>poll wake up
>
>I think, without sleep there is chance, that client enters poll() when whole
>data from server is already received, thus test will be useless(it just tests

Right, I see (maybe add a comment in the test).

>poll()). May be i can remove "SRVSENT" as sleep is enough.

I think it's fine.

An alternative could be to use the `timeout` of poll():

client:                        server:
waits for "SRVSENT"
                                send 1 byte
                                send "SRVSENT"
poll(, timeout = 1 * 1000)
                                wait for "CLNSENT"
poll should return 0
send "CLNSENT"

poll(, timeout = 10 * 1000)
...
poll sleeps
...
                                send rest of data
poll wake up


I don't have a strong opinion, also your version seems fine, just an 
alternative ;-)

Maybe in your version you can add a 10 sec timeout to poll, to avoid 
that the test stuck for some reason (failing if the timeout is reached).

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM
  2022-07-20  6:07   ` Arseniy Krasnov
@ 2022-07-20  9:30     ` Stefano Garzarella
  2022-07-20 10:52       ` Arseniy Krasnov
  0 siblings, 1 reply; 18+ messages in thread
From: Stefano Garzarella @ 2022-07-20  9:30 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi,
	Michael S. Tsirkin, Krasnov Arseniy, kvm, netdev, virtualization,
	linux-kernel, kernel

On Wed, Jul 20, 2022 at 06:07:47AM +0000, Arseniy Krasnov wrote:
>On 19.07.2022 15:58, Stefano Garzarella wrote:
>> On Mon, Jul 18, 2022 at 08:12:52AM +0000, Arseniy Krasnov wrote:
>>> Hello,
>>>
>>> during my experiments with zerocopy receive, i found, that in some
>>> cases, poll() implementation violates POSIX: when socket has non-
>>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>> POLLRDNORM bits in 'revents' even number of bytes available to read
>>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>> POLLIN flag and then tries to read data(for example using  'read()'
>>> call), but read call will be blocked, because  SO_RCVLOWAT logic is
>>> supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>>> requires that:
>>>
>>> "POLLIN     Data other than high-priority data may be read without
>>>            blocking.
>>> POLLRDNORM Normal data may be read without blocking."
>>>
>>> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>>>
>>> So, we have, that poll() syscall returns POLLIN, but read call will
>>> be blocked.
>>>
>>> Also in man page socket(7) i found that:
>>>
>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>>
>>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>> this case for TCP socket, it works as POSIX required.
>>
>> I tried to look at the code and it seems that only TCP complies with it or am I wrong?
>Yes, i checked AF_UNIX, it also don't care about that. It calls skb_queue_empty() that of
>course ignores SO_RCVLOWAT.
>>
>>>
>>> I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>> test is also implemented.
>>>
>>> What do You think guys?
>>
>> Nice, thanks for fixing this and for the test!
>>
>> I left some comments, but I think the series is fine if we will support it in all transports.
>Ack
>>
>> I'd just like to understand if it's just TCP complying with it or I'm missing some check included in the socket layer that we could reuse.
>Seems sock_poll() which is socket layer entry point for poll() doesn't contain any such checks
>>
>> @David, @Jakub, @Paolo, any advice?
>>
>> Thanks,
>> Stefano
>>
>
>PS: moreover, i found one more interesting thing with TCP and poll: TCP receive logic wakes up poll waiter
>only when number of available bytes > SO_RCVLOWAT. E.g. it prevents "spurious" wake ups, when poll will be
>woken up because new data arrived, but POLLIN to allow user dequeue this data won't be set(as amount of data
>is too small).
>See tcp_data_ready() in net/ipv4/tcp_input.c

Do you mean that we should call sk->sk_data_ready(sk) checking 
SO_RCVLOWAT?

It seems fine, maybe we can add vsock_data_ready() in af_vsock.c that 
transports should call instead of calling sk->sk_data_ready(sk) 
directly.

Then we can something similar to tcp_data_ready().

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM
  2022-07-20  9:30     ` Stefano Garzarella
@ 2022-07-20 10:52       ` Arseniy Krasnov
  0 siblings, 0 replies; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-20 10:52 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi,
	Michael S. Tsirkin, Krasnov Arseniy, kvm, netdev, virtualization,
	linux-kernel, kernel

On 20.07.2022 12:30, Stefano Garzarella wrote:
> On Wed, Jul 20, 2022 at 06:07:47AM +0000, Arseniy Krasnov wrote:
>> On 19.07.2022 15:58, Stefano Garzarella wrote:
>>> On Mon, Jul 18, 2022 at 08:12:52AM +0000, Arseniy Krasnov wrote:
>>>> Hello,
>>>>
>>>> during my experiments with zerocopy receive, i found, that in some
>>>> cases, poll() implementation violates POSIX: when socket has non-
>>>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>>> POLLRDNORM bits in 'revents' even number of bytes available to read
>>>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>>> POLLIN flag and then tries to read data(for example using  'read()'
>>>> call), but read call will be blocked, because  SO_RCVLOWAT logic is
>>>> supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>>>> requires that:
>>>>
>>>> "POLLIN     Data other than high-priority data may be read without
>>>>            blocking.
>>>> POLLRDNORM Normal data may be read without blocking."
>>>>
>>>> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>>>>
>>>> So, we have, that poll() syscall returns POLLIN, but read call will
>>>> be blocked.
>>>>
>>>> Also in man page socket(7) i found that:
>>>>
>>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>>>
>>>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>>> this case for TCP socket, it works as POSIX required.
>>>
>>> I tried to look at the code and it seems that only TCP complies with it or am I wrong?
>> Yes, i checked AF_UNIX, it also don't care about that. It calls skb_queue_empty() that of
>> course ignores SO_RCVLOWAT.
>>>
>>>>
>>>> I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>>> test is also implemented.
>>>>
>>>> What do You think guys?
>>>
>>> Nice, thanks for fixing this and for the test!
>>>
>>> I left some comments, but I think the series is fine if we will support it in all transports.
>> Ack
>>>
>>> I'd just like to understand if it's just TCP complying with it or I'm missing some check included in the socket layer that we could reuse.
>> Seems sock_poll() which is socket layer entry point for poll() doesn't contain any such checks
>>>
>>> @David, @Jakub, @Paolo, any advice?
>>>
>>> Thanks,
>>> Stefano
>>>
>>
>> PS: moreover, i found one more interesting thing with TCP and poll: TCP receive logic wakes up poll waiter
>> only when number of available bytes > SO_RCVLOWAT. E.g. it prevents "spurious" wake ups, when poll will be
>> woken up because new data arrived, but POLLIN to allow user dequeue this data won't be set(as amount of data
>> is too small).
>> See tcp_data_ready() in net/ipv4/tcp_input.c
> 
> Do you mean that we should call sk->sk_data_ready(sk) checking SO_RCVLOWAT?
Yes, like tcp_data_read().
> 
> It seems fine, maybe we can add vsock_data_ready() in af_vsock.c that transports should call instead of calling sk->sk_data_ready(sk) directly.
Yes, this will also update logic in vmci and hyperv transports
> 
> Then we can something similar to tcp_data_ready().
> 
> Thanks,
> Stefano
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback.
  2022-07-20  8:23       ` Stefano Garzarella
@ 2022-07-20 18:54         ` Dexuan Cui
  2022-07-21  6:02           ` Arseniy Krasnov
  0 siblings, 1 reply; 18+ messages in thread
From: Dexuan Cui @ 2022-07-20 18:54 UTC (permalink / raw)
  To: Stefano Garzarella, Arseniy Krasnov
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy,
	linux-kernel, virtualization, netdev, kvm, kernel

> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Wednesday, July 20, 2022 1:23 AM
> ...
> On Wed, Jul 20, 2022 at 05:38:03AM +0000, Arseniy Krasnov wrote:
> >On 19.07.2022 15:48, Stefano Garzarella wrote:
> >> On Mon, Jul 18, 2022 at 08:17:31AM +0000, Arseniy Krasnov wrote:
> >>> This callback controls setting of POLLIN,POLLRDNORM output bits
> >>> of poll() syscall,but in some cases,it is incorrectly to set it,
> >>> when socket has at least 1 bytes of available data. Use 'target'
> >>> which is already exists and equal to sk_rcvlowat in this case.
> >>>
> >>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
> >>> ---
> >>> net/vmw_vsock/virtio_transport_common.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/net/vmw_vsock/virtio_transport_common.c
> b/net/vmw_vsock/virtio_transport_common.c
> >>> index ec2c2afbf0d0..591908740992 100644
> >>> --- a/net/vmw_vsock/virtio_transport_common.c
> >>> +++ b/net/vmw_vsock/virtio_transport_common.c
> >>> @@ -634,7 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock
> *vsk,
> >>>                 size_t target,
> >>>                 bool *data_ready_now)
> >>> {
> >>> -    if (vsock_stream_has_data(vsk))
> >>> +    if (vsock_stream_has_data(vsk) >= target)
> >>>         *data_ready_now = true;
> >>>     else
> >>>         *data_ready_now = false;
> >>
> >> Perhaps we can take the opportunity to clean up the code in this way:
> >>
> >>     *data_ready_now = vsock_stream_has_data(vsk) >= target;
> >Ack
> >>
> >> Anyway, I think we also need to fix the other transports (vmci and hyperv),
> >> what do you think?
> >For vmci it is look clear to fix it. For hyperv i need to check it more, because it
> > already uses some internal target value.
> 
> Yep, I see. Maybe you can pass `target` to hvs_channel_readable() and
> use it as parameter of HVS_PKT_LEN().
> 
> @Dexuan what do you think?
> 
> Thanks,
> Stefano

Can we return "not supported" to set_rcvlowat for Hyper-V vsock? :-)

For Hyper-V vsock, it's easy to tell if there is at least 1 byte to read: 
please refer to hvs_channel_readable(), but it's difficult to figure out
exactly how many bytes can be read. 

In hvs_channel_readable(), hv_get_bytes_to_read() returns the total 
bytes of 0, 1 or multiple Hyper-V vsock packets: each packet has a
24-byte header (see HVS_HEADER_LEN), the payload, some padding
bytes (if the payload length is not a multiple of 8), and 8 trailing
useless bytes.

It's hard to get the total payload length because there is no API in
include/linux/hyperv.h, drivers/hv/channel.c and 
drivers/hv/ring_buffer.c that allows us to peek at the data in the
VMBus channel's ringbuffer. 

We could add such a "peek" API in drivers/hv/channel.c (see the
non-peek version of the APIs in hvs_stream_dequeue(): 
hv_pkt_iter_first() and hv_pkt_iter_next()), and examine the whole
ringbuffe to figure out the exact total payload length, but I feel it may 
not be worth the non-trivial complexity just to be POSIX-compliant --
nobody ever complained about this for the past 5 years :-) So I'm
wondering if we should allow Hyper-V vsock to not support the 
set_rcvlowat op?

Thanks,
-- Dexuan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback.
  2022-07-20 18:54         ` Dexuan Cui
@ 2022-07-21  6:02           ` Arseniy Krasnov
  0 siblings, 0 replies; 18+ messages in thread
From: Arseniy Krasnov @ 2022-07-21  6:02 UTC (permalink / raw)
  To: Dexuan Cui, Stefano Garzarella
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, Krasnov Arseniy,
	linux-kernel, virtualization, netdev, kvm, kernel

On 20.07.2022 21:54, Dexuan Cui wrote:
>> From: Stefano Garzarella <sgarzare@redhat.com>
>> Sent: Wednesday, July 20, 2022 1:23 AM
>> ...
>> On Wed, Jul 20, 2022 at 05:38:03AM +0000, Arseniy Krasnov wrote:
>>> On 19.07.2022 15:48, Stefano Garzarella wrote:
>>>> On Mon, Jul 18, 2022 at 08:17:31AM +0000, Arseniy Krasnov wrote:
>>>>> This callback controls setting of POLLIN,POLLRDNORM output bits
>>>>> of poll() syscall,but in some cases,it is incorrectly to set it,
>>>>> when socket has at least 1 bytes of available data. Use 'target'
>>>>> which is already exists and equal to sk_rcvlowat in this case.
>>>>>
>>>>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>>>>> ---
>>>>> net/vmw_vsock/virtio_transport_common.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c
>> b/net/vmw_vsock/virtio_transport_common.c
>>>>> index ec2c2afbf0d0..591908740992 100644
>>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>>> @@ -634,7 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock
>> *vsk,
>>>>>                 size_t target,
>>>>>                 bool *data_ready_now)
>>>>> {
>>>>> -    if (vsock_stream_has_data(vsk))
>>>>> +    if (vsock_stream_has_data(vsk) >= target)
>>>>>         *data_ready_now = true;
>>>>>     else
>>>>>         *data_ready_now = false;
>>>>
>>>> Perhaps we can take the opportunity to clean up the code in this way:
>>>>
>>>>     *data_ready_now = vsock_stream_has_data(vsk) >= target;
>>> Ack
>>>>
>>>> Anyway, I think we also need to fix the other transports (vmci and hyperv),
>>>> what do you think?
>>> For vmci it is look clear to fix it. For hyperv i need to check it more, because it
>>> already uses some internal target value.
>>
>> Yep, I see. Maybe you can pass `target` to hvs_channel_readable() and
>> use it as parameter of HVS_PKT_LEN().
>>
>> @Dexuan what do you think?
>>
>> Thanks,
>> Stefano
> 
> Can we return "not supported" to set_rcvlowat for Hyper-V vsock? :-)
> 
> For Hyper-V vsock, it's easy to tell if there is at least 1 byte to read: 
> please refer to hvs_channel_readable(), but it's difficult to figure out
> exactly how many bytes can be read. 
> 
> In hvs_channel_readable(), hv_get_bytes_to_read() returns the total 
> bytes of 0, 1 or multiple Hyper-V vsock packets: each packet has a
> 24-byte header (see HVS_HEADER_LEN), the payload, some padding
> bytes (if the payload length is not a multiple of 8), and 8 trailing
> useless bytes.
> 
> It's hard to get the total payload length because there is no API in
> include/linux/hyperv.h, drivers/hv/channel.c and 
> drivers/hv/ring_buffer.c that allows us to peek at the data in the
> VMBus channel's ringbuffer. 
> 
> We could add such a "peek" API in drivers/hv/channel.c (see the
> non-peek version of the APIs in hvs_stream_dequeue(): 
> hv_pkt_iter_first() and hv_pkt_iter_next()), and examine the whole
> ringbuffe to figure out the exact total payload length, but I feel it may 
> not be worth the non-trivial complexity just to be POSIX-compliant --
> nobody ever complained about this for the past 5 years :-) So I'm
> wondering if we should allow Hyper-V vsock to not support the 
> set_rcvlowat op?
I see, i can prepare patch for this in this patchset in v2. Call to set SO_RCVLOWAT for Hyper-V
transport will return EOPNOTSUPP.

Thank You
> 
> Thanks,
> -- Dexuan
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-07-21  6:02 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-18  8:12 [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM Arseniy Krasnov
2022-07-18  8:15 ` [RFC PATCH v1 1/3] vsock: use sk_skrcvlowat to set POLLIN,POLLRDNORM, bits Arseniy Krasnov
2022-07-19 12:44   ` Stefano Garzarella
2022-07-20  5:35     ` Arseniy Krasnov
2022-07-18  8:17 ` [RFC PATCH v1 2/3] virtio/vsock: use 'target' in notify_poll_in, callback Arseniy Krasnov
2022-07-19 12:48   ` Stefano Garzarella
2022-07-20  5:38     ` Arseniy Krasnov
2022-07-20  8:23       ` Stefano Garzarella
2022-07-20 18:54         ` Dexuan Cui
2022-07-21  6:02           ` Arseniy Krasnov
2022-07-18  8:19 ` [RFC PATCH v1 3/3] vsock_test: POLLIN + SO_RCVLOWAT test Arseniy Krasnov
2022-07-19 12:52   ` Stefano Garzarella
2022-07-20  5:46     ` Arseniy Krasnov
2022-07-20  8:56       ` Stefano Garzarella
2022-07-19 12:58 ` [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM Stefano Garzarella
2022-07-20  6:07   ` Arseniy Krasnov
2022-07-20  9:30     ` Stefano Garzarella
2022-07-20 10:52       ` Arseniy Krasnov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).