netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling
@ 2022-07-25  7:54 Arseniy Krasnov
  2022-07-25  7:56 ` [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM Arseniy Krasnov
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  7:54 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, Stefan Hajnoczi, Michael S. Tsirkin, kys, haiyangz,
	sthemmin, wei.liu, Dexuan Cui, Arseniy Krasnov, Krasnov Arseniy
  Cc: virtualization, netdev, linux-kernel, linux-hyperv, kvm, kernel

Hello,

This patchset includes some updates for SO_RCVLOWAT:

1) af_vsock:
   During my experiments with zerocopy receive, i found, that in some
   cases, poll() implementation violates POSIX: when socket has non-
   default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
   POLLRDNORM bits in 'revents' even number of bytes available to read
   on socket is smaller than SO_RCVLOWAT value. In this case,user sees
   POLLIN flag and then tries to read data(for example using  'read()'
   call), but read call will be blocked, because  SO_RCVLOWAT logic is
   supported in dequeue loop in af_vsock.c. But the same time,  POSIX
   requires that:

   "POLLIN     Data other than high-priority data may be read without
               blocking.
    POLLRDNORM Normal data may be read without blocking."

   See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.

   So, we have, that poll() syscall returns POLLIN, but read call will
   be blocked.

   Also in man page socket(7) i found that:

   "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
   socket as readable only if at least SO_RCVLOWAT bytes are available."

   I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
   uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
   this case for TCP socket, it works as POSIX required.

   I've added some fixes to af_vsock.c and virtio_transport_common.c,
   test is also implemented.

2) virtio/vsock:
   It adds some optimization to wake ups, when new data arrived. Now,
   SO_RCVLOWAT is considered before wake up sleepers who wait new data.
   There is no sense, to kick waiter, when number of available bytes
   in socket's queue < SO_RCVLOWAT, because if we wake up reader in
   this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
   or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
   exit from poll() will be "spurious". This logic is also used in TCP
   sockets.

3) vmci/vsock:
   Same as 2), but i'm not sure about this changes. Will be very good,
   to get comments from someone who knows this code.

4) Hyper-V:
   As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
   support SO_RCVLOWAT, so he suggested to disable this feature for
   Hyper-V.

Thank You

Arseniy Krasnov(9):
 vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM
 virtio/vsock: use 'target' in notify_poll_in callback
 vmci/vsock: use 'target' in notify_poll_in callback
 vsock_test: POLLIN + SO_RCVLOWAT test
 vsock: SO_RCVLOWAT transport set callback
 hv_sock: disable SO_RCVLOWAT support
 vsock: add API call for data ready
 virtio/vsock: check SO_RCVLOWAT before wake up reader
 vmci/vsock: check SO_RCVLOWAT before wake up reader

 include/net/af_vsock.h                       |   2 +
 net/vmw_vsock/af_vsock.c                     |  32 +++++++-
 net/vmw_vsock/hyperv_transport.c             |   7 ++
 net/vmw_vsock/virtio_transport_common.c      |   7 +-
 net/vmw_vsock/vmci_transport_notify.c        |   4 +-
 net/vmw_vsock/vmci_transport_notify_qstate.c |   6 +-
 tools/testing/vsock/vsock_test.c             | 107 +++++++++++++++++++++++++++
 7 files changed, 154 insertions(+), 11 deletions(-)

-- 
2.25.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
@ 2022-07-25  7:56 ` Arseniy Krasnov
  2022-07-27 12:17   ` Stefano Garzarella
  2022-07-25  7:59 ` [RFC PATCH v2 2/9] virtio/vsock: use 'target' in notify_poll_in, callback Arseniy Krasnov
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  7:56 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Stefan Hajnoczi, Michael S. Tsirkin, Arseniy Krasnov,
	Krasnov Arseniy
  Cc: virtualization, netdev, linux-kernel, linux-hyperv, kvm, kernel

Both bits indicate, that next data read call won't be blocked, but when
sk_rcvlowat is not 1, these bits will be set by poll anyway, thus when
user tries to dequeue data,it will wait until sk_rcvlowat bytes of data
will be available.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/af_vsock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f04abf662ec6..63a13fa2686a 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1066,8 +1066,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
 		if (transport && transport->stream_is_active(vsk) &&
 		    !(sk->sk_shutdown & RCV_SHUTDOWN)) {
 			bool data_ready_now = false;
+			int target = sock_rcvlowat(sk, 0, INT_MAX);
 			int ret = transport->notify_poll_in(
-					vsk, 1, &data_ready_now);
+					vsk, target, &data_ready_now);
 			if (ret < 0) {
 				mask |= EPOLLERR;
 			} else {
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 2/9] virtio/vsock: use 'target' in notify_poll_in, callback
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
  2022-07-25  7:56 ` [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM Arseniy Krasnov
@ 2022-07-25  7:59 ` Arseniy Krasnov
  2022-07-25  8:01 ` [RFC PATCH v2 3/9] vmci/vsock: " Arseniy Krasnov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  7:59 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Stefan Hajnoczi, Arseniy Krasnov, Krasnov Arseniy
  Cc: virtualization, netdev, linux-kernel, linux-hyperv, kvm, kernel

This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
syscall,but in some cases,it is incorrectly to set it, when socket has
at least 1 bytes of available data. Use 'target' which is already exists
and equal to sk_rcvlowat in this case.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/virtio_transport_common.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index ec2c2afbf0d0..8f6356ebcdd1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -634,10 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock *vsk,
 				size_t target,
 				bool *data_ready_now)
 {
-	if (vsock_stream_has_data(vsk))
-		*data_ready_now = true;
-	else
-		*data_ready_now = false;
+	*data_ready_now = vsock_stream_has_data(vsk) >= target;
 
 	return 0;
 }
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 3/9] vmci/vsock: use 'target' in notify_poll_in, callback
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
  2022-07-25  7:56 ` [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM Arseniy Krasnov
  2022-07-25  7:59 ` [RFC PATCH v2 2/9] virtio/vsock: use 'target' in notify_poll_in, callback Arseniy Krasnov
@ 2022-07-25  8:01 ` Arseniy Krasnov
  2022-07-27 12:22   ` Stefano Garzarella
  2022-07-25  8:03 ` [RFC PATCH v2 4/9] vsock_test: POLLIN + SO_RCVLOWAT test Arseniy Krasnov
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  8:01 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Stefan Hajnoczi, Arseniy Krasnov, Krasnov Arseniy
  Cc: virtualization, linux-kernel, linux-hyperv, netdev, kvm, kernel

This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
syscall,but in some cases,it is incorrectly to set it, when socket has
at least 1 bytes of available data. Use 'target' which is already exists
and equal to sk_rcvlowat in this case.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/vmci_transport_notify.c        | 2 +-
 net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index d69fc4b595ad..1684b85b0660 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -340,7 +340,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 
-	if (vsock_stream_has_data(vsk)) {
+	if (vsock_stream_has_data(vsk) >= target) {
 		*data_ready_now = true;
 	} else {
 		/* We can't read right now because there is nothing in the
diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
index 0f36d7c45db3..a40407872b53 100644
--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
@@ -161,7 +161,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 
-	if (vsock_stream_has_data(vsk)) {
+	if (vsock_stream_has_data(vsk) >= target) {
 		*data_ready_now = true;
 	} else {
 		/* We can't read right now because there is nothing in the
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 4/9] vsock_test: POLLIN + SO_RCVLOWAT test
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
                   ` (2 preceding siblings ...)
  2022-07-25  8:01 ` [RFC PATCH v2 3/9] vmci/vsock: " Arseniy Krasnov
@ 2022-07-25  8:03 ` Arseniy Krasnov
  2022-07-25  8:05 ` [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback Arseniy Krasnov
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  8:03 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Stefan Hajnoczi, Arseniy Krasnov, Krasnov Arseniy
  Cc: virtualization, netdev, linux-kernel, linux-hyperv, kvm, kernel

This adds test to check,that when poll() returns POLLIN,POLLRDNORM bits,
next read call won't block.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 tools/testing/vsock/vsock_test.c | 107 +++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index dc577461afc2..920dc5d5d979 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -18,6 +18,7 @@
 #include <sys/socket.h>
 #include <time.h>
 #include <sys/mman.h>
+#include <poll.h>
 
 #include "timeout.h"
 #include "control.h"
@@ -596,6 +597,107 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
 	close(fd);
 }
 
+static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
+{
+#define RCVLOWAT_BUF_SIZE 128
+	int fd;
+	int i;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+	if (fd < 0) {
+		perror("accept");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Send 1 byte. */
+	send_byte(fd, 1, 0);
+
+	control_writeln("SRVSENT");
+
+	/* Wait until client is ready to receive rest of data. */
+	control_expectln("CLNSENT");
+
+	for (i = 0; i < RCVLOWAT_BUF_SIZE - 1; i++)
+		send_byte(fd, 1, 0);
+
+	/* Keep socket in active state. */
+	control_expectln("POLLDONE");
+
+	close(fd);
+}
+
+static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
+{
+	unsigned long lowat_val = RCVLOWAT_BUF_SIZE;
+	char buf[RCVLOWAT_BUF_SIZE];
+	struct pollfd fds;
+	ssize_t read_res;
+	short poll_flags;
+	int fd;
+
+	fd = vsock_stream_connect(opts->peer_cid, 1234);
+	if (fd < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
+			&lowat_val, sizeof(lowat_val))) {
+		perror("setsockopt");
+		exit(EXIT_FAILURE);
+	}
+
+	control_expectln("SRVSENT");
+
+	/* At this point, server sent 1 byte. */
+	fds.fd = fd;
+	poll_flags = POLLIN | POLLRDNORM;
+	fds.events = poll_flags;
+
+	/* Try to wait for 1 sec. */
+	if (poll(&fds, 1, 1000) < 0) {
+		perror("poll");
+		exit(EXIT_FAILURE);
+	}
+
+	/* poll() must return nothing. */
+	if (fds.revents) {
+		fprintf(stderr, "Unexpected poll result %hx\n",
+			fds.revents);
+		exit(EXIT_FAILURE);
+	}
+
+	/* Tell server to send rest of data. */
+	control_writeln("CLNSENT");
+
+	/* Poll for data. */
+	if (poll(&fds, 1, 10000) < 0) {
+		perror("poll");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Only these two bits are expected. */
+	if (fds.revents != poll_flags) {
+		fprintf(stderr, "Unexpected poll result %hx\n",
+			fds.revents);
+		exit(EXIT_FAILURE);
+	}
+
+	/* Use MSG_DONTWAIT, if call is going to wait, EAGAIN
+	 * will be returned.
+	 */
+	read_res = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
+	if (read_res != RCVLOWAT_BUF_SIZE) {
+		fprintf(stderr, "Unexpected recv result %zi\n",
+			read_res);
+		exit(EXIT_FAILURE);
+	}
+
+	control_writeln("POLLDONE");
+
+	close(fd);
+}
+
 static struct test_case test_cases[] = {
 	{
 		.name = "SOCK_STREAM connection reset",
@@ -646,6 +748,11 @@ static struct test_case test_cases[] = {
 		.run_client = test_seqpacket_invalid_rec_buffer_client,
 		.run_server = test_seqpacket_invalid_rec_buffer_server,
 	},
+	{
+		.name = "SOCK_STREAM poll() + SO_RCVLOWAT",
+		.run_client = test_stream_poll_rcvlowat_client,
+		.run_server = test_stream_poll_rcvlowat_server,
+	},
 	{},
 };
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
                   ` (3 preceding siblings ...)
  2022-07-25  8:03 ` [RFC PATCH v2 4/9] vsock_test: POLLIN + SO_RCVLOWAT test Arseniy Krasnov
@ 2022-07-25  8:05 ` Arseniy Krasnov
  2022-07-27 12:24   ` Stefano Garzarella
  2022-07-25  8:07 ` [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support Arseniy Krasnov
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  8:05 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Stefan Hajnoczi, Arseniy Krasnov, Krasnov Arseniy
  Cc: virtualization, linux-kernel, linux-hyperv, netdev, kvm, kernel

This adds transport specific callback for SO_RCVLOWAT, because in some
transports it may be difficult to know current available number of bytes
ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 include/net/af_vsock.h   |  1 +
 net/vmw_vsock/af_vsock.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f742e50207fb..eae5874bae35 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -134,6 +134,7 @@ struct vsock_transport {
 	u64 (*stream_rcvhiwat)(struct vsock_sock *);
 	bool (*stream_is_active)(struct vsock_sock *);
 	bool (*stream_allow)(u32 cid, u32 port);
+	int (*set_rcvlowat)(struct vsock_sock *, int);
 
 	/* SEQ_PACKET. */
 	ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 63a13fa2686a..b7a286db4af1 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2130,6 +2130,24 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	return err;
 }
 
+static int vsock_set_rcvlowat(struct sock *sk, int val)
+{
+	const struct vsock_transport *transport;
+	struct vsock_sock *vsk;
+	int err = 0;
+
+	vsk = vsock_sk(sk);
+	transport = vsk->transport;
+
+	if (transport->set_rcvlowat)
+		err = transport->set_rcvlowat(vsk, val);
+
+	if (!err)
+		WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);
+
+	return err;
+}
+
 static const struct proto_ops vsock_stream_ops = {
 	.family = PF_VSOCK,
 	.owner = THIS_MODULE,
@@ -2149,6 +2167,7 @@ static const struct proto_ops vsock_stream_ops = {
 	.recvmsg = vsock_connectible_recvmsg,
 	.mmap = sock_no_mmap,
 	.sendpage = sock_no_sendpage,
+	.set_rcvlowat = vsock_set_rcvlowat,
 };
 
 static const struct proto_ops vsock_seqpacket_ops = {
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
                   ` (4 preceding siblings ...)
  2022-07-25  8:05 ` [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback Arseniy Krasnov
@ 2022-07-25  8:07 ` Arseniy Krasnov
  2022-08-02  5:42   ` Dexuan Cui
  2022-07-25  8:09 ` [RFC PATCH v2 7/9] vsock: add API call for data ready Arseniy Krasnov
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  8:07 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Stefan Hajnoczi, Arseniy Krasnov, Krasnov Arseniy
  Cc: virtualization, linux-kernel, linux-hyperv, netdev, kvm, kernel

For Hyper-V it is quiet difficult to support this socket option,due to
transport internals, so disable it.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/hyperv_transport.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index e111e13b6660..5fab8f356a86 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -802,6 +802,12 @@ int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
 	return 0;
 }
 
+static
+int hvs_set_rcvlowat(struct vsock_sock *vsk, int val)
+{
+	return -EOPNOTSUPP;
+}
+
 static struct vsock_transport hvs_transport = {
 	.module                   = THIS_MODULE,
 
@@ -837,6 +843,7 @@ static struct vsock_transport hvs_transport = {
 	.notify_send_pre_enqueue  = hvs_notify_send_pre_enqueue,
 	.notify_send_post_enqueue = hvs_notify_send_post_enqueue,
 
+	.set_rcvlowat             = hvs_set_rcvlowat
 };
 
 static bool hvs_check_transport(struct vsock_sock *vsk)
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 7/9] vsock: add API call for data ready
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
                   ` (5 preceding siblings ...)
  2022-07-25  8:07 ` [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support Arseniy Krasnov
@ 2022-07-25  8:09 ` Arseniy Krasnov
  2022-07-25  8:11 ` [RFC PATCH v2 8/9] virtio/vsock: check SO_RCVLOWAT before wake up reader Arseniy Krasnov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  8:09 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Stefan Hajnoczi, Arseniy Krasnov, Krasnov Arseniy
  Cc: virtualization, netdev, linux-kernel, linux-hyperv, kvm, kernel

This adds 'vsock_data_ready()' which must be called by transport to kick
sleeping data readers. It checks for SO_RCVLOWAT value before waking
user,thus preventing spurious wake ups.Based on 'tcp_data_ready()' logic.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 include/net/af_vsock.h   |  1 +
 net/vmw_vsock/af_vsock.c | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index eae5874bae35..7b79fc5164cc 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -77,6 +77,7 @@ struct vsock_sock {
 s64 vsock_stream_has_data(struct vsock_sock *vsk);
 s64 vsock_stream_has_space(struct vsock_sock *vsk);
 struct sock *vsock_create_connected(struct sock *parent);
+void vsock_data_ready(struct sock *sk);
 
 /**** TRANSPORT ****/
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index b7a286db4af1..4b3ec3f9383f 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -882,6 +882,16 @@ s64 vsock_stream_has_space(struct vsock_sock *vsk)
 }
 EXPORT_SYMBOL_GPL(vsock_stream_has_space);
 
+void vsock_data_ready(struct sock *sk)
+{
+	struct vsock_sock *vsk = vsock_sk(sk);
+
+	if (vsock_stream_has_data(vsk) >= sk->sk_rcvlowat ||
+	    sock_flag(sk, SOCK_DONE))
+		sk->sk_data_ready(sk);
+}
+EXPORT_SYMBOL_GPL(vsock_data_ready);
+
 static int vsock_release(struct socket *sock)
 {
 	__vsock_release(sock->sk, 0);
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 8/9] virtio/vsock: check SO_RCVLOWAT before wake up reader
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
                   ` (6 preceding siblings ...)
  2022-07-25  8:09 ` [RFC PATCH v2 7/9] vsock: add API call for data ready Arseniy Krasnov
@ 2022-07-25  8:11 ` Arseniy Krasnov
  2022-07-25  8:13 ` [RFC PATCH v2 9/9] vmci/vsock: " Arseniy Krasnov
  2022-07-27 12:37 ` [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Stefano Garzarella
  9 siblings, 0 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  8:11 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Stefan Hajnoczi, Arseniy Krasnov, Krasnov Arseniy
  Cc: virtualization, linux-kernel, linux-hyperv, netdev, kvm, kernel

This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user,because it will wait until SO_RCVLOWAT bytes will be dequeued.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/virtio_transport_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 8f6356ebcdd1..35863132f4f1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1081,7 +1081,7 @@ virtio_transport_recv_connected(struct sock *sk,
 	switch (le16_to_cpu(pkt->hdr.op)) {
 	case VIRTIO_VSOCK_OP_RW:
 		virtio_transport_recv_enqueue(vsk, pkt);
-		sk->sk_data_ready(sk);
+		vsock_data_ready(sk);
 		return err;
 	case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
 		virtio_transport_send_credit_update(vsk);
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 9/9] vmci/vsock: check SO_RCVLOWAT before wake up reader
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
                   ` (7 preceding siblings ...)
  2022-07-25  8:11 ` [RFC PATCH v2 8/9] virtio/vsock: check SO_RCVLOWAT before wake up reader Arseniy Krasnov
@ 2022-07-25  8:13 ` Arseniy Krasnov
  2022-07-27 12:37 ` [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Stefano Garzarella
  9 siblings, 0 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-25  8:13 UTC (permalink / raw)
  To: Stefano Garzarella, David S. Miller, edumazet, Jakub Kicinski,
	Paolo Abeni, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Stefan Hajnoczi, Arseniy Krasnov, Krasnov Arseniy
  Cc: virtualization, linux-kernel, linux-hyperv, netdev, kvm, kernel

This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user,because it will wait until SO_RCVLOWAT bytes will be dequeued.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/vmci_transport_notify.c        | 2 +-
 net/vmw_vsock/vmci_transport_notify_qstate.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index 1684b85b0660..ab42d73ab3da 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -307,7 +307,7 @@ vmci_transport_handle_wrote(struct sock *sk,
 	struct vsock_sock *vsk = vsock_sk(sk);
 	PKT_FIELD(vsk, sent_waiting_read) = false;
 #endif
-	sk->sk_data_ready(sk);
+	vsock_data_ready(sk);
 }
 
 static void vmci_transport_notify_pkt_socket_init(struct sock *sk)
diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
index a40407872b53..41dca5fbea5e 100644
--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
@@ -84,7 +84,7 @@ vmci_transport_handle_wrote(struct sock *sk,
 			    bool bottom_half,
 			    struct sockaddr_vm *dst, struct sockaddr_vm *src)
 {
-	sk->sk_data_ready(sk);
+	vsock_data_ready(sk);
 }
 
 static void vsock_block_update_write_window(struct sock *sk)
@@ -282,7 +282,7 @@ vmci_transport_notify_pkt_recv_post_dequeue(
 		/* See the comment in
 		 * vmci_transport_notify_pkt_send_post_enqueue().
 		 */
-		sk->sk_data_ready(sk);
+		vsock_data_ready(sk);
 	}
 
 	return err;
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM
  2022-07-25  7:56 ` [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM Arseniy Krasnov
@ 2022-07-27 12:17   ` Stefano Garzarella
  2022-07-28  6:04     ` Arseniy Krasnov
  0 siblings, 1 reply; 24+ messages in thread
From: Stefano Garzarella @ 2022-07-27 12:17 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni, kys,
	haiyangz, sthemmin, wei.liu, Dexuan Cui, Stefan Hajnoczi,
	Michael S. Tsirkin, Krasnov Arseniy, virtualization, netdev,
	linux-kernel, linux-hyperv, kvm, kernel

On Mon, Jul 25, 2022 at 07:56:59AM +0000, Arseniy Krasnov wrote:
>Both bits indicate, that next data read call won't be blocked, but when
>sk_rcvlowat is not 1, these bits will be set by poll anyway, thus when
>user tries to dequeue data,it will wait until sk_rcvlowat bytes of data
>will be available.
>

The patch LGTM, but I suggest you to rewrite the title and commit of the 
message to better explain what this patch does (pass sock_rcvlowat to 
notify_poll_in as target) and then explain why as you already did (to 
set POLLIN/POLLRDNORM only when target is reached).

Thanks,
Stefano

>Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>---
> net/vmw_vsock/af_vsock.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index f04abf662ec6..63a13fa2686a 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1066,8 +1066,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
> 		if (transport && transport->stream_is_active(vsk) &&
> 		    !(sk->sk_shutdown & RCV_SHUTDOWN)) {
> 			bool data_ready_now = false;
>+			int target = sock_rcvlowat(sk, 0, INT_MAX);
> 			int ret = transport->notify_poll_in(
>-					vsk, 1, &data_ready_now);
>+					vsk, target, &data_ready_now);
> 			if (ret < 0) {
> 				mask |= EPOLLERR;
> 			} else {
>-- 
>2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 3/9] vmci/vsock: use 'target' in notify_poll_in, callback
  2022-07-25  8:01 ` [RFC PATCH v2 3/9] vmci/vsock: " Arseniy Krasnov
@ 2022-07-27 12:22   ` Stefano Garzarella
  2022-07-27 12:29     ` Stefano Garzarella
  2022-07-28  6:05     ` Arseniy Krasnov
  0 siblings, 2 replies; 24+ messages in thread
From: Stefano Garzarella @ 2022-07-27 12:22 UTC (permalink / raw)
  To: Jorgen Hansen, Arseniy Krasnov
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni, kys,
	haiyangz, sthemmin, wei.liu, Dexuan Cui, Stefan Hajnoczi,
	Krasnov Arseniy, virtualization, linux-kernel, linux-hyperv,
	netdev, kvm, kernel

@Jorgen can you take a look at this series, especially this patch?

Maybe we need to update the comments in the else branch, something like
s/there is nothing/there is not enough data

Thanks,
Stefano

On Mon, Jul 25, 2022 at 08:01:01AM +0000, Arseniy Krasnov wrote:
>This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
>syscall,but in some cases,it is incorrectly to set it, when socket has
>at least 1 bytes of available data. Use 'target' which is already exists
>and equal to sk_rcvlowat in this case.
>
>Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>---
> net/vmw_vsock/vmci_transport_notify.c        | 2 +-
> net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
>index d69fc4b595ad..1684b85b0660 100644
>--- a/net/vmw_vsock/vmci_transport_notify.c
>+++ b/net/vmw_vsock/vmci_transport_notify.c
>@@ -340,7 +340,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
> {
> 	struct vsock_sock *vsk = vsock_sk(sk);
>
>-	if (vsock_stream_has_data(vsk)) {
>+	if (vsock_stream_has_data(vsk) >= target) {
> 		*data_ready_now = true;
> 	} else {
> 		/* We can't read right now because there is nothing in the
>diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
>index 0f36d7c45db3..a40407872b53 100644
>--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
>+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
>@@ -161,7 +161,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
> {
> 	struct vsock_sock *vsk = vsock_sk(sk);
>
>-	if (vsock_stream_has_data(vsk)) {
>+	if (vsock_stream_has_data(vsk) >= target) {
> 		*data_ready_now = true;
> 	} else {
> 		/* We can't read right now because there is nothing in the
>-- 
>2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback
  2022-07-25  8:05 ` [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback Arseniy Krasnov
@ 2022-07-27 12:24   ` Stefano Garzarella
  2022-07-28  6:06     ` Arseniy Krasnov
  0 siblings, 1 reply; 24+ messages in thread
From: Stefano Garzarella @ 2022-07-27 12:24 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni, kys,
	haiyangz, sthemmin, wei.liu, Dexuan Cui, Stefan Hajnoczi,
	Krasnov Arseniy, virtualization, linux-kernel, linux-hyperv,
	netdev, kvm, kernel

On Mon, Jul 25, 2022 at 08:05:28AM +0000, Arseniy Krasnov wrote:
>This adds transport specific callback for SO_RCVLOWAT, because in some
>transports it may be difficult to know current available number of bytes
>ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it.
>
>Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>---
> include/net/af_vsock.h   |  1 +
> net/vmw_vsock/af_vsock.c | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index f742e50207fb..eae5874bae35 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -134,6 +134,7 @@ struct vsock_transport {
> 	u64 (*stream_rcvhiwat)(struct vsock_sock *);
> 	bool (*stream_is_active)(struct vsock_sock *);
> 	bool (*stream_allow)(u32 cid, u32 port);
>+	int (*set_rcvlowat)(struct vsock_sock *, int);
>
> 	/* SEQ_PACKET. */
> 	ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 63a13fa2686a..b7a286db4af1 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -2130,6 +2130,24 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> 	return err;
> }
>
>+static int vsock_set_rcvlowat(struct sock *sk, int val)
>+{
>+	const struct vsock_transport *transport;
>+	struct vsock_sock *vsk;
>+	int err = 0;
>+
>+	vsk = vsock_sk(sk);
>+	transport = vsk->transport;

`transport` can be NULL if the user call SO_RCVLOWAT before we assign 
it, so we should check it.

I think if the transport implements `set_rcvlowat`, maybe we should set 
there sk->sk_rcvlowat, so I would do something like that:

     if (transport && transport->set_rcvlowat)
         err = transport->set_rcvlowat(vsk, val);
     else
         WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);

     return err;

In addition I think we should check that val does not exceed 
vsk->buffer_size, something similar of what tcp_set_rcvlowat() does.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 3/9] vmci/vsock: use 'target' in notify_poll_in, callback
  2022-07-27 12:22   ` Stefano Garzarella
@ 2022-07-27 12:29     ` Stefano Garzarella
  2022-07-28  6:05     ` Arseniy Krasnov
  1 sibling, 0 replies; 24+ messages in thread
From: Stefano Garzarella @ 2022-07-27 12:29 UTC (permalink / raw)
  To: Arseniy Krasnov, Bryan Tan, Rajesh Jalisatgi, Vishnu Dasa,
	VMware PV-Drivers Reviewers
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni, kys,
	haiyangz, sthemmin, wei.liu, Dexuan Cui, Stefan Hajnoczi,
	Krasnov Arseniy, virtualization, linux-kernel, linux-hyperv,
	netdev, kvm, kernel

On Wed, Jul 27, 2022 at 2:22 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> @Jorgen can you take a look at this series, especially this patch?

Jorgen's email bounced back, so I'm CCing VMCI maintainers.

Bryan, Rajesh, Vishnu, can you take a look?

Thanks,
Stefano

>
> Maybe we need to update the comments in the else branch, something like
> s/there is nothing/there is not enough data
>
> Thanks,
> Stefano
>
> On Mon, Jul 25, 2022 at 08:01:01AM +0000, Arseniy Krasnov wrote:
> >This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
> >syscall,but in some cases,it is incorrectly to set it, when socket has
> >at least 1 bytes of available data. Use 'target' which is already exists
> >and equal to sk_rcvlowat in this case.
> >
> >Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
> >---
> > net/vmw_vsock/vmci_transport_notify.c        | 2 +-
> > net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
> >index d69fc4b595ad..1684b85b0660 100644
> >--- a/net/vmw_vsock/vmci_transport_notify.c
> >+++ b/net/vmw_vsock/vmci_transport_notify.c
> >@@ -340,7 +340,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
> > {
> >       struct vsock_sock *vsk = vsock_sk(sk);
> >
> >-      if (vsock_stream_has_data(vsk)) {
> >+      if (vsock_stream_has_data(vsk) >= target) {
> >               *data_ready_now = true;
> >       } else {
> >               /* We can't read right now because there is nothing in the
> >diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
> >index 0f36d7c45db3..a40407872b53 100644
> >--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
> >+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
> >@@ -161,7 +161,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
> > {
> >       struct vsock_sock *vsk = vsock_sk(sk);
> >
> >-      if (vsock_stream_has_data(vsk)) {
> >+      if (vsock_stream_has_data(vsk) >= target) {
> >               *data_ready_now = true;
> >       } else {
> >               /* We can't read right now because there is nothing in the
> >--
> >2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling
  2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
                   ` (8 preceding siblings ...)
  2022-07-25  8:13 ` [RFC PATCH v2 9/9] vmci/vsock: " Arseniy Krasnov
@ 2022-07-27 12:37 ` Stefano Garzarella
  2022-07-28  6:08   ` Arseniy Krasnov
  2022-08-02  5:35   ` Vishnu Dasa
  9 siblings, 2 replies; 24+ messages in thread
From: Stefano Garzarella @ 2022-07-27 12:37 UTC (permalink / raw)
  To: Arseniy Krasnov, Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, kys, haiyangz, sthemmin,
	wei.liu, Dexuan Cui, Krasnov Arseniy, virtualization, netdev,
	linux-kernel, linux-hyperv, kvm, kernel

Hi Arseniy,

On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>Hello,
>
>This patchset includes some updates for SO_RCVLOWAT:
>
>1) af_vsock:
>   During my experiments with zerocopy receive, i found, that in some
>   cases, poll() implementation violates POSIX: when socket has non-
>   default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>   POLLRDNORM bits in 'revents' even number of bytes available to read
>   on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>   POLLIN flag and then tries to read data(for example using  'read()'
>   call), but read call will be blocked, because  SO_RCVLOWAT logic is
>   supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>   requires that:
>
>   "POLLIN     Data other than high-priority data may be read without
>               blocking.
>    POLLRDNORM Normal data may be read without blocking."
>
>   See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>
>   So, we have, that poll() syscall returns POLLIN, but read call will
>   be blocked.
>
>   Also in man page socket(7) i found that:
>
>   "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>   socket as readable only if at least SO_RCVLOWAT bytes are available."
>
>   I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>   uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>   this case for TCP socket, it works as POSIX required.
>
>   I've added some fixes to af_vsock.c and virtio_transport_common.c,
>   test is also implemented.
>
>2) virtio/vsock:
>   It adds some optimization to wake ups, when new data arrived. Now,
>   SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>   There is no sense, to kick waiter, when number of available bytes
>   in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>   this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>   or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>   exit from poll() will be "spurious". This logic is also used in TCP
>   sockets.

Nice, it looks good!

>
>3) vmci/vsock:
>   Same as 2), but i'm not sure about this changes. Will be very good,
>   to get comments from someone who knows this code.

I CCed VMCI maintainers to the patch and also to this cover, maybe 
better to keep them in the loop for next versions.

(Jorgen's and Rajesh's emails bounced back, so I'm CCing here only 
Bryan, Vishnu, and pv-drivers@vmware.com)

>
>4) Hyper-V:
>   As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>   support SO_RCVLOWAT, so he suggested to disable this feature for
>   Hyper-V.

I left a couple of comments in some patches, but it seems to me to be in 
a good state :-)

I would just suggest a bit of a re-organization of the series (the 
patches are fine, just the order):
   - introduce vsock_set_rcvlowat()
   - disabling it for hv_sock
   - use 'target' in virtio transports
   - use 'target' in vmci transports
   - use sock_rcvlowat in vsock_poll()
     I think is better to pass sock_rcvlowat() as 'target' when the
     transports are already able to use it
   - add vsock_data_ready()
   - use vsock_data_ready() in virtio transports
   - use vsock_data_ready() in vmci transports
   - tests

What do you think?

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM
  2022-07-27 12:17   ` Stefano Garzarella
@ 2022-07-28  6:04     ` Arseniy Krasnov
  0 siblings, 0 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-28  6:04 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni, kys,
	haiyangz, sthemmin, wei.liu, Dexuan Cui, Stefan Hajnoczi,
	Michael S. Tsirkin, Krasnov Arseniy, virtualization, netdev,
	linux-kernel, linux-hyperv, kvm, kernel

On 27.07.2022 15:17, Stefano Garzarella wrote:
> On Mon, Jul 25, 2022 at 07:56:59AM +0000, Arseniy Krasnov wrote:
>> Both bits indicate, that next data read call won't be blocked, but when
>> sk_rcvlowat is not 1, these bits will be set by poll anyway, thus when
>> user tries to dequeue data,it will wait until sk_rcvlowat bytes of data
>> will be available.
>>
> 
> The patch LGTM, but I suggest you to rewrite the title and commit of the message to better explain what this patch does (pass sock_rcvlowat to notify_poll_in as target) and then explain why as you already did (to set POLLIN/POLLRDNORM only when target is reached).
Ok, i see. Ack
> 
> Thanks,
> Stefano
> 
>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>> ---
>> net/vmw_vsock/af_vsock.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index f04abf662ec6..63a13fa2686a 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -1066,8 +1066,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
>>         if (transport && transport->stream_is_active(vsk) &&
>>             !(sk->sk_shutdown & RCV_SHUTDOWN)) {
>>             bool data_ready_now = false;
>> +            int target = sock_rcvlowat(sk, 0, INT_MAX);
>>             int ret = transport->notify_poll_in(
>> -                    vsk, 1, &data_ready_now);
>> +                    vsk, target, &data_ready_now);
>>             if (ret < 0) {
>>                 mask |= EPOLLERR;
>>             } else {
>> -- 
>> 2.25.1
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 3/9] vmci/vsock: use 'target' in notify_poll_in, callback
  2022-07-27 12:22   ` Stefano Garzarella
  2022-07-27 12:29     ` Stefano Garzarella
@ 2022-07-28  6:05     ` Arseniy Krasnov
  1 sibling, 0 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-28  6:05 UTC (permalink / raw)
  To: Stefano Garzarella, Jorgen Hansen
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni, kys,
	haiyangz, sthemmin, wei.liu, Dexuan Cui, Stefan Hajnoczi,
	Krasnov Arseniy, virtualization, linux-kernel, linux-hyperv,
	netdev, kvm, kernel

On 27.07.2022 15:22, Stefano Garzarella wrote:
> @Jorgen can you take a look at this series, especially this patch?
> 
> Maybe we need to update the comments in the else branch, something like
> s/there is nothing/there is not enough data
Ok, thanks!
> 
> Thanks,
> Stefano
> 
> On Mon, Jul 25, 2022 at 08:01:01AM +0000, Arseniy Krasnov wrote:
>> This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
>> syscall,but in some cases,it is incorrectly to set it, when socket has
>> at least 1 bytes of available data. Use 'target' which is already exists
>> and equal to sk_rcvlowat in this case.
>>
>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>> ---
>> net/vmw_vsock/vmci_transport_notify.c        | 2 +-
>> net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
>> index d69fc4b595ad..1684b85b0660 100644
>> --- a/net/vmw_vsock/vmci_transport_notify.c
>> +++ b/net/vmw_vsock/vmci_transport_notify.c
>> @@ -340,7 +340,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
>> {
>>     struct vsock_sock *vsk = vsock_sk(sk);
>>
>> -    if (vsock_stream_has_data(vsk)) {
>> +    if (vsock_stream_has_data(vsk) >= target) {
>>         *data_ready_now = true;
>>     } else {
>>         /* We can't read right now because there is nothing in the
>> diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
>> index 0f36d7c45db3..a40407872b53 100644
>> --- a/net/vmw_vsock/vmci_transport_notify_qstate.c
>> +++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
>> @@ -161,7 +161,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
>> {
>>     struct vsock_sock *vsk = vsock_sk(sk);
>>
>> -    if (vsock_stream_has_data(vsk)) {
>> +    if (vsock_stream_has_data(vsk) >= target) {
>>         *data_ready_now = true;
>>     } else {
>>         /* We can't read right now because there is nothing in the
>> -- 
>> 2.25.1
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback
  2022-07-27 12:24   ` Stefano Garzarella
@ 2022-07-28  6:06     ` Arseniy Krasnov
  0 siblings, 0 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-28  6:06 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni, kys,
	haiyangz, sthemmin, wei.liu, Dexuan Cui, Stefan Hajnoczi,
	Krasnov Arseniy, virtualization, linux-kernel, linux-hyperv,
	netdev, kvm, kernel

On 27.07.2022 15:24, Stefano Garzarella wrote:
> On Mon, Jul 25, 2022 at 08:05:28AM +0000, Arseniy Krasnov wrote:
>> This adds transport specific callback for SO_RCVLOWAT, because in some
>> transports it may be difficult to know current available number of bytes
>> ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it.
>>
>> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
>> ---
>> include/net/af_vsock.h   |  1 +
>> net/vmw_vsock/af_vsock.c | 19 +++++++++++++++++++
>> 2 files changed, 20 insertions(+)
>>
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index f742e50207fb..eae5874bae35 100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -134,6 +134,7 @@ struct vsock_transport {
>>     u64 (*stream_rcvhiwat)(struct vsock_sock *);
>>     bool (*stream_is_active)(struct vsock_sock *);
>>     bool (*stream_allow)(u32 cid, u32 port);
>> +    int (*set_rcvlowat)(struct vsock_sock *, int);
>>
>>     /* SEQ_PACKET. */
>>     ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index 63a13fa2686a..b7a286db4af1 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -2130,6 +2130,24 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>>     return err;
>> }
>>
>> +static int vsock_set_rcvlowat(struct sock *sk, int val)
>> +{
>> +    const struct vsock_transport *transport;
>> +    struct vsock_sock *vsk;
>> +    int err = 0;
>> +
>> +    vsk = vsock_sk(sk);
>> +    transport = vsk->transport;
> 
> `transport` can be NULL if the user call SO_RCVLOWAT before we assign it, so we should check it.
Ack
> 
> I think if the transport implements `set_rcvlowat`, maybe we should set there sk->sk_rcvlowat, so I would do something like that:
> 
>     if (transport && transport->set_rcvlowat)
>         err = transport->set_rcvlowat(vsk, val);
>     else
>         WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);
> 
>     return err;
> 
> In addition I think we should check that val does not exceed vsk->buffer_size, something similar of what tcp_set_rcvlowat() does.
> 
Ack
> Thanks,
> Stefano
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling
  2022-07-27 12:37 ` [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Stefano Garzarella
@ 2022-07-28  6:08   ` Arseniy Krasnov
  2022-08-02  5:31     ` Vishnu Dasa
  2022-08-02  5:35   ` Vishnu Dasa
  1 sibling, 1 reply; 24+ messages in thread
From: Arseniy Krasnov @ 2022-07-28  6:08 UTC (permalink / raw)
  To: Stefano Garzarella, Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers
  Cc: David S. Miller, edumazet, Jakub Kicinski, Paolo Abeni,
	Stefan Hajnoczi, Michael S. Tsirkin, kys, haiyangz, sthemmin,
	wei.liu, Dexuan Cui, Krasnov Arseniy, virtualization, netdev,
	linux-kernel, linux-hyperv, kvm, kernel

On 27.07.2022 15:37, Stefano Garzarella wrote:
> Hi Arseniy,
> 
> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>> Hello,
>>
>> This patchset includes some updates for SO_RCVLOWAT:
>>
>> 1) af_vsock:
>>   During my experiments with zerocopy receive, i found, that in some
>>   cases, poll() implementation violates POSIX: when socket has non-
>>   default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>   POLLRDNORM bits in 'revents' even number of bytes available to read
>>   on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>   POLLIN flag and then tries to read data(for example using  'read()'
>>   call), but read call will be blocked, because  SO_RCVLOWAT logic is
>>   supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>>   requires that:
>>
>>   "POLLIN     Data other than high-priority data may be read without
>>               blocking.
>>    POLLRDNORM Normal data may be read without blocking."
>>
>>   See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>>
>>   So, we have, that poll() syscall returns POLLIN, but read call will
>>   be blocked.
>>
>>   Also in man page socket(7) i found that:
>>
>>   "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>   socket as readable only if at least SO_RCVLOWAT bytes are available."
>>
>>   I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>   uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>   this case for TCP socket, it works as POSIX required.
>>
>>   I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>   test is also implemented.
>>
>> 2) virtio/vsock:
>>   It adds some optimization to wake ups, when new data arrived. Now,
>>   SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>>   There is no sense, to kick waiter, when number of available bytes
>>   in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>>   this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>>   or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>>   exit from poll() will be "spurious". This logic is also used in TCP
>>   sockets.
> 
> Nice, it looks good!
Thank You!
> 
>>
>> 3) vmci/vsock:
>>   Same as 2), but i'm not sure about this changes. Will be very good,
>>   to get comments from someone who knows this code.
> 
> I CCed VMCI maintainers to the patch and also to this cover, maybe better to keep them in the loop for next versions.
> 
> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only Bryan, Vishnu, and pv-drivers@vmware.com)
Ok, i'll CC them in the next version
> 
>>
>> 4) Hyper-V:
>>   As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>>   support SO_RCVLOWAT, so he suggested to disable this feature for
>>   Hyper-V.
> 
> I left a couple of comments in some patches, but it seems to me to be in a good state :-)
> 
> I would just suggest a bit of a re-organization of the series (the patches are fine, just the order):
>   - introduce vsock_set_rcvlowat()
>   - disabling it for hv_sock
>   - use 'target' in virtio transports
>   - use 'target' in vmci transports
>   - use sock_rcvlowat in vsock_poll()
>     I think is better to pass sock_rcvlowat() as 'target' when the
>     transports are already able to use it
>   - add vsock_data_ready()
>   - use vsock_data_ready() in virtio transports
>   - use vsock_data_ready() in vmci transports
>   - tests
> 
> What do you think?
No problem! I think i can wait for reply from VMWare guys before preparing v3
> 
> Thanks,
> Stefano
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling
  2022-07-28  6:08   ` Arseniy Krasnov
@ 2022-08-02  5:31     ` Vishnu Dasa
  2022-08-02  5:35       ` Arseniy Krasnov
  0 siblings, 1 reply; 24+ messages in thread
From: Vishnu Dasa @ 2022-08-02  5:31 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefano Garzarella, Bryan Tan, Pv-drivers, David S. Miller,
	edumazet, Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi,
	Michael S. Tsirkin, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Krasnov Arseniy, virtualization, netdev, linux-kernel,
	linux-hyperv, kvm, kernel



> On Jul 27, 2022, at 11:08 PM, Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
> 
> On 27.07.2022 15:37, Stefano Garzarella wrote:
>> Hi Arseniy,
>> 
>> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>>> Hello,
>>> 
>>> This patchset includes some updates for SO_RCVLOWAT:
>>> 
>>> 1) af_vsock:
>>> During my experiments with zerocopy receive, i found, that in some
>>> cases, poll() implementation violates POSIX: when socket has non-
>>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>> POLLRDNORM bits in 'revents' even number of bytes available to read
>>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>> POLLIN flag and then tries to read data(for example using 'read()'
>>> call), but read call will be blocked, because SO_RCVLOWAT logic is
>>> supported in dequeue loop in af_vsock.c. But the same time, POSIX
>>> requires that:
>>> 
>>> "POLLIN Data other than high-priority data may be read without
>>> blocking.
>>> POLLRDNORM Normal data may be read without blocking."
>>> 
>>> See https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.open-std.org%2Fjtc1%2Fsc22%2Fopen%2Fn4217.pdf&amp;data=05%7C01%7Cvdasa%40vmware.com%7Cae83621d8709421de14b08da705faa9c%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637945853473740235%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=NrbycCcVXV9Tz8NRDYBpnDx7KpFF6BZpSRbuhz1IfJ4%3D&amp;reserved=0, page 293.
>>> 
>>> So, we have, that poll() syscall returns POLLIN, but read call will
>>> be blocked.
>>> 
>>> Also in man page socket(7) i found that:
>>> 
>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>> 
>>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>> this case for TCP socket, it works as POSIX required.
>>> 
>>> I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>> test is also implemented.
>>> 
>>> 2) virtio/vsock:
>>> It adds some optimization to wake ups, when new data arrived. Now,
>>> SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>>> There is no sense, to kick waiter, when number of available bytes
>>> in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>>> this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>>> or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>>> exit from poll() will be "spurious". This logic is also used in TCP
>>> sockets.
>> 
>> Nice, it looks good!
> Thank You!
>> 
>>> 
>>> 3) vmci/vsock:
>>> Same as 2), but i'm not sure about this changes. Will be very good,
>>> to get comments from someone who knows this code.
>> 
>> I CCed VMCI maintainers to the patch and also to this cover, maybe better to keep them in the loop for next versions.
>> 
>> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only Bryan, Vishnu, and pv-drivers@vmware.com)
> Ok, i'll CC them in the next version
>> 
>>> 
>>> 4) Hyper-V:
>>> As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>>> support SO_RCVLOWAT, so he suggested to disable this feature for
>>> Hyper-V.
>> 
>> I left a couple of comments in some patches, but it seems to me to be in a good state :-)
>> 
>> I would just suggest a bit of a re-organization of the series (the patches are fine, just the order):
>> - introduce vsock_set_rcvlowat()
>> - disabling it for hv_sock
>> - use 'target' in virtio transports
>> - use 'target' in vmci transports
>> - use sock_rcvlowat in vsock_poll()
>> I think is better to pass sock_rcvlowat() as 'target' when the
>> transports are already able to use it
>> - add vsock_data_ready()
>> - use vsock_data_ready() in virtio transports
>> - use vsock_data_ready() in vmci transports
>> - tests
>> 
>> What do you think?
> No problem! I think i can wait for reply from VMWare guys before preparing v3

Looks fine to me, especially the VMCI parts.  Please send v3, and we can test it
from VMCI point of view as well.

Thanks,
Vishnu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling
  2022-07-27 12:37 ` [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Stefano Garzarella
  2022-07-28  6:08   ` Arseniy Krasnov
@ 2022-08-02  5:35   ` Vishnu Dasa
  2022-08-02  8:04     ` Stefano Garzarella
  1 sibling, 1 reply; 24+ messages in thread
From: Vishnu Dasa @ 2022-08-02  5:35 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Arseniy Krasnov, Bryan Tan, Pv-drivers, David S. Miller,
	edumazet, Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi,
	Michael S. Tsirkin, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Krasnov Arseniy, virtualization, netdev, linux-kernel,
	linux-hyperv, kvm, kernel


> On Jul 27, 2022, at 5:37 AM, Stefano Garzarella <sgarzare@redhat.com> wrote:
> 
> Hi Arseniy,
> 
> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>> Hello,
>> 
>> This patchset includes some updates for SO_RCVLOWAT:
>> 
>> 1) af_vsock:
>>  During my experiments with zerocopy receive, i found, that in some
>>  cases, poll() implementation violates POSIX: when socket has non-
>>  default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>  POLLRDNORM bits in 'revents' even number of bytes available to read
>>  on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>  POLLIN flag and then tries to read data(for example using  'read()'
>>  call), but read call will be blocked, because  SO_RCVLOWAT logic is
>>  supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>>  requires that:
>> 
>>  "POLLIN     Data other than high-priority data may be read without
>>              blocking.
>>   POLLRDNORM Normal data may be read without blocking."
>> 
>>  See https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.open-std.org%2Fjtc1%2Fsc22%2Fopen%2Fn4217.pdf&amp;data=05%7C01%7Cvdasa%40vmware.com%7C5ad2c6759fd8439e938708da6fccbee4%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637945222450930014%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=60hG3DiYufOCv1DuufSdujiLEKDNou1Ztyah3GPbOLI%3D&amp;reserved=0, page 293.
>> 
>>  So, we have, that poll() syscall returns POLLIN, but read call will
>>  be blocked.
>> 
>>  Also in man page socket(7) i found that:
>> 
>>  "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>  socket as readable only if at least SO_RCVLOWAT bytes are available."
>> 
>>  I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>  uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>  this case for TCP socket, it works as POSIX required.
>> 
>>  I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>  test is also implemented.
>> 
>> 2) virtio/vsock:
>>  It adds some optimization to wake ups, when new data arrived. Now,
>>  SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>>  There is no sense, to kick waiter, when number of available bytes
>>  in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>>  this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>>  or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>>  exit from poll() will be "spurious". This logic is also used in TCP
>>  sockets.
> 
> Nice, it looks good!
> 
>> 
>> 3) vmci/vsock:
>>  Same as 2), but i'm not sure about this changes. Will be very good,
>>  to get comments from someone who knows this code.
> 
> I CCed VMCI maintainers to the patch and also to this cover, maybe
> better to keep them in the loop for next versions.
> 
> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only
> Bryan, Vishnu, and pv-drivers@vmware.com)

Hi Stefano,
Jorgen and Rajesh are no longer with VMware.  There's a patch in
flight to remove Rajesh from the MAINTAINERS file (Jorgen is already
removed).

Thanks,
Vishnu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling
  2022-08-02  5:31     ` Vishnu Dasa
@ 2022-08-02  5:35       ` Arseniy Krasnov
  0 siblings, 0 replies; 24+ messages in thread
From: Arseniy Krasnov @ 2022-08-02  5:35 UTC (permalink / raw)
  To: Vishnu Dasa
  Cc: Stefano Garzarella, Bryan Tan, Pv-drivers, David S. Miller,
	edumazet, Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi,
	Michael S. Tsirkin, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Krasnov Arseniy, virtualization, netdev, linux-kernel,
	linux-hyperv, kvm, kernel

On 02.08.2022 08:31, Vishnu Dasa wrote:
> 
> 
>> On Jul 27, 2022, at 11:08 PM, Arseniy Krasnov <AVKrasnov@sberdevices.ru> wrote:
>>
>> On 27.07.2022 15:37, Stefano Garzarella wrote:
>>> Hi Arseniy,
>>>
>>> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>>>> Hello,
>>>>
>>>> This patchset includes some updates for SO_RCVLOWAT:
>>>>
>>>> 1) af_vsock:
>>>> During my experiments with zerocopy receive, i found, that in some
>>>> cases, poll() implementation violates POSIX: when socket has non-
>>>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>>> POLLRDNORM bits in 'revents' even number of bytes available to read
>>>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>>> POLLIN flag and then tries to read data(for example using 'read()'
>>>> call), but read call will be blocked, because SO_RCVLOWAT logic is
>>>> supported in dequeue loop in af_vsock.c. But the same time, POSIX
>>>> requires that:
>>>>
>>>> "POLLIN Data other than high-priority data may be read without
>>>> blocking.
>>>> POLLRDNORM Normal data may be read without blocking."
>>>>
>>>> See https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.open-std.org%2Fjtc1%2Fsc22%2Fopen%2Fn4217.pdf&amp;data=05%7C01%7Cvdasa%40vmware.com%7Cae83621d8709421de14b08da705faa9c%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637945853473740235%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=NrbycCcVXV9Tz8NRDYBpnDx7KpFF6BZpSRbuhz1IfJ4%3D&amp;reserved=0, page 293.
>>>>
>>>> So, we have, that poll() syscall returns POLLIN, but read call will
>>>> be blocked.
>>>>
>>>> Also in man page socket(7) i found that:
>>>>
>>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>>>
>>>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>>> this case for TCP socket, it works as POSIX required.
>>>>
>>>> I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>>> test is also implemented.
>>>>
>>>> 2) virtio/vsock:
>>>> It adds some optimization to wake ups, when new data arrived. Now,
>>>> SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>>>> There is no sense, to kick waiter, when number of available bytes
>>>> in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>>>> this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>>>> or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>>>> exit from poll() will be "spurious". This logic is also used in TCP
>>>> sockets.
>>>
>>> Nice, it looks good!
>> Thank You!
>>>
>>>>
>>>> 3) vmci/vsock:
>>>> Same as 2), but i'm not sure about this changes. Will be very good,
>>>> to get comments from someone who knows this code.
>>>
>>> I CCed VMCI maintainers to the patch and also to this cover, maybe better to keep them in the loop for next versions.
>>>
>>> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only Bryan, Vishnu, and pv-drivers@vmware.com)
>> Ok, i'll CC them in the next version
>>>
>>>>
>>>> 4) Hyper-V:
>>>> As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>>>> support SO_RCVLOWAT, so he suggested to disable this feature for
>>>> Hyper-V.
>>>
>>> I left a couple of comments in some patches, but it seems to me to be in a good state :-)
>>>
>>> I would just suggest a bit of a re-organization of the series (the patches are fine, just the order):
>>> - introduce vsock_set_rcvlowat()
>>> - disabling it for hv_sock
>>> - use 'target' in virtio transports
>>> - use 'target' in vmci transports
>>> - use sock_rcvlowat in vsock_poll()
>>> I think is better to pass sock_rcvlowat() as 'target' when the
>>> transports are already able to use it
>>> - add vsock_data_ready()
>>> - use vsock_data_ready() in virtio transports
>>> - use vsock_data_ready() in vmci transports
>>> - tests
>>>
>>> What do you think?
>> No problem! I think i can wait for reply from VMWare guys before preparing v3
> 
> Looks fine to me, especially the VMCI parts.  Please send v3, and we can test it
> from VMCI point of view as well.
Great, thank you for reply. I'll prepare v3 ASAP and You will be CCed

Thanks,
Arseniy
> 
> Thanks,
> Vishnu


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support
  2022-07-25  8:07 ` [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support Arseniy Krasnov
@ 2022-08-02  5:42   ` Dexuan Cui
  0 siblings, 0 replies; 24+ messages in thread
From: Dexuan Cui @ 2022-08-02  5:42 UTC (permalink / raw)
  To: Arseniy Krasnov, Stefano Garzarella, David S. Miller, edumazet,
	Jakub Kicinski, Paolo Abeni, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, wei.liu, Stefan Hajnoczi, Krasnov Arseniy
  Cc: virtualization, linux-kernel, linux-hyperv, netdev, kvm, kernel

> From: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
> Sent: Monday, July 25, 2022 1:07 AM
> ...
> Subject: [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support
> 
> For Hyper-V it is quiet difficult to support this socket option,due to
> transport internals, so disable it.
> 
> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>

Thanks, Arseniy! This looks good to me.

Reviewed-by: Dexuan Cui <decui@microsoft.com>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling
  2022-08-02  5:35   ` Vishnu Dasa
@ 2022-08-02  8:04     ` Stefano Garzarella
  0 siblings, 0 replies; 24+ messages in thread
From: Stefano Garzarella @ 2022-08-02  8:04 UTC (permalink / raw)
  To: Vishnu Dasa
  Cc: Arseniy Krasnov, Bryan Tan, Pv-drivers, David S. Miller,
	edumazet, Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi,
	Michael S. Tsirkin, kys, haiyangz, sthemmin, wei.liu, Dexuan Cui,
	Krasnov Arseniy, virtualization, netdev, linux-kernel,
	linux-hyperv, kvm, kernel

Hi Vishnu,

On Tue, Aug 02, 2022 at 05:35:22AM +0000, Vishnu Dasa wrote:
>> On Jul 27, 2022, at 5:37 AM, Stefano Garzarella <sgarzare@redhat.com> 
>> wrote:
>> Hi Arseniy,
>>
>> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:

[...]

>>>
>>> 3) vmci/vsock:
>>>  Same as 2), but i'm not sure about this changes. Will be very good,
>>>  to get comments from someone who knows this code.
>>
>> I CCed VMCI maintainers to the patch and also to this cover, maybe
>> better to keep them in the loop for next versions.
>>
>> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only
>> Bryan, Vishnu, and pv-drivers@vmware.com)
>
>Hi Stefano,
>Jorgen and Rajesh are no longer with VMware.  There's a patch in
>flight to remove Rajesh from the MAINTAINERS file (Jorgen is already
>removed).

Thanks for the update! I will contact you and Bryan for any questions 
with VMCI in the future :-)

Stefano


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-08-02  8:04 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-25  7:54 [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
2022-07-25  7:56 ` [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM Arseniy Krasnov
2022-07-27 12:17   ` Stefano Garzarella
2022-07-28  6:04     ` Arseniy Krasnov
2022-07-25  7:59 ` [RFC PATCH v2 2/9] virtio/vsock: use 'target' in notify_poll_in, callback Arseniy Krasnov
2022-07-25  8:01 ` [RFC PATCH v2 3/9] vmci/vsock: " Arseniy Krasnov
2022-07-27 12:22   ` Stefano Garzarella
2022-07-27 12:29     ` Stefano Garzarella
2022-07-28  6:05     ` Arseniy Krasnov
2022-07-25  8:03 ` [RFC PATCH v2 4/9] vsock_test: POLLIN + SO_RCVLOWAT test Arseniy Krasnov
2022-07-25  8:05 ` [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback Arseniy Krasnov
2022-07-27 12:24   ` Stefano Garzarella
2022-07-28  6:06     ` Arseniy Krasnov
2022-07-25  8:07 ` [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support Arseniy Krasnov
2022-08-02  5:42   ` Dexuan Cui
2022-07-25  8:09 ` [RFC PATCH v2 7/9] vsock: add API call for data ready Arseniy Krasnov
2022-07-25  8:11 ` [RFC PATCH v2 8/9] virtio/vsock: check SO_RCVLOWAT before wake up reader Arseniy Krasnov
2022-07-25  8:13 ` [RFC PATCH v2 9/9] vmci/vsock: " Arseniy Krasnov
2022-07-27 12:37 ` [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling Stefano Garzarella
2022-07-28  6:08   ` Arseniy Krasnov
2022-08-02  5:31     ` Vishnu Dasa
2022-08-02  5:35       ` Arseniy Krasnov
2022-08-02  5:35   ` Vishnu Dasa
2022-08-02  8:04     ` Stefano Garzarella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).