From: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Krasnov Arseniy <oxffffaa@gmail.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	kernel <kernel@sberdevices.ru>
Subject: Re: [RFC PATCH v1 0/3] virtio/vsock: use SO_RCVLOWAT to set POLLIN/POLLRDNORM
Date: Wed, 20 Jul 2022 10:52:25 +0000
Message-ID: <3e954621-4496-17be-4b73-d0971372b8c5@sberdevices.ru>
In-Reply-To: <20220720093005.2unej4jnnvrn55f2@sgarzare-redhat>

On 20.07.2022 12:30, Stefano Garzarella wrote:
> On Wed, Jul 20, 2022 at 06:07:47AM +0000, Arseniy Krasnov wrote:
>> On 19.07.2022 15:58, Stefano Garzarella wrote:
>>> On Mon, Jul 18, 2022 at 08:12:52AM +0000, Arseniy Krasnov wrote:
>>>> Hello,
>>>>
>>>> during my experiments with zerocopy receive, I found that in some
>>>> cases the poll() implementation violates POSIX: when a socket has a
>>>> non-default SO_RCVLOWAT (i.e. not 1), poll() always sets the POLLIN
>>>> and POLLRDNORM bits in 'revents', even if the number of bytes
>>>> available to read on the socket is smaller than the SO_RCVLOWAT
>>>> value. In this case the user sees the POLLIN flag and tries to read
>>>> data (for example with a read() call), but the read blocks, because
>>>> the SO_RCVLOWAT logic is implemented in the dequeue loop in
>>>> af_vsock.c. At the same time, POSIX requires that:
>>>>
>>>> "POLLIN     Data other than high-priority data may be read without
>>>>            blocking.
>>>> POLLRDNORM Normal data may be read without blocking."
>>>>
>>>> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>>>>
>>>> So poll() returns POLLIN, but the subsequent read call still
>>>> blocks.
>>>>
>>>> Also, in the socket(7) man page I found that:
>>>>
>>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>>>
>>>> I checked the TCP callback for poll() (tcp_poll() in
>>>> net/ipv4/tcp.c): it uses the SO_RCVLOWAT value to set the POLLIN
>>>> bit. I also tested this case with a TCP socket, and it works as
>>>> POSIX requires.
>>>
>>> I tried to look at the code and it seems that only TCP complies with it or am I wrong?
>> Yes, I checked AF_UNIX, and it also doesn't care about that. It calls skb_queue_empty(), which of
>> course ignores SO_RCVLOWAT.
>>>
>>>>
>>>> I've added some fixes to af_vsock.c and virtio_transport_common.c;
>>>> a test is also implemented.
>>>>
>>>> What do you think, guys?
>>>
>>> Nice, thanks for fixing this and for the test!
>>>
>>> I left some comments, but I think the series is fine if we support it in all transports.
>> Ack
>>>
>>> I'd just like to understand if it's just TCP complying with it or I'm missing some check included in the socket layer that we could reuse.
>> It seems that sock_poll(), the socket layer entry point for poll(), doesn't contain any such checks.
>>>
>>> @David, @Jakub, @Paolo, any advice?
>>>
>>> Thanks,
>>> Stefano
>>>
>>
>> PS: moreover, I found one more interesting thing with TCP and poll: the TCP receive logic wakes up the poll
>> waiter only when the number of available bytes is at least SO_RCVLOWAT. I.e. it prevents "spurious" wake-ups,
>> where poll is woken up because new data arrived, but POLLIN, which lets the user dequeue this data, won't be
>> set (as the amount of data is too small).
>> See tcp_data_ready() in net/ipv4/tcp_input.c
> 
> Do you mean that we should call sk->sk_data_ready(sk) checking SO_RCVLOWAT?
Yes, like tcp_data_ready().
> 
> It seems fine, maybe we can add vsock_data_ready() in af_vsock.c that transports should call instead of calling sk->sk_data_ready(sk) directly.
Yes, this will also update the logic in the vmci and hyperv transports.
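
For illustration only, here is a rough user-space model of the gating
such a helper could apply before waking the waiter. struct model_sock,
its fields, model_wake_waiter() and model_data_ready() are hypothetical
names invented for this sketch, not kernel code; a real helper would
work on struct sock and ask the transport how many bytes are readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket state a real helper would read. */
struct model_sock {
	long avail;     /* bytes currently queued for reading */
	int rcvlowat;   /* SO_RCVLOWAT value, assumed to be at least 1 */
	bool done;      /* stream finished, no more data will arrive */
	int wakeups;    /* how many times the waiter has been woken */
};

/* Stands in for sk->sk_data_ready(sk): only records the wake-up. */
void model_wake_waiter(struct model_sock *sk)
{
	sk->wakeups++;
}

/*
 * Wake the poll() waiter only if a read could make progress: either
 * enough bytes are queued to satisfy SO_RCVLOWAT, or the stream is
 * finished and the reader has to be told about it.
 */
void model_data_ready(struct model_sock *sk)
{
	assert(sk != NULL);

	if (sk->avail >= sk->rcvlowat || sk->done)
		model_wake_waiter(sk);
}
```

With rcvlowat set to 128, a call with 10 queued bytes leaves the waiter
asleep, while a call with 128 queued bytes (or with done set) wakes it.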
> 
> Then we can do something similar to tcp_data_ready().
> 
> Thanks,
> Stefano
> 

