All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefano Garzarella <sgarzare@redhat.com>
To: Bobby Eshleman <bobbyeshleman@gmail.com>
Cc: Vishnu Dasa <vdasa@vmware.com>, Wei Liu <wei.liu@kernel.org>,
	Bobby Eshleman <bobby.eshleman@bytedance.com>,
	kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
	VMware PV-Drivers Reviewers <pv-drivers@vmware.com>,
	netdev@vger.kernel.org, Haiyang Zhang <haiyangz@microsoft.com>,
	Dexuan Cui <decui@microsoft.com>,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Bryan Tan <bryantan@vmware.com>,
	Eric Dumazet <edumazet@google.com>,
	linux-hyperv@vger.kernel.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support
Date: Wed, 3 May 2023 14:09:38 +0200	[thread overview]
Message-ID: <lc2v5porgyzx6neimlyrpeg3d5l7trnorbs7xubqgcrlp7bbi7@yxs25wx67tm7> (raw)
In-Reply-To: <ZDre6Mqh9+HA8wuN@bullseye>

On Sat, Apr 15, 2023 at 05:29:12PM +0000, Bobby Eshleman wrote:
>On Fri, Apr 28, 2023 at 12:29:50PM +0200, Stefano Garzarella wrote:
>> On Sat, Apr 15, 2023 at 10:30:55AM +0000, Bobby Eshleman wrote:
>> > On Wed, Apr 19, 2023 at 11:30:53AM +0200, Stefano Garzarella wrote:
>> > > On Fri, Apr 14, 2023 at 12:25:59AM +0000, Bobby Eshleman wrote:
>> > > > Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock
>> > > > it does not scale when many senders share a socket.
>> > > >
>> > > > Prior to this patch the socket lock is used to protect the local_addr,
>> > > > remote_addr, transport, and buffer size variables. What follows are the
>> > > > new protection schemes for the various protected fields that ensure a
>> > > > race-free multi-sender sendmsg() path for vsock dgrams.
>> > > >
>> > > > - local_addr
>> > > >    local_addr changes as a result of binding a socket. The write path
>> > > >    for local_addr is bind() and various vsock_auto_bind() call sites.
>> > > >    After a socket has been bound via vsock_auto_bind() or bind(), subsequent
>> > > >    calls to bind()/vsock_auto_bind() do not write to local_addr again. bind()
>> > > >    rejects the user request and vsock_auto_bind() early exits.
>> > > >    Therefore, the local addr can not change while a parallel thread is
>> > > >    in sendmsg() and lock-free reads of local addr in sendmsg() are safe.
>> > > >    Change: only acquire lock for auto-binding as-needed in sendmsg().
>> > > >
>> > > > - vsk->transport
>> > > >    Updated upon socket creation and it doesn't change again until the
>> > >
>> > > This is true only for dgram, right?
>> > >
>> >
>> > Yes.
>> >
>> > > How do we decide which transport to assign for dgram?
>> > >
>> >
>> > The transport is assigned in proto->create() [vsock_create()]. It is
>> > assigned there *only* for dgrams, whereas for streams/seqpackets it is
>> > assigned in connect(). vsock_create() sets transport to
>> > 'transport_dgram' if sock->type == SOCK_DGRAM.
>> >
>> > vsock_sk_destruct() then eventually sets vsk->transport to NULL.
>> >
>> > Neither destruct nor create can occur in parallel with sendmsg().
>> > create() hasn't yet returned the sockfd for the user to call upon it
>> > sendmsg(), and AFAICT destruct is only called after the final socket
>> > reference is released, which only happens after the socket no longer
>> > exists in the fd lookup table and so sendmsg() will fail before it ever
>> > has the chance to race.
>>
>> This is okay if a socket can be assigned to a single transport, but with
>> dgrams I'm not sure we can do that.
>>
>> Since it is not connected, a socket can send or receive packets from
>> different transports, so maybe we can't assign it to a specific one,
>> but do a lookup for every packets to understand which transport to use.
>>
>
>Yes this is true, this lookup needs to be implemented... currently
>sendmsg() doesn't do this at all. It grabs the remote_addr when passed
>in from sendto(), but then just uses the same old transport from vsk.
>You are right that sendto() should be a per-packet lookup, not a
>vsk->transport read. Perhaps I should add that as another patch in this
>series, and make it precede this one?

Yep, I think so, we need to implement it before adding a new transport
that can support dgram.

>
>For the send() / sendto(NULL) case where vsk->transport is being read, I
>do believe this is still race-free, but...
>
>If we later support dynamic transports for datagram, such that
>connect(VMADDR_LOCAL_CID) sets vsk->transport to transport_loopback,
>connect(VMADDR_CID_HOST) sets vsk->transport to something like
>transport_datagram_g2h, etc..., then vsk->transport will need to be
>bundled into the RCU-protected pointer too, since it may change when
>remote_addr changes.

Yep, I think so. Although in vsock_dgram_connect we call lock_sock(), so 
maybe that could be enough to protect us.

In general I think we should use vsk->transport if vsock_dgram_connect()
is called, or we need to do per-packet lookup.

Another think I would change, is the dgram_dequeue() callback.
I would remove it, and move in the core the code in 
vmci_transport_dgram_dequeue() since it seems pretty generic.

This should work well if every transport uses sk_receive_skb() to 
enqueue sk_buffs in the socket buffer.

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

WARNING: multiple messages have this Message-ID (diff)
From: Stefano Garzarella <sgarzare@redhat.com>
To: Bobby Eshleman <bobbyeshleman@gmail.com>
Cc: Vishnu Dasa <vdasa@vmware.com>, Wei Liu <wei.liu@kernel.org>,
	Bobby Eshleman <bobby.eshleman@bytedance.com>,
	kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
	VMware PV-Drivers Reviewers <pv-drivers@vmware.com>,
	Dexuan Cui <decui@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Bryan Tan <bryantan@vmware.com>,
	Eric Dumazet <edumazet@google.com>,
	linux-hyperv@vger.kernel.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support
Date: Wed, 3 May 2023 14:09:38 +0200	[thread overview]
Message-ID: <lc2v5porgyzx6neimlyrpeg3d5l7trnorbs7xubqgcrlp7bbi7@yxs25wx67tm7> (raw)
In-Reply-To: <ZDre6Mqh9+HA8wuN@bullseye>

On Sat, Apr 15, 2023 at 05:29:12PM +0000, Bobby Eshleman wrote:
>On Fri, Apr 28, 2023 at 12:29:50PM +0200, Stefano Garzarella wrote:
>> On Sat, Apr 15, 2023 at 10:30:55AM +0000, Bobby Eshleman wrote:
>> > On Wed, Apr 19, 2023 at 11:30:53AM +0200, Stefano Garzarella wrote:
>> > > On Fri, Apr 14, 2023 at 12:25:59AM +0000, Bobby Eshleman wrote:
>> > > > Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock
>> > > > it does not scale when many senders share a socket.
>> > > >
>> > > > Prior to this patch the socket lock is used to protect the local_addr,
>> > > > remote_addr, transport, and buffer size variables. What follows are the
>> > > > new protection schemes for the various protected fields that ensure a
>> > > > race-free multi-sender sendmsg() path for vsock dgrams.
>> > > >
>> > > > - local_addr
>> > > >    local_addr changes as a result of binding a socket. The write path
>> > > >    for local_addr is bind() and various vsock_auto_bind() call sites.
>> > > >    After a socket has been bound via vsock_auto_bind() or bind(), subsequent
>> > > >    calls to bind()/vsock_auto_bind() do not write to local_addr again. bind()
>> > > >    rejects the user request and vsock_auto_bind() early exits.
>> > > >    Therefore, the local addr can not change while a parallel thread is
>> > > >    in sendmsg() and lock-free reads of local addr in sendmsg() are safe.
>> > > >    Change: only acquire lock for auto-binding as-needed in sendmsg().
>> > > >
>> > > > - vsk->transport
>> > > >    Updated upon socket creation and it doesn't change again until the
>> > >
>> > > This is true only for dgram, right?
>> > >
>> >
>> > Yes.
>> >
>> > > How do we decide which transport to assign for dgram?
>> > >
>> >
>> > The transport is assigned in proto->create() [vsock_create()]. It is
>> > assigned there *only* for dgrams, whereas for streams/seqpackets it is
>> > assigned in connect(). vsock_create() sets transport to
>> > 'transport_dgram' if sock->type == SOCK_DGRAM.
>> >
>> > vsock_sk_destruct() then eventually sets vsk->transport to NULL.
>> >
>> > Neither destruct nor create can occur in parallel with sendmsg().
>> > create() hasn't yet returned the sockfd for the user to call upon it
>> > sendmsg(), and AFAICT destruct is only called after the final socket
>> > reference is released, which only happens after the socket no longer
>> > exists in the fd lookup table and so sendmsg() will fail before it ever
>> > has the chance to race.
>>
>> This is okay if a socket can be assigned to a single transport, but with
>> dgrams I'm not sure we can do that.
>>
>> Since it is not connected, a socket can send or receive packets from
>> different transports, so maybe we can't assign it to a specific one,
>> but do a lookup for every packets to understand which transport to use.
>>
>
>Yes this is true, this lookup needs to be implemented... currently
>sendmsg() doesn't do this at all. It grabs the remote_addr when passed
>in from sendto(), but then just uses the same old transport from vsk.
>You are right that sendto() should be a per-packet lookup, not a
>vsk->transport read. Perhaps I should add that as another patch in this
>series, and make it precede this one?

Yep, I think so, we need to implement it before adding a new transport
that can support dgram.

>
>For the send() / sendto(NULL) case where vsk->transport is being read, I
>do believe this is still race-free, but...
>
>If we later support dynamic transports for datagram, such that
>connect(VMADDR_LOCAL_CID) sets vsk->transport to transport_loopback,
>connect(VMADDR_CID_HOST) sets vsk->transport to something like
>transport_datagram_g2h, etc..., then vsk->transport will need to be
>bundled into the RCU-protected pointer too, since it may change when
>remote_addr changes.

Yep, I think so. Although in vsock_dgram_connect we call lock_sock(), so 
maybe that could be enough to protect us.

In general I think we should use vsk->transport if vsock_dgram_connect()
is called, or we need to do per-packet lookup.

Another think I would change, is the dgram_dequeue() callback.
I would remove it, and move in the core the code in 
vmci_transport_dgram_dequeue() since it seems pretty generic.

This should work well if every transport uses sk_receive_skb() to 
enqueue sk_buffs in the socket buffer.

Thanks,
Stefano


  reply	other threads:[~2023-05-03 12:09 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14  0:25 [PATCH RFC net-next v2 0/4] virtio/vsock: support datagrams Bobby Eshleman
2023-04-14  0:25 ` [PATCH RFC net-next v2 1/4] virtio/vsock: support dgram Bobby Eshleman
2023-04-19  9:29   ` Stefano Garzarella
2023-04-19  9:29     ` Stefano Garzarella
2023-04-14  0:25 ` [PATCH RFC net-next v2 2/4] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Bobby Eshleman
2023-04-14  8:47   ` Alvaro Karsz
2023-04-14  8:47     ` Alvaro Karsz
2023-04-14 10:28     ` Bobby Eshleman
2023-04-19  9:30   ` Stefano Garzarella
2023-04-19  9:30     ` Stefano Garzarella
2023-04-15  9:51     ` Bobby Eshleman
2023-04-14  0:25 ` [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support Bobby Eshleman
2023-04-14  2:09   ` kernel test robot
2023-04-14  3:32   ` kernel test robot
2023-04-15  6:13   ` kernel test robot
2023-04-19  9:30   ` Stefano Garzarella
2023-04-19  9:30     ` Stefano Garzarella
2023-04-15 10:30     ` Bobby Eshleman
2023-04-28 10:29       ` Stefano Garzarella
2023-04-28 10:29         ` Stefano Garzarella
2023-04-15 17:29         ` Bobby Eshleman
2023-05-03 12:09           ` Stefano Garzarella [this message]
2023-05-03 12:09             ` Stefano Garzarella
2023-04-14  0:26 ` [PATCH RFC net-next v2 4/4] tests: add vsock dgram tests Bobby Eshleman
2023-04-14 11:18 ` [PATCH RFC net-next v2 0/4] virtio/vsock: support datagrams Bobby Eshleman
2023-04-19 10:00   ` Stefano Garzarella
2023-04-19 10:00     ` Stefano Garzarella
2023-04-15  7:13     ` Bobby Eshleman
2023-04-15  7:13       ` [virtio-dev] " Bobby Eshleman
2023-04-28 10:43       ` Stefano Garzarella
2023-04-28 10:43         ` Stefano Garzarella
2023-04-28 10:43         ` [virtio-dev] " Stefano Garzarella
2023-04-15 15:55         ` Bobby Eshleman
2023-04-15 15:55           ` [virtio-dev] " Bobby Eshleman
2023-05-03 12:13           ` Stefano Garzarella
2023-05-03 12:13             ` [virtio-dev] " Stefano Garzarella
2023-05-03 12:13             ` Stefano Garzarella
2023-04-14  5:16 [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=lc2v5porgyzx6neimlyrpeg3d5l7trnorbs7xubqgcrlp7bbi7@yxs25wx67tm7 \
    --to=sgarzare@redhat.com \
    --cc=bobby.eshleman@bytedance.com \
    --cc=bobbyeshleman@gmail.com \
    --cc=bryantan@vmware.com \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=edumazet@google.com \
    --cc=haiyangz@microsoft.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pv-drivers@vmware.com \
    --cc=stefanha@redhat.com \
    --cc=vdasa@vmware.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.