From: Stefano Garzarella <sgarzare@redhat.com> To: Bobby Eshleman <bobbyeshleman@gmail.com> Cc: Vishnu Dasa <vdasa@vmware.com>, Wei Liu <wei.liu@kernel.org>, Bobby Eshleman <bobby.eshleman@bytedance.com>, kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>, VMware PV-Drivers Reviewers <pv-drivers@vmware.com>, netdev@vger.kernel.org, Haiyang Zhang <haiyangz@microsoft.com>, Dexuan Cui <decui@microsoft.com>, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, Bryan Tan <bryantan@vmware.com>, Eric Dumazet <edumazet@google.com>, linux-hyperv@vger.kernel.org, Stefan Hajnoczi <stefanha@redhat.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, "David S. Miller" <davem@davemloft.net> Subject: Re: [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support Date: Wed, 3 May 2023 14:09:38 +0200 [thread overview] Message-ID: <lc2v5porgyzx6neimlyrpeg3d5l7trnorbs7xubqgcrlp7bbi7@yxs25wx67tm7> (raw) In-Reply-To: <ZDre6Mqh9+HA8wuN@bullseye> On Sat, Apr 15, 2023 at 05:29:12PM +0000, Bobby Eshleman wrote: >On Fri, Apr 28, 2023 at 12:29:50PM +0200, Stefano Garzarella wrote: >> On Sat, Apr 15, 2023 at 10:30:55AM +0000, Bobby Eshleman wrote: >> > On Wed, Apr 19, 2023 at 11:30:53AM +0200, Stefano Garzarella wrote: >> > > On Fri, Apr 14, 2023 at 12:25:59AM +0000, Bobby Eshleman wrote: >> > > > Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock >> > > > it does not scale when many senders share a socket. >> > > > >> > > > Prior to this patch the socket lock is used to protect the local_addr, >> > > > remote_addr, transport, and buffer size variables. What follows are the >> > > > new protection schemes for the various protected fields that ensure a >> > > > race-free multi-sender sendmsg() path for vsock dgrams. >> > > > >> > > > - local_addr >> > > > local_addr changes as a result of binding a socket. The write path >> > > > for local_addr is bind() and various vsock_auto_bind() call sites. >> > > > After a socket has been bound via vsock_auto_bind() or bind(), subsequent >> > > > calls to bind()/vsock_auto_bind() do not write to local_addr again. bind() >> > > > rejects the user request and vsock_auto_bind() early exits. >> > > > Therefore, the local addr can not change while a parallel thread is >> > > > in sendmsg() and lock-free reads of local addr in sendmsg() are safe. >> > > > Change: only acquire lock for auto-binding as-needed in sendmsg(). >> > > > >> > > > - vsk->transport >> > > > Updated upon socket creation and it doesn't change again until the >> > > >> > > This is true only for dgram, right? >> > > >> > >> > Yes. >> > >> > > How do we decide which transport to assign for dgram? >> > > >> > >> > The transport is assigned in proto->create() [vsock_create()]. It is >> > assigned there *only* for dgrams, whereas for streams/seqpackets it is >> > assigned in connect(). vsock_create() sets transport to >> > 'transport_dgram' if sock->type == SOCK_DGRAM. >> > >> > vsock_sk_destruct() then eventually sets vsk->transport to NULL. >> > >> > Neither destruct nor create can occur in parallel with sendmsg(). >> > create() hasn't yet returned the sockfd for the user to call upon it >> > sendmsg(), and AFAICT destruct is only called after the final socket >> > reference is released, which only happens after the socket no longer >> > exists in the fd lookup table and so sendmsg() will fail before it ever >> > has the chance to race. >> >> This is okay if a socket can be assigned to a single transport, but with >> dgrams I'm not sure we can do that. >> >> Since it is not connected, a socket can send or receive packets from >> different transports, so maybe we can't assign it to a specific one, >> but do a lookup for every packets to understand which transport to use. >> > >Yes this is true, this lookup needs to be implemented... currently >sendmsg() doesn't do this at all. It grabs the remote_addr when passed >in from sendto(), but then just uses the same old transport from vsk. >You are right that sendto() should be a per-packet lookup, not a >vsk->transport read. Perhaps I should add that as another patch in this >series, and make it precede this one? Yep, I think so, we need to implement it before adding a new transport that can support dgram. > >For the send() / sendto(NULL) case where vsk->transport is being read, I >do believe this is still race-free, but... > >If we later support dynamic transports for datagram, such that >connect(VMADDR_LOCAL_CID) sets vsk->transport to transport_loopback, >connect(VMADDR_CID_HOST) sets vsk->transport to something like >transport_datagram_g2h, etc..., then vsk->transport will need to be >bundled into the RCU-protected pointer too, since it may change when >remote_addr changes. Yep, I think so. Although in vsock_dgram_connect we call lock_sock(), so maybe that could be enough to protect us. In general I think we should use vsk->transport if vsock_dgram_connect() is called, or we need to do per-packet lookup. Another think I would change, is the dgram_dequeue() callback. I would remove it, and move in the core the code in vmci_transport_dgram_dequeue() since it seems pretty generic. This should work well if every transport uses sk_receive_skb() to enqueue sk_buffs in the socket buffer. Thanks, Stefano _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
WARNING: multiple messages have this Message-ID (diff)
From: Stefano Garzarella <sgarzare@redhat.com> To: Bobby Eshleman <bobbyeshleman@gmail.com> Cc: Vishnu Dasa <vdasa@vmware.com>, Wei Liu <wei.liu@kernel.org>, Bobby Eshleman <bobby.eshleman@bytedance.com>, kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>, VMware PV-Drivers Reviewers <pv-drivers@vmware.com>, Dexuan Cui <decui@microsoft.com>, Haiyang Zhang <haiyangz@microsoft.com>, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, Bryan Tan <bryantan@vmware.com>, Eric Dumazet <edumazet@google.com>, linux-hyperv@vger.kernel.org, Stefan Hajnoczi <stefanha@redhat.com>, netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, "David S. Miller" <davem@davemloft.net> Subject: Re: [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support Date: Wed, 3 May 2023 14:09:38 +0200 [thread overview] Message-ID: <lc2v5porgyzx6neimlyrpeg3d5l7trnorbs7xubqgcrlp7bbi7@yxs25wx67tm7> (raw) In-Reply-To: <ZDre6Mqh9+HA8wuN@bullseye> On Sat, Apr 15, 2023 at 05:29:12PM +0000, Bobby Eshleman wrote: >On Fri, Apr 28, 2023 at 12:29:50PM +0200, Stefano Garzarella wrote: >> On Sat, Apr 15, 2023 at 10:30:55AM +0000, Bobby Eshleman wrote: >> > On Wed, Apr 19, 2023 at 11:30:53AM +0200, Stefano Garzarella wrote: >> > > On Fri, Apr 14, 2023 at 12:25:59AM +0000, Bobby Eshleman wrote: >> > > > Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock >> > > > it does not scale when many senders share a socket. >> > > > >> > > > Prior to this patch the socket lock is used to protect the local_addr, >> > > > remote_addr, transport, and buffer size variables. What follows are the >> > > > new protection schemes for the various protected fields that ensure a >> > > > race-free multi-sender sendmsg() path for vsock dgrams. >> > > > >> > > > - local_addr >> > > > local_addr changes as a result of binding a socket. The write path >> > > > for local_addr is bind() and various vsock_auto_bind() call sites. >> > > > After a socket has been bound via vsock_auto_bind() or bind(), subsequent >> > > > calls to bind()/vsock_auto_bind() do not write to local_addr again. bind() >> > > > rejects the user request and vsock_auto_bind() early exits. >> > > > Therefore, the local addr can not change while a parallel thread is >> > > > in sendmsg() and lock-free reads of local addr in sendmsg() are safe. >> > > > Change: only acquire lock for auto-binding as-needed in sendmsg(). >> > > > >> > > > - vsk->transport >> > > > Updated upon socket creation and it doesn't change again until the >> > > >> > > This is true only for dgram, right? >> > > >> > >> > Yes. >> > >> > > How do we decide which transport to assign for dgram? >> > > >> > >> > The transport is assigned in proto->create() [vsock_create()]. It is >> > assigned there *only* for dgrams, whereas for streams/seqpackets it is >> > assigned in connect(). vsock_create() sets transport to >> > 'transport_dgram' if sock->type == SOCK_DGRAM. >> > >> > vsock_sk_destruct() then eventually sets vsk->transport to NULL. >> > >> > Neither destruct nor create can occur in parallel with sendmsg(). >> > create() hasn't yet returned the sockfd for the user to call upon it >> > sendmsg(), and AFAICT destruct is only called after the final socket >> > reference is released, which only happens after the socket no longer >> > exists in the fd lookup table and so sendmsg() will fail before it ever >> > has the chance to race. >> >> This is okay if a socket can be assigned to a single transport, but with >> dgrams I'm not sure we can do that. >> >> Since it is not connected, a socket can send or receive packets from >> different transports, so maybe we can't assign it to a specific one, >> but do a lookup for every packets to understand which transport to use. >> > >Yes this is true, this lookup needs to be implemented... currently >sendmsg() doesn't do this at all. It grabs the remote_addr when passed >in from sendto(), but then just uses the same old transport from vsk. >You are right that sendto() should be a per-packet lookup, not a >vsk->transport read. Perhaps I should add that as another patch in this >series, and make it precede this one? Yep, I think so, we need to implement it before adding a new transport that can support dgram. > >For the send() / sendto(NULL) case where vsk->transport is being read, I >do believe this is still race-free, but... > >If we later support dynamic transports for datagram, such that >connect(VMADDR_LOCAL_CID) sets vsk->transport to transport_loopback, >connect(VMADDR_CID_HOST) sets vsk->transport to something like >transport_datagram_g2h, etc..., then vsk->transport will need to be >bundled into the RCU-protected pointer too, since it may change when >remote_addr changes. Yep, I think so. Although in vsock_dgram_connect we call lock_sock(), so maybe that could be enough to protect us. In general I think we should use vsk->transport if vsock_dgram_connect() is called, or we need to do per-packet lookup. Another think I would change, is the dgram_dequeue() callback. I would remove it, and move in the core the code in vmci_transport_dgram_dequeue() since it seems pretty generic. This should work well if every transport uses sk_receive_skb() to enqueue sk_buffs in the socket buffer. Thanks, Stefano
next prev parent reply other threads:[~2023-05-03 12:09 UTC|newest] Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top 2023-04-14 0:25 [PATCH RFC net-next v2 0/4] virtio/vsock: support datagrams Bobby Eshleman 2023-04-14 0:25 ` [PATCH RFC net-next v2 1/4] virtio/vsock: support dgram Bobby Eshleman 2023-04-19 9:29 ` Stefano Garzarella 2023-04-19 9:29 ` Stefano Garzarella 2023-04-14 0:25 ` [PATCH RFC net-next v2 2/4] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Bobby Eshleman 2023-04-14 8:47 ` Alvaro Karsz 2023-04-14 8:47 ` Alvaro Karsz 2023-04-14 10:28 ` Bobby Eshleman 2023-04-19 9:30 ` Stefano Garzarella 2023-04-19 9:30 ` Stefano Garzarella 2023-04-15 9:51 ` Bobby Eshleman 2023-04-14 0:25 ` [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support Bobby Eshleman 2023-04-14 2:09 ` kernel test robot 2023-04-14 3:32 ` kernel test robot 2023-04-15 6:13 ` kernel test robot 2023-04-19 9:30 ` Stefano Garzarella 2023-04-19 9:30 ` Stefano Garzarella 2023-04-15 10:30 ` Bobby Eshleman 2023-04-28 10:29 ` Stefano Garzarella 2023-04-28 10:29 ` Stefano Garzarella 2023-04-15 17:29 ` Bobby Eshleman 2023-05-03 12:09 ` Stefano Garzarella [this message] 2023-05-03 12:09 ` Stefano Garzarella 2023-04-14 0:26 ` [PATCH RFC net-next v2 4/4] tests: add vsock dgram tests Bobby Eshleman 2023-04-14 11:18 ` [PATCH RFC net-next v2 0/4] virtio/vsock: support datagrams Bobby Eshleman 2023-04-19 10:00 ` Stefano Garzarella 2023-04-19 10:00 ` Stefano Garzarella 2023-04-15 7:13 ` Bobby Eshleman 2023-04-15 7:13 ` [virtio-dev] " Bobby Eshleman 2023-04-28 10:43 ` Stefano Garzarella 2023-04-28 10:43 ` Stefano Garzarella 2023-04-28 10:43 ` [virtio-dev] " Stefano Garzarella 2023-04-15 15:55 ` Bobby Eshleman 2023-04-15 15:55 ` [virtio-dev] " Bobby Eshleman 2023-05-03 12:13 ` Stefano Garzarella 2023-05-03 12:13 ` [virtio-dev] " Stefano Garzarella 2023-05-03 12:13 ` Stefano Garzarella 2023-04-14 5:16 [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support kernel test robot
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=lc2v5porgyzx6neimlyrpeg3d5l7trnorbs7xubqgcrlp7bbi7@yxs25wx67tm7 \ --to=sgarzare@redhat.com \ --cc=bobby.eshleman@bytedance.com \ --cc=bobbyeshleman@gmail.com \ --cc=bryantan@vmware.com \ --cc=davem@davemloft.net \ --cc=decui@microsoft.com \ --cc=edumazet@google.com \ --cc=haiyangz@microsoft.com \ --cc=kuba@kernel.org \ --cc=kvm@vger.kernel.org \ --cc=linux-hyperv@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=mst@redhat.com \ --cc=netdev@vger.kernel.org \ --cc=pabeni@redhat.com \ --cc=pv-drivers@vmware.com \ --cc=stefanha@redhat.com \ --cc=vdasa@vmware.com \ --cc=virtualization@lists.linux-foundation.org \ --cc=wei.liu@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.