netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2] recv.2: Document MSG_CMSG_CLOEXEC as returned in msg_flags
       [not found]     ` <20230718060121.934187-1-mattlloydhouse@gmail.com>
@ 2023-07-18 12:03       ` Alejandro Colomar
  2023-07-18 17:26         ` [PATCH v3] " Matthew House
  0 siblings, 1 reply; 3+ messages in thread
From: Alejandro Colomar @ 2023-07-18 12:03 UTC (permalink / raw)
  To: Matthew House
  Cc: linux-man, linux-api, netdev, Ulrich Drepper, Ulrich Drepper,
	David S. Miller, Andrew Morton


[-- Attachment #1.1: Type: text/plain, Size: 5155 bytes --]

Hi Matthew,

On 2023-07-18 08:00, Matthew House wrote:
> On Mon, Jul 17, 2023 at 7:10 PM Alejandro Colomar <alx@kernel.org> wrote:
>> Hi Matthew,
>>
>> I don't understand what's the purpose of this.  The kernel sets a bit
>> just to report to the caller that it set a bit?  No other purpose?
>> It feels very weird.  Of course, the caller already has that info,
>> doesn't it?
> 
> The main reason I posted this patch was because I was confused by the
> flag's presence in the msg_flags when I was looking at some strace logs, so
> I figured that it would be a good idea to document it.

Makes sense.

> As for the original
> purpose of the behavior, it's not really clear, and it may well have been
> an implementation artifact that got enshrined in the user space ABI. (Even
> io_uring is careful to replicate this behavior!)

This is what worries me.  I've CCd a bunch of people to see if they can
bring some light.

> 
> This behavior began when the MSG_CMSG_CLOEXEC flag was first added in Linux
> 2.6.23, with Ulrich Drepper's commit 4a19542e5f69 ("O_CLOEXEC for
> SCM_RIGHTS"). Per the commit message, the flag was designed to be
> "passe[d]... just like the existing MSG_CMSG_COMPAT flag". Since it was
> added to the msg_flags at the start of sys_recvmsg(), the
> scm_detach_fds[_compat]() functions in net/core/scm.c and net/compat.c
> could read the flag off of msg->msg_flags without having to thread the
> recvmsg() flags through.
> 
> This was indeed similar to the behavior of MSG_CMSG_COMPAT. That flag was
> added in Linux 2.5.65, with commit 3225fc8a85f4 ("[NET]: Simplify scm
> handling and sendmsg/recvmsg invocation, consolidate net compat
> syscalls."), in which put_cmsg() and scm_detach_fds() in net/core/scm.c
> read it off of msg->msg_flags. (It wouldn't actually be set in msg_flags
> until Linux 2.5.67, with commit 7e8d06bc1d90, "[COMPAT]: Fix
> MSG_CMSG_COMPAT flag passing, kill cmsg_compat_recvmsg_fixup." Both of
> these commits are from history/history.git.)
> 
> However, the MSG_CMSG_COMPAT flag has been scrubbed from the output
> msg_flags since Linux 2.6.14, with commit 37f7f421cce1 ("[NET]: Do not leak
> MSG_CMSG_COMPAT into userspace."). That's what I find so unclear:
> MSG_CMSG_CLOEXEC was added after the kernel started scrubbing
> MSG_CMSG_COMPAT from the output, but the new flag was never written to be
> similarly scrubbed.
> 
> Later, in Linux 3.10, with commits 1be374a0518a ("net: Block
> MSG_CMSG_COMPAT in send(m)msg and recv(m)msg") and a7526eb5d06b ("net:
> Unbreak compat_sys_{send,recv}msg"), MSG_CMSG_COMPAT was banned from being
> passed to the *msg() syscalls' flags from user space, with the rationale
> that they were "not intended to be part of the API". Then, in Linux 4.0, we
> reached the current status quo with commit d720d8cec563 ("net: compat:
> Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg"), where
> MSG_CMSG_COMPAT is allowed (and a no-op) in compat syscalls, but banned
> from non-compat syscalls.
> 
> So I agree that it's very weird that this flag gets returned to user space,
> even while the internal flag that it's modeled after doesn't. I suppose I
> could spin up a nice story, where the user-space function calling recvmsg()
> is totally separate from the function processing the returned struct
> msghdr, and the latter function would really like to know whether the fds
> in that message are close-on-exec without having to call fcntl(F_GETFD).
> But that's all just a total guess. If you want to know for sure, perhaps
> cc'ing Drepper may be worthwhile?
> 
> A cursory look hasn't shown me any existing user-space code that depends on
> this behavior. Though one library appears to be aware of this behavior,
> actively filtering MSG_CMSG_CLOEXEC out of the result flags:
> <https://github.com/dutchanddutch/node-socket-calls/blob/ca759a0da87cb112875d158f4a81b45b31f4a871/src/socket_calls.cc#L417>
> 
> Also, only somewhat relatedly, some libraries incorrectly attempt to
> request MSG_CMSG_CLOEXEC by passing it into the msg_flags field instead of
> the flags argument:
> <https://git.samba.org/samba.git/?p=samba.git;a=blob;f=lib/messaging/messages_dgm.c;hb=refs/tags/samba-4.17.9#l1272>
> <https://github.com/genodelabs/genode/blob/23.05/repos/base-linux/src/lib/base/ipc.cc#L132>
> <https://github.com/proxmox/pve-lxc-syscalld/blob/a14430f3e75c2b695332ad712164e599464177fc/src/io/seq_packet.rs#L123>

In any case, all of this mail has been very interesting,
and it would be useful to have it in the commit message of the patch.
Please send an v2 with it, and add 'Cc:' tags for all these people:

Cc: <linux-api@vger.kernel.org>
Cc: <netdev@vger.kernel.org>
Cc: Ulrich Drepper <drepper@gmail.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>

(I don't know if <drepper@redhat.com> still works.  Does anyone know?)

Thanks!
Alex

> 
> Thank you,
> Matthew House

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v3] recv.2: Document MSG_CMSG_CLOEXEC as returned in msg_flags
  2023-07-18 12:03       ` [PATCH v2] recv.2: Document MSG_CMSG_CLOEXEC as returned in msg_flags Alejandro Colomar
@ 2023-07-18 17:26         ` Matthew House
  2023-07-18 21:39           ` Alejandro Colomar
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew House @ 2023-07-18 17:26 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, linux-api, netdev, Ulrich Drepper, David S. Miller,
	Andrew Morton

Ever since commit 4a19542e5f69 ("O_CLOEXEC for SCM_RIGHTS") added the
MSG_CMSG_CLOEXEC flag to recvmsg(2), the flag has also been copied into the
returned msg->msg_flags when specified, regardless of whether any file
descriptors were actually received, or whether the protocol supports
receiving file descriptors at all. This behavior was primarily an
implementation artifact: by copying MSG_CMSG_CLOEXEC into the msg_flags,
scm_detach_fds() in net/core/scm.c (and its _compat() counterpart in
net/compat.c) could determine whether it was set without having to receive
a copy of the recvmsg(2) flags.

This mechanism was closely modeled after the internal MSG_CMSG_COMPAT flag,
which is passed by the compat versions of the send[m]msg(2) and
recv[m]msg(2) syscalls to inform various functions that user space expects
a compat layout. When the flag was first implemented by commits
3225fc8a85f4 ("[NET]: Simplify scm handling and sendmsg/recvmsg invocation,
consolidate net compat syscalls.") and 7e8d06bc1d90 ("[COMPAT]: Fix
MSG_CMSG_COMPAT flag passing, kill cmsg_compat_recvmsg_fixup.") (in
history/history.git), the behavior was very similar: recvmsg(2) would add
MSG_CMSG_COMPAT to the msg_flags, and put_cmsg() and scm_detach_fds() in
net/core/scm.c would read the flag to determine whether to delegate to
their _compat() counterparts.

However, after the initial implementation, more work was done to hide
MSG_CMSG_COMPAT from user space. First, commit 37f7f421cce1 ("[NET]: Do not
leak MSG_CMSG_COMPAT into userspace.") started scrubbing the bit from
msg_flags right before copying it back into user space. Then, since passing
the MSG_CMSG_COMPAT flag into the syscalls from non-compat code could
confuse the kernel, commits 1be374a0518a ("net: Block MSG_CMSG_COMPAT in
send(m)msg and recv(m)msg") and a7526eb5d06b ("net: Unbreak
compat_sys_{send,recv}msg") made them return -EINVAL if user space
attempted to pass the flag. But to reduce breakage, commit d720d8cec563
("net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg")
rolled that back somewhat, making MSG_CMSG_COMPAT an error for the
non-compat syscalls and a no-op for the compat syscalls, which is the
current status quo.

Even though MSG_CMSG_CLOEXEC was implemented after the kernel started
scrubbing MSG_CMSG_COMPAT from the returned msg_flags, the newer flag never
received the same treatment. At this point, this behavior has effectively
become part of the user-space API, to the extent that io_uring has been
careful in commit 9bb66906f23e ("io_uring: support multishot in recvmsg")
to replicate the behavior in its multishot IORING_OP_RECVMSG operation.

Therefore, document this behavior to avoid confusion when user space sees
MSG_CMSG_CLOEXEC returned in msg->msg_flags.

Cc: linux-api@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew House <mattlloydhouse@gmail.com>
---
Alright, I've summarized the history in the commit message, and I've added
the CCs you requested.

Also, for future reference, Drepper gave a reply to the last email, which
did not make it onto the list:

On Tue, Jul 18, 2023 at 9:24 AM Ulrich Drepper <drepper@redhat.com> wrote:
> On Tue, Jul 18, 2023 at 2:10 PM Alejandro Colomar <alx@kernel.org> wrote:
>
> > > As for the original
> > > purpose of the behavior, it's not really clear, and it may well have been
> > > an implementation artifact that got enshrined in the user space ABI.
> > (Even
> > > io_uring is careful to replicate this behavior!)
> >
> > This is what worries me.  I've CCd a bunch of people to see if they can
> > bring some light.
> >
>
> It definitely was an artifact of the implementation.  I haven't tested
> getting the close-on-exec flag information for all interfaces.  The
> assumption was that the information about the close-on-exec flag is
> received with the universal fcntl() call.

Thank you,
Matthew House

 man2/recv.2 | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/man2/recv.2 b/man2/recv.2
index 660c103fb..1cd9f3e1b 100644
--- a/man2/recv.2
+++ b/man2/recv.2
@@ -412,6 +412,15 @@ is returned to indicate that expedited or out-of-band data was received.
 .B MSG_ERRQUEUE
 indicates that no data was received but an extended error from the socket
 error queue.
+.TP
+.BR MSG_CMSG_CLOEXEC " (since Linux 2.6.23)"
+.\" commit 4a19542e5f694cd408a32c3d9dc593ba9366e2d7
+indicates that
+.B MSG_CMSG_CLOEXEC
+was specified in the
+.I flags
+argument of
+.BR recvmsg ().
 .SH RETURN VALUE
 These calls return the number of bytes received, or \-1
 if an error occurred.
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v3] recv.2: Document MSG_CMSG_CLOEXEC as returned in msg_flags
  2023-07-18 17:26         ` [PATCH v3] " Matthew House
@ 2023-07-18 21:39           ` Alejandro Colomar
  0 siblings, 0 replies; 3+ messages in thread
From: Alejandro Colomar @ 2023-07-18 21:39 UTC (permalink / raw)
  To: Matthew House
  Cc: linux-man, linux-api, netdev, Ulrich Drepper, David S. Miller,
	Andrew Morton


[-- Attachment #1.1: Type: text/plain, Size: 5245 bytes --]

Hi Matthew, Ulrich,

On 2023-07-18 19:26, Matthew House wrote:
> Ever since commit 4a19542e5f69 ("O_CLOEXEC for SCM_RIGHTS") added the
> MSG_CMSG_CLOEXEC flag to recvmsg(2), the flag has also been copied into the
> returned msg->msg_flags when specified, regardless of whether any file
> descriptors were actually received, or whether the protocol supports
> receiving file descriptors at all. This behavior was primarily an
> implementation artifact: by copying MSG_CMSG_CLOEXEC into the msg_flags,
> scm_detach_fds() in net/core/scm.c (and its _compat() counterpart in
> net/compat.c) could determine whether it was set without having to receive
> a copy of the recvmsg(2) flags.
> 
> This mechanism was closely modeled after the internal MSG_CMSG_COMPAT flag,
> which is passed by the compat versions of the send[m]msg(2) and
> recv[m]msg(2) syscalls to inform various functions that user space expects
> a compat layout. When the flag was first implemented by commits
> 3225fc8a85f4 ("[NET]: Simplify scm handling and sendmsg/recvmsg invocation,
> consolidate net compat syscalls.") and 7e8d06bc1d90 ("[COMPAT]: Fix
> MSG_CMSG_COMPAT flag passing, kill cmsg_compat_recvmsg_fixup.") (in
> history/history.git), the behavior was very similar: recvmsg(2) would add
> MSG_CMSG_COMPAT to the msg_flags, and put_cmsg() and scm_detach_fds() in
> net/core/scm.c would read the flag to determine whether to delegate to
> their _compat() counterparts.
> 
> However, after the initial implementation, more work was done to hide
> MSG_CMSG_COMPAT from user space. First, commit 37f7f421cce1 ("[NET]: Do not
> leak MSG_CMSG_COMPAT into userspace.") started scrubbing the bit from
> msg_flags right before copying it back into user space. Then, since passing
> the MSG_CMSG_COMPAT flag into the syscalls from non-compat code could
> confuse the kernel, commits 1be374a0518a ("net: Block MSG_CMSG_COMPAT in
> send(m)msg and recv(m)msg") and a7526eb5d06b ("net: Unbreak
> compat_sys_{send,recv}msg") made them return -EINVAL if user space
> attempted to pass the flag. But to reduce breakage, commit d720d8cec563
> ("net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg")
> rolled that back somewhat, making MSG_CMSG_COMPAT an error for the
> non-compat syscalls and a no-op for the compat syscalls, which is the
> current status quo.
> 
> Even though MSG_CMSG_CLOEXEC was implemented after the kernel started
> scrubbing MSG_CMSG_COMPAT from the returned msg_flags, the newer flag never
> received the same treatment. At this point, this behavior has effectively
> become part of the user-space API, to the extent that io_uring has been
> careful in commit 9bb66906f23e ("io_uring: support multishot in recvmsg")
> to replicate the behavior in its multishot IORING_OP_RECVMSG operation.
> 
> Therefore, document this behavior to avoid confusion when user space sees
> MSG_CMSG_CLOEXEC returned in msg->msg_flags.
> 
> Cc: linux-api@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: Ulrich Drepper <drepper@redhat.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Matthew House <mattlloydhouse@gmail.com>
> ---
> Alright, I've summarized the history in the commit message, and I've added
> the CCs you requested.
> 
> Also, for future reference, Drepper gave a reply to the last email, which
> did not make it onto the list:
> 
> On Tue, Jul 18, 2023 at 9:24 AM Ulrich Drepper <drepper@redhat.com> wrote:
>> On Tue, Jul 18, 2023 at 2:10 PM Alejandro Colomar <alx@kernel.org> wrote:
>>
>>>> As for the original
>>>> purpose of the behavior, it's not really clear, and it may well have been
>>>> an implementation artifact that got enshrined in the user space ABI.
>>> (Even
>>>> io_uring is careful to replicate this behavior!)
>>>
>>> This is what worries me.  I've CCd a bunch of people to see if they can
>>> bring some light.
>>>
>>
>> It definitely was an artifact of the implementation.  I haven't tested
>> getting the close-on-exec flag information for all interfaces.  The
>> assumption was that the information about the close-on-exec flag is
>> received with the universal fcntl() call.

Patch applied.  I've included Drepper's quote in the commit message too.
Thank you both!

Cheers,
Alex

> 
> Thank you,
> Matthew House
> 
>  man2/recv.2 | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/man2/recv.2 b/man2/recv.2
> index 660c103fb..1cd9f3e1b 100644
> --- a/man2/recv.2
> +++ b/man2/recv.2
> @@ -412,6 +412,15 @@ is returned to indicate that expedited or out-of-band data was received.
>  .B MSG_ERRQUEUE
>  indicates that no data was received but an extended error from the socket
>  error queue.
> +.TP
> +.BR MSG_CMSG_CLOEXEC " (since Linux 2.6.23)"
> +.\" commit 4a19542e5f694cd408a32c3d9dc593ba9366e2d7
> +indicates that
> +.B MSG_CMSG_CLOEXEC
> +was specified in the
> +.I flags
> +argument of
> +.BR recvmsg ().
>  .SH RETURN VALUE
>  These calls return the number of bytes received, or \-1
>  if an error occurred.

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-07-18 21:39 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20230709213358.389871-1-mattlloydhouse@gmail.com>
     [not found] ` <363c0f82-969d-1927-1bd5-b664cfc83a87@kernel.org>
     [not found]   ` <20230716234803.851580-1-mattlloydhouse@gmail.com>
     [not found]     ` <20230718060121.934187-1-mattlloydhouse@gmail.com>
2023-07-18 12:03       ` [PATCH v2] recv.2: Document MSG_CMSG_CLOEXEC as returned in msg_flags Alejandro Colomar
2023-07-18 17:26         ` [PATCH v3] " Matthew House
2023-07-18 21:39           ` Alejandro Colomar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).