All of lore.kernel.org
 help / color / mirror / Atom feed
From: Casper.Dik@oracle.com
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alan Burlison <Alan.Burlison@oracle.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	netdev@vger.kernel.org, dholland-tech@netbsd.org
Subject: Re: Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
Date: Wed, 21 Oct 2015 22:33:04 +0200	[thread overview]
Message-ID: <201510212033.t9LKX4G8007718@room101.nl.oracle.com> (raw)
In-Reply-To: <20151021185104.GM22011@ZenIV.linux.org.uk>


>On Wed, Oct 21, 2015 at 03:38:51PM +0100, Alan Burlison wrote:
>
>> >There's going to be a notion of "last close"; that's what this refcount is
>> >about and _that_ is more than implementation detail.
>> 
>> Yes, POSIX distinguishes between "file descriptor" and "file
>> description" (ugh!) and the close() page says:
>
>Would've been better if they went for something like "IO channel" for
>the latter ;-/

Or at least some other word.  A file descriptor is just an index to
a list of file pointers (and wasn't named so?)

>> "When all file descriptors associated with an open file description
>> have been closed, the open file description shall be freed."
>
>BTW, is SCM_RIGHTS outside of scope?  They do mention it in one place
>only:
>| Ancillary data is also possible at the socket level. The <sys/socket.h>
>| header shall define the following symbolic constant for use as the cmsg_type
>| value when cmsg_level is SOL_SOCKET:
>|
>| SCM_RIGHTS
>|     Indicates that the data array contains the access rights to be sent or
>| received.
>
>with no further details whatsoever.  It's been there since at least 4.3-Reno;
>does anybody still use the older variant (->msg_accrights, that is)?  IIRC,
>there was some crap circa 2.6 when Solaris used to do ->msg_accrights for
>descriptor-passing, but more or less current versions appear to support
>SCM_RIGHTS...  In any case, descriptor-passing had been there in some form
>since at least '83 (the old variant is already present in 4.2) and considering
>it out-of-scope for POSIX is bloody ridiculous, IMO.

SCM_RIGHTS was introduced as part of the POSIX standardization of BSD 
sockets.  Looks like they became part of Solaris 2.6, but the default
was non-standard sockets so you may easily find msg->accrights but not
SCM_RIGHTS.

msg_accrights is what was introduced in BSD in likely the first 
implementation of socket-based file descriptor passing.

SysV has its own file descriptor passing on file descriptors passing.

But that interface is too much ad-hoc, so SCM_RIGHTS is a standardizations 
allowing multiple types of messages to be send around

>Unless they want to consider in-flight descriptor-passing datagrams as
>collections of file descriptors, the quoted sentence is seriously misleading.
>And then there's mmap(), which they do kinda-sorta mention...

Well, a file descriptor really only exists in the context of a process; 
in-flight it is no longer a file descriptor as there process context with 
a file descriptor table; so pointers to file descriptions are passed 
around.

>> >In other words, is that destruction of
>> >	* any descriptor refering to this socket [utterly insane for obvious
>> >reasons]
>> >	* the last descriptor refering to this socket (modulo descriptor
>> >passing, etc.) [a bitch to implement, unless we treat a syscall in progress
>> >as keeping the opened file open], or
>> >	* _the_ descriptor used to issue accept(2) [a bitch to implement,
>> >with a lot of fun races in an already race-prone area]?
>> 
>> From reading the POSIX close() page I believe the second option is
>> the correct one.
>
>Er...  So fd2 = dup(fd);accept(fd)/close(fd) should *not* trigger that
>behaviour, in your opinion?  Because fd is sure as hell not the last
>descriptor refering to that socket - fd2 remains alive and well.
>
>Behaviour you describe below matches the _third_ option.

>> >BTW, for real fun, consider this:
>> >7)
>> >// fd is a socket
>> >fd2 = dup(fd);
>> >in thread A: accept(fd);
>> >in thread B: accept(fd);
>> >in thread C: accept(fd2);
>> >in thread D: close(fd);
>> >
>> >Which threads (if any), should get hit where it hurts?
>> 
>> A & B should return from the accept with an error. C should
>> continue. Which is what happens on Solaris.
>
>> To this end each thread keeps a list of file descriptors
>> in use by the current active system call.
>
>Yecchhhh...  How much cross-CPU traffic does that cause on
>multithread processes?  Not on close(2), on maintaining the
>descriptor use counts through the normal syscalls.

In the Solaris implementation is pretty much what we do;
but there is no much cross-CPU traffic.  Of course, you will need
to keep locks in the file descriptor table if only to find
the actual file pointer.

The work is done only in the case of a badly written application
where close is required to hunt down all threads currently using
the specific file descriptor.

>> When a file descriptor is closed and this file descriptor
>> is marked as being in use by other threads, the kernel
>> will search all threads to see which have this file descriptor
>> listed as in use. For each such thread, the kernel tells
>> the thread that its active fds list is now stale and, if
>> possible, makes the thread run.
>>
>> While this algorithm is pretty expensive, it is not often invoked.
>
>Sure, but the upkeep of data structures it would need is there
>whether you actually end up triggering it or not.  Both in
>memory footprint and in cacheline pingpong...

Most of the work on using a file descriptor is local to the thread.

>Besides, the list of threads using given descriptor table also needs
>to be maintained, unless you scan *all* threads in the system (which
>would be quite a fun wrt latency and affect a lot more than just the
>process doing something dumb and rare).

In Solaris all threads using the same file descriptor table are all the 
threads in the same process.  This is typical for Unix but it is not what 
we have in Linux, or so I'm told.   So we already have that particular
list of threads.

>BTW, speaking of fun races: AFAICS, NetBSD dup2() isn't atomic.  It
>calls fd_close() outside of ->fd_lock (has to, since fd_close() is
>grabbing that itself), so another thread doing e.g.  fcntl(newfd, F_GETFD)
>in the middle of dup2(oldfd, newfd) might end up with EBADF, even though
>both before and after dup2() newfd had been open.  What's worse,
>thread A:
>	fd1 = open("foo", ...);
>	fd2 = open("bar", ...);
>	...
>	dup2(fd1, fd2);
>thread B:
>	fd = open("baz", ...);
>might, AFAICS, end up with fd == fd2 and refering to foo instead of baz.
>All it takes is the last open() managing to grab ->fd_lock just as fd_close()
>from dup2() has dropped it.  Which is an unexpected behaviour, to put it
>mildly, no matter how much standard lawyering one applies...

It could happen even when the implementation does not have any races; but 
I think you're saying that you know that the open call in thread B 
comes after the open to fd2, then fd != fd2.

This looks indeed like a bug.

Casper

  reply	other threads:[~2015-10-21 20:45 UTC|newest]

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-19 16:59 Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3) Stephen Hemminger
2015-10-19 23:33 ` Eric Dumazet
2015-10-20  1:12   ` Alan Burlison
2015-10-20  1:45     ` Eric Dumazet
2015-10-20  9:59       ` Alan Burlison
2015-10-20 11:24         ` David Miller
2015-10-20 11:39           ` Alan Burlison
2015-10-20 13:19         ` Fw: " Eric Dumazet
2015-10-20 13:45           ` Alan Burlison
2015-10-20 15:30             ` Eric Dumazet
2015-10-20 18:31               ` Alan Burlison
2015-10-20 18:42                 ` Eric Dumazet
2015-10-21 10:25                 ` David Laight
2015-10-21 10:49                   ` Alan Burlison
2015-10-21 11:28                     ` Eric Dumazet
2015-10-21 13:03                       ` Alan Burlison
2015-10-21 13:29                         ` Eric Dumazet
2015-10-21  3:49       ` Al Viro
2015-10-21 14:38         ` Alan Burlison
2015-10-21 15:30           ` David Miller
2015-10-21 16:04             ` Casper.Dik
2015-10-21 21:18               ` Eric Dumazet
2015-10-21 21:28                 ` Al Viro
2015-10-21 16:32           ` Fw: " Eric Dumazet
2015-10-21 18:51           ` Al Viro
2015-10-21 20:33             ` Casper.Dik [this message]
2015-10-22  4:21               ` Al Viro
2015-10-22 10:55                 ` Alan Burlison
2015-10-22 18:16                   ` Al Viro
2015-10-22 20:15                     ` Alan Burlison
2015-11-02 10:03               ` David Laight
2015-11-02 10:29                 ` Al Viro
2015-10-21 22:28             ` Alan Burlison
2015-10-22  1:29             ` David Miller
2015-10-22  4:17               ` Alan Burlison
2015-10-22  4:44                 ` Al Viro
2015-10-22  6:03                   ` Al Viro
2015-10-22  6:34                     ` Casper.Dik
2015-10-22 17:21                       ` Al Viro
2015-10-22 18:24                         ` Casper.Dik
2015-10-22 19:07                           ` Al Viro
2015-10-22 19:51                             ` Casper.Dik
2015-10-22 21:57                               ` Al Viro
2015-10-23  9:52                                 ` Casper.Dik
2015-10-23 13:02                                   ` Eric Dumazet
2015-10-23 13:20                                     ` Casper.Dik
2015-10-23 13:48                                       ` Eric Dumazet
2015-10-23 14:13                                       ` Eric Dumazet
2015-10-23 13:35                                     ` Alan Burlison
2015-10-23 14:21                                       ` Eric Dumazet
2015-10-23 15:46                                         ` Alan Burlison
2015-10-23 16:00                                           ` Eric Dumazet
2015-10-23 16:07                                             ` Alan Burlison
2015-10-23 16:19                                             ` Eric Dumazet
2015-10-23 16:40                                               ` Alan Burlison
2015-10-23 17:47                                                 ` Eric Dumazet
2015-10-23 17:59                                                   ` [PATCH net-next] af_unix: do not report POLLOUT on listeners Eric Dumazet
2015-10-25 13:45                                                     ` David Miller
2015-10-24  2:30                                   ` [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3) Al Viro
2015-10-27  9:08                                     ` Casper.Dik
2015-10-27 10:52                                       ` Alan Burlison
2015-10-27 12:01                                         ` Eric Dumazet
2015-10-27 12:27                                           ` Alan Burlison
2015-10-27 12:44                                             ` Eric Dumazet
2015-10-27 13:42                                         ` David Miller
2015-10-27 13:37                                           ` Alan Burlison
2015-10-27 13:59                                             ` David Miller
2015-10-27 14:13                                               ` Alan Burlison
2015-10-27 14:39                                                 ` David Miller
2015-10-27 14:39                                                   ` Alan Burlison
2015-10-27 15:04                                                     ` David Miller
2015-10-27 15:53                                                       ` Alan Burlison
2015-10-27 23:17                                         ` Al Viro
2015-10-28  0:13                                           ` Eric Dumazet
2015-10-28 12:35                                             ` Al Viro
2015-10-28 13:24                                               ` Eric Dumazet
2015-10-28 14:47                                                 ` Eric Dumazet
2015-10-28 21:13                                                   ` Al Viro
2015-10-28 21:44                                                     ` Eric Dumazet
2015-10-28 22:33                                                       ` Al Viro
2015-10-28 23:08                                                         ` Eric Dumazet
2015-10-29  0:15                                                           ` Al Viro
2015-10-29  3:29                                                             ` Eric Dumazet
2015-10-29  4:16                                                               ` Al Viro
2015-10-29 12:35                                                                 ` Eric Dumazet
2015-10-29 13:48                                                                   ` Eric Dumazet
2015-10-30 17:18                                                                   ` Linus Torvalds
2015-10-30 21:02                                                                     ` Al Viro
2015-10-30 21:23                                                                       ` Linus Torvalds
2015-10-30 21:50                                                                         ` Linus Torvalds
2015-10-30 22:33                                                                           ` Al Viro
2015-10-30 23:52                                                                             ` Linus Torvalds
2015-10-31  0:09                                                                               ` Al Viro
2015-10-31 15:59                                                                               ` Eric Dumazet
2015-10-31 19:34                                                                               ` Al Viro
2015-10-31 19:54                                                                                 ` Linus Torvalds
2015-10-31 20:29                                                                                   ` Al Viro
2015-11-02  0:24                                                                                     ` Al Viro
2015-11-02  0:59                                                                                       ` Linus Torvalds
2015-11-02  2:14                                                                                       ` Eric Dumazet
2015-11-02  6:22                                                                                         ` Al Viro
2015-10-31 20:45                                                                                   ` Eric Dumazet
2015-10-31 21:23                                                                                     ` Linus Torvalds
2015-10-31 21:51                                                                                       ` Al Viro
2015-10-31 22:34                                                                                       ` Eric Dumazet
2015-10-31  1:07                                                                           ` Eric Dumazet
2015-10-28 16:04                                           ` Alan Burlison
2015-10-29 14:58                                         ` David Holland
2015-10-29 15:18                                           ` Alan Burlison
2015-10-29 16:01                                             ` David Holland
2015-10-29 16:15                                               ` Alan Burlison
2015-10-29 17:07                                                 ` Al Viro
2015-10-29 17:12                                                   ` Alan Burlison
2015-10-30  1:54                                                     ` David Miller
2015-10-30  1:55                                                   ` David Miller
2015-10-30  5:44                                                 ` David Holland
2015-10-30 17:43                                           ` David Laight
2015-10-30 21:09                                             ` Al Viro
2015-11-04 15:54                                               ` David Laight
2015-11-04 16:27                                                 ` Al Viro
2015-11-06 15:07                                                   ` David Laight
2015-11-06 19:31                                                     ` Al Viro
2015-10-22  6:51                   ` Casper.Dik
2015-10-22 11:18                     ` Alan Burlison
2015-10-22 11:15                   ` Alan Burlison
2015-10-22  6:15                 ` Casper.Dik
2015-10-22 11:30                   ` Eric Dumazet
2015-10-22 11:58                     ` Alan Burlison
2015-10-22 12:10                       ` Eric Dumazet
2015-10-22 13:12                         ` David Miller
2015-10-22 13:14                         ` Alan Burlison
2015-10-22 17:05                           ` Al Viro
2015-10-22 17:39                             ` Alan Burlison
2015-10-22 18:56                               ` Al Viro
2015-10-22 19:50                                 ` Casper.Dik
2015-10-23 17:09                                   ` Al Viro
2015-10-23 18:30           ` Fw: " David Holland
2015-10-23 19:51             ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201510212033.t9LKX4G8007718@room101.nl.oracle.com \
    --to=casper.dik@oracle.com \
    --cc=Alan.Burlison@oracle.com \
    --cc=dholland-tech@netbsd.org \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=stephen@networkplumber.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.