From: Al Viro <viro@ZenIV.linux.org.uk>
To: Alan Burlison <Alan.Burlison@oracle.com>
Cc: Casper.Dik@oracle.com, Eric Dumazet <eric.dumazet@gmail.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	netdev@vger.kernel.org, dholland-tech@netbsd.org
Subject: Re: Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
Date: Thu, 22 Oct 2015 19:16:56 +0100
Message-ID: <20151022181656.GT22011@ZenIV.linux.org.uk>
In-Reply-To: <5628C0AE.2020508@oracle.com>

On Thu, Oct 22, 2015 at 11:55:42AM +0100, Alan Burlison wrote:
> On 22/10/2015 05:21, Al Viro wrote:
> 
> >>Most of the work on using a file descriptor is local to the thread.
> >
> >Using - sure, but what of cacheline dirtied every time you resolve a
> >descriptor to file reference?
> 
> Don't you have to do that anyway, to do anything useful with the file?

Dirtying the cacheline that contains struct file itself is different, but
that's not per-descriptor.

> >In case of Linux we have two bitmaps and an array of pointers associated
> >with descriptor table.  They grow on demand (in parallel)
> >	* reserving a descriptor is done under ->file_lock (dropped/regained
> >around memory allocation if we end up expanding the sucker, actual reassignment
> >of pointers to array/bitmaps is under that spinlock)
> >	* installing a pointer is lockless (we wait for ongoing resize to
> >settle, RCU takes care of the rest)
> >	* grabbing a file by index is lockless as well
> >	* removing a pointer is under ->file_lock, so's replacing it by dup2().
> 
> Is that table per-process or global?

Usually it's per-process, but any thread could ask for a private instance
to work with (and then spawn more threads sharing that instance - or getting
independent copies).

It's common for Plan 9-inspired models - basically, you treat every thread
as a machine that consists of
	* memory
	* file descriptor table
	* namespace
	* signal handlers
	...
	* CPU (i.e. actual thread of execution).
The last part can't be shared; anything else can.  fork(2) variant used to
start new threads (clone(2) in case of Linux, rfork(2) in Plan 9 and *BSD)
is told which components should be copies of parent's ones and which should
be shared with the parent.  fork(2) is simply "copy everything except for the
namespace".  It's fairly common to have "share everything", but intermediate
variants are also possible.  There are constraints (e.g. you can't share
signal handlers without sharing the memory space), but descriptor table
can be shared independently from memory space just fine.  There's also a
way to say "unshare this, this and that components" - mapped to unshare(2) in
Linux and to rfork(2) in Plan 9.

Best way to think of that is to consider descriptor table as a first-class
object a thread can be connected to.  Usually you have one for each process,
with all threads belonging to that process connected to the same thing,
but that's just the most common use.
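
A minimal sketch of the difference between sharing a descriptor table and
working on a copy of it, assuming a POSIX system and Python's os and
threading modules.  The helper names are made up for this illustration;
nothing here comes from the kernel tree.  An ordinary thread is attached to
the same table as its creator, while a fork(2) child gets its own copy.

```python
import os
import threading


def fd_is_open(fd):
    """Return True if fd currently refers to an open file in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def open_after_close_in_child(fd):
    """Close fd in a fork()ed child; report whether the parent still has it."""
    pid = os.fork()
    if pid == 0:
        # Child: it works on a copy of the descriptor table, so this
        # close only affects the copy.
        os.close(fd)
        os._exit(0)
    os.waitpid(pid, 0)
    return fd_is_open(fd)


def open_after_close_in_thread(fd):
    """Close fd in another thread; report whether the caller still has it."""
    # The thread is attached to the same descriptor table as the caller,
    # so the close is visible here as well.
    t = threading.Thread(target=os.close, args=(fd,))
    t.start()
    t.join()
    return fd_is_open(fd)
```

With r, w = os.pipe(), open_after_close_in_child(r) should report that r is
still open in the parent, and open_after_close_in_thread(r) that it is gone.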

> I don't think that it's possible to claim that a non-atomic dup2()
> is POSIX-compliant.

Except that it's in a non-normative part of dup2(2), AFAICS.  I certainly
agree that it would be standards lawyering beyond reason, but "not
possible to claim" is too optimistic.  Maybe I'm just more cynical...

> ThreadA remains sat in accept on fd1 which is now a plain file, not
> a socket.

No.  accept() is not an operation on file descriptors; it's an operation on
file descriptions (pardon the use of that terminology).  They are specified
by passing descriptors, but there's a hell of a difference between e.g.
dup() or fcntl(,F_SETFD,) (operations on descriptors) and read() or lseek()
(operations on descriptions).
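
The distinction can be seen from userland.  A small sketch, assuming a POSIX
system and Python's os, fcntl and tempfile modules; the function name is
invented for the example.  Two descriptors obtained via dup() share one
description, so the file offset is common to both, while the close-on-exec
flag set with F_SETFD belongs to each descriptor separately.

```python
import fcntl
import os
import tempfile


def descriptor_vs_description():
    """Return (offset seen via the dup, cloexec on fd1, cloexec on fd2)."""
    fd1, path = tempfile.mkstemp()
    os.unlink(path)            # scratch file, name not needed
    fd2 = os.dup(fd1)          # second descriptor, same description
    try:
        os.write(fd1, b"0123456789")

        # lseek() operates on the description: move the offset via fd1 ...
        os.lseek(fd1, 4, os.SEEK_SET)
        # ... and query it via fd2.
        offset_via_fd2 = os.lseek(fd2, 0, os.SEEK_CUR)

        # F_SETFD operates on the descriptor: set the flag on fd1 and
        # clear it on fd2; neither change touches the other descriptor.
        fcntl.fcntl(fd1, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
        fcntl.fcntl(fd2, fcntl.F_SETFD, 0)
        cloexec1 = bool(fcntl.fcntl(fd1, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
        cloexec2 = bool(fcntl.fcntl(fd2, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
        return offset_via_fd2, cloexec1, cloexec2
    finally:
        os.close(fd1)
        os.close(fd2)
```

The offset read back through fd2 is the one set through fd1, while the two
close-on-exec flags end up different.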

Lookups are done once per syscall; the only exception is F_SETLK{,W}, where
we recheck that the descriptor is still referring to the same thing before
granting the lock.
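
A rough userland sketch of "lookup once per syscall", assuming Linux
semantics and Python's os, threading, time and tempfile modules.  The
function name is hypothetical, and the sleep is only a crude way to let the
reader thread block; it is not a real synchronization mechanism.  A read()
already sleeping on a pipe keeps working on the pipe after dup2() has
pointed the same descriptor at a regular file; only the next syscall sees
the new mapping.  Other systems discussed in this thread may behave
differently.

```python
import os
import tempfile
import threading
import time


def read_across_dup2():
    """Return (data seen by the blocked read, data seen by a later read)."""
    r, w = os.pipe()
    other, path = tempfile.mkstemp()
    os.unlink(path)                  # scratch file, name not needed
    os.write(other, b"from file")
    os.lseek(other, 0, os.SEEK_SET)

    seen = []
    reader = threading.Thread(target=lambda: seen.append(os.read(r, 16)))
    reader.start()
    time.sleep(0.5)  # crude: let the reader block inside read(2) on the pipe

    os.dup2(other, r)          # descriptor r now names the regular file
    os.write(w, b"from pipe")  # wake the read that is already in progress
    reader.join(timeout=5)

    later = os.read(r, 16)     # new syscall, hence a new lookup of r
    for fd in (r, w, other):
        os.close(fd)
    return (seen[0] if seen else None), later
```

On Linux the blocked read should return the bytes written to the pipe, and
the later read the contents of the regular file.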

Again, POSIX is still underspecifying the semantics of shared descriptor
tables; back when the bulk of it had been written there had been no way
to have a descriptor -> description mapping changed under a syscall by
action of another thread.  Hell, they still hadn't picked up on some things
that happened in the early 80s, let alone early-to-mid 90s...

Linux and Solaris happen to cover these gaps differently; FreeBSD and
OpenBSD are probably closer to Linux variant, NetBSD - to Solaris one.
