From: Eric Dumazet <eric.dumazet@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	David Miller <davem@davemloft.net>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Network Development <netdev@vger.kernel.org>,
	David Howells <dhowells@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
Date: Sat, 31 Oct 2015 08:59:31 -0700
Message-ID: <1446307171.6254.69.camel@edumazet-glaptop2.roam.corp.google.com>
In-Reply-To: <CA+55aFwfFb=LXU77AbiPDHgWcpBwTBoJB4EMCZgTgX32cxMYWw@mail.gmail.com>

On Fri, 2015-10-30 at 16:52 -0700, Linus Torvalds wrote:
> sequential allocations...
> 
> I don't think it would matter in real life, since I don't really think
> you have lots of fd's with strictly sequential behavior.
> 
> That said, the trivial "open lots of fds" benchmark would show it, so
> I guess we can just keep it. The next_fd logic is not expensive or
> complex, after all.

+1


> Attached is an updated patch that just uses the regular bitmap
> allocator and extends it to also have the bitmap of bitmaps. It
> actually simplifies the patch, so I guess it's better this way.
> 
> Anyway, I've tested it all a bit more, and for a trivial worst-case
> stress program that explicitly kills the next_fd logic by doing
> 
>     for (i = 0; i < 1000000; i++) {
>         close(3);
>         dup2(0,3);
>         if (dup(0) < 0)
>             break;
>     }
> 
> it takes it down from roughly 10s to 0.2s. So the patch is quite
> noticeable on that kind of pattern.
> 
> NOTE! You'll obviously need to increase your limits to actually be
> able to do the above with lots of file descriptors.
> 
> I ran Eric's test-program too, and find_next_zero_bit() dropped to a
> fraction of a percent. It's not entirely gone, but it's down in the
> noise.
> 
> I really suspect this patch is "good enough" in reality, and I would
> *much* rather do something like this than add a new non-POSIX flag
> that people have to update their binaries for. I agree with Eric that
> *some* people will do so, but it's still the wrong thing to do. Let's
> just make performance with the normal semantics be good enough that we
> don't need to play odd special games.
> 
> Eric?

I absolutely agree a generic solution is far better, especially when
its performance is on par.

Tested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
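
For readers following along, here is a rough userspace sketch of the
"bitmap of bitmaps" idea quoted above. It is not the patch itself: the
struct, the helper names and the table size (NR_WORDS) are all made up
for illustration. A second-level bitmap records which words of the
open-fd bitmap are completely full, so the search can skip them a word
at a time before scanning the first candidate word.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS 64                         /* hypothetical table size */
#define NR_FDS (NR_WORDS * BITS_PER_WORD)

/* Illustrative model only; not the kernel's struct fdtable. */
struct fd_bitmap {
    unsigned long open_fds[NR_WORDS];       /* bit set: fd is in use */
    /* bit set: the matching open_fds[] word has no free bit left */
    unsigned long full_words[(NR_WORDS + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

/* Return the index of the first zero bit at or after start, or nbits. */
static size_t find_next_zero(const unsigned long *map, size_t nbits,
                             size_t start)
{
    size_t i = start;

    while (i < nbits) {
        /* Invert so free bits become 1, then drop the bits below i. */
        unsigned long w = ~map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD);

        if (w == 0) {                       /* rest of this word is full */
            i = (i / BITS_PER_WORD + 1) * BITS_PER_WORD;
            continue;
        }
        while (!(w & 1UL)) {
            w >>= 1;
            i++;
        }
        return i < nbits ? i : nbits;
    }
    return nbits;
}

/* Return the lowest free fd >= start, or -1 if none is left. */
static long find_free_fd(const struct fd_bitmap *m, size_t start)
{
    size_t word, fd;

    /* Level 1: skip over words already known to be full. */
    word = find_next_zero(m->full_words, NR_WORDS, start / BITS_PER_WORD);
    if (word >= NR_WORDS)
        return -1;
    if (word * BITS_PER_WORD > start)
        start = word * BITS_PER_WORD;

    /* Level 2: ordinary bit scan from the first candidate word. */
    fd = find_next_zero(m->open_fds, NR_FDS, start);
    return fd < NR_FDS ? (long)fd : -1;
}

static void mark_fd_used(struct fd_bitmap *m, size_t fd)
{
    size_t word = fd / BITS_PER_WORD;

    m->open_fds[word] |= 1UL << (fd % BITS_PER_WORD);
    if (m->open_fds[word] == ~0UL)          /* word just became full */
        m->full_words[word / BITS_PER_WORD] |= 1UL << (word % BITS_PER_WORD);
}

static void mark_fd_free(struct fd_bitmap *m, size_t fd)
{
    size_t word = fd / BITS_PER_WORD;

    m->open_fds[word] &= ~(1UL << (fd % BITS_PER_WORD));
    m->full_words[word / BITS_PER_WORD] &= ~(1UL << (word % BITS_PER_WORD));
}
```

The lowest-free-fd semantics are unchanged; only the cost of skipping
long runs of used descriptors goes down.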

Note that a non-POSIX flag (or a thread personality hint)
would still allow the kernel to do proper NUMA affinity placement: say
the fd_array and bitmaps are split across the 2 nodes (or more, but most
servers nowadays have 2 sockets really).

Then at fd allocation time, we can prefer to pick an fd for which the
memory holding the various bits and the file pointer is on the local node.

This would speed up subsequent fd system calls in programs that constantly
blow away CPU caches, saving QPI transactions.
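
To make the placement idea concrete, here is a purely hypothetical
userspace model; nothing like it exists in the kernel as far as this
message is concerned. It assumes the fd space is split into one
contiguous range per node (NR_NODES and FDS_PER_NODE are invented
values) and that the caller already knows its local node. The
allocator tries the local range first and only then falls back to the
other nodes.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 2              /* assumption: a 2-socket server */
#define FDS_PER_NODE 1024       /* hypothetical per-node fd range */

/* Model: in_use[n][s] stands for fd n * FDS_PER_NODE + s, whose
 * bookkeeping memory is imagined to live on node n. */
struct fd_table_model {
    bool in_use[NR_NODES][FDS_PER_NODE];
};

/* Allocate an fd, preferring the range backed by local_node.
 * Returns the fd number, or -1 if every range is exhausted. */
static long alloc_fd_near(struct fd_table_model *t, int local_node)
{
    for (int i = 0; i < NR_NODES; i++) {
        /* Visit the local node first, then the remaining ones. */
        int node = (local_node + i) % NR_NODES;

        for (size_t slot = 0; slot < FDS_PER_NODE; slot++) {
            if (!t->in_use[node][slot]) {
                t->in_use[node][slot] = true;
                return (long)node * FDS_PER_NODE + (long)slot;
            }
        }
    }
    return -1;
}
```

In this model a thread on node 1 gets fd 1024 even though fd 0 is
free, which is not the lowest available descriptor. That is why such a
policy would need an explicit non-POSIX opt-in.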

Thanks a lot Linus.

lpaa24:~# taskset ff0ff ./opensock -t 16 -n 10000000 -l 10
count=10000000 (check/increase ulimit -n)
total = 3992764

lpaa24:~# ./opensock -t 48 -n 10000000 -l 10
count=10000000 (check/increase ulimit -n)
total = 3545249

Profile with 16 threads :

    69.55%  opensock          [.] memset                       
    11.83%  [kernel]          [k] queued_spin_lock_slowpath    
     1.91%  [kernel]          [k] _find_next_bit.part.0        
     1.68%  [kernel]          [k] _raw_spin_lock               
     0.99%  [kernel]          [k] kmem_cache_alloc             
     0.99%  [kernel]          [k] memset_erms                  
     0.95%  [kernel]          [k] get_empty_filp               
     0.82%  [kernel]          [k] __close_fd                   
     0.73%  [kernel]          [k] __alloc_fd                   
     0.65%  [kernel]          [k] sk_alloc                     
     0.63%  opensock          [.] child_function               
     0.56%  [kernel]          [k] fput                         
     0.35%  [kernel]          [k] sock_alloc                   
     0.31%  [kernel]          [k] kmem_cache_free              
     0.31%  [kernel]          [k] inode_init_always            
     0.28%  [kernel]          [k] d_set_d_op                   
     0.27%  [kernel]          [k] entry_SYSCALL_64_after_swapgs

Profile with 48 threads :

    57.92%  [kernel]          [k] queued_spin_lock_slowpath    
    32.14%  opensock          [.] memset                       
     0.81%  [kernel]          [k] _find_next_bit.part.0        
     0.51%  [kernel]          [k] _raw_spin_lock               
     0.45%  [kernel]          [k] kmem_cache_alloc             
     0.38%  [kernel]          [k] kmem_cache_free              
     0.34%  [kernel]          [k] __close_fd                   
     0.32%  [kernel]          [k] memset_erms                  
     0.25%  [kernel]          [k] __alloc_fd                   
     0.24%  [kernel]          [k] get_empty_filp               
     0.23%  opensock          [.] child_function               
     0.18%  [kernel]          [k] __d_alloc                    
     0.17%  [kernel]          [k] inode_init_always            
     0.16%  [kernel]          [k] sock_alloc                   
     0.16%  [kernel]          [k] del_timer                    
     0.15%  [kernel]          [k] entry_SYSCALL_64_after_swapgs
     0.15%  perf              [.] 0x000000000004d924           
     0.15%  [kernel]          [k] tcp_close                    




