All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Miller <davem@davemloft.net>,
	stephen@networkplumber.org, netdev@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	dhowells@redhat.com, linux-fsdevel@vger.kernel.org
Subject: Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)
Date: Wed, 28 Oct 2015 06:24:32 -0700	[thread overview]
Message-ID: <1446038672.7476.65.camel@edumazet-glaptop2.roam.corp.google.com> (raw)
In-Reply-To: <20151028123543.GB22011@ZenIV.linux.org.uk>

On Wed, 2015-10-28 at 12:35 +0000, Al Viro wrote:
> [Linus and Dave added, Solaris and NetBSD folks dropped from Cc]
> 
> On Tue, Oct 27, 2015 at 05:13:56PM -0700, Eric Dumazet wrote:
> > On Tue, 2015-10-27 at 23:17 +0000, Al Viro wrote:
> > 
> > > 	* [Linux-specific aside] our __alloc_fd() can degrade quite badly
> > > with some use patterns.  The cacheline pingpong in the bitmap is probably
> > > inevitable, unless we accept considerably heavier memory footprint,
> > > but we also have a case when alloc_fd() takes O(n) and it's _not_ hard
> > > to trigger - close(3);open(...); will have the next open() after that
> > > scanning the entire in-use bitmap.  I think I see a way to improve it
> > > without slowing the normal case down, but I'll need to experiment a
> > > bit before I post patches.  Anybody with examples of real-world loads
> > > that make our descriptor allocator to degrade is very welcome to post
> > > the reproducers...
> > 
> > Well, I do have real-world loads, but quite hard to setup in a lab :(
> > 
> > Note that we also hit the 'struct cred'->usage refcount for every
> > open()/close()/sock_alloc(), and simply moving uid/gid out of the first
> > cache line really helps, as current_fsuid() and current_fsgid() no
> > longer forces a pingpong.
> > 
> > I moved seldom used fields on the first cache line, so that overall
> > memory usage did not change (192 bytes on 64 bit arches)
> 
> [snip]
> 
> Makes sense, but there's a funny thing about that refcount - the part
> coming from ->f_cred is the most frequently changed *and* almost all
> places using ->f_cred are just looking at its fields and do not manipulate
> its refcount.  The only exception (do_process_acct()) is easy to eliminate
> just by storing a separate reference to the current creds of acct(2) caller
> and using it instead of looking at ->f_cred.  What's more, the place where we
> grab what will be ->f_cred is guaranteed to have a non-f_cred reference *and*
> most of the time such a reference is there for dropping ->f_cred (in
> file_free()/file_free_rcu()).
> 
> With that change in kernel/acct.c done, we could do the following:
> 	a) split the cred refcount into the normal and percpu parts and
> add a spinlock in there.
> 	b) have put_cred() do this:
> 		if (atomic_dec_and_test(&cred->usage)) {
> 			this_cpu_add(&cred->f_cred_usage, 1);
> 			call_rcu(&cred->rcu, put_f_cred_rcu);
> 		}
> 	c) have get_empty_filp() increment current_cred ->f_cred_usage with
> this_cpu_add()
> 	d) have file_free() do
> 		percpu_counter_dec(&nr_files);
> 		rcu_read_lock();
> 		if (likely(atomic_read(&f->f_cred->usage))) {
> 			this_cpu_add(&f->f_cred->f_cred_usage, -1);
> 			rcu_read_unlock();
> 			call_rcu(&f->f_u.fu_rcuhead, file_free_rcu_light);
> 		} else {
> 			rcu_read_unlock();
> 			call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
> 		}
> file_free_rcu() being
> static void file_free_rcu(struct rcu_head *head)
> {
>         struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
>         put_f_cred(&f->f_cred->rcu);
>         kmem_cache_free(filp_cachep, f);
> }
> and file_free_rcu_light() - the same sans put_f_cred();
> 
> with put_f_cred() doing
> 	spin_lock cred->lock
> 	this_cpu_add(&cred->f_cred_usage, -1);
> 	find the sum of cred->f_cred_usage
> 	spin_unlock cred->lock
> 	if the sum has not reached 0
> 		return
> 	current put_cred_rcu(cred)
> 
> IOW, let's try to get rid of cross-cpu stores in ->f_cred grabbing and
> (most of) ->f_cred dropping.
> 
> Note that there are two paths leading to put_f_cred() in the above - via
> call_rcu() on &cred->rcu and from file_free_rcu() called via call_rcu() on
> &f->f_u.fu_rcuhead.  Both are RCU-delayed and they can happen in parallel -
> different rcu_head are used.
> 
> atomic_read() check in file_free() might give false positives if it comes
> just before put_cred() on another CPU kills the last non-f_cred reference.
> It's not a problem, since put_f_cred() from that put_cred() won't be
> executed until we drop rcu_read_lock(), so we can safely decrement the
> cred->f_cred_usage without cred->lock here (and we are guaranteed that we won't
> be dropping the last of that - the same put_cred() would've incremented
> ->f_cred_usage).
> 
> Does anybody see problems with that approach?  I'm going to grab some sleep
> (only a couple of hours so far tonight ;-/), will cook an incremental to Eric's
> field-reordering patch when I get up...

Before I take a deep look at your suggestion, are you sure plain use of
include/linux/percpu-refcount.h infra is not possible for struct cred ?

Thanks !

  reply	other threads:[~2015-10-28 13:24 UTC|newest]

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-19 16:59 Fw: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3) Stephen Hemminger
2015-10-19 23:33 ` Eric Dumazet
2015-10-20  1:12   ` Alan Burlison
2015-10-20  1:45     ` Eric Dumazet
2015-10-20  9:59       ` Alan Burlison
2015-10-20 11:24         ` David Miller
2015-10-20 11:39           ` Alan Burlison
2015-10-20 13:19         ` Fw: " Eric Dumazet
2015-10-20 13:45           ` Alan Burlison
2015-10-20 15:30             ` Eric Dumazet
2015-10-20 18:31               ` Alan Burlison
2015-10-20 18:42                 ` Eric Dumazet
2015-10-21 10:25                 ` David Laight
2015-10-21 10:49                   ` Alan Burlison
2015-10-21 11:28                     ` Eric Dumazet
2015-10-21 13:03                       ` Alan Burlison
2015-10-21 13:29                         ` Eric Dumazet
2015-10-21  3:49       ` Al Viro
2015-10-21 14:38         ` Alan Burlison
2015-10-21 15:30           ` David Miller
2015-10-21 16:04             ` Casper.Dik
2015-10-21 21:18               ` Eric Dumazet
2015-10-21 21:28                 ` Al Viro
2015-10-21 16:32           ` Fw: " Eric Dumazet
2015-10-21 18:51           ` Al Viro
2015-10-21 20:33             ` Casper.Dik
2015-10-22  4:21               ` Al Viro
2015-10-22 10:55                 ` Alan Burlison
2015-10-22 18:16                   ` Al Viro
2015-10-22 20:15                     ` Alan Burlison
2015-11-02 10:03               ` David Laight
2015-11-02 10:29                 ` Al Viro
2015-10-21 22:28             ` Alan Burlison
2015-10-22  1:29             ` David Miller
2015-10-22  4:17               ` Alan Burlison
2015-10-22  4:44                 ` Al Viro
2015-10-22  6:03                   ` Al Viro
2015-10-22  6:34                     ` Casper.Dik
2015-10-22 17:21                       ` Al Viro
2015-10-22 18:24                         ` Casper.Dik
2015-10-22 19:07                           ` Al Viro
2015-10-22 19:51                             ` Casper.Dik
2015-10-22 21:57                               ` Al Viro
2015-10-23  9:52                                 ` Casper.Dik
2015-10-23 13:02                                   ` Eric Dumazet
2015-10-23 13:20                                     ` Casper.Dik
2015-10-23 13:48                                       ` Eric Dumazet
2015-10-23 14:13                                       ` Eric Dumazet
2015-10-23 13:35                                     ` Alan Burlison
2015-10-23 14:21                                       ` Eric Dumazet
2015-10-23 15:46                                         ` Alan Burlison
2015-10-23 16:00                                           ` Eric Dumazet
2015-10-23 16:07                                             ` Alan Burlison
2015-10-23 16:19                                             ` Eric Dumazet
2015-10-23 16:40                                               ` Alan Burlison
2015-10-23 17:47                                                 ` Eric Dumazet
2015-10-23 17:59                                                   ` [PATCH net-next] af_unix: do not report POLLOUT on listeners Eric Dumazet
2015-10-25 13:45                                                     ` David Miller
2015-10-24  2:30                                   ` [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3) Al Viro
2015-10-27  9:08                                     ` Casper.Dik
2015-10-27 10:52                                       ` Alan Burlison
2015-10-27 12:01                                         ` Eric Dumazet
2015-10-27 12:27                                           ` Alan Burlison
2015-10-27 12:44                                             ` Eric Dumazet
2015-10-27 13:42                                         ` David Miller
2015-10-27 13:37                                           ` Alan Burlison
2015-10-27 13:59                                             ` David Miller
2015-10-27 14:13                                               ` Alan Burlison
2015-10-27 14:39                                                 ` David Miller
2015-10-27 14:39                                                   ` Alan Burlison
2015-10-27 15:04                                                     ` David Miller
2015-10-27 15:53                                                       ` Alan Burlison
2015-10-27 23:17                                         ` Al Viro
2015-10-28  0:13                                           ` Eric Dumazet
2015-10-28 12:35                                             ` Al Viro
2015-10-28 13:24                                               ` Eric Dumazet [this message]
2015-10-28 14:47                                                 ` Eric Dumazet
2015-10-28 21:13                                                   ` Al Viro
2015-10-28 21:44                                                     ` Eric Dumazet
2015-10-28 22:33                                                       ` Al Viro
2015-10-28 23:08                                                         ` Eric Dumazet
2015-10-29  0:15                                                           ` Al Viro
2015-10-29  3:29                                                             ` Eric Dumazet
2015-10-29  4:16                                                               ` Al Viro
2015-10-29 12:35                                                                 ` Eric Dumazet
2015-10-29 13:48                                                                   ` Eric Dumazet
2015-10-30 17:18                                                                   ` Linus Torvalds
2015-10-30 21:02                                                                     ` Al Viro
2015-10-30 21:23                                                                       ` Linus Torvalds
2015-10-30 21:50                                                                         ` Linus Torvalds
2015-10-30 22:33                                                                           ` Al Viro
2015-10-30 23:52                                                                             ` Linus Torvalds
2015-10-31  0:09                                                                               ` Al Viro
2015-10-31 15:59                                                                               ` Eric Dumazet
2015-10-31 19:34                                                                               ` Al Viro
2015-10-31 19:54                                                                                 ` Linus Torvalds
2015-10-31 20:29                                                                                   ` Al Viro
2015-11-02  0:24                                                                                     ` Al Viro
2015-11-02  0:59                                                                                       ` Linus Torvalds
2015-11-02  2:14                                                                                       ` Eric Dumazet
2015-11-02  6:22                                                                                         ` Al Viro
2015-10-31 20:45                                                                                   ` Eric Dumazet
2015-10-31 21:23                                                                                     ` Linus Torvalds
2015-10-31 21:51                                                                                       ` Al Viro
2015-10-31 22:34                                                                                       ` Eric Dumazet
2015-10-31  1:07                                                                           ` Eric Dumazet
2015-10-28 16:04                                           ` Alan Burlison
2015-10-29 14:58                                         ` David Holland
2015-10-29 15:18                                           ` Alan Burlison
2015-10-29 16:01                                             ` David Holland
2015-10-29 16:15                                               ` Alan Burlison
2015-10-29 17:07                                                 ` Al Viro
2015-10-29 17:12                                                   ` Alan Burlison
2015-10-30  1:54                                                     ` David Miller
2015-10-30  1:55                                                   ` David Miller
2015-10-30  5:44                                                 ` David Holland
2015-10-30 17:43                                           ` David Laight
2015-10-30 21:09                                             ` Al Viro
2015-11-04 15:54                                               ` David Laight
2015-11-04 16:27                                                 ` Al Viro
2015-11-06 15:07                                                   ` David Laight
2015-11-06 19:31                                                     ` Al Viro
2015-10-22  6:51                   ` Casper.Dik
2015-10-22 11:18                     ` Alan Burlison
2015-10-22 11:15                   ` Alan Burlison
2015-10-22  6:15                 ` Casper.Dik
2015-10-22 11:30                   ` Eric Dumazet
2015-10-22 11:58                     ` Alan Burlison
2015-10-22 12:10                       ` Eric Dumazet
2015-10-22 13:12                         ` David Miller
2015-10-22 13:14                         ` Alan Burlison
2015-10-22 17:05                           ` Al Viro
2015-10-22 17:39                             ` Alan Burlison
2015-10-22 18:56                               ` Al Viro
2015-10-22 19:50                                 ` Casper.Dik
2015-10-23 17:09                                   ` Al Viro
2015-10-23 18:30           ` Fw: " David Holland
2015-10-23 19:51             ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1446038672.7476.65.camel@edumazet-glaptop2.roam.corp.google.com \
    --to=eric.dumazet@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dhowells@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=stephen@networkplumber.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.