From: Eric Dumazet <eric.dumazet@gmail.com>
To: Mateusz Guzik <mguzik@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Yann Droneaud <ydroneaud@opteya.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install
Date: Thu, 16 Apr 2015 15:52:20 -0700
Message-ID: <1429224740.7346.225.camel@edumazet-glaptop2.roam.corp.google.com>
In-Reply-To: <20150416220002.GB20615@mguzik>

On Fri, 2015-04-17 at 00:00 +0200, Mateusz Guzik wrote:
> On Thu, Apr 16, 2015 at 01:55:39PM -0700, Eric Dumazet wrote:
> > On Thu, 2015-04-16 at 13:42 -0700, Eric Dumazet wrote:
> > > On Thu, 2015-04-16 at 19:09 +0100, Al Viro wrote:
> > > > On Thu, Apr 16, 2015 at 02:16:31PM +0200, Mateusz Guzik wrote:
> > > > > @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, int nr)
> > > > >  	cur_fdt = files_fdtable(files);
> > > > >  	if (nr >= cur_fdt->max_fds) {
> > > > >  		/* Continue as planned */
> > > > > +		write_seqcount_begin(&files->fdt_seqcount);
> > > > >  		copy_fdtable(new_fdt, cur_fdt);
> > > > >  		rcu_assign_pointer(files->fdt, new_fdt);
> > > > > +		write_seqcount_end(&files->fdt_seqcount);
> > > > >  		if (cur_fdt != &files->fdtab)
> > > > >  			call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
> > > > 
> > > > Interesting.  AFAICS, your test doesn't step anywhere near that path,
> > > > does it?  So basically you never hit the retries during that...
> > > 
> > > Right, but then the table is almost never changed for a given process,
> > > as we only grow it in power-of-two steps.
> > > 
> > > (So I scratch my initial comment, fdt_seqcount is really mostly read)
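
[For reference, a rough sketch of what the matching fd_install() fast path
could look like under such an fdt_seqcount. This is hypothetical and not the
literal patch; fd_install_seq is a made-up name and error handling is
omitted.]

#include <linux/fdtable.h>
#include <linux/rcupdate.h>
#include <linux/seqlock.h>

/* Publish @file in slot @fd without taking files->file_lock. */
static void fd_install_seq(struct files_struct *files, unsigned int fd,
			   struct file *file)
{
	struct fdtable *fdt;
	unsigned int seq;

	rcu_read_lock();
	do {
		seq = read_seqcount_begin(&files->fdt_seqcount);
		fdt = rcu_dereference(files->fdt);
		/*
		 * Store into the current table; if expand_fdtable() copied
		 * it to a bigger one meanwhile, the retry redoes the store
		 * into the new table.
		 */
		rcu_assign_pointer(fdt->fd[fd], file);
	} while (read_seqcount_retry(&files->fdt_seqcount, seq));
	rcu_read_unlock();
}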
> > 
> > I tested Mateusz's patch with my opensock program, mimicking a bit more
> > closely what a server does (having a lot of sockets).
> > 
> > 24 threads running, doing close(randomfd())/socket() calls like crazy.
> > 
> > Before patch :
> > 
> > # time ./opensock 
> > 
> > real	0m10.863s
> > user	0m0.954s
> > sys	2m43.659s
> > 
> > 
> > After patch :
> > 
> > # time ./opensock
> > 
> > real	0m9.750s
> > user	0m0.804s
> > sys	2m18.034s
> > 
> > So this is an improvement for sure, but not massive.
> > 
> > perf record ./opensock ; perf report
> > 
> >     87.80%  opensock  [kernel.kallsyms]  [k] _raw_spin_lock                     
> >                |--52.70%-- __close_fd
> >                |--46.41%-- __alloc_fd
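
[For context, a minimal userspace sketch of that kind of workload. This is
hypothetical and not the actual opensock source; worker, NR_THREADS and
MAX_FD are made up. Each thread closes a random fd and immediately asks for
a new socket, so every iteration exercises fd allocation and release. Build
with -pthread.]

#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

#define NR_THREADS 24
#define MAX_FD     65536

static void *worker(void *arg)
{
	unsigned int seed = (unsigned int)(unsigned long)arg;

	for (;;) {
		close(rand_r(&seed) % MAX_FD);    /* free a random slot */
		socket(AF_INET, SOCK_STREAM, 0);  /* grab the lowest free fd */
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_THREADS];

	for (long i = 0; i < NR_THREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)(i + 1));
	pause();	/* run until interrupted */
	return 0;
}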
> 
> My crap benchmark is here: http://people.redhat.com/~mguzik/pipebench.c
> (compile with -pthread, run with -s 10 -n 16 for 10 second test + 16
> threads)
> 
> As noted earlier, it tends to go from roughly 300k ops/s to 400k.
> 
> The fundamental problem here seems to be the pesky POSIX requirement of
> providing the lowest possible fd on each allocation (as a side note,
> Linux breaks this with parallel fd allocations, where one of them backs
> off its reservation; not that I believe this causes trouble).

Note that POSIX never talked about multiple threads. The POSIX requirement
came from traditional Unix stdin/stdout/stderr handling and legacy programs,
before dup2() even existed.
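
[To illustrate what that requirement means in practice, a tiny standalone
example, not from this thread: since open() must return the lowest unused
descriptor, closing stdin guarantees the next open() reuses fd 0.]

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	close(0);				/* free stdin's slot */
	int fd = open("/dev/null", O_RDONLY);	/* must get fd 0 back */
	printf("open() returned %d\n", fd);	/* prints 0 */
	return 0;
}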


> 
> Ideally a process-wide switch could be implemented (e.g.
> prctl(SCRATCH_LOWEST_FD_REQ)) which would grant the kernel the freedom
> to return any fd it wants, so it would be possible to have fd ranges
> per thread and the like.

I played months ago with a SOCK_FD_FASTALLOC flag ;)

The idea was to use a random starting point instead of 0.

But the bottleneck was really the spinlock, not the bit search, unless I
used 10 million fds in the program...
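
[A rough sketch of that idea; pick_fd_fastalloc is a hypothetical helper,
not the actual SOCK_FD_FASTALLOC code. It picks a random offset into the
open-fds bitmap and scans from there, wrapping once, instead of always
starting the search at 0.]

#include <linux/bitops.h>
#include <linux/random.h>

/* Returns a candidate fd, or @max_fds if the table is full. */
static unsigned int pick_fd_fastalloc(const unsigned long *open_fds,
				      unsigned int max_fds)
{
	unsigned int start = prandom_u32() % max_fds;
	unsigned int fd;

	fd = find_next_zero_bit(open_fds, max_fds, start);
	if (fd >= max_fds)	/* nothing free above start, wrap around */
		fd = find_next_zero_bit(open_fds, max_fds, 0);
	return fd;
}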

> 
> Having only an O_SCRATCH_POSIX flag passed to syscalls would still leave
> close() as a bottleneck.
> 
> In the meantime I consider the approach taken in my patch an acceptable
> temporary improvement.

Yes, please formally submit this patch.

Note that adding atomic bit operations could eventually allow us not to
hold the spinlock at close() time.
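
[A rough sketch of what that could mean; claim_fd_for_close is a
hypothetical helper covering only the descriptor-claiming step. The real
__close_fd() would still have to handle table resizing and the next-fd
hint.]

#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/fdtable.h>

static int claim_fd_for_close(struct fdtable *fdt, unsigned int fd)
{
	if (fd >= fdt->max_fds)
		return -EBADF;
	/*
	 * Atomically clear the "open" bit; if it was already clear,
	 * someone else closed this fd first.
	 */
	if (!test_and_clear_bit(fd, fdt->open_fds))
		return -EBADF;
	return 0;
}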




Thread overview: 36+ messages
2015-04-16 12:16 [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install Mateusz Guzik
2015-04-16 17:47 ` Eric Dumazet
2015-04-16 18:09 ` Al Viro
2015-04-16 20:42   ` Eric Dumazet
2015-04-16 20:55     ` Eric Dumazet
2015-04-16 22:00       ` Mateusz Guzik
2015-04-16 22:52         ` Eric Dumazet [this message]
2015-04-16 22:35   ` Mateusz Guzik
2015-04-17 21:46 ` Eric Dumazet
2015-04-17 22:16   ` Mateusz Guzik
2015-04-17 23:02     ` Al Viro
2015-04-18 19:41       ` Eric Dumazet
2015-04-20 13:41         ` Mateusz Guzik
2015-04-20 16:46           ` Eric Dumazet
2015-04-20 16:48             ` Eric Dumazet
2015-04-20 13:06       ` Mateusz Guzik
2015-04-20 13:43         ` Mateusz Guzik
2015-04-20 15:10           ` Mateusz Guzik
2015-04-20 17:15             ` Eric Dumazet
2015-04-20 20:49               ` Eric Dumazet
2015-04-21 18:05                 ` Eric Dumazet
2015-04-21 20:06                   ` Mateusz Guzik
2015-04-21 20:12                     ` Mateusz Guzik
2015-04-21 21:06                       ` Eric Dumazet
2015-04-22  4:59                         ` [PATCH] fs/file.c: don't acquire files->file_lock in fd_install() Eric Dumazet
2015-04-27 19:05                           ` Mateusz Guzik
2015-04-28 16:20                             ` Eric Dumazet
2015-04-29  4:25                           ` [PATCH v2] " Eric Dumazet
2015-06-22  2:32                             ` Al Viro
2015-06-22  2:32                               ` Al Viro
2015-06-23  5:31                               ` Eric Dumazet
2015-06-23  5:31                                 ` Eric Dumazet
2015-06-30 13:54                             ` [PATCH v3] " Eric Dumazet
2015-04-22 13:31                         ` [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install Mateusz Guzik
2015-04-22 13:55                           ` Eric Dumazet
2015-04-21 20:57                     ` Eric Dumazet
