From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
"Chen, Tim C" <tim.c.chen@intel.com>,
Ingo Molnar <mingo@redhat.com>, Davidlohr Bueso <dbueso@suse.de>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Jason Low <jason.low2@hp.com>,
Michel Lespinasse <walken@google.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Waiman Long <waiman.long@hp.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: performance delta after VFS i_mutex=>i_rwsem conversion
Date: Tue, 7 Jun 2016 01:44:27 +0100 [thread overview]
Message-ID: <20160607004427.GI14480@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20160607004058.GH14480@ZenIV.linux.org.uk>
On Tue, Jun 07, 2016 at 01:40:58AM +0100, Al Viro wrote:
> On Mon, Jun 06, 2016 at 04:50:59PM -0700, Linus Torvalds wrote:
> >
> >
> > On Mon, 6 Jun 2016, Al Viro wrote:
> > >
> > > True in general, but here we really do a lot under that ->d_lock - all
> > > list traversals are under it. So I suspect that contention on nested
> > > lock is not an issue in that particular load. It's certainly a separate
> > > commit, so we'll see how much does it give on its own, but I doubt that
> > > it'll be anywhere near enough.
> >
> > Hmm. Maybe.
> >
> > But at least we can try to minimize everything that happens under the
> > dentry->d_lock spinlock.
> >
> > So how about this patch? It's entirely untested, but it rewrites that
> > readdir() function to try to do the minimum possible under the d_lock
> > spinlock.
> >
> > I say "rewrite", because it really is totally different. It's not just
> > that the nested "next" locking is gone, it also treats the cursor very
> > differently and tries to avoid doing any unnecessary cursor list
> > operations.
>
> Similar to what I've got here, except that mine has a couple of helper
> functions usable in dcache_dir_lseek() as well:
>
> next_positive(parent, child, n) - returns nth positive child after that one
> or NULL if there's less than n such. NULL as the second argument => search
> from the beginning.
>
> move_cursor(cursor, child) - moves cursor immediately past child *or* to
> the very end if child is NULL.
>
> The third commit in series will be the lockless replacement for
> for next_positive(). move_cursor() is easy - it became simply
> struct dentry *parent = cursor->d_parent;
> unsigned n, *seq = &parent->d_inode->i_dir_seq;
> spin_lock(&parent->d_lock);
> for (;;) {
> n = *seq;
> if (!(n & 1) && cmpxchg(seq, n, n + 1) == n)
> break;
> cpu_relax();
> }
> __list_del(cursor->d_child.prev, cursor->d_child.next);
> if (child)
> list_add(&cursor->d_child, &child->d_child);
> else
> list_add_tail(&cursor->d_child, &parent->d_subdirs);
> smp_store_release(seq, n + 2);
> spin_unlock(&parent->d_lock);
>
> with
>
> static struct dentry *next_positive(struct dentry *parent,
> struct dentry *child, int count)
> {
> struct list_head *p = child ? &child->d_child : &parent->d_subdirs;
> unsigned *seq = &parent->d_inode->i_dir_seq, n;
> do {
> int i = count;
> n = smp_load_acquire(seq) & ~1;
> rcu_read_lock();
> do {
> p = p->next;
> if (p == &parent->d_subdirs) {
> child = NULL;
> break;
> }
> child = list_entry(p, struct dentry, d_child);
> } while (!simple_positive(child) || --i);
> rcu_read_unlock();
> } while (unlikely(smp_load_acquire(seq) != n));
> return child;
> }
> as initial attempt at lockless next_positive(); barriers are probably wrong,
> though...
FWIW,
loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
{
struct dentry *dentry = file->f_path.dentry;
switch (whence) {
case 1:
offset += file->f_pos;
case 0:
if (offset >= 0)
break;
default:
return -EINVAL;
}
if (offset != file->f_pos) {
file->f_pos = offset;
if (file->f_pos >= 2) {
struct dentry *cursor = file->private_data;
loff_t n = file->f_pos - 2;
inode_lock_shared(dentry->d_inode);
move_cursor(cursor, next_positive(dentry, NULL, n));
inode_unlock_shared(dentry->d_inode);
}
}
return offset;
}
and
int dcache_readdir(struct file *file, struct dir_context *ctx)
{
struct dentry *dentry = file->f_path.dentry;
struct dentry *cursor = file->private_data;
struct dentry *child, *next;
bool moved = false;
if (!dir_emit_dots(file, ctx))
return 0;
child = ctx->pos != 2 ? cursor : NULL;
while ((next = next_positive(dentry, child, 1)) != NULL) {
if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
d_inode(next)->i_ino, dt_type(d_inode(next))))
break;
moved = true;
child = next;
ctx->pos++;
}
if (moved)
move_cursor(cursor, child);
return 0;
}
is what the methods themselves became.
next prev parent reply other threads:[~2016-06-07 0:44 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-06 20:00 performance delta after VFS i_mutex=>i_rwsem conversion Dave Hansen
2016-06-06 20:46 ` Linus Torvalds
2016-06-06 21:13 ` Waiman Long
2016-06-06 21:20 ` Linus Torvalds
2016-06-07 3:22 ` Valdis.Kletnieks
2016-06-07 15:22 ` Waiman Long
2016-06-08 8:58 ` Ingo Molnar
2016-06-09 10:25 ` Ingo Molnar
2016-06-09 18:14 ` Dave Hansen
2016-06-09 20:10 ` Chen, Tim C
2016-06-06 21:15 ` Al Viro
2016-06-06 21:46 ` Linus Torvalds
2016-06-06 22:07 ` Al Viro
2016-06-06 23:50 ` Linus Torvalds
2016-06-06 23:59 ` Linus Torvalds
2016-06-07 0:29 ` Linus Torvalds
2016-06-07 0:40 ` Al Viro
2016-06-07 0:44 ` Al Viro [this message]
2016-06-07 0:58 ` Al Viro
2016-06-07 0:58 ` Linus Torvalds
2016-06-07 1:19 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160607004427.GI14480@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=dave.hansen@intel.com \
--cc=dbueso@suse.de \
--cc=jason.low2@hp.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=tim.c.chen@intel.com \
--cc=torvalds@linux-foundation.org \
--cc=waiman.long@hp.com \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).