linux-kernel.vger.kernel.org archive mirror
From: Ian Kent <raven@themaw.net>
To: Fox Chen <foxhlchen@gmail.com>
Cc: akpm@linux-foundation.org, dhowells@redhat.com,
	Greg KH <gregkh@linuxfoundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	miklos@szeredi.hu, ricklind@linux.vnet.ibm.com,
	sfr@canb.auug.org.au, Tejun Heo <tj@kernel.org>,
	viro@zeniv.linux.org.uk
Subject: Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement
Date: Thu, 17 Dec 2020 18:09:43 +0800
Message-ID: <c4002127c72c07a00e8ba0fae6b0ebf5ba8e08e7.camel@themaw.net>
In-Reply-To: <CAC2o3DJsvB6kj=S6D3q+_OBjgez9Q9B5s3-_gjUjaKmb2MkTHQ@mail.gmail.com>

On Thu, 2020-12-17 at 16:54 +0800, Fox Chen wrote:
> On Thu, Dec 17, 2020 at 12:46 PM Ian Kent <raven@themaw.net> wrote:
> > On Tue, 2020-12-15 at 20:59 +0800, Ian Kent wrote:
> > > On Tue, 2020-12-15 at 16:33 +0800, Fox Chen wrote:
> > > > On Mon, Dec 14, 2020 at 9:30 PM Ian Kent <raven@themaw.net>
> > > > wrote:
> > > > > On Mon, 2020-12-14 at 14:14 +0800, Fox Chen wrote:
> > > > > > > On Sun, Dec 13, 2020 at 11:46 AM Ian Kent <raven@themaw.net>
> > > > > > > wrote:
> > > > > > > On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > > > > > > > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > > > > > > > > For the patches, there is a mutex_lock in kn->attr_mutex,
> > > > > > > > > > > as Tejun mentioned here
> > > > > > > > > > > (https://lore.kernel.org/lkml/X8fe0cmu+aq1gi7O@mtj.duckdns.org/),
> > > > > > > > > > > maybe a global rwsem for kn->iattr will be better??
> > > > > > > > > 
> > > > > > > > > > I wasn't sure about that, IIRC a spin lock could be used
> > > > > > > > > > around the initial check and checked again at the end which
> > > > > > > > > > would probably have been much faster but much less
> > > > > > > > > > conservative and a bit more ugly so I just went the
> > > > > > > > > > conservative path since there was so much change already.
> > > > > > > > 
> > > > > > > > > Sorry, I hadn't looked at Tejun's reply yet and TBH didn't
> > > > > > > > > remember it.
> > > > > > > > > 
> > > > > > > > > Based on what Tejun said it sounds like that needs work.
> > > > > > > 
> > > > > > > > Those attribute handling patches were meant to allow taking
> > > > > > > > the rw sem read lock instead of the write lock for
> > > > > > > > kernfs_refresh_inode() updates, with the added locking to
> > > > > > > > protect the inode attributes update since it's called from
> > > > > > > > the VFS both with and without the inode lock.
> > > > > > 
> > > > > > > Oh, understood. I was also asking because the lock on
> > > > > > > kn->attr_mutex drags down concurrent performance.
> > > > > > 
> > > > > > > > Looking around it looks like kernfs_iattrs() is called from
> > > > > > > > multiple places without a node database lock at all.
> > > > > > > > 
> > > > > > > > I'm thinking that, to keep my proposed change straightforward
> > > > > > > > and on topic, I should just leave kernfs_refresh_inode()
> > > > > > > > taking the node db write lock for now and consider the
> > > > > > > > attributes handling as a separate change. Once that's done we
> > > > > > > > could reconsider what's needed to use the node db read lock
> > > > > > > > in kernfs_refresh_inode().
> > > > > > 
> > > > > > You meant taking write lock of kernfs_rwsem for
> > > > > > kernfs_refresh_inode()??
> > > > > > It may be a lot slower in my benchmark, let me test it.
> > > > > 
> > > > > > Yes, but make sure the write lock of kernfs_rwsem is being taken
> > > > > > not the read lock.
> > > > > > 
> > > > > > That's a mistake I made initially.
> > > > > > 
> > > > > > Still, that attributes handling is, I think, sufficient to warrant
> > > > > > a separate change since it looks like it might need work, the
> > > > > > kernfs node db probably should be kept stable for those attribute
> > > > > > updates but equally the existence of an instantiated dentry might
> > > > > > mitigate it.
> > > > > > 
> > > > > > Some people might just know whether it's ok or not but I would
> > > > > > like to check the callers to work out what's going on.
> > > > > > 
> > > > > > In any case it's academic if GKH isn't willing to consider the
> > > > > > series for review and possible merge.
> > > > > 
> > > > Hi Ian
> > > > 
> > > > > I removed kn->attr_mutex and changed the read lock to a write lock
> > > > > for kernfs_refresh_inode():
> > > > > 
> > > > > down_write(&kernfs_rwsem);
> > > > > kernfs_refresh_inode(kn, inode);
> > > > > up_write(&kernfs_rwsem);
> > > > > 
> > > > > 
> > > > > Unfortunately, changing it this way makes things worse, my benchmark
> > > > > runs 100% slower than upstream sysfs.  :(
> > > > > A concurrent open+read+close of a sysfs file took 1000us.
> > > > > (Currently, sysfs with the big kernfs_mutex only takes ~500us
> > > > > for one concurrent open+read+close operation.)
> > > 
> > > > Right, so it does need attention nowish.
> > > > 
> > > > I'll have a look at it in a while, I really need to get a new autofs
> > > > release out, and there are quite a few changes, and testing is seeing
> > > > a number of errors, some old, some newly introduced. It's proving
> > > > difficult.
> > 
> > > I've taken a breather from the autofs testing and had a look at
> > > this.
> 
> Thanks. :)
> 
> > I think my original analysis of this was wrong.
> > 
> > Could you try this patch please.
> > I'm not sure how much difference it will make but, in principle,
> > it's much the same as the previous approach except it doesn't
> > increase the kernfs node struct size or mess with the other
> > attribute handling code.
> > 
> > Note, this is not even compile tested.
> 
> I failed to apply this patch. So based on the original six patches, I
> manually removed kn->attr_mutex, and added
> inode_lock/inode_unlock to those two functions, they were like:
> 
> int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
>                        u32 request_mask, unsigned int query_flags)
> {
>         struct inode *inode = d_inode(path->dentry);
>         struct kernfs_node *kn = inode->i_private;
> 
>         inode_lock(inode);
>         down_read(&kernfs_rwsem);
>         kernfs_refresh_inode(kn, inode);
>         up_read(&kernfs_rwsem);
>         inode_unlock(inode);
> 
>         generic_fillattr(inode, stat);
>         return 0;
> }
> 
> int kernfs_iop_permission(struct inode *inode, int mask)
> {
>         struct kernfs_node *kn;
> 
>         if (mask & MAY_NOT_BLOCK)
>                 return -ECHILD;
> 
>         kn = inode->i_private;
> 
>         inode_lock(inode);
>         down_read(&kernfs_rwsem);
>         kernfs_refresh_inode(kn, inode);
>         up_read(&kernfs_rwsem);
>         inode_unlock(inode);
> 
>         return generic_permission(inode, mask);
> }
> 
> But I couldn't boot the kernel and there was no error on the screen.
> I guess it was deadlocked on /sys creation?? :D

Right, I guess the locking documentation is out of date. I'm guessing
the inode lock is taken somewhere over the .permission() call. If that
usage is consistent it's easily fixed, if the usage is inconsistent it's
hard to deal with and amounts to a bug.

I'll have another look at it.

Also, it sounds like I'm working from a more recent series.

I had 8 patches, dropped the last three and added the one I posted.
If I can work out what's going on I'll post the series for you to
check.

Ian

> 
> > kernfs: use kernfs read lock in .getattr() and .permission()
> > 
> > From: Ian Kent <raven@themaw.net>
> > 
> > > From Documentation/filesystems.rst and (slightly outdated) comments
> > > in fs/attr.c the inode i_rwsem is used for attribute handling.
> > > 
> > > This lock satisfies the requirements needed to reduce lock contention,
> > > namely a per-object lock needs to be used rather than a file system
> > > global lock with the kernfs node db held stable for read operations.
> > > 
> > > In particular it should reduce lock contention seen when calling the
> > > kernfs .permission() method.
> > > 
> > > The inode methods .getattr() and .permission() do not hold the inode
> > > i_rwsem lock when called as they are usually read operations. Also
> > > the .permission() method checks for rcu-walk mode and returns -ECHILD
> > > to the VFS if it is set. So the i_rwsem lock can be used in
> > > kernfs_iop_getattr() and kernfs_iop_permission() to protect the inode
> > > update done by kernfs_refresh_inode(). Using this lock allows the
> > > kernfs node db write lock in these functions to be changed to a read
> > > lock.
> > 
> > Signed-off-by: Ian Kent <raven@themaw.net>
> > ---
> >  fs/kernfs/inode.c |   12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> > index ddaf18198935..568037e9efe9 100644
> > --- a/fs/kernfs/inode.c
> > +++ b/fs/kernfs/inode.c
> > @@ -189,9 +189,11 @@ int kernfs_iop_getattr(const struct path
> > *path, struct kstat *stat,
> >         struct inode *inode = d_inode(path->dentry);
> >         struct kernfs_node *kn = inode->i_private;
> > 
> > -       down_write(&kernfs_rwsem);
> > +       inode_lock(inode);
> > +       down_read(&kernfs_rwsem);
> >         kernfs_refresh_inode(kn, inode);
> > -       up_write(&kernfs_rwsem);
> > +       up_read(&kernfs_rwsem);
> > +       inode_unlock(inode);
> > 
> >         generic_fillattr(inode, stat);
> >         return 0;
> > @@ -281,9 +283,11 @@ int kernfs_iop_permission(struct inode *inode,
> > int mask)
> > 
> >         kn = inode->i_private;
> > 
> > -       down_write(&kernfs_rwsem);
> > +       inode_lock(inode);
> > +       down_read(&kernfs_rwsem);
> >         kernfs_refresh_inode(kn, inode);
> > -       up_write(&kernfs_rwsem);
> > +       up_read(&kernfs_rwsem);
> > +       inode_unlock(inode);
> > 
> >         return generic_permission(inode, mask);
> >  }
> > 
> 
> thanks,
> fox

