From: Ian Kent <raven@themaw.net>
To: Fox Chen <foxhlchen@gmail.com>,
	Greg KH <gregkh@linuxfoundation.org>, Tejun Heo <tj@kernel.org>
Cc: akpm@linux-foundation.org, dhowells@redhat.com,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	miklos@szeredi.hu, ricklind@linux.vnet.ibm.com,
	sfr@canb.auug.org.au, viro@zeniv.linux.org.uk
Subject: Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement
Date: Thu, 17 Dec 2020 19:48:49 +0800
Message-ID: <c8a6c9adc3651e64cf694f580a8cb3d87d7cb893.camel@themaw.net>
In-Reply-To: <a39b73a53778094279522f1665be01ce15fb21f4.camel@themaw.net>

On Thu, 2020-12-17 at 19:09 +0800, Ian Kent wrote:
> On Thu, 2020-12-17 at 18:09 +0800, Ian Kent wrote:
> > On Thu, 2020-12-17 at 16:54 +0800, Fox Chen wrote:
> > > On Thu, Dec 17, 2020 at 12:46 PM Ian Kent <raven@themaw.net>
> > > wrote:
> > > > On Tue, 2020-12-15 at 20:59 +0800, Ian Kent wrote:
> > > > > On Tue, 2020-12-15 at 16:33 +0800, Fox Chen wrote:
> > > > > > On Mon, Dec 14, 2020 at 9:30 PM Ian Kent <raven@themaw.net>
> > > > > > wrote:
> > > > > > > On Mon, 2020-12-14 at 14:14 +0800, Fox Chen wrote:
> > > > > > > > On Sun, Dec 13, 2020 at 11:46 AM Ian Kent <raven@themaw.net>
> > > > > > > > wrote:
> > > > > > > > > On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > > > > > > > > > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > > > > > > > > > For the patches, there is a mutex_lock in kn->attr_mutex,
> > > > > > > > > > > > as Tejun mentioned here
> > > > > > > > > > > > (https://lore.kernel.org/lkml/X8fe0cmu+aq1gi7O@mtj.duckdns.org/),
> > > > > > > > > > > > maybe a global rwsem for kn->iattr will be better??
> > > > > > > > > > > 
> > > > > > > > > > > I wasn't sure about that, IIRC a spin lock could be used around
> > > > > > > > > > > the initial check and checked again at the end which would
> > > > > > > > > > > probably have been much faster but much less conservative and a
> > > > > > > > > > > bit more ugly so I just went the conservative path since there
> > > > > > > > > > > was so much change already.
> > > > > > > > > > 
> > > > > > > > > > Sorry, I hadn't looked at Tejun's reply yet and TBH didn't
> > > > > > > > > > remember it.
> > > > > > > > > > 
> > > > > > > > > > Based on what Tejun said it sounds like that needs work.
> > > > > > > > > 
> > > > > > > > > Those attribute handling patches were meant to allow taking the
> > > > > > > > > rw sem read lock instead of the write lock for
> > > > > > > > > kernfs_refresh_inode() updates, with the added locking to protect
> > > > > > > > > the inode attributes update since it's called from the VFS both
> > > > > > > > > with and without the inode lock.
> > > > > > > > 
> > > > > > > > Oh, understood. I was asking also because lock on kn->attr_mutex
> > > > > > > > drags concurrent performance.
> > > > > > > > 
> > > > > > > > > Looking around it looks like kernfs_iattrs() is called from
> > > > > > > > > multiple places without a node database lock at all.
> > > > > > > > > 
> > > > > > > > > I'm thinking that, to keep my proposed change straight forward
> > > > > > > > > and on topic, I should just leave kernfs_refresh_inode() taking
> > > > > > > > > the node db write lock for now and consider the attributes
> > > > > > > > > handling as a separate change. Once that's done we could
> > > > > > > > > reconsider what's needed to use the node db read lock in
> > > > > > > > > kernfs_refresh_inode().
> > > > > > > > 
> > > > > > > > You meant taking write lock of kernfs_rwsem for
> > > > > > > > kernfs_refresh_inode()??
> > > > > > > > It may be a lot slower in my benchmark, let me test it.
> > > > > > > 
> > > > > > > Yes, but make sure the write lock of kernfs_rwsem is being taken
> > > > > > > not the read lock.
> > > > > > > 
> > > > > > > That's a mistake I had initially?
> > > > > > > 
> > > > > > > Still, that attributes handling is, I think, sufficient to
> > > > > > > warrant a separate change since it looks like it might need work,
> > > > > > > the kernfs node db probably should be kept stable for those
> > > > > > > attribute updates but equally the existence of an instantiated
> > > > > > > dentry might mitigate it.
> > > > > > > 
> > > > > > > Some people might just know whether it's ok or not but I would
> > > > > > > like to check the callers to work out what's going on.
> > > > > > > 
> > > > > > > In any case it's academic if GKH isn't willing to consider the
> > > > > > > series for review and possible merge.
> > > > > > > 
> > > > > > Hi Ian
> > > > > > 
> > > > > > I removed kn->attr_mutex and changed read lock to write lock for
> > > > > > kernfs_refresh_inode
> > > > > > 
> > > > > > down_write(&kernfs_rwsem);
> > > > > > kernfs_refresh_inode(kn, inode);
> > > > > > up_write(&kernfs_rwsem);
> > > > > > 
> > > > > > 
> > > > > > Unfortunately, changing it this way makes things worse: my
> > > > > > benchmark runs 100% slower than upstream sysfs.  :(
> > > > > > A concurrent open+read+close of a sysfs file took 1000us.
> > > > > > (Currently, sysfs with the big kernfs_mutex only takes ~500us
> > > > > > for one concurrent open+read+close operation.)
> > > > > 
> > > > > Right, so it does need attention nowish.
> > > > > 
> > > > > I'll have a look at it in a while, I really need to get a new
> > > > > autofs release out, and there are quite a few changes, and
> > > > > testing is seeing a number of errors, some old, some newly
> > > > > introduced. It's proving difficult.
> > > > 
> > > > I've taken a breather for the autofs testing and had a look at
> > > > this.
> > > 
> > > Thanks. :)
> > > 
> > > > I think my original analysis of this was wrong.
> > > > 
> > > > Could you try this patch please.
> > > > I'm not sure how much difference it will make but, in
> > > > principle,
> > > > it's much the same as the previous approach except it doesn't
> > > > increase the kernfs node struct size or mess with the other
> > > > attribute handling code.
> > > > 
> > > > Note, this is not even compile tested.
> > > 
> > > I failed to apply this patch. So based on the original six patches,
> > > I manually removed kn->attr_mutex, and added
> > > inode_lock/inode_unlock to those two functions, they were like:
> > > 
> > > int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
> > >                        u32 request_mask, unsigned int query_flags)
> > > {
> > >         struct inode *inode = d_inode(path->dentry);
> > >         struct kernfs_node *kn = inode->i_private;
> > > 
> > >         inode_lock(inode);
> > >         down_read(&kernfs_rwsem);
> > >         kernfs_refresh_inode(kn, inode);
> > >         up_read(&kernfs_rwsem);
> > >         inode_unlock(inode);
> > > 
> > >         generic_fillattr(inode, stat);
> > >         return 0;
> > > }
> > > 
> > > int kernfs_iop_permission(struct inode *inode, int mask)
> > > {
> > >         struct kernfs_node *kn;
> > > 
> > >         if (mask & MAY_NOT_BLOCK)
> > >                 return -ECHILD;
> > > 
> > >         kn = inode->i_private;
> > > 
> > >         inode_lock(inode);
> > >         down_read(&kernfs_rwsem);
> > >         kernfs_refresh_inode(kn, inode);
> > >         up_read(&kernfs_rwsem);
> > >         inode_unlock(inode);
> > > 
> > >         return generic_permission(inode, mask);
> > > }
> > > 
> > > But I couldn't boot the kernel and there was no error on the
> > > screen.
> > > I guess it was deadlocked on /sys creation?? :D
> > 
> > Right, I guess the locking documentation is out of date. I'm
> > guessing the inode lock is taken somewhere over the .permission()
> > call. If that usage is consistent it's easily fixed, if the usage
> > is inconsistent it's hard to deal with and amounts to a bug.
> 
> Yes, it is called, both shared on open, and exclusive on open
> create, and without the inode lock at all at the start of path
> resolution.
> 
> That can't really be called a VFS bug since .permission() is
> meant to check permissions not update the inode.
> 
> This is probably what led to the attr patches I had.
> 
> If a suitable place to put a local per-object lock can't be
> found for this, other than in the kernfs_node, then it's a
> real problem from a contention POV.
> 
> What could be done is to make the kernfs node attr_mutex
> a pointer and dynamically allocate it but even that is too
> costly a size addition to the kernfs node structure as
> Tejun has said.

I guess the question to ask is, is there really a need to
call kernfs_refresh_inode() from functions that are usually
reading/checking functions?
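
To make the concern concrete, here is a rough user-space sketch
(pthreads, not kernel code) of why a check-style hook cannot simply
take the inode lock itself. All model_* names are made up for
illustration, a plain mutex stands in for i_rwsem, and
pthread_mutex_trylock() is used so the model reports the conflict
rather than hanging:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for an inode and its i_rwsem. */
struct model_inode {
	pthread_mutex_t lock;
	unsigned int mode;
};

/*
 * Models a .permission()-style hook that wants the inode lock.
 * trylock is used so the model returns an error instead of hanging;
 * a blocking lock here would never return when the caller holds it.
 */
static int model_permission(struct model_inode *inode)
{
	int rc = pthread_mutex_trylock(&inode->lock);

	if (rc)
		return -rc;	/* -EBUSY: the lock is already held */
	/* the inode attributes would be refreshed here */
	pthread_mutex_unlock(&inode->lock);
	return 0;
}

/* Models path resolution: the hook is called without the lock held. */
static int model_path_walk(struct model_inode *inode)
{
	return model_permission(inode);
}

/* Models an open with create: the hook is called with the lock held. */
static int model_open_create(struct model_inode *inode)
{
	int rc = pthread_mutex_lock(&inode->lock);

	assert(rc == 0);	/* an uncontended lock should not fail */
	rc = model_permission(inode);
	pthread_mutex_unlock(&inode->lock);
	return rc;
}
```

In this model model_path_walk() succeeds, while model_open_create()
gets -EBUSY back from the hook; with a blocking lock that second
case would be the hang.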

Would it be sufficient to refresh the inode in the write/set
operations, in any places (if there are any) where something
like setattr_copy() is not already called?
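
A minimal user-space sketch of that idea, assuming every set path
can reach the inode (which is exactly the open question). The
model_* names are hypothetical, a mutex stands in for the inode
lock, and a C11 atomic stands in for the lockless reads the check
paths would do:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the attributes kept in a kernfs node. */
struct model_node {
	unsigned int mode;
};

/* Hypothetical stand-in for the VFS inode caching those attributes. */
struct model_inode {
	pthread_mutex_t lock;		/* stands in for the inode lock */
	_Atomic unsigned int mode;	/* read locklessly by check paths */
	struct model_node *node;
};

/* Copy the node attributes into the inode (the "refresh" step). */
static void model_refresh_inode(struct model_inode *inode)
{
	assert(inode->node != NULL);
	atomic_store(&inode->mode, inode->node->mode);
}

/* Write/set path: update the node, then refresh the inode right away. */
static void model_setattr(struct model_inode *inode, unsigned int mode)
{
	pthread_mutex_lock(&inode->lock);	/* serialises writers */
	inode->node->mode = mode;
	model_refresh_inode(inode);
	pthread_mutex_unlock(&inode->lock);
}

/* Read/check path: no refresh and no lock, just a read of the inode. */
static int model_may_write(struct model_inode *inode)
{
	return (atomic_load(&inode->mode) & 0200) != 0;
}
```

Here model_may_write() never refreshes or locks; it only sees a new
mode once model_setattr() has pushed it into the inode.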

Perhaps GKH or Tejun could comment on this?

Ian

> 
> Those patches I referred to clearly aren't finished because
> the eighth one is empty, which followed a patch I have titled
> "kernfs: make attr_mutex a local kernfs node lock".
> 
> I obviously gave up on it when the series was rejected.
> But I'll give it some more thought.
> 
> Ian
> 
> > I'll have another look at it.
> > 
> > Also, it sounds like I'm working from a more recent series.
> > 
> > I had 8 patches, dropped the last three and added the one I posted.
> > If I can work out what's going on I'll post the series for you to
> > check.
> > 
> > Ian
> > 
> > > > kernfs: use kernfs read lock in .getattr() and .permission()
> > > > 
> > > > From: Ian Kent <raven@themaw.net>
> > > > 
> > > > From Documentation/filesystems.rst and (slightly outdated) comments
> > > > in fs/attr.c the inode i_rwsem is used for attribute handling.
> > > > 
> > > > This lock satisfies the requirements needed to reduce lock
> > > > contention, namely a per-object lock needs to be used rather than a
> > > > file system global lock with the kernfs node db held stable for
> > > > read operations.
> > > > 
> > > > In particular it should reduce lock contention seen when calling
> > > > the kernfs .permission() method.
> > > > 
> > > > The inode methods .getattr() and .permission() do not hold the
> > > > inode i_rwsem lock when called as they are usually read operations.
> > > > Also the .permission() method checks for rcu-walk mode and returns
> > > > -ECHILD to the VFS if it is set. So the i_rwsem lock can be used in
> > > > kernfs_iop_getattr() and kernfs_iop_permission() to protect the
> > > > inode update done by kernfs_refresh_inode(). Using this lock allows
> > > > the kernfs node db write lock in these functions to be changed to a
> > > > read lock.
> > > > 
> > > > Signed-off-by: Ian Kent <raven@themaw.net>
> > > > ---
> > > >  fs/kernfs/inode.c |   12 ++++++++----
> > > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> > > > index ddaf18198935..568037e9efe9 100644
> > > > --- a/fs/kernfs/inode.c
> > > > +++ b/fs/kernfs/inode.c
> > > > @@ -189,9 +189,11 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
> > > >         struct inode *inode = d_inode(path->dentry);
> > > >         struct kernfs_node *kn = inode->i_private;
> > > > 
> > > > -       down_write(&kernfs_rwsem);
> > > > +       inode_lock(inode);
> > > > +       down_read(&kernfs_rwsem);
> > > >         kernfs_refresh_inode(kn, inode);
> > > > -       up_write(&kernfs_rwsem);
> > > > +       up_read(&kernfs_rwsem);
> > > > +       inode_unlock(inode);
> > > > 
> > > >         generic_fillattr(inode, stat);
> > > >         return 0;
> > > > @@ -281,9 +283,11 @@ int kernfs_iop_permission(struct inode *inode, int mask)
> > > > 
> > > >         kn = inode->i_private;
> > > > 
> > > > -       down_write(&kernfs_rwsem);
> > > > +       inode_lock(inode);
> > > > +       down_read(&kernfs_rwsem);
> > > >         kernfs_refresh_inode(kn, inode);
> > > > -       up_write(&kernfs_rwsem);
> > > > +       up_read(&kernfs_rwsem);
> > > > +       inode_unlock(inode);
> > > > 
> > > >         return generic_permission(inode, mask);
> > > >  }
> > > > 
> > > 
> > > thanks,
> > > fox


Thread overview: 62+ messages
2020-06-17  7:37 [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement Ian Kent
2020-06-17  7:37 ` [PATCH v2 1/6] kernfs: switch kernfs to use an rwsem Ian Kent
2020-06-17  7:37 ` [PATCH v2 2/6] kernfs: move revalidate to be near lookup Ian Kent
2020-06-17  7:37 ` [PATCH v2 3/6] kernfs: improve kernfs path resolution Ian Kent
2020-06-17  7:38 ` [PATCH v2 4/6] kernfs: use revision to identify directory node changes Ian Kent
2020-06-17  7:38 ` [PATCH v2 5/6] kernfs: refactor attr locking Ian Kent
2020-06-17  7:38 ` [PATCH v2 6/6] kernfs: make attr_mutex a local kernfs node lock Ian Kent
2020-06-19 15:38 ` [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement Tejun Heo
2020-06-19 20:41   ` Rick Lindsley
2020-06-19 22:23     ` Tejun Heo
2020-06-20  2:44       ` Rick Lindsley
2020-06-22 17:53         ` Tejun Heo
2020-06-22 21:22           ` Rick Lindsley
2020-06-23 23:13             ` Tejun Heo
2020-06-24  9:04               ` Rick Lindsley
2020-06-24  9:27                 ` Greg Kroah-Hartman
2020-06-24 13:19                 ` Tejun Heo
2020-06-25  8:15               ` Ian Kent
2020-06-25  9:43                 ` Greg Kroah-Hartman
2020-06-26  0:19                   ` Ian Kent
2020-06-21  4:55       ` Ian Kent
2020-06-22 17:48         ` Tejun Heo
2020-06-22 18:03           ` Greg Kroah-Hartman
2020-06-22 21:27             ` Rick Lindsley
2020-06-23  5:21               ` Greg Kroah-Hartman
2020-06-23  5:09             ` Ian Kent
2020-06-23  6:02               ` Greg Kroah-Hartman
2020-06-23  8:01                 ` Ian Kent
2020-06-23  8:29                   ` Ian Kent
2020-06-23 11:49                   ` Greg Kroah-Hartman
2020-06-23  9:33                 ` Rick Lindsley
2020-06-23 11:45                   ` Greg Kroah-Hartman
2020-06-23 22:55                     ` Rick Lindsley
2020-06-23 11:51                   ` Ian Kent
2020-06-21  3:21   ` Ian Kent
2020-12-10 16:44 ` Fox Chen
2020-12-11  2:01   ` [PATCH " Ian Kent
2020-12-11  2:17     ` Ian Kent
2020-12-13  3:46       ` Ian Kent
2020-12-14  6:14         ` Fox Chen
2020-12-14 13:30           ` Ian Kent
2020-12-15  8:33             ` Fox Chen
2020-12-15 12:59               ` Ian Kent
2020-12-17  4:46                 ` Ian Kent
2020-12-17  8:54                   ` Fox Chen
2020-12-17 10:09                     ` Ian Kent
2020-12-17 11:09                       ` Ian Kent
2020-12-17 11:48                         ` Ian Kent [this message]
2020-12-17 15:14                           ` Tejun Heo
2020-12-18  7:36                             ` Ian Kent
2020-12-18  8:01                               ` Fox Chen
2020-12-18 11:21                                 ` Ian Kent
2020-12-18 13:20                                   ` Fox Chen
2020-12-19  0:53                                     ` Ian Kent
2020-12-19  7:47                                       ` Fox Chen
2020-12-22  2:17                                         ` Ian Kent
2020-12-18 14:59                               ` Tejun Heo
2020-12-19  7:08                                 ` Ian Kent
2020-12-19 16:23                                   ` Tejun Heo
2020-12-19 23:52                                     ` Ian Kent
2020-12-20  1:37                                       ` Ian Kent
2020-12-21  9:28                                       ` Fox Chen
