linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tejun Heo <tj@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	hridya@google.com
Subject: Re: [RFC PATCH] kernfs: release kernfs_mutex before the inode allocation
Date: Wed, 17 Nov 2021 13:43:55 -0800	[thread overview]
Message-ID: <YZV3m1O6x3yQk/DZ@google.com> (raw)
In-Reply-To: <YZSxwM8ucqGsY1hq@kroah.com>

On Wed, Nov 17, 2021 at 08:39:44AM +0100, Greg Kroah-Hartman wrote:
> On Tue, Nov 16, 2021 at 11:27:56PM -0800, Minchan Kim wrote:
> > On Wed, Nov 17, 2021 at 07:44:44AM +0100, Greg Kroah-Hartman wrote:
> > > On Tue, Nov 16, 2021 at 01:36:01PM -0800, Minchan Kim wrote:
> > > > On Tue, Nov 16, 2021 at 08:49:46PM +0100, Greg Kroah-Hartman wrote:
> > > > > On Tue, Nov 16, 2021 at 11:43:17AM -0800, Minchan Kim wrote:
> > > > > > The kernfs implementation has big lock granularity(kernfs_rwsem) so
> > > > > > every kernfs-based(e.g., sysfs, cgroup, dmabuf) fs are able to compete
> > > > > > the lock. Thus, if one of userspace goes the sleep under holding
> > > > > > the lock for a long time, rest of them should wait it. A example is
> > > > > > the holder goes direct reclaim with the lock since it needs memory
> > > > > > allocation. Let's fix it at common technique that release the lock
> > > > > > and then allocate the memory. Fortunately, kernfs looks like have
> > > > > > an refcount so I hope it's fine.
> > > > > > 
> > > > > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > > > > ---
> > > > > >  fs/kernfs/dir.c             | 14 +++++++++++---
> > > > > >  fs/kernfs/inode.c           |  2 +-
> > > > > >  fs/kernfs/kernfs-internal.h |  1 +
> > > > > >  3 files changed, 13 insertions(+), 4 deletions(-)
> > > > > 
> > > > > What workload hits this lock to cause it to be noticable?
> > > > 
> > > > A app launching since it was dropping the frame since the
> > > > latency was too long.
> > > 
> > > How does running a program interact with kernfs filesystems?  Which
> > > one(s)?
> > 
> > A app launching involves dma_buf exports which creates kobject
> > and add it to the kernfs with down_write - kernfs_add_one.
> 
> I thought the "create a dma_buf kobject" kernel change was fixed up to
> not do that anymore as that was a known performance issue.
> 
> Creating kobjects should NOT be on a fast path, if they are, that needs
> to be fixed.

Ccing Hridya
I also already mentioned before but it's the as-is.
It would be great to be fixed but kernfs still has the problem
regardless of the dmabuf.

For example, process A hold the lock as read-side and try to
allocate memory and entered the reclaim. process B try to
hold the lock as writes-side(e.g., kernfs_iop_setattr) but
he should wait until process A completes. Next, processs C is
also stuck on the lock as read-side when it tries to access
sysfs since the process B spent a threahold timeout in rwsem.
Here, process C could be critical role to contribute the jank.
What it was doing is just access sysfs but was stuck.

> 
> > At the same time in other CPU, a random process was accessing
> > sysfs and the kernfs_iop_lookup was already hoding the kernfs_rwsem
> > and ran under direct reclaim patch due to alloc_inode in
> > kerfs_get_inode.
> 
> What program is constantly hitting sysfs?  sysfs is not for
> performance-critical things, right?

Constantly hitting sysfs itself is no problem since they need
to read telemetry data from sysfs and it's not latency sensitive.
Here, problem is the reading kernfs could make trouble others who
want to access once the rwsem is involved with exclusive lock mode.
Please look at above scenario.

> 
> > Therefore, the app is stuck on the lock and lose frames so enduser
> > sees the jank.
> 
> But how does this patch work around it?  It seems like you are
> special-casing the kobject creation path only.

It's true. If other path also has the memory allocation with holding
kernfs_rwsem, it could make trouble.

> 
> And is this the case really on 5.15?  I thought the kernfs locks were
> broken up again to not cause this problem in 5.14 or so.

It happened on 5.10 but the path I mentioned were still vulnerable path
with rwsem so it could happen on 5.15, too.

  reply	other threads:[~2021-11-17 21:44 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-16 19:43 Minchan Kim
2021-11-16 19:49 ` Greg Kroah-Hartman
2021-11-16 21:36   ` Minchan Kim
2021-11-17  6:44     ` Greg Kroah-Hartman
2021-11-17  7:27       ` Minchan Kim
2021-11-17  7:39         ` Greg Kroah-Hartman
2021-11-17 21:43           ` Minchan Kim [this message]
2021-11-17 21:45         ` Tejun Heo
2021-11-17 22:13           ` Minchan Kim
2021-11-17 22:23             ` Tejun Heo
2021-11-18  1:55               ` Minchan Kim
2021-11-18 16:35                 ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YZV3m1O6x3yQk/DZ@google.com \
    --to=minchan@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hridya@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --subject='Re: [RFC PATCH] kernfs: release kernfs_mutex before the inode allocation' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).