linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>, Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] vfs: Avoid softlockups in drop_pagecache_sb()
Date: Wed, 30 Jan 2019 11:07:18 +0100	[thread overview]
Message-ID: <20190130100718.GA30203@quack2.suse.cz> (raw)
In-Reply-To: <20190129165636.34a1dc779efdbb9eff4bcf8b@linux-foundation.org>

On Tue 29-01-19 16:56:36, Andrew Morton wrote:
> On Mon, 14 Jan 2019 09:53:43 +0100 Jan Kara <jack@suse.cz> wrote:
> 
> > When superblock has lots of inodes without any pagecache (like is the
> > case for /proc), drop_pagecache_sb() will iterate through all of them
> > without dropping sb->s_inode_list_lock which can lead to softlockups
> > (one of our customers hit this).
> > 
> > Fix the problem by going to the slow path and doing cond_resched() in
> > case the process needs rescheduling.
> > 
> > ...
> >
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -21,8 +21,13 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
> >  	spin_lock(&sb->s_inode_list_lock);
> >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> >  		spin_lock(&inode->i_lock);
> > +		/*
> > +		 * We must skip inodes in unusual state. We may also skip
> > +		 * inodes without pages but we deliberately won't in case
> > +		 * we need to reschedule to avoid softlockups.
> > +		 */
> >  		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
> > -		    (inode->i_mapping->nrpages == 0)) {
> > +		    (inode->i_mapping->nrpages == 0 && !need_resched())) {
> >  			spin_unlock(&inode->i_lock);
> >  			continue;
> >  		}
> > @@ -30,6 +35,7 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
> >  		spin_unlock(&inode->i_lock);
> >  		spin_unlock(&sb->s_inode_list_lock);
> >  
> > +		cond_resched();
> >  		invalidate_mapping_pages(inode->i_mapping, 0, -1);
> >  		iput(toput_inode);
> >  		toput_inode = inode;
> 
> Are we sure there's no situation in which a large number of inodes can
> be in the "unusual state", leading to the same issue?

No, we cannot be really sure that there aren't many such inodes (although
I'd be surprised if there were). But the problem with "unusual state"
inodes is that you cannot just __iget() them (well, you could but it's
breaking the rules and would lead to use-after-free issues ;) and so
there's no easy way to drop the spinlock and then continue the iteration
after cond_resched(). So overall it's too complex to deal with this until
someone actually hits it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2019-01-30 10:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-14  8:53 [PATCH] vfs: Avoid softlockups in drop_pagecache_sb() Jan Kara
2019-01-30  0:56 ` Andrew Morton
2019-01-30 10:07   ` Jan Kara [this message]
  -- strict thread matches above, loose matches on Subject: below --
2018-12-06 16:11 Jan Kara
2019-01-08 15:16 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190130100718.GA30203@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).