From: "NeilBrown" <neilb@suse.de>
To: "Chuck Lever" <chuck.lever@oracle.com>
Cc: "Al Viro" <viro@zeniv.linux.org.uk>,
"Christian Brauner" <brauner@kernel.org>,
"Jens Axboe" <axboe@kernel.dk>, "Oleg Nesterov" <oleg@redhat.com>,
"Jeff Layton" <jlayton@kernel.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/3] nfsd: use __fput_sync() to avoid delayed closing of files.
Date: Mon, 11 Dec 2023 09:47:35 +1100
Message-ID: <170224845504.12910.16483736613606611138@noble.neil.brown.name>
In-Reply-To: <ZXMv4psmTWw4mlCd@tissot.1015granger.net>

On Sat, 09 Dec 2023, Chuck Lever wrote:
> On Fri, Dec 08, 2023 at 02:27:26PM +1100, NeilBrown wrote:
> > Calling fput() directly or through filp_close() from a kernel thread like
> > nfsd causes the final __fput() (if necessary) to be called from a
> > workqueue. This means that nfsd is not forced to wait for any work to
> > complete. If the ->release or ->destroy_inode function is slow for any
> > reason, this can result in nfsd closing files more quickly than the
> > workqueue can complete the close and the queue of pending closes can
> > grow without bounds (30 million has been seen at one customer site,
> > though this was in part due to a slowness in xfs which has since been
> > fixed).
> >
> > nfsd does not need this.
>
> That is technically true, but IIUC, there is only one case where a
> synchronous close matters for the backlog problem, and that's when
> nfsd_file_free() is called from nfsd_file_put(). AFAICT all other
> call sites (except rename) are error paths, so there aren't negative
> consequences for the lack of synchronous wait there...

What you say is technically true but it isn't the way I see it.

Firstly I should clarify that __fput_sync() is *not* a flushing close as
you describe it below.
All it does, apart from some trivial book-keeping, is to call ->release
and possibly ->destroy_inode immediately rather than shunting them off
to another thread.
Apparently ->release sometimes does something that can deadlock with
some kernel threads or if some awkward locks are held, so the whole
final __fput is delayed by default. But this does not apply to nfsd.
Standard fput() is really the wrong interface for nfsd to use.
It should use __fput_sync() (which shouldn't have such a scary name).
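
To make the difference concrete, here is a rough user-space model of the
two behaviours. It is only a sketch, not kernel code: every name in it
(close_model, model_close_deferred(), simulate_deferred() and so on) is
hypothetical, and the "ticks" and per-tick rates are made-up parameters
standing in for how fast the caller closes files and how fast the
deferred work can complete them.

```c
#include <assert.h>

/* Hypothetical model state; not a kernel structure. */
struct close_model {
	unsigned long pending;   /* final closes queued but not yet run */
	unsigned long completed; /* final closes that have finished */
};

/* Deferred final close: the caller queues the work and returns at once. */
static void model_close_deferred(struct close_model *m)
{
	m->pending++;
}

/* Worker: completes at most 'budget' queued closes per tick. */
static void model_worker_run(struct close_model *m, unsigned long budget)
{
	while (budget > 0 && m->pending > 0) {
		budget--;
		m->pending--;
		m->completed++;
	}
}

/* Synchronous final close: the caller does the work itself. */
static void model_close_sync(struct close_model *m)
{
	m->completed++;
}

/* Backlog left after 'ticks' ticks when closes are deferred. */
static unsigned long simulate_deferred(unsigned long ticks,
				       unsigned long closes_per_tick,
				       unsigned long worker_per_tick)
{
	struct close_model m = { 0, 0 };
	unsigned long t, i;

	for (t = 0; t < ticks; t++) {
		for (i = 0; i < closes_per_tick; i++)
			model_close_deferred(&m);
		model_worker_run(&m, worker_per_tick);
	}
	/* Every close is either finished or still waiting in the queue. */
	assert(m.completed + m.pending == ticks * closes_per_tick);
	return m.pending;
}

/* Backlog left after 'ticks' ticks when closes are synchronous. */
static unsigned long simulate_sync(unsigned long ticks,
				   unsigned long closes_per_tick)
{
	struct close_model m = { 0, 0 };
	unsigned long t, i;

	for (t = 0; t < ticks; t++)
		for (i = 0; i < closes_per_tick; i++)
			model_close_sync(&m);
	assert(m.completed == ticks * closes_per_tick);
	return m.pending;
}
```

In this model, whenever the caller closes more files per tick than the
worker can finish, simulate_deferred() returns a backlog that grows
linearly with the number of ticks, while simulate_sync() never leaves
anything queued because the caller pays for each close itself.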
The comment above flush_delayed_fput() seems to suggest that unmounting
is a core issue. Maybe the fact that __fput() can call
dissolve_on_fput() is a reason why it is sometimes safer to leave the
work to later. But I don't see that applying to nfsd.
Of course a ->release function *could* do synchronous writes just like
the XFS ->destroy_inode function used to do synchronous reads.
I don't think we should ever try to hide that by putting it in
a workqueue. It's probably a bug and it is best if bugs are visible.
Note that the XFS ->release function does call filemap_flush() in some
cases, but that is an async flush, so __fput_sync doesn't wait for the
flush to complete.
The way I see this patch is that fput() is the wrong interface for nfsd
to use and __fput_sync() is the right interface. So we should change,
and that change is one patch.
The details about exhausting memory explain a particular symptom that
motivated the examination which revealed that nfsd was using the wrong
interface.
If we have nfsd sometimes using fput() and sometimes __fput_sync, then
we need to have clear rules for when to use which. It is much easier to
have a simple rule: always use __fput_sync().
I'm certainly happy to revise function documentation and provide
wrapper functions if needed.
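It might be good to have

	void filp_close_sync(struct file *f)
	{
		/* Take an extra reference so filp_close() cannot drop the last one. */
		get_file(f);
		/* filp_close() takes an fl_owner_t; NULL as in the nfsd call site. */
		filp_close(f, NULL);
		/* Drop our reference and run the final __fput() in this thread. */
		__fput_sync(f);
	}
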
but as that would only be called once, it was hard to motivate.
Having it in linux/fs.h would be nice.
Similarly we could wrap __fput_sync() in a more friendly name, but
it would be better to actually rename the function.

	void fput_now(struct file *f)
	{
		__fput_sync(f);
	}

??

Thanks,
NeilBrown