From: Chuck Lever <chuck.lever@oracle.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>, NeilBrown <neilb@suse.de>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH/RFC] core/nfsd: allow kernel threads to use task_work.
Date: Tue, 28 Nov 2023 10:22:33 -0500
Message-ID: <ZWYFuWqCmX87C0ve@tissot.1015granger.net>
In-Reply-To: <518775f9f9bd3ad1afec0bde4d0a6bee3370bdd4.camel@kernel.org>

On Tue, Nov 28, 2023 at 09:15:39AM -0500, Jeff Layton wrote:
> On Tue, 2023-11-28 at 14:51 +0100, Christian Brauner wrote:
> > [Reusing the trimmed Cc]
> > 
> > On Tue, Nov 28, 2023 at 11:16:06AM +1100, NeilBrown wrote:
> > > On Tue, 28 Nov 2023, Chuck Lever wrote:
> > > > On Tue, Nov 28, 2023 at 09:05:21AM +1100, NeilBrown wrote:
> > > > > 
> > > > > I have evidence from a customer site of 256 nfsd threads adding files to
> > > > > delayed_fput_lists nearly twice as fast as they are retired by a single
> > > > > work-queue thread running delayed_fput().  As you might imagine this
> > > > > does not end well (20 million files in the queue at the time a snapshot
> > > > > was taken for analysis).
> > > > > 
> > > > > While this might point to a problem with the filesystem not handling the
> > > > > final close efficiently, such problems should only hurt throughput, not
> > > > > lead to memory exhaustion.
> > > > 
> > > > I have this patch queued for v6.8:
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=nfsd-next&id=c42661ffa58acfeaf73b932dec1e6f04ce8a98c0
> > > > 
> > > 
> > > Thanks....
> > > I think that change is good, but I don't think it addresses the problem
> > > mentioned in the description, and it is not directly relevant to the
> > > problem I saw ... though it is complicated.
> > > 
> > > The problem "workqueue ...  hogged cpu..." probably means that
> > > nfsd_file_dispose_list() needs a cond_resched() call in the loop.
> > > That will stop it from hogging the CPU whether it is tied to one CPU or
> > > free to roam.
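
(For illustration, the suggested change would amount to something like the
following; this is a hypothetical sketch, not necessarily the exact
nfsd_file_dispose_list() body in fs/nfsd/filecache.c:

    static void nfsd_file_dispose_list(struct list_head *dispose)
    {
            struct nfsd_file *nf;

            while (!list_empty(dispose)) {
                    nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
                    list_del_init(&nf->nf_lru);
                    nfsd_file_free(nf);     /* filp_close() and friends */
                    cond_resched();         /* yield between closes rather than hogging the CPU */
            }
    }
)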
> > > 
> > > Also that work is calling filp_close() which primarily calls
> > > filp_flush().
> > > It also calls fput() but that does minimal work.  If there is much work
> > > to do then that is offloaded to another work-item.  *That* is the
> > > workitem that I had problems with.
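
(For reference, filp_close() in recent mainline kernels is essentially a
flush followed by an fput; lightly paraphrased from fs/open.c, and the exact
helpers vary by kernel version:

    int filp_close(struct file *filp, fl_owner_t id)
    {
            int retval = filp_flush(filp, id);  /* ->flush(), dnotify, POSIX lock removal */

            fput(filp);     /* usually cheap; heavy teardown is deferred elsewhere */
            return retval;
    }
)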
> > > 
> > > The problem I saw was with an older kernel which didn't have the nfsd
> > > file cache and so was probably calling filp_close() more often.  So maybe
> > > my patch isn't so important now.  Particularly as nfsd now doesn't close
> > > most files in-task but instead offloads that to another task.  So the
> > > final fput will not be handled by the nfsd task either.
> > > 
> > > But I think there is room for improvement.  Gathering lots of files
> > > together into a list and closing them sequentially is not going to be as
> > > efficient as closing them in parallel.
> > > 
> > > > 
> > > > > For normal threads, the thread that closes the file also calls the
> > > > > final fput so there is natural rate limiting preventing excessive growth
> > > > > in the list of delayed fputs.  For kernel threads, and particularly for
> > > > > nfsd, delays in the final fput do not impose any throttling to prevent
> > > > > the thread from closing more files.
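
(The two paths being contrasted here are visible in fput() itself. Roughly,
paraphrased from fs/file_table.c; field names vary across kernel versions:

    void fput(struct file *file)
    {
            if (atomic_long_dec_and_test(&file->f_count)) {
                    struct task_struct *task = current;

                    if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
                            /* Normal task: the final __fput() runs as task_work,
                             * in this task's context before it returns to
                             * userspace, which provides natural rate limiting. */
                            init_task_work(&file->f_task_work, ____fput);
                            if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
                                    return;
                            /* task_work_add() can fail late in task exit;
                             * fall through to the delayed path. */
                    }
                    /* Kernel threads (e.g. nfsd): punt to the global
                     * delayed_fput workqueue, with no back-pressure on the
                     * thread that dropped the last reference. */
                    if (llist_add(&file->f_llist, &delayed_fput_list))
                            schedule_delayed_work(&delayed_fput_work, 1);
            }
    }
)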
> > > > 
> > > > I don't think we want to block nfsd threads waiting for files to
> > > > close. Won't that be a potential denial of service?
> > > 
> > > Not as much as the denial of service caused by memory exhaustion due to
> > > an indefinitely growing list of files waiting to be closed by a single
> > > workqueue thread.
> > 
> > It seems less likely that you run into memory exhaustion than a DoS
> > because nfsd() is busy closing fds, especially because you default to a
> > single nfsd thread afaict.

I would expect a DoS too: the system should start pushing out dirty
file data itself well before exhausting memory.


> The default is currently 8 threads (which is ridiculously low for most
> uses, but that's another discussion). That policy is usually set by
> userland nfs-utils though.

With only 8 threads, it might be /more/ difficult for clients to
generate enough workload to cause an overwhelming flood of closes.
As Neil said in the cover letter text, he observed this issue with
256 nfsd threads.


> This is another place where we might want to reserve a "rescuer" thread
> that avoids doing work that can end up blocked. Maybe we could switch
> back to queuing them to the list when we're below a certain threshold of
> available threads (1? 2? 4?).
> 
> > > I think it is perfectly reasonable that when handling an NFSv4 CLOSE,
> > > the nfsd thread should completely handle that request including all the
> > > flush and ->release etc.  If that causes any denial of service, then
> > > simply increase the number of nfsd threads.
> > 
> > But isn't that a significant behavioral change? So I would expect to
> > make this configurable via a module or Kconfig option?
> 
> I struggle to think about how we would document a new option like this. 

I think NFSv4 CLOSE can close files synchronously without an
observable behavior change. NFSv4 clients almost always COMMIT dirty
data before they CLOSE, so there should only rarely be a significant
flush component to an fput done here.

The problem is the garbage-collected (NFSv3) case, where the server
frequently closes files well before a client might have COMMITted
its dirty data.


> > > For NFSv3 it is more complex.  On the kernel where I saw a problem the
> > > filp_close happened after each READ or WRITE (though I think the customer
> > > was using NFSv4...).  With the file cache there is no thread that is
> > > obviously responsible for the close.
> > > To get the sort of throttling that I think is needed, we could possibly
> > > have each "nfsd_open" check if there are pending closes, and wait for
> > > some small amount of progress.
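
(A hypothetical sketch of that kind of throttle; every name below is invented
for illustration and none of it exists in nfsd today:

    /* Hypothetical: call before opening yet another file, so that nfsd
     * threads producing closes cannot outrun the close workqueue
     * indefinitely.  pending_closes, close_waitq and PENDING_CLOSE_LIMIT
     * are invented names, not existing nfsd symbols. */
    static void nfsd_throttle_opens(struct nfsd_net *nn)
    {
            if (atomic_read(&nn->pending_closes) < PENDING_CLOSE_LIMIT)
                    return;

            /* Wait until the close path has made a little progress. */
            wait_event(nn->close_waitq,
                       atomic_read(&nn->pending_closes) < PENDING_CLOSE_LIMIT);
    }
)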
> > > 
> > > But I don't think it is reasonable for the nfsd threads to take none of
> > > the burden of closing files as that can result in imbalance.
> > 
> > It feels like this really needs to be tested under a workload similar to
> > the one in question to see whether this is a viable solution.
> 
> Definitely. I'd also like to see how this behaves with NFS or Ceph
> reexport. Closing can be quite slow on those filesystems, so that might
> be a good place to try and break this.

-- 
Chuck Lever
