From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>, NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH/RFC] core/nfsd: allow kernel threads to use task_work.
Date: Thu, 30 Nov 2023 12:50:14 -0500
Message-ID: <68b6743f8c095177f5c99876627861f0fbf48edc.camel@kernel.org>
In-Reply-To: <ZWYIj7K0KPQFCCdf@tissot.1015granger.net>

On Tue, 2023-11-28 at 10:34 -0500, Chuck Lever wrote:
> On Tue, Nov 28, 2023 at 01:57:30PM +1100, NeilBrown wrote:
> > 
> > (trimmed cc...)
> > 
> > On Tue, 28 Nov 2023, Chuck Lever wrote:
> > > On Tue, Nov 28, 2023 at 11:16:06AM +1100, NeilBrown wrote:
> > > > On Tue, 28 Nov 2023, Chuck Lever wrote:
> > > > > On Tue, Nov 28, 2023 at 09:05:21AM +1100, NeilBrown wrote:
> > > > > > 
> > > > > > I have evidence from a customer site of 256 nfsd threads adding files to
> > > > > > delayed_fput_lists nearly twice as fast as they are retired by a single
> > > > > > work-queue thread running delayed_fput().  As you might imagine, this
> > > > > > does not end well (20 million files in the queue at the time a snapshot
> > > > > > was taken for analysis).
> > > > > > 
> > > > > > While this might point to a problem with the filesystem not handling the
> > > > > > final close efficiently, such problems should only hurt throughput, not
> > > > > > lead to memory exhaustion.
> > > > > 
> > > > > I have this patch queued for v6.8:
> > > > > 
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=nfsd-next&id=c42661ffa58acfeaf73b932dec1e6f04ce8a98c0
> > > > > 
> > > > 
> > > > Thanks....
> > > > I think that change is good, but I don't think it addresses the problem
> > > > mentioned in the description, and it is not directly relevant to the
> > > > problem I saw ... though it is complicated.
> > > > 
> > > > The problem "workqueue ...  hogged cpu..." probably means that
> > > > nfsd_file_dispose_list() needs a cond_resched() call in the loop.
> > > > That will stop it from hogging the CPU whether it is tied to one CPU or
> > > > free to roam.
> > > > 
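That change would look roughly like this, as a sketch only (the exact body
of nfsd_file_dispose_list() differs between kernel versions, so the list
walking below is only indicative):

static void
nfsd_file_dispose_list(struct list_head *dispose)
{
	struct nfsd_file *nf;

	while (!list_empty(dispose)) {
		nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
		list_del_init(&nf->nf_lru);
		nfsd_file_free(nf);	/* flush and drop the reference */
		cond_resched();		/* give up the CPU between files */
	}
}
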
> > > > Also that work calls filp_close(), which primarily calls filp_flush().
> > > > It also calls fput(), but that does minimal work.  If there is much
> > > > work to do then that is offloaded to another work item.  *That* is the
> > > > work item that I had problems with.
> > > > 
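To make that split concrete, fput() looks roughly like this in
fs/file_table.c (paraphrased here; struct file field names differ between
kernel versions):

void fput(struct file *file)
{
	if (atomic_long_dec_and_test(&file->f_count)) {
		struct task_struct *task = current;

		/* Normal user tasks run __fput() from task_work when they
		 * return to userspace... */
		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
			init_task_work(&file->f_task_work, ____fput);
			if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
				return;
		}

		/* ...but kernel threads (like nfsd) push the file onto a
		 * global list that is drained by one delayed work item. */
		if (llist_add(&file->f_llist, &delayed_fput_list))
			schedule_delayed_work(&delayed_fput_work, 1);
	}
}
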
> > > > The problem I saw was with an older kernel which didn't have the nfsd
> > > > file cache, and so was probably calling filp_close() more often.
> > > 
> > > Without the file cache, the filp_close() should be handled directly
> > > by the nfsd thread handling the RPC, IIRC.
> > 
> > Yes - but __fput() is handled by a workqueue.
> > 
> > > 
> > > 
> > > > So maybe
> > > > my patch isn't so important now, particularly as nfsd now doesn't close
> > > > most files in-task but instead offloads that to another task.  So the
> > > > final fput will not be handled by the nfsd task either.
> > > > 
> > > > But I think there is room for improvement.  Gathering lots of files
> > > > together into a list and closing them sequentially is not going to be as
> > > > efficient as closing them in parallel.
> > > 
> > > I believe the file cache passes the filps to the work queue one at
> > 
> > nfsd_file_close_inode() does.  nfsd_file_gc() and nfsd_file_lru_scan()
> > can pass multiple.
> > 
> > > a time, but I don't think there's anything that forces the work
> > > queue to handle each flush/close completely before proceeding to the
> > > next.
> > 
> > Parallelism with workqueues is controlled by the work items (struct
> > work_struct).  Two different work items can run in parallel.  But any
> > given work item can never run parallel to itself.
> > 
> > The only work items queued on nfsd_filecache_wq are from
> >   nn->fcache_disposal->work.
> > There is one of these for each network namespace.  So in any given
> > network namespace, all work on nfsd_filecache_wq is fully serialised.
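
A toy illustration of that rule (close_a, close_b and close_one() are made
up here, not filecache code): two different work items may run in parallel
on the same workqueue, but a single work item never runs concurrently with
itself.

#include <linux/workqueue.h>

static void close_one(struct work_struct *work)
{
	/* the flush + final put for one batch would go here */
}

static DECLARE_WORK(close_a, close_one);
static DECLARE_WORK(close_b, close_one);

static void example(struct workqueue_struct *wq)
{
	queue_work(wq, &close_a);
	queue_work(wq, &close_b);	/* may run in parallel with close_a */
	queue_work(wq, &close_a);	/* never runs concurrently with the
					 * already-queued/running close_a */
}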
> 
> OIC, it's that specific case you are concerned with. The per-
> namespace laundrette was added by:
> 
>   9542e6a643fc ("nfsd: Containerise filecache laundrette")
> 
> Its purpose was to confine the close backlog to each container.
> 
> Seems like it would be better if there were a struct work_struct
> in each struct nfsd_file. That wouldn't add real backpressure to
> nfsd threads, but it would enable file closes to run in parallel.
> 

I like this idea. That seems a lot simpler than all of this weirdo
queueing of delayed closes that we do.
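
Something like this, purely as a sketch (nf_close_work and the two helpers
below are hypothetical names, not existing filecache code):

struct nfsd_file {
	struct file		*nf_file;
	/* ...existing fields... */
	struct work_struct	nf_close_work;	/* hypothetical: one work item per file */
};

static void nfsd_file_close_work(struct work_struct *work)
{
	struct nfsd_file *nf = container_of(work, struct nfsd_file,
					    nf_close_work);

	filp_close(nf->nf_file, NULL);	/* flush + final fput for this file */
	/* ...free or recycle the nfsd_file here... */
}

static void nfsd_file_queue_close(struct nfsd_file *nf)
{
	INIT_WORK(&nf->nf_close_work, nfsd_file_close_work);
	queue_work(nfsd_filecache_wq, &nf->nf_close_work);
}

With a work_struct embedded in each nfsd_file, the workqueue could run as
many closes in parallel as it has workers, instead of serialising them all
behind the single per-namespace disposal work item.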

-- 
Jeff Layton <jlayton@kernel.org>

Thread overview: 31+ messages
2023-11-27 22:05 [PATCH/RFC] core/nfsd: allow kernel threads to use task_work NeilBrown
2023-11-27 22:30 ` Al Viro
2023-11-27 22:43   ` NeilBrown
2023-11-27 22:59 ` Chuck Lever
2023-11-28  0:16   ` NeilBrown
2023-11-28  1:37     ` Chuck Lever
2023-11-28  2:57       ` NeilBrown
2023-11-28 15:34         ` Chuck Lever
2023-11-30 17:50           ` Jeff Layton [this message]
2023-11-28 13:51     ` Christian Brauner
2023-11-28 14:15       ` Jeff Layton
2023-11-28 15:22         ` Chuck Lever
2023-11-28 23:31         ` NeilBrown
2023-11-28 23:20       ` NeilBrown
2023-11-29 11:43         ` Christian Brauner
2023-12-04  1:30           ` NeilBrown
2023-11-29 14:04         ` Chuck Lever
2023-11-30 17:47           ` Jeff Layton
2023-11-30 18:07             ` Chuck Lever
2023-11-30 18:33               ` Jeff Layton
2023-11-28 11:24 ` Christian Brauner
2023-11-28 13:52   ` Oleg Nesterov
2023-11-28 15:33     ` Christian Brauner
2023-11-28 16:59       ` Oleg Nesterov
2023-11-28 17:29         ` Oleg Nesterov
2023-11-28 23:40           ` NeilBrown
2023-11-29 11:38           ` Christian Brauner
2023-11-28 14:01 ` Oleg Nesterov
2023-11-28 14:20   ` Oleg Nesterov
2023-11-29  0:14   ` NeilBrown
2023-11-29  7:55     ` Oleg Nesterov
