All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: chuck.lever@oracle.com, NeilBrown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH v5 3/5] nfsd: rework refcounting in filecache
Date: Wed, 02 Nov 2022 12:58:21 -0400	[thread overview]
Message-ID: <db00762c5e0b983c72bee2c6bfa4476b2090a942.camel@kernel.org> (raw)
In-Reply-To: <4ed166accb2fd4a1aa6e4013ca7639bc2e610e37.camel@kernel.org>

On Tue, 2022-11-01 at 18:42 -0400, Jeff Layton wrote:
> On Wed, 2022-11-02 at 09:05 +1100, NeilBrown wrote:
> > On Wed, 02 Nov 2022, Jeff Layton wrote:
> > > On Wed, 2022-11-02 at 08:23 +1100, NeilBrown wrote:
> > > > On Wed, 02 Nov 2022, Jeff Layton wrote:
> > > > > The filecache refcounting is a bit non-standard for something searchable
> > > > > by RCU, in that we maintain a sentinel reference while it's hashed. This
> > > > > in turn requires that we have to do things differently in the "put"
> > > > > depending on whether its hashed, which we believe to have led to races.
> > > > > 
> > > > > There are other problems in here too. nfsd_file_close_inode_sync can end
> > > > > up freeing an nfsd_file while there are still outstanding references to
> > > > > it, and there are a number of subtle ToC/ToU races.
> > > > > 
> > > > > Rework the code so that the refcount is what drives the lifecycle. When
> > > > > the refcount goes to zero, then unhash and rcu free the object.
> > > > > 
> > > > > With this change, the LRU carries a reference. Take special care to
> > > > > deal with it when removing an entry from the list.
> > > > 
> > > > The refcounting and lru management all look sane here.
> > > > 
> > > > You need to have moved the final put (and corresponding fsync) to
> > > > different threads.  I think I see you and Chuck discussing that and I
> > > > have no sense of what is "right". 
> > > > 
> > > 
> > > Yeah, this is a tough call. I get Chuck's reticence.
> > > 
> > > One thing we could consider is offloading the SYNC_NONE writeback
> > > submission to a workqueue. I'm not sure though whether that's a win --
> > > it might just add needless context switches. OTOH, that would make it
> > > fairly simple to kick off writeback when the REFERENCED flag is cleared,
> > > which would probably be the best time to do it.
> > > 
> > > An entry that ends up being harvested by the LRU scanner is going to be
> > > touched by it at least twice: once to clear the REFERENCED flag, and
> > > again ~2s later to reap it.
> > > 
> > > If we schedule writeback when we clear the flag then we have a pretty
> > > good indication that nothing else is going to be using it (though I
> > > think we need to clear REFERENCED even when nfsd_file_check_writeback
> > > returns true -- I'll fix that in the coming series).
> > > 
> > > In any case, I'd probably like to do that sort of change in a separate
> > > series after we get the first part sorted.
> > > 
> > > > But it would be nice to explain in
> > > > the comment what is being moved and why, so I could then confirm that
> > > > the code matches the intent.
> > > > 
> > > 
> > > I'm happy to add comments, but I'm a little unclear on what you're
> > > confused by here. It's a bit too big of a patch for me to give a full
> > > play-by-play description. Can you elaborate on what you'd like to see?
> > > 
> > 
> > I don't need blow-by-blow, but all the behavioural changes should at
> > least be flagged in the intro, and possibly explained.
> > The one I particularly noticed is in nfsd_file_close_inode() which
> > previously used nfsd_file_dispose_list() which hands the final close off
> > to nfsd_filecache_wq.
> > But this patch now does the final close in-line so an fsnotify event
> > might now do the fsync.  I was assuming that was deliberate and wanted
> > it to be explained.  But maybe it wasn't deliberate?
> > 
> 
> Good catch! That wasn't a deliberate change, or at least I missed the
> subtlety that the earlier code attempted to avoid it. fsnotify callbacks
> are run under the srcu_read_lock. I don't think we want to run a fsync
> under that if we can at all help it.
> 
> What we can probably do is unhash it and dequeue it from the LRU, and
> then do a refcount_dec_and_test. If that comes back true, we can then
> queue it to the nfsd_fcache_disposal infrastructure to be closed and
> freed. I'll have a look at that tomorrow.
> 

Ok, I looked over the notification handling in here again and there is a
bit of a dilemma:

Neil is correct that we currently just put the reference directly in the
notification callback, and if we put the last reference at that point
then we could end up waiting on writeback.

There are two notification callbacks:

1/ fanotify callback to tell us when the link count on a file has gone
to 0.

2/ the setlease_notifier which is called when someone wants to do a
setlease

...both are called under the srcu_read_lock(), and they are both fairly
similar situations. We call different functions for them today, but we
may be OK to unify them since their needs are similar.

When I originally added these, I made them synchronous because it's best
if nfsd can clean up and get out the way quickly when either of these
events happen. At that time though, the code didn't call vfs_fsync at
all, much less always on the last put.

We have a couple of options:

1/ just continue to do them synchronously: We can sleep under the
srcu_read_lock, so we can do both of those synchronously, but blocking
it for a long period of time could cause latency issues elsewhere.

2/ move them to the delayed freeing infrastructure. That should be fine
but we could end doing stuff like silly renaming when someone deletes an
open file on an NFS reexport.

Thoughts? What's the best approach here?

Maybe we should just leave them synchronous for now, and plan to address
this in a later set?
 
> > The movement of flush_delayed_fput() threw me at first, but I think I
> > understand it now - the new code for close_inode_sync is much cleaner,
> > not needing dispose_list_sync.
> > 
> 
> Yep, I think this is cleaner too.
> 

-- 
Jeff Layton <jlayton@kernel.org>

  reply	other threads:[~2022-11-02 16:58 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-01 14:46 [PATCH v5 0/5] nfsd: clean up refcounting in the filecache Jeff Layton
2022-11-01 14:46 ` [PATCH v5 1/5] nfsd: remove the pages_flushed statistic from filecache Jeff Layton
2022-11-01 14:46 ` [PATCH v5 2/5] nfsd: reorganize filecache.c Jeff Layton
2022-11-01 20:59   ` NeilBrown
2022-11-01 21:04     ` Jeff Layton
2022-11-01 14:46 ` [PATCH v5 3/5] nfsd: rework refcounting in filecache Jeff Layton
2022-11-01 17:25   ` Chuck Lever III
2022-11-01 18:03     ` Jeff Layton
2022-11-01 18:57       ` Jeff Layton
2022-11-01 19:13         ` Chuck Lever III
2022-11-01 20:21           ` Jeff Layton
2022-11-01 18:15     ` Jeff Layton
2022-11-01 21:23   ` NeilBrown
2022-11-01 21:39     ` Jeff Layton
2022-11-01 21:49       ` Chuck Lever III
2022-11-01 22:04         ` Chuck Lever III
2022-11-01 22:05       ` NeilBrown
2022-11-01 22:42         ` Jeff Layton
2022-11-02 16:58           ` Jeff Layton [this message]
2022-11-02 18:07             ` Chuck Lever III
2022-11-02 18:29               ` Chuck Lever III
2022-11-02 18:34               ` Jeff Layton
2023-01-17 15:16   ` Jason Gunthorpe
2023-01-17 15:22     ` Chuck Lever III
2023-01-17 15:59       ` Jeff Layton
2023-01-18 16:48         ` Shachar Kagan
2023-01-18 17:18           ` Jeff Layton
2023-01-18 18:41             ` Chuck Lever III
2022-11-01 14:46 ` [PATCH v5 4/5] nfsd: close race between unhashing and LRU addition Jeff Layton
2022-11-01 19:45   ` Chuck Lever III
2022-11-01 21:12   ` Chuck Lever III
2022-11-01 14:46 ` [PATCH v5 5/5] nfsd: fix up the filecache laundrette scheduling Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=db00762c5e0b983c72bee2c6bfa4476b2090a942.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.