All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: NeilBrown <neilb@suse.de>
Cc: chuck.lever@oracle.com, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v5 3/5] nfsd: rework refcounting in filecache
Date: Tue, 01 Nov 2022 18:42:32 -0400	[thread overview]
Message-ID: <4ed166accb2fd4a1aa6e4013ca7639bc2e610e37.camel@kernel.org> (raw)
In-Reply-To: <166734032156.19313.13594422911305212646@noble.neil.brown.name>

On Wed, 2022-11-02 at 09:05 +1100, NeilBrown wrote:
> On Wed, 02 Nov 2022, Jeff Layton wrote:
> > On Wed, 2022-11-02 at 08:23 +1100, NeilBrown wrote:
> > > On Wed, 02 Nov 2022, Jeff Layton wrote:
> > > > The filecache refcounting is a bit non-standard for something searchable
> > > > by RCU, in that we maintain a sentinel reference while it's hashed. This
> > > > in turn requires that we have to do things differently in the "put"
> > > > depending on whether its hashed, which we believe to have led to races.
> > > > 
> > > > There are other problems in here too. nfsd_file_close_inode_sync can end
> > > > up freeing an nfsd_file while there are still outstanding references to
> > > > it, and there are a number of subtle ToC/ToU races.
> > > > 
> > > > Rework the code so that the refcount is what drives the lifecycle. When
> > > > the refcount goes to zero, then unhash and rcu free the object.
> > > > 
> > > > With this change, the LRU carries a reference. Take special care to
> > > > deal with it when removing an entry from the list.
> > > 
> > > The refcounting and lru management all look sane here.
> > > 
> > > You need to have moved the final put (and corresponding fsync) to
> > > different threads.  I think I see you and Chuck discussing that and I
> > > have no sense of what is "right". 
> > > 
> > 
> > Yeah, this is a tough call. I get Chuck's reticence.
> > 
> > One thing we could consider is offloading the SYNC_NONE writeback
> > submission to a workqueue. I'm not sure though whether that's a win --
> > it might just add needless context switches. OTOH, that would make it
> > fairly simple to kick off writeback when the REFERENCED flag is cleared,
> > which would probably be the best time to do it.
> > 
> > An entry that ends up being harvested by the LRU scanner is going to be
> > touched by it at least twice: once to clear the REFERENCED flag, and
> > again ~2s later to reap it.
> > 
> > If we schedule writeback when we clear the flag then we have a pretty
> > good indication that nothing else is going to be using it (though I
> > think we need to clear REFERENCED even when nfsd_file_check_writeback
> > returns true -- I'll fix that in the coming series).
> > 
> > In any case, I'd probably like to do that sort of change in a separate
> > series after we get the first part sorted.
> > 
> > > But it would be nice to explain in
> > > the comment what is being moved and why, so I could then confirm that
> > > the code matches the intent.
> > > 
> > 
> > I'm happy to add comments, but I'm a little unclear on what you're
> > confused by here. It's a bit too big of a patch for me to give a full
> > play-by-play description. Can you elaborate on what you'd like to see?
> > 
> 
> I don't need blow-by-blow, but all the behavioural changes should at
> least be flagged in the intro, and possibly explained.
> The one I particularly noticed is in nfsd_file_close_inode() which
> previously used nfsd_file_dispose_list() which hands the final close off
> to nfsd_filecache_wq.
> But this patch now does the final close in-line so an fsnotify event
> might now do the fsync.  I was assuming that was deliberate and wanted
> it to be explained.  But maybe it wasn't deliberate?
> 

Good catch! That wasn't a deliberate change, or at least I missed the
subtlety that the earlier code attempted to avoid it. fsnotify callbacks
are run under the srcu_read_lock. I don't think we want to run a fsync
under that if we can at all help it.

What we can probably do is unhash it and dequeue it from the LRU, and
then do a refcount_dec_and_test. If that comes back true, we can then
queue it to the nfsd_fcache_disposal infrastructure to be closed and
freed. I'll have a look at that tomorrow.

> The movement of flush_delayed_fput() threw me at first, but I think I
> understand it now - the new code for close_inode_sync is much cleaner,
> not needing dispose_list_sync.
> 

Yep, I think this is cleaner too.

-- 
Jeff Layton <jlayton@kernel.org>

  reply	other threads:[~2022-11-01 22:42 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-01 14:46 [PATCH v5 0/5] nfsd: clean up refcounting in the filecache Jeff Layton
2022-11-01 14:46 ` [PATCH v5 1/5] nfsd: remove the pages_flushed statistic from filecache Jeff Layton
2022-11-01 14:46 ` [PATCH v5 2/5] nfsd: reorganize filecache.c Jeff Layton
2022-11-01 20:59   ` NeilBrown
2022-11-01 21:04     ` Jeff Layton
2022-11-01 14:46 ` [PATCH v5 3/5] nfsd: rework refcounting in filecache Jeff Layton
2022-11-01 17:25   ` Chuck Lever III
2022-11-01 18:03     ` Jeff Layton
2022-11-01 18:57       ` Jeff Layton
2022-11-01 19:13         ` Chuck Lever III
2022-11-01 20:21           ` Jeff Layton
2022-11-01 18:15     ` Jeff Layton
2022-11-01 21:23   ` NeilBrown
2022-11-01 21:39     ` Jeff Layton
2022-11-01 21:49       ` Chuck Lever III
2022-11-01 22:04         ` Chuck Lever III
2022-11-01 22:05       ` NeilBrown
2022-11-01 22:42         ` Jeff Layton [this message]
2022-11-02 16:58           ` Jeff Layton
2022-11-02 18:07             ` Chuck Lever III
2022-11-02 18:29               ` Chuck Lever III
2022-11-02 18:34               ` Jeff Layton
2023-01-17 15:16   ` Jason Gunthorpe
2023-01-17 15:22     ` Chuck Lever III
2023-01-17 15:59       ` Jeff Layton
2023-01-18 16:48         ` Shachar Kagan
2023-01-18 17:18           ` Jeff Layton
2023-01-18 18:41             ` Chuck Lever III
2022-11-01 14:46 ` [PATCH v5 4/5] nfsd: close race between unhashing and LRU addition Jeff Layton
2022-11-01 19:45   ` Chuck Lever III
2022-11-01 21:12   ` Chuck Lever III
2022-11-01 14:46 ` [PATCH v5 5/5] nfsd: fix up the filecache laundrette scheduling Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ed166accb2fd4a1aa6e4013ca7639bc2e610e37.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.