All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daire Byrne <daire@dneg.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: bfields <bfields@fieldses.org>,
	"J. Bruce Fields" <bfields@redhat.com>,
	Trond Myklebust <trondmy@hammerspace.com>,
	linux-cachefs <linux-cachefs@redhat.com>,
	linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute
Date: Sat, 21 Nov 2020 21:44:29 +0000 (GMT)	[thread overview]
Message-ID: <1758069641.91412432.1605995069116.JavaMail.zimbra@dneg.com> (raw)
In-Reply-To: <5f8e9e0cb53c89a7d1c156a6799c6dbc6db96dae.camel@kernel.org>

----- On 21 Nov, 2020, at 01:03, Jeff Layton jlayton@kernel.org wrote:
> On Fri, 2020-11-20 at 17:44 -0500, J. Bruce Fields wrote:
>> On Fri, Nov 20, 2020 at 05:38:31PM -0500, J. Bruce Fields wrote:
>> > On Tue, Nov 17, 2020 at 10:34:57AM -0500, Jeff Layton wrote:
>> > > On Tue, 2020-11-17 at 10:26 -0500, J. Bruce Fields wrote:
>> > > > On Tue, Nov 17, 2020 at 07:34:49AM -0500, Jeff Layton wrote:
>> > > > > I don't think I described what I was thinking well. Let me try again...
>> > > > > 
>> > > > > There should be no need to change the code in iversion.h -- I think we
>> > > > > can do this in a way that's confined to just nfsd/export code.
>> > > > > 
>> > > > > What I would suggest is to have nfsd4_change_attribute call the
>> > > > > fetch_iversion op if it exists, instead of checking IS_I_VERSION and
>> > > > > doing the stuff in that block. If fetch_iversion is NULL, then just use
>> > > > > the ctime.
>> > > > > 
>> > > > > Then, you just need to make sure that the filesystems' export_ops have
>> > > > > an appropriate fetch_iversion vector. xfs, ext4 and btrfs can just call
>> > > > > inode_query_iversion, and NFS and Ceph can call inode_peek_iversion_raw.
>> > > > > The rest of the filesystems can leave fetch_iversion as NULL (since we
>> > > > > don't want to use it on them).
>> > > > 
>> > > > Thanks for your patience, that makes sense, I'll try it.
>> > > > 
>> > > 
>> > > There is one gotcha in here though... ext4 needs to also handle the case
>> > > where SB_I_VERSION is not set. The simple fix might be to just have
>> > > different export ops for ext4 based on whether it was mounted with -o
>> > > iversion or not, but maybe there is some better way to do it?
>> > 
>> > I was thinking ext4's export op could check for I_VERSION on its own and
>> > vary behavior based on that.
>> > 
>> > I'll follow up with new patches in a moment.
>> > 
>> > I think the first one's all that's needed to fix the problem Daire
>> > identified.  I'm a little less sure of the rest.

I can confirm that patch 1/8 alone does indeed address the reported revalidation issue for us (as did the previous patch). The re-export server's client cache seems to remain intact and can serve the same cached results to multiple clients.

>> > Lightly tested, just by running them through my usual regression tests
>> > (which don't re-export) and then running connectathon on a 4.2 re-export
>> > of a 4.2 mount.
>> > 
>> > The latter triggered a crash preceded by a KASAN use-after free warning.
>> > Looks like it might be a problem with blocking lock notifications,
>> > probably not related to these patches.
>> >
> The set looks pretty reasonable at first glance. Nice work.
> 
> Once you put this in, I'll plan to add a suitable fetch_iversion op for
> ceph too.
> 
>> Another nit I ran across:
>> 
>> Some NFSv4 directory-modifying operations return pre- and post- change
>> attributes together with an "atomic" flag that's supposed to indicate
>> whether the change attributes were read atomically with the operation.
>> It looks like we're setting the atomic flag under the assumptions that
>> local vfs locks are sufficient to guarantee atomicity, which isn't right
>> when we're exporting a distributed filesystem.
>> 
>> In the case we're reexporting NFS I guess ideal would be to use the pre-
>> and post- attributes that the original server returned and also save
>> having to do extra getattr calls.  Not sure how we'd do that,
>> though--more export operations?  Maybe for now we could just figure out
>> when to turn off the atomic bit.
> 
> Oh yeah, good point.
> 
> I'm not even sure that local locks are really enough -- IIRC, there are
> still some race windows between doing the metadata operations and the
> getattrs called to fill pre/post op attrs. Still, those windows are a
> lot larger on something like NFS, so setting the flag there is really
> stretching things.
> 
> One hacky fix might be to add a flags field to export_operations, and
> have one that indicates that the atomic flag shouldn't be set. Then we
> could add that flag to all of the netfs' (nfs, ceph, cifs), and anywhere
> else that we thought it appropriate?
> 
> That approach might be helpful later too since we're starting to see a
> wider variety of exportable filesystems these days. We may need more
> "quirk" flags like this.
> --
> Jeff Layton <jlayton@kernel.org>

I should also mention that I still see a lot of unexpected repeat lookups even with the iversion optimisation patches with certain workloads. For example, looking at a network capture on the re-export server I might see 100s of getattr calls to the originating server for the same filehandle within 30 seconds which I would have expected the client cache to serve. But it could also be that the client cache is under memory pressure and not holding that data for very long.

But now I do wonder if these NFSv4 directory modifications and pre/post change attributes could be one potential contributor? I might run some production loads with a v3 re-export of a v3 server to see if that changes anything.

Many thanks again for the patches, I will take the entire set and run them through our production re-export workloads to see if anything shakes out.

Daire

  reply	other threads:[~2020-11-21 21:44 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 17:31 Adventures in NFS re-exporting Daire Byrne
2020-09-08  9:40 ` Mkrtchyan, Tigran
2020-09-08 11:06   ` Daire Byrne
2020-09-15 17:21 ` J. Bruce Fields
2020-09-15 19:59   ` Trond Myklebust
2020-09-16 16:01     ` Daire Byrne
2020-10-19 16:19       ` Daire Byrne
2020-10-19 17:53         ` [PATCH 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 17:53           ` [PATCH 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 17:53             ` [PATCH 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-19 20:05         ` [PATCH v2 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 20:05           ` [PATCH v2 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 20:05             ` [PATCH v2 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37         ` [PATCH v3 0/3] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-20 18:37           ` [PATCH v3 1/3] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-20 18:37             ` [PATCH v3 2/3] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37               ` [PATCH v3 3/3] NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp trondmy
2020-10-21  9:33         ` Adventures in NFS re-exporting Daire Byrne
2020-11-09 16:02           ` bfields
2020-11-12 13:01             ` Daire Byrne
2020-11-12 13:57               ` bfields
2020-11-12 18:33                 ` Daire Byrne
2020-11-12 20:55                   ` bfields
2020-11-12 23:05                     ` Daire Byrne
2020-11-13 14:50                       ` bfields
2020-11-13 22:26                         ` bfields
2020-11-14 12:57                           ` Daire Byrne
2020-11-16 15:18                             ` bfields
2020-11-16 15:53                             ` bfields
2020-11-16 19:21                               ` Daire Byrne
2020-11-16 15:29                           ` Jeff Layton
2020-11-16 15:56                             ` bfields
2020-11-16 16:03                               ` Jeff Layton
2020-11-16 16:14                                 ` bfields
2020-11-16 16:38                                   ` Jeff Layton
2020-11-16 19:03                                     ` bfields
2020-11-16 20:03                                       ` Jeff Layton
2020-11-17  3:16                                         ` bfields
2020-11-17  3:18                                           ` [PATCH 1/4] nfsd: move fill_{pre,post}_wcc to nfsfh.c J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-17 15:26                                                 ` J. Bruce Fields
2020-11-17 15:34                                                   ` Jeff Layton
2020-11-20 22:38                                                     ` J. Bruce Fields
2020-11-20 22:39                                                       ` [PATCH 1/8] nfsd: only call inode_query_iversion in the I_VERSION case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 2/8] nfsd: simplify nfsd4_change_info J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 3/8] nfsd: minor nfsd4_change_attribute cleanup J. Bruce Fields
2020-11-21  0:34                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 4/8] nfsd4: don't query change attribute in v2/v3 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 5/8] nfs: use change attribute for NFS re-exports J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 6/8] nfsd: move change attribute generation to filesystem J. Bruce Fields
2020-11-21  0:58                                                           ` Jeff Layton
2020-11-21  1:01                                                             ` J. Bruce Fields
2020-11-21 13:00                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 7/8] nfsd: skip some unnecessary stats in the v4 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 8/8] Revert "nfsd4: support change_attr_type attribute" J. Bruce Fields
2020-11-20 22:44                                                       ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-21  1:03                                                         ` Jeff Layton
2020-11-21 21:44                                                           ` Daire Byrne [this message]
2020-11-22  0:02                                                             ` bfields
2020-11-22  1:55                                                               ` Daire Byrne
2020-11-22  3:03                                                                 ` bfields
2020-11-23 20:07                                                                   ` Daire Byrne
2020-11-17 15:25                                               ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 3/4] nfs: don't mangle i_version on NFS J. Bruce Fields
2020-11-17 12:27                                               ` Jeff Layton
2020-11-17 14:14                                                 ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 4/4] nfs: support i_version in the NFSv4 case J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-24 20:35               ` Adventures in NFS re-exporting Daire Byrne
2020-11-24 21:15                 ` bfields
2020-11-24 22:15                   ` Frank Filz
2020-11-25 14:47                     ` 'bfields'
2020-11-25 16:25                       ` Frank Filz
2020-11-25 19:03                         ` 'bfields'
2020-11-26  0:04                           ` Frank Filz
2020-11-25 17:14                   ` Daire Byrne
2020-11-25 19:31                     ` bfields
2020-12-03 12:20                     ` Daire Byrne
2020-12-03 18:51                       ` bfields
2020-12-03 20:27                         ` Trond Myklebust
2020-12-03 21:13                           ` bfields
2020-12-03 21:32                             ` Frank Filz
2020-12-03 21:34                             ` Trond Myklebust
2020-12-03 21:45                               ` Frank Filz
2020-12-03 21:57                                 ` Trond Myklebust
2020-12-03 22:04                                   ` bfields
2020-12-03 22:14                                     ` Trond Myklebust
2020-12-03 22:39                                       ` Frank Filz
2020-12-03 22:50                                         ` Trond Myklebust
2020-12-03 23:34                                           ` Frank Filz
2020-12-03 22:44                                       ` bfields
2020-12-03 21:54                               ` bfields
2020-12-03 22:45                               ` bfields
2020-12-03 22:53                                 ` Trond Myklebust
2020-12-03 23:16                                   ` bfields
2020-12-03 23:28                                     ` Frank Filz
2020-12-04  1:02                                     ` Trond Myklebust
2020-12-04  1:41                                       ` bfields
2020-12-04  2:27                                         ` Trond Myklebust
2020-09-17 16:01   ` Daire Byrne
2020-09-17 19:09     ` bfields
2020-09-17 20:23       ` Frank van der Linden
2020-09-17 21:57         ` bfields
2020-09-19 11:08           ` Daire Byrne
2020-09-22 16:43         ` Chuck Lever
2020-09-23 20:25           ` Daire Byrne
2020-09-23 21:01             ` Frank van der Linden
2020-09-26  9:00               ` Daire Byrne
2020-09-28 15:49                 ` Frank van der Linden
2020-09-28 16:08                   ` Chuck Lever
2020-09-28 17:42                     ` Frank van der Linden
2020-09-22 12:31 ` Daire Byrne
2020-09-22 13:52   ` Trond Myklebust
2020-09-23 12:40     ` J. Bruce Fields
2020-09-23 13:09       ` Trond Myklebust
2020-09-23 17:07         ` bfields
2020-09-30 19:30   ` [Linux-cachefs] " Jeff Layton
2020-10-01  0:09     ` Daire Byrne
2020-10-01 10:36       ` Jeff Layton
2020-10-01 12:38         ` Trond Myklebust
2020-10-01 16:39           ` Jeff Layton
2020-10-05 12:54         ` Daire Byrne
2020-10-13  9:59           ` Daire Byrne
2020-10-01 18:41     ` J. Bruce Fields
2020-10-01 19:24       ` Trond Myklebust
2020-10-01 19:26         ` bfields
2020-10-01 19:29           ` Trond Myklebust
2020-10-01 19:51             ` bfields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1758069641.91412432.1605995069116.JavaMail.zimbra@dneg.com \
    --to=daire@dneg.com \
    --cc=bfields@fieldses.org \
    --cc=bfields@redhat.com \
    --cc=jlayton@kernel.org \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.