From: Chuck Lever III <chuck.lever@oracle.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: Benjamin Coddington <bcodding@redhat.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Anna Schumaker <anna@kernel.org>, Hugh Dickins <hughd@google.com>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: git regression failures with v6.2-rc NFS client
Date: Fri, 3 Feb 2023 23:11:21 +0000
Message-ID: <FA8392E6-DAFC-4462-BDAE-893955F9E1A4@oracle.com>
In-Reply-To: <545B5AB7-93A6-496E-924E-AE882BF57B72@hammerspace.com>



> On Feb 3, 2023, at 5:26 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> 
> 
>> On Feb 3, 2023, at 15:25, Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On Feb 3, 2023, at 3:01 PM, Benjamin Coddington <bcodding@redhat.com> wrote:
>>> 
>>> On 3 Feb 2023, at 13:03, Chuck Lever III wrote:
>>>> Naive suggestion: Would another option be to add server
>>>> support for the directory verifier?
>>> 
>>> I don't think it would resolve this bug, because what can the client do to
>>> find its place in the directory stream after getting NFS4ERR_NOT_SAME?
>>> 
>>> Directory verifiers might help another class of bugs, where a linear walk
>>> through the directory produces duplicate entries.. but I think it only helps
>>> some of those cases.
>>> 
>>> Knfsd /could/ use the directory verifier to keep a reference to an opened
>>> directory.  Perhaps knfsd's open file cache can be used.  Then, as long as
>>> we're doing a linear walk through the directory, the local directory's
>>> file->private cursor would continue to be valid to produce consistent linear
>>> results even if the dentries are removed in between calls to READDIR.
>>> 
>>> .. but that's not how the verifier is supposed to be used - rather it's
>>> supposed to signal when the client's cookies are invalid, which, for tmpfs,
>>> would be anytime any changes are made to the directory.
>> 
>> Right. Change the verifier whenever a directory is modified.
>> 
>> (Make it a callback on the export, implemented per filesystem type.)
>> 
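To make that concrete, a rough sketch only (not actual NFSD code; the helper name is made up, and whether i_version is the right source for every exported filesystem is an open question): derive the READDIR cookie verifier from the directory's change attribute, so that any modification to the directory produces a new verifier and stale cookies can be refused.

/*
 * Sketch: fill the NFSv4 READDIR cookieverf from the directory's
 * change attribute.  Any modification to the directory bumps the
 * change attribute, so outstanding client cookies can be rejected
 * (e.g. with NFS4ERR_NOT_SAME) instead of silently going stale.
 */
#include <linux/iversion.h>

static void fill_dir_verifier(struct inode *dir, __be32 verf[2])
{
	u64 chattr = inode_peek_iversion_raw(dir);

	verf[0] = cpu_to_be32((u32)(chattr >> 32));
	verf[1] = cpu_to_be32((u32)chattr);
}
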
>>>> Well, let's not argue semantics. The optimization exposes
>>>> this (previously known) bug to a wider set of workloads.
>>> 
>>> Ok, yes.
>>> 
>>>> Please, open some bugs that document it. Otherwise this stuff
>>>> will never get addressed. Can't fix what we don't know about.
>>>> 
>>>> I'm not claiming tmpfs is perfect. But the optimization seems
>>>> to make things worse for more workloads. That's always been a
>>>> textbook definition of regression, and a non-starter for
>>>> upstream.
>>> 
>>> I guess I can - my impression has been there's no interest in fixing tmpfs
>>> for use over NFS..  after my last round of work on it I resolved to tell any
>>> customers that asked for it to do:
>>> 
>>> [root@fedora ~]# modprobe brd rd_size=1048576 rd_nr=1
>>> [root@fedora ~]# mkfs.xfs /dev/ram0
>>> 
>>> .. instead.  Though, I think that tmpfs is going to have better performance.
>>> 
>>>>> I spent a great deal of time making points about it and arguing that the
>>>>> client should try to work around them, and ultimately was told exactly this:
>>>>> https://lore.kernel.org/linux-nfs/a9411640329ed06dd7306bbdbdf251097c5e3411.camel@hammerspace.com/
>>>> 
>>>> Ah, well "client should work around them" is going to be a
>>>> losing argument every time.
>>> 
>>> Yeah, I agree with that, though at the time I naively thought the issues
>>> could be solved by better validation of changing directories.
>>> 
>>> So.. I guess I lost?  I wasn't aware of the stable cookie issues fully
>>> until Trond pointed them out and I compared tmpfs and xfs.  At that point, I
>>> saw there's not really much of a path forward for tmpfs, unless we want to
>>> work pretty hard at it.  But why?  Any production server wanting performance
>>> and stability on an NFS export isn't going to use tmpfs on knfsd.  There are
>>> good memory-speed alternatives.
>> 
>> Just curious, what are they? I have xfs on a pair of NVMe
>> add-in cards, and it's quite fast. But that's an expensive
>> replacement for tmpfs.
>> 
>> My point is, until now, I had thought that tmpfs was a fully
>> supported filesystem type for NFS export. What I'm hearing
>> is that some people don't agree, and worse, it's not been
>> documented anywhere.
>> 
>> If we're not going to support tmpfs enough to fix these
>> significant problems, then it should be made unexportable,
>> and that deprecation needs to be properly documented.
>> 
>> 
>>>>> The optimization you'd like to revert fixes a performance regression on
>>>>> workloads across exports of all filesystems.  This is a fix we've had many
>>>>> folks asking for repeatedly.
>>>> 
>>>> Does the community agree that tmpfs has never been a first-tier
>>>> filesystem for exporting? That's news to me. I don't recall us
>>>> deprecating support for tmpfs.
>>> 
>>> I'm definitely not telling folks to use it as exported from knfsd.  I'm
>>> instead telling them, "Directory listings are going to be unstable, you'll
>>> see problems." That includes any filesystems with file_operations of
>>> simple_dir_operations.
>> 
>>> That said, the whole opendir, getdents, unlink, getdents pattern is maybe
>>> fine for a test that can assume it has exclusive rights (writes?) to a
>>> directory, but also probably insane for anything else that wants to reliably
>>> remove the thing, and we'll find that's why `rm -rf` still works.  It does
>>> opendir, getdents, getdents, getdents, unlink, unlink, unlink, .. rmdir.
>>> This more closely corresponds to POSIX stating:
>>> 
>>>  If a file is removed from or added to the directory after the most recent
>>>  call to opendir() or rewinddir(), whether a subsequent call to readdir()
>>>  returns an entry for that file is unspecified.
>>> 
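For what it's worth, here is a user-space sketch of the robust pattern Ben describes (illustrative only; the path is made up and error handling is minimal): read the whole directory first, then unlink, so removals can't perturb the directory stream that is still being walked.

/* rm -rf style: opendir, getdents, getdents, ..., unlink, unlink, ...
 * "/export/dir" is a made-up path; bounds and error checks are minimal. */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	DIR *d = opendir("/export/dir");
	struct dirent *e;
	char *names[4096];
	int n = 0;

	if (!d)
		return 1;

	/* Pass 1: collect every entry before touching the directory. */
	while (n < 4096 && (e = readdir(d)) != NULL) {
		if (strcmp(e->d_name, ".") && strcmp(e->d_name, ".."))
			names[n++] = strdup(e->d_name);
	}

	/* Pass 2: remove entries only after the walk is complete. */
	for (int i = 0; i < n; i++) {
		unlinkat(dirfd(d), names[i], 0);
		free(names[i]);
	}

	closedir(d);
	return 0;
}

The fragile variant interleaves the unlink() calls with further readdir() calls on the same open stream, which is exactly where an unstable-cookie filesystem can skip or repeat entries.
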
>>> 
>>>> If an optimization broke ext4, xfs, or btrfs, it would be
>>>> reverted until the situation was properly addressed. I don't
>>>> see why the situation is different for tmpfs just because it
>>>> has more bugs.
>>> 
>>> Maybe it isn't?  We've yet to hear from any authoritative sources on this.
>> 
>>>>> I hope the maintainers decide not to revert
>>>>> it, and that we can also find a way to make readdir work reliably on tmpfs.
>>>> 
>>>> IMO the guidelines go the other way: readdir on tmpfs needs to
>>>> be addressed first. Reverting is a last resort, but I don't see
>>>> a fix coming before v6.2. Am I wrong?
>>> 
>>> Is your opinion wrong?  :)  IMO, yes, because this is just another round of
>>> "we don't fix the client for broken server behaviors".
>> 
>> In general, we don't fix broken servers on the client, true.
>> 
>> In this case, though, this is a regression. A client change
>> broke workloads that were working in v6.1.
> 
> No. We’ve had this discussion on the NFS mailing list every time someone invents a new filesystem that they want to export and then claims that they don’t need stable cookies because the NFS protocol doesn’t bother to spell that part out. The NFS client cannot and will not claim bug-free support for a filesystem that assigns cookie values in any way that is not 100% repeatable.

Nor should it. However:

A. tmpfs is not a new filesystem.

B. Ben says this is more or less an issue on _all_ filesystems,
but others manage to hide it effectively, likely as a side-effect
of having to deal with slow durable storage. Before the optimization,
tmpfs worked "well enough" as well.

C. If we don't want to fully support tmpfs, that's fine. But let's
document that properly, please.

D. If you guys knew beforehand that this change would break tmpfs
exports, there should have been a warning in the patch description.


The guidelines about regressions are clear. I don't agree with
leaving the optimization in place while tmpfs is not working well
enough. And actually, these issues in tmpfs should have been
addressed first. There's loads of precedent for that requirement.

But it appears that as usual I don't have much choice. What I can
do is file some bugs and see if we can address the server side
pieces.

So far no one has said that the tmpfs cookie problem is irreparable.
I'd much prefer to keep it in NFSD's stable of support.

https://bugzilla.linux-nfs.org/show_bug.cgi?id=405
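
The sort of sequence I expect that bug to cover, as a rough user-space reproducer (sketch only; the mount point and entry name are placeholders, and exactly which entries get skipped or repeated will vary): list part of the directory, remove an entry, then resume from the saved position.

/* Resume a listing from a saved position after an unlink.  Over an
 * NFS mount of a tmpfs export, the entries returned after seekdir()
 * may be skipped or duplicated because the underlying cookies shift.
 * "/mnt/nfs-tmpfs" and "some-entry" are placeholders. */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	DIR *d = opendir("/mnt/nfs-tmpfs");
	struct dirent *e;
	long pos;
	int i;

	if (!d)
		return 1;

	for (i = 0; i < 3 && (e = readdir(d)) != NULL; i++)
		printf("before: %s\n", e->d_name);

	pos = telldir(d);                      /* save the current position */
	unlinkat(dirfd(d), "some-entry", 0);   /* perturb the directory */
	seekdir(d, pos);                       /* resume from the saved position */

	while ((e = readdir(d)) != NULL)
		printf("after:  %s\n", e->d_name);

	closedir(d);
	return 0;
}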

And, if it helps, our server should support directory verifiers.

https://bugzilla.linux-nfs.org/show_bug.cgi?id=404


> Concerning the claim that it was unknown this is a problem with tmpfs, then see https://marc.info/?l=linux-kernel&m=100369543808129&w=2
> In the years since 2001, we’ve “fixed” the behaviour of tmpfs by circumventing the reliance on cookies for the case of a linear read of a tmpfs directory, but the underlying cookie behaviour is still the same, and so the NFS behaviour ends up being the same.
> 
> The bottom line is that you’ve always been playing the lottery when mounting tmpfs over NFS.

I'm not debating the truth of that. I just don't think we should
be making that situation needlessly worse.

And I would be much more comfortable with this if it appeared in
a man page or on our wiki, or ... I'm sorry, but "some email in
2001" is not documentation a user should be expected to find.


--
Chuck Lever




Thread overview: 26+ messages
2023-01-31 21:15 git regression failures with v6.2-rc NFS client Chuck Lever III
2023-01-31 22:02 ` Benjamin Coddington
2023-02-01 14:10   ` Benjamin Coddington
2023-02-01 15:53     ` Benjamin Coddington
2023-02-03 14:38       ` Chuck Lever III
2023-02-03 15:13         ` Benjamin Coddington
2023-02-03 15:35           ` Chuck Lever III
2023-02-03 17:14             ` Benjamin Coddington
2023-02-03 18:03               ` Chuck Lever III
2023-02-03 20:01                 ` Benjamin Coddington
2023-02-03 20:25                   ` Chuck Lever III
2023-02-03 22:26                     ` Trond Myklebust
2023-02-03 23:11                       ` Chuck Lever III [this message]
2023-02-03 23:53                         ` Hugh Dickins
2023-02-04  0:07                           ` Trond Myklebust
2023-02-04  0:15                             ` Hugh Dickins
2023-02-04  0:59                               ` Trond Myklebust
2023-02-04 11:07                                 ` Thorsten Leemhuis
2023-02-04 13:15                                   ` Benjamin Coddington
2023-02-04 16:52                                     ` Trond Myklebust
2023-02-04 20:44                                       ` Benjamin Coddington
2023-02-05 11:24                                         ` Jeff Layton
2023-02-05 16:11                                           ` Benjamin Coddington
2023-02-01 15:11   ` Chuck Lever III
2023-02-03 12:39 ` Linux kernel regression tracking (#adding)
2023-02-21 14:58   ` Linux regression tracking #update (Thorsten Leemhuis)
