Re: git regression failures with v6.2-rc NFS client

From: Benjamin Coddington <bcodding@redhat.com>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>,
	Hugh Dickins <hughd@google.com>,
	Charles Edward Lever <chuck.lever@oracle.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Anna Schumaker <anna@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	Linux kernel regressions list <regressions@lists.linux.dev>
Subject: Re: git regression failures with v6.2-rc NFS client
Date: Sat, 04 Feb 2023 15:44:26 -0500	[thread overview]
Message-ID: <05BEEF62-46DF-4FAC-99D4-4589C294F93A@redhat.com> (raw)
In-Reply-To: <031C52C0-144A-4051-9B4C-0E1E3164951E@hammerspace.com>

On 4 Feb 2023, at 11:52, Trond Myklebust wrote:
> On Feb 4, 2023, at 08:15, Benjamin Coddington <bcodding@redhat.com> wrote:
>> Ah, thanks for explaining that.
>>
>> I'd like to summarize and quantify this problem one last time for folks that
>> don't want to read everything.  If an application wants to remove all files
>> and the parent directory, and uses this pattern to do it:
>>
>> opendir
>> while (getdents)
>>    unlink dents
>> closedir
>> rmdir
>>
>> Before this commit, that would work with up to 126 dentries on NFS from
>> tmpfs export.  If the directory had 127 or more, the rmdir would fail with
>> ENOTEMPTY.
>
> For all sizes of filenames, or just the particular set that was chosen
> here? What about the choice of rsize? Both these values affect how many
> entries glibc can cache before it has to issue another getdents() call
> into the kernel. For the record, this is what glibc does in the opendir()
> code in order to choose a buffer size for the getdents syscalls:
>
>   /* The st_blksize value of the directory is used as a hint for the
>      size of the buffer which receives struct dirent values from the
>      kernel.  st_blksize is limited to max_buffer_size, in case the
>      file system provides a bogus value.  */
>   enum { max_buffer_size = 1048576 };
>
>   enum { allocation_size = 32768 };
>   _Static_assert (allocation_size >= sizeof (struct dirent64),
>                   "allocation_size < sizeof (struct dirent64)");
>
>   /* Increase allocation if requested, but not if the value appears to
>      be bogus.  It will be between 32Kb and 1Mb.  */
>   size_t allocation = MIN (MAX ((size_t) statp->st_blksize, (size_t)
>                                 allocation_size), (size_t) max_buffer_size);
>
>   DIR *dirp = (DIR *) malloc (sizeof (DIR) + allocation);

The behavioral complexity is even higher with glibc in the mix, but both the
test that Chuck's using and the reproducer I've been making claims about
use SYS_getdents directly.  I'm using a static 4k buffer size which is big
enough to fit enough entries to prime the heuristic for a single call to
getdents() whether or not we return early at 17 or 126.

>> After this commit, it only works with up to 17 dentries.
>>
>> The argument that this is making things worse takes the position that there
>> are more directories in the universe with >17 dentries that want to be
>> cleaned up by this "saw off the branch you're sitting on" pattern than
>> directories with >127.  And I guess that's true if Chuck runs that testing
>> setup enough.  :)
>>
>> We can change the optimization in the commit from
>> NFS_READDIR_CACHE_MISS_THRESHOLD + 1
>> to
>> nfs_readdir_array_maxentries + 1
>>
>> This would make the regression disappear, and would also keep most of the
>> optimization.
>>
>> Ben
>>
>
> So in other words the suggestion is to optimise the number of readdir
> records that we return from NFS to whatever value that papers over the
> known telldir()/seekdir() tmpfs bug that is re-revealed by this particular
> test when run under these particular conditions?

Yes.  It's a terrible suggestion.  Its only merit may be that it meets the
letter of the no regressions law.  I hate it, and I after I started popping
out patches that do it I've found they've all made the behavior far more
complex due to the way we dynamically optimize dtsize.

> Anyone who tries to use tmpfs with a different number of files, different
> file name lengths, or different mount options is still SOL because that’s
> not a “regression"?

Right. :P

Ben