linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Coddington <bcodding@redhat.com>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: git regression failures with v6.2-rc NFS client
Date: Wed, 01 Feb 2023 09:10:31 -0500	[thread overview]
Message-ID: <5FF4061F-108C-4555-A32D-DDBFA80EE4E7@redhat.com> (raw)
In-Reply-To: <D0404F55-2692-4DB6-8DD6-CAC004331AC1@redhat.com>

On 31 Jan 2023, at 17:02, Benjamin Coddington wrote:

> On 31 Jan 2023, at 16:15, Chuck Lever III wrote:
>
>> Hi-
>>
>> I upgraded my test client's kernel to v6.2-rc5 and now I get
>> failures during the git regression suite on all NFS versions.
>> I bisected to:
>>
>> 85aa8ddc3818 ("NFS: Trigger the "ls -l" readdir heuristic sooner")
>>
>> The failure looks like:
>>
>> not ok 6 - git am --skip succeeds despite D/F conflict
>> #
>> #               test_when_finished "git -C df_plus_edit_edit clean -f" &&
>> #               test_when_finished "git -C df_plus_edit_edit reset --hard" &&
>> #               (
>> #                       cd df_plus_edit_edit &&
>> #
>> #                       git checkout f-edit^0 &&
>> #                       git format-patch -1 d-edit &&
>> #                       test_must_fail git am -3 0001*.patch &&
>> #
>> #                       git am --skip &&
>> #                       test_path_is_missing .git/rebase-apply &&
>> #                       git ls-files -u >conflicts &&
>> #                       test_must_be_empty conflicts
>> #               )
>> #
>> # failed 1 among 6 test(s)
>> 1..6
>> make[2]: *** [Makefile:60: t1015-read-index-unmerged.sh] Error 1
>> make[2]: *** Waiting for unfinished jobs....
>>
>> The regression suite is run like this:
>>
>> RESULTS= some random directory under /tmp
>> RELEASE="git-2.37.1"
>>
>> rm -f ${RELEASE}.tar.gz
>> curl --no-progress-meter -O https://mirrors.edge.kernel.org/pub/software/scm/git/${RELEASE}.tar.gz
>> /usr/bin/time tar zxf ${RELEASE}.tar.gz >> ${RESULTS}/git 2>&1
>>
>> cd ${RELEASE}
>> make clean >> ${RESULTS}/git 2>&1
>> /usr/bin/time make -j${THREADS} all doc >> ${RESULTS}/git 2>&1
>>
>> /usr/bin/time make -j${THREADS} test >> ${RESULTS}/git 2>&1
>>
>> On this client, THREADS=12. A single-thread run doesn't seem to
>> trigger a problem. So unfortunately the specific data I have is
>> going to be noisy.
>
> I'll attempt to reproduce this and see what's up.  This is an export of
> tmpfs?  If so, I suspect you might be running into tmpfs' unstable cookie
> problem when two processes race through nfs_do_filldir().. and if so, the
> cached listing of the directory on the client won't match a listing on the
> server.

It doesn't reproduce on ext4, but I can see it on an export of tmpfs.

Unsurprisingly the pattern is getdents() returning 19 entries (17 for the
first emit and "." and ".."), then unlinking those and the next getdents()
returning 0.

Here's a reproducer which fails on tmpfs but works properly on exports of
ext4 and xfs:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <string.h>

#define NFSDIR "/mnt/tmpfs/dirtest"
#define BUF_SIZE 4096
#define COUNT 18

int main(int argc, char **argv)
{
    int i, dir_fd, bpos, total = 0;
    size_t nread;
    struct linux_dirent {
        long           d_ino;
        off_t          d_off;
        unsigned short d_reclen;
        char           d_name[];
    };
    struct linux_dirent *d;
    char buf[BUF_SIZE];

    /* create files */
    for (i = 0; i < COUNT; i++) {
        sprintf(buf, NFSDIR "/file_%03d", i);
        close(open(buf, O_CREAT, 666));
	total++;
    }
    printf("created %d total dirents\n", total);

    dir_fd = open(NFSDIR, O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC);
    if (dir_fd < 0) {
        perror("cannot open dir");
        return 1;
    }

    /* drop the first page */
    posix_fadvise(dir_fd, 0, 4096, POSIX_FADV_DONTNEED);
    total = 0;

    while (1) {
        nread = syscall(SYS_getdents, dir_fd, buf, BUF_SIZE);
        if (nread == 0 || nread == -1)
            break;
        for (bpos = 0; bpos < nread;) {
            d = (struct linux_dirent *) (buf + bpos);

	    if (d->d_name[0] != '.') {
		    printf("%s\n", d->d_name);
		    unlinkat(dir_fd, d->d_name, 0);
		    total++;
	    }
            bpos += d->d_reclen;
        }
    }
    printf("found and deleted %d dirents\n", total);
    close(dir_fd);

    printf("rmdir returns %d\n", rmdir(NFSDIR));
    return 0;
}

The client is doing uncached_readdir looking for cookie 19, but tmpfs has
re-ordered the last file into cookie 3 on the second READDIR.

I think this is a different case of the problems discussed about unstable
readdir cookies on the last round of directory cache improvements, but since
we're now returning after 17 entries the problem is exposed on a directory
containing 18 files, rather than 128.

Working on a fix..

Ben


  reply	other threads:[~2023-02-01 14:11 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-31 21:15 git regression failures with v6.2-rc NFS client Chuck Lever III
2023-01-31 22:02 ` Benjamin Coddington
2023-02-01 14:10   ` Benjamin Coddington [this message]
2023-02-01 15:53     ` Benjamin Coddington
2023-02-03 14:38       ` Chuck Lever III
2023-02-03 15:13         ` Benjamin Coddington
2023-02-03 15:35           ` Chuck Lever III
2023-02-03 17:14             ` Benjamin Coddington
2023-02-03 18:03               ` Chuck Lever III
2023-02-03 20:01                 ` Benjamin Coddington
2023-02-03 20:25                   ` Chuck Lever III
2023-02-03 22:26                     ` Trond Myklebust
2023-02-03 23:11                       ` Chuck Lever III
2023-02-03 23:53                         ` Hugh Dickins
2023-02-04  0:07                           ` Trond Myklebust
2023-02-04  0:15                             ` Hugh Dickins
2023-02-04  0:59                               ` Trond Myklebust
2023-02-04 11:07                                 ` Thorsten Leemhuis
2023-02-04 13:15                                   ` Benjamin Coddington
2023-02-04 16:52                                     ` Trond Myklebust
2023-02-04 20:44                                       ` Benjamin Coddington
2023-02-05 11:24                                         ` Jeff Layton
2023-02-05 16:11                                           ` Benjamin Coddington
2023-02-01 15:11   ` Chuck Lever III
2023-02-03 12:39 ` Linux kernel regression tracking (#adding)
2023-02-21 14:58   ` Linux regression tracking #update (Thorsten Leemhuis)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5FF4061F-108C-4555-A32D-DDBFA80EE4E7@redhat.com \
    --to=bcodding@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).