Live lock in silly-rename.

* Live lock in silly-rename.
@ 2014-05-29  6:45 NeilBrown
  2014-05-29 16:38 ` Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2014-05-29  6:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS

[-- Attachment #1: Type: text/plain, Size: 4050 bytes --]


The program below (provided by a customer) demonstrates a livelock that can
trigger in NFS.

"/mnt" should be an NFSv4 mount from a server which will hand out READ
delegations (e.g. the Linux NFS server) and should contain a subdirectory
"/mnt/data".

The program forks off some worker threads which poll a particular file in
that directory until it disappears.  Then each worker will exit.
The main program waits a few seconds and then unlinks the file.

The number of threads can be set with the first arg (4 is good). The number of
seconds with the second.  Both have useful defaults.

The unlink should happen promptly and then all the workers should  exit.  But
they don't.

What happens is that between when the "unlink" returns the delegation that it
will inevitably have due to all those "open"s, and when it performs the
required silly-rename (because some thread will have the file open), some
other thread opens the file and gets a delegation.
So the NFSv4 RENAME returns NFS4ERR_DELAY while it tries to reclaim the
delegation.  15 seconds later the rename will be retried, but there will still
(or again) be an active delegation.  So the pattern repeats indefinitely.
All this time the i_mutex on the directory and file are held so "ls" on the
directory will also hang.

As an interesting twist, if you remove the file on the server, the main
process will duly notice when it next tries to rename it, and so will exit.
The clients will continue to successfully open and close the file, even
though "ls -l" shows that it doesn't exist.
If you then rm the file on the client, it will tell you that it doesn't
exist, and the workers will suddenly notice that too.

I haven't really looked into the cause of this second issue, but I can work
around the original problem with the patch below.  It effectively serialised
'open' with 'unlink' (and with anything else that grabs the file's mutex).

I think some sort of serialisation is needed.  I don't know whether i_mutex
is suitable or if we should find (or make) some other locks.

Thoughts?

Thanks,
NeilBrown

Patch:

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 8de3407e0360..96108f88d3f9 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -33,6 +33,10 @@ nfs4_file_open(struct inode *inode, struct file *filp)
 
 	dprintk("NFS: open file(%pd2)\n", dentry);
 
+	// hack - if being unlinked, pause for it to complete
+	mutex_lock(&inode->i_mutex);
+	mutex_unlock(&inode->i_mutex);
+
 	if ((openflags & O_ACCMODE) == 3)
 		openflags--;
 


Program:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>


// nfsv4 mount point /mnt
const char check[] = "/mnt/data/checkTST";
const char data[] = "/mnt/data/dummy.data";

int num_client = 4;
int wait_sec   = 3;

// call open/close in endless loop until open fails
void client (void)
{
    for (;;)
    {
        int f = open (check, O_RDONLY);

        if (f == -1)
        {
            printf ("client: done\n");
            _exit(0);
        }
        close (f);
    }
}

int main (int argc, char **argv)
{
    int i, fd;
    FILE *f_dummy;

    if (argc > 1)
        num_client = atoi (argv[1]);

    if (argc > 2)
        wait_sec = atoi (argv[2]);

    fd = open (check, O_RDWR|O_CREAT, S_IRWXU);

    if (fd == -1)
    {
        perror ("open failed:\n");
        _exit (1);
    }

    close (fd);

    for (i=0; i<num_client; i++)
    {
        // fork childs to poll the 'checkTST' file
        pid_t p = fork ();
        if (p==0)
            client();
        else if (p==-1)
        {
            perror ("fork failed");
            _exit (1);
        }
    }

      // parent process
      // - wait a few seconds and unlink 'checkTST' file
      // - all childs are polling the 'checkTST' and should
      //   end now
      sleep (wait_sec);
      unlink (check);
      printf ("server: done\n");
}

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related	[flat|nested] 15+ messages in thread