From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod
Date: Tue, 15 Feb 2011 18:47:04 -0500	[thread overview]
Message-ID: <1297813624.10103.34.camel@heimdal.trondhjem.org> (raw)
In-Reply-To: <20110215113053.345e3abc@tlielax.poochiereds.net>

On Tue, 2011-02-15 at 11:30 -0500, Jeff Layton wrote: 
> On Tue, 15 Feb 2011 10:31:38 -0500
> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> 
> > On Tue, 2011-02-15 at 09:58 -0500, Jeff Layton wrote: 
> > > I recently had some of our QA people report some connectathon test
> > > failures in RHEL5 (2.6.18-based kernel). For some odd reason (maybe
> > > scheduling differences that make the race more likely?) the problem
> > > occurs more frequently on s390.
> > > 
> > > The problem generally manifests itself on NFSv4 as a race where an rmdir
> > > fails because a silly-renamed file in the directory wasn't deleted in
> > > time. Looking at traces, what you usually see is the rmdir attempt
> > > failing, with the sillydelete of the file that blocked it following
> > > very soon afterward.
> > > 
> > > Silly deletes are handled via dentry_iput and in the case of a close on
> > > NFSv4, the last dentry reference is often held by the CLOSE RPC task.
> > > nfs4_do_close does the close as an async RPC task that it conditionally
> > > waits on depending on whether the close is synchronous or not.
> > > 
> > > It also sets the workqueue for the task to nfsiod_workqueue. When
> > > tk_workqueue is set, the rpc_release operation is queued to that
> > > workqueue. rpc_release is where the dentry reference held by the task is
> > > put. The caller has no way to wait for that to complete, so the close(2)
> > > syscall can easily return before the rpc_release call is ever done. In
> > > some cases, that rpc_release is delayed long enough to prevent a
> > > subsequent rmdir of the containing directory.
> > > 
> > > I believe this is a bug, or at least not ideal behavior. We should try
> > > not to have the close(2) call return in this situation until the
> > > sillydelete is done.
> > > 
> > > I've been able to reproduce this more reliably by adding a 100ms sleep
> > > at the top of nfs4_free_closedata. I've not seen it "in the wild" on
> > > mainline kernels, but it seems quite possible when a machine is heavily
> > > loaded.
<snip> 
> > There is no guarantee that rpc_wait_for_completion_task() will wait
> > until rpciod has called rpc_put_task(), in which case the above change
> > ends up with a dput() on the rpciod workqueue, and potential deadlocks.
> > 
> > Let's rather look at making rpc_put_task a little bit more safe, by
> > ensuring that rpciod always calls queue_work(), and by allowing other
> > callers to switch it off.
> > 
> 
> Ok, I see the potential deadlock. To make sure I understand correctly,
> is this what you'd like to see?
> 
> 1) If tk_workqueue is set, then queue rpc_release to that.
> 
> 2) If it's not, queue it to rpciod.
> 
> 3) add a flag to rpc_put_task (or a variant of that function) that
> calls rpc_release synchronously, and have nfs4_do_close use that
> variant when it puts the task.
> 
> ...does that sound correct? If so, then I'm not sure that it'll 100%
> close the race. It's still possible that the rpc_put_task call in
> rpc_release_task would be the last one. If so and we queue it to the
> workqueue, then we're back in the same situation.

Agreed. The problem is lack of atomicity between rpc_complete_task() and
rpc_put_task(). How about something like the following?

Cheers
  Trond
8<----------------------------------------------------------------------------------------- 

Thread overview: 17+ messages
2011-02-15 14:58 [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod Jeff Layton
2011-02-15 15:31 ` Trond Myklebust
2011-02-15 16:30   ` Jeff Layton
2011-02-15 23:47     ` Trond Myklebust [this message]
2011-02-16 14:09       ` Trond Myklebust
2011-02-16 14:26         ` Trond Myklebust
2011-02-16 14:50           ` Jeff Layton
2011-02-16 15:21             ` Trond Myklebust
2011-02-16 18:13               ` Jeff Layton
2011-02-17 13:40                 ` Jeff Layton
2011-02-17 15:10                   ` Jeff Layton
2011-02-17 19:47                     ` Trond Myklebust
2011-02-17 21:37                       ` Jeff Layton
2011-02-18 20:04                         ` Jeff Layton
2011-02-18 20:54                           ` Trond Myklebust
2011-02-23 20:17                             ` Jeff Layton
2011-02-15 15:53 ` Tigran Mkrtchyan
