Linux-NFS Archive on lore.kernel.org
From: Trond Myklebust <trondmy@hammerspace.com>
To: "neilb@suse.de" <neilb@suse.de>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"nfbrown@suse.com" <nfbrown@suse.com>
Subject: Re: remounting hard -> soft
Date: Fri, 4 Oct 2019 15:25:12 +0000
Message-ID: <8fa5b7e8a20c435b8bbf2130f2ade0c513b3631c.camel@hammerspace.com> (raw)
In-Reply-To: <4C8E9327-5B84-4EA7-B9D4-37183A1FEB3C@oracle.com>

On Thu, 2019-10-03 at 09:01 -0400, Chuck Lever wrote:
> > On Oct 2, 2019, at 8:27 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Wed, Oct 02 2019, Chuck Lever wrote:
> > 
> > > Hi Trond-
> > > 
> > > We (Oracle) had another (fairly rare) instance of a weekend
> > > maintenance
> > > window where an NFS server's IP address changed while there were
> > > mounted
> > > clients. It brought up the issue again of how we (the Linux NFS
> > > community)
> > > would like to deal with cases where a client administrator has to
> > > deal
> > > with a moribund mount (like that alliteration :-).
> > 
> > What exactly is the problem that this caused?
> > 
> > As I understand it, a moribund mount can still be unmounted with "-
> > l"
> > and processes accessing it can still be killed
> 
> I was asking about "-o remount,soft" because I was not certain
> about the outcome last time this conversation was in full swing.
> The gist then is that we want "umount -l" and "umount -f" to
> work reliably and as advertised?

'umount -l' and 'umount -f' are both inherently flawed. The former
because it just hides the hanging RPC calls in the kernel (causing
resource leaks left, right and center), and the latter because it is a
single point-in-time operation. When you do 'umount -f', it will try to
kill all pending RPC calls, but it does nothing to prevent further
calls from being scheduled.

So yes, at some point it would be good to be able to kill requests from
a permanently hanging server through some other means.

One idea that I do like is being able to remount as 'soft' so that the
RPC calls simply time out. That solves the problem without compromising
the case where the server comes back up and we remount the super block
in order to continue operations.
That said, there are a few impediments to making that work. As far as I
can tell, none are insurmountable, but they need to be solved.

For instance, one such impediment is the fact that the way soft mounts
work these days is by tagging each RPC task with the flag RPC_TASK_SOFT
(and/or RPC_TASK_TIMEOUT depending on which error value you want the
call to return). This tag is set in task->tk_flags, which is assumed
constant throughout the lifetime of the RPC task. This is why we can
test RPC_IS_SOFT(task) before deciding how we want to call
rpc_sleep_on(). If a third party wants to change that tag, and then
wake up the task in order to have it time out, then code snippets
like the following in xprt_reserve_xprt()

        if (RPC_IS_SOFT(task))
                rpc_sleep_on_timeout(&xprt->sending, task, NULL,
                                xprt_request_timeout(req));
        else
                rpc_sleep_on(&xprt->sending, task, NULL);

would need to be replaced by something that is atomic.

> 
> 
> > ... except....
> > There are some waits in the VFS/MM which are not TASK_KILLABLE and
> > probably should be.  I think that "we" definitely want "someone" to
> > track them down and fix them.
> 
> I agree... and "someone" could mean me or someone here at Oracle.
> 
> 
> > > Does remounting with "soft" work today? That seems like the most
> > > direct
> > > way to deal with this particular situation.
> > 
> > I don't think this does work, and it would be non-trivial (but
> > maybe not
> > impossible) to mark all the outstanding RPCs as also "soft".
> 
> The problem I've observed with umount is umount_begin does the
> killall_tasks call, then the client issues some additional requests.
> Those are the requests that get stuck before umount_end can finally
> shut down the RPC client. umount_end is never called because those
> requests are "hard".
> 
> We have rpc_killall_tasks which loops over all of an rpc_clnt's
> outstanding RPC tasks. nfs_umount_begin could do something like
> 
> - set the rpc_clnt's "soft" flag
> - kill all tasks
> 
> Then any new tasks would time out eventually. Just a thought, maybe
> not a good one.
> 
> There's also using SOFTCONN for all tasks after killall is called:
> if the client can't reconnect to the server, these tasks would fail
> immediately.
> 
> 
> > If we wanted to follow a path like this (and I suspect we don't), I
> > would hope that we could expose the server connection (shared among
> > multiple mounts) in sysfs somewhere, and could then set "soft" (or
> > "dead") on that connection, rather than having to do it on every
> > mount
> > from the particular server.
> 
> I think of your use case from last time: client shutdown should be
> reliable. Seems like making "umount -f" reliable would be better
> for that use case, and would work for the "make client mount points
> recoverable after server dies" case too.

'umount -f' is intended as a point-in-time operation, which is why it
is implemented as 'umount_begin' in const struct super_operations
nfs_sops. It is not intended to act as a state changing operation on
the super block. If it were, it would need to ensure that we also hide
such a super block from being found when you try to mount again, and it
would need to ensure that you don't inadvertently end up with a
surviving duplicate.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




Thread overview: 4+ messages
2019-10-02 14:57 Chuck Lever
2019-10-03  0:27 ` NeilBrown
2019-10-03 13:01   ` Chuck Lever
2019-10-04 15:25     ` Trond Myklebust [this message]
