linux-api.vger.kernel.org archive mirror
* Re: [RFC v4 0/9] NFS Force Unmounting
       [not found]   ` <1512398194.7031.56.camel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-12-05 23:34     ` NeilBrown
       [not found]       ` <87indksq9q.fsf-wvvUuzkyo1HefUI2i7LXDhCRmIWqnp/j@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2017-12-05 23:34 UTC (permalink / raw)
  To: Joshua Watt, Jeff Layton, Trond Myklebust, J . Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro, David Howells,
	linux-api-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 6736 bytes --]

On Mon, Dec 04 2017, Joshua Watt wrote:

> On Fri, 2017-11-17 at 11:45 -0600, Joshua Watt wrote:
>> Latest attempt at unifying the various constraints for force
>> umounting NFS servers that have disappeared in a timely manner.
>> Namely:
>>   * umount2(..., MNT_FORCE) should not be made stronger, unless we
>>     know this is the final umount()
>>   * The "failed server" state should be reversible
>>   * The mechanism should be able to "unstick" a sync(2) that is
>>     stuck on an unresponsive server
>> 
>> I believe the proposal satisfies all of these concerns. There are a
>> few major components to this proposal:
>>  1) The umount_begin superblock operation now has a corresponding
>>     umount_end operation. This is invoked by umount() when MNT_FORCE
>>     is specified (like umount_begin()), but the actual unmount
>>     failed (i.e. the mount is busy).
>>  2) Instead of killing all the RPCs queued at a single point in
>>     time, the NFS mount now kills all queued RPCs and all RPCs that
>>     get queued between nfs_umount_begin() and nfs_umount_end(). I
>>     believe this is not a significant change in behavior because
>>     there were always races between queuing RPCs and killing them in
>>     nfs_umount_begin().
>>  3) nfs_umount_end() is *not* called when MNT_DETACH is specified.
>>     This is the indication that the user is done with this mount,
>>     and all RPCs will be killed until the mount finally gets removed.
>>  4) The new "transient" mount option prevents sharing nfs_clients
>>     between multiple superblocks. The only exception to this is when
>>     the kernel does an automatic mount to cross a device boundary
>>     ("xdev" NFS mount). In this case, the existing code always
>>     shares the existing nfs_client from the parent superblock. The
>>     "transient" mount option implies "nosharecache", as it doesn't
>>     make sense to share superblocks if clients aren't shared.
>>  5) If the "transient" mount option is specified (and hence the
>>     nfs_client is not shared), MNT_FORCE kills all RPCs for the
>>     entire nfs_client (and all its nfs_servers). This effectively
>>     enables the "burn down the forest" option when combined with
>>     MNT_DETACH.
>> 
>> The choice to use MNT_FORCE as the mechanism for triggering this
>> behavior stems from the desire to unstick sync(2) calls that might
>> be blocked on a non-responsive NFS server. While it was previously
>> discussed to remount with a new mount option to enable this
>> behavior, this cannot release the blocked sync(2) call because both
>> sync_filesystem() and do_remount() take the struct
>> superblock->s_umount reader-writer lock. As such, a remount will
>> block until the sync(2) finishes, which is undesirable. umount2()
>> doesn't have this restriction and can unblock the sync(2) call.
>> 
>> For the most part, all existing behavior is unchanged if the
>> "transient" option is not specified. umount -f continues to behave
>> as it has, but umount -fl (see note below) now does a more
>> aggressive kill to get everything out of there and allow unmounting
>> in a more timely manner. Note that there will probably be some
>> communication with the server still, as destruction of the NFS
>> client ID and such will occur when the last reference is removed. If
>> the server is truly gone, this can result in long blocks at that
>> time.
>> 
>> If it is known at mount time that the server might be disappearing,
>> it should be mounted with "transient". Doing this will allow umount
>> -fl to do a more complete cleanup and prevent all communication with
>> the server, which should allow a timely cleanup in all cases.
>> 
>> Notes:
>> 
>> Until recently, libmount did not allow a detached and lazy unmount
>> at the same time. This was recently fixed (see
>> https://marc.info/?l=util-linux-ng&m=151000714401929&w=2). If you
>> want to test this, you may need to write a simple test program that
>> directly calls umount2() with MNT_FORCE | MNT_DETACH.
>> 
>> Thank you all again for your time and comments,
>> Joshua Watt
>> 
>> Joshua Watt (9):
>>   SUNRPC: Add flag to kill new tasks
>>   SUNRPC: Expose kill_new_tasks in debugfs
>>   SUNRPC: Simplify client shutdown
>>   namespace: Add umount_end superblock operation
>>   NFS: Kill RPCs for the duration of umount
>>   NFS: Add debugfs for nfs_server and nfs_client
>>   NFS: Add transient mount option
>>   NFS: Don't shared transient clients
>>   NFS: Kill all client RPCs if transient
>> 
>>  fs/namespace.c                 |  22 ++++++-
>>  fs/nfs/Makefile                |   2 +-
>>  fs/nfs/client.c                |  96 +++++++++++++++++++++++++--
>>  fs/nfs/debugfs.c               | 143
>> +++++++++++++++++++++++++++++++++++++++++
>>  fs/nfs/inode.c                 |   5 ++
>>  fs/nfs/internal.h              |  11 ++++
>>  fs/nfs/nfs3client.c            |   2 +
>>  fs/nfs/nfs4client.c            |   5 ++
>>  fs/nfs/nfs4super.c             |   1 +
>>  fs/nfs/super.c                 |  81 ++++++++++++++++++++---
>>  include/linux/fs.h             |   1 +
>>  include/linux/nfs_fs_sb.h      |   6 ++
>>  include/linux/sunrpc/clnt.h    |   1 +
>>  include/uapi/linux/nfs_mount.h |   1 +
>>  net/sunrpc/clnt.c              |  13 ++--
>>  net/sunrpc/debugfs.c           |   5 ++
>>  net/sunrpc/sched.c             |   3 +
>>  17 files changed, 372 insertions(+), 26 deletions(-)
>>  create mode 100644 fs/nfs/debugfs.c
>> 
>
> Anyone have any comments on this? Sorry for the churn, it took a few
> tries to get this to where it would work. I also realize I should have
> put all my RFC patchsets in-reply-to each other instead of starting a
> new thread for each one.

The new semantic for MNT_DETACH|MNT_FORCE is interesting.
As it was never possible before (from /bin/umount), it should be safe to
add a new meaning.
The meaning is effectively "detach the filesystem from the namespace and
detach the transport from the filesystem", which sounds like it is
useful.
It is worth highlighting this, and maybe even cc:ing
linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ... done that.

The new umount_end to match umount_begin seems to make sense.  It can
replace a racy semantic with a more predictable one, and that is good.
As you say, we cannot do anything really useful here with remount as
sync can block it by holding ->s_umount.

I'm not sure about the new "transient" option.  It is probably a good
idea but I haven't yet wrapped my mind around exactly what it does so I
don't want to commit just yet.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC v4 0/9] NFS Force Unmounting
       [not found]       ` <87indksq9q.fsf-wvvUuzkyo1HefUI2i7LXDhCRmIWqnp/j@public.gmane.org>
@ 2017-12-06 13:03         ` Jeff Layton
       [not found]           ` <1512565420.4048.21.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2017-12-06 13:03 UTC (permalink / raw)
  To: NeilBrown, Joshua Watt, Trond Myklebust, J . Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro, David Howells,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Wed, 2017-12-06 at 10:34 +1100, NeilBrown wrote:
> On Mon, Dec 04 2017, Joshua Watt wrote:
> 
> > [...]
> 
> The new semantic for MNT_DETACH|MNT_FORCE is interesting.
> As it was never possible before (from /bin/umount), it should be safe to
> add a new meaning.
> The meaning is effectively "detach the filesystem from the namespace and
> detach the transport from the filesystem", which sounds like it is
> useful.
> It is worth highlighting this, and maybe even cc:ing
> linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ... done that.
> 

I'm not thrilled with the new flag combo, personally. Given that we're
introducing new behavior here, I think it wouldn't hurt to add a new
UMOUNT_* flag for this (UMOUNT_NUKE_FROM_ORBIT?).

That should still give you a -EINVAL when you try to use it on kernels
that don't understand it, and cleanly separates this behavior from the
old flags. Userland umount program would need changes of course, but
that's not hard.
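The detection pattern being described can be sketched in userspace as below. Note that UMOUNT_NUKE_FROM_ORBIT is a placeholder value no kernel actually defines, used only to show the EINVAL fallback:

```c
#include <errno.h>
#include <sys/mount.h>

/* Placeholder value -- no released kernel defines this flag; it exists
 * here only to illustrate EINVAL-based feature detection. */
#define UMOUNT_NUKE_FROM_ORBIT 0x40000000

/* Try the hypothetical aggressive unmount first.  A kernel that does
 * not know the flag rejects it with EINVAL before doing anything, so
 * userspace can fall back to plain MNT_FORCE cleanly. */
static int umount_aggressive(const char *mountpoint)
{
	int ret = umount2(mountpoint, UMOUNT_NUKE_FROM_ORBIT);

	if (ret == -1 && errno == EINVAL)
		ret = umount2(mountpoint, MNT_FORCE);
	return ret;
}
```

This is the sense in which a new flag degrades more predictably than overloading MNT_DETACH|MNT_FORCE: an old kernel refuses the unknown flag outright rather than silently doing something weaker.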

Fundamentally, I don't really agree with all the hand-wringing about
torpedoing other mounts that are using a client that you're going to
kill off. This is going to be a privileged operation, and I think we
just need to ensure that it's well-documented and that the admin can
figure out what mounts will be affected and how.

> The new umount_end to match umount_begin seems to make sense.  It can
> replace a racy semantic with a more predictable one, and that is good.
> As you say, we cannot do anything really useful here with remount as
> sync can block it by holding ->s_umount.
> 
> I'm not sure about the new "transient" option.  It is probably a good
> idea but I haven't yet wrapped my mind around exactly what it does so I
> don't want to commit just yet.

I think fixing this with new mount options is a losing strategy. By the
time you realize you need it, it's too late to add it. NFS has a huge
entrenched user-base and you'll never get them all to sprinkle this new
mount option in there.

That said, it might work if you can make it take effect on a remount.

My two lumps of copper and nickel.
-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC v4 0/9] NFS Force Unmounting
       [not found]           ` <1512565420.4048.21.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-12-06 16:40             ` Joshua Watt
  2017-12-08  2:10             ` NeilBrown
  1 sibling, 0 replies; 7+ messages in thread
From: Joshua Watt @ 2017-12-06 16:40 UTC (permalink / raw)
  To: Jeff Layton, NeilBrown, Trond Myklebust, J . Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro, David Howells,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Wed, 2017-12-06 at 08:03 -0500, Jeff Layton wrote:
> On Wed, 2017-12-06 at 10:34 +1100, NeilBrown wrote:
> > On Mon, Dec 04 2017, Joshua Watt wrote:
> > 
> > > [...]
> > 
> > The new semantic for MNT_DETACH|MNT_FORCE is interesting.
> > As it was never possible before (from /bin/umount), it should be
> > safe to add a new meaning.
> > The meaning is effectively "detach the filesystem from the
> > namespace and detach the transport from the filesystem", which
> > sounds like it is useful.
> > It is worth highlighting this, and maybe even cc:ing
> > linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ... done that.

Full disclosure: It looks like the kernel umount2() syscall always
allowed both to be specified; it was only libmount's umount API (and by
extension the umount(8) command) that didn't allow it in userspace
(until I recently changed it). In fact, the umount(8) command in
busybox allows both to be specified, and IIRC there is a comment
expressing some confusion as to why util-linux didn't.
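For anyone who wants to reproduce that, a minimal sketch of such a direct call follows (the helper name is made up; only the flag combination matters):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Detach the mount from the namespace and force-abort in-flight
 * activity in a single call.  umount2() itself has always accepted
 * this flag combination; only umount(8)/libmount used to refuse it. */
static int force_detach(const char *mountpoint)
{
	int ret = umount2(mountpoint, MNT_FORCE | MNT_DETACH);

	if (ret == -1)
		fprintf(stderr, "umount2(%s): %s\n", mountpoint,
			strerror(errno));
	return ret;
}
```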

> > 
> 
> I'm not thrilled with the new flag combo, personally. Given that
> we're introducing new behavior here, I think it wouldn't hurt to add
> a new UMOUNT_* flag for this (UMOUNT_NUKE_FROM_ORBIT?).
> 
> That should still give you a -EINVAL when you try to use it on
> kernels that don't understand it, and cleanly separates this behavior
> from the old flags. Userland umount program would need changes of
> course, but that's not hard.
> 
> Fundamentally, I don't really agree with all the hand-wringing about
> torpedoing other mounts that are using a client that you're going to
> kill off. This is going to be a privileged operation, and I think we
> just need to ensure that it's well-documented and that the admin can
> figure out what mounts will be affected and how.
> 
> > The new umount_end to match umount_begin seems to make sense.  It
> > can replace a racy semantic with a more predictable one, and that
> > is good.
> > As you say, we cannot do anything really useful here with remount
> > as sync can block it by holding ->s_umount.
> > 
> > I'm not sure about the new "transient" option.  It is probably a
> > good idea but I haven't yet wrapped my mind around exactly what it
> > does so I don't want to commit just yet.
> 
> I think fixing this with new mount options is a losing strategy. By
> the time you realize you need it, it's too late to add it. NFS has a
> huge entrenched user-base and you'll never get them all to sprinkle
> this new mount option in there.

My use case fortunately has the convenience of knowing a priori that
the mount option is needed when the filesystem is mounted
(*occasionally*, embedded use cases do simplify things), but I
obviously realize that my use case isn't necessarily what is best for
everyone.

> 
> That said, it might work if you can make it take effect on a remount.

I think it could.... the thing that blocks remount is waiting on a
"stuck" call that is never going to finish while holding the struct
superblock->s_umount reader-writer lock (specifically sync(2), but
maybe others). You could actually use umount(..., MNT_FORCE) to try and
recover that like so:

 until mount -o remount,transient $MOUNT; do
     umount -f $MOUNT
 done
 umount -fl $MOUNT

I'll play a little devil's advocate to my own patch here and ask why
the above is any better than simply forgoing my patches entirely and
doing:

 until umount -f $MOUNT; do echo "Retry"; done

Technically speaking, the difference between these is that the first
only races between processes calling sync(2) and the remount, while the
second races between all NFS calls that make RPC calls and the umount.
In theory a race is a race, but I would guess in practice the second is
much more likely to have to keep retrying than the first.

Honestly, if you are a system admin trying to remove a dead NFS mount,
the second option might be better in many cases. If the umount takes
too long, you can go in and kill processes, clean up resources, or
whatever might be required to get the mount shut down. However, there
are use cases (like mine) where having the system administrator step in
to clean up the system isn't an option. On my embedded system, I *know*
this mount can disappear on me at any time, and have decided that
keeping our system responsive is more important than trying a long
recovery process (that's probably going to fail anyway) to try and
save the data, and we have confidence that our applications can handle
this without requiring drastic things like being killed. As such, I
mount with the "transient" option right away so that my system can
clean up in a timely manner when and if it becomes necessary. I've
effectively made that administrative decision as part of the system
design, but there is currently no mechanism in the kernel that allows
me to enforce it in a sane manner.

Thanks,
Joshua Watt

> 
> My two lumps of copper and nickel.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC v4 0/9] NFS Force Unmounting
       [not found]           ` <1512565420.4048.21.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-12-06 16:40             ` Joshua Watt
@ 2017-12-08  2:10             ` NeilBrown
       [not found]               ` <87bmjaq89r.fsf-wvvUuzkyo1HefUI2i7LXDhCRmIWqnp/j@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: NeilBrown @ 2017-12-08  2:10 UTC (permalink / raw)
  To: Jeff Layton, Joshua Watt, Trond Myklebust, J . Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro, David Howells,
	linux-api-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2292 bytes --]

On Wed, Dec 06 2017, Jeff Layton wrote:

> On Wed, 2017-12-06 at 10:34 +1100, NeilBrown wrote:
>> 
>> The new semantic for MNT_DETACH|MNT_FORCE is interesting.
>> As it was never possible before (from /bin/umount), it should be safe to
>> add a new meaning.
>> The meaning is effectively "detach the filesystem from the namespace and
>> detach the transport from the filesystem", which sounds like it is
>> useful.
>> It is worth highlighting this, and maybe even cc:ing
>> linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ... done that.
>> 
>
> I'm not thrilled with the new flag combo, personally. Given that we're
> introducing new behavior here, I think it wouldn't hurt to add a new
> UMOUNT_* flag for this (UMOUNT_NUKE_FROM_ORBIT?).

Suppose we did... MNT_FORCE_PONIES. What would be the semantics of this
flag?  Once we had it, would anyone ever want to use MNT_FORCE again?

MNT_FORCE is already fairly heavy-handed.  It aborts an arbitrary
collection of RPC requests being sent for the given filesystem, no
matter where else that filesystem might be mounted.
Is it ever safe to use this flag unless you have good reason to believe
that the server is not available and there is no point pretending any
more?
And if that is the case, why not use the new MNT_FORCE_PONIES, which is
at least predictable and reliable.

We've been talking a lot about the one NFS filesystem being mounted in
multiple containers.  MNT_FORCE is already a problem for such mounts as
one container can kill requests generated from another container.  Maybe
MNT_FORCE needs to be restricted to "real" root.
Once we restrict it, do we need to keep it from being too harsh?

What would be really nice is a timeout for umount, and for sync.
The timeout only starts when the filesystem stops making progress with
writeback.  If it eventually does time out, then the caller can fall
back to MNT_DETACH if they are in a container, or MNT_FORCE if not.
(Maybe MNT_FORCE should map to MNT_DETACH in a container??? or maybe
not.)

There is a lot here that still isn't clear to me, but one thing does
seem to be becoming clear: MNT_FORCE as it stands is nearly useless,
and it would serve us well to find a semantic that is actually useful,
and impose that.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC v4 0/9] NFS Force Unmounting
       [not found]               ` <87bmjaq89r.fsf-wvvUuzkyo1HefUI2i7LXDhCRmIWqnp/j@public.gmane.org>
@ 2017-12-14 18:22                 ` Joshua Watt
       [not found]                   ` <1513275773.3888.20.camel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Joshua Watt @ 2017-12-14 18:22 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Trond Myklebust, J . Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro, David Howells,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Fri, 2017-12-08 at 13:10 +1100, NeilBrown wrote:
> On Wed, Dec 06 2017, Jeff Layton wrote:
> 
> > On Wed, 2017-12-06 at 10:34 +1100, NeilBrown wrote:
> > > 
> > > The new semantic for MNT_DETACH|MNT_FORCE is interesting.
> > > As it was never possible before (from /bin/umount), it should be
> > > safe to
> > > add a new meaning.
> > > The meaning is effectively "detach the filesystem from the
> > > namespace and
> > > detach the transport from the filesystem", which sounds like it
> > > is
> > > useful.
> > > It is worth highlighting this, and maybe even cc:ing
> > > linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ... done that.
> > > 
> > 
> > I'm not thrilled with the new flag combo, personally. Given that
> > we're
> > introducing new behavior here, I think it wouldn't hurt to add a
> > new
> > UMOUNT_* flag for this (UMOUNT_NUKE_FROM_ORBIT?).
> 
> Suppose we did... MNT_FORCE_PONIES. What would be the semantics of
> this flag?  Once we had it, would anyone ever want to use MNT_FORCE
> again?
> 
> MNT_FORCE is already fairly heavy handed.  It aborts an arbitrary
> collection of RPC requests being sent for the given filesystem, no
> matter where else that filesystem might be mounted.
> Is it ever safe to use this flag unless you have good reason to
> believe that the server is not available and there is no point
> pretending any more?
> And if that is the case, why not use the new MNT_FORCE_PONIES, which
> is at least predictable and reliable?
> 
> We've been talking a lot about the one NFS filesystem being mounted
> in multiple containers.  MNT_FORCE is already a problem for such
> mounts, as one container can kill requests generated from another
> container.  Maybe MNT_FORCE needs to be restricted to "real" root.
> Once we restrict it, do we need to keep it from being too harsh?
> 
> What would be really nice is a timeout for umount, and for sync.
> The timeout only starts when the filesystem stops making progress for
> writeback.  If it eventually does time out, then the caller can fall
> back to MNT_DETACH if they are in a container, or MNT_FORCE if not.
> (Maybe MNT_FORCE should map to MNT_DETACH in a container??? or maybe
> not).
> 
> There is a lot here that still isn't clear to me, but one thing does
> seem to be becoming clear:  MNT_FORCE as it stands is nearly useless,
> and it would serve us well to find a semantics that is actually
> useful, and impose that.

Trying to keep the discussion going... does anyone else have thoughts
on this?

Thanks,
Joshua Watt
> 
> Thanks,
> NeilBrown


* Re: [RFC v4 0/9] NFS Force Unmounting
       [not found]                   ` <1513275773.3888.20.camel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-12-14 21:52                     ` NeilBrown
       [not found]                       ` <876099nfiw.fsf-wvvUuzkyo1HefUI2i7LXDhCRmIWqnp/j@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2017-12-14 21:52 UTC (permalink / raw)
  To: Joshua Watt, Jeff Layton, Trond Myklebust, J . Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro, David Howells,
	linux-api-u79uwXL29TY76Z2rM5mHXA


On Thu, Dec 14 2017, Joshua Watt wrote:

> On Fri, 2017-12-08 at 13:10 +1100, NeilBrown wrote:
>> On Wed, Dec 06 2017, Jeff Layton wrote:
>> 
>> > On Wed, 2017-12-06 at 10:34 +1100, NeilBrown wrote:
>> > > 
>> > > The new semantic for MNT_DETACH|MNT_FORCE is interesting.
>> > > As it was never possible before (from /bin/umount), it should be
>> > > safe to
>> > > add a new meaning.
>> > > The meaning is effectively "detach the filesystem from the
>> > > namespace and
>> > > detach the transport from the filesystem", which sounds like it
>> > > is
>> > > useful.
>> > > It is worth highlighting this, and maybe even cc:ing
>> > > linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ... done that.
>> > > 
>> > 
>> > I'm not thrilled with the new flag combo, personally. Given that
>> > we're
>> > introducing new behavior here, I think it wouldn't hurt to add a
>> > new
>> > UMOUNT_* flag for this (UMOUNT_NUKE_FROM_ORBIT?).
>> 
>> Suppose we did... MNT_FORCE_PONIES. What would be the semantics of
>> this flag?  Once we had it, would anyone ever want to use MNT_FORCE
>> again?
>> 
>> MNT_FORCE is already fairly heavy handed.  It aborts an arbitrary
>> collection of RPC requests being sent for the given filesystem, no
>> matter where else that filesystem might be mounted.
>> Is it ever safe to use this flag unless you have good reason to
>> believe that the server is not available and there is no point
>> pretending any more?
>> And if that is the case, why not use the new MNT_FORCE_PONIES, which
>> is at least predictable and reliable?
>> 
>> We've been talking a lot about the one NFS filesystem being mounted
>> in multiple containers.  MNT_FORCE is already a problem for such
>> mounts, as one container can kill requests generated from another
>> container.  Maybe MNT_FORCE needs to be restricted to "real" root.
>> Once we restrict it, do we need to keep it from being too harsh?
>> 
>> What would be really nice is a timeout for umount, and for sync.
>> The timeout only starts when the filesystem stops making progress for
>> writeback.  If it eventually does time out, then the caller can fall
>> back to MNT_DETACH if they are in a container, or MNT_FORCE if not.
>> (Maybe MNT_FORCE should map to MNT_DETACH in a container??? or maybe
>> not).
>> 
>> There is a lot here that still isn't clear to me, but one thing does
>> seem to be becoming clear:  MNT_FORCE as it stands is nearly useless,
>> and it would serve us well to find a semantics that is actually
>> useful, and impose that.
>
> Trying to keep the discussion going... does anyone else have thoughts
> on this?

It's a challenge, isn't it ... keeping people on-task to make forward
progress.
If only we could all meet in the canteen at 4pm every Friday and discuss
these things over drinks.  I don't suppose any of the video conference
tools support timeshifting, so we can each do 4pm in our own time
zone....

I would like to arrange that nothing can block indefinitely on
->s_umount.  This probably means that the various "flush data" calls
made under this lock need a timeout, or to be interruptible.
Then both umount and remount could be sure of getting ->s_umount
without undue delay.
Then I would like MNT_FORCE *not* to abort requests before trying to get
the lock, but instead to be passed down to ->kill_sb().
We probably cannot pass it explicitly, but could set a flag while
->s_umount is held.
This flag might be handled by generic_shutdown_super(), causing
it to purge any unwritten data, rather than call sync_filesystems().

This way, if the filesystem is mounted elsewhere, then the MNT_FORCE has
no effect.  If it is a final mount, then it cleans up properly.

Your need to cause writes to start failing would be achieved by
performing a remount, either just setting "soft,retrans=0,timeo=1", or
by setting some special-purpose mount option.

In order for ->s_umount not to be held indefinitely, the generic things
that need to be fixed include:
 - __writeback_inodes_wb() calls writeback_sb_inodes() under the lock;
   this needs to be interruptible.
 - Same for try_to_writeback_inodes_sb() -> __writeback_inodes_sb_nr().
 - sys_sync() and do_sync_work() call iterate_supers() with various
   handlers, and these need to be interruptible.
 - do_remount_sb() needs to not block.

Finding a way to interrupt those writeback calls would be tricky,
especially as we need to trigger the interrupt without holding s_umount.

I really like the idea that an umount attempt would interrupt a sync().
Currently sync() can block indefinitely, which is occasionally
inconvenient.
If "umount" (or "umount -f" at least) on a filesystem would abort the
sync of that filesystem, take the lock and clean up more forcefully,
that would make for fairly clean shutdown processing.
1/ call sync() in a separate thread.
2/ wait until Dirty in /proc/meminfo stops changing
3/ umount -f every remaining filesystem.  Even if the
   umount fails, the sync will abort.


Part of this effort would require making sure that SIGKILL really kills
processes blocked on filesystem IO.

So:
 1/ make sure all filesystem IO waits are TASK_KILLABLE
 2/ find a way to interrupt any write-back wait when there is a pending
    remount or umount.  Possibly the write-back thing would need to
    retry after the umount/remount, I'm not sure.
 3/ Cause MNT_FORCE to set a superblock flag, and have
    generic_shutdown_super() and/or ->kill_sb() interpret this flag to
    be very forceful
 4/ Possibly introduce new NFS mount option which causes all requests
    to fail
 5/ Teach NFS to support remount of this option, and of soft, retrans,
    timeo.

How does that sound?

NeilBrown



* Re: [RFC v4 0/9] NFS Force Unmounting
       [not found]                       ` <876099nfiw.fsf-wvvUuzkyo1HefUI2i7LXDhCRmIWqnp/j@public.gmane.org>
@ 2017-12-18 21:48                         ` Joshua Watt
  0 siblings, 0 replies; 7+ messages in thread
From: Joshua Watt @ 2017-12-18 21:48 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Trond Myklebust, J . Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro, David Howells,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Fri, 2017-12-15 at 08:52 +1100, NeilBrown wrote:
> On Thu, Dec 14 2017, Joshua Watt wrote:
> 
> > On Fri, 2017-12-08 at 13:10 +1100, NeilBrown wrote:
> > > On Wed, Dec 06 2017, Jeff Layton wrote:
> > > 
> > > > On Wed, 2017-12-06 at 10:34 +1100, NeilBrown wrote:
> > > > > 
> > > > > The new semantic for MNT_DETACH|MNT_FORCE is interesting.
> > > > > As it was never possible before (from /bin/umount), it should
> > > > > be
> > > > > safe to
> > > > > add a new meaning.
> > > > > The meaning is effectively "detach the filesystem from the
> > > > > namespace and
> > > > > detach the transport from the filesystem", which sounds like
> > > > > it
> > > > > is
> > > > > useful.
> > > > > It is worth highlighting this, and maybe even cc:ing
> > > > > linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ... done that.
> > > > > 
> > > > 
> > > > I'm not thrilled with the new flag combo, personally. Given
> > > > that
> > > > we're
> > > > introducing new behavior here, I think it wouldn't hurt to add
> > > > a
> > > > new
> > > > UMOUNT_* flag for this (UMOUNT_NUKE_FROM_ORBIT?).
> > > 
> > > Suppose we did... MNT_FORCE_PONIES. What would be the semantics
> > > of this flag?  Once we had it, would anyone ever want to use
> > > MNT_FORCE again?
> > > 
> > > MNT_FORCE is already fairly heavy handed.  It aborts an arbitrary
> > > collection of RPC requests being sent for the given filesystem,
> > > no matter where else that filesystem might be mounted.
> > > Is it ever safe to use this flag unless you have good reason to
> > > believe that the server is not available and there is no point
> > > pretending any more?
> > > And if that is the case, why not use the new MNT_FORCE_PONIES,
> > > which is at least predictable and reliable?
> > > 
> > > We've been talking a lot about the one NFS filesystem being
> > > mounted in multiple containers.  MNT_FORCE is already a problem
> > > for such mounts, as one container can kill requests generated
> > > from another container.  Maybe MNT_FORCE needs to be restricted
> > > to "real" root.
> > > Once we restrict it, do we need to keep it from being too harsh?
> > > 
> > > What would be really nice is a timeout for umount, and for sync.
> > > The timeout only starts when the filesystem stops making progress
> > > for writeback.  If it eventually does time out, then the caller
> > > can fall back to MNT_DETACH if they are in a container, or
> > > MNT_FORCE if not.
> > > (Maybe MNT_FORCE should map to MNT_DETACH in a container??? or
> > > maybe not).
> > > 
> > > There is a lot here that still isn't clear to me, but one thing
> > > does seem to be becoming clear:  MNT_FORCE as it stands is nearly
> > > useless, and it would serve us well to find a semantics that is
> > > actually useful, and impose that.
> > 
> > Trying to keep the discussion going... does anyone else have
> > thoughts
> > on this?
> 
> It's a challenge, isn't it ... keeping people on-task to make forward
> progress.
> If only we could all meet in the canteen at 4pm every Friday and
> discuss
> these things over drinks.  I don't suppose any of the video
> conference
> tools support timeshifting, so we can each do 4pm in our own time
> zone....

Well, you can all come to my house... it might be easier than building
a time machine. I'll provide the beer.

> 
> I would like to arrange that nothing can block indefinitely on
> ->s_umount.  This probably means that the various "flush data" calls
> made under this lock need a timeout, or to be interruptible.
> Then both umount and remount could be sure of getting ->s_umount
> without undue delay.
> Then I would like MNT_FORCE *not* to abort requests before trying to
> get
> the lock, but instead to be passed down to ->kill_sb().
> We probably cannot pass it explicitly, but could set a flag while
> ->s_umount is held.
> This flag might be handled by generic_shutdown_super(), causing
> it to purge any unwritten data, rather than call sync_filesystems().
> 
> This way, if the filesystem is mounted elsewhere, then the MNT_FORCE
> has
> no effect.  If it is a final mount, then it cleans up properly.

> Your need to cause writes to start failing would be achieved by
> performing a remount, either just setting "soft,retrans=0,timeo=1",
> or
> by setting some special-purpose mount option.

Hmm... If I have a mount option that does what I want and remount
doesn't block, what do I need MNT_FORCE for at all? All I would need
from userspace is:

 mount /mnt/nfs -o remount,serverdead
 umount -l /mnt/nfs

It does seem nice to know that this is the last place the superblock is
mounted in ->kill_sb()... But it seems like we replace MNT_FORCE having
consequences on a shared superblock with some mount options that
probably(?) have consequences on a shared superblock (and in the
process, make MNT_FORCE less useful?).

> In order for ->s_umount not to be held indefinitely, the generic
> things that need to be fixed include:
>  - __writeback_inodes_wb() calls writeback_sb_inodes() under the lock;
>    this needs to be interruptible.
>  - Same for try_to_writeback_inodes_sb() -> __writeback_inodes_sb_nr().
>  - sys_sync() and do_sync_work() call iterate_supers() with various
>    handlers, and these need to be interruptible.
>  - do_remount_sb() needs to not block.
> 
> Finding a way to interrupt those writeback calls would be tricky,
> especially as we need to trigger the interrupt without holding
> s_umount.

I'll look into it more, but I don't think it would be terribly tricky.
AFAIK, rpc_killall_tasks() covers a lot of these cases.

> 
> I really like the idea that an umount attempt would interrupt a
> sync().
> Currently sync() can block indefinitely, which is occasionally
> inconvenient.
> If "umount" (or "umount -f" at least) on a filesystem would abort the
> sync of that filesystem, take the lock and clean up more forcefully,
> that would make for fairly clean shutdown processing.
> 1/ call sync() in a separate thread.
> 2/ wait until Dirty in /proc/meminfo stops changing
> 3/ umount -f every remaining filesystem.  Even if the
>    umount fails, the sync will abort.

Isn't that pretty close to the current behavior? I'm having a bit of
trouble reconciling this with "Then I would like MNT_FORCE *not* to
abort requests before trying to get the lock"... it sort of sounds like
here you *do* want umount (or maybe umount + MNT_FORCE) to abort a
sync()? Or are you trying to say that sync() should *only* be
interrupted in kill_sb()? Would you mind clarifying?

> 
> 
> Part of this effort would require making sure that SIGKILL really
> kills
> processes blocked on filesystem IO.

I get that making sure processes can be killed when blocked on I/O is a
really good thing, but I don't understand why it keeps getting rolled
into what I'm trying to do... it seems like an orthogonal concern? You
can certainly do "better" force unmounting without having to SIGKILL
processes, and someone could surely make SIGKILL work for all I/O
without having to implement a full force unmount scheme?

I guess more directly... SIGKILLing processes isn't necessary (or
really acceptable) for my use case. If I don't do that portion, is that
going to result in rejection of my patches, or am I missing something
important?

> 
> So:
>  1/ make sure all filesystem IO waits are TASK_KILLABLE
>  2/ find a way to interrupt any write-back wait when there is a
> pending
>     remount or umount.  Possibly the write-back thing would need to
>     retry after the umount/remount, I'm not sure.
>  3/ Cause MNT_FORCE to set a superblock flag, and have
>     generic_shutdown_super() and/or ->kill_sb() interpret this flag
> to
>     be very forceful
>  4/ Possibly introduce new NFS mount option which causes all requests
>     to fail
>  5/ Teach NFS to support remount of this option, and of soft,
> retrans,
>     timeo.
> 
> How does that sound?

All right, let me reiterate where I think you are going with this:

Currently, MNT_FORCE is not particularly useful for actually force
unmounting, since it only cancels RPCs at a single point in time.
However, this would be a useful behavior to have during remount.
Specifically, do_remount_sb() should get this behavior so that any
"stuck" sync() calls that would block it from getting ->s_umount can be
cleaned up in a timely manner. If a sync() was canceled for this
reason, it might need to be restarted after the remount?

Once remount can be done without undue blocking, some combination of
mount options can be used to mark the server as "dead", thereby
preventing long blocking API calls going forward.

Finally, MNT_FORCE can be reworked. Instead of trying to clean things
up so that a mount *might* be able to be unmounted, wait until we
*know* the mount is getting removed (->kill_sb()) and then be
aggressive with cleaning up.



I'm not sure if this is really any different than MNT_FORCE_PONIES. Not
that I prefer one mechanism over the other... it just sort of seems
like they are all pretty similar. 

So far we have:

 1) umount with MNT_FORCE_PONIES - Only *real* root can do this, and it
destroys everything

 2) Remount (which won't block) with some "server dead" flag(s), then
umount with MNT_DETACH 

 3) Mount with a flag that optionally makes MNT_FORCE act like
MNT_FORCE_PONIES (e.g. my original proposal with the "transient" mount
flag).

Am I missing any? In all of these mechanisms, a superblock mounted in
multiple namespaces gets affected in some way. The real difference is
what that change looks like in the other namespaces. In all cases, the
admin doing these actions is going to have to be aware of how these
actions affect other namespaces... I don't know if we can save them
from that.

Thanks,
Joshua

> 
> NeilBrown


end of thread, other threads:[~2017-12-18 21:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20171117174552.18722-1-JPEWhacker@gmail.com>
     [not found] ` <1512398194.7031.56.camel@gmail.com>
     [not found]   ` <1512398194.7031.56.camel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-05 23:34     ` [RFC v4 0/9] NFS Force Unmounting NeilBrown
     [not found]       ` <87indksq9q.fsf-wvvUuzkyo1HefUI2i7LXDhCRmIWqnp/j@public.gmane.org>
2017-12-06 13:03         ` Jeff Layton
     [not found]           ` <1512565420.4048.21.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-12-06 16:40             ` Joshua Watt
2017-12-08  2:10             ` NeilBrown
     [not found]               ` <87bmjaq89r.fsf-wvvUuzkyo1HefUI2i7LXDhCRmIWqnp/j@public.gmane.org>
2017-12-14 18:22                 ` Joshua Watt
     [not found]                   ` <1513275773.3888.20.camel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-14 21:52                     ` NeilBrown
     [not found]                       ` <876099nfiw.fsf-wvvUuzkyo1HefUI2i7LXDhCRmIWqnp/j@public.gmane.org>
2017-12-18 21:48                         ` Joshua Watt
