* NFS hard read-only mount option - again
From: Valerie Aurora @ 2010-04-26 23:58 UTC
  To: linux-fsdevel; +Cc: Trond Myklebust, J. Bruce Fields, Jeff Layton

I want to restart the discussion we had last July (!) about an NFS
hard read-only mount option.  A common use case of union mounts is a
cluster with NFS mounted read-only root file systems, with a local fs
union mounted on top.  Here's the last discussion we had:

http://kerneltrap.org/mailarchive/linux-fsdevel/2009/7/16/6211043/thread

We can assume a local mechanism that lets the server enforce the
read-only-ness of the file system on the local machine (the server can
increment sb->s_hard_readonly_users on the local fs and the VFS will
take care of the rest).
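
Roughly, the mechanism I have in mind looks something like this -- a
sketch only, where the lock and both helper names are made up for
illustration and s_hard_readonly_users is the proposed field, not an
existing VFS interface:

#include <linux/fs.h>
#include <linux/spinlock.h>
#include <linux/errno.h>

/*
 * Sketch: a "hard read-only user" (e.g. knfsd exporting the fs) pins
 * the superblock read-only.  Nothing here is existing VFS code.
 */
static DEFINE_SPINLOCK(hard_ro_lock);

int sb_claim_hard_readonly(struct super_block *sb)
{
	int err = 0;

	spin_lock(&hard_ro_lock);
	if (!(sb->s_flags & MS_RDONLY))
		err = -EBUSY;		/* must already be mounted read-only */
	else
		sb->s_hard_readonly_users++;
	spin_unlock(&hard_ro_lock);
	return err;
}

int sb_want_write(struct super_block *sb)
{
	int err = 0;

	/* any attempt to get write access fails while hard r/o users exist */
	spin_lock(&hard_ro_lock);
	if (sb->s_hard_readonly_users)
		err = -EROFS;
	spin_unlock(&hard_ro_lock);
	return err;
}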

The main question is what to do on the client side when the server
changes its mind and wants to write to that file system.  On the
server side, there's a clear synchronization point:
sb->s_hard_readonly_users needs to be decremented, and so we don't
have to worry about a hard readonly exported file system going
read-write willy-nilly.

But the client has to cope with the sudden withdrawal of the read-only
guarantee.  A lowest common denominator starting point is to treat it
as though the mount went away entirely, and force the client to
remount and/or reboot.  I also have vague ideas about doing something
smart with stale file handles and generation numbers to avoid a
remount.  This looks a little bit like the forced umount patches too,
where we could EIO any open file descriptors on the old file system.

How long would it take to implement the dumb "NFS server not
responding" version?

-VAL


* Re: NFS hard read-only mount option - again
From: Jeff Layton @ 2010-04-27 11:51 UTC
  To: Valerie Aurora; +Cc: linux-fsdevel, Trond Myklebust, J. Bruce Fields

On Mon, 26 Apr 2010 19:58:33 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

> I want to restart the discussion we had last July (!) about an NFS
> hard read-only mount option.  A common use case of union mounts is a
> cluster with NFS mounted read-only root file systems, with a local fs
> union mounted on top.  Here's the last discussion we had:
> 
> http://kerneltrap.org/mailarchive/linux-fsdevel/2009/7/16/6211043/thread
> 
> We can assume a local mechanism that lets the server enforce the
> read-only-ness of the file system on the local machine (the server can
> increment sb->s_hard_readonly_users on the local fs and the VFS will
> take care of the rest).
> 
> The main question is what to do on the client side when the server
> changes its mind and wants to write to that file system.  On the
> server side, there's a clear synchronization point:
> sb->s_hard_readonly_users needs to be decremented, and so we don't
> have to worry about a hard readonly exported file system going
> read-write willy-nilly.
> 
> But the client has to cope with the sudden withdrawal of the read-only
> guarantee.  A lowest common denominator starting point is to treat it
> as though the mount went away entirely, and force the client to
> remount and/or reboot.  I also have vague ideas about doing something
> smart with stale file handles and generation numbers to avoid a
> remount.  This looks a little bit like the forced umount patches too,
> where we could EIO any open file descriptors on the old file system.
> 
> How long would it take to implement the dumb "NFS server not
> responding" version?
> 
> -VAL

Ok, so the problem is this:

You have a client with the aforementioned union mount (r/o NFS layer
with a local r/w layer on top). "Something" changes on the server and
you need a way to cope with the change?

What happens if you do nothing here and just expect the client to deal
with it? Obviously you have the potential for inconsistent data on the
clients until they remount, along with problems like -ESTALE errors, etc.

For the use case you describe, however, an admin would have to be insane
to think that they could safely change the filesystem while it was
online and serving out data to clients. If I had a cluster like you
describe, my upgrade plan would look like this:

1) update a copy of the master r/o filesystem offline
2) test it, test it, test it
3) shut down the clients
4) unexport the old filesystem, export the new one
5) bring the clients back up

...anything else would be playing with fire.

Unfortunately I haven't been keeping up with your patchset as well as I
probably should. What happens to the r/w layer when the r/o layer
changes? Does it become completely invalid and you have to rebuild it?
Or can it cope with a situation where the r/o filesystem is changed
while the r/w layer isn't mounted on top of it?

-- 
Jeff Layton <jlayton@redhat.com>


* Re: NFS hard read-only mount option - again
From: Valerie Aurora @ 2010-04-28 20:07 UTC
  To: Jeff Layton; +Cc: linux-fsdevel, Trond Myklebust, J. Bruce Fields

On Tue, Apr 27, 2010 at 07:51:59AM -0400, Jeff Layton wrote:
> On Mon, 26 Apr 2010 19:58:33 -0400
> Valerie Aurora <vaurora@redhat.com> wrote:
> 
> > I want to restart the discussion we had last July (!) about an NFS
> > hard read-only mount option.  A common use case of union mounts is a
> > cluster with NFS mounted read-only root file systems, with a local fs
> > union mounted on top.  Here's the last discussion we had:
> > 
> > http://kerneltrap.org/mailarchive/linux-fsdevel/2009/7/16/6211043/thread
> > 
> > We can assume a local mechanism that lets the server enforce the
> > read-only-ness of the file system on the local machine (the server can
> > increment sb->s_hard_readonly_users on the local fs and the VFS will
> > take care of the rest).
> > 
> > The main question is what to do on the client side when the server
> > changes its mind and wants to write to that file system.  On the
> > server side, there's a clear synchronization point:
> > sb->s_hard_readonly_users needs to be decremented, and so we don't
> > have to worry about a hard readonly exported file system going
> > read-write willy-nilly.
> > 
> > But the client has to cope with the sudden withdrawal of the read-only
> > guarantee.  A lowest common denominator starting point is to treat it
> > as though the mount went away entirely, and force the client to
> > remount and/or reboot.  I also have vague ideas about doing something
> > smart with stale file handles and generation numbers to avoid a
> > remount.  This looks a little bit like the forced umount patches too,
> > where we could EIO any open file descriptors on the old file system.
> > 
> > How long would it take to implement the dumb "NFS server not
> > responding" version?
> > 
> > -VAL
> 
> Ok, so the problem is this:
> 
> You have a client with the aforementioned union mount (r/o NFS layer
> with a local r/w layer on top). "Something" changes on the server and
> you need a way to cope with the change?
> 
> What happens if you do nothing here and just expect the client to deal
> with it? Obviously you have the potential for inconsistent data on the
> clients until they remount, along with problems like -ESTALE errors, etc.
> 
> For the use case you describe, however, an admin would have to be insane
> to think that they could safely change the filesystem while it was
> online and serving out data to clients. If I had a cluster like you
> describe, my upgrade plan would look like this:
> 
> 1) update a copy of the master r/o filesystem offline
> 2) test it, test it, test it
> 3) shut down the clients
> 4) unexport the old filesystem, export the new one
> 5) bring the clients back up
> 
> ...anything else would be playing with fire.

Yes, you are totally correct, that's the only scenario that would
actually work.  This feature is just detecting when someone tries to
do this without step 3).

> Unfortunately I haven't been keeping up with your patchset as well as I
> probably should. What happens to the r/w layer when the r/o layer
> changes? Does it become completely invalid and you have to rebuild it?
> Or can it cope with a situation where the r/o filesystem is changed
> while the r/w layer isn't mounted on top of it?

The short version is that we can't cope with the r/o file system being
changed while it's mounted as the bottom layer of a union mount on a
client.  This assumption is what makes a non-panicking union mount
implementation possible.

What I need can be summarized in the distinction between the following
scenarios:

Scenario A: The NFS server reboots while a client has the file system
mounted as the r/o layer of a union mount.  The server does not change
the exported file system at all and re-exports it as hard read-only.
This should work.

Scenario B: The NFS server reboots as in the above scenario, but
performs "touch /exports/client_root/a_file" before re-exporting the
file system as hard read-only.  This is _not_ okay and in some form
will cause a panic on the client if the client doesn't detect it and
stop accessing the mount.

How to tell the difference between scenarios A and B?

Thanks,

-VAL


* Re: NFS hard read-only mount option - again
From: Jeff Layton @ 2010-04-28 20:34 UTC
  To: Valerie Aurora; +Cc: linux-fsdevel, Trond Myklebust, J. Bruce Fields

On Wed, 28 Apr 2010 16:07:46 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

> 
> What I need can be summarized in the distinction between the following
> scenarios:
> 
> Scenario A: The NFS server reboots while a client has the file system
> mounted as the r/o layer of a union mount.  The server does not change
> the exported file system at all and re-exports it as hard read-only.
> This should work.
> 

Nitpick: This should be fine regardless of how it's exported. You
don't want the clients going bonkers just because someone pulled the
plug on the server accidentally. NFS was designed such that clients
really shouldn't be affected when the server reboots (aside from
stalling out on RPC calls while the server comes back up).

> Scenario B: The NFS server reboots as in the above scenario, but
> performs "touch /exports/client_root/a_file" before re-exporting the
> file system as hard read-only.  This is _not_ okay and in some form
> will cause a panic on the client if the client doesn't detect it and
> stop accessing the mount.
> 
> How to tell the difference between scenarios A and B?
> 

I don't believe you can, at least not with standard NFS protocols. I
think the best you can do is detect these problems on an as-needed
basis. Anything that relies on server behavior won't be very robust.

In the above case, the mtime on /exports/client_root will (likely) have
changed. At that point you can try to handle the situation without
oopsing.
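
Something along these lines, maybe -- purely illustrative, since the
real attribute checking lives in nfs_update_inode() and this glosses
over most of the nfs_fattr handling:

#include <linux/time.h>
#include <linux/nfs_xdr.h>

/*
 * Illustration only: has the export root changed behind our back?
 * NFSv4 gives us a change attribute; v2/v3 fall back to ctime/mtime,
 * which can be too coarse to catch every modification.
 */
static bool export_root_changed(const struct nfs_fattr *cached,
				const struct nfs_fattr *fresh)
{
	if (cached->change_attr != fresh->change_attr)
		return true;
	return !timespec_equal(&cached->ctime, &fresh->ctime) ||
	       !timespec_equal(&cached->mtime, &fresh->mtime);
}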

There are quite a few ways to screw up your NFS server too that don't
involve changing the underlying fs. Suppose the server is clustered and
someone screws up the fsid's such that you get ESTALE's when you try to
access the root inode?

It would be best if union mounts could cope with that sort of problem
without oopsing (even if the alternative is EIO's for everything that
touches it).

-- 
Jeff Layton <jlayton@redhat.com>


* Re: NFS hard read-only mount option - again
From: J. Bruce Fields @ 2010-04-28 20:56 UTC
  To: Jeff Layton; +Cc: Valerie Aurora, linux-fsdevel, Trond Myklebust

On Wed, Apr 28, 2010 at 04:34:47PM -0400, Jeff Layton wrote:
> On Wed, 28 Apr 2010 16:07:46 -0400
> Valerie Aurora <vaurora@redhat.com> wrote:
> 
> > 
> > What I need can be summarized in the distinction between the following
> > scenarios:
> > 
> > Scenario A: The NFS server reboots while a client has the file system
> > mounted as the r/o layer of a union mount.  The server does not change
> > the exported file system at all and re-exports it as hard read-only.
> > This should work.
> > 
> 
> Nitpick: This should be fine regardless of how it's exported. You
> don't want the clients going bonkers just because someone pulled the
> plug on the server accidentally. NFS was designed such that clients
> really shouldn't be affected when the server reboots (aside from
> stalling out on RPC calls while the server comes back up).
> 
> > Scenario B: The NFS server reboots as in the above scenario, but
> > performs "touch /exports/client_root/a_file" before re-exporting the
> > file system as hard read-only.  This is _not_ okay and in some form
> > will cause a panic on the client if the client doesn't detect it and
> > stop accessing the mount.
> > 
> > How to tell the difference between scenarios A and B?
> > 
> 
> I don't believe you can, at least not with standard NFS protocols. I
> think the best you can do is detect these problems on an as-needed
> basis. Anything that relies on server behavior won't be very robust.

Yeah.  Even if the server had a way to tell the client "this filesystem
will never ever change, I promise" (and actually I think 4.1 might have
something like that--see STATUS4_FIXED?)--there are still so many
opportunities for operator error, network problems, etc., that in
practice a client that panics in that situation probably isn't going to
be considered reliable or secure.

So the unionfs code has to be prepared to deal with the possibility.  If
dealing with it fairly harshly is the simplest thing to do for now, I
agree, that sounds fine--but panicking sounds too harsh!

I'm not sure if we're answering your question.

--b.


* Re: NFS hard read-only mount option - again
From: Valerie Aurora @ 2010-05-04 22:51 UTC
  To: J. Bruce Fields; +Cc: Jeff Layton, linux-fsdevel, Trond Myklebust

On Wed, Apr 28, 2010 at 04:56:00PM -0400, J. Bruce Fields wrote:
> On Wed, Apr 28, 2010 at 04:34:47PM -0400, Jeff Layton wrote:
> > On Wed, 28 Apr 2010 16:07:46 -0400
> > Valerie Aurora <vaurora@redhat.com> wrote:
> > 
> > > 
> > > What I need can be summarized in the distinction between the following
> > > scenarios:
> > > 
> > > Scenario A: The NFS server reboots while a client has the file system
> > > mounted as the r/o layer of a union mount.  The server does not change
> > > the exported file system at all and re-exports it as hard read-only.
> > > This should work.
> > > 
> > 
> > Nitpick: This should be fine regardless of how it's exported. You
> > don't want the clients going bonkers just because someone pulled the
> > plug on the server accidentally. NFS was designed such that clients
> > really shouldn't be affected when the server reboots (aside from
> > stalling out on RPC calls while the server comes back up).
> > 
> > > Scenario B: The NFS server reboots as in the above scenario, but
> > > performs "touch /exports/client_root/a_file" before re-exporting the
> > > file system as hard read-only.  This is _not_ okay and in some form
> > > will cause a panic on the client if the client doesn't detect it and
> > > stop accessing the mount.
> > > 
> > > How to tell the difference between scenarios A and B?
> > > 
> > 
> > I don't believe you can, at least not with standard NFS protocols. I
> > think the best you can do is detect these problems on an as-needed
> > basis. Anything that relies on server behavior won't be very robust.
> 
> Yeah.  Even if the server had a way to tell the client "this filesystem
> will never ever change, I promise" (and actually I think 4.1 might have
> something like that--see STATUS4_FIXED?)--there are still so many
> opportunities for operator error, network problems, etc., that in
> practice a client that panics in that situation probably isn't going to
> be considered reliable or secure.
> 
> So the unionfs code has to be prepared to deal with the possibility.  If
> dealing with it fairly harshly is the simplest thing to do for now, I
> agree, that sounds fine--but panicking sounds too harsh!
> 
> I'm not sure if we're answering your question.

This is definitely going in the right direction, thank you.  Mainly
I'm just really ignorant of actual NFS implementation. :)

Let's focus on detecting a write to a file or directory the client has
read and still has in cache.  This would be the case of an NFS dentry
in cache on the client that is written on the server.  So what is the
actual code path if the client has an NFS dentry in cache and it is
altered or goes away on the client?  Can we hook in there and disable
the union mount?  Is this a totally dumb idea?

-VAL


* Re: NFS hard read-only mount option - again
From: Jeff Layton @ 2010-05-05  7:22 UTC
  To: Valerie Aurora; +Cc: J. Bruce Fields, linux-fsdevel, Trond Myklebust

On Tue, 4 May 2010 18:51:56 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

> On Wed, Apr 28, 2010 at 04:56:00PM -0400, J. Bruce Fields wrote:
> > On Wed, Apr 28, 2010 at 04:34:47PM -0400, Jeff Layton wrote:
> > > On Wed, 28 Apr 2010 16:07:46 -0400
> > > Valerie Aurora <vaurora@redhat.com> wrote:
> > > 
> > > > 
> > > > What I need can be summarized in the distinction between the following
> > > > scenarios:
> > > > 
> > > > Scenario A: The NFS server reboots while a client has the file system
> > > > mounted as the r/o layer of a union mount.  The server does not change
> > > > the exported file system at all and re-exports it as hard read-only.
> > > > This should work.
> > > > 
> > > 
> > > Nitpick: This should be fine regardless of how it's exported. You
> > > don't want the clients going bonkers just because someone pulled the
> > > plug on the server accidentally. NFS was designed such that clients
> > > really shouldn't be affected when the server reboots (aside from
> > > stalling out on RPC calls while the server comes back up).
> > > 
> > > > Scenario B: The NFS server reboots as in the above scenario, but
> > > > performs "touch /exports/client_root/a_file" before re-exporting the
> > > > file system as hard read-only.  This is _not_ okay and in some form
> > > > will cause a panic on the client if the client doesn't detect it and
> > > > stop accessing the mount.
> > > > 
> > > > How to tell the difference between scenarios A and B?
> > > > 
> > > 
> > > I don't believe you can, at least not with standard NFS protocols. I
> > > think the best you can do is detect these problems on an as-needed
> > > basis. Anything that relies on server behavior won't be very robust.
> > 
> > Yeah.  Even if the server had a way to tell the client "this filesystem
> > will never ever change, I promise" (and actually I think 4.1 might have
> > something like that--see STATUS4_FIXED?)--there are still so many
> > opportunities for operator error, network problems, etc., that in
> > practice a client that panics in that situation probably isn't going to
> > be considered reliable or secure.
> > 
> > So the unionfs code has to be prepared to deal with the possibility.  If
> > dealing with it fairly harshly is the simplest thing to do for now, I
> > agree, that sounds fine--but panicking sounds too harsh!
> > 
> > I'm not sure if we're answering your question.
> 
> This is definitely going in the right direction, thank you.  Mainly
> I'm just really ignorant of actual NFS implementation. :)
> 
> Let's focus on detecting a write to a file or directory the client has
> read and still has in cache.  This would be the case of an NFS dentry
> in cache on the client that is written on the server.  So what is the
> actual code path if the client has an NFS dentry in cache and it is
> altered or goes away on the server? Can we hook in there and disable
> the union mount?  Is this a totally dumb idea?
> 
> -VAL

Well...we typically can tell if an inode changed -- see
nfs_update_inode for most of that logic. Note that the methods we use
there are not perfect -- NFSv2/3 rely heavily on timestamps and if the
server is using a filesystem with coarse-grained timestamps (e.g. ext3)
then it's possible for things to change and the client won't notice
(whee!)

Dentries don't really change like inodes do, but we do generally check
whether they are correct before trusting them. That's done via the
d_revalidate methods for NFS. Mostly that involves checking whether the
directory that contains it has changed since the dentry was spawned.

That's probably where you'll want to place your hooks, but I wonder
whether it would be better to do that at a higher level -- in the
generic VFS. Whenever a d_revalidate op returns false, then you know
that something has happened.
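
Very roughly, something like this in the lookup path -- just a sketch,
with IS_UNION_LOWER() and union_mark_dead() standing in for whatever
the union mount code would actually provide:

#include <linux/dcache.h>
#include <linux/namei.h>
#include <linux/errno.h>

/*
 * Sketch of a VFS-level check: if revalidation fails for a dentry on
 * the hard read-only (lower) layer of a union, stop trusting the whole
 * union instead of oopsing later.  IS_UNION_LOWER() and
 * union_mark_dead() are invented names for illustration.
 */
static int union_check_revalidate(struct dentry *dentry, struct nameidata *nd)
{
	int ret = 1;

	if (dentry->d_op && dentry->d_op->d_revalidate)
		ret = dentry->d_op->d_revalidate(dentry, nd);

	if (ret <= 0 && IS_UNION_LOWER(dentry)) {
		/* the "lower layer never changes" guarantee just broke */
		union_mark_dead(dentry->d_sb);
		return -EIO;
	}
	return ret;
}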

-- 
Jeff Layton <jlayton@redhat.com>


* Re: NFS hard read-only mount option - again
From: Valerie Aurora @ 2010-05-06 15:01 UTC
  To: Jeff Layton; +Cc: J. Bruce Fields, linux-fsdevel, Trond Myklebust

On Wed, May 05, 2010 at 09:22:30AM +0200, Jeff Layton wrote:
> On Tue, 4 May 2010 18:51:56 -0400
> Valerie Aurora <vaurora@redhat.com> wrote:
> > 
> > This is definitely going in the right direction, thank you.  Mainly
> > I'm just really ignorant of actual NFS implementation. :)
> > 
> > Let's focus on detecting a write to a file or directory the client has
> > read and still has in cache.  This would be the case of an NFS dentry
> > in cache on the client that is written on the server.  So what is the
> > actual code path if the client has an NFS dentry in cache and it is
> > altered or goes away on the server? Can we hook in there and disable
> > the union mount?  Is this a totally dumb idea?
> > 
> > -VAL
> 
> Well...we typically can tell if an inode changed -- see
> nfs_update_inode for most of that logic. Note that the methods we use
> there are not perfect -- NFSv2/3 rely heavily on timestamps and if the
> server is using a filesystem with coarse-grained timestamps (e.g. ext3)
> then it's possible for things to change and the client won't notice
> (whee!)
> 
> Dentries don't really change like inodes do, but we do generally check
> whether they are correct before trusting them. That's done via the
> d_revalidate methods for NFS. Mostly that involves checking whether the
> directory that contains it has changed since the dentry was spawned.
> 
> That's probably where you'll want to place your hooks, but I wonder
> whether it would be better to do that at a higher level -- in the
> generic VFS. Whenever a d_revalidate op returns false, then you know
> that something has happened.

My gut feeling is that you are right and the VFS call of the
d_revalidate op is the right place to check this.  My guess is that
->d_revalidate() should never fail in the lower/read-only layers of a
union mount, no matter what the file system is.  Can you think of a
d_revalidate implementation that would fail for a reason other than a
write on the server?

Thanks,

-VAL


