[PATCH] NFS: Don't let readdirplus revalidate an inode that was marked as stale

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] NFS: Don't let readdirplus revalidate an inode that was marked as stale
@ 2016-06-14 21:25 Trond Myklebust
  2016-06-30 21:46 ` grace period Marc Eshel
  0 siblings, 1 reply; 44+ messages in thread
From: Trond Myklebust @ 2016-06-14 21:25 UTC (permalink / raw)
  To: linux-nfs

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/dir.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index aaf7bd0cbae2..a924d66b5608 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -424,12 +424,17 @@ static int xdr_decode(nfs_readdir_descriptor_t *desc,
 static
 int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry)
 {
+	struct inode *inode;
 	struct nfs_inode *nfsi;
 
 	if (d_really_is_negative(dentry))
 		return 0;
 
-	nfsi = NFS_I(d_inode(dentry));
+	inode = d_inode(dentry);
+	if (is_bad_inode(inode) || NFS_STALE(inode))
+		return 0;
+
+	nfsi = NFS_I(inode);
 	if (entry->fattr->fileid == nfsi->fileid)
 		return 1;
 	if (nfs_compare_fh(entry->fh, &nfsi->fh) == 0)
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* grace period
  2016-06-14 21:25 [PATCH] NFS: Don't let readdirplus revalidate an inode that was marked as stale Trond Myklebust
@ 2016-06-30 21:46 ` Marc Eshel
  2016-07-01 16:08   ` Bruce Fields
  0 siblings, 1 reply; 44+ messages in thread
From: Marc Eshel @ 2016-06-30 21:46 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-nfs

Hi Bruce,
I see that setting the number of nfsd threads to 0 (echo 0 > 
/proc/fs/nfsd/threads) is not releasing the locks and putting the server 
in grace mode. What is the best way to go into grace period, in new 
version of the kernel, without restarting the nfs server?
Thanks, Marc.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-06-30 21:46 ` grace period Marc Eshel
@ 2016-07-01 16:08   ` Bruce Fields
  2016-07-01 17:31     ` Marc Eshel
  0 siblings, 1 reply; 44+ messages in thread
From: Bruce Fields @ 2016-07-01 16:08 UTC (permalink / raw)
  To: Marc Eshel; +Cc: linux-nfs

On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> I see that setting the number of nfsd threads to 0 (echo 0 > 
> /proc/fs/nfsd/threads) is not releasing the locks and putting the server 
> in grace mode.

Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
certainly drop locks.  If that's not happening, there's a bug, but we'd
need to know more details (version numbers, etc.) to help.

That alone has never been enough to start a grace period--you'd have to
start knfsd again to do that.

> What is the best way to go into grace period, in new version of the
> kernel, without restarting the nfs server?

Restarting the nfs server is the only way.  That's true on older kernels
true, as far as I know.  (OK, you can apparently make lockd do something
like this with a signal, I don't know if that's used much, and I doubt
it works outside an NFSv3-only environment.)

So if you want locks dropped and a new grace period, then you should run
"systemctl restart nfs-server", or your distro's equivalent.

But you're probably doing something more complicated than that.  I'm not
sure I understand the question....

--b.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-01 16:08   ` Bruce Fields
@ 2016-07-01 17:31     ` Marc Eshel
  2016-07-01 20:07       ` Bruce Fields
  0 siblings, 1 reply; 44+ messages in thread
From: Marc Eshel @ 2016-07-01 17:31 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-nfs

It used to be that sending KILL signal to lockd would free locks and start 
Grace period, and when setting nfsd threads to zero, nfsd_last_thread() 
calls nfsd_shutdown that called lockd_down that I believe was causing both 
freeing of locks and starting grace period or maybe it was setting it back 
to a value > 0 that started the grace period.
Any way starting with the kernels that are in RHEL7.1 and up echo 0 > 
/proc/fs/nfsd/threads doesn't do it anymore, I assume going to common 
grace period for NLM and NFSv4 changed things.
The question is how to do IP fail-over, so when a node fails and the IP is 
moving to another node, we need to go into grace period on all the nodes 
in the cluster so the locks of the failed node are not given to anyone 
other than the client that is reclaiming his locks. Restarting NFS server 
is to distractive. For NFSv3 KILL signal to lockd still works but for 
NFSv4 have no way to do it for v4.
Marc. 

From:   Bruce Fields <bfields@fieldses.org>
To:     Marc Eshel/Almaden/IBM@IBMUS
Cc:     linux-nfs@vger.kernel.org
Date:   07/01/2016 09:09 AM
Subject:        Re: grace period

On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> I see that setting the number of nfsd threads to 0 (echo 0 > 
> /proc/fs/nfsd/threads) is not releasing the locks and putting the server 

> in grace mode.

Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
certainly drop locks.  If that's not happening, there's a bug, but we'd
need to know more details (version numbers, etc.) to help.

That alone has never been enough to start a grace period--you'd have to
start knfsd again to do that.

> What is the best way to go into grace period, in new version of the
> kernel, without restarting the nfs server?

Restarting the nfs server is the only way.  That's true on older kernels
true, as far as I know.  (OK, you can apparently make lockd do something
like this with a signal, I don't know if that's used much, and I doubt
it works outside an NFSv3-only environment.)

So if you want locks dropped and a new grace period, then you should run
"systemctl restart nfs-server", or your distro's equivalent.

But you're probably doing something more complicated than that.  I'm not
sure I understand the question....

--b.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-01 17:31     ` Marc Eshel
@ 2016-07-01 20:07       ` Bruce Fields
  2016-07-01 20:24         ` Marc Eshel
                           ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Bruce Fields @ 2016-07-01 20:07 UTC (permalink / raw)
  To: Marc Eshel; +Cc: linux-nfs

On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> It used to be that sending KILL signal to lockd would free locks and start 
> Grace period, and when setting nfsd threads to zero, nfsd_last_thread() 
> calls nfsd_shutdown that called lockd_down that I believe was causing both 
> freeing of locks and starting grace period or maybe it was setting it back 
> to a value > 0 that started the grace period.

OK, apologies, I didn't know (or forgot) that.

> Any way starting with the kernels that are in RHEL7.1 and up echo 0 > 
> /proc/fs/nfsd/threads doesn't do it anymore, I assume going to common 
> grace period for NLM and NFSv4 changed things.
> The question is how to do IP fail-over, so when a node fails and the IP is 
> moving to another node, we need to go into grace period on all the nodes 
> in the cluster so the locks of the failed node are not given to anyone 
> other than the client that is reclaiming his locks. Restarting NFS server 
> is to distractive.

What's the difference?  Just that clients don't have to reestablish tcp
connections?

--b.

> For NFSv3 KILL signal to lockd still works but for 
> NFSv4 have no way to do it for v4.
> Marc. 
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org
> Date:   07/01/2016 09:09 AM
> Subject:        Re: grace period
> 
> 
> 
> On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > /proc/fs/nfsd/threads) is not releasing the locks and putting the server 
> 
> > in grace mode.
> 
> Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> certainly drop locks.  If that's not happening, there's a bug, but we'd
> need to know more details (version numbers, etc.) to help.
> 
> That alone has never been enough to start a grace period--you'd have to
> start knfsd again to do that.
> 
> > What is the best way to go into grace period, in new version of the
> > kernel, without restarting the nfs server?
> 
> Restarting the nfs server is the only way.  That's true on older kernels
> true, as far as I know.  (OK, you can apparently make lockd do something
> like this with a signal, I don't know if that's used much, and I doubt
> it works outside an NFSv3-only environment.)
> 
> So if you want locks dropped and a new grace period, then you should run
> "systemctl restart nfs-server", or your distro's equivalent.
> 
> But you're probably doing something more complicated than that.  I'm not
> sure I understand the question....
> 
> --b.
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-01 20:07       ` Bruce Fields
@ 2016-07-01 20:24         ` Marc Eshel
  2016-07-01 20:47           ` Bruce Fields
  2016-07-01 20:46         ` Marc Eshel
       [not found]         ` <OF5D486F02.62CECB7B-ON88257FE3.0071DBE5-88257FE3.00722318@LocalDomain>
  2 siblings, 1 reply; 44+ messages in thread
From: Marc Eshel @ 2016-07-01 20:24 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-nfs, Tomer Perry

linux-nfs-owner@vger.kernel.org wrote on 07/01/2016 01:07:42 PM:

> From: Bruce Fields <bfields@fieldses.org>
> To: Marc Eshel/Almaden/IBM@IBMUS
> Cc: linux-nfs@vger.kernel.org
> Date: 07/01/2016 01:07 PM
> Subject: Re: grace period
> Sent by: linux-nfs-owner@vger.kernel.org
> 
> On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> > It used to be that sending KILL signal to lockd would free locks and 
start 
> > Grace period, and when setting nfsd threads to zero, 
nfsd_last_thread() 
> > calls nfsd_shutdown that called lockd_down that I believe was causing 
both 
> > freeing of locks and starting grace period or maybe it was setting it 
back 
> > to a value > 0 that started the grace period.
> 
> OK, apologies, I didn't know (or forgot) that.
> 
> > Any way starting with the kernels that are in RHEL7.1 and up echo 0 > 
> > /proc/fs/nfsd/threads doesn't do it anymore, I assume going to common 
> > grace period for NLM and NFSv4 changed things.
> > The question is how to do IP fail-over, so when a node fails and the 
IP is 
> > moving to another node, we need to go into grace period on all the 
nodes 
> > in the cluster so the locks of the failed node are not given to anyone 

> > other than the client that is reclaiming his locks. Restarting NFS 
server 
> > is to distractive.
> 
> What's the difference?  Just that clients don't have to reestablish tcp
> connections?

I am not sure what else systemctl will do but I need to control the order 
of the restart so the client will not see any errors.
I don't think that echo 0 > /proc/fs/nfsd/threads is freeing the lock, at 
least not the v3 locks, I will try again with v4.
The question is what is the most basic operation that can be done to start 
grace, will echo 8 > /proc/fs/nfsd/threads following echo 0 do it?
or is there any other primitive that will do it?
Marc.

> 
> --b.
> 
> > For NFSv3 KILL signal to lockd still works but for 
> > NFSv4 have no way to do it for v4.
> > Marc. 
> > 
> > 
> > 
> > From:   Bruce Fields <bfields@fieldses.org>
> > To:     Marc Eshel/Almaden/IBM@IBMUS
> > Cc:     linux-nfs@vger.kernel.org
> > Date:   07/01/2016 09:09 AM
> > Subject:        Re: grace period
> > 
> > 
> > 
> > On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > > /proc/fs/nfsd/threads) is not releasing the locks and putting the 
server 
> > 
> > > in grace mode.
> > 
> > Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> > certainly drop locks.  If that's not happening, there's a bug, but 
we'd
> > need to know more details (version numbers, etc.) to help.
> > 
> > That alone has never been enough to start a grace period--you'd have 
to
> > start knfsd again to do that.
> > 
> > > What is the best way to go into grace period, in new version of the
> > > kernel, without restarting the nfs server?
> > 
> > Restarting the nfs server is the only way.  That's true on older 
kernels
> > true, as far as I know.  (OK, you can apparently make lockd do 
something
> > like this with a signal, I don't know if that's used much, and I doubt
> > it works outside an NFSv3-only environment.)
> > 
> > So if you want locks dropped and a new grace period, then you should 
run
> > "systemctl restart nfs-server", or your distro's equivalent.
> > 
> > But you're probably doing something more complicated than that.  I'm 
not
> > sure I understand the question....
> > 
> > --b.
> > 
> > 
> > 
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-01 20:07       ` Bruce Fields
  2016-07-01 20:24         ` Marc Eshel
@ 2016-07-01 20:46         ` Marc Eshel
  2016-07-01 21:01           ` Bruce Fields
       [not found]         ` <OF5D486F02.62CECB7B-ON88257FE3.0071DBE5-88257FE3.00722318@LocalDomain>
  2 siblings, 1 reply; 44+ messages in thread
From: Marc Eshel @ 2016-07-01 20:46 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-nfs, Tomer Perry

This is my v3 test that show the lock still there after echo 0 > 
/proc/fs/nfsd/threads

[root@sonascl21 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)

[root@sonascl21 ~]# uname -a
Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu 
Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

[root@sonascl21 ~]# cat /proc/locks | grep 999
3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999

[root@sonascl21 ~]# echo 0 > /proc/fs/nfsd/threads
[root@sonascl21 ~]# cat /proc/fs/nfsd/threads
0

[root@sonascl21 ~]# cat /proc/locks | grep 999
3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999




From:   Bruce Fields <bfields@fieldses.org>
To:     Marc Eshel/Almaden/IBM@IBMUS
Cc:     linux-nfs@vger.kernel.org
Date:   07/01/2016 01:07 PM
Subject:        Re: grace period



On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> It used to be that sending KILL signal to lockd would free locks and 
start 
> Grace period, and when setting nfsd threads to zero, nfsd_last_thread() 
> calls nfsd_shutdown that called lockd_down that I believe was causing 
both 
> freeing of locks and starting grace period or maybe it was setting it 
back 
> to a value > 0 that started the grace period.

OK, apologies, I didn't know (or forgot) that.

> Any way starting with the kernels that are in RHEL7.1 and up echo 0 > 
> /proc/fs/nfsd/threads doesn't do it anymore, I assume going to common 
> grace period for NLM and NFSv4 changed things.
> The question is how to do IP fail-over, so when a node fails and the IP 
is 
> moving to another node, we need to go into grace period on all the nodes 

> in the cluster so the locks of the failed node are not given to anyone 
> other than the client that is reclaiming his locks. Restarting NFS 
server 
> is to distractive.

What's the difference?  Just that clients don't have to reestablish tcp
connections?

--b.

> For NFSv3 KILL signal to lockd still works but for 
> NFSv4 have no way to do it for v4.
> Marc. 
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org
> Date:   07/01/2016 09:09 AM
> Subject:        Re: grace period
> 
> 
> 
> On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > /proc/fs/nfsd/threads) is not releasing the locks and putting the 
server 
> 
> > in grace mode.
> 
> Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> certainly drop locks.  If that's not happening, there's a bug, but we'd
> need to know more details (version numbers, etc.) to help.
> 
> That alone has never been enough to start a grace period--you'd have to
> start knfsd again to do that.
> 
> > What is the best way to go into grace period, in new version of the
> > kernel, without restarting the nfs server?
> 
> Restarting the nfs server is the only way.  That's true on older kernels
> true, as far as I know.  (OK, you can apparently make lockd do something
> like this with a signal, I don't know if that's used much, and I doubt
> it works outside an NFSv3-only environment.)
> 
> So if you want locks dropped and a new grace period, then you should run
> "systemctl restart nfs-server", or your distro's equivalent.
> 
> But you're probably doing something more complicated than that.  I'm not
> sure I understand the question....
> 
> --b.
> 
> 
> 
> 






^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-01 20:24         ` Marc Eshel
@ 2016-07-01 20:47           ` Bruce Fields
  0 siblings, 0 replies; 44+ messages in thread
From: Bruce Fields @ 2016-07-01 20:47 UTC (permalink / raw)
  To: Marc Eshel; +Cc: linux-nfs, Tomer Perry

On Fri, Jul 01, 2016 at 01:24:48PM -0700, Marc Eshel wrote:
> linux-nfs-owner@vger.kernel.org wrote on 07/01/2016 01:07:42 PM:
> 
> > From: Bruce Fields <bfields@fieldses.org>
> > To: Marc Eshel/Almaden/IBM@IBMUS
> > Cc: linux-nfs@vger.kernel.org
> > Date: 07/01/2016 01:07 PM
> > Subject: Re: grace period
> > Sent by: linux-nfs-owner@vger.kernel.org
> > 
> > On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> > > It used to be that sending KILL signal to lockd would free locks and 
> start 
> > > Grace period, and when setting nfsd threads to zero, 
> nfsd_last_thread() 
> > > calls nfsd_shutdown that called lockd_down that I believe was causing 
> both 
> > > freeing of locks and starting grace period or maybe it was setting it 
> back 
> > > to a value > 0 that started the grace period.
> > 
> > OK, apologies, I didn't know (or forgot) that.
> > 
> > > Any way starting with the kernels that are in RHEL7.1 and up echo 0 > 
> > > /proc/fs/nfsd/threads doesn't do it anymore, I assume going to common 
> > > grace period for NLM and NFSv4 changed things.
> > > The question is how to do IP fail-over, so when a node fails and the 
> IP is 
> > > moving to another node, we need to go into grace period on all the 
> nodes 
> > > in the cluster so the locks of the failed node are not given to anyone 
> 
> > > other than the client that is reclaiming his locks. Restarting NFS 
> server 
> > > is to distractive.
> > 
> > What's the difference?  Just that clients don't have to reestablish tcp
> > connections?
> 
> I am not sure what else systemctl will do but I need to control the order 
> of the restart so the client will not see any errors.
> I don't think that echo 0 > /proc/fs/nfsd/threads is freeing the lock, at 
> least not the v3 locks, I will try again with v4.
> The question is what is the most basic operation that can be done to start 
> grace, will echo 8 > /proc/fs/nfsd/threads following echo 0 do it?
> or is there any other primitive that will do it?

That should do it, though really so should just "systemctl restart
nfs-server"--if that causes errors then there's a bug somewhere.

--b.

> Marc.
> 
> > 
> > --b.
> > 
> > > For NFSv3 KILL signal to lockd still works but for 
> > > NFSv4 have no way to do it for v4.
> > > Marc. 
> > > 
> > > 
> > > 
> > > From:   Bruce Fields <bfields@fieldses.org>
> > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > Cc:     linux-nfs@vger.kernel.org
> > > Date:   07/01/2016 09:09 AM
> > > Subject:        Re: grace period
> > > 
> > > 
> > > 
> > > On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > > > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > > > /proc/fs/nfsd/threads) is not releasing the locks and putting the 
> server 
> > > 
> > > > in grace mode.
> > > 
> > > Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> > > certainly drop locks.  If that's not happening, there's a bug, but 
> we'd
> > > need to know more details (version numbers, etc.) to help.
> > > 
> > > That alone has never been enough to start a grace period--you'd have 
> to
> > > start knfsd again to do that.
> > > 
> > > > What is the best way to go into grace period, in new version of the
> > > > kernel, without restarting the nfs server?
> > > 
> > > Restarting the nfs server is the only way.  That's true on older 
> kernels
> > > true, as far as I know.  (OK, you can apparently make lockd do 
> something
> > > like this with a signal, I don't know if that's used much, and I doubt
> > > it works outside an NFSv3-only environment.)
> > > 
> > > So if you want locks dropped and a new grace period, then you should 
> run
> > > "systemctl restart nfs-server", or your distro's equivalent.
> > > 
> > > But you're probably doing something more complicated than that.  I'm 
> not
> > > sure I understand the question....
> > > 
> > > --b.
> > > 
> > > 
> > > 
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
       [not found]         ` <OF5D486F02.62CECB7B-ON88257FE3.0071DBE5-88257FE3.00722318@LocalDomain>
@ 2016-07-01 20:51           ` Marc Eshel
  0 siblings, 0 replies; 44+ messages in thread
From: Marc Eshel @ 2016-07-01 20:51 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-nfs, Tomer Perry

echo 0 > /proc/fs/nfsd/threads does delete the locks for v4 but not for v3
Marc.



From:   Marc Eshel/Almaden/IBM
To:     Bruce Fields <bfields@fieldses.org>
Cc:     linux-nfs@vger.kernel.org, Tomer Perry/Israel/IBM@IBMIL
Date:   07/01/2016 01:46 PM
Subject:        Re: grace period


This is my v3 test that show the lock still there after echo 0 > 
/proc/fs/nfsd/threads

[root@sonascl21 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)

[root@sonascl21 ~]# uname -a
Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu 
Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

[root@sonascl21 ~]# cat /proc/locks | grep 999
3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999

[root@sonascl21 ~]# echo 0 > /proc/fs/nfsd/threads
[root@sonascl21 ~]# cat /proc/fs/nfsd/threads
0

[root@sonascl21 ~]# cat /proc/locks | grep 999
3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999





From:   Bruce Fields <bfields@fieldses.org>
To:     Marc Eshel/Almaden/IBM@IBMUS
Cc:     linux-nfs@vger.kernel.org
Date:   07/01/2016 01:07 PM
Subject:        Re: grace period



On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> It used to be that sending KILL signal to lockd would free locks and 
start 
> Grace period, and when setting nfsd threads to zero, nfsd_last_thread() 
> calls nfsd_shutdown that called lockd_down that I believe was causing 
both 
> freeing of locks and starting grace period or maybe it was setting it 
back 
> to a value > 0 that started the grace period.

OK, apologies, I didn't know (or forgot) that.

> Any way starting with the kernels that are in RHEL7.1 and up echo 0 > 
> /proc/fs/nfsd/threads doesn't do it anymore, I assume going to common 
> grace period for NLM and NFSv4 changed things.
> The question is how to do IP fail-over, so when a node fails and the IP 
is 
> moving to another node, we need to go into grace period on all the nodes 

> in the cluster so the locks of the failed node are not given to anyone 
> other than the client that is reclaiming his locks. Restarting NFS 
server 
> is to distractive.

What's the difference?  Just that clients don't have to reestablish tcp
connections?

--b.

> For NFSv3 KILL signal to lockd still works but for 
> NFSv4 have no way to do it for v4.
> Marc. 
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org
> Date:   07/01/2016 09:09 AM
> Subject:        Re: grace period
> 
> 
> 
> On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > /proc/fs/nfsd/threads) is not releasing the locks and putting the 
server 
> 
> > in grace mode.
> 
> Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> certainly drop locks.  If that's not happening, there's a bug, but we'd
> need to know more details (version numbers, etc.) to help.
> 
> That alone has never been enough to start a grace period--you'd have to
> start knfsd again to do that.
> 
> > What is the best way to go into grace period, in new version of the
> > kernel, without restarting the nfs server?
> 
> Restarting the nfs server is the only way.  That's true on older kernels
> true, as far as I know.  (OK, you can apparently make lockd do something
> like this with a signal, I don't know if that's used much, and I doubt
> it works outside an NFSv3-only environment.)
> 
> So if you want locks dropped and a new grace period, then you should run
> "systemctl restart nfs-server", or your distro's equivalent.
> 
> But you're probably doing something more complicated than that.  I'm not
> sure I understand the question....
> 
> --b.
> 
> 
> 
> 







^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-01 20:46         ` Marc Eshel
@ 2016-07-01 21:01           ` Bruce Fields
  2016-07-01 22:42             ` Marc Eshel
  0 siblings, 1 reply; 44+ messages in thread
From: Bruce Fields @ 2016-07-01 21:01 UTC (permalink / raw)
  To: Marc Eshel; +Cc: linux-nfs, Tomer Perry

On Fri, Jul 01, 2016 at 01:46:42PM -0700, Marc Eshel wrote:
> This is my v3 test that show the lock still there after echo 0 > 
> /proc/fs/nfsd/threads
> 
> [root@sonascl21 ~]# cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
> 
> [root@sonascl21 ~]# uname -a
> Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu 
> Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> 
> [root@sonascl21 ~]# cat /proc/locks | grep 999
> 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> 
> [root@sonascl21 ~]# echo 0 > /proc/fs/nfsd/threads
> [root@sonascl21 ~]# cat /proc/fs/nfsd/threads
> 0
> 
> [root@sonascl21 ~]# cat /proc/locks | grep 999
> 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999

Huh, that's not what I see.  Are you positive that's the lock on the
backend filesystem and not the client-side lock (in case you're doing a
loopback mount?)

--b.

> 
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org
> Date:   07/01/2016 01:07 PM
> Subject:        Re: grace period
> 
> 
> 
> On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> > It used to be that sending KILL signal to lockd would free locks and 
> start 
> > Grace period, and when setting nfsd threads to zero, nfsd_last_thread() 
> > calls nfsd_shutdown that called lockd_down that I believe was causing 
> both 
> > freeing of locks and starting grace period or maybe it was setting it 
> back 
> > to a value > 0 that started the grace period.
> 
> OK, apologies, I didn't know (or forgot) that.
> 
> > Any way starting with the kernels that are in RHEL7.1 and up echo 0 > 
> > /proc/fs/nfsd/threads doesn't do it anymore, I assume going to common 
> > grace period for NLM and NFSv4 changed things.
> > The question is how to do IP fail-over, so when a node fails and the IP 
> is 
> > moving to another node, we need to go into grace period on all the nodes 
> 
> > in the cluster so the locks of the failed node are not given to anyone 
> > other than the client that is reclaiming his locks. Restarting NFS 
> server 
> > is to distractive.
> 
> What's the difference?  Just that clients don't have to reestablish tcp
> connections?
> 
> --b.
> 
> > For NFSv3 KILL signal to lockd still works but for 
> > NFSv4 have no way to do it for v4.
> > Marc. 
> > 
> > 
> > 
> > From:   Bruce Fields <bfields@fieldses.org>
> > To:     Marc Eshel/Almaden/IBM@IBMUS
> > Cc:     linux-nfs@vger.kernel.org
> > Date:   07/01/2016 09:09 AM
> > Subject:        Re: grace period
> > 
> > 
> > 
> > On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > > /proc/fs/nfsd/threads) is not releasing the locks and putting the 
> server 
> > 
> > > in grace mode.
> > 
> > Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> > certainly drop locks.  If that's not happening, there's a bug, but we'd
> > need to know more details (version numbers, etc.) to help.
> > 
> > That alone has never been enough to start a grace period--you'd have to
> > start knfsd again to do that.
> > 
> > > What is the best way to go into grace period, in new version of the
> > > kernel, without restarting the nfs server?
> > 
> > Restarting the nfs server is the only way.  That's true on older kernels
> > true, as far as I know.  (OK, you can apparently make lockd do something
> > like this with a signal, I don't know if that's used much, and I doubt
> > it works outside an NFSv3-only environment.)
> > 
> > So if you want locks dropped and a new grace period, then you should run
> > "systemctl restart nfs-server", or your distro's equivalent.
> > 
> > But you're probably doing something more complicated than that.  I'm not
> > sure I understand the question....
> > 
> > --b.
> > 
> > 
> > 
> > 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-01 21:01           ` Bruce Fields
@ 2016-07-01 22:42             ` Marc Eshel
  2016-07-02  0:58               ` Bruce Fields
  0 siblings, 1 reply; 44+ messages in thread
From: Marc Eshel @ 2016-07-01 22:42 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-nfs, Tomer Perry

Yes, the locks are requested from another node, what fs are you using, I 
don't think it should make any difference, but I can try it with the same 
fs. 
Make sure you are using v3, it does work for v4.
Marc.



From:   Bruce Fields <bfields@fieldses.org>
To:     Marc Eshel/Almaden/IBM@IBMUS
Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
Date:   07/01/2016 02:01 PM
Subject:        Re: grace period



On Fri, Jul 01, 2016 at 01:46:42PM -0700, Marc Eshel wrote:
> This is my v3 test that show the lock still there after echo 0 > 
> /proc/fs/nfsd/threads
> 
> [root@sonascl21 ~]# cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
> 
> [root@sonascl21 ~]# uname -a
> Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu 

> Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> 
> [root@sonascl21 ~]# cat /proc/locks | grep 999
> 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> 
> [root@sonascl21 ~]# echo 0 > /proc/fs/nfsd/threads
> [root@sonascl21 ~]# cat /proc/fs/nfsd/threads
> 0
> 
> [root@sonascl21 ~]# cat /proc/locks | grep 999
> 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999

Huh, that's not what I see.  Are you positive that's the lock on the
backend filesystem and not the client-side lock (in case you're doing a
loopback mount?)

--b.

> 
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org
> Date:   07/01/2016 01:07 PM
> Subject:        Re: grace period
> 
> 
> 
> On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> > It used to be that sending KILL signal to lockd would free locks and 
> start 
> > Grace period, and when setting nfsd threads to zero, 
nfsd_last_thread() 
> > calls nfsd_shutdown that called lockd_down that I believe was causing 
> both 
> > freeing of locks and starting grace period or maybe it was setting it 
> back 
> > to a value > 0 that started the grace period.
> 
> OK, apologies, I didn't know (or forgot) that.
> 
> > Any way starting with the kernels that are in RHEL7.1 and up echo 0 > 
> > /proc/fs/nfsd/threads doesn't do it anymore, I assume going to common 
> > grace period for NLM and NFSv4 changed things.
> > The question is how to do IP fail-over, so when a node fails and the 
IP 
> is 
> > moving to another node, we need to go into grace period on all the 
nodes 
> 
> > in the cluster so the locks of the failed node are not given to anyone 

> > other than the client that is reclaiming his locks. Restarting NFS 
> server 
> > is to distractive.
> 
> What's the difference?  Just that clients don't have to reestablish tcp
> connections?
> 
> --b.
> 
> > For NFSv3 KILL signal to lockd still works but for 
> > NFSv4 have no way to do it for v4.
> > Marc. 
> > 
> > 
> > 
> > From:   Bruce Fields <bfields@fieldses.org>
> > To:     Marc Eshel/Almaden/IBM@IBMUS
> > Cc:     linux-nfs@vger.kernel.org
> > Date:   07/01/2016 09:09 AM
> > Subject:        Re: grace period
> > 
> > 
> > 
> > On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > > /proc/fs/nfsd/threads) is not releasing the locks and putting the 
> server 
> > 
> > > in grace mode.
> > 
> > Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> > certainly drop locks.  If that's not happening, there's a bug, but 
we'd
> > need to know more details (version numbers, etc.) to help.
> > 
> > That alone has never been enough to start a grace period--you'd have 
to
> > start knfsd again to do that.
> > 
> > > What is the best way to go into grace period, in new version of the
> > > kernel, without restarting the nfs server?
> > 
> > Restarting the nfs server is the only way.  That's true on older 
kernels
> > true, as far as I know.  (OK, you can apparently make lockd do 
something
> > like this with a signal, I don't know if that's used much, and I doubt
> > it works outside an NFSv3-only environment.)
> > 
> > So if you want locks dropped and a new grace period, then you should 
run
> > "systemctl restart nfs-server", or your distro's equivalent.
> > 
> > But you're probably doing something more complicated than that.  I'm 
not
> > sure I understand the question....
> > 
> > --b.
> > 
> > 
> > 
> > 
> 
> 
> 
> 






^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-01 22:42             ` Marc Eshel
@ 2016-07-02  0:58               ` Bruce Fields
  2016-07-03  5:30                 ` Marc Eshel
       [not found]                 ` <OFC1237E53.3CFCA8E8-ON88257FE5.001D3182-88257FE5.001E3A5B@LocalDomain>
  0 siblings, 2 replies; 44+ messages in thread
From: Bruce Fields @ 2016-07-02  0:58 UTC (permalink / raw)
  To: Marc Eshel; +Cc: linux-nfs, Tomer Perry

On Fri, Jul 01, 2016 at 03:42:43PM -0700, Marc Eshel wrote:
> Yes, the locks are requested from another node, what fs are you using, I 
> don't think it should make any difference, but I can try it with the same 
> fs. 
> Make sure you are using v3, it does work for v4.

I tested v3 on upstream.--b.

> Marc.
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
> Date:   07/01/2016 02:01 PM
> Subject:        Re: grace period
> 
> 
> 
> On Fri, Jul 01, 2016 at 01:46:42PM -0700, Marc Eshel wrote:
> > This is my v3 test that show the lock still there after echo 0 > 
> > /proc/fs/nfsd/threads
> > 
> > [root@sonascl21 ~]# cat /etc/redhat-release 
> > Red Hat Enterprise Linux Server release 7.2 (Maipo)
> > 
> > [root@sonascl21 ~]# uname -a
> > Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu 
> 
> > Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> > 
> > [root@sonascl21 ~]# echo 0 > /proc/fs/nfsd/threads
> > [root@sonascl21 ~]# cat /proc/fs/nfsd/threads
> > 0
> > 
> > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> 
> Huh, that's not what I see.  Are you positive that's the lock on the
> backend filesystem and not the client-side lock (in case you're doing a
> loopback mount?)
> 
> --b.
> 
> > 
> > 
> > 
> > 
> > From:   Bruce Fields <bfields@fieldses.org>
> > To:     Marc Eshel/Almaden/IBM@IBMUS
> > Cc:     linux-nfs@vger.kernel.org
> > Date:   07/01/2016 01:07 PM
> > Subject:        Re: grace period
> > 
> > 
> > 
> > On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> > > It used to be that sending KILL signal to lockd would free locks and 
> > start 
> > > Grace period, and when setting nfsd threads to zero, 
> nfsd_last_thread() 
> > > calls nfsd_shutdown that called lockd_down that I believe was causing 
> > both 
> > > freeing of locks and starting grace period or maybe it was setting it 
> > back 
> > > to a value > 0 that started the grace period.
> > 
> > OK, apologies, I didn't know (or forgot) that.
> > 
> > > Any way starting with the kernels that are in RHEL7.1 and up echo 0 > 
> > > /proc/fs/nfsd/threads doesn't do it anymore, I assume going to common 
> > > grace period for NLM and NFSv4 changed things.
> > > The question is how to do IP fail-over, so when a node fails and the 
> IP 
> > is 
> > > moving to another node, we need to go into grace period on all the 
> nodes 
> > 
> > > in the cluster so the locks of the failed node are not given to anyone 
> 
> > > other than the client that is reclaiming his locks. Restarting NFS 
> > server 
> > > is to distractive.
> > 
> > What's the difference?  Just that clients don't have to reestablish tcp
> > connections?
> > 
> > --b.
> > 
> > > For NFSv3 KILL signal to lockd still works but for 
> > > NFSv4 have no way to do it for v4.
> > > Marc. 
> > > 
> > > 
> > > 
> > > From:   Bruce Fields <bfields@fieldses.org>
> > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > Cc:     linux-nfs@vger.kernel.org
> > > Date:   07/01/2016 09:09 AM
> > > Subject:        Re: grace period
> > > 
> > > 
> > > 
> > > On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > > > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > > > /proc/fs/nfsd/threads) is not releasing the locks and putting the 
> > server 
> > > 
> > > > in grace mode.
> > > 
> > > Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> > > certainly drop locks.  If that's not happening, there's a bug, but 
> we'd
> > > need to know more details (version numbers, etc.) to help.
> > > 
> > > That alone has never been enough to start a grace period--you'd have 
> to
> > > start knfsd again to do that.
> > > 
> > > > What is the best way to go into grace period, in new version of the
> > > > kernel, without restarting the nfs server?
> > > 
> > > Restarting the nfs server is the only way.  That's true on older 
> kernels
> > > true, as far as I know.  (OK, you can apparently make lockd do 
> something
> > > like this with a signal, I don't know if that's used much, and I doubt
> > > it works outside an NFSv3-only environment.)
> > > 
> > > So if you want locks dropped and a new grace period, then you should 
> run
> > > "systemctl restart nfs-server", or your distro's equivalent.
> > > 
> > > But you're probably doing something more complicated than that.  I'm 
> not
> > > sure I understand the question....
> > > 
> > > --b.
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-02  0:58               ` Bruce Fields
@ 2016-07-03  5:30                 ` Marc Eshel
  2016-07-05 20:51                   ` Bruce Fields
       [not found]                 ` <OFC1237E53.3CFCA8E8-ON88257FE5.001D3182-88257FE5.001E3A5B@LocalDomain>
  1 sibling, 1 reply; 44+ messages in thread
From: Marc Eshel @ 2016-07-03  5:30 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-nfs, Tomer Perry

I tried again NFSv3 locks with xfs export. "echo 0 > 
/proc/fs/nfsd/threads" releases locks on rhel7.0 but not on rhel7.2
What else can I show you to find the problem?
Marc.
 
works:
[root@boar11 ~]# uname -a
Linux boar11 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 
x86_64 x86_64 x86_64 GNU/Linux
[root@boar11 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.0 (Maipo)

not working:
[root@sonascl21 ~]# uname -a
Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu 
Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@sonascl21 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@sonascl21 ~]# cat /proc/fs/nfsd/threads 
0
[root@sonascl21 ~]# cat /proc/locks
1: POSIX  ADVISORY  WRITE 2346 fd:00:1612092569 0 9999



From:   Bruce Fields <bfields@fieldses.org>
To:     Marc Eshel/Almaden/IBM@IBMUS
Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
Date:   07/01/2016 05:58 PM
Subject:        Re: grace period



On Fri, Jul 01, 2016 at 03:42:43PM -0700, Marc Eshel wrote:
> Yes, the locks are requested from another node, what fs are you using, I 

> don't think it should make any difference, but I can try it with the 
same 
> fs. 
> Make sure you are using v3, it does work for v4.

I tested v3 on upstream.--b.

> Marc.
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
> Date:   07/01/2016 02:01 PM
> Subject:        Re: grace period
> 
> 
> 
> On Fri, Jul 01, 2016 at 01:46:42PM -0700, Marc Eshel wrote:
> > This is my v3 test that show the lock still there after echo 0 > 
> > /proc/fs/nfsd/threads
> > 
> > [root@sonascl21 ~]# cat /etc/redhat-release 
> > Red Hat Enterprise Linux Server release 7.2 (Maipo)
> > 
> > [root@sonascl21 ~]# uname -a
> > Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP 
Thu 
> 
> > Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> > 
> > [root@sonascl21 ~]# echo 0 > /proc/fs/nfsd/threads
> > [root@sonascl21 ~]# cat /proc/fs/nfsd/threads
> > 0
> > 
> > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> 
> Huh, that's not what I see.  Are you positive that's the lock on the
> backend filesystem and not the client-side lock (in case you're doing a
> loopback mount?)
> 
> --b.
> 
> > 
> > 
> > 
> > 
> > From:   Bruce Fields <bfields@fieldses.org>
> > To:     Marc Eshel/Almaden/IBM@IBMUS
> > Cc:     linux-nfs@vger.kernel.org
> > Date:   07/01/2016 01:07 PM
> > Subject:        Re: grace period
> > 
> > 
> > 
> > On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> > > It used to be that sending KILL signal to lockd would free locks and 

> > start 
> > > Grace period, and when setting nfsd threads to zero, 
> nfsd_last_thread() 
> > > calls nfsd_shutdown that called lockd_down that I believe was 
causing 
> > both 
> > > freeing of locks and starting grace period or maybe it was setting 
it 
> > back 
> > > to a value > 0 that started the grace period.
> > 
> > OK, apologies, I didn't know (or forgot) that.
> > 
> > > Any way starting with the kernels that are in RHEL7.1 and up echo 0 
> 
> > > /proc/fs/nfsd/threads doesn't do it anymore, I assume going to 
common 
> > > grace period for NLM and NFSv4 changed things.
> > > The question is how to do IP fail-over, so when a node fails and the 

> IP 
> > is 
> > > moving to another node, we need to go into grace period on all the 
> nodes 
> > 
> > > in the cluster so the locks of the failed node are not given to 
anyone 
> 
> > > other than the client that is reclaiming his locks. Restarting NFS 
> > server 
> > > is to distractive.
> > 
> > What's the difference?  Just that clients don't have to reestablish 
tcp
> > connections?
> > 
> > --b.
> > 
> > > For NFSv3 KILL signal to lockd still works but for 
> > > NFSv4 have no way to do it for v4.
> > > Marc. 
> > > 
> > > 
> > > 
> > > From:   Bruce Fields <bfields@fieldses.org>
> > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > Cc:     linux-nfs@vger.kernel.org
> > > Date:   07/01/2016 09:09 AM
> > > Subject:        Re: grace period
> > > 
> > > 
> > > 
> > > On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > > > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > > > /proc/fs/nfsd/threads) is not releasing the locks and putting the 
> > server 
> > > 
> > > > in grace mode.
> > > 
> > > Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> > > certainly drop locks.  If that's not happening, there's a bug, but 
> we'd
> > > need to know more details (version numbers, etc.) to help.
> > > 
> > > That alone has never been enough to start a grace period--you'd have 

> to
> > > start knfsd again to do that.
> > > 
> > > > What is the best way to go into grace period, in new version of 
the
> > > > kernel, without restarting the nfs server?
> > > 
> > > Restarting the nfs server is the only way.  That's true on older 
> kernels
> > > true, as far as I know.  (OK, you can apparently make lockd do 
> something
> > > like this with a signal, I don't know if that's used much, and I 
doubt
> > > it works outside an NFSv3-only environment.)
> > > 
> > > So if you want locks dropped and a new grace period, then you should 

> run
> > > "systemctl restart nfs-server", or your distro's equivalent.
> > > 
> > > But you're probably doing something more complicated than that.  I'm 

> not
> > > sure I understand the question....
> > > 
> > > --b.
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 






^ permalink raw reply	[flat|nested] 44+ messages in thread

* HA NFS
       [not found]                 ` <OFC1237E53.3CFCA8E8-ON88257FE5.001D3182-88257FE5.001E3A5B@LocalDomain>
@ 2016-07-04 23:53                   ` Marc Eshel
  2016-07-05 15:08                     ` Steve Dickson
  0 siblings, 1 reply; 44+ messages in thread
From: Marc Eshel @ 2016-07-04 23:53 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs, Tomer Perry

Hi Steve,
I did not pay attention for a while and now I see that since RHEL7.0 there 
a major changes in NFSv4 recovery for a signal machine and for cluster 
file system. Is there any write up on the changes like the use of 
/var/lib/nfs/nfsdcltrack/main.sqlite, I see it being used in 7.0 but not 
in 7.2. Any information would be appreciated.
Thanks, Marc. 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: HA NFS
  2016-07-04 23:53                   ` HA NFS Marc Eshel
@ 2016-07-05 15:08                     ` Steve Dickson
  2016-07-05 20:56                       ` Marc Eshel
  0 siblings, 1 reply; 44+ messages in thread
From: Steve Dickson @ 2016-07-05 15:08 UTC (permalink / raw)
  To: Marc Eshel; +Cc: linux-nfs, Tomer Perry



On 07/04/2016 07:53 PM, Marc Eshel wrote:
> Hi Steve,
> I did not pay attention for a while and now I see that since RHEL7.0 there 
> a major changes in NFSv4 recovery for a signal machine and for cluster 
> file system. Is there any write up on the changes like the use of 
> /var/lib/nfs/nfsdcltrack/main.sqlite, I see it being used in 7.0 but not 
> in 7.2. Any information would be appreciated.

That file is still being used... but there were some changes.

In RHEL 7.2 this was added for bz 1234598
commit c41a3d0a17baa61a07d48d8536e99908d765de9b
Author: Jeff Layton <jlayton@primarydata.com>
Date:   Fri Sep 19 11:07:31 2014 -0400

    nfsdcltrack: fetch NFSDCLTRACK_GRACE_START out of environment


In RHEL 7.3 there will be this for bz 1285097
commit d479ad3adb0671c48d6fbf3e36bd52a31159c413
Author: Jeff Layton <jlayton@primarydata.com>
Date:   Fri Sep 19 11:03:45 2014 -0400

    nfsdcltrack: update schema to v2

I hope this helps... 

steved.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-03  5:30                 ` Marc Eshel
@ 2016-07-05 20:51                   ` Bruce Fields
  2016-07-05 23:05                     ` Marc Eshel
  0 siblings, 1 reply; 44+ messages in thread
From: Bruce Fields @ 2016-07-05 20:51 UTC (permalink / raw)
  To: Marc Eshel; +Cc: linux-nfs, Tomer Perry

On Sat, Jul 02, 2016 at 10:30:11PM -0700, Marc Eshel wrote:
> I tried again NFSv3 locks with xfs export. "echo 0 > 
> /proc/fs/nfsd/threads" releases locks on rhel7.0 but not on rhel7.2
> What else can I show you to find the problem?

Sorry, I can't reproduce, though I've only tried a slightly later kernel
than that.  Could you submit a RHEL bug?

--b.

> Marc.
>  
> works:
> [root@boar11 ~]# uname -a
> Linux boar11 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 
> x86_64 x86_64 x86_64 GNU/Linux
> [root@boar11 ~]# cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.0 (Maipo)
> 
> not working:
> [root@sonascl21 ~]# uname -a
> Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu 
> Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> [root@sonascl21 ~]# cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
> [root@sonascl21 ~]# cat /proc/fs/nfsd/threads 
> 0
> [root@sonascl21 ~]# cat /proc/locks
> 1: POSIX  ADVISORY  WRITE 2346 fd:00:1612092569 0 9999
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
> Date:   07/01/2016 05:58 PM
> Subject:        Re: grace period
> 
> 
> 
> On Fri, Jul 01, 2016 at 03:42:43PM -0700, Marc Eshel wrote:
> > Yes, the locks are requested from another node, what fs are you using, I 
> 
> > don't think it should make any difference, but I can try it with the 
> same 
> > fs. 
> > Make sure you are using v3, it does work for v4.
> 
> I tested v3 on upstream.--b.
> 
> > Marc.
> > 
> > 
> > 
> > From:   Bruce Fields <bfields@fieldses.org>
> > To:     Marc Eshel/Almaden/IBM@IBMUS
> > Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
> > Date:   07/01/2016 02:01 PM
> > Subject:        Re: grace period
> > 
> > 
> > 
> > On Fri, Jul 01, 2016 at 01:46:42PM -0700, Marc Eshel wrote:
> > > This is my v3 test that show the lock still there after echo 0 > 
> > > /proc/fs/nfsd/threads
> > > 
> > > [root@sonascl21 ~]# cat /etc/redhat-release 
> > > Red Hat Enterprise Linux Server release 7.2 (Maipo)
> > > 
> > > [root@sonascl21 ~]# uname -a
> > > Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP 
> Thu 
> > 
> > > Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> > > 
> > > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> > > 
> > > [root@sonascl21 ~]# echo 0 > /proc/fs/nfsd/threads
> > > [root@sonascl21 ~]# cat /proc/fs/nfsd/threads
> > > 0
> > > 
> > > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> > 
> > Huh, that's not what I see.  Are you positive that's the lock on the
> > backend filesystem and not the client-side lock (in case you're doing a
> > loopback mount?)
> > 
> > --b.
> > 
> > > 
> > > 
> > > 
> > > 
> > > From:   Bruce Fields <bfields@fieldses.org>
> > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > Cc:     linux-nfs@vger.kernel.org
> > > Date:   07/01/2016 01:07 PM
> > > Subject:        Re: grace period
> > > 
> > > 
> > > 
> > > On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> > > > It used to be that sending KILL signal to lockd would free locks and 
> 
> > > start 
> > > > Grace period, and when setting nfsd threads to zero, 
> > nfsd_last_thread() 
> > > > calls nfsd_shutdown that called lockd_down that I believe was 
> causing 
> > > both 
> > > > freeing of locks and starting grace period or maybe it was setting 
> it 
> > > back 
> > > > to a value > 0 that started the grace period.
> > > 
> > > OK, apologies, I didn't know (or forgot) that.
> > > 
> > > > Any way starting with the kernels that are in RHEL7.1 and up echo 0 
> > 
> > > > /proc/fs/nfsd/threads doesn't do it anymore, I assume going to 
> common 
> > > > grace period for NLM and NFSv4 changed things.
> > > > The question is how to do IP fail-over, so when a node fails and the 
> 
> > IP 
> > > is 
> > > > moving to another node, we need to go into grace period on all the 
> > nodes 
> > > 
> > > > in the cluster so the locks of the failed node are not given to 
> anyone 
> > 
> > > > other than the client that is reclaiming his locks. Restarting NFS 
> > > server 
> > > > is to distractive.
> > > 
> > > What's the difference?  Just that clients don't have to reestablish 
> tcp
> > > connections?
> > > 
> > > --b.
> > > 
> > > > For NFSv3 KILL signal to lockd still works but for 
> > > > NFSv4 have no way to do it for v4.
> > > > Marc. 
> > > > 
> > > > 
> > > > 
> > > > From:   Bruce Fields <bfields@fieldses.org>
> > > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > > Cc:     linux-nfs@vger.kernel.org
> > > > Date:   07/01/2016 09:09 AM
> > > > Subject:        Re: grace period
> > > > 
> > > > 
> > > > 
> > > > On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > > > > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > > > > /proc/fs/nfsd/threads) is not releasing the locks and putting the 
> > > server 
> > > > 
> > > > > in grace mode.
> > > > 
> > > > Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> > > > certainly drop locks.  If that's not happening, there's a bug, but 
> > we'd
> > > > need to know more details (version numbers, etc.) to help.
> > > > 
> > > > That alone has never been enough to start a grace period--you'd have 
> 
> > to
> > > > start knfsd again to do that.
> > > > 
> > > > > What is the best way to go into grace period, in new version of 
> the
> > > > > kernel, without restarting the nfs server?
> > > > 
> > > > Restarting the nfs server is the only way.  That's true on older 
> > kernels
> > > > true, as far as I know.  (OK, you can apparently make lockd do 
> > something
> > > > like this with a signal, I don't know if that's used much, and I 
> doubt
> > > > it works outside an NFSv3-only environment.)
> > > > 
> > > > So if you want locks dropped and a new grace period, then you should 
> 
> > run
> > > > "systemctl restart nfs-server", or your distro's equivalent.
> > > > 
> > > > But you're probably doing something more complicated than that.  I'm 
> 
> > not
> > > > sure I understand the question....
> > > > 
> > > > --b.
> > > > 
> > > > 
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: HA NFS
  2016-07-05 15:08                     ` Steve Dickson
@ 2016-07-05 20:56                       ` Marc Eshel
  0 siblings, 0 replies; 44+ messages in thread
From: Marc Eshel @ 2016-07-05 20:56 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-nfs, Tomer Perry, Jeff Layton

Thanks for the pointer Steve, I am not sure how much of the changes 
considered RedHat and how much is Linux kernel changes so if another 
mailing list is more appropriate please let me know. I now see on RHEL7.0 
that /var/lib/nfs/nfsdcltrack/main.sqlite is updated as I open a file from 
an NFS client, but going to RHEL7.2 that file is created but not updated 
with a new client open. Did something change in this area between 7.0 and 
7.2 ?
Marc.



From:   Steve Dickson <SteveD@redhat.com>
To:     Marc Eshel/Almaden/IBM@IBMUS
Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
Date:   07/05/2016 08:08 AM
Subject:        Re: HA NFS





On 07/04/2016 07:53 PM, Marc Eshel wrote:
> Hi Steve,
> I did not pay attention for a while and now I see that since RHEL7.0 
there 
> a major changes in NFSv4 recovery for a signal machine and for cluster 
> file system. Is there any write up on the changes like the use of 
> /var/lib/nfs/nfsdcltrack/main.sqlite, I see it being used in 7.0 but not 

> in 7.2. Any information would be appreciated.

That file is still being used... but there were some changes.

In RHEL 7.2 this was added for bz 1234598
commit c41a3d0a17baa61a07d48d8536e99908d765de9b
Author: Jeff Layton <jlayton@primarydata.com>
Date:   Fri Sep 19 11:07:31 2014 -0400

    nfsdcltrack: fetch NFSDCLTRACK_GRACE_START out of environment


In RHEL 7.3 there will be this for bz 1285097
commit d479ad3adb0671c48d6fbf3e36bd52a31159c413
Author: Jeff Layton <jlayton@primarydata.com>
Date:   Fri Sep 19 11:03:45 2014 -0400

    nfsdcltrack: update schema to v2

I hope this helps... 

steved.






^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-05 20:51                   ` Bruce Fields
@ 2016-07-05 23:05                     ` Marc Eshel
  2016-07-06  0:38                       ` Bruce Fields
  0 siblings, 1 reply; 44+ messages in thread
From: Marc Eshel @ 2016-07-05 23:05 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-nfs, Tomer Perry

Can you please point me to the kernel that you are using so I can check if 
it is an obvious problem before I open an RHEL bug?
Thanks, Marc. 



From:   Bruce Fields <bfields@fieldses.org>
To:     Marc Eshel/Almaden/IBM@IBMUS
Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
Date:   07/05/2016 01:52 PM
Subject:        Re: grace period
Sent by:        linux-nfs-owner@vger.kernel.org



On Sat, Jul 02, 2016 at 10:30:11PM -0700, Marc Eshel wrote:
> I tried again NFSv3 locks with xfs export. "echo 0 > 
> /proc/fs/nfsd/threads" releases locks on rhel7.0 but not on rhel7.2
> What else can I show you to find the problem?

Sorry, I can't reproduce, though I've only tried a slightly later kernel
than that.  Could you submit a RHEL bug?

--b.

> Marc.
> 
> works:
> [root@boar11 ~]# uname -a
> Linux boar11 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 
> x86_64 x86_64 x86_64 GNU/Linux
> [root@boar11 ~]# cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.0 (Maipo)
> 
> not working:
> [root@sonascl21 ~]# uname -a
> Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu 

> Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> [root@sonascl21 ~]# cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
> [root@sonascl21 ~]# cat /proc/fs/nfsd/threads 
> 0
> [root@sonascl21 ~]# cat /proc/locks
> 1: POSIX  ADVISORY  WRITE 2346 fd:00:1612092569 0 9999
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
> Date:   07/01/2016 05:58 PM
> Subject:        Re: grace period
> 
> 
> 
> On Fri, Jul 01, 2016 at 03:42:43PM -0700, Marc Eshel wrote:
> > Yes, the locks are requested from another node, what fs are you using, 
I 
> 
> > don't think it should make any difference, but I can try it with the 
> same 
> > fs. 
> > Make sure you are using v3, it does work for v4.
> 
> I tested v3 on upstream.--b.
> 
> > Marc.
> > 
> > 
> > 
> > From:   Bruce Fields <bfields@fieldses.org>
> > To:     Marc Eshel/Almaden/IBM@IBMUS
> > Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
> > Date:   07/01/2016 02:01 PM
> > Subject:        Re: grace period
> > 
> > 
> > 
> > On Fri, Jul 01, 2016 at 01:46:42PM -0700, Marc Eshel wrote:
> > > This is my v3 test that show the lock still there after echo 0 > 
> > > /proc/fs/nfsd/threads
> > > 
> > > [root@sonascl21 ~]# cat /etc/redhat-release 
> > > Red Hat Enterprise Linux Server release 7.2 (Maipo)
> > > 
> > > [root@sonascl21 ~]# uname -a
> > > Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP 

> Thu 
> > 
> > > Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> > > 
> > > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> > > 
> > > [root@sonascl21 ~]# echo 0 > /proc/fs/nfsd/threads
> > > [root@sonascl21 ~]# cat /proc/fs/nfsd/threads
> > > 0
> > > 
> > > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> > 
> > Huh, that's not what I see.  Are you positive that's the lock on the
> > backend filesystem and not the client-side lock (in case you're doing 
a
> > loopback mount?)
> > 
> > --b.
> > 
> > > 
> > > 
> > > 
> > > 
> > > From:   Bruce Fields <bfields@fieldses.org>
> > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > Cc:     linux-nfs@vger.kernel.org
> > > Date:   07/01/2016 01:07 PM
> > > Subject:        Re: grace period
> > > 
> > > 
> > > 
> > > On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> > > > It used to be that sending KILL signal to lockd would free locks 
and 
> 
> > > start 
> > > > Grace period, and when setting nfsd threads to zero, 
> > nfsd_last_thread() 
> > > > calls nfsd_shutdown that called lockd_down that I believe was 
> causing 
> > > both 
> > > > freeing of locks and starting grace period or maybe it was setting 

> it 
> > > back 
> > > > to a value > 0 that started the grace period.
> > > 
> > > OK, apologies, I didn't know (or forgot) that.
> > > 
> > > > Any way starting with the kernels that are in RHEL7.1 and up echo 
0 
> > 
> > > > /proc/fs/nfsd/threads doesn't do it anymore, I assume going to 
> common 
> > > > grace period for NLM and NFSv4 changed things.
> > > > The question is how to do IP fail-over, so when a node fails and 
the 
> 
> > IP 
> > > is 
> > > > moving to another node, we need to go into grace period on all the 

> > nodes 
> > > 
> > > > in the cluster so the locks of the failed node are not given to 
> anyone 
> > 
> > > > other than the client that is reclaiming his locks. Restarting NFS 

> > > server 
> > > > is to distractive.
> > > 
> > > What's the difference?  Just that clients don't have to reestablish 
> tcp
> > > connections?
> > > 
> > > --b.
> > > 
> > > > For NFSv3 KILL signal to lockd still works but for 
> > > > NFSv4 have no way to do it for v4.
> > > > Marc. 
> > > > 
> > > > 
> > > > 
> > > > From:   Bruce Fields <bfields@fieldses.org>
> > > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > > Cc:     linux-nfs@vger.kernel.org
> > > > Date:   07/01/2016 09:09 AM
> > > > Subject:        Re: grace period
> > > > 
> > > > 
> > > > 
> > > > On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > > > > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > > > > /proc/fs/nfsd/threads) is not releasing the locks and putting 
the 
> > > server 
> > > > 
> > > > > in grace mode.
> > > > 
> > > > Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> > > > certainly drop locks.  If that's not happening, there's a bug, but 

> > we'd
> > > > need to know more details (version numbers, etc.) to help.
> > > > 
> > > > That alone has never been enough to start a grace period--you'd 
have 
> 
> > to
> > > > start knfsd again to do that.
> > > > 
> > > > > What is the best way to go into grace period, in new version of 
> the
> > > > > kernel, without restarting the nfs server?
> > > > 
> > > > Restarting the nfs server is the only way.  That's true on older 
> > kernels
> > > > true, as far as I know.  (OK, you can apparently make lockd do 
> > something
> > > > like this with a signal, I don't know if that's used much, and I 
> doubt
> > > > it works outside an NFSv3-only environment.)
> > > > 
> > > > So if you want locks dropped and a new grace period, then you 
should 
> 
> > run
> > > > "systemctl restart nfs-server", or your distro's equivalent.
> > > > 
> > > > But you're probably doing something more complicated than that. 
I'm 
> 
> > not
> > > > sure I understand the question....
> > > > 
> > > > --b.
> > > > 
> > > > 
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html






^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: grace period
  2016-07-05 23:05                     ` Marc Eshel
@ 2016-07-06  0:38                       ` Bruce Fields
  0 siblings, 0 replies; 44+ messages in thread
From: Bruce Fields @ 2016-07-06  0:38 UTC (permalink / raw)
  To: Marc Eshel; +Cc: linux-nfs, Tomer Perry

On Tue, Jul 05, 2016 at 04:05:56PM -0700, Marc Eshel wrote:
> Can you please point me to the kernel that you are using so I can check if 
> it is an obvious problem before I open an RHEL bug?

I've tried it on the latest upstream and on rhel 3.10.0-327.13.1.el7.

--b.

> Thanks, Marc. 
> 
> 
> 
> From:   Bruce Fields <bfields@fieldses.org>
> To:     Marc Eshel/Almaden/IBM@IBMUS
> Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
> Date:   07/05/2016 01:52 PM
> Subject:        Re: grace period
> Sent by:        linux-nfs-owner@vger.kernel.org
> 
> 
> 
> On Sat, Jul 02, 2016 at 10:30:11PM -0700, Marc Eshel wrote:
> > I tried again NFSv3 locks with xfs export. "echo 0 > 
> > /proc/fs/nfsd/threads" releases locks on rhel7.0 but not on rhel7.2
> > What else can I show you to find the problem?
> 
> Sorry, I can't reproduce, though I've only tried a slightly later kernel
> than that.  Could you submit a RHEL bug?
> 
> --b.
> 
> > Marc.
> > 
> > works:
> > [root@boar11 ~]# uname -a
> > Linux boar11 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 
> > x86_64 x86_64 x86_64 GNU/Linux
> > [root@boar11 ~]# cat /etc/redhat-release 
> > Red Hat Enterprise Linux Server release 7.0 (Maipo)
> > 
> > not working:
> > [root@sonascl21 ~]# uname -a
> > Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu 
> 
> > Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> > [root@sonascl21 ~]# cat /etc/redhat-release 
> > Red Hat Enterprise Linux Server release 7.2 (Maipo)
> > [root@sonascl21 ~]# cat /proc/fs/nfsd/threads 
> > 0
> > [root@sonascl21 ~]# cat /proc/locks
> > 1: POSIX  ADVISORY  WRITE 2346 fd:00:1612092569 0 9999
> > 
> > 
> > 
> > From:   Bruce Fields <bfields@fieldses.org>
> > To:     Marc Eshel/Almaden/IBM@IBMUS
> > Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
> > Date:   07/01/2016 05:58 PM
> > Subject:        Re: grace period
> > 
> > 
> > 
> > On Fri, Jul 01, 2016 at 03:42:43PM -0700, Marc Eshel wrote:
> > > Yes, the locks are requested from another node, what fs are you using, 
> I 
> > 
> > > don't think it should make any difference, but I can try it with the 
> > same 
> > > fs. 
> > > Make sure you are using v3, it does work for v4.
> > 
> > I tested v3 on upstream.--b.
> > 
> > > Marc.
> > > 
> > > 
> > > 
> > > From:   Bruce Fields <bfields@fieldses.org>
> > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > Cc:     linux-nfs@vger.kernel.org, Tomer Perry <TOMP@il.ibm.com>
> > > Date:   07/01/2016 02:01 PM
> > > Subject:        Re: grace period
> > > 
> > > 
> > > 
> > > On Fri, Jul 01, 2016 at 01:46:42PM -0700, Marc Eshel wrote:
> > > > This is my v3 test that show the lock still there after echo 0 > 
> > > > /proc/fs/nfsd/threads
> > > > 
> > > > [root@sonascl21 ~]# cat /etc/redhat-release 
> > > > Red Hat Enterprise Linux Server release 7.2 (Maipo)
> > > > 
> > > > [root@sonascl21 ~]# uname -a
> > > > Linux sonascl21.sonasad.almaden.ibm.com 3.10.0-327.el7.x86_64 #1 SMP 
> 
> > Thu 
> > > 
> > > > Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> > > > 
> > > > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > > > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> > > > 
> > > > [root@sonascl21 ~]# echo 0 > /proc/fs/nfsd/threads
> > > > [root@sonascl21 ~]# cat /proc/fs/nfsd/threads
> > > > 0
> > > > 
> > > > [root@sonascl21 ~]# cat /proc/locks | grep 999
> > > > 3: POSIX  ADVISORY  WRITE 2349 00:2a:489486 0 999
> > > 
> > > Huh, that's not what I see.  Are you positive that's the lock on the
> > > backend filesystem and not the client-side lock (in case you're doing 
> a
> > > loopback mount?)
> > > 
> > > --b.
> > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > From:   Bruce Fields <bfields@fieldses.org>
> > > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > > Cc:     linux-nfs@vger.kernel.org
> > > > Date:   07/01/2016 01:07 PM
> > > > Subject:        Re: grace period
> > > > 
> > > > 
> > > > 
> > > > On Fri, Jul 01, 2016 at 10:31:55AM -0700, Marc Eshel wrote:
> > > > > It used to be that sending KILL signal to lockd would free locks 
> and 
> > 
> > > > start 
> > > > > Grace period, and when setting nfsd threads to zero, 
> > > nfsd_last_thread() 
> > > > > calls nfsd_shutdown that called lockd_down that I believe was 
> > causing 
> > > > both 
> > > > > freeing of locks and starting grace period or maybe it was setting 
> 
> > it 
> > > > back 
> > > > > to a value > 0 that started the grace period.
> > > > 
> > > > OK, apologies, I didn't know (or forgot) that.
> > > > 
> > > > > Any way starting with the kernels that are in RHEL7.1 and up echo 
> 0 
> > > 
> > > > > /proc/fs/nfsd/threads doesn't do it anymore, I assume going to 
> > common 
> > > > > grace period for NLM and NFSv4 changed things.
> > > > > The question is how to do IP fail-over, so when a node fails and 
> the 
> > 
> > > IP 
> > > > is 
> > > > > moving to another node, we need to go into grace period on all the 
> 
> > > nodes 
> > > > 
> > > > > in the cluster so the locks of the failed node are not given to 
> > anyone 
> > > 
> > > > > other than the client that is reclaiming his locks. Restarting NFS 
> 
> > > > server 
> > > > > is to distractive.
> > > > 
> > > > What's the difference?  Just that clients don't have to reestablish 
> > tcp
> > > > connections?
> > > > 
> > > > --b.
> > > > 
> > > > > For NFSv3 KILL signal to lockd still works but for 
> > > > > NFSv4 have no way to do it for v4.
> > > > > Marc. 
> > > > > 
> > > > > 
> > > > > 
> > > > > From:   Bruce Fields <bfields@fieldses.org>
> > > > > To:     Marc Eshel/Almaden/IBM@IBMUS
> > > > > Cc:     linux-nfs@vger.kernel.org
> > > > > Date:   07/01/2016 09:09 AM
> > > > > Subject:        Re: grace period
> > > > > 
> > > > > 
> > > > > 
> > > > > On Thu, Jun 30, 2016 at 02:46:19PM -0700, Marc Eshel wrote:
> > > > > > I see that setting the number of nfsd threads to 0 (echo 0 > 
> > > > > > /proc/fs/nfsd/threads) is not releasing the locks and putting 
> the 
> > > > server 
> > > > > 
> > > > > > in grace mode.
> > > > > 
> > > > > Writing 0 to /proc/fs/nfsd/threads shuts down knfsd.  So it should
> > > > > certainly drop locks.  If that's not happening, there's a bug, but 
> 
> > > we'd
> > > > > need to know more details (version numbers, etc.) to help.
> > > > > 
> > > > > That alone has never been enough to start a grace period--you'd 
> have 
> > 
> > > to
> > > > > start knfsd again to do that.
> > > > > 
> > > > > > What is the best way to go into grace period, in new version of 
> > the
> > > > > > kernel, without restarting the nfs server?
> > > > > 
> > > > > Restarting the nfs server is the only way.  That's true on older 
> > > kernels
> > > > > true, as far as I know.  (OK, you can apparently make lockd do 
> > > something
> > > > > like this with a signal, I don't know if that's used much, and I 
> > doubt
> > > > > it works outside an NFSv3-only environment.)
> > > > > 
> > > > > So if you want locks dropped and a new grace period, then you 
> should 
> > 
> > > run
> > > > > "systemctl restart nfs-server", or your distro's equivalent.
> > > > > 
> > > > > But you're probably doing something more complicated than that. 
> I'm 
> > 
> > > not
> > > > > sure I understand the question....
> > > > > 
> > > > > --b.
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-10 18:28                               ` Jeff Layton
  2012-04-10 20:46                                 ` bfields
@ 2012-04-11 10:08                                 ` Stanislav Kinsbursky
  1 sibling, 0 replies; 44+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-11 10:08 UTC (permalink / raw)
  To: Jeff Layton; +Cc: bfields, Myklebust, Trond, linux-nfs, linux-kernel

10.04.2012 22:28, Jeff Layton пишет:
> On Tue, 10 Apr 2012 19:36:26 +0400
> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>
>> 10.04.2012 17:39, bfields@fieldses.org пишет:
>>> On Tue, Apr 10, 2012 at 02:56:12PM +0400, Stanislav Kinsbursky wrote:
>>>> 09.04.2012 22:11, bfields@fieldses.org пишет:
>>>>> Since NFSv4 doesn't have a separate MOUNT protocol, clients need to be
>>>>> able to do readdir's and lookups to get to exported filesystems.  We
>>>>> support this in the Linux server by exporting all the filesystems from
>>>>> "/" on down that must be traversed to reach a given filesystem.  These
>>>>> exports are very restricted (e.g. only parents of exports are visible).
>>>>>
>>>>
>>>> Ok, thanks for explanation.
>>>> So, this pseudoroot looks like a part of NFS server internal
>>>> implementation, but not a part of a standard. That's good.
>>>>
>>>>>> Why does it prevents implementing of check for "superblock-network
>>>>>> namespace" pair on NFS server start and forbid (?) it in case of
>>>>>> this pair is shared already in other namespace? I.e. maybe this
>>>>>> pseudoroot can be an exclusion from this rule?
>>>>>
>>>>> That might work.  It's read-only and consists only of directories, so
>>>>> the grace period doesn't affect it.
>>>>>
>>>>
>>>> I've just realized, that this per-sb grace period won't work.
>>>> I.e., it's a valid situation, when two or more containers located on
>>>> the same filesystem, but shares different parts of it. And there is
>>>> not conflict here at all.
>>>
>>> Well, there may be some conflict in that a file could be hardlinked into
>>> both subtrees, and that file could be locked from users of either
>>> export.
>>>
>>
>> Is this case handled if both links or visible in the same export?
>> But anyway, this is not that bad. I.e it doesn't make things unpredictable.
>> Probably, there are some more issues like this one (bind-mounting, for example).
>> But I think, that it's root responsibility to handle such problems.
>>
>
> Well, it's a problem and one that you'll probably have to address to
> some degree. In truth, the fact that you're exporting different
> subtrees in different containers is immaterial since they're both on
> the same fs and filehandles don't carry any info about the path in and
> of themselves...
>
> Suppose for instance that we have a hardlinked file that's available
> from two different exports in two different containers. The grace
> period ends in container #1, so that nfsd starts servicing normal lock
> requests. An application takes a lock on that hardlinked file. In the
> meantime, a client of container #2 attempts to reclaim the lock that he
> previously held on that same inode and gets denied.
>

> That's just one example. The scarier case is that the client of
> container #1 takes the lock, alters the file and then drops it again
> with the client of container #2 none the wiser. Now the file got
> altered while client #2 thought he held a lock on it. That won't be fun
> to track down...
>
> This sort of thing is one of the reasons I've been saying that the
> grace period is really a property of the underlying filesystem and not
> of nfsd itself. Of course, we do have to come up with a way to handle
> the grace period that doesn't involve altering every exportable fs.
>

I see.
But, frankly speaking, looks like the problem you are talking about is another 
task (comparing to containerization).
I.e. making NFSd work per network namespace is somewhat different comparing to 
these "shared file system" issues (which are actually a part of mount namespace).



-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-10 18:28                               ` Jeff Layton
@ 2012-04-10 20:46                                 ` bfields
  2012-04-11 10:08                                 ` Stanislav Kinsbursky
  1 sibling, 0 replies; 44+ messages in thread
From: bfields @ 2012-04-10 20:46 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Stanislav Kinsbursky, Myklebust, Trond, linux-nfs, linux-kernel

On Tue, Apr 10, 2012 at 02:28:53PM -0400, Jeff Layton wrote:
> This sort of thing is one of the reasons I've been saying that the
> grace period is really a property of the underlying filesystem and not
> of nfsd itself. Of course, we do have to come up with a way to handle
> the grace period that doesn't involve altering every exportable fs.

By the way, the case of multiple containers exporting a single
filesystem does look a lot like an active/active cluster filesystem
export.  It might be an opportunity to prototype the interfaces for
handling that case without having to deal with modifying the DLM.

--b.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-10 15:36                             ` Stanislav Kinsbursky
@ 2012-04-10 18:28                               ` Jeff Layton
  2012-04-10 20:46                                 ` bfields
  2012-04-11 10:08                                 ` Stanislav Kinsbursky
  0 siblings, 2 replies; 44+ messages in thread
From: Jeff Layton @ 2012-04-10 18:28 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: bfields, Myklebust, Trond, linux-nfs, linux-kernel

On Tue, 10 Apr 2012 19:36:26 +0400
Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:

> 10.04.2012 17:39, bfields@fieldses.org пишет:
> > On Tue, Apr 10, 2012 at 02:56:12PM +0400, Stanislav Kinsbursky wrote:
> >> 09.04.2012 22:11, bfields@fieldses.org пишет:
> >>> Since NFSv4 doesn't have a separate MOUNT protocol, clients need to be
> >>> able to do readdir's and lookups to get to exported filesystems.  We
> >>> support this in the Linux server by exporting all the filesystems from
> >>> "/" on down that must be traversed to reach a given filesystem.  These
> >>> exports are very restricted (e.g. only parents of exports are visible).
> >>>
> >>
> >> Ok, thanks for explanation.
> >> So, this pseudoroot looks like a part of NFS server internal
> >> implementation, but not a part of a standard. That's good.
> >>
> >>>> Why does it prevents implementing of check for "superblock-network
> >>>> namespace" pair on NFS server start and forbid (?) it in case of
> >>>> this pair is shared already in other namespace? I.e. maybe this
> >>>> pseudoroot can be an exclusion from this rule?
> >>>
> >>> That might work.  It's read-only and consists only of directories, so
> >>> the grace period doesn't affect it.
> >>>
> >>
> >> I've just realized, that this per-sb grace period won't work.
> >> I.e., it's a valid situation, when two or more containers located on
> >> the same filesystem, but shares different parts of it. And there is
> >> not conflict here at all.
> >
> > Well, there may be some conflict in that a file could be hardlinked into
> > both subtrees, and that file could be locked from users of either
> > export.
> >
> 
> Is this case handled if both links or visible in the same export?
> But anyway, this is not that bad. I.e it doesn't make things unpredictable.
> Probably, there are some more issues like this one (bind-mounting, for example).
> But I think, that it's root responsibility to handle such problems.
> 

Well, it's a problem and one that you'll probably have to address to
some degree. In truth, the fact that you're exporting different
subtrees in different containers is immaterial since they're both on
the same fs and filehandles don't carry any info about the path in and
of themselves...

Suppose for instance that we have a hardlinked file that's available
from two different exports in two different containers. The grace
period ends in container #1, so that nfsd starts servicing normal lock
requests. An application takes a lock on that hardlinked file. In the
meantime, a client of container #2 attempts to reclaim the lock that he
previously held on that same inode and gets denied.

That's just one example. The scarier case is that the client of
container #1 takes the lock, alters the file and then drops it again
with the client of container #2 none the wiser. Now the file got
altered while client #2 thought he held a lock on it. That won't be fun
to track down...

This sort of thing is one of the reasons I've been saying that the
grace period is really a property of the underlying filesystem and not
of nfsd itself. Of course, we do have to come up with a way to handle
the grace period that doesn't involve altering every exportable fs.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-10 13:39                           ` bfields
@ 2012-04-10 15:36                             ` Stanislav Kinsbursky
  2012-04-10 18:28                               ` Jeff Layton
  0 siblings, 1 reply; 44+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-10 15:36 UTC (permalink / raw)
  To: bfields; +Cc: Myklebust, Trond, Jeff Layton, linux-nfs, linux-kernel

10.04.2012 17:39, bfields@fieldses.org пишет:
> On Tue, Apr 10, 2012 at 02:56:12PM +0400, Stanislav Kinsbursky wrote:
>> 09.04.2012 22:11, bfields@fieldses.org пишет:
>>> Since NFSv4 doesn't have a separate MOUNT protocol, clients need to be
>>> able to do readdir's and lookups to get to exported filesystems.  We
>>> support this in the Linux server by exporting all the filesystems from
>>> "/" on down that must be traversed to reach a given filesystem.  These
>>> exports are very restricted (e.g. only parents of exports are visible).
>>>
>>
>> Ok, thanks for explanation.
>> So, this pseudoroot looks like a part of NFS server internal
>> implementation, but not a part of a standard. That's good.
>>
>>>> Why does it prevents implementing of check for "superblock-network
>>>> namespace" pair on NFS server start and forbid (?) it in case of
>>>> this pair is shared already in other namespace? I.e. maybe this
>>>> pseudoroot can be an exclusion from this rule?
>>>
>>> That might work.  It's read-only and consists only of directories, so
>>> the grace period doesn't affect it.
>>>
>>
>> I've just realized, that this per-sb grace period won't work.
>> I.e., it's a valid situation, when two or more containers located on
>> the same filesystem, but shares different parts of it. And there is
>> not conflict here at all.
>
> Well, there may be some conflict in that a file could be hardlinked into
> both subtrees, and that file could be locked from users of either
> export.
>

Is this case handled if both links or visible in the same export?
But anyway, this is not that bad. I.e it doesn't make things unpredictable.
Probably, there are some more issues like this one (bind-mounting, for example).
But I think, that it's root responsibility to handle such problems.

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-10 14:10           ` Stanislav Kinsbursky
@ 2012-04-10 14:18             ` bfields
  0 siblings, 0 replies; 44+ messages in thread
From: bfields @ 2012-04-10 14:18 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: Trond.Myklebust, linux-nfs, linux-kernel

On Tue, Apr 10, 2012 at 06:10:27PM +0400, Stanislav Kinsbursky wrote:
> Well, I can do this to restart grace only for "init_net" and a
> printk with your message and information, that it affect only
> init_net.
> Looks good to you?

Yep, thanks!

--b.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-10 13:37         ` bfields
@ 2012-04-10 14:10           ` Stanislav Kinsbursky
  2012-04-10 14:18             ` bfields
  0 siblings, 1 reply; 44+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-10 14:10 UTC (permalink / raw)
  To: bfields; +Cc: Trond.Myklebust, linux-nfs, linux-kernel

10.04.2012 17:37, bfields@fieldses.org пишет:
> On Tue, Apr 10, 2012 at 03:29:11PM +0400, Stanislav Kinsbursky wrote:
>> 10.04.2012 03:26, bfields@fieldses.org пишет:
>>> On Mon, Apr 09, 2012 at 03:24:19PM +0400, Stanislav Kinsbursky wrote:
>>>> 07.04.2012 03:40, bfields@fieldses.org пишет:
>>>>> On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
>>>>>> Hello, Bruce.
>>>>>> Could you, please, clarify this reason why grace list is used?
>>>>>> I.e. why list is used instead of some atomic variable, for example?
>>>>>
>>>>> Like just a reference count?  Yeah, that would be OK.
>>>>>
>>>>> In theory it could provide some sort of debugging help.  (E.g. we could
>>>>> print out the list of "lock managers" currently keeping us in grace.)  I
>>>>> had some idea we'd make those lock manager objects more complicated, and
>>>>> might have more for individual containerized services.
>>>>
>>>> Could you share this idea, please?
>>>>
>>>> Anyway, I have nothing against lists. Just was curious, why it was used.
>>>> I added Trond and lists to this reply.
>>>>
>>>> Let me explain, what is the problem with grace period I'm facing
>>>> right know, and what I'm thinking about it.
>>>> So, one of the things to be containerized during "NFSd per net ns"
>>>> work is the grace period, and these are the basic components of it:
>>>> 1) Grace period start.
>>>> 2) Grace period end.
>>>> 3) Grace period check.
>>>> 3) Grace period restart.
>>>
>>> For restart, you're thinking of the fs/lockd/svc.c:restart_grace()
>>> that's called on aisngal in lockd()?
>>>
>>> I wonder if there's any way to figure out if that's actually used by
>>> anyone?  (E.g. by any distro init scripts).  It strikes me as possibly
>>> impossible to use correctly.  Perhaps we could deprecate it....
>>>
>>
>> Or (since lockd kthread is visible only from initial pid namespace)
>> we can just hardcode "init_net" in this case. But it means, that
>> this "kill" logic will be broken if two containers shares one pid
>> namespace, but have separated networks namespaces.
>> Anyway, both (this one or Bruce's) solutions suits me.
>>
>>>> So, the simplest straight-forward way is to make all internal stuff:
>>>> "grace_list", "grace_lock", "grace_period_end" work and both
>>>> "lockd_manager" and "nfsd4_manager" - per network namespace. Also,
>>>> "laundromat_work" have to be per-net as well.
>>>> In this case:
>>>> 1) Start - grace period can be started per net ns in
>>>> "lockd_up_net()" (thus has to be moves there from "lockd()") and
>>>> "nfs4_state_start()".
>>>> 2) End - grace period can be ended per net ns in "lockd_down_net()"
>>>> (thus has to be moved there from "lockd()"), "nfsd4_end_grace()" and
>>>> "fs4_state_shutdown()".
>>>> 3) Check - looks easy. There is either svc_rqst or net context can
>>>> be passed to function.
>>>> 4) Restart - this is a tricky place. It would be great to restart
>>>> grace period only for the networks namespace of the sender of the
>>>> kill signal. So, the idea is to check siginfo_t for the pid of
>>>> sender, then try to locate the task, and if found, then get sender's
>>>> networks namespace, and restart grace period only for this namespace
>>>> (of course, if lockd was started for this namespace - see below).
>>>
>>> If it's really the signalling that's the problem--perhaps we can get
>>> away from the signal-based interface.
>>>
>>> At least in the case of lockd I suspect we could.
>>>
>>
>> I'm ok with that. So, if no objections will follow, I'll drop it and
>> send the patch. Or you want to do it?
>
> Please do go ahead.
>
> The safest approach might be:
> 	- leave lockd's signal handling there (just accept that it may
> 	  behave incorrectly in container case), assuming that's safe.
> 	- add a printk ("signalling lockd to restart is deprecated",
> 	  or something) if it's used.
>
> Then eventually we'll remove it entirely.
>
> (But if that doesn't work, it'd likely also be OK just to remove it
> completely now.)
>

Well, I can do this to restart grace only for "init_net" and a printk with your 
message and information, that it affect only init_net.
Looks good to you?

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-10 10:56                         ` Stanislav Kinsbursky
@ 2012-04-10 13:39                           ` bfields
  2012-04-10 15:36                             ` Stanislav Kinsbursky
  0 siblings, 1 reply; 44+ messages in thread
From: bfields @ 2012-04-10 13:39 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: Myklebust, Trond, Jeff Layton, linux-nfs, linux-kernel

On Tue, Apr 10, 2012 at 02:56:12PM +0400, Stanislav Kinsbursky wrote:
> 09.04.2012 22:11, bfields@fieldses.org пишет:
> >Since NFSv4 doesn't have a separate MOUNT protocol, clients need to be
> >able to do readdir's and lookups to get to exported filesystems.  We
> >support this in the Linux server by exporting all the filesystems from
> >"/" on down that must be traversed to reach a given filesystem.  These
> >exports are very restricted (e.g. only parents of exports are visible).
> >
> 
> Ok, thanks for explanation.
> So, this pseudoroot looks like a part of NFS server internal
> implementation, but not a part of a standard. That's good.
> 
> >>Why does it prevents implementing of check for "superblock-network
> >>namespace" pair on NFS server start and forbid (?) it in case of
> >>this pair is shared already in other namespace? I.e. maybe this
> >>pseudoroot can be an exclusion from this rule?
> >
> >That might work.  It's read-only and consists only of directories, so
> >the grace period doesn't affect it.
> >
> 
> I've just realized, that this per-sb grace period won't work.
> I.e., it's a valid situation, when two or more containers located on
> the same filesystem, but shares different parts of it. And there is
> not conflict here at all.

Well, there may be some conflict in that a file could be hardlinked into
both subtrees, and that file could be locked from users of either
export.

--b.

> I don't see any clear and simple way how to handle such races,
> because otherwise we have to tie network namespace and filesystem
> namespace.
> I.e. there will be required some way to define, was passed export
> directory shared already somewhere else or not.
> 
> Realistic solution - since export check should be done in initial
> file system environment (most probably container will have it's own
> root), then we to pass this data to some kernel thread/userspace
> daemon in initial file system environment somehow (sockets doesn't
> suits here... Shared memory?).
> 
> Improbable solution - patching VFS layer...
> 
> -- 
> Best regards,
> Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-10 11:29       ` Stanislav Kinsbursky
@ 2012-04-10 13:37         ` bfields
  2012-04-10 14:10           ` Stanislav Kinsbursky
  0 siblings, 1 reply; 44+ messages in thread
From: bfields @ 2012-04-10 13:37 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: Trond.Myklebust, linux-nfs, linux-kernel

On Tue, Apr 10, 2012 at 03:29:11PM +0400, Stanislav Kinsbursky wrote:
> 10.04.2012 03:26, bfields@fieldses.org пишет:
> >On Mon, Apr 09, 2012 at 03:24:19PM +0400, Stanislav Kinsbursky wrote:
> >>07.04.2012 03:40, bfields@fieldses.org пишет:
> >>>On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
> >>>>Hello, Bruce.
> >>>>Could you, please, clarify this reason why grace list is used?
> >>>>I.e. why list is used instead of some atomic variable, for example?
> >>>
> >>>Like just a reference count?  Yeah, that would be OK.
> >>>
> >>>In theory it could provide some sort of debugging help.  (E.g. we could
> >>>print out the list of "lock managers" currently keeping us in grace.)  I
> >>>had some idea we'd make those lock manager objects more complicated, and
> >>>might have more for individual containerized services.
> >>
> >>Could you share this idea, please?
> >>
> >>Anyway, I have nothing against lists. Just was curious, why it was used.
> >>I added Trond and lists to this reply.
> >>
> >>Let me explain, what is the problem with grace period I'm facing
> >>right know, and what I'm thinking about it.
> >>So, one of the things to be containerized during "NFSd per net ns"
> >>work is the grace period, and these are the basic components of it:
> >>1) Grace period start.
> >>2) Grace period end.
> >>3) Grace period check.
> >>3) Grace period restart.
> >
> >For restart, you're thinking of the fs/lockd/svc.c:restart_grace()
> >that's called on aisngal in lockd()?
> >
> >I wonder if there's any way to figure out if that's actually used by
> >anyone?  (E.g. by any distro init scripts).  It strikes me as possibly
> >impossible to use correctly.  Perhaps we could deprecate it....
> >
> 
> Or (since lockd kthread is visible only from initial pid namespace)
> we can just hardcode "init_net" in this case. But it means, that
> this "kill" logic will be broken if two containers shares one pid
> namespace, but have separated networks namespaces.
> Anyway, both (this one or Bruce's) solutions suits me.
> 
> >>So, the simplest straight-forward way is to make all internal stuff:
> >>"grace_list", "grace_lock", "grace_period_end" work and both
> >>"lockd_manager" and "nfsd4_manager" - per network namespace. Also,
> >>"laundromat_work" have to be per-net as well.
> >>In this case:
> >>1) Start - grace period can be started per net ns in
> >>"lockd_up_net()" (thus has to be moves there from "lockd()") and
> >>"nfs4_state_start()".
> >>2) End - grace period can be ended per net ns in "lockd_down_net()"
> >>(thus has to be moved there from "lockd()"), "nfsd4_end_grace()" and
> >>"fs4_state_shutdown()".
> >>3) Check - looks easy. There is either svc_rqst or net context can
> >>be passed to function.
> >>4) Restart - this is a tricky place. It would be great to restart
> >>grace period only for the networks namespace of the sender of the
> >>kill signal. So, the idea is to check siginfo_t for the pid of
> >>sender, then try to locate the task, and if found, then get sender's
> >>networks namespace, and restart grace period only for this namespace
> >>(of course, if lockd was started for this namespace - see below).
> >
> >If it's really the signalling that's the problem--perhaps we can get
> >away from the signal-based interface.
> >
> >At least in the case of lockd I suspect we could.
> >
> 
> I'm ok with that. So, if no objections will follow, I'll drop it and
> send the patch. Or you want to do it?

Please do go ahead.

The safest approach might be:
	- leave lockd's signal handling there (just accept that it may
	  behave incorrectly in container case), assuming that's safe.
	- add a printk ("signalling lockd to restart is deprecated",
	  or something) if it's used.

Then eventually we'll remove it entirely.

(But if that doesn't work, it'd likely also be OK just to remove it
completely now.)

--b.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 23:26     ` bfields
@ 2012-04-10 11:29       ` Stanislav Kinsbursky
  2012-04-10 13:37         ` bfields
  0 siblings, 1 reply; 44+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-10 11:29 UTC (permalink / raw)
  To: bfields; +Cc: Trond.Myklebust, linux-nfs, linux-kernel

10.04.2012 03:26, bfields@fieldses.org пишет:
> On Mon, Apr 09, 2012 at 03:24:19PM +0400, Stanislav Kinsbursky wrote:
>> 07.04.2012 03:40, bfields@fieldses.org пишет:
>>> On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
>>>> Hello, Bruce.
>>>> Could you, please, clarify this reason why grace list is used?
>>>> I.e. why list is used instead of some atomic variable, for example?
>>>
>>> Like just a reference count?  Yeah, that would be OK.
>>>
>>> In theory it could provide some sort of debugging help.  (E.g. we could
>>> print out the list of "lock managers" currently keeping us in grace.)  I
>>> had some idea we'd make those lock manager objects more complicated, and
>>> might have more for individual containerized services.
>>
>> Could you share this idea, please?
>>
>> Anyway, I have nothing against lists. Just was curious, why it was used.
>> I added Trond and lists to this reply.
>>
>> Let me explain, what is the problem with grace period I'm facing
>> right know, and what I'm thinking about it.
>> So, one of the things to be containerized during "NFSd per net ns"
>> work is the grace period, and these are the basic components of it:
>> 1) Grace period start.
>> 2) Grace period end.
>> 3) Grace period check.
>> 3) Grace period restart.
>
> For restart, you're thinking of the fs/lockd/svc.c:restart_grace()
> that's called on aisngal in lockd()?
>
> I wonder if there's any way to figure out if that's actually used by
> anyone?  (E.g. by any distro init scripts).  It strikes me as possibly
> impossible to use correctly.  Perhaps we could deprecate it....
>

Or (since lockd kthread is visible only from initial pid namespace) we can just 
hardcode "init_net" in this case. But it means, that this "kill" logic will be 
broken if two containers shares one pid namespace, but have separated networks 
namespaces.
Anyway, both (this one or Bruce's) solutions suits me.

>> So, the simplest straight-forward way is to make all internal stuff:
>> "grace_list", "grace_lock", "grace_period_end" work and both
>> "lockd_manager" and "nfsd4_manager" - per network namespace. Also,
>> "laundromat_work" have to be per-net as well.
>> In this case:
>> 1) Start - grace period can be started per net ns in
>> "lockd_up_net()" (thus has to be moves there from "lockd()") and
>> "nfs4_state_start()".
>> 2) End - grace period can be ended per net ns in "lockd_down_net()"
>> (thus has to be moved there from "lockd()"), "nfsd4_end_grace()" and
>> "fs4_state_shutdown()".
>> 3) Check - looks easy. There is either svc_rqst or net context can
>> be passed to function.
>> 4) Restart - this is a tricky place. It would be great to restart
>> grace period only for the networks namespace of the sender of the
>> kill signal. So, the idea is to check siginfo_t for the pid of
>> sender, then try to locate the task, and if found, then get sender's
>> networks namespace, and restart grace period only for this namespace
>> (of course, if lockd was started for this namespace - see below).
>
> If it's really the signalling that's the problem--perhaps we can get
> away from the signal-based interface.
>
> At least in the case of lockd I suspect we could.
>

I'm ok with that. So, if no objections will follow, I'll drop it and send the 
patch. Or you want to do it?

BTW, I tried this "pid from siginfo" approach yesterday. And it doesn't work, 
because sender usually dead already, when lookup for task by pid is performed.

> Or perhaps the decision to share a single lockd thread (or set of nsfd
> threads) among multiple network namespaces was a poor one.  But I
> realize multithreading lockd doesn't look easy.
>

This decision was the best one in current circumstances.
Having Lockd thread (or NFSd threads) per container looks easy to implement on 
first sight. But kernel threads currently supported only in initial pid 
namespace. I.e. it means that per-container kernel thread won't be visible in 
container, if it has it's own pid namespace. And there is no way to put a kernel 
thread into container.
In OpenVZ we have per-container kernel threads. But integrating this feature to 
mainline looks hopeless (or very difficult) to me. At least for now.
So this problem with signals remains unsolved.

So, as it looks to me, this "one service per all" is the only one suitable for 
now. But there are some corner cases which have to be solved.

Anyway, Jeff's question is still open.
Do we need to prevent people from exporting nested directories from different 
network namespaces?
And if yes, how to do this?

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 18:11                       ` bfields
@ 2012-04-10 10:56                         ` Stanislav Kinsbursky
  2012-04-10 13:39                           ` bfields
  0 siblings, 1 reply; 44+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-10 10:56 UTC (permalink / raw)
  To: bfields; +Cc: Myklebust, Trond, Jeff Layton, linux-nfs, linux-kernel

09.04.2012 22:11, bfields@fieldses.org пишет:
> On Mon, Apr 09, 2012 at 08:56:47PM +0400, Stanislav Kinsbursky wrote:
>> 09.04.2012 20:33, Myklebust, Trond пишет:
>>> On Mon, 2012-04-09 at 12:21 -0400, bfields@fieldses.org wrote:
>>>> On Mon, Apr 09, 2012 at 04:17:06PM +0000, Myklebust, Trond wrote:
>>>>> On Mon, 2012-04-09 at 12:11 -0400, bfields@fieldses.org wrote:
>>>>>> On Mon, Apr 09, 2012 at 08:08:57PM +0400, Stanislav Kinsbursky wrote:
>>>>>>> 09.04.2012 19:27, Jeff Layton пишет:
>>>>>>>>
>>>>>>>> If you allow one container to hand out conflicting locks while another
>>>>>>>> container is allowing reclaims, then you can end up with some very
>>>>>>>> difficult to debug silent data corruption. That's the worst possible
>>>>>>>> outcome, IMO. We really need to actively keep people from shooting
>>>>>>>> themselves in the foot here.
>>>>>>>>
>>>>>>>> One possibility might be to only allow filesystems to be exported from
>>>>>>>> a single container at a time (and allow that to be overridable somehow
>>>>>>>> once we have a working active/active serving solution). With that, you
>>>>>>>> may be able limp along with a per-container grace period handling
>>>>>>>> scheme like you're proposing.
>>>>>>>>
>>>>>>>
>>>>>>> Ok then. Keeping people from shooting themselves here sounds reasonable.
>>>>>>> And I like the idea of exporting a filesystem only from once per
>>>>>>> network namespace.
>>>>>>
>>>>>> Unfortunately that's not going to get us very far, especially not in the
>>>>>> v4 case where we've got the common read-only pseudoroot that everyone
>>>>>> has to share.
>>>>>
>>>>> I don't see how that can work in cases where each container has its own
>>>>> private mount namespace. You're going to have to tie that pseudoroot to
>>>>> the mount namespace somehow.
>>>>
>>>> Sure, but in typical cases it'll still be shared; requiring that they
>>>> not be sounds like a severe limitation.
>>>
>>> I'd expect the typical case to be the non-shared namespace: the whole
>>> point of containers is to provide for complete isolation of processes.
>>> Usually that implies that you don't want them to be able to communicate
>>> via a shared filesystem.
>>>
>>
>> BTW, we DO use one mount namespace for all containers and host in
>> OpenVZ. This allows us to have an access to containers mount points
>> from initial environment. Isolation between containers is done via
>> chroot and some simple tricks on /proc/mounts read operation.
>> Moreover, with one mount namespace, we currently support
>> bind-mounting on NFS from one container into another...
>>
>> Anyway, I'm sorry, but I'm not familiar with this pseudoroot idea.
>
> Since NFSv4 doesn't have a separate MOUNT protocol, clients need to be
> able to do readdir's and lookups to get to exported filesystems.  We
> support this in the Linux server by exporting all the filesystems from
> "/" on down that must be traversed to reach a given filesystem.  These
> exports are very restricted (e.g. only parents of exports are visible).
>

Ok, thanks for explanation.
So, this pseudoroot looks like a part of NFS server internal implementation, but 
not a part of a standard. That's good.

>> Why does it prevents implementing of check for "superblock-network
>> namespace" pair on NFS server start and forbid (?) it in case of
>> this pair is shared already in other namespace? I.e. maybe this
>> pseudoroot can be an exclusion from this rule?
>
> That might work.  It's read-only and consists only of directories, so
> the grace period doesn't affect it.
>

I've just realized, that this per-sb grace period won't work.
I.e., it's a valid situation, when two or more containers located on the same 
filesystem, but shares different parts of it. And there is not conflict here at all.
I don't see any clear and simple way how to handle such races, because otherwise 
we have to tie network namespace and filesystem namespace.
I.e. there will be required some way to define, was passed export directory 
shared already somewhere else or not.

Realistic solution - since export check should be done in initial file system 
environment (most probably container will have it's own root), then we to pass 
this data to some kernel thread/userspace daemon in initial file system 
environment somehow (sockets doesn't suits here... Shared memory?).

Improbable solution - patching VFS layer...

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 11:24   ` Grace period Stanislav Kinsbursky
  2012-04-09 13:47     ` Jeff Layton
@ 2012-04-09 23:26     ` bfields
  2012-04-10 11:29       ` Stanislav Kinsbursky
  1 sibling, 1 reply; 44+ messages in thread
From: bfields @ 2012-04-09 23:26 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: Trond.Myklebust, linux-nfs, linux-kernel

On Mon, Apr 09, 2012 at 03:24:19PM +0400, Stanislav Kinsbursky wrote:
> 07.04.2012 03:40, bfields@fieldses.org пишет:
> >On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
> >>Hello, Bruce.
> >>Could you, please, clarify this reason why grace list is used?
> >>I.e. why list is used instead of some atomic variable, for example?
> >
> >Like just a reference count?  Yeah, that would be OK.
> >
> >In theory it could provide some sort of debugging help.  (E.g. we could
> >print out the list of "lock managers" currently keeping us in grace.)  I
> >had some idea we'd make those lock manager objects more complicated, and
> >might have more for individual containerized services.
> 
> Could you share this idea, please?
> 
> Anyway, I have nothing against lists. Just was curious, why it was used.
> I added Trond and lists to this reply.
> 
> Let me explain, what is the problem with grace period I'm facing
> right know, and what I'm thinking about it.
> So, one of the things to be containerized during "NFSd per net ns"
> work is the grace period, and these are the basic components of it:
> 1) Grace period start.
> 2) Grace period end.
> 3) Grace period check.
> 3) Grace period restart.

For restart, you're thinking of the fs/lockd/svc.c:restart_grace()
that's called on aisngal in lockd()?

I wonder if there's any way to figure out if that's actually used by
anyone?  (E.g. by any distro init scripts).  It strikes me as possibly
impossible to use correctly.  Perhaps we could deprecate it....

> So, the simplest straight-forward way is to make all internal stuff:
> "grace_list", "grace_lock", "grace_period_end" work and both
> "lockd_manager" and "nfsd4_manager" - per network namespace. Also,
> "laundromat_work" have to be per-net as well.
> In this case:
> 1) Start - grace period can be started per net ns in
> "lockd_up_net()" (thus has to be moves there from "lockd()") and
> "nfs4_state_start()".
> 2) End - grace period can be ended per net ns in "lockd_down_net()"
> (thus has to be moved there from "lockd()"), "nfsd4_end_grace()" and
> "fs4_state_shutdown()".
> 3) Check - looks easy. There is either svc_rqst or net context can
> be passed to function.
> 4) Restart - this is a tricky place. It would be great to restart
> grace period only for the networks namespace of the sender of the
> kill signal. So, the idea is to check siginfo_t for the pid of
> sender, then try to locate the task, and if found, then get sender's
> networks namespace, and restart grace period only for this namespace
> (of course, if lockd was started for this namespace - see below).

If it's really the signalling that's the problem--perhaps we can get
away from the signal-based interface.

At least in the case of lockd I suspect we could.

Or perhaps the decision to share a single lockd thread (or set of nsfd
threads) among multiple network namespaces was a poor one.  But I
realize multithreading lockd doesn't look easy.

--b.

> If task not found, of it's lockd wasn't started for it's namespace,
> then grace period can be either restarted for all namespaces, of
> just silently dropped. This is the place where I'm not sure, how to
> do. Because calling grace period for all namespaces will be
> overkill...
> 
> There also another problem with the "task by pid" search, that found
> task can be actually not sender (which died already), but some other
> new task with the same pid number. In this case, I think, we can
> just neglect this probability and always assume, that we located
> sender (if, of course, lockd was started for sender's network
> namespace).
> 
> Trond, Bruce, could you, please, comment this ideas?
> 
> -- 
> Best regards,
> Stanislav Kinsbursky
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 16:56                     ` Stanislav Kinsbursky
@ 2012-04-09 18:11                       ` bfields
  2012-04-10 10:56                         ` Stanislav Kinsbursky
  0 siblings, 1 reply; 44+ messages in thread
From: bfields @ 2012-04-09 18:11 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: Myklebust, Trond, Jeff Layton, linux-nfs, linux-kernel

On Mon, Apr 09, 2012 at 08:56:47PM +0400, Stanislav Kinsbursky wrote:
> 09.04.2012 20:33, Myklebust, Trond пишет:
> >On Mon, 2012-04-09 at 12:21 -0400, bfields@fieldses.org wrote:
> >>On Mon, Apr 09, 2012 at 04:17:06PM +0000, Myklebust, Trond wrote:
> >>>On Mon, 2012-04-09 at 12:11 -0400, bfields@fieldses.org wrote:
> >>>>On Mon, Apr 09, 2012 at 08:08:57PM +0400, Stanislav Kinsbursky wrote:
> >>>>>09.04.2012 19:27, Jeff Layton пишет:
> >>>>>>
> >>>>>>If you allow one container to hand out conflicting locks while another
> >>>>>>container is allowing reclaims, then you can end up with some very
> >>>>>>difficult to debug silent data corruption. That's the worst possible
> >>>>>>outcome, IMO. We really need to actively keep people from shooting
> >>>>>>themselves in the foot here.
> >>>>>>
> >>>>>>One possibility might be to only allow filesystems to be exported from
> >>>>>>a single container at a time (and allow that to be overridable somehow
> >>>>>>once we have a working active/active serving solution). With that, you
> >>>>>>may be able limp along with a per-container grace period handling
> >>>>>>scheme like you're proposing.
> >>>>>>
> >>>>>
> >>>>>Ok then. Keeping people from shooting themselves here sounds reasonable.
> >>>>>And I like the idea of exporting a filesystem only from once per
> >>>>>network namespace.
> >>>>
> >>>>Unfortunately that's not going to get us very far, especially not in the
> >>>>v4 case where we've got the common read-only pseudoroot that everyone
> >>>>has to share.
> >>>
> >>>I don't see how that can work in cases where each container has its own
> >>>private mount namespace. You're going to have to tie that pseudoroot to
> >>>the mount namespace somehow.
> >>
> >>Sure, but in typical cases it'll still be shared; requiring that they
> >>not be sounds like a severe limitation.
> >
> >I'd expect the typical case to be the non-shared namespace: the whole
> >point of containers is to provide for complete isolation of processes.
> >Usually that implies that you don't want them to be able to communicate
> >via a shared filesystem.
> >
> 
> BTW, we DO use one mount namespace for all containers and host in
> OpenVZ. This allows us to have an access to containers mount points
> from initial environment. Isolation between containers is done via
> chroot and some simple tricks on /proc/mounts read operation.
> Moreover, with one mount namespace, we currently support
> bind-mounting on NFS from one container into another...
> 
> Anyway, I'm sorry, but I'm not familiar with this pseudoroot idea.

Since NFSv4 doesn't have a separate MOUNT protocol, clients need to be
able to do readdir's and lookups to get to exported filesystems.  We
support this in the Linux server by exporting all the filesystems from
"/" on down that must be traversed to reach a given filesystem.  These
exports are very restricted (e.g. only parents of exports are visible).

> Why does it prevents implementing of check for "superblock-network
> namespace" pair on NFS server start and forbid (?) it in case of
> this pair is shared already in other namespace? I.e. maybe this
> pseudoroot can be an exclusion from this rule?

That might work.  It's read-only and consists only of directories, so
the grace period doesn't affect it.

--b.

> Or I'm just missing the point at all?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 16:33                     ` Myklebust, Trond
  (?)
  (?)
@ 2012-04-09 16:56                     ` Stanislav Kinsbursky
  2012-04-09 18:11                       ` bfields
  -1 siblings, 1 reply; 44+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-09 16:56 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: bfields, Jeff Layton, linux-nfs, linux-kernel

09.04.2012 20:33, Myklebust, Trond пишет:
> On Mon, 2012-04-09 at 12:21 -0400, bfields@fieldses.org wrote:
>> On Mon, Apr 09, 2012 at 04:17:06PM +0000, Myklebust, Trond wrote:
>>> On Mon, 2012-04-09 at 12:11 -0400, bfields@fieldses.org wrote:
>>>> On Mon, Apr 09, 2012 at 08:08:57PM +0400, Stanislav Kinsbursky wrote:
>>>>> 09.04.2012 19:27, Jeff Layton пишет:
>>>>>>
>>>>>> If you allow one container to hand out conflicting locks while another
>>>>>> container is allowing reclaims, then you can end up with some very
>>>>>> difficult to debug silent data corruption. That's the worst possible
>>>>>> outcome, IMO. We really need to actively keep people from shooting
>>>>>> themselves in the foot here.
>>>>>>
>>>>>> One possibility might be to only allow filesystems to be exported from
>>>>>> a single container at a time (and allow that to be overridable somehow
>>>>>> once we have a working active/active serving solution). With that, you
>>>>>> may be able limp along with a per-container grace period handling
>>>>>> scheme like you're proposing.
>>>>>>
>>>>>
>>>>> Ok then. Keeping people from shooting themselves here sounds reasonable.
>>>>> And I like the idea of exporting a filesystem only from once per
>>>>> network namespace.
>>>>
>>>> Unfortunately that's not going to get us very far, especially not in the
>>>> v4 case where we've got the common read-only pseudoroot that everyone
>>>> has to share.
>>>
>>> I don't see how that can work in cases where each container has its own
>>> private mount namespace. You're going to have to tie that pseudoroot to
>>> the mount namespace somehow.
>>
>> Sure, but in typical cases it'll still be shared; requiring that they
>> not be sounds like a severe limitation.
>
> I'd expect the typical case to be the non-shared namespace: the whole
> point of containers is to provide for complete isolation of processes.
> Usually that implies that you don't want them to be able to communicate
> via a shared filesystem.
>

BTW, we DO use one mount namespace for all containers and host in OpenVZ. This 
allows us to have an access to containers mount points from initial environment. 
Isolation between containers is done via chroot and some simple tricks on 
/proc/mounts read operation.
Moreover, with one mount namespace, we currently support bind-mounting on NFS 
from one container into another...

Anyway, I'm sorry, but I'm not familiar with this pseudoroot idea.
Why does it prevents implementing of check for "superblock-network namespace" 
pair on NFS server start and forbid (?) it in case of this pair is shared 
already in other namespace? I.e. maybe this pseudoroot can be an exclusion from 
this rule?
Or I'm just missing the point at all?

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 16:33                     ` Myklebust, Trond
  (?)
@ 2012-04-09 16:39                     ` bfields
  -1 siblings, 0 replies; 44+ messages in thread
From: bfields @ 2012-04-09 16:39 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Stanislav Kinsbursky, Jeff Layton, linux-nfs, linux-kernel

On Mon, Apr 09, 2012 at 04:33:36PM +0000, Myklebust, Trond wrote:
> On Mon, 2012-04-09 at 12:21 -0400, bfields@fieldses.org wrote:
> > On Mon, Apr 09, 2012 at 04:17:06PM +0000, Myklebust, Trond wrote:
> > > On Mon, 2012-04-09 at 12:11 -0400, bfields@fieldses.org wrote:
> > > > On Mon, Apr 09, 2012 at 08:08:57PM +0400, Stanislav Kinsbursky wrote:
> > > > > 09.04.2012 19:27, Jeff Layton пишет:
> > > > > >
> > > > > >If you allow one container to hand out conflicting locks while another
> > > > > >container is allowing reclaims, then you can end up with some very
> > > > > >difficult to debug silent data corruption. That's the worst possible
> > > > > >outcome, IMO. We really need to actively keep people from shooting
> > > > > >themselves in the foot here.
> > > > > >
> > > > > >One possibility might be to only allow filesystems to be exported from
> > > > > >a single container at a time (and allow that to be overridable somehow
> > > > > >once we have a working active/active serving solution). With that, you
> > > > > >may be able limp along with a per-container grace period handling
> > > > > >scheme like you're proposing.
> > > > > >
> > > > > 
> > > > > Ok then. Keeping people from shooting themselves here sounds reasonable.
> > > > > And I like the idea of exporting a filesystem only from once per
> > > > > network namespace.
> > > > 
> > > > Unfortunately that's not going to get us very far, especially not in the
> > > > v4 case where we've got the common read-only pseudoroot that everyone
> > > > has to share.
> > > 
> > > I don't see how that can work in cases where each container has its own
> > > private mount namespace. You're going to have to tie that pseudoroot to
> > > the mount namespace somehow.
> > 
> > Sure, but in typical cases it'll still be shared; requiring that they
> > not be sounds like a severe limitation.
> 
> I'd expect the typical case to be the non-shared namespace: the whole
> point of containers is to provide for complete isolation of processes.
> Usually that implies that you don't want them to be able to communicate
> via a shared filesystem.

If it's just a file server, then you may want to be able to bring up and
down service on individual server ip's individually, and possibly
advertise different exports; but requiring complete isolation to do that
seems like overkill.

--b.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 16:21                 ` bfields
@ 2012-04-09 16:33                     ` Myklebust, Trond
  0 siblings, 0 replies; 44+ messages in thread
From: Myklebust, Trond @ 2012-04-09 16:33 UTC (permalink / raw)
  To: bfields; +Cc: Stanislav Kinsbursky, Jeff Layton, linux-nfs, linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2234 bytes --]

On Mon, 2012-04-09 at 12:21 -0400, bfields@fieldses.org wrote:
> On Mon, Apr 09, 2012 at 04:17:06PM +0000, Myklebust, Trond wrote:
> > On Mon, 2012-04-09 at 12:11 -0400, bfields@fieldses.org wrote:
> > > On Mon, Apr 09, 2012 at 08:08:57PM +0400, Stanislav Kinsbursky wrote:
> > > > 09.04.2012 19:27, Jeff Layton Ð¿Ð¸ÑˆÐµÑ‚:
> > > > >
> > > > >If you allow one container to hand out conflicting locks while another
> > > > >container is allowing reclaims, then you can end up with some very
> > > > >difficult to debug silent data corruption. That's the worst possible
> > > > >outcome, IMO. We really need to actively keep people from shooting
> > > > >themselves in the foot here.
> > > > >
> > > > >One possibility might be to only allow filesystems to be exported from
> > > > >a single container at a time (and allow that to be overridable somehow
> > > > >once we have a working active/active serving solution). With that, you
> > > > >may be able limp along with a per-container grace period handling
> > > > >scheme like you're proposing.
> > > > >
> > > > 
> > > > Ok then. Keeping people from shooting themselves here sounds reasonable.
> > > > And I like the idea of exporting a filesystem only from once per
> > > > network namespace.
> > > 
> > > Unfortunately that's not going to get us very far, especially not in the
> > > v4 case where we've got the common read-only pseudoroot that everyone
> > > has to share.
> > 
> > I don't see how that can work in cases where each container has its own
> > private mount namespace. You're going to have to tie that pseudoroot to
> > the mount namespace somehow.
> 
> Sure, but in typical cases it'll still be shared; requiring that they
> not be sounds like a severe limitation.

I'd expect the typical case to be the non-shared namespace: the whole
point of containers is to provide for complete isolation of processes.
Usually that implies that you don't want them to be able to communicate
via a shared filesystem.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

ÿôèº{.nÇ+‰·Ÿ®‰†+%ŠËÿ±éÝ¶\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dÊ‡Ú™ë,j\a¢f£¢·hšïêÿ‘êçz_è®\x03(éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨èÚ&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
@ 2012-04-09 16:33                     ` Myklebust, Trond
  0 siblings, 0 replies; 44+ messages in thread
From: Myklebust, Trond @ 2012-04-09 16:33 UTC (permalink / raw)
  To: bfields; +Cc: Stanislav Kinsbursky, Jeff Layton, linux-nfs, linux-kernel

T24gTW9uLCAyMDEyLTA0LTA5IGF0IDEyOjIxIC0wNDAwLCBiZmllbGRzQGZpZWxkc2VzLm9yZyB3
cm90ZToNCj4gT24gTW9uLCBBcHIgMDksIDIwMTIgYXQgMDQ6MTc6MDZQTSArMDAwMCwgTXlrbGVi
dXN0LCBUcm9uZCB3cm90ZToNCj4gPiBPbiBNb24sIDIwMTItMDQtMDkgYXQgMTI6MTEgLTA0MDAs
IGJmaWVsZHNAZmllbGRzZXMub3JnIHdyb3RlOg0KPiA+ID4gT24gTW9uLCBBcHIgMDksIDIwMTIg
YXQgMDg6MDg6NTdQTSArMDQwMCwgU3RhbmlzbGF2IEtpbnNidXJza3kgd3JvdGU6DQo+ID4gPiA+
IDA5LjA0LjIwMTIgMTk6MjcsIEplZmYgTGF5dG9uINC/0LjRiNC10YI6DQo+ID4gPiA+ID4NCj4g
PiA+ID4gPklmIHlvdSBhbGxvdyBvbmUgY29udGFpbmVyIHRvIGhhbmQgb3V0IGNvbmZsaWN0aW5n
IGxvY2tzIHdoaWxlIGFub3RoZXINCj4gPiA+ID4gPmNvbnRhaW5lciBpcyBhbGxvd2luZyByZWNs
YWltcywgdGhlbiB5b3UgY2FuIGVuZCB1cCB3aXRoIHNvbWUgdmVyeQ0KPiA+ID4gPiA+ZGlmZmlj
dWx0IHRvIGRlYnVnIHNpbGVudCBkYXRhIGNvcnJ1cHRpb24uIFRoYXQncyB0aGUgd29yc3QgcG9z
c2libGUNCj4gPiA+ID4gPm91dGNvbWUsIElNTy4gV2UgcmVhbGx5IG5lZWQgdG8gYWN0aXZlbHkg
a2VlcCBwZW9wbGUgZnJvbSBzaG9vdGluZw0KPiA+ID4gPiA+dGhlbXNlbHZlcyBpbiB0aGUgZm9v
dCBoZXJlLg0KPiA+ID4gPiA+DQo+ID4gPiA+ID5PbmUgcG9zc2liaWxpdHkgbWlnaHQgYmUgdG8g
b25seSBhbGxvdyBmaWxlc3lzdGVtcyB0byBiZSBleHBvcnRlZCBmcm9tDQo+ID4gPiA+ID5hIHNp
bmdsZSBjb250YWluZXIgYXQgYSB0aW1lIChhbmQgYWxsb3cgdGhhdCB0byBiZSBvdmVycmlkYWJs
ZSBzb21laG93DQo+ID4gPiA+ID5vbmNlIHdlIGhhdmUgYSB3b3JraW5nIGFjdGl2ZS9hY3RpdmUg
c2VydmluZyBzb2x1dGlvbikuIFdpdGggdGhhdCwgeW91DQo+ID4gPiA+ID5tYXkgYmUgYWJsZSBs
aW1wIGFsb25nIHdpdGggYSBwZXItY29udGFpbmVyIGdyYWNlIHBlcmlvZCBoYW5kbGluZw0KPiA+
ID4gPiA+c2NoZW1lIGxpa2UgeW91J3JlIHByb3Bvc2luZy4NCj4gPiA+ID4gPg0KPiA+ID4gPiAN
Cj4gPiA+ID4gT2sgdGhlbi4gS2VlcGluZyBwZW9wbGUgZnJvbSBzaG9vdGluZyB0aGVtc2VsdmVz
IGhlcmUgc291bmRzIHJlYXNvbmFibGUuDQo+ID4gPiA+IEFuZCBJIGxpa2UgdGhlIGlkZWEgb2Yg
ZXhwb3J0aW5nIGEgZmlsZXN5c3RlbSBvbmx5IGZyb20gb25jZSBwZXINCj4gPiA+ID4gbmV0d29y
ayBuYW1lc3BhY2UuDQo+ID4gPiANCj4gPiA+IFVuZm9ydHVuYXRlbHkgdGhhdCdzIG5vdCBnb2lu
ZyB0byBnZXQgdXMgdmVyeSBmYXIsIGVzcGVjaWFsbHkgbm90IGluIHRoZQ0KPiA+ID4gdjQgY2Fz
ZSB3aGVyZSB3ZSd2ZSBnb3QgdGhlIGNvbW1vbiByZWFkLW9ubHkgcHNldWRvcm9vdCB0aGF0IGV2
ZXJ5b25lDQo+ID4gPiBoYXMgdG8gc2hhcmUuDQo+ID4gDQo+ID4gSSBkb24ndCBzZWUgaG93IHRo
YXQgY2FuIHdvcmsgaW4gY2FzZXMgd2hlcmUgZWFjaCBjb250YWluZXIgaGFzIGl0cyBvd24NCj4g
PiBwcml2YXRlIG1vdW50IG5hbWVzcGFjZS4gWW91J3JlIGdvaW5nIHRvIGhhdmUgdG8gdGllIHRo
YXQgcHNldWRvcm9vdCB0bw0KPiA+IHRoZSBtb3VudCBuYW1lc3BhY2Ugc29tZWhvdy4NCj4gDQo+
IFN1cmUsIGJ1dCBpbiB0eXBpY2FsIGNhc2VzIGl0J2xsIHN0aWxsIGJlIHNoYXJlZDsgcmVxdWly
aW5nIHRoYXQgdGhleQ0KPiBub3QgYmUgc291bmRzIGxpa2UgYSBzZXZlcmUgbGltaXRhdGlvbi4N
Cg0KSSdkIGV4cGVjdCB0aGUgdHlwaWNhbCBjYXNlIHRvIGJlIHRoZSBub24tc2hhcmVkIG5hbWVz
cGFjZTogdGhlIHdob2xlDQpwb2ludCBvZiBjb250YWluZXJzIGlzIHRvIHByb3ZpZGUgZm9yIGNv
bXBsZXRlIGlzb2xhdGlvbiBvZiBwcm9jZXNzZXMuDQpVc3VhbGx5IHRoYXQgaW1wbGllcyB0aGF0
IHlvdSBkb24ndCB3YW50IHRoZW0gdG8gYmUgYWJsZSB0byBjb21tdW5pY2F0ZQ0KdmlhIGEgc2hh
cmVkIGZpbGVzeXN0ZW0uDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50
IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0Kd3d3Lm5l
dGFwcC5jb20NCg0K

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 16:17                 ` Myklebust, Trond
  (?)
@ 2012-04-09 16:21                 ` bfields
  2012-04-09 16:33                     ` Myklebust, Trond
  -1 siblings, 1 reply; 44+ messages in thread
From: bfields @ 2012-04-09 16:21 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Stanislav Kinsbursky, Jeff Layton, linux-nfs, linux-kernel

On Mon, Apr 09, 2012 at 04:17:06PM +0000, Myklebust, Trond wrote:
> On Mon, 2012-04-09 at 12:11 -0400, bfields@fieldses.org wrote:
> > On Mon, Apr 09, 2012 at 08:08:57PM +0400, Stanislav Kinsbursky wrote:
> > > 09.04.2012 19:27, Jeff Layton пишет:
> > > >
> > > >If you allow one container to hand out conflicting locks while another
> > > >container is allowing reclaims, then you can end up with some very
> > > >difficult to debug silent data corruption. That's the worst possible
> > > >outcome, IMO. We really need to actively keep people from shooting
> > > >themselves in the foot here.
> > > >
> > > >One possibility might be to only allow filesystems to be exported from
> > > >a single container at a time (and allow that to be overridable somehow
> > > >once we have a working active/active serving solution). With that, you
> > > >may be able limp along with a per-container grace period handling
> > > >scheme like you're proposing.
> > > >
> > > 
> > > Ok then. Keeping people from shooting themselves here sounds reasonable.
> > > And I like the idea of exporting a filesystem only from once per
> > > network namespace.
> > 
> > Unfortunately that's not going to get us very far, especially not in the
> > v4 case where we've got the common read-only pseudoroot that everyone
> > has to share.
> 
> I don't see how that can work in cases where each container has its own
> private mount namespace. You're going to have to tie that pseudoroot to
> the mount namespace somehow.

Sure, but in typical cases it'll still be shared; requiring that they
not be sounds like a severe limitation.

--b.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 16:11             ` bfields
@ 2012-04-09 16:17                 ` Myklebust, Trond
  0 siblings, 0 replies; 44+ messages in thread
From: Myklebust, Trond @ 2012-04-09 16:17 UTC (permalink / raw)
  To: bfields; +Cc: Stanislav Kinsbursky, Jeff Layton, linux-nfs, linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1635 bytes --]

On Mon, 2012-04-09 at 12:11 -0400, bfields@fieldses.org wrote:
> On Mon, Apr 09, 2012 at 08:08:57PM +0400, Stanislav Kinsbursky wrote:
> > 09.04.2012 19:27, Jeff Layton Ð¿Ð¸ÑˆÐµÑ‚:
> > >
> > >If you allow one container to hand out conflicting locks while another
> > >container is allowing reclaims, then you can end up with some very
> > >difficult to debug silent data corruption. That's the worst possible
> > >outcome, IMO. We really need to actively keep people from shooting
> > >themselves in the foot here.
> > >
> > >One possibility might be to only allow filesystems to be exported from
> > >a single container at a time (and allow that to be overridable somehow
> > >once we have a working active/active serving solution). With that, you
> > >may be able limp along with a per-container grace period handling
> > >scheme like you're proposing.
> > >
> > 
> > Ok then. Keeping people from shooting themselves here sounds reasonable.
> > And I like the idea of exporting a filesystem only from once per
> > network namespace.
> 
> Unfortunately that's not going to get us very far, especially not in the
> v4 case where we've got the common read-only pseudoroot that everyone
> has to share.

I don't see how that can work in cases where each container has its own
private mount namespace. You're going to have to tie that pseudoroot to
the mount namespace somehow.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

ÿôèº{.nÇ+‰·Ÿ®‰†+%ŠËÿ±éÝ¶\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dÊ‡Ú™ë,j\a¢f£¢·hšïêÿ‘êçz_è®\x03(éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨èÚ&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
@ 2012-04-09 16:17                 ` Myklebust, Trond
  0 siblings, 0 replies; 44+ messages in thread
From: Myklebust, Trond @ 2012-04-09 16:17 UTC (permalink / raw)
  To: bfields; +Cc: Stanislav Kinsbursky, Jeff Layton, linux-nfs, linux-kernel

T24gTW9uLCAyMDEyLTA0LTA5IGF0IDEyOjExIC0wNDAwLCBiZmllbGRzQGZpZWxkc2VzLm9yZyB3
cm90ZToNCj4gT24gTW9uLCBBcHIgMDksIDIwMTIgYXQgMDg6MDg6NTdQTSArMDQwMCwgU3Rhbmlz
bGF2IEtpbnNidXJza3kgd3JvdGU6DQo+ID4gMDkuMDQuMjAxMiAxOToyNywgSmVmZiBMYXl0b24g
0L/QuNGI0LXRgjoNCj4gPiA+DQo+ID4gPklmIHlvdSBhbGxvdyBvbmUgY29udGFpbmVyIHRvIGhh
bmQgb3V0IGNvbmZsaWN0aW5nIGxvY2tzIHdoaWxlIGFub3RoZXINCj4gPiA+Y29udGFpbmVyIGlz
IGFsbG93aW5nIHJlY2xhaW1zLCB0aGVuIHlvdSBjYW4gZW5kIHVwIHdpdGggc29tZSB2ZXJ5DQo+
ID4gPmRpZmZpY3VsdCB0byBkZWJ1ZyBzaWxlbnQgZGF0YSBjb3JydXB0aW9uLiBUaGF0J3MgdGhl
IHdvcnN0IHBvc3NpYmxlDQo+ID4gPm91dGNvbWUsIElNTy4gV2UgcmVhbGx5IG5lZWQgdG8gYWN0
aXZlbHkga2VlcCBwZW9wbGUgZnJvbSBzaG9vdGluZw0KPiA+ID50aGVtc2VsdmVzIGluIHRoZSBm
b290IGhlcmUuDQo+ID4gPg0KPiA+ID5PbmUgcG9zc2liaWxpdHkgbWlnaHQgYmUgdG8gb25seSBh
bGxvdyBmaWxlc3lzdGVtcyB0byBiZSBleHBvcnRlZCBmcm9tDQo+ID4gPmEgc2luZ2xlIGNvbnRh
aW5lciBhdCBhIHRpbWUgKGFuZCBhbGxvdyB0aGF0IHRvIGJlIG92ZXJyaWRhYmxlIHNvbWVob3cN
Cj4gPiA+b25jZSB3ZSBoYXZlIGEgd29ya2luZyBhY3RpdmUvYWN0aXZlIHNlcnZpbmcgc29sdXRp
b24pLiBXaXRoIHRoYXQsIHlvdQ0KPiA+ID5tYXkgYmUgYWJsZSBsaW1wIGFsb25nIHdpdGggYSBw
ZXItY29udGFpbmVyIGdyYWNlIHBlcmlvZCBoYW5kbGluZw0KPiA+ID5zY2hlbWUgbGlrZSB5b3Un
cmUgcHJvcG9zaW5nLg0KPiA+ID4NCj4gPiANCj4gPiBPayB0aGVuLiBLZWVwaW5nIHBlb3BsZSBm
cm9tIHNob290aW5nIHRoZW1zZWx2ZXMgaGVyZSBzb3VuZHMgcmVhc29uYWJsZS4NCj4gPiBBbmQg
SSBsaWtlIHRoZSBpZGVhIG9mIGV4cG9ydGluZyBhIGZpbGVzeXN0ZW0gb25seSBmcm9tIG9uY2Ug
cGVyDQo+ID4gbmV0d29yayBuYW1lc3BhY2UuDQo+IA0KPiBVbmZvcnR1bmF0ZWx5IHRoYXQncyBu
b3QgZ29pbmcgdG8gZ2V0IHVzIHZlcnkgZmFyLCBlc3BlY2lhbGx5IG5vdCBpbiB0aGUNCj4gdjQg
Y2FzZSB3aGVyZSB3ZSd2ZSBnb3QgdGhlIGNvbW1vbiByZWFkLW9ubHkgcHNldWRvcm9vdCB0aGF0
IGV2ZXJ5b25lDQo+IGhhcyB0byBzaGFyZS4NCg0KSSBkb24ndCBzZWUgaG93IHRoYXQgY2FuIHdv
cmsgaW4gY2FzZXMgd2hlcmUgZWFjaCBjb250YWluZXIgaGFzIGl0cyBvd24NCnByaXZhdGUgbW91
bnQgbmFtZXNwYWNlLiBZb3UncmUgZ29pbmcgdG8gaGF2ZSB0byB0aWUgdGhhdCBwc2V1ZG9yb290
IHRvDQp0aGUgbW91bnQgbmFtZXNwYWNlIHNvbWVob3cuDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0
DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RA
bmV0YXBwLmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg0K

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 16:08           ` Stanislav Kinsbursky
@ 2012-04-09 16:11             ` bfields
  2012-04-09 16:17                 ` Myklebust, Trond
  0 siblings, 1 reply; 44+ messages in thread
From: bfields @ 2012-04-09 16:11 UTC (permalink / raw)
  To: Stanislav Kinsbursky
  Cc: Jeff Layton, Trond.Myklebust, linux-nfs, linux-kernel

On Mon, Apr 09, 2012 at 08:08:57PM +0400, Stanislav Kinsbursky wrote:
> 09.04.2012 19:27, Jeff Layton пишет:
> >
> >If you allow one container to hand out conflicting locks while another
> >container is allowing reclaims, then you can end up with some very
> >difficult to debug silent data corruption. That's the worst possible
> >outcome, IMO. We really need to actively keep people from shooting
> >themselves in the foot here.
> >
> >One possibility might be to only allow filesystems to be exported from
> >a single container at a time (and allow that to be overridable somehow
> >once we have a working active/active serving solution). With that, you
> >may be able limp along with a per-container grace period handling
> >scheme like you're proposing.
> >
> 
> Ok then. Keeping people from shooting themselves here sounds reasonable.
> And I like the idea of exporting a filesystem only from once per
> network namespace.

Unfortunately that's not going to get us very far, especially not in the
v4 case where we've got the common read-only pseudoroot that everyone
has to share.

--b.

> Looks like there should be a list of pairs
> "exported superblock - network namespace". And if superblock is
> exported already in other namespace, then export in new namespace
> have to be skipped (replaced?) with appropriate warning (error?)
> message shown in log.
> Or maybe we even should deny starting of NFS server if one of it's
> exports is shared already by other NFS server "instance"?
> But any of these ideas would be easy to implement in RAM, and thus
> it suits only for containers...
> 
> -- 
> Best regards,
> Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 15:27         ` Jeff Layton
@ 2012-04-09 16:08           ` Stanislav Kinsbursky
  2012-04-09 16:11             ` bfields
  0 siblings, 1 reply; 44+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-09 16:08 UTC (permalink / raw)
  To: Jeff Layton; +Cc: bfields, Trond.Myklebust, linux-nfs, linux-kernel

09.04.2012 19:27, Jeff Layton пишет:
>
> If you allow one container to hand out conflicting locks while another
> container is allowing reclaims, then you can end up with some very
> difficult to debug silent data corruption. That's the worst possible
> outcome, IMO. We really need to actively keep people from shooting
> themselves in the foot here.
>
> One possibility might be to only allow filesystems to be exported from
> a single container at a time (and allow that to be overridable somehow
> once we have a working active/active serving solution). With that, you
> may be able limp along with a per-container grace period handling
> scheme like you're proposing.
>

Ok then. Keeping people from shooting themselves here sounds reasonable.
And I like the idea of exporting a filesystem only from once per network 
namespace. Looks like there should be a list of pairs "exported superblock - 
network namespace". And if superblock is exported already in other namespace, 
then export in new namespace have to be skipped (replaced?) with appropriate 
warning (error?) message shown in log.
Or maybe we even should deny starting of NFS server if one of it's exports is 
shared already by other NFS server "instance"?
But any of these ideas would be easy to implement in RAM, and thus it suits only 
for containers...

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 14:25       ` Stanislav Kinsbursky
@ 2012-04-09 15:27         ` Jeff Layton
  2012-04-09 16:08           ` Stanislav Kinsbursky
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff Layton @ 2012-04-09 15:27 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: bfields, Trond.Myklebust, linux-nfs, linux-kernel

On Mon, 09 Apr 2012 18:25:48 +0400
Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:

> 09.04.2012 17:47, Jeff Layton пишет:
> > On Mon, 09 Apr 2012 15:24:19 +0400
> > Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
> >
> >> 07.04.2012 03:40, bfields@fieldses.org пишет:
> >>> On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
> >>>> Hello, Bruce.
> >>>> Could you, please, clarify this reason why grace list is used?
> >>>> I.e. why list is used instead of some atomic variable, for example?
> >>>
> >>> Like just a reference count?  Yeah, that would be OK.
> >>>
> >>> In theory it could provide some sort of debugging help.  (E.g. we could
> >>> print out the list of "lock managers" currently keeping us in grace.)  I
> >>> had some idea we'd make those lock manager objects more complicated, and
> >>> might have more for individual containerized services.
> >>
> >> Could you share this idea, please?
> >>
> >> Anyway, I have nothing against lists. Just was curious, why it was used.
> >> I added Trond and lists to this reply.
> >>
> >> Let me explain, what is the problem with grace period I'm facing right know, and
> >> what I'm thinking about it.
> >> So, one of the things to be containerized during "NFSd per net ns" work is the
> >> grace period, and these are the basic components of it:
> >> 1) Grace period start.
> >> 2) Grace period end.
> >> 3) Grace period check.
> >> 3) Grace period restart.
> >>
> >> So, the simplest straight-forward way is to make all internal stuff:
> >> "grace_list", "grace_lock", "grace_period_end" work and both "lockd_manager" and
> >> "nfsd4_manager" - per network namespace. Also, "laundromat_work" have to be
> >> per-net as well.
> >> In this case:
> >> 1) Start - grace period can be started per net ns in "lockd_up_net()" (thus has
> >> to be moves there from "lockd()") and "nfs4_state_start()".
> >> 2) End - grace period can be ended per net ns in "lockd_down_net()" (thus has to
> >> be moved there from "lockd()"), "nfsd4_end_grace()" and "fs4_state_shutdown()".
> >> 3) Check - looks easy. There is either svc_rqst or net context can be passed to
> >> function.
> >> 4) Restart - this is a tricky place. It would be great to restart grace period
> >> only for the networks namespace of the sender of the kill signal. So, the idea
> >> is to check siginfo_t for the pid of sender, then try to locate the task, and if
> >> found, then get sender's networks namespace, and restart grace period only for
> >> this namespace (of course, if lockd was started for this namespace - see below).
> >>
> >> If task not found, of it's lockd wasn't started for it's namespace, then grace
> >> period can be either restarted for all namespaces, of just silently dropped.
> >> This is the place where I'm not sure, how to do. Because calling grace period
> >> for all namespaces will be overkill...
> >>
> >> There also another problem with the "task by pid" search, that found task can be
> >> actually not sender (which died already), but some other new task with the same
> >> pid number. In this case, I think, we can just neglect this probability and
> >> always assume, that we located sender (if, of course, lockd was started for
> >> sender's network namespace).
> >>
> >> Trond, Bruce, could you, please, comment this ideas?
> >>
> >
> > I can comment and I'm not sure that will be sufficient.
> >
> 
> Hi, Jeff. Thanks for the comment.
> 
> > The grace period has a particular purpose. It keeps nfsd or lockd from
> > handing out stateful objects (e.g. locks) before clients have an
> > opportunity to reclaim them. Once the grace period expires, there is no
> > more reclaim allowed and "normal" lock and open requests can proceed.
> >
> > Traditionally, there has been one nfsd or lockd "instance" per host.
> > With that, we were able to get away with a relatively simple-minded
> > approach of a global grace period that's gated on nfsd or lockd's
> > startup and shutdown.
> >
> > Now, you're looking at making multiple nfsd or lockd "instances". Does
> > it make sense to make this a per-net thing? Here's a particularly
> > problematic case to illustrate what I mean:
> >
> > Suppose I have a filesystem that's mounted and exported in two
> > different containers. You start up one container and then 60s later,
> > start up the other. The grace period expires in the first container and
> > that nfsd hands out locks that conflict with some that have not been
> > reclaimed yet in the other.
> >
> > Now, we can just try to say "don't export the same fs from more than
> > one container". But we all know that people will do it anyway, since
> > there's nothing that really stops you from doing so.
> >
> 
> Yes, I see. But situation you described is existent already.
> I.e. you can replace containers with the same file system by two nodes, sharing 
> the same distributed file system (like Lustre and GPFS), and you'll experience 
> the same problem in such case.
> 

Yep, which is why we don't support active/active serving from clustered
filesystems (yet). Containers are somewhat similar to a clustered
configuration.

The simple minded grace period handling we have now is really only
suitable for very simple export configurations. The grace period exists
to ensure that filesystem objects are not "oversubscribed" so it makes
some sense to turn it into a per-sb property.

> > What probably makes more sense is making the grace period a per-sb
> > property, and coming up with a set of rules for the fs going into and
> > out of "grace" status.
> >
> > Perhaps a way for different net namespaces to "subscribe" to a
> > particular fs, and don't take the fs out of grace until all of the
> > grace period timers pop? If a fs attempts to subscribe after the fs
> > comes out of grace, then its subscription would be denied and reclaim
> > attempts would get NFS4ERR_NOGRACE or the NLM equivalent.
> >
> 
> This raises another problem. Imagine, that grace period has elapsed for some 
> container and then you start nfsd in another one. New grace period will affect 
> all both of them. And that's even worse from my pow.
> 

If you allow one container to hand out conflicting locks while another
container is allowing reclaims, then you can end up with some very
difficult to debug silent data corruption. That's the worst possible
outcome, IMO. We really need to actively keep people from shooting
themselves in the foot here.

One possibility might be to only allow filesystems to be exported from
a single container at a time (and allow that to be overridable somehow
once we have a working active/active serving solution). With that, you
may be able limp along with a per-container grace period handling
scheme like you're proposing.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 13:47     ` Jeff Layton
@ 2012-04-09 14:25       ` Stanislav Kinsbursky
  2012-04-09 15:27         ` Jeff Layton
  0 siblings, 1 reply; 44+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-09 14:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: bfields, Trond.Myklebust, linux-nfs, linux-kernel

09.04.2012 17:47, Jeff Layton пишет:
> On Mon, 09 Apr 2012 15:24:19 +0400
> Stanislav Kinsbursky<skinsbursky@parallels.com>  wrote:
>
>> 07.04.2012 03:40, bfields@fieldses.org пишет:
>>> On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
>>>> Hello, Bruce.
>>>> Could you, please, clarify this reason why grace list is used?
>>>> I.e. why list is used instead of some atomic variable, for example?
>>>
>>> Like just a reference count?  Yeah, that would be OK.
>>>
>>> In theory it could provide some sort of debugging help.  (E.g. we could
>>> print out the list of "lock managers" currently keeping us in grace.)  I
>>> had some idea we'd make those lock manager objects more complicated, and
>>> might have more for individual containerized services.
>>
>> Could you share this idea, please?
>>
>> Anyway, I have nothing against lists. Just was curious, why it was used.
>> I added Trond and lists to this reply.
>>
>> Let me explain, what is the problem with grace period I'm facing right know, and
>> what I'm thinking about it.
>> So, one of the things to be containerized during "NFSd per net ns" work is the
>> grace period, and these are the basic components of it:
>> 1) Grace period start.
>> 2) Grace period end.
>> 3) Grace period check.
>> 3) Grace period restart.
>>
>> So, the simplest straight-forward way is to make all internal stuff:
>> "grace_list", "grace_lock", "grace_period_end" work and both "lockd_manager" and
>> "nfsd4_manager" - per network namespace. Also, "laundromat_work" have to be
>> per-net as well.
>> In this case:
>> 1) Start - grace period can be started per net ns in "lockd_up_net()" (thus has
>> to be moves there from "lockd()") and "nfs4_state_start()".
>> 2) End - grace period can be ended per net ns in "lockd_down_net()" (thus has to
>> be moved there from "lockd()"), "nfsd4_end_grace()" and "fs4_state_shutdown()".
>> 3) Check - looks easy. There is either svc_rqst or net context can be passed to
>> function.
>> 4) Restart - this is a tricky place. It would be great to restart grace period
>> only for the networks namespace of the sender of the kill signal. So, the idea
>> is to check siginfo_t for the pid of sender, then try to locate the task, and if
>> found, then get sender's networks namespace, and restart grace period only for
>> this namespace (of course, if lockd was started for this namespace - see below).
>>
>> If task not found, of it's lockd wasn't started for it's namespace, then grace
>> period can be either restarted for all namespaces, of just silently dropped.
>> This is the place where I'm not sure, how to do. Because calling grace period
>> for all namespaces will be overkill...
>>
>> There also another problem with the "task by pid" search, that found task can be
>> actually not sender (which died already), but some other new task with the same
>> pid number. In this case, I think, we can just neglect this probability and
>> always assume, that we located sender (if, of course, lockd was started for
>> sender's network namespace).
>>
>> Trond, Bruce, could you, please, comment this ideas?
>>
>
> I can comment and I'm not sure that will be sufficient.
>

Hi, Jeff. Thanks for the comment.

> The grace period has a particular purpose. It keeps nfsd or lockd from
> handing out stateful objects (e.g. locks) before clients have an
> opportunity to reclaim them. Once the grace period expires, there is no
> more reclaim allowed and "normal" lock and open requests can proceed.
>
> Traditionally, there has been one nfsd or lockd "instance" per host.
> With that, we were able to get away with a relatively simple-minded
> approach of a global grace period that's gated on nfsd or lockd's
> startup and shutdown.
>
> Now, you're looking at making multiple nfsd or lockd "instances". Does
> it make sense to make this a per-net thing? Here's a particularly
> problematic case to illustrate what I mean:
>
> Suppose I have a filesystem that's mounted and exported in two
> different containers. You start up one container and then 60s later,
> start up the other. The grace period expires in the first container and
> that nfsd hands out locks that conflict with some that have not been
> reclaimed yet in the other.
>
> Now, we can just try to say "don't export the same fs from more than
> one container". But we all know that people will do it anyway, since
> there's nothing that really stops you from doing so.
>

Yes, I see. But situation you described is existent already.
I.e. you can replace containers with the same file system by two nodes, sharing 
the same distributed file system (like Lustre and GPFS), and you'll experience 
the same problem in such case.

> What probably makes more sense is making the grace period a per-sb
> property, and coming up with a set of rules for the fs going into and
> out of "grace" status.
>
> Perhaps a way for different net namespaces to "subscribe" to a
> particular fs, and don't take the fs out of grace until all of the
> grace period timers pop? If a fs attempts to subscribe after the fs
> comes out of grace, then its subscription would be denied and reclaim
> attempts would get NFS4ERR_NOGRACE or the NLM equivalent.
>

This raises another problem. Imagine, that grace period has elapsed for some 
container and then you start nfsd in another one. New grace period will affect 
all both of them. And that's even worse from my pow.

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
  2012-04-09 11:24   ` Grace period Stanislav Kinsbursky
@ 2012-04-09 13:47     ` Jeff Layton
  2012-04-09 14:25       ` Stanislav Kinsbursky
  2012-04-09 23:26     ` bfields
  1 sibling, 1 reply; 44+ messages in thread
From: Jeff Layton @ 2012-04-09 13:47 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: bfields, Trond.Myklebust, linux-nfs, linux-kernel

On Mon, 09 Apr 2012 15:24:19 +0400
Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:

> 07.04.2012 03:40, bfields@fieldses.org пишет:
> > On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
> >> Hello, Bruce.
> >> Could you, please, clarify this reason why grace list is used?
> >> I.e. why list is used instead of some atomic variable, for example?
> >
> > Like just a reference count?  Yeah, that would be OK.
> >
> > In theory it could provide some sort of debugging help.  (E.g. we could
> > print out the list of "lock managers" currently keeping us in grace.)  I
> > had some idea we'd make those lock manager objects more complicated, and
> > might have more for individual containerized services.
> 
> Could you share this idea, please?
> 
> Anyway, I have nothing against lists. Just was curious, why it was used.
> I added Trond and lists to this reply.
> 
> Let me explain, what is the problem with grace period I'm facing right know, and 
> what I'm thinking about it.
> So, one of the things to be containerized during "NFSd per net ns" work is the 
> grace period, and these are the basic components of it:
> 1) Grace period start.
> 2) Grace period end.
> 3) Grace period check.
> 3) Grace period restart.
> 
> So, the simplest straight-forward way is to make all internal stuff: 
> "grace_list", "grace_lock", "grace_period_end" work and both "lockd_manager" and 
> "nfsd4_manager" - per network namespace. Also, "laundromat_work" have to be 
> per-net as well.
> In this case:
> 1) Start - grace period can be started per net ns in "lockd_up_net()" (thus has 
> to be moves there from "lockd()") and "nfs4_state_start()".
> 2) End - grace period can be ended per net ns in "lockd_down_net()" (thus has to 
> be moved there from "lockd()"), "nfsd4_end_grace()" and "fs4_state_shutdown()".
> 3) Check - looks easy. There is either svc_rqst or net context can be passed to 
> function.
> 4) Restart - this is a tricky place. It would be great to restart grace period 
> only for the networks namespace of the sender of the kill signal. So, the idea 
> is to check siginfo_t for the pid of sender, then try to locate the task, and if 
> found, then get sender's networks namespace, and restart grace period only for 
> this namespace (of course, if lockd was started for this namespace - see below).
> 
> If task not found, of it's lockd wasn't started for it's namespace, then grace 
> period can be either restarted for all namespaces, of just silently dropped. 
> This is the place where I'm not sure, how to do. Because calling grace period 
> for all namespaces will be overkill...
> 
> There also another problem with the "task by pid" search, that found task can be 
> actually not sender (which died already), but some other new task with the same 
> pid number. In this case, I think, we can just neglect this probability and 
> always assume, that we located sender (if, of course, lockd was started for 
> sender's network namespace).
> 
> Trond, Bruce, could you, please, comment this ideas?
> 

I can comment and I'm not sure that will be sufficient.

The grace period has a particular purpose. It keeps nfsd or lockd from
handing out stateful objects (e.g. locks) before clients have an
opportunity to reclaim them. Once the grace period expires, there is no
more reclaim allowed and "normal" lock and open requests can proceed.

Traditionally, there has been one nfsd or lockd "instance" per host.
With that, we were able to get away with a relatively simple-minded
approach of a global grace period that's gated on nfsd or lockd's
startup and shutdown.

Now, you're looking at making multiple nfsd or lockd "instances". Does
it make sense to make this a per-net thing? Here's a particularly
problematic case to illustrate what I mean:

Suppose I have a filesystem that's mounted and exported in two
different containers. You start up one container and then 60s later,
start up the other. The grace period expires in the first container and
that nfsd hands out locks that conflict with some that have not been
reclaimed yet in the other.

Now, we can just try to say "don't export the same fs from more than
one container". But we all know that people will do it anyway, since
there's nothing that really stops you from doing so.

What probably makes more sense is making the grace period a per-sb
property, and coming up with a set of rules for the fs going into and
out of "grace" status.

Perhaps a way for different net namespaces to "subscribe" to a
particular fs, and don't take the fs out of grace until all of the
grace period timers pop? If a fs attempts to subscribe after the fs
comes out of grace, then its subscription would be denied and reclaim
attempts would get NFS4ERR_NOGRACE or the NLM equivalent.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Grace period
       [not found] ` <20120406234039.GA20940@fieldses.org>
@ 2012-04-09 11:24   ` Stanislav Kinsbursky
  2012-04-09 13:47     ` Jeff Layton
  2012-04-09 23:26     ` bfields
  0 siblings, 2 replies; 44+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-09 11:24 UTC (permalink / raw)
  To: bfields, Trond.Myklebust; +Cc: linux-nfs, linux-kernel

07.04.2012 03:40, bfields@fieldses.org пишет:
> On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
>> Hello, Bruce.
>> Could you, please, clarify this reason why grace list is used?
>> I.e. why list is used instead of some atomic variable, for example?
>
> Like just a reference count?  Yeah, that would be OK.
>
> In theory it could provide some sort of debugging help.  (E.g. we could
> print out the list of "lock managers" currently keeping us in grace.)  I
> had some idea we'd make those lock manager objects more complicated, and
> might have more for individual containerized services.

Could you share this idea, please?

Anyway, I have nothing against lists. Just was curious, why it was used.
I added Trond and lists to this reply.

Let me explain, what is the problem with grace period I'm facing right know, and 
what I'm thinking about it.
So, one of the things to be containerized during "NFSd per net ns" work is the 
grace period, and these are the basic components of it:
1) Grace period start.
2) Grace period end.
3) Grace period check.
3) Grace period restart.

So, the simplest straight-forward way is to make all internal stuff: 
"grace_list", "grace_lock", "grace_period_end" work and both "lockd_manager" and 
"nfsd4_manager" - per network namespace. Also, "laundromat_work" have to be 
per-net as well.
In this case:
1) Start - grace period can be started per net ns in "lockd_up_net()" (thus has 
to be moves there from "lockd()") and "nfs4_state_start()".
2) End - grace period can be ended per net ns in "lockd_down_net()" (thus has to 
be moved there from "lockd()"), "nfsd4_end_grace()" and "fs4_state_shutdown()".
3) Check - looks easy. There is either svc_rqst or net context can be passed to 
function.
4) Restart - this is a tricky place. It would be great to restart grace period 
only for the networks namespace of the sender of the kill signal. So, the idea 
is to check siginfo_t for the pid of sender, then try to locate the task, and if 
found, then get sender's networks namespace, and restart grace period only for 
this namespace (of course, if lockd was started for this namespace - see below).

If task not found, of it's lockd wasn't started for it's namespace, then grace 
period can be either restarted for all namespaces, of just silently dropped. 
This is the place where I'm not sure, how to do. Because calling grace period 
for all namespaces will be overkill...

There also another problem with the "task by pid" search, that found task can be 
actually not sender (which died already), but some other new task with the same 
pid number. In this case, I think, we can just neglect this probability and 
always assume, that we located sender (if, of course, lockd was started for 
sender's network namespace).

Trond, Bruce, could you, please, comment this ideas?

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2016-07-06  0:38 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-14 21:25 [PATCH] NFS: Don't let readdirplus revalidate an inode that was marked as stale Trond Myklebust
2016-06-30 21:46 ` grace period Marc Eshel
2016-07-01 16:08   ` Bruce Fields
2016-07-01 17:31     ` Marc Eshel
2016-07-01 20:07       ` Bruce Fields
2016-07-01 20:24         ` Marc Eshel
2016-07-01 20:47           ` Bruce Fields
2016-07-01 20:46         ` Marc Eshel
2016-07-01 21:01           ` Bruce Fields
2016-07-01 22:42             ` Marc Eshel
2016-07-02  0:58               ` Bruce Fields
2016-07-03  5:30                 ` Marc Eshel
2016-07-05 20:51                   ` Bruce Fields
2016-07-05 23:05                     ` Marc Eshel
2016-07-06  0:38                       ` Bruce Fields
     [not found]                 ` <OFC1237E53.3CFCA8E8-ON88257FE5.001D3182-88257FE5.001E3A5B@LocalDomain>
2016-07-04 23:53                   ` HA NFS Marc Eshel
2016-07-05 15:08                     ` Steve Dickson
2016-07-05 20:56                       ` Marc Eshel
     [not found]         ` <OF5D486F02.62CECB7B-ON88257FE3.0071DBE5-88257FE3.00722318@LocalDomain>
2016-07-01 20:51           ` grace period Marc Eshel
     [not found] <4F7F230A.6080506@parallels.com>
     [not found] ` <20120406234039.GA20940@fieldses.org>
2012-04-09 11:24   ` Grace period Stanislav Kinsbursky
2012-04-09 13:47     ` Jeff Layton
2012-04-09 14:25       ` Stanislav Kinsbursky
2012-04-09 15:27         ` Jeff Layton
2012-04-09 16:08           ` Stanislav Kinsbursky
2012-04-09 16:11             ` bfields
2012-04-09 16:17               ` Myklebust, Trond
2012-04-09 16:17                 ` Myklebust, Trond
2012-04-09 16:21                 ` bfields
2012-04-09 16:33                   ` Myklebust, Trond
2012-04-09 16:33                     ` Myklebust, Trond
2012-04-09 16:39                     ` bfields
2012-04-09 16:56                     ` Stanislav Kinsbursky
2012-04-09 18:11                       ` bfields
2012-04-10 10:56                         ` Stanislav Kinsbursky
2012-04-10 13:39                           ` bfields
2012-04-10 15:36                             ` Stanislav Kinsbursky
2012-04-10 18:28                               ` Jeff Layton
2012-04-10 20:46                                 ` bfields
2012-04-11 10:08                                 ` Stanislav Kinsbursky
2012-04-09 23:26     ` bfields
2012-04-10 11:29       ` Stanislav Kinsbursky
2012-04-10 13:37         ` bfields
2012-04-10 14:10           ` Stanislav Kinsbursky
2012-04-10 14:18             ` bfields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.