[PATCH 0/4] NFSD: Pin to vfsmount instead of mntget for export cache


* [PATCH 0/4] NFSD: Pin to vfsmount instead of mntget for export cache
@ 2015-05-06 13:18 Kinglong Mee
       [not found] ` <554A149B.5060102-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Kinglong Mee @ 2015-05-06 13:18 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel, linux-nfs
  Cc: NeilBrown, Trond Myklebust, kinglongmee

If there are mount points (not exported over NFS) under the pseudo root,
then after a client accesses those entries under the root, *nobody* can
unmount those mount points until the export cache expires.

# cat /etc/exports
/nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
# ll /nfs/
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
# mount /dev/sde /nfs/test
# df
Filesystem                      1K-blocks    Used Available Use% Mounted on
......
/dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
/dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
/dev/sde                           999320    1284    929224   1% /nfs/test
# mount -t nfs 127.0.0.1:/nfs/ /mnt
# ll /mnt/*/
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp

/mnt/xfs/:
total 0
# umount /nfs/test/
umount: /nfs/test/: target is busy
        (In some cases useful info about processes that
         use the device is found by lsof(8) or fuser(1).)

I don't think that's what users expect; they want to umount /nfs/test/.

This happens because nfsd's export cache holds a reference to
the path (here /nfs/test/), so it can't be unmounted.

This patch set uses fs_pin instead of mntget for the export cache, so
a user on the NFS server can unmount any mount point, including those
exported over NFS. Maybe allowing unmount only for unexported mount
points would be better?
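
The difference between holding a reference and holding a pin can be
sketched in userspace C. This is only a toy model under stated
assumptions: toy_mount, toy_pin, toy_pin_insert() and toy_umount() are
hypothetical stand-ins, not the kernel's struct mount, struct fs_pin or
umount path, and all locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's data structures. */
struct toy_pin;

struct toy_mount {
	int refcount;           /* long-term references, as taken by mntget() */
	struct toy_pin *pins;   /* singly linked list of attached pins */
	bool mounted;
};

struct toy_pin {
	struct toy_pin *next;
	void (*kill)(struct toy_pin *);   /* run when the mount goes away */
	bool killed;
};

/* Attach a pin to a mount; unlike a reference, it does not block umount. */
static void toy_pin_insert(struct toy_pin *p, struct toy_mount *m,
			   void (*kill)(struct toy_pin *))
{
	assert(kill != NULL);
	p->kill = kill;
	p->killed = false;
	p->next = m->pins;
	m->pins = p;
}

/* Example kill callback: a cache would drop its entry here. */
static void toy_cache_kill(struct toy_pin *p)
{
	p->killed = true;
}

/* Returns true on success, false for "target is busy". */
static bool toy_umount(struct toy_mount *m)
{
	if (m->refcount > 0)
		return false;           /* a held reference blocks the unmount */
	while (m->pins) {
		struct toy_pin *p = m->pins;

		m->pins = p->next;
		p->kill(p);             /* pin holders are told to let go */
	}
	m->mounted = false;
	return true;
}
```

In this model a cache that takes a reference makes toy_umount() fail,
while a cache that inserts a pin is notified through its kill callback
and the unmount proceeds.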

[PATCH 1/4] fs_pin: Fix uninitialized value in fs_pin
[PATCH 2/4] fs_pin: Export functions for specific filesystem
[PATCH 3/4] sunrpc: New helper cache_force_expire for cache cleanup
[PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget

Kinglong Mee (4):
  fs_pin: initialize done to zero
  fs_pin: export functions for nactive filesystem
  sunrpc: new helper cache_force_expire for deleting a cache
  nfsd: pin to mnt instead of mntget

 fs/fs_pin.c                  |  3 +++
 fs/nfsd/export.c             | 37 ++++++++++++++++++++++++++++++-------
 fs/nfsd/export.h             | 10 +++++++++-
 include/linux/fs_pin.h       |  1 +
 include/linux/sunrpc/cache.h | 11 +++++++++++
 5 files changed, 54 insertions(+), 8 deletions(-)

-- 
2.4.0



* [PATCH 1/4] fs_pin: Fix uninitialized value in fs_pin
       [not found] ` <554A149B.5060102-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-05-06 13:19   ` Kinglong Mee
  2015-05-07 19:43     ` J. Bruce Fields
  2015-05-06 13:19   ` [PATCH 2/4] fs_pin: Export functions for specific filesystem Kinglong Mee
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Kinglong Mee @ 2015-05-06 13:19 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
  Cc: J. Bruce Fields, NeilBrown, Trond Myklebust,
	kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

Without initialization, the done field of an fs_pin in stack space may
contain a garbage value.

Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/fs_pin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
index 3886b3b..18fad53 100644
--- a/include/linux/fs_pin.h
+++ b/include/linux/fs_pin.h
@@ -16,6 +16,7 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
 	INIT_HLIST_NODE(&p->s_list);
 	INIT_HLIST_NODE(&p->m_list);
 	p->kill = kill;
+	p->done = 0;
 }
 
 void pin_remove(struct fs_pin *);
-- 
2.4.0


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 2/4] fs_pin: Export functions for specific filesystem
       [not found] ` <554A149B.5060102-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2015-05-06 13:19   ` [PATCH 1/4] fs_pin: Fix uninitialized value in fs_pin Kinglong Mee
@ 2015-05-06 13:19   ` Kinglong Mee
  2015-05-06 13:20   ` [PATCH 3/4] sunrpc: New helper cache_force_expire for cache cleanup Kinglong Mee
  2015-05-06 13:21   ` [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget Kinglong Mee
  3 siblings, 0 replies; 18+ messages in thread
From: Kinglong Mee @ 2015-05-06 13:19 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
  Cc: J. Bruce Fields, NeilBrown, Trond Myklebust,
	kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

Export these functions for other users that want to pin to a vfsmount,
e.g. nfsd's export cache.

Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 fs/fs_pin.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index 611b540..553e8b1 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -17,6 +17,7 @@ void pin_remove(struct fs_pin *pin)
 	wake_up_locked(&pin->wait);
 	spin_unlock_irq(&pin->wait.lock);
 }
+EXPORT_SYMBOL(pin_remove);
 
 void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
 {
@@ -26,11 +27,13 @@ void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head
 	hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
 	spin_unlock(&pin_lock);
 }
+EXPORT_SYMBOL(pin_insert_group);
 
 void pin_insert(struct fs_pin *pin, struct vfsmount *m)
 {
 	pin_insert_group(pin, m, &m->mnt_sb->s_pins);
 }
+EXPORT_SYMBOL(pin_insert);
 
 void pin_kill(struct fs_pin *p)
 {
-- 
2.4.0


* [PATCH 3/4] sunrpc: New helper cache_force_expire for cache cleanup
       [not found] ` <554A149B.5060102-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2015-05-06 13:19   ` [PATCH 1/4] fs_pin: Fix uninitialized value in fs_pin Kinglong Mee
  2015-05-06 13:19   ` [PATCH 2/4] fs_pin: Export functions for specific filesystem Kinglong Mee
@ 2015-05-06 13:20   ` Kinglong Mee
  2015-05-06 13:21   ` [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget Kinglong Mee
  3 siblings, 0 replies; 18+ messages in thread
From: Kinglong Mee @ 2015-05-06 13:20 UTC (permalink / raw)
  To: J. Bruce Fields, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
  Cc: Al Viro, NeilBrown, Trond Myklebust, kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

Use expiry_time to force cleanup of a cache entry.

Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/sunrpc/cache.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 437ddb6..ce75e9c 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -210,6 +210,17 @@ extern int cache_check(struct cache_detail *detail,
 		       struct cache_head *h, struct cache_req *rqstp);
 extern void cache_flush(void);
 extern void cache_purge(struct cache_detail *detail);
+
+static inline void cache_force_expire(struct cache_detail *detail, struct cache_head *h)
+{
+	write_lock(&detail->hash_lock);
+	h->expiry_time = seconds_since_boot() - 1;
+	detail->nextcheck = seconds_since_boot();
+	write_unlock(&detail->hash_lock);
+
+	cache_flush();
+}
+
 #define NEVER (0x7FFFFFFF)
 extern void __init cache_initialize(void);
 extern int cache_register_net(struct cache_detail *cd, struct net *net);
-- 
2.4.0
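
The idea behind cache_force_expire() above (backdate one entry's expiry
time, pull the next check forward, then flush) can be modelled in
userspace C. This is a minimal sketch under stated assumptions:
toy_cache, toy_entry, toy_cache_flush() and toy_cache_force_expire() are
hypothetical names, the current time is passed in explicitly, and the
locking and reference counting of the real sunrpc cache are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; not the sunrpc cache_head / cache_detail structures. */
struct toy_entry {
	long expiry_time;   /* entry is stale once expiry_time < now */
	int live;           /* 1 while the entry is still in the cache */
};

struct toy_cache {
	struct toy_entry *entries;
	size_t count;
	long nextcheck;     /* earliest time at which a sweep is worthwhile */
};

/* Drop every live entry whose expiry time has passed; return how many. */
static size_t toy_cache_flush(struct toy_cache *c, long now)
{
	size_t i, dropped = 0;

	if (now < c->nextcheck)
		return 0;               /* nothing can have expired yet */
	for (i = 0; i < c->count; i++) {
		struct toy_entry *e = &c->entries[i];

		if (e->live && e->expiry_time < now) {
			e->live = 0;
			dropped++;
		}
	}
	return dropped;
}

/* Backdate one entry and make the next sweep happen immediately. */
static size_t toy_cache_force_expire(struct toy_cache *c,
				     struct toy_entry *e, long now)
{
	assert(e->live);
	e->expiry_time = now - 1;
	c->nextcheck = now;
	return toy_cache_flush(c, now);
}
```

Only the targeted entry is dropped; entries whose expiry time is still
in the future survive the forced sweep.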


* [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
       [not found] ` <554A149B.5060102-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-05-06 13:20   ` [PATCH 3/4] sunrpc: New helper cache_force_expire for cache cleanup Kinglong Mee
@ 2015-05-06 13:21   ` Kinglong Mee
  2015-05-08  4:40     ` NeilBrown
  3 siblings, 1 reply; 18+ messages in thread
From: Kinglong Mee @ 2015-05-06 13:21 UTC (permalink / raw)
  To: J. Bruce Fields, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
  Cc: Al Viro, NeilBrown, Trond Myklebust, kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

If there are mount points (not exported over NFS) under the pseudo root,
then after a client accesses those entries under the root, *nobody* can
unmount those mount points until the export cache expires.

# cat /etc/exports
/nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
# ll /nfs/
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
# mount /dev/sde /nfs/test
# df
Filesystem                      1K-blocks    Used Available Use% Mounted on
......
/dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
/dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
/dev/sde                           999320    1284    929224   1% /nfs/test
# mount -t nfs 127.0.0.1:/nfs/ /mnt
# ll /mnt/*/
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp

/mnt/xfs/:
total 0
# umount /nfs/test/
umount: /nfs/test/: target is busy
        (In some cases useful info about processes that
         use the device is found by lsof(8) or fuser(1).)

I don't think that's what users expect; they want to umount /nfs/test/.

This happens because nfsd's export cache holds a reference to
the path (here /nfs/test/), so it can't be unmounted.

This patch uses fs_pin instead of mntget for the export cache, so
a user on the NFS server can unmount any mount point, including those
exported over NFS. Maybe allowing unmount only for unexported mount
points would be better?

Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 fs/nfsd/export.c | 37 ++++++++++++++++++++++++++++++-------
 fs/nfsd/export.h | 10 +++++++++-
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index f79521a..80f82f5 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -42,10 +42,12 @@ static void expkey_put(struct kref *ref)
 	struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref);
 
 	if (test_bit(CACHE_VALID, &key->h.flags) &&
-	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
-		path_put(&key->ek_path);
+	    !test_bit(CACHE_NEGATIVE, &key->h.flags)) {
+		dput(key->ek_path.dentry);
+		pin_remove(&key->ek_pin);
+	}
 	auth_domain_put(key->ek_client);
-	kfree(key);
+	kfree_rcu(key, rcu_head);
 }
 
 static void expkey_request(struct cache_detail *cd,
@@ -120,6 +122,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
 		goto out;
 
 	key.ek_client = dom;	
+	key.cd = cd;
 	key.ek_fsidtype = fsidtype;
 	memcpy(key.ek_fsid, buf, len);
 
@@ -210,6 +213,13 @@ static inline void expkey_init(struct cache_head *cnew,
 	new->ek_fsidtype = item->ek_fsidtype;
 
 	memcpy(new->ek_fsid, item->ek_fsid, sizeof(new->ek_fsid));
+	new->cd = item->cd;
+}
+
+static void expkey_pin_kill(struct fs_pin *pin)
+{
+	struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
+	cache_force_expire(key->cd, &key->h);
 }
 
 static inline void expkey_update(struct cache_head *cnew,
@@ -218,8 +228,10 @@ static inline void expkey_update(struct cache_head *cnew,
 	struct svc_expkey *new = container_of(cnew, struct svc_expkey, h);
 	struct svc_expkey *item = container_of(citem, struct svc_expkey, h);
 
+	init_fs_pin(&new->ek_pin, expkey_pin_kill);
 	new->ek_path = item->ek_path;
-	path_get(&item->ek_path);
+	dget(item->ek_path.dentry);
+	pin_insert_group(&new->ek_pin, item->ek_path.mnt, NULL);
 }
 
 static struct cache_head *expkey_alloc(void)
@@ -309,11 +321,13 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc)
 static void svc_export_put(struct kref *ref)
 {
 	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
-	path_put(&exp->ex_path);
+
+	dput(exp->ex_path.dentry);
+	pin_remove(&exp->ex_pin);
 	auth_domain_put(exp->ex_client);
 	nfsd4_fslocs_free(&exp->ex_fslocs);
 	kfree(exp->ex_uuid);
-	kfree(exp);
+	kfree_rcu(exp, rcu_head);
 }
 
 static void svc_export_request(struct cache_detail *cd,
@@ -694,15 +708,23 @@ static int svc_export_match(struct cache_head *a, struct cache_head *b)
 		path_equal(&orig->ex_path, &new->ex_path);
 }
 
+static void export_pin_kill(struct fs_pin *pin)
+{
+	struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);
+	cache_force_expire(exp->cd, &exp->h);
+}
+
 static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
 {
 	struct svc_export *new = container_of(cnew, struct svc_export, h);
 	struct svc_export *item = container_of(citem, struct svc_export, h);
 
+	init_fs_pin(&new->ex_pin, export_pin_kill);
 	kref_get(&item->ex_client->ref);
 	new->ex_client = item->ex_client;
 	new->ex_path = item->ex_path;
-	path_get(&item->ex_path);
+	dget(item->ex_path.dentry);
+	pin_insert_group(&new->ex_pin, item->ex_path.mnt, NULL);
 	new->ex_fslocs.locations = NULL;
 	new->ex_fslocs.locations_count = 0;
 	new->ex_fslocs.migrated = 0;
@@ -811,6 +833,7 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
 
 	key.ek_client = clp;
 	key.ek_fsidtype = fsid_type;
+	key.cd = cd;
 	memcpy(key.ek_fsid, fsidv, key_len(fsid_type));
 
 	ek = svc_expkey_lookup(cd, &key);
diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index 1f52bfc..1cf6ada 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -4,6 +4,7 @@
 #ifndef NFSD_EXPORT_H
 #define NFSD_EXPORT_H
 
+#include <linux/fs_pin.h>
 #include <linux/sunrpc/cache.h>
 #include <uapi/linux/nfsd/export.h>
 
@@ -46,6 +47,8 @@ struct exp_flavor_info {
 
 struct svc_export {
 	struct cache_head	h;
+	struct cache_detail	*cd;
+
 	struct auth_domain *	ex_client;
 	int			ex_flags;
 	struct path		ex_path;
@@ -58,7 +61,9 @@ struct svc_export {
 	struct exp_flavor_info	ex_flavors[MAX_SECINFO_LIST];
 	enum pnfs_layouttype	ex_layout_type;
 	struct nfsd4_deviceid_map *ex_devid_map;
-	struct cache_detail	*cd;
+
+	struct fs_pin		ex_pin;
+	struct rcu_head		rcu_head;
 };
 
 /* an "export key" (expkey) maps a filehandlefragement to an
@@ -67,12 +72,15 @@ struct svc_export {
  */
 struct svc_expkey {
 	struct cache_head	h;
+	struct cache_detail	*cd;
 
 	struct auth_domain *	ek_client;
 	int			ek_fsidtype;
 	u32			ek_fsid[6];
 
 	struct path		ek_path;
+	struct fs_pin		ek_pin;
+	struct rcu_head		rcu_head;
 };
 
 #define EX_ISSYNC(exp)		(!((exp)->ex_flags & NFSEXP_ASYNC))
-- 
2.4.0


* Re: [PATCH 1/4] fs_pin: Fix uninitialized value in fs_pin
  2015-05-06 13:19   ` [PATCH 1/4] fs_pin: Fix uninitialized value in fs_pin Kinglong Mee
@ 2015-05-07 19:43     ` J. Bruce Fields
       [not found]       ` <20150507194335.GA16527-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2015-05-07 19:43 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: Al Viro, linux-fsdevel, linux-nfs, NeilBrown, Trond Myklebust

On Wed, May 06, 2015 at 09:19:23PM +0800, Kinglong Mee wrote:
> Without initialization, the done field of an fs_pin in stack space may
> contain a garbage value.

Looks like both init_fs_pin callers use some variation on kzalloc(), so
I don't think there's any actual bug here.

Maybe there's some other reason for the belt-and-suspenders approach,
that's Al's call, I think.

--b.

> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> ---
>  include/linux/fs_pin.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
> index 3886b3b..18fad53 100644
> --- a/include/linux/fs_pin.h
> +++ b/include/linux/fs_pin.h
> @@ -16,6 +16,7 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
>  	INIT_HLIST_NODE(&p->s_list);
>  	INIT_HLIST_NODE(&p->m_list);
>  	p->kill = kill;
> +	p->done = 0;
>  }
>  
>  void pin_remove(struct fs_pin *);
> -- 
> 2.4.0
> 


* Re: [PATCH 1/4] fs_pin: Fix uninitialized value in fs_pin
       [not found]       ` <20150507194335.GA16527-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2015-05-08  0:36         ` Kinglong Mee
  0 siblings, 0 replies; 18+ messages in thread
From: Kinglong Mee @ 2015-05-08  0:36 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, NeilBrown, Trond Myklebust,
	kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

On 5/8/2015 3:43 AM, J. Bruce Fields wrote:
> On Wed, May 06, 2015 at 09:19:23PM +0800, Kinglong Mee wrote:
>> Without initialization, the done field of an fs_pin in stack space may
>> contain a garbage value.
> 
> Looks like both init_fs_pin callers use some variation on kzalloc(), so
> I don't think there's any actual bug here.

Yes, you are right.

But nfsd's export cache uses kmalloc to allocate memory, so
if I embed an fs_pin in svc_export{}, done may hold a
garbage value.

Maybe I should switch nfsd to kzalloc instead?
But I'd prefer this patch.

thanks,
Kinglong Mee
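
The kmalloc versus kzalloc point can be illustrated in userspace C.
This is a minimal sketch under stated assumptions: toy_pin,
toy_init_pin() and toy_pin_alloc_dirty() are hypothetical names, and the
0xA5 fill only simulates the stale contents that non-zeroing allocation
may return.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a structure with a "done" flag; not struct fs_pin. */
struct toy_pin {
	void (*kill)(struct toy_pin *);
	int done;
};

/* Initialise every field, so callers need not zero the memory first. */
static void toy_init_pin(struct toy_pin *p, void (*kill)(struct toy_pin *))
{
	assert(p != NULL);
	p->kill = kill;
	p->done = 0;
}

/* Allocate without zeroing and fill with a pattern to mimic stale data. */
static struct toy_pin *toy_pin_alloc_dirty(void)
{
	struct toy_pin *p = malloc(sizeof(*p));

	if (p)
		memset(p, 0xA5, sizeof(*p));
	return p;
}
```

With the explicit assignment in the init helper, the result is the same
whether the caller used a zeroing allocator or not.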

> 
> Maybe there's some other reason for the belt-and-suspenders approach,
> that's Al's call, I think.
> 
> --b.
> 
>>
>> Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> ---
>>  include/linux/fs_pin.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
>> index 3886b3b..18fad53 100644
>> --- a/include/linux/fs_pin.h
>> +++ b/include/linux/fs_pin.h
>> @@ -16,6 +16,7 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
>>  	INIT_HLIST_NODE(&p->s_list);
>>  	INIT_HLIST_NODE(&p->m_list);
>>  	p->kill = kill;
>> +	p->done = 0;
>>  }
>>  
>>  void pin_remove(struct fs_pin *);
>> -- 
>> 2.4.0
>>
> 

* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
  2015-05-06 13:21   ` [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget Kinglong Mee
@ 2015-05-08  4:40     ` NeilBrown
  2015-05-08 13:47       ` J. Bruce Fields
  0 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2015-05-08  4:40 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust


On Wed, 06 May 2015 21:21:15 +0800 Kinglong Mee <kinglongmee@gmail.com> wrote:

> If there are mount points (not exported over NFS) under the pseudo root,
> then after a client accesses those entries under the root, *nobody* can
> unmount those mount points until the export cache expires.
> 
> # cat /etc/exports
> /nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
> /nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
> # ll /nfs/
> total 0
> drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
> drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
> drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
> # mount /dev/sde /nfs/test
> # df
> Filesystem                      1K-blocks    Used Available Use% Mounted on
> ......
> /dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
> /dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
> /dev/sde                           999320    1284    929224   1% /nfs/test
> # mount -t nfs 127.0.0.1:/nfs/ /mnt
> # ll /mnt/*/
> /mnt/pnfs/:
> total 0
> -rw-r--r--. 1 root root 0 Apr 21 22:23 attr
> drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp
> 
> /mnt/xfs/:
> total 0
> # umount /nfs/test/
> umount: /nfs/test/: target is busy
>         (In some cases useful info about processes that
>          use the device is found by lsof(8) or fuser(1).)
> 
> I don't think that's what users expect; they want to umount /nfs/test/.
> 
> This happens because nfsd's export cache holds a reference to
> the path (here /nfs/test/), so it can't be unmounted.
> 
> This patch uses fs_pin instead of mntget for the export cache, so
> a user on the NFS server can unmount any mount point, including those
> exported over NFS. Maybe allowing unmount only for unexported mount
> points would be better?


Thanks for this patch.  It looks good!

My only comment on the code is that I would really like to see a
"path_get_pin()" and "path_put_unpin()" rather than open coding:

> +	dget(item->ek_path.dentry);
> +	pin_insert_group(&new->ek_pin, item->ek_path.mnt, NULL);

and 

> +		dput(key->ek_path.dentry);
> +		pin_remove(&key->ek_pin);
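
The pairing suggested here could look roughly like the following
userspace model. It is only a sketch under stated assumptions: the
thread names path_get_pin() and path_put_unpin() but gives no
signatures, so the argument lists below are guesses, and toy_dentry,
toy_mount, toy_path and toy_pin are hypothetical stand-ins that just
count what dget()/dput() and pin_insert_group()/pin_remove() would do.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; they only count references and pins. */
struct toy_dentry { int d_count; };
struct toy_mount  { int npins; };
struct toy_path   { struct toy_mount *mnt; struct toy_dentry *dentry; };
struct toy_pin    { struct toy_mount *mnt; };   /* NULL while not inserted */

/* Take a dentry reference and pin (rather than reference) the mount. */
static void toy_path_get_pin(const struct toy_path *path, struct toy_pin *pin)
{
	assert(pin->mnt == NULL);       /* a pin is inserted at most once */
	path->dentry->d_count++;        /* models dget() */
	pin->mnt = path->mnt;           /* models pin_insert_group() */
	path->mnt->npins++;
}

/* Undo toy_path_get_pin(): drop the dentry reference and remove the pin. */
static void toy_path_put_unpin(const struct toy_path *path, struct toy_pin *pin)
{
	assert(pin->mnt == path->mnt);  /* must match the earlier get */
	path->dentry->d_count--;        /* models dput() */
	pin->mnt->npins--;              /* models pin_remove() */
	pin->mnt = NULL;
}
```

Keeping the two steps behind one helper pair means a caller cannot take
the dentry reference without the pin, or drop one without the other.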


But the question you raise is an important one:  Exactly which filesystems
should be allowed to be unmounted?
This is a change in behaviour - is it one that people uniformly would want?

The kernel doesn't currently know which file systems were explicitly listed
in /etc/exports, and which were found by following a 'crossmnt'.
It could guess and allow the unmounting of anything below a 'crossmnt', but I
wouldn't be comfortable with that - it is error prone.

mountd does know what is in /etc/exports, and could tell the kernel.
For the expkey cache, we could always use path_get_pin.
For the export cache (where flags are available) we could use path_get
or path_get_pin depending on some new flag.

I'm not really sure it is worth it.  I would rather the filesystems could
always be unmounted.  But doing that could possibly break someone's work
flow.  Maybe.

Or maybe I'm seeing problems where there aren't any.

Anyone else have an opinion?

Thanks,
NeilBrown



> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> ---
>  fs/nfsd/export.c | 37 ++++++++++++++++++++++++++++++-------
>  fs/nfsd/export.h | 10 +++++++++-
>  2 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index f79521a..80f82f5 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -42,10 +42,12 @@ static void expkey_put(struct kref *ref)
>  	struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref);
>  
>  	if (test_bit(CACHE_VALID, &key->h.flags) &&
> -	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
> -		path_put(&key->ek_path);
> +	    !test_bit(CACHE_NEGATIVE, &key->h.flags)) {
> +		dput(key->ek_path.dentry);
> +		pin_remove(&key->ek_pin);
> +	}
>  	auth_domain_put(key->ek_client);
> -	kfree(key);
> +	kfree_rcu(key, rcu_head);
>  }
>  
>  static void expkey_request(struct cache_detail *cd,
> @@ -120,6 +122,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
>  		goto out;
>  
>  	key.ek_client = dom;	
> +	key.cd = cd;
>  	key.ek_fsidtype = fsidtype;
>  	memcpy(key.ek_fsid, buf, len);
>  
> @@ -210,6 +213,13 @@ static inline void expkey_init(struct cache_head *cnew,
>  	new->ek_fsidtype = item->ek_fsidtype;
>  
>  	memcpy(new->ek_fsid, item->ek_fsid, sizeof(new->ek_fsid));
> +	new->cd = item->cd;
> +}
> +
> +static void expkey_pin_kill(struct fs_pin *pin)
> +{
> +	struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
> +	cache_force_expire(key->cd, &key->h);
>  }
>  
>  static inline void expkey_update(struct cache_head *cnew,
> @@ -218,8 +228,10 @@ static inline void expkey_update(struct cache_head *cnew,
>  	struct svc_expkey *new = container_of(cnew, struct svc_expkey, h);
>  	struct svc_expkey *item = container_of(citem, struct svc_expkey, h);
>  
> +	init_fs_pin(&new->ek_pin, expkey_pin_kill);
>  	new->ek_path = item->ek_path;
> -	path_get(&item->ek_path);
> +	dget(item->ek_path.dentry);
> +	pin_insert_group(&new->ek_pin, item->ek_path.mnt, NULL);
>  }
>  
>  static struct cache_head *expkey_alloc(void)
> @@ -309,11 +321,13 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc)
>  static void svc_export_put(struct kref *ref)
>  {
>  	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
> -	path_put(&exp->ex_path);
> +
> +	dput(exp->ex_path.dentry);
> +	pin_remove(&exp->ex_pin);
>  	auth_domain_put(exp->ex_client);
>  	nfsd4_fslocs_free(&exp->ex_fslocs);
>  	kfree(exp->ex_uuid);
> -	kfree(exp);
> +	kfree_rcu(exp, rcu_head);
>  }
>  
>  static void svc_export_request(struct cache_detail *cd,
> @@ -694,15 +708,23 @@ static int svc_export_match(struct cache_head *a, struct cache_head *b)
>  		path_equal(&orig->ex_path, &new->ex_path);
>  }
>  
> +static void export_pin_kill(struct fs_pin *pin)
> +{
> +	struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);
> +	cache_force_expire(exp->cd, &exp->h);
> +}
> +
>  static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
>  {
>  	struct svc_export *new = container_of(cnew, struct svc_export, h);
>  	struct svc_export *item = container_of(citem, struct svc_export, h);
>  
> +	init_fs_pin(&new->ex_pin, export_pin_kill);
>  	kref_get(&item->ex_client->ref);
>  	new->ex_client = item->ex_client;
>  	new->ex_path = item->ex_path;
> -	path_get(&item->ex_path);
> +	dget(item->ex_path.dentry);
> +	pin_insert_group(&new->ex_pin, item->ex_path.mnt, NULL);
>  	new->ex_fslocs.locations = NULL;
>  	new->ex_fslocs.locations_count = 0;
>  	new->ex_fslocs.migrated = 0;
> @@ -811,6 +833,7 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
>  
>  	key.ek_client = clp;
>  	key.ek_fsidtype = fsid_type;
> +	key.cd = cd;
>  	memcpy(key.ek_fsid, fsidv, key_len(fsid_type));
>  
>  	ek = svc_expkey_lookup(cd, &key);
> diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
> index 1f52bfc..1cf6ada 100644
> --- a/fs/nfsd/export.h
> +++ b/fs/nfsd/export.h
> @@ -4,6 +4,7 @@
>  #ifndef NFSD_EXPORT_H
>  #define NFSD_EXPORT_H
>  
> +#include <linux/fs_pin.h>
>  #include <linux/sunrpc/cache.h>
>  #include <uapi/linux/nfsd/export.h>
>  
> @@ -46,6 +47,8 @@ struct exp_flavor_info {
>  
>  struct svc_export {
>  	struct cache_head	h;
> +	struct cache_detail	*cd;
> +
>  	struct auth_domain *	ex_client;
>  	int			ex_flags;
>  	struct path		ex_path;
> @@ -58,7 +61,9 @@ struct svc_export {
>  	struct exp_flavor_info	ex_flavors[MAX_SECINFO_LIST];
>  	enum pnfs_layouttype	ex_layout_type;
>  	struct nfsd4_deviceid_map *ex_devid_map;
> -	struct cache_detail	*cd;
> +
> +	struct fs_pin		ex_pin;
> +	struct rcu_head		rcu_head;
>  };
>  
>  /* an "export key" (expkey) maps a filehandlefragement to an
> @@ -67,12 +72,15 @@ struct svc_export {
>   */
>  struct svc_expkey {
>  	struct cache_head	h;
> +	struct cache_detail	*cd;
>  
>  	struct auth_domain *	ek_client;
>  	int			ek_fsidtype;
>  	u32			ek_fsid[6];
>  
>  	struct path		ek_path;
> +	struct fs_pin		ek_pin;
> +	struct rcu_head		rcu_head;
>  };
>  
>  #define EX_ISSYNC(exp)		(!((exp)->ex_flags & NFSEXP_ASYNC))



* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
  2015-05-08  4:40     ` NeilBrown
@ 2015-05-08 13:47       ` J. Bruce Fields
  2015-05-11 13:08         ` Kinglong Mee
  0 siblings, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2015-05-08 13:47 UTC (permalink / raw)
  To: NeilBrown
  Cc: Kinglong Mee, linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust

On Fri, May 08, 2015 at 02:40:31PM +1000, NeilBrown wrote:
> Thanks for this patch.  It looks good!
> 
> My only comment on the code is that I would really like to see a
> "path_get_pin()" and "path_put_unpin()" rather than open coding:
> 
> > +	dget(item->ek_path.dentry);
> > +	pin_insert_group(&new->ek_pin, item->ek_path.mnt, NULL);
> 
> and 
> 
> > +		dput(key->ek_path.dentry);
> > +		pin_remove(&key->ek_pin);
> 
> 
> But the question you raise is an important one:  Exactly which filesystems
> should be allowed to be unmounted?
> This is a change in behaviour - is it one that people uniformly would want?
> 
> The kernel doesn't currently know which file systems were explicitly listed
> in /etc/exports, and which were found by following a 'crossmnt'.
> It could guess and allow the unmounting of anything below a 'crossmnt', but I
> wouldn't be comfortable with that - it is error prone.
> 
> mountd does know what is in /etc/exports, and could tell the kernel.
> For the expkey cache, we could always use path_get_pin.
> For the export cache (where flags are available) we could use path_get
> or path_get_pin depending on some new flag.
> 
> I'm not really sure it is worth it.  I would rather the filesystems could
> always be unmounted.  But doing that could possibly break someone's work
> flow.  Maybe.
> 
> Or maybe I'm seeing problems where there aren't any.
> 
> Anyone else have an opinion?

The undisputed bug here was negative cache entries preventing unmount.
So most conservative might be just to purge negative entries.

Otherwise, the only guarantees I think we've really had is that we won't
allow unmount if you hold any actual state on the filesystem (NLM locks,
NFSv4 locks, opens, or delegations).

If a filesystem is exported but no clients hold state on it, then it's
currently mostly chance whether the unmount succeeds or not.  So we're
probably free to change the behavior in this case.  I'd be inclined to
allow the unmount, but haven't thought this through carefully.

It could also be useful to have the ability to force an unmount even in
the presence of locks.  That's not a safe default, but an
"allow_force_unmount" export option might be useful.

We might similarly be able to add some way for the kernel to distinguish
explicit exports from crossmnt-found exports, but I'm not seeing the use
case for that.
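
The policy floated above can be written down as a small decision
function. This is only a sketch of the proposal as discussed, not of
any existing kernel behaviour: toy_fs_state and toy_may_unmount() are
hypothetical names, and "allow_force_unmount" is the export option
suggested in this message, not one that exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What NFS clients currently hold on one exported filesystem (toy model). */
struct toy_fs_state {
	unsigned int nlm_locks;
	unsigned int v4_locks;
	unsigned int opens;
	unsigned int delegations;
	bool allow_force_unmount;   /* hypothetical export option */
};

/* Decide whether an unmount request should be allowed under the proposal. */
static bool toy_may_unmount(const struct toy_fs_state *s, bool force)
{
	bool has_state;

	assert(s != NULL);
	has_state = s->nlm_locks || s->v4_locks ||
		    s->opens || s->delegations;
	if (!has_state)
		return true;            /* cache entries alone never block */
	/* Client state blocks unmount unless a forced unmount is permitted. */
	return force && s->allow_force_unmount;
}
```

Cached export entries, positive or negative, do not appear in the
decision at all; only state actually held by clients does.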

--b.


* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
  2015-05-08 13:47       ` J. Bruce Fields
@ 2015-05-11 13:08         ` Kinglong Mee
       [not found]           ` <5550A9DF.1070908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Kinglong Mee @ 2015-05-11 13:08 UTC (permalink / raw)
  To: J. Bruce Fields, NeilBrown
  Cc: linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust, kinglongmee

On 5/8/2015 9:47 PM, J. Bruce Fields wrote:
> On Fri, May 08, 2015 at 02:40:31PM +1000, NeilBrown wrote:
>> Thanks for this patch.  It looks good!
>>
>> My only comment on the code is that I would really like to see a
>> "path_get_pin()" and "path_put_unpin()" rather than open coding:
>>
>>> +	dget(item->ek_path.dentry);
>>> +	pin_insert_group(&new->ek_pin, item->ek_path.mnt, NULL);
>>
>> and 
>>
>>> +		dput(key->ek_path.dentry);
>>> +		pin_remove(&key->ek_pin);
>>
>>
>> But the question you raise is an important one:  Exactly which filesystems
>> should be allowed to be unmounted?
>> This is a change in behaviour - is it one that people uniformly would want?
>>
>> The kernel doesn't currently know which file systems were explicitly listed
>> in /etc/exports, and which were found by following a 'crossmnt'.
>> It could guess and allow the unmounting of anything below a 'crossmnt', but I
>> wouldn't be comfortable with that - it is error prone.
>>
>> mountd does know what is in /etc/exports, and could tell the kernel.
>> For the expkey cache, we could always use path_get_pin.
>> For the export cache (where flags are available) we could use path_get
>> or path_get_pin depending on some new flag.
>>
>> I'm not really sure it is worth it.  I would rather the filesystems could
>> always be unmounted.  But doing that could possibly break someone's work
>> flow.  Maybe.
>>
>> Or maybe I'm seeing problems where there aren't any.
>>
>> Anyone else have an opinion?
> 
> The undisputed bug here was negative cache entries preventing unmount.
> So most conservative might be just to purge negative entries.

I'd like this.
If the cache entry is valid, the user should not be allowed to umount the filesystem.

> 
> Otherwise, the only guarantees I think we've really had is that we won't
> allow unmount if you hold any actual state on the filesystem (NLM locks,
> NFSv4 locks, opens, or delegations).

Those resources hold a reference to the vfsmnt.

> 
> If a filesystem is exported but no clients hold state on it, then it's
> currently mostly chance whether the unmount succeeds or not.  So we're
> probably free to change the behavior in this case.  I'd be inclined to
> allow the unmount, but haven't thought this through carefully.

If a client mounts from the NFS server successfully without holding any state,
and the NFS server then umounts the exported filesystem,
the client still thinks the filesystem is valid even though it has been umounted.

> 
> It could also be useful to have the ability to force an unmount even in
> the presence of locks.  That's not a safe default, but an
> "allow_force_unmount" export option might be useful.
> 
> We might similarly be able to add some way for the kernel to distinguish
> explicit exports from crossmnt-found exports, but I'm not seeing the use
> case for that.

Agreed, I don't think we need that right now.

thanks,
Kinglong Mee

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
       [not found]           ` <5550A9DF.1070908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-05-13  4:25             ` NeilBrown
  2015-05-13 12:30               ` Kinglong Mee
       [not found]               ` <20150513142515.6bd881c8-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
  2015-05-15 21:09             ` J. Bruce Fields
  1 sibling, 2 replies; 18+ messages in thread
From: NeilBrown @ 2015-05-13  4:25 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust

On Mon, 11 May 2015 21:08:47 +0800 Kinglong Mee <kinglongmee@gmail.com> wrote:

> On 5/8/2015 9:47 PM, J. Bruce Fields wrote:
> > On Fri, May 08, 2015 at 02:40:31PM +1000, NeilBrown wrote:
> >> Thanks for this patch.  It looks good!
> >>
> >> My only comment on the code is that I would really like to see a
> >> "path_get_pin()" and "path_put_unpin()" rather than open coding:
> >>
> >>> +	dget(item->ek_path.dentry);
> >>> +	pin_insert_group(&new->ek_pin, item->ek_path.mnt, NULL);
> >>
> >> and 
> >>
> >>> +		dput(key->ek_path.dentry);
> >>> +		pin_remove(&key->ek_pin);
> >>
> >>
> >> But the question you raise is an important one:  Exactly which filesystems
> >> should be allowed to be unmounted?
> >> This is a change in behaviour - is it one that people uniformly would want?
> >>
> >> The kernel doesn't currently know which file systems were explicitly listed
> >> in /etc/exports, and which were found by following a 'crossmnt'.
> >> It could guess and allow the unmounting of anything below a 'crossmnt', but I
> >> wouldn't be comfortable with that - it is error prone.
> >>
> >> mountd does know what is in /etc/exports, and could tell the kernel.
> >> For the expkey cache, we could always use path_get_pin.
> >> For the export cache (where flags are available) we could use path_get
> >> or path_get_pin depending on some new flag.
> >>
> >> I'm not really sure it is worth it.  I would rather the filesystems could
> >> always be unmounted.  But doing that could possibly break someone's work
> >> flow.  Maybe.
> >>
> >> Or maybe I'm seeing problems where there aren't any.
> >>
> >> Anyone else have an opinion?
> > 
> > The undisputed bug here was negative cache entries preventing unmount.
> > So most conservative might be just to purge negative entries.
> 
> I'd like this,
> if the cache is valid, user should not be allowed to umount the filesystem.
> 
> > 
> > Otherwise, the only guarantees I think we've really had is that we won't
> > allow unmount if you hold any actual state on the filesystem (NLM locks,
> > NFSv4 locks, opens, or delegations).
> 
> Those resources hold the reference of vfsmnt.
> 
> > 
> > If a filesystem is exported but no clients hold state on it, then it's
> > currently mostly chance whether the unmount succeeds or not.  So we're
> > probably free to change the behavior in this case.  I'd be inclined to
> > allow the unmount, but haven't thought this through carefully.
> 
> If client mount a nfsserver succeed without holds state, 
> nfs server umounts the exported filesystem, 
> client also think the filesystem is valid, but it is umounted.

This is no different from "exportfs -au" being run on the server, thus
unexporting the filesystem and making it unavailable to the client, even
though the client has it mounted.

I think we need to give the server admin control of their filesystems, and
assume they won't do something that they don't really want to do.



> 
> > 
> > It could also be useful to have the ability to force an unmount even in
> > the presence of locks.  That's not a safe default, but an
> > "allow_force_unmount" export option might be useful.

We already have a mechanism to forcibly drop any locks by writing some magic
to /proc/fs/nfsd/unlock_{ip,filesystem}.  I don't think we need any more.

NeilBrown

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
  2015-05-13  4:25             ` NeilBrown
@ 2015-05-13 12:30               ` Kinglong Mee
       [not found]                 ` <555343CA.6010307-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
       [not found]               ` <20150513142515.6bd881c8-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
  1 sibling, 1 reply; 18+ messages in thread
From: Kinglong Mee @ 2015-05-13 12:30 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, linux-fsdevel, linux-nfs, Al Viro,
	Trond Myklebust, kinglongmee

On 5/13/2015 12:25 PM, NeilBrown wrote:
> On Mon, 11 May 2015 21:08:47 +0800 Kinglong Mee <kinglongmee@gmail.com> wrote:
> 
>> On 5/8/2015 9:47 PM, J. Bruce Fields wrote:
>>> On Fri, May 08, 2015 at 02:40:31PM +1000, NeilBrown wrote:
>>>> Thanks for this patch.  It looks good!
>>>>
>>>> My only comment on the code is that I would really like to see a
>>>> "path_get_pin()" and "path_put_unpin()" rather than open coding:
>>>>
>>>>> +	dget(item->ek_path.dentry);
>>>>> +	pin_insert_group(&new->ek_pin, item->ek_path.mnt, NULL);
>>>>
>>>> and 
>>>>
>>>>> +		dput(key->ek_path.dentry);
>>>>> +		pin_remove(&key->ek_pin);
>>>>
>>>>
>>>> But the question you raise is an important one:  Exactly which filesystems
>>>> should be allowed to be unmounted?
>>>> This is a change in behaviour - is it one that people uniformly would want?
>>>>
>>>> The kernel doesn't currently know which file systems were explicitly listed
>>>> in /etc/exports, and which were found by following a 'crossmnt'.
>>>> It could guess and allow the unmounting of anything below a 'crossmnt', but I
>>>> wouldn't be comfortable with that - it is error prone.
>>>>
>>>> mountd does know what is in /etc/exports, and could tell the kernel.
>>>> For the expkey cache, we could always use path_get_pin.
>>>> For the export cache (where flags are available) we could use path_get
>>>> or path_get_pin depending on some new flag.
>>>>
>>>> I'm not really sure it is worth it.  I would rather the filesystems could
>>>> always be unmounted.  But doing that could possibly break someone's work
>>>> flow.  Maybe.
>>>>
>>>> Or maybe I'm seeing problems where there aren't any.
>>>>
>>>> Anyone else have an opinion?
>>>
>>> The undisputed bug here was negative cache entries preventing unmount.
>>> So most conservative might be just to purge negative entries.
>>
>> I'd like this,
>> if the cache is valid, user should not be allowed to umount the filesystem.
>>
>>>
>>> Otherwise, the only guarantees I think we've really had is that we won't
>>> allow unmount if you hold any actual state on the filesystem (NLM locks,
>>> NFSv4 locks, opens, or delegations).
>>
>> Those resources hold the reference of vfsmnt.
>>
>>>
>>> If a filesystem is exported but no clients hold state on it, then it's
>>> currently mostly chance whether the unmount succeeds or not.  So we're
>>> probably free to change the behavior in this case.  I'd be inclined to
>>> allow the unmount, but haven't thought this through carefully.
>>
>> If client mount a nfsserver succeed without holds state, 
>> nfs server umounts the exported filesystem, 
>> client also think the filesystem is valid, but it is umounted.
> 
> This is no different from "exportfs -au" being run on the server, thus
> unexporting the filesystem and making in unavailable to the client, even
> though the client has it mounted.

No, I don't think so.
If a user runs "exportfs -au" to flush the caches, I think he knows
what the effect will be. With an umount of a filesystem, though,
he may not know that it also flushes nfsd's export cache.

For a filesystem that nfsd is exporting, I'd prefer the umount to fail,
because I may not realize that nfsd is exporting it.

I also think nfsd should allow umount of an unexported filesystem,
because the user has the right to umount it.

> 
> I think we need to give the server admin control of their filesystems, and
> assume they won't do something that they don't really want to do.
> 
>>>
>>> It could also be useful to have the ability to force an unmount even in
>>> the presence of locks.  That's not a safe default, but an
>>> "allow_force_unmount" export option might be useful.
> 
> We already have a mechanism to forcibly drop any locks by writing some magic
> to /proc/fs/nfsd/unlock_{ip,filesystem}.  I don't think we need any more.

No, I don't agree.
If any state (e.g. LOCKs/DELEGATIONs/LAYOUTs) exists, nfsd should not allow
the user to umount the filesystem; a client may still be working on those files.
We shouldn't clear that state, because its lifetime is controlled by the expiry time.

thanks,
Kinglong Mee

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
       [not found]                 ` <555343CA.6010307-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-05-13 12:55                   ` Kinglong Mee
  0 siblings, 0 replies; 18+ messages in thread
From: Kinglong Mee @ 2015-05-13 12:55 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust

On 5/13/2015 8:30 PM, Kinglong Mee wrote:
> On 5/13/2015 12:25 PM, NeilBrown wrote:
>> On Mon, 11 May 2015 21:08:47 +0800 Kinglong Mee <kinglongmee@gmail.com> wrote:
>>
>>> On 5/8/2015 9:47 PM, J. Bruce Fields wrote:
>>>> On Fri, May 08, 2015 at 02:40:31PM +1000, NeilBrown wrote:
>>>>> Thanks for this patch.  It looks good!
>>>>>
>>>>> My only comment on the code is that I would really like to see a
>>>>> "path_get_pin()" and "path_put_unpin()" rather than open coding:
>>>>>
>>>>>> +	dget(item->ek_path.dentry);
>>>>>> +	pin_insert_group(&new->ek_pin, item->ek_path.mnt, NULL);
>>>>>
>>>>> and 
>>>>>
>>>>>> +		dput(key->ek_path.dentry);
>>>>>> +		pin_remove(&key->ek_pin);
>>>>>
>>>>>
>>>>> But the question you raise is an important one:  Exactly which filesystems
>>>>> should be allowed to be unmounted?
>>>>> This is a change in behaviour - is it one that people uniformly would want?
>>>>>
>>>>> The kernel doesn't currently know which file systems were explicitly listed
>>>>> in /etc/exports, and which were found by following a 'crossmnt'.
>>>>> It could guess and allow the unmounting of anything below a 'crossmnt', but I
>>>>> wouldn't be comfortable with that - it is error prone.
>>>>>
>>>>> mountd does know what is in /etc/exports, and could tell the kernel.
>>>>> For the expkey cache, we could always use path_get_pin.
>>>>> For the export cache (where flags are available) we could use path_get
>>>>> or path_get_pin depending on some new flag.
>>>>>
>>>>> I'm not really sure it is worth it.  I would rather the filesystems could
>>>>> always be unmounted.  But doing that could possibly break someone's work
>>>>> flow.  Maybe.
>>>>>
>>>>> Or maybe I'm seeing problems where there aren't any.
>>>>>
>>>>> Anyone else have an opinion?
>>>>
>>>> The undisputed bug here was negative cache entries preventing unmount.
>>>> So most conservative might be just to purge negative entries.
>>>
>>> I'd like this,
>>> if the cache is valid, user should not be allowed to umount the filesystem.
>>>
>>>>
>>>> Otherwise, the only guarantees I think we've really had is that we won't
>>>> allow unmount if you hold any actual state on the filesystem (NLM locks,
>>>> NFSv4 locks, opens, or delegations).
>>>
>>> Those resources hold the reference of vfsmnt.
>>>
>>>>
>>>> If a filesystem is exported but no clients hold state on it, then it's
>>>> currently mostly chance whether the unmount succeeds or not.  So we're
>>>> probably free to change the behavior in this case.  I'd be inclined to
>>>> allow the unmount, but haven't thought this through carefully.
>>>
>>> If client mount a nfsserver succeed without holds state, 
>>> nfs server umounts the exported filesystem, 
>>> client also think the filesystem is valid, but it is umounted.
>>
>> This is no different from "exportfs -au" being run on the server, thus
>> unexporting the filesystem and making in unavailable to the client, even
>> though the client has it mounted.
> 
> No, I don't think so.
> If user using "exportfs -au" to flush caches, I think he known
> what the influence of he does, but an umount of filesystem, 
> maybe he doesn't known that contains flushing nfsd's exports cache.
> 
> For an using of nfsd exports, I'd like an error of an umount,
> because I don't realize the exports for nfsd.
> 
> I also think nfsd should allowing umount of unexported filesystem,
> because user has the right to umount it.

The following is a draft diff that allows umounting an unexported filesystem.

thanks,
Kinglong Mee

------------------------------------------------------------------------
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index f79521a..bcaa914 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -309,11 +309,17 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc)
 static void svc_export_put(struct kref *ref)
 {
 	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
-	path_put(&exp->ex_path);
+
+	if (exp->ex_pin.kill) {
+		dput(exp->ex_path.dentry);
+		pin_remove(&exp->ex_pin);
+	} else
+		path_put(&exp->ex_path);
+
 	auth_domain_put(exp->ex_client);
 	nfsd4_fslocs_free(&exp->ex_fslocs);
 	kfree(exp->ex_uuid);
-	kfree(exp);
+	kfree_rcu(exp, rcu_head);
 }
 
 static void svc_export_request(struct cache_detail *cd,
@@ -699,6 +705,7 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
 	struct svc_export *new = container_of(cnew, struct svc_export, h);
 	struct svc_export *item = container_of(citem, struct svc_export, h);
 
+	init_fs_pin(&new->ex_pin, NULL);
 	kref_get(&item->ex_client->ref);
 	new->ex_client = item->ex_client;
 	new->ex_path = item->ex_path;
@@ -738,6 +745,24 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
 	}
 }
 
+static void export_pin_kill(struct fs_pin *pin)
+{
+	struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);
+	cache_force_expire(exp->cd, &exp->h);
+}
+
+static void export_update_negative(struct cache_head *cnew, struct cache_head *citem)
+{
+	struct svc_export *new = container_of(cnew, struct svc_export, h);
+
+	if (!test_bit(CACHE_NEGATIVE, &new->h.flags))
+		return;
+
+	init_fs_pin(&new->ex_pin, export_pin_kill);
+	pin_insert_group(&new->ex_pin, new->ex_path.mnt, NULL);
+	mntput(new->ex_path.mnt);
+}
+
 static struct cache_head *svc_export_alloc(void)
 {
 	struct svc_export *i = kmalloc(sizeof(*i), GFP_KERNEL);
@@ -758,6 +783,7 @@ static struct cache_detail svc_export_cache_template = {
 	.match		= svc_export_match,
 	.init		= svc_export_init,
 	.update		= export_update,
+	.update_negative= export_update_negative,
 	.alloc		= svc_export_alloc,
 };
 
diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index 1f52bfc..c764a8e 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -4,6 +4,7 @@
 #ifndef NFSD_EXPORT_H
 #define NFSD_EXPORT_H
 
+#include <linux/fs_pin.h>
 #include <linux/sunrpc/cache.h>
 #include <uapi/linux/nfsd/export.h>
 
@@ -46,6 +47,8 @@ struct exp_flavor_info {
 
 struct svc_export {
 	struct cache_head	h;
+	struct cache_detail	*cd;
+
 	struct auth_domain *	ex_client;
 	int			ex_flags;
 	struct path		ex_path;
@@ -58,7 +61,9 @@ struct svc_export {
 	struct exp_flavor_info	ex_flavors[MAX_SECINFO_LIST];
 	enum pnfs_layouttype	ex_layout_type;
 	struct nfsd4_deviceid_map *ex_devid_map;
-	struct cache_detail	*cd;
+
+	struct fs_pin		ex_pin;
+	struct rcu_head		rcu_head;
 };
 
 /* an "export key" (expkey) maps a filehandlefragement to an
diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 437ddb6..39b31b5 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -101,6 +101,7 @@ struct cache_detail {
 	int			(*match)(struct cache_head *orig, struct cache_head *new);
 	void			(*init)(struct cache_head *orig, struct cache_head *new);
 	void			(*update)(struct cache_head *orig, struct cache_head *new);
+	void			(*update_negative)(struct cache_head *orig, struct cache_head *new);
 
 	/* fields below this comment are for internal use
 	 * and should not be touched by cache owners
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 2928aff..4a95dee 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -149,9 +149,11 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
 	if (!test_bit(CACHE_VALID, &old->flags)) {
 		write_lock(&detail->hash_lock);
 		if (!test_bit(CACHE_VALID, &old->flags)) {
-			if (test_bit(CACHE_NEGATIVE, &new->flags))
+			if (test_bit(CACHE_NEGATIVE, &new->flags)) {
 				set_bit(CACHE_NEGATIVE, &old->flags);
-			else
+				if (detail->update_negative)
+					detail->update_negative(old, new);
+			} else
 				detail->update(old, new);
 			cache_fresh_locked(old, new->expiry_time);
 			write_unlock(&detail->hash_lock);
@@ -171,9 +173,11 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
 	head = &detail->hash_table[hash];
 
 	write_lock(&detail->hash_lock);
-	if (test_bit(CACHE_NEGATIVE, &new->flags))
+	if (test_bit(CACHE_NEGATIVE, &new->flags)) {
 		set_bit(CACHE_NEGATIVE, &tmp->flags);
-	else
+		if (detail->update_negative)
+			detail->update_negative(tmp, new);
+	} else
 		detail->update(tmp, new);
 	tmp->next = *head;
 	*head = tmp;
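
The difference between the two reference styles used in the draft above can be
illustrated with a small userspace sketch in C. This is a toy model under
stated assumptions, not kernel code: every toy_* name is hypothetical. It only
mirrors the idea that a hard reference (as taken by mntget()/path_get()) makes
umount fail, while a pin (as inserted with pin_insert_group()) lets umount
proceed after the pin's kill callback has run, which is where the draft calls
cache_force_expire().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_pin;
typedef void (*toy_kill_fn)(struct toy_pin *);

/* Hypothetical stand-in for struct fs_pin: a callback plus a list link. */
struct toy_pin {
	toy_kill_fn kill;
	struct toy_pin *next;
};

/* Hypothetical stand-in for a mount: hard refs block umount, pins do not. */
struct toy_mount {
	int refcount;         /* hard references, like mntget() */
	struct toy_pin *pins; /* soft references, notified at umount */
	bool mounted;
};

void toy_mount_init(struct toy_mount *m)
{
	m->refcount = 0;
	m->pins = NULL;
	m->mounted = true;
}

void toy_mntget(struct toy_mount *m) { m->refcount++; }

void toy_mntput(struct toy_mount *m)
{
	assert(m->refcount > 0);
	m->refcount--;
}

void toy_pin_insert(struct toy_pin *p, struct toy_mount *m, toy_kill_fn kill)
{
	p->kill = kill;
	p->next = m->pins;
	m->pins = p;
}

/* Returns 0 on success, -1 (think EBUSY) while a hard reference remains. */
int toy_umount(struct toy_mount *m)
{
	if (m->refcount > 0)
		return -1;
	while (m->pins) {
		struct toy_pin *p = m->pins;

		m->pins = p->next;
		p->next = NULL;
		if (p->kill)
			p->kill(p); /* the owner drops its cached state here */
	}
	m->mounted = false;
	return 0;
}

/* A cache entry that embeds a pin and expires itself when killed. */
struct toy_cache_entry {
	struct toy_pin pin;
	bool expired;
};

void toy_entry_kill(struct toy_pin *p)
{
	struct toy_cache_entry *e = (struct toy_cache_entry *)
		((char *)p - offsetof(struct toy_cache_entry, pin));

	e->expired = true;
}
```

With this model, a mount holding one toy_mntget() reference refuses
toy_umount(), while a mount that is only pinned by a toy_cache_entry is
unmounted and the entry is marked expired.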

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
       [not found]           ` <5550A9DF.1070908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2015-05-13  4:25             ` NeilBrown
@ 2015-05-15 21:09             ` J. Bruce Fields
  1 sibling, 0 replies; 18+ messages in thread
From: J. Bruce Fields @ 2015-05-15 21:09 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: NeilBrown, linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust

On Mon, May 11, 2015 at 09:08:47PM +0800, Kinglong Mee wrote:
> On 5/8/2015 9:47 PM, J. Bruce Fields wrote:
> > On Fri, May 08, 2015 at 02:40:31PM +1000, NeilBrown wrote:
> >> Thanks for this patch.  It looks good!
> >>
> >> My only comment on the code is that I would really like to see a
> >> "path_get_pin()" and "path_put_unpin()" rather than open coding:
> >>
> >>> +	dget(item->ek_path.dentry);
> >>> +	pin_insert_group(&new->ek_pin, item->ek_path.mnt, NULL);
> >>
> >> and 
> >>
> >>> +		dput(key->ek_path.dentry);
> >>> +		pin_remove(&key->ek_pin);
> >>
> >>
> >> But the question you raise is an important one:  Exactly which filesystems
> >> should be allowed to be unmounted?
> >> This is a change in behaviour - is it one that people uniformly would want?
> >>
> >> The kernel doesn't currently know which file systems were explicitly listed
> >> in /etc/exports, and which were found by following a 'crossmnt'.
> >> It could guess and allow the unmounting of anything below a 'crossmnt', but I
> >> wouldn't be comfortable with that - it is error prone.
> >>
> >> mountd does know what is in /etc/exports, and could tell the kernel.
> >> For the expkey cache, we could always use path_get_pin.
> >> For the export cache (where flags are available) we could use path_get
> >> or path_get_pin depending on some new flag.
> >>
> >> I'm not really sure it is worth it.  I would rather the filesystems could
> >> always be unmounted.  But doing that could possibly break someone's work
> >> flow.  Maybe.
> >>
> >> Or maybe I'm seeing problems where there aren't any.
> >>
> >> Anyone else have an opinion?
> > 
> > The undisputed bug here was negative cache entries preventing unmount.
> > So most conservative might be just to purge negative entries.
> 
> I'd like this,
> if the cache is valid, user should not be allowed to umount the filesystem.
> 
> > 
> > Otherwise, the only guarantees I think we've really had is that we won't
> > allow unmount if you hold any actual state on the filesystem (NLM locks,
> > NFSv4 locks, opens, or delegations).
> 
> Those resources hold the reference of vfsmnt.
> 
> > 
> > If a filesystem is exported but no clients hold state on it, then it's
> > currently mostly chance whether the unmount succeeds or not.  So we're
> > probably free to change the behavior in this case.  I'd be inclined to
> > allow the unmount, but haven't thought this through carefully.
> 
> If client mount a nfsserver succeed without holds state, 
> nfs server umounts the exported filesystem, 
> client also think the filesystem is valid, but it is umounted.

People do sometimes want that even when state's held.

The case I've seen is migration of individual exports (or sets of
exports) on shared block storage, using a floating IP--I think the
sequence is: shut down the new server, move the floating IP to the new
server, then unexport and unmount on the old server, then mount on the
new server, export, and restart the new server.

Or maybe they really just want to unmount something and don't mind
client applications erroring out.

--b.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
       [not found]               ` <20150513142515.6bd881c8-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2015-05-15 21:11                 ` J. Bruce Fields
  2015-05-15 23:23                   ` NeilBrown
  0 siblings, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2015-05-15 21:11 UTC (permalink / raw)
  To: NeilBrown
  Cc: Kinglong Mee, linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust

On Wed, May 13, 2015 at 02:25:15PM +1000, NeilBrown wrote:
> On Mon, 11 May 2015 21:08:47 +0800 Kinglong Mee <kinglongmee@gmail.com> wrote:
> 
> > On 5/8/2015 9:47 PM, J. Bruce Fields wrote:
> > > It could also be useful to have the ability to force an unmount even in
> > > the presence of locks.  That's not a safe default, but an
> > > "allow_force_unmount" export option might be useful.
> 
> We already have a mechanism to forcibly drop any locks by writing some magic
> to /proc/fs/nfsd/unlock_{ip,filesystem}.  I don't think we need any more.

Yeah, I remember thinking this sort of approach would have advantages,
maybe I was wrong, I need to revisit it.

The unlock_{ip,filesystem} approach requires temporarily shutting down
mountd, doesn't it?

--b.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
  2015-05-15 21:11                 ` J. Bruce Fields
@ 2015-05-15 23:23                   ` NeilBrown
  2015-05-22 15:02                     ` Kinglong Mee
  0 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2015-05-15 23:23 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Kinglong Mee, linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust

On Fri, 15 May 2015 17:11:34 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Wed, May 13, 2015 at 02:25:15PM +1000, NeilBrown wrote:
> > On Mon, 11 May 2015 21:08:47 +0800 Kinglong Mee <kinglongmee@gmail.com> wrote:
> > 
> > > On 5/8/2015 9:47 PM, J. Bruce Fields wrote:
> > > > It could also be useful to have the ability to force an unmount even in
> > > > the presence of locks.  That's not a safe default, but an
> > > > "allow_force_unmount" export option might be useful.
> > 
> > We already have a mechanism to forcibly drop any locks by writing some magic
> > to /proc/fs/nfsd/unlock_{ip,filesystem}.  I don't think we need any more.
> 
> Yeah, I remember thinking this sort of approach would have advantages,
> maybe I was wrong, I need to revisit it.
> 
> The unlock_{ip,filesystem} approach requires temporarily shutting down
> mountd, doesn't it?

Not necessarily.
It does require ensuring that new locks aren't suddenly taken though.

I imagine an early step in the migration process is to "ifconfig down" the
virtual interface with the floating IP.  Then you can safely "unlock" and
unmount any filesystems that are only accessed via that IP.

But you are right that using the "unlock_*" interface and then unmounting is
racy in a way that we are trying to make "unmount" not racy.  So maybe an
"allow_force_unmount" would have a place.

Thanks,
NeilBrown
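
The "unlock" step mentioned above amounts to writing a string to a control
file. A minimal sketch in C, assuming the interface is the one named in this
thread (/proc/fs/nfsd/unlock_ip takes a server IP address and
/proc/fs/nfsd/unlock_filesystem takes a path). The nfsd_* function names are
hypothetical helpers, not an existing library, and as a later message in this
thread points out, these files only release NLM locks.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write "value\n" to an existing control file.
 * Returns 0 on success or a negative errno value on failure.
 */
int nfsd_ctl_write(const char *ctl_path, const char *value)
{
	char buf[4096];
	size_t len;
	ssize_t n;
	int fd, err = 0;

	assert(ctl_path != NULL && value != NULL);

	len = strlen(value);
	if (len + 1 > sizeof(buf))
		return -ENAMETOOLONG;
	memcpy(buf, value, len);
	buf[len] = '\n';

	fd = open(ctl_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, len + 1);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len + 1)
		err = -EIO; /* treat a short write as an error */

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}

/* Ask nfsd to drop locks held on the filesystem mounted at mountpoint. */
int nfsd_unlock_filesystem(const char *mountpoint)
{
	return nfsd_ctl_write("/proc/fs/nfsd/unlock_filesystem", mountpoint);
}

/* Ask nfsd to drop locks that were acquired through the given server IP. */
int nfsd_unlock_ip(const char *server_ip)
{
	return nfsd_ctl_write("/proc/fs/nfsd/unlock_ip", server_ip);
}
```

A migration script would take the floating IP down first, then call
nfsd_unlock_ip() or nfsd_unlock_filesystem(), and only then umount. The race
between the unlock and the umount remains unless new lock requests are kept
out in the meantime.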

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
  2015-05-15 23:23                   ` NeilBrown
@ 2015-05-22 15:02                     ` Kinglong Mee
  2015-05-22 16:03                       ` J. Bruce Fields
  0 siblings, 1 reply; 18+ messages in thread
From: Kinglong Mee @ 2015-05-22 15:02 UTC (permalink / raw)
  To: NeilBrown, J. Bruce Fields
  Cc: linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust

On 5/16/2015 7:23 AM, NeilBrown wrote:
> On Fri, 15 May 2015 17:11:34 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
>> On Wed, May 13, 2015 at 02:25:15PM +1000, NeilBrown wrote:
>>> On Mon, 11 May 2015 21:08:47 +0800 Kinglong Mee <kinglongmee@gmail.com> wrote:
>>>
>>>> On 5/8/2015 9:47 PM, J. Bruce Fields wrote:
>>>>> It could also be useful to have the ability to force an unmount even in
>>>>> the presence of locks.  That's not a safe default, but an
>>>>> "allow_force_unmount" export option might be useful.
>>>
>>> We already have a mechanism to forcibly drop any locks by writing some magic
>>> to /proc/fs/nfsd/unlock_{ip,filesystem}.  I don't think we need any more.
>>
>> Yeah, I remember thinking this sort of approach would have advantages,
>> maybe I was wrong, I need to revisit it.
>>
>> The unlock_{ip,filesystem} approach requires temporarily shutting down
>> mountd, doesn't it?
> 
> Not necessarily.
> It does require ensuring that new locks aren't suddenly taken though.
> 
> I imagine an early step in the migration process is to "ifconfig down" the
> virtual interface with the floating ID.  Then you can safely "unlock" and
> unmount any filesystems are that only accessed via the IP.
> 
> But you are right that using the "unlock_*" interface and then unmounting is
> racy in a way that we are trying to make "unmount" not racy.  So maybe an
> "allow_force_unmount" would have a place.

No, unlock_{ip,filesystem} only handle NLM locks; they don't support NFSv4 state.
Some other interfaces under /sys/kernel/debug/nfsd/forget_* do support NFSv4 state,
but not on a per-filesystem basis. It seems they will be removed at some point.

thanks,
Kinglong Mee

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget
  2015-05-22 15:02                     ` Kinglong Mee
@ 2015-05-22 16:03                       ` J. Bruce Fields
  0 siblings, 0 replies; 18+ messages in thread
From: J. Bruce Fields @ 2015-05-22 16:03 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: NeilBrown, linux-fsdevel, linux-nfs, Al Viro, Trond Myklebust

On Fri, May 22, 2015 at 11:02:25PM +0800, Kinglong Mee wrote:
> On 5/16/2015 7:23 AM, NeilBrown wrote:
> > On Fri, 15 May 2015 17:11:34 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > wrote:
> > 
> >> On Wed, May 13, 2015 at 02:25:15PM +1000, NeilBrown wrote:
> >>> On Mon, 11 May 2015 21:08:47 +0800 Kinglong Mee <kinglongmee@gmail.com> wrote:
> >>>
> >>>> On 5/8/2015 9:47 PM, J. Bruce Fields wrote:
> >>>>> It could also be useful to have the ability to force an unmount even in
> >>>>> the presence of locks.  That's not a safe default, but an
> >>>>> "allow_force_unmount" export option might be useful.
> >>>
> >>> We already have a mechanism to forcibly drop any locks by writing some magic
> >>> to /proc/fs/nfsd/unlock_{ip,filesystem}.  I don't think we need any more.
> >>
> >> Yeah, I remember thinking this sort of approach would have advantages,
> >> maybe I was wrong, I need to revisit it.
> >>
> >> The unlock_{ip,filesystem} approach requires temporarily shutting down
> >> mountd, doesn't it?
> > 
> > Not necessarily.
> > It does require ensuring that new locks aren't suddenly taken though.
> > 
> > I imagine an early step in the migration process is to "ifconfig down" the
> > virtual interface with the floating ID.  Then you can safely "unlock" and
> > unmount any filesystems are that only accessed via the IP.
> > 
> > But you are right that using the "unlock_*" interface and then unmounting is
> > racy in a way that we are trying to make "unmount" not racy.  So maybe an
> > "allow_force_unmount" would have a place.
> 
> No, unlock_{ip,filesystem} are used for nlmlock, doesn't support nfsv4 resources.

I still prefer the "allow_force_unmount" option, but maybe we should
also fix unlock_{ip,filesystem} to deal with nfsv4.  (Though I think
it's a little less well-defined there due to the possibility of
trunking.)

> Some other interfaces under /sys/kernel/debug/nfsd/forget_* support nfsv4 resources,
> without for an filesystem. It seems will be removed sometime.

We definitely don't want people to depend on those for anything other
than testing clients.

I don't think they'd be practical for this use.  forget_client comes the
closest, but you'd have to figure out the ip address of every client you
want to forget.  If there's a risk people might try to really use that,
then maybe we should go for scarier warnings and/or remove that
particular interface.

--b.



Thread overview: 18+ messages
2015-05-06 13:18 [PATCH 0/4] NFSD: Pin to vfsmount instead of mntget for export cache Kinglong Mee
     [not found] ` <554A149B.5060102-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-05-06 13:19   ` [PATCH 1/4] fs_pin: Fix uninitialized value in fs_pin Kinglong Mee
2015-05-07 19:43     ` J. Bruce Fields
     [not found]       ` <20150507194335.GA16527-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-05-08  0:36         ` Kinglong Mee
2015-05-06 13:19   ` [PATCH 2/4] fs_pin: Export functions for specific filesystem Kinglong Mee
2015-05-06 13:20   ` [PATCH 3/4] sunrpc: New helper cache_force_expire for cache cleanup Kinglong Mee
2015-05-06 13:21   ` [PATCH 4/4] nfsd: Pin to vfsmnt instead of mntget Kinglong Mee
2015-05-08  4:40     ` NeilBrown
2015-05-08 13:47       ` J. Bruce Fields
2015-05-11 13:08         ` Kinglong Mee
     [not found]           ` <5550A9DF.1070908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-05-13  4:25             ` NeilBrown
2015-05-13 12:30               ` Kinglong Mee
     [not found]                 ` <555343CA.6010307-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-05-13 12:55                   ` Kinglong Mee
     [not found]               ` <20150513142515.6bd881c8-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2015-05-15 21:11                 ` J. Bruce Fields
2015-05-15 23:23                   ` NeilBrown
2015-05-22 15:02                     ` Kinglong Mee
2015-05-22 16:03                       ` J. Bruce Fields
2015-05-15 21:09             ` J. Bruce Fields
