All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4 v2] NFSD: Pin to vfsmount for some nfsd exports cache
@ 2015-05-24 15:01 ` Kinglong Mee
  0 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:01 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

If there are some mount points(not exported for nfs) under pseudo root,
after client's operation of those entry under the root, anyone *can't*
unmount those mount points until export cache expired.

# cat /etc/exports
/nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
# ll /nfs/
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
# mount /dev/sde /nfs/test
# df
Filesystem                      1K-blocks    Used Available Use% Mounted on
......
/dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
/dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
/dev/sde                           999320    1284    929224   1% /nfs/test
# mount -t nfs 127.0.0.1:/nfs/ /mnt
# ll /mnt/*/
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp

/mnt/xfs/:
total 0
# umount /nfs/test/
umount: /nfs/test/: target is busy
        (In some cases useful info about processes that
         use the device is found by lsof(8) or fuser(1).)

I don't think that's user expect, they want umount /nfs/test/.

It's caused by exports cache of nfsd holds the reference of
the path (here is /nfs/test/), so, it can't be umounted.

v1 --> v2,
1. Adds an option named "allow_umount" for exports allowing user
   un-mounting the filesystem where nfsd exports base on.
2. New helpers path_get_pin/path_put_unpin for path pin.
3. Update exports according to the "allow_umount" option.

Kinglong Mee (5):
  fs_pin: Fix uninitialized value in fs_pin
  fs_pin: Export functions for specific filesystem
  path: New helpers path_get_pin/path_put_unpin for path pin
  sunrpc: New helper cache_force_expire for cache cleanup
  nfsd: Allows user un-mounting filesystem where nfsd exports base on

 fs/fs_pin.c                      |  3 +++
 fs/namei.c                       | 26 ++++++++++++++++++++
 fs/nfsd/export.c                 | 52 ++++++++++++++++++++++++++++++++--------
 fs/nfsd/export.h                 | 11 ++++++++-
 include/linux/fs_pin.h           |  6 +++++
 include/linux/path.h             |  4 ++++
 include/linux/sunrpc/cache.h     | 11 +++++++++
 include/uapi/linux/nfsd/export.h |  3 ++-
 8 files changed, 104 insertions(+), 12 deletions(-)

-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 0/4 v2] NFSD: Pin to vfsmount for some nfsd exports cache
@ 2015-05-24 15:01 ` Kinglong Mee
  0 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:01 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel, linux-nfs, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee

If there are some mount points(not exported for nfs) under pseudo root,
after client's operation of those entry under the root, anyone *can't*
unmount those mount points until export cache expired.

# cat /etc/exports
/nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
# ll /nfs/
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
# mount /dev/sde /nfs/test
# df
Filesystem                      1K-blocks    Used Available Use% Mounted on
......
/dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
/dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
/dev/sde                           999320    1284    929224   1% /nfs/test
# mount -t nfs 127.0.0.1:/nfs/ /mnt
# ll /mnt/*/
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp

/mnt/xfs/:
total 0
# umount /nfs/test/
umount: /nfs/test/: target is busy
        (In some cases useful info about processes that
         use the device is found by lsof(8) or fuser(1).)

I don't think that's user expect, they want umount /nfs/test/.

It's caused by exports cache of nfsd holds the reference of
the path (here is /nfs/test/), so, it can't be umounted.

v1 --> v2,
1. Adds an option named "allow_umount" for exports allowing user
   un-mounting the filesystem where nfsd exports base on.
2. New helpers path_get_pin/path_put_unpin for path pin.
3. Update exports according to the "allow_umount" option.

Kinglong Mee (5):
  fs_pin: Fix uninitialized value in fs_pin
  fs_pin: Export functions for specific filesystem
  path: New helpers path_get_pin/path_put_unpin for path pin
  sunrpc: New helper cache_force_expire for cache cleanup
  nfsd: Allows user un-mounting filesystem where nfsd exports base on

 fs/fs_pin.c                      |  3 +++
 fs/namei.c                       | 26 ++++++++++++++++++++
 fs/nfsd/export.c                 | 52 ++++++++++++++++++++++++++++++++--------
 fs/nfsd/export.h                 | 11 ++++++++-
 include/linux/fs_pin.h           |  6 +++++
 include/linux/path.h             |  4 ++++
 include/linux/sunrpc/cache.h     | 11 +++++++++
 include/uapi/linux/nfsd/export.h |  3 ++-
 8 files changed, 104 insertions(+), 12 deletions(-)

-- 
2.4.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5 v2] fs_pin: Fix uninitialized value in fs_pin
  2015-05-24 15:01 ` Kinglong Mee
@ 2015-05-24 15:10     ` Kinglong Mee
  -1 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

Without initialized, done in fs_pin at stack space may
contains strange value.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/fs_pin.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
index 3886b3b..0dde7b7 100644
--- a/include/linux/fs_pin.h
+++ b/include/linux/fs_pin.h
@@ -1,3 +1,6 @@
+#ifndef _LINUX_FS_PIN_H
+#define _LINUX_FS_PIN_H
+
 #include <linux/wait.h>
 
 struct fs_pin {
@@ -16,9 +19,12 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
 	INIT_HLIST_NODE(&p->s_list);
 	INIT_HLIST_NODE(&p->m_list);
 	p->kill = kill;
+	p->done = 0;
 }
 
 void pin_remove(struct fs_pin *);
 void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
 void pin_insert(struct fs_pin *, struct vfsmount *);
 void pin_kill(struct fs_pin *);
+
+#endif
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 1/5 v2] fs_pin: Fix uninitialized value in fs_pin
@ 2015-05-24 15:10     ` Kinglong Mee
  0 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel, linux-nfs, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee

Without initialized, done in fs_pin at stack space may
contains strange value.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 include/linux/fs_pin.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
index 3886b3b..0dde7b7 100644
--- a/include/linux/fs_pin.h
+++ b/include/linux/fs_pin.h
@@ -1,3 +1,6 @@
+#ifndef _LINUX_FS_PIN_H
+#define _LINUX_FS_PIN_H
+
 #include <linux/wait.h>
 
 struct fs_pin {
@@ -16,9 +19,12 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
 	INIT_HLIST_NODE(&p->s_list);
 	INIT_HLIST_NODE(&p->m_list);
 	p->kill = kill;
+	p->done = 0;
 }
 
 void pin_remove(struct fs_pin *);
 void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
 void pin_insert(struct fs_pin *, struct vfsmount *);
 void pin_kill(struct fs_pin *);
+
+#endif
-- 
2.4.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/5 v2] fs_pin: Export functions for specific filesystem
  2015-05-24 15:01 ` Kinglong Mee
@ 2015-05-24 15:10     ` Kinglong Mee
  -1 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

Exports functions for others who want pin to vfsmount,
eg, nfsd's export cache.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 fs/fs_pin.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index 611b540..553e8b1 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -17,6 +17,7 @@ void pin_remove(struct fs_pin *pin)
 	wake_up_locked(&pin->wait);
 	spin_unlock_irq(&pin->wait.lock);
 }
+EXPORT_SYMBOL(pin_remove);
 
 void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
 {
@@ -26,11 +27,13 @@ void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head
 	hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
 	spin_unlock(&pin_lock);
 }
+EXPORT_SYMBOL(pin_insert_group);
 
 void pin_insert(struct fs_pin *pin, struct vfsmount *m)
 {
 	pin_insert_group(pin, m, &m->mnt_sb->s_pins);
 }
+EXPORT_SYMBOL(pin_insert);
 
 void pin_kill(struct fs_pin *p)
 {
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/5 v2] fs_pin: Export functions for specific filesystem
@ 2015-05-24 15:10     ` Kinglong Mee
  0 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel, linux-nfs, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee

Exports functions for others who want pin to vfsmount,
eg, nfsd's export cache.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/fs_pin.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index 611b540..553e8b1 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -17,6 +17,7 @@ void pin_remove(struct fs_pin *pin)
 	wake_up_locked(&pin->wait);
 	spin_unlock_irq(&pin->wait.lock);
 }
+EXPORT_SYMBOL(pin_remove);
 
 void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
 {
@@ -26,11 +27,13 @@ void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head
 	hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
 	spin_unlock(&pin_lock);
 }
+EXPORT_SYMBOL(pin_insert_group);
 
 void pin_insert(struct fs_pin *pin, struct vfsmount *m)
 {
 	pin_insert_group(pin, m, &m->mnt_sb->s_pins);
 }
+EXPORT_SYMBOL(pin_insert);
 
 void pin_kill(struct fs_pin *p)
 {
-- 
2.4.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/5 v2] path: New helpers path_get_pin/path_put_unpin for path pin
  2015-05-24 15:01 ` Kinglong Mee
@ 2015-05-24 15:10     ` Kinglong Mee
  -1 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

Two helpers for filesystem pining to vfsmnt, not mntget.

v2, new patch.

Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 fs/namei.c           | 26 ++++++++++++++++++++++++++
 include/linux/path.h |  4 ++++
 2 files changed, 30 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index fe30d3b..8f94e7c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -492,6 +492,32 @@ void path_put(const struct path *path)
 }
 EXPORT_SYMBOL(path_put);
 
+/**
+ * path_get_pin - get a reference to a path's dentry
+ *                and pin to path's vfsmnt
+ * @path: path to get the reference to
+ * @p: the fs_pin pin to vfsmnt
+ */
+void path_get_pin(struct path *path, struct fs_pin *p)
+{
+	dget(path->dentry);
+	pin_insert_group(p, path->mnt, NULL);
+}
+EXPORT_SYMBOL(path_get_pin);
+
+/**
+ * path_put_unpin - put a reference to a path's dentry
+ *                  and remove pin to path's vfsmnt
+ * @path: path to put the reference to
+ * @p: the fs_pin removed from vfsmnt
+ */
+void path_put_unpin(struct path *path, struct fs_pin *p)
+{
+	dput(path->dentry);
+	pin_remove(p);
+}
+EXPORT_SYMBOL(path_put_unpin);
+
 struct nameidata {
 	struct path	path;
 	struct qstr	last;
diff --git a/include/linux/path.h b/include/linux/path.h
index d137218..34599fb 100644
--- a/include/linux/path.h
+++ b/include/linux/path.h
@@ -3,6 +3,7 @@
 
 struct dentry;
 struct vfsmount;
+struct fs_pin;
 
 struct path {
 	struct vfsmount *mnt;
@@ -12,6 +13,9 @@ struct path {
 extern void path_get(const struct path *);
 extern void path_put(const struct path *);
 
+extern void path_get_pin(struct path *, struct fs_pin *);
+extern void path_put_unpin(struct path *, struct fs_pin *);
+
 static inline int path_equal(const struct path *path1, const struct path *path2)
 {
 	return path1->mnt == path2->mnt && path1->dentry == path2->dentry;
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/5 v2] path: New helpers path_get_pin/path_put_unpin for path pin
@ 2015-05-24 15:10     ` Kinglong Mee
  0 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel, linux-nfs, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee

Two helpers for filesystem pining to vfsmnt, not mntget.

v2, new patch.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/namei.c           | 26 ++++++++++++++++++++++++++
 include/linux/path.h |  4 ++++
 2 files changed, 30 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index fe30d3b..8f94e7c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -492,6 +492,32 @@ void path_put(const struct path *path)
 }
 EXPORT_SYMBOL(path_put);
 
+/**
+ * path_get_pin - get a reference to a path's dentry
+ *                and pin to path's vfsmnt
+ * @path: path to get the reference to
+ * @p: the fs_pin pin to vfsmnt
+ */
+void path_get_pin(struct path *path, struct fs_pin *p)
+{
+	dget(path->dentry);
+	pin_insert_group(p, path->mnt, NULL);
+}
+EXPORT_SYMBOL(path_get_pin);
+
+/**
+ * path_put_unpin - put a reference to a path's dentry
+ *                  and remove pin to path's vfsmnt
+ * @path: path to put the reference to
+ * @p: the fs_pin removed from vfsmnt
+ */
+void path_put_unpin(struct path *path, struct fs_pin *p)
+{
+	dput(path->dentry);
+	pin_remove(p);
+}
+EXPORT_SYMBOL(path_put_unpin);
+
 struct nameidata {
 	struct path	path;
 	struct qstr	last;
diff --git a/include/linux/path.h b/include/linux/path.h
index d137218..34599fb 100644
--- a/include/linux/path.h
+++ b/include/linux/path.h
@@ -3,6 +3,7 @@
 
 struct dentry;
 struct vfsmount;
+struct fs_pin;
 
 struct path {
 	struct vfsmount *mnt;
@@ -12,6 +13,9 @@ struct path {
 extern void path_get(const struct path *);
 extern void path_put(const struct path *);
 
+extern void path_get_pin(struct path *, struct fs_pin *);
+extern void path_put_unpin(struct path *, struct fs_pin *);
+
 static inline int path_equal(const struct path *path1, const struct path *path2)
 {
 	return path1->mnt == path2->mnt && path1->dentry == path2->dentry;
-- 
2.4.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/5 v2] sunrpc: New helper cache_force_expire for cache cleanup
  2015-05-24 15:01 ` Kinglong Mee
@ 2015-05-24 15:10     ` Kinglong Mee
  -1 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

Using expiry_time force cleanup a cache.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/sunrpc/cache.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 437ddb6..ce75e9c 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -210,6 +210,17 @@ extern int cache_check(struct cache_detail *detail,
 		       struct cache_head *h, struct cache_req *rqstp);
 extern void cache_flush(void);
 extern void cache_purge(struct cache_detail *detail);
+
+static inline void cache_force_expire(struct cache_detail *detail, struct cache_head *h)
+{
+	write_lock(&detail->hash_lock);
+	h->expiry_time = seconds_since_boot() - 1;
+	detail->nextcheck = seconds_since_boot();
+	write_unlock(&detail->hash_lock);
+
+	cache_flush();
+}
+
 #define NEVER (0x7FFFFFFF)
 extern void __init cache_initialize(void);
 extern int cache_register_net(struct cache_detail *cd, struct net *net);
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/5 v2] sunrpc: New helper cache_force_expire for cache cleanup
@ 2015-05-24 15:10     ` Kinglong Mee
  0 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel, linux-nfs, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee

Using expiry_time force cleanup a cache.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 include/linux/sunrpc/cache.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 437ddb6..ce75e9c 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -210,6 +210,17 @@ extern int cache_check(struct cache_detail *detail,
 		       struct cache_head *h, struct cache_req *rqstp);
 extern void cache_flush(void);
 extern void cache_purge(struct cache_detail *detail);
+
+static inline void cache_force_expire(struct cache_detail *detail, struct cache_head *h)
+{
+	write_lock(&detail->hash_lock);
+	h->expiry_time = seconds_since_boot() - 1;
+	detail->nextcheck = seconds_since_boot();
+	write_unlock(&detail->hash_lock);
+
+	cache_flush();
+}
+
 #define NEVER (0x7FFFFFFF)
 extern void __init cache_initialize(void);
 extern int cache_register_net(struct cache_detail *cd, struct net *net);
-- 
2.4.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/5] nfsd: allows user un-mounting filesystem where nfsd exports base on
  2015-05-24 15:01 ` Kinglong Mee
  (?)
  (?)
@ 2015-05-24 15:10 ` Kinglong Mee
  2015-06-05 15:02   ` Al Viro
  -1 siblings, 1 reply; 16+ messages in thread
From: Kinglong Mee @ 2015-05-24 15:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro, linux-fsdevel, linux-nfs, NeilBrown
  Cc: Trond Myklebust, Steve Dickson, kinglongmee

If there are some mount points(not exported for nfs) under pseudo root,
after client's operation of those entry under the root, anyone *can't*
unmount those mount points until export cache expired.

/nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
Filesystem                      1K-blocks    Used Available Use% Mounted on
......
/dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
/dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
/dev/sde                           999320    1284    929224   1% /nfs/test
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp

/mnt/xfs/:
total 0
umount: /nfs/test/: target is busy
        (In some cases useful info about processes that
	use the device is found by lsof(8) or fuser(1).)

I don't think that's user expect, they want umount /nfs/test/.

It's caused by exports cache of nfsd holds the reference of
the path (here is /nfs/test/), so, it can't be umounted.

v2,
1. Update exports according to the "allow_umount" option.
   Pin to vfsmnt default, change when updating.
2. Using kzalloc for all memory allocating without kmalloc.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/nfsd/export.c                 | 52 ++++++++++++++++++++++++++++++++--------
 fs/nfsd/export.h                 | 11 ++++++++-
 include/uapi/linux/nfsd/export.h |  3 ++-
 3 files changed, 54 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index f79521a..cc34b0b 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -43,9 +43,9 @@ static void expkey_put(struct kref *ref)
 
 	if (test_bit(CACHE_VALID, &key->h.flags) &&
 	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
-		path_put(&key->ek_path);
+		path_put_unpin(&key->ek_path, &key->ek_pin);
 	auth_domain_put(key->ek_client);
-	kfree(key);
+	kfree_rcu(key, rcu_head);
 }
 
 static void expkey_request(struct cache_detail *cd,
@@ -83,7 +83,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
 		return -EINVAL;
 	mesg[mlen-1] = 0;
 
-	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	err = -ENOMEM;
 	if (!buf)
 		goto out;
@@ -120,6 +120,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
 		goto out;
 
 	key.ek_client = dom;	
+	key.cd = cd;
 	key.ek_fsidtype = fsidtype;
 	memcpy(key.ek_fsid, buf, len);
 
@@ -210,6 +211,13 @@ static inline void expkey_init(struct cache_head *cnew,
 	new->ek_fsidtype = item->ek_fsidtype;
 
 	memcpy(new->ek_fsid, item->ek_fsid, sizeof(new->ek_fsid));
+	new->cd = item->cd;
+}
+
+static void expkey_pin_kill(struct fs_pin *pin)
+{
+	struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
+	cache_force_expire(key->cd, &key->h);
 }
 
 static inline void expkey_update(struct cache_head *cnew,
@@ -218,13 +226,14 @@ static inline void expkey_update(struct cache_head *cnew,
 	struct svc_expkey *new = container_of(cnew, struct svc_expkey, h);
 	struct svc_expkey *item = container_of(citem, struct svc_expkey, h);
 
+	init_fs_pin(&new->ek_pin, expkey_pin_kill);
 	new->ek_path = item->ek_path;
-	path_get(&item->ek_path);
+	path_get_pin(&new->ek_path, &new->ek_pin);
 }
 
 static struct cache_head *expkey_alloc(void)
 {
-	struct svc_expkey *i = kmalloc(sizeof(*i), GFP_KERNEL);
+	struct svc_expkey *i = kzalloc(sizeof(*i), GFP_KERNEL);
 	if (i)
 		return &i->h;
 	else
@@ -309,11 +318,16 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc)
 static void svc_export_put(struct kref *ref)
 {
 	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
-	path_put(&exp->ex_path);
+
+	if (EX_ALLOW_UMOUNT(exp))
+		path_put_unpin(&exp->ex_path, &exp->ex_pin);
+	else
+		path_put(&exp->ex_path);
+
 	auth_domain_put(exp->ex_client);
 	nfsd4_fslocs_free(&exp->ex_fslocs);
 	kfree(exp->ex_uuid);
-	kfree(exp);
+	kfree_rcu(exp, rcu_head);
 }
 
 static void svc_export_request(struct cache_detail *cd,
@@ -520,7 +534,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
 		return -EINVAL;
 	mesg[mlen-1] = 0;
 
-	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
 
@@ -694,15 +708,23 @@ static int svc_export_match(struct cache_head *a, struct cache_head *b)
 		path_equal(&orig->ex_path, &new->ex_path);
 }
 
+static void export_pin_kill(struct fs_pin *pin)
+{
+	struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);
+	cache_force_expire(exp->cd, &exp->h);
+}
+
 static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
 {
 	struct svc_export *new = container_of(cnew, struct svc_export, h);
 	struct svc_export *item = container_of(citem, struct svc_export, h);
 
+	init_fs_pin(&new->ex_pin, export_pin_kill);
 	kref_get(&item->ex_client->ref);
+	new->ex_flags = NFSEXP_ALLOW_UMOUNT;
 	new->ex_client = item->ex_client;
 	new->ex_path = item->ex_path;
-	path_get(&item->ex_path);
+	path_get_pin(&new->ex_path, &new->ex_pin);
 	new->ex_fslocs.locations = NULL;
 	new->ex_fslocs.locations_count = 0;
 	new->ex_fslocs.migrated = 0;
@@ -717,6 +739,14 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
 	struct svc_export *item = container_of(citem, struct svc_export, h);
 	int i;
 
+	if (!EX_ALLOW_UMOUNT(item)) {
+		path_get(&new->ex_path);
+		if (EX_ALLOW_UMOUNT(new))
+			path_put_unpin(&new->ex_path, &new->ex_pin);
+		else
+			path_put(&new->ex_path);
+	}
+
 	new->ex_flags = item->ex_flags;
 	new->ex_anon_uid = item->ex_anon_uid;
 	new->ex_anon_gid = item->ex_anon_gid;
@@ -740,7 +770,7 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
 
 static struct cache_head *svc_export_alloc(void)
 {
-	struct svc_export *i = kmalloc(sizeof(*i), GFP_KERNEL);
+	struct svc_export *i = kzalloc(sizeof(*i), GFP_KERNEL);
 	if (i)
 		return &i->h;
 	else
@@ -811,6 +841,7 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
 
 	key.ek_client = clp;
 	key.ek_fsidtype = fsid_type;
+	key.cd = cd;
 	memcpy(key.ek_fsid, fsidv, key_len(fsid_type));
 
 	ek = svc_expkey_lookup(cd, &key);
@@ -1159,6 +1190,7 @@ static struct flags {
 	{ NFSEXP_NOAUTHNLM, {"insecure_locks", ""}},
 	{ NFSEXP_V4ROOT, {"v4root", ""}},
 	{ NFSEXP_PNFS, {"pnfs", ""}},
+	{ NFSEXP_ALLOW_UMOUNT, {"allow_umount", ""}},
 	{ 0, {"", ""}}
 };
 
diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index 1f52bfc..1134875 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -4,6 +4,7 @@
 #ifndef NFSD_EXPORT_H
 #define NFSD_EXPORT_H
 
+#include <linux/fs_pin.h>
 #include <linux/sunrpc/cache.h>
 #include <uapi/linux/nfsd/export.h>
 
@@ -46,6 +47,8 @@ struct exp_flavor_info {
 
 struct svc_export {
 	struct cache_head	h;
+	struct cache_detail	*cd;
+
 	struct auth_domain *	ex_client;
 	int			ex_flags;
 	struct path		ex_path;
@@ -58,7 +61,9 @@ struct svc_export {
 	struct exp_flavor_info	ex_flavors[MAX_SECINFO_LIST];
 	enum pnfs_layouttype	ex_layout_type;
 	struct nfsd4_deviceid_map *ex_devid_map;
-	struct cache_detail	*cd;
+
+	struct fs_pin		ex_pin;
+	struct rcu_head		rcu_head;
 };
 
 /* an "export key" (expkey) maps a filehandlefragement to an
@@ -67,17 +72,21 @@ struct svc_export {
  */
 struct svc_expkey {
 	struct cache_head	h;
+	struct cache_detail	*cd;
 
 	struct auth_domain *	ek_client;
 	int			ek_fsidtype;
 	u32			ek_fsid[6];
 
 	struct path		ek_path;
+	struct fs_pin		ek_pin;
+	struct rcu_head		rcu_head;
 };
 
 #define EX_ISSYNC(exp)		(!((exp)->ex_flags & NFSEXP_ASYNC))
 #define EX_NOHIDE(exp)		((exp)->ex_flags & NFSEXP_NOHIDE)
 #define EX_WGATHER(exp)		((exp)->ex_flags & NFSEXP_GATHERED_WRITES)
+#define EX_ALLOW_UMOUNT(exp)	((exp)->ex_flags & NFSEXP_ALLOW_UMOUNT)
 
 int nfsexp_flags(struct svc_rqst *rqstp, struct svc_export *exp);
 __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp);
diff --git a/include/uapi/linux/nfsd/export.h b/include/uapi/linux/nfsd/export.h
index 0df7bd5..61aa8bb 100644
--- a/include/uapi/linux/nfsd/export.h
+++ b/include/uapi/linux/nfsd/export.h
@@ -51,9 +51,10 @@
  */
 #define	NFSEXP_V4ROOT		0x10000
 #define NFSEXP_PNFS		0x20000
+#define NFSEXP_ALLOW_UMOUNT	0x40000
 
 /* All flags that we claim to support.  (Note we don't support NOACL.) */
-#define NFSEXP_ALLFLAGS		0x3FE7F
+#define NFSEXP_ALLFLAGS		0x7FE7F
 
 /* The flags that may vary depending on security flavor: */
 #define NFSEXP_SECINFO_FLAGS	(NFSEXP_READONLY | NFSEXP_ROOTSQUASH \
-- 
2.4.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4 v2] NFSD: Pin to vfsmount for some nfsd exports cache
  2015-05-24 15:01 ` Kinglong Mee
                   ` (2 preceding siblings ...)
  (?)
@ 2015-06-01 18:21 ` J. Bruce Fields
  2015-06-02  1:41   ` Kinglong Mee
  -1 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2015-06-01 18:21 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: Al Viro, linux-fsdevel, linux-nfs, NeilBrown, Trond Myklebust,
	Steve Dickson

On Sun, May 24, 2015 at 11:01:56PM +0800, Kinglong Mee wrote:
> If there are some mount points(not exported for nfs) under pseudo root,
> after client's operation of those entry under the root, anyone *can't*
> unmount those mount points until export cache expired.

Thanks for the update, apologies for the delayed response.

> 
> # cat /etc/exports
> /nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
> /nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
> # ll /nfs/
> total 0
> drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
> drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
> drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
> # mount /dev/sde /nfs/test
> # df
> Filesystem                      1K-blocks    Used Available Use% Mounted on
> ......
> /dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
> /dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
> /dev/sde                           999320    1284    929224   1% /nfs/test
> # mount -t nfs 127.0.0.1:/nfs/ /mnt
> # ll /mnt/*/
> /mnt/pnfs/:
> total 0
> -rw-r--r--. 1 root root 0 Apr 21 22:23 attr
> drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp
> 
> /mnt/xfs/:
> total 0
> # umount /nfs/test/
> umount: /nfs/test/: target is busy
>         (In some cases useful info about processes that
>          use the device is found by lsof(8) or fuser(1).)
> 
> I don't think that's user expect, they want umount /nfs/test/.
> 
> It's caused by exports cache of nfsd holds the reference of
> the path (here is /nfs/test/), so, it can't be umounted.
> 
> v1 --> v2,
> 1. Adds an option named "allow_umount" for exports allowing user
>    un-mounting the filesystem where nfsd exports base on.

I don't think allow_umount is a useful option.  I'd rather just make the
code behave like allow_umount was on all the time.

The fact is nobody could ever *depend* on umount to fail on an exported
filesystem anyway.

I do think we might want a stronger "allow_umount" option that actually
revokes locks and such as necessary.  I just don't see the need for this
in between case.

--b.

> 2. New helpers path_get_pin/path_put_unpin for path pin.
> 3. Update exports according to the "allow_umount" option.
> 
> Kinglong Mee (5):
>   fs_pin: Fix uninitialized value in fs_pin
>   fs_pin: Export functions for specific filesystem
>   path: New helpers path_get_pin/path_put_unpin for path pin
>   sunrpc: New helper cache_force_expire for cache cleanup
>   nfsd: Allows user un-mounting filesystem where nfsd exports base on
> 
>  fs/fs_pin.c                      |  3 +++
>  fs/namei.c                       | 26 ++++++++++++++++++++
>  fs/nfsd/export.c                 | 52 ++++++++++++++++++++++++++++++++--------
>  fs/nfsd/export.h                 | 11 ++++++++-
>  include/linux/fs_pin.h           |  6 +++++
>  include/linux/path.h             |  4 ++++
>  include/linux/sunrpc/cache.h     | 11 +++++++++
>  include/uapi/linux/nfsd/export.h |  3 ++-
>  8 files changed, 104 insertions(+), 12 deletions(-)
> 
> -- 
> 2.4.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4 v2] NFSD: Pin to vfsmount for some nfsd exports cache
  2015-06-01 18:21 ` [PATCH 0/4 v2] NFSD: Pin to vfsmount for some nfsd exports cache J. Bruce Fields
@ 2015-06-02  1:41   ` Kinglong Mee
  0 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-06-02  1:41 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-fsdevel, linux-nfs, NeilBrown, Trond Myklebust,
	Steve Dickson

On 6/2/2015 2:21 AM, J. Bruce Fields wrote:
> On Sun, May 24, 2015 at 11:01:56PM +0800, Kinglong Mee wrote:
>> If there are some mount points(not exported for nfs) under pseudo root,
>> after client's operation of those entry under the root, anyone *can't*
>> unmount those mount points until export cache expired.
> 
> Thanks for the update, apologies for the delayed response.
> 
>>
>> # cat /etc/exports
>> /nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
>> /nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
>> # ll /nfs/
>> total 0
>> drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
>> drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
>> drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
>> # mount /dev/sde /nfs/test
>> # df
>> Filesystem                      1K-blocks    Used Available Use% Mounted on
>> ......
>> /dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
>> /dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
>> /dev/sde                           999320    1284    929224   1% /nfs/test
>> # mount -t nfs 127.0.0.1:/nfs/ /mnt
>> # ll /mnt/*/
>> /mnt/pnfs/:
>> total 0
>> -rw-r--r--. 1 root root 0 Apr 21 22:23 attr
>> drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp
>>
>> /mnt/xfs/:
>> total 0
>> # umount /nfs/test/
>> umount: /nfs/test/: target is busy
>>         (In some cases useful info about processes that
>>          use the device is found by lsof(8) or fuser(1).)
>>
>> I don't think that's user expect, they want umount /nfs/test/.
>>
>> It's caused by exports cache of nfsd holds the reference of
>> the path (here is /nfs/test/), so, it can't be umounted.
>>
>> v1 --> v2,
>> 1. Adds an option named "allow_umount" for exports allowing user
>>    un-mounting the filesystem where nfsd exports base on.
> 
> I don't think allow_umount is a useful option.  I'd rather just make the
> code behave like allow_umount was on all the time.

Got it.

A new patch site is posted as updated from v1.

> 
> The fact is nobody could ever *depend* on umount to fail on an exported
> filesystem anyway.
> 
> I do think we might want a stronger "allow_umount" option that actually
> revokes locks and such as necessary.  I just don't see the need for this
> in between case.

Yes, it isn't.

But user always care it when umount fail, and it is late.
I think a proc file is useful than mount options like the stronger
"allow_umount". Maybe "exportfs -f" can do the job of revokes locks.

We can discuss it deeply when really need it. 

thanks,
Kinglong Mee

> 
> --b.
> 
>> 2. New helpers path_get_pin/path_put_unpin for path pin.
>> 3. Update exports according to the "allow_umount" option.
>>
>> Kinglong Mee (5):
>>   fs_pin: Fix uninitialized value in fs_pin
>>   fs_pin: Export functions for specific filesystem
>>   path: New helpers path_get_pin/path_put_unpin for path pin
>>   sunrpc: New helper cache_force_expire for cache cleanup
>>   nfsd: Allows user un-mounting filesystem where nfsd exports base on
>>
>>  fs/fs_pin.c                      |  3 +++
>>  fs/namei.c                       | 26 ++++++++++++++++++++
>>  fs/nfsd/export.c                 | 52 ++++++++++++++++++++++++++++++++--------
>>  fs/nfsd/export.h                 | 11 ++++++++-
>>  include/linux/fs_pin.h           |  6 +++++
>>  include/linux/path.h             |  4 ++++
>>  include/linux/sunrpc/cache.h     | 11 +++++++++
>>  include/uapi/linux/nfsd/export.h |  3 ++-
>>  8 files changed, 104 insertions(+), 12 deletions(-)
>>
>> -- 
>> 2.4.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] nfsd: allows user un-mounting filesystem where nfsd exports base on
  2015-05-24 15:10 ` [PATCH 5/5] nfsd: allows user un-mounting filesystem where nfsd exports base on Kinglong Mee
@ 2015-06-05 15:02   ` Al Viro
  2015-06-06  2:21     ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2015-06-05 15:02 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, linux-fsdevel, linux-nfs, NeilBrown,
	Trond Myklebust, Steve Dickson

On Sun, May 24, 2015 at 11:10:50PM +0800, Kinglong Mee wrote:
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -43,9 +43,9 @@ static void expkey_put(struct kref *ref)
>  
>  	if (test_bit(CACHE_VALID, &key->h.flags) &&
>  	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
> -		path_put(&key->ek_path);
> +		path_put_unpin(&key->ek_path, &key->ek_pin);
>  	auth_domain_put(key->ek_client);
> -	kfree(key);
> +	kfree_rcu(key, rcu_head);
>  }

That looks wrong.  OK, so you want umount() to proceed; fine, no problem
with that.  However, what happens if the final mntput() hits while you
are just approaching that path_put_unpin()?  ->kill() will be triggered,
and it would bloody better
	a) make sure that expkey_put() is called for that key if it hadn't
already been done and
	b) do not return until such expkey_put() completes.  Including the
ones that might have been already entered by the time we'd got to ->kill().

Am I missing something subtle here?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] nfsd: allows user un-mounting filesystem where nfsd exports base on
  2015-06-05 15:02   ` Al Viro
@ 2015-06-06  2:21     ` Al Viro
  2015-06-06 13:38       ` Kinglong Mee
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2015-06-06  2:21 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, linux-fsdevel, linux-nfs, NeilBrown,
	Trond Myklebust, Steve Dickson

On Fri, Jun 05, 2015 at 04:02:13PM +0100, Al Viro wrote:
> On Sun, May 24, 2015 at 11:10:50PM +0800, Kinglong Mee wrote:
> > --- a/fs/nfsd/export.c
> > +++ b/fs/nfsd/export.c
> > @@ -43,9 +43,9 @@ static void expkey_put(struct kref *ref)
> >  
> >  	if (test_bit(CACHE_VALID, &key->h.flags) &&
> >  	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
> > -		path_put(&key->ek_path);
> > +		path_put_unpin(&key->ek_path, &key->ek_pin);
> >  	auth_domain_put(key->ek_client);
> > -	kfree(key);
> > +	kfree_rcu(key, rcu_head);
> >  }
> 
> That looks wrong.  OK, so you want umount() to proceed; fine, no problem
> with that.  However, what happens if the final mntput() hits while you
> are just approaching that path_put_unpin()?  ->kill() will be triggered,
> and it would bloody better
> 	a) make sure that expkey_put() is called for that key if it hadn't
> already been done and
> 	b) do not return until such expkey_put() completes.  Including the
> ones that might have been already entered by the time we'd got to ->kill().
> 
> Am I missing something subtle here?

Having looked through that code...  It *is* wrong.  Note that the normal
approach is to have pin_remove() called via pin_kill(), directly or triggered
from group_pin_kill() and/or cleanup_mnt() on the mount it's attached to.
pin_remove() should never be called outside of ->kill() callbacks.  It should
be called at the point where you are OK with fs being shut down.

The fundamental reason why it's broken is different, though - you *can't*
grab a reference if all you've got is a pin.  By the time the callback is
called, the mount in question is already irretrievably committed to being
killed.  There's one hell of a wide window between the point of no return
and the point where you are notified of anything, and that's by design -
you might very well have had several mounts doomed by a syscall and they
all get through cleanup_mnt() just before return to userland.  One by one.
So between the point where this puppy is doomed and the call of your callback
there might have been several filesystems going through shutdown, with tons
of IO, waiting for remote servers, etc.

We could add a primitive that would _try_ to grab a reference - that can
be done (lock_mount_hash(), check if it has MNT_DOOMED or MNT_SYNC_UMOUNT,
fail if it does, otherwise mnt_add_count(mnt, 1) and succeed, doing
unlock_mount_hash() on both exit paths).  HOWEVER, you'll need to think
very carefully where to use that primitive - unlike mntget() it _can_
fail and lock_mount_hash() can inflict quite a bit of cacheline pingpong
if used heavily.

Could you give details on lifecycle of those objects, including the stages
at which we might try to grab references?  Combination of such primitive with
a pin (doing just "NULL the references to vfsmount/dentry, do dput() on
what that dentry used to be and call pin_remove()") might work, if the
lifecycle is good enough.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] nfsd: allows user un-mounting filesystem where nfsd exports base on
  2015-06-06  2:21     ` Al Viro
@ 2015-06-06 13:38       ` Kinglong Mee
  0 siblings, 0 replies; 16+ messages in thread
From: Kinglong Mee @ 2015-06-06 13:38 UTC (permalink / raw)
  To: Al Viro
  Cc: J. Bruce Fields, linux-fsdevel, linux-nfs, NeilBrown,
	Trond Myklebust, Steve Dickson, kinglongmee

On 6/6/2015 10:21 AM, Al Viro wrote:
> On Fri, Jun 05, 2015 at 04:02:13PM +0100, Al Viro wrote:
>> On Sun, May 24, 2015 at 11:10:50PM +0800, Kinglong Mee wrote:
>>> --- a/fs/nfsd/export.c
>>> +++ b/fs/nfsd/export.c
>>> @@ -43,9 +43,9 @@ static void expkey_put(struct kref *ref)
>>>  
>>>  	if (test_bit(CACHE_VALID, &key->h.flags) &&
>>>  	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
>>> -		path_put(&key->ek_path);
>>> +		path_put_unpin(&key->ek_path, &key->ek_pin);
>>>  	auth_domain_put(key->ek_client);
>>> -	kfree(key);
>>> +	kfree_rcu(key, rcu_head);
>>>  }
>>
>> That looks wrong.  OK, so you want umount() to proceed; fine, no problem
>> with that.  However, what happens if the final mntput() hits while you
>> are just approaching that path_put_unpin()?  ->kill() will be triggered,
>> and it would bloody better
>> 	a) make sure that expkey_put() is called for that key if it hadn't
>> already been done and
>> 	b) do not return until such expkey_put() completes.  Including the
>> ones that might have been already entered by the time we'd got to ->kill().

You are right.
Sorry for my fault, the above patch misses caring the race.

>>
>> Am I missing something subtle here?
> 
> Having looked through that code...  It *is* wrong.  Note that the normal
> approach is to have pin_remove() called via pin_kill(), directly or triggered
> from group_pin_kill() and/or cleanup_mnt() on the mount it's attached to.
> pin_remove() should never be called outside of ->kill() callbacks.  It should
> be called at the point where you are OK with fs being shut down.

Thank you very much for your comments.
I will try to using fs_pin as the restrict.

> 
> The fundamental reason why it's broken is different, though - you *can't*
> grab a reference if all you've got is a pin.  By the time the callback is
> called, the mount in question is already irretrievably committed to being
> killed.  There's one hell of a wide window between the point of no return
> and the point where you are notified of anything, and that's by design -
> you might very well have had several mounts doomed by a syscall and they
> all get through cleanup_mnt() just before return to userland.  One by one.
> So between the point where this puppy is doomed and the call of your callback
> there might have been several filesystems going through shutdown, with tons
> of IO, waiting for remote servers, etc.
> 
> We could add a primitive that would _try_ to grab a reference - that can
> be done (lock_mount_hash(), check if it has MNT_DOOMED or MNT_SYNC_UMOUNT,
> fail if it does, otherwise mnt_add_count(mnt, 1) and succeed, doing
> unlock_mount_hash() on both exit paths).  HOWEVER, you'll need to think
> very carefully where to use that primitive - unlike mntget() it _can_
> fail and lock_mount_hash() can inflict quite a bit of cacheline pingpong
> if used heavily.

Do you mean adding a new feature?

> 
> Could you give details on lifecycle of those objects, including the stages
> at which we might try to grab references?  Combination of such primitive with
> a pin (doing just "NULL the references to vfsmount/dentry, do dput() on
> what that dentry used to be and call pin_remove()") might work, if the
> lifecycle is good enough.

NFSD has two caches named expkey and export which are managed by sunrpc cache
fundamental. I will only explain export following for expkey is similar as export.

struct cache_head {
	struct kref     ref;
	... ... 
};
struct svc_export {
	struct cache_head       h;
	struct path             ex_path;
	... ...
};

1. svc_export has a reference, will be freed when the reference is decreased to zero.
2. ex_path must be put when freed (Want change mntget to fs_pin for ex_path's vfsmnt).
3. With fs_pin, there are two logic (one is the normal logic, the other is pin_kill)
   which can cause free svc_export.
4. The reference of the normal logic is zero, but the pin_kill logic is not zero.
   the second logic will decrease the reference indirectly, if decrease to zero, 
   umount will go though the normal logic's code, at last frees the svc_export;
   if not zero, umount must don't free the svc_export.

I try to solve the window as,
struct svc_export {
	struct cache_head       h;
	struct path             ex_path;
	... ...
	struct fs_pin           ex_pin;
        struct rcu_head         rcu_head;

        /* For cache_put and fs umounting window */
        struct completion       ex_done;
        struct work_struct      ex_work;
};

1. ex_done is for umount waiting the reference is decreased to zero.
2. ex_work is for umount decrease the reference indirectly.
3. The normal logic don't free the svc_export, calls complete() and
   go though pin_kill() logic as,
   (svc_export_put will be called when reference is decreased to zero)
   
   static void svc_export_put(struct kref *ref)
   {
          struct svc_export *exp = container_of(ref, struct svc_export, h.ref);

          rcu_read_lock();
          complete(&exp->ex_done);
          pin_kill(&exp->ex_pin);
   }

4. pin_kill() logic will schedules to decrease the reference though ex_work,
   and at last path_put_unpin and destroy the svc_export.

   static void export_pin_kill(struct fs_pin *pin)
   {
          struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);

          if (!completion_done(&exp->ex_done)) {
                  schedule_work(&exp->ex_work);
                  wait_for_completion(&exp->ex_done);
          }
          path_put_unpin(&exp->ex_path, &exp->ex_pin);
          svc_export_destroy(exp);
   }

The full patches will be sent later. Thanks again.

thanks,
Kinglong Mee

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-06-06 13:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-24 15:01 [PATCH 0/4 v2] NFSD: Pin to vfsmount for some nfsd exports cache Kinglong Mee
2015-05-24 15:01 ` Kinglong Mee
     [not found] ` <5561E7E4.50604-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-05-24 15:10   ` [PATCH 1/5 v2] fs_pin: Fix uninitialized value in fs_pin Kinglong Mee
2015-05-24 15:10     ` Kinglong Mee
2015-05-24 15:10   ` [PATCH 2/5 v2] fs_pin: Export functions for specific filesystem Kinglong Mee
2015-05-24 15:10     ` Kinglong Mee
2015-05-24 15:10   ` [PATCH 3/5 v2] path: New helpers path_get_pin/path_put_unpin for path pin Kinglong Mee
2015-05-24 15:10     ` Kinglong Mee
2015-05-24 15:10   ` [PATCH 4/5 v2] sunrpc: New helper cache_force_expire for cache cleanup Kinglong Mee
2015-05-24 15:10     ` Kinglong Mee
2015-05-24 15:10 ` [PATCH 5/5] nfsd: allows user un-mounting filesystem where nfsd exports base on Kinglong Mee
2015-06-05 15:02   ` Al Viro
2015-06-06  2:21     ` Al Viro
2015-06-06 13:38       ` Kinglong Mee
2015-06-01 18:21 ` [PATCH 0/4 v2] NFSD: Pin to vfsmount for some nfsd exports cache J. Bruce Fields
2015-06-02  1:41   ` Kinglong Mee

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.