* [PATCH 0/9 v8] NFSD: Pin to vfsmount for nfsd exports cache
@ 2015-07-27  3:05 Kinglong Mee
       [not found] ` <55B5A012.1030006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                   ` (4 more replies)
  0 siblings, 5 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:05 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

If there are mount points (not exported for nfs) under the pseudo root,
then after a client operates on those entries under the root, nobody can
unmount those mount points until the export cache expires.

# cat /etc/exports
/nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
# ll /nfs/
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
# mount /dev/sde /nfs/test
# df
Filesystem                      1K-blocks    Used Available Use% Mounted on
......
/dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
/dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
/dev/sde                           999320    1284    929224   1% /nfs/test
# mount -t nfs 127.0.0.1:/nfs/ /mnt
# ll /mnt/*/
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp

/mnt/xfs/:
total 0
# umount /nfs/test/
umount: /nfs/test/: target is busy
        (In some cases useful info about processes that
         use the device is found by lsof(8) or fuser(1).)

It's caused by nfsd's export cache holding a reference to the path
(here /nfs/test/), so it can't be unmounted.

I don't think that's what users expect; they want to umount /nfs/test/.
Bruce thinks users should also be able to umount /nfs/pnfs/ and /nfs/xfs.

This patch set lets nfsd exports pin to the vfsmount instead of using
mntget, so users can now umount any exported mount point.

v3, 
1. New helpers path_get_pin/path_put_unpin for path pin.
2. Use kzalloc for allocating memory.

v4, Thanks to Al Viro for his comments on the fs_pin logic.
1. Add a completion so pin_kill can wait until the reference count drops to zero.
2. Add a work_struct so pin_kill can decrease the reference indirectly.
3. Free svc_export/svc_expkey in pin_kill, not in svc_export_put/svc_expkey_put.
4. svc_export_put/svc_expkey_put go through the pin_kill logic.

v5,
Kill the fs_pin while holding a reference to the vfsmnt.

v6,
1. Revert the change of v5.
2. New helper legitimize_mntget() so the nfsd exports/expkey caches can
   get a vfsmount from an fs_pin.
3. Clean up some code in sunrpc's cache.
4. Switch to using a hash list instead of a singly linked list for
   cache_head in cache_detail.
5. New validate/invalidate functions for processing reference-count
   increase/decrease changes (the nfsd exports/expkey caches use them to
   grab the reference to the mnt).
6. Delete the cache_head directly from the cache_detail in pin_kill.

v7,
Implement self reference increase and decrease for nfsd exports/expkey.

When the reference count of a cache_head increases (>1), grab a reference
to the mnt once; when it decreases back to 1 (==1), drop the reference to
the mnt.

v8, Use a hash list for the sunrpc cache and a new method for nfsd's pin:

1. There is only one outlet from each cache: exp_find_key() for expkey and
   exp_get_by_name() for export.
2. Any fsid-to-export or filehandle-to-export lookup calls these functions.
3. exp_get()/exp_put() increase/decrease the reference count of an export.

Call legitimize_mntget() in the only outlet functions exp_find_key()/
exp_get_by_name(); if it fails, return STALE. Otherwise, any valid
expkey/export from the cache is legitimate (holds a reference to the vfsmnt).

Add mntget() in exp_get() and mntput() in exp_put(), because the exports
passed to exp_get()/exp_put() are the ones returned from exp_find_key()/
exp_get_by_name().

For the expkey cache,
1. First, a fsid is passed to exp_find_key(), which looks up a cache entry
   in svc_expkey_lookup(); on success, ekey->ek_path is pinned to the mount.
2. Then legitimize_mntget() is called to get a reference to the vfsmnt
   before returning from exp_find_key().
3. Any caller of exp_find_key() that got a valid entry must put the vfsmnt.

For the export cache,
1. First, a path (returned from exp_find_key()) with a legitimate vfsmnt
   is passed to exp_get_by_name(); on success, exp->ex_path is pinned to
   the mount.
2. Then legitimize_mntget() is called to get a reference to the vfsmnt
   before returning from exp_get_by_name().
3. Any caller of exp_get_by_name() that got a valid entry must put the
   vfsmnt via exp_put().
4. Any user of the exp returned from exp_get_by_name() must call exp_get(),
   which increases the reference count of the vfsmnt.

As a result (see the sketch below),
a. After the reference is taken in step 2, any umount of the filesystem
   gets -EBUSY.
b. After all references from step 4 are put, or before the reference is
   taken in step 2, any umount of the filesystem calls pin_kill, which
   deletes the cache entry directly and also unpins the vfsmount.
c. Between steps 1 and 2, we hold a reference to the exp/key cache entry
   but no legitimate vfsmnt. As you said, an umount of the filesystem only
   has to wait for exp_find_key/exp_get_by_name to put the cache reference
   when legitimize_mntget fails.
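
A condensed sketch of the caller-side pattern, modeled on exp_find() in
patch 9 (the two cache_detail arguments and the function name are
illustrative; error handling is trimmed):

static struct svc_export *
find_export_sketch(struct cache_detail *keycd, struct cache_detail *expcd,
		   struct auth_domain *clp, int fsid_type, u32 *fsidv,
		   struct cache_req *reqp)
{
	struct svc_expkey *ek;
	struct svc_export *exp;

	ek = exp_find_key(keycd, clp, fsid_type, fsidv, reqp);
	if (IS_ERR(ek))
		return ERR_CAST(ek);	/* -ESTALE if legitimize_mntget() failed */

	exp = exp_get_by_name(expcd, clp, &ek->ek_path, reqp);
	exp_put_key(ek);		/* mntput() + cache_put() for the key */

	/* on success, exp holds a legitimate vfsmnt reference; umount of
	 * the filesystem gets -EBUSY until the caller does exp_put() */
	return exp;
}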

Kinglong Mee (9):
  fs_pin: Initialize value for fs_pin explicitly
  fs_pin: Export functions for specific filesystem
  path: New helpers path_get_pin/path_put_unpin for path pin
  fs: New helper legitimize_mntget() for getting a legitimize mnt
  sunrpc: Store cache_detail in seq_file's private, directly
  sunrpc/nfsd: Remove redundant code by exports seq_operations functions
  sunrpc: Switch to using hash list instead single list
  sunrpc: New helper cache_delete_entry for deleting cache_head directly
  nfsd: Allows user un-mounting filesystem where nfsd exports base on

 fs/fs_pin.c                  |   4 +
 fs/namei.c                   |  26 ++++++
 fs/namespace.c               |  19 ++++
 fs/nfsd/export.c             | 209 ++++++++++++++++++++++++-------------------
 fs/nfsd/export.h             |  22 ++++-
 include/linux/fs_pin.h       |   6 ++
 include/linux/mount.h        |   1 +
 include/linux/path.h         |   4 +
 include/linux/sunrpc/cache.h |  10 ++-
 net/sunrpc/cache.c           | 133 ++++++++++++++++-----------
 10 files changed, 286 insertions(+), 148 deletions(-)

-- 
2.4.3


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/9 v8] fs_pin: Initialize value for fs_pin explicitly
@ 2015-07-27  3:06     ` Kinglong Mee
  0 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:06 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

Without initialization, the done field of an fs_pin allocated on the
stack may contain a garbage value.
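
For illustration, a minimal sketch of the on-stack case this fixes
(my_kill and pin_example are hypothetical):

static void my_kill(struct fs_pin *p)
{
	pin_remove(p);			/* a ->kill handler must remove the pin */
}

static void pin_example(struct vfsmount *m)
{
	struct fs_pin pin;		/* stack memory, contents undefined */

	init_fs_pin(&pin, my_kill);	/* now also sets pin.done = 0 */
	pin_insert(&pin, m);
	/* ... */
	rcu_read_lock();		/* pin_kill() drops the rcu read lock itself */
	pin_kill(&pin);			/* invokes my_kill() -> pin_remove() */
}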

v8, same as v3;
adds the include-guard macro to the header file.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 include/linux/fs_pin.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
index 3886b3b..0dde7b7 100644
--- a/include/linux/fs_pin.h
+++ b/include/linux/fs_pin.h
@@ -1,3 +1,6 @@
+#ifndef _LINUX_FS_PIN_H
+#define _LINUX_FS_PIN_H
+
 #include <linux/wait.h>
 
 struct fs_pin {
@@ -16,9 +19,12 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
 	INIT_HLIST_NODE(&p->s_list);
 	INIT_HLIST_NODE(&p->m_list);
 	p->kill = kill;
+	p->done = 0;
 }
 
 void pin_remove(struct fs_pin *);
 void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
 void pin_insert(struct fs_pin *, struct vfsmount *);
 void pin_kill(struct fs_pin *);
+
+#endif
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 2/9 v8] fs_pin: Export functions for specific filesystem
@ 2015-07-27  3:07     ` Kinglong Mee
  0 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:07 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

Export the functions for other code that wants to pin to a vfsmount,
e.g., nfsd's export cache.

v8, same as v4;
adds the export of pin_kill.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/fs_pin.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index 611b540..a1a4eb2 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -17,6 +17,7 @@ void pin_remove(struct fs_pin *pin)
 	wake_up_locked(&pin->wait);
 	spin_unlock_irq(&pin->wait.lock);
 }
+EXPORT_SYMBOL(pin_remove);
 
 void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
 {
@@ -26,11 +27,13 @@ void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head
 	hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
 	spin_unlock(&pin_lock);
 }
+EXPORT_SYMBOL(pin_insert_group);
 
 void pin_insert(struct fs_pin *pin, struct vfsmount *m)
 {
 	pin_insert_group(pin, m, &m->mnt_sb->s_pins);
 }
+EXPORT_SYMBOL(pin_insert);
 
 void pin_kill(struct fs_pin *p)
 {
@@ -72,6 +75,7 @@ void pin_kill(struct fs_pin *p)
 	}
 	rcu_read_unlock();
 }
+EXPORT_SYMBOL(pin_kill);
 
 void mnt_pin_kill(struct mount *m)
 {
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 3/9 v8] path: New helpers path_get_pin/path_put_unpin for path pin
@ 2015-07-27  3:07     ` Kinglong Mee
  0 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:07 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

Two helpers for filesystems that want to pin to a vfsmnt instead of
taking a reference with mntget.

v8 same as v2.
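
An illustrative pairing of the two helpers (struct pinned_path and the
function names are hypothetical; patch 9 applies the same pattern to
svc_export/svc_expkey):

struct pinned_path {
	struct path	path;
	struct fs_pin	pin;
};

static void pp_pin_kill(struct fs_pin *pin)	/* called at umount time */
{
	struct pinned_path *pp = container_of(pin, struct pinned_path, pin);

	path_put_unpin(&pp->path, &pp->pin);	/* dput() + pin_remove() */
}

static void pp_take(struct pinned_path *pp, const struct path *src)
{
	init_fs_pin(&pp->pin, pp_pin_kill);
	pp->path = *src;
	path_get_pin(&pp->path, &pp->pin);	/* dget() + pin_insert_group() */
}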

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/namei.c           | 26 ++++++++++++++++++++++++++
 include/linux/path.h |  4 ++++
 2 files changed, 30 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index ae4e4c1..08f2cfc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -492,6 +492,32 @@ void path_put(const struct path *path)
 }
 EXPORT_SYMBOL(path_put);
 
+/**
+ * path_get_pin - get a reference to a path's dentry
+ *                and pin to path's vfsmnt
+ * @path: path to get the reference to
+ * @p: the fs_pin pin to vfsmnt
+ */
+void path_get_pin(struct path *path, struct fs_pin *p)
+{
+	dget(path->dentry);
+	pin_insert_group(p, path->mnt, NULL);
+}
+EXPORT_SYMBOL(path_get_pin);
+
+/**
+ * path_put_unpin - put a reference to a path's dentry
+ *                  and remove pin to path's vfsmnt
+ * @path: path to put the reference to
+ * @p: the fs_pin removed from vfsmnt
+ */
+void path_put_unpin(struct path *path, struct fs_pin *p)
+{
+	dput(path->dentry);
+	pin_remove(p);
+}
+EXPORT_SYMBOL(path_put_unpin);
+
 #define EMBEDDED_LEVELS 2
 struct nameidata {
 	struct path	path;
diff --git a/include/linux/path.h b/include/linux/path.h
index d137218..34599fb 100644
--- a/include/linux/path.h
+++ b/include/linux/path.h
@@ -3,6 +3,7 @@
 
 struct dentry;
 struct vfsmount;
+struct fs_pin;
 
 struct path {
 	struct vfsmount *mnt;
@@ -12,6 +13,9 @@ struct path {
 extern void path_get(const struct path *);
 extern void path_put(const struct path *);
 
+extern void path_get_pin(struct path *, struct fs_pin *);
+extern void path_put_unpin(struct path *, struct fs_pin *);
+
 static inline int path_equal(const struct path *path1, const struct path *path2)
 {
 	return path1->mnt == path2->mnt && path1->dentry == path2->dentry;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 4/9 v8] fs: New helper legitimize_mntget() for getting a legitimize mnt
  2015-07-27  3:05 [PATCH 0/9 v8] NFSD: Pin to vfsmount for nfsd exports cache Kinglong Mee
       [not found] ` <55B5A012.1030006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-07-27  3:08 ` Kinglong Mee
       [not found]   ` <55B5A0B0.7060604-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2015-07-27  3:09 ` [PATCH 5/9 v8] sunrpc: Store cache_detail in seq_file's private directly Kinglong Mee
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:08 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

New helper legitimize_mntget() for getting a reference to a mnt unless
MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED is set on it, in which case it
returns NULL.
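
A minimal caller sketch (use_path_sketch is hypothetical; mapping the
NULL case to -ESTALE is what the nfsd patches later in this series do):

static int use_path_sketch(struct path *path)
{
	struct vfsmount *mnt = legitimize_mntget(path->mnt);

	if (!mnt)		/* mount already has one of the three flags set */
		return -ESTALE;

	/* the mount is now legitimately held; a concurrent umount
	 * of it fails with -EBUSY until the mntput() below */
	mntput(mnt);
	return 0;
}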

v8, same as v6

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/namespace.c        | 19 +++++++++++++++++++
 include/linux/mount.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2b8aa15..842cf57 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1153,6 +1153,25 @@ struct vfsmount *mntget(struct vfsmount *mnt)
 }
 EXPORT_SYMBOL(mntget);
 
+struct vfsmount *legitimize_mntget(struct vfsmount *vfsmnt)
+{
+	struct mount *mnt;
+
+	if (vfsmnt == NULL)
+		return NULL;
+
+	read_seqlock_excl(&mount_lock);
+	mnt = real_mount(vfsmnt);
+	if (vfsmnt->mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED))
+		vfsmnt = NULL;
+	else
+		mnt_add_count(mnt, 1);
+	read_sequnlock_excl(&mount_lock);
+
+	return vfsmnt;
+}
+EXPORT_SYMBOL(legitimize_mntget);
+
 struct vfsmount *mnt_clone_internal(struct path *path)
 {
 	struct mount *p;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index f822c3c..8ae9dc0 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -79,6 +79,7 @@ extern void mnt_drop_write(struct vfsmount *mnt);
 extern void mnt_drop_write_file(struct file *file);
 extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
+extern struct vfsmount *legitimize_mntget(struct vfsmount *vfsmnt);
 extern struct vfsmount *mnt_clone_internal(struct path *path);
 extern int __mnt_is_readonly(struct vfsmount *mnt);
 
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 5/9 v8] sunrpc: Store cache_detail in seq_file's private directly
  2015-07-27  3:05 [PATCH 0/9 v8] NFSD: Pin to vfsmount for nfsd exports cache Kinglong Mee
       [not found] ` <55B5A012.1030006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2015-07-27  3:08 ` [PATCH 4/9 v8] fs: New helper legitimize_mntget() for getting a legitimize mnt Kinglong Mee
@ 2015-07-27  3:09 ` Kinglong Mee
  2015-07-29  2:11   ` NeilBrown
  2015-07-27  3:10 ` [PATCH 7/9 v8] sunrpc: Switch to using hash list instead single list Kinglong Mee
  2015-07-27  3:10 ` [PATCH 8/9 v8] sunrpc: New helper cache_delete_entry for deleting cache_head directly Kinglong Mee
  4 siblings, 1 reply; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:09 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

Cleanup.

Just store the cache_detail in the seq_file's private field;
the separately allocated handle is redundant.

v8, same as v6.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 net/sunrpc/cache.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 2928aff..edec603 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1270,18 +1270,13 @@ EXPORT_SYMBOL_GPL(qword_get);
  * get a header, then pass each real item in the cache
  */
 
-struct handle {
-	struct cache_detail *cd;
-};
-
 static void *c_start(struct seq_file *m, loff_t *pos)
 	__acquires(cd->hash_lock)
 {
 	loff_t n = *pos;
 	unsigned int hash, entry;
 	struct cache_head *ch;
-	struct cache_detail *cd = ((struct handle*)m->private)->cd;
-
+	struct cache_detail *cd = m->private;
 
 	read_lock(&cd->hash_lock);
 	if (!n--)
@@ -1308,7 +1303,7 @@ static void *c_next(struct seq_file *m, void *p, loff_t *pos)
 {
 	struct cache_head *ch = p;
 	int hash = (*pos >> 32);
-	struct cache_detail *cd = ((struct handle*)m->private)->cd;
+	struct cache_detail *cd = m->private;
 
 	if (p == SEQ_START_TOKEN)
 		hash = 0;
@@ -1334,14 +1329,14 @@ static void *c_next(struct seq_file *m, void *p, loff_t *pos)
 static void c_stop(struct seq_file *m, void *p)
 	__releases(cd->hash_lock)
 {
-	struct cache_detail *cd = ((struct handle*)m->private)->cd;
+	struct cache_detail *cd = m->private;
 	read_unlock(&cd->hash_lock);
 }
 
 static int c_show(struct seq_file *m, void *p)
 {
 	struct cache_head *cp = p;
-	struct cache_detail *cd = ((struct handle*)m->private)->cd;
+	struct cache_detail *cd = m->private;
 
 	if (p == SEQ_START_TOKEN)
 		return cd->cache_show(m, cd, NULL);
@@ -1373,24 +1368,27 @@ static const struct seq_operations cache_content_op = {
 static int content_open(struct inode *inode, struct file *file,
 			struct cache_detail *cd)
 {
-	struct handle *han;
+	struct seq_file *seq;
+	int err;
 
 	if (!cd || !try_module_get(cd->owner))
 		return -EACCES;
-	han = __seq_open_private(file, &cache_content_op, sizeof(*han));
-	if (han == NULL) {
+
+	err = seq_open(file, &cache_content_op);
+	if (err) {
 		module_put(cd->owner);
-		return -ENOMEM;
+		return err;
 	}
 
-	han->cd = cd;
+	seq = file->private_data;
+	seq->private = cd;
 	return 0;
 }
 
 static int content_release(struct inode *inode, struct file *file,
 		struct cache_detail *cd)
 {
-	int ret = seq_release_private(inode, file);
+	int ret = seq_release(inode, file);
 	module_put(cd->owner);
 	return ret;
 }
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 6/9 v8] sunrpc/nfsd: Remove redundant code by exports seq_operations functions
@ 2015-07-27  3:09     ` Kinglong Mee
  0 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:09 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

Nfsd has implemented a set of seq_operations functions identical to
sunrpc's cache code. Just export sunrpc's versions and remove nfsd's
redundant copies.

v8, same as v6

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/nfsd/export.c             | 73 ++------------------------------------------
 include/linux/sunrpc/cache.h |  5 +++
 net/sunrpc/cache.c           | 15 +++++----
 3 files changed, 17 insertions(+), 76 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index f79521a..b4d84b5 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -1075,73 +1075,6 @@ exp_pseudoroot(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	return rv;
 }
 
-/* Iterator */
-
-static void *e_start(struct seq_file *m, loff_t *pos)
-	__acquires(((struct cache_detail *)m->private)->hash_lock)
-{
-	loff_t n = *pos;
-	unsigned hash, export;
-	struct cache_head *ch;
-	struct cache_detail *cd = m->private;
-	struct cache_head **export_table = cd->hash_table;
-
-	read_lock(&cd->hash_lock);
-	if (!n--)
-		return SEQ_START_TOKEN;
-	hash = n >> 32;
-	export = n & ((1LL<<32) - 1);
-
-	
-	for (ch=export_table[hash]; ch; ch=ch->next)
-		if (!export--)
-			return ch;
-	n &= ~((1LL<<32) - 1);
-	do {
-		hash++;
-		n += 1LL<<32;
-	} while(hash < EXPORT_HASHMAX && export_table[hash]==NULL);
-	if (hash >= EXPORT_HASHMAX)
-		return NULL;
-	*pos = n+1;
-	return export_table[hash];
-}
-
-static void *e_next(struct seq_file *m, void *p, loff_t *pos)
-{
-	struct cache_head *ch = p;
-	int hash = (*pos >> 32);
-	struct cache_detail *cd = m->private;
-	struct cache_head **export_table = cd->hash_table;
-
-	if (p == SEQ_START_TOKEN)
-		hash = 0;
-	else if (ch->next == NULL) {
-		hash++;
-		*pos += 1LL<<32;
-	} else {
-		++*pos;
-		return ch->next;
-	}
-	*pos &= ~((1LL<<32) - 1);
-	while (hash < EXPORT_HASHMAX && export_table[hash] == NULL) {
-		hash++;
-		*pos += 1LL<<32;
-	}
-	if (hash >= EXPORT_HASHMAX)
-		return NULL;
-	++*pos;
-	return export_table[hash];
-}
-
-static void e_stop(struct seq_file *m, void *p)
-	__releases(((struct cache_detail *)m->private)->hash_lock)
-{
-	struct cache_detail *cd = m->private;
-
-	read_unlock(&cd->hash_lock);
-}
-
 static struct flags {
 	int flag;
 	char *name[2];
@@ -1270,9 +1203,9 @@ static int e_show(struct seq_file *m, void *p)
 }
 
 const struct seq_operations nfs_exports_op = {
-	.start	= e_start,
-	.next	= e_next,
-	.stop	= e_stop,
+	.start	= cache_seq_start,
+	.next	= cache_seq_next,
+	.stop	= cache_seq_stop,
 	.show	= e_show,
 };
 
diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 437ddb6..04ee5a2 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -224,6 +224,11 @@ extern int sunrpc_cache_register_pipefs(struct dentry *parent, const char *,
 					umode_t, struct cache_detail *);
 extern void sunrpc_cache_unregister_pipefs(struct cache_detail *);
 
+/* Must store cache_detail in seq_file->private if using next three functions */
+extern void *cache_seq_start(struct seq_file *file, loff_t *pos);
+extern void *cache_seq_next(struct seq_file *file, void *p, loff_t *pos);
+extern void cache_seq_stop(struct seq_file *file, void *p);
+
 extern void qword_add(char **bpp, int *lp, char *str);
 extern void qword_addhex(char **bpp, int *lp, char *buf, int blen);
 extern int qword_get(char **bpp, char *dest, int bufsize);
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index edec603..673c2fa 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1270,7 +1270,7 @@ EXPORT_SYMBOL_GPL(qword_get);
  * get a header, then pass each real item in the cache
  */
 
-static void *c_start(struct seq_file *m, loff_t *pos)
+void *cache_seq_start(struct seq_file *m, loff_t *pos)
 	__acquires(cd->hash_lock)
 {
 	loff_t n = *pos;
@@ -1298,8 +1298,9 @@ static void *c_start(struct seq_file *m, loff_t *pos)
 	*pos = n+1;
 	return cd->hash_table[hash];
 }
+EXPORT_SYMBOL_GPL(cache_seq_start);
 
-static void *c_next(struct seq_file *m, void *p, loff_t *pos)
+void *cache_seq_next(struct seq_file *m, void *p, loff_t *pos)
 {
 	struct cache_head *ch = p;
 	int hash = (*pos >> 32);
@@ -1325,13 +1326,15 @@ static void *c_next(struct seq_file *m, void *p, loff_t *pos)
 	++*pos;
 	return cd->hash_table[hash];
 }
+EXPORT_SYMBOL_GPL(cache_seq_next);
 
-static void c_stop(struct seq_file *m, void *p)
+void cache_seq_stop(struct seq_file *m, void *p)
 	__releases(cd->hash_lock)
 {
 	struct cache_detail *cd = m->private;
 	read_unlock(&cd->hash_lock);
 }
+EXPORT_SYMBOL_GPL(cache_seq_stop);
 
 static int c_show(struct seq_file *m, void *p)
 {
@@ -1359,9 +1362,9 @@ static int c_show(struct seq_file *m, void *p)
 }
 
 static const struct seq_operations cache_content_op = {
-	.start	= c_start,
-	.next	= c_next,
-	.stop	= c_stop,
+	.start	= cache_seq_start,
+	.next	= cache_seq_next,
+	.stop	= cache_seq_stop,
 	.show	= c_show,
 };
 
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 7/9 v8] sunrpc: Switch to using hash list instead single list
  2015-07-27  3:05 [PATCH 0/9 v8] NFSD: Pin to vfsmount for nfsd exports cache Kinglong Mee
                   ` (2 preceding siblings ...)
  2015-07-27  3:09 ` [PATCH 5/9 v8] sunrpc: Store cache_detail in seq_file's private directly Kinglong Mee
@ 2015-07-27  3:10 ` Kinglong Mee
  2015-07-29  2:19   ` NeilBrown
  2015-07-27  3:10 ` [PATCH 8/9 v8] sunrpc: New helper cache_delete_entry for deleting cache_head directly Kinglong Mee
  4 siblings, 1 reply; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

Switch to using an hlist for cache_head in cache_detail; this makes it
possible to remove a cache_head entry directly from its cache_detail.

v8, use a hash list (hlist), not a list_head
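
The point of the switch, sketched: hlist_del_init() can unlink an entry
given only the entry itself, with no walk of the bucket to find the
previous pointer (this fragment anticipates cache_delete_entry() in the
next patch):

	write_lock(&detail->hash_lock);
	hlist_del_init(&h->cache_list);		/* O(1) removal */
	detail->entries--;
	write_unlock(&detail->hash_lock);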

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 include/linux/sunrpc/cache.h |  4 +--
 net/sunrpc/cache.c           | 60 +++++++++++++++++++++++---------------------
 2 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 04ee5a2..03d3b4c 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -46,7 +46,7 @@
  * 
  */
 struct cache_head {
-	struct cache_head * next;
+	struct hlist_node	cache_list;
 	time_t		expiry_time;	/* After time time, don't use the data */
 	time_t		last_refresh;   /* If CACHE_PENDING, this is when upcall 
 					 * was sent, else this is when update was received
@@ -73,7 +73,7 @@ struct cache_detail_pipefs {
 struct cache_detail {
 	struct module *		owner;
 	int			hash_size;
-	struct cache_head **	hash_table;
+	struct hlist_head *	hash_table;
 	rwlock_t		hash_lock;
 
 	atomic_t		inuse; /* active user-space update or lookup */
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 673c2fa..4a2340a 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -44,7 +44,7 @@ static void cache_revisit_request(struct cache_head *item);
 static void cache_init(struct cache_head *h)
 {
 	time_t now = seconds_since_boot();
-	h->next = NULL;
+	INIT_HLIST_NODE(&h->cache_list);
 	h->flags = 0;
 	kref_init(&h->ref);
 	h->expiry_time = now + CACHE_NEW_EXPIRY;
@@ -54,15 +54,14 @@ static void cache_init(struct cache_head *h)
 struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
 				       struct cache_head *key, int hash)
 {
-	struct cache_head **head,  **hp;
-	struct cache_head *new = NULL, *freeme = NULL;
+	struct cache_head *new = NULL, *freeme = NULL, *tmp = NULL;
+	struct hlist_head *head;
 
 	head = &detail->hash_table[hash];
 
 	read_lock(&detail->hash_lock);
 
-	for (hp=head; *hp != NULL ; hp = &(*hp)->next) {
-		struct cache_head *tmp = *hp;
+	hlist_for_each_entry(tmp, head, cache_list) {
 		if (detail->match(tmp, key)) {
 			if (cache_is_expired(detail, tmp))
 				/* This entry is expired, we will discard it. */
@@ -88,12 +87,10 @@ struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
 	write_lock(&detail->hash_lock);
 
 	/* check if entry appeared while we slept */
-	for (hp=head; *hp != NULL ; hp = &(*hp)->next) {
-		struct cache_head *tmp = *hp;
+	hlist_for_each_entry(tmp, head, cache_list) {
 		if (detail->match(tmp, key)) {
 			if (cache_is_expired(detail, tmp)) {
-				*hp = tmp->next;
-				tmp->next = NULL;
+				hlist_del_init(&tmp->cache_list);
 				detail->entries --;
 				freeme = tmp;
 				break;
@@ -104,8 +101,8 @@ struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
 			return tmp;
 		}
 	}
-	new->next = *head;
-	*head = new;
+
+	hlist_add_head(&new->cache_list, head);
 	detail->entries++;
 	cache_get(new);
 	write_unlock(&detail->hash_lock);
@@ -143,7 +140,6 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
 	 * If 'old' is not VALID, we update it directly,
 	 * otherwise we need to replace it
 	 */
-	struct cache_head **head;
 	struct cache_head *tmp;
 
 	if (!test_bit(CACHE_VALID, &old->flags)) {
@@ -168,15 +164,13 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
 	}
 	cache_init(tmp);
 	detail->init(tmp, old);
-	head = &detail->hash_table[hash];
 
 	write_lock(&detail->hash_lock);
 	if (test_bit(CACHE_NEGATIVE, &new->flags))
 		set_bit(CACHE_NEGATIVE, &tmp->flags);
 	else
 		detail->update(tmp, new);
-	tmp->next = *head;
-	*head = tmp;
+	hlist_add_head(&tmp->cache_list, &detail->hash_table[hash]);
 	detail->entries++;
 	cache_get(tmp);
 	cache_fresh_locked(tmp, new->expiry_time);
@@ -416,28 +410,29 @@ static int cache_clean(void)
 	/* find a non-empty bucket in the table */
 	while (current_detail &&
 	       current_index < current_detail->hash_size &&
-	       current_detail->hash_table[current_index] == NULL)
+	       hlist_empty(&current_detail->hash_table[current_index]))
 		current_index++;
 
 	/* find a cleanable entry in the bucket and clean it, or set to next bucket */
 
 	if (current_detail && current_index < current_detail->hash_size) {
-		struct cache_head *ch, **cp;
+		struct cache_head *ch = NULL;
 		struct cache_detail *d;
+		struct hlist_head *head;
+		struct hlist_node *tmp;
 
 		write_lock(&current_detail->hash_lock);
 
 		/* Ok, now to clean this strand */
 
-		cp = & current_detail->hash_table[current_index];
-		for (ch = *cp ; ch ; cp = & ch->next, ch = *cp) {
+		head = &current_detail->hash_table[current_index];
+		hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
 			if (current_detail->nextcheck > ch->expiry_time)
 				current_detail->nextcheck = ch->expiry_time+1;
 			if (!cache_is_expired(current_detail, ch))
 				continue;
 
-			*cp = ch->next;
-			ch->next = NULL;
+			hlist_del_init(&ch->cache_list);
 			current_detail->entries--;
 			rv = 1;
 			break;
@@ -1284,7 +1279,7 @@ void *cache_seq_start(struct seq_file *m, loff_t *pos)
 	hash = n >> 32;
 	entry = n & ((1LL<<32) - 1);
 
-	for (ch=cd->hash_table[hash]; ch; ch=ch->next)
+	hlist_for_each_entry(ch, &cd->hash_table[hash], cache_list)
 		if (!entry--)
 			return ch;
 	n &= ~((1LL<<32) - 1);
@@ -1292,11 +1287,12 @@ void *cache_seq_start(struct seq_file *m, loff_t *pos)
 		hash++;
 		n += 1LL<<32;
 	} while(hash < cd->hash_size &&
-		cd->hash_table[hash]==NULL);
+		hlist_empty(&cd->hash_table[hash]));
 	if (hash >= cd->hash_size)
 		return NULL;
 	*pos = n+1;
-	return cd->hash_table[hash];
+	return hlist_entry_safe(cd->hash_table[hash].first,
+				struct cache_head, cache_list);
 }
 EXPORT_SYMBOL_GPL(cache_seq_start);
 
@@ -1308,23 +1304,25 @@ void *cache_seq_next(struct seq_file *m, void *p, loff_t *pos)
 
 	if (p == SEQ_START_TOKEN)
 		hash = 0;
-	else if (ch->next == NULL) {
+	else if (ch->cache_list.next == NULL) {
 		hash++;
 		*pos += 1LL<<32;
 	} else {
 		++*pos;
-		return ch->next;
+		return hlist_entry_safe(ch->cache_list.next,
+					struct cache_head, cache_list);
 	}
 	*pos &= ~((1LL<<32) - 1);
 	while (hash < cd->hash_size &&
-	       cd->hash_table[hash] == NULL) {
+	       hlist_empty(&cd->hash_table[hash])) {
 		hash++;
 		*pos += 1LL<<32;
 	}
 	if (hash >= cd->hash_size)
 		return NULL;
 	++*pos;
-	return cd->hash_table[hash];
+	return hlist_entry_safe(cd->hash_table[hash].first,
+				struct cache_head, cache_list);
 }
 EXPORT_SYMBOL_GPL(cache_seq_next);
 
@@ -1666,17 +1664,21 @@ EXPORT_SYMBOL_GPL(cache_unregister_net);
 struct cache_detail *cache_create_net(struct cache_detail *tmpl, struct net *net)
 {
 	struct cache_detail *cd;
+	int i;
 
 	cd = kmemdup(tmpl, sizeof(struct cache_detail), GFP_KERNEL);
 	if (cd == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	cd->hash_table = kzalloc(cd->hash_size * sizeof(struct cache_head *),
+	cd->hash_table = kzalloc(cd->hash_size * sizeof(struct hlist_head),
 				 GFP_KERNEL);
 	if (cd->hash_table == NULL) {
 		kfree(cd);
 		return ERR_PTR(-ENOMEM);
 	}
+
+	for (i = 0; i < cd->hash_size; i++)
+		INIT_HLIST_HEAD(&cd->hash_table[i]);
 	cd->net = net;
 	return cd;
 }
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 8/9 v8] sunrpc: New helper cache_delete_entry for deleting cache_head directly
  2015-07-27  3:05 [PATCH 0/9 v8] NFSD: Pin to vfsmount for nfsd exports cache Kinglong Mee
                   ` (3 preceding siblings ...)
  2015-07-27  3:10 ` [PATCH 7/9 v8] sunrpc: Switch to using hash list instead single list Kinglong Mee
@ 2015-07-27  3:10 ` Kinglong Mee
       [not found]   ` <55B5A135.9050800-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  4 siblings, 1 reply; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:10 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

A new helper, cache_delete_entry(), for deleting a cache_head from its
cache_detail directly.

It will be used by pin_kill, so we need to make sure the cache_detail is
still valid before deleting.

Because pin_kill happens rarely, the performance impact of that check is
acceptable.

v8, same as v6.
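
A usage sketch (struct my_entry and my_close_work are hypothetical;
expkey_close_work()/export_close_work() in patch 9 follow this shape):
the pin's ->kill path schedules a work item that calls
cache_delete_entry(), so the entry is unhashed and its final reference
dropped through the normal put path while pin_kill() waits:

struct my_entry {
	struct cache_head	h;
	struct cache_detail	*cd;
	struct work_struct	work;
};

static void my_close_work(struct work_struct *work)
{
	struct my_entry *e = container_of(work, struct my_entry, work);

	cache_delete_entry(e->cd, &e->h);	/* unhash + cache_put() */
}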

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 include/linux/sunrpc/cache.h |  1 +
 net/sunrpc/cache.c           | 30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 03d3b4c..2824db5 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -210,6 +210,7 @@ extern int cache_check(struct cache_detail *detail,
 		       struct cache_head *h, struct cache_req *rqstp);
 extern void cache_flush(void);
 extern void cache_purge(struct cache_detail *detail);
+extern void cache_delete_entry(struct cache_detail *cd, struct cache_head *h);
 #define NEVER (0x7FFFFFFF)
 extern void __init cache_initialize(void);
 extern int cache_register_net(struct cache_detail *cd, struct net *net);
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 4a2340a..b722aea 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -454,6 +454,36 @@ static int cache_clean(void)
 	return rv;
 }
 
+void cache_delete_entry(struct cache_detail *detail, struct cache_head *h)
+{
+	struct cache_detail *tmp;
+
+	if (!detail || !h)
+		return;
+
+	spin_lock(&cache_list_lock);
+	list_for_each_entry(tmp, &cache_list, others) {
+		if (tmp == detail)
+			goto found;
+	}
+	spin_unlock(&cache_list_lock);
+	printk(KERN_WARNING "%s: Deleted cache detail %p\n", __func__, detail);
+	return ;
+
+found:
+	write_lock(&detail->hash_lock);
+
+	hlist_del_init(&h->cache_list);
+	detail->entries--;
+	set_bit(CACHE_CLEANED, &h->flags);
+
+	write_unlock(&detail->hash_lock);
+	spin_unlock(&cache_list_lock);
+
+	cache_put(h, detail);
+}
+EXPORT_SYMBOL_GPL(cache_delete_entry);
+
 /*
  * We want to regularly clean the cache, so we need to schedule some work ...
  */
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 9/9 v8] nfsd: Allows user un-mounting filesystem where nfsd exports base on
  2015-07-27  3:05 [PATCH 0/9 v8] NFSD: Pin to vfsmount for nfsd exports cache Kinglong Mee
@ 2015-07-27  3:12     ` Kinglong Mee
  2015-07-27  3:08 ` [PATCH 4/9 v8] fs: New helper legitimize_mntget() for getting a legitimize mnt Kinglong Mee
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-27  3:12 UTC (permalink / raw)
  To: J. Bruce Fields, Al Viro
  Cc: linux-nfs, linux-fsdevel, NeilBrown, Trond Myklebust, kinglongmee

If there are mount points (not exported for nfs) under the pseudo root,
then after a client operates on those entries under the root, nobody can
unmount those mount points until the export cache expires.

# cat /etc/exports
/nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
# ll /nfs/
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
# mount /dev/sde /nfs/test
# df
Filesystem                      1K-blocks    Used Available Use% Mounted on
......
/dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
/dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
/dev/sde                           999320    1284    929224   1% /nfs/test
# mount -t nfs 127.0.0.1:/nfs/ /mnt
# ll /mnt/*/
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp

/mnt/xfs/:
total 0
# umount /nfs/test/
umount: /nfs/test/: target is busy
        (In some cases useful info about processes that
         use the device is found by lsof(8) or fuser(1).)

It's caused by nfsd's export cache holding a reference to the path
(here /nfs/test/), so it can't be unmounted.

I don't think that's what users expect; they want to umount /nfs/test/.
Bruce thinks users should also be able to umount /nfs/pnfs/ and /nfs/xfs.

Also, use kzalloc instead of kmalloc for all memory allocation.
Thanks to Al Viro for his comments on the fs_pin logic.

v3,
1. Use path_get_pin/path_put_unpin for the path pin.
2. Use kzalloc for memory allocation.

v5, v4,
1. Add a completion so pin_kill can wait until the reference count drops to zero.
2. Add a work_struct so pin_kill can decrease the reference indirectly.
3. Free svc_export/svc_expkey in pin_kill, not in svc_export_put/svc_expkey_put.
4. svc_export_put/svc_expkey_put go through the pin_kill logic.

v6,
1. Pin the vfsmnt to the mount point at first; when the reference count
   increases (==2), grab a reference to the vfsmnt with mntget. When it
   decreases (==1), drop the reference to the vfsmnt, leaving the pin.
2. Delete the cache_head directly from the cache_detail.

v7,
Implement self reference increase and decrease for nfsd exports/expkey.

v8,
the new method is:

1. There is only one outlet from each cache: exp_find_key() for expkey and
   exp_get_by_name() for export.
2. Any fsid-to-export or filehandle-to-export lookup calls these functions.
3. exp_get()/exp_put() increase/decrease the reference count of an export.

Call legitimize_mntget() in the only outlet functions exp_find_key()/
exp_get_by_name(); if it fails, return STALE. Otherwise, any valid
expkey/export from the cache is legitimate (holds a reference to the vfsmnt).

Add mntget() in exp_get() and mntput() in exp_put(), because the exports
passed to exp_get()/exp_put() are the ones returned from exp_find_key()/
exp_get_by_name().

For the expkey cache,
1. First, a fsid is passed to exp_find_key(), which looks up a cache entry
   in svc_expkey_lookup(); on success, ekey->ek_path is pinned to the mount.
2. Then legitimize_mntget() is called to get a reference to the vfsmnt
   before returning from exp_find_key().
3. Any caller of exp_find_key() that got a valid entry must put the vfsmnt.

For the export cache,
1. First, a path (returned from exp_find_key()) with a legitimate vfsmnt
   is passed to exp_get_by_name(); on success, exp->ex_path is pinned to
   the mount.
2. Then legitimize_mntget() is called to get a reference to the vfsmnt
   before returning from exp_get_by_name().
3. Any caller of exp_get_by_name() that got a valid entry must put the
   vfsmnt via exp_put().
4. Any user of the exp returned from exp_get_by_name() must call exp_get(),
   which increases the reference count of the vfsmnt.

As a result,
a. After the reference is taken in step 2, any umount of the filesystem
   gets -EBUSY.
b. After all references from step 4 are put, or before the reference is
   taken in step 2, any umount of the filesystem calls pin_kill, which
   deletes the cache entry directly and also unpins the vfsmount.
c. Between steps 1 and 2, we hold a reference to the exp/key cache entry
   but no legitimate vfsmnt. As you said, an umount of the filesystem only
   has to wait for exp_find_key/exp_get_by_name to put the cache reference
   when legitimize_mntget fails.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/nfsd/export.c | 136 +++++++++++++++++++++++++++++++++++++++++++++----------
 fs/nfsd/export.h |  22 ++++++++-
 2 files changed, 132 insertions(+), 26 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index b4d84b5..7f4816d 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -37,15 +37,23 @@
 #define	EXPKEY_HASHMAX		(1 << EXPKEY_HASHBITS)
 #define	EXPKEY_HASHMASK		(EXPKEY_HASHMAX -1)
 
+static void expkey_destroy(struct svc_expkey *key)
+{
+	auth_domain_put(key->ek_client);
+	kfree_rcu(key, rcu_head);
+}
+
 static void expkey_put(struct kref *ref)
 {
 	struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref);
 
 	if (test_bit(CACHE_VALID, &key->h.flags) &&
-	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
-		path_put(&key->ek_path);
-	auth_domain_put(key->ek_client);
-	kfree(key);
+	    !test_bit(CACHE_NEGATIVE, &key->h.flags)) {
+		rcu_read_lock();
+		complete(&key->ek_done);
+		pin_kill(&key->ek_pin);
+	} else
+		expkey_destroy(key);
 }
 
 static void expkey_request(struct cache_detail *cd,
@@ -83,7 +91,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
 		return -EINVAL;
 	mesg[mlen-1] = 0;
 
-	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	err = -ENOMEM;
 	if (!buf)
 		goto out;
@@ -119,6 +127,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
 	if (key.h.expiry_time == 0)
 		goto out;
 
+	key.cd = cd;
 	key.ek_client = dom;	
 	key.ek_fsidtype = fsidtype;
 	memcpy(key.ek_fsid, buf, len);
@@ -181,7 +190,11 @@ static int expkey_show(struct seq_file *m,
 	if (test_bit(CACHE_VALID, &h->flags) && 
 	    !test_bit(CACHE_NEGATIVE, &h->flags)) {
 		seq_printf(m, " ");
-		seq_path(m, &ek->ek_path, "\\ \t\n");
+		if (legitimize_mntget(ek->ek_path.mnt)) {
+			seq_path(m, &ek->ek_path, "\\ \t\n");
+			mntput(ek->ek_path.mnt);
+		} else
+			seq_printf(m, "Dir umounting");
 	}
 	seq_printf(m, "\n");
 	return 0;
@@ -210,6 +223,26 @@ static inline void expkey_init(struct cache_head *cnew,
 	new->ek_fsidtype = item->ek_fsidtype;
 
 	memcpy(new->ek_fsid, item->ek_fsid, sizeof(new->ek_fsid));
+	new->cd = item->cd;
+}
+
+static void expkey_pin_kill(struct fs_pin *pin)
+{
+	struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
+
+	if (!completion_done(&key->ek_done)) {
+		schedule_work(&key->ek_work);
+		wait_for_completion(&key->ek_done);
+	}
+
+	path_put_unpin(&key->ek_path, &key->ek_pin);
+	expkey_destroy(key);
+}
+
+static void expkey_close_work(struct work_struct *work)
+{
+	struct svc_expkey *key = container_of(work, struct svc_expkey, ek_work);
+	cache_delete_entry(key->cd, &key->h);
 }
 
 static inline void expkey_update(struct cache_head *cnew,
@@ -218,16 +251,19 @@ static inline void expkey_update(struct cache_head *cnew,
 	struct svc_expkey *new = container_of(cnew, struct svc_expkey, h);
 	struct svc_expkey *item = container_of(citem, struct svc_expkey, h);
 
+	init_fs_pin(&new->ek_pin, expkey_pin_kill);
 	new->ek_path = item->ek_path;
-	path_get(&item->ek_path);
+	path_get_pin(&new->ek_path, &new->ek_pin);
 }
 
 static struct cache_head *expkey_alloc(void)
 {
-	struct svc_expkey *i = kmalloc(sizeof(*i), GFP_KERNEL);
-	if (i)
+	struct svc_expkey *i = kzalloc(sizeof(*i), GFP_KERNEL);
+	if (i) {
+		INIT_WORK(&i->ek_work, expkey_close_work);
+		init_completion(&i->ek_done);
 		return &i->h;
-	else
+	} else
 		return NULL;
 }
 
@@ -306,14 +342,21 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc)
 	fsloc->locations = NULL;
 }
 
-static void svc_export_put(struct kref *ref)
+static void svc_export_destroy(struct svc_export *exp)
 {
-	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
-	path_put(&exp->ex_path);
 	auth_domain_put(exp->ex_client);
 	nfsd4_fslocs_free(&exp->ex_fslocs);
 	kfree(exp->ex_uuid);
-	kfree(exp);
+	kfree_rcu(exp, rcu_head);
+}
+
+static void svc_export_put(struct kref *ref)
+{
+	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
+
+	rcu_read_lock();
+	complete(&exp->ex_done);
+	pin_kill(&exp->ex_pin);
 }
 
 static void svc_export_request(struct cache_detail *cd,
@@ -520,7 +563,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
 		return -EINVAL;
 	mesg[mlen-1] = 0;
 
-	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
 
@@ -636,7 +679,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
 	if (expp == NULL)
 		err = -ENOMEM;
 	else
-		exp_put(expp);
+		cache_put(&expp->h, expp->cd);
 out4:
 	nfsd4_fslocs_free(&exp.ex_fslocs);
 	kfree(exp.ex_uuid);
@@ -664,7 +707,12 @@ static int svc_export_show(struct seq_file *m,
 		return 0;
 	}
 	exp = container_of(h, struct svc_export, h);
-	seq_path(m, &exp->ex_path, " \t\n\\");
+	if (legitimize_mntget(exp->ex_path.mnt)) {
+		seq_path(m, &exp->ex_path, " \t\n\\");
+		mntput(exp->ex_path.mnt);
+	} else
+		seq_printf(m, "Dir umounting");
+
 	seq_putc(m, '\t');
 	seq_escape(m, exp->ex_client->name, " \t\n\\");
 	seq_putc(m, '(');
@@ -694,15 +742,35 @@ static int svc_export_match(struct cache_head *a, struct cache_head *b)
 		path_equal(&orig->ex_path, &new->ex_path);
 }
 
+static void export_pin_kill(struct fs_pin *pin)
+{
+	struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);
+
+	if (!completion_done(&exp->ex_done)) {
+		schedule_work(&exp->ex_work);
+		wait_for_completion(&exp->ex_done);
+	}
+
+	path_put_unpin(&exp->ex_path, &exp->ex_pin);
+	svc_export_destroy(exp);
+}
+
+static void export_close_work(struct work_struct *work)
+{
+	struct svc_export *exp = container_of(work, struct svc_export, ex_work);
+	cache_delete_entry(exp->cd, &exp->h);
+}
+
 static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
 {
 	struct svc_export *new = container_of(cnew, struct svc_export, h);
 	struct svc_export *item = container_of(citem, struct svc_export, h);
 
+	init_fs_pin(&new->ex_pin, export_pin_kill);
 	kref_get(&item->ex_client->ref);
 	new->ex_client = item->ex_client;
 	new->ex_path = item->ex_path;
-	path_get(&item->ex_path);
+	path_get_pin(&new->ex_path, &new->ex_pin);
 	new->ex_fslocs.locations = NULL;
 	new->ex_fslocs.locations_count = 0;
 	new->ex_fslocs.migrated = 0;
@@ -740,10 +808,12 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
 
 static struct cache_head *svc_export_alloc(void)
 {
-	struct svc_export *i = kmalloc(sizeof(*i), GFP_KERNEL);
-	if (i)
+	struct svc_export *i = kzalloc(sizeof(*i), GFP_KERNEL);
+	if (i) {
+		INIT_WORK(&i->ex_work, export_close_work);
+		init_completion(&i->ex_done);
 		return &i->h;
-	else
+	} else
 		return NULL;
 }
 
@@ -798,6 +868,11 @@ svc_export_update(struct svc_export *new, struct svc_export *old)
 		return NULL;
 }
 
+static void exp_put_key(struct svc_expkey *key)
+{
+	mntput(key->ek_path.mnt);
+	cache_put(&key->h, key->cd);
+}
 
 static struct svc_expkey *
 exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
@@ -809,6 +884,7 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
 	if (!clp)
 		return ERR_PTR(-ENOENT);
 
+	key.cd = cd;
 	key.ek_client = clp;
 	key.ek_fsidtype = fsid_type;
 	memcpy(key.ek_fsid, fsidv, key_len(fsid_type));
@@ -819,6 +895,12 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
 	err = cache_check(cd, &ek->h, reqp);
 	if (err)
 		return ERR_PTR(err);
+
+	if (!legitimize_mntget(ek->ek_path.mnt)) {
+		cache_put(&ek->h, ek->cd);
+		return ERR_PTR(-ESTALE);
+	}
+
 	return ek;
 }
 
@@ -842,6 +924,12 @@ exp_get_by_name(struct cache_detail *cd, struct auth_domain *clp,
 	err = cache_check(cd, &exp->h, reqp);
 	if (err)
 		return ERR_PTR(err);
+
+	if (!legitimize_mntget(exp->ex_path.mnt)) {
+		cache_put(&exp->h, exp->cd);
+		return ERR_PTR(-ESTALE);
+	}
+
 	return exp;
 }
 
@@ -928,7 +1016,7 @@ static struct svc_export *exp_find(struct cache_detail *cd,
 		return ERR_CAST(ek);
 
 	exp = exp_get_by_name(cd, clp, &ek->ek_path, reqp);
-	cache_put(&ek->h, nn->svc_expkey_cache);
+	exp_put_key(ek);
 
 	if (IS_ERR(exp))
 		return ERR_CAST(exp);
@@ -1195,10 +1283,10 @@ static int e_show(struct seq_file *m, void *p)
 		return 0;
 	}
 
-	exp_get(exp);
+	cache_get(&exp->h);
 	if (cache_check(cd, &exp->h, NULL))
 		return 0;
-	exp_put(exp);
+	cache_put(&exp->h, exp->cd);
 	return svc_export_show(m, cd, cp);
 }
 
diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index 1f52bfc..52210fb 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -4,6 +4,7 @@
 #ifndef NFSD_EXPORT_H
 #define NFSD_EXPORT_H
 
+#include <linux/fs_pin.h>
 #include <linux/sunrpc/cache.h>
 #include <uapi/linux/nfsd/export.h>
 
@@ -46,9 +47,10 @@ struct exp_flavor_info {
 
 struct svc_export {
 	struct cache_head	h;
+	struct cache_detail	*cd;
+
 	struct auth_domain *	ex_client;
 	int			ex_flags;
-	struct path		ex_path;
 	kuid_t			ex_anon_uid;
 	kgid_t			ex_anon_gid;
 	int			ex_fsid;
@@ -58,7 +60,14 @@ struct svc_export {
 	struct exp_flavor_info	ex_flavors[MAX_SECINFO_LIST];
 	enum pnfs_layouttype	ex_layout_type;
 	struct nfsd4_deviceid_map *ex_devid_map;
-	struct cache_detail	*cd;
+
+	struct path		ex_path;
+	struct fs_pin		ex_pin;
+	struct rcu_head		rcu_head;
+
+	/* For svc_export_put and fs umounting window */
+	struct completion	ex_done;
+	struct work_struct	ex_work;
 };
 
 /* an "export key" (expkey) maps a filehandlefragement to an
@@ -67,12 +76,19 @@ struct svc_export {
  */
 struct svc_expkey {
 	struct cache_head	h;
+	struct cache_detail	*cd;
 
 	struct auth_domain *	ek_client;
 	int			ek_fsidtype;
 	u32			ek_fsid[6];
 
 	struct path		ek_path;
+	struct fs_pin		ek_pin;
+	struct rcu_head		rcu_head;
+
+	/* For expkey_put and fs umounting window */
+	struct completion	ek_done;
+	struct work_struct	ek_work;
 };
 
 #define EX_ISSYNC(exp)		(!((exp)->ex_flags & NFSEXP_ASYNC))
@@ -100,12 +116,14 @@ __be32			nfserrno(int errno);
 
 static inline void exp_put(struct svc_export *exp)
 {
+	mntput(exp->ex_path.mnt);
 	cache_put(&exp->h, exp->cd);
 }
 
 static inline struct svc_export *exp_get(struct svc_export *exp)
 {
 	cache_get(&exp->h);
+	mntget(exp->ex_path.mnt);
 	return exp;
 }
 struct svc_export * rqst_exp_find(struct svc_rqst *, int, u32 *);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/9 v8] fs_pin: Initialize value for fs_pin explicitly
  2015-07-27  3:06     ` Kinglong Mee
  (?)
@ 2015-07-29  0:25     ` NeilBrown
  2015-07-29 19:41       ` J. Bruce Fields
  -1 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2015-07-29  0:25 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Mon, 27 Jul 2015 11:06:53 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> Without initialization, the "done" field of an fs_pin on the stack
> may contain a garbage value.
> 
> v8, same as v3
> Add a guard macro to the header file.
> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>

Reviewed-by: NeilBrown <neilb@suse.com>
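
To spell out the failure mode: pin_kill() looks at ->done, so a pin
embedded in non-zeroed (stack or kmalloc'ed) memory can appear to be
already dying.  A rough illustration ("foo" and "foo_kill" are made-up
names):

	struct foo { struct fs_pin pin; };
	struct foo *f = kmalloc(sizeof(*f), GFP_KERNEL); /* not zeroed */

	init_fs_pin(&f->pin, foo_kill);
	/* Without the explicit "p->done = 0" in init_fs_pin(),
	 * f->pin.done keeps whatever garbage was in that memory, and
	 * pin_kill() may wrongly treat the pin as already dying. */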

It would be really good if some of these early patches could be applied
to the relevant trees so they appear in -next and we only need to keep
reviewing the more interesting code at the end.

Al, Bruce: any chance of some of these getting into -next ...

Thanks,
NeilBrown

> ---
>  include/linux/fs_pin.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
> index 3886b3b..0dde7b7 100644
> --- a/include/linux/fs_pin.h
> +++ b/include/linux/fs_pin.h
> @@ -1,3 +1,6 @@
> +#ifndef _LINUX_FS_PIN_H
> +#define _LINUX_FS_PIN_H
> +
>  #include <linux/wait.h>
>  
>  struct fs_pin {
> @@ -16,9 +19,12 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
>  	INIT_HLIST_NODE(&p->s_list);
>  	INIT_HLIST_NODE(&p->m_list);
>  	p->kill = kill;
> +	p->done = 0;
>  }
>  
>  void pin_remove(struct fs_pin *);
>  void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
>  void pin_insert(struct fs_pin *, struct vfsmount *);
>  void pin_kill(struct fs_pin *);
> +
> +#endif


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/9 v8] fs_pin: Export functions for specific filesystem
  2015-07-27  3:07     ` Kinglong Mee
  (?)
@ 2015-07-29  0:30     ` NeilBrown
  2015-07-30 12:31       ` Kinglong Mee
  -1 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2015-07-29  0:30 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Mon, 27 Jul 2015 11:07:25 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> Export these functions for other code that wants to pin a vfsmount,
> e.g., nfsd's export cache.
> 
> v8, same as v4
> add exporting of pin_kill.
> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>

Reviewed-by: NeilBrown <neilb@suse.com>

These are needed for any module to participate in pinning.

mnt_pin_kill() and group_pin_kill() are just helper functions for
unmount etc. (and are in fs/internal.h), so they don't need to be
exported.
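
For reference, the minimal shape of a module participating in pinning
(an untested sketch; "my_pin" and "my_pin_kill" are made-up names):

	struct my_pin {
		struct fs_pin pin;
		/* ... whatever state references the mount ... */
	};

	static void my_pin_kill(struct fs_pin *p)
	{
		struct my_pin *mp = container_of(p, struct my_pin, pin);

		/* drop whatever we hold against the mount, then ... */
		pin_remove(p);	/* unhash the pin and wake any waiters */
		kfree(mp);
	}

	/* and to attach to a mount: */
	init_fs_pin(&mp->pin, my_pin_kill);
	pin_insert(&mp->pin, mnt);	/* mnt: the vfsmount being pinned */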

Thanks,
NeilBrown

> ---
>  fs/fs_pin.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/fs_pin.c b/fs/fs_pin.c
> index 611b540..a1a4eb2 100644
> --- a/fs/fs_pin.c
> +++ b/fs/fs_pin.c
> @@ -17,6 +17,7 @@ void pin_remove(struct fs_pin *pin)
>  	wake_up_locked(&pin->wait);
>  	spin_unlock_irq(&pin->wait.lock);
>  }
> +EXPORT_SYMBOL(pin_remove);
>  
>  void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
>  {
> @@ -26,11 +27,13 @@ void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head
>  	hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
>  	spin_unlock(&pin_lock);
>  }
> +EXPORT_SYMBOL(pin_insert_group);
>  
>  void pin_insert(struct fs_pin *pin, struct vfsmount *m)
>  {
>  	pin_insert_group(pin, m, &m->mnt_sb->s_pins);
>  }
> +EXPORT_SYMBOL(pin_insert);
>  
>  void pin_kill(struct fs_pin *p)
>  {
> @@ -72,6 +75,7 @@ void pin_kill(struct fs_pin *p)
>  	}
>  	rcu_read_unlock();
>  }
> +EXPORT_SYMBOL(pin_kill);
>  
>  void mnt_pin_kill(struct mount *m)
>  {


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 4/9 v8] fs: New helper legitimize_mntget() for getting a legitimize mnt
@ 2015-07-29  2:06       ` NeilBrown
  0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2015-07-29  2:06 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Mon, 27 Jul 2015 11:08:32 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> New helper legitimize_mntget() for getting a reference to a mnt that
> does not have MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED set; otherwise
> it returns NULL.
> 
> v8, same as v6
> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> ---
>  fs/namespace.c        | 19 +++++++++++++++++++
>  include/linux/mount.h |  1 +
>  2 files changed, 20 insertions(+)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 2b8aa15..842cf57 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1153,6 +1153,25 @@ struct vfsmount *mntget(struct vfsmount *mnt)
>  }
>  EXPORT_SYMBOL(mntget);
>  
> +struct vfsmount *legitimize_mntget(struct vfsmount *vfsmnt)
> +{
> +	struct mount *mnt;
> +
> +	if (vfsmnt == NULL)
> +		return NULL;
> +
> +	read_seqlock_excl(&mount_lock);
> +	mnt = real_mount(vfsmnt);
> +	if (vfsmnt->mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED))
> +		vfsmnt = NULL;
> +	else
> +		mnt_add_count(mnt, 1);
> +	read_sequnlock_excl(&mount_lock);
> +
> +	return vfsmnt;
> +}
> +EXPORT_SYMBOL(legitimize_mntget);
> +
>  struct vfsmount *mnt_clone_internal(struct path *path)
>  {
>  	struct mount *p;
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index f822c3c..8ae9dc0 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -79,6 +79,7 @@ extern void mnt_drop_write(struct vfsmount *mnt);
>  extern void mnt_drop_write_file(struct file *file);
>  extern void mntput(struct vfsmount *mnt);
>  extern struct vfsmount *mntget(struct vfsmount *mnt);
> +extern struct vfsmount *legitimize_mntget(struct vfsmount *vfsmnt);
>  extern struct vfsmount *mnt_clone_internal(struct path *path);
>  extern int __mnt_is_readonly(struct vfsmount *mnt);
>  

It is unfortunate that we seem to have to take the mount_lock global
lock on every nfs request.  I wonder if we can avoid that....

What if we did:

  struct mount *mnt = real_mount(vfsmnt);
  int seq = 0;

retry:
  read_seqbegin_or_lock(&mount_lock, &seq);
  if (vfsmnt->mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED))
      vfsmnt = NULL;
  else if (need_seqretry(&mount_lock, seq)) {
      seq = 1;		/* take the lock on the next pass */
      goto retry;
  } else {
      mnt_add_count(mnt, 1);
      /* re-check now that the count is visibly elevated */
      if (need_seqretry(&mount_lock, seq) ||
          (vfsmnt->mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT |
                                MNT_DOOMED))) {
          mnt_add_count(mnt, -1);
          seq = 1;
          goto retry;
      }
  }
  done_seqretry(&mount_lock, seq);


Is there any risk from having a temporarily elevated mnt_count there?
I can't see one, but there is clearly some complexity in managing that
count.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 5/9 v8] sunrpc: Store cache_detail in seq_file's private directly
  2015-07-27  3:09 ` [PATCH 5/9 v8] sunrpc: Store cache_detail in seq_file's private directly Kinglong Mee
@ 2015-07-29  2:11   ` NeilBrown
  0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2015-07-29  2:11 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Mon, 27 Jul 2015 11:09:10 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> Cleanup.
> 
> Just store the cache_detail in the seq_file's private field;
> the separately allocated handle is redundant.
> 
> v8, same as v6.
> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> ---
>  net/sunrpc/cache.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 2928aff..edec603 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -1270,18 +1270,13 @@ EXPORT_SYMBOL_GPL(qword_get);
>   * get a header, then pass each real item in the cache
>   */
>  
> -struct handle {
> -	struct cache_detail *cd;
> -};
> -
>  static void *c_start(struct seq_file *m, loff_t *pos)
>  	__acquires(cd->hash_lock)
>  {
>  	loff_t n = *pos;
>  	unsigned int hash, entry;
>  	struct cache_head *ch;
> -	struct cache_detail *cd = ((struct handle*)m->private)->cd;
> -
> +	struct cache_detail *cd = m->private;
>  
>  	read_lock(&cd->hash_lock);
>  	if (!n--)
> @@ -1308,7 +1303,7 @@ static void *c_next(struct seq_file *m, void *p, loff_t *pos)
>  {
>  	struct cache_head *ch = p;
>  	int hash = (*pos >> 32);
> -	struct cache_detail *cd = ((struct handle*)m->private)->cd;
> +	struct cache_detail *cd = m->private;
>  
>  	if (p == SEQ_START_TOKEN)
>  		hash = 0;
> @@ -1334,14 +1329,14 @@ static void *c_next(struct seq_file *m, void *p, loff_t *pos)
>  static void c_stop(struct seq_file *m, void *p)
>  	__releases(cd->hash_lock)
>  {
> -	struct cache_detail *cd = ((struct handle*)m->private)->cd;
> +	struct cache_detail *cd = m->private;
>  	read_unlock(&cd->hash_lock);
>  }
>  
>  static int c_show(struct seq_file *m, void *p)
>  {
>  	struct cache_head *cp = p;
> -	struct cache_detail *cd = ((struct handle*)m->private)->cd;
> +	struct cache_detail *cd = m->private;
>  
>  	if (p == SEQ_START_TOKEN)
>  		return cd->cache_show(m, cd, NULL);
> @@ -1373,24 +1368,27 @@ static const struct seq_operations cache_content_op = {
>  static int content_open(struct inode *inode, struct file *file,
>  			struct cache_detail *cd)
>  {
> -	struct handle *han;
> +	struct seq_file *seq;
> +	int err;
>  
>  	if (!cd || !try_module_get(cd->owner))
>  		return -EACCES;
> -	han = __seq_open_private(file, &cache_content_op, sizeof(*han));
> -	if (han == NULL) {
> +
> +	err = seq_open(file, &cache_content_op);
> +	if (err) {
>  		module_put(cd->owner);
> -		return -ENOMEM;
> +		return err;
>  	}
>  
> -	han->cd = cd;
> +	seq = file->private_data;
> +	seq->private = cd;
>  	return 0;
>  }
>  
>  static int content_release(struct inode *inode, struct file *file,
>  		struct cache_detail *cd)
>  {
> -	int ret = seq_release_private(inode, file);
> +	int ret = seq_release(inode, file);
>  	module_put(cd->owner);
>  	return ret;
>  }


Reviewed-by: NeilBrown <neilb@suse.com>

The 'struct handle' structure hasn't been needed since 2.5.74!

commit 121bed2470d9e8ddeeaa0b9dc2af5561ba07df87
Author: Neil Brown <neilb@cse.unsw.edu.au>
Date:   Thu Jun 26 19:37:59 2003 -0700

    [PATCH] Remove path buffer passed around by cache_show routines
    
    this was need for paths, but now we have seq_path...


This is certainly a patch that could go upstream without waiting for
the rest :-)

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/9 v8] sunrpc/nfsd: Remove redundant code by exports seq_operations functions
  2015-07-27  3:09     ` Kinglong Mee
  (?)
@ 2015-07-29  2:15     ` NeilBrown
  -1 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2015-07-29  2:15 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Mon, 27 Jul 2015 11:09:42 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> Nfsd implements a set of seq_operations functions that duplicate
> sunrpc's cache code.  Just export sunrpc's versions and remove nfsd's
> redundant copies.
> 
> v8, same as v6
> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>

Reviewed-by: NeilBrown <neilb@suse.com>

This depends on the previous cleanup - together they are a very nice
improvement.

NeilBrown


> ---
>  fs/nfsd/export.c             | 73 ++------------------------------------------
>  include/linux/sunrpc/cache.h |  5 +++
>  net/sunrpc/cache.c           | 15 +++++----
>  3 files changed, 17 insertions(+), 76 deletions(-)
> 
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index f79521a..b4d84b5 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -1075,73 +1075,6 @@ exp_pseudoroot(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  	return rv;
>  }
>  
> -/* Iterator */
> -
> -static void *e_start(struct seq_file *m, loff_t *pos)
> -	__acquires(((struct cache_detail *)m->private)->hash_lock)
> -{
> -	loff_t n = *pos;
> -	unsigned hash, export;
> -	struct cache_head *ch;
> -	struct cache_detail *cd = m->private;
> -	struct cache_head **export_table = cd->hash_table;
> -
> -	read_lock(&cd->hash_lock);
> -	if (!n--)
> -		return SEQ_START_TOKEN;
> -	hash = n >> 32;
> -	export = n & ((1LL<<32) - 1);
> -
> -	
> -	for (ch=export_table[hash]; ch; ch=ch->next)
> -		if (!export--)
> -			return ch;
> -	n &= ~((1LL<<32) - 1);
> -	do {
> -		hash++;
> -		n += 1LL<<32;
> -	} while(hash < EXPORT_HASHMAX && export_table[hash]==NULL);
> -	if (hash >= EXPORT_HASHMAX)
> -		return NULL;
> -	*pos = n+1;
> -	return export_table[hash];
> -}
> -
> -static void *e_next(struct seq_file *m, void *p, loff_t *pos)
> -{
> -	struct cache_head *ch = p;
> -	int hash = (*pos >> 32);
> -	struct cache_detail *cd = m->private;
> -	struct cache_head **export_table = cd->hash_table;
> -
> -	if (p == SEQ_START_TOKEN)
> -		hash = 0;
> -	else if (ch->next == NULL) {
> -		hash++;
> -		*pos += 1LL<<32;
> -	} else {
> -		++*pos;
> -		return ch->next;
> -	}
> -	*pos &= ~((1LL<<32) - 1);
> -	while (hash < EXPORT_HASHMAX && export_table[hash] == NULL) {
> -		hash++;
> -		*pos += 1LL<<32;
> -	}
> -	if (hash >= EXPORT_HASHMAX)
> -		return NULL;
> -	++*pos;
> -	return export_table[hash];
> -}
> -
> -static void e_stop(struct seq_file *m, void *p)
> -	__releases(((struct cache_detail *)m->private)->hash_lock)
> -{
> -	struct cache_detail *cd = m->private;
> -
> -	read_unlock(&cd->hash_lock);
> -}
> -
>  static struct flags {
>  	int flag;
>  	char *name[2];
> @@ -1270,9 +1203,9 @@ static int e_show(struct seq_file *m, void *p)
>  }
>  
>  const struct seq_operations nfs_exports_op = {
> -	.start	= e_start,
> -	.next	= e_next,
> -	.stop	= e_stop,
> +	.start	= cache_seq_start,
> +	.next	= cache_seq_next,
> +	.stop	= cache_seq_stop,
>  	.show	= e_show,
>  };
>  
> diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
> index 437ddb6..04ee5a2 100644
> --- a/include/linux/sunrpc/cache.h
> +++ b/include/linux/sunrpc/cache.h
> @@ -224,6 +224,11 @@ extern int sunrpc_cache_register_pipefs(struct dentry *parent, const char *,
>  					umode_t, struct cache_detail *);
>  extern void sunrpc_cache_unregister_pipefs(struct cache_detail *);
>  
> +/* Must store cache_detail in seq_file->private if using next three functions */
> +extern void *cache_seq_start(struct seq_file *file, loff_t *pos);
> +extern void *cache_seq_next(struct seq_file *file, void *p, loff_t *pos);
> +extern void cache_seq_stop(struct seq_file *file, void *p);
> +
>  extern void qword_add(char **bpp, int *lp, char *str);
>  extern void qword_addhex(char **bpp, int *lp, char *buf, int blen);
>  extern int qword_get(char **bpp, char *dest, int bufsize);
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index edec603..673c2fa 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -1270,7 +1270,7 @@ EXPORT_SYMBOL_GPL(qword_get);
>   * get a header, then pass each real item in the cache
>   */
>  
> -static void *c_start(struct seq_file *m, loff_t *pos)
> +void *cache_seq_start(struct seq_file *m, loff_t *pos)
>  	__acquires(cd->hash_lock)
>  {
>  	loff_t n = *pos;
> @@ -1298,8 +1298,9 @@ static void *c_start(struct seq_file *m, loff_t *pos)
>  	*pos = n+1;
>  	return cd->hash_table[hash];
>  }
> +EXPORT_SYMBOL_GPL(cache_seq_start);
>  
> -static void *c_next(struct seq_file *m, void *p, loff_t *pos)
> +void *cache_seq_next(struct seq_file *m, void *p, loff_t *pos)
>  {
>  	struct cache_head *ch = p;
>  	int hash = (*pos >> 32);
> @@ -1325,13 +1326,15 @@ static void *c_next(struct seq_file *m, void *p, loff_t *pos)
>  	++*pos;
>  	return cd->hash_table[hash];
>  }
> +EXPORT_SYMBOL_GPL(cache_seq_next);
>  
> -static void c_stop(struct seq_file *m, void *p)
> +void cache_seq_stop(struct seq_file *m, void *p)
>  	__releases(cd->hash_lock)
>  {
>  	struct cache_detail *cd = m->private;
>  	read_unlock(&cd->hash_lock);
>  }
> +EXPORT_SYMBOL_GPL(cache_seq_stop);
>  
>  static int c_show(struct seq_file *m, void *p)
>  {
> @@ -1359,9 +1362,9 @@ static int c_show(struct seq_file *m, void *p)
>  }
>  
>  static const struct seq_operations cache_content_op = {
> -	.start	= c_start,
> -	.next	= c_next,
> -	.stop	= c_stop,
> +	.start	= cache_seq_start,
> +	.next	= cache_seq_next,
> +	.stop	= cache_seq_stop,
>  	.show	= c_show,
>  };
>  


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 7/9 v8] sunrpc: Switch to using hash list instead single list
  2015-07-27  3:10 ` [PATCH 7/9 v8] sunrpc: Switch to using hash list instead single list Kinglong Mee
@ 2015-07-29  2:19   ` NeilBrown
  2015-07-29 19:51       ` J. Bruce Fields
  0 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2015-07-29  2:19 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Mon, 27 Jul 2015 11:10:15 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> Switch cache_head in cache_detail to a hash list (hlist); this makes
> it possible to remove a cache_head entry directly from its
> cache_detail.
> 
> v8, using a hash list (hlist), not a list_head
> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>

Reviewed-by: NeilBrown <neilb@suse.com>

Thanks,
NeilBrown
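
The practical win: with an hlist node an entry can be unhashed given
only the cache_head, without walking its bucket - which is what the
next patch's cache_delete_entry() relies on.  Roughly:

	write_lock(&detail->hash_lock);
	if (!hlist_unhashed(&h->cache_list)) {
		hlist_del_init(&h->cache_list);	/* no bucket walk needed */
		detail->entries--;
	}
	write_unlock(&detail->hash_lock);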

> ---
>  include/linux/sunrpc/cache.h |  4 +--
>  net/sunrpc/cache.c           | 60 +++++++++++++++++++++++---------------------
>  2 files changed, 33 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
> index 04ee5a2..03d3b4c 100644
> --- a/include/linux/sunrpc/cache.h
> +++ b/include/linux/sunrpc/cache.h
> @@ -46,7 +46,7 @@
>   * 
>   */
>  struct cache_head {
> -	struct cache_head * next;
> +	struct hlist_node	cache_list;
>  	time_t		expiry_time;	/* After time time, don't use the data */
>  	time_t		last_refresh;   /* If CACHE_PENDING, this is when upcall 
>  					 * was sent, else this is when update was received
> @@ -73,7 +73,7 @@ struct cache_detail_pipefs {
>  struct cache_detail {
>  	struct module *		owner;
>  	int			hash_size;
> -	struct cache_head **	hash_table;
> +	struct hlist_head *	hash_table;
>  	rwlock_t		hash_lock;
>  
>  	atomic_t		inuse; /* active user-space update or lookup */
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 673c2fa..4a2340a 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -44,7 +44,7 @@ static void cache_revisit_request(struct cache_head *item);
>  static void cache_init(struct cache_head *h)
>  {
>  	time_t now = seconds_since_boot();
> -	h->next = NULL;
> +	INIT_HLIST_NODE(&h->cache_list);
>  	h->flags = 0;
>  	kref_init(&h->ref);
>  	h->expiry_time = now + CACHE_NEW_EXPIRY;
> @@ -54,15 +54,14 @@ static void cache_init(struct cache_head *h)
>  struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
>  				       struct cache_head *key, int hash)
>  {
> -	struct cache_head **head,  **hp;
> -	struct cache_head *new = NULL, *freeme = NULL;
> +	struct cache_head *new = NULL, *freeme = NULL, *tmp = NULL;
> +	struct hlist_head *head;
>  
>  	head = &detail->hash_table[hash];
>  
>  	read_lock(&detail->hash_lock);
>  
> -	for (hp=head; *hp != NULL ; hp = &(*hp)->next) {
> -		struct cache_head *tmp = *hp;
> +	hlist_for_each_entry(tmp, head, cache_list) {
>  		if (detail->match(tmp, key)) {
>  			if (cache_is_expired(detail, tmp))
>  				/* This entry is expired, we will discard it. */
> @@ -88,12 +87,10 @@ struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
>  	write_lock(&detail->hash_lock);
>  
>  	/* check if entry appeared while we slept */
> -	for (hp=head; *hp != NULL ; hp = &(*hp)->next) {
> -		struct cache_head *tmp = *hp;
> +	hlist_for_each_entry(tmp, head, cache_list) {
>  		if (detail->match(tmp, key)) {
>  			if (cache_is_expired(detail, tmp)) {
> -				*hp = tmp->next;
> -				tmp->next = NULL;
> +				hlist_del_init(&tmp->cache_list);
>  				detail->entries --;
>  				freeme = tmp;
>  				break;
> @@ -104,8 +101,8 @@ struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
>  			return tmp;
>  		}
>  	}
> -	new->next = *head;
> -	*head = new;
> +
> +	hlist_add_head(&new->cache_list, head);
>  	detail->entries++;
>  	cache_get(new);
>  	write_unlock(&detail->hash_lock);
> @@ -143,7 +140,6 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
>  	 * If 'old' is not VALID, we update it directly,
>  	 * otherwise we need to replace it
>  	 */
> -	struct cache_head **head;
>  	struct cache_head *tmp;
>  
>  	if (!test_bit(CACHE_VALID, &old->flags)) {
> @@ -168,15 +164,13 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
>  	}
>  	cache_init(tmp);
>  	detail->init(tmp, old);
> -	head = &detail->hash_table[hash];
>  
>  	write_lock(&detail->hash_lock);
>  	if (test_bit(CACHE_NEGATIVE, &new->flags))
>  		set_bit(CACHE_NEGATIVE, &tmp->flags);
>  	else
>  		detail->update(tmp, new);
> -	tmp->next = *head;
> -	*head = tmp;
> +	hlist_add_head(&tmp->cache_list, &detail->hash_table[hash]);
>  	detail->entries++;
>  	cache_get(tmp);
>  	cache_fresh_locked(tmp, new->expiry_time);
> @@ -416,28 +410,29 @@ static int cache_clean(void)
>  	/* find a non-empty bucket in the table */
>  	while (current_detail &&
>  	       current_index < current_detail->hash_size &&
> -	       current_detail->hash_table[current_index] == NULL)
> +	       hlist_empty(&current_detail->hash_table[current_index]))
>  		current_index++;
>  
>  	/* find a cleanable entry in the bucket and clean it, or set to next bucket */
>  
>  	if (current_detail && current_index < current_detail->hash_size) {
> -		struct cache_head *ch, **cp;
> +		struct cache_head *ch = NULL;
>  		struct cache_detail *d;
> +		struct hlist_head *head;
> +		struct hlist_node *tmp;
>  
>  		write_lock(&current_detail->hash_lock);
>  
>  		/* Ok, now to clean this strand */
>  
> -		cp = & current_detail->hash_table[current_index];
> -		for (ch = *cp ; ch ; cp = & ch->next, ch = *cp) {
> +		head = &current_detail->hash_table[current_index];
> +		hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
>  			if (current_detail->nextcheck > ch->expiry_time)
>  				current_detail->nextcheck = ch->expiry_time+1;
>  			if (!cache_is_expired(current_detail, ch))
>  				continue;
>  
> -			*cp = ch->next;
> -			ch->next = NULL;
> +			hlist_del_init(&ch->cache_list);
>  			current_detail->entries--;
>  			rv = 1;
>  			break;
> @@ -1284,7 +1279,7 @@ void *cache_seq_start(struct seq_file *m, loff_t *pos)
>  	hash = n >> 32;
>  	entry = n & ((1LL<<32) - 1);
>  
> -	for (ch=cd->hash_table[hash]; ch; ch=ch->next)
> +	hlist_for_each_entry(ch, &cd->hash_table[hash], cache_list)
>  		if (!entry--)
>  			return ch;
>  	n &= ~((1LL<<32) - 1);
> @@ -1292,11 +1287,12 @@ void *cache_seq_start(struct seq_file *m, loff_t *pos)
>  		hash++;
>  		n += 1LL<<32;
>  	} while(hash < cd->hash_size &&
> -		cd->hash_table[hash]==NULL);
> +		hlist_empty(&cd->hash_table[hash]));
>  	if (hash >= cd->hash_size)
>  		return NULL;
>  	*pos = n+1;
> -	return cd->hash_table[hash];
> +	return hlist_entry_safe(cd->hash_table[hash].first,
> +				struct cache_head, cache_list);
>  }
>  EXPORT_SYMBOL_GPL(cache_seq_start);
>  
> @@ -1308,23 +1304,25 @@ void *cache_seq_next(struct seq_file *m, void *p, loff_t *pos)
>  
>  	if (p == SEQ_START_TOKEN)
>  		hash = 0;
> -	else if (ch->next == NULL) {
> +	else if (ch->cache_list.next == NULL) {
>  		hash++;
>  		*pos += 1LL<<32;
>  	} else {
>  		++*pos;
> -		return ch->next;
> +		return hlist_entry_safe(ch->cache_list.next,
> +					struct cache_head, cache_list);
>  	}
>  	*pos &= ~((1LL<<32) - 1);
>  	while (hash < cd->hash_size &&
> -	       cd->hash_table[hash] == NULL) {
> +	       hlist_empty(&cd->hash_table[hash])) {
>  		hash++;
>  		*pos += 1LL<<32;
>  	}
>  	if (hash >= cd->hash_size)
>  		return NULL;
>  	++*pos;
> -	return cd->hash_table[hash];
> +	return hlist_entry_safe(cd->hash_table[hash].first,
> +				struct cache_head, cache_list);
>  }
>  EXPORT_SYMBOL_GPL(cache_seq_next);
>  
> @@ -1666,17 +1664,21 @@ EXPORT_SYMBOL_GPL(cache_unregister_net);
>  struct cache_detail *cache_create_net(struct cache_detail *tmpl, struct net *net)
>  {
>  	struct cache_detail *cd;
> +	int i;
>  
>  	cd = kmemdup(tmpl, sizeof(struct cache_detail), GFP_KERNEL);
>  	if (cd == NULL)
>  		return ERR_PTR(-ENOMEM);
>  
> -	cd->hash_table = kzalloc(cd->hash_size * sizeof(struct cache_head *),
> +	cd->hash_table = kzalloc(cd->hash_size * sizeof(struct hlist_head),
>  				 GFP_KERNEL);
>  	if (cd->hash_table == NULL) {
>  		kfree(cd);
>  		return ERR_PTR(-ENOMEM);
>  	}
> +
> +	for (i = 0; i < cd->hash_size; i++)
> +		INIT_HLIST_HEAD(&cd->hash_table[i]);
>  	cd->net = net;
>  	return cd;
>  }


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/9 v8] sunrpc: New helper cache_delete_entry for deleting cache_head directly
@ 2015-07-29  2:29       ` NeilBrown
  0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2015-07-29  2:29 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Mon, 27 Jul 2015 11:10:45 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> A new helper cache_delete_entry() for deleting a cache_head from its
> cache_detail directly.
> 
> It will be used by pin_kill, so the cache_detail must be checked for
> validity before deleting.

I cannot see any justification for validating the cache_detail.

When this gets called, the cache_head has not yet been freed (though it
probably will be soon) so the cache_detail must still be around.

However it is possible for this to race with cache_clean() which could
have already removed the cache_head from the list (and decremented
->entries), but which hasn't called cache_put() yet.

The use of cache_list_lock is not enough to protect against that race.

So I think you should drop the use of cache_list_lock, drop the check
that detail is still in the list, and after taking ->hash_lock, check
hlist_unhashed() and only delete the entry if it is still hashed.
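
I.e., something like this (untested sketch, using the names from your
patch):

	void cache_delete_entry(struct cache_detail *detail, struct cache_head *h)
	{
		write_lock(&detail->hash_lock);
		if (hlist_unhashed(&h->cache_list)) {
			/* lost the race: cache_clean() got here first */
			write_unlock(&detail->hash_lock);
			return;
		}
		hlist_del_init(&h->cache_list);
		detail->entries--;
		set_bit(CACHE_CLEANED, &h->flags);
		write_unlock(&detail->hash_lock);
		cache_put(h, detail);
	}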

Thanks,
NeilBrown

> 
> Because pin_kill happens rarely, the performance impact is
> acceptable.
> 
> v8, same as v6.
> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> ---
>  include/linux/sunrpc/cache.h |  1 +
>  net/sunrpc/cache.c           | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
> index 03d3b4c..2824db5 100644
> --- a/include/linux/sunrpc/cache.h
> +++ b/include/linux/sunrpc/cache.h
> @@ -210,6 +210,7 @@ extern int cache_check(struct cache_detail *detail,
>  		       struct cache_head *h, struct cache_req *rqstp);
>  extern void cache_flush(void);
>  extern void cache_purge(struct cache_detail *detail);
> +extern void cache_delete_entry(struct cache_detail *cd, struct cache_head *h);
>  #define NEVER (0x7FFFFFFF)
>  extern void __init cache_initialize(void);
>  extern int cache_register_net(struct cache_detail *cd, struct net *net);
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 4a2340a..b722aea 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -454,6 +454,36 @@ static int cache_clean(void)
>  	return rv;
>  }
>  
> +void cache_delete_entry(struct cache_detail *detail, struct cache_head *h)
> +{
> +	struct cache_detail *tmp;
> +
> +	if (!detail || !h)
> +		return;
> +
> +	spin_lock(&cache_list_lock);
> +	list_for_each_entry(tmp, &cache_list, others) {
> +		if (tmp == detail)
> +			goto found;
> +	}
> +	spin_unlock(&cache_list_lock);
> +	printk(KERN_WARNING "%s: Deleted cache detail %p\n", __func__, detail);
> +	return ;
> +
> +found:
> +	write_lock(&detail->hash_lock);
> +
> +	hlist_del_init(&h->cache_list);
> +	detail->entries--;
> +	set_bit(CACHE_CLEANED, &h->flags);
> +
> +	write_unlock(&detail->hash_lock);
> +	spin_unlock(&cache_list_lock);
> +
> +	cache_put(h, detail);
> +}
> +EXPORT_SYMBOL_GPL(cache_delete_entry);
> +
>  /*
>   * We want to regularly clean the cache, so we need to schedule some work ...
>   */


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 9/9 v8] nfsd: Allows user un-mounting filesystem where nfsd exports base on
  2015-07-27  3:12     ` Kinglong Mee
@ 2015-07-29  3:56         ` NeilBrown
  -1 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2015-07-29  3:56 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Mon, 27 Jul 2015 11:12:06 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> If there are mount points (not exported for nfs) under the pseudo root,
> then after a client has operated on those entries under the root, nobody
> can unmount those mount points until the export cache expires.
> 
> /nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
> /nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
> total 0
> drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
> drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
> drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
> Filesystem                      1K-blocks    Used Available Use% Mounted on
> ......
> /dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
> /dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
> /dev/sde                           999320    1284    929224   1% /nfs/test
> /mnt/pnfs/:
> total 0
> -rw-r--r--. 1 root root 0 Apr 21 22:23 attr
> drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp
> 
> /mnt/xfs/:
> total 0
> umount: /nfs/test/: target is busy
>         (In some cases useful info about processes that
>         use the device is found by lsof(8) or fuser(1).)
> 
> It's caused by nfsd's export cache holding a reference to
> the path (here /nfs/test/), so it can't be unmounted.
> 
> I don't think that's what users expect; they want to umount /nfs/test/.
> Bruce thinks users should also be able to umount /nfs/pnfs/ and /nfs/xfs.
> 
> Also, use kzalloc for all memory allocation instead of kmalloc.
> Thanks to Al Viro for his comments on the logic of fs_pin.
> 
> v3,
> 1. use path_get_pin/path_put_unpin to pin the path
> 2. use kzalloc for memory allocation
> 
> v5, v4,
> 1. add a completion so pin_kill waits until the reference drops to zero.
> 2. add a work_struct so pin_kill decreases the reference indirectly.
> 3. free svc_export/svc_expkey in pin_kill, not svc_export_put/svc_expkey_put.
> 4. svc_export_put/svc_expkey_put go through the pin_kill logic.
> 
> v6,
> 1. Pin the vfsmnt to the mount point at first; when the reference
>    increases (==2), grab a reference to the vfsmnt with mntget.  When
>    it decreases (==1), drop the reference to the vfsmnt, leaving the pin.
> 2. Delete the cache_head directly from the cache_detail.
> 
> v7,
> implement self reference increase and decrease for nfsd exports/expkey
> 
> v8,
> the new method is:
> 
> 1. There are only one outlet from each cache, exp_find_key() for expkey,
>    exp_get_by_name() for export.
> 2. Any fsid to export or filehandle to export will call the function.
> 3. exp_get()/exp_put() increase/decrease the reference of export.
> 
> Call legitimize_mntget() in the only outlet function exp_find_key()/
> exp_get_by_name(), if fail return STALE, otherwise, any valid
> expkey/export from the cache is validated (Have get the reference of vfsmnt).
> 
> Add mntget() in exp_get() and mntput() in exp_put(), because the export
> passed to exp_get/exp_put are returned from exp_find_key/exp_get_by_name.
> 
> For expkey cache,
> 1. At first, a fsid is passed to exp_find_key, and lookup a cache
>    in svc_expkey_lookup, if success, ekey->ek_path is pined to mount.
> 2. Then call legitimize_mntget getting a reference of vfsmnt 
>    before return from exp_find_key.
> 3. Any calling exp_find_key with valid cache must put the vfsmnt.
> 
> for export cache,
> 1. At first, a path (returned from exp_find_key) with validate vfsmnt
>    is passed to exp_get_by_name, if success, exp->ex_path is pined to mount.
> 2. Then call legitimize_mntget getting a reference of vfsmnt 
>    before return from exp_get_by_name.
> 3. Any calling exp_get_by_name with valid cache must put the vfsmnt
>    by exp_put();
> 4. Any using the exp returned from exp_get_by_name must call exp_get(),
>    will increase the reference of vfsmnt.
> 
> So that,
> a. After getting the reference in 2, any umount of filesystem will get -EBUSY.
> b. After put all reference after 4, or before get the reference in 2, 
>    any umount of filesystem will call pin_kill, and delete the cache directly,
>    also unpin the vfsmount.
> c. Between 1 and 2, have get the reference of exp/key cache, with invalidate vfsmnt.
>    As you said, umount of filesystem only wait exp_find_key/exp_get_by_name
>    put the reference of cache when legitimize_mntget fail.
> 
> Signed-off-by: Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  fs/nfsd/export.c | 136 +++++++++++++++++++++++++++++++++++++++++++++----------
>  fs/nfsd/export.h |  22 ++++++++-
>  2 files changed, 132 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index b4d84b5..7f4816d 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -37,15 +37,23 @@
>  #define	EXPKEY_HASHMAX		(1 << EXPKEY_HASHBITS)
>  #define	EXPKEY_HASHMASK		(EXPKEY_HASHMAX -1)
>  
> +static void expkey_destroy(struct svc_expkey *key)
> +{
> +	auth_domain_put(key->ek_client);
> +	kfree_rcu(key, rcu_head);
> +}
> +
>  static void expkey_put(struct kref *ref)
>  {
>  	struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref);
>  
>  	if (test_bit(CACHE_VALID, &key->h.flags) &&
> -	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
> -		path_put(&key->ek_path);
> -	auth_domain_put(key->ek_client);
> -	kfree(key);
> +	    !test_bit(CACHE_NEGATIVE, &key->h.flags)) {
> +		rcu_read_lock();
> +		complete(&key->ek_done);
> +		pin_kill(&key->ek_pin);
> +	} else
> +		expkey_destroy(key);
>  }
>  
>  static void expkey_request(struct cache_detail *cd,
> @@ -83,7 +91,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
>  		return -EINVAL;
>  	mesg[mlen-1] = 0;
>  
> -	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);

Why this change?  There are certainly times when kzmalloc is
appropriate but I don't see that this is one of them, or that the
change has anything to do with the rest of the patch.


>  	err = -ENOMEM;
>  	if (!buf)
>  		goto out;
> @@ -119,6 +127,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
>  	if (key.h.expiry_time == 0)
>  		goto out;
>  
> +	key.cd = cd;
>  	key.ek_client = dom;	
>  	key.ek_fsidtype = fsidtype;
>  	memcpy(key.ek_fsid, buf, len);
> @@ -181,7 +190,11 @@ static int expkey_show(struct seq_file *m,
>  	if (test_bit(CACHE_VALID, &h->flags) && 
>  	    !test_bit(CACHE_NEGATIVE, &h->flags)) {
>  		seq_printf(m, " ");
> -		seq_path(m, &ek->ek_path, "\\ \t\n");
> +		if (legitimize_mntget(ek->ek_path.mnt)) {
> +			seq_path(m, &ek->ek_path, "\\ \t\n");
> +			mntput(ek->ek_path.mnt);
> +		} else
> +			seq_printf(m, "Dir umounting");

This "Dir umounting" needs to parse as a single word, so having a space
in there is bad.  Maybe "Dir-unmounting".


>  	}
>  	seq_printf(m, "\n");
>  	return 0;
> @@ -210,6 +223,26 @@ static inline void expkey_init(struct cache_head *cnew,
>  	new->ek_fsidtype = item->ek_fsidtype;
>  
>  	memcpy(new->ek_fsid, item->ek_fsid, sizeof(new->ek_fsid));
> +	new->cd = item->cd;
> +}
> +
> +static void expkey_pin_kill(struct fs_pin *pin)
> +{
> +	struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
> +
> +	if (!completion_done(&key->ek_done)) {
> +		schedule_work(&key->ek_work);
> +		wait_for_completion(&key->ek_done);
> +	}
> +
> +	path_put_unpin(&key->ek_path, &key->ek_pin);
> +	expkey_destroy(key);
> +}
> +
> +static void expkey_close_work(struct work_struct *work)
> +{
> +	struct svc_expkey *key = container_of(work, struct svc_expkey, ek_work);
> +	cache_delete_entry(key->cd, &key->h);
>  }

I'm perplexed by this separate scheduled work.
You say:

> 2. add a work_struct for pin_kill decreases the reference indirectly.

above.
cache_delete_entry() can call cache_put() which would call expkey_put()
which calls pin_kill(), which will block until path_put_unpin calls
pin_remove() which of course now cannot happen.

So I can see why you have it, but I really really don't like it. :-(

I'll post a patch to make a change to fs_pin so this sort of thing
should be much easier.

>  
>  static inline void expkey_update(struct cache_head *cnew,
> @@ -218,16 +251,19 @@ static inline void expkey_update(struct cache_head *cnew,
>  	struct svc_expkey *new = container_of(cnew, struct svc_expkey, h);
>  	struct svc_expkey *item = container_of(citem, struct svc_expkey, h);
>  
> +	init_fs_pin(&new->ek_pin, expkey_pin_kill);
>  	new->ek_path = item->ek_path;
> -	path_get(&item->ek_path);
> +	path_get_pin(&new->ek_path, &new->ek_pin);
>  }
>  
>  static struct cache_head *expkey_alloc(void)
>  {
> -	struct svc_expkey *i = kmalloc(sizeof(*i), GFP_KERNEL);
> -	if (i)
> +	struct svc_expkey *i = kzalloc(sizeof(*i), GFP_KERNEL);
> +	if (i) {
> +		INIT_WORK(&i->ek_work, expkey_close_work);
> +		init_completion(&i->ek_done);
>  		return &i->h;
> -	else
> +	} else
>  		return NULL;
>  }

I'm slightly less offended by this kzalloc, but I still think it needs
to be justified if it is going to remain.


>  
> @@ -306,14 +342,21 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc)
>  	fsloc->locations = NULL;
>  }
>  
> -static void svc_export_put(struct kref *ref)
> +static void svc_export_destroy(struct svc_export *exp)
>  {
> -	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
> -	path_put(&exp->ex_path);
>  	auth_domain_put(exp->ex_client);
>  	nfsd4_fslocs_free(&exp->ex_fslocs);
>  	kfree(exp->ex_uuid);
> -	kfree(exp);
> +	kfree_rcu(exp, rcu_head);
> +}
> +
> +static void svc_export_put(struct kref *ref)
> +{
> +	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
> +
> +	rcu_read_lock();
> +	complete(&exp->ex_done);
> +	pin_kill(&exp->ex_pin);
>  }
>  
>  static void svc_export_request(struct cache_detail *cd,
> @@ -520,7 +563,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
>  		return -EINVAL;
>  	mesg[mlen-1] = 0;
>  
> -	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
>  	if (!buf)
>  		return -ENOMEM;
>  
> @@ -636,7 +679,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
>  	if (expp == NULL)
>  		err = -ENOMEM;
>  	else
> -		exp_put(expp);
> +		cache_put(&expp->h, expp->cd);
>  out4:
>  	nfsd4_fslocs_free(&exp.ex_fslocs);
>  	kfree(exp.ex_uuid);
> @@ -664,7 +707,12 @@ static int svc_export_show(struct seq_file *m,
>  		return 0;
>  	}
>  	exp = container_of(h, struct svc_export, h);
> -	seq_path(m, &exp->ex_path, " \t\n\\");
> +	if (legitimize_mntget(exp->ex_path.mnt)) {
> +		seq_path(m, &exp->ex_path, " \t\n\\");
> +		mntput(exp->ex_path.mnt);
> +	} else
> +		seq_printf(m, "Dir umounting");
> +

again, "Dir-umounting" .. or even "Dir-unmounting" with the 'n'.


>  	seq_putc(m, '\t');
>  	seq_escape(m, exp->ex_client->name, " \t\n\\");
>  	seq_putc(m, '(');
> @@ -694,15 +742,35 @@ static int svc_export_match(struct cache_head *a, struct cache_head *b)
>  		path_equal(&orig->ex_path, &new->ex_path);
>  }
>  
> +static void export_pin_kill(struct fs_pin *pin)
> +{
> +	struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);
> +
> +	if (!completion_done(&exp->ex_done)) {
> +		schedule_work(&exp->ex_work);
> +		wait_for_completion(&exp->ex_done);
> +	}
> +
> +	path_put_unpin(&exp->ex_path, &exp->ex_pin);
> +	svc_export_destroy(exp);
> +}
> +
> +static void export_close_work(struct work_struct *work)
> +{
> +	struct svc_export *exp = container_of(work, struct svc_export, ex_work);
> +	cache_delete_entry(exp->cd, &exp->h);
> +}
> +
>  static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
>  {
>  	struct svc_export *new = container_of(cnew, struct svc_export, h);
>  	struct svc_export *item = container_of(citem, struct svc_export, h);
>  
> +	init_fs_pin(&new->ex_pin, export_pin_kill);
>  	kref_get(&item->ex_client->ref);
>  	new->ex_client = item->ex_client;
>  	new->ex_path = item->ex_path;
> -	path_get(&item->ex_path);
> +	path_get_pin(&new->ex_path, &new->ex_pin);
>  	new->ex_fslocs.locations = NULL;
>  	new->ex_fslocs.locations_count = 0;
>  	new->ex_fslocs.migrated = 0;
> @@ -740,10 +808,12 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
>  
>  static struct cache_head *svc_export_alloc(void)
>  {
> -	struct svc_export *i = kmalloc(sizeof(*i), GFP_KERNEL);
> -	if (i)
> +	struct svc_export *i = kzalloc(sizeof(*i), GFP_KERNEL);
> +	if (i) {
> +		INIT_WORK(&i->ex_work, export_close_work);
> +		init_completion(&i->ex_done);
>  		return &i->h;
> -	else
> +	} else
>  		return NULL;
>  }
>  
> @@ -798,6 +868,11 @@ svc_export_update(struct svc_export *new, struct svc_export *old)
>  		return NULL;
>  }
>  
> +static void exp_put_key(struct svc_expkey *key)
> +{
> +	mntput(key->ek_path.mnt);
> +	cache_put(&key->h, key->cd);
> +}

This is only called in one place.  Does it really help clarity to make
it a separate function?

>  
>  static struct svc_expkey *
>  exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
> @@ -809,6 +884,7 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
>  	if (!clp)
>  		return ERR_PTR(-ENOENT);
>  
> +	key.cd = cd;
>  	key.ek_client = clp;
>  	key.ek_fsidtype = fsid_type;
>  	memcpy(key.ek_fsid, fsidv, key_len(fsid_type));
> @@ -819,6 +895,12 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
>  	err = cache_check(cd, &ek->h, reqp);
>  	if (err)
>  		return ERR_PTR(err);
> +
> +	if (!legitimize_mntget(ek->ek_path.mnt)) {
> +		cache_put(&ek->h, ek->cd);
> +		return ERR_PTR(-ESTALE);
> +	}
> +

I think -ENOENT would be a better error code here.
Just pretend that the entry doesn't exist - because in a moment it
won't.


>  	return ek;
>  }
>  
> @@ -842,6 +924,12 @@ exp_get_by_name(struct cache_detail *cd, struct auth_domain *clp,
>  	err = cache_check(cd, &exp->h, reqp);
>  	if (err)
>  		return ERR_PTR(err);
> +
> +	if (!legitimize_mntget(exp->ex_path.mnt)) {
> +		cache_put(&exp->h, exp->cd);
> +		return ERR_PTR(-ESTALE);
> +	}
> +
>  	return exp;
>  }

You *really* don't need this legitimize_mntget() here, just mntget().
You already have a legitimate reference to the mnt here.


I think this patch is mostly good - there only serious problem is the
"Dir umounting" string that you use in place of a pathname, and which
contains a space.

But I'd really like to get rid of the completion and work struct if I
can...

Thanks,
NeilBrown


>  
> @@ -928,7 +1016,7 @@ static struct svc_export *exp_find(struct cache_detail *cd,
>  		return ERR_CAST(ek);
>  
>  	exp = exp_get_by_name(cd, clp, &ek->ek_path, reqp);
> -	cache_put(&ek->h, nn->svc_expkey_cache);
> +	exp_put_key(ek);
>  
>  	if (IS_ERR(exp))
>  		return ERR_CAST(exp);
> @@ -1195,10 +1283,10 @@ static int e_show(struct seq_file *m, void *p)
>  		return 0;
>  	}
>  
> -	exp_get(exp);
> +	cache_get(&exp->h);
>  	if (cache_check(cd, &exp->h, NULL))
>  		return 0;
> -	exp_put(exp);
> +	cache_put(&exp->h, exp->cd);
>  	return svc_export_show(m, cd, cp);
>  }
>  
> diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
> index 1f52bfc..52210fb 100644
> --- a/fs/nfsd/export.h
> +++ b/fs/nfsd/export.h
> @@ -4,6 +4,7 @@
>  #ifndef NFSD_EXPORT_H
>  #define NFSD_EXPORT_H
>  
> +#include <linux/fs_pin.h>
>  #include <linux/sunrpc/cache.h>
>  #include <uapi/linux/nfsd/export.h>
>  
> @@ -46,9 +47,10 @@ struct exp_flavor_info {
>  
>  struct svc_export {
>  	struct cache_head	h;
> +	struct cache_detail	*cd;
> +
>  	struct auth_domain *	ex_client;
>  	int			ex_flags;
> -	struct path		ex_path;
>  	kuid_t			ex_anon_uid;
>  	kgid_t			ex_anon_gid;
>  	int			ex_fsid;
> @@ -58,7 +60,14 @@ struct svc_export {
>  	struct exp_flavor_info	ex_flavors[MAX_SECINFO_LIST];
>  	enum pnfs_layouttype	ex_layout_type;
>  	struct nfsd4_deviceid_map *ex_devid_map;
> -	struct cache_detail	*cd;
> +
> +	struct path		ex_path;
> +	struct fs_pin		ex_pin;
> +	struct rcu_head		rcu_head;
> +
> +	/* For svc_export_put and fs umounting window */
> +	struct completion	ex_done;
> +	struct work_struct	ex_work;
>  };
>  
>  /* an "export key" (expkey) maps a filehandlefragement to an
> @@ -67,12 +76,19 @@ struct svc_export {
>   */
>  struct svc_expkey {
>  	struct cache_head	h;
> +	struct cache_detail	*cd;
>  
>  	struct auth_domain *	ek_client;
>  	int			ek_fsidtype;
>  	u32			ek_fsid[6];
>  
>  	struct path		ek_path;
> +	struct fs_pin		ek_pin;
> +	struct rcu_head		rcu_head;
> +
> +	/* For expkey_put and fs umounting window */
> +	struct completion	ek_done;
> +	struct work_struct	ek_work;
>  };
>  
>  #define EX_ISSYNC(exp)		(!((exp)->ex_flags & NFSEXP_ASYNC))
> @@ -100,12 +116,14 @@ __be32			nfserrno(int errno);
>  
>  static inline void exp_put(struct svc_export *exp)
>  {
> +	mntput(exp->ex_path.mnt);
>  	cache_put(&exp->h, exp->cd);
>  }
>  
>  static inline struct svc_export *exp_get(struct svc_export *exp)
>  {
>  	cache_get(&exp->h);
> +	mntget(exp->ex_path.mnt);
>  	return exp;
>  }
>  struct svc_export * rqst_exp_find(struct svc_rqst *, int, u32 *);

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 9/9 v8] nfsd: Allows user un-mounting filesystem where nfsd exports base on
@ 2015-07-29  3:56         ` NeilBrown
  0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2015-07-29  3:56 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Mon, 27 Jul 2015 11:12:06 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> If there are mount points (not exported for NFS) under the pseudo root,
> then after a client operates on entries under that root, nobody can
> unmount those mount points until the export cache expires.
> 
> /nfs/xfs        *(rw,insecure,no_subtree_check,no_root_squash)
> /nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash)
> total 0
> drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
> drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
> drwxr-xr-x. 2 root root  6 Apr 20 22:01 xfs
> Filesystem                      1K-blocks    Used Available Use% Mounted on
> ......
> /dev/sdd                          1038336   32944   1005392   4% /nfs/pnfs
> /dev/sdc                         10475520   32928  10442592   1% /nfs/xfs
> /dev/sde                           999320    1284    929224   1% /nfs/test
> /mnt/pnfs/:
> total 0
> -rw-r--r--. 1 root root 0 Apr 21 22:23 attr
> drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp
> 
> /mnt/xfs/:
> total 0
> umount: /nfs/test/: target is busy
>         (In some cases useful info about processes that
>         use the device is found by lsof(8) or fuser(1).)
> 
> It's caused by nfsd's export cache holding a reference to the path
> (here /nfs/test/), so it can't be unmounted.
> 
> I don't think that's what users expect; they want to umount /nfs/test/.
> Bruce thinks users should also be able to umount /nfs/pnfs/ and /nfs/xfs.
> 
> Also, use kzalloc for all memory allocation instead of kmalloc.
> Thanks to Al Viro for his comments on the fs_pin logic.
> 
> v3,
> 1. use path_get_pin/path_put_unpin for pinning the path
> 2. use kzalloc for memory allocation
> 
> v5, v4,
> 1. add a completion so pin_kill can wait until the reference count drops to zero.
> 2. add a work_struct so pin_kill can drop the reference indirectly.
> 3. free svc_export/svc_expkey in pin_kill, not in svc_export_put/svc_expkey_put.
> 4. svc_export_put/svc_expkey_put go through the pin_kill logic.
> 
> v6,
> 1. Pin the vfsmnt to the mount point at first; when the reference count
>    increases (== 2), grab a reference to the vfsmnt with mntget. When it
>    decreases (== 1), drop the reference to the vfsmnt, leaving the pin.
> 2. Delete the cache_head directly from the cache_detail.
> 
> v7,
> implement self reference increase and decrease for nfsd exports/expkey
> 
> v8,
> the new method is:
> 
> 1. There is only one outlet from each cache: exp_find_key() for expkey,
>    exp_get_by_name() for export.
> 2. Any fsid-to-export or filehandle-to-export translation calls these functions.
> 3. exp_get()/exp_put() increase/decrease the reference count of the export.
> 
> Call legitimize_mntget() in the only outlet functions exp_find_key()/
> exp_get_by_name(); on failure return STALE, otherwise any valid
> expkey/export from the cache is validated (it holds a reference to the vfsmnt).
> 
> Add mntget() in exp_get() and mntput() in exp_put(), because the exports
> passed to exp_get/exp_put are returned from exp_find_key/exp_get_by_name.
> 
> For the expkey cache,
> 1. At first, an fsid is passed to exp_find_key, which looks up a cache
>    entry in svc_expkey_lookup; on success, ekey->ek_path is pinned to the mount.
> 2. Then call legitimize_mntget to get a reference to the vfsmnt
>    before returning from exp_find_key.
> 3. Any caller of exp_find_key that got a valid entry must put the vfsmnt.
> 
> For the export cache,
> 1. At first, a path (returned from exp_find_key) with a validated vfsmnt
>    is passed to exp_get_by_name; on success, exp->ex_path is pinned to the mount.
> 2. Then call legitimize_mntget to get a reference to the vfsmnt
>    before returning from exp_get_by_name.
> 3. Any caller of exp_get_by_name that got a valid entry must put the vfsmnt
>    via exp_put();
> 4. Any user of the exp returned from exp_get_by_name must call exp_get(),
>    which also increases the reference count of the vfsmnt.
> 
> So that,
> a. After the reference is taken in 2, any umount of the filesystem gets -EBUSY.
> b. After all references from 4 are put, or before the reference is taken in 2,
>    any umount of the filesystem calls pin_kill, which deletes the cache entry
>    directly and also unpins the vfsmount.
> c. Between 1 and 2, we hold a reference to the exp/key cache entry but not yet
>    a validated vfsmnt.  As you said, an umount of the filesystem only waits for
>    exp_find_key/exp_get_by_name to put the cache reference when
>    legitimize_mntget fails.
> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> ---
>  fs/nfsd/export.c | 136 +++++++++++++++++++++++++++++++++++++++++++++----------
>  fs/nfsd/export.h |  22 ++++++++-
>  2 files changed, 132 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index b4d84b5..7f4816d 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -37,15 +37,23 @@
>  #define	EXPKEY_HASHMAX		(1 << EXPKEY_HASHBITS)
>  #define	EXPKEY_HASHMASK		(EXPKEY_HASHMAX -1)
>  
> +static void expkey_destroy(struct svc_expkey *key)
> +{
> +	auth_domain_put(key->ek_client);
> +	kfree_rcu(key, rcu_head);
> +}
> +
>  static void expkey_put(struct kref *ref)
>  {
>  	struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref);
>  
>  	if (test_bit(CACHE_VALID, &key->h.flags) &&
> -	    !test_bit(CACHE_NEGATIVE, &key->h.flags))
> -		path_put(&key->ek_path);
> -	auth_domain_put(key->ek_client);
> -	kfree(key);
> +	    !test_bit(CACHE_NEGATIVE, &key->h.flags)) {
> +		rcu_read_lock();
> +		complete(&key->ek_done);
> +		pin_kill(&key->ek_pin);
> +	} else
> +		expkey_destroy(key);
>  }
>  
>  static void expkey_request(struct cache_detail *cd,
> @@ -83,7 +91,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
>  		return -EINVAL;
>  	mesg[mlen-1] = 0;
>  
> -	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);

Why this change?  There are certainly times when kzalloc is
appropriate but I don't see that this is one of them, or that the
change has anything to do with the rest of the patch.


>  	err = -ENOMEM;
>  	if (!buf)
>  		goto out;
> @@ -119,6 +127,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
>  	if (key.h.expiry_time == 0)
>  		goto out;
>  
> +	key.cd = cd;
>  	key.ek_client = dom;	
>  	key.ek_fsidtype = fsidtype;
>  	memcpy(key.ek_fsid, buf, len);
> @@ -181,7 +190,11 @@ static int expkey_show(struct seq_file *m,
>  	if (test_bit(CACHE_VALID, &h->flags) && 
>  	    !test_bit(CACHE_NEGATIVE, &h->flags)) {
>  		seq_printf(m, " ");
> -		seq_path(m, &ek->ek_path, "\\ \t\n");
> +		if (legitimize_mntget(ek->ek_path.mnt)) {
> +			seq_path(m, &ek->ek_path, "\\ \t\n");
> +			mntput(ek->ek_path.mnt);
> +		} else
> +			seq_printf(m, "Dir umounting");

This "Dir umounting" needs to parse as a single word, so having a space
in there is bad.  Maybe "Dir-unmounting".


>  	}
>  	seq_printf(m, "\n");
>  	return 0;
> @@ -210,6 +223,26 @@ static inline void expkey_init(struct cache_head *cnew,
>  	new->ek_fsidtype = item->ek_fsidtype;
>  
>  	memcpy(new->ek_fsid, item->ek_fsid, sizeof(new->ek_fsid));
> +	new->cd = item->cd;
> +}
> +
> +static void expkey_pin_kill(struct fs_pin *pin)
> +{
> +	struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
> +
> +	if (!completion_done(&key->ek_done)) {
> +		schedule_work(&key->ek_work);
> +		wait_for_completion(&key->ek_done);
> +	}
> +
> +	path_put_unpin(&key->ek_path, &key->ek_pin);
> +	expkey_destroy(key);
> +}
> +
> +static void expkey_close_work(struct work_struct *work)
> +{
> +	struct svc_expkey *key = container_of(work, struct svc_expkey, ek_work);
> +	cache_delete_entry(key->cd, &key->h);
>  }

I'm perplexed by this separate scheduled work.
You say:

> 2. add a work_struct for pin_kill decreases the reference indirectly.

above.
cache_delete_entry() can call cache_put() which would call expkey_put()
which calls pin_kill(), which will block until path_put_unpin calls
pin_remove() which of course now cannot happen.

So I can see why you have it, but I really really don't like it. :-(

I'll post a patch to make a change to fs_pin so this sort of thing
should be much easier.
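
To spell out the cycle the work_struct is avoiding, here is a sketch of
the call chain as if expkey_pin_kill() invoked cache_delete_entry()
directly, all in one thread (function names from the patch above):

    pin_kill(&key->ek_pin)                      from umount
      expkey_pin_kill(pin)
        cache_delete_entry(key->cd, &key->h)
          cache_put(&key->h, key->cd)           drops the last reference
            expkey_put(&key->h.ref)
              pin_kill(&key->ek_pin)            blocks until pin_remove()
        path_put_unpin(&key->ek_path, ...)      never reached, yet this is
                                                the only pin_remove() site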

>  
>  static inline void expkey_update(struct cache_head *cnew,
> @@ -218,16 +251,19 @@ static inline void expkey_update(struct cache_head *cnew,
>  	struct svc_expkey *new = container_of(cnew, struct svc_expkey, h);
>  	struct svc_expkey *item = container_of(citem, struct svc_expkey, h);
>  
> +	init_fs_pin(&new->ek_pin, expkey_pin_kill);
>  	new->ek_path = item->ek_path;
> -	path_get(&item->ek_path);
> +	path_get_pin(&new->ek_path, &new->ek_pin);
>  }
>  
>  static struct cache_head *expkey_alloc(void)
>  {
> -	struct svc_expkey *i = kmalloc(sizeof(*i), GFP_KERNEL);
> -	if (i)
> +	struct svc_expkey *i = kzalloc(sizeof(*i), GFP_KERNEL);
> +	if (i) {
> +		INIT_WORK(&i->ek_work, expkey_close_work);
> +		init_completion(&i->ek_done);
>  		return &i->h;
> -	else
> +	} else
>  		return NULL;
>  }

I'm slightly less offended by this kzalloc, but I still think it needs
to be justified if it is going to remain.


>  
> @@ -306,14 +342,21 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc)
>  	fsloc->locations = NULL;
>  }
>  
> -static void svc_export_put(struct kref *ref)
> +static void svc_export_destroy(struct svc_export *exp)
>  {
> -	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
> -	path_put(&exp->ex_path);
>  	auth_domain_put(exp->ex_client);
>  	nfsd4_fslocs_free(&exp->ex_fslocs);
>  	kfree(exp->ex_uuid);
> -	kfree(exp);
> +	kfree_rcu(exp, rcu_head);
> +}
> +
> +static void svc_export_put(struct kref *ref)
> +{
> +	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
> +
> +	rcu_read_lock();
> +	complete(&exp->ex_done);
> +	pin_kill(&exp->ex_pin);
>  }
>  
>  static void svc_export_request(struct cache_detail *cd,
> @@ -520,7 +563,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
>  		return -EINVAL;
>  	mesg[mlen-1] = 0;
>  
> -	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
>  	if (!buf)
>  		return -ENOMEM;
>  
> @@ -636,7 +679,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
>  	if (expp == NULL)
>  		err = -ENOMEM;
>  	else
> -		exp_put(expp);
> +		cache_put(&expp->h, expp->cd);
>  out4:
>  	nfsd4_fslocs_free(&exp.ex_fslocs);
>  	kfree(exp.ex_uuid);
> @@ -664,7 +707,12 @@ static int svc_export_show(struct seq_file *m,
>  		return 0;
>  	}
>  	exp = container_of(h, struct svc_export, h);
> -	seq_path(m, &exp->ex_path, " \t\n\\");
> +	if (legitimize_mntget(exp->ex_path.mnt)) {
> +		seq_path(m, &exp->ex_path, " \t\n\\");
> +		mntput(exp->ex_path.mnt);
> +	} else
> +		seq_printf(m, "Dir umounting");
> +

again, "Dir-umounting" .. or even "Dir-unmounting" with the 'n'.


>  	seq_putc(m, '\t');
>  	seq_escape(m, exp->ex_client->name, " \t\n\\");
>  	seq_putc(m, '(');
> @@ -694,15 +742,35 @@ static int svc_export_match(struct cache_head *a, struct cache_head *b)
>  		path_equal(&orig->ex_path, &new->ex_path);
>  }
>  
> +static void export_pin_kill(struct fs_pin *pin)
> +{
> +	struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);
> +
> +	if (!completion_done(&exp->ex_done)) {
> +		schedule_work(&exp->ex_work);
> +		wait_for_completion(&exp->ex_done);
> +	}
> +
> +	path_put_unpin(&exp->ex_path, &exp->ex_pin);
> +	svc_export_destroy(exp);
> +}
> +
> +static void export_close_work(struct work_struct *work)
> +{
> +	struct svc_export *exp = container_of(work, struct svc_export, ex_work);
> +	cache_delete_entry(exp->cd, &exp->h);
> +}
> +
>  static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
>  {
>  	struct svc_export *new = container_of(cnew, struct svc_export, h);
>  	struct svc_export *item = container_of(citem, struct svc_export, h);
>  
> +	init_fs_pin(&new->ex_pin, export_pin_kill);
>  	kref_get(&item->ex_client->ref);
>  	new->ex_client = item->ex_client;
>  	new->ex_path = item->ex_path;
> -	path_get(&item->ex_path);
> +	path_get_pin(&new->ex_path, &new->ex_pin);
>  	new->ex_fslocs.locations = NULL;
>  	new->ex_fslocs.locations_count = 0;
>  	new->ex_fslocs.migrated = 0;
> @@ -740,10 +808,12 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
>  
>  static struct cache_head *svc_export_alloc(void)
>  {
> -	struct svc_export *i = kmalloc(sizeof(*i), GFP_KERNEL);
> -	if (i)
> +	struct svc_export *i = kzalloc(sizeof(*i), GFP_KERNEL);
> +	if (i) {
> +		INIT_WORK(&i->ex_work, export_close_work);
> +		init_completion(&i->ex_done);
>  		return &i->h;
> -	else
> +	} else
>  		return NULL;
>  }
>  
> @@ -798,6 +868,11 @@ svc_export_update(struct svc_export *new, struct svc_export *old)
>  		return NULL;
>  }
>  
> +static void exp_put_key(struct svc_expkey *key)
> +{
> +	mntput(key->ek_path.mnt);
> +	cache_put(&key->h, key->cd);
> +}

This is only called in one place.  Does it really help clarity to make
it a separate function?

>  
>  static struct svc_expkey *
>  exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
> @@ -809,6 +884,7 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
>  	if (!clp)
>  		return ERR_PTR(-ENOENT);
>  
> +	key.cd = cd;
>  	key.ek_client = clp;
>  	key.ek_fsidtype = fsid_type;
>  	memcpy(key.ek_fsid, fsidv, key_len(fsid_type));
> @@ -819,6 +895,12 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
>  	err = cache_check(cd, &ek->h, reqp);
>  	if (err)
>  		return ERR_PTR(err);
> +
> +	if (!legitimize_mntget(ek->ek_path.mnt)) {
> +		cache_put(&ek->h, ek->cd);
> +		return ERR_PTR(-ESTALE);
> +	}
> +

I think -ENOENT would be a better error code here.
Just pretend that the entry doesn't exist - because in a moment it
won't.


>  	return ek;
>  }
>  
> @@ -842,6 +924,12 @@ exp_get_by_name(struct cache_detail *cd, struct auth_domain *clp,
>  	err = cache_check(cd, &exp->h, reqp);
>  	if (err)
>  		return ERR_PTR(err);
> +
> +	if (!legitimize_mntget(exp->ex_path.mnt)) {
> +		cache_put(&exp->h, exp->cd);
> +		return ERR_PTR(-ESTALE);
> +	}
> +
>  	return exp;
>  }

You *really* don't need this legitimize_mntget() here, just mntget().
You already have a legitimate reference to the mnt here.
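
i.e. roughly this (an untested sketch of the suggested change):

	-	if (!legitimize_mntget(exp->ex_path.mnt)) {
	-		cache_put(&exp->h, exp->cd);
	-		return ERR_PTR(-ESTALE);
	-	}
	+	mntget(exp->ex_path.mnt);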


I think this patch is mostly good - the only serious problem is the
"Dir umounting" string that you use in place of a pathname, and which
contains a space.

But I'd really like to get rid of the completion and work struct if I
can...

Thanks,
NeilBrown


>  
> @@ -928,7 +1016,7 @@ static struct svc_export *exp_find(struct cache_detail *cd,
>  		return ERR_CAST(ek);
>  
>  	exp = exp_get_by_name(cd, clp, &ek->ek_path, reqp);
> -	cache_put(&ek->h, nn->svc_expkey_cache);
> +	exp_put_key(ek);
>  
>  	if (IS_ERR(exp))
>  		return ERR_CAST(exp);
> @@ -1195,10 +1283,10 @@ static int e_show(struct seq_file *m, void *p)
>  		return 0;
>  	}
>  
> -	exp_get(exp);
> +	cache_get(&exp->h);
>  	if (cache_check(cd, &exp->h, NULL))
>  		return 0;
> -	exp_put(exp);
> +	cache_put(&exp->h, exp->cd);
>  	return svc_export_show(m, cd, cp);
>  }
>  
> diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
> index 1f52bfc..52210fb 100644
> --- a/fs/nfsd/export.h
> +++ b/fs/nfsd/export.h
> @@ -4,6 +4,7 @@
>  #ifndef NFSD_EXPORT_H
>  #define NFSD_EXPORT_H
>  
> +#include <linux/fs_pin.h>
>  #include <linux/sunrpc/cache.h>
>  #include <uapi/linux/nfsd/export.h>
>  
> @@ -46,9 +47,10 @@ struct exp_flavor_info {
>  
>  struct svc_export {
>  	struct cache_head	h;
> +	struct cache_detail	*cd;
> +
>  	struct auth_domain *	ex_client;
>  	int			ex_flags;
> -	struct path		ex_path;
>  	kuid_t			ex_anon_uid;
>  	kgid_t			ex_anon_gid;
>  	int			ex_fsid;
> @@ -58,7 +60,14 @@ struct svc_export {
>  	struct exp_flavor_info	ex_flavors[MAX_SECINFO_LIST];
>  	enum pnfs_layouttype	ex_layout_type;
>  	struct nfsd4_deviceid_map *ex_devid_map;
> -	struct cache_detail	*cd;
> +
> +	struct path		ex_path;
> +	struct fs_pin		ex_pin;
> +	struct rcu_head		rcu_head;
> +
> +	/* For svc_export_put and fs umounting window */
> +	struct completion	ex_done;
> +	struct work_struct	ex_work;
>  };
>  
>  /* an "export key" (expkey) maps a filehandlefragement to an
> @@ -67,12 +76,19 @@ struct svc_export {
>   */
>  struct svc_expkey {
>  	struct cache_head	h;
> +	struct cache_detail	*cd;
>  
>  	struct auth_domain *	ek_client;
>  	int			ek_fsidtype;
>  	u32			ek_fsid[6];
>  
>  	struct path		ek_path;
> +	struct fs_pin		ek_pin;
> +	struct rcu_head		rcu_head;
> +
> +	/* For expkey_put and fs umounting window */
> +	struct completion	ek_done;
> +	struct work_struct	ek_work;
>  };
>  
>  #define EX_ISSYNC(exp)		(!((exp)->ex_flags & NFSEXP_ASYNC))
> @@ -100,12 +116,14 @@ __be32			nfserrno(int errno);
>  
>  static inline void exp_put(struct svc_export *exp)
>  {
> +	mntput(exp->ex_path.mnt);
>  	cache_put(&exp->h, exp->cd);
>  }
>  
>  static inline struct svc_export *exp_get(struct svc_export *exp)
>  {
>  	cache_get(&exp->h);
> +	mntget(exp->ex_path.mnt);
>  	return exp;
>  }
>  struct svc_export * rqst_exp_find(struct svc_rqst *, int, u32 *);


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] fs-pin: allow pin_remove() to be called other than from ->kill()
@ 2015-07-29  3:59         ` NeilBrown
  0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2015-07-29  3:59 UTC (permalink / raw)
  To: Kinglong Mee, Al Viro
  Cc: J. Bruce Fields, linux-nfs, linux-fsdevel, Trond Myklebust



fs-pin currently assumes that when either the vfsmount or the fs_pin
wants to unpin, pin_kill() will be called.
This requires that the ->kill() function can wait for any transient
references to the fs_pin to be released.  If the structure containing
the fs_pin doesn't already have the ability to wait for references,
this can be a burden.

As the fs_pin already has infrastructure for waiting, that can be
leveraged to remove the burden.

In this alternate scenario, only the vfsmount calls pin_kill() when it
wants to unpin.  The owner of the fs_pin instead calls pin_remove().

The ->kill() function removes any long-term references, and then calls
pin_kill() (recursively).
When the last reference on (the structure containing) the fs_pin is
dropped, pin_remove() will be called and the (recursive) pin_kill()
call will complete.

For this to be safe, the final "put" must *not* free the structure if
pin_kill() has already been called, as that could leave ->kill()
accessing freed data.

So we provide a return value for pin_remove() which reports the old
->done value.

When final put calls pin_remove() it checks that value.
If it was 0, then pin_kill() has not called ->kill and will not,
so final put can free the data structure.
If it was -1, then pin_kill() has called ->kill, and ->kill will
free the data structure - final put must not touch it.

This makes the 'wait' infrastructure of fs_pin available to any
pinning client which wants to use it.

Signed-Off-By: NeilBrown <neilb@suse.com>

---
Hi Al,
 do you see this as a workable solution?  I think it will improve the nfsd pinning patch
a lot.
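
For illustration, a pinning client's final put under this scheme would
look roughly like the sketch below ("struct foo" and its members are
made-up names, not from any existing code):

	struct foo {
		struct kref	ref;
		struct fs_pin	pin;	/* pin_insert()ed on a vfsmount */
	};

	static void foo_free(struct kref *ref)
	{
		struct foo *f = container_of(ref, struct foo, ref);

		if (pin_remove(&f->pin) == 0) {
			/* 0: pin_kill() has not called, and now will
			 * not call, ->kill(): this put owns the
			 * structure and must free it. */
			kfree(f);
		}
		/* -1: pin_kill() has called ->kill(), which owns the
		 * structure and will free it; don't touch it here. */
	}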

Thanks,
NeilBrown


diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index 611b5408f6ec..b7954a9d17da 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -6,16 +6,32 @@
 
 static DEFINE_SPINLOCK(pin_lock);
 
-void pin_remove(struct fs_pin *pin)
+/**
+ * pin_remove - disconnect an fs_pin from the pinned structure.
+ * @pin:	The struct fs_pin which is pinning something.
+ *
+ * Detach a 'pin' which was added by pin_insert().  A return value
+ * of -1 implies that pin_kill() has already been called and that the
+ * ->kill() function now owns the data structure containing @pin.
+ * The function which called pin_remove() must not touch the data structure
+ * again (unless it is the ->kill() function itself).
+ * A return value of 0 implies an uneventful disconnect: pin_kill() has not called,
+ * and will not call, the ->kill() function on this @pin.
+ * Any other return value is a usage error - e.g. repeated call to pin_remove().
+ */
+int pin_remove(struct fs_pin *pin)
 {
+	int ret;
 	spin_lock(&pin_lock);
 	hlist_del_init(&pin->m_list);
 	hlist_del_init(&pin->s_list);
 	spin_unlock(&pin_lock);
 	spin_lock_irq(&pin->wait.lock);
+	ret = pin->done;
 	pin->done = 1;
 	wake_up_locked(&pin->wait);
 	spin_unlock_irq(&pin->wait.lock);
+	return ret;
 }
 
 void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
index 3886b3bffd7f..2fe9d3ba09e8 100644
--- a/include/linux/fs_pin.h
+++ b/include/linux/fs_pin.h
@@ -18,7 +18,7 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
 	p->kill = kill;
 }
 
-void pin_remove(struct fs_pin *);
+int pin_remove(struct fs_pin *);
 void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
 void pin_insert(struct fs_pin *, struct vfsmount *);
 void pin_kill(struct fs_pin *);

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/9 v8] fs_pin: Initialize value for fs_pin explicitly
  2015-07-29  0:25     ` NeilBrown
@ 2015-07-29 19:41       ` J. Bruce Fields
       [not found]         ` <20150729194155.GC21949-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 49+ messages in thread
From: J. Bruce Fields @ 2015-07-29 19:41 UTC (permalink / raw)
  To: NeilBrown
  Cc: Kinglong Mee, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Wed, Jul 29, 2015 at 10:25:19AM +1000, NeilBrown wrote:
> On Mon, 27 Jul 2015 11:06:53 +0800 Kinglong Mee <kinglongmee@gmail.com>
> wrote:
> 
> > Without explicit initialization, 'done' in an fs_pin on the stack may
> > contain a garbage value.
> > 
> > v8, same as v3
> > Add an include-guard macro to the header file
> > 
> > Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> 
> Reviewed-by: NeilBrown <neilb@suse.com>
> 
> It would be really good if some of these early patches could be applied
> to the relevant trees so they appear in -next and we only need to keep
> reviewing the more interesting code at the end.

This patch seems a little bikeshed-y.  I'd rather just drop it or save
it for some other day.  It's not necessary to the series.

--b.

> 
> Al, Bruce: any chance of some of these getting into -next ...
> 
> Thanks,
> NeilBrown
> 
> > ---
> >  include/linux/fs_pin.h | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
> > index 3886b3b..0dde7b7 100644
> > --- a/include/linux/fs_pin.h
> > +++ b/include/linux/fs_pin.h
> > @@ -1,3 +1,6 @@
> > +#ifndef _LINUX_FS_PIN_H
> > +#define _LINUX_FS_PIN_H
> > +
> >  #include <linux/wait.h>
> >  
> >  struct fs_pin {
> > @@ -16,9 +19,12 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
> >  	INIT_HLIST_NODE(&p->s_list);
> >  	INIT_HLIST_NODE(&p->m_list);
> >  	p->kill = kill;
> > +	p->done = 0;
> >  }
> >  
> >  void pin_remove(struct fs_pin *);
> >  void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
> >  void pin_insert(struct fs_pin *, struct vfsmount *);
> >  void pin_kill(struct fs_pin *);
> > +
> > +#endif

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 7/9 v8] sunrpc: Switch to using hash list instead single list
@ 2015-07-29 19:51       ` J. Bruce Fields
  0 siblings, 0 replies; 49+ messages in thread
From: J. Bruce Fields @ 2015-07-29 19:51 UTC (permalink / raw)
  To: NeilBrown
  Cc: Kinglong Mee, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Wed, Jul 29, 2015 at 12:19:39PM +1000, NeilBrown wrote:
> On Mon, 27 Jul 2015 11:10:15 +0800 Kinglong Mee <kinglongmee@gmail.com>
> wrote:
> 
> > Switch to using an hlist for cache_head in cache_detail,
> > which makes it possible to remove a cache_head entry directly from the cache_detail.
> > 
> > v8, using hash list, not head list
> > 
> > Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> 
> Reviewed-by: NeilBrown <neilb@suse.com>

Thanks, applying this and previous 2 patches.

--b.

> 
> Thanks,
> NeilBrown
> 
> > ---
> >  include/linux/sunrpc/cache.h |  4 +--
> >  net/sunrpc/cache.c           | 60 +++++++++++++++++++++++---------------------
> >  2 files changed, 33 insertions(+), 31 deletions(-)
> > 
> > diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
> > index 04ee5a2..03d3b4c 100644
> > --- a/include/linux/sunrpc/cache.h
> > +++ b/include/linux/sunrpc/cache.h
> > @@ -46,7 +46,7 @@
> >   * 
> >   */
> >  struct cache_head {
> > -	struct cache_head * next;
> > +	struct hlist_node	cache_list;
> >  	time_t		expiry_time;	/* After time time, don't use the data */
> >  	time_t		last_refresh;   /* If CACHE_PENDING, this is when upcall 
> >  					 * was sent, else this is when update was received
> > @@ -73,7 +73,7 @@ struct cache_detail_pipefs {
> >  struct cache_detail {
> >  	struct module *		owner;
> >  	int			hash_size;
> > -	struct cache_head **	hash_table;
> > +	struct hlist_head *	hash_table;
> >  	rwlock_t		hash_lock;
> >  
> >  	atomic_t		inuse; /* active user-space update or lookup */
> > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> > index 673c2fa..4a2340a 100644
> > --- a/net/sunrpc/cache.c
> > +++ b/net/sunrpc/cache.c
> > @@ -44,7 +44,7 @@ static void cache_revisit_request(struct cache_head *item);
> >  static void cache_init(struct cache_head *h)
> >  {
> >  	time_t now = seconds_since_boot();
> > -	h->next = NULL;
> > +	INIT_HLIST_NODE(&h->cache_list);
> >  	h->flags = 0;
> >  	kref_init(&h->ref);
> >  	h->expiry_time = now + CACHE_NEW_EXPIRY;
> > @@ -54,15 +54,14 @@ static void cache_init(struct cache_head *h)
> >  struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
> >  				       struct cache_head *key, int hash)
> >  {
> > -	struct cache_head **head,  **hp;
> > -	struct cache_head *new = NULL, *freeme = NULL;
> > +	struct cache_head *new = NULL, *freeme = NULL, *tmp = NULL;
> > +	struct hlist_head *head;
> >  
> >  	head = &detail->hash_table[hash];
> >  
> >  	read_lock(&detail->hash_lock);
> >  
> > -	for (hp=head; *hp != NULL ; hp = &(*hp)->next) {
> > -		struct cache_head *tmp = *hp;
> > +	hlist_for_each_entry(tmp, head, cache_list) {
> >  		if (detail->match(tmp, key)) {
> >  			if (cache_is_expired(detail, tmp))
> >  				/* This entry is expired, we will discard it. */
> > @@ -88,12 +87,10 @@ struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
> >  	write_lock(&detail->hash_lock);
> >  
> >  	/* check if entry appeared while we slept */
> > -	for (hp=head; *hp != NULL ; hp = &(*hp)->next) {
> > -		struct cache_head *tmp = *hp;
> > +	hlist_for_each_entry(tmp, head, cache_list) {
> >  		if (detail->match(tmp, key)) {
> >  			if (cache_is_expired(detail, tmp)) {
> > -				*hp = tmp->next;
> > -				tmp->next = NULL;
> > +				hlist_del_init(&tmp->cache_list);
> >  				detail->entries --;
> >  				freeme = tmp;
> >  				break;
> > @@ -104,8 +101,8 @@ struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
> >  			return tmp;
> >  		}
> >  	}
> > -	new->next = *head;
> > -	*head = new;
> > +
> > +	hlist_add_head(&new->cache_list, head);
> >  	detail->entries++;
> >  	cache_get(new);
> >  	write_unlock(&detail->hash_lock);
> > @@ -143,7 +140,6 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
> >  	 * If 'old' is not VALID, we update it directly,
> >  	 * otherwise we need to replace it
> >  	 */
> > -	struct cache_head **head;
> >  	struct cache_head *tmp;
> >  
> >  	if (!test_bit(CACHE_VALID, &old->flags)) {
> > @@ -168,15 +164,13 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
> >  	}
> >  	cache_init(tmp);
> >  	detail->init(tmp, old);
> > -	head = &detail->hash_table[hash];
> >  
> >  	write_lock(&detail->hash_lock);
> >  	if (test_bit(CACHE_NEGATIVE, &new->flags))
> >  		set_bit(CACHE_NEGATIVE, &tmp->flags);
> >  	else
> >  		detail->update(tmp, new);
> > -	tmp->next = *head;
> > -	*head = tmp;
> > +	hlist_add_head(&tmp->cache_list, &detail->hash_table[hash]);
> >  	detail->entries++;
> >  	cache_get(tmp);
> >  	cache_fresh_locked(tmp, new->expiry_time);
> > @@ -416,28 +410,29 @@ static int cache_clean(void)
> >  	/* find a non-empty bucket in the table */
> >  	while (current_detail &&
> >  	       current_index < current_detail->hash_size &&
> > -	       current_detail->hash_table[current_index] == NULL)
> > +	       hlist_empty(&current_detail->hash_table[current_index]))
> >  		current_index++;
> >  
> >  	/* find a cleanable entry in the bucket and clean it, or set to next bucket */
> >  
> >  	if (current_detail && current_index < current_detail->hash_size) {
> > -		struct cache_head *ch, **cp;
> > +		struct cache_head *ch = NULL;
> >  		struct cache_detail *d;
> > +		struct hlist_head *head;
> > +		struct hlist_node *tmp;
> >  
> >  		write_lock(&current_detail->hash_lock);
> >  
> >  		/* Ok, now to clean this strand */
> >  
> > -		cp = & current_detail->hash_table[current_index];
> > -		for (ch = *cp ; ch ; cp = & ch->next, ch = *cp) {
> > +		head = &current_detail->hash_table[current_index];
> > +		hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
> >  			if (current_detail->nextcheck > ch->expiry_time)
> >  				current_detail->nextcheck = ch->expiry_time+1;
> >  			if (!cache_is_expired(current_detail, ch))
> >  				continue;
> >  
> > -			*cp = ch->next;
> > -			ch->next = NULL;
> > +			hlist_del_init(&ch->cache_list);
> >  			current_detail->entries--;
> >  			rv = 1;
> >  			break;
> > @@ -1284,7 +1279,7 @@ void *cache_seq_start(struct seq_file *m, loff_t *pos)
> >  	hash = n >> 32;
> >  	entry = n & ((1LL<<32) - 1);
> >  
> > -	for (ch=cd->hash_table[hash]; ch; ch=ch->next)
> > +	hlist_for_each_entry(ch, &cd->hash_table[hash], cache_list)
> >  		if (!entry--)
> >  			return ch;
> >  	n &= ~((1LL<<32) - 1);
> > @@ -1292,11 +1287,12 @@ void *cache_seq_start(struct seq_file *m, loff_t *pos)
> >  		hash++;
> >  		n += 1LL<<32;
> >  	} while(hash < cd->hash_size &&
> > -		cd->hash_table[hash]==NULL);
> > +		hlist_empty(&cd->hash_table[hash]));
> >  	if (hash >= cd->hash_size)
> >  		return NULL;
> >  	*pos = n+1;
> > -	return cd->hash_table[hash];
> > +	return hlist_entry_safe(cd->hash_table[hash].first,
> > +				struct cache_head, cache_list);
> >  }
> >  EXPORT_SYMBOL_GPL(cache_seq_start);
> >  
> > @@ -1308,23 +1304,25 @@ void *cache_seq_next(struct seq_file *m, void *p, loff_t *pos)
> >  
> >  	if (p == SEQ_START_TOKEN)
> >  		hash = 0;
> > -	else if (ch->next == NULL) {
> > +	else if (ch->cache_list.next == NULL) {
> >  		hash++;
> >  		*pos += 1LL<<32;
> >  	} else {
> >  		++*pos;
> > -		return ch->next;
> > +		return hlist_entry_safe(ch->cache_list.next,
> > +					struct cache_head, cache_list);
> >  	}
> >  	*pos &= ~((1LL<<32) - 1);
> >  	while (hash < cd->hash_size &&
> > -	       cd->hash_table[hash] == NULL) {
> > +	       hlist_empty(&cd->hash_table[hash])) {
> >  		hash++;
> >  		*pos += 1LL<<32;
> >  	}
> >  	if (hash >= cd->hash_size)
> >  		return NULL;
> >  	++*pos;
> > -	return cd->hash_table[hash];
> > +	return hlist_entry_safe(cd->hash_table[hash].first,
> > +				struct cache_head, cache_list);
> >  }
> >  EXPORT_SYMBOL_GPL(cache_seq_next);
> >  
> > @@ -1666,17 +1664,21 @@ EXPORT_SYMBOL_GPL(cache_unregister_net);
> >  struct cache_detail *cache_create_net(struct cache_detail *tmpl, struct net *net)
> >  {
> >  	struct cache_detail *cd;
> > +	int i;
> >  
> >  	cd = kmemdup(tmpl, sizeof(struct cache_detail), GFP_KERNEL);
> >  	if (cd == NULL)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > -	cd->hash_table = kzalloc(cd->hash_size * sizeof(struct cache_head *),
> > +	cd->hash_table = kzalloc(cd->hash_size * sizeof(struct hlist_head),
> >  				 GFP_KERNEL);
> >  	if (cd->hash_table == NULL) {
> >  		kfree(cd);
> >  		return ERR_PTR(-ENOMEM);
> >  	}
> > +
> > +	for (i = 0; i < cd->hash_size; i++)
> > +		INIT_HLIST_HEAD(&cd->hash_table[i]);
> >  	cd->net = net;
> >  	return cd;
> >  }

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/9 v8] fs_pin: Initialize value for fs_pin explicitly
@ 2015-07-29 21:48             ` NeilBrown
  0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2015-07-29 21:48 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Kinglong Mee, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Wed, 29 Jul 2015 15:41:55 -0400 "J. Bruce Fields"
<bfields@fieldses.org> wrote:

> On Wed, Jul 29, 2015 at 10:25:19AM +1000, NeilBrown wrote:
> > On Mon, 27 Jul 2015 11:06:53 +0800 Kinglong Mee <kinglongmee@gmail.com>
> > wrote:
> > 
> > > Without initialization, 'done' in an fs_pin on the stack may
> > > contain a garbage value.
> > > 
> > > v8, same as v3
> > > Adds macro for header file
> > > 
> > > Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> > 
> > Reviewed-by: NeilBrown <neilb@suse.com>
> > 
> > It would be really good if some of these early patches could be applied
> > to the relevant trees so they appear in -next and we only need to keep
> > reviewing the more interesting code at the end.
> 
> This patch seems a little bikeshed-y.  I'd rather just drop it or save
> it for some other day.  It's not necessary to the series.

???

I accept that:


> > > @@ -1,3 +1,6 @@
> > > +#ifndef _LINUX_FS_PIN_H
> > > +#define _LINUX_FS_PIN_H
> > > +
> > >  #include <linux/wait.h>

could be a little bike-shed-y, not that I've seen much bike shedding
going on.

However:
> > >  
> > >  struct fs_pin {
> > > @@ -16,9 +19,12 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
> > >  	INIT_HLIST_NODE(&p->s_list);
> > >  	INIT_HLIST_NODE(&p->m_list);
> > >  	p->kill = kill;
> > > +	p->done = 0;
> > >  }
> > >  

is quite important.
Without that assignment we would probably need to rename the function to
   init_most_of_fs_pin
or
   init_fs_pin_if_already_zeroed
or maybe just
   __init_fs_pin
with the time honoured interpretation that it sort-of does what the
name says, but maybe not exactly how you think and please use with care.

Then in nfsd code we would need to add the assignment ourselves, or use
kzalloc where it would otherwise be completely unnecessary.


Thanks for accepting the other patches!

NeilBrown


^ permalink raw reply	[flat|nested] 49+ messages in thread
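
To make the point above concrete, here is a minimal sketch of why the
explicit assignment matters for a stack-allocated pin (my_kill is a
hypothetical callback; the fields follow the fs_pin patch quoted above):

	struct fs_pin pin;		/* on the stack: pin.done starts as garbage */

	init_fs_pin(&pin, my_kill);	/* with the patch, pin.done is reliably 0 */
	pin_insert(&pin, mnt);		/* pin_kill() can now trust pin.done */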

* Re: [PATCH 1/9 v8] fs_pin: Initialize value for fs_pin explicitly
  2015-07-29 21:48             ` NeilBrown
  (?)
@ 2015-07-30  0:36             ` J. Bruce Fields
  2015-07-30 12:28               ` Kinglong Mee
  -1 siblings, 1 reply; 49+ messages in thread
From: J. Bruce Fields @ 2015-07-30  0:36 UTC (permalink / raw)
  To: NeilBrown
  Cc: Kinglong Mee, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On Thu, Jul 30, 2015 at 07:48:24AM +1000, NeilBrown wrote:
> On Wed, 29 Jul 2015 15:41:55 -0400 "J. Bruce Fields"
> <bfields@fieldses.org> wrote:
> 
> > On Wed, Jul 29, 2015 at 10:25:19AM +1000, NeilBrown wrote:
> > > On Mon, 27 Jul 2015 11:06:53 +0800 Kinglong Mee <kinglongmee@gmail.com>
> > > wrote:
> > > 
> > > > Without initialization, 'done' in an fs_pin on the stack may
> > > > contain a garbage value.
> > > > 
> > > > v8, same as v3
> > > > Adds macro for header file
> > > > 
> > > > Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> > > 
> > > Reviewed-by: NeilBrown <neilb@suse.com>
> > > 
> > > It would be really good if some of these early patches could be applied
> > > to the relevant trees so they appear in -next and we only need to keep
> > > reviewing the more interesting code at the end.
> > 
> > This patch seems a little bikeshed-y.  I'd rather just drop it or save
> > it for some other day.  It's not necessary to the series.
> 
> ???
> 
> I accept that:
> 
> 
> > > > @@ -1,3 +1,6 @@
> > > > +#ifndef _LINUX_FS_PIN_H
> > > > +#define _LINUX_FS_PIN_H
> > > > +
> > > >  #include <linux/wait.h>
> 
> could be a little bike-shed-y, not that I've seen much bike shedding
> going on.
> 
> However:
> > > >  
> > > >  struct fs_pin {
> > > > @@ -16,9 +19,12 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
> > > >  	INIT_HLIST_NODE(&p->s_list);
> > > >  	INIT_HLIST_NODE(&p->m_list);
> > > >  	p->kill = kill;
> > > > +	p->done = 0;
> > > >  }
> > > >  
> 
> is quite important.
> Without that assignment we would probably need to rename the function to
>    init_most_of_fs_pin
> or
>    init_fs_pin_if_already_zeroed
> or maybe just
>    __init_fs_pin
> with the time honoured interpretation that it sort-of does what the
> name says, but maybe not exactly how you think and please use with care.
> 
> Then in nfsd code we would need to add the assignment ourselves, or use
> kzalloc where it would otherwise be completely unnecessary.

Right, the existing users do something like kzalloc, last I checked.
Maybe it'd be an improvement not to.

I don't really care, but since Al's a bit overloaded, and you don't want
to see this reposted, and it's not really essential to the series, why
not drop it for now?

--b.

> 
> 
> Thanks for accepting the other patches!
> 
> NeilBrown

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 1/9 v8] fs_pin: Initialize value for fs_pin explicitly
  2015-07-30  0:36             ` J. Bruce Fields
@ 2015-07-30 12:28               ` Kinglong Mee
  0 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-30 12:28 UTC (permalink / raw)
  To: Al Viro
  Cc: J. Bruce Fields, NeilBrown, linux-nfs, linux-fsdevel,
	Trond Myklebust, kinglongmee

On 7/30/2015 08:36, J. Bruce Fields wrote:
> On Thu, Jul 30, 2015 at 07:48:24AM +1000, NeilBrown wrote:
>> On Wed, 29 Jul 2015 15:41:55 -0400 "J. Bruce Fields"
>> <bfields@fieldses.org> wrote:
>>
>>> On Wed, Jul 29, 2015 at 10:25:19AM +1000, NeilBrown wrote:
>>>> On Mon, 27 Jul 2015 11:06:53 +0800 Kinglong Mee <kinglongmee@gmail.com>
>>>> wrote:
>>>>
>>>>> Without initialization, 'done' in an fs_pin on the stack may
>>>>> contain a garbage value.
>>>>>
>>>>> v8, same as v3
>>>>> Adds macro for header file
>>>>>
>>>>> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
>>>>
>>>> Reviewed-by: NeilBrown <neilb@suse.com>
>>>>
>>>> It would be really good if some of these early patches could be applied
>>>> to the relevant trees so they appear in -next and we only need to keep
>>>> reviewing the more interesting code at the end.
>>>
>>> This patch seems a little bikeshed-y.  I'd rather just drop it or save
>>> it for some other day.  It's not necessary to the series.
>>
>> ???
>>
>> I accept that:
>>
>>
>>>>> @@ -1,3 +1,6 @@
>>>>> +#ifndef _LINUX_FS_PIN_H
>>>>> +#define _LINUX_FS_PIN_H
>>>>> +
>>>>>  #include <linux/wait.h>
>>
>> could be a little bike-shed-y, not that I've seen much bike shedding
>> going on.
>>
>> However:
>>>>>  
>>>>>  struct fs_pin {
>>>>> @@ -16,9 +19,12 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
>>>>>  	INIT_HLIST_NODE(&p->s_list);
>>>>>  	INIT_HLIST_NODE(&p->m_list);
>>>>>  	p->kill = kill;
>>>>> +	p->done = 0;
>>>>>  }
>>>>>  
>>
>> is quite important.
>> Without that assignment we would probably need to rename the function to
>>    init_most_of_fs_pin
>> or
>>    init_fs_pin_if_already_zeroed
>> or maybe just
>>    __init_fs_pin
>> with the time honoured interpretation that it sort-of does what the
>> name says, but maybe not exactly how you think and please use with care.
>>
>> Then in nfsd code we would need to add the assignment ourselves, or use
>> kzalloc where it would otherwise be completely unnecessary.
> 
> Right, the existing users do something like kzalloc, last I checked.
> Maybe it'd be an improvement not to.
> 
> I don't really care, but since Al's a bit overloaded, and you don't want
> to see this reposted, and it's not really essential to the series, why
> not drop it for now?

What's your opinion about this patch, Al?
Drop it, or does it need updating?

thanks,
Kinglong Mee

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/9 v8] fs_pin: Export functions for specific filesystem
  2015-07-29  0:30     ` NeilBrown
@ 2015-07-30 12:31       ` Kinglong Mee
  0 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-30 12:31 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel,
	Trond Myklebust, kinglongmee

On 7/29/2015 08:30, NeilBrown wrote:
> On Mon, 27 Jul 2015 11:07:25 +0800 Kinglong Mee <kinglongmee@gmail.com>
> wrote:
> 
>> Export functions for others that want to pin to a vfsmount,
>> e.g., nfsd's export cache.
>>
>> v8, same as v4
>> add exporting of pin_kill.
>>
>> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> 
> Reviewed-by: NeilBrown <neilb@suse.com>
> 
> These are needed for any module to participate in pinning.
> 
> mnt_pin_kill() and group_pin_kill() are just helper-functions for
> unmount etc (and are in fs/internal.h) so don't need to be exported.

Yes, thanks for your comments.
It would be better to add those comments to the change-log.

thanks,
Kinglong Mee

> 
> Thanks,
> NeilBrown
> 
>> ---
>>  fs/fs_pin.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/fs_pin.c b/fs/fs_pin.c
>> index 611b540..a1a4eb2 100644
>> --- a/fs/fs_pin.c
>> +++ b/fs/fs_pin.c
>> @@ -17,6 +17,7 @@ void pin_remove(struct fs_pin *pin)
>>  	wake_up_locked(&pin->wait);
>>  	spin_unlock_irq(&pin->wait.lock);
>>  }
>> +EXPORT_SYMBOL(pin_remove);
>>  
>>  void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
>>  {
>> @@ -26,11 +27,13 @@ void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head
>>  	hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
>>  	spin_unlock(&pin_lock);
>>  }
>> +EXPORT_SYMBOL(pin_insert_group);
>>  
>>  void pin_insert(struct fs_pin *pin, struct vfsmount *m)
>>  {
>>  	pin_insert_group(pin, m, &m->mnt_sb->s_pins);
>>  }
>> +EXPORT_SYMBOL(pin_insert);
>>  
>>  void pin_kill(struct fs_pin *p)
>>  {
>> @@ -72,6 +75,7 @@ void pin_kill(struct fs_pin *p)
>>  	}
>>  	rcu_read_unlock();
>>  }
>> +EXPORT_SYMBOL(pin_kill);
>>  
>>  void mnt_pin_kill(struct mount *m)
>>  {
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread
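
With pin_remove(), pin_insert_group(), pin_insert() and pin_kill()
exported, a module can participate in pinning roughly as follows (a
sketch only: my_object and my_pin_kill are hypothetical, and this series
layers the path_get_pin()/path_put_unpin() helpers on top of the same
primitives):

	static void my_pin_kill(struct fs_pin *pin)
	{
		struct my_object *obj = container_of(pin, struct my_object, pin);

		/* drop whatever obj still holds on the mount ... */
		pin_remove(pin);	/* ... then unhook the pin itself */
	}

	init_fs_pin(&obj->pin, my_pin_kill);
	pin_insert(&obj->pin, path->mnt);	/* umount will invoke my_pin_kill() */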

* Re: [PATCH 7/9 v8] sunrpc: Switch to using hash list instead single list
@ 2015-07-30 13:01           ` Kinglong Mee
  0 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-30 13:01 UTC (permalink / raw)
  To: J. Bruce Fields, NeilBrown
  Cc: Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust, kinglongmee

On 7/30/2015 03:51, J. Bruce Fields wrote:
> On Wed, Jul 29, 2015 at 12:19:39PM +1000, NeilBrown wrote:
>> On Mon, 27 Jul 2015 11:10:15 +0800 Kinglong Mee <kinglongmee@gmail.com>
>> wrote:
>>
>>> Switch to using hlist_head for cache_head in cache_detail;
>>> this makes it possible to remove a cache_head entry directly from cache_detail.
>>>
>>> v8, using hash list, not head list
>>>
>>> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
>>
>> Reviewed-by: NeilBrown <neilb@suse.com>
> 
> Thanks, applying this and previous 2 patches.

Thanks, I see them in your tree.
I will not include those 3 patches in the next version.

thanks,
Kinglong Mee

> 
> --b.
> 
>>
>> Thanks,
>> NeilBrown
>>
>>> ---
>>>  include/linux/sunrpc/cache.h |  4 +--
>>>  net/sunrpc/cache.c           | 60 +++++++++++++++++++++++---------------------
>>>  2 files changed, 33 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
>>> index 04ee5a2..03d3b4c 100644
>>> --- a/include/linux/sunrpc/cache.h
>>> +++ b/include/linux/sunrpc/cache.h
>>> @@ -46,7 +46,7 @@
>>>   * 
>>>   */
>>>  struct cache_head {
>>> -	struct cache_head * next;
>>> +	struct hlist_node	cache_list;
>>>  	time_t		expiry_time;	/* After time time, don't use the data */
>>>  	time_t		last_refresh;   /* If CACHE_PENDING, this is when upcall 
>>>  					 * was sent, else this is when update was received
>>> @@ -73,7 +73,7 @@ struct cache_detail_pipefs {
>>>  struct cache_detail {
>>>  	struct module *		owner;
>>>  	int			hash_size;
>>> -	struct cache_head **	hash_table;
>>> +	struct hlist_head *	hash_table;
>>>  	rwlock_t		hash_lock;
>>>  
>>>  	atomic_t		inuse; /* active user-space update or lookup */
>>> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
>>> index 673c2fa..4a2340a 100644
>>> --- a/net/sunrpc/cache.c
>>> +++ b/net/sunrpc/cache.c
>>> @@ -44,7 +44,7 @@ static void cache_revisit_request(struct cache_head *item);
>>>  static void cache_init(struct cache_head *h)
>>>  {
>>>  	time_t now = seconds_since_boot();
>>> -	h->next = NULL;
>>> +	INIT_HLIST_NODE(&h->cache_list);
>>>  	h->flags = 0;
>>>  	kref_init(&h->ref);
>>>  	h->expiry_time = now + CACHE_NEW_EXPIRY;
>>> @@ -54,15 +54,14 @@ static void cache_init(struct cache_head *h)
>>>  struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
>>>  				       struct cache_head *key, int hash)
>>>  {
>>> -	struct cache_head **head,  **hp;
>>> -	struct cache_head *new = NULL, *freeme = NULL;
>>> +	struct cache_head *new = NULL, *freeme = NULL, *tmp = NULL;
>>> +	struct hlist_head *head;
>>>  
>>>  	head = &detail->hash_table[hash];
>>>  
>>>  	read_lock(&detail->hash_lock);
>>>  
>>> -	for (hp=head; *hp != NULL ; hp = &(*hp)->next) {
>>> -		struct cache_head *tmp = *hp;
>>> +	hlist_for_each_entry(tmp, head, cache_list) {
>>>  		if (detail->match(tmp, key)) {
>>>  			if (cache_is_expired(detail, tmp))
>>>  				/* This entry is expired, we will discard it. */
>>> @@ -88,12 +87,10 @@ struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
>>>  	write_lock(&detail->hash_lock);
>>>  
>>>  	/* check if entry appeared while we slept */
>>> -	for (hp=head; *hp != NULL ; hp = &(*hp)->next) {
>>> -		struct cache_head *tmp = *hp;
>>> +	hlist_for_each_entry(tmp, head, cache_list) {
>>>  		if (detail->match(tmp, key)) {
>>>  			if (cache_is_expired(detail, tmp)) {
>>> -				*hp = tmp->next;
>>> -				tmp->next = NULL;
>>> +				hlist_del_init(&tmp->cache_list);
>>>  				detail->entries --;
>>>  				freeme = tmp;
>>>  				break;
>>> @@ -104,8 +101,8 @@ struct cache_head *sunrpc_cache_lookup(struct cache_detail *detail,
>>>  			return tmp;
>>>  		}
>>>  	}
>>> -	new->next = *head;
>>> -	*head = new;
>>> +
>>> +	hlist_add_head(&new->cache_list, head);
>>>  	detail->entries++;
>>>  	cache_get(new);
>>>  	write_unlock(&detail->hash_lock);
>>> @@ -143,7 +140,6 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
>>>  	 * If 'old' is not VALID, we update it directly,
>>>  	 * otherwise we need to replace it
>>>  	 */
>>> -	struct cache_head **head;
>>>  	struct cache_head *tmp;
>>>  
>>>  	if (!test_bit(CACHE_VALID, &old->flags)) {
>>> @@ -168,15 +164,13 @@ struct cache_head *sunrpc_cache_update(struct cache_detail *detail,
>>>  	}
>>>  	cache_init(tmp);
>>>  	detail->init(tmp, old);
>>> -	head = &detail->hash_table[hash];
>>>  
>>>  	write_lock(&detail->hash_lock);
>>>  	if (test_bit(CACHE_NEGATIVE, &new->flags))
>>>  		set_bit(CACHE_NEGATIVE, &tmp->flags);
>>>  	else
>>>  		detail->update(tmp, new);
>>> -	tmp->next = *head;
>>> -	*head = tmp;
>>> +	hlist_add_head(&tmp->cache_list, &detail->hash_table[hash]);
>>>  	detail->entries++;
>>>  	cache_get(tmp);
>>>  	cache_fresh_locked(tmp, new->expiry_time);
>>> @@ -416,28 +410,29 @@ static int cache_clean(void)
>>>  	/* find a non-empty bucket in the table */
>>>  	while (current_detail &&
>>>  	       current_index < current_detail->hash_size &&
>>> -	       current_detail->hash_table[current_index] == NULL)
>>> +	       hlist_empty(&current_detail->hash_table[current_index]))
>>>  		current_index++;
>>>  
>>>  	/* find a cleanable entry in the bucket and clean it, or set to next bucket */
>>>  
>>>  	if (current_detail && current_index < current_detail->hash_size) {
>>> -		struct cache_head *ch, **cp;
>>> +		struct cache_head *ch = NULL;
>>>  		struct cache_detail *d;
>>> +		struct hlist_head *head;
>>> +		struct hlist_node *tmp;
>>>  
>>>  		write_lock(&current_detail->hash_lock);
>>>  
>>>  		/* Ok, now to clean this strand */
>>>  
>>> -		cp = & current_detail->hash_table[current_index];
>>> -		for (ch = *cp ; ch ; cp = & ch->next, ch = *cp) {
>>> +		head = &current_detail->hash_table[current_index];
>>> +		hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
>>>  			if (current_detail->nextcheck > ch->expiry_time)
>>>  				current_detail->nextcheck = ch->expiry_time+1;
>>>  			if (!cache_is_expired(current_detail, ch))
>>>  				continue;
>>>  
>>> -			*cp = ch->next;
>>> -			ch->next = NULL;
>>> +			hlist_del_init(&ch->cache_list);
>>>  			current_detail->entries--;
>>>  			rv = 1;
>>>  			break;
>>> @@ -1284,7 +1279,7 @@ void *cache_seq_start(struct seq_file *m, loff_t *pos)
>>>  	hash = n >> 32;
>>>  	entry = n & ((1LL<<32) - 1);
>>>  
>>> -	for (ch=cd->hash_table[hash]; ch; ch=ch->next)
>>> +	hlist_for_each_entry(ch, &cd->hash_table[hash], cache_list)
>>>  		if (!entry--)
>>>  			return ch;
>>>  	n &= ~((1LL<<32) - 1);
>>> @@ -1292,11 +1287,12 @@ void *cache_seq_start(struct seq_file *m, loff_t *pos)
>>>  		hash++;
>>>  		n += 1LL<<32;
>>>  	} while(hash < cd->hash_size &&
>>> -		cd->hash_table[hash]==NULL);
>>> +		hlist_empty(&cd->hash_table[hash]));
>>>  	if (hash >= cd->hash_size)
>>>  		return NULL;
>>>  	*pos = n+1;
>>> -	return cd->hash_table[hash];
>>> +	return hlist_entry_safe(cd->hash_table[hash].first,
>>> +				struct cache_head, cache_list);
>>>  }
>>>  EXPORT_SYMBOL_GPL(cache_seq_start);
>>>  
>>> @@ -1308,23 +1304,25 @@ void *cache_seq_next(struct seq_file *m, void *p, loff_t *pos)
>>>  
>>>  	if (p == SEQ_START_TOKEN)
>>>  		hash = 0;
>>> -	else if (ch->next == NULL) {
>>> +	else if (ch->cache_list.next == NULL) {
>>>  		hash++;
>>>  		*pos += 1LL<<32;
>>>  	} else {
>>>  		++*pos;
>>> -		return ch->next;
>>> +		return hlist_entry_safe(ch->cache_list.next,
>>> +					struct cache_head, cache_list);
>>>  	}
>>>  	*pos &= ~((1LL<<32) - 1);
>>>  	while (hash < cd->hash_size &&
>>> -	       cd->hash_table[hash] == NULL) {
>>> +	       hlist_empty(&cd->hash_table[hash])) {
>>>  		hash++;
>>>  		*pos += 1LL<<32;
>>>  	}
>>>  	if (hash >= cd->hash_size)
>>>  		return NULL;
>>>  	++*pos;
>>> -	return cd->hash_table[hash];
>>> +	return hlist_entry_safe(cd->hash_table[hash].first,
>>> +				struct cache_head, cache_list);
>>>  }
>>>  EXPORT_SYMBOL_GPL(cache_seq_next);
>>>  
>>> @@ -1666,17 +1664,21 @@ EXPORT_SYMBOL_GPL(cache_unregister_net);
>>>  struct cache_detail *cache_create_net(struct cache_detail *tmpl, struct net *net)
>>>  {
>>>  	struct cache_detail *cd;
>>> +	int i;
>>>  
>>>  	cd = kmemdup(tmpl, sizeof(struct cache_detail), GFP_KERNEL);
>>>  	if (cd == NULL)
>>>  		return ERR_PTR(-ENOMEM);
>>>  
>>> -	cd->hash_table = kzalloc(cd->hash_size * sizeof(struct cache_head *),
>>> +	cd->hash_table = kzalloc(cd->hash_size * sizeof(struct hlist_head),
>>>  				 GFP_KERNEL);
>>>  	if (cd->hash_table == NULL) {
>>>  		kfree(cd);
>>>  		return ERR_PTR(-ENOMEM);
>>>  	}
>>> +
>>> +	for (i = 0; i < cd->hash_size; i++)
>>> +		INIT_HLIST_HEAD(&cd->hash_table[i]);
>>>  	cd->net = net;
>>>  	return cd;
>>>  }
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread
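
The conversion uses the standard hlist idiom throughout; for instance,
the expired-entry removal in cache_clean() looks like this (a sketch,
with detail/hash standing in for current_detail/current_index):

	struct cache_head *ch;
	struct hlist_node *tmp;

	hlist_for_each_entry_safe(ch, tmp, &detail->hash_table[hash], cache_list) {
		if (!cache_is_expired(detail, ch))
			continue;
		hlist_del_init(&ch->cache_list);	/* tmp keeps the cursor valid */
		detail->entries--;
		break;
	}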

* Re: [PATCH 8/9 v8] sunrpc: New helper cache_delete_entry for deleting cache_head directly
@ 2015-07-30 13:14         ` Kinglong Mee
  0 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-30 13:14 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Trond Myklebust

On 7/29/2015 10:29, NeilBrown wrote:
> On Mon, 27 Jul 2015 11:10:45 +0800 Kinglong Mee <kinglongmee@gmail.com>
> wrote:
> 
>> A new helper cache_delete_entry() for deleting a cache_head from
>> a cache_detail directly.
>>
>> It will be used by pin_kill, so it needs to make sure the cache_detail
>> is valid before deleting.
> 
> I cannot see any justification for validating the cache_detail.
> 
> When this gets called, the cache_head has not yet been freed (though it
> probably will be soon) so the cache_detail must still be around.
> 
> However it is possible for this to race with cache_clean() which could
> have already removed the cache_head from the list (and decremented
> ->entries), but which hasn't called cache_put() yet.

Yes, that's right.
I will drop the checking of cache_detail here.

> 
> The use of cache_list_lock is not enough to protect against that race.
> 
> So I think you should drop the use of cache_list_lock, drop the check
> that detail is still in the list, and after getting ->hash_lock, check
> hlist_unhashed() and only unhash if it returns false (i.e. the entry
> is still hashed).

Thanks again for your comments.

thanks,
Kinglong Mee

> 
> Thanks,
> NeilBrown
> 
>>
>> Because pin_kill is not called often, the performance impact
>> is acceptable.
>>
>> v8, same as v6.
>>
>> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
>> ---
>>  include/linux/sunrpc/cache.h |  1 +
>>  net/sunrpc/cache.c           | 30 ++++++++++++++++++++++++++++++
>>  2 files changed, 31 insertions(+)
>>
>> diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
>> index 03d3b4c..2824db5 100644
>> --- a/include/linux/sunrpc/cache.h
>> +++ b/include/linux/sunrpc/cache.h
>> @@ -210,6 +210,7 @@ extern int cache_check(struct cache_detail *detail,
>>  		       struct cache_head *h, struct cache_req *rqstp);
>>  extern void cache_flush(void);
>>  extern void cache_purge(struct cache_detail *detail);
>> +extern void cache_delete_entry(struct cache_detail *cd, struct cache_head *h);
>>  #define NEVER (0x7FFFFFFF)
>>  extern void __init cache_initialize(void);
>>  extern int cache_register_net(struct cache_detail *cd, struct net *net);
>> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
>> index 4a2340a..b722aea 100644
>> --- a/net/sunrpc/cache.c
>> +++ b/net/sunrpc/cache.c
>> @@ -454,6 +454,36 @@ static int cache_clean(void)
>>  	return rv;
>>  }
>>  
>> +void cache_delete_entry(struct cache_detail *detail, struct cache_head *h)
>> +{
>> +	struct cache_detail *tmp;
>> +
>> +	if (!detail || !h)
>> +		return;
>> +
>> +	spin_lock(&cache_list_lock);
>> +	list_for_each_entry(tmp, &cache_list, others) {
>> +		if (tmp == detail)
>> +			goto found;
>> +	}
>> +	spin_unlock(&cache_list_lock);
>> +	printk(KERN_WARNING "%s: Deleted cache detail %p\n", __func__, detail);
>> +	return ;
>> +
>> +found:
>> +	write_lock(&detail->hash_lock);
>> +
>> +	hlist_del_init(&h->cache_list);
>> +	detail->entries--;
>> +	set_bit(CACHE_CLEANED, &h->flags);
>> +
>> +	write_unlock(&detail->hash_lock);
>> +	spin_unlock(&cache_list_lock);
>> +
>> +	cache_put(h, detail);
>> +}
>> +EXPORT_SYMBOL_GPL(cache_delete_entry);
>> +
>>  /*
>>   * We want to regularly clean the cache, so we need to schedule some work ...
>>   */
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread
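
Folding those suggestions back in, the helper would reduce to something
like the following sketch (not the posted patch; note the final put is
skipped when cache_clean() won the race, since cache_clean() will then
issue its own cache_put()):

	void cache_delete_entry(struct cache_detail *detail, struct cache_head *h)
	{
		bool removed = false;

		write_lock(&detail->hash_lock);
		if (!hlist_unhashed(&h->cache_list)) {
			hlist_del_init(&h->cache_list);
			detail->entries--;
			set_bit(CACHE_CLEANED, &h->flags);
			removed = true;
		}
		write_unlock(&detail->hash_lock);

		if (removed)
			cache_put(h, detail);	/* drop the hash table's reference */
	}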

* Re: [PATCH 4/9 v8] fs: New helper legitimize_mntget() for getting a legitimize mnt
  2015-07-29  2:06       ` NeilBrown
  (?)
@ 2015-07-30 13:17       ` Kinglong Mee
  -1 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-07-30 13:17 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel,
	Trond Myklebust, kinglongmee

On 7/29/2015 10:06, NeilBrown wrote:
> On Mon, 27 Jul 2015 11:08:32 +0800 Kinglong Mee <kinglongmee@gmail.com>
> wrote:
> 
>> New helper legitimize_mntget() for getting a mnt only when none of
>> MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED is set; otherwise it returns NULL.
>>
>> v8, same as v6
>>
>> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
>> ---
>>  fs/namespace.c        | 19 +++++++++++++++++++
>>  include/linux/mount.h |  1 +
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index 2b8aa15..842cf57 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -1153,6 +1153,25 @@ struct vfsmount *mntget(struct vfsmount *mnt)
>>  }
>>  EXPORT_SYMBOL(mntget);
>>  
>> +struct vfsmount *legitimize_mntget(struct vfsmount *vfsmnt)
>> +{
>> +	struct mount *mnt;
>> +
>> +	if (vfsmnt == NULL)
>> +		return NULL;
>> +
>> +	read_seqlock_excl(&mount_lock);
>> +	mnt = real_mount(vfsmnt);
>> +	if (vfsmnt->mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED))
>> +		vfsmnt = NULL;
>> +	else
>> +		mnt_add_count(mnt, 1);
>> +	read_sequnlock_excl(&mount_lock);
>> +
>> +	return vfsmnt;
>> +}
>> +EXPORT_SYMBOL(legitimize_mntget);
>> +
>>  struct vfsmount *mnt_clone_internal(struct path *path)
>>  {
>>  	struct mount *p;
>> diff --git a/include/linux/mount.h b/include/linux/mount.h
>> index f822c3c..8ae9dc0 100644
>> --- a/include/linux/mount.h
>> +++ b/include/linux/mount.h
>> @@ -79,6 +79,7 @@ extern void mnt_drop_write(struct vfsmount *mnt);
>>  extern void mnt_drop_write_file(struct file *file);
>>  extern void mntput(struct vfsmount *mnt);
>>  extern struct vfsmount *mntget(struct vfsmount *mnt);
>> +extern struct vfsmount *legitimize_mntget(struct vfsmount *vfsmnt);
>>  extern struct vfsmount *mnt_clone_internal(struct path *path);
>>  extern int __mnt_is_readonly(struct vfsmount *mnt);
>>  
> 
> It is unfortunate that we seem to have to take the mount_lock global
> lock on every nfs request.  I wonder if we can avoid that....
> 
> What if we did:
> 
>   seq = 0;
> retry:
>   read_seqbegin_or_lock(&mount_lock, &seq);
>   if (vfsmnt->mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED))
>       vfsmnt = NULL;
>   else if (need_seqretry(&mount_lock, seq))
>       goto retry;
>   else {
> 	mnt_add_count(mnt, 1);
>         if (need_seqretry(&mount_lock, seq) ||
> 	    vfsmnt->mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT |
>                                  MNT_DOOMED)) {
>              mnt_add_count(mnt, -1);
>              goto retry;
>         }
>    }
>    done_seqretry(&mount_lock, seq);

I don't really understand the seqlock yet; I will look into it.

> 
> 
> Is there any risk from having a temporarily elevated mnt_count there?
> I can't see one, but there is clearly some complexity in managing that
> count.

Yes, it's more complex than the above patch.
But if it can avoid taking the mount_lock on every nfs request, I'd like it.
I will test it later.

thanks,
Kinglong Mee

^ permalink raw reply	[flat|nested] 49+ messages in thread
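
For reference, the canonical reader pattern behind Neil's sketch is the
following; the key detail is setting seq to an odd value, so that the
retry pass takes the lock instead of spinning locklessly:

	int seq = 0;
retry:
	read_seqbegin_or_lock(&mount_lock, &seq);	/* even seq: lockless pass */
	/* ... test mnt_flags and take the reference ... */
	if (need_seqretry(&mount_lock, seq)) {
		seq = 1;	/* odd: the next pass locks mount_lock */
		goto retry;
	}
	done_seqretry(&mount_lock, seq);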

* Re: [PATCH] fs-pin: allow pin_remove() to be called other than from ->kill()
  2015-07-29  3:59         ` NeilBrown
  (?)
@ 2015-08-10 11:37         ` Kinglong Mee
  -1 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-08-10 11:37 UTC (permalink / raw)
  To: Al Viro
  Cc: NeilBrown, J. Bruce Fields, linux-nfs, linux-fsdevel,
	Trond Myklebust, kinglongmee

Ping ...

Hello Viro,
What's your opinion on this patch?

If it's okay, I will update those patches based on it.

thanks,
Kinglong Mee

On 7/29/2015 11:59, NeilBrown wrote:
> 
> 
> fs-pin currently assumes when either the vfsmount or the fs_pin wants
> to unpin, pin_kill() will be called.
> This requires that the ->kill() function can wait for any transient
> references to the fs_pin to be released.  If the structure containing
> the fs_pin doesn't already have the ability to wait for references,
> this can be a burden.
> 
> As the fs_pin already has infrastructure for waiting, that can be
> leveraged to remove the burden.
> 
> In this alternate scenario, only the vfsmount calls pin_kill() when it
> wants to unpin.  The owner of the fs_pin() instead calls pin_remove().
> 
> The ->kill() function removes any long-term references, and then calls
> pin_kill() (recursively).
> When the last reference on (the structure containing) the fs_pin is
> dropped, pin_remove() will be called and the (recursive) pin_kill()
> call will complete.
> 
> For this to be safe, the final "put" must *not* free the structure if
> pin_kill() has already been called, as that could leave ->kill()
> accessing freed data.
> 
> So we provide a return value for pin_remove() which reports the old
> ->done value.
> 
> When final put calls pin_remove() it checks that value.
> If it was 0, then pin_kill() has not called ->kill and will not,
> so final put can free the data structure.
> If it was -1, then pin_kill() has called ->kill, and ->kill will
> free the data structure - final put must not touch it.
> 
> This makes the 'wait' infrastructure of fs_pin available to any
> pinning client which wants to use it.
> 
> Signed-Off-By: NeilBrown <neilb@suse.com>
> 
> ---
> Hi Al,
>  do you see this as a workable solution?  I think it will improve the nfsd pinning patch
> a lot.
> 
> Thanks,
> NeilBrown
> 
> 
> diff --git a/fs/fs_pin.c b/fs/fs_pin.c
> index 611b5408f6ec..b7954a9d17da 100644
> --- a/fs/fs_pin.c
> +++ b/fs/fs_pin.c
> @@ -6,16 +6,32 @@
>  
>  static DEFINE_SPINLOCK(pin_lock);
>  
> -void pin_remove(struct fs_pin *pin)
> +/**
> + * pin_remove - disconnect an fs_pin from the pinned structure.
> + * @pin:	The struct fs_pin which is pinning something.
> + *
> + * Detach a 'pin' which was added by pin_insert().  A return value
> + * of -1 implies that pin_kill() has already been called and that the
> + * ->kill() function now owns the data structure containing @pin.
> + * The function which called pin_remove() must not touch the data structure
> + * again (unless it is the ->kill() function itself).
> + * A return value of 0 implies an uneventful disconnect: pin_kill() has not called,
> + * and will not call, the ->kill() function on this @pin.
> + * Any other return value is a usage error - e.g. repeated call to pin_remove().
> + */
> +int pin_remove(struct fs_pin *pin)
>  {
> +	int ret;
>  	spin_lock(&pin_lock);
>  	hlist_del_init(&pin->m_list);
>  	hlist_del_init(&pin->s_list);
>  	spin_unlock(&pin_lock);
>  	spin_lock_irq(&pin->wait.lock);
> +	ret = pin->done;
>  	pin->done = 1;
>  	wake_up_locked(&pin->wait);
>  	spin_unlock_irq(&pin->wait.lock);
> +	return ret;
>  }
>  
>  void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
> diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
> index 3886b3bffd7f..2fe9d3ba09e8 100644
> --- a/include/linux/fs_pin.h
> +++ b/include/linux/fs_pin.h
> @@ -18,7 +18,7 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
>  	p->kill = kill;
>  }
>  
> -void pin_remove(struct fs_pin *);
> +int pin_remove(struct fs_pin *);
>  void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
>  void pin_insert(struct fs_pin *, struct vfsmount *);
>  void pin_kill(struct fs_pin *);
> 
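
For illustration, a final put built on this return value might look like
the following sketch; expkey_destroy() and the svc_expkey layout are
assumptions taken from the nfsd patches elsewhere in this thread:

static void expkey_put(struct kref *ref)
{
	struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref);

	/*
	 * 0 means pin_kill() never ran ->kill(), so we own the structure;
	 * -1 means ->kill() owns it and will free it, so it must not be
	 * touched here.
	 */
	if (pin_remove(&key->ek_pin) == 0)
		expkey_destroy(key);
}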

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] fs-pin: allow pin_remove() to be called other than from ->kill()
  2015-07-29  3:59         ` NeilBrown
@ 2015-08-18  6:07           ` Kinglong Mee
  -1 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-08-18  6:07 UTC (permalink / raw)
  To: NeilBrown, Al Viro
  Cc: J. Bruce Fields, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Trond Myklebust,
	kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

Sorry for replying so late.

On 7/29/2015 11:59, NeilBrown wrote:
> fs-pin currently assumes when either the vfsmount or the fs_pin wants
> to unpin, pin_kill() will be called.
> This requires that the ->kill() function can wait for any transient
> references to the fs_pin to be released.  If the structure containing
> the fs_pin doesn't already have the ability to wait for references,
> this can be a burden.
> 
> As the fs_pin already has infrastructure for waiting, that can be
> leveraged to remove the burden.
> 
> In this alternate scenario, only the vfsmount calls pin_kill() when it
> wants to unpin.  The owner of the fs_pin() instead calls pin_remove().
> 
> The ->kill() function removes any long-term references, and then calls
> pin_kill() (recursively).
> When the last reference on (the structure containing) the fs_pin is
> dropped, pin_remove() will be called and the (recursive) pin_kill()
> call will complete.
> 
> For this to be safe, the final "put" must *not* free the structure if
> pin_kill() has already been called, as that could leave ->kill()
> accessing freed data.
> 
> So we provide a return value for pin_remove() which reports the old
> ->done value.
> 
> When final put calls pin_remove() it checks that value.
> If it was 0, then pin_kill() has not called ->kill and will not,
> so final put can free the data structure.
> If it was -1, then pin_kill() has called ->kill, and ->kill will
> free the data structure - final put must not touch it.

I've found another problem:
how can xxx_pin_kill() know that the last reference to the data has been put?

eg,
static void expkey_pin_kill(struct fs_pin *pin)
{
        struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
        cache_delete_entry(key->cd, &key->h);
        expkey_destroy(key);
}

expkey_pin_kill() has called cache_delete_entry(), but it doesn't know whether
the last reference has been put (i.e. whether expkey_put() has been called).

If a third user grabs a reference before the cache entry is deleted from
the cache, that third user will do the last put by calling expkey_put();
xxx_pin_kill() then only decreases the reference count.

thanks,
Kinglong Mee

> 
> This makes the 'wait' infrastructure of fs_pin available to any
> pinning client which wants to use it.
> 
> Signed-Off-By: NeilBrown <neilb-IBi9RG/b67k@public.gmane.org>
> 
> ---
> Hi Al,
>  do you see this as a workable solution?  I think it will improve the nfsd pinning patch
> a lot.
> 
> Thanks,
> NeilBrown
> 
> 
> diff --git a/fs/fs_pin.c b/fs/fs_pin.c
> index 611b5408f6ec..b7954a9d17da 100644
> --- a/fs/fs_pin.c
> +++ b/fs/fs_pin.c
> @@ -6,16 +6,32 @@
>  
>  static DEFINE_SPINLOCK(pin_lock);
>  
> -void pin_remove(struct fs_pin *pin)
> +/**
> + * pin_remove - disconnect an fs_pin from the pinned structure.
> + * @pin:	The struct fs_pin which is pinning something.
> + *
> + * Detach a 'pin' which was added by pin_insert().  A return value
> + * of -1 implies that pin_kill() has already been called and that the
> + * ->kill() function now owns the data structure containing @pin.
> + * The function which called pin_remove() must not touch the data structure
> + * again (unless it is the ->kill() function itself).
> + * A return value of 0 implies an uneventful disconnect: pin_kill() has not called,
> + * and will not call, the ->kill() function on this @pin.
> + * Any other return value is a usage error - e.g. repeated call to pin_remove().
> + */
> +int pin_remove(struct fs_pin *pin)
>  {
> +	int ret;
>  	spin_lock(&pin_lock);
>  	hlist_del_init(&pin->m_list);
>  	hlist_del_init(&pin->s_list);
>  	spin_unlock(&pin_lock);
>  	spin_lock_irq(&pin->wait.lock);
> +	ret = pin->done;
>  	pin->done = 1;
>  	wake_up_locked(&pin->wait);
>  	spin_unlock_irq(&pin->wait.lock);
> +	return ret;
>  }
>  
>  void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
> diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
> index 3886b3bffd7f..2fe9d3ba09e8 100644
> --- a/include/linux/fs_pin.h
> +++ b/include/linux/fs_pin.h
> @@ -18,7 +18,7 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
>  	p->kill = kill;
>  }
>  
> -void pin_remove(struct fs_pin *);
> +int pin_remove(struct fs_pin *);
>  void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
>  void pin_insert(struct fs_pin *, struct vfsmount *);
>  void pin_kill(struct fs_pin *);
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] fs-pin: allow pin_remove() to be called other than from ->kill()
  2015-08-18  6:07           ` Kinglong Mee
  (?)
@ 2015-08-18  6:21           ` NeilBrown
  2015-08-18  6:37               ` Kinglong Mee
  -1 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2015-08-18  6:21 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: Al Viro, J. Bruce Fields, linux-nfs, linux-fsdevel, Trond Myklebust

On Tue, 18 Aug 2015 14:07:58 +0800 Kinglong Mee <kinglongmee@gmail.com>
wrote:

> Sorry for replying so late.
> 
> On 7/29/2015 11:59, NeilBrown wrote:
> > fs-pin currently assumes when either the vfsmount or the fs_pin wants
> > to unpin, pin_kill() will be called.
> > This requires that the ->kill() function can wait for any transient
> > references to the fs_pin to be released.  If the structure containing
> > the fs_pin doesn't already have the ability to wait for references,
> > this can be a burden.
> > 
> > As the fs_pin already has infrastructure for waiting, that can be
> > leveraged to remove the burden.
> > 
> > In this alternate scenario, only the vfsmount calls pin_kill() when it
> > wants to unpin.  The owner of the fs_pin() instead calls pin_remove().
> > 
> > The ->kill() function removes any long-term references, and then calls
> > pin_kill() (recursively).
> > When the last reference on (the structure containing) the fs_pin is
> > dropped, pin_remove() will be called and the (recursive) pin_kill()
> > call will complete.
> > 
> > For this to be safe, the final "put" must *not* free the structure if
> > pin_kill() has already been called, as that could leave ->kill()
> > accessing freed data.
> > 
> > So we provide a return value for pin_remove() which reports the old
> > ->done value.
> > 
> > When final put calls pin_remove() it checks that value.
> > If it was 0, then pin_kill() has not called ->kill and will not,
> > so final put can free the data structure.
> > If it was -1, then pin_kill() has called ->kill, and ->kill will
> > free the data structure - final put must not touch it.
> 
> I've found another problem:
> how can xxx_pin_kill() know that the last reference to the data has been put?
> 
> eg,
> static void expkey_pin_kill(struct fs_pin *pin)
> {
>         struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
>         cache_delete_entry(key->cd, &key->h);
>         expkey_destroy(key);
> }
> 
> expkey_pin_kill() has called cache_delete_entry(), but it doesn't know whether
> the last reference has been put (i.e. whether expkey_put() has been called).
> 
> If a third user grabs a reference before the cache entry is deleted from
> the cache, that third user will do the last put by calling expkey_put();
> xxx_pin_kill() then only decreases the reference count.

expkey_pin_kill() should call:
  cache_delete_entry()
  pin_kill()
  expkey_destroy()

The "cache_delete_entry()" call removes the only long-term reference.
Any other reference will be transient so it is safe to wait for those.

The 'pin_kill()' call will wait for pin_remove() to be called (it
already does that).
pin_remove() will be called when the last reference is dropped.  As
described above, that pin_remove call will return -1 and so the 'put'
function will not have called expkey_destroy.

Finally the expkey_destroy() function actually frees the data
structure.  No other code can be touching it at this point.
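
Put together, a ->kill() function following that sequence might look like
this sketch; cache_delete_entry(), expkey_destroy() and the svc_expkey
fields are assumptions taken from the earlier patches:

static void expkey_pin_kill(struct fs_pin *pin)
{
	struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);

	/* drop the only long-term reference ... */
	cache_delete_entry(key->cd, &key->h);
	/*
	 * ... then wait for the transient ones: the final put calls
	 * pin_remove(), which completes this (recursive) pin_kill().
	 */
	pin_kill(&key->ek_pin);
	/* nothing else can touch the structure now */
	expkey_destroy(key);
}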

Thanks,
NeilBrown


> 
> thanks,
> Kinglong Mee
> 
> > 
> > This makes the 'wait' infrastructure of fs_pin available to any
> > pinning client which wants to use it.
> > 
> > Signed-Off-By: NeilBrown <neilb@suse.com>
> > 
> > ---
> > Hi Al,
> >  do you see this as a workable solution?  I think it will improve the nfsd pinning patch
> > a lot.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > diff --git a/fs/fs_pin.c b/fs/fs_pin.c
> > index 611b5408f6ec..b7954a9d17da 100644
> > --- a/fs/fs_pin.c
> > +++ b/fs/fs_pin.c
> > @@ -6,16 +6,32 @@
> >  
> >  static DEFINE_SPINLOCK(pin_lock);
> >  
> > -void pin_remove(struct fs_pin *pin)
> > +/**
> > + * pin_remove - disconnect an fs_pin from the pinned structure.
> > + * @pin:	The struct fs_pin which is pinning something.
> > + *
> > + * Detach a 'pin' which was added by pin_insert().  A return value
> > + * of -1 implies that pin_kill() has already been called and that the
> > + * ->kill() function now owns the data structure containing @pin.
> > + * The function which called pin_remove() must not touch the data structure
> > + * again (unless it is the ->kill() function itself).
> > + * A return value of 0 implies an uneventful disconnect: pin_kill() has not called,
> > + * and will not call, the ->kill() function on this @pin.
> > + * Any other return value is a usage error - e.g. repeated call to pin_remove().
> > + */
> > +int pin_remove(struct fs_pin *pin)
> >  {
> > +	int ret;
> >  	spin_lock(&pin_lock);
> >  	hlist_del_init(&pin->m_list);
> >  	hlist_del_init(&pin->s_list);
> >  	spin_unlock(&pin_lock);
> >  	spin_lock_irq(&pin->wait.lock);
> > +	ret = pin->done;
> >  	pin->done = 1;
> >  	wake_up_locked(&pin->wait);
> >  	spin_unlock_irq(&pin->wait.lock);
> > +	return ret;
> >  }
> >  
> >  void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
> > diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
> > index 3886b3bffd7f..2fe9d3ba09e8 100644
> > --- a/include/linux/fs_pin.h
> > +++ b/include/linux/fs_pin.h
> > @@ -18,7 +18,7 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
> >  	p->kill = kill;
> >  }
> >  
> > -void pin_remove(struct fs_pin *);
> > +int pin_remove(struct fs_pin *);
> >  void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
> >  void pin_insert(struct fs_pin *, struct vfsmount *);
> >  void pin_kill(struct fs_pin *);
> > 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] fs-pin: allow pin_remove() to be called other than from ->kill()
  2015-08-18  6:21           ` NeilBrown
@ 2015-08-18  6:37               ` Kinglong Mee
  0 siblings, 0 replies; 49+ messages in thread
From: Kinglong Mee @ 2015-08-18  6:37 UTC (permalink / raw)
  To: NeilBrown
  Cc: Al Viro, J. Bruce Fields, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Trond Myklebust,
	kinglongmee-Re5JQEeQqe8AvxtiuMwx3w

On 8/18/2015 14:21, NeilBrown wrote:
> On Tue, 18 Aug 2015 14:07:58 +0800 Kinglong Mee <kinglongmee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> wrote:
> 
>> Sorry for replying so late.
>>
>> On 7/29/2015 11:59, NeilBrown wrote:
>>> fs-pin currently assumes when either the vfsmount or the fs_pin wants
>>> to unpin, pin_kill() will be called.
>>> This requires that the ->kill() function can wait for any transient
>>> references to the fs_pin to be released.  If the structure containing
>>> the fs_pin doesn't already have the ability to wait for references,
>>> this can be a burden.
>>>
>>> As the fs_pin already has infrastructure for waiting, that can be
>>> leveraged to remove the burden.
>>>
>>> In this alternate scenario, only the vfsmount calls pin_kill() when it
>>> wants to unpin.  The owner of the fs_pin() instead calls pin_remove().
>>>
>>> The ->kill() function removes any long-term references, and then calls
>>> pin_kill() (recursively).
>>> When the last reference on (the structure containing) the fs_pin is
>>> dropped, pin_remove() will be called and the (recursive) pin_kill()
>>> call will complete.
>>>
>>> For this to be safe, the final "put" must *not* free the structure if
>>> pin_kill() has already been called, as that could leave ->kill()
>>> accessing freed data.
>>>
>>> So we provide a return value for pin_remove() which reports the old
>>> ->done value.
>>>
>>> When final put calls pin_remove() it checks that value.
>>> If it was 0, then pin_kill() has not called ->kill and will not,
>>> so final put can free the data structure.
>>> If it was -1, then pin_kill() has called ->kill, and ->kill will
>>> free the data structure - final put must not touch it.
>>
>> I've found another problem:
>> how can xxx_pin_kill() know that the last reference to the data has been put?
>>
>> eg,
>> static void expkey_pin_kill(struct fs_pin *pin)
>> {
>>         struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
>>         cache_delete_entry(key->cd, &key->h);
>>         expkey_destroy(key);
>> }
>>
>> expkey_pin_kill() has called cache_delete_entry(), but it doesn't know whether
>> the last reference has been put (i.e. whether expkey_put() has been called).
>>
>> If a third user grabs a reference before the cache entry is deleted from
>> the cache, that third user will do the last put by calling expkey_put();
>> xxx_pin_kill() then only decreases the reference count.
> 
> expkey_pin_kill() should call:
>   cache_delete_entry()
>   pin_kill()
>   expkey_destroy()
> 
> The "cache_delete_entry()" call removes the only long-term reference.
> Any other reference will be transient so it is safe to wait for those.
> 
> The 'pin_kill()' call will wait for pin_remove() to be called (it
> already does that).

Sorry, I missed the call to pin_kill() here.

> pin_remove() will be called when the last reference is dropped.  As
> described above, that pin_remove call will return -1 and so the 'put'
> function will not have called expkey_destroy.
> 
> Finally the expkey_destroy() function actually frees the data
> structure.  No other code can be touching it at this point.

Calling pin_kill() again in expkey_pin_kill() makes everything clear now.
Thanks again.

The only remaining step is waiting for Al's opinion.

thanks,
Kinglong Mee

> 
> Thanks,
> NeilBrown
> 
> 
>>
>> thanks,
>> Kinglong Mee
>>
>>>
>>> This makes the 'wait' infrastructure of fs_pin available to any
>>> pinning client which wants to use it.
>>>
>>> Signed-Off-By: NeilBrown <neilb-IBi9RG/b67k@public.gmane.org>
>>>
>>> ---
>>> Hi Al,
>>>  do you see this as a workable solution?  I think it will improve the nfsd pinning patch
>>> a lot.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>>
>>> diff --git a/fs/fs_pin.c b/fs/fs_pin.c
>>> index 611b5408f6ec..b7954a9d17da 100644
>>> --- a/fs/fs_pin.c
>>> +++ b/fs/fs_pin.c
>>> @@ -6,16 +6,32 @@
>>>  
>>>  static DEFINE_SPINLOCK(pin_lock);
>>>  
>>> -void pin_remove(struct fs_pin *pin)
>>> +/**
>>> + * pin_remove - disconnect an fs_pin from the pinned structure.
>>> + * @pin:	The struct fs_pin which is pinning something.
>>> + *
>>> + * Detach a 'pin' which was added by pin_insert().  A return value
>>> + * of -1 implies that pin_kill() has already been called and that the
>>> + * ->kill() function now owns the data structure containing @pin.
>>> + * The function which called pin_remove() must not touch the data structure
>>> + * again (unless it is the ->kill() function itself).
>>> + * A return value of 0 implies an uneventful disconnect: pin_kill() has not called,
>>> + * and will not call, the ->kill() function on this @pin.
>>> + * Any other return value is a usage error - e.g. repeated call to pin_remove().
>>> + */
>>> +int pin_remove(struct fs_pin *pin)
>>>  {
>>> +	int ret;
>>>  	spin_lock(&pin_lock);
>>>  	hlist_del_init(&pin->m_list);
>>>  	hlist_del_init(&pin->s_list);
>>>  	spin_unlock(&pin_lock);
>>>  	spin_lock_irq(&pin->wait.lock);
>>> +	ret = pin->done;
>>>  	pin->done = 1;
>>>  	wake_up_locked(&pin->wait);
>>>  	spin_unlock_irq(&pin->wait.lock);
>>> +	return ret;
>>>  }
>>>  
>>>  void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
>>> diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
>>> index 3886b3bffd7f..2fe9d3ba09e8 100644
>>> --- a/include/linux/fs_pin.h
>>> +++ b/include/linux/fs_pin.h
>>> @@ -18,7 +18,7 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
>>>  	p->kill = kill;
>>>  }
>>>  
>>> -void pin_remove(struct fs_pin *);
>>> +int pin_remove(struct fs_pin *);
>>>  void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
>>>  void pin_insert(struct fs_pin *, struct vfsmount *);
>>>  void pin_kill(struct fs_pin *);
>>>
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2015-08-18  6:37 UTC | newest]

Thread overview: 49+ messages
2015-07-27  3:05 [PATCH 0/9 v8] NFSD: Pin to vfsmount for nfsd exports cache Kinglong Mee
     [not found] ` <55B5A012.1030006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-07-27  3:06   ` [PATCH 1/9 v8] fs_pin: Initialize value for fs_pin explicitly Kinglong Mee
2015-07-27  3:06     ` Kinglong Mee
2015-07-29  0:25     ` NeilBrown
2015-07-29 19:41       ` J. Bruce Fields
     [not found]         ` <20150729194155.GC21949-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-07-29 21:48           ` NeilBrown
2015-07-29 21:48             ` NeilBrown
2015-07-30  0:36             ` J. Bruce Fields
2015-07-30 12:28               ` Kinglong Mee
2015-07-27  3:07   ` [PATCH 2/9 v8] fs_pin: Export functions for specific filesystem Kinglong Mee
2015-07-27  3:07     ` Kinglong Mee
2015-07-29  0:30     ` NeilBrown
2015-07-30 12:31       ` Kinglong Mee
2015-07-27  3:07   ` [PATCH 3/9 v8] path: New helpers path_get_pin/path_put_unpin for path pin Kinglong Mee
2015-07-27  3:07     ` Kinglong Mee
2015-07-27  3:09   ` [PATCH 6/9 v8] sunrpc/nfsd: Remove redundant code by exports seq_operations functions Kinglong Mee
2015-07-27  3:09     ` Kinglong Mee
2015-07-29  2:15     ` NeilBrown
2015-07-27  3:12   ` [PATCH 9/9 v8] nfsd: Allows user un-mounting filesystem where nfsd exports base on Kinglong Mee
2015-07-27  3:12     ` Kinglong Mee
     [not found]     ` <55B5A186.7040004-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-07-29  3:56       ` NeilBrown
2015-07-29  3:56         ` NeilBrown
2015-07-30 13:30         ` Kinglong Mee
2015-07-30 13:30           ` Kinglong Mee
2015-07-29  3:59       ` [PATCH] fs-pin: allow pin_remove() to be called other than from ->kill() NeilBrown
2015-07-29  3:59         ` NeilBrown
2015-08-10 11:37         ` Kinglong Mee
2015-08-18  6:07         ` Kinglong Mee
2015-08-18  6:07           ` Kinglong Mee
2015-08-18  6:21           ` NeilBrown
2015-08-18  6:37             ` Kinglong Mee
2015-08-18  6:37               ` Kinglong Mee
2015-07-27  3:08 ` [PATCH 4/9 v8] fs: New helper legitimize_mntget() for getting a legitimize mnt Kinglong Mee
     [not found]   ` <55B5A0B0.7060604-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-07-29  2:06     ` NeilBrown
2015-07-29  2:06       ` NeilBrown
2015-07-30 13:17       ` Kinglong Mee
2015-07-27  3:09 ` [PATCH 5/9 v8] sunrpc: Store cache_detail in seq_file's private directly Kinglong Mee
2015-07-29  2:11   ` NeilBrown
2015-07-27  3:10 ` [PATCH 7/9 v8] sunrpc: Switch to using hash list instead single list Kinglong Mee
2015-07-29  2:19   ` NeilBrown
2015-07-29 19:51     ` J. Bruce Fields
2015-07-29 19:51       ` J. Bruce Fields
     [not found]       ` <20150729195151.GD21949-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-07-30 13:01         ` Kinglong Mee
2015-07-30 13:01           ` Kinglong Mee
2015-07-27  3:10 ` [PATCH 8/9 v8] sunrpc: New helper cache_delete_entry for deleting cache_head directly Kinglong Mee
     [not found]   ` <55B5A135.9050800-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-07-29  2:29     ` NeilBrown
2015-07-29  2:29       ` NeilBrown
2015-07-30 13:14       ` Kinglong Mee
2015-07-30 13:14         ` Kinglong Mee
