linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC v2 0/4] 9p: fix open-unlink-f*syscall bug.
@ 2020-09-23 14:11 Jianyong Wu
  2020-09-23 14:11 ` [PATCH v2 1/4] fs/9p: fix create-unlink-getattr idiom Jianyong Wu
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Jianyong Wu @ 2020-09-23 14:11 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, qemu_oss
  Cc: groug, v9fs-developer, linux-kernel, justin.he, jianyong.wu

open-unlink-f*syscall bug is well-known in 9p. We try to fix the bug
in this patch set.
I take Eric's and Greg's patches which constiute the 1/4 - 3/4 of this patch
set as the main frame of the solution. In patch 4/4, I fix the fid race issue
exists in Greg's patch.

change log:
v1 to v2:
        (1) in patch 4/4: do fid refcounter down in the clunk helper.
        (2) int patch 4/4: remove the enum value denoting from which of the
inode or dentry fids are allcated.

Eric Van Hensbergen (1):
  fs/9p: fix create-unlink-getattr idiom

Greg Kurz (1):
  fs/9p: search open fids first

Jianyong Wu (2):
  fs/9p: track open fids
  9p: fix race issue in fid contention.

 fs/9p/fid.c             | 69 ++++++++++++++++++++++++++++++++++++++---
 fs/9p/fid.h             | 11 ++++++-
 fs/9p/vfs_dentry.c      |  2 ++
 fs/9p/vfs_dir.c         |  6 +++-
 fs/9p/vfs_file.c        |  1 +
 fs/9p/vfs_inode.c       | 47 ++++++++++++++++++++++------
 fs/9p/vfs_inode_dotl.c  | 35 +++++++++++++++++----
 fs/9p/vfs_super.c       |  1 +
 fs/9p/xattr.c           | 16 ++++++++--
 include/net/9p/client.h |  7 +++++
 net/9p/client.c         | 14 ++++++---
 11 files changed, 179 insertions(+), 30 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/4] fs/9p: fix create-unlink-getattr idiom
  2020-09-23 14:11 [PATCH RFC v2 0/4] 9p: fix open-unlink-f*syscall bug Jianyong Wu
@ 2020-09-23 14:11 ` Jianyong Wu
  2020-09-23 14:11 ` [PATCH v2 2/4] fs/9p: track open fids Jianyong Wu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Jianyong Wu @ 2020-09-23 14:11 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, qemu_oss
  Cc: groug, v9fs-developer, linux-kernel, justin.he, jianyong.wu, Greg Kurz

From: Eric Van Hensbergen <ericvh@gmail.com>

Fixes several outstanding bug reports of not being able to getattr from an
open file after an unlink.  This patch cleans up transient fids on an unlink
and will search open fids on a client if it detects a dentry that appears to
have been unlinked.  This search is necessary because fstat does not pass fd
information through the VFS API to the filesystem, only the dentry which for
9p has an imperfect match to fids.

Inherent in this patch is also a fix for the qid handling on create/open
which apparently wasn't being set correctly and was necessary for the search
to succeed.

A possible optimization over this fix is to include accounting of open fids
with the inode in the private data (in a similar fashion to the way we track
transient fids with dentries).  This would allow a much quicker search for
a matching open fid.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
(changed v9fs_fid_find_global to v9fs_fid_find_inode in comment)
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>

---
 fs/9p/fid.c       | 30 ++++++++++++++++++++++++++++++
 fs/9p/vfs_inode.c |  4 ++++
 net/9p/client.c   |  5 ++++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 3d681a2c2731..3304984c0fad 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -38,6 +38,33 @@ void v9fs_fid_add(struct dentry *dentry, struct p9_fid *fid)
 	spin_unlock(&dentry->d_lock);
 }
 
+/**
+ * v9fs_fid_find_inode - search for a fid off of the client list
+ * @inode: return a fid pointing to a specific inode
+ * @uid: return a fid belonging to the specified user
+ *
+ */
+
+static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
+{
+	struct p9_client *clnt = v9fs_inode2v9ses(inode)->clnt;
+	struct p9_fid *fid, *fidptr, *ret = NULL;
+	unsigned long flags;
+
+	p9_debug(P9_DEBUG_VFS, " inode: %p\n", inode);
+
+	spin_lock_irqsave(&clnt->lock, flags);
+	list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) {
+		if (uid_eq(fid->uid, uid) &&
+		   (inode->i_ino == v9fs_qid2ino(&fid->qid))) {
+			ret = fid;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&clnt->lock, flags);
+	return ret;
+}
+
 /**
  * v9fs_fid_find - retrieve a fid that belongs to the specified uid
  * @dentry: dentry to look for fid in
@@ -65,6 +92,9 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
 			}
 		}
 		spin_unlock(&dentry->d_lock);
+	} else {
+		if (dentry->d_inode)
+			ret = v9fs_fid_find_inode(dentry->d_inode, uid);
 	}
 
 	return ret;
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index ae0c38ad1fcb..31c2fddabb82 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -570,6 +570,10 @@ static int v9fs_remove(struct inode *dir, struct dentry *dentry, int flags)
 
 		v9fs_invalidate_inode_attr(inode);
 		v9fs_invalidate_inode_attr(dir);
+
+		/* invalidate all fids associated with dentry */
+		/* NOTE: This will not include open fids */
+		dentry->d_op->d_release(dentry);
 	}
 	return retval;
 }
diff --git a/net/9p/client.c b/net/9p/client.c
index 09f1ec589b80..1a3f72bf45fc 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1219,7 +1219,7 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
 	if (nwname)
 		memmove(&fid->qid, &wqids[nwqids - 1], sizeof(struct p9_qid));
 	else
-		fid->qid = oldfid->qid;
+		memmove(&fid->qid, &oldfid->qid, sizeof(struct p9_qid));
 
 	kfree(wqids);
 	return fid;
@@ -1272,6 +1272,7 @@ int p9_client_open(struct p9_fid *fid, int mode)
 		p9_is_proto_dotl(clnt) ? "RLOPEN" : "ROPEN",  qid.type,
 		(unsigned long long)qid.path, qid.version, iounit);
 
+	memmove(&fid->qid, &qid, sizeof(struct p9_qid));
 	fid->mode = mode;
 	fid->iounit = iounit;
 
@@ -1317,6 +1318,7 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
 			(unsigned long long)qid->path,
 			qid->version, iounit);
 
+	memmove(&ofid->qid, qid, sizeof(struct p9_qid));
 	ofid->mode = mode;
 	ofid->iounit = iounit;
 
@@ -1362,6 +1364,7 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
 				(unsigned long long)qid.path,
 				qid.version, iounit);
 
+	memmove(&fid->qid, &qid, sizeof(struct p9_qid));
 	fid->mode = mode;
 	fid->iounit = iounit;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/4] fs/9p: track open fids
  2020-09-23 14:11 [PATCH RFC v2 0/4] 9p: fix open-unlink-f*syscall bug Jianyong Wu
  2020-09-23 14:11 ` [PATCH v2 1/4] fs/9p: fix create-unlink-getattr idiom Jianyong Wu
@ 2020-09-23 14:11 ` Jianyong Wu
  2020-09-23 14:11 ` [PATCH v2 3/4] fs/9p: search open fids first Jianyong Wu
  2020-09-23 14:11 ` [PATCH RFC v2 4/4] 9p: fix race issue in fid contention Jianyong Wu
  3 siblings, 0 replies; 18+ messages in thread
From: Jianyong Wu @ 2020-09-23 14:11 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, qemu_oss
  Cc: groug, v9fs-developer, linux-kernel, justin.he, jianyong.wu, Greg Kurz

From: Greg Kurz <gkurz@linux.vnet.ibm.com>

This patch adds accounting of open fids in a list hanging off the i_private
field of the corresponding inode. This allows faster lookups compared to
searching the full 9p client list.

The lookup code is modified accordingly.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 fs/9p/fid.c             | 32 +++++++++++++++++++++++---------
 fs/9p/fid.h             |  1 +
 fs/9p/vfs_dir.c         |  3 +++
 fs/9p/vfs_file.c        |  1 +
 fs/9p/vfs_inode.c       |  6 +++++-
 fs/9p/vfs_inode_dotl.c  |  1 +
 include/net/9p/client.h |  1 +
 7 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 3304984c0fad..d11dd430590d 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -39,7 +39,7 @@ void v9fs_fid_add(struct dentry *dentry, struct p9_fid *fid)
 }
 
 /**
- * v9fs_fid_find_inode - search for a fid off of the client list
+ * v9fs_fid_find_inode - search for an open fid off of the inode list
  * @inode: return a fid pointing to a specific inode
  * @uid: return a fid belonging to the specified user
  *
@@ -47,24 +47,38 @@ void v9fs_fid_add(struct dentry *dentry, struct p9_fid *fid)
 
 static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
 {
-	struct p9_client *clnt = v9fs_inode2v9ses(inode)->clnt;
-	struct p9_fid *fid, *fidptr, *ret = NULL;
-	unsigned long flags;
+	struct hlist_head *h;
+	struct p9_fid *fid, *ret = NULL;
 
 	p9_debug(P9_DEBUG_VFS, " inode: %p\n", inode);
 
-	spin_lock_irqsave(&clnt->lock, flags);
-	list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) {
-		if (uid_eq(fid->uid, uid) &&
-		   (inode->i_ino == v9fs_qid2ino(&fid->qid))) {
+	spin_lock(&inode->i_lock);
+	h = (struct hlist_head *)&inode->i_private;
+	hlist_for_each_entry(fid, h, ilist) {
+		if (uid_eq(fid->uid, uid)) {
 			ret = fid;
 			break;
 		}
 	}
-	spin_unlock_irqrestore(&clnt->lock, flags);
+	spin_unlock(&inode->i_lock);
 	return ret;
 }
 
+/**
+ * v9fs_open_fid_add - add an open fid to an inode
+ * @dentry: inode that the fid is being added to
+ * @fid: fid to add
+ *
+ */
+
+void v9fs_open_fid_add(struct inode *inode, struct p9_fid *fid)
+{
+	spin_lock(&inode->i_lock);
+	hlist_add_head(&fid->ilist, (struct hlist_head *)&inode->i_private);
+	spin_unlock(&inode->i_lock);
+}
+
+
 /**
  * v9fs_fid_find - retrieve a fid that belongs to the specified uid
  * @dentry: dentry to look for fid in
diff --git a/fs/9p/fid.h b/fs/9p/fid.h
index 928b1093f511..dfa11df02818 100644
--- a/fs/9p/fid.h
+++ b/fs/9p/fid.h
@@ -15,6 +15,7 @@ static inline struct p9_fid *v9fs_parent_fid(struct dentry *dentry)
 }
 void v9fs_fid_add(struct dentry *dentry, struct p9_fid *fid);
 struct p9_fid *v9fs_writeback_fid(struct dentry *dentry);
+void v9fs_open_fid_add(struct inode *inode, struct p9_fid *fid);
 static inline struct p9_fid *clone_fid(struct p9_fid *fid)
 {
 	return IS_ERR(fid) ? fid :  p9_client_walk(fid, 0, NULL, 1);
diff --git a/fs/9p/vfs_dir.c b/fs/9p/vfs_dir.c
index 674d22bf4f6f..d82d8a346f86 100644
--- a/fs/9p/vfs_dir.c
+++ b/fs/9p/vfs_dir.c
@@ -210,6 +210,9 @@ int v9fs_dir_release(struct inode *inode, struct file *filp)
 	fid = filp->private_data;
 	p9_debug(P9_DEBUG_VFS, "inode: %p filp: %p fid: %d\n",
 		 inode, filp, fid ? fid->fid : -1);
+	spin_lock(&inode->i_lock);
+	hlist_del(&fid->ilist);
+	spin_unlock(&inode->i_lock);
 	if (fid)
 		p9_client_clunk(fid);
 	return 0;
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 92cd1d80218d..b42cc1752cd1 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -96,6 +96,7 @@ int v9fs_file_open(struct inode *inode, struct file *file)
 	mutex_unlock(&v9inode->v_mutex);
 	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 		v9fs_cache_inode_set_cookie(inode, file);
+	v9fs_open_fid_add(inode, fid);
 	return 0;
 out_error:
 	p9_client_clunk(file->private_data);
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 31c2fddabb82..6b243ffcbcf0 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -256,6 +256,7 @@ int v9fs_init_inode(struct v9fs_session_info *v9ses,
 	inode->i_rdev = rdev;
 	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 	inode->i_mapping->a_ops = &v9fs_addr_operations;
+	inode->i_private = NULL;
 
 	switch (mode & S_IFMT) {
 	case S_IFIFO:
@@ -796,6 +797,7 @@ v9fs_vfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	struct v9fs_session_info *v9ses;
 	struct p9_fid *fid, *inode_fid;
 	struct dentry *res = NULL;
+	struct inode *inode;
 
 	if (d_in_lookup(dentry)) {
 		res = v9fs_vfs_lookup(dir, dentry, 0);
@@ -824,7 +826,8 @@ v9fs_vfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 
 	v9fs_invalidate_inode_attr(dir);
-	v9inode = V9FS_I(d_inode(dentry));
+	inode = d_inode(dentry);
+	v9inode = V9FS_I(inode);
 	mutex_lock(&v9inode->v_mutex);
 	if ((v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) &&
 	    !v9inode->writeback_fid &&
@@ -852,6 +855,7 @@ v9fs_vfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	file->private_data = fid;
 	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 		v9fs_cache_inode_set_cookie(d_inode(dentry), file);
+	v9fs_open_fid_add(inode, fid);
 
 	file->f_mode |= FMODE_CREATED;
 out:
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 0028eccb665a..08f2e089fb0e 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -342,6 +342,7 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
 	file->private_data = ofid;
 	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
 		v9fs_cache_inode_set_cookie(inode, file);
+	v9fs_open_fid_add(inode, ofid);
 	file->f_mode |= FMODE_CREATED;
 out:
 	v9fs_put_acl(dacl, pacl);
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index dd5b5bd781a4..ce7882da8e86 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -152,6 +152,7 @@ struct p9_fid {
 	void *rdir;
 
 	struct hlist_node dlist;	/* list of all fids attached to a dentry */
+	struct hlist_node ilist;
 };
 
 /**
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/4] fs/9p: search open fids first
  2020-09-23 14:11 [PATCH RFC v2 0/4] 9p: fix open-unlink-f*syscall bug Jianyong Wu
  2020-09-23 14:11 ` [PATCH v2 1/4] fs/9p: fix create-unlink-getattr idiom Jianyong Wu
  2020-09-23 14:11 ` [PATCH v2 2/4] fs/9p: track open fids Jianyong Wu
@ 2020-09-23 14:11 ` Jianyong Wu
  2020-09-23 14:11 ` [PATCH RFC v2 4/4] 9p: fix race issue in fid contention Jianyong Wu
  3 siblings, 0 replies; 18+ messages in thread
From: Jianyong Wu @ 2020-09-23 14:11 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, qemu_oss
  Cc: groug, v9fs-developer, linux-kernel, justin.he, jianyong.wu

From: Greg Kurz <groug@kaod.org>

A previous patch fixed the "create-unlink-getattr" idiom: if getattr is
called on an unlinked file, we try to find an open fid attached to the
corresponding inode.

We have a similar issue with file permissions and setattr:

open("./test.txt", O_RDWR|O_CREAT, 0666) = 4
chmod("./test.txt", 0)                  = 0
truncate("./test.txt", 0)               = -1 EACCES (Permission denied)
ftruncate(4, 0)                         = -1 EACCES (Permission denied)

The failure is expected with truncate() but not with ftruncate().

This happens because the lookup code does find a matching fid in the
dentry list. Unfortunately, this is not an open fid and the server
will be forced to rely on the path name, rather than on an open file
descriptor. This is the case in QEMU: the setattr operation will use
truncate() and fail because of bad write permissions.

This patch changes the logic in the lookup code, so that we consider
open fids first. It gives a chance to the server to match this open
fid to an open file descriptor and use ftruncate() instead of truncate().
This does not change the current behaviour for truncate() and other
path name based syscalls, since file permissions are checked earlier
in the VFS layer.

With this patch, we get:

open("./test.txt", O_RDWR|O_CREAT, 0666) = 4
chmod("./test.txt", 0)                  = 0
truncate("./test.txt", 0)               = -1 EACCES (Permission denied)
ftruncate(4, 0)                         = 0

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 fs/9p/fid.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index d11dd430590d..0b23b0fe6c51 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -95,8 +95,12 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
 		 dentry, dentry, from_kuid(&init_user_ns, uid),
 		 any);
 	ret = NULL;
+
+	if (d_inode(dentry))
+		ret = v9fs_fid_find_inode(d_inode(dentry), uid);
+
 	/* we'll recheck under lock if there's anything to look in */
-	if (dentry->d_fsdata) {
+	if (!ret && dentry->d_fsdata) {
 		struct hlist_head *h = (struct hlist_head *)&dentry->d_fsdata;
 		spin_lock(&dentry->d_lock);
 		hlist_for_each_entry(fid, h, dlist) {
@@ -106,9 +110,6 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
 			}
 		}
 		spin_unlock(&dentry->d_lock);
-	} else {
-		if (dentry->d_inode)
-			ret = v9fs_fid_find_inode(dentry->d_inode, uid);
 	}
 
 	return ret;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-09-23 14:11 [PATCH RFC v2 0/4] 9p: fix open-unlink-f*syscall bug Jianyong Wu
                   ` (2 preceding siblings ...)
  2020-09-23 14:11 ` [PATCH v2 3/4] fs/9p: search open fids first Jianyong Wu
@ 2020-09-23 14:11 ` Jianyong Wu
  2020-09-23 14:49   ` Dominique Martinet
  2020-11-03 10:41   ` Dominique Martinet
  3 siblings, 2 replies; 18+ messages in thread
From: Jianyong Wu @ 2020-09-23 14:11 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, qemu_oss
  Cc: groug, v9fs-developer, linux-kernel, justin.he, jianyong.wu

Eric's and Greg's patch offer a mechanism to fix open-unlink-f*syscall
bug in 9p. But there is race issue in fid comtention.
As Greg's patch stores all of fids from opened files into according inode,
so all the lookup fid ops can retrieve fid from inode preferentially. But
there is no mechanism to handle the fid comtention issue. For example,
there are two threads get the same fid in the same time and one of them
clunk the fid before the other thread ready to discard the fid. In this
scenario, it will lead to some fatal problems, even kernel core dump.

I introduce a mechanism to fix this race issue. A counter field introduced
into p9_fid struct to store the reference counter to the fid. When a fid
is allocated from the inode or dentry, the counter will increase, and
will decrease at the end of its occupation. It is guaranteed that the
fid won't be clunked before the reference counter go down to 0, then
we can avoid the clunked fid to be used.

tests:
race issue test from the old test case:
for file in {01..50}; do touch f.${file}; done
seq 1 1000 | xargs -n 1 -P 50 -I{} cat f.* > /dev/null

open-unlink-f*syscall test:
I have tested for f*syscall include: ftruncate fstat fchown fchmod faccessat.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 fs/9p/fid.c             | 22 ++++++++++++++++++----
 fs/9p/fid.h             | 10 +++++++++-
 fs/9p/vfs_dentry.c      |  2 ++
 fs/9p/vfs_dir.c         |  9 +++++----
 fs/9p/vfs_inode.c       | 37 +++++++++++++++++++++++++++++--------
 fs/9p/vfs_inode_dotl.c  | 34 ++++++++++++++++++++++++++++------
 fs/9p/vfs_super.c       |  1 +
 fs/9p/xattr.c           | 16 +++++++++++++---
 include/net/9p/client.h |  6 ++++++
 net/9p/client.c         |  9 +++++----
 10 files changed, 116 insertions(+), 30 deletions(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 0b23b0fe6c51..89643dabcdae 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -28,6 +28,7 @@
 
 static inline void __add_fid(struct dentry *dentry, struct p9_fid *fid)
 {
+	atomic_set(&fid->count, 1);
 	hlist_add_head(&fid->dlist, (struct hlist_head *)&dentry->d_fsdata);
 }
 
@@ -60,6 +61,8 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
 			break;
 		}
 	}
+	if (ret && !IS_ERR(ret))
+		atomic_inc(&ret->count);
 	spin_unlock(&inode->i_lock);
 	return ret;
 }
@@ -74,6 +77,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
 void v9fs_open_fid_add(struct inode *inode, struct p9_fid *fid)
 {
 	spin_lock(&inode->i_lock);
+	atomic_set(&fid->count, 1);
 	hlist_add_head(&fid->ilist, (struct hlist_head *)&inode->i_private);
 	spin_unlock(&inode->i_lock);
 }
@@ -106,6 +110,7 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
 		hlist_for_each_entry(fid, h, dlist) {
 			if (any || uid_eq(fid->uid, uid)) {
 				ret = fid;
+				atomic_inc(&ret->count);
 				break;
 			}
 		}
@@ -167,7 +172,10 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
 	fid = v9fs_fid_find(ds, uid, any);
 	if (fid) {
 		/* Found the parent fid do a lookup with that */
-		fid = p9_client_walk(fid, 1, &dentry->d_name.name, 1);
+		struct p9_fid *ofid = fid;
+
+		fid = p9_client_walk(ofid, 1, &dentry->d_name.name, 1);
+		p9_client_clunk(ofid);
 		goto fid_out;
 	}
 	up_read(&v9ses->rename_sem);
@@ -192,8 +200,10 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
 		v9fs_fid_add(dentry->d_sb->s_root, fid);
 	}
 	/* If we are root ourself just return that */
-	if (dentry->d_sb->s_root == dentry)
+	if (dentry->d_sb->s_root == dentry) {
+		atomic_inc(&fid->count);
 		return fid;
+	}
 	/*
 	 * Do a multipath walk with attached root.
 	 * When walking parent we need to make sure we
@@ -240,6 +250,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
 			fid = ERR_PTR(-ENOENT);
 		} else {
 			__add_fid(dentry, fid);
+			atomic_inc(&fid->count);
 			spin_unlock(&dentry->d_lock);
 		}
 	}
@@ -290,11 +301,14 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 struct p9_fid *v9fs_writeback_fid(struct dentry *dentry)
 {
 	int err;
-	struct p9_fid *fid;
+	struct p9_fid *fid, *ofid;
 
-	fid = clone_fid(v9fs_fid_lookup_with_uid(dentry, GLOBAL_ROOT_UID, 0));
+	ofid = v9fs_fid_lookup_with_uid(dentry, GLOBAL_ROOT_UID, 0);
+	if (ofid && !IS_ERR(ofid))
+		fid = clone_fid(ofid);
 	if (IS_ERR(fid))
 		goto error_out;
+	p9_client_clunk(ofid);
 	/*
 	 * writeback fid will only be used to write back the
 	 * dirty pages. We always request for the open fid in read-write
diff --git a/fs/9p/fid.h b/fs/9p/fid.h
index dfa11df02818..1fed96546728 100644
--- a/fs/9p/fid.h
+++ b/fs/9p/fid.h
@@ -22,6 +22,14 @@ static inline struct p9_fid *clone_fid(struct p9_fid *fid)
 }
 static inline struct p9_fid *v9fs_fid_clone(struct dentry *dentry)
 {
-	return clone_fid(v9fs_fid_lookup(dentry));
+	struct p9_fid *fid, *nfid;
+
+	fid = v9fs_fid_lookup(dentry);
+	if (!fid || IS_ERR(fid))
+		return fid;
+
+	nfid = p9_client_walk(fid, 0, NULL, 1);
+	p9_client_clunk(fid);
+	return nfid;
 }
 #endif
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 7d6f69aefd45..4b4292123b3d 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -85,6 +85,8 @@ static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 			retval = v9fs_refresh_inode_dotl(fid, inode);
 		else
 			retval = v9fs_refresh_inode(fid, inode);
+		p9_client_clunk(fid);
+
 		if (retval == -ENOENT)
 			return 0;
 		if (retval < 0)
diff --git a/fs/9p/vfs_dir.c b/fs/9p/vfs_dir.c
index d82d8a346f86..b6a5a0be444d 100644
--- a/fs/9p/vfs_dir.c
+++ b/fs/9p/vfs_dir.c
@@ -210,11 +210,12 @@ int v9fs_dir_release(struct inode *inode, struct file *filp)
 	fid = filp->private_data;
 	p9_debug(P9_DEBUG_VFS, "inode: %p filp: %p fid: %d\n",
 		 inode, filp, fid ? fid->fid : -1);
-	spin_lock(&inode->i_lock);
-	hlist_del(&fid->ilist);
-	spin_unlock(&inode->i_lock);
-	if (fid)
+	if (fid) {
+		spin_lock(&inode->i_lock);
+		hlist_del(&fid->ilist);
+		spin_unlock(&inode->i_lock);
 		p9_client_clunk(fid);
+	}
 	return 0;
 }
 
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 6b243ffcbcf0..4a937fac1acb 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -551,6 +551,7 @@ static int v9fs_remove(struct inode *dir, struct dentry *dentry, int flags)
 	if (v9fs_proto_dotl(v9ses))
 		retval = p9_client_unlinkat(dfid, dentry->d_name.name,
 					    v9fs_at_to_dotl_flags(flags));
+	p9_client_clunk(dfid);
 	if (retval == -EOPNOTSUPP) {
 		/* Try the one based on path */
 		v9fid = v9fs_fid_clone(dentry);
@@ -595,14 +596,12 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
 {
 	int err;
 	const unsigned char *name;
-	struct p9_fid *dfid, *ofid, *fid;
+	struct p9_fid *dfid, *ofid = NULL, *fid = NULL;
 	struct inode *inode;
 
 	p9_debug(P9_DEBUG_VFS, "name %pd\n", dentry);
 
 	err = 0;
-	ofid = NULL;
-	fid = NULL;
 	name = dentry->d_name.name;
 	dfid = v9fs_parent_fid(dentry);
 	if (IS_ERR(dfid)) {
@@ -616,12 +615,14 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
 	if (IS_ERR(ofid)) {
 		err = PTR_ERR(ofid);
 		p9_debug(P9_DEBUG_VFS, "p9_client_walk failed %d\n", err);
+		p9_client_clunk(dfid);
 		return ERR_PTR(err);
 	}
 
 	err = p9_client_fcreate(ofid, name, perm, mode, extension);
 	if (err < 0) {
 		p9_debug(P9_DEBUG_VFS, "p9_client_fcreate failed %d\n", err);
+		p9_client_clunk(dfid);
 		goto error;
 	}
 
@@ -633,6 +634,7 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
 			p9_debug(P9_DEBUG_VFS,
 				   "p9_client_walk failed %d\n", err);
 			fid = NULL;
+			p9_client_clunk(dfid);
 			goto error;
 		}
 		/*
@@ -643,11 +645,13 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
 			err = PTR_ERR(inode);
 			p9_debug(P9_DEBUG_VFS,
 				   "inode creation failed %d\n", err);
+			p9_client_clunk(dfid);
 			goto error;
 		}
 		v9fs_fid_add(dentry, fid);
 		d_instantiate(dentry, inode);
 	}
+	p9_client_clunk(dfid);
 	return ofid;
 error:
 	if (ofid)
@@ -760,6 +764,7 @@ struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
 	 */
 	name = dentry->d_name.name;
 	fid = p9_client_walk(dfid, 1, &name, 1);
+	p9_client_clunk(dfid);
 	if (fid == ERR_PTR(-ENOENT))
 		inode = NULL;
 	else if (IS_ERR(fid))
@@ -910,7 +915,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	struct inode *old_inode;
 	struct inode *new_inode;
 	struct v9fs_session_info *v9ses;
-	struct p9_fid *oldfid;
+	struct p9_fid *oldfid, *dfid;
 	struct p9_fid *olddirfid;
 	struct p9_fid *newdirfid;
 	struct p9_wstat wstat;
@@ -927,13 +932,20 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (IS_ERR(oldfid))
 		return PTR_ERR(oldfid);
 
-	olddirfid = clone_fid(v9fs_parent_fid(old_dentry));
+	dfid = v9fs_parent_fid(old_dentry);
+	olddirfid = clone_fid(dfid);
+	if (dfid && !IS_ERR(dfid))
+		p9_client_clunk(dfid);
+
 	if (IS_ERR(olddirfid)) {
 		retval = PTR_ERR(olddirfid);
 		goto done;
 	}
 
-	newdirfid = clone_fid(v9fs_parent_fid(new_dentry));
+	dfid = v9fs_parent_fid(new_dentry);
+	newdirfid = clone_fid(dfid);
+	p9_client_clunk(dfid);
+
 	if (IS_ERR(newdirfid)) {
 		retval = PTR_ERR(newdirfid);
 		goto clunk_olddir;
@@ -990,6 +1002,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	p9_client_clunk(olddirfid);
 
 done:
+	p9_client_clunk(oldfid);
 	return retval;
 }
 
@@ -1022,6 +1035,7 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
 		return PTR_ERR(fid);
 
 	st = p9_client_stat(fid);
+	p9_client_clunk(fid);
 	if (IS_ERR(st))
 		return PTR_ERR(st);
 
@@ -1042,7 +1056,7 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
 
 static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
 {
-	int retval;
+	int retval, use_dentry = 0;
 	struct v9fs_session_info *v9ses;
 	struct p9_fid *fid = NULL;
 	struct p9_wstat wstat;
@@ -1058,8 +1072,10 @@ static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
 		fid = iattr->ia_file->private_data;
 		WARN_ON(!fid);
 	}
-	if (!fid)
+	if (!fid) {
 		fid = v9fs_fid_lookup(dentry);
+		use_dentry = 1;
+	}
 	if(IS_ERR(fid))
 		return PTR_ERR(fid);
 
@@ -1089,6 +1105,10 @@ static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
 		filemap_write_and_wait(d_inode(dentry)->i_mapping);
 
 	retval = p9_client_wstat(fid, &wstat);
+
+	if (use_dentry)
+		p9_client_clunk(fid);
+
 	if (retval < 0)
 		return retval;
 
@@ -1213,6 +1233,7 @@ static const char *v9fs_vfs_get_link(struct dentry *dentry,
 		return ERR_PTR(-EBADF);
 
 	st = p9_client_stat(fid);
+	p9_client_clunk(fid);
 	if (IS_ERR(st))
 		return ERR_CAST(st);
 
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 08f2e089fb0e..823c2eb5f1bf 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -296,6 +296,7 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
 
 	/* instantiate inode and assign the unopened fid to the dentry */
 	fid = p9_client_walk(dfid, 1, &name, 1);
+	p9_client_clunk(dfid);
 	if (IS_ERR(fid)) {
 		err = PTR_ERR(fid);
 		p9_debug(P9_DEBUG_VFS, "p9_client_walk failed %d\n", err);
@@ -408,7 +409,6 @@ static int v9fs_vfs_mkdir_dotl(struct inode *dir,
 	err = p9_client_mkdir_dotl(dfid, name, mode, gid, &qid);
 	if (err < 0)
 		goto error;
-
 	fid = p9_client_walk(dfid, 1, &name, 1);
 	if (IS_ERR(fid)) {
 		err = PTR_ERR(fid);
@@ -452,6 +452,7 @@ static int v9fs_vfs_mkdir_dotl(struct inode *dir,
 	if (fid)
 		p9_client_clunk(fid);
 	v9fs_put_acl(dacl, pacl);
+	p9_client_clunk(dfid);
 	return err;
 }
 
@@ -479,6 +480,7 @@ v9fs_vfs_getattr_dotl(const struct path *path, struct kstat *stat,
 	 */
 
 	st = p9_client_getattr_dotl(fid, P9_STATS_ALL);
+	p9_client_clunk(fid);
 	if (IS_ERR(st))
 		return PTR_ERR(st);
 
@@ -540,7 +542,7 @@ static int v9fs_mapped_iattr_valid(int iattr_valid)
 
 int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
 {
-	int retval;
+	int retval, use_dentry = 0;
 	struct p9_fid *fid = NULL;
 	struct p9_iattr_dotl p9attr;
 	struct inode *inode = d_inode(dentry);
@@ -565,8 +567,10 @@ int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
 		fid = iattr->ia_file->private_data;
 		WARN_ON(!fid);
 	}
-	if (!fid)
+	if (!fid) {
 		fid = v9fs_fid_lookup(dentry);
+		use_dentry = 1;
+	}
 	if (IS_ERR(fid))
 		return PTR_ERR(fid);
 
@@ -575,8 +579,11 @@ int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
 		filemap_write_and_wait(inode->i_mapping);
 
 	retval = p9_client_setattr(fid, &p9attr);
-	if (retval < 0)
+	if (retval < 0) {
+		if (use_dentry)
+			p9_client_clunk(fid);
 		return retval;
+	}
 
 	if ((iattr->ia_valid & ATTR_SIZE) &&
 	    iattr->ia_size != i_size_read(inode))
@@ -588,9 +595,15 @@ int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
 	if (iattr->ia_valid & ATTR_MODE) {
 		/* We also want to update ACL when we update mode bits */
 		retval = v9fs_acl_chmod(inode, fid);
-		if (retval < 0)
+		if (retval < 0) {
+			if (use_dentry)
+				p9_client_clunk(fid);
 			return retval;
+		}
 	}
+	if (use_dentry)
+		p9_client_clunk(fid);
+
 	return 0;
 }
 
@@ -742,6 +755,7 @@ v9fs_vfs_symlink_dotl(struct inode *dir, struct dentry *dentry,
 	if (fid)
 		p9_client_clunk(fid);
 
+	p9_client_clunk(dfid);
 	return err;
 }
 
@@ -770,11 +784,15 @@ v9fs_vfs_link_dotl(struct dentry *old_dentry, struct inode *dir,
 		return PTR_ERR(dfid);
 
 	oldfid = v9fs_fid_lookup(old_dentry);
-	if (IS_ERR(oldfid))
+	if (IS_ERR(oldfid)) {
+		p9_client_clunk(dfid);
 		return PTR_ERR(oldfid);
+	}
 
 	err = p9_client_link(dfid, oldfid, dentry->d_name.name);
 
+	p9_client_clunk(dfid);
+	p9_client_clunk(oldfid);
 	if (err < 0) {
 		p9_debug(P9_DEBUG_VFS, "p9_client_link failed %d\n", err);
 		return err;
@@ -789,6 +807,7 @@ v9fs_vfs_link_dotl(struct dentry *old_dentry, struct inode *dir,
 			return PTR_ERR(fid);
 
 		v9fs_refresh_inode_dotl(fid, d_inode(old_dentry));
+		p9_client_clunk(fid);
 	}
 	ihold(d_inode(old_dentry));
 	d_instantiate(dentry, d_inode(old_dentry));
@@ -887,6 +906,8 @@ v9fs_vfs_mknod_dotl(struct inode *dir, struct dentry *dentry, umode_t omode,
 	if (fid)
 		p9_client_clunk(fid);
 	v9fs_put_acl(dacl, pacl);
+	p9_client_clunk(dfid);
+
 	return err;
 }
 
@@ -915,6 +936,7 @@ v9fs_vfs_get_link_dotl(struct dentry *dentry,
 	if (IS_ERR(fid))
 		return ERR_CAST(fid);
 	retval = p9_client_readlink(fid, &target);
+	p9_client_clunk(fid);
 	if (retval)
 		return ERR_PTR(retval);
 	set_delayed_call(done, kfree_link, target);
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 74df32be4c6a..d54ddcfb69ed 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -267,6 +267,7 @@ static int v9fs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	}
 	res = simple_statfs(dentry, buf);
 done:
+	p9_client_clunk(fid);
 	return res;
 }
 
diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c
index ac8ff8ca4c11..87217dd0433e 100644
--- a/fs/9p/xattr.c
+++ b/fs/9p/xattr.c
@@ -71,14 +71,17 @@ ssize_t v9fs_xattr_get(struct dentry *dentry, const char *name,
 		       void *buffer, size_t buffer_size)
 {
 	struct p9_fid *fid;
+	int ret;
 
 	p9_debug(P9_DEBUG_VFS, "name = %s value_len = %zu\n",
 		 name, buffer_size);
 	fid = v9fs_fid_lookup(dentry);
 	if (IS_ERR(fid))
 		return PTR_ERR(fid);
+	ret = v9fs_fid_xattr_get(fid, name, buffer, buffer_size);
+	p9_client_clunk(fid);
 
-	return v9fs_fid_xattr_get(fid, name, buffer, buffer_size);
+	return ret;
 }
 
 /*
@@ -96,8 +99,15 @@ ssize_t v9fs_xattr_get(struct dentry *dentry, const char *name,
 int v9fs_xattr_set(struct dentry *dentry, const char *name,
 		   const void *value, size_t value_len, int flags)
 {
-	struct p9_fid *fid = v9fs_fid_lookup(dentry);
-	return v9fs_fid_xattr_set(fid, name, value, value_len, flags);
+	int ret;
+	struct p9_fid *fid;
+
+	fid  = v9fs_fid_lookup(dentry);
+	if (IS_ERR(fid))
+		return PTR_ERR(fid);
+	ret = v9fs_fid_xattr_set(fid, name, value, value_len, flags);
+	p9_client_clunk(fid);
+	return ret;
 }
 
 int v9fs_fid_xattr_set(struct p9_fid *fid, const char *name,
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index ce7882da8e86..58ed9bd306bd 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -140,10 +140,16 @@ struct p9_client {
  *
  * TODO: This needs lots of explanation.
  */
+enum fid_source {
+	FID_FROM_OTHER,
+	FID_FROM_INODE,
+	FID_FROM_DENTRY,
+};
 
 struct p9_fid {
 	struct p9_client *clnt;
 	u32 fid;
+	atomic_t count;
 	int mode;
 	struct p9_qid qid;
 	u32 iounit;
diff --git a/net/9p/client.c b/net/9p/client.c
index 1a3f72bf45fc..a6c8a915e0d8 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -901,6 +901,7 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
 	fid->clnt = clnt;
 	fid->rdir = NULL;
 	fid->fid = 0;
+	atomic_set(&fid->count, 1);
 
 	idr_preload(GFP_KERNEL);
 	spin_lock_irq(&clnt->lock);
@@ -908,7 +909,6 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
 			    GFP_NOWAIT);
 	spin_unlock_irq(&clnt->lock);
 	idr_preload_end();
-
 	if (!ret)
 		return fid;
 
@@ -1187,7 +1187,6 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
 
 	p9_debug(P9_DEBUG_9P, ">>> TWALK fids %d,%d nwname %ud wname[0] %s\n",
 		 oldfid->fid, fid->fid, nwname, wnames ? wnames[0] : NULL);
-
 	req = p9_client_rpc(clnt, P9_TWALK, "ddT", oldfid->fid, fid->fid,
 								nwname, wnames);
 	if (IS_ERR(req)) {
@@ -1461,12 +1460,14 @@ int p9_client_clunk(struct p9_fid *fid)
 	struct p9_req_t *req;
 	int retries = 0;
 
-	if (!fid) {
-		pr_warn("%s (%d): Trying to clunk with NULL fid\n",
+	if (!fid || IS_ERR(fid)) {
+		pr_warn("%s (%d): Trying to clunk with invalid fid\n",
 			__func__, task_pid_nr(current));
 		dump_stack();
 		return 0;
 	}
+	if (!atomic_dec_and_test(&fid->count))
+		return 0;
 
 again:
 	p9_debug(P9_DEBUG_9P, ">>> TCLUNK fid %d (try %d)\n", fid->fid,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-09-23 14:11 ` [PATCH RFC v2 4/4] 9p: fix race issue in fid contention Jianyong Wu
@ 2020-09-23 14:49   ` Dominique Martinet
  2020-09-24  8:38     ` Jianyong Wu
  2020-11-03 10:41   ` Dominique Martinet
  1 sibling, 1 reply; 18+ messages in thread
From: Dominique Martinet @ 2020-09-23 14:49 UTC (permalink / raw)
  To: Jianyong Wu
  Cc: ericvh, lucho, qemu_oss, groug, v9fs-developer, linux-kernel, justin.he


Overall looks good; a few comments.

Jianyong Wu wrote on Wed, Sep 23, 2020:
> open-unlink-f*syscall test:
> I have tested for f*syscall include: ftruncate fstat fchown fchmod faccessat.

Given the other thread, what did you test this with?
Since qemu doesn't work apparently do you have a in-house server at arm
I could test?
(I'll try with ganesha otherwise, it keeps files open so it should work
I think...)

> +	atomic_set(&fid->count, 1);

I kind of like the refcount API becauese it has some extra overflow
checks; but it requires a bit more work around clunk (instead of bailing
out early if counter hits 0, you need to have it call a separate
function in case it does)

That's mostly esthetics though I'm not going to fuss over that.

> @@ -74,6 +77,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
>  void v9fs_open_fid_add(struct inode *inode, struct p9_fid *fid)
>  {
>  	spin_lock(&inode->i_lock);
> +	atomic_set(&fid->count, 1);

Hm, that should be done at fid creation time in net/9p/client.c
p9_fid_create ; no ?
(you do it there already, I don't see what reseting count here brings
except confusion)

> diff --git a/fs/9p/fid.h b/fs/9p/fid.h
> index dfa11df02818..1fed96546728 100644
> --- a/fs/9p/fid.h
> +++ b/fs/9p/fid.h
> @@ -22,6 +22,14 @@ static inline struct p9_fid *clone_fid(struct p9_fid *fid)
>  }
>  static inline struct p9_fid *v9fs_fid_clone(struct dentry *dentry)
>  {
> -	return clone_fid(v9fs_fid_lookup(dentry));
> +	struct p9_fid *fid, *nfid;
> +
> +	fid = v9fs_fid_lookup(dentry);
> +	if (!fid || IS_ERR(fid))
> +		return fid;
> +
> +	nfid = p9_client_walk(fid, 0, NULL, 1);

I think you clone_fid() here is slightly easier to understand; everyone
doesn't know that a walk with no component is a clone.
The compiler will optimize that IS_ERR(fid) is checked twice, it's fine.

> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index ce7882da8e86..58ed9bd306bd 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -140,10 +140,16 @@ struct p9_client {
>   *
>   * TODO: This needs lots of explanation.
>   */
> +enum fid_source {
> +	FID_FROM_OTHER,
> +	FID_FROM_INODE,
> +	FID_FROM_DENTRY,
> +};

leftovers from previous iteration.


Overall looks good to me.
I'd need to spend some time checking the actual counting part &
hammering the fs a bit then confirming no fid got forgotten (there's a
pr_info at umount time) but I'm happy with this ; thanks!

-- 
Dominique





^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-09-23 14:49   ` Dominique Martinet
@ 2020-09-24  8:38     ` Jianyong Wu
  2020-09-24  8:56       ` Greg Kurz
  2020-09-24  9:51       ` Dominique Martinet
  0 siblings, 2 replies; 18+ messages in thread
From: Jianyong Wu @ 2020-09-24  8:38 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: ericvh, lucho, qemu_oss, groug, v9fs-developer, linux-kernel, Justin He

Hi Dominique,

> -----Original Message-----
> From: Dominique Martinet <asmadeus@codewreck.org>
> Sent: Wednesday, September 23, 2020 10:50 PM
> To: Jianyong Wu <Jianyong.Wu@arm.com>
> Cc: ericvh@gmail.com; lucho@ionkov.net; qemu_oss@crudebyte.com;
> groug@kaod.org; v9fs-developer@lists.sourceforge.net; linux-
> kernel@vger.kernel.org; Justin He <Justin.He@arm.com>
> Subject: Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
>
>
> Overall looks good; a few comments.

Thanks
>
> Jianyong Wu wrote on Wed, Sep 23, 2020:
> > open-unlink-f*syscall test:
> > I have tested for f*syscall include: ftruncate fstat fchown fchmod faccessat.
>
> Given the other thread, what did you test this with?
Er, I just use Greg's qemu of https://github.com/gkurz/qemu.git: 9p-attr-fixes. I should have referenced it in commit message.

> Since qemu doesn't work apparently do you have a in-house server at arm I
> could test?
> (I'll try with ganesha otherwise, it keeps files open so it should work I think...)
>
Yeah, I test this on my arm server. But as these codes are arch-free, we can test it in whatever a machine.
(also the server in arm can't be accessed by outer space, so sorry)
But I think that this test are far from serious and complete.

> > +atomic_set(&fid->count, 1);
>
> I kind of like the refcount API becauese it has some extra overflow checks;
> but it requires a bit more work around clunk (instead of bailing out early if
> counter hits 0, you need to have it call a separate function in case it does)
>
> That's mostly esthetics though I'm not going to fuss over that.
>
Sorry, I'm not sure what's the context of this line, does this line lie in "__add_fid”. I'm not sure about
why it do harm to clunk?

> > @@ -74,6 +77,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct
> > inode *inode, kuid_t uid)  void v9fs_open_fid_add(struct inode *inode,
> > struct p9_fid *fid)  {
> >  spin_lock(&inode->i_lock);
> > +atomic_set(&fid->count, 1);
>
> Hm, that should be done at fid creation time in net/9p/client.c p9_fid_create ;
> no ?
> (you do it there already, I don't see what reseting count here brings except
> confusion)
>
I put this counter set op before the fids are added to hlist. So I can make sure the counter value is 1 before
fids are used. It's redundant code. I can remove it in both "__add_fid" and "v9fs_open_fid_add", but we must take care of it that
no clunk is called between fids are created and added to hlist. Both are good for me.

> > diff --git a/fs/9p/fid.h b/fs/9p/fid.h index
> > dfa11df02818..1fed96546728 100644
> > --- a/fs/9p/fid.h
> > +++ b/fs/9p/fid.h
> > @@ -22,6 +22,14 @@ static inline struct p9_fid *clone_fid(struct
> > p9_fid *fid)  }  static inline struct p9_fid *v9fs_fid_clone(struct
> > dentry *dentry)  {
> > -return clone_fid(v9fs_fid_lookup(dentry));
> > +struct p9_fid *fid, *nfid;
> > +
> > +fid = v9fs_fid_lookup(dentry);
> > +if (!fid || IS_ERR(fid))
> > +return fid;
> > +
> > +nfid = p9_client_walk(fid, 0, NULL, 1);
>
> I think you clone_fid() here is slightly easier to understand; everyone doesn't
> know that a walk with no component is a clone.
> The compiler will optimize that IS_ERR(fid) is checked twice, it's fine.
>
Er, I rewrite it because I must acquire the intermedia fid from v9fs_fid_lookup and clunk it.

> > diff --git a/include/net/9p/client.h b/include/net/9p/client.h index
> > ce7882da8e86..58ed9bd306bd 100644
> > --- a/include/net/9p/client.h
> > +++ b/include/net/9p/client.h
> > @@ -140,10 +140,16 @@ struct p9_client {
> >   *
> >   * TODO: This needs lots of explanation.
> >   */
> > +enum fid_source {
> > +FID_FROM_OTHER,
> > +FID_FROM_INODE,
> > +FID_FROM_DENTRY,
> > +};
>
> leftovers from previous iteration.
>
Sorry, need remove it.

>
> Overall looks good to me.
> I'd need to spend some time checking the actual counting part & hammering
> the fs a bit then confirming no fid got forgotten (there's a pr_info at umount
> time) but I'm happy with this ; thanks!
>
Yeah, it's a tedious job to do that. Also we need to find a reliable test suit. Then we can check
this patch both from code and test.

Thanks
Jianyong
> --
> Dominique
>
>
>

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-09-24  8:38     ` Jianyong Wu
@ 2020-09-24  8:56       ` Greg Kurz
  2020-09-24  9:51       ` Dominique Martinet
  1 sibling, 0 replies; 18+ messages in thread
From: Greg Kurz @ 2020-09-24  8:56 UTC (permalink / raw)
  To: Jianyong Wu
  Cc: Dominique Martinet, ericvh, lucho, qemu_oss, v9fs-developer,
	linux-kernel, Justin He

On Thu, 24 Sep 2020 08:38:01 +0000
Jianyong Wu <Jianyong.Wu@arm.com> wrote:

> > Given the other thread, what did you test this with?  
> Er, I just use Greg's qemu of https://github.com/gkurz/qemu.git: 9p-attr-fixes. I should have referenced it in commit message.

... which is a pretty old QEMU version BTW.

https://github.com/gkurz/qemu/blob/9p-attr-fixes/VERSION

2.6.50 aka 2.7 development tree

As said by Christian in some other mail, if someone wants these fixes to be
effective when using QEMU, they should maybe invest time to rebase against
the current development branch. I personally don't have time to do that but
I'm available to answer questions if needed.

Cheers,

--
Greg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-09-24  8:38     ` Jianyong Wu
  2020-09-24  8:56       ` Greg Kurz
@ 2020-09-24  9:51       ` Dominique Martinet
  2020-09-25  9:49         ` Jianyong Wu
  1 sibling, 1 reply; 18+ messages in thread
From: Dominique Martinet @ 2020-09-24  9:51 UTC (permalink / raw)
  To: Jianyong Wu
  Cc: ericvh, lucho, qemu_oss, groug, v9fs-developer, linux-kernel, Justin He

Jianyong Wu wrote on Thu, Sep 24, 2020:
> > Jianyong Wu wrote on Wed, Sep 23, 2020:
> > > open-unlink-f*syscall test:
> > > I have tested for f*syscall include: ftruncate fstat fchown fchmod faccessat.
> >
> > Given the other thread, what did you test this with?
>
> Er, I just use Greg's qemu of https://github.com/gkurz/qemu.git:
> 9p-attr-fixes. I should have referenced it in commit message.

Ok. That branch is fairly old (based on pre-2.7.0 code), so will
need some work as well.
It doesn't look like anyone has time to refresh the patches from what I
just have read so it might fall on you as well...

(I see Greg just made the same point, took a bit too long to write this
mail ;))

> > Since qemu doesn't work apparently do you have a in-house server at arm I
> > could test?
> > (I'll try with ganesha otherwise, it keeps files open so it should work I think...)
>
> Yeah, I test this on my arm server. But as these codes are arch-free, we can test it in whatever a machine.
> (also the server in arm can't be accessed by outer space, so sorry)
> But I think that this test are far from serious and complete.

Yes I meant arm-specific code, not infrastructure. This is fine I'll do
more tests here.
 
> > > +atomic_set(&fid->count, 1);
> >
> > I kind of like the refcount API becauese it has some extra overflow checks;
> > but it requires a bit more work around clunk (instead of bailing out early if
> > counter hits 0, you need to have it call a separate function in case it does)
> >
> > That's mostly esthetics though I'm not going to fuss over that.
>
> Sorry, I'm not sure what's the context of this line, does this line lie in "__add_fid”. I'm not sure about
> why it do harm to clunk?

It's not about clunk, it's about atomic_set/inc/dec
vs. refcount_set/inc/dec -- because refcount has overflow checks.

I've just noticed refcount_dec_and_test that could be used as a drop-in
replacement, so it would be good ot update.

> 
> > > @@ -74,6 +77,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct
> > > inode *inode, kuid_t uid)  void v9fs_open_fid_add(struct inode *inode,
> > > struct p9_fid *fid)  {
> > >  spin_lock(&inode->i_lock);
> > > +atomic_set(&fid->count, 1);
> >
> > Hm, that should be done at fid creation time in net/9p/client.c p9_fid_create ;
> > no ?
> > (you do it there already, I don't see what reseting count here brings except
> > confusion)
> 
> I put this counter set op before the fids are added to hlist. So I can
> make sure the counter value is 1 before fids are used. It's redundant
> code. I can remove it in both "__add_fid" and "v9fs_open_fid_add", but

I'm not sure what you're trying to do.
There are two ways to handle inserting to a list as far as refcounting
goes:
 - consider you add a new reference that will be removed when the fid is
removed from the list ; in that case you should increment the counter
and clunk the fid as usual whne you're done using it in whatever calls
__add_fid and v9fs_open_fid_add
 - or consider it an ownership transfer; then don't increment refcount,
but you must also never use the fid ever again after in the same thread
(because another thread could free the list and clunk) ; so if you still
need to use the fid after adding to the list the first option is better.

I'm not sure a fid could be added to both list in some usage patterns
but if that is the case reseting the count to 1 is most definitely an
error, as both would want to be able to decrement once.

> we must take care of it that no clunk is called between fids are
> created and added to hlist. Both are good for me.

note for this point that if the fid is not in any list it cannot be
accessed by another thread and thus cannot be raced on clunk.



> > > diff --git a/fs/9p/fid.h b/fs/9p/fid.h index
> > > dfa11df02818..1fed96546728 100644
> > > --- a/fs/9p/fid.h
> > > +++ b/fs/9p/fid.h
> > > @@ -22,6 +22,14 @@ static inline struct p9_fid *clone_fid(struct
> > > p9_fid *fid)  }  static inline struct p9_fid *v9fs_fid_clone(struct
> > > dentry *dentry)  {
> > > -return clone_fid(v9fs_fid_lookup(dentry));
> > > +struct p9_fid *fid, *nfid;
> > > +
> > > +fid = v9fs_fid_lookup(dentry);
> > > +if (!fid || IS_ERR(fid))
> > > +return fid;
> > > +
> > > +nfid = p9_client_walk(fid, 0, NULL, 1);
> >
> > I think you clone_fid() here is slightly easier to understand; everyone doesn't
> > know that a walk with no component is a clone.
> > The compiler will optimize that IS_ERR(fid) is checked twice, it's fine.
> >
> Er, I rewrite it because I must acquire the intermedia fid from
> v9fs_fid_lookup and clunk it.

I agree we need to expand clone_fid(lookup()) patterns, this is good.
I'm just saying clone_fid() is easier to understand than
p9_client_walk() with no component.

(so this is a one-for-one replacement)


> > Overall looks good to me.
> > I'd need to spend some time checking the actual counting part & hammering
> > the fs a bit then confirming no fid got forgotten (there's a pr_info at umount
> > time) but I'm happy with this ; thanks!
>
> Yeah, it's a tedious job to do that. Also we need to find a reliable test suit. Then we can check
> this patch both from code and test.

I use a subset of xfstests plus some hand-crafted tests most of which
can be found in https://github.com/phdeniel/sigmund (I no longer use it
though)

I'm sure some xfstests also do open-unlink-whatever tests, just need to
find which :P

-- 
Dominique

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-09-24  9:51       ` Dominique Martinet
@ 2020-09-25  9:49         ` Jianyong Wu
  0 siblings, 0 replies; 18+ messages in thread
From: Jianyong Wu @ 2020-09-25  9:49 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: ericvh, lucho, qemu_oss, groug, v9fs-developer, linux-kernel, Justin He

Hi Dominique,

> -----Original Message-----
> From: Dominique Martinet <asmadeus@codewreck.org>
> Sent: Thursday, September 24, 2020 5:52 PM
> To: Jianyong Wu <Jianyong.Wu@arm.com>
> Cc: ericvh@gmail.com; lucho@ionkov.net; qemu_oss@crudebyte.com;
> groug@kaod.org; v9fs-developer@lists.sourceforge.net; linux-
> kernel@vger.kernel.org; Justin He <Justin.He@arm.com>
> Subject: Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
>
> Jianyong Wu wrote on Thu, Sep 24, 2020:
> > > Jianyong Wu wrote on Wed, Sep 23, 2020:
> > > > open-unlink-f*syscall test:
> > > > I have tested for f*syscall include: ftruncate fstat fchown fchmod
> faccessat.
> > >
> > > Given the other thread, what did you test this with?
> >
> > Er, I just use Greg's qemu of https://github.com/gkurz/qemu.git:
> > 9p-attr-fixes. I should have referenced it in commit message.
>
> Ok. That branch is fairly old (based on pre-2.7.0 code), so will need some
> work as well.
> It doesn't look like anyone has time to refresh the patches from what I just
> have read so it might fall on you as well...
>
> (I see Greg just made the same point, took a bit too long to write this mail ;))
>
> > > Since qemu doesn't work apparently do you have a in-house server at
> > > arm I could test?
> > > (I'll try with ganesha otherwise, it keeps files open so it should
> > > work I think...)
> >
> > Yeah, I test this on my arm server. But as these codes are arch-free, we can
> test it in whatever a machine.
> > (also the server in arm can't be accessed by outer space, so sorry)
> > But I think that this test are far from serious and complete.
>
> Yes I meant arm-specific code, not infrastructure. This is fine I'll do more tests
> here.
>
Yeah.
> > > > +atomic_set(&fid->count, 1);
> > >
> > > I kind of like the refcount API becauese it has some extra overflow
> > > checks; but it requires a bit more work around clunk (instead of
> > > bailing out early if counter hits 0, you need to have it call a
> > > separate function in case it does)
> > >
> > > That's mostly esthetics though I'm not going to fuss over that.
> >
> > Sorry, I'm not sure what's the context of this line, does this line
> > lie in "__add_fid”. I'm not sure about why it do harm to clunk?
>
> It's not about clunk, it's about atomic_set/inc/dec vs. refcount_set/inc/dec --
> because refcount has overflow checks.
>
> I've just noticed refcount_dec_and_test that could be used as a drop-in
> replacement, so it would be good ot update.
>
OK.
> >
> > > > @@ -74,6 +77,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct
> > > > inode *inode, kuid_t uid)  void v9fs_open_fid_add(struct inode
> > > > *inode, struct p9_fid *fid)  {  spin_lock(&inode->i_lock);
> > > > +atomic_set(&fid->count, 1);
> > >
> > > Hm, that should be done at fid creation time in net/9p/client.c
> p9_fid_create ;
> > > no ?
> > > (you do it there already, I don't see what reseting count here brings
> except
> > > confusion)
> >
> > I put this counter set op before the fids are added to hlist. So I can
> > make sure the counter value is 1 before fids are used. It's redundant
> > code. I can remove it in both "__add_fid" and "v9fs_open_fid_add", but
>
> I'm not sure what you're trying to do.
> There are two ways to handle inserting to a list as far as refcounting
> goes:
>  - consider you add a new reference that will be removed when the fid is
> removed from the list ; in that case you should increment the counter
> and clunk the fid as usual whne you're done using it in whatever calls
> __add_fid and v9fs_open_fid_add
>  - or consider it an ownership transfer; then don't increment refcount,
> but you must also never use the fid ever again after in the same thread
> (because another thread could free the list and clunk) ; so if you still
> need to use the fid after adding to the list the first option is better.
>
> I'm not sure a fid could be added to both list in some usage patterns
> but if that is the case reseting the count to 1 is most definitely an
> error, as both would want to be able to decrement once.
>
> > we must take care of it that no clunk is called between fids are
> > created and added to hlist. Both are good for me.
>
> note for this point that if the fid is not in any list it cannot be
> accessed by another thread and thus cannot be raced on clunk.
>
>
Fine, I will remove the "set" op.
>
> > > > diff --git a/fs/9p/fid.h b/fs/9p/fid.h index
> > > > dfa11df02818..1fed96546728 100644
> > > > --- a/fs/9p/fid.h
> > > > +++ b/fs/9p/fid.h
> > > > @@ -22,6 +22,14 @@ static inline struct p9_fid *clone_fid(struct
> > > > p9_fid *fid)  }  static inline struct p9_fid *v9fs_fid_clone(struct
> > > > dentry *dentry)  {
> > > > -return clone_fid(v9fs_fid_lookup(dentry));
> > > > +struct p9_fid *fid, *nfid;
> > > > +
> > > > +fid = v9fs_fid_lookup(dentry);
> > > > +if (!fid || IS_ERR(fid))
> > > > +return fid;
> > > > +
> > > > +nfid = p9_client_walk(fid, 0, NULL, 1);
> > >
> > > I think you clone_fid() here is slightly easier to understand; everyone
> doesn't
> > > know that a walk with no component is a clone.
> > > The compiler will optimize that IS_ERR(fid) is checked twice, it's fine.
> > >
> > Er, I rewrite it because I must acquire the intermedia fid from
> > v9fs_fid_lookup and clunk it.
>
> I agree we need to expand clone_fid(lookup()) patterns, this is good.
> I'm just saying clone_fid() is easier to understand than
> p9_client_walk() with no component.
>
> (so this is a one-for-one replacement)
>
Yeah, this change makes it more pellucid.

>
> > > Overall looks good to me.
> > > I'd need to spend some time checking the actual counting part &
> hammering
> > > the fs a bit then confirming no fid got forgotten (there's a pr_info at
> umount
> > > time) but I'm happy with this ; thanks!
> >
> > Yeah, it's a tedious job to do that. Also we need to find a reliable test suit.
> Then we can check
> > this patch both from code and test.
>
> I use a subset of xfstests plus some hand-crafted tests most of which
> can be found in https://github.com/phdeniel/sigmund (I no longer use it
> though)
>
> I'm sure some xfstests also do open-unlink-whatever tests, just need to
> find which :P
>
Fine.

Thanks
Jianyong

> --
> Dominique
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-09-23 14:11 ` [PATCH RFC v2 4/4] 9p: fix race issue in fid contention Jianyong Wu
  2020-09-23 14:49   ` Dominique Martinet
@ 2020-11-03 10:41   ` Dominique Martinet
  2020-11-04 11:32     ` Christian Schoenebeck
                       ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Dominique Martinet @ 2020-11-03 10:41 UTC (permalink / raw)
  To: Jianyong Wu
  Cc: ericvh, lucho, qemu_oss, groug, v9fs-developer, linux-kernel, justin.he

Jianyong,

Jianyong Wu wrote on Wed, Sep 23, 2020:
> Eric's and Greg's patch offer a mechanism to fix open-unlink-f*syscall
> bug in 9p. But there is race issue in fid comtention.
> As Greg's patch stores all of fids from opened files into according inode,
> so all the lookup fid ops can retrieve fid from inode preferentially. But
> there is no mechanism to handle the fid comtention issue. For example,
> there are two threads get the same fid in the same time and one of them
> clunk the fid before the other thread ready to discard the fid. In this
> scenario, it will lead to some fatal problems, even kernel core dump.
> 
> I introduce a mechanism to fix this race issue. A counter field introduced
> into p9_fid struct to store the reference counter to the fid. When a fid
> is allocated from the inode or dentry, the counter will increase, and
> will decrease at the end of its occupation. It is guaranteed that the
> fid won't be clunked before the reference counter go down to 0, then
> we can avoid the clunked fid to be used.
> 
> tests:
> race issue test from the old test case:
> for file in {01..50}; do touch f.${file}; done
> seq 1 1000 | xargs -n 1 -P 50 -I{} cat f.* > /dev/null
> 
> open-unlink-f*syscall test:
> I have tested for f*syscall include: ftruncate fstat fchown fchmod faccessat.
> 
> Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>

Ok so I've finally taken some time to test -- sorry for the delay.
I didn't bother with qemu but the use-after-close f* calls work with
nfs-ganesha and it doesn't introduce any obvious regression with the
current qemu either, so this is good for me.

Greg, Christian - from what I understood (in private, hopefully I'm
allowed to repeat!), he won't be able to contribute to qemu because of
company policies and I'm unlikely to take the time either right now.
I don't think it's a problem to continue as is though, we can land linux
kernel support (it's still useful for non-qemu servers) and if someone
is interested later on they'll just need to finish that bit.


I'm not seeing any obvious problem now so I'll take these patches in
-next within the next few days, with this extra patch on top that
basically applies the requests I had:
 - removing the extra atomic_set in fs/9p/fid.c
 - convert from atomic to refcount API (overflow checks)
 - rename a no-composant walk to clone_fid()

I'll just run some more instrumented tests first to check we're not
leaking anything but so far I haven't found any problem.

If you noticed anything else please speak up.
Thanks for taking the time to finish this! :)
---
 fs/9p/fid.c             | 10 ++++------
 fs/9p/fid.h             |  2 +-
 include/net/9p/client.h |  2 +-
 net/9p/client.c         |  4 ++--
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 89643dabcdae..50118ec72a92 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -28,7 +28,6 @@
 
 static inline void __add_fid(struct dentry *dentry, struct p9_fid *fid)
 {
-	atomic_set(&fid->count, 1);
 	hlist_add_head(&fid->dlist, (struct hlist_head *)&dentry->d_fsdata);
 }
 
@@ -62,7 +61,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
 		}
 	}
 	if (ret && !IS_ERR(ret))
-		atomic_inc(&ret->count);
+		refcount_inc(&ret->count);
 	spin_unlock(&inode->i_lock);
 	return ret;
 }
@@ -77,7 +76,6 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
 void v9fs_open_fid_add(struct inode *inode, struct p9_fid *fid)
 {
 	spin_lock(&inode->i_lock);
-	atomic_set(&fid->count, 1);
 	hlist_add_head(&fid->ilist, (struct hlist_head *)&inode->i_private);
 	spin_unlock(&inode->i_lock);
 }
@@ -110,7 +108,7 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
 		hlist_for_each_entry(fid, h, dlist) {
 			if (any || uid_eq(fid->uid, uid)) {
 				ret = fid;
-				atomic_inc(&ret->count);
+				refcount_inc(&ret->count);
 				break;
 			}
 		}
@@ -201,7 +199,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
 	}
 	/* If we are root ourself just return that */
 	if (dentry->d_sb->s_root == dentry) {
-		atomic_inc(&fid->count);
+		refcount_inc(&fid->count);
 		return fid;
 	}
 	/*
@@ -250,7 +248,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
 			fid = ERR_PTR(-ENOENT);
 		} else {
 			__add_fid(dentry, fid);
-			atomic_inc(&fid->count);
+			refcount_inc(&fid->count);
 			spin_unlock(&dentry->d_lock);
 		}
 	}
diff --git a/fs/9p/fid.h b/fs/9p/fid.h
index 1fed96546728..f7f33509e169 100644
--- a/fs/9p/fid.h
+++ b/fs/9p/fid.h
@@ -28,7 +28,7 @@ static inline struct p9_fid *v9fs_fid_clone(struct dentry *dentry)
 	if (!fid || IS_ERR(fid))
 		return fid;
 
-	nfid = p9_client_walk(fid, 0, NULL, 1);
+	nfid = clone_fid(fid);
 	p9_client_clunk(fid);
 	return nfid;
 }
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 58ed9bd306bd..e1c308d8d288 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -149,7 +149,7 @@ enum fid_source {
 struct p9_fid {
 	struct p9_client *clnt;
 	u32 fid;
-	atomic_t count;
+	refcount_t count;
 	int mode;
 	struct p9_qid qid;
 	u32 iounit;
diff --git a/net/9p/client.c b/net/9p/client.c
index a6c8a915e0d8..ba4910138c5b 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -901,7 +901,7 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
 	fid->clnt = clnt;
 	fid->rdir = NULL;
 	fid->fid = 0;
-	atomic_set(&fid->count, 1);
+	refcount_set(&fid->count, 1);
 
 	idr_preload(GFP_KERNEL);
 	spin_lock_irq(&clnt->lock);
@@ -1466,7 +1466,7 @@ int p9_client_clunk(struct p9_fid *fid)
 		dump_stack();
 		return 0;
 	}
-	if (!atomic_dec_and_test(&fid->count))
+	if (!refcount_dec_and_test(&fid->count))
 		return 0;
 
 again:


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-11-03 10:41   ` Dominique Martinet
@ 2020-11-04 11:32     ` Christian Schoenebeck
  2020-11-04 11:57       ` Dominique Martinet
  2020-11-05  7:05     ` Jianyong Wu
  2020-11-19 16:06     ` [PATCH 0/2] follow-up to " Dominique Martinet
  2 siblings, 1 reply; 18+ messages in thread
From: Christian Schoenebeck @ 2020-11-04 11:32 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Jianyong Wu, ericvh, lucho, groug, v9fs-developer, linux-kernel,
	justin.he

On Dienstag, 3. November 2020 11:41:16 CET Dominique Martinet wrote:
> Jianyong,
> 
> Jianyong Wu wrote on Wed, Sep 23, 2020:
> > Eric's and Greg's patch offer a mechanism to fix open-unlink-f*syscall
> > bug in 9p. But there is race issue in fid comtention.
> > As Greg's patch stores all of fids from opened files into according inode,
> > so all the lookup fid ops can retrieve fid from inode preferentially. But
> > there is no mechanism to handle the fid comtention issue. For example,
> > there are two threads get the same fid in the same time and one of them
> > clunk the fid before the other thread ready to discard the fid. In this
> > scenario, it will lead to some fatal problems, even kernel core dump.
> > 
> > I introduce a mechanism to fix this race issue. A counter field introduced
> > into p9_fid struct to store the reference counter to the fid. When a fid
> > is allocated from the inode or dentry, the counter will increase, and
> > will decrease at the end of its occupation. It is guaranteed that the
> > fid won't be clunked before the reference counter go down to 0, then
> > we can avoid the clunked fid to be used.
> > 
> > tests:
> > race issue test from the old test case:
> > for file in {01..50}; do touch f.${file}; done
> > seq 1 1000 | xargs -n 1 -P 50 -I{} cat f.* > /dev/null
> > 
> > open-unlink-f*syscall test:
> > I have tested for f*syscall include: ftruncate fstat fchown fchmod
> > faccessat.
> > 
> > Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> 
> Ok so I've finally taken some time to test -- sorry for the delay.
> I didn't bother with qemu but the use-after-close f* calls work with
> nfs-ganesha and it doesn't introduce any obvious regression with the
> current qemu either, so this is good for me.
> 
> Greg, Christian - from what I understood (in private, hopefully I'm
> allowed to repeat!), he won't be able to contribute to qemu because of
> company policies and I'm unlikely to take the time either right now.
> I don't think it's a problem to continue as is though, we can land linux
> kernel support (it's still useful for non-qemu servers) and if someone
> is interested later on they'll just need to finish that bit.
> 

Hmm, no idea what kind of policy that is; there is no GPL3 in qemu at least 
that some companies are concerned about, but OK not my business.

I actually thought this would still take a while on kernel side, so in the 
meantime we layed the ground in qemu for resolving this issue independent of 
clients and independent of any guest OS installation by introducing test cases 
using the 9pfs 'local' filesystem driver:

https://github.com/qemu/qemu/blob/master/tests/qtest/virtio-9p-test.c

So the idea was to resolve that chicken egg problem of this issue that way and 
also handle it a bit more systematically. If you now run qemu's 9p tests with 
latest git version (or at least with yesterday's QEMU 5.2 rc1 tarball):

cd qemu/build
make
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
tests/qtest/qos-test

these tests will now create a test directory qtest-9p-local-XXXXXX under the 
current directory (i.e. the build directory) where they are creating real 
directories and files like on a production system would do, just without a 
guest OS.

As you can see, there are already 9p tests for creating and deleting 
directories, files, symlinks and hard links, etc.

Maybe somebody interested to see this issue resolved in qemu might help by 
rebasing Greg's old patches and testing it with some test cases this way. 
Personally I need to work on some other things in the next couple weeks, but 
if somebody needs help, questions, review, etc., I'll be there.

> 
> I'm not seeing any obvious problem now so I'll take these patches in
> -next within the next few days, with this extra patch on top that
> basically applies the requests I had:
>  - removing the extra atomic_set in fs/9p/fid.c
>  - convert from atomic to refcount API (overflow checks)
>  - rename a no-composant walk to clone_fid()
> 
> I'll just run some more instrumented tests first to check we're not
> leaking anything but so far I haven't found any problem.
> 
> If you noticed anything else please speak up.
> Thanks for taking the time to finish this! :)
> ---
>  fs/9p/fid.c             | 10 ++++------
>  fs/9p/fid.h             |  2 +-
>  include/net/9p/client.h |  2 +-
>  net/9p/client.c         |  4 ++--
>  4 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/9p/fid.c b/fs/9p/fid.c
> index 89643dabcdae..50118ec72a92 100644
> --- a/fs/9p/fid.c
> +++ b/fs/9p/fid.c
> @@ -28,7 +28,6 @@
> 
>  static inline void __add_fid(struct dentry *dentry, struct p9_fid *fid)
>  {
> -	atomic_set(&fid->count, 1);
>  	hlist_add_head(&fid->dlist, (struct hlist_head *)&dentry->d_fsdata);
>  }
> 
> @@ -62,7 +61,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode
> *inode, kuid_t uid) }
>  	}
>  	if (ret && !IS_ERR(ret))
> -		atomic_inc(&ret->count);
> +		refcount_inc(&ret->count);
>  	spin_unlock(&inode->i_lock);
>  	return ret;
>  }
> @@ -77,7 +76,6 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode
> *inode, kuid_t uid) void v9fs_open_fid_add(struct inode *inode, struct
> p9_fid *fid)
>  {
>  	spin_lock(&inode->i_lock);
> -	atomic_set(&fid->count, 1);
>  	hlist_add_head(&fid->ilist, (struct hlist_head *)&inode->i_private);
>  	spin_unlock(&inode->i_lock);
>  }
> @@ -110,7 +108,7 @@ static struct p9_fid *v9fs_fid_find(struct dentry
> *dentry, kuid_t uid, int any) hlist_for_each_entry(fid, h, dlist) {
>  			if (any || uid_eq(fid->uid, uid)) {
>  				ret = fid;
> -				atomic_inc(&ret->count);
> +				refcount_inc(&ret->count);
>  				break;
>  			}
>  		}
> @@ -201,7 +199,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct
> dentry *dentry, }
>  	/* If we are root ourself just return that */
>  	if (dentry->d_sb->s_root == dentry) {
> -		atomic_inc(&fid->count);
> +		refcount_inc(&fid->count);
>  		return fid;
>  	}
>  	/*
> @@ -250,7 +248,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct
> dentry *dentry, fid = ERR_PTR(-ENOENT);
>  		} else {
>  			__add_fid(dentry, fid);
> -			atomic_inc(&fid->count);
> +			refcount_inc(&fid->count);
>  			spin_unlock(&dentry->d_lock);
>  		}
>  	}
> diff --git a/fs/9p/fid.h b/fs/9p/fid.h
> index 1fed96546728..f7f33509e169 100644
> --- a/fs/9p/fid.h
> +++ b/fs/9p/fid.h
> @@ -28,7 +28,7 @@ static inline struct p9_fid *v9fs_fid_clone(struct dentry
> *dentry) if (!fid || IS_ERR(fid))
>  		return fid;
> 
> -	nfid = p9_client_walk(fid, 0, NULL, 1);
> +	nfid = clone_fid(fid);
>  	p9_client_clunk(fid);
>  	return nfid;
>  }
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 58ed9bd306bd..e1c308d8d288 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -149,7 +149,7 @@ enum fid_source {
>  struct p9_fid {
>  	struct p9_client *clnt;
>  	u32 fid;
> -	atomic_t count;
> +	refcount_t count;
>  	int mode;
>  	struct p9_qid qid;
>  	u32 iounit;
> diff --git a/net/9p/client.c b/net/9p/client.c
> index a6c8a915e0d8..ba4910138c5b 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -901,7 +901,7 @@ static struct p9_fid *p9_fid_create(struct p9_client
> *clnt) fid->clnt = clnt;
>  	fid->rdir = NULL;
>  	fid->fid = 0;
> -	atomic_set(&fid->count, 1);
> +	refcount_set(&fid->count, 1);
> 
>  	idr_preload(GFP_KERNEL);
>  	spin_lock_irq(&clnt->lock);
> @@ -1466,7 +1466,7 @@ int p9_client_clunk(struct p9_fid *fid)
>  		dump_stack();
>  		return 0;
>  	}
> -	if (!atomic_dec_and_test(&fid->count))
> +	if (!refcount_dec_and_test(&fid->count))
>  		return 0;
> 
>  again:

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-11-04 11:32     ` Christian Schoenebeck
@ 2020-11-04 11:57       ` Dominique Martinet
  2020-11-05 12:32         ` Christian Schoenebeck
  0 siblings, 1 reply; 18+ messages in thread
From: Dominique Martinet @ 2020-11-04 11:57 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Jianyong Wu, ericvh, lucho, groug, v9fs-developer, linux-kernel,
	justin.he

Christian Schoenebeck wrote on Wed, Nov 04, 2020:
> > Greg, Christian - from what I understood (in private, hopefully I'm
> > allowed to repeat!), he won't be able to contribute to qemu because of
> > company policies and I'm unlikely to take the time either right now.
> > I don't think it's a problem to continue as is though, we can land linux
> > kernel support (it's still useful for non-qemu servers) and if someone
> > is interested later on they'll just need to finish that bit.
> 
> Hmm, no idea what kind of policy that is; there is no GPL3 in qemu at least 
> that some companies are concerned about, but OK not my business.
> 
> I actually thought this would still take a while on kernel side,

To be honest, so did I -- the original patches are so old I had more or
less given up on it :P

But I don't see any more problem now and we'll want to get there
eventually so now's a good time as any... I just want to get fault
injection to work to test various refcounting cornercases but shouldn't
be much longer.

> so in the 
> meantime we layed the ground in qemu for resolving this issue independent of 
> clients and independent of any guest OS installation by introducing test cases 
> using the 9pfs 'local' filesystem driver:
> 
> https://github.com/qemu/qemu/blob/master/tests/qtest/virtio-9p-test.c
> 
> So the idea was to resolve that chicken egg problem of this issue that way and 
> also handle it a bit more systematically. If you now run qemu's 9p tests with 
> latest git version (or at least with yesterday's QEMU 5.2 rc1 tarball):
> 
> cd qemu/build
> make
> export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
> tests/qtest/qos-test
> 
> these tests will now create a test directory qtest-9p-local-XXXXXX under the 
> current directory (i.e. the build directory) where they are creating real 
> directories and files like on a production system would do, just without a 
> guest OS.
> 
> As you can see, there are already 9p tests for creating and deleting 
> directories, files, symlinks and hard links, etc.
> 
> Maybe somebody interested to see this issue resolved in qemu might help by 
> rebasing Greg's old patches and testing it with some test cases this way. 
> Personally I need to work on some other things in the next couple weeks, but 
> if somebody needs help, questions, review, etc., I'll be there.

Great news, nice work there.
I see the new tests it doesn't look hard to add new ones reproducing
open-unlink-fstat for example; I think it's good to have regardless of
kernel progress.

We'll get there!
-- 
Dominique

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-11-03 10:41   ` Dominique Martinet
  2020-11-04 11:32     ` Christian Schoenebeck
@ 2020-11-05  7:05     ` Jianyong Wu
  2020-11-19 16:06     ` [PATCH 0/2] follow-up to " Dominique Martinet
  2 siblings, 0 replies; 18+ messages in thread
From: Jianyong Wu @ 2020-11-05  7:05 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: ericvh, lucho, qemu_oss, groug, v9fs-developer, linux-kernel, Justin He

Hi Dominique,

Good News!
I'm happy to see that you have time to pick this up again. All the changes are OK for me.

Thanks
Jianyong

> -----Original Message-----
> From: Dominique Martinet <asmadeus@codewreck.org>
> Sent: Tuesday, November 3, 2020 6:41 PM
> To: Jianyong Wu <Jianyong.Wu@arm.com>
> Cc: ericvh@gmail.com; lucho@ionkov.net; qemu_oss@crudebyte.com;
> groug@kaod.org; v9fs-developer@lists.sourceforge.net; linux-
> kernel@vger.kernel.org; Justin He <Justin.He@arm.com>
> Subject: Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
>
> Jianyong,
>
> Jianyong Wu wrote on Wed, Sep 23, 2020:
> > Eric's and Greg's patch offer a mechanism to fix open-unlink-f*syscall
> > bug in 9p. But there is race issue in fid comtention.
> > As Greg's patch stores all of fids from opened files into according
> > inode, so all the lookup fid ops can retrieve fid from inode
> > preferentially. But there is no mechanism to handle the fid comtention
> > issue. For example, there are two threads get the same fid in the same
> > time and one of them clunk the fid before the other thread ready to
> > discard the fid. In this scenario, it will lead to some fatal problems, even
> kernel core dump.
> >
> > I introduce a mechanism to fix this race issue. A counter field
> > introduced into p9_fid struct to store the reference counter to the
> > fid. When a fid is allocated from the inode or dentry, the counter
> > will increase, and will decrease at the end of its occupation. It is
> > guaranteed that the fid won't be clunked before the reference counter
> > go down to 0, then we can avoid the clunked fid to be used.
> >
> > tests:
> > race issue test from the old test case:
> > for file in {01..50}; do touch f.${file}; done seq 1 1000 | xargs -n 1
> > -P 50 -I{} cat f.* > /dev/null
> >
> > open-unlink-f*syscall test:
> > I have tested for f*syscall include: ftruncate fstat fchown fchmod faccessat.
> >
> > Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
>
> Ok so I've finally taken some time to test -- sorry for the delay.
> I didn't bother with qemu but the use-after-close f* calls work with nfs-
> ganesha and it doesn't introduce any obvious regression with the current
> qemu either, so this is good for me.
>
> Greg, Christian - from what I understood (in private, hopefully I'm allowed to
> repeat!), he won't be able to contribute to qemu because of company
> policies and I'm unlikely to take the time either right now.
> I don't think it's a problem to continue as is though, we can land linux kernel
> support (it's still useful for non-qemu servers) and if someone is interested
> later on they'll just need to finish that bit.
>
>
> I'm not seeing any obvious problem now so I'll take these patches in -next
> within the next few days, with this extra patch on top that basically applies
> the requests I had:
>  - removing the extra atomic_set in fs/9p/fid.c
>  - convert from atomic to refcount API (overflow checks)
>  - rename a no-composant walk to clone_fid()
>
> I'll just run some more instrumented tests first to check we're not leaking
> anything but so far I haven't found any problem.
>
> If you noticed anything else please speak up.
> Thanks for taking the time to finish this! :)
> ---
>  fs/9p/fid.c             | 10 ++++------
>  fs/9p/fid.h             |  2 +-
>  include/net/9p/client.h |  2 +-
>  net/9p/client.c         |  4 ++--
>  4 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/fs/9p/fid.c b/fs/9p/fid.c
> index 89643dabcdae..50118ec72a92 100644
> --- a/fs/9p/fid.c
> +++ b/fs/9p/fid.c
> @@ -28,7 +28,6 @@
>
>  static inline void __add_fid(struct dentry *dentry, struct p9_fid *fid)  {
> -atomic_set(&fid->count, 1);
>  hlist_add_head(&fid->dlist, (struct hlist_head *)&dentry-
> >d_fsdata);  }
>
> @@ -62,7 +61,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode
> *inode, kuid_t uid)
>  }
>  }
>  if (ret && !IS_ERR(ret))
> -atomic_inc(&ret->count);
> +refcount_inc(&ret->count);
>  spin_unlock(&inode->i_lock);
>  return ret;
>  }
> @@ -77,7 +76,6 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode
> *inode, kuid_t uid)  void v9fs_open_fid_add(struct inode *inode, struct
> p9_fid *fid)  {
>  spin_lock(&inode->i_lock);
> -atomic_set(&fid->count, 1);
>  hlist_add_head(&fid->ilist, (struct hlist_head *)&inode->i_private);
>  spin_unlock(&inode->i_lock);
>  }
> @@ -110,7 +108,7 @@ static struct p9_fid *v9fs_fid_find(struct dentry
> *dentry, kuid_t uid, int any)
>  hlist_for_each_entry(fid, h, dlist) {
>  if (any || uid_eq(fid->uid, uid)) {
>  ret = fid;
> -atomic_inc(&ret->count);
> +refcount_inc(&ret->count);
>  break;
>  }
>  }
> @@ -201,7 +199,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct
> dentry *dentry,
>  }
>  /* If we are root ourself just return that */
>  if (dentry->d_sb->s_root == dentry) {
> -atomic_inc(&fid->count);
> +refcount_inc(&fid->count);
>  return fid;
>  }
>  /*
> @@ -250,7 +248,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct
> dentry *dentry,
>  fid = ERR_PTR(-ENOENT);
>  } else {
>  __add_fid(dentry, fid);
> -atomic_inc(&fid->count);
> +refcount_inc(&fid->count);
>  spin_unlock(&dentry->d_lock);
>  }
>  }
> diff --git a/fs/9p/fid.h b/fs/9p/fid.h
> index 1fed96546728..f7f33509e169 100644
> --- a/fs/9p/fid.h
> +++ b/fs/9p/fid.h
> @@ -28,7 +28,7 @@ static inline struct p9_fid *v9fs_fid_clone(struct dentry
> *dentry)
>  if (!fid || IS_ERR(fid))
>  return fid;
>
> -nfid = p9_client_walk(fid, 0, NULL, 1);
> +nfid = clone_fid(fid);
>  p9_client_clunk(fid);
>  return nfid;
>  }
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h index
> 58ed9bd306bd..e1c308d8d288 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -149,7 +149,7 @@ enum fid_source {
>  struct p9_fid {
>  struct p9_client *clnt;
>  u32 fid;
> -atomic_t count;
> +refcount_t count;
>  int mode;
>  struct p9_qid qid;
>  u32 iounit;
> diff --git a/net/9p/client.c b/net/9p/client.c index
> a6c8a915e0d8..ba4910138c5b 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -901,7 +901,7 @@ static struct p9_fid *p9_fid_create(struct p9_client
> *clnt)
>  fid->clnt = clnt;
>  fid->rdir = NULL;
>  fid->fid = 0;
> -atomic_set(&fid->count, 1);
> +refcount_set(&fid->count, 1);
>
>  idr_preload(GFP_KERNEL);
>  spin_lock_irq(&clnt->lock);
> @@ -1466,7 +1466,7 @@ int p9_client_clunk(struct p9_fid *fid)
>  dump_stack();
>  return 0;
>  }
> -if (!atomic_dec_and_test(&fid->count))
> +if (!refcount_dec_and_test(&fid->count))
>  return 0;
>
>  again:

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC v2 4/4] 9p: fix race issue in fid contention.
  2020-11-04 11:57       ` Dominique Martinet
@ 2020-11-05 12:32         ` Christian Schoenebeck
  0 siblings, 0 replies; 18+ messages in thread
From: Christian Schoenebeck @ 2020-11-05 12:32 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Jianyong Wu, ericvh, lucho, groug, v9fs-developer, linux-kernel,
	justin.he

On Mittwoch, 4. November 2020 12:57:08 CET Dominique Martinet wrote:
> Christian Schoenebeck wrote on Wed, Nov 04, 2020:
> > > Greg, Christian - from what I understood (in private, hopefully I'm
> > > allowed to repeat!), he won't be able to contribute to qemu because of
> > > company policies and I'm unlikely to take the time either right now.
> > > I don't think it's a problem to continue as is though, we can land linux
> > > kernel support (it's still useful for non-qemu servers) and if someone
> > > is interested later on they'll just need to finish that bit.
> > 
> > Hmm, no idea what kind of policy that is; there is no GPL3 in qemu at
> > least
> > that some companies are concerned about, but OK not my business.
> > 
> > I actually thought this would still take a while on kernel side,
> 
> To be honest, so did I -- the original patches are so old I had more or
> less given up on it :P
> 
> But I don't see any more problem now and we'll want to get there
> eventually so now's a good time as any... I just want to get fault
> injection to work to test various refcounting cornercases but shouldn't
> be much longer.

Exactly! The situation would presumably not change at any other time in 
future. Maybe there will be issues, we'll see, but I think it's worth it, as a 
large bunch of software depends on use-after-unlink behaviour.

> 
> > so in the
> > meantime we layed the ground in qemu for resolving this issue independent
> > of clients and independent of any guest OS installation by introducing
> > test cases using the 9pfs 'local' filesystem driver:
> > 
> > https://github.com/qemu/qemu/blob/master/tests/qtest/virtio-9p-test.c
> > 
> > So the idea was to resolve that chicken egg problem of this issue that way
> > and also handle it a bit more systematically. If you now run qemu's 9p
> > tests with latest git version (or at least with yesterday's QEMU 5.2 rc1
> > tarball):
> > 
> > cd qemu/build
> > make
> > export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
> > tests/qtest/qos-test
> > 
> > these tests will now create a test directory qtest-9p-local-XXXXXX under
> > the current directory (i.e. the build directory) where they are creating
> > real directories and files like on a production system would do, just
> > without a guest OS.
> > 
> > As you can see, there are already 9p tests for creating and deleting
> > directories, files, symlinks and hard links, etc.
> > 
> > Maybe somebody interested to see this issue resolved in qemu might help by
> > rebasing Greg's old patches and testing it with some test cases this way.
> > Personally I need to work on some other things in the next couple weeks,
> > but if somebody needs help, questions, review, etc., I'll be there.
> 
> Great news, nice work there.
> I see the new tests it doesn't look hard to add new ones reproducing
> open-unlink-fstat for example; I think it's good to have regardless of
> kernel progress.
> 
> We'll get there!

Yes, that was the goal, trying to keep it simple so that people not 
necessarily being deeply familiar with 9P (or QEMU) can still quickly write 
tests for their issues.

This provides several benefits: we can now clearly isolate issues, because in 
the past we often received patches where it was not immediately clear what's 
that this patch is fixing exacly, is it a qemu problem, is it rather the 
client that should handle this, or is this even some spanning side effect of 
several layers involved like e.g. when overlayfs is deployed.

And another major benefit is that it simply makes development much more 
efficient. Because you can now just change something on qemu side, and simply 
run

make && tests/qtest/qos-test

to see within few seconds whether it really does what you wanted it to do. And 
on doubt you just look into that subdirectory qtest-9p-local-XXXXXX to see 
what happened.

You can also automatically test your changes with multiple qemu configurations 
(e.g. different security modes, mappings, etc.) as each test case can supply 
its own set of qemu CL options, and the tests can also be run for all enabled 
architectures.

The command "tests/qtest/qos-test" is just a shortcut for 9P tests of course. 
Because obviously there is a huge amount of test cases in qemu for all its 
subsystems. But I will document this and other things more clearly soon to 
lower the entry barrier for new people getting in touch with the qemu 9p code 
base.

Best regards,
Christian Schoenebeck



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 0/2] follow-up to 9p: fix race issue in fid contention.
  2020-11-03 10:41   ` Dominique Martinet
  2020-11-04 11:32     ` Christian Schoenebeck
  2020-11-05  7:05     ` Jianyong Wu
@ 2020-11-19 16:06     ` Dominique Martinet
  2020-11-19 16:06       ` [PATCH 1/2] 9p: apply review requests for fid refcounting Dominique Martinet
  2020-11-19 16:06       ` [PATCH 2/2] 9p: Fix writeback fid incorrectly being attached to dentry Dominique Martinet
  2 siblings, 2 replies; 18+ messages in thread
From: Dominique Martinet @ 2020-11-19 16:06 UTC (permalink / raw)
  Cc: Jianyong Wu, lucho, justin.he, ericvh, qemu_oss, groug,
	linux-kernel, v9fs-developer, Dominique Martinet

In case anyone wondered why the patch series isn't in -next yet, I ran into
some troubles with non-none cache that the second patch appears to fix.

Also realized I hadn't sent the fixups I had meant Jianyong Wu to do, so
sending both now (keeping it a separate patch) and will put to my next
tree now, this time for real.

Hopefully not too many other bugs not uncovered... But only one way to
find out, and I think refcounting 9p fid will allow some nice
optimizations in the future if anyone cares to work on it...


Onto fscache now...!


Dominique Martinet (2):
  9p: apply review requests for fid refcounting
  9p: Fix writeback fid incorrectly being attached to dentry

 fs/9p/fid.c             | 10 ++++------
 fs/9p/fid.h             |  2 +-
 fs/9p/vfs_file.c        |  6 +++---
 include/net/9p/client.h |  2 +-
 net/9p/client.c         |  4 ++--
 5 files changed, 11 insertions(+), 13 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] 9p: apply review requests for fid refcounting
  2020-11-19 16:06     ` [PATCH 0/2] follow-up to " Dominique Martinet
@ 2020-11-19 16:06       ` Dominique Martinet
  2020-11-19 16:06       ` [PATCH 2/2] 9p: Fix writeback fid incorrectly being attached to dentry Dominique Martinet
  1 sibling, 0 replies; 18+ messages in thread
From: Dominique Martinet @ 2020-11-19 16:06 UTC (permalink / raw)
  Cc: Jianyong Wu, lucho, justin.he, ericvh, qemu_oss, groug,
	linux-kernel, v9fs-developer, Dominique Martinet

Fix style issues in parent commit ("apply review requests for fid
refcounting"), no functional change.

Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
---
 fs/9p/fid.c             | 10 ++++------
 fs/9p/fid.h             |  2 +-
 include/net/9p/client.h |  2 +-
 net/9p/client.c         |  4 ++--
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 89643dabcdae..50118ec72a92 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -28,7 +28,6 @@
 
 static inline void __add_fid(struct dentry *dentry, struct p9_fid *fid)
 {
-	atomic_set(&fid->count, 1);
 	hlist_add_head(&fid->dlist, (struct hlist_head *)&dentry->d_fsdata);
 }
 
@@ -62,7 +61,7 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
 		}
 	}
 	if (ret && !IS_ERR(ret))
-		atomic_inc(&ret->count);
+		refcount_inc(&ret->count);
 	spin_unlock(&inode->i_lock);
 	return ret;
 }
@@ -77,7 +76,6 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
 void v9fs_open_fid_add(struct inode *inode, struct p9_fid *fid)
 {
 	spin_lock(&inode->i_lock);
-	atomic_set(&fid->count, 1);
 	hlist_add_head(&fid->ilist, (struct hlist_head *)&inode->i_private);
 	spin_unlock(&inode->i_lock);
 }
@@ -110,7 +108,7 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
 		hlist_for_each_entry(fid, h, dlist) {
 			if (any || uid_eq(fid->uid, uid)) {
 				ret = fid;
-				atomic_inc(&ret->count);
+				refcount_inc(&ret->count);
 				break;
 			}
 		}
@@ -201,7 +199,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
 	}
 	/* If we are root ourself just return that */
 	if (dentry->d_sb->s_root == dentry) {
-		atomic_inc(&fid->count);
+		refcount_inc(&fid->count);
 		return fid;
 	}
 	/*
@@ -250,7 +248,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
 			fid = ERR_PTR(-ENOENT);
 		} else {
 			__add_fid(dentry, fid);
-			atomic_inc(&fid->count);
+			refcount_inc(&fid->count);
 			spin_unlock(&dentry->d_lock);
 		}
 	}
diff --git a/fs/9p/fid.h b/fs/9p/fid.h
index 1fed96546728..f7f33509e169 100644
--- a/fs/9p/fid.h
+++ b/fs/9p/fid.h
@@ -28,7 +28,7 @@ static inline struct p9_fid *v9fs_fid_clone(struct dentry *dentry)
 	if (!fid || IS_ERR(fid))
 		return fid;
 
-	nfid = p9_client_walk(fid, 0, NULL, 1);
+	nfid = clone_fid(fid);
 	p9_client_clunk(fid);
 	return nfid;
 }
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 58ed9bd306bd..e1c308d8d288 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -149,7 +149,7 @@ enum fid_source {
 struct p9_fid {
 	struct p9_client *clnt;
 	u32 fid;
-	atomic_t count;
+	refcount_t count;
 	int mode;
 	struct p9_qid qid;
 	u32 iounit;
diff --git a/net/9p/client.c b/net/9p/client.c
index a6c8a915e0d8..ba4910138c5b 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -901,7 +901,7 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
 	fid->clnt = clnt;
 	fid->rdir = NULL;
 	fid->fid = 0;
-	atomic_set(&fid->count, 1);
+	refcount_set(&fid->count, 1);
 
 	idr_preload(GFP_KERNEL);
 	spin_lock_irq(&clnt->lock);
@@ -1466,7 +1466,7 @@ int p9_client_clunk(struct p9_fid *fid)
 		dump_stack();
 		return 0;
 	}
-	if (!atomic_dec_and_test(&fid->count))
+	if (!refcount_dec_and_test(&fid->count))
 		return 0;
 
 again:
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] 9p: Fix writeback fid incorrectly being attached to dentry
  2020-11-19 16:06     ` [PATCH 0/2] follow-up to " Dominique Martinet
  2020-11-19 16:06       ` [PATCH 1/2] 9p: apply review requests for fid refcounting Dominique Martinet
@ 2020-11-19 16:06       ` Dominique Martinet
  1 sibling, 0 replies; 18+ messages in thread
From: Dominique Martinet @ 2020-11-19 16:06 UTC (permalink / raw)
  Cc: Jianyong Wu, lucho, justin.he, ericvh, qemu_oss, groug,
	linux-kernel, v9fs-developer, Dominique Martinet

v9fs_dir_release needs fid->ilist to have been initialized for filp's
fid, not the inode's writeback fid's.

With refcounting this can be improved on later but this appears to fix
null deref issues.

Fixes: xxx ("fs/9p: track open fids")
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
---
(note: fixes tag can't be filled here, will be corrected later)
 fs/9p/vfs_file.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index b0ef225cecd0..c5e49c88688d 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -46,7 +46,7 @@ int v9fs_file_open(struct inode *inode, struct file *file)
 	int err;
 	struct v9fs_inode *v9inode;
 	struct v9fs_session_info *v9ses;
-	struct p9_fid *fid;
+	struct p9_fid *fid, *writeback_fid;
 	int omode;
 
 	p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, file);
@@ -85,13 +85,13 @@ int v9fs_file_open(struct inode *inode, struct file *file)
 		 * because we want write after unlink usecase
 		 * to work.
 		 */
-		fid = v9fs_writeback_fid(file_dentry(file));
+		writeback_fid = v9fs_writeback_fid(file_dentry(file));
 		if (IS_ERR(fid)) {
 			err = PTR_ERR(fid);
 			mutex_unlock(&v9inode->v_mutex);
 			goto out_error;
 		}
-		v9inode->writeback_fid = (void *) fid;
+		v9inode->writeback_fid = (void *) writeback_fid;
 	}
 	mutex_unlock(&v9inode->v_mutex);
 	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-11-19 16:07 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 14:11 [PATCH RFC v2 0/4] 9p: fix open-unlink-f*syscall bug Jianyong Wu
2020-09-23 14:11 ` [PATCH v2 1/4] fs/9p: fix create-unlink-getattr idiom Jianyong Wu
2020-09-23 14:11 ` [PATCH v2 2/4] fs/9p: track open fids Jianyong Wu
2020-09-23 14:11 ` [PATCH v2 3/4] fs/9p: search open fids first Jianyong Wu
2020-09-23 14:11 ` [PATCH RFC v2 4/4] 9p: fix race issue in fid contention Jianyong Wu
2020-09-23 14:49   ` Dominique Martinet
2020-09-24  8:38     ` Jianyong Wu
2020-09-24  8:56       ` Greg Kurz
2020-09-24  9:51       ` Dominique Martinet
2020-09-25  9:49         ` Jianyong Wu
2020-11-03 10:41   ` Dominique Martinet
2020-11-04 11:32     ` Christian Schoenebeck
2020-11-04 11:57       ` Dominique Martinet
2020-11-05 12:32         ` Christian Schoenebeck
2020-11-05  7:05     ` Jianyong Wu
2020-11-19 16:06     ` [PATCH 0/2] follow-up to " Dominique Martinet
2020-11-19 16:06       ` [PATCH 1/2] 9p: apply review requests for fid refcounting Dominique Martinet
2020-11-19 16:06       ` [PATCH 2/2] 9p: Fix writeback fid incorrectly being attached to dentry Dominique Martinet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).