linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] ->encode_fh() and related breakage
@ 2011-12-30  6:27 Al Viro
  2011-12-30  7:14 ` Andreas Dilger
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Al Viro @ 2011-12-30  6:27 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds, Sage Weil, Dan Magenheimer

	a) cleancache is abusing ->encode_fh().  Badly.  In particular,
on-stack(!) struct dentry is a lousy idea.

	b) ceph is _also_ abusing ->encode_fh().  For one thing, it
gets fhandle that depends on object name if you give it a large enough
buffer to put result into.  This is absolutely broken - having cross-directory
renames break opened files is bad enough, but having that happen on rename()
_inside_ a directory...  Besides, the only thing that ought to depend on
max_len passed to ->encode_fh() should be "do I tell the caller to come
back with bigger one?".

	c) cleancache relies on (unguaranteed) property of ->encode_fh() -
if the last argument is 0, most of the instances do not look at anything
other than dentry->d_inode.  Guess what happens with ceph?  Right, we
end up doing spin_lock(&d.d_lock) and dget(d.d_parent).  Even though
struct dentry d is auto, has no initializer and the only thing done to
it is assignment to d.d_inode.  IOW, that spinlock has undefined contents
and in case if you happen to pass that spin_lock(), dget(random_pointer)
will get you an increment of value in memory at random address.  Fun...
Moreover, on FAT we get dereference (read-only, but still it's at least an
oops) of uninitialized pointer - d.d_parent->d_inode is followed.

	d) after a look through ->encode_fh() instances:
* all of them care about dentry->d_inode (d'oh)
* most of them look at anything else only if that inode is a directory and
the last argument is non-zero; in that case dentry->d_lock is grabbed and
(usually) some properties of dentry->d_parent->d_inode are looked at.
* most of them never do anything blocking.  Exceptions:
	* ceph doing dget()/dput() on parent.  For no good reason, since
everything it does with that parent could've been done before dropping
dentry->d_lock (and we don't do anything else that might've blocked)
	* gfs2 - same, but it does igrab() on parent's inode instead.
Again, for no good reason whatsoever.

	e) in case of default encode_fh, cleancache gets one difference from
exportfs_encode_fh() behaviour - it ignores ->i_generation.  Note that for
non-default encode_fh() i_generation is *not* ignored.

	f) in case of _no_ export operations being present, cleancache is
basically praying for inumbers being stable for some unknown reason.  Which
is about as efficient as prayers tend to be...	

	One kinda-sorta solution (besides kicking cleancache out of tree)
would be to modify ->encode_fh() API - instead of dentry + connectable
pass it inode + NULL-or-parent-inode, with ->d_lock held by caller.  And
demand it to be non-blocking, obviously...  It would obviously break
ceph ->encode_fh(), since that sucker apparently wants the name of last
component anyway.  OTOH, that's broken for reasons mentioned above.
OTTH, fhandle encoding is bloody close to user-visible ABI ;-/

	_Absolutely_ untested patch along those lines follows; it definitely
breaks ceph, it does nothing for (e) and (f), it might be broken in all kinds
of dumb ways, etc.  Comments are welcome...

diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
index 5f77166..e60dbe1 100644
--- a/fs/btrfs/export.c
+++ b/fs/btrfs/export.c
@@ -13,15 +13,14 @@
 					     parent_root_objectid) / 4)
 #define BTRFS_FID_SIZE_CONNECTABLE_ROOT (sizeof(struct btrfs_fid) / 4)
 
-static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
-			   int connectable)
+static int btrfs_encode_fh(struct inode *inode, u32 *fh, int *max_len,
+			   struct inode *parent)
 {
 	struct btrfs_fid *fid = (struct btrfs_fid *)fh;
-	struct inode *inode = dentry->d_inode;
 	int len = *max_len;
 	int type;
 
-	if (connectable && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
+	if (parent && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
 		*max_len = BTRFS_FID_SIZE_CONNECTABLE;
 		return 255;
 	} else if (len < BTRFS_FID_SIZE_NON_CONNECTABLE) {
@@ -36,19 +35,13 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	fid->root_objectid = BTRFS_I(inode)->root->objectid;
 	fid->gen = inode->i_generation;
 
-	if (connectable && !S_ISDIR(inode->i_mode)) {
-		struct inode *parent;
+	if (parent) {
 		u64 parent_root_id;
 
-		spin_lock(&dentry->d_lock);
-
-		parent = dentry->d_parent->d_inode;
 		fid->parent_objectid = BTRFS_I(parent)->location.objectid;
 		fid->parent_gen = parent->i_generation;
 		parent_root_id = BTRFS_I(parent)->root->objectid;
 
-		spin_unlock(&dentry->d_lock);
-
 		if (parent_root_id != fid->root_objectid) {
 			fid->parent_root_objectid = parent_root_id;
 			len = BTRFS_FID_SIZE_CONNECTABLE_ROOT;
diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index 9fbcdec..bcaca95 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -249,7 +249,9 @@ static struct dentry *ceph_fh_to_parent(struct super_block *sb,
 }
 
 const struct export_operations ceph_export_ops = {
+#ifdef BROKEN
 	.encode_fh = ceph_encode_fh,
+#endif
 	.fh_to_dentry = ceph_fh_to_dentry,
 	.fh_to_parent = ceph_fh_to_parent,
 };
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index b05acb7..f8f7029 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -304,24 +304,23 @@ out:
 
 /**
  * export_encode_fh - default export_operations->encode_fh function
- * @dentry:  the dentry to encode
+ * @inode:   the object to encode
  * @fh:      where to store the file handle fragment
  * @max_len: maximum length to store there
- * @connectable: whether to store parent information
+ * @parent:  parent directory inode, if wanted
  *
  * This default encode_fh function assumes that the 32 inode number
  * is suitable for locating an inode, and that the generation number
  * can be used to check that it is still valid.  It places them in the
  * filehandle fragment where export_decode_fh expects to find them.
  */
-static int export_encode_fh(struct dentry *dentry, struct fid *fid,
-		int *max_len, int connectable)
+static int export_encode_fh(struct inode *inode, struct fid *fid,
+		int *max_len, struct inode *parent)
 {
-	struct inode * inode = dentry->d_inode;
 	int len = *max_len;
 	int type = FILEID_INO32_GEN;
 
-	if (connectable && (len < 4)) {
+	if (parent && (len < 4)) {
 		*max_len = 4;
 		return 255;
 	} else if (len < 2) {
@@ -332,14 +331,9 @@ static int export_encode_fh(struct dentry *dentry, struct fid *fid,
 	len = 2;
 	fid->i32.ino = inode->i_ino;
 	fid->i32.gen = inode->i_generation;
-	if (connectable && !S_ISDIR(inode->i_mode)) {
-		struct inode *parent;
-
-		spin_lock(&dentry->d_lock);
-		parent = dentry->d_parent->d_inode;
+	if (parent) {
 		fid->i32.parent_ino = parent->i_ino;
 		fid->i32.parent_gen = parent->i_generation;
-		spin_unlock(&dentry->d_lock);
 		len = 4;
 		type = FILEID_INO32_GEN_PARENT;
 	}
@@ -352,11 +346,19 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
 {
 	const struct export_operations *nop = dentry->d_sb->s_export_op;
 	int error;
+	struct inode *inode = dentry->d_inode, *parent = NULL;
 
+	if (connectable && !S_ISDIR(inode->i_mode)) {
+		spin_lock(&dentry->d_lock);
+		parent = dentry->d_parent->d_inode;
+		BUG_ON(!parent);
+	}
 	if (nop->encode_fh)
-		error = nop->encode_fh(dentry, fid->raw, max_len, connectable);
+		error = nop->encode_fh(inode, fid->raw, max_len, parent);
 	else
-		error = export_encode_fh(dentry, fid, max_len, connectable);
+		error = export_encode_fh(inode, fid, max_len, parent);
+	if (parent)
+		spin_unlock(&dentry->d_lock);
 
 	return error;
 }
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 7873797..0813c70 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -752,10 +752,9 @@ static struct dentry *fat_fh_to_dentry(struct super_block *sb,
 }
 
 static int
-fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
+fat_encode_fh(struct inode *inode, __u32 *fh, int *lenp, struct inode *parent)
 {
 	int len = *lenp;
-	struct inode *inode =  de->d_inode;
 	u32 ipos_h, ipos_m, ipos_l;
 
 	if (len < 5) {
@@ -771,9 +770,9 @@ fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
 	fh[1] = inode->i_generation;
 	fh[2] = ipos_h;
 	fh[3] = ipos_m | MSDOS_I(inode)->i_logstart;
-	spin_lock(&de->d_lock);
-	fh[4] = ipos_l | MSDOS_I(de->d_parent->d_inode)->i_logstart;
-	spin_unlock(&de->d_lock);
+	fh[4] = ipos_l;
+	if (parent)
+		fh[4] |= MSDOS_I(parent)->i_logstart;
 	return 3;
 }
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 64cf8d0..25ca731 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -627,12 +627,10 @@ static struct dentry *fuse_get_dentry(struct super_block *sb,
 	return ERR_PTR(err);
 }
 
-static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
-			   int connectable)
+static int fuse_encode_fh(struct inode *inode, u32 *fh, int *max_len,
+			   struct inode *parent)
 {
-	struct inode *inode = dentry->d_inode;
-	bool encode_parent = connectable && !S_ISDIR(inode->i_mode);
-	int len = encode_parent ? 6 : 3;
+	int len = parent ? 6 : 3;
 	u64 nodeid;
 	u32 generation;
 
@@ -648,14 +646,9 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	fh[1] = (u32)(nodeid & 0xffffffff);
 	fh[2] = generation;
 
-	if (encode_parent) {
-		struct inode *parent;
-
-		spin_lock(&dentry->d_lock);
-		parent = dentry->d_parent->d_inode;
+	if (parent) {
 		nodeid = get_fuse_inode(parent)->nodeid;
 		generation = parent->i_generation;
-		spin_unlock(&dentry->d_lock);
 
 		fh[3] = (u32)(nodeid >> 32);
 		fh[4] = (u32)(nodeid & 0xffffffff);
@@ -663,7 +656,7 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	}
 
 	*max_len = len;
-	return encode_parent ? 0x82 : 0x81;
+	return parent ? 0x82 : 0x81;
 }
 
 static struct dentry *fuse_fh_to_dentry(struct super_block *sb,
diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index fe9945f..faa7916 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -28,15 +28,14 @@
 #define GFS2_LARGE_FH_SIZE 8
 #define GFS2_OLD_FH_SIZE 10
 
-static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
-			  int connectable)
+static int gfs2_encode_fh(struct inode *inode, __u32 *p, int *len,
+			  struct inode *parent)
 {
 	__be32 *fh = (__force __be32 *)p;
-	struct inode *inode = dentry->d_inode;
 	struct super_block *sb = inode->i_sb;
 	struct gfs2_inode *ip = GFS2_I(inode);
 
-	if (connectable && (*len < GFS2_LARGE_FH_SIZE)) {
+	if (parent && (*len < GFS2_LARGE_FH_SIZE)) {
 		*len = GFS2_LARGE_FH_SIZE;
 		return 255;
 	} else if (*len < GFS2_SMALL_FH_SIZE) {
@@ -50,14 +49,10 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
 	fh[3] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF);
 	*len = GFS2_SMALL_FH_SIZE;
 
-	if (!connectable || inode == sb->s_root->d_inode)
+	if (!parent || inode == sb->s_root->d_inode)
 		return *len;
 
-	spin_lock(&dentry->d_lock);
-	inode = dentry->d_parent->d_inode;
-	ip = GFS2_I(inode);
-	igrab(inode);
-	spin_unlock(&dentry->d_lock);
+	ip = GFS2_I(parent);
 
 	fh[4] = cpu_to_be32(ip->i_no_formal_ino >> 32);
 	fh[5] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF);
@@ -65,8 +60,6 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
 	fh[7] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF);
 	*len = GFS2_LARGE_FH_SIZE;
 
-	iput(inode);
-
 	return *len;
 }
 
diff --git a/fs/isofs/export.c b/fs/isofs/export.c
index dd4687f..aa4356d 100644
--- a/fs/isofs/export.c
+++ b/fs/isofs/export.c
@@ -107,12 +107,11 @@ static struct dentry *isofs_export_get_parent(struct dentry *child)
 }
 
 static int
-isofs_export_encode_fh(struct dentry *dentry,
+isofs_export_encode_fh(struct inode *inode,
 		       __u32 *fh32,
 		       int *max_len,
-		       int connectable)
+		       struct inode *parent)
 {
-	struct inode * inode = dentry->d_inode;
 	struct iso_inode_info * ei = ISOFS_I(inode);
 	int len = *max_len;
 	int type = 1;
@@ -124,7 +123,7 @@ isofs_export_encode_fh(struct dentry *dentry,
 	 * offset of the inode and the upper 16 bits of fh32[1] to
 	 * hold the offset of the parent.
 	 */
-	if (connectable && (len < 5)) {
+	if (parent && (len < 5)) {
 		*max_len = 5;
 		return 255;
 	} else if (len < 3) {
@@ -136,16 +135,12 @@ isofs_export_encode_fh(struct dentry *dentry,
 	fh32[0] = ei->i_iget5_block;
  	fh16[2] = (__u16)ei->i_iget5_offset;  /* fh16 [sic] */
 	fh32[2] = inode->i_generation;
-	if (connectable && !S_ISDIR(inode->i_mode)) {
-		struct inode *parent;
+	if (parent) {
 		struct iso_inode_info *eparent;
-		spin_lock(&dentry->d_lock);
-		parent = dentry->d_parent->d_inode;
 		eparent = ISOFS_I(parent);
 		fh32[3] = eparent->i_iget5_block;
 		fh16[3] = (__u16)eparent->i_iget5_offset;  /* fh16 [sic] */
 		fh32[4] = parent->i_generation;
-		spin_unlock(&dentry->d_lock);
 		len = 5;
 		type = 2;
 	}
diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
index 1cd3f62..958e1cb 100644
--- a/fs/nilfs2/namei.c
+++ b/fs/nilfs2/namei.c
@@ -519,31 +519,24 @@ static struct dentry *nilfs_fh_to_parent(struct super_block *sb, struct fid *fh,
 	return nilfs_get_dentry(sb, fid->cno, fid->parent_ino, fid->parent_gen);
 }
 
-static int nilfs_encode_fh(struct dentry *dentry, __u32 *fh, int *lenp,
-			   int connectable)
+static int nilfs_encode_fh(struct inode *inode, __u32 *fh, int *lenp,
+			   struct inode *parent)
 {
 	struct nilfs_fid *fid = (struct nilfs_fid *)fh;
-	struct inode *inode = dentry->d_inode;
 	struct nilfs_root *root = NILFS_I(inode)->i_root;
 	int type;
 
 	if (*lenp < NILFS_FID_SIZE_NON_CONNECTABLE ||
-	    (connectable && *lenp < NILFS_FID_SIZE_CONNECTABLE))
+	    (parent && *lenp < NILFS_FID_SIZE_CONNECTABLE))
 		return 255;
 
 	fid->cno = root->cno;
 	fid->ino = inode->i_ino;
 	fid->gen = inode->i_generation;
 
-	if (connectable && !S_ISDIR(inode->i_mode)) {
-		struct inode *parent;
-
-		spin_lock(&dentry->d_lock);
-		parent = dentry->d_parent->d_inode;
+	if (parent) {
 		fid->parent_ino = parent->i_ino;
 		fid->parent_gen = parent->i_generation;
-		spin_unlock(&dentry->d_lock);
-
 		type = FILEID_NILFS_WITH_PARENT;
 		*lenp = NILFS_FID_SIZE_CONNECTABLE;
 	} else {
diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
index 745db42..322216a 100644
--- a/fs/ocfs2/export.c
+++ b/fs/ocfs2/export.c
@@ -177,21 +177,23 @@ bail:
 	return parent;
 }
 
-static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
-			   int connectable)
+static int ocfs2_encode_fh(struct inode *inode, u32 *fh_in, int *max_len,
+			   struct inode *parent)
 {
-	struct inode *inode = dentry->d_inode;
 	int len = *max_len;
 	int type = 1;
 	u64 blkno;
 	u32 generation;
 	__le32 *fh = (__force __le32 *) fh_in;
 
+#ifdef TRACE_HOOKS_ARE_NOT_BRAINDEAD_IN_YOUR_OPINION
+#error "You go ahead and fix that mess, then.  Somehow"
 	trace_ocfs2_encode_fh_begin(dentry, dentry->d_name.len,
 				    dentry->d_name.name,
 				    fh, len, connectable);
+#endif
 
-	if (connectable && (len < 6)) {
+	if (parent && (len < 6)) {
 		*max_len = 6;
 		type = 255;
 		goto bail;
@@ -211,12 +213,7 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
 	fh[1] = cpu_to_le32((u32)(blkno & 0xffffffff));
 	fh[2] = cpu_to_le32(generation);
 
-	if (connectable && !S_ISDIR(inode->i_mode)) {
-		struct inode *parent;
-
-		spin_lock(&dentry->d_lock);
-
-		parent = dentry->d_parent->d_inode;
+	if (parent) {
 		blkno = OCFS2_I(parent)->ip_blkno;
 		generation = parent->i_generation;
 
@@ -224,8 +221,6 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
 		fh[4] = cpu_to_le32((u32)(blkno & 0xffffffff));
 		fh[5] = cpu_to_le32(generation);
 
-		spin_unlock(&dentry->d_lock);
-
 		len = 6;
 		type = 2;
 
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 9e8cd5a..0095740 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -1592,13 +1592,12 @@ struct dentry *reiserfs_fh_to_parent(struct super_block *sb, struct fid *fid,
 		(fh_type == 6) ? fid->raw[5] : 0);
 }
 
-int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
-		       int need_parent)
+int reiserfs_encode_fh(struct inode *inode, __u32 * data, int *lenp,
+		       struct inode *parent)
 {
-	struct inode *inode = dentry->d_inode;
 	int maxlen = *lenp;
 
-	if (need_parent && (maxlen < 5)) {
+	if (parent && (maxlen < 5)) {
 		*lenp = 5;
 		return 255;
 	} else if (maxlen < 3) {
@@ -1611,19 +1610,16 @@ int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
 	data[2] = inode->i_generation;
 	*lenp = 3;
 	/* no room for directory info? return what we've stored so far */
-	if (maxlen < 5 || !need_parent)
+	if (maxlen < 5 || !parent)
 		return 3;
 
-	spin_lock(&dentry->d_lock);
-	inode = dentry->d_parent->d_inode;
-	data[3] = inode->i_ino;
-	data[4] = le32_to_cpu(INODE_PKEY(inode)->k_dir_id);
+	data[3] = parent->i_ino;
+	data[4] = le32_to_cpu(INODE_PKEY(parent)->k_dir_id);
 	*lenp = 5;
 	if (maxlen >= 6) {
-		data[5] = inode->i_generation;
+		data[5] = parent->i_generation;
 		*lenp = 6;
 	}
-	spin_unlock(&dentry->d_lock);
 	return *lenp;
 }
 
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 08bf46e..6b9b526 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1273,16 +1273,15 @@ static struct dentry *udf_fh_to_parent(struct super_block *sb,
 				 fid->udf.parent_partref,
 				 fid->udf.parent_generation);
 }
-static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
-			 int connectable)
+static int udf_encode_fh(struct inode *inode, __u32 *fh, int *lenp,
+			 struct inode *parent)
 {
 	int len = *lenp;
-	struct inode *inode =  de->d_inode;
 	struct kernel_lb_addr location = UDF_I(inode)->i_location;
 	struct fid *fid = (struct fid *)fh;
 	int type = FILEID_UDF_WITHOUT_PARENT;
 
-	if (connectable && (len < 5)) {
+	if (parent && (len < 5)) {
 		*lenp = 5;
 		return 255;
 	} else if (len < 3) {
@@ -1295,14 +1294,11 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
 	fid->udf.partref = location.partitionReferenceNum;
 	fid->udf.generation = inode->i_generation;
 
-	if (connectable && !S_ISDIR(inode->i_mode)) {
-		spin_lock(&de->d_lock);
-		inode = de->d_parent->d_inode;
-		location = UDF_I(inode)->i_location;
+	if (parent) {
+		location = UDF_I(parent)->i_location;
 		fid->udf.parent_block = location.logicalBlockNum;
 		fid->udf.parent_partref = location.partitionReferenceNum;
 		fid->udf.parent_generation = inode->i_generation;
-		spin_unlock(&de->d_lock);
 		*lenp = 5;
 		type = FILEID_UDF_WITH_PARENT;
 	}
diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
index 558910f..1a2adfc 100644
--- a/fs/xfs/xfs_export.c
+++ b/fs/xfs/xfs_export.c
@@ -53,19 +53,18 @@ static int xfs_fileid_length(int fileid_type)
 
 STATIC int
 xfs_fs_encode_fh(
-	struct dentry		*dentry,
-	__u32			*fh,
-	int			*max_len,
-	int			connectable)
+	struct inode	*inode,
+	__u32		*fh,
+	int		*max_len,
+	struct inode	*parent)
 {
 	struct fid		*fid = (struct fid *)fh;
 	struct xfs_fid64	*fid64 = (struct xfs_fid64 *)fh;
-	struct inode		*inode = dentry->d_inode;
 	int			fileid_type;
 	int			len;
 
 	/* Directories don't need their parent encoded, they have ".." */
-	if (S_ISDIR(inode->i_mode) || !connectable)
+	if (!parent)
 		fileid_type = FILEID_INO32_GEN;
 	else
 		fileid_type = FILEID_INO32_GEN_PARENT;
@@ -97,20 +96,16 @@ xfs_fs_encode_fh(
 
 	switch (fileid_type) {
 	case FILEID_INO32_GEN_PARENT:
-		spin_lock(&dentry->d_lock);
-		fid->i32.parent_ino = XFS_I(dentry->d_parent->d_inode)->i_ino;
-		fid->i32.parent_gen = dentry->d_parent->d_inode->i_generation;
-		spin_unlock(&dentry->d_lock);
+		fid->i32.parent_ino = XFS_I(parent)->i_ino;
+		fid->i32.parent_gen = parent->i_generation;
 		/*FALLTHRU*/
 	case FILEID_INO32_GEN:
 		fid->i32.ino = XFS_I(inode)->i_ino;
 		fid->i32.gen = inode->i_generation;
 		break;
 	case FILEID_INO32_GEN_PARENT | XFS_FILEID_TYPE_64FLAG:
-		spin_lock(&dentry->d_lock);
-		fid64->parent_ino = XFS_I(dentry->d_parent->d_inode)->i_ino;
-		fid64->parent_gen = dentry->d_parent->d_inode->i_generation;
-		spin_unlock(&dentry->d_lock);
+		fid64->parent_ino = XFS_I(parent)->i_ino;
+		fid64->parent_gen = parent->i_generation;
 		/*FALLTHRU*/
 	case FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG:
 		fid64->ino = XFS_I(inode)->i_ino;
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 3a4cef5..12291a7 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -165,8 +165,8 @@ struct fid {
  */
 
 struct export_operations {
-	int (*encode_fh)(struct dentry *de, __u32 *fh, int *max_len,
-			int connectable);
+	int (*encode_fh)(struct inode *inode, __u32 *fh, int *max_len,
+			struct inode *parent);
 	struct dentry * (*fh_to_dentry)(struct super_block *sb, struct fid *fid,
 			int fh_len, int fh_type);
 	struct dentry * (*fh_to_parent)(struct super_block *sb, struct fid *fid,
diff --git a/include/linux/reiserfs_fs.h b/include/linux/reiserfs_fs.h
index 26be28f..5e31164 100644
--- a/include/linux/reiserfs_fs.h
+++ b/include/linux/reiserfs_fs.h
@@ -2043,8 +2043,8 @@ struct dentry *reiserfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
 				     int fh_len, int fh_type);
 struct dentry *reiserfs_fh_to_parent(struct super_block *sb, struct fid *fid,
 				     int fh_len, int fh_type);
-int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
-		       int connectable);
+int reiserfs_encode_fh(struct inode *inode, __u32 * data, int *lenp,
+		       struct inode *parent);
 
 int reiserfs_truncate_file(struct inode *, int update_timestamps);
 void make_cpu_key(struct cpu_key *cpu_key, struct inode *inode, loff_t offset,
diff --git a/mm/cleancache.c b/mm/cleancache.c
index bcaae4c..668c74f 100644
--- a/mm/cleancache.c
+++ b/mm/cleancache.c
@@ -75,7 +75,7 @@ EXPORT_SYMBOL(__cleancache_init_shared_fs);
 static int cleancache_get_key(struct inode *inode,
 			      struct cleancache_filekey *key)
 {
-	int (*fhfn)(struct dentry *, __u32 *fh, int *, int);
+	int (*fhfn)(struct inode *, __u32 *fh, int *, struct inode *);
 	int len = 0, maxlen = CLEANCACHE_KEY_MAX;
 	struct super_block *sb = inode->i_sb;
 
@@ -83,9 +83,7 @@ static int cleancache_get_key(struct inode *inode,
 	if (sb->s_export_op != NULL) {
 		fhfn = sb->s_export_op->encode_fh;
 		if  (fhfn) {
-			struct dentry d;
-			d.d_inode = inode;
-			len = (*fhfn)(&d, &key->u.fh[0], &maxlen, 0);
+			len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL);
 			if (len <= 0 || len == 255)
 				return -1;
 			if (maxlen > CLEANCACHE_KEY_MAX)
diff --git a/mm/shmem.c b/mm/shmem.c
index 82d13f3..e438e24 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1952,11 +1952,9 @@ static struct dentry *shmem_fh_to_dentry(struct super_block *sb,
 	return dentry;
 }
 
-static int shmem_encode_fh(struct dentry *dentry, __u32 *fh, int *len,
-				int connectable)
+static int shmem_encode_fh(struct inode *inode, __u32 *fh, int *len,
+				struct inode *parent)
 {
-	struct inode *inode = dentry->d_inode;
-
 	if (*len < 3) {
 		*len = 3;
 		return 255;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC] ->encode_fh() and related breakage
  2011-12-30  6:27 [RFC] ->encode_fh() and related breakage Al Viro
@ 2011-12-30  7:14 ` Andreas Dilger
  2011-12-30  7:57   ` Al Viro
  2011-12-30 16:48 ` Dan Magenheimer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Andreas Dilger @ 2011-12-30  7:14 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Sage Weil, Dan Magenheimer

On 2011-12-30, at 12:27 AM, Al Viro wrote:
> 	One kinda-sorta solution (besides kicking cleancache out of tree)
> would be to modify ->encode_fh() API - instead of dentry + connectable
> pass it inode + NULL-or-parent-inode, with ->d_lock held by caller.  And
> demand it to be non-blocking, obviously...  It would obviously break
> ceph ->encode_fh(), since that sucker apparently wants the name of last
> component anyway.  OTOH, that's broken for reasons mentioned above.
> OTTH, fhandle encoding is bloody close to user-visible ABI ;-/

Is it possible to always pass d_parent if this is available, rather than
only if @connectable is passed?  For network filesystems re-exporting NFS
it is desirable to always be passed the parent directory.

> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> index b05acb7..f8f7029 100644
> --- a/fs/exportfs/expfs.c
> +++ b/fs/exportfs/expfs.c
> @@ -304,24 +304,23 @@ out:
> 
> /**
>  * export_encode_fh - default export_operations->encode_fh function
> - * @dentry:  the dentry to encode
> + * @inode:   the object to encode
>  * @fh:      where to store the file handle fragment
>  * @max_len: maximum length to store there
> - * @connectable: whether to store parent information
> + * @parent:  parent directory inode, if wanted
>  *
>  * This default encode_fh function assumes that the 32 inode number
>  * is suitable for locating an inode, and that the generation number
>  * can be used to check that it is still valid.  It places them in the
>  * filehandle fragment where export_decode_fh expects to find them.
>  */
> -static int export_encode_fh(struct dentry *dentry, struct fid *fid,
> -		int *max_len, int connectable)
> +static int export_encode_fh(struct inode *inode, struct fid *fid,
> +		int *max_len, struct inode *parent)
> {
> -	struct inode * inode = dentry->d_inode;
> 	int len = *max_len;
> 	int type = FILEID_INO32_GEN;
> 
> -	if (connectable && (len < 4)) {
> +	if (parent && (len < 4)) {
> 		*max_len = 4;
> 		return 255;
> 	} else if (len < 2) {
> @@ -332,14 +331,9 @@ static int export_encode_fh(struct dentry *dentry, struct fid *fid,
> 	len = 2;
> 	fid->i32.ino = inode->i_ino;
> 	fid->i32.gen = inode->i_generation;
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> -
> -		spin_lock(&dentry->d_lock);
> -		parent = dentry->d_parent->d_inode;
> +	if (parent) {
> 		fid->i32.parent_ino = parent->i_ino;
> 		fid->i32.parent_gen = parent->i_generation;
> -		spin_unlock(&dentry->d_lock);
> 		len = 4;
> 		type = FILEID_INO32_GEN_PARENT;
> 	}
> @@ -352,11 +346,19 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
> {
> 	const struct export_operations *nop = dentry->d_sb->s_export_op;
> 	int error;
> +	struct inode *inode = dentry->d_inode, *parent = NULL;
> 
> +	if (connectable && !S_ISDIR(inode->i_mode)) {
> +		spin_lock(&dentry->d_lock);
> +		parent = dentry->d_parent->d_inode;
> +		BUG_ON(!parent);
> +	}
> 	if (nop->encode_fh)
> -		error = nop->encode_fh(dentry, fid->raw, max_len, connectable);
> +		error = nop->encode_fh(inode, fid->raw, max_len, parent);
> 	else
> -		error = export_encode_fh(dentry, fid, max_len, connectable);
> +		error = export_encode_fh(inode, fid, max_len, parent);
> +	if (parent)
> +		spin_unlock(&dentry->d_lock);
> 
> 	return error;
> }

Cheers, Andreas






^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ->encode_fh() and related breakage
  2011-12-30  7:14 ` Andreas Dilger
@ 2011-12-30  7:57   ` Al Viro
  2012-01-09 13:55     ` Bernd Schubert
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2011-12-30  7:57 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-fsdevel, Linus Torvalds, Sage Weil, Dan Magenheimer

On Fri, Dec 30, 2011 at 01:14:16AM -0600, Andreas Dilger wrote:
> On 2011-12-30, at 12:27 AM, Al Viro wrote:
> > 	One kinda-sorta solution (besides kicking cleancache out of tree)
> > would be to modify ->encode_fh() API - instead of dentry + connectable
> > pass it inode + NULL-or-parent-inode, with ->d_lock held by caller.  And
> > demand it to be non-blocking, obviously...  It would obviously break
> > ceph ->encode_fh(), since that sucker apparently wants the name of last
> > component anyway.  OTOH, that's broken for reasons mentioned above.
> > OTTH, fhandle encoding is bloody close to user-visible ABI ;-/
> 
> Is it possible to always pass d_parent if this is available, rather than
> only if @connectable is passed?  For network filesystems re-exporting NFS
> it is desirable to always be passed the parent directory.

Even without check_subtree on the export?  Whatever the hell for?
Note that using the parent means broken rename() - fhandles will be
changed when file gets moved from one directory to another.  It's
exactly what is asked for in case of check_subtree, but otherwise it's
an obvious misfeature.  _And_ it costs extra locking and extra work
on decode_fh side...

I obviously looked only at the in-tree filesystems; out-of-tree code might
be doing very weird things, up to and including ROT13 of uuencoded UTF16
representation of pathname stored in fhandle.  And yes, fh representation
is too close to being a part of ABI for comfort, but then it's an ABI
exported by out-of-tree code, so I'm not going to worry too much about
its stability...

"desirable for network filesystems" is too damn vague; could you give more
details?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [RFC] ->encode_fh() and related breakage
  2011-12-30  6:27 [RFC] ->encode_fh() and related breakage Al Viro
  2011-12-30  7:14 ` Andreas Dilger
@ 2011-12-30 16:48 ` Dan Magenheimer
  2012-01-03 11:31 ` Steven Whitehouse
  2012-01-03 17:00 ` Sage Weil
  3 siblings, 0 replies; 7+ messages in thread
From: Dan Magenheimer @ 2011-12-30 16:48 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: Linus Torvalds, Sage Weil, Andreas Dilger, Chris Mason

> From: Al Viro [mailto:viro@ZenIV.linux.org.uk]
> Sent: Thursday, December 29, 2011 11:27 PM
> To: linux-fsdevel@vger.kernel.org
> Cc: Linus Torvalds; Sage Weil; Dan Magenheimer
> Subject: [RFC] ->encode_fh() and related breakage
> 
> 	a) cleancache is abusing ->encode_fh().  Badly.  In particular,
> on-stack(!) struct dentry is a lousy idea.

Indeed. See:  https://lkml.org/lkml/2011/2/24/378 
Chris Mason (now cc'ed) suggested (offlist) this short-term hack, which
was a fix to a reported issue** from a user of btrfs+cleancache+zcache
and the above lkml post was intended to log the hack for later review.

Thanks, Al, for picking up the ball I dropped and neglected.

** https://lkml.org/lkml/2011/2/13/163 
 
> 	c) cleancache relies on (unguaranteed) property of ->encode_fh() -
> if the last argument is 0, most of the instances do not look at anything
> other than dentry->d_inode.  Guess what happens with ceph?  Right, we
> end up doing spin_lock(&d.d_lock) and dget(d.d_parent).  Even though
> struct dentry d is auto, has no initializer and the only thing done to
> it is assignment to d.d_inode.  IOW, that spinlock has undefined contents
> and in case if you happen to pass that spin_lock(), dget(random_pointer)
> will get you an increment of value in memory at random address.  Fun...
> Moreover, on FAT we get dereference (read-only, but still it's at least an
> oops) of uninitialized pointer - d.d_parent->d_inode is followed.

Fortunately, cleancache is per-filesystem opt-in so that encode_fh
call from cleancache_get_key (in mm/cleancache.c) would never occur
for those filesystems.

> 	e) in case of default encode_fh, cleancache gets one difference from
> exportfs_encode_fh() behaviour - it ignores ->i_generation.  Note that for
> non-default encode_fh() i_generation is *not* ignored.
> 
> 	f) in case of _no_ export operations being present, cleancache is
> basically praying for inumbers being stable for some unknown reason.  Which
> is about as efficient as prayers tend to be...

What cleancache is looking for is a persistent 192-bit file identifier
that is unique and reproducible, regardless of whether the dentry is in-cache.
There should be no way for that unique identifier to become incoherent
with pages on disk belonging to that inode unless the inode is
in memory and cleancache_flush/invalidate_inode is called. Any fs
that cannot guarantee this uniqueness and coherency should not enable
cleancache, which is a good reason why cleancache is per-fs opt-in.

The use case (in case you are baffled):  Think of cleancache as being
huge, NOT part of the main memory of the system and so NOT subject
to normal system memory pressure.  We want the data pages to remain
in cleancache as long as possible, regardless of whether the inode
or dentry are in-cache.

> 	One kinda-sorta solution (besides kicking cleancache out of tree)

8-/

> would be to modify ->encode_fh() API - instead of dentry + connectable
> pass it inode + NULL-or-parent-inode, with ->d_lock held by caller.

> -static int export_encode_fh(struct dentry *dentry, struct fid *fid,
> -		int *max_len, int connectable)
> +static int export_encode_fh(struct inode *inode, struct fid *fid,
> +		int *max_len, struct inode *parent)

I won't pretend to be worthy to comment on the rest of the patch
or the change-of-API issues but FWIW this API change and the matching
cleancache changes below:

Acked-by: Dan Magenheimer <dan.magenheimer.com>

> diff --git a/mm/cleancache.c b/mm/cleancache.c
> index bcaae4c..668c74f 100644
> --- a/mm/cleancache.c
> +++ b/mm/cleancache.c
> @@ -75,7 +75,7 @@ EXPORT_SYMBOL(__cleancache_init_shared_fs);
>  static int cleancache_get_key(struct inode *inode,
>  			      struct cleancache_filekey *key)
>  {
> -	int (*fhfn)(struct dentry *, __u32 *fh, int *, int);
> +	int (*fhfn)(struct inode *, __u32 *fh, int *, struct inode *);
>  	int len = 0, maxlen = CLEANCACHE_KEY_MAX;
>  	struct super_block *sb = inode->i_sb;
> 
> @@ -83,9 +83,7 @@ static int cleancache_get_key(struct inode *inode,
>  	if (sb->s_export_op != NULL) {
>  		fhfn = sb->s_export_op->encode_fh;
>  		if  (fhfn) {
> -			struct dentry d;
> -			d.d_inode = inode;
> -			len = (*fhfn)(&d, &key->u.fh[0], &maxlen, 0);
> +			len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL);
>  			if (len <= 0 || len == 255)
>  				return -1;
>  			if (maxlen > CLEANCACHE_KEY_MAX)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ->encode_fh() and related breakage
  2011-12-30  6:27 [RFC] ->encode_fh() and related breakage Al Viro
  2011-12-30  7:14 ` Andreas Dilger
  2011-12-30 16:48 ` Dan Magenheimer
@ 2012-01-03 11:31 ` Steven Whitehouse
  2012-01-03 17:00 ` Sage Weil
  3 siblings, 0 replies; 7+ messages in thread
From: Steven Whitehouse @ 2012-01-03 11:31 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Sage Weil, Dan Magenheimer

Hi,

GFS2 changes look good to me:

Acked-by: Steven Whitehouse <swhiteho@redhat.com>

Steve.

On Fri, 2011-12-30 at 06:27 +0000, Al Viro wrote:
> 	a) cleancache is abusing ->encode_fh().  Badly.  In particular,
> on-stack(!) struct dentry is a lousy idea.
> 
> 	b) ceph is _also_ abusing ->encode_fh().  For one thing, it
> gets fhandle that depends on object name if you give it a large enough
> buffer to put result into.  This is absolutely broken - having cross-directory
> renames break opened files is bad enough, but having that happen on rename()
> _inside_ a directory...  Besides, the only thing that ought to depend on
> max_len passed to ->encode_fh() should be "do I tell the caller to come
> back with bigger one?".
> 
> 	c) cleancache relies on (unguaranteed) property of ->encode_fh() -
> if the last argument is 0, most of the instances do not look at anything
> other than dentry->d_inode.  Guess what happens with ceph?  Right, we
> end up doing spin_lock(&d.d_lock) and dget(d.d_parent).  Even though
> struct dentry d is auto, has no initializer and the only thing done to
> it is assignment to d.d_inode.  IOW, that spinlock has undefined contents
> and in case if you happen to pass that spin_lock(), dget(random_pointer)
> will get you an increment of value in memory at random address.  Fun...
> Moreover, on FAT we get dereference (read-only, but still it's at least an
> oops) of uninitialized pointer - d.d_parent->d_inode is followed.
> 
> 	d) after a look through ->encode_fh() instances:
> * all of them care about dentry->d_inode (d'oh)
> * most of them look at anything else only if that inode is a directory and
> the last argument is non-zero; in that case dentry->d_lock is grabbed and
> (usually) some properties of dentry->d_parent->d_inode are looked at.
> * most of them never do anything blocking.  Exceptions:
> 	* ceph doing dget()/dput() on parent.  For no good reason, since
> everything it does with that parent could've been done before dropping
> dentry->d_lock (and we don't do anything else that might've blocked)
> 	* gfs2 - same, but it does igrab() on parent's inode instead.
> Again, for no good reason whatsoever.
> 
> 	e) in case of default encode_fh, cleancache gets one difference from
> exportfs_encode_fh() behaviour - it ignores ->i_generation.  Note that for
> non-default encode_fh() i_generation is *not* ignored.
> 
> 	f) in case of _no_ export operations being present, cleancache is
> basically praying for inumbers being stable for some unknown reason.  Which
> is about as efficient as prayers tend to be...	
> 
> 	One kinda-sorta solution (besides kicking cleancache out of tree)
> would be to modify ->encode_fh() API - instead of dentry + connectable
> pass it inode + NULL-or-parent-inode, with ->d_lock held by caller.  And
> demand it to be non-blocking, obviously...  It would obviously break
> ceph ->encode_fh(), since that sucker apparently wants the name of last
> component anyway.  OTOH, that's broken for reasons mentioned above.
> OTTH, fhandle encoding is bloody close to user-visible ABI ;-/
> 
> 	_Absolutely_ untested patch along those lines follows; it definitely
> breaks ceph, it does nothing for (e) and (f), it might be broken in all kinds
> of dumb ways, etc.  Comments are welcome...
> 
> diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
> index 5f77166..e60dbe1 100644
> --- a/fs/btrfs/export.c
> +++ b/fs/btrfs/export.c
> @@ -13,15 +13,14 @@
>  					     parent_root_objectid) / 4)
>  #define BTRFS_FID_SIZE_CONNECTABLE_ROOT (sizeof(struct btrfs_fid) / 4)
>  
> -static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
> -			   int connectable)
> +static int btrfs_encode_fh(struct inode *inode, u32 *fh, int *max_len,
> +			   struct inode *parent)
>  {
>  	struct btrfs_fid *fid = (struct btrfs_fid *)fh;
> -	struct inode *inode = dentry->d_inode;
>  	int len = *max_len;
>  	int type;
>  
> -	if (connectable && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
> +	if (parent && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
>  		*max_len = BTRFS_FID_SIZE_CONNECTABLE;
>  		return 255;
>  	} else if (len < BTRFS_FID_SIZE_NON_CONNECTABLE) {
> @@ -36,19 +35,13 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
>  	fid->root_objectid = BTRFS_I(inode)->root->objectid;
>  	fid->gen = inode->i_generation;
>  
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> +	if (parent) {
>  		u64 parent_root_id;
>  
> -		spin_lock(&dentry->d_lock);
> -
> -		parent = dentry->d_parent->d_inode;
>  		fid->parent_objectid = BTRFS_I(parent)->location.objectid;
>  		fid->parent_gen = parent->i_generation;
>  		parent_root_id = BTRFS_I(parent)->root->objectid;
>  
> -		spin_unlock(&dentry->d_lock);
> -
>  		if (parent_root_id != fid->root_objectid) {
>  			fid->parent_root_objectid = parent_root_id;
>  			len = BTRFS_FID_SIZE_CONNECTABLE_ROOT;
> diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> index 9fbcdec..bcaca95 100644
> --- a/fs/ceph/export.c
> +++ b/fs/ceph/export.c
> @@ -249,7 +249,9 @@ static struct dentry *ceph_fh_to_parent(struct super_block *sb,
>  }
>  
>  const struct export_operations ceph_export_ops = {
> +#ifdef BROKEN
>  	.encode_fh = ceph_encode_fh,
> +#endif
>  	.fh_to_dentry = ceph_fh_to_dentry,
>  	.fh_to_parent = ceph_fh_to_parent,
>  };
> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> index b05acb7..f8f7029 100644
> --- a/fs/exportfs/expfs.c
> +++ b/fs/exportfs/expfs.c
> @@ -304,24 +304,23 @@ out:
>  
>  /**
>   * export_encode_fh - default export_operations->encode_fh function
> - * @dentry:  the dentry to encode
> + * @inode:   the object to encode
>   * @fh:      where to store the file handle fragment
>   * @max_len: maximum length to store there
> - * @connectable: whether to store parent information
> + * @parent:  parent directory inode, if wanted
>   *
>   * This default encode_fh function assumes that the 32 inode number
>   * is suitable for locating an inode, and that the generation number
>   * can be used to check that it is still valid.  It places them in the
>   * filehandle fragment where export_decode_fh expects to find them.
>   */
> -static int export_encode_fh(struct dentry *dentry, struct fid *fid,
> -		int *max_len, int connectable)
> +static int export_encode_fh(struct inode *inode, struct fid *fid,
> +		int *max_len, struct inode *parent)
>  {
> -	struct inode * inode = dentry->d_inode;
>  	int len = *max_len;
>  	int type = FILEID_INO32_GEN;
>  
> -	if (connectable && (len < 4)) {
> +	if (parent && (len < 4)) {
>  		*max_len = 4;
>  		return 255;
>  	} else if (len < 2) {
> @@ -332,14 +331,9 @@ static int export_encode_fh(struct dentry *dentry, struct fid *fid,
>  	len = 2;
>  	fid->i32.ino = inode->i_ino;
>  	fid->i32.gen = inode->i_generation;
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> -
> -		spin_lock(&dentry->d_lock);
> -		parent = dentry->d_parent->d_inode;
> +	if (parent) {
>  		fid->i32.parent_ino = parent->i_ino;
>  		fid->i32.parent_gen = parent->i_generation;
> -		spin_unlock(&dentry->d_lock);
>  		len = 4;
>  		type = FILEID_INO32_GEN_PARENT;
>  	}
> @@ -352,11 +346,19 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
>  {
>  	const struct export_operations *nop = dentry->d_sb->s_export_op;
>  	int error;
> +	struct inode *inode = dentry->d_inode, *parent = NULL;
>  
> +	if (connectable && !S_ISDIR(inode->i_mode)) {
> +		spin_lock(&dentry->d_lock);
> +		parent = dentry->d_parent->d_inode;
> +		BUG_ON(!parent);
> +	}
>  	if (nop->encode_fh)
> -		error = nop->encode_fh(dentry, fid->raw, max_len, connectable);
> +		error = nop->encode_fh(inode, fid->raw, max_len, parent);
>  	else
> -		error = export_encode_fh(dentry, fid, max_len, connectable);
> +		error = export_encode_fh(inode, fid, max_len, parent);
> +	if (parent)
> +		spin_unlock(&dentry->d_lock);
>  
>  	return error;
>  }
> diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> index 7873797..0813c70 100644
> --- a/fs/fat/inode.c
> +++ b/fs/fat/inode.c
> @@ -752,10 +752,9 @@ static struct dentry *fat_fh_to_dentry(struct super_block *sb,
>  }
>  
>  static int
> -fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
> +fat_encode_fh(struct inode *inode, __u32 *fh, int *lenp, struct inode *parent)
>  {
>  	int len = *lenp;
> -	struct inode *inode =  de->d_inode;
>  	u32 ipos_h, ipos_m, ipos_l;
>  
>  	if (len < 5) {
> @@ -771,9 +770,9 @@ fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
>  	fh[1] = inode->i_generation;
>  	fh[2] = ipos_h;
>  	fh[3] = ipos_m | MSDOS_I(inode)->i_logstart;
> -	spin_lock(&de->d_lock);
> -	fh[4] = ipos_l | MSDOS_I(de->d_parent->d_inode)->i_logstart;
> -	spin_unlock(&de->d_lock);
> +	fh[4] = ipos_l;
> +	if (parent)
> +		fh[4] |= MSDOS_I(parent)->i_logstart;
>  	return 3;
>  }
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 64cf8d0..25ca731 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -627,12 +627,10 @@ static struct dentry *fuse_get_dentry(struct super_block *sb,
>  	return ERR_PTR(err);
>  }
>  
> -static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
> -			   int connectable)
> +static int fuse_encode_fh(struct inode *inode, u32 *fh, int *max_len,
> +			   struct inode *parent)
>  {
> -	struct inode *inode = dentry->d_inode;
> -	bool encode_parent = connectable && !S_ISDIR(inode->i_mode);
> -	int len = encode_parent ? 6 : 3;
> +	int len = parent ? 6 : 3;
>  	u64 nodeid;
>  	u32 generation;
>  
> @@ -648,14 +646,9 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
>  	fh[1] = (u32)(nodeid & 0xffffffff);
>  	fh[2] = generation;
>  
> -	if (encode_parent) {
> -		struct inode *parent;
> -
> -		spin_lock(&dentry->d_lock);
> -		parent = dentry->d_parent->d_inode;
> +	if (parent) {
>  		nodeid = get_fuse_inode(parent)->nodeid;
>  		generation = parent->i_generation;
> -		spin_unlock(&dentry->d_lock);
>  
>  		fh[3] = (u32)(nodeid >> 32);
>  		fh[4] = (u32)(nodeid & 0xffffffff);
> @@ -663,7 +656,7 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
>  	}
>  
>  	*max_len = len;
> -	return encode_parent ? 0x82 : 0x81;
> +	return parent ? 0x82 : 0x81;
>  }
>  
>  static struct dentry *fuse_fh_to_dentry(struct super_block *sb,
> diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
> index fe9945f..faa7916 100644
> --- a/fs/gfs2/export.c
> +++ b/fs/gfs2/export.c
> @@ -28,15 +28,14 @@
>  #define GFS2_LARGE_FH_SIZE 8
>  #define GFS2_OLD_FH_SIZE 10
>  
> -static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
> -			  int connectable)
> +static int gfs2_encode_fh(struct inode *inode, __u32 *p, int *len,
> +			  struct inode *parent)
>  {
>  	__be32 *fh = (__force __be32 *)p;
> -	struct inode *inode = dentry->d_inode;
>  	struct super_block *sb = inode->i_sb;
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  
> -	if (connectable && (*len < GFS2_LARGE_FH_SIZE)) {
> +	if (parent && (*len < GFS2_LARGE_FH_SIZE)) {
>  		*len = GFS2_LARGE_FH_SIZE;
>  		return 255;
>  	} else if (*len < GFS2_SMALL_FH_SIZE) {
> @@ -50,14 +49,10 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
>  	fh[3] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF);
>  	*len = GFS2_SMALL_FH_SIZE;
>  
> -	if (!connectable || inode == sb->s_root->d_inode)
> +	if (!parent || inode == sb->s_root->d_inode)
>  		return *len;
>  
> -	spin_lock(&dentry->d_lock);
> -	inode = dentry->d_parent->d_inode;
> -	ip = GFS2_I(inode);
> -	igrab(inode);
> -	spin_unlock(&dentry->d_lock);
> +	ip = GFS2_I(parent);
>  
>  	fh[4] = cpu_to_be32(ip->i_no_formal_ino >> 32);
>  	fh[5] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF);
> @@ -65,8 +60,6 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
>  	fh[7] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF);
>  	*len = GFS2_LARGE_FH_SIZE;
>  
> -	iput(inode);
> -
>  	return *len;
>  }
>  
> diff --git a/fs/isofs/export.c b/fs/isofs/export.c
> index dd4687f..aa4356d 100644
> --- a/fs/isofs/export.c
> +++ b/fs/isofs/export.c
> @@ -107,12 +107,11 @@ static struct dentry *isofs_export_get_parent(struct dentry *child)
>  }
>  
>  static int
> -isofs_export_encode_fh(struct dentry *dentry,
> +isofs_export_encode_fh(struct inode *inode,
>  		       __u32 *fh32,
>  		       int *max_len,
> -		       int connectable)
> +		       struct inode *parent)
>  {
> -	struct inode * inode = dentry->d_inode;
>  	struct iso_inode_info * ei = ISOFS_I(inode);
>  	int len = *max_len;
>  	int type = 1;
> @@ -124,7 +123,7 @@ isofs_export_encode_fh(struct dentry *dentry,
>  	 * offset of the inode and the upper 16 bits of fh32[1] to
>  	 * hold the offset of the parent.
>  	 */
> -	if (connectable && (len < 5)) {
> +	if (parent && (len < 5)) {
>  		*max_len = 5;
>  		return 255;
>  	} else if (len < 3) {
> @@ -136,16 +135,12 @@ isofs_export_encode_fh(struct dentry *dentry,
>  	fh32[0] = ei->i_iget5_block;
>   	fh16[2] = (__u16)ei->i_iget5_offset;  /* fh16 [sic] */
>  	fh32[2] = inode->i_generation;
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> +	if (parent) {
>  		struct iso_inode_info *eparent;
> -		spin_lock(&dentry->d_lock);
> -		parent = dentry->d_parent->d_inode;
>  		eparent = ISOFS_I(parent);
>  		fh32[3] = eparent->i_iget5_block;
>  		fh16[3] = (__u16)eparent->i_iget5_offset;  /* fh16 [sic] */
>  		fh32[4] = parent->i_generation;
> -		spin_unlock(&dentry->d_lock);
>  		len = 5;
>  		type = 2;
>  	}
> diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
> index 1cd3f62..958e1cb 100644
> --- a/fs/nilfs2/namei.c
> +++ b/fs/nilfs2/namei.c
> @@ -519,31 +519,24 @@ static struct dentry *nilfs_fh_to_parent(struct super_block *sb, struct fid *fh,
>  	return nilfs_get_dentry(sb, fid->cno, fid->parent_ino, fid->parent_gen);
>  }
>  
> -static int nilfs_encode_fh(struct dentry *dentry, __u32 *fh, int *lenp,
> -			   int connectable)
> +static int nilfs_encode_fh(struct inode *inode, __u32 *fh, int *lenp,
> +			   struct inode *parent)
>  {
>  	struct nilfs_fid *fid = (struct nilfs_fid *)fh;
> -	struct inode *inode = dentry->d_inode;
>  	struct nilfs_root *root = NILFS_I(inode)->i_root;
>  	int type;
>  
>  	if (*lenp < NILFS_FID_SIZE_NON_CONNECTABLE ||
> -	    (connectable && *lenp < NILFS_FID_SIZE_CONNECTABLE))
> +	    (parent && *lenp < NILFS_FID_SIZE_CONNECTABLE))
>  		return 255;
>  
>  	fid->cno = root->cno;
>  	fid->ino = inode->i_ino;
>  	fid->gen = inode->i_generation;
>  
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> -
> -		spin_lock(&dentry->d_lock);
> -		parent = dentry->d_parent->d_inode;
> +	if (parent) {
>  		fid->parent_ino = parent->i_ino;
>  		fid->parent_gen = parent->i_generation;
> -		spin_unlock(&dentry->d_lock);
> -
>  		type = FILEID_NILFS_WITH_PARENT;
>  		*lenp = NILFS_FID_SIZE_CONNECTABLE;
>  	} else {
> diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
> index 745db42..322216a 100644
> --- a/fs/ocfs2/export.c
> +++ b/fs/ocfs2/export.c
> @@ -177,21 +177,23 @@ bail:
>  	return parent;
>  }
>  
> -static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
> -			   int connectable)
> +static int ocfs2_encode_fh(struct inode *inode, u32 *fh_in, int *max_len,
> +			   struct inode *parent)
>  {
> -	struct inode *inode = dentry->d_inode;
>  	int len = *max_len;
>  	int type = 1;
>  	u64 blkno;
>  	u32 generation;
>  	__le32 *fh = (__force __le32 *) fh_in;
>  
> +#ifdef TRACE_HOOKS_ARE_NOT_BRAINDEAD_IN_YOUR_OPINION
> +#error "You go ahead and fix that mess, then.  Somehow"
>  	trace_ocfs2_encode_fh_begin(dentry, dentry->d_name.len,
>  				    dentry->d_name.name,
>  				    fh, len, connectable);
> +#endif
>  
> -	if (connectable && (len < 6)) {
> +	if (parent && (len < 6)) {
>  		*max_len = 6;
>  		type = 255;
>  		goto bail;
> @@ -211,12 +213,7 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
>  	fh[1] = cpu_to_le32((u32)(blkno & 0xffffffff));
>  	fh[2] = cpu_to_le32(generation);
>  
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> -
> -		spin_lock(&dentry->d_lock);
> -
> -		parent = dentry->d_parent->d_inode;
> +	if (parent) {
>  		blkno = OCFS2_I(parent)->ip_blkno;
>  		generation = parent->i_generation;
>  
> @@ -224,8 +221,6 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
>  		fh[4] = cpu_to_le32((u32)(blkno & 0xffffffff));
>  		fh[5] = cpu_to_le32(generation);
>  
> -		spin_unlock(&dentry->d_lock);
> -
>  		len = 6;
>  		type = 2;
>  
> diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
> index 9e8cd5a..0095740 100644
> --- a/fs/reiserfs/inode.c
> +++ b/fs/reiserfs/inode.c
> @@ -1592,13 +1592,12 @@ struct dentry *reiserfs_fh_to_parent(struct super_block *sb, struct fid *fid,
>  		(fh_type == 6) ? fid->raw[5] : 0);
>  }
>  
> -int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
> -		       int need_parent)
> +int reiserfs_encode_fh(struct inode *inode, __u32 * data, int *lenp,
> +		       struct inode *parent)
>  {
> -	struct inode *inode = dentry->d_inode;
>  	int maxlen = *lenp;
>  
> -	if (need_parent && (maxlen < 5)) {
> +	if (parent && (maxlen < 5)) {
>  		*lenp = 5;
>  		return 255;
>  	} else if (maxlen < 3) {
> @@ -1611,19 +1610,16 @@ int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
>  	data[2] = inode->i_generation;
>  	*lenp = 3;
>  	/* no room for directory info? return what we've stored so far */
> -	if (maxlen < 5 || !need_parent)
> +	if (maxlen < 5 || !parent)
>  		return 3;
>  
> -	spin_lock(&dentry->d_lock);
> -	inode = dentry->d_parent->d_inode;
> -	data[3] = inode->i_ino;
> -	data[4] = le32_to_cpu(INODE_PKEY(inode)->k_dir_id);
> +	data[3] = parent->i_ino;
> +	data[4] = le32_to_cpu(INODE_PKEY(parent)->k_dir_id);
>  	*lenp = 5;
>  	if (maxlen >= 6) {
> -		data[5] = inode->i_generation;
> +		data[5] = parent->i_generation;
>  		*lenp = 6;
>  	}
> -	spin_unlock(&dentry->d_lock);
>  	return *lenp;
>  }
>  
> diff --git a/fs/udf/namei.c b/fs/udf/namei.c
> index 08bf46e..6b9b526 100644
> --- a/fs/udf/namei.c
> +++ b/fs/udf/namei.c
> @@ -1273,16 +1273,15 @@ static struct dentry *udf_fh_to_parent(struct super_block *sb,
>  				 fid->udf.parent_partref,
>  				 fid->udf.parent_generation);
>  }
> -static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
> -			 int connectable)
> +static int udf_encode_fh(struct inode *inode, __u32 *fh, int *lenp,
> +			 struct inode *parent)
>  {
>  	int len = *lenp;
> -	struct inode *inode =  de->d_inode;
>  	struct kernel_lb_addr location = UDF_I(inode)->i_location;
>  	struct fid *fid = (struct fid *)fh;
>  	int type = FILEID_UDF_WITHOUT_PARENT;
>  
> -	if (connectable && (len < 5)) {
> +	if (parent && (len < 5)) {
>  		*lenp = 5;
>  		return 255;
>  	} else if (len < 3) {
> @@ -1295,14 +1294,11 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
>  	fid->udf.partref = location.partitionReferenceNum;
>  	fid->udf.generation = inode->i_generation;
>  
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		spin_lock(&de->d_lock);
> -		inode = de->d_parent->d_inode;
> -		location = UDF_I(inode)->i_location;
> +	if (parent) {
> +		location = UDF_I(parent)->i_location;
>  		fid->udf.parent_block = location.logicalBlockNum;
>  		fid->udf.parent_partref = location.partitionReferenceNum;
>  		fid->udf.parent_generation = inode->i_generation;
> -		spin_unlock(&de->d_lock);
>  		*lenp = 5;
>  		type = FILEID_UDF_WITH_PARENT;
>  	}
> diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
> index 558910f..1a2adfc 100644
> --- a/fs/xfs/xfs_export.c
> +++ b/fs/xfs/xfs_export.c
> @@ -53,19 +53,18 @@ static int xfs_fileid_length(int fileid_type)
>  
>  STATIC int
>  xfs_fs_encode_fh(
> -	struct dentry		*dentry,
> -	__u32			*fh,
> -	int			*max_len,
> -	int			connectable)
> +	struct inode	*inode,
> +	__u32		*fh,
> +	int		*max_len,
> +	struct inode	*parent)
>  {
>  	struct fid		*fid = (struct fid *)fh;
>  	struct xfs_fid64	*fid64 = (struct xfs_fid64 *)fh;
> -	struct inode		*inode = dentry->d_inode;
>  	int			fileid_type;
>  	int			len;
>  
>  	/* Directories don't need their parent encoded, they have ".." */
> -	if (S_ISDIR(inode->i_mode) || !connectable)
> +	if (!parent)
>  		fileid_type = FILEID_INO32_GEN;
>  	else
>  		fileid_type = FILEID_INO32_GEN_PARENT;
> @@ -97,20 +96,16 @@ xfs_fs_encode_fh(
>  
>  	switch (fileid_type) {
>  	case FILEID_INO32_GEN_PARENT:
> -		spin_lock(&dentry->d_lock);
> -		fid->i32.parent_ino = XFS_I(dentry->d_parent->d_inode)->i_ino;
> -		fid->i32.parent_gen = dentry->d_parent->d_inode->i_generation;
> -		spin_unlock(&dentry->d_lock);
> +		fid->i32.parent_ino = XFS_I(parent)->i_ino;
> +		fid->i32.parent_gen = parent->i_generation;
>  		/*FALLTHRU*/
>  	case FILEID_INO32_GEN:
>  		fid->i32.ino = XFS_I(inode)->i_ino;
>  		fid->i32.gen = inode->i_generation;
>  		break;
>  	case FILEID_INO32_GEN_PARENT | XFS_FILEID_TYPE_64FLAG:
> -		spin_lock(&dentry->d_lock);
> -		fid64->parent_ino = XFS_I(dentry->d_parent->d_inode)->i_ino;
> -		fid64->parent_gen = dentry->d_parent->d_inode->i_generation;
> -		spin_unlock(&dentry->d_lock);
> +		fid64->parent_ino = XFS_I(parent)->i_ino;
> +		fid64->parent_gen = parent->i_generation;
>  		/*FALLTHRU*/
>  	case FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG:
>  		fid64->ino = XFS_I(inode)->i_ino;
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index 3a4cef5..12291a7 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -165,8 +165,8 @@ struct fid {
>   */
>  
>  struct export_operations {
> -	int (*encode_fh)(struct dentry *de, __u32 *fh, int *max_len,
> -			int connectable);
> +	int (*encode_fh)(struct inode *inode, __u32 *fh, int *max_len,
> +			struct inode *parent);
>  	struct dentry * (*fh_to_dentry)(struct super_block *sb, struct fid *fid,
>  			int fh_len, int fh_type);
>  	struct dentry * (*fh_to_parent)(struct super_block *sb, struct fid *fid,
> diff --git a/include/linux/reiserfs_fs.h b/include/linux/reiserfs_fs.h
> index 26be28f..5e31164 100644
> --- a/include/linux/reiserfs_fs.h
> +++ b/include/linux/reiserfs_fs.h
> @@ -2043,8 +2043,8 @@ struct dentry *reiserfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
>  				     int fh_len, int fh_type);
>  struct dentry *reiserfs_fh_to_parent(struct super_block *sb, struct fid *fid,
>  				     int fh_len, int fh_type);
> -int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
> -		       int connectable);
> +int reiserfs_encode_fh(struct inode *inode, __u32 * data, int *lenp,
> +		       struct inode *parent);
>  
>  int reiserfs_truncate_file(struct inode *, int update_timestamps);
>  void make_cpu_key(struct cpu_key *cpu_key, struct inode *inode, loff_t offset,
> diff --git a/mm/cleancache.c b/mm/cleancache.c
> index bcaae4c..668c74f 100644
> --- a/mm/cleancache.c
> +++ b/mm/cleancache.c
> @@ -75,7 +75,7 @@ EXPORT_SYMBOL(__cleancache_init_shared_fs);
>  static int cleancache_get_key(struct inode *inode,
>  			      struct cleancache_filekey *key)
>  {
> -	int (*fhfn)(struct dentry *, __u32 *fh, int *, int);
> +	int (*fhfn)(struct inode *, __u32 *fh, int *, struct inode *);
>  	int len = 0, maxlen = CLEANCACHE_KEY_MAX;
>  	struct super_block *sb = inode->i_sb;
>  
> @@ -83,9 +83,7 @@ static int cleancache_get_key(struct inode *inode,
>  	if (sb->s_export_op != NULL) {
>  		fhfn = sb->s_export_op->encode_fh;
>  		if  (fhfn) {
> -			struct dentry d;
> -			d.d_inode = inode;
> -			len = (*fhfn)(&d, &key->u.fh[0], &maxlen, 0);
> +			len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL);
>  			if (len <= 0 || len == 255)
>  				return -1;
>  			if (maxlen > CLEANCACHE_KEY_MAX)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 82d13f3..e438e24 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1952,11 +1952,9 @@ static struct dentry *shmem_fh_to_dentry(struct super_block *sb,
>  	return dentry;
>  }
>  
> -static int shmem_encode_fh(struct dentry *dentry, __u32 *fh, int *len,
> -				int connectable)
> +static int shmem_encode_fh(struct inode *inode, __u32 *fh, int *len,
> +				struct inode *parent)
>  {
> -	struct inode *inode = dentry->d_inode;
> -
>  	if (*len < 3) {
>  		*len = 3;
>  		return 255;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ->encode_fh() and related breakage
  2011-12-30  6:27 [RFC] ->encode_fh() and related breakage Al Viro
                   ` (2 preceding siblings ...)
  2012-01-03 11:31 ` Steven Whitehouse
@ 2012-01-03 17:00 ` Sage Weil
  3 siblings, 0 replies; 7+ messages in thread
From: Sage Weil @ 2012-01-03 17:00 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Dan Magenheimer

Hi Al,

On Fri, 30 Dec 2011, Al Viro wrote:
> 	a) cleancache is abusing ->encode_fh().  Badly.  In particular,
> on-stack(!) struct dentry is a lousy idea.
> 
> 	b) ceph is _also_ abusing ->encode_fh().  For one thing, it
> gets fhandle that depends on object name if you give it a large enough
> buffer to put result into.  This is absolutely broken - having cross-directory
> renames break opened files is bad enough, but having that happen on rename()
> _inside_ a directory...  Besides, the only thing that ought to depend on
> max_len passed to ->encode_fh() should be "do I tell the caller to come
> back with bigger one?".

This is partly by design, and partly a work in progress.  Ceph has a 
limited ability to lookup inodes by ino.  The parent information is a 
hint that makes the lookup more efficient.  Currently that's required in 
most circumstances, but once we do a bit more work on the server it won't 
be--it'll just be an optimization.

> 	c) cleancache relies on (unguaranteed) property of ->encode_fh() -
> if the last argument is 0, most of the instances do not look at anything
> other than dentry->d_inode.  Guess what happens with ceph?  Right, we
> end up doing spin_lock(&d.d_lock) and dget(d.d_parent).  Even though
> struct dentry d is auto, has no initializer and the only thing done to
> it is assignment to d.d_inode.  IOW, that spinlock has undefined contents
> and in case if you happen to pass that spin_lock(), dget(random_pointer)
> will get you an increment of value in memory at random address.  Fun...
> Moreover, on FAT we get dereference (read-only, but still it's at least an
> oops) of uninitialized pointer - d.d_parent->d_inode is followed.
> 
> 	d) after a look through ->encode_fh() instances:
> * all of them care about dentry->d_inode (d'oh)
> * most of them look at anything else only if that inode is a directory and
> the last argument is non-zero; in that case dentry->d_lock is grabbed and
> (usually) some properties of dentry->d_parent->d_inode are looked at.
> * most of them never do anything blocking.  Exceptions:
> 	* ceph doing dget()/dput() on parent.  For no good reason, since
> everything it does with that parent could've been done before dropping
> dentry->d_lock (and we don't do anything else that might've blocked)

I can push a fix for this when the window opens.  Or, fix things up for 
any proposed API change...

> 	* gfs2 - same, but it does igrab() on parent's inode instead.
> Again, for no good reason whatsoever.
> 
> 	e) in case of default encode_fh, cleancache gets one difference from
> exportfs_encode_fh() behaviour - it ignores ->i_generation.  Note that for
> non-default encode_fh() i_generation is *not* ignored.
> 
> 	f) in case of _no_ export operations being present, cleancache is
> basically praying for inumbers being stable for some unknown reason.  Which
> is about as efficient as prayers tend to be...	
> 
> 	One kinda-sorta solution (besides kicking cleancache out of tree)
> would be to modify ->encode_fh() API - instead of dentry + connectable
> pass it inode + NULL-or-parent-inode, with ->d_lock held by caller.  And
> demand it to be non-blocking, obviously...  It would obviously break
> ceph ->encode_fh(), since that sucker apparently wants the name of last
> component anyway.  OTOH, that's broken for reasons mentioned above.
> OTTH, fhandle encoding is bloody close to user-visible ABI ;-/

The only problem I have here is that we would like to use the parent dir 
ino and dentry name as a hint in the fh even when !connectable.  Can we do 
something like

> +     int (*encode_fh)(struct inode *inode, __u32 *fh, int *max_len,
> +                     struct inode *parent, int connectable);

?

Thanks!
sage



> 
> 	_Absolutely_ untested patch along those lines follows; it definitely
> breaks ceph, it does nothing for (e) and (f), it might be broken in all kinds
> of dumb ways, etc.  Comments are welcome...
> 
> diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
> index 5f77166..e60dbe1 100644
> --- a/fs/btrfs/export.c
> +++ b/fs/btrfs/export.c
> @@ -13,15 +13,14 @@
>  					     parent_root_objectid) / 4)
>  #define BTRFS_FID_SIZE_CONNECTABLE_ROOT (sizeof(struct btrfs_fid) / 4)
>  
> -static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
> -			   int connectable)
> +static int btrfs_encode_fh(struct inode *inode, u32 *fh, int *max_len,
> +			   struct inode *parent)
>  {
>  	struct btrfs_fid *fid = (struct btrfs_fid *)fh;
> -	struct inode *inode = dentry->d_inode;
>  	int len = *max_len;
>  	int type;
>  
> -	if (connectable && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
> +	if (parent && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
>  		*max_len = BTRFS_FID_SIZE_CONNECTABLE;
>  		return 255;
>  	} else if (len < BTRFS_FID_SIZE_NON_CONNECTABLE) {
> @@ -36,19 +35,13 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
>  	fid->root_objectid = BTRFS_I(inode)->root->objectid;
>  	fid->gen = inode->i_generation;
>  
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> +	if (parent) {
>  		u64 parent_root_id;
>  
> -		spin_lock(&dentry->d_lock);
> -
> -		parent = dentry->d_parent->d_inode;
>  		fid->parent_objectid = BTRFS_I(parent)->location.objectid;
>  		fid->parent_gen = parent->i_generation;
>  		parent_root_id = BTRFS_I(parent)->root->objectid;
>  
> -		spin_unlock(&dentry->d_lock);
> -
>  		if (parent_root_id != fid->root_objectid) {
>  			fid->parent_root_objectid = parent_root_id;
>  			len = BTRFS_FID_SIZE_CONNECTABLE_ROOT;
> diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> index 9fbcdec..bcaca95 100644
> --- a/fs/ceph/export.c
> +++ b/fs/ceph/export.c
> @@ -249,7 +249,9 @@ static struct dentry *ceph_fh_to_parent(struct super_block *sb,
>  }
>  
>  const struct export_operations ceph_export_ops = {
> +#ifdef BROKEN
>  	.encode_fh = ceph_encode_fh,
> +#endif
>  	.fh_to_dentry = ceph_fh_to_dentry,
>  	.fh_to_parent = ceph_fh_to_parent,
>  };
> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> index b05acb7..f8f7029 100644
> --- a/fs/exportfs/expfs.c
> +++ b/fs/exportfs/expfs.c
> @@ -304,24 +304,23 @@ out:
>  
>  /**
>   * export_encode_fh - default export_operations->encode_fh function
> - * @dentry:  the dentry to encode
> + * @inode:   the object to encode
>   * @fh:      where to store the file handle fragment
>   * @max_len: maximum length to store there
> - * @connectable: whether to store parent information
> + * @parent:  parent directory inode, if wanted
>   *
>   * This default encode_fh function assumes that the 32 inode number
>   * is suitable for locating an inode, and that the generation number
>   * can be used to check that it is still valid.  It places them in the
>   * filehandle fragment where export_decode_fh expects to find them.
>   */
> -static int export_encode_fh(struct dentry *dentry, struct fid *fid,
> -		int *max_len, int connectable)
> +static int export_encode_fh(struct inode *inode, struct fid *fid,
> +		int *max_len, struct inode *parent)
>  {
> -	struct inode * inode = dentry->d_inode;
>  	int len = *max_len;
>  	int type = FILEID_INO32_GEN;
>  
> -	if (connectable && (len < 4)) {
> +	if (parent && (len < 4)) {
>  		*max_len = 4;
>  		return 255;
>  	} else if (len < 2) {
> @@ -332,14 +331,9 @@ static int export_encode_fh(struct dentry *dentry, struct fid *fid,
>  	len = 2;
>  	fid->i32.ino = inode->i_ino;
>  	fid->i32.gen = inode->i_generation;
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> -
> -		spin_lock(&dentry->d_lock);
> -		parent = dentry->d_parent->d_inode;
> +	if (parent) {
>  		fid->i32.parent_ino = parent->i_ino;
>  		fid->i32.parent_gen = parent->i_generation;
> -		spin_unlock(&dentry->d_lock);
>  		len = 4;
>  		type = FILEID_INO32_GEN_PARENT;
>  	}
> @@ -352,11 +346,19 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
>  {
>  	const struct export_operations *nop = dentry->d_sb->s_export_op;
>  	int error;
> +	struct inode *inode = dentry->d_inode, *parent = NULL;
>  
> +	if (connectable && !S_ISDIR(inode->i_mode)) {
> +		spin_lock(&dentry->d_lock);
> +		parent = dentry->d_parent->d_inode;
> +		BUG_ON(!parent);
> +	}
>  	if (nop->encode_fh)
> -		error = nop->encode_fh(dentry, fid->raw, max_len, connectable);
> +		error = nop->encode_fh(inode, fid->raw, max_len, parent);
>  	else
> -		error = export_encode_fh(dentry, fid, max_len, connectable);
> +		error = export_encode_fh(inode, fid, max_len, parent);
> +	if (parent)
> +		spin_unlock(&dentry->d_lock);
>  
>  	return error;
>  }
> diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> index 7873797..0813c70 100644
> --- a/fs/fat/inode.c
> +++ b/fs/fat/inode.c
> @@ -752,10 +752,9 @@ static struct dentry *fat_fh_to_dentry(struct super_block *sb,
>  }
>  
>  static int
> -fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
> +fat_encode_fh(struct inode *inode, __u32 *fh, int *lenp, struct inode *parent)
>  {
>  	int len = *lenp;
> -	struct inode *inode =  de->d_inode;
>  	u32 ipos_h, ipos_m, ipos_l;
>  
>  	if (len < 5) {
> @@ -771,9 +770,9 @@ fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
>  	fh[1] = inode->i_generation;
>  	fh[2] = ipos_h;
>  	fh[3] = ipos_m | MSDOS_I(inode)->i_logstart;
> -	spin_lock(&de->d_lock);
> -	fh[4] = ipos_l | MSDOS_I(de->d_parent->d_inode)->i_logstart;
> -	spin_unlock(&de->d_lock);
> +	fh[4] = ipos_l;
> +	if (parent)
> +		fh[4] |= MSDOS_I(parent)->i_logstart;
>  	return 3;
>  }
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 64cf8d0..25ca731 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -627,12 +627,10 @@ static struct dentry *fuse_get_dentry(struct super_block *sb,
>  	return ERR_PTR(err);
>  }
>  
> -static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
> -			   int connectable)
> +static int fuse_encode_fh(struct inode *inode, u32 *fh, int *max_len,
> +			   struct inode *parent)
>  {
> -	struct inode *inode = dentry->d_inode;
> -	bool encode_parent = connectable && !S_ISDIR(inode->i_mode);
> -	int len = encode_parent ? 6 : 3;
> +	int len = parent ? 6 : 3;
>  	u64 nodeid;
>  	u32 generation;
>  
> @@ -648,14 +646,9 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
>  	fh[1] = (u32)(nodeid & 0xffffffff);
>  	fh[2] = generation;
>  
> -	if (encode_parent) {
> -		struct inode *parent;
> -
> -		spin_lock(&dentry->d_lock);
> -		parent = dentry->d_parent->d_inode;
> +	if (parent) {
>  		nodeid = get_fuse_inode(parent)->nodeid;
>  		generation = parent->i_generation;
> -		spin_unlock(&dentry->d_lock);
>  
>  		fh[3] = (u32)(nodeid >> 32);
>  		fh[4] = (u32)(nodeid & 0xffffffff);
> @@ -663,7 +656,7 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
>  	}
>  
>  	*max_len = len;
> -	return encode_parent ? 0x82 : 0x81;
> +	return parent ? 0x82 : 0x81;
>  }
>  
>  static struct dentry *fuse_fh_to_dentry(struct super_block *sb,
> diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
> index fe9945f..faa7916 100644
> --- a/fs/gfs2/export.c
> +++ b/fs/gfs2/export.c
> @@ -28,15 +28,14 @@
>  #define GFS2_LARGE_FH_SIZE 8
>  #define GFS2_OLD_FH_SIZE 10
>  
> -static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
> -			  int connectable)
> +static int gfs2_encode_fh(struct inode *inode, __u32 *p, int *len,
> +			  struct inode *parent)
>  {
>  	__be32 *fh = (__force __be32 *)p;
> -	struct inode *inode = dentry->d_inode;
>  	struct super_block *sb = inode->i_sb;
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  
> -	if (connectable && (*len < GFS2_LARGE_FH_SIZE)) {
> +	if (parent && (*len < GFS2_LARGE_FH_SIZE)) {
>  		*len = GFS2_LARGE_FH_SIZE;
>  		return 255;
>  	} else if (*len < GFS2_SMALL_FH_SIZE) {
> @@ -50,14 +49,10 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
>  	fh[3] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF);
>  	*len = GFS2_SMALL_FH_SIZE;
>  
> -	if (!connectable || inode == sb->s_root->d_inode)
> +	if (!parent || inode == sb->s_root->d_inode)
>  		return *len;
>  
> -	spin_lock(&dentry->d_lock);
> -	inode = dentry->d_parent->d_inode;
> -	ip = GFS2_I(inode);
> -	igrab(inode);
> -	spin_unlock(&dentry->d_lock);
> +	ip = GFS2_I(parent);
>  
>  	fh[4] = cpu_to_be32(ip->i_no_formal_ino >> 32);
>  	fh[5] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF);
> @@ -65,8 +60,6 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
>  	fh[7] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF);
>  	*len = GFS2_LARGE_FH_SIZE;
>  
> -	iput(inode);
> -
>  	return *len;
>  }
>  
> diff --git a/fs/isofs/export.c b/fs/isofs/export.c
> index dd4687f..aa4356d 100644
> --- a/fs/isofs/export.c
> +++ b/fs/isofs/export.c
> @@ -107,12 +107,11 @@ static struct dentry *isofs_export_get_parent(struct dentry *child)
>  }
>  
>  static int
> -isofs_export_encode_fh(struct dentry *dentry,
> +isofs_export_encode_fh(struct inode *inode,
>  		       __u32 *fh32,
>  		       int *max_len,
> -		       int connectable)
> +		       struct inode *parent)
>  {
> -	struct inode * inode = dentry->d_inode;
>  	struct iso_inode_info * ei = ISOFS_I(inode);
>  	int len = *max_len;
>  	int type = 1;
> @@ -124,7 +123,7 @@ isofs_export_encode_fh(struct dentry *dentry,
>  	 * offset of the inode and the upper 16 bits of fh32[1] to
>  	 * hold the offset of the parent.
>  	 */
> -	if (connectable && (len < 5)) {
> +	if (parent && (len < 5)) {
>  		*max_len = 5;
>  		return 255;
>  	} else if (len < 3) {
> @@ -136,16 +135,12 @@ isofs_export_encode_fh(struct dentry *dentry,
>  	fh32[0] = ei->i_iget5_block;
>   	fh16[2] = (__u16)ei->i_iget5_offset;  /* fh16 [sic] */
>  	fh32[2] = inode->i_generation;
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> +	if (parent) {
>  		struct iso_inode_info *eparent;
> -		spin_lock(&dentry->d_lock);
> -		parent = dentry->d_parent->d_inode;
>  		eparent = ISOFS_I(parent);
>  		fh32[3] = eparent->i_iget5_block;
>  		fh16[3] = (__u16)eparent->i_iget5_offset;  /* fh16 [sic] */
>  		fh32[4] = parent->i_generation;
> -		spin_unlock(&dentry->d_lock);
>  		len = 5;
>  		type = 2;
>  	}
> diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
> index 1cd3f62..958e1cb 100644
> --- a/fs/nilfs2/namei.c
> +++ b/fs/nilfs2/namei.c
> @@ -519,31 +519,24 @@ static struct dentry *nilfs_fh_to_parent(struct super_block *sb, struct fid *fh,
>  	return nilfs_get_dentry(sb, fid->cno, fid->parent_ino, fid->parent_gen);
>  }
>  
> -static int nilfs_encode_fh(struct dentry *dentry, __u32 *fh, int *lenp,
> -			   int connectable)
> +static int nilfs_encode_fh(struct inode *inode, __u32 *fh, int *lenp,
> +			   struct inode *parent)
>  {
>  	struct nilfs_fid *fid = (struct nilfs_fid *)fh;
> -	struct inode *inode = dentry->d_inode;
>  	struct nilfs_root *root = NILFS_I(inode)->i_root;
>  	int type;
>  
>  	if (*lenp < NILFS_FID_SIZE_NON_CONNECTABLE ||
> -	    (connectable && *lenp < NILFS_FID_SIZE_CONNECTABLE))
> +	    (parent && *lenp < NILFS_FID_SIZE_CONNECTABLE))
>  		return 255;
>  
>  	fid->cno = root->cno;
>  	fid->ino = inode->i_ino;
>  	fid->gen = inode->i_generation;
>  
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> -
> -		spin_lock(&dentry->d_lock);
> -		parent = dentry->d_parent->d_inode;
> +	if (parent) {
>  		fid->parent_ino = parent->i_ino;
>  		fid->parent_gen = parent->i_generation;
> -		spin_unlock(&dentry->d_lock);
> -
>  		type = FILEID_NILFS_WITH_PARENT;
>  		*lenp = NILFS_FID_SIZE_CONNECTABLE;
>  	} else {
> diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
> index 745db42..322216a 100644
> --- a/fs/ocfs2/export.c
> +++ b/fs/ocfs2/export.c
> @@ -177,21 +177,23 @@ bail:
>  	return parent;
>  }
>  
> -static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
> -			   int connectable)
> +static int ocfs2_encode_fh(struct inode *inode, u32 *fh_in, int *max_len,
> +			   struct inode *parent)
>  {
> -	struct inode *inode = dentry->d_inode;
>  	int len = *max_len;
>  	int type = 1;
>  	u64 blkno;
>  	u32 generation;
>  	__le32 *fh = (__force __le32 *) fh_in;
>  
> +#ifdef TRACE_HOOKS_ARE_NOT_BRAINDEAD_IN_YOUR_OPINION
> +#error "You go ahead and fix that mess, then.  Somehow"
>  	trace_ocfs2_encode_fh_begin(dentry, dentry->d_name.len,
>  				    dentry->d_name.name,
>  				    fh, len, connectable);
> +#endif
>  
> -	if (connectable && (len < 6)) {
> +	if (parent && (len < 6)) {
>  		*max_len = 6;
>  		type = 255;
>  		goto bail;
> @@ -211,12 +213,7 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
>  	fh[1] = cpu_to_le32((u32)(blkno & 0xffffffff));
>  	fh[2] = cpu_to_le32(generation);
>  
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		struct inode *parent;
> -
> -		spin_lock(&dentry->d_lock);
> -
> -		parent = dentry->d_parent->d_inode;
> +	if (parent) {
>  		blkno = OCFS2_I(parent)->ip_blkno;
>  		generation = parent->i_generation;
>  
> @@ -224,8 +221,6 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
>  		fh[4] = cpu_to_le32((u32)(blkno & 0xffffffff));
>  		fh[5] = cpu_to_le32(generation);
>  
> -		spin_unlock(&dentry->d_lock);
> -
>  		len = 6;
>  		type = 2;
>  
> diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
> index 9e8cd5a..0095740 100644
> --- a/fs/reiserfs/inode.c
> +++ b/fs/reiserfs/inode.c
> @@ -1592,13 +1592,12 @@ struct dentry *reiserfs_fh_to_parent(struct super_block *sb, struct fid *fid,
>  		(fh_type == 6) ? fid->raw[5] : 0);
>  }
>  
> -int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
> -		       int need_parent)
> +int reiserfs_encode_fh(struct inode *inode, __u32 * data, int *lenp,
> +		       struct inode *parent)
>  {
> -	struct inode *inode = dentry->d_inode;
>  	int maxlen = *lenp;
>  
> -	if (need_parent && (maxlen < 5)) {
> +	if (parent && (maxlen < 5)) {
>  		*lenp = 5;
>  		return 255;
>  	} else if (maxlen < 3) {
> @@ -1611,19 +1610,16 @@ int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
>  	data[2] = inode->i_generation;
>  	*lenp = 3;
>  	/* no room for directory info? return what we've stored so far */
> -	if (maxlen < 5 || !need_parent)
> +	if (maxlen < 5 || !parent)
>  		return 3;
>  
> -	spin_lock(&dentry->d_lock);
> -	inode = dentry->d_parent->d_inode;
> -	data[3] = inode->i_ino;
> -	data[4] = le32_to_cpu(INODE_PKEY(inode)->k_dir_id);
> +	data[3] = parent->i_ino;
> +	data[4] = le32_to_cpu(INODE_PKEY(parent)->k_dir_id);
>  	*lenp = 5;
>  	if (maxlen >= 6) {
> -		data[5] = inode->i_generation;
> +		data[5] = parent->i_generation;
>  		*lenp = 6;
>  	}
> -	spin_unlock(&dentry->d_lock);
>  	return *lenp;
>  }
>  
> diff --git a/fs/udf/namei.c b/fs/udf/namei.c
> index 08bf46e..6b9b526 100644
> --- a/fs/udf/namei.c
> +++ b/fs/udf/namei.c
> @@ -1273,16 +1273,15 @@ static struct dentry *udf_fh_to_parent(struct super_block *sb,
>  				 fid->udf.parent_partref,
>  				 fid->udf.parent_generation);
>  }
> -static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
> -			 int connectable)
> +static int udf_encode_fh(struct inode *inode, __u32 *fh, int *lenp,
> +			 struct inode *parent)
>  {
>  	int len = *lenp;
> -	struct inode *inode =  de->d_inode;
>  	struct kernel_lb_addr location = UDF_I(inode)->i_location;
>  	struct fid *fid = (struct fid *)fh;
>  	int type = FILEID_UDF_WITHOUT_PARENT;
>  
> -	if (connectable && (len < 5)) {
> +	if (parent && (len < 5)) {
>  		*lenp = 5;
>  		return 255;
>  	} else if (len < 3) {
> @@ -1295,14 +1294,11 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
>  	fid->udf.partref = location.partitionReferenceNum;
>  	fid->udf.generation = inode->i_generation;
>  
> -	if (connectable && !S_ISDIR(inode->i_mode)) {
> -		spin_lock(&de->d_lock);
> -		inode = de->d_parent->d_inode;
> -		location = UDF_I(inode)->i_location;
> +	if (parent) {
> +		location = UDF_I(parent)->i_location;
>  		fid->udf.parent_block = location.logicalBlockNum;
>  		fid->udf.parent_partref = location.partitionReferenceNum;
>  		fid->udf.parent_generation = inode->i_generation;
> -		spin_unlock(&de->d_lock);
>  		*lenp = 5;
>  		type = FILEID_UDF_WITH_PARENT;
>  	}
> diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
> index 558910f..1a2adfc 100644
> --- a/fs/xfs/xfs_export.c
> +++ b/fs/xfs/xfs_export.c
> @@ -53,19 +53,18 @@ static int xfs_fileid_length(int fileid_type)
>  
>  STATIC int
>  xfs_fs_encode_fh(
> -	struct dentry		*dentry,
> -	__u32			*fh,
> -	int			*max_len,
> -	int			connectable)
> +	struct inode	*inode,
> +	__u32		*fh,
> +	int		*max_len,
> +	struct inode	*parent)
>  {
>  	struct fid		*fid = (struct fid *)fh;
>  	struct xfs_fid64	*fid64 = (struct xfs_fid64 *)fh;
> -	struct inode		*inode = dentry->d_inode;
>  	int			fileid_type;
>  	int			len;
>  
>  	/* Directories don't need their parent encoded, they have ".." */
> -	if (S_ISDIR(inode->i_mode) || !connectable)
> +	if (!parent)
>  		fileid_type = FILEID_INO32_GEN;
>  	else
>  		fileid_type = FILEID_INO32_GEN_PARENT;
> @@ -97,20 +96,16 @@ xfs_fs_encode_fh(
>  
>  	switch (fileid_type) {
>  	case FILEID_INO32_GEN_PARENT:
> -		spin_lock(&dentry->d_lock);
> -		fid->i32.parent_ino = XFS_I(dentry->d_parent->d_inode)->i_ino;
> -		fid->i32.parent_gen = dentry->d_parent->d_inode->i_generation;
> -		spin_unlock(&dentry->d_lock);
> +		fid->i32.parent_ino = XFS_I(parent)->i_ino;
> +		fid->i32.parent_gen = parent->i_generation;
>  		/*FALLTHRU*/
>  	case FILEID_INO32_GEN:
>  		fid->i32.ino = XFS_I(inode)->i_ino;
>  		fid->i32.gen = inode->i_generation;
>  		break;
>  	case FILEID_INO32_GEN_PARENT | XFS_FILEID_TYPE_64FLAG:
> -		spin_lock(&dentry->d_lock);
> -		fid64->parent_ino = XFS_I(dentry->d_parent->d_inode)->i_ino;
> -		fid64->parent_gen = dentry->d_parent->d_inode->i_generation;
> -		spin_unlock(&dentry->d_lock);
> +		fid64->parent_ino = XFS_I(parent)->i_ino;
> +		fid64->parent_gen = parent->i_generation;
>  		/*FALLTHRU*/
>  	case FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG:
>  		fid64->ino = XFS_I(inode)->i_ino;
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index 3a4cef5..12291a7 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -165,8 +165,8 @@ struct fid {
>   */
>  
>  struct export_operations {
> -	int (*encode_fh)(struct dentry *de, __u32 *fh, int *max_len,
> -			int connectable);
> +	int (*encode_fh)(struct inode *inode, __u32 *fh, int *max_len,
> +			struct inode *parent);
>  	struct dentry * (*fh_to_dentry)(struct super_block *sb, struct fid *fid,
>  			int fh_len, int fh_type);
>  	struct dentry * (*fh_to_parent)(struct super_block *sb, struct fid *fid,
> diff --git a/include/linux/reiserfs_fs.h b/include/linux/reiserfs_fs.h
> index 26be28f..5e31164 100644
> --- a/include/linux/reiserfs_fs.h
> +++ b/include/linux/reiserfs_fs.h
> @@ -2043,8 +2043,8 @@ struct dentry *reiserfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
>  				     int fh_len, int fh_type);
>  struct dentry *reiserfs_fh_to_parent(struct super_block *sb, struct fid *fid,
>  				     int fh_len, int fh_type);
> -int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
> -		       int connectable);
> +int reiserfs_encode_fh(struct inode *inode, __u32 * data, int *lenp,
> +		       struct inode *parent);
>  
>  int reiserfs_truncate_file(struct inode *, int update_timestamps);
>  void make_cpu_key(struct cpu_key *cpu_key, struct inode *inode, loff_t offset,
> diff --git a/mm/cleancache.c b/mm/cleancache.c
> index bcaae4c..668c74f 100644
> --- a/mm/cleancache.c
> +++ b/mm/cleancache.c
> @@ -75,7 +75,7 @@ EXPORT_SYMBOL(__cleancache_init_shared_fs);
>  static int cleancache_get_key(struct inode *inode,
>  			      struct cleancache_filekey *key)
>  {
> -	int (*fhfn)(struct dentry *, __u32 *fh, int *, int);
> +	int (*fhfn)(struct inode *, __u32 *fh, int *, struct inode *);
>  	int len = 0, maxlen = CLEANCACHE_KEY_MAX;
>  	struct super_block *sb = inode->i_sb;
>  
> @@ -83,9 +83,7 @@ static int cleancache_get_key(struct inode *inode,
>  	if (sb->s_export_op != NULL) {
>  		fhfn = sb->s_export_op->encode_fh;
>  		if  (fhfn) {
> -			struct dentry d;
> -			d.d_inode = inode;
> -			len = (*fhfn)(&d, &key->u.fh[0], &maxlen, 0);
> +			len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL);
>  			if (len <= 0 || len == 255)
>  				return -1;
>  			if (maxlen > CLEANCACHE_KEY_MAX)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 82d13f3..e438e24 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1952,11 +1952,9 @@ static struct dentry *shmem_fh_to_dentry(struct super_block *sb,
>  	return dentry;
>  }
>  
> -static int shmem_encode_fh(struct dentry *dentry, __u32 *fh, int *len,
> -				int connectable)
> +static int shmem_encode_fh(struct inode *inode, __u32 *fh, int *len,
> +				struct inode *parent)
>  {
> -	struct inode *inode = dentry->d_inode;
> -
>  	if (*len < 3) {
>  		*len = 3;
>  		return 255;
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ->encode_fh() and related breakage
  2011-12-30  7:57   ` Al Viro
@ 2012-01-09 13:55     ` Bernd Schubert
  0 siblings, 0 replies; 7+ messages in thread
From: Bernd Schubert @ 2012-01-09 13:55 UTC (permalink / raw)
  To: Al Viro
  Cc: Andreas Dilger, linux-fsdevel, Linus Torvalds, Sage Weil,
	Dan Magenheimer

On 12/30/2011 08:57 AM, Al Viro wrote:
> On Fri, Dec 30, 2011 at 01:14:16AM -0600, Andreas Dilger wrote:
>> On 2011-12-30, at 12:27 AM, Al Viro wrote:
>>> 	One kinda-sorta solution (besides kicking cleancache out of tree)
>>> would be to modify ->encode_fh() API - instead of dentry + connectable
>>> pass it inode + NULL-or-parent-inode, with ->d_lock held by caller.  And
>>> demand it to be non-blocking, obviously...  It would obviously break
>>> ceph ->encode_fh(), since that sucker apparently wants the name of last
>>> component anyway.  OTOH, that's broken for reasons mentioned above.
>>> OTTH, fhandle encoding is bloody close to user-visible ABI ;-/
>>
>> Is it possible to always pass d_parent if this is available, rather than
>> only if @connectable is passed?  For network filesystems re-exporting NFS
>> it is desirable to always be passed the parent directory.
>
> Even without check_subtree on the export?  Whatever the hell for?
> Note that using the parent means broken rename() - fhandles will be
> changed when file gets moved from one directory to another.  It's
> exactly what is asked for in case of check_subtree, but otherwise it's
> an obvious misfeature.  _And_ it costs extra locking and extra work
> on decode_fh side...
>
> I obviously looked only at the in-tree filesystems; out-of-tree code might
> be doing very weird things, up to and including ROT13 of uuencoded UTF16
> representation of pathname stored in fhandle.  And yes, fh representation
> is too close to being a part of ABI for comfort, but then it's an ABI
> exported by out-of-tree code, so I'm not going to worry too much about
> its stability...
>
> "desirable for network filesystems" is too damn vague; could you give more
> details?

FhGFS has the distributed metadata feature, where the parent directory 
identifies the corresponding metadata server of a dentry.
I guess other parallel filesystems might take the same approach.


Cheers,
Bernd

--
Bernd Schubert
Fraunhofer ITWM



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-01-09 13:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-30  6:27 [RFC] ->encode_fh() and related breakage Al Viro
2011-12-30  7:14 ` Andreas Dilger
2011-12-30  7:57   ` Al Viro
2012-01-09 13:55     ` Bernd Schubert
2011-12-30 16:48 ` Dan Magenheimer
2012-01-03 11:31 ` Steven Whitehouse
2012-01-03 17:00 ` Sage Weil

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).